Deploying AI on IoT edge devices is limited by the high computational and memory demands of traditional convolutional neural networks (CNNs). These constraints hinder real-time applications in areas like autonomous systems, smart healthcare, and mobile devices. Efficient model compression is essential to make deep learning viable on resource-constrained hardware.
This work introduces a layer-wise, range-based threshold pruning technique that dynamically adjusts each layer's pruning threshold according to the distribution of its weights. Unlike fixed-threshold methods, this enables fine-grained pruning while preserving accuracy. The approach was evaluated on the LeNet-5 architecture using the MNIST, Fashion-MNIST, and SVHN datasets, achieving up to 64% weight reduction with only a 1–4% loss in accuracy. These results show that the method reduces memory and compute requirements while maintaining performance, making it well suited to real-world edge AI deployments.
Flowchart of the proposed model based on range-based optimization
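To make the layer-wise, range-based thresholding described above concrete, the following is a minimal sketch in NumPy. It assumes the per-layer threshold is taken as a fraction `alpha` of that layer's weight range; `alpha`, `range_based_threshold`, and `prune_layer` are illustrative names and not the paper's exact rule.

```python
import numpy as np

def range_based_threshold(weights, alpha=0.2):
    """Derive a per-layer pruning threshold from the spread (range) of the
    layer's weights. `alpha` is a hypothetical scaling factor: larger values
    prune more aggressively. The paper's exact thresholding rule may differ."""
    w = weights.ravel()
    w_range = w.max() - w.min()      # spread of this layer's weight distribution
    return alpha * w_range           # threshold scales with the layer's range

def prune_layer(weights, alpha=0.2):
    """Zero out weights whose magnitude falls below the layer-specific
    threshold, returning the pruned weights and the fraction removed."""
    t = range_based_threshold(weights, alpha)
    mask = np.abs(weights) >= t
    pruned = weights * mask
    sparsity = 1.0 - mask.mean()
    return pruned, sparsity

# Example: apply the layer-wise rule to toy layers shaped like LeNet-5
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    layers = {
        "conv1": rng.normal(0, 0.10, size=(6, 1, 5, 5)),
        "conv2": rng.normal(0, 0.05, size=(16, 6, 5, 5)),
        "fc1":   rng.normal(0, 0.02, size=(120, 400)),
    }
    for name, w in layers.items():
        pruned, sparsity = prune_layer(w, alpha=0.2)
        print(f"{name}: pruned {sparsity:.1%} of weights")
```

Because the threshold is recomputed from each layer's own weight range rather than fixed globally, layers with narrow weight distributions are pruned more conservatively than layers with wide ones, which is the behavior the layer-wise scheme is intended to capture.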