Submitted: 13 May 2024
Posted: 14 May 2024
Abstract
Keywords:
1. Introduction
- Introduction of the pyramid training paradigm for efficient optimization of CNNs.
- Detailed methodology for iterative network construction and integration to progressively achieve desired accuracy.
- Implementation of a feature size reduction strategy using a similarity matrix and averaging technique to enhance the network’s architectural efficiency.
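The feature size reduction named in the last contribution can be illustrated with a small example. This is a minimal sketch, not the authors' implementation: it assumes cosine similarity between flattened channel activations, a greedy grouping rule, and a threshold of 0.9, none of which are specified here. The function name `reduce_channels` is invented for illustration.

```python
import numpy as np

def reduce_channels(features, threshold=0.9):
    """Merge similar channels of a (C, H, W) feature map by averaging.

    Assumptions (not from the source): cosine similarity, greedy grouping,
    and the threshold value.
    """
    c = features.shape[0]
    flat = features.reshape(c, -1).astype(np.float64)
    norms = np.linalg.norm(flat, axis=1, keepdims=True)
    unit = flat / np.maximum(norms, 1e-12)
    sim = unit @ unit.T  # (C, C) cosine similarity matrix

    assigned = np.zeros(c, dtype=bool)
    groups = []
    for i in range(c):
        if assigned[i]:
            continue
        # Collect every unassigned channel that is similar enough to channel i.
        members = [j for j in range(i, c)
                   if not assigned[j] and sim[i, j] >= threshold]
        assigned[members] = True
        groups.append(members)

    # Each group of similar channels is replaced by its element-wise average.
    reduced = np.stack([features[g].mean(axis=0) for g in groups])
    return reduced, groups, sim

rng = np.random.default_rng(0)
base = rng.normal(size=(2, 4, 4))
# Channels 0/1 and 2/3 are scaled copies of each other, so they should merge.
fmap = np.stack([base[0], base[0] * 1.01, base[1], base[1] * 0.99])
reduced, groups, sim = reduce_channels(fmap)
print(fmap.shape, "->", reduced.shape)
print(groups)
```

Averaging keeps one representative per group of near-duplicate channels, so the next layer needs fewer input channels. A different similarity measure or grouping rule would slot into the same structure.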
2. Related Work
3. Methodology
3.1. Pyramid Training
3.2. Similarity Comparison
3.3. Size Reduction
3.4. Quantization
3.4.1. Fixed-Point Quantization
3.4.2. Integer Quantization
3.4.3. Vector Quantization
3.5. Experimental Fields
3.5.1. Classification
3.5.2. Semantic Segmentation
3.5.3. Object Detection
3.6. Individual Blocks
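The layer table for the first block (Conv1, Pool1, Conv2, Pool2) can be checked with a short shape trace. The sketch below assumes 32×32 RGB inputs, as in CIFAR-10, and a bias term in every convolution; neither is stated in the table, and the helper names `conv_out` and `trace` are illustrative.

```python
def conv_out(size, kernel, stride, padding):
    # Standard output-size formula for convolution and pooling layers.
    return (size + 2 * padding - kernel) // stride + 1

# (name, kind, in_channels, out_channels, kernel, stride, padding),
# copied from the first block's layer table.
block1 = [
    ("Conv1", "conv", 3, 16, 3, 1, 1),
    ("Pool1", "pool", None, None, 2, 2, 0),
    ("Conv2", "conv", 16, 32, 3, 1, 1),
    ("Pool2", "pool", None, None, 2, 2, 0),
]

def trace(layers, in_channels=3, size=32):
    """Follow the feature-map shape through the layers and count conv parameters."""
    channels, conv_params = in_channels, 0
    for name, kind, c_in, c_out, k, s, p in layers:
        size = conv_out(size, k, s, p)
        if kind == "conv":
            conv_params += c_out * (c_in * k * k + 1)  # weights plus bias (assumed)
            channels = c_out
        print(f"{name}: {channels} x {size} x {size}")
    return channels, size, conv_params

channels, size, conv_params = trace(block1)
flat_features = channels * size * size
print("flattened features:", flat_features)
print("conv parameters:", conv_params)
```

Under these assumptions the block ends with a 32 × 8 × 8 feature map, so the first fully connected layer would receive 2048 input features.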
3.7. Method Description
4. Experiments
4.1. Dataset and Task
- Classification: We utilized the CIFAR-10 dataset [27], a benchmark for image classification tasks, to evaluate the effectiveness of our pyramid training and feature reduction techniques.
- Segmentation: We used the KITTI dataset [32], which is widely used for object detection, tracking, and segmentation.
- Object Detection: The Penn-Fudan Database for Pedestrian Detection [33] was chosen for object detection experiments. This dataset, comprising diverse urban scenes, focuses on pedestrian detection, offering a challenging testbed for our methodologies.
4.2. Experimental Setup
1. A local GPU setup powered by an RTX-3060 with 6GB of memory.
2. A cloud-based Google Colab session configured to utilize the T4 GPU runtime.
- Test set accuracy for classification tasks.
- Training loss and validation loss for segmentation tasks.
- Various loss metrics (Loss Box Reg, Loss Classifier, Loss RPN Box Reg, Total Loss) for object detection tasks.
- Training time and model size, measured in megabytes (MB) or gigabytes (GB).
4.3. Experimental Configurations
- BaseNetwork: This configuration serves as a control, comprising Block 1 and Block 2 combined without any feature size reduction steps.
- ReducedNetwork: Incorporates a feature size reduction step between Conv2 and Conv3 layers. This network undergoes iterative pyramid training, progressively refining the model and reducing its complexity by minimizing redundant feature representations.
- Quantization: Both BaseNetwork and ReducedNetwork are subjected to quantization to compress model size and enhance computational efficiency.
- Reduced+Quantization: This setup investigates the synergistic effect of combining feature size reduction with quantization, focusing on the rate of model size reduction and operational efficiency.
- U-Net: Employed for semantic segmentation, this network’s architecture is designed to efficiently segment images by progressively reducing and then expanding the resolution of intermediary representations.
- U-Net-Reduced: A simplified variant of U-Net, where feature size reduction techniques are applied to decrease the complexity and computational demands of the network while aiming to maintain effective segmentation performance.
- Faster R-CNN: Utilizes a two-stage approach for object detection, combining region proposal networks with a CNN classifier. The backbones used in these experiments include CustomBackbone and VGG16, adapted for this framework.
- Faster-RCNN-Reduced: A reduced-complexity version of Faster R-CNN, where feature size reduction techniques are implemented within the convolutional layers of the backbone to explore potential benefits in scenarios with limited computational resources.
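The quantization step used in the Quantization and Reduced+Quantization configurations can be illustrated on a single weight tensor. This is a minimal sketch assuming symmetric per-tensor int8 quantization. The scheme actually used in the experiments is not specified here, and the weight tensor and function names are hypothetical.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor quantization of float32 weights to int8 (assumed scheme)."""
    max_abs = float(np.abs(w).max())
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Map int8 codes back to approximate float32 weights."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
# Hypothetical weights shaped like a 16 -> 32 channel 3x3 convolution.
w = rng.normal(scale=0.1, size=(32, 16, 3, 3)).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

print("float32 bytes:", w.nbytes)
print("int8 bytes:   ", q.nbytes)
print("max abs error:", float(np.abs(w - w_hat).max()))
```

Storing a tensor as int8 instead of float32 cuts its size by a factor of four, at the cost of a rounding error of at most half a quantization step. The model sizes reported in the results change by a different factor, so this sketch should not be read as the scheme behind those numbers.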
4.4. Results
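The trade-offs in the classification results table can be summarized relative to BaseNetwork. The snippet below only rearranges the reported numbers; the derived quantities (compression factor, accuracy drop in percentage points, training-time ratio) are an illustrative choice rather than metrics defined in the paper.

```python
# Values copied from the classification results table.
results = {
    "BaseNetwork": {"acc": 77.31, "size_mb": 0.65, "time_s": 366.51},
    "ReducedNetwork": {"acc": 74.77, "size_mb": 0.17, "time_s": 613.78},
    "Quantization+BaseNetwork": {"acc": 76.51, "size_mb": 0.32, "time_s": 343.25},
    "Reduced+Quantization": {"acc": 73.53, "size_mb": 0.10, "time_s": 601.47},
}

base = results["BaseNetwork"]
summary = {}
for name, r in results.items():
    summary[name] = {
        "compression": base["size_mb"] / r["size_mb"],  # how many times smaller
        "acc_drop": base["acc"] - r["acc"],              # percentage points lost
        "time_ratio": r["time_s"] / base["time_s"],     # relative training time
    }
    s = summary[name]
    print(f"{name:26s} {s['compression']:.2f}x smaller, "
          f"{s['acc_drop']:.2f} points lower accuracy, "
          f"{s['time_ratio']:.2f}x training time")
```

On these numbers, ReducedNetwork is about 3.8 times smaller than BaseNetwork at 2.54 points lower accuracy and roughly 1.67 times the training time. Reduced+Quantization reaches 6.5 times smaller at 3.78 points lower accuracy.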
5. Conclusion
- Efficiency: It significantly reduces redundancy and enhances computational efficiency, crucial for environments with limited computational resources.
- Information Preservation: This method ensures that essential features are retained, which is vital for maintaining the effectiveness of the model.
- Scalability: Similarity comparison is flexible and can be applied across different network layers and architectures, making it widely applicable.
- Energy Conservation: Particularly beneficial for mobile and embedded systems, reducing computational demands helps in conserving energy.
Author Contributions
Funding
Conflicts of Interest
References
- C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, A. Rabinovich, "Going Deeper with Convolutions," in 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
- S. Sladojević, M. Arsenović, A. Anderla, D. Ćulibrk, D. Stefanović, "Deep Neural Networks Based Recognition of Plant Diseases by Leaf Image Classification," Computational Intelligence and Neuroscience, vol. 2016, pp. 1–11, 2016.
- J. Redmon, S. Divvala, R. Girshick, A. Farhadi, "You Only Look Once: Unified, Real-Time Object Detection," in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
- Z. Zhao, P. Zheng, S. Xu, X. Wu, "Object Detection with Deep Learning: A Review," IEEE Transactions on Neural Networks and Learning Systems, vol. 30, no. 11, pp. 3212–3232, 2019.
- A. Khan, A. Sohail, U. Zahoora, A. Qureshi, "A Survey of the Recent Architectures of Deep Convolutional Neural Networks," Artificial Intelligence Review, vol. 53, no. 8, pp. 5455–5516, 2020.
- P. Malhotra, S. Gupta, D. Koundal, A. Zaguia, W. Enbeyle, "Deep Neural Networks for Medical Image Segmentation," Journal of Healthcare Engineering, vol. 2022, pp. 1–15, 2022.
- S. Singh, L. Wang, S. Gupta, H. Goli, P. Padmanabhan, B. Gulyás, "3D Deep Learning on Medical Images: A Review," Sensors, vol. 20, no. 18, p. 5097, 2020.
- M. Junaid, Z. Szalay, Á. Török, "Evaluation of Non-Classical Decision-Making Methods in Self-Driving Cars: Pedestrian Detection Testing on a Cluster of Images with Different Luminance Conditions," Energies, vol. 14, no. 21, p. 7172, 2021.
- K. Nakata, D. Miyashita, J. Deguchi, R. Fujimoto, "Adaptive Quantization Method for CNN with Computational-Complexity-Aware Regularization," in 2021 IEEE International Symposium on Circuits and Systems (ISCAS), 2021.
- Y. Cai, W. Hua, H. Chen, G. Suh, C. Sa, Z. Zhang, "Structured Pruning Is All You Need for Pruning CNNs at Initialization," 2022. Available: https://arxiv.org/abs/2203.02549.
- G. Hinton, O. Vinyals, J. Dean, "Distilling the Knowledge in a Neural Network," arXiv preprint arXiv:1503.02531, 2015.
- Y. LeCun, L. Bottou, Y. Bengio, P. Haffner, "Gradient-Based Learning Applied to Document Recognition," Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.
- A. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Adam, "MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications," 2017. Available: https://arxiv.org/abs/1704.04861.
- C.-J. Hsieh, M. A. Sustik, I. S. Dhillon, P. Ravikumar, "Low-Rank Matrix Factorization for Deep Neural Network Training with High-Dimensional Output Targets," arXiv preprint arXiv:1611.05725, 2017.
- Z. Liu, M. Sun, T. Zhou, G. Huang, T. Darrell, "Dynamic Network Surgery for Efficient DNNs," arXiv preprint arXiv:2003.02389, 2019.
- B. Jacob, S. Kligys, B. Chen, M. Zhu, M. Tang, A. Howard, H. Adam, D. Kalenichenko, "Quantization and training of neural networks for efficient integer-arithmetic-only inference," in 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018.
- J. Chen, W. K. Cheung, "Similarity preserving deep asymmetric quantization for image retrieval," in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, no. 01, pp. 8183–8190, 2019.
- P. Molchanov, S. Tyree, T. Karras, T. Aila, J. Kautz, "Pruning convolutional neural networks for resource efficient inference," 2016. Available: https://arxiv.org/abs/1611.06440.
- H. Li, A. Kadav, I. Durdanovic, H. Samet, H. P. Graf, "Pruning filters for efficient convnets," 2016. Available: https://arxiv.org/abs/1608.08710.
- J. Frankle, G. K. Dziugaite, D. M. Roy, M. Carbin, "Stabilizing the lottery ticket hypothesis," 2019. Available: https://arxiv.org/abs/1903.01611.
- J. Gou, B. Yu, S. J. Maybank, D. Tao, "Knowledge distillation: a survey," International Journal of Computer Vision, vol. 129, no. 6, pp. 1789–1819, 2021.
- Y. Zhang, Z. Lin, J. Jiang, Q. Zhang, Y. Wang, H. Xue, C. Zhang, Y. Yang, "Deeper insights into weight sharing in neural architecture search," 2020. Available: https://arxiv.org/abs/2001.01431.
- Y. Koren, R. M. Bell, C. Volinsky, "Matrix factorization techniques for recommender systems," Computer, vol. 42, no. 8, pp. 30–37, 2009.
- . [CrossRef]
- J. Gibson, H. Oh, "Mutual information loss in pyramidal image processing," Information, vol. 11, no. 6, p. 322, 2020.
- J. Mao, M. Niu, H. Bai, X. Liang, H. Xu, C. Xu, "Pyramid R-CNN: Towards Better Performance and Adaptability for 3D Object Detection," 2021. Available: https://arxiv.org/abs/2109.02499.
- A. Krizhevsky, V. Nair, G. Hinton, "CIFAR-10 (Canadian Institute for Advanced Research)," 2009. Available: https://www.cs.toronto.edu/~kriz/cifar.html.
- O. Ronneberger, P. Fischer, T. Brox, "U-Net: Convolutional Networks for Biomedical Image Segmentation," in Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2015.
- S. Ren, K. He, R. Girshick, J. Sun, "Faster R-CNN: Towards real-time object detection with region proposal networks," in Advances in Neural Information Processing Systems (NeurIPS), 2015.
- K. He, X. Zhang, S. Ren, J. Sun, "Deep Residual Learning for Image Recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 770–778.
- K. Simonyan, A. Zisserman, "Very Deep Convolutional Networks for Large-Scale Image Recognition," arXiv preprint arXiv:1409.1556, 2014.
- A. Geiger, P. Lenz, R. Urtasun, "The KITTI Vision Benchmark Suite," in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2012.
- L. Wang, J. Shi, G. Song, I.-F. Shen, "Object detection combining recognition and segmentation," in Proceedings of the 8th Asian Conference on Computer Vision - Volume Part I, Springer-Verlag, 2007, pp. 189–199.




| Layer | Type | Input Channels | Output Channels | Kernel Size | Stride | Padding |
|---|---|---|---|---|---|---|
| Conv1 | Conv2d | 3 | 16 | 3x3 | 1 | 1 |
| Pool1 | MaxPool2d | - | - | 2x2 | 2 | 0 |
| Conv2 | Conv2d | 16 | 32 | 3x3 | 1 | 1 |
| Pool2 | MaxPool2d | - | - | 2x2 | 2 | 0 |
| FC1 | Linear | - | 128 | - | - | - |
| FC2 | Linear | - | 10 | - | - | - |

| Layer | Type | Input Channels | Output Channels | Kernel Size | Stride | Padding |
|---|---|---|---|---|---|---|
| Conv3 | Conv2d | 16 | 32 | 3x3 | 1 | 1 |
| Pool1 | MaxPool2d | - | - | 2x2 | 2 | 0 |
| Conv4 | Conv2d | 32 | 128 | 3x3 | 1 | 1 |
| Pool2 | MaxPool2d | - | - | 2x2 | 2 | 0 |
| FC3 | Linear | - | 128 | - | - | - |
| FC4 | Linear | - | 10 | - | - | - |

| Layer | Type | Input Channels | Output Channels | Kernel Size | Stride | Padding |
|---|---|---|---|---|---|---|
| Conv1 | Conv2d | 3 | 16 | 3x3 | 1 | 1 |
| Pool1 | MaxPool2d | - | - | 2x2 | 2 | 0 |
| Conv2 | Conv2d | 16 | 32 | 3x3 | 1 | 1 |
| Pool2 | MaxPool2d | - | - | 2x2 | 2 | 0 |
| Conv3 | Conv2d | 16 | 32 | 3x3 | 1 | 1 |
| Pool1 | MaxPool2d | - | - | 2x2 | 2 | 0 |
| Conv4 | Conv2d | 32 | 128 | 3x3 | 1 | 1 |
| Pool2 | MaxPool2d | - | - | 2x2 | 2 | 0 |
| FC3 | Linear | - | 128 | - | - | - |
| FC4 | Linear | - | 10 | - | - | - |

| Network | Training Time (s) | Test Set Accuracy (%) | Size (MB) |
|---|---|---|---|
| BaseNetwork | 366.51 | 77.31 | 0.65 |
| ReducedNetwork | 613.78 | 74.77 | 0.17 |
| Quantization+BaseNetwork | 343.25 | 76.51 | 0.32 |
| Reduced+Quantization | 601.47 | 73.53 | 0.10 |

| Network | Train Loss | Validation Loss | Training Time (s) | Size (MB) |
|---|---|---|---|---|
| U-Net | 1.004 | 0.8484 | 1683 | 31 |
| U-Net-Reduced | 0.6522 | 0.8945 | 2562 | 7 |

| Network | Loss Box Reg | Loss Classifier | Loss RPN Box Reg | Total Loss | Training Time (s) | Memory Allocation (GB) |
|---|---|---|---|---|---|---|
| CustomBackbone a | 0.3894 | 0.4018 | 0.3894 | 1.14 | 460 | 15.841 |
| CustomBackbone-Reduced | 0.4402 | 0.4343 | 0.4402 | 1.223 | 823 | 10.312 |
| VGG16 b | 0.1043 | 0.0701 | 0.1043 | 0.216 | 653 | 16.04 |
| VGG16-Reduced | 0.1894 | 0.1394 | 0.1894 | 0.4377 | 1100 | 9.96 |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).