Submitted: 12 July 2024
Posted: 15 July 2024
Abstract
Keywords:
1. Introduction
- This study introduces the YOLO-ADual model. Combining DualConv with C3, we propose a more efficient C3Dual architecture that replaces the CBL module in the backbone of the YOLOv5 object detection model. In addition, ADown from YOLOv9 replaces the Conv module in both the head and the backbone of YOLOv5s. The resulting model is lightweight: inference is faster and hardware requirements are lower, which makes it better suited to deployment on mobile devices.
- Traffic sign detection (TSD) scenes contain many small objects, so the CBAM attention mechanism is adopted to improve small-object detection performance.
- Experiments on the TT100k dataset show that the proposed method roughly halves the parameter count of the original YOLOv5s model (from 7,365,671 to 3,817,609) while raising mAP@0.5 from 67.6 to 70.1. Compared with several contemporary lightweight object detection methods, the proposed model achieves higher accuracy while remaining lightweight.
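The headline figures in the last contribution can be checked against the YOLOv5s and YOLO-ADual rows of the comparison table reported later in the paper. The following is a minimal sketch of that arithmetic; the helper name `relative_reduction` and the dictionary layout are illustrative assumptions, not part of the authors' code.

```python
def relative_reduction(baseline: float, proposed: float) -> float:
    """Return the fractional reduction of `proposed` relative to `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - proposed) / baseline


# Values copied from the paper's comparison table (TT100k).
YOLOV5S = {"params": 7_365_671, "gflops": 17.3, "map50": 67.6}
YOLO_ADUAL = {"params": 3_817_609, "gflops": 11.2, "map50": 70.1}

param_cut = relative_reduction(YOLOV5S["params"], YOLO_ADUAL["params"])
gflops_cut = relative_reduction(YOLOV5S["gflops"], YOLO_ADUAL["gflops"])
map_gain = YOLO_ADUAL["map50"] - YOLOV5S["map50"]

print(f"parameter reduction: {param_cut:.1%}")
print(f"GFLOPs reduction:    {gflops_cut:.1%}")
print(f"mAP@0.5 gain:        {map_gain:.1f} points")
```

With the reported values, the parameter count drops by a little under half and the computational cost by roughly a third.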
2.1. Research on Two-Stage Approaches in Object Detection
2.2. Lightweight Traffic Sign Detection Network
2.3. YOLOv5s
3.1. Overview of YOLO-ADual
3.2. ADown
3.3. C3Dual
3.4. CBAM
3.4.1. Channel Attention Module
3.4.2. Spatial Attention Module
4. Experiment Analysis
4.1. Datasets
4.2. Experimental Environment
4.3. Experimental Details
| Parameter | Specific Information |
| --- | --- |
| Epochs | 300 |
| Image size | 640 |
| Batch size | 32 |
| Number of images | 9457 |
| Model parameters | 3,817,609 |
| Layers | 246 |
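The settings above fix the length of a training run. As a rough sketch, the number of optimizer steps can be derived from the batch size, the image count, and the epoch count. Two assumptions are made here that the table does not state: all 9457 images are used for training, and the final partial batch of each epoch is kept rather than dropped. The function name `training_schedule` is hypothetical.

```python
import math


def training_schedule(num_images: int, batch_size: int, epochs: int) -> dict:
    """Derive per-epoch and total step counts from basic training settings."""
    if min(num_images, batch_size, epochs) <= 0:
        raise ValueError("all arguments must be positive")
    batches_per_epoch = math.ceil(num_images / batch_size)
    # Size of the last (possibly partial) batch in each epoch.
    last_batch_size = num_images - (batches_per_epoch - 1) * batch_size
    return {
        "batches_per_epoch": batches_per_epoch,
        "last_batch_size": last_batch_size,
        "total_steps": batches_per_epoch * epochs,
    }


# Settings from the table; treating all 9457 images as training data is an assumption.
schedule = training_schedule(num_images=9457, batch_size=32, epochs=300)
print(schedule)
```

If the dataset is split into training and validation subsets, the same function applies with the training subset size in place of 9457.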

Algorithm 1: YOLO-ADual training algorithm with transfer learning

    Data: COCO pretrained weights; specialized dataset: TT100k
    Result: Fine-tuned YOLO-ADual model

    Initialize YOLO-ADual neural network with COCO pretrained weights;
    Initialize training parameters: batch size = 32, initial learning rate = 0.01, total epochs = 300;
    Initialize loss function: Loss;
    Initialize optimizer: Adam;
    Initialize evaluation metrics: Recall, Precision;
    for epoch ← 1 to total_epochs do
        for batch ← 1 to total_batches do
            // Load a batch of training data from TT100k
            image_batch, ground_truth_batch ← LoadBatchFromDatasets(TT100k, batch_size);
            // Forward pass through the network
            predicted_boxes ← YOLO-ADual(image_batch);
            // Calculate loss
            loss ← CalculateIoULoss(predicted_boxes, ground_truth_batch);
            // Backpropagation and weight update
            BackpropagateAndOptimize(loss);
        end
        // Adjust learning rate (e.g., learning rate decay)
        if epoch % learning_rate_decay_interval == 0 then
            AdjustLearningRate(optimizer, new_learning_rate);
        end
        // Evaluate the model on validation data
        recall, precision ← EvaluateModel(YOLO-ADual, ValidationData);
        if recall > threshold_recall and precision > threshold_precision then
            // Save the model if recall and precision meet the criteria
            SaveModel(YOLO-ADual, 'trained_model_epoch_' + epoch);
        end
    end
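The control flow of Algorithm 1 can be sketched in plain Python with the network, data loader, and evaluation routine passed in as callables. This is an illustrative skeleton under stated assumptions, not the authors' implementation. Algorithm 1 does not give values for the decay interval, the new learning rate, or the recall and precision thresholds, so the defaults below are hypothetical placeholders, and the multiplicative decay is one possible reading of "AdjustLearningRate".

```python
from typing import Callable, Iterable, List, Tuple

Batch = Tuple[object, object]  # (image_batch, ground_truth_batch)


def run_training(
    load_batches: Callable[[], Iterable[Batch]],
    train_step: Callable[[object, object, float], float],
    evaluate: Callable[[], Tuple[float, float]],
    save_model: Callable[[str], None],
    total_epochs: int = 300,
    initial_lr: float = 0.01,
    lr_decay_interval: int = 100,      # hypothetical: not given in Algorithm 1
    lr_decay_factor: float = 0.1,      # hypothetical: not given in Algorithm 1
    recall_threshold: float = 0.6,     # hypothetical: not given in Algorithm 1
    precision_threshold: float = 0.7,  # hypothetical: not given in Algorithm 1
) -> Tuple[float, List[str]]:
    """Mirror the control flow of Algorithm 1; return (final_lr, saved_names)."""
    lr = initial_lr
    saved: List[str] = []
    for epoch in range(1, total_epochs + 1):
        # Inner loop: forward pass, loss, backpropagation and update per batch.
        for image_batch, ground_truth_batch in load_batches():
            train_step(image_batch, ground_truth_batch, lr)
        # Step-wise learning rate decay at a fixed epoch interval.
        if epoch % lr_decay_interval == 0:
            lr *= lr_decay_factor
        # Validate, then checkpoint only when both metrics clear their thresholds.
        recall, precision = evaluate()
        if recall > recall_threshold and precision > precision_threshold:
            name = f"trained_model_epoch_{epoch}"
            save_model(name)
            saved.append(name)
    return lr, saved


# Toy stand-ins so the sketch runs without a dataset or a network.
step_calls: List[float] = []
checkpoints: List[str] = []
fake_metrics = iter([(0.5, 0.5), (0.7, 0.6), (0.7, 0.8), (0.9, 0.9)])

final_lr, saved_names = run_training(
    load_batches=lambda: [("images_0", "labels_0"), ("images_1", "labels_1")],
    train_step=lambda images, labels, lr: step_calls.append(lr) or 0.0,
    evaluate=lambda: next(fake_metrics),
    save_model=checkpoints.append,
    total_epochs=4,
    lr_decay_interval=2,
    lr_decay_factor=0.5,
)
print(saved_names)
```

In a real run, `train_step` would wrap the YOLO-ADual forward pass, the loss computation, and the Adam update, and `evaluate` would compute recall and precision on the validation split.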
4.4. Result Analysis
4.5. Ablation Study
4.6. Visualization Result
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
1. Geiger, A.; Lenz, P.; Urtasun, R. Are We Ready for Autonomous Driving? The KITTI Vision Benchmark Suite. In Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition; IEEE: Providence, RI, June 2012; pp. 3354–3361.
2. Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv preprint arXiv:1409.1556, 2014.
3. Lin, T.-Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal Loss for Dense Object Detection. In Proceedings of the IEEE International Conference on Computer Vision; 2017; pp. 2980–2988.
4. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.-Y.; Berg, A.C. SSD: Single Shot MultiBox Detector. In Computer Vision – ECCV 2016; Leibe, B., Matas, J., Sebe, N., Welling, M., Eds.; Lecture Notes in Computer Science; Springer International Publishing: Cham, 2016; Vol. 9905, pp. 21–37.
5. Mittal, U.; Chawla, P.; Tiwari, R. EnsembleNet: A Hybrid Approach for Vehicle Detection and Estimation of Traffic Density Based on Faster R-CNN and YOLO Models. Neural Comput. Appl. 2023, 35, 4755–4774.
6. Ghahremannezhad, H.; Shi, H.; Liu, C. Object Detection in Traffic Videos: A Survey. IEEE Trans. Intell. Transp. Syst. 2023, 24, 6780–6799.
7. Arora, N.; Kumar, Y.; Karkra, R.; Kumar, M. Automatic Vehicle Detection System in Different Environment Conditions Using Fast R-CNN. Multimed. Tools Appl. 2022, 81, 18715–18735.
8. Chollet, F. Xception: Deep Learning with Depthwise Separable Convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2017; pp. 1251–1258.
9. Iandola, F.N.; Han, S.; Moskewicz, M.W.; Ashraf, K.; Dally, W.J.; Keutzer, K. SqueezeNet: AlexNet-Level Accuracy with 50x Fewer Parameters and <0.5 MB Model Size. arXiv preprint arXiv:1602.07360, 2016.
10. Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arXiv preprint arXiv:1704.04861, 2017.
11. Zhang, X.; Zhou, X.; Lin, M.; Sun, J. ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018; pp. 6848–6856.
12. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2016; pp. 779–788.
13. Xia, X.; Xu, C.; Nan, B. Inception-v3 for Flower Classification. In Proceedings of the 2017 2nd International Conference on Image, Vision and Computing (ICIVC); IEEE, 2017; pp. 783–787.
14. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet Classification with Deep Convolutional Neural Networks. In Advances in Neural Information Processing Systems; Pereira, F., Burges, C.J., Bottou, L., Weinberger, K.Q., Eds.; Curran Associates, Inc., 2012; Vol. 25.
15. Koonce, B. MobileNetV3. In Convolutional Neural Networks with Swift for Tensorflow; Apress: Berkeley, CA, 2021; pp. 125–144.
16. Liu, Y.; Lu, B.; Peng, J.; Zhang, Z. Research on the Use of YOLOv5 Object Detection Algorithm in Mask Wearing Recognition. World Sci. Res. J. 2020, 6.
17. Purkait, P.; Zhao, C.; Zach, C. SPP-Net: Deep Absolute Pose Regression with Synthetic Views. arXiv preprint arXiv:1712.03452, 2017.
18. Dai, J.; Li, Y.; He, K.; Sun, J. R-FCN: Object Detection via Region-Based Fully Convolutional Networks. In Advances in Neural Information Processing Systems; Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R., Eds.; Curran Associates, Inc., 2016; Vol. 29.
19. Lin, T.-Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature Pyramid Networks for Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2017; pp. 2117–2125.
20. Zhang, S.; Che, S.; Liu, Z.; Zhang, X. A Real-Time and Lightweight Traffic Sign Detection Method Based on Ghost-YOLO. Multimed. Tools Appl. 2023, 82, 26063–26087.
21. Liu, P.; Xie, Z.; Li, T. UCN-YOLOv5: Traffic Sign Object Detection Algorithm Based on Deep Learning. IEEE Access 2023, 11, 110039–110050.
22. Li, Z.; Chen, H.; Biggio, B.; He, Y.; Cai, H.; Roli, F.; Xie, L. Toward Effective Traffic Sign Detection via Two-Stage Fusion Neural Networks. IEEE Trans. Intell. Transp. Syst. 2024, 1–12.
23. Redmon, J.; Farhadi, A. YOLOv3: An Incremental Improvement. arXiv preprint arXiv:1804.02767, 2018.
24. Bochkovskiy, A.; Wang, C.-Y.; Liao, H.-Y.M. YOLOv4: Optimal Speed and Accuracy of Object Detection. arXiv preprint arXiv:2004.10934, 2020.
25. Li, S.; Wang, S.; Wang, P. A Small Object Detection Algorithm for Traffic Signs Based on Improved YOLOv7. Sensors 2023, 23, 7145.
26. Selcuk, B.; Serif, T. A Comparison of YOLOv5 and YOLOv8 in the Context of Mobile UI Detection. In Mobile Web and Intelligent Information Systems; Younas, M., Awan, I., Grønli, T.-M., Eds.; Lecture Notes in Computer Science; Springer Nature Switzerland: Cham, 2023; Vol. 13977, pp. 161–174.
27. Bian, H.; Liu, Y.; Shi, L.; Lin, Z.; Huang, M.; Zhang, J.; Weng, G.; Zhang, C.; Gao, M. Detection Method of Helmet Wearing Based on UAV Images and YOLOv7. In Proceedings of the 2023 IEEE 6th Information Technology, Networking, Electronic and Automation Control Conference (ITNEC); IEEE: Chongqing, China, 24 February 2023; pp. 1633–1640.
28. Mohd Yusof, N. ‘Izzaty; Sophian, A.; Mohd Zaki, H.F.; Bawono, A.A.; Embong, A.H.; Ashraf, A. Assessing the Performance of YOLOv5, YOLOv6, and YOLOv7 in Road Defect Detection and Classification: A Comparative Study. Bull. Electr. Eng. Inform. 2024, 13, 350–360.
29. Wang, C.-Y.; Yeh, I.-H.; Liao, H.-Y.M. YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information. arXiv preprint arXiv:2402.13616, 2024.
30. Zheng, Y.; Cui, Y.; Gao, X. An Infrared Dim-Small Target Detection Method Based on Improved YOLOv7. In Proceedings of the 2023 Asia Conference on Computer Vision, Image Processing and Pattern Recognition; ACM: Phuket, Thailand, 28 April 2023; pp. 1–5.
31. Zhong, J.; Chen, J.; Mian, A. DualConv: Dual Convolutional Kernels for Lightweight Deep Neural Networks. IEEE Trans. Neural Netw. Learn. Syst. 2022.
32. Woo, S.; Park, J.; Lee, J.-Y.; Kweon, I.S. CBAM: Convolutional Block Attention Module. In Proceedings of the European Conference on Computer Vision (ECCV); 2018; pp. 3–19.
33. Li, Y.; Gong, Z.; Zhou, Y.; He, Y.; Huang, R. Production Evaluation of Citrus Fruits Based on the YOLOv5 Compressed by Knowledge Distillation. In Proceedings of the 2023 26th International Conference on Computer Supported Cooperative Work in Design (CSCWD); IEEE: Rio de Janeiro, Brazil, 24 May 2023; pp. 1938–1943.
34. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149.
35. Ge, Z.; Liu, S.; Wang, F.; Li, Z.; Sun, J. YOLOX: Exceeding YOLO Series in 2021. arXiv preprint arXiv:2107.08430, 2021.

| Configuration | Name | Specific Information |
| --- | --- | --- |
| Hardware Environment | CPU | Intel(R) Core(TM) i5-13490F |
| | GPU | NVIDIA GeForce RTX 4070 Ti |
| | VRAM | 12 GB |
| | Memory | 16 GB |
| Software Environment | Operating System | Windows 11 |
| | Python Version | 3.9.12 |
| | PyTorch Version | 2.0.0 |
| | CUDA Version | 11.8 |

| Method | Precision | Recall | mAP@0.5 | Params | GFLOPs |
| --- | --- | --- | --- | --- | --- |
| Faster R-CNN [34] | 47.91 | 53.79 | 53.1 | - | - |
| Zhang et al. [20] | 56.10 | 52.10 | 55.5 | 6,655,232 | 8.6 |
| SSD | 51.45 | 53.76 | 53.26 | 641,473 | 3.1 |
| EfficientDet | 68.80 | 46.40 | 51.30 | 5,524,683 | 9.4 |
| YOLOv7-tiny [25] | 61.70 | 57.20 | 59.9 | 6,125,934 | 13.5 |
| YOLOX [35] | 64.47 | 58.27 | 65.4 | 5,044,797 | 15.30 |
| YOLOv5n | 67.90 | 45.10 | 48.8 | 1,820,743 | 4.4 |
| YOLOv5s | 69.34 | 63.23 | 67.6 | 7,365,671 | 17.3 |
| YOLO-ADual | 71.80 | 63.50 | 70.1 | 3,817,609 | 11.2 |

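Precision and recall are reported separately in the comparison table. One common way to summarize the two in a single number is the F1 score, their harmonic mean. The paper does not report F1; the sketch below computes it from the tabulated precision and recall purely as an illustration, and the names `f1_score` and `REPORTED` are hypothetical.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both on the same scale)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) in percent, copied from the comparison table.
REPORTED = {
    "Faster R-CNN": (47.91, 53.79),
    "Zhang et al.": (56.10, 52.10),
    "SSD": (51.45, 53.76),
    "EfficientDet": (68.80, 46.40),
    "YOLOv7-tiny": (61.70, 57.20),
    "YOLOX": (64.47, 58.27),
    "YOLOv5n": (67.90, 45.10),
    "YOLOv5s": (69.34, 63.23),
    "YOLO-ADual": (71.80, 63.50),
}

f1 = {name: f1_score(p, r) for name, (p, r) in REPORTED.items()}
best = max(f1, key=f1.get)

# Print methods from highest to lowest derived F1.
for name, value in sorted(f1.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:<14} F1 = {value:.2f}")
```

Because F1 is derived here rather than measured, it should be read only as a convenience for comparing rows, not as a result of the paper.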
| Method | CBAM | ADown | C3Dual | mAP@0.5 | Params | GFLOPs |
| YOLOv5s | 67.6 | 7,365,671 | 17.3 | |||
| YOLOv5s | √ | 77.47 | 7,156,265 | 16.2 | ||
| YOLOv5s | √ | 72.2 | 5,468,903 | 12.9 | ||
| YOLOv5s | √ | 77.1 | 5,828,903 | 13.0 | ||
| YOLOv5s | √ | √ | 67.87 | 5,501,769 | 12.9 | |
| YOLOv5s | √ | √ | 69.6 | 4,163,719 | 9.5 | |
| YOLOv5s | √ | √ | √ | 70.1 | 3,817,609 | 11.2 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).