Submitted: 24 October 2024
Posted: 24 October 2024
Abstract
Keywords:
1. Introduction
2. Materials and Methods
2.1. Backbone Network Module
2.2. Decoupled Head Module
2.3. SIoU_LOSS
2.4. Model Evaluation Index
3. Results
3.1. Experimental Process and Environment Configuration
3.2. Experimental Data
3.3. Ablation Experiment
3.4. Contrast Experiment
3.5. Experimental Effect
3.6. Experimental Analysis
3.7. Visual Window
4. Discussion
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Deshpande, N.M., Gite, S., & Aluvalu, R. (2021). A review of microscopic analysis of blood cells for disease detection with AI perspective. PeerJ Computer Science, 7, e460. [CrossRef]
- Tiwari, P., Qian, J., Li, Q., Wang, B., Gupta, D., Khanna, A., Rodrigues, J., & Albuquerque, V.H. (2018). Detection of subtype blood cells using deep learning. Cognitive Systems Research, 52, 1036-1044. [CrossRef]
- Liang, S., & Gu, Y. (2021). A deep convolutional neural network to simultaneously localize and recognize waste types in images. Waste Management, 126, 247-257. [CrossRef]
- Viraktamath, D.S., Yavagal, M., & Byahatti, R. (2021). Object Detection and Classification using YOLOv3.
- Mahto, P., Garg, P., Seth, P., & Panda, J. Refining YOLOv4 for vehicle detection. Social Science Electronic Publishing.
- Wang, P., Fu, S., & Cao, X.R. (2022). Improved Lightweight Target Detection Algorithm for Complex Roads with YOLOv5. 2022 International Conference on Machine Learning and Intelligent Systems Engineering (MLISE), 275-283.
- Chen, J., Zhu, J., Li, Z., & Yang, X. (2023). YOLOv7-WFD: A Novel Convolutional Neural Network Model for Helmet Detection in High-Risk Workplaces. IEEE Access, 11, 113580-113592. [CrossRef]
- Han, S., Jiang, X., & Wu, Z. (2023). An Improved YOLOv5 Algorithm for Wood Defect Detection Based on Attention. IEEE Access, 11, 71800-71810. [CrossRef]
- Li, L., Zhang, R., Xie, T., He, Y., Zhou, H., & Zhang, Y. (2024). Experimental Design of Steel Surface Defect Detection Based on MSFE-YOLO—An Improved YOLOV5 Algorithm with Multi-Scale Feature Extraction. Electronics, 13, 3783. [CrossRef]
- Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S.E., Fu, C., & Berg, A.C. (2015). SSD: Single Shot MultiBox Detector. European Conference on Computer Vision.
- Redmon, J., & Angelova, A. (2014). Real-time grasp detection using convolutional neural networks. 2015 IEEE International Conference on Robotics and Automation (ICRA), 1316-1322.
- Redmon, J., & Farhadi, A. (2016). YOLO9000: Better, Faster, Stronger. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 6517-6525.
- Redmon, J., & Farhadi, A. (2018). YOLOv3: An Incremental Improvement. ArXiv, abs/1804.02767.
- Jocher, G.R., Stoken, A., Borovec, J., NanoCode, Chaurasia, A., TaoXie, Liu, C., Abhiram, Laughing, tkianai, yxNONG, Hogan, A., lorenzomammana, AlexWang, Hájek, J., Diaconu, L., Marc, Kwon, Y., Oleg, wanghaoyang, Defretin, Y., Lohia, A., ah, M., Milanko, B., Fineran, B., Khromov, D.P., Yiwei, D., Doug, Durgesh, & Ingham, F. (2021). ultralytics/yolov5: v5.0—YOLOv5-P6 1280 models, AWS, Supervise.ly and YouTube integrations.
- Girshick, R.B., Donahue, J., Darrell, T., & Malik, J. (2013). Rich feature hierarchies for accurate object detection and semantic segmentation. Tech report.
- He, K., Zhang, X., Ren, S., & Sun, J. (2014). Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37, 1904-1916. [CrossRef]
- Ren, S., He, K., Girshick, R.B., & Sun, J. (2015). Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39, 1137-1149. [CrossRef]
- Xue, B., Sun, C., Chu, H., Meng, Q., & Jiao, S. (2020). Method of Electronic Component Location, Grasping and Inserting Based on Machine Vision. 2020 International Wireless Communications and Mobile Computing (IWCMC), 1968-1971.
- Wang, C., Bochkovskiy, A., & Liao, H.M. (2022). YOLOv7: Trainable Bag-of-Freebies Sets New State-of-the-Art for Real-Time Object Detectors. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 7464-7475.
- Li, Y., Wang, Y., Lu, L., & An, Q. (2024). YOD-SLAM: An Indoor Dynamic VSLAM Algorithm Based on the YOLOv8 Model and Depth Information. Electronics, 13, 3633. [CrossRef]
- Ge, Z., Liu, S., Wang, F., Li, Z., & Sun, J. (2021). YOLOX: Exceeding YOLO Series in 2021. ArXiv, abs/2107.08430.
- Redmon, J., Divvala, S.K., Girshick, R.B., & Farhadi, A. (2015). You Only Look Once: Unified, Real-Time Object Detection. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 779-788.
- Zhu, X., Lyu, S., Wang, X., & Zhao, Q. (2021). TPH-YOLOv5: Improved YOLOv5 Based on Transformer Prediction Head for Object Detection on Drone-captured Scenarios. 2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), 2778-2788. [CrossRef]
- Wang, C., Liao, H.M., Yeh, I., Wu, Y., Chen, P., & Hsieh, J. (2019). CSPNet: A New Backbone that can Enhance Learning Capability of CNN. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 1571-1580.
- Srinivas, A., Lin, T., Parmar, N., Shlens, J., Abbeel, P., & Vaswani, A. (2021). Bottleneck Transformers for Visual Recognition. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 16514-16524.
- Cai, L., Janowicz, K., Mai, G., Yan, B., & Zhu, R. (2020). Traffic transformer: Capturing the continuity and periodicity of time series for traffic forecasting. Transactions in GIS, 24, 736–755. [CrossRef]
- Huang, Z., & Wang, J. (2019). DC-SPP-YOLO: Dense Connection and Spatial Pyramid Pooling Based YOLO for Object Detection. Information Sciences, 522, 241-258. [CrossRef]
- Zhang, Y., Li, H., Wang, R., Zhang, M., & Hu, X. (2022). Constrained-SIoU: A Metric for Horizontal Candidates in Multi-Oriented Object Detection. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 15, 956-967.
- Zheng, Z., Wang, P., Ren, D., Liu, W., Ye, R., Hu, Q., & Zuo, W. (2020). Enhancing Geometric Factors in Model Learning and Inference for Object Detection and Instance Segmentation. IEEE Transactions on Cybernetics, 52, 8574-8586. [CrossRef]









| Method | Group 1 | Group 2 | Group 3 | Group 4 |
|---|---|---|---|---|
| YOLOv5s | √ | √ | √ | √ |
| BotNet | √ | √ | √ | |
| Decoupled Head | √ | √ | | |
| SIoU_Loss | √ | | | |
| mAP (%) | 83.2 | 83.3 | 83.2 | 83.8 |

| Model | mAP (%) | Recall (%) |
|---|---|---|
| YOLOv8s | 79.9 | 96 |
| YOLOv5-BS | 83.8 | 99 |
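
The gap between the two models in the comparison table can be expressed in percentage points. The following is a minimal sketch that only reuses the four numbers reported in the table; the dictionary layout and the helper name `point_gain` are illustrative assumptions, not part of the authors' code.

```python
# Values copied from the comparison table: (mAP %, recall %).
results = {
    "YOLOv8s": (79.9, 96.0),
    "YOLOv5-BS": (83.8, 99.0),
}


def point_gain(baseline: str, candidate: str) -> tuple[float, float]:
    """Return (mAP gain, recall gain) of candidate over baseline, in percentage points.

    Hypothetical helper for illustration; rounds to one decimal to
    avoid floating-point noise in the subtraction.
    """
    base_map, base_recall = results[baseline]
    cand_map, cand_recall = results[candidate]
    return round(cand_map - base_map, 1), round(cand_recall - base_recall, 1)


map_gain, recall_gain = point_gain("YOLOv8s", "YOLOv5-BS")
print(f"mAP gain: {map_gain} points, recall gain: {recall_gain} points")
```

On the reported figures this gives a 3.9-point mAP gain and a 3-point recall gain for YOLOv5-BS over YOLOv8s.
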
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).