Submitted:
27 June 2024
Posted:
27 June 2024
Abstract
Keywords:
1. Introduction
- We design and implement the C2f-DM module as a replacement for the existing C2f module in YOLOv8. It integrates local and global information, improving the capture of small-object features and mitigating the loss of detection accuracy caused by overlapping objects.
- We propose BGFPN, an attention-based feature fusion technique. It uses an efficient feature aggregation network and re-parameterization to optimize the exchange of information between feature maps of different scales, and it applies the Bi-Level Routing Attention (BRA) mechanism to capture key features of small objects.
- We propose the SMPDIoU loss function. It accounts for the shape and dimensions of the detection boxes, strengthens the model's focus on box attributes, and provides a more accurate bounding box regression loss.
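
The third contribution builds on IoU-based regression losses. As a point of reference only, the sketch below computes plain IoU and the MPDIoU variant from the cited arXiv:2307.07662, which subtracts the squared distances between matching top-left and bottom-right corners, normalized by the squared image diagonal. It is not the authors' SMPDIoU, whose definition is not reproduced in this section; the function names, the `(x1, y1, x2, y2)` box format, and the image size arguments are assumptions made for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def mpdiou_loss(pred, target, img_w, img_h):
    """1 - MPDIoU, following the cited MPDIoU paper (NOT the SMPDIoU loss)."""
    norm = img_w ** 2 + img_h ** 2
    # Squared distance between the two top-left corners.
    d1 = (pred[0] - target[0]) ** 2 + (pred[1] - target[1]) ** 2
    # Squared distance between the two bottom-right corners.
    d2 = (pred[2] - target[2]) ** 2 + (pred[3] - target[3]) ** 2
    return 1.0 - (iou(pred, target) - d1 / norm - d2 / norm)
```

For identical boxes the loss is zero; for two boxes with the same IoU, the one whose corners lie farther from the target receives the larger loss, which is the property that corner-distance penalties add over plain IoU.
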
2. Related Work
2.1. Feature Extraction
2.2. Feature Fusion
2.3. Optimization of Bounding Box Regression Loss Function
3. Fundamentals of the YOLOv8 Model
4. Methodology
4.1. Framework Overview
4.2. C2f-DM Module
4.3. Bi-Level Routing Attention in Gated Feature Pyramid Network
4.3.1. Improved Feature Fusion Method
4.3.2. Bi-Level Routing Attention
4.4. Smooth Mean Perpendicular Distance Intersection over Union
5. Experiments
5.1. Experimental Setup
| Configuration Item | Name | Specification |
|---|---|---|
| Hardware environment | GPU | NVIDIA GeForce RTX 3080 |
| Hardware environment | CPU | Intel Core i7-11700K |
| Hardware environment | VRAM | 12 GB |
| Hardware environment | RAM | 64 GB |
| Software environment | Operating System | Ubuntu 18.04 |
| Software environment | Python | 3.8.12 |
| Software environment | PyTorch | 1.10.0 |
| Software environment | CUDA | 10.4 |
| Software environment | cuDNN | 7.6.5 |

| Hyperparameter | Setting |
|---|---|
| Epochs | 200 |
| Initial Learning Rate (lr0) | 0.01 |
| Learning Rate Float | 0.01 |
| Input Resolution | 640 × 640 × 3 |
| Weight Decay | 0.0005 |
| Momentum | 0.937 |
| Batch Size | 4 |
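
The hyperparameters in the table above can be collected into a small configuration object, as in the sketch below. The field names `lr0` and `lrf` mirror names commonly used in YOLOv8 training configurations and are assumptions here. Reading the table's "Learning Rate Float" as a final learning-rate factor, and the linear decay in `linear_lr`, are also assumptions rather than settings confirmed by the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    epochs: int = 200
    lr0: float = 0.01            # initial learning rate from the table
    lrf: float = 0.01            # ASSUMPTION: "Learning Rate Float" read as a final-LR factor
    imgsz: int = 640             # input resolution 640 x 640 (3 channels)
    weight_decay: float = 0.0005
    momentum: float = 0.937
    batch_size: int = 4


def linear_lr(cfg, epoch):
    """Learning rate at a given epoch under an ASSUMED linear decay.

    The rate falls from cfg.lr0 at epoch 0 to cfg.lr0 * cfg.lrf at the
    final epoch.
    """
    frac = max(1.0 - epoch / cfg.epochs, 0.0)
    return cfg.lr0 * (frac * (1.0 - cfg.lrf) + cfg.lrf)


cfg = TrainConfig()
for epoch in (0, 100, 200):
    print(epoch, linear_lr(cfg, epoch))
```

Under these assumptions the rate starts at 0.01 and ends at 0.0001 after 200 epochs.
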
5.2. Overall Performance of HP-YOLOv8
5.3. Ablation Experiment
5.4. Comparison with Other Models
5.5. Experimental Results Presentation
6. Conclusions
Author Contributions
Funding
Conflicts of Interest
References
1. Zhang, Z. Drone-YOLO: an efficient neural network method for target detection in drone images. Drones 2023, 7, 526.
2. Zhao, D.; Shao, F.; Liu, Q.; Yang, L.; Zhang, H.; Zhang, Z. A Small Object Detection Method for Drone-Captured Images Based on Improved YOLOv7. Remote Sensing 2024, 16, 1002.
3. Zhang, J.; Yang, X.; He, W.; Ren, J.; Zhang, Q.; Zhao, Y.; Bai, R.; He, X.; Liu, J. Scale Optimization Using Evolutionary Reinforcement Learning for Object Detection on Drone Imagery. In Proceedings of the AAAI Conference on Artificial Intelligence; 2024; Volume 38, pp. 410–418.
4. Rostami, M.; Farajollahi, A.; Parvin, H. Deep learning-based face detection and recognition on drones. Journal of Ambient Intelligence and Humanized Computing 2024, 15, 373–387.
5. Zeng, S.; Yang, W.; Jiao, Y.; Geng, L.; Chen, X. SCA-YOLO: A new small object detection model for UAV images. The Visual Computer 2024, 40, 1787–1803.
6. Lin, C.J.; Jhang, J.Y. Intelligent traffic-monitoring system based on YOLO and convolutional fuzzy neural networks. IEEE Access 2022, 10, 14120–14133.
7. Li, A.; Sun, S.; Zhang, Z.; Feng, M.; Wu, C.; Li, W. A multi-scale traffic object detection algorithm for road scenes based on improved YOLOv5. Electronics 2023, 12, 878.
8. Ghahremannezhad, H.; Shi, H.; Liu, C. Object detection in traffic videos: A survey. IEEE Transactions on Intelligent Transportation Systems 2023.
9. Lai, H.; Chen, L.; Liu, W.; Yan, Z.; Ye, S. STC-YOLO: Small object detection network for traffic signs in complex environments. Sensors 2023, 23, 5307.
10. Zhang, L.J.; Fang, J.J.; Liu, Y.X.; Le, H.F.; Rao, Z.Q.; Zhao, J.X. CR-YOLOv8: Multiscale object detection in traffic sign images. IEEE Access 2023, 12, 219–228.
11. Skripachev, V.; Guida, M.; Guida, N.; Zhukov, A. Investigation of convolutional neural networks for object detection in aerospace images. International Journal of Open Information Technologies 2022, 10, 54–64.
12. Shi, Q.; Li, L.; Feng, J.; Chen, W.; Yu, J. Automated Model Hardening with Reinforcement Learning for On-Orbit Object Detectors with Convolutional Neural Networks. Aerospace 2023, 10, 88.
13. Noroozi, M.; Shah, A. Towards optimal foreign object debris detection in an airport environment. Expert Systems with Applications 2023, 213, 118829.
14. Ma, Y.; Zhou, D.; He, Y.; Zhao, L.; Cheng, P.; Li, H.; Chen, K. Aircraft-LBDet: Multi-Task Aircraft Detection with Landmark and Bounding Box Detection. Remote Sensing 2023, 15, 2485.
15. Chen, H.B.; Jiang, S.; He, G.; Zhang, B.; Yu, H. TEANS: a target enhancement and attenuated nonmaximum suppression object detector for remote sensing images. IEEE Geoscience and Remote Sensing Letters 2020, 18, 632–636.
16. Hou, L.; Lu, K.; Xue, J.; Hao, L. Cascade detector with feature fusion for arbitrary-oriented objects in remote sensing images. In Proceedings of the 2020 IEEE International Conference on Multimedia and Expo (ICME); IEEE, 2020; pp. 1–6.
17. Lu, X.; Ji, J.; Xing, Z.; Miao, Q. Attention and feature fusion SSD for remote sensing object detection. IEEE Transactions on Instrumentation and Measurement 2021, 70, 1–9.
18. Li, Q.; Mou, L.; Liu, Q.; Wang, Y.; Zhu, X.X. HSF-Net: Multiscale deep feature embedding for ship detection in optical remote sensing imagery. IEEE Transactions on Geoscience and Remote Sensing 2018, 56, 7147–7161.
19. Dong, R.; Xu, D.; Zhao, J.; Jiao, L.; An, J. Sig-NMS-based faster R-CNN combining transfer learning for small target detection in VHR optical remote sensing imagery. IEEE Transactions on Geoscience and Remote Sensing 2019, 57, 8534–8545.
20. Zheng, Z.; Zhong, Y.; Ma, A.; Han, X.; Zhao, J.; Liu, Y.; Zhang, L. HyNet: Hyper-scale object detection network framework for multiple spatial resolution remote sensing imagery. ISPRS Journal of Photogrammetry and Remote Sensing 2020, 166, 1–14.
21. Xi, Y.; Jia, W.; Miao, Q.; Feng, J.; Liu, X.; Li, F. Coderainnet: Collaborative deraining network for drone-view object detection in rainy weather conditions. Remote Sensing 2023, 15, 1487.
22. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. Communications of the ACM 2017, 60, 84–90.
23. Shen, L.; Lang, B.; Song, Z. DS-YOLOv8-Based Object Detection Method for Remote Sensing Images. IEEE Access 2023, 11, 125122–125137.
24. Zhai, X.; Huang, Z.; Li, T.; Liu, H.; Wang, S. YOLO-Drone: An Optimized YOLOv8 Network for Tiny UAV Object Detection. Electronics 2023, 12, 3664.
25. Yi, H.; Liu, B.; Zhao, B.; Liu, E. Small Object Detection Algorithm Based on Improved YOLOv8 for Remote Sensing. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 2023.
26. Lin, B.; Wang, J.; Wang, H.; Zhong, L.; Yang, X.; Zhang, X. Small Space Target Detection Based on a Convolutional Neural Network and Guidance Information. Aerospace 2023, 10, 426.
27. Sun, Y.; Zhang, Y.; Wang, H.; Guo, J.; Zheng, J.; Ning, H. SES-YOLOv8n: automatic driving object detection algorithm based on improved YOLOv8. Signal, Image and Video Processing 2024, 18, 3983–3992.
28. Yang, G.; Wang, J.; Nie, Z.; Yang, H.; Yu, S. A lightweight YOLOv8 tomato detection algorithm combining feature enhancement and attention. Agronomy 2023, 13, 1824.
29. Wang, X.; Gao, H.; Jia, Z.; Li, Z. BL-YOLOv8: An improved road defect detection model based on YOLOv8. Sensors 2023, 23, 8361.
30. Li, Y.; Fan, Q.; Huang, H.; Han, Z.; Gu, Q. A modified YOLOv8 detection network for UAV aerial image recognition. Drones 2023, 7, 304.
31. Cai, Z.; Vasconcelos, N. Cascade R-CNN: Delving into high quality object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018; pp. 6154–6162.
32. Law, H.; Deng, J. CornerNet: Detecting objects as paired keypoints. In Proceedings of the European Conference on Computer Vision (ECCV); 2018; pp. 734–750.
33. Safaldin, M.; Zaghden, N.; Mejdoub, M. An Improved YOLOv8 to Detect Moving Objects. IEEE Access 2024.
34. Wu, T.; Dong, Y. YOLO-SE: Improved YOLOv8 for remote sensing object detection and recognition. Applied Sciences 2023, 13, 12977.
35. Redmon, J.; Farhadi, A. YOLO9000: better, faster, stronger. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2017; pp. 7263–7271.
36. Redmon, J.; Farhadi, A. YOLOv3: An incremental improvement. arXiv 2018, arXiv:1804.02767.
37. Li, Y.; Li, X.; Dai, Y.; Hou, Q.; Liu, L.; Liu, Y.; Cheng, M.M.; Yang, J. LSKNet: A Foundation Lightweight Backbone for Remote Sensing. arXiv 2024, arXiv:2403.11735.
38. Zhu, X.; Lyu, S.; Wang, X.; Zhao, Q. TPH-YOLOv5: Improved YOLOv5 based on transformer prediction head for object detection on drone-captured scenarios. In Proceedings of the IEEE/CVF International Conference on Computer Vision; 2021; pp. 2778–2788.
39. Zhang, J.; Lei, J.; Xie, W.; Fang, Z.; Li, Y.; Du, Q. SuperYOLO: Super resolution assisted object detection in multimodal remote sensing imagery. IEEE Transactions on Geoscience and Remote Sensing 2023, 61, 1–15.
40. Rahman, M.A.; Wang, Y. Optimizing intersection-over-union in deep neural networks for image segmentation. In International Symposium on Visual Computing; Springer, 2016; pp. 234–244.
41. Zheng, Z.; Wang, P.; Liu, W.; Li, J.; Ye, R.; Ren, D. Distance-IoU loss: Faster and better learning for bounding box regression. In Proceedings of the AAAI Conference on Artificial Intelligence; 2020; Volume 34, pp. 12993–13000.
42. Rezatofighi, H.; Tsoi, N.; Gwak, J.; Sadeghian, A.; Reid, I.; Savarese, S. Generalized intersection over union: A metric and a loss for bounding box regression. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; 2019; pp. 658–666.
43. Zhang, H.; Zhang, S. Shape-IoU: More Accurate Metric considering Bounding Box Shape and Scale. arXiv 2023, arXiv:2312.17663.
44. Siliang, M.; Yong, X. MPDIoU: a loss for efficient and accurate bounding box regression. arXiv 2023, arXiv:2307.07662.
45. Lou, M.; Zhou, H.Y.; Yang, S.; Yu, Y. TransXNet: Learning both global and local dynamics with a dual dynamic token mixer for visual recognition. arXiv 2023, arXiv:2310.19380.
46. Zhu, L.; Wang, X.; Ke, Z.; Zhang, W.; Lau, R.W. BiFormer: Vision transformer with bi-level routing attention. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; 2023; pp. 10323–10333.
47. Xu, X.; Jiang, Y.; Chen, W.; Huang, Y.; Zhang, Y.; Sun, X. DAMO-YOLO: A report on real-time object detection design. arXiv 2022, arXiv:2211.15444.
48. Soudy, M.; Afify, Y.; Badr, N. RepConv: A novel architecture for image scene classification on Intel scenes dataset. International Journal of Intelligent Computing and Information Sciences 2022, 22, 63–73.
49. Yu, F.; Koltun, V. Multi-scale context aggregation by dilated convolutions. arXiv 2015, arXiv:1511.07122.
50. Long, Y.; Gong, Y.; Xiao, Z.; Liu, Q. Accurate object localization in remote sensing images based on convolutional neural networks. IEEE Transactions on Geoscience and Remote Sensing 2017, 55, 2486–2498.
51. Cheng, G.; Zhou, P.; Han, J. Learning rotation-invariant convolutional neural networks for object detection in VHR optical remote sensing images. IEEE Transactions on Geoscience and Remote Sensing 2016, 54, 7405–7415.
52. Zhu, P.; Wen, L.; Bian, X.; Ling, H.; Hu, Q. Vision meets drones: A challenge. arXiv 2018, arXiv:1804.0743.
53. Zhang, H.; Chang, H.; Ma, B.; Wang, N.; Chen, X. Dynamic R-CNN: Towards high quality object detection via dynamic training. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; Proceedings, Part XV 16. Springer, 2020; pp. 260–275.
54. Girshick, R. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision; 2015; pp. 1440–1448.

| Model | Metric | Aircraft | Oiltank | Overpass | Playground |
|---|---|---|---|---|---|
| YOLOv8 | P | 96.52 | 97.83 | 71.92 | 95.31 |
| YOLOv8 | R | 91.62 | 94.34 | 70.21 | 96.82 |
| YOLOv8 | AP | 95.34 | 97.05 | 68.87 | 98.02 |
| HP-YOLOv8 | P | 97.23 | 96.85 | 87.42 | 96.65 |
| HP-YOLOv8 | R | 90.76 | 92.23 | 81.94 | 97.23 |
| HP-YOLOv8 | AP | 95.82 | 98.25 | 87.46 | 98.02 |

| Model | Metric | Bridge | Ground Track Field | Ship | Baseball Diamond | Airplane | Basketball Court | Vehicle | Tennis Court | Harbor | Storage Tank |
|---|---|---|---|---|---|---|---|---|---|---|---|
| YOLOv8 | P | 95.95 | 76.84 | 98.65 | 93.89 | 94.56 | 89.94 | 90.12 | 93.21 | 98.45 | 92.82 |
| YOLOv8 | R | 80.23 | 54.76 | 94.78 | 92.56 | 85.80 | 70.64 | 64.87 | 85.46 | 99.25 | 82.98 |
| YOLOv8 | AP | 90.73 | 64.73 | 99.01 | 95.30 | 92.54 | 85.28 | 67.99 | 91.84 | 96.17 | 86.56 |
| HP-YOLOv8 | P | 96.87 | 97.56 | 98.4 | 92.34 | 96.45 | 94.87 | 91.96 | 95.45 | 98.78 | 93.71 |
| HP-YOLOv8 | R | 86.65 | 97.50 | 93.97 | 93.48 | 97.89 | 87.62 | 73.45 | 78.21 | 83.45 | 80.67 |
| HP-YOLOv8 | AP | 91.15 | 95.45 | 98.32 | 96.66 | 99.33 | 91.84 | 88.63 | 87.84 | 92.13 | 89.20 |

| Model | Metric | Van | Pedestrian | Car | Bicycle | Person | Motor | Bus | Tricycle | Truck | Awning-Tricycle |
|---|---|---|---|---|---|---|---|---|---|---|---|
| YOLOv8 | P | 48.47 | 46.87 | 84.98 | 13.78 | 38.07 | 50.26 | 61.72 | 31.88 | 42.87 | 17.87 |
| YOLOv8 | R | 38.74 | 35.89 | 71.28 | 8.32 | 26.81 | 41.63 | 52.42 | 23.69 | 30.76 | 10.43 |
| YOLOv8 | AP | 42.75 | 41.37 | 76.89 | 11.35 | 29.78 | 44.82 | 56.32 | 26.93 | 35.49 | 14.10 |
| HP-YOLOv8 | P | 62.86 | 63.56 | 92.43 | 42.65 | 53.78 | 63.41 | 73.90 | 44.98 | 47.88 | 37.65 |
| HP-YOLOv8 | R | 52.56 | 58.72 | 90.02 | 35.62 | 44.69 | 58.32 | 67.54 | 34.12 | 41.42 | 28.64 |
| HP-YOLOv8 | AP | 57.45 | 60.30 | 90.05 | 37.55 | 48.22 | 60.41 | 69.77 | 37.62 | 43.33 | 30.27 |

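
Mean average precision (mAP) is the unweighted mean of the per-class AP values. The sketch below applies that definition to the AP rows of the first per-class table above (Aircraft, Oiltank, Overpass, Playground). It assumes those AP values are measured at an IoU threshold of 0.5 and come from the same dataset as the ablation results; neither point is stated in this section, and the variable names are illustrative.

```python
def mean_ap(per_class_ap):
    """Mean average precision: unweighted mean of per-class AP values."""
    return sum(per_class_ap.values()) / len(per_class_ap)


# AP rows copied from the first per-class table.
yolov8_ap = {"Aircraft": 95.34, "Oiltank": 97.05, "Overpass": 68.87, "Playground": 98.02}
hp_yolov8_ap = {"Aircraft": 95.82, "Oiltank": 98.25, "Overpass": 87.46, "Playground": 98.02}

print(f"YOLOv8    mAP: {mean_ap(yolov8_ap):.2f}")
print(f"HP-YOLOv8 mAP: {mean_ap(hp_yolov8_ap):.2f}")
```

The baseline average comes to 89.82, which agrees with the YOLOv8 mAP@0.5 in the ablation table. The HP-YOLOv8 average comes to about 94.89, whereas the ablation table lists 95.11; the source does not explain the difference.
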

| YOLOv8 | C2f-DM | BGFPN | SMPDIoU | Params | FPS | P | R | mAP@0.5 | mAP@0.5:0.95 |
|---|---|---|---|---|---|---|---|---|---|
| ✔ | | | | 43.41M | 75.78 | 89.18 | 89.27 | 89.82 | 57.01 |
| ✔ | ✔ | | | 44.14M | 63.35 | 89.86 | 91.36 | 91.52 | 64.23 |
| ✔ | | ✔ | | 24.61M | 60.49 | 91.78 | 92.41 | 92.56 | 67.78 |
| ✔ | | | ✔ | 43.61M | 75.78 | 90.05 | 91.54 | 91.45 | 64.12 |
| ✔ | ✔ | ✔ | | 28.52M | 55.46 | 91.89 | 93.78 | 93.98 | 69.78 |
| ✔ | ✔ | ✔ | ✔ | 28.52M | 55.46 | 92.21 | 94.22 | 95.11 | 72.03 |

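
The net effect of the three modules can be read off by comparing the first row of the ablation table (YOLOv8 alone) with the last row (all modules enabled). The sketch below does that arithmetic; the dictionary keys and helper names are illustrative, and the numbers are copied from those two rows.

```python
# Values from the first (baseline) and last (full model) rows of the ablation table.
baseline = {"params_m": 43.41, "fps": 75.78, "map50": 89.82, "map50_95": 57.01}
hp_yolov8 = {"params_m": 28.52, "fps": 55.46, "map50": 95.11, "map50_95": 72.03}


def absolute_gain(before, after):
    """Difference in percentage points (after - before)."""
    return after - before


def relative_change(before, after):
    """Percentage change relative to the starting value."""
    return (after - before) / before * 100.0


for key in ("map50", "map50_95"):
    print(f"{key}: {absolute_gain(baseline[key], hp_yolov8[key]):+.2f} points")
for key in ("params_m", "fps"):
    print(f"{key}: {relative_change(baseline[key], hp_yolov8[key]):+.1f} %")
```

On these figures, mAP@0.5 rises by 5.29 points and mAP@0.5:0.95 by 15.02 points, while the parameter count falls by about 34.3% and throughput by about 26.8%.
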
| Model | mAP@0.5 | mAP@0.5:0.95 | Params | FPS |
|---|---|---|---|---|
| Faster R-CNN [54] | 85.46 | 54.45 | 42.47M | 31.73 |
| Cascade R-CNN [31] | 86.21 | 55.31 | 70.62M | 26.48 |
| CenterNet [32] | 87.79 | 56.14 | 33.34M | 34.37 |
| Dynamic-RCNN [53] | 85.30 | 55.86 | 42.78M | 31.35 |
| LSKNet [37] | 87.74 | 56.35 | 29.88M | 48.75 |
| TPH-YOLO [38] | 90.46 | 57.32 | 53.59M | 56.26 |
| SuperYOLO [39] | 90.78 | 59.30 | 54.66M | 32.21 |
| LAR-YOLOv8 [25] | 92.92 | 61.55 | 28.56M | 54.89 |
| YOLOv8 | 89.82 | 57.76 | 44.60M | 75.78 |
| HP-YOLOv8 (Ours) | 95.11 | 72.03 | 27.52M | 55.46 |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).