Submitted:
09 June 2025
Posted:
11 June 2025
Abstract
Keywords:
1. Introduction
2. Related Work
2.1. Feature Extraction
- Low-resolution small targets are vulnerable to noise interference.
- Densely arranged targets suffer from feature adhesion.
- High-angle shooting induces significant geometric distortions.
2.2. Feature Fusion
2.3. Loss Functions
3. Proposed Model
3.1. Overview of YOLOv10
3.2. Proposed Method
3.2.1. Overall Network Architecture
3.2.2. Dynamic Dilated Snake Convolution (DDSConv)
3.2.3. Multi-Scale Feature Aggregation Module (MFAM)
3.2.4. MFAM-Neck
3.2.5. Loss Functions
4. Experimental Results and Analysis
4.1. Datasets
4.2. Experimental Environment and Parameters
4.3. Evaluation Metrics
4.4. Ablation Study Analysis
4.4.1. Effectiveness Analysis of Single and Multi-Module Improvements
4.4.2. Effectiveness of the MFAM-Neck Module
4.5. Experimental Results and Analysis on VisDrone Test Set
- Dynamic Dilated Snake Convolution kernels are introduced during feature extraction, substantially enhancing the network’s capability to capture critical features.
- A novel multi-scale feature fusion strategy is implemented by integrating high-resolution details from large-scale feature maps with deep semantic information from small-scale feature maps. This dual optimization simultaneously improves sensitivity to small targets and robustness for detecting background-similar objects.
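
The fusion idea in the second bullet, which combines high-resolution detail from a large feature map with deeper semantics from a small one, can be illustrated with a generic top-down fusion step. The sketch below is not the MFAM module itself, whose internals are not given here. It only assumes nearest-neighbour 2× upsampling followed by channel concatenation, and the function names and toy inputs are hypothetical.

```python
# Generic top-down fusion sketch (NOT the paper's MFAM): nearest-neighbour
# upsampling of a coarse map followed by channel-wise concatenation.
# Feature maps are plain nested lists shaped [channels][height][width].

def upsample_nearest_2x(fmap):
    """Upsample a [C][H][W] feature map by a factor of 2 in height and width."""
    out = []
    for channel in fmap:
        rows = []
        for row in channel:
            wide = [v for v in row for _ in range(2)]  # repeat each column twice
            rows.append(wide)
            rows.append(list(wide))                    # repeat each row twice
        out.append(rows)
    return out


def fuse(high_res, low_res):
    """Concatenate a fine map with the upsampled coarse map along channels."""
    up = upsample_nearest_2x(low_res)
    if len(up[0]) != len(high_res[0]) or len(up[0][0]) != len(high_res[0][0]):
        raise ValueError("spatial sizes must match after upsampling")
    return high_res + up  # list concatenation acts as channel concatenation


# Hypothetical toy inputs: one fine channel (4x4) and two coarse channels (2x2).
fine = [[[float(r * 4 + c) for c in range(4)] for r in range(4)]]
coarse = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]]

fused = fuse(fine, coarse)
print(len(fused), len(fused[0]), len(fused[0][0]))  # channels, height, width
```

In a real detector the concatenated map would normally pass through further convolutions; that part is omitted because the source does not describe it.
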
4.6. Experimental Results and Analysis of the HIT-UAV Test Set
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest

| Name | Value | Name | Value |
|---|---|---|---|
| Optimizer | SGD | Training Epochs | 150 |
| Image Size | 640×640; 512×512 | Workers | 16 |
| Initial Learning Rate | 0.01 | Learning Rate Decay | 0.0001 |
| Weight Decay | 0.0005 | Batch Size | 8 |
| Momentum Factor | 0.937 | Warmup Epochs | 3 |

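
For readers who want to reproduce a comparable setup, the settings in the table above can be collected into a small configuration. This is a minimal sketch: the dictionary keys are hypothetical names chosen for readability, not the authors' configuration file. The linear warmup rule in `lr_at` is also an assumption, since the table gives only the number of warmup epochs.

```python
# Hyperparameters transcribed from the table above. Key names are
# illustrative assumptions, not the authors' actual config keys.
train_cfg = {
    "optimizer": "SGD",
    "epochs": 150,
    "workers": 16,
    "initial_lr": 0.01,
    "lr_decay": 0.0001,      # the table's "Learning Rate Decay", kept verbatim
    "weight_decay": 0.0005,
    "batch_size": 8,
    "momentum": 0.937,
    "warmup_epochs": 3,
}

# The table lists two input resolutions without saying which dataset uses which.
image_sizes = [(640, 640), (512, 512)]


def lr_at(epoch, cfg=train_cfg):
    """Assumed linear warmup: ramp to initial_lr over the warmup epochs.

    Decay after warmup is intentionally not modelled here because the
    source does not state the schedule.
    """
    warmup = cfg["warmup_epochs"]
    if epoch < warmup:
        return cfg["initial_lr"] * (epoch + 1) / warmup
    return cfg["initial_lr"]


for epoch in range(4):
    print(epoch, lr_at(epoch))
```
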
| DDSConv | MFAM | EW-BBRLF | P(%) | R(%) | mAP50(%) | mAP50:95(%) | Par(M) |
|---|---|---|---|---|---|---|---|
| - | - | - | 49.3 | 38.6 | 39.5 | 22.8 | 13.3 |
|  |  |  | 52.3 | 39.4 | 42.0 | 23.1 | 15.1 |
|  |  |  | 54.0 | 42.2 | 44.3 | 25.4 | 18.6 |
|  |  |  | 49.8 | 39.7 | 40.6 | 23.4 | 11.3 |

| DDSConv | MFAM | EW-BBRLF | P(%) | R(%) | mAP50(%) | mAP50:95(%) | Par(M) |
|---|---|---|---|---|---|---|---|
| - | - | - | 49.3 | 38.6 | 39.5 | 22.8 | 13.3 |
|  |  |  | 52.3 | 39.4 | 42.0 | 23.1 | 15.1 |
|  |  |  | 58.2 | 46.1 | 48.3 | 30.4 | 21.6 |
|  |  |  | 54.6 | 43.2 | 45.8 | 27.4 | 15.1 |
|  |  |  | 57.9 | 44.2 | 46.5 | 28.3 | 18.6 |
|  |  |  | 60.4 | 48.6 | 51.9 | 31.7 | 21.6 |

| Method | mAP50(%) | mAP50:95(%) | Par(M) |
|---|---|---|---|
| YOLOv10s | 39.5 | 22.8 | 11.3 |
| YOLOv10l | 44.5 | 25.7 | 30.5 |
| MFAM-Neck-A | 42.3 | 24.5 | 15.8 |
| MFAM-Neck-B | 42.9 | 24.9 | 15.6 |
| MFAM-Neck-C | 44.3 | 25.4 | 18.6 |

| Method | Ped | Peo | Bic | Car | Van | Tru | Tri | Awn | Bus | Mot | m1(%) | P(%) | R(%) | m2(%) | Par(M) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| RetinaNet | 13.0 | 7.9 | 1.4 | 45.5 | 19.9 | 11.5 | 6.3 | 4.2 | 17.8 | 11.8 | 13.9 | 37.5 | 28.4 | 12.0 | 15.8 |
| CenterNet | 22.6 | 20.6 | 14.6 | 59.7 | 24.0 | 21.3 | 20.1 | 17.4 | 37.9 | 23.7 | 26.2 | 39.8 | 30.3 | 14.3 | 19.1 |
| QueryDet | 56.8 | 37.4 | 17.6 | 80.3 | 41.9 | 41.8 | 24.2 | 10.1 | 62.1 | 44.8 | 41.7 | 48.3 | 37.1 | 21.2 | 35.6 |
| YOLOv5s | 37.9 | 31.5 | 12.8 | 70.4 | 34.2 | 31.7 | 18.7 | 12.6 | 41.1 | 37.2 | 32.8 | 42.4 | 33.6 | 17.6 | 7.2 |
| MCA-YOLOv5 | 42.0 | 27.8 | 18.1 | 81.6 | 47.1 | 52.0 | 26.8 | 25.3 | 63.2 | 43.1 | 42.7 | - | - | 28.3 | 31.8 |
| YOLOv8s | 42.7 | 32.6 | 14.2 | 80.6 | 44.4 | 46.8 | 28.2 | 23.1 | 55.7 | 44.5 | 41.2 | 49.6 | 41.7 | 23.1 | 18.6 |
| YOLOv10s | 43.2 | 32.6 | 12.9 | 79.5 | 46.1 | 36.3 | 28.6 | 15.6 | 54.8 | 45.4 | 39.5 | 49.3 | 38.6 | 22.8 | 13.3 |
| DAMO-YOLOv10 | 50.8 | 43.8 | 26.0 | 81.1 | 53.7 | 49.9 | 41.8 | 27.9 | 67.1 | 53.3 | 47.5 | 55.9 | 43.2 | 25.4 | 16.4 |
| CA-YOLO | 55.5 | 45.8 | 23.5 | 85.5 | 52.7 | 42.1 | 38.2 | 22.3 | 64.6 | 57.5 | 48.8 | 62.1 | 45.1 | 27.6 | 31.1 |
| Ours | 56.6 | 47.4 | 24.6 | 87.5 | 53.1 | 41.7 | 39.8 | 25.4 | 65.9 | 59.7 | 50.1 | 60.4 | 48.6 | 29.7 | 17.6 |

Note: The columns Ped through Mot give the per-class AP(%); m1 is the mAP50 indicator over all classes and m2 is the mAP50:95 indicator.

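
The two indicators named in the note differ only in the IoU threshold used to decide whether a prediction matches a ground-truth box. Under the common COCO-style convention, mAP50 uses a single threshold of 0.5, while mAP50:95 averages over thresholds from 0.5 to 0.95 in steps of 0.05. The sketch below illustrates that matching step only; it is not a full mAP evaluator, and the boxes and function names are hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Threshold sets assumed from the COCO-style convention.
MAP50_THRESHOLDS = [0.5]
MAP50_95_THRESHOLDS = [round(0.5 + 0.05 * i, 2) for i in range(10)]


def matches_at(pred_box, gt_box, thresholds):
    """Report, per threshold, whether the prediction counts as a true positive."""
    value = iou(pred_box, gt_box)
    return {t: value >= t for t in thresholds}


# Hypothetical boxes: a 40x40 prediction shifted by 2 px from the ground truth.
pred = (10.0, 10.0, 50.0, 50.0)
gt = (12.0, 12.0, 52.0, 52.0)

overlap = iou(pred, gt)
hits = matches_at(pred, gt, MAP50_95_THRESHOLDS)
print(round(overlap, 4), sum(hits.values()), "of", len(hits), "thresholds matched")
```

A small localisation error like this one still counts at the 0.5 threshold but fails the strictest thresholds, which is why mAP50:95 values in the table are consistently lower than mAP50 values.
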
| Method | mAP50(%) | mAP50:95(%) | Par(M) | Test Speed (ms) |
|---|---|---|---|---|
| YOLOv5s | 77.2 | 47.9 | 15.6 | 4.8 |
| YOLOv8s | 80.1 | 49.3 | 18.2 | 5.6 |
| YOLOv10s | 79.3 | 49.5 | 12.5 | 3.8 |
| YOLOv10l | 80.6 | 50.9 | 43.2 | 13.6 |
| CA-YOLOv10 | 81.1 | 51.4 | 23.5 | 8.9 |
| Ours | 81.4 | 52.8 | 14.2 | 4.5 |
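
The accuracy, size, and speed trade-off in the table above is easier to read as differences against a baseline. The following sketch copies the table values and derives the gains, the parameter ratio, and an approximate frame rate. It assumes that "Test Speed (ms)" is per-image latency, which the table does not state; the helper names are hypothetical.

```python
# (mAP50 %, mAP50:95 %, parameters in M, test time in ms), copied from the table.
hit_uav = {
    "YOLOv5s": (77.2, 47.9, 15.6, 4.8),
    "YOLOv8s": (80.1, 49.3, 18.2, 5.6),
    "YOLOv10s": (79.3, 49.5, 12.5, 3.8),
    "YOLOv10l": (80.6, 50.9, 43.2, 13.6),
    "CA-YOLOv10": (81.1, 51.4, 23.5, 8.9),
    "Ours": (81.4, 52.8, 14.2, 4.5),
}


def fps(latency_ms):
    """Frames per second, assuming the latency is measured per image."""
    return 1000.0 / latency_ms


def compare(name, baseline, table=hit_uav):
    """Summarise how `name` differs from `baseline` using the table values."""
    m50, m5095, par, ms = table[name]
    b50, b5095, bpar, _ = table[baseline]
    return {
        "mAP50_gain": round(m50 - b50, 1),
        "mAP50_95_gain": round(m5095 - b5095, 1),
        "param_ratio": round(par / bpar, 2),
        "fps": round(fps(ms), 1),
    }


print(compare("Ours", "YOLOv10s"))
print(compare("Ours", "YOLOv10l"))
```
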
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).