Submitted:
29 July 2025
Posted:
31 July 2025
Abstract
Keywords:
1. Introduction
- We integrate a Multi-scale Feature Complementary Aggregation Module (MFCAM) into the backbone network. MFCAM mitigates the loss of small-target information as network depth increases, which otherwise makes feature extraction difficult. By combining channel and spatial attention with convolutional feature extraction at different scales, it effectively captures the locations of small targets in images.
- We design a new neck architecture, the Gated Activation Convolutional Fusion Pyramid Network (GAC-FPN), which highlights important features and suppresses irrelevant background information during multi-scale feature fusion. GAC-FPN uses three strategies to improve small-target detection: adding a detection head with a small receptive field while removing the original head with the largest receptive field, making full use of large-scale feature maps, and using gated activation convolutional modules.
- To address the imbalance between positive and negative samples in the image, we replace the original binary cross-entropy loss in the detection head with an adaptive threshold focus loss, which accelerates network convergence and improves detection accuracy for small targets.
- To meet different practical requirements, we propose several versions of SRTSOD-YOLO. These include high-capacity models for ground workstations, which focus on multi-scale feature fusion and context modeling and exploit the parallel computing power of GPU clusters, and lightweight models for airborne platforms, which achieve real-time inference at the edge while maintaining the recall of key targets. This hierarchical design makes algorithm deployment more flexible.
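
The motivation for replacing binary cross-entropy can be illustrated with a small numerical sketch. The code below does not implement the adaptive threshold focus loss itself, whose exact form is not given in this section. It uses the standard focal loss as a stand-in, with fixed `gamma` and `alpha` values and a hand-made sample set that are assumptions chosen only for illustration.

```python
import math


def bce(p: float, y: int) -> float:
    """Binary cross-entropy for one predicted probability p and label y in {0, 1}."""
    p_t = p if y == 1 else 1.0 - p
    return -math.log(p_t)


def focal(p: float, y: int, gamma: float = 2.0, alpha: float = 0.25) -> float:
    """Standard focal loss: scales the BCE term by alpha_t * (1 - p_t) ** gamma."""
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# Hypothetical batch: one hard positive (a small target scored 0.1)
# among 100 easy negatives (background scored 0.05).
samples = [(0.1, 1)] + [(0.05, 0)] * 100

bce_total = sum(bce(p, y) for p, y in samples)
focal_total = sum(focal(p, y) for p, y in samples)

# Fraction of the total loss contributed by the single hard positive.
hard_share_bce = bce(0.1, 1) / bce_total
hard_share_focal = focal(0.1, 1) / focal_total

print(f"hard-sample share under BCE:   {hard_share_bce:.2f}")
print(f"hard-sample share under focal: {hard_share_focal:.2f}")
```

Under these assumptions the easy negatives account for most of the cross-entropy total, whereas the focal weighting concentrates almost all of the loss on the hard positive.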
2. Related Work
2.1. Target Detection Methods of UAV Aerial Images
2.2. The YOLO Series Algorithms
2.3. The YOLO11 Architecture
3. The Proposed Model
3.1. The SRTSOD-YOLO Network Structure
3.2. The Multi-Scale Feature Complementary Aggregation Module
3.3. The Gated Activation Convolutional Fusion Pyramid Network
3.4. The Adaptive Threshold Focus Loss Function
4. Experiment and Analysis
4.1. Image Datasets for Small Object Detection
4.2. Experimental Setup
4.3. Experimental Evaluation Index
4.4. Assessment of Error Types
- Classification Error: IoUmax ≥ tf for GT of the incorrect class (i.e., localized correctly but classified incorrectly).
- Localization Error: tb ≤ IoUmax ≤ tf for GT of the correct class (i.e., classified correctly but localized incorrectly).
- Both Cls and Loc Error: tb ≤ IoUmax ≤ tf for GT of the incorrect class (i.e., classified incorrectly and localized incorrectly).
- Duplicate Detection Error: IoUmax ≥ tf for GT of the correct class but another higher-scoring detection already matched that GT (i.e., would be correct if not for a higher scoring detection).
- Background Error: IoUmax ≤ tb for all GT (i.e., detected background as foreground).
- Missed GT Error: All undetected ground truth (false negatives) not already covered by classification or localization error.
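
The detection-level rules above can be sketched as a small decision function. This is a minimal illustration, not the TIDE toolbox code: the function name, its arguments, and the order in which the rules are checked are assumptions, and the default thresholds `t_f = 0.5` and `t_b = 0.1` are assumed values. Where the list uses overlapping bounds (for example IoUmax equal to tf), the sketch resolves the tie by checking the higher threshold first. Missed GT errors are counted per ground-truth box rather than per detection, so they are outside the scope of this function.

```python
def classify_detection(iou_same: float, iou_other: float,
                       same_gt_taken: bool = False,
                       t_f: float = 0.5, t_b: float = 0.1) -> str:
    """Assign one detection to a category.

    iou_same:      max IoU with any GT box of the predicted class
    iou_other:     max IoU with any GT box of a different class
    same_gt_taken: True if a higher-scoring detection already matched that GT
    """
    if iou_same >= t_f:
        # Well localized and correctly classified.
        return "duplicate" if same_gt_taken else "true_positive"
    if iou_other >= t_f:
        # Well localized, but on a GT of the wrong class.
        return "classification"
    if t_b <= iou_same < t_f:
        # Right class, box not tight enough.
        return "localization"
    if t_b <= iou_other < t_f:
        # Wrong class and poorly localized.
        return "both"
    # Negligible overlap with every GT box.
    return "background"


print(classify_detection(0.30, 0.00))  # localization
print(classify_detection(0.05, 0.60))  # classification
```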
4.5. Comparative Analysis with YOLO11
4.6. Ablation Experiment
- (1) A: The Multi-scale Feature Complementary Aggregation Module (MFCAM) is used in the backbone network.
- (2) B: A detection head with a small receptive field is added, and the original head with the largest receptive field is removed.
- (3) C: The multi-scale, multi-level feature fusion pathway in the neck is reconstructed to fully integrate the multi-level representations of large feature maps.
- (4) D: Gated activation convolutional modules are used in the neck.
- (5) E: The original binary cross-entropy loss is replaced with the adaptive threshold focus loss.
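
Component D relies on gating, and the general idea can be shown on a one-dimensional signal. The sketch below follows the gated linear unit form (features multiplied element-wise by a sigmoid gate) from the gated-convolution literature cited in the references. It is not the paper's GAC module: the kernels, the gate bias, and the toy signal are hand-picked assumptions, whereas a real layer would learn its weights.

```python
import math


def conv1d(signal, kernel, bias=0.0):
    """Valid 1-D cross-correlation, the operation a convolution layer computes."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k)) + bias
            for i in range(len(signal) - k + 1)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_conv1d(signal, feature_kernel, gate_kernel, gate_bias):
    """GLU-style gating: feature response times a sigmoid gate in (0, 1)."""
    features = conv1d(signal, feature_kernel)
    gates = [sigmoid(g) for g in conv1d(signal, gate_kernel, gate_bias)]
    return [f * g for f, g in zip(features, gates)]


# Toy signal: one strong response (a "target") followed by weak background clutter.
signal = [0.0, 0.0, 5.0, 0.0, 0.0, 0.1, 0.1, 0.1]

plain = conv1d(signal, [0.5, 0.5])
gated = gated_conv1d(signal, [0.5, 0.5], gate_kernel=[2.0, 2.0], gate_bias=-3.0)

print([round(v, 3) for v in plain])
print([round(v, 3) for v in gated])
```

With these weights the gate stays close to 1 around the strong response and close to 0 over the clutter, so the target passes almost unchanged while the background is attenuated by more than a factor of ten.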
4.7. Visual Comparison
4.8. Comparison with YOLO Series Algorithms
4.8.1. Comparison with YOLO Series Lightweight Models
4.8.2. Comparison with YOLO Series Large-Scale Models
4.9. Comparison with Other Object Detection Models
4.10. Comparison on the UAVDT Dataset
5. Discussion
5.1. Multi-Scale Object Coexistence and Difficult Feature Extraction Problem
5.2. Complex Background Interference and Positive and Negative Sample Imbalance Problem
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Byun, S.; Shin, I.-K.; Moon, J.; Kang, J.; Choi, S.-I. Road Traffic Monitoring from UAV Images Using Deep Learning Networks. Remote Sensing 2021, 13, 4027.
- Sun, W.; Dai, L.; Zhang, X.; Chang, P.; He, X. RSOD: Real-Time Small Object Detection Algorithm in UAV-Based Traffic Monitoring. Appl Intell 2022, 52, 8448–8463.
- Muhmad Kamarulzaman, A. M.; Wan Mohd Jaafar, W. S.; Mohd Said, M. N.; Saad, S. N. M.; Mohan, M. UAV Implementations in Urban Planning and Related Sectors of Rapidly Developing Nations: A Review and Future Perspectives for Malaysia. Remote Sensing 2023, 15, 2845.
- Yu, Y.; Gu, T.; Guan, H.; Li, D.; Jin, S. Vehicle Detection From High-Resolution Remote Sensing Imagery Using Convolutional Capsule Networks. IEEE Geosci. Remote Sensing Lett. 2019, 16, 1894–1898.
- Li, Y.; Huang, Y.; Tao, Q. Improving Real-Time Object Detection in Internet-of-Things Smart City Traffic with YOLOv8-DSAF Method. Sci Rep 2024, 14.
- An, R.; Zhang, X.; Sun, M.; Wang, G. GC-YOLOv9: Innovative Smart City Traffic Monitoring Solution. Alexandria Engineering Journal 2024, 106, 277–287.
- Li, Z.; Zhang, Y.; Wu, H.; Suzuki, S.; Namiki, A.; Wang, W. Design and Application of a UAV Autonomous Inspection System for High-Voltage Power Transmission Lines. Remote Sensing 2023, 15, 865.
- Vedanth, S.; B, U. N. K.; Harshavardhan, S.; Rao, T.; Kodipalli, A. Drone-Based Artificial Intelligence for Efficient Disaster Management: The Significance of Accurate Object Detection and Recognition. In 2024 IEEE 9th International Conference for Convergence in Technology (I2CT); IEEE: Pune, India, 2024; pp. 1–5.
- Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 37, 1904–1916.
- Girshick, R. Fast R-CNN. In 2015 IEEE International Conference on Computer Vision (ICCV); 2015.
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149.
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A. N.; Kaiser, Ł.; Polosukhin, I. Attention Is All You Need.
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; Uszkoreit, J.; Houlsby, N. An Image Is Worth 16x16 Words: Transformers for Image Recognition at Scale. arXiv June 3, 2021.
- End-to-End Object Detection with Transformers. Lecture Notes in Computer Science; Springer International Publishing: Cham, 2020; pp. 213–229.
- Microsoft COCO: Common Objects in Context. Lecture Notes in Computer Science; Springer International Publishing: Cham, 2014; pp. 740–755.
- Everingham, M.; Van Gool, L.; Williams, C. K. I.; Winn, J.; Zisserman, A. The Pascal Visual Object Classes (VOC) Challenge. Int J Comput Vis 2010, 88, 303–338.
- Long, Y.; Gong, Y.; Xiao, Z.; Liu, Q. Accurate Object Localization in Remote Sensing Images Based on Convolutional Neural Networks. IEEE Trans. Geosci. Remote Sensing 2017, 55, 2486–2498.
- Yu, X.; Gong, Y.; Jiang, N.; Ye, Q.; Han, Z. Scale Match for Tiny Person Detection. In 2020 IEEE Winter Conference on Applications of Computer Vision (WACV); IEEE: Snowmass Village, CO, USA, 2020.
- Li, W.; Wei, W.; Zhang, L. GSDet: Object Detection in Aerial Images Based on Scale Reasoning. IEEE Trans. on Image Process. 2021, 30, 4599–4609.
- Liu, K.; Fu, Z.; Jin, S.; Chen, Z.; Zhou, F.; Jiang, R.; Chen, Y.; Ye, J. ESOD: Efficient Small Object Detection on High-Resolution Images. IEEE Trans. on Image Process. 2025, 34, 183–195.
- Adaimi, G.; Kreiss, S.; Alahi, A. Perceiving Traffic from Aerial Images. arXiv September 16, 2020.
- Bouguettaya, A.; Zarzour, H.; Kechida, A.; Taberkit, A. M. Vehicle Detection From UAV Imagery With Deep Learning: A Review. IEEE Trans. Neural Netw. Learning Syst. 2022, 33, 6047–6067.
- Zhu, P.; Wen, L.; Bian, X.; Ling, H.; Hu, Q. Vision Meets Drones: A Challenge. arXiv April 23, 2018.
- Du, B.; Huang, Y.; Chen, J.; Huang, D. Adaptive Sparse Convolutional Networks with Global Context Enhancement for Faster Object Detection on Drone Images. In 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); IEEE: Vancouver, BC, Canada, 2023; pp. 13435–13444.
- Ghiasi, G.; Cui, Y.; Srinivas, A.; Qian, R.; Lin, T.-Y.; Cubuk, E. D.; Le, Q. V.; Zoph, B. Simple Copy-Paste Is a Strong Data Augmentation Method for Instance Segmentation. In 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); IEEE: Nashville, TN, USA, 2021.
- Kisantal, M.; Wojna, Z.; Murawski, J.; Naruniec, J.; Cho, K. Augmentation for Small Object Detection. arXiv February 19, 2019.
- Chen, C.; Zhang, Y.; Lv, Q.; Wei, S.; Wang, X.; Sun, X.; Dong, J. RRNet: A Hybrid Detector for Object Detection in Drone-Captured Images. In 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW); IEEE: Seoul, Korea (South), 2019.
- Zhang, X.; Izquierdo, E.; Chandramouli, K. Dense and Small Object Detection in UAV Vision Based on Cascade Network. In 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW); IEEE: Seoul, Korea (South), 2019; pp. 118–126.
- Wang, X.; Zhu, D.; Yan, Y. Towards Efficient Detection for Small Objects via Attention-Guided Detection Network and Data Augmentation. Sensors 2022, 22, 7663.
- Bosquet, B.; Cores, D.; Seidenari, L.; Brea, V. M.; Mucientes, M.; Bimbo, A. D. A Full Data Augmentation Pipeline for Small Object Detection Based on Generative Adversarial Networks. Pattern Recognition 2023, 133, 108998.
- Liu, S.; Qi, L.; Qin, H.; Shi, J.; Jia, J. Path Aggregation Network for Instance Segmentation.
- Liu, Z.; Gao, G.; Sun, L.; Fang, L. IPG-Net: Image Pyramid Guidance Network for Small Object Detection. In 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW); IEEE: Seattle, WA, USA, 2020; pp. 4422–4430.
- Gong, Y.; Yu, X.; Ding, Y.; Peng, X.; Zhao, J.; Han, Z. Effective Fusion Factor in FPN for Tiny Object Detection. In 2021 IEEE Winter Conference on Applications of Computer Vision (WACV); IEEE: Waikoloa, HI, USA, 2021; pp. 1159–1167.
- Liu, S.; Huang, D.; Wang, Y. Learning Spatial Fusion for Single-Shot Object Detection. arXiv November 25, 2019.
- Yang, X.; Yang, J.; Yan, J.; Zhang, Y.; Zhang, T.; Guo, Z.; Sun, X.; Fu, K. SCRDet: Towards More Robust Detection for Small, Cluttered and Rotated Objects. In 2019 IEEE/CVF International Conference on Computer Vision (ICCV); IEEE: Seoul, Korea (South), 2019; pp. 8231–8240.
- Fu, J.; Sun, X.; Wang, Z.; Fu, K. An Anchor-Free Method Based on Feature Balancing and Refinement Network for Multiscale Ship Detection in SAR Images. IEEE Trans. Geosci. Remote Sensing 2021, 59, 1331–1344.
- Lu, X.; Ji, J.; Xing, Z.; Miao, Q. Attention and Feature Fusion SSD for Remote Sensing Object Detection. IEEE Trans. Instrum. Meas. 2021, 70, 1–9.
- Ran, Q.; Wang, Q.; Zhao, B.; Wu, Y.; Pu, S.; Li, Z. Lightweight Oriented Object Detection Using Multiscale Context and Enhanced Channel Attention in Remote Sensing Images. IEEE J. Sel. Top. Appl. Earth Observations Remote Sensing 2021, 14, 5786–5795.
- Wu, X.; Hong, D.; Chanussot, J. UIU-Net: U-Net in U-Net for Infrared Small Object Detection. IEEE Trans. on Image Process. 2023, 32, 364–376.
- Du, D.; Qi, Y.; Yu, H.; Yang, Y.; Duan, K.; Li, G.; Zhang, W.; Huang, Q.; Tian, Q. The Unmanned Aerial Vehicle Benchmark: Object Detection and Tracking. In Computer Vision – ECCV 2018; Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y., Eds.; Lecture Notes in Computer Science; Springer International Publishing: Cham, 2018; Vol. 11214, pp. 375–391.
- Zhu, P.; Wen, L.; Du, D.; Bian, X.; Fan, H.; Hu, Q.; Ling, H. Detection and Tracking Meet Drones Challenge. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 44, 7380–7399.
- Li, S.; Yang, Y.; Zeng, D.; Wang, X. Adaptive and Background-Aware Vision Transformer for Real-Time UAV Tracking. In 2023 IEEE/CVF International Conference on Computer Vision (ICCV); IEEE: Paris, France, 2023; pp. 13943–13954.
- Wang, H.; Shen, Q.; Deng, Z. A Diverse Knowledge Perception and Fusion Network for Detecting Targets and Key Parts in UAV Images. Neurocomputing 2025, 612, 128748.
- Chen, C.; Qi, J.; Liu, X.; Bin, K.; Fu, R.; Hu, X.; Zhong, P. Weakly Misalignment-Free Adaptive Feature Alignment for UAVs-Based Multimodal Object Detection.
- Wang, H.; Wang, C.; Fu, Q.; Zhang, D.; Kou, R.; Yu, Y.; Song, J. Cross-Modal Oriented Object Detection of UAV Aerial Images Based on Image Feature. IEEE Trans. Geosci. Remote Sensing 2024, 62, 1–21.
- Liu, J.; Wen, B.; Xiao, J.; Sun, M. Design of UAV Target Detection Network Based on Deep Feature Fusion and Optimization with Small Targets in Complex Contexts. Neurocomputing 2025, 639, 130207.
- Wang, J.; Li, X.; Chen, J.; Zhou, L.; Guo, L.; He, Z.; Zhou, H.; Zhang, Z. DPH-YOLOv8: Improved YOLOv8 Based on Double Prediction Heads for the UAV Image Object Detection. IEEE Trans. Geosci. Remote Sensing 2024, 62, 1–15.
- Suo, J.; Zhang, X.; Shi, W.; Zhou, W. E3-UAV: An Edge-Based Energy-Efficient Object Detection System for Unmanned Aerial Vehicles. IEEE Internet Things J. 2024, 11, 4398–4413.
- Wang, K.; Fu, X.; Huang, Y.; Cao, C.; Shi, G.; Zha, Z.-J. Generalized UAV Object Detection via Frequency Domain Disentanglement. In 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); IEEE: Vancouver, BC, Canada, 2023; pp. 1064–1073.
- Chen, Y.; Ye, Z.; Sun, H.; Gong, T.; Xiong, S.; Lu, X. Global–Local Fusion With Semantic Information Guidance for Accurate Small Object Detection in UAV Aerial Images. IEEE Trans. Geosci. Remote Sensing 2025, 63, 1–15.
- Ying, Z.; Zhou, J.; Zhai, Y.; Quan, H.; Li, W.; Genovese, A.; Piuri, V.; Scotti, F. Large-Scale High-Altitude UAV-Based Vehicle Detection via Pyramid Dual Pooling Attention Path Aggregation Network. IEEE Trans. Intell. Transport. Syst. 2024, 25, 14426–14444.
- Zou, T.; Ge, Q.; Huang, Y. MFP-DETR: Marine UAV Target Detection Based on Multi-Scale Fuzzy Perception. Neurocomputing 2025, 635, 129843.
- Dutta, A.; Das, S.; Nielsen, J.; Chakraborty, R.; Shah, M. Multiview Aerial Visual Recognition (MAVREC): Can Multi-View Improve Aerial Visual Perception? In 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); IEEE: Seattle, WA, USA, 2024; pp. 22678–22690.
- Ding, X.; Zhang, R.; Liu, Q.; Yang, Y. Real-Time Small Object Detection Using Adaptive Weighted Fusion of Efficient Positional Features. Pattern Recognition 2025, 167, 111717.
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR); IEEE: Las Vegas, NV, USA, 2016; pp. 779–788.
- Redmon, J.; Farhadi, A. YOLO9000: Better, Faster, Stronger. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR); IEEE: Honolulu, HI, 2017; pp. 6517–6525.
- Redmon, J.; Farhadi, A. YOLOv3: An Incremental Improvement.
- Redmon, J.; Farhadi, A. YOLO9000: Better, Faster, Stronger. arXiv December 25, 2016. http://arxiv.org/abs/1612.08242 (accessed 2023-05-25).
- Bochkovskiy, A.; Wang, C.-Y.; Liao, H.-Y. M. YOLOv4: Optimal Speed and Accuracy of Object Detection. arXiv April 22, 2020. http://arxiv.org/abs/2004.10934 (accessed 2024-03-11).
- Li, C.; Li, L.; Jiang, H.; Weng, K.; Geng, Y.; Li, L.; Ke, Z.; Li, Q.; Cheng, M.; Nie, W.; Li, Y.; Zhang, B.; Liang, Y.; Zhou, L.; Xu, X.; Chu, X.; Wei, X.; Wei, X. YOLOv6: A Single-Stage Object Detection Framework for Industrial Applications. arXiv September 7, 2022. http://arxiv.org/abs/2209.02976 (accessed 2024-03-11).
- Wang, C.-Y.; Bochkovskiy, A.; Liao, H.-Y. M. YOLOv7: Trainable Bag-of-Freebies Sets New State-of-the-Art for Real-Time Object Detectors. In 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); IEEE: Vancouver, BC, Canada, 2023; pp. 7464–7475.
- YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information. Lecture Notes in Computer Science; Springer Nature Switzerland: Cham, 2025; pp. 1–21.
- Wang, A.; Chen, H.; Liu, L.; Chen, K.; Lin, Z.; Han, J.; Ding, G. YOLOv10: Real-Time End-to-End Object Detection. arXiv October 30, 2024.
- Xiao, Y.; Xu, T.; Xin, Y.; Li, J. FBRT-YOLO: Faster and Better for Real-Time Aerial Image Detection. AAAI 2025, 39, 8673–8681.
- Lin, T.-Y.; Dollar, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature Pyramid Networks for Object Detection. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR); IEEE: Honolulu, HI, 2017.
- Liu, H.; Jia, C.; Shi, F.; Cheng, X.; Chen, S. SCSegamba: Lightweight Structure-Aware Vision Mamba for Crack Segmentation in Structures. arXiv March 23, 2025.
- Dauphin, Y. N.; Fan, A.; Auli, M.; Grangier, D. Language Modeling with Gated Convolutional Networks.
- Yu, J.; Lin, Z.; Yang, J.; Shen, X.; Lu, X.; Huang, T. S. Free-Form Image Inpainting With Gated Convolution.
- Li, J.; Nie, Q.; Fu, W.; Lin, Y.; Tao, G.; Liu, Y.; Wang, C. LORS: Low-Rank Residual Structure for Parameter-Efficient Network Stacking. In 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); IEEE: Seattle, WA, USA, 2024; pp. 15866–15876.
- Yang, B.; Zhang, X.; Zhang, J.; Luo, J.; Zhou, M.; Pi, Y. EFLNet: Enhancing Feature Learning Network for Infrared Small Target Detection. IEEE Trans. Geosci. Remote Sensing 2024, 62, 1–11.
- TIDE: A General Toolbox for Identifying Object Detection Errors. In Lecture Notes in Computer Science; Springer International Publishing: Cham, 2020; pp. 558–573.
- Yue, M.; Zhang, L.; Huang, J.; Zhang, H. Lightweight and Efficient Tiny-Object Detection Based on Improved YOLOv8n for UAV Aerial Images. Drones 2024, 8, 276.
- Xu, H.; Zheng, W.; Liu, F.; Li, P.; Wang, R. Unmanned Aerial Vehicle Perspective Small Target Recognition Algorithm Based on Improved YOLOv5. Remote Sensing 2023, 15, 3583.
- Li, Y.; Fan, Q.; Huang, H.; Han, Z.; Gu, Q. A Modified YOLOv8 Detection Network for UAV Aerial Image Recognition. Drones 2023, 7, 304.
- Tahir, N. U. A.; Long, Z.; Zhang, Z.; Asim, M.; ELAffendi, M. PVswin-YOLOv8s: UAV-Based Pedestrian and Vehicle Detection for Traffic Management in Smart Cities Using Improved YOLOv8. Drones 2024, 8, 84.
- Wang, G.; Chen, Y.; An, P.; Hong, H.; Hu, J.; Huang, T. UAV-YOLOv8: A Small-Object-Detection Model Based on Improved YOLOv8 for UAV Aerial Photography Scenarios. Sensors 2023, 23, 7190.
- Zhang, Z. Drone-YOLO: An Efficient Neural Network Method for Target Detection in Drone Images. Drones 2023, 7, 526.
- Liu, Y.; Zhang, J.; Liu, S.; Xu, L.; Wang, Y. Aams-Yolo: A Small Object Detection Method for UAV Capture Scenes Based on YOLOv7. Cluster Comput 2025, 28.
- Bai, C.; Zhang, K.; Jin, H.; Qian, P.; Zhai, R.; Lu, K. SFFEF-YOLO: Small Object Detection Network Based on Fine-Grained Feature Extraction and Fusion for Unmanned Aerial Images. Image and Vision Computing 2025, 156, 105469.
- Chen, Z.; Zhang, Y.; Xing, S. YOLO-LE: A Lightweight and Efficient UAV Aerial Image Target Detection Model. Springer Science and Business Media LLC September 12, 2024.
- Lu, Y.; Sun, M. Lightweight Multidimensional Feature Enhancement Algorithm LPS-YOLO for UAV Remote Sensing Target Detection. Sci Rep 2025, 15.
- Wang, H.; Liu, J.; Zhao, J.; Zhang, J.; Zhao, D. Precision and Speed: LSOD-YOLO for Lightweight Small Object Detection. Expert Systems with Applications 2025, 269, 126440.
- Zhou, L.; Zhao, S.; Liu, Z.; Zhang, W.; Qiao, B.; Liu, Y. A Lightweight Aerial Image Object Detector Based on Mask Information Enhancement. IEEE Trans. Instrum. Meas. 2025, 74, 1–17.
- Jiang, L.; Yuan, B.; Du, J.; Chen, B.; Xie, H.; Tian, J.; Yuan, Z. MFFSODNet: Multiscale Feature Fusion Small Object Detection Network for UAV Aerial Images. IEEE Trans. Instrum. Meas. 2024, 73, 1–14.
- Yan, H.; Kong, X.; Wang, J.; Tomiyama, H. ST-YOLO: An Enhanced Detector of Small Objects in Unmanned Aerial Vehicle Imagery. Drones 2025, 9, 338.

| Model | Depth | Width | Maximum number of channels |
|---|---|---|---|
| yolo11x | 1.00 | 1.50 | 512 |
| yolo11l | 1.00 | 1.00 | 512 |
| yolo11m | 0.50 | 1.00 | 512 |
| yolo11s | 0.50 | 0.50 | 1024 |
| yolo11n | 0.50 | 0.25 | 1024 |
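
How the three factors in this table act on a layer can be sketched as follows. The rule used here (clip the base width at the channel cap, multiply by the width factor, round up to a multiple of 8, and scale block repeats by the depth factor) is an assumption about the usual YOLO-style configuration convention, not something stated in this section. The function names and the base values passed to them are hypothetical.

```python
import math

# model: (depth, width, max_channels), copied from the table above
SCALES = {
    "yolo11x": (1.00, 1.50, 512),
    "yolo11l": (1.00, 1.00, 512),
    "yolo11m": (0.50, 1.00, 512),
    "yolo11s": (0.50, 0.50, 1024),
    "yolo11n": (0.50, 0.25, 1024),
}


def make_divisible(x: float, divisor: int = 8) -> int:
    """Round x up to the nearest multiple of divisor."""
    return math.ceil(x / divisor) * divisor


def scaled_channels(base: int, model: str) -> int:
    """Assumed rule: cap the base width, apply the width factor, align to 8."""
    _, width, max_channels = SCALES[model]
    return make_divisible(min(base, max_channels) * width)


def scaled_repeats(base: int, model: str) -> int:
    """Assumed rule: scale repeat counts above 1 by the depth factor, never below 1."""
    depth, _, _ = SCALES[model]
    return max(round(base * depth), 1) if base > 1 else base


for name in SCALES:
    print(name, scaled_channels(1024, name), scaled_repeats(2, name))
```

Under this assumed rule a nominal 1024-channel layer becomes 256 channels in yolo11n and 768 in yolo11x, which is why the smaller cap of 512 for the m, l and x variants does not make them narrower than the n and s variants.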

| Layer | Module | SRTSOD-YOLO-n | SRTSOD-YOLO-s | SRTSOD-YOLO-m | SRTSOD-YOLO-l |
|---|---|---|---|---|---|
| 0 | CBS | 8 | 16 | 32 | 32 |
| 1 | CBS | 16 | 32 | 64 | 64 |
| 2 | MFCAM | 16 | 32 | 64 | 64 |
| 3 | CBS | 32 | 64 | 128 | 128 |
| 4 | MFCAM | 32 | 64 | 128 | 128 |
| 5 | CBS | 64 | 128 | 256 | 256 |
| 6 | MFCAM | 64 | 128 | 256 | 256 |
| 7 | CBS | 128 | 256 | 512 | 512 |
| 8 | MFCAM | 128 | 256 | 512 | 512 |
| 9 | SPPF | 128 | 256 | 512 | 512 |
| 10 | C2PSA | 128 | 256 | 512 | 512 |

| Layer | Module | SRTSOD-YOLO-n | SRTSOD-YOLO-s | SRTSOD-YOLO-m | SRTSOD-YOLO-l |
|---|---|---|---|---|---|
| 11 | CBS | 16 | 32 | 64 | 64 |
| 12 | CBS | 16 | 32 | 64 | 64 |
| 13 | CBS | 16 | 16 | 32 | 32 |
| 14 | Upsample | 128 | 256 | 512 | 512 |
| 15 | Concat | 208 | 416 | 832 | 832 |
| 16 | GAC | 208 | 416 | 832 | 832 |
| 17 | C3K2 | 32/n=1 | 64/n=1 | 128/n=2 | 128/n=4 |
| 18 | Upsample | 32 | 64 | 128 | 128 |
| 19 | Concat | 80 | 160 | 320 | 320 |
| 20 | GAC | 80 | 160 | 320 | 320 |
| 21 | C3K2 | 32/n=1 | 64/n=1 | 128/n=2 | 128/n=4 |
| 22 | Upsample | 32 | 64 | 128 | 128 |
| 23 | Concat | 64 | 112 | 224 | 224 |
| 24 | GAC | 64 | 112 | 224 | 224 |
| 25 | C3K2 | 16/n=1 | 32/n=1 | 64/n=2 | 64/n=4 |
| 26 | CBS | 16 | 32 | 64 | 64 |
| 27 | Concat | 48 | 96 | 192 | 192 |
| 28 | C3K2 | 32/n=1 | 64/n=1 | 128/n=2 | 128/n=4 |
| 29 | CBS | 32 | 64 | 128 | 128 |
| 30 | Concat | 64 | 128 | 256 | 256 |
| 31 | C3K2 | 64/n=1 | 128/n=1 | 256/n=2 | 256/n=4 |
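
The Concat rows in the table above can be checked arithmetically, because the output width of a concatenation is the sum of its input widths. The tables do not say which layers feed each Concat, so the wiring in `CONCAT_INPUTS` below is a hypothesis inferred only from the channel counts. Several layers share the same width, so other wirings would satisfy the same sums.

```python
# Output channels of the layers involved, copied from the backbone and neck
# tables for the n and s variants.
CHANNELS = {
    "n": {2: 16, 4: 32, 6: 64, 11: 16, 12: 16, 13: 16, 14: 128,
          17: 32, 18: 32, 21: 32, 22: 32, 26: 16, 29: 32},
    "s": {2: 32, 4: 64, 6: 128, 11: 32, 12: 32, 13: 16, 14: 256,
          17: 64, 18: 64, 21: 64, 22: 64, 26: 32, 29: 64},
}

# Hypothetical wiring (an assumption): Concat layer -> layers it joins.
CONCAT_INPUTS = {
    15: (14, 6, 12),
    19: (18, 4, 11),
    23: (22, 2, 13),
    27: (26, 21),
    30: (29, 17),
}

# Concat widths as listed in the neck table.
LISTED = {
    "n": {15: 208, 19: 80, 23: 64, 27: 48, 30: 64},
    "s": {15: 416, 19: 160, 23: 112, 27: 96, 30: 128},
}


def concat_width(variant: str, layer: int) -> int:
    """Sum the widths of the assumed inputs of one Concat layer."""
    return sum(CHANNELS[variant][src] for src in CONCAT_INPUTS[layer])


mismatches = [(v, layer) for v in LISTED for layer in LISTED[v]
              if concat_width(v, layer) != LISTED[v][layer]]
print("mismatches:", mismatches)
```

With this wiring every listed Concat width for the n and s variants is reproduced, which suggests that each of the first three fusion stages joins an upsampled neck feature, a backbone feature, and one of the auxiliary CBS outputs from layers 11 to 13.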

| Network | mAP50 (%) | mAP50-95 (%) | Params (M) | GFLOPs | FPS |
|---|---|---|---|---|---|
| YOLO11n | 33.2 | 20.6 | 2.6 | 6.5 | 164 |
| SRTSOD-YOLO-n | 36.3 | 21.8 | 3.5 | 7.4 | 147 |
| YOLO11s | 40.6 | 24.5 | 9.4 | 21.6 | 153 |
| SRTSOD-YOLO-s | 44.4 | 27.0 | 11.1 | 24.2 | 138 |
| YOLO11m | 43.5 | 26.3 | 20.1 | 68.2 | 135 |
| SRTSOD-YOLO-m | 49.6 | 30.4 | 22.2 | 72.7 | 124 |
| YOLO11l | 45.9 | 28.2 | 25.3 | 87.3 | 111 |
| SRTSOD-YOLO-l | 53.8 | 33.8 | 27.6 | 94.7 | 99 |
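
The trade-off in the table above is easier to read as differences per model pair. The sketch below only rearranges the reported numbers. The dictionary layout and the helper name `compare` are illustrative choices, not part of the original work.

```python
# network: (mAP50, mAP50-95, params in M, GFLOPs, FPS), copied from the table above
ROWS = {
    "YOLO11n": (33.2, 20.6, 2.6, 6.5, 164),
    "SRTSOD-YOLO-n": (36.3, 21.8, 3.5, 7.4, 147),
    "YOLO11s": (40.6, 24.5, 9.4, 21.6, 153),
    "SRTSOD-YOLO-s": (44.4, 27.0, 11.1, 24.2, 138),
    "YOLO11m": (43.5, 26.3, 20.1, 68.2, 135),
    "SRTSOD-YOLO-m": (49.6, 30.4, 22.2, 72.7, 124),
    "YOLO11l": (45.9, 28.2, 25.3, 87.3, 111),
    "SRTSOD-YOLO-l": (53.8, 33.8, 27.6, 94.7, 99),
}


def compare(baseline: str, proposed: str) -> dict:
    """Accuracy gain in points and relative cost of the proposed model."""
    b, p = ROWS[baseline], ROWS[proposed]
    return {
        "map50_gain": round(p[0] - b[0], 1),
        "map50_95_gain": round(p[1] - b[1], 1),
        "param_increase_pct": round(100 * (p[2] - b[2]) / b[2], 1),
        "fps_drop_pct": round(100 * (b[4] - p[4]) / b[4], 1),
    }


for size in "nsml":
    print(size, compare(f"YOLO11{size}", f"SRTSOD-YOLO-{size}"))
```

For the largest pair this gives a gain of 7.9 points in mAP50 and 5.6 points in mAP50-95 for about 9% more parameters and about 11% lower frame rate. For the smallest pair the relative parameter increase is much larger (about 35%) because the baseline is so small.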

| Model | Ecls | Eloc | Eboth | Edup | Ebkg | Emissed |
|---|---|---|---|---|---|---|
| YOLO11s | 15.30 | 4.32 | 0.52 | 0.18 | 2.35 | 14.46 |
| SRTSOD-YOLO-s | 15.06 | 4.11 | 0.50 | 0.15 | 2.26 | 14.27 |
| YOLO11l | 14.59 | 4.19 | 0.53 | 0.12 | 2.55 | 15.04 |
| SRTSOD-YOLO-l | 14.09 | 3.91 | 0.49 | 0.03 | 2.13 | 13.96 |

| Network | mAP50 (%) | mAP50-95 (%) |
|---|---|---|
| YOLO11n | 32.3 | 20.2 |
| SRTSOD-YOLO-n | 33.5 | 20.8 |
| YOLO11s | 34.6 | 21.4 |
| SRTSOD-YOLO-s | 38.4 | 23.6 |
| YOLO11m | 39.8 | 24.2 |
| SRTSOD-YOLO-m | 44.7 | 27.3 |
| YOLO11l | 43.9 | 26.5 |
| SRTSOD-YOLO-l | 47.2 | 28.7 |

| Network | A | B | C | D | E | mAP50 (%) | mAP50-95 (%) | Params (M) | GFLOPs |
|---|---|---|---|---|---|---|---|---|---|
| YOLO11n | | | | | | 33.2 | 20.6 | 2.6 | 6.5 |
| SRTSOD-YOLO-n | √ | | | | | 33.9 | 20.7 | 2.7 | 6.7 |
| | √ | √ | | | | 35.1 | 21.2 | 2.9 | 6.9 |
| | √ | √ | √ | | | 35.6 | 21.5 | 3.3 | 7.3 |
| | √ | √ | √ | √ | | 36.0 | 21.7 | 3.5 | 7.4 |
| | √ | √ | √ | √ | √ | 36.3 | 21.8 | 3.5 | 7.4 |
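
Because the components are added cumulatively, the contribution of each step is the difference between consecutive rows of the ablation results above. The sketch below computes those increments from the reported mAP50 values. The step labels are shorthand for components A to E, and the increments depend on the order in which components were added.

```python
# (configuration, mAP50, mAP50-95), copied from the ablation results above
STEPS = [
    ("baseline", 33.2, 20.6),
    ("+A", 33.9, 20.7),
    ("+B", 35.1, 21.2),
    ("+C", 35.6, 21.5),
    ("+D", 36.0, 21.7),
    ("+E", 36.3, 21.8),
]

# Incremental mAP50 gain of each step over the previous configuration.
gains = {
    name: round(map50 - prev_map50, 1)
    for (_, prev_map50, _), (name, map50, _) in zip(STEPS, STEPS[1:])
}

total_gain = round(sum(gains.values()), 1)
largest_step = max(gains, key=gains.get)

print(gains)
print("total:", total_gain, "largest step:", largest_step)
```

In this ordering, swapping the detection heads (B) yields the largest single increment, and the five increments add up to the full 3.1-point difference between YOLO11n and SRTSOD-YOLO-n.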

| Network | mAP50 (%) | mAP50-95 (%) | Params (M) | GFLOPs |
|---|---|---|---|---|
| YOLOv3-tiny | 23.4 | 13.0 | 12.1 | 18.9 |
| YOLOv5s | 37.7 | 22.3 | 9.1 | 23.8 |
| YOLOv6s | 36.3 | 21.4 | 16.3 | 44.0 |
| YOLOv7-tiny | 32.9 | 16.8 | 6.0 | 13.3 |
| YOLOv8s | 39.0 | 23.3 | 11.6 | 28.7 |
| YOLOv10s | 38.6 | 23.1 | 7.4 | 21.4 |
| SRTSOD-YOLO-s | 44.4 | 27.0 | 11.1 | 24.2 |

| Network | mAP50 (%) | mAP50-95 (%) | Params (M) | GFLOPs |
|---|---|---|---|---|
| YOLOv3 | 44.0 | 26.9 | 103.7 | 282.3 |
| YOLOv5l | 43.0 | 26.2 | 53.2 | 134.7 |
| YOLOv6l | 40.7 | 24.8 | 110.9 | 391.2 |
| YOLOv7 | 46.2 | 25.9 | 37.2 | 105.3 |
| YOLOv8l | 43.8 | 26.9 | 43.6 | 164.9 |
| YOLOv9e | 46.6 | 28.9 | 57.4 | 189.2 |
| YOLOv10l | 43.5 | 26.8 | 24.9 | 120.0 |
| SRTSOD-YOLO-l | 53.8 | 33.8 | 27.6 | 94.7 |
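
One way to summarise the table above is to ask which models are not dominated, meaning that no other model is at least as good on accuracy, parameter count and GFLOPs and strictly better on one of them. The sketch below applies this Pareto test to the reported numbers. The choice of these three objectives is an assumption made for illustration.

```python
# network: (mAP50, params in M, GFLOPs), copied from the table above
MODELS = {
    "YOLOv3": (44.0, 103.7, 282.3),
    "YOLOv5l": (43.0, 53.2, 134.7),
    "YOLOv6l": (40.7, 110.9, 391.2),
    "YOLOv7": (46.2, 37.2, 105.3),
    "YOLOv8l": (43.8, 43.6, 164.9),
    "YOLOv9e": (46.6, 57.4, 189.2),
    "YOLOv10l": (43.5, 24.9, 120.0),
    "SRTSOD-YOLO-l": (53.8, 27.6, 94.7),
}


def dominates(a, b) -> bool:
    """True if a is no worse than b on every objective and strictly better on one.

    Higher mAP50 is better; lower parameter count and GFLOPs are better.
    """
    no_worse = a[0] >= b[0] and a[1] <= b[1] and a[2] <= b[2]
    better = a[0] > b[0] or a[1] < b[1] or a[2] < b[2]
    return no_worse and better


pareto_front = sorted(
    name for name, stats in MODELS.items()
    if not any(dominates(other, stats) for other in MODELS.values())
)
print(pareto_front)
```

On these three objectives only SRTSOD-YOLO-l and YOLOv10l remain on the front: SRTSOD-YOLO-l has the highest mAP50 and the lowest GFLOPs, while YOLOv10l stays non-dominated solely because of its smaller parameter count.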

| Network | mAP50 (%) | mAP50-95 (%) | Params (M) | GFLOPs |
|---|---|---|---|---|
| LE-YOLO [73] | 39.3 | 22.7 | 2.1 | 13.1 |
| YOLOv5-pp [74] | 41.7 | - | 10.5 | - |
| Modified YOLOv8 [75] | 42.2 | - | 9.66 | - |
| PVswin-YOLOv8 [76] | 43.3 | - | 21.6 | - |
| UAV-YOLOv8 [77] | 47.0 | 29.2 | 10.3 | - |
| Drone-YOLO [78] | 51.3 | 31.9 | 76.2 | - |
| Aams-yolo [79] | 47.2 | 29.1 | 59.2 | 171.7 |
| SFFEF-YOLO [80] | 50.1 | 31.0 | - | - |
| YOLO-LE [81] | 39.9 | 22.5 | 4.0 | 8.5 |
| LPS-YOLO(large) [82] | 53.2 | 34.3 | 44.1 | - |
| LSOD-YOLO [83] | 37.0 | - | 3.8 | 33.9 |
| BFDet [84] | 51.4 | 29.5 | 5.6 | 25.6 |
| Faster RCNN | 37.2 | 21.9 | 41.2 | 292.8 |
| Cascade RCNN | 39.1 | 24.3 | 68.9 | 320.7 |
| RetinaNet | 19.1 | 10.6 | 35.7 | 299.5 |
| CenterNet | 33.7 | 18.8 | 70.8 | 137.2 |
| MFFSODNet [85] | 45.5 | - | 4.5 | - |
| SRTSOD-YOLO-n | 36.3 | 21.8 | 3.5 | 7.4 |
| SRTSOD-YOLO-s | 44.4 | 27.0 | 11.1 | 24.2 |
| SRTSOD-YOLO-m | 49.6 | 30.4 | 22.2 | 72.7 |
| SRTSOD-YOLO-l | 53.8 | 33.8 | 27.6 | 94.7 |

| Network | mAP50 (%) | mAP50-95 (%) | Params (M) | GFLOPs |
|---|---|---|---|---|
| Aams-yolo [79] | 43.1 | 29.9 | 59.2 | 171.7 |
| SFFEF-YOLO [80] | 44.1 | 29.1 | - | - |
| ST-YOLO [86] | 33.4 | - | 9.0 | 20.1 |
| LSOD-YOLO [83] | 37.1 | 22.1 | - | - |
| BFDet [84] | 46.0 | 26.3 | - | - |
| Faster RCNN | 36.5 | 21.4 | 41.1 | 292.3 |
| Cascade RCNN | 38.7 | 23.9 | 68.8 | 320.5 |
| RetinaNet | 18.8 | 10.4 | 35.7 | 299.5 |
| CenterNet | 32.9 | 18.2 | 70.8 | 137.2 |
| YOLOv7 | 41.9 | 25.4 | 36.5 | 105.3 |
| SRTSOD-YOLO-n | 33.5 | 20.8 | 3.5 | 7.4 |
| SRTSOD-YOLO-s | 38.4 | 23.6 | 11.1 | 24.2 |
| SRTSOD-YOLO-m | 44.7 | 27.3 | 22.2 | 72.7 |
| SRTSOD-YOLO-l | 47.2 | 28.7 | 27.6 | 94.7 |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).