Submitted: 25 June 2025
Posted: 27 June 2025
Abstract
Keywords:
1. Introduction
- We design ER-HGNetV2, a re-parameterized backbone for low-light environments, which effectively captures high-quality features, suppresses noise, and enhances feature representation.
- We develop LFSPN, which enables efficient multi-scale feature fusion and enhances detection capability across diverse object scales.
- We introduce SCSHead, a lightweight detection head that leverages shared convolutions and separate batch normalization layers to minimize computational complexity and improve inference efficiency (see the sketch after this list).
- Extensive experiments conducted on the ExDark and DroneVehicle datasets demonstrate that ELS-YOLO achieves an optimal balance between detection accuracy and inference speed.
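To make the shared-convolution, separate-batch-normalization idea behind SCSHead concrete, the following is a minimal PyTorch sketch, not the authors' implementation: one set of convolution weights is applied to every pyramid scale, while each scale keeps its own BatchNorm. The channel widths, number of scales, and output dimension are hypothetical placeholders.

```python
import torch
import torch.nn as nn

class SharedConvSeparateBNHead(nn.Module):
    """Illustrative head: convolution weights shared across pyramid scales,
    with a separate BatchNorm per scale (layer sizes are hypothetical)."""

    def __init__(self, in_channels: int = 256, num_outputs: int = 64, num_scales: int = 3):
        super().__init__()
        # One shared 3x3 conv reused on every scale -> fewer parameters.
        self.shared_conv = nn.Conv2d(in_channels, in_channels, 3, padding=1, bias=False)
        # A separate BN per scale, since feature statistics differ across resolutions.
        self.bns = nn.ModuleList([nn.BatchNorm2d(in_channels) for _ in range(num_scales)])
        self.act = nn.SiLU()
        # Shared 1x1 prediction conv (e.g., box + class outputs).
        self.pred = nn.Conv2d(in_channels, num_outputs, 1)

    def forward(self, features):
        # `features` is a list of tensors, one per pyramid scale.
        outputs = []
        for feat, bn in zip(features, self.bns):
            x = self.act(bn(self.shared_conv(feat)))
            outputs.append(self.pred(x))
        return outputs

if __name__ == "__main__":
    head = SharedConvSeparateBNHead()
    feats = [torch.randn(1, 256, s, s) for s in (80, 40, 20)]
    print([o.shape for o in head(feats)])
```

Sharing the convolution weights reduces head parameters roughly in proportion to the number of scales, while the per-scale BatchNorm compensates for the differing feature statistics at each resolution.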
2. Related Work
2.1. DVOD: Drone-View Object Detection
2.2. LLOD: Low-Light Object Detection
3. Baseline Algorithm
4. Methodology
4.1. ER-HGNetV2: Re-Parameterized Backbone
4.2. LFSPN: Lightweight Feature Selection Pyramid Network
4.3. SCSHead: Shared Convolution and Separate Batch Normalization Head
4.4. Network Structure of ELS-YOLO
4.5. Channel Pruning
5. Experimental Results
5.1. Dataset
5.1.1. ExDark
5.1.2. DroneVehicle
5.2. Experimental Environment
5.3. Evaluation Indicators
5.4. Experimental Analysis on the ExDark Dataset
5.4.1. ER-HGNetV2 Experiment
5.4.2. Comparison with YOLOv11
5.4.3. LAMP Experiment
5.4.4. Ablation Experiments
5.4.5. Comparison Experiments with Other Baseline Methods
5.4.6. Visualization Analysis
5.5. Experimental Analysis on the DroneVehicle Dataset
6. Discussion
7. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- P. Zhu, L. Wen, D. Du, X. Bian, H. Fan, Q. Hu, and H. Ling, “Detection and Tracking Meet Drones Challenge,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 44, no. 11, pp. 7380–7399, 2022. [CrossRef]
- X. Wu, D. Hong, and J. Chanussot, “Convolutional Neural Networks for Multimodal Remote Sensing Data Classification,” IEEE Transactions on Geoscience and Remote Sensing, vol. 60, pp. 1–10, 2022. [CrossRef]
- Y. Huang, J. Chen, and D. Huang, “UFPMP-Det: Toward Accurate and Efficient Object Detection on Drone Imagery,” in Proceedings of the AAAI Conference on Artificial Intelligence, 2022.
- J. Zhan, Y. Hu, G. Zhou, Y. Wang, W. Cai, and L. Li, “A high-precision forest fire smoke detection approach based on ARGNet,” Computers and Electronics in Agriculture, vol. 196, p. 106874, 2022. [CrossRef]
- L. Zhou, Y. Dong, B. Ma, et al., “Object detection in low-light conditions based on DBS-YOLOv8,” Cluster Computing, vol. 28, no. 55, 2025. [CrossRef]
- R. Kaur and S. Singh, “A comprehensive review of object detection with deep learning,” Digital Signal Processing, vol. 132, p. 103812, 2023. [CrossRef]
- X. Liu, Z. Wu, A. Li, et al., “NTIRE 2024 Challenge on Low Light Image Enhancement: Methods and Results,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 6571–6594, 2024. [CrossRef]
- X. Guo, Y. Li, and H. Ling, “LIME: Low-Light Image Enhancement via Illumination Map Estimation,” IEEE Transactions on Image Processing, vol. 26, no. 2, pp. 982–993, 2017. [CrossRef]
- C. Hu, W. Yi, K. Hu, Y. Guo, X. Jing, and P. Liu, “FHSI and QRCPE-Based Low-Light Enhancement With Application to Night Traffic Monitoring Images,” IEEE Transactions on Intelligent Transportation Systems, vol. 25, no. 7, pp. 6978–6993, 2024. [CrossRef]
- J. J. Jeon, J. Y. Park, and I. K. Eom, “Low-light image enhancement using gamma correction prior in mixed color spaces,” Pattern Recognition, vol. 146, p. 110001, 2024. [CrossRef]
- J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, “You Only Look Once: Unified, Real-Time Object Detection,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 779–788, 2016. [CrossRef]
- C.-Y. Wang, I.-H. Yeh, and H.-Y. M. Liao, “YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information,” in Proceedings of the European Conference on Computer Vision (ECCV), Milan, Italy, pp. 1–21, 2024. [CrossRef]
- A. Wang, H. Chen, L. Liu, K. Chen, Z. Lin, J. Han, and G. Ding, “YOLOv10: Real-Time End-to-End Object Detection,” in Advances in Neural Information Processing Systems, vol. 37, pp. 107984–108011, 2024.
- M. Everingham, L. Van Gool, C. K. I. Williams, et al., “The PASCAL Visual Object Classes (VOC) Challenge,” International Journal of Computer Vision, vol. 88, pp. 303–338, 2010. [CrossRef]
- T.-Y. Lin, M. Maire, S. Belongie, et al., “Microsoft COCO: Common Objects in Context,” in Computer Vision – ECCV 2014, Lecture Notes in Computer Science, vol. 8693, Springer, Cham, 2014. [CrossRef]
- X. Ding, X. Zhang, N. Ma, J. Han, G. Ding, and J. Sun, “RepVGG: Making VGG-style ConvNets Great Again,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 13728–13737, 2021. [CrossRef]
- Q. Wang, B. Wu, P. Zhu, P. Li, W. Zuo, and Q. Hu, “ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11531–11539, 2020. [CrossRef]
- J. Lee, S. Park, S. Mo, S. Ahn, and J. Shin, “Layer-Adaptive Sparsity for the Magnitude-Based Pruning,” in Proceedings of the 9th International Conference on Learning Representations (ICLR), 2021.
- S. Deng, S. Li, K. Xie, W. Song, X. Liao, A. Hao, and H. Qin, “A Global-Local Self-Adaptive Network for Drone-View Object Detection,” IEEE Transactions on Image Processing, vol. 30, pp. 1556–1569, 2021. [CrossRef]
- Y. Bai, Y. Zhang, M. Ding, and B. Ghanem, “SOD-MTGAN: Small Object Detection via Multi-Task Generative Adversarial Network,” in Computer Vision – ECCV 2018, V. Ferrari, M. Hebert, C. Sminchisescu, and Y. Weiss, Eds. Lecture Notes in Computer Science, vol. 11217, Springer, Cham, 2018. [CrossRef]
- Y. Xi, J. Zheng, X. He, W. Jia, H. Li, Y. Xie, M. Feng, and X. Li, “Beyond context: Exploring semantic similarity for small object detection in crowded scenes,” Pattern Recognition Letters, vol. 137, pp. 53–60, 2020. [CrossRef]
- H. Qiu, H. Li, Q. Wu, F. Meng, L. Xu, K. N. Ngan, and H. Shi, “Hierarchical context features embedding for object detection,” IEEE Transactions on Multimedia, vol. 22, no. 12, pp. 3039–3050, 2020. [CrossRef]
- G. Li, Z. Liu, D. Zeng, W. Lin, and H. Ling, “Adjacent context coordination network for salient object detection in optical remote sensing images,” IEEE Transactions on Cybernetics, vol. 53, no. 1, pp. 526–538, 2023. [CrossRef]
- W. Zhao, Y. Kang, H. Chen, Z. Zhao, Z. Zhao, and Y. Zhai, “Adaptively attentional feature fusion oriented to multiscale object detection in remote sensing images,” IEEE Transactions on Instrumentation and Measurement, vol. 72, pp. 1–11, 2023. [CrossRef]
- Y. Sun, B. Cao, P. Zhu, and Q. Hu, “Drone-based RGB-Infrared cross-modality vehicle detection via uncertainty-aware learning,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 32, no. 10, pp. 6700–6713, 2022. [CrossRef]
- H. Farid, “Blind inverse gamma correction,” IEEE Transactions on Image Processing, vol. 10, no. 10, pp. 1428–1433, 2001. [CrossRef]
- K. Zuiderveld, “Contrast limited adaptive histogram equalization,” in Graphics Gems IV, USA: Academic Press Professional, Inc., 1994, pp. 474–485.
- M. Li, J. Liu, W. Yang, X. Sun, and Z. Guo, “Structure-revealing low-light image enhancement via robust Retinex model,” IEEE Transactions on Image Processing, vol. 27, no. 6, pp. 2828–2841, 2018. [CrossRef]
- K. G. Lore, A. Akintayo, and S. Sarkar, “LLNet: A deep autoencoder approach to natural low-light image enhancement,” Pattern Recognition, vol. 61, pp. 650–662, 2017. [CrossRef]
- X. Li, W. Wang, X. Feng, and M. Li, “Deep parametric Retinex decomposition model for low-light image enhancement,” Computer Vision and Image Understanding, vol. 241, p. 103948, 2024. [CrossRef]
- C. Guo, C. Li, J. Guo, C. C. Loy, J. Hou, S. Kwong, and R. Cong, “Zero-reference deep curve estimation for low-light image enhancement,” in Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1777–1786, 2020. [CrossRef]
- C. Li, C. Guo, and C. C. Loy, “Learning to enhance low-light image via zero-reference deep curve estimation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 44, no. 8, pp. 4225–4238, 2022. [CrossRef]
- X. Xu, R. Wang, C.-W. Fu, and J. Jia, “SNR-aware low-light image enhancement,” in Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 17693–17703, 2022. [CrossRef]
- L. Ma, T. Ma, R. Liu, X. Fan, and Z. Luo, “Toward fast, flexible, and robust low-light image enhancement,” in Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5627–5636, 2022. [CrossRef]
- J. Hu and Z. Cui, “YOLO-Owl: An occlusion aware detector for low illuminance environment,” in Proceedings of the 2023 3rd International Conference on Neural Networks, Information and Communication Engineering (NNICE), pp. 167–170, 2023. [CrossRef]
- Y. Zhang, C. Wu, T. Zhang, Y. Liu, and Y. Zheng, “Self-attention guidance and multiscale feature fusion-based UAV image object detection,” IEEE Geoscience and Remote Sensing Letters, vol. 20, pp. 1–5, 2023. [CrossRef]
- R. Wu, W. Huang, and X. Xu, “AE-YOLO: Asymptotic enhancement for low-light object detection,” in Proceedings of the 2024 17th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI), pp. 1–6, 2024. [CrossRef]
- R. Khanam and M. Hussain, “YOLOv11: An Overview of the Key Architectural Enhancements,” arXiv preprint, arXiv:2410.17725, 2024. [CrossRef]
- Y. Zhao, W. Lv, S. Xu, J. Wei, G. Wang, Q. Dang, Y. Liu, and J. Chen, “DETRs Beat YOLOs on Real-time Object Detection,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 16965–16974, 2024. [CrossRef]
- Q. Hou, D. Zhou, and J. Feng, “Coordinate Attention for Efficient Mobile Network Design,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 13708–13717, 2021. [CrossRef]
- H. Cai, J. Li, M. Hu, C. Gan, and S. Han, “EfficientViT: Lightweight Multi-Scale Attention for High-Resolution Dense Prediction,” in Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2023, pp. 17256–17267. [CrossRef]
- A. Wang, H. Chen, Z. Lin, J. Han, and G. Ding, “RepViT: Revisiting Mobile CNN From ViT Perspective,” in Proceedings of the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 15909–15920. [CrossRef]
- D. Qin et al., “MobileNetV4: Universal Models for the Mobile Ecosystem,” in Computer Vision – ECCV 2024, A. Leonardis, E. Ricci, S. Roth, O. Russakovsky, T. Sattler, and G. Varol, Eds. Cham: Springer, 2025, Lecture Notes in Computer Science, vol. 15098. [CrossRef]
- X. Ma, X. Dai, Y. Bai, Y. Wang, and Y. Fu, “Rewrite the Stars,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), 2024, pp. 5694–5703. [CrossRef]
- R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, and D. Batra, “Grad-CAM: Visual explanations from deep networks via gradient-based localization,” in Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp. 618–626, 2017. [CrossRef]
- Y. P. Loh and C. S. Chan, “Getting to Know Low-light Images with The Exclusively Dark Dataset,” Computer Vision and Image Understanding, vol. 178, pp. 30–42, 2019. [CrossRef]
- Y. Sun, B. Cao, P. Zhu, and Q. Hu, “Drone-based RGB-Infrared Cross-Modality Vehicle Detection via Uncertainty-Aware Learning,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 32, no. 10, pp. 6700–6713, 2022. [CrossRef]
Comparison of different backbone networks on the ExDark dataset.

| Backbone | mAP@0.5/% | mAP/% | Params/M | FLOPs/G |
|---|---|---|---|---|
| EfficientViT[41] | 68.5 | 43.1 | 7.98 | 19.0 |
| RepViT[42] | 69.3 | 43.9 | 10.14 | 23.5 |
| HGNetV2 | 69.7 | 44.6 | 7.61 | 18.9 |
| MobileNetV4[43] | 66.3 | 41.9 | 9.53 | 27.8 |
| StarNet[44] | 65.8 | 40.1 | 8.63 | 17.6 |
| ER-HGNetV2 | 72.6 | 46.5 | 7.17 | 18.3 |
Comparison with the YOLOv11 model family on the ExDark dataset.

| Models | mAP@0.5/% | mAP/% | Params/M | FLOPs/G |
|---|---|---|---|---|
| YOLOv11n | 67.6 | 42.2 | 2.6 | 6.3 |
| YOLOv11s | 71.4 | 45.7 | 9.4 | 21.3 |
| YOLOv11m | 73.2 | 47.7 | 20.0 | 67.7 |
| YOLOv11l | 74.6 | 48.9 | 25.2 | 86.6 |
| YOLOv11x | 75.7 | 49.7 | 56.8 | 194.5 |
| ELS-YOLO | 74.3 | 48.5 | 4.6 | 15.0 |
Results of LAMP channel pruning on ELS-YOLO at different pruning ratios on the ExDark dataset (the ratio denotes the FLOPs compression factor).

| Models | mAP@0.5/% | mAP/% | Params/M | FLOPs/G |
|---|---|---|---|---|
| ELS-YOLO | 74.3 | 48.5 | 4.6 | 15.0 |
| ELS-YOLO(ratio=1.33) | 74.3 | 48.4 | 2.4 | 11.2 |
| ELS-YOLO(ratio=2.0) | 74.2 | 48.1 | 1.3 | 7.4 |
| ELS-YOLO(ratio=4.0) | 62.4 | 37.5 | 0.5 | 3.7 |
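The pruning ratios above follow layer-adaptive magnitude pruning (LAMP; Lee et al., ICLR 2021), cited in the references. For reference, below is a minimal PyTorch sketch of the LAMP importance score only; it is not the authors' pruning pipeline, and the function name and example tensor are illustrative.

```python
import torch

def lamp_scores(weight: torch.Tensor) -> torch.Tensor:
    """LAMP score of each weight (Lee et al., ICLR 2021): the squared weight
    divided by the sum of all squared weights in the layer that are at least
    as large as it."""
    w2 = weight.detach().flatten().pow(2)
    sorted_w2, order = torch.sort(w2, descending=True)
    # Cumulative sum over descending values = mass of weights >= current one.
    scores_sorted = sorted_w2 / torch.cumsum(sorted_w2, dim=0)
    scores = torch.empty_like(w2)
    scores[order] = scores_sorted  # scatter back to original positions
    return scores.view_as(weight)

# Illustration: threshold one layer's weights by LAMP score.
# (LAMP actually ranks these scores globally across all layers.)
w = torch.randn(16, 8, 3, 3)
scores = lamp_scores(w)
threshold = torch.quantile(scores.flatten(), 0.5)  # keep the top 50%
mask = (scores >= threshold).float()
```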
Ablation experiments of ELS-YOLO on the ExDark dataset.

| Models | P/% | R/% | mAP@0.5/% | mAP/% | Params/M | FLOPs/G |
|---|---|---|---|---|---|---|
| YOLOv11s (baseline) | 78.7 | 60.5 | 71.4 | 45.7 | 9.4 | 21.3 |
| +ER-HGNetV2 | 79.8 | 63.1 | 72.6 | 46.5 | 7.6 | 18.3 |
| +ER-HGNetV2+LFSPN | 78.5 | 65.4 | 73.8 | 47.9 | 7.4 | 18.1 |
| +ER-HGNetV2+LFSPN+SCSHead | 79.2 | 65.8 | 74.3 | 48.5 | 4.6 | 15.0 |
Comparison with other baseline methods on the ExDark dataset.

| Models | P/% | R/% | mAP@0.5/% | mAP/% | Params/M | FLOPs/G |
|---|---|---|---|---|---|---|
| YOLOv8n | 70.2 | 59.6 | 65.7 | 41.1 | 3.0 | 8.1 |
| YOLOv8s | 73.9 | 62.7 | 70.4 | 44.3 | 11.1 | 28.5 |
| YOLOv9t | 74.0 | 56.7 | 65.2 | 40.8 | 2.0 | 7.6 |
| YOLOv9s | 74.1 | 62.1 | 69.8 | 44.8 | 7.2 | 26.8 |
| YOLOv10n | 71.8 | 58.1 | 65.0 | 40.5 | 2.7 | 8.2 |
| YOLOv10s | 77.2 | 60.2 | 69.0 | 43.8 | 8.1 | 24.5 |
| Faster R-CNN | 67.4 | 52.6 | 58.9 | 35.2 | 68.9 | 80.2 |
| RetinaNet | 66.3 | 50.7 | 57.6 | 33.9 | 41.2 | 78.2 |
| DETR | 71.9 | 57.3 | 63.8 | 39.7 | 40.8 | 186.2 |
| RT-DETR-r50 | 75.4 | 61.5 | 67.1 | 42.2 | 41.9 | 125.7 |
| RT-DETR-L | 73.1 | 58.1 | 64.6 | 39.9 | 32.0 | 103.5 |
| ELS-YOLO | 79.2 | 65.8 | 74.3 | 48.5 | 4.6 | 15.0 |
Comparison results on the DroneVehicle dataset.

| Models | P/% | R/% | mAP@0.5/% | mAP/% |
|---|---|---|---|---|
| YOLOv11n | 58.4 | 58.9 | 61.7 | 38.6 |
| YOLOv11s | 68.2 | 63.7 | 67.2 | 42.9 |
| RT-DETR-r50 | 65.7 | 63.2 | 66.7 | 41.2 |
| RT-DETR-L | 67.9 | 66.4 | 68.1 | 43.3 |
| ELS-YOLO | 68.3 | 67.5 | 68.7 | 44.5 |
| ELS-YOLO (ratio=2.0) | 68.2 | 67.3 | 68.5 | 44.2 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).