Submitted: 17 April 2025
Posted: 18 April 2025
Abstract
Keywords:
1. Introduction
- A lightweight MSCDNet (Multi-Scale Context Detail Network) architecture that delivers accurate detection within the tight computational budgets of resource-constrained environments.
- A Multi-Scale Fusion Module (MSFM) that addresses the challenge of detecting targets with significant dimensional variations, camouflage, and partial occlusion.
- A Context Merge Module (CMM) that overcomes the difficulty of integrating features from different scales for comprehensive target representation.
- A Detail Enhance Module (DEM) that preserves the critical edge and texture details needed to distinguish camouflaged targets in complex environments. (An illustrative code sketch of how these three modules might compose follows this list.)
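The paper's full module definitions appear in Section 3; as a rough illustration of how the three pieces could fit together, the PyTorch sketch below wires a hypothetical MSFM, CMM, and DEM across two feature scales. Every design choice here — the parallel depthwise kernels, the nearest-neighbour upsampling, the average-pool edge residual, and all names and shapes — is an assumption made for illustration, not the authors' implementation.

```python
# Hedged sketch of the three MSCDNet modules named above (PyTorch).
# The layer designs are assumptions chosen to match the stated goals
# (multi-scale fusion, cross-scale context merging, detail preservation),
# not the paper's actual architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MSFM(nn.Module):
    """Multi-Scale Fusion Module (assumed): parallel depthwise convs at
    several kernel sizes, fused by a pointwise conv, with a residual."""
    def __init__(self, channels: int, kernel_sizes=(3, 5, 7)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, k, padding=k // 2, groups=channels)
            for k in kernel_sizes
        )
        self.fuse = nn.Conv2d(channels * len(kernel_sizes), channels, 1)

    def forward(self, x):
        return x + self.fuse(torch.cat([b(x) for b in self.branches], dim=1))


class CMM(nn.Module):
    """Context Merge Module (assumed): upsample a coarse, semantically rich
    map to a finer map's resolution, concatenate, and merge with a 1x1 conv."""
    def __init__(self, shallow_ch: int, deep_ch: int, out_ch: int):
        super().__init__()
        self.merge = nn.Conv2d(shallow_ch + deep_ch, out_ch, 1)

    def forward(self, shallow, deep):
        deep = F.interpolate(deep, size=shallow.shape[-2:], mode="nearest")
        return self.merge(torch.cat([shallow, deep], dim=1))


class DEM(nn.Module):
    """Detail Enhance Module (assumed): re-inject high-frequency detail as
    the residual between the input and a locally smoothed copy of it."""
    def __init__(self, channels: int):
        super().__init__()
        self.proj = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        detail = x - F.avg_pool2d(x, 3, stride=1, padding=1)  # edges/texture
        return x + self.proj(detail)


if __name__ == "__main__":
    shallow = torch.randn(1, 64, 80, 80)   # e.g. stride-8 feature map
    deep = torch.randn(1, 128, 40, 40)     # e.g. stride-16 feature map
    out = DEM(64)(CMM(64, 128, 64)(MSFM(64)(shallow), deep))
    print(out.shape)                       # torch.Size([1, 64, 80, 80])
```

The residual form of MSFM and DEM keeps both modules drop-in replaceable inside a YOLO-style backbone or neck, which is consistent with the lightweight, plug-in framing of the contributions above.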
2. Related Work
2.1. Traditional Military Target Detection Methods
2.2. General Deep Learning Methods for Military Target Detection
2.3. Deep Learning Methods for Specific Target Detection
2.4. Key Challenges of Target Detection in Resource-Constrained Environments
3. Methodology
3.1. Overview of Model
3.2. Multi-Scale Fusion Module
3.3. Context Merge Module
3.4. Detail Enhance Module
4. Experiments
4.1. Experimental Details and Evaluation Criteria
4.2. Datasets
4.3. Ablation Study
4.4. Comparison with State-of-the-Arts
4.5. Generalization Experiments
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Jia, L.; Pang, W. Overview of Battlefield Debris Data Fusion Technology for Situation Awareness. In Proceedings of the 2024 Asia-Pacific Conference on Software Engineering, Social Network Analysis and Intelligent Computing (SSAIC), 2024; pp. 474–478.
- Dehghan, M.; Khosravian, E. A Review of Cognitive UAVs: AI-Driven Situation Awareness for Enhanced Operations. 2024, 2, 54–65.
- Tiwari, K.; Arora, M.; Singh, D. An assessment of independent component analysis for detection of military targets from hyperspectral images. Int. J. Appl. Earth Obs. Geoinf. 2011, 13, 730–740.
- Palm, H.C.; Ajer, H.; Haavardsholm, T.V. Detection of military objects in LADAR images. 2008.
- Deveci, M.; Kuvvetli, Y.; Akyurt, İ.Z. Survey on military operations of fuzzy set theory and its applications. J. Nav. Sci. Eng. 2020, 16, 117–141.
- Riedl, J.L. CCD Sensor Array and Microprocessor Application to Military Missile Tracking. In Modern Utilization of Infrared Technology II; SPIE, 1976; Volume 95, pp. 148–154.
- Schaber, G.G. SAR Studies in the Yuma Desert, Arizona: Sand Penetration, Geology, and the Detection of Military Ordnance Debris. Remote Sens. Environ. 1999, 67, 320–347.
- Lv, J.; Zhu, D.; Geng, Z.; Han, S.; Wang, Y.; Yang, W.; Ye, Z.; Zhou, T. Recognition of Deformation Military Targets in the Complex Scenes via MiniSAR Submeter Images With FASAR-Net. IEEE Trans. Geosci. Remote Sens. 2023, 61, 1–19.
- Tiwari, K.C.; Arora, M.K.; Singh, D.P.; Yadav, D.S. Military target detection using spectrally modeled algorithms and independent component analysis. Opt. Eng. 2013, 52, 026402.
- Iandola, F.N.; Han, S.; Moskewicz, M.W.; Ashraf, K.; Dally, W.J.; Keutzer, K. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size. arXiv 2016, arXiv:1602.07360.
- Girshick, R. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, 2015; pp. 1440–1448.
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149.
- He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, 2017; pp. 2961–2969.
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.-Y.; Berg, A.C. SSD: Single Shot MultiBox Detector. In Proceedings of Computer Vision—ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; Springer, 2016; pp. 21–37.
- Hussain, M. YOLO-v1 to YOLO-v8, the Rise of YOLO and Its Complementary Nature toward Digital Manufacturing and Industrial Defect Detection. Machines 2023, 11, 677.
- Sun, Y.; Liu, Z.; Todorovic, S.; Li, J. Adaptive boosting for SAR automatic target recognition. IEEE Trans. Aerosp. Electron. Syst. 2007, 43, 112–125.
- Zhang, L.; Shi, Z.; Wu, J. A Hierarchical Oil Tank Detector With Deep Surrounding Features for High-Resolution Optical Satellite Imagery. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2015, 8, 4895–4909.
- Zhang, L.; Chu, R.; Xiang, S.; Liao, S.; Li, S.Z. Face detection based on multi-block LBP representation. In Proceedings of Advances in Biometrics: International Conference, ICB 2007, Seoul, Korea, 27–29 August 2007; Springer: Berlin/Heidelberg, 2007; pp. 11–18.
- Pei, J.; Huang, Y.; Sun, Z.; Zhang, Y.; Yang, J.; Yeo, T.-S. Multiview Synthetic Aperture Radar Automatic Target Recognition Optimization: Modeling and Implementation. IEEE Trans. Geosci. Remote Sens. 2018, 56, 6425–6439.
- Che, J.; Fang, L.; Zhong, Z.; Su, X.; Ma, Q.; Yu, G. A survey of automatic target recognition technology based on multi-source data fusion. IET Conf. Proc. 2025, 2024, 1592–1599.
- Li, Y.; Luo, Y.; Zheng, Y.; Liu, G.; Gong, J. Research on Target Image Classification in Low-Light Night Vision. Entropy 2024, 26, 882.
- Salmon, P.M.; Lenné, M.G.; Triggs, T.; Goode, N.; Cornelissen, M.; Demczuk, V. The effects of motion on in-vehicle touch screen system operation: A battle management system case study. Transp. Res. Part F Traffic Psychol. Behav. 2011, 14, 494–503.
- Bajracharya, M.; Moghaddam, B.; Howard, A.; Brennan, S.; Matthies, L.H. A Fast Stereo-based System for Detecting and Tracking Pedestrians from a Moving Vehicle. Int. J. Robot. Res. 2009, 28, 1466–1485.
- Chaves, S.M. Using Kalman filtering to improve a low-cost GPS-based collision warning system for vehicle convoys. 2010.
- Boult, T.; Micheals, R.; Gao, X.; Eckmann, M. Into the woods: Visual surveillance of noncooperative and camouflaged targets in complex outdoor settings. Proc. IEEE 2001, 89, 1382–1402.
- Zhao, Q. Aboveground Storage Tank Detection Using Faster R-CNN and High-Resolution Aerial Imagery. Master's Thesis, Duke University, 2021.
- Peng, S. Multi-object extraction technology for complex background based on faster regions-CNN algorithm in the context of artificial intelligence. Serv. Oriented Comput. Appl. 2024, 19, 15–27.
- Naz, P.; Hengy, S.; Hamery, P. Soldier detection using unattended acoustic and seismic sensors. In Ground/Air Multisensor Interoperability, Integration, and Networking for Persistent ISR III; SPIE, 2012; Volume 8389, pp. 183–194.
- Xu, D.; Wu, Y. Improved YOLO-V3 with DenseNet for Multi-Scale Remote Sensing Target Detection. Sensors 2020, 20, 4276.
- Wang, H.; Qian, H.; Feng, S.; Wang, W. L-SSD: Lightweight SSD target detection based on depth-separable convolution. J. Real-Time Image Process. 2024, 21, 1–15.
- Wang, S.; Du, Y.; Zhao, S.; Gan, L. Multi-Scale Infrared Military Target Detection Based on 3X-FPN Feature Fusion Network. IEEE Access 2023, 11, 141585–141597.
- Liu, J.; Jia, R.; Li, W.; Ma, F.; Abdullah, H.M.; Ma, H.; Mohamed, M.A. High precision detection algorithm based on improved RetinaNet for defect recognition of transmission lines. Energy Rep. 2020, 6, 2430–2440.
- Pushkarenko, Y.; Zaslavskyi, V. Research on the state of areas in Ukraine affected by military actions based on remote sensing data and deep learning architectures. Radioelectron. Comput. Syst. 2024, 2024, 5–18.
- Li, T.; Wang, H.; Li, G.; Liu, S.; Tang, L. SwinF: Swin Transformer with feature fusion in target detection. J. Phys. Conf. Ser. 2022, 2284, 012027.
- Zhuang, X.; Li, D.; Wang, Y.; Li, K. Military target detection method based on EfficientDet and Generative Adversarial Network. Eng. Appl. Artif. Intell. 2024, 132.
- Sun, Y.; Wang, J.; You, Y.; Yu, Z.; Bian, S.; Wang, E.; Wu, W. YOLO-E: A lightweight object detection algorithm for military targets. Signal Image Video Process. 2025, 19, 1–12.
- Jani, M.; Fayyad, J.; Al-Younes, Y.; Najjaran, H. Model compression methods for YOLOv5: A review. arXiv 2023, arXiv:2307.11904.
- Zhang, W.; Jiao, L.; Liu, X.; Liu, J. Multi-scale feature fusion network for object detection in VHR optical remote sensing images. In Proceedings of IGARSS 2019—2019 IEEE International Geoscience and Remote Sensing Symposium; IEEE, 2019; pp. 330–333.
- Fan, L.; Wang, H.; Yang, Q.; Chen, X.; Deng, B.; Zeng, Y. Fast Detection and Reconstruction of Tank Barrels Based on Component Prior and Deep Neural Network in the Terahertz Regime. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–17.
- Ma, C.; Zhang, Y.; Guo, J.; Hu, Y.; Geng, X.; Li, F.; Lei, B.; Ding, C. End-to-End Method With Transformer for 3-D Detection of Oil Tank From Single SAR Image. IEEE Trans. Geosci. Remote Sens. 2021, 60, 1–19.
- Song, Z.; Kang, X.; Wei, X.; Dian, R.; Liu, J.; Li, S. Multi-granularity Context Perception Network for Open Set Recognition of Camouflaged Objects. IEEE Trans. Multimedia 2024, 1–14.
- Naeem, W.; Sutton, R.; Xu, T. An integrated multi-sensor data fusion algorithm and autopilot implementation in an uninhabited surface craft. Ocean Eng. 2012, 39, 43–52.
- Lv, J.; Zhu, D.; Geng, Z.; Han, S.; Wang, Y.; Yang, W.; Ye, Z.; Zhou, T. Recognition of Deformation Military Targets in the Complex Scenes via MiniSAR Submeter Images With FASAR-Net. IEEE Trans. Geosci. Remote Sens. 2023, 61, 1–19.
- Wu, B.; Zhou, J. Video-Based Martial Arts Combat Action Recognition and Position Detection Using Deep Learning. IEEE Access 2024, 12, 161357–161374.
- Choudhary, S. Real time pixelated camouflage texture generation. Doctoral Dissertation, School of Computer Science, UPES, Dehradun, 2023.
- Barnawi, A.; Budhiraja, I.; Kumar, K.; Kumar, N.; Alzahrani, B.; Almansour, A.; Noor, A. A comprehensive review on landmine detection using deep learning techniques in 5G environment: Open issues and challenges. Neural Comput. Appl. 2022, 34, 21657–21676.
- Anzer, G.; Bauer, P.; Brefeld, U.; Faßmeyer, D. Detection of tactical patterns using semi-supervised graph neural networks. In Proceedings of the 16th MIT Sloan Sports Analytics Conference, 2022; pp. 1–15.
- Wang, Q.; Fu, M.; Wang, J.; Sun, L.; Huang, R.; Li, X.; Jiang, Z.; Huang, Y.; Jiang, C. Free-walking: Pedestrian inertial navigation based on dual foot-mounted IMU. Def. Technol. 2023, 33, 573–587.
- Jocher, G. YOLO11. 2024. Available online: https://github.com/ultralytics/ultralytics/tree/main.
- Carion, N.; Massa, F.; Synnaeve, G.; Usunier, N.; Kirillov, A.; Zagoruyko, S. End-to-end object detection with transformers. In Computer Vision—ECCV 2020; Springer International Publishing: Cham, Switzerland, 2020; pp. 213–229.
- Feng, C.; Zhong, Y.; Gao, Y.; Scott, M.R.; Huang, W. TOOD: Task-aligned one-stage object detection. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; pp. 3490–3499.
- Lyu, C.; Zhang, W.; Huang, H.; Zhou, Y.; Wang, Y.; Liu, Y.; Zhang, S.; Chen, K. RTMDet: An empirical study of designing real-time object detectors. arXiv 2022, arXiv:2212.07784.
- Peng, Y.; Li, H.; Wu, P.; Zhang, Y.; Sun, X.; Wu, F. D-FINE: Redefine regression task in DETRs as fine-grained distribution refinement. arXiv 2024, arXiv:2410.13842.
- Huang, S.; Lu, Z.; Cun, X.; Yu, Y.; Zhou, X.; Shen, X. DEIM: DETR with Improved Matching for Fast Convergence. arXiv 2024, arXiv:2412.04234.
- Jocher, G.; Stoken, A.; Borovec, J.; Changyu, L.; Hogan, A.; Diaconu, L.; Dave, P. ultralytics/yolov5: v3.0. Zenodo, 2020.
- Jocher, G. YOLOv8. 2023. Available online: https://github.com/ultralytics/ultralytics/tree/main.
- Wang, A.; Chen, H.; Liu, L.; Chen, K.; Lin, Z.; Han, J.; Ding, G. YOLOv10: Real-time end-to-end object detection. arXiv 2024, arXiv:2405.14458.
- Ma, X.; Dai, X.; Bai, Y.; Wang, Y.; Fu, Y. Rewrite the stars. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024; pp. 5694–5703.
- Yu, W.; Zhou, P.; Yan, S.; Wang, X. InceptionNeXt: When Inception meets ConvNeXt. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024; pp. 5672–5683.
- Feng, Y.; Huang, J.; Du, S.; Ying, S.; Yong, J.-H.; Li, Y.; Ding, G.; Ji, R.; Gao, Y. Hyper-YOLO: When Visual Object Detection Meets Hypergraph Computation. IEEE Trans. Pattern Anal. Mach. Intell. 2024, 1–14.
- Li, H.; Li, J.; Wei, H.; Liu, Z.; Zhan, Z.; Ren, Q. Slim-neck by GSConv: A better design paradigm of detector architectures for autonomous vehicles. arXiv 2022, arXiv:2206.02424.
- Chen, J.; Mai, H.; Luo, L.; Chen, X.; Wu, K. Effective feature fusion network in BIFPN for small object detection. In Proceedings of the 2021 IEEE International Conference on Image Processing (ICIP); IEEE, 2021; pp. 699–703.
- Yang, Z.; Guan, Q.; Zhao, K.; Yang, J.; Xu, X.; Long, H.; Tang, Y. Multi-branch Auxiliary Fusion YOLO with Re-parameterization Heterogeneous Convolutional for Accurate Object Detection. In Chinese Conference on Pattern Recognition and Computer Vision (PRCV); Springer Nature Singapore: Singapore, 2024; pp. 492–505.
- Du, D.; Zhu, P.; Wen, L.; Bian, X.; Lin, H.; Hu, Q.; Zhang, L. VisDrone-DET2019: The vision meets drone object detection in image challenge results. In Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, 2019.
- Yu, F.; Chen, H.; Wang, X.; Xian, W.; Chen, Y.; Liu, F.; Darrell, T. BDD100K: A diverse driving dataset for heterogeneous multitask learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020; pp. 2636–2645.
| YOLO11n | MSFM | CMM | DEM | Param (M) | FLOPs (G) | P (%) | R (%) | mAP50 (%) | mAP50-95 (%) |
|---|---|---|---|---|---|---|---|---|---|
| ✓ | | | | 2.58 | 6.3 | 81.8 | 66.5 | 71.5 | 38.2 |
| ✓ | ✓ | | | 2.53 | 6.3 | 84.9 | 65.8 | 72.5 | 39.1 |
| ✓ | | ✓ | | 2.59 | 6.3 | 82.4 | 67.8 | 72.6 | 38.6 |
| ✓ | | | ✓ | 2.26 | 6.0 | 83.4 | 67.6 | 72.5 | 39.6 |
| ✓ | ✓ | ✓ | | 2.54 | 6.3 | 85.5 | 66.7 | 73.7 | 39.5 |
| ✓ | ✓ | | ✓ | 2.21 | 6.0 | 84.1 | 67.9 | 73.6 | 38.9 |
| ✓ | | ✓ | ✓ | 2.26 | 6.0 | 83.7 | 67.0 | 73.2 | 39.6 |
| ✓ | ✓ | ✓ | ✓ | 2.22 | 6.0 | 86.1 | 68.1 | 74.7 | 40.1 |
| Model | Precision (%) | Recall (%) | mAP50 (%) | mAP50-95 (%) | FLOPs (G) | Param (M) |
|---|---|---|---|---|---|---|
| SSD | 76.2 | 58.3 | 63.7 | 31.8 | 31.2 | 27.1 |
| DETR | 82.4 | 61.5 | 67.9 | 35.2 | 95.2 | 44.0 |
| TOOD | 83.5 | 63.7 | 70.2 | 37.1 | 199 | 32.04 |
| RTMDet-Tiny | 83.2 | 64.3 | 70.5 | 37.3 | 8.03 | 4.87 |
| DFINE-n | 82.8 | 64.5 | 70.6 | 37.5 | 7.12 | 3.73 |
| DEIM-n | 83.0 | 64.6 | 70.7 | 37.5 | 7.12 | 3.73 |
| YOLOv5n | 85.3 | 64.7 | 70.8 | 37.6 | 5.80 | 2.18 |
| YOLOv8n | 83.0 | 64.9 | 71.4 | 37.8 | 8.1 | 3.0 |
| YOLOv10n | 83.9 | 63.8 | 70.8 | 38.3 | 8.2 | 2.69 |
| YOLO11n | 81.8 | 66.5 | 71.5 | 38.2 | 6.3 | 2.58 |
| Ours | 86.1 | 68.1 | 74.7 | 40.1 | 6.0 | 2.22 |
| Model | Precision (%) | Recall (%) | mAP50 (%) | mAP50-95 (%) | FLOPs (G) | Param (M) |
|---|---|---|---|---|---|---|
| C3k2 | 81.8 | 66.5 | 71.5 | 38.2 | 6.3 | 2.58 |
| C3k2-Star [58] | 82.2 | 61.5 | 68.3 | 35.5 | 6.4 | 2.47 |
| C3k2-IDWC [59] | 83.0 | 65.0 | 71.4 | 37.3 | 6.1 | 2.39 |
| MAN [60] | 83.2 | 62.2 | 70.2 | 37.1 | 8.4 | 3.77 |
| MSFM | 84.9 | 65.8 | 72.5 | 39.1 | 6.3 | 2.53 |
| Model | Precision (%) | Recall (%) | mAP50 (%) | mAP50-95 (%) | FLOPs (G) | Param (M) |
|---|---|---|---|---|---|---|
| Slim-Neck [61] | 81.8 | 65.2 | 70.1 | 37.0 | 5.9 | 2.57 |
| BiFPN [62] | 82.0 | 66.4 | 70.9 | 37.6 | 6.3 | 1.92 |
| MAFPN [63] | 84.0 | 65.1 | 70.6 | 37.5 | 7.1 | 2.69 |
| CMM | 82.4 | 67.8 | 72.6 | 38.6 | 6.3 | 2.59 |
| Dataset | Model | Precision (%) | Recall (%) | mAP50 (%) | mAP50-95 (%) |
|---|---|---|---|---|---|
| VisDrone | YOLO11n | 45.5 | 33.4 | 33.7 | 19.6 |
| | Ours | 45.7 | 35.1 | 34.8 | 20.1 |
| BDD100K | YOLO11n | 58.8 | 41.5 | 42.5 | 27.8 |
| | Ours | 61.1 | 40.0 | 43.6 | 29.0 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).