Version 1
Received: 19 January 2024 / Approved: 23 January 2024 / Online: 23 January 2024 (09:19:12 CET)
How to cite:
Yang, J.; Zhang, Q.; Chen, Y.; Luan, S. Pedestrian Detection in Aerial Image Based on Convolutional Neural Network with Attention Mechanism and Multi-scale Prediction. Preprints 2024, 2024011672. https://doi.org/10.20944/preprints202401.1672.v1
APA Style
Yang, J., Zhang, Q., Chen, Y., & Luan, S. (2024). Pedestrian Detection in Aerial Image Based on Convolutional Neural Network with Attention Mechanism and Multi-scale Prediction. Preprints. https://doi.org/10.20944/preprints202401.1672.v1
Chicago/Turabian Style
Yang, J., Q. Zhang, Yuhang Chen, and Sitao Luan. 2024. "Pedestrian Detection in Aerial Image Based on Convolutional Neural Network with Attention Mechanism and Multi-scale Prediction." Preprints. https://doi.org/10.20944/preprints202401.1672.v1
Abstract
Pedestrian detection plays a significant role in intelligent systems such as intelligent traffic and monitoring. Traditional machine learning methods for pedestrian detection have shown various drawbacks, e.g., low accuracy and slow speed. Convolutional Neural Network (CNN) based object detection algorithms have demonstrated remarkable advantages in the field of pedestrian detection. However, mainstream CNNs still suffer from slow speed and low detection accuracy, especially on small and occluded targets seen from an aerial perspective. In this paper, we propose the Multi-Scale Attention YOLO (MSA-YOLO) detection algorithm to address these issues. MSA-YOLO includes a Squeeze, Excitation and Cross Stage Partial (SECSP) channel attention module that lets the CNN extract richer pedestrian features with a small number of extra parameters. It also contains a multi-scale prediction module that captures information across different pedestrian scales, which allows small objects to be recognized with higher accuracy and significantly reduces missed detections. To sufficiently evaluate our proposed model, we manually collect and annotate a new benchmark dataset, the Luoyang Pedestrian Dataset (available for download through this anonymous link: https://drive.google.com/drive/folders/13po1FX7Qk5RDgb60-dzOi74cq_kqDJtM?usp=sharing), which has many more sample annotations, features, scenes and image view angles than existing benchmark datasets. In addition, the images in our dataset have higher resolution than those in most benchmark pedestrian detection datasets, which provides more detailed pedestrian features and can thus improve model performance. When tested on the Luoyang Pedestrian Dataset, our proposed MSA-YOLO algorithm significantly outperforms the most commonly used baseline models at almost the same model size, which shows the efficiency of our proposed model. (The code and new dataset will be released to the public later.)
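To make the idea of channel attention concrete, the following is a minimal sketch of a generic squeeze-and-excitation step in plain Python. It is not the authors' SECSP module: the Cross Stage Partial part is omitted, and the function name `se_channel_attention`, the weight matrices `w_reduce` and `w_expand`, and the toy input are all hypothetical choices made for illustration. In a real network the weights would be learned.

```python
import math


def sigmoid(x):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def se_channel_attention(feature_map, w_reduce, w_expand):
    """Generic squeeze-and-excitation over a C x H x W nested list.

    feature_map: list of C channels, each a list of H rows of W floats.
    w_reduce:    (C // r) x C matrix for the bottleneck layer (assumed given).
    w_expand:    C x (C // r) matrix restoring the channel count (assumed given).
    Returns the rescaled feature map and the per-channel gates.
    """
    # Squeeze: global average pooling reduces each channel to one number.
    squeezed = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]
    # Excitation: bottleneck layer with ReLU, then expansion with sigmoid.
    hidden = [
        max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w_reduce
    ]
    gates = [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w_expand]
    # Scale: multiply every value in a channel by that channel's gate.
    scaled = [
        [[gate * value for value in row] for row in channel]
        for gate, channel in zip(gates, feature_map)
    ]
    return scaled, gates


# Toy input: 4 channels of size 2 x 2, with a reduction ratio of 2.
features = [[[float(c + 1)] * 2 for _ in range(2)] for c in range(4)]
w_reduce = [[0.5, -0.25, 0.1, 0.2], [0.0, 0.3, -0.4, 0.1]]
w_expand = [[1.0, -1.0], [0.5, 0.5], [-0.5, 0.2], [0.0, 1.0]]
scaled, gates = se_channel_attention(features, w_reduce, w_expand)
```

The extra parameters are only the two small weight matrices, which is consistent with the abstract's remark that channel attention adds few parameters; how SECSP combines this with a Cross Stage Partial structure is not specified here.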
Computer Science and Mathematics, Artificial Intelligence and Machine Learning
Copyright:
This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.