Preprint Article Version 1 Preserved in Portico This version is not peer-reviewed

Pedestrian Detection in Aerial Image Based on Convolutional Neural Network with Attention Mechanism and Multi-scale Prediction

Version 1 : Received: 19 January 2024 / Approved: 23 January 2024 / Online: 23 January 2024 (09:19:12 CET)

How to cite: Yang, J.; Zhang, Q.; Chen, Y.; Luan, S. Pedestrian Detection in Aerial Image Based on Convolutional Neural Network with Attention Mechanism and Multi-scale Prediction. Preprints 2024, 2024011672. https://doi.org/10.20944/preprints202401.1672.v1

Abstract

Pedestrian detection plays a significant role in intelligent systems such as intelligent traffic and monitoring. Traditional machine learning methods for pedestrian detection have various drawbacks, e.g., low accuracy and slow speed. Convolutional Neural Network (CNN) based object detection algorithms have demonstrated remarkable advantages in pedestrian detection. However, mainstream CNNs still suffer from slow speed and low detection accuracy, especially on small and occluded targets seen from an aerial perspective. In this paper, we propose the Multi-Scale Attention YOLO (MSA-YOLO) detection algorithm to address these issues. MSA-YOLO includes a Squeeze, Excitation and Cross Stage Partial (SECSP) channel attention module that lets the CNN extract richer pedestrian features with a small number of extra parameters. It also contains a multi-scale prediction module that captures information across different pedestrian scales, which allows small objects to be recognized with higher accuracy and significantly reduces missed detections. To evaluate the proposed model thoroughly, we manually collect and annotate a new benchmark dataset, the Luoyang Pedestrian Dataset (available through this anonymous link: https://drive.google.com/drive/folders/13po1FX7Qk5RDgb60-dzOi74cq_kqDJtM?usp=sharing), which has many more annotated samples, features, scenes and image view angles than existing benchmark datasets. In addition, the images in our dataset have higher resolution than those in most benchmark pedestrian detection datasets, which provides more detailed pedestrian features and can thus improve model performance. When tested on the Luoyang Pedestrian Dataset, MSA-YOLO significantly outperforms the most commonly used baseline models at almost the same model size, which shows the efficiency of the proposed model. (The code and new dataset will be released to the public later.)
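
For readers unfamiliar with channel attention, the following is a minimal sketch of a generic squeeze-and-excitation step in plain Python. It is not the authors' SECSP module, whose exact structure (including the Cross Stage Partial part) is not described in the abstract. The function names, the reduction ratio and the random weights are illustrative assumptions only.

```python
import math
import random


def squeeze(feature_map):
    """Global average pooling: reduce each H x W channel to one scalar."""
    return [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]


def dense(vector, weights, biases):
    """Fully connected layer; `weights` holds one row per output unit."""
    return [
        sum(w * v for w, v in zip(row, vector)) + b
        for row, b in zip(weights, biases)
    ]


def excite(pooled, w1, b1, w2, b2):
    """Bottleneck MLP (ReLU, then sigmoid) giving one gate in (0, 1) per channel."""
    hidden = [max(0.0, h) for h in dense(pooled, w1, b1)]
    return [1.0 / (1.0 + math.exp(-z)) for z in dense(hidden, w2, b2)]


def se_attention(feature_map, w1, b1, w2, b2):
    """Rescale every channel of a C x H x W feature map by its learned gate."""
    gates = excite(squeeze(feature_map), w1, b1, w2, b2)
    return [
        [[gate * value for value in row] for row in channel]
        for channel, gate in zip(feature_map, gates)
    ]


def random_se_weights(channels, reduction=2, seed=0):
    """Hypothetical initialiser; a trained model would learn these weights."""
    rng = random.Random(seed)
    hidden = max(1, channels // reduction)
    w1 = [[rng.uniform(-1, 1) for _ in range(channels)] for _ in range(hidden)]
    w2 = [[rng.uniform(-1, 1) for _ in range(hidden)] for _ in range(channels)]
    return w1, [0.0] * hidden, w2, [0.0] * channels


# Toy input: 2 channels, each 2 x 2.
feature_map = [[[1.0, 2.0], [3.0, 4.0]], [[0.0, 0.0], [0.0, 8.0]]]
attended = se_attention(feature_map, *random_se_weights(channels=2))
```

In this sketch the two small fully connected layers are the only additions, which is consistent with the abstract's remark that the attention module needs only a small number of extra parameters. The output has the same shape as the input, so such a block can be placed between existing convolutional stages.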

Keywords

pedestrian detection; aerial image; attention mechanism; multi-scale prediction; convolutional neural network; new benchmark dataset

Subject

Computer Science and Mathematics, Artificial Intelligence and Machine Learning
