Preprint · Article · Version 1 · Preserved in Portico · This version is not peer-reviewed

YOLOAX: YOLOX with Multi-Dimensional Attention For Real-Time Object Detection

Version 1 : Received: 21 May 2024 / Approved: 22 May 2024 / Online: 22 May 2024 (10:18:21 CEST)

How to cite: Chen, J.; Xu, K.; Ning, Y.; Xu, Z. YOLOAX: YOLOX with Multi-Dimensional Attention For Real-Time Object Detection. Preprints 2024, 2024051433. https://doi.org/10.20944/preprints202405.1433.v1

Abstract

Real-time object detection remains a pivotal topic in computer vision, and balancing the accuracy and speed of object detectors poses a formidable challenge for both academic researchers and industry practitioners. While recent transformer-based models have showcased the prowess of the attention mechanism, delivering remarkable performance improvements over CNNs, their computational demands can hinder their effectiveness in real-time detection tasks. In this paper, we adopt YOLOX as a robust starting point and introduce a series of effective enhancements, culminating in a new high-performance detector named YOLOAX. To further exploit the power of the attention mechanism, we devise multi-dimensional attention-based modules that activate CNNs, emphasizing the most pertinent regions and bolstering the capacity to learn the most informative image representations from feature maps. Moreover, we introduce a novel leading label assignment strategy called STA, along with a new loss function named GEIOU Loss, to further refine our detector's performance. We provide extensive ablation studies on the COCO and PASCAL VOC 2012 detection datasets to validate our proposed methods. Remarkably, our YOLOAX is trained solely on the MS-COCO dataset from scratch, without leveraging any prior knowledge, and achieves an impressive 54.2% AP on the COCO 2017 test set while maintaining a real-time speed of 72.3 fps, surpassing YOLOX by a significant margin of 3.0% AP. Our source code and pre-trained models are openly accessible at https://github.com/KejianXu/yoloax.
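The abstract does not specify the internals of the paper's multi-dimensional attention modules, but the general idea of gating a feature map along more than one dimension can be sketched generically. The code below is a minimal, hypothetical illustration (NumPy only; function names and the channel-then-spatial ordering are assumptions, not the paper's actual design): a channel gate reweights each feature channel, then a spatial gate reweights each location.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(x):
    # x: (C, H, W); squeeze spatial dims into one descriptor per channel,
    # then gate each channel with a value in (0, 1)
    w = sigmoid(x.mean(axis=(1, 2)))          # shape (C,)
    return x * w[:, None, None]

def spatial_attention(x):
    # pool across channels into one descriptor per spatial location,
    # then gate each location, emphasizing the most pertinent regions
    w = sigmoid(x.mean(axis=0))               # shape (H, W)
    return x * w[None, :, :]

def multi_dim_attention(x):
    # combine attention over two dimensions: channels first, then space
    return spatial_attention(channel_attention(x))

feat = np.random.randn(8, 4, 4)               # toy feature map (C=8, H=W=4)
out = multi_dim_attention(feat)               # same shape, re-weighted
```

In a real detector these gates would be learned (small conv/MLP layers rather than parameter-free pooling), but the sketch shows the multi-dimensional gating pattern the abstract alludes to.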

Keywords

real-time object detection; information theory; attention mechanism; region boosting; label assignment; information loss

Subject

Computer Science and Mathematics, Computer Vision and Graphics


