Submitted: 06 February 2025
Posted: 07 February 2025
Abstract
To enable accurate and efficient real-time detection of rail fastener defects under resource-constrained environments, we propose DP-YOLO, an advanced lightweight algorithm based on YOLOv5s with four key optimizations. First, we design a Depthwise Separable Convolution Stage Partial (DSP) module that integrates depthwise separable convolution with a CSP residual connection strategy, reducing model parameters while enhancing recognition accuracy. Second, we introduce a Position-Sensitive Channel Attention (PSCA) mechanism, which calculates spatial statistics (mean and standard deviation) across height and width dimensions for each channel feature map. These statistics are multiplied across corresponding dimensions to generate channel-specific weights, enabling dynamic feature recalibration. Third, the Neck network adopts a GhostC3 structure, which reduces redundancy through linear operations, further minimizing computational costs. Fourth, to improve multi-scale adaptability, we replace the standard loss function with Alpha-IoU, enhancing model robustness. Experiments on the augmented Roboflow Universe Fastener-defect-detection Dataset demonstrate DP-YOLO’s effectiveness: it achieves 87.1% detection accuracy, surpassing the original YOLOv5s by 1.3% in mAP0.5 and 2.1% in mAP0.5:0.95. Additionally, the optimized architecture reduces parameters by 1.3% and computational load by 15.19%. These results validate DP-YOLO’s practical value for resource-efficient, high-precision defect detection in railway maintenance systems.
Keywords:
1. Introduction
2. The Basic Framework of YOLOv5
3. DP-YOLO Network Module
3.1. Design of the DSP Module Based on Depthwise Separable Convolution
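The parameter saving that motivates building the DSP module on depthwise separable convolution can be checked with a quick count (a generic back-of-the-envelope calculation, not taken from the paper; `conv_params` is an illustrative helper): a standard k×k convolution needs k·k·C_in·C_out weights, while the depthwise separable factorization needs only k·k·C_in (depthwise) plus C_in·C_out (pointwise).

```python
def conv_params(c_in: int, c_out: int, k: int = 3) -> tuple:
    """Weight counts (ignoring biases) for a standard k x k convolution
    versus its depthwise separable factorization."""
    standard = k * k * c_in * c_out   # one dense k x k kernel per output channel
    depthwise = k * k * c_in          # one k x k kernel per input channel
    pointwise = c_in * c_out          # 1 x 1 convolution mixing channels
    return standard, depthwise + pointwise

std, sep = conv_params(128, 128)
print(std, sep, round(std / sep, 1))  # 147456 17536 8.4
```

For a typical 3×3 layer with 128 input and output channels, the factorized form uses roughly 8× fewer weights, which is the saving the DSP module trades against the CSP residual connections that preserve accuracy.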
3.2. PSCA Attention Mechanism
- (1) 1D convolutional transformation: a 1D convolution (Conv1d) transforms the per-channel directional statistics (the height-wise and width-wise mean and standard-deviation maps) into intermediate feature maps.
- (2) Activation: an activation function (such as ReLU or SiLU) is applied to the intermediate feature maps to strengthen their non-linearity; here, r denotes the downsampling ratio, which controls the size of the module.
- (3) Fusion learning: the intermediate feature maps from the two directions are combined with the standard-deviation feature maps to generate the final channel weight map.
- (4) Channel weighting: the generated channel weight map reweights the original feature map, channel by channel, to produce the final output feature map.
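The four steps above can be sketched for a single sample as follows. This is a schematic NumPy version, not the paper's implementation: the learned Conv1d layers and the downsampling ratio r are replaced by a fixed 3-tap smoothing kernel, and the fusion rule is one plausible reading of the description.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def psca(x):
    """Schematic Position-Sensitive Channel Attention for one
    feature map x of shape (C, H, W)."""
    # Directional statistics per channel: pooled along width
    # (height profile) and along height (width profile).
    mu_h, sd_h = x.mean(axis=2), x.std(axis=2)   # each (C, H)
    mu_w, sd_w = x.mean(axis=1), x.std(axis=1)   # each (C, W)
    # Stand-in for the learned Conv1d transform: a fixed 3-tap
    # smoothing along the spatial axis.
    k = np.array([0.25, 0.5, 0.25])
    smooth = lambda s: np.stack([np.convolve(row, k, mode="same") for row in s])
    f_h = np.maximum(smooth(mu_h), 0)            # ReLU activation
    f_w = np.maximum(smooth(mu_w), 0)
    # Fuse the directional maps with the standard-deviation maps
    # into one weight per channel, squashed to (0, 1).
    w = sigmoid((f_h * sd_h).mean(axis=1) + (f_w * sd_w).mean(axis=1))
    # Recalibrate: broadcast the channel weights over H and W.
    return x * w[:, None, None], w
```

In a real network the smoothing kernel would be a trainable `Conv1d` whose channel count is reduced by the ratio r, but the data flow — directional statistics, transform, activation, fusion, channel reweighting — is the same.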
3.3. GhostC3 Module
3.4. Alpha-IoU Loss
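The Alpha-IoU family generalizes the IoU loss with a power parameter: L = 1 − IoU^α, where α > 1 up-weights high-IoU boxes and improves robustness across scales. A minimal sketch for axis-aligned boxes follows (not the YOLOv5 training code; the (x1, y1, x2, y2) box format and the default α = 3 are assumptions for illustration):

```python
def iou(a, b):
    # Boxes given as (x1, y1, x2, y2) corner coordinates.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def alpha_iou_loss(pred, target, alpha=3.0):
    # Power IoU loss: 1 - IoU^alpha. With alpha = 1 this reduces
    # to the plain IoU loss; larger alpha sharpens the gradient
    # for nearly-correct boxes.
    return 1.0 - iou(pred, target) ** alpha
```

Because the power is applied only to the IoU term, the same substitution can be made inside GIoU/DIoU/CIoU variants without changing the rest of the regression pipeline.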
4. Experiments and Analysis
4.1. Railway Track Fastener Defect Detection Dataset and Evaluation Criteria
4.2. Experimental Environment and Parameter Setting
4.3. Experimental Results and Analysis
4.3.1. Ablation Experiment
4.3.2. Comparison of Experimental Results of Different Algorithms
5. Conclusion
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Song, W.; Liao, B.; Ning, K.; Yan, X. Improved Real-Time Detection Transformer-Based Rail Fastener Defect Detection Algorithm. Mathematics 2024, 12, 3349. [CrossRef]
- Zhao, P.; Xu, B.P.; Yan, S.; et al. Scene text detection based on dual-path feature fusion. Control and Decision 2021, 36, 2179–2186.
- Girshick, R.; Donahue, J.; Darrell, T.; et al. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, 2014; pp. 580–587. [CrossRef]
- Girshick, R. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, 2015; pp. 1440–1448. [CrossRef]
- Ren, S.Q.; He, K.M.; Girshick, R.; et al. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 2017, 39, 1137–1149. [CrossRef]
- He, K.; Gkioxari, G.; Dollár, P.; et al. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Venice, 2017; pp. 2961–2969. [CrossRef]
- Cai, Z.W.; Vasconcelos, N. Cascade R-CNN: Delving into high quality object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, 2018; pp. 6154–6162. [CrossRef]
- Redmon, J.; Divvala, S.; Girshick, R.; et al. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, 2016; pp. 779–788. [CrossRef]
- Redmon, J.; Farhadi, A. YOLO9000: better, faster, stronger. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, 2017; pp. 6517–6525. [CrossRef]
- Liu, W.; Anguelov, D.; Erhan, D.; et al. SSD: Single shot MultiBox detector. In Proceedings of the European Conference on Computer Vision, Amsterdam, 2016; Vol. 9905, pp. 21–37. [CrossRef]
- Redmon, J.; Farhadi, A. YOLOv3: An incremental improvement. arXiv 2018, arXiv:1804.02767.
- Bochkovskiy, A.; Wang, C.Y.; Liao, H.Y.M. YOLOv4: Optimal speed and accuracy of object detection. arXiv 2020, arXiv:2004.10934. [CrossRef]
- Zhu, X.; Lyu, S.; Wang, X.; et al. TPH-YOLOv5: Improved YOLOv5 Based on Transformer Prediction Head for Object Detection on Drone-captured Scenarios. arXiv 2021. [CrossRef]
- Wei, F.; Zhou, J.P.; Tan, X.; et al. Lightweight YOLOv5 detection algorithm for low-altitude micro UAV. Journal of Optoelectronics·Laser 2024, 35, 641–649. [CrossRef]
- Chen, G.Y.; Wang, X.J.; Li, X.H. Lightweight YOLOv5 pedestrian detection algorithm based on pixel difference attention. Computer Engineering and Applications 2024, pp. 1–11.
- Zou, X.; Peng, T.; Zhou, Y. UAV-Based Human Detection With Visible-Thermal Fused YOLOv5 Network. IEEE Transactions on Industrial Informatics 2023, 99, 1–10.
- Roboflow. Fastener Defect Detection. https://universe.roboflow.com/learning-dvrz6/fastener-defect-detection, 2023.
- Wang, C.Y.; Liao, H.Y.M.; Wu, Y.H.; et al. CSPNet: a new backbone that can enhance learning capability of CNN. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Seattle, WA, USA, 2020; pp. 1571–1580. [CrossRef]
- Liu, S.; Qi, L.; Qin, H.F.; et al. Path aggregation network for instance segmentation. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 2018; pp. 8759–8768. [CrossRef]
- Chollet, F. Xception: Deep Learning with Depthwise Separable Convolutions. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017. [CrossRef]
- Liu, Y.; Shao, Z.; Teng, Y.; Hoffmann, N. NAM: Normalization-based Attention Module. arXiv 2021, arXiv:2111.12419. [CrossRef]
- Li, C.; Zhou, A.; Yao, M.; et al. Omni-dimensional dynamic convolution. arXiv 2022. https://arxiv.org/abs/2202.08576.
- Wang, Q.; Wu, B.; Zhu, P.; Li, P.; Zuo, W.; Hu, Q.; et al. ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks. In Proceedings of the Computer Vision and Pattern Recognition, 2020, pp. 11531–11539. [CrossRef]
- Hou, Q.; Zhou, D.; Feng, J. Coordinate Attention for Efficient Mobile Network Design. Proceedings - IEEE Computer Society Conference on Computer Vision and Pattern Recognition 2021, abs/2103.02907, 13713–13722. [CrossRef]
- Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. CBAM: Convolutional Block Attention Module. In Proceedings of the Computer Vision – ECCV 2018: 15th European Conference, Munich, Germany, 2018; pp. 3–19. [CrossRef]
- Hu, J.; Shen, L.; Sun, G. Squeeze-and-Excitation Networks. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2018. [CrossRef]
- He, J.; Erfani, S.; Ma, X.; et al. Alpha-IoU: A Family of Power Intersection over Union Losses for Bounding Box Regression. arXiv 2021. [CrossRef]
- Ge, Z.; Liu, S.; Wang, F.; et al. YOLOX: Exceeding YOLO Series in 2021. arXiv 2021. [CrossRef]
- Zhou, X.; Wang, D.; Krähenbühl, P. Objects as points. arXiv preprint arXiv:1904.07850 2019. [CrossRef]
- Wang, C.Y.; Bochkovskiy, A.; Liao, H.Y.M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023; pp. 7464–7475. [CrossRef]
- Gilroy, S.; Glavin, M.; Jones, E.; et al. An Objective Method for Pedestrian Occlusion Level Classification. arXiv 2022. [CrossRef]
| Attention mechanism | mAP0.5/% | Parameters/M | GFLOPs | FPS |
|---|---|---|---|---|
| NAM[23] | 85.2 | 7.06 | 15.8 | 309 |
| ECA[24] | 85.9 | 7.06 | 15.8 | 305 |
| CA[25] | 86.0 | 7.04 | 15.8 | 306 |
| CBAM[26] | 86.2 | 8.22 | 15.8 | 298 |
| SE[27] | 86.0 | 7.04 | 15.8 | 310 |
| PSCA | 86.4 | 7.04 | 15.8 | 315 |
| Algorithm | mAP0.5/% | mAP0.5:0.95/% |
|---|---|---|
| YOLOv5+IoU loss | 85.8 | 55.5 |
| YOLOv5+GIoU loss | 85.8 | 55.5 |
| YOLOv5+DIoU loss | 86.1 | 55.9 |
| YOLOv5+CIoU loss (default) | 86.1 | 55.9 |
| YOLOv5+SIoU loss | 86.2 | 56.1 |
| YOLOv5+Alpha-IoU loss | 86.2 | 56.8 |
| Model | PSCA | DSP | C3Ghost | Alpha-IoU | mAP0.5/% | mAP0.5:0.95/% | Parameters/M | GFLOPs | FPS | Model size (MB) |
|---|---|---|---|---|---|---|---|---|---|---|
| A | × | × | × | × | 85.8 | 55.5 | 7.02 | 15.8 | 310 | 14.4 |
| B | × | ✓ | × | × | 85.8 | 56.8 | 7.01 | 15.8 | 310 | 13.7 |
| C | × | × | ✓ | × | 86.3 | 56.1 | 6.89 | 13.4 | 302 | 13.5 |
| D | ✓ | × | × | × | 86.4 | 56.2 | 7.04 | 15.8 | 315 | 13.7 |
| E | ✓ | ✓ | × | ✓ | 86.7 | 57.4 | 7.04 | 15.8 | 303 | 13.7 |
| F | × | ✓ | ✓ | × | 86.9 | 56.4 | 6.89 | 13.4 | 306 | 13.5 |
| H | ✓ | ✓ | ✓ | ✓ | 87.1 | 57.6 | 6.92 | 13.4 | 315 | 13.5 |
| Model | Image size | mAP0.5/% (All) | fastener | fastener-2 | fastener_broken | fastener2_broken | trackbed_stuff | missing |
|---|---|---|---|---|---|---|---|---|
| Faster R-CNN | 640×640 | 50.2 | 61.5 | 49.6 | 58.4 | 64.6 | 31.5 | 98.2 |
| Cascade R-CNN | 640×640 | 69.3 | 72.3 | 65.4 | 79.6 | 75.5 | 40.4 | 99.3 |
| SSD | 512×512 | 60.1 | 72.2 | 58.4 | 68.4 | 69.9 | 32.4 | 96.8 |
| YOLOX | 640×640 | 80.3 | 92.4 | 74.3 | 90.5 | 91.2 | 40.3 | 98.4 |
| CenterNet | 640×640 | 86.9 | 94.7 | 83.6 | 91.1 | 97.4 | 42.7 | 98.4 |
| YOLOv7 | 640×640 | 87.1 | 97.6 | 85.7 | 92.3 | 99.6 | 46.6 | 97.3 |
| YOLOv5s (Baseline) | 640×640 | 85.8 | 97.2 | 85.3 | 92.0 | 99.5 | 43.5 | 97.1 |
| DP-YOLO (ours) | 640×640 | 87.1 | 97.9 | 86.7 | 92.9 | 99.6 | 47.3 | 96.2 |
| Model | mAP0.5/% (All) | fastener | fastener-2 | fastener_broken | fastener2_broken | trackbed_stuff | missing |
|---|---|---|---|---|---|---|---|
| YOLOv5s (Baseline) | 85.8 | 97.2 | 85.3 | 92.0 | 99.5 | 43.5 | 97.1 |
| YOLOv5-Mobilev3s | 81.1 | 92.4 | 80.2 | 88.4 | 96.2 | 39.6 | 98.6 |
| YOLOv5-Mobilev3l | 82.3 | 93.3 | 81.3 | 89.4 | 97.8 | 40.5 | 98.3 |
| YOLOv5-ShuffleNet | 81.4 | 92.6 | 80.4 | 88.7 | 96.6 | 40.1 | 98.5 |
| YOLOv5-Ghost | 85.5 | 96.9 | 83.3 | 92.3 | 99.1 | 42.6 | 98.2 |
| YOLOv3-Tiny | 73.9 | 84.3 | 71.9 | 81.3 | 84.2 | 34.7 | 99.5 |
| DP-YOLO (ours) | 87.1 | 97.9 | 86.7 | 92.9 | 99.6 | 47.3 | 96.2 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).