Submitted:
03 July 2025
Posted:
03 July 2025
Abstract
Keywords:
1. Background & Summary
- Scarcity of high-quality annotated datasets: While datasets such as RailSem19 [1] offer semantic annotations for track scenes, they cover only macro-level categories (e.g., tracks, signals) and lack fine-grained annotations for turnout components. Similarly, CrackTree [2] targets surface crack detection in pavement images and contains no structural information specific to turnouts. Existing public datasets therefore lack pixel-level annotations for functional turnout components, which prevents models from capturing key features such as switch rail close-fitting surface deformations [3] and rod connection statuses.
- Inadequate robustness in complex environments: Real-world scenarios involve dynamic challenges such as lighting variations (e.g., tunnel reflections) and transient occlusions (e.g., components blocked during train passage), which significantly degrade detection accuracy. Most existing methods are trained on data collected under laboratory conditions and generalize poorly to operational environments. Robust algorithm development therefore requires standardized datasets that cover multiple environmental conditions.
- Switch Rail: Movable track component enabling direction changes, with annotations focusing on critical regions (close-fitting surfaces, rail head profiles);
- Stock Rail: Fixed track component, emphasizing the gauge maintenance area interacting with the switch rail;
- Tie Rod (Switch Machine Rod): Transmission component connecting switch rails to point machines, with precise boundary delineation of connection nodes (pins, bolts);
- Switch Machine: Electromechanical device driving switch rail movement, including detailed contours of moving parts (gear sets, locking rods), adhering to the annotation standards of TB/T 2478-2020 Railway Turnout Conversion Equipment.
2. Methods
2.1. Dataset Acquisition
2.2. Dataset Processing
- Images were uniformly resized to 1280×1280 pixels using a bicubic interpolation algorithm to standardize spatial dimensions;
- Illumination normalization was performed via CLAHE (contrast-limited adaptive histogram equalization), which adjusts local contrast adaptively to mitigate variations caused by all-weather acquisition, covering both low-contrast overcast scenes and high-reflectance sunny scenes;
- A non-local means denoising algorithm was applied to reduce noise and motion-blur artifacts induced by passing trains, enhancing edge clarity through adaptive pixel-similarity weighting.
2.3. Dataset Labelling
- For components with clear contours (e.g., switch machines), boundaries were precisely outlined using polygonal tools;
- For complex structural regions such as rod connection systems, a hierarchical annotation strategy was adopted to avoid:
- Blue: Switch Machine
- Yellow: Tie Rod (Switch Machine Rod)
- Purple: Switch Rail
- Green: Stock Rail
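As an illustration of how the color legend relates to class indices, the sketch below renders an index mask with the four legend colors. The class IDs follow the label mapping given in the Data Records section; the exact RGB triplets, the name `PALETTE`, and the helper `colorize` are assumptions, because the text names the colors but not their values.

```python
# Class ID -> (component name, RGB color). RGB values are assumed;
# the text only names the colors (blue, purple, green, yellow).
PALETTE = {
    0: ("background", (0, 0, 0)),
    1: ("Switch Machine", (0, 0, 255)),   # blue
    2: ("Switch Rail", (128, 0, 128)),    # purple
    3: ("Stock Rail", (0, 255, 0)),       # green
    4: ("Tie Rod", (255, 255, 0)),        # yellow
}


def colorize(mask):
    """Map a 2-D list of class IDs to a 2-D list of RGB tuples."""
    return [[PALETTE[class_id][1] for class_id in row] for row in mask]


# A 2x2 toy mask: background, switch machine, tie rod, switch rail.
print(colorize([[0, 1], [4, 2]]))
```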
3. Data Records
- Hierarchical Storage Structure: Three primary modules are established under the root directory: JPEGImages (stores source *.jpg images), SegmentationClass (contains *.png annotation files), and label_mapping.txt (defines semantic mappings). Both JPEGImages and SegmentationClass directories incorporate subdirectories categorizing standard data into three distinct partitions: train, val, and test;
- Anti-bias Naming Strategy: All file names use 8-character hash prefixes generated from cryptographically secure UUIDs, which guarantees exact matching between each image and its annotation file and prevents filename ordering from influencing model training;
- Semantic Encoding Framework: The label_mapping.txt file uses an ID: Class format to define five classes (0: [background], 1: [Switch Machine], 2: [Switch Rail], 3: [Stock Rail], 4: [Tie_Rail]). Single-channel PNG annotation files store the class ID of each pixel as its grayscale value, so every mask can be decoded directly with this mapping.
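The layout above could be consumed along the following lines. This is a hedged sketch: the helper names (`parse_label_mapping`, `new_stem`, `pair_files`) are hypothetical, the sample mapping text assumes one `ID: Class` entry per line without the square brackets, and the naming helper assumes the prefix is the first 8 hexadecimal characters of a random UUID.

```python
import uuid
from pathlib import Path


def parse_label_mapping(text: str) -> dict[int, str]:
    """Parse 'ID: Class' lines into {class_id: class_name}."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        class_id, name = line.split(":", 1)
        mapping[int(class_id)] = name.strip()
    return mapping


def new_stem() -> str:
    """Return an 8-character hexadecimal stem from a random (version 4) UUID."""
    return uuid.uuid4().hex[:8]


def pair_files(root: Path, split: str) -> list[tuple[Path, Path]]:
    """Match each image in a split to the annotation with the same stem."""
    pairs = []
    for image in sorted((root / "JPEGImages" / split).glob("*.jpg")):
        mask = root / "SegmentationClass" / split / f"{image.stem}.png"
        if mask.exists():
            pairs.append((image, mask))
    return pairs


# Assumed file content, built from the class list given in the text.
SAMPLE = "0: background\n1: Switch Machine\n2: Switch Rail\n3: Stock Rail\n4: Tie_Rail\n"
print(parse_label_mapping(SAMPLE))
```

Because image and annotation share a stem, `pair_files(root, "train")` needs no index file; a missing annotation simply drops that image from the returned list.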
4. Technical Validation
4.1. Evaluation
4.2. Modelling Assessment
4.3. Quantitative Analysis
- Key Observations
  - UNet: Achieved moderate performance on Point Machine (IoU: 0.8781; boundary F1: 0.8423) and Switch Rail (IoU: 0.7915; boundary F1: 0.7836), but exhibited limitations on Stock Rail (IoU: 0.7048) and Tie Rod (boundary F1: 0.6807), yielding a global mIoU of 0.7736.
  - UNet++: Outperformed all models with superior IoU and boundary F1 scores across components, notably excelling in Point Machine (IoU: 0.9503; boundary F1: 0.9201) and Switch Rail (IoU: 0.8112; boundary F1: 0.8048). Despite achieving the highest mIoU (0.8306), its boundary detection for Tie Rod remained suboptimal (boundary F1: 0.7142), highlighting challenges in segmenting slender structures.
  - UNet3+: Demonstrated comparable performance to UNet (mIoU: 0.7731) but showed degraded boundary precision for Tie Rod (boundary F1: 0.6794), suggesting sensitivity to structural complexity.
  - UNetv2: Delivered marginal improvements over UNet (mIoU: 0.7738) but struggled with boundary delineation in Stock Rail (boundary F1: 0.7015) and Tie Rod (boundary F1: 0.6857).
- Critical Insights
  - The superior performance of UNet++ underscores the effectiveness of nested skip connections in capturing multi-scale features for railway infrastructure.
  - All models exhibited degraded performance on Tie Rod segmentation (boundary F1 max 0.72), emphasizing the need for specialized architectures to address elongated and occluded components.
  - The mIoU gap between UNet++ (0.8306) and other models (max 0.7738) highlights the importance of hierarchical feature fusion in complex railway environments.
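For reference, the sketch below shows how a per-class IoU and the mean IoU can be computed. The per-class values are copied from the results table; treating the reported mIoU as the unweighted mean over the four foreground classes (background excluded) is an inference from those numbers, not something the text states, and the function names are illustrative.

```python
# Per-class IoU from the results table, in the order
# Switch Machine, Switch Rail, Stock Rail, Switch Rod.
IOU = {
    "UNet":   [0.8788, 0.7597, 0.6886, 0.7672],
    "UNet++": [0.9503, 0.8112, 0.7424, 0.8186],
    "UNet3+": [0.8780, 0.7509, 0.6932, 0.7703],
    "UNetv2": [0.8769, 0.7594, 0.6919, 0.7668],
}


def iou(pred, target, class_id):
    """IoU of one class over two flat sequences of per-pixel class IDs."""
    pairs = list(zip(pred, target))
    inter = sum(p == class_id and t == class_id for p, t in pairs)
    union = sum(p == class_id or t == class_id for p, t in pairs)
    return inter / union if union else float("nan")


def mean_iou(values):
    """Unweighted mean of per-class IoU values."""
    return sum(values) / len(values)


for model, values in IOU.items():
    print(f"{model}: mIoU = {mean_iou(values):.4f}")
```

Under this assumption the recomputed means agree with the tabulated mIoU column to within rounding, which suggests the table is internally consistent.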
5. Code Availability
Author Contributions
Acknowledgments
Conflicts of Interest
References
- Zendel, O.; Murschitz, M.; Zeilinger, M.; Steininger, D.; Abbasi, S.; Beleznai, C. RailSem19: A Dataset for Semantic Rail Scene Understanding. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2019, pp. 1221–1229.
- Zou, Q.; Cao, Y.; Li, Q.; Mao, Q.; Wang, S. CrackTree: Automatic crack detection from pavement images. Pattern Recognition Letters 2012, 33, 227–238.
- Feng, W.; Zhang, J.; Xu, J.; Liu, Y.; Fan, J. Single turnout for coal mine roadway.
- Sepasian, M.; Balachandran, W.; Mares, C. Image enhancement for fingerprint minutiae-based algorithms using CLAHE, standard deviation analysis and sliding neighborhood. In Proceedings of the World Congress on Engineering and Computer Science, 2008, pp. 22–24.
- Setiawan, A.W.; Mengko, T.R.; Santoso, O.S.; Suksmono, A.B. Color retinal image enhancement using CLAHE. In Proceedings of the International conference on ICT for smart society. IEEE, 2013, pp. 1–3.
- Qian, Z.; Mi, G.; Zhu, Y.; Lifei, W. Research on deep learning model for high speed railway freight station location. Journal of Railways 2024, 46, 36–45.
- Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015; Navab, N.; Hornegger, J.; Wells, W.M.; Frangi, A.F., Eds., Cham, 2015; pp. 234–241.
- Yiming, O.; Jian, D.; Jianhua, L.; Huaguo, L.; Zhengfeng, H.; Gaoming, D. A Fine-Grained Fault-Tolerant Design of Crossbar Based on Path Diversity in Network-on-Chip. Journal of Computer-Aided Design & Computer Graphics 2017, 29, 180–188,210.
- Wang, X.; Wang, Y.; Zhou, J.; Liu, J.; Gao, Y.; Wang, Y.; Zheng, J. An unsupervised learning method based on U-Net++ for low-light image enhancement. Signal, Image and Video Processing 2025, 19, 282.
- Zhou, Z.; Siddiquee, M.M.R.; Tajbakhsh, N.; Liang, J. UNet++: Redesigning Skip Connections to Exploit Multiscale Features in Image Segmentation. IEEE Transactions on Medical Imaging 2020, 39, 1856–1867.
- Xinyue, M.; Huifeng, W.; Jiale, Z.; Zhen, G.; Bingyong, Y. Semantic Segmentation of Phased Array Ultrasound Images Based on Improved U-Net3+. Journal of East China University of Science and Technology 2025, 51, 242–249.
- Peng, Y.; Sonka, M.; Chen, D.Z. U-Net v2: Rethinking the Skip Connections of U-Net for Medical Image Segmentation, 2024, [arXiv:eess.IV/2311.17791].
- Lian, L.; Cao, Z.; Qin, Y.; Gao, Y.; Bai, J.; Yu, H.; Jia, L. Densely Multiscale Fusion Network for Lightweight and Accurate Semantic Segmentation of Railway Scenes. IEEE Transactions on Instrumentation and Measurement 2024, 73, 1–11.




| Parameter | Details |
| --- | --- |
| Camera Type | 4K industrial camera with manual optical zoom (focal length 5–50 mm, 10× zoom) |
| Resolution | 3840×2160 pixels (8.3-megapixel) |
| Image Sensor | Sony IMX415 (1/2.8-inch) |
| Interface | USB 3.0 + HDMI v1.4 (dual-port synchronous output) |
| Exposure Control | Dynamic ISO adjustment (100–3200), exposure time 1/80 s–1/200 s |

| Model Parameter | Value |
| --- | --- |
| epochs | 1e5 |
| patience | 20 |
| batch size | 32 |
| learning rate | 1e-5 |
| scale | 0.5 |

| Model | Switch Machine IoU | Switch Machine boundary F1 | Switch Rail IoU | Switch Rail boundary F1 | Stock Rail IoU | Stock Rail boundary F1 | Switch Rod IoU | Switch Rod boundary F1 | mIoU |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| UNet | 0.8788 | 0.7379 | 0.7597 | 0.7428 | 0.6886 | 0.7167 | 0.7672 | 0.6807 | 0.7736 |
| UNet++ | 0.9503 | 0.8008 | 0.8112 | 0.7837 | 0.7424 | 0.7615 | 0.8186 | 0.7142 | 0.8306 |
| UNet3+ | 0.878 | 0.7345 | 0.7509 | 0.7365 | 0.6932 | 0.7187 | 0.7703 | 0.6794 | 0.7731 |
| UNetv2 | 0.8769 | 0.7227 | 0.7594 | 0.7398 | 0.6919 | 0.7158 | 0.7668 | 0.6857 | 0.7738 |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).