Submitted: 03 February 2025
Posted: 04 February 2025
Abstract
Public transportation plays a crucial role in daily life, and the road network is a vital component in the implementation of smart cities. Recent advances in AI have enabled the development of monitoring systems capable of detecting anomalies in road surfaces and road signs, anomalies that can lead to serious accidents. This paper presents an approach to enhancing road safety through the detection and classification of traffic signs and road surface damage using deep learning techniques, in particular convolutional neural networks (CNNs). This integrated approach supports proactive maintenance strategies, improving road safety and resource allocation for the Molise region and the city of Campobasso. The resulting system, developed as part of the CTE Molise research project funded by the Italian Ministry of Enterprises and Made in Italy (MIMIT), leverages technologies such as cloud computing and high-performance computing with GPU acceleration. It serves as a valuable tool for municipalities, enabling the quick detection of anomalies and the prompt organization of maintenance operations.
Keywords:
1. Introduction
2. Literature Review
2.1. Traffic Sign Detection and Classification
2.1.1. Seminal Papers
2.1.2. YOLO Architecture
2.1.3. Enhancements in YOLO
2.1.4. Traffic Sign Classification
2.1.5. Traffic Sign Damage Classification
2.1.6. Generative AI
2.2. Road Damage Detection
2.2.1. Dataset Utilization
2.2.2. Deep Learning Approaches
2.3. GPS Data
2.3.1. Integration of GPS Data
2.4. Additional Insights from Recent Advances
- A comprehensive review of state-of-the-art traffic sign recognition work, categorizing studies into conventional machine learning and deep learning approaches.
- Discussion of widely adopted traffic sign recognition datasets, their challenges, and limitations, as well as future research prospects in this field.
- Emphasis on the importance of diverse datasets for improving model generalization and robustness.
2.5. Summary
3. Methodology
3.1. First Datasets
Mapillary
- Classes: 401
- Images: 41,906
- Size: 32.8 GB
RDD 2022
- Classes: 4
- Images: 34,007
- Size: 9.6 GB
3.2. First Phase
- Data Augmentation: This involves applying various transformations to the training images, such as rotations, scaling, flipping, and color adjustments. These techniques increase the diversity of the training data and make the model more robust to different conditions.
- Normalization: Image pixel values are scaled to a standard range, typically between 0 and 1, to ensure uniformity and improve the convergence of the model during training.
- Label Smoothing: This technique is used to reduce overfitting by softening the hard labels in the training data, making the model less confident in its predictions and improving generalization.
- Anchor Box Calculation: Custom anchor boxes are computed based on the dataset to improve the detection accuracy of the YOLO model, especially for objects of various sizes.
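As a concrete illustration, the image-level preprocessing described above could be assembled with the Albumentations library as in the minimal sketch below. This is not the project's actual pipeline: the transform parameters are assumed values, and label smoothing and anchor computation are applied at training time rather than in the image pipeline.

```python
# Illustrative augmentation + normalization pipeline (parameters are assumptions).
import albumentations as A

train_transforms = A.Compose(
    [
        # Data augmentation: geometric and photometric perturbations.
        A.HorizontalFlip(p=0.5),
        A.Rotate(limit=15, p=0.5),
        A.RandomScale(scale_limit=0.2, p=0.5),
        A.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2, hue=0.05, p=0.5),
        # Normalization: scale pixel values to [0, 1].
        A.Normalize(mean=(0.0, 0.0, 0.0), std=(1.0, 1.0, 1.0), max_pixel_value=255.0),
    ],
    # Keep YOLO-format bounding boxes consistent with the transformed image.
    bbox_params=A.BboxParams(format="yolo", label_fields=["class_labels"]),
)
```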
YOLOv8s for Road Surface Damages Detection
- Pretrained: Yes
- Epochs: 160
- Image Size: 640
- Patience: 100
- Cache: RAM
- Device: GPU
- Batch Size: 64
YOLOv8x for Road Signs Detection
- Pretrained: Yes
- Epochs: 100
- Image Size: 640
- Patience: 100
- Cache: RAM
- Device: GPU
- Batch Size: Auto
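Assuming the standard Ultralytics Python API, the two training runs summarized above could be launched as in the sketch below. The dataset YAML paths are placeholders standing in for the RDD2022 and Mapillary configurations; the remaining arguments mirror the listed settings.

```python
# Sketch of the two YOLOv8 training runs; "roads.yaml" and "signs.yaml"
# are hypothetical dataset config paths.
from ultralytics import YOLO

# YOLOv8s for road surface damage detection.
damage_model = YOLO("yolov8s.pt")  # pretrained weights
damage_model.train(
    data="roads.yaml",   # placeholder dataset config (RDD2022-style)
    epochs=160,
    imgsz=640,
    patience=100,
    cache="ram",
    device=0,            # first GPU
    batch=64,
)

# YOLOv8x for road sign detection.
sign_model = YOLO("yolov8x.pt")
sign_model.train(
    data="signs.yaml",   # placeholder dataset config (Mapillary-style)
    epochs=100,
    imgsz=640,
    patience=100,
    cache="ram",
    device=0,
    batch=-1,            # auto batch size
)
```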
3.3. First Step
- Signs covered with spray-painted graffiti
- Signs covered with stickers
- Bent or physically damaged signs
- Rusty signs
- Focal Loss: This loss function is designed to handle class imbalance by assigning more weight to hard-to-classify examples, reducing the impact of easily classified examples, and improving model performance on imbalanced data.
- Cutout Regularization: This technique involves randomly removing sections of the image during training. It helps improve model robustness and prevent overfitting, thereby enhancing the model’s ability to generalize to new data.
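For reference, a minimal TensorFlow sketch of these two techniques follows. The gamma and alpha values and the patch size are common defaults assumed here, not values taken from the paper.

```python
import tensorflow as tf

def binary_focal_loss(gamma=2.0, alpha=0.25):
    """Binary focal loss: down-weights easy examples so training focuses
    on hard, misclassified ones (gamma/alpha are assumed defaults)."""
    def loss_fn(y_true, y_pred):
        y_true = tf.cast(y_true, y_pred.dtype)
        eps = tf.keras.backend.epsilon()
        y_pred = tf.clip_by_value(y_pred, eps, 1.0 - eps)
        # p_t is the model's predicted probability for the true class.
        p_t = y_true * y_pred + (1.0 - y_true) * (1.0 - y_pred)
        alpha_t = y_true * alpha + (1.0 - y_true) * (1.0 - alpha)
        return -tf.reduce_mean(alpha_t * tf.pow(1.0 - p_t, gamma) * tf.math.log(p_t))
    return loss_fn

def cutout(image, size=32):
    """Cutout regularization: zero out a random square patch (illustrative)."""
    h, w, c = image.shape[0], image.shape[1], image.shape[2]
    y = tf.random.uniform([], 0, h - size, dtype=tf.int32)
    x = tf.random.uniform([], 0, w - size, dtype=tf.int32)
    # Build a mask that is 1 inside the patch and 0 elsewhere.
    patch = tf.ones([size, size, c], dtype=image.dtype)
    mask = tf.pad(patch, [[y, h - y - size], [x, w - x - size], [0, 0]])
    return image * (1.0 - mask)
```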
3.4. Second Step
- Input Layer: Accepts images of size 128×128×3 (height, width, color channels).
- Convolutional Layers with Attention: Three convolutional blocks, each with a 2D convolutional layer followed by:
  – A Channel Attention module to focus on the most relevant feature channels.
  – A Spatial Attention module to emphasize the most important spatial regions.
  – A Max-Pooling layer after the attention-enhanced convolution to reduce the spatial dimensions.
- Fully Connected Layers:
  – A Flatten layer to convert the 2D feature maps to a vector.
  – A Dense layer with 256 units and ReLU activation.
  – A Dropout layer with a dropout rate of 0.5 to prevent overfitting.
- Output Layer: A single Dense layer with a sigmoid activation function for binary classification.
3.4.1. Attention Mechanisms
- Channel Attention: Enhances feature maps by focusing on significant channels, allowing the model to emphasize important features such as small damages or scratches.
- Spatial Attention: Highlights crucial spatial regions in the image, improving the model’s ability to detect subtle anomalies on road signs.
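A compact Keras sketch of the architecture from Sections 3.4 and 3.4.1 is given below. The per-block filter counts, the channel-attention reduction ratio, and the 7×7 spatial-attention kernel are illustrative assumptions, as the paper does not report these values.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def channel_attention(x, ratio=8):
    # Squeeze-and-excitation style channel attention (reduction ratio assumed).
    channels = x.shape[-1]
    s = layers.GlobalAveragePooling2D()(x)               # per-channel statistics
    s = layers.Dense(channels // ratio, activation="relu")(s)
    s = layers.Dense(channels, activation="sigmoid")(s)  # channel weights in [0, 1]
    s = layers.Reshape((1, 1, channels))(s)
    return layers.Multiply()([x, s])                     # re-weight feature channels

def spatial_attention(x):
    # CBAM-style spatial attention: pool across channels, then a 7x7 conv.
    avg_pool = layers.Lambda(lambda t: tf.reduce_mean(t, axis=-1, keepdims=True))(x)
    max_pool = layers.Lambda(lambda t: tf.reduce_max(t, axis=-1, keepdims=True))(x)
    concat = layers.Concatenate()([avg_pool, max_pool])
    weights = layers.Conv2D(1, kernel_size=7, padding="same", activation="sigmoid")(concat)
    return layers.Multiply()([x, weights])               # highlight informative regions

def build_sign_damage_cnn(input_shape=(128, 128, 3)):
    inputs = layers.Input(shape=input_shape)
    x = inputs
    for filters in (32, 64, 128):                        # filter counts are assumptions
        x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
        x = channel_attention(x)
        x = spatial_attention(x)
        x = layers.MaxPooling2D()(x)
    x = layers.Flatten()(x)
    x = layers.Dense(256, activation="relu")(x)
    x = layers.Dropout(0.5)(x)
    outputs = layers.Dense(1, activation="sigmoid")(x)   # binary: damaged vs. intact
    model = models.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy",
                           tf.keras.metrics.Precision(),
                           tf.keras.metrics.Recall()])
    return model
```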
3.4.2. Data Augmentation and Regularization
- Horizontal and vertical flips.
- Random rotations, width and height shifts.
- Shear and zoom transformations.
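These transformations map directly onto Keras' ImageDataGenerator, as sketched below; the specific ranges are plausible assumptions rather than the study's exact settings.

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Augmentation matching the transformations listed above (ranges assumed).
datagen = ImageDataGenerator(
    horizontal_flip=True,
    vertical_flip=True,
    rotation_range=20,
    width_shift_range=0.1,
    height_shift_range=0.1,
    shear_range=0.1,
    zoom_range=0.1,
    rescale=1.0 / 255,
    validation_split=0.2,  # 80-20 train/validation split (Section 3.4.3)
)
```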
3.4.3. Training and Evaluation
- Training the model for 10 epochs with a batch size of 32.
- Using an 80-20 split for training and validation data to ensure a balanced evaluation.
- Employing a learning rate reduction technique with the ReduceLROnPlateau call- back, which decreased the learning rate by a factor of 0.5 if the validation accuracy did not improve for 3 consecutive epochs.
- Saving the best-performing model during training based on validation accuracy using the ModelCheckpoint callback.
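Putting these pieces together, the training loop could look like the following sketch, where model is the attention CNN defined earlier, datagen is the augmentation generator above, and x_train/y_train are placeholder arrays.

```python
from tensorflow.keras.callbacks import ReduceLROnPlateau, ModelCheckpoint

callbacks = [
    # Halve the learning rate if validation accuracy stalls for 3 epochs.
    ReduceLROnPlateau(monitor="val_accuracy", factor=0.5, patience=3),
    # Keep only the best model seen so far, judged by validation accuracy.
    ModelCheckpoint("best_sign_cnn.keras", monitor="val_accuracy", save_best_only=True),
]

history = model.fit(
    datagen.flow(x_train, y_train, batch_size=32, subset="training"),
    validation_data=datagen.flow(x_train, y_train, batch_size=32, subset="validation"),
    epochs=10,
    callbacks=callbacks,
)
```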
3.5. Gen AI
- Pretrained: Yes
- Base Model: Stable Diffusion v2.1
- Epochs: 50
- Image Size: 512 × 512
- Patience: 10
- Cache: RAM
- Device: GPU
- Batch Size: 32
- Conditioning Method: Image + Text Prompt Encoding
- Optimization: AdamW
- Learning Rate: 5e-5 (Cosine Annealing Scheduler)
- Loss Functions: Contrastive Loss + Perceptual Loss (LPIPS)
- Damage Types: Graffiti, rust, stickers, physical deformations, fading
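The paper does not include code for this stage. As an illustration, a fine-tuned Stable Diffusion v2.1 checkpoint could be applied for image-plus-text-conditioned generation of damaged-sign examples through the Hugging Face diffusers img2img pipeline, as sketched below; the checkpoint name, input file, and prompt are placeholders, and this shows inference with such a model, not the fine-tuning procedure itself.

```python
import torch
from diffusers import StableDiffusionImg2ImgPipeline
from PIL import Image

# Load the (hypothetically fine-tuned) Stable Diffusion v2.1 checkpoint.
pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1",  # replace with the fine-tuned checkpoint
    torch_dtype=torch.float16,
).to("cuda")

# Image + text prompt conditioning: start from a clean sign photo and
# steer generation toward a specific damage type.
clean_sign = Image.open("stop_sign.png").convert("RGB").resize((512, 512))
damaged = pipe(
    prompt="a stop sign covered in rust and graffiti, photorealistic",
    image=clean_sign,
    strength=0.6,          # how far to deviate from the input image
    guidance_scale=7.5,
    num_inference_steps=50,
).images[0]
damaged.save("stop_sign_rust_graffiti.png")
```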
4. Computational Experiments
4.1. Computational Characteristics
- GPU: NVIDIA Tesla T4
  – CUDA Cores: 2560
  – Tensor Cores: 320
  – GPU Memory: 16 GB GDDR6
  – Memory Bandwidth: 320 GB/s
  – Performance: Up to 8.1 TFLOPS (FP32)
- CPU: Intel(R) Xeon(R) CPU
  – vCPUs: 2 (Base Frequency: 2.3 GHz)
  – RAM: 12.7 GB available in the Colab environment
  – Disk: 100 GB available storage
- RAM: 32 GB
- CPU: 24 vCPUs (Base Frequency: 2.5 GHz)
4.2. Metrics for Performance Evaluation
4.2.1. YOLOv8x Accuracy
- mAP50: Mean Average Precision at 50% IoU threshold.
- mAP50-95: Mean Average Precision averaged over IoU thresholds from 50% to 95% (in steps of 5%).
- Precision: The ratio of true positive detections to the total number of positive detections (true positives + false positives).
- Recall: The ratio of true positive detections to the total number of actual positives (true positives + false negatives).
- The mAP50 metric (blue line) shows a steady improvement, stabilizing around 0.9, indicating a high level of accuracy for the model in detecting objects with a 50% IoU threshold.
- The mAP50-95 metric (orange line) improves gradually, reflecting the model’s performance across a wider range of IoU thresholds. It stabilizes around 0.7, showcasing the model’s robustness in varying detection conditions.
- Precision (cyan line) shows fluctuations but generally trends upwards, indicating improvements in the model’s ability to reduce false positives over time.
- Recall (magenta line) also improves and stabilizes around 0.8, demonstrating the model’s effectiveness in capturing most of the actual positive instances.
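To make the precision and recall definitions above concrete, here is a small sketch with illustrative counts:

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Precision and recall as defined above, from confusion counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Example: 90 correct detections, 10 false alarms, 20 misses.
p, r = precision_recall(tp=90, fp=10, fn=20)
print(f"precision={p:.2f}, recall={r:.2f}")  # precision=0.90, recall=0.82
```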
4.2.2. YOLOv8x Box Loss
4.2.3. YOLOv8x Object Loss
- Accuracy: The accuracy metrics for YOLOv8s, as shown in Figure 7, indicate steady improvement over epochs, with metrics such as mAP50, mAP50-95, precision, and recall showing consistent performance gains.
- Box Loss: As shown in Figure 8, the box loss decreases over time, indicating improved precision in predicting the bounding box locations for road damages.
- Object Loss: The object loss, depicted in Figure 9, shows a downward trend, demonstrating enhanced capability in distinguishing between damaged and undamaged road surfaces.
4.2.4. CNN

- High Training Accuracy and Precision: The training accuracy metric (blue line) starts low and increases rapidly within the first few epochs. After the initial increase, the accuracy stabilizes around a high value, indicating that the model is effectively learning the training data and making correct predictions and positive identifications. Training precision (green line) remains relatively high, though it fluctuates, suggesting some variability in the learning process regarding positive predictions.
- Validation Metrics: The validation accuracy metric (orange line) starts low and increases over the first few epochs, stabilizing at a value slightly lower than the training accuracy. This indicates that the model is generalizing reasonably well to the validation set without overfitting too much. Validation precision (red line) exhibits significant fluctuation throughout the epochs, likely due to dataset imbalance and its limited size, which is understandable in this preliminary testing phase. Both training and validation recall (purple and pink lines, respectively) are high and fairly stable, indicating that the model is effectively identifying positive instances with minimal false negatives.
- Potential Overfitting: There is a slight indication of overfitting due to the gap between training and validation precision. This overfitting is likely attributable to the small size and imbalance of the dataset used in this preliminary phase, as other influencing factors such as model complexity, data augmentation, and hyperparameter tuning (e.g., regularization, epochs) have been appropriately addressed.
5. Challenges and Solutions
5.1. Class Imbalance
5.2. Detail Recognition
5.3. Environmental Variability
5.4. Overfitting
5.5. Computational Resources
6. Integration with Municipal Maintenance Applications
6.1. Georeferencing and GIS Integration
6.2. User Interface for Maintenance Operators
- Road Damage: Potholes, cracks, and other surface issues.
- Traffic Sign Damage: Defaced, rusty, or obstructed signs.
6.3. Scalability and Replicability
6.4. Future Developments
7. Conclusions
- Incorporating Retroreflectivity Factors: To further refine the classification of road signs, we plan to include retroreflectivity factors in our analysis. This involves detecting and classifying faded or discolored signs, which can significantly impact road safety. Developing models that can identify such signs will be crucial for timely maintenance and replacement.
- Leveraging Generative AI for Data Labeling: The process of manually labeling large datasets is time-consuming and prone to human error. By employing generative AI techniques, we can automate the labeling process, thereby reducing the time and effort required. This will also enable us to handle larger datasets more efficiently.
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- World Health Organization, “Road safety.” https://www.who.int/health-topics/road-safety.
- S. Maldonado-Bascon, S. Lafuente-Arroyo, P. Gil-Jimenez, H. Gomez-Moreno, and F. Lopez-Ferreras, “Road-sign detection and recognition based on support vector machines,” IEEE Transactions on Intelligent Transportation Systems, vol. 8, no. 2, pp. 264–278, 2007. [CrossRef]
- C. Fang, S.-W. Chen, and C. Fuh, “Road-sign detection and tracking,” IEEE Trans. Veh. Technol., vol. 52, pp. 1329–1341, 2003. [CrossRef]
- W. Yang and W. Zhang, “Real-time traffic signs detection based on yolo network model,” in 2020 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery (CyberC), pp. 354–357, IEEE, 2020.
- X. Zhang, “Traffic sign detection based on yolo v3,” in 2023 IEEE 3rd International Conference on Power, Electronics and Computer Applications (ICPECA), pp. 1044–1048, IEEE, 2023.
- W. Song and S. A. Suandi, “Sign-yolo: A novel lightweight detection model for chinese traffic sign,” IEEE Access, vol. 11, pp. 113941–113951, 2023.
- T. Xu, L. Ren, T. Shi, Y. Gao, J.-B. Ding, and R.-C. Jin, “Traffic sign detection algorithm based on improved yolox,” Information Technology and Control, 2023.
- D. C. Ciresan, U. Meier, J. Masci, and J. Schmidhuber, “Multi-column deep neural networks for traffic sign classification,” in 2012 IEEE Conference on Computer Vision and Pattern Recognition, pp. 3642–3649, IEEE, 2012.
- W. Song and S. A. Suandi, “Tsr-yolo: A chinese traffic sign recognition algorithm for intelligent vehicles in complex scenes,” Sensors (Basel, Switzerland), 2023. [CrossRef]
- A. Trpkovic, M. Selmic, and S. Jevremovic, “Model for the identification and classification of partially damaged and vandalized traffic signs,” KSCE Journal of Civil Engineering, vol. 25, 07 2021. [CrossRef]
- N. Acilo, A. G. S. D. Cruz, M. K. L. Kaw, M. D. Mabanta, V. G. G. Pineda, and E. A. Roxas, “Traffic sign integrity analysis using deep learning,” 2018 IEEE 14th International Colloquium on Signal Processing & Its Applications (CSPA), pp. 107–112, 2018.
- A. Zhang, K. C. Wang, B. Li, E. Yang, X. Dai, Y. Peng, Y. Fei, Y. Liu, J. Q. Li, and C. Chen, “Automated pixel-level pavement crack detection on 3D asphalt surfaces using a deep-learning network,” Computer-Aided Civil and Infrastructure Engineering, vol. 32, no. 10, pp. 805–819, 2017. [CrossRef]
- H. Maeda, Y. Sekimoto, T. Seto, T. Kashiyama, and H. Omata, “Road damage detection using deep neural networks with images captured through a smartphone,” in 2018 IEEE International Conference on Big Data (Big Data), pp. 5207–5212, IEEE, 2018.
- M. Strutu, G. Stamatescu, and D. Popescu, “A mobile sensor network based road surface monitoring system,” 2013 17th International Conference on System Theory, Control and Computing (ICSTCC), pp. 630–634, 2013.
- M. Perttunen, O. Mazhelis, F. Cong, M. Kauppila, T. Leppanen, J. Kantola, J. Collin, S. Pirttikangas, J. Haverinen, T. Ristaniemi, and J. Riekki, “Distributed road surface condition monitoring using mobile phones,” pp. 64–78, 2011.
- R. Tarun and B. P. Esther, “Real-time regional road sign detection and identification using raspberry pi,” 2023 International Conference on Networking and Communications (ICNWC), pp. 1–5, 2023.
- X. R. Lim, C. P. Lee, K. M. Lim, T. S. Ong, A. Alqahtani, and M. Ali, “Recent advances in traffic sign recognition: Approaches and datasets,” Sensors, vol. 23, no. 10, p. 4674, 2023. [CrossRef]
- G. Neuhold, T. Ollmann, S. Rota Bulò, and P. Kontschieder, “The Mapillary Vistas dataset for semantic understanding of street scenes,” in Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp. 4990–4999, 2017. [CrossRef]
- D. Arya, H. Maeda, and S. K. Ghosh, “RDD2022: A Multi-National Image Dataset for Automatic Road Damage Detection,” Geoscience Data Journal, 2022. [CrossRef]
- C. Dewi, R.-C. Chen, Y.-T. Liu, and S.-K. Tai, “Synthetic data generation using DCGAN for improved traffic sign recognition,” Neural Computing and Applications, vol. 34, pp. 1–16, 2021. [CrossRef]
- D. Pensa, “Integration of GPS data into predictive models for tyre maintenance,” Master’s thesis, Politecnico di Milano, 2017. Retrieved from https://www.politesi.polimi.it/retrieve/a81cb05c-7f41-616b-e053-1605fe0a889a/tesi.pdf.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).