Preprint
Article

This version is not peer-reviewed.

Pothole Detection: A Study of Ensemble Learning and Decision Framework

A peer-reviewed article of this preprint also exists.

Submitted: 24 December 2024

Posted: 26 December 2024


Abstract

This study presents a pothole detection system that uses ensemble learning (YOLOv9 instance segmentation and Mask R-CNN) and a Multi-Criteria Decision-Making (MCDM) framework to improve detection reliability. The system combines YOLOv9 for rapid instance segmentation with Mask R-CNN for precise segmentation, and experiments with adjusted confidence thresholds to enhance detection rates in challenging scenarios. The YOLOv9 instance segmentation model achieved a mean Average Precision (mAP) of 0.908 at 0.5 IoU and an F1-score of 0.58 at a confidence threshold of 0.282. The F1-confidence curve highlights a strong balance between precision and recall, but further work is needed to ensure generalization. Dynamic weights are used to merge outputs, leveraging the strengths of both models. The MCDM framework refines detections by evaluating pothole features such as size, position, and shape. While the system demonstrates an estimated 20% increase in detection accuracy, narrowly defined and overly specific MCDM criteria may lead to overfitting, limiting adaptability to diverse conditions. The study underscores the importance of balancing accuracy and adaptability for reliable performance in varied environments.


1. Introduction

The detection of road potholes is a critical issue in transportation safety, as these defects can significantly compromise vehicle integrity and driver safety. Potholes, formed through the combined effects of traffic stress and environmental factors, contribute considerably to road infrastructure degradation, resulting in increased maintenance costs, vehicle damage, and accidents. Studies indicate that potholes accounted for approximately 0.8% of road accidents in 2021, contributing to 1.4% of fatalities and 0.6% of injuries annually [1]. Additionally, the deterioration of road surfaces due to heavy traffic and adverse weather conditions can lead to potholes as deep as 10 inches [2]. This affects vehicle performance and increases operational costs for drivers, with potholes estimated to add approximately $3 billion annually in costs in Canada alone [3].
Recent developments in pothole detection have used various technologies and approaches to increase accuracy and efficiency. Researchers have shown improved detection capabilities through aerial imagery by utilizing unmanned aerial vehicles (UAVs) and deep learning techniques, offering a reliable way to identify road irregularities [4]. Similarly, YOLO models have been investigated for real-time pothole identification, demonstrating their efficacy in computer vision-based systems [5]. A comparative analysis of CNN-based models under adverse real-world conditions has also highlighted their potential for robust performance in challenging environments [6]. Additionally, edge AI-based approaches have been utilized for automated detection and classification of road anomalies within Vehicular Ad Hoc Networks (VANETs), further emphasizing the role of deep learning in modern detection systems [7]. Laser-based geometric methods have been proposed for detecting and estimating the depth of dry and water-filled potholes, offering precise measurements critical for road maintenance [8]. Furthermore, image-based detection systems designed for Intelligent Transportation Systems (ITS) have provided innovative road management and maintenance solutions, ensuring safer and more efficient transportation networks [9].
This study investigates an ensemble of YOLOv9 instance segmentation and Mask R-CNN, combined with a Multi-Criteria Decision-Making (MCDM) framework, to address the limitations of previous models. While earlier YOLO-based approaches, such as YOLOv8, were effective at marking and detecting potholes, they could not identify potholes that are shallow but still contribute to road imbalance [10]. This limitation is significant, as shallow yet widespread potholes can also pose risks to vehicle stability and safety. The YOLOv8 model achieved training and validation losses of 0.06 and 0.04, respectively, but its reliance on bounding boxes restricted its ability to capture geometric details and assess the impact of individual potholes accurately. Similarly, Gorro et al. employed YOLOv8 for pothole detection using bounding boxes [11]. While the results were promising, the approach struggled to detect potholes that are shallow but large in area, which can still cause significant road imbalance. This limitation led to increased false positives [11].
This study runs several experiments on the proposed method to identify its drawbacks. Ensemble learning lets both models contribute to robust pothole detection, using YOLOv9 for rapid instance segmentation and Mask R-CNN for precise boundary refinement.
This study focuses on the following research question:
1.) Can ensemble learning (YOLOv9 instance segmentation and Mask R-CNN) combined with MCDM-defined criteria such as depth, shape, and location reliably detect potholes?

2. Literature Review

2.1. Pothole Detection Approaches

Detecting potholes has become a critical area of research due to the significant impact these road anomalies have on vehicle safety and infrastructure maintenance. Various methods have been developed to identify and assess potholes, which can be broadly categorized into computer vision-based models, sensor-based techniques, and deep learning approaches.
Computer vision techniques have been widely employed for pothole detection, leveraging image processing algorithms to analyze road conditions. Early works, such as those by Koch and Brilakis, utilized texture analysis and machine learning classifiers to distinguish between pothole and non-pothole pavement textures, achieving improved accuracy through parameter optimization [12]. Ryu et al. further advanced this field by proposing an image-based pothole detection system that integrates various features for enhanced detection performance, although it requires more processing time compared to simpler methods [13]. More recent approaches, such as those reviewed by Ma et al., highlight the evolution of computer vision techniques from classical 2D image processing to 3D point cloud modeling, emphasizing the effectiveness of convolutional neural networks (CNNs) in achieving high detection accuracy [14]. However, these vision-based methods are often sensitive to environmental conditions, such as lighting and surface water, which can hinder detection accuracy [15].
Sensor-based methods typically involve the use of accelerometers and other vibration sensors to detect potholes based on the physical responses of vehicles traversing affected areas. For instance, vibration-based methods have been shown to effectively identify road anomalies by analyzing the signals produced when vehicles pass over potholes [16]. Although these methods can provide direct measurements of road conditions, they may miss detections if the vehicle does not directly traverse the pothole, leading to potential gaps in data [17]. Additionally, some studies have explored the integration of sensor data with image processing techniques to enhance detection capabilities, combining the strengths of both approaches [18].
Deep learning has emerged as a powerful tool for pothole detection, particularly through the application of CNNs. Recent studies, such as those by Dewangan and Sahu, have demonstrated the effectiveness of CNNs in achieving high precision and recall rates for pothole detection, outperforming traditional methods [19]. Furthermore, the YOLO (You Only Look Once) framework has gained traction for its ability to perform real-time detection, allowing for rapid identification and classification of potholes in various conditions [20]. The adaptability of deep learning models to different datasets and their capacity for continuous learning make them particularly promising for future pothole detection systems [21]. However, challenges remain in terms of data quality and the need for extensive training datasets to ensure robust performance across diverse environments [22].

2.2. Multi-Criteria Decision Making

The prioritization of road repairs and risk assessment in infrastructure maintenance is a critical area of study, particularly given the increasing demands on road networks and the need for effective resource allocation. Multiple studies have used multi-criterion decision-making (MCDM) approaches or similar methodologies to address these challenges, each contributing unique insights into road maintenance prioritization.
One notable study by Orugbo et al. utilized a hybrid model combining Reliability-Centered Maintenance (RCM) and the Analytic Hierarchy Process (AHP) to prioritize maintenance for trunk road networks. This approach allowed for a systematic analysis of risks associated with road defects, enabling decision-makers to develop suitable preventive maintenance strategies [23]. The integration of AHP facilitated the decomposition of complex maintenance decisions into manageable components, allowing for a more nuanced understanding of conflicting objectives and multi-criteria evaluations. Similarly, Agabu’s research focused on sustainable prioritization of public asphalt-paved road maintenance, emphasizing the need for a robust framework that incorporates various factors such as road condition, traffic levels, safety, and environmental considerations [24]. This study highlights the complexity of decision-making in road maintenance, where multiple criteria must be balanced to achieve equitable outcomes under budget constraints.
Bikam’s work on logistical support for road maintenance in Vhembe district municipalities underscores the importance of planned maintenance in reducing road accidents and disaster risks. By utilizing Geographic Information Systems (GIS) for monitoring and planning, the study advocates for a proactive approach to road maintenance that can lead to significant long-term savings and enhanced safety [25]. This aligns with the broader trend of employing data-driven methodologies to inform maintenance decisions. In another study, Adnyana and Sudarsana applied the STEPLE method for risk analysis in road maintenance projects in Bali. This method assesses the potential negative impacts on stakeholders and the environment during construction, emphasizing the need for comprehensive risk management strategies in infrastructure projects [26]. Such approaches are essential for minimizing adverse effects while ensuring that maintenance activities are carried out effectively.
Augeri et al. proposed an interactive multiobjective optimization approach for urban pavement maintenance, combining the Interactive Multiobjective Optimization (IMO) with the Dominance-based Rough Set Approach (DRSA). This innovative framework allows for the consideration of multiple objectives and constraints, facilitating a more effective decision-making process in road maintenance management [27]. The ability to incorporate stakeholder preferences into the optimization process enhances the relevance and applicability of the maintenance strategies developed. Moreover, the study by Lungu introduced a Score Card Utility Matrix for prioritizing asphalt-paved road maintenance projects, illustrating the complexity of decision-making in this domain. This matrix allows for a structured evaluation of various criteria, aiding local and international road authorities in making informed prioritization decisions [28].
MCDM has also been applied outside road maintenance. One study uses multi-criteria decision-making models in a real-time scoring method for satellite imaging attempts, taking into account variables such as cloud cover, customer priority, and image quality standards [29]. Another proposes a standardization and selection framework for real-time image dehazing algorithms in multi-foggy settings, based on fuzzy Delphi and hybrid multi-criteria analysis techniques [30].

2.3. Limitations of Existing Studies

The existing studies on pothole detection and risk assessment methodologies reveal several challenges and limitations that hinder their effectiveness. These limitations can be categorized into issues related to depth estimation, integration with risk assessment models, and the overall robustness of detection methods.
Many current pothole detection methods, particularly those based on image processing and computer vision, struggle with accurately estimating the depth of potholes. For instance, while some studies utilize 2D imaging techniques, they often fail to provide comprehensive depth information, which is critical for assessing the severity of road anomalies and planning maintenance strategies [31]. Wang et al. highlighted that traditional methods relying on single thresholds for detection often yield high false positives, which can obscure the true condition of the road surface [32]. Without accurate depth estimation, maintenance prioritization may be misguided, leading to either over-investment in minor issues or neglect of more severe problems.
Another significant limitation is the insufficient integration of pothole detection systems with comprehensive risk assessment models. Many existing approaches focus solely on detection without considering the broader implications of potholes on road safety and infrastructure resilience. For example, while Dewangan and Sahu’s model achieved promising detection rates, it did not incorporate risk factors associated with pothole impacts on vehicle safety or infrastructure longevity [33]. Similarly, Koch and Brilakis emphasized the need for machine-learning techniques to classify pavement textures but did not address how these classifications could inform risk assessments or maintenance strategies [33]. The lack of a holistic approach that combines detection with risk evaluation can lead to suboptimal decision-making in road maintenance.
Real-time detection capabilities are essential for effective pothole management, yet many methods face challenges in processing speed and accuracy. Ryu et al. noted that their proposed method required significant processing time, which could hinder its application in real-time scenarios [34]. This limitation is compounded by the need for extensive data pre-processing and feature extraction, which can delay the detection process and reduce the system’s responsiveness to emerging road hazards. Additionally, the reliance on high-quality images and favorable environmental conditions can further limit the effectiveness of these systems, as adverse weather or poor lighting can significantly impact detection accuracy [35,36].
Many advanced detection methods, such as those utilizing stereo vision or deep learning algorithms, require sophisticated hardware and software setups that may not be feasible for all municipalities or road maintenance authorities. For instance, while stereo vision techniques can provide 3D measurements, they necessitate complex calibration processes and high computational power, which may not be readily available in all contexts [37]. This reliance on advanced technologies can create disparities in the implementation of pothole detection systems, particularly in resource-limited settings.

3. Methodology

3.1. System Overview

Figure 1 gives a general overview of the proposed pothole detection system, showing how ensemble learning is performed and how MCDM is applied to the pothole detection problem. The details of each process are explained in the following sections.

3.2. YOLOv9 Model for Pothole Detection

YOLOv9, released in early 2024, marks a substantial advance in real-time object detection. The model builds on its predecessor, YOLOv8, by addressing issues such as vanishing gradients and information bottlenecks, and by improving the balance between model size and detection accuracy. Compared to YOLOv8, YOLOv9 achieves a 49% reduction in parameters and a 43% reduction in computing requirements while improving accuracy by 0.6% [38]. In this study, a total of 5477 samples, including augmented samples, were used to train the YOLOv9 instance segmentation model. The augmentation techniques and the training, validation, and test split used in this study are as follows:
Augmentations
  • Outputs per training example: 3
  • Rotation: between -15° and +15°
  • Shear: ±10° horizontal, ±10° vertical
Dataset Splitting
  • train_set = 5477 images (82%)
  • valid_set = 608 images (9%)
  • test_set = 608 images (9%)
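As a quick arithmetic check, the reported split percentages can be recomputed from the image counts. This is only a sketch of the calculation; the variable names are illustrative and not taken from the study's code.

```python
# Split sizes as reported in the text.
split_sizes = {"train_set": 5477, "valid_set": 608, "test_set": 608}

total_images = sum(split_sizes.values())

# Share of each split, rounded to whole percent as in the text.
split_percent = {
    name: round(100 * count / total_images)
    for name, count in split_sizes.items()
}

print(total_images)
print(split_percent)
```

The rounded shares agree with the 82% / 9% / 9% figures listed above.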

3.3. Mask R-CNN

Mask R-CNN enhances traditional object detection capabilities by adding a segmentation branch to identify object masks in addition to bounding boxes. This capability is particularly beneficial for accurately delineating potholes from the surrounding road surfaces, providing more detailed information essential for effective decision-making in infrastructure management [39]. The integration of Mask R-CNN within the ensemble framework allows for precise instance segmentation, enabling the system to distinguish between various types of road defects [39].

3.4. Final Algorithm

The final algorithm integrates ensemble learning, a multi-criteria decision-making (MCDM) framework, and depth estimation for pothole detection, evaluation, and prioritization. The key steps of the algorithm are outlined below:
1. Input:
  • Source: Image or video frame.
  • Models: YOLOv9 and Mask R-CNN for ensemble learning.
  • Camera Parameters:
    • $H$: Camera height from the ground.
    • $\theta$: Camera angle relative to the ground.
2. Model Outputs:
  • YOLOv9 outputs:
    $\{B_Y, C_Y, K_Y\}$
    where $B_Y$ are bounding boxes, $C_Y$ are confidence scores, and $K_Y$ are classes.
  • Mask R-CNN outputs:
    $\{M_M, B_M, C_M\}$
    where $M_M$ are instance masks, $B_M$ are bounding boxes, and $C_M$ are confidence scores.
3. Intersection over Union (IoU): To compare overlapping detections:
$$\mathrm{IoU} = \frac{|B_Y \cap B_M|}{|B_Y \cup B_M|}$$
where $B_Y$ and $B_M$ are bounding boxes from YOLOv9 and Mask R-CNN, respectively.
4. Dynamic Weight Calculation: For each overlapping detection:
  • Compute dynamic weights based on confidence scores and depth:
    $$w_Y = \frac{C_Y \cdot D_Y}{C_Y \cdot D_Y + C_M \cdot D_M}, \qquad w_M = \frac{C_M \cdot D_M}{C_Y \cdot D_Y + C_M \cdot D_M}$$
    where $w_Y$ and $w_M$ are the dynamic weights for YOLOv9 and Mask R-CNN, respectively.
5. Confidence Aggregation: Combine confidence scores dynamically as:
$$C_E = w_Y \cdot C_Y + w_M \cdot C_M$$
6. Final Detection Decision: A pothole is confirmed if:
$$C_E \geq \alpha$$
where $\alpha$ is a predefined confidence threshold.
7. Depth Estimation:
(a) Extract the largest contour of the pothole mask.
(b) Compute shadow intensity and relative shadow area $R$.
(c) Calculate depth:
$$\mathrm{Depth} = H \cdot \tan(\theta) \cdot R$$
(d) Overlay the estimated depth on the detected pothole.
8. Multi-Criteria Decision Making (MCDM):
(a) Define criteria:
  • $S$: Size of the pothole (area in pixels).
  • $C$: Aggregated confidence score.
  • $L$: Location proximity to road center.
  • $D$: Depth of the pothole (from depth estimation).
(b) Normalize criteria:
$$X_{ij} = \frac{x_{ij} - \min(x_j)}{\max(x_j) - \min(x_j)}$$
where $X_{ij}$ is the normalized value for criterion $j$ of pothole $i$.
(c) Compute weighted score:
$$P_i = \sum_{j=1}^{n} w_j \cdot X_{ij}$$
where $P_i$ is the priority score for pothole $i$, and $w_j$ are the weights for criteria.
9. Evaluation Metrics:
(a) Circularity for shape verification:
$$\mathrm{Circularity} = \frac{4\pi \cdot \mathrm{Area}}{\mathrm{Perimeter}^2}$$
(b) Size measurement:
$$A = \sum_{(x,y) \in M_E} 1$$
(c) Centroid and location:
$$x_c = \frac{\sum_{(x,y) \in M_E} x}{A}, \qquad y_c = \frac{\sum_{(x,y) \in M_E} y}{A}$$
10. Output: The final ranked list of potholes is produced based on $P_i$, with higher scores indicating higher repair priority. Depths are displayed alongside confidence and shape metrics.
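The fusion part of the algorithm (steps 3 to 6) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The `Detection` class, the `depth_term` field standing in for the depth-related factors in the dynamic weights, the equal-weight fallback, and the values `iou_min=0.5` and `alpha=0.3` are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Detection:
    box: Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels
    confidence: float                       # model confidence in [0, 1]
    depth_term: float                       # hypothetical stand-in for D_Y or D_M


def box_iou(a, b) -> float:
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def dynamic_weights(yolo: Detection, mask: Detection) -> Tuple[float, float]:
    """Weights proportional to confidence times depth term; they sum to 1."""
    s_y = yolo.confidence * yolo.depth_term
    s_m = mask.confidence * mask.depth_term
    total = s_y + s_m
    if total == 0:
        return 0.5, 0.5  # assumption: equal weights when both terms vanish
    return s_y / total, s_m / total


def fuse(yolo: Detection, mask: Detection,
         iou_min: float = 0.5, alpha: float = 0.3) -> Optional[Tuple[float, bool]]:
    """Return (C_E, confirmed), or None if the boxes do not overlap enough."""
    if box_iou(yolo.box, mask.box) < iou_min:
        return None
    w_y, w_m = dynamic_weights(yolo, mask)
    c_e = w_y * yolo.confidence + w_m * mask.confidence
    return c_e, c_e >= alpha


# Hypothetical pair of overlapping detections of the same pothole.
yolo_det = Detection(box=(10, 10, 50, 50), confidence=0.35, depth_term=1.0)
mask_det = Detection(box=(12, 12, 52, 52), confidence=0.60, depth_term=1.0)
print(fuse(yolo_det, mask_det))
```

Because the weights are proportional to each model's own confidence, the aggregated score is pulled toward the more confident model, so a weak YOLOv9 detection can still be confirmed when Mask R-CNN agrees strongly.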

4. Results and Discussion

Figure 2. Result Graphs
The training and validation results for the YOLOv9e instance segmentation model show effective learning and stable performance. The smoothed curves for the training losses (box, segmentation, classification, and distribution focal loss) decrease steadily, suggesting consistent improvement in object localization, segmentation, and classification. The validation losses follow a similar pattern, although a modest rise in segmentation loss towards the later epochs signals potential overfitting, which could be addressed with additional regularization or early stopping. Precision, recall, and mean Average Precision (mAP) for bounding boxes and masks increase steadily and then plateau, indicating that the model has learned to detect and segment potholes. Further tuning may improve segmentation performance by addressing the potential overfitting seen in the validation loss.
The confusion matrix gives a detailed evaluation of the YOLOv9e model’s ability to detect potholes. The model correctly classified 1,932 true potholes, demonstrating its capacity to detect actual cases. However, it classified 1,548 genuine potholes as background, a high number of false negatives, meaning that many potholes were missed during detection. The matrix records no correct background predictions: background instances were either not predicted or mistaken for potholes. Furthermore, 1,051 background instances were classified as potholes, resulting in false positives. These findings show that, while the model is capable of identifying potholes, it struggles to separate potholes from background. This highlights the need for additional model optimization, notably in reducing false negatives and false positives, to improve its usefulness in real-world conditions.
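The counts reported in the confusion matrix can be turned into precision, recall, and F1 for the pothole class using the standard definitions. The sketch below uses only the three counts quoted in the text; the variable names are illustrative.

```python
# Counts reported for the pothole class in the confusion matrix.
true_positives = 1932   # potholes detected as potholes
false_negatives = 1548  # potholes predicted as background
false_positives = 1051  # background predicted as potholes

precision = true_positives / (true_positives + false_positives)
recall = true_positives / (true_positives + false_negatives)
f1_score = 2 * precision * recall / (precision + recall)

print(f"precision={precision:.3f} recall={recall:.3f} f1={f1_score:.3f}")
```

These counts correspond to a precision of about 0.65, a recall of about 0.56, and an F1 of about 0.60. The values need not match the curve-based figures reported elsewhere, since those may be computed at different confidence thresholds.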
Figure 4 depicts the Precision-Confidence Curve, which shows the relationship between precision and the confidence threshold for pothole detection. As the confidence threshold rises, the model’s precision gradually improves, reflecting fewer false-positive detections. At a confidence level of 0.908, the model achieves a precision of 1.00 for all classes, indicating that it predicts only true positives at higher thresholds. This trend shows that the model can make highly reliable detections when a stricter confidence threshold is set. The graph also shows that precision begins relatively low at lower thresholds but steadily increases, implying that the model initially includes a higher number of incorrect predictions that are filtered out as the threshold becomes stricter. This curve is useful for identifying the confidence level that best balances precision and recall in practical applications.
Figure 3. Confusion Matrix Result
Figure 4. Precision-Confidence Curve
The Recall-Confidence Curve in Figure 5 assesses the model’s ability to detect potholes at various confidence levels and shows how recall changes as the confidence threshold increases. At low confidence levels, recall is higher (about 0.81 for all classes at a confidence level of 0.0), showing that the model detects the majority of potholes. However, as the confidence threshold grows, recall declines: the model becomes stricter in its detections and may miss some potholes. This behavior reflects the trade-off between recall and confidence, with lower thresholds favoring higher recall and higher thresholds emphasizing precision. The trend also shows the model’s general sensitivity, as it retains a moderate recall even at mid-level confidence thresholds, making it suitable for applications that require wide detection coverage.
The Precision-Recall (PR) curve summarizes the YOLOv9e model’s pothole detection capabilities. The graph shows a smooth trade-off between precision and recall, with an overall mean Average Precision (mAP) of 0.556 at an IoU threshold of 0.5. This suggests a balanced detection capability that limits false positives while maintaining a fair recall rate. The gradual decline of the PR curve indicates that the model behaves consistently across different confidence thresholds. However, further tuning may improve precision at higher recall values, thereby increasing overall robustness.
Figure 6. Precision-Confidence Curve
Figure 7. F1-Confidence Curve
The F1-score for all classes, calculated at a confidence level of 0.282, is 0.58. This reflects the YOLOv9e model’s balance between precision and recall, with a slight trade-off between the two. The F1-score represents the model’s ability to detect potholes while producing an acceptable number of false positives and false negatives. This score indicates that the model performs reasonably well, but there is room to improve detection accuracy and reliability in practice.
Figure 8 illustrates the masking validation of the test set. The results show that some potholes have a low confidence score of 0.5. In the proposed pothole detection system, YOLOv9 is used to predict potholes with lower confidence scores, which are then further filtered by the proposed algorithm.
Figure 9 illustrates the masking validation results after integrating the MCDM algorithm, which allows detection of objects with low confidence scores. Detection accuracy increases because potholes that YOLOv9 alone would miss, having received low confidence scores, are now retained. To achieve this, the prediction parameter was adjusted to allow predictions with confidence scores as low as 0.3. The proposed algorithm was then applied to minimize false positives, as low confidence scores can also lead to incorrect detections.
Figure 10 shows the new confusion matrix obtained with ensemble learning and the MCDM criteria. The result shows an estimated 20% increase in accuracy, driven by the increase in true positive pothole detections.
Figure 11. Improved F1-curve
The new F1-Confidence curve shows a well-balanced trade-off between precision and recall. This suggests that applying ensemble learning and the MCDM (Multi-Criteria Decision-Making) criteria does not result in overfitting. Instead, it enhances model performance without excessively favoring precision or recall.
Higher precision across confidence levels means that the model produces fewer false positive predictions at every threshold. By merging several decision boundaries through ensemble learning, the model lowers its prediction uncertainty. The integration of MCDM ensures that decisions are informed by a variety of criteria (e.g., confidence, true positive rates, or context-specific parameters). The smooth and consistently higher precision observed across all thresholds suggests that the model retains its robustness and generalizability.
However, applying overly specific custom criteria to fine-tune the model could lead to overfitting, as it may bias the model towards particular data characteristics.
Figure 12. IoU Threshold Sensitivity Analysis
The IoU threshold sensitivity analysis examines how the Intersection over Union (IoU) threshold affects detection accuracy. The results, presented in a bar chart, illustrate the trends for precision, recall, and F1 score across different IoU thresholds, such as 0.3, 0.5, and 0.7. At lower IoU thresholds, like 0.3, recall increases while precision decreases, as more detections are considered true positives. Conversely, higher IoU thresholds, such as 0.7, improve precision but reduce recall, which may lead to a decrease in the F1 score.
In the proposed algorithm, which combines ensemble learning (YOLOv9 and Mask R-CNN) with Multi-Criteria Decision-Making (MCDM), the IoU threshold plays a crucial role in balancing detection accuracy and decision-making efficiency. A lower IoU threshold increases recall, enabling the ensemble to detect more potential potholes, but it may introduce false positives that could skew the MCDM process by prioritizing irrelevant or low-confidence detections. Conversely, a higher IoU threshold improves precision by focusing on high-confidence detections, reducing false positives but potentially missing smaller or less distinct potholes.
The choice of IoU threshold has practical implications for the system’s operational goals. For proactive road maintenance, where comprehensive detection is critical, a lower IoU threshold may be preferable. For real-time interventions or critical repairs, where accuracy and reliability are paramount, a higher IoU threshold would be more effective. By dynamically tuning the IoU threshold, or by using MCDM to weigh the importance of precision versus recall in real time, the system can be adapted to specific use cases.
Figure 13. Dynamic Weight Sensitivity Analysis
The dynamic weight sensitivity analysis compares performance metrics, including precision, recall, and F1 score, across different weight configurations, such as $w_Y = 0.4$, $w_M = 0.6$ and $w_Y = 0.6$, $w_M = 0.4$. The results indicate that balanced weights ($w_Y = 0.5$, $w_M = 0.5$) yield the best overall performance, achieving the highest F1 score. Configurations that slightly favor YOLOv9 ($w_Y = 0.6$, $w_M = 0.4$) improve precision but reduce recall, while favoring Mask R-CNN ($w_Y = 0.4$, $w_M = 0.6$) has the opposite effect, prioritizing recall over precision. These results highlight the importance of tuning the weight configuration to the specific requirements of the application. For scenarios that demand higher precision, such as detecting potholes in critical areas for real-time intervention, slightly favoring YOLOv9 is advantageous. Conversely, in applications prioritizing comprehensive detection, such as large-scale road maintenance planning, emphasizing Mask R-CNN can enhance recall. The integration of MCDM further refines this process by allowing weights to be adjusted dynamically based on the real-time trade-off between precision and recall. This adaptive approach keeps the framework usable across various use cases.
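To make the effect of the three weight configurations concrete, the sketch below applies the aggregation rule $C_E = w_Y \cdot C_Y + w_M \cdot C_M$ to one pair of confidence scores. The scores 0.35 and 0.60 are hypothetical values chosen for illustration, not results from the study, and the function name is likewise an assumption.

```python
def aggregate_confidence(c_yolo, c_mask, w_yolo, w_mask):
    """Weighted confidence C_E; the two weights are expected to sum to 1."""
    if abs(w_yolo + w_mask - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return w_yolo * c_yolo + w_mask * c_mask


# Hypothetical confidences for one pothole seen by both models.
c_yolo, c_mask = 0.35, 0.60

# The three weight configurations discussed in the text: (w_Y, w_M).
configurations = [(0.4, 0.6), (0.5, 0.5), (0.6, 0.4)]

aggregated = {
    (w_y, w_m): aggregate_confidence(c_yolo, c_mask, w_y, w_m)
    for w_y, w_m in configurations
}

for (w_y, w_m), c_e in aggregated.items():
    print(f"w_Y={w_y} w_M={w_m} -> C_E={c_e:.3f}")
```

In this example, shifting weight toward the less confident model lowers the aggregated score, so the same detection may fall on either side of the threshold $\alpha$ depending on the configuration.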
The observed variations in performance across different camera setups directly reflect the interplay between the ensemble learning framework (YOLOv9 and Mask R-CNN) and the Multi-Criteria Decision-Making (MCDM) component in the proposed algorithm.
Figure 14. Performance Metrics Across Camera Angles

4.1. Close-Up Camera Footage

Close-up footage provided the highest recall and F1 score due to the ensemble learning’s ability to leverage the detailed features in the imagery. YOLOv9 likely excelled in detecting bounding boxes with high confidence, while Mask R-CNN provided precise segmentation masks that captured finer structural details of potholes. The MCDM framework benefited from the detailed input by scoring potholes more accurately based on their size ($S$) and proximity to the road center ($L$), as the close-up perspective minimized noise and irrelevant features. However, the slight drop in precision suggests that the algorithm occasionally over-prioritized detections due to high aggregated confidence scores ($C_E$) in scenarios where non-pothole features shared similar characteristics.

4.2. Low-Angle Footage

The low-angle footage maintained high precision, showing the robustness of the ensemble learning’s confidence aggregation mechanism. YOLOv9 and Mask R-CNN likely assigned lower confidence scores to ambiguous areas, resulting in fewer false positives. However, the decrease in recall indicates that some potholes, particularly smaller or partially obscured ones, were missed due to perspective distortion. The MCDM framework’s ability to prioritize detections based on size and shape (S) was challenged in this scenario, as oblique angles made it harder to accurately calculate these features, reducing the overall F1 score.

4.3. Wide Field-of-View (FOV) Footage

With a wider FOV, the ensemble learning models struggled to maintain high precision and recall due to reduced detail in individual potholes. YOLOv9’s bounding box detections likely overlapped more with irrelevant areas, and Mask R-CNN’s segmentation masks became less accurate. The MCDM framework also faced challenges in accurately normalizing and scoring criteria like size (S) and confidence ($C_E$) due to the lower quality of feature extraction. This scenario highlights the trade-off between covering a larger area and retaining the accuracy of the detection process.

4.4. Skewed or Tilted Angles

Skewed angles significantly impacted the performance of the proposed algorithm. The ensemble learning models struggled to extract meaningful features, as YOLOv9’s bounding boxes and Mask R-CNN’s segmentation masks were distorted, reducing confidence scores ($C_Y$ and $C_M$). The MCDM framework’s prioritization criteria, especially shape circularity and size (S), were particularly affected, as the distorted views altered the calculated metrics. Consequently, the system produced the lowest precision, recall, and F1 scores, demonstrating the importance of stable and well-aligned camera angles for accurate detections.
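The effect of a tilted view on a shape criterion can be sketched numerically. The example below assumes the common circularity definition $4\pi A / P^2$ (the section does not give the exact formula used) and models an oblique view crudely as a compression of one image axis by $\cos(\text{tilt})$. All function names are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def area_and_perimeter(outline: List[Point]) -> Tuple[float, float]:
    """Shoelace area and edge-length perimeter of a closed polygon."""
    twice_area = 0.0
    perimeter = 0.0
    n = len(outline)
    for i in range(n):
        x1, y1 = outline[i]
        x2, y2 = outline[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1
        perimeter += math.hypot(x2 - x1, y2 - y1)
    return abs(twice_area) / 2.0, perimeter


def circularity(outline: List[Point]) -> float:
    """Assumed definition: 4*pi*A / P^2 (1.0 for a perfect circle)."""
    area, perimeter = area_and_perimeter(outline)
    return 4.0 * math.pi * area / perimeter ** 2


def circle_outline(radius: float, n: int = 720) -> List[Point]:
    """Sample n points on a circle, standing in for a round pothole mask."""
    return [
        (radius * math.cos(2 * math.pi * k / n), radius * math.sin(2 * math.pi * k / n))
        for k in range(n)
    ]


def foreshorten(outline: List[Point], tilt_deg: float) -> List[Point]:
    """Crude stand-in for an oblique camera: compress y by cos(tilt)."""
    scale = math.cos(math.radians(tilt_deg))
    return [(x, y * scale) for x, y in outline]


def undo_foreshortening(outline: List[Point], tilt_deg: float) -> List[Point]:
    """Hypothetical correction when the tilt is known: stretch y back."""
    scale = math.cos(math.radians(tilt_deg))
    return [(x, y / scale) for x, y in outline]


round_pothole = circle_outline(1.0)
for tilt in (0.0, 30.0, 60.0):
    seen = foreshorten(round_pothole, tilt)
    restored = undo_foreshortening(seen, tilt)
    print(f"tilt={tilt:4.1f} observed={circularity(seen):.3f} corrected={circularity(restored):.3f}")
```

Under these assumptions the same round pothole scores a lower circularity as the tilt grows, and the apparent area shrinks as well, which is consistent with the distortion of the size and shape criteria described above. A real camera would require a full perspective model rather than a single axis scale.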

4.5. How the Proposed Algorithm Responds

The experiment reveals that the dynamic weight calculation and confidence aggregation in the ensemble learning component allow the algorithm to adapt reasonably well to different scenarios, but its effectiveness is highly dependent on the quality of the input data. Similarly, the MCDM framework’s prioritization depends on accurate feature extraction, which can be compromised under suboptimal camera setups. For instance:
  • Close-up footage allows both YOLOv9 and Mask R-CNN to generate reliable outputs, maximizing the effectiveness of the MCDM framework in scoring and ranking detections.
  • In skewed or wide-angle footage, the ensemble’s aggregated confidence scores ($C_E$) and MCDM’s prioritization ($P_i$) are less reliable, reducing the system’s overall performance.
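To make the dependence of $P_i$ on its inputs concrete, the following is a minimal sketch of a weighted-sum MCDM score. It assumes min-max normalization and a simple additive model over size (S), proximity to the road center (L), and aggregated confidence ($C_E$). The criterion weights, the `Detection` fields, and the sample values are hypothetical, and the framework's actual scoring rule may differ.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Detection:
    name: str
    size: float        # S: mask area in pixels (hypothetical unit)
    centrality: float  # L: 1.0 at the road center, 0.0 at the edge
    confidence: float  # C_E: aggregated ensemble confidence


def min_max(values: Sequence[float]) -> List[float]:
    """Scale values to [0, 1]; a constant column maps to 1.0 everywhere."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def prioritize(
    detections: List[Detection],
    weights: Tuple[float, float, float] = (0.4, 0.3, 0.3),  # assumed weights
) -> List[Tuple[Detection, float]]:
    """Return (detection, P_i) pairs sorted from highest to lowest priority."""
    w_s, w_l, w_c = weights
    sizes = min_max([d.size for d in detections])
    centers = min_max([d.centrality for d in detections])
    confs = min_max([d.confidence for d in detections])
    scored = [
        (d, w_s * s + w_l * l + w_c * c)
        for d, s, l, c in zip(detections, sizes, centers, confs)
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical detections from a single frame.
frame = [
    Detection("A", size=5000.0, centrality=0.9, confidence=0.8),
    Detection("B", size=1000.0, centrality=0.2, confidence=0.6),
    Detection("C", size=3000.0, centrality=0.5, confidence=0.9),
]

for det, score in prioritize(frame):
    print(f"{det.name}: P = {score:.3f}")
```

Because every criterion is normalized against the other detections in the frame, an error in one extracted feature (for example a mask area shrunk by a skewed view) shifts the ranking of all detections, not only the affected one.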
To evaluate the weaknesses of the proposed algorithm, the weights of the defined criteria were dynamically adjusted, and the model was tested on unseen data using the ensemble framework of YOLOv9 and Mask R-CNN. As shown in Figure 15, the results indicate signs of overfitting, with the model becoming overly specific to patterns in the training data. This is evident from the confusion matrix, where the detection of 'pothole' dominates, leading to poor generalization for the 'background' class. The F1-confidence curve (Figure 16) shows the same limitation: its steep and narrow peak suggests that the model performs well only within a specific confidence range and poorly outside of it.
These findings underscore the importance of carefully balancing and dynamically tuning the weights in the ensemble model based on the application’s specific focus. For example, configurations favoring YOLOv9 ($w_Y = 0.6$, $w_M = 0.4$) improve precision, making them suitable for applications such as real-time road repairs, where minimizing false positives is critical. Conversely, configurations favoring Mask R-CNN ($w_Y = 0.4$, $w_M = 0.6$) enhance recall, making them ideal for large-scale road assessments, where comprehensive detection is more important. Balanced weights ($w_Y = 0.5$, $w_M = 0.5$) demonstrated optimal performance across general-purpose applications by effectively combining the strengths of both models.
Integrating dynamic weight adjustment into the ensemble framework allows the system to optimize its performance in real-time, adapting to trade-offs between precision and recall depending on the application’s requirements. Furthermore, coupling the ensemble model with Multi-Criteria Decision Making (MCDM) enables informed prioritization, ensuring the system’s outputs align with operational goals. To address overfitting, strategies such as reducing the number of criteria, implementing regularization techniques, and enhancing the diversity of the training data should be considered. This adaptive and flexible approach ensures the proposed algorithm remains robust and effective across various use cases.
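The narrow F1-confidence peak discussed above can be quantified with a short sketch. It computes F1 over a grid of confidence thresholds and reports the span of thresholds whose F1 stays within a chosen fraction of the best value. The function names, the 0.9 fraction, and the toy scores are assumptions made for illustration, not part of the evaluation in this study.

```python
from typing import List, Sequence, Tuple


def f1_at_threshold(scores: Sequence[float], labels: Sequence[bool], threshold: float) -> float:
    """F1 when every score >= threshold is counted as a pothole."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and not y)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def f1_curve(
    scores: Sequence[float], labels: Sequence[bool], steps: int = 101
) -> List[Tuple[float, float]]:
    """(threshold, F1) pairs on an evenly spaced grid over [0, 1]."""
    thresholds = [i / (steps - 1) for i in range(steps)]
    return [(t, f1_at_threshold(scores, labels, t)) for t in thresholds]


def peak_span(curve: List[Tuple[float, float]], fraction: float = 0.9) -> Tuple[float, float, float]:
    """Best threshold, best F1, and span of thresholds with F1 >= fraction * best."""
    best_t, best_f1 = max(curve, key=lambda pair: pair[1])
    near_best = [t for t, f1 in curve if f1 >= fraction * best_f1]
    return best_t, best_f1, max(near_best) - min(near_best)


# Hypothetical validation scores and labels.
toy_scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
toy_labels = [True, True, True, False, False, False]

best_t, best_f1, span = peak_span(f1_curve(toy_scores, toy_labels))
print(f"best threshold={best_t:.2f} best F1={best_f1:.2f} near-best span={span:.2f}")
```

A small span relative to the full threshold range would indicate that performance depends on a finely tuned confidence threshold, which is one way to track the overfitting symptom when comparing training strategies.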
Figure 15. Overfitting Confusion Matrix
Figure 16. Overfitting F1-Curve

5. Conclusions

Rather than only proposing a new method, this study aimed to advance pothole detection with computer vision by improving both detection accuracy and prioritization. To that end, it integrated a Multi-Criteria Decision Making (MCDM) framework with ensemble learning (YOLOv9 and Mask R-CNN). The experimental findings showed that combining Mask R-CNN for more thorough segmentation with YOLOv9 for fast detection produced a more dependable detection system. A key contribution to the prioritization of important potholes was the use of low confidence thresholds. This strategy allowed high-severity defects to be detected even when less strict criteria were applied, leading to a better understanding of pothole distribution and severity with depth estimation. The findings suggest that integrating these approaches can improve the efficiency of pothole detection and repair prioritization, contributing to more effective road maintenance strategies.
Trained on 5477 annotated pothole samples, the system achieved a mean Average Precision (mAP) of 0.935 at 0.5 IoU and an F1-score of 0.94 at a confidence level of 0.576. The proposed method also demonstrated a potential 20% increase in the accuracy of detecting critical potholes, supporting reliable identification of high-priority road defects, with the drawback that its performance depends on the intended use case.
Future work should enhance the dynamic weighting mechanism in the ensemble learning framework so that it adjusts more effectively to varying levels of detail and distortion in the input footage. An angle correction factor could be introduced within the MCDM framework to account for distortions in criteria such as size (S) and circularity in oblique or skewed footage. Cameras used in real-world implementations should be positioned at optimal angles (close-up or low-angle) to provide the best input for YOLOv9, Mask R-CNN, and MCDM scoring. This study emphasizes the need for thoughtful camera placement to achieve maximum detection performance.

Acknowledgments

We would like to thank the Center for Cloud Computing, Big Data and Artificial Intelligence of Cebu Technological University and College of Computing, Artificial Intelligence and Sciences of Cebu Normal University for the funding support of this study.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:
Mask R-CNN Mask Region-based Convolutional Neural Network
MCDM Multi-criteria Decision Making
YOLO You Only Look Once

References

  1. F. Ali, Z. Khan, K. Khattak, & T. Gulliver, "Evaluating the effect of road surface potholes using a microscopic traffic model", Applied Sciences, vol. 13, no. 15, p. 8677, 2023. [CrossRef]
  2. "Tracking of potholes and measurement of noise and illumination level in roadways", International Journal of Recent Technology and Engineering, vol. 8, no. 4, p. 992-997, 2019. [CrossRef]
  3. "Road surface guard: ai paved safety", Interantional Journal of Scientific Research in Engineering and Management, vol. 07, no. 12, p. 1-17, 2023. [CrossRef]
  4. Dana Mohammed Ali and Haval A. Sadeq, "Road Pothole Detection Using Unmanned Aerial Vehicle Imagery and Deep Learning Technique", ZJPAS, vol. 34, no. 6, pp. 107–115, Dec. 2022. [CrossRef]
  5. S. Park, V. Tran, & D. Lee, "Application of various yolo models for computer vision-based real-time pothole detection", Applied Sciences, vol. 11, no. 23, p. 11229, 2021. [CrossRef]
  6. M. Jakubec, E. Lieskovská, B. Bučko, & K. Zábovská, "Comparison of cnn-based models for pothole detection in real-world adverse conditions: overview and evaluation", Applied Sciences, vol. 13, no. 9, p. 5810, 2023. [CrossRef]
  7. R. Bibi, Y. Saeed, A. Zeb, T. Ghazal, T. Rahman, R. Said et al., "Edge ai-based automated detection and classification of road anomalies in vanet using deep learning", Computational Intelligence and Neuroscience, vol. 2021, no. 1, 2021. [CrossRef]
  8. K. Vupparaboina, R. Tamboli, P. Shenu, & S. Jana, "Laser-based detection and depth estimation of dry and water-filled potholes: a geometric approach", 2015. [CrossRef]
  9. S. Ryu, T. Kim, & Y. Kim, "Image-based pothole detection system for its service and road management system", Mathematical Problems in Engineering, vol. 2015, p. 1-10, 2015. [CrossRef]
  10. S. Ryu, T. Kim, & Y. Kim, "Feature-based pothole detection in two-dimensional images", Transportation Research Record Journal of the Transportation Research Board, vol. 2528, no. 1, p. 9-17, 2015. [CrossRef]
  11. K. Gorro, E. Ranolo, L. Roble, and R. N. Santillan, "Road Pothole Detection Using YOLOv8 with Image Augmentation," Journal of Image and Graphics, vol. 12, no. 4, pp. 417-426, Dec. 2024. [CrossRef]
  12. C. Koch and I. Brilakis, "Pothole detection in asphalt pavement images", Advanced Engineering Informatics, vol. 25, no. 3, p. 507-515, 2011. [CrossRef]
  13. S. Ryu, T. Kim, & Y. Kim, "Image-based pothole detection system for its service and road management system", Mathematical Problems in Engineering, vol. 2015, p. 1-10, 2015. [CrossRef]
  14. N. Ma, J. Fan, W. Wang, J. Wu, Y. Jiang, L. Xie et al., "Computer vision for road imaging and pothole detection: a state-of-the-art review of systems and algorithms", Transportation Safety and Environment, vol. 4, no. 4, 2022. [CrossRef]
  15. C. Zhang, G. Li, Z. Zhang, R. Shao, M. Li, D. Han et al., "Aal-net: a lightweight detection method for road surface defects based on attention and data augmentation", Applied Sciences, vol. 13, no. 3, p. 1435, 2023. [CrossRef]
  16. S. Ryu, T. Kim, & Y. Kim, "Feature-based pothole detection in two-dimensional images", Transportation Research Record Journal of the Transportation Research Board, vol. 2528, no. 1, p. 9-17, 2015. [CrossRef]
  17. Y. Hu and T. Furukawa, "Degenerate near-planar 3d reconstruction from two overlapped images for road defects detection", Sensors, vol. 20, no. 6, p. 1640, 2020. [CrossRef]
  18. R. Bharat, A. Ikotun, A. Ezugwu, L. Abualigah, M. Shehab, & R. Zitar, "A real-time automatic pothole detection system using convolution neural networks", Applied and Computational Engineering, vol. 6, no. 1, p. 750-757, 2023. [CrossRef]
  19. D. Dewangan and S. Sahu, "Potnet: pothole detection for autonomous vehicle system using convolutional neural network", Electronics Letters, vol. 57, no. 2, p. 53-56, 2020. [CrossRef]
  20. Q. Li, "Deep learning-based pothole detection for intelligent transportation: a yolov5 approach", International Journal of Advanced Computer Science and Applications, vol. 14, no. 12, 2023. [CrossRef]
  21. M. Asad, S. Khaliq, M. Yousaf, M. Ullah, & A. Ahmad, "Pothole detection using deep learning: a real-time and ai-on-the-edge perspective", Advances in Civil Engineering, vol. 2022, no. 1, 2022. [CrossRef]
  22. M. Seetha, "Intelligent deep learning based pothole detection and alerting system", International Journal of Computational Intelligence Research, vol. 19, no. 1, p. 25-35, 2023. [CrossRef]
  23. E. Orugbo, B. Alkali, A. Silva, & D. Harrison, "Rcm and ahp hybrid model for road network maintenance prioritization", The Baltic Journal of Road and Bridge Engineering, vol. 10, no. 2, p. 182-190, 2015. [CrossRef]
  24. K. Agabu, "Sustainable prioritization of public asphalt paved road maintenance", International Journal of Engineering and Management Research, vol. 13, no. 6, p. 17-31, 2023. [CrossRef]
  25. P. Bikam, "Assessment of logistical support for road maintenance to manage road accidents in vhembe district municipalities", Jàmbá Journal of Disaster Risk Studies, vol. 11, no. 3, 2019. [CrossRef]
  26. I. Adnyana and D. Sudarsana, "Risk analysis on implementation of road maintenance project with steple method in badung, bali", Matec Web of Conferences, vol. 276, p. 02012, 2019. [CrossRef]
  27. M. Augeri, S. Greco, & V. Nicolosi, "Planning urban pavement maintenance by a new interactive multiobjective optimization approach", European Transport Research Review, vol. 11, no. 1, 2019. [CrossRef]
  28. K. Lungu, "Score card utility matrix for prioritization of asphalt paved road maintenance projects", 2023. [CrossRef]
  29. A. Vasegaard, M. Picard, F. Hennart, P. Nielsen, and S. Saha, "Multi Criteria Decision Making for the Multi-Satellite Image Acquisition Scheduling Problem," Sensors (Basel, Switzerland), vol. 20, 2020. [CrossRef]
  30. K. Abdulkareem, N. Arbaiy, A. Zaidan, B. Zaidan, O. Albahri, M. Alsalem, and M. Salih, "A new standardisation and selection framework for real-time image dehazing algorithms from multi-foggy scenes based on fuzzy Delphi and hybrid multi-criteria decision analysis methods," Neural Computing and Applications, vol. 33, pp. 1029–1054, 2020. [CrossRef]
  31. H. Wang, C. Chen, D. Cheng, C. Lin, & C. Lo, "A real-time pothole detection approach for intelligent transportation system", Mathematical Problems in Engineering, vol. 2015, p. 1-7, 2015. [CrossRef]
  32. D. Dewangan and S. Sahu, "Potnet: pothole detection for autonomous vehicle system using convolutional neural network", Electronics Letters, vol. 57, no. 2, p. 53-56, 2020. [CrossRef]
  33. C. Koch and I. Brilakis, "Pothole detection in asphalt pavement images", Advanced Engineering Informatics, vol. 25, no. 3, p. 507-515, 2011. [CrossRef]
  34. S. Ryu, T. Kim, & Y. Kim, "Image-based pothole detection system for its service and road management system", Mathematical Problems in Engineering, vol. 2015, p. 1-10, 2015. [CrossRef]
  35. J. Dib, K. Sirlantzis, & G. Howells, "A review on negative road anomaly detection methods", Ieee Access, vol. 8, p. 57298-57316, 2020. [CrossRef]
  36. S. Park, V. Tran, & D. Lee, "Application of various yolo models for computer vision-based real-time pothole detection", Applied Sciences, vol. 11, no. 23, p. 11229, 2021. [CrossRef]
  37. Y. Li, C. Papachristou, and D. Weyer, "Road pothole detection system based on stereo vision," 2018. [CrossRef]
  38. M. Yaseen, "What is YOLOv9: An In-Depth Exploration of the Internal Features of the Next-Generation Object Detector," arXiv, arXiv:2409.07813, Sep. 2024. [Online]. Available: https://arxiv.org/abs/2409.07813.
  39. J. R. Terven and D. M. Cordova-Esparza, "A Comprehensive Review of YOLO Architectures in Computer Vision: From YOLOv1 to YOLOv8 and YOLO-NAS," arXiv, arXiv:2304.00501, Jan. 2024. [Online]. Available: https://arxiv.org/abs/2304.00501.
  40. V. Belton and T. Stewart, Multiple Criteria Decision Analysis: An Integrated Approach. Berlin, Germany: Springer Science & Business Media, 2012.
Figure 1. System Overview
Figure 5. Recall-Confidence Curve
Figure 8. Masking validation 1
Figure 9. Masking validation 2
Figure 10. Confusion Matrix
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.