Preprint
Article

This version is not peer-reviewed.

Development of An Effective Corruption-Related Scenario-Based Testing Approach for Robustness Verification and Enhancement of Perception Systems in Autonomous Driving

A peer-reviewed version of this preprint was published in:
Sensors 2024, 24(1), 301. https://doi.org/10.3390/s24010301

Submitted: 17 November 2023

Posted: 20 November 2023


Abstract
Since sensor-based perception systems are used in autonomous vehicle applications, validating such systems is imperative to guarantee their robustness before they are put to use. In this study, a comprehensive corruption-related, simulation-based robustness verification and enhancement process for sensor-based perception systems is proposed. First, we present a methodology and scenario-based corruption generation tools for creating diverse simulated test scenarios that can analogously represent real-world traffic environments, especially considering corruption types related to safety concerns. Then, an effective corruption similarity filtering algorithm is proposed to remove corruption types with high similarity and identify representative corruption types that stand in for all considered corruption types. As a result, we can generate efficient corruption-related robustness test scenarios with less testing time and good scenario coverage. Subsequently, we perform a vulnerability analysis of object detection models to identify model weaknesses and construct an effective training dataset for model enhancement. This enhances the tolerance of object detection models to weather-related and noise-related corruptions, ultimately improving the robustness of the perception system. We employ case studies to demonstrate the feasibility and effectiveness of the proposed robustness verification and enhancement procedures. Additionally, we explore the impact of different "similarity overlap threshold" parameter settings on scenario coverage, effectiveness, scenario complexity (size of training and testing datasets), and time costs.

1. Introduction

In recent years, as major automakers have actively pursued the development of autonomous vehicles, the requirements of design and testing validation for self-driving cars have become increasingly stringent. These vehicles must demonstrate their reliability and safety to meet the safety standards for autonomous driving. Autonomous driving systems primarily consist of four processing modules: sensor-based perception, localization, trajectory planning and control. These modules are interdependent, and the performance of the sensor-based perception system, in particular, is crucial. Insufficient robustness in this system can lead to severe traffic accidents and pose significant risks. The sensor-based perception system comprises the primary sensors, including the camera, LiDAR, and radar. These various sensors are capable of detecting environmental cues, yet each possesses its own strengths and weaknesses. The perception functionality could be susceptible to corruption or misjudgment under varying road situations, weather conditions, and traffic environments. Therefore, how to ensure that the autonomous perception system can operate safely and reliably in various driving scenarios is of utmost importance, as it will influence the reliability and safety performance of the self-driving systems [1]. The test coverage affects the testing quality of the autonomous driving systems. Increasing the coverage of rare driving situations and corner cases is thus crucial. To collect data from all possible scenarios through real-world testing would require a minimum of 275 million miles of testing mileage [2], incurring substantial costs and time, making it a challenging and economically impractical endeavor.
The methodology for testing and verifying autonomous driving systems primarily consists of simulation and real-world testing. Currently, many major automakers are conducting real-world road tests with autonomous vehicles. However, most of these tests are conducted in common and typical scenarios. Testing safety-critical or rare, low-probability adverse driving scenarios and corner cases involves certain risks and substantial cost and time. Even when there is an opportunity to test rare combinations of factors such as weather, lighting, traffic conditions, and sensor noise corruption that lead to failures in autonomous perception systems [3–5], replicating the same real-world testing conditions remains challenging and infeasible.
On the other hand, the vehicle simulation approach allows for the rapid creation of various test scenarios, including adjustments to different weather conditions within the same test scenario. The advantages of simulation are that the desired scenarios are easy to build and that scenario-based testing can be performed safely and at lower cost. Presently, there are various vehicle simulation tools available, such as Vires Virtual Test Drive (VTD) [6], IPG Automotive CarMaker [7], and CARLA [8]. However, the soundness of test results depends on the accuracy of simulating sensors, vehicles, and environments. For instance, simulating rainfall involves adjusting rainfall model parameters like intensity, and the challenge is how to match real-world rainfall conditions, such as an hourly rainfall rate of 20 mm/h, in the simulation environment. Additionally, the modeling accuracy of simulating cameras installed behind the windshield of autonomous vehicles is limited by factors like raindrops affecting the camera's view, which cannot be fully replicated at present.
In the context of autonomous driving perception systems, object detection is of paramount importance. Any failure or anomaly in object detection can significantly affect the system's prediction and decision-making processes, potentially leading to hazardous situations. Therefore, extensive testing and verification must be conducted under various condition combinations. According to ISO 26262 standards, autonomous driving systems are allowed a maximum of one failure per $10^{9}$ kilometers driven, making it impractical to conduct real-world testing and verification [9].
Currently, there are several real traffic image datasets available, such as nuScenes [10], BDD100K [11], Cityscapes [12], KITTI [13], etc., which assist in training and verifying object detection models. However, these datasets still cannot cover all possible environmental influences. Some image datasets are recorded only under favorable weather conditions, and moreover the scenario parameters cannot be quantified to understand the severity of the corruption types including weather-related and noise-related factors and their impact on the object detection capability.
Adverse weather conditions can lead to a decrease in object detection accuracy. In addition, when camera noise appears in the image, even a single-pixel corruption has the potential to cause errors or failures in the object detection [14]. This, in turn, can result in erroneous judgments during autonomous vehicle operation. Presently, there is no comprehensive and quantifiable benchmark test dataset for adverse weather and image noise corruption conditions, which could provide adequate test coverage for the verification process of the autonomous driving systems.
While related research has used vehicle simulation environments to test and verify advanced driver-assistance systems by adjusting various weather, lighting, and camera conditions, it still lacks simulation of scenarios where raindrops interfere with the camera lens mounted behind the windshield, and of how the severity of lens corruption affects image quality. In real-world situations, this can affect object detection accuracy. Hence, leveraging the advantages of simulation to establish effective benchmark test datasets for testing and verification, and thereby improving the reliability and robustness of autonomous vehicle perception systems, is a topic worthy of exploration.
This study primarily explores how to integrate raindrops falling on the windshield and various types of weather-related and noise-related corruptions into scenarios in the simulation environment. By adjusting weather-related and noise-related parameters, it aims to establish an effective benchmark test dataset to aid in testing and verifying the reliability and robustness of object detection models. Furthermore, through this benchmark test dataset, the study intends to analyze the vulnerabilities of object detection models. This analysis will help designers identify areas for improvement and propose enhanced training datasets for transfer learning, thereby enhancing model robustness and reliability.
The primary challenge here lies in the complexity of scenarios involving single corruption factors and combinations of multiple corruption factors, resulting in a considerable volume of test data and time cost for verification. To facilitate rapid testing and verification in the early stage of system development while ensuring the quality of the test dataset, this work introduces a corruption similarity analysis algorithm. This algorithm explores the similarity of different corruptions and relies on an overlapping threshold value to remove corruption types with higher similarity. The choice of overlapping threshold value affects the number of retained corruption types, the size of the training and test datasets, the training and verification time, and the test coverage. By analyzing these factors, an appropriate "corruption overlapping threshold" can be set to obtain a benchmark test dataset that meets the requirements of time cost and test scenario coverage, balancing cost-effectiveness and testing quality.
The remainder of this paper is organized as follows. In Section 2, the related works are summarized. An effective methodology of corruption-related simulation-based testing scenario benchmark generation for robustness verification and enhancement is proposed in Section 3. Then, we analyze and discuss the experimental results in Section 4. The conclusions appear in Section 5.

4. Experimental Results and Analysis

4.1. Corruption Types and Benchmark Dataset Generation

This work is based on the VTD vehicle simulation platform and demonstrates two of the most common driving scenarios: city roads and highways. Two city road scenarios and two highway scenarios were created. The recorded video footage was captured using cameras installed on the windshields of simulated vehicles, with a resolution of 800 × 600 pixels and a frame rate of 15 frames per second (FPS). These recordings were used to create benchmark test datasets and model enhancement training datasets. In total, there were 5,400 original, unaltered video frames for the city road scenarios and 3,840 for the highway scenarios.
As shown in Table 9, we utilized the weather-related and noise-related generation tools proposed in this study to demonstrate the robustness and performance testing of object detection models under nine different corruption types. The weather corruption types include fog and rain. There are a total of five severity regions for fog, with visibility ranging from 200 to 20 meters. In the case of rain corruption, there are two severity regions, with rainfall ranging from 43 to 54.8 mm/h, resulting in approximately 300 to 100 meters of visibility. Noise corruption types comprise seven categories: hot pixel defect, single pixel defect, cluster 22/33/44 pixel defect, column pixel defect, and raindrop corruption. Among these, raindrop corruption is configured with two severity regions, covering rainfall rates of 20-50 mm/h and raindrop diameters of 0.183-0.229 cm. The remaining noise corruption types are set with three severity regions each, and the range of image noise corruption is 1-15%.
We started by generating two original, corruption-free image datasets using the VTD simulation platform: one for city scenes and the other for highways, each containing 5400 images. The benchmark testing dataset includes datasets for all individual corruption types and severity regions. For the complete benchmark testing dataset, we generated corrupted datasets for each corruption type and severity region by applying the weather-related and noise-related corruption generation tools to the original, corruption-free image dataset. Overall, we demonstrated a total of nine different corruption types, along with an image type with no corruption. As mentioned before, each severity region in a corruption type needs 5400 original, corruption-free images to create single-corruption induced scenarios. Therefore, the total number of images in each of the city and highway original benchmark testing datasets is 5400 images × 28 = 151,200 images.
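The factor of 28 above can be checked with a short calculation. The sketch below assumes one reading that reproduces it: the per-type severity-region counts given in this section (five for fog, two for rain, two for raindrop, three for each remaining noise type) plus one corruption-free image set. The dictionary keys and the function name are illustrative and are not taken from the study's tooling.

```python
# Severity regions per corruption type, as described in Section 4.1.
# Key names are illustrative labels only.
SEVERITY_REGIONS = {
    "fog": 5,
    "rain": 2,
    "raindrop": 2,
    "hot_pixel": 3,
    "single_pixel": 3,
    "cluster_22": 3,
    "cluster_33": 3,
    "cluster_44": 3,
    "column": 3,
}
IMAGES_PER_SET = 5400  # original corruption-free images per scenario category


def benchmark_size(regions, images_per_set):
    """Return (number of image sets, total images), counting the clean set once."""
    image_sets = sum(regions.values()) + 1  # +1 for the corruption-free set (assumed)
    return image_sets, image_sets * images_per_set


sets, total = benchmark_size(SEVERITY_REGIONS, IMAGES_PER_SET)
print(sets, total)  # 28 151200
```

Under this reading, 27 corrupted image sets plus the clean set give the 28 sets of 5400 images quoted in the text.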
As displayed in Figure 10, using the VTD simulation combined with the weather and noise-related generation tools, we can clearly observe that as severity increases, the corruption in the images also significantly increases. Users can quantitatively adjust severity to generate benchmark testing image datasets with varying degrees of corruption. This capability helps improve test coverage for rare adverse weather or noise corruption scenarios, ensuring that the system's reliability and robustness meet requirements.

4.2. Corruption Type Selection

Due to the differences between city and highway scenarios, such as the presence of various buildings and traffic participants in city areas, as opposed to the relatively simpler highway environment, the performance of object detection models in autonomous driving perception systems may vary. The reliability and robustness of these models under different scenarios and corruption types may also differ. Therefore, we present experimental demonstrations in both city and highway scenarios. We use the "faster_rcnn_resnet101_coco" pre-trained model provided in the TensorFlow model zoo, employing the corruption filtering algorithm and adjusting the overlap threshold parameter to remove corruption types with high similarity. We then compare and analyze the differences in the corruption types retained after filtering in the city and highway scenarios.
As illustrated in Figure 11, the experimental results of the overlapping scores for all corruption types in city and highway scenarios reveal that two corruption types, "fog" and "rain," exhibit high similarity in both scenarios. However, the similarity of these two corruption types with other corruption types is significantly lower. Furthermore, the "raindrop" corruption type also shows a considerable dissimilarity with other corruption types. In addition, it is evident that there are differences in the overlapping scores between the two scenarios, especially in the case of the "noise" corruption category. We also observed that the overlapping score between the "raindrop" and "column" corruption types is 0.38 in the highway scenario, whereas it is zero in the city scenario. This variation may be primarily attributed to the differing traffic environments of the city and highway scenarios. Therefore, it is necessary to conduct separate testing and validation analyses for different categories of scenarios to assess the impact of different corruption types and their severity.
Figure 12 and Figure 13 show filtered corruption types in city and highway scenarios with different thresholds. For city scenarios, the thresholds were 0.4 and 0.6, resulting in three and four filtered corruption types, respectively. For highway scenarios, the thresholds were 0.4 and 0.5, also yielding three and four filtered corruption types, respectively. In the city scenario, with an overlap threshold of 0.6, the corruption types were reduced to four: "fog," "cluster 22," "raindrop," and "column." When the threshold is reduced to 0.4, the column corruption is removed, leaving the other three corruption types unchanged. In the highway scenario, we obtain the same filtered corruption types as in the city scenario when setting the overlap thresholds to 0.4 and 0.5.
After performing the corruption similarity filtering algorithm, the corruption grouping algorithm was applied to the filtering results to obtain the corresponding corruption groups. Figure 14 exhibits the corruption groups formed by different thresholds in city and highway scenarios. For a threshold of 0.4, the first group includes two corruption types: "fog" and "rain," with "fog" representing this group. The second group comprises six corruption types: "hot pixel," "single pixel," "cluster 22/33/44," and "column," with "cluster 22" as the representative corruption type. The third group consists of only one corruption type, "raindrop." For a city threshold of 0.6 and a highway threshold of 0.5, the only change is in the group represented by "cluster 22," where the "column pixel" corruption becomes a separate group.
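To make the effect of the overlap threshold concrete, the following is a minimal sketch of one plausible threshold-based grouping. It is not the filtering or grouping algorithm of this study (those are defined in Section 3). It assumes a greedy rule in which each corruption type joins the first existing representative whose overlapping score is at or above the threshold, and a fixed candidate order. All scores in `OVERLAP` are hypothetical placeholders chosen only to illustrate the behavior, except the 0.38 raindrop/column highway score quoted in the text.

```python
# Hypothetical highway overlapping scores (unordered pairs). Only the
# raindrop/column value of 0.38 comes from the text; all others are made up.
OVERLAP = {
    frozenset(("fog", "rain")): 0.90,
    frozenset(("cluster_22", "hot_pixel")): 0.70,
    frozenset(("cluster_22", "single_pixel")): 0.70,
    frozenset(("cluster_22", "cluster_33")): 0.80,
    frozenset(("cluster_22", "cluster_44")): 0.80,
    frozenset(("cluster_22", "column")): 0.45,
    frozenset(("raindrop", "column")): 0.38,
}
# Assumed candidate order; the study's selection rule may differ.
ORDER = ["fog", "rain", "cluster_22", "hot_pixel", "single_pixel",
         "cluster_33", "cluster_44", "column", "raindrop"]


def group_corruptions(order, overlap, threshold):
    """Greedy grouping: join the first representative with score >= threshold."""
    groups = {}  # representative -> list of member corruption types
    for ctype in order:
        for rep in groups:
            if overlap.get(frozenset((rep, ctype)), 0.0) >= threshold:
                groups[rep].append(ctype)
                break
        else:
            groups[ctype] = [ctype]  # no similar representative: new group
    return groups


for threshold in (0.4, 0.5):
    print(threshold, list(group_corruptions(ORDER, OVERLAP, threshold)))
```

With these placeholder scores, a threshold of 0.4 leaves three representatives and a threshold of 0.5 splits "column" into its own group, mirroring the qualitative behavior described above: a lower threshold merges more corruption types.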

4.3. Model Corruption Type Vulnerability Analysis and Enhanced Training

If the object detection model is trained with less severe or almost non-impactful data, or if it uses a dataset that includes all possible corruption types and severity regions, it may consume a significant amount of training time and data space with limited improvement in overall model robustness. To make the training process effective, we propose an approach based on corruption type vulnerability analysis. Its main objective is to identify, for each considered corruption type, the severity region that has the most significant impact on model accuracy. Subsequently, we can propose an appropriate enhanced training dataset for the vulnerability region of each filtered corruption type and perform transfer learning. In this way, the size of the enhanced training dataset and the time required for model training can be reduced significantly, while the model robustness for the considered corruption types is improved efficiently.
We use the example shown in Table 10 to explain how to identify the vulnerability region for a corruption type. Table 10 lists the nine considered corruption types C1 ~ C9; according to Figure 13(b), C1 and C7 ~ C9 are the representative corruption types selected by the corruption filtering algorithm. In the "Fog" corruption type, we vary visibility from 200 to 20 meters, with a difference of 30 meters between each severity level. This results in a total of seven different levels of severity (NSL). In the "Noise" corruption types, "Cluster22" and "Column", we set severity levels between 1% and 15%, with a 2% difference between each severity level, resulting in eight different levels of severity for each type. Finally, for the "Raindrop" corruption type, we vary the rainfall rate from 20 to 50 mm/h and the raindrop diameter from 0.183 to 0.229 cm, resulting in three different levels of severity. Then, we create test datasets for the representative corruption types with the severity levels shown in Table 10 in both city and highway scenarios. Therefore, the numbers of test datasets required for C1 and C7 ~ C9 are 7, 8, 8 and 3, respectively.
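The severity-level counts quoted above follow directly from the stated ranges and step sizes. The sketch below enumerates them under the assumption of evenly spaced levels. The 15 mm/h step for raindrop is an assumption inferred from the three levels spanning 20 to 50 mm/h, and `severity_levels` is an illustrative helper, not part of the study's tools.

```python
def severity_levels(start, stop, step):
    """Evenly spaced severity levels from start to stop inclusive.

    A negative step produces a decreasing sequence (for example, visibility).
    """
    count = round((stop - start) / step) + 1
    return [start + i * step for i in range(count)]


fog_visibility_m = severity_levels(200, 20, -30)   # C1: fog visibility in meters
noise_percent = severity_levels(1, 15, 2)          # Cluster22 and Column, in %
raindrop_mm_per_h = severity_levels(20, 50, 15)    # assumed 15 mm/h step

print(fog_visibility_m)       # [200, 170, 140, 110, 80, 50, 20]
print(noise_percent)          # [1, 3, 5, 7, 9, 11, 13, 15]
print(raindrop_mm_per_h)      # [20, 35, 50]
```

Each pair of adjacent levels bounds one severity region, so seven fog levels give six regions, the last of which is 50 m to 20 m.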
Next, we conduct the model test for each test dataset and analyze the slope variations of the different severity regions in terms of changes in object detection accuracy. The test results are exhibited in Table 11. As shown in Table 11, in this experiment, we utilized the faster_rcnn_resnet101_coco pre-trained model provided in the TensorFlow model zoo for object detection. Subsequently, we employed two distinct clean datasets: one representing city roads and the other representing highways. We first conducted transfer learning to develop two object detection models, namely city_clean and highway_clean models. Then, we proceeded to evaluate and compare the impact of different corruption types and severity levels on the city_clean and highway_clean models within the two distinct scenes. From the experimental results, it is evident that even though the corruption types tested in both city and highway scenes are the same, the model performs notably better in the highway scene. The primary reason for this discrepancy lies in the fact that the overall complexity of backgrounds in highway images is considerably lower compared to city road scenes. City roads are characterized by various buildings and diverse objects in the vicinity, leading to lower overall model detection accuracy. From Table 11, we observe significant differences in the object detection accuracy for the fog corruption type in both city and highway scenes. In the visibility range of 200m to 20m, the overall model Average Precision (AP) drops by 68.67% in city road scenes, while in the highway scene, the drop is only 49.06%. Specifically, under a severity level of 20 meters, the test results differ by as much as 32.56, highlighting the more pronounced and severe impact of fog corruption on the model in city road scenarios.
Within the same corruption type, we assessed the influence of different severity regions on the model's detection rate by analyzing the variation in detection accuracy with severity levels. For each corruption type and its respective severity regions, we calculated the slope of the variation in model detection accuracy and identified the severity region with the steepest slope. This region represents the model's vulnerability to that specific corruption type, indicating the area where the model requires enhanced training most. From Table 11, we see that the visibility region between 50m and 20m represents the severity region with the highest slope of change for fog in both city and highway contexts. For the other three corruption types, their influence on object detection in city and highway scenes is less pronounced than fog. From the experimental results, it becomes evident that different corruption types affect the model to varying degrees. This insight allows us to identify which corruption types require special attention in terms of enhancing the model's detection capabilities. In city road scenes, the severity regions with the most significant impact for Column and Cluster22 corruption types are 13% to 15%, and for Raindrop corruption, it is 20mm to 35mm/h. In the highway scenario, except for Column corruption, which has a severity range of 1% to 3%, the chosen severity regions for other corruption types are the same as those in the city scene. Subsequently, based on the identified vulnerability regions for corruption types, we propose the enhanced training dataset to conduct transfer learning to improve the model robustness. This approach effectively improves the model's robustness and meanwhile reduces the training time cost.
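The slope-based selection described above can be sketched as follows. This is a minimal illustration that assumes the slope is the drop in AP per unit of severity between adjacent severity levels. The AP values in the example are hypothetical placeholders, not the measurements of Table 11, and `steepest_region` is an illustrative name.

```python
def steepest_region(levels, ap_scores):
    """Return ((level_a, level_b), slope) for the adjacent pair of severity
    levels with the largest AP drop per unit of severity."""
    if len(levels) != len(ap_scores) or len(levels) < 2:
        raise ValueError("need at least two levels with matching AP scores")
    best_region, best_slope = None, float("-inf")
    for i in range(len(levels) - 1):
        drop = ap_scores[i] - ap_scores[i + 1]            # AP lost across the region
        slope = drop / abs(levels[i + 1] - levels[i])     # normalize by region width
        if slope > best_slope:
            best_region, best_slope = (levels[i], levels[i + 1]), slope
    return best_region, best_slope


# Fog visibility levels in meters, with HYPOTHETICAL AP values in percent.
fog_levels = [200, 170, 140, 110, 80, 50, 20]
fog_ap = [90.0, 88.0, 85.0, 80.0, 72.0, 60.0, 30.0]
print(steepest_region(fog_levels, fog_ap))  # ((50, 20), 1.0)
```

Because severity units differ between corruption types (meters, percent, mm/h), slopes computed this way are only comparable within a single corruption type, which matches how the vulnerability region is chosen per type.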
The construction of the scene for enhanced training dataset was carried out using the VTD simulation platform, where city and highway scenarios were established. The simulation was configured to generate 15 frames per second, resulting in a total of 3,840 frames for each scenario. Additionally, weather conditions and image noise, along with a raindrop generation tool proposed in this research, were used to generate training datasets for each corruption type with their respective vulnerability regions. Furthermore, within the dataset for model enhancement and transfer learning, we divide it into two benchmarks: Benchmark 1, focusing on the enhancement training of a single corruption type, and Benchmark 2, which combines two corruption types for training. To prevent the issue of model forgetting during training, we incorporate a set of clean images, not affected by any corruption, into each dataset. This helps ensure that the accuracy of object detection for objects unaffected by corruptions is not compromised.
We performed a comparative analysis of the robustness performance in city and highway scenes using different enhanced training datasets for model transfer learning. The analysis is based on the publicly available pre-trained faster_rcnn_resnet101_coco model provided in the TensorFlow model zoo, which we refer to as the M1 object detection model. The hyperparameters of the training process for transfer learning are as follows. The number of input images per step (IS) was fixed at two, and the total number of epochs (NE) for model enhancement transfer learning was set to 8. The experiment was conducted on a computer equipped with an Intel i5-10400 CPU, DDR4 16GB RAM, and a GeForce RTX™ 3060 GPU. The training time required for each step in the M1 model transfer learning, denoted as $ET_{M1}$, is approximately 0.275 seconds. We then compared the training times and robustness improvements for different training datasets.
Table 12 shows the performance of our detection model in city and highway scenarios after reinforcement training for a single corruption type using four distinct reinforcement training datasets. These datasets are described below.
  • Dataset without any corruption.
  • Dataset containing all corruption types with all severity regions.
  • Dataset with three corruption types (F3) derived from the corruption filtering algorithm.
  • Dataset with four corruption types (F4) derived from the corruption filtering algorithm.
We investigated the impact of the different reinforcement training datasets on the time required for model training while maintaining the same training hyperparameters as given above. Relevant parameters are defined as follows:
  • NEI: Number of images contained in an enhancement dataset.
  • NEC: Number of enhancement corruption types.
  • ECi: The i-th enhancement corruption type, where i=1 to NEC
  • NER(ECi): Number of severity regions for enhancement corruption type i, where i is from 1 to NEC.
  • ECi(Rj): The j-th region of enhancement corruption type i, where j = 1 to NER(ECi)
  • For each severity region ECi(Rj), the number of reinforcement training images is denoted as NEI.
  • IS: Number of input images for each transfer learning step.
  • ET: Estimated time required for transfer learning, in seconds per step, for the object detection model.
  • NE: The total number of epochs for model reinforcement transfer learning.
  • MET: The training time required for object detection model reinforcement.
The first training model, M1-Clean, uses only the clean training dataset without any corruption for reinforcement training. Here, NEI is 3840 images. The estimated training time for model reinforcement is calculated as $MET = \frac{NEI \times NE \times ET}{IS}$. Since this model is trained solely with the clean dataset without any corruption, it requires the shortest time and the fewest training images. Therefore, M1-Clean serves as the baseline for evaluating the robustness of other reinforcement training models.
The second training model, M1-All, incorporates all corruption types and severity regions into the reinforcement training dataset. As shown in Table 10, there are a total of nine corruption types (NEC=9). The number of severity levels $NER(EC_{Fog})$ is seven, while both $NER(EC_{Rain})$ and $NER(EC_{Raindrop})$ have three severity levels each. Finally, the corruption types Hot, Single, Cluster22/33/44, and Column all have eight severity levels. In total, there are 61 severity levels for all corruption types. Therefore, the total number of images in the reinforcement training dataset including the clean type is $62 \times NEI$. The estimated training time for this model is $\frac{62 \times NEI \times NE \times ET}{IS}$, making it the model with the highest number of reinforcement training images and the longest training time among the four models. The third and fourth training models, M1-B1F3 and M1-B1F4, were created based on the enhanced training dataset generated from the vulnerability regions of the representative corruption types. Both city and highway scenes have the same set of representative corruption types, with a total of 3 corruption types (NEC=3) for M1-B1F3 and 4 corruption types (NEC=4) for M1-B1F4. Therefore, the estimated training times for M1-B1F3 and M1-B1F4 are $\frac{4 \times NEI \times NE \times ET}{IS}$ and $\frac{5 \times NEI \times NE \times ET}{IS}$, respectively. Overall, the required training time for M1-B1F3 and M1-B1F4 is significantly reduced compared to M1-All.
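The training-time estimates for the four models can be compared with a few lines of arithmetic. The sketch below applies the MET formula with the hyperparameters stated earlier (NEI = 3840, NE = 8, ET ≈ 0.275 s per step, IS = 2) and the image-set multipliers 1, 4, 5 and 62 given in the text. The function name is illustrative, and the printed figures are formula estimates under these inputs rather than the measured values of Table 12.

```python
def training_time_seconds(image_sets, nei, ne, et, images_per_step):
    """MET = image_sets * NEI * NE * ET / IS, in seconds."""
    return image_sets * nei * ne * et / images_per_step


NEI = 3840   # images per image set
NE = 8       # epochs
ET = 0.275   # seconds per training step for M1 (approximate)
IS = 2       # input images per step

# Number of NEI-sized image sets per model, including the clean set.
MODELS = {"M1-Clean": 1, "M1-B1F3": 4, "M1-B1F4": 5, "M1-All": 62}

baseline = training_time_seconds(MODELS["M1-Clean"], NEI, NE, ET, IS)
for name, image_sets in MODELS.items():
    seconds = training_time_seconds(image_sets, NEI, NE, ET, IS)
    print(f"{name}: {seconds:.0f} s, {seconds / baseline:.0f}x baseline")
```

Since NEI, NE, ET and IS are shared by all four models, the estimated time scales linearly with the number of image sets, so M1-All is estimated at 62 times the baseline while M1-B1F3 and M1-B1F4 are at 4 and 5 times.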
As depicted in Table 12, the four reinforcement training datasets not only differ significantly in the number of images but also show a substantial difference in the estimated training time for model reinforcement transfer learning, especially the M1-All model, which is trained with all corruption types and severity levels. The overall training time for M1-All is much higher compared to models that have reduced corruption types (M1-B1F3 and M1-B1F4). We note that blindly increasing the number of training images without a proper selection strategy can lead to a sharp increase in training time. On the other hand, if additional reinforcement training data consists of non-severe or less impactful datasets, it may incur significant training costs and time without substantial improvements in the robustness of the object detection model. How effective M1-B1F3 and M1-B1F4 are compared to the M1-All model will be discussed in the following subsection.

4.4. Robustness Analysis of Enhanced Training Models

Table 13 and Table 14 show the experimental results of Table 6 described previously. Four different test datasets were used for assessing robustness, where the assessment method involves summing and averaging the average precision (AP) scores of individual corruption types. The first test dataset comprises all corruption types plus the clean dataset, and the models were tested individually for each corruption type, followed by summing and averaging the AP scores. The results indicate that the object detection model trained with representative corruption types performs very similarly to the model trained with all corruption types. In the city scenario of Table 13, the two models exhibit AP scores of 75.8% and 75.06%, differing by only 0.74 percentage points. Similarly, in the highway scenario, the two models have very similar AP scores of 89.38% and 88.55%, with a difference of 0.83 percentage points. Table 14 presents similar results. This demonstrates that the corruption filtering algorithm proposed in this study effectively reduces training time by removing highly similar corruption types while maintaining high training coverage. As a result, it achieves robustness similar to that of models trained with all corruption types.
The second test dataset was constructed from the representative corruption types plus the clean dataset. To objectively assess the robustness performance on this test dataset, the AP score of each group's representative corruption type was multiplied by the number of corruption types in that group, and the products were then summed and averaged. This method enables a fair comparison between the test dataset formed by representative corruption types and the all-corruption-type test dataset. As can be seen from Table 13, the results on these two test datasets are very similar for City_M1-B1F3, and the same holds for highway_M1-B1F3. This indicates that the test dataset formed by representative corruption types maintains a test coverage similar to that of the all-corruption-type test dataset. We observe similar results in Table 14.
Another interesting point to be explored is the impact of the overlap threshold value on the number of representative corruption types and the coverage of model training and testing. In both city and highway scenarios, the models M1-B1F3 and M1-B1F4 exhibit highly similar robustness performance on the all-corruption-type test dataset. In this context, three representative corruption types are sufficient to cover all considered corruption types, and adding a fourth corruption type, such as Column corruption, does not provide any significant improvement. The third and fourth test datasets in Table 13 and Table 14 were built from corruption groups. These test datasets can be used to verify how effectively a representative corruption type represents its corresponding corruption group. The results of x_M1-B1Fy, where x = City or Highway and y = 3 or 4, compared with M1-All on the third and fourth test datasets show the effectiveness of the representative corruption type in representing its corresponding corruption group. For example, the data shown for the third test dataset in Table 13 (a) are almost identical, which means that the fog corruption can represent the rain corruption very well. Similarly, the data shown for the fourth test dataset indicate that cluster22 also represents its corresponding corruption group well. In summary, according to Table 13 and Table 14, we can confirm that fog and cluster 22 represent their corresponding corruption groups well. Consequently, we can achieve good training and test coverage with lower training and test time cost.
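The group-weighted averaging used for the second test dataset can be illustrated with a short sketch. It assumes that each representative's AP is weighted by the size of its corruption group and that the clean dataset counts as one additional entry. The AP values below are hypothetical placeholders and do not come from Table 13 or Table 14, and the function name is illustrative.

```python
def group_weighted_ap(groups, clean_ap=None):
    """Average AP where each representative's AP counts once per corruption
    type in its group. `groups` is a list of (representative_ap, group_size)."""
    total = sum(ap * size for ap, size in groups)
    count = sum(size for _, size in groups)
    if clean_ap is not None:      # assumed: the clean dataset counts as one entry
        total += clean_ap
        count += 1
    return total / count


# Group sizes for the three-representative case (threshold 0.4):
# fog represents 2 types, cluster 22 represents 6, raindrop represents 1.
# The AP values are HYPOTHETICAL and only illustrate the weighting.
f3_groups = [(60.0, 2), (80.0, 6), (70.0, 1)]
print(group_weighted_ap(f3_groups, clean_ap=90.0))  # 76.0
```

Weighting by group size puts the result on the same footing as the plain average over all nine corruption types plus the clean dataset, which is what allows the comparison between the two test datasets.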

4.5. Exploring Scenarios with Two Corruption Combinations

In real-world driving, an autonomous vehicle is not only affected by a single type of corruption; two different corruption types often occur simultaneously, such as rain and fog. Therefore, we further investigated the model's robustness in scenarios where two corruption types are combined. To simplify the experiments, we used the three representative corruption types in city and highway scenarios to create reinforcement training datasets for the M1-B2F3 model. As shown in Table 7 and Table 15, we divided the severity region of each corruption type into two sub-regions, denoted SR1 and SR2. Each sub-region is represented by the severity value listed in Table 15, which indicates the severity level of that corruption type's sub-region. There are three pairs of two-corruption combinations: (Fog & Cluster22), (Fog & Raindrop), and (Cluster22 & Raindrop). To simplify the demonstration, the training datasets for a two-corruption combination (corruption type 1, corruption type 2) only consider the combinations [corruption type 1 (SR1), corruption type 2 (SR1)] and [corruption type 1 (SR2), corruption type 2 (SR2)], as shown in Table 15. However, the combinations described in Table 7 can also be followed to increase the training coverage.
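The enumeration of two-corruption combinations can be sketched as below. The function names are hypothetical. The first function reproduces the simplified pairing used here (same sub-region on both sides); the second builds a full cross product of sub-regions, which is only one possible way to widen coverage and is an assumption rather than the content of Table 7.

```python
from itertools import combinations, product

REPRESENTATIVES = ["Fog", "Cluster22", "Raindrop"]
SUB_REGIONS = ["SR1", "SR2"]


def matched_combinations(types, sub_regions):
    """Pair two corruption types using the same sub-region for both."""
    return [
        ((first, region), (second, region))
        for first, second in combinations(types, 2)
        for region in sub_regions
    ]


def cross_combinations(types, sub_regions):
    """Pair two corruption types over every sub-region pairing (assumed extension)."""
    return [
        ((first, region_a), (second, region_b))
        for first, second in combinations(types, 2)
        for region_a, region_b in product(sub_regions, repeat=2)
    ]


for combo in matched_combinations(REPRESENTATIVES, SUB_REGIONS):
    print(combo)
```

With three representative types and two sub-regions, the matched pairing yields 6 combinations and the cross product yields 12, which shows how quickly dataset size grows with coverage.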
For each two-corruption combination, we generated its reinforcement training dataset from the original corruption-free image dataset as follows: both corruptions were applied to the first half of the images using the SR1 severity combination from Table 15, and to the second half of the images using the SR2 severity combination. For example, for the Fog & Cluster22 combination, the first half of the image dataset was created with a severity combination of 42.5 m + 13.5%, and the second half with a severity combination of 27.5 m + 14.5%. The reinforcement training datasets for the two-corruption combinations required 4 × N_EI images, and the test datasets for two-corruption combinations were constructed in the same manner.
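The half-and-half severity assignment can be sketched as follows. This is an illustrative plan builder only: the function name and image identifiers are hypothetical, no image processing is performed, and placing the extra image in the second half for odd-sized datasets is our assumption.

```python
from typing import List, Sequence, Tuple

Severity = Tuple[str, str]  # (severity of corruption 1, severity of corruption 2)


def assign_severity_halves(
    image_ids: Sequence[str], sr1: Severity, sr2: Severity
) -> List[Tuple[str, Severity]]:
    """Assign the SR1 combination to the first half of the images and SR2 to the rest."""
    midpoint = len(image_ids) // 2
    plan = []
    for index, image_id in enumerate(image_ids):
        severity = sr1 if index < midpoint else sr2
        plan.append((image_id, severity))
    return plan


# Fog & Cluster22 severities quoted in the text: fog visibility + noise percentage.
FOG_CLUSTER22_SR1 = ("42.5m", "13.5%")
FOG_CLUSTER22_SR2 = ("27.5m", "14.5%")

clean_images = [f"img_{i:04d}" for i in range(6)]  # hypothetical image identifiers
for image_id, severity in assign_severity_halves(
    clean_images, FOG_CLUSTER22_SR1, FOG_CLUSTER22_SR2
):
    print(image_id, severity)
```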
Table 16 displays the performance of single-corruption and two-corruption trained models in city and highway scenarios on single-corruption and two-corruption test datasets. On the B1 single-corruption test dataset, the model trained with two corruption types performs slightly worse than the model trained with single corruption types and comes close to the M1-All model trained with all corruption types. All four models show a significant decrease in performance on the two-corruption test datasets, where the model trained with two corruption types outperforms the others. This suggests that, although two-corruption combinations occur less often, testing them can reveal shortcomings in models trained on single corruption types, which highlights the need for specific training on two-corruption scenarios to enhance overall model robustness. This is an area that can be explored in future research.

4.6. Real-World Scenario Testing and Verification

In real-world driving environments, the perception system of an autonomous vehicle may encounter adverse conditions or various potential corruptions that lead to system failures or anomalies. To improve the test coverage of these adverse scenarios and corner cases, in the experiments above we generated datasets containing various weather and image noise corruption types using a vehicle simulation platform, and we developed tools for generating scenario-based test datasets. The experimental results demonstrate that identifying model vulnerabilities through the proposed vulnerability analysis and building effective vulnerability-enhancement training datasets can significantly improve the tolerance and robustness of object detection models to weather and noise-related corruptions.
The previous experimental results show that the proposed robustness verification and enhancement approach effectively improves the robustness of the detection models when they are tested against benchmark test datasets composed of various corruption types in simulated environments. To further confirm that object detection models trained with our method perform well in real-world situations, we used the DAWN real-world adverse weather driving scenario dataset [39] and the Foggy Cityscapes dataset [40], which contains real-world scenes under foggy weather, as benchmark test datasets for real adverse driving scenarios. From the DAWN dataset, we excluded the snow corruption type, which was not included in our experiments, and the images that contain no vehicles. For Foggy Cityscapes, we set the visibility parameter β to 0.08 to produce more severe fog, with a visibility of approximately 37 meters. With these two real-world test datasets, we can evaluate the robustness and reliability of models trained on simulated scenarios.
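The relation between β and the quoted visibility can be checked with a short calculation. This sketch assumes the Koschmieder relation with a 5% contrast threshold (visibility = -ln(0.05)/β), which is our reading of how the fog parameter maps to visibility; the function name and the assumption that β is expressed per meter are ours.

```python
import math


def visibility_from_beta(beta: float, contrast_threshold: float = 0.05) -> float:
    """Return the visibility distance in meters for attenuation coefficient beta (1/m).

    Assumes the Koschmieder relation: visibility = -ln(contrast_threshold) / beta.
    """
    if beta <= 0:
        raise ValueError("beta must be positive")
    return -math.log(contrast_threshold) / beta


print(f"beta = 0.08 -> visibility of about {visibility_from_beta(0.08):.1f} m")
```

Under this assumption, β = 0.08 gives a visibility between 37 and 38 meters, consistent with the value of approximately 37 meters stated above.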
Table 17 presents the performance of four object detection models trained with different enhancement training datasets for city and highway scenarios. The models' robustness performance in real-world scenarios is significantly lower than in virtual simulation environments, which is a common phenomenon for models trained in current virtual simulation environments. Note that model performance is very poor on the Foggy Cityscapes test dataset because this test dataset was configured with a very severe fog condition. Our analysis here focuses on the robustness verification and enhancement procedures proposed in this research and on how models trained with these procedures perform in real-world scenarios. The models trained on corruption-enhanced datasets do exhibit better robustness in real-world scenarios than models that have not undergone corruption-enhanced training.
Table 17 shows that, in both city and highway scenarios, the robustness performance of M1-B1F3 on the real-world test datasets is higher than that of the M1-Clean model. This demonstrates that the approach proposed in this research is capable of improving the robustness of object detection models. We also found that the models trained in city scenarios exhibit better robustness performance than those trained in highway scenarios, except for the M1-B1F3 model on the DAWN test dataset. This is likely because city scenarios are more complex, leading to more effective training.
In addition, we adopted a strategy from reference [41] that incorporates a small amount of real image data for secondary transfer learning to enhance the models further. This method has proven effective in enhancing the robustness of object detection models in real environments. We therefore introduced 800 images from the BDD dataset that were not affected by any corruption into both the M1-Clean and M1-B1F3 models. We then performed secondary transfer learning and validated the models using the two real-world benchmark datasets with adverse weather conditions. The experimental results show that the M1-Clean_R800 and M1-B1F3_R800 models, trained with the addition of a small amount of real data, are significantly more robust than models trained using only simulated datasets. In particular, the M1-B1F3_R800 model shows a marked improvement in robustness under adverse real-world weather conditions. Overall, the experimental results validate that the proposed robustness verification and enhancement procedures can achieve good training and test coverage for object detection models in both simulated and real-world environments. Additionally, our approach can greatly reduce the time required for model enhancement training and validation. From the model enhancement schemes presented, designers can choose a suitable method for enhancing object detection models and conducting test verification in an efficient manner.

5. Conclusions

The design and testing of autonomous driving systems require demonstrating their reliability and safety in any environment, especially with respect to the robustness testing and verification of perception systems. For sensor-based perception systems, this study proposes a comprehensive method for robustness validation and enhancement using a simulation platform. We examine the trade-off between test scenario coverage and effectiveness on the one hand and scenario complexity (training and testing dataset sizes) and time efficiency on the other. Numerous factors affect autonomous driving safety; in this study, we focus on corruptions related to weather and noise, and in particular on the effects of single-corruption and two-corruption scenarios on the robustness of autonomous driving perception systems. Given the complexity of single corruptions and combinations of multiple corruptions in test scenarios and the substantial testing time required, it is essential to ensure the quality of test datasets and to conduct rapid testing and validation during the early stages of system development. Therefore, we propose an effective corruption filtering algorithm that reduces the number of considered corruption types, which lowers the complexity of the test datasets while maintaining good test coverage. We investigate the relationship between the "overlap threshold" parameter of the corruption filtering algorithm and scenario complexity, time cost, and test scenario coverage. This parameter setting affects the number of selected corruption types, the size of the training and testing datasets, the training and testing time, and the test coverage rate. Through this analysis, an appropriate "overlap threshold" value can be set to meet the requirements of effectiveness and economy, resulting in a set of benchmark test datasets that satisfy the time cost and test scenario coverage requirements. This improves test scenario coverage and effectiveness while requiring less testing time.
To expedite the generation of benchmark datasets, we have developed tools for generating simulated test scenario datasets, comprising a weather-related test scenario generator and sensor noise injectors that emulate real traffic environments. We then use these benchmark datasets to test the object detection models of perception systems. Model vulnerability analysis was performed to identify the fragile regions of corruptions and to create effective vulnerability-enhanced training datasets that increase the model's tolerance to weather and noise-related corruptions, thereby improving the perception system's robustness. We use case studies to demonstrate how to generate test scenarios related to weather (e.g., rain, fog) and noise (e.g., camera pixel noise) and how to perform robustness testing of the perception system using object detection models on these test scenario datasets. The corruption similarity filtering algorithm was then employed to identify the representative corruption types that represent all considered corruption types. Subsequently, we showed how to identify the vulnerability regions of the representative corruption types and how to use data augmentation techniques to generate effective training datasets for enhancing the robustness of the perception system. We further discussed the effect of two-corruption scenarios on the robustness of the models and leave this issue to be explored in future research. Finally, we tested our models trained in simulated environments on real-world adverse test datasets to ascertain the effectiveness of the proposed approach.

References

  1. Min, K.; Han, S.; Lee, D.; Choi, D.; Sung, K.; Choi, J. SAE Level 3 Autonomous Driving Technology of the ETRI. In Proceedings of the 2019 International Conference on Information and Communication Technology Convergence (ICTC), October 2019; pp. 464–466.
  2. Klück, F.; Zimmermann, M.; Wotawa, F.; Nica, M. Genetic Algorithm-Based Test Parameter Optimization for ADAS System Testing. In Proceedings of the 2019 IEEE 19th International Conference on Software Quality, Reliability and Security (QRS), July 2019; pp. 418–425.
  3. Koopman, P.; Wagner, M. Autonomous Vehicle Safety: An Interdisciplinary Challenge. IEEE Intell. Transp. Syst. Mag. 2017, 9, 90–96.
  4. Pezzementi, Z.; Tabor, T.; Yim, S.; Chang, J.K.; Drozd, B.; Guttendorf, D.; Wagner, M.; Koopman, P. Putting Image Manipulations in Context: Robustness Testing for Safe Perception. In Proceedings of the 2018 IEEE International Symposium on Safety, Security, and Rescue Robotics (SSRR), August 2018; pp. 1–8.
  5. Bolte, J.; Bar, A.; Lipinski, D.; Fingscheidt, T. Towards Corner Case Detection for Autonomous Driving. In Proceedings of the 2019 IEEE Intelligent Vehicles Symposium (IV), June 2019; pp. 438–445.
  6. VIRES Simulationstechnologie GmbH. VTD - VIRES Virtual Test Drive. 2022.
  7. CarMaker | IPG Automotive. Available online: https://ipg-automotive.com/en/products-solutions/software/carmaker/ (accessed on 11 October 2022).
  8. Dosovitskiy, A.; Ros, G.; Codevilla, F.; Lopez, A.; Koltun, V. CARLA: An Open Urban Driving Simulator. In Proceedings of the 1st Annual Conference on Robot Learning; PMLR, 18 October 2017; pp. 1–16.
  9. von Bernuth, A.; Volk, G.; Bringmann, O. Simulating Photo-Realistic Snow and Fog on Existing Images for Enhanced CNN Training and Evaluation. In Proceedings of the 2019 IEEE Intelligent Transportation Systems Conference (ITSC), October 2019; pp. 41–46.
  10. Caesar, H.; Bankiti, V.; Lang, A.H.; Vora, S.; Liong, V.E.; Xu, Q.; Krishnan, A.; Pan, Y.; Baldan, G.; Beijbom, O. nuScenes: A Multimodal Dataset for Autonomous Driving. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2020; pp. 11618–11628.
  11. Yu, F.; Chen, H.; Wang, X.; Xian, W.; Chen, Y.; Liu, F.; Madhavan, V.; Darrell, T. BDD100K: A Diverse Driving Dataset for Heterogeneous Multitask Learning. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2020; pp. 2633–2642.
  12. Cordts, M.; Omran, M.; Ramos, S.; Rehfeld, T.; Enzweiler, M.; Benenson, R.; Franke, U.; Roth, S.; Schiele, B. The Cityscapes Dataset for Semantic Urban Scene Understanding. arXiv 2016, arXiv:1604.01685.
  13. Geiger, A.; Lenz, P.; Urtasun, R. Are We Ready for Autonomous Driving? The KITTI Vision Benchmark Suite. In Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, June 2012; pp. 3354–3361.
  14. Su, J.; Vargas, D.V.; Kouichi, S. One Pixel Attack for Fooling Deep Neural Networks. IEEE Trans. Evol. Comput. 2019, 23, 828–841.
  15. Zofka, M.R.; Klemm, S.; Kuhnt, F.; Schamm, T.; Zöllner, J.M. Testing and Validating High Level Components for Automated Driving: Simulation Framework for Traffic Scenarios. In Proceedings of the 2016 IEEE Intelligent Vehicles Symposium (IV), June 2016; pp. 144–150.
  16. Yu, H.; Li, X. Intelligent Corner Synthesis via Cycle-Consistent Generative Adversarial Networks for Efficient Validation of Autonomous Driving Systems. In Proceedings of the 2018 23rd Asia and South Pacific Design Automation Conference (ASP-DAC), January 2018; pp. 9–15.
  17. Muckenhuber, S.; Holzer, H.; Rübsam, J.; Stettinger, G. Object-Based Sensor Model for Virtual Testing of ADAS/AD Functions. In Proceedings of the 2019 IEEE International Conference on Connected Vehicles and Expo (ICCVE), November 2019; pp. 1–6.
  18. Menzel, T.; Bagschik, G.; Isensee, L.; Schomburg, A.; Maurer, M. From Functional to Logical Scenarios: Detailing a Keyword-Based Scenario Description for Execution in a Simulation Environment. In Proceedings of the 2019 IEEE Intelligent Vehicles Symposium (IV), June 2019; pp. 2383–2390.
  19. Horé, A.; Ziou, D. Image Quality Metrics: PSNR vs. SSIM. In Proceedings of the 2010 20th International Conference on Pattern Recognition, August 2010; pp. 2366–2369.
  20. Zhu, H.; Ng, M.K. Structured Dictionary Learning for Image Denoising Under Mixed Gaussian and Impulse Noise. IEEE Trans. Image Process. 2020, 29, 6680–6693.
  21. Wu, D.; Du, X.; Wang, K. An Effective Approach for Underwater Sonar Image Denoising Based on Sparse Representation. In Proceedings of the 2018 IEEE 3rd International Conference on Image, Vision and Computing (ICIVC), June 2018; pp. 389–393.
  22. Wang, Z.; Bovik, A.C.; Sheikh, H.R.; Simoncelli, E.P. Image Quality Assessment: From Error Visibility to Structural Similarity. IEEE Trans. Image Process. 2004, 13, 600–612.
  23. Cord, A.; Gimonet, N. Detecting Unfocused Raindrops: In-Vehicle Multipurpose Cameras. IEEE Robot. Autom. Mag. 2014, 21, 49–56.
  24. Qian, R.; Tan, R.T.; Yang, W.; Su, J.; Liu, J. Attentive Generative Adversarial Network for Raindrop Removal from a Single Image. arXiv 2018, arXiv:1711.10098.
  25. von Bernuth, A.; Volk, G.; Bringmann, O. Rendering Physically Correct Raindrops on Windshields for Robustness Verification of Camera-Based Object Recognition. In Proceedings of the 2018 IEEE Intelligent Vehicles Symposium (IV), June 2018; pp. 922–927.
  26. Porav, H.; Musat, V.-N.; Bruls, T.; Newman, P. Rainy Screens: Collecting Rainy Datasets, Indoors. 2020.
  27. Deter, D.; Wang, C.; Cook, A.; Perry, N.K. Simulating the Autonomous Future: A Look at Virtual Vehicle Environments and How to Validate Simulation Using Public Data Sets. IEEE Signal Process. Mag. 2021, 38, 111–121.
  28. Johnson-Roberson, M.; Barto, C.; Mehta, R.; Sridhar, S.N.; Rosaen, K.; Vasudevan, R. Driving in the Matrix: Can Virtual Worlds Replace Human-Generated Annotations for Real World Tasks? In Proceedings of the 2017 IEEE International Conference on Robotics and Automation (ICRA), May 2017; pp. 746–753.
  29. Gaidon, A.; Wang, Q.; Cabon, Y.; Vig, E. Virtual Worlds as Proxy for Multi-Object Tracking Analysis. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2016; pp. 4340–4349.
  30. Jin, J.; Fatemi, A.; Pinto Lira, W.M.; Yu, F.; Leng, B.; Ma, R.; Mahdavi-Amiri, A.; Zhang, H. RaidaR: A Rich Annotated Image Dataset of Rainy Street Scenes. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), October 2021; pp. 2951–2961.
  31. Hendrycks, D.; Dietterich, T. Benchmarking Neural Network Robustness to Common Corruptions and Perturbations. arXiv 2019, arXiv:1903.12261.
  32. Muhammad, K.; Ullah, A.; Lloret, J.; Ser, J.D.; de Albuquerque, V.H.C. Deep Learning for Safe Autonomous Driving: Current Challenges and Future Directions. IEEE Trans. Intell. Transp. Syst. 2021, 22, 4316–4336.
  33. Nowruzi, F.E.; Kapoor, P.; Kolhatkar, D.; Hassanat, F.A.; Laganiere, R.; Rebut, J. How Much Real Data Do We Actually Need: Analyzing Object Detection Performance Using Synthetic and Real Data. arXiv 2019, arXiv:1907.07061.
  34. Marzuki, M.; Randeu, W.L.; Schönhuber, M.; Bringi, V.N.; Kozu, T.; Shimomai, T. Raindrop Size Distribution Parameters of Distrometer Data With Different Bin Sizes. IEEE Trans. Geosci. Remote Sens. 2010, 48, 3075–3080.
  35. Serio, M.A.; Carollo, F.G.; Ferro, V. Raindrop Size Distribution and Terminal Velocity for Rainfall Erosivity Studies. A Review. J. Hydrol. 2019, 576, 210–228.
  36. Roser, M.; Kurz, J.; Geiger, A. Realistic Modeling of Water Droplets for Monocular Adherent Raindrop Recognition Using Bézier Curves. In Proceedings of the Computer Vision – ACCV 2010 Workshops; Koch, R., Huang, F., Eds.; Springer: Berlin, Heidelberg, 2011; pp. 235–244.
  37. Huang, W.; Lv, Y.; Chen, L.; Zhu, F. Accelerate the Autonomous Vehicles Reliability Testing in Parallel Paradigm. In Proceedings of the 2017 IEEE 20th International Conference on Intelligent Transportation Systems (ITSC), October 2017; pp. 922–927.
  38. Laugros, A.; Caplier, A.; Ospici, M. Using the Overlapping Score to Improve Corruption Benchmarks. In Proceedings of the 2021 IEEE International Conference on Image Processing (ICIP), September 2021; pp. 959–963.
  39. Kenk, M.A.; Hassaballah, M. DAWN: Vehicle Detection in Adverse Weather Nature Dataset. 2020.
  40. Sakaridis, C.; Dai, D.; Van Gool, L. Semantic Foggy Scene Understanding with Synthetic Data. Int. J. Comput. Vis. 2018, 126, 973–992.
  41. Poucin, F.; Kraus, A.; Simon, M. Boosting Instance Segmentation with Synthetic Data: A Study to Overcome the Limits of Real World Data Sets. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021; pp. 945–953.
Figure 1. The process of generating benchmark datasets for robustness validation and enhancement of object detection models.
Figure 2. An overview of the automated testing scenario generation process using VTD.
Figure 3. Workflow diagram of the automated testing scenario generation tool.
Figure 4. VTD environmental simulation of different weather conditions. (a) no rain; (b) rain; (c) rain with visibility influence.
Figure 5. Workflow for generating noise-related and raindrop images.
Figure 6. Comparative analysis of the impact of noise corruption on object detection models in images. (a) no corruption; (b) noise corruption.
Figure 7. Corruption percentage for different severity levels of noise corruption types.
Figure 8. Comparison between real raindrops [36] and simulated raindrop models.
Figure 9. Overlapping analysis procedure and corruption filtering concept among different corruption types. (a) generation of scenario corruption; (b) training and benchmark testing datasets; (c) corruption overlap score table; (d) results of corruption filtering methodology and benchmark types.
Figure 10. Comparison of corruption effects in different severity regions for nine corruption types.
Figure 11. Overlapping score matrix for corruption types in city and highway scenarios. (a) city; (b) highway.
Figure 12. Filtered corruption types in city scenario with different thresholds. (a) threshold = 0.4; (b) threshold = 0.6.
Figure 13. Filtered corruption types in highway scenario with different thresholds. (a) threshold = 0.4; (b) threshold = 0.5.
Figure 14. Corruption groups formed by different thresholds in city and highway scenarios.
Table 1. Definition of severity regions for various weather-related corruption types.
Table 2. Subinterval definitions for severity regions of different weather-related corruption types.
Table 3. Weather parameters.
Table 4. Severity levels of hourly rainfall and their corresponding visibility.
Table 5. Raindrop models and their corresponding raindrop volume size ranges.
Table 6. Comparative analysis of models trained on different datasets and tested on various testing sets.
Table 7. Severity setting for subregions of the worst-performing region in single corruption type detection.
Table 8. Detection performance of different trained models on single and two corruption type combinations benchmark testing datasets.
Table 9. Number of severity regions and descriptions of severity levels for corruption types.
Table 10. Test datasets generated for corruption types at different severity levels.
Table 11. Test results for city and highway scenes.
Table 12. Number of images and training time required for model reinforcement training on city and highway scenarios.
Table 13. Performance of three representative corruption type training models on various test datasets. (a) city model and scenario; (b) highway model and scenario.
Table 14. Performance of four representative corruption type training models on various test datasets. (a) city model and scenario; (b) highway model and scenario.
Table 15. Examples of two corruption type combinations.
Table 16. Performance of single and two-corruption type trained models in city and highway scenarios on single and two-corruption test datasets. (a) city environment; (b) highway environment.
Table 17. Performance of models trained by simulation scenarios tested on real-world adverse test datasets. (a) city model; (b) highway model.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.