2. Related Works
Road testing and validation of autonomous vehicles may require covering several million kilometers to gather performance statistics, which are primarily relevant to mechanical configurations and algorithm parameters. This approach is poorly suited to the development process, such as the V-model development workflow [15]. Moreover, testing and validating autonomous driving functions requires a variety of complex traffic scenarios, including uncertainties associated with other vehicles or motorcycles. Parameter variations for different scenarios are more easily adjusted through simulation and emulation, which provide broader coverage of traffic scenarios and enable repeatable test runs with traceable experimental results. Statistical data suggest that autonomous vehicle testing and validation would require records of 10^6 to 10^8 miles of driving [16], so relying solely on accumulated road mileage for validation is impractical. In addition, reproducing various road conditions, weather scenarios, and traffic situations on real roads is exceedingly challenging, time-consuming, and costly. Simulation and emulation therefore present a viable alternative for testing and validation.
Testing and validating advanced driver assistance systems (ADAS) presents significant challenges, as failures during testing can compromise safety and lead to accidents. In response to this challenge, reference [17] proposes integrating multiple ADAS sensors and their corresponding parameters into a virtual simulation platform. Its primary contribution is the ability to parameterize and adjust the testing and validation of specific sensors and their mounting positions, which facilitates the assessment of sensor fusion algorithms. However, this approach does not examine the interference caused by varying levels of weather severity or image noise, which could result in sensor failures or misjudgments.
Additionally, reference [18] presents a keyword-based scene description methodology that enables conversion into the data formats used by simulation environments, such as OpenDRIVE and OpenSCENARIO, providing a more efficient means of creating diverse test scenarios. However, it offers only a limited analysis of the different weather interference types and severity levels within scenarios.
Reference [4] introduces the concept of an autonomous driving image perception system. It recognizes that, even in the absence of adversarial malicious interference, natural interference can affect the image input, and it therefore proposes an image interference generator that transforms normal real-world images into images affected by interference. However, generating a large number of scenarios and more complex image variations with this method can be highly time-consuming. Moreover, a comprehensive analysis of the effect of weather interference on testing and validation still requires further research.
Machine learning algorithms often employ well-known open-source datasets such as KITTI, Cityscapes, or BDD for model training and validation. However, most of these datasets were captured under ideal weather conditions. The primary contribution of reference [9] is therefore the establishment of models based on real-world snowfall and fog conditions. These models allow image datasets to be created with quantifiable, adjustable parameters, which improves test coverage by introducing adverse weather such as snowfall and fog. The same parameters can be tuned to produce high-interference weather image datasets that aid in training and validating object detection models.
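As an illustration of what a quantifiable, adjustable weather parameter can look like, the following minimal sketch applies the standard atmospheric scattering fog model, I = J * t + A * (1 - t) with transmission t = exp(-beta * d), to a small grayscale image. This is a generic textbook model and is not claimed to be the model used in [9]; the function names, the scattering coefficient `beta`, the `airlight` value, and the toy image and depth values are all assumptions made for this example.

```python
import math


def add_fog(pixel, depth_m, beta, airlight=255.0):
    """Blend one pixel intensity toward the airlight value.

    Hypothetical helper: `beta` is the scattering coefficient (fog
    density) and `depth_m` is the distance to the scene point in meters.
    """
    # Transmission decays exponentially with distance.
    transmission = math.exp(-beta * depth_m)
    return pixel * transmission + airlight * (1.0 - transmission)


def fog_image(image, depth, beta, airlight=255.0):
    """Apply add_fog to every pixel of a grayscale image (list of rows)."""
    return [
        [add_fog(p, d, beta, airlight) for p, d in zip(img_row, depth_row)]
        for img_row, depth_row in zip(image, depth)
    ]


if __name__ == "__main__":
    # Toy 2x2 image with a per-pixel depth map (illustrative values only).
    image = [[50.0, 100.0], [150.0, 200.0]]
    depth = [[5.0, 20.0], [50.0, 100.0]]
    for beta in (0.0, 0.02, 0.1):  # increasing fog severity
        print(beta, fog_image(image, depth, beta))
```

Sweeping `beta` gives a single numeric knob for fog severity, which is the kind of parameterization that makes adverse-weather test sets repeatable.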
Camera noise in images can lead to errors or failures in AI-based object detection systems [19]. In autonomous vehicles, such errors may have severe consequences for human safety. Because camera noise is unavoidable, numerous scholars have proposed algorithms to detect and remove it [20,21]. Many of these algorithms add Gaussian noise to images and then perform denoising, although real-world camera noise includes various types beyond Gaussian noise. Furthermore, most of them rely on common image quality metrics such as Mean Squared Error (MSE), Peak Signal-to-Noise Ratio (PSNR), and the Structural Similarity Index Measure (SSIM) [19,22] to evaluate denoising effectiveness. These metrics may not adequately indicate whether the restored images improve the detection rates of AI object detection systems. In response, several studies have proposed noise removal algorithms and tested the resulting images in object detection systems to demonstrate their reliability. However, these algorithms often focus on a single type of noise and lack a comprehensive analysis of the impact of noise removal on the subsequent object detection process.
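To make the image quality metrics concrete, the sketch below computes MSE and PSNR for 8-bit grayscale images and compares two noise types, Gaussian and salt-and-pepper. It is a minimal illustration and not the evaluation code of any cited work; the helper names, the noise parameters, and the flat test image are assumptions. SSIM is omitted because it requires windowed local statistics.

```python
import math
import random


def mse(a, b):
    """Mean squared error between two equally sized grayscale images."""
    diffs = [(x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    return sum(diffs) / len(diffs)


def psnr(a, b, max_value=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    err = mse(a, b)
    if err == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / err)


def clip(value):
    """Round and clamp a value to the 8-bit range [0, 255]."""
    return max(0, min(255, int(round(value))))


def add_gaussian_noise(img, sigma, rng):
    """Add zero-mean Gaussian noise with standard deviation sigma."""
    return [[clip(p + rng.gauss(0.0, sigma)) for p in row] for row in img]


def add_salt_and_pepper(img, prob, rng):
    """Set each pixel to 0 or 255 with total probability prob."""
    out = []
    for row in img:
        new_row = []
        for p in row:
            r = rng.random()
            if r < prob / 2:
                new_row.append(0)      # pepper
            elif r < prob:
                new_row.append(255)    # salt
            else:
                new_row.append(p)
        out.append(new_row)
    return out


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the comparison is repeatable
    clean = [[128] * 32 for _ in range(32)]  # flat gray test image
    print("gaussian PSNR:", psnr(clean, add_gaussian_noise(clean, 15.0, rng)))
    print("salt-pepper PSNR:", psnr(clean, add_salt_and_pepper(clean, 0.05, rng)))
```

Both scores are purely pixel-level, which is why they cannot by themselves show whether a detector still finds the objects in the degraded or restored image.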
Reference [23] examines a vision-based driver assistance system that performs well in clear weather but suffers a significant drop in reliability and robustness in rain. Raindrops tend to accumulate on the vehicle's windshield, leading to failures in camera-based ADAS. Reference [20] employs two raindrop detection algorithms to locate raindrops on the windshield and applies a raindrop removal algorithm to preserve the perception capabilities of the ADAS despite raindrop interference. To obtain a real dataset with raindrops on the windshield, paper [24] artificially applies water to a glass surface placed 3 to 5 centimeters in front of a camera. Since the dataset was captured in the real world, it accurately represents light reflection and refraction. However, this method is only suitable for static scenes, as each scene requires two images, one with the water-covered glass and one without. Additionally, this method cannot precisely control the size and quantity of water droplets on the glass.
In another raindrop approach, presented in paper [25], the authors reconstruct 3D scenes using stereo images or LiDAR data from datasets such as VTD, KITTI, and Cityscapes, creating nearly photorealistic simulation results. However, constructing scenes and rendering raindrops in this manner is time-consuming, with each KITTI image requiring three minutes, so a faster method is needed for practical use. The authors of paper [26] took publicly available datasets with label files (e.g., BDD, Cityscapes) and displayed the dataset images on a high-resolution computer screen. They placed a glass pane tilted at 20 degrees between a high-resolution DSLR camera and the screen to simulate a real car's windshield. While this method efficiently generates raindrops on the windshield without manual annotation, quantifying the raindrops is challenging. In addition, photographing a screen with a camera may introduce color distortion and reduce dataset quality.
Reference [11] provides a dataset for training the perception systems of self-driving cars. The dataset consists of imagery from 100,000 videos. Unlike other open datasets, its annotations include scene labels, object bounding boxes, lane markings, drivable areas, semantic and instance segmentation, multi-object tracking, and multi-object tracking with semantic segmentation, so it serves a variety of tasks. However, it contains minor labeling inaccuracies, such as classifying wet roads as rainy weather without distinguishing the severity of rainfall. Because training models on inaccurate labels can result in decision-making errors, the labeling errors need to be corrected before use.
Paper [27] primarily focuses on reconstructing real-world scene datasets on a simulation platform. It demonstrates the parameterization of different sensors so that the simulation matches the real setup. It then evaluates detection results from object detection algorithms using average precision (AP) and intersection-over-union (IoU) threshold indicators, comparing real data from the front camera module of the nuScenes dataset with synthetic front camera images generated in the IPG CarMaker simulation environment.
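The role of the IoU threshold in such an evaluation can be sketched as follows. The code computes the IoU of two axis-aligned boxes and greedily matches confidence-sorted detections to ground-truth boxes, which is the counting step that precedes an AP calculation. It is a minimal sketch under stated assumptions and not the evaluation pipeline of [27]: the box format (x1, y1, x2, y2), the function names, the greedy matching rule, and the 0.5 default threshold are choices made for this example, and the integration of the precision-recall curve into AP is omitted.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Width and height of the overlap are zero when the boxes are disjoint.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_detections(predictions, ground_truth, iou_threshold=0.5):
    """Count true positives, false positives, and false negatives.

    predictions: list of (score, box); ground_truth: list of boxes.
    Each ground-truth box can be matched by at most one prediction.
    """
    unmatched = list(ground_truth)
    tp = fp = 0
    # Higher-confidence detections get the first chance to claim a box.
    for _score, box in sorted(predictions, key=lambda p: p[0], reverse=True):
        best = max(unmatched, key=lambda gt: iou(box, gt), default=None)
        if best is not None and iou(box, best) >= iou_threshold:
            unmatched.remove(best)
            tp += 1
        else:
            fp += 1
    return tp, fp, len(unmatched)


if __name__ == "__main__":
    gt = [(0, 0, 10, 10), (20, 20, 30, 30)]
    preds = [(0.9, (1, 1, 10, 10)), (0.8, (50, 50, 60, 60))]
    tp, fp, fn = match_detections(preds, gt)
    print("TP, FP, FN:", tp, fp, fn)
    print("precision:", tp / (tp + fp), "recall:", tp / (tp + fn))
```

Running the same matching on real and synthetic images of the same scene gives directly comparable counts, which is the basis for judging how closely a simulated camera reproduces the real one.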
In papers [28,29], a game engine is used for simulation, which allows various weather conditions to be adjusted, but these weather settings lack quantifiable parameters that correspond to real-world environments. Additionally, the vehicle dynamics simulation in the game engine is less realistic than that of specialized software tools such as VTD or PreScan. Furthermore, the game engine does not support the simulation of heterogeneous sensors and provides only limited image-related data, which weakens its value for robustness verification.
Reference [30] introduces a dedicated dataset of rainy street scene images. The images include distortions caused by water droplets on camera lenses, as well as visual interference from fog and road reflections. This dataset facilitates model training as well as the testing and validation of perception systems designed for rainy conditions. Furthermore, the paper presents a novel image transformation algorithm capable of adding or removing rain-induced artifacts, simulating driving scenarios that transition from clear to rainy weather. However, the dataset lacks image interference caused by raindrops on car windshields, and its raindrop parameters cannot be quantitatively adjusted, which limits its functionality.
Reference [31] proposed two benchmark test datasets encompassing 15 different image interference scenarios, including noise, blur, weather-related interference, and digital noise. These datasets involve object displacement and subtle image adjustments, serving to test the resilience of AI classifier models. The paper presented a metric for computing the error rates of AI classifier models under various types of interference. This metric can help infer the vulnerabilities of models to specific interference types, thereby guiding improvements in their robustness. While these benchmark test datasets cover a variety of interference types, the paper did not account for the similarity and overlap of interference between test scenarios. This oversight may lead to ineffective testing and inadequate coverage, resulting in incomplete verification results. Hence, future benchmark test datasets must consider interference similarity and overlap among test scenarios to ensure both coverage and effectiveness in verification testing.
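One plausible way to summarize classifier error rates across interference types and severity levels is sketched below: the error rates of a model are summed over the severity levels of one interference type, divided by the same sum for a baseline model, and then averaged over all types. This normalization is an assumption made for illustration and is not claimed to be the exact metric defined in [31]; the function names and all error values are hypothetical.

```python
def corruption_error(model_errors, baseline_errors):
    """Ratio of summed error rates over severity levels for one type.

    Both arguments list error rates (0.0 to 1.0), one per severity level.
    A value below 1.0 means the model is more robust than the baseline.
    """
    if len(model_errors) != len(baseline_errors):
        raise ValueError("severity levels must match")
    return sum(model_errors) / sum(baseline_errors)


def mean_corruption_error(model_table, baseline_table):
    """Average the per-type ratio over all interference types."""
    scores = [
        corruption_error(model_table[name], baseline_table[name])
        for name in model_table
    ]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical error rates at three severity levels per type.
    baseline = {"fog": [0.30, 0.45, 0.60], "blur": [0.25, 0.40, 0.55]}
    model = {"fog": [0.20, 0.30, 0.50], "blur": [0.25, 0.35, 0.50]}
    for name in model:
        print(name, corruption_error(model[name], baseline[name]))
    print("mean:", mean_corruption_error(model, baseline))
```

A per-type breakdown of this kind shows which interference types a model is most vulnerable to, but it treats every type as independent and so does not reveal whether two types stress the model in similar ways.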
Paper [32] argues that most autonomous driving data collection and training occur in specific environments, so uncertain or adverse conditions such as snow, rain, or fog pose considerable challenges. While some datasets are available for evaluating self-driving perception systems, there is still a need for a more comprehensive and versatile benchmark test dataset to continually enhance the overall functionality and performance of these systems.
Paper [33] presented an approach to training AI object detection models using virtual image datasets. The experiments demonstrate that, by combining virtual image datasets with a small fraction of real image data, models can be trained to surpass models trained solely on real image datasets. However, this paper did not address the use of virtual datasets augmented with weather or noise interference for training.
In summary, because interference types are diverse and their severity levels vary, testing every interference type at every severity level would require a vast amount of test data and time. An efficient method is therefore needed to reduce the number of similar interference types that must be investigated while maintaining the required test coverage and shortening the testing time.