Preprint
Article

This version is not peer-reviewed.

Safety Concerns in Autonomous Driving: A Taxonomy of Test and Evaluation Approaches

Submitted: 28 August 2025
Posted: 29 August 2025


Abstract
Ensuring the safety of autonomous vehicles (AVs) requires rigorous and scalable testing methodologies capable of capturing both routine and safety-critical scenarios. Scenario-based testing has emerged as a vital approach to expose AVs to diverse and challenging conditions beyond traditional road mileage accumulation. This survey focuses on scenario generation—an essential component enabling automated, efficient, and comprehensive testing of autonomous driving systems (ADS). We categorize existing scenario generation methods into three primary paradigms: rule-based, data-driven, and learning-based. For each, we analyze the core methodologies, simulation platforms, scenario description languages, and evaluation metrics used to assess realism, diversity, and criticality. We further identify key research challenges such as the reality gap, limited data generalization, and rare-event modeling, and discuss emerging trends including language-driven generation, hybrid modeling frameworks, and standardized scenario repositories. This work provides a unified perspective on scenario generation, aiming to support researchers and practitioners in advancing safe and certifiable autonomous driving technologies.

1. Introduction

Autonomous Driving Systems (ADS) have rapidly advanced in recent years, fueled by breakthroughs in artificial intelligence, sensor technologies, and vehicular connectivity [1,2,3]. These systems are envisioned to reduce traffic accidents, improve efficiency, and enhance mobility across diverse populations. Yet, before large-scale deployment, ensuring their safety and reliability remains a formidable challenge [4,5,6,7,8]. A major barrier lies in the need for rigorous testing strategies capable of validating system performance under the diversity, uncertainty, and risk inherent in real-world conditions [9,10,11,12].
Public perception underscores these concerns. According to AAA’s 2024 annual autonomous vehicle survey [13], 66% of U.S. drivers reported being afraid of fully self-driving vehicles, while only 9% expressed trust, with fears largely driven by highly publicized accidents and safety incidents. As illustrated in Figure 1, such skepticism has remained persistent, signaling that safety concerns are not only technical but also societal barriers to adoption.
Testing therefore plays a central role in the development, validation, and regulatory approval of autonomous vehicles (AVs) [14,15,16,17]. Unlike conventional vehicles, AVs must independently perceive, predict, and act in dynamic environments, where unpredictable human behaviors, ambiguous road infrastructure, and rare but safety-critical events—such as sudden occlusions, erratic maneuvers, or sensor degradation—may occur [18,19,20,21,22,23]. Traditional metrics based on mileage accumulation and disengagement rates are increasingly recognized as inadequate: real-world testing rarely captures statistically rare edge cases and often wastes resources on uneventful scenarios [24,25,26,27].
As a result, scenario-based testing has emerged as a more systematic and safety-oriented paradigm. This approach centers on constructing and executing targeted scenarios that represent both common driving contexts and rare but hazardous corner cases [28,29,30]. These include aggressive lane changes, pedestrian dart-outs, or multi-agent negotiation in dense traffic [31,32]. Scenario-based testing enables structured evaluation of AV capabilities under uncertainty, while simulation platforms and formal scenario description languages (e.g., ASAM OpenSCENARIO, Scenic) facilitate repeatable, scalable, and transferable validation [33,34,35]. Regulatory frameworks such as ISO 21448 (SOTIF) and UNECE WP.29 are increasingly incorporating scenario-driven safety validation, underscoring its role in certification.
Within this context, scenario generation has become a critical enabler. Automated generation methods aim to synthesize diverse, realistic, and safety-critical test cases, going beyond manually curated or replayed datasets. These synthetic scenarios are especially valuable for stress-testing ADS against rare but high-risk events—such as near-miss collisions, unprotected turns, or aggressive cut-ins—that are unlikely to be encountered through brute-force road testing [36,37,38,39]. Recent research has explored reinforcement learning, Bayesian optimization, and other advanced algorithms to produce edge-case scenarios at scale [16,40,41,42,43,44,45,46,47,48,49,50,51]. Such approaches enable closed-loop simulations that iteratively challenge AV decision-making under uncertainty [52,53,54,55].
This survey provides a comprehensive review of scenario generation techniques for AV safety testing. We classify existing methods into three main paradigms: rule-based, data-driven, and learning-based approaches. For each, we analyze their algorithmic foundations, scenario description languages, evaluation metrics (e.g., coverage, criticality, diversity), and supporting simulation tools. We then highlight research gaps, including generalization across domains, realism of simulated environments, and the modeling of complex multi-agent interactions. Finally, we discuss emerging directions, such as semantic generation with large language models (LLMs), interactive multi-agent testing, and standardized scenario libraries for benchmarking [56,57]. We aim for this survey to serve as a resource for researchers and practitioners striving to enhance the safety, reliability, and public trust of autonomous driving systems.

2. Autonomous Driving Testing

Ensuring the safety and reliability of Autonomous Driving Systems (ADS) requires rigorous and systematic testing pipelines. Due to the complexity and unpredictability of real-world environments, conventional automotive testing methodologies are insufficient to evaluate the operational safety of AVs across their intended domain. To address this limitation, researchers and regulators have established multi-tiered testing frameworks that combine simulation, controlled track experiments, and on-road trials, thereby enabling comprehensive assessment under diverse conditions.
Testing strategies are commonly divided into three major categories: simulation-based testing, closed-track testing, and on-road testing, each with distinct contributions to the validation process.
Simulation-based testing leverages virtual environments to evaluate AV perception, planning, and control modules across a wide range of conditions. Platforms such as CARLA [58] and LGSVL [59] allow scalable, low-cost testing of scenarios including sudden occlusions, erratic maneuvers, and adverse weather or lighting [60]. Simulation ensures coverage of rare or hazardous cases that are often infeasible in physical testing. However, challenges remain in closing the sim-to-real gap, particularly in sensor modeling, environmental fidelity, and behavioral realism [61,62].
Closed-track testing provides controlled, repeatable environments at facilities such as Mcity [63] and AstaZero [64]. It serves as a bridge between simulation and on-road validation, offering safe conditions to evaluate critical functionalities like automated emergency braking, lane changes, and V2X communication systems [65]. While track testing ensures physical realism and safety, it remains costly and limited in terms of environmental diversity and traffic complexity.
On-road testing represents the final and most realistic stage, exposing AVs to real traffic agents, unstructured environments, and regulatory contexts. Companies including Waymo, Cruise, and Baidu Apollo have reported millions of kilometers of road testing to validate their systems. Despite its necessity, on-road testing is constrained by the rarity of safety-critical events, limited statistical coverage, and significant liability risks [24,66]. For this reason, it is typically preceded by extensive simulation and track-based testing.
Recent efforts emphasize hybrid validation pipelines that integrate these three levels: simulation for exploratory testing, closed tracks for controlled validation, and on-road deployment for certification. Table ?? provides a comparative summary of their trade-offs.
In the context of AV safety validation, it is essential to distinguish between three core concepts foundational to testing and scenario generation [67]:
  • Scene: A static snapshot of the driving environment at a particular time, including road layout, infrastructure, traffic participants, environmental conditions, and the states of agents (e.g., position, velocity, heading). Scenes capture the spatial and contextual setup without temporal evolution.
  • Scenario: A temporal sequence of scenes that models interactions among agents and unfolding events (e.g., overtaking, merging, emergency braking). Scenarios provide the basis for behavior modeling and safety testing [68].
  • Test Case: A parameterized instantiation of a scenario, specifying initial configurations (e.g., positions, velocities, traffic density) along with measurable evaluation criteria. Test cases enable structured verification outcomes such as pass/fail or quantitative safety margins.
Such distinctions help formalize testing logic, improve comparability across datasets and tools, and ensure consistent interpretation in regulatory frameworks.
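The scene / scenario / test-case hierarchy above can be sketched as a minimal data model. This is an illustrative structure only; all class and field names are our own, and normative schemas are defined by standards such as ASAM OpenSCENARIO.

```python
from dataclasses import dataclass

@dataclass
class AgentState:
    agent_id: str
    position: tuple   # (x, y) in metres
    velocity: float   # m/s
    heading: float    # radians

@dataclass
class Scene:
    """Static snapshot: environment plus agent states at one instant."""
    timestamp: float
    road_layout: str
    weather: str
    agents: list

@dataclass
class Scenario:
    """Temporal sequence of scenes modelling an unfolding event."""
    name: str
    scenes: list      # ordered list of Scene objects

    def duration(self) -> float:
        return self.scenes[-1].timestamp - self.scenes[0].timestamp

@dataclass
class TestCase:
    """Parameterized scenario instance with a measurable pass criterion."""
    scenario: Scenario
    parameters: dict      # e.g. {"traffic_density": 0.3}
    min_ttc_pass: float   # pass if observed minimum TTC exceeds this

    def evaluate(self, observed_min_ttc: float) -> bool:
        return observed_min_ttc >= self.min_ttc_pass
```

The key point the model makes explicit is that a test case adds two things a scenario alone lacks: concrete parameter values and a verdict function.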
To support reproducible and certifiable ADS testing, a number of international standards and research initiatives have been proposed. These frameworks define methodologies for scenario modeling, simulation integration, and safety validation.
PEGASUS Project (Project for the Establishment of Generally Accepted Quality Criteria, Tools and Methods as well as Scenarios and Situations) is a German-led initiative that formalizes scenario-based testing pipelines. It emphasizes data-driven scenario derivation, parameterization using statistical distributions, execution in simulation, and coverage-driven evaluation [69]. PEGASUS has become a reference model in Europe for integrating simulation, track, and road testing with quantifiable safety arguments.
ISO 21448, also known as SOTIF (Safety of the Intended Functionality), complements ISO 26262 by addressing safety risks arising from limitations of perception and decision-making in complex or uncertain scenarios. It focuses on hazard identification and mitigation for failures not caused by hardware/software faults but by incomplete environment modeling or AI reasoning [69].
ASAM (Association for Standardisation of Automation and Measuring Systems) has developed open standards that enable interoperability across testing toolchains:
  • OpenSCENARIO: Defines dynamic scenario logic, including actors, maneuvers, triggers, and actions.
  • OpenDRIVE: Provides detailed road network geometry, topology, and infrastructure models.
  • OpenXOntology: Establishes a shared semantic vocabulary to ensure consistent tool integration.
Additional efforts, such as ISO 34501/34502, introduce common terminologies, scenario taxonomies, and auditability criteria to ensure traceability in AV certification processes. Collectively, these frameworks mark a global shift toward scenario-centric testing that emphasizes reproducibility, coverage, and proactive hazard exposure.

2.1. Rule-Based Scenario Generation

Rule-based scenario generation constructs test cases based on predefined rules, templates, or expert knowledge. These methods often rely on regulatory conditions, structured parameter combinations, or safety-critical templates, offering high interpretability and traceability.
A classical line of work focuses on parameterized templates constrained by domain knowledge. Klischat and Althoff [70] proposed an approach that minimizes the solution space of the ego vehicle’s motion planner using evolutionary algorithms such as Differential Evolution and Particle Swarm Optimization. Their framework effectively generates high-risk scenarios in multi-vehicle interactions and intersections, though computational efficiency decreases in high-dimensional settings due to collision constraints.
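The evolutionary search described above can be illustrated with a self-contained sketch: a minimal Differential Evolution loop that searches a toy two-parameter cut-in scenario (initial gap, closing speed) for the most critical configuration, using time-to-collision as the objective. The scenario model and parameter bounds are our own simplifications, not those of the cited work.

```python
import random

def ttc(params):
    """Criticality objective for a toy cut-in scenario:
    time-to-collision = gap / closing speed (smaller = more critical)."""
    gap, closing_speed = params
    return gap / closing_speed

def differential_evolution(objective, bounds, pop_size=20, gens=60,
                           F=0.8, CR=0.9, seed=0):
    """Minimal DE/rand/1/bin minimizer over box-constrained parameters."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [objective(ind) for ind in pop]
    for _ in range(gens):
        for i in range(pop_size):
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)   # force at least one mutated gene
            trial = []
            for j, (lo, hi) in enumerate(bounds):
                if rng.random() < CR or j == j_rand:
                    v = pop[a][j] + F * (pop[b][j] - pop[c][j])
                else:
                    v = pop[i][j]
                trial.append(min(hi, max(lo, v)))  # clip to bounds
            s = objective(trial)
            if s < scores[i]:                      # greedy selection
                pop[i], scores[i] = trial, s
    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]

bounds = [(5.0, 50.0),   # initial gap to cut-in vehicle, metres
          (0.5, 10.0)]   # closing speed, m/s
params, score = differential_evolution(ttc, bounds)
```

In a real pipeline the objective would be evaluated by executing the parameterized scenario against the ego planner in simulation, which is what makes each evaluation expensive and motivates the sample-efficient methods discussed next.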
Althoff and Lutz [71] combined reachability analysis with optimization methods to synthesize collision-avoidance scenarios. Their method rapidly constructs scenarios within seconds, focusing on the safety envelope of the ego vehicle. However, it does not yet extend to multi-step trajectories or adversarial agent modeling.
To improve diversity and efficiency, Feng et al. [72] introduced the Adaptive Testing Scenario Library Generation (ATSLG) framework. Leveraging Bayesian Optimization with Gaussian Process Regression, ATSLG incrementally refines the scenario space, reducing required test iterations by one to two orders of magnitude. While effective in discovering critical scenarios, its scalability in high-dimensional feature spaces remains limited.
Gao et al. [73] proposed a combinatorial test generation method (CTBC) that integrates test matrices with a complexity-driven algorithm. By incorporating hierarchical influence modeling through AHP, their framework balances coverage and complexity. However, the accuracy of complexity estimation is constrained by underlying approximations.
More recently, Zhang et al. [74] extended rule-based methods by incorporating knowledge-enhanced scenario synthesis with large language models (LLMs). This approach aligns natural language intent with parameterized generation, enhancing flexibility while maintaining the structured backbone of rule-based design.
Overall, rule-based approaches remain widely adopted for their interpretability, repeatability, and suitability for regulatory testing. Nonetheless, they struggle to capture rare, emergent, or long-tail interactions and often lack adaptability in open-ended driving environments.

2.2. Data-Driven Scenario Generation

Data-driven scenario generation exploits large-scale naturalistic driving datasets to extract, learn, and synthesize scenarios that reflect realistic traffic patterns, driver behaviors, and rare safety-critical cases. Unlike rule-based methods that rely on expert-crafted templates, this approach leverages statistical analysis, machine learning, and data mining to uncover latent dynamics directly from empirical driving records. The availability of rich datasets such as NGSIM, Argoverse, and SHRP2 enables scalable creation of test cases that closely mirror real-world complexity.
A dominant line of work is trajectory-based learning. Zhang et al. [75] proposed DP-TrajGAN, a GAN-based framework augmented with differential privacy to synthesize realistic vehicle trajectories while preserving data confidentiality. Their method achieved strong fidelity and privacy trade-offs on NGSIM and Argoverse. Similarly, Krajewski et al. [76] combined GANs and VAEs to model diverse vehicle maneuvers, improving both scenario realism and simulation variability.
To generate more safety-critical conditions, researchers have explored adversarial perturbation. Wang et al. [77] introduced AdvSim, which perturbs trajectories and LiDAR signals to create adversarial test cases. Their results demonstrated the effectiveness of adversarial replay in uncovering system vulnerabilities.
Another important research thread focuses on criticality assessment and targeting. Westhofen et al. [78] provided a systematic review of criticality metrics and proposed a framework to evaluate their applicability for AV testing. Kang et al. [79] applied voxel-based 3D modeling and vision transformers to identify latent safety threats in LiDAR data, reporting an F1 score of 98.26% for risky zone detection.
Driving instability has also been employed as a data-driven indicator of crash risk. Arvin et al. [80] analyzed SHRP2 crash and near-crash events, demonstrating a strong correlation between pre-crash instability and severity. Such findings can guide the synthesis of high-risk scenarios for testing.
Rare and corner-case detection represents another active frontier. Bolte et al. [81] proposed a hybrid offline–online anomaly detection framework to identify low-frequency, high-impact events. This enables automatic harvesting of rare driving situations from continuous driving logs.
From a parameterization perspective, Muslim et al. [82] introduced a cut-out scenario generation method that abstracts highway traffic into interpretable parameter boundaries. This ensures both plausibility and controllable variability. Along similar lines, Huang et al. [83] presented the CaDRE framework, which integrates

3. Evaluation

A critical component of scenario-based testing for autonomous driving is not only the generation of test scenarios but also their systematic evaluation. Without rigorous assessment, generated scenarios may fail to meaningfully contribute to safety validation. To this end, researchers have proposed a range of quality criteria and quantitative metrics to judge whether scenarios are realistic, diverse, comprehensive, and safety-critical. These evaluation dimensions determine how effectively scenarios expose weaknesses in the system under test and ensure their value in regulatory and industrial validation pipelines.
Scenario evaluation is generally structured around four core dimensions:
  • Realism: Assesses how closely generated scenarios reflect real-world driving conditions, including plausible agent behavior, physically consistent motion dynamics, and compliance with traffic rules. Realism is crucial for external validity and is typically verified through statistical comparison with naturalistic datasets or human expert annotation [84,85,86].
  • Diversity: Measures the variability among generated scenarios in terms of maneuvers, traffic densities, environmental conditions, and multi-agent interactions. High diversity enhances the likelihood of uncovering unexpected system failures and ensures broader stress-testing of ADS [87,88,89,90,91].
  • Coverage: Captures how thoroughly scenarios span the operational design domain (ODD) and functional safety requirements. Coverage can be quantified by semantic labels (e.g., highway merging, unprotected left turns) or by parameter-space sampling strategies, ensuring systematic exploration of safety-relevant conditions [43,45,90,92].
  • Criticality: Evaluates the level of risk or difficulty posed by a scenario. Metrics include time-to-collision (TTC), minimum distance, required deceleration, and collision probability, all of which help identify scenarios most likely to trigger unsafe ADS responses [34,88,93,94,95,96,97,98].
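Two of the criticality metrics listed above, time-to-collision and required deceleration, reduce to short closed-form expressions for a simple car-following geometry. The sketch below assumes a straight-line following situation with constant speeds; real implementations must handle lateral motion and prediction uncertainty.

```python
def time_to_collision(gap_m, ego_speed, lead_speed):
    """TTC between a following ego and a lead vehicle, in seconds.
    Returns float('inf') when the ego is not closing the gap."""
    closing = ego_speed - lead_speed
    return gap_m / closing if closing > 0 else float("inf")

def required_deceleration(gap_m, ego_speed, lead_speed):
    """Constant deceleration (m/s^2) the ego needs so that its speed
    matches the lead's exactly as the gap closes: a = dv^2 / (2 * gap)."""
    dv = ego_speed - lead_speed
    return (dv * dv) / (2 * gap_m) if dv > 0 else 0.0
```

For example, an ego at 15 m/s trailing a lead at 10 m/s with a 20 m gap has a TTC of 4 s and needs a modest 0.625 m/s² of braking; shrinking the gap or raising the speed differential drives both metrics toward criticality thresholds.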
To operationalize these metrics, effective evaluation requires advanced simulation environments that are realistic, flexible, and extensible. Such simulators allow controlled testing of perception, planning, and control modules at scale, while reducing the risks and costs of physical trials. In parallel, scenario description languages (e.g., OpenSCENARIO, Scenic) provide structured mechanisms to encode, manipulate, and standardize test cases. Together, high-fidelity simulators and formalized scenario languages form the backbone of scenario-based testing pipelines, ensuring that evaluation is both reproducible and transferable across different platforms and stakeholders.

4. Challenges and Research Gaps

Despite significant progress in scenario generation for autonomous driving testing, several challenges and open research questions remain. These issues span technical, practical, and regulatory domains, highlighting the need for more robust, scalable, and standardized solutions. This section outlines the most critical challenges faced by current approaches and identifies gaps that present opportunities for future work.

4.1. Limited Data Diversity and Generalization

Most data-driven and learning-based generation techniques rely on real-world driving datasets, which, while extensive, often exhibit distributional bias. They tend to overrepresent common urban driving patterns and underrepresent edge cases such as near-misses, rare weather conditions, or unusual traffic interactions. As a result, generated scenarios may lack diversity and fail to generalize across unseen domains or geographies. Bridging this gap requires improved domain adaptation methods, cross-city or cross-country datasets, and techniques for transferring knowledge between different driving contexts.

4.2. Reality Gap in Synthetic Scenarios

A persistent challenge is the so-called reality gap—the mismatch between synthetic, simulator-generated scenarios and real-world driving environments. Even high-fidelity simulators may fail to capture subtle behaviors of pedestrians, occlusions, sensor noise, or infrastructure imperfections. This gap can lead to overestimation of AV performance in simulated testing and underpreparedness in real-world deployment. Addressing this issue involves combining real-world and simulated data, applying domain randomization, and improving simulator realism both at the perception and decision-making levels.
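Domain randomization, mentioned above, can be sketched as sampling a fresh draw of simulator parameters per episode so that a policy or perception stack never overfits one rendering of the world. The parameter names and ranges below are illustrative stand-ins, not calibrated to any particular simulator.

```python
import random

def randomize_domain(rng):
    """One hypothetical draw of simulator parameters for an episode.
    Ranges are illustrative, not calibrated values."""
    return {
        "lidar_noise_std_m": rng.uniform(0.00, 0.05),
        "camera_exposure":   rng.uniform(0.5, 1.5),
        "road_friction":     rng.uniform(0.4, 1.0),   # wet to dry asphalt
        "pedestrian_speed":  rng.uniform(0.8, 2.5),   # m/s
        "sun_altitude_deg":  rng.uniform(-10.0, 90.0),  # includes dusk
    }

rng = random.Random(42)
episodes = [randomize_domain(rng) for _ in range(100)]
```

The intent is that a system validated across the randomized family is more likely to transfer to the one "draw" it cannot control: reality.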

4.3. Scalability of Scenario Space

The scenario space for AV testing is practically infinite, encompassing a large number of interacting variables—agent types, behaviors, road layouts, environmental conditions, and temporal sequences. Exhaustive exploration of this space is infeasible. Thus, existing generation methods often sample from constrained subspaces, risking incomplete validation. New scalable methods for scenario space abstraction, semantic scenario clustering, and combinatorial coverage optimization are needed to ensure high testing efficiency without sacrificing thoroughness.
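Combinatorial coverage optimization, one of the scalable methods called for above, can be made concrete with a greedy pairwise (2-way) covering array: every value pair of every factor pair appears in at least one test, with far fewer tests than the full Cartesian product. The factor names and values are illustrative.

```python
from itertools import combinations, product

def pairwise_suite(factors):
    """Greedy construction of a 2-way covering array over discrete factors."""
    names = list(factors)
    uncovered = set()
    for (i, a), (j, b) in combinations(enumerate(names), 2):
        for va, vb in product(factors[a], factors[b]):
            uncovered.add((i, va, j, vb))
    suite = []
    while uncovered:
        best, best_gain = None, -1
        for cand in product(*(factors[n] for n in names)):
            gain = sum(1 for (i, va, j, vb) in uncovered
                       if cand[i] == va and cand[j] == vb)
            if gain > best_gain:
                best, best_gain = cand, gain
        suite.append(dict(zip(names, best)))
        uncovered = {(i, va, j, vb) for (i, va, j, vb) in uncovered
                     if not (best[i] == va and best[j] == vb)}
    return suite

factors = {
    "weather":  ["clear", "rain", "fog"],
    "lighting": ["day", "night"],
    "density":  ["low", "high"],
    "maneuver": ["merge", "cut_in", "left_turn"],
}
suite = pairwise_suite(factors)   # far fewer than the 36 exhaustive tests
```

For these four factors the exhaustive product has 36 combinations, while pairwise coverage needs roughly a quarter of that; the gap widens dramatically as factors are added, which is exactly the scalability argument made above.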

4.4. Modeling Safety-Critical but Rare Events

Safety-critical scenarios—such as sudden pedestrian crossings, aggressive merges, or multi-agent collisions—are rare but essential for validating robustness. However, they are underrepresented in data and difficult to model without manual engineering or adversarial optimization. Existing methods either rely on hand-crafted triggers or heuristic optimization, both of which may fail to cover unknown failure modes. Learning methods that can autonomously discover and amplify rare, high-risk patterns are still in their infancy and represent a vital area for advancement.

4.5. Standardization and Regulatory Alignment

A major obstacle in the deployment of scenario-based testing is the lack of unified standards for scenario modeling, evaluation, and exchange. While efforts like ASAM OpenSCENARIO and ISO 21448 (SOTIF) provide foundational standards, their integration into automated generation pipelines remains limited. Furthermore, regulatory acceptance of synthetic and generated scenarios for AV certification is still evolving. Research is needed on formal scenario verification, traceability, and compliance to ensure generated scenarios meet safety assurance requirements in a legally verifiable way.

5. Emerging Trends and Future Directions

As autonomous driving technologies mature, the demands on scenario generation frameworks are becoming more sophisticated. To support next-generation AV testing, recent research efforts have begun to explore novel paradigms that go beyond conventional rule-based or data-centric approaches. This section highlights several emerging trends and outlines promising future directions that are expected to shape the field in the coming years.

5.1. Semantic and Language-Driven Scenario Generation

The integration of large language models (LLMs) such as GPT and Claude into the AV testing pipeline has opened a new frontier: semantic scenario generation. Instead of specifying low-level scene parameters manually or learning them from data, users can now describe high-level scenarios in natural language—e.g., “a pedestrian suddenly crosses at a dimly lit intersection during rain”—which are then automatically translated into structured scenario code (e.g., OpenSCENARIO or Scenic). This paradigm enables more intuitive, human-centered interaction and lowers the barrier for specifying complex or rare situations.
Several early-stage systems now link language models with simulation backends (e.g., CARLA + LLM), enabling real-time scenario synthesis and editing. Future work may focus on integrating commonsense reasoning, legal constraints, and safety specifications directly into the generation pipeline through natural language interfaces.
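The last mile of such a language-driven pipeline can be sketched deterministically: an LLM (not shown here) normalizes free text into slots, and a rule layer maps slots onto scenario parameters. In the sketch below the "LLM" is replaced by simple keyword rules so the mapping itself is inspectable; the slot names and vocabulary are hypothetical.

```python
# Hypothetical slot vocabulary: phrase -> canonical parameter value.
SLOT_RULES = {
    "weather":  {"rain": "rain", "fog": "fog", "snow": "snow"},
    "lighting": {"dimly lit": "night", "night": "night", "dusk": "dusk"},
    "actor":    {"pedestrian": "pedestrian", "cyclist": "cyclist"},
    "event":    {"crosses": "crossing", "cut-in": "cut_in",
                 "merges": "merge"},
}

def text_to_parameters(description: str) -> dict:
    """Map a natural-language scenario description onto parameter slots
    via keyword matching (a stand-in for an LLM's structured output)."""
    params = {}
    text = description.lower()
    for slot, rules in SLOT_RULES.items():
        for phrase, value in rules.items():
            if phrase in text:
                params[slot] = value
                break
    return params

params = text_to_parameters(
    "a pedestrian suddenly crosses at a dimly lit intersection during rain")
```

The resulting parameter dictionary is what would then be compiled into OpenSCENARIO or Scenic code; the open research questions lie in making that compilation respect physics, traffic law, and safety specifications.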

5.2. Multi-Modal and Multi-Agent Scene Synthesis

Another active trend is the development of multi-modal scenario synthesis that incorporates visual, spatial, and behavioral information from multiple sources—such as video, LiDAR, maps, and text—to construct comprehensive test scenes. Generative models are being trained to combine these modalities into coherent environments, which better reflect the sensor fusion-based perception systems in real AVs.
In parallel, there is increasing interest in multi-agent interaction modeling. Modern urban scenarios often involve complex interactions among multiple agents with varying intent (e.g., pedestrians, cyclists, autonomous and human-driven vehicles). Modeling these interactions realistically, and generating coordinated behavior trajectories, remains a significant challenge. Multi-agent reinforcement learning, game-theoretic approaches, and diffusion-based generative models are emerging tools for tackling this complexity.

5.3. Hybrid Data-Driven and Rule-Based Approaches

Recognizing the limitations of pure rule-based or data-driven methods, researchers are moving towards hybrid frameworks that combine both strengths. Rule-based constraints provide safety and structure, while data-driven models contribute realism and diversity.
In practice, this might involve using data-driven models to sample base scenes and agents, with rule-based logic applied to inject specific intent, constraints, or triggers. Alternatively, hybrid approaches may operate in a layered architecture—where a symbolic planner outlines scenario semantics, and a learned module fills in the low-level details. These combinations are particularly promising for balancing interpretability with expressive power.
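One simple instance of this hybrid pattern is rejection sampling: a data-driven sampler proposes scenarios, and rule-based constraints discard implausible or uninteresting ones. The sketch below uses a Gaussian/uniform stand-in for the learned model, with illustrative thresholds.

```python
import random

def sample_cut_in(rng):
    """Stand-in for a learned generative model of cut-in scenarios."""
    return {
        "ego_speed": rng.gauss(25.0, 5.0),    # m/s
        "gap":       rng.uniform(2.0, 60.0),  # m
        "rel_speed": rng.gauss(0.0, 4.0),     # m/s, positive = closing
    }

def satisfies_rules(s):
    """Rule layer: plausible speeds, and critical enough to be worth testing."""
    if not (5.0 <= s["ego_speed"] <= 40.0):
        return False
    if s["rel_speed"] <= 0:          # not closing: no interaction
        return False
    ttc = s["gap"] / s["rel_speed"]
    return ttc <= 5.0                # keep only near-critical cases

rng = random.Random(7)
accepted = []
while len(accepted) < 50:
    candidate = sample_cut_in(rng)
    if satisfies_rules(candidate):
        accepted.append(candidate)
```

Here the data-driven component supplies realism in the parameter distributions, while the rules guarantee every retained scenario is both physically plausible and safety-relevant, which is the interpretability/expressiveness trade-off the hybrid framing targets.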

5.4. Towards Standardized Scenario Repositories and Benchmarks

As the field matures, there is a growing need for open, standardized scenario repositories and benchmarking protocols. Currently, many datasets and scenarios are either proprietary or fragmented, making reproducibility and comparative evaluation difficult.
Initiatives such as the ASAM OpenX family (OpenSCENARIO, OpenDRIVE, OpenLABEL) and projects like GENEVA or SAFETAG aim to unify scenario description formats and provide comprehensive libraries of validated test cases. Benchmarking tools that evaluate scenario quality, failure exposure, and coverage are also under active development.
In the future, publicly maintained scenario banks—similar to ImageNet for vision or GLUE for NLP—may become the cornerstone for training, testing, and certifying autonomous driving systems under global safety standards.

6. Conclusions

Scenario generation has emerged as a central pillar in the validation and verification of autonomous driving systems. This survey has reviewed the landscape of scenario generation techniques from multiple perspectives, including rule-based, data-driven, and learning-based approaches. We have discussed key simulation platforms and scenario description languages that underpin automated testing pipelines, as well as the metrics used to evaluate scenario quality in terms of realism, diversity, criticality, and reproducibility.
From the reviewed literature and practices, several important takeaways can be drawn. Rule-based generation provides structure and standardization but struggles with diversity and scalability. Data-driven approaches benefit from realism grounded in real-world observations but are constrained by dataset limitations and rarity of critical events. Learning-based methods offer promising adaptability and automation, especially for generating adversarial or failure-triggering scenarios, but face challenges related to safety, interpretability, and validation.
Scenario quality evaluation remains a non-trivial task, requiring multidimensional metrics and feedback from simulation environments. Tools like Scenic, together with general coverage metrics, are gaining traction for quantifying scenario-space exploration. Moreover, the growing integration of scenario generation with closed-loop simulators enables dynamic, intelligent testing strategies that evolve alongside AV system development.
Looking forward, advancing scenario generation will be key to achieving safe and efficient autonomous driving. Future research should prioritize hybrid and semantic methods that balance structure with adaptability, develop standardized scenario libraries and benchmarking protocols, and close the realism gap between synthetic and real-world driving environments. Through collaborative efforts in research, tool development, and regulation, scenario-based testing will continue to evolve as a robust framework for ensuring safety in increasingly complex autonomous systems.

References

  1. Zhang, Q.; Hua, K.; Zhang, Z.; Zhao, Y.; Chen, P. ACNet: An Attention–Convolution Collaborative Semantic Segmentation Network on Sensor-Derived Datasets for Autonomous Driving. Sensors 2025, 25, 4776. [CrossRef]
  2. Wei, Z.; Gutierrez, C.A.; Rodr, J.; Wang, J. 6G-enabled Vehicle-to-Everything Communications: Current Research Trends and Open Challenges. IEEE Open Journal of Vehicular Technology 2025. [CrossRef]
  3. Kumar, H.; Mamoria, P.; Dewangan, D.K. Vision technologies in autonomous vehicles: Progress, methodologies, and key challenges. International Journal of System Assurance Engineering and Management 2025. [CrossRef]
  4. Liu, X.; Huang, H.; Bian, J.; Zhou, R.; Wei, Z. Generating intersection pre-crash trajectories for autonomous driving safety testing using Transformer Time-Series GANs. Engineering Applications of Artificial Intelligence 2025. [CrossRef]
  5. Liu, H.X.; Feng, S. Curse of rarity for autonomous vehicles. Nature Communications 2024, 15, 4808. [CrossRef]
  6. Zhao, M.; Liang, C.; Wang, T.; Guan, J.; Wan, L. Scenario Hazard Prevention for Autonomous Driving Based on Improved STPA. In Safety, Reliability, and Security; Springer, 2025.
  7. Zhou, R.; Huang, H.; Lee, J.; Huang, X.; Chen, J.; Zhou, H. Identifying typical pre-crash scenarios based on in-depth crash data with deep embedded clustering for autonomous vehicle safety testing. Accident Analysis & Prevention 2023, 191, 107218. [CrossRef]
  8. Huang, H.; Huang, X.; Zhou, R.; Zhou, H.; Lee, J.J.; Cen, X. Pre-crash scenarios for safety testing of autonomous vehicles: A clustering method for in-depth crash data. Accident Analysis & Prevention 2024, 203, 107616. [CrossRef]
  9. da Costa, A.A.B.; Irvine, P.; Dodoiu, T.; Khastgir, S. Building a Robust Scenario Library for Safety Assurance of Automated Driving Systems: A Review. IEEE Transactions on Intelligent Transportation Systems 2025. [CrossRef]
  10. Lahikainen, J. AI-Driven Inverse Method for Identifying Mechanical Properties From Small Punch Tests. PhD thesis, Aalto University, 2025.
  11. Zhou, R.; Zhang, G.; Huang, H.; Wei, Z.; Zhou, H.; Jin, J.; Chang, F.; Chen, J. How would autonomous vehicles behave in real-world crash scenarios? Accident Analysis & Prevention 2024, 202, 107572. [CrossRef]
  12. Gu, J.; Bellone, M.; Lind, A. Camera-LiDAR Fusion based Object Segmentation in Adverse Weather Conditions for Autonomous Driving. In Proceedings of the 2024 19th Biennial Baltic Electronics Conference (BEC). IEEE, 2024.
  13. American Automobile Association. Fear of Self-Driving Cars Persists. https://newsroom.acg.aaa.com/michigan-fear-of-self-driving-cars-persists/, 2024. Accessed: 2025-08-29.
  14. Feng, S.; Sun, H.; Yan, X.; Zhu, H.; Zou, Z.; Shen, S.; Liu, H.X. Dense reinforcement learning for safety validation of autonomous vehicles. Nature 2023, 615, 620–627. [CrossRef]
  15. Feng, S.; Yan, X.; Sun, H.; Feng, Y.; Liu, H.X. Intelligent driving intelligence test for autonomous vehicles with naturalistic and adversarial environment. Nature Communications 2021, 12, 748. [CrossRef]
  16. Huang, X.; Cen, X.; Cai, M.; Zhou, R. A framework to analyze function domains of autonomous transportation systems based on text analysis. Mathematics 2022, 11, 158. [CrossRef]
  17. Scanlon, J.M.; Kusano, K.D.; Daniel, T.; Alderson, C.; Ogle, A.; Victor, T. Waymo simulated driving behavior in reconstructed fatal crashes within an autonomous vehicle operating domain. Accident Analysis & Prevention 2021, 163, 106454. [CrossRef]
  18. Huang, W.; Wang, K.; Lv, Y.; Zhu, F. Autonomous vehicles testing methods review. In Proceedings of the 2016 IEEE 19th International Conference on Intelligent Transportation Systems (ITSC). IEEE, 2016, pp. 163–168.
  19. Hungar, H. Scenario-based validation of automated driving systems. In Proceedings of the International Symposium on Leveraging Applications of Formal Methods. Springer, 2018, pp. 449–460.
  20. Deldari, N. Scenario Annotation in Autonomous Driving: An Outlier Detection Framework. PhD thesis, Uppsala University, 2025.
  21. Pisinger, D.; Ropke, S. Large Neighborhood Search. In Handbook of Metaheuristics; Gendreau, M.; Potvin, J.Y., Eds.; Springer US: Boston, MA, 2010; pp. 399–419.
  22. Zhou, Y.; Sun, Y.; Tang, Y.; Chen, Y.; Sun, J.; Poskitt, C.M.; Liu, Y.; Yang, Z. Specification-Based Autonomous Driving System Testing. IEEE Transactions on Software Engineering 2023, 49, 3391–3410. [CrossRef]
  23. Zhang, H.; Sun, J.; Tian, Y. Accelerated Risk Assessment for Highly Automated Vehicles: Surrogate-Based Monte Carlo Method. IEEE Transactions on Intelligent Transportation Systems 2024, 25, 5488–5497. [CrossRef]
  24. Kalra, N.; Paddock, S.M. Driving to Safety: How Many Miles of Driving Would It Take to Demonstrate Autonomous Vehicle Reliability? Transportation Research Part A 2016. [CrossRef]
  25. Ding, W.; Lin, H.; Li, B.; Zhao, D. Causalaf: Causal autoregressive flow for safety-critical driving scenario generation. In Proceedings of the Conference on robot learning. PMLR, 2023, pp. 812–823.
  26. Li, C.; Sifakis, J.; Wang, Q.; Yan, R.; Zhang, J. Simulation-based validation for autonomous driving systems. In Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis, 2023, pp. 842–853.
  27. Klamann, B.; Lippert, M.; Amersbach, C.; Winner, H. Defining Pass-/Fail-Criteria for Particular Tests of Automated Driving Functions. 2019 IEEE Intelligent Transportation Systems Conference (ITSC) 2019, 169–174. [CrossRef]
  28. Nalic, D.; Mihalj, T.; Bäumler, M.; Lehmann, M.; Eichberger, A.; Bernsteiner, S. Scenario based testing of automated driving systems: A literature survey. In Proceedings of the FISITA web Congress, 2020, Vol. 10.
  29. Finding Critical Scenarios for Automated Driving Systems: A Systematic Mapping Study. IEEE Transactions on Software Engineering 2023, 49, 991–1026. [CrossRef]
  30. Sahu, N.; Bhat, A.; Rajkumar, R. SafeRoute: Risk-Minimizing Cooperative Real-Time Route and Behavioral Planning for Autonomous Vehicles. In Proceedings of the 2024 IEEE 27th International Conference on Intelligent Transportation Systems (ITSC). IEEE, 2024.
  31. Nagy, R.; Szalai, I. Development of an unsupervised learning-based annotation method for road quality assessment. Transportation Engineering 2025. [CrossRef]
  32. Menzel, T.; Bagschik, G.; Maurer, M. Scenarios for development, test and validation of automated vehicles. In Proceedings of the 2018 IEEE intelligent vehicles symposium (IV). IEEE, 2018, pp. 1821–1827.
  33. Menzel, T.; Bagschik, G.; Maurer, M. Scenarios for Safety Validation of Highly Automated Vehicles. Transportation Research Part F 2020.
  34. Sun, J.; Zhang, H.; Zhou, H.; Yu, R.; Tian, Y. Scenario-based test automation for highly automated vehicles: A review and paving the way for systematic safety assurance. IEEE transactions on intelligent transportation systems 2021, 23, 14088–14103. [CrossRef]
  35. Fiorino, M.; Naeem, M.; Ciampi, M.; Coronato, A. Defining a metric-driven approach for learning hazardous situations. Technologies 2024, 12, 103. [CrossRef]
  36. Mammen, M.; Kayatas, Z.; Bestle, D. Evaluation of Different Generative Models to Support the Validation of Advanced Driver Assistance Systems. Applied Mechanics 2025. [CrossRef]
  37. Zhang, X.; Xiong, L.; Zhang, P.; Huang, J.; Ma, Y. Real-world Troublemaker: A 5G Cloud-controlled Track Testing Framework for Automated Driving Systems in Safety-critical Interaction Scenarios. arXiv preprint arXiv:2502.14574 2025. [CrossRef]
  38. Wang, J.; Wang, X. Safety-Critical Scenario Generation for Self-Driving Systems Based on Domain Models. In Proceedings of the 2nd International Conference on Intelligent Robotics and Control Engineering, 2025.
  39. Zhou, R.; Huang, H.; Lee, J.; Huang, X.; Chen, J.; Zhou, H. Identifying typical pre-crash scenarios based on in-depth crash data with deep embedded clustering for autonomous vehicle safety testing. Accident Analysis & Prevention 2023, 191, 107218. [CrossRef]
  40. Xu, C.; Ding, Z.; Wang, C.; Li, Z. Statistical analysis of the patterns and characteristics of connected and autonomous vehicle involved crashes. Journal of Safety Research 2019, 71, 41–47. [CrossRef]
  41. Lenard, J. Typical pedestrian accident scenarios for the development of autonomous emergency braking test protocols. Accident Analysis & Prevention 2014, p. 8. [CrossRef]
  42. Zhou, R.; Liu, Y.; Zhang, K.; Yang, O. Genetic algorithm-based challenging scenarios generation for autonomous vehicle testing. IEEE Journal of Radio Frequency Identification 2022, 6, 928–933. [CrossRef]
  43. Zhu, B.; Sun, Y.; Zhao, J.; Han, J.; Zhang, P.; Fan, T. A critical scenario search method for intelligent vehicle testing based on the social cognitive optimization algorithm. IEEE Transactions on Intelligent Transportation Systems 2023, 24, 7974–7986. [CrossRef]
  44. Zhou, R. Efficient Safety Testing of Autonomous Vehicles via Adaptive Search over Crash-Derived Scenarios. arXiv preprint arXiv:2508.06575 2025. [CrossRef]
  45. Bian, J.; Huang, H.; Yu, Q.; Zhou, R. Search-to-Crash: Generating safety-critical scenarios from in-depth crash data for testing autonomous vehicles. Energy 2025, p. 137174. [CrossRef]
  46. Sun, J.; Zhang, H.; Zhou, H.; Yu, R.; Tian, Y. Scenario-Based Test Automation for Highly Automated Vehicles: A Review and Paving the Way for Systematic Safety Assurance. IEEE Transactions on Intelligent Transportation Systems 2022, 23, 14088–14103. [CrossRef]
  47. Tian, Y.; Zheng, W.; Shao, Y.; Zhang, H.; Sun, J. MJTG: A Multi-vehicle Joint Trajectory Generator for Complex and Rare Scenarios. IEEE Transactions on Vehicular Technology 2025. [CrossRef]
  48. Mondelli, A.; Li, Y.; Zanardi, A.; Frazzoli, E. Test Automation for Interactive Scenarios via Promptable Traffic Simulation. arXiv preprint arXiv:2506.01199 2025. [CrossRef]
  49. Mei, Y.; Nie, T.; Sun, J.; Tian, Y. Llm-attacker: Enhancing closed-loop adversarial scenario generation for autonomous driving with large language models. arXiv preprint arXiv:2501.15850 2025. [CrossRef]
  50. Zeng, Z.; Shi, Q.; Zhuang, W.; Wang, X.; Fan, X. Adversarial Generation for Autonomous Vehicles in Safety-Critical Ramp Merging Scenarios. In Proceedings of the International Conference on Electric Vehicle and Vehicle Engineering. Springer, 2024, pp. 427–434.
  51. Feng, S.; Feng, Y.; Sun, H.; Zhang, Y.; Liu, H.X. Testing scenario library generation for connected and automated vehicles: An adaptive framework. IEEE Transactions on Intelligent Transportation Systems 2020, 23, 1213–1222. [CrossRef]
  52. Tang, S.; Zhang, Z.; Zhou, J.; Lei, L.; Zhou, Y.; Xue, Y. Legend: A top-down approach to scenario generation of autonomous driving systems assisted by large language models. In Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering, 2024, pp. 1497–1508.
  53. Arnav, M. Scenario generation methods for functional safety testing of automated driving systems 2025.
  54. Huai, Y.; Almanee, S.; Chen, Y.; Wu, X.; Chen, Q.A.; Garcia, J. scenoRITA: Generating Diverse, Fully-Mutable, Test Scenarios for Autonomous Vehicle Planning. IEEE Transactions on Software Engineering 2023, pp. 1–21. [CrossRef]
  55. Ding, W.; Lin, H.; Li, B.; Zhao, D. Generalizing goal-conditioned reinforcement learning with variational causal reasoning. Advances in Neural Information Processing Systems 2022, 35, 26532–26548. [CrossRef]
  56. Cai, X.; Bai, X.; Cui, Z.; Xie, D.; Fu, D.; Yu, H.; Ren, Y. Text2scenario: Text-driven scenario generation for autonomous driving test. arXiv preprint arXiv:2503.02911 2025. [CrossRef]
  57. Ricotta, C.; Khzym, S.; Faron, A.; Emadi, A. Property Optimized GNN: Improving Data Association Performance Using Cost Function Optimization for Sensor Fusion In High Density Environments. In Proceedings of the 2024 IEEE Smart World Congress (SWC). IEEE, 2024, pp. 1871–1877.
  58. Dosovitskiy, A.; Ros, G.; Codevilla, F.; Lopez, A.; Koltun, V. CARLA: An open urban driving simulator. In Proceedings of the Conference on robot learning. PMLR, 2017, pp. 1–16.
  59. Rong, G.; Shin, B.H.; Tabatabaee, H.; Lu, Q.; Lemke, S.; Možeiko, M.; Boise, E.; Uhm, G.; Gerow, M.; Mehta, S.; et al. Lgsvl simulator: A high fidelity simulator for autonomous driving. In Proceedings of the 2020 IEEE 23rd International conference on intelligent transportation systems (ITSC). IEEE, 2020, pp. 1–6.
  60. Maier, R.; Grabinger, L.; Urlhart, D.; Mottok, J. Causal models to support scenario-based testing of adas. IEEE Transactions on Intelligent Transportation Systems 2023, 25, 1815–1831. [CrossRef]
  61. Fremont, D.J.; Kim, E.; Pant, Y.V.; Seshia, S.A.; Acharya, A.; Bruso, X.; Wells, P.; Lemke, S.; Lu, Q.; Mehta, S. Formal scenario-based testing of autonomous vehicles: From simulation to the real world. In Proceedings of the 2020 IEEE 23rd International Conference on Intelligent Transportation Systems (ITSC). IEEE, 2020, pp. 1–8.
  62. Sonetta, H.Y. Bridging the simulation-to-reality gap: Adapting simulation environment for object recognition. Master’s thesis, University of Windsor (Canada), 2021.
  63. Dong, Y.; Zhong, Y.; Yu, W.; Zhu, M.; Lu, P.; Fang, Y.; Hong, J.; Peng, H. Mcity data collection for automated vehicles study. arXiv preprint arXiv:1912.06258 2019. [CrossRef]
  64. Jacobson, J.; Janevik, P.; Wallin, P. Challenges in creating AstaZero, the active safety test area. In Proceedings of the Transport Research Arena (TRA) 5th Conference: Transport Solutions from Research to Deployment, European Commission, Conference of European Directors of Roads (CEDR), European Road Transport Research Advisory Council (ERTRAC), 2014.
  65. Ma, Y.; Sun, C.; Chen, J.; Cao, D.; Xiong, L. Verification and validation methods for decision-making and planning of automated vehicles: A review. IEEE Transactions on Intelligent Vehicles 2022, 7, 480–498. [CrossRef]
  66. Mariani, R. An overview of autonomous vehicles safety. In Proceedings of the 2018 IEEE International Reliability Physics Symposium (IRPS). IEEE, 2018, pp. 6A–1.
  67. International Organization for Standardization. ISO 34501:2022 - Road vehicles — Test scenarios for automated driving systems — Vocabulary, 2022.
  68. Batsch, F.; Kanarachos, S.; Cheah, M.; Ponticelli, R.; Blundell, M. A taxonomy of validation strategies to ensure the safe operation of highly automated vehicles. Journal of Intelligent Transportation Systems 2021, 26, 14–33. [CrossRef]
  69. Wang, C.; Storms, K.; Zhang, N.; Winner, H. Runtime unknown unsafe scenarios identification for SOTIF of autonomous vehicles. Accident Analysis & Prevention 2024, 195, 107410. [CrossRef]
  70. Klischat, M.; Althoff, M. Generating critical test scenarios for automated vehicles with evolutionary algorithms. In Proceedings of the IEEE Intelligent Vehicles Symposium (IV), 2019.
  71. Althoff, M.; Lutz, S. Automatic generation of safety-critical test scenarios for collision avoidance of road vehicles. In Proceedings of the IEEE Intelligent Vehicles Symposium (IV), 2018.
  72. Feng, S.; Feng, Y.; Sun, H.; Zhang, Y.; Liu, H.X. Testing scenario library generation for connected and automated vehicles: An adaptive framework. IEEE Transactions on Intelligent Transportation Systems 2020. [CrossRef]
  73. Gao, F.; Duan, J.; He, Y.; Wang, Z. A test scenario automatic generation strategy for intelligent driving systems. Mathematical Problems in Engineering 2019. [CrossRef]
  74. Zhang, J.; Xu, C.; Li, B. Chatscene: Knowledge-enabled safety-critical scenario generation for autonomous vehicles. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024.
  75. Zhang, J.; Huang, Q.; Huang, Y. DP-TrajGAN: A privacy-aware trajectory generation model with differential privacy. Future Generation Computer Systems 2023. [CrossRef]
  76. Krajewski, R.; Moers, T.; Nerger, D. Data-driven maneuver modeling using generative adversarial networks and variational autoencoders for safety validation of highly automated vehicles. In Proceedings of the 21st International Conference on Intelligent Transportation Systems (ITSC), 2018.
  77. Wang, J.; Pun, A.; Tu, J. Advsim: Generating safety-critical scenarios for self-driving vehicles. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021.
  78. Westhofen, L.; Neurohr, C.; Koopmann, T. Criticality metrics for automated driving: A review and suitability analysis of the state of the art. Archives of Computational Methods in Engineering 2023. [CrossRef]
  79. Kang, M.; Seo, J.; Hwang, K. Critical voxel learning with vision transformer and derivation of logical AV safety assessment scenarios. Accident Analysis & Prevention 2024. [CrossRef]
  80. Arvin, R.; Kamrani, M.; Khattak, A.J. The role of pre-crash driving instability in contributing to crash intensity using naturalistic driving data. Accident Analysis & Prevention 2019. [CrossRef]
  81. Bolte, J.A.; Bar, A.; Lipinski, D. Towards corner case detection for autonomous driving. In Proceedings of the IEEE Intelligent Vehicles Symposium, 2019.
  82. Muslim, H.; Endo, S.; Imanaga, H. Cut-out scenario generation with reasonability foreseeable parameter range from real highway dataset for autonomous vehicle assessment. IEEE Access 2023. [CrossRef]
  83. Huang, P.; Ding, W.; Francis, J. CaDRE: Controllable and Diverse Generation of Safety-Critical Driving Scenarios using Real-World Trajectories. arXiv preprint arXiv:2401.XXXX 2024. [CrossRef]
  84. Zhou, R.; Lin, Z.; Zhang, G.; Huang, H.; Zhou, H.; Chen, J. Evaluating autonomous vehicle safety performance through analysis of pre-crash trajectories of powered two-wheelers. IEEE Transactions on Intelligent Transportation Systems 2024, 25, 13560–13572. [CrossRef]
  85. Zhou, R.; Gui, W.; Huang, H.; Liu, X.; Wei, Z.; Bian, J. DiffCrash: Leveraging Denoising Diffusion Probabilistic Models to Expand High-Risk Testing Scenarios Using In-Depth Crash Data. Expert Systems with Applications 2025, p. 128140. [CrossRef]
  86. Zhang, G.; Huang, H.; Zhou, R.; Li, S.; Bian, J. High-Risk Trajectories Generation for Safety Testing of Autonomous Vehicles Based on In-Depth Crash Data. IEEE Transactions on Intelligent Transportation Systems 2025. [CrossRef]
  87. Oliveira, B.B.; Carravilla, M.A.; Oliveira, J.F. A diversity-based genetic algorithm for scenario generation. European journal of operational research 2022, 299, 1128–1141. [CrossRef]
  88. Batsch, F.; Kanarachos, S.; Cheah, M.; Ponticelli, R.; Blundell, M. A taxonomy of validation strategies to ensure the safe operation of highly automated vehicles. Journal of Intelligent Transportation Systems 2020, 26, 14–33. [CrossRef]
  89. Chu, Q.; Yue, Y.; Yao, D.; Pei, H. DiCriTest: Testing Scenario Generation for Decision-Making Agents Considering Diversity and Criticality. arXiv preprint arXiv:2508.11514 2025. [CrossRef]
  90. Ding, W.; Xu, C.; Arief, M.; Lin, H.; Li, B.; Zhao, D. A survey on safety-critical driving scenario generation—a methodological perspective. IEEE Transactions on Intelligent Transportation Systems 2023, 24, 6971–6988. [CrossRef]
  91. Zhou, R.; Huang, H.; Zhang, G.; Zhou, H.; Bian, J. Crash-Based Safety Testing of Autonomous Vehicles: Insights From Generating Safety-Critical Scenarios Based on In-Depth Crash Data. IEEE Transactions on Intelligent Transportation Systems 2025. [CrossRef]
  92. Zhou, R.; Lin, Z.; Huang, X.; Peng, J.; Huang, H. Testing scenarios construction for connected and automated vehicles based on dynamic trajectory clustering method. In Proceedings of the 2022 IEEE 25th International Conference on Intelligent Transportation Systems (ITSC). IEEE, 2022, pp. 3304–3308.
  93. Li, S.; Zhou, R.; Huang, H. Multidimensional Evaluation of Autonomous Driving Test Scenarios Based on AHP-EWN-TOPSIS Models. Automotive Innovation 2025, pp. 1–15. [CrossRef]
  94. Wei, Z.; Huang, H.; Zhang, G.; Zhou, R.; Luo, X.; Li, S.; Zhou, H. Interactive critical scenario generation for autonomous vehicles testing based on in-depth crash data using reinforcement learning. IEEE Transactions on Intelligent Vehicles 2024. [CrossRef]
  95. Luo, X.; Wei, Z.; Zhang, G.; Huang, H.; Zhou, R. High-risk powered two-wheelers scenarios generation for autonomous vehicle testing using WGAN. Traffic Injury Prevention 2025, 26, 243–251. [CrossRef]
  96. Wei, Z.; Zhou, H.; Zhou, R. Risk and Complexity Assessment of Autonomous Vehicle Testing Scenarios. Applied Sciences 2024, 14, 9866. [CrossRef]
  97. Wei, Z.; Bian, J.; Huang, H.; Zhou, R.; Zhou, H. Generating risky and realistic scenarios for autonomous vehicle tests involving powered two-wheelers: A novel reinforcement learning framework. Accident Analysis & Prevention 2025, 218, 108038. [CrossRef]
  98. Tang, S.; Zhang, Z.; Zhang, Y.; Zhou, J.; ling Guo, Y.; Liu, S.; Guo, S.; Li, Y.; Ma, L.; Xue, Y.; et al. A Survey on Automated Driving System Testing: Landscapes and Trends. ACM Transactions on Software Engineering and Methodology 2022, 32, 1–62. [CrossRef]
Figure 1. Public attitudes toward self-driving vehicles in 2024. Source: AAA Newsroom, “Fear of Self-Driving Cars Persists,” March 28, 2024.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.