Submitted: 12 July 2025
Posted: 14 July 2025
Abstract
Keywords:
1. Introduction

2. Reinforcement Learning in Autonomous Driving
2.1. Overview of Reinforcement Learning
2.2. CARLA Simulator as an Evaluation Platform
- Management and control of both vehicles and pedestrians.
- Environment customization (e.g., weather, lighting, time of day).
- Integration of diverse sensors such as LiDAR, radar, GPS, and RGB cameras.
- Python/C++ APIs for interaction and simulation control.
2.3. Related Work on Autonomous Driving
3. Review of Reinforcement Learning Approaches
4. Gaps and Limitations in RL-Based Methods
- Limited Generalization to Real-World Scenarios — Some RL-based models exhibit strong performance within narrowly defined simulation environments but fail to generalize to the complexities of real-world driving. For instance, the agent described in Reinforcement Learning-Based Autonomous Driving at Intersections in CARLA Simulator [8] was trained under constrained conditions using a binary action set (stop or proceed). Although suitable for intersection navigation in simulation, this minimalist approach lacks the flexibility and robustness required for dynamic urban environments characterized by unpredictable agent interactions, diverse traffic rules, and complex road layouts.
- Pipeline Complexity and Practical Deployment Issues — Advanced RL systems often involve highly layered and interdependent components, which, while enhancing learning performance in simulation, may hinder real-time applicability. The architecture proposed in Safe Navigation: Training Autonomous Vehicles using Deep Reinforcement Learning in CARLA [11] integrates multiple sub-models for driving and braking decisions. Despite its efficacy within the training context, the increased architectural complexity poses challenges in deployment scenarios, such as increased inference latency, reduced system interpretability, and difficulties in modular updates or extensions.
- Reward Function Design and Safety Trade-offs — Crafting a reward function that effectively balances task completion with safe, rule-compliant behavior remains an ongoing challenge. Poorly calibrated rewards may inadvertently incentivize agents to exploit loopholes or adopt high-risk behaviors that maximize rewards at the cost of safety. The reward formulation presented in Deep Reinforcement Learning Based Control for Autonomous Vehicles in CARLA [7] mitigates this risk by incorporating lane deviation, velocity alignment, and collision penalties. Nevertheless, even well-intentioned designs can result in unintended behaviors if the agent learns to over-prioritize specific features, highlighting the need for reward tuning and safety regularization.

Several works have specifically tackled these limitations by employing advanced RL strategies. Yang et al. (2021) developed uncertainty-aware collision avoidance techniques to enhance safety in autonomous vehicles [85]. Fang et al. (2022) proposed hierarchical reinforcement learning frameworks addressing the complexity of urban driving environments [86]. Cui et al. (2023) further advanced curriculum reinforcement learning methods to tackle complex and dynamic driving scenarios effectively [84]. Feng et al. (2025) utilized domain randomization strategies to enhance the generalization of RL policies, significantly improving performance across varying simulated environments [90].
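The reward-design trade-off discussed in this section can be made concrete with a small sketch. The function below combines the three ingredients named above (lane deviation, velocity alignment, and a collision penalty) into one scalar reward. It is an illustration only, not the formulation used in [7]: the weights, the linear shaping, the 2 m offset limit, and all names (`RewardWeights`, `shaped_reward`) are assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass
class RewardWeights:
    # Hypothetical coefficients chosen for illustration; not taken from [7].
    lane: float = 1.0
    speed: float = 1.0
    collision: float = 100.0


def shaped_reward(lateral_offset_m: float,
                  speed_kmh: float,
                  target_speed_kmh: float,
                  collided: bool,
                  max_offset_m: float = 2.0,
                  w: RewardWeights = RewardWeights()) -> float:
    """Combine lane-keeping, velocity-alignment, and collision terms."""
    # Lane term: 1 at the lane centre, falling linearly to 0 at max_offset_m.
    lane_term = max(0.0, 1.0 - abs(lateral_offset_m) / max_offset_m)
    # Speed term: 1 at the target speed, falling linearly to 0 at standstill
    # or at twice the target speed.
    speed_term = max(0.0, 1.0 - abs(speed_kmh - target_speed_kmh) / target_speed_kmh)
    # Collision term: a fixed penalty that dominates the other two terms.
    penalty = w.collision if collided else 0.0
    return w.lane * lane_term + w.speed * speed_term - penalty
```

Because the collision weight is much larger than the bounded lane and speed terms, an agent cannot offset a crash by driving fast or staying centred. If that weight were lowered toward the magnitude of the other terms, the agent could begin trading safety for progress, which is the failure mode described above.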
5. Imitation Learning (IL) for Autonomous Driving
5.1. Core Concepts
- Agent — The learner that interacts with the environment. Unlike in reinforcement learning, the agent does not rely on scalar rewards but instead learns to imitate the expert’s demonstrated behavior.
- Expert — Typically a human operator or a pre-trained model that provides high-quality demonstration trajectories. These trajectories serve as the ground truth for training.
- Model-based vs. Model-free — Model-based approaches attempt to build a transition model of the environment and use it for planning. Model-free techniques rely solely on observed expert trajectories without reconstructing the environment’s internal dynamics.
- Policy-based vs. Reward-based — Policy-based methods directly learn a mapping from states to actions, while reward-based methods (such as inverse reinforcement learning) infer the expert’s reward function before deriving a policy.
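As a minimal sketch of the policy-based idea above, behavioral cloning can be reduced to supervised regression from expert states to expert actions. The example uses a one-dimensional toy problem; the expert rule, the demonstration states, and the function names are hypothetical and serve only to show the data flow from expert to agent.

```python
def expert_policy(lateral_offset: float) -> float:
    """Hypothetical expert: steer against the lateral offset, clipped to [-1, 1]."""
    return max(-1.0, min(1.0, -0.5 * lateral_offset))


def fit_linear_policy(states, actions) -> float:
    """Least-squares fit of action = gain * state (no bias term)."""
    numerator = sum(s * a for s, a in zip(states, actions))
    denominator = sum(s * s for s in states)
    return numerator / denominator


# Expert demonstrations: each state is paired with the expert's action.
demo_states = [-1.0, -0.5, 0.5, 1.0]
demo_actions = [expert_policy(s) for s in demo_states]

# The agent never sees a reward; it only regresses onto the expert's actions.
gain = fit_linear_policy(demo_states, demo_actions)


def cloned_policy(lateral_offset: float) -> float:
    """Learned policy: a direct mapping from state to action."""
    return gain * lateral_offset
```

A reward-based method such as inverse reinforcement learning would instead use the same demonstrations to estimate a reward function and then derive a policy from it.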
5.2. Reviewed IL-Based Architectures
5.3. IL in the Context of This Paper - Hybrid Solutions
6. Comparative Analysis and Research Implications
7. Discussion
8. Conclusions
Author Contributions
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
| RL | Reinforcement Learning |
| IL | Imitation Learning |
| CARLA | Car Learning to Act (Autonomous Driving Simulator) |
| CNN | Convolutional Neural Network |
| BEV | Bird’s Eye View |
| PGV | Polar Grid View |
| IRL | Inverse Reinforcement Learning |
| GAIL | Generative Adversarial Imitation Learning |
| MDP | Markov Decision Process |
| PPO | Proximal Policy Optimization |
| DQN | Deep Q-Network |
| A3C | Asynchronous Advantage Actor-Critic |
| LiDAR | Light Detection and Ranging |
| RGB | Red-Green-Blue (Color Image Input) |
| BC | Behavioral Cloning |
| DAgger | Dataset Aggregation |
| PID | Proportional-Integral-Derivative (Controller) |
| PGM | Probabilistic Graphical Model |
References
- Dhinakaran, M., Rajasekaran, R.T., Balaji, V., Aarthi, V., Ambika, S. (2024). Advanced Deep Reinforcement Learning Strategies for Enhanced Autonomous Vehicle Navigation Systems.
- Govinda, S., Brik, B., Harous, S. (2025). A Survey on Deep Reinforcement Learning Applications in Autonomous Systems: Applications, Open Challenges, and Future Directions.
- Kong, Q., Zhang, L., Xu, X. (2021). Constrained Policy Optimization Algorithm for Autonomous Driving via Reinforcement Learning.
- Kim, S., Kim, G., Kim, T., Jeong, C., Kang, C.M. (2025). Autonomous Vehicle Control Using CARLA Simulator, ROS, and EPS HILS.
- Malik, S., Khan, M.A., El-Sayed, H. (2022). CARLA: Car Learning to Act – An Inside Out.
- Razak, A. I. (2022). Implementing a Deep Reinforcement Learning Model for Autonomous Driving.
- Pérez-Gil, Ó., Barea, R., López-Guillén, E., Bergasa, L. M., Gómez-Huélamo, C., Gutiérrez, R., Díaz-Díaz, A. (2022). Deep Reinforcement Learning Based Control for Autonomous Vehicles in CARLA.
- Gutiérrez-Moreno, R., Barea, R., López-Guillén, E., Araluce, J., Bergasa, L. M. (2022). Reinforcement Learning-Based Autonomous Driving at Intersections in CARLA Simulator.
- Dosovitskiy, A., Ros, G., Codevilla, F., López, A., Koltun, V. (2017). CARLA: An Open Urban Driving Simulator.
- Li, Q., Jia, X., Wang, S., Yan, J. (2024). Think2Drive: Efficient Reinforcement Learning by Thinking with Latent World Model for Autonomous Driving (in CARLA-V2).
- Nehme, G., Deo, T. Y. (2023). Safe Navigation: Training Autonomous Vehicles using Deep Reinforcement Learning in CARLA.
- Codevilla, F., Müller, M., López, A., Koltun, V., Dosovitskiy, A. (2017). End-to-end Driving via Conditional Imitation Learning.
- Chen, D., Zhou, B., Koltun, V., Krähenbühl, P. (2019). Learning by Cheating.
- Eraqi, H. M., Moustafa, M. N., Honer, J. (2022). Dynamic Conditional Imitation Learning for Autonomous Driving.
- Abdou, M., Kamai, H., El-Tantawy, S., Abdelkhalek, A., Adei, O., Hamdy, K., Abaas, M. (2019). End-to-End Deep Conditional Imitation Learning for Autonomous Driving.
- Li, Z. (2021). A Hierarchical Autonomous Driving Framework Combining Reinforcement Learning and Imitation Learning.
- Arulkumaran, K., Deisenroth, M. P., Brundage, M., Bharath, A. A. (2017). Deep Reinforcement Learning: A Brief Survey.
- Shrestha, A., Mahmood, A. (2019). Review of Deep Learning Algorithms and Architectures.
- Elavarasan, D., Vincent, P.M.D. (2020). Crop Yield Prediction Using Deep Reinforcement Learning Model for Sustainable Agrarian Applications.
- Zhou, Z., Chen, X., Li, E., Zeng, L., Lue, K., Zhang, J. (2019). Edge Intelligence: Paving the Last Mile of Artificial Intelligence With Edge Computing.
- Sutton, R.S., Barto, A.G. (1998). Reinforcement Learning: An Introduction.
- Lapan, M. (2022) Głębokie uczenie przez wzmacnianie. Praca z chatbotami oraz robotyka, optymalizacja dyskretna i automatyzacja sieciowa w praktyce.
- Cui, J., Liu, Y., Arumugam, N. (2019). Multi-Agent Reinforcement Learning-Based Resource Allocation for UAV Networks.
- Shaukat, K., Luo, S., Varadharajan, V., Hameed, I., Xu, M. (2020). A Survey on Machine Learning Techniques for Cyber Security in the Last Decade.
- Ye, H., Li, G.Y., Juang, B.F. (2019). Deep Reinforcement Learning Based Resource Allocation for V2V Communications.
- Le, L., Nguyen, T.N. (2022). DQRA: Deep Quantum Routing Agent for Entanglement Routing in Quantum Networks.
- Schölkopf, B., Locatello, F., Bauer, S., Ke, N.R., Kalchbrenner, N., Goyal, A., Bengio, Y. (2021). Toward Causal Representation Learning.
- Huang, C., Zhang, H., Wang, L., Luo, X., Song, Y. (2022). Mixed Deep Reinforcement Learning Considering Discrete-continuous Hybrid Action Space for Smart Home Energy Management.
- Sogabe, T., Malla, D.B., Takayama, S., Shin, S., Sakamoto, K., Yamaguchi, K., Singh, T.P., Sogabe, M., Hirata, T., Okada, Y. (2018). Smart Grid Optimization by Deep Reinforcement Learning over Discrete and Continuous Action Space.
- Guériau, M., Cardozo, N., Dusparic, I. (2019). Constructivist Approach to State Space Adaptation in Reinforcement Learning.
- Abdulazeez, D.H., Askar, S.K. (2023). Offloading Mechanisms Based on Reinforcement Learning and Deep Learning Algorithms in the Fog Computing Environment.
- Mahadevkar, S.V., Khemani, B., Patil, S., Kotecha, K., Vora, D.R., Abraham, A., Gabralla, L.A. (2022). A Review on Machine Learning Styles in Computer Vision—Techniques and Future Directions.
- Shukla, I., Dozier, H.R., Henslee, A.C. (2022). A Study of Model Based and Model Free Offline Reinforcement Learning.
- Hyang, Q. (2020). Model-Based or Model-Free, a Review of Approaches in Reinforcement Learning.
- Beyon, H. (2023). Advances in Value-based, Policy-based, and Deep Learning-based Reinforcement Learning.
- Liu, M., Wan, Y., Lewis, F.L., Lopez, V.G. (2020). Adaptive Optimal Control for Stochastic Multiplayer Differential Games Using On-Policy and Off-Policy Reinforcement Learning.
- Banerjee, C., Chen, Z., Noman, N., Lopez, V.G. (2022). Improved Soft Actor-Critic: Mixing Prioritized Off-Policy Samples With On-Policy Experiences.
- Nikpour, B., Sinodinos, D., Armanfard, N. (2022). Deep Reinforcement Learning in Human Activity Recognition: A Survey.
- Kim, J., Kim, G., Hong, S., Cho, S. (2024). Advancing Multi-Agent Systems Integrating Federated Learning with Deep Reinforcement Learning: A Survey.
- Hofbauer, M., Kuhn, C., Petrovic, G., Steinbach, E. (2020). TELECARLA: An Open Source Extension of the CARLA Simulator for Teleoperated Driving Research Using Off-the-Shelf Components.
- Sakhai, M.; Wielgosz, M. Towards End-to-End Escape in Urban Autonomous Driving Using Reinforcement Learning. In: Arai, K. (Ed.) Intelligent Systems and Applications. IntelliSys 2023. Lecture Notes in Networks and Systems, vol 823. Springer, Cham, 2024. [CrossRef]
- Kołomański, M.; Sakhai, M.; Nowak, J.; Wielgosz, M. Towards End-to-End Chase in Urban Autonomous Driving Using Reinforcement Learning. In: Arai, K. (Ed.) Intelligent Systems and Applications. IntelliSys 2022. Lecture Notes in Networks and Systems, vol 544. Springer, Cham, 2023. [CrossRef]
- Sakhai, M.; Mazurek, S.; Caputa, J.; Argasiński, J.K.; Wielgosz, M. Spiking Neural Networks for Real-Time Pedestrian Street-Crossing Detection Using Dynamic Vision Sensors in Simulated Adverse Weather Conditions. Electronics 2024, 13, 4280. [CrossRef]
- Yang, Z., Jia, X., Li, Q., Yang, X., Yao, M., Yan, J. (2025). Raw2Drive: Reinforcement Learning with Aligned World Models for End-to-End Autonomous Driving (in CARLA v2).
- Uppuluri, B., Patel, A., Mehta, N., Kamath, S., Chakraborty, P. (2025). CuRLA: Curriculum Learning Based Deep Reinforcement Learning For Autonomous Driving.
- Surmann, H., de Heuvel, J., Bennewitz, M. (2025). Multi-Objective Reinforcement Learning for Adaptive Personalized Autonomous Driving.
- Bertsekas, D. P. (2024). Model Predictive Control and Reinforcement Learning: A Unified Framework Based on Dynamic Programming.
- Vu, T. M., Moezzi, R., Cyrus, J., Hlava, J. (2021). Model Predictive Control for Autonomous Driving Vehicles.
- Liang, X., Wang, T., Yang, L., & Xing, E. (2018). Cirl: Controllable imitative reinforcement learning for vision-based self-driving. In Proceedings of the European conference on computer vision (ECCV) (pp. 584-599).
- Chekroun, R., Toromanoff, M., Hornauer, S., & Moutarde, F. (2023). Gri: General reinforced imitation and its application to vision-based autonomous driving. Robotics, 12(5), 127.
- Phan-Minh, T., Howington, F., Chu, T. S., Lee, S. U., Tomov, M. S., Li, N., ... & Wolff, E. M. (2022). Driving in real life with inverse reinforcement learning. arXiv preprint arXiv:2206.03004.
- Ho, J., & Ermon, S. (2016). Generative adversarial imitation learning. Advances in neural information processing systems, 29.
- Han, Y., & Yilmaz, A. (2022, August). Learning to drive using sparse imitation reinforcement learning. In 2022 26th International Conference on Pattern Recognition (ICPR) (pp. 3736-3742). IEEE.
- Reddy, S., Dragan, A. D., & Levine, S. (2019). SQIL: Imitation learning via reinforcement learning with sparse rewards. arXiv preprint arXiv:1905.11108.
- Kiran, B. R., Sobh, I., Talpaert, V., Mannion, P., Al Sallab, A. A., Yogamani, S., & Pérez, P. (2021). Deep reinforcement learning for autonomous driving: A survey. IEEE transactions on intelligent transportation systems, 23(6), 4909-4926.
- Zhu, Z., & Zhao, H. (2021). A survey of deep RL and IL for autonomous driving policy learning. IEEE Transactions on Intelligent Transportation Systems, 23(9), 14043-14065.
- Dosovitskiy, A., Ros, G., Codevilla, F., López, A., & Koltun, V. (2017). CARLA: An open urban driving simulator. arXiv preprint arXiv:1711.03938.
- Codevilla, F., Müller, M., López, A., Koltun, V., & Dosovitskiy, A. (2017). End-to-end driving via conditional imitation learning. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA) (pp. 4693–4700). Singapore, Singapore.
- Pérez-Gil, Ó., Barea, R., López-Guillén, E., Bergasa, L. M., Gómez-Huélamo, C., Gutiérrez, R., & Díaz-Díaz, A. (2022). Deep reinforcement learning based control for autonomous vehicles in CARLA. Electronics, 11(7), 1035. [CrossRef]
- Nehme, G., & Deo, T. Y. (2023). Safe navigation: Training autonomous vehicles using deep reinforcement learning in CARLA. Sensors, 23(18), 7611. [CrossRef]
- Li, Q., Jia, X., Wang, S., & Yan, J. (2024). Think2Drive: Efficient reinforcement learning by thinking with latent world model for autonomous driving (in CARLA-V2). IEEE Transactions on Intelligent Vehicles, Early Access. [CrossRef]
- Sakhai, M.; Sithu, K.; Soe Oke, M.K.; Wielgosz, M. Cyberattack Resilience of Autonomous Vehicle Sensor Systems: Evaluating RGB vs. Dynamic Vision Sensors in CARLA. Applied Sciences 2025, 15(13), 7493. [CrossRef]
- Aranceta-Bartrina, Javier. 1999a. Title of the cited article. Journal Title 6: 100–10.
- Aranceta-Bartrina, Javier. 1999b. Title of the chapter. In Book Title, 2nd ed. Edited by Editor 1 and Editor 2. Publication place: Publisher, vol. 3, pp. 54–96.
- Baranwal, Ajay K., and Costea Munteanu. 1955. Book Title. Publication place: Publisher, pp. 154–96. First published 1921 (op-tional).
- Berry, Evan, and Amy M. Smith. 1999. Title of Thesis. Level of Thesis, Degree-Granting University, City, Country. Identifi-cation information (if available).
- Cojocaru, Ludmila, Dragos Constatin Sanda, and Eun Kyeong Yun. 1999. Title of Unpublished Work. Journal Title, phrase indicating stage of publication.
- Driver, John P., Steffen Rohrs, and Sean Meighoo. 2000. Title of Presentation. In Title of the Collected Work (if available). Paper presented at Name of the Conference, Location of Conference, Date of Conference.
- Harwood, John. 2008. Title of the cited article. Available online: URL (accessed on Day Month Year).
- Azikiwe, H. and Bello, A. (2020a). Title of the cited article. Journal Title, Volume(Issue), Firstpage–Lastpage or Article Number.
- Azikiwe, H. and Bello, A. (2020b). Book title. Publisher Name.
- Davison, T. E. (2019). Title of the book chapter. In A. A. Editor (Ed.), Title of the book: Subtitle (pp. Firstpage–Lastpage). Publisher Name. (Original work published 1623) (Optional).
- Fistek, A., Jester, E., & Sonnenberg, K. (2017, Month Day). Title of contribution [Type of contribution]. Conference Name, Conference City, Conference Country.
- Hutcheson, V. H. (2012). Title of the thesis [XX Thesis, Name of Institution Awarding the Degree].
- Lippincott, T., & Poindexter, E. K. (2019). Title of the unpublished manuscript [Unpublished manuscript/Manuscript in prepara-tion/Manuscript submitted for publication]. Department Name, Institution Name.
- Toromanoff, M., Wirbel, E., Moutarde, F. (2020). End-to-end Model-free Reinforcement Learning for Urban Driving Using Implicit Affordances. arXiv preprint arXiv:2001.09445.
- Wang, Y., Chitta, K., Liu, H., Chernova, S., Schmid, C. (2021). InterFuser: Safety-Enhanced Autonomous Driving Using Interpretable Sensor Fusion Transformer. arXiv preprint arXiv:2109.05499.
- Chen, Y., Li, H., Wang, Y., Tomizuka, M. (2021). Learning Safe Multi-Vehicle Cooperation with Policy Optimization in CARLA. IEEE Robotics and Automation Letters, 6(2), 3568-3575.
- Huang, Y., Xu, X., Yan, Y., Liu, Z. (2022). Transfer Reinforcement Learning for Autonomous Driving under Diverse Weather Conditions. IEEE Transactions on Intelligent Vehicles, 7(3), 593-603.
- Chen, J., Peng, Y., Wang, X. (2023). Reinforcement Learning-Based Motion Planning for Autonomous Vehicles at Unsignalized Intersections. Transportation Research Part C, 158, 104945.
- Zeng, R., Luo, J., Wang, J. (2024). Benchmarking Autonomous Driving Systems in Simulated Dynamic Traffic Environments. IEEE Transactions on Intelligent Transportation Systems, 25(1), 121-132.
- Liu, Y., Zhang, Q., Zhao, L. (2025). Multi-Agent Reinforcement Learning for Cooperative Autonomous Vehicles in CARLA. Journal of Intelligent Transportation Systems, 29(2), 198-212.
- Jia, Z., Yang, Y., Zhang, S. (2020). Towards Realistic End-to-End Autonomous Driving with Model-Based Reinforcement Learning. arXiv preprint arXiv:2006.06713.
- Cui, X., Yu, H., Zhao, J. (2023). Adaptive Curriculum Reinforcement Learning for Autonomous Driving in Complex Scenarios. IEEE Transactions on Vehicular Technology, 72(8), 9874-9886.
- Yang, Z., Liu, J., Wu, H. (2021). Safe Reinforcement Learning for Autonomous Vehicles with Uncertainty-Aware Collision Avoidance. IEEE Robotics and Automation Letters, 6(3), 6312-6319.
- Fang, Y., Yan, J., Luo, H. (2022). Hierarchical Reinforcement Learning Framework for Urban Autonomous Driving in CARLA. Robotics and Autonomous Systems, 158, 104212.
- Jiang, X., Zhao, H., Zeng, Y. (2025). Benchmarking Reinforcement Learning Algorithms in CARLA: Performance, Stability, and Robustness Analysis. Transportation Research Record, 2025(1), 247-258.
- Cheng, Y., Wu, J., Wang, Z. (2023). End-to-End Urban Autonomous Driving with Deep Reinforcement Learning and Curriculum Strategies. Applied Sciences, 13(9), 5432.
- Huang, X., Chen, H., Zhao, L. (2024). Hybrid Imitation and Reinforcement Learning for Safe Autonomous Driving in CARLA. IEEE Transactions on Intelligent Transportation Systems, Early Access.
- Feng, R., Xu, L., Luo, X. (2025). Generalization of Reinforcement Learning Policies in Autonomous Driving: A Domain Randomization Approach. IEEE Transactions on Vehicular Technology, Early Access.
- Li, Z., Zhang, S., Zhou, D. (2022). Behavioral Cloning and Reinforcement Learning for Autonomous Driving: A Comparative Study. IEEE Intelligent Transportation Systems Magazine, 14(4), 27-41.
- Luo, Y., Wang, Z., Zhang, X. (2023). Improving Imitation Learning for Autonomous Driving through Adaptive Data Augmentation. Sensors, 23(11), 4981.
- Mohanty, A., Lee, J., Patel, R. (2024). Inverse Reinforcement Learning for Human-Like Autonomous Driving Behavior in CARLA. IEEE Transactions on Human-Machine Systems, Early Access.
- Kim, J., Cho, S. (2025). Reinforcement and Imitation Learning Fusion for Autonomous Vehicle Safety Enhancement. IEEE Transactions on Intelligent Vehicles, Early Access.











| Method | Model usage | Based on | On-policy/Off-policy |
| Q-learning | Model-free | Values | Off-policy |
| REINFORCE | Model-free | Policy | On-policy |
| Actor-Critic | Model-free | Hybrid (Values + Policy) | On-policy |
| Proximal Policy Optimization (PPO) | Model-free | Policy | On-policy |
| Deep Q-Network (DQN) | Model-free | Values (approximated by a neural network) | Off-policy |
| Model Predictive Control (MPC) | Model-based | Dynamics model + Policy Optimization | Typically off-policy |
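To illustrate what "value-based" and "off-policy" mean for the Q-learning row of the table above, the following sketch performs a single tabular Q-learning update. The state names, the two-action set, and the hyperparameters `alpha` and `gamma` are hypothetical placeholders and do not come from any of the reviewed papers.

```python
from collections import defaultdict


def q_learning_update(q, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.99):
    """Apply one off-policy temporal-difference update to a tabular Q-function."""
    # Off-policy target: use the greedy value of the next state, regardless of
    # which action the behaviour policy actually takes there.
    best_next = max(q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


ACTIONS = ("stop", "drive")       # hypothetical two-action set
q_table = defaultdict(float)      # unseen (state, action) pairs default to 0.0
q_table[("near", "drive")] = 1.0  # pretend this value was learned earlier

new_value = q_learning_update(q_table, "far", "drive",
                              reward=0.5, next_state="near", actions=ACTIONS)
```

An on-policy method such as SARSA would replace the `max` with the value of the action the agent actually selects in `next_state`. That substitution is the distinction recorded in the last column of the table.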
| Paper[Agent] | State space | Action space |
| Implementing a deep reinforcement learning model for autonomous driving[6] [PPO agent] | ||
| Deep reinforcement learning based control for Autonomous Vehicles in CARLA[7][DRL-flatten-image agent] | ||
| Deep reinforcement learning based control for Autonomous Vehicles in CARLA[7][DRL-Carla-Waypoints agent] | ||
| Deep reinforcement learning based control for Autonomous Vehicles in CARLA[7][DRL-CNN agent] | ||
| Deep reinforcement learning based control for Autonomous Vehicles in CARLA[7][DRL-Pre-CNN agent] | ||
| Reinforcement Learning-Based Autonomous Driving at Intersections in CARLA Simulator[8] [RL agent] | distance from intersection, speed | stop [speed = 0 m/s], drive [speed = 5 m/s] |
| CARLA: An Open Urban Driving Simulator[9] [A3C agent] | , speed, distance to goal, damage from collisions | steer[-1, 1], throttle[0, 1], brake[0, 1] |
| Think2Drive: Efficient Reinforcement Learning by Thinking with Latent World Model for Autonomous Driving[10] [Think2Drive agent] | , speed | throttle[0, 1], brake[0, 1], steer[-1, 1] |
| Safe Navigation: Training Autonomous Vehicles using Deep Reinforcement Learning in CARLA[11] [Combined DQN agent] | distance, distance from obstacle, [degrees], speed, traffic light | brake[throttle=0.0, brake=1.0, steer=0.0], drive straight[throttle=0.3, brake=0.0, steer=0.0], turn left[throttle=0.1, brake=0.0, steer=-0.6], turn right[throttle=0.1, brake=0.0, steer=0.6], turn slightly left[throttle=0.4, brake=0.0, steer=-0.1], turn slightly right[throttle=0.4, brake=0.0, steer=0.1] |
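The discrete action spaces in the table above can be written down as plain lookup tables. The sketch below encodes the six actions listed for the combined DQN agent [11] and the binary action set of the intersection agent [8], using the values given in the table. The type `VehicleCommand`, the helper `decode_action`, and the index order are assumptions made for illustration; the papers may order their actions differently.

```python
from typing import NamedTuple


class VehicleCommand(NamedTuple):
    throttle: float  # range [0, 1]
    brake: float     # range [0, 1]
    steer: float     # range [-1, 1]


# Six discrete actions as listed for the combined DQN agent [11].
DQN_ACTIONS = {
    "brake": VehicleCommand(0.0, 1.0, 0.0),
    "drive straight": VehicleCommand(0.3, 0.0, 0.0),
    "turn left": VehicleCommand(0.1, 0.0, -0.6),
    "turn right": VehicleCommand(0.1, 0.0, 0.6),
    "turn slightly left": VehicleCommand(0.4, 0.0, -0.1),
    "turn slightly right": VehicleCommand(0.4, 0.0, 0.1),
}

# Binary action set of the intersection agent [8], as target speeds in m/s.
INTERSECTION_ACTIONS = {"stop": 0.0, "drive": 5.0}


def decode_action(index: int):
    """Map a discrete action index (e.g. the argmax over Q-values) to a command.

    The index order follows the dictionary above and is an assumption.
    """
    name = list(DQN_ACTIONS)[index]
    return name, DQN_ACTIONS[name]
```

A value-based agent outputs one Q-value per entry and executes the command of the highest-scoring entry. The agents with continuous action spaces in the same table emit steer, throttle, and brake values directly within the listed ranges.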
| Paper[Agent] | Reward |
| Implementing a deep reinforcement learning model for autonomous driving[6][There was only one agent presented] | |
| Deep reinforcement learning based control for Autonomous Vehicles in CARLA[7][DRL-flatten-image agent, DRL-Carla-Waypoints agent, DRL-CNN agent, DRL-Pre-CNN agent] | |
| Reinforcement Learning-Based Autonomous Driving at Intersections in CARLA Simulator[8][There was only one agent presented] | |
| CARLA: An Open Urban Driving Simulator[9][There was only one agent presented] | |
| Think2Drive: Efficient RL by Thinking with Latent World Model[10][There was only one agent presented] | |
| Safe Navigation: Training Autonomous Vehicles using DRL in CARLA[11][One agent combined two models] | |
| Value | Description |
| v | Current velocity, normalized in the range [0, 20] km/h |
| | Minimum acceptable velocity |
| | Target velocity, set to 20 km/h |
| | Normalization factor for lateral lane offset, defined as m |
| | Component of reward penalizing angular deviation from road heading |
| | Current angular deviation |
| | Maximum tolerated deviation; values above this nullify |
| Agent | Architecture Illustration |
| DRL-flatten-image agent | (figure not available) |
| DRL-Carla-Waypoints agent | (figure not available) |
| DRL-CNN agent | (figure not available) |
| DRL-Pre-CNN agent | (figure not available) |
| Value | Description |
| | Distance remaining to the goal at time step t |
| | Vehicle velocity at time t |
| | Collision damage accumulated at time t |
| | Boolean indicating whether the vehicle is on the sidewalk at time t (0 or 1) |
| | Boolean indicating whether the vehicle is in the opposite lane at time t (0 or 1) |
| Value | Description |
| | Reward term proportional to the vehicle’s instantaneous velocity |
| | Incentivizes forward progress based on cumulative distance traveled |
| | Penalizes deviation from the centerline of the driving lane (normalized by ) |
| | Penalizes abrupt changes in steering between consecutive steps |
| | Scaling coefficients for , , and respectively; values are task-specific and not reported in the article |
| Value | Description |
| | Reward assigned based on the output of a predefined suboptimal policy |
| d | Lateral deviation from the road centerline |
| v | Vehicle velocity |
| a | Braking model action: 0 = brake, 1 = drive |
| | Angular deviation between vehicle heading and road direction |
| | Distance to nearest obstacle from depth image |
| | Binary flag indicating whether a collision has occurred |
| Method | Model usage | Based on |
| Behavioral Cloning (BC) | Model-free | Policy |
| DAgger (Dataset Aggregation) | Model-free | Policy |
| Inverse Reinforcement Learning (IRL) | Indirect (Learns Reward) | Reward |
| Generative Adversarial Imitation Learning (GAIL) | Model-free | Policy |
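The difference between Behavioral Cloning and DAgger in the table above lies in where the training states come from. The toy sketch below shows the DAgger loop under strong simplifying assumptions: a one-dimensional state, a hypothetical clipped expert, hypothetical dynamics in which the next state equals the current state plus the action, and a linear learner. After the initial expert rollout, only the learner drives while the expert supplies the labels.

```python
def expert_action(x: float) -> float:
    """Hypothetical expert: proportional correction clipped to [-0.3, 0.3]."""
    return max(-0.3, min(0.3, -0.5 * x))


def fit_gain(dataset) -> float:
    """Least-squares fit of action = gain * state over (state, action) pairs."""
    numerator = sum(s * a for s, a in dataset)
    denominator = sum(s * s for s, _ in dataset)
    return numerator / denominator if denominator else 0.0


def rollout_states(gain: float, x0: float, steps: int):
    """States visited when the learner's linear policy controls the toy system."""
    states, x = [], x0
    for _ in range(steps):
        states.append(x)
        x = x + gain * x  # toy dynamics: next state = state + action
    return states


def dagger(x0: float = 2.0, iterations: int = 3, steps: int = 5):
    # Iteration 0: the expert drives, as in plain behavioral cloning.
    dataset, x = [], x0
    for _ in range(steps):
        a = expert_action(x)
        dataset.append((x, a))
        x = x + a
    gain = fit_gain(dataset)
    for _ in range(iterations):
        # The learner drives; the expert relabels the states the learner visits.
        visited = rollout_states(gain, x0, steps)
        dataset.extend((s, expert_action(s)) for s in visited)
        # Retrain on the aggregated dataset.
        gain = fit_gain(dataset)
    return gain, dataset


final_gain, aggregated = dagger()
```

Aggregating expert labels on learner-visited states is what separates DAgger from Behavioral Cloning, which trains only on the states the expert itself visited.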
| Paper[Agent] | State space | Action space |
| End-to-end Driving via Conditional Imitation Learning[12] [Simulation and physical agent; identical state/action spaces] | ||
| Learning by Cheating[13] [Privileged agent] | ||
| Learning by Cheating[13] [Sensorimotor agent] | ||
| Dynamic Conditional Imitation Learning for Autonomous Driving[14] [Single-agent architecture] |
| Article | Goal | Architecture Complexity | Environment Complexity | Scenarios | Advantages |
| Implementing a Deep Reinforcement Learning Model for Autonomous Driving | End-to-end RL agent with Variational Autoencoder (VAE) | 2 | 3 | One scenario in three towns (1, 2, and 7) | An extensive and complex approach to the subject |
| Deep Reinforcement Learning Based Control for Autonomous Vehicles in CARLA | Comparison of DQN and DDPG across several agent models | 1 | 1 | Two scenarios in town 1 | Multiple, varied approaches to the agent’s architecture |
| Reinforcement Learning-Based Autonomous Driving at Intersections in CARLA Simulator | Complete agent capable of driving through intersections with traffic | 2 | 3 | Three scenarios, one for each type of intersection (traffic lights, stop sign, uncontrolled) | Complex approach to crossing intersections with traffic |
| CARLA: An Open Urban Driving Simulator | Comparison of performance of RL, IL and Modular Pipeline | 1 | 3 | Five scenarios in four settings (training conditions, new town, new weather, new weather and town) | A good comparison of RL with other machine learning methods with a multi-sensor agent |
| Think2Drive: Efficient Reinforcement Learning by Thinking with Latent World Model for Autonomous Driving (in CARLA-V2) | RL agent able to drive in multiple corner cases | 3 | 3 | 39 detailed scenarios | A highly complex approach to an agent capable of driving multiple corner case scenarios |
| Safe Navigation: Training Autonomous Vehicles using Deep Reinforcement Learning in CARLA | RL agent capable of maintaining speed and braking when necessary to avoid collision | 2 | 2 | Four scenarios in town 2 | A complex and robust approach to an agent composed of two models (braking model and driving model) |
| Raw2Drive: Reinforcement Learning with Aligned World Models for End-to-End Autonomous Driving (in CARLA v2) | Model-based RL agent capable of learning effective driving from raw sensor data | 3 | 3 | 220 routes from the Bench2Drive benchmark | Recent, highly complex and robust dual-stream model-based RL approach with one stream of privileged sensor data |
| CuRLA: Curriculum Learning Based Deep Reinforcement Learning For Autonomous Driving | PPO+VAE agent for driving in an environment with increasing traffic | 2 | 2 | A few routes in town 7 with changing traffic | A modern approach combining Curriculum Learning with Deep Reinforcement Learning |
| Multi-Objective Reinforcement Learning for Adaptive Personalized Autonomous Driving | Multi-objective RL agent capable of driving according to preferences | 2 | 1 | One scenario | A modern approach to autonomous driving with a multi-objective (selected driving preferences) end-to-end agent |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).



