Submitted: 08 January 2025
Posted: 08 January 2025
Abstract
Deep reinforcement learning (DRL) trains agents to make decisions through trial and error, learning from rewards and penalties. It combines reinforcement learning with deep neural networks, enabling agents to process large amounts of data and learn in complex environments. DRL has achieved notable success in gaming, robotics, and other decision-making tasks. However, real-world applications such as self-driving cars remain challenging because their complex state and action spaces demand precise control. Researchers continue to develop new algorithms to improve performance in dynamic settings. A key algorithm, the Deep Q-Network (DQN), uses a neural network to approximate the Q-value function but tends to overestimate Q-values, which can lead to suboptimal policies. The Double Deep Q-Network (DDQN) was introduced to reduce this bias by separating action selection from action evaluation, resulting in more stable learning. This work examines the effectiveness of DQN and DDQN in autonomous driving using the CARLA simulator, highlighting DDQN's benefits in reducing overestimation bias and improving policy performance.
1. Introduction
2. Overview of Deep Reinforcement Learning in Autonomous Driving
2.1. Reinforcement Learning

2.2. Deep Reinforcement Learning
2.3. Deep Q-Networks (DQN)
2.4. Double Deep Q-Networks (DDQN)
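As the abstract notes, DDQN reduces overestimation by separating action selection from action evaluation. The difference is easiest to see in how the two methods compute the bootstrap target for a single transition. The following is a minimal sketch, not the authors' implementation: the function names and the plain-list representation of Q-values are assumptions made for illustration, and only the discount factor (0.99) is taken from the hyperparameter table.

```python
from typing import Sequence

GAMMA = 0.99  # discount factor, as listed in the hyperparameter table


def dqn_target(reward: float, done: bool,
               q_target_next: Sequence[float],
               gamma: float = GAMMA) -> float:
    """DQN target: the target network both selects and evaluates the action."""
    if done:
        return reward  # no bootstrapping from a terminal state
    return reward + gamma * max(q_target_next)


def ddqn_target(reward: float, done: bool,
                q_online_next: Sequence[float],
                q_target_next: Sequence[float],
                gamma: float = GAMMA) -> float:
    """DDQN target: the online network selects, the target network evaluates."""
    if done:
        return reward
    # Action selection uses the online network's estimates ...
    best_action = max(range(len(q_online_next)),
                      key=lambda a: q_online_next[a])
    # ... while the value of that action comes from the target network.
    return reward + gamma * q_target_next[best_action]


# Illustrative (made-up) Q-values for the next state.
q_online = [1.0, 2.0, 0.5]
q_target = [3.0, 1.5, 0.2]
y_dqn = dqn_target(5.0, False, q_target)
y_ddqn = ddqn_target(5.0, False, q_online, q_target)
```

With these made-up values the DQN target is 5 + 0.99 × 3.0 = 7.97, while the DDQN target is 5 + 0.99 × 1.5 = 6.485, because the online network prefers action 1 and the target network assigns that action a lower value than its own maximum.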
3. Benchmarking - Urban Driving Simulator

4. Related Work
5. Methodology
5.1. Proposed Model
5.2. Model Architecture and State Space

| Layer | Input Dimensions | Output Dimensions | Activation | Dropout | Notes |
|---|---|---|---|---|---|
| Convolutional 1 | 84x84x1 | 32x42x32 | ReLU | Yes (0.4) | 8x8 kernel, stride 4 |
| Convolutional 2 | 32x42x32 | 64x21x64 | ReLU | Yes (0.4) | 4x4 kernel, stride 2 |
| Convolutional 3 | 64x21x64 | 64x10x64 | ReLU | No | 3x3 kernel, stride 1 |
| Max Pooling 1 | 64x10x64 | 64x5x64 | N/A | No | 2x2 kernel, stride 2 |
| Convolutional 4 | 64x5x64 | 64x4x64 | ReLU | No | 3x3 kernel, stride 1 |
| Max Pooling 2 | 64x4x64 | 64x2x64 | N/A | No | 2x2 kernel, stride 2 |
| LSTM | 64x2x64 (flattened) | 256 hidden units | N/A | No | 1 layer |
| Fully Connected 1 | 256 | 512 | ReLU | No | Dense layer |
| Fully Connected 2 | 512 | 6 | N/A | No | Action output layer |
5.3. Experiments
5.4. Mounted Sensors and Hyperparameters
| Sensor | Usage | Specific Attributes | Location on the Vehicle | Function |
|---|---|---|---|---|
| RGB Camera | Images are resized to 84x84 pixels and converted to grayscale to be processed by the CNN-LSTM model. Used for autonomous decision-making. | Resolution: 640x480 pixels (modifiable). Field of View (FOV): 110 degrees. | Mounted at the front, coordinates: x=2.5, z=0.7. | Captures color images for visual perception of the environment. |
| Collision Sensor | Detects vehicle collisions with the environment and logs these events. | Logs collisions in a collision_hist list. A negative reward of -20 points is assigned when a collision occurs. | Attached to the vehicle (exact position unspecified). | Used to evaluate safety during training, penalizing unsafe behavior and preventing crashes. |
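
The collision log feeds directly into the reward signal. A minimal sketch of how the per-step reward could be computed from `collision_hist` is shown below, using the reward values and episode length listed in the hyperparameter table. The function name `step_reward` is hypothetical, and ending the episode as soon as a collision is logged is an assumption; the source states only the reward values.

```python
from typing import List, Tuple

# Values from the hyperparameter table.
REWARD_STEP = 5            # each step taken without a collision
REWARD_COLLISION = -20     # penalty when a collision occurs
REWARD_SUCCESS = 250       # episode completed without collision
SECONDS_PER_EPISODE = 30   # maximum episode duration in seconds


def step_reward(collision_hist: List[object],
                elapsed_seconds: float) -> Tuple[int, bool]:
    """Return (reward, done) for one environment step.

    Assumption: a logged collision terminates the episode immediately.
    """
    if collision_hist:
        # At least one collision event was logged by the sensor.
        return REWARD_COLLISION, True
    if elapsed_seconds >= SECONDS_PER_EPISODE:
        # Time limit reached without any collision.
        return REWARD_SUCCESS, True
    # Ordinary collision-free step.
    return REWARD_STEP, False
```

Under these assumptions, a collision-free step partway through an episode returns `(5, False)`, while a step with a non-empty `collision_hist` returns `(-20, True)`.
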

| Hyperparameter | Value | Description |
|---|---|---|
| SHOW_PREVIEW | False | Controls whether the front camera preview is shown during the simulation. |
| IM_WIDTH | 640 | Width of the camera image. |
| IM_HEIGHT | 480 | Height of the camera image. |
| SECONDS_PER_EPISODE | 30 | Maximum time (in seconds) for each episode in the environment. |
| MIN_REWARD | -200 | Minimum reward threshold to consider for episode termination. |
| STEER_AMT | 1.0 | Steering amount (how much the vehicle turns when an action is taken). |
| num_frames | 8 | Number of image frames to stack for input to the neural network. |
| Gamma | 0.99 | Discount factor applied to future rewards in the Q-learning target. |
| Batch_Size | 32 | Number of transitions to sample from the replay buffer in each training iteration. |
| Buffer_Size | 5,000,000 | Maximum size of the replay buffer (number of stored transitions). |
| Min_Replay_Size | 100,000 | Minimum number of transitions to collect before starting training. |
| Episodes | 2000 | Total number of episodes to run the training loop. |
| Epsilon | 1.0 | Initial exploration rate for ε-greedy policy (controls how often random actions are taken). |
| min_epsilon | 0.0001 | The minimum value to which epsilon can decay. |
| Decay | (min_epsilon/epsilon)**(1/episodes) | Decay factor for reducing epsilon after each episode. |
| lstm_hidden_size | 256 | Number of hidden units in the LSTM layer. |
| num_lstm_layers | 1 | Number of layers in the LSTM. |
| Optimizer | Adam | Optimization algorithm used for training the neural network. |
| learning_rate | 5e-4 | The learning rate for the Adam optimizer. |
| Dropout | 0.4 | Dropout rate used in the CNN layers to prevent overfitting. |
| target_net_update_freq | Every 4 episodes | Frequency of updating the target network with the weights from the online network. |
| max_pool_kernel_size | 2 | Kernel size for max-pooling layers in the CNN. |
| reward_success | 250 | Reward assigned when the vehicle completes an episode without collision within the time limit. |
| reward_collision | -20 | Penalty assigned when a collision occurs. |
| reward_step | 5 | Reward for each step taken without a collision. |
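
The decay factor in the table is chosen so that epsilon, multiplied by it once per episode, falls from its initial value to `min_epsilon` over the full training run. The sketch below works through that arithmetic and pairs it with a generic ε-greedy action selection. The function names are hypothetical, and clamping epsilon at `min_epsilon` is an assumption about how the lower bound is enforced.

```python
import random
from typing import List, Sequence

# Values from the hyperparameter table.
EPSILON_START = 1.0
MIN_EPSILON = 0.0001
EPISODES = 2000
DECAY = (MIN_EPSILON / EPSILON_START) ** (1 / EPISODES)  # about 0.9954


def epsilon_schedule(episodes: int = EPISODES) -> List[float]:
    """Epsilon used in each episode, decayed multiplicatively once per episode."""
    eps = EPSILON_START
    values = []
    for _ in range(episodes):
        values.append(eps)
        eps = max(MIN_EPSILON, eps * DECAY)  # assumed clamp at the minimum
    return values


def select_action(q_values: Sequence[float], epsilon: float,
                  rng: random.Random) -> int:
    """Epsilon-greedy: random action with probability epsilon, else greedy."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


schedule = epsilon_schedule()
```

Because the decay is geometric, epsilon is about 0.01 halfway through training (episode 1000) and approaches 0.0001 only in the final episodes, so most exploration happens early in the run.
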
5.5. Action Space
5.6. Results





6. Conclusion

| Action | Control Command |
|---|---|
| 0 | Steer left |
| 1 | Go straight |
| 2 | Steer right |
| 3 | Slow down and steer left |
| 4 | Slow down and go straight |
| 5 | Slow down and steer right |
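
One way to turn these six discrete actions into vehicle commands is a simple lookup table. The sketch below is illustrative only: the `Control` dataclass is a stand-in for a simulator control object, the two throttle values are assumed (the source does not say how "slow down" is realized, and braking would be another option), and the sign convention (negative steer means left) follows the one commonly used in CARLA. Only `STEER_AMT` comes from the hyperparameter table.

```python
from dataclasses import dataclass

STEER_AMT = 1.0        # from the hyperparameter table
NORMAL_THROTTLE = 1.0  # assumed value for actions 0-2
SLOW_THROTTLE = 0.3    # assumed value for the "slow down" actions 3-5


@dataclass(frozen=True)
class Control:
    """Hypothetical stand-in for a simulator vehicle-control command."""
    throttle: float
    steer: float  # negative = left, positive = right (assumed convention)


ACTIONS = {
    0: Control(NORMAL_THROTTLE, -STEER_AMT),  # steer left
    1: Control(NORMAL_THROTTLE, 0.0),         # go straight
    2: Control(NORMAL_THROTTLE, STEER_AMT),   # steer right
    3: Control(SLOW_THROTTLE, -STEER_AMT),    # slow down and steer left
    4: Control(SLOW_THROTTLE, 0.0),           # slow down and go straight
    5: Control(SLOW_THROTTLE, STEER_AMT),     # slow down and steer right
}


def action_to_control(action: int) -> Control:
    """Map a discrete action index (0-5) to a control command."""
    if action not in ACTIONS:
        raise ValueError(f"unknown action index: {action}")
    return ACTIONS[action]
```

The six entries match the size of the network's action output layer, so the index of the largest Q-value can be passed straight to `action_to_control`.
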

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).