Submitted: 20 March 2026
Posted: 23 March 2026
Abstract
Keywords:
1. Introduction
2. Methodology
3. Simulation Platform Establishment
3.1. Accessible Lanes Perception
3.1.1. Accessible Diverging Path Generation
3.1.2. Perception Based on Path
- Vehicle-related information: This comprises three categories: dynamic kinematic states, including the vehicle's longitudinal and lateral position, its velocity in the X and Y directions, and its longitudinal acceleration; static attributes, including the toll collection type and the initial lane; and surrounding-vehicle indicators, which flag the presence of other vehicles in predefined surrounding zones.
- Path-related information: This includes the available longitudinal distance, the lateral moving magnitude, and the queue length for each accessible path j, where j is the toll lane number determined by the current vehicle's toll collection type.
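The two categories of perceived information can be pictured as a small record per vehicle. The following is a minimal sketch only: the class and field names (`VehicleInfo`, `PathInfo`, `Perception`, and so on) are hypothetical, the numeric values are made up for illustration, and the units are assumed rather than taken from the paper.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class VehicleInfo:
    # Dynamic kinematic states (units assumed: m, m/s, m/s^2)
    x: float             # longitudinal position
    y: float             # lateral position
    vx: float            # velocity in the X direction
    vy: float            # velocity in the Y direction
    ax: float            # longitudinal acceleration
    # Static attributes
    toll_type: int       # 0 = MTC vehicle, 1 = ETC vehicle
    initial_lane: int    # mainline lane before entering the diverging area
    # Surrounding-vehicle indicators (1 = vehicle present, 0 = zone empty)
    left: int
    right: int
    right_behind: int
    left_behind: int


@dataclass
class PathInfo:
    available_distance: float  # available longitudinal distance on the path
    lateral_magnitude: float   # positive: leftward, negative: rightward
    queue_length: int          # vehicles queued on the path


@dataclass
class Perception:
    vehicle: VehicleInfo
    paths: Dict[int, PathInfo]  # keyed by toll lane number j


# Illustrative values only: an ETC vehicle with two accessible paths.
perception = Perception(
    vehicle=VehicleInfo(x=35.0, y=1.8, vx=12.0, vy=0.0, ax=-0.5,
                        toll_type=1, initial_lane=2,
                        left=0, right=1, right_behind=0, left_behind=0),
    paths={
        1: PathInfo(available_distance=80.0, lateral_magnitude=3.5, queue_length=2),
        2: PathInfo(available_distance=95.0, lateral_magnitude=0.0, queue_length=4),
    },
)
```

Keying the path records by toll lane number keeps the set of accessible paths explicit, so a vehicle with a different toll collection type simply carries a different set of keys.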
3.2. Dynamic Toll Lane Decision
3.3. Dynamic Toll Lane Decision
4. Multi-Agent Cooperative Decision Model
4.1. Action Space
4.2. State Space
- Local observation space: During the decentralized execution phase, each agent perceives the environment only through its own sensors and forms a local observation from that information. This allows any differences in traffic performance to be attributed solely to the agents' respective control strategies or human behavioral models. Specifically, the local observation includes the vehicle's ego state (position, velocity, and longitudinal acceleration), the surrounding-vehicle indicators, and the path-related information (available longitudinal distance, lateral moving magnitude, and queue length).
- Global state space: During the centralized training phase, the critic network takes the global state information as input to accurately estimate the expected joint return of the agents, enabling the learning of cooperative policies. Consequently, the global state is defined as:
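A minimal sketch of how the two inputs might be assembled under centralized training and decentralized execution is given here. It assumes, purely for illustration, that the global state is the concatenation of all agents' local observations; this is a common choice in CTDE implementations and is not necessarily the definition used in this paper. The function names and all numeric values are hypothetical.

```python
from typing import List, Sequence, Tuple


def local_observation(ego: Sequence[float],
                      surround: Sequence[int],
                      paths: Sequence[Tuple[float, float, int]]) -> List[float]:
    """Flatten one agent's ego state, surrounding-vehicle flags and
    per-path features into a single observation vector."""
    obs = [float(v) for v in ego]
    obs += [float(v) for v in surround]
    for distance, lateral, queue in paths:
        obs += [float(distance), float(lateral), float(queue)]
    return obs


def global_state(observations: Sequence[Sequence[float]]) -> List[float]:
    """Assumed form of the critic input: all local observations concatenated."""
    state: List[float] = []
    for obs in observations:
        state.extend(obs)
    return state


# Two agents, each with 5 ego values, 4 zone flags and 2 accessible paths.
obs_a = local_observation([35.0, 1.8, 12.0, 0.0, -0.5], [0, 1, 0, 0],
                          [(80.0, 3.5, 2), (95.0, 0.0, 4)])
obs_b = local_observation([20.0, 5.4, 10.0, 0.2, 0.3], [1, 0, 0, 1],
                          [(110.0, -3.5, 1), (120.0, 0.0, 3)])
critic_input = global_state([obs_a, obs_b])
```

Each actor would receive only its own vector (`obs_a` or `obs_b`), while the critic would receive `critic_input` during training.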
4.3. Reward Function
4.3.1. Traffic Efficiency Reward
4.3.2. Traffic Efficiency Reward
4.4. MAPPO Training Framework
5. Simulation Experiment
5.1. Data Collection and Processing
5.2. Model Setup
5.2.1. Simulation Platform Setup
5.2.2. Simulation Platform Setup
6. Simulation Results and Analysis
6.1. Benchmark Implementation
6.2. Performance Evaluation
6.3. Comparative Analysis
- Traffic volume sensitivity test: The length of the diverging area was fixed at 140 m, while traffic volumes were set to 1500, 1750, and 2000 veh/h.
- Geometric sensitivity test: With the traffic volume fixed at 1500 veh/h, the lengths of the diverging area were set to 120, 140, 160, and 180 m.
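The two sensitivity tests above can be enumerated as a small scenario list. This is a sketch for illustration; the `Scenario` type and the function name are hypothetical, and only the volumes and lengths are taken from the text.

```python
from typing import List, NamedTuple


class Scenario(NamedTuple):
    test: str                 # which sensitivity test the scenario belongs to
    volume_veh_h: int         # traffic volume in veh/h
    diverging_length_m: int   # length of the diverging area in m


def build_scenarios() -> List[Scenario]:
    # Traffic volume test: length fixed at 140 m, volume varied.
    volume_test = [Scenario("volume", v, 140) for v in (1500, 1750, 2000)]
    # Geometric test: volume fixed at 1500 veh/h, length varied.
    geometry_test = [Scenario("geometry", 1500, length)
                     for length in (120, 140, 160, 180)]
    return volume_test + geometry_test


scenarios = build_scenarios()
# The (1500 veh/h, 140 m) setting appears in both tests.
unique_settings = {(s.volume_veh_h, s.diverging_length_m) for s in scenarios}
```

The list contains seven entries but only six distinct settings, because the 1500 veh/h, 140 m configuration is shared by both tests and can serve as their common baseline.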
7. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
| CAVs | Connected and autonomous vehicles |
| MAPPO | Multi-agent proximal policy optimization |
| CTDE | Centralized training and decentralized execution |
| MARL | Multi-agent reinforcement learning |
| PDA | Perception-Decision-Action |
| ETC | Electronic toll collection |
| MTC | Manual toll collection |
| FVD | Full velocity difference |
| MADDPG | Multi-agent deep deterministic policy gradient |
| QMIX | Monotonic mixing network |
| PPO | Proximal policy optimization |
| GAE | Generalized advantage estimation |
| TD | Temporal difference |
| UAV | Unmanned aerial vehicle |
| MLP | Multilayer perceptron |
| HVs | Human-driven vehicles |
| ETTC | Extended time-to-collision |

| Category | Description |
| Vehicle-related variables | Longitudinal position of SV at each time step. |
| | Lateral position of SV at each time step. |
| | Velocity of SV in the X direction at each time step. |
| | Velocity of SV in the Y direction at each time step. |
| | Longitudinal acceleration of SV at each time step. |
| | Current toll collection type of SV (0 = MTC vehicle, 1 = ETC vehicle). |
| | Initial lane of SV before it enters the diverging area. |
| | Presence of another vehicle in the left area at each time step (1 = Yes, 0 = No). |
| | Presence of another vehicle in the right area at each time step (1 = Yes, 0 = No). |
| | Presence of another vehicle in the right-behind area at each time step (1 = Yes, 0 = No). |
| | Presence of another vehicle in the left-behind area at each time step (1 = Yes, 0 = No). |
| Path-related variables | Available longitudinal distance on path j at each time step. |
| | Required steering magnitude for selecting path j at each time step (positive: leftward turn, negative: rightward turn). |
| | Number of vehicles queued on path j at each time step. |
| Mainline lane | Lane ID | 1 | 1 | 2 | 2 | 3 | 3 | Total | Total |
| | Toll type | ETC | MTC | ETC | MTC | ETC | MTC | ETC | MTC |
| | Vehicle counts | 115 | 29 | 202 | 54 | 122 | 106 | 439 | 189 |
| Toll lane | Lane ID | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
| | Toll type | ETC | ETC | ETC | ETC | ETC | MTC | MTC | MTC |
| | Vehicle counts | 165 | 128 | 94 | 42 | 10 | 94 | 69 | 26 |
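As a quick consistency check on the counts in the table, the totals by toll type can be recomputed from both the mainline-lane and toll-lane rows. The snippet below is an illustrative sketch; the variable names are hypothetical and the numbers are copied from the table.

```python
# Vehicle counts by mainline lane and toll type (from the table).
mainline = {
    1: {"ETC": 115, "MTC": 29},
    2: {"ETC": 202, "MTC": 54},
    3: {"ETC": 122, "MTC": 106},
}

# Toll type and vehicle count of each toll lane (from the table).
toll_lanes = {
    1: ("ETC", 165), 2: ("ETC", 128), 3: ("ETC", 94), 4: ("ETC", 42),
    5: ("ETC", 10), 6: ("MTC", 94), 7: ("MTC", 69), 8: ("MTC", 26),
}

# Totals by toll type, computed independently from each half of the table.
mainline_totals = {
    toll_type: sum(lane[toll_type] for lane in mainline.values())
    for toll_type in ("ETC", "MTC")
}
toll_lane_totals = {"ETC": 0, "MTC": 0}
for toll_type, count in toll_lanes.values():
    toll_lane_totals[toll_type] += count

total_vehicles = sum(mainline_totals.values())
etc_share = mainline_totals["ETC"] / total_vehicles
```

Both halves of the table give 439 ETC and 189 MTC vehicles, 628 in total, so ETC vehicles make up roughly 70% of the sample.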
| Parameter | Value | Parameter | Value |
| Number of hidden layers | 2 | Learning rate | 0.001 |
| Number of units per layer | 256 | Learning rate | 0.001 |
| Entropy coefficient | 0.1 | Batch size | 128 |
| Discount coefficient | 0.98 | Buffer size | 20000 |
| coefficient | 0.2 | | |
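For reference, the tabulated settings could be collected in a single configuration object, as in the hedged sketch below. The field names are hypothetical. The table lists the learning rate twice with the same value and does not say which network each entry applies to, and the label of the last coefficient is incomplete, so both are kept generic here. The input and output dimensions in the usage line are placeholders, not values from the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TrainingConfig:
    hidden_layers: int = 2
    units_per_layer: int = 256
    entropy_coef: float = 0.1
    discount: float = 0.98
    learning_rate: float = 1e-3   # listed twice in the table, same value
    batch_size: int = 128
    buffer_size: int = 20000
    unlabeled_coef: float = 0.2   # row label incomplete in the table; placeholder name


def mlp_layer_sizes(cfg: TrainingConfig, input_dim: int, output_dim: int) -> List[int]:
    """Layer widths of an MLP with the configured hidden layers."""
    return [input_dim] + [cfg.units_per_layer] * cfg.hidden_layers + [output_dim]


cfg = TrainingConfig()
# Placeholder dimensions: 15 inputs and 3 outputs are illustrative only.
layer_sizes = mlp_layer_sizes(cfg, input_dim=15, output_dim=3)
```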
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).