Submitted: 27 May 2026
Posted: 28 May 2026
Abstract
Keywords:
1. Introduction
2. Problem Formulation
2.1. Vehicle Dynamic Model
2.2. NMPC Controller for Trajectory Tracking
3. Construction of PPO+NMPC+CBF Framework
3.1. PPO Model Training
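| Algorithm 1. Offline PPO Training Framework. |
|---|
| Input: initial policy network, value network, reference-path environment, NMPC solver. |
| Output: trained policy. |
| Initialize the networks and an empty rollout buffer. |
| while the training budget is not exhausted do |
| Reset the environment and obtain the initial state. |
| while the episode is not finished do |
| The policy outputs an action that rescales the diagonals of the NMPC weight matrices. |
| The NMPC solves the horizon problem; apply the first control input. |
| Simulate the environment to obtain the next state and the reward. |
| Store the transition in the rollout buffer. |
| end while |
| Estimate GAE advantages and value targets from the rollout buffer. |
| Update the policy via the clipped surrogate objective with an entropy bonus. |
| Update the value network by regression to the value targets. |
| end while |
| Return the trained policy. |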
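The update phase of Algorithm 1 uses two standard PPO ingredients: generalized advantage estimation (GAE) and the clipped surrogate objective with an entropy bonus, as introduced by Schulman et al. The sketch below evaluates those targets and losses in NumPy, assuming a rollout buffer that stores per-step rewards, critic values and episode-termination flags. The function names (`compute_gae`, `ppo_clip_loss`, `value_loss`) and the default hyperparameters (`gamma=0.99`, `lam=0.95`, `clip_eps=0.2`, `ent_coef=0.01`) are illustrative assumptions, not values reported in this paper, and the NMPC solver and neural networks are deliberately left out.

```python
import numpy as np


def compute_gae(rewards, values, dones, last_value, gamma=0.99, lam=0.95):
    """Generalized advantage estimation over one rollout.

    rewards, values, dones: length-T sequences from the rollout buffer
    (dones[t] = 1 if the episode terminated after step t).
    last_value: critic estimate for the state after the final step.
    Returns (advantages, value_targets), both of length T.
    """
    T = len(rewards)
    advantages = np.zeros(T, dtype=float)
    gae = 0.0
    next_value = last_value
    for t in reversed(range(T)):
        not_done = 1.0 - float(dones[t])
        # One-step TD residual: r_t + gamma * V(s_{t+1}) - V(s_t), no bootstrap at episode end
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        # Exponentially weighted sum of residuals, reset at episode boundaries
        gae = delta + gamma * lam * not_done * gae
        advantages[t] = gae
        next_value = values[t]
    value_targets = advantages + np.asarray(values, dtype=float)
    return advantages, value_targets


def ppo_clip_loss(new_logp, old_logp, advantages, entropy, clip_eps=0.2, ent_coef=0.01):
    """Negative clipped surrogate objective plus entropy bonus (a loss to minimize)."""
    ratio = np.exp(np.asarray(new_logp) - np.asarray(old_logp))
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Pessimistic (element-wise minimum) bound on the policy improvement
    surrogate = np.mean(np.minimum(unclipped, clipped))
    return -(surrogate + ent_coef * np.mean(entropy))


def value_loss(predicted_values, value_targets):
    """Mean-squared regression of the critic onto the GAE value targets."""
    diff = np.asarray(predicted_values, dtype=float) - np.asarray(value_targets, dtype=float)
    return float(np.mean(diff ** 2))
```

In a full implementation these losses would be expressed in an autodiff framework so that gradients reach the policy and value networks; NumPy here only evaluates them. Advantages are also commonly normalized per mini-batch before entering the surrogate.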
3.2. PPO+NMPC
3.3. PPO+NMPC+CBF
4. Experiments
4.1. Simulation Setup
4.2. Simulation Results and Analysis
4.3. Robustness Evaluation
5. Conclusions
Acknowledgments
Conflicts of Interest
Appendix A
Appendix B



References
- Marino, R.; Scalzi, S.; Orlando, G.; Netto, M. A nested PID steering control for lane keeping in vision-based autonomous vehicles. Proc. Amer. Control Conf., Jun. 2009; pp. 2885–2890.
- Du, X.; Tan, K. K.; Htet, K. K. K. Vision approach towards fully self-reverse parking system. Proc. IEEE Int. Conf. Mechatronics Autom., Aug. 2014; pp. 186–191.
- Du, X.; Tan, K. K. Autonomous reverse parking system based on robust path generation and improved sliding mode control. IEEE Trans. Intell. Transp. Syst. 2015, 16(3), 1225–1237.
- Chen, X.; Bao, Q.; Zhang, B. Research on 4WIS electric vehicle path tracking control based on adaptive fuzzy PID algorithm. Proc. Chinese Control Conf., Jul. 2019; pp. 6753–6760.
- Yeh, Y.-C.; Li, T.-H. S.; Chen, C.-Y. Adaptive fuzzy sliding-mode control of dynamic model-based car-like mobile robot. Int. J. Fuzzy Syst. 2009, 11(4), 272–281.
- Limon, D.; Ferramosca, A.; Alvarado, I.; Alamo, T. Model predictive control for setpoint tracking. arXiv 2024, arXiv:2403.02973.
- Zhang, C.; Chu, D.; Liu, S.; Deng, Z.; Wu, C.; Su, X. Trajectory planning and tracking for autonomous vehicle based on state lattice and model predictive control. IEEE Intell. Transp. Syst. Mag. 2019, 11(2), 29–40.
- Wang, H.; Liu, B.; Ping, X.; An, Q. Path tracking control for autonomous vehicles based on an improved MPC. IEEE Access 2019, 7, 161064–161073.
- Stano, P.; Montanaro, U.; Tavernini, D.; Tufo, M.; Fiengo, G.; Novella, L.; et al. Model predictive path tracking control for automated road vehicles: A review. Annu. Rev. Control 2023, 55, 194–236.
- Köhler, J.; Müller, M. A.; Allgöwer, F. A nonlinear tracking model predictive control scheme for dynamic target signals. Automatica 2020, 118, 109030.
- Yuan, H.; Sun, X.; Gordon, T. Unified decision-making and control for highway collision avoidance using active front steer and individual wheel torque control. Veh. Syst. Dyn. 2019, 57(8), 1188–1205.
- Zhu, G.; Jie, H.; Hong, W. NMPC-based path tracking control strategy for 4WID autonomous vehicle considering handling stability under extreme conditions. Proc. 7th CAA Int. Conf. Veh. Control Intell., Oct. 2023; pp. 1–6.
- Du, X.; Htet, K. K. K.; Tan, K. K. Development of a genetic-algorithm-based nonlinear model predictive control scheme on velocity and steering of autonomous vehicles. IEEE Trans. Ind. Electron. 2016, 63(11), 6970–6977.
- Le, V.; Malikopoulos, A. Controller adaptation via learning solutions of contextual Bayesian optimization. IEEE Robot. Autom. Lett. 2025, 10, 8308–8315.
- Alcalá, E.; Puig, V.; Quevedo, J.; Rosolia, U. Autonomous racing using Linear Parameter Varying-Model Predictive Control (LPV-MPC). Control Eng. Pract. 2020, 95, 104270.
- Zarrouki, B.; Spanakakis, M.; Betz, J. A safe reinforcement learning driven weights-varying model predictive control for autonomous vehicle motion control. Proc. IEEE Intell. Veh. Symp. (IV), 2024; pp. 1401–1408.
- Ostafew, C. J.; Schoellig, A. P.; Barfoot, T. D. Robust constrained learning-based NMPC enabling reliable mobile robot path tracking. Int. J. Robot. Res. 2016, 35(13), 1547–1563.
- Pannek, J.; Gerdts, M. Performance of sensitivity based NMPC updates in automotive applications. arXiv 2014, arXiv:1401.3548.
- Diehl, M.; Ferreau, H. J.; Haverbeke, N. Efficient numerical methods for nonlinear MPC and moving horizon estimation. In Nonlinear Model Predictive Control: Towards New Challenging Applications; Magni, L., Raimondo, D. M., Allgöwer, F., Eds.; Springer: Berlin, Heidelberg, 2009; pp. 391–417.
- Kayacan, E.; Saeys, W.; Ramon, H.; Belta, C.; Peschel, J. M. Experimental validation of linear and nonlinear MPC on an articulated unmanned ground vehicle. IEEE/ASME Trans. Mechatron. 2018, 23(5), 2023–2030.
- Stella, L.; Themelis, A.; Sopasakis, P.; Patrinos, P. A simple and efficient algorithm for nonlinear model predictive control. Proc. 56th IEEE Conf. Decis. Control, Dec. 2017; pp. 1939–1944.
- Goodwin, G. C.; Cea, M. G.; Seron, M. M.; Ferris, D.; Middleton, R. H.; Campos, B. Opportunities and challenges in the application of nonlinear MPC to industrial problems. Proc. IFAC World Congr., Aug. 2012; pp. 39–49.
- Pane, Y. P.; Nageshrao, S. P.; Babuška, R. Actor-critic reinforcement learning for tracking control in robotics. Proc. 55th IEEE Conf. Decis. Control, Dec. 2016; pp. 5819–5826.
- Kosta, K.; Anwar, M. A.; Panda, P.; Raychowdhury, A.; Roy, K. RAPID-RL: A reconfigurable architecture with preemptive-exits for efficient deep-reinforcement learning. Proc. IEEE Int. Conf. Robot. Autom., May 2022; pp. 7492–7498.
- Riemer, M.; Subbaraj, G.; Berseth, G.; Rish, I. Enabling realtime reinforcement learning at scale with staggered asynchronous inference. arXiv 2024, arXiv:2412.14355.
- Shan, Y.; Zheng, B.; Chen, L.; Chen, L.; Chen, D. A reinforcement learning-based adaptive path tracking approach for autonomous driving. IEEE Trans. Veh. Technol. 2020, 69(10), 10581–10595.
- Sierra-Garcia, J. E.; Santos, M. Combining reinforcement learning and conventional control to improve automatic guided vehicles tracking of complex trajectories. Expert Syst. 2023, 41(2), e13076.
- Bellegarda, G.; Byl, K. An online training method for augmenting MPC with deep reinforcement learning. Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst., Oct. 2020; pp. 5453–5459.
- Chen, Z.; Lai, J.; Li, P.; Awad, O. I.; Zhu, Y. Prediction horizon-varying model predictive control (MPC) for autonomous vehicle control. Electronics 2024, 13(8), 1442.
- Rokonuzzaman, M.; Mohajer, N.; Nahavandi, S.; Mohamed, S. Model predictive control with learned vehicle dynamics for autonomous vehicle path tracking. IEEE Access 2021, 9, 128233–128249.
- Akmandor, N. Ü.; Prajapati, S.; Zolotas, M.; Padır, T. Re4MPC: Reactive nonlinear MPC for multi-model motion planning via deep reinforcement learning. arXiv 2025, arXiv:2506.08344.
- Reiter, R.; Ghezzi, A.; Baumgärtner, K.; Hoffmann, J.; McAllister, R. D.; Diehl, M. AC4MPC: Actor-critic reinforcement learning for nonlinear model predictive control. arXiv 2024, arXiv:2406.03995.
- Martinsen, B.; Lekkas, A. M.; Gros, S. Reinforcement learning-based NMPC for tracking control of ASVs: Theory and experiments. Control Eng. Pract. 2022, 120, 105024.
- Berg, H. S.; Menges, D.; Tengesdal, T.; Rasheed, A. Digital twin syncing for autonomous surface vessels using reinforcement learning and nonlinear model predictive control. Sci. Rep. 2025, 15(1), 9344.
- Mehndiratta, M.; Camci, E.; Kayacan, E. Automated tuning of nonlinear model predictive controller by reinforcement learning. Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst., Oct. 2018; pp. 3016–3021.
- Ceusters, G.; Camargo, L. R.; Franke, R.; Nowé, A.; Messagie, M. Safe reinforcement learning for multi-energy management systems with known constraint functions. arXiv 2022, arXiv:2207.03830.
- Malu, S. K.; Majumdar, J. Kinematics, localization and control of differential drive mobile robot. Glob. J. Res. Eng. 2014, 14(H1), 1–7.
- Yang, H.; Deng, F.; He, Y.; Jiao, D.; Han, Z. Robust nonlinear model predictive control for reference tracking of dynamic positioning ships based on nonlinear disturbance observer. Ocean Eng. 2020, 215, 107885.
- Schulman, J.; Wolski, F.; Dhariwal, P.; Radford, A.; Klimov, O. Proximal policy optimization algorithms. arXiv 2017, arXiv:1707.06347.
- Schulman, J.; Moritz, P.; Levine, S.; Jordan, M.; Abbeel, P. High-dimensional continuous control using generalized advantage estimation. arXiv 2015, arXiv:1506.02438.
- Han, M.; Zhang, L.; Wang, J.; Pan, W. Actor-critic reinforcement learning for control with stability guarantee. IEEE Robot. Autom. Lett. 2020, 5(4), 6217–6224.
- Devlin, S.; Kudenko, D. Dynamic potential-based reward shaping. Proc. Int. Conf. Autonomous Agents Multiagent Syst., Jun. 2012; pp. 433–440.
- Ames, A. D.; Coogan, S.; Egerstedt, M.; Notomista, G.; Sreenath, K.; Tabuada, P. Control barrier functions: Theory and applications. Proc. Eur. Control Conf., Jun. 2019; pp. 3420–3431.
- Horváth, Z.; Song, Y.; Terlaky, T. Invariance conditions for nonlinear dynamical systems. arXiv 2016, arXiv:1607.01107.
- Ames, A. D.; Xu, X.; Grizzle, J. W.; Tabuada, P. Control barrier function based quadratic programs for safety critical systems. IEEE Trans. Autom. Control 2017, 62(8), 3861–3876.
- Suwartadi, E.; Kungurtsev, V.; Jäschke, J. Sensitivity-based economic NMPC with a path-following approach. Processes 2017, 5(1), 8.
- Zhang, H.; Li, P.; García, C. E. Robust stability of nonlinear model predictive control based on extended Kalman filter. J. Process Control 2012, 22(1), 82–89.
Sheng Jin received the B.E. degree in Automation from Guangdong University of Technology, Guangzhou, China, in 2024. He is currently pursuing the M.Sc. degree in Advanced Control and Systems Engineering with the Department of Electrical and Electronic Engineering at the University of Manchester, Manchester, U.K. His research interests include learning-based control and reinforcement learning.
Joel Loh received the Ph.D. degree in Electrical and Computer Engineering from the University of Toronto, Toronto, Canada, in 2020. He is currently a Dame Kathleen Ollerenshaw Fellow (Assistant Professor) with the Department of Electrical and Electronic Engineering, University of Manchester, Manchester, U.K. His research interests include metamaterials, memristors, chemical sensing, and artificial intelligence.









| Parameter | Value |
|---|---|
| Vehicle mass | |
| Inertia | |
| Wheel radius | |
| Wheelbase | |
| Rolling friction coef. | |
| Velocity damping coef. | |
| Angular damping coef. | |
| NMPC horizon | |
| Time step | |
| Max velocity | |
| Max torque | |

| Metrics | NMPC | PPO+NMPC | NMPC+CBF | PPO+NMPC+CBF |
|---|---|---|---|---|
| Steps | ||||
| Mean position error | ||||
| Path length | ||||
| Total energy | ||||
| Energy efficiency | | | | |
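The metrics in the comparison table are not defined alongside it. As a hedged illustration only, the sketch below computes two of them under assumed definitions: path length as the sum of Euclidean distances between consecutive x-y positions, and mean position error as the average Euclidean distance between the actual and reference positions sampled at the same time steps. Both definitions and the function names `path_length` and `mean_position_error` are assumptions; the paper may instead measure error against the nearest point on the reference path. Total energy and energy efficiency are not attempted because their definitions are not given.

```python
import numpy as np


def path_length(positions):
    """Sum of Euclidean segment lengths along an (N, 2) array of x-y positions.

    Assumed definition, for illustration only.
    """
    p = np.asarray(positions, dtype=float)
    return float(np.sum(np.linalg.norm(np.diff(p, axis=0), axis=1)))


def mean_position_error(positions, reference):
    """Mean Euclidean distance between actual and reference positions.

    Assumes both arrays are (N, 2) and sampled at the same time steps
    (a time-indexed error, not a nearest-point-on-path error).
    """
    p = np.asarray(positions, dtype=float)
    r = np.asarray(reference, dtype=float)
    if p.shape != r.shape:
        raise ValueError("positions and reference must have the same shape")
    return float(np.mean(np.linalg.norm(p - r, axis=1)))
```

A time-indexed error penalizes lagging behind the reference even when the vehicle stays on the path; a nearest-point error would not, so the choice of definition affects how the controllers compare.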
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).