Submitted: 28 April 2026
Posted: 29 April 2026
Abstract
Keywords:
1. Introduction
- A hierarchical cruise-to-park AVP framework is developed by integrating camera-triggered slot detection, NMPC-based long-range cruising, and TD3-based low-speed parking within a unified simulation environment.
- A customized NMPC cost design is introduced for the cruising phase to improve long-horizon path-following performance by jointly regulating tracking accuracy, vehicle speed, and control smoothness in structured parking-lot routes.
- A time-penalized TD3 reward formulation is proposed for the parking phase to reduce inefficient oscillation or spinning near the goal and to improve convergence toward the target pose during collision-aware docking.
- The proposed framework is validated on multiple previously unseen parking slots under fixed controller settings to examine phase transition behavior, docking feasibility, and target-slot transfer within the same parking-lot layout.
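
The hierarchical cruise-to-park structure listed above can be illustrated with a small supervisory state machine. This is a minimal sketch under stated assumptions, not the authors' implementation: the names `Phase`, `Supervisor`, and `select_command` are hypothetical, and the one-way latch from cruising to parking is an assumed simplification of the switching logic.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Phase(Enum):
    CRUISE = auto()  # NMPC tracks the parking-lot route
    PARK = auto()    # TD3 policy performs low-speed docking


@dataclass
class Supervisor:
    """Hypothetical supervisor that latches from CRUISE to PARK."""
    phase: Phase = Phase.CRUISE

    def update(self, free_slot_detected: bool) -> Phase:
        # Assumption: the switch is one-way, so a later missed detection
        # does not send the vehicle back to cruising.
        if self.phase is Phase.CRUISE and free_slot_detected:
            self.phase = Phase.PARK
        return self.phase


def select_command(phase, nmpc_command, td3_command):
    """Route the active controller's command to the vehicle."""
    return nmpc_command if phase is Phase.CRUISE else td3_command
```

In this sketch the camera-based detector only supplies the boolean trigger; both controllers can run in parallel while the supervisor decides which command reaches the vehicle.
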
2. Methodology
2.1. Problem Formulation and System Architecture
2.2. Simulation Environment
2.3. Vehicle Model and Constraints
2.4. Perception and Switching Logic
2.4.1. Camera-Based Free Slot Detection
2.4.2. LiDAR Feedback for Final Parking
2.4.3. Seamless Cruise-to-Park Transition
2.5. NMPC Cruising Controller
2.5.1. NMPC Optimization Formulation
2.5.2. Custom NMPC Cost Function
2.6. TD3 Parking Controller
2.6.1. MDP Formulation
2.6.2. Network Architecture and Training Configuration

2.6.3. Reward Shaping with an Explicit Time Penalty
2.7. Metrics and Evaluation Setup
2.7.1. Training–Validation Split and Generalization Setup
2.7.2. Metrics and Logging
3. Results
3.1. Overall Performance of the Proposed AVP Framework
3.2. NMPC Cruising Results
3.3. TD3 Parking Results
3.4. Validation in Unseen Slots
4. Discussion
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest

| Item | Symbol | Value |
| --- | --- | --- |
| Controller sample time | | 0.1 s |
| Longitudinal speed bound | | |
| Steering angle bound | | |
| Workspace limits | | defined by parking-lot map bounds |

| Category | Parameter | Value |
| --- | --- | --- |
| Tracking | | 40 |
| Tracking | | 8 |
| Tracking | | 25 |
| Effort | | 0.6 |
| Effort | | 2.0 |
| Smoothness | | 6.0 |
| Smoothness | | 10.0 |
| Terminal | | 120 |
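
The weight categories above suggest a quadratic horizon cost with tracking, effort, smoothness, and terminal terms. The following is a minimal sketch of such a cost, not the authors' formulation: the parameter symbols did not survive in the table, so the grouping of the weights into three tracking-error components and two control inputs, and the use of the terminal weight on the squared norm of the final error, are all assumptions. The function names are hypothetical.

```python
# Weights copied from the table; their grouping into vectors is an assumption.
Q_TRACK = (40.0, 8.0, 25.0)   # three tracking-error weights
R_EFFORT = (0.6, 2.0)         # two control-effort weights
S_SMOOTH = (6.0, 10.0)        # two control-increment weights
W_TERMINAL = 120.0            # terminal weight


def weighted_sq(weights, values):
    """Return sum_i w_i * v_i^2."""
    return sum(w * v * v for w, v in zip(weights, values))


def nmpc_cost(errors, controls, prev_control):
    """Quadratic horizon cost.

    errors:       N+1 tracking-error triples (stages 0..N, the last is terminal)
    controls:     N control pairs applied at stages 0..N-1
    prev_control: control pair applied just before the horizon starts
    """
    cost = 0.0
    last = prev_control
    for e, u in zip(errors[:-1], controls):
        du = tuple(a - b for a, b in zip(u, last))  # control increment
        cost += weighted_sq(Q_TRACK, e)
        cost += weighted_sq(R_EFFORT, u)
        cost += weighted_sq(S_SMOOTH, du)
        last = u
    # Assumption: the terminal weight scales the squared norm of the final error.
    cost += W_TERMINAL * sum(v * v for v in errors[-1])
    return cost
```

Under these assumptions, the comparatively large smoothness weights penalize control increments more heavily than control magnitude, which is consistent with the stated aim of regulating control smoothness during long-range cruising.
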

| Hyperparameter | Value | Explanation |
| --- | --- | --- |
| Observation dimension | 16 | Target-relative pose features and normalized LiDAR ranges |
| Action dimension | 1 | Continuous steering command |
| Actor hidden layers | | Two fully connected hidden layers |
| Critic hidden layers | | Two fully connected hidden layers for each critic |
| Actor learning rate | | Learning rate of the policy network |
| Critic learning rate | | Learning rate of each Q-network |
| Discount factor | 0.99 | Discount factor for future rewards |
| Experience buffer length | | Capacity of the replay buffer |
| Mini-batch size | 256 | Number of sampled transitions per update |
| Sample time | 0.1 s | Agent interaction and update interval |
| Delayed policy update frequency | 2 | Delayed policy update interval |
| Target smooth factor | | Soft target update factor |
| Target policy smoothing variance | 0.1 | Variance of smoothing noise for target actions |
| Target policy smoothing bounds | | Clipping range for target smoothing noise |
| Exploration noise standard deviation | 0.15 | Initial exploration noise level |
| Exploration noise minimum | 0.02 | Minimum exploration noise level |
| Exploration noise decay rate | | Decay rate of exploration noise |
| Target policy smoothing decay rate | | Decay rate of target smoothing variance |
| Gradient threshold | 1 | Gradient clipping threshold |
| Actor regularization | | Regularization for the actor optimizer |
| Maximum episodes | 15000 | Training budget |
| Maximum steps per episode | 500 | Episode horizon |
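
Several of the hyperparameters above enter the standard TD3 critic target, which combines clipped double-Q learning with target policy smoothing. The sketch below shows that computation in plain Python. The discount factor and smoothing variance are taken from the table; `NOISE_CLIP` and `ACTION_LIMIT` are hypothetical placeholders because the corresponding values are not recoverable here, and the callables `target_actor`, `target_q1`, and `target_q2` stand in for the target networks.

```python
import random

GAMMA = 0.99              # discount factor (from the table)
SMOOTH_STD = 0.1 ** 0.5   # the table lists a smoothing variance of 0.1
NOISE_CLIP = 0.5          # hypothetical: the tabulated clipping bounds are missing
ACTION_LIMIT = 1.0        # hypothetical normalized steering range


def clip(x, lo, hi):
    """Saturate x to the interval [lo, hi]."""
    return max(lo, min(hi, x))


def td3_target(reward, next_obs, done, target_actor, target_q1, target_q2, rng=random):
    """Clipped double-Q target with target policy smoothing."""
    # Smooth the target action with clipped Gaussian noise.
    noise = clip(rng.gauss(0.0, SMOOTH_STD), -NOISE_CLIP, NOISE_CLIP)
    next_action = clip(target_actor(next_obs) + noise, -ACTION_LIMIT, ACTION_LIMIT)
    # Take the smaller of the two target critics to limit overestimation.
    q_min = min(target_q1(next_obs, next_action), target_q2(next_obs, next_action))
    # No bootstrapping beyond a terminal transition.
    return reward + GAMMA * (0.0 if done else 1.0) * q_min
```

With a delayed policy update frequency of 2, the actor and the target networks would be updated once for every two critic updates that use this target.
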

| Component | Value | Role |
| --- | --- | --- |
| Distance reward scale | 2 | Scales the position-based reward |
| Longitudinal position weight | 0.05 | Penalizes target-relative longitudinal error in distance reward |
| Lateral position weight | 0.04 | Penalizes target-relative lateral error in distance reward |
| Progress reward scale | 3 | Scales the stepwise progress term |
| Progress saturation lower bound | 0 | Prevents negative progress reward |
| Progress saturation upper bound | 0.1 | Limits excessively large progress increments |
| Orientation reward scale | 0.1 | Scales the heading-alignment reward |
| Orientation error weight | 20 | Penalizes heading mismatch in orientation reward |
| Steering penalty weight | 0.05 | Penalizes large steering commands |
| Steering increment penalty weight | 0.1 | Penalizes rapid steering variation |
| Parking bonus | 100 | Reward for successful parking |
| Invalid-operation penalty | -50 | Penalty for collision or invalid termination |
| Time penalty | -0.02 | Per-step penalty during active episode |
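
One way the tabulated components could combine into a per-step reward is sketched below. The numerical values come from the table, but the functional form is an assumption: exponential shaping for the distance and orientation terms (in the style of the MathWorks automatic parking valet example), quadratic steering penalties, and progress measured as the per-step reduction in distance to the target. The function and argument names are hypothetical.

```python
import math


def parking_reward(x_err, y_err, yaw_err, prev_dist, dist,
                   steer, prev_steer, parked, invalid):
    """Per-step reward assembled from the tabulated components (assumed form)."""
    # Distance term: scale 2, weights 0.05 (longitudinal) and 0.04 (lateral).
    r_dist = 2.0 * math.exp(-(0.05 * x_err ** 2 + 0.04 * y_err ** 2))
    # Progress term: distance reduction saturated to [0, 0.1], scale 3.
    r_prog = 3.0 * min(max(prev_dist - dist, 0.0), 0.1)
    # Orientation term: scale 0.1, heading-error weight 20.
    r_yaw = 0.1 * math.exp(-20.0 * yaw_err ** 2)
    # Steering magnitude and steering-increment penalties.
    p_steer = 0.05 * steer ** 2 + 0.1 * (steer - prev_steer) ** 2
    # Terminal bonus and penalty (parked and invalid are booleans).
    r_term = 100.0 * parked - 50.0 * invalid
    # The constant -0.02 is the explicit per-step time penalty.
    return r_dist + r_prog + r_yaw - p_steer + r_term - 0.02
```

Because the time penalty accrues on every step regardless of pose, a policy that oscillates or spins near the goal accumulates cost that a direct docking maneuver avoids, which is the stated purpose of the time-penalized formulation.
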

| Slot | Duration (s) | Position Error (m) | Lateral Error (m) | Longitudinal Error (m) | Yaw Error (rad) |
| --- | --- | --- | --- | --- | --- |
| 14 | 6.4 | 0.7403 | -0.5601 | -0.4841 | 0.1164 |
| 15 | 6.8 | 0.6573 | -0.3459 | -0.5589 | 0.0362 |
| 23 | 6.5 | 0.6626 | -0.5036 | -0.4305 | 0.0803 |
| 39 | 6.4 | 0.6655 | -0.5504 | -0.3740 | 0.0880 |
| 47 | 7.4 | 1.3380 | 1.0361 | -0.8466 | 0.0077 |
| 64 | 6.7 | 1.1849 | -0.7824 | -0.8898 | 0.1623 |
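
The reported position errors appear to be the Euclidean norm of the lateral and longitudinal errors. The short check below recomputes that norm from the tabulated values and also averages the parking duration over the six slots. It is a consistency check on the table only; the variable names are illustrative.

```python
import math

# (slot, duration_s, position_err_m, lateral_err_m, longitudinal_err_m, yaw_err_rad)
ROWS = [
    (14, 6.4, 0.7403, -0.5601, -0.4841, 0.1164),
    (15, 6.8, 0.6573, -0.3459, -0.5589, 0.0362),
    (23, 6.5, 0.6626, -0.5036, -0.4305, 0.0803),
    (39, 6.4, 0.6655, -0.5504, -0.3740, 0.0880),
    (47, 7.4, 1.3380, 1.0361, -0.8466, 0.0077),
    (64, 6.7, 1.1849, -0.7824, -0.8898, 0.1623),
]

# Largest gap between the reported position error and hypot(lateral, longitudinal).
max_gap = max(abs(math.hypot(lat, lon) - pos) for _, _, pos, lat, lon, _ in ROWS)

# Mean parking duration over the six validation slots.
mean_duration = sum(row[1] for row in ROWS) / len(ROWS)

print(f"max gap: {max_gap:.4f} m, mean duration: {mean_duration:.2f} s")
```

The recomputed norms agree with the reported position errors to within rounding, so the three error columns are mutually consistent.
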

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).