Submitted: 19 August 2025
Posted: 20 August 2025
Abstract
Keywords:
1. Introduction
- To address the adversarial scenario involving an attacking missile, multiple interceptors, and a target, a Markov Decision Process (MDP) model is constructed. The model takes the observable states of both sides as input and outputs penetration acceleration commands for the attacking missile, enabling intelligent maneuvering penetration decisions in a continuous state space.
- To resolve the coupling between penetration maneuvers and the guidance task, a multi-objective reward function is designed. It maximizes the penetration success rate while an energy-consumption penalty term constrains the maneuvering range, preserving terminal strike accuracy.
- To overcome the training-efficiency bottleneck caused by sparse rewards, Generative Adversarial Imitation Learning (GAIL) is fused with Proximal Policy Optimization (PPO). Expert-trajectory priors guide exploration, significantly improving sample efficiency and asymptotic performance.
2. Optimal BANG-BANG Penetration Strategy
2.1. Mathematical Model of Engagement Scenario
2.2. Derivation of the Optimal BANG-BANG Penetration Strategy
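The derivation itself is not reproduced in this version of the text. As a hedged illustration of the structure such a strategy takes: when the attacker's lateral acceleration is bounded by a_max and the terminal miss depends linearly on the acceleration history, maximizing the miss saturates the control at ±a_max, with the sign set by a switching function. The Python sketch below is not the paper's derived law. It assumes planar kinematics and a non-maneuvering interceptor, and it uses the sign of the straight-line zero-effort miss (ZEM) as a stand-in switching function. The names `signed_zem` and `bang_bang_command` and the value of `a_max` are illustrative.

```python
import math
from typing import Tuple

Vec2 = Tuple[float, float]


def signed_zem(rel_pos: Vec2, rel_vel: Vec2) -> float:
    """Signed straight-line miss distance (r x v) / |v| in the plane.

    rel_pos, rel_vel: attacker minus interceptor position (m) and velocity (m/s).
    The sign tells on which side of the interceptor the attacker would pass
    if neither vehicle maneuvered from now on.
    """
    rx, ry = rel_pos
    vx, vy = rel_vel
    speed = math.hypot(vx, vy)
    if speed == 0.0:
        raise ValueError("relative velocity must be non-zero")
    return (rx * vy - ry * vx) / speed


def bang_bang_command(rel_pos: Vec2, rel_vel: Vec2, a_max: float) -> Tuple[float, Vec2]:
    """Saturated lateral command along the LOS normal n = (-ry, rx) / |r|.

    Accelerating along +n increases r x v, i.e. drives the signed miss more
    positive, so the command keeps the sign of the current miss (ties go to +).
    Returns the scalar command and its 2-D acceleration vector.
    """
    sign = 1.0 if signed_zem(rel_pos, rel_vel) >= 0.0 else -1.0
    rx, ry = rel_pos
    r = math.hypot(rx, ry)
    nx, ny = -ry / r, rx / r
    return sign * a_max, (sign * a_max * nx, sign * a_max * ny)
```

For an attacker 10 km ahead of the interceptor, offset 100 m laterally and closing at 2000 m/s, the signed ZEM is +100 m and the command saturates at +a_max, pushing the attacker further toward the side it already passes on.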
2.3. Performance Evaluation of the BANG-BANG Penetration Strategy

3. GAIL-PPO Penetration Strategy
3.1. Construction of the MDP Model for the Penetration Process
3.1.1. Definition of the State Space
3.1.2. Definition of the Action Space
3.1.3. Definition of the Reward Function
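Equation 22 is not recoverable here, so the sketch below only illustrates the structure described in the Introduction: sparse terminal terms for penetration and strike success, plus a dense energy penalty that limits maneuvering. Every weight and radius, and the quadratic form of the penalty, are placeholder assumptions; `RewardWeights`, `step_reward`, and `terminal_reward` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class RewardWeights:
    # Placeholder values for illustration only; not taken from Equation 22.
    penetration_bonus: float = 10.0   # every interceptor is evaded
    intercept_penalty: float = -10.0  # an interceptor gets within its kill radius
    hit_bonus: float = 10.0           # terminal miss on the target within tolerance
    energy_weight: float = 0.01       # weight on the normalized control effort
    kill_radius: float = 5.0          # m, assumed interceptor lethal radius
    hit_tolerance: float = 5.0        # m, assumed acceptable miss on the target


def step_reward(accel: float, a_max: float, dt: float, w: RewardWeights) -> float:
    """Dense energy penalty, -k * (a / a_max)^2 * dt, discouraging large maneuvers."""
    return -w.energy_weight * (accel / a_max) ** 2 * dt


def terminal_reward(interceptor_misses: Sequence[float], target_miss: float,
                    w: RewardWeights) -> float:
    """Sparse terminal reward combining the penetration and strike objectives."""
    if any(m <= w.kill_radius for m in interceptor_misses):
        return w.intercept_penalty  # intercepted: the strike never happens
    reward = w.penetration_bonus
    if target_miss <= w.hit_tolerance:
        reward += w.hit_bonus
    return reward
```

Keeping the penetration and strike terms sparse while making only the energy term dense mirrors the sparse-reward difficulty that motivates the GAIL guidance.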
3.2. GAIL-PPO Algorithm
3.2.1. GAIL Training Network Construction

3.2.2. PPO Training Network Construction
3.2.3. Training Procedure of the GAIL-PPO Algorithm

4. Training Results and Performance Validation
4.1. Training Results
| Parameters | Value |
|---|---|
| Discount Factor | 0.99 |
| Clip Factor | 0.1 |
| Entropy Loss Weight | 0.05 |
| GAE Factor | 0.95 |
| Mini-batch Size | 128 |
| Experience Horizon | 1024 |
| Sample Time (s) | 0.01 |
| Interceptor 1 Initial Position (m) | ([45000, 50000], -10000) |
| Interceptor 2 Initial Position (m) | ([45000, 50000], 10000) |
| Target Position (m) | ([50000, 55000], -10000) |
| LOS Angle (rad) | [,] |
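The bracketed entries above define a randomized set of initial engagements. A minimal sampler is sketched below, assuming each bracketed x-coordinate is drawn uniformly per episode, the two interceptors' x-coordinates are sampled independently, and the y-coordinates are fixed as listed. The LOS-angle bounds are missing from the table and are omitted. `sample_scenario` and the range dictionaries are hypothetical names; the non-training ranges are copied from the generalization-test parameter table.

```python
import random
from typing import Dict, Tuple

Range = Tuple[float, float]

# x-coordinate ranges (m) from the training-parameter table.
TRAINING_RANGES: Dict[str, Range] = {
    "interceptor_x": (45000.0, 50000.0),
    "target_x": (50000.0, 55000.0),
}

# x-coordinate ranges (m) from the non-training (generalization) table.
NON_TRAINING_RANGES: Dict[str, Range] = {
    "interceptor_x": (42000.0, 52000.0),
    "target_x": (50000.0, 60000.0),
}


def sample_scenario(ranges: Dict[str, Range],
                    rng: random.Random) -> Dict[str, Tuple[float, float]]:
    """Draw one initial engagement: x uniform in its range, y fixed as in the tables.

    Assumption: the two interceptors' x-coordinates are sampled independently.
    The LOS-angle bounds are not recoverable and are not sampled here.
    """
    x1 = rng.uniform(*ranges["interceptor_x"])
    x2 = rng.uniform(*ranges["interceptor_x"])
    xt = rng.uniform(*ranges["target_x"])
    return {
        "interceptor_1": (x1, -10000.0),
        "interceptor_2": (x2, 10000.0),
        "target": (xt, -10000.0),
    }
```

Seeding a dedicated `random.Random` per Monte Carlo run keeps each engagement reproducible.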


4.2. Performance Validation


4.3. Monte Carlo Simulation
4.3.1. Testing in Training Parameters

4.3.2. Testing Under Non-Training Parameters

5. Conclusions
- The BANG-BANG penetration strategy that maximizes the miss distance in a one-on-one engagement between the attacking missile and an interceptor is derived and used as expert demonstrations for GAIL training.
- An MDP model tailored to the combined penetration and guidance adversarial scenario is established. Its reward function accounts for both the penetration and guidance tasks, reducing energy consumption while ensuring mission success.
- A combined GAIL-PPO agent training method is proposed. Compared with pure PPO, it improves convergence speed by 50%.
- Monte Carlo simulation results validate the effectiveness of the proposed strategy. Under the training-parameter scenarios, the penetration success rate reaches 98.5%, significantly outperforming both the BANG-BANG and PPO strategies. Under untrained scenarios, the strategy still achieves a penetration success rate of 86.3% and a mission success rate of 77%, demonstrating its robustness and generalization ability.

| Parameters | Value |
|---|---|
| () | 80 |
| () | 80 |
| Interceptor 1 Initial Position (m) | (48000, -10000) |
| Interceptor 2 Initial Position (m) | (48000, 10000) |
| Target Position (m) | (55000, -10000) |

| Layer | Actor | Critic | Discriminator |
|---|---|---|---|
| Input Layer | 6 (states) | 6 (states) | 7 (states) |
| Hidden Layer 1 | 256 | 256 | 256 |
| BatchNorm 1 | ones | ones | ones |
| Activation Function 1 | ReLU | ReLU | ReLU |
| Hidden Layer 2 | 256 | 256 | 256 |
| BatchNorm 2 | ones | ones | ones |
| Activation Function 2 | ReLU | ReLU | ReLU |
| Output Layer 1 | 1 (mean output) | 1 (value function) | 1 (discriminator output) |
| Output Layer 2 | 1 (standard deviation output) | – | – |
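The actor's mean and standard-deviation outputs suggest a one-dimensional Gaussian policy over the acceleration command, and the discriminator's scalar output can be turned into a GAIL reward. The sketch below covers only these output heads, not the hidden layers. Assumed rather than stated in the table: softplus to keep the standard deviation positive, clipping the sampled command to a_max, and the reward convention r = -log(1 - D(s, a)) with D the probability that a state-action pair comes from the expert (BANG-BANG) data, which is one common GAIL choice among several. All function names are hypothetical.

```python
import math
import random
from typing import Tuple


def positive_std(raw: float, min_std: float = 1e-3) -> float:
    """Numerically stable softplus plus a floor, keeping the std output positive."""
    return max(raw, 0.0) + math.log1p(math.exp(-abs(raw))) + min_std


def gaussian_log_prob(action: float, mean: float, std: float) -> float:
    """Log-density of the 1-D Gaussian policy, needed for the PPO probability ratio."""
    z = (action - mean) / std
    return -0.5 * z * z - math.log(std) - 0.5 * math.log(2.0 * math.pi)


def sample_action(mean: float, std: float, a_max: float,
                  rng: random.Random) -> Tuple[float, float]:
    """Return (raw sample, command clipped to [-a_max, a_max]).

    The raw sample is the one whose log-probability is stored for PPO;
    only the clipped value is sent to the missile dynamics.
    """
    raw = rng.gauss(mean, std)
    return raw, max(-a_max, min(a_max, raw))


def gail_reward(d_expert: float, eps: float = 1e-6) -> float:
    """r = -log(1 - D(s, a)), with D the probability that (s, a) is expert data.

    The reward grows as the discriminator judges the policy's behaviour to
    resemble the BANG-BANG demonstrations.
    """
    d = min(max(d_expert, eps), 1.0 - eps)
    return -math.log(1.0 - d)
```

Storing the log-probability of the unclipped sample keeps the PPO ratio consistent with the Gaussian the actor actually defines.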

| Algorithm 2 Simplified GAIL-PPO |
|---|
| 1. Input: initial policy parameters , initial value function parameters , initial discriminator parameters . |
| 2. While do |
| 3. Compute (Equation 23). |
| 4. If |
| 5. Collect via . |
| 6. Update discriminator: (Adam). |
| 7. Compute GAIL rewards: . |
| 8. Else |
| 9. Collect via . |
| 10. Use environment rewards (Equation 22). |
| 11. End if |
| 12. Compute rewards-to-go (use or ). |
| 13. Compute GAE advantage (Equation 26). |
| 14. Update policy via PPO-CLIP: . |
| 15. Update value function via MSE: . |
| 16. If for N consecutive rounds, break. |
| 17. . |
| 18. End while |
| 19. Output: . |
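Because the symbols in Algorithm 2 were lost, the sketch below illustrates steps 13 to 15 with the standard formulations: backward Generalized Advantage Estimation, the PPO-CLIP surrogate with an entropy bonus, and a mean-squared value loss. The constants come from the training-parameter table; the function names are hypothetical, and the switch between GAIL and environment rewards in steps 3 to 11 (Equation 23) is not modeled.

```python
import math
from typing import List, Sequence, Tuple

GAMMA = 0.99      # discount factor (training-parameter table)
LAMBDA = 0.95     # GAE factor (training-parameter table)
CLIP_EPS = 0.1    # clip factor (training-parameter table)
ENTROPY_W = 0.05  # entropy loss weight (training-parameter table)


def gae(rewards: Sequence[float], values: Sequence[float], dones: Sequence[bool],
        last_value: float, gamma: float = GAMMA,
        lam: float = LAMBDA) -> Tuple[List[float], List[float]]:
    """Backward GAE over one rollout.

    delta_t = r_t + gamma * V(s_{t+1}) * (1 - done_t) - V(s_t)
    A_t     = delta_t + gamma * lam * (1 - done_t) * A_{t+1}
    Returns (advantages, value targets A_t + V(s_t)); with lam = 1 the targets
    reduce to the bootstrapped rewards-to-go of step 12.
    """
    adv = [0.0] * len(rewards)
    next_value, next_adv = last_value, 0.0
    for t in reversed(range(len(rewards))):
        mask = 0.0 if dones[t] else 1.0  # no bootstrapping across episode ends
        delta = rewards[t] + gamma * next_value * mask - values[t]
        next_adv = delta + gamma * lam * mask * next_adv
        adv[t] = next_adv
        next_value = values[t]
    return adv, [a + v for a, v in zip(adv, values)]


def ppo_clip_loss(new_logp: Sequence[float], old_logp: Sequence[float],
                  advantages: Sequence[float], entropy: float,
                  eps: float = CLIP_EPS, ent_w: float = ENTROPY_W) -> float:
    """Negative (clipped surrogate + entropy bonus), to be minimized."""
    total = 0.0
    for lp_new, lp_old, a in zip(new_logp, old_logp, advantages):
        ratio = math.exp(lp_new - lp_old)
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
        total += min(ratio * a, clipped * a)  # pessimistic bound
    return -(total / len(advantages) + ent_w * entropy)


def value_loss(values: Sequence[float], targets: Sequence[float]) -> float:
    """Mean-squared error between critic outputs and value targets."""
    return sum((v - g) ** 2 for v, g in zip(values, targets)) / len(values)
```

With a clip factor of 0.1, a probability ratio of 2 on a positive advantage contributes only 1.1 times that advantage, which is what limits each policy update.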

| Parameters | Value |
|---|---|
| Interceptor 1 Initial Position (m) | ([42000, 52000], -10000) |
| Interceptor 2 Initial Position (m) | ([42000, 52000], 10000) |
| Target Position (m) | ([50000, 60000], -10000) |
| LOS Angle (rad) | [,] |
| (rad) | [-π/20, π/20] |
| (m/s) | 20 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).