Submitted: 31 July 2023
Posted: 01 August 2023
Abstract
Keywords:
1. Introduction
- The motion model of the aircraft is established.
- The basic predictor-corrector algorithm is presented. A Q-learning algorithm is used to design the attack and sweep angle scheme, which enables the aircraft to cross the no-fly zones from above. A B-spline curve method is used to solve for the flight path points, so that the aircraft can pass the no-fly zones by flying through these points. The magnitude of the bank angle is solved from the state errors of the aircraft at the target and at the flight path points, and a bank angle sign reversal logic is designed to ensure that the aircraft flies safely to the target.
- The Monte Carlo Reinforcement Learning (MCRL) method is used to improve the predictor-corrector algorithm, and a Deep Neural Network (DNN) is used to fit the reward function.
- The effectiveness of the algorithm is verified through simulation.
2. Materials and Methods
2.1. Aircraft Motion Model
- The Earth is a homogeneous sphere;
- The aircraft is treated as a point mass that satisfies the instantaneous equilibrium assumption;
- The sideslip angle β and the lateral force Z are both zero during flight;
- The Earth's rotation is not taken into account.
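The model equations themselves were not recovered from the source. Under the four assumptions above, a common choice is the three-degree-of-freedom point-mass model over a spherical, non-rotating Earth, with radius r, longitude θ, latitude φ, velocity V, flight path angle γ, heading angle ψ (measured clockwise from north) and bank angle σ. The Python sketch below is a minimal illustration under these assumptions, not the authors' model: the exponential atmosphere, all constants, and the aerodynamic function `toy_aero` (a hypothetical stand-in for the morphing aircraft's dependence on attack angle and sweep angle) are assumptions.

```python
import math

# Hypothetical constants (not taken from the paper).
MU = 3.986004418e14   # Earth's gravitational parameter, m^3/s^2
R0 = 6_371_000.0      # Earth radius, m (homogeneous sphere)
RHO0 = 1.225          # sea-level density, kg/m^3
HS = 7200.0           # density scale height, m (exponential atmosphere)


def toy_aero(alpha_deg, sweep_deg, v):
    """Hypothetical lift/drag coefficients standing in for real aero tables."""
    cl = 0.02 * alpha_deg * (1.0 - 0.002 * sweep_deg)
    cd = 0.02 + 0.5 * cl * cl
    return cl, cd


def derivatives(state, alpha_deg, sweep_deg, sigma, mass, s_ref, aero):
    """3-DOF point mass, spherical non-rotating Earth, zero sideslip and side force.

    state = (r, lon, lat, V, gamma, psi); angles in rad, psi clockwise from north.
    """
    r, lon, lat, v, gamma, psi = state
    rho = RHO0 * math.exp(-(r - R0) / HS)
    cl, cd = aero(alpha_deg, sweep_deg, v)
    q_bar = 0.5 * rho * v * v
    lift, drag = q_bar * s_ref * cl, q_bar * s_ref * cd
    g = MU / r**2
    return (
        v * math.sin(gamma),                                          # dr/dt
        v * math.cos(gamma) * math.sin(psi) / (r * math.cos(lat)),    # dlon/dt
        v * math.cos(gamma) * math.cos(psi) / r,                      # dlat/dt
        -drag / mass - g * math.sin(gamma),                           # dV/dt
        (lift * math.cos(sigma) / mass
         + (v * v / r - g) * math.cos(gamma)) / v,                    # dgamma/dt
        lift * math.sin(sigma) / (mass * v * math.cos(gamma))
        + v / r * math.cos(gamma) * math.sin(psi) * math.tan(lat),    # dpsi/dt
    )


def rk4_step(state, dt, **controls):
    """One classical fourth-order Runge-Kutta step of the point-mass model."""
    def shifted(k, h):
        return tuple(s + h * d for s, d in zip(state, k))
    k1 = derivatives(state, **controls)
    k2 = derivatives(shifted(k1, dt / 2), **controls)
    k3 = derivatives(shifted(k2, dt / 2), **controls)
    k4 = derivatives(shifted(k3, dt), **controls)
    return tuple(s + dt / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))


# Usage: 10 s of gliding flight from 60 km altitude with zero bank angle.
h0 = 60_000.0
state = (R0 + h0, 0.0, 0.0, 5000.0, math.radians(-1.0), math.radians(45.0))
for _ in range(100):
    state = rk4_step(state, 0.1, alpha_deg=10.0, sweep_deg=30.0, sigma=0.0,
                     mass=1000.0, s_ref=1.0, aero=toy_aero)
```

Passing the aerodynamic model as a callable keeps the dynamics independent of how the attack and sweep angle scheme is produced.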
2.2. Constraint Model
1. Heating rate constraint:
2. Dynamic pressure q constraint:
3. Overload n constraint:
4. No-fly zone model
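The constraint equations were also lost in extraction. As a hedged sketch, the Python functions below use forms that are common in entry guidance: heating rate Q̇ = k_Q √ρ V^3.15, dynamic pressure q = ρV²/2 and overload n = √(L² + D²)/(m g0), each bounded by a maximum value. The no-fly zone is modelled as a circle on the Earth's surface, consistent with the circular zones described in Section 3.2.2. The coefficient k_Q, the exponent 3.15, the zone data and the limit values are assumptions, not values from the paper.

```python
import math

R_EARTH = 6_371_000.0   # m
G0 = 9.80665            # m/s^2


def heating_rate(rho, v, k_q=1.0e-4):
    """Stagnation-point heating rate, Q_dot = k_Q * sqrt(rho) * V^3.15 (k_Q hypothetical)."""
    return k_q * math.sqrt(rho) * v ** 3.15


def dynamic_pressure(rho, v):
    return 0.5 * rho * v * v


def overload(lift, drag, mass):
    """Aerodynamic load factor in units of g0."""
    return math.hypot(lift, drag) / (mass * G0)


def great_circle_distance(lat1, lon1, lat2, lon2):
    """Haversine distance in metres; angles in radians."""
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * R_EARTH * math.asin(math.sqrt(a))


def inside_no_fly_zone(lat, lon, zone):
    """zone = (centre latitude, centre longitude, radius in metres)."""
    lat_c, lon_c, radius = zone
    return great_circle_distance(lat, lon, lat_c, lon_c) < radius


def constraints_satisfied(rho, v, lift, drag, mass, lat, lon, zones, limits):
    return (heating_rate(rho, v) <= limits["q_dot_max"]
            and dynamic_pressure(rho, v) <= limits["q_max"]
            and overload(lift, drag, mass) <= limits["n_max"]
            and not any(inside_no_fly_zone(lat, lon, z) for z in zones))


# Usage with hypothetical limits and one circular zone centred at (0 deg, 2 deg).
limits = {"q_dot_max": 2.0e6, "q_max": 1.0e5, "n_max": 3.0}
zones = [(math.radians(0.0), math.radians(2.0), 200_000.0)]
ok = constraints_satisfied(3e-4, 5000.0, 700.0, 140.0, 1000.0, 0.0, 0.0, zones, limits)
```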
3. Basic Predictor-corrector Guidance Algorithm
3.1. Attack Angle and Sweep Angle Scheme
3.1.1. Q-learning Principles
1. Select the algorithm parameters: learning rate α∈(0,1), discount factor γ∈(0,1), and maximum number of iteration steps t_max.
2. Initialization: for all s∈S and a∈A(s), initialize Q(s, a) = 0 and set t = 0.
3. For each learning round:
4. Repeat until the termination state is reached or t > t_max.
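The contents of a learning round did not survive extraction. In standard Q-learning, each step selects an action ε-greedily, observes the reward r and next state s', and updates Q(s, a) ← Q(s, a) + α[r + γ max_a' Q(s', a') − Q(s, a)]. The Python sketch below applies this to a hypothetical six-state chain (action 0 moves left, action 1 moves right, the right end is terminal); the environment, ε, the reward values and the number of rounds are illustrative assumptions and are unrelated to the attack and sweep angle states of Section 3.1.2.

```python
import random


def q_learning(n_states=6, alpha=0.5, gamma=0.9, epsilon=0.1,
               episodes=200, t_max=100, seed=0):
    """Tabular Q-learning on a hypothetical 1-D chain environment."""
    rng = random.Random(seed)
    actions = (0, 1)                                   # 0: move left, 1: move right
    goal = n_states - 1                                # terminal state
    q = {(s, a): 0.0 for s in range(n_states) for a in actions}  # Q(s, a) = 0
    for _ in range(episodes):                          # each learning round
        s, t = 0, 0
        while s != goal and t <= t_max:                # stop at terminal state or t > t_max
            if rng.random() < epsilon:                 # explore
                a = rng.choice(actions)
            else:                                      # exploit
                a = max(actions, key=lambda b: q[(s, b)])
            s_next = max(0, s - 1) if a == 0 else min(goal, s + 1)
            r = 1.0 if s_next == goal else -0.01
            # Terminal states have no future value.
            future = 0.0 if s_next == goal else max(q[(s_next, b)] for b in actions)
            q[(s, a)] += alpha * (r + gamma * future - q[(s, a)])
            s, t = s_next, t + 1
    return q


q = q_learning()
greedy = [max((0, 1), key=lambda a: q[(s, a)]) for s in range(5)]
```

After training, the greedy policy in `greedy` should move right in every non-terminal state.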
3.1.2. Q-learning Algorithm Setting
1. State set
2. Action set
3. Reward function
3.2. Flight Path Point Plan
3.2.1. B-spline Curve Principle
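The body of this subsection was not recovered. A B-spline curve is C(u) = Σ N_{i,p}(u) P_i, where the P_i are control points and the N_{i,p} are basis functions of degree p defined by the Cox-de Boor recursion over a knot vector. With a clamped knot vector the curve starts at P0 and ends at Pn, which is the property used in Section 3.2.2. The Python sketch below assumes a clamped uniform knot vector and degree 2 (the degree used in the paper is not stated in the recovered text); with a quadratic B-spline, a curve segment whose three control points are collinear lies on the line through them and passes through the middle point.

```python
def clamped_uniform_knots(n_ctrl, degree):
    """Clamped (open) uniform knot vector on [0, 1]."""
    n_inner = n_ctrl - degree - 1
    inner = [(i + 1) / (n_inner + 1) for i in range(n_inner)]
    return [0.0] * (degree + 1) + inner + [1.0] * (degree + 1)


def basis(i, p, u, knots):
    """Cox-de Boor recursion for the basis function N_{i,p}(u)."""
    if p == 0:
        if knots[i] <= u < knots[i + 1]:
            return 1.0
        # Close the last non-empty span so that u = 1 maps to the end point.
        if u == knots[-1] and knots[i] < knots[i + 1] == knots[-1]:
            return 1.0
        return 0.0
    value = 0.0
    den = knots[i + p] - knots[i]
    if den > 0.0:
        value += (u - knots[i]) / den * basis(i, p - 1, u, knots)
    den = knots[i + p + 1] - knots[i + 1]
    if den > 0.0:
        value += (knots[i + p + 1] - u) / den * basis(i + 1, p - 1, u, knots)
    return value


def bspline_point(ctrl, degree, u):
    """Evaluate the planar B-spline C(u) for control points ctrl = [(x, y), ...]."""
    knots = clamped_uniform_knots(len(ctrl), degree)
    weights = [basis(i, degree, u, knots) for i in range(len(ctrl))]
    x = sum(w * px for w, (px, _) in zip(weights, ctrl))
    y = sum(w * py for w, (_, py) in zip(weights, ctrl))
    return x, y


# Usage: P1, P2, P3 are collinear on y = 1, so the middle segment lies on that line.
ctrl = [(0.0, 0.0), (1.0, 1.0), (2.0, 1.0), (4.0, 1.0), (5.0, 0.0)]
samples = [bspline_point(ctrl, 2, k / 50) for k in range(51)]
```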
3.2.2. No-fly Zone Avoidance Methods
1. Based on the locations of the no-fly zone circles, choose an appropriate direction to obtain the tangent points of the circles, and then select different combinations of tangent points to obtain the initial control points. If the line from the initial point to the target passes through a threat zone, at least one tangent point is selected as a control point, and at most one tangent point is selected for each zone.
2. Augment the initial control point set. The initial augmented control point is located on the initial heading so that the initial heading angle is satisfied, and the intermediate augmented control points are located on both sides of the tangent points; this gives the complete control point set. The start position P0 and end position Pn of the curve correspond to the initial position of the aircraft and to the target, respectively. To avoid a threat area, it is sufficient for the aircraft to stay on the far side of the tangent line of that area; therefore, the B-spline curve is designed to be tangent to the circle of the zone. By a property of the curve, the tangent point can be taken as the middle point of three collinear control points. The distances d1 and d2 between the tangent point and its two adjacent control points are then adjusted to control the curvature of the curve near the tangent point so that the curve does not intersect the circle, as shown in Figure 3. In the figure, P0 to P4 are control points, and the red spline curve is tangent to the no-fly zone without crossing it.
3. Take the distances between the tangent points and the augmented points as the optimization variables, and take the spline curve length and mean curvature as the performance indicators. The optimal curve, and hence its control points, is obtained with a genetic algorithm. The optimization model is as follows:
4. Simplify the control points to obtain the flight path points.
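The Python sketch below illustrates steps 1 and 3 in a local planar frame (a flat-Earth assumption). It checks whether the straight line from the initial point to the target passes through a circular zone, takes as candidate tangent points the two points where tangent lines parallel to that line touch the circle (one possible reading of "choose an appropriate direction"; the paper's exact construction is not recovered), and computes the two performance indicators, curve length and mean curvature, from sampled curve points. The function names and the discrete-curvature definition (turning angle divided by the mean adjacent segment length) are assumptions, and the genetic algorithm itself is not reproduced.

```python
import math


def segment_hits_circle(p0, p1, center, radius):
    """True if the straight segment p0-p1 passes through the circle."""
    (x0, y0), (x1, y1), (cx, cy) = p0, p1, center
    dx, dy = x1 - x0, y1 - y0
    seg_len2 = dx * dx + dy * dy
    # Parameter of the point on the segment closest to the circle centre.
    t = max(0.0, min(1.0, ((cx - x0) * dx + (cy - y0) * dy) / seg_len2))
    px, py = x0 + t * dx, y0 + t * dy
    return math.hypot(cx - px, cy - py) < radius


def side_tangent_points(p0, p1, center, radius):
    """Tangent points of the two tangent lines parallel to p0->p1: (left, right)."""
    dx, dy = p1[0] - p0[0], p1[1] - p0[1]
    norm = math.hypot(dx, dy)
    nx, ny = -dy / norm, dx / norm          # unit normal to the left of p0->p1
    cx, cy = center
    return (cx + radius * nx, cy + radius * ny), (cx - radius * nx, cy - radius * ny)


def length_and_mean_curvature(points):
    """Polyline length and mean discrete curvature of sampled curve points."""
    seg = [math.dist(a, b) for a, b in zip(points, points[1:])]
    curvatures = []
    for i in range(1, len(points) - 1):
        ax, ay = points[i][0] - points[i - 1][0], points[i][1] - points[i - 1][1]
        bx, by = points[i + 1][0] - points[i][0], points[i + 1][1] - points[i][1]
        turn = abs(math.atan2(ax * by - ay * bx, ax * bx + ay * by))  # turning angle
        curvatures.append(turn / (0.5 * (seg[i - 1] + seg[i])))
    mean_k = sum(curvatures) / len(curvatures) if curvatures else 0.0
    return sum(seg), mean_k


# Usage: a half circle of radius 10 has length about 10*pi and curvature about 0.1.
arc = [(10 * math.cos(math.pi * k / 99), 10 * math.sin(math.pi * k / 99)) for k in range(100)]
arc_length, arc_mean_k = length_and_mean_curvature(arc)
```

In a genetic algorithm, these two indicators would be evaluated on curves sampled from each candidate control point set and combined into a fitness value; the weighting is not recovered here.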
3.3. Bank Angle Scheme
3.3.1. Bank Angle Size Scheme
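The equations of this scheme were not recovered. The introduction states that the bank angle magnitude is solved from the state errors of the aircraft at the target and at the flight path points; in predictor-corrector guidance this is commonly done by integrating the motion model forward (predictor) and adjusting |σ| until the predicted terminal error vanishes (corrector). The Python sketch below assumes a secant iteration with the magnitude clamped to a feasible range; `predict_error` is a hypothetical callback, and the toy predictor in the usage line simply has its root at 60°.

```python
import math


def secant_bank_magnitude(predict_error, sigma0, sigma1, tol=1e-6, max_iter=50,
                          sigma_min=0.0, sigma_max=math.radians(85.0)):
    """Find |sigma| with predict_error(|sigma|) close to 0 by secant iteration.

    predict_error is a hypothetical predictor: it should integrate the motion
    model with the candidate bank magnitude and return the signed terminal
    error (e.g. range-to-go) at the next flight path point or the target.
    """
    e0, e1 = predict_error(sigma0), predict_error(sigma1)
    for _ in range(max_iter):
        if abs(e1) < tol or e1 == e0:   # converged, or flat secant (no progress)
            break
        sigma2 = sigma1 - e1 * (sigma1 - sigma0) / (e1 - e0)
        sigma2 = min(max(sigma2, sigma_min), sigma_max)   # keep magnitude feasible
        sigma0, e0 = sigma1, e1
        sigma1, e1 = sigma2, predict_error(sigma2)
    return sigma1


# Usage with a toy predictor whose error vanishes at 60 degrees.
sigma = secant_bank_magnitude(lambda s: 1000.0 * math.cos(s) - 500.0, 0.2, 0.5)
```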
3.3.2. Bank Angle Sign Scheme
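The sign-reversal logic was not recovered either. A common approach, assumed here, is a heading-error corridor: the heading error Δψ between the current heading and the line-of-sight azimuth to the next flight path point (or the target) may drift inside a dead band, and the bank sign is reversed only when Δψ leaves the band. With the convention that a positive bank angle increases ψ, a heading error above the upper bound calls for a negative bank angle. The corridor shape, its limits and the function names below are hypothetical.

```python
import math


def wrap_angle(a):
    """Wrap an angle to (-pi, pi]."""
    return math.atan2(math.sin(a), math.cos(a))


def los_azimuth(lat, lon, lat_t, lon_t):
    """Initial great-circle bearing to (lat_t, lon_t), clockwise from north (rad)."""
    dlon = lon_t - lon
    y = math.sin(dlon) * math.cos(lat_t)
    x = (math.cos(lat) * math.sin(lat_t)
         - math.sin(lat) * math.cos(lat_t) * math.cos(dlon))
    return math.atan2(y, x)


def corridor_half_width(v, v_final=2000.0, v_initial=6000.0,
                        width_final=math.radians(5.0), width_initial=math.radians(15.0)):
    """Hypothetical corridor that narrows linearly as the aircraft slows down."""
    frac = min(max((v - v_final) / (v_initial - v_final), 0.0), 1.0)
    return width_final + frac * (width_initial - width_final)


def update_bank_sign(current_sign, psi, lat, lon, lat_t, lon_t, v):
    """Keep the sign inside the corridor; reverse it when the heading error leaves it."""
    error = wrap_angle(psi - los_azimuth(lat, lon, lat_t, lon_t))
    width = corridor_half_width(v)
    if error > width:       # heading too far clockwise: negative bank reduces psi
        return -1
    if error < -width:      # heading too far anticlockwise: positive bank increases psi
        return 1
    return current_sign     # inside the dead band: no reversal
```

The dead band avoids chattering of the bank sign; narrowing it at low speed keeps the final heading error small near the target.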
4. Improving Predictor-corrector Methods
4.1. Monte Carlo Reinforcement Learning Method
4.1.1. MCRL Principle
Algorithm: MCRL algorithm
Input: environment E, state space S, action space A, initial action-value function Q.
Output: optimal policy π*.
Initialize Q(s, a) = 0 and total reward G = 0
For k = 0, 1, ..., n
  Execute the ε-greedy policy π' in E to generate a trajectory
  For t = 0, 1, 2, ..., n
  End for
End for
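The update steps inside the inner loop were lost. A common Monte Carlo control scheme, assumed here, generates a full trajectory with the ε-greedy policy π', then walks it backwards accumulating the discounted return G = r_t + γG and moves Q(s_t, a_t) toward G by incremental averaging over visit counts (every-visit Monte Carlo). The Python sketch below runs this on a hypothetical six-state chain (action 0 moves left, action 1 moves right, the right end is terminal); the environment and all parameter values are assumptions.

```python
import random
from collections import defaultdict


def mc_control(n_states=6, gamma=0.9, epsilon=0.1, episodes=2000, t_max=50, seed=0):
    """Every-visit Monte Carlo control with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    actions = (0, 1)                       # 0: move left, 1: move right
    goal = n_states - 1                    # terminal state
    q = defaultdict(float)                 # Q(s, a) = 0 for all pairs
    visits = defaultdict(int)

    def epsilon_greedy(s):
        if rng.random() < epsilon:
            return rng.choice(actions)
        return max(actions, key=lambda a: q[(s, a)])

    for _ in range(episodes):              # k = 0, 1, ..., n
        # Generate one trajectory with pi' (Q is frozen while it is generated).
        s, trajectory = 0, []
        for _ in range(t_max):
            a = epsilon_greedy(s)
            s_next = max(0, s - 1) if a == 0 else min(goal, s + 1)
            r = 1.0 if s_next == goal else -0.01
            trajectory.append((s, a, r))
            s = s_next
            if s == goal:
                break
        # Walk the trajectory backwards, accumulating the discounted return G.
        g = 0.0
        for s_t, a_t, r_t in reversed(trajectory):
            g = r_t + gamma * g
            visits[(s_t, a_t)] += 1
            q[(s_t, a_t)] += (g - q[(s_t, a_t)]) / visits[(s_t, a_t)]
    return q


q = mc_control()
greedy = [max((0, 1), key=lambda a: q[(s, a)]) for s in range(5)]
```

Unlike the Q-learning update, Q is changed only after a whole trajectory is available, because the target is the complete return G rather than a bootstrapped estimate.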
4.1.2. MCRL Method Settings
4.2. Deep Neural Network Fitting the Reward Function
1. DNN principle
2. DNN settings
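The DNN principle and settings were not recovered. As a minimal sketch of fitting a reward function with a neural network, the NumPy code below trains a one-hidden-layer tanh network by mean-squared-error regression, with the Adam optimizer written out by hand. The input dimension, layer width, learning rate, number of steps and the target function sin(πx), which merely stands in for sampled reward values, are all assumptions; the paper's network architecture is not reproduced.

```python
import numpy as np


def train_reward_mlp(x, y, hidden=32, lr=0.01, steps=3000, seed=0):
    """Fit y ~ f(x) with a tanh MLP using full-batch Adam; returns (params, losses)."""
    rng = np.random.default_rng(seed)
    params = {
        "W1": rng.normal(0.0, 1.0, (x.shape[1], hidden)),
        "b1": rng.normal(0.0, 1.0, hidden),
        "W2": rng.normal(0.0, 1.0 / np.sqrt(hidden), (hidden, 1)),
        "b2": np.zeros(1),
    }
    m = {k: np.zeros_like(p) for k, p in params.items()}   # first-moment estimates
    s = {k: np.zeros_like(p) for k, p in params.items()}   # second-moment estimates
    beta1, beta2, eps = 0.9, 0.999, 1e-8
    losses = []
    for t in range(1, steps + 1):
        h = np.tanh(x @ params["W1"] + params["b1"])        # hidden layer
        err = h @ params["W2"] + params["b2"] - y
        losses.append(float(np.mean(err ** 2)))
        # Backpropagation of the mean squared error.
        d_pred = 2.0 * err / len(x)
        d_z = (d_pred @ params["W2"].T) * (1.0 - h ** 2)
        grads = {"W2": h.T @ d_pred, "b2": d_pred.sum(axis=0),
                 "W1": x.T @ d_z, "b1": d_z.sum(axis=0)}
        for k in params:                                     # Adam update
            m[k] = beta1 * m[k] + (1 - beta1) * grads[k]
            s[k] = beta2 * s[k] + (1 - beta2) * grads[k] ** 2
            m_hat = m[k] / (1 - beta1 ** t)
            s_hat = s[k] / (1 - beta2 ** t)
            params[k] -= lr * m_hat / (np.sqrt(s_hat) + eps)
    return params, losses


def predict(params, x):
    return np.tanh(x @ params["W1"] + params["b1"]) @ params["W2"] + params["b2"]


# Usage: hypothetical samples standing in for (state, reward) pairs.
x = np.linspace(-1.0, 1.0, 200).reshape(-1, 1)
y = np.sin(np.pi * x)
params, losses = train_reward_mlp(x, y)
```

Once trained, `predict` can replace repeated evaluations of the reward, which is the motivation the paper gives for fitting it with a network.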
5. Simulation
5.1. Simulation of Attack and Sweep Angle Scheme





5.2. Flight Path Point Planning Results
5.3. Simulation of Network Training
- 1.
- DNN training



5.4. Simulation of Trajectory Planning Algorithm








6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest






| Trajectory | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| J1 | 55.51 | 54.25 | 54.06 | 56.83 | 54.24 | 57.23 | 55.62 | 60.9 |
| J2 | 275.21 | 174.19 | 181.97 | 230.29 | 175.02 | 251.12 | 329.97 | 350.7 |
| Parameter | b1 | b2 | b3 | μ | σ1 | σ2 | σ3 |
|---|---|---|---|---|---|---|---|
| Value | 0.8 | 0.1 | 0.1 | 1000 | 0.0001 | 100 | 1000000 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
