In this section, the escape guidance strategy based on Meta SAC is analyzed and verified. SAC is first used to solve a specific escape guidance mission, and comprehensive experiments are conducted to verify whether the UAV can complete the escape guidance mission while satisfying the terminal constraints and process constraints. Once the escape guidance mission changes, the original SAC-based strategy can hardly adapt to the new mission and must be relearned and retrained. The manuscript therefore proposes an optimization method based on Meta learning, which improves the learning ability of the UAV during training. This section verifies the validity of Meta SAC and demonstrates its performance on various new missions. In addition, the maneuvering overload commands under different pursuit evading distances are analyzed to explore the influence of maneuvering timing and distance on the escape results. CAV-H is taken as the test vehicle to verify the escape guidance performance. The initial conditions, terminal altitude and Meta SAC training parameters are given in Table 1.
6.1. Validity Verification on SAC
To verify the effectiveness of SAC, three different pursuit evading scenarios are constructed, and the terminal reward value, miss distance and terminal position deviation are analyzed respectively. As shown in Figure 6(a), the terminal reward value is poor in the initial phase of training, which indicates that the optimal strategy has not yet been found. After 500 episodes, the terminal reward value increases gradually, indicating that a better strategy has been explored and the training is converging. In the last 100 episodes, the optimal strategy has been learned and the network parameters have been adjusted to their optimum. As can be seen from Figure 6(b), the miss distance is relatively divergent in the first 150 training episodes, indicating that the actor network in SAC is constantly exploring new strategies while the critic network is learning a reliable evaluation criterion. After 500 training episodes, the network gradually converges toward the optimal solution, and the miss distance at the encounter moment converges to about 20 m. As shown in Figure 6(c), the terminal position of the UAV has a large deviation in the early training phase, which is attributed to the network's exploration of the escape strategy. In the later training phase, the position deviation is less affected by exploration. All pursuit evading scenarios tested in the manuscript achieve convergence, and the final terminal position deviations are all within 1 m.
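For reference, the actor and critic mentioned here are trained, in the standard SAC formulation, toward an entropy-regularized (soft) Bellman target. The block below assumes the common variant with twin critics, target critic parameters $\bar{\phi}_j$, replay buffer $\mathcal{D}$, done flag $d$ and entropy temperature $\alpha$; this section does not confirm which variant the manuscript uses, and $\vartheta$ is used for the actor parameters only to avoid a clash with the velocity slope angle.

```latex
\begin{aligned}
y &= r + \gamma (1-d)\Big[\min_{j=1,2} Q_{\bar{\phi}_j}(s',\tilde{a}') - \alpha \log \pi_{\vartheta}(\tilde{a}' \mid s')\Big], \quad \tilde{a}' \sim \pi_{\vartheta}(\cdot \mid s'),\\
J_Q(\phi_j) &= \mathbb{E}_{(s,a,r,s',d)\sim\mathcal{D}}\big[(Q_{\phi_j}(s,a) - y)^2\big],\\
J_\pi(\vartheta) &= \mathbb{E}_{s\sim\mathcal{D},\, a\sim\pi_{\vartheta}}\Big[\alpha \log \pi_{\vartheta}(a \mid s) - \min_{j=1,2} Q_{\phi_j}(s,a)\Big].
\end{aligned}
```

Under this formulation, the entropy term rewards stochastic actions, which is consistent with the scattered miss distances early in training; as the critic estimates improve, minimizing $J_\pi$ concentrates the policy on high-value actions.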
Figure 6.
Training results of SAC. (a) reward value, (b) miss distance, (c) target deviation.
To verify whether the SAC algorithm can find an escape guidance strategy that meets the mission requirements in different pursuit evading scenarios, the pursuit evading distance is changed, and the training results are shown in Figure 7. In the medium range scenario, the miss distance converges to about 2 m, and the terminal deviation converges to about 1 m.
Figure 7.
Training results of SAC. (a) reward value, (b) miss distance, (c) target deviation.
As shown in Figure 8, in the long range attack and defense scenario, the miss distance converges to about 5 m, and the terminal deviation converges to about 1 m.
Figure 8.
Training results of SAC. (a) reward value, (b) miss distance, (c) target deviation.
Based on the above simulation analysis, SAC is a feasible method for solving the UAV escape guidance strategy. After a limited number of training episodes, the network parameters converge and can then be used in flight mission tests.
6.2. Validity Verification on Meta SAC
When the UAV mission changes, the original SAC parameters cannot meet the requirements of the new mission and need to be retrained. The SAC proposed in the manuscript is therefore improved via Meta learning. Highly adaptive network parameters are obtained through training, so that when the pursuit evading environment changes, the network parameters can be fine-tuned to adapt to the new environment rapidly.
Meta SAC is divided into a meta training phase and a meta testing phase. The initialization parameters of the SAC network are trained in the meta training phase and then fine-tuned by interacting with the new environment in the meta testing phase. By changing the initial interceptor position, three different pursuit evading scenarios are constructed, representing short, medium and long distances respectively.
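The two-phase procedure can be illustrated with a deliberately small, hypothetical sketch. It replaces SAC with plain gradient descent on toy quadratic tasks and uses a first-order Reptile-style meta update, which is an assumption, since this section does not specify the meta-update rule. Each toy task stands in for one pursuit evading scenario parameterized by an illustrative interceptor offset `d`; `task_optimum`, the distance values and all step sizes are made up for illustration.

```python
import numpy as np

# Hypothetical toy stand-in: each "scenario" is a quadratic task
# L_T(w) = 0.5 * ||w - c_T||^2 whose optimum c_T depends on an
# illustrative interceptor offset d. Plain gradient descent replaces SAC.

def task_optimum(d):
    """Optimum of the toy task for interceptor offset d (illustrative)."""
    return np.array([d, 0.5 * d], dtype=float)

def task_loss(w, c):
    return 0.5 * float(np.sum((w - c) ** 2))

def inner_adapt(w, c, lr=0.1, steps=5):
    """A few gradient steps on one task (stand-in for SAC updates)."""
    w = w.copy()
    for _ in range(steps):
        w -= lr * (w - c)           # gradient of 0.5 * ||w - c||^2 is (w - c)
    return w

def reptile_meta_train(distances, meta_iters=200, meta_lr=0.1, seed=0):
    """Meta training phase: learn a shared initialisation (Reptile-style)."""
    rng = np.random.default_rng(seed)
    phi = np.zeros(2)
    for _ in range(meta_iters):
        c = task_optimum(rng.choice(distances))
        w = inner_adapt(phi, c)
        phi += meta_lr * (w - phi)  # move the initialisation toward adapted weights
    return phi

def steps_to_solve(w0, c, lr=0.1, tol=1e-3, max_steps=1000):
    """Meta testing phase: count fine-tuning steps until the loss is below tol."""
    w = np.asarray(w0, dtype=float).copy()
    for k in range(max_steps):
        if task_loss(w, c) < tol:
            return k
        w -= lr * (w - c)
    return max_steps
```

Running `phi = reptile_meta_train([5.0, 10.0, 15.0])` and comparing `steps_to_solve(phi, task_optimum(12.0))` with `steps_to_solve(np.zeros(2), task_optimum(12.0))` should show fewer fine-tuning steps from the meta-learned initialization, a toy analogue of the qualitative comparison reported in Table 2.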
The training results of Meta SAC and SAC are compared, and the terminal reward values are shown in Figure 9(a). Meta SAC effectively speeds up the training process: after 100 episodes, a better strategy is learned by the network and gradually converges, whereas the SAC network needs 500 episodes to find the optimal solution. The miss distance is shown in Figure 9(b). Meta SAC learns a better strategy more quickly, which makes it more effective than the SAC method.
Figure 9(c) shows the terminal deviation between the UAV and the target. To explore the optimal solution as thoroughly as possible, some strategies with large terminal position deviations appear during training. As shown in Figure 10(b-c), in the medium range attack and defense scenario, the miss distance based on Meta SAC converges to about 8 m, and the terminal deviation converges to about 1 m.
As shown in Figure 11(b-c), in the long range attack and defense scenario, the miss distance based on Meta SAC converges to about 10 m, and the terminal deviation converges to about 1 m.
According to the theoretical analysis, during training Meta SAC executes micro-tests on new missions drawn from the same distribution, so that the network learns more descent directions toward the optimal solution. Combining the theoretical analysis with the training results, the manuscript shows that Meta learning is a feasible method to accelerate convergence and improve training efficiency.
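One common way to formalize this argument is the MAML-style bi-level update, given here as an assumed reference form (this section does not state which meta-learning variant Meta SAC uses). Writing $w$ for the SAC network parameters, $\mathcal{T}_i$ for pursuit evading missions sampled from the task distribution $p(\mathcal{T})$, $\eta$ for the inner (adaptation) step size and $\beta$ for the meta step size:

```latex
\begin{aligned}
w_i' &= w - \eta\, \nabla_{w} \mathcal{L}_{\mathcal{T}_i}(w), \qquad \mathcal{T}_i \sim p(\mathcal{T}),\\
w &\leftarrow w - \beta\, \nabla_{w} \sum_{i} \mathcal{L}_{\mathcal{T}_i}\big(w_i'\big).
\end{aligned}
```

Because the outer gradient is taken through the adapted parameters $w_i'$ of several sampled missions, each meta update aggregates descent information from multiple scenarios, which is one reading of the "more descent directions" argument.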
As analyzed above, when the pursuit evading scenario changes, the network parameters obtained in the meta training phase are fine-tuned through a few interactions. The manuscript verifies the meta testing performance by changing the initial interceptor position, and the results compared with the SAC method are shown in Table 2. Starting from the network parameters of the meta training phase, strategies meeting the escape guidance mission are found within 10 training episodes. In contrast, network parameters based on SAC need more interaction to find solutions, generally more than 50 episodes. According to the above simulations, Meta SAC adapts much better than SAC: once the escape mission changes, the UAV can complete the new mission after very few learning episodes, without relearning or redesigning the strategy. This method makes it possible for the UAV to learn while flying.
6.3. Strategy Analysis Based on Meta SAC
This section tests the network parameters based on Meta SAC and analyzes the escape strategy and flight states under different pursuit evading distances. As shown in Figure 12(a), for the short distance pursuit evading scene, the longitudinal maneuvering overload is larger in the first half of the escape, so the velocity slope angle decreases gradually. In the second half of the escape, if the strategy were executed with the original maneuvering overload, the terminal altitude constraint could not be satisfied; therefore, the overload gradually decreases and the velocity slope angle is reduced slowly. As shown in Figure 12(b), at the beginning of the escape, the lateral maneuvering overload is positive and the velocity azimuth angle increases continuously. As the distance between the UAV and the interceptor decreases, the overload gradually increases in the opposite direction, and the velocity azimuth angle decreases. On the one hand, this can confuse the interceptor's strategy; on the other hand, it corrects the guidance course.
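The link between the overload commands and the angle histories can be seen from the point-mass kinematics. As a simplified reference only, assuming a flat, non-rotating Earth, thrust neglected, and longitudinal and lateral overloads $n_y$ and $n_z$ expressed in the velocity frame, the rates of the velocity slope angle $\theta$ and the velocity azimuth angle $\psi_v$ are given below. The manuscript's own model may include Earth-curvature and rotation terms, and the sign convention for $n_z$ differs between references; here it is chosen so that a positive lateral overload increases the azimuth.

```latex
\dot{\theta} = \frac{g}{V}\left(n_y - \cos\theta\right), \qquad
\dot{\psi}_v = \frac{g\, n_z}{V \cos\theta}
```

Here $V$ is the flight speed and $g$ the gravitational acceleration. In this form, the azimuth rate follows the sign of $n_z$, so reversing the lateral overload reverses the azimuth trend, as in the second half of the short distance escape.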
As shown in Figure 13(a), compared with the short distance scene, the medium distance escape process takes longer, the interceptor has a longer pursuit time, and the UAV flies in the direction of increasing velocity slope angle. The timing of the maximum escape overload at medium distance is also different. As shown in Figure 13(b), in the first half of the escape, the lateral maneuvering overload at medium distance is larger than that at short distance, while in the second half the corresponding reverse maneuvering overload is smaller, so the UAV can use the longer escape time to correct its course slowly.
As shown in Figure 14, under the long pursuit distance, the variation of the UAV maneuvering overload is similar to that in the medium range case, and the escape timing is basically the same as in the medium range strategy.
According to the above analysis, the escape guidance strategy via Meta SAC can be used as a tactical escape strategy, in which the escape timing and maneuvering overload are adjusted in time for different pursuit evading distances. On the one hand, the overload of this strategy can confuse and interfere with the interceptor; on the other hand, it takes the guidance mission into account by correcting the course deviation caused by the escape.
Figure 15(a) shows the flight trajectories of the interceptor and the UAV when the interceptor starts at the North East Down (NED) coordinate (10 km, 30 km, 30 km). The trajectory points at the encounter moment are shown in Figure 15(b), and the miss distance in this pursuit evading scene is 19 m. To verify the validity and applicability of Meta SAC, the initial position of the interceptor is changed. The resulting flight trajectories are shown in Figure 15(c,e), and the trajectory points at the encounter moment are shown in Figure 15(d,f). The miss distances in these two pursuit evading scenarios are 3 m and 6 m respectively. Considering the CAV-H structure, a miss distance greater than 2 m at the encounter moment means that the escape mission is achieved.
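The miss distances quoted here are closest-approach distances at the encounter moment. A minimal sketch of how such a value could be extracted from sampled trajectories follows, assuming positions in metres (e.g. NED) and linear relative motion within each integration step; the manuscript's own procedure is not described in this section, so `miss_distance` and `escape_achieved` are hypothetical helpers, and only the 2 m threshold comes from the text.

```python
import numpy as np

def miss_distance(p_uav, p_int, t):
    """Closest approach between two sampled trajectories.

    p_uav, p_int: (N, 3) positions in metres; t: (N,) sample times.
    Relative motion is assumed linear between consecutive samples.
    Returns (distance, time of closest approach).
    """
    r = np.asarray(p_int, dtype=float) - np.asarray(p_uav, dtype=float)
    t = np.asarray(t, dtype=float)
    best_d, best_t = float(np.linalg.norm(r[0])), float(t[0])
    for k in range(len(t) - 1):
        r0, dr = r[k], r[k + 1] - r[k]
        denom = float(dr @ dr)
        # Fraction s in [0, 1] of this step that minimises |r0 + s * dr|.
        s = 0.0 if denom == 0.0 else float(np.clip(-(r0 @ dr) / denom, 0.0, 1.0))
        d = float(np.linalg.norm(r0 + s * dr))
        if d < best_d:
            best_d, best_t = d, float(t[k] + s * (t[k + 1] - t[k]))
    return best_d, best_t

def escape_achieved(miss, threshold=2.0):
    """Escape criterion: miss distance above the CAV-H-based 2 m threshold."""
    return miss > threshold
```

Interpolating within each step matters because, at closing speeds of several kilometres per second, the true closest approach usually falls between two samples, and the raw sample-wise minimum can overestimate the miss distance considerably.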
Based on the principle of Meta SAC and optimal guidance, the flight states are shown in Figure 16. The longitude, latitude and altitude of the UAV during flight are shown in Figure 16(a)-(b); under the different pursuit evading scenarios, the terminal position and altitude constraints are met. The velocity slope angle and azimuth angle show larger-amplitude changes, which are attributed to the lateral and longitudinal maneuvering of the escape strategy, as shown in Figure 16(c)-(d). The total changes of the velocity slope angle and azimuth angle are within two degrees, which meets the flight process constraints. The analysis of the flight states shows that this escape strategy is an effective, high-accuracy measure for escape guidance.
Flight process deviations mainly include aerodynamic calculation deviation and output overload deviation. For the aerodynamic deviation, the manuscript calculates the aerodynamic coefficients by interpolation based on the flight Mach number and angle of attack, which may introduce some deviation. Therefore, random noise with an amplitude of 0.1 is added when calculating the aerodynamic coefficients to verify whether the UAV can still complete the guidance mission. As shown in Figure 17(a), the aerodynamic deviation noise causes certain disturbances in the angle of attack during flight. At the 10th second and at the end of flight, the maximum deviation of the angle of attack is 2°. Overall, however, the impact of aerodynamic deviation on the entire flight is relatively small, and the angle of attack remains within the safe range of the UAV. As shown in Figure 17(b), due to the constraints of the UAV game confrontation and guidance missions, the bank angle changes significantly during the entire flight, and the aerodynamic deviation noise has little impact on it. With the aerodynamic deviation noise added, the miss distance between the UAV and the interceptor at the encounter moment is 8.908 m, and the terminal position deviation is 0.52 m. Therefore, the UAV can still complete the escape guidance mission under aerodynamic deviation.
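The coefficient lookup and perturbation can be sketched as follows. This assumes a rectangular table indexed by Mach number and angle of attack, bilinear interpolation, and a multiplicative uniform perturbation of ±10 % as the reading of "amplitude of 0.1"; the text does not say whether the noise is relative or absolute, or how it is distributed. `bilinear`, `noisy_coefficient` and any example table are hypothetical.

```python
import numpy as np

def bilinear(mach_grid, alpha_grid, table, mach, alpha):
    """Bilinear interpolation of table[i, j] = C(mach_grid[i], alpha_grid[j]).

    Grids must be strictly increasing. Outside the grid the nearest cell is
    used, which amounts to linear extrapolation.
    """
    i = int(np.clip(np.searchsorted(mach_grid, mach) - 1, 0, len(mach_grid) - 2))
    j = int(np.clip(np.searchsorted(alpha_grid, alpha) - 1, 0, len(alpha_grid) - 2))
    tx = (mach - mach_grid[i]) / (mach_grid[i + 1] - mach_grid[i])
    ty = (alpha - alpha_grid[j]) / (alpha_grid[j + 1] - alpha_grid[j])
    c00, c10 = table[i, j], table[i + 1, j]
    c01, c11 = table[i, j + 1], table[i + 1, j + 1]
    return (c00 * (1 - tx) * (1 - ty) + c10 * tx * (1 - ty)
            + c01 * (1 - tx) * ty + c11 * tx * ty)

def noisy_coefficient(mach_grid, alpha_grid, table, mach, alpha, rng, amplitude=0.1):
    """Interpolated coefficient with an assumed relative uniform perturbation."""
    nominal = bilinear(mach_grid, alpha_grid, table, mach, alpha)
    return nominal * (1.0 + rng.uniform(-amplitude, amplitude))
```

Drawing a fresh perturbation at every evaluation, as here, models noise along the trajectory; a constant bias per Monte Carlo run would be the other common choice and stresses the controller differently.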
For the output overload deviation, the total overload is composed of the guidance overload derived from the optimal guidance law and the maneuvering overload output by the neural network. A random maneuvering overload with an amplitude of 0.1 is added to verify whether the UAV can still complete the maneuvering guidance mission. As shown in Figure 18, random overloads are added in the longitudinal and lateral directions respectively. In the simulation test, the miss distance between the UAV and the interceptor at the encounter point is 10.51 m, and the terminal deviation of the UAV is 0.6 m. Under this deviation, the UAV can still achieve high-precision guidance and efficient penetration.
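The overload composition used in this robustness test can be written as a short sketch. It assumes the guidance and maneuvering overloads are available as (longitudinal, lateral) components per time step and that "amplitude of 0.1" means an independent zero-mean uniform perturbation in [-0.1, 0.1] added to each channel. The optimal guidance law itself is not reproduced, and `total_overload` is a hypothetical helper.

```python
import numpy as np

def total_overload(n_guide, n_maneuver, rng=None, amplitude=0.1):
    """Total overload per step = guidance overload + network maneuvering overload.

    n_guide, n_maneuver: arrays of shape (N, 2) holding (longitudinal, lateral)
    components. If rng is given, an independent uniform perturbation in
    [-amplitude, amplitude] is added to every component (robustness test).
    """
    n = np.asarray(n_guide, dtype=float) + np.asarray(n_maneuver, dtype=float)
    if rng is not None:
        n = n + rng.uniform(-amplitude, amplitude, size=n.shape)
    return n
```

Passing `rng=None` reproduces the nominal command, so the same trajectory simulation can be rerun with and without the perturbation to compare miss distance and terminal deviation.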