Preprint Article / Version 1 / Preserved in Portico / This version is not peer-reviewed

Mastering Air Combat Game With Deep Reinforcement Learning

Version 1: Received: 10 May 2023 / Approved: 11 May 2023 / Online: 11 May 2023 (13:42:45 CEST)

A peer-reviewed article of this Preprint also exists.

Zhu, J.; Kuang, M.; Zhou, W.; Shi, H.; Zhu, J.; Han, X. Mastering Air Combat Game with Deep Reinforcement Learning. Defence Technology 2023, doi:10.1016/j.dt.2023.08.019.

Abstract

Reinforcement learning has been applied to air combat problems in recent years, often in combination with curriculum learning; however, traditional curriculum learning suffers from plasticity loss in neural networks, that is, the difficulty of learning new knowledge after the network has converged. To this end, we propose a motivational curriculum learning distributed proximal policy optimization (MCLDPPO) algorithm, with which trained agents significantly outperform the predictive game tree and mainstream reinforcement learning methods. Motivational curriculum learning helps the agent gradually improve its combat ability by observing where the agent performs unsatisfactorily and providing corresponding rewards as guidance. Additionally, a complete set of tactical maneuvers is encapsulated from existing air combat knowledge, and through the flexible use of these maneuvers, tactics beyond human knowledge can be realized. We also design an interruption mechanism that increases the agent's decision-making frequency in emergencies: when the number of threats facing the agent changes, the current action is interrupted so that observations are reacquired and a new decision is made. This mechanism significantly improves the agent's performance. To better simulate actual air combat, we build an air combat environment based on digital twin technology, with a combat engine core including missiles and radars. On top of this environment, we develop a distributed intelligent training system that employs Redis as a cache and collects the data generated in the distributed environment over a multi-threaded TCP/IP protocol, which significantly increases training data throughput and accelerates the agent's convergence. Experimental results demonstrate that the agent can fully exploit situational information to make reasonable decisions and achieve tactical adaptation in air combat, verifying the effectiveness of the algorithmic framework proposed in this paper.
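
As a concrete illustration of the interruption mechanism described above, the following minimal Python sketch shows one way the rule could work: a maneuver in progress is aborted whenever the observed threat count changes, forcing a fresh decision. All names here (Maneuver, InterruptibleAgent, select_maneuver) are hypothetical illustrations, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Maneuver:
    """A multi-step tactical maneuver, e.g. a fixed sequence of low-level actions."""
    actions: List[str]
    cursor: int = 0

    def finished(self) -> bool:
        return self.cursor >= len(self.actions)

    def next_action(self) -> str:
        action = self.actions[self.cursor]
        self.cursor += 1
        return action


class InterruptibleAgent:
    """Executes maneuvers, but aborts one when the observed threat count changes."""

    def __init__(self, select_maneuver: Callable[[dict], Maneuver]):
        self.select_maneuver = select_maneuver  # policy: observation -> Maneuver
        self.current = None
        self.last_threat_count = None

    def step(self, observation: dict) -> str:
        threats = observation["num_threats"]
        # Interruption rule from the abstract: a change in the number of
        # threats aborts the current maneuver and forces a new decision.
        if self.current is not None and threats != self.last_threat_count:
            self.current = None
        if self.current is None or self.current.finished():
            self.current = self.select_maneuver(observation)
        self.last_threat_count = threats
        return self.current.next_action()


if __name__ == "__main__":
    agent = InterruptibleAgent(lambda obs: Maneuver(["turn", "climb", "fire"]))
    print(agent.step({"num_threats": 0}))  # "turn": begins a fresh maneuver
    print(agent.step({"num_threats": 0}))  # "climb": maneuver continues
    print(agent.step({"num_threats": 1}))  # threat count changed -> replans, "turn"
```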
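
Similarly, the Redis-backed experience cache could be sketched as below: rollout workers push trajectories into a shared list, and the learner pops batches for PPO updates. The key name, trajectory schema, and helper functions are assumptions for illustration; the paper's multi-threaded TCP/IP transport between workers and the learner is not reproduced here.

```python
import json
import redis  # assumes the redis-py client and a Redis server at localhost:6379

BUFFER_KEY = "air_combat:trajectories"  # hypothetical key name

def push_trajectory(r: redis.Redis, trajectory: list) -> None:
    """Worker side: append one rollout to the shared experience buffer."""
    r.rpush(BUFFER_KEY, json.dumps(trajectory))

def pop_batch(r: redis.Redis, batch_size: int) -> list:
    """Learner side: pop up to batch_size rollouts for one policy update."""
    batch = []
    for _ in range(batch_size):
        raw = r.lpop(BUFFER_KEY)
        if raw is None:  # buffer drained
            break
        batch.append(json.loads(raw))
    return batch

if __name__ == "__main__":
    r = redis.Redis(host="localhost", port=6379)
    push_trajectory(r, [{"obs": [0.0], "action": 1, "reward": 0.5}])
    print(pop_batch(r, 32))
```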

Keywords

Air combat; MCLDPPO; Interruption mechanism; Digital twin; Distributed system

Subject

Computer Science and Mathematics, Computer Science
