Submitted: 19 May 2026
Posted: 21 May 2026
Abstract
Keywords:
1. Introduction
- We propose HC-MARL (Hierarchical Cascaded Multi-Agent Reinforcement Learning), a general-purpose hierarchical, collaborative multi-agent architecture for intelligent cyber wargaming that handles cross-level information transfer and alert sharing. Its layered structure mirrors the organizational hierarchy of cyber defense;
- We design a message transformation function that converts variable-length messages fed back from lower layers into fixed-length observation vectors, so that the architecture adapts to different network scenarios and personnel configurations. An attention mechanism further improves the efficiency of information transfer between agents at different levels;
- We combine global rewards with local rewards, and long-horizon rewards with instantaneous rewards, to stabilize multi-agent policy learning;
- We design a policy function that integrates neural networks with empirical knowledge to improve threat response efficiency while ensuring that commands from higher levels are transmitted smoothly.
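To make the second contribution concrete, the following is a minimal sketch of how a variable number of lower-layer messages can be pooled into one fixed-length vector with attention. It is not the paper's transformation function, which is defined in Section 3.4. The function name `attention_pool`, the fixed `query` vector, and the scaled dot-product scoring are all assumptions chosen for illustration.

```python
import math
from typing import List, Sequence


def attention_pool(messages: Sequence[Sequence[float]],
                   query: Sequence[float]) -> List[float]:
    """Collapse any number of equal-width messages into one fixed-length vector.

    Hypothetical helper: each message is scored against `query` with a scaled
    dot product, the scores are normalized with a softmax, and the messages
    are averaged using those weights.
    """
    dim = len(query)
    if not messages:
        # No feedback from the lower layer: return an all-zero observation.
        return [0.0] * dim
    scale = math.sqrt(dim)
    scores = [sum(q * m for q, m in zip(query, msg)) / scale for msg in messages]
    peak = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    return [sum(w * msg[i] for w, msg in zip(weights, messages))
            for i in range(dim)]


# Two messages and five messages both yield a vector of the same length.
query = [1.0, 0.0, 0.0]
two = attention_pool([[1.0, 0.0, 2.0], [0.0, 1.0, 4.0]], query)
five = attention_pool([[0.5, 0.5, 0.5]] * 5, query)
print(len(two), len(five))
```

The output length depends only on the message width, not on how many agents reported, which is the property an upper-level agent needs when the number of subordinates changes between scenarios. In a learned model the query would typically be a trainable parameter rather than a constant.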
2. Related Work
3. Methods
3.1. Mathematical Representation of the Defense in Intelligent Cyber Wargaming
3.2. HC-MARL Framework Design
3.3. Observation Space Design
3.4. Transformation Function Design
3.5. Policy Function Design
3.6. Reward Design
3.7. Learning and Optimization
4. Experiments
5. Discussion
Author Contributions
Data Availability Statement
Acknowledgments
Conflicts of Interest

| Hyperparameter | Value |
| --- | --- |
| d_model | 76 |
| nhead | 4 |
| num_encoder_layers | 1 |
| dim_feedforward | 256 |
| dropout | 0.1 |
| activation | ReLU |
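
As a quick sanity check on the scale of this configuration, the sketch below estimates the number of trainable parameters it implies. It assumes a standard Transformer encoder layer (multi-head self-attention with a fused input projection, a two-layer feed-forward block, two layer norms, biases everywhere, and no final encoder norm). The hyperparameter names match the arguments of PyTorch's `nn.TransformerEncoderLayer`, but the source does not state which implementation was used, so the layout and the helper name `encoder_layer_param_count` are assumptions.

```python
def encoder_layer_param_count(d_model: int, nhead: int, dim_feedforward: int) -> int:
    """Count trainable parameters in one standard Transformer encoder layer.

    Hypothetical helper; assumes biases on every projection and two layer norms.
    """
    if d_model % nhead != 0:
        raise ValueError("d_model must be divisible by nhead")
    attn_in = 3 * d_model * d_model + 3 * d_model        # fused Q, K, V projection
    attn_out = d_model * d_model + d_model               # attention output projection
    ff1 = d_model * dim_feedforward + dim_feedforward    # first feed-forward layer
    ff2 = dim_feedforward * d_model + d_model            # second feed-forward layer
    norms = 2 * (2 * d_model)                            # two layer norms (scale + shift)
    return attn_in + attn_out + ff1 + ff2 + norms


# Values taken from the hyperparameter table.
config = {"d_model": 76, "nhead": 4, "num_encoder_layers": 1, "dim_feedforward": 256}

head_dim = config["d_model"] // config["nhead"]
total = config["num_encoder_layers"] * encoder_layer_param_count(
    config["d_model"], config["nhead"], config["dim_feedforward"]
)
print(head_dim, total)  # 19 62956
```

Under these assumptions each of the 4 heads works on a 19-dimensional slice, and the single encoder layer holds about 63k parameters. Dropout and the ReLU activation add no parameters, and `nhead` affects only how `d_model` is split, not the total.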

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.