Preprint
Article

This version is not peer-reviewed.

M-VP2: Microservice-Oriented Vulnerability Patch Planning - A Cost-Aware Approach Using Multi-Agent Reinforcement Learning

Submitted:

22 January 2026

Posted:

23 January 2026


Abstract
Microservice architectures amplify the volume and complexity of security vulnerabilities, making it increasingly difficult for security and SRE teams to decide which services to patch, when to patch them, and how to coordinate patches under strict cost and availability constraints. Traditional prioritization schemes based on CVSS scores or static business-criticality heuristics ignore inter-service dependencies, deployment topologies, and operational costs such as downtime, rollback risk, and engineering effort. In this paper, we propose M-VP2, a microservice-oriented vulnerability patch planning framework that formulates patch scheduling as a cost-aware multi-agent reinforcement learning (MARL) problem. Each microservice is modeled as an autonomous agent that selects patching actions over time (e.g., patch now, defer, or batch with other changes), while a joint reward function balances security risk reduction, patching and downtime cost, and compliance with service-level objectives. The environment captures call-graph dependencies, cascading failure modes, and temporal exploit likelihood, enabling agents to learn coordination strategies that avoid risky simultaneous updates on tightly coupled services. We design a hierarchical actor–critic architecture with centralized training and decentralized execution, augmented with a risk-aware reward shaping mechanism that penalizes unsafe patch combinations and SLA violations. Extensive simulation experiments on synthetic and real-world-inspired microservice topologies show that M-VP2 reduces expected breach risk and aggregate patching cost by up to double-digit percentages compared with CVSS-based heuristics, greedy risk–cost ranking, and single-agent RL baselines, while producing patch plans that are more stable, interpretable, and aligned with operational constraints.
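To make the joint reward described in the abstract more concrete, the following is a minimal sketch, not the authors' implementation. It assumes a linear combination of residual exploit risk, patching cost, and a shaping penalty for patching both ends of a call-graph edge in the same step. The names `Service`, `joint_reward`, the weights `w_risk`, `w_cost`, `coupling_penalty`, and all numeric values are hypothetical; the batching action and SLA terms mentioned in the abstract are omitted for brevity.

```python
from dataclasses import dataclass

# Hypothetical action labels; the abstract also mentions batching, omitted here.
PATCH_NOW, DEFER = "patch_now", "defer"


@dataclass
class Service:
    name: str
    exploit_risk: float  # assumed expected loss per step while unpatched
    patch_cost: float    # assumed combined effort and downtime cost of patching
    patched: bool = False


def joint_reward(services, actions, call_graph,
                 w_risk=1.0, w_cost=0.5, coupling_penalty=4.0):
    """Shared reward for one time step (illustrative assumption only).

    services:   dict mapping service name -> Service
    actions:    dict mapping service name -> PATCH_NOW or DEFER
    call_graph: iterable of (caller, callee) name pairs
    """
    # Services that are actually being patched during this step.
    patching = {name for name, action in actions.items()
                if action == PATCH_NOW and not services[name].patched}

    # Risk that remains from services left unpatched after this step.
    residual_risk = sum(s.exploit_risk for name, s in services.items()
                        if not s.patched and name not in patching)

    # Cost paid for every patch applied in this step.
    cost = sum(services[name].patch_cost for name in patching)

    # Shaping term: count call-graph edges whose two endpoints are patched together.
    unsafe_pairs = sum(1 for caller, callee in call_graph
                       if caller in patching and callee in patching)

    return -(w_risk * residual_risk + w_cost * cost
             + coupling_penalty * unsafe_pairs)


if __name__ == "__main__":
    services = {
        "api": Service("api", exploit_risk=3.0, patch_cost=1.0),
        "db": Service("db", exploit_risk=5.0, patch_cost=2.0),
    }
    graph = [("api", "db")]
    for plan in ({"api": PATCH_NOW, "db": PATCH_NOW},
                 {"api": DEFER, "db": PATCH_NOW},
                 {"api": DEFER, "db": DEFER}):
        print(plan, joint_reward(services, plan, graph))
```

With these made-up numbers, patching only `db` scores higher than patching both coupled services at once, which is the kind of staggered coordination the shaping term is meant to encourage. In a centralized-training setup, a reward of this shape would be shared by all agents, while each agent would still choose its own action at execution time.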
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.

Preprints.org is a free preprint server supported by MDPI in Basel, Switzerland.


© 2026 MDPI (Basel, Switzerland) unless otherwise stated