M-VP2: Microservice-Oriented Vulnerability Patch Planning - A Cost-Aware Approach Using Multi-Agent Reinforcement Learning

Daoquan Zhou

doi:10.20944/preprints202601.1784.v1

Submitted: 22 January 2026

Posted: 23 January 2026


Abstract

Microservice architectures amplify the volume and complexity of security vulnerabilities. This makes it increasingly difficult for security and SRE teams to decide which services to patch, when to patch them, and how to coordinate patches under strict cost and availability constraints. Traditional prioritization schemes based on CVSS scores or static business-criticality heuristics ignore inter-service dependencies, deployment topologies, and operational costs such as downtime, rollback risk, and engineering effort. In this paper, we propose M-VP2, a microservice-oriented vulnerability patch planning framework that formulates patch scheduling as a cost-aware multi-agent reinforcement learning (MARL) problem. Each microservice is modeled as an autonomous agent that selects patching actions over time (e.g., patch now, defer, or batch with other changes). A joint reward function balances security risk reduction, patching and downtime cost, and compliance with service-level objectives. The environment captures call-graph dependencies, cascading failure modes, and temporal exploit likelihood, enabling agents to learn coordination strategies that avoid risky simultaneous updates on tightly coupled services. We design a hierarchical actor–critic architecture with centralized training and decentralized execution, augmented with a risk-aware reward shaping mechanism that penalizes unsafe patch combinations and SLA violations. Extensive simulation experiments on synthetic and real-world-inspired microservice topologies show that M-VP2 reduces expected breach risk and aggregate patching cost by margins reaching double-digit percentages compared with CVSS-based heuristics, greedy risk–cost ranking, and single-agent RL baselines. The resulting patch plans are also more stable, more interpretable, and better aligned with operational constraints.
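The abstract describes a joint reward that trades off risk reduction against patching and downtime cost and SLO compliance, plus a shaping penalty for unsafe patch combinations on coupled services. The following is a minimal Python sketch of such a one-step team reward, written under loudly stated assumptions. The `Service` fields, the weights `w_risk`, `w_cost`, `w_slo`, the per-edge `unsafe_penalty`, the `batch_discount` applied to batched patches, and the three example services are all hypothetical illustrations. They are not the paper's actual formulation, and the sketch covers only the reward, not the MARL training loop.

```python
from dataclasses import dataclass

# Hypothetical action labels mirroring the abstract's examples.
PATCH_NOW, DEFER, BATCH = "patch_now", "defer", "batch"


@dataclass
class Service:
    risk: float        # assumed expected breach loss removed by patching
    patch_cost: float  # assumed engineering + rollback-risk cost of one patch
    downtime: float    # assumed downtime (minutes) a patch causes
    slo_budget: float  # assumed downtime (minutes) the SLO tolerates this step


def joint_reward(services, actions, edges, w_risk=1.0, w_cost=1.0,
                 w_slo=1.0, unsafe_penalty=5.0, batch_discount=0.5):
    """Illustrative team reward shared by all agents for one planning step."""
    # Assumption: BATCH patches in a shared deployment window this step,
    # so it removes risk like PATCH_NOW but at a discounted cost.
    patched = {name for name, a in actions.items() if a in (PATCH_NOW, BATCH)}
    risk_reduced = sum(services[n].risk for n in patched)
    cost = sum(
        services[n].patch_cost * (batch_discount if actions[n] == BATCH else 1.0)
        for n in patched
    )
    # SLA term: only downtime beyond each service's budget is penalized.
    slo_violation = sum(
        max(0.0, services[n].downtime - services[n].slo_budget) for n in patched
    )
    # Risk-aware shaping: penalize patching both ends of a call-graph edge at once.
    unsafe_pairs = sum(1 for u, v in edges if u in patched and v in patched)
    return (w_risk * risk_reduced - w_cost * cost
            - w_slo * slo_violation - unsafe_penalty * unsafe_pairs)


# Hypothetical three-service call chain: gateway -> orders -> payments.
services = {
    "gateway": Service(risk=10.0, patch_cost=3.0, downtime=2.0, slo_budget=5.0),
    "orders": Service(risk=8.0, patch_cost=2.0, downtime=6.0, slo_budget=5.0),
    "payments": Service(risk=12.0, patch_cost=4.0, downtime=1.0, slo_budget=5.0),
}
edges = [("gateway", "orders"), ("orders", "payments")]

plan_a = {"gateway": PATCH_NOW, "orders": DEFER, "payments": PATCH_NOW}
plan_b = {"gateway": PATCH_NOW, "orders": PATCH_NOW, "payments": PATCH_NOW}
plan_c = {"gateway": PATCH_NOW, "orders": DEFER, "payments": BATCH}

for label, plan in (("A", plan_a), ("B", plan_b), ("C", plan_c)):
    print(label, joint_reward(services, plan, edges))
# A 15.0
# B 10.0
# C 17.0
```

In this toy setup, patching everything at once scores lowest. Both dependency edges are updated simultaneously, and `orders` exceeds its downtime budget. Batching `payments` scores highest only because of the assumed batch discount. A fuller environment would also carry deferred risk forward in time and model exploit likelihood, which this one-step sketch omits.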

Keywords: microservice vulnerability patch planning; multi-agent reinforcement learning (MARL); cost-aware; vulnerability management; patch scheduling; dependency-aware coordination

Subject: Computer Science and Mathematics - Computer Science

Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
