FedMARL-LTI: Federated Multi-Agent Reinforcement Learning with LLM-Driven Threat Intelligence for Cooperative Cyber Defense

Fatih Şahin

doi:10.20944/preprints202607.0295.v1

Submitted: 02 July 2026

Posted: 03 July 2026

Abstract

Cross-organization cyber defense must reconcile collaborative learning with privacy and adversarial robustness, yet standard federated learning ships full gradient tensors, leaking sensitive security posture and inviting Byzantine manipulation. We present FedMARL-LTI, a federated multi-agent reinforcement learning framework whose architecture answers both pressures with a single decision: organizations share neither raw data nor model weights, only differentially private 768-dimensional semantic threat embeddings. The contribution is fourfold. (1) Semantic Abstraction (SA) channel: in each round, every organization's local gradient is summarized by an LLM, projected to a 768-dimensional embedding, L2-clipped, and perturbed with Gaussian noise before any numeric quantity leaves the host. This bottleneck reduces the per-element noise scale from O(√d_model) to O(√m), with m = 768 ≪ d_model ≈ 3×10⁵. (2) Formal privacy analysis: the SA+DP cascade satisfies (ε,δ)-DP and bounds per-round mutual-information leakage by min{T_tok log₂ V, (m/2) log₂(1 + C²/(mσ²))} (Theorem 1), with Rényi composition over T federation rounds (Theorem 2). (3) A Byzantine-resilient ClippedClustering aggregator that combines L2 clipping with cosine-similarity clustering. (4) A hierarchical MARL policy with threat-profile-aware LLM-IRR reward shaping, wired end-to-end; the LLM call is currently stubbed with a deterministic projection for reproducibility, which we disclose explicitly. We evaluate on CybORG CAGE-4 with n=5 organizations and 30 federation rounds × 5 episodes × 100 steps per round. The SA channel adds no statistically significant utility cost relative to the no-privacy baseline: SA-only Δreward = -0.66 (t=+0.31, NS) and dual SA + Weight-DP Δreward = +1.90 (t=-0.71, NS), each over N=5 seeds, with all |t| < 1.3. A controlled signal/noise probe confirms a 19.58× improvement of SA over Weight-DP at a fixed DP budget, matching the predicted √(d/m) ≈ 19.8.
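The clip-and-noise step of the SA channel and the predicted √(d/m) gain can be illustrated with a minimal sketch. It is not the released implementation: the random vector stands in for an LLM-derived embedding, the clip norm `C` and noise scale `SIGMA` are hypothetical values, and the signal-to-noise ratio is taken here as the clipped signal norm over the root-mean-square noise norm σ√dim, which is an assumption about how the probe is defined.

```python
import math
import random

def clip_and_noise(vec, clip_norm, sigma, rng):
    """L2-clip `vec` to `clip_norm`, then add i.i.d. N(0, sigma^2) noise per coordinate."""
    norm = math.sqrt(sum(x * x for x in vec))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [x * scale + rng.gauss(0.0, sigma) for x in vec]

def expected_snr(dim, clip_norm, sigma):
    """Clipped signal norm divided by the RMS noise norm, sigma * sqrt(dim)."""
    return clip_norm / (sigma * math.sqrt(dim))

def gaussian_leakage_bits(m, clip_norm, sigma):
    """Second term of the Theorem 1 bound: (m/2) * log2(1 + C^2 / (m * sigma^2))."""
    return 0.5 * m * math.log2(1.0 + clip_norm ** 2 / (m * sigma ** 2))

M, D_MODEL = 768, 300_000   # dimensions quoted in the abstract
C, SIGMA = 1.0, 0.1         # hypothetical clip norm and noise scale

rng = random.Random(0)
embedding = [rng.gauss(0.0, 1.0) for _ in range(M)]  # stand-in for an LLM-derived embedding
released = clip_and_noise(embedding, C, SIGMA, rng)  # the only vector that leaves the host

# Same C and sigma in both spaces, so the ratio reduces to sqrt(d_model / m).
snr_gain = expected_snr(M, C, SIGMA) / expected_snr(D_MODEL, C, SIGMA)
print(f"predicted SNR gain sqrt(d/m) = {snr_gain:.2f}")
print(f"Gaussian-channel leakage bound = {gaussian_leakage_bits(M, C, SIGMA):.1f} bits per round")
```

Because `C` and `SIGMA` cancel in the ratio, the predicted gain depends only on the two dimensions; taking d_model as exactly 3×10⁵ gives a value consistent with the ≈19.8 quoted above.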
Under a Byzantine sign_flip attack at 30% (N=15), ClippedClustering is directionally strongest (F₁=0.025 vs. FedAvg 0.020 and Krum 0.016), but the edge is not statistically significant (CC vs. Krum t=+1.59, p=0.15, d=+0.58; the earlier N=5 “3.4×” gap was small-sample optimism, §5.2). Its decisive Byzantine win comes under the harsher random_noise attack, where FedAvg diverges to NaN and Krum collapses to 0.002 while ClippedClustering survives at 0.020 (§5.7, Cohen’s d=+3.77). The cooperative-PPO family (MAPPO, IPPO) outperforms the value-based and actor-critic methods (QMIX, MADDPG) by ≈20 reward units, p < 0.001. All host-level F₁ values stay below 0.05 at the 15K-step training horizon used here; the paper's relative claims (zero-cost privacy, ClippedClustering’s decisive Byzantine win on the harshest attacks per §5.7, and cooperative-PPO dominance) are unaffected by this scope. A 200K-step long-horizon replication (§6.3 L1) lifts F₁ above the 15K plateau (to ≈0.044, N=5), confirming that the horizon, not the privacy/Byzantine machinery, gates absolute accuracy. However, a finer 60-checkpoint run shows that the climb is volatile and non-monotonic and does not reach deployment-grade accuracy, a limitation of training stability rather than compute. We release all 141 raw run JSON outputs (Phases 1–3, the L4 backend comparison, and the algorithm/aggregator baselines), the figures, and analysis scripts for replication.
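The abstract describes ClippedClustering only as L2 clipping combined with cosine-similarity clustering, so the following is a rough sketch of that idea under several assumptions rather than the paper's aggregator. The clip threshold `tau`, the average-linkage split into exactly two groups, the rule of averaging the larger group, and the toy two-dimensional updates are all hypothetical choices made for illustration.

```python
import math

def l2_norm(v):
    return math.sqrt(sum(x * x for x in v))

def clip(v, tau):
    """Scale v down so its L2 norm is at most tau."""
    n = l2_norm(v)
    s = min(1.0, tau / n) if n > 0 else 1.0
    return [x * s for x in v]

def cosine(u, v):
    nu, nv = l2_norm(u), l2_norm(v)
    if nu == 0 or nv == 0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)

def two_clusters(sim):
    """Average-linkage agglomerative clustering down to two groups of indices."""
    clusters = [[i] for i in range(len(sim))]
    while len(clusters) > 2:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                link = sum(sim[i][j] for i in clusters[a] for j in clusters[b])
                link /= len(clusters[a]) * len(clusters[b])
                if best is None or link > best[0]:
                    best = (link, a, b)
        _, a, b = best
        clusters[a] += clusters[b]   # merge the most similar pair
        del clusters[b]
    return clusters

def clipped_clustering(updates, tau):
    """Clip every update, cluster by cosine similarity, average the larger cluster."""
    clipped = [clip(u, tau) for u in updates]
    sim = [[cosine(u, v) for v in clipped] for u in clipped]
    keep = max(two_clusters(sim), key=len)
    dim = len(clipped[0])
    return [sum(clipped[i][k] for i in keep) / len(keep) for k in range(dim)]

# Toy round: four roughly aligned updates and one large sign-flipped update.
honest = [[1.0, 0.1], [0.9, -0.1], [1.1, 0.0], [1.0, 0.05]]
flipped = [[-10.0, 0.0]]
aggregate = clipped_clustering(honest + flipped, tau=1.0)
print(aggregate)
```

In this toy round, clipping caps the magnitude of the flipped update and the clustering step isolates it, so only the four aligned updates contribute to the average. A plain mean of the unclipped inputs would instead be dominated by the single large outlier.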

Keywords: federated learning; multi-agent reinforcement learning; differential privacy; Byzantine-resilient aggregation; semantic abstraction; cyber defense; LLM-driven reward refinement; threat intelligence

Subject: Computer Science and Mathematics - Security Systems

Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
