Preprint
Article

This version is not peer-reviewed.

Multi-Agent Reinforcement Learning for Persistent Satellite Custody of Moving Ground Targets

Submitted: 30 May 2026

Posted: 01 June 2026

Abstract
Maintaining persistent custody of dynamic ground targets using constellations of low Earth orbit (LEO) satellites is a critical capability for intelligence, surveillance, and reconnaissance (ISR) missions. Building upon our prior work using centralized PPO with stable-baselines3, this study presents an enhanced multi-agent formulation using PettingZoo ParallelEnv and Ray RLlib with a shared policy architecture. Key improvements include a larger effective field-of-view, slower target dynamics, richer per-agent observations incorporating tip density and velocity cues, and a refined reward structure that strongly incentivizes proactive tipping-and-cueing. The trained policy achieved 71.3% mean custody coverage over 500-step episodes, substantially outperforming random (28.6%) and greedy (38.1%) baselines. Analysis of handoff frequency and per-target performance demonstrates emergent cooperative behavior. These results highlight the value of modern multi-agent RL tooling for space domain awareness applications and provide a reproducible benchmark environment for future research.
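The abstract reports custody coverage as its headline metric but does not define it. The sketch below shows one plausible way to compute such a figure, assuming that coverage is the fraction of (time step, target) pairs in which at least one satellite has the target in its field of view. That definition, the function name `custody_coverage`, and the `in_view` layout are illustrative assumptions and are not taken from the paper.

```python
from typing import Sequence


def custody_coverage(in_view: Sequence[Sequence[Sequence[bool]]]) -> float:
    """Fraction of (step, target) pairs covered by at least one satellite.

    Assumed layout (hypothetical, not from the paper):
    in_view[t][s][k] is True when satellite s has target k in view at step t.
    """
    covered = 0
    total = 0
    for step in in_view:
        # Number of targets is read from the first satellite's row.
        n_targets = len(step[0]) if step else 0
        for k in range(n_targets):
            total += 1
            # A target counts as "in custody" if any satellite sees it.
            if any(sat_row[k] for sat_row in step):
                covered += 1
    # An empty episode has no pairs to cover; report 0.0 by convention.
    return covered / total if total else 0.0


# Toy episode: 2 steps, 2 satellites, 2 targets.
example = [
    [[True, False], [False, False]],  # step 0: only target 0 is seen
    [[False, False], [False, True]],  # step 1: only target 1 is seen
]
coverage = custody_coverage(example)
```

Under this assumed definition, a mean figure such as the reported 71.3% would be the average of the per-episode value over evaluation episodes. The paper may define the metric differently, for example per target or with a minimum dwell time.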
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
Preprints.org is a free preprint server supported by MDPI in Basel, Switzerland.

© 2026 MDPI (Basel, Switzerland) unless otherwise stated