Multi-Agent Reinforcement Learning for Persistent Satellite Custody of Moving Ground Targets

David Schuster

doi:10.20944/preprints202606.0109.v1

Submitted: 30 May 2026

Posted: 01 June 2026

Abstract

Maintaining persistent custody of dynamic ground targets using constellations of low Earth orbit (LEO) satellites is a critical capability for intelligence, surveillance, and reconnaissance (ISR) missions. Building upon our prior work using centralized PPO with stable-baselines3, this study presents an enhanced multi-agent formulation using PettingZoo ParallelEnv and Ray RLlib with a shared policy architecture. Key improvements include a larger effective field-of-view, slower target dynamics, richer per-agent observations incorporating tip density and velocity cues, and a refined reward structure that strongly incentivizes proactive tipping-and-cueing. The trained policy achieved 71.3% mean custody coverage over 500-step episodes, substantially outperforming random (28.6%) and greedy (38.1%) baselines. Analysis of handoff frequency and per-target performance demonstrates emergent cooperative behavior. These results highlight the value of modern multi-agent RL tooling for space domain awareness applications and provide a reproducible benchmark environment for future research.
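The abstract reports mean custody coverage without stating how it is computed. The following is a minimal sketch under one plausible assumption: a target counts as "in custody" at a step if at least one satellite has it in view, and coverage is the fraction of (step, target) pairs in custody. The function name `custody_coverage` and the nested-list layout `in_view[t][s][k]` are hypothetical and are not taken from the paper's environment.

```python
from typing import Sequence


def custody_coverage(in_view: Sequence[Sequence[Sequence[bool]]]) -> float:
    """Fraction of (step, target) pairs where any satellite sees the target.

    in_view[t][s][k] is True if satellite s has target k in view at step t.
    This definition is an assumption; the paper may define coverage differently.
    """
    total = 0
    held = 0
    for step in in_view:
        num_targets = len(step[0]) if step else 0
        for k in range(num_targets):
            total += 1
            # The target is in custody if at least one satellite observes it.
            if any(sat[k] for sat in step):
                held += 1
    return held / total if total else 0.0


# Toy episode: 3 steps, 2 satellites, 2 targets (illustrative data only).
example = [
    [[True, False], [False, False]],  # step 0: only target 0 is held
    [[False, False], [False, True]],  # step 1: only target 1 is held
    [[True, False], [True, True]],    # step 2: both targets are held
]
coverage = custody_coverage(example)
```

In this toy episode, four of the six (step, target) pairs are in custody. Under the same assumed definition, the reported figures would be this fraction averaged over evaluation episodes of 500 steps.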

Keywords: reinforcement learning; multi-agent RL; satellite custody; PPO; RLlib; PettingZoo; space domain awareness; ISR

Subject: Computer Science and Mathematics - Artificial Intelligence and Machine Learning

Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
