Maintaining persistent custody of dynamic ground targets with constellations of low Earth orbit (LEO) satellites is a critical capability for intelligence, surveillance, and reconnaissance (ISR) missions. Building on our prior work, which used centralized PPO with stable-baselines3, this study presents an enhanced multi-agent formulation that uses PettingZoo's ParallelEnv and Ray RLlib with a shared policy architecture. Key changes include a larger effective field of view, slower target dynamics, richer per-agent observations that incorporate tip density and velocity cues, and a refined reward structure that strongly incentivizes proactive tipping and cueing. The trained policy achieved 71.3% mean custody coverage over 500-step episodes, substantially outperforming random (28.6%) and greedy (38.1%) baselines. Analysis of handoff frequency and per-target performance shows evidence of emergent cooperative behavior. These results highlight the value of modern multi-agent RL tooling for space domain awareness applications and provide a reproducible benchmark environment for future research.