This research introduces an edge-optimized reinforcement learning (RL) ecosystem engineered for sustainable logistics in the blue economy, spanning maritime shipping, automated port operations, and offshore resource transportation. At its core, the system processes vast streams of real-time data from IoT sensors embedded in vessels, buoys, and drones directly at edge nodes, bypassing cloud latency to enable instantaneous decision-making in unpredictable marine conditions like storms or currents. Carbon capture analytics, derived from spectroscopic sensors quantifying direct air capture (DAC) efficiency and CO2 sequestration rates on ships, dynamically adjusts RL reward functions to favour fuel-efficient paths that maximize emissions offsets, aligning with International Maritime Organization (IMO) mandates for net-zero operations by 2050. The framework exploits 6G networks' terabit speeds, sub-millisecond latency, and non-terrestrial network integration via low-earth-orbit satellites for seamless swarm intelligence orchestration. Autonomous agents unmanned surface vessels (USVs), aerial drones, and autonomous underwater vehicles (AUVs) exhibit flocking behaviour’s inspired by particle swarm optimization, sharing pheromone-like digital signals over holographic beamforming channels to collaboratively resolve complex tasks like dynamic routing, collision avoidance, and load redistribution. Methodologically, proximal policy optimization (PPO) algorithms facilitate stable, lightweight training on resource-constrained edge hardware, augmented by federated learning to aggregate insights across privacy-sensitive multi-operator fleets without central data pooling. Rigorous evaluations in NS-3 for 6G emulation and Gazebo for maritime physics reveal transformative gains: 42% reductions in carbon footprints, 65% lower end-to-end latency versus 5G-cloud hybrids, and 30% improvements in throughput under adverse weather. Scalability tests with 1000+ agents confirm robustness in GPS-denied zones, while ablation studies highlight the synergistic impact of carbon feedback and swarm coordination over siloed baselines like genetic algorithms or centralized RL. By embedding quantum-safe encryption for 6G links and digital twin interfaces for predictive maintenance, this ecosystem not only decarbonizes blue economy logistics but also sets a scalable blueprint for AI-driven sustainability in cyber-physical systems worldwide.