Preprint
Article

This version is not peer-reviewed.

Reinforcement Learning-Augmented LLM Agents for Collaborative Decision Making and Performance Optimization

Submitted: 06 February 2026

Posted: 13 February 2026


Abstract
This study formulates collaborative large language model (LLM) agents as a Decentralized Partially Observable Markov Decision Process (Dec-POMDP) and optimizes group behavior using centralized training with decentralized execution (CTDE). A group-relative policy optimization (GRPO) objective is introduced to jointly optimize solution quality, coordination consistency, and response latency. Experiments are conducted on collaborative writing and collaborative coding benchmarks comprising 6,000 multi-agent episodes with 2–4 agents per task. Compared with single-agent and prompt-only collaboration baselines, the proposed approach achieves a 3.1× reduction in task completion time, a 19.4% improvement in output consistency, and a 21.7% increase in coding test pass rate, demonstrating effective performance optimization under partial observability.
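To make the group-relative idea concrete, the sketch below computes GRPO-style advantages for a group of rollouts sampled on the same task: each rollout's scalar reward is standardized against the group's mean and standard deviation. The abstract does not state the reward formula, so the `Rollout` fields, the linear `composite_reward`, and the weights `w_quality`, `w_consistency`, and `w_latency` are illustrative assumptions rather than the authors' objective.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Rollout:
    """One multi-agent episode outcome (hypothetical fields, for illustration)."""
    quality: float      # task score in [0, 1], e.g. fraction of tests passed
    consistency: float  # agreement between agents' outputs, in [0, 1]
    latency: float      # task completion time in seconds


def composite_reward(r, w_quality=1.0, w_consistency=0.5, w_latency=0.01):
    # Assumed linear trade-off: reward quality and consistency, penalize latency.
    # The weights are placeholders, not values reported in the paper.
    return w_quality * r.quality + w_consistency * r.consistency - w_latency * r.latency


def group_relative_advantages(rollouts, eps=1e-8):
    # Standardize each reward against its own group, so no learned value
    # function (critic) is needed to obtain a baseline.
    rewards = [composite_reward(r) for r in rollouts]
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(x - mu) / (sigma + eps) for x in rewards]


if __name__ == "__main__":
    group = [
        Rollout(quality=0.9, consistency=0.8, latency=20.0),
        Rollout(quality=0.5, consistency=0.6, latency=35.0),
        Rollout(quality=0.7, consistency=0.9, latency=25.0),
    ]
    for rollout, adv in zip(group, group_relative_advantages(group)):
        print(rollout, round(adv, 3))
```

Rollouts that beat their group's average receive positive advantages and the rest receive negative ones. In a policy-gradient update, these values would weight the log-probabilities of the corresponding agents' actions.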
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.

Preprints.org is a free preprint server supported by MDPI in Basel, Switzerland.


© 2026 MDPI (Basel, Switzerland) unless otherwise stated