This study formulates collaboration among large language model (LLM) agents as a Decentralized Partially Observable Markov Decision Process (Dec-POMDP) and optimizes group behavior using centralized training with decentralized execution (CTDE). A group-relative policy optimization (GRPO) objective is introduced to jointly optimize solution quality, coordination consistency, and response latency. Experiments are conducted on collaborative writing and collaborative coding benchmarks comprising 6,000 multi-agent episodes with 2–4 agents per task. Compared with single-agent and prompt-only collaboration baselines, the proposed approach achieves a 3.1× reduction in task completion time, a 19.4% improvement in output consistency, and a 21.7% increase in coding test pass rate, demonstrating that group behavior can be optimized effectively under partial observability.
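As a hedged illustration of how such a group-relative objective can combine the three reward terms, consider the following sketch. The reward decomposition, the weights $w_q$, $w_c$, $w_\ell$, the group size $G$, and the clipping parameter $\epsilon$ are assumed here for exposition and are not the paper's exact formulation. For each of $G$ sampled group responses, a scalar reward is formed and advantages are normalized within the group:
\[
r_i = w_q\, r_i^{\mathrm{quality}} + w_c\, r_i^{\mathrm{consistency}} - w_\ell\, r_i^{\mathrm{latency}},
\qquad
\hat{A}_i = \frac{r_i - \mathrm{mean}(r_1,\dots,r_G)}{\mathrm{std}(r_1,\dots,r_G)},
\]
and the policy is updated with a standard GRPO-style clipped surrogate,
\[
\mathcal{J}(\theta) = \mathbb{E}\!\left[\frac{1}{G}\sum_{i=1}^{G}
\min\!\Big(\rho_i \hat{A}_i,\ \mathrm{clip}\big(\rho_i,\,1-\epsilon,\,1+\epsilon\big)\,\hat{A}_i\Big)\right],
\qquad
\rho_i = \frac{\pi_\theta(o_i \mid q)}{\pi_{\theta_{\mathrm{old}}}(o_i \mid q)},
\]
where $q$ denotes the shared task context and $o_i$ the $i$-th group response.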