Submitted: 12 April 2026
Posted: 13 April 2026
Abstract
Keywords:
1. Introduction
The main contributions of this paper are summarized as follows:
- We propose WirelessLLM-Agent, a unified LLM-based agent framework that addresses multiple wireless communication decision-making tasks through semantic state serialization, multi-task adapter architecture, and a two-stage SFT-GRPO training paradigm.
- We design a MoE-LoRA-based multi-task adapter that enables parameter-efficient knowledge sharing across diverse wireless tasks while maintaining task-specific expertise, achieving superior performance with only 1.13M trainable parameters.
- We demonstrate through extensive experiments that WirelessLLM-Agent consistently outperforms existing methods across channel estimation, beamforming, task offloading, and cooperative caching scenarios, while exhibiting strong zero-shot generalization to unseen network configurations.
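The MoE-LoRA adapter named in the second bullet can be illustrated with a minimal sketch: frozen base weights plus a gated mixture of low-rank (LoRA) updates, where only the LoRA factors and the gate are trainable. The dimensions, expert count, and gating form below are assumptions for illustration, not the paper's actual implementation.

```python
# Minimal sketch of a MoE-LoRA multi-task adapter (illustrative only:
# hidden size, rank, expert count, and gating are assumed, not the paper's).
import numpy as np

rng = np.random.default_rng(0)
D, R, N_EXPERTS = 64, 4, 3  # hidden size, LoRA rank, number of experts (assumed)

# Each expert contributes a low-rank update B @ A; only A, B are trainable.
experts = [
    {"A": rng.normal(scale=0.01, size=(R, D)), "B": np.zeros((D, R))}
    for _ in range(N_EXPERTS)
]
gate_w = rng.normal(scale=0.01, size=(N_EXPERTS, D))  # input-conditioned gate


def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()


def moe_lora_forward(x, frozen_w):
    """Frozen base projection plus a gated mixture of LoRA updates."""
    gates = softmax(gate_w @ x)  # expert weights, sum to 1
    delta = sum(g * (e["B"] @ (e["A"] @ x)) for g, e in zip(gates, experts))
    return frozen_w @ x + delta


frozen_w = rng.normal(scale=0.1, size=(D, D))  # stands in for a frozen LLM weight
x = rng.normal(size=D)
y = moe_lora_forward(x, frozen_w)

# Only the LoRA factors and the gate count as trainable parameters.
n_trainable = sum(e["A"].size + e["B"].size for e in experts) + gate_w.size
print(y.shape, n_trainable)
```

Note the standard LoRA initialization (B = 0) makes the adapter a no-op at the start of training, so the frozen model's behavior is preserved until the updates are learned.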
2. Related Work
2.1. LLM-Based Methods for Wireless Communication Optimization
2.2. Reinforcement Learning and Agent Frameworks for Wireless Networks
3. Method

3.1. Semantic State Serialization
3.1.1. Channel State Serialization
3.1.2. Network Topology Serialization
3.1.3. Task Request Serialization
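As a rough illustration of what channel state serialization might look like, the sketch below renders a complex channel vector into a compact textual prompt an LLM can consume. The field names, rounding precision, and template are assumptions; the paper's actual serialization format is not reproduced here.

```python
# Illustrative sketch of serializing wireless channel state into an LLM prompt.
# Field names, precision, and template are assumptions, not the paper's format.
import numpy as np


def serialize_channel(h, snr_db, user_id):
    """Render a complex channel vector as a compact textual description."""
    mags = np.round(np.abs(h), 3).tolist()
    phases = np.round(np.angle(h, deg=True), 1).tolist()
    return (
        f"[User {user_id}] SNR={snr_db:.1f} dB; "
        f"|h|={mags}; phase(deg)={phases}"
    )


h = np.array([0.8 + 0.2j, 0.1 - 0.5j])
prompt = serialize_channel(h, snr_db=12.0, user_id=3)
print(prompt)
```

Splitting the complex entries into magnitude and phase keeps the prompt in plain decimal text, avoiding complex-number notation that a tokenizer handles poorly.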
3.2. Multi-Task Adapter Architecture
3.2.1. LoRA-Based Task Adapters
3.2.2. Mixture-of-Experts Gating
3.2.3. Task-Specific Output Heads
3.3. Two-Stage Training Paradigm
3.3.1. Stage 1: Supervised Fine-Tuning
3.3.2. Stage 2: GRPO Reinforcement Learning
3.3.3. Lookahead Collaborative Simulation
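GRPO's core mechanism, normalizing each sampled response's reward against its group's mean and standard deviation rather than a learned critic, can be sketched as follows. The reward values are made-up numbers for illustration; the paper's reward functions and group size are not shown here.

```python
# Sketch of GRPO's group-relative advantage: each sampled decision's reward
# is normalized within its group, replacing a learned value/critic network.
# Reward values below are made up for illustration.
import numpy as np


def group_relative_advantages(rewards, eps=1e-8):
    """A_i = (r_i - mean(r)) / (std(r) + eps), computed within one prompt's group."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)


# One prompt, G=4 sampled decisions scored by a task reward (e.g. throughput).
rewards = [0.2, 0.5, 0.9, 0.4]
adv = group_relative_advantages(rewards)
print(np.round(adv, 3))
```

Because advantages are centered within each group, they sum to zero: responses better than the group average are reinforced and the rest are suppressed, without training a separate value model.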
4. Experiments
4.1. Experimental Setup
4.2. Main Results
4.3. Ablation Study
4.4. Effectiveness of GRPO Training
4.5. Generalization to Unseen Scenarios
4.6. Scalability Analysis
4.7. Caching Performance under Different Capacities
4.8. Human Evaluation
| Method | Rationality | Interpretability | Adaptation |
|---|---|---|---|
| DQN | 2.8 | 2.1 | 2.5 |
| LLM4CP | 3.5 | 3.2 | 3.1 |
| SFT-7B | 3.2 | 3.8 | 2.8 |
| GRPO-7B | 3.8 | 3.6 | 3.5 |
| Ours | 4.3 | 4.1 | 4.2 |
5. Conclusion
References
- Wu, Q.; et al. A Contemporary Survey on 6G Wireless Networks: Potentials, Recent Advances, Technical Challenges and Future Trends. arXiv preprint arXiv:2306.08265, 2023.
- Yang, N.; Zhong, H.; Zhang, H.; Berry, R. Vision-LLMs for Spatiotemporal Traffic Forecasting. arXiv preprint arXiv:2510.11282, 2025.
- Alwarafy, A.; Abdallah, M.; et al. Deep Reinforcement Learning for Radio Resource Allocation and Management in Next Generation Heterogeneous Wireless Networks: A Survey. arXiv preprint arXiv:2106.00574, 2021.
- Yang, N.; Fan, M.; Wang, W.; Zhang, H. Decision-Making Large Language Model for Wireless Communication: A Comprehensive Survey on Key Techniques. IEEE Communications Surveys & Tutorials, 2025.
- Shao, J.; Tong, J.; Wu, Q.; Guo, W.; Li, Z.; Lin, Z.; Zhang, J. WirelessLLM: Empowering Large Language Models Towards Wireless Intelligence. IEEE Wireless Communications, 2025.
- Liu, X.; Gao, S.; Liu, B.; Cheng, X.; Yang, L. LLM4WM: Adapting LLM for Wireless Multi-Tasking. IEEE Journal on Selected Areas in Communications, 2025.
- Liang, L.; Ye, H.; Sheng, Y.; Wang, O.; Wang, J.; Jin, S.; Li, G.Y. LLMs for Wireless Communications: From Adaptation to Autonomy. arXiv preprint arXiv:2507.21524, 2025.
- Yang, N.; Cheng, C.; Zhang, H. COMLLM: Multi-Turn Reasoning LLMs for Task Offloading in Mobile Edge Computing. arXiv preprint arXiv:2604.07148, 2026.
- Yang, N.; Wang, W.; Ouyang, L.; Zhang, H. Cooperative Edge Caching with Large Language Model in Wireless Networks. arXiv preprint arXiv:2602.13307, 2026.
- Li, P.; Sun, J.; Lin, F.; Xing, S.; Fu, T.; Feng, S.; Ni, C.; Tu, Z. Traversal-as-Policy: Log-Distilled Gated Behavior Trees as Externalized, Verifiable Policies for Safe, Robust, and Efficient Agents. arXiv preprint arXiv:2603.05517, 2026.
- Li, P.; Lin, F.; Xing, S.; Sun, J.; Zhang, D.; Yang, S.; Ni, C.; Tu, Z. Let the Abyss Stare Back: Adaptive Falsification for Autonomous Scientific Discovery. arXiv preprint arXiv:2603.29045, 2026.
- Wei, B.; Jiang, R.; Zhang, R.; Liu, Y.; Niyato, D.; et al. LLMs for Next-Generation Wireless Network Management: A Survey and Tutorial. arXiv preprint arXiv:2509.05946, 2025.
- Wang, X.; Zhu, J.; Zhang, R.; Feng, L.; Niyato, D.; et al. Chain-of-Thought for Large Language Model-Empowered Wireless Communications. arXiv preprint arXiv:2505.22320, 2025.
- Maatouk, A.; et al. TeleQnA: A Benchmark Dataset to Assess Large Language Models in Telecommunications. arXiv preprint arXiv:2310.15051, 2023.
- Chen, Y.; Li, R.; et al. Split Fine-Tuning for Large Language Models in Wireless Networks. IEEE Transactions on Wireless Communications, 2025.
- Lin, Y.; Zhang, R.; Huang, W.; Wang, K.; Ding, Z.; So, D.K.; Niyato, D. Empowering LLMs in Wireless Communication: A Novel Dataset and Fine-Tuning Framework. arXiv preprint arXiv:2501.09631, 2025.
- Zhao, Y.; et al. WiFo: Wireless Foundation Model for Channel Prediction. arXiv preprint arXiv:2412.08908, 2024.
- Li, P.; Lin, F.; Xing, S.; Zheng, X.; Hong, X.; Yang, S.; Sun, J.; Tu, Z.; Ni, C. BibAgent: An Agentic Framework for Traceable Miscitation Detection in Scientific Literature. arXiv preprint arXiv:2601.16993, 2026.
- Yang, N.; Zhang, H.; Long, K.; Hsieh, H.Y.; Liu, J. Deep Neural Network for Resource Management in NOMA Networks. IEEE Transactions on Vehicular Technology, 2019, 69, 876–886.
- Tong, J.; Guo, W.; Shao, J.; Wu, Q.; Li, Z.; Lin, Z.; Zhang, J. WirelessAgent: Large Language Model Agents for Intelligent Wireless Networks. arXiv preprint arXiv:2505.01074, 2025.
- Zhao, Z.; et al. Deep Multi-Agent Reinforcement Learning Based Cooperative Edge Caching. IEEE Transactions on Communications, 2019.


Main results across the five wireless tasks (cf. Section 4.2):

| Method | CE↓ | CP↓ | BF↑ | Offload↓ | Cache↑ | Avg.Rank↓ |
|---|---|---|---|---|---|---|
| CNN | 0.119 | 0.125 | 0.356 | 3.40 | 0.508 | 5.2 |
| LSTM | 1.000 | 0.161 | - | 3.52 | - | 6.1 |
| Cross-stitch | 0.157 | 0.112 | 0.858 | - | - | 4.0 |
| LLM4CP | 0.106 | 0.106 | 0.682 | 3.12 | 0.531 | 3.3 |
| DQN | - | - | - | 3.40 | - | 5.8 |
| DDPG | - | - | - | - | 0.508 | 5.5 |
| GRPO-7B | - | - | - | 3.12 | 0.531 | 3.0 |
| Ours | 0.098 | 0.101 | 0.912 | 2.95 | 0.558 | 1.0 |
Ablation study (cf. Section 4.3):

| Configuration | Avg. Loss | Loss Increase |
|---|---|---|
| WirelessLLM-Agent (Full) | 0.082 | 0.00% |
| w/o MoE Gating | 0.091 | 10.98% |
| w/o LACS | 0.089 | 8.54% |
| w/o GRPO (SFT only) | 0.098 | 19.51% |
| w/o Adapter (Full Fine-tuning) | 0.108 | 31.71% |
| Frozen LLM | 0.095 | 15.85% |
Effect of the two-stage training paradigm (cf. Section 4.4):

| Training | Offloading (%) | Cache (2-BS) | Cache (5-BS) |
|---|---|---|---|
| SFT Only | 72.65 | 0.531 | 0.589 |
| GRPO Only | 89.20 | 0.525 | 0.581 |
| SFT+GRPO (Ours) | 96.86 | 0.558 | 0.620 |
Caching performance under different cache capacities (cf. Section 4.7):

| Method | Capacity 10 | Capacity 15 | Capacity 20 | Capacity 25 | Capacity 30 |
|---|---|---|---|---|---|
| FIFO | 0.289 | 0.371 | 0.440 | 0.501 | 0.555 |
| LRU | 0.488 | 0.589 | 0.669 | 0.729 | 0.771 |
| LFU | 0.501 | 0.598 | 0.674 | 0.728 | 0.771 |
| Exhaustive | 0.521 | 0.616 | 0.681 | 0.739 | 0.775 |
| SFT LLM | 0.531 | 0.612 | 0.675 | 0.731 | 0.764 |
| Ours | 0.554 | 0.634 | 0.695 | 0.748 | 0.782 |
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).