Submitted: 07 April 2026
Posted: 08 April 2026
Abstract
Keywords:
1. Introduction
- Federated Learning (FL) [1]: As a leading paradigm for distributed machine learning, FL allows edge devices to collaboratively train a shared model while keeping their raw data localized. Devices share only model updates (e.g., gradients) with a central server for aggregation, thereby preserving user data privacy. FL must, however, contend with the Non-Independent and Identically Distributed (Non-IID) nature of edge data, a challenge often tackled by advanced FL approaches [2].
- Multi-Agent Partially Observable Markov Decision Processes (MA-POMDP) [3] and Deep Reinforcement Learning (DRL) [3]: In decentralized edge environments, each device or base station can be modeled as an intelligent agent with access only to local observations. MA-POMDP provides a robust framework for modeling collaborative decision-making problems under incomplete information, while DRL equips these agents with the ability to learn optimal policies directly from experience, making it suitable for complex, dynamic environments, especially for cognitive resource management [4].
- Non-Orthogonal Multiple Access (NOMA) [5] Resource Management: NOMA is a key enabler for 5G/6G, significantly enhancing spectral efficiency by allowing multiple users to transmit concurrently on the same time-frequency resources. Nevertheless, its sophisticated successive interference cancellation (SIC) mechanisms and the intricate multi-user scheduling problem transform resource management tasks—such as sub-channel allocation, power control, and task offloading—into an NP-hard optimization challenge. This necessitates intelligent, end-to-end optimization solutions, often employing deep neural networks [6].
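To make the SIC mechanism above concrete, the following sketch computes the achievable rates of a two-user NOMA pair sharing one sub-channel. The function name and parameters are illustrative, not the paper's notation; the rate expressions are the standard Shannon-capacity forms for power-domain NOMA with SIC.

```python
import math

def noma_pair_rates(p_strong, p_weak, g_strong, g_weak, bandwidth_hz, noise_w):
    """Achievable rates (bit/s) for a two-user NOMA pair on one sub-channel.

    The strong user (higher channel gain) applies SIC: it decodes and removes
    the weak user's signal first, then decodes its own signal interference-free.
    The weak user decodes directly, treating the strong user's signal as noise.
    """
    # Weak user sees the strong user's transmission as interference.
    sinr_weak = (p_weak * g_weak) / (p_strong * g_weak + noise_w)
    # Strong user has cancelled the weak user's signal via SIC.
    sinr_strong = (p_strong * g_strong) / noise_w
    to_rate = lambda sinr: bandwidth_hz * math.log2(1.0 + sinr)
    return to_rate(sinr_strong), to_rate(sinr_weak)
```

Even this two-user case shows why joint grouping, power, and channel decisions couple tightly: the weak user's rate depends on the strong user's power, which is what makes the full multi-user scheduling problem NP-hard.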

- We propose FedHeRM, a novel federated heterogeneity-aware reinforcement learning framework, specifically designed for secure and efficient NOMA resource management in dynamic vehicular edge networks, addressing the complex joint optimization of user association, power allocation, and task offloading.
- We introduce a Heterogeneity-aware Federated Multi-Agent Reinforcement Learning (HeRA-FMARL) framework, featuring a Dynamic Heterogeneity Measure (DHM) and an adaptive weighted aggregation strategy. This mechanism significantly accelerates global model convergence and enhances generalization by selectively leveraging high-contributing and representative local model updates from heterogeneous edge agents.
- We integrate robust privacy-preserving mechanisms, including Differential Privacy (DP) for local model updates and Secure Aggregation (SA) protocols at the federated server. This ensures strong data privacy for individual edge agents without compromising the efficacy of collaborative learning in FMARL for NOMA resource management.
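The DP mechanism in the last contribution can be sketched as clip-then-noise on each local update before upload. This is a minimal Gaussian-mechanism sketch; `clip_norm` and `noise_std` are illustrative placeholders that, in practice, would be derived from the target (epsilon, delta) privacy budget.

```python
import numpy as np

def dp_sanitize(update, clip_norm=1.0, noise_std=0.1, rng=None):
    """Clip a local update's L2 norm to bound sensitivity, then add
    Gaussian noise calibrated to that bound (Gaussian mechanism sketch)."""
    rng = rng if rng is not None else np.random.default_rng()
    flat = np.asarray(update, dtype=float)
    norm = np.linalg.norm(flat)
    if norm > clip_norm:
        flat = flat * (clip_norm / norm)   # rescale onto the clipping ball
    noise = rng.normal(0.0, noise_std * clip_norm, size=flat.shape)
    return flat + noise
```

Clipping bounds any single agent's influence on the aggregate; the added noise then masks individual contributions, which SA further protects by revealing only the sum to the server.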
2. Related Work
2.1. Federated Multi-Agent Reinforcement Learning and Privacy-Preserving AI
2.2. Resource Management and NOMA in Edge Computing
3. Method
3.1. System Model and Problem Formulation
- NOMA User Grouping: Assigning vehicles to specific NOMA groups on the available sub-channels. This determines which users share a sub-channel under NOMA principles.
- Sub-channel Allocation: Allocating sub-channels to NOMA groups, represented by a binary indicator that equals 1 when vehicle k is allocated sub-channel c by RSU m, and 0 otherwise.
- Power Control: Determining the transmission power of each vehicle k on sub-channel c. This is a continuous variable, subject to a maximum transmit power constraint.
- Task Partial Offloading Ratio: Deciding the fraction of vehicle k's task to be offloaded to RSU m. The remaining fraction is processed locally by the vehicle.
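The four decision variables above can be grouped per vehicle, with feasibility reduced to a few box and coupling constraints. The structure and field names below are illustrative, not the paper's notation; the SIC group-size cap reflects the usual practice of bounding NOMA group size to keep SIC decoding tractable.

```python
from dataclasses import dataclass

@dataclass
class VehicleDecision:
    """Per-vehicle decision variables from the problem formulation."""
    subchannel: dict        # (rsu_id, channel_id) -> 0/1 binary allocation
    power_w: float          # transmit power on the allocated sub-channel
    offload_ratio: float    # fraction of the task offloaded to the RSU

def is_feasible(d, p_max_w=0.2, sic_group_limit=2, group_size=2):
    """Basic feasibility: exactly one sub-channel per vehicle, power within
    the cap, offload ratio in [0, 1], and a bounded NOMA group size."""
    return (
        sum(d.subchannel.values()) == 1
        and 0.0 <= d.power_w <= p_max_w
        and 0.0 <= d.offload_ratio <= 1.0
        and group_size <= sic_group_limit
    )
```

In the MA-POMDP view, each agent's action is one such `VehicleDecision` per associated vehicle, and infeasible actions are masked or penalized during training.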
3.2. Heterogeneity-aware Federated Multi-Agent Reinforcement Learning (HeRA-FMARL)

3.2.1. Local Agent Training
3.2.2. Dynamic Heterogeneity Measure (DHM)
3.2.3. Adaptive Weighted Aggregation Strategy
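The adaptive weighted aggregation can be sketched as a softmax weighting over per-agent heterogeneity scores. As an illustrative assumption (the text does not specify the exact measure), the DHM is taken here to be each update's L2 distance to the unweighted mean, so that more representative updates receive larger weights.

```python
import numpy as np

def dhm_weighted_aggregate(updates, temperature=1.0):
    """Aggregate local updates with softmax weights derived from a
    heterogeneity score (one plausible DHM instantiation, for illustration)."""
    U = np.stack([np.asarray(u, dtype=float) for u in updates])
    centre = U.mean(axis=0)
    dhm = np.linalg.norm(U - centre, axis=1)   # per-agent heterogeneity score
    logits = -temperature * dhm                # smaller score -> larger weight
    w = np.exp(logits - logits.max())          # numerically stable softmax
    w /= w.sum()
    return (w[:, None] * U).sum(axis=0), w
```

With `temperature = 0` this degenerates to uniform FedAvg, which is exactly the FedHeRM-DHM ablation variant; raising the temperature increasingly down-weights outlier agents.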
3.2.4. Privacy-Preserving Aggregation Mechanisms
3.3. NOMA Resource Joint Optimization
4. Experiments
4.1. Experimental Setup
4.2. Evaluation Metrics
- System Sum Throughput (Gbps): The total volume of data successfully transmitted and processed across all vehicles and RSUs per unit time. This measures the overall network capacity utilization.
- Average Task Completion Latency (ms): The average time taken from a task’s generation to its successful completion, encompassing transmission, computation, and queuing delays. Lower latency indicates better responsiveness.
- Average System Energy Consumption (mJ/task): The average energy consumed by all participating vehicles (for local computation and transmission) and RSUs (for computation and communication) per completed task. This reflects resource efficiency.
- User Fairness Index (Jain’s Fairness): Quantifies the equity of resource allocation among vehicles, calculated using Jain’s Fairness Index on achieved throughputs. A value closer to 1 indicates fairer resource distribution.
- Model Convergence Speed (Federated Rounds): The number of global aggregation rounds required for the federated model to reach a stable and satisfactory performance level (e.g., 90% of its peak throughput). Faster convergence indicates better learning efficiency.
- Privacy Preservation Strength: A qualitative measure (quantitative for DP, based on its privacy parameters) indicating the degree to which individual vehicle data and model updates are protected from inference by other agents or the federated server.
- Scalability: A qualitative assessment of the framework’s ability to maintain performance and efficiency as the number of vehicles and RSUs increases.
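Of the metrics above, Jain's Fairness Index has a closed form worth stating explicitly. The sketch below is the standard definition applied to per-vehicle throughputs:

```python
def jain_fairness(throughputs):
    """Jain's fairness index: (sum x)^2 / (n * sum x^2).

    Ranges from 1/n (one user receives everything) to 1 (perfect equality).
    """
    n = len(throughputs)
    total = sum(throughputs)
    sum_sq = sum(x * x for x in throughputs)
    return (total * total) / (n * sum_sq) if sum_sq > 0 else 0.0
```

For example, four vehicles with equal throughput give an index of 1.0, while one vehicle monopolizing the channel among four gives 0.25.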
4.3. Baseline Algorithms
- Centralized MARL (C-MARL): This serves as an upper bound, representing an ideal scenario where a single central agent has complete global observation and control over all RSUs and vehicles. It uses a centralized DRL algorithm (e.g., MADDPG) to optimize NOMA resource management, assuming no privacy constraints or communication overheads.
- DRL-NOMA: A Deep Reinforcement Learning approach tailored for NOMA resource management, adapted from existing literature [13]. Each RSU operates its own DRL agent independently, without federated learning or explicit privacy-preserving mechanisms; decisions are made from local observations alone, with no global coordination or model sharing.
- Fed-MARL (Standard): This baseline incorporates the fundamental principles of Federated Multi-Agent Reinforcement Learning [10,12]. Local DRL agents (RSUs) train independently and periodically upload their model updates to a central server, which then performs standard FedAvg aggregation (equal weighting or proportional to client data size) and distributes the global model back. It includes basic Differential Privacy but lacks heterogeneity-aware aggregation and advanced secure aggregation protocols found in FedHeRM.
- Heuristic-based Resource Allocation (HRA): A traditional, non-learning based approach. It employs greedy heuristics for sub-channel allocation (e.g., allocating channels to users with best CSI), round-robin scheduling for NOMA user grouping, and fixed power control policies, along with a threshold-based offloading strategy. This baseline represents conventional non-AI resource management.
4.4. Performance Comparison
4.5. Ablation Study
- FedHeRM-DHM: A variant where the Dynamic Heterogeneity Measure (DHM) is removed and the federated server falls back to standard FedAvg (uniform weighting) for model aggregation.
- FedHeRM-SA: A variant where the Secure Aggregation (SA) protocol is disabled, so the federated server can directly observe individual (DP-noised) model updates. Differential Privacy (DP) is still applied.
- FedHeRM-DP: A variant where Differential Privacy (DP) is removed from local model updates, but Secure Aggregation (SA) is still active, protecting individual updates from the server.
4.6. Human Evaluation (Proxy User Experience Metrics)
- Perceived Latency and Application Responsiveness: Directly linked to the Average Task Completion Latency, FedHeRM’s low latency translates into a superior user experience for latency-sensitive applications (e.g., autonomous driving, AR/VR) with scores of 4.7 and 4.8, respectively.
- Network Reliability: The robust and adaptive nature of FedHeRM, particularly its handling of heterogeneity and dynamic resource management, contributes to more stable and reliable network services, indicated by a score of 4.4.
- Data Privacy Assurance: Due to its strong privacy-preserving mechanisms (DP and SA), FedHeRM achieves the highest score of 4.9, ensuring user confidence in data security, a crucial factor in adopting edge intelligence solutions.
- Operator Workload: As an autonomous and intelligent resource management framework, FedHeRM significantly reduces the need for manual configuration and optimization, leading to a lower operator workload (score of 4.5). In contrast, heuristic approaches often require constant tuning, resulting in higher workload.
4.7. Analysis of Heterogeneity Handling
4.8. Privacy-Utility Trade-off Analysis
4.9. Scalability and Robustness Analysis
4.10. Computational Overhead Analysis
5. Conclusions
Conflicts of Interest
References
- Lin, B.Y.; He, C.; Ze, Z.; Wang, H.; Hua, Y.; Dupuy, C.; Gupta, R.; Soltanolkotabi, M.; Ren, X.; Avestimehr, S. FedNLP: Benchmarking Federated Learning Methods for Natural Language Processing Tasks. In Findings of the Association for Computational Linguistics: NAACL 2022. Association for Computational Linguistics, 2022, pp. 157–175. [CrossRef]
- Yang, N.; Yuan, X.; Lin, H.; Zhang, H.; Lyu, P.; Wang, J. FedDM: Federated Learning Incorporating Dissimilarity Measure for Mobile Edge Computing Systems. IEEE Transactions on Cognitive Communications and Networking 2025.
- Sun, H.; Zhong, J.; Ma, Y.; Han, Z.; He, K. TimeTraveler: Reinforcement Learning for Temporal Knowledge Graph Forecasting. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 2021, pp. 8306–8319. [CrossRef]
- Yang, N.; Zhang, H.; Berry, R. Partially observable multi-agent deep reinforcement learning for cognitive resource management. In Proceedings of the GLOBECOM 2020-2020 IEEE Global Communications Conference. IEEE, 2020, pp. 1–6.
- Gu, J.; Kong, X. Fully Non-autoregressive Neural Machine Translation: Tricks of the Trade. In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021. Association for Computational Linguistics, 2021, pp. 120–133. [CrossRef]
- Yang, N.; Zhang, H.; Long, K.; Hsieh, H.Y.; Liu, J. Deep neural network for resource management in NOMA networks. IEEE Transactions on Vehicular Technology 2019, 69, 876–886.
- Yi, J.; Wu, F.; Wu, C.; Liu, R.; Sun, G.; Xie, X. Efficient-FedRec: Efficient Federated Learning Framework for Privacy-Preserving News Recommendation. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 2021, pp. 2814–2824. [CrossRef]
- Fornaciari, T.; Uma, A.; Paun, S.; Plank, B.; Hovy, D.; Poesio, M. Beyond Black & White: Leveraging Annotator Disagreement via Soft-Label Multi-Task Learning. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Association for Computational Linguistics, 2021, pp. 2591–2597. [CrossRef]
- Anil, R.; Ghazi, B.; Gupta, V.; Kumar, R.; Manurangsi, P. Large-Scale Differentially Private BERT. In Findings of the Association for Computational Linguistics: EMNLP 2022. Association for Computational Linguistics, 2022, pp. 6481–6491. [CrossRef]
- Zhang, S.; Yang, Z.; Yang, J.; Huang, Y. Provably Secure Generative Linguistic Steganography. In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021. Association for Computational Linguistics, 2021, pp. 3046–3055. [CrossRef]
- Niu, G.; Li, B.; Zhang, Y.; Pu, S. CAKE: A Scalable Commonsense-Aware Framework For Multi-View Knowledge Graph Completion. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, 2022, pp. 2867–2877. [CrossRef]
- Andong, F.J.E.N.; Min, Q. Federated Multi-Agent Reinforcement Learning for Privacy-Preserving and Energy-Aware Resource Management in 6G Edge Networks. CoRR 2025. [CrossRef]
- Rahmati, M.; Nadeem, M.; Sadhu, V.; Pompili, D. UW-MARL: Multi-Agent Reinforcement Learning for Underwater Adaptive Sampling using Autonomous Vehicles. CoRR 2019.
- Mao, W.; Zhang, K.; Miehling, E.; Basar, T. Information State Embedding in Partially Observable Cooperative Multi-Agent Reinforcement Learning. In Proceedings of the 59th IEEE Conference on Decision and Control, CDC 2020, Jeju Island, South Korea, December 14-18, 2020. IEEE, 2020, pp. 6124–6131. [CrossRef]
- Li, H.; Guo, D.; Fan, W.; Xu, M.; Huang, J.; Meng, F.; Song, Y. Multi-step Jailbreaking Privacy Attacks on ChatGPT. In Findings of the Association for Computational Linguistics: EMNLP 2023. Association for Computational Linguistics, 2023, pp. 4138–4153. [CrossRef]
- Mireshghallah, F.; Goyal, K.; Uniyal, A.; Berg-Kirkpatrick, T.; Shokri, R. Quantifying Privacy Risks of Masked Language Models Using Membership Inference Attacks. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 2022, pp. 8332–8347. [CrossRef]
- Zhang, W.; Deng, Y.; Liu, B.; Pan, S.; Bing, L. Sentiment Analysis in the Era of Large Language Models: A Reality Check. In Findings of the Association for Computational Linguistics: NAACL 2024. Association for Computational Linguistics, 2024, pp. 3881–3906. [CrossRef]
- Xu, X.; Tu, W.; Yang, Y. CASE-Net: Integrating local and non-local attention operations for speech enhancement. Speech Communication 2023, 148, 31–39.
- Xu, X.; Tu, W.; Yang, Y. Pcnn: A lightweight parallel conformer neural network for efficient monaural speech enhancement. arXiv preprint arXiv:2307.15251 2023.
- Xu, X.; Wang, Y.; Xu, D.; Peng, Y.; Zhang, C.; Jia, J.; Chen, B. Vsegan: Visual speech enhancement generative adversarial network. In Proceedings of the ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2022, pp. 7308–7311.
- Wang, Y.; Wang, C.; Li, R.; Lin, H. On the Use of Bert for Automated Essay Scoring: Joint Learning of Multi-Scale Essay Representation. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Association for Computational Linguistics, 2022, pp. 3416–3425. [CrossRef]
- Chen, S.; Aguilar, G.; Neves, L.; Solorio, T. Data Augmentation for Cross-Domain Named Entity Recognition. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 2021, pp. 5346–5356. [CrossRef]
- Hu, X.; Zhang, C.; Yang, Y.; Li, X.; Lin, L.; Wen, L.; Yu, P.S. Gradient Imitation Reinforcement Learning for Low Resource Relation Extraction. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 2021, pp. 2737–2746. [CrossRef]
- Tang, Z.; Lei, J.; Bansal, M. DeCEMBERT: Learning from Noisy Instructional Videos via Dense Captions and Entropy Minimization. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Association for Computational Linguistics, 2021, pp. 2415–2426. [CrossRef]
- Wei, L.; Hu, D.; Zhou, W.; Yue, Z.; Hu, S. Towards Propagation Uncertainty: Edge-enhanced Bayesian Graph Convolutional Networks for Rumor Detection. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). Association for Computational Linguistics, 2021, pp. 3845–3854. [CrossRef]
- Qiao, S.; Ou, Y.; Zhang, N.; Chen, X.; Yao, Y.; Deng, S.; Tan, C.; Huang, F.; Chen, H. Reasoning with Language Model Prompting: A Survey. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, 2023, pp. 5368–5393. [CrossRef]
- Hedderich, M.A.; Lange, L.; Adel, H.; Strötgen, J.; Klakow, D. A Survey on Recent Approaches for Natural Language Processing in Low-Resource Scenarios. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Association for Computational Linguistics, 2021, pp. 2545–2568. [CrossRef]


| Evaluation Metric | C-MARL | DRL-NOMA | Fed-MARL | HRA | FedHeRM (Ours) |
|---|---|---|---|---|---|
| System Sum Throughput (Gbps) | 3.00 | 2.70 | 2.80 | 2.20 | 3.15 |
| Avg. Task Latency (ms) | 38 | 45 | 42 | 55 | 36 |
| Avg. System Energy (mJ/task) | 15 | 20 | 18 | 25 | 14 |
| User Fairness Index | 0.88 | 0.82 | 0.85 | 0.70 | 0.90 |
| Model Convergence (Rounds) | — | — | 180 | — | 150 |
| Privacy Preservation | None | None | Medium | N/A | Strong |
| Scalability | Limited | Medium | Good | Good | Excellent |

| Evaluation Metric | FedHeRM | FedHeRM-DHM | FedHeRM-SA | FedHeRM-DP |
|---|---|---|---|---|
| System Sum Throughput (Gbps) | 3.15 | 2.95 | 3.12 | 3.10 |
| Avg. Task Latency (ms) | 36 | 40 | 37 | 38 |
| Avg. System Energy (mJ/task) | 14 | 16 | 15 | 15 |
| User Fairness Index | 0.90 | 0.86 | 0.89 | 0.88 |
| Model Convergence (Rounds) | 150 | 190 | 155 | 160 |
| Privacy Preservation | Strong | Strong | Medium | Medium |

| Proxy UX Metric | C-MARL | DRL-NOMA | Fed-MARL | HRA | FedHeRM (Ours) |
|---|---|---|---|---|---|
| Perceived Latency | 4.5 | 3.8 | 4.0 | 3.0 | 4.7 |
| Application Responsiveness | 4.6 | 4.0 | 4.2 | 3.2 | 4.8 |
| Network Reliability | 4.2 | 3.5 | 3.8 | 3.0 | 4.4 |
| Data Privacy Assurance | 1.0 | 1.0 | 3.5 | N/A | 4.9 |
| Operator Workload | 2.0 | 2.5 | 3.5 | 4.0 | 4.5 |

| Evaluation Metric | | | |
|---|---|---|---|
| System Sum Throughput (Gbps) | 3.15 | 5.80 | 10.50 |
| Avg. Task Latency (ms) | 36 | 38 | 40 |
| User Fairness Index | 0.90 | 0.88 | 0.87 |
| Model Convergence (Rounds) | 150 | 165 | 180 |
| Avg. Aggr. Time (ms/round) | 20 | 45 | 90 |

| Metric | FedHeRM | Fed-MARL (Standard) | FedHeRM-DHM-Privacy |
|---|---|---|---|
| Avg. Local Training Time (ms/epoch/agent) | 8.5 | 7.5 | 7.5 |
| Avg. Aggregation Time (ms/round) | 20 | 12 | 8 |
| Total FL Energy (mJ/round) | 120 | 80 | 60 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).