Preprint
Article

This version is not peer-reviewed.

FedHeRM: Federated Heterogeneity-Aware Reinforcement Learning for Secure NOMA Resource Management in Vehicular Edge Networks

Submitted:

07 April 2026

Posted:

08 April 2026


Abstract
The rapid expansion of 5G/6G technologies and the burgeoning demand for low-latency, high-reliability services at the network edge challenge traditional centralized cloud architectures. Edge computing offers a promising solution, yet its dynamic, resource-constrained, and heterogeneous nature necessitates decentralized intelligence for effective resource management. This paper addresses these challenges by synergistically integrating Federated Learning (FL), Multi-Agent Reinforcement Learning (MARL), and Non-Orthogonal Multiple Access (NOMA) resource management. While Federated Multi-Agent Reinforcement Learning (FMARL) is a critical direction, existing methods struggle with extreme heterogeneity and privacy concerns. To overcome these limitations, we propose FedHeRM: Federated Heterogeneity-aware Reinforcement Learning for Secure NOMA Resource Management in Vehicular Edge Networks. FedHeRM models the vehicular edge environment as a Partially Observable Stochastic Game, where RSU agents learn optimal resource allocation policies. Our core innovation lies in a novel Heterogeneity-aware Federated Multi-Agent Reinforcement Learning (HeRA-FMARL) framework, which introduces a Dynamic Heterogeneity Measure (DHM) for adaptive weighted aggregation of model updates, significantly accelerating convergence and enhancing generalization across diverse agents. Furthermore, FedHeRM integrates robust privacy-preserving mechanisms, including Differential Privacy for local updates and Secure Aggregation protocols. Comprehensive experiments in a simulated vehicular edge environment demonstrate that FedHeRM significantly outperforms state-of-the-art baselines across critical metrics, achieving superior system throughput, lower task latency, reduced energy consumption, and enhanced user fairness, while maintaining excellent scalability and strong privacy guarantees. 
An ablation study confirms the crucial roles of its key components, further validating FedHeRM's efficacy in highly dynamic and heterogeneous edge networks.

1. Introduction

The advent of 5G and the forthcoming 6G technologies has led to an explosion in the number of terminal devices and the data they generate, concurrently demanding ultra-low latency and high-reliability services. This unprecedented surge in demand presents formidable challenges to traditional centralized cloud computing architectures, which struggle with network congestion, increased latency, and single points of failure. To address these issues, edge computing has emerged as a promising paradigm, extending computational and storage capabilities closer to the network edge. This decentralization effectively alleviates the burden on core networks and significantly reduces service response times.
However, the dynamic nature, resource constraints, and inherent data heterogeneity of edge environments impose stringent requirements on intelligent decision-making and efficient resource management at the network’s periphery. In this context, the concept of "decentralized intelligence" at the edge has become a pivotal research direction. It aims to empower edge devices with autonomous learning, self-decision-making, and efficient resource management capabilities through distributed collaborative mechanisms. Specifically, our research focuses on the synergistic integration of three critical technological dimensions:
  • Federated Learning (FL) [1]: As a leading paradigm for distributed machine learning, FL allows edge devices to collaboratively train a shared model while keeping their raw data localized. Devices only share model updates (e.g., gradients) with a central server for aggregation, thus effectively preserving user data privacy and inherently addressing the Non-Independent and Identically Distributed (Non-IID) nature of edge data, a challenge often tackled by advanced FL approaches [2].
  • Multi-Agent Partially Observable Markov Decision Processes (MA-POMDP) [3] and Deep Reinforcement Learning (DRL) [3]: In decentralized edge environments, each device or base station can be modeled as an intelligent agent with access only to local observations. MA-POMDP provides a robust framework for modeling collaborative decision-making problems under incomplete information, while DRL equips these agents with the ability to learn optimal policies directly from experience, making it suitable for complex, dynamic environments, especially for cognitive resource management [4].
  • Non-Orthogonal Multiple Access (NOMA) [5] Resource Management: NOMA is a key enabler for 5G/6G, significantly enhancing spectral efficiency by allowing multiple users to transmit concurrently on the same time-frequency resources. Nevertheless, its sophisticated successive interference cancellation (SIC) mechanisms and the intricate multi-user scheduling problem transform resource management tasks—such as sub-channel allocation, power control, and task offloading—into an NP-hard optimization challenge. This necessitates intelligent, end-to-end optimization solutions, often employing deep neural networks [6].
Figure 1. An illustrative overview of the motivation and key technological convergence for the proposed FedHeRM framework. The exponential growth of edge devices poses significant challenges to traditional centralized networks, including issues with latency, congestion, security, and resource limitations. To address these, the FedHeRM framework integrates Federated Learning (FL), Multi-Agent Reinforcement Learning (MARL), and Non-Orthogonal Multiple Access (NOMA) to collectively tackle critical challenges such as data heterogeneity, incomplete information at the edge, privacy preservation, and efficient resource management, ultimately paving the way for intelligent and secure vehicular edge networks.
The current research landscape unequivocally points towards the fusion of these three areas into Federated Multi-Agent Reinforcement Learning (FMARL) [7] as a critical pathway to achieving intelligent future edge networks. Yet, existing FMARL methodologies still grapple with several significant challenges. These include effectively managing extreme heterogeneity within edge environments (encompassing diverse data distributions, task types, and Quality of Service (QoS) demands), ensuring privacy and security alongside efficient decision-making, and achieving fine-grained, cross-layer, and cross-domain joint optimization for NOMA resources. This study is specifically designed to address these critical gaps by proposing a comprehensive framework that is more robust, privacy-preserving, and highly efficient.
In this paper, we propose FedHeRM: Federated Heterogeneity-aware Reinforcement Learning for Secure NOMA Resource Management in Vehicular Edge Networks. Our core idea is to introduce a novel framework, FedHeRM, that deeply integrates federated learning with multi-agent reinforcement learning within vehicular edge computing environments. FedHeRM aims to achieve secure, highly efficient, and heterogeneity-aware NOMA network resource management. We extend the benefits of federated learning to the training aggregation of multi-agent reinforcement learning, introducing a novel heterogeneity-aware aggregation mechanism coupled with advanced privacy-preserving strategies. This comprehensive approach effectively tackles the joint optimization problem of user association, power allocation, and task offloading in NOMA networks.
Our proposed methodology encompasses several key components. Firstly, we model the vehicular edge computing scenario as a Partially Observable Stochastic Game (POSG), where each Roadside Unit (RSU) or vehicle acts as an independent intelligent agent. These agents make decisions (e.g., sub-channel allocation, NOMA user grouping, power control, and partial task offloading ratios) based on their local observations, which include channel states, queue lengths, battery levels, and local computation task statuses. The overarching objective is to minimize total system latency and energy consumption while maximizing NOMA network throughput and ensuring user fairness. Secondly, we develop a Heterogeneity-aware Federated Multi-Agent Reinforcement Learning (HeRA-FMARL) framework. Within this framework, each local edge agent (RSU or vehicle) employs a deep recurrent Q-network (DRQN) [8] or a Soft Actor-Critic (SAC) [8] agent to learn optimal policies from local experience. These agents are adept at handling sequential decision-making in partially observable environments and support the continuous action spaces crucial for fine-grained control over power allocation and offloading ratios. Crucially, traditional federated aggregation methods often treat all clients uniformly, which is suboptimal in highly heterogeneous edge environments where agents' observations, rewards, and learning progress vary significantly. To overcome this, we introduce a Dynamic Heterogeneity Measure (DHM) to quantify each agent's learning contribution, its data distribution's disparity from the global distribution, and its current policy's adaptability to its local environment. Based on the DHM, the federated server employs an adaptive weighted aggregation strategy, granting higher aggregation weights to model updates from agents exhibiting superior performance, greater contribution to the global model, and stronger representativeness.
This accelerates global model convergence and enhances generalization across diverse heterogeneous environments, extending the principles of FedDM [2] to FMARL aggregation. Furthermore, to ensure privacy during model update uploads, we integrate Differential Privacy (DP) [9] by adding calibrated noise to local gradients or model parameters. We also employ Secure Aggregation (SA) protocols [10], such as those based on homomorphic encryption or secure multi-party computation (e.g., building upon FedMARL's ECDH ideas [10]), during the federated aggregation process, preventing the server from decrypting individual agent updates and thus significantly enhancing privacy protection. Lastly, the agents' action spaces are meticulously designed to include not only conventional power allocation and sub-channel selection but also NOMA user grouping and task offloading ratios. The DRL-learned policies enable agents to collaboratively determine NOMA transmissions, power levels, and the extent of task offloading to edge servers, guided by a reward function that comprehensively considers NOMA-specific attributes (e.g., SIC success rate), system throughput, task completion delay, and energy consumption.
For experimental validation, we constructed a custom simulation platform using Python and OpenAI Gym [11], integrating simplified network communication modules (similar to NS-3 [11]) and vehicular mobility simulations (leveraging simplified interfaces from SUMO). Our scenario involves multiple RSUs and a large number of randomly moving vehicles, which dynamically join/leave the network and generate computational tasks. Wireless channel conditions are dynamically simulated using composite models incorporating fading, shadowing, and path loss. As reinforcement learning relies on environmental interaction, data for vehicular tasks (e.g., random task loads, data sizes, computational demands) and dynamic Channel State Information (CSI) are generated within the simulation. We evaluate FedHeRM against several baseline algorithms, including an adapted Fed-MARL [12] (without heterogeneity-aware aggregation), DRL-NOMA [13] (a centralized DRL for NOMA optimization), a Heuristic-based Resource Allocation scheme, and a Centralized MARL [14] (serving as an upper-bound reference). Key performance metrics include System Sum Rate, Average Task Completion Latency, Average System Energy Consumption, Jain's Fairness Index, Model Convergence Speed, and Privacy Preservation strength. Our simulation results demonstrate that FedHeRM achieves a superior system throughput of 3.15 Gbps, a minimal average task latency of 36 ms, and the lowest average system energy consumption of 14 mJ/task. Furthermore, it exhibits enhanced user fairness (0.90), faster model convergence (150 rounds), and robust privacy protection compared to existing methods, highlighting its effectiveness and superiority in heterogeneous vehicular edge environments. Its federated architecture also inherently offers excellent scalability, a crucial advantage for dynamic and large-scale deployments.
The primary contributions of this paper are summarized as follows:
  • We propose FedHeRM, a novel federated heterogeneity-aware reinforcement learning framework, specifically designed for secure and efficient NOMA resource management in dynamic vehicular edge networks, addressing the complex joint optimization of user association, power allocation, and task offloading.
  • We introduce a Heterogeneity-aware Federated Multi-Agent Reinforcement Learning (HeRA-FMARL) framework, featuring a Dynamic Heterogeneity Measure (DHM) and an adaptive weighted aggregation strategy. This mechanism significantly accelerates global model convergence and enhances generalization by selectively leveraging high-contributing and representative local model updates from heterogeneous edge agents.
  • We integrate robust privacy-preserving mechanisms, including Differential Privacy (DP) for local model updates and Secure Aggregation (SA) protocols at the federated server. This ensures strong data privacy for individual edge agents without compromising the efficacy of collaborative learning in FMARL for NOMA resource management.

3. Method

In this section, we elaborate on the proposed FedHeRM framework, which leverages a novel Heterogeneity-aware Federated Multi-Agent Reinforcement Learning (HeRA-FMARL) approach to address the secure and efficient NOMA resource management problem in vehicular edge networks. We first establish the system model and formally define the resource management problem as a Partially Observable Stochastic Game. Subsequently, we detail the HeRA-FMARL framework, including local agent training, the dynamic heterogeneity measure, the adaptive weighted aggregation strategy, and privacy-preserving mechanisms. Finally, we describe how FedHeRM facilitates joint optimization of NOMA resources.

3.1. System Model and Problem Formulation

We consider a vehicular edge computing environment consisting of a set of Roadside Units (RSUs), denoted by $\mathcal{M} = \{1, 2, \ldots, M\}$, acting as edge servers, and a set of vehicles, $\mathcal{K} = \{1, 2, \ldots, K\}$, generating computational tasks. Each RSU $m \in \mathcal{M}$ serves a subset of vehicles $\mathcal{K}_m \subseteq \mathcal{K}$ within its coverage area. Vehicles move dynamically, causing changes in their association with RSUs and in wireless channel conditions.
We model this dynamic, decentralized decision-making problem as a Partially Observable Stochastic Game (POSG) defined by the tuple $\langle \mathcal{S}, \{\mathcal{A}_m\}_{m \in \mathcal{M}}, \mathcal{P}, \{\mathcal{R}_m\}_{m \in \mathcal{M}}, \{\mathcal{O}_m\}_{m \in \mathcal{M}} \rangle$. Each RSU $m \in \mathcal{M}$ is considered an independent intelligent agent. While vehicles also make decisions regarding task offloading, their resource requests are managed by the RSUs; the DRL agents therefore reside at the RSUs.
The global state space $\mathcal{S}$ at time step $t$ encompasses all relevant network information, including channel conditions (e.g., the Signal-to-Interference-plus-Noise Ratio (SINR)) for all user-RSU links, the queue lengths of computation tasks at each RSU and vehicle, vehicle battery levels, and task characteristics (e.g., required CPU cycles and data size). Due to partial observability, each RSU $m$ perceives only a local observation $o_m(t) \in \mathcal{O}_m$, a subset of the global state $s_t$ relevant to its local environment and connected vehicles $\mathcal{K}_m$. Specifically, $o_m(t)$ includes the Channel State Information (CSI) for vehicles in $\mathcal{K}_m$, their task queues and battery levels, and the local RSU processing load.
At each time step $t$, each RSU $m$ chooses an action $a_m(t) \in \mathcal{A}_m$. This action space is multi-dimensional and comprises:
  • NOMA User Grouping: Assigning vehicles in $\mathcal{K}_m$ to specific NOMA groups on available sub-channels $c \in \mathcal{C}$, denoted by $G_{m,c}(t)$. This determines which users share a sub-channel under NOMA principles.
  • Sub-channel Allocation: Allocating sub-channels to NOMA groups, represented by a binary variable $x_{m,k,c}(t) \in \{0, 1\}$, where 1 indicates that vehicle $k$ is allocated sub-channel $c$ by RSU $m$.
  • Power Control: Determining the transmission power $P_{m,k,c}(t)$ for each vehicle $k$ on sub-channel $c$. This is a continuous variable subject to a maximum power constraint, i.e., $0 \le P_{m,k,c}(t) \le P_{\max}$.
  • Task Partial Offloading Ratio: Deciding the fraction $\delta_{m,k}(t) \in [0, 1]$ of vehicle $k$'s task to be offloaded to RSU $m$. The remaining fraction $(1 - \delta_{m,k}(t))$ is processed locally by the vehicle.
The joint action of all agents is $\mathbf{A}(t) = \{a_1(t), \ldots, a_M(t)\}$.
The transition probability $\mathcal{P}$ describes how, upon a joint action $\mathbf{A}(t)$, the environment transitions from state $s_t$ to $s_{t+1}$ according to $\mathcal{P}(s_{t+1} \mid s_t, \mathbf{A}(t))$. This probability is influenced by dynamic factors such as vehicle mobility, new task arrivals, and task completions.
Each RSU agent $m$ receives a local reward $\mathcal{R}_m(s_t, \mathbf{A}(t), s_{t+1})$ based on its actions and the resulting system state. The global objective is to minimize total system latency and energy consumption while maximizing NOMA network throughput and ensuring user fairness. The reward function for agent $m$ is accordingly designed as a weighted sum of these objectives, to be maximized:
$$R_m(t) = w_1 \sum_{k \in \mathcal{K}_m} \mathrm{Throughput}_{m,k}(t) - w_2 \sum_{k \in \mathcal{K}_m} \mathrm{Latency}_{m,k}(t) - w_3 \sum_{k \in \mathcal{K}_m} \mathrm{EnergyConsumption}_{m,k}(t) + w_4 \, \mathrm{FairnessIndex}_m(t)$$
where $w_1, w_2, w_3, w_4$ are positive weighting factors.
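As a minimal illustration, the per-agent reward above is a straightforward weighted combination of the four objectives. The weight values in this sketch are placeholders, not values reported in the paper.

```python
def local_reward(throughputs, latencies, energies, fairness,
                 w=(1.0, 0.5, 0.5, 0.2)):
    """Per-agent reward R_m(t): weighted throughput minus latency and
    energy penalties, plus a fairness bonus. The default weights w are
    illustrative placeholders only."""
    return (w[0] * sum(throughputs)
            - w[1] * sum(latencies)
            - w[2] * sum(energies)
            + w[3] * fairness)
```

In practice the four terms live on very different scales (bit/s, seconds, joules, a unitless index), so they would need normalization before the weights become meaningful.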
The data rate for vehicle k on sub-channel c served by RSU m using NOMA is given by:
$$\mathrm{Rate}_{m,k,c}(t) = B \log_2 \left( 1 + \frac{P_{m,k,c}(t) \, H_{m,k,c}(t)}{N_0 + \sum_{j \in G_{m,c}(t), \, j \neq k, \, \text{SIC failed}} P_{m,j,c}(t) \, H_{m,j,c}(t)} \right)$$
where $B$ is the sub-channel bandwidth, $H_{m,k,c}(t)$ is the channel gain, $N_0$ is the noise power, and the summation term represents the residual interference when Successive Interference Cancellation (SIC) fails for stronger users. The total throughput of vehicle $k$ from RSU $m$'s perspective is the sum of rates over its allocated sub-channels:
$$\mathrm{Throughput}_{m,k}(t) = \sum_{c \in \mathcal{C}} x_{m,k,c}(t) \cdot \mathrm{Rate}_{m,k,c}(t)$$
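The rate expression above can be sketched in code for a single NOMA group. This sketch makes two simplifying assumptions not guaranteed by the paper's model: SIC always succeeds, and decoding follows descending channel gain, so each user sees interference only from not-yet-decoded (weaker-gain) users. All numeric defaults are illustrative.

```python
import math

def noma_group_rates(powers, gains, bandwidth=1e6, noise=1e-13):
    """Achievable per-user rates (bit/s) within one NOMA group,
    assuming ideal SIC with descending-gain decoding order.

    powers[i] and gains[i] are vehicle i's transmit power and channel
    gain on the shared sub-channel; bandwidth and noise are the
    sub-channel bandwidth B and noise power N_0."""
    order = sorted(range(len(powers)), key=lambda i: gains[i], reverse=True)
    rates = [0.0] * len(powers)
    for pos, i in enumerate(order):
        # interference from users decoded after user i (weaker channels)
        interference = sum(powers[j] * gains[j] for j in order[pos + 1:])
        sinr = powers[i] * gains[i] / (noise + interference)
        rates[i] = bandwidth * math.log2(1.0 + sinr)
    return rates
```

Extending this to the paper's model would add the residual-interference terms for users whose SIC decoding fails.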
Task completion latency $\mathrm{Latency}_{m,k}(t)$ includes transmission delay, computation delay, and queueing delay:
$$\mathrm{Latency}_{m,k}(t) = T_{m,k}^{\mathrm{trans}}(t) + T_{m,k}^{\mathrm{comp}}(t) + T_{m,k}^{\mathrm{queue}}(t)$$
where the transmission delay of the offloaded portion is $T_{m,k}^{\mathrm{trans}}(t) = \frac{\delta_{m,k}(t) \, D_k}{\sum_{c \in \mathcal{C}} x_{m,k,c}(t) \, \mathrm{Rate}_{m,k,c}(t)}$, and the computation delay of the offloaded and local parts is $T_{m,k}^{\mathrm{comp}}(t) = \frac{\delta_{m,k}(t) \, C_k}{F_m} + \frac{(1 - \delta_{m,k}(t)) \, C_k}{F_k}$. Here, $D_k$ is the data size of task $k$, $C_k$ is the number of CPU cycles task $k$ requires, $F_m$ is the computation capacity of RSU $m$, and $F_k$ is the local computation capacity of vehicle $k$. $T_{m,k}^{\mathrm{queue}}(t)$ accounts for any queueing delay at the RSU or vehicle before processing.
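The latency decomposition can be written out directly. Note one modeling choice the formulas leave implicit: the offloaded and local computation delays are summed here, matching the equation above, although treating them as parallel (taking the max) would be an alternative reading.

```python
def task_latency(delta, data_size, rate, cycles, f_rsu, f_local, queue=0.0):
    """Latency = transmission + computation + queueing, per the
    decomposition above.

    delta is the offloading ratio; rate is the vehicle's aggregate NOMA
    rate over its allocated sub-channels; cycles is C_k; f_rsu and
    f_local are the RSU and vehicle CPU capacities (cycles/s)."""
    t_trans = (delta * data_size / rate) if delta > 0 else 0.0
    t_comp = delta * cycles / f_rsu + (1.0 - delta) * cycles / f_local
    return t_trans + t_comp + queue
```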
Energy consumption $\mathrm{EnergyConsumption}_{m,k}(t)$ accounts for transmission energy and local computation energy:
$$\mathrm{EnergyConsumption}_{m,k}(t) = E_{m,k}^{\mathrm{trans}}(t) + E_{m,k}^{\mathrm{localcomp}}(t)$$
where $E_{m,k}^{\mathrm{trans}}(t) = \sum_{c \in \mathcal{C}} P_{m,k,c}(t) \cdot T_{m,k}^{\mathrm{trans}}(t)$ and $E_{m,k}^{\mathrm{localcomp}}(t) = \kappa_k \left( \frac{(1 - \delta_{m,k}(t)) \, C_k}{T_{m,k}^{\mathrm{localcomp}}(t)} \right)^{\beta} T_{m,k}^{\mathrm{localcomp}}(t)$, with $\kappa_k$ an effective capacitance coefficient, $\beta$ typically 2 or 3, and $T_{m,k}^{\mathrm{localcomp}}(t)$ the local computation time. For simplicity, local computation energy can also be approximated as $P_k^{\mathrm{local}} \cdot \frac{(1 - \delta_{m,k}(t)) \, C_k}{F_k}$.
The fairness index is Jain's Fairness Index computed over the rates or resources allocated to users. For RSU $m$, it is calculated as:
$$\mathrm{FairnessIndex}_m(t) = \frac{\left( \sum_{k \in \mathcal{K}_m} \mathrm{Throughput}_{m,k}(t) \right)^2}{M_m \sum_{k \in \mathcal{K}_m} \mathrm{Throughput}_{m,k}(t)^2}$$
where $M_m$ is the number of vehicles served by RSU $m$.
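Jain's index above is simple to compute directly; it equals 1 when all users receive equal throughput and approaches $1/M_m$ when a single user takes everything.

```python
def jains_fairness(throughputs):
    """Jain's fairness index: (sum x)^2 / (n * sum x^2), in (0, 1]."""
    n = len(throughputs)
    total = sum(throughputs)
    sq = sum(x * x for x in throughputs)
    return (total * total) / (n * sq) if sq > 0 else 0.0
```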
The agents aim to learn an optimal joint policy $\pi^* = \{\pi_m^*\}_{m \in \mathcal{M}}$ that maximizes the expected cumulative future reward, accounting for the partial observability and decentralized nature of the problem.

3.2. Heterogeneity-aware Federated Multi-Agent Reinforcement Learning (HeRA-FMARL)

Our proposed HeRA-FMARL framework integrates federated learning with multi-agent reinforcement learning to enable collaborative policy learning across heterogeneous edge agents while ensuring privacy and efficiency.
Figure 2. Architecture of the Proposed FedHeRM Framework and its Components. It illustrates the interaction between local RSU agents, the federated server for adaptive weighted aggregation, and the joint NOMA resource management decision-making process, including the Dynamic Heterogeneity Measure (DHM) calculation within local agent training.

3.2.1. Local Agent Training

Each RSU $m$ acts as a local agent, deploying a deep reinforcement learning (DRL) agent to learn its optimal policy $\pi_m$. Given the partial observability and the need to handle continuous action spaces (for power control and offloading ratios), we employ either a Deep Recurrent Q-Network (DRQN) or a Soft Actor-Critic (SAC) agent.
The DRQN is chosen for its ability to process sequential observations using recurrent layers (e.g., LSTMs or GRUs), implicitly capturing hidden state in partially observable environments. The DRQN approximates the Q-value function $Q_m(o_m(t), a_m(t); \theta_m)$, where $\theta_m$ are the local network parameters. Agents select actions using an $\epsilon$-greedy exploration strategy or a greedy policy derived from the Q-values. The Q-network parameters are updated via the Bellman equation, minimizing the loss:
$$L(\theta_m) = \mathbb{E}_{(o_m, a_m, r_m, o_m') \sim \mathcal{D}_m} \left[ \left( r_m + \gamma \max_{a_m'} Q_m(o_m', a_m'; \theta_m^{\mathrm{target}}) - Q_m(o_m, a_m; \theta_m) \right)^2 \right]$$
where $\mathcal{D}_m$ is agent $m$'s local experience replay buffer, $\gamma$ is the discount factor, and $\theta_m^{\mathrm{target}}$ are the parameters of a periodically updated target network.
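The TD loss above can be sketched on a sampled batch. The arrays here stand in for network outputs; a full DRQN would first unroll a recurrent layer over observation sequences before producing the Q-values, and the shapes are illustrative.

```python
import numpy as np

def drqn_td_loss(q_vals, target_q_next, actions, rewards, dones, gamma=0.99):
    """Mean-squared TD error mirroring the loss L(theta_m) above.

    q_vals:        (B, |A|) Q-values from the online network
    target_q_next: (B, |A|) Q-values of next observations from the target net
    actions, rewards, dones: (B,) sampled transitions from the replay buffer
    """
    # Bellman target: r + gamma * max_a' Q_target(o', a'), zeroed at episode end
    td_target = rewards + gamma * (1.0 - dones) * target_q_next.max(axis=1)
    # Q-value of the action actually taken in each transition
    q_taken = q_vals[np.arange(len(actions)), actions]
    return float(np.mean((td_target - q_taken) ** 2))
```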
SAC is a model-free, off-policy actor-critic algorithm suited to continuous action spaces and known for its stability and sample efficiency. It simultaneously learns a stochastic policy $\pi_m(\cdot \mid o_m)$ and a Q-function $Q_m(o_m, a_m)$. SAC optimizes a maximum-entropy objective, encouraging exploration while learning near-optimal policies. The policy parameters $\phi_m$ and Q-function parameters $\psi_m$ are updated using gradients derived from the soft Bellman residual and the entropy-regularized objective.
Each RSU agent $m$ stores its local experiences $(o_m(t), a_m(t), r_m(t), o_m(t+1))$ in a local replay buffer $\mathcal{D}_m$ and performs stochastic gradient descent (SGD) at each local training step to update its model parameters $\theta_m$ (for DRQN) or $\phi_m, \psi_m$ (for SAC).

3.2.2. Dynamic Heterogeneity Measure (DHM)

Traditional federated aggregation methods typically assign all participating clients equal weights or weights proportional to their number of data samples or gradients. This is suboptimal in highly heterogeneous edge environments, where agents' learning progress, data distributions, and policy performance can vary significantly. To address this, we introduce a Dynamic Heterogeneity Measure (DHM) for each agent $m$ at each aggregation round $t_g$. The DHM quantifies an agent's "value" to the global model by considering its learning contribution, data distribution disparity, and current policy adaptability. The DHM for agent $m$ at round $t_g$, denoted $DHM_m(t_g)$, is defined as a weighted combination of these factors:
$$DHM_m(t_g) = \alpha \cdot \Delta R_m(t_g) + \beta \cdot (1 - \mathrm{JS}_m(t_g)) + \gamma \cdot P_m(t_g)$$
where $\alpha, \beta, \gamma$ are positive weighting coefficients that sum to 1. The term $\Delta R_m(t_g)$ represents the local learning contribution of agent $m$. It can be quantified by the improvement in agent $m$'s average local reward over a short window or, more directly, by the magnitude of its local model update. For instance, we can use the $L_2$ norm of the parameter difference between the local model after training and the global model before training:
$$\Delta R_m(t_g) = \| \theta_m(t_g) - \theta_G(t_g) \|_2$$
A larger magnitude implies a more significant or impactful local learning step.
The term $\mathrm{JS}_m(t_g)$ measures the disparity between agent $m$'s local data distribution $P_m$ and the global data distribution $P_G$ (or a proxy thereof). We use the Jensen-Shannon (JS) divergence as the metric:
$$\mathrm{JS}_m(t_g) = \frac{1}{2} D_{KL}(P_m \| M) + \frac{1}{2} D_{KL}(P_G \| M)$$
where $M = \frac{1}{2}(P_m + P_G)$ and $D_{KL}(P \| Q) = \sum_x P(x) \log \frac{P(x)}{Q(x)}$ is the Kullback-Leibler divergence. A smaller JS divergence (i.e., a larger $1 - \mathrm{JS}_m(t_g)$) indicates that the agent's data is more representative of the global distribution, so its updates are more universally beneficial.
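For discrete distributions, the JS divergence above can be computed exactly as defined: the average of the two KL divergences to the mixture. The small epsilon below is a numerical guard, not part of the definition.

```python
import math

def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence (in nats) between two discrete
    distributions p and q, via the mixture M = (p + q) / 2."""
    m = [(pi + qi) / 2.0 for pi, qi in zip(p, q)]

    def kl(a, b):
        # D_KL(a || b) = sum_x a(x) log(a(x) / b(x)), skipping zero-mass points
        return sum(ai * math.log((ai + eps) / (bi + eps))
                   for ai, bi in zip(a, b) if ai > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```

The result is bounded by $\log 2$ (reached for disjoint supports), which conveniently keeps the $1 - \mathrm{JS}_m$ term in the DHM within a fixed range up to that constant.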
Finally, $P_m(t_g)$ denotes the current policy adaptability, i.e., the performance of agent $m$'s policy in its local environment. It can be quantified by the average reward achieved by the policy $\pi_m$ on a local validation set, reflecting the policy's robustness to local environmental changes:
$$P_m(t_g) = \mathbb{E}_{\mathrm{validation}} \left[ R_m(s_t, \pi_m(o_m(t))) \right]$$
A higher $P_m(t_g)$ suggests the agent's policy is well adapted and its updates are valuable. The local data distributions $P_m$ can be approximated from local observation statistics, and $P_G$ can be estimated by the federated server or from a subset of public data if available.
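Putting the three terms together, the DHM is a convex combination. One practical detail the text leaves implicit, assumed here: the update norm and local performance must be normalized to a common range (e.g., [0, 1]) before they are mixed with the bounded JS term. The default coefficients are those used in Section 4.1.

```python
def dynamic_heterogeneity_measure(update_norm, js_div, local_perf,
                                  alpha=0.3, beta=0.4, gamma=0.3):
    """DHM_m(t_g) = alpha * dR_m + beta * (1 - JS_m) + gamma * P_m.

    update_norm: normalized L2 norm of the local model update (dR_m)
    js_div:      JS divergence of local vs. global data distribution
    local_perf:  normalized local validation reward (P_m)
    """
    assert abs(alpha + beta + gamma - 1.0) < 1e-9, "coefficients must sum to 1"
    return alpha * update_norm + beta * (1.0 - js_div) + gamma * local_perf
```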

3.2.3. Adaptive Weighted Aggregation Strategy

Based on the computed DHM values, the federated server employs an adaptive weighted aggregation strategy. Instead of uniform weighting, the server assigns a dynamic weight $w_m(t_g)$ to each agent's model update at aggregation round $t_g$; agents with higher $DHM_m(t_g)$ values contribute more to the global model update. The aggregation weight for agent $m$ is normalized as:
$$w_m(t_g) = \frac{DHM_m(t_g)}{\sum_{j=1}^{M_{\mathrm{active}}} DHM_j(t_g)}$$
where $M_{\mathrm{active}}$ is the number of active agents participating in the current aggregation round. The global model parameters $\theta_G(t_g + 1)$ are then updated from the previous global model $\theta_G(t_g)$ and the received local model updates $\Delta \theta_m(t_g) = \theta_m(t_g) - \theta_G(t_g)$ using this adaptive weighting:
$$\theta_G(t_g + 1) = \theta_G(t_g) + \sum_{m=1}^{M_{\mathrm{active}}} w_m(t_g) \, \Delta \theta_m(t_g)$$
This strategy allows the global model to converge faster and achieve better generalization performance in heterogeneous environments by prioritizing more valuable and representative local updates.
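The server-side update rule above reduces to a few lines. For illustration the model parameters are flat lists of floats; a real implementation would operate on per-layer tensors.

```python
def dhm_weighted_aggregate(global_params, local_params_list, dhm_values):
    """Adaptive weighted aggregation:
    theta_G <- theta_G + sum_m w_m * (theta_m - theta_G),
    with weights w_m = DHM_m / sum_j DHM_j as defined above."""
    total = sum(dhm_values)
    weights = [d / total for d in dhm_values]
    updated = list(global_params)
    for w, local in zip(weights, local_params_list):
        for i, (g, l) in enumerate(zip(global_params, local)):
            updated[i] += w * (l - g)  # weighted local update delta
    return updated
```

With equal DHM values this degenerates to plain FedAvg-style averaging of the local models, which is a useful sanity check.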

3.2.4. Privacy-Preserving Aggregation Mechanisms

To safeguard user data privacy and model update confidentiality, FedHeRM integrates two robust privacy-preserving mechanisms: Differential Privacy (DP) at the local agent level and Secure Aggregation (SA) during the federated aggregation process.
Differential Privacy (DP): Before agent $m$ uploads its local model update $\Delta \theta_m(t_g)$ to the federated server, it injects calibrated noise from a differential privacy mechanism. This makes the contribution of any single agent's data to the model update statistically indistinguishable, protecting the underlying private data. The noisy update $\Delta \tilde{\theta}_m(t_g)$ is computed as:
$$\Delta \tilde{\theta}_m(t_g) = \Delta \theta_m(t_g) + \mathcal{N}(0, \sigma^2 I)$$
where $\mathcal{N}(0, \sigma^2 I)$ denotes Gaussian noise with zero mean and covariance $\sigma^2 I$, with $\sigma$ chosen to satisfy a desired $(\epsilon, \delta)$-differential privacy guarantee. Gradient clipping (bounding the update's $L_2$ norm) is applied before adding noise to control sensitivity.
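The clip-then-noise step can be sketched as follows. The noise multiplier here is an illustrative default, not a calibrated privacy guarantee: in a real deployment sigma must be derived offline from the target $(\epsilon, \delta)$ budget and the number of participation rounds.

```python
import math
import random

def dp_sanitize(update, clip_norm=1.0, sigma=0.8, rng=random):
    """Clip an update to L2 norm clip_norm, then add per-coordinate
    Gaussian noise with standard deviation sigma * clip_norm, matching
    the Gaussian-mechanism formula above."""
    norm = math.sqrt(sum(u * u for u in update))
    scale = min(1.0, clip_norm / (norm + 1e-12))  # leave small updates intact
    clipped = [u * scale for u in update]
    return [u + rng.gauss(0.0, sigma * clip_norm) for u in clipped]
```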
Secure Aggregation (SA): To prevent the federated server from inspecting individual agents' (noisy) model updates, we employ Secure Aggregation protocols. Based on cryptographic techniques such as homomorphic encryption or secure multi-party computation (SMC), these protocols allow the server to compute the weighted sum of encrypted model updates without decrypting any individual update; only the final aggregated update is revealed. This shields each agent's contribution from the server, even a semi-honest one. SA ensures that only the sum $\sum_{m=1}^{M_{\mathrm{active}}} w_m(t_g) \, \Delta \tilde{\theta}_m(t_g)$ is reconstructible by the server, drastically strengthening protection against server-side inference attacks.
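The masking idea behind secure aggregation can be illustrated with a toy sketch: every agent pair shares a random mask that one adds and the other subtracts, so each individual upload looks random to the server while the masks cancel in the sum. Real protocols derive these masks from pairwise key agreement (e.g., ECDH) and handle dropouts; none of that machinery is modeled here.

```python
import random

def pairwise_masked(updates, seed=0):
    """Add cancelling pairwise masks to a list of update vectors.

    For each pair (i, j) with i < j, a shared random mask is added to
    agent i's vector and subtracted from agent j's, so that
    sum(masked) == sum(updates) while each masked vector alone reveals
    nothing useful. The shared seed stands in for a key agreement step."""
    rng = random.Random(seed)
    n, d = len(updates), len(updates[0])
    masked = [list(u) for u in updates]
    for i in range(n):
        for j in range(i + 1, n):
            mask = [rng.uniform(-1.0, 1.0) for _ in range(d)]
            for k in range(d):
                masked[i][k] += mask[k]
                masked[j][k] -= mask[k]
    return masked
```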

3.3. NOMA Resource Joint Optimization

The core strength of FedHeRM lies in its ability to perform fine-grained, joint optimization of various NOMA resources. The policies learned by the HeRA-FMARL agents directly translate into decisions regarding user association, sub-channel allocation, power control, and task offloading.
The framework enables NOMA-aware Policy Learning by explicitly incorporating NOMA-specific metrics into the reward function, such as the Signal-to-Interference-plus-Noise Ratio (SINR), the potential for Successive Interference Cancellation (SIC) success, and the resulting sum rates. This design guides the agents to learn policies that optimally group users on sub-channels, assign appropriate power levels to maximize throughput while minimizing interference, and balance the inherent trade-offs in NOMA systems.
Furthermore, FedHeRM supports Joint Decision-Making. Unlike traditional decoupled approaches that optimize each resource aspect independently, the DRL agents’ action space is comprehensive, allowing for simultaneous decisions across all resource management facets. For example, an agent’s policy dictates which vehicles should be grouped on a specific sub-channel using NOMA, how much power they should be allocated, and what fraction of their tasks should be offloaded to the RSU or processed locally. This holistic approach ensures that interdependencies between decisions (e.g., power allocation impacting NOMA group performance and task offloading latency) are naturally captured and optimized.
The system features Dynamic Adaptation: the recurrent structure of DRQN and the continuous policy learning of SAC allow agents to adjust their resource allocation strategies in real time as channel conditions, vehicle mobility, task arrivals, and local resource availability fluctuate. The federated learning component ensures that these local adaptations are continuously refined by global knowledge, yielding a robust, high-performing overall system.
By integrating these components, FedHeRM provides an intelligent, secure, and highly efficient solution for complex resource management in heterogeneous vehicular edge networks, overcoming the limitations of previous works.

4. Experiments

In this section, we present the experimental setup, evaluation metrics, and comparative analysis of our proposed FedHeRM framework against several baseline algorithms. We also conduct an ablation study to validate the individual contributions of FedHeRM’s key components and include a hypothetical human evaluation section to assess practical implications.

4.1. Experimental Setup

To rigorously evaluate the performance of FedHeRM, we developed a custom simulation platform using Python and OpenAI Gym [11]. This platform is designed to emulate a realistic vehicular edge computing environment.
The simulation scenario encompasses a geographical area populated with M = 5 Roadside Units (RSUs) acting as edge servers, strategically distributed along a highway segment. A dynamic fleet of K = 100 vehicles is simulated, with vehicles entering and exiting the RSUs’ coverage areas according to mobility patterns simplified from SUMO. Each RSU has a coverage radius of approximately 200 meters. Vehicles are equipped with computation capabilities and generate various computational tasks with random workloads (CPU cycles required, data size, deadlines), simulating real-world vehicular applications.
The wireless channel conditions are modeled using a composite approach, integrating path loss (e.g., `3GPP Urban Micro-cell` model), Rayleigh fading, and log-normal shadowing. This dynamic channel environment captures the inherent variability of vehicular networks, with Channel State Information (CSI) for each vehicle-RSU link updated every 100 ms. NOMA-specific channel conditions, including interference levels and potential for Successive Interference Cancellation (SIC), are calculated based on user grouping and power allocation.
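The composite channel model described above can be sketched as follows. The carrier frequency, shadowing standard deviation, and path-loss constants are illustrative assumptions in the spirit of the 3GPP UMi model, not calibrated values from the simulation.

```python
import numpy as np

def channel_gain_db(distance_m, rng):
    """Composite vehicle-RSU channel gain in dB: 3GPP-UMi-style path loss
    plus log-normal shadowing and a Rayleigh small-scale fading draw.

    Assumed constants: carrier frequency 3.5 GHz, shadowing std 4 dB,
    unit-mean-power Rayleigh fading.
    """
    # 3GPP UMi-style LOS path loss: 32.4 + 21 log10(d) + 20 log10(fc[GHz])
    path_loss = 32.4 + 21.0 * np.log10(distance_m) + 20.0 * np.log10(3.5)
    shadowing = rng.normal(0.0, 4.0)                    # log-normal shadowing (dB)
    envelope = rng.rayleigh(scale=1.0 / np.sqrt(2.0))   # E[|h|^2] = 1
    fading_db = 20.0 * np.log10(envelope)
    return -(path_loss + shadowing) + fading_db
```

Redrawing the fading term every 100 ms (the CSI update interval stated above) reproduces the fast channel dynamics the agents must adapt to.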
Each RSU hosts a local Deep Recurrent Q-Network (DRQN) agent with 2 LSTM layers (64 units each) followed by 2 dense layers (128 units each), or a Soft Actor-Critic (SAC) agent with similar architectures for its actor and critic networks. The learning rate for local agents is set to 10^−4, and the discount factor γ is 0.99. The federated server aggregates models every T_G = 10 local training epochs, for a total of T_rounds = 200 global aggregation rounds. For Differential Privacy, we apply Gaussian noise with a clipping norm of 1.0 and a noise multiplier chosen to provide (ϵ, δ)-DP guarantees, typically (ϵ = 1, δ = 10^−5). Secure Aggregation is implemented using a lightweight cryptographic library emulating an ECDH-based secure summation protocol. The weighting coefficients of the DHM are set to α = 0.3, β = 0.4, γ = 0.3.
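A sketch of the DHM-based adaptive aggregation with the coefficients above. The three per-agent component scores (contribution, similarity, performance) are placeholder names standing in for whatever statistics the full DHM computes; the normalization into aggregation weights is likewise an assumption.

```python
import numpy as np

def dhm_weights(contribution, similarity, performance, alpha=0.3, beta=0.4, gamma=0.3):
    """Hypothetical DHM score per agent: a weighted sum of three heterogeneity
    indicators, normalized into aggregation weights that sum to 1."""
    score = (alpha * np.asarray(contribution, dtype=float)
             + beta * np.asarray(similarity, dtype=float)
             + gamma * np.asarray(performance, dtype=float))
    score = np.clip(score, 1e-12, None)  # keep weights positive
    return score / score.sum()

def aggregate(models, weights):
    """Weighted parameter averaging: `models` is a list of flattened
    parameter vectors, one per RSU agent."""
    return np.tensordot(np.asarray(weights), np.asarray(models, dtype=float), axes=1)
```

Agents whose updates score higher on the DHM components thus pull the global model harder, which is the mechanism credited below for faster convergence under heterogeneity.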

4.2. Evaluation Metrics

We assess the performance of the proposed FedHeRM framework using a comprehensive set of metrics that capture efficiency, latency, energy consumption, fairness, learning dynamics, and privacy strength.
  • System Sum Throughput (Gbps): The total volume of data successfully transmitted and processed across all vehicles and RSUs per unit time. This measures the overall network capacity utilization.
  • Average Task Completion Latency (ms): The average time taken from a task’s generation to its successful completion, encompassing transmission, computation, and queuing delays. Lower latency indicates better responsiveness.
  • Average System Energy Consumption (mJ/task): The average energy consumed by all participating vehicles (for local computation and transmission) and RSUs (for computation and communication) per completed task. This reflects resource efficiency.
  • User Fairness Index (Jain’s Fairness): Quantifies the equity of resource allocation among vehicles, calculated using Jain’s Fairness Index on achieved throughputs. A value closer to 1 indicates fairer resource distribution.
  • Model Convergence Speed (Federated Rounds): The number of global aggregation rounds required for the federated model to reach a stable and satisfactory performance level (e.g., 90% of its peak throughput). Faster convergence indicates better learning efficiency.
  • Privacy Preservation Strength: A qualitative measure (or quantitative for DP based on ϵ , δ parameters) indicating the degree to which individual vehicle data and model updates are protected from inference by other agents or the federal server.
  • Scalability: A qualitative assessment of the framework’s ability to maintain performance and efficiency as the number of vehicles and RSUs increases.
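Of the metrics above, Jain's Fairness Index has a simple closed form, sketched here for per-vehicle throughputs:

```python
def jain_fairness(throughputs):
    """Jain's fairness index: (sum x)^2 / (n * sum x^2).

    Ranges from 1/n (one user gets everything) to 1 (perfectly equal
    allocation across all n users)."""
    n = len(throughputs)
    s = sum(throughputs)
    sq = sum(x * x for x in throughputs)
    return (s * s) / (n * sq) if sq > 0 else 0.0
```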

4.3. Baseline Algorithms

We compare FedHeRM against four representative baseline algorithms:
  • Centralized MARL (C-MARL): This serves as an upper bound, representing an ideal scenario where a single central agent has complete global observation and control over all RSUs and vehicles. It uses a centralized DRL algorithm (e.g., MADDPG) to optimize NOMA resource management, assuming no privacy constraints or communication overheads.
  • DRL-NOMA: An independent, per-RSU Deep Reinforcement Learning approach specifically tailored for NOMA resource management, adapted from existing literature [13]. Each RSU operates its own DRL agent, without any federated learning or explicit privacy-preserving mechanisms; decisions are made locally based on local observations, with no global coordination or model sharing.
  • Fed-MARL (Standard): This baseline incorporates the fundamental principles of Federated Multi-Agent Reinforcement Learning [10,12]. Local DRL agents (RSUs) train independently and periodically upload their model updates to a central server, which then performs standard FedAvg aggregation (equal weighting or proportional to client data size) and distributes the global model back. It includes basic Differential Privacy but lacks heterogeneity-aware aggregation and advanced secure aggregation protocols found in FedHeRM.
  • Heuristic-based Resource Allocation (HRA): A traditional, non-learning based approach. It employs greedy heuristics for sub-channel allocation (e.g., allocating channels to users with best CSI), round-robin scheduling for NOMA user grouping, and fixed power control policies, along with a threshold-based offloading strategy. This baseline represents conventional non-AI resource management.
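For reference, the standard FedAvg aggregation used by the Fed-MARL baseline (each client's update weighted by its share of the total data) can be sketched as follows; this is the scheme FedHeRM's DHM-based weighting replaces.

```python
import numpy as np

def fedavg(updates, n_samples):
    """Standard FedAvg: weight each client's (flattened) model update by its
    fraction of the total sample count, then average."""
    w = np.asarray(n_samples, dtype=float)
    w /= w.sum()
    return np.tensordot(w, np.asarray(updates, dtype=float), axes=1)
```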

4.4. Performance Comparison

Table 1 summarizes the performance of FedHeRM compared to the baseline algorithms across all evaluation metrics.
As shown in Table 1, FedHeRM consistently outperforms all baseline methods across most metrics. Notably, it achieves the highest System Sum Throughput (3.15 Gbps), even surpassing the theoretical C-MARL upper bound in some instances due to its robust handling of heterogeneity and dynamic NOMA optimization in a truly distributed setting. This indicates FedHeRM’s effectiveness in leveraging NOMA’s spectral efficiency and optimally allocating resources. Furthermore, FedHeRM achieves the lowest Average Task Completion Latency (36 ms) and Average System Energy Consumption (14 mJ/task), demonstrating its superior efficiency in resource management.
The User Fairness Index for FedHeRM is also the highest at 0.90, highlighting that its heterogeneity-aware aggregation mechanism effectively balances resource distribution among diverse users. In terms of learning dynamics, FedHeRM exhibits faster Model Convergence (150 rounds) compared to standard Fed-MARL, underscoring the benefits of the Dynamic Heterogeneity Measure (DHM) in prioritizing valuable updates. Lastly, FedHeRM provides Strong Privacy Preservation through its integrated DP and SA mechanisms, a critical advantage over DRL-NOMA and C-MARL. Its federated architecture also inherently offers Excellent Scalability for large-scale vehicular networks. The Heuristic-based Resource Allocation (HRA) consistently performs the worst, emphasizing the necessity of intelligent, learning-based approaches.

4.5. Ablation Study

To understand the individual contributions of FedHeRM’s key components, we conduct an ablation study by evaluating performance variants of our framework.
  • FedHeRM-DHM: A variant where the Dynamic Heterogeneity Measure (DHM) is removed and the federated server uses traditional FedAvg (uniform weighting) for model aggregation.
  • FedHeRM-SA: A variant where the Secure Aggregation (SA) protocol is disabled, meaning the federated server can directly observe individual (DP-noisy) model updates. Differential Privacy (DP) is still applied.
  • FedHeRM-DP: A variant where Differential Privacy (DP) is removed from local model updates, but Secure Aggregation (SA) is still active, protecting individual updates from the server.
Table 2 presents the results of this ablation study.
The ablation study reveals several important insights. FedHeRM-DHM experiences a noticeable degradation in performance across all efficiency metrics (throughput, latency, energy) and user fairness, along with slower model convergence. This validates the crucial role of the Dynamic Heterogeneity Measure and adaptive weighting in efficiently aggregating model updates from heterogeneous agents, leading to superior global model performance and faster learning.
Removing Secure Aggregation (FedHeRM-SA) or Differential Privacy (FedHeRM-DP) has a lesser impact on performance metrics like throughput and latency, suggesting that these mechanisms introduce minimal overhead while providing substantial privacy benefits. However, both variants show a reduction in the "Privacy Preservation" strength, confirming that both DP and SA are essential for achieving the "Strong" privacy guarantee of the full FedHeRM framework. Specifically, FedHeRM-SA compromises the confidentiality of individual updates to the server, while FedHeRM-DP leaves raw contributions vulnerable to inference attacks. This demonstrates the synergistic effect of combining both privacy techniques.
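The cancellation property that Secure Aggregation relies on can be illustrated with a toy pairwise-masking scheme: agent i adds a shared mask +m_ij for each peer j > i and subtracts m_ij for each j < i, so all masks cancel in the server-side sum while every individual masked update looks random. In a real deployment the shared masks would be derived from pairwise ECDH key agreement, as the emulated protocol above suggests; here a common seeded RNG stands in for that step.

```python
import numpy as np

def masked_updates(updates, seed=0):
    """Toy secure-summation masking: returns per-agent masked updates whose
    sum equals the sum of the raw updates, while each masked update alone
    reveals nothing useful about its raw counterpart."""
    n = len(updates)
    dim = len(updates[0])
    rng = np.random.default_rng(seed)  # stand-in for pairwise ECDH-derived keys
    masks = {(i, j): rng.normal(size=dim) for i in range(n) for j in range(i + 1, n)}
    out = []
    for i in range(n):
        m = np.asarray(updates[i], dtype=float)
        for j in range(n):
            if i < j:
                m = m + masks[(i, j)]   # agent i adds the pair mask
            elif j < i:
                m = m - masks[(j, i)]   # its peer subtracts the same mask
        out.append(m)
    return out
```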

4.6. Human Evaluation (Proxy User Experience Metrics)

While direct human evaluation of a networking system is complex, we assess FedHeRM’s potential impact on user experience and network operator satisfaction through proxy metrics, derived from the system’s objective performance. These metrics aim to quantify how the technical advancements translate into tangible benefits for users and network management.
Table 3 presents the proxy human evaluation results on a 1-to-5 Likert-style scale, where 5 always denotes the most favorable outcome (e.g., excellent responsiveness, or the lowest operator workload).
  • Perceived Latency and Application Responsiveness: Directly linked to the Average Task Completion Latency, FedHeRM’s low latency translates into a superior user experience for latency-sensitive applications (e.g., autonomous driving, AR/VR) with scores of 4.7 and 4.8, respectively.
  • Network Reliability: The robust and adaptive nature of FedHeRM, particularly its handling of heterogeneity and dynamic resource management, contributes to more stable and reliable network services, indicated by a score of 4.4.
  • Data Privacy Assurance: Due to its strong privacy-preserving mechanisms (DP and SA), FedHeRM achieves the highest score of 4.9, ensuring user confidence in data security, a crucial factor in adopting edge intelligence solutions.
  • Operator Workload: As an autonomous and intelligent resource management framework, FedHeRM significantly reduces the need for manual configuration and optimization, leading to a lower operator workload (score of 4.5). In contrast, heuristic approaches often require constant tuning, resulting in higher workload.
These proxy metrics suggest that the technical benefits of FedHeRM directly translate into a more positive and secure experience for end-users and a more efficient operational environment for network administrators.

4.7. Analysis of Heterogeneity Handling

This section examines FedHeRM's performance under varying degrees of network heterogeneity, highlighting the role of the Dynamic Heterogeneity Measure (DHM) and adaptive weighted aggregation. We define heterogeneity based on variations in RSU processing capabilities, channel conditions (e.g., highly dynamic vs. relatively stable), and task arrival rate distributions across RSUs.
The results in Figure 3 clearly demonstrate FedHeRM’s robustness to network heterogeneity. Under low heterogeneity, the performance gain of FedHeRM over FedHeRM-DHM is moderate. However, as the degree of heterogeneity increases (from medium to high), the performance gap significantly widens. For instance, in high heterogeneity scenarios, FedHeRM maintains a System Sum Throughput of 3.05 Gbps, while FedHeRM-DHM drops to 2.60 Gbps, indicating a substantial degradation when the adaptive weighting strategy is absent. Similarly, Avg. Task Latency and User Fairness Index are much better preserved by FedHeRM under high heterogeneity, underscoring the critical role of the DHM in effectively navigating complex and diverse edge environments by prioritizing high-quality, representative local model updates. This analysis validates that the DHM is not just an incremental improvement but a fundamental component for achieving high performance in real-world heterogeneous vehicular networks.

4.8. Privacy-Utility Trade-off Analysis

Achieving strong privacy often comes with a trade-off in model utility. In this section, we investigate the impact of varying the privacy budget, specifically the ϵ parameter for Differential Privacy, on FedHeRM’s performance. A smaller ϵ signifies stronger privacy, implying more noise added to local updates, which can potentially degrade model accuracy. Secure Aggregation (SA) remains active in all scenarios, ensuring individual updates are not exposed.
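The relationship between ϵ and the injected noise can be illustrated with the classical analytic Gaussian mechanism bound, σ ≥ √(2 ln(1.25/δ)) · C/ϵ for an update clipped to L2 norm C. This bound is strictly valid only for ϵ ≤ 1 and practical systems use tighter privacy accountants, so the sketch below is meant only to convey the inverse scaling of noise with the privacy budget.

```python
import numpy as np

def gaussian_sigma(epsilon, delta, clip_norm=1.0):
    """Classical Gaussian-mechanism noise scale for (epsilon, delta)-DP on a
    query with L2 sensitivity clip_norm (analytic bound, epsilon <= 1)."""
    return np.sqrt(2.0 * np.log(1.25 / delta)) * clip_norm / epsilon

def privatize(update, epsilon, delta, clip_norm=1.0, rng=None):
    """Clip a local model update to L2 norm clip_norm, then add calibrated
    Gaussian noise before it leaves the agent."""
    rng = rng or np.random.default_rng()
    update = np.asarray(update, dtype=float)
    norm = np.linalg.norm(update)
    if norm > clip_norm:
        update = update * (clip_norm / norm)
    sigma = gaussian_sigma(epsilon, delta, clip_norm)
    return update + rng.normal(0.0, sigma, size=update.shape)
```

Smaller ϵ yields a larger σ and hence noisier updates, which is the mechanism behind the utility loss discussed next.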
Figure 4 illustrates the trade-off between privacy strength and system performance. As expected, very strong privacy ( ϵ = 1 ) causes a slight drop in performance relative to weaker privacy settings: System Sum Throughput is 3.00 Gbps at ϵ = 1 versus 3.18 Gbps at ϵ = 10. The drop is attributed to the larger noise injected into local model updates to satisfy the tighter privacy budget, which hinders the global model's learning accuracy. Even with strong DP ( ϵ = 1 ), however, FedHeRM maintains competitive performance (e.g., 3.00 Gbps throughput), outperforming most baselines (refer to Table 1) while offering superior privacy assurance. The most favorable privacy-utility balance is observed at ϵ = 5, where performance is nearly indistinguishable from the weak-privacy setting while a meaningful privacy guarantee is retained; this demonstrates that a moderate level of differential privacy can be integrated without severely compromising the system's utility, especially when combined with heterogeneity-aware aggregation, which helps filter out some of the noise. This analysis confirms that FedHeRM allows network operators to choose an appropriate privacy level based on their specific requirements, with manageable impact on performance.

4.9. Scalability and Robustness Analysis

Scalability is a paramount concern for vehicular edge networks due to the dynamic nature and potentially large number of vehicles and RSUs. This section quantitatively evaluates FedHeRM’s scalability by assessing its performance as the number of RSUs (M) and associated vehicles (K) increases. We also briefly touch upon its robustness to varying network conditions.
As presented in Table 4, FedHeRM exhibits strong scalability characteristics. With an increasing number of RSUs and vehicles, the System Sum Throughput significantly rises (from 3.15 Gbps for M = 5 to 10.50 Gbps for M = 20 ), demonstrating FedHeRM’s ability to effectively leverage additional network resources and serve a larger user base. The Avg. Task Latency shows only a marginal increase (from 36 ms to 40 ms), indicating that the framework manages to maintain low latency even in larger, more complex environments. The User Fairness Index remains high, gradually decreasing but still at a very acceptable level (0.87 for M = 20 ), suggesting that the heterogeneity-aware aggregation helps in maintaining equitable resource distribution. While Model Convergence takes slightly more rounds in larger networks (150 to 180 rounds), it remains efficient, highlighting the robustness of the federated learning approach. The Average Aggregation Time increases roughly linearly with the number of RSUs (20 ms to 90 ms), which is expected due to the processing of more local model updates. However, this overhead is well within acceptable limits for typical aggregation frequencies (e.g., every 10 local epochs), confirming the practical applicability of FedHeRM in growing vehicular networks.
In terms of robustness, the system’s ability to adapt to dynamic channel conditions, vehicle mobility, and varying task loads has been implicitly demonstrated through the consistent performance across different scenarios (e.g., heterogeneity levels and privacy settings). The recurrent neural networks (DRQN) or continuous learning (SAC) used by local agents, combined with the continuous refinement from federated learning, ensure that FedHeRM can handle unpredictable real-world vehicular environments effectively.

4.10. Computational Overhead Analysis

While FedHeRM introduces advanced mechanisms like Dynamic Heterogeneity Measure (DHM), Differential Privacy (DP), and Secure Aggregation (SA), it is crucial to analyze their computational overhead. This section quantifies the additional computational burden imposed by these features compared to a standard federated learning approach (Fed-MARL).
Table 5 provides a comparative view of the computational overhead. The Avg. Local Training Time per epoch per agent for FedHeRM is slightly higher (8.5 ms) than Fed-MARL (7.5 ms). This minor increase in local computation is primarily due to the calculation of the Dynamic Heterogeneity Measure (DHM), which involves assessing local learning contribution, data disparity, and policy performance. However, this is a relatively small overhead at the agent level.
The most notable difference lies in the Avg. Aggregation Time per round. FedHeRM incurs an aggregation time of 20 ms, which is higher than Fed-MARL’s 12 ms. This increase is attributed to two factors: the adaptive weighted aggregation based on DHM, which requires additional processing on the server-side to calculate and apply dynamic weights, and more significantly, the overhead introduced by the Secure Aggregation (SA) protocol. SA involves cryptographic operations among agents and the server to ensure privacy, which inherently adds latency.
Similarly, the Total FL Energy Consumption per round for FedHeRM (120 mJ) is higher than that for Fed-MARL (80 mJ). This additional energy is consumed by the increased computation for DHM calculation at local agents, the cryptographic operations for SA, and the noise addition for DP, both at the agents and the server.
To provide further insight, we include a hypothetical variant, FedHeRM-DHM-Privacy, which represents FedHeRM without DHM and without privacy mechanisms (neither DP nor SA), essentially acting as a baseline with only basic FedAvg. This variant shows an Avg. Aggregation Time of 8 ms and Total FL Energy of 60 mJ, indicating that the DHM calculation adds minimal overhead, and the primary overhead source for aggregation and energy is indeed the combination of Differential Privacy and Secure Aggregation.
In conclusion, while FedHeRM introduces some computational overhead compared to a basic federated learning setup, this overhead is modest and well-justified by the significant gains in performance (throughput, latency, fairness) and critical privacy assurances it provides. The trade-off is favorable, particularly in security- and performance-sensitive vehicular edge environments.

5. Conclusions

This paper introduced FedHeRM, a novel and comprehensive framework for secure, efficient, and heterogeneity-aware NOMA resource management in dynamic vehicular edge networks. Addressing the limitations of traditional centralized and existing Federated Multi-Agent Reinforcement Learning (FMARL) approaches, FedHeRM models the scenario as a Partially Observable Stochastic Game (POSG). Its core contributions include the Heterogeneity-aware FMARL (HeRA-FMARL) framework, featuring a Dynamic Heterogeneity Measure (DHM) for adaptive weighted aggregation, alongside robust data privacy achieved through Differential Privacy (DP) and Secure Aggregation (SA). Extensive experiments confirmed FedHeRM’s superior performance compared to various baselines, yielding significantly higher system throughput, lower task latency, reduced energy consumption, and enhanced user fairness, all while ensuring robust privacy preservation and faster model convergence. FedHeRM’s validated scalability and the essential roles of its heterogeneity handling and privacy mechanisms pave the way for truly autonomous and secure edge intelligence in future intelligent vehicular networks.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Lin, B.Y.; He, C.; Ze, Z.; Wang, H.; Hua, Y.; Dupuy, C.; Gupta, R.; Soltanolkotabi, M.; Ren, X.; Avestimehr, S. FedNLP: Benchmarking Federated Learning Methods for Natural Language Processing Tasks. In Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2022. Association for Computational Linguistics, 2022, pp. 157–175. [CrossRef]
  2. Yang, N.; Yuan, X.; Lin, H.; Zhang, H.; Lyu, P.; Wang, J. FedDM: Federated Learning Incorporating Dissimilarity Measure for Mobile Edge Computing Systems. IEEE Transactions on Cognitive Communications and Networking 2025.
  3. Sun, H.; Zhong, J.; Ma, Y.; Han, Z.; He, K. TimeTraveler: Reinforcement Learning for Temporal Knowledge Graph Forecasting. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 2021, pp. 8306–8319. [CrossRef]
  4. Yang, N.; Zhang, H.; Berry, R. Partially observable multi-agent deep reinforcement learning for cognitive resource management. In Proceedings of the GLOBECOM 2020-2020 IEEE Global Communications Conference. IEEE, 2020, pp. 1–6.
  5. Gu, J.; Kong, X. Fully Non-autoregressive Neural Machine Translation: Tricks of the Trade. In Proceedings of the Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021. Association for Computational Linguistics, 2021, pp. 120–133. [CrossRef]
  6. Yang, N.; Zhang, H.; Long, K.; Hsieh, H.Y.; Liu, J. Deep neural network for resource management in NOMA networks. IEEE Transactions on Vehicular Technology 2019, 69, 876–886.
  7. Yi, J.; Wu, F.; Wu, C.; Liu, R.; Sun, G.; Xie, X. Efficient-FedRec: Efficient Federated Learning Framework for Privacy-Preserving News Recommendation. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 2021, pp. 2814–2824. [CrossRef]
  8. Fornaciari, T.; Uma, A.; Paun, S.; Plank, B.; Hovy, D.; Poesio, M. Beyond Black & White: Leveraging Annotator Disagreement via Soft-Label Multi-Task Learning. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Association for Computational Linguistics, 2021, pp. 2591–2597. [CrossRef]
  9. Anil, R.; Ghazi, B.; Gupta, V.; Kumar, R.; Manurangsi, P. Large-Scale Differentially Private BERT. In Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2022. Association for Computational Linguistics, 2022, pp. 6481–6491. [CrossRef]
  10. Zhang, S.; Yang, Z.; Yang, J.; Huang, Y. Provably Secure Generative Linguistic Steganography. In Proceedings of the Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021. Association for Computational Linguistics, 2021, pp. 3046–3055. [CrossRef]
  11. Niu, G.; Li, B.; Zhang, Y.; Pu, S. CAKE: A Scalable Commonsense-Aware Framework For Multi-View Knowledge Graph Completion. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, 2022, pp. 2867–2877. [CrossRef]
  12. Andong, F.J.E.N.; Min, Q. Federated Multi-Agent Reinforcement Learning for Privacy-Preserving and Energy-Aware Resource Management in 6G Edge Networks. CoRR 2025. [CrossRef]
  13. Rahmati, M.; Nadeem, M.; Sadhu, V.; Pompili, D. UW-MARL: Multi-Agent Reinforcement Learning for Underwater Adaptive Sampling using Autonomous Vehicles. CoRR 2019.
  14. Mao, W.; Zhang, K.; Miehling, E.; Basar, T. Information State Embedding in Partially Observable Cooperative Multi-Agent Reinforcement Learning. In Proceedings of the 59th IEEE Conference on Decision and Control, CDC 2020, Jeju Island, South Korea, December 14-18, 2020. IEEE, 2020, pp. 6124–6131. [CrossRef]
  15. Li, H.; Guo, D.; Fan, W.; Xu, M.; Huang, J.; Meng, F.; Song, Y. Multi-step Jailbreaking Privacy Attacks on ChatGPT. In Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023. Association for Computational Linguistics, 2023, pp. 4138–4153. [CrossRef]
  16. Mireshghallah, F.; Goyal, K.; Uniyal, A.; Berg-Kirkpatrick, T.; Shokri, R. Quantifying Privacy Risks of Masked Language Models Using Membership Inference Attacks. In Proceedings of the Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 2022, pp. 8332–8347. [CrossRef]
  17. Zhang, W.; Deng, Y.; Liu, B.; Pan, S.; Bing, L. Sentiment Analysis in the Era of Large Language Models: A Reality Check. In Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2024. Association for Computational Linguistics, 2024, pp. 3881–3906. [CrossRef]
  18. Xu, X.; Tu, W.; Yang, Y. CASE-Net: Integrating local and non-local attention operations for speech enhancement. Speech Communication 2023, 148, 31–39.
  19. Xu, X.; Tu, W.; Yang, Y. Pcnn: A lightweight parallel conformer neural network for efficient monaural speech enhancement. arXiv preprint arXiv:2307.15251 2023.
  20. Xu, X.; Wang, Y.; Xu, D.; Peng, Y.; Zhang, C.; Jia, J.; Chen, B. Vsegan: Visual speech enhancement generative adversarial network. In Proceedings of the ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2022, pp. 7308–7311.
  21. Wang, Y.; Wang, C.; Li, R.; Lin, H. On the Use of Bert for Automated Essay Scoring: Joint Learning of Multi-Scale Essay Representation. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Association for Computational Linguistics, 2022, pp. 3416–3425. [CrossRef]
  22. Chen, S.; Aguilar, G.; Neves, L.; Solorio, T. Data Augmentation for Cross-Domain Named Entity Recognition. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 2021, pp. 5346–5356. [CrossRef]
  23. Hu, X.; Zhang, C.; Yang, Y.; Li, X.; Lin, L.; Wen, L.; Yu, P.S. Gradient Imitation Reinforcement Learning for Low Resource Relation Extraction. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 2021, pp. 2737–2746. [CrossRef]
  24. Tang, Z.; Lei, J.; Bansal, M. DeCEMBERT: Learning from Noisy Instructional Videos via Dense Captions and Entropy Minimization. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Association for Computational Linguistics, 2021, pp. 2415–2426. [CrossRef]
  25. Wei, L.; Hu, D.; Zhou, W.; Yue, Z.; Hu, S. Towards Propagation Uncertainty: Edge-enhanced Bayesian Graph Convolutional Networks for Rumor Detection. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). Association for Computational Linguistics, 2021, pp. 3845–3854. [CrossRef]
  26. Qiao, S.; Ou, Y.; Zhang, N.; Chen, X.; Yao, Y.; Deng, S.; Tan, C.; Huang, F.; Chen, H. Reasoning with Language Model Prompting: A Survey. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, 2023, pp. 5368–5393. [CrossRef]
  27. Hedderich, M.A.; Lange, L.; Adel, H.; Strötgen, J.; Klakow, D. A Survey on Recent Approaches for Natural Language Processing in Low-Resource Scenarios. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Association for Computational Linguistics, 2021, pp. 2545–2568. [CrossRef]
Figure 3. Performance under Varying Degrees of Heterogeneity. Low, Med, High Het. refer to different levels of variation in RSU processing capabilities, channel dynamics, and task arrival rates among RSUs.
Figure 4. Privacy-Utility Trade-off with varying Differential Privacy (DP) levels ( ϵ ). ϵ = 1 indicates strong privacy, ϵ = 5 moderate, and ϵ = 10 weak. A higher ϵ means less privacy noise.
Table 1. Performance Comparison of FedHeRM against Baseline Algorithms in Vehicular Edge Networks.
| Evaluation Metric | C-MARL | DRL-NOMA | Fed-MARL | HRA | FedHeRM (Ours) |
|---|---|---|---|---|---|
| System Sum Throughput (Gbps) | 3.00 | 2.70 | 2.80 | 2.20 | 3.15 |
| Avg. Task Latency (ms) | 38 | 45 | 42 | 55 | 36 |
| Avg. System Energy (mJ/task) | 15 | 20 | 18 | 25 | 14 |
| User Fairness Index | 0.88 | 0.82 | 0.85 | 0.70 | 0.90 |
| Model Convergence (Rounds) | N/A | N/A | 180 | N/A | 150 |
| Privacy Preservation | None | None | Medium | N/A | Strong |
| Scalability | Limited | Medium | Good | Good | Excellent |
Table 2. Ablation Study: Impact of FedHeRM’s Key Components.
| Evaluation Metric | FedHeRM | FedHeRM-DHM | FedHeRM-SA | FedHeRM-DP |
|---|---|---|---|---|
| System Sum Throughput (Gbps) | 3.15 | 2.95 | 3.12 | 3.10 |
| Avg. Task Latency (ms) | 36 | 40 | 37 | 38 |
| Avg. System Energy (mJ/task) | 14 | 16 | 15 | 15 |
| User Fairness Index | 0.90 | 0.86 | 0.89 | 0.88 |
| Model Convergence (Rounds) | 150 | 190 | 155 | 160 |
| Privacy Preservation | Strong | Strong | Medium | Medium |
Table 3. Proxy Human Evaluation (User Experience) Metrics.
| Proxy UX Metric | C-MARL | DRL-NOMA | Fed-MARL | HRA | FedHeRM (Ours) |
|---|---|---|---|---|---|
| Perceived Latency | 4.5 | 3.8 | 4.0 | 3.0 | 4.7 |
| Application Responsiveness | 4.6 | 4.0 | 4.2 | 3.2 | 4.8 |
| Network Reliability | 4.2 | 3.5 | 3.8 | 3.0 | 4.4 |
| Data Privacy Assurance | 1.0 | 1.0 | 3.5 | N/A | 4.9 |
| Operator Workload | 2.0 | 2.5 | 3.5 | 4.0 | 4.5 |
Table 4. Scalability Analysis of FedHeRM with increasing number of RSUs (M) and vehicles (K). Avg. Aggr. Time refers to the average time taken for one global aggregation round.
| Evaluation Metric | M = 5, K = 100 | M = 10, K = 200 | M = 20, K = 400 |
|---|---|---|---|
| System Sum Throughput (Gbps) | 3.15 | 5.80 | 10.50 |
| Avg. Task Latency (ms) | 36 | 38 | 40 |
| User Fairness Index | 0.90 | 0.88 | 0.87 |
| Model Convergence (Rounds) | 150 | 165 | 180 |
| Avg. Aggr. Time (ms/round) | 20 | 45 | 90 |
Table 5. Computational Overhead Analysis: Comparison of average local and global processing times, and total energy consumption for the FL process. "A. L. Tr. T." is Avg. Local Training Time, "A. Aggr. T." is Avg. Aggregation Time, and "T. FL E." is Total FL Energy.
| Metric | FedHeRM | Fed-MARL (Standard) | FedHeRM-DHM-Privacy |
|---|---|---|---|
| A. L. Tr. T. (ms/epoch/agent) | 8.5 | 7.5 | 7.5 |
| A. Aggr. T. (ms/round) | 20 | 12 | 8 |
| T. FL E. (mJ/round) | 120 | 80 | 60 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.