1. Introduction
Federated learning (FL), as defined by 3GPP [1] and IEEE [2,3], represents an innovative paradigm in machine learning that addresses the challenges of decentralized data environments [4]. The integration of artificial intelligence (AI) into the 3GPP framework has reached a significant milestone, with ongoing studies and specifications aimed at enhancing data utilization and predictive capabilities expected to be featured in Release 19 and Release 20. This development underscores 3GPP's commitment to leveraging advanced analytical techniques to improve telecommunications standards. Within this context, both the Technical Specification Group for Radio Access Networks (TSG RAN) and the Technical Specification Group for Service and System Aspects (TSG SA) have outlined specific requirements for incorporating AI and its companion, machine learning (ML).
Notably, Working Group RAN3 has finalized Technical Report 37.817, which investigates enhancements for data collection for New Radio (NR) and E-UTRA-NR Dual Connectivity (EN-DC). This report highlights three primary use cases where AI and ML can provide meaningful solutions. The first is Network Energy Saving, which covers strategies such as traffic offloading, coverage modification, and cell deactivation to reduce energy consumption. The second is Load Balancing, which optimizes load distribution across cells or groups of cells in multi-frequency and multi-radio access technology (RAT) environments, improving overall network performance based on predictive analytics. The third is Mobility Optimization, which ensures robust network performance during user equipment (UE) mobility events by selecting optimal mobility targets based on predicted service delivery. These focal points illustrate a strategic approach to evolving network management through intelligent data-driven decision-making.
The federated learning architecture is supported in the 3GPP standard by the Network Data Analytics Function (NWDAF) [5]. The NWDAF collects data from user equipment, network functions, and operations, administration, and maintenance (OAM) systems within the 5G Core, cloud, and edge networks. This data is then used for 5G analytics, enabling better insights and actions to enhance the overall end-user experience. In any case, federated edge learning (FEEL) shall comply with the existing NWDAF framework and 5G System (5GS) framework as specified in [6,7,8,9]. AI and ML over NWDAF have become pivotal technologies driving advancements in wireless communication networks, offering solutions to enhance the efficiency, scalability, and performance of modern network infrastructures. Recognizing their potential, the Third Generation Partnership Project (3GPP) has integrated AI/ML technologies into the Radio Access Network (RAN) as part of Release 18, marking the beginning of 5G-Advanced [10]. This milestone represents a significant step forward in optimizing network capabilities to accommodate the rapid proliferation of massive Internet-of-Things (IoT) devices and the exponential growth in data traffic. The inclusion of AI/ML in Release 18 underscores its central role in enabling intelligent resource allocation, optimizing network performance, and addressing the challenges posed by increasing IoT connectivity. By leveraging NWDAF-based AI/ML, 5G-Advanced aims to deliver superior reliability, efficiency, and adaptability to meet the evolving demands of modern communication systems.
The Management Data Analytics Function (MDAF) serves as a fundamental component for enabling network automation and intelligence by processing data on network conditions and service events to generate detailed analytics reports. This function utilizes input data from various network functions (e.g., NWDAF) and entities (e.g., 6G gNB or 5G gNB). MDAF deployment is versatile, operating at different levels such as the domain level—targeting specific areas like the Radio Access Network (RAN) or core network—and in a centralized configuration to deliver comprehensive, end-to-end or cross-domain analytics [
11]. Efforts within 3GPP to integrate AI/ML and advanced data analytics into 5G system design, prior to Release 18, have established a robust framework for further development in 5G-Advanced. Release 18 incorporates an extensive range of studies and work items related to AI/ML, involving contributions across multiple 3GPP working groups, thereby paving the way for enhanced capabilities in network optimization and intelligence, as indicated in
Figure 1 [
11].
AI and ML technologies play a crucial role in mobile devices within the 5G ecosystem, supporting functionalities such as image recognition, speech processing, and video analysis. However, preloading all possible AI/ML models onto user equipment (UE) is impractical. As a result, models often need to be downloaded dynamically based on specific requirements. Additionally, some UEs may lack the computational resources needed to perform inference operations locally, necessitating the offloading of these tasks to the 5G cloud or edge infrastructure. Furthermore, collaborative training of global AI/ML models across multiple entities in the 5G framework requires efficient mechanisms for sharing training data. The growing demand for transferring AI/ML models and data introduces a new category of network traffic that 5G systems must accommodate. The 3GPP SA1 group, tasked with defining service and performance requirements for 3GPP systems, initiated a study in Release 18 to explore use cases and establish the requirements for AI/ML model transfers [
12]. This study identified three key types of AI/ML operations, as shown in
Figure 2 [
11].
The first type, AI/ML Operation Splitting, involves dividing tasks between endpoints. Privacy-sensitive or latency-critical components of operations are retained within the UE, while computationally intensive tasks are offloaded to network endpoints. The second type, Model and Data Distribution, focuses on enabling adaptive downloading of models from network endpoints to UEs as needed. The third type, Distributed or Federated Learning, allows UEs to perform partial training on local datasets, with a central entity aggregating these results to form a unified global model [
13]. The study identified potential service requirements and performance metrics, including those related to training, inference, distribution, monitoring, prediction, and management of AI/ML models within the 5G ecosystem. Following this initial exploration, 3GPP SA1 launched a subsequent work item in Release 18 to define normative service and performance requirements, building on the findings of the study to address the evolving demands of AI/ML integration in 5G systems [
13]. Moreover the 3GPP [
14] study aims to lay the groundwork for leveraging AI/ML to enhance the air interface, addressing multiple critical dimensions. Key areas of focus include defining the stages of AI/ML algorithm deployment, determining the degree of collaboration required between the gNB and UE, identifying the datasets necessary for training, validating, and testing AI/ML models, and managing the entire life cycle of these models. These efforts are essential for ensuring that AI/ML technologies can be effectively integrated into current and future network architectures, setting the stage for 6G.
Unlike traditional machine learning approaches that rely on centrally aggregated data, FEEL enables multiple entities, often referred to as clients, to collaboratively train a shared model while their data remains local. This approach is particularly significant where data privacy, security, and regulatory compliance are critical concerns, as in the telecommunications sector. A distinguishing characteristic of FEEL is data heterogeneity: in decentralized settings, data samples across clients are not guaranteed to be independent and identically distributed (IID), and are typically non-IID. This stands in contrast to centralized training, where IID data is often assumed. This inherent heterogeneity poses unique challenges and calls for tailored algorithms that learn effectively across diverse data distributions. A key motivation for federated learning is its potential to address data minimization, especially in fields where data privacy, bandwidth efficiency, and throughput optimization are critical. By training models locally on client devices or nodes and sharing only model parameters, such as weights and biases, FEEL minimizes the need for raw data exchange [
15]. This not only reduces privacy risks but also mitigates bandwidth constraints, making it an appealing solution for large-scale, distributed systems like telecommunications networks.
At the core of the FEEL paradigm is a collaborative training process that iteratively combines local computations into a global model. Each client trains a model on its local data and periodically transmits the updated parameters to a central aggregator. The aggregator then consolidates these updates to refine the global model, which is subsequently shared back with the clients. This iterative process continues until the model achieves a predefined level of performance; promising examples in 6G networks are discussed in [
16]. The potential of FEEL extends far beyond data privacy. It aligns seamlessly with the growing emphasis on distributed computing and edge intelligence in modern network architectures. For example, in telecommunication networks, FEEL can facilitate real-time optimization of network resources, enhance service delivery, and drive innovation in predictive maintenance and user behavior analytics [
17]. However, implementing FEEL in practice introduces several challenges, including communication overhead, computational constraints at edge devices, and the need for robust algorithms to handle non-IID data. Addressing these challenges requires interdisciplinary efforts that span machine learning, distributed computing, and network optimization [
18].
Recent advances in FEEL have explored techniques to improve communication efficiency, such as model compression and adaptive update mechanisms. Additionally, privacy-preserving technologies, including secure aggregation and differential privacy, are increasingly being integrated into FEEL frameworks to ensure that sensitive information remains protected throughout the training process [
19]. In 6G networks and in mobile communications, FEEL holds promise for transforming network management and optimization. By leveraging localized data at various network nodes, operators can enhance coverage, capacity, and user experience while adhering to stringent privacy regulations. Furthermore, FEEL aligns with the broader trend toward 6G networks, which emphasize edge intelligence, data efficiency, and distributed learning [
20]. The significance of FEEL is underscored by its applicability across a diverse range of domains, including healthcare, finance, and industrial automation. In healthcare, for instance, FEEL enables collaborative training of diagnostic models across hospitals without exposing sensitive patient data. Similarly, in finance, it allows institutions to develop fraud detection models without sharing proprietary transaction data.
A number of different algorithms for federated optimization have been proposed. Deep learning training typically relies on variants of stochastic gradient descent (SGD), where gradients are computed on a randomly selected portion of the dataset and then used to update the model through a single gradient descent step. Federated stochastic gradient descent (FedSGD) adapts this process to the federated setting by distributing the computation across multiple nodes. A random fraction C of the available nodes is selected, and each selected node uses its entire local dataset to compute gradients. A central server then aggregates these gradients, weighted by the number of training samples on each node, and applies the combined gradient to the global model in a single gradient descent step [21].
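The FedSGD round described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation from [21]: the one-parameter linear model, the toy client datasets, the learning rate, and the function names `local_gradient` and `fedsgd_round` are all assumptions introduced here.

```python
import random

def local_gradient(w, data):
    """Full-batch gradient of the mean squared error for the toy model y = w * x."""
    return sum(2.0 * (w * x - y) * x for x, y in data) / len(data)

def fedsgd_round(w, clients, fraction, lr, rng):
    """One FedSGD round: sample a fraction C of the clients, average their
    full-batch gradients weighted by sample count, take one global step."""
    m = max(1, round(fraction * len(clients)))
    selected = rng.sample(clients, m)
    total = sum(len(d) for d in selected)
    agg = sum(len(d) * local_gradient(w, d) for d in selected) / total
    return w - lr * agg

# Hypothetical toy data: three clients of unequal size, all observing y = 3x.
clients = [
    [(1.0, 3.0), (2.0, 6.0)],
    [(0.5, 1.5)],
    [(1.5, 4.5), (3.0, 9.0), (2.5, 7.5)],
]
rng = random.Random(0)
w = 0.0
for _ in range(200):
    w = fedsgd_round(w, clients, fraction=0.67, lr=0.05, rng=rng)
print(w)  # approaches the true slope of 3
```

Because every client takes exactly one gradient evaluation per round, each round costs one uplink transmission per selected client for a single global step.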
Another algorithm, Federated Averaging (FedAvg), builds upon FedSGD by allowing local nodes to perform multiple updates on their local datasets before sharing their parameters with the central server. Unlike FedSGD, where the exchanged information consists of gradients, FedAvg aggregates the locally updated model weights directly [22]. The underlying insight is that, when all local models start from the same initialization and take a single local step, averaging their gradients as in FedSGD is equivalent to averaging the resulting model weights. FedAvg goes further by averaging the weights of models that have been tuned locally over several steps, a process that maintains, and can enhance, the performance of the aggregated global model. This enhancement arises because the averaging process captures the learning progress made by each node, even with heterogeneous local data distributions. This approach is consistent with findings in distributed machine learning research, where locally adjusted weights retain key features learned during training. Notably, studies report that FedAvg often converges faster and with lower communication overhead than FedSGD, making it a preferred method in many real-world federated learning scenarios.
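The relation between FedAvg and FedSGD can be checked numerically. The sketch below is illustrative only: the toy model, the two heterogeneous client datasets, and the helper names (`grad`, `local_train`, `fedavg_round`, `fedsgd_step`) are assumptions made for this example and are not taken from [22]. With a single local step the two rules coincide; with several local steps FedAvg makes more progress per communication round.

```python
def grad(w, data):
    """Full-batch gradient of the mean squared error for the toy model y = w * x."""
    return sum(2.0 * (w * x - y) * x for x, y in data) / len(data)

def local_train(w, data, lr, steps):
    """Run several local gradient steps starting from the global weight."""
    for _ in range(steps):
        w = w - lr * grad(w, data)
    return w

def fedavg_round(w_global, clients, lr, steps):
    """Clients train locally; the server averages the returned weights,
    weighted by the local sample count."""
    total = sum(len(d) for d in clients)
    return sum(len(d) * local_train(w_global, d, lr, steps) for d in clients) / total

def fedsgd_step(w_global, clients, lr):
    """Server averages the client gradients and takes one global step."""
    total = sum(len(d) for d in clients)
    g = sum(len(d) * grad(w_global, d) for d in clients) / total
    return w_global - lr * g

# Hypothetical heterogeneous clients: one follows y = 2x, the other y = 4x.
clients = [
    [(1.0, 2.0), (2.0, 4.0)],
    [(1.0, 4.0), (0.5, 2.0), (2.0, 8.0)],
]
w_one = fedavg_round(0.0, clients, lr=0.05, steps=1)   # one local step
w_sgd = fedsgd_step(0.0, clients, lr=0.05)             # same result as w_one
w_five = fedavg_round(0.0, clients, lr=0.05, steps=5)  # more progress per round
```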
Federated Learning with Dynamic Regularization (FedDyn) addresses a critical challenge in federated learning—handling heterogeneous data distributions across devices. When device datasets are non-identically distributed, minimizing individual device loss functions does not necessarily align with minimizing the overarching global loss function. Recognizing this issue, a new algorithm was introduced to mitigate the adverse effects of data heterogeneity [
23]. FedDyn introduces a dynamic regularization mechanism to adjust each device’s local loss function, ensuring that the aggregated modifications contribute effectively to the global loss minimization. By aligning local losses with the global objective, FedDyn becomes robust to varying degrees of heterogeneity, enabling devices to perform full optimization locally without sacrificing overall model convergence. Theoretical analysis confirms that FedDyn achieves convergence to a stationary point for non-convex loss functions, regardless of the heterogeneity level. These theoretical guarantees are supported by extensive experimental evaluations across diverse datasets, demonstrating its efficacy and reliability.
Dynamic Aggregation Using Inverse Distance Weighting (IDA) is an adaptive technique designed to address the challenges of unbalanced and non-independent and identically distributed (non-IID) data in federated learning environments. This method assigns weights to clients dynamically based on meta-information, with a focus on improving both the robustness and efficiency of model aggregation [
24]. The key principle of IDA lies in leveraging the distance between model parameters. By using this distance as a weighting factor, the method reduces the influence of outlier models, which can arise due to significant data distribution differences or irregular client behavior. This strategy not only mitigates the negative impact of outliers but also enhances the global model’s convergence speed by ensuring that updates from more representative or reliable clients carry greater significance in the aggregation process. Extensive studies in the field of federated learning underscore the importance of such dynamic aggregation methods, especially when dealing with real-world datasets characterized by heterogeneity and imbalances. By prioritizing model updates that are closer in parameter space, IDA aligns closely with the global optimization objective, effectively addressing the divergence issues often observed in federated systems. The integration of IDA into federated frameworks demonstrates promising results, as evidenced by improved model accuracy and faster convergence rates across various experimental setups. This method represents a significant step toward more adaptive and resilient federated learning systems, where the quality and relevance of client contributions are dynamically optimized.
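The weighting principle can be illustrated with a short sketch. It makes several assumptions that are not stated in the text: distances are Euclidean, they are measured from each client's parameters to the plain average of all client parameters, and a small `eps` guards against division by zero. The exact formulation in [24] may differ, and the function name `ida_aggregate` is hypothetical.

```python
import math

def ida_aggregate(client_params, eps=1e-8):
    """Aggregate client parameter vectors with inverse-distance weights.

    Clients whose parameters lie far from the plain average (potential
    outliers) receive smaller weights in the final aggregate.
    """
    n = len(client_params)
    dim = len(client_params[0])
    mean = [sum(p[j] for p in client_params) / n for j in range(dim)]
    inv = [1.0 / (math.dist(p, mean) + eps) for p in client_params]
    z = sum(inv)
    weights = [v / z for v in inv]  # normalized so the weights sum to 1
    agg = [sum(wt * p[j] for wt, p in zip(weights, client_params))
           for j in range(dim)]
    return agg, weights

# Three similar clients and one outlier (toy values).
params = [[1.0, 1.0], [1.1, 0.9], [0.9, 1.1], [5.0, 5.0]]
agg, weights = ida_aggregate(params)
```

In this toy case the outlier receives the smallest weight, so the aggregate stays closer to the three similar clients than the plain average would.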
The combination of federated learning with optimization algorithms has also been studied in the literature. The study in [
25] addresses a federated learning (FL) scenario operating over wireless channels, explicitly considering coding rates and packet transmission errors. The communication channels are modeled as packet erasure channels (PEC), where the probability of packet erasure is influenced by factors such as block length, coding rate, and signal-to-noise ratio (SNR). To mitigate the adverse effects of packet erasure on FL performance, two optimization strategies are introduced: the central node (CN) either utilizes past local updates or reverts to previous global parameters in instances of packet loss. The mathematical analysis explores the relationship between coding rates and FL convergence under both short-packet and long-packet communication regimes with transmission errors. Simulation results demonstrate that even incorporating minimal memory at the CN—such as retaining one prior update—significantly enhances FL performance in the presence of transmission errors.
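The first of the two strategies, reusing past local updates at the CN, can be sketched as follows. This is a simplified illustration under stated assumptions: updates are scalars, erasures are given as a boolean mask rather than derived from block length, coding rate, and SNR, and the function name `aggregate_with_memory` is hypothetical. It is not the scheme analyzed in [25].

```python
def aggregate_with_memory(updates, received, memory):
    """Average the client updates available at the central node (CN).

    updates[k]  : update sent by client k in this round
    received[k] : False if the packet from client k was erased
    memory[k]   : last update successfully received from client k, or None
    """
    used = []
    for k, (update, ok) in enumerate(zip(updates, received)):
        if ok:
            memory[k] = update       # refresh the stored copy
        if memory[k] is not None:
            used.append(memory[k])   # fresh update, or the stale one on erasure
    return sum(used) / len(used) if used else None

memory = [None, None, None]          # one slot of memory per client
round1 = aggregate_with_memory([1.0, 2.0, 3.0], [True, True, False], memory)
round2 = aggregate_with_memory([2.0, 4.0, 6.0], [False, True, True], memory)
```

When the function returns `None` (nothing has ever been received), a caller could fall back to the second strategy mentioned above and keep the previous global parameters.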
The following study [
26] addresses the critical challenge of unreliable communication in decentralized federated learning by introducing the Soft-DSGD algorithm. Unlike traditional federated learning approaches, which depend on a central node for parameter aggregation, decentralized methods allow devices to exchange model updates directly. However, existing frameworks for decentralized learning often assume idealized conditions with perfect communication among devices. In such scenarios, devices are expected to reliably exchange information, such as gradients or model parameters, without any loss or error. Unfortunately, real-world communication networks are rarely this reliable, as they are susceptible to issues like packet loss and transmission errors. Thus ensuring communication reliability often comes at a significant cost. The most common solution involves using robust transport layer protocols such as Transmission Control Protocol (TCP). While TCP ensures reliable data transmission, it introduces substantial communication overhead, which can degrade the efficiency of the decentralized learning process. Furthermore, TCP reduces the overall network connectivity, as the stringent requirements for reliable communication can limit the number of participating devices. To address these challenges, this study [
26] proposes a robust decentralized stochastic gradient descent (SGD) algorithm, referred to as Soft-DSGD. Unlike conventional approaches, Soft-DSGD is designed to operate effectively over lightweight and unreliable communication protocols, such as the User Datagram Protocol (UDP). UDP is a connectionless protocol that provides faster and more flexible communication but does not guarantee reliable packet delivery. This protocol is particularly well-suited for decentralized learning in scenarios where communication efficiency and low latency are critical. By leveraging lightweight protocols like UDP and adopting innovative techniques to handle partial messages, Soft-DSGD offers a practical solution for training machine learning models in decentralized environments. Its ability to maintain robust performance under realistic conditions positions it as a valuable tool for advancing federated learning in the era of edge computing and Internet-of-Things (IoT) applications.
In [
27] the authors presented a robust solution for decentralized learning in dynamic and unreliable wireless environments. The proposed approach is specifically tailored for decentralized learning in wireless networks characterized by random, time-varying communication topologies. In these networks, participating devices may experience communication impairments, and some devices can become stragglers—failing to meet computational or communication demands—at any point during the training process. To mitigate the impact of these challenges, the algorithm incorporates a novel consensus strategy. This strategy leverages time-varying mixing matrices that dynamically adjust based on the instantaneous state of the network. By adapting to the current network topology, the algorithm ensures robust communication and improves the overall efficiency of the learning process. By addressing the limitations of prior frameworks, the asynchronous DSGD algorithm enables efficient and reliable collaborative training, paving the way for scalable decentralized learning applications in real-world wireless networks.
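A time-varying mixing matrix can be illustrated with a small consensus example. The sketch rebuilds the matrix in every round from the links that are currently active, using Metropolis weights; this weight rule, the toy topologies, and the function names are assumptions made for illustration and are not necessarily the construction used in [27]. Only the consensus (mixing) step is shown; the local gradient step of DSGD is omitted.

```python
def metropolis_matrix(n, links):
    """Symmetric, doubly stochastic mixing matrix for the active links."""
    deg = [0] * n
    for i, j in links:
        deg[i] += 1
        deg[j] += 1
    W = [[0.0] * n for _ in range(n)]
    for i, j in links:
        W[i][j] = W[j][i] = 1.0 / (1 + max(deg[i], deg[j]))
    for i in range(n):
        W[i][i] = 1.0 - sum(W[i])    # isolated (straggler) nodes keep their value
    return W

def mix(W, x):
    """One consensus step: every node averages over its current neighbors."""
    n = len(x)
    return [sum(W[i][j] * x[j] for j in range(n)) for i in range(n)]

# Hypothetical topologies; in the second one, nodes 0 and 3 are stragglers.
topologies = [[(0, 1), (2, 3)], [(1, 2)], [(0, 3), (1, 2)]]
x = [0.0, 4.0, 8.0, 12.0]            # local model values, mean = 6
for t in range(30):
    x = mix(metropolis_matrix(4, topologies[t % 3]), x)
```

Because each matrix is doubly stochastic, the network-wide average is preserved in every round, and the nodes reach agreement as long as the union of the topologies over time is connected.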
Our work is motivated by the observation that retransmission mechanisms are integral to many modern wireless communication standards, such as 3GPP 5G-Advanced and IEEE Wi-Fi. However, while Hybrid Automatic Repeat Request (HARQ) retransmission has been extensively explored in traditional communication systems, its application in distributed learning remains relatively under-researched. For instance, the work in [28] presents a statistical quality-of-service (QoS) analysis for a block-fading device-to-device (D2D) communication link within a multi-tier cellular network comprising a macro base station (BSMC) and a micro base station (BSmC), both operating in full-duplex (FD) mode. The effective capacity (EC) is computed for the D2D link, assuming no channel state information (CSI) at the transmitting D2D node, which operates at a fixed transmission rate and power. The communication link is modeled as a six-state Markov system under both overlay and underlay configurations. To enhance throughput, the study incorporates HARQ and truncated HARQ schemes, along with two queue models based on responses to decoding failures. Simulation results reveal that superior self-interference cancellation at BSmC and BSMC in FD mode enhances EC. However, no similar analysis exists for federated learning in 6G networks with multiple collaborative devices.
The work in [29] explores the rapid expansion of AI- and ML-driven applications, together with the related challenges, and the growing adoption of distributed intelligence solutions that leverage the computational capabilities of cloud infrastructure, edge nodes, and end devices to increase overall processing power, meet diverse application demands, and optimize system performance. It critically analyzes key research advancements across the distributed intelligence landscape; however, the importance of HARQ retransmissions at the MAC layer (Layer 2) is not fully explored. Moreover, a semantic-aware HARQ (SemHARQ) framework for robust and efficient transmission of semantic features is introduced in [30]. A multi-task semantic encoder enhances semantic coding robustness, while a feature importance ranking (FIR) method prioritizes the delivery of critical features under constrained channel resources. Additionally, a novel feature distortion evaluation (FDE) network detects transmission errors and supports an efficient HARQ scheme by retransmitting corrupted features with incremental updates. However, MAC-layer HARQ retransmissions for cooperative federated learning devices are neither mentioned nor studied.
The closest HARQ-related analysis for federated learning is in [31], which introduces a Federated Edge Learning (FEEL) framework designed to address the challenges posed by unreliable wireless channels, where gradients from local devices are divided into packets that are subject to a packet error rate (PER). Unreliable transmissions introduce a bias between the actual and theoretical global gradients, adversely affecting model training. A mathematical analysis evaluates the impact of PER on convergence rate and communication cost, and an optimized device retransmission selection scheme is proposed, based on a classical convex optimization solution obtained through the Karush-Kuhn-Tucker (KKT) conditions, balancing convergence performance against communication overhead. The paper derives the optimal retransmission strategy to enhance model training efficiency and provides an analysis of its effectiveness. Additionally, a signaling protocol is developed to support the proposed retransmission scheme, ensuring robust and efficient model training under imperfect communication conditions.
The motivation of our work is to examine the implications of retransmission strategies for distributed learning, focusing on balancing the dual objectives of reliability (throughput) and timeliness to optimize performance in diverse communication environments. Our study emphasizes the challenges of unreliable wireless channels, considering the timeliness of data transmission as in [31], but extends the analysis to include the achieved throughput. In certain scenarios, prioritizing timeliness over reliability might be a desirable trade-off. To optimize the performance of federated learning over unreliable fading wireless channels, we favor the approach in [32], where Decentralized Stochastic Gradient Descent (DSGD), a well-known solution to large-scale machine learning problems in D2D topologies with ideal communication, is applied with guaranteed convergence to optimality under assumptions of convexity and connectivity. In our opinion, DSGD is a superior alternative to the classical convex optimization approach using KKT conditions for large-scale FEEL under fading wireless channels. While KKT-based solutions provide an elegant framework for finding optimal solutions to convex problems, their reliance on centralized computation and global knowledge of the constraints limits their applicability in large-scale decentralized environments like FEEL. In such systems, the dynamic and distributed nature of data across devices, combined with the unpredictability of fading wireless channels, poses significant challenges to centralized KKT-based methods. DSGD excels in these environments because it enables local updates at individual devices, which are then aggregated through peer-to-peer communication. This reduces the need for centralized control, making DSGD scalable to large networks with many devices. Furthermore, DSGD is robust to communication impairments caused by fading channels, as it can operate effectively with partial or asynchronous updates, mitigating the impact of packet loss or delays that often hinder KKT-based approaches. Finally, KKT-based solutions typically involve solving complex optimization problems that require significant computational resources and are sensitive to changes in network conditions. DSGD, on the other hand, employs stochastic updates, allowing devices to compute gradients on smaller data subsets and significantly reducing computation and energy requirements. This makes DSGD particularly well-suited to resource-constrained devices in FEEL settings, where its iterative nature ensures gradual convergence even in non-ideal conditions.
DSGD also adapts dynamically to variations in wireless channel conditions by integrating local updates and employing mixing matrices or weights that account for communication reliability. This adaptability is critical for maintaining performance in environments with time-varying channel quality, where centralized KKT-based methods struggle to maintain consistency.
Optimizing Hybrid Automatic Repeat Request (HARQ) retransmissions under the constraints of unreliable wireless channels with fading conditions presents a significant challenge due to the dynamic nature of the environment and the processing load required. The constraints and variables involved in the optimization process change more rapidly than the feedback mechanism can provide updates. In the context of 6G networks, the federated learning (FEEL) paradigm, which involves a large number of devices collaborating in a decentralized manner, adds further complexity to this optimization task. Traditional global optimization techniques have focused on finding the global minimum or maximum of a function, even when its analytical expression is unavailable but can be evaluated. However such evaluations are often computationally expensive, particularly in the context of wireless networks with critical real-time processing.
A promising development in this domain is the adoption of techniques based on general radial basis functions (RBF) which demonstrate significant potential in tackling global optimization problems, especially for partially known or difficult-to-evaluate functions [
33]. In mathematics, a radial basis function (RBF) is a real-valued function φ whose value depends only on the distance between the input and a fixed point c, called the center. The strength of RBFs lies in their ability to approximate complex functions effectively, providing a practical way to navigate optimization landscapes where explicit mathematical formulations are unavailable. In the specific context of 6G networks, RBFs have facilitated advancements in federated edge computing and learning [
34]. These methods enable efficient optimization by leveraging the decentralized nature of federated learning, distributing the computational load across edge devices while accounting for the unreliability of wireless channels. By approximating the cost and utility functions with RBFs, it becomes possible to make near-optimal decisions in real-time, despite the rapidly changing network conditions [
35].
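As an illustration of how an RBF surrogate can stand in for an expensive cost function, the sketch below fits a one-dimensional Gaussian RBF interpolant to a handful of cost evaluations and then searches the cheap surrogate instead of the original function. Everything specific here is an assumption made for the example: the Gaussian kernel and its shape parameter `eps`, the quadratic `cost` function standing in for an expensive network measurement, the sample points, and the grid search. It is not the optimization procedure of [33,34,35].

```python
import math

def gaussian_rbf(r, eps=1.0):
    """Gaussian radial basis function of the distance r to a center."""
    return math.exp(-(eps * r) ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def fit_rbf(centers, values, eps=1.0):
    """Return s(x) = sum_i lambda_i * phi(|x - c_i|) interpolating the samples."""
    A = [[gaussian_rbf(abs(ci - cj), eps) for cj in centers] for ci in centers]
    lam = solve(A, values)
    return lambda x: sum(l * gaussian_rbf(abs(x - c), eps)
                         for l, c in zip(lam, centers))

def cost(x):
    """Hypothetical expensive cost, e.g. delay versus a tunable parameter."""
    return (x - 2.5) ** 2 + 1.0

centers = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]           # six expensive evaluations
surrogate = fit_rbf(centers, [cost(c) for c in centers])
grid = [i / 100 for i in range(501)]               # cheap to evaluate densely
x_best = min(grid, key=surrogate)
```

The surrogate reproduces the sampled costs exactly at the centers, while the dense grid search costs only surrogate evaluations. In a sequential scheme, the true cost would then be evaluated at `x_best` and added to the sample set before refitting.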
The radial basis function (RBF) approach offers distinct advantages over the Decentralized Stochastic Gradient Descent (DSGD) algorithm for global optimization of HARQ retransmissions in large-scale FEEL under fading wireless channels. Unlike DSGD, which is iterative and depends on stochastic gradient updates, RBF methods construct surrogate models that approximate the underlying cost function. This allows RBF methods to handle complex, partially known functions with fewer iterations, making them more computationally efficient in resource-constrained FEEL scenarios. Another key advantage is the ability of RBF methods to adapt to limited and noisy feedback from the network, reducing the dependency on gradient information that may be unreliable in fading wireless environments. While DSGD relies on consistent communication among devices for gradient updates, RBF methods can operate effectively with sparse or incomplete data by leveraging their interpolation capabilities. This flexibility makes them particularly suitable for HARQ retransmissions in federated learning scenarios, where the interplay between communication and computation demands careful consideration.
To our knowledge, no recent study addresses the challenge of HARQ retransmissions, minimizing retransmission delay while maximizing data reliability, in 5G or 6G networks using an RBF global optimization approach. The scope of our work is to take an optimal global decision in HARQ retransmission scenarios by selecting the values of specific variables that yield the most desirable outcomes. By enabling adaptive decision-making and reducing processing overhead, RBF-based optimization of HARQ retransmissions provides a robust framework for addressing the complexities of 6G networks and FEEL environments. Ultimately, this approach represents a significant step forward in achieving efficient and scalable optimization of HARQ retransmissions under challenging wireless channel conditions.