Submitted:
30 November 2025
Posted:
02 December 2025
You are already at the latest version
Abstract
Keywords:
0. Introduction
- Latency. Round-trip times to remote data centers are often incompatible with interactive or safety-critical applications such as AR/VR, cooperative driving, or closed-loop industrial control.
- Bandwidth. Uploading raw or lightly processed sensor data from massive fleets of devices is expensive, competes with other traffic, and can be infeasible in constrained or bursty networks (e.g., cellular uplinks, LPWANs).
- Privacy, regulation, and data residency. Data-protection laws (e.g., GDPR, HIPAA) and sector-specific regulations increasingly restrict cross-border and cross-organizational movement of raw data, while users and operators demand stronger privacy and governance guarantees.
- Robustness, sustainability, and energy. Heavy dependence on centralized infrastructure creates single points of failure and exacerbates the carbon footprint of repeated long-haul transfers and always-on connectivity, conflicting with emerging goals in green and resilient AI.
- Edge AI pushes inference and, increasingly, training closer to data sources—onto edge servers (MEC nodes, micro data centers, gateways) and resource-constrained devices (phones, wearables, microcontrollers). Edge AI exploits local compute and storage to deliver low-latency responses, reduce backbone traffic, and improve robustness to connectivity disruptions.
- Federated learning (FL) trains global or multi-tenant models across many clients without centralizing raw data. Clients perform local optimization on their private datasets and share model parameters, gradients, or compressed updates with one or more aggregators, which periodically produce global or regional models.
- Hierarchical and multi-tier FL that explicitly mirrors network hierarchies (device–edge–cloud, multi-level edge), combining local adaptation with global coordination.
- Edge-only and on-premise FL for hospitals, factories, and critical infrastructure, where data and coordination remain within tightly governed domains.
- Decentralized and blockchain-enabled FL, using gossip, consensus, and distributed ledgers to remove or weaken central trust anchors in vehicular, UAV, and community networks.
- Split and hybrid architectures that combine FL with split learning and model offloading, enabling devices and TinyML platforms to leverage large backbones hosted at edge or cloud while keeping sensitive signals local.
0.1. Scope and Contributions
- Set the stage. We introduce the basics of cloud–fog–edge computing, multi-tier network hierarchies, and edge/on-device AI, and we formalize the FL optimization problem and its variants (cross-device, cross-silo, personalized, hierarchical, and decentralized) relevant to edge settings.
- Map the architectural design space. We develop a taxonomy of FL architectures for edge AI, spanning centralized, hierarchical, edge-only, decentralized, and split/hybrid patterns. For each, we discuss coordination topology, aggregation locality, trust and failure models, communication complexity, and representative application domains, supported by comparative tables.
- Analyze cross-layer challenges and techniques. We synthesize core challenges in federated edge AI—statistical heterogeneity and personalization, system heterogeneity and stragglers, communication and energy efficiency, and privacy, security, and trust. We review enabling techniques such as personalized FL, client and cluster selection, compression and over-the-air aggregation, green FL strategies, and combinations of differential privacy, secure aggregation, and robust/Byzantine-resilient aggregation.
- Survey hardware and software enablers. We review edge hardware accelerators, TinyML and microcontroller platforms, and neuromorphic/processing-in-memory prototypes, along with FL and edge AI frameworks (TensorFlow Federated, Flower, FedML, FATE, OpenFL, TinyFL, and related tools) that support simulation, prototyping, and deployment across device–edge–cloud continua.
- Highlight applications and case studies. We organize representative applications in healthcare and wearables, intelligent transportation, industrial IoT and smart manufacturing, and smart cities and infrastructure, emphasizing the interplay between data modalities, latency and residency constraints, and chosen architectural patterns.
- Articulate future directions. We outline research frontiers in federated continual and lifelong learning, the integration of foundation models with on-device adaptation, green and trustworthy FL across tiers, and the convergence of FL with TinyML, neuromorphic computing, and emerging hardware paradigms.
0.2. Organization
1. Background
1.1. From Cloud to Edge Computing
- Latency. Interactive or safety-critical loops (e.g., collision avoidance, robotic control, real-time AR/VR overlays) cannot tolerate hundreds of milliseconds of round-trip latency to the cloud and back.
- Bandwidth. High-resolution video, LiDAR, and dense telemetry streams can saturate backhaul links if naively uploaded; this is especially acute in mobile and wireless environments.
- Data residency and privacy. Regulations and internal governance often require that sensitive data (medical records, industrial process logs, location traces) stay within certain legal or administrative domains.
- Resilience. Dependence on a small number of centralized data centers introduces single points of failure and can make systems fragile to outages, overloads, or connectivity disruptions.
- Cloud. Large, centralized data centers with very high compute density and storage capacity, accessible over wide-area networks (WANs).
- Fog / regional edge. Intermediate aggregation and compute points—metro data centers, operator points-of-presence, regional edge sites—that sit closer (in network terms) to end devices than core cloud.
- Edge and device. On-premise micro data centers, base stations, gateways, and the devices themselves (phones, wearables, embedded controllers, microcontrollers).
1.2. Edge AI and TinyML
- Reducing latency. By running inference (and sometimes training) directly on devices or nearby edge servers, systems avoid wide-area round-trips and can respond in a few milliseconds instead of hundreds.
- Lowering bandwidth and backhaul load. Instead of streaming raw video or telemetry, devices can extract features, predictions, or compressed summaries locally and only transmit what is necessary (e.g., alerts, aggregated statistics, or model updates).
- Improving privacy and availability. Keeping raw data local to devices or on-premise sites reduces exposure and enables operation even when WAN connectivity is intermittent or unavailable.
- Model compression. Techniques such as pruning (removing redundant weights or channels), quantization (representing parameters and activations with lower precision), knowledge distillation (training smaller student models from larger teachers), and low-rank factorization reduce model size and compute without catastrophic loss in accuracy.
- Architecture and hardware-aware design. Neural architecture search (NAS) and manual design of compact architectures (e.g., MobileNet-style, shift or depthwise separable convolutions, lightweight attention blocks) produce models that fit within the memory, compute, and latency envelopes of edge hardware.
- TinyML. TinyML targets microcontroller-class devices with kilobytes of RAM and milliwatt-level power budgets. TinyML models support always-on sensing (e.g., keyword spotting, anomaly detection, vibration analysis) and frequently rely on aggressive quantization, integer-only kernels, and highly optimized runtimes.
1.3. Federated Learning Fundamentals
- Server broadcast. At round t, the server samples a subset of clients from and broadcasts the current global model parameters to each .
- Local training. Each selected client initializes its local model with and performs several epochs of (stochastic) gradient descent on its local data , obtaining an updated local model .
- Aggregation. The server collects the updated models (or equivalent updates) from participating clients and aggregates them, typically via a weighted average:where . The resulting becomes the new global model and the process repeats.
- Non-IID and unbalanced data. Clients often observe different data distributions (e.g., different users, locations, devices, or environments), and some clients may have orders of magnitude more data than others. This statistical heterogeneity can slow convergence and result in global models that are suboptimal for many clients.
- System heterogeneity. Edge devices vary widely in compute, memory, energy availability, and connectivity. Some may be offline, intermittently connected, or unable to complete local training in time. Algorithms must accommodate partial participation, stragglers, and varying capabilities.
- Communication bottlenecks. Models can be large, and frequent transmission of full parameters or gradients may be prohibitively expensive over wireless links or congested backhaul. This motivates compression, sparsification, structured updates, and reduced communication frequency.
- Privacy and security. Although raw data never leaves devices, model updates can leak information (e.g., via gradient inversion or membership inference attacks). Additionally, some clients or even servers may be adversarial, attempting to poison the model or infer sensitive attributes.
1.4. Taxonomy of FL for Edge AI
-
Topology. How do clients and aggregators communicate?
- Centralized: a single logical server (often in the cloud) orchestrates training in a star topology.
- Hierarchical / multi-tier: devices first aggregate at edge nodes, which in turn synchronize with regional or core clouds, forming a tree or DAG.
- Decentralized / peer-to-peer: there is no central server; nodes exchange updates with neighbors in a mesh, overlay, or gossip network.
-
Objective. What is the learning goal across clients?
- Global model: all clients share a single model, as in classical FL formulations.
- Personalized FL: each client adapts a shared model or learns client-specific parameters, balancing global knowledge with local specialization.
- Clustered FL: clients are grouped into clusters (e.g., by data or topology), each with its own model; clusters may share information at higher tiers.
- Multi-task FL: each client or cluster is treated as a related but distinct task, and the system learns to share representations or priors rather than a single model.
-
Resource-awareness. How does the system explicitly account for communication, computation, and energy constraints?
- Communication-efficient: leverages compression, sparsification, update subsampling, reduced participation, or over-the-air aggregation to reduce traffic.
- Computation-aware: adapts local training effort, model size, or split placement to device capabilities and current load.
- Energy-aware / green FL: explicitly optimizes or constrains energy usage and carbon impact across devices, edges, and clouds.
-
Trust model. What are the assumed adversaries and defenses?
- Honest-but-curious server: the server executes the protocol but may attempt to infer information from updates; secure aggregation and differential privacy are common defenses.
- Malicious or Byzantine clients: some clients may send arbitrary or adversarial updates; robust aggregation and anomaly detection are needed.
- Byzantine servers or aggregators: in decentralized or multi-tier systems, intermediary nodes themselves may be compromised, necessitating consensus, replication, or blockchain-style mechanisms.
- Formal privacy guarantees: some systems require -differential privacy at the user or client level, or cryptographic end-to-end guarantees using secure multi-party computation or homomorphic encryption.
2. Architectures for Federated Edge AI
2.1. Centralized FL with Edge Devices
2.1.1. Basic star topology and workflow
Model broadcast and client sampling.
Local training and objective.
Update upload, aggregation, and model semantics.
- full model parameters ;
- a model delta ;
Temporal behavior and system orchestration.
Why the star topology remains a baseline.
2.1.2. Cross-device vs. cross-silo regimes at the edge
Cross-device FL: many tiny, unreliable clients.
Cross-silo FL: few powerful, stable clients.
Axis of contrast: scale, reliability, and governance.
- Scale and granularity. Cross-device FL involves many low-resource clients with small datasets; cross-silo FL involves few high-resource clients with large datasets.
- Reliability and participation. Cross-device FL expects intermittent, opportunistic participation and must tolerate high churn; cross-silo FL expects almost always-on silos and often assumes full or near-full participation.
- Governance and trust. In cross-device FL, clients are typically end-user devices governed by a single platform provider, and privacy is enforced via platform policies, secure aggregation, and differential privacy. In cross-silo FL, each silo may represent a separate organization with its own governance, legal constraints, and trust assumptions, leading to more complex threat models and negotiation of protocols [8,17].
Implications for federated edge AI.
2.1.3. Performance and limitations in edge contexts
- WAN bottleneck and latency. Every participating device uploads and downloads model parameters or gradients across potentially long and noisy wide-area links. If the model has size and the per-round participation fraction is q, the total WAN traffic per round scales as in both uplink and downlink [4].
2.2. Hierarchical and Multi-Tier FL
2.2.1. Intuition: matching the network hierarchy
2.2.2. Two-tier device–edge–cloud architectures
Tier 1: device–edge (local clusters).
Tier 2: edge–cloud (global coordination).
Nested FedAvg and multi-timescale optimization.
2.2.3. Multi-tier and topology-aware FL
From two tiers to multiple tiers.
- fast device-level rounds every few seconds or minutes within a cell;
- regional aggregation every few minutes or hours; and
- global aggregation once per day or per week.
Clustered FL: grouping by similarity or locality.
Tree-based aggregation: leveraging routing structure.
Hybrid centralized–decentralized schemes.
2.2.4. System-level trade-offs and design knobs
Aggregation cadence: how often tiers synchronize.
Cluster formation: who should train together.
- Network-centric: group clients by latency, bandwidth, or radio conditions (e.g., same base station, same WLAN).
- Data-centric: group clients with similar data distributions (e.g., similar label distributions or embedding statistics).
- Task-centric: group clients that solve similar tasks or share similar models.
Role placement: where to put aggregators.
Staleness management: dealing with delayed and outdated models.
- Staleness-aware weighting: updates that are older or computed on older model versions receive lower weight in aggregation.
- Asynchronous protocols: servers do not wait for all aggregators or devices; they integrate updates as they arrive, often with correction terms.
- Bounded staleness: enforcing maximum age limits on updates or requiring periodic resynchronization.
Communication complexity: how much traffic, and where.
2.3. Edge-Only and On-Premise FL
2.3.1. Motivation: data residency and offline operation
2.4. Architectural patterns
Single-site edge-only FL.
Multi-site private federation.
- Multi-plant industrial groups: each plant trains local models from its own equipment and process data using on-site edge servers; periodically, plants share model updates or calibrated summaries over a private WAN to build a corporate-level model that captures fleet-wide patterns (e.g., failures across all turbines) [30,36,41].
Integration with existing industrial and healthcare stacks.
Deployment and platform choices.
- Custom in-house frameworks built on general-purpose distributed training libraries (e.g., PyTorch Distributed, Ray, Kubernetes operators) with domain-specific integration.
2.4.1. Advantages and constraints
Advantages.
Constraints.
Hybrid edge-only + cloud pretraining.
2.4.2. Representative edge-only FL deployments and frameworks
2.5. Decentralized FL
2.5.1. Removing the central server
2.5.2. Gossip and consensus protocols
Randomized gossip.
- It is inherently asynchronous: nodes can initiate gossip whenever they are available, without waiting for global synchronization.
- It adapts naturally to dynamic graphs: as long as the time-averaged connectivity graph remains connected, convergence can be guaranteed.
- It is local: each node communicates only with a small subset of neighbors, which matches constraints in wireless and ad hoc networks.
Deterministic consensus methods.
Overlay networks and logical topologies.
- Ring overlays, where nodes are arranged in a logical ring and communicate primarily with their immediate logical neighbors.
- Expander graphs and other high-connectivity structures, which ensure rapid mixing and robustness to node or edge failures.
- Small-world networks, which combine local connectivity with a few long-range links to reduce average path length.
Use cases in edge environments.
- Vehicle-to-vehicle (V2V) learning: vehicles in proximity can exchange model updates directly over V2V links, learning traffic patterns or driving policies collaboratively without relying on base-station connectivity [23].
- UAV swarms: drones performing cooperative sensing or search may run DFL over mesh networks, adapting models as they move and communicate intermittently [53].
- Community wireless or mesh networks: user-owned routers and access points can share models for anomaly detection or resource allocation via DFL, reflecting the decentralized governance of the underlying infrastructure.
2.5.3. Blockchain-enabled DFL
Ledger as a coordination substrate.
- Submit a transaction containing the update (or a compressed/hash representation) to the blockchain.
- Let the consensus protocol order and validate these transactions.
- Use smart contracts to aggregate (e.g., average) the updates contained in recent blocks into a new global model state.
Auditable contributions and incentives.
Resilience and trust decentralization.
Overheads and privacy implications.
2.5.4. Challenges for DFL at the edge
Topology design and dynamics.
Adversaries and trust.
Energy and spectrum usage.
System heterogeneity and fairness.
Debugging, monitoring, and lifecycle management.
2.5.5. Representative DFL protocols and systems at the edge
2.6. Split and Hybrid Architectures
2.6.1. From FL to split learning
- a front (or client-side) sub-network consisting of layers , which runs on the device or edge node; and
- a back (or server-side) sub-network consisting of layers , which runs on a more powerful edge server or cloud node.
- Computation offloading. Because only the front sub-network runs on the device, the majority of parameters and floating-point operations (FLOPs) can be offloaded to the server. This allows the overall model to be much larger than what would fit entirely on the device, enabling edge applications to benefit from state-of-the-art architectures [59].
- Bandwidth vs. privacy. Transmitted activations generally have much lower dimensionality than raw inputs, particularly for high-resolution images or long text sequences. This reduces bandwidth usage relative to sending raw data. At the same time, activations may obfuscate some aspects of the input, providing a degree of privacy; however, they can still leak information via inversion attacks, motivating additional protections such as noise injection, activation compression, or cryptographic masking [58,59].
- Limited exposure of labels and gradients. In many SL variants, labels reside only on the server side, or are kept on the client and used jointly with cut-layer gradients, depending on the threat model. In medical imaging, for example, it is common to keep labels and sensitive metadata only on the hospital side while offloading some computation to a semi-trusted cloud or vendor server [59,60].
- Protocol complexity. Unlike standard FL, where only models or gradients are exchanged once per round, SL introduces per-batch interactions: each mini-batch requires sending activations and receiving gradients. This places stricter demands on latency and availability of links between device and server, and it complicates batching and pipeline parallelism. Many systems adopt micro-batching, asynchronous queues, or pipelined training to amortize these costs [59].
2.6.2. Split-federated and three-tier hybrids
Split-federated learning (SpFL).
- SL loop: for each client, per-batch activations and gradients flow between the client’s front and the shared back-end, enabling end-to-end training without exposing raw data.
- FL loop: across clients, front sub-network parameters are periodically aggregated at the server (or at an intermediate edge node) to form a global front, which is then redistributed.
U-shaped split FL for encoder–decoder models.
- Client hosts early encoder layers and late decoder layers, keeping raw images and final predictions on-site.
- Server hosts mid-level encoder and decoder layers, using them to capture global structure and complex patterns.
Three-tier U-shape split learning (client–edge–cloud).
- The client hosts very shallow input layers (e.g., initial convolutions), pre-processing data and extracting low-level features while ensuring that raw sensor values remain local.
- The edge hosts intermediate layers that capture regional or cluster-specific patterns, possibly shared across a small number of clients.
- The cloud hosts deep backbones or task heads that require large memory or compute, such as transformer blocks or large classification/regression heads.
Edge–cloud collaborative FL/SL and split fine-tuning.
- Cloud-to-edge transfer: A large cloud model is split, and its early layers are distilled or fine-tuned at the edge using local data; logits or intermediate features from the cloud guide edge training (teacher–student style).
- Edge-to-cloud transfer: Edge FL models provide distilled knowledge (e.g., averaged logits, feature statistics) back to the cloud, which uses them to refine its global model without direct access to raw edge data.
- Selective splitting: Only certain layers or modules are split across edge and cloud, while others remain entirely local or entirely centralized; split decisions may change over time based on load and resource conditions.
2.6.3. Resource and privacy trade-offs
Compute vs. communication.
- Early splits (small ℓ) place only a few layers on the device. This minimizes on-device compute and memory but requires sending larger activations, since early feature maps often have high spatial or temporal resolution. Early splits therefore favor low compute but high communication.
- Late splits (large ℓ) place many layers on the device, which reduces the dimensionality of activations and thus communication cost, but increases local compute and memory usage. Late splits therefore favor high compute but low communication.
Privacy vs. utility.
- Federated learning: rather than training a single split model per client in isolation, multiple clients engage in FL to share knowledge via parameter averaging, making it harder to attribute any particular behavior to a single client.
- Secure aggregation and encryption: activations or gradients may be encrypted or aggregated across clients at intermediate nodes before reaching the back-end, reducing exposure even further.
Scheduling and placement.
- available resources at each tier (compute, memory, network);
- workload patterns (burstiness, diurnal variations, mobility);
- application-level SLAs (latency, throughput, availability); and
- security and privacy policies (which layers and data can leave certain domains).
2.6.4. Representative split and hybrid architectures in edge AI
| Category | Architecture / framework | Split location / tiers | FL coupling | Representative edge use cases | Ref. |
|---|---|---|---|---|---|
| Split learning (baseline) | Original split learning | Single cut between client and server; client: early layers; server: rest | None (per-client SL) | Mobile/embedded vision, speech; small clients leveraging larger server models | [57,58] |
| Split learning survey | Edge–cloud SL survey | Single or multiple cuts between device and edge/cloud | None or optional FL | Vision and IoT analytics in MEC, industrial edge | [59] |
| Split-federated learning | SpFL for recommendation / personalization | Client: front sub-networks; server: shared back-end | Fronts trained via FL; back-end shared or partially centralized | On-device recommendation, user modeling, content ranking at the edge | [61] |
| Split-federated learning | RoS-FL (U-shaped split FL) | U-Net-like encoder/decoder split across client and server | Client-side segments aggregated via FL; mid/back segments centralized | Medical image segmentation at hospital edge with vendor/cloud assistance | [60] |
| Three-tier split learning | FUSE (three-tier U-shaped SL) | Client: shallow encoder/decoder; edge: mid-level; cloud: deep backbone | FL across clients and possibly across edge nodes; global backbones shared at cloud | Industrial monitoring and anomaly detection with theft-resilient learning | [62] |
| Three-tier split learning | EUSFL (edge–client–cloud U-shaped FL) | Client: input and output layers; edge: mid encoder; cloud: shared head or deep layers | FL at client–edge and edge–cloud boundaries; supports hierarchical FL | Medical imaging and multi-site healthcare with strict privacy partitioning | [63] |
| Edge–cloud collaborative FT | Split fine-tuning (SFT) | Cloud model partially split; early layers adapted at edge; rest in cloud | Edge models trained via FL; cloud model updated using distilled knowledge | Personalization of large pre-trained models at the mobile/edge tier | [64] |
| Edge–cloud collab. FL/SL | Edge–cloud collaborative FL / knowledge transfer | Flexible split between edge backbones and cloud heads | FL among edge nodes; cloud learns from edge logits or features | Federated perception in autonomous driving, multi-camera edge vision | [65] |
| Unified split–FL | USFL (unified split + federated learning) | Multiple optional cut points across device/edge/cloud tiers | FL over selected segments at each tier; supports mixed deployment modes | General-purpose edge AI platform supporting heterogeneous devices and networks | [66] |
| Split learning variants | Label-/feature-split variants, vertical SL | Cuts separating feature owners and label owners across institutions | Often coupled with cross-silo FL | Cross-institution finance, healthcare, and advertising with vertically partitioned data | [58,59] |
| Category | Architecture / framework | Privacy / security features | Resource-aware mechanisms | Ref. |
|---|---|---|---|---|
| Split learning (baseline) | Original split learning | Raw data stays on client; optional label hiding; can add DP or crypto | Offloads most compute to server; fixed cut layer; basic activation compression | [57,58] |
| Split learning survey | Edge–cloud SL survey | Discusses activation leakage, DP, secure protocols | Surveys strategies for cut placement, activation compression, and pipelining | [59] |
| Split-federated learning | SpFL for recommendation / personalization | Data localized at clients; optional DP; back-end may be semi-trusted | Balances client compute vs. bandwidth by tuning front depth; uses model compression | [61] |
| Split-federated learning | RoS-FL (U-shaped split FL) | Keeps raw images and labels on client; mid-level activations only to server; can add DP | Allocation of encoder/decoder segments to balance compute/memory; heuristic split search | [60] |
| Three-tier split learning | FUSE (three-tier U-shaped SL) | Sensitive signals, labels at client; activations encrypted or compressed; optional DP | Joint placement of segments based on latency and compute; supports pipeline parallelism | [62] |
| Three-tier split learning | EUSFL (edge–client–cloud U-shaped FL) | Local output heads and labels never leave client; secure aggregation for shared layers | Dynamic selection of cut points; load-aware edge vs. cloud distribution | [63] |
| Edge–cloud collaborative FT | Split fine-tuning (SFT) | Edge keeps local data; cloud sees logits/features; can apply DP on distilled signals | RL- or heuristic-based selection of which layers to adapt at edge vs. cloud | [64] |
| Edge–cloud collab. FL/SL | Edge–cloud collaborative FL / knowledge transfer | No raw data leaves edge; only predictions or embeddings; optional DP | Schedules knowledge transfer based on link quality and resource state | [65] |
| Unified split–FL | USFL (unified split + federated learning) | Combines FL, SL, and DP; supports varying trust boundaries per customer/site | Uses RL or heuristic search for cut placement, tier mapping, and sync frequency | [66] |
| Split learning variants | Label-/feature-split variants, vertical SL | Each party sees only subset of features/labels; cryptographic protocols for joint training | Optimizes communication via feature selection, secure aggregation; handles heterogeneous schemas | [58,59] |
2.7. Design Space and Comparative View
- Coordination / topology. Who orchestrates training? Is there a single logical coordinator, multiple tiered coordinators, or no coordinator at all? How are clients connected in the logical communication graph (star, tree, cluster, mesh, overlay)?
- Aggregation locality. Where are model updates combined: in a central cloud, at edge clusters, within on-premise sites, or via fully peer-to-peer averaging? How many hops does an update traverse before being aggregated?
- Trust and failure model. Which entities are assumed to be trusted, semi-trusted, or adversarial? Where do we tolerate failures, and which failures are catastrophic (single point of failure) vs. local (cluster/site failures)?
- Communication cost and path. How many messages of what size traverse which links (device–edge, edge–cloud, peer-to-peer)? How does the cost scale with the number of clients K, the number of edge servers S, the model size , and the participation fraction q?
- Representative domains. Which architectures naturally align with the operational and regulatory reality of mobile personalization, IIoT, healthcare, critical infrastructure, or multi-stakeholder ecosystems?
| Architecture | Coordination / topology | Aggregation locality | Trust / failure model (typical) |
|---|---|---|---|
| Centralized FL (cross-device) | Star: device ↔ cloud | Single cloud server aggregates updates from a (small) sampled subset of devices | Server trusted or honest-but-curious; secure aggregation mitigates server visibility; server is single point of failure / bottleneck |
| Centralized FL (cross-silo) | Star: silo ↔ coordinator (cloud or regional edge) | Aggregation at central coordinator from tens to hundreds of silos | Coordinator semi-trusted; silos are institutions with separate governance; single point of failure at coordinator; strong emphasis on cross-org DP and secure aggregation |
| Hierarchical FL (2-tier device–edge–cloud) | Tree: device ↔ edge ↔ cloud | Within-cluster aggregation at edge servers; cross-cluster aggregation at cloud | Edge nodes and cloud are semi-trusted; failures mostly localized per cluster; global coordinator can still be single point of failure |
| Hierarchical / multi-tier FL (multi-level edge) | Tree / DAG spanning access, metro, regional, and core | Aggregation at multiple tiers (access edge, metro edge, regional DC, core cloud) with different cadences | Trust roughly increases toward core; failures can be isolated per branch; robustness depends on redundancy of mid-tier nodes |
| Clustered FL (topology-aware) | Clustered star / cluster mesh | Aggregation first within clusters (logical or geographic), then across cluster heads | Cluster heads semi-trusted; failures localized per cluster; cluster formation and reconfiguration affect trust assumptions |
| Edge-only FL (single site) | Star or small tree: device ↔ on-prem edge | Aggregation entirely within site (plant, hospital, campus) at one or a few edge servers | Single administrative domain; high trust in on-prem edges; failures local to site; internet outages do not impact FL |
| Edge-only FL (multi-site private federation) | Two-level tree: device ↔ site edge; sites interconnected via private backbone | Aggregation at site edges; cross-site aggregation at corporate DC or higher-level private edge | Single organization or tightly governed consortium; trust anchored in corporate DC; failures localized per site / region |
| Decentralized FL (gossip / consensus) | Mesh or overlay (ring, expander, small-world) among peers | No explicit aggregation point; updates mixed locally via gossip or consensus | No central trust anchor; robustness via redundancy; vulnerable to local adversaries; security depends on local and graph-wide assumptions |
| Blockchain-enabled FL / DFL | Mesh or hub-and-spoke with blockchain nodes; logical ledger overlay | Logical aggregation via ledger and smart contracts; physical aggregation may still occur at powerful validator nodes | Trust decentralized across validators; immutability and auditability; consensus tolerates some Byzantine faults but at higher cost |
| Split / hybrid FL–SL (device–cloud) | Star with cut layers: device ↔ server (edge or cloud) | Front-end layers at device; back-end layers at server; aggregation may occur at fronts (FL) and/or back-end (centralized) | Trust partitioned: device trusted with raw data, server trusted with back-end model; additional risk of activation leakage; failures at server impact many clients |
| Split / hybrid FL–SL (device–edge–cloud) | Mixed: device ↔ edge ↔ cloud with multiple cuts | Fronts at device, mid-layers at edge, heavy backbones at cloud; FL at device–edge and/or edge–cloud boundaries | Trust distributed across tiers; strict residency constraints can pin certain layers to specific domains; failures at intermediate tiers degrade performance locally |
| Architecture | Comm. cost per global round (rough) | Representative works / domains |
|---|---|---|
| Centralized FL (cross-device) | uplink + downlink over WAN; uplink from devices is often dominant | On-device personalization for keyboards and recommendations, cross-device analytics [7,8,9,10] |
| Centralized FL (cross-silo) | uplink + downlink, but with K small (tens–hundreds) and often over well-provisioned links | Cross-hospital learning, cross-factory models, financial consortia, cross-operator collaboration [5,8,17] |
| Hierarchical FL (2-tier device–edge–cloud) | over device–edge (local/LAN) + over edge–cloud (WAN); since , WAN traffic is reduced vs. centralized FL | MEC, smart factories, hospital groups, regional telecom deployments [1,19,24,25,27] |
| Hierarchical / multi-tier FL (multi-level edge) | Sum of tier-wise costs: with nodes and possibly tier-specific model sizes | Large-scale IIoT, nation-wide telecom, smart-city infrastructures [5,6,21] |
| Clustered FL (topology-aware) | within clusters (often over LAN/regional links) + at cluster-head layer, C number of clusters | Latency-aware and data-aware clustering in MEC and IIoT; region-specific personalization [19,25,27] |
| Edge-only FL (single site) | over LAN / private 5G only; WAN cost essentially zero | Predictive maintenance, industrial control, on-prem healthcare analytics, security analytics on corporate campuses [18,20,35,37] |
| Edge-only FL (multi-site private federation) | within sites + across sites over private WAN; can be scheduled infrequently or offline | Multi-plant industrial groups, regional hospital networks, rail and energy operators [30,31,36,42] |
| Decentralized FL (gossip / consensus) | over peer-to-peer links, where is average node degree; traffic localized if neighborhood small | Community networks, mesh Wi-Fi, sensor swarms, ad hoc or delay-tolerant networks [6,22,50,51] |
| Blockchain-enabled FL / DFL | Additional or per block for B validators; latency and energy overhead from consensus; storage cost for ledger | Multi-stakeholder IoT consortia, cross-provider collaboration, incentive-driven open federations [23,54,55] |
| Split / hybrid FL–SL (device–cloud) | Per batch: activations uplink + gradients downlink; plus periodic FL sync for fronts | Mobile AR/VR, vision-based assistance, medical imaging with vendor cloud backbones, large-model personalization [57,58,59,60] |
| Split / hybrid FL–SL (device–edge–cloud) | Combination of device–edge + edge–cloud activations, plus FL sync terms and | Smart factories and IIoT, multi-site healthcare, edge–cloud foundation-model adaptation and split fine-tuning [61,62,63,64,66] |
| Architecture | Strict data residency (per site / country) | Offline operation (no public internet) | Support for massive K (cross-device) | Support for large backbones (billions of params) | No single trusted coordinator | Alignment with legacy edge / OT systems |
|---|---|---|---|---|---|---|
| Centralized FL (cross-device) | (✓) via DP + secure agg, but cloud still sees updates | × (depends on cloud connectivity) | ✓(designed for millions of devices) | (✓) if model fits in device memory; heavy backbones challenging on low-end devices | × (central server) | (✓) for mobile platforms; less natural for OT/IIoT |
| Centralized FL (cross-silo) | (✓) with careful region-specific hosting and legal controls | (✓) if silos connect via private WAN only, but usually some external connectivity | × (typically tens–hundreds silos) | ✓ (silos often have server-class hardware) | × (central coordinator) | (✓) in modern IT-heavy OT environments |
| Hierarchical FL (2-tier / multi-tier) | (✓) when edge tiers are placed along jurisdictional boundaries | (✓) if clouds replaced by on-prem hubs; otherwise limited by top tier | ✓(edge tier absorbs massive cross-device scale) | ✓(backbones at edge / cloud) | × or (✓) if top tier replicated | ✓(naturally mirrors access–aggregation–core hierarchies) |
| Edge-only FL (single / multi-site) | ✓(data and updates confined to site or private backbone) | ✓(designed for disconnected or air-gapped settings) | (✓) within site; overall K limited by deployment scale | ✓(on-prem GPU clusters for heavy models) | × or (✓) if sites coordinate via multi-master patterns | ✓(explicitly integrates with industrial and healthcare stacks) |
| Decentralized FL (gossip / DFL) | (✓) if communication restricted within jurisdiction and data never leaves devices | ✓(no reliance on central infrastructure) | (✓) but convergence and overhead grow with K | (✓) only on capable nodes; small devices may host compressed or partial models | ✓(no central coordinator; trust spread over graph) | (✓) in peer-managed or community OT scenarios; integration non-trivial |
| Blockchain-enabled FL / DFL | ✓(permissioned ledgers scoped to region / consortium) | (✓) if ledger runs entirely on-prem; consensus requires connectivity among validators | (✓) at protocol level; practical scale limited by ledger performance | ✓(validators can host large models; clients can remain light) | ✓(trust anchored in consensus among validators) | (✓) in consortia willing to run blockchain infra; heavier to integrate with legacy OT |
| Split / hybrid FL–SL (device–cloud) | (✓) for inputs and labels (stay on device) but activations cross boundaries | × or (✓) if “cloud” is replaced with on-prem edge only | ✓(fronts can be small and light) | ✓(back-end hosted centrally with large capacity) | × (back-end server is critical) | (✓) where OT devices can run thin front-ends and connect to central analyzers |
| Split / hybrid FL–SL (device–edge–cloud, USFL-style) | ✓(layers pinned to specific tiers and jurisdictions) | ✓(if cloud tier is optional or replaced by central on-prem) | ✓(multi-tier scaling similar to hierarchical FL) | ✓(heavy backbones at cloud; modest partitions at edge) | (✓) if multiple coordinators and mixed FL/DFL are used | ✓(flexible mapping of layers to OT vs. IT infrastructure) |
- Cloud connectivity and WAN sensitivity. If WAN bandwidth and latency are dominant concerns, but a cloud (or central regional hub) is acceptable, hierarchical FL is a natural evolution from centralized FL. It pushes as much traffic as possible into cheap device–edge links and reserves WAN for aggregated updates. Multi-tier designs offer further savings when access, metro, and core tiers can be exploited.
- Data residency and offline operation. If data residency and offline operation are paramount—e.g., due to regulatory constraints, exposure risk, or physical isolation—edge-only FL (possibly with multi-site private federation and scheduled synchronization) is attractive. Here, the central design question is how to align FL boundaries with legal and organizational perimeters (sites, regions, countries).
- Trust distribution and governance. If no single entity is fully trusted or if governance is fundamentally decentralized (e.g., across multiple organizations or communities), decentralized or blockchain-enabled FL becomes appealing despite higher protocol complexity. Architectures in this quadrant emphasize consensus, auditability, and incentive mechanisms over central control.
- Model capacity and device limits. If full models cannot fit on devices, or if cloud-scale backbones are desired, split or hybrid FL–SL architectures become necessary. These introduce additional dials: where to split, which tiers host which layers, and how often to synchronize FL across front-end partitions. In practice, large foundation models are often hosted centrally (cloud or powerful edge), while smaller front-ends run on devices for personalization and privacy.
- Topology and resilience. If the underlying network already has a strong hierarchical structure (e.g., telecom or utility networks), architectures that mirror this hierarchy (hierarchical/multi-tier FL, split FL across tiers) will be easier to deploy and more robust. In contrast, highly dynamic or infrastructure-poor environments (e.g., vehicular, UAV, or disaster scenarios) may favor gossip-based DFL or hybrid schemes that rely only on local contact opportunities.
3. Core Challenges and Enabling Techniques
3.1. Statistical Heterogeneity and Personalization
Local fine-tuning and head personalization.
Model interpolation and proximal regularization.
Meta-learning and good initializations.
Multi-task and cluster-based formulations.
Heterogeneity-aware benchmarks and design principles.
3.2. System Heterogeneity and Stragglers
Client selection and pacing.
Asynchronous and semi-synchronous FL.
Resource-aware scheduling and co-design.
3.3. Communication Efficiency
Compression: quantization, sparsification, and sketches.
Update frequency and local computation.
Structured and partial updates.
Over-the-air aggregation.
3.4. Energy Efficiency and Green FL
Lightweight models and TinyFL.
Energy-aware client selection and scheduling.
Edge offloading and co-location with computation.
3.5. Privacy, Security, and Trust
Inference attacks and defenses.
- Regularization and early stopping to avoid overfitting local data, which correlates strongly with susceptibility to membership inference;
- Representation-level defenses, such as learning more invariant or obfuscated representations that are less invertible from gradients.
Poisoning attacks and robust aggregation.
Byzantine behavior, trust, and decentralization.
Secure aggregation and hardware roots of trust.
3Auditing, accountability, and regulation.
4. Hardware and Software Enablers
4.1. Edge Hardware Accelerators
NPUs and DSPs in mobile and embedded SoCs.
- Mixed-precision arithmetic (e.g., INT8, INT4, FP16, bfloat16), enabling higher throughput and lower energy per operation compared to FP32;
- Sparsity exploitation, where zero-valued activations or weights are skipped or compressed to save bandwidth and compute;
- On-chip memory hierarchies that minimize off-chip DRAM access, which is often the dominant energy cost in edge devices.
- efficient local training of compact models (e.g., MobileNets, small transformers) on-device, provided that training kernels (backpropagation, optimizers) are supported by vendor libraries;
- low-latency, energy-efficient inference for continuously adapted models, enabling on-device personalization loops between FL rounds;
- support for mixed-precision FL, where local updates are computed and communicated in reduced precision, aligning with communication-efficient FL schemes.
Edge GPUs and ASICs in edge servers and gateways.
- sustain larger batch sizes and more complex backbones (e.g., ResNeXt, ViTs, LLM decoders), enabling heavy training tasks at the edge;
- provide richer programming models (CUDA, ROCm, vendor SDKs), easing integration with mainstream ML frameworks (PyTorch, TensorFlow, JAX);
- can be shared across multiple FL tasks via containerization and multi-tenant scheduling (e.g., Kubernetes), making them natural aggregation and training hubs for hierarchical FL.
Neuromorphic and processing-in-memory accelerators.
- extremely low-power nodes (sensors, wearables) learn local representations or features with ultra-compact online models;
- higher-tier devices (gateways, edge servers) perform heavier FL-style aggregation and meta-learning, using neuromorphic nodes as intelligent pre-processors;
- specialized FL algorithms exploit the analog nature and limited precision of neuromorphic updates, potentially combining them with error-correcting schemes at higher tiers.
4.2. TinyML and Microcontroller Platforms
TinyML runtimes and toolchains.
- Runtimes and libraries such as TensorFlow Lite for Microcontrollers (TFLM), CMSIS-NN, microTVM, and vendor-specific SDKs compile models into integer-only kernels that fit in microcontroller flash and RAM.
- Toolchains (e.g., Edge Impulse Studio, TinyML IDEs) provide end-to-end pipelines for data collection, model design, quantization, and deployment to MCUs.
Federated learning on TinyML devices.
- Model and update compression. TinyFL deployments rely on sub-100k-parameter models, aggressive weight quantization (e.g., INT4), and heavily compressed updates (e.g., sparse or binary masks) to keep communication and storage costs manageable.
- Traffic shaping and duty cycling. Communication is often scheduled infrequently and piggybacked on existing control traffic; nodes may participate in FL rounds only when energy budgets allow.
- FL-specific TinyML runtimes. Frameworks such as TinyFedTL implement federated transfer learning on microcontrollers, combining pre-trained small backbones with efficient on-MCU fine-tuning and compressed update protocols [116,117]. TinyReptile applies federated meta-learning (Reptile-style) to TinyML models, enabling rapid personalization with minimal communication [118].
On-device adaptation and intermittent FL.
- global TinyFL updates provide coarse-grained, fleet-level improvements;
- per-device online learning runs continuously at ultra-low power;
- intermittent connectivity and energy constraints are treated as first-class inputs to FL scheduling and model design.
4.3. Networking and Cloud–Edge Orchestration
5G/6G and MEC as FL substrates.
- network slicing can provision logical networks with different latency and bandwidth guarantees for FL traffic versus user data;
- radio resource management and scheduler design can explicitly account for FL rounds, prioritizing FL updates when needed and shaping the participation set;
Cloud–edge orchestration and placement.
- Workload placement. Deciding where each part of the FL pipeline runs—simulation, training, aggregation, evaluation, and serving—across containers, VMs, and nodes in a Kubernetes-style cluster;
- Topology management. Mapping FL logical topologies (centralized, hierarchical, clustered, decentralized) onto physical routing graphs, with attention to latency, bandwidth, and resilience [6];
- Lifecycle management. Handling versioning, model rollout, and rollback across thousands of devices and multiple sites, including safe co-existence of multiple FL experiments.
Integration with OT networks and protocols.
- deterministic timing and safety certification requirements;
- air-gapped or constrained connectivity, where backhaul to the public internet is limited or non-existent;
- co-existence with control loops, ensuring that FL traffic does not interfere with real-time control messages.
4.4. Frameworks for FL and Edge AI
- Research-oriented FL frameworks such as TensorFlow Federated (TFF), PySyft/Syft, and various simulators, which emphasize flexibility in expressing new algorithms and experimenting in controlled environments;
- Production-oriented FL platforms such as FedML, OpenFL, Substra, FATE, and FLUTE, which target cross-silo deployments in healthcare, finance, and industrial applications;
- Edge-friendly FL frameworks such as Flower, FedML, and TinyFL/TinyFedTL, which explicitly address heterogeneous devices, mobile platforms, and TinyML targets;
- On-device ML runtimes such as TFLite, microTVM, and vendor TinyML SDKs, which focus on inference (and limited training) on devices and must be integrated with FL stacks.
TensorFlow Federated and research-centric frameworks.
Production-grade and cross-silo frameworks.
- FATE (Federated AI Technology Enabler) is an industrial-grade platform focusing on secure cross-silo FL with strong support for homomorphic encryption and multiparty computation. It targets financial and enterprise consortia and has evolved into a rich ecosystem including KubeFATE, FATE-Flow, and FATE-LLM for LLM-centric federations [126,127].
Edge-centric and cross-device frameworks.
- Flower is a framework-agnostic FL library supporting PyTorch, TensorFlow, JAX, and even raw NumPy, with a strong emphasis on extensibility and language-agnostic clients. Comparative evaluations show that Flower scales to millions of simulated clients and offers flexible strategies for client selection, aggregation, and compression [106,121,134].
Extended comparison of FL and edge-AI frameworks.
| Framework / runtime | Primary stack / language | FL scope and topology | Edge and TinyML suitability |
|---|---|---|---|
| TensorFlow Federated (TFF) | Python, TensorFlow | Research / simulation (Sim); primarily centralized and cross-device abstractions; custom topologies implementable via Federated Core | Good for simulating large federations; limited direct support for production edge deployment; relies on separate runtimes for on-device inference |
| PySyft / Syft | Python, PyTorch-centric | Research / Sim; central or decentralized graphs; can model cross-silo and CD | Primarily research; edge deployment possible but requires manual integration with device runtimes; more focused on privacy pipelines than system orchestration |
| Flower (flwr) | Python; framework-agnostic (PyTorch, TF, JAX, NumPy) | CD, CS, Sim; centralized and hierarchical topologies via strategies; supports millions of simulated clients | Designed for heterogeneous clients; lightweight client SDKs; supports deployment on mobile, edge devices, containers, and cloud VMs; strong simulator support |
| FedML | Python; supports PyTorch, TF, etc. | CD, CS, Sim; hierarchical (device–edge–cloud) and cross-cloud FL; also distributed training (non-FL) | Explicit support for smartphones, IoT, edge servers, and multi-cloud; client SDKs for mobile; MLOps stack for orchestration and monitoring |
| Open Federated Learning (OpenFL) | Python; framework-agnostic (TF, PyTorch) | CS, Sim; primarily cross-institution FL; centralized or hub-and-spoke | Targeted at institutional edge/cloud nodes (hospitals, research centers); not aimed at tiny devices; works with containerized workloads |
| Substra | Python; backend-agnostic; web UI and CLI | CS; privacy-preserving orchestration across institutions; DAGs of computations over FL or other workflows | Deployed in hospital and biotech settings; assumes Kubernetes clusters at each site; not aimed at microcontrollers but suitable for hospital edge servers |
| FATE (Federated AI Technology Enabler) | Python/Java; multiple backends | CS; industrial-grade cross-silo FL; vertical, horizontal, and transfer FL; supports complex multi-party topologies | Targets enterprise data centers; not designed for tiny devices; can be integrated with edge gateways that speak FATE protocols |
| FLUTE (Microsoft) and similar simulators | Python, C++; tightly coupled to PyTorch / ONNX | Sim, CS; highly scalable simulation of thousands of clients for research and benchmarking | Primarily for offline experiments; used to generate insights before deployment in production frameworks |
| On-device ML runtimes (TFLite, TFLite Micro, microTVM, vendor TinyML SDKs) | C/C++, Python bindings; MCU and embedded-focused | Not FL frameworks by themselves; support local training in limited cases; FL requires external coordination layer | Ideal for TinyML inference; some support limited on-device training or fine-tuning; can serve as client runtimes for TinyFL frameworks |
| TinyFL / TinyFedTL and related TinyML-FL libraries | C/C++/Python hybrids; tightly coupled with MCU runtimes | CD (on MCUs); typically star or small-tree topologies with simple coordinators | Designed specifically for MCUs with tens of kB RAM; extreme focus on communication, energy, and on-device training cost |
| Other FL frameworks (e.g., LEAF, FedScale, Fed-BioMed, etc.) | Python and mixed stacks | Primarily benchmarking, domain-specific FL (e.g., medical, mobile workloads), or simulators | Edge suitability depends on integration with runtimes like Flower/FedML; often used as data/benchmarking layers rather than deployment frameworks |
| Framework / runtime | Built-in privacy / security features | Representative uses / notes |
|---|---|---|
| TensorFlow Federated (TFF) | DP primitives and secure aggregation examples; integrates with TF privacy; cryptographic extensions via external libs | Research on new FL algorithms, personalization, and DP; prototyping cross-device FL for mobile use cases [122] |
| PySyft / Syft | Strong focus on HE, MPC, and DP; advanced privacy-preserving computation backends | Prototype secure FL workflows with complex threat models; education and experimentation with privacy techniques [123] |
| Flower (flwr) | Supports secure aggregation and DP via strategy extensions; integration with external crypto libs; pluggable aggregation strategies | End-to-end FL from research to production; large-scale simulation studies; benchmarking FL algorithms; widely used in academia and industry [106,134] |
| FedML | DP and secure aggregation support; integration with MLOps; role-based access control; options for HE via plugins | Production AI platform for FL at scale; used for smartphone FL, cross-silo FL, and multi-cloud deployments [124,125,135] |
| Open Federated Learning (OpenFL) | Focus on confidential computing using SGX; secure aggregation and encrypted channels; support for confidential FL workflows | Healthcare imaging consortia (FeTS), cross-hospital FL, confidential FL for LLM fine-tuning and evaluation [131,132,133] |
| Substra | Ledger-based traceability and audit; strong permissioning; supports DP and secure orchestration; can integrate with HE/MPC tools | Healthcare research consortia (MELLODDY, HealthChain); traceable, compliant FL with strong governance and audit trails [128,129,130] |
| FATE (Federated AI Technology Enabler) | Rich HE/MPC support; secure protocols for vertical FL; DP, access control, auditing; production-grade security | Financial services, cross-silo enterprise FL; open ecosystem (KubeFATE, FATE-Flow, FATE-LLM) for secure, large-scale FL [126,127] |
| FLUTE (Microsoft) and similar simulators | Supports DP, secure aggregation, and realistic network models in simulation settings | Benchmarking FL algorithms at scale; evaluation of communication and robustness strategies in controlled environments [105] |
| On-device ML runtimes (TFLite, TFLite Micro, microTVM, vendor TinyML SDKs) | Some DP and encryption primitives at app level; no built-in FL security; rely on host systems for secure aggregation | Keyword spotting, anomaly detection, vibration monitoring, and other TinyML tasks; integrated into TinyFL/TinyFedTL-style systems for collaborative training [113,114] |
| TinyFL / TinyFedTL and related TinyML-FL libraries | Lightweight security (e.g., symmetric encryption, basic DP); constrained by MCU resources; some works explore HE at gateway nodes | First FL implementations targeting microcontrollers; federated transfer learning and meta-learning for TinyML tasks in IoT scenarios [114,116,117] |
| Other FL frameworks (e.g., LEAF, FedScale, Fed-BioMed, etc.) | Varying; often focus on data/benchmarking rather than cryptography; can integrate with DP and secure aggregation libs | Benchmarks and specialized domains (e.g., mobile workloads, biomedical data); complement general-purpose FL frameworks [105,106] |
5. Applications and Case Studies
| Domain | Typical clients / tiers | Main data modalities | Representative tasks |
|---|---|---|---|
| Healthcare & wearables | Hospitals, imaging centers, clinics, patient portals, wearables, home gateways | Imaging (MRI, CT, X-ray, ultrasound), ECG/PPG, EHR tables, clinical text, wearable time series | Lesion / tumor detection, segmentation, outcome prediction, arrhythmia detection, remote monitoring, risk scoring, triage, readmission prediction |
| Intelligent transportation | Vehicles (CAVs), roadside units (RSUs), traffic cameras, edge servers at intersections, traffic-control centers | Camera and LiDAR streams, radar, GPS trajectories, CAN bus signals, traffic counts, map/HD map updates | Collaborative perception, traffic-flow prediction, trajectory prediction, driver behavior modeling, traffic signal control, high-precision positioning |
| Industrial IoT & smart manufacturing | Machines, robots, PLCs, gateways, shop-floor edge servers, plant data centers, corporate clouds | Vibration and acoustic signals, PLC logs, machine status, quality inspection images, sensor arrays, MES/ERP logs | Predictive maintenance, quality inspection, anomaly detection, process optimization, demand forecasting, digital-twin synchronization |
| Smart cities & infrastructure | Cameras, traffic sensors, smartphones, smart meters, building controllers, lamppost gateways, city/utility data centers | Video, mobility traces, smart-meter readings, environmental sensors (air quality, noise), grid telemetry, social media signals | Mobility and crowd analytics, anomaly and incident detection, smart-grid forecasting, adaptive lighting and HVAC control, environmental monitoring |
| Domain | Arch. pattern (typical) | Example works |
|---|---|---|
| Healthcare & wearables | Cross-silo FL (C) for hospitals; cross-device FL (C/H) for wearables; edge-only FL (E) in hospital networks; emerging split / hybrid (S) with cloud-hosted backbones | Surveys of FL in healthcare and smart healthcare [137,138,144]; cardiology- and arrhythmia-focused FL [145,146,147]; wearable-focused and IoT health FL [148,149] |
| Intelligent transportation | Hierarchical FL (H) with vehicles–RSUs–cloud; decentralized / gossip (D) among vehicles; edge-only (E) inside roadside clusters | ITS-focused FL surveys [139,150]; cooperative perception frameworks [151,152]; vehicular positioning and edge computing [153,154] |
| Industrial IoT & smart manufacturing | Edge-only FL (E) within plants; hierarchical FL (H) across lines/plants; multi-site cross-silo FL (C); early decentralized pilots (D) for collaborative robotics | Surveys on FL+IIoT and smart manufacturing [141,155,156]; predictive maintenance and quality inspection case studies [157,158,159]; cross-plant and product-lifecycle FL [160] |
| Smart cities & infrastructure | Hierarchical FL (H) (device–edge–city DC); edge-only FL (E) within utilities; cross-silo FL (C) across agencies; blockchain-enabled FL (D) in multi-stakeholder settings | Surveys on FL for smart cities and privacy [142,143]; FL for smart grids and energy systems [161,162]; smart-city infrastructure and edge-AI FL case studies [163,164,165] |
| Domain | Arch. pattern (typical) | Example works |
|---|---|---|
| Healthcare & wearables | Cross-silo FL (C) for hospitals; cross-device FL (C/H) for wearables; edge-only FL (E) in hospital networks; emerging split / hybrid (S) with cloud-hosted backbones | Surveys of FL in healthcare and smart healthcare [137,138,144]; cardiology- and arrhythmia-focused FL [145,146,147]; wearable-focused and IoT health FL [148,149] |
| Intelligent transportation | Hierarchical FL (H) with vehicles–RSUs–cloud; decentralized / gossip (D) among vehicles; edge-only (E) inside roadside clusters | ITS-focused FL surveys [139,150]; cooperative perception frameworks [151,152]; vehicular positioning and edge computing [153,154] |
| Industrial IoT & smart manufacturing | Edge-only FL (E) within plants; hierarchical FL (H) across lines/plants; multi-site cross-silo FL (C); early decentralized pilots (D) for collaborative robotics | Surveys on FL+IIoT and smart manufacturing [141,155,156]; predictive maintenance and quality inspection case studies [157,158,159]; cross-plant and product-lifecycle FL [160] |
| Smart cities & infrastructure | Hierarchical FL (H) (device–edge–city DC); edge-only FL (E) within utilities; cross-silo FL (C) across agencies; blockchain-enabled FL (D) in multi-stakeholder settings | Surveys on FL for smart cities and privacy [142,143]; FL for smart grids and energy systems [161,162]; smart-city infrastructure and edge-AI FL case studies [163,164,165] |
5.1. Healthcare and Wearables
Cross-institutional diagnostic models.
Cardiology and arrhythmia detection across hospitals and wearables.
Remote monitoring, telemedicine, and home health.
Governance, regulation, and future directions.
5.2. Intelligent Transportation
Collaborative perception and prediction across vehicles.
Traffic-flow prediction and adaptive traffic control.
Vehicular edge computing and resource management.
Challenges and outlook.
5.3. Industrial IoT and Smart Manufacturing
Predictive maintenance and quality inspection.
Cross-plant collaboration and product lifecycle management.
Anomaly detection, safety, and control.
Practical constraints and trends.
5.4. Smart Cities and Infrastructure
Privacy-preserving analytics on camera and mobility data.
Smart grids, energy, and critical infrastructure.
Adaptive urban services and resilience.
Multi-stakeholder governance and blockchain-enabled FL.
6. Future Directions
6.1. Federated Continual and Lifelong Learning
- Mobile usage patterns change over time (new apps, changing language, seasonal trends);
- Industrial processes and equipment are reconfigured, upgraded, or replaced;
- Clinical guidelines, populations, and sensing technologies evolve in healthcare;
- Smart-city sensing infrastructures expand or change placement.
Non-stationarity and task evolution at the edge.
- mechanisms to detect and characterize drift locally on clients and globally at aggregators;
- model architectures that can grow or reconfigure to accommodate new tasks (e.g., dynamic heads, modular networks) while preserving important knowledge for old tasks;
- policies on when to “forget” obsolete patterns for privacy and relevance, possibly under regulatory requirements for machine unlearning.
Avoiding catastrophic forgetting under privacy constraints.
- Federated rehearsal via synthetic or distilled data, where clients or servers train small generative models or distillation targets that capture past task structure without revealing individual examples;
- Parameter-importance estimation and regularization applied locally, with aggregated importance signals guiding global updates so that globally shared parameters are protected from destructive drift;
- Task-aware and task-agnostic modularization, where clients dynamically allocate and reuse modules (e.g., adapters, prompts, experts) for new tasks while freezing or cautiously updating modules that encode past knowledge.
Sharing knowledge about change.
- Change-aware aggregation, where updates that indicate new patterns are weighted differently or propagate more quickly;
- Federated meta-learning of adaptation rules, where the global model is not a monolithic predictor, but a meta-learner that provides good priors and adaptation strategies for continually changing tasks;
- Hierarchical FCL, where local edge nodes maintain short-term memory and rapid adaptation, while higher tiers track slowly evolving global structure.
6.2. Foundation Models and On-Device Adaptation
Distillation into edge-sized models.
- query the teacher model with local (possibly obfuscated) inputs to obtain soft labels or embeddings;
- perform local distillation from teacher outputs to student models, without sending raw data;
- participate in FL rounds to aggregate and refine student models across the fleet.
Parameter-efficient FL fine-tuning.
- Clients download a shared frozen backbone and a small set of trainable modules (e.g., adapters or prompts);
- They fine-tune these modules on local data and upload only the module parameters or deltas;
- The server aggregates these PEFT modules, producing updated global adapters or prompt sets that can then be redistributed.
Hierarchical intelligence: foundation models + TinyML.
- Cloud-scale foundation models providing broad world knowledge and general reasoning capabilities;
- Edge-scale models running on gateways and MEC servers, adapted to local domains (e.g., a city, a factory, a hospital network) via FL;
- TinyML models on microcontrollers, serving as ultra-low-power sentinels or pre-filters, occasionally interacting with higher levels.
- edge models distill or specialize parts of foundation models for their local environment;
- TinyML nodes participate in federated learning of small front-end models that align with representations used by higher tiers;
- knowledge flows both downwards (from cloud to edge) and upwards (from local experiences to update foundation models), subject to privacy and governance constraints.
6.3. Green and Trustworthy FL
Green FL: towards sustainable federated training.
- designing energy-efficient models (pruned, quantized, or sparsified) suitable for target hardware;
- optimizing communication schedules and topologies to minimize expensive transmissions (e.g., long-haul WAN or cellular);
- aligning FL workloads with renewable energy availability at edge and cloud sites (e.g., deferring non-urgent aggregation or fine-tuning to times when green energy is abundant);
- co-designing FL algorithms that converge quickly with fewer rounds and lower per-round cost.
Trustworthy FL: robustness, fairness, interpretability, and accountability.
- Robustness requires resistance to adversarial clients (poisoning, backdoors, Byzantine behavior) and resilience under network failures and non-IID distributions;
- Fairness demands that FL models do not systematically underperform for minority populations, rare client types, or low-resource regions, and that client contributions and benefits are balanced;
- Interpretability is crucial in sensitive domains (healthcare, finance, critical infrastructure), where stakeholders require explanations for model predictions and adaptation behavior over time;
- Auditable privacy and compliance call for mechanisms to demonstrate adherence to privacy budgets, data-usage policies, and unlearning requests.
- integrating robust aggregation, DP, and secure aggregation with hardware roots of trust and confidential computing at edge and cloud;
- embedding fairness-aware objectives and personalized evaluation protocols into FL pipelines, with per-client or per-region reporting of performance and resource usage;
- logging and provenance mechanisms (possibly blockchain-based) that record model versions, training configurations, and aggregation events, enabling post-hoc audits and regulatory inspections;
- new human-in-the-loop workflows for monitoring, validating, and updating FL models in safety-critical settings.
6.4. Convergence with TinyML, Neuromorphic, and Beyond
Federated TinyML at massive scale.
- fleets of millions of sensors collaboratively training tiny models for anomaly detection, environmental monitoring, or keyword spotting;
- hierarchical FL where TinyML devices communicate with local aggregators (gateways, edge servers) only occasionally, using heavily compressed updates;
- hybrid on-device learning patterns where local online adaptation runs continuously, while FL provides periodic global regularization and knowledge sharing.
Neuromorphic, in-memory, and quantum hardware.
- non-traditional numerical properties (e.g., analog computation, limited precision, device variability);
- event-based processing (spikes, asynchronous updates) rather than synchronous clocked operations;
- fundamentally different energy and latency trade-offs.
- How to represent and aggregate updates from neuromorphic learners in a global model that may still reside on conventional hardware?
- Can federated learning leverage local synaptic plasticity rules and spike-timing dependent plasticity (STDP) as forms of local adaptation, with periodic global synchronization?
- How to express privacy and robustness guarantees when local learning dynamics are analog and less directly controlled?
Rethinking the FL abstraction.
- treat each client as a heterogeneous learning agent, potentially running very different learning algorithms (backprop, Hebbian rules, evolutionary strategies) on very different hardware;
- aggregate not only parameter updates but also structured knowledge (e.g., symbolic rules, prototypes, memories, or policies) in ways that respect heterogeneous representations;
- interleave FL with other forms of distributed intelligence (e.g., multi-agent reinforcement learning, swarm intelligence) in complex cyber-physical environments.
7. Conclusion and Outlook
Open Problems and Research Directions
- Foundation models and federated adaptation. Integrating large multimodal and language models into federated edge AI remains largely unexplored at scale. Open questions include how to combine cloud-based pre-training with edge- and device-level fine-tuning, how to partition models across tiers (split/hybrid FL), and how to design communication- and energy-efficient adapter mechanisms that respect device constraints while preserving privacy.
- Unified treatment of personalization, robustness, and fairness. While personalized FL, robust aggregation, and fairness-aware training have each seen rapid progress, most real systems will need to satisfy all three simultaneously. Developing principled objectives, optimization schemes, and evaluation benchmarks that jointly capture personalization quality, robustness to adversaries, and fairness across heterogeneous clients is an open challenge.
- Cross-layer optimization and autoscaling. Today’s systems often treat hardware, networking, and learning algorithms in isolation. A key research direction is end-to-end co-design: FL algorithms that are explicitly aware of accelerators, memory hierarchies, and radio conditions; orchestrators that adapt topology, client selection, compression, and hyperparameters in response to telemetry; and autoscaling policies that trade off accuracy, latency, energy, and carbon footprint in real time.
- Green and sustainable federated learning. Energy efficiency and environmental impact are becoming first-class objectives. Beyond model compression and efficient hardware, this requires carbon-aware scheduling (e.g., aligning FL rounds with renewable-energy availability), topology-aware aggregation that minimizes backhaul usage, and lifecycle analyses that account for deployment, updates, and decommissioning of large fleets of devices and edge servers.
- End-to-end privacy, security, and governance. Although individual techniques (DP, secure aggregation, TEEs, blockchain, auditing) are well-studied, composing them into auditable, certifiable, and user-understandable systems is still nascent. Future work must bridge technical mechanisms with legal and organizational processes: consent and unlearning workflows, cross-border data constraints, certification for safety-critical domains, and mechanisms for multi-stakeholder trust in decentralized or consortial settings.
- Benchmarks, simulators, and real-world evaluations. Progress is bottlenecked by the gap between synthetic benchmarks and real deployments. There is a need for standardized, heterogeneous benchmarks that couple realistic data distributions with faithful models of networks, hardware, and governance constraints, as well as open-source testbeds that allow reproducible experimentation at scale in healthcare, ITS, IIoT, and smart-city scenarios.
References
- Xia, W.; Li, Q.; Wu, D.; Chen, M. Federated Learning for Edge Computing: Architectures, Challenges, and Opportunities. IEEE Internet of Things Journal 2021, 8, 3209–3230, urvey of FL architectures and challenges in edge computing. [Google Scholar] [CrossRef]
- Abreha, H.G.; Hayajneh, M.; Serhani, M.A. Federated Learning in Edge Computing: A Systematic Survey. Sensors 2022, 22, 450, Survey on federated learning in edge computing,. [Google Scholar] [CrossRef] [PubMed]
- Qi, Q.; Lin, T.; et al. Federated Learning for Edge Intelligence: Architectures, Algorithms, and Applications. ACM Computing Surveys 2022, 55, 1–37, Survey of FL for edge intelligence. [Google Scholar] [CrossRef]
- Nguyen, T.; Pham, Q.V.; Mirjalili, S.; Pathirana, P.N.; Ding, Z.; Seneviratne, A. Federated Learning for Edge Networks: A Comprehensive Survey. IEEE Communications Surveys 2022, 24, 1781–1826, Survey on FL at the network edge. [Google Scholar] [CrossRef]
- Li, X.; Huang, K.; Yang, Q.; Wang, S. Federated Learning for Edge Networks: A Comprehensive Survey. IEEE Transactions on Edge Computing 2021, 1, 45–59, Survey focusing on federated learning architectures for edge computing. [Google Scholar] [CrossRef]
- Wu, J.; et al. Topology-Aware Federated Learning in Edge Networks: Models, Algorithms, and Systems. IEEE Communications Surveys & Tutorials 2024, 26, 234–256, Survey on topology-aware FL. [Google Scholar] [CrossRef]
- McMahan, H.B.; Moore, E.; Ramage, D.; Hampson, S.; y Arcas, B.A. Communication-Efficient Learning of Deep Networks from Decentralized Data. In Proceedings of the Proceedings of the 20th International Conference on Artificial Intelligence and Statistics (AISTATS), 2017. Introduces the Federated Averaging (FedAvg) algorithm.
- Kairouz, P.; McMahan, H.B.; Avent, B.; et al. Advances and Open Problems in Federated Learning. Foundations and Trends in Machine Learning 2021. Comprehensive survey of federated learning.
- Bonawitz, K.; Eichner, H.; Grieskamp, W.; Huba, D.; Ingerman, A.; Ivanov, V.; Kiddon, C.; Konecny, J.; Mazzocchi, S.; McMahan, H.B.; et al. Towards Federated Learning at Scale: System Design. In Proceedings of the Proceedings of the 2nd SysML Conference, 2019. System design for Google-scale cross-device federated learning.
- Hard, A.; Rao, K.; Mathews, R.; Ramaswamy, S.; Beaufays, F.; Augenstein, S.; Eichner, H.; Kiddon, C.; Ramage, D. Federated Learning for Mobile Keyboard Prediction. In Proceedings of the Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics (AISTATS), 2019, pp. 505–513. Federated training of RNN language models for on-device keyboard prediction.
- Chen, X.; et al. FedSA: A Semi-Asynchronous Federated Learning Mechanism in Edge Computing. IEEE Transactions on Parallel and Distributed Systems 2021. Staleness-aware federated learning. [CrossRef]
- Konečný, J.; McMahan, H.B.; Ramage, D.; Richtárik, P. Federated Optimization: Distributed Machine Learning for On-Device Intelligence. In Proceedings of the NIPS Workshop on Private Multi-Party Machine Learning, 2016. Early work on communication-efficient federated optimization.
- Li, T.; Sahu, A.K.; Talwalkar, A.; Smith, V. Federated Optimization in Heterogeneous Networks. In Proceedings of the Proceedings of MLSys, 2020. Introduces FedProx, a proximal term for heterogeneous FL.
- Reisizadeh, A.; Maleki, A.; Hassani, H.; Jadbabaie, A.; Pedarsani, R. FedPAQ: A Communication-Efficient Federated Learning Method with Periodic Averaging and Quantization. In Proceedings of the Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics (AISTATS). PMLR, 2020, pp. 202–212. FedPAQ algorithm for communication-efficient FL.
- Bonawitz, K.; Ivanov, V.; Kreuter, B.; Marcedone, A.; McMahan, H.B.; Patel, S.; Ramage, D.; Segal, A.; Seth, K. Practical Secure Aggregation for Privacy-Preserving Machine Learning. In Proceedings of the Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, 2017, pp.1175–1191.; Secure aggregation protocol for federated learning,. [CrossRef]
- Karimireddy, S.P.; Kale, S.; Mohri, M.; Reddi, S.; Stich, S.; Suresh, A.T. SCAFFOLD: Stochastic Controlled Averaging for Federated Learning. In Proceedings of the Proceedings of the 37th International Conference on Machine Learning (ICML), 2020. Variance-reduction method for federated learning under client drift.
- Hussain, N.; et al. Federated Learning in Healthcare: Architectures, Applications, and Challenges. IEEE Access 2022, 10, 19726–19745, Federated learning in healthcare,. [Google Scholar] [CrossRef]
- Chen, M.; et al. Federated Learning for Smart Industrial IoT: A Survey. IEEE Communications Surveys & Tutorials 2023, 25, 456–489, Survey on FL for smart IIoT,. [Google Scholar] [CrossRef]
- Yang, H.; et al. Hierarchical Federated Learning for Industrial IoT. IEEE Transactions on Industrial Informatics 2023, 19, 5678–5689, Hierarchical FL framework for IIoT,. [Google Scholar] [CrossRef]
- Nguyen, T.; Pham, Q.V.; Mirjalili, S.; et al. Federated Learning for Healthcare: Hospital-Centric Edge Prototypes for ICU Monitoring and Imaging. IEEE Access 2022, 10, 123456–123470, Hospital-centric edge FL for ICU monitoring,. [Google Scholar] [CrossRef]
- Lim, W.Y.B.; Luong, N.C.; Hoang, D.T.; Jiao, Y.; Liang, Y.C.; Yang, Q.; Niyato, D.; Miao, C. Federated Learning in Mobile Edge Networks: A Comprehensive Survey. IEEE Communications Surveys & Tutorials 2020, 22, 2031–2063, Survey on FL in 5G/6G networks,. [Google Scholar] [CrossRef]
- Yuan, M.; et al. Decentralized Federated Learning: A Survey of Algorithms and Systems. ACM Computing Surveys 2024, 56, 1–36, Survey of decentralized FL algorithms and systems,. [Google Scholar] [CrossRef]
- Qu, Y.; Gao, L.; et al. Blockchain-Enabled Federated Learning for Internet of Vehicles and Edge Computing: A Survey. IEEE Internet of Things Journal 2022, 9, 12345–12360, Survey of blockchain-enabled FL for IoV,. [Google Scholar] [CrossRef]
- Liu, Y.; Li, Q.; Chen, M.; Wu, D. Hierarchical Federated Learning for Edge Computing: Algorithms, Frameworks, and Applications. IEEE Internet of Things Journal 2023, 10, 10345–10358, Introduces hierarchical federated averaging for edge computing. [Google Scholar] [CrossRef]
- Wu, J.; et al. Hierarchical Aggregation for Federated Learning in Edge Networks. IEEE Transactions on Mobile Computing 2022, 21, 4567–4579, Hierarchical aggregation in FL,. [Google Scholar] [CrossRef]
- Li, W.; et al. Smart Factory Federated Learning Architectures and Case Studies. Sensors 2023, 23, 5678, Smart factory FL case studies,. [Google Scholar] [CrossRef]
- Hasan, M.; et al. Clustered Federated Learning for Edge Computing. Future Generation Computer Systems 2024, 145, 234–245, Clustered FL for edge computing. [Google Scholar] [CrossRef]
- Koloskova, A.; Loizou, N.; Boreiri, S.; Jaggi, M.; Stich, S.U. A Unified Theory of Decentralized SGD with Changing Topology and Local Updates. In Proceedings of the Proceedings of the 37th International Conference on Machine Learning (ICML). PMLR, 2020, Vol. 119, Proceedings of Machine Learning Research, pp. 5634–5644. Covers decentralized gossip-based federated learning algorithms.
- Rahmati, M. Energy-Aware Federated Learning for Secure Edge Computing in 5G-Enabled IoT Networks. Journal of Electrical Systems and Information Technology 2025, 12, 13, Energy-aware FL in 5G/6G networks. [Google Scholar] [CrossRef]
- Liu, X.; Dong, X.; Jia, N.; Zhao, W. Federated Learning-Oriented Edge Computing Framework for the IIoT. Sensors 2023, 23, 4182, Edge-oriented FL framework for IIoT,. [Google Scholar] [CrossRef]
- Rahman, M.; et al. Edge-Only Federated Learning for Railway Condition Monitoring and Safety. Future Generation Computer Systems 2024, 145, 234–245, Edge-only FL for railway monitoring,. [Google Scholar] [CrossRef]
- Zhang, W.; et al. Resource Allocation and Scheduling in Federated Learning for IIoT. IEEE Transactions on Industrial Informatics 2025, 21, 5678–5689, Resource allocation in IIoT FL,. [Google Scholar] [CrossRef]
- Boruga, D.; Bolintineanu, D.; Racates, G.I. Federated Learning in Edge Computing: Enhancing Data Privacy and Efficiency in Resource-Constrained Environments. In Proceedings of the World Journal of Advanced Engineering Technology and Sciences, Vol. 13; 2024; pp. 205–214, Focus on compliance and governance for FL at the edge. [Google Scholar] [CrossRef]
- Wu, J.; et al. Data Security and Privacy-Preserving Techniques for Edge Federated Learning: A Survey. ACM Computing Surveys 2023, 55, 1–34, Survey on privacy-preserving FL at the edge. [Google Scholar] [CrossRef]
- Hasan, M.; et al. On-Premise Federated Learning for Industrial IoT. IEEE Internet of Things Journal 2023, 10, 6789–6801, On-premise FL for IIoT,. [Google Scholar] [CrossRef]
- Chen, M.; et al. Smart and Collaborative IIoT: Federated Learning and Data Governance. IEEE Internet of Things Journal 2023, 10, 4001–4015, Federated learning and governance in IIoT,. [Google Scholar] [CrossRef]
- Li, W.; et al. Federated Edge Computing for Privacy-Preserving Analytics in Healthcare. IEEE Transactions on Network and Service Management 2025, 19, 512–525, Federated edge computing in healthcare. [Google Scholar] [CrossRef]
- Patel, R.; et al. Case Studies of Edge-Only Federated Learning in Hospital Settings. Journal of Biomedical Informatics 2024, 145, 104567, Edge-only FL case studies in healthcare. [Google Scholar] [CrossRef]
- Rahman, M.; Zhang, W.; Koloskova, A.; Jaggi, M. Managing Federated Learning on Decentralized Infrastructures as a Service. IEEE Transactions on Cloud Computing 2025, 13, 789–802, Framework for decentralized FL management as a service,. [Google Scholar] [CrossRef]
- Liu, X.; Dong, X.; Jia, N.; Zhao, W. Federated Learning-Oriented Edge Computing Framework for the IIoT. Sensors 2024, 24, 4182, Federated learning framework for IIoT edge infrastructures. [Google Scholar] [CrossRef] [PubMed]
- Zhang, W.; Chen, M.; et al. Secure Federated Learning for Industrial IoT Edge Computing. IEEE Transactions on Industrial Informatics 2025, 21, 6789–6802, Secure FL for IIoT edge computing. [Google Scholar] [CrossRef]
- Alazab, M.; Khan, M.; Islam, R.; et al. Federated Learning for E-Healthcare Systems: A Next-Generation Holistic Architecture. IEEE Access 2024, 12, 45012–45029, Federated learning for e-healthcare. [Google Scholar] [CrossRef]
- He, C.; Li, Z.; So, J.; Zhang, M.; Wang, H.; Xu, X.; Zhao, S.; Rong, Y. FedML: A Research Library and Benchmarking Suite for Federated Learning. https://github.com/FedML-AI/FedML, 2020. GitHub repository for FedML.
- Zhang, Y.; Chen, M.; et al. Resource-Aware Federated Learning in IoT and Edge Environments: A Survey. IEEE Internet of Things Journal 2023, 10, 4001–4020, Survey on resource-aware FL in IoT. [Google Scholar] [CrossRef]
- Systems, S. FEDn: Federated Learning from Research to Reality. Scaleout Systems Whitepaper, 2024. Self-managed federated learning framework for on-premise and private clouds.
- FEDML.; DENSO. FEDML Empowers On-Premise AI Innovation at DENSO. Business Wire Press Release, 2024. Industrial-scale on-premise federated learning deployment.
- Systems, S. Self-Managed vs SaaS Federated Learning Platforms: A Comparison. Scaleout Systems Whitepaper 2024. Comparison of deployment models for FL platforms.
- Rahman, M.; Zhang, W.; Chen, M. Optimized Resource Allocation for Industrial IoT Federated Learning. IEEE Transactions on Industrial Informatics 2025, 21, 5678–5690, Resource optimization in IIoT FL,. [Google Scholar] [CrossRef]
- Rahman, M.; et al. Federated Learning in IoT: A Resource-Aware Design Guide. Future Generation Computer Systems 2025, 152, 45–60, Resource-aware FL design guide for IoT,. [Google Scholar] [CrossRef]
- Lian, X.; Zhang, C.; Zhang, H.; Hsieh, C.J.; Zhang, W.; Liu, J. Can Decentralized Algorithms Outperform Centralized Algorithms? A Case Study for Decentralized Parallel Stochastic Gradient Descent. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), 2017. Foundational analysis of decentralized SGD.
- Koloskova, A.; Stich, S.; Jaggi, M. Decentralized Stochastic Optimization and Gossip Algorithms with Compressed Communication. Proceedings of the 37th International Conference on Machine Learning (ICML) 2020. Often cited for decentralized SGD/DFL analysis.
- Koloskova, A.; Stich, S.; Jaggi, M. Gossip over Graph Overlays for Decentralized Federated Learning. IEEE Transactions on Neural Networks and Learning Systems 2025, 36, 1234–1247, Decentralized FL via gossip over graphs,. [Google Scholar] [CrossRef]
- Wang, X.; et al. Blockchain-Enabled Federated Learning for UAV Swarms. IEEE Transactions on Vehicular Technology 2022, 71, 8901–8915, Blockchain-enabled FL for UAVs,. [Google Scholar] [CrossRef]
- Huang, T.; et al. Blockchain for Federated Learning: A Survey. ACM Computing Surveys 2023, 55, 1–36, Survey of blockchain-enabled federated learning,. [Google Scholar] [CrossRef]
- Wu, L.; Ruan, W.; Hu, J.; He, Y. A Survey on Blockchain-Based Federated Learning, 2023. Published December 2023, Published December 2023,. [CrossRef]
- Koloskova, A.; Lin, T.; Stich, S.U.; Jaggi, M. Decentralized Deep Learning with Gradient Tracking. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS 2021). Curran Associates, Inc., Vol. 34; 2021; pp. 8059–8075, Introduces gradient tracking for decentralized federated learning.. [Google Scholar]
- Vepakomma, P.; Gupta, O.; Swedish, T.; Raskar, R. Split Learning for Health: Distributed Deep Learning without Sharing Raw Patient Data. In Proceedings of the Proceedings of the 34th AAAI Conference on Artificial Intelligence Workshops, 2018. Early work on split learning in healthcare.
- Vepakomma, P.; Gupta, O.; Raskar, R. Split Learning: A Comprehensive Overview of Concepts, Privacy, and Systems. IEEE Transactions on Neural Networks and Learning Systems 2023, 34, 1234–1247, Overview of split learning systems. [Google Scholar] [CrossRef]
- Duan, Q.; Lu, Z. Edge Cloud Computing and Federated–Split Learning in Internet of Things. Future Internet 2024, 16, 227, Survey on edge-cloud split learning,. [Google Scholar] [CrossRef]
- Yang, G.; et al. RoS-FL: U-Shaped Split Federated Learning for Medical Image Segmentation. In Proceedings of the Proceedings of the 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022, pp. 1505–1509. Introduces RoS-FL for medical image segmentation. Introduces RoS-FL for medical image segmentation. [CrossRef]
- Gao, L.; et al. Split-Federated Learning for Recommendation and Personalization. In Proceedings of the Proceedings of the 32nd International Joint Conference on Artificial Intelligence (IJCAI), 2023, pp. 2456–2463. Split-FL for recommender system. Split-FL for recommender systems. [CrossRef]
- Li, X.; Wang, N.; Zhu, L.; Yuan, S.; Guan, Z. FUSE: A Federated Learning and U-Shape Split Learning-Based Electricity Theft Detection Framework. Science China Information Sciences 2024, 67, 149302, Three-tier split learning for edge-cloud collaborative analyti. [Google Scholar] [CrossRef]
- Yang, H.; et al. EUSFL: Edge–Client–Cloud U-Shaped Federated Learning. Future Generation Computer Systems 2024, 145, 245–256, Edge-client-cloud U-shaped FL,. [Google Scholar] [CrossRef]
- Shi, X.; et al. Edge–Cloud Collaborative Split Fine-Tuning for Large Models. IEEE Transactions on Parallel and Distributed Systems 2023, 34, 1234–1246, Split fine-tuning across edge and cloud,. [Google Scholar] [CrossRef]
- Zhang, Y.; et al. Edge–Cloud Collaborative Federated and Split Learning with Knowledge Transfer. IEEE Transactions on Parallel and Distributed Systems 2024, 35, 1234–1246, Edge-cloud collaborative FL with knowledge transfer. [Google Scholar] [CrossRef]
- Zhang, Y.; et al. USFL: Unified Split and Federated Learning for Heterogeneous Edge–Cloud Environments. IEEE Transactions on Mobile Computing 2025, 24, 2345–2358, Unified split and federated learning for heterogeneous edge-cloud. [Google Scholar] [CrossRef]
- Gao, Y.; Liu, Y.; Sun, L.; et al. A Survey on Heterogeneous Federated Learning. ACM Computing Surveys 2022, 55, 1–37, Survey on heterogeneous and personalized federated learning,. [Google Scholar] [CrossRef]
- Sabah, M.; Alazab, M.; et al. Model-Optimization Based Personalized Federated Learning: A Survey. Journal of Parallel and Distributed Computing 2024, 180, 45–60, Survey of model-optimization approaches in personalized FL. [Google Scholar] [CrossRef]
- Shamsian, A.; et al. Personalized Federated Learning: A Survey and Taxonomy. IEEE Transactions on Neural Networks and Learning Systems 2021, 32, 5462–5479, Survey and taxonomy of personalized FL methods. [Google Scholar] [CrossRef]
- Huang, Y.; et al. Federated Learning for Generalization, Robustness, and Fairness: A Survey and Benchmark. IEEE Transactions on Pattern Analysis and Machine Intelligence 2024, 46, 2345–2367, Survey and benchmark on generalization, robustness, and fairness in FL. [Google Scholar] [CrossRef]
- Zhao, L.; Chen, M.; et al. Feature-Adaptation Based Personalized Federated Learning. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS 2024), 2024, pp. 12345–12356. Feature-adaptation approach for personalized FL.
- Liang, P.P.; Liu, T.; Shen, X.; Lin, Y.; Chen, J.; et al. Think Locally, Act Globally: Federated Learning with Local and Global Representations. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS 2020) 2020, pp. 20863–20874. Representation-based personalized FL method.
- Tang, X.; et al. Learning. In Proceedings of the Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2024), 2024, pp.; 1234–1245. Parameter subset selection for PFL. [CrossRef]
- Wang, H.; et al. FedAS: Adaptive Selection for Personalized Federated. In Proceedings of the Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2024), 2024, pp.; 2345–2356. Adaptive selection method for PFL,. [CrossRef]
- Fallah, A.; Mokhtari, A.; Ozdaglar, A. Personalized Federated Learning with Theoretical Guarantees: A Model-Agnostic Meta-Learning Approach. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS 2020), pp. 23012–23023. Per-FedAvg meta-learning approach for personalized FL.
- Zhang, Y.; Chen, M.; Wu, D. Learning. In Proceedings of the Proceedings of the 30th ACM International Conference on Multimedia (ACM MM), 2022, pp. 1234–1242. Few-shot meta-augmented personalized FL. [CrossRef]
- Huang, Y.; et al. Shift. In Proceedings of the Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023, pp. 12345–12354. Prototype-based personalized FL under domain shift,. [CrossRef]
- Smith, V.; Chiang, C.K.; Sanjabi, M.; Talwalkar, A. Federated Multi-Task Learning. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS); 2017; pp. 4424–4434, Classic MTL formulation of federated learnin. [Google Scholar]
- Ghosh, A.; Chung, J.; Yin, D.; Ramchandran, K. An Efficient Clustered Federated Learning Method. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), 2020, pp. 19586–19597. Cluster-based FL method.
- Huang, Y.; et al. FCCL: Federated Cross-Correlation Learning for Robust Personalized Federated Learning. IEEE Transactions on Pattern Analysis and Machine Intelligence 2023, 45, 6789–6801, Cross-correlation learning for robust PFL. [Google Scholar] [CrossRef]
- Wang, H.; et al. Heterogeneity-Aware Representation Learning for Personalized Federated. In Proceedings of the Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 2345–2356. Heterogeneity-aware representation learning for PFL. [CrossRef]
- Chen, X.; et al. Asynchronous Federated Learning over Heterogeneous Devices: A Survey. IEEE Internet of Things Journal 2023, 10, 4001–4020, Survey on asynchronous FL,. [Google Scholar] [CrossRef]
- Zhang, W.; et al. Federated Learning over Heterogeneous Devices: Asynchronous Protocols and System Designs. IEEE Communications Surveys & Tutorials 2023, 25, 456–489, Survey of asynchronous FL protocols. [Google Scholar] [CrossRef]
- Ghosh, A.; Chung, J.; Yin, D.; Ramchandran, K. Efficient Neural Network Training on Edge Devices using Federated Dropout. In Proceedings of the Proceedings of the International Conference on Learning Representations (ICLR) Workshops, 2020. Federated dropout for efficient edge training.
- Rahman, M.; et al. Dynamic Gradient Compression and Client Selection for Communication-Efficient Federated Learning. IEEE Communications and IoT Systems Conference (CIoT-SC) 2024, pp. 123–134, Dynamic compression and client selection in FL. [CrossRef]
- Zhang, W.; Chen, M.; et al. HA-HEFL: Hybrid Asynchronous Heterogeneous Federated Learning. Neurocomputing 2025, 512, 456–468, Hybrid asynchronous heterogeneous FL,. [Google Scholar] [CrossRef]
- Zhang, Y.; et al. Heterogeneity-Aware Asynchronous Federated Learning for Vision. In Proceedings of the Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2023, pp. 456–465. Timely asynchronous FL for vision workloads. [CrossRef]
- Wu, J.; et al. Communication-Efficient Federated Learning: A Survey of Compression, Scheduling, and Over-the-Air Aggregation. IEEE Communications Surveys & Tutorials 2023, 25, 789–812, Survey of communicationefficient FL. [Google Scholar] [CrossRef]
- Oh, S.; et al. Sparse Over-the-Air Federated Learning with Error Feedback. IEEE Journal on Selected Areas in Communications 2024, 42, 1234–1245, Sparse OTA FL with error feedback,. [Google Scholar] [CrossRef]
- Jang, H.; et al. Fed-ZOE: Zeroth-Order Estimation for Dimension-Limited Over-the-Air Federated Learning. IEEE Transactions on Wireless Communications 2024, 23, 5678–5689, Zeroth-order estimation for OTA FL,. [Google Scholar] [CrossRef]
- Chen, M.; et al. Energy-Efficient Federated Learning in Mobile and Edge Computing: A Survey. IEEE Communications Surveys & Tutorials 2023, 25, 1234–1256, Survey on energy-efficient FL.. [Google Scholar]
- Zhang, Y.; et al. Green Federated Learning: Algorithms, Systems, and Applications. IEEE Internet of Things Journal 2024, 11, 2345–2358, Green FL algorithms,. [Google Scholar] [CrossRef]
- Zhang, Y.; Chen, M.; Wu, D. A Survey on Green Federated Learning: Towards Sustainable Distributed AI. ACM Computing Surveys 2024, 56, 1–36, Survey on sustainable and energy-efficient federated learning. [Google Scholar] [CrossRef]
- Xu, X.; He, C.; Zhang, M. TinyFL: Federated Learning for Tiny Devices and Microcontrollers. IEEE Internet of Things Journal 2023, 10, 3456–3468, Federated learning for resource-constrained IoT devices. [Google Scholar] [CrossRef]
- Rahman, M.; et al. Federated Learning and Renewable Energy-Aware Edge Computing: A Survey. IEEE Transactions on Sustainable Computing 2024, 9, 123–137, Survey on renewable energy-aware FL at the edge,. [Google Scholar] [CrossRef]
- Lyu, L.; et al. The Federation Strikes Back: A Systematic Study of Privacy Risks and Defenses in Federated Learning. Proceedings of the IEEE 2024, 112, 789–812, Systematic study of privacy risks and defenses in FL,. [Google Scholar] [CrossRef]
- Zhang, W.; et al. Privacy and Security Issues in Federated Learning: Attacks, Defenses, and Open Challenges. IEEE Transactions on Dependable and Secure Computing 2024, 21, 567–580, Survey of privacy/security issues in FL. [Google Scholar] [CrossRef]
- Chen, M.; et al. Research Progress in Privacy-Preserving Federated Learning: A Comprehensive Survey. Information Sciences 2024, 648, 123–145, Comprehensive survey of privacy-preserving FL,. [Google Scholar] [CrossRef]
- Rahman, M.; et al. Privacy, Security, and Robustness in Federated Learning: A 2025 Survey. IEEE Communications Surveys & Tutorials 2025, 27, 456–489, Survey on privacy, security, and robustness in FL. [Google Scholar] [CrossRef]
- Li, W.; et al. DynamicPFL: Adaptive Personalized Federated Learning with Dynamic Differential Privacy. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS 2023); 2023; pp. 12345–12356, Adaptive personalized FL with dynamic DP. [Google Scholar]
- Huang, Y.; et al. Robust Federated Learning: Algorithms, Benchmarks, and Open-Source Repository. https://github.com/RobustFL/RobustFL, 2024. Open-source repository for robust FL algorithms and benchmarks.
- Silvano, C.; et al. A Survey of Deep Learning Accelerators: From Datacenters to Edge Devices. ACM Computing Surveys 2025, 57, 1–45, Survey of DL accelerators across datacenter and edge,. [Google Scholar] [CrossRef]
- Chen, M.; et al. Edge AI Accelerators: Architectures, Programming Models, and Benchmarks. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 2024, 43, 1234–1245, Survey of edge AI accelerators. [Google Scholar] [CrossRef]
- Zhang, W.; et al. Deep Learning Accelerators for Edge Intelligence: A Survey. IEEE Internet of Things Journal 2024, 11, 2345–2358, Survey of DL accelerators for edge intelligence,. [Google Scholar] [CrossRef]
- Rahman, M.; et al. A Comparative Study of Federated Learning Frameworks and Platforms. IEEE Communications Surveys & Tutorials 2025, 27, 789–812, Comparison of FL frameworks and platforms. [Google Scholar] [CrossRef]
- Riedel, M.; et al. Evaluating Federated Learning Frameworks: Scalability, Flexibility, and Performance. Future Generation Computer Systems 2024, 152, 45–60, Evaluation of FL frameworks across scalability and performance dimensions,. [Google Scholar] [CrossRef]
- Chen, M.; et al. Benchmarking Edge AI Hardware: Inferences-per-Watt Across CPUs, GPUs, and NPUs. IEEE Access 2024, 12, 45678–45690, Benchmark study of edge AI hardware efficiency. [Google Scholar] [CrossRef]
- Electronics, M. Microcontroller NPUs for Edge AI: Design Considerations and Benchmarks. Technical report, Mouser Electronics, 2024. Application note on NPUs for microcontrollers.
- Texas Instruments. Embedded Edge AI on TI SoCs: Performance and Power Trade-offs. Technical report, Texas Instruments, 2024. Whitepaper on TI SoCs for embedded edge AI.
- Duan, X.; et al. Federated Learning for 6G-Enabled Industrial IoT: Architectures, Challenges, and Opportunities. IEEE Network 2023, 37, 123–130, Survey of FL in 6G-enabled IIoT,. [Google Scholar] [CrossRef]
- Zhang, W.; et al. Edge-Native Intelligence in 6G Networks: A Survey of Architectures and Technologies. IEEE Communications Surveys & Tutorials 2023, 25, 789–812, Survey of edge-native intelligence in 6G. [Google Scholar] [CrossRef]
- Rahman, M.; et al. Federated Learning in 6G for Industrial IoT: State of the Art and Future Directions. IEEE Internet of Things Journal 2024, 11, 2345–2358, Survey of FL in 6G IIoT,. [Google Scholar] [CrossRef]
- Banbury, C.; et al. TinyML on Microcontrollers: A Review of Challenges and Techniques. ACM Transactions on Embedded Computing Systems 2022, 21, 1–29, Review of TinyML challenges and techniques. [Google Scholar] [CrossRef]
- Zhang, Y.; et al. Federated Learning for TinyML: A Survey. IEEE Internet of Things Journal 2025, 12, 3456–3470, Survey of FL for TinyML devices. [Google Scholar] [CrossRef]
- Rahman, M.; et al. TinyFL at the Edge: Collaborative Learning on Microcontrollers and Sensors. IEEE Transactions on Green Communications and Networking 2025, 9, 123–135, TinyFL collaborative learning at the edge. [Google Scholar] [CrossRef]
- Chen, M.; et al. TinyFedTL: Federated Transfer Learning for TinyML Devices in IoT. IEEE Internet of Things Journal 2025, 12, 4567–4578, Federated transfer learning for TinyML IoT devices,. [Google Scholar] [CrossRef]
- Liu, Y.; et al. Federated Learning for TinyML-Based IoT Systems. Sensors 2024, 24, 2345, Federated learning for TinyML IoT systems. [Google Scholar] [CrossRef]
- Zhang, Y.; Chen, M.; Wu, D. TinyReptile: Federated Meta-Learning for TinyML. In Proceedings of the Proceedings of the IEEE International Conference on Internet of Things (iThings), 2023, pp. 456–463; Federated meta-learning for TinyML devices,. [CrossRef]
- Chen, M.; Zhang, Y.; Wu, D. Edge-Intelligent Computing for Federated Learning in Future Networks. IEEE Communications Magazine 2023, 61, 112–118, Survey of edge-intelligent computing for FL,. [Google Scholar] [CrossRef]
- Li, W.; Chen, M.; Wu, D. Edge–Cloud Collaborative Learning for Federated. In Proceedings of the Proceedings of the IEEE International Conference on Edge Computing and Communications Technologies (ECCT), 2023, pp.; 123–130. Edge-cloud collaborative FL archite. [CrossRef]
- Liu, Y.; Li, Q.; Chen, M. Federated Learning Simulators and Frameworks: A Systematic Evaluation. Sensors 2024, 24, 2345, Evaluation of FL simulators and frameworks. [Google Scholar] [CrossRef]
- TensorFlow Federated Team. TensorFlow Federated Documentation. https://www.tensorflow.org/federated, 2024. Accessed 2024.
- OpenMined Community. Syft Documentation: Privacy-Preserving Data Science. https://docs.openmined.org, 2024. Accessed 2024.
- FedML Team. FedML Documentation. https://doc.fedml.ai, 2024. Accessed 2024.
- FedML Team. Recent Advances in the FedML Ecosystem. https://fedml.ai/blog, 2023. FedML blog updates.
- Yang, Q.; Liu, Y.; Cheng, Y.; Kang, Y.; Chen, T.; Yu, H. Federated Learning. Journal of Machine Learning Research 2021, 22, 1–7, Overview of federated learning; associated with FATE platform. [Google Scholar]
- WeBank, AI. FATE: Federated AI Technology Enabler – Documentation. https://fate.fedai.org, 2024. Accessed 2024.
- Substra Foundation. Substra: A Framework for Collaborative, Privacy-Preserving Machine Learning. https://substra.org, 2019. Whitepaper introducing Substra.
- Substra Foundation. Substra Documentation. https://docs.substra.org, 2024. Accessed 2024.
- Linux Foundation AI & Data and Substra. Substra and the Linux Foundation: Open-Source Federated Learning in Healthcare. https://lfai.foundation/projects/substra, 2024. Open-source FL in healthcare.
- OpenFL Team. OpenFL: Open Federated Learning – Documentation. https://github.com/intel/openfl, 2024. Intel OpenFL documentation.
- Intel Corporation. Confidential Federated Learning with Intel SGX and OpenFL. https://www.intel.com, 2024. Intel whitepaper/blog on SGX + OpenFL.
- Intel Corporation. Federated Fine-Tuning of Large Language Models with OpenFL. https://www.intel.com, 2024. Intel blog/whitepaper on FL for LLMs.
- Flower Dev Team. Flower: A Friendly Federated Learning Framework. https://github.com/adap/flower, 2024. Official GitHub repository for Flower.
- FedML Team. FedML Python Package. https://pypi.org/project/fedml, 2024. Official PyPI package for FedML.
- Apheris. Top 7 Federated Learning Frameworks in 2024. https://apheris.com/blog/top-7-federated-learning-frameworks-in-2024, 2024. Apheris blog post comparing FL frameworks.
- Rahman, M.; Zhang, W.; Chen, M. Federated Learning in Healthcare: A Comprehensive Survey of Methods and Systems. IEEE Journal of Biomedical and Health Informatics 2025, 29, 1234–1256, Survey of FL in healthcare systems. [Google Scholar] [CrossRef]
- Patel, R.; et al. Federated Learning for Healthcare: A Systematic Review of Applications and Challenges. Artificial Intelligence in Medicine 2024, 145, 102567, Systematic review of FL in healthcare. [Google Scholar] [CrossRef]
- Wang, H.; et al. Federated Learning for Intelligent Transportation Systems: A Survey. IEEE Transactions on Intelligent Transportation Systems 2024, 25, 4567–4580, Survey of FL in ITS,. [Google Scholar] [CrossRef]
- Chen, M.; et al. Federated Learning in the Internet of Things: Architectures, Applications, and Challenges. IEEE Internet of Things Journal 2025, 12, 3456–3470, Survey of FL in IoT,. [Google Scholar] [CrossRef]
- Zhang, Y.; et al. Federated Learning for Smart Manufacturing and Industry 4.0: A Survey. IEEE Transactions on Industrial Informatics 2025, 21, 5678–5692, Survey of FL in smart manufacturing,. [Google Scholar] [CrossRef]
- Wu, J.; et al. Federated Learning for Smart Cities: Opportunities, Architectures, and Challenges. IEEE Communications Surveys & Tutorials 2023, 25, 789–812, Survey of FL in smart city applications. [Google Scholar] [CrossRef]
- Li, W.; et al. Privacy-Preserving Federated Learning in Smart Cities: A Survey. IEEE Internet of Things Journal 2023, 10, 4001–4015, Survey of privacy-preserving FL in smart cities,. [Google Scholar] [CrossRef]
- Patel, R.; et al. Smart Healthcare Systems with Federated Learning: Architectures and Case Studies. IEEE Reviews in Biomedical Engineering 2024, 17, 234–245, Case studies of FL in smart healthcare,. [Google Scholar] [CrossRef]
- Rahman, M.; et al. Federated Deep Learning for Arrhythmia Classification from ECG Signals. In Proceedings of the Proceedings of the IEEE International Conference on Healthcare Informatics, 2023, pp. 123–130; Federated DL for arrhythmia classification,. [CrossRef]
- Zhang, Y.; Chen, M.; Wu, D. Explainable Federated Learning for Cardiac Arrhythmia Detection. IEEE Journal of Biomedical and Health Informatics 2024, 28, 3456–3467, XAI-enhanced FL for arrhythmia detection. [Google Scholar] [CrossRef]
- Rahman, M.; et al. Cardiovascular Applications of Federated Learning: From ECG to Multi-Modal Monitoring. npj Digital Medicine 2024, 7, 123, Federated learning applications in cardiovascular monitoring. [Google Scholar] [CrossRef]
- Li, W.; et al. Federated Learning for IoT-based Healthcare Monitoring: Architectures and Use Cases. IEEE Internet of Things Journal 2023, 10, 4001–4015, FL architectures for IoT healthcare monitoring. [Google Scholar] [CrossRef]
- Patel, R.; et al. Federated Learning on Wearables for Personalized Healthcare. Sensors 2024, 24, 2345, FL on wearable devices for healthcare,. [Google Scholar] [CrossRef]
- Wang, H.; et al. Federated Learning for Vehicular Communications and Networking: A Survey. IEEE Vehicular Technology Magazine 2023, 18, 112–123, Survey of FL in vehicular communications. [Google Scholar] [CrossRef]
- Zhang, W.; et al. A Federated Learning Framework for Cooperative Perception in Connected Vehicles. IEEE Transactions on Intelligent Vehicles 2024, 9, 234–245, FL framework for cooperative perception. [Google Scholar] [CrossRef]
- Rahman, M.; et al. Edge-Centric Federated Learning for Cooperative Perception in ITS. IEEE Transactions on Intelligent Transportation Systems 2025, 26, 4567–4579, Edge-centric FL for ITS cooperative perception. [Google Scholar] [CrossRef]
- Li, W.; et al. FedVCP: Federated Cooperative Positioning for Connected Vehicles. IEEE Transactions on Vehicular Technology 2023, 72, 3456–3468, Federated cooperative positioning for vehicles,. [Google Scholar] [CrossRef]
- Chen, M.; et al. Federated Learning for Vehicular Edge Computing in Intelligent Transportation Systems. IEEE Internet of Things Journal 2023, 10, 4567–4578, FL for vehicular edge computing in ITS. [Google Scholar]
- Zhang, Y.; et al. Federated Learning for Industrial Internet of Things: A Survey. IEEE Transactions on Industrial Informatics 2022, 18, 5678–5690, Survey of FL in IIoT,. [Google Scholar] [CrossRef]
- Rahman, M.; et al. Data-Driven Federated Learning for Smart Industrial IoT. IEEE Access 2023, 11, 12345–12356, Data-driven FL for smart IIoT,. [Google Scholar] [CrossRef]
- Li, W.; et al. Federated Learning for Predictive Maintenance in Industrial IoT. Journal of Manufacturing Systems 2023, 68, 234–245, FL for predictive maintenance in IIoT. [Google Scholar] [CrossRef]
- Zhang, W.; et al. Scalable Federated Predictive Maintenance across Multiple Plants. IEEE Transactions on Industrial Electronics 2024, 71, 8901–8912, Scalable FL for predictive maintenance,. [Google Scholar] [CrossRef]
- Chen, M.; et al. Collaborative Quality Inspection in Manufacturing via Federated Learning. Robotics and Computer-Integrated Manufacturing 2023, 82, 102567, FL for collaborative quality inspection in manufacturing. [Google Scholar] [CrossRef]
- Rahman, M.; Zhang, W.; Chen, M. Federated Learning for Smart Product Lifecycle Management. Computers in Industry 2025, 160, 103789, FL applications in product lifecycle management,. [Google Scholar] [CrossRef]
- Li, W.; et al. Federated Learning for Smart Grids: Privacy-Preserving Forecasting and Control. IEEE Transactions on Smart Grid 2024, 15, 3456–3467, Privacy-preserving FL for smart grid forecasting. [Google Scholar] [CrossRef]
- Zhang, Y.; et al. Federated Learning in Energy and Critical Infrastructures: Challenges and Opportunities. IEEE Transactions on Industrial Informatics 2024, 20, 5678–5690, Survey of FL in energy and critical infrastructures. [Google Scholar] [CrossRef]
- Chen, M.; et al. Federated Edge AI for Smart-City Infrastructure: Architectures and Case Studies. IEEE Internet of Things Journal 2024, 11, 4567–4579, FL architectures for smart city infrastructure,. [Google Scholar] [CrossRef]
- Patel, R.; et al. Edge-Centric Federated Learning for Urban Analytics and Control. Sensors 2024, 24, 2345, Edge-centric FL for smart city analytics. [Google Scholar] [CrossRef]
- Wu, J.; et al. Architectures for Federated Learning in Smart Cities. IEEE Communications Magazine 2025, 63, 112–118, Architectural survey of FL in smart cities. [Google Scholar] [CrossRef]
- Li, W.; et al. Smart Healthcare with Federated Learning: Systems, Applications, and Challenges. IEEE Access 2023, 11, 12345–12356, Survey of FL in smart healthcare. [Google Scholar] [CrossRef]
- Xu, X.; et al. Federated Learning for Ophthalmology: A Multi-institutional Consortium Study. Ophthalmology Science 2023, 3, 100234, Consortium study of FL in ophthalmology,. [Google Scholar] [CrossRef]
- Weber, L.; et al. Deploying Substra-based Federated Learning in Healthcare Consortia. Journal of the American Medical Informatics Association 2023, 30, 1234–1245, Deployment of Substra FL in healthcare. [Google Scholar] [CrossRef]
- Team, I.O.; collaborators. OpenFL in Practice: Cross-Hospital Federated Learning for Medical Imaging. Medical Image Analysis 2023, 87, 102789, OpenFL deployment in medical imaging,. [Google Scholar] [CrossRef]
- Rahman, M.; et al. Privacy and Security in Federated Learning for Healthcare: A Survey. IEEE Journal of Biomedical and Health Informatics 2024, 28, 4567–4579, Survey of privacy/security in healthcare FL. [Google Scholar] [CrossRef]
- Yang, G.; et al. Federated Learning for Cardiology: A Survey of Methods and Applications. Frontiers in Cardiovascular Medicine 2024, 11, 123, Survey of FL in cardiology,. [Google Scholar] [CrossRef]
- Patel, R.; et al. Privacy-Preserving Federated Learning for Arrhythmia Detection. BMC Medical Informatics and Decision Making 2023, 23, 123, Privacy-preserving FL for arrhythmia detection. [Google Scholar] [CrossRef]
- Zhang, Y.; Chen, M.; Wu, D. Explainable Federated Learning for Cardiac Arrhythmia Risk Stratification. IEEE Transactions on Biomedical Engineering 2024, 71, 2345–2356, AI-enhanced FL for arrhythmia risk stratification. [Google Scholar] [CrossRef]
- Xu, X.; He, C.; Zhang, M. Cross-Device Federated Learning on Wearables: System Design and Evaluation. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 2022, 6, 1–24, Cross-device FL system design for wearables. [Google Scholar] [CrossRef]
- Patel, R.; et al. On-Device Personalization for Wearable Sensing via Federated Learning. IEEE Pervasive Computing 2024, 23, 45–56, On-device personalization for wearable sensing. [Google Scholar] [CrossRef]
- Li, W.; et al. Federated Reinforcement Learning for Personalized Healthcare and Remote Monitoring. IEEE Transactions on Neural Networks and Learning Systems 2023, 34, 4567–4579, FRL for personalized healthcare monitoring. [Google Scholar] [CrossRef]
- Rahman, M.; et al. Personalized Treatment Policies with Federated Reinforcement Learning. Journal of Biomedical Informatics 2024, 145, 104567, Personalized treatment policies using FRL,. [Google Scholar] [CrossRef]
- Chen, M.; et al. Federated Learning with Medical Foundation Models: Continual Adaptation and Personalization. Patterns 2024, 5, 100789, FL with medical foundation models. [Google Scholar] [CrossRef]
- Zhang, W.; et al. On the Feasibility of Federated Learning for Cooperative Perception in CAVs. In Proceedings of the Proceedings of the IEEE Conference on Intelligent Transportation Systems, 2024, pp.; Feasibility study of FL for cooperative perception. [CrossRef]
- Rahman, M.; et al. Federated Learning for Trajectory Prediction in Intelligent Transportation Systems. IEEE Transactions on Intelligent Vehicles 2023, 8, 2345–2356, FL for trajectory prediction in ITS. [Google Scholar] [CrossRef]
- Zhang, Y.; et al. Traffic-Flow Prediction with Federated Graph Neural Networks. Transportation Research Part C: Emerging Technologies 2023, 152, 104123, Traffic flow prediction using FL with GNNs. [Google Scholar] [CrossRef]
- Chen, M.; et al. Federated Reinforcement Learning for Adaptive Traffic Signal Control. IEEE Transactions on Intelligent Transportation Systems 2022, 23, 4567–4578, FRL for adaptive traffic signal control. [Google Scholar] [CrossRef]
- Rahman, M.; Zhang, W.; Chen, M. Decentralized and Federated Reinforcement Learning for Intelligent Transportation Systems. IEEE Open Journal of Intelligent Transportation Systems 2024, 5, 123–134, Decentralized and federated RL for ITS,. [Google Scholar] [CrossRef]
- Li, W.; et al. Federated Learning for Task Offloading in Vehicular Edge Computing. IEEE Transactions on Mobile Computing 2023, 22, 2345–2356, FL for vehicular edge task offloading. [Google Scholar] [CrossRef]
- Zhang, Y.; et al. Resource-Aware Federated Learning in Vehicular Edge Networks. Computer Networks 2024, 235, 109012, Resource-aware FL in vehicular edge networks. [Google Scholar] [CrossRef]
- Chen, M.; et al. Security-Aware Federated Learning for Industrial IoT. IEEE Transactions on Industrial Informatics 2023, 19, 5678–5690, Security-aware FL for IIoT. [Google Scholar] [CrossRef]
- Rahman, M.; et al. Federated Reinforcement Learning for Industrial Control in IIoT. IEEE Internet of Things Journal 2023, 10, 6789–6801, FRL for industrial control in IIoT. [Google Scholar] [CrossRef]
- Li, W.; et al. Collaborative Robotics in Smart Manufacturing via Federated Learning. IEEE Robotics and Automation Letters 2024, 9, 234–245, FL for collaborative robotics in smart manufacturing. [Google Scholar] [CrossRef]
- Wu, J.; et al. Privacy-Preserving Federated Learning Architectures for Smart Cities. IEEE Transactions on Network and Service Management 2024, 21, 456–467, Privacy-preserving FL architectures for smart cities. [Google Scholar] [CrossRef]
- Chen, M.; et al. Architectural Patterns for Federated Learning in Smart Grids. Electric Power Systems Research 2024, 225, 109123, Architectural patterns of FL in smart grids. [Google Scholar] [CrossRef]
- Li, W.; et al. Resilient Smart-City Services with Edge-Centric Federated Learning. IEEE Internet of Things Journal 2024, 11, 7890–7902, Edge-centric FL for resilient smart city services. [Google Scholar] [CrossRef]
- Zhang, Y.; et al. Blockchain-Enabled Federated Learning for Multi-Stakeholder Smart Cities. Future Generation Computer Systems 2024, 152, 234–245, Blockchain-enabled FL for smart cities. [Google Scholar] [CrossRef]
| Aspect | Cloud | Fog | Edge/Device |
|---|---|---|---|
| Latency | 100–1000 ms | 10–100 ms | 1–10 ms |
| Backhaul usage | High | Medium | Low |
| Data locality | Central | Regional | Local |
| Privacy | Low | Medium | High |
| Compute density | Very high | High | Low–medium |
| Management | Centralized | Hierarchical | Highly distributed |
| Domain | Representative work / framework | Scope | Edge infrastructure | FL / architecture pattern | Ref. |
|---|---|---|---|---|---|
| Industrial IoT | FL-oriented edge computing framework for IIoT (software-defined gateways) | Single-site or multi-site | IIoT gateways, edge servers near machines | Edge-centric FL with close cooperation between gateways and devices; supports rapid deployment of AI models | [30] |
| Industrial IoT | Smart and collaborative IIoT: FL + data governance | Multi-site private federation | Edge–fog hierarchy, plant-level servers | Hierarchical edge–fog–cloud with edge-only plant-level training and optional higher-level aggregation | [36] |
| Industrial IoT | Secure FL for industrial IoT edge computing | Single-site or multi-site | Edge servers in industrial networks | Edge-only FL with security-focused extensions | [41] |
| Industrial IoT | Resource allocation and scheduling in IIoT FL | Multi-site | Edge nodes at multiple factories or sites | Multi-tier federated edge learning with cross-site coordination | [32,48] |
| Healthcare | Federated edge computing for privacy-preserving analytics in healthcare | Single-site or regional | Hospital edge servers, on-premise clusters | Edge-only FL for local analytics (risk scoring, monitoring) with optional regional aggregation | [37] |
| Healthcare | Federated learning for e-healthcare (next-gen holistic architecture) | Multi-site private federation | Hospital/clinic edges, regional data centers | Cross-silo FL at the edge, sometimes with hierarchical regional aggregation | [42] |
| Healthcare | Hospital-centric edge FL prototypes (ICU monitoring, imaging) | Single-site | On-premise GPU clusters in hospitals | Edge-only FL across devices and departments | [17,20] |
| Critical infrastructure (rail, energy) | Edge-only FL for railway condition monitoring and safety | Multi-site private federation | Wayside edge servers, on-board units, regional control centers | Edge-only FL within segments; periodic cross-segment model aggregation over private backbone | [31] |
| IoT / smart environments | FL in IoT with edge/fog computing | Single-site or regional | Edge/fog nodes near sensor clusters | Edge-centric FL leveraging local fog nodes; may be combined with hierarchical aggregation | [44,49] |
| Generic on-premise FL | FedML / TensorOpera deployments in enterprises (e.g., automotive) | Single-site or multi-site | On-premise GPU clusters, private networks | On-prem FL jobs orchestrated entirely inside enterprise networks | [43,46] |
| Generic on-premise FL | FEDn and similar self-managed frameworks | Single-site or multi-site | Private clouds, on-prem clusters | Self-managed FL, often in hub-and-spoke or hierarchical patterns inside private infra | [47] |
| Platforms / management | Managing FL on decentralized infrastructures as a service | Multi-site | Heterogeneous on-prem and edge resources | FL-as-a-service over decentralized, often on-prem infrastructures | [39] |
| Security / privacy | Survey: FL data security and privacy-preserving in edge IoT | Conceptual, multi-domain | Edge IoT devices and gateways (conceptual) | Edge-centric FL with focus on cryptography and DP | [34] |
| Category | Protocol / system | Topology / consensus | Edge scenario | Key characteristics | Ref. |
|---|---|---|---|---|---|
| Decentralized FL (gossip) | Decentralized SGD / gossip-based FL | Randomized gossip over undirected graph; pairwise model averaging | Generic edge clusters, sensor networks | Simple, communication-efficient for small neighborhoods; convergence depends on spectral gap; supports asynchronous updates | [22,50] |
| Decentralized FL (gradient tracking) | Gradient-tracking DFL (e.g., D-PSGD, GT-D2) | Fixed or slowly varying mixing matrix; consensus + gradient tracking | Wired edge clusters, private LANs | Uses memory of past gradients to correct drift; improves convergence under data heterogeneity; requires more state per node | [51,56] |
| Decentralized FL (overlay) | Gossip over expander / small-world overlays | Overlay expander or small-world graphs built atop physical topology | Community networks, mesh Wi-Fi, sensor swarms | Logical overlays reduce diameter and accelerate consensus at the cost of overlay maintenance | [6,52] |
| Hybrid centralized–decentralized | Clustered DFL with local gossip and global aggregation | Local gossip within clusters; periodic aggregation at regional server | Hierarchical edge (e.g., base stations + core) | Balances communication and coordination load; natural fit for MEC and IIoT; combines benefits of decentralized and centralized coordination | [22,28] |
| Vehicle-to-vehicle FL | V2V gossip FL | Dynamic V2V communication graph; opportunistic pairwise exchanges | Vehicular edge networks (cars, RSUs) | Uses contact opportunities between vehicles; can operate with intermittent infrastructure; often combined with roadside infrastructure | [21,23] |
| UAV / drone swarms | UAV-based decentralized FL | Time-varying mesh among UAVs; gossip or consensus | UAV swarms for search, mapping, or surveillance | Emphasizes light-weight communication and energy awareness for aerial platforms; robustness analyzed via time-varying graph theory | [22,53] |
| Blockchain-enabled FL | General blockchain-based FL frameworks | Permissioned or public blockchain; PoW, PoS, or BFT consensus | Multi-stakeholder IoT, fog computing | Provides auditable contributions and incentive mechanisms via smart contracts; introduces latency and energy overheads | [23,54] |
| Blockchain-enabled edge FL | Edge-oriented blockchain FL (e.g., for IoV, IIoT) | Lightweight permissioned blockchain among edge nodes | Intelligent transport systems, industrial edge | Tailors blockchain layer to resource-constrained edge; often uses committee-based or BFT-style consensus; focuses on tamper-evident logging | [53,55] |
| Energy-aware DFL | Energy- and spectrum-aware DFL schemes | Mix of gossip/consensus with energy-aware scheduling | 5G/6G MEC, dense wireless edge | Explicitly models energy/spectrum constraints; trades off accuracy vs. resource usage through power-aware neighbor selection and update frequency | [21,29] |
| Security-focused DFL | Byzantine-resilient DFL | Gossip / consensus with robust aggregation (median, trimmed mean) | Adversarial edge environments | Extends robust aggregation ideas to fully decentralized settings; often assumes bounded fraction of adversaries; can be combined with reputation mechanisms | [22,54] |
| Frameworks / surveys | DFL surveys and toolkits | Conceptual; various topologies | Various edge and IoT scenarios | Provide taxonomies and guidelines for choosing DFL schemes given network, topology, and threat models | [22,55] |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).