Submitted: 28 August 2024
Posted: 29 August 2024
Abstract
Keywords:
1. Introduction
- Q1
- What are the impacts of heterogeneous environments on federated learning?
- Q2
- How can the federated learning scenario benefit from different systems and architectures?
- Q3
- What are the privacy implications of using federated learning?
- Q4
- How can a cluster management system (e.g., Kubernetes) make the application of heterogeneous machine learning models feasible and easier to carry out?
1.1. Federated Learning and Edge Computing in the Internet of Things
2. Related Work
2.1. Dynamic Federated Learning
2.2. Heterogeneous Federated Learning
- Communication:
- Networks with variable bandwidth, latency, and reliability are difficult to manage and to compensate for. For instance, protocols such as TCP/IP tolerate changes in the reliability of the communication, but are less elastic with respect to infrastructural changes such as an IP address change or a handover [7].
- Models:
- Different models, machine-learning architectures, data formats, and data dimensions can create difficulties in aggregation. For instance, two distinct neural network layouts can generate different results from the same data. In this case, the results need to be evaluated and analyzed with a specific algorithm [8].
- Statistics:
- Non-uniform data distributions and misaligned statistics can affect the quality of the overall model. For instance, heterogeneous data from different sources can be difficult to handle and needs to be adapted with specific algorithms and preparation steps; otherwise, it can lead to overfitting and underfitting in the models.
- Devices:
- Limitations in computing power, memory, and communication capabilities between devices can slow the learning process [2]. For instance, distributing the analysis between low-powered end devices, such as smartphones or low-end IoT devices, and high-end processing servers could slow down the high-end devices, which have to wait for the slower participants.
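The statistical heterogeneity described in the list above is often emulated in experiments by skewing the label distribution across clients. The following minimal sketch assumes a Dirichlet-based label split, which is a common convention in the federated learning literature and not necessarily the procedure used in this work; the function name `dirichlet_partition` and the parameter `alpha` are illustrative choices.

```python
import random
from collections import Counter


def dirichlet_partition(labels, num_clients, alpha, seed=0):
    """Split sample indices across clients with label skew.

    For each class, the share given to each client is drawn from a symmetric
    Dirichlet(alpha) distribution: a small alpha gives strongly non-IID
    clients, a large alpha gives nearly uniform clients.
    """
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    clients = [[] for _ in range(num_clients)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # A Dirichlet sample is a vector of Gamma(alpha, 1) draws, normalised.
        gammas = [rng.gammavariate(alpha, 1.0) for _ in range(num_clients)]
        total = sum(gammas)
        start, cumulative = 0, 0.0
        for k in range(num_clients):
            cumulative += gammas[k] / total
            # The last client takes whatever remains, so no index is lost.
            if k == num_clients - 1:
                end = len(indices)
            else:
                end = round(cumulative * len(indices))
            clients[k].extend(indices[start:end])
            start = end
    return clients


# Toy labels: 1000 samples, 10 balanced classes (assumed for illustration).
labels = [i % 10 for i in range(1000)]
for alpha in (0.1, 100.0):
    parts = dirichlet_partition(labels, num_clients=5, alpha=alpha)
    sizes = [len(p) for p in parts]
    classes_seen = [len(Counter(labels[i] for i in p)) for p in parts]
    print(f"alpha={alpha}: sizes={sizes}, classes per client={classes_seen}")
```

With a small `alpha`, clients tend to receive very different amounts of data and only a few classes each, which is the situation in which aggregation becomes harder.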
2.3. Privacy Leakage Problem
3. Method
3.1. Problem Statement and Motivation
4. Statistical Methodology for Federated Learning
4.1. Federated Averaging
- Initialization: The central server initializes the global model with initial weights and distributes it to all participating clients.
- Local Training: Each client receives the global model and trains it locally on its data for a predefined number of epochs or iterations. This local training produces an updated set of weights specific to each client.
- Sending Weights: Clients send their updated weights to the central server. During this process, only the model weights are transferred, while the raw data remains on local devices, which provides a baseline level of privacy protection (though not a complete guarantee).
- Aggregation: The central server collects all updated weights from clients and calculates the weighted average of these weights to update the global model. Weighting can take into account the amount of local training data from each client, ensuring that clients with more data have a greater influence on updating the model.
- Iteration: This process of deploying the global model, training locally, dispatching weights, and aggregating is repeated for a predefined number of rounds or until the global model reaches a desired convergence or performance level.
Algorithm 1: FedAvg: Federated Averaging Algorithm
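The five steps listed above can be sketched in a few lines of code. This is a minimal illustration under stated assumptions, not a reproduction of Algorithm 1: the model is a toy linear regression, local training is plain SGD, and the names `local_train`, `aggregate`, and `fed_avg` as well as all hyperparameters are hypothetical.

```python
import random


def local_train(weights, data, lr=0.05, epochs=5):
    """Local training: plain SGD on squared error for y ~ w0 + w1 * x."""
    w0, w1 = weights
    for _ in range(epochs):
        for x, y in data:
            err = (w0 + w1 * x) - y
            w0 -= lr * err
            w1 -= lr * err * x
    return [w0, w1]


def aggregate(client_weights, client_sizes):
    """Aggregation: average of client weights, weighted by n_k / n."""
    n = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n_k for w, n_k in zip(client_weights, client_sizes)) / n
        for i in range(dim)
    ]


def fed_avg(client_datasets, rounds=30, init=(0.0, 0.0)):
    global_w = list(init)                      # Initialization
    sizes = [len(d) for d in client_datasets]
    for _ in range(rounds):                    # Iteration
        # Local training; only the resulting weights leave each client.
        updates = [local_train(global_w, d) for d in client_datasets]
        global_w = aggregate(updates, sizes)   # Aggregation
    return global_w


# Toy setup: three clients holding different amounts of data from y = 2x + 1.
rng = random.Random(0)


def make_client(n):
    xs = [rng.uniform(-1, 1) for _ in range(n)]
    return [(x, 2 * x + 1 + rng.gauss(0, 0.01)) for x in xs]


clients = [make_client(n) for n in (20, 50, 200)]
w = fed_avg(clients)
print("learned intercept and slope:", w)
```

Because the average is weighted by dataset size, the client holding 200 samples influences the global model ten times more than the client holding 20.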
4.2. Privacy Leakage
- Gradient Inversion Attacks:
- Attackers can reconstruct original data from the gradients shared during the model update process. Even though raw data is not directly shared, gradients can carry enough information to reveal sensitive data points. For example, given a gradient $g$ computed with respect to the model weights $w$ and input data $x$, it is possible to approximate $x$ by minimizing the difference between the computed gradient and the gradient of a guessed input.
- Model Updates:
- Repeatedly sharing model updates can leak information about the training data. Over time, these updates can accumulate enough information for an attacker to infer private data. For instance, in a typical FedAvg setup, the global model update at round $t$ is computed as: $$w_{t+1} = w_t + \eta \sum_{k=1}^{K} \frac{n_k}{n} \Delta w_t^k$$ where $\eta$ is the learning rate, $K$ is the number of participating clients, $n_k$ is the number of data points at client $k$, $n$ is the total number of data points, and $\Delta w_t^k$ is the local model update from client $k$. If model updates are sent within a short period of time, an attacker can infer the user's position or details about the user's data from the absence or the high volume of transmitted data. Alternatively, if the model is not sufficiently protected in transit, the attacker can intercept it and try to analyze it.
- Side-Channel Attacks:
- Attackers can exploit side-channel information, such as the timing or size of the communications between clients and the server, to gain insights into the data being processed. These side channels can indirectly leak sensitive information even if the data and model updates are encrypted. Well-known attacks such as Spectre and Meltdown, which exploit side channels in the CPU microarchitecture rather than in network traffic, belong to the same broad family.
- Membership Inference Attacks:
- These attacks aim to determine whether a specific data point was part of the training dataset. This is particularly concerning in scenarios where the presence of certain data points can imply sensitive information. Given a model $f$ and a data point $x$, an attacker can train a shadow model to infer the membership status of $x$ by analyzing the output confidence scores.
- Differential Privacy:
- A technique that adds noise to the gradients or the model parameters to mask the contribution of individual data points [15]. By ensuring that the inclusion or exclusion of a single data point does not significantly affect the output, differential privacy provides a strong privacy guarantee. The formal definition of differential privacy is: $$\Pr[\mathcal{M}(D) \in S] \le e^{\epsilon} \Pr[\mathcal{M}(D') \in S] + \delta$$ where $\mathcal{M}$ is the randomized mechanism, $D$ and $D'$ are datasets differing by one element, $S$ is a subset of possible outputs, $\epsilon$ is the privacy budget, and $\delta$ is a small probability.
- Secure Multi-Party Computation (SMPC):
- SMPC protocols [16] allow multiple parties to jointly compute a function over their inputs while keeping those inputs private. In the context of FL, SMPC can be used to securely aggregate model updates without exposing individual contributions. For example, using additive secret sharing, each client $k$ splits its update into shares and distributes them among the clients. The server only receives the aggregated result, which is the sum of all shares.
- Homomorphic Encryption:
- This allows computations to be performed on encrypted data without needing to decrypt it first. In FL, homomorphic encryption can be used to perform model aggregation securely, ensuring that the server does not see the raw updates from clients. Given an encryption function $E$ and a decryption function $D$, homomorphic encryption ensures that: $$D(E(a) \oplus E(b)) = a + b$$ where $\oplus$ denotes the operation on ciphertexts that corresponds to addition on plaintexts. At the time of writing, however, homomorphic encryption is orders of magnitude slower than non-homomorphic alternatives, making it impractical for large amounts of data.
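The additive secret sharing idea mentioned in the SMPC item can be illustrated with scalar updates. The sketch below is a simplified assumption-laden example, not a production protocol: it ignores client dropouts, collusion, and authenticated channels, and the modulus `PRIME`, the fixed-point factor `SCALE`, and all function names are illustrative choices.

```python
import secrets

PRIME = 2**61 - 1   # modulus for share arithmetic (illustrative choice)
SCALE = 10**6       # fixed-point factor used to encode floats as integers


def encode(value):
    """Map a float to an integer mod PRIME (negative values wrap around)."""
    return round(value * SCALE) % PRIME


def decode(element):
    """Inverse of encode: elements above PRIME // 2 are read as negative."""
    if element > PRIME // 2:
        element -= PRIME
    return element / SCALE


def split_into_shares(value, num_shares):
    """Return num_shares integers that sum to encode(value) mod PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(num_shares - 1)]
    last = (encode(value) - sum(shares)) % PRIME
    return shares + [last]


def secure_sum(client_updates):
    """Sum scalar updates so that no party sees an individual update."""
    k = len(client_updates)
    # shares[i][j] is the share that client i sends to client j.
    shares = [split_into_shares(u, k) for u in client_updates]
    # Client j adds the shares it received and forwards only this partial sum.
    partial_sums = [sum(shares[i][j] for i in range(k)) % PRIME for j in range(k)]
    # The server adds the partial sums; single shares look uniformly random.
    return decode(sum(partial_sums) % PRIME)


updates = [0.25, -1.5, 3.0]   # toy scalar updates from three clients
total = secure_sum(updates)
print("aggregated update:", total)
```

Dividing the result by the number of clients, or weighting the encoded updates by $n_k / n$ before sharing, would turn this secure sum into the averaging step of FedAvg.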
5. Application of Multilayer Architecture
- Scalability:
- The cloud environment can be easily scaled to meet changing needs for computational resources. By leveraging cloud platforms, resources can be dynamically allocated based on current demand, allowing for seamless expansion or contraction of the infrastructure. This is particularly useful in federated learning (FL), where the number of participating devices and the volume of data can vary significantly over time. Autoscaling features in Kubernetes ensure that container instances are automatically adjusted to handle fluctuating workloads, maintaining optimal performance without manual intervention.
- Flexibility:
- Containers can be used to isolate and manage different components of the FL system, making it easy to adapt to heterogeneous devices and data. Each container can be configured with the specific dependencies and environment required for a particular task, ensuring consistency and reproducibility across different nodes. This modular approach allows for rapid deployment of updates and new features, as well as easier troubleshooting and maintenance. Furthermore, containers enable the use of diverse programming languages and tools within the same FL system, enhancing the ability to integrate with various data sources and device capabilities.
- Reliability:
- The cloud environment provides a reliable and resilient infrastructure that can tolerate node failures and other issues. Cloud providers typically offer robust service level agreements (SLAs) and fault-tolerant architectures that ensure high availability. Docker and Kubernetes contribute to this reliability by managing container health checks, restarts, and failovers. In the event of a node failure, Kubernetes can reschedule the affected workloads onto other healthy nodes, minimizing downtime and maintaining the continuity of the FL process. Additionally, the use of multi-zone or multi-region deployments can further enhance the resilience of the system against localized failures [17].
- A cluster of VMs in the cloud:
- VMs provide the isolation and computational resources needed to run containers. Each VM can host multiple containers, offering a layer of abstraction that separates the hardware from the application layer. This isolation ensures that each container operates in a controlled environment, preventing conflicts and enhancing security. By distributing containers across multiple VMs, the system can leverage the cloud’s elasticity to optimize resource utilization and cost efficiency.
- A container orchestrator (Docker and Kubernetes):
- The orchestrator manages the lifecycle of containers, ensuring they are always running and automatically scaling them as needed. Docker provides the containerization platform, while Kubernetes handles the orchestration. (k3s is a lightweight Kubernetes distribution designed for resource-constrained environments such as edge computing and IoT devices; it simplifies Kubernetes by reducing dependencies and the overall binary size.) These tools automate the deployment, scaling, and operation of application containers across clusters of hosts. Kubernetes’ advanced scheduling capabilities ensure that containers are optimally placed based on resource requirements and constraints, improving efficiency and performance.
- A federated learning module:
- The FL module handles communication between participants, model aggregation, and local model updating. This module is responsible for coordinating the training process, ensuring that updates from local models are securely and accurately aggregated to form a global model. It manages the distribution of the global model to clients, collects local updates, and performs federated averaging or other aggregation techniques. The module also handles encryption and secure communication protocols to protect the integrity and confidentiality of the data being transmitted.
- A data management module:
- The data management module pre-processes the data, distributes it to participants, and ensures data privacy. This module is crucial for handling the diverse and often sensitive nature of the data used in FL. It includes functionalities for data normalization, anonymization, and encryption to comply with privacy regulations and protect user information. The module also manages the allocation of data to ensure balanced and representative training across all participants, improving the robustness and fairness of the final model.
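The scalability and reliability points above map onto standard Kubernetes objects. As a hedged illustration, the following sketch builds a Deployment with a liveness probe and a HorizontalPodAutoscaler as plain dictionaries and prints them as JSON, which `kubectl` accepts. The image name, labels, port, and resource figures are hypothetical placeholders and are not taken from the implementation described in this paper.

```python
import json

APP_LABELS = {"app": "fl-client"}                  # hypothetical label
IMAGE = "registry.example.com/fl-client:latest"    # hypothetical image

deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "fl-client", "labels": APP_LABELS},
    "spec": {
        "replicas": 2,
        "selector": {"matchLabels": APP_LABELS},
        "template": {
            "metadata": {"labels": APP_LABELS},
            "spec": {
                "containers": [{
                    "name": "fl-client",
                    "image": IMAGE,
                    # Requests are needed for CPU-utilization autoscaling.
                    "resources": {
                        "requests": {"cpu": "250m", "memory": "256Mi"},
                        "limits": {"cpu": "1", "memory": "1Gi"},
                    },
                    # Health check: the container is restarted if the
                    # (assumed) port stops accepting connections.
                    "livenessProbe": {
                        "tcpSocket": {"port": 8080},
                        "initialDelaySeconds": 10,
                        "periodSeconds": 15,
                    },
                }],
            },
        },
    },
}

autoscaler = {
    "apiVersion": "autoscaling/v2",
    "kind": "HorizontalPodAutoscaler",
    "metadata": {"name": "fl-client"},
    "spec": {
        "scaleTargetRef": {
            "apiVersion": "apps/v1",
            "kind": "Deployment",
            "name": "fl-client",
        },
        "minReplicas": 2,
        "maxReplicas": 10,
        # Add replicas when average CPU utilization exceeds 70% of requests.
        "metrics": [{
            "type": "Resource",
            "resource": {
                "name": "cpu",
                "target": {"type": "Utilization", "averageUtilization": 70},
            },
        }],
    },
}

manifest = {"apiVersion": "v1", "kind": "List", "items": [deployment, autoscaler]}
print(json.dumps(manifest, indent=2))
```

Assuming the script is saved as `manifests.py` (a hypothetical file name), its output could be piped into `kubectl apply -f -` on a cluster where a metrics source is available for the autoscaler.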
5.0.1. Architectural Support using Docker and Kubernetes
- Isolation and Consistency: Containers provide isolated environments for each participating node, ensuring that computations are consistent and reproducible. This isolation limits the potential for privacy leakage between nodes.
- Orchestration and Scaling: Kubernetes automates the deployment, scaling, and operation of containers. This ensures efficient resource management and helps in dynamically adapting to changing computational demands, which is essential in large-scale FL systems.
- Secure Communication: Containers can be configured to enforce secure communication protocols, ensuring that data in transit is protected. Kubernetes can manage and automate the deployment of these secure channels.
- Automated Updates: Integration with Continuous Integration/Continuous Deployment (CI/CD) tools ensures that security patches and updates are consistently applied across all nodes, reducing vulnerabilities.
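One way to make the isolation point concrete is a Kubernetes NetworkPolicy that only lets client pods reach the aggregation server. The sketch below is an assumption-based example: the labels `fl-server` and `fl-client` and the port are hypothetical, enforcement depends on the cluster's network plugin, and the policy restricts who may connect but does not encrypt traffic, so transport encryption such as TLS would still have to be configured separately.

```python
import json

network_policy = {
    "apiVersion": "networking.k8s.io/v1",
    "kind": "NetworkPolicy",
    "metadata": {"name": "fl-server-ingress"},
    "spec": {
        # The policy applies to the aggregation server pods.
        "podSelector": {"matchLabels": {"app": "fl-server"}},
        "policyTypes": ["Ingress"],
        "ingress": [{
            # Only pods labelled as FL clients may open connections.
            "from": [{"podSelector": {"matchLabels": {"app": "fl-client"}}}],
            "ports": [{"protocol": "TCP", "port": 8080}],
        }],
    },
}

print(json.dumps(network_policy, indent=2))
```

Any pod in the same namespace that does not carry the client label would then be unable to open a connection to the server on that port.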
5.1. Implementation
5.1.1. Results
6. Final Considerations and Future Directions
6.1. Conclusion
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Kairouz, P.; McMahan, H.B.; Avent, B.; Bellet, A.; Bennis, M.; Bhagoji, A.N.; Bonawitz, K.; Charles, Z.; Cormode, G.; Cummings, R.; others. Advances and open problems in federated learning. Foundations and Trends® in Machine Learning 2021, 14, 1–210.
- Ye, M.; Fang, X.; Du, B.; Yuen, P.C.; Tao, D. Heterogeneous federated learning: State-of-the-art and research challenges. ACM Computing Surveys 2023, 56, 1–44.
- Jere, S. Federated Learning in Mobile Edge Computing: An Edge-Learning Perspective for Beyond 5G. Signal Processing (eess.SP) 2020.
- Parra-Ullauri, J.M.; Madhukumar, H.; Nicolaescu, A.C.; Zhang, X.; Bravalheri, A.; Hussain, R.; Vasilakos, X.; Nejabati, R.; Simeonidou, D. kubeFlower: A privacy-preserving framework for Kubernetes-based federated learning in cloud–edge environments. Future Generation Computer Systems 2024, 157, 558–572.
- Kim, J.; Kim, D.; Lee, J. Design and implementation of Kubernetes enabled federated learning platform. 2021 International Conference on Information and Communication Technology Convergence (ICTC). IEEE, 2021, pp. 410–412.
- Pham, K.Q.; Kim, T. Elastic Federated Learning with Kubernetes Vertical Pod Autoscaler for edge computing. Future Generation Computer Systems 2024, 158, 501–515.
- Hansmann, W.; Frank, M. On things to happen during a TCP handover. 28th Annual IEEE International Conference on Local Computer Networks, 2003. LCN’03. Proceedings. IEEE, 2003, pp. 109–118.
- Melin, P.; Monica, J.C.; Sanchez, D.; Castillo, O. Multiple ensemble neural network models with fuzzy response aggregation for predicting COVID-19 time series: the case of Mexico. Healthcare. MDPI, 2020, Vol. 8, p. 181.
- Muñoz-González, L.; Biggio, B.; Demontis, A.; Paudice, A.; Wongrassamee, V.; Lupu, E.C.; Roli, F. Towards poisoning of deep learning algorithms with back-gradient optimization. Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, 2017, pp. 27–38.
- Zhou, X.; Xu, M.; Wu, Y.; Zheng, N. Deep model poisoning attack on federated learning. Future Internet 2021, 13, 73.
- Chabanne, H.; Danger, J.L.; Guiga, L.; Kühne, U. Side channel attacks for architecture extraction of neural networks. CAAI Transactions on Intelligence Technology 2021, 6, 3–16.
- Hu, H.; Salcic, Z.; Sun, L.; Dobbie, G.; Yu, P.S.; Zhang, X. Membership inference attacks on machine learning: A survey. ACM Computing Surveys (CSUR) 2022, 54, 1–37.
- Ma, X.; Zhu, J.; Lin, Z.; Chen, S.; Qin, Y. A state-of-the-art survey on solving non-IID data in Federated Learning. Future Generation Computer Systems 2022, 135, 244–258.
- Qu, Z.; Lin, K.; Li, Z.; Zhou, J. Federated learning’s blessing: FedAvg has linear speedup. ICLR 2021 Workshop on Distributed and Private Machine Learning (DPML), 2021.
- Huang, Z.; Hu, R.; Guo, Y.; Chan-Tin, E.; Gong, Y. DP-ADMM: ADMM-based distributed learning with differential privacy. IEEE Transactions on Information Forensics and Security 2019, 15, 1002–1012.
- Lindell, Y. Secure multiparty computation. Communications of the ACM 2020, 64, 86–96.
- Duan, Q.; Huang, J.; Hu, S.; Deng, R.; Lu, Z.; Yu, S. Combining Federated Learning and Edge Computing Toward Ubiquitous Intelligence in 6G Network: Challenges, Recent Advances, and Future Directions. IEEE Communications Surveys & Tutorials 2023, 25, 2892–2950.
- Beutel, D.J.; Topal, T.; Mathur, A.; Qiu, X.; Fernandez-Marques, J.; Gao, Y.; Sani, L.; Li, K.H.; Parcollet, T.; de Gusmão, P.P.B.; others. Flower: A friendly federated learning framework 2022.
- Shamsian, A.; Navon, A.; Fetaya, E.; Chechik, G. Personalized federated learning using hypernetworks. International Conference on Machine Learning. PMLR, 2021, pp. 9489–9502.
- Wang, J.; Li, Y.; Ye, R.; Li, J. High Precision Method of Federated Learning Based on Cosine Similarity and Differential Privacy. 2022 IEEE International Conferences on Internet of Things (iThings) and IEEE Green Computing & Communications (GreenCom) and IEEE Cyber, Physical & Social Computing (CPSCom) and IEEE Smart Data (SmartData) and IEEE Congress on Cybermatics (Cybermatics); IEEE: Espoo, Finland, 2022; pp. 533–540.
| 1 | Track changes and periodic updates are available in the repository: https://github.com/FabioLiberti/DHFLPL |
| 2 | |
| 3 | |
| 4 | An image dataset with 10 classes, available at https://www.cs.toronto.edu/~kriz/cifar.html |


Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
