Submitted: 02 September 2024
Posted: 04 September 2024
Abstract
Keywords:
1. Introduction
2. Related work
- SL and FL: The reference [5] introduces a personalized SL framework to address issues such as data leakage and non-i.i.d. datasets in decentralized learning. It proposes an optimal cut-layer selection method based on multiplayer bargaining and the Kalai-Smorodinsky bargaining solution (KSBS), which balances training time, energy usage, and data privacy. Each device tailors its client-side model to its non-i.i.d. data, while a common server-side model provides robustness through generalization. Simulation results validate the framework's effectiveness in achieving optimal utility and addressing decentralized learning challenges. However, the work does not address the communication overhead caused by transmitting the forward-propagation results at each local step. The reference [6] provides a convergence analysis for Sequential Split Learning (SSL), a variant of SL in which training is conducted sequentially, one client after another, on heterogeneous data. It compares SSL with Federated Averaging (FedAvg), showing SSL's superiority on extremely heterogeneous data; in practice, however, FedAvg outperforms SSL when the data heterogeneity is mild. Moreover, SSL still suffers from large communication overheads between the server and the clients.
- SplitFed learning: The reference [7] presents AdaSFL, a method designed to improve model training efficiency by controlling the local update frequency and batch size. Its theoretical analysis establishes convergence rates, which enable an adaptive algorithm for adjusting the update frequency and batch sizes of heterogeneous workers. However, clients must obtain back-propagation results from the server at each local update. Meanwhile, [8] recommends updating the client- and server-side models concurrently, using local-loss-based training with auxiliary networks designed specifically for split learning. This parallel training approach effectively reduces latency and eliminates the need for server-to-client communication; the paper also includes a latency analysis for optimal model partitioning and offers guidelines for model splitting. In particular, [4] developed a communication- and storage-efficient SFL approach (illustrated after Algorithm 1 in Section 3) in which each client trains a portion of the model and computes its local loss using an auxiliary network, thereby reducing the communication overhead. Furthermore, the server model is trained on the sequence of forward-propagation results from the clients, so only one copy of the server model is maintained at any given time. The framework of [8] is similar, with the key difference that each client possesses its own separate server model, and these models are aggregated to construct the global server model.
- Auxiliary networks: Neural network training with back-propagation is hindered by the update-locking issue, where layers must wait for signals to propagate completely through the network before updating [9]. To address this, [9] proposed Decoupled Greedy Learning (DGL), a simpler training approach that greedily relaxes the joint training objective and is highly effective for CNNs in large-scale image classification. DGL optimizes the training objective using auxiliary modules or replay buffers to reduce the communication delays caused by waiting for backward propagation. [10] addressed the backward update-locking constraint by introducing models that decouple modules through predictions of future computations within the network graph: these models use local information to predict the outcomes of subgraphs, in particular error gradients. By using synthetic gradients instead of true back-propagated gradients, subgraphs can update independently and asynchronously, realizing decoupled neural interfaces. A similar approach has been adopted for training in SFL by [4,8], which use an auxiliary model in place of the server model; this research shows that an auxiliary model of relatively small dimension compared to the server model performs sufficiently well as a replacement. A minimal sketch of such a local-loss update with an auxiliary head is given after this list.
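The common mechanism in [4,8,9] is a small auxiliary head that maps the cut-layer activations to a local loss, so the client-side block can update without waiting for gradients from the server-side model. The following PyTorch sketch illustrates this idea only; the module names, dimensions, and the `local_step` helper are illustrative assumptions rather than the architectures used in those papers.

```python
# Minimal sketch of local-loss training with an auxiliary head (illustrative only).
import torch
import torch.nn as nn

class ClientBlock(nn.Module):
    """Client-side portion of the full model, up to the cut layer."""
    def __init__(self, in_dim=784, hidden=128):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())

    def forward(self, x):
        return self.body(x)

class AuxiliaryHead(nn.Module):
    """Small auxiliary network that maps cut-layer activations to class logits."""
    def __init__(self, hidden=128, num_classes=10):
        super().__init__()
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, z):
        return self.head(z)

def local_step(client, aux, opt, x, y, loss_fn=nn.CrossEntropyLoss()):
    """One decoupled client update; `opt` optimizes both the client block and its
    auxiliary head, so no server-to-client backward pass is needed."""
    opt.zero_grad()
    z = client(x)              # forward to the cut layer ("smashed data")
    loss = loss_fn(aux(z), y)  # local loss computed through the auxiliary head
    loss.backward()            # gradients stay entirely on the client side
    opt.step()
    return z.detach(), loss.item()  # detached activations would be sent to the server
```

Because the loss is computed locally, the client never waits for the server's backward pass, which is exactly the update-locking relief exploited by DGL and by the SFL schemes of [4,8].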
3. SplitFed Learning Scenario
Algorithm 1: CSE-FSL [4]
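To make the structure of Algorithm 1 concrete, the following is a minimal sketch of one global round in the spirit of CSE-FSL, based on the description in Section 2: each client updates its own block with the auxiliary local loss, the single server-side model is then updated sequentially on the received cut-layer activations, and the client-side models are periodically aggregated. The `fedavg` and `global_round` helpers (and the reuse of `local_step` from the sketch in Section 2) are illustrative assumptions, not the exact procedure of [4].

```python
# Illustrative sketch of one global round in the spirit of CSE-FSL (not the exact
# Algorithm 1 of [4]): clients update locally via auxiliary losses, the single
# server-side model is updated sequentially on the received activations, and the
# client-side models are periodically averaged.
import copy
import torch

def fedavg(models):
    """Average the parameters of the client-side models (FedAvg-style aggregation)."""
    avg = copy.deepcopy(models[0].state_dict())
    for key in avg:
        avg[key] = torch.stack([m.state_dict()[key].float() for m in models]).mean(dim=0)
    return avg

def global_round(clients, aux_heads, client_opts, server_model, server_opt,
                 batches, labels, loss_fn=torch.nn.CrossEntropyLoss()):
    smashed = []
    # 1) Each participating client performs its local update using the auxiliary
    #    local loss (local_step from the sketch in Section 2) and records the
    #    detached cut-layer activations it transmits to the server.
    for k, (client, aux, opt) in enumerate(zip(clients, aux_heads, client_opts)):
        z, _ = local_step(client, aux, opt, batches[k], labels[k])
        smashed.append((z, labels[k]))
    # 2) The single copy of the server-side model is updated sequentially on the
    #    received activations. (The paper's parameter l controls how often this
    #    update occurs; it is shown every round here for simplicity.)
    for z, y in smashed:
        server_opt.zero_grad()
        loss = loss_fn(server_model(z), y)
        loss.backward()
        server_opt.step()
    # 3) The client-side models are aggregated and redistributed.
    avg_state = fedavg(clients)
    for client in clients:
        client.load_state_dict(avg_state)
```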
4. Convergence rate analysis
4.1. Client-Side Model Convergence
4.2. Server-Side Model Convergence
5. Discussion and Conclusions
5.1. Summary of Contributions
- Convergence Analysis: We formulated the CSE-FSL algorithm developed by [4] and conducted a comprehensive convergence rate analysis under both full and partial client participation, for non-i.i.d. datasets and non-convex loss functions. The convergence guarantees are derived under assumptions that are standard in FL convergence analysis, namely L-smoothness of the objective functions, unbiased gradient estimators, and bounded gradient variance (stated generically after this list).
- Key Results:
- Client-Side Model: We demonstrated that, under full client participation, the client-side model converges with a rate of . This result highlights the effectiveness of the algorithm in achieving linear convergence rates while accommodating the constraints of the federated setting and the sequential update of the server model. An increase in l leads to a longer convergence time, which is expected, since a larger l means the server model is updated only after more global rounds.
- Server-Side Model: For the server-side model, we established convergence rates of under both full and partial client participation. This result underscores the robustness of the algorithm in ensuring effective learning even when only a subset of clients participates. It also shows that, in contrast to standard FL settings, the number of clients and their local steps do not speed up the convergence of the server-side model.
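For concreteness, the assumptions referenced above can be stated generically as follows; the notation is standard and the exact constants may differ from those used in the appendix.

```latex
% Standard assumptions for the convergence analysis (generic statement).
\begin{itemize}
  \item \textbf{$L$-smoothness:} each objective $F_k$ satisfies
        $\|\nabla F_k(x) - \nabla F_k(y)\| \le L\,\|x - y\|$ for all $x, y$.
  \item \textbf{Unbiased gradient estimators:} for a mini-batch $\xi$ sampled by client $k$,
        $\mathbb{E}_{\xi}\big[\nabla F_k(x;\xi)\big] = \nabla F_k(x)$.
  \item \textbf{Bounded gradient variance:}
        $\mathbb{E}_{\xi}\big[\|\nabla F_k(x;\xi) - \nabla F_k(x)\|^2\big] \le \sigma^2$.
\end{itemize}
```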
5.2. Implications
Appendix A. Proofs
Appendix A.1. Client-Side Model Convergence
Appendix A.2. Server-Side Model Convergence
References
- McMahan, H.B.; Moore, E.; Ramage, D.; Hampson, S.; y Arcas, B.A. Communication-Efficient Learning of Deep Networks from Decentralized Data. arXiv preprint 2023, arXiv:1602.05629.
- Thapa, C.; Chamikara, M.A.P.; Camtepe, S. SplitFed: When Federated Learning Meets Split Learning. arXiv preprint 2020, arXiv:2004.12088.
- Gupta, O.; Raskar, R. Distributed learning of deep neural network over multiple agents. Journal of Network and Computer Applications 2018, 116, 1–8.
- Mu, Y.; Shen, C. Communication and Storage Efficient Federated Split Learning. arXiv preprint 2023, arXiv:2302.05599.
- Kim, M.; DeRieux, A.; Saad, W. A bargaining game for personalized, energy efficient split learning over wireless networks. 2023 IEEE Wireless Communications and Networking Conference (WCNC); IEEE, 2023; pp. 1–6.
- Li, Y.; Lyu, X. Convergence Analysis of Sequential Split Learning on Heterogeneous Data. arXiv preprint 2023, arXiv:2302.01633.
- Liao, Y.; Xu, Y.; Xu, H.; Yao, Z.; Wang, L.; Qiao, C. Accelerating federated learning with data and model parallelism in edge computing. IEEE/ACM Transactions on Networking 2023.
- Han, D.J.; Bhatti, H.I.; Lee, J.; Moon, J. Accelerating federated learning with split learning on locally generated losses. ICML 2021 Workshop on Federated Learning for User Privacy and Data Confidentiality; ICML Board, 2021.
- Belilovsky, E.; Eickenberg, M.; Oyallon, E. Decoupled greedy learning of CNNs. International Conference on Machine Learning; PMLR, 2020; pp. 736–745.
- Jaderberg, M.; Czarnecki, W.M.; Osindero, S.; Vinyals, O.; Graves, A.; Silver, D.; Kavukcuoglu, K. Decoupled neural interfaces using synthetic gradients. International Conference on Machine Learning; PMLR, 2017; pp. 1627–1635.
- Ghadimi, S.; Lan, G. Stochastic first- and zeroth-order methods for nonconvex stochastic programming. SIAM Journal on Optimization 2013, 23, 2341–2368.
- Kairouz, P.; McMahan, H.B.; Avent, B.; Bellet, A.; Bennis, M.; Bhagoji, A.N.; Bonawitz, K.; Charles, Z.; Cormode, G.; Cummings, R.; et al. Advances and open problems in federated learning. Foundations and Trends® in Machine Learning 2021, 14, 1–210.
- Reddi, S.; Charles, Z.; Zaheer, M.; Garrett, Z.; Rush, K.; Konečný, J.; Kumar, S.; McMahan, H.B. Adaptive federated optimization. arXiv preprint 2020, arXiv:2003.00295.
- Reisizadeh, A.; Mokhtari, A.; Hassani, H.; Jadbabaie, A.; Pedarsani, R. FedPAQ: A communication-efficient federated learning method with periodic averaging and quantization. International Conference on Artificial Intelligence and Statistics; PMLR, 2020; pp. 2021–2031.
- Yang, H.; Fang, M.; Liu, J. Achieving linear speedup with partial worker participation in non-IID federated learning. arXiv preprint 2021, arXiv:2101.11203.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).