Preprint Article (not peer-reviewed)

A Privacy-Enhanced Multi-Stage Dimensionality Reduction Vertical Federated Clustering Framework

A peer-reviewed version of this preprint was published in Electronics 2025, 14(16), 3182. https://doi.org/10.3390/electronics14163182

Submitted: 08 July 2025; Posted: 09 July 2025


Abstract
Federated clustering aims to discover latent knowledge in multi-source distributed data through clustering algorithms while preserving data privacy. Federated learning is categorized into horizontal and vertical federated learning according to how the data are partitioned. Horizontal federated learning applies to scenarios where parties share overlapping feature spaces but hold different sample IDs, whereas vertical federated learning facilitates cross-institutional feature complementarity and is particularly suited to scenarios with highly overlapping sample IDs but significantly divergent features. As a classic clustering algorithm, k-means has seen extensive improvement and application in horizontal federated learning, but its application in vertical federated learning remains insufficiently explored, with room for enhancement in privacy protection and communication efficiency. Moreover, client feature imbalance may bias the clustering results. To improve communication efficiency, this paper introduces Product Quantization (PQ) to compress high-dimensional data into low-dimensional codes by generating local codebooks. Because PQ is built on k-means, local training preserves the data structure while avoiding the privacy risk of traditional PQ pipelines, which require server-side data reconstruction and may thereby leak the data distribution. To enhance privacy without compromising performance, Multidimensional Scaling (MDS) maps the codebook cluster centers to distance-preserving indices, and only these indices are uploaded to the server, eliminating the need for data reconstruction. The server executes k-means on the indices to maximize intra-group similarity and inter-group divergence. The original codebooks remain local, providing strict privacy protection. The nested application of PQ and MDS significantly reduces communication volume and frequency while effectively alleviating clustering bias caused by imbalanced client feature dimensions. Validation on the MNIST dataset confirms that the approach maintains k-means clustering performance while meeting the privacy and efficiency requirements of federated learning.

1. Introduction

Federated Learning (FL) is a distributed machine learning approach whose core concept involves multiple clients (e.g., mobile devices or enterprises) collaboratively training a shared global model without sharing local data [1]. Federated clustering integrates federated learning with clustering algorithms to mine latent knowledge from multi-source distributed data while ensuring data privacy [2]. Federated learning faces multiple challenges, including data heterogeneity [3,4], communication bottlenecks [1,5], and privacy protection [6].
Federated learning is categorized into horizontal federated learning (HFL) [7] and vertical federated learning (VFL) [8] based on data distribution patterns. In HFL, the participating parties share identical feature spaces but possess distinct sample IDs [9]. In VFL, different clients own distinct feature sets while sharing identical sample IDs. HFL finds widespread application in edge computing and IoT collaboration [10,11]. The value of VFL lies in cross-institutional feature complementarity, making it particularly suitable for scenarios with highly overlapping sample IDs yet significantly divergent features, such as healthcare [12] and finance [13].
Current research on federated clustering algorithms predominantly focuses on the horizontal federated domain, with relatively limited exploration of vertical federated settings. Compared to horizontal federated learning, vertical federated learning relies more heavily on high-frequency encrypted interactions and high-dimensional feature transmission [14], imposing greater demands on communication. At the same time, vertical federated learning requires multi-party collaboration to compute loss functions and gradients. Because features are dispersed across different participants, it must rely on Secure Multi-Party Computation (MPC), Homomorphic Encryption (HE), and related technologies to achieve joint training under privacy protection, all of which incur substantial computational costs [15]. In 2023, Zitao Li et al. proposed a differentially private k-means clustering algorithm based on the Flajolet-Martin (FM) technique [16]. By aggregating differentially private cluster centers and membership information of local data on an untrusted central server, they constructed a weighted grid as a summary of the global dataset and generated global centers by running k-means on it. Subsequently, in [17], they introduced FM sketches to encode local data and estimate cross-party marginal distributions under differential privacy constraints, constructing a global Markov Random Field (MRF) model to generate high-quality synthetic data. Mazzone et al. [18] proposed a vertical federated k-means algorithm based on homomorphic encryption and differential privacy, demonstrating superior clustering performance (e.g., k-means loss and clustering accuracy) over traditional privacy-preserving k-means algorithms at the same privacy level. Li et al. [19] proposed a vertical federated density peaks clustering algorithm based on a hybrid encryption framework; building on a merged distance matrix, they introduced a more effective clustering method under nonlinear mapping, improving Density Peaks Clustering (DPC) performance while addressing privacy protection in VFL.
This study addresses the privacy protection issues of K-Means clustering in vertical federated learning scenarios by proposing a federated clustering framework based on lightweight multi-stage dimensionality reduction. It primarily resolves three major challenges: privacy constraints, communication bottlenecks, and feature imbalance problems.
Our main contributions are:
1) We propose a multi-stage dimensionality reduction framework for vertical federated clustering based on Product Quantization (PQ) and Multidimensional Scaling (MDS). Locally, feature compression and codebook generation effectively reduce the data volume, while the quantization inherent in PQ acts as a form of noise injection that enhances privacy. We further introduce a one-dimensional MDS embedding that maps the codebook cluster centers to distance-preserving indices. This achieves zero raw-codebook upload, abandons server-side data reconstruction, and fundamentally eliminates the risk of leaking the data distribution.
2) The multi-stage dimensionality reduction mechanism significantly reduces the transmitted data volume and communication frequency while preserving clustering accuracy, thereby improving communication efficiency.
3) The combination of PQ dimensionality reduction and MDS embedding mitigates the clustering bias caused by feature imbalance in vertical federated learning.
4) Extensive experiments on the MNIST dataset validate that the algorithm satisfies federated learning privacy requirements while preserving clustering accuracy.
The structure of our paper is arranged as follows: Section II discusses existing clustering algorithms, challenges in VFL, and theoretical analyses of various dimensionality reduction techniques; Section III details our algorithmic design; Section IV presents the experimental process and results analysis conducted to validate our algorithm; Finally, Section V concludes the paper.

3. Method

This research comprehensively addresses factors such as feature imbalance, communication efficiency, and privacy preservation, designing a multi-stage dimensionality reduction clustering framework for vertical federated learning scenarios.
This framework is implemented based on PQ quantization and MDS techniques. After privacy-preserving sample alignment across the clients, product quantization replaces traditional local model training. Algorithmically, PQ is built on k-means clustering and compresses high-dimensional features into low-dimensional codes, significantly reducing communication overhead. Regarding the privacy protection design: directly uploading codebooks (cluster center sets) risks exposing the original data distributions. We observe that clustering effectiveness depends mainly on the relative distances between cluster centers. We therefore employ a one-dimensional MDS embedding $f: \mathbb{R}^{n} \to \mathbb{R}$, $f(x_i) = z_i$, ensuring
$$ |z_i - z_j| \approx \| x_i - x_j \|, \quad \forall i, j. $$
The server executes a one-shot clustering algorithm on the distance-preserving indices to obtain the clustering structure, maximizing intra-group similarity and inter-group divergence.
This approach keeps the raw codebooks private on the clients while the indices preserve the distance relationships essential for clustering, ensuring algorithmic performance and further improving communication efficiency.
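The following minimal sketch (not the authors' implementation; scikit-learn and all variable names here are assumptions) illustrates the idea: the k-means centers of one PQ subspace are embedded into one dimension with MDS, and the pairwise distances before and after embedding are compared.

   # Minimal sketch: 1-D MDS embedding of one subspace codebook (illustrative only).
   import numpy as np
   from sklearn.cluster import KMeans
   from sklearn.manifold import MDS

   rng = np.random.default_rng(0)
   subspace_data = rng.normal(size=(1000, 4))    # one PQ subspace, sub_dim = 4 (assumed)
   sub_k = 10                                    # cluster centers per subspace

   # PQ trains k-means inside the subspace; the centers form the local codebook.
   centers = KMeans(n_clusters=sub_k, n_init=10, random_state=0).fit(subspace_data).cluster_centers_

   # One-dimensional MDS: find z_i such that |z_i - z_j| ~ ||x_i - x_j||.
   z = MDS(n_components=1, dissimilarity="euclidean", random_state=0).fit_transform(centers).ravel()

   # Check how well pairwise distances are preserved by the scalar indices.
   iu = np.triu_indices(sub_k, 1)
   orig_d = np.linalg.norm(centers[:, None] - centers[None, :], axis=-1)[iu]
   emb_d = np.abs(z[:, None] - z[None, :])[iu]
   print("distance correlation:", np.corrcoef(orig_d, emb_d)[0, 1])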
Multi-stage dimensionality reduction is adopted instead of direct dimensionality reduction partly because the feature counts differ significantly across clients; under such imbalance, direct reduction would bias the clustering results toward certain features.
Figure 1 shows the process of the algorithm.

3.1. Algorithm Design

Suppose there is a federated learning framework with m clients. The global dataset is D with global dimension dim, distributed across the clients. $D_g$ denotes the dataset of the g-th client, whose data points are $dim_g$-dimensional vectors, with $\sum_{g=1}^{m} dim_g = dim$. Define sub_dim as the subspace dimensionality in PQ quantization and sub_k as the number of cluster centers per subspace. $codebook_g$ represents the codebook of the g-th client, $code_g^i$ denotes the cluster center set of the i-th subspace of the g-th client, $B_g$ is the data index set of the g-th client, and B is the global data index set.
Figure 1. Process of the proposed algorithm.
The algorithm mainly consists of the following steps:
  • Encrypted entity alignment:
    Select common samples: extract the common samples from each party's dataset.
  • Local initialization and training of the PQ quantizer:
    1) Pad dimensions: determine whether the local dimension dim is divisible by sub_dim; if it is not, pad the vectors.
    First, compute the padded dimension and the number of dimensions p to pad:
    $dim' = \lceil dim / sub\_dim \rceil \times sub\_dim$
    $p = dim' - dim$
    Then pad the original vector by appending p zeros to its end:
    $v' = [v_1, v_2, \ldots, v_{dim}, \underbrace{0, 0, \ldots, 0}_{p}]$
    2) Generate the subspace codebooks from the training data.
  • Secondary mapping:
    Apply the MDS one-dimensional embedding to the cluster centers of each subspace codebook, then use the normalized mapped values as the codebook indices.
  • Data transmission:
    The codebooks are stored locally; only the indices are transferred to the server.
  • Server-side global cluster center aggregation:
    The server executes the k-means algorithm on the indices uploaded by the clients to obtain abstract global cluster centers.
    To use the abstract cluster centers, each client applies the same mapping to its local data and computes distances to the global abstract centers to determine the true cluster assignments.
    The algorithm requires only one round of communication. (A client-side code sketch of these local steps is given after Table 1.)
Table 1. Symbol explanation.
Symbol       Description
D            Global dataset
sub_dim      Subspace data dimensionality in PQ quantization
sub_k        Number of cluster centers per subspace
D_g          Dataset of the g-th client
codebook_g   Codebook of the g-th client
code_g^i     Cluster center set of the i-th subspace of the g-th client
B_g          Data index set of the g-th client
B            Global data index set
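As referenced above, the following hedged client-side sketch covers the local steps (zero-padding, per-subspace codebook training, one-dimensional MDS embedding, normalization, and index encoding). It substitutes scikit-learn k-means for a dedicated PQ library, and the function and variable names are illustrative rather than the authors' implementation.

   # Client-side sketch: pad -> per-subspace codebooks -> 1-D MDS -> normalized
   # indices -> encode. Only the returned indices would leave the client.
   import numpy as np
   from sklearn.cluster import KMeans
   from sklearn.manifold import MDS

   def client_encode(data, sub_dim, sub_k, seed=0):
       n, dim = data.shape
       p = (-dim) % sub_dim                           # zeros needed so sub_dim divides dim'
       padded = np.hstack([data, np.zeros((n, p))])
       n_sub = padded.shape[1] // sub_dim

       codes = np.empty((n, n_sub))
       for i in range(n_sub):
           block = padded[:, i * sub_dim:(i + 1) * sub_dim]
           km = KMeans(n_clusters=sub_k, n_init=10, random_state=seed).fit(block)
           # 1-D MDS embedding of this sub-codebook, then min-max normalization.
           z = MDS(n_components=1, dissimilarity="euclidean",
                   random_state=seed).fit_transform(km.cluster_centers_).ravel()
           z = (z - z.min()) / (z.max() - z.min() + 1e-12)
           # Replace each sample's nearest-center id with its normalized index.
           codes[:, i] = z[km.labels_]
       return codes                                   # the codebooks themselves stay local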
Algorithm Pseudocode:
Algorithm 3: Proposed federated clustering algorithm
Input: Distributed dataset $D = \{D_1, D_2, \ldots, D_m\}$, where $D_g$ is the data of the g-th client; number of clusters k; maximum iterations T.
Output: Global cluster centers $C = \{c_1, c_2, \ldots, c_k\}$.
Steps:
1: RSA Private Set Intersection
   1) The server generates an RSA key pair and broadcasts the public key.
   2) Each client blinds its own elements.
   3) The server signs the blinded elements.
   4) Clients unblind the received signatures.
   5) Clients send the unblinded signatures to the server.
   6) The server computes the intersection of signatures and sends it to the clients.
   7) Clients map the signatures back to the original elements.
2: Model Training
   1) The server distributes the parameters: sub_dim (subspace dimensionality) and sub_k (number of cluster centers per subspace).
   2) At the local client (g-th client):
      If dim is not divisible by sub_dim, pad the dimensions with zeros.
      codebook = pq.train(sub_dim, sub_k)
      for code in codebook:
         MDS dimensionality reduction: code = mds(code)
         Normalization: code = normalize(code)
         Add to code_list: code_list.append(code)
      Convert the data to code indices: data_codes = pq.encode(data, code_list)
      Upload the encrypted data indices to the server.
   3) Server aggregation:
      Collect the data indices from all clients.
      Initialize the global index set $B = \emptyset$.
      for client in client_list: column-wise merge $B = B \cup B_g$
      On the abstract index set B:
         Initialize cluster centers by randomly selecting k data points as the initial global cluster centers.
         Run k-means on the sample set with the initialized global centers to obtain the final cluster centers $C = \{c_1, c_2, \ldots, c_k\}$.
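A hedged sketch of the server-side aggregation in step 3), assuming each client has produced an index matrix as in the client-side sketch after Table 1; the column-wise merge relies on the PSI step having aligned the samples across clients.

   # Server-side sketch: column-wise merge of the uploaded index matrices into the
   # abstract index set B, then a single k-means run on B (illustrative only).
   import numpy as np
   from sklearn.cluster import KMeans

   def server_cluster(client_index_list, k, max_iter=300, seed=0):
       B = np.hstack(client_index_list)               # samples aligned by the PSI step
       km = KMeans(n_clusters=k, max_iter=max_iter, n_init=10,
                   random_state=seed).fit(B)
       return km.cluster_centers_, km.labels_         # abstract global centers and assignments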

3.2. Privacy Enhancement

The privacy protection of this algorithm derives from two aspects. First, PQ quantization reduces the dimensionality of and transforms the raw data, preventing its direct upload to the server. Second, the PQ codebooks are retained locally, avoiding leakage of the original data distributions.
Additionally, to defend against attacks such as membership inference, differential privacy noise can be incorporated. Differential privacy provides a rigorous framework ensuring that analytical results never reveal individual information: by adding noise to the data or to intermediate results, an attacker cannot determine whether any individual is present in the dataset. According to the sequential and parallel composition theorems of differential privacy [40]:
  • Sequential composition: For a given dataset D, assume there exist randomized algorithms $M_1, M_2, \ldots, M_n$ with privacy budgets $\epsilon_1, \epsilon_2, \ldots, \epsilon_n$, respectively. The composed algorithm $M(M_1(D), M_2(D), \ldots, M_n(D))$ provides $(\sum_{i=1}^{n} \epsilon_i)$-DP protection. That is, for the same dataset, applying a series of differentially private algorithms sequentially provides protection equivalent to the sum of their privacy budgets.
  • Parallel composition: For disjoint datasets $D_1, D_2, \ldots, D_n$, assume there exist randomized algorithms $M_1, M_2, \ldots, M_n$ with privacy budgets $\epsilon_1, \epsilon_2, \ldots, \epsilon_n$, respectively. The composed algorithm $M(M_1(D_1), M_2(D_2), \ldots, M_n(D_n))$ provides $(\max_i \epsilon_i)$-DP protection. That is, for disjoint datasets, applying different differentially private algorithms in parallel provides protection equivalent to the maximum privacy budget among the composed algorithms.
Adding differential privacy noise to raw data or codebooks can effectively enhance privacy protection.
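As an illustration only (the evaluated algorithm does not add noise), a Laplace mechanism applied to a local codebook before any release could look as follows; the epsilon and sensitivity values are placeholders that would need calibration for a concrete deployment.

   # Illustrative Laplace mechanism on a codebook; epsilon and sensitivity are
   # placeholders, not values used in the paper's experiments.
   import numpy as np

   def perturb_codebook(codebook, epsilon, sensitivity=1.0, seed=0):
       rng = np.random.default_rng(seed)
       scale = sensitivity / epsilon                  # Laplace scale b = sensitivity / epsilon
       return codebook + rng.laplace(0.0, scale, size=codebook.shape)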

4. Experiments and Results

4.1. Experimental Settings

4.1.1. Dataset

The MNIST dataset is adopted. Its feature columns are split according to the client feature ratios, and the resulting subsets are distributed to the different clients.
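A minimal sketch of this vertical split (the data loader and slicing are assumptions about the setup, shown here for two clients at a 1:6 feature ratio):

   # Vertical (column-wise) split of MNIST between two clients at a 1:6 ratio.
   from sklearn.datasets import fetch_openml

   X, y = fetch_openml("mnist_784", version=1, return_X_y=True, as_frame=False)
   ratio = (1, 6)
   split = X.shape[1] * ratio[0] // sum(ratio)        # 784 * 1 // 7 = 112 columns
   client_parts = [X[:, :split], X[:, split:]]        # same sample IDs, disjoint features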

4.1.2. Parameter Settings

  • Total number of clients: 2
  • Total number of client features: 784
  • Client feature ratios: 1:1, 1:6, 1:13
  • PQ quantization subspace dimensions: 1, 2, 4, 8
  • Number of PQ quantization cluster centers per subspace: 10, 64, 128, 256

4.1.3. Evaluation Metrics

The evaluation employs the Normalized Mutual Information (NMI) and Adjusted Rand Index (ARI) metrics.
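Both metrics compare the predicted cluster assignments with the ground-truth digit labels; with scikit-learn (the labels below are placeholders) they are computed as:

   # NMI and ARI as used for evaluation (placeholder labels).
   from sklearn.metrics import normalized_mutual_info_score, adjusted_rand_score

   true_labels = [0, 0, 1, 1, 2, 2]
   predicted_labels = [1, 1, 0, 0, 2, 2]
   print("NMI:", normalized_mutual_info_score(true_labels, predicted_labels))
   print("ARI:", adjusted_rand_score(true_labels, predicted_labels))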

4.2. Performance Analysis

Centralized data clustering serves as the baseline for validation.

4.2.1. Comparison Between the Proposed Method and Centralized k-Means

This experiment involves two clients, each with 392 data dimensions (a 1:1 ratio). As shown in Table 2, the proposed algorithm achieves slightly higher average NMI and ARI values than centralized k-means under various subspace dimensions and cluster center counts, indicating effective extraction of the original data features. When the subspace dimension is 1, the scheme is equivalent to performing k-means on each data column and replacing values with cluster centers, so the PQ loss is minimized; the data confirm superior performance in this scenario. Performance also improves with fewer cluster centers per subspace, partly because the sparse MNIST dataset can be represented effectively by a small number of clusters.
Table 2. Algorithm performance comparison. The first two columns give the PQ subspace dimension and the number of cluster centers per subspace; the centralized result does not depend on these parameters and is listed once.

Subspace  Centers   Proposed Method      Without MDS,            Without MDS,            Centralized
Dim.      per Sub.  NMI / ARI            Indices at Server       Restored Data           Algorithm
                                         NMI / ARI               NMI / ARI               NMI / ARI
1         10        0.53403 / 0.42542    0.50209 / 0.41110       0.51871 / 0.40490       0.49581 / 0.36387
2         10        0.49353 / 0.37211    0.47981 / 0.38815       0.49406 / 0.38168
4         10        0.51018 / 0.38628    0.44081 / 0.33740       0.52128 / 0.40873
8         10        0.52476 / 0.43289    0.42186 / 0.31696       0.52207 / 0.40520
1         64        0.49707 / 0.36712    0.48437 / 0.37222       0.51585 / 0.40136
2         64        0.53773 / 0.42763    0.44766 / 0.33290       0.49617 / 0.36369
4         64        0.50124 / 0.37786    0.42137 / 0.32132       0.49500 / 0.38318
8         64        0.48329 / 0.39052    0.41128 / 0.34143       0.49390 / 0.36302
1         128       0.48375 / 0.36266    0.50666 / 0.41889       0.49081 / 0.36068
2         128       0.49855 / 0.38596    0.46257 / 0.35915       0.49043 / 0.36039
4         128       0.48026 / 0.35906    0.42605 / 0.34344       0.49403 / 0.38208
8         128       0.47704 / 0.37444    0.40413 / 0.31093       0.48159 / 0.37014
1         256       0.52029 / 0.40768    0.49181 / 0.40905       0.49049 / 0.36069
2         256       0.48873 / 0.36059    0.49902 / 0.39875       0.48150 / 0.35965
4         256       0.49801 / 0.39486    0.43289 / 0.35194       0.49628 / 0.36436
8         256       0.46309 / 0.36877    0.41015 / 0.34877       0.48375 / 0.36400

4.2.2. Impact of MDS on Algorithm Performance

To evaluate the impact of MDS, we compare against two alternative implementations without MDS: one clusters the codebook indices at the server, and the other reconstructs the data via PQ at the server. In terms of privacy: direct index clustering > proposed method > data reconstruction.
As Table 2 shows, in terms of performance: proposed method ≥ data reconstruction ≥ direct index clustering. When the subspace dimension is 1, direct index clustering performs comparably to the other methods; however, its performance declines sharply as the PQ subspace dimension increases, indicating that it captures the distance structure of the data insufficiently. The proposed method captures the distance structure better, yielding slightly superior performance to data reconstruction.

4.2.3. Impact of Client Feature Quantity on Performance

Adjusting the feature ratios among clients (1:1, 1:6, 1:13) shows that algorithm performance remains stable within a narrow range, indicating that the client feature ratio does not materially affect performance.
Table 3. Experimental results: impact of client feature quantity on performance.

Clients   Feature Ratio   Subspace Dim.   Centers per Subspace   NMI       ARI
2         1:1             1               10                     0.53403   0.42542
2         1:1             2               10                     0.49353   0.37211
2         1:1             4               10                     0.51018   0.38628
2         1:1             8               10                     0.52476   0.43289
2         1:6             1               10                     0.51950   0.40631
2         1:6             2               10                     0.51160   0.39005
2         1:6             4               10                     0.51162   0.38794
2         1:6             8               10                     0.52514   0.42123
2         1:13            1               10                     0.49672   0.36139
2         1:13            2               10                     0.49207   0.36924
2         1:13            4               10                     0.51211   0.38790
2         1:13            8               10                     0.53030   0.44559

5. Conclusion

Based on the characteristics of clustering algorithms, this study transforms the core tension of federated clustering, namely data utility versus privacy preservation versus communication efficiency, into a verifiable distance-preserving optimization problem, providing a secure clustering implementation framework for vertical federated learning. The proposed algorithm addresses the communication challenges of vertical federated learning while preserving privacy.
Addressing the three core challenges of k-means clustering in vertical federated learning, namely inadequate privacy protection, excessive communication overhead, and feature dimension imbalance, this paper proposes a multi-stage dimensionality reduction federated clustering framework integrating Product Quantization (PQ) and Multidimensional Scaling (MDS). By compressing the original high-dimensional features into low-dimensional codes via PQ and further reducing dimensionality through a one-dimensional MDS embedding of the codebooks, communication efficiency is significantly enhanced. Privacy protection is achieved by 1) lowering data precision and transmission volume through dimensionality reduction, and 2) mapping the sensitive codebooks to secure indices via the MDS embedding, so that only distance-preserving indices are transmitted to the server. The original codebooks and feature data remain exclusively on the local clients. Experimental results demonstrate that the algorithm delivers stable clustering performance while improving communication efficiency and privacy. The multi-stage reduction framework does not degrade performance and even slightly outperforms the centralized algorithm.
Beyond clustering, this multi-stage dimensionality reduction framework can extend to other machine learning algorithms in vertical federated settings. Future work may explore optimizing and integrating diverse privacy mechanisms, such as combining Secure Multi-Party Computation (SMPC) with homomorphic encryption or implementing multi-tiered privacy policies.

References

  1. McMahan, B.; Moore, E.; Ramage, D.; Hampson, S.; y Arcas, B.A. Communication-efficient learning of deep networks from decentralized data. In Proceedings of the Artificial intelligence and statistics. PMLR, 2017, pp. 1273–1282.
  2. Stallmann, M.; Wilbik, A. Towards Federated Clustering: A Federated Fuzzy-Means Algorithm (FFCM). arXiv preprint arXiv:2201.07316 2022.
  3. Chen, M.; Shlezinger, N.; Poor, H.V.; Eldar, Y.C.; Cui, S. Communication-efficient federated learning. Proceedings of the National Academy of Sciences 2021, 118, e2024789118. [CrossRef]
  4. Zhu, H.; Xu, J.; Liu, S.; Jin, Y. Federated learning on non-IID data: A survey. Neurocomputing 2021, 465, 371–390. [CrossRef]
  5. Zhou, X.; Yang, G. Communication-efficient and privacy-preserving large-scale federated learning counteracting heterogeneity. Information Sciences 2024, 661, 120167. [CrossRef]
  6. Mohammadi, N.; Bai, J.; Fan, Q.; Song, Y.; Yi, Y.; Liu, L. Differential privacy meets federated learning under communication constraints. IEEE Internet of Things Journal 2021, 9, 22204–22219. [CrossRef]
  7. Kairouz, P.; McMahan, H.B.; Avent, B.; Bellet, A.; Bennis, M.; Bhagoji, A.N.; Bonawitz, K.; Charles, Z.; Cormode, G.; Cummings, R.; et al. Advances and open problems in federated learning. Foundations and trends® in machine learning 2021, 14, 1–210. [CrossRef]
  8. Wei, K.; Li, J.; Ma, C.; Ding, M.; Wei, S.; Wu, F.; Chen, G.; Ranbaduge, T. Vertical federated learning: Challenges, methodologies and experiments. arXiv preprint arXiv:2202.04309 2022.
  9. Yang, Q.; Liu, Y.; Chen, T.; Tong, Y. Federated machine learning: Concept and applications. ACM Transactions on Intelligent Systems and Technology (TIST) 2019, 10, 1–19. [CrossRef]
  10. Li, J.; Wei, H.; Liu, J.; Liu, W. FSLEdge: An energy-aware edge intelligence framework based on Federated Split Learning for Industrial Internet of Things. Expert Systems with Applications 2024, 255, 124564. [CrossRef]
  11. Khan, L.U.; Pandey, S.R.; Tran, N.H.; Saad, W.; Han, Z.; Nguyen, M.N.; Hong, C.S. Federated learning for edge networks: Resource optimization and incentive mechanism. IEEE Communications Magazine 2020, 58, 88–93. [CrossRef]
  12. Rieke, N.; Hancox, J.; Li, W.; Milletari, F.; Roth, H.R.; Albarqouni, S.; Bakas, S.; Galtier, M.N.; Landman, B.A.; Maier-Hein, K.; et al. The future of digital health with federated learning. NPJ digital medicine 2020, 3, 119. [CrossRef] [PubMed]
  13. Wu, Z.; Hou, J.; He, B. Vertibench: Advancing feature distribution diversity in vertical federated learning benchmarks. arXiv preprint arXiv:2307.02040 2023.
  14. Khan, A.; ten Thij, M.; Wilbik, A. Communication-efficient vertical federated learning. Algorithms 2022, 15, 273. [CrossRef]
  15. Cheng, K.; Fan, T.; Jin, Y.; Liu, Y.; Chen, T.; Papadopoulos, D.; Yang, Q. Secureboost: A lossless federated learning framework. IEEE intelligent systems 2021, 36, 87–98. [CrossRef]
  16. Li, Z.; Wang, T.; Li, N. Differentially private vertical federated clustering. arXiv preprint arXiv:2208.01700 2022.
  17. Zhao, F.; Li, Z.; Ren, X.; Ding, B.; Yang, S.; Li, Y. VertiMRF: Differentially Private Vertical Federated Data Synthesis. In Proceedings of the Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2024, pp. 4431–4442.
  18. Mazzone, F.; Brown, T.; Kerschbaum, F.; Wilson, K.H.; Everts, M.; Hahn, F.; Peter, A. Privacy-Preserving Vertical K-Means Clustering. arXiv preprint arXiv:2504.07578 2025.
  19. Li, C.; Ding, S.; Xu, X.; Guo, L.; Ding, L.; Wu, X. Vertical Federated Density Peaks Clustering under Nonlinear Mapping. IEEE Transactions on Knowledge and Data Engineering 2024. [CrossRef]
  20. Ahmed, M.; Seraj, R.; Islam, S.M.S. The k-means algorithm: A comprehensive survey and performance evaluation. Electronics 2020, 9, 1295. [CrossRef]
  21. Han, J.; Kamber, M. Data Mining: Concepts and Techniques; Morgan Kaufmann, 2006.
  22. Mary, S.S.; Selvi, T. A study of K-means and cure clustering algorithms. Int. J. Eng. Res. Technol 2014, 3, 1985–1987.
  23. GAO, Y.; XIE, Y.; DENG, H.; ZHU, Z.; ZHANG, Y. A Privacy-preserving Data Alignment Framework for Vertical Federated Learning. J. Electron. Inf. Technol. 2024, 46, 3419–3427.
  24. Yang, L.; Chai, D.; Zhang, J.; Jin, Y.; Wang, L.; Liu, H.; Tian, H.; Xu, Q.; Chen, K. A survey on vertical federated learning: From a layered perspective. arXiv preprint arXiv:2304.01829 2023.
  25. Liu, Y.; Kang, Y.; Zou, T.; Pu, Y.; He, Y.; Ye, X.; Ouyang, Y.; Zhang, Y.Q.; Yang, Q. Vertical federated learning: Concepts, advances, and challenges. IEEE Transactions on Knowledge and Data Engineering 2024, 36, 3615–3634. [CrossRef]
  26. Zhao, Z.; Mao, Y.; Liu, Y.; Song, L.; Ouyang, Y.; Chen, X.; Ding, W. Towards efficient communications in federated learning: A contemporary survey. Journal of the Franklin Institute 2023, 360, 8669–8703. [CrossRef]
  27. Yang, H.; Liu, H.; Yuan, X.; Wu, K.; Ni, W.; Zhang, J.A.; Liu, R.P. Synergizing Intelligence and Privacy: A Review of Integrating Internet of Things, Large Language Models, and Federated Learning in Advanced Networked Systems. Applied Sciences 2025, 15, 6587. [CrossRef]
  28. Zhang, C.; Li, S. State-of-the-art approaches to enhancing privacy preservation of machine learning datasets: A survey. arXiv preprint arXiv:2404.16847 2024.
  29. Qi, Z.; Meng, L.; Li, Z.; Hu, H.; Meng, X. Cross-Silo Feature Space Alignment for Federated Learning on Clients with Imbalanced Data 2025.
  30. Hu, K.; Xiang, L.; Tang, P.; Qiu, W. Feature norm regularized federated learning: utilizing data disparities for model performance gains. In Proceedings of the Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, 2024, pp. 4136–4146.
  31. Aramian, A. Managing Feature Diversity: Evaluating Global Model Reliability in Federated Learning for Intrusion Detection Systems in IoT, 2024.
  32. Johnson, A. A Survey of Recent Advances for Tackling Data Heterogeneity in Federated Learning 2025.
  33. Jegou, H.; Douze, M.; Schmid, C. Product quantization for nearest neighbor search. IEEE transactions on pattern analysis and machine intelligence 2010, 33, 117–128. [CrossRef]
  34. Konečnỳ, J.; McMahan, H.B.; Yu, F.X.; Richtárik, P.; Suresh, A.T.; Bacon, D. Federated learning: Strategies for improving communication efficiency. arXiv preprint arXiv:1610.05492 2016.
  35. Yue, K.; Jin, R.; Wong, C.W.; Baron, D.; Dai, H. Gradient obfuscation gives a false sense of security in federated learning. In Proceedings of the 32nd USENIX Security Symposium (USENIX Security 23), 2023, pp. 6381–6398.
  36. Ge, T.; He, K.; Ke, Q.; Sun, J. Optimized product quantization. IEEE transactions on pattern analysis and machine intelligence 2013, 36, 744–755. [CrossRef] [PubMed]
  37. Xiao, S.; Liu, Z.; Shao, Y.; Lian, D.; Xie, X. Matching-oriented product quantization for ad-hoc retrieval. arXiv preprint arXiv:2104.07858 2021.
  38. Deisenroth, M.P.; Faisal, A.A.; Ong, C.S. Mathematics for machine learning; Cambridge University Press, 2020.
  39. Izenman, A.J. Linear Dimensionality Reduction. Springer New York 2013.
  40. Vadhan, S.; Zhang, W. Concurrent Composition Theorems for Differential Privacy 2022.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.