In federated learning (FL), clients train models locally without sharing raw data, thereby preserving data privacy. Federated distillation, in particular, transfers knowledge among clients regardless of their model architectures. However, when groups of clients with different data distributions exist, sharing the same knowledge across all clients becomes impractical. To address this issue, this paper presents an approach that clusters clients according to the per-label predictions that each client's locally trained model produces on a public dataset; no prior knowledge of the number of groups is required. Evaluations on the MNIST dataset showed that our method accurately identified these group structures and improved accuracy by 15–75% compared with traditional federated distillation algorithms when distinct group structures were present. We also observed significant performance improvements for smaller client groups, bringing us closer to fair FL.
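The clustering idea can be illustrated with a minimal sketch. This is not the paper's exact algorithm: each client's mean prediction per public-set label is simulated with synthetic probability vectors rather than a trained model, and the grouping uses agglomerative clustering with a distance threshold (one common way to avoid fixing the number of clusters in advance); the group sizes, noise level, and threshold are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

rng = np.random.default_rng(0)
NUM_LABELS = 10

def client_signature(bias_labels, noise=0.05):
    """Simulate a client's mean prediction vector per public-set label.

    `bias_labels` mimics a skewed local distribution: the client's model
    is confident on those labels and near-uniform elsewhere. Returns a
    flattened (NUM_LABELS * NUM_LABELS) signature vector.
    """
    sig = np.full((NUM_LABELS, NUM_LABELS), 1.0 / NUM_LABELS)
    for lbl in bias_labels:
        sig[lbl] = 0.02
        sig[lbl, lbl] = 1.0 - 0.02 * (NUM_LABELS - 1)
    # Small per-client variation so clients in a group are similar, not identical.
    return (sig + rng.normal(0.0, noise, sig.shape)).ravel()

# Two latent groups of clients with different label skews (sizes are arbitrary;
# the smaller group stands in for a minority group of clients).
group_a = [client_signature({0, 1, 2, 3, 4}) for _ in range(5)]
group_b = [client_signature({5, 6, 7, 8, 9}) for _ in range(3)]
signatures = np.stack(group_a + group_b)

# distance_threshold lets the number of groups emerge from the data
# (n_clusters must be None in that case).
clustering = AgglomerativeClustering(
    n_clusters=None, linkage="average", distance_threshold=1.5
)
labels = clustering.fit_predict(signatures)
print("clusters found:", clustering.n_clusters_)
print("labels:", labels)
```

With clearly separated prediction signatures, the two latent groups are recovered without specifying their number; the threshold would need tuning for real model outputs.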