Submitted: 19 June 2024
Posted: 20 June 2024
Abstract
Keywords:
1. Introduction
2. K-Means++ System Analysis
- X represents the set of data points in the complex system.
- C represents the set of centroids chosen during initialization.
- For each x in X, D(x) is defined as the minimum squared Euclidean distance from x to the nearest centroid in C.
- The K-means++ algorithm aims to minimize the squared distances between data points and their assigned centroids.
- W(C) denotes this objective function for a centroid set C.
- The iterative centroid updates aim to minimize W(C).
- The centroid closest to a data point x is the one that attains D(x).
- All distances are Euclidean.
- The K-means++ algorithm employs a strategic centroid initialization technique, resulting in a network structure reminiscent of a complete graph.
- This structure is denoted as G, whose nodes represent centroids and data points and whose edges represent the connections between them.
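The definitions above can be made concrete with a minimal sketch. It assumes that W(C) is the sum of D(x) over all x in X and that each new centroid is sampled with probability proportional to D(x), as in the standard k-means++ seeding rule. The function names and the toy data are hypothetical and are not taken from the paper.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def d_of_x(x, centroids):
    """D(x): minimum squared Euclidean distance from x to any centroid in C."""
    return min(sq_dist(x, c) for c in centroids)

def objective(points, centroids):
    """W(C), assumed here to be the sum of D(x) over all data points in X."""
    return sum(d_of_x(x, centroids) for x in points)

def kmeanspp_seed(points, k, rng):
    """Pick k initial centroids from the data.

    The first centroid is uniform at random; each later one is drawn with
    probability proportional to D(x), so already-chosen points (D(x) = 0)
    cannot be drawn again. Assumes k does not exceed the number of
    distinct points.
    """
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        weights = [d_of_x(x, centroids) for x in points]
        centroids.append(rng.choices(points, weights=weights, k=1)[0])
    return centroids

# Hypothetical toy data set X (not from the paper).
X = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.1, 4.9), (10.0, 0.0), (9.9, 0.3)]
C = kmeanspp_seed(X, k=3, rng=random.Random(0))
print("centroids:", C)
print("W(C):", objective(X, C))
```

Weighting by D(x) favours points far from the centroids already chosen, which is what spreads the initial centroids across the data.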
3. Demonstration Analysis: A Case Study on Power Grid
Algorithm 1. k-means++ algorithm

Input: Network with $n$ buses; range of $k$ values for silhouette analysis.

Step 1: Initialization and Centroid Selection (K-means++)
Randomly initialize $k$ distinct centroids $c_1, c_2, \ldots, c_k$.

Step 2: k-means++ Clustering
Repeat until convergence:
a. For each bus $i$ in the network, with feature vector $x_i$: calculate the Euclidean distance between $x_i$ and each centroid $c_j$, and assign bus $i$ to the cluster with the closest centroid.
b. For each cluster $C_j$: recalculate the centroid as the mean of its members, $c_j = \frac{1}{|C_j|} \sum_{i \in C_j} x_i$.

Step 3: Silhouette Value Analysis
For each value of $k$ in the specified range:
a. For each bus $i$ in each cluster $C_j$: calculate the average closeness $a(i)$ to the other buses in $C_j$, using the AED measure based on tie-lines.
b. For each bus $i$ in each cluster $C_j$: calculate $b(i)$, the minimum of the average closeness to the buses of each other cluster $C_m$, $m \neq j$.
c. For each bus $i$ in each cluster $C_j$: calculate the silhouette coefficient $s(i) = \frac{b(i) - a(i)}{\max\{a(i), b(i)\}}$, then calculate the average silhouette coefficient $\bar{s}_j$ over all buses in cluster $C_j$.
d. Calculate the average silhouette coefficient $\bar{s}(k)$ over all clusters for this value of $k$.

Step 4: Selecting Optimal k based on Silhouette Analysis
Choose the value of $k$ that maximizes the average silhouette coefficient $\bar{s}(k)$, indicating the optimal cluster arrangement.

Step 5: Final Clusters and Centroids
Apply the k-means++ algorithm with the optimal $k$ to obtain the final clusters and centroids.

Output: Optimal clusters of buses and the value of $k$ determined by silhouette analysis.

This combined algorithm integrates k-means++ clustering with silhouette value analysis to determine the optimal number of clusters and achieve accurate clustering results. The k-means++ algorithm initializes the centroids and refines the cluster assignments, while the silhouette analysis assesses cluster quality and identifies the most appropriate value of $k$. The final clusters and centroids are then obtained using the selected optimal value.
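The steps of Algorithm 1 can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: plain Euclidean distance stands in for the AED measure based on tie-lines, the silhouette is averaged over all points rather than per cluster first, and the names `kmeans`, `mean_silhouette`, `select_k`, and the toy `buses` coordinates are hypothetical.

```python
import math
import random

def dist(p, q):
    """Euclidean distance (assumed stand-in for the AED closeness measure)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def seed(points, k, rng):
    """Step 1: k-means++ style seeding, sampling proportionally to squared distance."""
    cents = [rng.choice(points)]
    while len(cents) < k:
        w = [min(dist(p, c) ** 2 for c in cents) for p in points]
        cents.append(rng.choices(points, weights=w, k=1)[0])
    return cents

def kmeans(points, k, rng, max_iter=100):
    """Step 2: alternate assignment and centroid update until labels stop changing."""
    cents = seed(points, k, rng)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: dist(p, cents[j])) for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # keep the old centroid if a cluster becomes empty
                cents[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, cents

def mean_silhouette(points, labels):
    """Step 3: mean of s(i) = (b(i) - a(i)) / max(a(i), b(i)) over all points."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    if len(clusters) < 2:
        return 0.0  # silhouette is undefined for a single cluster
    total = 0.0
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1:
            continue  # s(i) = 0 by convention for singleton clusters
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        b = min(sum(dist(p, q) for q in other) / len(other)
                for m, other in clusters.items() if m != l)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)

def select_k(points, k_range, rng):
    """Steps 3-4: score each k and return the one with the highest mean silhouette."""
    scores = {}
    for k in k_range:
        labels, _ = kmeans(points, k, rng)
        scores[k] = mean_silhouette(points, labels)
    return max(scores, key=scores.get), scores

# Hypothetical 2-D coordinates standing in for bus features (three tight groups).
buses = [(0, 0), (0, 1), (1, 0), (100, 100), (100, 101), (101, 100),
         (200, 0), (200, 1), (201, 0)]
best_k, scores = select_k(buses, range(2, 5), random.Random(0))
print("best k:", best_k)
```

Step 5 then amounts to calling `kmeans(buses, best_k, rng)` once more to obtain the final clusters and centroids. A single run per value of k keeps the sketch short; in practice several restarts per k would make the selection less sensitive to the seeding.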
3.1. Case Study on IEEE 39-Bus System
3.2. Case Study on IEEE 300-Bus System
3.3. K-Means++ Performance with Limited Data Availability
3.4. K-Means++ Performance in Presence of Noise
4. Further Analysis on K-Means++ Robustness to Approximate System Dynamics
5. Conclusions
- showcases the algorithm's prowess in identifying critical components,
- sheds light on its ability to approximate the intricate dynamics of complex systems,
- demonstrates the resilience of K-means++ performance to noise,
- examines the practical potential of incorporating K-means++ in real-time applications with limited available data,
- provides a demonstration of this transformative potential through a case study centered on the power system,
- further fortified by additional performance metrics.
Authors' Contributions
Funding
Availability of data and materials
Acknowledgments
Competing interests







| Cluster Name (slack bus 30) | Buses in Clusters (slack bus 30) | Cluster Name (slack bus 39) | Buses in Clusters (slack bus 39) |
|---|---|---|---|
| 1 | 26, 28, 29, 38 | 1 | 26, 28, 29, 38 |
| 2 | 16, 19, 20, 21, 22, 23, 24, 33, 34, 35, 36 | 2 | 16, 19, 20, 21, 22, 23, 24, 33, 34, 35, 36 |
| 3 | 10, 11, 12, 13, 32 | 3 | 10, 11, 12, 13, 32 |
| 4 | 14, 15, 17, 27 | 4 | 14, 15, 17, 27 |
| 5 | 25, 30, 37, 1, 2 | 5 | 25, 30, 37, 1, 2 |
| 6 | 3, 4, 5, 9, 18, 39 | 6 | 3, 4, 5, 9, 18, 39 |
| 7 | 6, 7, 8, 31 | 7 | 6, 7, 8, 31 |

| Case# | Tie-line combination | Original network flow (MW) | K-means++ equivalent network flow (MW) | Deviation (%) |
|---|---|---|---|---|
| 1 | TL25–26/TL17–18 | 41.30 | 41.34 | 0.10 |
| 2 | TL25–26/TL4–14 | 35.78 | 35.75 | 0.09 |
| 3 | TL25–26/TL6–11 | 41.50 | 40.49 | 2.43 |
| 4 | TL4–14/TL6–11 | 32.81 | 32.77 | 0.12 |

| Case# | Tie-line combination | Original network flow (MW) | K-means++ equivalent network flow (MW) | Deviation (%) |
|---|---|---|---|---|
| 1 | TL19–87/TL4–16 | 1051.79 | 1032.31 | 1.85 |
| 2 | TL19–87/TL62–144 | 55.34 | 57.93 | 4.68 |
| 3 | TL8–14/TL62–144 | 450.86 | 487.01 | 8.02 |
| 4 | TL4–16/TL62–144 | 994.46 | 973.19 | 2.14 |
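
The deviation column in the table above is consistent with the absolute difference between the two flows taken relative to the original network flow. The short check below assumes that definition (the source does not state the formula) and recomputes the four values from the tabulated flows; the function name is hypothetical.

```python
def deviation_percent(original_mw, equivalent_mw):
    """Assumed definition: |equivalent - original| / original, in percent."""
    return abs(equivalent_mw - original_mw) / original_mw * 100.0

# (original flow, K-means++ equivalent flow) in MW, copied from the table above.
cases = {
    "TL19–87/TL4–16": (1051.79, 1032.31),
    "TL19–87/TL62–144": (55.34, 57.93),
    "TL8–14/TL62–144": (450.86, 487.01),
    "TL4–16/TL62–144": (994.46, 973.19),
}

for name, (original, equivalent) in cases.items():
    print(f"{name}: {deviation_percent(original, equivalent):.2f} %")
```

Under this definition the recomputed values round to the tabulated 1.85, 4.68, 8.02, and 2.14 percent.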

| Case# | Tie-line combination | Original network flow (MW) | K-means++ equivalent network flow, full data (MW) | K-means++ equivalent network flow, 0.8 data (MW) | K-means++ equivalent network flow, 0.5 data (MW) |
|---|---|---|---|---|---|
| 1 | TL19–87/TL4–16 | 1051.79 | 1032.31 | 1032.27 | 1031.13 |
| 2 | TL19–87/TL62–144 | 55.34 | 57.93 | 57.84 | 57.39 |
| 3 | TL8–14/TL62–144 | 450.86 | 487.01 | 486.55 | 486.02 |
| 4 | TL4–16/TL62–144 | 994.46 | 973.19 | 973.01 | 972.88 |

| Case# | Tie-line combination | Original network flow (MW) | K-means++ equivalent network flow, without noise (MW) | K-means++ equivalent network flow, with noise (MW) |
|---|---|---|---|---|
| 1 | TL19–87/TL4–16 | 1051.79 | 1032.31 | 1034.33 |
| 2 | TL19–87/TL62–144 | 55.34 | 57.93 | 57.82 |
| 3 | TL8–14/TL62–144 | 450.86 | 487.01 | 490.03 |
| 4 | TL4–16/TL62–144 | 994.46 | 973.19 | 980.75 |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).