Submitted: 31 March 2026
Posted: 01 April 2026
Abstract
Keywords:
1. Introduction
2. Related Work
3. Materials and Methods
3.1. Dataset and Data Pre-Processing
- Geospatial features, comprising latitude, longitude, neighborhood, and administrative zone, enable granular analysis of water consumption patterns across the city's regions.
- Socioeconomic attributes, namely the stratum (a Colombian classification system that categorizes households by income and living conditions) and the corresponding water tariff (), support analysis of the relationship between economic status and water usage.
- The hydraulic sector identifier specifies the node of the water distribution network to which each household is assigned, providing information on the supporting infrastructure and distribution practices.
- Monthly water consumption records span more than six years, from 2017 to early 2023, yielding a time series of 78 time instants (months) per household. The average water consumption over the analysis period is , with a variance of . Together with consumption values in the interval , these statistics indicate moderate dispersion and the presence of extreme values, which are likely to affect conventional clustering approaches.
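
The descriptive statistics above can be computed from raw meter records along the following lines. This is a minimal Python sketch under stated assumptions: records are hypothetical `(household_id, month_index, consumption)` triples, missing months are left as `None` rather than imputed, and the variance is the population variance of all observed readings. The helper names (`build_series`, `summary_stats`, `extreme_share`) and the 3-standard-deviation threshold used to flag extreme values are illustrative choices, not the authors' pre-processing pipeline.

```python
from statistics import fmean, pvariance

N_MONTHS = 78  # monthly time instants reported for 2017 to early 2023


def build_series(records, n_months=N_MONTHS):
    """Arrange (household_id, month_index, consumption) triples into one
    fixed-length monthly series per household. Months without a reading stay
    None (hypothetical convention; no imputation rule is assumed here)."""
    series = {}
    for household, month, value in records:
        row = series.setdefault(household, [None] * n_months)
        row[month] = value
    return series


def summary_stats(series):
    """Pooled mean, population variance, and range of all observed readings."""
    values = [v for row in series.values() for v in row if v is not None]
    return {
        "mean": fmean(values),
        "variance": pvariance(values),
        "min": min(values),
        "max": max(values),
    }


def extreme_share(series, z=3.0):
    """Fraction of readings farther than z standard deviations from the pooled mean."""
    stats = summary_stats(series)
    sd = stats["variance"] ** 0.5
    values = [v for row in series.values() for v in row if v is not None]
    return sum(abs(v - stats["mean"]) > z * sd for v in values) / len(values)


# Usage with three hypothetical readings from two households.
records = [("h1", 0, 10.0), ("h1", 1, 12.0), ("h2", 0, 30.0)]
series = build_series(records)
print(summary_stats(series))
print(extreme_share(series))
```

Pooling all readings before computing the variance mirrors a single city-wide dispersion figure; per-household statistics would instead be computed row by row.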
3.2. Simple Graph Learning with Adaptive Clustering (Simple GLAC)
3.3. Experimental Framework
- Geospatial graph: This graph (Figure 1a) connects households by spatial proximity using a k-nearest neighbors (KNN) scheme over their geographic coordinates. The number of neighbors is set to to ensure a sufficiently dense graph, which captures local spatial correlations independently of administrative or infrastructural constraints. Because this graph has no explicit grouping variable, Figure 1a illustrates only pairwise geographic relationships rather than aggregated consumption patterns.
- Territorial graph: In this graph (Figure 1b), households are grouped by administrative division (29 communes), such that and edges are restricted within each region (). This structure introduces explicit diffusion barriers aligned with urban governance boundaries. The initial statistics (scatterplot in Figure 1b) reveal substantial spatial heterogeneity in consumption across zones, with average values ranging from approximately to . High-density communes (e.g., C2, C24, C11) concentrate around , forming a stable baseline of urban demand, whereas smaller or peripheral zones exhibit greater variability. These patterns indicate that consumption is strongly conditioned by administrative context, which traditional clustering methods operating on independent time series do not capture. The graph formulation should therefore integrate governance-aware structure into the clustering process, turning purely statistical groupings into administratively meaningful profiles.
- Socioeconomic graph: This graph (Figure 1c) connects households within the same socioeconomic stratum, defined by the Colombian six-level classification system (, ), under the assumption that income-related factors influence water consumption behavior. The consumption distribution (right panel) indicates a non-monotonic relationship between socioeconomic level and consumption. Stratum 6 exhibits the highest mean consumption () and a large variance, reflecting heterogeneity among high-income households, yet Stratum 1 also shows relatively elevated consumption (), likely driven by higher population density. In contrast, the intermediate strata (4–5) display lower average consumption (), suggesting more efficient usage patterns. This irregular structure is a key challenge for clustering, because consumption does not follow a simple socioeconomic gradient. Consequently, the effectiveness of each method depends on its ability to translate these heterogeneous patterns into coherent clusters through graph diffusion.
- Hydraulic sector graph: This graph (Figure 1d) encodes connectivity based on the water distribution network, linking households that belong to the same pipeline sector. The result is a structurally homogeneous topology in which nodes are grouped by shared infrastructure. Consumption statistics (scatterplot on the right) show relatively small differences in average consumption across sectors, with most values clustered around the citywide mean (). Sector HS4 shows the highest consumption () and variability, suggesting heterogeneous land use or infrastructure conditions, whereas HS5 exhibits the lowest consumption and variance, indicating more uniform demand. The remaining sectors (HS1–HS3) lie within a narrow range (), reflecting limited inter-group contrast. This infrastructure-based topology induces a diffusion regime dominated by homogenization, in which strong connectivity tends to smooth consumption differences. The main challenge is therefore not identifying large-scale variations but preserving meaningful deviations from the mean while maintaining coherent clusters.
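
To make the four topologies concrete, the sketch below builds a KNN graph from coordinates and a label-based graph from a grouping variable (commune, stratum, or sector), then applies one step of normalized neighborhood averaging. It is a hedged illustration, not the Simple GLAC implementation: the haversine distance, the full (clique) connectivity inside each group, and the GCN-style propagation D^-1/2 (A + I) D^-1/2 (as in Kipf and Welling's graph convolutional networks) are assumptions chosen for clarity, and all function names are hypothetical.

```python
import math
from collections import defaultdict


def haversine_km(p, q):
    """Great-circle distance in km between two (latitude, longitude) pairs in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def knn_edges(coords, k):
    """Geospatial graph: link each household to its k nearest neighbours.
    Edges are stored as sorted pairs, so the result is undirected (symmetrised)."""
    edges = set()
    for i, ci in enumerate(coords):
        nearest = sorted((haversine_km(ci, cj), j) for j, cj in enumerate(coords) if j != i)
        for _, j in nearest[:k]:
            edges.add((min(i, j), max(i, j)))
    return edges


def group_edges(labels):
    """Territorial / socioeconomic / hydraulic graphs: connect every pair of
    households sharing a label, so no edge crosses a group boundary.
    Full within-group connectivity is an assumption of this sketch."""
    members = defaultdict(list)
    for i, g in enumerate(labels):
        members[g].append(i)
    return {(a, b) for nodes in members.values()
            for pos, a in enumerate(nodes) for b in nodes[pos + 1:]}


def propagate(x, edges, n, steps=1):
    """Smooth a scalar node signal with D^-1/2 (A + I) D^-1/2 (GCN-style
    normalisation); used here only to illustrate graph diffusion."""
    nbrs = [[i] for i in range(n)]  # self-loops: the "+ I" term
    for a, b in edges:
        nbrs[a].append(b)
        nbrs[b].append(a)
    deg = [len(v) for v in nbrs]
    for _ in range(steps):
        x = [sum(x[j] / math.sqrt(deg[i] * deg[j]) for j in nbrs[i]) for i in range(n)]
    return x


coords = [(0.0, 0.0), (0.0, 0.01), (0.0, 1.0)]  # hypothetical (lat, lon) pairs
print(knn_edges(coords, k=1))
labels = ["C2", "C2", "C24"]  # hypothetical commune labels
edges = group_edges(labels)
print(propagate([10.0, 20.0, 100.0], edges, n=3))  # [15.0, 15.0, 100.0]
```

Under the clique assumption, every node in a group of size m has degree m once self-loops are added, so one propagation step replaces each value with its group mean and no information crosses group boundaries. This is a small-scale view of the two effects described above: diffusion barriers between groups and homogenization within strongly connected groups.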
4. Results and Discussion
4.1. Quantitative Performance Results

| Graph structure | Method | SS (silhouette) | CHS (Calinski-Harabasz) | DBS (Davies-Bouldin) | ID |
|---|---|---|---|---|---|
| – | K-means | 0.29 | 1908.11 | 1.36 | 109.20 |
| – | Hierarchical | 0.33 | 1735.70 | 1.37 | 109.00 |
| – | GMM | 0.48 | 728.02 | 2.36 | 141.37 |
|  | GConv-GMM | 0.30 | 154.56 | 2.40 | 1.08 |
|  | Cluster GCN | 0.18 | 66.32 | 5.49 | 1.96 |
|  | AE-G-GMM | 0.01 | 16.73 | 17.97 | 9.96 |
|  | GLAC-GCN | 0.20 | 1307.02 | 1.61 | 23.28 |
|  | Simple GLAC | 0.43 | 124141.04 | 0.53 | 26.17 |
|  | GConv-GMM | 0.30 | 323.48 | 2.21 | 0.16 |
|  | Cluster GCN | 0.79 | 3319.40 | 1.20 | 1.37 |
|  | AE-G-GMM | 0.21 | 236.63 | 5.38 | 64.92 |
|  | GLAC-GCN | 0.31 | 1833.31 | 1.20 | 10.22 |
|  | Simple GLAC | 0.40 | 16300.48 | 1.11 | 10.17 |
|  | GConv-GMM | 0.40 | 1840.50 | 1.24 | 3.45 |
|  | Cluster GCN | 0.50 | 3178.50 | 0.82 | 8.42 |
|  | AE-G-GMM | −0.01 | 34.26 | 15.49 | 23.72 |
|  | GLAC-GCN | 0.49 | 2624.86 | 0.99 | 3.99 |
|  | Simple GLAC | 0.88 | 75477.18 | 0.33 | 13.55 |
|  | GConv-GMM | 0.87 | 5321.70 | 0.51 | 0.01 |
|  | Cluster GCN | 0.67 | 8691.90 | 0.47 | 1.88 |
|  | AE-G-GMM | −0.02 | 108.59 | 12.27 | 28.56 |
|  | GLAC-GCN | 0.64 | 5582.63 | 0.62 | 4.96 |
|  | Simple GLAC | 0.86 | 16451.63 | 0.41 | 32.24 |
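
SS, CHS, and DBS are standard internal validation indices: higher silhouette and Calinski-Harabasz values and lower Davies-Bouldin values indicate more compact, better separated clusters. The pure-Python sketch below states the textbook definitions on a small set of points, assuming Euclidean distance. The space in which the indices were computed (raw series or learned embeddings) is not stated in this section, and the ID column is not defined here, so it is not reproduced. In practice, scikit-learn provides `silhouette_score`, `calinski_harabasz_score`, and `davies_bouldin_score` in `sklearn.metrics`.

```python
import math


def _clusters(X, labels):
    """Group points by cluster label, preserving first-seen order."""
    groups = {}
    for x, lab in zip(X, labels):
        groups.setdefault(lab, []).append(x)
    return groups


def _centroid(points):
    return [sum(col) / len(points) for col in zip(*points)]


def silhouette(X, labels):
    """Mean silhouette s(i) = (b - a) / max(a, b); singleton clusters score 0."""
    groups = _clusters(X, labels)
    scores = []
    for x, lab in zip(X, labels):
        own = groups[lab]
        if len(own) == 1:
            scores.append(0.0)
            continue
        a = sum(math.dist(x, y) for y in own) / (len(own) - 1)  # d(x, x) = 0 adds nothing
        b = min(sum(math.dist(x, y) for y in pts) / len(pts)
                for other, pts in groups.items() if other != lab)
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


def calinski_harabasz(X, labels):
    """Between-cluster over within-cluster dispersion, each divided by its degrees of freedom."""
    groups = _clusters(X, labels)
    n, k = len(X), len(groups)
    overall = _centroid(X)
    between, within = 0.0, 0.0
    for pts in groups.values():
        c = _centroid(pts)
        between += len(pts) * math.dist(c, overall) ** 2
        within += sum(math.dist(p, c) ** 2 for p in pts)
    return (between / (k - 1)) / (within / (n - k))


def davies_bouldin(X, labels):
    """Average over clusters of the worst (S_i + S_j) / d(c_i, c_j) ratio,
    where S_i is the mean distance of cluster i's points to its centroid."""
    groups = list(_clusters(X, labels).values())
    cents = [_centroid(pts) for pts in groups]
    spread = [sum(math.dist(p, c) for p in pts) / len(pts) for pts, c in zip(groups, cents)]
    k = len(groups)
    return sum(max((spread[i] + spread[j]) / math.dist(cents[i], cents[j])
                   for j in range(k) if j != i) for i in range(k)) / k


X = [[0.0], [1.0], [10.0], [11.0]]  # two well-separated toy clusters
labels = [0, 0, 1, 1]
print(silhouette(X, labels), calinski_harabasz(X, labels), davies_bouldin(X, labels))
```

Because the silhouette is bounded in [−1, 1] while the Calinski-Harabasz index is unbounded and depends on the scale of the space in which it is computed, CHS values are most meaningful when compared within the same representation.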

| Graph structure | Data | GConv-GMM C1 | GConv-GMM C2 | GConv-GMM C3 | Cluster GCN C1 | Cluster GCN C2 | Cluster GCN C3 | AE-G-GMM C1 | AE-G-GMM C2 | AE-G-GMM C3 | GLAC-GCN C1 | GLAC-GCN C2 | GLAC-GCN C3 | Simple GLAC C1 | Simple GLAC C2 | Simple GLAC C3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|  | Users | 1015 | 2744 | 831 | 842 | 322 | 3426 | 2646 | 1263 | 681 | 64 | 4173 | 353 | 60 | 1913 | 2617 |
|  | Mean () | 11.47 | 11.55 | 12.22 | 11.46 | 11.57 | 13.01 | 11.04 | 12.25 | 12.60 | 10.08 | 11.54 | 13.29 | 10.15 | 11.37 | 12.09 |
|  | Variance | 71.01 | 58.34 | 82.67 | 57.32 | 63.56 | 87.93 | 61.07 | 69.44 | 87.07 | 108.12 | 60.79 | 111.70 | 113.45 | 65.34 | 64.12 |
|  | IC | 21.24 | 0.57 | 1.58 | 12.56 | 10.52 | 10.21 | 4385.72 | 4989.97 | 6250.47 | 139.34 | 27.51 | 36.32 | 8.54 | 0.26 | 0.25 |
|  | Users | 2405 | 608 | 1577 | 3811 | 43 | 736 | 3745 | 796 | 49 | 528 | 2575 | 1487 | 4036 | 77 | 477 |
|  | Mean () | 11.45 | 11.47 | 12.04 | 11.53 | 11.86 | 12.32 | 10.60 | 15.71 | 21.92 | 11.27 | 11.66 | 11.79 | 11.46 | 11.95 | 13.26 |
|  | Variance | 54.56 | 71.83 | 79.86 | 57.70 | 139.64 | 101.76 | 53.29 | 92.79 | 301.66 | 72.48 | 61.30 | 70.59 | 58.20 | 57.32 | 126.84 |
|  | IC | 0.08 | 0.02 | 0.10 | 0.00 | 1.72 | 0.00 | 3827.12 | 619.53 | 21056.55 | 8.43 | 28.38 | 33.02 | 0.86 | 1.67 | 24.53 |
|  | Users | 645 | 1630 | 2315 | 1332 | 2748 | 210 | 1542 | 356 | 2692 | 784 | 1936 | 1870 | 4171 | 182 | 237 |
|  | Mean () | 11.73 | 11.68 | 12.11 | 11.29 | 11.43 | 13.11 | 11.34 | 11.68 | 14.73 | 11.21 | 11.72 | 11.82 | 11.53 | 11.90 | 13.77 |
|  | Variance | 66.83 | 65.26 | 67.32 | 60.12 | 61.30 | 103.47 | 62.33 | 58.65 | 103.52 | 52.12 | 73.79 | 70.32 | 61.51 | 54.49 | 141.71 |
|  | IC | 8.95 | 3.32 | 2.45 | 10.63 | 15.26 | 6.77 | 7576.36 | 4500.43 | 4823.17 | 2.84 | 2.83 | 6.00 | 0.53 | 25.37 | 20.55 |
|  | Users | 4475 | 104 | 11 | 991 | 3537 | 62 | 1804 | 2043 | 743 | 936 | 236 | 3418 | 448 | 4109 | 33 |
|  | Mean () | 11.65 | 11.97 | 12.33 | 11.27 | 11.75 | 12.51 | 10.63 | 11.08 | 15.45 | 11.31 | 11.62 | 11.78 | 11.38 | 11.73 | 11.76 |
|  | Variance | 65.25 | 81.90 | 58.52 | 69.08 | 63.95 | 101.67 | 54.99 | 65.33 | 87.08 | 59.24 | 70.98 | 67.41 | 60.03 | 65.53 | 145.42 |
|  | IC | 8.32 | 4.01 | 0.01 | 0.15 | 0.03 | 0.14 | 3947.84 | 4681.10 | 6224.13 | 0.66 | 18.47 | 0.11 | 27.15 | 0.37 | 7.17 |

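
The Users, Mean, and Variance rows can be derived from a cluster assignment as sketched below. The sketch assumes that Mean and Variance pool all monthly readings of a cluster's households and that the population variance is used; neither choice is confirmed in this section, and the IC row is not reproduced because its definition is not given here. `cluster_profiles` is a hypothetical helper.

```python
from statistics import fmean, pvariance


def cluster_profiles(series, labels):
    """Per-cluster user count plus mean and population variance of the pooled
    monthly readings of the cluster's households (None marks a missing month).

    series: dict household_id -> list of monthly readings
    labels: dict household_id -> cluster label (e.g. "C1", "C2", "C3")
    """
    users, pooled = {}, {}
    for household, row in series.items():
        c = labels[household]
        users[c] = users.get(c, 0) + 1
        pooled.setdefault(c, []).extend(v for v in row if v is not None)
    return {
        c: {"users": users[c], "mean": fmean(vals), "variance": pvariance(vals)}
        for c, vals in pooled.items()
    }


# Usage with hypothetical two-month series for three households.
series = {"h1": [10.0, 12.0], "h2": [30.0, None], "h3": [5.0, 5.0]}
labels = {"h1": "C1", "h2": "C1", "h3": "C2"}
print(cluster_profiles(series, labels))
```

A simple consistency check is that each method's cluster sizes within a block sum to the number of households; for GConv-GMM in the first block, 1015 + 2744 + 831 = 4590.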
4.2. Qualitative Performance Analysis
5. Conclusions and Future Work
Author Contributions
Funding
Data Availability Statement
Use of Artificial Intelligence
Acknowledgments
Conflicts of Interest


Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).