Submitted: 22 February 2025
Posted: 24 February 2025
Abstract

Keywords:
1. Introduction
2. Materials and Methods
2.1. Abelar pilot basin and available data
2.2. ML methods
2.2.1. Data preprocessing
2.2.2. Cluster Analysis
2.2.3. Time Series Gaussian Process
2.2.4. Model inspection methods
2.2.5. Model validation
3. Results
3.1. CA Results
3.2. TS-GPR Results
4. Discussion
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References

| Name of attribute | Description | Units |
|---|---|---|
| Date | Date of the observation | - |
| P | Precipitation | mm |
| R | Recharge | mm |
| | Surface runoff | mm |
| | Interflow/subsurface flow | mm |
| | Groundwater flow | mm |
| | Total flow | mm |
| K | Potassium concentration | mg/L |
| Na | Sodium concentration | mg/L |
| Ca | Calcium concentration | mg/L |
| Mg | Magnesium concentration | mg/L |
| Fe | Iron concentration | µg/L |
| Mn | Manganese concentration | µg/L |
| Cu | Copper concentration | µg/L |
| Zn | Zinc concentration | µg/L |
| Al | Aluminum concentration | µg/L |
| Vn | Vanadium concentration | µg/L |
| Si | Silicon concentration | mg/L |
| Cl | Chloride concentration | mg/L |
| | Sulfate concentration | mg/L |
| | Nitrate concentration | mg/L |

| Statistic | Cluster 0 | Cluster 1 | Cluster 2 |
|---|---|---|---|
| count | 179 | 76 | 133 |
| mean | 25.156 | 19.802 | 21.333 |
| std | 2.860 | 2.037 | 3.296 |
| min | 17.700 | 13.200 | 9.809 |
| 25% quantile | 23.114 | 18.674 | 19.099 |
| 50% quantile | 25.399 | 20.049 | 21.400 |
| 75% quantile | 27.500 | 21.124 | 23.779 |
| max | 30.699 | 23.499 | 29.099 |

| Statistic | Cluster 0 | Cluster 1 | Cluster 2 |
|---|---|---|---|
| count | 179 | 76 | 133 |
| mean | 0.793 | 0.026 | 0.548 |
| std | 0.093 | 0.045 | 0.116 |
| min | 0.538 | 0 | 0.297 |
| 25% quantile | 0.730 | 0 | 0.479 |
| 50% quantile | 0.804 | 0 | 0.545 |
| 75% quantile | 0.871 | 0.026 | 0.599 |
| max | 0.942 | 0.171 | 0.919 |

| Statistic | Cluster 0 | Cluster 1 | Cluster 2 |
|---|---|---|---|
| count | 179 | 76 | 133 |
| mean | 0.205 | 0.973 | 0.447 |
| std | 0.093 | 0.045 | 0.120 |
| min | 0.057 | 0.828 | 0.046 |
| 25% quantile | 0.127 | 0.973 | 0.398 |
| 50% quantile | 0.195 | 1 | 0.452 |
| 75% quantile | 0.269 | 1 | 0.515 |
| max | 0.460 | 1 | 0.703 |
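
The per-cluster summaries above report count, mean, standard deviation, minimum, quartiles, and maximum, the same set of statistics that pandas `describe()` prints. How the authors computed them is not stated here, so the following is only a minimal sketch of one way to produce such a table, using the Python standard library. The function name `describe_by_cluster` and the toy data are hypothetical, and the sample standard deviation and linearly interpolated quartiles are assumptions.

```python
import statistics
from collections import defaultdict


def describe_by_cluster(values, labels):
    """Return {cluster: {statistic: value}} for one variable grouped by cluster label.

    Each cluster needs at least two samples for the standard deviation.
    """
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[label].append(value)

    summary = {}
    for label, group in sorted(groups.items()):
        # ASSUMPTION: quartiles by linear interpolation between order statistics.
        q25, q50, q75 = statistics.quantiles(group, n=4, method="inclusive")
        summary[label] = {
            "count": len(group),
            "mean": statistics.fmean(group),
            "std": statistics.stdev(group),  # ASSUMPTION: sample std (n - 1)
            "min": min(group),
            "25% quantile": q25,
            "50% quantile": q50,
            "75% quantile": q75,
            "max": max(group),
        }
    return summary


# Hypothetical toy data: one variable and the cluster assigned to each sample.
values = [1.0, 2.0, 3.0, 4.0, 10.0, 20.0, 30.0]
labels = [0, 0, 0, 0, 1, 1, 1]
summary = describe_by_cluster(values, labels)
for label, stats in summary.items():
    print(label, stats)
```

Each inner dictionary corresponds to one cluster column of a table like those above; running the function once per variable yields one table per variable.
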

| Step | Input variables | Case | R² | NRMSE | NMAE |
|---|---|---|---|---|---|
| 1 | | Training | 0.07 | 0.19 | 0.15 |
| 1 | | Testing | 0.13 | 0.20 | 0.15 |
| 2 | R, , | Training | -0.01 | 0.19 | 0.15 |
| 2 | R, , | Testing | 0.22 | 0.19 | 0.15 |
| 3 | R, , with shifting | Training | 0.44 | 0.14 | 0.10 |
| 3 | R, , with shifting | Testing | 0.45 | 0.16 | 0.11 |
| 4 | R, , with shifting and K, Ca, Mg, Cl | Training | 0.82 | 0.08 | 0.05 |
| 4 | R, , with shifting and K, Ca, Mg, Cl | Testing | 0.80 | 0.10 | 0.06 |
| Validation | R, , with shifting and K, Ca, Mg, Cl | Validation | 0.85 | 0.15 | 0.12 |
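
The accuracy metrics in the table above (R², NRMSE, NMAE) can be illustrated with a short sketch. The normalization used for NRMSE and NMAE is not stated here; the code below assumes division by the range of the observed values, which is one common convention and may differ from the authors' choice. The function name `accuracy_metrics` and the sample values are hypothetical.

```python
import math


def accuracy_metrics(observed, predicted):
    """Compute R2, NRMSE and NMAE for paired observed and predicted values."""
    n = len(observed)
    mean_obs = sum(observed) / n
    residuals = [o - p for o, p in zip(observed, predicted)]

    ss_res = sum(r * r for r in residuals)               # residual sum of squares
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)  # total sum of squares

    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(r) for r in residuals) / n

    # ASSUMPTION: normalize by the observed range (max - min).
    value_range = max(observed) - min(observed)

    return {
        "R2": 1.0 - ss_res / ss_tot,
        "NRMSE": rmse / value_range,
        "NMAE": mae / value_range,
    }


# Hypothetical observed and predicted values, for illustration only.
observed = [20.0, 22.0, 24.0, 26.0]
predicted = [21.0, 21.0, 25.0, 25.0]
metrics = accuracy_metrics(observed, predicted)
print(metrics)
```

With this definition, R² is negative whenever the model fits worse than a constant prediction equal to the observed mean, which is how a value such as -0.01 can arise.
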

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).