Submitted: 14 February 2024
Posted: 15 February 2024
Abstract
Keywords:
1. Introduction
2. Materials and Methods
2.1. Study Area

2.2. Methodology

| Feature | Variance inflation factor (VIF) |
|---|---|
| TDS | 4209.78 |
| Sodium | 1137.34 |
| Calcium | 425.13 |
| Magnesium | 380.55 |
| Bicarbonate | 58.74 |
| Sulfate | 39.68 |
| Chloride | 31.69 |
| pH | 20.16 |
| EC | 10.20 |
| Nitrate (NO3-N) | 5.45 |
| Well Depth | 1.70 |
| Potassium | 1.43 |
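The variance inflation factor of feature j is VIF_j = 1 / (1 − R_j²), where R_j² is the coefficient of determination obtained by regressing feature j on all remaining features. A common rule of thumb treats VIF > 10 as serious multicollinearity, so the values for TDS, sodium, calcium and magnesium above stand out. The sketch below computes VIFs with ordinary least squares in pure Python; the function names and the toy data are hypothetical and are not the authors' code or measurements.

```python
import math

def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def r_squared(y, predictors):
    """R^2 of an ordinary least-squares fit of y on the predictors plus an intercept."""
    n = len(y)
    design = [[1.0] + [col[i] for col in predictors] for i in range(n)]
    p = len(design[0])
    xtx = [[sum(row[a] * row[b] for row in design) for b in range(p)] for a in range(p)]
    xty = [sum(row[a] * yi for row, yi in zip(design, y)) for a in range(p)]
    beta = solve(xtx, xty)  # normal equations (X^T X) beta = X^T y
    fitted = [sum(bj * xj for bj, xj in zip(beta, row)) for row in design]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

def vif(columns):
    """VIF_j = 1 / (1 - R_j^2), regressing each column on all the others."""
    out = {}
    for name, y in columns.items():
        others = [col for other, col in columns.items() if other != name]
        r2 = r_squared(y, others)
        out[name] = math.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return out

# Hypothetical toy data (not the study's measurements): "tds" is close to na + ca.
data = {
    "na":  [10.0, 12.0, 15.0, 11.0, 18.0, 20.0, 14.0, 9.0],
    "ca":  [5.0, 7.0, 6.0, 9.0, 8.0, 4.0, 10.0, 6.0],
    "tds": [15.2, 18.9, 21.1, 19.8, 26.3, 23.9, 24.1, 15.2],
    "k":   [1.0, 3.0, 2.0, 2.5, 1.5, 3.5, 2.0, 1.2],
}
result = vif(data)
for name, value in sorted(result.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {value:.1f}")
```

Because the toy "tds" column is almost an exact sum of two ion columns, those three features receive very large VIFs, which mirrors the pattern one would expect when TDS is measured alongside its constituent ions.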

| Feature | IG |
|---|---|
| Nitrate (NO3-N) | 0.876 |
| Calcium | 0.869 |
| Sodium | 0.869 |
| Sulfate | 0.869 |
| Chloride | 0.869 |
| Potassium | 0.869 |
| Magnesium | 0.869 |
| TDS | 0.816 |
| EC | 0.784 |
| Well Depth | 0.525 |
| Bicarbonate | 0.520 |
| pH | 0.509 |
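Assuming IG denotes information gain, i.e. the reduction in class entropy H(Y) − H(Y | X) obtained by splitting the samples on a feature X, the sketch below computes it for a continuous feature after discretizing it into equal-width bins. The bin count, the function names and the toy values are hypothetical; the table does not state how the features were discretized.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def equal_width_bins(values, n_bins):
    """Map each continuous value to an equal-width bin index 0..n_bins-1."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant feature
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

def information_gain(values, labels, n_bins=4):
    """IG = H(Y) - sum over bins b of P(b) * H(Y | bin = b)."""
    bins = equal_width_bins(values, n_bins)
    n = len(labels)
    conditional = 0.0
    for b in set(bins):
        subset = [y for y, bb in zip(labels, bins) if bb == b]
        conditional += len(subset) / n * entropy(subset)
    return entropy(labels) - conditional

labels = [1, 1, 2, 2, 5, 5, 5, 5]                      # hypothetical WQI classes
separating = [0.1, 0.2, 1.1, 1.2, 3.0, 3.1, 3.2, 3.3]  # hypothetical feature values
constant = [7.0] * 8
print(information_gain(separating, labels))  # 1.5: every bin is pure, so IG equals H(Y)
print(information_gain(constant, labels))    # 0.0: a constant feature carries no information
```

Information gain is bounded above by the class entropy H(Y), so a feature whose bins are all pure reaches that ceiling; the result depends on the chosen binning.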

Uncertainty Analysis
R-factor
Bootstrapping
Random Forest and Gradient Boosting
Support Vector Machines (SVM) and XGBoost
K-Nearest Neighbors (KNN) and Decision Trees
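The bootstrapping step listed under Uncertainty Analysis estimates the sampling uncertainty of a performance metric by resampling the test set with replacement and recomputing the metric on each resample. A minimal percentile-interval sketch for accuracy follows; the number of resamples, the 95 % level, the seed, and the toy labels are illustrative assumptions rather than the settings used in this study.

```python
import random

def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class equals the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def bootstrap_ci(y_true, y_pred, metric=accuracy, n_resamples=2000, level=0.95, seed=0):
    """Point estimate and percentile bootstrap interval for `metric`."""
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    n = len(y_true)
    stats = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]  # draw n rows with replacement
        stats.append(metric([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    stats.sort()
    alpha = (1.0 - level) / 2.0
    lower = stats[int(alpha * n_resamples)]
    upper = stats[min(n_resamples - 1, int((1.0 - alpha) * n_resamples))]
    return metric(y_true, y_pred), lower, upper

# Hypothetical test-set labels (not the study's data).
y_true = [5] * 16 + [1, 2, 3, 3]
y_pred = [5] * 16 + [1, 1, 3, 5]
point, lower, upper = bootstrap_ci(y_true, y_pred)
print(point, lower, upper)
```

The same function accepts any metric with the signature `metric(y_true, y_pred)`, so it could be reused for other scores; with very few minority-class samples the resulting intervals can be wide.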
3. Results
3.1. AUC-based performance evaluation
3.2. Statistical Analysis using the Friedman Test
| Test | Value |
|---|---|
| Friedman test statistic (χ², df = 5) | 5.0 |
| Friedman Test - p-value | 0.4159 |
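The Friedman test ranks the k classifiers within each of N blocks (for example cross-validation folds or resamples; the section does not say which were used), averages the ranks to R̄_j, and compares χ²_F = 12N / (k(k+1)) · [Σ_j R̄_j² − k(k+1)²/4] with a χ² distribution on k − 1 degrees of freedom. With k = 6 classifiers, a χ² distribution with 5 degrees of freedom maps a statistic of 5.0 to p ≈ 0.4159, which matches the values in the table above. The sketch below omits the tie correction that some libraries apply; function names and the toy scores are hypothetical.

```python
import math

def average_ranks(scores):
    """Rank one block of scores (higher is better -> rank 1), averaging ties."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        for t in range(i, j + 1):
            ranks[order[t]] = (i + j) / 2 + 1  # mean of the tied 1-based positions
        i = j + 1
    return ranks

def chi2_sf(x, k):
    """Upper-tail probability P(X > x) of a chi-square variable with integer k dof."""
    if k % 2 == 0:
        term = total = math.exp(-x / 2)
        for i in range(1, k // 2):
            term *= (x / 2) / i
            total += term
        return total
    total = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for i in range(1, (k + 1) // 2):
        total += term
        term *= x / (2 * i + 1)
    return total

def friedman(blocks):
    """blocks[b][j] = score of classifier j on block b. Returns (chi2_F, p-value)."""
    n, k = len(blocks), len(blocks[0])
    ranks = [average_ranks(row) for row in blocks]
    mean_ranks = [sum(r[j] for r in ranks) / n for j in range(k)]
    stat = 12 * n / (k * (k + 1)) * (sum(r * r for r in mean_ranks) - k * (k + 1) ** 2 / 4)
    return stat, chi2_sf(stat, k - 1)

# Hypothetical scores: 4 blocks x 3 classifiers with the same ordering in every block.
blocks = [[0.90, 0.80, 0.70]] * 4
stat, p = friedman(blocks)
print(stat, p)                    # 8.0 and exp(-4), about 0.0183
print(round(chi2_sf(5.0, 5), 4))  # 0.4159
```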

3.3. Nemenyi Test for pairwise comparisons
| | XGB Classifier | Random Forest Classifier | Support Vector Classifier | K Neighbors Classifier | Gradient Boosting Classifier | Decision Tree Classifier |
|---|---|---|---|---|---|---|
| 1 | 1.0 | NaN | NaN | NaN | NaN | NaN |
| 2 | NaN | 1.0 | NaN | NaN | NaN | NaN |
| 3 | NaN | NaN | 1.0 | NaN | NaN | NaN |
| 4 | NaN | NaN | NaN | 1.0 | NaN | NaN |
| 5 | NaN | NaN | NaN | NaN | 1.0 | NaN |
| 6 | NaN | NaN | NaN | NaN | NaN | 1.0 |
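A common way to read the Nemenyi post-hoc test is through the critical difference of Demšar (2006): two classifiers differ significantly at level α when their average Friedman ranks differ by more than CD = q_α · √(k(k+1) / (6N)), with k classifiers and N blocks. The sketch below hard-codes the α = 0.05 critical values for k = 2 to 10 from Demšar's table (q_0.05 = 2.850 for six classifiers); the average ranks, the classifier labels "A" to "F" and N are hypothetical, and this CD form is an alternative presentation to a pairwise matrix, not necessarily the authors' procedure.

```python
import math
from itertools import combinations

# Two-tailed critical values q_0.05 for the Nemenyi test (Studentized range / sqrt(2)),
# as tabulated by Demsar (2006) for k = 2..10 classifiers.
Q_005 = {2: 1.960, 3: 2.343, 4: 2.569, 5: 2.728, 6: 2.850,
         7: 2.949, 8: 3.031, 9: 3.102, 10: 3.164}

def critical_difference(k, n_blocks, q_table=Q_005):
    """CD = q_alpha * sqrt(k (k + 1) / (6 N))."""
    return q_table[k] * math.sqrt(k * (k + 1) / (6.0 * n_blocks))

def nemenyi_pairs(mean_ranks, n_blocks):
    """Compare every pair of average ranks against the critical difference."""
    cd = critical_difference(len(mean_ranks), n_blocks)
    results = []
    for a, b in combinations(mean_ranks, 2):
        diff = abs(mean_ranks[a] - mean_ranks[b])
        results.append((a, b, round(diff, 3), diff > cd))
    return cd, results

# Hypothetical average ranks of six classifiers over N = 10 blocks (they sum to 21 = 6*7/2).
mean_ranks = {"A": 2.1, "B": 2.4, "C": 2.6, "D": 3.3, "E": 4.6, "F": 6.0}
cd, pairs = nemenyi_pairs(mean_ranks, n_blocks=10)
print(f"CD = {cd:.3f}")  # CD = 2.384
for a, b, diff, significant in pairs:
    print(a, b, diff, "significant" if significant else "n.s.")
```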

3.4. Confusion matrices
Rows give the true class and columns the predicted class, with classes as defined in the WQI table below. Diagonal cells count correct predictions; each off-diagonal cell is a false negative for its row class and a false positive for its column class.

| True\Predicted | Class 1 | Class 2 | Class 3 | Class 4 | Class 5 |
|---|---|---|---|---|---|
| Class 1 | 2 | 3 | 0 | 0 | 0 |
| Class 2 | 0 | 3 | 0 | 1 | 0 |
| Class 3 | 0 | 3 | 1 | 2 | 1 |
| Class 4 | 0 | 2 | 0 | 1 | 1 |
| Class 5 | 0 | 0 | 0 | 2 | 62 |

| True\Predicted | Class 1 | Class 2 | Class 3 | Class 4 | Class 5 |
|---|---|---|---|---|---|
| Class 1 | 4 | 1 | 0 | 0 | 0 |
| Class 2 | 2 | 1 | 0 | 0 | 1 |
| Class 3 | 0 | 1 | 1 | 2 | 3 |
| Class 4 | 0 | 1 | 0 | 0 | 3 |
| Class 5 | 0 | 0 | 0 | 0 | 64 |

| True\Predicted | Class 1 | Class 2 | Class 3 | Class 4 | Class 5 |
|---|---|---|---|---|---|
| Class 1 | 4 | 1 | 0 | 0 | 0 |
| Class 2 | 2 | 1 | 0 | 0 | 1 |
| Class 3 | 1 | 0 | 0 | 0 | 6 |
| Class 4 | 0 | 0 | 0 | 0 | 4 |
| Class 5 | 0 | 0 | 0 | 0 | 64 |

| True\Predicted | Class 1 | Class 2 | Class 3 | Class 4 | Class 5 |
|---|---|---|---|---|---|
| Class 1 | 3 | 2 | 0 | 0 | 0 |
| Class 2 | 3 | 0 | 0 | 0 | 1 |
| Class 3 | 0 | 2 | 1 | 1 | 3 |
| Class 4 | 0 | 0 | 0 | 0 | 4 |
| Class 5 | 0 | 1 | 1 | 2 | 60 |

| True\Predicted | Class 1 | Class 2 | Class 3 | Class 4 | Class 5 |
|---|---|---|---|---|---|
| Class 1 | 4 | 1 | 0 | 0 | 0 |
| Class 2 | 2 | 1 | 0 | 1 | 0 |
| Class 3 | 0 | 2 | 2 | 2 | 1 |
| Class 4 | 0 | 2 | 0 | 1 | 1 |
| Class 5 | 0 | 0 | 0 | 0 | 64 |

| True\Predicted | Class 1 | Class 2 | Class 3 | Class 4 | Class 5 |
|---|---|---|---|---|---|
| Class 1 | 4 | 1 | 0 | 0 | 0 |
| Class 2 | 2 | 2 | 0 | 0 | 0 |
| Class 3 | 0 | 1 | 4 | 1 | 1 |
| Class 4 | 0 | 1 | 1 | 1 | 1 |
| Class 5 | 0 | 1 | 1 | 2 | 60 |

| Classes | WQI Range | Water Quality |
|---|---|---|
| Class 1 | 0-25 | Excellent water quality |
| Class 2 | 26-50 | Good water quality |
| Class 3 | 51-75 | Fair water quality |
| Class 4 | 76-100 | Poor water quality |
| Class 5 | Above 100 | Very poor to unacceptable water quality |
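Per-class scores can be read off any of the confusion matrices above (rows = true class, columns = predicted class): precision for a class is its diagonal count divided by its column total, and recall is the diagonal count divided by its row total. The sketch below derives these, together with overall accuracy and macro-averaged F1, from the counts of the first matrix; the function name is hypothetical.

```python
def per_class_metrics(cm):
    """cm[i][j] = number of samples of true class i predicted as class j."""
    k = len(cm)
    total = sum(map(sum, cm))
    correct = sum(cm[i][i] for i in range(k))
    precision, recall, f1 = [], [], []
    for i in range(k):
        tp = cm[i][i]
        fp = sum(cm[row][i] for row in range(k)) - tp  # predicted i, truly another class
        fn = sum(cm[i]) - tp                           # truly i, predicted another class
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        precision.append(p)
        recall.append(r)
        f1.append(2 * p * r / (p + r) if p + r else 0.0)
    return correct / total, precision, recall, sum(f1) / k

# Counts of the first confusion matrix (Classes 1-5).
cm_first = [
    [2, 3, 0, 0, 0],
    [0, 3, 0, 1, 0],
    [0, 3, 1, 2, 1],
    [0, 2, 0, 1, 1],
    [0, 0, 0, 2, 62],
]
acc, prec, rec, macro_f1 = per_class_metrics(cm_first)
print(f"accuracy = {acc:.3f}, macro F1 = {macro_f1:.3f}")
print("recall per class:", [round(r, 3) for r in rec])
```

Because Class 5 holds 64 of the 84 test samples, overall accuracy is dominated by that class, while per-class recall and macro F1 expose the weaker performance on the minority classes.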

4. Discussion and conclusion



| Classifier | R-factor |
|---|---|
| K-Nearest Neighbors | 0.83 |
| Decision Tree | 0.77 |
| Gradient Boosting | 0.83 |
| Random Forest | 0.83 |
| SVM | 0.83 |
| XGBoost | 0.83 |
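The section does not define the R-factor. In SUFI-2-style uncertainty analysis it is commonly defined as the mean width of the 95 % prediction uncertainty (95PPU) band, the spread between the 2.5th and 97.5th percentiles of an ensemble of predictions, divided by the standard deviation of the observed values; smaller values indicate a narrower band. Assuming that definition, with the ensemble produced for instance by bootstrapping, a minimal sketch follows; the function names and the toy ensemble are hypothetical.

```python
import statistics

def percentile(sorted_vals, q):
    """Linear-interpolation percentile of pre-sorted values, q in [0, 100]."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)

def r_factor(ensemble, observed):
    """ensemble[m][i]: prediction of ensemble member m for sample i.

    Assumed definition: mean 95PPU band width divided by the sample standard
    deviation of the observations (some implementations use the population value).
    """
    widths = []
    for i in range(len(observed)):
        vals = sorted(member[i] for member in ensemble)
        widths.append(percentile(vals, 97.5) - percentile(vals, 2.5))
    return statistics.mean(widths) / statistics.stdev(observed)

# Hypothetical example: 41 ensemble members shifting the observations by -1.00 to +1.00.
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
ensemble = [[v + (-1.0 + 0.05 * m) for v in observed] for m in range(41)]
print(r_factor(ensemble, observed))  # band width ~1.9 / stdev ~1.58, about 1.20
```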

| Algorithm | AUC |
|---|---|
| Decision Tree (DT) | 0.81 |
| Random Forest (RF) | 0.94 |
| Gradient Boosting | 0.95 |
| K-Nearest Neighbors (KNN) | 0.84 |
| Support Vector Machine (SVM) | 0.93 |
| XGBoost | 0.95 |
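For a multi-class problem such as the five WQI classes, a single AUC per algorithm is usually obtained by one-vs-rest averaging: each class in turn is treated as positive, its ROC AUC is computed from the predicted class probabilities, and the per-class AUCs are averaged. The table does not state which averaging was used, so the macro one-vs-rest variant below is an assumption. Each binary AUC uses its rank (Mann-Whitney) interpretation: the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one, with ties counted as one half. Function names and toy inputs are hypothetical.

```python
def binary_auc(scores, is_positive):
    """AUC = P(score_pos > score_neg) + 0.5 * P(tie), by pairwise comparison."""
    pos = [s for s, y in zip(scores, is_positive) if y]
    neg = [s for s, y in zip(scores, is_positive) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))

def macro_ovr_auc(proba, y_true, classes):
    """proba[i][c]: predicted probability of classes[c] for sample i."""
    aucs = []
    for c, label in enumerate(classes):
        scores = [row[c] for row in proba]
        aucs.append(binary_auc(scores, [y == label for y in y_true]))
    return sum(aucs) / len(aucs)

print(binary_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
proba = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]  # hypothetical probabilities
print(macro_ovr_auc(proba, [1, 2, 3], classes=[1, 2, 3]))  # 1.0
```

The pairwise loop is quadratic in the number of samples, which is acceptable for a test set of this size; macro averaging weights every class equally, so rare classes influence the result as much as the dominant Class 5.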

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).