Submitted: 22 August 2023
Posted: 24 August 2023
Abstract
Keywords:
1. Introduction
2. Methods
2.1. Multivariate Exploratory Data Analysis (MEDA)
| Statistic | pH | Hardness | Solids | Chloramines | Sulfate | Conductivity | Organic_carbon | Trihalomethanes | Turbidity | Potability |
|---|---|---|---|---|---|---|---|---|---|---|
| Count | 2785.00 | 3276.00 | 3276.00 | 3276.00 | 2495.00 | 3276.00 | 3276.00 | 3114.00 | 3276.00 | 3276.00 |
| Mean | 7.08 | 196.37 | 22014.09 | 7.12 | 333.78 | 426.21 | 14.28 | 66.40 | 3.97 | 0.39 |
| Std | 1.59 | 32.88 | 8768.57 | 1.58 | 41.42 | 80.82 | 3.31 | 16.18 | 0.78 | 0.49 |
| Min | 0.00 | 47.43 | 320.94 | 0.35 | 129.00 | 181.48 | 2.20 | 0.74 | 1.45 | 0.00 |
| 25% | 6.09 | 176.85 | 15666.69 | 6.13 | 307.70 | 365.73 | 12.07 | 55.84 | 3.44 | 0.00 |
| 50% | 7.04 | 196.97 | 20927.83 | 7.13 | 333.07 | 421.88 | 14.22 | 66.62 | 3.96 | 0.00 |
| 75% | 8.06 | 216.67 | 27332.76 | 8.11 | 359.95 | 481.79 | 16.56 | 77.34 | 4.50 | 1.00 |
| Max | 14.00 | 323.12 | 61227.20 | 13.13 | 481.03 | 753.34 | 28.30 | 124.00 | 6.74 | 1.00 |
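
The row labels in the table above (count, mean, std, min, quartiles, max) match the output of pandas' `DataFrame.describe()`, although the source does not state which tool produced it. As a minimal sketch under that assumption, the snippet below runs `describe()` on a hypothetical toy frame (the `toy` name and its values are illustrative, not the study's data) and then derives the number of missing entries per column from the reported counts, taking 3276 as the total number of samples.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; values are illustrative and not taken from the study's dataset.
toy = pd.DataFrame({
    "ph": [7.0, np.nan, 6.5, 8.1, np.nan],
    "Hardness": [200.0, 180.5, 210.2, 195.0, 190.3],
    "Potability": [0, 1, 0, 1, 0],
})

# describe() reports count, mean, std, min, quartiles, and max per numeric column.
# The count row skips NaN, so it drops below the row total when values are missing.
summary = toy.describe().round(2)
print(summary)

# Missing entries per column implied by the count row of the table above.
n_rows = 3276
reported_counts = {"ph": 2785, "Sulfate": 2495, "Trihalomethanes": 3114}
missing = {name: n_rows - count for name, count in reported_counts.items()}
print(missing)  # {'ph': 491, 'Sulfate': 781, 'Trihalomethanes': 162}
```

Read this way, the count row indicates that pH, Sulfate, and Trihalomethanes are the only columns with missing values.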


2.2. Feature Engineering
2.3. Machine Learning Classifiers
2.4. Hyperparameters Optimization
2.5. Error Analysis
3. Results and Discussion
3.1. Predicted and Observed Data
3.2. Model Performance Evaluation
3.3. Feature Importance
4. Conclusion
Data Availability Statement
Conflicts of Interest



| Input features | WHO limits |
|---|---|
| pH | 6.5–8.5 |
| Hardness | 200 mg/L |
| Solids | 1000 ppm |
| Chloramines | 4 ppm |
| Sulfate | 1000 mg/L |
| Conductivity | 400 μS/cm |
| Organic carbon | 10 ppm |
| Trihalomethanes | 80 ppm |
| Turbidity | 5 NTU |

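
One way to read the limits above against the summary statistics in Section 2.1 is to check which feature means fall outside them. The sketch below is illustrative only. It assumes that every limit except pH is an upper bound, that pH is an allowed range, and that the dataset's units match the units listed in the table. None of these assumptions is stated in the source, and the helper name `within_limit` is hypothetical.

```python
# Limits transcribed from the table above as (low, high); None means no lower bound.
# Assumption: all limits except pH are upper bounds.
WHO_LIMITS = {
    "pH": (6.5, 8.5),
    "Hardness": (None, 200.0),
    "Solids": (None, 1000.0),
    "Chloramines": (None, 4.0),
    "Sulfate": (None, 1000.0),
    "Conductivity": (None, 400.0),
    "Organic carbon": (None, 10.0),
    "Trihalomethanes": (None, 80.0),
    "Turbidity": (None, 5.0),
}

# Mean values transcribed from the summary-statistics table in Section 2.1.
MEANS = {
    "pH": 7.08,
    "Hardness": 196.37,
    "Solids": 22014.09,
    "Chloramines": 7.12,
    "Sulfate": 333.78,
    "Conductivity": 426.21,
    "Organic carbon": 14.28,
    "Trihalomethanes": 66.40,
    "Turbidity": 3.97,
}


def within_limit(value, low, high):
    """Return True if value lies inside [low, high]; low=None disables the lower bound."""
    if low is not None and value < low:
        return False
    return value <= high


# Features whose dataset mean lies outside the tabulated limit.
exceeding = [name for name, mean in MEANS.items()
             if not within_limit(mean, *WHO_LIMITS[name])]
print(exceeding)  # ['Solids', 'Chloramines', 'Conductivity', 'Organic carbon']
```

Under these assumptions, four of the nine feature means exceed their tabulated limits, so a threshold check alone would not reproduce the Potability label.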

| Model | F1-score | Accuracy (%) | Jaccard index |
|---|---|---|---|
| Logistic Regression (LR) | 0.21 | 61.58 | 0.11 |
| Support Vector Machine (SVM) | 0.23 | 57.12 | 0.015 |
| Stochastic Gradient Descent (SGD) | 0.43 | 52.37 | 0.31 |
| K-Nearest Neighbors (KNN) | 0.49 | 63.24 | 0.32 |
| Gaussian Process Classifiers (GPC) | 0.59 | 73.36 | 0.39 |
| Gaussian Naïve Bayes (GNB) | 0.64 | 75.08 | 3.66 |
| Decision Tree (DT) | 0.91 | 83.61 | 0.87 |
| Random Forest (RF) | 0.93 | 85.37 | 0.89 |
| Stacked Ensemble Classifier (SEC) | 0.65 | 92.98 | 3.14 |
| Extreme Gradient Boosting (XGBoost) | 0.91 | 89.47 | 2.15 |
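
For reference, the metrics named in the table can be computed from the confusion counts of a binary classifier. The following is a minimal sketch using the standard definitions for a single positive class. The source does not say how its scores were averaged or computed, so this may differ from the authors' procedure. The function name `binary_metrics` and the label vectors are hypothetical. With these definitions the Jaccard index always lies between 0 and 1 and satisfies J = F1 / (2 - F1).

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, F1-score, and Jaccard index for the positive class (label 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    # Both scores are defined as 0 when there are no positives at all.
    f1 = 2 * tp / (2 * tp + fp + fn) if (tp + fp + fn) else 0.0
    jaccard = tp / (tp + fp + fn) if (tp + fp + fn) else 0.0
    return {"accuracy": accuracy, "f1": f1, "jaccard": jaccard}


# Hypothetical labels: 1 = potable, 0 = not potable.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

metrics = binary_metrics(y_true, y_pred)
print(metrics)  # accuracy 0.75, f1 0.75, jaccard 0.6
```

Multiplying the accuracy by 100 gives a percentage on the same scale as the third column of the table.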
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).