2.1. Machine Learning Models for Water Quality Assessment
In recent years, machine learning methods have been extensively applied to the prediction and classification of water quality parameters. To assess model performance, researchers commonly rely on established metrics. For regression tasks, the primary indicators include the coefficient of determination ($R^2$), Mean Absolute Error (MAE), and Root Mean Square Error (RMSE). For classification tasks, standard metrics comprise accuracy, precision, recall, and F1-score. Each metric provides a distinct perspective on model performance and prediction reliability.
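As a minimal sketch of how the three regression indicators named above relate to one another, the following Python function computes them from paired observations and predictions. The function name and the sample WQI values are hypothetical and serve only as illustration; they are not taken from any of the reviewed studies.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (R^2, MAE, RMSE) for paired observed and predicted values."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / n
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)              # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    mae = sum(abs(r) for r in residuals) / n
    rmse = math.sqrt(ss_res / n)
    # R^2 is undefined when the observations have zero variance.
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return r2, mae, rmse


# Hypothetical WQI observations and predictions, for illustration only.
observed = [70.0, 80.0, 90.0, 100.0]
predicted = [72.0, 78.0, 91.0, 99.0]
r2, mae, rmse = regression_metrics(observed, predicted)
print(f"R2={r2:.3f}  MAE={mae:.3f}  RMSE={rmse:.3f}")
```

Note that MAE and RMSE are expressed in the units of the target variable, whereas $R^2$ is unitless, which is one reason the reviewed studies usually report them together.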
The study by [10] evaluates multiple machine learning models, including the adaptive neuro-fuzzy inference system (ANFIS), feed-forward neural networks (FFNN), and K-nearest neighbors (KNN), for predicting the water quality index (WQI) and classifying water quality (Water Quality Classification, WQC). The findings demonstrate that the ANFIS model achieved a regression coefficient of 96.17% for WQI prediction, while the FFNN attained 100% accuracy for WQC.
The comparative analysis presented in [11] examines residual chlorine prediction in drinking water using various machine learning approaches. Among the individual models evaluated, deep learning models performed strongly. The study also investigates multi-model ensembles (MMEs), revealing that optimal model combinations can explain up to 74% of the variance in observed chlorine levels, thereby offering cost-effective alternatives to complex physical models. Further research by [12] reports that Multiple Linear Regression (MLR) achieved high regression performance ($R^2$ = 0.9992, RMSE = 0.338) in predicting drinking water quality in seawater desalination plants. SHAP analysis identified total bacterial count, TDS, sodium, chloride, and residual chlorine as critical features, supporting an interpretable machine learning-based approach for sustainable plant operations.
Similarly, Khan et al. [13] implemented a Water Quality Prediction and Classification (WQPC) system utilizing an SVM classifier. The model was primarily evaluated using accuracy, calculated from the confusion matrix components: True Positives, True Negatives, False Positives, and False Negatives. The proposed system reported a prediction accuracy ranging from 92% to 95%.
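The relationship between the confusion matrix components and the standard classification metrics can be sketched as follows. The function name and the counts are hypothetical, chosen only so that the resulting accuracy falls in a range similar to the one reported; they are not data from [13].

```python
def classification_metrics(tp, tn, fp, fn):
    """Derive accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    accuracy = (tp + tn) / total
    # Guard against division by zero when a class is never predicted or absent.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts for a binary potable / non-potable classifier.
metrics = classification_metrics(tp=45, tn=48, fp=2, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because accuracy pools both classes, it can look favorable on an imbalanced dataset even when recall for the minority class is poor, which is why precision, recall, and F1 are usually reported alongside it.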
Garcia et al. [14] evaluated the performance of Decision Tree, Random Forest, and K-Nearest Neighbors (KNN) algorithms for classifying groundwater quality based on Total Dissolved Solids (TDS). The models were assessed using accuracy, precision, recall, F1-score, and cross-validation. Both Decision Tree and Random Forest achieved seemingly perfect scores (100%) across all metrics, a result that was likely due to overfitting. In contrast, KNN achieved an accuracy of 92.9%, with an average cross-validation score of 93.7% and a standard deviation of 3.74%. The authors concluded that tree-based models provided superior performance and greater stability for water quality classification tasks.
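To illustrate how an average cross-validation score and its standard deviation are obtained, the sketch below runs k-fold cross-validation with a 1-nearest-neighbor rule on a single feature. Everything in it is an assumption made for illustration: the contiguous fold split, the 1-nearest-neighbor rule, the TDS values, the 500 mg/L labelling threshold, and the use of the population standard deviation. It does not reproduce the setup of [14].

```python
import statistics


def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous folds of near-equal size."""
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def nearest_neighbor_predict(train_x, train_y, x):
    """Return the label of the training sample closest to x (1-NN, one feature)."""
    best = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[best]


def cross_validate(xs, ys, k):
    """Return the per-fold accuracy of the 1-NN rule under k-fold CV."""
    scores = []
    for fold in k_fold_indices(len(xs), k):
        held_out = set(fold)
        train_x = [x for i, x in enumerate(xs) if i not in held_out]
        train_y = [y for i, y in enumerate(ys) if i not in held_out]
        correct = sum(
            nearest_neighbor_predict(train_x, train_y, xs[i]) == ys[i]
            for i in fold
        )
        scores.append(correct / len(fold))
    return scores


# Hypothetical TDS readings (mg/L); label 1 means TDS >= 500, label 0 otherwise.
tds = [120, 900, 300, 1500, 450, 1100, 200, 1300, 480, 520]
labels = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]

scores = cross_validate(tds, labels, k=5)
mean_score = statistics.mean(scores)
std_score = statistics.pstdev(scores)  # population standard deviation (assumed)
print(scores, mean_score, std_score)
```

In this toy data the only misclassification occurs for a sample lying just above the threshold, which lowers one fold's score and produces a nonzero spread across folds. A small spread is what the "stability" claims in such studies refer to.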
Patil et al. [15] compared several machine learning algorithms, including SVM, Random Forest, KNN, and Logistic Regression, to predict water potability. Among these, the SVM classifier achieved the highest accuracy of 64%, with precision values of 0.71 for non-potable and 0.56 for potable water. The study focused on standard classification metrics rather than regression-based errors such as MSE or RMSE, highlighting the importance of class-wise performance and model robustness when handling imbalanced datasets.
Ding et al. [16] improved Water Quality Index (WQI) modeling using CatBoost, SVM, Logistic Regression (LR), XGBoost, and LightGBM. The models were assessed through standard uncertainty (SU) and sensitivity analysis using multiple linear regression (MLR), reporting both $R^2$ and RMSE values. The CatBoost-based model demonstrated the lowest uncertainty (0.559–0.903), while the SVM and LR models showed higher uncertainty levels (0.576–1.034). Additionally, the study examined different aggregation functions (SGM, WQM, SWM) and discussed how they influenced model sensitivity and overall reliability.
The study by Iyer et al. [17] explored the use of machine learning algorithms, including SVM, Random Forest, and Decision Tree, for predicting water quality. Model performance was evaluated using accuracy, precision, recall, and F1-score. The authors found that Random Forest achieved the highest accuracy of 68%, outperforming the other models. However, the study did not report additional error metrics such as MSE or RMSE.
In the work of Prasad and Ranjit [18], various machine learning algorithms, including Logistic Regression, SVM, Naive Bayes, KNN, Decision Tree, Random Forest, AdaBoost, and XGBoost, were applied for water quality classification. Accuracy was the main evaluation metric, with both Decision Tree and XGBoost achieving 99% accuracy. Other metrics, such as precision, recall, and F1-score, were also included, though regression-based errors, such as MSE and RMSE, were not reported.
Walczak and Walczak [19] conducted a comparative analysis of four machine learning algorithms (Neural Networks, Random Forest, KNN, and multivariate Linear Regression) for predicting the Water Quality Index (WQI) with a reduced set of input variables. Model performance was assessed using Mean Absolute Error (MAE), Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and the coefficient of determination ($R^2$). Both the Neural Network and Linear Regression models attained an $R^2$ value of 0.999 when trained on the complete dataset; however, RMSE values varied depending on the number of predictors included.
Karthick et al. [20] evaluated 14 machine learning algorithms for water quality classification, employing advanced preprocessing techniques such as the Yeo-Johnson transformation, principal component analysis (PCA), and the Synthetic Minority Over-sampling Technique (SMOTE). Model performance was measured using accuracy, precision, recall, F1-score, and ROC AUC. The XGBoost model achieved the highest accuracy at 96.31% without SMOTE, while LightGBM also demonstrated strong performance.
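The core idea behind SMOTE-style oversampling, interpolating between a minority-class sample and one of its nearest minority-class neighbors, can be sketched in a few lines. This is a deliberately simplified illustration and not the preprocessing pipeline of [20]: the function name, the (pH, turbidity) samples, and the parameter defaults are all hypothetical, and in practice a maintained implementation such as the SMOTE class in the imbalanced-learn package would normally be preferred.

```python
import random


def smote_like_oversample(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbors (simplified SMOTE)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)          # seeded for reproducibility
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbor = rng.choice(others[:k])
        gap = rng.random()             # interpolation factor in [0, 1)
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(base, neighbor))
        )
    return synthetic


# Hypothetical minority-class samples as (pH, turbidity in NTU) pairs.
minority = [(6.1, 8.0), (6.3, 9.5), (5.9, 7.2), (6.0, 8.8)]
synthetic = smote_like_oversample(minority, n_new=4, k=2, seed=0)
for point in synthetic:
    print(point)
```

Each synthetic point lies on the line segment between two real minority samples, so oversampling of this kind densifies the minority region without duplicating records. It should be applied to the training split only, otherwise synthetic neighbors of test samples can leak into training and inflate the reported scores.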
Shams et al. [21] assessed several machine learning models for predicting both the Water Quality Index (WQI) and Water Quality Classification (WQC). The authors applied grid search for hyperparameter optimization and evaluated regression models using Mean Absolute Error (MAE), Median Absolute Error (MedAE), Mean Squared Error (MSE), and the coefficient of determination ($R^2$). For classification tasks, accuracy, recall, precision, and F1-score were employed as evaluation metrics. The Multi-Layer Perceptron (MLP) model achieved the highest regression performance, with an $R^2$ of 99.8% and an MSE of
. For classification, the Gradient Boosting (GB) model attained an accuracy of 99.5%.
Prabu et al. [22] focused on anomaly detection in water treatment plants using a modified Quality Index (QI) and an encoder-decoder architecture. They evaluated their model using accuracy, precision, recall, and the Critical Success Index (CSI). Their proposed model achieved 89.18% accuracy, 85.54% precision, and 94.02% recall, demonstrating strong performance in real-time anomaly detection and water quality monitoring.
Similarly, the authors of [23] demonstrate that their proposed WDT-ANFIS model, which combines Wavelet De-noising Techniques with Adaptive Neuro-Fuzzy Inference Systems, significantly improved prediction accuracy for water quality parameters, achieving $R^2$ values of at least 0.9 for most parameters while outperforming other approaches such as MLP-ANN and RBF-ANN.
The authors of [24] proposed two hybrid models, CEEMDAN-XGBoost and CEEMDAN-RF, for short-term water quality prediction, combining decision tree algorithms with the CEEMDAN denoising method. Using hourly data from the Tualatin River, these models demonstrated superior accuracy and stability compared with benchmarks such as LSTM and SVM, with CEEMDAN-RF excelling at predicting temperature and dissolved oxygen, while CEEMDAN-XGBoost performed better for pH and turbidity prediction. The study by [25] addresses the common challenge of class imbalance in water-quality datasets by using an adaptive synthetic sampling (ADASYN) approach to generate synthetic instances above the threshold. Their evaluation of four machine learning models, namely k-nearest neighbors (KNN), boosting decision trees (BDT), support vector machines (SVM), and multi-layer perceptron neural networks (MLP-ANN), showed that BDT and MLP-ANN achieved the highest accuracy and sensitivity (over 90% and 75%, respectively).
For water quality classification, the Random Forest Classifier consistently achieved the highest accuracy, reaching 98.2%, with precision, recall, and F1-score all equal to 0.98, as reported in [26]. K-Nearest Neighbors (KNN) and Support Vector Machine (SVM) models also demonstrated strong performance, achieving accuracy levels of 97%–98%. These findings underscore the robustness of ensemble and kernel-based methods for multi-class environmental prediction tasks. Furthermore, [26] reports that models such as the Extra Tree Regressor (ETR) and Random Forest Regressor (RFR) achieve $R^2$ values close to 0.99, with RMSE values between 1.55 and 1.71 and MAE below 0.75. Ridge Regression, Artificial Neural Networks (ANN), and ANFIS models also maintained competitive performance, with $R^2$ values above 0.95.
The authors of [27] applied a Linear model (LTSP-Linear) for water quality prediction at the Huangyang Reservoir, focusing on dissolved oxygen (DO), pH, and turbidity. Model evaluation using MSE and MAE indicated reductions of 8.55% in MSE and 10.51% in MAE compared to deep learning models such as LSTM and Informer, as well as the statistical ARIMA model. For short-term DO prediction (96-step), the Linear model achieved an MSE of 0.05479 and an MAE of 0.15481, demonstrating high efficiency and performance despite its simple architecture.
The comprehensive review in [28] highlights the effectiveness of models such as Extreme Gradient Boosting (XGBoost) and Long Short-Term Memory (LSTM) networks for water quality prediction. The analysis demonstrates that XGBoost achieves up to 99.99% accuracy, while LSTM attains an $R^2$ of 0.9999 when predicting key parameters, including chlorophyll-a, turbidity, salinity, and dissolved oxygen.
In the context of water-quality prediction, the authors introduce a new performance metric, the Water Quality Score (WQS), to facilitate the evaluation of models proposed in the literature. The score is computed according to Equation (1):
\[
\mathrm{WQS} = \alpha \, (1 - \mathrm{NRMSE}) + (1 - \alpha) \, R^2 \tag{1}
\]
where $\mathrm{NRMSE}$ represents the normalized root mean square error of the model predictions, $R^2$ is the coefficient of determination, and $\alpha$ is a weighting factor. Setting $\alpha = 0.5$ assigns equal importance to the error and goodness-of-fit terms, while $\alpha = 1$ eliminates the contribution of $R^2$ entirely. The authors adopt $\alpha = 0.8$, emphasizing predictive accuracy over explained variance. This choice is motivated by the nature of water-quality datasets, in which measurements are often noisy and acquired at relatively long probing intervals, commonly greater than 24 hours, making variance-based metrics less informative. A WQS value above 1 is interpreted as indicating either an underfitted result or that the attributes contributing to the RMSE value have not been normalized.
An exploratory analysis of the proposed Water Quality Score (WQS) shows that its behavior is strongly governed by the normalized RMSE term, since for $\alpha = 0.8$ the predictive error term contributes 80% of the final score, while $R^2$ contributes only 20%. Under the intended use of the metric, RMSE is expected to be normalized and therefore to typically remain in the interval $[0, 1]$. In that case, $1 - \mathrm{NRMSE}$ is nonnegative, and because $R^2$ is also usually bounded by $[0, 1]$, WQS is expected to lie approximately within $[0, 1]$, with higher values indicating better overall predictive performance. A value close to 1 reflects both low prediction error and strong explanatory power, whereas moderate values indicate a trade-off between acceptable fit and non-negligible error. Negative WQS values arise when the normalized RMSE exceeds 1, making the term $1 - \mathrm{NRMSE}$ negative and causing the error contribution to outweigh the positive contribution of $R^2$. Such cases suggest either very poor predictive accuracy or that RMSE was not properly normalized. Conversely, WQS values greater than 1 should not normally occur if RMSE is correctly normalized and $R^2 \le 1$; therefore, values above 1 strongly indicate a scaling inconsistency, a calculation error, or the use of a non-normalized RMSE.
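The behavior described above can be checked numerically with a short sketch. It assumes the score takes the weighted form WQS = α(1 − NRMSE) + (1 − α)R², which is the form implied by the description of the weighting factor and of the 80%/20% split. The function names are hypothetical, and normalizing RMSE by the observed range of the target is only one common convention, assumed here because the normalization method is not specified.

```python
def range_normalized_rmse(rmse, y_min, y_max):
    """Normalize RMSE by the observed target range (assumed convention)."""
    if y_max <= y_min:
        raise ValueError("y_max must be greater than y_min")
    return rmse / (y_max - y_min)


def water_quality_score(nrmse, r2, alpha=0.8):
    """Weighted combination of (1 - NRMSE) and R^2; alpha weights the error term."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * (1.0 - nrmse) + (1.0 - alpha) * r2


# Hypothetical model: RMSE of 1.6 on a target spanning 0 to 16, R^2 of 0.95.
nrmse = range_normalized_rmse(1.6, y_min=0.0, y_max=16.0)
good = water_quality_score(nrmse, r2=0.95)         # about 0.91

# Same R^2, but the raw RMSE is passed in without normalization.
unnormalized = water_quality_score(1.6, r2=0.95)   # negative score

# alpha = 1 removes the R^2 term entirely.
error_only = water_quality_score(nrmse, r2=0.95, alpha=1.0)
print(good, unnormalized, error_only)
```

Under this assumed form, passing a non-normalized RMSE drives the score below zero, so a negative value is a useful signal to re-check the scaling of the error term before comparing models.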
The literature review data summarized in Table 1 highlight a significant shift in environmental monitoring, moving away from simple classification toward complex, time-aware forecasting. At the heart of this effort is the concept of water potability, which is essentially a measure of whether water is safe for long-term human consumption without health risks. This is not determined by a single "clean" look, but by a delicate balance of physical and chemical parameters, such as pH, mineral concentration via TDS and EC, and the presence of suspended particles measured as turbidity.
When analyzing high-performance scores reported in the literature, a degree of scientific skepticism is necessary. While a coefficient of determination ($R^2$) of 0.9999 might seem like an impressive achievement, in the messy world of real-world environmental sensors such high values are often a red flag. Near-perfect values can be superficial, frequently indicating that a model has overfitted to a very specific, noise-free dataset or that the experimental conditions were too controlled to be realistic. In a true urban environment, sensor drift, biofouling, and sudden weather changes introduce unpredictability that makes absolute perfection nearly impossible. Therefore, in our research, we prioritize models that demonstrate high reliability in the 94–96% range, as they tend to be more robust and better at generalizing to the actual complexity of city water networks. By focusing on realistic loss metrics like RMSE and curve-fit metrics like $R^2$, we also ensure that our proposed system can be easily re-evaluated by others. Section 2.2 presents a literature review of deep learning models.