Submitted:
25 May 2026
Posted:
26 May 2026
Abstract

Keywords:
1. Introduction
2. Materials and Methods
2.1. Study Site and Dataset
2.1.1. The Study Site
2.1.2. The Dataset
2.2. Proposed Hybrid HMM-LSTM-Attention Architecture
2.2.1. HMM Feature Extraction and Integration
- Hidden state sequences Z: we obtain Z = {z1, ..., zT} via Viterbi decoding, which uses dynamic programming to find the most probable state sequence, i.e., the sequence S that maximizes P(S | O, λ). Each state is one-hot encoded into a vector ht ∈ ℝ³ representing categorical state membership at time t.
- Transition probability features P: we compute P(zt | zt-1, ..., zt-k) from the learned transition matrix, capturing recent state dynamics over a window of k = 3 days. These transition features encode sequential dependencies between environmental states, providing the LSTM with explicit information about regime evolution patterns.
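The decoding and encoding steps above can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names, the one-dimensional Gaussian emissions, and all numeric parameters below are hypothetical toy values chosen only to show how Viterbi decoding yields a state path that is then one-hot encoded into three-dimensional vectors.

```python
import math

def gaussian_logpdf(x, mean, var):
    """Log-density of a 1-D Gaussian (toy emission model)."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def viterbi(obs_loglik, log_start, log_trans):
    """Most probable state path.

    obs_loglik[t][s]: log p(o_t | state s)
    log_start[s]:     log initial probability of state s
    log_trans[i][j]:  log transition probability from state i to state j
    """
    n_states = len(log_start)
    # delta[s]: best log-score of any path ending in state s at the current step
    delta = [log_start[s] + obs_loglik[0][s] for s in range(n_states)]
    backptr = []
    for t in range(1, len(obs_loglik)):
        new_delta, ptr = [], []
        for j in range(n_states):
            best_i = max(range(n_states), key=lambda i: delta[i] + log_trans[i][j])
            ptr.append(best_i)
            new_delta.append(delta[best_i] + log_trans[best_i][j] + obs_loglik[t][j])
        delta = new_delta
        backptr.append(ptr)
    # Backtrack from the best final state to recover the full path
    state = max(range(n_states), key=lambda s: delta[s])
    path = [state]
    for ptr in reversed(backptr):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path

def one_hot(state, n_states=3):
    """Encode a state index as a categorical membership vector."""
    return [1.0 if s == state else 0.0 for s in range(n_states)]

# Hypothetical toy parameters for three states (not values from the paper)
means, variances = [0.0, 5.0, 10.0], [1.0, 1.0, 1.0]
start = [1 / 3, 1 / 3, 1 / 3]
trans = [[0.80, 0.15, 0.05],
         [0.10, 0.80, 0.10],
         [0.05, 0.15, 0.80]]
observations = [0.2, -0.1, 4.8, 5.3, 9.7, 10.1]

obs_loglik = [[gaussian_logpdf(o, m, v) for m, v in zip(means, variances)]
              for o in observations]
log_start = [math.log(p) for p in start]
log_trans = [[math.log(p) for p in row] for row in trans]

path = viterbi(obs_loglik, log_start, log_trans)
encoded = [one_hot(s) for s in path]
print(path)
```

Working in log space avoids numerical underflow on long sequences, which is why the sketch sums log-probabilities instead of multiplying probabilities.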
2.2.2. Dual-Branch LSTM Design
- Branch 1 (Hierarchical Attention): This branch implements a hierarchical two-layer LSTM (128 and 64 units, respectively) integrated with an additive attention mechanism. The first layer captures multi-scale temporal patterns from the sequence x̃1:T, while the second layer refines these into compact 64-dimensional features. The attention layer then computes importance weights αti using an additive (Bahdanau-style) alignment model.
- Branch 2 (Baseline temporal encoding): Operating in parallel, Branch 2 processes the same inputs through a single 64-unit LSTM without an attention layer. This pathway acts as an “architectural insurance policy”, preventing over-reliance on attention-selected features and ensuring predictions remain grounded in the comprehensive historical context even if attention focuses on spurious correlations.
- Firstly, it supports uncertainty quantification. Because probabilistic modeling captures the inherent uncertainty in pollution events, the posterior P(st | O) derived from the Forward-Backward algorithm provides confidence metrics essential for risk-based decision making.
- Secondly, it is capable of implicit seasonal modeling. The transition matrix A encodes explicit seasonal structures, allowing the model to leverage patterns like dry-season stability and monsoon-driven degradation without requiring external meteorological data.
- Thirdly, the interpretable states align directly with QCVN 10:2023/BTNMT classes, facilitating transparent reasoning and improved communication with non-technical stakeholders.
- Finally, the attention mechanism enables the visualization of temporal weights, revealing the historical drivers behind current predictions and allowing for validation against established domain knowledge.
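For reference, an additive alignment model of the kind used in Branch 1 is commonly written in the pooled form below. This is the standard textbook formulation, not necessarily the authors' exact parameterization. The symbols are assumptions introduced here: g_t denotes the 64-dimensional output of the second LSTM layer at time t (chosen to avoid clashing with the one-hot state vector ht), W, b, and v are learned parameters, e_t is the unnormalized alignment score, α_t the attention weight, and c the resulting context vector.

```latex
e_t = v^{\top} \tanh\left(W g_t + b\right), \qquad
\alpha_t = \frac{\exp(e_t)}{\sum_{\tau=1}^{T} \exp(e_\tau)}, \qquad
c = \sum_{t=1}^{T} \alpha_t \, g_t
```

Because the weights α_t are non-negative and sum to one over the T timesteps, they can be plotted directly against the input days, which is what makes the temporal-weight visualization described above possible.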
2.3. Model Training Protocol
- Step 1 – Data processing: Raw water quality measurements are standardized with a StandardScaler fitted exclusively on the training data to prevent information leakage. Missing values, which constitute approximately 3.2% of the dataset, are imputed through forward-fill and backward-fill propagation, preserving temporal continuity.
- Step 2 – Data partitioning: The dataset is split chronologically into 70% training, 15% validation, and 15% testing sets. No shuffling is applied, which maintains the strict chronological integrity of the time-series data.
- Step 3 – Input sequence construction: Input sequences are framed using a 7-day sliding lookback window. This window size is chosen to balance the extraction of weekly anthropogenic autocorrelation patterns and monthly environmental variations while keeping the computational load tractable.
- Step 4 – HMM module implementation: The HMM component employs a Gaussian Hidden Markov Model with three latent states corresponding to discrete, regulatory water quality regimes (Good, Moderate, and Poor). It processes the full input sequences to generate state probability distributions, which are subsequently projected through dense layers with a hyperbolic tangent (tanh) activation function.
- Step 5 – Dual-branch LSTM configuration: The deep learning architecture implements a hierarchical parallel design. Branch 1 utilizes a two-layer stacked LSTM network with 128 and 64 hidden units, configured with return_sequences=True to enable downstream attention integration. A Bahdanau attention layer is positioned after the second LSTM layer to compute learned alignment weights across 30 timesteps for dynamic temporal weighting. In parallel, Branch 2 provides a direct temporal encoding pathway using a single 64-unit LSTM layer.
- Step 6 – Feature fusion and optimization: The outputs from both pathways are concatenated into a unified 128-dimensional feature vector, which passes through two sequential fully connected layers with progressive dimensionality reduction (64→32 units). Model optimization is executed using the Adam algorithm under a categorical cross-entropy loss function. Balanced class weights are explicitly integrated into the loss function to mitigate the severe impact of class imbalance.
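The data-handling parts of this protocol (Steps 1 to 3 and the balanced class weights of Step 6) can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions, not the authors' pipeline: all function names are hypothetical, the standardizer is a hand-written stand-in for a scaler fitted on training data only, each window is assumed to be paired with the label of the following day, and the class weights use the common "balanced" heuristic n_samples / (n_classes × class_count). The toy series and labels are invented.

```python
import math
from collections import Counter

def fill_missing(series):
    """Forward-fill, then backward-fill, None gaps in a list of floats."""
    filled = list(series)
    for i in range(1, len(filled)):            # forward fill
        if filled[i] is None:
            filled[i] = filled[i - 1]
    for i in range(len(filled) - 2, -1, -1):   # backward fill for leading gaps
        if filled[i] is None:
            filled[i] = filled[i + 1]
    return filled

def chronological_split(rows, train_pct=70, val_pct=15):
    """Split without shuffling; the remainder becomes the test set."""
    n_train = len(rows) * train_pct // 100
    n_val = len(rows) * val_pct // 100
    return rows[:n_train], rows[n_train:n_train + n_val], rows[n_train + n_val:]

def fit_standardizer(train_values):
    """Mean and population standard deviation from training data only."""
    mean = sum(train_values) / len(train_values)
    var = sum((v - mean) ** 2 for v in train_values) / len(train_values)
    std = math.sqrt(var) or 1.0                # guard against constant columns
    return mean, std

def make_windows(features, labels, lookback=7):
    """Pair each `lookback`-step window with the label of the next step."""
    X, y = [], []
    for end in range(lookback, len(features)):
        X.append(features[end - lookback:end])
        y.append(labels[end])
    return X, y

def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * c) for cls, c in counts.items()}

# Toy single-parameter series with gaps and invented labels
raw = [float(i % 10) if i % 17 else None for i in range(100)]
labels = ["Poor" if i % 10 == 9 else "Moderate" if i % 10 >= 6 else "Good"
          for i in range(100)]

values = fill_missing(raw)
train, val, test = chronological_split(list(zip(values, labels)))
mean, std = fit_standardizer([v for v, _ in train])

def scale(rows):
    """Apply the training-set statistics to any split."""
    return [[(v - mean) / std] for v, _ in rows]

X_train, y_train = make_windows(scale(train), [lab for _, lab in train])
class_weights = balanced_class_weights(y_train)
print(len(X_train), class_weights)
```

Building windows separately inside each split keeps every training window strictly earlier than any validation or test window, at the cost of losing the first `lookback` samples of each split.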
2.4. Baseline Model Comparison
- First, the standalone Hidden Markov Model (HMM) serves as a probabilistic baseline, utilizing three hidden states and Gaussian emissions configured with diagonal covariance. Trained via Baum-Welch, this model operates directly on raw water quality time series without the hierarchical feature extraction capabilities inherent in deep learning architectures.
- Second, the standalone LSTM without attention employs an identical two-layer sequential LSTM configuration (128 and 64 units) and matches the dropout regularization of the proposed model. However, it excludes both the HMM-derived state features and the attention layer, serving as a representative benchmark for the standard deep learning frameworks typically deployed in the water quality prediction literature.
- Third, the ARIMA model represents the classical statistical time-series forecasting approach. An Auto-ARIMA algorithm automatically selects hyperparameters through grid searches guided by the Akaike Information Criterion (AIC). Separate univariate ARIMA models are optimized for each parameter, and their independent forecasts are consolidated through a rule-based classifier that maps the continuous values directly to the corresponding QCVN 10:2023/BTNMT regulatory categories.
- Overall Accuracy: this metric quantifies the total proportion of correctly classified water quality samples across the entire test set;
- Class-specific Precision: this measures the reliability of positive predictions, calculated as (TP/(TP+FP));
- Class-specific Recall: this quantifies the proportion of actual class members that are correctly identified by the model, calculated as (TP/(TP+FN)), which is particularly critical for capturing rare pollution events;
- F1-Score: this metric provides the harmonic mean that balances precision and recall, evaluated using both macro-averaged and weighted approaches;
- Confusion Matrix: this enables a granular, diagnostic inspection of specific misclassification patterns and shifts across classes.
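The metrics listed above can be computed from a confusion matrix as in the following sketch. The function names and the toy labels are hypothetical and are not results from the paper. The example is deliberately imbalanced to show how overall accuracy can stay relatively high while the macro-averaged F1 drops when a rare class is never detected.

```python
def confusion_matrix(y_true, y_pred, classes):
    """Rows are actual classes, columns are predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix

def per_class_scores(matrix):
    """Precision, recall, F1, and support for every class."""
    n = len(matrix)
    scores = []
    for k in range(n):
        tp = matrix[k][k]
        fp = sum(matrix[r][k] for r in range(n)) - tp   # predicted k, actually other
        fn = sum(matrix[k]) - tp                        # actually k, predicted other
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores.append({"precision": precision, "recall": recall,
                       "f1": f1, "support": tp + fn})
    return scores

def summarize(matrix):
    """Overall accuracy plus macro-averaged and support-weighted F1."""
    scores = per_class_scores(matrix)
    total = sum(s["support"] for s in scores)
    accuracy = sum(matrix[k][k] for k in range(len(matrix))) / total
    macro_f1 = sum(s["f1"] for s in scores) / len(scores)
    weighted_f1 = sum(s["f1"] * s["support"] for s in scores) / total
    return accuracy, macro_f1, weighted_f1

# Invented toy labels: six Good, three Moderate, one Poor sample
classes = ["Good", "Moderate", "Poor"]
y_true = ["Good"] * 6 + ["Moderate"] * 3 + ["Poor"]
y_pred = ["Good"] * 5 + ["Moderate"] + ["Moderate", "Moderate", "Good"] + ["Moderate"]

cm = confusion_matrix(y_true, y_pred, classes)
accuracy, macro_f1, weighted_f1 = summarize(cm)
print(cm)
print(round(accuracy, 3), round(macro_f1, 3), round(weighted_f1, 3))
```

In this toy case the single "Poor" sample is missed, so its F1 is zero. The macro average weights that failure equally with the other two classes, whereas accuracy and weighted F1 are dominated by the majority class.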
3. Results
3.1. Training Dynamics and Convergence Behavior
3.2. Quantitative Model Performance Evaluation and Comparison
3.2.1. Model Performance Summary
3.2.2. Benchmarking Against Classical Statistics
- Modeling discrete regulatory regime transitions via HMM state sequences;
- Processing complex, nonlinear parameter interdependencies via deep recurrent layers;
- Dynamically weighting temporal feature importance via attention tracking.
3.2.3. Analysis of the Accuracy-F1 Diagnostic Gap
4. Discussion
4.1. Architectural Synergy and Component Contributions
- The HMM component successfully captures discrete regime-switching behavior, modeling abrupt environmental changes such as sudden storm events or urban sewage overflows. By encoding explicit seasonal dynamics directly within its transition probability matrices, the model significantly improves tracking performance during unstable transitional periods;
- The dual-branch LSTM architecture enables hierarchical temporal learning across various timescales, successfully capturing dynamics ranging from short-term daily tidal fluctuations (1–7 days) to long-term quarterly monsoon transitions (30–90 days);
- The attention mechanism provides selective temporal weighting across historical observations. The learned attention patterns closely align with established domain knowledge: during stable conditions, recent observations (3–5 days) receive the highest weights, whereas pollution events trigger sharp attention peaks 1–3 days prior, successfully identifying the physical time-lag associated with terrestrial runoff impacts.
4.2. Architectural Synergy and Component Contributions
4.3. Comparative Context with Extant Literature
4.4. Practical Implications for Water Supply Infrastructure
- Operational early warning: The attention mechanism's ability to identify a one-to-three-day lag between terrestrial runoff stressors and pollution peaks provides local authorities with a critical operational lead time;
- Automated regulatory mapping: By mapping multi-parameter sensor arrays directly to national standards, managers can instantly identify critical zones requiring urgent remediation without manual data processing;
- Resource optimization: The 67% precision for "Poor" alerts limits the waste of investigative municipal resources and helps maintain system credibility;
- Infrastructure protection: Early detection of elevated Total Suspended Solids (TSS) or highly acidic pH values enables operators to proactively manage water intake valves, preventing severe physical damage to downstream water treatment facilities.
5. Conclusions
- Deploy the model's predictions strictly as screening tools, maintaining human oversight for all critical management actions;
- Calibrate operational alarm thresholds to favor false alarms over missed pollution events, mitigating the impact of the 14% minority-class recall;
- Implement multi-model ensemble architectures to combine complementary prediction strengths;
- Conduct a prospective validation phase over a 1–2 year period under real-world conditions;
- Develop specialized technical training programs for environmental stakeholders and institutional managers.
- Mitigating the impacts of severe class imbalance through advanced data augmentation or focal loss formulations;
- Incorporating auxiliary continuous covariates such as local meteorological records, tidal height charts, and upstream discharge rates;
- Quantifying model prediction uncertainty to support robust risk-based decisions;
- Executing long-term prospective deployment validation to verify model stability.
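As a concrete illustration of the focal loss direction mentioned above, the sketch below implements the commonly used form FL = -α · (1 - p_t)^γ · log(p_t) for a single sample, where p_t is the predicted probability of the true class. The function name, the default γ = 2.0, and the example probabilities are assumptions made here for illustration. The paper does not specify a formulation.

```python
import math

def focal_loss(probs, true_index, gamma=2.0, alpha=None):
    """Focal loss for one sample.

    probs:      predicted class probabilities (should sum to 1)
    true_index: index of the true class
    gamma:      focusing parameter; gamma = 0 recovers cross-entropy
    alpha:      optional per-class weights (e.g. larger for rare classes)
    """
    p_t = max(probs[true_index], 1e-12)   # clamp to avoid log(0)
    weight = alpha[true_index] if alpha is not None else 1.0
    return -weight * (1.0 - p_t) ** gamma * math.log(p_t)

# Invented probabilities for classes (Good, Moderate, Poor)
easy = focal_loss([0.90, 0.07, 0.03], true_index=0)   # confident and correct
hard = focal_loss([0.70, 0.20, 0.10], true_index=2)   # rare class mostly missed
print(easy, hard)
```

The factor (1 - p_t)^γ shrinks the contribution of well-classified samples, so training gradients concentrate on hard cases such as rare pollution events. Setting `gamma=0` and `alpha=None` reduces the function to ordinary cross-entropy.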
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
| HMM | Hidden Markov Model |
| LSTM | Long Short-Term Memory |
| QCVN | Quy chuẩn Việt Nam (Vietnam National Technical Regulation) |
| BTNMT | Bộ Tài nguyên và Môi Trường (Ministry of Natural Resources and Environment; from July 1st 2025 Ministry of Agriculture and Environment) |
| TSS | Total Suspended Solids |
| pH | Potential of Hydrogen |
| TPH | Total Petroleum Hydrocarbons |
| DO | Dissolved Oxygen |
| ARIMA | Autoregressive Integrated Moving Average |
| US EPA | United States Environmental Protection Agency |
| WHO | World Health Organization |
| FAO | Food and Agriculture Organization of the United Nations |
| CNN | Convolutional Neural Network |
| APHA | American Public Health Association |
| MPN | Most Probable Number |
| AWWA | American Water Works Association |
| WEF | Water Environment Federation |
| AIC | Akaike Information Criterion |
| TP | True Positive |
| FP | False Positive |
| FN | False Negative |






| Class | Label | Description |
| A2 | Good | Full compliance with conservation standards |
| B1 | Moderate | Mild environment stress |
| B2 | Poor | Severe degradation requiring urgent remediation |
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).