Submitted: 30 April 2025
Posted: 30 April 2025
Abstract
Keywords:
1. Introduction
2. Related Work
3. Materials and Methods
3.1. Study Area and Dataset Analysis
3.2. Data Preprocessing
3.3. Feature Importance Analysis
3.4. Method Selection and Evaluation
- Input Layer: Receives the preprocessed input features and passes them to the subsequent layers.
- First LSTM Layer: Extracts long-term dependencies from the time series, capturing its dynamic characteristics.
- Second LSTM Layer: Learns higher-level temporal dependencies from the output sequence of the first layer, increasing the model's expressive capacity.
- Dropout: Regularizes the network by randomly deactivating a specified fraction of neurons (the dropout rate) during training, reducing the risk of overfitting.
- Output Layer: A fully connected layer that maps the output of the preceding layer to the target variable, producing the final prediction.
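The layer stack above can be illustrated with a minimal NumPy forward pass. This is a sketch under stated assumptions, not the authors' implementation: the weights are random, the gates follow the common input/forget/candidate/output layout, the second LSTM layer is assumed to pass only its final hidden state on to the dropout and output layers, and the input shape (7 features over a 24-step window) is hypothetical. The layer sizes and dropout rate are taken from the optimal LSTM configuration in the optimal-parameter table (48 and 112 units, dropout rate 0.1).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_lstm(rng, n_in, n_units):
    """Random weights for one LSTM layer; gates stacked as [input, forget, candidate, output]."""
    return {
        "W": rng.normal(0.0, 0.1, (n_in, 4 * n_units)),     # input-to-gate weights
        "U": rng.normal(0.0, 0.1, (n_units, 4 * n_units)),  # recurrent weights
        "b": np.zeros(4 * n_units),
    }

def lstm_layer(x_seq, p, n_units, activation=np.tanh):
    """Run one LSTM layer over x_seq of shape (T, n_in); return hidden states of shape (T, n_units)."""
    h = np.zeros(n_units)
    c = np.zeros(n_units)
    outputs = []
    for x_t in x_seq:
        z = x_t @ p["W"] + h @ p["U"] + p["b"]
        i = sigmoid(z[:n_units])                    # input gate
        f = sigmoid(z[n_units:2 * n_units])         # forget gate
        g = activation(z[2 * n_units:3 * n_units])  # candidate cell state
        o = sigmoid(z[3 * n_units:])                # output gate
        c = f * c + i * g                           # update long-term memory
        h = o * activation(c)                       # expose gated memory as the hidden state
        outputs.append(h)
    return np.stack(outputs)

def dropout(x, rate, rng, training):
    """Inverted dropout: zero a fraction `rate` of units during training only."""
    if not training or rate == 0.0:
        return x
    mask = rng.random(x.shape) >= rate
    return x * mask / (1.0 - rate)

def forward(x_seq, params, units1, units2, rate, rng, training=False):
    h1 = lstm_layer(x_seq, params["lstm1"], units1)   # first LSTM layer: full sequence
    h2 = lstm_layer(h1, params["lstm2"], units2)[-1]  # second LSTM layer: last hidden state only
    h2 = dropout(h2, rate, rng, training)
    return h2 @ params["dense"]["W"] + params["dense"]["b"]  # fully connected output layer

rng = np.random.default_rng(0)
n_features, window = 7, 24  # hypothetical input shape
units1, units2, rate = 48, 112, 0.1
params = {
    "lstm1": init_lstm(rng, n_features, units1),
    "lstm2": init_lstm(rng, units1, units2),
    "dense": {"W": rng.normal(0.0, 0.1, (units2, 1)), "b": np.zeros(1)},
}
x = rng.normal(size=(window, n_features))
y_hat = forward(x, params, units1, units2, rate, rng)
print(y_hat.shape)  # (1,)
```

In a framework such as Keras, the same stack would typically be two `LSTM` layers (the first with `return_sequences=True`), a `Dropout` layer, and a `Dense(1)` output. Dropout is active only during training, which is why `forward` skips it by default.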
4. Results and Discussion
4.1. Feature Importance Analysis Results
4.2. Model Training Results Analysis
4.3. SVM Model Performance
4.4. ANN Model Performance
4.5. LSTM Model Performance
5. Conclusion
- Theoretical Contribution: This work deepens the understanding of the temporal and spatial characteristics of water quality parameters and validates the applicability of deep learning techniques to DO prediction.
- Methodological Contribution: A systematic data preprocessing scheme and a Random Forest-based feature selection strategy were developed, along with a comprehensive multi-model comparison framework.
- Practical Contribution: The proposed models and methods can be applied in real-world water quality monitoring systems and provide technical support for water environment management and early warning systems.
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Ministry of Water Resources of the People’s Republic of China. (2024, June 14). 2023 China Water Resources Bulletin released. Available online: https://www.chinawater.com.cn/yw/202406/t20240614_1052624.html (accessed on 23 March 2025).
- Tung, T. M.; Yaseen, Z. M. (2020). A survey on river water quality modelling using artificial intelligence models: 2000–2020. Journal of Hydrology, 585, 124670.
- Siami-Namini, S.; Tavakoli, N.; Namin, A. S. (2018, December). A comparison of ARIMA and LSTM in forecasting time series. In 2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA) (pp. 1394–1401). IEEE.
- Cox, B. A. (2003). A review of dissolved oxygen modelling techniques for lowland rivers. Science of the Total Environment, 314, 303–334.
- Liu, Y.; Zhang, Q.; Song, L.; Chen, Y. (2019). Attention-based recurrent neural networks for accurate short-term and long-term dissolved oxygen prediction. Computers and Electronics in Agriculture, 165, 104964.
- Pyo, J.; Pachepsky, Y.; Kim, S.; Abbas, A.; Kim, M.; Kwon, Y. S.; ... Cho, K. H. (2023). Long short-term memory models of water quality in inland water environments. Water Research X, 21, 100207.
- Khabusi, S. P.; Huang, Y. P. (2022, August). A deep learning approach to predict dissolved oxygen in aquaculture. In 2022 International Conference on Advanced Robotics and Intelligent Systems (ARIS) (pp. 1–6). IEEE.
- Pan, D.; Zhang, Y.; Deng, Y.; Van Griensven Thé, J.; Yang, S. X.; Gharabaghi, B. (2024). Dissolved oxygen forecasting for Lake Erie’s central basin using hybrid long short-term memory and gated recurrent unit networks. Water, 16(5), 707.
- Ji, X.; Shang, X.; Dahlgren, R. A.; Zhang, M. (2017). Prediction of dissolved oxygen concentration in hypoxic river systems using support vector machine: a case study of Wen-Rui Tang River, China. Environmental Science and Pollution Research, 24, 16062–16076.
- Nong, X.; Lai, C.; Chen, L.; Shao, D.; Zhang, C.; Liang, J. (2023). Prediction modelling framework comparative analysis of dissolved oxygen concentration variations using support vector regression coupled with multiple feature engineering and optimization methods: A case study in China. Ecological Indicators, 146, 109845.
- Wu, X.; Zhang, Q.; Wen, F.; Qi, Y. (2022). A water quality prediction model based on multi-task deep learning: a case study of the Yellow River, China. Water, 14(21), 3408.
- Ziyad Sami, B. F.; Latif, S. D.; Ahmed, A. N.; Chow, M. F.; Murti, M. A.; Suhendi, A.; ... El-Shafie, A. (2022). Machine learning algorithm as a sustainable tool for dissolved oxygen prediction: a case study of Feitsui Reservoir, Taiwan. Scientific Reports, 12(1), 3649.
- Ruan, J.; Cui, Y.; Song, Y.; Mao, Y. (2023). A novel RF-CEEMD-LSTM model for predicting water pollution. Scientific Reports, 13(1), 20901.
- Heddam, S.; Kisi, O. (2017). Extreme learning machines: a new approach for modeling dissolved oxygen (DO) concentration with and without water quality variables as predictors. Environmental Science and Pollution Research, 24(20), 16702–16724.
- Li, W.; Wu, H.; Zhu, N.; Jiang, Y.; Tan, J.; Guo, Y. (2021). Prediction of dissolved oxygen in a fishery pond based on gated recurrent unit (GRU). Information Processing in Agriculture, 8(1), 185–193.
- Moghadam, S. V.; Sharafati, A.; Feizi, H.; Marjaie, S. M. S.; Asadollah, S. B. H. S.; Motta, D. (2021). An efficient strategy for predicting river dissolved oxygen concentration: application of deep recurrent neural network model. Environmental Monitoring and Assessment, 193, 1–18.
- Liu, P.; Wang, J.; Sangaiah, A. K.; Xie, Y.; Yin, X. (2019). Analysis and prediction of water quality using LSTM deep neural networks in IoT environment. Sustainability, 11(7), 2058.
- Kulanuwat, L.; Chantrapornchai, C.; Maleewong, M.; Wongchaisuwat, P.; Wimala, S.; Sarinnapakorn, K.; Boonya-Aroonnet, S. (2021). Anomaly detection using a sliding window technique and data imputation with machine learning for hydrological time series. Water, 13(13), 1862.
- Chen, Y.; Song, L.; Liu, Y.; Yang, L.; Li, D. (2020). A review of the artificial neural network models for water quality prediction. Applied Sciences, 10(17), 5776.
- Eze, E.; Ajmal, T. (2020). Dissolved oxygen forecasting in aquaculture: A hybrid model approach. Applied Sciences, 10(20), 7079.
- Elkiran, G.; Nourani, V.; Abba, S. I.; Abdullahi, J. (2018). Artificial intelligence-based approaches for multi-station modelling of dissolve oxygen in river. Global Journal of Environmental Science and Management, 4(4), 439–450.
- Zhang, P.; Liu, X.; Dai, H.; Shi, C.; Xie, R.; Song, G.; Tang, L. (2024). A multi-model ensemble approach for reservoir dissolved oxygen forecasting based on feature screening and machine learning. Ecological Indicators, 166, 112413.
- Liu, W.; Lin, S.; Li, X.; Li, W.; Deng, H.; Fang, H.; Li, W. (2024). Analysis of dissolved oxygen influencing factors and concentration prediction using input variable selection technique: A hybrid machine learning approach. Journal of Environmental Management, 357, 120777.
- Huan, J.; Chen, B.; Xu, X. G.; Li, H.; Li, M. B.; Zhang, H. (2021). River dissolved oxygen prediction based on random forest and LSTM. Applied Engineering in Agriculture, 37(5), 901–910.
- Tan, W.; Zhang, J.; Liu, X.; Yu, Z.; Xiao, K.; Wang, L.; ... Guo, P. (2022, September). Dissolved oxygen prediction based on PCA-LSTM. In Journal of Physics: Conference Series (Vol. 2337, No. 1, p. 012012). IOP Publishing.
- Taşan, S. (2023). Estimation of groundwater quality using an integration of water quality index, artificial intelligence methods and GIS: Case study, Central Mediterranean Region of Turkey. Applied Water Science, 13(1), 15.
- Singh, P.; Kaur, P. D. (2017). Review on data mining techniques for prediction of water quality. International Journal of Advanced Research in Computer Science, 8(5).
- Huang, M.; Hu, B. Q.; Jiang, H.; Fang, B. W. (2023). A water quality prediction method based on k-nearest-neighbor probability rough sets and PSO-LSTM. Applied Intelligence, 53(24), 31106–31128.
- Yang, J. (2023). Predicting water quality through daily concentration of dissolved oxygen using improved artificial intelligence. Scientific Reports, 13(1), 20370.
- Baidu Baijiahao. (2023). Sichuan Dujiangyan: Water conservancy for thousands of years, nourishing Sichuan. Available online: https://baijiahao.baidu.com/s?id=1823006260704459232&wfr=spider&for=pc (accessed on 23 March 2025).
- Baidu Baike. (2024). Dujiangyan. Available online: https://baike.baidu.com/item/%E9%83%BD%E6%B1%9F%E5%A0%B0/122963 (accessed on 23 March 2025).









| Parameter | Description | Unit |
|---|---|---|
| Temperature | Temperature of water. | °C |
| Dissolved Oxygen | Mass of oxygen dissolved per unit volume of water. | mg/L |
| Turbidity | Cloudiness of water caused by suspended particles. | NTU |
| Ammonia Nitrogen | Nitrogen present as free ammonia (NH₃) and ammonium ions (NH₄⁺). | mg/L |
| Total Phosphorus | Sum of all forms of phosphorus in water. | mg/L |
| pH | Acidity or alkalinity of water. | Dimensionless |
| Conductivity | Ability of water to conduct electricity, mainly due to dissolved salts. | |
| Permanganate Index | Oxygen consumed when organic matter in water is oxidized by permanganate. | mg/L |
| Total Nitrogen | Sum of all forms of nitrogen in water. | mg/L |

| Parameter | Description | Range |
|---|---|---|
| C | Penalty parameter that balances model complexity against training error. | [0.1, 100] |
| Gamma | Kernel coefficient that determines the range of influence of each support vector. | scale or auto |
| Kernel | Kernel function type, which implicitly maps the input data into a (possibly higher-dimensional) feature space; an appropriate choice can improve model performance. | rbf or linear |
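
To make the Gamma and Kernel options concrete, the sketch below computes the RBF kernel `K(x, x') = exp(-gamma * ||x - x'||^2)` and resolves the string settings of gamma the way scikit-learn documents them: `"auto"` uses `1 / n_features` and `"scale"` uses `1 / (n_features * X.var())`. That the study used scikit-learn is an assumption suggested by these option names, and the data are random placeholders.

```python
import numpy as np

def resolve_gamma(gamma, X):
    """Translate a gamma setting into a number, following scikit-learn's documented rules."""
    n_features = X.shape[1]
    if gamma == "scale":
        return 1.0 / (n_features * X.var())  # variance over all entries of X
    if gamma == "auto":
        return 1.0 / n_features
    return float(gamma)

def rbf_kernel(A, B, gamma):
    """K[i, j] = exp(-gamma * ||A[i] - B[j]||^2)."""
    sq_dists = (
        np.sum(A ** 2, axis=1)[:, None]
        + np.sum(B ** 2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Clip tiny negative values caused by floating-point cancellation.
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))

def linear_kernel(A, B):
    """K[i, j] = <A[i], B[j]>: no mapping beyond the original inputs."""
    return A @ B.T

rng = np.random.default_rng(42)
X = rng.normal(size=(5, 8))  # hypothetical: 5 samples, 8 standardized features

for setting in ("scale", "auto"):
    gamma = resolve_gamma(setting, X)
    K = rbf_kernel(X, X, gamma)
    print(setting, round(gamma, 4), K.shape)
print("linear", linear_kernel(X, X).shape)
```

A larger gamma makes the kernel decay faster with distance, so each support vector influences a smaller neighborhood; C then sets how heavily training errors are penalized relative to model smoothness.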

| Parameter | Description | Range |
|---|---|---|
| Layer sizes | Number of neurons in the hidden layer | [32, 256] |
| Activation function | Activation function type | ReLU, tanh |
| Learning rate | Initial learning rate of the optimizer | [0.001, 0.01] |
| Alpha | L2 regularization parameter that controls model complexity | [0.0001, 0.001] |
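
The ANN search space describes a feed-forward network whose complexity is governed by the hidden-layer width and the penalty alpha. The following is a minimal sketch, assuming a single hidden layer (consistent with the scalar `layer_sizes = 256` in the optimal-parameter table) and a loss of half the mean squared error plus an L2 term `0.5 * alpha * sum(||W||^2) / n`, similar to the form scikit-learn's `MLPRegressor` uses. The data and weights are random placeholders.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def mlp_forward(X, W1, b1, W2, b2, activation=np.tanh):
    """Single hidden layer followed by one linear output unit."""
    hidden = activation(X @ W1 + b1)
    return (hidden @ W2 + b2).ravel()

def penalized_loss(y, y_hat, weights, alpha):
    """Half mean squared error plus an L2 penalty on the weight matrices (biases excluded)."""
    n = y.shape[0]
    data_term = np.mean((y - y_hat) ** 2) / 2.0
    l2_term = 0.5 * alpha * sum(np.sum(W ** 2) for W in weights) / n
    return data_term + l2_term

rng = np.random.default_rng(1)
n, d, h = 100, 8, 256  # hypothetical sample and feature counts; h = 256 from the table
X = rng.normal(size=(n, d))
y = rng.normal(size=n)
W1, b1 = rng.normal(0.0, 0.1, (d, h)), np.zeros(h)
W2, b2 = rng.normal(0.0, 0.1, (h, 1)), np.zeros(1)

y_hat = mlp_forward(X, W1, b1, W2, b2)
print(penalized_loss(y, y_hat, [W1, W2], alpha=0.00101))
```

Raising alpha increases the cost of large weights, which pushes the optimizer toward smoother mappings; this is the sense in which the table describes alpha as controlling model complexity.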

| Parameter | Description | Range |
|---|---|---|
| LSTM layer units | Number of units in each LSTM layer | [32, 128] |
| Dropout rate | Dropout rate for regularization | [0.1, 0.5] |
| Activation function | Activation function for LSTM layers | ReLU, tanh |
| Learning rate | Learning rate for model optimization | [0.001, 0.01] |
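
The source does not state which search strategy produced the optimal values, so the following shows only one possibility: a plain random search over the LSTM ranges in the table. The step of 16 for the unit counts, the log-uniform sampling of the learning rate, and `toy_objective` (a hypothetical placeholder for training the network and returning its validation error) are all assumptions made for illustration.

```python
import math
import random

# Hypothetical encoding of the LSTM search space in the table above.
SEARCH_SPACE = {
    "units1": (32, 128, 16),          # (low, high, step); the step of 16 is an assumption
    "units2": (32, 128, 16),
    "dropout_rate": (0.1, 0.5),
    "activation": ("relu", "tanh"),
    "learning_rate": (0.001, 0.01),   # sampled log-uniformly (assumption)
}

def sample_config(rng):
    """Draw one candidate configuration from SEARCH_SPACE."""
    config = {}
    for name in ("units1", "units2"):
        lo, hi, step = SEARCH_SPACE[name]
        config[name] = rng.randrange(lo, hi + 1, step)
    config["dropout_rate"] = rng.uniform(*SEARCH_SPACE["dropout_rate"])
    config["activation"] = rng.choice(SEARCH_SPACE["activation"])
    lo, hi = SEARCH_SPACE["learning_rate"]
    config["learning_rate"] = math.exp(rng.uniform(math.log(lo), math.log(hi)))
    return config

def random_search(objective, n_trials, seed=0):
    """Return the configuration with the lowest objective value found in n_trials draws."""
    rng = random.Random(seed)
    best_config, best_loss = None, math.inf
    for _ in range(n_trials):
        config = sample_config(rng)
        loss = objective(config)
        if loss < best_loss:
            best_config, best_loss = config, loss
    return best_config, best_loss

def toy_objective(config):
    """Hypothetical stand-in for 'train the LSTM, return validation MSE'; purely illustrative."""
    return (config["dropout_rate"]
            + abs(math.log10(config["learning_rate"]) + 2.5)
            + config["units1"] / 1000)

best, loss = random_search(toy_objective, n_trials=50)
print(best, round(loss, 4))
```

Replacing `toy_objective` with a function that trains the stacked LSTM on the training split and returns its validation MSE turns this into a working tuner. Bayesian optimizers follow the same sample, evaluate, keep-best loop but choose each new candidate from the results observed so far.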

| Model | Optimal parameters |
|---|---|
| SVM | C = 2.2645; gamma = auto; kernel = rbf |
| ANN | layer_sizes = 256; activation = tanh; learning_rate_init = 0.00177; alpha = 0.00101 |
| LSTM | Units1 = 48; Units2 = 112; Dropout rate = 0.1; Activation = tanh; Learning rate = 0.0011117 |

| Model | MAE | MSE | RMSE | R² |
|---|---|---|---|---|
| SVM | 0.0371 | 0.0029 | 0.0534 | 0.9888 |
| ANN | 0.0719 | 0.0116 | 0.1079 | 0.9541 |
| LSTM | 0.0405 | 0.0028 | 0.0529 | 0.9890 |
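
The four metrics in the table follow their standard definitions: MAE is the mean absolute error, MSE the mean squared error, RMSE its square root, and R² = 1 − SS_res / SS_tot, where SS_res is the residual sum of squares and SS_tot the total sum of squares about the mean of the observations. A dependency-free sketch, using hypothetical DO values:

```python
import math

def regression_metrics(y_true, y_pred):
    """Compute MAE, MSE, RMSE and R² for paired observations and predictions."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)      # residual sum of squares
    mse = ss_res / n
    rmse = math.sqrt(mse)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2}

y_true = [8.1, 8.4, 7.9, 8.6, 8.2]  # hypothetical DO observations, mg/L
y_pred = [8.0, 8.5, 7.9, 8.4, 8.3]  # hypothetical model predictions
print(regression_metrics(y_true, y_pred))
```

Because RMSE = √MSE, the table can be cross-checked: for the LSTM, √0.0028 ≈ 0.0529, matching the reported RMSE.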

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).