Preprint Article Version 1 Preserved in Portico This version is not peer-reviewed

Prediction of the Concentration of Dissolved Oxygen in Running Water by Employing A Random Forest Machine Learning Technique

Version 1 : Received: 18 April 2020 / Approved: 19 April 2020 / Online: 19 April 2020 (12:32:56 CEST)

How to cite: Ahmed, M.H. Prediction of the Concentration of Dissolved Oxygen in Running Water by Employing A Random Forest Machine Learning Technique. Preprints 2020, 2020040342 (doi: 10.20944/preprints202004.0342.v1). Ahmed, M.H. Prediction of the Concentration of Dissolved Oxygen in Running Water by Employing A Random Forest Machine Learning Technique. Preprints 2020, 2020040342 (doi: 10.20944/preprints202004.0342.v1).

Abstract

Dissolved oxygen (DO) is a key indicator in the study of the ecological health of rivers. Modeling DO is a major challenge due to complex interactions among various process components of it. Considering the vital importance of it in water bodies, the accurate prediction of DO is a critical issue in ecosystem management. Given the intricacy of the current process-based water quality models, a data-driven model could be an effective alternative tool. In this study, a random forest machine learning technique is employed to predict the DO level by identifying its major drivers. Time-series of half-hourly water quality data, spanning from 2007 to 2019, for the South Branch Potomac River near Springfield, WV, are obtained from the United States Geological Survey database. Key drivers are identified, and models are formulated for different scenarios of input variables. The model is calibrated for each input scenario using 80% of the data. Water temperature and pH are found to be the most influential predictors of DO. However, satisfactory model performance is achieved by considering water temperature, pH, and specific conductance as input variables. The model validation is made by predicting DO concentrations for the remaining 20% of the data. The comparison with the traditional multiple linear regression method shows that the random forest model performs significantly better. The study insights are, therefore, expected to be useful to estimate stream/river DO levels at various sites with a minimum number of predictors and help build a sturdy framework for ecosystem health management across an environmental gradient.

Subject Areas

dissolved oxygen; rivers; random forest; multiple linear regression; water quality; ecosystem health

Comments (0)

We encourage comments and feedback from a broad range of readers. See criteria for comments and our diversity statement.

Leave a public comment
Send a private comment to the author(s)
Views 0
Downloads 0
Comments 0
Metrics 0


×
Alerts
Notify me about updates to this article or when a peer-reviewed version is published.
We use cookies on our website to ensure you get the best experience.
Read more about our cookies here.