Preprint Article Version 1 Preserved in Portico This version is not peer-reviewed

Water Quality Class Modeling Using Machine Learning Algorithms at Roodeplaat Dam, South Africa

Version 1 : Received: 14 June 2023 / Approved: 14 June 2023 / Online: 14 June 2023 (08:40:50 CEST)

How to cite: Tadesse, K. B.; Dinka, M. O. Water Quality Class Modeling Using Machine Learning Algorithms at Roodeplaat Dam, South Africa. Preprints 2023, 2023061016. https://doi.org/10.20944/preprints202306.1016.v1

Abstract

Water pollution is a common problem for dams situated within urban or agricultural catchments. It can negatively affect the hydro-ecosystem and the use of water for drinking, recreation, and other purposes. In this study, the drinking water quality class of the Roodeplaat Dam, South Africa, which faces pollution problems, was modeled using machine learning algorithms in Python (Jupyter Notebook 6.0.0). Eleven water quality parameters recorded monthly at five sampling stations from January 1981 to September 2017 were used for training and testing the models. Five machine learning classifiers, Gaussian Naïve Bayes (GNB), K-nearest neighbors (KNN), Decision Tree (DT), Support Vector Machines (SVM), and Linear Regression (LR), were applied at test sizes of 20%, 25%, 30%, and 40% to classify water into five classes (Excellent to Very bad). The dam water was found to fall into only three of these classes: good, medium, and bad. The prediction accuracies, from highest to lowest, were 96.39%, 96.17%, 92.25%, 90.20%, and 54.19% for KNN, DT, SVM, GNB, and LR, respectively. Therefore, KNN at a test size of 30% is recommended for accurately classifying the water quality of Roodeplaat Dam. Hence, machine learning algorithms can be used to identify the water quality class before the water is treated and distributed for drinking.
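The evaluation protocol described in the abstract (train a classifier, then score it at test sizes of 20%, 25%, 30%, and 40%) can be illustrated with a minimal sketch. This is not the authors' code: the abstract does not say which library or settings were used, so the example relies only on the Python standard library, uses a hand-written K-nearest-neighbors vote, and runs on synthetic placeholder data with two features instead of the eleven measured parameters. The function names, the value k=5, the cluster centers, and the random seed are all assumptions made for illustration.

```python
import math
import random
from collections import Counter


def make_synthetic_samples(n_per_class=60, seed=0):
    """Placeholder data (NOT the Roodeplaat Dam records): 2 features, 3 classes."""
    rng = random.Random(seed)
    # Hypothetical cluster centers, chosen only so the classes are separable.
    centers = {"good": (0.0, 0.0), "medium": (5.0, 5.0), "bad": (10.0, 0.0)}
    samples = []
    for label, (cx, cy) in centers.items():
        for _ in range(n_per_class):
            features = (rng.gauss(cx, 1.0), rng.gauss(cy, 1.0))
            samples.append((features, label))
    return samples


def split(samples, test_size, seed=0):
    """Shuffle a copy of the samples and return (train, test)."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_size)
    return shuffled[n_test:], shuffled[:n_test]


def knn_predict(train, x, k=5):
    """Majority vote among the k training samples closest to x (Euclidean)."""
    nearest = sorted(train, key=lambda s: math.dist(s[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy(train, test, k=5):
    """Fraction of test samples whose predicted class matches the true class."""
    hits = sum(knn_predict(train, x, k) == y for x, y in test)
    return hits / len(test)


samples = make_synthetic_samples()
results = {}
for test_size in (0.20, 0.25, 0.30, 0.40):
    train, test = split(samples, test_size)
    results[test_size] = accuracy(train, test)
    print(f"test_size={test_size:.2f}  accuracy={results[test_size]:.3f}")
```

The accuracies printed by this sketch reflect only the synthetic clusters and say nothing about the figures reported in the abstract. With the real data, each sample would carry the eleven measured parameters and a class label, and the same loop would be repeated for each of the five classifiers.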

Keywords

Decision Tree; linear regression; Naïve Bayes; Python; Support Vector Machine

Subject

Environmental and Earth Sciences, Water Science and Technology
