Submitted: 18 February 2023
Posted: 20 February 2023
Abstract
Keywords:
1. Introduction
2. Materials and Methods
2.1. Data preparation
| Variable | Count | Mean | Std. Dev. | Min | 25th Percentile | 75th Percentile |
|---|---|---|---|---|---|---|
| Max Temperature | 10,389 | 22.71 | 3.08 | 13.26 | 20.53 | 24.93 |
| Min Temperature | 10,389 | 12.90 | 1.96 | 5.71 | 11.61 | 14.41 |
| Precipitation | 10,389 | 3.17 | 6.34 | 0.00 | 0.15 | 3.66 |
| Wind | 10,389 | 2.49 | 0.55 | 0.65 | 2.14 | 2.87 |
| Relative Humidity | 10,389 | 0.76 | 0.10 | 0.32 | 0.69 | 0.84 |
| Solar | 10,389 | 16.92 | 7.23 | 0.00 | 11.18 | 22.19 |
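
Summary statistics of the kind shown in the table above can be computed along the following lines. This is a minimal sketch assuming the daily records are held in a pandas DataFrame; the helper `summarize`, the column names, and the toy values are hypothetical and are not taken from the study's data.

```python
import pandas as pd


def summarize(df: pd.DataFrame) -> pd.DataFrame:
    """Return count, mean, std, min, 25th and 75th percentiles per column."""
    # describe() yields one column per variable; transpose so each variable is a row.
    stats = df.describe(percentiles=[0.25, 0.75]).T
    # Keep only the statistics reported in the table (median and max are dropped).
    stats = stats[["count", "mean", "std", "min", "25%", "75%"]]
    stats.columns = [
        "Count", "Mean", "Std. Dev.", "Min", "25th Percentile", "75th Percentile",
    ]
    return stats


# Hypothetical toy data; names and values are illustrative only.
toy = pd.DataFrame({
    "max_temperature": [20.0, 22.0, 24.0, 26.0, 28.0],
    "precipitation": [0.0, 0.0, 1.0, 3.0, 6.0],
})
summary = summarize(toy)
print(summary.round(2))
```

Applied to the full dataset, each row of the result would correspond to one weather variable, in the same column order as the table.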
2.2. Model Building
2.3. Model Evaluation
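
As a rough illustration of a one-vs-the-rest (OvR) evaluation with the F1 score and precision-recall summaries, the sketch below uses scikit-learn's `OneVsRestClassifier` on synthetic data. The number of classes, the class weights, the random forest settings, and the train/test split are all assumptions made for the example; they are not the configuration used in this study.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import average_precision_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import label_binarize

# Synthetic, imbalanced 3-class data standing in for the six weather variables (assumption).
X, y = make_classification(
    n_samples=600, n_features=6, n_informative=4, n_redundant=0,
    n_classes=3, weights=[0.7, 0.2, 0.1], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# One binary random forest is fitted per class (one-vs-the-rest).
clf = OneVsRestClassifier(RandomForestClassifier(n_estimators=100, random_state=0))
clf.fit(X_train, y_train)

# Macro-averaged F1 gives every class equal weight regardless of its frequency.
y_pred = clf.predict(X_test)
macro_f1 = f1_score(y_test, y_pred, average="macro")

# Average precision summarises each class's one-vs-rest precision-recall curve.
y_score = clf.predict_proba(X_test)
y_test_bin = label_binarize(y_test, classes=[0, 1, 2])
ap_per_class = [
    average_precision_score(y_test_bin[:, k], y_score[:, k]) for k in range(3)
]

print(f"macro F1: {macro_f1:.3f}")
print("average precision per class:", [round(a, 3) for a in ap_per_class])
```

Any of the other classifiers listed under Abbreviations could be swapped in for the random forest, provided the estimator exposes `predict_proba` or `decision_function` for the precision-recall step.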
3. Results
4. Discussion
4.1. Precision-Recall curve results analysis
4.2. F1 Score results analysis
5. Conclusion
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
| Abbreviation | Definition |
|---|---|
| XGBoost | eXtreme Gradient Boosting |
| KNN | k-Nearest Neighbors |
| ML | Machine Learning |
| SVM | Support Vector Machine |
| MLP | Multi-Layer Perceptron |
| PBWB | Pangani Basin Water Board |
| TMA | Tanzania Meteorological Agency |
| KWK | Karanga-Weruweru-Kikavu |
| OvR | One-vs-the-Rest |
| ROC | Receiver Operating Characteristic |
| 1 | Burundi and Tanzania – Floods Leave Homes Destroyed, Hundreds Displaced. https://floodlist.com/africa/burundi-tanzania-floods-late-february-2021 |
| 2 | Tanzania – Severe Flooding in Mtwara Region After Torrential Rainfall. https://floodlist.com/africa/tanzania-flood-mtwara-january-2021. |
| 3 | Tanzania – 12 Killed in Dar Es Salaam Flash Floods. https://floodlist.com/africa/tanzania-daressalaam-floods-october-2020. |
| 4 | |
| 5 | |
| 6 | https://scikit-learn.org/stable/modules/generated/sklearn.multiclass.OneVsRestClassifier.html |


| Random Forest | XGBoost | Support Vector Machine | KNN | Multi-layer Perceptron |
|---|---|---|---|---|
| 0.998 | 0.998 | 0.878 | 0.898 | 0.950 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).