Preprint Article, Version 1. Preserved in Portico. This version is not peer-reviewed.

Feature Selection Approach to Improve Malaria Diagnosis Model for High and Low Endemic Areas of Tanzania

Version 1: Received: 11 November 2021 / Approved: 15 November 2021 / Online: 15 November 2021 (10:36:16 CET)

How to cite: Mariki, M.; Mduma, N.; Mkoba, E. Feature Selection Approach to Improve Malaria Diagnosis Model for High and Low Endemic Areas of Tanzania. Preprints 2021, 2021110243. https://doi.org/10.20944/preprints202111.0243.v1

Abstract

Malaria remains an important cause of death, especially in sub-Saharan Africa, with about 228 million malaria cases worldwide and an estimated 405,000 deaths in 2019. Currently, malaria is diagnosed in health facilities using blood smear microscopy (BS) or a malaria rapid diagnostic test (MRDT); in areas where these tools are inadequate, presumptive treatment is given. In addition, self-diagnosis and self-treatment are practiced in some households. Given the high rate of self-medication with antimalarial drugs, this study aimed to use feature selection methods to identify the features most significant for predicting malaria in Tanzania, so that they can be used to develop a machine learning model for malaria diagnosis. A dataset of malaria symptoms and clinical diagnoses was extracted from patients' files at four (4) selected health facilities in the Kilimanjaro and Morogoro regions. These regions were selected to represent high-endemic (Morogoro) and low-endemic (Kilimanjaro) areas of the country. The dataset contained 2556 instances and 36 variables. A random forest classifier, a tree-based method, was used to select the most important features for malaria prediction, and region-specific features were obtained to facilitate accurate prediction. The feature ranking indicated that fever is universally the most influential feature for predicting malaria, followed by general body malaise, vomiting and headache; however, these features are ranked differently across the regional datasets. Subsequently, six predictive models built on the features chosen by the feature selection method were used to evaluate the performance of those features. The identified features comply with the malaria diagnosis and treatment guidelines provided by the WHO and Tanzania Mainland. This compliance was sought so that the resulting prediction model fits the current health-care provision system in Tanzania.
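The abstract does not describe how the random forest ranking was implemented. As a hedged illustration only, the sketch below computes a random-forest feature ranking by mean decrease in Gini impurity (the impurity-based importance that random-forest libraries commonly report), written from scratch in standard-library Python for binary symptom flags. The feature names, the synthetic risk model in `make_toy_data`, and the parameters (`n_trees`, `max_depth`, square-root feature subsampling) are all assumptions for illustration; this is not the authors' code, data, or results.

```python
import random

# Hypothetical binary symptom flags (1 = present); names chosen only for illustration.
FEATURES = ["fever", "malaise", "vomiting", "headache", "cough", "diarrhoea"]


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def grow_tree(rows, labels, mtry, depth, max_depth, importance, n_total, rng):
    """Grow one tree on binary features, adding weighted Gini decreases to `importance`."""
    node_gini = gini(labels)
    if depth >= max_depth or node_gini == 0.0 or len(labels) < 2:
        return
    best_feature, best_gain, best_split = None, 0.0, None
    # Random feature subsampling at every node, as in a random forest.
    for f in rng.sample(range(len(importance)), mtry):
        left = [i for i, row in enumerate(rows) if row[f] == 0]
        right = [i for i, row in enumerate(rows) if row[f] == 1]
        if not left or not right:
            continue  # feature is constant in this node
        child = (len(left) * gini([labels[i] for i in left])
                 + len(right) * gini([labels[i] for i in right])) / len(labels)
        gain = node_gini - child
        if gain > best_gain:
            best_feature, best_gain, best_split = f, gain, (left, right)
    if best_feature is None:
        return
    # Weight the impurity decrease by the share of the bootstrap sample reaching this node.
    importance[best_feature] += len(labels) / n_total * best_gain
    for idx in best_split:
        grow_tree([rows[i] for i in idx], [labels[i] for i in idx],
                  mtry, depth + 1, max_depth, importance, n_total, rng)


def forest_importance(X, y, n_trees=50, max_depth=5, seed=0):
    """Mean decrease in Gini impurity, averaged over bootstrap trees and normalized to sum to 1."""
    rng = random.Random(seed)
    n, n_features = len(X), len(X[0])
    mtry = max(1, int(n_features ** 0.5))
    totals = [0.0] * n_features
    for _ in range(n_trees):
        sample = [rng.randrange(n) for _ in range(n)]  # bootstrap sample with replacement
        tree_imp = [0.0] * n_features
        grow_tree([X[i] for i in sample], [y[i] for i in sample],
                  mtry, 0, max_depth, tree_imp, n, rng)
        tree_sum = sum(tree_imp)
        if tree_sum > 0:
            totals = [t + v / tree_sum for t, v in zip(totals, tree_imp)]
    grand = sum(totals)
    return [t / grand for t in totals] if grand > 0 else totals


def rank_features(names, scores):
    """Sort (feature, importance) pairs from most to least important."""
    return sorted(zip(names, scores), key=lambda pair: pair[1], reverse=True)


def make_toy_data(n=400, seed=1):
    """Synthetic records from an assumed risk model; NOT the study's dataset."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        row = [rng.randint(0, 1) for _ in FEATURES]
        # Assumed: fever dominates, malaise adds a little, the other flags are pure noise.
        p_malaria = 0.1 + 0.6 * row[0] + 0.2 * row[1]
        X.append(row)
        y.append(1 if rng.random() < p_malaria else 0)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    for name, score in rank_features(FEATURES, forest_importance(X, y)):
        print(f"{name:10s} {score:.3f}")
```

Running `forest_importance` separately on each region's records (for example, Kilimanjaro and Morogoro) is one way to obtain the region-specific orderings the abstract mentions. A real analysis would more likely rely on an established library and confirm impurity-based rankings with a second method, since impurity importance can favour features with many distinct values.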

Keywords

Feature Selection; Malaria Diagnosis; Supervised Learning

Subject

Computer Science and Mathematics, Artificial Intelligence and Machine Learning
