Preprint Article, Version 1. Preserved in Portico. This version is not peer-reviewed.

A Comparative Analysis of Machine Learning Models for Prediction of Insurance Uptake in Kenya

Version 1 : Received: 8 October 2020 / Approved: 9 October 2020 / Online: 9 October 2020 (08:41:24 CEST)

How to cite: Yego, N.; Kasozi, J.; Nkrunziza, J. A Comparative Analysis of Machine Learning Models for Prediction of Insurance Uptake in Kenya. Preprints 2020, 2020100186 (doi: 10.20944/preprints202010.0186.v1).

Abstract

Insurance plays a substantial role in financial inclusion and in economic growth. However, low uptake appears to impede the growth of the sector, hence the need for a model that robustly predicts insurance uptake among potential clients. In this research, we compared the performance of eight machine learning models in predicting insurance uptake. The classifiers considered were Logistic Regression, Gaussian Naive Bayes, Support Vector Machines, K-Nearest Neighbors, Decision Tree, Random Forest, Gradient Boosting Machines, and Extreme Gradient Boosting. The data used for classification came from the 2016 Kenya FinAccess Household Survey. Because the data were imbalanced, performance was compared on both upsampled and downsampled data. On upsampled data, the Random Forest classifier showed the highest accuracy and precision among the classifiers, whereas on downsampled data, Gradient Boosting performed best. Notably, on both upsampled and downsampled data, tree-based classifiers were more robust than the others in predicting insurance uptake. Even after hyper-parameter optimization, the area under the receiver operating characteristic curve remained highest for Random Forest compared with the other tree-based models. The confusion matrix for Random Forest also showed the fewest false positives and the most true positives, so it could be construed as the most robust model for predicting insurance uptake. Finally, the most important feature in predicting uptake was having a bank product, which suggests that bancassurance is a plausible distribution channel for insurance products.
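
The upsampling and downsampling mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function name `resample_balanced`, the toy 90/10 class split, and the fixed seed are all assumptions made for illustration, and the sketch uses plain random resampling from the Python standard library rather than whatever procedure the study applied.

```python
import random
from collections import Counter


def resample_balanced(rows, labels, mode="up", seed=0):
    """Balance a labelled dataset by random resampling (hypothetical helper).

    mode="up":   minority classes are sampled with replacement until they
                 match the size of the largest class.
    mode="down": majority classes are sampled without replacement until they
                 match the size of the smallest class.
    """
    if mode not in ("up", "down"):
        raise ValueError("mode must be 'up' or 'down'")
    rng = random.Random(seed)

    # Group the rows by class label.
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)

    sizes = [len(members) for members in by_class.values()]
    target = max(sizes) if mode == "up" else min(sizes)

    out_rows, out_labels = [], []
    for label, members in by_class.items():
        if len(members) == target:
            chosen = list(members)
        elif mode == "up":
            # Keep every original row, then add random duplicates.
            chosen = members + rng.choices(members, k=target - len(members))
        else:
            # Keep a random subset of the larger class.
            chosen = rng.sample(members, target)
        out_rows.extend(chosen)
        out_labels.extend([label] * target)
    return out_rows, out_labels


# Made-up imbalanced data: 90 rows without insurance (0), 10 with insurance (1).
rows = [[i] for i in range(100)]
labels = [0] * 90 + [1] * 10

up_rows, up_labels = resample_balanced(rows, labels, mode="up")
down_rows, down_labels = resample_balanced(rows, labels, mode="down")

print(Counter(up_labels))
print(Counter(down_labels))
```

In a comparison such as the one described, resampling of this kind would normally be applied to the training split only, so that the held-out data keep the original class balance.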

Subject Areas

Insurance Uptake, Machine Learning, Upsample, Downsample.
