Preprint (Version 2). Preserved in Portico. This version is not peer-reviewed.

Hyperparameter Optimization and Combined Data Sampling Techniques in Machine Learning for Customer Churn Prediction: A Comparative Analysis

Version 1 : Received: 21 August 2023 / Approved: 21 August 2023 / Online: 21 August 2023 (13:12:21 CEST)
Version 2 : Received: 24 September 2023 / Approved: 25 September 2023 / Online: 26 September 2023 (05:17:49 CEST)
Version 3 : Received: 7 November 2023 / Approved: 8 November 2023 / Online: 8 November 2023 (10:22:33 CET)
Version 4 : Received: 16 November 2023 / Approved: 17 November 2023 / Online: 17 November 2023 (14:15:58 CET)

A peer-reviewed article of this preprint also exists.

Imani, M.; Arabnia, H.R. Hyperparameter Optimization and Combined Data Sampling Techniques in Machine Learning for Customer Churn Prediction: A Comparative Analysis. Technologies 2023, 11, 167.

Abstract

In this paper, a variety of machine learning techniques, including Artificial Neural Networks, Decision Trees, Support Vector Machines, Random Forests, Logistic Regression, and three gradient boosting techniques (XGBoost, LightGBM, and CatBoost), were employed to predict customer churn in the telecommunications industry using a publicly available dataset. To address the issue of imbalanced data, several data sampling techniques were implemented: SMOTE, SMOTE combined with Tomek Links, and SMOTE combined with Edited Nearest Neighbors (ENN). Additionally, hyperparameter tuning was used to optimize the performance of the machine learning models. The models were evaluated and compared using commonly used metrics: Precision, Recall, F1-Score, and the area under the Receiver Operating Characteristic curve (ROC AUC). The results revealed that applying hyperparameter tuning and the combined data sampling methods to the training data enhanced model performance. Overall, after applying SMOTE, XGBoost achieved a ROC AUC of 90% and an F1-Score of 92%. When SMOTE was combined with Tomek Links, LightGBM performed exceptionally well, achieving a ROC AUC of 91%, while XGBoost led on F1-Score with 91% alongside a ROC AUC of 89%. With SMOTE combined with ENN, XGBoost outperformed the other techniques with an F1-Score of 88% and a ROC AUC of 89%, although LightGBM's performance declined by 4% in F1-Score and 1% in ROC AUC compared with SMOTE alone. Lastly, after hyperparameter tuning with Optuna, CatBoost excelled, achieving an F1-Score of 93% and a ROC AUC of 91%.
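
For readers who want to see the general shape of this pipeline, the minimal sketch below (not the authors' code) combines SMOTE with Tomek Links via imbalanced-learn's SMOTETomek and tunes an XGBoost classifier with Optuna. The synthetic dataset, search ranges, validation scheme, and trial budget are illustrative assumptions, not the paper's settings; swapping SMOTETomek for SMOTEENN, or the XGBoost model for LightGBM or CatBoost, would yield the other configurations the paper compares.

import optuna
import xgboost as xgb
from imblearn.combine import SMOTETomek
from sklearn.datasets import make_classification
from sklearn.metrics import f1_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in for the public telco churn dataset
# (roughly 74% non-churn / 26% churn).
X, y = make_classification(n_samples=5000, n_features=20,
                           weights=[0.74], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, test_size=0.2, random_state=42)
X_fit, X_val, y_fit, y_val = train_test_split(
    X_train, y_train, stratify=y_train, test_size=0.25, random_state=42)

# Combined sampling on the training portion only: SMOTE oversamples the
# minority class, then Tomek Links removes ambiguous boundary pairs.
X_res, y_res = SMOTETomek(random_state=42).fit_resample(X_fit, y_fit)

def objective(trial):
    # Hypothetical search space; the paper's actual ranges may differ.
    params = {
        "n_estimators": trial.suggest_int("n_estimators", 100, 600),
        "max_depth": trial.suggest_int("max_depth", 3, 10),
        "learning_rate": trial.suggest_float("learning_rate", 1e-3, 0.3, log=True),
        "subsample": trial.suggest_float("subsample", 0.6, 1.0),
    }
    model = xgb.XGBClassifier(**params, eval_metric="logloss")
    model.fit(X_res, y_res)
    # Maximize ROC AUC on a validation split the resampler never saw.
    return roc_auc_score(y_val, model.predict_proba(X_val)[:, 1])

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=30)

# Refit with the best hyperparameters and report held-out test metrics.
best = xgb.XGBClassifier(**study.best_params, eval_metric="logloss")
best.fit(X_res, y_res)
print("F1-Score:", round(f1_score(y_test, best.predict(X_test)), 3))
print("ROC AUC :", round(roc_auc_score(y_test, best.predict_proba(X_test)[:, 1]), 3))

Note that resampling is applied only to the training portion, so the validation and test metrics reflect the original class distribution rather than the artificially balanced one.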

Keywords

machine learning; churn prediction; imbalanced data; combined data sampling techniques; hyperparameter optimization

Subject

Computer Science and Mathematics, Artificial Intelligence and Machine Learning

Comments (1)

Comment 1
Received: 26 September 2023
Commenter: Mehdi Imani
Commenter's Conflict of Interests: Author
Comment: The list of changes is as follows:
1. The second part of the abstract
2. The introduction
3. Section 5.3 (ROC AUC Benchmark)
4. Some changes in Section 6.2 (Simulation Results)
5. Section 6.2.5 added
6. Some changes in Section 6.2.6
7. Table 11 added
8. Some changes in the conclusion