Preprint. Review. Version 1. Preserved in Portico. This version is not peer-reviewed.

Hyperparameter Optimization and Combined Data Sampling Techniques in Machine Learning for Customer Churn Prediction: A Comparative Analysis

Version 1: Received: 21 August 2023 / Approved: 21 August 2023 / Online: 21 August 2023 (13:12:21 CEST)
Version 2: Received: 24 September 2023 / Approved: 25 September 2023 / Online: 26 September 2023 (05:17:49 CEST)
Version 3: Received: 7 November 2023 / Approved: 8 November 2023 / Online: 8 November 2023 (10:22:33 CET)
Version 4: Received: 16 November 2023 / Approved: 17 November 2023 / Online: 17 November 2023 (14:15:58 CET)

A peer-reviewed article of this Preprint also exists.

Imani, M.; Arabnia, H.R. Hyperparameter Optimization and Combined Data Sampling Techniques in Machine Learning for Customer Churn Prediction: A Comparative Analysis. Technologies 2023, 11, 167.

Abstract

This paper applies a range of machine learning techniques to predict customer churn in the telecommunications industry using a publicly available dataset: Artificial Neural Networks, Decision Trees, Support Vector Machines, Random Forests, Logistic Regression, and three gradient boosting techniques (XGBoost, LightGBM, and CatBoost). To address class imbalance in the data, several data sampling techniques were implemented, including SMOTE, SMOTE combined with Tomek Links, and SMOTE combined with Edited Nearest Neighbors. Hyperparameter tuning was also used to optimize model performance. The models were evaluated and compared using Precision, Recall, F1-Score, and the area under the Receiver Operating Characteristic curve (ROC AUC). The results show that applying hyperparameter tuning and the combined data sampling methods to the training data improved model performance. Overall, LightGBM outperformed the other machine learning techniques examined, both before and after these techniques were applied.
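To make the oversampling step concrete, the following is a minimal pure-Python sketch of the core SMOTE idea: each synthetic minority sample is placed on the line segment between a minority sample and one of its nearest minority-class neighbours. It is an illustration under stated assumptions, not the authors' implementation. The function name `smote_oversample`, its parameters, and the toy data are hypothetical, and the Tomek Links and Edited Nearest Neighbors cleaning steps used in the combined methods are omitted. As in the abstract, resampling is applied to the training data only.

```python
import math
import random


def smote_oversample(X, y, minority_label=1, k=5, seed=0):
    """Oversample the minority class to parity with simplified SMOTE.

    Hypothetical helper for illustration only. Each synthetic point lies on
    the segment between a minority sample and one of its k nearest
    minority-class neighbours (Euclidean distance).
    """
    rng = random.Random(seed)
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_majority = len(y) - len(minority)
    n_needed = n_majority - len(minority)
    if n_needed <= 0 or len(minority) < 2:
        # Already balanced, or too few minority points to interpolate.
        return list(X), list(y)

    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_needed):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by distance to the chosen one.
        others = [m for j, m in enumerate(minority) if j != i]
        others.sort(key=lambda m: math.dist(base, m))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])

    return list(X) + synthetic, list(y) + [minority_label] * n_needed


# Toy, made-up training data: 8 non-churners (0) and 3 churners (1).
X_train = [
    [0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.2, 0.4],
    [0.4, 0.2], [0.5, 0.5], [0.1, 0.5], [0.3, 0.1],
    [0.9, 0.8], [0.8, 0.9], [1.0, 1.0],
]
y_train = [0] * 8 + [1] * 3

# Resample the training split only; a held-out test split stays untouched.
X_res, y_res = smote_oversample(X_train, y_train, k=2)
print(y_res.count(0), y_res.count(1))  # prints: 8 8
```

In practice, a maintained library implementation would normally be preferred over a hand-written one. The sketch is only meant to show why the synthetic churners stay inside the region already occupied by real churners, and why resampling the test data would distort the evaluation metrics.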

Keywords

Machine Learning, Churn Prediction, Imbalanced Data, Combined Data Sampling Techniques, Hyperparameter Optimization

Subject

Computer Science and Mathematics, Artificial Intelligence and Machine Learning
