Version 1: Received: 21 August 2023 / Approved: 21 August 2023 / Online: 21 August 2023 (13:12:21 CEST)
Version 2: Received: 24 September 2023 / Approved: 25 September 2023 / Online: 26 September 2023 (05:17:49 CEST)
Version 3: Received: 7 November 2023 / Approved: 8 November 2023 / Online: 8 November 2023 (10:22:33 CET)
Version 4: Received: 16 November 2023 / Approved: 17 November 2023 / Online: 17 November 2023 (14:15:58 CET)
Imani, M.; Arabnia, H.R. Hyperparameter Optimization and Combined Data Sampling Techniques in Machine Learning for Customer Churn Prediction: A Comparative Analysis. Technologies 2023, 11, 167.
Abstract
In this paper, a variety of machine learning techniques, including Artificial Neural Networks, Decision Trees, Support Vector Machines, Random Forests, Logistic Regression, and three gradient boosting techniques (XGBoost, LightGBM, and CatBoost), were employed to predict customer churn in the telecommunications industry using a publicly available dataset. To address the issue of imbalanced data, various data sampling techniques were implemented: SMOTE, SMOTE combined with Tomek Links, and SMOTE combined with Edited Nearest Neighbors (ENN). Additionally, hyperparameter tuning was used to optimize the performance of the machine learning models. The models were evaluated and compared using commonly used metrics: Precision, Recall, F1-Score, and the Receiver Operating Characteristic Area Under the Curve (ROC AUC). The results revealed that model performance was enhanced by hyperparameter tuning and by applying the combined data sampling methods to the training data. Overall, after applying SMOTE, XGBoost achieved an impressive ROC AUC of 90% and an F1-Score of 92%. When SMOTE was combined with Tomek Links, LightGBM performed exceptionally well, achieving a ROC AUC of 91%, while XGBoost continued to lead in F1-Score, achieving 91% alongside a ROC AUC of 89%. With SMOTE combined with ENN, XGBoost outperformed the other techniques with an F1-Score of 88% and a ROC AUC of 89%, whereas LightGBM's performance declined by 4% in F1-Score and 1% in ROC AUC compared to using SMOTE alone. Lastly, after hyperparameter tuning with Optuna, CatBoost excelled, achieving an impressive F1-Score of 93% and a ROC AUC of 91%.
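The combined sampling procedure described above maps directly onto imbalanced-learn's SMOTETomek and SMOTEENN samplers. The following is a minimal sketch, not the authors' exact pipeline: a synthetic imbalanced dataset stands in for the public telecom churn data, and the train/test split, XGBoost settings, and random seeds are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score, roc_auc_score
from imblearn.over_sampling import SMOTE
from imblearn.combine import SMOTETomek, SMOTEENN
from xgboost import XGBClassifier

# Synthetic stand-in for the telecom churn dataset (assumption:
# any imbalanced binary-label table works for this sketch).
X, y = make_classification(n_samples=5000, n_features=20,
                           weights=[0.8, 0.2], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

samplers = {
    "SMOTE": SMOTE(random_state=42),
    "SMOTE + Tomek Links": SMOTETomek(random_state=42),
    "SMOTE + ENN": SMOTEENN(random_state=42),
}

for name, sampler in samplers.items():
    # Resample only the training split; the test split stays untouched.
    X_res, y_res = sampler.fit_resample(X_train, y_train)
    model = XGBClassifier(eval_metric="logloss", random_state=42)
    model.fit(X_res, y_res)
    proba = model.predict_proba(X_test)[:, 1]
    print(f"{name}: F1={f1_score(y_test, model.predict(X_test)):.3f} "
          f"ROC AUC={roc_auc_score(y_test, proba):.3f}")
```

The Optuna tuning step mentioned for CatBoost can likewise be sketched as a study that maximizes cross-validated ROC AUC; the search space, fold count, and number of trials below are assumptions, not the paper's configuration.

```python
import optuna
from catboost import CatBoostClassifier
from sklearn.model_selection import cross_val_score

# Reuses X_res, y_res (the resampled training data) from the sketch above.
def objective(trial):
    # Hypothetical search space; the paper's ranges may differ.
    params = {
        "depth": trial.suggest_int("depth", 4, 10),
        "learning_rate": trial.suggest_float("learning_rate", 1e-3, 0.3, log=True),
        "iterations": trial.suggest_int("iterations", 200, 800),
    }
    model = CatBoostClassifier(**params, verbose=0, random_state=42)
    # Mean cross-validated ROC AUC is the value Optuna maximizes.
    return cross_val_score(model, X_res, y_res, cv=3, scoring="roc_auc").mean()

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=30)
print("Best params:", study.best_params, "Best CV ROC AUC:", study.best_value)
```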
Computer Science and Mathematics, Artificial Intelligence and Machine Learning
Copyright: This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Received: 26 September 2023
Commenter: Mehdi Imani
Commenter's Conflict of Interests: Author
Comment: The list of changes is as below:
1- the second part of the abstract
2- the introduction
3- section 5.3 (ROC AUC Benchmark)
4- some changes in 6.2 (simulation results)
5- section 6.2.5 is added
6- some changes in 6.2.6
7- table 11 is added
8- some changes in conclusion