Preprint Article, Version 1 (preserved in Portico; this version is not peer-reviewed)

The Impact of SMOTE and ADASYN on Random Forests and Advanced Gradient Boosting Techniques in Telecom Customer Churn Prediction

Version 1 : Received: 2 March 2024 / Approved: 4 March 2024 / Online: 5 March 2024 (07:10:38 CET)
Version 2 : Received: 9 April 2024 / Approved: 9 April 2024 / Online: 10 April 2024 (07:57:46 CEST)

How to cite: Imani, M.; Ghaderpour, Z.; Joudaki, M. The Impact of SMOTE and ADASYN on Random Forests and Advanced Gradient Boosting Techniques in Telecom Customer Churn Prediction. Preprints 2024, 2024030213. https://doi.org/10.20944/preprints202403.0213.v1

Abstract

This paper evaluates the ability of several machine learning algorithms, including Random Forests and the advanced gradient boosting techniques XGBoost, LightGBM, and CatBoost, to predict customer churn in the telecommunications sector, using a publicly available dataset. Performance was assessed with standard metrics: Accuracy, Precision, Recall, F1-score, and the Receiver Operating Characteristic Area Under the Curve (ROC AUC). These metrics were measured at three stages: after data preprocessing and feature selection; after applying the SMOTE and ADASYN oversampling methods; and after hyperparameter tuning on the SMOTE- and ADASYN-resampled data. The results highlight the effectiveness of oversampling techniques such as SMOTE and ADASYN in addressing the class imbalance inherent in customer churn prediction. Notably, hyperparameter optimization via random grid search left the results largely unchanged. Performance after ADASYN marginally surpassed that after SMOTE; in particular, LightGBM achieved an F1-score of 89% and an ROC AUC of 95% following ADASYN resampling. These findings underline the effectiveness of advanced boosting algorithms combined with oversampling methods such as SMOTE and ADASYN in handling imbalanced datasets and intricate feature interdependencies.
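Since the abstract describes a concrete three-stage protocol (evaluate after preprocessing, after SMOTE/ADASYN oversampling, and after random grid search tuning on the resampled data), a brief illustration may help. Below is a minimal Python sketch of that protocol, not the authors' actual code: LightGBM stands in for all four classifiers, and the evaluate helper, split ratio, and parameter grid are assumptions for illustration.

    # Minimal sketch (assumed, not the authors' code) of the three-stage
    # protocol from the abstract: baseline, SMOTE/ADASYN oversampling,
    # then randomized hyperparameter search on the resampled data.
    from sklearn.model_selection import train_test_split, RandomizedSearchCV
    from sklearn.metrics import (accuracy_score, precision_score,
                                 recall_score, f1_score, roc_auc_score)
    from imblearn.over_sampling import SMOTE, ADASYN  # pip install imbalanced-learn
    from lightgbm import LGBMClassifier  # stands in for RF/XGBoost/CatBoost as well

    def evaluate(X, y, sampler=None, search=False, random_state=42):
        """Train on (optionally resampled) data and report the five metrics."""
        # Split first, then resample only the training fold, so the test
        # set keeps the original imbalanced class distribution.
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=0.2, stratify=y, random_state=random_state)
        if sampler is not None:
            X_tr, y_tr = sampler.fit_resample(X_tr, y_tr)
        model = LGBMClassifier(random_state=random_state)
        if search:
            # Random grid search over an assumed, illustrative parameter space.
            model = RandomizedSearchCV(
                model,
                param_distributions={"num_leaves": [31, 63, 127],
                                     "learning_rate": [0.01, 0.05, 0.1],
                                     "n_estimators": [100, 300, 500]},
                n_iter=10, scoring="f1", cv=5, random_state=random_state)
        model.fit(X_tr, y_tr)
        pred = model.predict(X_te)
        proba = model.predict_proba(X_te)[:, 1]
        return {"accuracy": accuracy_score(y_te, pred),
                "precision": precision_score(y_te, pred),
                "recall": recall_score(y_te, pred),
                "f1": f1_score(y_te, pred),
                "roc_auc": roc_auc_score(y_te, proba)}

    # Usage with a hypothetical preprocessed feature matrix X and churn labels y:
    # evaluate(X, y)                                                # preprocessing only
    # evaluate(X, y, sampler=SMOTE(random_state=42))                # after SMOTE
    # evaluate(X, y, sampler=ADASYN(random_state=42))               # after ADASYN
    # evaluate(X, y, sampler=ADASYN(random_state=42), search=True)  # + tuning

Repeating the same calls with the other three classifiers would reproduce the full comparison described above; resampling the training fold only (never the test set) is what keeps the reported metrics honest about the original class imbalance.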

Keywords

customer churn prediction; machine learning; classification techniques; SMOTE; ADASYN; Random Forest; XGBoost; LightGBM; CatBoost

Subject

Computer Science and Mathematics, Artificial Intelligence and Machine Learning
