Submitted: 21 August 2023
Posted: 21 August 2023
Abstract
Keywords:
1. Introduction
2. Machine Learning Techniques
A. Artificial Neural Network
B. Support Vector Machine
C. Decision Tree
D. Logistic Regression
3. Ensemble Learning
A. Bagging
B. Random Forest
C. Boosting


D. The Famous Trio: XGBoost, LightGBM, CatBoost
Algorithm 1. Gradient Boosting

1. Initialize the model F0(x)
2. For i = 1 to M:
   a. Compute the negative gradient gi(x) using eq()
   b. Train the base learner h(x, θi) on gi(x)
   c. Find the step size ρi using eq()
   d. Update the model: Fi(x) = Fi-1(x) + ρi h(x, θi)
3. End
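To make the steps concrete, the following is a minimal sketch of Algorithm 1 for squared-error loss in Python. The stump base learner, the fixed learning rate standing in for the line-search step ρi, and M = 100 rounds are illustrative assumptions, not the paper's settings.

```python
# Minimal gradient-boosting sketch of Algorithm 1 (squared-error loss).
# Assumptions: stump base learners, a fixed learning rate in place of the
# line-search step, and M = 100 boosting rounds.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gradient_boost(X, y, M=100, lr=0.1):
    f0 = y.mean()
    F = np.full(len(y), f0)                  # 1. initialize F0(x) with the mean
    learners = []
    for _ in range(M):                       # 2. for i = 1 to M
        g = y - F                            # a. negative gradient = residuals under L2 loss
        h = DecisionTreeRegressor(max_depth=1).fit(X, g)  # b. train h(x, θi)
        F = F + lr * h.predict(X)            # c+d. fixed lr replaces ρi; update Fi(x)
        learners.append(h)
    return f0, lr, learners                  # 3. end

def boosted_predict(X, f0, lr, learners):
    # Sum the scaled base learners on top of the constant initial model.
    return f0 + lr * sum(h.predict(X) for h in learners)
```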
- To randomly divide the records into subsets,
- To convert the labels to integer numbers,
- To transform the categorical features to numerical features, as follows:
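The formula elided here is presumably CatBoost's smoothed target statistic (as given in the CatBoost paper and documentation), where each categorical value is replaced by

avg_target = (countInClass + prior) / (totalCount + 1)

with countInClass the number of preceding records in the random permutation that share the categorical value and have a positive label, totalCount the number of preceding records sharing the value, and prior a constant set from the starting parameters.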
4. Handling Imbalanced Data
A. Sampling Techniques
B. Training and Validation Process
5. Evaluation Metrics
A. Threshold Metrics
B. Ranking Metrics


C. ROC AUC Benchmark
6. Simulation
A. Simulation Setup
7. Simulation Results
A. Applying Feature Selection
B. Applying SMOTE
C. Applying SMOTE with Tomek Links
D. Applying SMOTE with ENN
E. Applying the Optuna Hyperparameter Optimizer
8. Conclusion
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References







| Actual \ Predicted | Churners | Non-Churners |
|---|---|---|
| Churners | TP | FN |
| Non-Churners | FP | TN |
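The threshold metrics reported in the result tables follow directly from these four counts; a small helper (hypothetical, not from the paper) makes the definitions explicit:

```python
# Threshold metrics computed from the confusion-matrix counts above.
def threshold_metrics(tp, fn, fp, tn):
    precision = tp / (tp + fp)          # share of predicted churners that really churn
    recall    = tp / (tp + fn)          # share of actual churners that are caught
    f1        = 2 * precision * recall / (precision + recall)
    accuracy  = (tp + tn) / (tp + fn + fp + tn)
    return precision, recall, f1, accuracy
```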
| ROC AUC Range | Interpretation |
|---|---|
| ROC AUC <= 50% | Something is wrong * |
| 50% <= ROC AUC < 60% | Similar to flipping a coin |
| 60% <= ROC AUC < 70% | Weak prediction |
| 70% <= ROC AUC < 80% | Good prediction |
| 80% <= ROC AUC < 90% | Very good prediction |
| ROC AUC >= 90% | Excellent prediction |
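As a quick illustration of how the ranking metric behind this benchmark is computed from predicted churn probabilities (toy numbers, not the paper's data):

```python
from sklearn.metrics import roc_auc_score

# Toy example: 1 = churner, 0 = non-churner; scores are predicted churn
# probabilities, e.g. model.predict_proba(X_test)[:, 1] in scikit-learn.
y_true  = [1, 0, 1, 1, 0, 0]
y_score = [0.9, 0.2, 0.7, 0.25, 0.3, 0.1]
print(roc_auc_score(y_true, y_score))  # ~0.889, "very good" on the scale above
```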
| Variable Name | Type |
|---|---|
| state (the US state of customers) | string |
| account_length (number of active months) | numerical |
| area_code (area code of customers) | string |
| international_plan (whether customers have international plans) | yes/no |
| voice_mail_plan (whether customers have voice mail plans) | yes/no |
| number_vmail_messages (number of voice-mail messages) | numerical |
| total_day_minutes (total minutes of day calls) | numerical |
| total_day_calls (total number of day calls) | numerical |
| total_day_charge (total charge of day calls) | numerical |
| total_eve_minutes (total minutes of evening calls) | numerical |
| total_eve_calls (total number of evening calls) | numerical |
| total_eve_charge (total charge of evening calls) | numerical |
| total_night_minutes (total minutes of night calls) | numerical |
| total_night_calls (total number of night calls) | numerical |
| total_night_charge (total charge of night calls) | numerical |
| total_intl_minutes (total minutes of international calls) | numerical |
| total_intl_calls (total number of international calls) | numerical |
| total_intl_charge (total charge of international calls) | numerical |
| number_customer_service_calls (number of calls to customer service) | numerical |
| churn (customer churn – the target variable) | yes/no |
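These variables match the widely used public telecom churn dataset; a minimal preprocessing sketch follows. The file name, the one-hot encoding of state and area_code, and the 80/20 stratified split are assumptions for illustration, not necessarily the paper's exact setup.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("churn.csv")  # assumed file name

# Map the binary yes/no columns (including the churn target) to integers.
for col in ["international_plan", "voice_mail_plan", "churn"]:
    df[col] = (df[col] == "yes").astype(int)

# One-hot encode the remaining string features.
df = pd.get_dummies(df, columns=["state", "area_code"])

X, y = df.drop(columns="churn"), df["churn"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)  # assumed 80/20 split
```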
| Models | Precision% | Recall% | F1-Score% | ROC AUC% |
|---|---|---|---|---|
| DT | 91 | 72 | 77 | 72 |
| ANN | 85 | 76 | 80 | 77 |
| LR | 61 | 70 | 62 | 70 |
| SVM | 81 | 57 | 59 | 57 |
| RF | 96 | 75 | 81 | 75 |
| CatB | 90 | 90 | 90 | 90 |
| LGBM | 94 | 91 | 92 | 91 |
| XGB | 96 | 87 | 91 | 87 |
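All eight model families compared in these tables are available off the shelf; a hedged sketch of the comparison loop, with default hyperparameters rather than the paper's settings and the X_train/y_train split from the earlier snippet:

```python
# Sketch of the model-comparison loop; defaults only, not the paper's tuning.
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.metrics import classification_report
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier
from catboost import CatBoostClassifier

models = {
    "DT": DecisionTreeClassifier(),
    "ANN": MLPClassifier(max_iter=500),
    "LR": LogisticRegression(max_iter=1000),
    "SVM": SVC(),
    "RF": RandomForestClassifier(),
    "CatB": CatBoostClassifier(verbose=0),
    "LGBM": LGBMClassifier(),
    "XGB": XGBClassifier(eval_metric="logloss"),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    print(name)
    print(classification_report(y_test, model.predict(X_test)))
```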
| Models | Precision% | Recall% | F1-Score% | ROC AUC% |
|---|---|---|---|---|
| DT | 69 | 72 | 70 | 72 |
| ANN | 70 | 73 | 71 | 83 |
| LR | 61 | 71 | 61 | 70 |
| SVM | 65 | 73 | 68 | 73 |
| RF | 83 | 76 | 79 | 76 |
| CatB | 79 | 88 | 83 | 88 |
| LGBM | 87 | 90 | 88 | 90 |
| XGB | 95 | 90 | 92 | 90 |
| Models | Precision% | Recall% | F1-Score% | ROC AUC% |
|---|---|---|---|---|
| DT | 74 | 74 | 74 | 74 |
| ANN | 69 | 75 | 71 | 75 |
| LR | 61 | 70 | 61 | 69 |
| SVM | 65 | 73 | 67 | 73 |
| RF | 85 | 78 | 81 | 78 |
| CatB | 80 | 88 | 83 | 88 |
| LGBM | 89 | 91 | 90 | 91 |
| XGB | 94 | 89 | 91 | 89 |
| Models | Precision% | Recall% | F1-Score% | ROC AUC% |
|---|---|---|---|---|
| DT | 60 | 70 | 50 | 70 |
| ANN | 61 | 70 | 60 | 70 |
| LR | 52 | 50 | 13 | 50 |
| SVM | 60 | 70 | 58 | 70 |
| RF | 67 | 76 | 69 | 76 |
| CatB | 70 | 83 | 72 | 83 |
| LGBM | 80 | 89 | 84 | 87 |
| XGB | 88 | 89 | 88 | 89 |
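The SMOTE, SMOTE with Tomek links, and SMOTE with ENN variants behind the three preceding tables are all provided by the imbalanced-learn package; a sketch of how they are typically applied, reusing the X_train/y_train split from the earlier snippet with default parameters rather than the paper's settings:

```python
from imblearn.over_sampling import SMOTE
from imblearn.combine import SMOTEENN, SMOTETomek

# Resample only the training split; the test set keeps its natural imbalance.
X_sm,  y_sm  = SMOTE(random_state=42).fit_resample(X_train, y_train)
X_smt, y_smt = SMOTETomek(random_state=42).fit_resample(X_train, y_train)  # SMOTE + Tomek links
X_sme, y_sme = SMOTEENN(random_state=42).fit_resample(X_train, y_train)   # SMOTE + ENN cleaning
```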
| Models | Precision% | Recall% | F1-Score% | ROC AUC% |
|---|---|---|---|---|
| CatB | 89 | 91 | 90 | 91 |
| CatB-Optuna | 95 | 91 | 93 | 91 |
| LGBM | 92 | 90 | 91 | 90 |
| LGBM-Optuna | 93 | 89 | 90 | 89 |
| XGB | 93 | 88 | 90 | 88 |
| XGB-Optuna | 94 | 88 | 91 | 88 |
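The -Optuna rows above come from a hyperparameter search; a minimal sketch of such a study for XGBoost follows. The search space, the ROC AUC objective on the held-out split, and the 50 trials are illustrative assumptions, not the paper's exact configuration.

```python
import optuna
from sklearn.metrics import roc_auc_score
from xgboost import XGBClassifier

def objective(trial):
    # Hypothetical search space; the paper's ranges are not given here.
    params = {
        "n_estimators":  trial.suggest_int("n_estimators", 100, 1000),
        "max_depth":     trial.suggest_int("max_depth", 3, 10),
        "learning_rate": trial.suggest_float("learning_rate", 1e-3, 0.3, log=True),
    }
    model = XGBClassifier(**params, eval_metric="logloss").fit(X_train, y_train)
    return roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=50)
print(study.best_params, study.best_value)
```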
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).