Preprint
Article

This version is not peer-reviewed.

SHAP-Based Feature Selection and Iterative Hyperparameter Tuning for Customer Churn Prediction in Telecommunication Datasets

Submitted: 14 December 2025
Posted: 16 December 2025


Abstract
Customer churn prediction is a critical task in the telecommunications industry, where retaining customers directly impacts revenue and operational efficiency. This study proposes a two-iteration machine learning pipeline that integrates SHAP (SHapley Additive exPlanations) for explainable feature selection with Optuna-based hyperparameter tuning to enhance model performance and interpretability. In the first iteration, baseline models are trained on the full feature set of the Telco Customer Churn dataset (7043 samples, 25 features after preprocessing). The top-performing models (Gradient Boosting, Random Forest, and AdaBoost) are tuned and evaluated, and SHAP is then applied to the best model (Gradient Boosting) to identify the top 20 features. In the second iteration, models are retrained on the reduced feature set, achieving comparable or improved performance: a validation AUC of 0.999 (vs. 0.999 with the full feature set) and a test AUC of 0.998 (vs. 0.997). The results demonstrate that SHAP-driven feature reduction maintains high predictive accuracy (test F1-score: 0.977) while improving interpretability and reducing model complexity. This workflow highlights the value of explainable AI in churn prediction, enabling stakeholders to understand key drivers such as "Churn Reason" and "Dependents."

What is the research problem? Accurate prediction of customer churn using machine learning models, with a focus on explainable features to support business decisions.

Why use SHAP? SHAP provides additive feature importance scores, enabling global and local interpretability, feature ranking for dimensionality reduction, and transparency in model predictions.

What is the novelty? The iterative pipeline combines baseline training, SHAP-based feature selection, reduced-feature retraining, and hyperparameter retuning, offering a reproducible workflow for explainable churn modeling.
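The workflow described in the abstract can be summarized in a short Python sketch. The code below is illustrative only: it assumes a preprocessed Telco Customer Churn table with a binary "Churn" column, and the file name, search space, and trial count are assumptions rather than the authors' exact configuration.

# Minimal sketch of the two-iteration pipeline: baseline training with Optuna
# tuning, SHAP-based selection of the top 20 features, and retraining on the
# reduced feature set. All parameter ranges and the CSV path are illustrative.
import numpy as np
import pandas as pd
import optuna
import shap
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Hypothetical preprocessed dataset: encoded Telco features plus a "Churn" label.
df = pd.read_csv("telco_churn_preprocessed.csv")
X, y = df.drop(columns=["Churn"]), df["Churn"]

X_train, X_tmp, y_train, y_tmp = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(
    X_tmp, y_tmp, test_size=0.5, stratify=y_tmp, random_state=42)

def tune_gbm(X_tr, y_tr, X_va, y_va, n_trials=30):
    """Optuna search over a small Gradient Boosting space, maximizing validation AUC."""
    def objective(trial):
        model = GradientBoostingClassifier(
            n_estimators=trial.suggest_int("n_estimators", 100, 500),
            learning_rate=trial.suggest_float("learning_rate", 0.01, 0.3, log=True),
            max_depth=trial.suggest_int("max_depth", 2, 6),
            random_state=42,
        )
        model.fit(X_tr, y_tr)
        return roc_auc_score(y_va, model.predict_proba(X_va)[:, 1])

    study = optuna.create_study(direction="maximize")
    study.optimize(objective, n_trials=n_trials)
    best = GradientBoostingClassifier(**study.best_params, random_state=42)
    best.fit(X_tr, y_tr)
    return best

# Iteration 1: tune and fit on the full feature set.
model_full = tune_gbm(X_train, y_train, X_val, y_val)

# SHAP: rank features by mean absolute SHAP value and keep the top 20.
explainer = shap.TreeExplainer(model_full)
shap_values = explainer.shap_values(X_train)
importance = np.abs(shap_values).mean(axis=0)
top20 = X_train.columns[np.argsort(importance)[::-1][:20]]

# Iteration 2: retune and retrain on the reduced feature set, then evaluate.
model_reduced = tune_gbm(X_train[top20], y_train, X_val[top20], y_val)
print("Test AUC (top-20 features):",
      roc_auc_score(y_test, model_reduced.predict_proba(X_test[top20])[:, 1]))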
Keywords: 
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.