Submitted: 17 April 2025
Posted: 18 April 2025
Abstract
Keywords:
1. Introduction
1.1. Employee Retention Background and Importance
1.2. Challenges in Predicting Employee Turnover
2. Literature Review
2.1. Traditional Approaches to Employee Retention Analysis
2.2. Machine Learning Applications in Human Resource Management
2.3. Feature Selection Techniques for Predictive HR Analytics
3. Methodology
3.1. Dataset Description and Preprocessing Techniques

3.2. Feature Selection Optimization Framework

3.3. Machine Learning Model Development and Evaluation

| Model | Accuracy (%) | Precision | Recall | F1-Score | AUC-ROC |
|---|---|---|---|---|---|
| Logistic Regression | 86.05 ± 1.68 | 0.808 | 0.638 | 0.675 | 0.834 |
| Decision Tree | 77.89 ± 3.73 | 0.535 | 0.543 | 0.537 | 0.753 |
| Random Forest | 85.71 ± 1.23 | 0.723 | 0.527 | 0.514 | 0.846 |
| Gradient Boosting | 87.16 ± 2.11 | 0.782 | 0.645 | 0.679 | 0.871 |
| XGBoost | 87.33 ± 1.34 | 0.798 | 0.647 | 0.684 | 0.875 |
| Deep Neural Network | 86.73 ± 2.54 | 0.735 | 0.665 | 0.689 | 0.869 |
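
The "mean ± spread" accuracy entries above can be produced from repeated evaluation runs. The following is a minimal sketch, assuming the ± figure is a standard deviation over runs such as cross-validation folds (the source does not state this). The function name `summarize_scores` and the scores themselves are hypothetical and are not taken from the paper.

```python
import statistics


def summarize_scores(scores):
    """Return (mean, sample standard deviation) of accuracy scores, in percent."""
    mean_pct = 100 * statistics.mean(scores)
    # Sample standard deviation (n - 1 denominator); whether the paper used
    # the sample or population form is an assumption.
    std_pct = 100 * statistics.stdev(scores)
    return mean_pct, std_pct


# Hypothetical accuracies from five evaluation runs (illustrative only).
fold_scores = [0.86, 0.88, 0.85, 0.89, 0.87]
mean_pct, std_pct = summarize_scores(fold_scores)
print(f"{mean_pct:.2f} ± {std_pct:.2f}")
```
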
4. Results and Analysis
4.1. Comparative Analysis of Feature Selection Methods

| Feature | Information Gain | Pearson Correlation | RFECV | SHAP-based | Hybrid Approach |
|---|---|---|---|---|---|
| OverTime | 100 | 100 | 100 | 100 | 100 |
| MonthlyIncome | 98 | 97 | 100 | 100 | 100 |
| JobInvolvement | 95 | 82 | 98 | 100 | 100 |
| StockOptionLevel | 94 | 79 | 96 | 98 | 98 |
| YearsAtCompany | 87 | 85 | 94 | 97 | 97 |
| JobSatisfaction | 76 | 68 | 87 | 96 | 95 |
| WorkLifeBalance | 73 | 65 | 83 | 92 | 93 |
| Age | 64 | 72 | 75 | 89 | 91 |

4.2. Model Performance Evaluation and Interpretation

4.3. Key Predictive Factors Influencing Employee Retention
5. Discussion
5.1. Theoretical Contributions to HR Analytics
5.2. Practical Applications for Human Resource Management
Acknowledgments

| Feature | Mean | Std | Min | 25% | 50% | 75% | Max |
|---|---|---|---|---|---|---|---|
| Age | 36.92 | 9.14 | 18 | 30 | 36 | 43 | 60 |
| Monthly Income | 6,502.93 | 4,707.96 | 1,009 | 2,911 | 4,919 | 8,379 | 19,999 |
| Years at Company | 7.01 | 6.13 | 0 | 3 | 5 | 9 | 40 |
| Performance Rating | 3.15 | 0.36 | 1 | 3 | 3 | 3 | 4 |
| Job Satisfaction | 2.73 | 1.10 | 1 | 2 | 3 | 4 | 4 |

| Feature | Type | Encoding Method | Distinct Values |
|---|---|---|---|
| Department | Categorical | One-Hot | 3 |
| Education Field | Categorical | One-Hot | 6 |
| Job Role | Categorical | One-Hot | 9 |
| Marital Status | Categorical | Label | 3 |
| Over Time | Binary | Label | 2 |
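
The two encoding methods named in the table can be illustrated with a small standard-library sketch. The helper names `label_encode` and `one_hot_encode` and the category values are hypothetical; the source gives only the number of distinct values per feature, not the values themselves, and does not say which library was used.

```python
def label_encode(values):
    """Map each distinct category to an integer code (sorted for determinism)."""
    categories = sorted(set(values))
    mapping = {cat: code for code, cat in enumerate(categories)}
    return [mapping[v] for v in values], mapping


def one_hot_encode(values):
    """Expand a categorical column into one 0/1 indicator column per category."""
    categories = sorted(set(values))
    rows = [[1 if v == cat else 0 for cat in categories] for v in values]
    return rows, categories


# Hypothetical category values (illustrative only).
over_time = ["Yes", "No", "No", "Yes"]
department = ["Dept A", "Dept B", "Dept C", "Dept A"]

codes, mapping = label_encode(over_time)          # binary feature -> one integer column
rows, columns = one_hot_encode(department)        # 3 categories -> 3 indicator columns
print(codes, mapping)
print(columns, rows)
```

One-hot encoding avoids implying an order among categories at the cost of extra columns, which is presumably why it is applied to the nominal features with more categories, while label encoding keeps a single column per feature.
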

| Rank | Feature | Information Gain | Correlation with Target |
|---|---|---|---|
| 1 | OverTime | 0.1028 | 0.3867 |
| 2 | MonthlyIncome | 0.0891 | -0.3598 |
| 3 | JobInvolvement | 0.0762 | -0.3118 |
| 4 | StockOptionLevel | 0.0627 | -0.2739 |
| 5 | YearsAtCompany | 0.0571 | -0.2672 |
| 6 | TotalWorkingYears | 0.0519 | -0.2528 |
| 7 | JobSatisfaction | 0.0491 | -0.2521 |
| 8 | WorkLifeBalance | 0.0421 | -0.2399 |
| 9 | MaritalStatus | 0.0392 | 0.2344 |
| 10 | Age | 0.0388 | -0.2239 |
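
The two ranking criteria in the table, information gain and correlation with the target, can be sketched as follows for a discrete feature and a binary target. This is an illustrative standard-library version under stated assumptions: entropy is measured in bits (the source does not give the logarithm base), continuous features would first need binning, and the toy data is hypothetical rather than drawn from the paper's dataset.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy of a label sequence, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature, target):
    """H(target) - H(target | feature) for a discrete feature."""
    n = len(target)
    groups = {}
    for x, y in zip(feature, target):
        groups.setdefault(x, []).append(y)
    conditional = sum(len(ys) / n * entropy(ys) for ys in groups.values())
    return entropy(target) - conditional


def pearson(xs, ys):
    """Pearson correlation coefficient; both inputs must be non-constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical toy data: 1 = works overtime / left the company.
over_time = [1, 1, 1, 0, 0, 0, 0, 0]
attrition = [1, 1, 0, 0, 0, 0, 0, 1]
print(information_gain(over_time, attrition), pearson(over_time, attrition))
```

Information gain is non-negative and ignores direction, whereas the correlation keeps its sign, which is why the table can list positive and negative correlations alongside strictly positive gains.
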

| Method | Features Selected | Accuracy | Precision | Recall | F1-Score | Computation Time (s) |
|---|---|---|---|---|---|---|
| All Features | 35 | 0.8707 | 0.7982 | 0.6473 | 0.6844 | - |
| Information Gain | 18 | 0.8733 | 0.8145 | 0.6571 | 0.7279 | 12.37 |
| RFECV | 15 | 0.8794 | 0.8232 | 0.6749 | 0.7408 | 297.45 |
| SHAP-based | 12 | 0.8829 | 0.8451 | 0.6812 | 0.7545 | 438.62 |
| Hybrid Approach | 14 | 0.8852 | 0.8498 | 0.6927 | 0.7633 | 485.19 |

| Method | Execution Time (s) | Feature Reduction (%) | Stability Index | Mean Accuracy (%) | Mean F1-Score |
|---|---|---|---|---|---|
| Information Gain | 12.37 | 48.6 | 0.72 | 87.33 | 0.728 |
| Pearson Correlation | 8.94 | 51.4 | 0.68 | 85.41 | 0.693 |
| RFECV | 297.45 | 57.1 | 0.86 | 87.94 | 0.741 |
| SHAP-based | 438.62 | 65.7 | 0.91 | 88.29 | 0.755 |
| Hybrid Approach | 485.19 | 60.0 | 0.94 | 88.52 | 0.763 |
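
The Feature Reduction (%) column is consistent with the share of the 35 original features that each method discards. A short sketch of that arithmetic follows, assuming the formula (total - selected) / total and using the selected-feature counts reported for each method; the function name is hypothetical.

```python
TOTAL_FEATURES = 35  # feature count reported for the "All Features" baseline


def feature_reduction_pct(n_selected, n_total=TOTAL_FEATURES):
    """Percentage of the original features removed by a selection method."""
    return round(100 * (n_total - n_selected) / n_total, 1)


# Selected-feature counts as reported for each method.
selected = {
    "Information Gain": 18,
    "RFECV": 15,
    "SHAP-based": 12,
    "Hybrid Approach": 14,
}
reduction = {method: feature_reduction_pct(k) for method, k in selected.items()}
print(reduction)
```

Under the same formula, the 51.4% reported for Pearson Correlation would correspond to 17 retained features, although the source does not list that count.
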

| Model | Accuracy (%) | Precision | Recall | F1-Score | AUC-ROC | Training Time (s) | Inference Time (ms) |
|---|---|---|---|---|---|---|---|
| Logistic Regression | 86.05 | 0.808 | 0.638 | 0.675 | 0.834 | 1.27 | 0.31 |
| Decision Tree | 77.89 | 0.535 | 0.543 | 0.537 | 0.753 | 0.89 | 0.25 |
| Random Forest | 85.71 | 0.723 | 0.527 | 0.514 | 0.846 | 3.46 | 0.76 |
| Gradient Boosting | 87.16 | 0.782 | 0.645 | 0.679 | 0.871 | 7.82 | 0.93 |
| XGBoost | 87.33 | 0.798 | 0.647 | 0.684 | 0.875 | 5.23 | 0.85 |
| Deep Neural Network | 86.73 | 0.735 | 0.665 | 0.689 | 0.869 | 15.62 | 1.14 |

| Model | True Negative | False Positive | False Negative | True Positive | Precision | Recall |
|---|---|---|---|---|---|---|
| XGBoost | 245 | 12 | 23 | 42 | 0.798 | 0.647 |
| Gradient Boosting | 242 | 15 | 23 | 42 | 0.782 | 0.645 |
| Deep Neural Network | 238 | 19 | 22 | 43 | 0.735 | 0.665 |
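
For reference, the standard definitions of precision, recall, accuracy, and F1 can be applied directly to the confusion-matrix counts above. The sketch below uses the textbook positive-class formulas; the function name is hypothetical. Values derived from a single matrix need not coincide exactly with the tabulated precision and recall, for instance if the reported figures were averaged over several evaluation runs, which the source does not specify.

```python
def metrics_from_confusion(tn, fp, fn, tp):
    """Positive-class metrics from the four confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "accuracy": accuracy, "f1": f1}


# Counts in (TN, FP, FN, TP) order, as listed in the table.
counts = {
    "XGBoost": (245, 12, 23, 42),
    "Gradient Boosting": (242, 15, 23, 42),
    "Deep Neural Network": (238, 19, 22, 43),
}

for model, (tn, fp, fn, tp) in counts.items():
    m = metrics_from_confusion(tn, fp, fn, tp)
    print(f"{model}: precision={m['precision']:.3f} recall={m['recall']:.3f} "
          f"accuracy={m['accuracy']:.3f} f1={m['f1']:.3f}")
```
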
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).