Submitted: 01 October 2024
Posted: 02 October 2024
Abstract
Employee turnover is a significant issue for financial institutions: it reduces productivity, increases recruitment costs, and disrupts critical operations. This study predicts employee turnover from a dataset containing attributes such as employee satisfaction, performance, and tenure. The task is framed as a binary classification problem, and two gradient boosting algorithms, CatBoost and XGBoost, are used to develop predictive models. Because the models predict the probability of leaving rather than a binary outcome, they are evaluated with regression metrics. CatBoost outperformed XGBoost on all four evaluation metrics (MAE, MSE, RMSE, and R²), making it the more effective model for predicting turnover in the financial sector. The study highlights key factors contributing to employee attrition, such as job satisfaction, tenure, and promotion opportunities, offering actionable insights for retention strategies. Predicted probabilities also support more fine-grained retention decisions than a binary label would. This research provides tools that financial institutions can use to mitigate the risk of turnover, retain critical talent, and ensure operational continuity.
Keywords:
1. Introduction
2. Related Work
3. Data
3.1. Variable Introduction
| Variable | Description |
|---|---|
| satisfaction_level | The level of satisfaction of the employee |
| last_evaluation | The score of the last evaluation of the employee |
| number_project | The number of projects the employee has worked on |
| average_montly_hours | The average monthly hours worked by the employee |
| time_spend_company | The number of years the employee has spent at the company |
| Work_accident | Whether the employee had a work accident (1 = yes, 0 = no) |
| left | Whether the employee has left the company (1 = yes, 0 = no) |
| promotion_last_5years | Whether the employee had a promotion in the last 5 years (1 = yes, 0 = no) |
| sales | The department the employee works in |
| salary | The salary level of the employee (low, medium, high) |
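The table lists two non-numeric columns, `sales` (department) and `salary` (low, medium, high), which typically need encoding before a tree-boosting model can use them. The source does not describe its preprocessing, so the following is only a minimal sketch of one common option: ordinal encoding for `salary` and one-hot encoding for `sales`. The function name `encode_rows`, the `sales_<department>` column naming, and the two example records are assumptions made for illustration, not part of the study.

```python
# Ordinal mapping for the three salary levels listed in the variable table.
SALARY_ORDER = {"low": 0, "medium": 1, "high": 2}


def encode_rows(rows):
    """Encode `salary` ordinally and one-hot encode `sales` (department).

    `rows` is a list of dicts keyed by the column names in the variable table.
    Returns (features, target): a list of feature dicts and a list of 0/1 labels
    taken from the `left` column.
    """
    departments = sorted({row["sales"] for row in rows})
    features, target = [], []
    for row in rows:
        # Copy the numeric and binary columns unchanged.
        encoded = {
            key: value
            for key, value in row.items()
            if key not in ("sales", "salary", "left")
        }
        encoded["salary"] = SALARY_ORDER[row["salary"]]
        # One indicator column per department seen in the data.
        for dept in departments:
            encoded[f"sales_{dept}"] = 1 if row["sales"] == dept else 0
        features.append(encoded)
        target.append(row["left"])
    return features, target


# Two made-up records for illustration only (not taken from the dataset).
example_rows = [
    {"satisfaction_level": 0.38, "last_evaluation": 0.53, "number_project": 2,
     "average_montly_hours": 157, "time_spend_company": 3, "Work_accident": 0,
     "left": 1, "promotion_last_5years": 0, "sales": "sales", "salary": "low"},
    {"satisfaction_level": 0.80, "last_evaluation": 0.86, "number_project": 5,
     "average_montly_hours": 262, "time_spend_company": 6, "Work_accident": 0,
     "left": 0, "promotion_last_5years": 1, "sales": "technical", "salary": "high"},
]

features, target = encode_rows(example_rows)
```

CatBoost can also consume categorical columns directly, so manual encoding of this kind may be unnecessary for that model; the sketch simply produces one numeric representation that both models could share.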
3.2. Data Visualization
3.3. Descriptive Analysis
| Variable | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| satisfaction_level | 14999 | 0.61 | 0.25 | 0.09 | 0.44 | 0.64 | 0.82 | 1 |
| last_evaluation | 14999 | 0.72 | 0.17 | 0.36 | 0.56 | 0.72 | 0.87 | 1 |
| number_project | 14999 | 3.80 | 1.23 | 2.00 | 3.00 | 4.00 | 5.00 | 7 |
| average_montly_hours | 14999 | 201.05 | 49.94 | 96.00 | 156.00 | 200.00 | 245.00 | 310 |
| time_spend_company | 14999 | 3.50 | 1.46 | 2.00 | 3.00 | 3.00 | 4.00 | 10 |
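The summary statistics in this table (count, mean, standard deviation, minimum, quartiles, maximum) can be reproduced per column with a few lines of code. The sketch below uses only the Python standard library and assumes the sample standard deviation (n - 1 denominator) and linearly interpolated quartiles; the source does not state which conventions it used. The helper name `describe` and the sample values are hypothetical.

```python
import statistics


def describe(values):
    """Return the summary statistics shown in the table for one numeric column."""
    # method="inclusive" interpolates linearly between the observed data points.
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),  # sample standard deviation (n - 1)
        "min": min(values),
        "25%": q1,
        "50%": q2,
        "75%": q3,
        "max": max(values),
    }


# Made-up tenure values for illustration only (not taken from the dataset).
sample_tenure = [2, 3, 3, 4, 10]
summary = describe(sample_tenure)
```

Applied to each numeric column of the full dataset, a helper like this would yield one row of the table per variable.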
4. Modeling
4.1. CatBoost
4.2. XGBoost
4.3. Comparative Analysis of Models

| Model | MAE | MSE | RMSE | R² Score |
|---|---|---|---|---|
| CatBoost | 0.039190 | 0.011617 | 0.107781 | 0.935945 |
| XGBoost | 0.043599 | 0.012610 | 0.112295 | 0.930467 |
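The four metrics in the table compare predicted turnover probabilities against the observed 0/1 outcomes. As a minimal sketch of how they are conventionally defined, the following standard-library code computes MAE, MSE, RMSE, and R² for a pair of lists. The function name `regression_metrics` and the example values are hypothetical, and the source does not state which library produced its figures.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute MAE, MSE, RMSE, and R² for observed labels and predicted scores."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    # R² = 1 - (residual sum of squares) / (total sum of squares around the mean).
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1 - ss_res / ss_tot
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2}


# Made-up labels and predicted probabilities for illustration only.
y_true = [1, 0, 0, 1]
y_pred = [0.9, 0.1, 0.2, 0.7]
metrics = regression_metrics(y_true, y_pred)
```

Since RMSE is the square root of MSE, the two columns in the table can be cross-checked against each other: the square roots of 0.011617 and 0.012610 agree with the reported RMSE values of 0.107781 and 0.112295 to within rounding.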
5. Conclusions
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).