Preprint Article, Version 1. Preserved in Portico. This version is not peer-reviewed.

Predicting Nurse Turnover for Highly Imbalanced Data Using SMOTE and Machine Learning Algorithms

Version 1 : Received: 31 October 2023 / Approved: 1 November 2023 / Online: 1 November 2023 (09:19:50 CET)

A peer-reviewed article of this Preprint also exists.

Xu, Y.; Park, Y.; Park, J.D.; Sun, B. Predicting Nurse Turnover for Highly Imbalanced Data Using the Synthetic Minority over-Sampling Technique and Machine Learning Algorithms. Healthcare 2023, 11, 3173.

Abstract

Nurse turnover is a growing challenge in the healthcare sector, with significant consequences for healthcare quality and the nursing profession. This study applies the Synthetic Minority Over-sampling Technique (SMOTE) to address class imbalance in the 2018 National Sample Survey of Registered Nurses (NSSRN) dataset and predicts nurse turnover using machine learning (ML) algorithms. Four ML algorithms, namely logistic regression (LR), random forest (RF), decision tree (DT), and extreme gradient boosting (XGBoost), are applied to the SMOTE-enhanced dataset. The data are randomly split into an 80% training set and a 20% validation set. Eighteen variables selected from the NSSRN database serve as predictive features, and the machine learning models identify feature importance with respect to nurse turnover. Performance is compared using Accuracy, Precision, Recall (Sensitivity), F1-score, and AUC. The results show that the SMOTE-enhanced random forest (SMOTE_RF) has the strongest predictive performance, both in the classical approach (with all 18 predictive variables) and in an optimized approach (using eight key predictive variables), followed by XGBoost, decision tree, and logistic regression. Age emerges as the most influential factor in nurse turnover, with working hours, EHR/EMR usability, individual income, and region also playing significant roles. These findings can help healthcare researchers and stakeholders select suitable ML algorithms for nurse turnover prediction.
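To make the two core ideas in the abstract concrete, the following is a minimal sketch of SMOTE-style oversampling and of the threshold-based metrics named above. It is not the authors' pipeline: the function names `smote` and `binary_metrics`, the parameters `n_new`, `k`, and `seed`, and the toy data are all assumptions introduced for illustration, and the sketch uses only the Python standard library rather than any specific ML package. AUC is omitted because it requires predicted scores rather than hard labels.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority points (illustrative SMOTE sketch).

    Each synthetic point lies on the line segment between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` within the minority class, excluding itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for labels in {0, 1} (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy minority class with two hypothetical numeric features.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote(minority, n_new=6, k=2)

# Toy labels and predictions, for illustration only.
scores = binary_metrics([1, 0, 1, 1, 0], [1, 0, 0, 1, 1])
```

Because each synthetic point is a convex combination of two real minority samples, it always falls within the range already covered by the minority class. In practice, oversampling is commonly applied only to the training portion of a split so that the validation set keeps the original class distribution; whether that ordering matches this study is not stated in the abstract.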

Keywords

Nurse Turnover; Machine Learning; SMOTE; NSSRN; Random Forest; XGBoost

Subject

Public Health and Healthcare, Nursing
