Preprint Article, Version 1. This version is not peer-reviewed.

Investigation of Machine Learning Models and Different Feature Sets for the Efficiency of Early Sepsis Prediction from Highly Unbalanced Data

Version 1 : Received: 11 May 2020 / Approved: 12 May 2020 / Online: 12 May 2020 (07:35:14 CEST)

How to cite: Abromavičius, V.; Plonis, D.; Tarasevičius, D.; Serackis, A. Investigation of Machine Learning Models and Different Feature Sets for the Efficiency of Early Sepsis Prediction from Highly Unbalanced Data. Preprints 2020, 2020050205 (doi: 10.20944/preprints202005.0205.v1).

Abstract

The presented research addresses the problem of early sepsis detection in Intensive Care Unit patients. The PhysioNet/Computing in Cardiology Challenge 2019 facilitated the development of automated, open-source algorithms for the early detection of sepsis from clinical data, and the challenge organizers provided a labeled dataset of clinical records for training and verifying the algorithms. However, the relatively small number of records with sepsis, as defined by the Sepsis-3 clinical criteria, resulted in a highly unbalanced dataset (only 2% of records carry the sepsis label). Such strong class imbalance is a major challenge for machine learning model training and makes the data poorly suited to training classical classifiers. To address these issues, several models were investigated, and this paper proposes a solution that combines feature selection with data balancing techniques. In addition, several performance metrics were investigated. The results show that successful prediction requires applying a particular model, with fewer or more predictors, depending on the patient's length of stay in the Intensive Care Unit.
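The abstract does not specify which balancing technique was used. As a hedged illustration only, the sketch below applies plain random undersampling (a generic technique, not necessarily the authors' method) to hypothetical toy labels with the same 2% positive rate. It also shows why accuracy alone is a misleading metric on such data: a classifier that always predicts "no sepsis" already reaches 98% accuracy. The function names `majority_baseline_accuracy` and `random_undersample` are hypothetical.

```python
import random


def majority_baseline_accuracy(labels):
    """Accuracy of a classifier that always predicts the majority class (0 = no sepsis)."""
    return sum(1 for y in labels if y == 0) / len(labels)


def random_undersample(records, labels, seed=0):
    """Randomly drop majority-class records until both classes have equal counts.

    Generic illustration; assumes there are at least as many negatives as positives.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    # Keep every positive record plus an equally sized random subset of negatives.
    keep = sorted(pos + rng.sample(neg, len(pos)))
    return [records[i] for i in keep], [labels[i] for i in keep]


# Hypothetical toy data: 1000 records, 2% labelled as sepsis.
labels = [1] * 20 + [0] * 980
records = [{"id": i} for i in range(len(labels))]

print(majority_baseline_accuracy(labels))  # 0.98
bal_records, bal_labels = random_undersample(records, labels)
print(len(bal_labels), sum(bal_labels))  # 40 20
```

Undersampling discards most of the majority class, which keeps training cheap but loses information; oversampling or class weighting are common alternatives with the opposite trade-off.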

Subject Areas

Early detection; Sepsis; Evaluation metrics; Machine learning; Medical informatics; Feature extraction; PhysioNet Challenge
