Version 1: Received: 11 May 2020 / Approved: 12 May 2020 / Online: 12 May 2020 (07:35:14 CEST)
How to cite:
Abromavičius, V.; Plonis, D.; Tarasevičius, D.; Serackis, A. Investigation of Machine Learning Models and Different Feature Sets for the Efficiency of Early Sepsis Prediction from Highly Unbalanced Data. Preprints 2020, 2020050205 (doi: 10.20944/preprints202005.0205.v1).
Abstract
This research addresses the early detection of sepsis in Intensive Care Unit patients. The PhysioNet/Computing in Cardiology Challenge 2019 facilitated the development of automated, open-source algorithms for the early detection of sepsis from clinical data. The challenge organizers provided a labeled dataset of clinical records for training and verifying the algorithms. However, the relatively small number of records with sepsis, as defined by the Sepsis-3 clinical criteria, led to a highly unbalanced dataset (only 2% of records carry a sepsis label). Such a strong class imbalance is a major challenge for machine learning model training and makes the data unsuitable for training classical classifiers. To address these issues, a number of models were investigated, and a solution including feature selection and data balancing techniques is proposed in this paper. In addition, several performance metrics were investigated. The results show that, for successful prediction, a particular model with fewer or more predictors should be applied depending on the length of stay in the Intensive Care Unit.
Subject Areas
Early detection; Sepsis; Evaluation metrics; Machine learning; Medical informatics; Feature extraction; PhysioNet Challenge
Copyright:
This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.