Version 1: Received: 11 May 2020 / Approved: 12 May 2020 / Online: 12 May 2020 (07:35:14 CEST)
How to cite:
Abromavičius, V.; Plonis, D.; Tarasevičius, D.; Serackis, A. Investigation of Machine Learning Models and Different Feature Sets for the Efficiency of Early Sepsis Prediction from Highly Unbalanced Data. Preprints 2020, 2020050205. https://doi.org/10.20944/preprints202005.0205.v1
APA Style
Abromavičius, V., Plonis, D., Tarasevičius, D., & Serackis, A. (2020). Investigation of Machine Learning Models and Different Feature Sets for the Efficiency of Early Sepsis Prediction from Highly Unbalanced Data. Preprints. https://doi.org/10.20944/preprints202005.0205.v1
Chicago/Turabian Style
Abromavičius, V., D. Plonis, Deividas Tarasevičius, and Artūras Serackis. 2020. "Investigation of Machine Learning Models and Different Feature Sets for the Efficiency of Early Sepsis Prediction from Highly Unbalanced Data." Preprints. https://doi.org/10.20944/preprints202005.0205.v1
Abstract
This research addresses the problem of early detection of sepsis in Intensive Care Unit patients. The PhysioNet/Computing in Cardiology Challenge 2019 facilitated the development of automated, open-source algorithms for the early detection of sepsis from clinical data. The challenge organizers provided a labeled dataset of clinical records for training and verifying the algorithms. However, the relatively small number of records with sepsis, supported by the Sepsis-3 clinical criteria, led to a highly unbalanced dataset (only 2% of records carry a sepsis label). Such a strong class imbalance is a major challenge for machine learning model training and makes the data unsuitable for training classical classifiers. To address these issues, a number of models were investigated. This paper proposes a solution that includes feature selection and data balancing techniques. In addition, several performance metrics were investigated. The results show that, for successful prediction, the model applied, with fewer or more predictors, should be selected according to the patient's length of stay in the Intensive Care Unit.
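To illustrate why a 2% positive rate is problematic, the following minimal sketch uses synthetic labels (not the challenge data) to show that a classifier which never predicts sepsis still reaches 98% accuracy while detecting no cases. It then applies random undersampling of the majority class, which is only one possible balancing technique and is not necessarily the one used in the paper. All function names here are hypothetical and chosen for illustration.

```python
import random


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred):
    """Fraction of true positives (label 1) that were predicted as 1."""
    hits = [p for t, p in zip(y_true, y_pred) if t == 1]
    return sum(hits) / len(hits) if hits else 0.0


def undersample(records, labels, seed=0):
    """Randomly drop majority-class (label 0) records until classes are balanced.

    Assumption: plain random undersampling, used here only as an example
    of a data balancing technique.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    keep = pos + rng.sample(neg, min(len(pos), len(neg)))
    rng.shuffle(keep)
    return [records[i] for i in keep], [labels[i] for i in keep]


# Synthetic dataset: 1000 records, 2% labeled as sepsis (assumed sizes).
labels = [1] * 20 + [0] * 980
records = list(range(len(labels)))

# A trivial "model" that always predicts "no sepsis".
always_negative = [0] * len(labels)
acc = accuracy(labels, always_negative)  # high, despite being useless
rec = recall(labels, always_negative)    # no sepsis case is detected

# Balance the classes before training.
bal_records, bal_labels = undersample(records, labels)
```

The gap between accuracy and recall in this toy case is the reason accuracy alone is a poor guide on such data, and why the choice of performance metric matters alongside data balancing.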
Keywords
Early detection; Sepsis; Evaluation metrics; Machine learning; Medical informatics; Feature extraction; PhysioNet challenge
Subject
Engineering, Electrical and Electronic Engineering
Copyright:
This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.