Preprint
Article

This version is not peer-reviewed.

Hospital-Wide Sepsis Detection: A Machine Learning Model Based on Prospectively Expert-Validated Cohort

Submitted: 01 December 2025

Posted: 04 December 2025

Abstract
Background/Objectives: Sepsis detection remains challenging because of clinical heterogeneity and the limitations of traditional scoring systems. This study developed and validated a hospital-wide machine learning model for sepsis detection using retrospective data on prospectively expert-validated cases, aiming to improve diagnostic accuracy beyond conventional approaches. Methods: This retrospective cohort study analyzed 218,715 hospital episodes (2014-2018) at a tertiary care center. Sepsis cases (n=11,864; 5.42%) were validated prospectively, in real time, by a Multidisciplinary Sepsis Unit using modified Sepsis-2 criteria with organ dysfunction. The model integrated structured data (26.95%) and features extracted from unstructured clinical notes via natural language processing (73.04%); 230 relevant predictors were selected from 2,829 variables. Thirty models, including random forests, support vector machines, neural networks, and gradient boosting, were developed and evaluated. The dataset was randomly split (5/7 training, 2/7 testing) while preserving patient-level independence. Results: The BiAlert Sepsis model (random forest + Sepsis-2 ensemble) achieved an AUC-ROC of 0.95, a sensitivity of 0.93, and a specificity of 0.84, significantly outperforming traditional approaches. Compared with the best rule-based method (Sepsis-2 + qSOFA, AUC-ROC 0.90), BiAlert reduced false positives by 39.6% in relative terms (13.10% vs 21.70%, p < 0.01). Novel predictors included eosinopenia and hypoalbuminemia, whereas traditional variables (MAP, GCS, platelets) showed minimal univariate association. The model received European Medicines Agency approval as a medical device in June 2024. Conclusions: This hospital-wide machine learning model, trained on prospectively expert-validated cases and integrating extensive NLP-derived features, demonstrates superior sepsis detection performance compared with conventional scoring systems. External validation and prospective clinical impact studies are needed before widespread implementation.
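The abstract states only that the random 5/7 training, 2/7 testing split preserved patient-level independence, meaning that no patient's episodes appear in both sets. A minimal Python sketch of one way to achieve this is shown below. It assumes episodes are available as hypothetical (patient_id, episode_id) pairs and that the 5/7 fraction is applied to patients rather than to episodes; `patient_level_split` is an illustrative name, not the authors' code.

```python
import random
from collections import defaultdict


def patient_level_split(episodes, train_fraction=5 / 7, seed=0):
    """Split (patient_id, episode_id) pairs so no patient is in both sets.

    Hypothetical helper: the training fraction is applied to patients,
    so the episode-level proportions are only approximately 5/7 and 2/7.
    """
    # Group every episode under the patient it belongs to.
    by_patient = defaultdict(list)
    for patient_id, episode_id in episodes:
        by_patient[patient_id].append(episode_id)

    # Sort first so the shuffle is reproducible for a given seed.
    patients = sorted(by_patient)
    rng = random.Random(seed)
    rng.shuffle(patients)

    # The first n_train shuffled patients go to training, the rest to testing.
    n_train = round(len(patients) * train_fraction)
    train_patients = set(patients[:n_train])

    train, test = [], []
    for patient_id in patients:
        target = train if patient_id in train_patients else test
        target.extend(by_patient[patient_id])
    return train, test


# Usage: 70 synthetic patients with 3 episodes each.
episodes = [(p, f"{p}-{k}") for p in range(70) for k in range(3)]
train, test = patient_level_split(episodes)
print(len(train), len(test))  # 150 60
```

Splitting on patients rather than episodes prevents repeat admissions of the same person from leaking information between training and testing. For reference, the reported 39.6% false-positive reduction is a relative figure: (21.70 - 13.10) / 21.70 ≈ 0.396.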
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.

Preprints.org is a free preprint server supported by MDPI in Basel, Switzerland.

© 2025 MDPI (Basel, Switzerland) unless otherwise stated