4. Discussion
In our study, no statistically significant differences were observed between the post-CPR patient group and other ICU patients in terms of age, gender, comorbidities, or presence of malignancy. However, the post-CPR group demonstrated significantly higher NUTRIC scores compared to the other group. No significant difference was found in NRS-2002 scores. Previous studies have shown that the NUTRIC score offers superior predictive ability in ICU settings compared to other nutritional risk scores (20). Moreover, modified NUTRIC scores have been shown to be closely associated with 28-day mortality in ICU patients (21).
Another distinguishing parameter in the post-CPR group was a significantly higher APACHE II score. APACHE II emerged as a prominent predictor of mortality in both the discriminant analysis and the Random Forest-based machine learning model. A retrospective cohort study from India of 37 patients followed in the ICU after in-hospital cardiac arrest found that higher APACHE II scores were associated with increased mortality (22). A large-scale study of 16,940 critically ill patients on mechanical ventilation, in which 83% of the dataset was used for model training and 17% for testing, reported similar findings: the most significant predictors of 30-day mortality were the APACHE II score, the Charlson Comorbidity Index (CCI), and the need for norepinephrine (23). These results are consistent with our model, which was built on a much smaller dataset. In our overall ICU patient model, the APACHE II score, inotropic support, CRP, and SOFA score emerged as the most important predictors.
We believe that our methodological approach contributed to results that are in line with the existing literature despite a dataset of only 82 patients. This approach comprised careful model development, correction of data imbalance with ADASYN (which also improves learning from borderline samples), and 10 randomized model iterations whose performance was averaged.
In our discriminant analysis, which aimed to characterize post-CPR patients in the ICU, the three most defining features were a high APACHE II score, a significantly greater need for inotropic support, and elevated ALP levels. A 2024 retrospective analysis reported that, in patients who developed acute liver injury following cardiac arrest, the serum ALP level was an independent predictor of poor prognosis (24).
In the mortality prediction model specifically developed for the post-CPR patient subgroup, some interesting findings emerged. While the APACHE II score ranked among the top predictors in the general model, it appeared to be less influential in the post-CPR subgroup. Instead, MPV, gender, and platelet count surfaced as the top-ranked variables. Notably, the need for inotropic support continued to hold a prominent position even within the mortality prediction of post-CPR patients.
We attribute the diminished importance of the APACHE II score to the fact that all post-CPR patients in our dataset exhibited uniformly high and similar APACHE II scores within the first 24 hours of ICU admission. Supporting this, previous studies analyzing hospital mortality prediction following cardiac arrest through artificial intelligence modeling have shown that AI-based models outperformed APACHE scores alone. In homogeneous patient groups, the APACHE score has been found less effective in discriminating mortality, whereas models incorporating age, gender, and physiological parameters performed better (25).
As for platelet-related variables, the literature presents conflicting evidence regarding platelet count, red cell distribution width (RDW), and MPV. In a study by Cotoia et al., MPV and platelet count were found to be unrelated to mortality in post-cardiac arrest patients. However, another study demonstrated that thrombocytopenia and elevated MPV levels were strongly associated with mortality in ICU patients (26, 27).
Among the traditional statistical findings of our study, procalcitonin and CRP levels were significantly higher in post-CPR patients compared to other ICU patients. It is known that in post-cardiac arrest syndrome, due to the acute inflammatory response triggered by cardiac arrest, an early increase in procalcitonin levels—followed by a delayed rise in CRP—can occur independently of infection. Furthermore, this inflammatory profile has been associated with poor prognosis in this patient population (28, 29).
Both CRP and procalcitonin ranked among the top features in the feature importance list of our Random Forest mortality prediction model for post-CPR patients.
ALT, AST, and GFR values also showed statistically significant differences in post-CPR patients. ALT and AST levels were higher, whereas GFR was lower. These parameters, indicative of multi-organ failure, are expected findings in this patient group and are associated with poor prognosis (30).
The observation that type 2 respiratory failure was less common among post-CPR patients compared to other ICU patients should be interpreted within the context of the hospital where the study was conducted. In this institution, patients with respiratory failure are more frequently monitored and typically respond well to non-invasive mechanical ventilation (NIMV) therapy. This may explain why the diagnosis of type 2 respiratory failure was more prevalent in the other ICU patient group.
Albumin levels were also significantly lower in the post-CPR group. This finding is supported by studies indicating that higher serum albumin levels are associated with reduced mortality in post-cardiac arrest patients (31). In our study, both mortality and hypoalbuminemia were notable clinical features within the post-CPR group.
In artificial intelligence modeling, imbalanced datasets (for example, when the number of deceased patients is lower than the number of survivors) can produce models that are biased toward predicting survival. Additionally, if the dataset contains few borderline cases for the variables being studied (e.g., a patient with no inotropic support but a high APACHE II score), and the model is therefore poorly trained on such instances, its mortality prediction performance can decrease.
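The bias toward the majority class can be illustrated with simple arithmetic. The sketch below assumes a hypothetical cohort of 20 deaths and 80 survivors (illustrative numbers, not the data of this study) and a trivial rule that always predicts survival. The helper name `confusion_counts` is likewise only illustrative.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fn, tn, fp), where 1 = died and 0 = survived."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp, fn, tn, fp


# Hypothetical cohort (assumption for illustration only).
y_true = [1] * 20 + [0] * 80
# A degenerate "model" that always predicts survival.
y_pred = [0] * len(y_true)

tp, fn, tn, fp = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)   # share of all patients classified correctly
sensitivity = tp / (tp + fn)         # share of deaths that were detected

print(f"accuracy={accuracy:.2f}, sensitivity={sensitivity:.2f}")
```

Under these assumptions the rule is correct for 80% of patients while detecting none of the deaths, which is why accuracy alone is a poor guide on imbalanced mortality data.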
Oversampling methods applied prior to statistical analysis have been proposed in the literature as solutions to these challenges. For instance, a study presented at the 6th International Conference on Software Engineering and Information Management in 2023 reported that logistic regression combined with the ADASYN method was the most successful approach for modeling mortality in patients presenting with typical chest pain due to acute myocardial infarction (32).
Furthermore, recent collaborative projects between MIT and a medical center in Israel have produced numerous studies that use large datasets containing clinical and laboratory data from thousands of patients. Most of these studies are based on machine learning-driven AI models. Even with such large datasets, oversampling techniques such as ADASYN and SMOTE were applied to protect models from the adverse effects of data imbalance and to enhance learning from borderline examples (33).
4.1. Limitations of the Study
This study has several limitations. Most notably, we were unable to include data on the duration of cardiopulmonary resuscitation (CPR) and the initial cardiac rhythm detected due to significant inconsistencies in these records. Although these variables are recognized as important in the literature, they were excluded from our analysis. Despite conducting a retrospective review of hospital records over a well-documented three-year period, only 41 post-CPR patients met the inclusion criteria.
We excluded out-of-hospital cardiac arrests because we could not reliably differentiate witnessed from unwitnessed arrests. Furthermore, the time required to achieve minimum clinical stabilization during patient transfers from other centers to our ICU made these cases unsuitable for inclusion. Another limitation is the inability to implement targeted temperature management (TTM) at our institution, which may have impacted patient outcomes.
From a statistical and technical perspective, while the models demonstrated high performance metrics, caution is warranted when interpreting their generalizability to real-world clinical settings. The study was conducted on a relatively small sample of 82 patients, which increases the risk of overfitting. Additionally, the number of features included in the models was relatively high compared to the sample size, potentially leading to model overtraining and artificially inflated validation results. In some data splits, an AUC of 1.0 was observed, supporting this concern.
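One reason an AUC of 1.0 can appear in individual splits is that, on a small held-out set, the AUC rests on very few patient pairs. The following is a minimal sketch, assuming a hypothetical test split of 3 deaths and 5 survivors; the predicted risks are invented for illustration and are not outputs of the study's models.

```python
def auc(y_true, scores):
    """AUC as the share of (died, survived) pairs in which the patient who
    died received the higher predicted risk; tied pairs count as 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one patient from each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical held-out split: 3 deaths (1) and 5 survivors (0).
y_test = [1, 1, 1, 0, 0, 0, 0, 0]

# Invented risks in which every death outranks every survivor.
scores_a = [0.90, 0.80, 0.60, 0.50, 0.40, 0.40, 0.30, 0.10]
# The same risks, except one death now ranks below one survivor.
scores_b = [0.90, 0.80, 0.45, 0.50, 0.40, 0.40, 0.30, 0.10]

auc_a = auc(y_test, scores_a)
auc_b = auc(y_test, scores_b)
print(auc_a, round(auc_b, 3))
```

With only 3 x 5 = 15 pairs, a single misordered pair moves the AUC from 1.0 to 14/15 (about 0.933), so a perfect score on such a split carries little evidence of perfect discrimination.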
While the model’s robustness was assessed through repeated train-test splits, no subgroup fairness analysis (e.g., by age or sex) was conducted due to limited sample size. This limitation should be addressed in future studies.
Furthermore, although the ADASYN method improved the model’s ability to learn from borderline cases by generating synthetic examples for the minority class, it also introduces the risk of embedding artificial patterns into the training data. This could reduce the model’s performance when applied to real patient populations.
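How synthetic examples arise, and why they can embed artificial patterns, can be seen in a simplified sketch of the ADASYN idea. This is not the implementation used in the study or in any library. The function names (`k_nearest`, `adasyn_like`), the two unscaled features, and all patient values are hypothetical, and the sketch assumes at least two minority-class patients.

```python
import math
import random


def k_nearest(point, candidates, k):
    """Indices of the k candidates closest to `point` (Euclidean distance)."""
    order = sorted(range(len(candidates)),
                   key=lambda j: math.dist(point, candidates[j]))
    return order[:k]


def adasyn_like(minority, majority, n_new, k=3, seed=0):
    """Simplified ADASYN-style oversampling (illustrative sketch only)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    labelled = [(p, 1) for p in minority] + [(p, 0) for p in majority]

    # Step 1: difficulty of each minority point = share of majority-class
    # patients among its k nearest neighbours (the point itself excluded).
    ratios = []
    for i, x in enumerate(minority):
        others = [pl for j, pl in enumerate(labelled) if j != i]
        idx = k_nearest(x, [p for p, _ in others], k)
        ratios.append(sum(1 for j in idx if others[j][1] == 0) / k)

    total = sum(ratios)
    if total == 0:
        return []  # no minority point lies near the majority class

    # Step 2: harder (borderline) points receive more synthetic samples,
    # each placed on the segment to a nearby minority neighbour.
    synthetic = []
    for i, x in enumerate(minority):
        n_i = round(n_new * ratios[i] / total)
        peers = [m for j, m in enumerate(minority) if j != i]
        neighbours = [peers[j] for j in k_nearest(x, peers, k)]
        for _ in range(n_i):
            nb = rng.choice(neighbours)
            lam = rng.random()  # interpolation weight in [0, 1)
            synthetic.append(tuple(a + lam * (b - a) for a, b in zip(x, nb)))
    return synthetic


# Hypothetical patients as (APACHE II score, CRP); values are invented.
minority = [(30.0, 150.0), (28.0, 140.0), (22.0, 60.0), (35.0, 200.0)]
majority = [(20.0, 50.0), (21.0, 65.0), (18.0, 40.0),
            (24.0, 70.0), (15.0, 30.0), (19.0, 55.0)]

synthetic = adasyn_like(minority, majority, n_new=6, k=3, seed=0)
for s in synthetic:
    print(tuple(round(v, 1) for v in s))
```

Every synthetic patient is a blend of two real minority-class patients, so the generated values never correspond to an observed case. In this sketch, the minority patient surrounded by majority-class neighbours receives the largest share of synthetic samples, which is the behaviour that helps with borderline cases but can also amplify noise around them.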
Therefore, external validation is essential in future applications of this model. It should be tested on larger, multicenter patient cohorts to ensure robustness. Additionally, comparative analyses with simpler models are recommended to assess the model's suitability for clinical use more realistically.