Submitted: 28 January 2026
Posted: 29 January 2026
Abstract
Keywords:
1. Introduction
- Heart failure remains a leading cause of cardiovascular mortality worldwide, and cardiovascular diseases as a whole account for an estimated 17.9 million deaths annually [33]. Early identification of high-risk patients is critical for guiding treatment and reducing hospital readmissions. We use a dataset of medical records from 299 heart failure patients collected at the Faisalabad Institute of Cardiology and the Allied Hospital in Faisalabad, Pakistan, between April and December 2015 [34].
- Breast cancer is the most frequently diagnosed cancer among women and the second leading cause of cancer-related death worldwide [22]. Despite advances in screening and treatment, survival disparities persist, underscoring the need for personalized, risk-based care. The METABRIC (Molecular Taxonomy of Breast Cancer International Consortium) dataset provides combined genomic, transcriptomic, clinical, and survival data for 2,509 breast cancer patients. Analyses of this cohort identified 10 novel molecular subgroups, substantially improving our understanding of prognosis and disease heterogeneity [35].
2. Materials and Methods
2.1. Study Dataset, Population, and Data Preprocessing
2.2. Survival Prediction Modeling
2.3. Model Performance Evaluation
2.4. Feature Importance and Clinical Translation

3. Results
3.1. Heart Failure Analysis
- Age (≥65 vs. <65 years): p = 0.0028 [HR: 1.89];
- Serum Creatinine (Abnormal vs. Normal): p = 0.0031 [HR: 1.90];
- High Blood Pressure (Yes vs. No): p = 0.0189 [HR: 1.66].
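This section does not state how the p-values and hazard ratios were obtained. They are consistent with a two-group log-rank test [46] and a univariate Cox model fitted to each dichotomized variable, so the following is a minimal pure-Python sketch under that assumption only. `logrank_test` and `cox_hazard_ratio` are hypothetical helper names, each grouping is assumed to be coded 0/1 (for example 1 = age ≥65), and ties in the Cox model are handled with Breslow's approximation. In practice a library such as lifelines (`lifelines.statistics.logrank_test`, `lifelines.CoxPHFitter`) would normally be used; `CoxPHFitter` handles ties with Efron's method, so its estimates can differ slightly when death times are tied.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical helpers for illustration; not the authors' analysis code.


def _event_times(times: Sequence[float], events: Sequence[int]) -> List[float]:
    """Distinct follow-up times at which at least one death was observed."""
    return sorted({t for t, e in zip(times, events) if e})


def logrank_test(times: Sequence[float], events: Sequence[int],
                 group: Sequence[int]) -> Tuple[float, float]:
    """Two-group log-rank test (group coded 0/1); returns (chi2, p-value).

    Assumes at least one event time with both groups at risk, otherwise
    the variance is zero and the division fails.
    """
    o_minus_e = 0.0  # observed minus expected deaths in group 1
    variance = 0.0
    for t in _event_times(times, events):
        at_risk = [g for tt, g in zip(times, group) if tt >= t]
        n, n1 = len(at_risk), sum(at_risk)
        d = sum(1 for tt, e in zip(times, events) if tt == t and e)
        d1 = sum(g for tt, e, g in zip(times, events, group) if tt == t and e)
        o_minus_e += d1 - d * n1 / n
        if n > 1:  # hypergeometric variance of the deaths in group 1
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = o_minus_e ** 2 / variance
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    return chi2, math.erfc(math.sqrt(chi2 / 2))


def cox_hazard_ratio(times: Sequence[float], events: Sequence[int],
                     group: Sequence[int], max_iter: int = 50) -> float:
    """exp(beta) of a univariate Cox model with one 0/1 covariate (Breslow ties)."""
    beta = 0.0
    for _ in range(max_iter):
        score, information = 0.0, 0.0
        for t in _event_times(times, events):
            at_risk = [g for tt, g in zip(times, group) if tt >= t]
            n1 = sum(at_risk)
            n0 = len(at_risk) - n1
            d = sum(1 for tt, e in zip(times, events) if tt == t and e)
            d1 = sum(g for tt, e, g in zip(times, events, group) if tt == t and e)
            # Risk-weighted mean of the covariate over the risk set
            mean = n1 * math.exp(beta) / (n0 + n1 * math.exp(beta))
            score += d1 - d * mean                # gradient of log partial likelihood
            information += d * mean * (1 - mean)  # negative second derivative
        step = score / information
        beta += step  # Newton-Raphson update
        if abs(step) < 1e-10:
            break
    return math.exp(beta)
```

With the group coded 1 for the first level named in each bullet (for example age ≥65), a hazard ratio above 1 indicates a higher hazard of death in that group.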
3.2. METABRIC Breast Cancer Analysis
- Age at Diagnosis (<65 vs. ≥65 years): p < 0.001 [HR: 2.10];
- NPI Group (Good/Moderate vs. Poor): p < 0.001 [HR: 2.16; HR: 1.49];
- HER2 Status (Negative vs. Positive): p < 0.001 [HR: 0.63].
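The NPI groups above refer to the Nottingham Prognostic Index [43], which combines tumour size (cm), histological grade (1 to 3) and lymph-node stage (1 to 3) as 0.2 × size + grade + node stage; Galea et al. defined good (≤3.4), moderate (3.41 to 5.4) and poor (>5.4) prognostic groups. The section does not state which cut-offs were applied to METABRIC, so this minimal sketch assumes the Galea cut-offs; `nottingham_prognostic_index` and `npi_group` are hypothetical names.

```python
def nottingham_prognostic_index(tumor_size_cm: float, grade: int, node_stage: int) -> float:
    """NPI = 0.2 x tumour size (cm) + histological grade + lymph-node stage.

    node_stage: 1 = node-negative, 2 = one to three positive nodes,
    3 = four or more positive nodes.
    """
    if grade not in (1, 2, 3) or node_stage not in (1, 2, 3):
        raise ValueError("grade and node_stage must each be 1, 2 or 3")
    return 0.2 * tumor_size_cm + grade + node_stage


def npi_group(npi: float) -> str:
    """Prognostic group, assuming the Galea et al. cut-offs (3.4 and 5.4)."""
    if npi <= 3.4:
        return "Good"
    if npi <= 5.4:
        return "Moderate"
    return "Poor"
```

For example, a 2.5 cm grade 2 tumour with one to three positive nodes gives an NPI of 4.5, which falls in the moderate group.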
4. Discussion
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- [1] Fleming, T.R.; Harrington, D.P. Counting Processes and Survival Analysis; John Wiley & Sons, 1991.
- [2] Kleinbaum, D.G.; Klein, M. Survival Analysis: A Self-Learning Text; Springer, 1996.
- [3] Kurian, A.W.; Sigal, B.M.; Plevritis, S.K. Survival analysis of cancer risk reduction strategies for BRCA1/2 mutation carriers. Journal of Clinical Oncology 2010, 28, 222–231.
- [4] Motzer, R.J.; Escudier, B.; Tomczak, P.; Hutson, T.E.; Michaelson, M.D.; Negrier, S.; ...; Rini, B.I. Axitinib versus sorafenib as second-line treatment for advanced renal cell carcinoma: overall survival analysis and updated results from a randomised phase 3 trial. The Lancet Oncology 2013, 14, 552–562.
- [5] Pocock, S.J.; Gore, S.M.; Kerr, G.R. Long term survival analysis: the curability of breast cancer. Statistics in Medicine 1982, 1, 93–104.
- [6] Chicco, D.; Jurman, G. Machine learning can predict survival of patients with heart failure from serum creatinine and ejection fraction alone. BMC Medical Informatics and Decision Making 2020, 20, 1–16.
- [7] Marcílio, W.E.; Eler, D.M. From explanations to feature selection: assessing SHAP values as feature selection mechanism. In 2020 33rd SIBGRAPI Conference on Graphics, Patterns and Images; IEEE, November 2020; pp. 340–347.
- [8] Mpanya, D.; Celik, T.; Klug, E.; Ntsinjana, H. Machine learning and statistical methods for predicting mortality in heart failure. Heart Failure Reviews 2021, 26, 545–552.
- [9] D'Amico, G.; Garcia-Tsao, G.; Pagliaro, L. Natural history and prognostic indicators of survival in cirrhosis: a systematic review of 118 studies. Journal of Hepatology 2006, 44, 217–231.
- [10] Guo, A.; Mazumder, N.R.; Ladner, D.P.; Foraker, R.E. Predicting mortality among patients with liver cirrhosis in electronic health records with machine learning. PLOS ONE 2021, 16, e0256428.
- [11] Kanwal, F.; Taylor, T.J.; Kramer, J.R.; Cao, Y.; Smith, D.; Gifford, A.L.; ...; Asch, S.M. Development, validation, and evaluation of a simple machine learning model to predict cirrhosis mortality. JAMA Network Open 2020, 3, e2023780.
- [12] Chen, H.C.; Kodell, R.L.; Cheng, K.F.; Chen, J.J. Assessment of performance of survival prediction models for cancer prognosis. BMC Medical Research Methodology 2012, 12, 102.
- [13] Spruance, S.L.; Reid, J.E.; Grace, M.; Samore, M. Hazard ratio in clinical trials. Antimicrobial Agents and Chemotherapy 2004, 48, 2787–2792.
- [14] Lin, D.Y.; Wei, L.J. The robust inference for the Cox proportional hazards model. Journal of the American Statistical Association 1989, 84, 1074–1078.
- [15] Ishwaran, H.; Kogalur, U.B.; Blackstone, E.H.; Lauer, M.S. Random survival forests. The Annals of Applied Statistics 2008, 2, 841–860.
- [16] Friedman, J.H. Greedy function approximation: a gradient boosting machine. Annals of Statistics 2001, 29, 1189–1232.
- [17] Chen, T.; Guestrin, C. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, August 2016; pp. 785–794.
- [18] Prokhorenkova, L.; Gusev, G.; Vorobev, A.; Dorogush, A.V.; Gulin, A. CatBoost: unbiased boosting with categorical features. Advances in Neural Information Processing Systems 2018, 31.
- [19] Kokori, E.; Patel, R.; Olatunji, G.; Ukoaka, B.M.; Abraham, I.C.; Ajekiigbe, V.O.; ...; Aderinto, N. Machine learning in predicting heart failure survival: a review of current models and future prospects. Heart Failure Reviews 2025, 30, 431–442.
- [20] Zhou, J.G.; Wong, A.H.H.; Wang, H.; Tan, F.; Chen, X.; Jin, S.H.; ...; Gaipl, U.S. Elucidation of the application of blood test biomarkers to predict immune-related adverse events in atezolizumab-treated NSCLC patients using machine learning methods. Frontiers in Immunology 2022, 13, 862752.
- [21] Spreafico, M.; Hazewinkel, A.D.; van de Sande, M.A.; Gelderblom, H.; Fiocco, M. Machine Learning versus Cox Models for Predicting Overall Survival in Patients with Osteosarcoma: A Retrospective Analysis of the EURAMOS-1 Clinical Trial Data. Cancers 2024, 16, 2880.
- [22] Watkins, E.J. Overview of breast cancer. JAAPA 2019, 32, 13–17.
- [23] Giordano, S.H.; Buzdar, A.U.; Smith, T.L.; Kau, S.W.; Yang, Y.; Hortobagyi, G.N. Is breast cancer survival improving? Trends in survival for patients with recurrent breast cancer diagnosed from 1974 through 2000. Cancer 2004, 100, 44–52.
- [24] Hespanhol, V.; Queiroga, H.; Magalhaes, A.; Santos, A.R.; Coelho, M.; Marques, A. Survival predictors in advanced non-small cell lung cancer. Lung Cancer 1995, 13, 253–267.
- [25] Ranstam, J.; Cook, J.A. LASSO regression. British Journal of Surgery 2018, 105, 1348.
- [26] Wang, H.; Lengerich, B.J.; Aragam, B.; Xing, E.P. Precision Lasso: accounting for correlations and linear dependencies in high-dimensional genomic data. Bioinformatics 2019, 35, 1181–1187.
- [27] Panahiazar, M.; Taslimitehrani, V.; Pereira, N.; Pathak, J. Using EHRs and machine learning for heart failure survival analysis. In MEDINFO 2015: eHealth-enabled Health; IOS Press, 2015; pp. 40–44.
- [28] Buyrukoğlu, G. Survival analysis in breast cancer: evaluating ensemble learning techniques for prediction. PeerJ Computer Science 2024, 10, e2147.
- [29] Evangeline, I.K.; Kirubha, S.A.; Precious, J.G. Survival analysis of breast cancer patients using machine learning models. Multimedia Tools and Applications 2023, 82, 30909–30928.
- [30] Zhao, M.; Tang, Y.; Kim, H.; Hasegawa, K. Machine learning with k-means dimensional reduction for predicting survival outcomes in patients with breast cancer. Cancer Informatics 2018, 17, 1176935118810215.
- [31] Moncada-Torres, A.; van Maaren, M.C.; Hendriks, M.P.; Siesling, S.; Geleijnse, G. Explainable machine learning can outperform Cox regression predictions and provide insights in breast cancer survival. Scientific Reports 2021, 11, 6968.
- [32] Doan, L.M.T.; Angione, C.; Occhipinti, A. Machine learning methods for survival analysis with clinical and transcriptomics data of breast cancer. In Computational Biology and Machine Learning for Metabolic Engineering and Synthetic Biology; Springer US, 2022; pp. 325–393.
- [33] World Health Organization. World Heart Day. Available online: https://www.who.int/cardiovascular_diseases/world-heart-day/en/.
- [34] Ahmad, T.; Munir, A.; Bhatti, S.H.; Aftab, M.; Ali Raza, M. DATA_MINIMAL [Dataset]. PLOS ONE 2017.
- [35] Curtis, C.; Shah, S.P.; Chin, S.F.; Turashvili, G.; Rueda, O.M.; Dunning, M.J.; ...; Caldas, C. The genomic and transcriptomic architecture of 2,000 breast tumours reveals novel subgroups. Nature 2012, 486, 346–352.
- [36] Vieira, D.; Gimenez, G.; Marmorela, G.; Estima, V. XGBoost survival embeddings: Improving statistical properties of XGBoost survival analysis implementation. Loft Python 2021.
- [37] Tibshirani, R. The lasso method for variable selection in the Cox model. Statistics in Medicine 1997, 16, 385–395.
- [38] Pencina, M.J.; D'Agostino, R.B. Overall C as a measure of discrimination in survival analysis: model specific population value and confidence interval estimation. Statistics in Medicine 2004, 23, 2109–2123.
- [39] Kamarudin, A.N.; Cox, T.; Kolamunnage-Dona, R. Time-dependent ROC curve analysis in medical research: current methods and applications. BMC Medical Research Methodology 2017, 17, 53.
- [40] US Food and Drug Administration (FDA). Inclusion of Older Adults in Cancer Clinical Trials: Guidance for Industry; U.S. Department of Health and Human Services, 2022.
- [41] Mayo Clinic. Ejection Fraction: What Does It Mean? Available online: https://www.mayoclinic.org/tests-procedures/ejection-fraction/about/pac-20384971 (accessed on 24 May 2024).
- [42] Lab Tests Online. Creatinine. Available online: https://labtestsonline.org/tests/creatinine (accessed on 24 May 2024).
- [43] Galea, M.H.; Blamey, R.W.; Elston, C.E.; Ellis, I.O. The Nottingham Prognostic Index in primary breast cancer. Breast Cancer Research and Treatment 1992, 22, 207–219.
- [44] Early Breast Cancer Trialists’ Collaborative Group (EBCTCG). Effects of chemotherapy and hormonal therapy for early breast cancer on recurrence and 15-year survival: an overview of the randomised trials. The Lancet 2005, 365, 1687–1717.
- [45] Edge, S.B.; Byrd, D.R.; Compton, C.C.; Fritz, A.G.; Greene, F.L.; Trotti, A., Eds. AJCC Cancer Staging Manual, 7th ed.; Springer, 2010.
- [46] Bland, J.M.; Altman, D.G. The logrank test. BMJ 2004, 328, 1073.
- [47] Nak, D.; Kivrak, M. Mastectomy, HER2 receptor positivity, NPI, late stage and luminal B-type tumor as poor prognostic factors in geriatric patients with breast cancer. Diagnostics 2024, 15, 13.
- [48] Baidoo, T.G.; Rodrigo, H. Data-driven survival modeling for breast cancer prognostics: A comparative study with machine learning and traditional survival modeling methods. PLOS ONE 2025, 20, e0318167.

| Metric | GBM-Cox | RSF | LASSO-Cox |
|---|---|---|---|
| Harrell’s C-index | 0.789 | 0.773 | 0.731 |
| All-time AUC | 0.639 | 0.830 | 0.724 |

| Metric | GBM-Cox | RSF | LASSO-Cox |
|---|---|---|---|
| Harrell’s C-index | 0.679 | 0.681 | 0.666 |
| AUC (5-year) | 0.714 | 0.730 | 0.675 |
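
Harrell's C-index in these tables is the fraction of comparable patient pairs whose predicted risks are ordered consistently with their observed survival [38]. The following is a minimal pure-Python sketch, assuming that a higher risk score means earlier predicted death, that tied predictions count one half, and that a pair is comparable only when the earlier time is an observed death strictly before the other patient's time (implementations differ in how they treat tied times). `harrell_c_index` is a hypothetical helper name, not the authors' code.

```python
from typing import Sequence


def harrell_c_index(times: Sequence[float], events: Sequence[int],
                    risk_scores: Sequence[float]) -> float:
    """Share of comparable pairs in which the higher-risk patient dies first.

    Raises ZeroDivisionError if no pair is comparable.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored time cannot anchor a comparable pair
        for j in range(n):
            # Pair (i, j) is comparable when i dies strictly before j's time
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count half
    return concordant / comparable
```

For instance, `harrell_c_index([1, 2, 3], [1, 0, 1], [2, 3, 1])` returns 0.5: the patient who dies at time 1 is ranked correctly against the patient followed to time 3 but not against the one censored at time 2. If a library such as lifelines is used instead, note that its `concordance_index` expects scores where higher values mean longer survival, so risk scores would need to be negated.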
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).