The Electronic Health Record-Based Artificial Intelligence Model for Predicting Long-Term Outcomes After Radical Surgery for Colorectal Cancer

Mariam Sh. Manukyan; Valeriya I Pavlova; Maxim S. Kirsanov; Aydar Akhmetzyanov; Rukiyat Sh. Abdulaeva; Marianna O. Mandrina; Yana V. Belenkaya; Ivan S Stilidi; Tigran G. Gevorkyan; Sergey S. Gordeyev

doi:10.20944/preprints202603.0700.v1

Submitted: 09 March 2026

Posted: 11 March 2026


Abstract

Background: Accurate prediction of outcomes in colorectal cancer (CRC) is essential for personalized treatment. Conventional prognostic tools, including TNM staging, have limited accuracy. Machine learning (ML) may better capture complex prognostic patterns. Methods: In a retrospective multicenter cohort of 7,253 non-metastatic CRC patients after radical surgery, we compared prognostic accuracy for predicting recurrence and mortality using: a baseline TNM stage model; a logistic regression model with six clinicopathological variables; and ML algorithms (Logistic Regression, Random Forest, XGBoost, LightGBM, CatBoost) with hyperparameter optimization (Optuna) and iterative feature selection. Binary outcomes (recurrence and all-cause mortality at 1 and 3 years) were used for ML training. Performance was assessed using area under the ROC curve (AUC). Results: The stage-only model showed poor discrimination (weighted AUC: 0.541 for mortality, 0.528 for recurrence). Logistic regression improved predictions (AUC: 0.759 and 0.645, respectively). Among ML models, CatBoost achieved the best performance. After iterative feature selection, the optimized CatBoost model utilizing 17 clinical variables demonstrated superior cross-validated AUCs of 0.81 for mortality and 0.84 for recurrence, consistently outperforming both baseline models across all time horizons. External validation on 1,452 held-out patients confirmed robustness with AUCs of 0.83 for mortality and 0.91 for recurrence. Conclusion: An optimized CatBoost model significantly outperforms traditional TNM staging and logistic regression in predicting recurrence and mortality in CRC using 17 routinely available variables. This parsimonious, data-driven tool offers improved individualized risk assessment for guiding post-operative management. Prospective validation is warranted.
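Because every comparison in the abstract is expressed as an AUC, a minimal sketch of how that metric can be computed may help readers interpret the numbers. The function below uses the standard pairwise (Mann-Whitney) formulation with ties counted as one half. The function name `roc_auc` and the toy labels and stage values are hypothetical illustrations, not the study's data or code, and the sketch does not reproduce the "weighted AUC" reported above, whose weighting scheme is not specified in the abstract.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive case receives
    a higher score than a randomly chosen negative case; ties count 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy cohort: 1 = event (e.g. recurrence), 0 = no event.
toy_labels = [0, 0, 0, 1, 0, 1, 1, 1]
# A coarse ordinal score such as stage yields many ties between patients,
# which is one reason a stage-only score can discriminate poorly.
toy_stage = [1, 2, 2, 2, 3, 3, 1, 3]

stage_auc = roc_auc(toy_labels, toy_stage)
print(f"Toy stage-only AUC: {stage_auc:.3f}")
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking, so values near 0.53 to 0.54 for the stage-only model indicate discrimination only slightly better than chance.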

Keywords: colorectal cancer; Cox regression; machine learning; CatBoost; survival analysis; AUC; TNM staging

Subject: Medicine and Pharmacology - Oncology and Oncogenics

Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
