Preprint
Article

This version is not peer-reviewed.

The Electronic Health Record-Based Artificial Intelligence Model for Predicting Long-Term Outcomes After Radical Surgery for Colorectal Cancer

Submitted: 09 March 2026

Posted: 11 March 2026


Abstract
Background: Accurate prediction of outcomes in colorectal cancer (CRC) is essential for personalized treatment. Conventional prognostic tools, including TNM staging, have limited accuracy. Machine learning (ML) may better capture complex prognostic patterns. Methods: In a retrospective multicenter cohort of 7,253 non-metastatic CRC patients after radical surgery, we compared prognostic accuracy for predicting recurrence and mortality using: a baseline TNM stage model; a logistic regression model with six clinicopathological variables; and ML algorithms (Logistic Regression, Random Forest, XGBoost, LightGBM, CatBoost) with hyperparameter optimization (Optuna) and iterative feature selection. Binary outcomes (recurrence and all-cause mortality at 1 and 3 years) were used for ML training. Performance was assessed using area under the ROC curve (AUC). Results: The stage-only model showed poor discrimination (weighted AUC: 0.541 for mortality, 0.528 for recurrence). Logistic regression improved predictions (AUC: 0.759 and 0.645, respectively). Among ML models, CatBoost achieved the best performance. After iterative feature selection, the optimized CatBoost model utilizing 17 clinical variables demonstrated superior cross-validated AUCs of 0.81 for mortality and 0.84 for recurrence, consistently outperforming both baseline models across all time horizons. External validation on 1,452 held-out patients confirmed robustness with AUCs of 0.83 for mortality and 0.91 for recurrence. Conclusion: An optimized CatBoost model significantly outperforms traditional TNM staging and logistic regression in predicting recurrence and mortality in CRC using 17 routinely available variables. This parsimonious, data-driven tool offers improved individualized risk assessment for guiding post-operative management. Prospective validation is warranted.
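Since every comparison in the abstract is reported as an AUC, a minimal sketch of how that metric can be computed for a binary outcome may help readers interpret the numbers. The function name `roc_auc` and the toy labels and scores below are hypothetical illustrations, not the authors' code or data; the sketch uses the standard rank-based (Mann-Whitney) formulation, in which the AUC is the probability that a randomly chosen patient with the event receives a higher predicted risk than a randomly chosen patient without it.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve for binary labels (1 = event, 0 = no event).

    Rank-based (Mann-Whitney U) formulation; tied scores get their average rank.
    """
    pairs = sorted(zip(scores, labels))  # ascending by predicted risk
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied scores starting at position i.
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1

    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC is undefined unless both classes are present")

    rank_sum_pos = sum(r for r, (_, y) in zip(ranks, pairs) if y == 1)
    u_statistic = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u_statistic / (n_pos * n_neg)


# Hypothetical toy data: four patients, two of whom had the event.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = roc_auc(toy_labels, toy_scores)
```

Under this definition, a model that assigns every patient the same risk scores 0.5, which is why the stage-only AUCs near 0.53 to 0.54 indicate discrimination barely better than chance.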
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.

Preprints.org is a free preprint server supported by MDPI in Basel, Switzerland.
