Preprint
Review

This version is not peer-reviewed.

Artificial Intelligence and Rectal Cancer: Beyond Images

A peer-reviewed article of this preprint also exists.

Submitted: 22 May 2025

Posted: 23 May 2025


Abstract
Introduction. Artificial intelligence can handle both cancer variability and medical big data. Its models can take different input types: images and many others (such as numbers, predefined categories and free text). According to clinical practice and the literature, the non image elements are as important as images. This article reviews models with a non image component, using rectal cancer as the use case. Results and Discussion. Secondary literature analysis shows that all reviews focus on image inputs only. Primary literature models of interest include pure (only non image) and combined (both non image and image) inputs. Non image models show good performance. Combined models frequently perform better than their unimodal parts. Conclusion. To the best of our knowledge, no reviews focus on non image inputs, either alone or combined with images. Non image components nonetheless deserve substantial attention, in both clinical practice and research. Multimodality (beyond images) is important in rectal cancer and possibly in other pathologies. Methods. The literature search was performed on PubMed, without temporal limits, in English, using broad keywords; for secondary literature, appropriate filters were employed.

1. Introduction

Rectal cancer is a leading cause of morbidity and mortality worldwide. It is the eighth most common cancer type by incidence (≈ 700,000 new cases per year) and carries significant mortality (≈ 300,000 deaths per year) [1].
Cancer in general remains a major burden on human society, despite significant technological and clinical progress. Globally, it is the second leading cause of death [2]; furthermore, it is on the rise [1]. One of the main reasons for the cancer burden is its heterogeneity. Variability spans multiple levels: the disease level (100+ cancer types), the inter patient level (each patient is unique in history and current status) and even the intra patient level (cancer evolves dynamically, particularly in its genome). This daunting complexity hinders both its biological understanding and its clinical management (prevention, diagnosis, therapy and prognosis).
Fortunately, a very promising paradigm shift is under way in medicine (particularly in oncology): evidence based medicine is gradually being complemented by personalized medicine [3]. Evidence based medicine (more traditional and established, based on clinical trials) is fundamental to the development of medical knowledge and has generated the standard of care over the last three decades [3,4]. Nonetheless, its methodology generates evidence for an ‘average patient’ (a ‘one size fits all’ approach), and applying it to the actual, unique patient is problematic [3,5]. Personalized medicine (more modern, with its precise and individualized perspective) can usefully complement evidence based medicine, particularly in oncology. Certain enablers of personalized medicine, namely big data (particularly real world data) and artificial intelligence (particularly machine learning), hold great promise [3].
Big data are digital data created in enormous amounts (≈ 10^12 terabytes/year, with a growing trend). To be defined as such, big data should have not only high ‘volume’ (quantity), but also high ‘variety’ (multiple variables and sources), ‘velocity’ (acquisition speed) and ‘veracity’ (a quality-related parameter). Medical big data have enormous potential to improve research and clinical practice [6], but they need to be processed with proper mathematical and computational methodologies. In particular, artificial intelligence, when rigorously designed, developed and validated [7], can successfully handle them [8].
Artificial intelligence is the imitation of human intelligence by machine computing [9]. An artificial intelligence model is a mathematical relationship between an input (the data, with one or more independent variables) and one or more outputs (the dependent variables being pursued). In the typical situation, the independent variables (also known as features, predictors or covariates) are many (in the order of tens or even hundreds), while there is only one dependent variable (also known as outcome, target or label). Inputs can broadly be grouped into ‘structured’ and ‘unstructured’. Structured data, including numbers and pre-defined categories, are more standardized and usually have higher data quality. Unstructured data, including images and uncategorized (freely typed) text, usually need to be mined (e.g. by radiomics for images and by natural language processing for texts). In particular, radiomics is an approach that aims to extract structured quantitative information (consisting of up to several hundred predictors) from unstructured medical images [11].
Machine learning is a notable and intensively researched subset of artificial intelligence. It consists in mimicking human learning through computer algorithms [10]. These algorithms are able to ‘learn’ features and patterns directly from the input data without being explicitly programmed. Thus, machine learning (both the model structure and its subsequent performance) is strongly influenced by the data: those given as input (training) and those employed to validate (evaluation). Generally speaking, increasing the quantity of such data (i.e. the sample size) tends to provide more reliable results, as in statistics; this is an advantage of big data over ‘traditional’ data. At the same time, big data must also be of sufficient quality to give reliable and realistic results; as such, they need curated and standardized data collection [11,12].
Machine learning is mainly categorized into supervised and unsupervised learning (a hybrid methodology, semi supervised learning, is also possible). Supervised learning deals with labeled data (data for which the outcome has given, known values, called ‘labels’); unsupervised learning deals with unlabeled data (data for which outcome labels are too difficult or expensive to obtain, or for which it is even complicated to identify an actual outcome in the data set). Supervised machine learning can be further divided into ‘classification’ (when the outcome is qualitative, i.e. categorical) and ‘regression’ (when the outcome is quantitative, i.e. numerical): in classification, the model predicts to which outcome class (category or group) each data item (in general an object or entity, in medicine a patient) belongs; in regression, it estimates, for each of them, a number, which usually has the mathematical meaning of a probability. Some algorithms can perform both classification and regression tasks. Such models include decision trees (and random forests, which are their ensemble version) and support vector machines. Other notable model classes include generalized linear models (which can be employed for regression, as in linear regression, or for classification, as in logistic regression, whose name is therefore misleading) and Cox regression (which handles time-to-event data). The least absolute shrinkage and selection operator (LASSO) is linear regression with a regularization penalty that also performs feature selection.
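The relationship between estimating a probability and assigning a class, as in logistic regression, can be illustrated with a minimal sketch. The coefficients, the two predictors and the 0.5 threshold below are hypothetical values chosen only for illustration; they are not taken from any model in this Review.

```python
import math

def sigmoid(z: float) -> float:
    """Map a real-valued linear predictor to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_probability(intercept, coefficients, features):
    """Logistic regression: estimated probability of the positive class for one patient."""
    linear_predictor = intercept + sum(c * x for c, x in zip(coefficients, features))
    return sigmoid(linear_predictor)

def classify(probability, threshold=0.5):
    """Turn the estimated probability into a class label (threshold is an assumption)."""
    return 1 if probability >= threshold else 0

# Hypothetical model with two hypothetical predictors (illustrative values only).
intercept = -1.0
coefficients = [0.8, -0.5]
patient = [2.0, 1.0]

p = predict_probability(intercept, coefficients, patient)
label = classify(p)

# exp(coefficient) is the odds ratio per unit increase of the corresponding predictor.
odds_ratios = [math.exp(c) for c in coefficients]
```

The model itself outputs a number (a probability); the class label only appears once a threshold is applied, which is why logistic regression is used for classification despite its name.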
A noteworthy subset of machine learning is deep learning, which employs artificial neural networks with more than one hidden layer. Artificial neural networks are machine learning algorithms inspired by the human brain: they are composed of relatively simple units (called ‘artificial neurons’, imitating biological neurons), which are networked together in a complex fashion (Figure 1).
Artificial neurons (also called perceptrons) are usually grouped into ‘layers’, each performing a specific task; the ‘hidden’ layers are all those other than the first and the last, which are the input and the output layer, respectively. Because of their architectural connectivity, deep learning models can accomplish complex tasks, such as classification and generation. In medicine, deep learning is mainly applied to images.
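The layered structure described above can be sketched as a forward pass through a very small network. This is only an illustration under stated assumptions: the weights, biases, layer sizes and the sigmoid activation are arbitrary hypothetical choices, and no training is performed.

```python
import math

def neuron(inputs, weights, bias):
    """One artificial neuron: weighted sum of its inputs followed by a sigmoid activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs, weight_rows, biases):
    """A layer is a group of neurons that all receive the same inputs."""
    return [neuron(inputs, w, b) for w, b in zip(weight_rows, biases)]

def forward(inputs, layers):
    """Propagate the input through every layer in order and return the final activations."""
    activations = inputs
    for weight_rows, biases in layers:
        activations = layer(activations, weight_rows, biases)
    return activations

# Hypothetical network: 3 inputs -> 2 hidden neurons -> 2 hidden neurons -> 1 output.
# Two hidden layers, i.e. 'deep' in the sense used in the text.
network = [
    ([[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]], [0.0, 0.1]),  # first hidden layer
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),             # second hidden layer
    ([[0.7, -0.3]], [0.2]),                              # output layer
]

output = forward([1.0, 2.0, 3.0], network)
```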
Medical images (such as radiological imaging, histology and endoscopy) are fundamental for both diagnostic and therapeutic purposes [13]. But images are not the only data type that can support decision making in medicine: structured data, as well as unstructured data other than images, are also essential [6]. Real world data [14,15], defined as all data generated outside clinical trials [3,16], are an important source for evidence generation. In particular, electronic health records (or electronic medical records), such as medical reports and nursing diaries, can be a mine of patient information [17]. Medical big data are hence intrinsically multimodal, i.e. they are of heterogeneous types and come from a plethora of sources.
Multimodality is also a property of oncology, particularly for rectal cancer [3,5]. Multimodal integration can be beneficial for the whole medical workflow, in both clinical practice and research: disease understanding, prevention, diagnosis, therapy, prognosis and follow-up. The integration of different roles (including but not limited to physicians, nurses, biomedical technicians and data scientists) is of paramount importance for advancing knowledge and for improving healthcare. Medicine is a multi-source, synthetic, holistic process: it cannot be reduced to images only.
Nonetheless, to the best of our knowledge, all reviews on artificial intelligence applied to rectal cancer focus on models whose input is based solely on images (herein called ‘image models’). This seems to contradict not only the above mentioned medical holism, but also the literature. Indeed, models with only non image input (herein termed ‘non image models’) can show high performance, e.g. in a pan cancer application entirely based on real world data and electronic health records [18], or in rectal cancer specifically, as presented in the Results and Discussion section. Moreover, there are also ‘multimodal’ models, which are multivariate models (analyzing multiple variables) with multiple variable types, i.e. artificial intelligence models whose input is simultaneously based on images and non image components. Such models (in the following named ‘combined models’; in the literature also known as ‘hybrid models’ or ‘holistic models’) can exhibit higher performance than their unimodal, constituent models [6,11,19], highlighting even more the importance of multimodality.
All reviews found [11,12,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36] focus solely on image models, although three of them [11,12,20] briefly mention the existence of combined models. Since non image inputs, whether integrated with images (combined models) or not (non image models), are as essential as image inputs, a review of the existing primary literature about them is needed.
This Review focuses on original articles reporting artificial intelligence models (particularly machine learning algorithms) that have some non image input and that are applied to rectal cancer, without restrictions on study outcome.

2. Results and Discussion

Table 1 and Table 2 present the primary literature (N = 23 articles, for a total of 11,941 patients) on artificial intelligence applied to rectal cancer, with different outcomes and with at least one non image input, i.e. both non image models and combined models.
Table 1 contains general information, while Table 2 provides more in-depth analysis.
In Table 1, regarding aim, most articles address the prediction of the response to neoadjuvant therapy (neoadjuvant chemoradiotherapy, part of the standard treatment of locally advanced rectal cancer); additional purposes include the prediction of metastases and of other rectal cancer features. In particular, response prediction is analyzed in 11 (47.8 %) papers; prediction of clinical outcomes (overall survival, local recurrence and distant metastases) and prediction of risk factors (KRAS mutation, tumor deposits, perineural invasion, extramural venous invasion) are each the purpose of 4 (17.4 %) articles; prediction of lymph node staging is addressed in 3 (13.0 %) papers; and prediction of histological examination in 1 (4.3 %) article.
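The shares quoted above follow directly from the article counts per aim. A minimal sketch of the arithmetic is given below; the dictionary keys are shorthand labels introduced here for illustration, while the counts are those stated in the text.

```python
# Number of articles per study aim, as stated in the text (N = 23 in total).
counts = {
    "response to neoadjuvant therapy": 11,
    "clinical outcomes": 4,
    "risk factors": 4,
    "lymph node staging": 3,
    "histological examination": 1,
}

total = sum(counts.values())

# Share of each aim, as a percentage of all articles, rounded to one decimal place.
percentages = {aim: round(100 * n / total, 1) for aim, n in counts.items()}

for aim, share in percentages.items():
    print(f"{aim}: {counts[aim]}/{total} = {share} %")
```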
Sample size is between hundreds and thousands of patients per study.
The classification based on the TRIPOD (Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis) statement [62] is reported. Generally, based on it, a prediction model (particularly one based on artificial intelligence) can be classified as follows: 1a, training only; 1b, training plus validation based on resampling (cross validation, bootstrapping, ...); 2a, training plus validation based on a random split (between training and validation data sets); 2b, training plus validation based on a non random split (e.g. based on the temporal flow of the patients); 3, training plus external validation (i.e. training and validation are performed on different populations, centers, institutions, even countries and continents); 4, (external) validation only (this applies to an already published prediction model).
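The TRIPOD categories listed above can be expressed as a simple lookup. The sketch below is illustrative only: the function name and the string codes for the validation strategy are hypothetical conventions introduced here, not part of the TRIPOD statement.

```python
# Keys: (does the study develop/train a model?, validation strategy).
# The strategy codes are hypothetical labels chosen for this sketch.
TRIPOD_TYPES = {
    (True, "none"): "1a",             # training only
    (True, "resampling"): "1b",       # cross validation, bootstrapping, ...
    (True, "random_split"): "2a",     # random training/validation split
    (True, "nonrandom_split"): "2b",  # e.g. split by temporal flow of patients
    (True, "external"): "3",          # validation on a different population or center
    (False, "external"): "4",         # validation only, of an already published model
}

def tripod_type(develops_model: bool, validation: str) -> str:
    """Return the TRIPOD analysis type for a study design, or raise if it is not covered."""
    try:
        return TRIPOD_TYPES[(develops_model, validation)]
    except KeyError:
        raise ValueError(
            f"No TRIPOD type for develops_model={develops_model}, validation={validation!r}"
        ) from None

# Example: a model trained in one center and validated in another one.
example = tripod_type(True, "external")
```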
Most of the artificial intelligence models for rectal cancer that were found belong to the combined model category (N = 20), compared to non image models (N = 3). The variability among the chosen artificial intelligence models (i.e. the reason why so many models exist and continue to be used, instead of always choosing one optimal, ideal model) is linked to the so called ‘no free lunch theorem’ [64]. This fundamental theorem holds (in its original form or in analogous formulations) for many artificial intelligence tools (supervised models, unsupervised models, feature selection techniques, ...). It states that no artificial intelligence model performs best across all scenarios, contexts and data sets. This explains the variety of methodologies found in artificial intelligence. The commonest type of model found (69.6 %), including both non image and combined models, is regression (either logistic or Cox); in particular, most models (52.2 %) are of the logistic regression type, which could be due to its interpretability, as model coefficients are directly related to the importance of each predictor. If models are ranked by decreasing number of patients, the first three are instead all of the Cox regression type. Another notable technique is the support vector machine (8.7 %).
Table 2 shows additional information about the articles listed in Table 1, particularly details about each artificial intelligence model in terms of its input(s) and its validation.
For the models having an image component (combined models), the most common source of images is radiological imaging, mainly magnetic resonance imaging (65.0 %) and computed tomography (25.0 %). A widely employed methodology for dealing with images is radiomics; as mentioned above, deep learning is also frequently applied to images.
Performance is expressed through the area under the receiver operating characteristic curve (AUROC, or simply AUC), the concordance index (C index, a generalization of AUC [39]) or the hazard ratio. Both non image models and combined models show relatively good performance (ranges of AUC and C index of 0.70–0.97 and 0.68–0.78, respectively). Interestingly, among combined models, the integrated model (e.g. deep learning plus pathological staging markers [47]) very often (75.0 %) shows higher performance than its individual components. Both these aspects suggest that multimodality is highly desirable, in agreement with the literature [64]. This may hold more generally, as it is also true for images alone, i.e. without any non image input (e.g. where multiple imaging modalities are combined within the same model) [65].
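To make the relationship between the two metrics concrete, the following minimal sketch computes AUC as the fraction of correctly ordered positive/negative pairs, and the C index as the analogous fraction over comparable patient pairs in censored time-to-event data (Harrell's formulation, assumed here; the reviewed studies may use other estimators). All data values are hypothetical.

```python
def auc(labels, scores):
    """AUC: probability that a random positive case scores higher than a random negative one."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half
    return wins / (len(positives) * len(negatives))

def c_index(times, events, risks):
    """C index: fraction of comparable pairs in which the earlier event has the higher risk."""
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        for j in range(len(times)):
            # A pair is comparable when patient i had the event before patient j's last follow-up.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("No comparable pairs")
    return concordant / comparable

# Hypothetical binary outcome (1 = event) with predicted scores.
auc_value = auc([1, 1, 0, 0, 1, 0], [0.9, 0.7, 0.6, 0.2, 0.4, 0.5])

# Hypothetical follow-up times, event indicators (0 = censored) and predicted risks.
c_value = c_index([5, 8, 12, 20], [1, 1, 0, 1], [0.9, 0.4, 0.6, 0.1])
```

Both functions count correctly ordered pairs; the C index only differs in restricting the count to pairs whose ordering is actually known despite censoring.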
Some notes on how performance is reported: if the model is externally validated, the performance refers to the external validation; otherwise, to the internal one. If more than one model is present for the same outcome, the best performing model is presented. If more than one outcome is pursued in the study, performance is shown as a range.
It can be noted that most studies (65.2 %) are still monocenter. Multicenter projects tend to be preferable, as they provide better input data in terms of both quantity (number of patients, also pooled [39]) and quality (diversity and heterogeneity of populations, which lowers overfitting and promotes generalizability), and consequently better performance and clinical applicability of the model. In particular, although externally validated designs are encouragingly more frequent than a decade ago, their share should continue to increase, for the reasons stated above. To summarize, more multicenter, externally validated, prospective studies are needed [66].
Regarding data preprocessing, most (82.6 %) articles report at least one methodology.
The most frequently reported methodology type (60.9 %) is feature selection, which seems reasonable, given the well known importance of the ‘curse of dimensionality’. In this context, it translates into the need to empirically find an optimal balance between underfitting (where the proposed model is too simple, hence incapable of accurately describing the data and producing reliable predictions) and overfitting (where the model is instead excessively complex: it copies the training data patterns exactly but is incapable of properly generalizing them to new, unseen data). This trade off is reached by choosing an appropriate number of model predictors; this number should be related to the number of data items (e.g. through quantities such as events per variable).
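The events per variable quantity mentioned above can be computed as in the following sketch. The threshold of 10 events per variable is a commonly quoted rule of thumb and is used here only as an assumption; it is not a value taken from this Review, and the patient numbers are hypothetical.

```python
def events_per_variable(n_events: int, n_predictors: int) -> float:
    """Events per variable (EPV): outcome events divided by candidate predictors."""
    if n_predictors <= 0:
        raise ValueError("n_predictors must be positive")
    return n_events / n_predictors

def max_predictors(n_events: int, min_epv: int = 10) -> int:
    """Largest number of predictors that keeps EPV at or above min_epv (assumed rule of thumb)."""
    if min_epv <= 0:
        raise ValueError("min_epv must be positive")
    return n_events // min_epv

# Hypothetical study: 200 patients, 40 of whom experienced the event, 8 candidate predictors.
epv = events_per_variable(n_events=40, n_predictors=8)
limit = max_predictors(n_events=40)
```

In this hypothetical case the model would have more candidate predictors than the assumed rule of thumb suggests, which is the kind of situation feature selection is meant to address.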
Other reported methods include data normalization and standardization (39.1 %), imputation (13.0 %), augmentation (4.3 %) and binning (4.3 %). Data normalization consists in mathematically rescaling the data of one variable to the [0, 1] interval; data standardization rescales them to mean 0 and standard deviation 1 (as in a standard Gaussian distribution). Data imputation is a set of techniques that aim to handle missing values by estimating them with various mathematical methods (the simplest being to substitute a missing value in a feature with the feature median or mean). Data augmentation is the synthetic generation of data, which can be particularly useful in the case of class imbalance: in a binary outcome scenario, this means that the proportion of data items with a certain outcome value (label) is significantly different from that of items with the other, complementary label.
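Three of the preprocessing steps described above (median imputation, normalization and standardization) can be sketched on a single feature as follows. The function names and the data are hypothetical; missing values are represented by `None`, and the population standard deviation is used for standardization as a simplifying assumption.

```python
import statistics

def impute_median(values):
    """Replace missing values (None) with the median of the observed values."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]

def normalize(values):
    """Min-max normalization: rescale the values to the [0, 1] interval."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant feature: no spread to rescale
    return [(v - lo) / (hi - lo) for v in values]

def standardize(values):
    """Standardization: rescale the values to mean 0 and standard deviation 1."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mean) / sd for v in values]

# Hypothetical feature with one missing value.
raw = [2.0, None, 4.0, 10.0, 4.0]

completed = impute_median(raw)
normalized = normalize(completed)
standardized = standardize(completed)
```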
As previously mentioned, the rationale for exploring and analyzing the primary articles presented above is that all the secondary literature papers found (N = 19) [11,12,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36] focus solely on image models; at most, a few of them briefly mention the existence of combined models, but none of them reports on non image models. Thus, a detailed review of artificial intelligence models whose input has at least one non image element was needed. This provided the motivation for the present study. The secondary literature analysis is summarized in Table 3. As a note, almost half of the reviews found (42.1 %) deal with radiomics.

3. Conclusion

While there are no reviews dealing with artificial intelligence models applied to rectal cancer with non image inputs, the latter, alone (non image models) or together with image inputs (combined models), can significantly and reliably describe medical data sets, predicting clinical outcomes with high performance, comparable to that of image models. Moreover, combined models usually perform better than their constituent parts; this argues for the synergistic, holistic use of multimodal data sources, to describe reality more comprehensively (even one as complex as human cancer) and to find actionable features. Hence, multimodality, in both artificial intelligence and oncological management, is a fundamental approach to benefit patients.
From a clinical perspective on the results obtained, rectal cancer is well suited as a pathological model for artificial intelligence methodologies, for several reasons: the necessity of multimodal therapies (radiotherapy, chemotherapy and surgery are all involved); the opportunity to intensify therapies based on risk (adjuvant chemotherapy yes/no, demolitive surgery yes/no, radiotherapy dose escalation yes/no); and a good chance of cure, hence the need to identify both high risk patients, in order to cure them, and low risk patients, in order to avoid unneeded toxicity. For these reasons, the literature has chosen to investigate, as clinical questions for artificial intelligence techniques, the stratification of outcomes, the prediction of risk factors present at histological examination, the prediction of lymph node staging and response prediction.
To conclude, within artificial intelligence applied to rectal cancer, different aspects need to be integrated: multiple data types (image and non image inputs); several medical procedures (neoadjuvant chemoradiotherapy and surgery); and complementary branches of knowledge (multidisciplinary clinical board and research team). More generally, this may also be true for other oncological and non oncological diseases. In the current, ever more patient-centric viewpoint, it is ultimately beneficial to consider sources and features beyond images.

4. Methods

Articles on artificial intelligence applied to rectal cancer were searched on the PubMed website (comprising the MEDLINE database plus others), without limits on the time period, restricted to the English language, through the following comprehensive search string: rectal-cancer AND (artificial-intelligence OR machine-learning OR deep-learning OR predictive-model). For secondary literature, the articles found were filtered within PubMed by all three of the following literature types: reviews, systematic reviews and meta analyses.
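The search string above can be assembled programmatically, as in the following sketch. The search in this Review was performed on the PubMed website; building a request URL for the NCBI E-utilities `esearch` endpoint is shown only as an assumption about how the same query could be reproduced, and no request is actually sent. The helper names and the `retmax` value are hypothetical.

```python
from urllib.parse import urlencode

DISEASE_TERM = "rectal-cancer"
METHOD_TERMS = [
    "artificial-intelligence",
    "machine-learning",
    "deep-learning",
    "predictive-model",
]

def build_query(disease, methods):
    """Combine the disease term with the OR-joined method terms, as in the search string."""
    return f"{disease} AND ({' OR '.join(methods)})"

def esearch_url(query, retmax=100):
    """Build (but do not send) an E-utilities esearch request URL for PubMed."""
    base = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
    params = {"db": "pubmed", "term": query, "retmax": retmax}
    return f"{base}?{urlencode(params)}"

query = build_query(DISEASE_TERM, METHOD_TERMS)
url = esearch_url(query)
```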

References

  1. Sung H et al 2021. Global Cancer Statistics 2020: GLOBOCAN Estimates of Incidence and Mortality Worldwide for 36 Cancers in 185 Countries. CA: A Cancer Journal for Clinicians 71(3): 209–49. [CrossRef]
  2. Vos T et al 2020. Global Burden of 369 Diseases and Injuries in 204 Countries and Territories, 1990–2019: A Systematic Analysis for the Global Burden of Disease Study 2019. The Lancet 396(10258): 1204–22. [CrossRef]
  3. Li A et al 2020. Clinical Trial Design: Past, Present, and Future in the Context of Big Data and Precision Medicine. Cancer 126(22): 4838–46. [CrossRef]
  4. Damiani A et al 2018. Large Databases (Big Data) and Evidence-Based Medicine. European Journal of Internal Medicine 53: 1–2. [CrossRef]
  5. De Maria Marchiano R et al 2021. Translational Research in the Era of Precision Medicine: Where We Are and Where We Will Go. Journal of Personalized Medicine 11(3): 216. [CrossRef]
  6. Acosta JN et al 2022. Multimodal Biomedical AI. Nature Medicine 28(9): 1773–84. [CrossRef]
  7. Lambin P et al 2013. Predicting Outcomes in Radiation Oncology—Multifactorial Decision Support Systems. Nature Reviews Clinical Oncology 10(1): 27–40. [CrossRef]
  8. Elmore JG et al 2021. Data Quality, Data Sharing, and Moving Artificial Intelligence Forward. JAMA Network Open 4(8): e2119345. [CrossRef]
  9. Torkamani A et al 2017. High-Definition Medicine. Cell 170(5): 828–843. [CrossRef]
  10. Cuocolo R et al 2020. Machine Learning in Oncology: A Clinical Appraisal. Cancer Letters 481: 55–62. [CrossRef]
  11. Lambin P et al 2017. Radiomics: The Bridge between Medical Imaging and Personalized Medicine. Nature Reviews Clinical Oncology 14(12): 749–62. [CrossRef]
  12. Meldolesi E et al 2016. Standardized Data Collection to Build Prediction Models in Oncology: A Prototype for Rectal Cancer. Future Oncology 12(1): 119–36. [CrossRef]
  13. Cusumano D et al 2021. A Field Strength Independent MR Radiomics Model to Predict Pathological Complete Response in Locally Advanced Rectal Cancer. La Radiologia Medica 126(3): 421–29. [CrossRef]
  14. Valentini V et al 2024. Four Steps in the Evolution of Rectal Cancer Managements through 40 Years of Clinical Practice: Pioneering, Standardization, Challenges and Personalization. Radiotherapy and Oncology 194: 110190. [CrossRef]
  15. Savino M et al 2023. A Process Mining Approach for Clinical Guidelines Compliance: Real-World Application in Rectal Cancer. Frontiers in Oncology 13: 1090076. [CrossRef]
  16. Ramagopalan SV et al 2020. Can Real-World Data Really Replace Randomised Clinical Trials? BMC Medicine 18(1): 13. [CrossRef]
  17. Penberthy LT et al 2022. An Overview of Real-world Data Sources for Oncology and Considerations for Research. CA: A Cancer Journal for Clinicians 72(3): 287–300. [CrossRef]
  18. Morin O et al 2021. An Artificial Intelligence Framework Integrating Longitudinal Electronic Health Records with Real-World Data Enables Continuous Pan-Cancer Prognostication. Nature Cancer 2(7): 709–22. [CrossRef]
  19. Bi WL et al 2019. Artificial Intelligence in Cancer Imaging: Clinical Challenges and Applications. CA: A Cancer Journal for Clinicians 69(2): 127–57. [CrossRef]
  20. Jia LL et al 2022. Artificial Intelligence with Magnetic Resonance Imaging for Prediction of Pathological Complete Response to Neoadjuvant Chemoradiotherapy in Rectal Cancer: A Systematic Review and Meta-Analysis. Frontiers in Oncology 12: 1026216. [CrossRef]
  21. Bedrikovetski S et al 2021. Artificial Intelligence for Pre-Operative Lymph Node Staging in Colorectal Cancer: A Systematic Review and Meta-Analysis. BMC Cancer 21(1): 1058. [CrossRef]
  22. Coppola F et al 2021. Radiomics and Magnetic Resonance Imaging of Rectal Cancer: From Engineering to Clinical Practice. Diagnostics 11(5): 756. [CrossRef]
  23. Kalantar R et al 2021. Automatic Segmentation of Pelvic Cancers Using Deep Learning: State-of-the-Art Approaches and Challenges. Diagnostics 11(11): 1964. [CrossRef]
  24. Kuntz S et al 2021. Gastrointestinal Cancer Classification and Prognostication from Histology Using Deep Learning: Systematic Review. European Journal of Cancer 155: 200–215. [CrossRef]
  25. Kwok HC et al 2022. Rectal MRI Radiomics Inter- and Intra-Reader Reliability: Should We Worry about That? Abdominal Radiology 47(6): 2004–13. [CrossRef]
  26. Miranda J et al 2022. Rectal MRI Radiomics for Predicting Pathological Complete Response: Where We Are. Clinical Imaging 82: 141–49. [CrossRef]
  27. Namikawa K et al 2020. Utilizing Artificial Intelligence in Endoscopy: A Clinician’s Guide. Expert Review of Gastroenterology & Hepatology 14(8): 689–706. [CrossRef]
  28. Pacal I et al 2020. A Comprehensive Review of Deep Learning in Colon Cancer. Computers in Biology and Medicine 126: 104003. [CrossRef]
  29. Qin Y et al 2022. Review of Radiomics- and Dosiomics-Based Predicting Models for Rectal Cancer. Frontiers in Oncology 12: 913683. [CrossRef]
  30. Reginelli A et al 2021. Radiomics as a New Frontier of Imaging for Cancer Prognosis: A Narrative Review. Diagnostics 11(10): 1796. [CrossRef]
  31. Staal FCR et al 2021. Radiomics for the Prediction of Treatment Outcome and Survival in Patients With Colorectal Cancer: A Systematic Review. Clinical Colorectal Cancer 20(1): 52–71. [CrossRef]
  32. Stanzione A et al 2021. Radiomics and Machine Learning Applications in Rectal Cancer: Current Update and Future Perspectives. World Journal of Gastroenterology 27(32): 5306–21. [CrossRef]
  33. Tabari A et al 2022. Role of Machine Learning in Precision Oncology: Applications in Gastrointestinal Cancers. Cancers 15(1): 63. [CrossRef]
  34. Wang PP et al 2021. Magnetic Resonance Imaging-Based Artificial Intelligence Model in Rectal Cancer. World Journal of Gastroenterology 27(18): 2122–30. [CrossRef]
  35. Wong C et al 2023. MRI-Based Artificial Intelligence in Rectal Cancer. Journal of Magnetic Resonance Imaging 57(1): 45–56. [CrossRef]
  36. Xu Q et al 2021. MRI Evaluation of Complete Response of Locally Advanced Rectal Cancer After Neoadjuvant Therapy: Current Status and Future Trends. Cancer Management and Research 13: 4317–28. [CrossRef]
  37. Peng J et al 2014. Prognostic Nomograms for Predicting Survival and Distant Metastases in Locally Advanced Rectal Cancers. PLoS ONE 9(8): e106344. [CrossRef]
  38. Sun Y et al 2017. A Nomogram to Predict Distant Metastasis after Neoadjuvant Chemoradiotherapy and Radical Surgery in Patients with Locally Advanced Rectal Cancer. Journal of Surgical Oncology 115(4): 462–69. [CrossRef]
  39. Valentini V et al 2011. Nomograms for Predicting Local Recurrence, Distant Metastases, and Overall Survival for Patients With Locally Advanced Rectal Cancer on the Basis of European Randomized Clinical Trials. Journal of Clinical Oncology 29(23): 3163–72. [CrossRef]
  40. Chen LD et al 2020. Preoperative Prediction of Tumour Deposits in Rectal Cancer by an Artificial Neural Network–Based US Radiomics Model. European Radiology 30(4): 1969–79. [CrossRef]
  41. Cheng Y et al 2021. Multiparametric MRI-Based Radiomics Approaches on Predicting Response to Neoadjuvant Chemoradiotherapy (NCRT) in Patients with Rectal Cancer. Abdominal Radiology 46(11): 5072–85. [CrossRef]
  42. Cui Y et al 2019. Radiomics Analysis of Multiparametric MRI for Prediction of Pathological Complete Response to Neoadjuvant Chemoradiotherapy in Locally Advanced Rectal Cancer. European Radiology 29(3): 1211–20. [CrossRef]
  43. Dinapoli N et al 2018. Magnetic Resonance, Vendor-Independent, Intensity Histogram Analysis Predicting Pathologic Complete Response After Radiochemotherapy of Rectal Cancer. International Journal of Radiation Oncology*Biology*Physics 102(4): 765–74. [CrossRef]
  44. Ding L et al 2020. A Deep Learning Nomogram Kit for Predicting Metastatic Lymph Nodes in Rectal Cancer. Cancer Medicine 9(23): 8809–20. [CrossRef]
  45. Huang YQ et al 2016. Development and Validation of a Radiomics Nomogram for Preoperative Prediction of Lymph Node Metastasis in Colorectal Cancer. Journal of Clinical Oncology 34(18): 2157–64. [CrossRef]
  46. Jin C et al 2021. Predicting Treatment Response from Longitudinal Images Using Multi-Task Deep Learning. Nature Communications 12(1): 1851.
  47. Kleppe A et al 2022. A Clinical Decision Support System Optimising Adjuvant Chemotherapy for Colorectal Cancers by Integrating Deep Learning and Pathological Staging Markers: A Development and Validation Study. The Lancet Oncology 23(9): 1221–32.
  48. Li M et al 2021. Radiomics for Predicting Perineural Invasion Status in Rectal Cancer. World Journal of Gastroenterology 27(33): 5610–21.
  49. Liu H et al 2022. A Deep Learning Model Based on MRI and Clinical Factors Facilitates Noninvasive Evaluation of KRAS Mutation in Rectal Cancer. Journal of Magnetic Resonance Imaging 56(6): 1659–68.
  50. Liu S et al 2021. Machine Learning-Based Radiomics Nomogram for Detecting Extramural Venous Invasion in Rectal Cancer. Frontiers in Oncology 11: 610338.
  51. Liu X et al 2021. Deep Learning Radiomics-Based Prediction of Distant Metastasis in Patients with Locally Advanced Rectal Cancer after Neoadjuvant Chemoradiotherapy: A Multicentre Study. eBioMedicine 69: 103442.
  52. Mao Y et al 2022. Pre-Treatment Computed Tomography Radiomics for Predicting the Response to Neoadjuvant Chemoradiation in Locally Advanced Rectal Cancer: A Retrospective Study. Frontiers in Oncology 12: 850774.
  53. Peterson KJ et al 2023. Predicting Neoadjuvant Treatment Response in Rectal Cancer Using Machine Learning: Evaluation of MRI-Based Radiomic and Clinical Models. Journal of Gastrointestinal Surgery 27(1): 122–30.
  54. van Stiphout RGPM et al 2011. Development and External Validation of a Predictive Model for Pathological Complete Response of Rectal Cancer Patients Including Sequential PET-CT Imaging. Radiotherapy and Oncology 98(1): 126–33.
  55. van Stiphout RGPM et al 2014. Nomogram Predicting Response after Chemoradiotherapy in Rectal Cancer Using Sequential PETCT Imaging: A Multicentric Prospective Study with External Validation. Radiotherapy and Oncology 113(2): 215–22.
  56. Wan L et al 2019. Developing a Prediction Model Based on MRI for Pathological Complete Response after Neoadjuvant Chemoradiotherapy in Locally Advanced Rectal Cancer. Abdominal Radiology 44(9): 2978–87.
  57. Wei Q et al 2023. External Validation and Comparison of MR-Based Radiomics Models for Predicting Pathological Complete Response in Locally Advanced Rectal Cancer: A Two-Centre, Multi-Vendor Study. European Radiology 33(3): 1906–17.
  58. Wei Q et al 2023. Preoperative MR Radiomics Based on High-Resolution T2-Weighted Images and Amide Proton Transfer-Weighted Imaging for Predicting Lymph Node Metastasis in Rectal Adenocarcinoma. Abdominal Radiology 48(2): 458–70.
  59. Yi X et al 2019. MRI-Based Radiomics Predicts Tumor Response to Neoadjuvant Chemoradiotherapy in Locally Advanced Rectal Cancer. Frontiers in Oncology 9: 552.
  60. Roselló S et al 2018. Integrating Downstaging in the Risk Assessment of Patients With Locally Advanced Rectal Cancer Treated With Neoadjuvant Chemoradiotherapy: Validation of Valentini’s Nomograms and the Neoadjuvant Rectal Score. Clinical Colorectal Cancer 17(2): 104–112.e2.
  61. Tang B et al 2022. Local Tuning of Radiomics-Based Model for Predicting Pathological Response to Neoadjuvant Chemoradiotherapy in Locally Advanced Rectal Cancer. BMC Medical Imaging 22(1): 44.
  62. Collins GS et al 2015. Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD): The TRIPOD Statement. BMJ 350: g7594.
  63. Montesinos-López OA et al 2022. Accounting for Correlation Between Traits in Genomic Prediction. Methods in Molecular Biology 2467: 285–327.
  64. Lipkova J et al 2022. Artificial Intelligence for Multimodal Data Integration in Oncology. Cancer Cell 40(10): 1095–1110.
  65. Feng L et al 2022. Development and Validation of a Radiopathomics Model to Predict Pathological Complete Response to Neoadjuvant Chemoradiotherapy in Locally Advanced Rectal Cancer: A Multicentre Observational Study. The Lancet Digital Health 4(1): e8–17.
  66. Kim DW et al 2019. Design Characteristics of Studies Reporting the Performance of Artificial Intelligence Algorithms for Diagnostic Analysis of Medical Images: Results from Recently Published Papers. Korean Journal of Radiology 20(3): 405.
Figure 1. Schematic representation of a typical artificial neural network with multiple hidden layers (i.e., deep learning). Circles represent perceptrons.
Table 1. Literature about non image artificial intelligence with application to rectal cancer.
| Reference | First author | Year | Journal | Aim | Patients (total) | Patients (training) | Patients (validation) | TRIPOD | AI model(s) |
|---|---|---|---|---|---|---|---|---|---|
| NIMs | | | | | | | | | |
| 37 | Peng J | 2014 | PLOS One | Predict OS, DMs (and LR) | 917 | 833 | 84 | 2b | Cox regression |
| 38 | Sun Y | 2017 | Journal of Surgical Oncology | Predict DMs after nCRT | 522 | 425 | 97 | 2b | Cox regression |
| 39 | Valentini V | 2011 | Journal of Clinical Oncology | Predict OS, LR, DMs | 2795 | 2242 | 553 | 3 | Cox regression |
| CMs | | | | | | | | | |
| 40 | Chen LD | 2020 | European Radiology | Predict TDs | 127 | 87 | 40 | 2b | ANN |
| 41 | Cheng Y | 2021 | Abdominal Radiology | Predict response (particularly, pCR) to nCRT | 193 | 128 | 65 | 2a | Logistic regression |
| 42 | Cui Y | 2019 | European Radiology | Predict response (particularly, pCR) to nCRT | 186 | 131 | 55 | 2a | Logistic regression |
| 43 | Dinapoli N | 2018 | International Journal of Radiation Oncology | Predict response (particularly, pCR) to nCRT | 221 | 162 | 59 | 3 | GLM |
| 44 | Ding L | 2020 | Cancer Medicine | Predict preoperative LN metastases | 545 | 362 | 183 | 2a | Logistic regression, DL |
| 45 | Huang YQ | 2016 | Journal of Clinical Oncology | Predict preoperative LN metastases | 526 | 326 | 200 | 2b | Logistic regression |
| 46 | Jin C | 2021 | Nature Communications | Predict response (particularly, pCR) to nCRT | 622 | 321 | 301 | 3 | DL |
| 47 | Kleppe A | 2022 | Lancet Oncology | Optimize adjuvant therapy | 2072 | 997 | 1075 | 3 | Cox regression, DL |
| 48 | Li M | 2021 | World Journal of Gastroenterology | Predict PNI | 303 | 242 | 61 | 2a | Logistic regression |
| 49 | Liu H | 2022 | Journal of Magnetic Resonance Imaging | Evaluate KRAS mutation | 376 | 288 | 88 | 2a | Logistic regression, DL |
| 50 | Liu S | 2021 | Frontiers in Oncology | Detect preoperative EMVI | 281 | 198 | 83 | 2b | Logistic regression |
| 51 | Liu X | 2021 | Lancet EBioMedicine | Predict DMs after nCRT | 235 | 170 | 65 | 3 | DL |
| 52 | Mao Y | 2022 | Frontiers in Oncology | Predict response (particularly, pCR) to nCRT | 216 | 151 | 65 | 2a | Logistic regression |
| 53 | Peterson KJ | 2023 | Journal of Gastrointestinal Surgery | Predict response (particularly, pCR) to nCRT | 131 | 111 | 20 | 2a | Logistic regression |
| 54 | van Stiphout RGPM | 2011 | Radiotherapy & Oncology | Predict response (particularly, pCR) to nCRT | 953 | Various groupings | Various groupings | 3 | SVM |
| 55 | van Stiphout RGPM | 2014 | Radiotherapy & Oncology | Predict response (particularly, pCR) to nCRT | 190 | 112 | 78 | 3 | Logistic regression |
| 56 | Wan L | 2019 | Abdominal Radiology | Predict response (particularly, pCR) to nCRT | 120 | 84 | 36 | 2b | Logistic regression |
| 57 | Wei Q | 2023 | European Radiology | Predict response (particularly, pCR) to nCRT | 151 | 100 | 51 | 3 (plus 2a) | RF |
| 58 | Wei Q | 2023 | Abdominal Radiology | Predict preoperative LN metastases | 125 | 80 | 45 | 2a | Logistic regression |
| 59 | Yi X | 2019 | Frontiers in Oncology | Predict response (particularly, pCR) to nCRT | 134 | 101 | 33 | 2a | SVM, RF, LASSO |
Legend. DM = Distant metastasis. EMVI = Extramural venous invasion. GLM = General linear model. LASSO = Least absolute shrinkage and selection operator. LN = Lymph node. LR = Local recurrence. nCRT = Neoadjuvant chemoradiotherapy. OS = Overall survival. pCR = Pathological complete response. PNI = Perineural invasion. RF = Random forest. SVM = Support vector machine. TD = Tumor deposit. TRIPOD = Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis.
Table 2. Details about studies listed in Table 1.
| Reference | First author | Year | Model input(s): non images? | Model input(s): images? | CM better? | Performance | Number of centers | External validation(s)? | Note |
|---|---|---|---|---|---|---|---|---|---|
| NIMs | | | | | | | | | |
| [37] | Peng J | 2014 | Yes (demographic and clinicopathological) | | | C-Index = 0.73–0.76 | 1 | No | |
| [38] | Sun Y | 2017 | Yes (clinicopathological) | | | C-Index = 0.71 | 1 | No | |
| [39] | Valentini V | 2011 | Yes | | | C-Index = 0.68–0.73 | 5 | Yes | Subsequently re-validated by Reference [60] |
| CMs | | | | | | | | | |
| [40] | Chen LD | 2020 | Yes (clinical) | Yes (radiomics, ANN, US) | Yes | AUC = 0.80 | 1 | No | |
| [41] | Cheng Y | 2021 | Yes | Yes (radiomics, MRI) | Yes | AUC = 0.91–0.94 | 1 | No | |
| [42] | Cui Y | 2019 | Yes (from EHRs) | Yes (radiomics, MRI) | Yes | AUC = 0.97 | 1 | No | |
| [43] | Dinapoli N | 2018 | Yes (clinical) | Yes (radiomics, MRI) | | AUC = 0.75 | 3 | Yes | Subsequently re-validated by Reference [61] |
| [44] | Ding L | 2020 | Yes | Yes (DL, MRI) | | AUC = 0.89–0.92 | 1 | No | |
| [45] | Huang YQ | 2016 | Yes (clinicopathological) | Yes (radiomics, CT) | | C-Index = 0.78 | 1 | No | |
| [46] | Jin C | 2021 | Yes (blood tumor markers) | Yes (DL, MRI) | Yes | AUC = 0.97 | 3 | Yes | |
| [47] | Kleppe A | 2022 | Yes (markers) | Yes (DL, histopathology) | Yes | HR = 3.06–10.71 | 3 | Yes | |
| [48] | Li M | 2021 | Yes (clinical) | Yes (radiomics, CT) | Yes | AUC = 0.80 | 1 | No | |
| [49] | Liu H | 2022 | Yes (clinical) | Yes (DL, MRI) | Yes | AUC = 0.84 | 1 | No | |
| [50] | Liu S | 2021 | Yes (clinical) | Yes (radiomics, MRI) | Yes | AUC = 0.86 | 1 | No | |
| [51] | Liu X | 2021 | Yes (clinicopathological) | Yes (radiomics, DL, MRI) | Yes | C-Index = 0.78 | 3 | Yes | |
| [52] | Mao Y | 2022 | Yes (clinicopathological) | Yes (radiomics, CT) | Yes | AUC = 0.87 | 1 | No | |
| [53] | Peterson KJ | 2023 | Yes (clinical [from EHRs]) | Yes (radiomics, MRI) | Yes | AUC = 0.73 | 1 | No | |
| [54] | van Stiphout RGPM | 2011 | Yes (clinical) | Yes (PET-CT) | Yes | AUC = 0.86 | 4 | Yes | |
| [55] | van Stiphout RGPM | 2014 | Yes (clinical) | Yes (PET-CT) | | AUC = 0.70 | 2 | Yes | |
| [56] | Wan L | 2019 | Yes (clinical) | Yes (MRI) | Yes | AUC = 0.84 | 1 | No | |
| [57] | Wei Q | 2023 | Yes (clinical) | Yes (radiomics, MRI) | Yes | AUC = 0.87 | 2 | Yes | |
| [58] | Wei Q | 2023 | Yes (clinical) | Yes (radiomics, MRI) | Yes | AUC = 0.85 | 1 | No | |
| [59] | Yi X | 2019 | Yes (radiological, clinicopathological) | Yes (radiomics, MRI) | | AUC = 0.90–0.93 | 1 | No | |
Legend. AUC = Area under the ROC curve. C-Index = Concordance index. CT = Computerized tomography. HR = Hazard ratio. MRI = Magnetic resonance imaging. PET = Positron emission tomography. ROC = Receiver operating characteristic. US = Ultrasound.
Table 3. Review articles, dealing with artificial intelligence models of rectal cancer, found in our initial secondary literature evaluation.
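| Reference | First author | Year | Journal | IMs | CMs | NIMs | Notes |
|---|---|---|---|---|---|---|---|
| [11] | Lambin P | 2017 | Nature Reviews Clinical Oncology | Main focus | Briefly mentioned | Not present | Radiomics |
| [12] | Meldolesi E | 2016 | Future Oncology | Main focus | Briefly mentioned | Not present | |
| [20] | Jia LL | 2022 | Frontiers in Oncology | Main focus | Briefly mentioned | Not present | Systematic review (with meta-analysis) |
| [21] | Bedrikovetski S | 2021 | BMC Cancer | Main focus | Briefly mentioned | Not present | |
| [22] | Coppola F | 2021 | Diagnostics | Main focus | Briefly mentioned | Not present | Radiomics |
| [23] | Kalantar R | 2021 | Diagnostics | Main focus | Briefly mentioned | Not present | Deep learning |
| [24] | Kuntz S | 2021 | European Journal of Cancer | Main focus | Briefly mentioned | Not present | Deep learning; Systematic review |
| [25] | Kwok HC | 2022 | Abdominal Radiology | Main focus | Briefly mentioned | Not present | Radiomics |
| [26] | Miranda J | 2022 | Clinical Imaging | Main focus | Briefly mentioned | Not present | |
| [27] | Namikawa K | 2020 | Expert Review of Gastroenterology & Hepatology | Main focus | Briefly mentioned | Not present | |
| [28] | Pacal I | 2020 | Computers in Biology and Medicine | Main focus | Briefly mentioned | Not present | Deep learning |
| [29] | Qin Y | 2022 | Frontiers in Oncology | Main focus | Briefly mentioned | Not present | Radiomics |
| [30] | Reginelli A | 2021 | Diagnostics | Main focus | Briefly mentioned | Not present | |
| [31] | Staal FCR | 2021 | Clinical Colorectal Cancer | Main focus | Briefly mentioned | Not present | Radiomics; Systematic review |
| [32] | Stanzione A | 2021 | World Journal of Gastroenterology | Main focus | Briefly mentioned | Not present | Radiomics |
| [33] | Tabari A | 2022 | Cancers | Main focus | Briefly mentioned | Not present | |
| [34] | Wang PP | 2021 | World Journal of Gastroenterology | Main focus | Briefly mentioned | Not present | |
| [35] | Wong C | 2023 | Journal of Magnetic Resonance Imaging | Main focus | Briefly mentioned | Not present | |
| [36] | Xu Q | 2021 | Cancer Management and Research | Main focus | Briefly mentioned | Not present | |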
Legend. CM = Combined model. IM = Image model. NIM = Non image model.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.