Submitted: 02 June 2026
Posted: 03 June 2026
Abstract
Keywords:
1. Introduction
2. Materials and Methods: Scope, Evidence Selection, and Narrative Synthesis
3. Meta-Scientific Mechanisms of Evidentiary Fragility
4. Why Radiomics Is Structurally Exposed
4.1. Combinatorial Analytical Flexibility
4.2. Acquisition-Dependent Measurement Instability
4.3. Segmentation Variability
4.4. Retrospective, Monocentric Data Dominance
4.5. Small Samples Relative to Analytic Complexity
4.6. Leakage-Prone Validation
4.7. Weak Linkage to Clinical Decision-Making
5. Empirical Signatures of Evidentiary Fragility
5.1. Quality, Publication Bias, and Validation
5.2. Reporting Transparency
5.3. Benchmarks and Translation
5.4. Retractions as an Ecosystem Signal
6. From Recurrent Patterns to Systemic Interpretation
6.1. Paper-Grade Evidence as the Product of a Self-Reinforcing System
7. Standards and Safeguards Across the Radiomics Pipeline
7.1. Measurement, Reporting, and Statistical Planning
7.2. Leakage-Free Model Development
7.3. Validation Hierarchy, Calibration, and Clinical Utility
7.4. Open Science and Preregistration
7.5. Testing Robustness to Analytic Flexibility
7.6. The Shifting Technical Frontier: Foundation Models and Generative AI
8. Implications for the Field: From Available Standards to Routine Adoption
9. Limitations and Boundary Conditions
10. Conclusions
| Mechanism of evidentiary fragility | Expected ecosystem-level signature | Representative measured evidence in radiomics |
|---|---|---|
| 1. Persistently low and uneven methodological maturity | Formal standards, reporting frameworks, and quality tools expand, yet the average methodological quality of the published literature remains low and varies across radiomics subdomains. | Large-scale RQS evidence remains consistently poor: mean RQS 26.1% across 3258 assessments, with only 7.2% reaching ≥50% of the maximum score [75]; median RQS 31% across 1574 publications [74]; delta-radiomics median RQS 25%, with 51.2% of studies scoring <25% [41]; endometrial MRI radiomics mean RQS 13.77 [146]. |
| 2. Positive-result selection and publication bias | Null, negative, and non-superior findings are selectively underrepresented, while positive claims dominate the published record and are often insufficiently benchmarked against simpler alternatives. | Positive findings dominate the literature: the NEVER study found negative results in only 1/149 studies (0.7%) and no non-radiomic comparator in 44% [60]; glioma radiomics reported positive findings in 26/27 studies (96%) [84]; cardiovascular radiomics showed funnel-plot asymmetry by Egger’s test (z = −2.39, p = 0.017) [83]. |
| 3. Leakage-prone analytical flexibility | Researcher degrees of freedom and workflow errors that compromise separation between model development and evaluation inflate apparent performance. | Empirical leakage studies show substantial optimism: feature selection outside cross-validation inflated AUC by up to 0.15 [68]; oversampling before cross-validation biased AUC by up to 0.34, produced AUCs up to 0.90 with random outcomes, and was likely present in 5/34 radiomics papers from 2023 [70]. |
| 4. Underpowered model development and winner’s curse | Sample sizes remain too small relative to model complexity, favouring unstable estimates, optimistic effect sizes, and poor transportability. | In 116 binary-outcome radiomics prediction studies, only 11/116 (9.5%) justified sample size and only 6 included an a priori calculation; median training size was 150, median EPP 7.5, median Riley-based shortfall 268 patients, and only 12/116 (10.3%) met all adequacy criteria [66]. |
| 5. External-validation deficit and fragile generalisability | Models are rarely tested on genuinely independent populations; when transported beyond development data, apparent performance commonly attenuates. | External validation was absent in 121/149 NEVER studies (81%) [60]; across 1574 radiomics publications, only 14% reported independent external validation and 32% lacked any separate validation set [74]. In radiologic deep learning, the median change in AUC on external validation was −0.046, with performance declining in 70/86 algorithms (81%) [85]. |
| 6. Acquisition-driven measurement instability | Apparent radiomic signal fails to transport across scanners, platforms, or acquisition settings, indicating vulnerability to non-biological measurement variation. | Acquisition effects markedly impair reproducibility: across five CT systems, 97.1% of features were repeatable on test–retest (scan–rescan), but inter-system reproducibility was poor (mean ICC/CCC 0.157 ± 0.174), with no feature reaching ICC/CCC >0.90 across systems [48]; on photon-counting detector CT, no features were robust to high-pitch acquisition or slice-thickness changes [49]; and acquisition effects exceeded segmentation effects in a 481-study reliability review [50]. |
| 7. Reporting opacity, self-assessment inflation, and weak open-science practice | Methodological errors remain difficult to detect, formal self-audit is rare, and independent verification is constrained by incomplete reporting and limited sharing of code and data. | Transparency remains limited: only 7/117 studies (6%) included a self-reported checklist/quality score [61]; CLEAR adoption was 2%, with self-reported adherence exceeding expert-confirmed adherence by 21 percentage points [88]; among 257 studies, 6% shared data/open datasets, 7% shared code, and 3% shared both [89]; 0/195 empirical radiology articles shared analysis scripts [147]. |
| 8. Quality-sensitive heterogeneity in evidence synthesis | Methodological quality is not merely a descriptive deficit: it becomes a measurable source of variation in pooled performance estimates. | Study quality measurably affects pooled evidence: in endometrial MRI radiomics, higher RQS was associated with lower QUADAS-2 risk, more recent publication year, and higher reported performance [146]; and in CT hematoma-expansion deep learning, subgroup analyses showed significant performance differences by segmentation technique and study quality [87]. |
| 9. Weak cumulative evidence and stalled clinical translation | Local methodological fragilities accumulate into low-certainty evidence at synthesis level and limited progression toward clinically embedded use. | Meta-evidence remains low-certainty: among 53 re-performed radiomics meta-analyses, only 3/53 associations (5.7%) were convincing and 43/53 (81%) were weak [86]. Fewer than 20 oncologic radiomics studies used clinical-trial data, and no published model had been prospectively implemented as routine clinical decision support [67]; this is consistent with recent analyses describing a widening publication–translation gap [6]. |
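The sample-size quantities in mechanism 4 (events per predictor parameter and the Riley-based shortfall) can be made concrete with a short calculation. The sketch below is illustrative only: it uses two closed-form criteria as commonly stated for binary-outcome prediction models in the guidance of Riley et al. [29] (expected shrinkage of at least 0.9, and estimation of the overall outcome proportion within ±0.05), and every input value (event count, number of candidate parameters, anticipated Cox-Snell R², prevalence) is a hypothetical assumption rather than a figure from the cited studies. The function names are ours.

```python
import math


def events_per_parameter(n_events, n_parameters):
    """Events per candidate predictor parameter (EPP)."""
    return n_events / n_parameters


def n_for_shrinkage(n_parameters, r2_cs, shrinkage=0.9):
    """Smallest n for which the expected uniform shrinkage factor is >= `shrinkage`.

    Uses n = P / ((S - 1) * ln(1 - R2_CS / S)), where R2_CS is the
    anticipated Cox-Snell R-squared of the model (an assumption the
    analyst must justify, e.g. from earlier studies).
    """
    return math.ceil(
        n_parameters / ((shrinkage - 1.0) * math.log(1.0 - r2_cs / shrinkage))
    )


def n_for_overall_risk(prevalence, margin=0.05):
    """Smallest n that estimates the outcome proportion within +/- margin (95% CI)."""
    return math.ceil((1.96 / margin) ** 2 * prevalence * (1.0 - prevalence))


def shortfall(n_available, *n_required):
    """Patients missing relative to the most demanding criterion (0 if none)."""
    return max(0, max(n_required) - n_available)


if __name__ == "__main__":
    # Hypothetical study: 150 patients, 75 events, 10 candidate parameters,
    # anticipated Cox-Snell R-squared of 0.10.
    epp = events_per_parameter(n_events=75, n_parameters=10)
    need_shrink = n_for_shrinkage(n_parameters=10, r2_cs=0.10)
    need_risk = n_for_overall_risk(prevalence=0.5)
    print("EPP:", epp)
    print("n required (shrinkage criterion):", need_shrink)
    print("n required (overall-risk criterion):", need_risk)
    print("shortfall:", shortfall(150, need_shrink, need_risk))
```

The full Riley framework includes further criteria (for example, a small gap between apparent and optimism-adjusted R²), so the shortfall computed here should be read as a lower bound on what a formal justification would require.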

| Translational stage | Principal vulnerability / failure mode | Minimum operational requirement for a non-exploratory clinical claim | Primary standard(s), guideline(s), or framework(s) | Empirical evidence that the requirement remains insufficiently met |
|---|---|---|---|---|
| 1. Study conception | Analytical flexibility, post hoc hypothesis shaping, selective reporting | Pre-specify the clinical question, population, endpoint, candidate predictors, analysis plan, validation strategy, and primary performance metrics | Preregistration [128]; Registered Reports [140]; TOP guidelines [127] | In broader meta-research, positive findings were reported in 44% of Registered Reports versus 96% of articles published under the standard model [141]. |
| 2. Imaging measurement, feature extraction, and software traceability | Acquisition-, preprocessing-, filter-, and software-dependent feature instability | Use IBSI-compliant definitions; report acquisition, reconstruction, preprocessing, interpolation, discretisation, filters, extraction software, and software version | IBSI Phases 1 and 2 [39,52]; documented open implementation: PyRadiomics [148] | Across photon-counting and dual-energy CT systems, mean inter-system ICC was 0.157, and no feature reached ICC >0.90 at matched dose [48]. Feature values differed across radiomics software implementations [91]. Vendor-dependent quantitative CT differences were also reported outside radiomics [92]. |
| 3. Sample-size adequacy and study positioning | Underpowered model development; unstable estimates; inflated apparent performance | Provide a formal sample-size justification using prediction-model criteria | Riley et al. [29] | Sample-size justification was absent in 90.5% of studies; only 10.3% met strict Riley-based criteria; median shortfall: 268 patients [66]. |
| 4. Model development and leakage-safe internal validation | Data leakage; optimistic bias; non-nested feature selection; misuse of resampling | Nest preprocessing, feature selection, resampling, hyperparameter tuning, and model selection within training folds; use leakage-safe internal validation. | Radiomic-signature safeguards [149]; nested cross-validation principles [117]; PROBAST+AI [118]. | AUC inflation reached +0.34 when oversampling preceded cross-validation [70] and +0.15 when feature selection was applied before cross-validation [68]. |
| 5. Reporting transparency of radiomics and AI methods | Incomplete or non-verifiable pipeline reporting | Report item-by-item against the framework appropriate to the study scope: CLEAR, CLEAR-E3, TRIPOD+AI, and CLAIM where applicable | CLEAR [37]; CLEAR-E3 [116]; TRIPOD+AI [119]; CLAIM 2024 [150] | Only 7/117 radiomics papers (6%) included a self-reported reporting checklist or quality-scoring document [61]. CLEAR adoption reached 2%; self-reported versus expert-confirmed adherence was 91% vs 66% (mean gap, 21 percentage points) [88]. |
| 6. Methodological appraisal | Conflation of reporting quality, methodological quality, and translational maturity | Use structured tools for methodological appraisal and translational readiness, rather than relying on discrimination metrics or narrative claims alone | METRICS [38]; RQS 2.0 and Radiomics Readiness Levels [97] | Median RQS was 31% of the maximum across 1574 publications [74]. In a 2025 diagnostic-accuracy synthesis, study quality assessed with METRICS emerged as a significant source of between-study differences in subgroup analyses [87]. |
| 7. Open science and computational reproducibility | Unavailable code; inaccessible datasets; non-reproducible computational workflows | Share code and data where feasible; otherwise state access restrictions and provide sufficient computational detail for independent re-analysis | FAIR principles [126]; TOP guidelines [127] | In 257 radiomics papers published in leading journals, only 6% shared data and 7% shared code [89]. Private data were used in 91% of papers in the NEVER study [60] and 89% in a separate meta-research sample [61]. In broader radiology and nuclear medicine AI, only 1/161 private-data studies shared the dataset [90]. |
| 8. External validation and transportability | Internal-only evidence; poor transportability across institutions, scanners, and protocols | Perform external validation on independent data; distinguish internal, internal–external, and external validation | Validation hierarchy and clinical prediction-model guidance [120,121] | External validation was absent in 81% [60] and 79% [61] of radiomic studies; only 14% of 1574 publications included external validation [74]. |
| 9. Calibration, clinical benchmarking, and incremental value | Discrimination-only evaluation; absent calibration; untested incremental clinical value | Report calibration alongside discrimination; compare against clinical, non-radiomic, or standard-of-care baselines; quantify added value | Calibration principles [77,123]; CLEAR comparison requirements [37] | Calibration was reported in 1/19 HPV-prediction studies [76] and 2/26 MGMT-prediction studies [64]. No non-radiomic comparator was included in 44% of radiomics studies [60]. In prostate radiotherapy, adding MRI radiomics to a clinical model increased the C-index only from 0.69 to 0.70 [94]. |
| 10. Early clinical evaluation and trial-level evidence | Retrospective performance claims substituted for early clinical evaluation; incomplete protocol and trial reporting | Evaluate live clinical performance, safety, workflow effects, and human-factor consequences before trial-level claims; use AI-specific reporting extensions for interventional protocols and trial reports | DECIDE-AI [122]; SPIRIT-AI [151]; CONSORT-AI [152] | In oncology AI, median SPIRIT-AI concordance was 78.2% across 12 RCT protocols [153]; median combined CONSORT 2010/CONSORT-AI concordance was 82% across 57 RCT reports [154]. |
| 11. Deployment governance, regulatory readiness, and real-world translation | Clinical-readiness claims without deployability, lifecycle governance, or regulatory fitness | Address trustworthiness, robustness, fairness, explainability, traceability, post-deployment monitoring, and applicable regulatory requirements before claiming clinical readiness | FUTURE-AI [155]; GMLP / IMDRF [156]; applicable medical-device regulation, including EU MDR where relevant [157] | The research–clinical translation gap has been described as widening [6]. Routine oncologic implementation remains limited [7,65]. Across radiomics studies, progress in reporting cost-effectiveness analysis has been minimal or non-significant [75]. |
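The requirement in row 4 (nesting feature selection within training folds) can be illustrated with a small simulation. The sketch below is a toy demonstration under stated assumptions, not a reproduction of the cited leakage studies [68,70]: the data are pure noise with labels that carry no signal, the mean-difference filter and nearest-centroid classifier are simple stand-ins chosen so the example needs only the Python standard library, accuracy is reported instead of AUC, and all function names are ours. Because the labels are unrelated to the features, any accuracy well above 0.5 reflects leakage, not predictive value.

```python
import random
import statistics


def make_noise_data(n_samples=40, n_features=500, seed=0):
    """Pure-noise features with balanced labels that carry no signal."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_samples)]
    y = [i % 2 for i in range(n_samples)]
    return X, y


def select_top_features(X, y, rows, k):
    """Rank features by absolute class-mean difference, using only `rows`."""
    pos = [i for i in rows if y[i] == 1]
    neg = [i for i in rows if y[i] == 0]

    def score(j):
        return abs(statistics.fmean(X[i][j] for i in pos)
                   - statistics.fmean(X[i][j] for i in neg))

    return sorted(range(len(X[0])), key=score, reverse=True)[:k]


def centroid_predict(X, y, train, test, feats):
    """Nearest-centroid classifier fitted on `train`, applied to `test`."""
    centroids = {}
    for label in (0, 1):
        members = [i for i in train if y[i] == label]
        centroids[label] = [statistics.fmean(X[i][j] for i in members) for j in feats]
    preds = []
    for i in test:
        dist = {c: sum((X[i][j] - m) ** 2 for j, m in zip(feats, centroids[c]))
                for c in (0, 1)}
        preds.append(min(dist, key=dist.get))
    return preds


def cv_accuracy(X, y, k_feats=10, n_folds=5, leaky=False):
    """Cross-validated accuracy with feature selection outside (leaky) or inside folds."""
    n = len(X)
    folds = [list(range(f, n, n_folds)) for f in range(n_folds)]
    # Leaky variant: features are chosen once, using every sample including test folds.
    global_feats = select_top_features(X, y, list(range(n)), k_feats) if leaky else None
    correct = 0
    for test in folds:
        train = [i for i in range(n) if i not in test]
        # Leakage-safe variant: features are re-selected from the training rows only.
        feats = global_feats if leaky else select_top_features(X, y, train, k_feats)
        preds = centroid_predict(X, y, train, test, feats)
        correct += sum(p == y[i] for p, i in zip(preds, test))
    return correct / n


if __name__ == "__main__":
    X, y = make_noise_data()
    print("selection before CV (leaky):", cv_accuracy(X, y, leaky=True))
    print("selection inside CV (nested):", cv_accuracy(X, y, leaky=False))
```

The same principle extends to the other steps listed in row 4: scaling, oversampling, and hyperparameter tuning would each be fitted on the training rows of a fold and only then applied to its held-out rows.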
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
References
1. Kocak, B.; Baessler, B.; Cuocolo, R.; Mercaldo, N.; Pinto Dos Santos, D. Trends and statistics of artificial intelligence and radiomics research in Radiology, Nuclear Medicine, and Medical Imaging: bibliometric analysis. Eur. Radiol. 2023, 33(11), 7542–55.
2. Lambin, P.; Leijenaar, R.T.H.; Deist, T.M.; Peerlings, J.; de Jong, E.E.C.; van Timmeren, J.; et al. Radiomics: the bridge between medical imaging and personalized medicine. Nat. Rev. Clin. Oncol. 2017, 14(12), 749–62.
3. Gillies, R.J.; Kinahan, P.E.; Hricak, H. Radiomics: Images Are More than Pictures, They Are Data. Radiology 2016, 278(2), 563–77.
4. Ferrari, R.; Trinci, M.; Casinelli, A.; Treballi, F.; Leone, E.; Caruso, D.; et al. Radiomics in radiology: What the radiologist needs to know about technical aspects and clinical impact. Radiol. Med. 2024, 129(12), 1751–65.
5. U.S. Food and Drug Administration; Center for Devices and Radiological Health. Artificial Intelligence-Enabled Medical Devices [Internet]. FDA: Silver Spring (MD), May 2026. Available online: https://www.fda.gov/medical-devices/software-medical-device-samd/artificial-intelligence-enabled-medical-devices.
6. Kocak, B.; Pinto Dos Santos, D.; Dietzel, M. The widening gap between radiomics research and clinical translation: rethinking current practices and shared responsibilities. Eur. J. Radiol. Artif. Intell. 2025, 1, 100004.
7. Huang, E.P.; O’Connor, J.P.B.; McShane, L.M.; Giger, M.L.; Lambin, P.; Kinahan, P.E.; et al. Criteria for the translation of radiomics into clinically useful tests. Nat. Rev. Clin. Oncol. 2023, 20(2), 69–82.
8. Limkin, E.J.; Sun, R.; Dercle, L.; Zacharaki, E.I.; Robert, C.; Reuzé, S.; et al. Promises and challenges for the implementation of computational medical imaging (radiomics) in oncology. Ann. Oncol. 2017, 28(6), 1191–206.
9. Ioannidis, J.P.A.; Ntzani, E.E.; Trikalinos, T.A.; Contopoulos-Ioannidis, D.G. Replication validity of genetic association studies. Nat. Genet. 2001, 29(3), 306–9.
10. Varoquaux, G.; Cheplygina, V. Machine learning for medical imaging: methodological failures and recommendations for the future. npj Digit. Med. 2022, 5(1), 48.
11. Errington, T.M.; Mathur, M.; Soderberg, C.K.; Denis, A.; Perfito, N.; Iorns, E.; et al. Investigating the replicability of preclinical cancer biology. eLife 2021, 10, e71601.
12. Roberts, M.; Driggs, D.; Thorpe, M.; Gilbey, J.; Yeung, M.; Ursprung, S.; et al. Common pitfalls and recommendations for using machine learning to detect and prognosticate for COVID-19 using chest radiographs and CT scans. Nat. Mach. Intell. 2021, 3(3), 199–217.
13. Carp, J. On the Plurality of (Methodological) Worlds: Estimating the Analytic Flexibility of fMRI Experiments. Front. Neurosci. 2012, 6.
14. Ioannidis, J.P.A. Why Most Published Research Findings Are False. PLoS Med. 2005, 2(8), e124.
15. Ioannidis, J.P.A. Why Most Clinical Research Is Not Useful. PLoS Med. 2016, 13(6), e1002049.
16. Munafò, M.R.; Nosek, B.A.; Bishop, D.V.M.; Button, K.S.; Chambers, C.D.; Percie Du Sert, N.; et al. A manifesto for reproducible science. Nat. Hum. Behav. 2017, 1(1), 0021.
17. Van Calster, B.; Steyerberg, E.W.; Wynants, L.; Van Smeden, M. There is no such thing as a validated prediction model. BMC Med. 2023, 21(1), 70.
18. Chalmers, I.; Glasziou, P. Avoidable waste in the production and reporting of research evidence. Lancet 2009, 374(9683), 86–9.
19. Smaldino, P.E.; McElreath, R. The natural selection of bad science. R. Soc. Open Sci. 2016, 3(9), 160384.
20. Edwards, M.A.; Roy, S. Academic Research in the 21st Century: Maintaining Scientific Integrity in a Climate of Perverse Incentives and Hypercompetition. Environ. Eng. Sci. 2017, 34(1), 51–61.
21. Dickersin, K. The Existence of Publication Bias and Risk Factors for Its Occurrence. JAMA 1990, 263(10), 1385.
22. Song, F.; Parekh, S.; Hooper, L.; Loke, Y.; Ryder, J.; Sutton, A.; et al. Dissemination and publication of research findings: an updated review of related biases. Health Technol. Assess. 2010, 14(8).
23. Dwan, K.; Gamble, C.; Williamson, P.R.; Kirkham, J.J.; the Reporting Bias Group. Systematic Review of the Empirical Evidence of Study Publication Bias and Outcome Reporting Bias – An Updated Review. PLoS ONE 2013, 8(7).
24. Turner, E.H.; Matthews, A.M.; Linardatos, E.; Tell, R.A.; Rosenthal, R. Selective Publication of Antidepressant Trials and Its Influence on Apparent Efficacy. N. Engl. J. Med. 2008, 358(3), 252–60.
25. Simmons, J.P.; Nelson, L.D.; Simonsohn, U. False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant. Psychol. Sci. 2011, 22(11), 1359–66.
26. Gelman, A.; Loken, E. The Statistical Crisis in Science. Am. Sci. 2014, 102(6), 460.
27. Ioannidis, J.P.A. Why Most Discovered True Associations Are Inflated. Epidemiology 2008, 19(5), 640–8.
28. Button, K.S.; Ioannidis, J.P.A.; Mokrysz, C.; Nosek, B.A.; Flint, J.; Robinson, E.S.J.; et al. Power failure: why small sample size undermines the reliability of neuroscience. Nat. Rev. Neurosci. 2013, 14(5), 365–76.
29. Riley, R.D.; Ensor, J.; Snell, K.I.E.; Harrell, F.E.; Martin, G.P.; Reitsma, J.B.; et al. Calculating the sample size required for developing a clinical prediction model. BMJ 2020, 368, m441.
30. Kerr, N.L. HARKing: Hypothesizing After the Results are Known. Pers. Soc. Psychol. Rev. 1998, 2(3), 196–217.
31. John, L.K.; Loewenstein, G.; Prelec, D. Measuring the Prevalence of Questionable Research Practices With Incentives for Truth Telling. Psychol. Sci. 2012, 23(5), 524–32.
32. Banks, G.C.; Rogelberg, S.G.; Woznyj, H.M.; Landis, R.S.; Rupp, D.E. Editorial: Evidence on Questionable Research Practices: The Good, the Bad, and the Ugly. J. Bus. Psychol. 2016, 31(3), 323–38.
33. Greenberg, S.A. How citation distortions create unfounded authority: analysis of a citation network. BMJ 2009, 339, b2680.
34. Kunda, Z. The case for motivated reasoning. Psychol. Bull. 1990, 108(3), 480–98.
35. Nickerson, R.S. Confirmation Bias: A Ubiquitous Phenomenon in Many Guises. Rev. Gen. Psychol. 1998, 2(2), 175–220.
36. Kim, S.H.; Schramm, S.; Riedel, E.O.; Schmitzer, L.; Rosenkranz, E.; Kertels, O.; et al. Automation bias in AI-assisted detection of cerebral aneurysms on time-of-flight MR angiography. Radiol. Med. 2025, 130(4), 555–66.
37. Kocak, B.; Baessler, B.; Bakas, S.; Cuocolo, R.; Fedorov, A.; Maier-Hein, L.; et al. CheckList for EvaluAtion of Radiomics research (CLEAR): a step-by-step reporting guideline for authors and reviewers endorsed by ESR and EuSoMII. Insights Imaging 2023, 14(1), 75.
38. Kocak, B.; Akinci D’Antonoli, T.; Mercaldo, N.; Alberich-Bayarri, A.; Baessler, B.; Ambrosini, I.; et al. METhodological RadiomICs Score (METRICS): a quality scoring tool for radiomics research endorsed by EuSoMII. Insights Imaging 2024, 15(1), 8.
39. Zwanenburg, A.; Vallières, M.; Abdalah, M.A.; Aerts, H.J.W.L.; Andrearczyk, V.; Apte, A.; et al. The Image Biomarker Standardization Initiative: Standardized Quantitative Radiomics for High-Throughput Image-based Phenotyping. Radiology 2020, 295(2), 328–38.
40. Buvat, I.; Orlhac, F. The Dark Side of Radiomics: On the Paramount Importance of Publishing Negative Results. J. Nucl. Med. 2019, 60(11), 1543–4.
41. Nardone, V.; Reginelli, A.; Rubini, D.; Gagliardi, F.; Del Tufo, S.; Belfiore, M.P.; et al. Delta radiomics: an updated systematic review. Radiol. Med. 2024, 129(8), 1197–214.
42. Traverso, A.; Wee, L.; Dekker, A.; Gillies, R. Repeatability and Reproducibility of Radiomic Features: A Systematic Review. Int. J. Radiat. Oncol. Biol. Phys. 2018, 102(4), 1143–58.
43. Berenguer, R.; Pastor-Juan, M.D.R.; Canales-Vázquez, J.; Castro-García, M.; Villas, M.V.; Mansilla Legorburo, F.; et al. Radiomics of CT Features May Be Nonreproducible and Redundant: Influence of CT Acquisition Parameters. Radiology 2018, 288(2), 407–15.
44. Mackin, D.; Fave, X.; Zhang, L.; Fried, D.; Yang, J.; Taylor, B.; et al. Measuring Computed Tomography Scanner Variability of Radiomics Features. Invest. Radiol. 2015, 50(11), 757–65.
45. Larue, R.T.H.M.; Van Timmeren, J.E.; De Jong, E.E.C.; Feliciani, G.; Leijenaar, R.T.H.; Schreurs, W.M.J.; et al. Influence of gray level discretization on radiomic feature stability for different CT scanners, tube currents and slice thicknesses: a comprehensive phantom study. Acta Oncol. 2017, 56(11), 1544–53.
46. Midya, A.; Chakraborty, J.; Gönen, M.; Do, R.K.G.; Simpson, A.L. Influence of CT acquisition and reconstruction parameters on radiomic feature reproducibility. J. Med. Imaging 2018, 5(1), 011020.
47. Meyer, M.; Ronald, J.; Vernuccio, F.; Nelson, R.C.; Ramirez-Giraldo, J.C.; Solomon, J.; et al. Reproducibility of CT Radiomic Features within the Same Patient: Influence of Radiation Dose and CT Reconstruction Settings. Radiology 2019, 293(3), 583–91.
48. Zhu, L.; Dong, H.; Sun, J.; Wang, L.; Xing, Y.; Hu, Y.; et al. Robustness of radiomics among photon-counting detector CT and dual-energy CT systems: a texture phantom study. Eur. Radiol. 2024, 35(2), 871–84.
49. Zhang, H.; Lu, T.; Wang, L.; Xing, Y.; Hu, Y.; Xu, Z.; et al. Robustness of radiomics within photon-counting detector CT: impact of acquisition and reconstruction factors. Eur. Radiol. 2025, 35(8), 4661–73.
50. Xue, C.; Yuan, J.; Lo, G.G.; Chang, A.T.Y.; Poon, D.M.C.; Wong, O.L.; et al. Radiomics feature reliability assessed by intraclass correlation coefficient: a systematic review. Quant. Imaging Med. Surg. 2021, 11(10), 4431–60.
51. Kendrick, J.; Francis, R.J.; Hassan, G.M.; Ong, J.S.L.; Jeraj, R.; Barry, N.; et al. Deep learning-based PSMA PET segmentation repeatability: A post-hoc analysis of a single-center, prospective, test–retest trial. Radiol. Med. 2025, 131(2), 320–31.
52. Whybra, P.; Zwanenburg, A.; Andrearczyk, V.; Schaer, R.; Apte, A.P.; Ayotte, A.; et al. The Image Biomarker Standardization Initiative: Standardized Convolutional Filters for Reproducible Radiomics and Enhanced Clinical Insights. Radiology 2024, 310(2), e231319.
53. Orlhac, F.; Lecler, A.; Savatovski, J.; Goya-Outi, J.; Nioche, C.; Charbonneau, F.; et al. How can we combat multicenter variability in MR radiomics? Validation of a correction procedure. Eur. Radiol. 2021, 31(4), 2272–80.
54. Da-ano, R.; Masson, I.; Lucia, F.; Doré, M.; Robin, P.; Alfieri, J.; et al. Performance comparison of modified ComBat for harmonization of radiomic features for multicenter studies. Sci. Rep. 2020, 10(1), 10248.
55. Demircioğlu, A. Reproducibility and interpretability in radiomics: a critical assessment. Diagn. Interv. Radiol. 2024.
56. Parmar, C.; Rios Velazquez, E.; Leijenaar, R.; Jermoumi, M.; Carvalho, S.; Mak, R.H.; et al. Robust Radiomics Feature Quantification Using Semiautomatic Volumetric Segmentation. PLoS ONE 2014, 9(7).
57. Saha, A.; Harowicz, M.R.; Mazurowski, M.A. Breast cancer MRI radiomics: An overview of algorithmic features and impact of inter-reader variability in annotating tumors. Med. Phys. 2018, 45(7), 3076–85.
58. deSouza, N.M.; Van Der Lugt, A.; Deroose, C.M.; Alberich-Bayarri, A.; Bidaut, L.; Fournier, L.; et al. Standardised lesion segmentation for imaging biomarker quantitation: a consensus recommendation from ESR and EORTC. Insights Imaging 2022, 13(1), 159.
59. Song, H.; Wang, X.; Wu, R.; Liu, W. The influence of manual segmentation strategies and different phases selection on machine learning-based computed tomography in renal tumors: a systematic review and meta-analysis. Radiol. Med. 2024, 129(7), 1025–37.
60. Kocak, B.; Bulut, E.; Bayrak, O.N.; Okumus, A.A.; Altun, O.; Borekci Arvas, Z.; et al. NEgatiVE results in Radiomics research (NEVER): A meta-research study of publication bias in leading radiology journals. Eur. J. Radiol. 2023, 163, 110830.
61. Kocak, B.; Akinci D’Antonoli, T.; Ates Kus, E.; Keles, A.; Kala, A.; Kose, F.; et al. Self-reported checklists and quality scoring tools in radiomics: a meta-research. Eur. Radiol. 2024, 34(8), 5028–40.
62. Halligan, S.; Menu, Y.; Mallett, S. Why did European Radiology reject my radiomic biomarker paper? How to correctly evaluate imaging biomarkers in a clinical setting. Eur. Radiol. 2021, 31(12), 9361–8.
63. Bleker, J.; Yakar, D.; van Noort, B.; Rouw, D.; de Jong, I.J.; Dierckx, R.A.J.O.; et al. Single-center versus multi-center biparametric MRI radiomics approach for clinically significant peripheral zone prostate cancer. Insights Imaging 2021, 12(1), 150.
64. Doniselli, F.M.; Pascuzzo, R.; Mazzi, F.; Padelli, F.; Moscatelli, M.; Akinci D’Antonoli, T.; et al. Quality assessment of the MRI-radiomics studies for MGMT promoter methylation prediction in glioma: a systematic review and meta-analysis. Eur. Radiol. 2024, 34(9), 5802–15.
65. Malcolm, J.A.; Tacey, M.; Gibbs, P.; Lee, B.; Ko, H.S. Current state of radiomic research in pancreatic cancer: focusing on study design and reproducibility of findings. Eur. Radiol. 2023, 33(10), 6659–69.
66. Zhong, J.; Liu, X.; Lu, J.; Yang, J.; Zhang, G.; Mao, S.; et al. Overlooked and underpowered: a meta-research addressing sample size in radiomics prediction models for binary outcomes. Eur. Radiol. 2025, 35(3), 1146–56.
67. Horvat, N.; Papanikolaou, N.; Koh, D.M. Radiomics Beyond the Hype: A Critical Evaluation Toward Oncologic Clinical Use. Radiol. Artif. Intell. 2024, 6(4), e230437.
68. Demircioğlu, A. Measuring the bias of incorrect application of feature selection when using cross-validation in radiomics. Insights Imaging 2021, 12(1), 172.
69. Kapoor, S.; Narayanan, A. Leakage and the reproducibility crisis in machine-learning-based science. Patterns 2023, 4(9), 100804.
70. Demircioğlu, A. Applying oversampling before cross-validation will lead to high bias in radiomics. Sci. Rep. 2024, 14(1), 11563.
71. Marzi, C.; Giannelli, M.; Barucci, A.; Tessa, C.; Mascalchi, M.; Diciotti, S. Efficacy of MRI data harmonization in the age of machine learning: a multicenter study across 36 datasets. Sci. Data 2024, 11(1), 115.
72. Gidwani, M.; Chang, K.; Patel, J.B.; Hoebel, K.V.; Ahmed, S.R.; Singh, P.; et al. Inconsistent Partitioning and Unproductive Feature Associations Yield Idealized Radiomic Models. Radiology 2023, 307(1), e220715.
73. Beddok, A.; Grogg, K.; Nioche, C.; Rozenblum, L.; Orlhac, F.; Calugaru, V.; et al. Predicting tumor recurrence site after reirradiation in head and neck cancer: a retrospective external validation of a published [18F]-FDG PET radiomic signature. Radiol. Med. 2025, 130(11), 1854–63.
74. Kocak, B.; Keles, A.; Kose, F.; Sendur, A. Quality of radiomics research: comprehensive analysis of 1574 unique publications from 89 reviews. Eur. Radiol. 2024, 35(4), 1980–92.
75. Barry, N.; Kendrick, J.; Molin, K.; Li, S.; Rowshanfarzad, P.; Hassan, G.M.; et al. Evaluating the impact of the Radiomics Quality Score: a systematic review and meta-analysis. Eur. Radiol. 2025, 35(3), 1701–13.
76. Spadarella, G.; Ugga, L.; Calareso, G.; Villa, R.; D’Aniello, S.; Cuocolo, R. The impact of radiomics for human papillomavirus status prediction in oropharyngeal cancer: systematic review and radiomics quality score assessment. Neuroradiology 2022, 64(8), 1639–47.
77. Van Calster, B.; McLernon, D.J.; Van Smeden, M.; Wynants, L.; Steyerberg, E.W.; on behalf of Topic Group ‘Evaluating diagnostic tests and prediction models’ of the STRATOS initiative. Calibration: the Achilles heel of predictive analytics. BMC Med. 2019, 17(1), 230.
78. Vickers, A.J.; Elkin, E.B. Decision Curve Analysis: A Novel Method for Evaluating Prediction Models. Med. Decis. Mak. 2006, 26(6), 565–74.
79. Park, J.E.; Kim, D.; Kim, H.S.; Park, S.Y.; Kim, J.Y.; Cho, S.J.; et al. Quality of science and reporting of radiomics in oncologic studies: room for improvement according to radiomics quality score and TRIPOD statement. Eur. Radiol. 2020, 30(1), 523–36.
80. Spadarella, G.; Stanzione, A.; Akinci D’Antonoli, T.; Andreychenko, A.; Fanni, S.C.; Ugga, L.; et al. Systematic review of the radiomics quality score applications: an EuSoMII Radiomics Auditing Group Initiative. Eur. Radiol. 2022, 33(3), 1884–94.
81. Kocak, B.; Ammirabile, A.; Ambrosini, I.; Akinci D’Antonoli, T.; Borgheresi, A.; Cavallo, A.U.; et al. Explanation and Elaboration with Examples for METRICS (METRICS-E3): an initiative from the EuSoMII Radiomics Auditing Group. Insights Imaging 2025, 16(1), 175.
82. Cavallo, A.U.; Stanzione, A.; Ponsiglione, A.; Trotta, R.; Fanni, S.C.; Ghezzo, S.; et al. Prostate cancer MRI methodological radiomics score: a EuSoMII radiomics auditing group initiative. Eur. Radiol. 2024, 35(3), 1157–65.
83. Cavallo, A.U.; Ponsiglione, A.; Pereira, B.; Di Donna, C.; Koltsakis, E.; Vernuccio, F.; et al. CT and MRI radiomics in cardiovascular risk prediction: a systematic review and meta-analysis by the EuSoMII Radiomics Auditing Group. Eur. Radiol. 2025, 36(5), 4049–60.
84. Kocak, B.; Mese, I.; Ates Kus, E. Radiomics for differentiating radiation-induced brain injury from recurrence in gliomas: systematic review, meta-analysis, and methodological quality evaluation using METRICS and RQS. Eur. Radiol. 2025, 35(8), 4490–505.
85. Yu, A.C.; Mohajer, B.; Eng, J. External Validation of Deep Learning Algorithms for Radiologic Diagnosis: A Systematic Review. Radiol. Artif. Intell. 2022, 4(3), e210064.
86. Zhong, J.; Lu, J.; Zhang, G.; Mao, S.; Chen, H.; Yin, Q.; et al. An overview of meta-analyses on radiomics: more evidence is needed to support clinical translation. Insights Imaging 2023, 14(1), 111.
87. Ahmadzadeh, A.M.; Ashoobi, M.A.; Broomand Lomer, N.; Elyassirad, D.; Gheiji, B.; Vatanparast, M.; et al. Application of Deep Learning for Predicting Hematoma Expansion in Intracerebral Hemorrhage Using Computed Tomography Scans: A Systematic Review and Meta-Analysis of Diagnostic Accuracy. Radiol. Med. 2025, 130(12), 1973–85.
88. Kocak, B.; Ponsiglione, A.; Stanzione, A.; Ugga, L.; Klontzas, M.E.; Cannella, R.; et al. CLEAR guideline for radiomics: Early insights into current reporting practices endorsed by EuSoMII. Eur. J. Radiol. 2024, 181, 111788.
89. Akinci D’Antonoli, T.; Cuocolo, R.; Baessler, B.; Pinto Dos Santos, D. Towards reproducible radiomics research: introduction of a database for radiomics studies. Eur. Radiol. 2023, 34(1), 436–43.
- Kocak, B.; Yardimci, A.H.; Yuzkan, S.; Keles, A.; Altun, O.; Bulut, E.; et al. Transparency in Artificial Intelligence Research: a Systematic Review of Availability Items Related to Open Science in Radiology and Nuclear Medicine. Acad. Radiol. 2023, 30(10), 2254–66. [Google Scholar] [CrossRef]
- Foy, J.J.; Robinson, K.R.; Li, H.; Giger, M.L.; Al-Hallaq, H.; Armato, S.G. Variation in algorithm implementation across radiomics software. J. Med. Imag. 2018, 5(4), 044505. [Google Scholar] [CrossRef] [PubMed]
- Challa, A.B.; Radike, M.; Rizvi, A.; Weber, N.M.; Wamil, M.; Poigai Arunachalam, S.; et al. Interobserver and intraobserver variability among different vendors for mitral valve assessment: implications for transcatheter mitral valve repair. Radiol. med. 2025, 130(3), 296–301. [Google Scholar] [CrossRef]
- Chetan, M.R.; Gleeson, F.V. Radiomics in predicting treatment response in non-small-cell lung cancer: current status, challenges and future perspectives. Eur. Radiol. 2021, 31(2), 1049–58. [Google Scholar] [CrossRef]
- Zhong, J.; Davey, A.; Frood, R.; McWilliam, A.; Shortall, J.; Reardon, M.; et al. Combining MRI radiomics, hypoxia gene signature score and clinical variables for prediction of biochemical recurrence-free survival after radiotherapy in prostate cancer. Radiol. med. 2025, 130(8), 1139–48. [Google Scholar] [CrossRef]
- Ter Maat, L.S.; van Duin, I.A.J.; Elias, S.G.; Leiner, T.; Verhoeff, J.J.C.; Arntz, E.R.A.N.; et al. CT radiomics compared to a clinical model for predicting checkpoint inhibitor treatment outcomes in patients with advanced melanoma. Eur. J. Cancer 2023, 185, 167–77. [Google Scholar] [CrossRef]
- Peng, W.; Wan, L.; Wang, S.; Zou, S.; Zhao, X.; Zhang, H. A multiple-time-scale comparative study for the added value of magnetic resonance imaging-based radiomics in predicting pathological complete response after neoadjuvant chemoradiotherapy in locally advanced rectal cancer. Front. Oncol. 2023, 13, 1234619. [Google Scholar] [CrossRef] [PubMed]
- Lambin, P.; Woodruff, H.C.; Mali, S.A.; Zhong, X.; Kuang, S.; Lavrova, E.; et al. Radiomics Quality Score 2.0: towards radiomics readiness levels and clinical translation for personalized medicine. Nat. Rev. Clin. Oncol. 2025, 22(11), 831–46. [Google Scholar] [CrossRef] [PubMed]
- McGale, J.; Beddok, A.; Schwartz, L.H.; Dercle, L. Radiomics Quality Score 2.0: what changed from version 1.0 and why it matters. Nat. Rev. Clin. Oncol. 2026, 23(1), 84–5. [Google Scholar] [CrossRef]
- Demircioğlu, A. Retractions of publications in radiomics: An underestimated problem? Eur. Radiol. 2025, 36(5), 3778–87. [Google Scholar] [CrossRef]
- Kumar, V.; Gu, Y.; Basu, S.; Berglund, A.; Eschrich, S.A.; Schabath, M.B.; et al. Radiomics: the process and the challenges. Magn. Reson. Imaging 2012, 30(9), 1234–48. [Google Scholar] [CrossRef] [PubMed]
- Aerts, H.J.W.L.; Velazquez, E.R.; Leijenaar, R.T.H.; Parmar, C.; Grossmann, P.; Carvalho, S.; et al. Decoding tumour phenotype by noninvasive imaging using a quantitative radiomics approach. Nat. Commun. 2014, 5, 4006. [Google Scholar] [CrossRef]
- van Timmeren, J.E.; Cester, D.; Tanadini-Lang, S.; Alkadhi, H.; Baessler, B. Radiomics in medical imaging-“how-to” guide and critical reflection. Insights Imaging 2020, 11(1), 91. [Google Scholar] [CrossRef]
- Lee, J.; Steinmann, A.; Ding, Y.; Lee, H.; Owens, C.; Wang, J.; et al. Radiomics feature robustness as measured using an MRI phantom. Sci. Rep. 2021, 11(1), 3973. [Google Scholar] [CrossRef]
- Pavic, M.; Bogowicz, M.; Würms, X.; Glatz, S.; Finazzi, T.; Riesterer, O.; et al. Influence of inter-observer delineation variability on radiomics stability in different tumor sites. Acta Oncol. 2018, 57(8), 1070–4. [Google Scholar] [CrossRef] [PubMed]
- Granzier, R.W.Y.; Verbakel, N.M.H.; Ibrahim, A.; van Timmeren, J.E.; van Nijnatten, T.J.A.; Leijenaar, R.T.H.; et al. MRI-based radiomics in breast cancer: feature robustness with respect to inter-observer segmentation variability. Sci. Rep. 2020, 10(1), 14163. [Google Scholar] [CrossRef]
- Poirot, M.G.; Caan, M.W.A.; Ruhe, H.G.; Bjørnerud, A.; Groote, I.; Reneman, L.; et al. Robustness of radiomics to variations in segmentation methods in multimodal brain MRI. Sci. Rep. 2022, 12, 16712. [Google Scholar] [CrossRef] [PubMed]
- Park, J.E.; Kim, H.S.; Kim, D.; Park, S.Y.; Kim, J.Y.; Cho, S.J.; et al. A systematic review reporting quality of radiomics research in neuro-oncology: toward clinical utility and quality improvement using high-dimensional imaging features. BMC Cancer 2020, 20(1), 29. [Google Scholar] [CrossRef]
- Jannot, A.S.; Agoritsas, T.; Gayet-Ageron, A.; Perneger, T.V. Citation bias favoring statistically significant studies was present in medical research. J. Clin. Epidemiol. 2013, 66(3), 296–301. [Google Scholar] [CrossRef] [PubMed]
- Duyx, B.; Urlings, M.J.E.; Swaen, G.M.H.; Bouter, L.M.; Zeegers, M.P. Scientific citations favor positive results: a systematic review and meta-analysis. J. Clin. Epidemiol. 2017, 88, 92–101. [Google Scholar] [CrossRef]
- Vickers, A.J.; Woo, S. Decision curve analysis in the evaluation of radiology research. Eur. Radiol. 2022, 32(9), 5787–9. [Google Scholar] [CrossRef]
- Chiu, K.; Grundy, Q.; Bero, L. “Spin” in published biomedical literature: a methodological systematic review. PLoS Biol. 2017, 15(9), e2002173. [Google Scholar] [CrossRef]
- McGrath, T.A.; McInnes, M.D.F.; van Es, N.; Leeflang, M.M.G.; Korevaar, D.A.; Bossuyt, P.M.M. Overinterpretation of research findings: evidence of “spin” in systematic reviews of diagnostic accuracy studies. Clin. Chem. 2017, 63(8), 1353–62. [Google Scholar] [CrossRef]
- Oh, Y.K. Position: State-of-the-Art Claims Require State-of-the-Art Evidence. arXiv [cs.LG] 2026, arXiv:2605.17273v2. [Google Scholar] [CrossRef]
- Di Cesare, E.; Esposito, A.; Lo Casto, A.; Mazzei, M.A.; Polonara, G.; Sverzellati, N.; et al. CT acquisition protocols by pathology, SIRM position paper part 1: head and neck, brain and spine, chest, cardiovascular. Radiol. med. 2025, 130(10), 1594–601. [Google Scholar] [CrossRef]
- Di Cesare, E.; Ascenti, G.; Cappabianca, S.; Granata, C.; Reginelli, A.; Trinci, M.; et al. CT acquisition protocols by pathology, SIRM position paper part 2 (Abdominal and Oncologic Imaging, Urology, Paediatric). Radiol. med. 2025, 131(2), 292–301. [Google Scholar] [CrossRef]
- Kocak, B.; Borgheresi, A.; Ponsiglione, A.; Andreychenko, A.E.; Cavallo, A.U.; Stanzione, A.; et al. Explanation and Elaboration with Examples for CLEAR (CLEAR-E3): an EuSoMII Radiomics Auditing Group Initiative. Eur. Radiol. Exp. 2024, 8(1), 72. [Google Scholar] [CrossRef]
- Varma, S.; Simon, R. Bias in error estimation when using cross-validation for model selection. BMC Bioinform. 2006, 7(1), 91. [Google Scholar] [CrossRef] [PubMed]
- Moons, K.G.M.; Damen, J.A.A.; Kaul, T.; Hooft, L.; Andaur Navarro, C.; Dhiman, P.; et al. PROBAST+AI: an updated quality, risk of bias, and applicability assessment tool for prediction models using regression or artificial intelligence methods. BMJ 2025, 388, e082505. [Google Scholar] [CrossRef]
- Collins, G.S.; Moons, K.G.M.; Dhiman, P.; Riley, R.D.; Beam, A.L.; Van Calster, B.; et al. TRIPOD+AI statement: updated guidance for reporting clinical prediction models that use regression or machine learning methods. BMJ 2024, 385, e078378. [Google Scholar] [CrossRef] [PubMed]
- Steyerberg, E.W. Clinical Prediction Models: A Practical Approach to Development, Validation, and Updating [Internet]; Springer International Publishing: Cham, 2019; Available online: http://link.springer.com/10.1007/978-3-030-16399-0.
- Steyerberg, E.W.; Harrell, F.E. Prediction models need appropriate internal, internal–external, and external validation. J. Clin. Epidemiol. 2016, 69, 245–7. [Google Scholar] [CrossRef] [PubMed]
- Vasey, B.; Nagendran, M.; Campbell, B.; Clifton, D.A.; Collins, G.S.; Denaxas, S.; et al. Reporting guideline for the early-stage clinical evaluation of decision support systems driven by artificial intelligence: DECIDE-AI. Nat. Med. 2022, 28(5), 924–33. [Google Scholar] [CrossRef]
- Huang, Y.; Li, W.; Macheret, F.; Gabriel, R.A.; Ohno-Machado, L. A tutorial on calibration measurements and calibration models for clinical prediction models. J. Am. Med. Inform. Assoc. 2020, 27(4), 621–33. [Google Scholar] [CrossRef]
- Vickers, A.J.; Van Calster, B.; Steyerberg, E.W. Net benefit approaches to the evaluation of prediction models, molecular markers, and diagnostic tests. BMJ 2016, 352, i6. [Google Scholar] [CrossRef]
- Park, S.H.; Han, K.; Lee, J.G. Conceptual review of outcome metrics and measures used in clinical evaluation of artificial intelligence in radiology. Radiol. med. 2024, 129(11), 1644–55. [Google Scholar] [CrossRef]
- Wilkinson, M.D.; Dumontier, M.; Aalbersberg, I.J.; Appleton, G.; Axton, M.; Baak, A.; et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci. Data 2016, 3(1), 160018. [Google Scholar] [CrossRef]
- Nosek, B.A.; Alter, G.; Banks, G.C.; Borsboom, D.; Bowman, S.D.; Breckler, S.J.; et al. Promoting an open research culture. Science 2015, 348(6242), 1422–5. [Google Scholar] [CrossRef]
- Nosek, B.A.; Ebersole, C.R.; DeHaven, A.C.; Mellor, D.T. The preregistration revolution. Proc. Natl. Acad. Sci. USA 2018, 115(11), 2600–6. [Google Scholar] [CrossRef]
- Wagenmakers, E.J.; Wetzels, R.; Borsboom, D.; Van Der Maas, H.L.J.; Kievit, R.A. An Agenda for Purely Confirmatory Research. Perspect. Psychol. Sci. 2012, 7(6), 632–8. [Google Scholar] [CrossRef]
- Steegen, S.; Tuerlinckx, F.; Gelman, A.; Vanpaemel, W. Increasing Transparency Through a Multiverse Analysis. Perspect. Psychol. Sci. 2016, 11(5), 702–12. [Google Scholar] [CrossRef]
- Simonsohn, U.; Simmons, J.P.; Nelson, L.D. Specification curve analysis. Nat. Hum. Behav. 2020, 4(11), 1208–14. [Google Scholar] [CrossRef]
- Pai, S.; Bontempi, D.; Hadzic, I.; Prudente, V.; Sokač, M.; Chaunzwa, T.L.; Bernatz, S.; Hosny, A.; Mak, R.H.; Birkbak, N.J.; Aerts, H.J.W.L. Foundation model for cancer imaging biomarkers. Nat. Mach. Intell. 2024, 6(3), 354–67. [Google Scholar] [CrossRef]
- Moor, M.; Banerjee, O.; Abad, Z.S.H.; Krumholz, H.M.; Leskovec, J.; Topol, E.J.; Rajpurkar, P. Foundation models for generalist medical artificial intelligence. Nature 2023, 616(7956), 259–65. [Google Scholar] [CrossRef]
- Koetzier, L.R.; Wu, J.; Mastrodicasa, D.; Lutz, A.; Chung, M.; Koszek, W.A.; Pratap, J.; Chaudhari, A.S.; Rajpurkar, P.; Lungren, M.P.; Willemink, M.J. Generating synthetic data for medical imaging. Radiology 2024, 312(3), e232471. [Google Scholar] [CrossRef]
- Mali, S.A.; Mohammadian Rad, N.; Woodruff, H.C.; Depeursinge, A.; Andrearczyk, V.; Lambin, P. Harmonizing CT scanner acquisition variability in an anthropomorphic phantom: a comparative study of image-level and feature-level harmonization using GAN, ComBat, and their combination. PLoS ONE 2025, 20(5), e0322365. [Google Scholar] [CrossRef]
- Floca, R.; Bohn, J.; Haux, C.; Wiestler, B.; Zöllner, F.G.; Reinke, A.; et al. Radiomics workflow definition & challenges - German priority program 2177 consensus statement on clinically applied radiomics. Insights Imaging 2024, 15(1), 124. [Google Scholar] [CrossRef]
- Santinha, J.; Pinto Dos Santos, D.; Laqua, F.; Visser, J.J.; Groot Lipman, K.B.W.; Dietzel, M.; et al. ESR Essentials: radiomics—practice recommendations by the European Society of Medical Imaging Informatics. Eur. Radiol. 2024, 35(3), 1122–32. [Google Scholar] [CrossRef]
- Avanzo, M.; Soda, P.; Bertolini, M.; Bettinelli, A.; Rancati, T.; Stancanello, J.; et al. Robust radiomics: a review of guidelines for radiomics in medical imaging. Front. Radiol. 2026, 5, 1701110. [Google Scholar] [CrossRef]
- Nosek, B.A.; Spies, J.R.; Motyl, M. Scientific Utopia: II. Restructuring Incentives and Practices to Promote Truth Over Publishability. Perspect. Psychol. Sci. 2012, 7(6), 615–31. [Google Scholar] [CrossRef]
- Chambers, C.D.; Tzavella, L. The past, present and future of Registered Reports. Nat. Hum. Behav. 2021, 6(1), 29–42. [Google Scholar] [CrossRef]
- Scheel, A.M.; Schijen, M.R.M.J.; Lakens, D. An Excess of Positive Results: Comparing the Standard Psychology Literature With Registered Reports. Adv. Methods Pract. Psychol. Sci. 2021, 4(2), 25152459211007467. [Google Scholar] [CrossRef]
- Cagan, R. The San Francisco Declaration on Research Assessment. Dis. Model. Mech. 2013, 6(4), 869–70. [Google Scholar] [CrossRef]
- Xu, H.L.; Gong, T.T.; Song, X.J.; et al. Artificial Intelligence Performance in Image-Based Cancer Identification: Umbrella Review of Systematic Reviews. J. Med. Internet Res. 2025, 27, e53567. [Google Scholar] [CrossRef]
- European Parliament and Council of the European Union. Regulation (EU) 2024/1689 laying down harmonised rules on artificial intelligence (Artificial Intelligence Act). OJ L 2024/1689; Off J Eur Union. 12 Jul 2024. Available online: https://eur-lex.europa.eu/eli/reg/2024/1689/oj.
- U.S. Food and Drug Administration. Marketing submission recommendations for a predetermined change control plan for artificial intelligence-enabled device software functions: guidance for industry and FDA staff. FDA: Silver Spring (MD), 3 Dec 2024. Available online: https://www.fda.gov/regulatory-information/search-fda-guidance-documents/marketing-submission-recommendations-predetermined-change-control-plan-artificial-intelligence.
- Huang, M.L.; Ren, J.; Jin, Z.Y.; Liu, X.Y.; Li, Y.; He, Y.L.; et al. Application of magnetic resonance imaging radiomics in endometrial cancer: a systematic review and meta-analysis. Radiol. med. 2024, 129(3), 439–56. [Google Scholar] [CrossRef]
- Wright, B.D.; Vo, N.; Nolan, J.; Johnson, A.L.; Braaten, T.; Tritz, D.; et al. An analysis of key indicators of reproducibility in radiology. Insights Imaging 2020, 11(1), 65. [Google Scholar] [CrossRef]
- Van Griethuysen, J.J.M.; Fedorov, A.; Parmar, C.; Hosny, A.; Aucoin, N.; Narayan, V.; et al. Computational Radiomics System to Decode the Radiographic Phenotype. Cancer Res. 2017, 77(21), e104–7. [Google Scholar] [CrossRef]
- Welch, M.L.; McIntosh, C.; Haibe-Kains, B.; Milosevic, M.F.; Wee, L.; Dekker, A.; et al. Vulnerabilities of radiomic signature development: The need for safeguards. Radiother. Oncol. 2019, 130, 2–9. [Google Scholar] [CrossRef]
- Tejani, A.S.; Klontzas, M.E.; Gatti, A.A.; Mongan, J.T.; Moy, L.; Park, S.H.; et al. Checklist for Artificial Intelligence in Medical Imaging (CLAIM): 2024 Update. Radiol. Artif. Intell. 2024, 6(4), e240300. [Google Scholar] [CrossRef]
- Cruz Rivera, S.; Liu, X.; Chan, A.W.; Denniston, A.K.; Calvert, M.J.; SPIRIT-AI and CONSORT-AI Working Group; et al. Guidelines for clinical trial protocols for interventions involving artificial intelligence: the SPIRIT-AI extension. Nat. Med. 2020, 26(9), 1351–63. [Google Scholar] [CrossRef]
- Liu, X.; Cruz Rivera, S.; Moher, D.; Calvert, M.J.; Denniston, A.K. Reporting guidelines for clinical trial reports for interventions involving artificial intelligence: the CONSORT-AI Extension. BMJ 2020, 370, m3164. [Google Scholar] [CrossRef] [PubMed]
- Chen, D.; He, E.; Pace, K.; Chekay, M.; Raman, S. Concordance with SPIRIT-AI guidelines in reporting of randomized controlled trial protocols investigating artificial intelligence in oncology: a systematic review. Oncologist 2025, 30(5), oyaf112. [Google Scholar] [CrossRef] [PubMed]
- Chen, D.; Arnold, K.; Sukhdeo, R.; Farag Alla, J.; Raman, S. Concordance with CONSORT-AI guidelines in reporting of randomised controlled trials investigating artificial intelligence in oncology: a systematic review. BMJ Oncol. 2025, 4(1), e000733. [Google Scholar] [CrossRef] [PubMed]
- Lekadir, K.; Frangi, A.F.; Porras, A.R.; Glocker, B.; Cintas, C.; Langlotz, C.P.; et al. FUTURE-AI: international consensus guideline for trustworthy and deployable artificial intelligence in healthcare. BMJ 2025, 388, e081554. [Google Scholar] [CrossRef]
- International Medical Device Regulators Forum; Artificial Intelligence/Machine Learning-enabled Medical Devices Working Group. Good Machine Learning Practice for Medical Device Development: Guiding Principles. IMDRF/AIML WG/N88 FINAL:2025. 27 Jan 2025. Available online: https://www.imdrf.org/sites/default/files/2025-01/IMDRF_AIML%20WG_GMLP_N88%20Final_0.pdf.
- Regulation (EU) 2017/745 of the European Parliament and of the Council of 5 April 2017 on medical devices, amending Directive 2001/83/EC, Regulation (EC) No 178/2002 and Regulation (EC) No 1223/2009 and repealing Council Directives 90/385/EEC and 93/42/EEC (Text with EEA relevance). OJ L [Internet]. 5 Apr 2017. Available online: http://data.europa.eu/eli/reg/2017/745/oj.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).