Submitted: 26 August 2025
Posted: 28 August 2025
Abstract

Keywords:
1. Introduction
1.1. Personalization in Clinical Medicine and Artificial Intelligence
1.2. The AI Socratic Paradox
1.3. Contributions

2. Related Works
3. Feature Selection as a Metacognitive Requirement
3.1. Epistemic Challenges in Feature Selection
3.2. Triad-Informed Approaches to Feature Selection Challenges
4. Model Specification as a Metacognitive Requirement
4.1. Epistemic Challenges in Model Specification
4.2. Triad-Informed Approaches to Model Specification Challenges
5. Model Validation as a Metacognitive Requirement
5.1. Epistemic Challenges in Model Validation
5.2. Triad-Informed Approaches to Model Validation Challenges
6. Discussion
6.1. Towards an Integrated AISP-Aware Ecosystem
6.2. AISP as an Impetus for Epistemic Humility
Author Contributions
Funding
Acknowledgments
Declaration of Generative AI and AI-Assisted Technologies in the Writing Process
References
- Djulbegovic B, Guyatt GH. Progress in evidence-based medicine: a quarter century on. Lancet. 2017;390(10092):415-423.
- Guyatt G, Cairns J, Churchill D, et al. Evidence-based medicine. A new approach to teaching the practice of medicine. JAMA. 1992;268(17):2420-2425.
- Nikles J, Mitchell G. The essential role of N-of-1 trials in the movement toward personalized medicine. Med Care. 2015;53(4):301-306.
- Troqe, B., Lakemond, N., & Holmberg, G. (2024). From Half-Truths to Situated Truths: Exploring Situatedness in Human-AI Collaborative Decision-Making in the Medical Context. Journal of Competences, Strategy & Management, 12, 1–15.
- Buchanan BG, Shortliffe EH. Rule-Based Expert Systems: The MYCIN Experiments of the Stanford Heuristic Programming Project. Reading, MA: Addison-Wesley; 1984.
- Ezugwu AE, Ho Y-S, Egwuche OS, Ekundayo OS, Van Der Merwe A, Saha AK, Pal J. Classical Machine Learning: Seventy Years of Algorithmic Learning Evolution. Data Intelligence. Accepted July 15, 2024. arXiv:2408.01747.
- Ullah R, Sarwar N, Alatawi MN, et al. Advancing personalized diagnosis and treatment using deep learning architecture. Frontiers in Medicine. 2025;12:1545528.
- Ahmed N, Devitt KS, Keshet I, et al. Epistemological humility in the era of COVID-19. Patient Exp J. 2024;11(2):1-8.
- Katz RA, Graham SS, Buchman DZ. The need for epistemic humility in AI-assisted pain assessment. Med Health Care Philos. 2025 Mar 15;28(2):339–349. [CrossRef]
- Desai RJ, Glynn RJ, Solomon SD, Claggett B, Wang SV, Vaduganathan M. Individualized Treatment Effect Prediction with Machine Learning — Salient Considerations. NEJM Evidence. 2024 Apr;3(4):EVIDoa2300041. [CrossRef]
- Babushkina D, Votsis A. The ethics and epistemology of explanatory AI in medicine and psychiatry. Ethics and Information Technology. 2022 Sep;24(3–4):443–56. [CrossRef]
- Vega, C. Knowing you know nothing in the age of generative AI. Nature Humanities and Social Sciences Communications. 2025;12:471. [CrossRef]
- Stenseke, J. Interdisciplinary Confusion and Resolution in the Context of Moral Machines. Sci Eng Ethics. 2022 May 19;28(3):24. [CrossRef] [PubMed] [PubMed Central]
- Alvarado, R. AI as an Epistemic Technology. Sci Eng Ethics 29, 32 (2023). [CrossRef]
- Durán, J.M., Sand, M. & Jongsma, K. The ethics and epistemology of explanatory AI in medicine and healthcare. Ethics Inf Technol 24, 42 (2022). [CrossRef]
- López-Rubio, E., Ratti, E. Data science and molecular biology: prediction and mechanistic explanation. Synthese 198, 3131–3156 (2021). [CrossRef]
- van Baalen S, Boon M, Verhoef P. From clinical decision support to clinical reasoning support systems. J Eval Clin Pract. 2021 Jun;27(3):520-528. Epub 2021 Feb 7. [CrossRef] [PubMed] [PubMed Central]
- Lalli Myllyaho, Mikko Raatikainen, Tomi Männistö, Tommi Mikkonen, Jukka K. Nurminen. Systematic literature review of validation methods for AI systems. Journal of Systems and Software, Volume 181, 2021, 111050, ISSN 0164-1212. [CrossRef]
- S. Bharati, M. R. H. Mondal and P. Podder, “A Review on Explainable Artificial Intelligence for Healthcare: Why, How, and When?,” in IEEE Transactions on Artificial Intelligence, vol. 5, no. 4, pp. 1429-1442, April 2024. [CrossRef]
- Morone G, De Angelis L, Martino Cinnera A, Carbonetti R, Bisirri A, Ciancarelli I, Iosa M, Negrini S, Kiekens C, Negrini F. Artificial intelligence in clinical medicine: a state-of-the-art overview of systematic reviews with methodological recommendations for improved reporting. Front Digit Health. 2025 Mar 5;7:1550731. [CrossRef]
- Lamsaf, A.; Carrilho, R.; Neves, J.C.; Proença, H. Causality, Machine Learning, and Feature Selection: A Survey. Sensors 2025, 25, 2373. [Google Scholar] [CrossRef] [PubMed]
- Seoni S, Jahmunah V, Salvi M, Barua PD, Molinari F, Acharya UR. Application of uncertainty quantification to artificial intelligence in healthcare: A review of last decade (2013–2023). Computers in Biology and Medicine. 2023;165:107441. [CrossRef]
- Hüllermeier, E., Waegeman, W. Aleatoric and epistemic uncertainty in machine learning: an introduction to concepts and methods. Mach Learn 110, 457–506 (2021). [CrossRef]
- Jiao L, Wang Y, Liu X, Li L, Liu F, Ma W, Guo Y, Chen P, Yang S, Hou B. Causal Inference Meets Deep Learning: A Comprehensive Survey. Research. 2024 Sep 10;7:0467. [CrossRef] [PubMed] [PubMed Central]
- Campagner A, Cabitza F, Ciucci D. Three-Way Decision for Handling Uncertainty in Machine Learning: A Narrative Review. Rough Sets. 2020 Jun 10;12179:137–52. [CrossRef] [PubMed Central]
- Mirza B, Wang W, Wang J, Choi H, Chung NC, Ping P. Machine Learning and Integrative Analysis of Biomedical Big Data. Genes (Basel). 2019 Jan 28;10(2):87. [CrossRef] [PubMed] [PubMed Central]
- Pudjihartono N, Fadason T, Kempa-Liehr AW, O’Sullivan JM. A Review of Feature Selection Methods for Machine Learning-Based Disease Risk Prediction. Front Bioinform. 2022 Jun 27;2:927312. [CrossRef] [PubMed] [PubMed Central]
- Remeseiro B, Bolon-Canedo V. A review of feature selection methods in medical applications. Comput Biol Med. 2019 Sep;112:103375. Epub 2019 Jul 31. [CrossRef] [PubMed]
- Brown G, Pocock A, Zhao MJ, Luján M. Conditional likelihood maximisation: a unifying framework for information theoretic feature selection. Journal of Machine Learning Research. 2012;13:27-66.
- Guyon, I., Weston, J., Barnhill, S. et al. Gene Selection for Cancer Classification using Support Vector Machines. Machine Learning 46, 389–422 (2002). [CrossRef]
- Moraffah R, Sheth P, Vishnubhatla S, Liu H. (2024). Causal Feature Selection for Responsible Machine Learning. arXiv preprint arXiv:2402.02696.
- Malec SA, Taneja SB, Albert SM, Elizabeth Shaaban C, Karim HT, Levine AS, Munro P, Callahan TJ, Boyce RD. Causal feature selection using a knowledge graph combining structured knowledge from the biomedical literature and ontologies: A use case studying depression as a risk factor for Alzheimer’s disease. J Biomed Inform. 2023 Jun;142:104368. [CrossRef]
- Skelly AC, Dettori JR, Brodt ED. Assessing bias: the importance of considering confounding. Evidence-Based Spine Care Journal. 2012 Feb;3(1):9–12. [CrossRef] [PubMed Central]
- Holmberg MJ, Andersen LW. Collider Bias. JAMA. 2022;327(13):1282–1283. [CrossRef]
- Newman-Griffis D, Divita G, Desmet B, Zirikly A, Rosé CP, Fosler-Lussier E. Ambiguity in medical concept normalization: An analysis of types and coverage in electronic health record datasets. J Am Med Inform Assoc. 2021 Mar 1;28(3):516-532. [CrossRef] [PubMed Central]
- Daumas L, Corbel C, Zory R, Corveleyn X, Fabre R, Manera V, Robert P. Associations, overlaps and dissociations between apathy and fatigue. Scientific Reports. 2022 May 5;12:7387. [CrossRef] [PubMed Central]
- Pickering TG. Do we really need a new definition of hypertension? Journal of Clinical Hypertension. 2005;7(12):702-704.
- Aronson JK. Biomarkers and surrogate endpoints. Br J Clin Pharmacol. 2005 May;59(5):491-4. [CrossRef] [PubMed] [PubMed Central]
- Epstein AE, Bigger JT Jr, Wyse DG, Romhilt DW, Reynolds-Haertle RA, Hallstrom AP. Events in the Cardiac Arrhythmia Suppression Trial (CAST): mortality in the entire population enrolled. J Am Coll Cardiol. 1991 Jul;18(1):14-9. Erratum in: J Am Coll Cardiol 1991 Sep;18(3):888. [CrossRef] [PubMed]
- Obermeyer Z, Powers B, Vogeli C, Mullainathan S. Dissecting racial bias in an algorithm used to manage the health of populations. Science. 2019 Oct 25;366(6464):447-453. [CrossRef] [PubMed]
- Gani MO, Kethireddy S, Adib R, Hasan U, Griffin P, Adibuzzaman M. Structural causal model with expert augmented knowledge to estimate the effect of oxygen therapy on mortality in the ICU. Artif Intell Med. 2023 Mar;137:102493. [CrossRef]
- Tikka, “Identifying Counterfactual Queries with the R Package cfid”, The R Journal, 2023.
- Bareinboim E, & Pearl J, Causal inference and the data-fusion problem, Proc. Natl. Acad. Sci. U.S.A. 113 (27) 7345-7352. (2016). [CrossRef]
- Triantafillou S, Jabbari F, Cooper GF. Proceedings of the Thirty-Seventh Conference on Uncertainty in Artificial Intelligence, PMLR 161:1434-1443, 2021.
- Mascaro, S., Wu, Y., Woodberry, O. et al. Modeling COVID-19 disease processes by remote elicitation of causal Bayesian networks from medical experts. BMC Med Res Methodol 23, 76 (2023). [CrossRef]
- Wagan AA, Talpur S, Narejo S. Clustering uncertain overlapping symptoms of multiple diseases in clinical diagnosis. PeerJ Comput Sci. 2024 Oct 2;10:e2315. [CrossRef] [PubMed] [PubMed Central]
- Kempowsky-Hamon T, Valle C, Lacroix-Triki M, Hedjazi L, Trouilh L, Lamarre S, Labourdette D, Roger L, Mhamdi L, Dalenc F, Filleron T, Favre G, François JM, Le Lann MV, Anton-Leberre V. Fuzzy logic selection as a new reliable tool to identify molecular grade signatures in breast cancer--the INNODIAG study. BMC Med Genomics. 2015 Feb 7;8:3. [CrossRef] [PubMed] [PubMed Central]
- Paja, W. Application of the Fuzzy Approach for Evaluating and Selecting Relevant Objects, Features, and Their Ranges. Entropy. 2023 Aug 17;25(8):1223. [CrossRef] [PubMed Central]
- Saki A, Faghihi U. Integrating Fuzzy Logic with Causal Inference: Enhancing the Pearl and Neyman-Rubin Methodologies. arXiv preprint arXiv:2406.13731. 2024 Jun 19.
- Christ SL, Lee DJ, Lam BL, Diane ZD. Structural Equation Modeling: A Framework for Ocular and Other Medical Sciences Research. Ophthalmic Epidemiology. 2014 Feb;21(1):1–13. [CrossRef] [PubMed Central]
- Kraus E, Kern C. Measurement Modeling of Predictors and Outcomes in Algorithmic Fairness. In: Proceedings of the 3rd AAAI Workshop on Algorithmic Fairness through the Lens of Time (AFFECT 2024). CEUR Workshop Proceedings; 2024.
- Sullivan AJ, VanderWeele TJ. Bias and sensitivity analysis for unmeasured confounders in linear structural equation models. arXiv preprint arXiv:2103.05775. 2021.
- Tchetgen EJ, Ying A, Cui Y, Shi X, Miao W. “An Introduction to Proximal Causal Inference.” Statist. Sci. 39 (3) 375 - 390, August 2024. [CrossRef]
- Liu J, Park C, Li K, Tchetgen EJ, Regression-based proximal causal inference, American Journal of Epidemiology, Volume 194, Issue 7, July 2025, Pages 2030–2036. [CrossRef]
- Rakshit P, Shi X, Tchetgen Tchetgen E. Adaptive Proximal Causal Inference with Some Invalid Proxies. arXiv preprint arXiv:2507.19623. 2025 Jul 25.
- Woodman RJ, Mangoni AA. A comprehensive review of machine learning algorithms and their application in geriatric medicine: present and future. Aging Clin Exp Res. 2023 Nov;35(11):2363-2397. Epub 2023 Sep 8. [CrossRef] [PubMed] [PubMed Central]
- Vincent, A.M., Jidesh, P. An improved hyperparameter optimization framework for AutoML systems using evolutionary algorithms. Sci Rep 13, 4737 (2023). [CrossRef]
- Banerjee I, Bhattacharjee K, Burns JL, Trivedi H, Purkayastha S, Seyyed-Kalantari L, Patel BN, Shiradkar R, Gichoya J. “Shortcuts” Causing Bias in Radiology Artificial Intelligence: Causes, Evaluation, and Mitigation. J Am Coll Radiol. 2023 Sep;20(9):842-851. Epub 2023 Jul 27. [CrossRef] [PubMed] [PubMed Central]
- Hsu CW, Lin CJ. A comparison of methods for multi-class support vector machines. IEEE Transactions on Neural Networks. 2002;13(2):415–425.
- Bodenreider, O. The Unified Medical Language System (UMLS): integrating biomedical terminology. Nucleic Acids Research. 2004;32(Database issue):D267–D270.
- Baddam S, Burns B. Systemic Inflammatory Response Syndrome. [Updated 2025 Jun 20]. In: StatPearls [Internet]. Treasure Island (FL): StatPearls Publishing; 2025 Jan. Available online: https://www.ncbi.nlm.nih.gov/books/NBK547669/.
- King BH, Navot N, Bernier R, Webb SJ. Update on diagnostic classification in autism. Curr Opin Psychiatry. 2014 Mar;27(2):105-9. [CrossRef] [PubMed] [PubMed Central]
- Qureshi AG, Jha SK, Iskander J, Avanthika C, Jhaveri S, Patel VH, Rasagna Potini B, Talha Azam A. Diagnostic Challenges and Management of Fibromyalgia. Cureus. 2021 Oct 11;13(10):e18692. [CrossRef] [PubMed] [PubMed Central]
- Berwick R, Barker C, Goebel A; guideline development group. The diagnosis of fibromyalgia syndrome. Clinical Medicine (London). 2022 Nov;22(6):570-574. [CrossRef]
- Reverberi C, Rigon T, Solari A, Hassan C, Cherubini P; GI Genius CADx Study Group; Cherubini A. Experimental evidence of effective human-AI collaboration in medical decision-making. Sci Rep. 2022 Sep 2;12(1):14952. [CrossRef] [PubMed] [PubMed Central]
- Harnad, S. The symbol grounding problem. Physica D: Nonlinear Phenomena. 1990;42(1–3):335–346.
- Rane S, Bruna PJ, Sucholutsky I, Kello C, Griffiths TL. Concept Alignment. arXiv preprint arXiv:2401.08672. 2024 Jan 9.
- Muttenthaler L, Utsumi Y, Brielmann AA, Cichy RM, Hebart MN. Human alignment of neural network representations. arXiv preprint arXiv:2211.01201. 2025 Feb 16.
- Ogg M, Wolmetz M. Measuring Alignment between Human and Artificial Intelligence with Representational Similarity Analysis. In: Proceedings of the Cognitive Computational Neuroscience Conference (CCNeuro 2024). 2024.
- Boggust A, Bang H, Strobelt H, Satyanarayan A. Abstraction Alignment: Comparing Model-Learned and Human-Encoded Conceptual Relationships. In: CHI Conference on Human Factors in Computing Systems (CHI ’25), April 26–May 01, 2025, Yokohama, Japan. ACM, New York, NY, USA. [CrossRef]
- Marjieh R, Kumar S, Campbell D, Zhang L, Bencomo G, Snell J, Griffiths TL. Learning Human-Aligned Representations with Contrastive Learning and Generative Similarity. arXiv preprint arXiv:2405.19420. 2025 Jan 31.
- Huynh T, Kornblith S, Walter MR, Maire M, Khademi M. “Boosting Contrastive Self-Supervised Learning with False Negative Cancellation,” 2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA, 2022, pp. 986-996. [CrossRef]
- Zang C, Wang F. SCEHR: Supervised Contrastive Learning for Clinical Risk Prediction using Electronic Health Records. Proc IEEE Int Conf Data Min. 2021 Dec;2021:857-866. [CrossRef] [PubMed] [PubMed Central]
- Ghesu FC, Georgescu B, Mansoor A, Yoo Y, Neumann D, Patel P, Vishwanath RS, Balter JM, Cao Y, Grbic S, Comaniciu D. Contrastive self-supervised learning from 100 million medical images with optional supervision. J Med Imaging (Bellingham). 2022 Nov;9(6):064503. Epub 2022 Nov 30. [CrossRef] [PubMed] [PubMed Central]
- Nie Y, He S, Bie Y, Wang Y, Chen Z, Yang S. ConceptCLIP: Towards Trustworthy Medical AI via Concept-Enhanced Contrastive Language-Image Pre-training. arXiv preprint arXiv:2501.15579. 2025.
- Koh PW, Nguyen T, Tang YS, Mussmann S, Pierson E, Kim B, Liang P. Concept Bottleneck Models. Proceedings of the 37th International Conference on Machine Learning (ICML). PMLR 119:5338–5348, 2020.
- Wu Y, Liu Y, Yang Y, Yao MS, Yang W, Shi X, Yang L, Li D, Liu Y, Yin S, Lei C, Zhang M, Gee JC, Yang X, Wei W, Gu S. A concept-based interpretable model for the diagnosis of choroid neoplasias using multimodal data. Nature Communications. 2025;16(1):3504. [CrossRef]
- Pang W, Ke X, Tsutsui S, Wen B. Integrating Clinical Knowledge into Concept Bottleneck Models. In: Medical Image Computing and Computer-Assisted Intervention – MICCAI 2024. Lecture Notes in Computer Science, vol 15006. Springer, Cham; 2024. p. 3–13. [CrossRef]
- Park S, Mun J, Oh D, Lee N. An Analysis of Concept Bottleneck Models: Measuring, Understanding, and Mitigating the Impact of Noisy Annotations. arXiv preprint arXiv:2505.16705. 2025 May 22.
- Sam D, Pukdee R, Jeong DP, Byun Y, Kolter JZ. Bayesian Neural Networks with Domain Knowledge Priors. arXiv preprint arXiv:2402.13410. 2024 Feb 20.
- Wong A, Otles E, Donnelly JP, et al. External Validation of a Widely Implemented Proprietary Sepsis Prediction Model in Hospitalized Patients. JAMA Internal Medicine. 2021;181(8):1065–1070. [CrossRef]
- Heaven, WD. Google’s medical AI was super accurate in a lab. Real life was a different story. MIT Technology Review. 2020 Apr 27.
- Steyerberg, EW. Chapter 17: Internal and external validation of prediction models. In: Steyerberg EW. Clinical Prediction Models: A Practical Approach to Development, Validation, and Updating. 2nd ed. Available online: www.clinicalpredictionmodels.org/extra-material/chapter-17.
- Ramspek CL, Jager KJ, Dekker FW, Zoccali C, van Diepen M. External validation of prognostic models: what, why, how, when and where? Clinical Kidney Journal. 2021;14(1):49–58. [CrossRef]
- Austin PC, van Klaveren D, Vergouwe Y, Nieboer D, Lee DS, Steyerberg EW. Geographic and temporal validity of prediction models: different approaches were useful to examine model performance. Journal of Clinical Epidemiology. 2016 Nov;79:76–85. [CrossRef]
- Salwei ME, Carayon P. A Sociotechnical Systems Framework for the Application of Artificial Intelligence in Health Care Delivery. J Cogn Eng Decis Mak. 2022 Dec;16(4):194-206. Epub 2022 May 11. [CrossRef] [PubMed] [PubMed Central]
- Rosenthal JT, Beecy A, Sabuncu MR. Rethinking clinical trials for medical AI with dynamic deployments of adaptive systems. NPJ Digital Medicine. 2025 May 6; 8:252. [CrossRef]
- Dolin P, Li W, Dasarathy G, Berisha V. Statistically Valid Post-Deployment Monitoring Should Be Standard for AI-Based Digital Health. arXiv preprint arXiv:2506.05701. 2025.
- Zhang, A., Lipton, Z. C., Li, M., & Smola, A. J. (2020). Dive into Deep Learning. Journal of the American College of Radiology, JACR.
- Zadeh, L. A. (1999). Fuzzy sets as a basis for a theory of possibility. Fuzzy Sets and Systems, 100, 9–34.
- Musa, A., Prasad, R. & Hernandez, M. Addressing cross-population domain shift in chest X-ray classification through supervised adversarial domain adaptation. Sci Rep 15, 11383 (2025). [CrossRef]
- Blattmann M, Lindenmeyer A, Franke S, Neumuth T, Schneider D. Implicit versus explicit Bayesian priors for epistemic uncertainty estimation in clinical decision support. PLOS Digit Health. 2025 Jul 29;4(7):e0000801. [CrossRef] [PubMed] [PubMed Central]
- Cao S, Zhang Z. Deep Hybrid Models for Out-of-Distribution Detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2022:4733–4741.
- Bickford Smith F, Kossen J, Trollope E, et al. Rethinking Aleatoric and Epistemic Uncertainty. International Conference on Machine Learning (ICML). 2025.
- Angelopoulos AN, Bates S. A Gentle Introduction to Conformal Prediction and Distribution-Free Uncertainty Quantification. arXiv preprint arXiv:2107.07511. 2022.
- Olsson H, Kartasalo K, Mulliqi N, Capuccini M, Ruusuvuori P, Samaratunga H, Delahunt B, Lindskog C, Janssen EAM, Blilie A; ISUP Prostate Imagebase Expert Panel; Egevad L, Spjuth O, Eklund M. Estimating diagnostic uncertainty in artificial intelligence assisted pathology using conformal prediction. Nat Commun. 2022 Dec 15;13(1):7761. [CrossRef] [PubMed] [PubMed Central]
- Sreenivasan, A.P., Vaivade, A., Noui, Y. et al. Conformal prediction enables disease course prediction and allows individualized diagnostic uncertainty in multiple sclerosis. npj Digit. Med. 8, 224 (2025). [CrossRef]
- Fayyad J, Alijani S, Najjaran H, Empirical validation of Conformal Prediction for trustworthy skin lesions classification, Computer Methods and Programs in Biomedicine, Volume 253, 2024, 108231, ISSN 0169-2607. [CrossRef]
- Shanmugam D, Lu H, Sankaranarayanan S, Guttag J. Test-time augmentation improves efficiency in conformal prediction. arXiv preprint arXiv:2505.22764. 2025.
- Kendall A, Gal Y. What uncertainties do we need in Bayesian deep learning for computer vision? Advances in Neural Information Processing Systems. 2017:5574-5584.
- Leibig, C., Allken, V., Ayhan, M.S. et al. Leveraging uncertainty information from deep neural networks for disease detection. Sci Rep 7, 17816 (2017). [CrossRef]
- Hsu YC, Shen Y, Jin H and Kira Z, “Generalized ODIN: Detecting Out-of-Distribution Image Without Learning From Out-of-Distribution Data,” 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 2020, pp. 10948-10957. [CrossRef]
- Liang S, Li Y, Srikant R. Enhancing The Reliability of Out-of-Distribution Image Detection in Neural Networks. In: CVPR 2017;15058–15066.
- Guo C, Pleiss G, Sun Y, Weinberger KQ. On Calibration of Modern Neural Networks. NeurIPS 2017:1321–1330.
- Lee K, Lee K, Lee H, Shin J. A Simple Unified Framework for Detecting Out-of-Distribution Samples and Adversarial Attacks. NeurIPS 2018.
- González C, Gotkowski K, Fuchs M, Bucher A, Dadras A, Fischbach R, Kaltenborn IJ, Mukhopadhyay A. Distance-based detection of out-of-distribution silent failures for Covid-19 lung lesion segmentation. Med Image Anal. 2022 Nov;82:102596. [CrossRef]
- Müller M, Hein M. Mahalanobis++: Improving OOD Detection via Feature Normalization. arXiv preprint arXiv:2505.18032. 2025.
- Lakshminarayanan B, Pritzel A, Blundell C. Simple and Scalable Predictive Uncertainty Estimation Using Deep Ensembles. Advances in Neural Information Processing Systems. 2017;30:6405–6416.
- HassanPour Zonoozi, M., Seydi, V. A Survey on Adversarial Domain Adaptation. Neural Process Lett 55, 2429–2469 (2023). [CrossRef]
- Subasri V, Krishnan A, Kore A, et al. Detecting and Remediating Harmful Data Shifts for the Responsible Deployment of Clinical AI Models. JAMA Netw Open. 2025;8(6):e2513685. [CrossRef]
- Vegt AH, Scott I, Dermawan K, Schnetler RJ, Kalke VR, Lane PJ, Implementation frameworks for end-to-end clinical AI: derivation of the SALIENT framework, Journal of the American Medical Informatics Association, Volume 30, Issue 9, September 2023, Pages 1503–1515. [CrossRef]
- Shimgekar SR, Vassef S, Goyal A, Kumar N, Saha K. Agentic AI framework for End-to-End Medical Data Inference. arXiv preprint arXiv:2507.18115. 2025.
- Lavin, A., Gilligan-Lee, C.M., Visnjic, A. et al. Technology readiness levels for machine learning systems. Nat Commun 13, 6039 (2022). [CrossRef]
- Li Z, Kesselman C, Nguyen TH, Xu BY, Bolo K, Yu K. From Data to Decision: Data-Centric Infrastructure for Reproducible ML in Collaborative eScience. arXiv preprint arXiv:2506.16051. 2025.
- Feng, J., Phillips, R.V., Malenica, I. et al. Clinical artificial intelligence quality improvement: towards continual monitoring and updating of AI algorithms in healthcare. npj Digit. Med. 5, 66 (2022). [CrossRef]
- Greenhalgh, T., Papoutsi, C. Studying complexity in health services research: desperately seeking an overdue paradigm shift. BMC Med 16, 95 (2018). [CrossRef]
- Topol, E. J. (2019). Deep medicine: how artificial intelligence can make healthcare human again. First edition.
- Mol, A. The Logic of Care: Health and the Problem of Patient Choice. Routledge; 2008.
- Cahalan, S. Brain on Fire: My Month of Madness. Simon & Schuster; 2012.

| Epistemic Challenge | Approach | Triad Domain | Advantages | Limitations | References |
|---|---|---|---|---|---|
| Feature ontology ambiguity | Structural Causal Models (SCMs) | Causal AI | Enables counterfactual inference | Require fully specified causal and measurement equations for every feature | Lamsaf 2025, Gani 2023, Tikka 2023, Bareinboim 2016 |
| | Causal Bayesian Network (CBN) models | Causal AI | Allow encoding uncertain or partially known dependencies as probabilistic relationships | Cannot resolve conceptual vagueness; require well-defined random variables | Mascaro 2023, Triantafillou 2021, Wagan 2024 |
| | Fuzzy Logic | Ambiguity Awareness | Handle conceptual vagueness | Do not resolve causal/relational ambiguities | Kempowsky-Hamon 2015, Paja 2023, Saki 2024 |
| Identification problem | Structural Equation Models (SEMs) | Causal AI | Quantify measurement error | Require all relevant confounders to be measured | Christ 2014, Kraus 2024, Sullivan 2021 |
| | Proximal Causal Inference (PCI) | Causal AI | Resolves causal effects despite unmeasured confounders | Strict validity conditions for proxies | Tchetgen 2024, Liu 2025, Rakshit 2025 |
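
As a minimal illustration of the "Fuzzy Logic" row above, the sketch below shows how a fuzzy membership function can represent a vague clinical concept as a matter of degree rather than a hard cut-off. The 0-10 severity scale, the set names, and every breakpoint are hypothetical and chosen only for illustration; they are not clinical thresholds and do not come from the cited works.

```python
def trapezoid(x, a, b, c, d):
    """Membership degree in [0, 1] for a trapezoidal fuzzy set.

    The support is [a, d]; the core (full membership) is [b, c].
    """
    if b <= x <= c:
        return 1.0
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)  # rising edge
    return (d - x) / (d - c)      # falling edge


# Hypothetical fuzzy sets over a 0-10 symptom severity score.
# Breakpoints are illustrative assumptions, not clinical thresholds.
SEVERITY_SETS = {
    "mild": (0, 0, 2, 5),
    "moderate": (2, 5, 5, 8),
    "severe": (5, 8, 10, 10),
}


def memberships(score):
    """Return the degree of membership of `score` in every severity set."""
    return {name: trapezoid(score, *params)
            for name, params in SEVERITY_SETS.items()}


print(memberships(3.5))  # {'mild': 0.5, 'moderate': 0.5, 'severe': 0.0}
```

A score of 3.5 belongs partly to "mild" and partly to "moderate", which is the kind of conceptual vagueness the table refers to. Consistent with the stated limitation, nothing in this representation says anything about causal or relational structure between features.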

| Epistemic Challenge | Approach | Triad Domain | Advantages | Limitations | References |
|---|---|---|---|---|---|
| Signification problem | Representational Similarity Analysis (RSA) | Ambiguity Awareness | Scalable data collection via simple similarity judgments | Cannot capture hierarchical relationships | Ogg 2024 |
| | Abstraction Alignment | Ambiguity Awareness | Models hierarchical medical taxonomies | Limited clinical validation | Boggust 2025 |
| | Contrastive Learning | Ambiguity Awareness | Adjusts model representations to match concepts; improves zero-shot accuracy and concept-level explainability | Sensitivity to false negative concept pairings | Marjieh 2025, Huynh 2022, Zang 2021, Ghesu 2022, Nie 2025 |
| Misaligned class ontologies | Concept Bottleneck Models (CBMs) | Ambiguity Awareness | Improves model interpretability | Labeling burden (concept annotations) | Koh 2020, Wu 2025, Pang 2024, Park 2025 |
| | Bayesian models with domain-informed priors | Causal AI | Enables probabilistic embeddings of clinical knowledge | Priors are not generalizable across architectures | Sam 2024 |
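
The "Representational Similarity Analysis" row above can be made concrete with a small sketch. It assumes one common formulation of RSA: compute pairwise dissimilarities between model embeddings, then rank-correlate them (Spearman) with human dissimilarity judgments for the same item pairs. The embeddings, the human ratings, and the function names are all hypothetical, and the cited works may use different distance or correlation measures.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def pairwise_dissimilarities(embeddings):
    """Upper triangle of the dissimilarity matrix, in (i, j) order with i < j."""
    n = len(embeddings)
    return [cosine_distance(embeddings[i], embeddings[j])
            for i in range(n) for j in range(i + 1, n)]


def average_ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def rsa_score(model_embeddings, human_dissimilarities):
    """Spearman correlation between model and human pairwise dissimilarities."""
    model_d = pairwise_dissimilarities(model_embeddings)
    return pearson(average_ranks(model_d), average_ranks(human_dissimilarities))


# Hypothetical data: three concepts A, B, C with 2-D model embeddings and
# human dissimilarity ratings for the pairs (A,B), (A,C), (B,C).
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
human = [0.1, 0.9, 0.7]
print(round(rsa_score(embeddings, human), 6))  # 1.0
```

Because the score is built only from flat pairwise comparisons, it illustrates the limitation listed in the table: two representations can agree on every pairwise ranking while saying nothing about hierarchical (parent and child) relationships among concepts.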

| Epistemic Challenge | Approach | Triad Domain | Advantages | Limitations | References |
|---|---|---|---|---|---|
| Misaligned uncertainty semantics | Conformal Prediction | Uncertainty Quantification | Creates statistical coverage guarantees | Prediction sets are sometimes too large to be clinically helpful | Angelopoulos 2022, Olsson 2022, Sreenivasan 2025, Fayyad 2024, Shanmugam 2025 |
| | Bayesian Neural Networks (BNNs) | Uncertainty Quantification | Explicitly represent both epistemic and aleatoric uncertainty | Require representative datasets and well-chosen priors | Kendall 2017, Leibig 2017 |
| Domain-shift vulnerability | ODIN/G-ODIN | Uncertainty Quantification | Lightweight, versatile baseline | Overconfident under high epistemic uncertainty | Hsu 2020, Liang 2017, Guo 2017 |
| | Mahalanobis Distance-Based OOD Methods | Uncertainty Quantification | Directly examine internal feature embeddings | Strict assumptions (Gaussian, well-clustered features) | Lee 2018, González 2022, Müller 2025 |
| | Deep Ensemble OOD Methods | Uncertainty Quantification | Highest accuracy | High computational cost | Lakshminarayanan 2017 |
| | Adversarial Domain Adaptation | Ambiguity Awareness | Does not require labelled data from target distribution | Follows linear, rather than dynamic, paradigm | HassanPour Zonoozi 2023, Musa 2025 |
| | Drift-triggered Continual Learning | Uncertainty Quantification | Follows dynamic paradigm | Catastrophic forgetting | Subasri 2025 |
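
To illustrate the "Conformal Prediction" row above, the following is a minimal sketch of split conformal prediction for classification, assuming the common nonconformity score of one minus the probability assigned to the true class. The calibration probabilities, labels, and function names are hypothetical. The coverage guarantee it relies on is marginal and assumes calibration and test cases are exchangeable.

```python
import math


def conformal_threshold(cal_probs, cal_labels, alpha):
    """Score threshold for split conformal prediction at miscoverage level alpha."""
    # Nonconformity score: 1 - probability the model gave to the true class.
    scores = sorted(1.0 - probs[label]
                    for probs, label in zip(cal_probs, cal_labels))
    n = len(scores)
    # Finite-sample corrected rank: ceil((n + 1) * (1 - alpha)).
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: include every class.
        return 1.0
    return scores[k - 1]


def prediction_set(probs, q_hat):
    """All classes whose nonconformity score does not exceed the threshold."""
    return [c for c, p in enumerate(probs) if 1.0 - p <= q_hat]


# Hypothetical calibration set: softmax outputs over 3 classes and true labels.
cal_probs = [
    [0.90, 0.05, 0.05], [0.80, 0.10, 0.10], [0.10, 0.70, 0.20],
    [0.20, 0.20, 0.60], [0.50, 0.30, 0.20], [0.30, 0.40, 0.30],
    [0.60, 0.30, 0.10],
]
cal_labels = [0, 0, 1, 2, 0, 1, 1]

q_hat = conformal_threshold(cal_probs, cal_labels, alpha=0.25)
print(prediction_set([0.70, 0.25, 0.05], q_hat))  # [0]
print(prediction_set([0.45, 0.42, 0.13], q_hat))  # [0, 1]
```

The second query shows the limitation listed in the table in miniature: when the model is unsure, the set grows to keep the coverage level, and with many classes or a poorly calibrated model it can become too large to be clinically useful.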

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).