Submitted: 18 August 2025
Posted: 20 August 2025
Abstract
Keywords:
1. Introduction
1.1. Context and Motivation
1.2. Audience and Scope
1.3. Clinical Reasoning and Multimodal Integration
2. Emerging Multimodal Medical AI Models
2.1. Overview of Major Models
2.2. Approaches: Tool Use, Grafting, and Unified Systems
2.3. Tradeoffs in Model Design
2.4. Specialty Applications Overview
- Radiology: Multimodal AI models that combine imaging data (e.g., X-rays, CT, MRI) with clinical information from electronic health records show improved diagnostic accuracy compared with single-modality approaches [15]. For example, models that integrate chest X-rays with patient history and clinical notes have demonstrated better performance in disease classification tasks [37]. Fusion-based methods and representation learning allow these models to combine visual and textual data [24]. There is also growing interest in cross-modality translation, such as automatically generating radiology reports from images [37], which could reduce radiologists’ workload while maintaining diagnostic quality.
- Pathology: AI systems that analyze both pathology slide images and molecular or genomic data are being developed to provide more comprehensive tumor characterization and prognostic stratification [22]. Integrating radiomics features from imaging with transcriptomic data has predicted treatment response in some cancers better than single-modality approaches [22]. This mirrors the growing clinical emphasis on integrated diagnostics, in which pathologists collaborate with radiologists and other specialists to formulate more precise diagnostic and treatment plans.
- Dermatology: Multimodal models that combine clinical images, dermoscopic images, and patient metadata are being explored to improve skin lesion classification and melanoma detection [40]. These approaches aim to mimic the diagnostic process of dermatologists, who routinely weigh visual features alongside patient history, risk factors, and other clinical information. By synthesizing these inputs, multimodal systems may improve sensitivity and specificity for skin cancer detection.
- Ophthalmology: Generative AI techniques such as GANs are being used to create synthetic retinal images that expand training datasets [40]. Multimodal foundation models that process both eye images and clinical text show promise for improving diagnostic accuracy, patient education, and clinician training in ophthalmology [40]. These models may also detect subtle retinal changes associated with systemic diseases such as diabetes and hypertension, supporting earlier intervention and better management of these conditions.
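As a rough illustration of the fusion-based methods mentioned for radiology above, the following minimal Python sketch concatenates an image embedding with structured clinical features and passes the joint vector to a logistic-regression head. It is one simple form of feature-level fusion, not the method of any cited model; the function names, feature values, weights, and bias are all hypothetical and chosen only for illustration.

```python
import math
from typing import List, Sequence


def concat_fusion(image_embedding: Sequence[float],
                  clinical_features: Sequence[float]) -> List[float]:
    """Feature-level fusion: join both modality vectors into one vector."""
    return list(image_embedding) + list(clinical_features)


def predict_probability(features: Sequence[float],
                        weights: Sequence[float],
                        bias: float) -> float:
    """Logistic-regression head applied to the fused feature vector."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    logit = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical inputs (assumptions, not data from the cited studies).
image_embedding = [0.8, -0.2, 0.5]   # stand-in for an image encoder output
clinical_features = [1.0, 0.0]       # stand-in for two binary EHR flags
weights = [0.5, -0.25, 0.5, 1.0, 2.0]  # illustrative, untrained weights
bias = -1.0

fused = concat_fusion(image_embedding, clinical_features)
probability = predict_probability(fused, weights, bias)
print(f"fused vector: {fused}")
print(f"predicted probability: {probability:.3f}")
```

In practice the embedding would come from a trained image encoder and the weights would be learned jointly; the sketch only shows how the two modalities end up in a single input to the classifier.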
3. Applications Across Medical Specialties
3.1. Radiology Applications
3.2. Pathology Applications
3.3. Dermatology and Ophthalmology Applications
3.4. Benchmark Overview
3.5. VQA-RAD and ROCO
3.5.1. VQA-RAD (Visual Question Answering in Radiology)
3.5.2. ROCO (Radiology Objects in COntext)
3.6. Importance and Impact
3.7. Limitations and Future Directions
4. Benchmark Datasets and Evaluation
4.1. PathVQA and Other QA Datasets
4.2. MIMIC-CXR and ImageCLEF Challenges
4.3. Synthetic Image Generation
4.4. Synthetic Text and Rare Disease Simulation
5. Synthetic Data Generation for Rare Diseases
5.1. Privacy and Collaboration via Synthetic Data
5.2. Clinical Trial Design and Validation Needs
5.3. Human-AI Collaboration Models
6. Clinical Validation and Regulatory Perspectives
6.1. Regulatory and FDA Perspectives
6.2. Ethical and Liability Challenges
6.3. Key Findings and Path Forward
6.4. Remaining Gaps
6.5. Future Research Agenda
7. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
| Abbreviation | Definition |
|---|---|
| AI | Artificial Intelligence |
| LLM | Large Language Model |
| VQA | Visual Question Answering |
| MRI | Magnetic Resonance Imaging |
| PET | Positron Emission Tomography |
| CT | Computed Tomography |
| XAI | Explainable Artificial Intelligence |
| Grad-CAM | Gradient-weighted Class Activation Mapping |
| SHAP | SHapley Additive exPlanations |
| LIME | Local Interpretable Model-agnostic Explanations |
| Med-PaLM | Medical Pathways Language Model |
| LLaVA-Med | Large Language and Vision Assistant for Medicine |
| BiomedGPT | Biomedical Generative Pre-trained Transformer |
| BioGPT-ViT | BioGPT Vision Transformer |
| MLG-GAN | Multi-Level Guided Generative Adversarial Network |
| Mul-T | Multimodal Transformer |
| ECG | Electrocardiogram |
| EEG | Electroencephalogram |
| FDG-PET | Fluorodeoxyglucose Positron Emission Tomography |
| GANs | Generative Adversarial Networks |
| ROCO | Radiology Objects in COntext |
| NIH14 | National Institutes of Health Chest X-ray Dataset |
| QA | Question Answering |
| OCT | Optical Coherence Tomography |
| MIMIC | Medical Information Mart for Intensive Care |
| CLARO | CT Imaging of Lung Cancer And Related Outcomes |
| LCID | Lung Cancer Imaging Database |
| EHR | Electronic Health Records |
| AUC-ROC | Area Under the Receiver Operating Characteristic curve |
| BLEU | Bilingual Evaluation Understudy |
| MIMIC-CXR | Medical Information Mart for Intensive Care Chest X-Ray |
| ImageCLEF | Image Cross-Language Evaluation Forum |
| MMBERT | Multi-Modal Bidirectional Encoder Representations from Transformers |
| ROUGE | Recall-Oriented Understudy for Gisting Evaluation |
| CONCH | Contrastive Learning from Captions for Histopathology |
| IHC | Immunohistochemistry |
| H&E | Hematoxylin and Eosin |
| CNN | Convolutional Neural Network |
| RNN | Recurrent Neural Network |
| Onto-CGAN | Ontology-enhanced Conditional Generative Adversarial Network |
| AML | Acute Myeloid Leukemia |
| TPLC | Total Product Life Cycle |
| PMA | Premarket Approval |
| SaMD | Software as a Medical Device |
| FDA | Food and Drug Administration |
References
- Harrer, S.; Shah, P.; Antony, B.; Hu, J. Artificial intelligence for clinical trial design. Trends in pharmacological sciences 2019, 40, 577–591.
- Maleki, M.; Ghahari, S. Clinical trials protocol authoring using LLMs. arXiv, 2024; arXiv:2404.05044.
- Askin, S.; Burkhalter, D.; Calado, G.; El Dakrouni, S. Artificial intelligence applied to clinical trials: opportunities and challenges. Health and technology 2023, 13, 203–213.
- Maleki, M.; Ghahari, S. Comprehensive clustering analysis and profiling of COVID-19 vaccine hesitancy and related factors across US counties: Insights for future pandemic responses. Healthcare 2024, 12, 1458.
- Maleki, M.; Khan, M. COVID-19 health equity & justice dashboard: A step towards countering health disparities among seniors and minority population. Available at SSRN 4595845 2023.
- Mayorga-Ruiz, I.; Jiménez-Pastor, A.; Fos-Guarinos, B.; López-González, R.; García-Castro, F.; Alberich-Bayarri, Á. The role of AI in clinical trials. Artificial Intelligence in Medical Imaging: Opportunities, applications and risks, 2019; 231–243.
- Maleki, M. Clustering analysis of US COVID-19 rates, vaccine participation, and socioeconomic factors. arXiv, 2024; arXiv:2404.08186.
- Maleki, M.; Haeri, F. Identification of cardiovascular diseases through ECG classification using wavelet transformation. arXiv, 2024; arXiv:2404.09393.
- Woo, M. An AI boost for clinical trials. Nature 2019, 573, S100–S100.
- Maleki, M.; Ghahari, S. Impact of Major Health Events on Pharmaceutical Stocks: A Comprehensive Analysis Using Macroeconomic and Market Indicators. arXiv, 2024; arXiv:2408.01883.
- Maleki, M.; Bahrami, M.; Menendez, M.; Balsa-Barreiro, J. Social Behavior and COVID-19: Analysis of the Social Factors behind Compliance with Interventions across the United States. International Journal of Environmental Research and Public Health 2022, 19.
- Angus, D.C. Randomized clinical trials of artificial intelligence. JAMA 2020, 323, 1043–1045.
- Reddy, S. Generative AI in healthcare: an implementation science informed translational path on application, integration and governance. Implementation science: IS 2024, 19, 27.
- Abdullakutty, F.; Akbari, Y.; Al-Maadeed, S.; Bouridane, A. Histopathology in focus: a review on explainable multi-modal approaches for breast cancer diagnosis. Frontiers in medicine 2024, 11, 1450103.
- Haq, I.U.; Mhamed, M.; Al-Harbi, M.; Osman, H. Advancements in Medical Radiology Through Multimodal Machine Learning: A Comprehensive Overview. Bioengineering (Basel, Switzerland) 2025, 12.
- Rouzrokh, P.; Khosravi, B.; Faghani, S.; Moassefi, M. A Current Review of Generative AI in Medicine: Core Concepts, Applications, and Current Limitations. Current reviews in musculoskeletal medicine 2025, 18, 246–266.
- Rashidi, H.H.; Pantanowitz, J.; Chamanzar, A.; Fennell, B. Generative Artificial Intelligence in Pathology and Medicine: A Deeper Dive. Modern pathology: an official journal of the United States and Canadian Academy of Pathology, Inc 2025, 38, 100687.
- Araújo, C.C.; Frias, J.; Mendes, F.; Martins, M. Unlocking the Potential of AI in EUS and ERCP: A Narrative Review for Pancreaticobiliary Disease. Cancers 2025, 17.
- Gao, X.; Shi, F.; Shen, D.; Liu, M. Multimodal transformer network for incomplete image generation and diagnosis of Alzheimer’s disease. Computerized medical imaging and graphics: the official journal of the Computerized Medical Imaging Society 2023, 110, 102303.
- Kunze, K.N. Generative Artificial Intelligence and Musculoskeletal Health Care. HSS journal: the musculoskeletal journal of Hospital for Special Surgery 2025, 21, 15563316251335334.
- Oettl, F.C.; Zsidai, B.; Oeding, J.F.; Hirschmann, M.T. Beyond traditional orthopaedic data analysis: AI, multimodal models and continuous monitoring. Knee surgery, sports traumatology, arthroscopy: official journal of the ESSKA 2025, 33, 2269–2275.
- Parvin, N.; Joo, S.W.; Jung, J.H.; Mandal, T.K. Multimodal AI in Biomedicine: Pioneering the Future of Biomaterials, Diagnostics, and Personalized Healthcare. Nanomaterials (Basel, Switzerland) 2025, 15.
- Tortora, L. Beyond Discrimination: Generative AI Applications and Ethical Challenges in Forensic Psychiatry. Frontiers in psychiatry 2024, 15, 1346059.
- Gao, Y.; Wen, P.; Liu, Y.; Sun, Y. Application of artificial intelligence in the diagnosis of malignant digestive tract tumors: focusing on opportunities and challenges in endoscopy and pathology. Journal of translational medicine 2025, 23, 412.
- Rao, V.M.; Hla, M.; Moor, M.; Adithan, S. Multimodal generative AI for medical image interpretation. Nature 2025, 639, 888–896.
- Han, T.; Jeong, W.K.; Shin, J. Diagnostic performance of multimodal large language models in radiological quiz cases: the effects of prompt engineering and input conditions. Ultrasonography (Seoul, Korea) 2025, 44, 220–231.
- Shao, J.; Ma, J.; Zhang, Q.; Li, W. Predicting gene mutation status via artificial intelligence technologies based on multimodal integration (MMI) to advance precision oncology. Seminars in cancer biology 2023, 91, 1–15.
- Pfob, A.; Sidey-Gibbons, C.; Barr, R.G.; Duda, V. The importance of multi-modal imaging and clinical information for humans and AI-based algorithms to classify breast masses (INSPiRED 003): an international, multicenter analysis. European radiology 2022, 32, 4101–4115.
- Ullah, E.; Baig, M.M.; Waqas, A.; Rasool, G. Multimodal Generative AI for Anatomic Pathology-A Review of Current Applications to Envisage the Future Direction. Advances in anatomic pathology 2025.
- Jain, S.S.; Elias, P.; Poterucha, T.; Randazzo, M. Artificial Intelligence in Cardiovascular Care-Part 2: Applications: JACC Review Topic of the Week. Journal of the American College of Cardiology 2024, 83, 2487–2496.
- Hagos, D.H.; Aryal, S.K.; Ymele-Leki, P.; Burge, L.L. AI-driven multimodal colorimetric analytics for biomedical and behavioral health diagnostics. Computational and structural biotechnology journal 2025, 27, 2219–2232.
- Javan, R.; Kim, T.; Mostaghni, N. GPT-4 Vision: Multi-Modal Evolution of ChatGPT and Potential Role in Radiology. Cureus 2024, 16, e68298.
- Geersing, G.J.; de Wit, N.J.; Thompson, M. Generative artificial intelligence for general practice; new potential ahead, but are we ready? The European journal of general practice 2025, 31, 2511645.
- Maleki, M. Advancing Healthcare Accessibility through a Neighborhood Search Recommendation Tool. Available at SSRN 4825773 2024.
- Maleki, M. Evaluating the Reproducibility of ICU Patient Readmission using RNN and ODE models. Available at SSRN 4825763 2024.
- Brodsky, V.; Ullah, E.; Bychkov, A.; Song, A.H. Generative Artificial Intelligence in Anatomic Pathology. Archives of pathology & laboratory medicine 2025, 149, 298–318.
- Hong, E.K.; Ham, J.; Roh, B.; Gu, J. Diagnostic Accuracy and Clinical Value of a Domain-specific Multimodal Generative AI Model for Chest Radiograph Report Generation. Radiology 2025, 314, e241476.
- Hong, G.S.; Jang, M.; Kyung, S.; Cho, K. Overcoming the Challenges in the Development and Implementation of Artificial Intelligence in Radiology: A Comprehensive Review of Solutions Beyond Supervised Learning. Korean journal of radiology 2023, 24, 1061–1080.
- Chang, C.; Shi, W.; Wang, Y.; Zhang, Z. The path from task-specific to general purpose artificial intelligence for medical diagnostics: A bibliometric analysis. Computers in biology and medicine 2024, 172, 108258.
- Sonmez, S.C.; Sevgi, M.; Antaki, F.; Huemer, J. Generative artificial intelligence in ophthalmology: current innovations, future applications and challenges. The British journal of ophthalmology 2024, 108, 1335–1340.
- Zhang, X.; Wu, C.; Zhao, Z.; Lin, W. Development of a large-scale medical visual question-answering dataset. Communications medicine 2024, 4, 277.
- Sorin, V.; Barash, Y.; Konen, E.; Klang, E. Creating Artificial Images for Radiology Applications Using Generative Adversarial Networks (GANs) - A Systematic Review. Academic radiology 2020, 27, 1175–1185.
- Alajaji, S.A.; Khoury, Z.H.; Elgharib, M.; Saeed, M. Generative Adversarial Networks in Digital Histopathology: Current Applications, Limitations, Ethical Considerations, and Future Directions. Modern pathology: an official journal of the United States and Canadian Academy of Pathology, Inc 2024, 37, 100369.
- Panagoulias, D.P.; Tsoureli-Nikita, E.; Virvou, M.; Tsihrintzis, G.A. Dermacen analytica: A novel methodology integrating multi-modal large language models with machine learning in dermatology. International journal of medical informatics 2025, 199, 105898.
- Algarni, A. CareAssist GPT improves patient user experience with a patient centered approach to computer aided diagnosis. Scientific reports 2025, 15, 22727.
- Teoh, J.R.; Dong, J.; Zuo, X.; Lai, K.W. Advancing healthcare through multimodal data fusion: a comprehensive review of techniques and applications. PeerJ. Computer science 2024, 10, e2298.
- Brin, D.; Sorin, V.; Barash, Y.; Konen, E. Assessing GPT-4 multimodal performance in radiological image analysis. European radiology 2025, 35, 1959–1965.
- Sosna, J.; Joskowicz, L.; Saban, M. Navigating the AI Landscape in Medical Imaging: A Critical Analysis of Technologies, Implementation, and Implications. Radiology 2025, 315, e240982.
- Hacking, S. Foundation models in pathology: bridging AI innovation and clinical practice. Journal of clinical pathology 2025, 78, 433–435.
- Lu, M.Y.; Chen, B.; Williamson, D.F.K.; Chen, R.J. A multimodal generative AI copilot for human pathology. Nature 2024, 634, 466–473.
- Ferber, D.; El Nahhas, O.S.M.; Wölflein, G.; Wiest, I.C. Development and validation of an autonomous artificial intelligence agent for clinical decision-making in oncology. Nature cancer 2025.
- Lee, S.; Youn, J.; Kim, H.; Kim, M. CXR-LLaVA: a multimodal large language model for interpreting chest X-ray images. European radiology 2025, 35, 4374–4386.
- Gupta, A.; Rajamohan, N.; Bansal, B.; Chaudhri, S. Applications of artificial intelligence in abdominal imaging. Abdominal radiology (New York), 2025.
- Van Booven, D.J.; Chen, C.B.; Malpani, S.; Mirzabeigi, Y. Synthetic Genitourinary Image Synthesis via Generative Adversarial Networks: Enhancing Artificial Intelligence Diagnostic Precision. Journal of personalized medicine 2024, 14.
- Sun, C.; Dumontier, M. Generating unseen diseases patient data using ontology enhanced generative adversarial networks. NPJ digital medicine 2025, 8, 4.
- Yang, Y.; Shen, H.; Chen, K.; Li, X. From pixels to patients: the evolution and future of deep learning in cancer diagnostics. Trends in molecular medicine 2025, 31, 548–558.
- Segal, B.; Rubin, D.M.; Rubin, G.; Pantanowitz, A. Evaluating the Clinical Realism of Synthetic Chest X-Rays Generated Using Progressively Growing GANs. SN computer science 2021, 2, 321.
- Eckardt, J.N.; Hahn, W.; Röllig, C.; Stasik, S. Mimicking clinical trials with synthetic acute myeloid leukemia patients using generative artificial intelligence. NPJ digital medicine 2024, 7, 76.
- Ibrahim, M.; Khalil, Y.A.; Amirrajab, S.; Sun, C. Generative AI for synthetic data across multiple medical modalities: A systematic review of recent developments and challenges. Computers in biology and medicine 2025, 189, 109834.
- Liu, F.; Zhou, H.; Wang, K.; Yu, Y. MetaGP: A generative foundation model integrating electronic health records and multimodal imaging for addressing unmet clinical needs. Cell reports. Medicine 2025, 6, 102056.
- Hirosawa, T.; Harada, Y.; Tokumasu, K.; Ito, T. Evaluating ChatGPT-4’s Diagnostic Accuracy: Impact of Visual Data Integration. JMIR medical informatics 2024, 12, e55627.
- Hosny, A.; Bitterman, D.S.; Guthier, C.V.; Qian, J.M. Clinical validation of deep learning algorithms for radiotherapy targeting of non-small-cell lung cancer: an observational study. The Lancet. Digital health 2022, 4, e657–e666.
- Sakamoto, T.; Harada, Y.; Shimizu, T. Facilitating Trust Calibration in Artificial Intelligence-Driven Diagnostic Decision Support Systems for Determining Physicians’ Diagnostic Accuracy: Quasi-Experimental Study. JMIR formative research 2024, 8, e58666.
- Potnis, K.C.; Ross, J.S.; Aneja, S.; Gross, C.P. Artificial Intelligence in Breast Cancer Screening: Evaluation of FDA Device Regulation and Future Recommendations. JAMA internal medicine 2022, 182, 1306–1312.
- Han, G.R.; Goncharov, A.; Eryilmaz, M.; Ye, S. Machine learning in point-of-care testing: innovations, challenges, and opportunities. Nature communications 2025, 16, 3165.
- Ratkevičiūtė, K.; Aliukonis, V. Exploring Opportunities and Challenges of AI in Primary Healthcare: A Qualitative Study with Family Doctors in Lithuania. Healthcare (Basel, Switzerland) 2025, 13.
- Hasan, S.S.; Fury, M.S.; Woo, J.J.; Kunze, K.N. Ethical Application of Generative Artificial Intelligence in Medicine. Arthroscopy: the journal of arthroscopic & related surgery: official publication of the Arthroscopy Association of North America and the International Arthroscopy Association 2025, 41, 874–885.
- Kumar, R.; Waisberg, E.; Ong, J.; Paladugu, P. Artificial Intelligence-Based Methodologies for Early Diagnostic Precision and Personalized Therapeutic Strategies in Neuro-Ophthalmic and Neurodegenerative Pathologies. Brain sciences 2024, 14.
- Sablone, S.; Bellino, M.; Cardinale, A.N.; Esposito, M. Artificial intelligence in healthcare: an Italian perspective on ethical and medico-legal implications. Frontiers in medicine 2024, 11, 1343456.
- Jha, D.; Durak, G.; Das, A.; Sanjotra, J. Ethical framework for responsible foundational models in medical imaging. Frontiers in medicine 2025, 12, 1544501.
- Huang, J.; Wittbrodt, M.T.; Teague, C.N.; Karl, E. Efficiency and Quality of Generative AI-Assisted Radiograph Reporting. JAMA network open 2025, 8, e2513921.
- Lipkova, J.; Chen, R.J.; Chen, B.; Lu, M.Y. Artificial intelligence for multimodal data integration in oncology. Cancer cell 2022, 40, 1095–1110.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).