Submitted: 16 September 2025
Posted: 18 September 2025
Abstract
Keywords:
1. Introduction
2. Methods
2.1. Overview of Proposed Pipeline
2.2. Data Collection
2.3. Clinical Dataset and Pre-Processing
2.4. Synthetic Dataset Generation
2.5. Diagnostic Framework
- ED history and physical examination findings, which form the textual component of the diagnostic evaluation.
- A medical image associated with the patient’s condition, corresponding to one of the ophthalmic imaging modalities under investigation.
- A description of the image type (e.g., fundus image, anterior segment photo) to provide context for interpretation.
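
The three inputs listed above can be pictured as one case record that is turned into the text part of a model query. The following is a minimal sketch only: the class name `DiagnosticCase`, the function `build_prompt`, the field names, and the prompt wording are all hypothetical and are not taken from the study. The optional flag illustrates where a chain-of-thought instruction could be added.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiagnosticCase:
    """Hypothetical container for the three inputs listed above."""
    history_and_exam: str  # ED history and physical examination findings
    image_path: str        # location of the associated medical image
    image_type: str        # e.g. "fundus image", "anterior segment photo"


def build_prompt(case: DiagnosticCase, chain_of_thought: bool = False) -> str:
    """Assemble the text part of the query (illustrative wording only)."""
    lines = [
        "ED history and physical examination findings:",
        case.history_and_exam.strip(),
        f"Attached image type: {case.image_type}",
        "Provide the single most likely ophthalmic diagnosis.",
    ]
    if chain_of_thought:
        # Assumed phrasing; the study's actual instruction is not given here.
        lines.append("Reason step by step before stating the final diagnosis.")
    return "\n".join(lines)


# Made-up example case, for illustration only.
example = DiagnosticCase(
    history_and_exam="Sudden painless vision loss in the right eye.",
    image_path="case_001.png",
    image_type="fundus image",
)
prompt_plain = build_prompt(example)
prompt_cot = build_prompt(example, chain_of_thought=True)
```

The image itself would be passed to a multimodal model alongside this text; how that is done depends on the model interface and is left out of the sketch.
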
2.6. Evaluation Metric
2.7. Statistical Analysis
3. Results
3.1. By Image Modality
3.2. By Diagnosis Category
3.3. Impact of Chain-of-Thought Reasoning
4. Discussion
4.1. Influence of Image Modality and Diagnosis Category on Dissimilarity
4.2. Chain of Thought Reasoning and Its Effect on Diagnostic Dissimilarity
4.3. Significance of Study Findings
4.4. Limitations
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Mir, T.A.; Mehta, S.; Qiang, K.; Adelman, R.A.; Del Priore, L.V.; Chow, J. Association of the Affordable Care Act with Eye-Related Emergency Department Utilization in the United States. Ophthalmology 2022, 129, 1412–1420.
- Khou, V.; Ly, A.; Moore, L.; Markoulli, M.; Kalloniatis, M.; Yapp, M.; Hennessy, M.; Zangerl, B. Review of Referrals Reveal the Impact of Referral Content on the Triage and Management of Ophthalmology Wait Lists. BMJ Open 2021, 11, e047246.
- Grossmann, F.F.; Zumbrunn, T.; Ciprian, S.; Stephan, F.-P.; Woy, N.; Bingisser, R.; Nickel, C.H. Undertriage in Older Emergency Department Patients – Tilting against Windmills? PLoS ONE 2014, 9, e106203.
- Yin, J.; Jiang, B.; Zhao, T.; Guo, X.; Tan, Y.; Wang, Y. Trends in the Global Burden of Vision Loss among the Older Adults from 1990 to 2019. Front. Public Health 2024, 12, 1324141.
- Cascella, M.; Semeraro, F.; Montomoli, J.; Bellini, V.; Piazza, O.; Bignami, E. The Breakthrough of Large Language Models Release for Medical Applications: 1-Year Timeline and Perspectives. J. Med. Syst. 2024, 48, 22.
- Nazi, Z.A.; Peng, W. Large Language Models in Healthcare and Medical Domain: A Review. Informatics 2024, 11, 57.
- Kojima, T.; Gu, S.S.; Reid, M.; Matsuo, Y.; Iwasawa, Y. Large Language Models Are Zero-Shot Reasoners. 2022.
- Wei, J.; Wang, X.; Schuurmans, D.; Bosma, M.; Ichter, B.; Xia, F.; Chi, E.; Le, Q.; Zhou, D. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. 2022.
- Wu, C.-K.; Chen, W.-L.; Chen, H.-H. Large Language Models Perform Diagnostic Reasoning. 2023.
- Liu, J.; Wang, Y.; Du, J.; Zhou, J.T.; Liu, Z. MedCoT: Medical Chain of Thought via Hierarchical Expert. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing; Association for Computational Linguistics: Miami, Florida, USA, 2024; pp. 17371–17389.
- Sittig, D.F.; Singh, H. Recommendations to Ensure Safety of AI in Real-World Clinical Care. JAMA 2025, 333, 457.
- Xu, H.; Stetson, P.D.; Friedman, C. A Study of Abbreviations in Clinical Notes. AMIA Annu. Symp. Proc. AMIA Symp. 2007, 2007, 821–825.
- Moon, S.; McInnes, B.; Melton, G.B. Challenges and Practical Approaches with Word Sense Disambiguation of Acronyms and Abbreviations in the Clinical Domain. Healthc. Inform. Res. 2015, 21, 35.
- Koga, S.; Du, W. Challenges of Integrating Chatbot Use in Ophthalmology Diagnostics. JAMA Ophthalmol. 2024, 142, 883.
- Mihalache, A.; Huang, R.S.; Popovic, M.M.; Patil, N.S.; Pandya, B.U.; Shor, R.; Pereira, A.; Kwok, J.M.; Yan, P.; Wong, D.T.; et al. Accuracy of an Artificial Intelligence Chatbot’s Interpretation of Clinical Ophthalmic Images. JAMA Ophthalmol. 2024, 142, 321.
- Chen, J.; Wu, X.; Li, M.; Liu, L.; Zhong, L.; Xiao, J.; Lou, B.; Zhong, X.; Chen, Y.; Huang, W.; et al. EE-Explorer: A Multimodal Artificial Intelligence System for Eye Emergency Triage and Primary Diagnosis. Am. J. Ophthalmol. 2023, 252, 253–264.
- Tomita, K.; Nishida, T.; Kitaguchi, Y.; Kitazawa, K.; Miyake, M. Image Recognition Performance of GPT-4V(ision) and GPT-4o in Ophthalmology: Use of Images in Clinical Questions. Clin. Ophthalmol. 2025, 19, 1557–1564.
- Mikhail, D.; Milad, D.; Antaki, F.; Milad, J.; Farah, A.; Khairy, T.; El-Khoury, J.; Bachour, K.; Szigiato, A.-A.; Nayman, T.; et al. Multimodal Performance of GPT-4 in Complex Ophthalmology Cases. J. Pers. Med. 2025, 15, 160.
- Balaskas, G.; Papadopoulos, H.; Pappa, D.; Loisel, Q.; Chastin, S. A Framework for Domain-Specific Dataset Creation and Adaptation of Large Language Models. Computers 2025, 14, 172.
- Buckley, T.; Diao, J.A.; Rajpurkar, P.; Rodman, A.; Manrai, A.K. Multimodal Foundation Models Exploit Text to Make Medical Image Predictions. 2023.
- Li, K.Z.; Nguyen, T.T.; Moss, H.E. Performance of Vision Language Models for Optic Disc Swelling Identification on Fundus Photographs. Front. Digit. Health 2025, 7, 1660887.
- Wang, M.Y.; Asanad, S.; Asanad, K.; Karanjia, R.; Sadun, A.A. Value of Medical History in Ophthalmology: A Study of Diagnostic Accuracy. J. Curr. Ophthalmol. 2018, 30, 359–364.
- Wu, J.; Wu, X.; Yang, J. Guiding Clinical Reasoning with Large Language Models via Knowledge Seeds. 2024.




| Parameter | Value |
|---|---|
| Model Name | LLaMA 3 8B Instruct |
| Temperature | 0.6 |
| Max Tokens | 2048 |
| Top-p | 0.9 |
| Sampling Method | Nucleus Sampling (Top-p) |
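
To make the temperature and top-p settings in the table above concrete, the sketch below applies temperature scaling followed by nucleus (top-p) filtering to a list of logits in plain Python. It illustrates the general technique only. The function name `nucleus_sample` and the constant names are hypothetical, and the code is not the study's generation pipeline, which would rely on the inference library's own sampler.

```python
import math
import random
from typing import List, Optional

# Values copied from the parameter table; the constant names are hypothetical.
MODEL_NAME = "LLaMA 3 8B Instruct"
TEMPERATURE = 0.6
MAX_TOKENS = 2048
TOP_P = 0.9


def nucleus_sample(
    logits: List[float],
    temperature: float = TEMPERATURE,
    top_p: float = TOP_P,
    rng: Optional[random.Random] = None,
) -> int:
    """Return one token index drawn by temperature + nucleus sampling."""
    rng = rng or random.Random()
    # Temperature below 1 sharpens the distribution before the softmax.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]  # subtract peak for stability
    total = sum(exps)
    probs = [e / total for e in exps]
    # Keep the smallest set of most likely tokens whose mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept: List[int] = []
    cumulative = 0.0
    for index in order:
        kept.append(index)
        cumulative += probs[index]
        if cumulative >= top_p:
            break
    # random.choices renormalises the weights of the kept tokens.
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]
```

With a temperature of 0.6 and top-p of 0.9, low-probability tokens in the tail are never sampled, which trades some diversity for more consistent output.
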

| Reference Diagnosis | Predicted Diagnosis | LCA Term | Distance (Ref → LCA) | Distance (Pred → LCA) | Total Dissimilarity |
|---|---|---|---|---|---|
| Rhegmatogenous Retinal Detachment | Retinal Detachment | Retinal Detachment | 0 | 1 | 1 |
| Non-proliferative Diabetic Retinopathy | Diabetic Retinopathy, Macular Edema | Diabetic Retinopathy | 1 | 1 | 2 |
| Bilateral Papilledema | Benign Intracranial Hypertension | Disorder of Intracranial Pressure | 1 | 1 | 1 |
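
The column layout above suggests a dissimilarity built from the lowest common ancestor (LCA) of two diagnoses in a term hierarchy. The sketch below assumes that the total is the sum of the two distances to the LCA, as the Non-proliferative Diabetic Retinopathy row suggests; the exact formula and the ontology used are not stated here. The `PARENT` mapping is a toy hierarchy invented for illustration, and the function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Toy child -> parent hierarchy (hypothetical, not the study's ontology).
PARENT: Dict[str, Optional[str]] = {
    "Eye Disorder": None,
    "Retinal Disorder": "Eye Disorder",
    "Diabetic Retinopathy": "Retinal Disorder",
    "Non-proliferative Diabetic Retinopathy": "Diabetic Retinopathy",
    "Proliferative Diabetic Retinopathy": "Diabetic Retinopathy",
    "Retinal Detachment": "Retinal Disorder",
    "Rhegmatogenous Retinal Detachment": "Retinal Detachment",
}


def path_to_root(term: str) -> List[str]:
    """Return the term followed by its ancestors, ending at the root."""
    path: List[str] = []
    node: Optional[str] = term
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def lca_dissimilarity(reference: str, predicted: str) -> Tuple[str, int, int, int]:
    """Return (LCA term, ref distance, pred distance, summed dissimilarity)."""
    # Map each ancestor of the prediction to its distance from the prediction.
    pred_distance = {t: d for d, t in enumerate(path_to_root(predicted))}
    # Walk up from the reference; the first shared term is the LCA.
    for ref_distance, term in enumerate(path_to_root(reference)):
        if term in pred_distance:
            pred_dist = pred_distance[term]
            return term, ref_distance, pred_dist, ref_distance + pred_dist
    raise ValueError("terms share no common ancestor")


siblings = lca_dissimilarity(
    "Non-proliferative Diabetic Retinopathy",
    "Proliferative Diabetic Retinopathy",
)
```

Under this assumption, identical diagnoses score 0, sibling terms score 2, and the score grows as the shared ancestor moves closer to the root.
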

| Final Diagnosis | Average Dissimilarity (Image) | Average Dissimilarity (Image + CoT) |
|---|---|---|
| Acute Angle-Closure Glaucoma | 5.01 | 0.00 |
| Allergic Conjunctivitis | 3.02 | 1.70 |
| Arteritic Anterior Ischemic Optic Neuropathy | 10.50 | 5.20 |
| Chronic Angle-Closure Glaucoma | 12.20 | 1.40 |
| Cortical Cataract | 3.40 | 0.00 |
| Fourth Nerve Palsy | 10.00 | 0.00 |
| Sixth Nerve Palsy | 6.00 | 3.00 |
| Surgical Third Nerve Palsy | 3.00 | 1.00 |
| Third Nerve Palsy | 5.00 | 3.40 |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).