Submitted: 05 December 2025
Posted: 07 December 2025
Abstract
Keywords:
1. Introduction
2. Materials and Methods
3. Results
3.1. Performance in the 118th JNMLE
3.1.1. Overall Results
3.1.2. Accuracy Comparison
3.2. Performance in the 117th JNMLE
3.2.1. Overall Results
3.2.2. Accuracy Comparison
3.3. Performance in the 116th JNMLE
3.3.1. Overall Results
3.3.2. Accuracy Comparison
3.4. Combined Analysis of All Three Examinations
3.4.1. Integrated Overall Accuracy
3.4.2. Section-Based Performance
3.4.3. Question-Type Performance
3.4.4. Prohibited Choices
4. Discussion
5. Limitations
6. Conclusions
Author Contributions
Funding
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
| JNMLE | Japan National Medical Licensing Examination |



| Examination and Question Section | Question Type | GPT-4o Number of Correct Answers | GPT-4o Number of Questions | GPT-4o Correct Answer Rate | GPT-4 Number of Correct Answers | GPT-4 Number of Questions | GPT-4 Correct Answer Rate | P Value |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 118 All questions | All Questions | 363 | 400 | 90.8% | 333 | 400 | 83.3% | 0.0016 |
| 118 All questions | Picture-based Questions | 85 | 101 | 84.2% | 76 | 101 | 75.2% | 0.1153 |
| 118 All questions | Text-based Questions | 278 | 299 | 93.0% | 257 | 299 | 86.0% | 0.0050 |
| 118 Essential Section | All Questions | 96 | 100 | 96.0% | 89 | 100 | 89.0% | 0.0602 |
| 118 Essential Section | Picture-based Questions | 8 | 10 | 80.0% | 6 | 10 | 60.0% | 0.3291 |
| 118 Essential Section | Text-based Questions | 88 | 90 | 97.8% | 83 | 90 | 92.2% | 0.0005 |
| 118 Non-essential Sections | All Questions | 267 | 300 | 89.0% | 244 | 300 | 81.3% | 0.0084 |
| 118 Non-essential Sections | Picture-based Questions | 77 | 91 | 84.6% | 70 | 91 | 76.9% | 0.1879 |
| 118 Non-essential Sections | Text-based Questions | 190 | 209 | 90.9% | 174 | 209 | 83.3% | 0.0192 |
| 118 General Section | All Questions | 133 | 150 | 88.7% | 123 | 150 | 82.0% | 0.1026 |
| 118 General Section | Picture-based Questions | 5 | 10 | 50.0% | 7 | 10 | 70.0% | 0.3613 |
| 118 General Section | Text-based Questions | 128 | 140 | 91.4% | 116 | 140 | 82.9% | 0.0321 |
| 118 Clinical Section | All Questions | 230 | 250 | 92.0% | 210 | 250 | 84.0% | 0.0059 |
| 118 Clinical Section | Picture-based Questions | 80 | 91 | 87.9% | 69 | 91 | 75.8% | 0.0343 |
| 118 Clinical Section | Text-based Questions | 150 | 159 | 94.3% | 141 | 159 | 88.7% | 0.0701 |
| 117 All questions | All Questions | 350 | 393 | 89.1% | 312 | 393 | 79.4% | 0.0016 |
| 117 All questions | Picture-based Questions | 107 | 127 | 84.3% | 83 | 127 | 65.4% | 0.0005 |
| 117 All questions | Text-based Questions | 243 | 266 | 91.4% | 229 | 266 | 86.1% | 0.0550 |
| 117 Essential Section | All Questions | 94 | 100 | 94.0% | 81 | 100 | 81.0% | 0.0054 |
| 117 Essential Section | Picture-based Questions | 13 | 16 | 81.3% | 10 | 16 | 62.5% | 0.2381 |
| 117 Essential Section | Text-based Questions | 81 | 84 | 96.4% | 71 | 84 | 84.5% | 0.0085 |
| 117 Non-essential Sections | All Questions | 256 | 293 | 87.4% | 231 | 293 | 78.8% | 0.0058 |
| 117 Non-essential Sections | Picture-based Questions | 94 | 111 | 84.7% | 73 | 111 | 65.8% | 0.0010 |
| 117 Non-essential Sections | Text-based Questions | 162 | 182 | 89.0% | 158 | 182 | 86.8% | 0.5201 |
| 117 General Section | All Questions | 128 | 148 | 86.6% | 119 | 148 | 80.4% | 0.0042 |
| 117 General Section | Picture-based Questions | 9 | 14 | 64.3% | 7 | 14 | 50.0% | 0.4450 |
| 117 General Section | Text-based Questions | 117 | 132 | 88.6% | 110 | 132 | 83.3% | 0.2145 |
| 117 Clinical Section | All Questions | 222 | 246 | 90.2% | 193 | 246 | 78.5% | 0.0003 |
| 117 Clinical Section | Picture-based Questions | 96 | 111 | 86.5% | 74 | 111 | 66.7% | 0.0004 |
| 117 Clinical Section | Text-based Questions | 126 | 134 | 94.0% | 119 | 134 | 88.8% | 0.1268 |
| 116 All questions | All Questions | 366 | 395 | 92.7% | 330 | 394 | 83.8% | 0.0016 |
| 116 All questions | Picture-based Questions | 84 | 94 | 89.4% | 71 | 94 | 75.5% | 0.0126 |
| 116 All questions | Text-based Questions | 282 | 301 | 93.7% | 259 | 300 | 86.3% | 0.0027 |
| 116 Essential Section | All Questions | 94 | 98 | 95.9% | 83 | 97 | 85.6% | 0.0130 |
| 116 Essential Section | Picture-based Questions | 12 | 13 | 92.3% | 11 | 13 | 84.6% | 0.5393 |
| 116 Essential Section | Text-based Questions | 82 | 85 | 96.5% | 72 | 84 | 85.7% | 0.0145 |
| 116 Non-essential Sections | All Questions | 272 | 297 | 91.6% | 247 | 297 | 83.2% | 0.0020 |
| 116 Non-essential Sections | Picture-based Questions | 72 | 81 | 88.9% | 60 | 81 | 74.1% | 0.0152 |
| 116 Non-essential Sections | Text-based Questions | 200 | 216 | 92.6% | 187 | 216 | 86.6% | 0.0406 |
| 116 General Section | All Questions | 139 | 147 | 94.6% | 123 | 146 | 84.2% | 0.0042 |
| 116 General Section | Picture-based Questions | 4 | 7 | 57.1% | 3 | 7 | 42.9% | 0.5929 |
| 116 General Section | Text-based Questions | 135 | 140 | 96.4% | 120 | 139 | 86.3% | 0.0027 |
| 116 Clinical Section | All Questions | 227 | 248 | 91.5% | 207 | 248 | 83.5% | 0.0066 |
| 116 Clinical Section | Picture-based Questions | 80 | 87 | 92.0% | 68 | 87 | 78.2% | 0.0107 |
| 116 Clinical Section | Text-based Questions | 147 | 161 | 91.3% | 139 | 161 | 86.3% | 0.1571 |
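
The rate columns above are the number of correct answers divided by the number of questions, shown to one decimal place. The table does not state which test produced the P values, so the following is only a sketch under stated assumptions: it recomputes a rate with half-up rounding (which matches entries such as 333/400 = 83.25% shown as 83.3%) and runs a two-sided Fisher's exact test that treats the two models' answers as independent samples. Because both models answered the same questions, a paired test such as McNemar's may suit the design better, but it needs per-question agreement data that the table does not report. The helper names `correct_rate` and `fisher_exact_two_sided` are hypothetical, and their output need not reproduce the reported P values.

```python
from decimal import Decimal, ROUND_HALF_UP
from math import comb


def correct_rate(correct: int, total: int) -> str:
    """Correct-answer rate as a percentage, rounded half up to one decimal.

    Half-up rounding matches the table, e.g. 333/400 = 83.25% -> 83.3%.
    """
    pct = Decimal(100 * correct) / Decimal(total)
    return f"{pct.quantize(Decimal('0.1'), rounding=ROUND_HALF_UP)}%"


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are the two models; columns are correct and incorrect answers.
    Treating the two rows as independent samples is an assumption.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x correct answers in row 1,
        # with all row and column totals held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of every table no more likely than the observed
    # one; the small relative tolerance guards against floating-point ties.
    return min(1.0, sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7)))
```

For the 118th JNMLE overall row, the call would be `fisher_exact_two_sided(363, 400 - 363, 333, 400 - 333)`, with GPT-4o in the first row of the 2x2 table.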

| Examination | GPT-4o Number of Prohibited Answers | GPT-4o Number of Questions | GPT-4o Prohibited Answer Rate | GPT-4 Number of Prohibited Answers | GPT-4 Number of Questions | GPT-4 Prohibited Answer Rate |
| --- | --- | --- | --- | --- | --- | --- |
| 118th JNMLE | 0 | 9 | 0.0% | 1 | 9 | 11.1% |
| 117th JNMLE | 0 | 11 | 0.0% | 2 | 11 | 18.2% |
| 116th JNMLE | 0 | 9 | 0.0% | 2 | 9 | 22.2% |
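
Each prohibited-answer rate is the number of prohibited choices a model selected divided by the number of questions containing a prohibited choice in that examination (for example, 2/9 = 22.2% for GPT-4 on the 116th JNMLE). A minimal sketch of that arithmetic, assuming the counts transcribed below and a hypothetical `prohibited_rates` helper that rounds half up to one decimal place:

```python
from decimal import Decimal, ROUND_HALF_UP

# Counts transcribed from the prohibited-choice table:
# examination -> (GPT-4o prohibited, GPT-4 prohibited, questions with a prohibited choice)
COUNTS = {
    "118th JNMLE": (0, 1, 9),
    "117th JNMLE": (0, 2, 11),
    "116th JNMLE": (0, 2, 9),
}


def rate(count: int, total: int) -> str:
    """Percentage rounded half up to one decimal place."""
    pct = Decimal(100 * count) / Decimal(total)
    return f"{pct.quantize(Decimal('0.1'), rounding=ROUND_HALF_UP)}%"


def prohibited_rates(counts: dict[str, tuple[int, int, int]]) -> dict[str, tuple[str, str]]:
    """Map each examination to (GPT-4o rate, GPT-4 rate)."""
    return {exam: (rate(g4o, n), rate(g4, n)) for exam, (g4o, g4, n) in counts.items()}


for exam, (gpt4o, gpt4) in prohibited_rates(COUNTS).items():
    print(f"{exam}: GPT-4o {gpt4o}, GPT-4 {gpt4}")
```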

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
