Preprint Article Version 1 Preserved in Portico This version is not peer-reviewed

How Does ChatGPT Perform on the Italian National Residency Program Admission Test?

Version 1 : Received: 27 March 2024 / Approved: 27 March 2024 / Online: 27 March 2024 (13:20:49 CET)

How to cite: Barone, G.; Confalonieri, F.; Gaeta, A.; Ferraro, V.; Vinciguerra, P.; Di Maria, A. How Does ChatGPT Perform on the Italian National Residency Program Admission Test?. Preprints 2024, 2024031684. https://doi.org/10.20944/preprints202403.1684.v1

Abstract

Background: OpenAI developed ChatGPT, a language model based on the GPT architecture and designed for text-based communication. Trained on diverse internet texts, ChatGPT uses machine learning to interpret the user's input in context and to generate coherent, contextually relevant responses. Users interact with it through messaging platforms.

Materials and Methods: In November 2023, we used ChatGPT 3.5, the default version at that time, to answer questions from the Italian National Residency Program Admission Tests (SSMs) of 2021, 2022, and 2023. The questions cover clinical, diagnostic, analytical, therapeutic, and epidemiological scenarios, and some are accompanied by images. We compared ChatGPT's answers with the official answer keys published on the Italian Ministry of University and Research (MUR) website. Scoring followed the SSM test system: 1 point for each correct answer, 0 points for each unanswered question, and -0.25 points for each incorrect answer.

Results: ChatGPT was tested on a total of 420 questions, 140 for each test. It achieved an overall accuracy of 80.48%, with 338 correct answers and 82 incorrect answers. On questions containing both text and images, it answered 55% correctly and 45% incorrectly. Performance varied by year: accuracy was 82.14% in each of 2021 and 2022 (115 correct out of 140) and 77.14% in 2023 (108 correct out of 140). Applying this scoring method to the SSM test, ChatGPT would have scored 105 points in 2021 and 2022, and 100 points in 2023.

Conclusions: ChatGPT exhibited above-average performance on the last three SSM tests, highlighting its robust capability to interpret clinical scenarios and offer precise diagnostic and therapeutic guidance. Despite this, some limitations persist, notably the software's inability to interpret non-textual information.
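
The scoring rule stated in Materials and Methods (1 point per correct answer, 0 per unanswered question, -0.25 per incorrect answer) can be written out as a short calculation. The following is a minimal sketch and not the authors' code: the names `ExamResult`, `accuracy`, and `ssm_score` are hypothetical, and the example assumes, as the abstract's counts imply, that no question was left unanswered. It uses the 2023 counts and the overall counts reported above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExamResult:
    """Answer counts for one test sitting (hypothetical helper type)."""
    correct: int
    incorrect: int
    unanswered: int = 0  # assumption: the abstract reports no unanswered questions

    @property
    def total(self) -> int:
        return self.correct + self.incorrect + self.unanswered


def accuracy(result: ExamResult) -> float:
    """Fraction of all questions answered correctly."""
    return result.correct / result.total


def ssm_score(result: ExamResult,
              correct_pts: float = 1.0,
              incorrect_pts: float = -0.25,
              unanswered_pts: float = 0.0) -> float:
    """Score under the rule described in the abstract: +1, -0.25, 0."""
    return (result.correct * correct_pts
            + result.incorrect * incorrect_pts
            + result.unanswered * unanswered_pts)


# 2023 test: 108 correct out of 140, so 32 incorrect.
result_2023 = ExamResult(correct=108, incorrect=32)
print(f"2023 accuracy: {accuracy(result_2023):.2%}")   # 77.14%
print(f"2023 SSM score: {ssm_score(result_2023)}")     # 100.0

# All three tests combined: 338 correct and 82 incorrect out of 420.
overall = ExamResult(correct=338, incorrect=82)
print(f"Overall accuracy: {accuracy(overall):.2%}")    # 80.48%
```

Keeping the point values as parameters makes it straightforward to apply the same calculation to an exam with a different penalty for wrong answers.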

Keywords

ChatGPT; Chat Generative Pre-trained Transformer; GPT-3.5; GPT-4; artificial intelligence (AI); chatbot; natural language processing (NLP); medical education; Italian National Residency Program Admission Test; Scuole di Specializzazione in Medicina (SSM)

Subject

Computer Science and Mathematics, Artificial Intelligence and Machine Learning
