Preprint Article, Version 1 (preserved in Portico; this version is not peer-reviewed)

Consistency and Quality of ChatGPT Responses Compared to Clinical Guidelines for Ovarian Cancer: A Delphi Approach.

Version 1 : Received: 6 March 2024 / Approved: 7 March 2024 / Online: 7 March 2024 (09:48:59 CET)

How to cite: Piazza, D.; Martorana, F.; Curaba, A.; Sambataro, D.; Valerio, M.R.; Firenze, A.; Pecorino, B.; Scollo, P.; Chiantera, V.; Scibilia, G.; Vigneri, P.; Gebbia, V.; Scandurra, G. Consistency and Quality of ChatGPT Responses Compared to Clinical Guidelines for Ovarian Cancer: A Delphi Approach. Preprints 2024, 2024030385. https://doi.org/10.20944/preprints202403.0385.v1

Abstract

Introduction: In recent years, generative Artificial Intelligence models, such as ChatGPT, have been increasingly utilized in healthcare. Although these models offer quick access to sources and can rapidly formulate responses to clinical questions, their output still requires validation against established clinical guidelines. This study compares the AI model's responses to eight clinical questions with the Italian Association of Medical Oncology (AIOM) guidelines for ovarian cancer. Materials and Methods: The authors used the Delphi method to evaluate responses from ChatGPT and the AIOM guidelines. An expert panel of healthcare professionals rated the responses for clarity, consistency, comprehensiveness, usability, and quality on a 5-point Likert scale. The GRADE methodology was used to assess the quality of the evidence and the strength of the recommendations. Results: A survey involving 14 physicians showed that the AIOM guidelines consistently received higher average scores than the AI models, with statistically significant differences. Post-hoc tests showed that the AIOM guidelines differed significantly from all AI models, whereas no significant differences emerged among the AI models themselves. Conclusions: While AI models can provide rapid responses, they do not yet match established clinical guidelines in terms of clarity, consistency, comprehensiveness, usability, and quality. These findings underscore the importance of relying on expert-developed guidelines in clinical decision-making and highlight potential areas for AI model improvement.
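The abstract does not specify which statistical tests were applied beyond a significant omnibus difference followed by post-hoc comparisons. As a purely illustrative sketch, the snippet below shows one common way ordinal 5-point Likert ratings could be compared across response sources: a Kruskal-Wallis omnibus test followed by Bonferroni-corrected pairwise Mann-Whitney U tests. The source labels and rating values are hypothetical placeholders, not the study's data, and the tests shown are assumptions rather than the authors' stated methodology.

# Illustrative sketch only: hypothetical Likert ratings for guideline vs. AI responses.
from itertools import combinations
from scipy import stats

ratings = {
    "AIOM guidelines": [5, 4, 5, 4, 5, 4, 5, 5],   # placeholder values
    "AI model A":      [3, 4, 3, 3, 4, 3, 3, 4],   # placeholder values
    "AI model B":      [4, 3, 4, 3, 4, 4, 3, 4],   # placeholder values
}

# Omnibus test across all sources (non-parametric, suited to ordinal Likert data).
h_stat, p_omnibus = stats.kruskal(*ratings.values())
print(f"Kruskal-Wallis: H={h_stat:.2f}, p={p_omnibus:.4f}")

# Pairwise post-hoc comparisons with a Bonferroni-adjusted significance threshold.
pairs = list(combinations(ratings, 2))
alpha = 0.05 / len(pairs)
for a, b in pairs:
    u_stat, p = stats.mannwhitneyu(ratings[a], ratings[b], alternative="two-sided")
    print(f"{a} vs {b}: U={u_stat:.1f}, p={p:.4f}, significant={p < alpha}")

Under this setup, a significant omnibus result together with significant guideline-vs-model pairs but non-significant model-vs-model pairs would mirror the pattern reported in the Results.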

Keywords

artificial intelligence; ChatGPT; ovarian carcinoma; guidelines

Subject

Medicine and Pharmacology, Oncology and Oncogenics
