Submitted: 06 March 2024
Posted: 07 March 2024


| Model | Prompt |
|---|---|
| ChatGPT-3.5 | [Clinical Question #]* (as proposed from source document) |
| ChatGPT-4 | [Clinical Question #]* (as proposed from source document) |
| ChatGPT-4 | Act as an Italian multidisciplinary oncology group. We ask a question according to the PICO method. Reply extensively based on national and international guidelines and current evidence, indicate the limitations of the evidence, and indicate the ratio of benefits to harms. Also, provide answers with a formal GRADE approach indicating the overall quality of evidence and strength of recommendation. § [Clinical Question #]* |
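
The three conditions differ only in what precedes the clinical question. The following is a minimal Python sketch of that assembly. It assumes that the question text is passed through unchanged for the two unprompted conditions, and that the "§" in the table marks a break between the instruction and the question. The blank-line separator used here is an assumption. `build_prompt`, `CONDITIONS`, and the placeholder question are hypothetical; only the instruction text is copied verbatim from the table. The sketch builds strings and does not call any model API.

```python
# Hypothetical helper mirroring the three prompt conditions in the table.
# PROMPT_PREFIX is copied verbatim from the table; the names and the
# blank-line separator standing in for "§" are assumptions.

PROMPT_PREFIX = (
    "Act as an Italian multidisciplinary oncology group. "
    "We ask a question according to the PICO method. "
    "Reply extensively based on national and international guidelines and "
    "current evidence, indicate the limitations of the evidence, and indicate "
    "the ratio of benefits to harms. Also, provide answers with a formal GRADE "
    "approach indicating the overall quality of evidence and strength of "
    "recommendation."
)

# Condition label -> whether the instruction prefix is prepended.
CONDITIONS = {
    "ChatGPT-3.5": False,
    "ChatGPT-4": False,
    "Prompted ChatGPT-4": True,
}


def build_prompt(clinical_question: str, prompted: bool) -> str:
    """Return the text submitted for one clinical question under one condition."""
    question = clinical_question.strip()
    if not prompted:
        # Unprompted conditions: the question as proposed in the source document.
        return question
    # Prompted condition: fixed role/instruction text, then the question.
    return f"{PROMPT_PREFIX}\n\n{question}"


if __name__ == "__main__":
    # Placeholder text, not one of the study's clinical questions.
    example = "Clinical question in PICO form"
    for label, prompted in CONDITIONS.items():
        print(f"--- {label} ---")
        print(build_prompt(example, prompted))
```
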

| Domain | Question | Mean | 95% CI (±) |
|---|---|---|---|
| Clarity | How do you think the guideline expresses its recommendations? | 4.28 | 0.14 |
| | How does the ChatGPT-3.5 model’s response to the clinical question express its recommendations? | 1.23 | 0.12 |
| | How does the ChatGPT-4 model’s response to the clinical question express its recommendations? | 2.23 | 0.21 |
| | How does the prompted ChatGPT-4 model’s response to the clinical question express its recommendations? | 3.31 | 0.21 |
| Relevance | How relevant is the evidence in the guideline for the recommendations? | 4.35 | 0.15 |
| | How relevant is the evidence presented in the ChatGPT-3.5 model’s response to the clinical question for the recommendations made? | 1.36 | 0.09 |
| | How relevant is the evidence presented in the ChatGPT-4 model’s response to the clinical question for the recommendations made? | 2.25 | 0.24 |
| | How relevant is the evidence presented in the prompted ChatGPT-4 model’s response to the clinical question for the recommendations made? | 3.15 | 0.24 |
| Comprehensiveness | How comprehensive are the guidelines in addressing the topic? | 4.53 | 0.13 |
| | How comprehensive is the ChatGPT-3.5 model’s response to the clinical question in addressing the topic? | 1.11 | 0.06 |
| | How comprehensive is the ChatGPT-4 model’s response to the clinical question in addressing the topic? | 2.13 | 0.22 |
| | How comprehensive is the prompted ChatGPT-4 model’s response to the clinical question in addressing the topic? | 2.95 | 0.23 |
| Applicability | How applicable is the guideline to clinical practice? | 4.28 | 0.14 |
| | How applicable is the ChatGPT-3.5 model’s response to the clinical question to clinical practice? | 1.23 | 0.12 |
| | How applicable is the ChatGPT-4 model’s response to the clinical question to clinical practice? | 2.26 | 0.23 |
| | How applicable is the prompted ChatGPT-4 model’s response to the clinical question to clinical practice? | 2.82 | 0.27 |
| Quality | According to the GRADE approach, how would you rate the strength of the recommendations and the quality of the evidence presented in the guideline? | 2.3 | 0.16 |
| | According to the GRADE approach, how would you rate the recommendations’ strength and the evidence’s quality presented in the ChatGPT-3.5 model’s response? | 1.88 | 0.12 |
| | According to the GRADE approach, how would you rate the recommendations’ strength and the evidence’s quality presented in the ChatGPT-4 model’s response? | 2.49 | 0.26 |
| | According to the GRADE approach, how would you rate the recommendations’ strength and the evidence’s quality presented in the prompted ChatGPT-4 model’s response? | 2.38 | 0.26 |

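
The evaluation table and the pairwise comparison table are linked by simple arithmetic. Averaging each source's five domain means, without weighting, and then subtracting one source's average from another reproduces the Mean Difference column (for example, 3.948 − 1.362 = 2.586 for guidelines versus ChatGPT-3.5). The Python check below makes this explicit. Note that the arithmetic is the basis for inferring that the comparison was run on these per-domain means; the section does not state it. `DOMAIN_MEANS` and `overall_average` are illustrative names.

```python
# Per-domain mean scores copied from the evaluation table, in the order
# clarity, relevance, comprehensiveness, applicability, quality.
DOMAIN_MEANS = {
    "Guidelines": [4.28, 4.35, 4.53, 4.28, 2.3],
    "ChatGPT-3.5": [1.23, 1.36, 1.11, 1.23, 1.88],
    "ChatGPT-4": [2.23, 2.25, 2.13, 2.26, 2.49],
    "Prompted ChatGPT-4": [3.31, 3.15, 2.95, 2.82, 2.38],
}


def overall_average(scores: list[float]) -> float:
    """Unweighted average of one source's five domain means."""
    return sum(scores) / len(scores)


averages = {src: round(overall_average(s), 3) for src, s in DOMAIN_MEANS.items()}
print(averages)
# {'Guidelines': 3.948, 'ChatGPT-3.5': 1.362, 'ChatGPT-4': 2.272, 'Prompted ChatGPT-4': 2.922}

# Difference between two sources (second minus first), as in the comparison table.
print(round(averages["Guidelines"] - averages["ChatGPT-3.5"], 3))  # 2.586
```
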

| Source 1 | Source 2 | Mean Difference (Source 2 − Source 1) | Adjusted p-value | Lower Bound | Upper Bound | Reject Null Hypothesis |
|---|---|---|---|---|---|---|
| ChatGPT-3.5 | ChatGPT-4 | 0.91 | 0.0618 | -0.037 | 1.857 | False |
| ChatGPT-3.5 | Guidelines | 2.586 | 0.001 | 1.639 | 3.533 | True |
| ChatGPT-3.5 | Prompted ChatGPT-4 | 1.56 | 0.0012 | 0.613 | 2.507 | True |
| ChatGPT-4 | Guidelines | 1.676 | 0.001 | 0.729 | 2.623 | True |
| ChatGPT-4 | Prompted ChatGPT-4 | 0.65 | 0.242 | -0.297 | 1.597 | False |
| Guidelines | Prompted ChatGPT-4 | -1.026 | 0.0314 | -1.973 | -0.079 | True |
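
The column layout (pairwise differences, adjusted p-values, interval bounds, reject flag) is consistent with Tukey's honestly significant difference (HSD) test. The sketch below assumes that test was applied to the 20 domain means from the evaluation table, five per source. The section does not state this procedure; the assumption rests on the fact that it reproduces the mean differences exactly and the interval bounds to within about 0.001. `Q_CRIT = 4.05` is the tabulated studentized-range critical value for 4 groups and 16 error degrees of freedom at α = 0.05, where the 0.05 level is itself an assumption. Adjusted p-values require the studentized-range distribution, which the Python standard library lacks; `scipy.stats.tukey_hsd` or statsmodels' `pairwise_tukeyhsd` compute them.

```python
from itertools import combinations
from math import sqrt

# Domain means per source, copied from the evaluation table.
GROUPS = {
    "Guidelines": [4.28, 4.35, 4.53, 4.28, 2.3],
    "ChatGPT-3.5": [1.23, 1.36, 1.11, 1.23, 1.88],
    "ChatGPT-4": [2.23, 2.25, 2.13, 2.26, 2.49],
    "Prompted ChatGPT-4": [3.31, 3.15, 2.95, 2.82, 2.38],
}

# Studentized range critical value q(0.05; k=4, df=16) from standard tables.
Q_CRIT = 4.05


def tukey_hsd(groups: dict[str, list[float]], q_crit: float):
    """Pairwise Tukey-Kramer intervals; returns (df_within, rows)."""
    means = {g: sum(v) / len(v) for g, v in groups.items()}
    n_total = sum(len(v) for v in groups.values())
    df_within = n_total - len(groups)
    # Pooled within-group variance (mean squared error of a one-way ANOVA).
    ss_within = sum((x - means[g]) ** 2 for g, v in groups.items() for x in v)
    mse = ss_within / df_within
    rows = []
    # Sorted names give the same pair order as the comparison table.
    for g1, g2 in combinations(sorted(groups), 2):
        diff = means[g2] - means[g1]
        se = sqrt(mse / 2 * (1 / len(groups[g1]) + 1 / len(groups[g2])))
        half_width = q_crit * se
        rows.append((g1, g2, diff, diff - half_width, diff + half_width,
                     abs(diff) > half_width))
    return df_within, rows


df_within, rows = tukey_hsd(GROUPS, Q_CRIT)
for g1, g2, diff, lo, hi, reject in rows:
    print(f"{g1:>18} vs {g2:<18} diff={diff:6.3f} "
          f"CI=[{lo:6.3f}, {hi:6.3f}] reject={reject}")
```

The sketch uses the Tukey-Kramer form of the standard error, so it stays valid for unequal group sizes. With five values per source, this form reduces to sqrt(MSE/5).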

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).