Preprint Article, Version 1 (preserved in Portico). This version is not peer-reviewed.

CLEAR: Pilot Testing of a Tool to Standardize Assessment of the Quality of Health Information Generated by Artificial Intelligence-Based Models

Version 1 : Received: 17 November 2023 / Approved: 17 November 2023 / Online: 20 November 2023 (07:24:54 CET)

A peer-reviewed article of this Preprint also exists.

Sallam M, Barakat M, Sallam M (November 24, 2023) Pilot Testing of a Tool to Standardize the Assessment of the Quality of Health Information Generated by Artificial Intelligence-Based Models. Cureus 15(11): e49373. doi:10.7759/cureus.49373

Abstract

Artificial intelligence (AI)-based conversational models, such as ChatGPT, Microsoft Bing, and Google Bard, have emerged as valuable sources of health information for lay individuals. However, the accuracy of the information provided by these AI models remains a significant concern. This pilot study aimed to test a new tool, referred to as “CLEAR”, designed to assess the quality of health information delivered by AI-based models. Tool development involved a literature review on health information quality, followed by the initial drafting of the CLEAR tool, comprising five items assessing the following: completeness of content in response to the prompt, lack of false information, evidence support, appropriateness, and relevance of the generated content. Each item was scored on a 5-point Likert scale from excellent to poor. Content validity was checked through expert review of the initial items. Pilot testing involved 32 healthcare professionals who used the CLEAR tool to assess content on eight different health topics deliberately designed with varying quality. Internal consistency was checked using Cronbach α. Feedback from the pilot test resulted in language modifications to improve the clarity of the items. The final CLEAR tool was then used to assess the quality of health information generated by four different AI-based models on five common health topics. The AI models were ChatGPT-3.5, ChatGPT-4, Bing, and Bard, and the generated content was scored by two independent raters, with Cohen κ used to assess inter-rater agreement. The final five CLEAR items were: (1) Is the content sufficient? (2) Is the content accurate? (3) Is the content evidence-based? (4) Is the content clear, concise, and easy to understand? and (5) Is the content free from irrelevant information? Pilot testing on the eight health topics revealed acceptable internal consistency, with a Cronbach α range of 0.669–0.981.
Use of the final CLEAR tool yielded the following average scores: Bing (mean=24.4±0.42), ChatGPT-4 (mean=23.6±0.96), Bard (mean=21.2±1.79), and ChatGPT-3.5 (mean=20.6±5.20). Inter-rater agreement yielded the following Cohen κ values: ChatGPT-3.5 (κ=0.875, P<.001), ChatGPT-4 (κ=0.780, P<.001), Bing (κ=0.348, P=.037), and Bard (κ=0.749, P<.001). The CLEAR tool is a brief yet helpful instrument that can help standardize testing of the quality of health information generated by AI-based conversational models. Future studies are recommended to validate the utility of the CLEAR tool for assessing the quality of AI-generated health-related content using a larger sample across various complex health topics.
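As a minimal illustrative sketch (not the authors' published code), the scoring scheme described in the abstract — five items rated 1–5 on a Likert scale and summed to a 25-point total, with internal consistency checked via Cronbach α — could be implemented as follows. The function names and data layout are assumptions for illustration only:

```python
def clear_total(ratings):
    """Sum five CLEAR item ratings (each 1 = poor ... 5 = excellent)
    into a total on the 5-25 scale implied by the reported means."""
    if len(ratings) != 5 or not all(1 <= r <= 5 for r in ratings):
        raise ValueError("expected five Likert ratings in the range 1-5")
    return sum(ratings)


def cronbach_alpha(scores):
    """Cronbach's alpha for a table of scores:
    rows = respondents, columns = the k scale items."""
    k = len(scores[0])
    def variance(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)
    # per-item variances across respondents
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    # variance of each respondent's total score
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)
```

For example, `clear_total([5, 4, 3, 4, 5])` returns 21, and perfectly consistent respondents (each rating all five items identically) yield an α of 1.0.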

Keywords

Health Information Quality; AI-generated Health Information; AI in Healthcare; Health Information Reliability; Assessment Tool Feasibility

Subject

Public Health and Healthcare, Other
