Submitted:
25 June 2025
Posted:
27 June 2025
Abstract
Keywords:
1. Introduction
- How do conversational explanations impact HCPs’ understanding of the AI’s decision-making?
- How does the inclusion of scientific evidence in explaining AI’s decision-making calibrate the trust of HCPs in the AI’s decisions?
- How useful do HCPs find our integrated visual and conversational explanations in evaluating a patient’s risk of diabetes and in providing recommendations to improve patient conditions?
- As an artifact contribution, we present a decision support system with a novel design that integrates interactive visualizations with conversational AI for diabetes risk assessment, enabling healthcare professionals to efficiently explore AI assessments while maintaining structured analysis capabilities. The code is open-sourced on GitHub.
- As theoretical contributions, we introduce: (1) an approach for grounding AI explanations in scientific evidence, enabling HCPs to validate AI findings against established medical knowledge and bridging the gap between AI systems and evidence-based clinical practice; (2) a hybrid approach for handling user prompts that combines specialized models for analytical functions with general LLMs for broader queries, ensuring support for healthcare professionals’ diverse information needs; and (3) a feature range analysis technique that identifies "AI-observed ranges" of values most influential in predictions, enabling systematic examination of how AI systems utilize different value ranges and providing deeper insight into feature contributions in decision-making.
- As an empirical contribution, we present findings from a mixed-methods study with 30 healthcare professionals and multiple co-design sessions, revealing that healthcare professionals build appropriate trust incrementally through evidence-grounded explanations rather than accepting AI recommendations at face value, and integrated visual and conversational interfaces enhance clinical decision-making by supporting both analytical reasoning and contextual understanding.
2. Related Work and Background
2.1. Clinical Decision Support Systems with Explainable AI
2.2. Conversational Explanations and Interfaces
2.3. Evidence-based Medicine and AI Integration
2.4. Clinical Decision Support Systems for Type 2 Diabetes
2.5. Technical Background
2.5.1. Natural Language Processing for Model Explanations
2.5.2. Feature Importance Analysis Methods
3. Technical Implementation
3.1. System Architecture and Components


3.1.1. Chatbot Implementation
Hybrid Query Processing Architecture
Model Selection Rationale
Contextual Grounding Mechanism
Scientific Evidence Integration
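The hybrid query-processing idea above — specialized models for analytical functions, a general LLM for everything else — can be sketched as a simple dispatcher. This is an illustrative assumption, not the system's actual code: the keyword heuristic and function names are ours.

```python
# Hypothetical sketch of hybrid query routing: analytical prompts go to a
# specialized explainer, all other prompts to a general LLM. The keyword
# heuristic and function names are illustrative assumptions.

ANALYTICAL_KEYWORDS = ("feature", "importance", "range", "shap", "counterfactual")

def is_analytical(prompt: str) -> bool:
    """Crude intent check: does the prompt mention an analytical concept?"""
    p = prompt.lower()
    return any(keyword in p for keyword in ANALYTICAL_KEYWORDS)

def route(prompt: str, run_explainer, ask_llm):
    """Dispatch to the specialized analytical model or the general LLM."""
    return run_explainer(prompt) if is_analytical(prompt) else ask_llm(prompt)
```

In practice such routing is often done with an intent classifier rather than keywords; the point here is only the two-way split between analytical and open-ended queries.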
3.1.2. Feature Importance Visualization
3.1.3. Feature Range Analysis
- Filter all samples in the dataset predicted in the same class
- Identify the 25th and 75th percentiles of values for each important factor
- Define the range between these percentiles as the source of the factor’s influence on predictions
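The three steps above amount to an interquartile-range computation over same-class predictions. A minimal sketch, assuming feature data as a NumPy array and a vector of predicted labels:

```python
import numpy as np

def ai_observed_range(X, y_pred, feature_idx, target_class):
    """Compute the 'AI-observed range' of one feature:
    1. filter samples the model predicted in the same class,
    2. take the 25th and 75th percentiles of the feature's values,
    3. return that interval as the source of the feature's influence."""
    values = X[y_pred == target_class, feature_idx]
    q25, q75 = np.percentile(values, [25, 75])
    return q25, q75
```

This AI-observed interval can then be drawn alongside the scientific reference range so HCPs can inspect how well the two overlap.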
3.1.4. Patient Record Visualization
- Distribution Context: Building on established visualization patterns in healthcare interfaces [33], each metric is shown against a background distribution graph rendered in grey, providing immediate context for how individual values compare to the broader patient population. This design choice helps healthcare professionals quickly identify unusual or concerning values.
- Range Markers: We incorporate minimum and maximum value indicators from the dataset to establish clear boundaries for normal ranges.
- Clinical Thresholds: We implemented a dual-threshold system for critical health indicators, using color-coding and visual markers to highlight warning and critical levels. This design ensures immediate visibility of concerning values while maintaining a clear distinction between different risk levels.
- BMI: Overweight and obesity status
- Blood Pressure: Pre-hypertension and hypertension
- Glucose: Pre-diabetes and diabetes
- Insulin: Potential and actual insulin resistance
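The dual-threshold logic can be sketched as a lookup against warning and critical cutoffs. The cutoffs below are widely published clinical values, not necessarily the exact thresholds the system uses; insulin resistance is omitted because its cutoffs are less standardized.

```python
# Illustrative dual-threshold classification for the listed indicators.
# Cutoffs are widely published clinical values (assumed, not the system's
# exact configuration): warning threshold first, critical threshold second.
THRESHOLDS = {
    "bmi":            (25.0, 30.0),   # overweight / obese (kg/m^2)
    "blood_pressure": (120.0, 140.0), # pre-hypertension / hypertension (systolic, mmHg)
    "glucose":        (100.0, 126.0), # pre-diabetes / diabetes (fasting, mg/dL)
}

def risk_level(metric: str, value: float) -> str:
    """Map a value to 'normal', 'warning', or 'critical' for color-coding."""
    warn, crit = THRESHOLDS[metric]
    if value >= crit:
        return "critical"
    if value >= warn:
        return "warning"
    return "normal"
```

Each level then maps to a color and visual marker in the patient record visualization.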
3.1.5. Recommendation System
- Counterfactual analysis to identify mathematically valid changes
- Clinical guidelines to ensure medical appropriateness
- Sequential timeline analysis to support gradual implementation
- Natural language generation for actionable guidance
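The four steps above can be sketched as a small pipeline. Everything here is an assumption for illustration — the greedy counterfactual search, the guideline bounds, and the phrasing are ours, not the system's implementation:

```python
# Illustrative sketch of the four-step recommendation pipeline; all names
# and the greedy counterfactual search are assumptions, not the paper's code.

def counterfactual_changes(patient, predict_risk, steps):
    """Step 1: find single-feature decreases that lower predicted risk."""
    base, changes = predict_risk(patient), {}
    for feature, delta in steps.items():
        trial = dict(patient)
        trial[feature] -= delta
        if predict_risk(trial) < base:
            changes[feature] = trial[feature]
    return changes

def clinically_valid(changes, guideline_bounds):
    """Step 2: keep only targets inside guideline-approved bounds."""
    return {f: v for f, v in changes.items()
            if guideline_bounds[f][0] <= v <= guideline_bounds[f][1]}

def phased_plan(patient, targets, n_phases=3):
    """Step 3: split each change into gradual milestones for a timeline."""
    return [{f: patient[f] + (phase / n_phases) * (t - patient[f])
             for f, t in targets.items()}
            for phase in range(1, n_phases + 1)]

def to_text(targets):
    """Step 4: a very simple natural-language rendering of the targets."""
    return "; ".join(f"bring {f} down to about {v:.0f}" for f, v in targets.items())
```

In the actual system, step 4 is handled by an LLM rather than templates; the sketch only shows how the four stages chain together.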
3.2. Model and Dataset
3.3. User-Centered Design Process
4. User Study
4.1. Study Setup
4.2. Study Procedure
4.3. Evaluation Measures
4.4. Thematic Analysis
5. Findings
5.1. RQ1: Impact of Conversational Explanations on Understandability
5.1.1. Interactions with the Chatbot Increase Interpretability of the Explanation Methods (D) [33]
“[The most helpful aspect of conversing with the AI was the] ability to interact with it to gain further insights and clarification on the data provided. Being able to ask follow-up questions on things like how certain factors influence risk or why certain factors were included helped me better understand how the AI was processing the data.”
Textual explanations clarified visuals, but after a while, a chatbot may become unnecessary (D-I)
“It was very helpful [to ask about feature ranges] because I didn’t exactly understand what the difference was between the AI-observed range and the scientific range. Now I understand the key differences between both.”
5.1.2. HCPs Value Textual Information That Is Easy to Understand and Quick to Read (D-I) [12]
“[The conversation with the AI] was quicker than when I Google something and gives me the facts without me needing to read through pages and pages of information to find what I want.”
“The AI was able to give a summary of the data in a simple to read and understand format whereas I would have to read through the data and come to a conclusion myself. The AI doing it for me helped in this process.” (P8)
5.2. RQ2: Impact of Scientific Evidence on Trust Calibration
5.2.1. Evidence-Based Explanation Builds Trust in AI (D) [12,33]
“This [evidence] is absolutely necessary when a clinician is interacting with an AI system. When evidence and reference is provided, not only it means that the AI decision is evidence based but also it shows with good confidence that it is not generated by the AI system in error.”
This is reflected in the quantitative results (Figure 4) for Task 2B (Range Analysis), which showed improved Trust ratings compared to Task 1.

“If the difference [between AI observed and scientific ranges] was high then I would be more skeptical of the AI as we know that scientific data is robust and has been analyzed thoroughly however with the overlap shown it enhances my confidence as it shows that the AI is aligned with previous scientific research.” (P22)
5.2.2. AI Cannot Fully Replace a Clinician and Therefore Cannot be Trusted Fully (D) [12]
“It [scientific evidence] slightly increased trust, but just because a paper is referenced does not mean that it is scientifically robust or clinically significant, I would want to review the source myself to see if it was trustworthy.” (P2)
5.2.3. Uncertainty About Data Sources Leads to Distrust of AI Output (D) [12]
“I have a lot of confidence in the scientific ranges and I don’t know enough about the datasets the AI is using to know how accurate its assessment is.”
5.2.4. Lack of Understanding of Feature Range Visualization Causes Mistrust in AI (D) [33]
5.3. RQ3: Usefulness of Integrated Visual and Conversational Explanations
5.3.1. HCPs Used Both Visual and Conversational Components to Analyze Risk and Get Recommendations (D) [33]
5.3.2. The Dashboard Combines Patient Data for Faster Risk Analysis, but the Information Remains Insufficient (D-I) [33]
“It gives me some key metrics to perform an initial risk status. Clear. I don’t need to scrawl through the patient record to find what I need.”
Visualizations Facilitate Quick Understanding (D) [33]
5.3.3. Recommendations Are Too Generic and Need to be Personalized Based on Comprehensive Patient Background (D) [33]
6. Discussion
6.1. The Role of Conversational Explanations in Understanding
6.2. Evidence-Based Explanations and Trust Calibration
6.3. Complementary Nature of Visual and Conversational Modalities
6.4. Limitations
6.5. Implications for Future Research
Adaptive Conversational Interfaces Based on User Expertise
Transparency in Data Sources and Model Development
Personalized, Context-Aware Recommendations
7. Conclusion
Acknowledgments
References
- Reddy, S.; Rogers, W.; Makinen, V.P.; Coiera, E.; Brown, P.; Wenzel, M.; Weicken, E.; Ansari, S.; Mathur, P.; Casey, A.; et al. Evaluation framework to guide implementation of AI systems into healthcare settings. BMJ Health Care Inform 2021, 28. [Google Scholar] [CrossRef]
- Secinaro, S.; Calandra, D.; Secinaro, A.; Muthurangu, V.; Biancone, P. The role of artificial intelligence in healthcare: a structured literature review. BMC Medical Informatics and Decision Making 2021, 21. [Google Scholar] [CrossRef]
- Alowais, S.A.; Alghamdi, S.S.; Alsuhebany, N.; Alqahtani, T.; Alshaya, A.I. Revolutionizing healthcare: the role of artificial intelligence in clinical practice. BMC Medical Education 2023, 23. [Google Scholar] [CrossRef] [PubMed]
- Shinners, L.; Aggar, C.; Grace, S.; Smith, S. Exploring healthcare professionals’ understanding and experiences of artificial intelligence technology use in the delivery of healthcare: An integrative review. Health Informatics Journal 2019, 26, 1225–1236. [Google Scholar] [CrossRef]
- Asan, O.; Bayrak, E.; Choudhury, A. Artificial Intelligence and Human Trust in Healthcare: Focus on Clinicians. Journal of Medical Internet Research 2020, 22, e15154. [Google Scholar] [CrossRef]
- Rony, M. “I wonder if my years of training and expertise will be devalued by machines”: concerns about the replacement of medical professionals by artificial intelligence. Sage Open Nursing 2024, 10. [Google Scholar] [CrossRef] [PubMed]
- Kerstan, S.; Bienefeld, N.; Grote, G. Choosing human over AI doctors? How comparative trust associations and knowledge relate to risk and benefit perceptions of AI in healthcare. Risk Analysis 2023, 44, 939–957. [Google Scholar] [CrossRef]
- Sutton, R.T.; Pincock, D.; Baumgart, D.C.; Sadowski, D.C.; Fedorak, R.N.; Kroeker, K.I. An overview of clinical decision support systems: benefits, risks, and strategies for success. NPJ digital medicine 2020, 3, 17. [Google Scholar] [CrossRef] [PubMed]
- Foraker, R.E.; Kite, B.; Kelley, M.M.; Lai, A.M.; Roth, C.; Lopetegui, M.A.; Shoben, A.B.; Langan, M.; Rutledge, N.L.; Payne, P.R. EHR-based visualization tool: adoption rates, satisfaction, and patient outcomes. eGEMs 2015, 3, 1159. [Google Scholar] [CrossRef]
- Hassan, M.; Kushniruk, A.; Borycki, E. Barriers to and facilitators of artificial intelligence adoption in health care: scoping review. JMIR Human Factors 2024, 11, e48633. [Google Scholar] [CrossRef]
- Loh, H.W.; Ooi, C.P.; Seoni, S.; Barua, P.D.; Molinari, F.; Acharya, U.R. Application of explainable artificial intelligence for healthcare: A systematic review of the last decade (2011–2022). Computer Methods and Programs in Biomedicine 2022, 226, 107161. [Google Scholar] [CrossRef]
- Rajashekar, N.C.; Shin, Y.E.; Pu, Y.; Chung, S.; You, K.; Giuffre, M.; Chan, C.E.; Saarinen, T.; Hsiao, A.; Sekhon, J.; et al. Human-algorithmic interaction using a large language model-augmented artificial intelligence clinical decision support system. In Proceedings of the Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, 2024; pp. 1–20. [Google Scholar]
- Cheng, H.F.; Wang, R.; Zhang, Z.; O’connell, F.; Gray, T.; Harper, F.M.; Zhu, H. Explaining decision-making algorithms through UI: Strategies to help non-expert stakeholders. In Proceedings of the Proceedings of the 2019 chi conference on human factors in computing systems, 2019; pp. 1–12. [Google Scholar]
- Ooge, J.; Stiglic, G.; Verbert, K. Explaining artificial intelligence with visual analytics in healthcare. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 2022, 12, e1427. [Google Scholar] [CrossRef]
- Nguyen, V.B.; Schlötterer, J.; Seifert, C. Explaining Machine Learning Models in Natural Conversations: Towards a Conversational XAI Agent. arXiv preprint 2022, arXiv:2209.02552. [Google Scholar]
- Nobani, N.; Mercorio, F.; Mezzanzanica, M.; et al. Towards an Explainer-agnostic Conversational XAI. In Proceedings of the IJCAI, 2021; pp. 4909–4910. [Google Scholar]
- Slack, D.; Krishna, S.; Lakkaraju, H.; Singh, S. Explaining machine learning models with interactive natural language conversations using TalkToModel. Nature Machine Intelligence 2023. [Google Scholar] [CrossRef]
- Dazeley, A.; Karpowicz, K.; Menzies, T. Levels of explainable artificial intelligence for human-aligned conversational explanations. Artificial Intelligence 2021, 298, 103525. [Google Scholar] [CrossRef]
- Rechkemmer, A.; Yin, M. When confidence meets accuracy: Exploring the effects of multiple performance indicators on trust in machine learning models. In Proceedings of the Proceedings of the 2022 chi conference on human factors in computing systems, 2022; pp. 1–14. [Google Scholar]
- Wang, D.; Yang, Q.; Abdul, A.; Lim, B.Y. Designing theory-driven user-centric explainable AI. In Proceedings of the Proceedings of the 2019 CHI conference on human factors in computing systems, 2019; pp. 1–15. [Google Scholar]
- Yang, Q.; Steinfeld, A.; Zimmerman, J. Unremarkable AI: Fitting intelligent decision support into critical, clinical decision-making processes. In Proceedings of the Proceedings of the 2019 CHI conference on human factors in computing systems, 2019; pp. 1–11. [Google Scholar]
- Holzinger, A.; Malle, B.; Kieseberg, P.; Roth, P.M.; Müller, H.; Reihs, R.; Zatloukal, K. Towards the augmented pathologist: Challenges of explainable-ai in digital pathology. arXiv preprint 2017, arXiv:1712.06657. [Google Scholar]
- Xie, Y.; Chen, M.; Kao, D.; Gao, G.; Chen, X. CheXplain: enabling physicians to explore and understand data-driven, AI-enabled medical imaging analysis. In Proceedings of the Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems, 2020; pp. 1–13. [Google Scholar]
- Yang, Q.; Hao, Y.; Quan, K.; Yang, S.; Zhao, Y.; Kuleshov, V.; Wang, F. Harnessing biomedical literature to calibrate clinicians’ trust in AI decision support systems. In Proceedings of the Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems, 2023; pp. 1–14. [Google Scholar]
- Schwartz, J.M.; George, M.; Rossetti, S.C.; Dykes, P.C.; Minshall, S.R.; Lucas, E.; Cato, K.D. Factors influencing clinician trust in predictive clinical decision support systems for in-hospital deterioration: qualitative descriptive study. JMIR Human Factors 2022, 9, e33960. [Google Scholar] [CrossRef]
- Tonekaboni, S.; Joshi, S.; McCradden, M.D.; Goldenberg, A. What clinicians want: contextualizing explainable machine learning for clinical end use. In Proceedings of the Machine learning for healthcare conference. PMLR, 2019; pp. 359–380. [Google Scholar]
- Bharati, S.; Mondal, M.R.H.; Podder, P. A Review on Explainable Artificial Intelligence for Healthcare: Why, How, and When? IEEE Transactions on Artificial Intelligence 2024, 5, 1429–1442. [Google Scholar] [CrossRef]
- Amann, J.; Blasimme, A.; Vayena, E.; Frey, D.; Madai, V.I.; Consortium, P. Explainability for artificial intelligence in healthcare: a multidisciplinary perspective. BMC medical informatics and decision making 2020, 20, 1–9. [Google Scholar] [CrossRef]
- Sadeghi, Z.; Alizadehsani, R.; Cifci, M.A.; Kausar, S.; Rehman, R.; Mahanta, P.; Bora, P.K.; Almasri, A.; Alkhawaldeh, R.S.; Hussain, S.; et al. A brief review of explainable artificial intelligence in healthcare. arXiv preprint 2023, arXiv:2304.01543. [Google Scholar]
- Röber, T.E.; Goedhart, R.; Birbil, S. Clinicians’ Voice: Fundamental Considerations for XAI in Healthcare. arXiv preprint 2024, arXiv:2411.04855. [Google Scholar]
- Kushniruk, A.W.; et al. Issues and challenges in designing user interfaces for healthcare applications. In Proceedings of the Studies in Health Technology and Informatics, 2011. [Google Scholar]
- O’Sullivan, D.; Fraccaro, P.; Carson, E.; Weller, P. Decision time for clinical decision support systems. Clinical medicine 2014, 14, 338–341. [Google Scholar] [CrossRef]
- Bhattacharya, A.; Ooge, J.; Stiglic, G.; Verbert, K. Directive Explanations for Monitoring the Risk of Diabetes Onset: Introducing Directive Data-Centric Explanations and Combinations to Support What-If Explorations. In Proceedings of the Proceedings of the 28th International Conference on Intelligent User Interfaces, 2023; pp. 204–219. [Google Scholar]
- Kwon, B.C.; Choi, M.J.; Kim, J.T.; Choi, E.; Kim, Y.B.; Kwon, S.; Sun, J.; Choo, J. Retainvis: Visual analytics with interpretable and interactive recurrent neural networks on electronic medical records. IEEE transactions on visualization and computer graphics 2018, 25, 299–309. [Google Scholar] [CrossRef] [PubMed]
- Rostamzadeh, N.; Abdullah, S.S.; Sedig, K. Visual analytics for electronic health records: a review. Informatics 2021, 8, 12. [Google Scholar] [CrossRef]
- Dai, X.; Keane, M.T.; Shalloo, L.; Ruelle, E.; Byrne, R.M. Counterfactual explanations for prediction and diagnosis in XAI. In Proceedings of the Proceedings of the 2022 AAAI/ACM Conference on AI, Ethics, and Society, 2022; pp. 215–226. [Google Scholar]
- Mindlin, D.; Beer, F.; Sieger, L.N.; Heindorf, S.; Esposito, E.; Ngonga Ngomo, A.C.; Cimiano, P. Beyond one-shot explanations: a systematic literature review of dialogue-based xAI approaches. Artificial Intelligence Review 2025, 58, 81. [Google Scholar] [CrossRef]
- Wang, Q.; Anikina, T.; Feldhus, N.; van Genabith, J.; Hennig, L.; Möller, S. LLMCheckup: Conversational examination of large language models via interpretability tools. arXiv preprint 2024, arXiv:2401.12576. [Google Scholar]
- Feldhus, N.; Wang, Q.; Anikina, T.; Chopra, S.; Oguz, C.; Möller, S. InterroLang: Exploring NLP models and datasets through dialogue-based explanations. arXiv preprint 2023, arXiv:2310.05592. [Google Scholar]
- Wen, B.; Norel, R.; Liu, J.; Stappenbeck, T.; Zulkernine, F.; Chen, H. Leveraging Large Language Models for Patient Engagement: The Power of Conversational AI in Digital Health. arXiv preprint 2024, arXiv:2406.13659. [Google Scholar]
- Laranjo, L.; Dunn, A.G.; Tong, H.L.; Kocaballi, A.B.; Chen, J.; Bashir, R.; Surian, D.; Gallego, B.; Magrabi, F.; Lau, A.Y.; et al. Conversational agents in healthcare: a systematic review. Journal of the American Medical Informatics Association 2018, 25, 1248–1258. [Google Scholar] [CrossRef]
- Xing, Z.; Yu, F.; Du, J.; Walker, J.S.; Paulson, C.B.; Mani, N.S.; Song, L. Conversational interfaces for health: bibliometric analysis of grants, publications, and patents. Journal of medical Internet research 2019, 21, e14672. [Google Scholar] [CrossRef]
- Alowais, S.A.; Alghamdi, S.S.; Alsuhebany, N.; Alqahtani, T.; Alshaya, A.I.; Almohareb, S.N.; Aldairem, A.; Alrashed, M.; Bin Saleh, K.; Badreldin, H.A.; et al. Revolutionizing healthcare: the role of artificial intelligence in clinical practice. BMC medical education 2023, 23, 689. [Google Scholar] [CrossRef] [PubMed]
- Kaufman, R.; Kirsh, D. Explainable AI And Visual Reasoning: Insights From Radiology. arXiv preprint 2023, arXiv:2304.03318. [Google Scholar]
- Sackett, D.L.; Rosenberg, W.M.; Gray, J.M.; Haynes, R.B.; Richardson, W.S. Evidence based medicine: what it is and what it isn’t. BMJ 1996, 312, 71–72. [Google Scholar]
- Gibney, E. Has your paper been used to train an AI model? Almost certainly. Nature 2024, 632, 715–716. [Google Scholar] [CrossRef] [PubMed]
- Wang, Z.; Cao, L.; Danek, B.; Jin, Q.; Lu, Z.; Sun, J. Accelerating clinical evidence synthesis with large language models. arXiv preprint 2024, arXiv:2406.17755. [Google Scholar]
- Patel, R.; Brayne, A.; Hintzen, R.; Jaroslawicz, D.; Neculae, G.; Corneil, D. Retrieve to Explain: Evidence-driven Predictions with Language Models. arXiv preprint 2024, arXiv:2402.04068. [Google Scholar]
- Procter, R.; Tolmie, P.; Rouncefield, M. Holding AI to account: challenges for the delivery of trustworthy AI in healthcare. ACM Transactions on Computer-Human Interaction 2023, 30, 1–34. [Google Scholar] [CrossRef]
- Kent, D.M.; Shah, N.; et al. EHR-based prediction of Type 2 Diabetes in prediabetes patients using machine learning. Journal of Biomedical Informatics 2022, 132, 104121. [Google Scholar]
- Kourou, K.; Papageorgiou, E.I.; Fotiadis, D.I. Integration of decision support systems into electronic health records: A review of recent efforts. Health Informatics Journal 2021, 27, 840–857. [Google Scholar]
- Spoladore, R.; Rossi, L.; Gatti, M.; et al. OnT2D-DSS: An Ontology-Based Clinical Decision Support System for Personalized Management of Type 2 Diabetes. In Proceedings of the Proceedings of the IEEE International Conference on Bioinformatics and Biomedicine (BIBM). IEEE, 2024. [Google Scholar]
- Grechuta, M.; Patel, A.; Singh, M.; et al. Exandra: A clinical decision support system for pharmacological management in type 2 diabetes. Journal of Medical Systems 2025, 49, 45–56. [Google Scholar]
- Ribeiro, M.T.; Singh, S.; Guestrin, C. Why should I trust you? Explaining the predictions of any classifier. In Proceedings of the Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2016; pp. 1135–1144. [Google Scholar]
- Lundberg, S.M.; Lee, S.I. A unified approach to interpreting model predictions. In Proceedings of the Advances in Neural Information Processing Systems, 2017; pp. 4765–4774. [Google Scholar]
- Sundararajan, M.; Taly, A.; Yan, Q. Axiomatic attribution for deep networks. In Proceedings of the Proceedings of the 34th International Conference on Machine Learning. PMLR; 2017, pp. 3319–3328.
- Cao, T.; Raman, N.; Dervovic, D.; Tan, C. Characterizing multimodal long-form summarization: A case study on financial reports. arXiv preprint 2024, arXiv:2404.06162. [Google Scholar]
- Bhattacharya, A.; Stumpf, S.; Gosak, L.; Stiglic, G.; Verbert, K. EXMOS: Explanatory Model Steering Through Multifaceted Explanations and Data Configurations. In Proceedings of the Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, 2024; pp. 1–27. [Google Scholar]
- Chang, V.; Bailey, J.; Xu, Q.A.; Sun, Z. Pima Indians diabetes mellitus classification based on machine learning (ML) algorithms. Neural Computing and Applications 2023, 35, 16157–16173. [Google Scholar] [CrossRef] [PubMed]
- Gomez, O.; Holter, S.; Yuan, J.; Bertini, E. Vice: Visual counterfactual explanations for machine learning models. In Proceedings of the Proceedings of the 25th international conference on intelligent user interfaces, 2020; pp. 531–535. [Google Scholar]
- Prolific. Prolific, 2014. Accessed: 20 February 2025.
- Bhattacharya, A.; Stumpf, S.; De Croon, R.; Verbert, K. Explanatory Debiasing: Involving Domain Experts in the Data Generation Process to Mitigate Representation Bias in AI Systems. arXiv preprint 2024, arXiv:2501.01441. [Google Scholar]
- Hoffman, R.R.; Mueller, S.T.; Klein, G.; Litman, J. Metrics for explainable AI: Challenges and prospects. arXiv preprint 2018, arXiv:1812.04608. [Google Scholar]
- Singh, R.; Miller, T.; Sonenberg, L.; Velloso, E.; Vetere, F.; Howe, P.; Dourish, P. An Actionability Assessment Tool for Explainable AI. arXiv preprint 2024, arXiv:2407.09516. [Google Scholar]
- Wijekoon, A.; Wiratunga, N.; Corsar, D.; Martin, K.; Nkisi-Orji, I.; Díaz-Agudo, B.; Bridge, D. XEQ Scale for Evaluating XAI Experience Quality. arXiv preprint 2024, arXiv:2407.10662. [Google Scholar]
- Liao, Q.V.; Gruen, D.; Miller, S. Questioning the AI: informing design practices for explainable AI user experiences. In Proceedings of the Proceedings of the 2020 CHI conference on human factors in computing systems; 2020, pp. 1–15.
- Clarke, V.; Braun, V. Thematic analysis. The journal of positive psychology 2017, 12, 297–298. [Google Scholar] [CrossRef]
- Fereday, J.; Muir-Cochrane, E. Demonstrating rigor using thematic analysis: A hybrid approach of inductive and deductive coding and theme development. International journal of qualitative methods 2006, 5, 80–92. [Google Scholar] [CrossRef]
- O’Connor, C.; Joffe, H. Intercoder reliability in qualitative research: debates and practical guidelines. International journal of qualitative methods 2020, 19, 1609406919899220. [Google Scholar] [CrossRef]
- Landis, J.R.; Koch, G.G. The measurement of observer agreement for categorical data. Biometrics 1977, 33, 159–174. [Google Scholar] [CrossRef]
| 2 | The dataset is made available in the supplementary materials |



| Themes and Sub-themes | Supporting Participants(n/30): IDs |
|---|---|
| RQ1: Impact of Conversational Explanations on Understandability | |
| Interactions with the chatbot increase interpretability of the explanation methods (D) | (29/30): 1-8, 10-30 |
| Textual explanations clarified visuals, but after a while, a chatbot may become unnecessary (D-I) | (20/30): 1, 3-5, 7-10, 13-14, 16, 18-21, 23-24, 26, 29-30 |
| HCPs value textual information that is easy to understand and quick to read (D-I) | (16/30): 1-4, 8, 10, 12, 15, 19-20, 22-24, 27-28 |
| RQ2: Impact of Scientific Evidence on Trust Calibration | |
| Evidence-based explanation builds trust in AI (D) | (28/30): 1-3, 5-8, 10-30 |
| AI cannot fully replace a clinician and therefore cannot be trusted fully (D) | (8/30): 2, 5, 9, 14, 17, 20-21, 26 |
| Uncertainty about data sources leads to distrust of AI output (D) | (8/30): 3, 5, 7, 10, 18, 20, 25, 28 |
| Lack of understanding of a visualization causes mistrust of it (D) | (2/30): 4, 17 |
| RQ3: Usefulness of Integrated Visual and Conversational Explanations | |
| HCPs used both visual and conversational components to analyze risk and get recommendations (D) | (25/30): 1-3, 5-8, 11-13, 15-16, 18-27, 29 |
| Dashboard combines patient data for faster risk analysis, but the information remains insufficient (D-I) | (24/30): 1-10, 14-17, 19, 21-23, 26, 28, 30 |
| Visualizations facilitate quick understanding (D) | (7/30): 1, 5, 14, 16, 19, 22, 28 |
| Recommendations are too generic and need to be personalized based on comprehensive patient background (D) | (26/30): 1-5, 8, 10-11, 13-21, 23-24, 26-27, 29-30 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
