Submitted:
17 November 2025
Posted:
18 November 2025
Abstract
Keywords:
1. Introduction
2. Methodology
2.1. Study Design and Rationale
2.1.1. Selection of Research Format: Systematic Literature Review (SLR)
2.1.2. Ensuring Objectivity and Reproducibility
2.1.3. Analyzing Methodological Differences
2.2. Definition of Purpose and Conceptual Focus
2.2.1. The Human-Centered Property of Explainability
2.2.2. The Shift from XAI to Explanatory AI (YAI) Paradigm
2.3. Research Questions
- When did primary interest in Human-Centered Evaluation (HCE) of XAI begin to rise?
- What is the main theoretical challenge this review addresses?
- What are the prevailing characteristics of Explainable AI (XAI) systems investigated within Human-Centered Evaluation (HCE) frameworks?
- Which methodological approaches and metrics are most frequently employed to evaluate the effectiveness of XAI from a user’s perspective?
- What methodological and theoretical limitations (e.g., lack of standardization, external validity issues, cognitive underpinnings) hinder consistent and comparable human-centered evaluation of XAI?
- Why is a shift needed from existing XAI evaluation metrics to a Human-Centered Evaluation (HCE) approach grounded in cognitive or social theories?
2.4. Plan of Analysis and Review Stages
- I. Finding and selecting sources
- II. Extracting and tagging data
- III. Synthesizing and aggregating findings
- IV. Appraising the quality of the literature
3. Search and Selection Strategy
3.1. Definition of Data Sources and Formulation of Query
3.1.1. Selection of academic databases and search scope
Search Query: (Group 1: "Explainable AI" OR XAI OR Interpretability OR Transparency) AND (Group 2: "User Study" OR "Empirical Evaluation" OR "Human-Centered" OR Assessment) AND (Group 3: Trust OR Comprehension OR Satisfaction OR Usability)
3.1.2. Boolean Search Query Formulation
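As a reproducibility aid, the grouped query can be assembled programmatically. The sketch below is illustrative and database-agnostic: the term lists mirror the query in Section 3.1.1, and the quoting of multi-word phrases follows common database conventions (an assumption, since each database applies its own syntax rules).

```python
# Assemble the three concept groups from Section 3.1.1 into one Boolean
# search string: terms within a group are OR-ed, groups are AND-ed.
GROUPS = [
    ['"Explainable AI"', "XAI", "Interpretability", "Transparency"],
    ['"User Study"', '"Empirical Evaluation"', '"Human-Centered"', "Assessment"],
    ["Trust", "Comprehension", "Satisfaction", "Usability"],
]

def build_query(groups):
    """OR the terms within each group, then AND the parenthesized groups."""
    return " AND ".join("(" + " OR ".join(group) + ")" for group in groups)

print(build_query(GROUPS))
```

Keeping the groups as data rather than a hand-typed string makes it easy to log the exact query submitted to each database for the review’s audit trail.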
3.2. Inclusion and Exclusion Criteria (I/E)
| Category | Inclusion Criteria (I) | Exclusion Criteria (E) |
|---|---|---|
| I. Methodology | I1. The article should specifically mention an empirical user study involving data collection from human subjects [6]. | E1. Works pertaining only to the XAI algorithms (e.g., technical advances to SHAP or LIME) that lacked human-subject validation [7]. |
| II. Conceptual focus | I2. The key research question must center on evaluating the effect of XAI on cognitive or behavioral factors, such as trust calibration, perceived comprehension, or decision-making quality [2]. | E2. Secondary literature (e.g., narrative reviews, meta-analyses, theoretical essays, regulatory proposals), which is used only for theoretical framing and never coded as primary data. |
| III. Source rigor | I3. Publications should originate from peer-reviewed journals or proceedings of high-tier academic conferences (e.g., IEEE Transactions, ACM/CHI Proceedings). | E3. Non-peer-reviewed media including preprints (e.g., arXiv), technical reports or workshop abstracts. |
| IV. Language | I4. Full-text publication should be English. | E4. Publications not in English. |
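The four I/E pairs above can be encoded as a single screening predicate. A minimal sketch, assuming hypothetical per-record screening fields (the field names are illustrative, not part of the protocol):

```python
from dataclasses import dataclass

@dataclass
class Record:
    # Hypothetical screening flags extracted for each candidate article.
    has_user_study: bool         # I1/E1: empirical human-subject study
    evaluates_hce_outcome: bool  # I2/E2: trust, comprehension, decision quality
    peer_reviewed: bool          # I3/E3: journal or high-tier proceedings
    language: str                # I4/E4: full text in English

def include(record: Record) -> bool:
    """True iff the record satisfies all four inclusion criteria."""
    return (record.has_user_study
            and record.evaluates_hce_outcome
            and record.peer_reviewed
            and record.language == "English")

print(include(Record(True, True, True, "English")))   # → True
print(include(Record(True, True, False, "English")))  # → False (e.g., preprint)
```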
3.3. Protocol: Step-by-Step Process
4. Extraction and Coding
4.1. Formulation of the Coding and Standardization Protocol
4.1.1. Inter-Coder Reliability
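Inter-coder agreement on categorical codes is commonly quantified with Cohen’s kappa (chance-corrected agreement between two coders). A minimal, dependency-free sketch with hypothetical labels:

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two coders' categorical labels on the same items."""
    assert len(labels_a) == len(labels_b) and labels_a
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the coders' marginal proportions,
    # summed over categories.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[c] * counts_b.get(c, 0) for c in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical double-coding of 8 articles into metric categories:
coder_a = ["trust", "trust", "usability", "trust", "comprehension", "trust", "usability", "trust"]
coder_b = ["trust", "usability", "usability", "trust", "comprehension", "trust", "trust", "trust"]
print(round(cohens_kappa(coder_a, coder_b), 3))  # → 0.529
```

Kappa values above roughly 0.6 are commonly read as substantial agreement; lower values typically trigger a codebook revision and a re-coding round.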
4.2. Coding Characteristics of XAI Systems (the Explanandum)
4.2.1. Format and Type of Data Coding
4.3. Coding Experimental Design and Stakeholders
4.3.1. Coding of Experimental Design Scheme
4.3.2. Coding of Target Audience (Stakeholders)
4.4. Core Evaluation Metrics Coding
4.4.1. Subjective measures
4.4.2. Objective measures
5. Synthesis and Critical Appraisal
5.1. Synthesis Process: Thematic Synthesis and Taxonomy Construction
5.1.1. Thematic Synthesis Approach
5.1.2. Construction of the Conceptual Framework
5.2. Challenges of Critical Appraisal and Standardization
5.2.1. Metric Diversity and Lack of Standardization
5.2.2. Identification of Theoretical and Cognitive Gaps
5.3. Methodological Limitations
6. Results
6.1. Literature Descriptive Statistics
6.1.1. Publication Dynamics and Domain Distribution
6.1.2. Data Characteristics
6.2. XAI System Features
6.2.1. Explanation Scope
6.2.2. Target audience (stakeholders)
6.3. Core Evaluation Metrics Coding
6.3.1. Subjective Metrics (User Perception)
6.3.2. Objective Metrics (Behavior and Performance)
6.4. Analysis and Synthesis of XAI Effectiveness
7. Critical Appraisal and Discussion
7.1. Methodological Challenges and Standardization Gaps
7.1.1. Variation in Metrics and Lack of Standardization
7.2. Theoretical and Cognitive Gap (XAI vs. YAI)
7.2.1. Cognitive Grounding Shortcomings
7.2.2. The XAI/YAI Dichotomy
References
- 1. Kim, J.; Maathuis, H.; Sent, D. Human-centered evaluation of explainable AI applications: a systematic review. Frontiers in Artificial Intelligence 2024. Systematic review. [CrossRef]
- 2. Ma, S. (The Hong Kong University of Science and Technology). Towards Human-centered Design of Explainable Artificial Intelligence (XAI): A Survey of Empirical Studies. arXiv preprint 2024. Survey of empirical studies.
- 3. Meske, C.; Brenne, J.; et al.; Dogangün, A. From Explainable to Explanatory Artificial Intelligence: Toward a New Paradigm for Human-Centered Explanations through Generative AI. arXiv preprint 2025. Introduces the Explanatory AI (YAI) paradigm, emphasizing contextual reasoning, narrative communication, and adaptive personalization for human-centered explanations.
- 4. Sovrano, F.; Vitali, F. Explanatory artificial intelligence (YAI): human-centered explanations of explainable AI and complex data. arXiv preprint 2024. Introduces Explanatory AI (YAI) as a layer enhancing basic XAI output, grounded in Achinstein’s theory of explanations and emphasizing pragmatic, user-focused explanations.
- 5. Nguyen, T.; Canossa, A.; Zhu, J. How Human-Centered Explainable AI Interfaces Are Designed and Evaluated: A Systematic Survey. arXiv preprint 2024. Systematic review.
- 6. Rong, Y.; Leemann, T.; et al.; Kasneci, E. Towards Human-Centered Explainable AI: A Survey of User Studies for Model Explanations. IEEE Transactions on Pattern Analysis and Machine Intelligence 2024. Survey paper.
- 7. Chinnaraju, A. Explainable AI (XAI) for trustworthy and transparent decision-making: A theoretical framework for AI interpretability. World Journal of Advanced Engineering Technology and Sciences 2025. Review article.
- 8. Baron, S.; Latham, A.J.; et al. Explainable AI and stakes in medicine: A user study. Artificial Intelligence 2025. User study.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).