Submitted: 05 March 2025
Posted: 06 March 2025
Abstract
Keywords:
1. Introduction
2. Requirement Specification
2.1. Interpretation Tools
2.2. Purposes of Interpretation Tools
- i. Enabling the deployer to reject or correct training or test data on the basis of insufficient data quality. Data quality issues can include unwanted confounding, data imbalance, bias, and the presence of noise, artifacts, and outliers.
- ii. Enabling the deployer to reject or correct an AI model on the basis of insufficient training data quality or inappropriate model behavior. Inappropriate model behavior can include unwanted reliance on confounding information in the data, unacceptable levels of uncertainty, bias, and unfair decision making.
- iii. Enabling the deployer to reject, scrutinize (e.g., by cross-checking against the output of a second model or the opinion of a human expert), or correct outputs of a model on a given input or group of inputs on the basis of insufficient quality of test inputs, test inputs lying outside the training distribution, unacceptable levels of uncertainty, or other reasons.
- iv. Selecting certain training or test data inputs, or input dimensions, for further inspection, e.g., to confirm the presence of noise or artifacts in inputs, or to assess the predictive value of individual input dimensions.
- v. Recommending certain dimensions of a test input for external intervention, for example with the goal of simulating a model's output (e.g., a credit risk score) or predicting a real-world quantity predicted by the model (e.g., a health outcome) based on counterfactual data (a minimal sketch of such an intervention follows this list).
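To make purpose (v) concrete, the following is a minimal sketch, assuming a hypothetical scikit-learn classifier, placeholder feature values, and a simple helper function, none of which are prescribed by this specification. It shows how a deployer might simulate a model's output under a counterfactual intervention on a single input dimension.

```python
# Minimal sketch (illustrative only): simulating a model's output under a
# counterfactual intervention on one input dimension. Model, data, and values
# are hypothetical placeholders, not part of the specification.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical training data: 3 input dimensions, binary target (e.g., a credit risk label).
X_train = rng.normal(size=(500, 3))
y_train = (X_train[:, 0] - 0.5 * X_train[:, 1] + 0.1 * rng.normal(size=500) > 0).astype(int)

# Stand-in for the deployed high-risk AI model.
model = LogisticRegression().fit(X_train, y_train)

def counterfactual_output(model, x, dim, new_value):
    """Model output for input x with dimension `dim` set to `new_value`,
    all other dimensions left unchanged."""
    x_cf = np.array(x, dtype=float)
    x_cf[dim] = new_value
    return model.predict_proba(x_cf.reshape(1, -1))[0, 1]

x_test = np.array([1.2, -0.3, 0.8])  # a single test input
original = model.predict_proba(x_test.reshape(1, -1))[0, 1]
intervened = counterfactual_output(model, x_test, dim=1, new_value=1.5)

print(f"risk score for the observed input:           {original:.3f}")
print(f"risk score after setting dimension 1 to 1.5: {intervened:.3f}")
```

Note that such a simulation only reflects the model's behavior under the intervention; drawing conclusions about the corresponding real-world quantity additionally requires that the model captures the relevant causal structure.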
2.3. Requirements on Interpretation Tools
- i. The intended purpose of the tool.
- ii. The information provided by the tool, defined as the concrete interpretation of the tool’s output. The output shall correspond to well-defined, unambiguous properties of the high-risk AI system in general or of its components.
- iii. A logically sound line of argument stating how the provided information enables the tool to fulfill its intended purpose when used by the deployer according to the provided instructions.
- iv. The technical constraints, including assumptions on the components of the high-risk AI system (e.g., model class, training data, test input), affecting the accuracy and precision of the tool with respect to providing correct information about the high-risk AI system or its components and with respect to serving its intended purpose.
- v. The expected accuracy and precision of the tool with respect to providing correct information, and the expected accuracy and precision of the tool with respect to serving its intended purpose, substantiated by:
  - Theoretical guarantees taking into account the technical constraints and assumptions of the tool and the properties of the high-risk AI system and its components.
  - Empirical results obtained using sufficiently large and representative sets of test inputs (a minimal sketch of such an empirical evaluation follows this list).
- vi. Instructions on when and how to use the tool, including instructions on how to act upon observing the tool’s output in order to fulfill its purpose.
- vii. A risk assessment, including a discussion of possible failure modes of the tool and the possible consequences of failures for the appropriate use of the high-risk AI system.
- viii. (A reference to) the technical specification of the interpretation tool.
- ix. Technical details of the experiments conducted to determine the accuracy and precision of the information provided by the tool and its fitness for purpose.
- x. Details on the derivation of the theoretical guarantees for the accuracy and precision of the information provided by the tool and its fitness for purpose.
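As an illustration of the empirical results referred to under item v, the following is a minimal sketch. It assumes synthetic data with ground-truth informative dimensions known by construction, and uses a simple weight-based attribution purely as a stand-in for whatever interpretation tool is actually being documented; it estimates the tool's accuracy at identifying the truly predictive input dimensions.

```python
# Minimal sketch (illustrative only): empirically estimating an interpretation
# tool's accuracy at recovering truly predictive input dimensions on synthetic
# data where the ground truth is known. All choices here are assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)
n, d = 2000, 10
informative = {0, 1, 2}  # ground-truth informative dimensions (known by construction)

# Synthetic data in which only dimensions 0-2 carry predictive information.
X = rng.normal(size=(n, d))
y = (X[:, 0] + X[:, 1] - X[:, 2] + 0.5 * rng.normal(size=n) > 0).astype(int)

model = LogisticRegression().fit(X, y)

# "Interpretation tool" under test: rank input dimensions by absolute model weight.
importance = np.abs(model.coef_).ravel()
top_ranked = set(np.argsort(importance)[::-1][:len(informative)])

# Empirical accuracy of the provided information: fraction of ground-truth
# informative dimensions recovered among the top-ranked dimensions.
recovered = len(top_ranked & informative) / len(informative)
print(f"fraction of informative dimensions recovered: {recovered:.2f}")
```

On real data, where ground-truth informative dimensions are typically unknown, such direct accuracy estimates are harder to obtain, which is one reason the requirements above also call for theoretical guarantees (items v and x).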
3. Conclusions
References
