Submitted: 05 July 2024
Posted: 09 July 2024
Abstract
Keywords:
1. Introduction
2. When Does XAI Fail?
2.1. Robustness
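Robustness asks whether an explanation survives small, label-preserving changes to the input. As a hedged illustration (a toy sketch using an untrained stand-in model, not code from the surveyed papers), the following measures how much a simple-gradient saliency map changes under a tiny random perturbation, using the top-k intersection metric that Ghorbani et al. also employ:

```python
import torch
import torchvision.models as models

def saliency(model, x):
    """Simple-gradient saliency: |d max-logit / d input|, summed over channels."""
    x = x.clone().requires_grad_(True)
    logit = model(x).max(dim=1).values.sum()
    grad, = torch.autograd.grad(logit, x)
    return grad.abs().sum(dim=1)  # shape (N, H, W)

def topk_intersection(s1, s2, k=100):
    """Fraction of the k most salient pixels shared by two maps."""
    i1 = set(torch.topk(s1.flatten(), k).indices.tolist())
    i2 = set(torch.topk(s2.flatten(), k).indices.tolist())
    return len(i1 & i2) / k

model = models.resnet18(weights=None).eval()   # untrained stand-in network
x = torch.rand(1, 3, 224, 224)
x_pert = x + 0.01 * torch.randn_like(x)        # small random perturbation

# 1.0 would mean a perfectly stable top-100; fragile maps score far lower
print(topk_intersection(saliency(model, x)[0], saliency(model, x_pert)[0]))
```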


2.2. Adversarial Attacks
2.3. Partial Explanations

2.4. Data and Concept Drift
2.5. Anthropomorphization
2.6. Contradictory Explanations
2.7. Unstable Explanations
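Instability is the related failure where the same input receives different explanations across runs of a stochastic explainer. A hedged toy demonstration (the setup and names are ours, not from the cited works): a crude LIME-style local surrogate is fit twice with different random seeds, and the two attribution vectors can disagree noticeably.

```python
import numpy as np
from sklearn.linear_model import Ridge

def local_weights(f, x, seed, n_samples=100, scale=0.1):
    """Fit a ridge surrogate to f around x on randomly sampled neighbors."""
    rng = np.random.default_rng(seed)
    X = x + scale * rng.standard_normal((n_samples, x.size))
    return Ridge(alpha=1.0).fit(X, f(X)).coef_

f = lambda X: np.sin(X).sum(axis=1)   # black-box stand-in model
x = np.ones(10)
w1 = local_weights(f, x, seed=0)
w2 = local_weights(f, x, seed=1)
print(np.corrcoef(w1, w2)[0, 1])      # noticeably below 1.0: unstable attributions
```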
2.8. Incompatible Explanations
2.9. Mismatch
2.10. Counterintuitive Explanations
3. Progress
3.1. Six Ws
1. Who gives and receives the explanation?
2. What is explained?
3. When is an explanation given?
4. Where is the explanation given?
5. Why is XSec needed?
6. How to explain security?
3.1.1. Who Gives and Receives the Explanation?
3.1.2. What is Explained?
3.1.3. Where?
1. Explanations are provided to the users as part of the security policy.
2. Explanations are detached from the system and made available elsewhere.
3. Users interact with an expert system, offered as a service, that provides explanations.
4. The option the authors consider best is a 'security-explaining-carrying system', although a considerable amount of work is required to ensure its safety.
3.1.4. When?
3.1.5. Why?
3.1.6. How?
3.2. Taxonomy of XAI and Black Box Attacks
3.2.1. Taxonomy
1. X-PLAIN concerns the explanations provided for the model's predictions. This covers static and interactive explanations, local/global explanations, in-model/post-hoc explanations, surrogate models, and visualizations of a model.
2. XSP-PLAIN covers confidential information, such as features that must be protected, as well as the integrity and privacy properties of the data and the model.
3. XT-PLAIN deals with the threat models considered, including correctness, consistency, transferability, confidence, fairness, and privacy.
3.2.2. Proposed Black Box Attack
3.3. Interpretation of Neural Networks is Fragile
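Ghorbani et al. demonstrate that feature-importance maps can be altered dramatically by small input perturbations that leave the prediction unchanged. A minimal sketch in the spirit of their attack (ours, not their released code; untrained stand-in model), which iteratively perturbs the input to push the saliency map away from the original within an L-infinity budget:

```python
import torch
import torchvision.models as models

def saliency(model, x_in, create_graph=False):
    """Simple-gradient saliency; x_in must require grad."""
    logit = model(x_in).max(dim=1).values.sum()
    grad, = torch.autograd.grad(logit, x_in, create_graph=create_graph)
    return grad.abs().sum(dim=1)

model = models.resnet18(weights=None).eval()    # untrained stand-in network
x = torch.rand(1, 3, 224, 224).requires_grad_(True)
label = model(x).argmax(dim=1)
ref = saliency(model, x).detach()               # the original interpretation

delta = torch.zeros_like(x, requires_grad=True)
eps, step = 8 / 255, 1 / 255                    # L-infinity budget and step size
for _ in range(20):
    # push the current saliency map as far from the original one as possible
    loss = -(saliency(model, x + delta, create_graph=True) - ref).pow(2).mean()
    g, = torch.autograd.grad(loss, delta)
    delta = (delta - step * g.sign()).clamp(-eps, eps).detach().requires_grad_(True)
    if (model(x + delta).argmax(dim=1) != label).any():
        break                                    # keep the prediction intact
```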
4. Fooling Neural Network Interpretations
4.1. Preliminaries
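The fooling attacks below operate on a differentiable interpretation of the network; Heo et al. attack LRP, Grad-CAM, and the simple gradient. As a concrete stand-in for the sketches that follow (implementation ours), a minimal simple-gradient interpreter:

```python
import torch

def simple_grad(model, x, target_class, create_graph=False):
    """Simple-gradient interpretation: |d logit_c / d x|, summed over channels.

    With create_graph=True the map itself remains differentiable, so it can
    appear inside a fooling penalty optimized with respect to the model weights.
    """
    x = x.detach().requires_grad_(True)
    logit = model(x)[:, target_class].sum()
    grad, = torch.autograd.grad(logit, x, create_graph=create_graph)
    return grad.abs().sum(dim=1)   # shape (N, H, W)
```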
4.2. Objective Function and Penalty Terms
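Heo et al. manipulate the model itself rather than the input: the classifier is fine-tuned with its ordinary training loss plus a penalty on the interpretations. A hedged reconstruction of the general form of the objective (notation ours):

$$\mathcal{L}(\theta) = \mathcal{L}_{\text{cls}}(\theta; \mathcal{D}) + \lambda \, \mathcal{L}_{\text{fool}}(I_\theta; \mathcal{D})$$

Here $\mathcal{L}_{\text{cls}}$ is the classification loss that keeps accuracy essentially unchanged, $I_\theta$ is the interpretation method applied to the current model, $\mathcal{L}_{\text{fool}}$ is one of the penalty terms defined by the chosen fooling mode, and $\lambda$ trades off accuracy against fooling strength.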
4.3. Passive Fooling
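Passive fooling makes the interpretation uninformative while accuracy is preserved; one of Heo et al.'s variants pushes the saliency mass to the uninformative image boundary. A hedged sketch of such a location-style penalty (function name, border width, and normalization are ours), assuming a non-negative saliency map like the `simple_grad` output above:

```python
import torch

def location_fooling_penalty(sal, border=16):
    """Small when the (normalized) saliency mass sits on the image frame.

    sal: (N, H, W) non-negative saliency maps.
    """
    n, h, w = sal.shape
    mask = torch.zeros(h, w)
    mask[:border, :] = mask[-border:, :] = 1.0   # top and bottom frame
    mask[:, :border] = mask[:, -border:] = 1.0   # left and right frame
    sal = sal / sal.flatten(1).sum(dim=1).view(-1, 1, 1).clamp_min(1e-8)
    return 1.0 - (sal * mask).flatten(1).sum(dim=1).mean()
```

During fine-tuning this term would be added to the classification loss with weight λ, as in the objective above.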
4.4. Active Fooling
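Active fooling goes further: the fine-tuned model should produce, for one class, the interpretation originally produced for another, e.g. swapping two classes' explanations. A hedged sketch of such a swap penalty (function names ours; `interp` is any differentiable interpreter such as the `simple_grad` above):

```python
import torch

def swap_penalty(interp, model, x, class_a, class_b, ref_a, ref_b):
    """ref_a / ref_b: interpretations of the *original* model for the two classes."""
    ia = interp(model, x, class_a, create_graph=True)  # current map for class A
    ib = interp(model, x, class_b, create_graph=True)
    # class A's map should match class B's original map, and vice versa
    return (ia - ref_b).pow(2).mean() + (ib - ref_a).pow(2).mean()
```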
5. More Works
6. Conclusion
References
- Rawal, A.; McCoy, J.; Rawat, D.B.; Sadler, B.M.; Amant, R.S. Recent advances in trustworthy explainable artificial intelligence: Status, challenges, and perspectives. IEEE Transactions on Artificial Intelligence 2021, 3, 852–866.
- Chung, N.C.; Chung, H.; Lee, H.; Chung, H.; Brocki, L.; Dyer, G. False sense of security in explainable artificial intelligence (XAI). arXiv preprint 2024, arXiv:2405.03820.
- Bove, C.; Laugel, T.; Lesot, M.J.; Tijus, C.; Detyniecki, M. Why do explanations fail? A typology and discussion on failures in XAI. arXiv preprint 2024, arXiv:2405.13474.
- Ghorbani, A.; Abid, A.; Zou, J. Interpretation of neural networks is fragile. In Proceedings of the AAAI Conference on Artificial Intelligence 2019, 33(01), 3681–3688.
- Heo, J.; Joo, S.; Moon, T. Fooling neural network interpretations via adversarial model manipulation. Advances in Neural Information Processing Systems 2019, 32.
- Dombrowski, A.K.; Alber, M.; Anders, C.; Ackermann, M.; Müller, K.R.; Kessel, P. Explanations can be manipulated and geometry is to blame. Advances in Neural Information Processing Systems 2019, 32.
- Rieger, L.; Hansen, L.K. A simple defense against adversarial attacks on heatmap explanations. arXiv preprint 2020, arXiv:2007.06381.


Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).