Preprint
Article

This version is not peer-reviewed.

Explainable Transformer Models for Human Emotion Recognition: A Multi-Method Explainability Study in the Context of Mental Health

Submitted: 17 April 2026

Posted: 21 April 2026


Abstract
Recognizing emotions in written text is a core task in Natural Language Processing (NLP), with applications in sentiment analysis and mental-health monitoring. This study presents an interpretable emotion-recognition framework built on a RoBERTa-base model fine-tuned on the Emotions for NLP dataset, achieving an accuracy of 0.924 and an F1 score of 0.925. The main contribution is the combined use of four complementary explainability techniques: SHAP (SHapley Additive exPlanations) for global token attribution; LIME (Local Interpretable Model-agnostic Explanations) for instance-level explanations; multi-head attention visualization for structural interpretability; and Integrated Gradients, implemented via Captum, for gradient-based attribution. Together, these techniques improve transparency, help identify model bias, and support the responsible deployment of the model. Extensive experiments demonstrate that the model consistently identifies emotionally salient tokens (words or phrases) as predictive indicators of emotion.
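To illustrate the core idea behind the first of these techniques, the following is a minimal, self-contained sketch of exact Shapley-value attribution over tokens: each token's value is its average marginal contribution to the model's score across all subsets of the other tokens. The toy scorer, its weights, and the negation interaction are hypothetical stand-ins for a real classifier, not the paper's model (the SHAP library approximates these values efficiently rather than enumerating subsets).

```python
from itertools import combinations
from math import factorial

def shapley_values(tokens, score):
    """Exact Shapley attribution: each token's weighted average marginal
    contribution to `score` over all subsets of the remaining tokens."""
    n = len(tokens)
    values = {}
    for i, tok in enumerate(tokens):
        others = [t for j, t in enumerate(tokens) if j != i]
        phi = 0.0
        for k in range(n):
            for subset in combinations(others, k):
                # Classic Shapley weight: |S|! (n - |S| - 1)! / n!
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                phi += weight * (score(set(subset) | {tok}) - score(set(subset)))
        values[tok] = phi
    return values

# Hypothetical toy scorer: additive word weights plus a negation interaction.
WEIGHTS = {"not": -0.5, "happy": 0.8, "today": 0.1}

def score(present):
    s = sum(WEIGHTS[w] for w in present)
    if {"not", "happy"} <= present:  # "not happy" cancels the positive cue
        s -= 1.6
    return s

attributions = shapley_values(["not", "happy", "today"], score)
```

Because "today" participates in no interaction, its attribution equals its additive weight (0.1), while the negation penalty is split evenly between "not" and "happy"; the attributions also sum to the full-sentence score (the efficiency property).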
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.

Preprints.org is a free preprint server supported by MDPI in Basel, Switzerland.
