Collaborative Explainable AI for EEG Mental Health Monitoring with Constrained QA-Tuned LLM Alignment

Zian Ding; Fusen Guo; Zhibo Zhang; Chan Yeun; Ernesto Damiani; Bonan Zhang; Lin Li

doi:10.20944/preprints202606.1182.v1

Submitted: 13 June 2026

Posted: 16 June 2026


Abstract

The monitoring of mental health states using electroencephalogram (EEG) signals has gained increasing attention because EEG provides a non-invasive means of assessing psychological disorders. Large Language Models (LLMs) and Explainable Artificial Intelligence (XAI) have been used to advance the intelligence and interpretability of EEG analysis. However, existing methods face critical bottlenecks, including the fundamental modality gap between continuous signals and text, high computational costs, and poor global consistency of explanations. They are also limited to rigid classification tasks that do not support clinical reasoning or natural language interaction. In this study, we propose a collaborative explainable AI framework for EEG mental health monitoring with constrained question-and-answer (QA) tuned LLM alignment, which builds a smooth transformation path from raw EEG signals to structured evidence and constructs a structured QA dataset for the instruction fine-tuning of LLMs. The central objective of this work is not simply to maximize EEG classification accuracy, but to develop an evidence-grounded alignment and explanation framework that connects EEG-derived physiological evidence with QA-based LLM reasoning. Furthermore, this work designs a transparent collaborative XAI mechanism that embeds interpretable EEG feature information as prior knowledge directly into the QA generation process of the LLM, and develops a multi-level interpretable pipeline that combines attention heatmap analysis and decision tree surrogate modeling to align the LLM's internal reasoning with EEG neurophysiological patterns. The proposed framework addresses the limitations of traditional rigid EEG classification tasks, promotes a shift in the XAI paradigm from high-cost post-hoc explanations to transparent embedded explanations, and enables clinical reasoning and natural language interaction based on EEG signals.
Experimental results on a benchmark EEG mental state dataset demonstrate that the proposed framework stably captures neurophysiological characteristics corresponding to different mental states and improves the classification performance, decision transparency, and clinical credibility of EEG-based mental health monitoring systems. In this setting, classification performance is treated as one evaluation aspect, while the primary contribution lies in constrained evidence-grounded alignment and QA-based LLM explainability. This work provides an initial feasibility study toward real-time, scalable, and trustworthy EEG-based mental health analysis.

Keywords: electroencephalogram; explainable artificial intelligence; large language models; mental health monitoring; model tuning

Subject: Engineering - Electrical and Electronic Engineering

1. Introduction

Electroencephalogram (EEG) is a critical non-invasive modality for measuring neurophysiological activity, capturing subtle dynamic changes in cerebral cortical activity with millisecond-level temporal resolution. It therefore offers distinct advantages in mental health assessment and complex cognitive state analysis [1,2,3]. In traditional clinical practice, the assessment of psychological states such as depression and anxiety often relies heavily on subjective methods, including questionnaires and clinical interviews. These methods are not only time-consuming but also susceptible to patients’ expressive ability, memory bias, and evaluators’ subjective judgment. In contrast, EEG can provide objective, continuous, and real-time measurements of neural activity, thereby reducing the influence of subjective bias on the final assessment [4,5]. Building on this objective measurement capability, EEG has been widely applied to key tasks, including the detection of emotional disorders, episodic or neurodegenerative diseases, and complex task assessment [6,7].

Although machine learning and deep learning methods have significantly improved the efficiency of EEG feature extraction and classification performance in recent years, several key challenges must still be addressed before these methods can be translated into practical clinical applications. First, EEG signals are high-dimensional, strongly non-stationary, and vary substantially across subjects, which severely degrades the generalization of existing models to unseen data [8,9]. Second, state-of-the-art deep learning models that achieve high classification accuracy usually lack interpretability. This "black box" property prevents clinicians from understanding the logic behind the model’s diagnostic decisions, limiting its credibility and usability in medical applications [10,11]. Furthermore, most current studies still focus on single classification tasks, and the corresponding models cannot structurally represent and interpret neural patterns, making it difficult to support the complex reasoning and in-depth interaction required for clinical diagnosis [12,13].

To address the "black box" limitation of deep learning models and improve their decision transparency, Explainable Artificial Intelligence (XAI) techniques have been increasingly applied to EEG data analysis. Researchers have used feature attribution methods to quantify the contribution of individual brain regions and frequency bands to the model’s final decision [14,15,16]. However, most existing XAI methods adopt model-agnostic strategies that construct post-hoc explanations by locally perturbing input data and observing the resulting changes in model predictions. While this approach can reveal local correlations between features and prediction outputs, it relies on simplified proxy models and often fails to produce globally consistent explanations for EEG data with high dimensionality and complex spatiotemporal correlations [17].

Meanwhile, Large Language Models (LLMs) have demonstrated significant advantages in complex natural language processing, multimodal logical reasoning, and professional medical data analysis [18,19,20]. However, applying LLMs directly to EEG analysis faces a substantial obstacle: EEG signals are continuous, fluctuating, unstructured time-series data, whereas LLMs operate on discrete text tokens, resulting in a fundamental modality gap between their data representations [13,21]. Recent cross-modal studies suggest a promising way to bridge this gap: converting physical or time-series signals into two-dimensional spectrograms that encode rich time-frequency information, and then aligning these spectrograms with the representation space of LLMs, can connect raw signals with language and extend cross-modal reasoning capabilities [22].

Based on this application background and these technical bottlenecks, this paper proposes a collaborative explainable artificial intelligence framework. The framework first converts raw continuous EEG signals into structured evidence and then constructs a Question-and-Answer (QA) dataset from this evidence for the instruction fine-tuning of LLMs. In this study, EEG classification is not treated as the only objective, but as one evaluation component within a broader evidence-grounded reasoning framework. The main focus is to connect EEG-derived physiological evidence with QA-based LLM reasoning and interpretable decision explanations. More importantly, the framework embeds the feature attributions and interpretable information extracted by the XAI module directly into the LLM’s question-answering generation process, aligning low-level neurophysiological signal features with high-level linguistic semantic concepts.

Here, the collaborative XAI mechanism refers to the integration of EEG-derived interpretable evidence, LLM attention and hidden-state analysis, and surrogate decision-tree reasoning. The multi-level interpretable pipeline refers to the explanation process that moves from token-level importance, to EEG evidence-level heatmaps, and then to rule-level decision-tree explanations. This design enhances the model’s interpretability and equips the system with evidence-grounded medical reasoning capabilities. Unlike approaches that simply pass SHAP or LIME outputs to LLMs as textual prompts, the proposed framework combines embedded EEG evidence construction with analysis of internal LLM representations and surrogate rule extraction. The explanation process is therefore not based solely on external feature attribution; it also links model-layer behavior with EEG physiological evidence and the final mental-state classification.

The contributions of this paper are summarized in the following aspects:

Propose an end-to-end EEG-language model alignment framework that maps underlying neurophysiological signals to high-level semantic logical representations. This contribution focuses on constrained evidence-grounded alignment rather than pure classification optimization.
Design a collaborative XAI mechanism that addresses the limitations of traditional post-hoc explanation by embedding interpretable feature information as prior knowledge directly into the question-answering generation process of the LLM. This mechanism aims to improve explanation transparency by linking model outputs with EEG channels, frequency-band evidence, heatmap patterns, and surrogate decision rules.
Construct a structured QA dataset that provides a solid data foundation for instruction fine-tuning and training of subsequent LLMs in the continuous physiological signal domain. The QA formulation enables the model to generate mental-state predictions together with structured reasoning and interpretable evidence, while classification performance is treated as one evaluation aspect of the overall framework.

The rest of the paper is organized as follows. Section 2 reviews the related work. Section 3 provides the necessary background and preliminaries for understanding the proposed approach. Section 4 details the proposed explanation framework, including its components and workflow. Section 5 presents the experimental design and results to validate the effectiveness of the framework. Finally, Section 6 concludes the paper and discusses potential future research directions.

2. Related Work

EEG-based mental health monitoring has attracted increasing attention because EEG provides objective and temporally precise measurements of neural activity. Early studies and recent reviews have shown that deep learning methods can substantially improve EEG feature extraction and classification performance in tasks such as general EEG decoding, depression detection, and stress monitoring [23,24,25]. For example, convolutional neural networks, recurrent models, and Transformer-based architectures have been used to capture spatial, spectral, and temporal patterns from EEG signals. However, despite their promising performance, EEG models still face strong generalization challenges. The non-stationary nature of EEG signals, inter-subject variability, and differences in recording conditions can reduce model robustness when models are applied to unseen users or clinical environments [26,27]. Therefore, improving EEG-based mental state recognition requires not only stronger classifiers, but also methods that can produce stable and clinically meaningful evidence across individuals.

Interpretability is another key requirement for the clinical translation of EEG-based AI systems. Post-hoc XAI methods such as SHAP and LIME have been widely used to estimate feature contributions and explain black-box predictions [28]. In EEG research, these methods can help identify relevant electrodes, frequency bands, or temporal segments associated with model decisions [29]. In addition, inherently interpretable models have been proposed to reduce the gap between model performance and clinical transparency [30]. Nevertheless, many existing explanation methods are still based on local perturbation or surrogate approximation, which may be computationally expensive and may not provide globally consistent explanations for high-dimensional EEG data with complex spatial-temporal dependencies [31,32,33]. These limitations motivate the development of explanation mechanisms that are embedded into the modeling and reasoning process, rather than being applied only after prediction.

Recent advances in large language models have opened new possibilities for medical AI and multimodal reasoning. LLMs have demonstrated strong capabilities in clinical knowledge encoding, medical question answering, and diagnostic reasoning [34,35,36,37]. However, directly applying LLMs to EEG analysis remains challenging because EEG signals are continuous, multi-channel, and time-varying physiological signals, whereas LLMs are primarily designed to process discrete textual tokens. This creates a fundamental modality gap between neural signal representations and language-based reasoning. Recent EEG-specific LLM studies have started to address this challenge. Babu et al. provided a comprehensive survey and taxonomy of LLM-based EEG research, organizing existing studies into EEG representation learning, EEG-to-language decoding, cross-modal generation, clinical applications, and dataset management tools [38]. This survey suggests that LLMs can support semantic interpretation and diagnostic assistance in EEG analysis, but also highlights the need for reliable signal-language alignment and interpretable reasoning. In another recent study, Babu et al. proposed modality reprogramming to adapt frozen LLMs for multi-channel EEG classification [39]. Their work demonstrates that frozen LLMs can be repurposed for EEG classification by transforming EEG inputs into representations compatible with the LLM space, without requiring full model retraining.

Beyond EEG, cross-modal signal understanding studies also provide useful methodological inspiration. For example, RF-GPT converts radio-frequency signals into two-dimensional spectrograms and aligns them with the representation space of LLMs, showing that continuous physical signals can be bridged with language models through suitable intermediate representations [22]. These studies collectively indicate that LLMs can be extended beyond text-only tasks, but they also reveal several unresolved issues in EEG-based applications. First, many existing EEG studies remain confined to rigid classification tasks and provide limited support for clinical reasoning or natural language interaction. Second, existing LLM-based signal alignment methods often focus on representation transfer or classification, while the reasoning process behind the final decision is not always explicitly grounded in neurophysiological evidence. Third, post-hoc explanation methods may fail to provide transparent and globally coherent explanations for EEG-LLM systems. To address these gaps, this study proposes a constrained QA-tuned LLM alignment framework that transforms EEG features into structured evidence, embeds interpretable EEG information into the QA generation process, and combines attention heatmaps with decision tree surrogate modeling to link LLM reasoning behavior with EEG-derived physiological patterns.

3. Background and Preliminaries

3.1. EEG Signals and Mental States

EEG is a non-invasive technique for measuring brain electrical activity and has been widely adopted in neuroscience, psychology, and mental health assessment [40]. It records voltage fluctuations generated by synchronized neuronal activity through electrodes placed on the scalp. Compared with other neuroimaging methods, EEG offers very high temporal resolution at the millisecond level, which makes it suitable for capturing fast changes in cognitive and emotional states [41]. In practice, EEG signals are collected using standard electrode placement systems. The sensors are mapped to specific brain regions, such as frontal, central, parietal, temporal, and occipital areas. This setup makes it easier to analyze brain activity in a structured way and to extract region-related features for later modeling [42,43].

In signal analysis, EEG data are often decomposed into several frequency bands, including Delta (0–4 Hz), Theta (4–7 Hz), Alpha (8–13 Hz), Beta (14–30 Hz), and Gamma (30–100 Hz). These bands are linked to different neural and cognitive processes. For instance, Alpha activity is usually associated with relaxed but alert states, while Beta and Gamma bands are more related to stress responses [44]. Changes across these frequency bands provide useful information for estimating emotional states and stress levels [45,46]. By combining frequency information with electrode locations, EEG signals can be converted into structured features that reflect underlying mental conditions.

In this study, the goal of EEG-based mental health classification is to infer an individual’s psychological state directly from EEG signals. This provides a data-driven alternative to traditional methods such as self-reports or clinical interviews. The labels are derived from standardized affective stimuli [47] and grouped into several levels, including stressed, fluctuating, stable, and relaxed states. Each state is associated with distinct EEG patterns. These representations support supervised learning and also serve as a basis for later reasoning and interpretability analysis.

3.2. LLM Tuning

LLMs have shown strong abilities in reasoning, semantic understanding, and structured text generation. However, when directly applied to domain-specific tasks such as EEG-based mental health assessment, pre-trained models often produce unreliable outputs, mainly because of hallucinated reasoning when handling domain-specific evidence [48].

To reduce this issue, fine-tuning is usually applied to adapt LLMs to specific tasks [49]. In this work, the prediction task is reformulated as a structured Question-Answering (QA) problem. This allows the model to learn task-related reasoning patterns instead of performing implicit classification. Each training sample is represented as a pair $(q_i, a_i)$, where $q_i$ is the input instruction constructed from domain evidence and $a_i$ is the expected structured output. The dataset is defined as:

$$D = \{(q_i, a_i)\}_{i=1}^{N}. \tag{1}$$

Given this dataset, fine-tuning maximizes the conditional generation probability, i.e., it minimizes the negative log-likelihood:

$$\mathcal{L} = -\sum_{(q_i, a_i) \in D} \log p_{\theta}(a_i \mid q_i), \tag{2}$$

where $\theta$ denotes the model parameters. In practice, the loss is computed only on the output tokens. This encourages the model to learn structured responses rather than simply copying the input prompts [48].
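Because $\log p_{\theta}(a_i \mid q_i)$ factorizes autoregressively into a sum of per-token log-probabilities, computing the loss only on output tokens amounts to masking the prompt positions. The Python sketch below illustrates this with plain lists; the `-100` ignore label is an assumed convention (common in training toolkits), and the log-probabilities are supplied by hand rather than produced by a real model.

```python
import math

IGNORE_INDEX = -100  # assumed sentinel meaning "no loss at this position"


def build_labels(prompt_ids, answer_ids):
    """Concatenate prompt and answer tokens; mask the prompt so only a_i is scored."""
    input_ids = list(prompt_ids) + list(answer_ids)
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(answer_ids)
    return input_ids, labels


def masked_nll(token_logprobs, labels):
    """-log p(a | q) as a sum over unmasked positions.

    token_logprobs[t] is the model's log-probability of labels[t] at position t,
    assumed to be already aligned for next-token prediction.
    """
    total = 0.0
    for lp, lab in zip(token_logprobs, labels):
        if lab != IGNORE_INDEX:
            total -= lp
    return total


# Usage: 3 prompt tokens and 2 answer tokens.
ids, labels = build_labels([11, 12, 13], [21, 22])
logprobs = [math.log(0.1)] * 3 + [math.log(0.5), math.log(0.25)]
print(labels)                         # [-100, -100, -100, 21, 22]
print(masked_nll(logprobs, labels))   # equals log(8), since p(answer) = 0.5 * 0.25
```

The prompt log-probabilities (here `log(0.1)`) never enter the loss, which is exactly what "computed only on the output tokens" means.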

To improve efficiency, parameter-efficient fine-tuning methods such as Low-Rank Adaptation (LoRA) are used [50]. Instead of updating all parameters, LoRA keeps the original weights frozen and learns low-rank updates for selected layers. Let $F_{\theta}(\cdot)$ denote the base model and $\Delta_{\phi}(\cdot)$ the learnable low-rank update. The tuned model can be written as:

$$F_{\theta,\phi}(x) = F_{\theta}(x) + \Delta_{\phi}(x), \tag{3}$$

where $\phi$ represents the trainable adapter parameters. This approach reduces memory and computation cost while preserving the general reasoning ability of the base model.
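At the level of a single weight matrix, LoRA realizes the additive form of Eq. (3) exactly: a frozen weight $W$ is combined with a trainable product $BA$ of rank $r$, scaled by $\alpha / r$. The NumPy sketch below uses illustrative values for `r` and `alpha` and random inputs; it is not the configuration used in this study.

```python
import numpy as np


class LoRALinear:
    """Frozen linear map W plus a trainable low-rank update (alpha / r) * B @ A."""

    def __init__(self, W, r=4, alpha=8.0, seed=0):
        rng = np.random.default_rng(seed)
        d_out, d_in = W.shape
        self.W = W                                      # frozen base weights (part of theta)
        self.A = rng.normal(0.0, 0.01, size=(r, d_in))  # trainable down-projection (part of phi)
        self.B = np.zeros((d_out, r))                   # trainable up-projection, zero-initialized
        self.scale = alpha / r

    def base(self, x):
        return x @ self.W.T                             # F_theta(x) for this layer

    def delta(self, x):
        return self.scale * (x @ self.A.T) @ self.B.T   # Delta_phi(x)

    def __call__(self, x):
        return self.base(x) + self.delta(x)             # F_{theta,phi}(x) = F_theta(x) + Delta_phi(x)

    def n_trainable(self):
        return self.A.size + self.B.size


rng = np.random.default_rng(1)
layer = LoRALinear(rng.normal(size=(64, 64)), r=4)
x = rng.normal(size=(2, 64))
# With B initialized to zero, the adapted layer starts out identical to the base layer.
print(np.allclose(layer(x), layer.base(x)))
print(layer.n_trainable(), "trainable vs", layer.W.size, "frozen")  # 512 trainable vs 4096 frozen
```

Zero-initializing `B` means training starts from the base model's behavior; only `A` and `B` would receive gradient updates, which is where the memory savings come from.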

3.3. Explainable AI

XAI focuses on making complex models more transparent by showing how input information affects predictions. Among existing methods, SHapley Additive exPlanations (SHAP) and Local Interpretable Model-agnostic Explanations (LIME) are widely used as post-hoc techniques. SHAP is based on ideas from cooperative game theory and estimates the contribution of each input variable by measuring its marginal effect across different feature subsets [51,52].

Instead of relying on a single attribution, SHAP averages contributions over different combinations of features, which yields a more stable importance score for each variable. The explanation model can be written in an additive form:

$$h(s) = \psi_0 + \sum_{j=1}^{d} \psi_j s_j, \tag{4}$$

where $s \in \{0, 1\}^{d}$ indicates whether each variable is included, $\psi_0$ is the base value obtained when no variable is included, and $\psi_j$ is the contribution of the $j$-th variable.
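For a handful of variables, the contributions $\psi_j$ in Eq. (4) can be computed exactly by enumerating feature subsets. The sketch below assumes a simple value function in which excluded variables are replaced by baseline values (one of several choices used in practice) and a toy model `f` with an interaction term; it checks that $\psi_0 + \sum_j \psi_j$ recovers the model output at $s = \mathbf{1}$.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values of model f at point x (cost grows exponentially with len(x))."""
    d = len(x)

    def value(S):
        # Coalition value: keep features in S, replace the rest with baseline values.
        z = [x[j] if j in S else baseline[j] for j in range(d)]
        return f(z)

    psi = []
    for j in range(d):
        others = [k for k in range(d) if k != j]
        total = 0.0
        for size in range(d):
            weight = factorial(size) * factorial(d - size - 1) / factorial(d)
            for S in combinations(others, size):
                S = set(S)
                total += weight * (value(S | {j}) - value(S))
        psi.append(total)
    psi0 = value(set())  # base value: model output with every variable "absent"
    return psi0, psi


def f(z):
    # Toy model: two additive effects plus an interaction between features 0 and 2.
    return 2.0 * z[0] + 3.0 * z[1] + z[0] * z[2]


x, baseline = [1.0, 1.0, 1.0], [0.0, 0.0, 0.0]
psi0, psi = shapley_values(f, x, baseline)
print(psi0, [round(p, 6) for p in psi])   # 0.0 [2.5, 3.0, 0.5]
print(round(psi0 + sum(psi), 6), f(x))    # 6.0 6.0 (additivity)
```

The interaction term $z_0 z_2$ is split equally between features 0 and 2, which is the behavior that makes Shapley values stable under feature combinations. Exact enumeration needs $2^{d-1}$ subsets per variable; with the 25 EEG features used later that is already $2^{24} \approx 1.7 \times 10^{7}$ subsets per feature, which is why practical SHAP implementations rely on sampling or model-specific approximations.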

LIME, on the other hand, focuses on local interpretability by approximating a complex model with a simpler surrogate model in the neighborhood of a given sample [17,53]. It perturbs the input instance to generate a local dataset and fits an interpretable model (e.g., linear regression) weighted by a similarity function. The optimization objective can be expressed as:

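$$\hat{g} = \arg\min_{g \in H} \mathcal{L}(F, g, \omega_x) + \Gamma(g), \tag{5}$$

where $F$ is the original model, $g$ is the surrogate model drawn from a class $H$ of interpretable models, $\omega_x$ measures the proximity between perturbed samples and the instance $x$, $\mathcal{L}$ measures how poorly $g$ approximates $F$ in that weighted neighborhood, and $\Gamma(g)$ controls model complexity. Both SHAP and LIME primarily provide explanations at the feature level, indicating which input variables are influential for a prediction.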
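The sketch below instantiates Eq. (5) for a model that takes a numeric feature vector: perturbations switch features off (set them to zero), an exponential kernel plays the role of $\omega_x$, and a small ridge penalty stands in for $\Gamma(g)$. These are simplifying assumptions; the original LIME formulation typically limits complexity by the number of non-zero weights instead, and the toy model `F` is hypothetical.

```python
import numpy as np


def lime_linear(F, x, n_samples=500, kernel_width=0.75, ridge=1e-3, seed=0):
    """Fit a locally weighted linear surrogate g(s) = b + w . s around instance x."""
    rng = np.random.default_rng(seed)
    d = x.shape[0]
    # Interpretable representation: s_j = 1 keeps feature j, s_j = 0 sets it to zero.
    S = rng.integers(0, 2, size=(n_samples, d))
    S[0] = 1                                    # include the unperturbed instance
    y = np.array([F(s * x) for s in S])         # query the original model F
    # Proximity kernel omega_x: perturbations that remove fewer features weigh more.
    dist = 1.0 - S.mean(axis=1)
    w = np.exp(-(dist ** 2) / kernel_width ** 2)
    # Weighted ridge regression; the ridge term acts as the complexity penalty Gamma(g).
    X = np.hstack([np.ones((n_samples, 1)), S])
    XtW = X.T * w                               # scales column i of X.T by w[i]
    reg = ridge * np.eye(d + 1)
    reg[0, 0] = 0.0                             # leave the intercept unpenalized
    coef = np.linalg.solve(XtW @ X + reg, XtW @ y)
    return coef[0], coef[1:]


def F(z):
    # Hypothetical black-box model standing in for the original predictor.
    return 1.0 + 2.0 * z[0] - 3.0 * z[1] + 0.5 * z[2]


x = np.array([1.0, 2.0, 4.0])
intercept, weights = lime_linear(F, x)
print(np.round(intercept, 2), np.round(weights, 2))   # intercept near 1, weights near [2, -6, 2]
```

The surrogate weights are expressed in the interpretable space $s$, so each weight equals the original coefficient times the feature value being switched off (for example $-3 \times 2 = -6$), which is why LIME explanations are inherently local to $x$.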

While these approaches are effective for conventional machine learning models, they are not directly suitable for explaining LLMs that operate on structured reasoning inputs. In this work, the model processes EEG-derived evidence in a question-answer format, where the decision depends on interactions across multiple evidence components rather than isolated features. Therefore, explanations need to capture not only feature importance but also the reasoning structure within the LLM [54].

To address this, this work employs attention-based heatmap analysis to investigate the internal behavior of the LLM. Specifically, attention weights and hidden states across layers are used to construct a two-dimensional importance map, in which one axis represents model layers and the other corresponds to structured evidence tokens. This map identifies which parts of the input evidence the model emphasizes at different stages of processing.
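One way to realize the layer-by-evidence map is sketched below. It assumes access to per-layer attention tensors of shape (layers, heads, sequence, sequence), reads the attention that a chosen query position (here the last token) pays to every token, averages over heads, and sums token scores within each evidence span. The span dictionary, the evidence names, and the aggregation rules are illustrative assumptions; hidden-state features are omitted for brevity.

```python
import numpy as np


def evidence_heatmap(attn, spans, query_pos=-1):
    """Build a (n_layers, n_evidence) importance map from attention weights.

    attn:  array of shape (n_layers, n_heads, seq_len, seq_len); rows are query positions.
    spans: dict mapping evidence name -> (start, end) token range, end exclusive.
    """
    names = list(spans)
    a = attn[:, :, query_pos, :].mean(axis=1)            # (n_layers, seq_len), head-averaged
    H = np.stack([a[:, s:e].sum(axis=1) for s, e in spans.values()], axis=1)
    H = H / H.sum(axis=1, keepdims=True)                 # relative importance within each layer
    return names, H


# Toy example: 4 layers, 2 heads, 6 tokens, uniform attention everywhere.
attn = np.full((4, 2, 6, 6), 1.0 / 6)
spans = {"AF3_Alpha": (0, 1), "T7_Theta": (1, 3), "Pz_Beta": (3, 6)}
names, H = evidence_heatmap(attn, spans)
print(names)
print(np.round(H[0], 3))   # about [0.167, 0.333, 0.5]: uniform attention, so span length dominates
```

Normalizing within each layer makes rows comparable across layers; note that tokens outside every evidence span (instructions, formatting) are excluded from the normalization in this sketch.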

In addition, we use a decision tree approximation to extract interpretable rules from the LLM. The idea is to convert internal model signals, such as attention scores and hidden states, into structured features. A surrogate decision tree is then trained to approximate the model’s behavior [55]. Formally, let $u$ denote the transformed representation derived from the LLM for input $x$, and let $y = F(x)$ be the corresponding prediction of the original LLM. The decision tree $T(u)$ is trained such that:

$$T(u) \approx F(x). \tag{6}$$

The learned tree provides hierarchical decision rules that show how different evidence components contribute to the final prediction.
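A minimal sketch of Eq. (6) with scikit-learn is shown below. The key point is that the tree is fitted to the LLM's own predictions $F(x)$ rather than to ground-truth labels, and fidelity (the agreement rate between $T(u)$ and $F(x)$) measures how faithfully the rules describe the LLM. The feature names and the synthetic rule that stands in for the LLM's outputs are hypothetical.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text


def fit_surrogate(U, llm_preds, feature_names, max_depth=3, seed=0):
    """Fit T(u) to mimic F(x) and report fidelity plus readable rules."""
    tree = DecisionTreeClassifier(max_depth=max_depth, random_state=seed)
    tree.fit(U, llm_preds)                                 # target: LLM outputs, not true labels
    fidelity = float(np.mean(tree.predict(U) == llm_preds))
    rules = export_text(tree, feature_names=feature_names)
    return tree, fidelity, rules


rng = np.random.default_rng(0)
U = rng.uniform(size=(200, 3))                             # stand-in for LLM-derived features u
feature_names = ["L20_attn_AF3_Alpha", "L20_attn_T7_Theta", "L28_hidden_norm"]
# Synthetic stand-in for the LLM's predictions F(x), driven by one feature.
llm_preds = np.where(U[:, 0] > 0.5, "stressed", "relaxed")

tree, fidelity, rules = fit_surrogate(U, llm_preds, feature_names, max_depth=2)
print(f"fidelity = {fidelity:.2f}")
print(rules)
```

Capping `max_depth` trades fidelity for readability. In-sample fidelity, as computed here, is optimistic; evaluating agreement on held-out samples gives a fairer picture of how well the rules generalize.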

Compared with standard XAI methods, combining heatmap analysis with decision tree approximation gives a multi-level explanation. Heatmaps capture fine-grained interactions across model layers, while decision trees summarize these patterns into human-readable rules. This links low-level model signals with higher-level reasoning, making the overall decision process easier to interpret.

4. Proposed Method

This section first introduces the overall structure of the proposed framework and then details its components.

4.1. Proposed Overall Framework

This study proposes a collaborative framework for EEG-based mental health monitoring, which integrates structured QA-based LLM alignment with multi-level explainability analysis. As illustrated in Figure 1, the proposed framework consists of three main stages: LLM tuning and testing, LLM explanation, and text-based reasoning.

In the first stage, raw EEG signals are collected from participants and processed to extract meaningful features across frequency bands and electrode locations. The processed EEG data are reformulated into structured QA instances, supported by a retrieval-augmented generation (RAG) knowledge base. The inputs consist of evidence-based descriptions derived from EEG signals, and the outputs correspond to mental state predictions with reasoning. Based on this formulation, two LLM configurations are developed: an untuned baseline LLM with RAG information only, and a fine-tuned LLM with both RAG information and EEG QA formulations. Both models are evaluated on a held-out testing set to assess their reasoning capability and generalization performance.

In the second stage, the explainability of the tuned LLM is analyzed through a multi-level interpretation pipeline. Given EEG testing samples, the tuned LLM generates answers, and its internal features, including layer-wise attention distributions and hidden states, are extracted. These internal features are then used to construct sample-level heatmaps that capture the contribution of different evidence tokens across model layers. Subsequently, evidence alignment maps token-level importance to structured EEG evidence, producing physical-layer EEG evidence heatmaps. These heatmaps provide an interpretable view of how the model prioritizes different EEG features and evidence during decision-making.

In the final stage, features are constructed from both model-layer signals and evidence-level importance derived from the sample-level heatmaps. These features are combined through cross-feature construction to capture interactions between model layers and EEG evidence. A joint decision tree is then trained to approximate the behavior of the LLM, which allows hierarchical decision rules reflecting its reasoning patterns to be extracted. The resulting trees are further converted into human-readable explanations through a text-based reasoning process.
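The cross-feature step is not specified in detail here; one plausible reading, sketched below under that assumption, augments the layer-level and evidence-level features with their pairwise products so that a single tree split can depend jointly on a model layer and an evidence item. All layer and evidence names are hypothetical.

```python
import numpy as np


def cross_features(layer_feats, evidence_imp, layer_names, evidence_names):
    """Concatenate layer features, evidence importances, and their pairwise products."""
    layer_feats = np.asarray(layer_feats, dtype=float)
    evidence_imp = np.asarray(evidence_imp, dtype=float)
    crosses = np.outer(layer_feats, evidence_imp).ravel()  # row-major: layer outer, evidence inner
    names = (list(layer_names) + list(evidence_names)
             + [f"{l}*{e}" for l in layer_names for e in evidence_names])
    return np.concatenate([layer_feats, evidence_imp, crosses]), names


u, names = cross_features([1.0, 2.0], [3.0, 4.0, 5.0],
                          ["L10", "L20"], ["AF3_Alpha", "T7_Theta", "Pz_Beta"])
print(len(u), names[5], u[5])   # 11 L10*AF3_Alpha 3.0
```

Keeping the original features alongside the products lets the surrogate tree still split on a single layer or evidence item when no interaction is needed.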

4.2. EEG Data Collection and Processing

EEG data were collected in a controlled laboratory setting to keep the recording conditions stable. The experiment followed an affective stimulation protocol to capture different mental states. All participants provided written informed consent before taking part in the study. The protocol was approved by the Institutional Review Board (IRB) of Khalifa University, and standard requirements such as voluntary participation, withdrawal, privacy, and data usage were clearly addressed. A custom application was used to present emotionally evocative images from the International Affective Picture System (IAPS) [56]. Each image was shown for a fixed duration, with neutral intervals inserted between stimuli to reduce carry-over effects. These stimuli were selected to induce several psychological states, including stressed, mildly fluctuating, stable, and relaxed conditions. EEG signals were recorded continuously during the experiment and aligned with the stimulus intervals, which allows consistent labeling of mental states. This synchronization helps maintain temporal alignment between the signals and the labels.

The dataset includes 24 participants with a balanced demographic distribution, providing a reasonable level of inter-subject variability. EEG signals were collected using the Emotiv Insight wireless headset [57] at a sampling rate of 128 Hz. The device records five channels located at AF3, AF4, T7, T8, and Pz, covering frontal, temporal, and parietal regions. For analysis, EEG signals are divided into standard frequency bands: Delta (0–4 Hz), Theta (4–7 Hz), Alpha (8–13 Hz), Beta (14–30 Hz), and Gamma (30–100 Hz). Because the sampling rate is 128 Hz, frequencies above the 64 Hz Nyquist limit cannot be observed, so the usable Gamma range is at most 30–64 Hz. The feature space is constructed by combining electrode locations with these frequency components: for each electrode, five band-power features (Delta, Theta, Alpha, Beta, and Gamma) are extracted, yielding 25 features per sample (5 electrodes × 5 bands). This forms a consistent multi-channel representation that captures both spatial and spectral information across all samples.
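The sketch below shows one way to turn a multi-channel epoch into the 25-dimensional band-power vector described above. It assumes a plain FFT periodogram (a Welch estimate would be more robust in practice), removes each channel's mean, clips band edges at the Nyquist frequency, and orders features channel-major; the spectral estimator actually used by the headset software or in this study is not specified here.

```python
import numpy as np

FS = 128  # Hz, sampling rate of the recordings
CHANNELS = ["AF3", "AF4", "T7", "T8", "Pz"]
BANDS = {"Delta": (0, 4), "Theta": (4, 7), "Alpha": (8, 13), "Beta": (14, 30), "Gamma": (30, 100)}


def band_powers(epoch, fs=FS):
    """epoch: (n_channels, n_samples) -> (n_channels, n_bands) band power (arbitrary units)."""
    x = epoch - epoch.mean(axis=1, keepdims=True)          # remove each channel's DC offset
    n = x.shape[1]
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    psd = np.abs(np.fft.rfft(x, axis=1)) ** 2 / (fs * n)   # periodogram, up to a constant factor
    nyquist = fs / 2.0
    cols = []
    for lo, hi in BANDS.values():
        mask = (freqs >= lo) & (freqs < min(hi, nyquist))  # nothing above Nyquist is observable
        cols.append(psd[:, mask].sum(axis=1))
    return np.stack(cols, axis=1)


def feature_vector(epoch):
    """Flatten to 5 channels x 5 bands = 25 features, named e.g. 'AF3_Alpha'."""
    names = [f"{ch}_{band}" for ch in CHANNELS for band in BANDS]
    return band_powers(epoch).ravel(), names


# Usage: a 2-second synthetic epoch in which every channel carries a 10 Hz rhythm.
t = np.arange(2 * FS) / FS
epoch = np.tile(np.sin(2 * np.pi * 10 * t), (len(CHANNELS), 1))
x, names = feature_vector(epoch)
print(len(x), names[:3])                    # 25 ['AF3_Delta', 'AF3_Theta', 'AF3_Alpha']
print(band_powers(epoch).argmax(axis=1))    # Alpha (index 2) dominates on every channel
```

With 2-second epochs the frequency resolution is 0.5 Hz, so the 7–8 Hz and 13–14 Hz gaps between the band definitions simply leave a few bins unassigned.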

The collected EEG data are stored in a structured format with timestamps, frequency-band values, and signal quality indicators. These signals are then processed to build feature representations, which are used as inputs for the later QA formulation and LLM-based reasoning.

4.3. QA Formulation Process

As summarized in Algorithm 1, the EEG corpus is divided into a training set $D_{train}$ and a testing set $D_{test}$. The training split is first used to establish a reference distribution for subsequent evidence construction. Specifically, the feature-wise mean vector $\mu$, standard deviation vector $\sigma$, and label-specific centroids $c_y$ are computed from $D_{train}$. These quantities provide a stable baseline for measuring how each EEG sample deviates from the training distribution, and they ensure that the testing split is processed without leaking information into the reference statistics. Based on these reference statistics, the evidence-construction operations in Algorithm 1 are applied to convert each EEG sample into structured and interpretable descriptors: $\mathrm{Agg}_{elec}(\cdot)$ aggregates frequency-band features within each electrode, $\mathrm{Agg}_{band}(\cdot)$ aggregates features within each frequency band across electrodes, $\mathrm{TopK}^{+}(\cdot)$ identifies the most elevated z-score features, and $\mathrm{TopK}^{-}(\cdot)$ identifies the most suppressed z-score features. The function $\mathrm{reasoning}(\cdot)$ produces an evidence-grounded explanation from these structured descriptors, the derived neurophysiological indicators, and the channel-band deviations.

For each sample $x_i$, a structured evidence representation is derived by combining raw EEG observations with aggregated and derived descriptors. Following the notation in Algorithm 1, the evidence block is represented as

$$E_i = \left(x_i,\ e_i^{(elec)},\ e_i^{(band)},\ r_i^{(1)},\ r_i^{(2)},\ r_i^{(3)},\ E_i^{+},\ E_i^{-}\right), \tag{7}$$

where $x_i$ denotes the original EEG feature vector, $e_i^{(elec)}$ and $e_i^{(band)}$ denote electrode-level and band-level aggregates, respectively, $r_i^{(1)}, r_i^{(2)}, r_i^{(3)}$ are derived neurophysiological indicators, and $E_i^{+}, E_i^{-}$ denote the most elevated and most suppressed features selected from the z-score vector

$$z_i = (x_i - \mu) \oslash \sigma, \tag{8}$$

where $\oslash$ denotes element-wise division.
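A compact sketch of the reference statistics and evidence construction (Eq. (8) plus the aggregation and Top-K operators) follows. It assumes channel-major feature ordering, aggregates z-scores by their mean, and adds a small constant to $\sigma$ to avoid division by zero; these are illustrative choices, and the derived indicators $r_i^{(1)}, r_i^{(2)}, r_i^{(3)}$ are omitted because their exact definitions are not given here.

```python
import numpy as np

CHANNELS = ["AF3", "AF4", "T7", "T8", "Pz"]
BANDS = ["Delta", "Theta", "Alpha", "Beta", "Gamma"]
NAMES = [f"{c}_{b}" for c in CHANNELS for b in BANDS]   # channel-major order (assumed)


def reference_stats(X_train, y_train, eps=1e-8):
    """mu, sigma and label centroids c_y, computed from the training split only."""
    mu = X_train.mean(axis=0)
    sigma = X_train.std(axis=0) + eps
    centroids = {y: X_train[y_train == y].mean(axis=0) for y in np.unique(y_train)}
    return mu, sigma, centroids


def build_evidence(x, mu, sigma, k=3):
    z = (x - mu) / sigma                                 # Eq. (8): element-wise z-scores
    Z = z.reshape(len(CHANNELS), len(BANDS))
    order = np.argsort(z)
    return {
        "elec": dict(zip(CHANNELS, Z.mean(axis=1))),     # Agg_elec: mean over bands per electrode
        "band": dict(zip(BANDS, Z.mean(axis=0))),        # Agg_band: mean over electrodes per band
        "top_pos": [(NAMES[j], float(z[j])) for j in order[::-1][:k]],  # TopK+: most elevated
        "top_neg": [(NAMES[j], float(z[j])) for j in order[:k]],        # TopK-: most suppressed
    }


rng = np.random.default_rng(0)
X_train = rng.normal(size=(40, 25))
y_train = np.array(["stressed", "relaxed"] * 20)
mu, sigma, centroids = reference_stats(X_train, y_train)

# A test sample equal to the training mean except for two strongly deviating features.
x_test = mu.copy()
x_test[NAMES.index("AF4_Alpha")] += 5 * sigma[NAMES.index("AF4_Alpha")]
x_test[NAMES.index("AF3_Beta")] -= 5 * sigma[NAMES.index("AF3_Beta")]
ev = build_evidence(x_test, mu, sigma)
print(ev["top_pos"][0][0], ev["top_neg"][0][0])   # AF4_Alpha AF3_Beta
```

Note that `build_evidence` only ever receives the training-derived `mu` and `sigma`, which mirrors the no-leakage requirement for the testing split.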

For each sample, the question side is defined as

$$Q_i = (\mathrm{Problem}, E_i), \tag{9}$$

where Problem specifies the EEG-based mental state reasoning task and $E_i$ is the structured evidence block. The answer side is defined as

$$A_i = \left(\hat{y}_i,\ \mathrm{reasoning}(E_i),\ c_i\right), \tag{10}$$

where $\hat{y}_i = M(y_i)$ is the mapped mental-state label under the label map $M$, $\mathrm{reasoning}(E_i)$ denotes the explanation grounded in the evidence block, and $c_i$ is a confidence score. In our implementation, the confidence is derived from the distance between the sample and the nearest training label centroid, i.e.,

$$c_i = \exp\left(-\min_{y} \lVert x_i - c_y \rVert_2\right). \tag{11}$$
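Eqs. (9)–(11) can be sketched as follows. The dictionary layout of the question and answer, the label map, and the reasoning string are hypothetical placeholders; only the confidence computation follows Eq. (11) directly.

```python
import math


def confidence(x, centroids):
    """Eq. (11): c_i = exp(-min_y ||x_i - c_y||_2); also returns the nearest label."""
    dists = {y: math.dist(x, c) for y, c in centroids.items()}
    nearest = min(dists, key=dists.get)
    return math.exp(-dists[nearest]), nearest


def make_qa(problem, evidence, label, label_map, reasoning_text, conf):
    question = {"problem": problem, "evidence": evidence}   # Eq. (9): Q_i = (Problem, E_i)
    answer = {"state": label_map[label],                    # Eq. (10): y_hat_i = M(y_i)
              "reasoning": reasoning_text,
              "confidence": round(conf, 3)}
    return question, answer


centroids = {0: [0.0, 0.0], 1: [3.0, 4.0]}
conf, nearest = confidence([0.0, 1.0], centroids)
print(round(conf, 3), nearest)       # 0.368 0
q, a = make_qa("Infer the mental state from the EEG evidence.",
               {"top_pos": ["AF4_Alpha"]}, 3, {3: "relaxed"},
               "Elevated frontal Alpha relative to the training reference.", conf)
print(a["state"], a["confidence"])   # relaxed 0.368
```

Because Eq. (11) is computed on the raw feature vector $x_i$, the scale of the confidence depends on the units of the band-power features; the sketch follows the equation as written.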

Compared with direct label prediction, this QA construction explicitly encourages the LLM to reason over structured EEG evidence, including channel-band deviations, regional aggregates, and derived asymmetry or ratio indicators. As a result, the generated QA pairs are better suited to downstream LLM alignment and tuning, since the LLM is trained to associate mental-state decisions with interpretable physiological evidence.

Finally, the same evidence construction pipeline is applied to both $D_{train}$ and $D_{test}$, while the training-derived reference statistics $(\mu, \sigma, c_y)$ are reused for the testing split. This preserves a consistent evidence space across splits and provides a principled basis for comparing untuned and QA-tuned LLMs under identical EEG reasoning constraints. Figure 2 illustrates representative EEG-based QA samples, each including its structured evidence.

4.4. QA-Tuning Process

The tuning of local LLMs in this study follows a parameter-efficient instruction alignment pipeline, as summarized in Algorithm 2. The objective is to learn a mapping from structured EEG evidence to reasoning outputs, while preserving the general linguistic and reasoning capabilities of the base LLM.

Let the QA dataset generated from EEG samples be denoted as

$$ J = \{(Q_k, A_k)\}_{k=1}^{|J|}, $$

where $Q_k = (\mathrm{Problem}, E_k)$ is the question constructed from EEG evidence (as defined in the QA generation stage), and $A_k = (\hat{y}_k, \mathrm{reasoning}(E_k), c_k)$ is the corresponding structured answer. Each QA instance is transformed into a supervised training pair $(x_k, y_k)$ via a formatting function

$$ (x_k, y_k) = \mathrm{FormatSFT}(Q_k, A_k), $$

where $x_k$ encodes the EEG-based reasoning problem together with the evidence block $E_k$, and $y_k$ represents the target structured response. The resulting training set is defined as

$$ D = \{(x_k, y_k) \mid (Q_k, A_k) \in J\}. $$

During optimization, the loss is computed only over the output tokens corresponding to $y_k$, so that the model learns to generate structured reasoning outputs rather than to reproduce the input prompt.
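Restricting the loss to $y_k$ is commonly implemented by setting the label of every prompt token to an ignore value. The sketch below assumes token IDs are already available and uses `-100`, the default `ignore_index` of PyTorch's cross-entropy loss and the label value that HuggingFace causal-LM models skip. For readability the loss is recomputed in NumPy and averaged over response tokens, a common normalization of the summed SFT objective; the helper names are hypothetical.

```python
import numpy as np

IGNORE_INDEX = -100  # label value excluded from the loss


def build_labels(prompt_ids, response_ids):
    """Concatenate prompt and response; mask prompt positions out of the loss."""
    input_ids = list(prompt_ids) + list(response_ids)
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(response_ids)
    return input_ids, labels


def masked_nll(logits, labels):
    """Mean next-token negative log-likelihood over unmasked positions.

    logits has shape (tokens, vocab); logits[t] scores the token at position t + 1.
    """
    logits = np.asarray(logits, dtype=float)[:-1]
    targets = np.asarray(labels)[1:]
    keep = targets != IGNORE_INDEX
    # Numerically stable log-softmax over the vocabulary.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Masked positions use a dummy index 0 and are dropped before averaging.
    picked = log_probs[np.arange(len(targets)), np.where(keep, targets, 0)]
    return float(-picked[keep].mean())
```

With uniform logits over a vocabulary of size $V$, the loss equals $\log V$ regardless of how long the prompt is, which confirms that prompt tokens do not contribute.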

To improve efficiency, parameter-efficient fine-tuning is adopted using Low-Rank Adaptation (LoRA). Given a base model $M$ with parameters $\theta$, the original weights are frozen, and a set of trainable low-rank adapters $\phi$ is injected into selected projection layers $U$. The adapted model is written as

$$ f_{\theta,\phi}(x) = f_{\theta}(x) + \Delta_{\phi}(x), $$

where $\Delta_{\phi}$ denotes the learned low-rank update. The model is trained for $E$ epochs by minimizing the supervised fine-tuning loss

$$ \mathcal{L}_{\mathrm{SFT}} = -\sum_{(x_k, y_k) \in D} \log p_{\theta,\phi}(y_k \mid x_k). $$

After training, the LoRA parameters are merged into the base model to obtain a standalone tuned model $\hat{M}$. The merged model is then exported and quantized for efficient local deployment. This setup allows a direct comparison between untuned and QA-aligned LLMs under the same EEG evidence inputs, while keeping the computational cost manageable and the results reproducible.
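The adapter equation and the merge step can be illustrated for a single linear projection. This NumPy sketch follows the standard LoRA parameterization, in which the frozen weight $W$ is augmented with a rank-$r$ update scaled by $\alpha / r$ and $B$ is zero-initialized so that training starts from the base model. The rank, scaling, and initialization values are assumed defaults, not the settings reported in Table 1, and `LoRALinear` is a hypothetical class name.

```python
import numpy as np


class LoRALinear:
    """Frozen weight W plus a trainable low-rank update (alpha / r) * B @ A."""

    def __init__(self, W, r=8, alpha=16, seed=0):
        rng = np.random.default_rng(seed)
        d_out, d_in = W.shape
        self.W = W  # frozen base weight (part of theta)
        self.A = rng.normal(scale=0.01, size=(r, d_in))  # trainable (part of phi)
        self.B = np.zeros((d_out, r))  # trainable (part of phi), zero-initialized
        self.scale = alpha / r

    def delta(self, x):
        """Low-rank update Delta_phi(x) for a batch x of shape (n, d_in)."""
        return self.scale * (x @ self.A.T) @ self.B.T

    def __call__(self, x):
        """f_theta(x) + Delta_phi(x) for this layer."""
        return x @ self.W.T + self.delta(x)

    def merged_weight(self):
        """Fold the adapter into W so inference needs a single matrix."""
        return self.W + self.scale * self.B @ self.A
```

Merging folds $\frac{\alpha}{r} BA$ into $W$, so the merged model produces the same outputs as the adapted one while having the same inference cost as the base model.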

4.5. Explanation Process

Given an EEG QA sample $(Q_i, A_i)$ with structured evidence $E_i$, the tuned model $\hat{M}$ produces a prediction together with internal representations, including hidden states and attention distributions across layers. These signals are first used to construct a sample-level importance representation, in which each input token is associated with a layer-wise importance score. Formally, let $H_\ell$ and $A_\ell$ denote the hidden state and attention matrix at layer $\ell$, respectively. The sample-level importance can be expressed as

$$ S_i = G\left(\{H_\ell, A_\ell\}_{\ell=1}^{L}\right), \tag{12} $$

where $L$ is the number of layers and $G(\cdot)$ denotes an aggregation function that combines model-layer statistics (e.g., norms, means, or attention entropy).
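One possible choice of the aggregation function $G(\cdot)$ is sketched below. It produces the kinds of per-layer statistics that appear as surrogate features in Section 5 (hidden-state norm, hidden-state mean, and attention entropy) together with a layer-by-token importance matrix defined as the average attention each token receives. These definitions are assumptions for illustration. With HuggingFace Transformers, passing `output_hidden_states=True` and `output_attentions=True` to the model returns the per-layer tensors; the returned hidden states include the embedding output as their first element.

```python
import numpy as np


def attention_entropy(attn, eps=1e-12):
    """Mean Shannon entropy of attention rows; attn has shape (heads, query, key)."""
    p = np.clip(attn, eps, 1.0)
    return float(-(p * np.log(p)).sum(axis=-1).mean())


def layer_statistics(hidden_states, attentions):
    """Hypothetical G: summary statistics per layer (layers are numbered from 1).

    hidden_states: list of (tokens, dim) arrays, one per layer.
    attentions:    list of (heads, tokens, tokens) arrays, one per layer.
    """
    feats = {}
    for layer, (h, a) in enumerate(zip(hidden_states, attentions), start=1):
        feats[f"Layer{layer}_hidden_norm"] = float(np.linalg.norm(h, axis=-1).mean())
        feats[f"Layer{layer}_hidden_mean"] = float(h.mean())
        feats[f"Layer{layer}_attention_entropy"] = attention_entropy(a)
    return feats


def token_importance(attentions):
    """S_i as a (layers, tokens) matrix: average attention each token receives."""
    return np.stack([a.mean(axis=(0, 1)) for a in attentions])
```

Low attention entropy means a layer concentrates on a few tokens, while uniform attention over $n$ tokens gives the maximum value $\log n$.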

To obtain interpretable explanations aligned with EEG features, the sample-level importance is further mapped to structured evidence components. Specifically, the token-level importance scores are grouped according to the evidence elements defined in $E_i$, resulting in an evidence-level heatmap

$$ H_i = \mathrm{Align}(S_i, E_i), \tag{13} $$

where $H_i$ captures the contribution of each EEG-derived evidence component across model layers. This transformation shows how different EEG features (e.g., band power, electrode activity, or derived indicators) influence the model’s decision.
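The grouping in Eq. (13) can be sketched by averaging token importance within each evidence item's token span. The span dictionary is a hypothetical input: in practice it could be derived from a fast HuggingFace tokenizer called with `return_offsets_mapping=True`, which reports the character offsets of each token in the prompt. Mean pooling within a span is an assumption; summation is an equally plausible choice.

```python
import numpy as np


def align_to_evidence(S, spans):
    """Evidence-level heatmap H_i = Align(S_i, E_i) under an assumed span mapping.

    S:     (layers, tokens) token-level importance.
    spans: {evidence_name: (start, end)} half-open token ranges of each evidence item.
    Returns the evidence names and an (evidence, layers) matrix of mean importance.
    """
    names = list(spans)
    H = np.stack([S[:, start:end].mean(axis=1) for start, end in spans.values()])
    return names, H
```

Each row of the result corresponds to one EEG feature (e.g., "AF3 Alpha") and each column to a model layer, which is the layout of the feature-level heatmaps in Figure 4.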

To further extract structured reasoning, model-layer features and evidence-level importance are jointly transformed into a feature space for surrogate modeling. Let $u_i$ denote the constructed feature vector

$$ u_i = \Phi(S_i, H_i), \tag{14} $$

where $\Phi(\cdot)$ represents feature construction, including cross-feature interactions between model-layer statistics and EEG evidence importance.
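Because $\Phi(\cdot)$ is only characterized at a high level, the following is one hypothetical construction: it keeps the layer statistics, adds the layer-averaged importance of each evidence item, and forms every product between an evidence importance and a layer statistic. A fixed feature order is used so that every sample maps to a vector of the same length, as the surrogate tree requires.

```python
import numpy as np


def build_surrogate_features(layer_feats, evidence_names, H):
    """Hypothetical Phi: layer statistics, evidence importance, and their products.

    layer_feats:    {name: value} per-layer statistics for one sample.
    evidence_names: names of the evidence items (rows of H).
    H:              (evidence, layers) evidence-level heatmap.
    Returns (feature_names, feature_vector) in a fixed order.
    """
    names = list(layer_feats)
    values = [layer_feats[n] for n in names]
    evidence_score = H.mean(axis=1)  # importance of each evidence item averaged over layers
    names += [f"{e}_importance" for e in evidence_names]
    values += list(evidence_score)
    # Cross-feature interactions: every evidence importance times every layer statistic.
    for e, s in zip(evidence_names, evidence_score):
        for stat in layer_feats:
            names.append(f"{e}*{stat}")
            values.append(s * layer_feats[stat])
    return names, np.asarray(values, dtype=float)
```

For example, with 25 evidence items and three statistics for each of 34 layers, this yields a few thousand features, which a depth-limited tree can still handle.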

A decision tree surrogate model $T(\cdot)$ is then trained to approximate the behavior of the LLM:

$$ T(u_i) \approx \hat{M}(Q_i). \tag{15} $$

The resulting tree provides a hierarchical set of decision rules that reveal how model-layer responses contribute to the final prediction.
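A surrogate of the form in Eq. (15) can be fitted with scikit-learn. The sketch trains on the LLM's own predictions rather than on ground-truth labels, because the goal is to mimic $\hat{M}$, and reports fidelity as the fraction of samples on which the tree agrees with the LLM. The depth limit and the use of in-sample fidelity are assumptions; fidelity measured on held-out samples is a stricter check.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text


def fit_surrogate(U, llm_labels, feature_names, max_depth=4, seed=0):
    """Fit T(u) to approximate M_hat(Q) and report its fidelity to the LLM."""
    tree = DecisionTreeClassifier(max_depth=max_depth, random_state=seed)
    tree.fit(U, llm_labels)  # targets are the LLM's predictions, not ground truth
    fidelity = float(np.mean(tree.predict(U) == llm_labels))
    rules = export_text(tree, feature_names=list(feature_names))
    return tree, fidelity, rules
```

The text returned by `export_text` lists the split features and thresholds from the root down, which is the kind of rule path discussed in Section 5.3.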

5. Experimental Design and Results

5.1. Experimental Setup

The experiments are conducted on a realistic EEG-based mental state dataset [57]. The collected dataset contains multi-channel brain signal features extracted from different electrodes and frequency bands, including Theta, Alpha, Beta, and Gamma components. These features provide physiological evidence for mental state analysis, as detailed in Section 4.2. The mental state recognition task is formulated as structured QA pairs, where each sample consists of an EEG evidence block and a corresponding reasoning-based explanation. The evidence block includes electrode-frequency features (e.g., AF3 Theta, T7 Beta), while the output is a description of the mental state. Example QA formulations are shown in Figure 2.

For training and evaluation, the generated QA samples are divided into a tuning set and a held-out testing set. The tuning set contains the majority of the samples and is used for instruction tuning, while the testing set contains 500 samples used for evaluation. The testing set is constructed to include diverse EEG patterns across all mental states in order to assess the robustness of the proposed framework.

Experiments are conducted using local large language models (LLMs), including Gemma-3-4B, LLaMA-3-4B, and Qwen-3-4B, deployed via Ollama and AnythingLLM. Parameter-efficient fine-tuning is performed using LoRA, where only the low-rank adapter parameters are updated while the base model weights are kept frozen. The key LLM tuning and implementation settings are summarized in Table 1. The models are trained using supervised instruction tuning with structured QA pairs, and the loss is computed only over output tokens to encourage structured reasoning generation. The training and data processing pipelines are implemented in Python, and model fine-tuning is conducted in a Google Colab environment using standard deep learning libraries, including PyTorch and HuggingFace Transformers.

5.2. Heatmap Explanation

The sample-level heatmap gives an initial view of how the model distributes attention across reasoning segments and layers. As shown in Figure 3, the importance values are not uniform and instead follow clear layer-wise patterns. Early segments tend to have higher activation in shallow and middle layers, while later segments show weaker influence, especially in deeper layers. However, these patterns are still abstract and cannot be directly linked to interpretable EEG evidence. This motivates further alignment with domain-specific features in the next step.

The EEG feature-level heatmaps provide a more interpretable view by showing how different electrode-frequency components are weighted across model layers for each mental state. As shown in Figure 4, each of the four states exhibits a consistent, state-specific activation pattern. This suggests that the model captures structured neurophysiological characteristics rather than relying on arbitrary feature combinations.

For the relaxed state, alpha-related features (e.g., AF3 Alpha, T8 Alpha, PZ Alpha) show relatively high importance in early and middle layers, followed by a gradual decrease in deeper layers. This pattern is consistent with common EEG findings, where alpha activity is linked to relaxed and low-arousal conditions. The concentration of importance in the middle layers suggests that the model aggregates alpha-related evidence at an intermediate stage before making the final prediction.

In contrast, the stressed state is characterized by stronger activation of beta-band components (e.g., AF3 Low Beta, T7 Low Beta, AF4 High Beta). These features maintain relatively high importance across a wider range of layers. Compared with the relaxed state, the importance values are both higher and more stable, indicating that higher-frequency activity plays a key role in the decision process. This matches the general view that beta activity is associated with increased cognitive load and stress.

For the fluctuated state, theta-related features (e.g., AF3 Theta, T7 Theta, T8 Theta) are more prominent. However, their importance is spread across layers rather than concentrated in a specific region. This distributed pattern suggests that the model captures changes and transitions in brain activity instead of a stable condition, which aligns with the definition of fluctuating mental states.

The stable state shows a more balanced pattern. Multiple frequency bands (e.g., alpha, beta, and gamma) contribute with moderate importance, and no single band clearly dominates. This indicates that the model relies on a combination of consistent but non-extreme features to represent stability.

5.3. Decision Tree Explanation

The feature importance analysis of the decision tree gives an initial view of which model-layer features have the strongest impact on the final decision. As shown in Figure 5, only a subset of layer-level features plays a dominant role. This uneven distribution suggests that the model does not use all layers equally, but instead focuses on specific layers and statistics, such as attention entropy and hidden state norms. However, these importance scores only indicate which features matter. They do not explain how these features interact to produce a specific decision.

As shown in Figure 6, the tree structure reflects a hierarchical decision process, where a small number of layer-level features split the data into different mental states. At the root node, Layer34_attention_entropy is used as the main splitting feature. This suggests that attention behavior at this layer has a strong influence on the overall decision. When the entropy is below a certain threshold, the model tends to classify the sample as relaxed. This is consistent with the idea that more stable attention patterns are linked to low-arousal conditions. If this condition is not satisfied, the model moves to intermediate-layer features, such as Layer8_attention_entropy and Layer23_hidden_mean. These features help refine the decision, especially when separating stressed states from other categories. The repeated use of attention entropy across different layers indicates that variation in attention patterns is an important signal for detecting higher cognitive load or stress. Further down the tree, features such as Layer24_hidden_norm and Layer22_attention_entropy are used to distinguish between fluctuated and stable states. Fluctuated states are associated with more variable activation patterns across layers, while stable states appear when hidden representations are more consistent in magnitude.

5.4. Comparisons with Baseline Classification and Explanation Methods

Although the main objective of the proposed framework is evidence-grounded explanation rather than pure EEG classification, classification baselines are still useful for contextualizing the discriminative difficulty of the EEG mental-state recognition task. Therefore, two representative EEG deep learning models, EEGNet and DeepConvNet, are evaluated on the same held-out testing set. Since the dataset used in this study contains structured electrode-band features rather than raw continuous EEG segments, both models are implemented as feature-level convolutional baselines using the same 5-channel and 5-band EEG representation. Accuracy, Macro-Precision, Macro-Recall, and Macro-F1 are reported to provide a balanced evaluation across different mental-state classes.
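For reference, the reported metrics can be computed as below. Macro averaging gives each mental-state class equal weight regardless of its frequency. Macro-F1 is taken here as the mean of per-class F1 scores (the convention of scikit-learn's `f1_score` with `average="macro"`), which can differ from the F1 of the macro-averaged precision and recall; whether the paper uses this convention is an assumption.

```python
import numpy as np


def macro_metrics(y_true, y_pred):
    """Accuracy and unweighted (macro) averages of per-class precision, recall, and F1."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    classes = np.unique(np.concatenate([y_true, y_pred]))
    precisions, recalls, f1s = [], [], []
    for c in classes:
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        p = tp / (tp + fp) if tp + fp else 0.0  # undefined precision counted as 0
        r = tp / (tp + fn) if tp + fn else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)
    return {
        "accuracy": float(np.mean(y_true == y_pred)),
        "macro_precision": float(np.mean(precisions)),
        "macro_recall": float(np.mean(recalls)),
        "macro_f1": float(np.mean(f1s)),
    }
```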

As shown in Table 2, DeepConvNet achieves stronger classification performance than EEGNet on the current electrode-band feature representation. This result indicates that deeper convolutional structures can capture more discriminative patterns from the structured EEG feature matrix. EEGNet also provides a lightweight reference baseline, but its performance is lower under the current feature-level setting. These results provide classification-oriented baselines for the EEG mental-state task, while the subsequent analysis focuses on the interpretability of the reasoning process rather than treating classification accuracy as the sole optimization target.

In addition to classification baselines, representative post-hoc XAI methods are used to examine feature-level explanations. LIME generates local explanations by perturbing the input EEG feature space and estimating the contribution of individual features to a specific prediction. SHAP estimates positive and negative feature contributions relative to a baseline output. Representative LIME and SHAP explanations are shown in Figure 7 and Figure 8, respectively.
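The core idea of LIME for tabular EEG features can be sketched in a few lines: sample perturbations around the input, weight them by proximity, and fit a weighted linear model whose coefficients serve as local feature contributions. This is a simplified re-implementation, not the `lime` package: it perturbs continuous features with Gaussian noise instead of LIME's default discretization, and the kernel width and sample count are assumed values.

```python
import numpy as np


def lime_explain(predict_fn, x, scale, n_samples=2000, kernel_width=0.75, seed=0):
    """Local weighted linear surrogate in the spirit of LIME.

    predict_fn: maps an (n, d) array to one score per row, e.g. a class probability.
    scale:      per-feature perturbation scale (e.g. the training standard deviation).
    Returns, per feature, the local change in the score per one-`scale` step.
    """
    rng = np.random.default_rng(seed)
    d = x.shape[0]
    Z = x + rng.normal(size=(n_samples, d)) * scale  # perturbed neighbours of x
    y = predict_fn(Z)
    dist = np.linalg.norm((Z - x) / scale, axis=1) / np.sqrt(d)
    w = np.exp(-(dist ** 2) / kernel_width ** 2)  # proximity kernel
    design = np.hstack([np.ones((n_samples, 1)), (Z - x) / scale])
    sw = np.sqrt(w)[:, None]
    coef, *_ = np.linalg.lstsq(design * sw, y * sw.ravel(), rcond=None)
    return coef[1:]  # drop the intercept
```

For a model that is exactly linear in the features, the recovered coefficients equal the model weights (times `scale`), which is a quick sanity check for the sketch.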

The LIME explanation presents local decision conditions and class probabilities, showing how individual EEG features support or oppose a given prediction. The SHAP force plots further show how features such as alpha, theta, and beta-band components push the model output above or below the baseline value. These results confirm that conventional post-hoc XAI methods can identify influential EEG features at the input level. However, their explanations mainly describe feature-output associations and do not explicitly characterize how structured EEG evidence is processed inside the QA-tuned LLM. In contrast, the proposed explanation process links EEG feature heatmaps, model-layer importance, and surrogate decision-tree rules. This provides a more structured explanation path from physiological EEG evidence to internal model behavior and final language-based reasoning.
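The baseline-relative attributions shown in the force plots satisfy the additivity property $\sum_j \phi_j = f(x) - f(\text{baseline})$. The sketch below computes exact Shapley values by enumerating all feature coalitions and replacing absent features with a single baseline vector. This brute-force version is only feasible for a few features; the SHAP library approximates the same quantity when the feature count is larger, typically averaging over a background dataset rather than a single baseline.

```python
import itertools
import math

import numpy as np


def exact_shapley(f, x, baseline):
    """Exact Shapley values for one input; absent features take baseline values.

    Enumerates 2**(d-1) coalitions per feature, so it is practical only for small d.
    """
    d = len(x)
    phi = np.zeros(d)

    def value(subset):
        z = baseline.copy()
        z[list(subset)] = x[list(subset)]  # features in the coalition take their real values
        return f(z)

    for j in range(d):
        others = [k for k in range(d) if k != j]
        for size in range(d):
            # Shapley weight |S|! (d - |S| - 1)! / d!
            weight = math.factorial(size) * math.factorial(d - size - 1) / math.factorial(d)
            for S in itertools.combinations(others, size):
                phi[j] += weight * (value(S + (j,)) - value(S))
    return phi
```

For a linear model with weights $w$, this gives $\phi_j = w_j (x_j - b_j)$; for an interaction term such as $x_0 x_1$, the effect is split equally between the two features.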

5.5. Analysis and Discussion

Instead of directly merging model-layer features and EEG features into a single classifier, the proposed framework performs a joint analysis. This allows the reasoning process to reflect both model behavior and physiological evidence. From the EEG feature heatmaps in Figure 4, each mental state shows a distinct activation pattern across layers. The relaxed state is mainly associated with alpha-band activity (e.g., AF3 Alpha, T8 Alpha), which appears stronger in early and middle layers. In contrast, the stressed state shows consistent activation of low-beta and high-beta components across more layers. The fluctuated state presents more spread-out theta activity, indicating higher variability, while the stable state shows balanced contributions from several frequency bands without a clear dominant one.

These patterns are consistent with the decision tree in Figure 6. At the root node, Layer34_attention_entropy separates relaxed samples from others. This suggests that stable attention at this layer is related to the alpha-dominant patterns seen in the heatmaps. For the remaining samples, intermediate-layer features such as Layer8_attention_entropy and Layer23_hidden_mean are used to identify stressed states. This matches the observation that beta-band activity remains strong across layers under stress. The repeated use of attention entropy in these branches indicates that changes in attention patterns are closely related to high-frequency EEG activity. The distinction between fluctuated and stable states is mainly handled by features such as Layer24_hidden_norm and Layer22_attention_entropy. In the heatmaps, fluctuated states show more dispersed theta activity, while stable states exhibit more balanced activation patterns. This suggests that intermediate-layer representations capture the level of variability in EEG signals.

There is a clear alignment between model layers and EEG features. Early and middle layers mainly combine the dominant EEG signals, while deeper layers provide more stable and abstract representations for the final decision. This suggests that the model's internal behavior is not entirely opaque but reflects meaningful physiological patterns across layers. Based on this alignment, joint reasoning can be constructed by linking decision paths with EEG evidence. For example, a stressed sample often follows a path dominated by attention-entropy features in intermediate layers, which is consistent with strong beta-band activity. Similarly, a relaxed prediction is associated with lower entropy and aligns with alpha-dominant patterns. These connections allow the model to produce structured explanations grounded in both internal behavior and EEG evidence. However, the proposed formulation still relies on structured feature construction and therefore does not fully bridge the modality gap in the same way as end-to-end cross-modal alignment methods such as RF-GPT [22]. Compared with RF-GPT, the proposed framework is less direct in signal-language alignment, but it provides a more constrained and traceable reasoning path: each prediction can be traced back to specific EEG channels, frequency bands, heatmap patterns, and surrogate decision rules. Therefore, the proposed method should be regarded as an interpretable, evidence-grounded alternative rather than a replacement for fully end-to-end cross-modal models.

6. Conclusions

This study presents an explainable framework for EEG-based mental state analysis using a constrained QA-tuned LLM. By reformulating the task as a structured reasoning problem, the model produces predictions that are grounded in physiological evidence rather than relying on implicit classification. A multi-level explanation pipeline is developed: sample-level heatmaps show how importance is distributed across layers, EEG feature heatmaps reveal activation patterns that are consistent with known physiological signals, and decision tree approximations summarize these patterns into hierarchical, interpretable rules. Together, these components link internal model behavior with EEG-based evidence. The results indicate that the model captures meaningful neurophysiological patterns across different mental states. For example, alpha activity is more prominent in relaxed conditions, while beta activity is stronger under stress. The decision tree structures also indicate how these patterns are used across layers. This provides a consistent view that connects model dynamics with physiological signals, improving interpretability and supporting more reliable use of the model's outputs.

There are still some limitations. The current setup uses low-density EEG data with a limited number of channels, which restricts the level of detail in the analysis. Moreover, the dataset is based on a limited number of participants and was not further validated on large-scale public EEG benchmarks. Therefore, the present results should be interpreted as an initial feasibility study rather than direct evidence of clinical translation. Broader validation using larger datasets, denser EEG recordings, and external benchmark datasets is still required to assess the generalizability and robustness of the proposed framework. In addition, the alignment is based on statistical relationships, and does not explicitly model causal effects between EEG features and model decisions. The current explanation evaluation is also mainly based on computational consistency and physiological plausibility, without formal assessment by clinical or domain experts.

Future work will focus on incorporating causal-aware alignment methods to move beyond correlation-based explanations. Another important direction is to conduct external validation using larger cohorts, denser EEG recordings, and public benchmark datasets, so that the stability of the proposed framework can be evaluated beyond the current self-collected dataset. In addition, future studies will involve neurologists, psychologists, or other domain experts to evaluate the comprehensibility, clinical relevance, and trustworthiness of the generated explanations. Such expert-centered evaluation will help determine whether the heatmaps, decision tree rules, and language-based explanations are understandable and useful in realistic mental health assessment scenarios. These steps can further improve the reliability and practical use of explainable LLM systems in EEG-based healthcare applications.

References

[1] Kuriyakose, D.; et al. Explainable AI uncovers novel EEG microstate candidate neurophysiological markers for autism spectrum disorder. Front. Comput. Neurosci. 2026, 20, 1763727.
[2] Torres, J.M.M.; Medina-DeVilliers, S.; Clarkson, T.; Lerner, M.D.; Riccardi, G. Evaluation of interpretability for deep learning algorithms in EEG emotion recognition: A case study in autism. Artif. Intell. Med. 2023, 143, 102545.
[3] Rehman, A.; Mun, S. Explainable AI-Enhanced Ensemble Protocol Using Gradient-Boosted Models for Zero-False-Alarm Seizure Detection from EEG. Sensors 2026, 26, 863.
[4] Zhai, L.; Zhao, M.; Zhang, J.; Jamil, M.; Naz, R.; Li, C. A systematic review of EEG-based biomarkers for depression, anxiety, and bipolar disorder: trends in explainable artificial intelligence (XAI). BMC Psychiatry 2025.
[5] Zanola, A.; Fabrice Tshimanga, L.; Del Pup, F.; Baiesi, M.; Atzori, M. xEEGNet: towards explainable AI in EEG dementia classification. J. Neural Eng. 2025, 22, 046042.
[6] Ahmad, I.; Zhu, M.; Li, G.; Javeed, D.; Kumar, P.; Chen, S. A secure and interpretable AI for smart healthcare system: A case study on epilepsy diagnosis using EEG signals. IEEE J. Biomed. Health Inform. 2024, 28, 3236–3247.
[7] Islam, M.S.; Hussain, I.; Rahman, M.M.; Park, S.J.; Hossain, M.A. Explainable artificial intelligence model for stroke prediction using EEG signal. Sensors 2022, 22, 9859.
[8] Jayaram, V.; Alamgir, M.; Altun, Y.; Schölkopf, B.; Grosse-Wentrup, M. Transfer learning in brain-computer interfaces. IEEE Comput. Intell. Mag. 2016, 11, 20–31.
[9] Lotte, F.; Bougrain, L.; Cichocki, A.; Clerc, M.; Congedo, M.; Rakotomamonjy, A.; Yger, F. A review of classification algorithms for EEG-based brain–computer interfaces: a 10 year update. J. Neural Eng. 2018, 15, 031005.
[10] Caruana, R.; Lou, Y.; Gehrke, J.; Koch, P.; Sturm, M.; Elhadad, N. Intelligible models for healthcare: Predicting pneumonia risk and hospital 30-day readmission. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2015; pp. 1721–1730.
[11] Zhang, Z.; Damiani, E.; Hamadi, H.; Yeun, C.; Taher, F. A late multi-modal fusion model for detecting hybrid spam e-mail. Int. J. Comput. Theory Eng. 2023, 15, 76–81.
[12] Moor, M.; Banerjee, O.; Abad, Z.S.H.; Krumholz, H.M.; Leskovec, J.; Topol, E.J.; Rajpurkar, P. Foundation models for generalist medical artificial intelligence. Nature 2023, 616, 259–265.
[13] Lu, W.; Song, C.; Wu, J.; Zhu, P.; Zhou, Y.; Mai, W.; Zheng, Q.; Ouyang, W. Unimind: Unleashing the power of LLMs for unified multi-task brain decoding. arXiv 2025, arXiv:2506.18962.
[14] Sánchez-Hernández, S.E.; Torres-Ramos, S.; Román-Godínez, I.; Salido-Ruiz, R.A. Evaluation of the relation between ictal EEG features and XAI explanations. Brain Sci. 2024, 14, 306.
[15] Hussain, I.; Jany, R.; Boyer, R.; Azad, A.; Alyami, S.A.; Park, S.J.; Hasan, M.M.; Hossain, M.A. An explainable EEG-based human activity recognition model using machine-learning approach and LIME. Sensors 2023, 23, 7452.
[16] Schirrmeister, R.T.; Springenberg, J.T.; Fiederer, L.D.J.; Glasstetter, M.; Eggensperger, K.; Tangermann, M.; Hutter, F.; Burgard, W.; Ball, T. Deep learning with convolutional neural networks for EEG decoding and visualization. Hum. Brain Mapp. 2017, 38, 5391–5420.
[17] Ribeiro, M.T.; Singh, S.; Guestrin, C. “Why should I trust you?” Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016; pp. 1135–1144.
[18] AlSaad, R.; Abd-Alrazaq, A.; Boughorbel, S.; Ahmed, A.; Renault, M.A.; Damseh, R.; Sheikh, J. Multimodal large language models in health care: applications, challenges, and future outlook. J. Med. Internet Res. 2024, 26, e59505.
[19] Carmona-Martos, L.; Martín-Palomeque, P.; Escudero-Arnanz, Ó.; Soguero-Ruiz, C. Interpretable large language models for early prediction of antimicrobial multidrug resistance. Health Inf. Sci. Syst. 2025, 14, 11.
[20] Feli, M.; Azimi, I.; Liljeberg, P.; Rahmani, A.M. An LLM-powered agent for physiological data analysis: A case study on PPG-based heart rate estimation. In Proceedings of the 2025 47th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC); IEEE, 2025; pp. 1–7.
[21] Guo, F.; Zhang, Z.; Mo, H.; Li, C. A method for battery SOH estimation based on K-means and LightGBM algorithm. In Proceedings of the 2024 6th International Conference on System Reliability and Safety Engineering (SRSE); IEEE, 2024; pp. 1–7.
[22] Zou, H.; Tian, Y.; Wang, B.; Bariah, L.; Lasaulce, S.; Huang, C.; Debbah, M. RF-GPT: Teaching AI to See the Wireless World. arXiv 2026, arXiv:2602.14833.
[23] Craik, A.; He, Y.; Contreras-Vidal, J.L. Deep learning for electroencephalogram (EEG) classification tasks: a review. J. Neural Eng. 2019, 16, 031001.
[24] Xia, M.; Zhang, Y.; Wu, Y.; Wang, X. An end-to-end deep learning model for EEG-based major depressive disorder classification. IEEE Access 2023, 11, 41337–41347.
[25] Subhani, A.R.; Mumtaz, W.; Saad, M.N.B.M.; Kamel, N.; Malik, A.S. Machine learning framework for the detection of mental stress at multiple levels. IEEE Access 2017, 5, 13545–13556.
[26] Wan, Z.; Yang, R.; Huang, M.; Zeng, N.; Liu, X. A review on transfer learning in EEG signal analysis. Neurocomputing 2021, 421, 1–14.
[27] Xue, B.; Lv, Z.; Xue, J. Feature transfer learning in EEG-based emotion recognition. In Proceedings of the 2020 Chinese Automation Congress (CAC); IEEE, 2020; pp. 3608–3611.
[28] Lundberg, S.M.; Lee, S.I. A unified approach to interpreting model predictions. Adv. Neural Inf. Process. Syst. 2017, 30.
[29] Shoeibi, A.; Sadeghi, D.; Moridian, P.; Ghassemi, N.; Heras, J.; Alizadehsani, R.; Khadem, A.; Kong, Y.; Nahavandi, S.; Zhang, Y.D.; et al. Automatic diagnosis of schizophrenia in EEG signals using CNN-LSTM models. Front. Neuroinformatics 2021, 15, 777977.
[30] Borra, D.; Fantozzi, S.; Magosso, E. Interpretable and lightweight convolutional neural network for EEG decoding: Application to movement execution and imagination. Neural Netw. 2020, 129, 55–74.
[31] Tonekaboni, S.; Joshi, S.; McCradden, M.D.; Goldenberg, A. What clinicians want: contextualizing explainable machine learning for clinical end use. In Proceedings of the Machine Learning for Healthcare Conference; PMLR, 2019; pp. 359–380.
[32] Holzinger, A.; Langs, G.; Denk, H.; Zatloukal, K.; Müller, H. Causability and explainability of artificial intelligence in medicine. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 2019, 9, e1312.
[33] Theissler, A.; Spinnato, F.; Schlegel, U.; Guidotti, R. Explainable AI for time series classification: a review, taxonomy and research directions. IEEE Access 2022, 10, 100700–100724.
[34] Singhal, K.; Azizi, S.; Tu, T.; Mahdavi, S.S.; Wei, J.; Chung, H.W.; Scales, N.; Tanwani, A.; Cole-Lewis, H.; Pfohl, S.; et al. Large language models encode clinical knowledge. Nature 2023, 620, 172–180.
[35] Thirunavukarasu, A.J.; Ting, D.S.J.; Elangovan, K.; Gutierrez, L.; Tan, T.F.; Ting, D.S.W. Large language models in medicine. Nat. Med. 2023, 29, 1930–1940.
[36] Ouyang, L.; Wu, J.; Jiang, X.; Almeida, D.; Wainwright, C.; Mishkin, P.; Zhang, C.; Agarwal, S.; Slama, K.; Ray, A.; et al. Training language models to follow instructions with human feedback. Adv. Neural Inf. Process. Syst. 2022, 35, 27730–27744.
[37] Peng, Q.; Li, J.; Huang, S.; Jiang, Y.; Gong, K.; Ding, R.; Ye, S.; Zheng, C.; Wei, X.Y.; Li, Q. Aligning clinical needs and AI capabilities: a survey on LLMs for medical reasoning. Authorea Prepr. 2025.
[38] Babu, N.; Mathew, J.; Vinod, A. Large language models for EEG: A comprehensive survey and taxonomy. arXiv 2025, arXiv:2506.06353.
[39] Babu, N.; Mathew, J.; Satija, U.; Vinod, A. Modality reprogramming: Adapting frozen LLMs for multi-channel EEG classification. Neurocomputing 2025, 132407.
[40] Al Hammadi, A.Y.; Yeun, C.Y.; Damiani, E.; Yoo, P.D.; Hu, J.; Yeun, H.K.; Yim, M.S. Explainable artificial intelligence to evaluate industrial internal security using EEG signals in IoT framework. Ad Hoc Netw. 2021, 123, 102641.
[41] Al Hammadi, A.Y.; Lee, D.; Yeun, C.Y.; Damiani, E.; Kim, S.K.; Yoo, P.D.; Choi, H.J. Novel EEG Sensor-Based Risk Framework for the Detection of Insider Threats in Safety Critical Industrial Infrastructure. IEEE Access 2020, 8, 206222–206234.
[42] Joshi, V.M.; Ghongade, R.B. IDEA: Intellect database for emotion analysis using EEG signal. J. King Saud Univ.-Comput. Inf. Sci. 2022, 34, 4433–4447.
[43] Kim, J.; Park, Y.; Chung, W. Transform based feature construction utilizing magnitude and phase for convolutional neural network in EEG signal classification. In Proceedings of the 2020 8th International Winter Conference on Brain-Computer Interface (BCI), 2020; pp. 1–4.
[44] Agarwal, T.; Raturi, S.; Vybhav, T.; Singh, M. Classification of EEG signal using LSTMs under audiovisual stimuli. In Proceedings of the 2020 International Conference on Communication and Signal Processing (ICCSP); IEEE, 2020; pp. 1229–1232.
[45] Chao, H.; Dong, L. Emotion Recognition Using Three-Dimensional Feature and Convolutional Neural Network from Multichannel EEG Signals. IEEE Sens. J. 2021, 21, 2024–2034.
[46] Chattopadhyay, S.; Zary, L.; Quek, C.; Prasad, D.K. Motivation detection using EEG signal analysis by residual-in-residual convolutional neural network. Expert Syst. Appl. 2021, 184, 115548.
[47] Zhang, Z.; Umar, S.; Hammadi, A.Y.A.; Yoon, S.; Damiani, E.; Ardagna, C.A.; Bena, N.; Yeun, C.Y. Explainable Data Poison Attacks on Human Emotion Evaluation Systems Based on EEG Signals. IEEE Access 2023, 11, 18134–18147.
[48] Wu, X.K.; Chen, M.; Li, W.; Wang, R.; Lu, L.; Liu, J.; Hwang, K.; Hao, Y.; Pan, Y.; Meng, Q.; et al. LLM fine-tuning: Concepts, opportunities, and challenges. Big Data Cogn. Comput. 2025, 9, 87.
[49] Zhang, B.; Wang, J.; Du, Q.; Zhang, J.; Tu, Z.; Chu, D. A survey on data selection for LLM instruction tuning. J. Artif. Intell. Res. 2025, 83.
[50] Che, C.; Wang, Z.; Yang, P.; Wang, C.; Ma, H.; Shi, Z. LoRA in LoRA: Towards parameter-efficient architecture expansion for continual visual instruction tuning. Proc. AAAI Conf. Artif. Intell. 2026, 40, 19978–19986.
[51] Zhang, Z.; Hamadi, H.A.; Damiani, E.; Yeun, C.Y.; Taher, F. Explainable Artificial Intelligence Applications in Cyber Security: State-of-the-Art in Research. IEEE Access 2022, 10, 93104–93139.
[52] Huang, X.; Zhang, Z.; Guo, F.; Wang, X.; Chi, K.; Wu, K. Research on older adults’ interaction with e-health interface based on explainable artificial intelligence. In Proceedings of the International Conference on Human-Computer Interaction; Springer, 2024; pp. 38–52.
[53] Ribeiro, M.T.; Singh, S.; Guestrin, C. Anchors: High-precision model-agnostic explanations. In Proceedings of the AAAI Conference on Artificial Intelligence, 2018; Vol. 32.
[54] Li, H.; Kam-Kwai, W.; Luo, Y.; Chen, J.; Liu, C.; Zhang, Y.; Lau, A.K.H.; Qu, H.; Liu, D. Save It for the “Hot” Day: An LLM-Empowered Visual Analytics System for Heat Risk Management. IEEE Trans. Vis. Comput. Graph. 2025, 31, 8928–8943.
[55] Ku, J.; Kim, S.; Lee, E.; Zaman, U.; Kim, K. Enhancing Autonomous Ship Communication: A Cost-Effective and High-Accuracy LLM Framework Using Decision Trees and RAG. In Proceedings of the 2025 International Conference on Artificial Intelligence in Information and Communication (ICAIIC), 2025; pp. 420–426.
[56] Bradley, M.M.; Lang, P.J. International Affective Picture System. In Encyclopedia of Personality and Individual Differences; Zeigler-Hill, V., Shackelford, T.K., Eds.; Springer International Publishing: Cham, 2017; pp. 1–4.
[57] Alhammadi, A.; Yeun, C.Y.; Damiani, E.; Yoo, P.D.; Hu, J.; Yeun, H.K.; Yim, M.-S. EEG Brainwave Dataset.

Figure 1. The overall framework of the proposed method.

Figure 2. Examples of EEG-based mental state QA formulation.

Figure 3. Sample-level heatmap across model layers.

Figure 4. EEG feature heatmaps across model layers for different mental states.

Figure 5. Top layer features influencing the model’s decision.

Figure 6. Decision tree approximations of model layer features.
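Figure 6 reflects decision-tree surrogate modeling, in which an interpretable tree is trained to mimic the predictions of the tuned LLM rather than the ground-truth labels. A minimal sketch of that idea follows, assuming tabular EEG features and the LLM's predicted mental-state labels as targets. It uses a small pure-Python CART with Gini impurity as a stand-in for whatever tree library the authors used; the helper names (`fit_tree`, `fidelity`), feature names, and toy values are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y):
    """Return (feature_index, threshold) that strictly lowers weighted Gini, else None."""
    best, best_score = None, gini(y)
    n = len(y)
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y[i] for i in range(n) if X[i][j] <= t]
            right = [y[i] for i in range(n) if X[i][j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (j, t), score
    return best


def fit_tree(X, y, max_depth):
    """Grow a depth-limited tree: internal nodes are (j, t, left, right), leaves are labels."""
    majority = Counter(y).most_common(1)[0][0]
    if max_depth == 0 or len(set(y)) == 1:
        return majority
    split = best_split(X, y)
    if split is None:
        return majority
    j, t = split
    left = [i for i in range(len(y)) if X[i][j] <= t]
    right = [i for i in range(len(y)) if X[i][j] > t]
    return (j, t,
            fit_tree([X[i] for i in left], [y[i] for i in left], max_depth - 1),
            fit_tree([X[i] for i in right], [y[i] for i in right], max_depth - 1))


def predict(tree, row):
    """Route one feature vector down the tree to a leaf label."""
    while isinstance(tree, tuple):
        j, t, left, right = tree
        tree = left if row[j] <= t else right
    return tree


def fidelity(tree, X, black_box_labels):
    """Fraction of samples on which the surrogate agrees with the black-box model."""
    hits = sum(predict(tree, r) == b for r, b in zip(X, black_box_labels))
    return hits / len(X)


# Hypothetical toy data: columns are [alpha_power, beta_power].
X = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.2, 0.8], [0.1, 0.9], [0.3, 0.7]]
# Hypothetical labels predicted by the LLM (the surrogate's targets).
llm_labels = ["relaxed", "relaxed", "relaxed", "stressed", "stressed", "stressed"]
tree = fit_tree(X, llm_labels, max_depth=2)
print(tree, fidelity(tree, X, llm_labels))
```

Fidelity (agreement with the black-box model), not accuracy against ground truth, is the usual measure of how faithfully such a surrogate explains the model. Shallow depths keep the resulting rules readable.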

Figure 7. Representative LIME explanation for EEG feature-based mental-state prediction.

Figure 8. Representative SHAP force-plot explanations for EEG feature-based mental-state prediction.
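SHAP force plots such as those in Figure 8 display an additive decomposition: the base value plus the per-feature contributions equals the model output for that sample. The sketch below computes exact Shapley values by enumerating feature coalitions for a hypothetical three-feature score, filling "absent" features from a single reference vector. This simplifies what the SHAP library does, since it typically averages over a background dataset and uses approximations for larger models. The scoring function, feature names, and all values are illustrative assumptions.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x; features outside a coalition take baseline values."""
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return f(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S not containing i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def stress_score(z):
    """Hypothetical model output for features [alpha, beta, theta]."""
    alpha, beta, theta = z
    return 0.2 + 0.5 * beta - 0.3 * alpha + 0.4 * beta * theta


x = [0.4, 0.8, 0.5]         # hypothetical sample to explain
baseline = [0.6, 0.3, 0.2]  # hypothetical reference values (e.g., feature means)
phi = shapley_values(stress_score, x, baseline)
base_value = stress_score(baseline)
print(base_value, phi, base_value + sum(phi), stress_score(x))
```

Because alpha enters this score only linearly, its contribution is simply -0.3 × (0.4 - 0.6), about 0.06. The beta-theta interaction is split between those two features, giving roughly 0.32 and 0.066, and the three values sum to the gap between the output and the base value.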

Table 1. LLM tuning and implementation settings.

Item	Setting
Evaluated LLMs	Gemma-3-4B, LLaMA-3-4B, Qwen-3-4B
Tuned base model	Qwen/Qwen3-4B
Fine-tuning method	LoRA instruction tuning
Training data format	Structured EEG QA pairs
Maximum sequence length	2048 tokens
LoRA rank	16
LoRA alpha	32
LoRA dropout	0.05
Batch size	2 per device
Gradient accumulation	8 steps
Epochs	1
Maximum training steps	100
Learning rate	$2 \times 10^{-4}$
Optimizer	`paged_adamw_8bit`
Quantization	4-bit NF4 during tuning; Q4_K_M for GGUF export
Implementation	Google Colab, PyTorch, HuggingFace, PEFT

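The settings in Table 1 imply a few derived quantities that are worth making explicit. The following plain-Python sketch assumes a single GPU (one device on Colab) and PEFT's default LoRA scaling of `lora_alpha / r`. The class and helper names are hypothetical, and the 2560 × 2560 projection used below is an illustrative layer size, not a value reported in this work.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TuningConfig:
    """Hyperparameters transcribed from Table 1 (num_devices is an assumption)."""
    lora_rank: int = 16
    lora_alpha: int = 32
    lora_dropout: float = 0.05
    per_device_batch_size: int = 2
    grad_accum_steps: int = 8
    max_steps: int = 100
    learning_rate: float = 2e-4
    max_seq_len: int = 2048
    num_devices: int = 1  # assumption: a single Colab GPU

    @property
    def effective_batch_size(self) -> int:
        # Sequences contributing to each optimizer update.
        return self.per_device_batch_size * self.grad_accum_steps * self.num_devices

    @property
    def lora_scaling(self) -> float:
        # LoRA adds (alpha / r) * B @ A to the frozen weight matrix.
        return self.lora_alpha / self.lora_rank

    def sequences_seen(self) -> int:
        # Total training sequences processed over max_steps optimizer updates.
        return self.max_steps * self.effective_batch_size

    def max_tokens_seen(self) -> int:
        # Upper bound: assumes every sequence reaches max_seq_len tokens.
        return self.sequences_seen() * self.max_seq_len


def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters of one LoRA adapter: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


cfg = TuningConfig()
print(cfg.effective_batch_size, cfg.lora_scaling, cfg.sequences_seen(), cfg.max_tokens_seen())
# Hypothetical 2560 x 2560 projection: adapter size versus the full frozen matrix.
print(lora_trainable_params(2560, 2560, cfg.lora_rank), 2560 * 2560)
```

Under these assumptions, each update averages over 16 sequences, and the 100-step cap corresponds to about 1,600 training sequences. The rank-16 adapter on the example projection trains 81,920 parameters, about 1.25% of the 6,553,600 in the frozen matrix. If training used the Hugging Face `Trainer`, a positive `max_steps` overrides `num_train_epochs`, so the 100-step cap, rather than the single epoch, would determine the run length.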
Table 2. Classification performance of EEG deep learning baselines on the held-out test set.

Method	Accuracy	Macro-Precision	Macro-Recall	Macro-F1
EEGNet	0.713	0.729	0.714	0.715
DeepConvNet	0.919	0.926	0.919	0.920

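The macro-averaged columns in Table 2 average per-class scores with equal weight. In particular, macro-F1 is usually the mean of per-class F1 scores rather than the harmonic mean of macro-precision and macro-recall. For EEGNet, the latter would be about 0.721, while the reported macro-F1 is 0.715, which suggests the per-class convention was used. The sketch below shows that computation in plain Python. The class labels in the example are hypothetical, and zero-division cases are set to 0.0 by assumption.

```python
from collections import Counter


def macro_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall, and F1 (mean of per-class F1)."""
    labels = sorted(set(y_true) | set(y_pred))
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    pred_count = Counter(y_pred)
    true_count = Counter(y_true)
    precisions, recalls, f1s = [], [], []
    for c in labels:
        # Zero-division cases default to 0.0 (an assumed convention).
        p = tp[c] / pred_count[c] if pred_count[c] else 0.0
        r = tp[c] / true_count[c] if true_count[c] else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)
    k = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "macro_precision": sum(precisions) / k,
        "macro_recall": sum(recalls) / k,
        "macro_f1": sum(f1s) / k,
    }


# Hypothetical labels for six test samples.
y_true = ["neutral", "neutral", "positive", "positive", "negative", "negative"]
y_pred = ["neutral", "positive", "positive", "positive", "negative", "neutral"]
m = macro_metrics(y_true, y_pred)
print(m)
```

In this toy case, macro-precision is about 0.722 and macro-recall about 0.667, whose harmonic mean is about 0.693. The per-class macro-F1 is instead about 0.656, illustrating why the two conventions should not be mixed when comparing tables.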
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.

