Preprint
Article

This version is not peer-reviewed.

Collaborative Explainable AI for EEG Mental Health Monitoring with Constrained QA-Tuned LLM Alignment

  † These authors contributed equally to this work.

Submitted:

13 June 2026

Posted:

16 June 2026


Abstract
The monitoring of mental health states using electroencephalogram (EEG) signals has gained increasing attention because EEG offers a non-invasive means of assessing psychological disorders. Large Language Models (LLMs) and Explainable Artificial Intelligence (XAI) have been used to advance the intelligence and interpretability of EEG analysis. However, existing methods face critical bottlenecks, including the fundamental modal gap, high computational costs, and poor global consistency. They are also largely limited to rigid classification tasks that do not support clinical reasoning or natural language interaction. In this study, we propose a collaborative explainable AI framework for EEG mental health monitoring with constrained question-and-answer (QA) tuned LLM alignment, which builds a smooth transformation path from raw EEG signals to evidence, and constructs a structured QA dataset for the instruction fine-tuning of LLMs. The central objective of this work is not simply to maximize EEG classification accuracy, but to develop an evidence-grounded alignment and explanation framework that connects EEG-derived physiological evidence with QA-based LLM reasoning. Furthermore, this work designs a transparent collaborative XAI mechanism that embeds interpretable EEG feature information as prior knowledge directly into the QA generation process of the LLM, and develops a multi-level interpretable pipeline combining attention heatmap analysis and decision tree surrogate modeling to achieve precise alignment between LLM internal reasoning and EEG neurophysiological patterns. The proposed framework addresses the limitations of traditional rigid EEG classification tasks, promotes the XAI paradigm shift from high-cost post-hoc explanations to transparent embedded explanations, and enables robust clinical reasoning and natural language interaction based on EEG signals.
Experimental results on a benchmark EEG mental state dataset demonstrate that the proposed framework stably captures neurophysiological characteristics corresponding to different mental states, and effectively improves the classification performance, decision transparency and clinical credibility of EEG-based mental health monitoring systems. In this setting, classification performance is treated as one evaluation aspect, while the primary contribution lies in constrained evidence-grounded alignment and QA-based LLM explainability. This advancement provides an initial feasibility study of real-time, scalable, and trustworthy intelligent EEG-based mental health analysis.

1. Introduction

Electroencephalogram (EEG) is a critical non-invasive neurophysiological measurement modality that captures subtle dynamic changes in cerebral cortical activity with millisecond-level temporal resolution. It therefore exhibits distinct advantages in mental health assessment and complex cognitive state analysis [1,2,3]. In traditional clinical practice, the assessment of psychological states such as depression and anxiety often relies heavily on subjective methods, including questionnaires and clinical interviews. These methods are not only time-consuming, but also susceptible to influence from patients’ expressive ability, memory bias, and evaluators’ subjective judgment. In contrast, EEG can provide objective, continuous, and real-time measurements of neurophysiological activity, thereby reducing the interference of subjective bias on final assessment results [4,5]. Based on this objective measurement capability, EEG technology has been widely applied to a range of key tasks, including the detection of emotional disorders, episodic or neurodegenerative diseases, and complex task assessment [6,7].
Although machine learning and deep learning methods have significantly improved the feature extraction efficiency and classification performance for EEG signals in recent years, the field still faces several key challenges that must be addressed to enable translation into practical clinical applications. First, EEG signals have high dimensionality, strong non-stationary characteristics, and significant inter-individual differences across subjects, which directly leads to severe degradation in the generalization ability of existing models when applied to unseen data [8,9]. Second, state-of-the-art deep learning models that achieve exceptional classification accuracy usually lack necessary interpretability. This "black box" property prevents clinicians from understanding the internal logic behind the model’s diagnostic decisions, limiting its credibility and usability in medical applications [10,11]. Furthermore, most current studies still focus on single classification tasks, and the corresponding models lack the ability to structurally represent and interpret neural patterns, making it difficult to support complex logical reasoning and in-depth interaction required for clinical diagnosis [12,13].
To address the "black box" limitation of deep learning models and improve their decision transparency, Explainable Artificial Intelligence (XAI) techniques have been increasingly applied to EEG data analysis. Researchers have attempted to precisely quantify the specific contribution of specific brain regions and frequency bands to the model’s final decision through feature attribution methods [14,15,16]. However, most existing XAI methods adopt model-agnostic strategies, which construct post-hoc explanations by locally perturbing input data and observing the resulting changes in model predictions. While this approach can reveal local correlations between features and prediction outputs, it is essentially based on simplified proxy models and often fails to generate globally consistent explanations when processing EEG data with high dimensionality and complex spatiotemporal correlations [17].
Meanwhile, Large Language Models (LLMs) have demonstrated significant advantages in complex natural language processing, multimodal logical reasoning, and professional medical data analysis [18,19,20]. However, direct application of LLMs to EEG analysis faces substantial obstacles: EEG signals are essentially continuous, fluctuating, unstructured time-series data, while LLMs operate on discrete text tokens, resulting in a fundamental modal gap in their data representations [13,21]. Recent cutting-edge cross-modal studies have provided a promising solution to this challenge: converting complex physical or time-series signals into two-dimensional spectrograms that encode rich time-frequency information, and further aligning these spectrograms with the representation space of LLMs, can effectively break the barrier between raw signals and language, and extend cross-modal reasoning capabilities [22].
Based on the aforementioned application background and technical bottlenecks, this paper proposes a collaborative interpretable artificial intelligence framework. The framework first converts raw continuous EEG signals into evidence and constructs a highly structured Question-and-Answer (QA) dataset based on this evidence, specifically for the instruction fine-tuning of LLMs. In this study, EEG classification is not treated as the only objective, but as one evaluation component within a broader evidence-grounded reasoning framework. The main focus is to connect EEG-derived physiological evidence with QA-based LLM reasoning and interpretable decision explanations. More importantly, this framework embeds feature attributions and interpretable information extracted by the XAI module directly into the LLM’s question-answering generation process, aligning low-level neurophysiological signal features with high-level linguistic semantic concepts. Here, the collaborative XAI mechanism refers to the integration of EEG-derived interpretable evidence, LLM attention and hidden-state analysis, and surrogate decision-tree reasoning. The multi-level interpretable pipeline refers to the explanation process that moves from token-level importance, to EEG evidence-level heatmaps, and then to rule-level decision-tree explanations. This design enhances the model’s interpretability and endows the system with robust medical logical reasoning capabilities. Different from approaches that simply use SHAP or LIME outputs as textual prompts for LLMs, the proposed framework combines embedded EEG evidence construction with internal LLM representation analysis and surrogate rule extraction. Therefore, the explanation process is not only based on external feature attribution, but also links model-layer behavior with EEG physiological evidence and final mental-state classification.
The contributions of this paper are summarized in the following aspects:
  • Propose an end-to-end EEG-language model alignment framework that achieves mapping from underlying neurophysiological signals to high-level semantic logical representations. This contribution focuses on constrained evidence-grounded alignment rather than pure classification optimization.
  • Design a collaborative XAI mechanism that addresses the limitations of traditional post-hoc explanation by embedding interpretable feature information as prior knowledge directly into the question-answering generation process of the LLM. This mechanism aims to improve explanation transparency by linking model outputs with EEG channels, frequency-band evidence, heatmap patterns, and surrogate decision rules.
  • Construct a structured QA dataset that provides a solid data foundation for instruction fine-tuning and training of subsequent LLMs in the continuous physiological signal domain. The QA formulation enables the model to generate mental-state predictions together with structured reasoning and interpretable evidence, while classification performance is treated as one evaluation aspect of the overall framework.
The rest of the paper is organized as follows. Section 2 reviews the related work. Section 3 provides the necessary background and preliminaries for understanding the proposed approach. Section 4 details the proposed explanation framework, including its components and workflow. Section 5 presents the experimental design and results to validate the effectiveness of the framework. Finally, Section 6 concludes the paper and discusses potential future research directions.

3. Background and Preliminaries

3.1. EEG Signals and Mental States

EEG is a non-invasive technique for measuring brain electrical activity and has been widely adopted in neuroscience, psychology, and mental health assessment [40]. It records voltage fluctuations generated by synchronized neuronal activity through electrodes placed on the scalp. Compared with other neuroimaging methods, EEG offers very high temporal resolution at the millisecond level, which makes it suitable for capturing fast changes in cognitive and emotional states [41]. In practice, EEG signals are collected using standard electrode placement systems. The sensors are mapped to specific brain regions, such as frontal, central, parietal, temporal, and occipital areas. This setup makes it easier to analyze brain activity in a structured way and to extract region-related features for later modeling [42,43].
In signal analysis, EEG data are often decomposed into several frequency bands, including Delta (0–4 Hz), Theta (4–7 Hz), Alpha (8–13 Hz), Beta (14–30 Hz), and Gamma (30–100 Hz). These bands are linked to different neural and cognitive processes. For instance, Alpha activity is usually associated with relaxed but alert states, while Beta and Gamma bands are more related to stress responses [44]. Changes across these frequency bands provide useful information for estimating emotional states and stress levels [45,46]. By combining frequency information with electrode locations, EEG signals can be converted into structured features that reflect underlying mental conditions.
In this study, the goal of EEG-based mental health classification is to infer an individual’s psychological state directly from EEG signals. This provides a data-driven alternative to traditional methods such as self-reports or clinical interviews. The labels are derived from standardized affective stimuli [47] and grouped into several levels, including stressed, fluctuated, stable, and relaxed states. Each state is associated with different EEG patterns. These representations support supervised learning and also serve as a basis for later reasoning and interpretability analysis.

3.2. LLM Tuning

LLMs have shown strong ability in reasoning, semantic understanding, and structured text generation. However, when directly applied to domain-specific tasks such as EEG-based mental health assessment, pre-trained models often produce unreliable outputs. This is mainly due to hallucinated reasoning when handling domain-specific evidence [48].
To reduce this issue, fine-tuning is usually applied to adapt LLMs to specific tasks [49]. In this work, the prediction task is reformulated as a structured Question-Answering (QA) problem. This allows the model to learn task-related reasoning patterns, instead of performing implicit classification. Each training sample is represented as a pair $(q_i, a_i)$, where $q_i$ is the input instruction constructed from domain evidence, and $a_i$ is the expected structured output. The dataset is defined as:
$$\mathcal{D} = \{(q_i, a_i)\}_{i=1}^{N}.$$
Given this dataset, fine-tuning maximizes the conditional generation probability of the answers, which is equivalent to minimizing the negative log-likelihood:
$$\mathcal{L} = -\sum_{(q_i, a_i) \in \mathcal{D}} \log p_{\theta}(a_i \mid q_i),$$
where $\theta$ denotes the model parameters. In practice, the loss is computed only on the output tokens. This encourages the model to learn structured responses, rather than simply copying the input prompts [48].
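The output-token masking described above can be illustrated with a minimal NumPy sketch. The function name `masked_sft_loss`, the toy shapes, and the averaging over answer tokens (rather than summing) are assumptions made for illustration and are not taken from the paper's implementation.

```python
import numpy as np

def masked_sft_loss(logits, token_ids, output_mask):
    """Mean negative log-likelihood over output tokens only.

    logits:      (T, V) unnormalised next-token scores, one row per position
    token_ids:   (T,)   target token id at each position
    output_mask: (T,)   1 for answer tokens (a_i), 0 for prompt tokens (q_i)
    """
    logits = np.asarray(logits, dtype=float)
    token_ids = np.asarray(token_ids)
    mask = np.asarray(output_mask, dtype=float)
    # Numerically stable log-softmax over the vocabulary axis.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    token_log_probs = log_probs[np.arange(len(token_ids)), token_ids]
    # Prompt positions are multiplied by zero, so they do not affect the loss.
    return -(token_log_probs * mask).sum() / mask.sum()

# Toy example: 4 positions, vocabulary of 3; the last two positions are the answer.
logits = np.zeros((4, 3))
token_ids = np.array([0, 1, 2, 0])
mask = np.array([0, 0, 1, 1])
loss = masked_sft_loss(logits, token_ids, mask)

# Changing the scores at prompt positions leaves the loss unchanged.
perturbed = logits.copy()
perturbed[0] = [5.0, -1.0, 2.0]
loss_perturbed = masked_sft_loss(perturbed, token_ids, mask)
```

Averaging instead of summing only rescales the objective by the number of answer tokens; the masking behaviour is the point of the sketch.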
To improve efficiency, parameter-efficient fine-tuning methods such as Low-Rank Adaptation (LoRA) are used [50]. Instead of updating all parameters, LoRA adds trainable low-rank matrices to selected layers while keeping the original weights fixed. Let $F_{\theta}(\cdot)$ denote the base model and $\Delta_{\phi}(\cdot)$ the learnable low-rank update. The tuned model can be written as:
$$F_{\theta,\phi}(x) = F_{\theta}(x) + \Delta_{\phi}(x),$$
where $\phi$ represents the trainable adapter parameters. This approach reduces memory and computation cost, while keeping the general reasoning ability of the base model.
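For a single linear layer, the additive form above can be sketched as follows. The layer sizes, rank, and the `alpha / rank` scaling are illustrative assumptions following the common LoRA convention; they are not the settings used in this study.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, rank, alpha = 64, 64, 4, 8        # illustrative sizes only

W = rng.normal(size=(d_out, d_in))             # frozen base weight (part of theta)
A = rng.normal(scale=0.01, size=(rank, d_in))  # trainable down-projection (part of phi)
B = np.zeros((d_out, rank))                    # trainable up-projection, zero-initialised

def lora_forward(x, W, A, B, alpha, rank):
    """Base output F_theta(x) plus the low-rank update Delta_phi(x)."""
    base = x @ W.T
    delta = (x @ A.T) @ B.T * (alpha / rank)
    return base + delta

x = rng.normal(size=(2, d_in))
y0 = lora_forward(x, W, A, B, alpha, rank)

full_params = W.size                 # parameters a full update would touch
adapter_params = A.size + B.size     # parameters LoRA actually trains
```

Because `B` starts at zero, the adapted layer initially reproduces the base layer exactly, and only `A` and `B` would receive gradient updates during tuning.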

3.3. Explainable AI

XAI focuses on making complex models more transparent by showing how input information affects predictions. Among existing methods, SHapley Additive exPlanations (SHAP) and Local Interpretable Model-agnostic Explanations (LIME) are widely used as post-hoc techniques. SHAP is based on ideas from cooperative game theory and estimates the contribution of each input variable by measuring its marginal effect across different feature subsets [51,52].
Instead of using a single attribution, SHAP averages contributions over different combinations of features. This results in a more stable importance score for each variable. The explanation model can be written in an additive form:
$$h(s) = \psi_0 + \sum_{j=1}^{d} \psi_j s_j,$$
where $s \in \{0, 1\}^{d}$ represents whether a variable is included or not, and $\psi_j$ is the contribution of the $j$-th variable.
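To make the averaging over feature subsets concrete, the following sketch computes exact Shapley values by enumerating all subsets. It is a toy illustration with a hypothetical value function `v`; practical SHAP implementations rely on approximations, since exact enumeration scales exponentially in $d$.

```python
from itertools import combinations
from math import factorial

def shapley_values(value_fn, d):
    """Exact Shapley values for a set function over d features (exponential cost)."""
    psi = [0.0] * d
    for j in range(d):
        others = [k for k in range(d) if k != j]
        for size in range(d):
            # Weight of a subset of this size in the Shapley average.
            weight = factorial(size) * factorial(d - size - 1) / factorial(d)
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal effect of adding feature j to the subset.
                psi[j] += weight * (value_fn(s | {j}) - value_fn(s))
    return psi

def v(subset):
    """Hypothetical value function: two additive effects plus one interaction."""
    total = 0.0
    if 0 in subset:
        total += 2.0
    if 1 in subset:
        total += 1.0
    if 0 in subset and 1 in subset:
        total += 1.0
    return total

psi = shapley_values(v, d=3)
psi_0 = v(frozenset())
full_value = v(frozenset({0, 1, 2}))
```

The interaction term is split evenly between features 0 and 1, and the contributions sum to the difference between the full and empty coalitions, which matches the additive form $h(s)$ evaluated at $s = (1, \dots, 1)$.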
LIME, on the other hand, focuses on local interpretability by approximating a complex model with a simpler surrogate model in the neighborhood of a given sample [17,53]. It perturbs the input instance to generate a local dataset and fits an interpretable model (e.g., linear regression) weighted by a similarity function. The optimization objective can be expressed as:
$$\hat{g} = \arg\min_{g \in H} \mathcal{L}(F, g, \omega_x) + \Gamma(g),$$
where $F$ is the original model, $g$ is the surrogate model, $\omega_x$ measures the proximity between samples, and $\Gamma(g)$ controls model complexity. Both SHAP and LIME primarily provide explanations at the feature level, indicating which input variables are influential for a prediction.
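The LIME objective can be sketched as a proximity-weighted ridge regression around one sample. The Gaussian perturbation, the exponential kernel, the ridge penalty standing in for $\Gamma(g)$, and the `black_box` function are all assumptions chosen for illustration; this is not the reference LIME implementation.

```python
import numpy as np

def lime_explain(predict_fn, x, n_samples=500, noise=0.5,
                 kernel_width=1.0, ridge=1e-3, seed=0):
    """Fit a local linear surrogate g around x; returns (intercept, slopes)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    Z = x + rng.normal(scale=noise, size=(n_samples, x.size))  # perturbed neighbours
    y = predict_fn(Z)                                          # black-box outputs F(z)
    dist = np.linalg.norm(Z - x, axis=1)
    w = np.exp(-(dist ** 2) / kernel_width ** 2)               # proximity weights omega_x
    Zc = np.hstack([np.ones((n_samples, 1)), Z - x])           # intercept + centred features
    WZ = Zc * w[:, None]
    reg = ridge * np.eye(Zc.shape[1])
    reg[0, 0] = 0.0                                            # do not penalise the intercept
    coef = np.linalg.solve(Zc.T @ WZ + reg, WZ.T @ y)
    return coef[0], coef[1:]

def black_box(Z):
    """Hypothetical model: depends on features 0 and 1, ignores feature 2."""
    return 3.0 * Z[:, 0] - 2.0 * Z[:, 1]

x0 = np.array([1.0, 2.0, 0.5])
intercept, slopes = lime_explain(black_box, x0)
```

For this linear toy model the recovered slopes are close to the true coefficients; for a non-linear model they would describe behaviour only in the neighbourhood of `x0`.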
While these approaches are effective for conventional machine learning models, they are not directly suitable for explaining LLMs operating on structured reasoning inputs. In this work, the model processes EEG-derived evidence in a question-answer format, where the decision depends on interactions across multiple evidence components rather than isolated features. Therefore, explanations need to capture not only feature importance but also the reasoning structure within the LLM [54].
To address this, this work employs attention-based heatmap analysis to investigate the internal behavior of the LLM. Specifically, attention weights and hidden states across layers are used to construct a two-dimensional importance map. One axis represents model layers, and the other corresponds to structured evidence tokens. This map is then used to identify which parts of the input evidence are emphasized by the model at different stages of processing.
In addition, we use a decision tree approximation to extract interpretable rules from the LLM. The idea is to convert internal model signals, such as attention scores and hidden states, into structured features. A surrogate decision tree is then trained to approximate the model’s behavior [55]. Formally, let $u$ denote the transformed representation derived from the LLM, and $y$ the corresponding prediction. The decision tree $T(u)$ is trained such that:
$$T(u) \approx F(x),$$
where $F(x)$ is the output of the original LLM.
The learned tree provides hierarchical decision rules that show how different evidence components contribute to the final prediction.
Compared with standard XAI methods, combining heatmap analysis with decision tree approximation gives a multi-level explanation. Heatmaps capture fine-grained interactions across model layers, while decision trees summarize these patterns into human-readable rules. This links low-level model signals with higher-level reasoning, making the overall decision process easier to interpret.

4. Proposed Method

This section first introduces the overall structure of the proposed framework, followed by the details of its components.

4.1. Proposed Overall Framework

This study proposes a collaborative framework for EEG-based mental health monitoring, which integrates structured QA-based LLM alignment with multi-level explainability analysis. As illustrated in Figure 1, the proposed framework consists of three main stages: LLM tuning and testing, LLM explanation, and text-based reasoning.
In the first stage, raw EEG signals are collected from participants and processed to extract meaningful features across frequency bands and electrodes. The processed EEG data are then reformulated into structured QA instances, supported by a Retrieval-Augmented Generation (RAG) knowledge base. The inputs consist of evidence-based descriptions derived from EEG signals, and the output corresponds to mental state predictions with reasoning. Based on this formulation, two LLMs are developed: an untuned baseline LLM with RAG information only and a fine-tuned LLM with both RAG information and EEG QA formulations. Both models are evaluated on a held-out testing set to assess their reasoning capability and generalization performance.
In the second stage, the explainability of the tuned LLM is analyzed through a multi-level interpretation pipeline. The tuned LLM first generates answers for the EEG testing samples, and its internal features, including layer-wise attention distributions and hidden states, are extracted. These internal features are then used to construct sample-level heatmaps that capture the contribution of different evidence tokens across model layers. Subsequently, evidence alignment is performed to map token-level importance to structured EEG evidence, leading to the generation of physical-layer EEG evidence heatmaps. These EEG evidence heatmaps provide an interpretable view of how the model prioritizes different EEG features and evidence during decision-making.
In the final stage, features are constructed from both model-layer features and evidence-level importance using sample-level heatmaps. These features are combined through cross-feature construction to capture interactions between models and EEG evidence. A joint decision tree is then trained to approximate the behavior of the LLM. This allows us to extract hierarchical decision rules that reflect its reasoning patterns. The resulting trees are further converted into human-readable explanations through a text-based reasoning process.

4.2. EEG Data Collection and Processing

EEG data were collected in a controlled laboratory setting to keep the recording conditions stable. The experiment followed an affective stimulation protocol to capture different mental states. All participants provided written informed consent before taking part in the study. The protocol was approved by the Institutional Review Board (IRB) of Khalifa University, and standard requirements such as voluntary participation, withdrawal, privacy, and data usage were clearly addressed. A custom application was used to present emotionally evocative images from the International Affective Picture System (IAPS) [56]. Each image was shown for a fixed duration, with neutral intervals inserted between stimuli to reduce carry-over effects. These stimuli were selected to induce several psychological states, including stressed, mildly fluctuating, stable, and relaxed conditions. EEG signals were recorded continuously during the experiment and aligned with the stimulus intervals, which allows consistent labeling of mental states. This synchronization helps maintain temporal alignment between the signals and the labels.
The dataset includes 24 participants with a balanced demographic distribution, providing a reasonable level of inter-subject variability. EEG signals were collected using the Emotiv Insight wireless headset [57], with a sampling rate of 128 Hz. The device records five channels located at AF3, AF4, T7, T8, and Pz, covering frontal, temporal, and parietal regions. For analysis, EEG signals are divided into standard frequency bands: Delta (0–4 Hz), Theta (4–7 Hz), Alpha (8–13 Hz), Beta (14–30 Hz), and Gamma (30–100 Hz). The feature space is constructed by combining electrode locations with these frequency components. For each electrode, five band-based features are extracted, resulting in a structured representation that captures both spatial and spectral information. For example, each channel includes Delta, Theta, Alpha, Beta, and Gamma band power, forming a consistent multi-channel feature set across all samples.
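The channel-by-band feature layout described above can be sketched as follows. The band edges, channel names, and sampling rate are taken from the text; the periodogram-based power estimate, the function names, and the synthetic test signal are assumptions for illustration, since the recording pipeline may obtain band values differently (for example, directly from the headset software).

```python
import numpy as np

FS = 128  # sampling rate in Hz, as reported for the headset
CHANNELS = ["AF3", "AF4", "T7", "T8", "Pz"]
BANDS = {"Delta": (0, 4), "Theta": (4, 7), "Alpha": (8, 13),
         "Beta": (14, 30), "Gamma": (30, 100)}

def band_powers(signal, fs=FS):
    """Mean periodogram power per band for one channel."""
    signal = np.asarray(signal, dtype=float)
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / fs)
    psd = np.abs(np.fft.rfft(signal - signal.mean())) ** 2 / signal.size
    powers = {}
    for name, (lo, hi) in BANDS.items():
        # The real FFT only reaches fs / 2 = 64 Hz, so the Gamma range
        # is effectively capped there in this sketch.
        sel = (freqs >= lo) & (freqs < hi)
        powers[name] = float(psd[sel].mean())
    return powers

def feature_vector(window):
    """window: (n_channels, n_samples) -> {'Channel Band': power}."""
    return {f"{ch} {band}": p
            for ch, sig in zip(CHANNELS, window)
            for band, p in band_powers(sig).items()}

# Synthetic 4-second window: a 10 Hz (alpha-range) oscillation plus weak noise.
t = np.arange(4 * FS) / FS
rng = np.random.default_rng(0)
window = np.stack([np.sin(2 * np.pi * 10 * t) + 0.1 * rng.normal(size=t.size)
                   for _ in CHANNELS])
features = feature_vector(window)
dominant_af3 = max(BANDS, key=lambda b: features[f"AF3 {b}"])
```

With five channels and five bands, each sample is represented by 25 named features, which is the structure the later evidence construction operates on.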
The collected EEG data are stored in a structured format with timestamps, frequency-band values, and signal quality indicators. These signals are then processed to build feature representations, which are used as inputs for the later QA formulation and LLM-based reasoning.

4.3. QA Formulation Process

As summarized in Algorithm 1, the EEG corpus is divided into a training set $\mathcal{D}_{\text{train}}$ and a testing set $\mathcal{D}_{\text{test}}$. The training split is first used to establish a reference distribution for subsequent evidence construction. Specifically, the feature-wise mean vector $\mu$, standard deviation vector $\sigma$, and label-specific centroids $c_y$ are computed from $\mathcal{D}_{\text{train}}$. These quantities provide a stable baseline for measuring how each EEG sample deviates from the training distribution and ensure that the testing split is processed without leaking information into the reference statistics. Based on these reference statistics, the evidence-construction operations in Algorithm 1 are applied to convert each EEG sample into structured and interpretable descriptors. Specifically, $\text{Agg}_{\text{elec}}(\cdot)$ aggregates frequency-band features within each electrode, $\text{Agg}_{\text{band}}(\cdot)$ aggregates features within each frequency band across electrodes, $\text{TopK}^{+}(\cdot)$ identifies the most elevated z-score features, and $\text{TopK}^{-}(\cdot)$ identifies the most suppressed z-score features. The function $\text{reasoning}(\cdot)$ produces an evidence-grounded explanation using these structured descriptors, derived neurophysiological indicators, and channel-band deviations.
For each sample $x_i$, a structured evidence representation is derived by combining raw EEG observations with aggregated and derived descriptors. Following the notation in Algorithm 1, the evidence block is represented as
$$E_i = \left( x_i,\; e_i^{(\text{elec})},\; e_i^{(\text{band})},\; r_i^{(1)},\; r_i^{(2)},\; r_i^{(3)},\; E_i^{+},\; E_i^{-} \right),$$
where $x_i$ denotes the original EEG feature vector, $e_i^{(\text{elec})}$ and $e_i^{(\text{band})}$ denote electrode-level and band-level aggregates, respectively, $r_i^{(1)}, r_i^{(2)}, r_i^{(3)}$ are derived neurophysiological indicators, and $E_i^{+}$, $E_i^{-}$ denote the most elevated and most suppressed features selected from the z-score vector
$$z_i = (x_i - \mu) \oslash \sigma,$$
with $\oslash$ denoting element-wise division.
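The evidence construction can be sketched as follows, assuming a channel-major layout of the 25 features and mean aggregation for the electrode-level and band-level descriptors. Both choices, the function names, and the value of `k` are assumptions for illustration. The derived indicators $r_i^{(1)}, r_i^{(2)}, r_i^{(3)}$ are omitted because their definitions are not given in this section.

```python
import numpy as np

CHANNELS = ["AF3", "AF4", "T7", "T8", "Pz"]
BANDS = ["Delta", "Theta", "Alpha", "Beta", "Gamma"]
NAMES = [f"{c} {b}" for c in CHANNELS for b in BANDS]  # channel-major layout (assumed)

def fit_reference(X_train):
    """Reference statistics computed from the training split only."""
    mu = X_train.mean(axis=0)
    sigma = X_train.std(axis=0) + 1e-8   # small epsilon avoids division by zero
    return mu, sigma

def build_evidence(x, mu, sigma, k=3):
    """Build a simplified evidence block for one sample."""
    z = (x - mu) / sigma                              # element-wise z-scores
    grid = x.reshape(len(CHANNELS), len(BANDS))
    order = np.argsort(z)                             # ascending z-score order
    return {
        "x": x,
        "elec": dict(zip(CHANNELS, grid.mean(axis=1))),   # Agg_elec (mean, assumed)
        "band": dict(zip(BANDS, grid.mean(axis=0))),      # Agg_band (mean, assumed)
        "top_pos": [(NAMES[j], float(z[j])) for j in order[::-1][:k]],  # TopK+
        "top_neg": [(NAMES[j], float(z[j])) for j in order[:k]],        # TopK-
    }

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, len(NAMES)))
mu, sigma = fit_reference(X_train)

# Hypothetical test sample: AF3 Alpha strongly elevated, AF4 Alpha suppressed.
x_test = mu.copy()
x_test[NAMES.index("AF3 Alpha")] += 5 * sigma[NAMES.index("AF3 Alpha")]
x_test[NAMES.index("AF4 Alpha")] -= 4 * sigma[NAMES.index("AF4 Alpha")]
evidence = build_evidence(x_test, mu, sigma)
```

Note that `fit_reference` is called on the training split only, and the resulting `mu` and `sigma` are reused for test samples, mirroring the leakage-free design described in the text.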
For each sample, the question side is defined as
$$Q_i = (\text{Problem}, E_i),$$
where Problem specifies the EEG-based mental state reasoning task and $E_i$ is the structured evidence block. The answer side is defined as
$$A_i = (\hat{y}_i, \text{reasoning}(E_i), c_i),$$
where $\hat{y}_i = M(y_i)$ is the mapped mental-state label under the label map $M$, $\text{reasoning}(E_i)$ denotes the explanation grounded in the evidence block, and $c_i$ is a confidence score. In our implementation, the confidence is derived from the distance between the sample and the nearest training label centroid, i.e.,
$$c_i = \exp\left( -\min_{y} \lVert x_i - c_y \rVert_2 \right).$$
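The centroid-based confidence can be sketched as below. The two-dimensional toy data and function names are hypothetical. Whether the distance is computed on raw or standardized features is not stated in this section, so the sketch simply uses the vectors as given.

```python
import numpy as np

def label_centroids(X_train, y_train):
    """Mean feature vector per label, computed from the training split."""
    return {str(label): X_train[y_train == label].mean(axis=0)
            for label in np.unique(y_train)}

def confidence(x, centroids):
    """Return (nearest label, exp(-distance to the nearest centroid))."""
    dists = {label: float(np.linalg.norm(x - c)) for label, c in centroids.items()}
    nearest = min(dists, key=dists.get)
    return nearest, float(np.exp(-dists[nearest]))

# Toy 2-D example with two labels.
X_train = np.array([[0.0, 0.0], [0.0, 0.0], [4.0, 0.0], [4.0, 0.0]])
y_train = np.array(["relaxed", "relaxed", "stressed", "stressed"])
centroids = label_centroids(X_train, y_train)

nearest, c = confidence(np.array([0.5, 0.0]), centroids)
_, c_at_centroid = confidence(np.array([4.0, 0.0]), centroids)
```

The score equals 1 when a sample coincides with a centroid and decays exponentially with distance, so its scale depends strongly on the feature units.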
Compared with direct label prediction, this QA construction explicitly encourages the LLM to reason over structured EEG evidence, including channel-band deviations, regional aggregates, and derived asymmetry or ratio indicators. In this way, the generated QA pairs are more suitable for downstream LLM alignment and tuning, since the LLM is trained to associate mental-state decisions with interpretable physiological evidence.
Finally, the same evidence construction pipeline is applied to both $\mathcal{D}_{\text{train}}$ and $\mathcal{D}_{\text{test}}$, while the training-derived reference statistics $(\mu, \sigma, c_y)$ are reused for the testing split. This preserves a consistent evidence space across splits and provides a principled basis for comparing untuned and QA-tuned LLMs under identical EEG reasoning constraints. Figure 2 illustrates representative EEG-based QA samples produced by this formulation, each including its structured evidence block.

4.4. QA-Tuning Process

The tuning of local LLMs in this study follows a parameter-efficient instruction alignment pipeline, as summarized in Algorithm 2. The objective is to learn a mapping from structured EEG evidence to reasoning outputs, while preserving the general linguistic and reasoning capabilities of the base LLM.
Let the QA dataset generated from EEG samples be denoted as
$$\mathcal{J} = \{(Q_k, A_k)\}_{k=1}^{|\mathcal{J}|},$$
where $Q_k = (\text{Problem}, E_k)$ is the question constructed from EEG evidence (as defined in the QA generation stage), and $A_k = (\hat{y}_k, \text{reasoning}(E_k), c_k)$ is the corresponding structured answer. Each QA instance is transformed into a supervised training pair $(x_k, y_k)$ via a formatting function
$$(x_k, y_k) = \text{FormatSFT}(Q_k, A_k),$$
where $x_k$ encodes the EEG-based reasoning problem together with the evidence block $E_k$, and $y_k$ represents the target structured response. The resulting training set is defined as
$$\mathcal{D} = \{(x_k, y_k) \mid (Q_k, A_k) \in \mathcal{J}\}.$$
During optimization, the loss is computed only over the output tokens corresponding to $y_k$, ensuring that the model focuses on generating structured reasoning outputs rather than reproducing the input prompt.
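A formatting function of this kind might look as follows. The prompt template, field names, and whitespace tokenization are hypothetical stand-ins used only to show how a QA pair becomes a prompt, a target, and a loss mask; the paper's actual template and tokenizer are not specified here.

```python
def format_sft(question, answer):
    """Turn one (Q_k, A_k) pair into prompt text x_k and target text y_k."""
    problem, evidence = question
    label, reasoning, conf = answer
    evidence_lines = "\n".join(f"- {name}: {value:+.2f}" for name, value in evidence)
    prompt = f"### Problem\n{problem}\n### Evidence\n{evidence_lines}\n### Answer\n"
    target = f"State: {label}\nReasoning: {reasoning}\nConfidence: {conf:.2f}"
    return prompt, target

def loss_mask(prompt_tokens, target_tokens):
    """0 for prompt positions, 1 for target positions (loss on y_k only)."""
    return [0] * len(prompt_tokens) + [1] * len(target_tokens)

# Hypothetical QA instance with z-score style evidence entries.
Q = ("Infer the mental state from the EEG evidence.",
     [("AF3 Alpha", 2.1), ("T7 Beta", -1.3)])
A = ("relaxed", "Frontal alpha is elevated while temporal beta is suppressed.", 0.82)

prompt, target = format_sft(Q, A)
# Whitespace splitting stands in for a real tokenizer in this sketch.
prompt_tokens, target_tokens = prompt.split(), target.split()
mask = loss_mask(prompt_tokens, target_tokens)
```

The mask produced here plays the same role as the output-token restriction in the loss: prompt positions are excluded and only the structured response is supervised.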
To improve efficiency, parameter-efficient fine-tuning is adopted using Low-Rank Adaptation (LoRA). Given a base model $M$ with parameters $\theta$, the original weights are frozen, and a set of trainable low-rank adapters $\phi$ is injected into selected projection layers $U$. The adapted model is written as
$$f_{\theta,\phi}(x) = f_{\theta}(x) + \Delta_{\phi}(x),$$
where $\Delta_{\phi}$ denotes the learned low-rank update. The model is trained for $E$ epochs by minimizing the supervised fine-tuning loss
$$\mathcal{L}_{\text{SFT}} = -\sum_{(x_k, y_k) \in \mathcal{D}} \log p_{\theta,\phi}(y_k \mid x_k).$$
After training, the LoRA parameters are merged into the base model to obtain a standalone tuned model $\hat{M}$. The merged model is then exported and quantized for efficient local deployment. This setup allows a direct comparison between untuned and QA-aligned LLMs under the same EEG evidence inputs, while keeping the computational cost manageable and the results reproducible.
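The merging step can be checked numerically for a single linear layer: folding the low-rank product into the frozen weight gives the same outputs as applying the adapter separately. The sizes and the `alpha / r` scaling below are illustrative assumptions, and the random matrices stand in for trained adapter weights.

```python
import numpy as np

rng = np.random.default_rng(1)
d, r, alpha = 16, 2, 4             # illustrative sizes only

W = rng.normal(size=(d, d))        # frozen base weight
A = rng.normal(size=(r, d))        # stand-in for a trained down-projection
B = rng.normal(size=(d, r))        # stand-in for a trained up-projection

def adapted(x):
    """Base layer plus a separately applied low-rank update."""
    return x @ W.T + (x @ A.T) @ B.T * (alpha / r)

# Merge: fold the low-rank product into a single standalone weight matrix.
W_merged = W + (alpha / r) * (B @ A)

def merged(x):
    return x @ W_merged.T

x = rng.normal(size=(3, d))
max_gap = float(np.abs(adapted(x) - merged(x)).max())
```

After merging, the adapter matrices are no longer needed at inference time, which is what makes a standalone exported model possible.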

4.5. Explanation Process

Given an EEG QA sample $(Q_i, A_i)$ with structured evidence $E_i$, the tuned model $\hat{M}$ produces a prediction together with internal representations, including hidden states and attention distributions across layers. These signals are first used to construct a sample-level importance representation, where each input token is associated with a layer-wise importance score. Formally, let $H_{\ell}$ and $A_{\ell}$ denote the hidden state and attention matrix at layer $\ell$, respectively. The sample-level importance can be expressed as:
$$S_i = G\left( \{H_{\ell}, A_{\ell}\}_{\ell=1}^{L} \right),$$
where $G(\cdot)$ denotes an aggregation function that combines model-layer statistics (e.g., norms, means, or attention entropy).
To obtain interpretable explanations aligned with EEG features, the sample-level importance is further mapped to structured evidence components. Specifically, the token-level importance scores are grouped according to the evidence elements defined in $E_i$, resulting in an evidence-level heatmap:
$$H_i = \text{Align}(S_i, E_i),$$
where $H_i$ captures the contribution of each EEG-derived evidence component across model layers. This transformation enables the interpretation of how different EEG features (e.g., band power, electrode activity, or derived indicators) influence the model’s decision.
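The two steps above, aggregation and alignment, can be sketched with NumPy. The particular choice of $G$ (hidden-state norm multiplied by mean received attention), the head-averaged attention input, and the token-to-evidence mapping `token_groups` are assumptions for illustration; the paper lists norms, means, and attention entropy as possible statistics without fixing one here.

```python
import numpy as np

def sample_importance(hidden, attn):
    """One possible G: per-layer token importance of shape (L, T).

    hidden: (L, T, d) hidden states per layer and token
    attn:   (L, T, T) attention per layer (rows = queries, columns = keys),
            assumed already averaged over heads
    """
    norms = np.linalg.norm(hidden, axis=-1)   # (L, T) hidden-state magnitude
    received = attn.mean(axis=1)              # (L, T) mean attention each token receives
    score = norms * received
    return score / score.sum(axis=1, keepdims=True)   # normalise within each layer

def align(S, token_groups, n_groups):
    """Sum token importance within each evidence component -> (L, n_groups)."""
    heat = np.zeros((S.shape[0], n_groups))
    for token_idx, g in enumerate(token_groups):
        if g >= 0:                            # -1 marks tokens outside the evidence block
            heat[:, g] += S[:, token_idx]
    return heat

rng = np.random.default_rng(0)
L, T, d = 3, 6, 8
hidden = rng.normal(size=(L, T, d))
raw = rng.normal(size=(L, T, T))
attn = np.exp(raw) / np.exp(raw).sum(axis=-1, keepdims=True)   # row-wise softmax

# Hypothetical mapping: tokens 1-2 belong to evidence item 0, tokens 3-4 to item 1.
token_groups = [-1, 0, 0, 1, 1, -1]
S = sample_importance(hidden, attn)
heatmap = align(S, token_groups, n_groups=2)
```

Each row of `heatmap` corresponds to a model layer and each column to an evidence component, which is the layout of the evidence-level heatmaps discussed in the results.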
To further extract structured reasoning, model-layer features and evidence-level importance are jointly transformed into a feature space for surrogate modeling. Let $u_i$ denote the constructed feature vector:
$$u_i = \Phi(S_i, H_i),$$
where $\Phi(\cdot)$ represents feature construction, including cross-feature interactions between model-layer statistics and EEG evidence importance.
A decision tree surrogate model $T(\cdot)$ is then trained to approximate the behavior of the LLM:
$$T(u_i) \approx \hat{M}(Q_i).$$
The resulting tree provides a hierarchical set of decision rules that reveal how model-layer responses contribute to the final prediction.
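A minimal sketch of the surrogate step is given below, using a small greedy tree written in NumPy so that it is self-contained. The cross-feature construction (concatenation plus pairwise products), the Gini criterion, the depth limit, and the synthetic "LLM predictions" are all assumptions for illustration rather than the paper's implementation.

```python
import numpy as np

def cross_features(layer_stats, evidence_importance):
    """One possible Phi: both views plus their pairwise products."""
    n = len(layer_stats)
    cross = np.einsum("ni,nj->nij", layer_stats, evidence_importance).reshape(n, -1)
    return np.hstack([layer_stats, evidence_importance, cross])

def gini(y):
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def fit_tree(U, y, depth=0, max_depth=2):
    """Greedy CART-style tree; leaves store the majority LLM prediction."""
    labels, counts = np.unique(y, return_counts=True)
    leaf = {"leaf": str(labels[np.argmax(counts)])}
    if depth == max_depth or len(labels) == 1:
        return leaf
    best = None
    for j in range(U.shape[1]):
        for thr in np.unique(U[:, j])[:-1]:       # keep the right branch non-empty
            left = U[:, j] <= thr
            score = (left.sum() * gini(y[left]) + (~left).sum() * gini(y[~left])) / len(y)
            if best is None or score < best[0]:
                best = (score, j, thr, left)
    if best is None:
        return leaf
    _, j, thr, left = best
    return {"feature": j, "threshold": float(thr),
            "left": fit_tree(U[left], y[left], depth + 1, max_depth),
            "right": fit_tree(U[~left], y[~left], depth + 1, max_depth)}

def predict_one(tree, u):
    while "leaf" not in tree:
        tree = tree["left"] if u[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree["leaf"]

rng = np.random.default_rng(0)
layer_stats = rng.uniform(size=(200, 2))           # stand-in for model-layer statistics
evidence_importance = rng.uniform(size=(200, 3))   # stand-in for evidence-level importance
# Synthetic stand-in for the tuned LLM's predictions on each sample.
llm_pred = np.where(evidence_importance[:, 0] > 0.5, "relaxed", "stressed")

U = cross_features(layer_stats, evidence_importance)
tree = fit_tree(U, llm_pred)
surrogate_pred = np.array([predict_one(tree, u) for u in U])
fidelity = float(np.mean(surrogate_pred == llm_pred))
```

Fidelity, the fraction of samples on which the tree agrees with the LLM, indicates how far the extracted rules can be trusted as a description of the model's behavior. In this synthetic case the tree recovers the single rule exactly; real LLM outputs would typically yield lower agreement.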

5. Experimental Design and Results

5.1. Experimental Setup

The experiments are conducted on a realistic EEG-based mental state dataset [57]. The collected dataset contains multi-channel brain signal features extracted from different electrodes and frequency bands, including Theta, Alpha, Beta, and Gamma components. These features provide physiological evidence for mental state analysis, as detailed in Section 4.2. The mental state recognition task is formulated as structured QA pairs, where each sample consists of an EEG evidence block and a corresponding reasoning-based explanation. The evidence block includes electrode-frequency features (e.g., AF3 Theta, T7 Beta), while the output is a description of the mental states. Example QA formulations are shown in Figure 2. For training and evaluation, the generated QA samples are divided into a tuning set and a held-out testing set. The tuning set contains the majority of samples used for instruction tuning, while the testing set contains 500 samples used for evaluation. The testing set is constructed to include diverse EEG patterns across all mental states to assess the robustness of the proposed framework. Experiments are conducted using local LLMs, including Gemma-3-4B, LLaMA-3-4B, and Qwen-3-4B, deployed via Ollama and AnythingLLM. Parameter-efficient fine-tuning is performed using LoRA, where only low-rank adapter parameters are updated while keeping the base model weights frozen. The key LLM tuning and implementation settings are summarized in Table 1. The models are trained using supervised instruction tuning with structured QA pairs, and the loss is computed only over output tokens to encourage structured reasoning generation. The training and data processing pipelines are implemented in Python, with model fine-tuning conducted in a Google Colab environment. All experiments are implemented using standard deep learning libraries, including PyTorch and HuggingFace Transformers.

5.2. Heatmap Explanation

The sample-level heatmap gives an initial view of how the model distributes attention across reasoning segments and layers. As shown in Figure 3, the importance values are not uniform and instead follow clear layer-wise patterns. Early segments tend to have higher activation in shallow and middle layers, while later segments show weaker influence, especially in deeper layers. However, these patterns are still abstract and cannot be directly linked to interpretable EEG evidence. This motivates further alignment with domain-specific features in the next step.
The EEG feature-level heatmaps provide a more interpretable view by showing how different electrode-frequency components are weighted across model layers for different mental states. As shown in Figure 4, consistent activation patterns appear across the four states. This suggests that the model captures structured neurophysiological characteristics, rather than relying on arbitrary feature combinations.
For the relaxed state, alpha-related features (e.g., AF3 Alpha, T8 Alpha, PZ Alpha) show relatively high importance in early and middle layers, followed by a gradual decrease in deeper layers. This pattern is consistent with common EEG findings, where alpha activity is linked to relaxed and low-arousal conditions. The concentration of importance in the middle layers suggests that the model aggregates alpha-related evidence at an intermediate stage before making the final prediction.
In contrast, the stressed state is characterized by stronger activation of beta-band components (e.g., AF3 Low Beta, T7 Low Beta, AF4 High Beta). These features maintain relatively high importance across a wider range of layers. Compared with the relaxed state, the importance values are both higher and more stable, indicating that higher-frequency activity plays a key role in the decision process. This matches the general view that beta activity is associated with increased cognitive load and stress.
For the fluctuated state, theta-related features (e.g., AF3 Theta, T7 Theta, T8 Theta) are more noticeable. However, their importance is spread across layers rather than concentrated in a specific region. This distributed pattern suggests that the model captures changes and transitions in brain activity, instead of a stable condition, which aligns with the definition of fluctuating mental states.
The stable state shows a more balanced pattern. Multiple frequency bands (e.g., alpha, beta, and gamma) contribute with moderate importance, and no single band clearly dominates. This indicates that the model relies on a combination of consistent but non-extreme features to represent stability.
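The band-level reading of the heatmaps described above can be sketched as a small aggregation over a feature-by-layer importance table. This is only an illustration under assumptions. The function names, the "electrode band" naming scheme used for parsing, and the toy numbers are hypothetical and are not values taken from Figure 4.

```python
from collections import defaultdict

def band_profile(importance):
    """Average per-layer importance over all features sharing a frequency band.

    `importance` maps a feature name such as "AF3 Alpha" to a list of
    per-layer importance values (same length for every feature).
    """
    grouped = defaultdict(list)
    for feature, per_layer in importance.items():
        band = feature.split(" ", 1)[1]  # "AF3 Low Beta" -> "Low Beta"
        grouped[band].append(per_layer)
    return {
        band: [sum(col) / len(col) for col in zip(*rows)]
        for band, rows in grouped.items()
    }

def dominant_band(profile):
    """Band with the largest mean importance across layers."""
    return max(profile, key=lambda b: sum(profile[b]) / len(profile[b]))

def peak_layer(profile, band):
    """Index of the layer where the band's importance is highest."""
    values = profile[band]
    return max(range(len(values)), key=values.__getitem__)

# Toy values for illustration only, not results from the paper.
toy = {
    "AF3 Alpha": [0.2, 0.6, 0.8, 0.3],
    "T8 Alpha": [0.4, 0.6, 0.6, 0.1],
    "AF3 Theta": [0.1, 0.2, 0.1, 0.2],
}
profile = band_profile(toy)
print(dominant_band(profile), peak_layer(profile, "Alpha"))
```

In this toy case the alpha band dominates and peaks at an intermediate layer index, which mirrors the qualitative pattern reported for the relaxed state.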

5.3. Decision Tree Explanation

The feature importance analysis of the decision tree gives an initial view of which model-layer features have the strongest impact on the final decision. As shown in Figure 5, only a subset of layer-level features plays a dominant role. This uneven distribution suggests that the model does not use all layers equally, but instead focuses on specific layers and statistics, such as attention entropy and hidden state norms. However, these importance scores only indicate which features matter. They do not explain how these features interact to produce a specific decision.
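For readers unfamiliar with the layer statistics named above, the sketch below shows one plausible way to compute them from a single layer's attention weights and hidden vector. It assumes Shannon entropy in nats averaged over attention rows, an L2 norm, and an arithmetic mean. The exact definitions and pooling used in the paper may differ, and the function names and toy inputs are hypothetical.

```python
import math

def attention_entropy(weights):
    """Shannon entropy (in nats) of one attention distribution."""
    total = sum(weights)
    probs = [w / total for w in weights if w > 0]
    return -sum(p * math.log(p) for p in probs)

def hidden_norm(vector):
    """L2 norm of a hidden-state vector."""
    return math.sqrt(sum(x * x for x in vector))

def hidden_mean(vector):
    """Arithmetic mean of a hidden-state vector."""
    return sum(vector) / len(vector)

def layer_features(layer_idx, attn_rows, hidden):
    """Build named features for one layer (naming follows the paper's figures).

    attn_rows: list of attention distributions; hidden: one pooled vector.
    Averaging entropy over rows is an assumption made for this sketch.
    """
    mean_entropy = sum(attention_entropy(r) for r in attn_rows) / len(attn_rows)
    return {
        f"Layer{layer_idx}_attention_entropy": mean_entropy,
        f"Layer{layer_idx}_hidden_norm": hidden_norm(hidden),
        f"Layer{layer_idx}_hidden_mean": hidden_mean(hidden),
    }

focused = [0.97, 0.01, 0.01, 0.01]  # attention concentrated on one token
uniform = [0.25, 0.25, 0.25, 0.25]  # attention spread evenly
feats = layer_features(34, [focused, uniform], [3.0, 4.0])
print(feats)
```

A focused attention row yields a low entropy and a uniform row yields the maximum, log(4) for four tokens, which is why entropy can serve as a compact summary of how concentrated a layer's attention is.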
As shown in Figure 6, the tree structure reflects a hierarchical decision process, where a small number of layer-level features split the data into different mental states. At the root node, Layer34_attention_entropy is used as the main splitting feature. This suggests that attention behavior at this layer has a strong influence on the overall decision. When the entropy is below a certain threshold, the model tends to classify the sample as relaxed. This is consistent with the idea that more stable attention patterns are linked to low-arousal conditions. If this condition is not satisfied, the model moves to intermediate-layer features, such as Layer8_attention_entropy and Layer23_hidden_mean. These features help refine the decision, especially when separating stressed states from other categories. The repeated use of attention entropy across different layers indicates that variation in attention patterns is an important signal for detecting higher cognitive load or stress. Further down the tree, features such as Layer24_hidden_norm and Layer22_attention_entropy are used to distinguish between fluctuated and stable states. Fluctuated states are associated with more variable activation patterns across layers, while stable states appear when hidden representations are more consistent in magnitude.
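To make the role of a threshold at a tree node concrete, the sketch below searches for the split on a single feature that minimizes the weighted Gini impurity, which is the default criterion of common decision tree implementations such as scikit-learn's. Whether the surrogate tree in Figure 6 uses Gini is an assumption, and the entropy values and labels are toy data rather than measurements from the paper.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_threshold(values, labels):
    """Return (threshold, weighted_gini) for the best split 'value <= threshold'."""
    pairs = sorted(zip(values, labels))
    best = (None, float("inf"))
    for i in range(1, len(pairs)):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no threshold separates identical values
        thr = (lo + hi) / 2  # midpoint between neighbouring values
        left = [lab for v, lab in pairs if v <= thr]
        right = [lab for v, lab in pairs if v > thr]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best[1]:
            best = (thr, score)
    return best

# Toy attention-entropy values and mental-state labels (illustrative only).
entropy = [0.5, 0.75, 1.0, 2.0, 2.25, 2.5]
state = ["relaxed", "relaxed", "relaxed", "stressed", "stable", "stressed"]
thr, score = best_threshold(entropy, state)
print(thr, score)
```

In this toy data the low-entropy samples are all relaxed, so the best split isolates them in a pure left branch, analogous to the root-node behaviour described above.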

5.4. Comparisons with Baseline Classification and Explanation Methods

Although the main objective of the proposed framework is evidence-grounded explanation rather than pure EEG classification, classification baselines are still useful for contextualizing the discriminative difficulty of the EEG mental-state recognition task. Therefore, two representative EEG deep learning models, EEGNet and DeepConvNet, are evaluated on the same held-out testing set. Since the dataset used in this study contains structured electrode-band features rather than raw continuous EEG segments, both models are implemented as feature-level convolutional baselines using the same 5-channel and 5-band EEG representation. Accuracy, Macro-Precision, Macro-Recall, and Macro-F1 are reported to provide a balanced evaluation across different mental-state classes.
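The macro-averaged metrics give each mental-state class equal weight regardless of its sample count. A minimal sketch of how they are computed is given below, assuming the common convention that Macro-F1 is the mean of per-class F1 scores. The labels and predictions are toy data, not outputs of EEGNet or DeepConvNet.

```python
def macro_metrics(y_true, y_pred):
    """Accuracy and macro-averaged precision, recall, and F1."""
    classes = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    precisions, recalls, f1s = [], [], []
    for c in classes:
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    n = len(classes)
    return {
        "accuracy": sum(1 for t, p in pairs if t == p) / len(pairs),
        "macro_precision": sum(precisions) / n,
        "macro_recall": sum(recalls) / n,
        "macro_f1": sum(f1s) / n,
    }

# Toy labels for illustration only.
y_true = ["relaxed", "relaxed", "stressed", "stressed", "stable", "fluctuated"]
y_pred = ["relaxed", "stressed", "stressed", "stressed", "stable", "stable"]
m = macro_metrics(y_true, y_pred)
print(m)
```

Because the single fluctuated sample is misclassified, its per-class scores are zero and pull the macro averages well below the accuracy, which is the behaviour that makes macro metrics informative for imbalanced classes.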
As shown in Table 2, DeepConvNet achieves stronger classification performance than EEGNet on the current electrode-band feature representation. This result indicates that deeper convolutional structures can capture more discriminative patterns from the structured EEG feature matrix. EEGNet also provides a lightweight reference baseline, but its performance is lower under the current feature-level setting. These results provide classification-oriented baselines for the EEG mental-state task, while the subsequent analysis focuses on the interpretability of the reasoning process rather than treating classification accuracy as the sole optimization target.
In addition to classification baselines, representative post-hoc XAI methods are used to examine feature-level explanations. LIME generates local explanations by perturbing the input EEG feature space and estimating the contribution of individual features to a specific prediction. SHAP estimates positive and negative feature contributions relative to a baseline output. Representative LIME and SHAP explanations are shown in Figure 7 and Figure 8, respectively.
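The idea behind SHAP-style attributions can be illustrated by computing exact Shapley values for a tiny model. The sketch below enumerates every feature ordering, which is only feasible for a handful of features. Practical SHAP explainers rely on approximations or model-specific algorithms instead. The scoring function, the feature values, and the all-zero baseline are hypothetical, and replacing "absent" features by baseline values is an assumption of this sketch.

```python
from itertools import permutations

def shapley_values(model, x, baseline):
    """Exact Shapley values via marginal contributions over all orderings."""
    n = len(x)
    phi = [0.0] * n
    orders = list(permutations(range(n)))
    for order in orders:
        current = list(baseline)  # start with every feature "absent"
        prev = model(current)
        for i in order:
            current[i] = x[i]  # switch feature i to its observed value
            value = model(current)
            phi[i] += value - prev  # marginal contribution of feature i
            prev = value
    return [p / len(orders) for p in phi]

def toy_score(features):
    """Hypothetical stress score over (alpha, theta, beta) band powers."""
    alpha, theta, beta = features
    return 2.0 * beta - 1.0 * alpha + 0.5 * alpha * theta

x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_score, x, baseline)
print(phi)
```

The attributions sum to the difference between the model output for the sample and for the baseline, and the alpha-theta interaction term is shared equally between the two features involved. This additive, input-level view is what the force plots visualize.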
The LIME explanation presents local decision conditions and class probabilities, showing how individual EEG features support or oppose a given prediction. The SHAP force plots further show how features such as alpha, theta, and beta-band components push the model output above or below the baseline value. These results confirm that conventional post-hoc XAI methods can identify influential EEG features at the input level. However, their explanations mainly describe feature-output associations and do not explicitly characterize how structured EEG evidence is processed inside the QA-tuned LLM. In contrast, the proposed explanation process links EEG feature heatmaps, model-layer importance, and surrogate decision-tree rules. This provides a more structured explanation path from physiological EEG evidence to internal model behavior and final language-based reasoning.

5.5. Analysis and Discussion

Instead of directly merging model-layer features and EEG features into a single classifier, the proposed framework performs a joint analysis. This allows the reasoning process to reflect both model behavior and physiological evidence. From the EEG feature heatmaps in Figure 4, each mental state shows a distinct activation pattern across layers. The relaxed state is mainly associated with alpha-band activity (e.g., AF3 Alpha, T8 Alpha), which appears stronger in early and middle layers. In contrast, the stressed state shows consistent activation of low-beta and high-beta components across more layers. The fluctuated state presents more spread-out theta activity, indicating higher variability, while the stable state shows balanced contributions from several frequency bands without a clear dominant one.
These patterns are consistent with the decision tree in Figure 6. At the root node, Layer34_attention_entropy separates relaxed samples from others. This suggests that stable attention at this layer is related to the alpha-dominant patterns seen in the heatmaps. For the remaining samples, intermediate-layer features such as Layer8_attention_entropy and Layer23_hidden_mean are used to identify stressed states. This matches the observation that beta-band activity remains strong across layers under stress. The repeated use of attention entropy in these branches indicates that changes in attention patterns are closely related to high-frequency EEG activity. The distinction between fluctuated and stable states is mainly handled by features such as Layer24_hidden_norm and Layer22_attention_entropy. In the heatmaps, fluctuated states show more dispersed theta activity, while stable states exhibit more balanced activation patterns. This suggests that intermediate-layer representations capture the level of variability in EEG signals.
There is a clear alignment between model layers and EEG features. Early and middle layers are mainly responsible for combining dominant EEG signals, while deeper layers provide more stable and abstract representations for final decisions. This shows that the model is not acting as a black box, but instead reflects meaningful physiological patterns across layers. Based on this alignment, joint reasoning can be constructed by linking decision paths with EEG evidence. For example, a stressed sample often follows a path dominated by attention entropy features in intermediate layers, which is consistent with strong beta-band activity. Similarly, a relaxed prediction is associated with lower entropy and aligns with alpha-dominant patterns. These connections allow the model to produce structured explanations grounded in both internal behavior and EEG evidence. However, the proposed formulation still relies on structured feature construction and therefore does not fully bridge the modality gap in the same way as end-to-end cross-modal alignment methods such as RF-GPT [22]. Compared with RF-GPT, the proposed framework is less direct in signal-language alignment, but it provides a more constrained and traceable reasoning path. Each prediction can be traced back to specific EEG channels, frequency bands, heatmap patterns, and surrogate decision rules. Therefore, the proposed method should be regarded as an interpretable and evidence-grounded alternative, rather than a replacement for fully end-to-end cross-modal models.

6. Conclusions

This study presents an explainable framework for EEG-based mental state analysis using a constrained QA-tuned LLM. By reformulating the task as a structured reasoning problem, the model produces predictions that are grounded in physiological evidence, rather than relying on implicit classification. A multi-level explanation pipeline is developed. Sample-level heatmaps show how importance is distributed across layers. EEG feature heatmaps reveal activation patterns that are consistent with known physiological signals. Decision tree approximations further summarize these patterns into hierarchical and interpretable rules. Together, these components link internal model behavior with EEG-based evidence. The results show that the model captures meaningful neurophysiological patterns across different mental states. For example, alpha activity is more prominent in relaxed conditions, while beta activity is stronger under stress. The decision tree structures also indicate how these patterns are used across layers. This provides a consistent view that connects model dynamics with physiological signals, improving both interpretability and reliability.
There are still some limitations. The current setup uses low-density EEG data with a limited number of channels, which restricts the level of detail in the analysis. Moreover, the dataset is based on a limited number of participants and was not further validated on large-scale public EEG benchmarks. Therefore, the present results should be interpreted as an initial feasibility study rather than direct evidence of clinical translation. Broader validation using larger datasets, denser EEG recordings, and external benchmark datasets is still required to assess the generalizability and robustness of the proposed framework. In addition, the alignment is based on statistical relationships, and does not explicitly model causal effects between EEG features and model decisions. The current explanation evaluation is also mainly based on computational consistency and physiological plausibility, without formal assessment by clinical or domain experts.
Future work will focus on incorporating causal-aware alignment methods to move beyond correlation-based explanations. Another important direction is to conduct external validation using larger cohorts, denser EEG recordings, and public benchmark datasets, so that the stability of the proposed framework can be evaluated beyond the current self-collected dataset. In addition, future studies will involve neurologists, psychologists, or other domain experts to evaluate the comprehensibility, clinical relevance, and trustworthiness of the generated explanations. Such expert-centered evaluation will help determine whether the heatmaps, decision tree rules, and language-based explanations are understandable and useful in realistic mental health assessment scenarios. These steps can further improve the reliability and practical use of explainable LLM systems in EEG-based healthcare applications.

References

  1. Kuriyakose, D.; et al. Explainable AI uncovers novel EEG microstate candidate neurophysiological markers for autism spectrum disorder. Front. Comput. Neurosci. 2026, 20, 1763727.
  2. Torres, J.M.M.; Medina-DeVilliers, S.; Clarkson, T.; Lerner, M.D.; Riccardi, G. Evaluation of interpretability for deep learning algorithms in EEG emotion recognition: A case study in autism. Artif. Intell. Med. 2023, 143, 102545.
  3. Rehman, A.; Mun, S. Explainable AI-Enhanced Ensemble Protocol Using Gradient-Boosted Models for Zero-False-Alarm Seizure Detection from EEG. Sensors 2026, 26, 863.
  4. Zhai, L.; Zhao, M.; Zhang, J.; Jamil, M.; Naz, R.; Li, C. A systematic review of EEG-based biomarkers for depression, anxiety, and bipolar disorder: trends in explainable artificial intelligence (XAI). BMC Psychiatry 2025.
  5. Zanola, A.; Fabrice Tshimanga, L.; Del Pup, F.; Baiesi, M.; Atzori, M. xEEGNet: towards explainable AI in EEG dementia classification. J. Neural Eng. 2025, 22, 046042.
  6. Ahmad, I.; Zhu, M.; Li, G.; Javeed, D.; Kumar, P.; Chen, S. A secure and interpretable AI for smart healthcare system: A case study on epilepsy diagnosis using EEG signals. IEEE J. Biomed. Health Inform. 2024, 28, 3236–3247.
  7. Islam, M.S.; Hussain, I.; Rahman, M.M.; Park, S.J.; Hossain, M.A. Explainable artificial intelligence model for stroke prediction using EEG signal. Sensors 2022, 22, 9859.
  8. Jayaram, V.; Alamgir, M.; Altun, Y.; Schölkopf, B.; Grosse-Wentrup, M. Transfer learning in brain-computer interfaces. IEEE Comput. Intell. Mag. 2016, 11, 20–31.
  9. Lotte, F.; Bougrain, L.; Cichocki, A.; Clerc, M.; Congedo, M.; Rakotomamonjy, A.; Yger, F. A review of classification algorithms for EEG-based brain–computer interfaces: a 10 year update. J. Neural Eng. 2018, 15, 031005.
  10. Caruana, R.; Lou, Y.; Gehrke, J.; Koch, P.; Sturm, M.; Elhadad, N. Intelligible models for healthcare: Predicting pneumonia risk and hospital 30-day readmission. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2015; pp. 1721–1730.
  11. Zhang, Z.; Damiani, E.; Hamadi, H.; Yeun, C.; Taher, F. A late multi-modal fusion model for detecting hybrid spam e-mail. Int. J. Comput. Theory Eng. 2023, 15, 76–81.
  12. Moor, M.; Banerjee, O.; Abad, Z.S.H.; Krumholz, H.M.; Leskovec, J.; Topol, E.J.; Rajpurkar, P. Foundation models for generalist medical artificial intelligence. Nature 2023, 616, 259–265.
  13. Lu, W.; Song, C.; Wu, J.; Zhu, P.; Zhou, Y.; Mai, W.; Zheng, Q.; Ouyang, W. UniMind: Unleashing the power of LLMs for unified multi-task brain decoding. arXiv 2025, arXiv:2506.18962.
  14. Sánchez-Hernández, S.E.; Torres-Ramos, S.; Román-Godínez, I.; Salido-Ruiz, R.A. Evaluation of the relation between ictal EEG features and XAI explanations. Brain Sci. 2024, 14, 306.
  15. Hussain, I.; Jany, R.; Boyer, R.; Azad, A.; Alyami, S.A.; Park, S.J.; Hasan, M.M.; Hossain, M.A. An explainable EEG-based human activity recognition model using machine-learning approach and LIME. Sensors 2023, 23, 7452.
  16. Schirrmeister, R.T.; Springenberg, J.T.; Fiederer, L.D.J.; Glasstetter, M.; Eggensperger, K.; Tangermann, M.; Hutter, F.; Burgard, W.; Ball, T. Deep learning with convolutional neural networks for EEG decoding and visualization. Hum. Brain Mapp. 2017, 38, 5391–5420.
  17. Ribeiro, M.T.; Singh, S.; Guestrin, C. "Why should I trust you?": Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016; pp. 1135–1144.
  18. AlSaad, R.; Abd-Alrazaq, A.; Boughorbel, S.; Ahmed, A.; Renault, M.A.; Damseh, R.; Sheikh, J. Multimodal large language models in health care: applications, challenges, and future outlook. J. Med. Internet Res. 2024, 26, e59505.
  19. Carmona-Martos, L.; Martín-Palomeque, P.; Escudero-Arnanz, Ó.; Soguero-Ruiz, C. Interpretable large language models for early prediction of antimicrobial multidrug resistance. Health Inf. Sci. Syst. 2025, 14, 11.
  20. Feli, M.; Azimi, I.; Liljeberg, P.; Rahmani, A.M. An LLM-powered agent for physiological data analysis: A case study on PPG-based heart rate estimation. In Proceedings of the 2025 47th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC); IEEE, 2025; pp. 1–7.
  21. Guo, F.; Zhang, Z.; Mo, H.; Li, C. A method for battery SOH estimation based on k-means and LightGBM algorithm. In Proceedings of the 2024 6th International Conference on System Reliability and Safety Engineering (SRSE); IEEE, 2024; pp. 1–7.
  22. Zou, H.; Tian, Y.; Wang, B.; Bariah, L.; Lasaulce, S.; Huang, C.; Debbah, M. RF-GPT: Teaching AI to See the Wireless World. arXiv 2026, arXiv:2602.14833.
  23. Craik, A.; He, Y.; Contreras-Vidal, J.L. Deep learning for electroencephalogram (EEG) classification tasks: a review. J. Neural Eng. 2019, 16, 031001.
  24. Xia, M.; Zhang, Y.; Wu, Y.; Wang, X. An end-to-end deep learning model for EEG-based major depressive disorder classification. IEEE Access 2023, 11, 41337–41347.
  25. Subhani, A.R.; Mumtaz, W.; Saad, M.N.B.M.; Kamel, N.; Malik, A.S. Machine learning framework for the detection of mental stress at multiple levels. IEEE Access 2017, 5, 13545–13556.
  26. Wan, Z.; Yang, R.; Huang, M.; Zeng, N.; Liu, X. A review on transfer learning in EEG signal analysis. Neurocomputing 2021, 421, 1–14.
  27. Xue, B.; Lv, Z.; Xue, J. Feature transfer learning in EEG-based emotion recognition. In Proceedings of the 2020 Chinese Automation Congress (CAC); IEEE, 2020; pp. 3608–3611.
  28. Lundberg, S.M.; Lee, S.I. A unified approach to interpreting model predictions. Adv. Neural Inf. Process. Syst. 2017, 30.
  29. Shoeibi, A.; Sadeghi, D.; Moridian, P.; Ghassemi, N.; Heras, J.; Alizadehsani, R.; Khadem, A.; Kong, Y.; Nahavandi, S.; Zhang, Y.D.; et al. Automatic diagnosis of schizophrenia in EEG signals using CNN-LSTM models. Front. Neuroinformatics 2021, 15, 777977.
  30. Borra, D.; Fantozzi, S.; Magosso, E. Interpretable and lightweight convolutional neural network for EEG decoding: Application to movement execution and imagination. Neural Netw. 2020, 129, 55–74.
  31. Tonekaboni, S.; Joshi, S.; McCradden, M.D.; Goldenberg, A. What clinicians want: contextualizing explainable machine learning for clinical end use. In Proceedings of the Machine Learning for Healthcare Conference; PMLR, 2019; pp. 359–380.
  32. Holzinger, A.; Langs, G.; Denk, H.; Zatloukal, K.; Müller, H. Causability and explainability of artificial intelligence in medicine. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 2019, 9, e1312.
  33. Theissler, A.; Spinnato, F.; Schlegel, U.; Guidotti, R. Explainable AI for time series classification: a review, taxonomy and research directions. IEEE Access 2022, 10, 100700–100724.
  34. Singhal, K.; Azizi, S.; Tu, T.; Mahdavi, S.S.; Wei, J.; Chung, H.W.; Scales, N.; Tanwani, A.; Cole-Lewis, H.; Pfohl, S.; et al. Large language models encode clinical knowledge. Nature 2023, 620, 172–180.
  35. Thirunavukarasu, A.J.; Ting, D.S.J.; Elangovan, K.; Gutierrez, L.; Tan, T.F.; Ting, D.S.W. Large language models in medicine. Nat. Med. 2023, 29, 1930–1940.
  36. Ouyang, L.; Wu, J.; Jiang, X.; Almeida, D.; Wainwright, C.; Mishkin, P.; Zhang, C.; Agarwal, S.; Slama, K.; Ray, A.; et al. Training language models to follow instructions with human feedback. Adv. Neural Inf. Process. Syst. 2022, 35, 27730–27744.
  37. Peng, Q.; Li, J.; Huang, S.; Jiang, Y.; Gong, K.; Ding, R.; Ye, S.; Zheng, C.; Wei, X.Y.; Li, Q. Aligning clinical needs and AI capabilities: a survey on LLMs for medical reasoning. Authorea Prepr. 2025.
  38. Babu, N.; Mathew, J.; Vinod, A. Large language models for EEG: A comprehensive survey and taxonomy. arXiv 2025, arXiv:2506.06353.
  39. Babu, N.; Mathew, J.; Satija, U.; Vinod, A. Modality reprogramming: Adapting frozen LLMs for multi-channel EEG classification. Neurocomputing 2025, 132407.
  40. Al Hammadi, A.Y.; Yeun, C.Y.; Damiani, E.; Yoo, P.D.; Hu, J.; Yeun, H.K.; Yim, M.S. Explainable artificial intelligence to evaluate industrial internal security using EEG signals in IoT framework. Ad Hoc Netw. 2021, 123, 102641.
  41. Al Hammadi, A.Y.; Lee, D.; Yeun, C.Y.; Damiani, E.; Kim, S.K.; Yoo, P.D.; Choi, H.J. Novel EEG Sensor-Based Risk Framework for the Detection of Insider Threats in Safety Critical Industrial Infrastructure. IEEE Access 2020, 8, 206222–206234.
  42. Joshi, V.M.; Ghongade, R.B. IDEA: Intellect database for emotion analysis using EEG signal. J. King Saud. Univ.-Comput. Inf. Sci. 2022, 34, 4433–4447.
  43. Kim, J.; Park, Y.; Chung, W. Transform based feature construction utilizing magnitude and phase for convolutional neural network in EEG signal classification. In Proceedings of the 2020 8th International Winter Conference on Brain-Computer Interface (BCI), 2020; pp. 1–4.
  44. Agarwal, T.; Raturi, S.; Vybhav, T.; Singh, M. Classification of EEG signal using LSTMs under audiovisual stimuli. In Proceedings of the 2020 International Conference on Communication and Signal Processing (ICCSP); IEEE, 2020; pp. 1229–1232.
  45. Chao, H.; Dong, L. Emotion Recognition Using Three-Dimensional Feature and Convolutional Neural Network from Multichannel EEG Signals. IEEE Sens. J. 2021, 21, 2024–2034.
  46. Chattopadhyay, S.; Zary, L.; Quek, C.; Prasad, D.K. Motivation detection using EEG signal analysis by residual-in-residual convolutional neural network. Expert Syst. Appl. 2021, 184, 115548.
  47. Zhang, Z.; Umar, S.; Hammadi, A.Y.A.; Yoon, S.; Damiani, E.; Ardagna, C.A.; Bena, N.; Yeun, C.Y. Explainable Data Poison Attacks on Human Emotion Evaluation Systems Based on EEG Signals. IEEE Access 2023, 11, 18134–18147.
  48. Wu, X.K.; Chen, M.; Li, W.; Wang, R.; Lu, L.; Liu, J.; Hwang, K.; Hao, Y.; Pan, Y.; Meng, Q.; et al. LLM fine-tuning: Concepts, opportunities, and challenges. Big Data Cogn. Comput. 2025, 9, 87.
  49. Zhang, B.; Wang, J.; Du, Q.; Zhang, J.; Tu, Z.; Chu, D. A survey on data selection for LLM instruction tuning. J. Artif. Intell. Res. 2025, 83.
  50. Che, C.; Wang, Z.; Yang, P.; Wang, C.; Ma, H.; Shi, Z. LoRA in LoRA: Towards parameter-efficient architecture expansion for continual visual instruction tuning. Proc. AAAI Conf. Artif. Intell. 2026, 40, 19978–19986.
  51. Zhang, Z.; Hamadi, H.A.; Damiani, E.; Yeun, C.Y.; Taher, F. Explainable Artificial Intelligence Applications in Cyber Security: State-of-the-Art in Research. IEEE Access 2022, 10, 93104–93139.
  52. Huang, X.; Zhang, Z.; Guo, F.; Wang, X.; Chi, K.; Wu, K. Research on older adults’ interaction with e-health interface based on explainable artificial intelligence. In Proceedings of the International Conference on Human-Computer Interaction; Springer, 2024; pp. 38–52.
  53. Ribeiro, M.T.; Singh, S.; Guestrin, C. Anchors: High-precision model-agnostic explanations. In Proceedings of the AAAI Conference on Artificial Intelligence, 2018; Vol. 32.
  54. Li, H.; Kam-Kwai, W.; Luo, Y.; Chen, J.; Liu, C.; Zhang, Y.; Lau, A.K.H.; Qu, H.; Liu, D. Save It for the “Hot” Day: An LLM-Empowered Visual Analytics System for Heat Risk Management. IEEE Trans. Vis. Comput. Graph. 2025, 31, 8928–8943.
  55. Ku, J.; Kim, S.; Lee, E.; Zaman, U.; Kim, K. Enhancing Autonomous Ship Communication: A Cost-Effective and High-Accuracy LLM Framework Using Decision Trees and RAG. In Proceedings of the 2025 International Conference on Artificial Intelligence in Information and Communication (ICAIIC), 2025; pp. 0420–0426.
  56. Bradley, M.M.; Lang, P.J. International Affective Picture System. In Encyclopedia of Personality and Individual Differences; Zeigler-Hill, V., Shackelford, T.K., Eds.; Springer International Publishing: Cham, 2017; pp. 1–4.
  57. Alhammadi, A.; Yeun, C.Y.; Damiani, E.; Yoo, P.D.; Hu, J.; Yeun, H.K.; Yim, M.-S. EEG Brainwave Dataset.
Figure 1. The overall framework of the proposed method.
Figure 2. Examples of EEG-based mental state QA formulation.
Figure 3. Sample level heatmap across model layers.
Figure 4. EEG feature heatmaps across model layers of different mental states.
Figure 5. Top layer features influencing the model’s decision.
Figure 6. Decision tree approximations of model layer features.
Figure 7. Representative LIME explanation for EEG feature-based mental-state prediction.
Figure 8. Representative SHAP force-plot explanations for EEG feature-based mental-state prediction.
Table 1. LLM tuning and implementation settings.

| Item | Setting |
| --- | --- |
| Evaluated LLMs | Gemma-3-4B, LLaMA-3-4B, Qwen-3-4B |
| Tuned base model | Qwen/Qwen3-4B |
| Fine-tuning method | LoRA instruction tuning |
| Training data format | Structured EEG QA pairs |
| Maximum sequence length | 2048 tokens |
| LoRA rank | 16 |
| LoRA alpha | 32 |
| LoRA dropout | 0.05 |
| Batch size | 2 per device |
| Gradient accumulation | 8 steps |
| Epochs | 1 |
| Maximum training steps | 100 |
| Learning rate | 2 × 10⁻⁴ |
| Optimizer | paged_adamw_8bit |
| Quantization | 4-bit NF4 during tuning; Q4_K_M for GGUF export |
| Implementation | Google Colab, PyTorch, HuggingFace, PEFT |
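The batch-related settings in Table 1 imply an effective batch size and an upper bound on the number of training sequences processed. The arithmetic is sketched below under two assumptions that the table does not state: training runs on a single device, and "maximum training steps" counts optimizer updates (as the `max_steps` argument of the HuggingFace `Trainer` does). The function names are hypothetical.

```python
def effective_batch(per_device_batch, grad_accum_steps, num_devices=1):
    """Sequences contributing to one optimizer update."""
    return per_device_batch * grad_accum_steps * num_devices

def max_sequences_seen(per_device_batch, grad_accum_steps, max_steps, num_devices=1):
    """Upper bound on sequences processed when training stops at max_steps."""
    return effective_batch(per_device_batch, grad_accum_steps, num_devices) * max_steps

# Values from Table 1; num_devices=1 is an assumption.
eff = effective_batch(2, 8)
seen = max_sequences_seen(2, 8, 100)
print(eff, seen)
```

Under these assumptions each update averages gradients over 16 sequences, and at most 1600 sequences are processed before the step limit is reached, unless the single epoch ends first.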
Table 2. Classification performance of EEG deep learning baselines on the held-out testing set.

| Method | Accuracy | Macro-Precision | Macro-Recall | Macro-F1 |
| --- | --- | --- | --- | --- |
| EEGNet | 0.713 | 0.729 | 0.714 | 0.715 |
| DeepConvNet | 0.919 | 0.926 | 0.919 | 0.920 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
Preprints.org is a free preprint server supported by MDPI in Basel, Switzerland.

© 2026 MDPI (Basel, Switzerland) unless otherwise stated