Evolving Explainable AI for Power Quality Disturbance Detection and Interpretation in Smart Grids

Zhibo Zhang

doi:10.20944/preprints202606.1505.v1

Submitted: 17 June 2026

Posted: 22 June 2026


Abstract

The large-scale integration of renewable energy sources and power electronic devices has introduced increasing complexity to power quality disturbances in smart grids. While machine learning and deep learning methods have demonstrated high accuracy in disturbance detection and classification, their inherent black-box nature limits both interpretability and operator trust. Existing Explainable Artificial Intelligence (XAI) techniques often suffer from high computational overhead. To address these challenges, this paper proposes an Evolving Explainable Artificial Intelligence (E-XAI) framework for power quality disturbance detection and interpretation in smart grids. The proposed framework employs a Genetic Algorithm (GA) to automatically select power quality features, reducing computational complexity while preserving model interpretability. With this approach, machine learning models can not only identify disturbances but also provide transparent explanations of the key contributing factors, enabling grid operators to gain actionable insights. Experimental results demonstrate that the E-XAI framework reduces the computational cost of XAI techniques while improving classification stability. This work provides an effective pathway toward explainable and intelligent power quality disturbance analysis, addressing the critical research gap in lightweight XAI methods for smart grid environments.

Keywords: cyber-physical systems; Evolving Explainable Artificial Intelligence (E-XAI); genetic algorithm (GA); power quality disturbance; power systems; smart grid

Subject: Engineering - Electrical and Electronic Engineering

1. Introduction

The rapid advancement of artificial intelligence methodologies such as machine learning and deep learning has opened new technical avenues for analyzing power quality disturbances [1,2]. Conventional disturbance detection methods predominantly rely on threshold settings and rule-based judgments, exhibiting limited adaptability when faced with complex and variable disturbance types. In contrast, artificial intelligence approaches can autonomously extract features from data, demonstrating robust pattern recognition capabilities [3]. Studies indicate that shallow learning models, such as support vector machines and decision trees, achieve favorable results in disturbance classification. Deep learning models, including convolutional neural networks and recurrent neural networks, further enhance the automation of feature extraction [4,5].

As a vital infrastructure of modern society, the power system shoulders the crucial mission of underpinning economic development and safeguarding public welfare. In recent years, amid profound transformations in the global energy landscape, renewable energy generation, including wind and photovoltaic power, has experienced rapid growth, driving a fundamental shift in the structure and operational models of power systems [6,7]. Throughout this process, operational data generated across all grid segments has increased exponentially. Ensuring data security during collection, transmission, and processing has thus become a critical challenge for maintaining grid stability [8,9]. Concurrently, artificial intelligence technologies, exemplified by machine learning and deep learning, are progressively permeating various sectors of power systems, demonstrating promising applications in load forecasting and fault diagnosis [10]. However, these intelligent algorithms often exhibit opaque decision-making processes, making the enhancement of model interpretability a current research priority [10].

The large-scale grid integration of renewable energy represents a crucial pathway for optimizing and adjusting the energy structure [11,12]. However, the intermittency, volatility, and converter-based interfacing of renewable generation exert a significant impact on power quality within the grid. The output from renewable sources such as wind and photovoltaic power exhibits pronounced intermittency and volatility, readily causing fluctuations at grid connection points [13,14]. In severe instances, these fluctuations may result in voltage sags or swells. Meanwhile, a large number of power electronic devices, such as inverters and converters, serve as essential interfaces for renewable energy grid integration. During operation, these devices inject harmonic currents into the system, which exacerbates the problem of distorted voltage waveforms [15]. Moreover, the predominantly single-phase connection method employed by distributed power sources readily causes three-phase load imbalance within the grid [16]. This increases line losses and disrupts the quality and reliability of power supply to end users. These power quality issues rarely occur in isolation: voltage dips may co-occur with harmonic distortion, while fluctuations and three-phase imbalance can compound each other [17]. This complex interplay presents fresh challenges for conventional monitoring approaches.

With the increasing reliance on communication networks in modern power systems, cyber events targeting measurement and control infrastructures have become a critical concern [18]. Malicious activities, such as data injection attacks or command tampering, can alter sensor readings and relay states, thereby misleading system operators and threatening secure grid operation [19]. To address this challenge, existing studies have developed various detection methods, which range from rule-based intrusion detection to data-driven anomaly detection. Notably, in power quality monitoring, the integrity of disturbance-related measurements is equally vulnerable to cyberattacks [20]. Consequently, how to simultaneously detect power quality disturbances and identify potential cyber events has emerged as a noteworthy research direction.

The application of artificial intelligence technologies within power systems provides robust support for enhancing grid intelligence [21]. However, the "black box" nature of these models raises concerns about the transparency of their decision-making processes [22]. Whilst machine learning and deep learning algorithms demonstrate high accuracy in disturbance identification and fault diagnosis, their decisions are difficult to interpret [23,24]. In recent years, Explainable Artificial Intelligence (XAI) has emerged as a research focus, with studies attempting to reveal internal model mechanisms through visualisation analysis and feature evaluation [25]. Within the field of power quality disturbance analysis, researchers have explored applying explainability methods to disturbance identification, seeking to enhance result credibility while maintaining accuracy [26].

To address the above research gaps in power quality disturbance detection in smart grids, this paper proposes an Evolving Explainable Artificial Intelligence (E-XAI) framework. The framework uses a Genetic Algorithm (GA) to select the best subset of power quality features. This feature selection process reduces computational complexity and preserves model interpretability. With this approach, machine learning models can identify disturbances and provide clear explanations. The framework provides grid operators with insights into the key factors associated with disturbance events. The E-XAI framework also supports dynamic modeling based on electrical signal features. This enhances smart grid monitoring with improved real-time performance, efficiency, and interpretability.

The contributions of this paper are summarized in the following aspects:

Propose an E-XAI framework specifically for power quality disturbance detection and interpretation in smart grids.
Employ GA to automatically search for the optimal power quality feature subset, thereby reducing computational complexity while preserving model interpretability.
Experimentally demonstrate that the proposed E-XAI framework reduces the computational cost of XAI techniques while improving disturbance classification stability.
Integrate the E-XAI framework into smart grid monitoring systems to enable interpretable analysis of power quality events.

The rest of the paper is organized as follows. Section 2 reviews the related work. Section 3 provides the necessary background and preliminaries for understanding the proposed approach. Section 4 details the proposed E-XAI framework, including its components and workflow. Section 5 presents the experimental design and results to validate the effectiveness of the framework. Finally, Section 6 concludes the paper and discusses potential future research directions.

2. Related Work

In the field of power quality disturbance analysis in smart grids, it is not enough to simply detect the occurrence of disturbances. Understanding the characteristics and underlying causes of these disturbances has become increasingly important for ensuring reliable grid operation [27,28]. For example, in [29], Joaquín et al. review real-time detection and classification methods for power quality disturbances, focusing on voltage sags and notches. Their analysis indicates that transformation and classification techniques are well developed for offline use, but real-time implementation remains constrained by the computational limitations of embedded systems. In [30], M. S. Priyadarshini et al. employ wavelet packet analysis combined with energy-based feature extraction for voltage sag detection, comparing six mother wavelets to identify the most suitable one. Their results demonstrate that wavelet packet analysis improves time-frequency analysis and enables more effective feature extraction for disturbance identification. In [31], Kamran et al. review harmonic detection, suppression, and estimation techniques, comparing traditional methods with machine learning-based approaches. Their analysis shows that hybrid filters and adaptive methods provide better accuracy for dynamic harmonic conditions. These studies indicate that understanding disturbance characteristics is essential for developing effective detection methods, especially with increasing renewable energy integration.

On the other hand, machine learning and deep learning techniques have been widely applied to power quality disturbance detection and classification tasks, offering improved accuracy and automation over traditional methods [32]. In [33], Topaloglu develops a Convolutional Neural Network (CNN)-based approach with an attention model for power quality disturbance classification, achieving 99.92% accuracy on a nine-class dataset. Their attention model rescales input data to enhance feature learning, demonstrating the potential of combining deep learning with attention mechanisms for disturbance recognition. In [34], Fatma et al. develop a Long Short-Term Memory (LSTM)-based method for detecting and classifying power quality disturbances, including sag, swell, surge, distortion, and interruption. Their model achieved 96.37% accuracy even in the presence of random noise with 30-dB Signal-to-Noise Ratio (SNR), demonstrating the robustness of LSTM networks for power quality disturbance classification under noisy conditions. These studies demonstrate that deep learning methods, particularly CNN with attention mechanisms and LSTM networks, achieve high accuracy and robustness in power quality disturbance classification, even under noisy conditions.

However, despite the high accuracy of deep learning models, their black-box nature limits their adoption in safety-critical power systems [35]. Therefore, XAI has emerged as a promising direction to enhance the interpretability and trustworthiness of disturbance classification results. In [36], R. Machlev et al. review the application of XAI techniques in energy and power systems, identifying SHapley Additive exPlanations (SHAP) and Local Interpretable Model-agnostic Explanations (LIME) as the most widely used methods. Their analysis reveals that most studies combine traditional machine learning models with XAI. In contrast, deep learning models are rarely integrated with interpretability techniques. This points to a clear gap between high-performance black box models and the increasing demand for interpretability. In [37], Ahmet Cifci employs ten machine learning models combined with SHAP and Individual conditional expectation plots to predict decentralized smart grid stability, with the Artificial Neural Network (ANN) model achieving the highest Area Under the Curve (AUC) of 99.4%. Their SHAP analysis reveals that reaction time and nominal price are the most influential features, demonstrating how XAI can provide insights into model behavior and feature importance in power system applications [38,39]. These studies demonstrate that XAI techniques, particularly SHAP and LIME, can effectively enhance the interpretability of machine learning models in power system applications by revealing key features influencing model decisions [40]. However, research on applying XAI to deep learning-based power quality disturbance classification remains limited, highlighting the need for further investigation into explainable deep learning methods in this domain [41,42].

The development of efficient and explainable artificial intelligence methods for power quality disturbance monitoring remains an open research problem. Existing studies are relatively lacking in lightweight XAI frameworks specifically designed for the classification of power quality disturbances. This limitation hinders the advancement of real-time interpretable monitoring systems in smart grids. To address this research gap, this paper proposes an E-XAI framework for power quality disturbance detection and interpretation, aiming to enhance the efficiency and interpretability of disturbance analysis in smart grids.

3. Background and Preliminaries

3.1. Dataset

The dataset used in this study is collected from a Hardware-In-the-Loop (HIL) test bed. This test bed emulates a three-bus two-line transmission system with distance protection. It includes four Phasor Measurement Units (PMUs), four relays, two generators, and variable loads. The setup covers 37 power system scenarios, including natural events (e.g., line faults), normal operations, and cyber-attacks (e.g., command injection, relay disabling, data injection). Similar test bed configurations have been used in prior work to enable systematic evaluation of disturbance classification [43] and cyber-attack detection [44].

The raw data consists of time-synchronized measurements from four PMUs. Each PMU provides 29 features, including voltage/current phasors, sequence components, frequency, impedance, and relay status, giving 4 × 29 = 116 PMU features. Control panel logs, Snort alerts, and relay logs are also recorded and account for the remaining 12 columns, for a total of 128 features sampled at 120 Hz. The combination of these heterogeneous data sources has been shown to improve discrimination between natural events and cyber-attacks [45]. The loads range from 200 MW to 399 MW, and the fault locations span 10% to 90% of the line length, ensuring sufficient variability for model training.

Preprocessing includes merging multi-source logs, normalizing timestamps, and quantizing continuous measurements (e.g., current magnitude into normal/high ranges) while keeping binary features unchanged. In this study, the dataset is randomly split into training and testing subsets. This preprocessing follows established practices in power system data mining [46], where temporal state transitions are compressed to facilitate sequential pattern learning. The resulting structured dataset supports the subsequent genetic algorithm-based feature optimization and explainable AI analysis.

The feature space of the dataset is mainly derived from synchronized PMU measurements and relay-related indicators. Table 1 summarizes the main categories of electrical features recorded for each PMU/relay unit. As shown in Table 1, the dataset covers both phase-domain measurements and sequence-domain measurements, together with frequency- and impedance-related indicators. These features provide a comprehensive description of the electrical behavior of the system under different natural, normal, and attack conditions.

3.2. Explainable AI

SHAP is a game-theoretic method for interpreting machine learning models by assigning each feature a contribution score to a given prediction [47,48]. It is grounded in Shapley values, which fairly distribute the prediction outcome among features. SHAP follows an additive attribution form that satisfies properties such as local accuracy, missingness, and consistency. The explanation model $g$ is defined as:

$$g(s) = \psi_{0} + \sum_{j=1}^{P} \psi_{j} s_{j}, \tag{1}$$

where $s \in \{0, 1\}^{P}$ indicates the presence or absence of features, $P$ is the total number of features, and $\psi_{j} \in \mathbb{R}$ denotes the contribution of feature $j$. A mapping function $m_{x}(s) = x$ links simplified inputs to the original input space. The surrogate model is trained to approximate the original model $f$:

$$g(s) \approx f(m_{x}(s)), \tag{2}$$

with $s \approx x'$. The Shapley value for feature $x_{j}$ is computed as:

$$\psi_{j} = \sum_{T \subseteq F \setminus \{j\}} \frac{|T|! \, (|F| - |T| - 1)!}{|F|!} \left[ f(T \cup \{j\}) - f(T) \right], \tag{3}$$

where $F$ is the full feature set and $T$ is a subset excluding feature $j$. This formulation enables SHAP to identify key factors influencing predictions, such as those relevant to battery SOH degradation [49,50].
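To make Eq. (3) concrete, the following minimal sketch enumerates every subset $T$ and accumulates the weighted marginal contributions. It is an illustration only: the value function `toy_value` and the three feature names are hypothetical, and practical SHAP implementations rely on approximations because exact enumeration scales as $2^{|F|}$.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, features):
    """Exact Shapley values via Eq. (3); cost grows as 2^|F|, so toy sizes only."""
    n = len(features)
    psi = {}
    for j in features:
        others = [k for k in features if k != j]
        total = 0.0
        for size in range(n):  # |T| ranges over 0 .. |F| - 1
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                T = frozenset(subset)
                # Marginal contribution of feature j when added to subset T.
                total += weight * (value_fn(T | {j}) - value_fn(T))
        psi[j] = total
    return psi


def toy_value(T):
    """Hypothetical value function: additive scores plus one interaction term."""
    score = 0.0
    if "voltage" in T:
        score += 0.5
    if "current" in T:
        score += 0.25
    if "voltage" in T and "frequency" in T:
        score += 0.25  # interaction is shared equally between the two features
    return score


FEATURES = ["voltage", "current", "frequency"]
psi = shapley_values(toy_value, FEATURES)
```

In this toy case the interaction term is split evenly between `voltage` and `frequency`, and the contributions sum to the difference between the value of the full set and that of the empty set, which is the local accuracy property mentioned above.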

LIME aims to approximate a complex model $f$ locally using an interpretable model $g$ around a specific instance $x$ [51,52]. The interpretable model $g \in G$ is selected to be simple (e.g., linear) and human-understandable. A proximity function $\pi_{x}(z)$ defines the locality by measuring the similarity between a perturbed sample $z$ and the instance $x$. The objective function is:

$$\xi(x) = \arg\min_{g \in G} \; \mathcal{L}(f, g, \pi_{x}(z)) + \Omega(g), \tag{4}$$

where $\Omega(g)$ penalizes model complexity. The similarity function is typically defined as:

$$\pi_{x}(z) = \exp\left( -\frac{D(x, z)^{2}}{\tau^{2}} \right), \tag{5}$$

where $D(x, z)$ measures the distance between $x$ and $z$, and $\tau$ controls the locality. Based on this weighting, the objective can be rewritten as:

$$\xi(x) = \sum_{z, z' \in Z} \pi_{x}(z) \left( f(z) - g(z') \right)^{2}, \tag{6}$$

where $\pi_{x}(z)$ assigns higher importance to samples closer to $x$, enabling the interpretable model $g$ to approximate $f$ effectively in the local region.
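The weighted fit in Eqs. (5) and (6) can be sketched for a single feature as follows. This is a simplified illustration under stated assumptions: the black-box `black_box` is a hypothetical quadratic, $D$ is taken as the absolute difference, perturbations lie on a fixed symmetric grid rather than being sampled, and the surrogate is evaluated on $z$ directly instead of on a simplified representation $z'$.

```python
import math


def proximity(x, z, tau):
    """Exponential kernel of Eq. (5), with D(x, z) = |x - z| (assumption)."""
    return math.exp(-((x - z) ** 2) / tau ** 2)


def local_linear_surrogate(f, x, offsets, tau):
    """Fit g(z) = a + b*z by minimizing sum_z pi_x(z) * (f(z) - g(z))^2."""
    zs = [x + d for d in offsets]
    ws = [proximity(x, z, tau) for z in zs]
    ys = [f(z) for z in zs]
    w_sum = sum(ws)
    # Weighted means of the perturbed inputs and of the black-box outputs.
    z_bar = sum(w * z for w, z in zip(ws, zs)) / w_sum
    y_bar = sum(w * y for w, y in zip(ws, ys)) / w_sum
    # Closed-form weighted least squares for one feature.
    cov = sum(w * (z - z_bar) * (y - y_bar) for w, z, y in zip(ws, zs, ys))
    var = sum(w * (z - z_bar) ** 2 for w, z in zip(ws, zs))
    slope = cov / var
    intercept = y_bar - slope * z_bar
    return intercept, slope


def black_box(z):
    """Hypothetical nonlinear model standing in for f."""
    return z ** 2


offsets = [k / 10 for k in range(-10, 11)]  # symmetric grid around the instance
intercept, slope = local_linear_surrogate(black_box, 1.0, offsets, tau=0.5)
```

Around $x = 1$ the fitted slope matches the local derivative of the quadratic, which is the sense in which the linear surrogate explains the model only in the neighborhood weighted by $\pi_{x}$.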

3.3. Genetic Algorithms

Genetic Algorithms are population-based optimization methods inspired by evolutionary processes, where a set of candidate solutions is iteratively improved through selection and variation operators [53,54]. The procedure starts with a randomly initialized population and evolves it across generations by evaluating individuals according to a predefined fitness criterion.

Let $u$ denote the complete feature vector and $u^{\star}$ a selected subset. A pretrained classifier $h(\cdot)$ produces a softmax probability distribution over class labels. Define the reference label as $\tilde{y} = \arg\max h(u)$. The fitness of the subset $u^{\star}$ is formulated as:

$$F(u^{\star}) = h(u^{\star})_{\tilde{y}} - \alpha \cdot \left| h(u^{\star})_{\tilde{y}} - h(u)_{\tilde{y}} \right|, \tag{7}$$

where $h(u^{\star})_{\tilde{y}}$ represents the predicted probability of class $\tilde{y}$ using only the subset $u^{\star}$, and $\alpha \in [0, 1]$ controls the penalty on deviation from the full-input confidence. The first term encourages subsets that maintain strong class confidence, while the second term penalizes discrepancies from the original prediction. This design measures how effectively the selected features retain the model’s discriminative behavior.
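As a worked illustration of Eq. (7), using hypothetical numbers that are not taken from the experiments, suppose the full input yields $h(u)_{\tilde{y}} = 0.90$ and $\alpha = 0.5$. A subset that retains most of the confidence, $h(u^{\star})_{\tilde{y}} = 0.80$, scores

$$F(u^{\star}) = 0.80 - 0.5 \cdot |0.80 - 0.90| = 0.75,$$

whereas a subset that largely loses the reference class, $h(u^{\star})_{\tilde{y}} = 0.30$, scores

$$F(u^{\star}) = 0.30 - 0.5 \cdot |0.30 - 0.90| = 0.00.$$

When confidence falls below the full-input value, both terms lower the fitness. When it rises above that value, the penalty only partially offsets the gain, since the slope of $F$ with respect to $h(u^{\star})_{\tilde{y}}$ is $1 - \alpha \geq 0$.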

Compared with conventional gradient-based feature selection methods, this formulation is model-agnostic and more robust to noise and redundancy. It emphasizes preserving predictive consistency, thereby yielding more reliable functional feature subsets for explanation. The computed fitness guides the selection stage, where individuals with higher scores are more likely to be chosen as parents. Typical strategies include tournament selection and roulette-wheel sampling [55].
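Roulette-wheel sampling can be sketched with the standard library as below. The shift applied to the scores is an assumption introduced here: the fitness of Eq. (7) can be negative, while proportional sampling needs non-negative weights. Population entries and fitness values in the usage lines are hypothetical.

```python
import random


def roulette_wheel_select(population, fitnesses, rng):
    """Pick one individual with probability proportional to its shifted fitness."""
    # Shift so the worst individual has a tiny positive weight (assumption).
    low = min(fitnesses)
    weights = [f - low + 1e-9 for f in fitnesses]
    return rng.choices(population, weights=weights, k=1)[0]


rng = random.Random(0)
population = ["mask_a", "mask_b", "mask_c"]   # placeholders for binary masks
fitnesses = [-0.2, 0.0, 0.9]
parent = roulette_wheel_select(population, fitnesses, rng)
```

Compared with tournament selection, this scheme is sensitive to the scale of the fitness values, which is one reason a shift or ranking step is often applied first.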

Within the E-XAI framework, GA is employed to identify informative power quality feature subsets. By leveraging model prediction probabilities as the optimization signal, the algorithm efficiently discovers feature combinations that best preserve the original model behavior while enhancing interpretability.

4. Proposed Method

This section first introduces the overall framework of the proposed E-XAI method for power quality disturbance detection and interpretation, followed by the details of the components.

4.1. Overall Framework

The overall framework of the proposed E-XAI method is illustrated in Figure 1. The framework takes the power disturbance dataset as input and first performs feature extraction from raw electrical signals, including voltage, current, and frequency-related measurements. These extracted features are then encoded into an initial population, where each individual represents a candidate feature subset. This population serves as the starting point for the subsequent evolutionary optimization process. The goal of this stage is to transform high-dimensional raw measurements into structured representations suitable for both detection and interpretation tasks.

Within the evolving process, each individual is first decoded into an explainable feature subset and evaluated through a predefined fitness function. The fitness evaluation is guided by the performance of the detection model and the consistency of the explanation model, ensuring that the selected features not only maintain classification accuracy but also preserve interpretability. Based on the fitness scores, the population is iteratively updated through evolutionary operations, such as selection and variation, to gradually improve the quality of candidate solutions. This process continues until a stopping criterion is satisfied, resulting in an optimized set of evolved features.

Finally, the evolved features are jointly utilized by the detection model and the explanation model. The detection model performs disturbance classification, while the explanation model provides interpretable insights into the key contributing features. By integrating the evolutionary optimization process with explainable artificial intelligence techniques, the proposed framework achieves a balance between detection performance and interpretability, enabling efficient and transparent power disturbance analysis in smart grid environments.

4.2. GA Evolving Process

To improve both the effectiveness and interpretability of power quality disturbance detection, we design a GA-based evolving process to identify the most informative subset of power system features. Given a high-dimensional feature vector $x \in \mathbb{R}^{P}$ extracted from the Power System Attack Datasets, including voltage phase angles, voltage magnitudes, current measurements, frequency, impedance, and relay states, the objective is to select a compact subset $u^{*} \subseteq x$ that preserves the discriminative capability of the original feature space while reducing redundancy.

Each individual in the population is represented as a binary mask $m \in \{0, 1\}^{P}$, where $m_{j} = 1$ indicates that the $j$-th feature is selected. The masked feature vector is defined as:

$$x' = m \odot x, \tag{8}$$

where $\odot$ denotes element-wise multiplication. The reduced feature vector $x'$ is then fed into a pre-trained detection model $h(\cdot)$, which outputs a probability distribution over disturbance classes (e.g., natural events, normal operation, and attack events).

To guide the evolutionary search, we define a model-consistency-aware fitness function that evaluates how well a feature subset preserves the original model behavior:

$$F(m) = h(x')_{\tilde{y}} - \alpha \cdot \left| h(x')_{\tilde{y}} - h(x)_{\tilde{y}} \right|, \tag{9}$$

where $\tilde{y} = \arg\max h(x)$ is the predicted label using the full feature vector, and $\alpha \in [0, 1]$ controls the penalty for deviation. The first term encourages feature subsets that maintain high confidence, while the second term penalizes inconsistency with the original prediction. This formulation ensures that the evolving process is model-agnostic and aligned with the intrinsic decision logic of the detection model.
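A minimal sketch of Eqs. (8) and (9) is given below. The detector `toy_model` is a hypothetical two-class linear layer with softmax, used only so that the example runs on its own; any model returning class probabilities could take its place. Masked features are set to zero, following Eq. (8).

```python
import math


def softmax(logits):
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toy_model(x):
    """Hypothetical stand-in for the pre-trained detector h(.)."""
    weights = [
        [2.0, 0.0, 0.1, 0.0],  # class 0 relies mostly on feature 0
        [0.0, 1.5, 0.0, 0.1],  # class 1 relies mostly on feature 1
    ]
    return softmax([sum(w * v for w, v in zip(row, x)) for row in weights])


def fitness(model, x, mask, alpha=0.5):
    """Eq. (9): masked confidence minus a penalty for drifting from h(x)."""
    full = model(x)
    ref = max(range(len(full)), key=full.__getitem__)  # reference label from h(x)
    masked = model([m * v for m, v in zip(mask, x)])   # x' = m (element-wise) x
    return masked[ref] - alpha * abs(masked[ref] - full[ref])


x = [1.0, 0.2, 0.5, 0.3]
full_score = fitness(toy_model, x, [1, 1, 1, 1])   # no feature removed
key_only = fitness(toy_model, x, [1, 0, 0, 0])     # keep only the dominant feature
dropped_key = fitness(toy_model, x, [0, 1, 1, 1])  # remove the dominant feature
```

In this toy setting, removing the feature that drives the reference class lowers the fitness sharply, while a mask that keeps only that feature scores about as well as the full input.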

As described in Algorithm 1, the fitness of each individual is computed by applying the mask $m$ to $x$ and evaluating $F(m)$. The evolutionary process, summarized in Algorithm 2, iteratively updates the population to maximize the fitness function. Specifically, an initial population $\mathcal{P}^{(0)} = \{m_{i}\}_{i=1}^{s}$ is randomly generated to ensure diversity. At each generation $t$, individuals are selected based on their fitness values using tournament selection:

$$m_{\mathrm{parent}} = \arg\max_{m_{i} \in T} F(m_{i}), \tag{10}$$

where $T$ denotes a randomly sampled subset of the population.

Offspring individuals are generated through crossover and mutation operations:

$$m_{\mathrm{child}} = \mathrm{Mutation}(\mathrm{Crossover}(m_{i}, m_{j})). \tag{11}$$

The new population $\mathcal{P}^{(t+1)}$ is then formed by replacing the previous generation while preserving the best individual:

$$m^{\mathrm{best}} = \arg\max_{m \in \mathcal{P}} F(m). \tag{12}$$

The evolutionary process continues until convergence or reaching the maximum number of generations. The final optimal mask $m^{*}$ defines the evolved feature subset:

$$u^{*} = m^{*} \odot x. \tag{13}$$

This subset represents the most informative combination of power system measurements for disturbance detection and interpretation, enabling reduced computational cost and enhanced explainability.
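The loop formed by Eqs. (10) to (13) can be sketched as follows. Several details are assumptions made for illustration, since the text does not fix them: single-point crossover, bit-flip mutation, a tournament size of 3, and the hypothetical `toy_fitness`, which stands in for $F(m)$ of Eq. (9). The default population size, generation count, and rates follow the configuration reported in Section 5.1.

```python
import random


def evolve_mask(fitness_fn, n_features, pop_size=40, generations=25,
                crossover_rate=0.8, mutation_rate=0.08,
                tournament_size=3, seed=0):
    """Evolve a binary feature mask that maximizes fitness_fn (needs n_features >= 2)."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]

    def tournament(scored):
        # Eq. (10): best individual within a random subset of the population.
        return max(rng.sample(scored, tournament_size), key=lambda p: p[0])[1]

    best_score, best_mask = float("-inf"), None
    for generation in range(generations + 1):
        scored = [(fitness_fn(m), m) for m in population]
        gen_score, gen_best = max(scored, key=lambda p: p[0])
        if gen_score > best_score:
            best_score, best_mask = gen_score, gen_best[:]
        if generation == generations:
            break  # final population evaluated; no further offspring needed
        children = [best_mask[:]]  # elitism: carry the best mask over, Eq. (12)
        while len(children) < pop_size:
            a, b = tournament(scored), tournament(scored)
            if rng.random() < crossover_rate:
                cut = rng.randint(1, n_features - 1)  # single-point crossover
                child = a[:cut] + b[cut:]
            else:
                child = a[:]
            # Bit-flip mutation applied independently to every gene, Eq. (11).
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        population = children
    return best_mask, best_score


TARGET = [1, 0, 1, 0, 0, 1, 0, 0]  # hypothetical "ideal" mask for the demo


def toy_fitness(mask):
    """Hypothetical stand-in for F(m): fraction of bits agreeing with TARGET."""
    return sum(int(a == b) for a, b in zip(mask, TARGET)) / len(TARGET)


mask, score = evolve_mask(toy_fitness, n_features=len(TARGET))
```

Because the best mask is copied unchanged into every new generation, the best fitness never decreases, which is consistent with the plateau behavior expected of an elitist GA.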

Algorithm 1: Fitness Evaluation

Input: $h(\cdot)$, $x$, $m$
Output: $F(m)$
1: $x' \leftarrow m \odot x$
2: $\tilde{y} \leftarrow \arg\max h(x)$
3: $F(m) \leftarrow h(x')_{\tilde{y}} - \alpha \left| h(x')_{\tilde{y}} - h(x)_{\tilde{y}} \right|$
4: return $F(m)$

Algorithm 2: GA-based Feature Evolution

4.3. Computational Complexity Analysis

In the proposed framework for power quality disturbance detection and interpretation, the computational cost is mainly dominated by the GA-based feature evolving process and the repeated evaluation of the detection model. Let $x \in \mathbb{R}^{P}$ denote the original power system feature vector extracted from the Power System Attack Datasets, where $P$ represents the total number of features, including voltage, current, frequency, impedance, and relay-related measurements.

Each candidate solution is encoded as a binary vector $m \in \{0, 1\}^{P}$, which defines a reduced feature representation:

$$x' = m \odot x. \tag{14}$$

The evaluation of each individual requires a forward inference of the detection model $h(\cdot)$, which outputs class probabilities over disturbance categories. Let $C_{h}$ denote the computational cost of a single model inference. Then, the complexity of evaluating one individual is $O(C_{h})$.

Given a population size $s$ and maximum number of generations $e_{\max}$, the total cost of fitness evaluation across the evolutionary process can be expressed as:

$$O(s \cdot e_{\max} \cdot C_{h}). \tag{15}$$

Since the fitness function

$$F(m) = h(x')_{\tilde{y}} - \alpha \left| h(x')_{\tilde{y}} - h(x)_{\tilde{y}} \right| \tag{16}$$

relies only on model outputs, it does not introduce additional significant computational overhead beyond inference, ensuring that the evaluation remains efficient and model-agnostic.

The genetic operators further contribute to the overall complexity. Tournament selection requires comparing fitness values within a subset $T$, resulting in $O(s)$ operations per generation. Crossover and mutation are applied element-wise on binary masks, leading to a cost of $O(s \cdot P)$ per generation. Therefore, the total complexity of the GA update process can be approximated as:

$$O(e_{\max} \cdot (s \cdot P + s)). \tag{17}$$

Combining both evaluation and evolution steps, the overall computational complexity of the GA-based feature selection process is:

$$O(s \cdot e_{\max} \cdot (C_{h} + P)). \tag{18}$$

In practice, since $C_{h} \gg P$ for most deep or machine learning models, the total cost is primarily dominated by repeated model inference. However, whereas an exhaustive feature search must evaluate all $2^{P}$ candidate subsets, i.e., $O(2^{P})$ model inferences, the proposed GA-based approach evaluates only $s \cdot e_{\max}$ candidates, so its cost grows linearly in both $s$ and $e_{\max}$, making it scalable for high-dimensional power system data. Furthermore, by selecting a compact subset $u^{*} = m^{*} \odot x$, the framework reduces the input dimensionality for both detection and explanation modules, thereby lowering the computational burden in downstream tasks.
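To give a sense of scale, the GA configuration reported in Section 5.1 ($s = 40$, $e_{\max} = 25$) and the $P = 128$ features of the dataset can be substituted into these expressions. This is back-of-the-envelope arithmetic rather than a measured result:

$$s \cdot e_{\max} = 40 \times 25 = 1000 \ \text{fitness evaluations per GA run}, \qquad 2^{P} = 2^{128} \approx 3.4 \times 10^{38} \ \text{candidate masks}.$$

The operator cost in Eq. (17) is likewise modest at this size: $s \cdot P = 40 \times 128 = 5120$ bit-level operations per generation for crossover and mutation.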

5. Experimental Design and Results

5.1. Experimental Setup

Experiments are conducted on a power system disturbance dataset described in Section 3. The dataset contains heterogeneous measurements from multiple subsystems, including voltage, current, frequency, impedance, and relay status signals. These measurements are aggregated into structured feature vectors, forming a high-dimensional representation of system behavior. To construct a unified dataset, multiple data files are merged into a single feature space. Each sample is associated with a binary label, where disturbance events (including cyber-attacks and abnormal system conditions) are classified against normal operating states. The dataset is randomly split into training and testing subsets using an 80:20 ratio, ensuring balanced class distributions. A standard preprocessing pipeline is applied prior to model training. Specifically, missing values are handled using median imputation, and all features are normalized using standard scaling to eliminate magnitude differences across measurement types.
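The preprocessing steps named above (median imputation, standard scaling, and an 80:20 random split) can be sketched with the standard library as follows. The helper names and the toy column are hypothetical, missing values are represented as `None` by assumption, and the split shown is a plain shuffle without stratification.

```python
import random
import statistics


def median_impute(column):
    """Replace None entries with the median of the observed values."""
    med = statistics.median(v for v in column if v is not None)
    return [med if v is None else v for v in column]


def standard_scale(column):
    """Rescale to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column) or 1.0  # guard against constant columns
    return [(v - mean) / std for v in column]


def train_test_split(rows, test_ratio=0.2, seed=0):
    """Shuffle a copy of the rows and cut it at the requested ratio."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * (1 - test_ratio)))
    return shuffled[:cut], shuffled[cut:]


column = [1.0, None, 3.0, 5.0]          # one feature with a missing reading
imputed = median_impute(column)
scaled = standard_scale(imputed)
train, test = train_test_split(list(range(10)))
```

A common precaution, not stated in the text, is to compute the median and the scaling statistics on the training split only and then apply them to the test split, so that no test information leaks into preprocessing.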

To enhance interpretability, the proposed framework integrates a GA method to identify a compact and informative subset of features. Each candidate solution is encoded as a binary mask:

$$m \in \{0, 1\}^{P}, \tag{19}$$

where $P$ denotes the total number of features. The reduced feature vector is defined as:

$$x' = m \odot x, \tag{20}$$

where $\odot$ represents element-wise multiplication.

The fitness function is designed to preserve the predictive behavior of the original model while minimizing feature redundancy:

$$F(m) = h(x')_{\tilde{y}} - \alpha \cdot \left| h(x')_{\tilde{y}} - h(x)_{\tilde{y}} \right|, \tag{21}$$

where $\tilde{y}$ is the predicted class using the full feature set, and $h(\cdot)$ denotes the model output.

The GA evolves feature subsets through selection, crossover, and mutation operations. The configuration used in this study is as follows:

Population size: 40
Number of generations: 25
Crossover rate: 0.8
Mutation rate: 0.08

For each test instance, the GA searches for a feature subset that maximizes the fitness function. The resulting subsets are aggregated across multiple samples to construct a global reduced feature set. This reduced feature space is then used for subsequent model training and explainability analysis.
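The text does not specify how the per-instance subsets are aggregated. One plausible rule, shown here purely as an assumption, keeps every feature that is selected in at least a given fraction of the per-instance masks. The function name, the threshold `min_frequency`, and the example masks are all hypothetical.

```python
def aggregate_masks(masks, min_frequency=0.5):
    """Return indices of features selected in at least min_frequency of the masks."""
    n = len(masks)
    # Column-wise sums give how often each feature was selected.
    counts = [sum(column) for column in zip(*masks)]
    return [j for j, c in enumerate(counts) if c / n >= min_frequency]


per_instance_masks = [
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
]
global_subset = aggregate_masks(per_instance_masks)
```

Raising `min_frequency` yields a smaller, more conservative global subset, while lowering it keeps features that matter only for a minority of instances.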

To evaluate the effectiveness of the proposed framework, four representative machine learning models are employed for disturbance detection: Random Forest (RF), k-Nearest Neighbors (KNN), Multi-Layer Perceptron (MLP), and AdaBoost. All models are implemented within a unified preprocessing pipeline, where missing values are handled using median imputation and features are normalized using standard scaling to ensure consistency across heterogeneous measurements. For model interpretability, two widely adopted explainability techniques are utilized, namely SHAP and LIME. SHAP provides both global and local feature importance based on Shapley values, enabling consistent attribution of feature contributions, while LIME explains individual predictions by approximating the model behavior with locally interpretable surrogate models. In the baseline setting, SHAP and LIME are applied directly to the full feature space. Under the proposed E-XAI framework, both methods are applied to the reduced feature subset obtained through the GA-based optimization process, resulting in more compact and interpretable explanations.

5.2. Detection Performance

Table 2 presents the detection performance of different models under the reduced feature setting obtained by the proposed E-XAI framework. The results show that the Random Forest (RF) model achieves the best performance, with an accuracy of 0.9238 and an AUC of 0.9756, indicating strong discriminative capability even after feature reduction. The KNN model achieves moderate performance, with an accuracy of 0.8396 and an AUC of 0.9009, demonstrating that distance-based methods can still capture meaningful patterns in the reduced feature space. The MLP model shows slightly lower performance, with an accuracy of 0.8002 and an AUC of 0.8484, suggesting that neural networks may be more sensitive to feature reduction due to their reliance on higher-dimensional representations. In contrast, AdaBoost exhibits the lowest performance among the evaluated models, with an accuracy of 0.7102 and an AUC of 0.6259. This indicates that boosting-based methods may be more affected by the removal of redundant features, particularly when the remaining features are less linearly separable. From a computational perspective, KNN has the lowest training cost (1.70 seconds), while MLP requires the highest training time (537.72 seconds). Random Forest achieves a favorable balance between performance and computational cost, making it the most suitable model for subsequent explainability analysis.

5.3. GA Fitness Analysis

Figure 2 illustrates the convergence behavior of the GA used in the proposed E-XAI framework. The fitness value shows a rapid increase during the early generations, followed by a gradual stabilization as the evolution progresses. Specifically, the best fitness improves from approximately 0.64 to 0.79 within the first 6 generations, indicating that informative feature subsets are quickly identified. After this stage, the fitness curve enters a plateau phase, suggesting convergence to a stable solution.

The mean fitness follows a similar trend, steadily increasing across generations while remaining below the best fitness, which indicates consistent population improvement and an effective exploration-exploitation balance. The relatively small gap between the best and mean fitness values in later generations further supports the convergence of the GA.

The impact of GA-based feature selection on detection performance is summarized in Table 3. The reduced feature set maintains comparable performance across all models. For instance, the RF model achieves an accuracy of 0.9238 after feature reduction, compared with a baseline accuracy of 0.9229, that is, a marginal gain of 0.0009 rather than a drop. Similarly, the accuracy of KNN and MLP changes by at most 0.0019, while AdaBoost exhibits a slight decrease in AUC, from 0.6430 to 0.6259.

In terms of computational efficiency, the GA-based feature reduction lowers the training time of three of the four models. The training time is reduced by approximately 20.6 seconds for RF, by over 61 seconds for MLP, and by approximately 20.6 seconds for AdaBoost. KNN shows a slight increase of 0.38 seconds, but its computational cost remains negligible compared to the other models. These results indicate that the selected feature subset preserves discriminative information while reducing computational overhead.
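The deltas discussed for Table 3 can be re-derived from the rounded table entries with a few lines of Python. This is only an arithmetic check, assuming that each delta is defined as baseline minus reduced (which matches the signs in the table). The dictionary and function names are illustrative, and last-digit differences from the printed deltas are expected because the table entries are rounded.

```python
# (accuracy, AUC, training time in seconds), copied from Table 3
baseline = {
    "RF": (0.9229, 0.9756, 273.60),
    "KNN": (0.8377, 0.8983, 1.32),
    "MLP": (0.8007, 0.8504, 599.07),
    "AdaBoost": (0.7107, 0.6430, 228.43),
}
reduced = {
    "RF": (0.9238, 0.9756, 252.99),
    "KNN": (0.8396, 0.9009, 1.70),
    "MLP": (0.8002, 0.8484, 537.72),
    "AdaBoost": (0.7102, 0.6259, 207.85),
}

def deltas(model):
    """Baseline-minus-reduced differences: positive d_time means the reduced set trains faster."""
    (acc_b, auc_b, t_b), (acc_r, auc_r, t_r) = baseline[model], reduced[model]
    return {
        "d_acc": acc_b - acc_r,
        "d_auc": auc_b - auc_r,
        "d_time": t_b - t_r,
        "time_saved_pct": 100.0 * (t_b - t_r) / t_b,
    }

for name in baseline:
    d = deltas(name)
    print(f"{name:9s} dAcc={d['d_acc']:+.4f} dAUC={d['d_auc']:+.4f} "
          f"dTime={d['d_time']:+.2f}s ({d['time_saved_pct']:+.1f}%)")
```

A negative accuracy delta (as for RF and KNN) means the reduced feature set is slightly more accurate than the baseline, and a negative time delta (as for KNN) means training became slower.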

5.4. Explanation Analysis

This section provides a detailed analysis of model interpretability by comparing baseline explanations with those obtained under the proposed E-XAI framework. Both SHAP and LIME are used to evaluate the quality, consistency, and sparsity of explanations.

Figure 3 compares LIME explanations for a representative sample before and after applying E-XAI. In the baseline setting, the explanation involves a relatively large number of features, with contributions distributed across multiple variables. Although LIME identifies locally important features, the presence of many small-magnitude contributions makes it difficult to distinguish truly dominant factors. This results in a dense and less interpretable explanation. In contrast, after applying E-XAI, the explanation becomes significantly more compact. Only a small subset of features contributes to the prediction, and their effects are more clearly separated into positive and negative influences. This indicates that the GA-based feature selection effectively removes redundant features while preserving the key decision structure. As a result, the explanation is not only shorter but also more meaningful, as it highlights the most influential features without distraction from irrelevant variables.

The SHAP summary plots in Figure 4 provide a global view of feature importance across the dataset. In the baseline model, feature contributions are distributed across multiple variables, indicating that the model relies on a high-dimensional and partially redundant feature space. Many features exhibit low-magnitude SHAP values, suggesting limited contribution to the final decision. After applying E-XAI, the SHAP summary plot shows a clear concentration of importance on a smaller subset of features. The dispersion of SHAP values is reduced, and dominant features become more distinguishable. This indicates that the reduced feature set captures the essential predictive structure while eliminating irrelevant information.
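One way to put a number on the "concentration of importance" described above is to measure how much of the total absolute attribution is carried by the top-k features, together with a count of features whose attribution exceeds a small tolerance. The sketch below is illustrative only. The two metrics, the tolerance, and the attribution vectors are hypothetical and are not values or measures reported in the paper.

```python
def top_k_share(attributions, k):
    """Fraction of total |attribution| carried by the k largest-magnitude features."""
    mags = sorted((abs(a) for a in attributions), reverse=True)
    total = sum(mags)
    return sum(mags[:k]) / total if total else 0.0

def n_active(attributions, tol=0.01):
    """Number of features whose |attribution| exceeds tol (a simple sparsity count)."""
    return sum(abs(a) > tol for a in attributions)

# Hypothetical per-feature attributions (e.g. mean |SHAP| or LIME weights), not paper data.
dense = [0.10, -0.08, 0.07, 0.06, -0.05, 0.05, 0.04, -0.03, 0.02, 0.005]
compact = [0.30, -0.15, 0.04, 0.01]

dense_share = top_k_share(dense, 3)
compact_share = top_k_share(compact, 3)
```

For these toy vectors the compact explanation places a larger share of its attribution on its top three features and has fewer active features, which is the qualitative pattern the summary plots are described as showing.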

The dependence plots in Figure 5 further highlight this difference. In the baseline model, the relationship between feature values and SHAP contributions appears scattered and less structured, reflecting interactions across many features. In contrast, the E-XAI model exhibits more coherent and interpretable patterns, where the influence of feature values on the prediction becomes clearer and more stable.

The combined results from LIME and SHAP demonstrate a consistent trend: the proposed E-XAI framework transforms the model from a high-dimensional, distributed reasoning mechanism into a compact and structured decision system. In the baseline setting, explanations are often dispersed across many features, making it difficult to interpret the underlying decision logic. After feature optimization, explanations become more focused, highlighting only the most relevant features. This improvement is achieved without sacrificing predictive performance, as demonstrated in previous sections. Therefore, the proposed E-XAI framework not only reduces feature redundancy but also enhances explanation quality, making the model more suitable for practical applications where interpretability is critical.

6. Conclusion

This paper has proposed an E-XAI framework for real-time power quality disturbance detection and interpretation in smart grids. The framework integrates a GA to automatically select an optimal subset of power quality features, thereby reducing computational complexity while preserving the interpretability of the underlying machine learning models. By focusing on the most informative features, the proposed method enables both efficient disturbance classification and transparent explanation of the contributing factors. Experimental results on a realistic power system disturbance dataset demonstrate that the E-XAI framework achieves competitive detection performance while significantly reducing feature dimensionality. Furthermore, explanation analyses using SHAP and LIME confirm that the reduced feature space yields more compact and interpretable explanations, with dominant factors clearly distinguishable from irrelevant variables. These results indicate that the proposed framework effectively addresses the trade-off between model accuracy, computational efficiency, and interpretability.

Despite these promising results, several limitations remain. The dataset used in this study, while comprehensive, is derived from a simulated test bed; real-world power system data may introduce additional noise and variability. Moreover, the current framework relies on a pre-trained detection model, and the GA optimization is performed offline. Future work will focus on extending the E-XAI framework to handle online feature adaptation and incremental learning to accommodate dynamic changes in smart grid environments. Additionally, integrating the framework with other explainability techniques, such as counterfactual explanations or attention-based mechanisms, and validating it on larger-scale, real-world datasets are important directions for further research. Overall, this work provides a practical pathway toward lightweight, interpretable, and real-time power quality monitoring, contributing to more resilient and trustworthy smart grid operations.

Acknowledgments

Declaration of generative AI and AI-assisted technologies in the writing process. During the preparation of this work, the authors used Grammarly and Claude to improve the readability and language of the work. After using these tools, the authors reviewed and edited the content as needed and take full responsibility for the content of the publication.

References

Saber, A.M.; Youssef, A.; Svetinovic, D.; Zeineldin, H.; Kundur, D.; El-Saadany, E. Enhancing power quality event classification with AI transformer models. In Proceedings of the 2024 IEEE Power & Energy Society General Meeting (PESGM). IEEE, 2024, pp. 1–5.
Dong, M.; Li, H.; Yin, S.; Lin, J.; Zhang, H.; Zeng, Z.; Cheng, Y. A LLM-Guided Approach for Active-Clamped Flyback Converter of Power Processing Units in Electric Propulsion. IET Electric Power Applications 2026, 20, e70154.
Dehaghani, M.N.; Korõtko, T.; Rosin, A. AI applications for power quality issues in distribution systems: A systematic review. IEEE Access 2025, 13, 18346–18365. [CrossRef]
Cui, C.; Duan, Y.; Hu, H.; Wang, L.; Liu, Q. Detection and classification of multiple power quality disturbances using Stockwell transform and deep learning. IEEE Transactions on Instrumentation and Measurement 2022, 71, 1–12. [CrossRef]
Liu, X.; Wu, Y.; Zhang, H.; Xue, Y.; Wang, L.; Li, H.; Yin, S. Impact of Position Measurement Angle Error on Performance of PMSM Drives for Electric Power Steering in a Wide Speed Range. IET Electric Power Applications 2026, 20, e70156. [CrossRef]
Ge, L.; Zhang, B.; Huang, W.; Li, Y.; Hou, L.; Xiao, J.; Mao, Z.; Li, X. A review of hydrogen generation, storage, and applications in power system. Journal of Energy Storage 2024, 75, 109307. [CrossRef]
Zhang, Z.; Damiani, E.; Hamadi, H.; Yeun, C.; Taher, F. A late multi-modal fusion model for detecting hybrid spam e-mail. International Journal of Computer Theory and Engineering 2023, 15, 76–81. [CrossRef]
Islam, M.M.; Yu, T.; Giannoccaro, G.; Mi, Y.; La Scala, M.; Nasab, M.R.; Wang, J. Improving reliability and stability of the power systems: A comprehensive review on the role of energy storage systems to enhance flexibility. IEEE Access 2024, 12, 152738–152765. [CrossRef]
Liu, Y.; Zhou, F.; Yang, S.; Guan, M.; Liu, C. Research on Peak-to-Average Power Ratio Control Method for Switched Reluctance Pulse Generator. IET Electric Power Applications 2026, 20, e70144. [CrossRef]
Forootan, M.M.; Larki, I.; Zahedi, R.; Ahmadi, A. Machine learning and deep learning in energy systems: A review. Sustainability 2022, 14, 4832. [CrossRef]
Cavus, M. Advancing power systems with renewable energy and intelligent technologies: A comprehensive review on grid transformation and integration. Electronics 2025, 14, 1159. [CrossRef]
Zhang, Z.; Hu, J.; Pota, H.; Kermanshahi, S.K.; Turnbull, B.; Damiani, E.; Yeun, C.Y. Experimental demonstration of risks and influences of cyber attacks on wireless communication in microgrids. In Proceedings of the 2024 21st Annual International Conference on Privacy, Security and Trust (PST). IEEE, 2024, pp. 1–5.
Ejuh Che, E.; Roland Abeng, K.; Iweh, C.D.; Tsekouras, G.J.; Fopah-Lele, A. The impact of integrating variable renewable energy sources into grid-connected power systems: Challenges, mitigation strategies, and prospects. Energies 2025, 18, 689. [CrossRef]
Guo, F.; Zhang, Z.; Mo, H.; Li, C. A Method for Battery SoH Estimation Based on K-means and LightGBM algorithm. In Proceedings of the 2024 6th International Conference on System Reliability and Safety Engineering (SRSE), 2024, pp. 1–7. [CrossRef]
Kaur, J.; Bath, S.K. Harmonic distortion in power systems due to electronic control and renewable energy integration: a comprehensive review. Discover Electronics 2025, 2, 67. [CrossRef]
Razmi, D.; Lu, T.; Papari, B.; Akbari, E.; Fathi, G.; Ghadamyari, M. An overview on power quality issues and control strategies for distribution networks with the presence of distributed generation resources. IEEE Access 2023, 11, 10308–10325. [CrossRef]
Senol, M.; Bayram, I.S.; Naderi, Y.; Galloway, S. Electric vehicles under low temperatures: A review on battery performance, charging needs, and power grid impacts. IEEE Access 2023, 11, 39879–39912. [CrossRef]
Peng, C.; Sun, H.; Yang, M.; Wang, Y.L. A survey on security communication and control for smart grids under malicious cyber attacks. IEEE Transactions on Systems, Man, and Cybernetics: Systems 2019, 49, 1554–1569. [CrossRef]
Shees, A.; Tariq, M.; Sarwat, A.I. Cybersecurity in smart grids: Detecting false data injection attacks utilizing supervised machine learning techniques. Energies 2024, 17, 5870. [CrossRef]
Syrmakesis, A.D.; Alhelou, H.H.; Hatziargyriou, N.D. A novel cyberattack-resilient frequency control method for interconnected power systems using SMO-based attack estimation. IEEE Transactions on Power Systems 2023, 39, 5672–5686.
Alhamrouni, I.; Abdul Kahar, N.H.; Salem, M.; Swadi, M.; Zahroui, Y.; Kadhim, D.J.; Mohamed, F.A.; Alhuyi Nazari, M. A comprehensive review on the role of artificial intelligence in power system stability, control, and protection: Insights and future directions. Applied Sciences 2024, 14, 6214. [CrossRef]
Hassija, V.; Chamola, V.; Mahapatra, A.; Singal, A.; Goel, D.; Huang, K.; Scardapane, S.; Spinelli, I.; Mahmud, M.; Hussain, A. Interpreting black-box models: a review on explainable artificial intelligence. Cognitive Computation 2024, 16, 45–74. [CrossRef]
Yu, J.; Zhang, Y. Challenges and opportunities of deep learning-based process fault detection and diagnosis: a review. Neural Computing and Applications 2023, 35, 211–252.
Zhang, Z.; Turnbull, B.; Kermanshahi, S.K.; Pota, H.; Hu, J. SDN-MG25: A Comprehensive Dataset for Cybersecurity Analysis in Software Defined Networking-Enabled Microgrid Systems. IEEE Open Journal of the Computer Society 2026, 7, 26–36. [CrossRef]
Minh, D.; Wang, H.X.; Li, Y.F.; Nguyen, T.N. Explainable artificial intelligence: a comprehensive review. Artificial Intelligence Review 2022, 55, 3503–3568.
Bin Akter, S.; Sarkar Pias, T.; Rahman Deeba, S.; Hossain, J.; Abdur Rahman, H. Ensemble learning based transmission line fault classification using phasor measurement unit (PMU) data with explainable AI (XAI). PLoS ONE 2024, 19, e0295144. [CrossRef]
Yan, Y.; Chen, K.; Geng, H.; Fan, W.; Zhou, X. A Review on Intelligent Detection and Classification of Power Quality Disturbances: Trends, Methodologies, and Prospects. Computer Modeling in Engineering & Sciences (CMES) 2023, 137. [CrossRef]
Zhang, Z.; Turnbull, B.; Kermanshahi, S.K.; Pota, H.; Hu, J. UNSW-MG24: A heterogeneous dataset for cybersecurity analysis in realistic microgrid systems. IEEE Open Journal of the Computer Society 2025. [CrossRef]
Caicedo, J.E.; Agudelo-Martínez, D.; Rivas-Trujillo, E.; Meyer, J. A systematic review of real-time detection and classification of power quality disturbances. Protection and Control of Modern Power Systems 2023, 8, 1–37. [CrossRef]
Priyadarshini, M.; Bajaj, M.; Zaitsev, I. Energy feature extraction and visualization of voltage sags using wavelet packet analysis for enhanced power quality monitoring. Scientific Reports 2025, 15, 2226. [CrossRef]
Daniel, K.; Kütt, L.; Iqbal, M.N.; Shabbir, N.; Raja, H.A.; Sardar, M.U. A review of harmonic detection, suppression, aggregation, and estimation techniques. Applied Sciences 2024, 14, 10966. [CrossRef]
Ravi, T.; Srividya, S.; Anil, V.; Jayaprakash, S.; et al. Review of detection and classification of power quality disturbances using machine learning and deep learning methods. In Proceedings of the 2023 Innovations in Power and Advanced Computing Technologies (i-PACT). IEEE, 2023, pp. 1–8.
Topaloglu, I. Deep learning based a new approach for power quality disturbances classification in power transmission system. Journal of Electrical Engineering & Technology 2023, 18, 77–88.
Dekhandji, F.Z.; Recioui, A.; Ladada, A.; Moulay Brahim, T.S. Detection and classification of power quality disturbances using LSTM. Engineering Proceedings 2023, 29, 2. [CrossRef]
Aygul, K.; Aksoy, N.; Kucuktezcan, F.; Genc, I. From black box to decision support: An interpretable clustering-based ensemble for dynamic security assessment in modern power systems. Knowledge-Based Systems 2026, 338, 115490. [CrossRef]
Machlev, R.; Heistrene, L.; Perl, M.; Levy, K.Y.; Belikov, J.; Mannor, S.; Levron, Y. Explainable Artificial Intelligence (XAI) techniques for energy and power systems: Review, challenges and opportunities. Energy and AI 2022, 9, 100169. [CrossRef]
Cifci, A. Interpretable prediction of a decentralized smart grid based on machine learning and explainable artificial intelligence. IEEE Access 2025.
Nand, K.; Zhang, Z.; Hu, J. A Comprehensive Survey on the Usage of Machine Learning to Detect False Data Injection Attacks in Smart Grids. IEEE Open Journal of the Computer Society 2025, 6, 1121–1132. [CrossRef]
Zhang, Z.; Turnbull, B.; Kermanshahi, S.K.; Pota, H.; Damiani, E.; Yeun, C.Y.; Hu, J. A survey on resilient microgrid system from cybersecurity perspective. Applied Soft Computing 2025, 175, 113088. [CrossRef]
Wang, Z.; Zhang, Z.; Al Hammadi, A.Y.; Huang, X.; Guo, F.; Damiani, E.; Yeun, C.Y.; Li, L. Evolving Explainable Artificial Intelligence for electroencephalography-based mental health classification in digital twin systems. Ad Hoc Networks 2025, p. 103964. [CrossRef]
Guo, F.; Xu, K.; Zhang, Z.; Zhou, H.; Chen, G.; Hu, J.; Zhang, J.; Mo, H. Battery SOH Prediction Under Different Conditions via MBLSTM and iTransformer With Anomaly Detection and Explainability. IEEE Open Journal of the Computer Society 2025, 6, 1847–1857. [CrossRef]
Guo, F.; Zhang, Z.; Ma, X.; Li, L.; Zhou, H.; Li, C.; Mo, H. An Interpretable TCN–Transformer Framework for Lithium-Ion Battery State of Health Estimation Using SHAP Analysis. Quality and Reliability Engineering International 2026.
Pan, S.; Morris, T.; Adhikari, U. Classification of disturbances and cyber-attacks in power systems using heterogeneous time-synchronized data. IEEE Transactions on Industrial Informatics 2015, 11, 650–662. [CrossRef]
Pan, S.; Morris, T.H.; Adhikari, U. A specification-based intrusion detection framework for cyber-physical environment in electric power system. Int. J. Netw. Secur. 2015, 17, 174–188.
Pan, S.; Morris, T.; Adhikari, U. Developing a hybrid intrusion detection system using data mining for power systems. IEEE Transactions on Smart Grid 2015, 6, 3104–3113. [CrossRef]
Hink, R.C.B.; Beaver, J.M.; Buckner, M.A.; Morris, T.; Adhikari, U.; Pan, S. Machine learning for power system disturbance and cyber-attack discrimination. In Proceedings of the 2014 7th International symposium on resilient control systems (ISRCS). IEEE, 2014, pp. 1–8.
Zhang, Z.; Hamadi, H.A.; Damiani, E.; Yeun, C.Y.; Taher, F. Explainable Artificial Intelligence Applications in Cyber Security: State-of-the-Art in Research. IEEE Access 2022, 10, 93104–93139. [CrossRef]
Huang, X.; Zhang, Z.; Guo, F.; Wang, X.; Chi, K.; Wu, K. Research on older adults’ interaction with e-health interface based on explainable artificial intelligence. In Proceedings of the International Conference on Human-Computer Interaction. Springer, 2024, pp. 38–52.
Zhang, Z.; Umar, S.; Hammadi, A.Y.A.; Yoon, S.; Damiani, E.; Ardagna, C.A.; Bena, N.; Yeun, C.Y. Explainable Data Poison Attacks on Human Emotion Evaluation Systems Based on EEG Signals. IEEE Access 2023, 11, 18134–18147. [CrossRef]
Lee, S.; Kwon, O.; Ju, H.; Kim, J.; Kim, J.; Kim, S.; Kim, H. State estimation of lithium-ion battery using a novel explainable artificial intelligence approach. Energy 2022, 247, 123464.
Ribeiro, M.T.; Singh, S.; Guestrin, C. "Why should I trust you?" Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016, pp. 1135–1144.
Ribeiro, M.T.; Singh, S.; Guestrin, C. Anchors: High-precision model-agnostic explanations. In Proceedings of the AAAI Conference on Artificial Intelligence, 2018, Vol. 32.
Ghorbanzadeh, G.; Nabizadeh, Z.; Karimi, N.; Khadivi, P.; Emami, A.; Samavi, S. DGAFF: Deep genetic algorithm fitness Formation for EEG Bio-Signal channel selection. Biomedical Signal Processing and Control 2023, 79, 104119. [CrossRef]
Saibene, A.; Gasparini, F. Genetic algorithm for feature selection of EEG heterogeneous data. Expert Systems with Applications 2023, 217, 119488. [CrossRef]
Wang, B.; Pei, W.; Xue, B.; Zhang, M. Explaining deep convolutional neural networks for image classification by evolving local interpretable model-agnostic explanations. arXiv preprint arXiv:2211.15143 2022.

Figure 1. The overall framework of the proposed method.

Figure 2. Convergence behavior of the GA method during feature selection.

Figure 3. Comparison of LIME explanations: (left) baseline using full feature space; (right) E-XAI using reduced feature set.

Figure 4. Comparison of SHAP summary plots: (left) baseline; (right) E-XAI.

Figure 5. Comparison of SHAP dependence plots: (left) baseline; (right) E-XAI.

Table 1. Description of the PMU and relay-related features in the Power System Attack Datasets.

Feature	Description
PA1:VH – PA3:VH	Phase A – C voltage phase angle
PM1:V – PM3:V	Phase A – C voltage phase magnitude
PA4:IH – PA6:IH	Phase A – C current phase angle
PM4:I – PM6:I	Phase A – C current phase magnitude
PA7:VH – PA9:VH	Positive, negative, and zero-sequence voltage phase angle
PM7:V – PM9:V	Positive, negative, and zero-sequence voltage phase magnitude
PA10:IH – PA12:IH	Positive, negative, and zero-sequence current phase angle
PM10:I – PM12:I	Positive, negative, and zero-sequence current phase magnitude
F	Relay frequency
DF	Frequency delta ($dF/dt$) for relays
PA:Z	Apparent impedance for relays
PA:ZH	Apparent impedance angle for relays
S	Relay status flag

Table 2. Detection Performance with Reduced Features (E-XAI).

Model	Accuracy	AUC	Train Time (s)	#Features
RF	0.9238	0.9756	252.99	115
KNN	0.8396	0.9009	1.70	115
MLP	0.8002	0.8484	537.72	115
AdaBoost	0.7102	0.6259	207.85	115

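Table 3. Comparison of Baseline and E-XAI Performance ($\Delta$ = baseline minus reduced; training times in seconds).

Model	$\mathrm{Acc}_{\mathrm{base}}$	$\mathrm{AUC}_{\mathrm{base}}$	$\mathrm{Time}_{\mathrm{base}}$ (s)	$\mathrm{Acc}_{\mathrm{red}}$	$\mathrm{AUC}_{\mathrm{red}}$	$\mathrm{Time}_{\mathrm{red}}$ (s)	$\Delta$Acc	$\Delta$Time (s)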
RF	0.9229	0.9756	273.60	0.9238	0.9756	252.99	-0.0009	20.60
KNN	0.8377	0.8983	1.32	0.8396	0.9009	1.70	-0.0019	-0.38
MLP	0.8007	0.8504	599.07	0.8002	0.8484	537.72	0.0004	61.36
AdaBoost	0.7107	0.6430	228.43	0.7102	0.6259	207.85	0.0005	20.58

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
