A Mamba-Driven Spatiotemporal Graph Neural Network for Fault Location in Low-Observability Active Distribution Networks

Zhengying Hou; Jilong Ma; Xuguang Hu

doi:10.20944/preprints202606.0120.v1

Submitted: 01 June 2026

Posted: 02 June 2026

Abstract

In fault location for large-scale active distribution networks under low-observability conditions, inter-node relationships are difficult to model accurately, critical information at the initial fault stage is easily overlooked, and global information is insufficiently utilized. To address these issues, an adaptive Mamba-driven spatiotemporal graph neural network, termed AM-STGNN, is proposed. First, a dynamic spatiotemporal modeling module integrating adaptive implicit topology generation with the Mamba selective state-space model is designed to characterize time-varying electrical coupling among nodes and extract key transient fault information, thereby enhancing spatiotemporal feature representation under complex operating conditions. Second, an STGformer-based global interaction and linear attention optimization method is proposed to better exploit long-range spatiotemporal correlation information while reducing the computational complexity in large-scale distribution networks. Third, gated bilinear feature fusion and a physics-inspired differential output strategy are developed to better integrate spatiotemporal features and improve the discrimination of weak fault features and adjacent faulted line sections. Based on the above methods, limited nodal information can be more fully exploited under low observability to improve line-section fault location accuracy. Simulation validation on the IEEE 123-node system demonstrates that AM-STGNN achieves superior performance, and the ablation experiments further verify the effectiveness of each component.

Keywords: distribution network; fault location; graph neural network; Mamba model; adaptive topology; spatiotemporal correlation

Subject: Engineering - Electrical and Electronic Engineering

1. Introduction

With the increasing integration of distributed energy resources and power electronic devices, the structural and operational characteristics of distribution networks have undergone substantial changes, accompanied by increasingly complex network topologies [1] . In this context, conventional distribution networks are gradually evolving into active distribution networks, where power flow has shifted from unidirectional transmission to bidirectional exchange. The operating state has also become more susceptible to variations in operating conditions, resulting in more complicated transient responses after faults occur. Consequently, conventional fault location methods face greater challenges in both system state monitoring and fault diagnosis [2] . Although deploying a large number of high-precision measurement devices can improve system observability and facilitate the acquisition of fault information, it inevitably increases equipment investment as well as operation and maintenance costs. Therefore, achieving rapid and accurate fault location in complex active distribution networks under low-observability conditions has become an important issue in current distribution network operation and maintenance.

To achieve fault location, impedance-based, phasor-measurement-based, and state-estimation-based methods have been extensively investigated. In [3] , an impedance-based fault distance estimation method considering the influence of distributed generation was developed to mitigate the increased errors of conventional impedance methods under distributed generation integration. In [4] , an adaptive impedance model for active distribution networks was constructed to address the insufficient adaptability of conventional impedance methods to complex operating scenarios. In [5] , synchronized phasor measurement information was introduced into extended impedance-based fault distance estimation, thereby improving the limited fault location accuracy in complex distribution systems. In [6] , PMU measurements were further integrated with power system state estimation, and distribution network fault location was formulated as an optimization problem, which improved the identification accuracy of faulted line sections under complex branching conditions. These methods have clear physical interpretations and are relatively straightforward to implement; however, they are sensitive to network parameters, measurement accuracy, and model completeness, which limits their robustness in complex active distribution network scenarios. Therefore, traveling-wave-based fault location methods, which can directly exploit the initial transient characteristics of faults, have been widely studied. In [7] , a single-ended fault location method based on reflected traveling waves was proposed to overcome the dependence of conventional traveling-wave methods on double-ended synchronized measurements. 
In [8] , a fault traveling-wave time matrix was constructed, and the arrival times of wavefronts were extracted for fault location, thereby addressing the challenges caused by complex traveling-wave propagation paths, limited location speed, and insufficient reliability in multi-branch distribution networks. Although traveling-wave-based methods can make full use of initial transient fault information and exhibit excellent theoretical location accuracy under long-distance and complex-branch conditions, they impose stringent requirements on sampling frequency, time synchronization, and wavefront identification accuracy. To reduce the dependence on high-end measurement devices and complex physical modeling, deep learning has been increasingly investigated for fault location in distribution networks. In [9] , a deep-learning-based fault identification and classification method for active distribution networks was proposed to address the difficulty of manually extracting discriminative features from complex fault waveforms. In [10] , a fault classification and location method based on a deep convolutional neural network was developed to effectively fuse multidimensional measurement features under PMU deployment conditions. In [11] , VAE-generated synthetic data and multiple machine learning algorithms were introduced into fault classification and location for transmission networks, demonstrating that data augmentation can alleviate the adverse effects of insufficient fault samples on model performance. Meanwhile, fault diagnosis studies for renewable energy devices such as photovoltaic systems have also shown that hybrid deep learning models and graph convolutional variational autoencoders can extract effective fault features from complex operating data [12,13] . These studies indicate that data-driven methods have strong feature learning capabilities. 
However, most conventional deep learning models are designed for Euclidean-structured data, and their ability to represent distribution network topology and the spatial correlations of fault propagation remains insufficient. To further integrate topological structure with measurement data, graph neural network methods have been widely investigated. In [14] , a graph-neural-network-based fault location method that simultaneously considers node attributes and line attributes was proposed to address the difficulty of representing line differences and fault propagation paths in conventional graph models. In [15] , a spatiotemporal recurrent graph neural network fault diagnosis method was developed to jointly model the temporal evolution and spatial propagation of faults. In [16] , a fault location method based on a multi-head graph attention network was proposed to address the inaccurate modeling of inter-node coupling relationships under complex topological conditions. In [17] , a VGAE-GraphSAGE-based fault location method for distribution networks was developed to overcome the limited feature representation capability of shallow graph networks. In [18] , a physics-preserving graph network was introduced for fault location, improving real-time location capability in low-observability and few-label scenarios. In [19] , a domain-adaptive transfer dynamic graph attention network was proposed to enhance model generalization under varying operating scenarios and data distribution shifts. In [20] , a spatiotemporal correlation graph neural network was developed for large-scale distribution network fault location, addressing the inaccurate identification of faulted line sections under low-observability conditions. In [21] , a two-stage graph spatiotemporal attention network was proposed to handle the changes in fault feature propagation paths caused by topology reconfiguration in distributed energy systems, as well as the mutual interference among different fault types. 
In [22] , an interpretable physics-informed graph network with a multitask cascaded fault diagnosis framework was developed to address the masking of original fault features by superimposed multiple fault characteristics and the insufficient physical interpretability of the diagnosis process in cascading fault scenarios. In addition, graph-enhanced deep reinforcement learning for distribution network fault recovery has also employed graph neural networks to extract topological features, thereby improving state perception under complex operating conditions [23] . Although existing graph-neural-network-based methods can improve fault location or fault diagnosis performance by integrating topological relationships with measurement features, several limitations remain under low-observability and complex operating scenarios, including the dependence of node relationships on static structures, insufficient utilization of short-term dynamic fault information, and inadequate modeling of global correlations.

Accordingly, an adaptive Mamba-driven spatiotemporal graph neural network is proposed for fault location in complex active distribution networks under low-observability conditions. Built upon the existing spatiotemporal correlation graph neural network framework, the proposed method incorporates dynamic topology learning, temporal sequence modeling, global interaction, and feature fusion mechanisms, enabling high-accuracy identification of faulted line sections under complex operating conditions. The main contributions are summarized as follows:

(1) A dynamic spatiotemporal modeling method that integrates adaptive implicit topology generation with the Mamba selective state-space model is proposed. This method can simultaneously characterize the time-varying electrical coupling relationships among nodes and capture key transient information during the fault process.

(2) An STGformer-based global interaction and linear attention optimization method is proposed. The computational complexity in large-scale distribution network scenarios is reduced while long-range spatiotemporal correlation modeling is enhanced.

(3) A gated bilinear feature fusion mechanism and a physics-inspired differential output strategy are proposed. The representation capability of spatiotemporal features is improved, and the discrimination of weak-feature faults and adjacent faulted line sections is further enhanced.

2. Selection Strategy for Observation Nodes

In large-scale active distribution networks, the large number of nodes and the complexity of branches make full-network deployment of high-density monitoring devices economically and technically demanding, as equipment investment, communication burden, and maintenance costs would be substantially increased. For the fault location task under low-observability conditions, the model inputs in this study are not collected from all network nodes, but from a predefined set of observation nodes. Once the observation nodes are determined, the training, validation, and test sets are constructed under identical observation conditions, and all subsequent modeling and result analyses are performed based on these inputs.

Let the node set of the distribution network be defined as

$$V = \{1, 2, \dots, n\}. \tag{1}$$

Given an observation rate $\rho \in (0,1)$, the number of observation nodes is defined as

$$m = \lfloor \rho n \rfloor. \tag{2}$$

To avoid the concentration of observation nodes within local regions, the network is first divided into $m$ non-overlapping regions according to the current propagation direction, denoted as $\{\Omega_{1}, \Omega_{2}, \dots, \Omega_{m}\}$. One representative node is then selected from each region to form the final observation node set.

To quantify the ability of each candidate node to respond to surrounding fault propagation information, a correlation coefficient based on the shortest electrical line distance is introduced. Let $d_{il}$ denote the shortest electrical line distance between node $i$ and node $l$, and let $\sigma_{i}$ denote the average line distance from node $i$ to its neighboring nodes. The correlation coefficient is defined as

$$r_{il} = \exp\left(-\frac{d_{il}^{2}}{\sigma_{i}^{2}}\right). \tag{3}$$

Here, $r_{il}$ represents the correlation of fault unbalanced current propagation between nodes $i$ and $l$ within the same region. A shorter distance corresponds to a larger value of $r_{il}$, whereas a longer distance leads to a smaller value. On this basis, the sum of the correlations between node $i$ and the other nodes within the same region $\Omega_{q}$ is used as the node score:

$$\eta_{i} = \sum_{l \in \Omega_{q},\, l \neq i} r_{il}. \tag{4}$$

The node with the highest score is selected as the observation node of the corresponding region:

$$v_{q}^{*} = \arg\max_{i \in \Omega_{q}} \eta_{i}, \quad q = 1, 2, \dots, m. \tag{5}$$

Accordingly, the observation node set is expressed as

$$V_{\mathrm{obs}} = \{v_{1}^{*}, v_{2}^{*}, \dots, v_{m}^{*}\}. \tag{6}$$

This selection strategy ensures that the observation nodes are spatially dispersed across the entire network while covering the main range of fault information propagation as much as possible.
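The scoring rule of Eqs. (2)-(6) can be illustrated with a minimal NumPy sketch. The six-node line feeder, the unit line lengths, the region partition, and the function name `select_observation_nodes` are illustrative assumptions, not values from the paper; $\sigma_i$ is supplied by the caller because its computation depends on how "neighboring nodes" is defined for a given network.

```python
import math
import numpy as np

def select_observation_nodes(dist, sigma, regions):
    """Pick one node per region by maximizing the score eta_i (Eqs. (3)-(5)).

    dist    : (n, n) shortest electrical line distances d_il
    sigma   : (n,) average line distance from each node to its neighbors
    regions : list of lists of node indices, one list per region Omega_q
    """
    selected = []
    for nodes in regions:
        nodes = np.asarray(nodes)
        sub = dist[np.ix_(nodes, nodes)]                       # d_il inside the region
        r = np.exp(-(sub ** 2) / (sigma[nodes][:, None] ** 2)) # Eq. (3)
        np.fill_diagonal(r, 0.0)                               # exclude l == i
        eta = r.sum(axis=1)                                    # Eq. (4)
        selected.append(int(nodes[np.argmax(eta)]))            # Eq. (5)
    return selected                                            # Eq. (6)

# Hypothetical toy feeder: 6 nodes on a line with unit line lengths.
n, rho = 6, 0.34
m = math.floor(rho * n)                                        # Eq. (2)
pos = np.arange(n, dtype=float)
dist = np.abs(pos[:, None] - pos[None, :])
sigma = np.ones(n)                                             # assumed: all adjacent lines have length 1
regions = [[0, 1, 2], [3, 4, 5]]                               # assumed partition into m regions
selected = select_observation_nodes(dist, sigma, regions)
```

In this toy case the central node of each region obtains the highest score, which matches the intent of covering as much of the surrounding propagation range as possible.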

Let $R$ denote the system adjacency matrix and $C$ denote the output matrix. The Popov–Belevitch–Hautus (PBH) criterion is then adopted to verify the observability of the selected node set:

$$\operatorname{rank}\begin{bmatrix} \lambda_{i} I - R \\ C \end{bmatrix} = n, \quad i = 1, 2, \dots, n, \tag{7}$$

where $\lambda_{i}$ is an eigenvalue of $R$. If the above condition is satisfied, the current observation node set meets the system observability requirement. All subsequent experiments in this study are conducted based on this fixed set of observation nodes, and both the number and locations of these nodes remain unchanged.
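As a hedged illustration of the PBH test in Eq. (7), the sketch below checks the rank condition for every eigenvalue of a small adjacency matrix. The three-node feeder, the output matrices, the function name `pbh_observable`, and the rank tolerance are assumptions chosen for the example only.

```python
import numpy as np

def pbh_observable(R, C, tol=1e-9):
    """Return True if rank([lambda_i I - R; C]) == n for every eigenvalue of R."""
    R = np.asarray(R, dtype=float)
    C = np.atleast_2d(np.asarray(C, dtype=float))
    n = R.shape[0]
    for lam in np.linalg.eigvals(R):
        stacked = np.vstack([lam * np.eye(n) - R, C])   # matrix in Eq. (7)
        if np.linalg.matrix_rank(stacked, tol=tol) < n:
            return False
    return True

# Hypothetical 3-node feeder 0 - 1 - 2 (not taken from the paper).
R = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])
C_end = np.array([[1, 0, 0]])   # measure node 0 only
C_mid = np.array([[0, 1, 0]])   # measure node 1 only

observable_from_end = pbh_observable(R, C_end)
observable_from_mid = pbh_observable(R, C_mid)
```

For this toy graph, measuring the end node satisfies the criterion, whereas measuring only the middle node does not, because the eigenvector for $\lambda = 0$ has a zero entry at that node.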

3. Methods

3.1. Dynamic Spatiotemporal Modeling Based on the Coupling of Adaptive Graph Manifold and Mamba Dynamics

With the deployment of observation nodes kept fixed, the key issue in subsequent modeling lies in how to extract more effective fault information from the spatiotemporal sequences of limited nodes. Fault location in distribution networks is associated not only with the spatial dependencies among nodes but also with the temporal variations of fault signals. After a fault occurs, electrical quantities such as voltage, current, and phase angle propagate through the network and exhibit pronounced transient variations within a short time. If the spatial relationships are not accurately characterized, the model can hardly determine how fault information is transmitted across the network. If the temporal modeling capability is insufficient, the model may overlook the most critical abrupt features at the initial stage of the fault. On this basis, an adaptive implicit topology is first constructed from the spatial domain, and the Mamba selective state-space module is then employed to perform dynamic temporal modeling of fault sequences. In this manner, spatial and temporal information can be jointly extracted [24] . The overall framework is shown in Figure 1.

3.1.1. Data-Driven Adaptive Implicit Topology Generation

The inter-node connectivity obtained from the static geographical distance formulation

$$r_{il} = \exp\left(-d_{il}^{2} / \sigma_{i}^{2}\right) \tag{8}$$

can reflect the network structure under normal operating conditions. However, under short-circuit faults, distributed generation integration, or distributed generation disconnection, the actual electrical coupling relationships among nodes may vary significantly. In such cases, using only a static adjacency matrix is insufficient to characterize the variations in the strength of post-fault information propagation.

To address this issue, a data-driven strategy is adopted to automatically learn the topological relationships among nodes directly from data. Suppose that there are $N$ observation nodes in the system. Two learnable embedding matrices, each with one row per node, are introduced:

$$E_{1} \in \mathbb{R}^{N \times d}, \quad E_{2} \in \mathbb{R}^{N \times d}, \tag{9}$$

where $d$ denotes the embedding dimension. $E_{1}$ and $E_{2}$ represent the latent features of nodes in the information-sending and information-receiving processes, respectively. To describe the potential directional influence between nodes, a direction-sensitive correlation matrix is first constructed as

$$S = E_{1} E_{2}^{\top} - E_{2} E_{1}^{\top}. \tag{10}$$

By construction, $S$ is antisymmetric ($S^{\top} = -S$), which enables the influence of node $i$ on node $j$ and that of node $j$ on node $i$ to be represented differently. This property is consistent with the actual propagation process of fault disturbances in distribution networks.

Since the propagation of fault disturbances in the network usually exhibits directionality and variations in propagation strength, a nonlinear activation is further applied to $S$ to suppress the effects of weak correlations and noise terms:

$$\tilde{S} = \mathrm{ReLU}(S). \tag{11}$$

The adaptive implicit adjacency matrix is further obtained through row-wise normalization:

$$A_{\mathrm{adp}} = \mathrm{Softmax}(\tilde{S}). \tag{12}$$

Here, the Softmax operation is performed along each row, ensuring that the sum of the weights assigned by each node to its adjacent nodes equals one. This normalization contributes to more stable feature aggregation in subsequent graph propagation. Consequently, the model no longer relies solely on geographical distance, but can adaptively adjust the connection strengths among nodes during training.

During graph feature propagation, let the node representation at the $l$-th layer be denoted as $H^{(l)} \in \mathbb{R}^{N \times C_{l}}$. The spatial aggregation based on the adaptive topology can then be formulated as

$$H^{(l+1)} = \sigma\left(A_{\mathrm{adp}} H^{(l)} W^{(l)}\right), \tag{13}$$

where $W^{(l)}$ denotes the learnable parameter matrix at the $l$-th layer, and $\sigma(\cdot)$ represents a nonlinear activation function.

The function of this module can be summarized from two aspects. First, it enables the latent relationships among nodes to be learned from the available dataset without introducing additional node-related data, such as line impedance. Second, it provides spatial features that are more consistent with the actual propagation behavior of faults for subsequent temporal modeling, allowing information to be transmitted beyond a fixed graph structure.
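The construction in Eqs. (10)-(13) can be sketched in a few lines of NumPy. The sizes, the random embeddings (which stand in for parameters that would be learned during training), and the choice of ReLU for $\sigma(\cdot)$ in Eq. (13) are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
N, d, C_in, C_out = 5, 4, 3, 6            # illustrative sizes

E1 = rng.normal(size=(N, d))              # sender embeddings (learned in practice)
E2 = rng.normal(size=(N, d))              # receiver embeddings (learned in practice)

S = E1 @ E2.T - E2 @ E1.T                 # Eq. (10): antisymmetric scores
S_tilde = np.maximum(S, 0.0)              # Eq. (11): keep positive directions only

def row_softmax(M):
    """Numerically stable softmax applied independently to each row."""
    M = M - M.max(axis=1, keepdims=True)
    e = np.exp(M)
    return e / e.sum(axis=1, keepdims=True)

A_adp = row_softmax(S_tilde)              # Eq. (12): each row sums to one

H = rng.normal(size=(N, C_in))            # node features at layer l
W = rng.normal(size=(C_in, C_out))        # layer weights (learned in practice)
H_next = np.maximum(A_adp @ H @ W, 0.0)   # Eq. (13) with ReLU as the activation
```

Because $S$ is antisymmetric, at most one of $\tilde{S}_{ij}$ and $\tilde{S}_{ji}$ is nonzero after the ReLU, so each node pair retains a single dominant direction before normalization.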

3.1.2. Mamba Selective Temporal Scanning Based on a Discrete State-Space Model

After spatial feature extraction, the temporal evolution of fault signals must be further analyzed. The data used in this study contain 20 sampling points, corresponding to a 0.04 s fault transient process. Owing to this short time span, the voltage sag, current surge, and phase-angle variation occurring immediately after fault inception usually contain more discriminative fault information, whereas the subsequent samples mainly reflect relatively smooth response characteristics. If simple average aggregation or conventional attention weighting is adopted, the contribution of these critical instantaneous features may be weakened.

To address this issue, the Mamba selective state-space module is introduced to dynamically scan the temporal representations. The basic idea is to regard the time series of each node as a discretely sampled dynamic system. Let $x(t)$ denote the input sequence and $h(t)$ denote the hidden state. The continuous-time state-space model can be expressed as

$$\frac{dh(t)}{dt} = A h(t) + B x(t), \tag{14}$$

$$y(t) = C h(t), \tag{15}$$

where $A$ is the state transition matrix, while $B$ and $C$ denote the input mapping matrix and the output mapping matrix, respectively.

Since the actual data are obtained through discrete sampling, the continuous system needs to be discretized. Given the sampling interval $\Delta$, the zero-order hold method is adopted, yielding the following discrete form:

$$\bar{A} = e^{\Delta A}, \tag{16}$$

$$\bar{B} = \left(\int_{0}^{\Delta} e^{\tau A}\, d\tau\right) B. \tag{17}$$

Accordingly, the discrete state update process is written as

$$h_{k} = \bar{A} h_{k-1} + \bar{B} x_{k}, \tag{18}$$

$$y_{k} = C h_{k}. \tag{19}$$
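For intuition, the zero-order-hold formulas in Eqs. (16)-(19) have a simple closed form when $A$ is diagonal, since the integral in Eq. (17) then reduces element-wise to $(e^{\Delta a} - 1)/a$. The sketch below assumes such a diagonal $A$ with illustrative negative entries and a unit-step input; none of these numerical values come from the paper.

```python
import numpy as np

def zoh_discretize(a, b, delta):
    """Zero-order hold for a diagonal A with nonzero entries a.

    Eq. (16): A_bar = exp(delta * a)
    Eq. (17): B_bar = (exp(delta * a) - 1) / a * b   (closed form of the integral)
    """
    a_bar = np.exp(delta * a)
    b_bar = (a_bar - 1.0) / a * b
    return a_bar, b_bar

def run_lti(a, b, c, delta, x):
    """Discrete recurrence of Eqs. (18)-(19) for a scalar input sequence x."""
    a_bar, b_bar = zoh_discretize(a, b, delta)
    h = np.zeros_like(a)
    ys = []
    for x_k in x:
        h = a_bar * h + b_bar * x_k       # Eq. (18)
        ys.append(float(c @ h))           # Eq. (19)
    return np.array(ys)

a = np.array([-50.0, -200.0])             # assumed stable diagonal dynamics
b = np.array([1.0, 1.0])
c = np.array([0.5, 0.5])
delta = 0.04 / 20                         # 20 samples over 0.04 s
x = np.ones(20)                           # unit-step input
y = run_lti(a, b, c, delta, x)

# Exact continuous-time step response sampled at t_k = k * delta.
t = delta * np.arange(1, 21)
y_exact = np.array([c @ ((np.exp(a * tk) - 1.0) / a * b) for tk in t])
```

With a piecewise-constant input, the zero-order-hold recurrence reproduces the continuous-time response exactly at the sampling instants, which is why `y` and `y_exact` agree here.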

The above formulation corresponds to a linear time-invariant system, in which $A$, $B$, $C$, and $\Delta$ remain fixed. The key feature of Mamba lies in the introduction of an input-dependent selective mechanism, through which $\Delta$, $B$, and $C$ are no longer treated as constants but are dynamically generated from the current input, whereas $A$ itself is not input-dependent and varies only through its discretization with $\Delta_{k}$. Specifically, for the input $x_{k}$ at the $k$-th time step, the selective parameters are obtained as

$$\Delta_{k} = f_{\Delta}(x_{k}), \quad B_{k} = f_{B}(x_{k}), \quad C_{k} = f_{C}(x_{k}), \tag{20}$$

where $f_{\Delta}(\cdot)$, $f_{B}(\cdot)$, and $f_{C}(\cdot)$ are learnable functions composed of linear projections and nonlinear mappings. Accordingly, the discrete state update can be further expressed as

$$\bar{A}_{k} = e^{\Delta_{k} A}, \tag{21}$$

$$\bar{B}_{k} = \left(\int_{0}^{\Delta_{k}} e^{\tau A}\, d\tau\right) B_{k}, \tag{22}$$

$$h_{k} = \bar{A}_{k} h_{k-1} + \bar{B}_{k} x_{k}, \tag{23}$$

$$y_{k} = C_{k} h_{k}. \tag{24}$$

Compared with conventional temporal modeling methods, this mechanism adaptively adjusts the state update process according to the current input. At the initial stage of a fault, the input signals usually exhibit pronounced variations, enabling the model to place greater emphasis on these abrupt transient features. During the subsequent relatively stable stage, the state update becomes smoother, thereby reducing the influence of ineffective fluctuations.
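The selective update of Eqs. (20)-(24) can be sketched as a plain sequential loop. This is only an illustrative reading of the equations, not the optimized Mamba kernel: $A$ is assumed diagonal and negative, $f_{\Delta}$ is assumed to be a linear projection followed by softplus, and $f_{B}$ and $f_{C}$ are assumed to be linear projections; all weight names are hypothetical.

```python
import numpy as np

def softplus(z):
    return np.logaddexp(0.0, z)

def selective_scan(X, A, W_delta, b_delta, W_B, W_C):
    """Sequential sketch of Eqs. (20)-(24) for a diagonal A.

    X : (T, D) input sequence; A : (D, n) negative diagonal entries per channel.
    W_delta (D, D), b_delta (D,), W_B (D, n), W_C (D, n) are hypothetical projections.
    """
    T, D = X.shape
    h = np.zeros_like(A)                              # hidden state, shape (D, n)
    Y = np.zeros((T, D))
    for k in range(T):
        x_k = X[k]
        delta_k = softplus(x_k @ W_delta + b_delta)   # Eq. (20): positive step sizes, (D,)
        B_k = x_k @ W_B                               # Eq. (20): (n,)
        C_k = x_k @ W_C                               # Eq. (20): (n,)
        A_bar = np.exp(delta_k[:, None] * A)          # Eq. (21)
        B_bar = (A_bar - 1.0) / A * B_k[None, :]      # Eq. (22), closed form for diagonal A
        h = A_bar * h + B_bar * x_k[:, None]          # Eq. (23)
        Y[k] = h @ C_k                                # Eq. (24)
    return Y

rng = np.random.default_rng(0)
T, D, n_state = 20, 3, 4                              # illustrative sizes
A = -np.exp(rng.normal(size=(D, n_state)))            # assumed negative (stable) entries
W_delta = 0.1 * rng.normal(size=(D, D))
b_delta = np.zeros(D)
W_B = rng.normal(size=(D, n_state))
W_C = rng.normal(size=(D, n_state))

X = rng.normal(size=(T, D))
Y = selective_scan(X, A, W_delta, b_delta, W_B, W_C)

# Perturbing later inputs must not change earlier outputs (the scan is causal).
X_late = X.copy()
X_late[10:] += 1.0
Y_late = selective_scan(X_late, A, W_delta, b_delta, W_B, W_C)
Y_zero = selective_scan(np.zeros((T, D)), A, W_delta, b_delta, W_B, W_C)
```

A larger $\Delta_{k}$ makes $\bar{A}_{k}$ smaller and $\bar{B}_{k}$ larger, so the state forgets more of its history and absorbs more of the current sample, which is the behavior the paragraph above attributes to the initial fault stage.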

In this study, the input to the Mamba module is not the original measurement data, but the node features obtained after spatial topology modeling. Let the spatiotemporal features output by the spatial module be denoted as $H \in \mathbb{R}^{T \times N \times C}$, where $T$ is the number of time steps, $N$ is the number of nodes, and $C$ is the feature dimension. For each node $n$, its feature sequence over all time steps is extracted as

$$X_{n} = [h_{1,n}, h_{2,n}, \dots, h_{T,n}] \in \mathbb{R}^{T \times C}. \tag{25}$$

This sequence is then fed into the Mamba module for selective scanning, yielding the temporally enhanced node representation:

$$Z_{n} = \mathrm{Mamba}(X_{n}). \tag{26}$$

After all nodes are processed simultaneously, a new spatiotemporal feature tensor is obtained as $Z \in \mathbb{R}^{T \times N \times C}$.

In this manner, the node coupling relationships learned in the spatial domain are integrated with the dynamic evolution process captured in the temporal domain. The former describes the propagation relationships of fault information across the network, whereas the latter extracts the dynamic variation characteristics embedded in fault sequences. Through this coupling, the model can simultaneously account for how fault information propagates through the network and how fault signals evolve over time. Figure 2 presents a schematic comparison between Mamba selective scanning and the conventional attention mechanism, providing a more intuitive illustration of their differences in responding to information at critical time instants.

3.2. Global Linear Spatiotemporal Interaction Architecture Based on the STGformer Kernel Decomposition Mechanism

3.2.1. Modeling Process of Kernel-Decomposed Linear Attention

Under low-observability conditions, fault information often needs to propagate across multiple nodes. However, as the number of graph convolution layers increases, the features of different nodes tend to become increasingly similar. In addition, standard self-attention suffers from computational and memory complexity of $O(L^{2})$ in the sequence length $L$. To address these issues, an STGformer-based kernel-decomposed linear attention mechanism is further introduced to enable efficient global interaction among spatiotemporal features [25].

Let the spatiotemporal features obtained after adaptive topology modeling and Mamba temporal scanning be denoted as $X \in \mathbb{R}^{L \times d}$, where $L$ denotes the total number of unfolded spatiotemporal positions, and $d$ denotes the feature dimension. After linear projection of the input features, the query, key, and value matrices are obtained as

$$Q = X W_{Q}, \quad K = X W_{K}, \quad V = X W_{V}, \tag{27}$$

where $W_{Q}$, $W_{K}$, and $W_{V}$ are learnable parameter matrices.

In standard self-attention, the output is formulated as

$$\mathrm{Attention}(Q, K, V) = \mathrm{Softmax}\left(\frac{Q K^{\top}}{\sqrt{d}}\right) V. \tag{28}$$

The key operation in this formulation is the prior computation of $Q K^{\top}$, which produces an $L \times L$ attention matrix. Therefore, when $L$ becomes large, both the computational cost and memory consumption increase substantially.

To reduce the computational complexity, STGformer adopts the kernel decomposition strategy. Specifically, a feature mapping function $\phi(\cdot)$ is used to transform the query and key matrices, allowing the attention operation to be formulated as

$$\mathrm{Attention}(Q, K, V) \approx \phi(Q)\left(\phi(K)^{\top} V\right). \tag{29}$$

Here, $\phi(\cdot)$ is usually defined as an element-wise non-negative mapping, such as $\phi(X) = \mathrm{ELU}(X) + 1$. With this treatment, attention computation no longer requires the complete $L \times L$ correlation matrix to be formed first. Instead, a smaller intermediate term can be computed first:

$$G = \phi(K)^{\top} V, \tag{30}$$

where $G \in \mathbb{R}^{d \times d}$. The final output is then obtained by multiplying this intermediate term with $\phi(Q)$:

$$Y = \phi(Q) G. \tag{31}$$

From the perspective of computational cost, the first step has a complexity of $O(L d^{2})$, and the second step also has a complexity of $O(L d^{2})$. Therefore, the overall complexity can be approximately expressed as $O(L d^{2})$, which increases linearly with the sequence length $L$, rather than quadratically.

The significance of this formulation lies in that global dependencies among spatiotemporal positions can still be established without explicitly constructing the complete pairwise correlation matrix. Therefore, it is more suitable for distribution network fault location tasks involving a relatively large number of nodes, short temporal sequences, and large unfolded spatiotemporal scales.
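The reordering in Eqs. (29)-(31) relies only on the associativity of matrix multiplication, which the following NumPy sketch checks numerically. The sizes and random weights are illustrative assumptions. The sketch follows Eqs. (29)-(31) as written and therefore omits the row normalization that some linear-attention formulations add.

```python
import numpy as np

def elu_plus_one(x):
    """phi(x) = ELU(x) + 1: x + 1 for x > 0, exp(x) otherwise (always positive)."""
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))

def linear_attention(X, W_Q, W_K, W_V):
    Q, K, V = X @ W_Q, X @ W_K, X @ W_V           # Eq. (27)
    G = elu_plus_one(K).T @ V                     # Eq. (30): (d, d), cost O(L d^2)
    return elu_plus_one(Q) @ G                    # Eq. (31): (L, d), cost O(L d^2)

def quadratic_reference(X, W_Q, W_K, W_V):
    """Same quantity computed through an explicit L x L matrix, cost O(L^2 d)."""
    Q, K, V = X @ W_Q, X @ W_K, X @ W_V
    scores = elu_plus_one(Q) @ elu_plus_one(K).T  # (L, L)
    return scores @ V

rng = np.random.default_rng(0)
L, d = 200, 8                                     # illustrative sizes
X = rng.normal(size=(L, d))
W_Q, W_K, W_V = (0.3 * rng.normal(size=(d, d)) for _ in range(3))

Y_lin = linear_attention(X, W_Q, W_K, W_V)
Y_ref = quadratic_reference(X, W_Q, W_K, W_V)
```

Both routes yield the same output, but only the reference route ever materializes an $L \times L$ array, which is the memory cost the kernel decomposition avoids.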

On this basis, a global linear attention module is further employed to uniformly update the existing spatiotemporal features.

3.2.2. Spatiotemporal Feature Updating Based on Global Interaction

In this study, the STGformer module is not employed as an independent substitute for spatial or temporal modeling. Instead, it is used to further perform global interaction based on the spatiotemporal features obtained from the preceding modules. Let the feature tensor from the previous stage be denoted as $H \in \mathbb{R}^{T \times N \times C}$, where $T$ is the number of time steps, $N$ is the number of nodes, and $C$ is the feature dimension. To enable unified modeling, this tensor is unfolded along both the temporal and spatial dimensions as

$$X \in \mathbb{R}^{L \times C}, \quad L = T \times N. \tag{32}$$

The unfolded feature matrix $X$ is then fed into the global linear attention module to obtain the updated representation:

$$Y = \mathrm{LinearAttention}(X). \tag{33}$$

To enhance the representation capability while preserving the original information, a residual connection is adopted for feature updating:

$$X' = X + Y. \tag{34}$$

On this basis, a feed-forward network is further introduced to extract nonlinear feature representations:

$$Z = \mathrm{FFN}(X'). \tag{35}$$

Finally, the output of the global interaction layer is obtained through another residual update:

$$H_{\mathrm{out}} = X' + Z. \tag{36}$$

After $H_{\mathrm{out}}$ is reshaped back into the tensor form of $T \times N \times C$, it can be passed to the subsequent modules.

Through this process, the model can not only exploit local topological propagation information, but also directly establish global dependencies among different nodes and different time steps. Therefore, more comprehensive spatiotemporal features required for fault location can be extracted.
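The update sequence of Eqs. (32)-(36) can be summarized in a short NumPy sketch. The two-layer ReLU feed-forward network, the absence of normalization layers, and all sizes and weight names are assumptions for illustration, since the text does not specify them.

```python
import numpy as np

def phi(x):
    """ELU(x) + 1, the element-wise positive feature map."""
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))

def global_interaction(H, W_Q, W_K, W_V, W_1, W_2):
    """Unfold, apply linear attention and an FFN with residuals, then reshape back."""
    T, N, C = H.shape
    X = H.reshape(T * N, C)                            # Eq. (32): L = T * N
    Y = phi(X @ W_Q) @ (phi(X @ W_K).T @ (X @ W_V))    # Eq. (33)
    X1 = X + Y                                         # Eq. (34)
    Z = np.maximum(X1 @ W_1, 0.0) @ W_2                # Eq. (35): assumed two-layer ReLU FFN
    H_out = X1 + Z                                     # Eq. (36)
    return H_out.reshape(T, N, C)

rng = np.random.default_rng(0)
T, N, C, hidden = 20, 12, 8, 16                        # illustrative sizes
H = rng.normal(size=(T, N, C))
W_Q, W_K, W_V = (0.1 * rng.normal(size=(C, C)) for _ in range(3))
W_1 = 0.1 * rng.normal(size=(C, hidden))
W_2 = 0.1 * rng.normal(size=(hidden, C))

H_out = global_interaction(H, W_Q, W_K, W_V, W_1, W_2)

# With the value and output projections zeroed, only the residual path remains.
H_identity = global_interaction(H, W_Q, W_K, np.zeros((C, C)), W_1, np.zeros((hidden, C)))
```

The residual structure means the block starts from the features of the preceding modules and only adds the global correction on top, consistent with its role as a refinement rather than a replacement.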

Figure 3. Global Linear Spatiotemporal Interaction Based on STGformer.

3.3. Gated Bilinear Spatiotemporal Feature Fusion and Topology-Aware Differential Output Mechanism

After spatial modeling, temporal modeling, and global spatiotemporal interaction have been completed, the model obtains two types of feature representations with different emphases. One type focuses more on spatial topological relationships, whereas the other emphasizes the temporal evolution of fault signals. Effectively integrating these two types of information and further improving the discrimination capability for adjacent faulted line sections are critical to the final fault location performance.

Based on this consideration, two closely connected components are designed at the output stage. First, a gated mechanism is employed to adaptively fuse spatial and temporal features. Then, a topology-aware differential output strategy and a margin constraint are introduced in the classification stage to enhance the model’s ability to distinguish easily confused adjacent classes [26].

3.3.1. Gated Bilinear Spatiotemporal Feature Fusion

Let the spatial and temporal features obtained from the preceding modules be denoted as $H_{s} \in \mathbb{R}^{N \times d}$ and $H_{t} \in \mathbb{R}^{N \times d}$, where $N$ is the number of nodes and $d$ is the feature dimension. $H_{s}$ mainly contains the spatial correlation information among nodes, whereas $H_{t}$ primarily reflects the dynamic variation characteristics of fault sequences.

To enable these two types of features to be fused within a unified representation space, linear mappings are first applied to them separately:

$$\tilde{H}_{s} = H_{s} W_{s} + b_{s}, \tag{37}$$

$$\tilde{H}_{t} = H_{t} W_{t} + b_{t}, \tag{38}$$

where $W_{s}, W_{t} \in \mathbb{R}^{d \times d}$ are learnable parameter matrices, and $b_{s}$ and $b_{t}$ are bias terms.

The two feature representations are then concatenated along the feature dimension to obtain a joint representation:

$$H_{c} = [\tilde{H}_{s} \,\Vert\, \tilde{H}_{t}]. \tag{39}$$

A gating network is further used to generate the fusion coefficients:

$$G = \sigma(H_{c} W_{g} + b_{g}), \tag{40}$$

where $W_{g}$ and $b_{g}$ are the parameters of the gating layer, and $\sigma(\cdot)$ denotes the sigmoid function. Therefore, all elements in $G$ fall within the interval [0, 1].

Finally, the fused feature representation is defined as

$$H_{f} = G \odot \tilde{H}_{s} + (1 - G) \odot \tilde{H}_{t}, \tag{41}$$

where $\odot$ denotes the Hadamard element-wise product.

This operation enables the model to automatically determine the relative contributions of spatial and temporal information at different positions and feature dimensions. Compared with direct addition or classification after simple concatenation, gated fusion provides a more flexible way to exploit both types of features. For certain fault samples, the spatial propagation relationship may be more prominent, and the model can assign a higher weight to the spatial branch. For other samples, transient temporal variations may be more critical, and the model can increase the contribution of the temporal branch accordingly. In this way, information loss caused by fixed fusion strategies can be reduced.
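A minimal NumPy sketch of the gated fusion in Eqs. (37)-(41) is given below. The gate weight shape $W_{g} \in \mathbb{R}^{2d \times d}$ is an assumption chosen so that $G$ matches the shape of the two branches; the sizes and random parameters are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gated_fusion(H_s, H_t, W_s, b_s, W_t, b_t, W_g, b_g):
    Hs = H_s @ W_s + b_s                       # Eq. (37)
    Ht = H_t @ W_t + b_t                       # Eq. (38)
    H_c = np.concatenate([Hs, Ht], axis=1)     # Eq. (39): shape (N, 2d)
    G = sigmoid(H_c @ W_g + b_g)               # Eq. (40): shape (N, d)
    H_f = G * Hs + (1.0 - G) * Ht              # Eq. (41)
    return H_f, G, Hs, Ht

rng = np.random.default_rng(0)
N, d = 12, 8                                   # illustrative sizes
H_s = rng.normal(size=(N, d))                  # spatial-branch features
H_t = rng.normal(size=(N, d))                  # temporal-branch features
W_s, W_t = rng.normal(size=(d, d)), rng.normal(size=(d, d))
b_s, b_t = np.zeros(d), np.zeros(d)
W_g = rng.normal(size=(2 * d, d))              # assumed gate shape (2d, d)
b_g = np.zeros(d)

H_f, G, Hs, Ht = gated_fusion(H_s, H_t, W_s, b_s, W_t, b_t, W_g, b_g)
```

Since every gate value lies between zero and one, each entry of $H_{f}$ is a convex combination of the corresponding spatial and temporal entries, so the fusion interpolates between the two branches rather than rescaling them.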

3.3.2. Topology-Aware Differential Output

After the fused feature representation $H_{f}$ is obtained, directly feeding it into a classifier may still lead to confusion between adjacent faulted line sections. This is because topologically adjacent nodes often exhibit similar variations in voltage, current, and phase angle during faults. Therefore, using only the original fused features may be insufficient to distinguish these neighboring classes effectively.

To address this issue, a topology-aware differential output mechanism is further introduced. Let the fused feature of node $i$ be denoted as $h_{i}$, and let its neighbor set be represented by $N(i)$. The average feature of the neighboring nodes is first calculated as

$$\bar{h}_{i} = \frac{1}{|N(i)|} \sum_{j \in N(i)} h_{j}. \tag{42}$$

The differential feature between node $i$ and its neighborhood is then computed as

$$\Delta h_{i} = h_{i} - \bar{h}_{i}. \tag{43}$$

Here, $\Delta h_{i}$ represents the feature discrepancy between the current node and its surrounding nodes. For fault location, such discrepancy information is informative because the actual faulted section usually exhibits more pronounced abnormal characteristics than its neighboring regions.

In the classification stage, the original fused feature and the differential feature are concatenated to form a new classification input:

$$u_{i} = [h_{i} \parallel \Delta h_{i}]. \tag{44}$$

The class logits are then obtained through a multilayer perceptron:

$$z_{i} = \mathrm{MLP}(u_{i}). \tag{45}$$

Rather than using only $h_{i}$ for classification, this output strategy also exploits the relative differences between each node and its neighborhood, thereby helping to improve the separability between adjacent faulted line sections.
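As a minimal sketch of Eqs. (42)-(44), the following pure-Python function builds the classifier input for every node from its fused feature and its neighborhood mean. The function name, the dictionary-based graph representation, and the handling of isolated nodes (zero differential feature) are assumptions made for illustration; the MLP of Eq. (45) is omitted.

```python
def differential_inputs(features, neighbors):
    """Return u_i = [h_i || (h_i - mean of neighbor features)] for every node.

    features:  dict mapping node -> list of floats (fused feature h_i)
    neighbors: dict mapping node -> list of neighboring nodes N(i)
    """
    inputs = {}
    for i, h_i in features.items():
        nbrs = neighbors[i]
        if nbrs:
            # Eq. (42): average feature of the neighboring nodes
            h_bar = [sum(features[j][k] for j in nbrs) / len(nbrs)
                     for k in range(len(h_i))]
        else:
            # Assumption: a node without neighbors gets a zero differential feature
            h_bar = list(h_i)
        delta = [a - b for a, b in zip(h_i, h_bar)]   # Eq. (43)
        inputs[i] = h_i + delta                       # Eq. (44): concatenation
    return inputs

# Toy chain 0 - 1 - 2 in which node 1 stands out from its neighbors.
features = {0: [1.0, 0.0], 1: [4.0, 2.0], 2: [1.0, 0.0]}
neighbors = {0: [1], 1: [0, 2], 2: [1]}
u = differential_inputs(features, neighbors)
print(u[1])  # [4.0, 2.0, 3.0, 2.0]
print(u[0])  # [1.0, 0.0, -3.0, -2.0]
```

In the toy example the differential part of node 1 is large and positive while that of its neighbors is negative, which is the kind of contrast the classifier can use to separate adjacent sections.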

3.3.3. Boundary Enhancement Based on a Margin Constraint

Even with the introduction of the differential output mechanism, the prediction scores of adjacent nodes may still be very close. For the standard cross-entropy loss, as long as the logit corresponding to the true class is slightly higher than those of the other classes, the model tends to regard the current sample as correctly classified. In this case, the optimization intensity is substantially weakened. This may lead to a critical problem: although the prediction result is correct, the separation between the true class and the competing classes may remain small, making the model susceptible to misclassification under noise interference during testing.

Let the true class of a sample be denoted as $y$, and let the logit vector output by the classifier be denoted as $z$. The standard cross-entropy loss is defined as

$$L_{ce} = -\log \frac{e^{z_{y}}}{\sum_{c=1}^{C} e^{z_{c}}}, \tag{46}$$

where $C$ is the number of classes, and $z_{y}$ denotes the logit corresponding to the true class.

To further enlarge the discriminative margin between the true class and the most easily confused class, a margin constraint is introduced on the basis of the cross-entropy loss. The competing class is defined as

$$c^{*} = \arg\max_{c \neq y} z_{c}, \tag{47}$$

namely, the class with the largest logit among all classes except the true class. Based on this definition, the following margin penalty term is formulated:

$$L_{m} = \max(0,\, m - (z_{y} - z_{c^{*}})), \tag{48}$$

where $m$ denotes the prescribed safety margin. When $z_{y} - z_{c^{*}} \geq m$, this penalty term becomes zero. When the gap between the two logits is insufficient, the penalty term is activated, encouraging an increase in the score of the true class and a decrease in the score of the competing class.

Therefore, the final loss function is expressed as

$$L = L_{ce} + \lambda L_{m}, \tag{49}$$

where $\lambda$ is the weighting coefficient of the margin term.

When the margin term is activated, namely,

$$z_{y} - z_{c^{*}} < m, \tag{50}$$

the following gradients can be obtained:

$$\frac{\partial L_{m}}{\partial z_{y}} = -1, \qquad \frac{\partial L_{m}}{\partial z_{c^{*}}} = 1. \tag{51}$$

This indicates that, during backpropagation, the logit corresponding to the true class is continuously increased, whereas the logit corresponding to the most easily confused class is suppressed. Compared with the standard cross-entropy loss, this constraint not only requires correct classification, but also enforces a sufficient separation between the true class and the competing class. By combining the proposed margin constraint with the preceding gated fusion and differential output mechanisms, the model can further enlarge the decision boundaries between adjacent faulted line sections while exploiting spatiotemporal information, thereby improving the location accuracy for samples with complex boundaries.
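The loss in Eqs. (46)-(51) can be sketched for a single sample as follows. The values m = 1.0 and λ = 0.5 are placeholders chosen for illustration only, since this section does not state the settings used in the experiments, and the function name is likewise hypothetical.

```python
import math

def margin_ce_loss(z, y, m=1.0, lam=0.5):
    """Cross-entropy plus margin penalty for one sample.

    z: list of class logits, y: index of the true class.
    m and lam are illustrative defaults, not values from the paper.
    """
    # Eq. (46): cross-entropy via a numerically stable log-sum-exp
    z_max = max(z)
    log_sum_exp = z_max + math.log(sum(math.exp(v - z_max) for v in z))
    l_ce = log_sum_exp - z[y]

    # Eq. (47): most competitive wrong class
    c_star = max((c for c in range(len(z)) if c != y), key=lambda c: z[c])

    # Eq. (48): hinge-style margin penalty
    gap = z[y] - z[c_star]
    l_m = max(0.0, m - gap)

    # Eq. (51): gradient of L_m w.r.t. the logits, non-zero only when Eq. (50) holds
    grad_m = [0.0] * len(z)
    if gap < m:
        grad_m[y] = -1.0
        grad_m[c_star] = 1.0

    return l_ce + lam * l_m, l_m, c_star, grad_m   # first item is Eq. (49)

# Correctly classified, but with a gap of only 0.5 < m: the penalty is active.
loss_a, lm_a, c_a, grad_a = margin_ce_loss([2.0, 1.5, -1.0], y=0)
print(lm_a, c_a, grad_a)   # 0.5 1 [-1.0, 1.0, 0.0]

# Gap of 2.0 >= m: the penalty and its gradient vanish.
loss_b, lm_b, c_b, grad_b = margin_ce_loss([3.0, 1.0, 0.0], y=0)
print(lm_b, grad_b)        # 0.0 [0.0, 0.0, 0.0]
```

The first sample would already count as correct under plain cross-entropy, yet the margin term still pushes the true-class logit up and the competing logit down until the gap reaches m.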

Figure 4. Gated Bilinear Fusion and Topology-Aware Discriminative Output Mechanism.

4. Results and Analysis

4.1. Experimental Settings and Simulation Validation Methodology

The experiments in this study are conducted based on fault data from the IEEE 123-node distribution network. The original dataset contains $128 \times 10 \times 20 = 25{,}600$ samples, where 128 denotes the faulted lines traversed sequentially, 10 denotes the ten fault types considered on three-phase lines, and 20 denotes the consecutive sampling points during the fault period. The ten fault types include single-phase-to-ground faults (A-G, B-G, C-G), phase-to-phase faults (A-B, B-C, A-C), phase-to-phase-to-ground faults (A-B-G, B-C-G, A-C-G), and the three-phase short-circuit fault (A-B-C).

The original data contain 128 nodes, and 12 feature variables are recorded for each node, corresponding to the three-phase voltages and their phase angles, as well as the three-phase currents and their phase angles, with the phase order arranged as ABC. Each feature variable contains 25,600 data points along the sample dimension. Therefore, the overall dataset can be represented as 128 nodes $\times$ 12 features $\times$ 25,600 samples. The sampling frequency is 500 Hz, corresponding to a time interval of 0.002 s. Thus, the 20 consecutive sampling points cover a 0.04 s fault transient process.

In temporal modeling, the original samples are reorganized according to the faulted line, fault type, and sampling point, such that each fault scenario corresponds to a transient sequence with a length of 20. This sequence is then used as the model input to characterize the fault transient process.
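The reorganization described above can be sketched as simple index arithmetic. The sketch assumes that the 25,600 samples are stored with the faulted line as the slowest-varying index, then the fault type, then the sampling point; this ordering and the function names are assumptions and should be checked against the actual data files.

```python
N_LINES, N_TYPES, N_STEPS = 128, 10, 20   # faulted lines, fault types, sampling points
FS_HZ = 500                               # sampling frequency stated in the text

def unravel(sample_idx):
    """Map a flat sample index to (line, fault_type, time_step).

    Assumes line-major ordering (line, then fault type, then time step).
    """
    line, rest = divmod(sample_idx, N_TYPES * N_STEPS)
    fault_type, step = divmod(rest, N_STEPS)
    return line, fault_type, step

def scenario_slices():
    """Return one (start, stop) sample range per fault scenario, each of length 20."""
    n_scenarios = N_LINES * N_TYPES
    return [(s * N_STEPS, (s + 1) * N_STEPS) for s in range(n_scenarios)]

total_samples = N_LINES * N_TYPES * N_STEPS
window_seconds = N_STEPS / FS_HZ

print(total_samples)            # 25600
print(len(scenario_slices()))   # 1280
print(window_seconds)           # 0.04
print(unravel(215))             # (1, 0, 15)
```

Under this layout the dataset yields 1,280 fault scenarios, each a 20-step sequence covering 0.04 s.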

In terms of the observation setting, the inputs of all non-observation nodes are set to zero, so as to evaluate the fault location capability of the proposed model under low-observability conditions.
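The zero-filling of non-observation nodes can be expressed as a simple mask over the node dimension. The toy sizes and the observed-node indices below are hypothetical; in the experiments the input has 128 nodes with 12 features each, and the observation nodes come from the selection strategy of Section 2.

```python
def mask_unobserved(x, observed):
    """Zero the feature rows of all nodes that are not observation nodes.

    x:        list of per-node feature lists for one time step
    observed: set of node indices that carry measurements
    """
    return [list(row) if i in observed else [0.0] * len(row)
            for i, row in enumerate(x)]

# Toy example: 4 nodes with 3 features each, nodes 0 and 2 observed (hypothetical).
x = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0],
     [7.0, 8.0, 9.0],
     [1.5, 2.5, 3.5]]
masked = mask_unobserved(x, observed={0, 2})
print(masked)
# [[1.0, 2.0, 3.0], [0.0, 0.0, 0.0], [7.0, 8.0, 9.0], [0.0, 0.0, 0.0]]
```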

Two types of simulation validation are adopted in this study. The first type is used for fair comparison with baseline models, in which the data organization and sampling strategy are kept as consistent as possible. The second type is used to evaluate the location capability of the model on independent fault samples. Specifically, the training, validation, and test stages employ mutually non-overlapping fault samples, and the selection of model parameters does not rely on any information from the test set. The main text focuses primarily on the results obtained from the second type of simulation validation, while the fair-comparison results are provided as supplementary references.

For performance evaluation, the macro-averaged F1 score (Macro-F1), absolute location accuracy (Acc), and one-hop tolerant accuracy (Acc 1-hop) are adopted. Macro-F1 is used to measure the overall recognition performance across different classes, Acc reflects whether the faulted section is accurately located, and Acc 1-hop indicates whether the predicted result falls within the one-hop neighborhood of the true faulted section. All performance comparisons and ablation studies in this work are conducted using these three metrics.
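The three metrics can be computed as in the following sketch. Two points are assumptions rather than details given in the text: a prediction counts as a one-hop hit if it equals the true section or appears in a user-supplied adjacency map of that section, and a class with no true or predicted samples contributes an F1 of zero.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted section equals the true section."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def one_hop_accuracy(y_true, y_pred, adjacency):
    """Fraction of predictions that hit the true section or one of its neighbors."""
    hits = sum(p == t or p in adjacency.get(t, ())
               for t, p in zip(y_true, y_pred))
    return hits / len(y_true)

def macro_f1(y_true, y_pred, classes):
    """Unweighted mean of the per-class F1 scores."""
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)  # assumed zero-division rule
    return sum(scores) / len(scores)

# Toy example: four sections on a chain 0 - 1 - 2 - 3.
adjacency = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
y_true = [0, 1, 2, 3]
y_pred = [0, 2, 2, 0]

print(accuracy(y_true, y_pred))                     # 0.5
print(one_hop_accuracy(y_true, y_pred, adjacency))  # 0.75
print(macro_f1(y_true, y_pred, classes=[0, 1, 2, 3]))
```

In the toy example, predicting section 2 for a fault on section 1 is wrong under Acc but tolerated under Acc 1-hop, whereas predicting section 0 for a fault on section 3 is penalized by both.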

Figure 5. IEEE 123-Node Active Distribution Network for Fault-Location Simulation Validation.

4.2. Overall Comparative Analysis of Fault Location Performance

Based on the simulation validation settings described above, the overall fault location performance of AM-STGNN is first evaluated. The experimental results show that AM-STGNN achieves a Macro-F1 of 0.9684, an Acc of 0.9711, and an Acc 1-hop of 0.9938. These results indicate that the proposed model can maintain high fault location accuracy and strong one-hop tolerant accuracy for adjacent line sections under strict scenario isolation. The overall performance comparison between AM-STGNN and scGNN is shown in Figure 6.

As shown in Figure 6, AM-STGNN outperforms scGNN across all three evaluation metrics. Specifically, scGNN achieves Macro-F1, Acc, and Acc 1-hop values of 0.9674, 0.9662, and 0.9703, respectively, whereas AM-STGNN reaches 0.9684, 0.9711, and 0.9938 in the proposed simulation validation. The improvement in Acc 1-hop is particularly significant, indicating that AM-STGNN provides better location stability in scenarios where adjacent line sections are prone to confusion.
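To put the reported differences in perspective, the short calculation below derives the absolute gain and the relative reduction of the residual error (1 minus the metric) for each metric. The inputs are the values reported above; the relative error reduction is a derived quantity computed here for illustration and is not a metric reported by the authors.

```python
scgnn = {"Macro-F1": 0.9674, "Acc": 0.9662, "Acc 1-hop": 0.9703}
am_stgnn = {"Macro-F1": 0.9684, "Acc": 0.9711, "Acc 1-hop": 0.9938}

summary = {}
for metric in scgnn:
    gain = am_stgnn[metric] - scgnn[metric]
    # Share of the baseline's residual error removed by AM-STGNN
    error_reduction = 1.0 - (1.0 - am_stgnn[metric]) / (1.0 - scgnn[metric])
    summary[metric] = (gain, error_reduction)
    print(f"{metric:10s} gain = {gain:+.4f}, residual error reduced by {error_reduction:.1%}")
```

The absolute gains are 0.0010, 0.0049, and 0.0235, respectively. For Acc 1-hop the residual error falls from 2.97% to 0.62%, a reduction of roughly 79%, whereas the Macro-F1 gain corresponds to a reduction of only about 3% of the residual error.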

From the perspective of performance characteristics, AM-STGNN not only maintains strong absolute fault location capability, but also tends to restrict erroneous predictions to the vicinity of the true faulted section when deviations occur. This property is practically meaningful for distribution network fault location, because misclassifications in such tasks are more likely to occur between topologically adjacent sections rather than between completely unrelated sections. The high Acc 1-hop further demonstrates that AM-STGNN can effectively exploit spatiotemporal features to narrow the range of misclassification.

The error analysis further shows that the remaining errors of AM-STGNN are mainly concentrated among a small number of faulted line sections with adjacent topologies or similar response characteristics, rather than being distributed as large-scale misclassifications among unrelated sections. This indicates that AM-STGNN can provide correct fault location results in most fault scenarios, and that the residual errors mainly arise from samples with close decision boundaries and highly similar features, rather than from insufficient overall discrimination capability.

Overall, AM-STGNN achieves favorable results in the simulation validation, demonstrating that the spatial modeling, temporal modeling, and output discrimination components remain effective without relying on sample overlap. This also indicates that the performance improvement of AM-STGNN is not caused by differences in data organization or validation strategy, but remains effective under stricter generalization conditions.

4.3. Ablation Analysis of Core Modules

To analyze the influence of each component of AM-STGNN on fault location performance, ablation experiments are conducted based on the complete AM-STGNN model. The spatial branch, temporal branch, training strategy, and output discrimination modules are removed one at a time, and the corresponding changes in evaluation metrics are compared. The complete model achieves a Macro-F1 of 0.9684, an Acc of 0.9711, and an Acc 1-hop of 0.9938. The variations in Macro-F1 after removing different modules are shown in Figure 7.

As shown in Figure 7, the model performance generally decreases after each module is removed, indicating that the current performance is not contributed by a single component alone, but results from the joint effects of spatial modeling, temporal modeling, and the training strategy. Among all ablation settings, several cases lead to particularly pronounced performance degradation. When the two-stage training strategy, in which the spatiotemporal feature extraction module and the classification layer are first jointly trained and the classification layer is then optimized separately, is removed, the Macro-F1 decreases to 0.8216. When the entire temporal branch is removed, the Macro-F1 decreases to 0.8333. When the fault time window is extended from the last 13 sampling points to all 20 sampling points, the Macro-F1 decreases to 0.9002. When the training optimizer is changed from AdamW to Adam, the Macro-F1 decreases to 0.9337. These results indicate that the temporal branch, the two-stage training strategy, the selection of the fault time window, and the optimizer setting all have substantial effects on the final performance.

By contrast, when per-time-step supervision is replaced with scenario-level supervision, the Macro-F1 only decreases from 0.9684 to 0.9638, showing a relatively small performance drop. This suggests that both supervision strategies are feasible, while per-time-step supervision provides slightly better performance.

From the perspective of temporal modeling, removing the entire temporal branch causes the Macro-F1 to decrease from 0.9684 to 0.8333, representing one of the most pronounced performance drops among all ablation settings. This result indicates that relying solely on spatial graph features is insufficient for distribution network fault location, and that the short-term temporal evolution of fault signals has a direct influence on the location results. A more detailed analysis further shows that removing the selective state-space mixing module from the temporal branch reduces the Macro-F1 to 0.9199; removing the graph-structural constraint from the temporal feature updating process decreases the Macro-F1 to 0.9196; and reducing the high-order graph diffusion from three orders to one order decreases the Macro-F1 to 0.9273. These results demonstrate the overall effectiveness of the temporal branch and confirm that graph filtering, multi-order diffusion, and selective scanning all contribute to the final fault location performance.

From the perspective of spatial modeling, removing the feature aggregation component that exploits information from adjacent nodes in the spatial branch reduces the Macro-F1 to 0.8718. Removing the component that learns inter-node correlations through graph attention decreases the Macro-F1 to 0.8823. These results indicate that both components in the current spatial branch are effective: local aggregation information within the neighborhood is required, while graph attention is also necessary to supplement the correlation information among nodes.

Furthermore, within the temporal branch, removing the graph-filtering-based recurrent branch results in a Macro-F1 of 0.9306, whereas removing the branch that directly preserves the original input features and performs recurrent modeling leads to a Macro-F1 of 0.9055. The latter causes a more substantial performance degradation, suggesting that preserving the original input information is more important for temporal modeling in the current model.

Overall, three conclusions can be drawn from the ablation experiments. First, the temporal branch is one of the most critical components of the current model: its removal reduces the Macro-F1 to 0.8333, the second-largest drop among all ablation settings after the removal of the two-stage training strategy (0.8216). Second, the local neighborhood aggregation and graph attention mechanisms in the spatial backbone, as well as the graph filtering, multi-order diffusion, and selective scanning mechanisms in the temporal branch, all contribute to the final performance. Third, the training strategy is also important, since the selection of the fault time window, the separate optimization of the classification layer, and the choice of optimizer all have substantial effects on the results. Therefore, the performance improvement of AM-STGNN is not brought by a single module, but by the combined effects of spatiotemporal modeling and the training strategy.
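The ablation results quoted in this subsection can be collected and ranked by their Macro-F1 drop relative to the complete model, as in the sketch below. The numbers are those reported in the text; the short setting labels are paraphrases introduced here for readability.

```python
FULL_MODEL_F1 = 0.9684

# Macro-F1 after each single change, as reported in Section 4.3
ablations = {
    "without two-stage training": 0.8216,
    "without temporal branch": 0.8333,
    "without neighbor aggregation (spatial)": 0.8718,
    "without graph attention (spatial)": 0.8823,
    "20-point window instead of last 13 points": 0.9002,
    "without raw-input recurrent branch": 0.9055,
    "without graph-structural constraint": 0.9196,
    "without selective state-space mixing": 0.9199,
    "first-order instead of third-order diffusion": 0.9273,
    "without graph-filtering recurrent branch": 0.9306,
    "Adam instead of AdamW": 0.9337,
    "scenario-level instead of per-step supervision": 0.9638,
}

# Sort from the largest to the smallest performance drop
ranked = sorted(ablations.items(), key=lambda item: item[1])
for name, f1 in ranked:
    print(f"{name:48s} Macro-F1 = {f1:.4f}  drop = {FULL_MODEL_F1 - f1:.4f}")
```

The ranking makes the ordering explicit: removing the two-stage training strategy costs 0.1468 and removing the temporal branch costs 0.1351, while switching to scenario-level supervision costs only 0.0046.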

5. Discussion

To more clearly illustrate the differences in modeling philosophy between AM-STGNN and the spatiotemporal correlation graph neural network method, a comparison is conducted from three aspects: input conditions, structural composition, and experimental results. The two methods use the same number of observation nodes, original input features, and transient time-window length. Therefore, their differences mainly originate from the subsequent modeling strategies.

The spatiotemporal correlation graph neural network method mainly consists of three components: observation node selection, a graph-convolution-based spatiotemporal attention module, and a classification output module [20]. In this method, observation nodes are first selected according to the propagation characteristics of fault unbalanced currents. Then, graph convolution and spatiotemporal attention are employed to extract the attenuation characteristics of fault propagation. Finally, faulted line sections are identified through a fully connected layer. Its spatial relationships are mainly constructed based on predefined adjacency relationships and distance-based correlations, while temporal modeling is performed through the graph-convolution-based spatiotemporal attention module.

In contrast, AM-STGNN extends the baseline modeling framework in several key aspects while maintaining the same input conditions. First, in the spatial modeling stage, fixed adjacency relationships are no longer used as the sole basis for graph construction. Instead, an adaptive implicit topology is constructed through learnable node embeddings, allowing inter-node relationships to be dynamically adjusted according to the fault data. Second, in the temporal modeling stage, the Mamba selective state-space module is introduced to scan the short-term dynamic process after fault inception, thereby strengthening the extraction of critical transient information. Third, on the basis of local spatiotemporal features, an STGformer-based global linear attention layer is incorporated to enhance long-range spatiotemporal interaction. At the output end, gated bilinear fusion, topology-aware differential output, and a margin constraint are further introduced to improve the discrimination capability between adjacent faulted line sections.

The main structural and functional differences between the baseline method and AM-STGNN are summarized in Table 1. This comparison shows that AM-STGNN strengthens the baseline framework not by changing the input data, but by enhancing the representation, interaction, and discrimination processes after feature extraction.

Compared with the baseline method, AM-STGNN provides stronger capability in dynamic topology modeling, selective temporal state representation, global spatiotemporal interaction, gated feature fusion, and topology-aware boundary enhancement. These improvements correspond to the major challenges in low-observability fault location, namely inaccurate node-relationship representation, insufficient utilization of initial transient information, and weak discrimination between adjacent faulted line sections.

To provide a more intuitive illustration, Figure 8 further presents the overall comparison between AM-STGNN and the baseline scGNN. The baseline method mainly relies on predefined graph structures and spatiotemporal attention for feature extraction, whereas AM-STGNN introduces adaptive topology learning, Mamba-based selective temporal scanning, global linear interaction, and topology-aware discriminative output. Therefore, AM-STGNN is not limited to local propagation modeling, but simultaneously considers dynamic spatial relationships, short-term sequence evolution, and global spatiotemporal dependencies.

In terms of experimental results, the baseline spatiotemporal correlation graph neural network achieves a Macro-F1 of 0.9674, an absolute faulted-section location accuracy of 0.9662, and a one-hop tolerant accuracy of 0.9703. In contrast, AM-STGNN achieves corresponding values of 0.9684, 0.9711, and 0.9938, respectively. The comparison indicates that AM-STGNN outperforms the baseline method across all three metrics. In particular, the improvement in Acc 1-hop is the most pronounced, increasing from 0.9703 to 0.9938. This result suggests that AM-STGNN can more stably constrain the predicted results to the vicinity of the true faulted section in scenarios where adjacent line sections are prone to confusion.

As the preceding analysis and results show, the baseline spatiotemporal correlation graph neural network already achieves high-accuracy fault location under low-observability conditions. Building upon this foundation, AM-STGNN further enhances three key capabilities: dynamic representation of inter-node relationships, extraction of critical initial transient fault features, and modeling of long-range spatiotemporal dependencies together with discrimination of adjacent-section decision boundaries. Consequently, AM-STGNN not only maintains high absolute fault location accuracy, but also exhibits stronger one-hop tolerance for adjacent faulted line sections.

6. Conclusions

This study proposes an adaptive Mamba-driven spatiotemporal graph neural network, termed AM-STGNN, to address three major challenges in fault location for active distribution networks under low-observability conditions: the inaccurate representation of inter-node relationships, the insufficient utilization of initial transient fault information, and the inadequate modeling of global spatiotemporal correlations. In the proposed method, adaptive implicit topology generation is employed to characterize dynamic electrical coupling relationships among nodes. The Mamba selective state-space module is used to enhance the extraction of short-term transient fault features. Furthermore, STGformer-based global interaction, gated bilinear fusion, topology-aware differential output, and a margin constraint are integrated to improve the discrimination capability for weak-feature faults and adjacent faulted line sections. Simulation validation on the IEEE 123-node active distribution network shows that AM-STGNN achieves a Macro-F1 of 0.9684, an Acc of 0.9711, and an Acc 1-hop of 0.9938, outperforming the comparison method, which obtains corresponding values of 0.9674, 0.9662, and 0.9703. The improvement in one-hop tolerant accuracy is particularly pronounced, indicating that the proposed method can more stably constrain the prediction results to the vicinity of the true faulted section. The ablation experiments further demonstrate that the temporal branch, spatial aggregation, adaptive correlation modeling, selective scanning, and training strategy all contribute positively to model performance. Overall, AM-STGNN can more fully exploit the limited information from observation nodes under low-observability conditions, thereby effectively improving the location accuracy and stability of faulted line sections in complex active distribution networks.

Author Contributions

Conceptualization, Z.H. and J.M.; methodology, Z.H.; software, Z.H.; validation, Z.H.; formal analysis, Z.H.; investigation, Z.H. and J.M.; resources, J.M.; data curation, Z.H. and J.M.; writing—original draft preparation, Z.H.; writing—review and editing, Z.H.; visualization, Z.H.; supervision, X.H.; project administration, Z.H.; funding acquisition, X.H. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the National Natural Science Foundation of China under Grant 62303103.

Data Availability Statement

Data will be made available on request.

Acknowledgments

The author would like to sincerely thank the supervisor for the guidance and support provided throughout the experimental research process, especially for the valuable advice and patient explanations offered during the design of the experimental scheme, the refinement of the research ideas, and the difficulties encountered during the study. This support was of great importance to the successful completion of this research and manuscript. During the preparation of this manuscript, the author used Google AI for the purpose of checking the accuracy of the English translation. The author has reviewed and edited the output and takes full responsibility for the content of this publication.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Orozco-Henao, C.; Bretas, A. S.; Herrera-Orozco, A. R.; Pulgarín-Rivera, J. D.; Dhulipala, S.; Wang, S. Towards active distribution networks fault location: Contributions considering DER analytical models and local measurements. Int. J. Electr. Power Energy Syst. 2018, 99, 454–464.
2. Pattanaik, V.; Malika, B. K.; Panda, S.; Rout, P. K.; Sahu, B. K.; Samanta, I. S.; Bajaj, M.; Blazek, V.; Prokop, L. A critical review on phasor measurement units installation planning and application in smart grid environment. Results Eng. 2024, 24, 103559.
3. Dashti, R.; Ghasemi, M.; Daisy, M. Fault location in power distribution network with presence of distributed generation resources using impedance based method and applying π line model. Energy 2018, 159, 344–360.
4. Orozco-Henao, C.; Suman Bretas, A.; Marín-Quintero, J.; Herrera-Orozco, A.; Pulgarín-Rivera, J. D.; Velez, J. C. Adaptive Impedance-Based Fault Location Algorithm for Active Distribution Networks. Appl. Sci. 2018, 8, 1563.
5. Chandran, S.; Gokaraju, R.; Narendra, K. An extended impedance-based fault location algorithm in power distribution system with distributed generation using synchrophasors. IET Gener. Transm. Dis. 2023, 18, 479–490.
6. Dashtdar, M.; Hussain, A.; Al Garni, H. Z.; Mas’ud, A. A.; Haider, W.; AboRas, K. M.; Kotb, H. Fault Location in Distribution Network by Solving the Optimization Problem Based on Power System Status Estimation Using the PMU. Machines 2023, 11, 109.
7. Shi, Y.; Zheng, T.; Yang, C. Reflected Traveling Wave Based Single-Ended Fault Location in Distribution Networks. Energies 2020, 13, 3917.
8. Cheng, L.; Wang, T.; Wang, Y. A novel fault location method for distribution networks with distributed generations based on the time matrix of traveling-waves. Prot. Control Mod. Power Syst. 2022, 7, 46.
9. Rizeakos, V.; Bachoumis, A.; Andriopoulos, N.; Birbas, M.; Birbas, A. Deep learning-based application for fault location identification and type classification in active distribution grids. Appl. Energy 2023, 338, 120932.
10. Siddique, M. N. I.; Shafiullah, M.; Mekhilef, S.; Pota, H.; Abido, M. A. Fault classification and location of a PMU-equipped active distribution network using deep convolution neural network (CNN). Electr. Power Syst. Res. 2024, 229, 110178.
11. Khan, M. A.; Asad, B.; Vaimann, T.; Kallaste, A.; Pomarnacki, R.; Hyunh, V. K. Improved Fault Classification and Localization in Power Transmission Networks Using VAE-Generated Synthetic Data and Machine Learning Algorithms. Machines 2023, 11, 963.
12. Bougoffa, M.; Benmoussa, S.; Djeziri, M.; Palais, O. Hybrid Deep Learning for Fault Diagnosis in Photovoltaic Systems. Machines 2025, 13, 378.
13. Arifeen, M.; Petrovski, A.; Hasan, M. J.; Noman, K.; Navid, W. U.; Haruna, A. Graph-Variational Convolutional Autoencoder-Based Fault Detection and Diagnosis for Photovoltaic Arrays. Machines 2024, 12, 894.
14. Sun, H.; Kawano, S.; Nikovski, D. N.; Takano, T.; Mori, K. Distribution system fault location analysis using graph neural network with node and link attributes. In Proceedings of the ISGT Europe, Espoo, Finland, 18–21 October 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 1–6.
15. Nguyen, B. L. H.; Vu, T. V.; Nguyen, T.-T.; Panwar, M.; Hovsapian, R. Spatial-Temporal Recurrent Graph Neural Networks for Fault Diagnostics in Power Distribution Systems. IEEE Access 2023, 11, 46039–46050.
16. Liang, L.; Zhang, H.; Cao, S.; Zhao, X.; Li, H.; Chen, Z. Fault location method for distribution networks based on multi-head graph attention networks. Front. Energy Res. 2024, 12, 1395737.
17. Fan, M.; Xia, J.; Zhang, H.; Zhang, X. Fault Location Method of Distribution Network Based on VGAE-GraphSAGE. Processes 2024, 12, 2179.
18. Li, W. D.; Deepjyoti. PPGN: Physics-preserved graph networks for real-time fault location in distribution systems with limited observation and labels. arXiv 2021, arXiv:2107.02275.
19. Lu, T.; Hou, S. Fault Location Algorithm for Distribution Network With Distributed Generation Based on Domain-Adaptive TGATv2. IET Gener. Transm. Dis. 2025, 19, e70033.
20. Ma, J.; Hu, X.; Wang, J.; Zhang, Z.; Ma, D.; Sun, Q. Research on Spatiotemporal Correlation Graph Neural Network Fault Location Method for Large-Scale Distribution Network. Sci. China Technol. Sc. (In Chinese) 2026, 56, 453–470.
21. Hu, X.; Ma, J.; Zhang, R.; Ma, D.; Wang, Q. Ts-GSAN: A Two-Stage Graphical Spatiotemporal Attention Network Fault Localization Method for Distributed Energy Systems. IEEE Trans. Instrum. Meas. 2025, 74, 1–8.
22. Ma, J.; Hu, X.; Chu, T.; Zhao, H.; Ma, D. IPIGN: An Interpretable Physics-Informed Graph Network Multitask Cascading Failure Diagnosis Method for Distributed Energy Systems. IEEE Trans. Ind. Electron. 2026, 73, 7960–7971.
23. Liu, Y.; Liao, P.; Wang, Y. Using Graph-Enhanced Deep Reinforcement Learning for Distribution Network Fault Recovery. Machines 2025, 13, 543.
24. Li, L.; Wang, H.; Zhang, W. STG-Mamba: Spatial-Temporal Graph Learning via Selective State Space Model. arXiv 2024.
25. Wang, H.; Chen, J.; Pan, T.; Dong, Z.; Zhang, L.; Jiang, R.; Song, X. STGformer: Efficient Spatiotemporal Graph Transformer for Traffic Forecasting. arXiv 2024.
26. Liu, J.; Huang, Y.; Chen, K.; Liu, G.; Yan, J.; Chen, S.; Xie, Y.; Yu, Y.; Huang, T. Graph Neural Networks for Fault Diagnosis in Photovoltaic-Integrated Distribution Networks with Weak Features. Sensors 2025, 25, 5691.

Figure 1. Overall Spatiotemporal Architecture of the Proposed AM-STGNN Method.

Figure 2. Adaptive Implicit Topology Learning and Mamba-Based State Propagation.

Figure 6. Overall Fault-Location Performance Comparison in Simulation Validation.

Figure 7. Ablation Study on Core Modules of the Proposed AM-STGNN Method.

Figure 8. Comparison between AM-STGNN and baseline scGNN.

Table 1. Method comparison against baseline scGNN.

	Baseline scGNN	AM-STGNN (this work)
Dynamic topology	limited	data
Selective temporal state	limited	data ¹
Global interaction	moderate	strong
Gated fusion	limited	strong
Topology-aware margin	limited	strong

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
