Preprint
Article

This version is not peer-reviewed.

Bayesian Conditional GAN for Unsupervised Anomaly Detection in Structural Health Monitoring Time Series Dataset

Submitted:

20 April 2026

Posted:

21 April 2026


Abstract
Detecting rare structural damage without labeled fault data remains a critical unsolved challenge in structural health monitoring (SHM). This paper introduces BcDCGAN, a Bayesian conditional deep convolutional generative adversarial network designed for unsupervised anomaly detection in multivariate vibration time series from prestressed concrete catenary poles. The architecture integrates variational Bayesian inference over generator and critic weights with temporal convolutional networks, enabling epistemic uncertainty quantification alongside reconstruction and critic objectives. Trained exclusively on healthy acceleration signals with wind speed conditioning, the model produces a log-space Bayesian anomaly score that jointly combines normalized reconstruction error, critic evaluation, and epistemic uncertainty estimates into a single weighted decision function. An adaptive threshold is calibrated from the validation data for deployment-ready performance. Evaluation on a real 2017 catenary pole dataset (1606 signals, 70/10/20 split) with injected anomalies achieves 99.2% recall while revealing clear latent space separation and appropriate uncertainty signaling for out-of-distribution samples. Progressive posterior uncertainty reduction during training confirms robust learning of healthy structural dynamics, supporting interpretable, risk-aware decisions in safety-critical railway infrastructure.

1. Introduction

The detection of anomalies in time series data is important for safety-critical infrastructures, where undetected faults can lead to catastrophic failures and substantial economic losses [1,2]. The goal is to identify deviations from normal operational behavior using continuous sensor streams, ideally without prior knowledge of failure modes.
However, real-world structural health monitoring (SHM) applications face significant challenges: (i) non-stationary and noisy sensor signals influenced by varying environmental and operational conditions, (ii) complex long-range temporal dependencies that violate independent and identically distributed assumptions, (iii) extreme rarity of anomalies with less than 0.1% of samples, and (iv) complete absence of labeled fault examples during training [3]. These factors render traditional supervised methods impractical and cause classical unsupervised techniques such as distance-based approaches, statistical thresholding, or one-class classifiers to perform poorly in dynamic environments [4].
In addition, purely discriminative models do not provide a measure of confidence, making them unsuitable for high-stakes decisions in structural health monitoring. Uncertainty quantification (UQ) is increasingly emphasized in regulatory discussions, such as the EU AI Act, as important for trustworthy high-risk AI systems [5].
Generative adversarial networks (GANs) have emerged as powerful tools for unsupervised anomaly detection by learning the distribution of normal data. Anomalies are flagged via high reconstruction error or low discriminator confidence [6]. However, standard GAN-based methods struggle with non-stationary multivariate time series and provide no calibrated measure of epistemic uncertainty.
Consequently, there is an urgent demand for a unified architecture that simultaneously (i) operates fully unsupervised on raw, unlabeled multivariate time series, (ii) captures complex temporal dynamics via dilated convolutions, (iii) incorporates Bayesian uncertainty modeling throughout the network, and (iv) delivers probabilistic anomaly scores and enables structural health monitoring in the complete absence of failure examples.
This paper makes the following contributions to unsupervised anomaly detection for structural health monitoring of time series signals:
  • A Bayesian conditional deep convolutional GAN (BcDCGAN) architecture tailored to multivariate vibration-based time series is proposed, enabling fully unsupervised anomaly detection using only healthy data from prestressed concrete catenary poles.
  • A variational Bayesian weight distribution is integrated into the generator and critic, yielding epistemic uncertainty estimates that support risk-aware decision making in safety-critical SHM applications.
  • A temporal convolutional network with dilated causal convolutions within the generator-critic model component is employed to capture long-range temporal dependencies and handle non-stationary operating conditions.
  • An adaptive Bayesian anomaly scoring and thresholding scheme is introduced that combines normalized reconstruction error, critic score, and epistemic uncertainty into a single score. The decision threshold is calibrated using validation data for practical deployment.
  • The effectiveness of the proposed framework is demonstrated on a real catenary-pole SHM dataset with injected anomalies, showing high recall and clear separation between normal and anomalous signals in both latent representations and uncertainty measures.
The remainder of the paper is organized as follows: Section 2 reviews related work, Section 3 motivates the approach, Section 4 describes the methodology, Section 5 presents the case study and results, and Section 6 concludes with future directions.

2. Literature Review

2.1. Time Series Anomalies

Time series anomalies are broadly classified into three main types: point anomalies, contextual anomalies, and collective anomalies [7,8]. Point anomalies refer to individual data points that deviate significantly from the rest of the data. Contextual anomalies are data points that are anomalous only within a specific context or time frame, such as a temperature spike that is unusual for a given season. Collective anomalies involve a sequence or group of data points that individually may not appear unusual; together, however, they represent an anomalous pattern, making them more complex and challenging to detect than point or contextual anomalies.
Figure 1 illustrates these types of anomalies using a representative acceleration signal segment of the catenary pole data set used in this study [9]. Point, contextual, and collective anomalies are injected into the healthy signal segment that result in an anomalous segment. Detecting collective anomalies is often the most challenging, as individual points may not exhibit clear deviations when viewed in isolation.

2.2. Traditional Anomaly Detection Limitations

Traditional anomaly detection methods including distance-based approaches (Euclidean distance, KNN), statistical thresholding (moving averages, standard deviation), dimensionality reduction (PCA), and density estimation (GMM, one-class SVM) struggle with non-stationary multivariate time series common in SHM. These methods assume stationarity, linearity, or simple distributional forms that fail to capture complex temporal dynamics and environmental variations in structural vibration data.

2.3. Generative Adversarial Network

Generative adversarial networks (GANs) have emerged as a powerful framework for unsupervised anomaly detection in time series data, using the assumption that anomalies are rare and thus poorly reconstructed by a model trained exclusively on normal instances [7,10].
A GAN consists of two neural networks trained simultaneously in an adversarial manner: a generator G that learns to produce synthetic samples from random noise, and a discriminator D that distinguishes real data from generated samples [11]. The generator captures the data distribution of the dominant (normal) class, while the discriminator provides feedback to improve realism.
As indicated in Figure 2, the generator produces fake samples from latent noise, while the discriminator classifies real vs. fake inputs. During inference, anomalies produce high reconstruction error or low discriminator confidence.
The original GAN is formulated as a two-player minimax game between the generator G and the discriminator D:
min_G max_D V(D, G) = E_{x ∼ p_data(x)}[log D(x)] + E_{z ∼ p_z(z)}[log(1 − D(G(z)))].
where
x ∼ p_data(x) is a real data sample from the true data distribution,
z ∼ p_z(z) is a latent noise vector from the prior (e.g., N(0, 1)),
D(·) is the discriminator network (outputs the probability of being real),
G(·) is the generator network (maps noise to fake data).
The discriminator maximizes the value function to correctly classify real and fake samples, while the generator minimizes it to produce realistic fakes that fool the discriminator.
L_D = −E_{x ∼ p_data(x)}[log D(x)] − E_{z ∼ p_z(z)}[log(1 − D(G(z)))].
The discriminator loss L_D combines two binary cross-entropy terms: one pushing D(x) → 1 for real data and another pushing D(G(z)) → 0 for generated data.
L_G = −E_{z ∼ p_z(z)}[log D(G(z))].
By minimizing the non-saturating generator loss L_G = −log D(G(z)), the generator receives stronger gradients when its output is easily detected as fake, improving training stability over the original saturating loss.
GAN-based approaches thus address many limitations of traditional methods, particularly in capturing long-range temporal dependencies and high-dimensional distributions without requiring labeled anomalies.
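As an illustrative NumPy sketch (not the paper's implementation), the discriminator and non-saturating generator losses above can be computed from batches of discriminator probabilities:

```python
import numpy as np

def discriminator_loss(d_real, d_fake, eps=1e-8):
    # L_D = -E[log D(x)] - E[log(1 - D(G(z)))]
    return -np.mean(np.log(d_real + eps)) - np.mean(np.log(1.0 - d_fake + eps))

def generator_loss(d_fake, eps=1e-8):
    # Non-saturating loss: L_G = -E[log D(G(z))]
    return -np.mean(np.log(d_fake + eps))

# Toy batch: a confident discriminator yields a low L_D and a large L_G,
# so the generator receives strong gradients when easily detected as fake
d_real = np.array([0.90, 0.95, 0.85])  # D's outputs on real samples
d_fake = np.array([0.10, 0.05, 0.15])  # D's outputs on generated samples
```

During GAN training these two losses are minimized alternately, each on its own network's parameters.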

2.4. GAN Based Anomaly Detection Approaches

Several specialized GAN variants have been proposed for unsupervised anomaly detection (AD) in time series [12]. These approaches capitalize on the GAN’s ability to model the distribution of normal data while producing higher reconstruction errors or lower discriminator confidence for anomalous samples.
TAnoGAN introduces an LSTM-augmented GAN to learn compact latent representations of normal time series through adversarial training [13]. The anomaly scoring is based on a combination of reconstruction error and latent-space deviation. This method effectively captures sequential dependencies without requiring labeled faults.
The DCGANs + Bi-LSTM framework integrates Deep Convolutional GANs with bidirectional LSTM to jointly exploit spatial and long-range temporal features [14]. The convolutional generator creates realistic sequences, while the Bi-LSTM provides enhanced bidirectional context, leading to superior performance on complex multivariate signals.
BiGAN extends conventional GAN by jointly training an encoder along with the generator and discriminator, thereby learning a bidirectional mapping between data and latent space [15]. This architecture yields more accurate reconstructions and improved anomaly separation, as outliers typically exhibit poor inverse mappings to the learned normal manifold.
Table 1. GAN-based anomaly detection approaches.
Method | How it Works | Strength | Limitation
TAnoGAN | GAN with LSTM, uses reconstruction errors | Models temporal trends | Tuning sensitive
DCGAN+Bi-LSTM | DCGAN and Bi-LSTM for spatial-temporal data | Accurate for sequences | Computationally heavy
BiGAN | Joint encoder, generator, discriminator training | Precise reconstruction | Overfitting risk
These GAN-based methods substantially advance over traditional approaches in complex time series tasks [4]. However, they commonly lack inherent uncertainty quantification and struggle with strong non-stationarity limitations directly addressed by the Bayesian conditional formulation presented in this work.

2.5. Anomaly Detection Metrics

In unsupervised anomaly detection, quantitative evaluation remains challenging due to the absence of labeled anomalies in both training and real-world deployment. Various thresholding strategies are therefore applied to separate normal from anomalous samples. Common approaches include fixed thresholds, percentile-based methods (e.g., flagging the top 5% highest-scoring samples as anomalies), or statistical thresholds such as the mean plus k standard deviations (μ + k·σ, with k = 2 frequently used) [2].
In adversarial models, discriminator scores near 0.5 often signal uncertainty and are interpreted as potential anomalies [4]. Reconstruction-based methods primarily use reconstruction error (e.g., mean squared error between input and reconstructed signal) as the primary anomaly indicator: a higher error suggests deviation from the learned normal distribution.
Combined scoring, which merges reconstruction error and discriminator confidence, provides a more robust signal in many GAN-based frameworks [16]. Since ground-truth labels are not available during training, recall is widely considered the most meaningful metric when synthetic anomalies are injected solely into the held test set [13]. Recall, given by (TP/(TP + FN)), measures the proportion of injected anomalies correctly identified and is particularly critical in safety-critical applications like structural health monitoring, where missing an actual fault carries far greater risk than occasional false alarms.
Table 2 summarizes the evaluation strategies. When anomalies are injected exclusively into the test set, recall offers a reliable and reproducible measure of detection capability while maintaining the fully unsupervised nature of the training process.
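As a toy illustration (synthetic scores, not the paper's data), the recall computation under a μ + k·σ threshold can be sketched as:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic anomaly scores: 95 normal samples plus 5 injected anomalies
scores = np.concatenate([rng.normal(0.0, 1.0, 95), rng.normal(6.0, 0.5, 5)])
labels = np.concatenate([np.zeros(95), np.ones(5)])  # 1 marks an injected anomaly

# Statistical threshold mu + k*sigma with k = 2
tau = scores.mean() + 2 * scores.std()
pred = (scores > tau).astype(int)

# Recall = TP / (TP + FN): fraction of injected anomalies actually flagged
tp = int(np.sum((labels == 1) & (pred == 1)))
fn = int(np.sum((labels == 1) & (pred == 0)))
rec = tp / (tp + fn)
```

Because only the test set carries ground-truth injections, recall can be evaluated without ever using labels during training.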

3. Motivation

The rapid expansion of sensor networks in modern infrastructure generates vast amounts of high-dimensional, noisy, and unprocessed time series data [1]. This presents both an opportunity and a significant challenge: extracting meaningful insights from raw vibration signals without extensive feature engineering or labeled fault examples.
In real-world structural health monitoring (SHM), anomalies are inherently rare, unpredictable, and often impossible to prospectively label [3]. Standard supervised approaches therefore fail, necessitating fully unsupervised methods that learn the distribution of normal behavior exclusively from healthy data. The model must then reliably flag deviations—ranging from subtle emerging fatigue to abrupt impacts—while remaining robust to varying environmental and operational conditions.
Moreover, safety-critical applications demand more than just detection accuracy. Decision-makers require calibrated confidence estimates to assess risk appropriately [5]. UQ is therefore indispensable, as it distinguishes genuine out-of-distribution events from mere noise, prevents overconfident false negatives, and supports risk-aware maintenance strategies. Without UQ, even highly accurate models risk providing misleading outputs in unseen or novel conditions—undermining trustworthiness and regulatory compliance.
Existing GAN-based anomaly detection methods, such as TAnoGAN, DCGAN+Bi-LSTM, and BiGAN, have demonstrated that adversarial generative models can effectively learn the distribution of normal time series and flag deviations as anomalies. At the same time, these architectures typically operate with deterministic weights and do not natively provide calibrated uncertainty estimates, which can limit interpretability in safety-critical monitoring settings [17]. Moreover, many existing approaches do not explicitly incorporate environmental conditioning or employ temporal convolutional architectures specifically optimized for long-range dependencies in non-stationary signals induced by external factors such as wind or temperature variations.
The proposed Bayesian conditional deep convolutional GAN (BcDCGAN) extends this line of work by combining conditional adversarial training, temporal convolutional networks, and Bayesian weight distributions within a unified framework. In contrast to purely deterministic GANs, our model provides estimates of epistemic uncertainty through variational Bayesian inference on the generator and critic weights, and it uses an adaptive Bayesian anomaly score that jointly reflects reconstruction quality, critic evaluation, and parameter uncertainty. This design aims to improve robustness under non-stationary operating conditions and to offer more interpretable anomaly scores.

4. Methodology

The primary objective of the proposed framework is to identify structural anomalies under varying environmental and operational conditions without prior exposure to damage-state data. To achieve this, we adopt a fully unsupervised approach centered on a Bayesian Conditional Deep Convolutional Generative Adversarial Network (BcDCGAN).
The BcDCGAN architecture integrates variational Bayesian inference into both the generator and the critic, allowing the model to learn the underlying distribution of healthy signals while explicitly accounting for environmental and operational inputs. The general framework is illustrated in Figure 3.
In post-training, an adaptive threshold is calibrated using a held-out validation set of healthy signals by synthesizing three distinct indicators: the reconstruction error, the critic’s evaluation score, and the epistemic uncertainty.
During deployment, incoming signals are evaluated against this adaptive threshold alongside a corresponding uncertainty band to support decision-making.

4.1. Bayesian Inference

Bayesian inference provides a principled framework for uncertainty quantification in deep generative models by treating network parameters as probability distributions rather than fixed values [18]. This enables the model to capture epistemic uncertainty, improving robustness in data-scarce or non-stationary SHM scenarios [19].
Bayes' theorem is expressed as:
P(H | E) = P(E | H) · P(H) / P(E).
where:
P(H | E) is the posterior probability, representing the updated belief in a hypothesis H after observing evidence E;
P(H) is the prior probability, expressing the initial belief before seeing any data;
P(E | H) is the likelihood, indicating how likely the observed data are under hypothesis H;
P(E) is the evidence or marginal likelihood, serving as a normalizing constant to ensure the posterior is a valid probability distribution.
Exact posterior inference P ( θ D ) is intractable for deep networks, so variational inference (VI) approximates it with a tractable distribution q ( θ ) by minimizing the Kullback-Leibler (KL) divergence between the approximate and true posterior [20]. This is equivalent to maximizing the Evidence Lower Bound (ELBO), a tractable lower bound on the log marginal likelihood that balances reconstruction accuracy and regularization:
ELBO(q) = E_{q(θ)}[log p(D | θ)] − KL(q(θ) ‖ p(θ)).
In this work, the variational posterior q ( θ ) is modeled as independent Gaussian distributions over the weights of the generator and critic networks. Standard Gaussian priors p ( θ ) = N ( 0 , 1 ) are placed on these weights, while the encoder remains deterministic with no prior distribution [20]. With both the posterior and prior chosen as Gaussians, the KL divergence term in Equation 3 can be computed analytically using the closed-form expression:
KL( N(μ, σ²) ‖ N(0, 1) ) = (1/2)(μ² + σ² − 1) − log σ.
where:
μ is the posterior mean, representing the learned central value of the weight distribution;
σ is the posterior standard deviation, capturing the uncertainty around the mean;
log σ is the logarithm of the standard deviation, parameterized during training to ensure positivity of σ.
During training, the variational parameters μ and log σ are optimized jointly with the network weights using the reparameterization trick and Monte Carlo estimates of the ELBO [21]. The KL divergence term is added to the overall loss, acting as a regularizer that encourages the posterior to remain close to the prior. This optimization yields distributional weights that propagate uncertainty through forward passes and, via posterior Monte Carlo sampling during inference, provide calibrated epistemic uncertainty estimates for anomaly scoring which addresses a key limitation of deterministic GANs.
This Bayesian treatment improves generalization to novel structural conditions and environmental variations, while the ELBO objective ensures stable adversarial training with meaningful latent representations [19].
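The closed-form KL term and the reparameterization trick can be sketched in NumPy as follows (an illustration of the variational machinery, not the actual TensorFlow layers used in this work):

```python
import numpy as np

rng = np.random.default_rng(42)

def kl_gaussian(mu, log_sigma):
    # Closed-form KL( N(mu, sigma^2) || N(0, 1) ), summed over all layer weights
    sigma2 = np.exp(2.0 * log_sigma)
    return float(np.sum(0.5 * (mu ** 2 + sigma2 - 1.0) - log_sigma))

def sample_weights(mu, log_sigma):
    # Reparameterization trick: w = mu + sigma * eps, with eps ~ N(0, I),
    # so gradients can flow to the variational parameters mu and log_sigma
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(log_sigma) * eps

# A 4x4 weight matrix with posterior N(0, e^{-2}) per weight
mu = np.zeros((4, 4))
log_sigma = np.full((4, 4), -1.0)
w = sample_weights(mu, log_sigma)   # one stochastic forward-pass sample
kl = kl_gaussian(mu, log_sigma)     # regularizer added to the training loss
```

Repeating `sample_weights` during inference yields the Monte Carlo posterior samples used for epistemic uncertainty estimation.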

4.2. Temporal Causal Networks

Temporal Convolutional Networks (TCNs) are convolutional architectures specifically designed for sequence modeling tasks [22]. They effectively capture long-range temporal dependencies in time series data while maintaining computational efficiency and causal structure, making them well-suited for real-time analysis of non-stationary signals such as structural vibrations.
Two key mechanisms enable this capability:
Causality: In TCNs, the output at any timestep depends only on current and past inputs, never on future values. This is achieved through causal (zero-padded) convolutions that preserve temporal order and prevent information leakage. Causality is essential for streaming applications, allowing the model to process signals sequentially as they arrive, critical for online anomaly detection in SHM.
Dilation: Dilated convolutions introduce gaps between kernel elements, exponentially expanding the receptive field with network depth without increasing parameters or losing resolution. By stacking layers with increasing dilation rates (e.g., d = 1, 2, 4, 8), TCNs efficiently aggregate information across distant timesteps, enabling the modeling of complex long-term patterns common in wind-induced or fatigue-related vibrations.
Figure 4 illustrates a typical causal TCN with dilation. In the proposed BcDCGAN, TCN blocks with residual connections and dilated convolutions replace standard layers in both the generator and the critic, ensuring stable training and robust temporal feature extraction in multivariate acceleration signals.
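A minimal sketch of the core TCN operation, a causal dilated convolution (illustrative, single channel, without the residual connections used in the full blocks):

```python
import numpy as np

def causal_dilated_conv1d(x, kernel, dilation):
    # Causal: y[t] depends only on x[t], x[t-d], x[t-2d], ... (zero left-padding)
    k = len(kernel)
    pad = (k - 1) * dilation
    xp = np.concatenate([np.zeros(pad), x])
    y = np.zeros(len(x))
    for t in range(len(x)):
        for i in range(k):
            y[t] += kernel[i] * xp[pad + t - i * dilation]
    return y

x = np.arange(8, dtype=float)
# With kernel size 2 and dilation 2: y[t] = 0.5*x[t] + 0.5*x[t-2]
y = causal_dilated_conv1d(x, np.array([0.5, 0.5]), dilation=2)
# Stacking layers with d = 1, 2, 4, ... grows the receptive field exponentially
```

Note that changing future inputs leaves earlier outputs untouched, which is exactly the property needed for streaming anomaly detection.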

4.3. Proposed Bayesian Conditional Deep Convolution GAN Anomaly Detection Architecture

The proposed architecture integrates Bayesian inference into the convolutional layers of the generator and critic [20]. Weight and bias distributions are optimized throughout model training, enabling the inclusion and quantification of uncertainty. The generator is updated indirectly through critic feedback and directly through the reconstruction error. An encoder extracts latent space features to help the generator produce signals with minimal error. Gradients from the generator’s loss function flow back through the generator to update the encoder, producing a latent space that is optimized to support the generator’s task. Conditional inputs such as temperature and wind speed can be fed to the model components for improved context identification.
Figure 5 shows how the generator and critic are structured. Within the generator and critic, each convolution layer operates based on TCN, considering time series signals up to the current sequence with selected dilation rates. This is applied to each weight and bias distribution in every layer. Each convolution layer is therefore equipped with the TCN and Bayesian inference.
The objective function of the GAN with a Wasserstein critic, which replaces the discriminator, is characterized by assigning real-valued scores as shown in Equation 5; a high score indicates that the signal is considered real.
min_G max_C E_{x ∼ p_data}[C(x)] − E_{z ∼ p_z}[C(G(z))].
where:
x ∼ p_data(x) is a real data sample from the true data distribution,
z ∼ p_z(z) is a latent vector from the prior,
C(·) is the critic network,
G(·) is the generator network.
The critic loss in Equation 6 is minimized; since its negative value is taken, this is equivalent to maximizing the critic's value function. Likewise, minimizing the generator loss in Equation 7 increases the critic score for the generated signals.
L_C = E_{z ∼ p_z}[C(G(z))] − E_{x ∼ p_data}[C(x)].
L_G = −E_{z ∼ p_z}[C(G(z))].
L_rec = ‖x̂ − x‖₂².
where:
L C is the critic loss,
L G is the generator loss,
L r e c is the reconstruction loss.
The reconstruction error L_rec is one of the losses contributing to the total generator loss. The total generator loss is given by the weighted sum of L_rec, a KL-divergence regularizer term, and the generator loss L_G from the critic.
L_{G,total} = λ_rec · L_rec + β_ELBO · L_ELBO + L_G.
where:
L_{G,total} is the total generator loss,
λ_rec is the reconstruction loss weight, scheduled by the current and total number of epochs as λ_rec(epoch) = 1 + 20 · epoch / epochs,
β_ELBO is a factor that gradually increases the strength of the ELBO regularization,
L_ELBO is the ELBO loss, i.e., the negative of the ELBO defined in Equation 3.
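The total generator loss of Equation 8, including the λ_rec schedule, can be sketched as follows (the treatment of L_ELBO is simplified to a precomputed KL term for illustration; names and signature are hypothetical):

```python
import numpy as np

def lambda_rec(epoch, epochs):
    # Reconstruction weight ramps from 1 toward 21 as training progresses
    return 1.0 + 20.0 * epoch / epochs

def generator_total_loss(x, x_hat, critic_fake, kl, epoch, epochs, beta_elbo):
    l_rec = np.mean((x_hat - x) ** 2)   # reconstruction term L_rec
    l_g = -np.mean(critic_fake)         # Wasserstein generator loss L_G
    l_elbo = kl                         # negative-ELBO regularizer (simplified to its KL part)
    return lambda_rec(epoch, epochs) * l_rec + beta_elbo * l_elbo + l_g
```

The growing λ_rec shifts emphasis from adversarial realism early in training toward faithful reconstruction later on.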

4.4. Adaptive Threshold

The anomaly detection framework employs a multi-component Bayesian scoring function that integrates reconstruction error, epistemic uncertainty, and critic network evaluations. The anomaly threshold in this model is computed as a weighted combination of these three components: the reconstruction error is calculated as the mean squared difference between the original sequences and their reconstructions in the validation data set; the critic score is obtained by evaluating the realism of the generator outputs in the validation data set using the critic network; and the epistemic uncertainty is estimated by Monte Carlo (MC) sampling of the generator to capture variability in the reconstructions [21]. Each component is then normalized and linearly combined with empirically chosen weights to produce the Bayesian combined anomaly score for a time series sample:
S = α · E_norm + β · U_norm − γ · C_norm.
where:
E_norm is the normalized reconstruction error;
U_norm is the normalized epistemic uncertainty;
C_norm is the normalized critic score;
α, β, and γ are empirically chosen weights;
S is the combined anomaly score.
The adaptive threshold is determined from the validation scores as follows:
τ = μ_val + k · σ_val.
where:
μ_val and σ_val are the mean and standard deviation of the combined scores computed from the validation set;
k is a sensitivity parameter (typically k = 1).
A test sample is classified as anomalous if its combined score S(x_test) exceeds this threshold τ.
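The combined score and adaptive threshold can be sketched as follows (the weights and calibration ranges here are illustrative placeholders, not the empirically chosen values used in the experiments):

```python
import numpy as np

def minmax(v, lo, hi):
    # Normalize one score component using calibration statistics from validation data
    return (v - lo) / (hi - lo + 1e-12)

def combined_score(err, unc, critic, cal, alpha=0.5, beta=0.3, gamma=0.2):
    # S = alpha*E_norm + beta*U_norm - gamma*C_norm (weights are illustrative)
    e = minmax(err, *cal["err"])
    u = minmax(unc, *cal["unc"])
    c = minmax(critic, *cal["critic"])
    return alpha * e + beta * u - gamma * c

rng = np.random.default_rng(1)
val_scores = rng.normal(0.2, 0.05, 200)            # combined scores on healthy validation data
tau = val_scores.mean() + 1.0 * val_scores.std()   # adaptive threshold with k = 1
is_anomaly = bool(0.9 > tau)                       # a test sample with score 0.9 is flagged
```

The critic component enters with a negative sign because a realistic (high) critic score argues against the sample being anomalous.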

5. Case Study

5.1. Dataset

Three prestressed concrete catenary poles were instrumented; the central pole M27, equipped with multiple sensors, was selected for analysis. This study uses a data set collected in 2017 [9]. Acceleration signals along the x-axis (the railway direction), recorded by the sensor a_12, together with the corresponding wind speed V_x measurements from a 3D anemometer, form the dataset. Acceleration signals are used as the primary input for anomaly detection, while wind speed signals serve as a conditioning input. The data set comprises 1606 acceleration signals, each with 114,688 timestamps. The signals are split using a 70/10/20 ratio: 1124 signals for training, 160 signals for validation, and 322 signals for testing. In this fully unsupervised approach, validation data are reserved exclusively for post-training adaptive threshold computation (τ = μ_val + k·σ_val) and are not used during model optimization. Synthetic anomalies are injected only into the test set for evaluation.
Figure 6 shows a prestressed catenary pole monitored with SHM sensors on the railway line between Erfurt and Leipzig, Germany [9].
Wind speed data of the same length is used as a conditioning input for both the generator and the critic. All experiments are implemented in Python 3.8.17 using TensorFlow, on a system with a 12th Gen Intel Core i7 processor and 32 GB RAM.
As the data set experiences nonlinearity, Kernel Principal Component Analysis (KPCA) served as a nonlinear dimensionality reduction tool for the accelerometer time series dataset. This approach improved computational efficiency and model accuracy under varying operational loads.
The model is trained exclusively on healthy (normal) signals, preserving the fully unsupervised paradigm. To enable quantitative evaluation, synthetic anomalies are injected solely into the held-out test set.
The model architecture can handle the conditioning input on its own at a desired dilation rate. In this particular case study, however, the wind speed signal sequence is reduced to 15 statistical features spanning the time and frequency domains: mean, standard deviation, skewness, kurtosis, root mean square, peak-to-peak, crest factor, shape factor, impulse factor, number of peaks, autocorrelation at lag 1, zero-crossing count, total spectral power, spectral centroid, and spectral entropy. These features condense the high-dimensional data into interpretable vectors that preserve patterns such as periodicity, energy distribution, and anomalies. Each acceleration signal of n timesteps thus has a corresponding conditioning wind vector of 15 features.
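A sketch of extracting a subset of these conditioning features from a wind speed sequence (the feature definitions and the sampling rate `fs` are assumptions for illustration, not the exact formulas used in the study):

```python
import numpy as np

def wind_features(v, fs):
    """Condense a wind speed sequence into a subset of the 15 statistical
    time- and frequency-domain features (illustrative definitions;
    fs is the sampling rate in Hz, assumed known)."""
    rms = np.sqrt(np.mean(v ** 2))
    spec = np.abs(np.fft.rfft(v - v.mean())) ** 2   # power spectrum
    freqs = np.fft.rfftfreq(len(v), d=1.0 / fs)
    p = spec / (spec.sum() + 1e-12)                 # normalized spectral distribution
    return {
        "mean": float(v.mean()),
        "std": float(v.std()),
        "rms": float(rms),
        "peak_to_peak": float(v.max() - v.min()),
        "crest_factor": float(np.max(np.abs(v)) / (rms + 1e-12)),
        "zero_crossings": int(np.sum(np.diff(np.sign(v - v.mean())) != 0)),
        "total_spectral_power": float(spec.sum()),
        "spectral_centroid": float(np.sum(freqs * p)),
        "spectral_entropy": float(-np.sum(p * np.log(p + 1e-12))),
    }

# Example: a 5 Hz tone sampled at 100 Hz for one second
v = np.sin(2 * np.pi * 5 * np.arange(100) / 100.0)
feats = wind_features(v, fs=100.0)
```

For a pure tone, the spectral centroid recovers the oscillation frequency, which illustrates how such features summarize periodicity compactly.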

5.2. Anomaly Injection

To generate ground-truth anomalies for evaluation, a controlled injection procedure is implemented that adds synthetic anomalies to normal vibration signals from prestressed catenary poles. The injection process first filters the test data set to identify signals with sufficient intensity, selecting only those with a mean standard deviation above a minimum threshold and variance above a specified percentile. From these valid signals, a fraction is randomly selected for anomaly injection.
For each selected signal, multiple calibrated anomaly patterns are injected to achieve a target signal-to-noise ratio, where the anomaly amplitude is scaled relative to the original signal power in the affected region. Four distinct anomaly patterns are used, each corresponding to real damage scenarios in prestressed concrete poles as shown in Figure 7.
Ramp patterns simulate progressive stiffness degradation caused by corrosion-induced prestress loss, where the structural response drifts linearly over time as the deterioration advances. Step patterns represent sudden changes in stiffness from impact events, such as vehicle collisions or rock strikes, producing an abrupt shift in the vibration signature. Sine wave patterns model periodic oscillations induced by wind-induced vibrations in damaged poles with reduced damping capacity, where the structure becomes more susceptible to resonant behavior. Gaussian pulses emulate transient impulse responses from crack formation or spalling events during concrete fracture, capturing momentary energy release during damage propagation.
Each pattern is normalized to achieve the target power, smoothed at the edges to ensure realistic transitions, and injected at non-overlapping locations within the signal. The final data set contains normal signals and signals with damage-simulating anomalies, with the indices of the injected signals serving as ground truth for subsequent evaluation.
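The injection procedure can be sketched as follows (the SNR-based scaling rule and Hann-window edge smoothing are illustrative choices; the four pattern shapes follow the damage scenarios described above):

```python
import numpy as np

rng = np.random.default_rng(7)

def inject_anomaly(signal, start, length, pattern, snr=2.0):
    # Scale the anomaly so its power is `snr` times the local signal power
    # (an illustrative calibration rule), smooth its edges, and add it in place.
    t = np.linspace(0.0, 1.0, length)
    shapes = {
        "ramp": t,                                      # progressive stiffness degradation
        "step": np.ones(length),                        # sudden stiffness change (impact)
        "sine": np.sin(2 * np.pi * 5 * t),              # wind-induced resonant oscillation
        "pulse": np.exp(-0.5 * ((t - 0.5) / 0.1) ** 2)  # transient crack/spalling response
    }
    a = shapes[pattern] * np.hanning(length)            # smooth transitions at the edges
    local_power = np.mean(signal[start:start + length] ** 2)
    a = a * np.sqrt(snr * local_power / (np.mean(a ** 2) + 1e-12))
    out = signal.copy()
    out[start:start + length] += a
    return out

x = rng.normal(0.0, 1.0, 1000)                 # stand-in for a healthy vibration segment
x_anom = inject_anomaly(x, 400, 100, "step")   # ground-truth anomalous segment
```

The indices of the injected windows then serve as ground truth for computing recall on the test set.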

5.3. Model Training

Following standard practice in unsupervised generative anomaly detection, the model is trained exclusively on normal data, 70% of the dataset, to learn a compact representation of the normal data manifold. Leaky ReLU is used in all hidden layers across the model components to preserve gradient flow for negative activations during training. The generator’s output layer uses tanh to bound the generated signals within the same normalized range as the training data. The critic’s output layer is left without an activation function, producing an unbounded score consistent with the Wasserstein distance objective. Validation data, 10% of the dataset, are not used to monitor or optimize reconstruction or critic losses during training. The validation data are rather reserved for post-training calibration of anomaly scores such as reconstruction or critic-based scores which in turn are used for the adaptive anomaly detection thresholding. This is typically performed by selecting a high percentage threshold, in order to preserve a tight but not overgeneralized representation of normal behavior [23,24].
The main hyper-parameters used during the training and testing of the proposed model are summarized in Table 3.
As shown in Table 3, the critic network is updated twice for each generator update. This helps the critic maintain meaningful gradients, allowing the generator to learn effectively from its feedback. However, the critic should not become too strong; otherwise, the generator would receive vanishing or uninformative gradients, which would hinder its learning.
According to Figure 8, the training reconstruction error begins above 1.5 and decreases steadily until approximately epoch 20, after which it converges to a stable value near 1.0. This smooth decline indicates efficient initial fitting followed by stabilization as the model captures the underlying data distribution. The relatively small magnitude of the reconstruction error converging to approximately 1.0 reflects the combined effect of input data normalization and the generator’s tanh output activation, providing a normalized measure of reconstruction fidelity that establishes a stable baseline for anomaly detection. This normalization, combined with the Bayesian framework’s regularization through KL divergence, ensures that even normal signals are reconstructed with high fidelity, making deviations from this baseline more statistically meaningful for anomaly detection.
The critic score maintains a stable balance near zero throughout training, confirming the Wasserstein GAN equilibrium expected in theory, where neither the generator nor the critic dominates. After epoch 20, both metrics exhibit minimal fluctuation, indicating successful model convergence and effective manifold learning. This stability, together with the normalized error scale, provides a reliable foundation for detecting subtle damage signatures in prestressed catenary poles.

5.4. Latent Space Analysis

t-SNE is a nonlinear dimensionality reduction method that maps high-dimensional points into a low-dimensional space by matching probability-based similarities between pairs of points in both spaces using a Kullback–Leibler divergence objective. It preserves local neighborhoods and mitigates the crowding problem through a Gaussian kernel in the high-dimensional space and a heavy-tailed Student-t distribution in the low-dimensional space. As a result, it offers a qualitative visualization of the learned latent manifold, where compact, well-separated clusters correspond to locally similar signals and highlight how the model organizes normal and anomalous patterns [25].
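The two similarity kernels at the heart of t-SNE can be written down directly. The toy below computes the high-dimensional Gaussian affinities, the low-dimensional Student-t affinities, and the KL objective for a fixed embedding; it uses a fixed bandwidth instead of the usual perplexity calibration and performs no gradient descent:

```python
import numpy as np

def sq_dists(X):
    """Pairwise squared Euclidean distances."""
    s = np.sum(X ** 2, axis=1)
    return s[:, None] + s[None, :] - 2 * X @ X.T

def tsne_kl(X, Y, sigma=1.0):
    """KL(P || Q): Gaussian affinities of high-dim X vs. Student-t
    affinities of low-dim embedding Y (diagonal excluded)."""
    n = len(X)
    mask = ~np.eye(n, dtype=bool)
    P = np.exp(-sq_dists(X) / (2 * sigma ** 2))   # Gaussian kernel (high-dim)
    P[~mask] = 0.0
    P /= P.sum()
    Q = 1.0 / (1.0 + sq_dists(Y))                 # heavy-tailed Student-t (low-dim)
    Q[~mask] = 0.0
    Q /= Q.sum()
    return np.sum(P[mask] * np.log(P[mask] / Q[mask]))

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 16))
kl_match = tsne_kl(X, X[:, :2])   # embedding = first two coordinates
```

t-SNE proper minimizes this KL over the embedding coordinates `Y`; the heavy-tailed Student-t kernel in the low-dimensional space is what mitigates the crowding problem.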
Figure 9 shows the t-SNE visualization of the latent representations of the test set, with points colored by epistemic uncertainty. Normal signals form a tight cluster with consistently low epistemic uncertainty, while injected anomalies occupy a distinct region with higher uncertainty.
Elevated uncertainty over anomalies indicates that the model has not encountered similar patterns during training, lacking confidence to classify them as normal, which is a desirable property for out-of-distribution detection. Conversely, low uncertainty over the normal cluster confirms that the encoder learns a well-defined manifold boundary, enabling reliable anomaly detection through both geometric separation and uncertainty signaling.

5.5. Validation-Data-Based Threshold and Anomaly Detection

Figure 10 evaluates anomaly detection on 322 independent test signals, corresponding to 20% of the dataset. The clear separation between most detected anomalies and the reconstruction threshold indicates that anomalous signals consistently exhibit elevated peak reconstruction error and uncertainty relative to normal samples. Moreover, the dispersion captured by the ±2σ uncertainty band demonstrates that epistemic variability itself increases for anomalous signals, strengthening their detectability. Importantly, detections are not clustered around marginal threshold crossings but instead show pronounced deviations, suggesting that the decision boundary is not overly sensitive to noise and that the model’s confidence degrades meaningfully under distributional shift.
Figure 11 analyzes the statistical separability of the final combined Bayesian score in log space. The histogram reveals a distinct rightward shift of the anomalous distribution relative to the normal distribution, with limited overlap around the adaptive threshold. This indicates that the weighted combination of logarithmically transformed reconstruction error and epistemic uncertainty produces a scoring function with strong class separation. The threshold derived from validation statistics ( μ + k · σ ) appears well-calibrated, capturing the majority of anomalous samples while retaining most normal samples below the decision boundary. Together, distributional separation and the stability of the threshold suggest that the proposed scoring formulation yields robust anomaly detection and benefits from operating in log space, which stabilizes variance and amplifies relative differences between normal and out-of-distribution signals.
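A minimal sketch of the validation-calibrated threshold in log space, assuming per-signal anomaly scores are already available (the `k = 1` value follows Table 3; the small epsilon guarding the log is an implementation assumption):

```python
import numpy as np

def calibrate_threshold(val_scores, k=1.0):
    """tau = mu_val + k * sigma_val, computed on log-transformed scores."""
    log_s = np.log(np.asarray(val_scores) + 1e-12)  # log space stabilizes variance
    return log_s.mean() + k * log_s.std()

def detect(test_scores, tau):
    """Flag a signal as anomalous when its log-score exceeds tau."""
    return np.log(np.asarray(test_scores) + 1e-12) > tau

val = np.array([1.0, 1.1, 0.9, 1.05, 0.95])     # healthy validation scores
tau = calibrate_threshold(val, k=1.0)
flags = detect(np.array([1.0, 5.0, 0.8]), tau)  # only the 5.0 score is flagged
```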
Among 322 test signals (120 anomalous), the Bayesian combined score detects 119 anomalies for 99.2% recall, which is essential in SHM where undetected faults risk catastrophic failure. Accuracy reaches 91.0%, precision 81%, and F1-score 89.2%, confirming robust generalization. The pronounced anomaly peaks ensure reliable detection at scale, with precision moderated by manifold complexity under distributional shift [6].
The combined score uses the weights α = 0.5, β = 1.0, and γ = 0.1 in S = α E_norm + β U_norm − γ C_norm. The weight allocation reflects the relative contribution of each component to anomaly detection. Epistemic uncertainty receives the highest weight (β = 1.0), as it provides the most direct signal for out-of-distribution detection, capturing model confidence degradation under distributional shift. Reconstruction error (α = 0.5) serves as the primary structural signal, measuring deviation from the learned normal manifold. The critic score receives the lowest weight (γ = 0.1) and is subtracted, since a Wasserstein critic assigns higher scores to realistic, normal-looking data; it provides auxiliary refinement rather than a standalone anomaly indicator, because critic outputs reflect relative data fidelity rather than a calibrated probability. Varying these weights confirmed that the chosen proportions consistently yielded stable and discriminative anomaly scores on the validation set, supporting the adopted configuration.
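The weighted combination can be sketched directly from the formula; the per-batch min-max normalization used here is an assumption, not necessarily the authors' normalization:

```python
import numpy as np

def minmax(x):
    """Scale a score vector to [0, 1] (assumed normalization)."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / max(x.max() - x.min(), 1e-12)

def combined_score(recon_err, uncertainty, critic,
                   alpha=0.5, beta=1.0, gamma=0.1):
    """S = alpha*E_norm + beta*U_norm - gamma*C_norm; the critic term is
    subtracted because the Wasserstein critic rates normal-looking data higher."""
    return (alpha * minmax(recon_err)
            + beta * minmax(uncertainty)
            - gamma * minmax(critic))

E = np.array([0.1, 0.9, 0.2])   # reconstruction errors
U = np.array([0.05, 0.8, 0.1])  # epistemic uncertainties
C = np.array([0.9, 0.1, 0.8])   # critic scores (high = normal-looking)
S = combined_score(E, U, C)     # the middle (anomalous) sample scores highest
```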

5.6. Kullback-Leibler Divergence

The training progress, visualized as log₁₀ of the negative Evidence Lower Bound (ELBO) in Figure 12, reveals distinct dynamics: the generator shows smooth and stable convergence, while the critic exhibits characteristic volatility as it adapts. Crucially, because the log-evidence is the sum of the ELBO and the Kullback-Leibler (KL) divergence between the variational approximation and the true posterior, minimizing the negative ELBO, as seen in the downward trajectory of both curves, directly minimizes that KL divergence. This indicates that the model is successfully closing the gap between its approximate and true weight posteriors, resulting in more accurate generative representations.
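The decomposition invoked above can be written out explicitly. For data x, weights θ, prior p(θ), and variational posterior q(θ):

```latex
\log p(x) \;=\;
\underbrace{\mathbb{E}_{q(\theta)}\!\left[\log p(x \mid \theta)\right]
 - \mathrm{KL}\!\left(q(\theta)\,\|\,p(\theta)\right)}_{\text{ELBO}}
\;+\; \mathrm{KL}\!\left(q(\theta)\,\|\,p(\theta \mid x)\right)
```

Since log p(x) does not depend on q, maximizing the ELBO (equivalently, minimizing the negative ELBO plotted in Figure 12) necessarily minimizes the rightmost KL term, i.e., the gap between the variational approximation and the true posterior.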

5.7. Posterior Uncertainty Monte Carlo

Figure 13 visualizes the evolution of the weights in the outer layers of the generator and critic over 100 training epochs, focusing on the posterior mean and standard deviation. In both components, the kernel mean remains exceptionally stable, centered at zero throughout training. However, the associated uncertainty regions (±2σ) contract distinctly, particularly for the generator. Similarly, the kernel standard deviation of both modules follows a steady downward trajectory: the generator’s standard deviation drops from approximately 3.5 to 2.2, while the critic’s decreases from around 4.0 to 3.0.
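The epistemic-uncertainty estimates behind these plots follow the usual Monte Carlo recipe: sample weights from the posterior N_MC times, run a forward pass each time, and take the per-output mean and standard deviation. A toy version, with a random linear map standing in for the Bayesian TCN:

```python
import numpy as np

def mc_uncertainty(x, mu_w, sigma_w, n_mc=80, seed=0):
    """Monte Carlo estimate of predictive mean and epistemic std
    (N_MC = 80 as in Table 3; the linear map is a stand-in network)."""
    rng = np.random.default_rng(seed)
    outs = []
    for _ in range(n_mc):
        w = rng.normal(mu_w, sigma_w, size=x.shape)  # draw weights from posterior
        outs.append(np.sum(w * x))                   # one stochastic forward pass
    outs = np.array(outs)
    return outs.mean(), outs.std()

x = np.ones(8)
m_wide, s_wide = mc_uncertainty(x, mu_w=0.0, sigma_w=1.5)
m_tight, s_tight = mc_uncertainty(x, mu_w=0.0, sigma_w=0.5)
# a tighter weight posterior yields a smaller epistemic std
```

This mirrors the trend in Figure 13: as the posterior standard deviation contracts during training, the spread of Monte Carlo outputs, and hence the reported epistemic uncertainty on normal inputs, shrinks accordingly.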
This simultaneous narrowing of the uncertainty bands and the reduction in standard deviation suggest a highly controlled convergence process. The stability of the mean at zero indicates that the models are not shifting their global bias but are instead concentrating their weight distributions. This statistical behavior is indicative of an effective regularization or Bayesian learning process where the model progressively prunes away stochastic noise. As training progresses, the weights transition from a broad exploratory state to a more specialized and dense configuration, effectively narrowing the search space for optimal parameters.
Ultimately, these trends signify the achievement of statistical maturity within the model. The reduction in parameter volatility across both the Generator and Critic points to a balanced optimization where neither component is undergoing radical, destabilizing adjustments. This tightening of the posterior distributions reflects an increase in the model’s certainty, ensuring that the final output is derived from a refined and robust set of weights.

5.8. Consistency of Results with Theoretical Expectations

Our Bayesian conditional GAN demonstrates strong anomaly detection performance on the data set of prestressed concrete catenary poles. The achieved recall reflects effective learning of healthy structural dynamics from vibration data alone.
The reduction in the posterior weight uncertainty during training aligns with the theoretical predictions for Bayesian neural networks, where the epistemic uncertainty diminishes as the model observes more data while retaining sensitivity to out-of-distribution inputs [26]. The visualization of t-SNE confirms that anomalous samples reside in regions of higher epistemic uncertainty, validating the principle that model confidence serves as a reliable indicator of input novelty in uncertainty-aware deep learning frameworks [27].
The integration of Bayesian temporal convolutional networks within a conditional GAN architecture provides principled uncertainty quantification alongside robust temporal modeling. The conditional framework incorporates wind speed as conditioning input, enabling the model to distinguish environmental variations from genuine structural anomalies. This uncertainty-aware approach improves interpretability and reliability for safety-critical structural health monitoring applications.

Conclusion

This paper presented BcDCGAN, a Bayesian conditional deep convolutional GAN framework for unsupervised anomaly detection in the structural health monitoring of prestressed concrete catenary poles. The architecture successfully learns healthy structural dynamics from multivariate acceleration signals alone, achieving robust anomaly detection through complementary reconstruction, adversarial, and uncertainty signals.
Key contributions include variational Bayesian inference over generator and critic weights for calibrated epistemic uncertainty, temporal convolutional networks for long-range dependency modeling, and an adaptive Bayesian scoring mechanism with data-driven thresholding. The approach demonstrates clear separation of normal and anomalous patterns in latent space alongside appropriate uncertainty signaling, confirming effective learning of the healthy data manifold.
Experimental evaluation of real catenary pole vibration data with injected anomalies shows the potential of the methodology. High recall with interpretable uncertainty estimates supports a reliable deployment in the monitoring of critical rail infrastructure.
Future work will extend validation to natural damage scenarios and investigate multi-sensor fusion across catenary systems. Real-time implementation and transfer-learning to other civil infrastructure will further enhance the framework’s practical impact for structural health monitoring.

Author Contributions

Conceptualisation, Y.L.A.; methodology, Y.L.A.; resources, Y.L.A, C.W., M.S., N.G., L.T., A.W., U.P. and T.L.; software, Y.L.A., M.S., N.G.; validation, Y.L.A., M.S., N.G., L.T.; writing—original draft preparation, Y.L.A.; writing—review and editing, Y.L.A, C.W., M.S., N.G., L.T., A.W., U.P. and T.L.; supervision, A.W., U.P. and T.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by a special fund of the German government at the BMBF [grant number: 16DKWN078A].

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The data presented in this study are available within the article.

Acknowledgments

This work was supported by a special BMBF fund for the Intelligente Methoden zur automatischen und nachvollziehbaren Analyze umfangreicher Infrastruktur-, Verkehrs- und Umweltmessdaten (InMeA) project [grant number: 16DKWN078A]. The BMBF is therefore acknowledged for the funding provided. Tom Lahmer (T.L.) is acknowledged as the principal investigator of the project.

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

  1. Usmani, U.A.; Aziz, I.A.; Jaafar, J.; Watada, J. Deep Learning for Anomaly Detection in Time-Series Data: An Analysis of Techniques, Review of Applications, and Guidelines for Future Research. IEEE Access 2024.
  2. Zamanzadeh Darban, Z.; Webb, G.I.; Pan, S.; Aggarwal, C.; Salehi, M. Deep learning for time series anomaly detection: A survey. ACM Computing Surveys 2024, 57, 1–42.
  3. Blázquez-García, A.; Conde, A.; Mori, U.; Lozano, J.A. A review on outlier/anomaly detection in time series data. ACM Computing Surveys (CSUR) 2021, 54, 1–33.
  4. Geiger, A.; Liu, D.; Alnegheimish, S.; Cuesta-Infante, A.; Veeramachaneni, K. TadGAN: Time series anomaly detection using generative adversarial networks. In Proceedings of the 2020 IEEE International Conference on Big Data; IEEE, 2020; pp. 33–43.
  5. Smuha, N.A. Regulation 2024/1689 of the Eur. Parl. & Council of June 13, 2024 (EU Artificial Intelligence Act). International Legal Materials 2025, 1–148.
  6. Schlegl, T.; Seeböck, P.; Waldstein, S.M.; Langs, G.; Schmidt-Erfurth, U. f-AnoGAN: Fast unsupervised anomaly detection with generative adversarial networks. Medical Image Analysis 2019, 54, 30–44.
  7. Liang, H.; Song, L.; Wang, J.; Guo, L.; Li, X.; Liang, J. Robust unsupervised anomaly detection via multi-time scale DCGANs with forgetting mechanism for industrial multivariate time series. Neurocomputing 2021, 423, 444–462.
  8. Guigou, F.; Collet, P.; Parrend, P. SCHEDA: Lightweight euclidean-like heuristics for anomaly detection in periodic time series. Applied Soft Computing 2019, 82, 105594.
  9. Alkam, F. Vibration-based Monitoring of Concrete Catenary Poles using Bayesian Inference. PhD thesis, Bauhaus-Universität Weimar, Weimar, 2021.
  10. Lee, C.K.; Cheon, Y.J.; Hwang, W.Y. Studies on the GAN-Based Anomaly Detection Methods for the Time Series Data. IEEE Access 2021, 9, 73201–73215.
  11. Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial networks. Communications of the ACM 2020, 63, 139–144.
  12. Bashar, M.A.; Nayak, R. ALGAN: Time Series Anomaly Detection with Adjusted-LSTM GAN. International Journal of Data Science and Analytics 2025, 20, 5719–5737.
  13. Bashar, M.A.; Nayak, R. TAnoGAN: Time Series Anomaly Detection with Generative Adversarial Networks. In Proceedings of the 2020 IEEE Symposium Series on Computational Intelligence (SSCI); IEEE, 2020; pp. 1778–1785.
  14. Tien, T.B.; et al. Time series data recovery in SHM of large-scale bridges: Leveraging GAN and Bi-LSTM networks. Structures 2024, 63.
  15. Zhang, D.; Ma, M.; Xia, L. A comprehensive review on GANs for time-series signals. Neural Computing and Applications 2022, 34, 3551–3571.
  16. Zenati, H.; Foo, C.S.; Lecouat, B.; Manek, G.; Chandrasekhar, V.R. Efficient GAN-based anomaly detection. arXiv 2018, arXiv:1802.06222.
  17. Li, H.; Li, Y. Anomaly detection methods based on GAN: a survey. Applied Intelligence 2023, 53, 8209–8231.
  18. Blei, D.M.; Kucukelbir, A.; McAuliffe, J.D. Variational inference: A review for statisticians. Journal of the American Statistical Association 2017, 112, 859–877.
  19. Nalisnick, E.; Matsukawa, A.; Teh, Y.W.; Gorur, D.; Lakshminarayanan, B. Do deep generative models know what they don’t know? arXiv 2018, arXiv:1810.09136.
  20. Kingma, D.P.; Welling, M. Auto-encoding variational Bayes. arXiv 2013, arXiv:1312.6114.
  21. Murphy, K.P. Probabilistic Machine Learning: An Introduction; MIT Press, 2022.
  22. Lara-Benítez, P.; Carranza-García, M.; Luna-Romera, J.M.; Riquelme, J.C. Temporal convolutional networks applied to energy-related time series forecasting. Applied Sciences 2020, 10, 2322.
  23. Akcay, S.; Atapour-Abarghouei, A.; Breckon, T.P. GANomaly: Semi-supervised anomaly detection via adversarial training. In Proceedings of the Asian Conference on Computer Vision; Springer, 2018; pp. 622–637.
  24. Park, S.; Lee, K.H.; Ko, B.; Kim, N. Unsupervised anomaly detection with generative adversarial networks in mammography. Scientific Reports 2023, 13, 2925.
  25. Van der Maaten, L.; Hinton, G. Visualizing data using t-SNE. Journal of Machine Learning Research 2008, 9.
  26. Blundell, C.; Cornebise, J.; Kavukcuoglu, K.; Wierstra, D. Weight uncertainty in neural networks. In Proceedings of the International Conference on Machine Learning; PMLR, 2015; pp. 1613–1622.
  27. Gal, Y.; Ghahramani, Z. Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In Proceedings of the International Conference on Machine Learning; PMLR, 2016; pp. 1050–1059.
Figure 1. Types of time series anomalies.
Figure 2. Standard GAN architecture.
Figure 3. Flow chart for the proposed anomaly detection architecture.
Figure 4. Schematic representation of a causal TCN architecture.
Figure 5. Bayesian conditional deep convolution GAN architecture.
Figure 6. Prestressed catenary pole.
Figure 7. Signal with injected anomalies.
Figure 8. Training reconstruction error and critic score.
Figure 9. t-Distribution stochastic neighbor embedding (t-SNE).
Figure 10. Anomaly detection with Bayesian combined score (log-space).
Figure 11. Anomaly detection with combined score distribution (log-space).
Figure 12. Negative ELBO minimization for generator and critic.
Figure 13. Mean and standard deviation uncertainty for generator and critic.
Table 2. Evaluation metrics.

Strategy/Metric           | Description
Reconstruction Error      | MSE or similar between input and reconstruction
Discriminator Score       | Confidence near 0.5 indicates uncertainty
Combined Scoring          | Fusion of residual and discriminative signals
Thresholding Approaches   | Fixed, percentile, or μ + k·σ
Recall                    | TP / (TP + FN) on injected anomalies
Precision, F1 and F2      | Secondary when false positives are quantifiable
Table 3. Key hyperparameters used for training the Bayesian conditional DCcGAN.

Category            | Parameter                                 | Description
Training setup      | E = 50                                    | Total number of training epochs.
                    | n_C = 2                                   | Number of critic updates per generator/encoder update.
Optimizers          | Adam (G, E), η = 5 × 10⁻³, β₁ = 0.9       | Optimizer and hyperparameters for generator and encoder.
                    | Adam (C), η = 1 × 10⁻³, β₁ = 0.5          | Optimizer and hyperparameters for critic.
Bayesian prior      | μ_prior = 0.15                            | Mean of the Gaussian prior for Bayesian TCN weights.
                    | σ_prior = 1.5                             | Standard deviation of the Gaussian prior for Bayesian TCN weights.
ELBO regularization | β_ELBO = 0.9                              | Target weight on the ELBO-based regularization term.
Uncertainty         | N_MC = 80                                 | Number of Monte Carlo forward passes per input to estimate epistemic uncertainty.
Anomaly scoring     | (α, β, γ) = (0.5, 1.0, 0.1)               | Weights for (reconstruction, uncertainty, critic) in the combined log-space anomaly score.
Thresholding        | τ = μ_val + k·σ_val, k = 1                | Adaptive validation-based anomaly threshold.