A Hybrid CNN-LSTM Approach for Early Gear Fault Detection in Wind Turbines Using Vibration and SCADA Data

Xiaodong Zhang; Ramcy Saah Stubblefield

doi:10.20944/preprints202604.1040.v1

Submitted: 13 April 2026

Posted: 15 April 2026


Abstract

Detecting gear faults at an early stage is important to keep wind turbines running smoothly and to reduce expensive repairs in renewable energy systems. In this study, we test a hybrid method that combines Convolutional Neural Networks (CNNs) with Long Short-Term Memory (LSTM) models to diagnose gear faults using both vibration sensor data and wind turbine SCADA signals. The vibration data, collected under multiple load conditions, are processed using time-domain normalization, frequency-domain transformations (FFT), and time-frequency spectrograms, while SCADA data are analyzed for operational anomalies via statistical z-score methods. The CNN-LSTM model learns spatial and temporal patterns, showing strong and consistent results across different fault scenarios. Our results suggest that this model performs better than standard methods, offering more precise fault detection and better adaptability. This combined approach can help improve maintenance planning and make turbine monitoring more effective.

Keywords: wind turbine; gear fault detection; CNN-LSTM; vibration analysis; SCADA data

Subject: Engineering - Mechanical Engineering

1. Introduction

With energy needs rising and environmental issues becoming more urgent, wind energy has become an increasingly important part of renewable power (Alsumaidaee et al., 2023). Wind turbines, particularly large-scale onshore and offshore installations, play a vital role in generating clean electricity (Dao et al., 2024). Central to the efficient operation of these turbines is the gearbox, a critical component responsible for converting low-speed wind-driven rotor motion into high-speed rotation for electricity generation. However, gearboxes are known to break down often and usually require the most maintenance, which adds to turbine downtime and costs. Early detection of gearbox faults is essential to avoid catastrophic failures, reduce maintenance expenses, and enhance the reliability and lifespan of turbines. Mechanical issues such as gear wear, tooth breakage, and bearing degradation can manifest subtly and evolve over time. Traditionally, vibration analysis has been a widely accepted method for monitoring the health of rotating machinery. Yet, the complex dynamics of wind turbines, environmental noise, and varying load conditions make fault detection a non-trivial task. With the growing availability of sensor data, we need smarter systems that can spot fault patterns on their own and catch problems before they become serious.

Traditional gearbox monitoring systems in wind turbines mostly depend on rule-based checks and manually designed features taken from time- or frequency-domain signals (Alsumaidaee et al., 2023). These methods work well in lab conditions, but in real turbines where operating conditions change a lot they often fail to stay reliable or flexible (Borré et al., 2023). Standard machine learning methods, while helpful, usually require a lot of manual feature design and struggle to track how faults develop over time. The main difficulty is building a fault detection system that can make sense of complicated, multi-sensor, time-based data without needing too much manual work (Mohammed Alsumaidaee et al., 2023). CNNs are good at spotting spatial patterns, while LSTMs are better at handling time-based sequences. However, using them in isolation may not be sufficient to fully leverage the intricate nature of vibration signals and operational logs from turbines. This is why a combined CNN-LSTM approach makes sense for this type of problem. The scope of this study is defined by the use of two publicly available datasets:

Gearbox Fault Diagnosis Dataset from SpectraQuest, which includes vibration data collected under various load conditions for both healthy and faulty gearboxes.
Wind Turbine SCADA Dataset, which contains operational metrics such as wind speed, power output, theoretical power curves, and wind direction.

The contributions of this research are fourfold:

1. Integration of dual-domain data (vibration + SCADA) for enriched fault detection insights.
2. Application of a hybrid CNN-LSTM architecture to jointly exploit spatial and temporal patterns.
3. Visualization and interpretation of signal characteristics through FFT, spectrograms, and SCADA trend plots.
4. Demonstration of improved fault detection capability compared to conventional analysis.

While effective in isolation, conventional single-source methods often struggle with the complex, non-stationary nature of real-world wind turbine data. Vibration analysis alone can be susceptible to environmental noise and varying operational loads, potentially missing incipient faults. Conversely, SCADA-based anomaly detection may flag operational irregularities without pinpointing the root cause within the mechanical system. This disconnect creates a critical gap in predictive maintenance strategies. A robust solution requires an integrated approach that combines the high-frequency, component-specific detail of vibration analysis with the holistic, system-wide context provided by SCADA data.

To bridge this gap, we propose a novel hybrid deep learning framework. Our model is specifically designed to exploit the complementary strengths of both data modalities. The convolutional layers extract salient spatial features from vibration spectrograms, capturing fault-specific harmonic patterns and impulsive transients. The long short-term memory layers then model the temporal evolution of these features, as well as the sequential patterns in SCADA data, enabling the detection of fault progression under dynamic operating conditions. This dual-path architecture is not merely an application of a standard model but a tailored solution to the inherent challenges of wind turbine gearbox diagnostics.

2. Literature Review

2.1. Gear Fault Diagnosis in Wind Turbines

Wind turbine gearboxes are subjected to variable environmental and operational stresses, making them one of the most failure-prone components in modern wind energy systems (Gebreamlak, 2024). Gearbox failures are often the result of accumulated mechanical wear, poor lubrication, fatigue loading, or manufacturing defects (Mohamad et al., 2021). The most common fault types include pitting, scuffing, tooth breakage, misalignment, and gear surface fatigue (Sahu et al., 2024). These faults lead to abnormal vibration patterns, increased acoustic emissions, and eventually, a decline in turbine efficiency or complete mechanical failure. Because replacing gearboxes and stopping turbines is expensive, finding these faults early is very important. Preventive and predictive maintenance strategies rely heavily on condition monitoring systems (CMS) to identify early fault indicators. Traditional CMS for wind turbines incorporate sensors that track parameters such as vibration, temperature, acoustic signals, and rotational speed. Among these, vibration analysis has gained widespread adoption due to its direct correlation with the mechanical health of gear components. However, diagnosing gear faults in wind turbines presents unique challenges. Unlike laboratory conditions, operational environments introduce noise, fluctuating loads, and varying speeds, which obscure fault signatures (Qi et al., 2024). Therefore, more advanced tools are needed to tell apart normal operating changes from real mechanical problems (Wu et al., 2023; Zhang et al., 2023).

2.2. Time-Series and Frequency-Domain Signal Analysis

Traditional gear fault diagnosis techniques often begin with analyzing raw time-series signals collected from vibration sensors. Basic time-domain measures like RMS, kurtosis, skewness, peak-to-peak value, and crest factor give an early idea of the signal's size and shape (Keil et al., 2023). These measures are easy to calculate and have long been used as the basis for many rule- or threshold-based fault detection methods. To gain further insight into the signal's internal periodic structure, frequency-domain analysis is employed, typically using Fast Fourier Transform (FFT). FFT shows the main frequencies in the signal and can bring out fault indicators like gear mesh frequencies, harmonics, or sidebands from vibration changes (Zhang et al., 2025). For instance, a broken tooth may introduce periodic impacts that manifest as sidebands around gear mesh frequencies in the spectrum. Additionally, time-frequency representations such as spectrograms or wavelet transforms are used when signals are non-stationary or time-varying, which is often the case in wind turbine operation. Spectrograms show how frequencies change over time, which helps in spotting new faults or short-term events (Yi et al., 2023). While these methods work well, they often need expert judgment and may not always apply to different operating conditions. This has led researchers to try more automated, data-driven methods like machine learning and deep learning (Crabbé et al., 2023).
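The time-domain measures listed above can be computed in a few lines. The following is a minimal sketch using only the Python standard library; the function name `time_domain_features` and the synthetic sine-wave input are illustrative assumptions, and kurtosis is reported as the plain fourth standardized moment (not excess kurtosis).

```python
import math

def time_domain_features(x):
    """Return basic time-domain descriptors of a vibration segment."""
    n = len(x)
    mean = sum(x) / n
    rms = math.sqrt(sum(v * v for v in x) / n)
    peak = max(abs(v) for v in x)
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / n)  # population std
    # Third and fourth standardized moments.
    skew = sum((v - mean) ** 3 for v in x) / (n * std ** 3)
    kurt = sum((v - mean) ** 4 for v in x) / (n * std ** 4)
    return {
        "rms": rms,
        "peak_to_peak": max(x) - min(x),
        "crest_factor": peak / rms,
        "skewness": skew,
        "kurtosis": kurt,
    }

# Synthetic example (assumption): one full period of a unit sine, 64 samples.
signal = [math.sin(2 * math.pi * k / 64) for k in range(64)]
features = time_domain_features(signal)
```

For a pure sine the crest factor is close to the square root of 2 and the kurtosis is close to 1.5; impulsive content such as tooth impacts would be expected to push both values upward.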

2.3. Deep Learning for Condition Monitoring

Lately, deep learning has become an effective way to monitor machines and detect faults, since it can learn features straight from raw sensor data (Serin et al., 2020). Unlike conventional machine learning techniques that require manual feature engineering, deep learning models can pick up complex patterns and relationships in multi-sensor time-series data without much manual work (Wang et al., 2020). Convolutional Neural Networks (CNNs) have been widely applied in vibration-based fault detection. CNNs are effective at finding patterns in vibration signals and in frequency-based forms like spectrograms or scalograms (Gangsar et al., 2022). For instance, 1D CNNs can operate directly on vibration time series, while 2D CNNs can analyze spectrograms, effectively capturing fault-related frequency patterns. On the other hand, Long Short-Term Memory (LSTM) networks, a special class of Recurrent Neural Networks (RNNs), are well-suited for modeling sequential dependencies in time-series data. LSTMs are good at tracking how faults change over time, especially when faults develop gradually or under shifting operating conditions (Zhou et al., 2022). CNNs and LSTMs each have their own strengths: CNNs are strong at picking out local patterns, while LSTMs are better at handling long-term time sequences. However, relying on only one of these architectures may limit the model's ability to fully represent the complex, spatiotemporal nature of gear faults under real-world turbine conditions.

2.4. CNN-LSTM Architectures in Industrial Applications

CNN-LSTM models combine the benefits of CNNs and LSTMs, making them a good choice for time-series sensor data (Wahid et al., 2022). In this architecture, the CNN layers first learn discriminative local features from raw or preprocessed input (e.g., spectrograms), which are then fed into LSTM layers to model their temporal progression. Several studies have demonstrated the effectiveness of CNN-LSTM architectures in industrial condition monitoring. Nguyen-Da et al. (2024) applied a CNN-LSTM framework for rotating machinery fault diagnosis and reported improved accuracy and robustness compared to standalone CNN or LSTM models. Similarly, Zegarra et al. (2021) employed a CNN-LSTM model for detecting gear and bearing faults under variable speed conditions, highlighting the architecture's adaptability to non-stationary signals. Other research efforts have extended this approach by integrating attention mechanisms or autoencoders to further enhance interpretability and fault classification performance (Elmaz et al., 2021). The hybrid model's ability to simultaneously capture spatial, frequency, and temporal features has proven particularly useful in domains such as aerospace, manufacturing, and energy systems (Mohamad et al., 2021). Despite these advances, relatively few studies have applied CNN-LSTM models to wind turbine gearboxes using dual-modal data (i.e., combining vibration and SCADA data), which can offer a more holistic view of machine health. This gap is especially significant given the practical availability of SCADA systems in commercial wind turbines.

2.5. Gaps in Existing Research

While deep learning has demonstrated considerable promise in condition monitoring, several limitations remain in the existing literature:

1. Single Data Modality: Most fault detection studies rely on a single data source, typically vibration signals, while ignoring complementary data like SCADA logs, which can provide additional context about turbine performance and operating conditions.
2. Limited Real-Time Application: Many proposed models are developed and tested in offline settings using curated datasets. Their effectiveness in real-time fault detection remains largely unexplored due to computational complexity and latency issues.
3. Interpretability and Black Box Concerns: Deep learning models, especially hybrid architectures, are often criticized for their lack of interpretability, which can hinder adoption in critical applications like wind energy where decision transparency is essential.
4. Underrepresentation of Dynamic Conditions: Few models adequately address variable load, speed, or environmental noise, which are prevalent in wind turbine operations and significantly affect model generalization.
5. Limited Use of Spectrogram-Based Temporal Features: While spectrograms provide powerful time-frequency representations, their dynamic nature is underutilized in LSTM-based models that could otherwise leverage this information for better temporal modeling.

In light of these gaps, this study proposes a novel CNN-LSTM framework that utilizes both vibration and SCADA data for early fault detection in wind turbine gearboxes. By bridging multiple data domains and leveraging the strengths of hybrid neural networks, the proposed approach aims to provide a solution that is more accurate, easier to understand, and practical for large-scale wind turbine maintenance.

3. Methodology

This section outlines the methodology adopted for implementing the hybrid CNN-LSTM model to detect gearbox faults in wind turbines using two complementary datasets. The methodology comprises five components: the rationale for the hybrid architecture, dataset description, data preprocessing, model architecture, and training and evaluation strategies.

3.1. Rationale for Hybrid Model Architecture

The diagnosis of gear faults necessitates a model capable of learning both spatial and temporal representations from raw sensor data. For healthy gears, vibration signals exhibit a predictable pattern dominated by the gear mesh frequency (GMF) and its harmonics, with minimal energy at other frequencies. The temporal sequence of SCADA parameters (e.g., power output vs. wind speed) closely follows the theoretical power curve without significant outliers. For faulty gears, two key signal characteristics emerge:

Spatial Patterns in Vibration: Localized faults (e.g., a broken tooth) generate periodic impulsive shocks. In the frequency domain, this manifests as sidebands around the GMF and its harmonics, caused by amplitude modulation. A CNN is uniquely suited to automatically learn these complex, localized patterns from vibration spectrograms without relying on manual feature extraction.
Temporal Patterns: These impulsive shocks occur at a rate determined by the shaft speed, creating a temporal signature. Furthermore, the fault progression causes a gradual deviation in SCADA trends (e.g., a growing discrepancy between theoretical and actual power). An LSTM is explicitly designed to capture such long-term temporal dependencies and sequential anomalies.
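The sideband structure described above follows from a product-to-sum identity. As a simplified illustration, assume the mesh vibration is a single cosine at the gear mesh frequency f_m whose amplitude is modulated once per shaft revolution at frequency f_s with modulation depth β (the single-tone model and the symbols f_m, f_s, and β are assumptions made here for illustration):

```latex
\bigl[1 + \beta \cos(2\pi f_s t)\bigr] \cos(2\pi f_m t)
  = \cos(2\pi f_m t)
  + \frac{\beta}{2} \cos\bigl(2\pi (f_m + f_s) t\bigr)
  + \frac{\beta}{2} \cos\bigl(2\pi (f_m - f_s) t\bigr)
```

Under this model the spectrum contains the GMF line plus two sidebands at f_m ± f_s, each with half the modulation depth; a real localized fault produces a richer, multi-harmonic version of the same pattern.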

Standalone CNNs or LSTMs are insufficient. A CNN may miss the evolutionary nature of a fault, while an LSTM would struggle to extract the complex spatial-spectral features from raw data. Therefore, a hybrid CNN-LSTM architecture is proposed to simultaneously extract discriminative spatial features and model their temporal dynamics, providing a comprehensive diagnostic capability for both instantaneous fault detection and progression monitoring.

3.2. Dataset Description

This study utilizes two complementary datasets to train and evaluate the hybrid CNN-LSTM model for early detection of gear faults in wind turbines: the Gearbox Fault Diagnosis Dataset and the Wind Turbine SCADA Dataset. These datasets provide both mechanical vibration signals and turbine operational metrics, enabling a holistic understanding of fault behavior and facilitating robust predictive maintenance modeling. The Gearbox Fault Diagnosis Dataset, obtained from Kaggle, contains vibration data collected using the SpectraQuest Gearbox Fault Diagnostics Simulator. The simulator reproduces gearbox behavior under both normal and faulty conditions, across different load scenarios. This setup produces labeled time-series data that can be used for supervised learning. The dataset includes a total of 20 CSV files, with 10 files corresponding to gearboxes in healthy condition and the remaining 10 files representing gearboxes with a broken tooth fault. Each dataset file represents a specific load level between 0% and 90% (in 10% steps), making it possible to study how the gearbox responds as mechanical stress increases. Vibration measurements were recorded using four accelerometers mounted in orthogonal directions on the gearbox housing. The sensors record vibration patterns from different directions, which helps in spotting early signs of mechanical wear. Each file includes thousands of time-domain samples showing the gearbox's real-time activity, which provides a solid base for studying both normal and faulty operating patterns. These vibration signals are used to generate features such as Fast Fourier Transforms (FFT) and spectrograms, which are then fed into the CNN-LSTM architecture for fault classification.

Along with the mechanical dataset, the Wind Turbine SCADA data was used to study overall turbine performance and link it with possible mechanical faults. This dataset was also sourced from Kaggle and includes real-world measurements from a functioning wind turbine in Turkey. The SCADA system recorded measurements every ten minutes, giving both long-term performance trends and short-term variations. Key variables recorded in this dataset include wind speed (measured at the hub height), low-voltage active power (the actual electrical output of the turbine), theoretical power output (based on manufacturer specifications for given wind conditions), and wind direction. The SCADA readings add context, helping to explain how mechanical faults affect turbine efficiency and overall performance. For example, a reduction in actual power output compared to the theoretical power curve, under consistent wind speeds, may suggest internal mechanical issues such as gearbox inefficiencies or component wear. Wind direction data was also included, making it possible to check for yaw misalignment or aerodynamic issues that might change how loads are distributed on the gearbox.

3.3. Data Preprocessing

Before using the hybrid CNN-LSTM model, the raw signals had to be carefully preprocessed. This section describes the transformations applied to both the vibration signals from the Gearbox Fault Diagnosis Dataset and the SCADA data from the wind turbine system. The preprocessing steps were done to clean, standardize, and reshape the raw data so it could be used effectively in the deep learning model. The vibration data, recorded as time-series signals from four orthogonally mounted accelerometers, first underwent normalization to ensure zero-centered input with unit variance. For each signal x(t), the normalized version x_norm(t) was computed using:

$$x_{\mathrm{norm}}(t) = \frac{x(t) - \mu}{\sigma} \tag{1}$$

where $\mu = \frac{1}{N} \sum_{t=1}^{N} x(t)$ is the mean, and $\sigma = \sqrt{\frac{1}{N} \sum_{t=1}^{N} (x(t) - \mu)^{2}}$ is the standard deviation of the time-series signal. This standardization removes bias and scales all features equally, improving convergence during model training. Following normalization, the vibration signals were transformed into the frequency domain using the Fast Fourier Transform (FFT). The FFT converts a discrete time-domain signal into its constituent frequency components, providing insight into periodic vibrations and oscillations. The transformation is defined as:

$$X(f) = \sum_{t=0}^{N-1} x(t) \cdot e^{-\frac{j 2 \pi f t}{N}} \tag{2}$$

where X(f) is the complex frequency response at frequency f, and N is the total number of samples in the signal. The magnitude spectrum |X(f)| was extracted and plotted to identify dominant frequencies and harmonic content. In healthy gearboxes, specific frequency bands exhibit minimal energy, whereas faulty gearboxes may show energy spikes at fault-characteristic frequencies, such as gear mesh frequencies or shaft misalignment frequencies.
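The normalization and frequency-domain steps can be sketched as follows. This is a minimal standard-library illustration under stated assumptions, not the pipeline used in the study: the transform is evaluated directly from its defining sum in O(N²) time rather than with a fast algorithm (in practice a library FFT routine such as `numpy.fft.rfft` would normally be used), and the function names and synthetic test segment are hypothetical.

```python
import cmath
import math

def zscore_normalize(x):
    """Eq. (1): subtract the mean, divide by the population standard deviation."""
    n = len(x)
    mu = sum(x) / n
    sigma = math.sqrt(sum((v - mu) ** 2 for v in x) / n)
    return [(v - mu) / sigma for v in x]

def dft_magnitude(x):
    """Eq. (2): |X(f)| for f = 0..N-1, evaluated directly from the sum."""
    n = len(x)
    return [
        abs(sum(x[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n)))
        for f in range(n)
    ]

# Synthetic segment (assumption): a 5-cycle sine on a constant offset, N = 64.
N = 64
segment = [3.0 + math.sin(2 * math.pi * 5 * t / N) for t in range(N)]

normalized = zscore_normalize(segment)
spectrum = dft_magnitude(normalized)

# For a real-valued signal only the first half of the spectrum is unique.
dominant_bin = max(range(1, N // 2), key=lambda f: spectrum[f])
```

Normalization removes the constant offset, so the zero-frequency bin vanishes and the dominant bin corresponds to the 5-cycle component.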

To capture the evolution of frequency content over time, Short-Time Fourier Transform (STFT) was applied to generate spectrograms. STFT segments the time-series data into overlapping windows and performs FFT on each segment:

$$\mathrm{STFT}\{x(t)\}(m, \omega) = \sum_{n=-\infty}^{\infty} x[n] \, w[n - m] \, e^{-j \omega n} \tag{3}$$

where w[n] is the window function centered at time m, and ω is the angular frequency. The output is a time-frequency matrix where color intensity reflects the magnitude of the spectral components. These spectrograms were formatted as grayscale or RGB images to be used as input for the CNN layers, enabling spatial pattern recognition across time and frequency dimensions.

For the SCADA dataset, preprocessing focused on identifying anomalies and filtering unreliable sensor measurements. First, a rolling z-score method was used for anomaly detection in the power output variable. This method computes the z-score of each value x_t within a moving window of size k:

$$z_{t} = \frac{x_{t} - \mu_{(t-k):t}}{\sigma_{(t-k):t}} \tag{4}$$

where $\mu_{(t-k):t}$ and $\sigma_{(t-k):t}$ represent the mean and standard deviation over the past k time steps. Points satisfying $|z_{t}| > 3$ were flagged as statistical outliers, indicating abnormal fluctuations in power generation that may correspond to operational faults, wind gusts, or control system errors.
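A minimal sketch of the rolling z-score rule is given below. It assumes the window covers the k samples preceding t and excludes x_t itself (the text does not state whether the current sample is included); the function names and the power readings are hypothetical.

```python
import math

def rolling_zscore(values, k):
    """Return z_t for each t >= k using the k samples preceding t (Eq. 4).

    Entries for t < k are None because the window is not yet full.
    """
    scores = [None] * len(values)
    for t in range(k, len(values)):
        window = values[t - k:t]
        mu = sum(window) / k
        sigma = math.sqrt(sum((v - mu) ** 2 for v in window) / k)
        # A flat window has zero spread; skip it instead of dividing by zero.
        scores[t] = (values[t] - mu) / sigma if sigma > 0 else None
    return scores

def flag_outliers(values, k, threshold=3.0):
    """Indices whose |z_t| exceeds the threshold (3 in the text)."""
    scores = rolling_zscore(values, k)
    return [t for t, z in enumerate(scores) if z is not None and abs(z) > threshold]

# Hypothetical 10-minute power readings (kW) with one abrupt drop at index 8.
power_kw = [1500, 1510, 1495, 1505, 1500, 1490, 1508, 1502, 600, 1499, 1503]
outliers = flag_outliers(power_kw, k=6)
```

Excluding the current sample from its own window keeps a large spike from inflating the standard deviation it is compared against. The samples immediately after the spike see a widened window and are less likely to be flagged, which is a known side effect of this rule.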

Additionally, visual inspection techniques were applied, including scatter plots and boxplots of wind speed versus power output. These plots often reveal structural anomalies, such as a power curve that deviates from the theoretical performance envelope. Outliers detected in this step were either removed or annotated, and timestamps were aligned with corresponding vibration signal windows to establish temporal synchronization between the two datasets.

By preprocessing vibration and SCADA data through statistical normalization, frequency transformation, time-frequency representation, and anomaly detection, the resulting data was transformed into structured formats suitable for training the hybrid deep learning model. Importantly, both datasets were resampled and synchronized to simulate realistic monitoring conditions where mechanical vibration anomalies may coincide with shifts in operational performance. This ensures that the CNN-LSTM model receives temporally aligned, multimodal input, enhancing its ability to learn fault patterns and generalize across varying load and environmental conditions.
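The time-frequency representation used in this preprocessing chain can be illustrated with a small short-time Fourier transform. The sketch below uses only the standard library and makes several assumptions not stated in the text: a periodic Hann window, a window length of 32 samples, a hop of 16 samples, and a synthetic two-tone signal. A library routine such as `scipy.signal.stft` would normally replace the explicit loops.

```python
import cmath
import math

def hann(length):
    """Periodic Hann window of the given length."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / length) for n in range(length)]

def stft_magnitude(x, win_len, hop):
    """Magnitude spectrogram: rows are time frames, columns are frequency bins.

    Each frame is windowed and transformed with a direct DFT (a finite-window
    form of Eq. 3); only the non-negative frequency bins are kept.
    """
    window = hann(win_len)
    frames = []
    for start in range(0, len(x) - win_len + 1, hop):
        seg = [x[start + n] * window[n] for n in range(win_len)]
        frames.append([
            abs(sum(seg[n] * cmath.exp(-2j * math.pi * f * n / win_len)
                    for n in range(win_len)))
            for f in range(win_len // 2 + 1)
        ])
    return frames

def tone(bin_index, length, win_len):
    """Sine whose frequency falls exactly on one DFT bin of the window."""
    return [math.sin(2 * math.pi * bin_index * t / win_len) for t in range(length)]

# Synthetic non-stationary signal (assumption): a bin-4 tone, then a bin-10 tone.
WIN, HOP = 32, 16
signal = tone(4, 128, WIN) + tone(10, 128, WIN)

spec = stft_magnitude(signal, WIN, HOP)
peak_bins = [max(range(len(row)), key=row.__getitem__) for row in spec]
```

The peak bin moves from 4 to 10 partway through the frames, which is the kind of frequency shift over time that a single FFT of the whole record would not localize.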

3.4. Model Architecture

The architecture designed for early fault detection in wind turbine gearboxes employs a hybrid deep learning framework that integrates Convolutional Neural Networks (CNNs) with Long Short-Term Memory (LSTM) units. This hybrid structure is well suited to the dual nature of the input data (spatially encoded vibration features derived from spectrograms, and sequential patterns from time-series data), enabling robust feature extraction and temporal pattern recognition. The input to the model consists of either normalized time-series vibration segments or 2D spectrogram images generated from the Short-Time Fourier Transform (STFT). These inputs are first processed through a series of one-dimensional convolutional layers designed to extract local features from temporal signals. The first layer is a 1D convolutional layer with 64 filters and a kernel size of 3. Mathematically, the output feature map $h_i^{(1)}$ for the i-th filter at position t can be described as:

$$h_{i}^{(1)}(t) = \sum_{k=0}^{K-1} w_{i,k} \cdot x(t + k) + b_{i} \tag{5}$$

where $w_{i,k}$ represents the weights of the i-th filter of length K, x(t) is the input signal, and $b_{i}$ is the bias term. This convolution operation effectively captures local time-dependent patterns that are important indicators of mechanical faults, such as periodic impulses or transient spikes.
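Equation (5) can be written out directly for a single-channel input. The sketch below assumes no padding and a stride of 1, and uses two hand-picked kernels rather than the 64 learned filters of the actual layer; all names and values are illustrative.

```python
def conv1d_valid(x, kernels, biases):
    """Eq. (5): h_i(t) = sum_k w[i][k] * x[t + k] + b[i], without padding.

    x       : list of floats (single-channel input signal)
    kernels : list of filters, each a list of K weights
    biases  : one bias per filter
    Returns one feature map per filter, each of length len(x) - K + 1.
    """
    feature_maps = []
    for w, b in zip(kernels, biases):
        K = len(w)
        feature_maps.append([
            sum(w[k] * x[t + k] for k in range(K)) + b
            for t in range(len(x) - K + 1)
        ])
    return feature_maps

# Toy input with a single impulse, and two hand-picked kernels of size 3.
x = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
kernels = [[1.0, 1.0, 1.0],    # moving sum
           [-1.0, 0.0, 1.0]]   # difference-style edge detector
biases = [0.0, 0.5]

maps = conv1d_valid(x, kernels, biases)
```

The first kernel spreads the impulse over three output positions, while the second responds with opposite signs on either side of it, which is how a learned filter can become sensitive to transient spikes.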

Following the convolutional layer, a Rectified Linear Unit (ReLU) activation function is applied element-wise to introduce non-linearity:

$$f(x) = \max(0, x) \tag{6}$$

This activation helps the model learn complex representations by allowing it to selectively activate only relevant neurons. Subsequently, a MaxPooling1D layer is used to downsample the feature maps, reducing dimensionality while preserving the most significant features. Pooling is defined as:

$$h_{i}^{\mathrm{pool}}(t) = \max\bigl(h_{i}(t), h_{i}(t + 1), \ldots, h_{i}(t + p - 1)\bigr) \tag{7}$$

where p is the pooling window size. After pooling, the features are flattened into a single vector, transforming the 2D representation into a 1D array suitable for sequence modeling via LSTM.
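The activation, pooling, and flattening steps can be sketched in plain Python. The sketch assumes non-overlapping pooling windows (stride equal to p) and drops a trailing partial window; the text does not specify either choice, and the feature-map values are hypothetical.

```python
def relu(values):
    """Eq. (6) applied element-wise."""
    return [max(0.0, v) for v in values]

def max_pool1d(values, p):
    """Eq. (7) with non-overlapping windows (stride = p).

    A trailing window shorter than p is dropped.
    """
    return [max(values[t:t + p]) for t in range(0, len(values) - p + 1, p)]

def flatten(feature_maps):
    """Concatenate the per-filter maps into a single vector."""
    return [v for fmap in feature_maps for v in fmap]

# Two hypothetical feature maps, e.g. the outputs of two convolution filters.
feature_maps = [[1.5, 0.5, -0.5, 0.5, -2.0],
                [-1.0, 2.0, 3.0, -4.0, 0.25]]

pooled = [max_pool1d(relu(fmap), p=2) for fmap in feature_maps]
flat = flatten(pooled)
```

With p = 2 each map of length 5 shrinks to length 2, and the flattened vector simply concatenates the pooled maps filter by filter.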

The flattened output is passed to an LSTM layer with 128 hidden units. LSTM units are capable of capturing long-term temporal dependencies by maintaining memory cells that store information over time. Each LSTM unit uses gating mechanisms to control information flow. The LSTM cell's output h_t and cell state c_t at time t are governed by the following equations:

$$f_{t} = \sigma\left(W_{f} \cdot [h_{t-1}, x_{t}] + b_{f}\right) \tag{8}$$

$$i_{t} = \sigma\left(W_{i} \cdot [h_{t-1}, x_{t}] + b_{i}\right) \tag{9}$$

$$\tilde{c}_{t} = \tanh\left(W_{c} \cdot [h_{t-1}, x_{t}] + b_{c}\right) \tag{10}$$

$$c_{t} = f_{t} \cdot c_{t-1} + i_{t} \cdot \tilde{c}_{t} \tag{11}$$

$$o_{t} = \sigma\left(W_{o} \cdot [h_{t-1}, x_{t}] + b_{o}\right) \tag{12}$$

$$h_{t} = o_{t} \cdot \tanh(c_{t}) \tag{13}$$

where σ is the sigmoid activation, tanh is the hyperbolic tangent, and W and b are trainable parameters. The LSTM enables the model to learn time-varying fault signatures, such as progressive degradation or repetitive transient responses. To prevent overfitting, a dropout layer with a dropout rate of 0.2 is applied to randomly deactivate neurons during training. Finally, the processed feature vector is fed into a dense (fully connected) layer with a softmax activation to output binary class probabilities representing either a healthy or faulty gearbox.
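The gate equations (8) to (13) translate almost line for line into code. The sketch below is a single-step, pure-Python illustration with a deliberately tiny configuration (1 input feature, 2 hidden units, all-zero parameters) rather than the 128-unit layer described above; the helper names and the parameter layout are assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, b, v):
    """Return W @ v + b for a list-of-rows matrix W."""
    return [sum(w_ij * v_j for w_ij, v_j in zip(row, v)) + b_i
            for row, b_i in zip(W, b)]

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step following Eqs. (8)-(13).

    params maps each of "f", "i", "c", "o" to a (W, b) pair, where W has shape
    (hidden, hidden + input) and acts on the concatenation [h_{t-1}, x_t].
    """
    concat = h_prev + x_t
    f = [sigmoid(z) for z in affine(*params["f"], concat)]          # forget gate
    i = [sigmoid(z) for z in affine(*params["i"], concat)]          # input gate
    c_tilde = [math.tanh(z) for z in affine(*params["c"], concat)]  # candidate
    o = [sigmoid(z) for z in affine(*params["o"], concat)]          # output gate
    c_t = [f_j * c_j + i_j * g_j
           for f_j, c_j, i_j, g_j in zip(f, c_prev, i, c_tilde)]
    h_t = [o_j * math.tanh(c_j) for o_j, c_j in zip(o, c_t)]
    return h_t, c_t

# Tiny example (assumption): 1 input feature, 2 hidden units, zero parameters.
hidden, n_in = 2, 1
zero_W = [[0.0] * (hidden + n_in) for _ in range(hidden)]
zero_b = [0.0] * hidden
params = {gate: (zero_W, zero_b) for gate in "fico"}

h, c = [0.0, 0.0], [1.0, -1.0]
for x_t in ([0.3], [0.7], [-0.2]):
    h, c = lstm_step(x_t, h, c, params)
```

With all-zero parameters every gate evaluates to 0.5 and the candidate to 0, so the cell state halves at each step. This makes the role of the forget gate in Eq. (11) easy to verify by hand.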

3.5. Model Training and Evaluation

To train the hybrid CNN-LSTM model effectively, the combined dataset, including both vibration and SCADA-derived samples, was divided into three subsets: 70% for training, 15% for validation, and 15% for testing. Stratified sampling was employed to maintain a balanced distribution of healthy and faulty class instances across all subsets, supporting unbiased learning and reliable evaluation. The loss function used to guide model optimization was binary cross-entropy, a standard choice for binary classification problems. It quantifies the difference between the predicted probabilities ŷ_i and the true labels y_i across N samples as follows:

$$L = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_{i} \log(\hat{y}_{i}) + (1 - y_{i}) \log(1 - \hat{y}_{i}) \right] \tag{14}$$
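As a quick illustration of Eq. (14), the loss can be evaluated by hand for a few predictions. The sketch clips probabilities away from 0 and 1 to avoid taking the logarithm of zero; the clipping constant, function name, and example values are assumptions.

```python
import math

def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Eq. (14), with probabilities clipped to [eps, 1 - eps] to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(y_true)

# Hypothetical labels (1 = faulty, 0 = healthy) and predicted fault probabilities.
labels = [1, 0, 1, 0]
confident = [0.9, 0.1, 0.8, 0.2]
uncertain = [0.5, 0.5, 0.5, 0.5]

loss_confident = binary_cross_entropy(labels, confident)
loss_uncertain = binary_cross_entropy(labels, uncertain)
```

A model that outputs 0.5 for every sample scores ln 2 (about 0.693), which is a useful baseline when reading a training-loss curve.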

The Adam optimizer was used for training because its adaptive learning rates help accelerate convergence. It combines the advantages of AdaGrad and RMSProp and updates model parameters based on estimates of the first and second moments of the gradients. The learning rate was set to 0.001, and training ran for 50 epochs with a mini-batch size of 32.
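The moment-based update described above can be sketched for a list of scalar parameters. The decay rates β1 = 0.9 and β2 = 0.999 and the stabilizer ε = 1e-8 are the commonly used defaults and are assumptions here, since the text reports only the learning rate. The toy quadratic objective and the larger learning rate used for it are also illustrative.

```python
import math

def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a list of parameters; returns (theta, m, v).

    t is the 1-based step count used for bias correction.
    """
    new_theta, new_m, new_v = [], [], []
    for p, g, m_j, v_j in zip(theta, grad, m, v):
        m_j = beta1 * m_j + (1 - beta1) * g        # first-moment estimate
        v_j = beta2 * v_j + (1 - beta2) * g * g    # second-moment estimate
        m_hat = m_j / (1 - beta1 ** t)             # bias-corrected moments
        v_hat = v_j / (1 - beta2 ** t)
        new_theta.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(m_j)
        new_v.append(v_j)
    return new_theta, new_m, new_v

# Toy problem: minimise (theta - 3)^2, whose gradient is 2 * (theta - 3).
theta, m, v = [0.0], [0.0], [0.0]
first_step = None
for t in range(1, 2001):
    grad = [2.0 * (theta[0] - 3.0)]
    theta, m, v = adam_step(theta, grad, m, v, t, lr=0.01)
    if t == 1:
        first_step = theta[0]
```

Because of the bias correction, the very first update has a magnitude close to the learning rate regardless of the gradient scale, which is one reason Adam is relatively insensitive to how the loss is scaled.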

Model performance was evaluated using several metrics to assess both classification quality and generalization. Accuracy was defined as:

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{15}$$

where TP is true positives, TN is true negatives, FP is false positives, and FN is false negatives. Precision was used to quantify how many predicted faults were actual faults:

$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{16}$$

Recall, on the other hand, measured how many actual faults were correctly detected:

$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{17}$$

These metrics were supplemented with a confusion matrix to visually compare true and predicted class labels, offering insights into misclassification trends. Additionally, the Area Under the Receiver Operating Characteristic Curve (AUC-ROC) was used to evaluate the model's discriminative capability across various threshold values. A higher AUC indicates better classification performance, especially under class imbalance. This hybrid architecture, coupled with a robust training and evaluation pipeline, leverages the strengths of both spatial and temporal analysis. The integration of vibration-based mechanical data and SCADA-based operational parameters is intended to support high fault detection accuracy while preserving interpretability and generalizability in real-world turbine systems.
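The evaluation metrics above can be reproduced from predicted probabilities with a short script. The sketch below assumes a decision threshold of 0.5 and computes AUC through its rank interpretation (the probability that a randomly chosen faulty sample scores higher than a randomly chosen healthy one); the labels and scores are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) with 1 = faulty as the positive class."""
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
    return tp, tn, fp, fn

def roc_auc(y_true, scores):
    """AUC as the fraction of positive-negative pairs ranked correctly.

    Tied pairs count as half a correct ranking.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical test labels and predicted fault probabilities.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / (tp + tn + fp + fn)   # Eq. (15)
precision = tp / (tp + fp)                   # Eq. (16)
recall = tp / (tp + fn)                      # Eq. (17)
auc = roc_auc(y_true, scores)
```

Accuracy, precision, and recall all depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is why the two kinds of metric are reported together.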

4. Data Visualization and Analysis

This section presents visualizations that support the development and evaluation of the proposed hybrid CNN-LSTM model for early gearbox fault detection in wind turbines. The visual content includes a block diagram of the wind turbine gearbox system, and plots related to the training and validation performance of the deep learning model.

4.1. Block Diagram of Wind Turbine Gearbox System

The overall mechanical and sensor configuration of a typical wind turbine drivetrain is illustrated in Figure 1. The diagram captures the sequential transformation of wind energy into electrical energy and emphasizes the critical role of the gearbox in this conversion pipeline.

The kinetic energy from the wind is first harnessed by the rotor blades, which then drive the main shaft. This shaft transmits low-speed, high-torque rotational motion to the gearbox. The gearbox, acting as a speed multiplier, increases the rotation rate to match the generator's operational requirements. Of particular importance is the placement of vibration sensors directly on the gearbox, enabling continuous monitoring. These sensors feed time-series data into a fault detection system, which is vital for predictive maintenance and reducing unexpected downtime.

4.2. Training Process Visualization

Training a deep learning model requires not only algorithmic precision but also constant monitoring of learning progress to ensure generalization and convergence. This subsection provides graphical insights into the hybrid CNN-LSTM model's performance during the training phase.

4.2.1. Model Training Loss Over Epochs

Figure 2 illustrates the variation of training loss across 50 epochs. The x-axis represents the number of epochs, while the y-axis denotes the training loss, typically calculated using the binary cross-entropy function.

As shown, the model begins with a relatively high loss of approximately 0.8 and gradually converges to a lower loss of around 0.1 by the 50th epoch. This steady decline signifies that the model is effectively learning from the input data, reducing its prediction errors, and adjusting its internal weights accordingly. An erratic or oscillating loss curve could point to unstable optimization or problems in the data; the smooth curve here instead suggests robust convergence and stable training dynamics.

4.2.2. Validation Accuracy over Epochs

Validation accuracy is a critical metric for understanding how well the trained model generalizes to unseen data. Figure 3 presents the validation accuracy trend over 50 training epochs.

The plot begins with an initial accuracy of around 55%, reflecting the model's early, unoptimized state. Over successive epochs, accuracy improves consistently, plateauing near 95% by the end of training. This upward trend indicates that the model is not merely memorizing training data but is learning features that let it classify new, unseen samples accurately. The fact that validation accuracy rises while training loss falls supports the view that the model is not overfitting and that learning is balanced.

4.3. Vibration Signal Analysis

Vibration analysis plays a pivotal role in condition monitoring of gearboxes. This section presents an in-depth exploration of the vibration data collected from the healthy and faulty gearbox conditions. It covers raw time-domain signals, frequency-domain insights using Fast Fourier Transform (FFT), and time-frequency representations via spectrograms.

4.3.1. Healthy vs Faulty Gearbox Time-Series Signals

Figure 4 shows time-series plots of vibration signals recorded from four sensors mounted on both healthy and faulty gearboxes. The upper subplot represents the healthy gearbox, whereas the lower subplot displays the signals from the faulty gearbox.

As seen, the healthy gearbox exhibits relatively stable and smooth signal amplitudes across all sensors, indicative of steady-state mechanical operation with minimal disruption. In contrast, the faulty gearbox shows irregular spikes, larger amplitude fluctuations, and increased signal energy, which are consistent with mechanical defects such as a broken tooth or a misaligned gear.

A quantitative summary of the vibration statistics is presented in Table 1, highlighting the mean, maximum amplitude, and standard deviation of each sensor under both healthy and faulty conditions.

The table confirms the increased vibrational activity in the faulty condition, especially in standard deviation and peak amplitude, which supports the visual findings.
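The kind of per-sensor summary reported in Table 1 can be sketched as follows. This is an illustrative example on synthetic signals, not the study's recordings; it also assumes that the mean and maximum are taken over absolute amplitude, which the text does not state, and the function name is hypothetical.

```python
import numpy as np

def amplitude_summary(signal):
    """Return mean absolute amplitude, peak absolute amplitude, and standard deviation."""
    x = np.asarray(signal, dtype=float)
    return {
        "mean": float(np.mean(np.abs(x))),
        "max": float(np.max(np.abs(x))),
        "std": float(np.std(x)),
    }

# Synthetic stand-ins for one sensor channel (assumed values, not measured data).
rng = np.random.default_rng(0)
healthy = 0.14 * rng.standard_normal(10_000)
faulty = 0.31 * rng.standard_normal(10_000)
faulty[::500] += 1.0  # sparse impulses mimicking tooth impacts

healthy_stats = amplitude_summary(healthy)
faulty_stats = amplitude_summary(faulty)
print("healthy:", healthy_stats)
print("faulty: ", faulty_stats)
```

On such data the faulty channel shows a larger standard deviation and peak amplitude than the healthy one, the same qualitative pattern as in Table 1.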

4.3.2. Frequency Domain Analysis Using FFT

To analyze the underlying periodic components in the vibration data, the Fast Fourier Transform (FFT) was applied to Sensor 1 from both healthy and faulty gearboxes. Figure 5 and Figure 6 depict the frequency domain spectra for the respective cases.

In the healthy gearbox FFT (Figure 5), the spectrum is relatively clean, with one or two dominant peaks at the gear mesh and shaft rotational frequencies. The faulty gearbox FFT (Figure 6), by contrast, reveals additional frequency components, harmonics, and a broader spectral spread. These abnormalities indicate vibration energy appearing at frequencies away from the normal shaft and gear mesh lines, suggesting mechanical disturbances.

The major frequencies and, crucially, the presence of fault-indicative sidebands for the faulty condition are summarized in Table 2, providing a diagnostic fingerprint for the gear fault. The presence of additional frequencies in the faulty condition is symptomatic of anomalies such as broken teeth or internal misalignment, validating the need for spectral monitoring.
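A minimal sketch of how dominant spectral lines can be read off a one-sided FFT is given here. The test signals are synthetic: they reuse the frequencies listed in Table 2 (28, 56, 84, and 112 Hz), but the amplitudes, the 1000 Hz sampling rate, and the function name are assumptions made for illustration.

```python
import numpy as np

def dominant_frequencies(signal, fs, n_peaks):
    """Return the n_peaks strongest frequencies (Hz), sorted ascending, from a one-sided FFT."""
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()  # remove the DC offset
    spectrum = np.abs(np.fft.rfft(x)) / len(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    strongest = np.argsort(spectrum)[-n_peaks:]
    return sorted(float(round(f, 6)) for f in freqs[strongest])

fs = 1000                 # assumed sampling rate in Hz
t = np.arange(fs) / fs    # one second of samples, so the bin spacing is 1 Hz

# Healthy: shaft frequency (28 Hz) and gear mesh frequency (56 Hz) only.
healthy = 0.3 * np.sin(2 * np.pi * 28 * t) + 1.0 * np.sin(2 * np.pi * 56 * t)
# Faulty: extra lines at 84 Hz and 112 Hz, as listed in Table 2.
faulty = healthy + 0.4 * np.sin(2 * np.pi * 84 * t) + 0.25 * np.sin(2 * np.pi * 112 * t)

healthy_peaks = dominant_frequencies(healthy, fs, n_peaks=2)
faulty_peaks = dominant_frequencies(faulty, fs, n_peaks=4)
print(healthy_peaks, faulty_peaks)
```

Picking the largest bins works here because every tone falls exactly on an FFT bin. Measured signals would generally need windowing and a proper peak-picking step to avoid counting leakage around one peak as several peaks.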

4.3.3. Time-Frequency Analysis Using Spectrogram

To understand how vibration frequency content evolves over time, spectrograms were generated using Short-Time Fourier Transform (STFT). Figure 7 compares the spectrograms of Sensor 1 signals for healthy and faulty gearboxes.

In the healthy spectrogram, the power spectral density is concentrated along horizontal bands, indicating steady-state behavior with minimal frequency fluctuation. Conversely, the faulty spectrogram reveals intermittent bursts of energy, time-varying frequency modulations, and transient spectral activity. These patterns are classic markers of dynamic faults, such as progressing cracks or gear impacts, which fluctuate with load or speed variations.
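The contrast between steady horizontal bands and transient bursts can be reproduced with a small STFT sketch. The implementation uses only NumPy with a Hann window; the frame length, hop size, sampling rate, and the synthetic burst are all assumptions for illustration and do not reflect the settings used for Figure 7.

```python
import numpy as np

def stft_magnitude(signal, fs, frame_len=256, hop=128):
    """Magnitude STFT with a Hann window; returns (times, freqs, magnitudes[frame, bin])."""
    x = np.asarray(signal, dtype=float)
    window = np.hanning(frame_len)
    starts = np.arange(0, len(x) - frame_len + 1, hop)
    frames = np.stack([x[s:s + frame_len] * window for s in starts])
    magnitudes = np.abs(np.fft.rfft(frames, axis=1))
    times = (starts + frame_len / 2) / fs  # frame centres in seconds
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / fs)
    return times, freqs, magnitudes

# Synthetic example: a steady 56 Hz tone, plus a short impact-like burst at 1.0 s.
fs = 1024
t = np.arange(2 * fs) / fs
steady = np.sin(2 * np.pi * 56 * t)
burst = np.where((t >= 1.0) & (t < 1.05), 2.0 * np.sin(2 * np.pi * 300 * t), 0.0)

times, freqs, healthy_spec = stft_magnitude(steady, fs)
_, _, faulty_spec = stft_magnitude(steady + burst, fs)

# Energy per frame above 200 Hz: negligible for the steady tone, spiking at the burst.
high_band = freqs > 200
healthy_energy = (healthy_spec[:, high_band] ** 2).sum(axis=1)
faulty_energy = (faulty_spec[:, high_band] ** 2).sum(axis=1)
burst_time = float(times[np.argmax(faulty_energy)])
print(f"burst located near t = {burst_time:.2f} s")
```

In the steady signal every frame peaks at the same frequency, which corresponds to a horizontal band. In the second signal the high-frequency energy is confined to the frames overlapping the burst, which is the kind of intermittent activity described for the faulty spectrogram.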

4.4. SCADA Data Analysis

SCADA (Supervisory Control and Data Acquisition) data provides continuous monitoring of wind turbine operations. This section evaluates the power generation behavior under varying wind speeds and examines anomalies using statistical techniques. The SCADA analysis complements vibration-based diagnostics by highlighting operational inefficiencies and potential turbine faults.

4.4.1. Wind Speed vs Power Output Curve

Figure 8 presents a scatter plot that maps wind speed (in m/s) on the x-axis against the actual power output (in kW) on the y-axis. The data points form a typical non-linear relationship known as the power curve of a wind turbine.

The curve rises steeply between wind speeds of 3–12 m/s, representing the turbine's active power generation phase. Beyond this range, the power output plateaus, reflecting the rated capacity of the turbine. Significant deviations below this curve may indicate operational inefficiencies or mechanical issues such as blade wear or generator underperformance. Figure 8(b) overlays the actual power curve on the theoretical curve. The actual curve lies below the theoretical one for most wind speeds, indicating underperformance across much of the operating range. This visual assessment helps identify consistent operational losses and justify preventive maintenance or recalibration.

4.4.2. Actual vs Theoretical Power Comparison

To assess the efficiency of the turbine, a comparison was made between the actual measured power and the theoretical power curve supplied by the turbine manufacturer. Table 3 summarizes the average power deviation at selected wind speed intervals.

As shown, the relative power deviation is most pronounced at lower wind speeds (-16.7% at 4–6 m/s), which may indicate control lag or mechanical drag. As wind speed increases, the deviation shrinks to about -6%, and the turbine operates closer to its theoretical efficiency.
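The percentages in Table 3 are consistent with a signed relative deviation, (actual − theoretical) / theoretical. The short sketch below recomputes them from the table's power values; the function and variable names are chosen here for illustration.

```python
def deviation_percent(actual_kw, theoretical_kw):
    """Signed deviation of actual from theoretical power, as a percentage of theoretical."""
    return 100.0 * (actual_kw - theoretical_kw) / theoretical_kw

# (wind speed range in m/s, theoretical kW, actual kW, reported deviation %) from Table 3.
table_3 = [
    ((4, 6), 120, 100, -16.7),
    ((6, 8), 300, 260, -13.3),
    ((8, 10), 550, 490, -10.9),
    ((10, 12), 800, 750, -6.3),
    ((12, 14), 1000, 940, -6.0),
]

recomputed = {
    speed_range: deviation_percent(actual, theoretical)
    for speed_range, theoretical, actual, _ in table_3
}
for speed_range, value in recomputed.items():
    print(f"{speed_range[0]}-{speed_range[1]} m/s: {value:.2f}%")
```

The absolute shortfall grows with wind speed up to 8–10 m/s (20, 40, and 60 kW), while the relative shortfall shrinks, which is why the low-speed bins stand out in percentage terms.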

4.4.3. Anomaly Detection in Power Output Using Z-Score

To enhance real-time fault detection, a statistical anomaly detection technique was applied to the power output data using a rolling Z-score. This method flags sudden deviations in power generation outside normal operating behavior.

Figure 9 highlights these anomalies as red dots overlaid on the power output timeline. These points represent outliers where the Z-score magnitude exceeded the threshold of 3 standard deviations.
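A rolling Z-score detector with the threshold of 3 can be sketched as follows. The text does not give the window length or say whether the current sample is included in the window, so this example assumes a trailing window of 50 samples that excludes the current point. The power series is synthetic.

```python
import numpy as np

def rolling_zscore_anomalies(values, window, threshold=3.0):
    """Flag points whose Z-score relative to the preceding `window` samples exceeds the threshold in magnitude."""
    x = np.asarray(values, dtype=float)
    z = np.zeros(len(x))
    for i in range(window, len(x)):
        history = x[i - window:i]  # trailing window, current point excluded
        std = history.std()
        if std > 0:
            z[i] = (x[i] - history.mean()) / std
    return z, np.abs(z) > threshold

# Synthetic power output (kW): steady operation with one sudden drop at index 200.
rng = np.random.default_rng(1)
power = 500.0 + 10.0 * rng.standard_normal(300)
power[200] -= 200.0

z, flags = rolling_zscore_anomalies(power, window=50)
print("flagged indices:", np.flatnonzero(flags))
```

Excluding the current sample from its own window keeps a large outlier from inflating the standard deviation it is compared against. A centred or inclusive window is an equally plausible reading of the method and would flag slightly different points.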

Table 4 lists a selection of the detected anomalies, along with the timestamp and corresponding wind speed. These events may correlate with sudden mechanical load changes, sensor faults, or control system errors.

Such anomalies, if recurrent, may signal deteriorating components or unexpected environmental factors and should trigger a maintenance review. This multi-layered SCADA analysis, combining performance curve evaluation with anomaly detection, provides essential insights for optimizing turbine health and output efficiency.

5. Results and Discussion

This section presents the results obtained from the hybrid CNN-LSTM model applied to both vibration and SCADA datasets. The outcomes are evaluated in terms of classification performance, interpretability, and real-world diagnostic relevance. Comparative insights with prior research are also discussed.

5.1. Performance of the Hybrid CNN-LSTM Model

The performance of the proposed hybrid CNN-LSTM architecture was assessed using standard classification metrics: accuracy, precision, recall, and F1-score. These metrics quantify the model's ability to distinguish between healthy and faulty gearbox states using time-series spectrograms and SCADA-derived anomalies.

As shown in Table 5, the model achieved 96.8% accuracy on vibration data and 93.2% on SCADA anomaly classification. These results support the hybrid approach's ability to learn spatial-frequency and temporal patterns together.

5.2. Interpretability Through Feature Maps

Deep learning models are often criticized for their lack of interpretability. However, visualizing intermediate feature maps from the convolutional layers can yield useful insight. The activation maps show that early layers capture local signal oscillations, while deeper layers consolidate fault-related patterns such as periodic shocks and harmonics. This layered abstraction allows the model to distinguish normal operational vibrations from faulty signals, which increases trust in its decisions. These visualizations also help domain experts validate the diagnostic logic of the system.

5.3. Gear Fault Detection Insights from Vibration Data

As detailed in Figures 4 through 7, the hybrid model demonstrated the ability to differentiate vibration signal characteristics between healthy and faulty gearboxes. Figure 4 illustrates the time-domain vibration signals, where faulty gearboxes exhibit irregular, high-amplitude impulses compared to the smoother traces in healthy cases. Further, the FFT analysis in Figure 5 and Figure 6 highlights elevated harmonic peaks in faulty gearboxes, indicating gear meshing or misalignment issues. The spectrograms in Figure 7 show how these fault signatures evolve over time, especially under variable load conditions, offering dynamic fault-tracking capabilities. These results support the model's strength in capturing both transient and steady-state characteristics of mechanical faults, a task where conventional models often struggle.

5.4. SCADA-Based Operational Fault Detection

In addition to mechanical diagnostics, the proposed framework effectively monitored wind turbine operational health using SCADA data. Figure 8 demonstrated how the power output aligns with expected wind speed thresholds, revealing performance efficiency. The deviations between actual and theoretical power outputs, shown in Figure 8 and Table 3, uncovered instances of underperformance. Furthermore, the Z-score-based anomaly detection in Figure 9 and Table 4 identified multiple timestamps with significant power output deviation, often signaling underlying faults in the drivetrain or control systems. This dual-diagnostic capability strengthens predictive maintenance, enabling both mechanical and operational anomaly detection.

5.5. Comparison with Existing Methods

To assess the efficacy of the hybrid CNN-LSTM model, we compare it with previous studies that focused solely on CNN, LSTM, or classical machine learning models for gearbox or turbine diagnostics. Table 6 summarizes the performance benchmarks.

The hybrid model outperforms standalone architectures by combining spatial feature extraction (via CNN) with sequential pattern modeling (via LSTM). Its ability to incorporate multimodal data (vibration and SCADA) offers a significant advantage in real-world wind turbine health monitoring. These comparisons underline the novelty and robustness of the proposed approach, contributing to the advancement of intelligent fault detection systems for renewable energy infrastructure.

5.6. Confusion Matrix Analysis

To provide a thorough evaluation of the proposed model, we present confusion matrices comparing our CNN-LSTM approach with baseline methods. Figure 10 shows the confusion matrices for four different models: the proposed CNN-LSTM, standalone CNN, standalone LSTM, and SVM with FFT features.

The confusion matrices reveal several important insights:

CNN-LSTM (Proposed): Achieves 95 true positives (faulty correctly identified) and 95 true negatives (healthy correctly identified), with only 5 false positives and 5 false negatives. This represents a balanced performance across both classes.
CNN Only: Shows 85 true positives and 88 true negatives, with 12 false positives and 15 false negatives. The higher false negative rate (15%) indicates difficulty in detecting some fault patterns.
LSTM Only: Demonstrates 88 true positives and 90 true negatives, performing better than CNN alone but still inferior to the hybrid approach.
SVM with FFT: Exhibits the poorest performance with 80 true positives and 82 true negatives, confirming the superiority of deep learning approaches for this task.
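The counts above translate directly into the standard metrics. The sketch below treats "faulty" as the positive class and uses only the two models for which all four counts are stated; the function name is chosen here for illustration.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, and F1 (as fractions) with 'faulty' as the positive class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }

# Counts as reported in the text for Figure 10.
cnn_lstm = classification_metrics(tp=95, tn=95, fp=5, fn=5)
cnn_only = classification_metrics(tp=85, tn=88, fp=12, fn=15)

for name, metrics in [("CNN-LSTM", cnn_lstm), ("CNN Only", cnn_only)]:
    print(name, {key: f"{100 * value:.1f}%" for key, value in metrics.items()})
```

With these counts the CNN-LSTM model scores 95.0% on all four metrics, and the CNN-only model scores 86.5% accuracy, 87.6% precision, 85.0% recall, and 86.3% F1, which agrees with the corresponding rows of Table 7.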

5.7. Training Dynamics

Figure 11 illustrates the training and validation loss and accuracy over 50 epochs.

The training dynamics demonstrate:

Rapid convergence within the first 20 epochs
Final training loss of 0.08 and validation loss of 0.10
Final training accuracy of 97.2% and validation accuracy of 96.8%
Minimal gap between training and validation curves, indicating no overfitting

5.8. Comparative Performance Analysis

Table 7 provides a comprehensive comparison of all evaluated methods, including additional metrics.

Figure 12 presents the ROC curves for all methods, further confirming the superior discriminative capability of the proposed approach.
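As a reminder of what the AUC values summarize, the following sketch computes AUC as the probability that a randomly chosen faulty sample receives a higher score than a randomly chosen healthy one, which equals the area under the ROC curve. The labels and scores are hypothetical and unrelated to the models evaluated here.

```python
import numpy as np

def roc_auc(labels, scores):
    """AUC as the fraction of (faulty, healthy) pairs ranked correctly; ties count as half."""
    labels = np.asarray(labels)
    scores = np.asarray(scores, dtype=float)
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((greater + 0.5 * ties) / (len(pos) * len(neg)))

# Hypothetical fault scores (1 = faulty, 0 = healthy), for illustration only.
example_labels = [1, 1, 1, 0, 0, 0]
example_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
example_auc = roc_auc(example_labels, example_scores)
print(f"AUC = {example_auc:.3f}")
```

Eight of the nine faulty/healthy pairs are ranked correctly in this toy example, giving an AUC of 8/9. The pairwise form is quadratic in the number of samples, so a rank-based or library implementation would be preferable for large test sets.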

5.9. Model Interpretability

To address the "black box" concern, Figure 13 visualizes the feature maps extracted by the CNN layers for both healthy and faulty conditions.

The feature map visualization reveals:

Healthy condition: Feature maps show relatively uniform, low-magnitude activations
Faulty condition: Distinct high-activation regions appear, particularly in deeper layers
Progressive abstraction: Early layers capture basic frequency patterns, while deeper layers learn fault-specific signatures

5.10. Vibration Signal Analysis Results

Figure 14 presents the time-domain vibration signals for different gearbox conditions.

Figure 14. Time-domain vibration signals for four different gearbox conditions. Faulty conditions exhibit distinct impulsive patterns and increased amplitude.

Figure 15. Frequency domain analysis showing FFT spectra and spectrograms for healthy and faulty gearboxes. The faulty gearbox shows additional frequency components and sidebands.

The frequency analysis reveals:

Healthy gearbox: Clean spectrum with dominant gear mesh frequency (56 Hz)
Faulty gearbox: Presence of sidebands around gear mesh frequency (56 ± 28 Hz)
Faulty gearbox: Additional harmonics at 84 Hz and 112 Hz
Spectrograms show time-varying frequency content in faulty cases
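The frequencies listed above follow from simple arithmetic on the gear mesh frequency (56 Hz) and the shaft frequency (28 Hz) given in Table 2. A short sketch, with function names chosen here for illustration:

```python
def sideband_frequencies(gmf_hz, shaft_hz, max_order=1):
    """Sideband pairs (lower, upper) at gmf ± k * shaft for k = 1..max_order."""
    return [
        (gmf_hz - k * shaft_hz, gmf_hz + k * shaft_hz)
        for k in range(1, max_order + 1)
    ]

def gmf_harmonics(gmf_hz, count):
    """The first `count` integer multiples of the gear mesh frequency."""
    return [n * gmf_hz for n in range(1, count + 1)]

first_order = sideband_frequencies(56, 28)  # [(28, 84)]
harmonics = gmf_harmonics(56, 2)            # [56, 112]
print(first_order, harmonics)
```

With these values the lower first-order sideband (28 Hz) coincides with the shaft frequency itself, so the 84 Hz upper sideband and the 112 Hz second harmonic are the lines that appear only in the faulty spectrum.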

6. Conclusions and Future Work

In this study, a hybrid deep learning model combining Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM) architectures was developed and evaluated for the early detection of gear faults in wind turbines. By integrating time-domain, frequency-domain, and time-frequency representations of gearbox vibration signals along with SCADA-derived operational data, the proposed model demonstrated robust performance across multiple fault indicators. The experimental results confirmed that the model not only achieves high classification accuracy, precision, and recall, but also offers enhanced interpretability through visual inspection of feature maps and spectrograms. The model successfully learned both spatial and sequential features of fault patterns, outperforming traditional machine learning and standalone deep learning approaches. Additionally, the inclusion of SCADA-based anomaly detection provided a more holistic view of turbine performance, enabling the identification of operational inefficiencies and mechanical degradations. This dual-modality approach holds significant promise for real-time monitoring and predictive maintenance applications in the renewable energy sector. The effective fusion of mechanical sensor data with operational parameters increases diagnostic accuracy and provides valuable insights for engineers and maintenance personnel.

Future work can expand on this foundation in several directions. Firstly, incorporating more diverse fault types (e.g., bearing faults, lubrication failures) and operating conditions can enhance model generalizability. Secondly, deploying the model on edge computing platforms for real-time, on-site diagnostics could substantially improve turbine autonomy. Finally, the integration of explainable AI (XAI) frameworks could further enhance model transparency, allowing stakeholders to better understand and trust the fault predictions. Longitudinal studies involving multiple turbines over extended periods will also help in validating the long-term applicability of this hybrid diagnostic system.

Acknowledgments

The authors gratefully acknowledge the existence of the Journal of Irreproducible Results and the support of the Society for the Preservation of Inane Research.

References

Alsumaidaee, Y. A. M.; Paw, J. K. S.; Yaw, C. T.; Tiong, S. K.; Chen, C. P.; Yusaf, T.; et al. Fault detection for medium voltage switchgear using a deep learning hybrid 1D-CNN-LSTM model. IEEE Access 2023, 11, 97574–97589.
Alsumaidaee, Y. A. M.; Yaw, C. T.; Koh, S. P.; Tiong, S. K.; Chen, C. P.; Yusaf, T.; et al. Detection of corona faults in switchgear by using 1D-CNN, LSTM, and 1D-CNN-LSTM methods. Sensors 2023, 23(6), 3108.
Borré, A.; Seman, L. O.; Camponogara, E.; Stefenon, S. F.; Mariani, V. C.; Coelho, L. D. S. Machine fault detection using a hybrid CNN-LSTM attention-based model. Sensors 2023, 23(9), 4512.
Crabbé, J.; Huynh, N.; Stanczuk, J.; Van Der Schaar, M. Time series diffusion in the frequency domain. arXiv 2024, arXiv:2402.05933.
Dao, F.; Zeng, Y.; Zou, Y.; Qian, J. Wear fault diagnosis in hydro-turbine via the incorporation of the IWSO algorithm optimized CNN-LSTM neural network. Scientific Reports 2024, 14(1), 25278.
Elmaz, F.; Eyckerman, R.; Casteels, W.; Latré, S.; Hellinckx, P. CNN-LSTM architecture for predictive indoor temperature modeling. Building and Environment 2021, 206, 108327.
Gangsar, P.; Bajpei, A. R.; Porwal, R. A review on deep learning based condition monitoring and fault diagnosis of rotating machinery. Noise & Vibration Worldwide 2022, 53(11), 550–578.
Gebreamlak, B. Hybrid CNN-Integrated LSTM for Fault Detection and Diagnosis of Wind Turbines. Master's thesis, Itä-Suomen yliopisto, 2024.
Keil, A.; Bernat, E. M.; Cohen, M. X.; Ding, M.; Fabiani, M.; Gratton, G.; et al. Recommendations and publication guidelines for studies using frequency domain and time-frequency domain analyses of neural time series. Psychophysiology 2022, 59(5), e14052.
Mohamad, T. H.; Abbasi, A.; Kim, E.; Nataraj, C. Application of deep CNN-LSTM network to gear fault diagnostics. 2021 IEEE International Conference on Prognostics and Health Management (ICPHM); 2021; pp. 1–6.
Nguyen-Da, T.; Nguyen-Thanh, P.; Cho, M. Y. Real-time AIoT anomaly detection for industrial diesel generator based an efficient deep learning CNN-LSTM in industry 4.0. Internet of Things 2024, 27, 101280.
Qi, L.; Zhang, Q.; Xie, Y.; Zhang, J.; Ke, J. Research on wind turbine fault detection based on CNN-LSTM. Energies 2024, 17(17), 4497.
Sahu, D.; Dewangan, R. K.; Matharu, S. P. S. Hybrid CNN-LSTM model for fault diagnosis of rolling element bearings with operational defects. International Journal on Interactive Design and Manufacturing (IJIDeM) 2024, 1–12.
Serin, G.; Sener, B.; Ozbayoglu, A. M.; Unver, H. O. Review of tool condition monitoring in machining and opportunities for deep learning. The International Journal of Advanced Manufacturing Technology 2020, 109(3), 953–974.
Wahid, A.; Breslin, J. G.; Intizar, M. A. Prediction of machine failure in industry 4.0: a hybrid CNN-LSTM framework. Applied Sciences 2022, 12(9), 4221.
Wang, W.; Taylor, J.; Rees, R. J. Recent advancement of deep learning applications to machine condition monitoring part 1: a critical review. Acoustics Australia 2021, 49(2), 207–219.
Wu, Y.; Ma, X. A hybrid LSTM-KLD approach to condition monitoring of operational wind turbines. Renewable Energy 2022, 181, 554–566.
Yi, K.; Zhang, Q.; Fan, W.; Wang, S.; Wang, P.; He, H.; et al. Frequency-domain MLPs are more effective learners in time series forecasting. Advances in Neural Information Processing Systems 2023, 36, 76656–76679.
Zegarra, F. C.; Vargas-Machuca, J.; Coronado, A. M. Comparison of CNN and CNN-LSTM architectures for tool wear estimation. 2021 IEEE Engineering International Research Conference (EIRCON); 2021; pp. 1–4.
Zhang, F.; Zhu, Y.; Zhang, C.; Yu, P.; Li, Q. Abnormality detection method for wind turbine bearings based on CNN-LSTM. Energies 2023, 16(7), 3291.
Zhang, Q.; Yang, P.; Wen, H.; Li, X.; Wang, H.; Sun, F.; et al. Beyond the time domain: Recent advances on frequency transforms in time series analysis. arXiv 2025, arXiv:2504.07099.
Zhou, Y.; Zhi, G.; Chen, W.; Qian, Q.; He, D.; Sun, B.; Sun, W. A new tool wear condition monitoring method based on deep learning under small samples. Measurement 2022, 189, 110622.

Figure 1. Block diagram of a wind turbine gearbox system highlighting key components such as rotor blades, main shaft, gearbox, generator, and vibration sensors used for fault detection.

Figure 2. Training Loss Over Epochs showing gradual convergence of the hybrid CNN-LSTM model.

Figure 3. Validation Accuracy vs Epochs illustrating the model's generalization capability.

Figure 4. Healthy and Faulty Gearbox Vibration Signals from Four Sensors.

Figure 5. FFT of Vibration Signal – Healthy Gearbox (Sensor 1).

Figure 6. FFT of Vibration Signal – Faulty Gearbox (Sensor 1).

Figure 7. Spectrogram Comparison – Healthy vs Faulty Gearbox (Sensor 1).

Figure 8. (a) Wind Speed vs Power Output Scatter Plot, (b) Actual vs Theoretical Power Curve.

Figure 9. Power Output with Detected Anomalies using Z-score Method.

Figure 10. Confusion matrix comparison for different models. The proposed CNN-LSTM model demonstrates superior performance with minimal misclassifications.

Figure 11. Training and validation curves showing model convergence. The close alignment between training and validation curves indicates good generalization without overfitting.

Figure 12. ROC curves comparing different models. The proposed CNN-LSTM achieves the highest AUC of 0.982.

Figure 13. Visualization of CNN feature maps for healthy (top row) and faulty (bottom row) gearbox conditions. Faulty conditions show distinct activation patterns.

Table 1. Summary of Sensor Amplitude Statistics (Mean, Max, Std).

Sensor	Condition	Mean (m/s²)	Max (m/s²)	Std Dev (m/s²)
Sensor 1	Healthy	0.12	0.89	0.14
Sensor 1	Faulty	0.26	1.47	0.31
Sensor 2	Healthy	0.10	0.81	0.13
Sensor 2	Faulty	0.24	1.52	0.29
Sensor 3	Healthy	0.09	0.77	0.12
Sensor 3	Faulty	0.21	1.33	0.26
Sensor 4	Healthy	0.11	0.85	0.14
Sensor 4	Faulty	0.23	1.42	0.28

Table 2. Comparative Frequency-Domain Analysis of Healthy vs. Faulty Gearbox (Sensor 1).

Condition	Dominant Frequencies (Hz)	Key Characteristics	Interpretation
Healthy	28 (Shaft Freq. - f_s), 56 (GMF)	Strong GMF peak; No sidebands; Low noise floor	Normal meshing; No amplitude modulation
Faulty	28 (f_s), 56 (GMF), 84 (GMF + f_s), 112 (2×GMF)	Presence of sidebands at GMF ± f_s; Higher noise floor; Broader spectral energy	Amplitude modulation due to localized fault (e.g., broken tooth)

Table 3. Power Deviation Summary Across Wind Speeds.

Wind Speed Range (m/s)	Theoretical Power (kW)	Actual Power (kW)	Deviation (%)
4–6	120	100	-16.7
6–8	300	260	-13.3
8–10	550	490	-10.9
10–12	800	750	-6.3
12–14	1000	940	-6.0

Table 4. Detected Anomalies and Corresponding Timestamps.

Timestamp	Power Output (kW)	Wind Speed (m/s)	Z-score
2023-02-05 14:10	320	7.1	3.52
2023-02-18 09:40	280	8.4	-3.18
2023-03-01 16:00	910	12.6	3.70
2023-03-15 11:30	150	5.8	-3.45
2023-04-02 08:20	670	10.9	3.20

Table 5. Model Performance Metrics.

Dataset	Accuracy (%)	Precision (%)	Recall (%)	F1-score (%)
Gearbox Vibration	96.8	95.5	97.2	96.3
SCADA Anomalies	93.2	91.0	94.1	92.5

Table 6. Comparison of Proposed Model vs Previous Studies.

Method	Accuracy (%)	F1-score (%)	Data Type
SVM + FFT Features	86.4	84.5	Vibration
LSTM (Standalone)	90.1	89.0	Time-Series
CNN (2D Spectrogram)	93.7	92.3	Spectrogram
CNN-LSTM (Proposed Model)	96.8	96.3	Multimodal

Table 7. Comprehensive Performance Comparison of Different Methods.

Method	Accuracy	Precision	Recall	F1-Score	AUC-ROC	Training Time (s)
SVM + FFT	81.0%	80.8%	80.0%	80.4%	0.876	45.2
Random Forest	84.5%	83.9%	84.5%	84.2%	0.901	32.7
CNN Only	86.5%	87.6%	85.0%	86.3%	0.928	156.3
LSTM Only	89.0%	89.8%	88.0%	88.9%	0.941	189.6
CNN-LSTM (Proposed)	95.0%	95.0%	95.0%	95.0%	0.982	234.8

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
