Anomaly Detection of Wind Turbine Gearbox based on SCADA Temperature Data using Machine Learning

Abstract: Wind energy is becoming an essential source of power for countries that aim to reduce greenhouse gas emissions and mitigate the effects of global warming. The number of Wind Turbines (WTs) installed around the globe is increasing significantly every year. This dramatic increase in wind power has encountered quite a few challenges, the major ones being availability and reliability. Unexpected failures of the WT Gearbox (GB) ultimately increase the Operation and Maintenance (O&M) cost. Identifying faults at an early stage, before they cause catastrophic damage to other components of the WT, is crucial. This research deals with the prediction of WT failures using data from a Supervisory Control and Data Acquisition (SCADA) system. The main aim is to forecast the temperature of the WT GB in order to predict impending overheating of the GB at an early stage. Earlier prediction helps to optimize the maintenance period, to save maintenance expenses and, even more importantly, to generate warnings in due time to avoid major damage or even technical disasters. In the proposed method, we compared six different machine learning (ML) models based on error and accuracy of prediction. The bagging regressor is the best ML model, with a mean square error of 0.33 and an R² of 99.8 on training data. The bagging regressor is then used to predict faults in the WT GB; it detected anomalous behavior of the WT GB 59 days before the actual failure. The model also detected extremely unusual behavior of the GB 9 days before complete failure.


Introduction
The number of WTs installed all over the world is increasing day by day, and the number is expected to be 2,02,1000 by 2050 [6]. The rapid expansion of wind energy has encountered many challenges, the major ones being high maintenance cost and poor reliability. The reliability of a WT is susceptible to the failure and degradation of its subsystems.

The undesirable GB reliability is a great challenge to the utilization of wind power more

Figure 2 shows the failure rates of different subassemblies and their downtime after failure. The results in Figure 2 show that the lower a subassembly's reliability, the longer its downtime [8].

Figure 2. Wind turbine sub-assembly downtime and failure rates [8].

The WT SCADA data for this study was collected from the onshore wind farm La  Table 2. Data pre-processing is crucial in ML because "Better data beats fancier algorithms".

The SCADA data used for this study contain an extensive number of WT parameters. The collected SCADA data has parameters of varying scales. Differently scaled data is difficult to visualize, degrades the predictive performance of ML models, and slows down prediction. The parameters are therefore rescaled in Python to the range 0 to 1, which is called normalization, using the MinMaxScaler class from scikit-learn. The rescaled value is calculated using the following equation:

$$w' = \frac{w - \min(w)}{\max(w) - \min(w)}$$

where w' is the rescaled value, w is the original value, max(w) is the maximum value in

The ML model selection is one of the toughest jobs in predicting GB temperature.
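The min-max rescaling described above can be sketched as follows. This is a minimal illustration, not the study's actual pipeline: the column meanings and the sample values are hypothetical, and only the `MinMaxScaler` class from scikit-learn is taken from the text.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical SCADA samples (assumed columns, not from the paper):
# wind speed [m/s], active power [kW], ambient temperature [deg C]
scada = np.array([
    [3.2, 150.0, 12.0],
    [7.8, 900.0, 15.5],
    [12.1, 2000.0, 9.0],
    [5.5, 420.0, 20.0],
])

# Rescale every column independently to the range [0, 1]
scaler = MinMaxScaler(feature_range=(0, 1))
scaled = scaler.fit_transform(scada)

# The same rescaling written out by hand:
# w' = (w - min(w)) / (max(w) - min(w)), applied per column
col_min = scada.min(axis=0)
col_max = scada.max(axis=0)
manual = (scada - col_min) / (col_max - col_min)

print(scaled)
```

After rescaling, each column has a minimum of 0 and a maximum of 1, so parameters measured in very different units become directly comparable.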

In this study, we compare different ML models on the basis of their performance, as shown in Figure 7. The main points we consider during our predictive model selection

$$y = mx + c$$

where y is the targeted output, x is the input parameter, m is the slope of the line and c is the y-intercept.

K-Nearest Neighbor: The K-Nearest Neighbor is a simple ML algorithm which predicts the targeted output based on distance functions. The distance functions for continuous values are:

Coefficient of determination:
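The coefficient of determination (R²) measures how well the regression model fits the data by quantifying the relationship between the input parameters and the targeted variable. This metric is also called the fitting degree of regression. Mathematically this is expressed as:

$$R^2 = 1 - \frac{\sum_{m=1}^{K} \left(Gbt_a(m) - Gbt_p(m)\right)^2}{\sum_{m=1}^{K} \left(Gbt_a(m) - \overline{Gbt_a}\right)^2}$$

where Gbt_p is the predicted gearbox temperature, Gbt_a is the actual gearbox temperature, \overline{Gbt_a} is the mean of the actual gearbox temperature and K is the total number of data samples.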

The maximum value of R² is 1. The closer the value is to 1, the better the model fits the targeted variable [43]. A low value (less than 0.5) indicates that the model is not suitable for this data.

Mean Square Error:
The mean square error (MSE) estimates the average squared difference between the actual value and the predicted value. The MSE measures the quality of a regression ML algorithm; a value closer to zero is better. If the predicted GB temperature is Gbt_p(m), the actual GB temperature is Gbt_a(m) and the total number of data samples is K, the MSE is expressed mathematically as:

$$MSE = \frac{1}{K} \sum_{m=1}^{K} \left(Gbt_a(m) - Gbt_p(m)\right)^2$$

Mean Absolute Error:
The mean absolute error is denoted as MAE. The MAE is the average absolute difference between the actual value and the predicted value. Mathematically this error is represented as:

$$MAE = \frac{1}{K} \sum_{m=1}^{K} \left|Gbt_a(m) - Gbt_p(m)\right|$$

where Gbt_p(m) is the predicted gearbox temperature, Gbt_a(m) is the actual gearbox temperature and K is the total number of data samples.
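The three evaluation metrics can be illustrated with a short sketch. The temperature values below are made up for illustration, the function names are hypothetical, and R² is computed with the standard definition (one minus the ratio of the residual sum of squares to the total sum of squares), which is assumed here rather than taken from the paper.

```python
import numpy as np

def mean_square_error(gbt_a, gbt_p):
    """Average squared difference between actual and predicted values."""
    return float(np.mean((gbt_a - gbt_p) ** 2))

def mean_absolute_error(gbt_a, gbt_p):
    """Average absolute difference between actual and predicted values."""
    return float(np.mean(np.abs(gbt_a - gbt_p)))

def r_squared(gbt_a, gbt_p):
    """Standard coefficient of determination: 1 - SS_res / SS_tot (assumed form)."""
    ss_res = np.sum((gbt_a - gbt_p) ** 2)
    ss_tot = np.sum((gbt_a - np.mean(gbt_a)) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Hypothetical gearbox temperatures in deg C (illustrative values only)
gbt_actual = np.array([60.0, 62.0, 65.0, 63.0])
gbt_predicted = np.array([61.0, 62.0, 64.0, 65.0])

mse = mean_square_error(gbt_actual, gbt_predicted)    # (1 + 0 + 1 + 4) / 4 = 1.5
mae = mean_absolute_error(gbt_actual, gbt_predicted)  # (1 + 0 + 1 + 2) / 4 = 1.0
r2 = r_squared(gbt_actual, gbt_predicted)             # 1 - 6 / 13

print(mse, mae, r2)
```

Because the MSE squares each error, the single 2-degree miss dominates it, while the MAE weights all errors linearly. This is one reason to report both.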

The first objective is to deal with the SCADA data of the WT and to remove the unnec-

The feature selection is done using correlation statistics, as shown in Figure 11.

The first regressor we trained on the WT SCADA data is an extra-tree regressor.

The MSE, MAE and R² reported were 0.86, 0.79 and 43.01% respectively. The time taken by this regressor was 55.95 seconds. The dataset used in this study is large, containing almost 200000 samples, which makes it hard to visualize; for better visualization, Figure 12 shows the prediction of GB temperature for 3 days.

less and the error is high. The time taken by this model is less than that of the previous one. Figure 14 gives a clearer visualization of the prediction using only 3 days of data.

we decide to remove, based on the observed outcomes in each fold.

The results are further evaluated without removing any feature from the dataset.

predicted temperature. Figure 18 shows the change in GB temperature over a 3-month period; the GB eventually fails after 76 days. Figure 19 shows the difference between

Figure 19. Difference between actual and predicted gearbox temperature for 3 months
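The idea of detecting anomalies from the difference between actual and predicted gearbox temperature can be sketched as below. This is only an illustration under stated assumptions: the data is synthetic, the input features (active power and ambient temperature), the injected 5-degree overheating offset and the three-standard-deviation alarm threshold are all hypothetical choices, and the paper's own alarm rule may differ. Only the use of a bagging regressor is taken from the text.

```python
import numpy as np
from sklearn.ensemble import BaggingRegressor

rng = np.random.default_rng(42)

def simulate_healthy(n):
    """Synthetic healthy-turbine data (assumed relationship, not real SCADA)."""
    power = rng.uniform(0.0, 2000.0, n)      # active power [kW]
    ambient = rng.uniform(0.0, 25.0, n)      # ambient temperature [deg C]
    gbt = 45.0 + 0.01 * power + 0.5 * ambient + rng.normal(0.0, 0.5, n)
    return np.column_stack([power, ambient]), gbt

# Train the bagging regressor on healthy operation only
X_train, y_train = simulate_healthy(2000)
model = BaggingRegressor(n_estimators=20, random_state=0)
model.fit(X_train, y_train)

# Derive an alarm threshold from residuals on held-out healthy data
# (mean + 3 standard deviations is an assumed rule of thumb)
X_val, y_val = simulate_healthy(500)
val_residual = y_val - model.predict(X_val)
threshold = val_residual.mean() + 3.0 * val_residual.std()

# Test period: the last 100 samples carry an injected overheating offset
X_test, y_test = simulate_healthy(300)
y_test[200:] += 5.0

# Residual = actual minus predicted gearbox temperature
residual = y_test - model.predict(X_test)
alarms = residual > threshold

print("alarm rate, healthy part:", alarms[:200].mean())
print("alarm rate, faulty part:", alarms[200:].mean())
```

Training on healthy data only means the model predicts the temperature the gearbox should have under the current operating conditions, so a persistent positive residual points to overheating rather than to a change in load or weather.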

The wind industry has experienced significant growth in the past few years and will continue to play a significant role in providing energy to households and industries. However,