Machine Learning Schemes for Anomaly Detection in Solar Power Plants

The rapid industrial growth in solar energy has drawn increasing interest in renewable power from smart grids and plants. Anomaly detection in photovoltaic (PV) systems is a demanding task. It is therefore vital to exploit recent advances in machine learning to detect different anomalies accurately and in a timely manner and to support condition monitoring. This paper addresses this issue by evaluating different machine learning techniques and schemes and showing how to apply these approaches to detect anomalies and faults in photovoltaic components. To this end, we apply distinct state-of-the-art machine learning techniques (AutoEncoder Long Short-Term Memory (AE-LSTM), Facebook Prophet, and Isolation Forest) to detect faults/anomalies and evaluate their performance. These models should distinguish the PV system's actual healthy behavior from abnormal behavior. Our results provide clear insights for making informed decisions, especially regarding the experimental trade-offs in such a complex solution space.


For the past decade, the rapid development and expansion of renewable energy have been explored, including power plants. Such development is expected to advance our ability to produce clean and affordable energy, creating economic growth. Accordingly, the challenges of solar power generation have attracted significant attention recently. A leading concern is detecting and localizing anomalous patterns within solar systems. Big data [4] and data-driven techniques greatly assist in detecting and preventing such anomalies and in detecting faults in photovoltaic (PV) components.

The scalable and coherent operation of PV systems needs advanced tools to monitor the dynamic evolution of the system parameters and release alerts about anomalies to decision-makers. Online monitoring of PV systems is technically beneficial, assisting operators in managing their plants and establishing economic assimilation into smart grids [1]. Disastrous faults in PV arrays, if not appropriately identified, will diminish the generated power and may even introduce fire hazards [2]. If panel holders learn of anomalies on the surface of solar panels in time, they can eliminate the anomalies to prevent further energy loss.

A new technique is presented in [1] for monitoring PV systems by detecting anomalies using k-Nearest Neighbours (kNN) and a one-class support vector machine (OCSVM). The self-learning algorithms markedly decreased the measuring effort and supported reliable monitoring of faults. The authors of [12] used a k-Nearest Neighbours algorithm and a Multi-layer Perceptron to process the data from a DC

The unsupervised contextual and collective detection approach of [20] is applied to data streams from a large energy distributor in the Czech Republic. The approach examined distinctive forms of potential anomalies (e.g., over-voltages, under-voltages).
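The one-class monitoring idea cited above can be illustrated with a short sketch. This is not the authors' implementation: it uses scikit-learn's OneClassSVM on synthetic "healthy" PV operating points, and the feature values and hyperparameters are hypothetical choices for illustration.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
# Synthetic "healthy" operating points: (DC power in W, module temp in °C)
healthy = rng.normal(loc=[1000.0, 45.0], scale=[50.0, 2.0], size=(500, 2))

# Train on healthy data only; nu bounds the fraction of boundary outliers
ocsvm = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale").fit(healthy)

# One in-distribution observation and one fault-like power collapse
new_obs = np.array([[1005.0, 44.5], [0.0, 90.0]])
labels = ocsvm.predict(new_obs)  # +1 = normal, -1 = anomaly
```

Because the model is fitted only on normal operation, any observation far from the learned boundary (here the power collapse) is flagged without ever labelling fault data.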
The encoding operation maps the input to the hidden layer, whereas decoding maps the hidden layer to the output layer. The AE has the merit of learning efficiently from unlabelled data to reconstruct its input vector. Figure 1 illustrates the construction of the AE.

The encoding process is described by:

H = f_1(W_1 X + b_1)

where W_1 and b_1 are the weight and bias parameters between the input and the hidden layer, X is the primary input, H is the intermediate representation of the primary data, and f_1 is the activation function (e.g., ReLU, Logistic (Sigmoid), or TanH).

Likewise, the decoding process is expressed as:

X̂ = f_2(W_2 H + b_2)

where W_2 and b_2 are the weight and bias parameters between the hidden and the output layer, X̂ is the reconstruction of the input, and f_2 is the decoder's activation function. Figure 2 illustrates the LSTM unit.

LSTM controls the information flow through the gates using the following equations:

f_t = σ(W_f [h_{t−1}, x_t] + b_f)
i_t = σ(W_i [h_{t−1}, x_t] + b_i)
u_t = tanh(W_u [h_{t−1}, x_t] + b_u)
c_t = f_t ⊙ c_{t−1} + i_t ⊙ u_t
o_t = σ(W_o [h_{t−1}, x_t] + b_o)
h_t = o_t ⊙ tanh(c_t)

where h_t is the present final output, c_t is the current cell state, x_t is the present input, f_t is the forget gate, i_t is the input gate, u_t is the candidate input to the cell that is gated by the input gate, and o_t is the output gate.

Prophet's trend component models saturating growth with a logistic function:

g(t) = C / (1 + exp(−k(t − m)))

where C is the carrying capacity, k is the growth rate, and m is an offset parameter.
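Returning briefly to the LSTM unit: the gate equations can be sketched as a single numpy time step. The weights and dimensions here are random, illustrative values, not a trained network.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step; W maps [h_prev; x_t] to the four gate pre-activations."""
    z = W @ np.concatenate([h_prev, x_t]) + b
    zf, zi, zo, zu = np.split(z, 4)
    f_t, i_t, o_t = sigmoid(zf), sigmoid(zi), sigmoid(zo)  # forget/input/output gates
    u_t = np.tanh(zu)                     # candidate cell input
    c_t = f_t * c_prev + i_t * u_t        # cell state update
    h_t = o_t * np.tanh(c_t)              # hidden state / final output
    return h_t, c_t

rng = np.random.default_rng(0)
n_in, n_hid = 4, 6                        # hypothetical sizes
W = rng.normal(scale=0.1, size=(4 * n_hid, n_hid + n_in))
b = np.zeros(4 * n_hid)

h, c = np.zeros(n_hid), np.zeros(n_hid)
for _ in range(10):                       # run over a short random sequence
    h, c = lstm_step(rng.normal(size=n_in), h, c, W, b)
```

Because h_t = o_t ⊙ tanh(c_t) with o_t ∈ (0, 1), the output stays bounded in (−1, 1), which is why the cell state c_t, not h_t, carries long-term memory.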
where a(t)^T δ is the cumulative growth-rate adjustment up to the changepoints s_j [29], and a(t) ∈ {0, 1}^S is a vector that can be computed as follows [28]:

a_j(t) = 1 if t ≥ s_j, and 0 otherwise.

Prophet then modifies the primary logistic growth model to include trend updates for non-linear, saturating growth as follows:

g(t) = C(t) / (1 + exp(−(k + a(t)^T δ)(t − (m + a(t)^T γ))))

and the linear growth can be written as:

g(t) = (k + a(t)^T δ) t + (m + a(t)^T γ)

where γ adjusts the offset m so that g(t) remains continuous at the changepoints. Let δ ∈ R^S be such that the entries of δ are the rates of modification in g(t). Changepoints are allocated by assigning each δ_j a Laplace prior, δ_j ∼ Laplace(0, τ).

In an Isolation Forest, a sample that is isolated after only a few branches, i.e., close to the root of a tree, is more probable to be an anomaly [30].
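The piecewise-linear trend above can be computed directly. The changepoint locations, rate adjustments, and base parameters below are synthetic values chosen only to illustrate the formula; for the linear trend, γ_j = −s_j δ_j keeps g(t) continuous.

```python
import numpy as np

t = np.linspace(0.0, 10.0, 201)          # time axis
s = np.array([3.0, 7.0])                 # changepoint locations s_j
delta = np.array([0.5, -1.0])            # rate adjustments delta_j
k, m = 1.0, 0.0                          # base growth rate and offset

A = (t[:, None] >= s[None, :]).astype(float)  # indicator vectors a(t), one row per t
gamma = -s * delta                            # offset adjustments for continuity
g = (k + A @ delta) * t + (m + A @ gamma)     # g(t) = (k + a^T δ) t + (m + a^T γ)
```

The slope is k before the first changepoint, k + 0.5 between the changepoints, and k − 0.5 afterwards, while the γ term prevents any jump in g(t) at s_j.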

The algorithm can be described as follows. Let T be a node in the tree, q the value of a randomly selected feature, p a threshold value, and X = {x_1, x_2, x_3, x_4, …, x_n} the dataset with n samples, where each sample has d features. T can be a leaf node or an internal node (with two sub-nodes T_left and T_right). If p > q, the sample is assigned to T_left; otherwise, it is assigned to T_right.

The expected path length is normalised by:

c(n) = 2H(n − 1) − 2(n − 1)/n   for n > 2
c(n) = 1                        for n = 2
c(n) = 0                        otherwise

where H(i) expresses the harmonic number, which can be evaluated as ln(i) + γ (γ represents Euler's constant) [30].
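The normalisation term c(n) and the resulting anomaly score s(x, n) = 2^(−E[h(x)]/c(n)) can be sketched directly from the definitions above; the example path lengths are hypothetical.

```python
import numpy as np

EULER_GAMMA = 0.5772156649  # Euler's constant γ

def harmonic(i):
    return np.log(i) + EULER_GAMMA        # H(i) ≈ ln(i) + γ

def c(n):
    """Normalising term: average path length of an unsuccessful BST search."""
    if n > 2:
        return 2.0 * harmonic(n - 1) - 2.0 * (n - 1) / n
    if n == 2:
        return 1.0
    return 0.0

def anomaly_score(mean_path_length, n):
    """s(x, n) = 2^(−E[h(x)] / c(n)); scores near 1 indicate anomalies."""
    return 2.0 ** (-mean_path_length / c(n))

# Samples isolated after few splits score high; deep samples score low
short = anomaly_score(2.0, 256)    # quickly isolated -> likely anomaly
deep = anomaly_score(15.0, 256)    # hard to isolate -> likely normal
```

With a subsample size of 256, c(256) ≈ 10.24, so a mean path length of 2 gives a score above 0.5 (anomalous) while a path length of 15 gives a score well below 0.5 (normal).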
Spearman's rank correlation coefficient ρ is given by:

ρ = 1 − 6 Σ d_i² / (n(n² − 1))

where ρ is Spearman's rank correlation coefficient, d_i is the difference between the two ranks of each observation, and n is the number of observations. The figure shows that both internal and external factors are highly correlated except for the daily
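The rank-correlation formula above can be verified on a small synthetic sample, cross-checked against scipy; the data values are illustrative, not the paper's measurements.

```python
import numpy as np
from scipy import stats

x = np.array([10.0, 20.0, 30.0, 40.0, 50.0])   # e.g., an internal factor
y = np.array([1.2, 2.1, 2.9, 4.5, 4.4])        # e.g., an external factor, mostly monotone

rx, ry = stats.rankdata(x), stats.rankdata(y)
d = rx - ry                                     # rank differences d_i
n = len(x)
rho_manual = 1.0 - 6.0 * np.sum(d ** 2) / (n * (n ** 2 - 1))
rho_scipy, _ = stats.spearmanr(x, y)            # library cross-check
```

Only the last two y-values swap rank order, so Σd_i² = 2 and ρ = 1 − 12/120 = 0.9, matching scipy's result.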

This section discusses the experimental evaluation carried out to validate and evaluate the

signifies that the seasonality's impact is combined with the forecast trend. Table 1 shows the parameter grid for Prophet, with a total of 162 possible models. Because Prophet's working principle is to predict a time-series signal, the objective function was selected to be R-squared (R²), which can be computed as follows:

R² = 1 − Σ(y_i − ŷ_i)² / Σ(y_i − ȳ)²

where y_i and ŷ_i are the actual and the predicted data, respectively, while ȳ is the mean value of the actual data. The optimization results found the best R² to be 87.448 for the

The AE-LSTM also learns/trains on a time-series signal and then tries to predict/forecast the signal's characteristics in the future. Therefore, as with Prophet, R² was used as the objective function. The optimization results found the best R² to be

Table 3 shows the parameter grid for AE-LSTM, with a total of 338 possible models. The
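The R² objective used for both Prophet and AE-LSTM reduces to two sums of squares; the actual/predicted values below are synthetic, for illustration only.

```python
import numpy as np

y = np.array([3.0, 5.0, 7.0, 9.0])        # actual values
y_hat = np.array([2.8, 5.1, 7.2, 8.9])    # predicted values

ss_res = np.sum((y - y_hat) ** 2)         # residual sum of squares Σ(y_i − ŷ_i)²
ss_tot = np.sum((y - y.mean()) ** 2)      # total sum of squares Σ(y_i − ȳ)²
r2 = 1.0 - ss_res / ss_tot                # R-squared
```

An R² of 1 means the forecast explains all of the variance in the actual signal; values can go negative when the model predicts worse than the mean.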

bootstrap is a parameter that controls the sampling process. If it is set to True, then the individual trees are fitted on random subsets of the training data drawn with replacement.

The second test investigated the externally correlated factor of module temperature, as shown in Figure 9. It can be seen that the signal is healthy. However, the Prophet
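The bootstrap parameter discussed above appears in scikit-learn's IsolationForest. The sketch below is illustrative, not the paper's configuration: the training data and test readings are synthetic values.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
X_train = rng.normal(loc=1000.0, scale=30.0, size=(300, 1))  # healthy DC power (W)

clf = IsolationForest(
    n_estimators=100,
    bootstrap=True,        # each tree samples training rows with replacement
    random_state=0,
).fit(X_train)

X_test = np.array([[1002.0], [100.0]])   # in-range reading vs. fault-like drop
labels = clf.predict(X_test)             # +1 = normal, -1 = anomaly
```

With bootstrap=True each tree sees a resampled view of the data, which adds diversity across the ensemble; the default (False) samples without replacement.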