Intelligent identification method of sedimentary microfacies based on DMC-BiLSTM

Sedimentary microfacies division is the basis of oil and gas exploration research. Traditional sedimentary microfacies division depends mainly on human experience, which makes it subject to human factors and inefficient. Although deep learning has advantages in solving complex nonlinear problems, no effective deep learning method for sedimentary microfacies division has been available so far. Therefore, this paper proposes a deep learning method based on DMC-BiLSTM for the intelligent division of sedimentary microfacies from well-logging data. First, the original curve is reconstructed multi-dimensionally by trend decomposition and median filtering, and spatio-temporal correlation clustering features are extracted from the reconstructed matrix by Kmeans. Then, taking the reconstructed features, the original curve features and the clustering features as input, the sedimentary microfacies type at the current depth is predicted by BiLSTM. Experimental results show that this method can effectively classify sedimentary microfacies, with a recognition accuracy of 96.84%.


Introduction
Sedimentary microfacies analysis is one of the most important research topics in oil and gas exploration and development, and it plays an especially important role in predicting the production of remaining oil enrichment areas. In traditional sedimentary microfacies division, the sedimentary types in the study area are usually determined from previous research results and the regional sedimentary background, by combining paleontology, sedimentology and other theories with the analysis of core facies markers from key wells. The sedimentary microfacies in the study area are then divided by comprehensively analyzing rock thickness and grain size together with the morphological characteristics of various logging curves [1][2]. Because this cross-analysis of various data is complicated and tedious, it is difficult to establish the qualitative mapping relationship between sedimentary facies and the various logging data.
With the rapid development of computer technology, machine learning has been applied to sedimentary microfacies identification, mainly through KNN [3], Bayesian networks [4], support vector machines [5] and artificial neural networks [6]. These methods mainly construct morphological and physical parameters of GR curves, and this construction depends on the specific geological environment and sedimentary background. Therefore, machine learning methods have limitations in constructing identification models of sedimentary microfacies.
Although deep learning has advantages in mining nonlinear relationships in data, there has been no effective identification model of sedimentary microfacies so far. Recurrent neural networks can process sequences over time. In 2020, Song Hehou et al. applied an LSTM network to identify well logging curve shapes [7], but it could only identify four logging facies (Bell shape, Funnel shape, Egg shape and Cylinder shape) using a one-way sequence, and could not meet the demand for effectively identifying the more complex log facies types found in actual sedimentary environments. Therefore, a more accurate deep learning model with stronger generalization ability is urgently needed to identify sedimentary microfacies effectively. This paper proposes a sedimentary microfacies identification model based on DMC-BiLSTM deep learning, and establishes a complete identification workflow of curve processing, feature construction, model training and model evaluation. The validity and advantage of the proposed method are demonstrated by comparing it with the division made by geological experts and with two other deep learning models in terms of precision and accuracy. The method consists of the following steps:
1) The original GR logging curve is decomposed by the STL algorithm, and the periodic component and residual term are removed from the decomposition results to obtain the trend component. In this paper GR is decomposed with different frequency windows to obtain multiple trend component features;
2) Median filtering is performed with the same frequency windows as the trend decomposition to obtain multiple median filtering features;
3) The obtained features and the original GR data are used as the input features of Kmeans clustering for unsupervised learning, which yields clustering features with spatial and temporal correlation;
4) The correspondence between logging facies and sedimentary microfacies of the delta front subfacies is established;
5) All feature sets are assembled into a feature matrix, which is input into a Bidirectional Long Short-Term Memory Network (BiLSTM) for sedimentary microfacies division.

DMC Characteristics Structure
DMC feature structure: D represents the trend feature, M represents the median filter feature, and C represents the clustering feature. During logging data collection, the effective signal is mixed with various kinds of noise caused by external interference or by the instrument itself. Key parameters for logging facies division, such as the shape and width of the logging signal, need to be preserved as far as possible while the noise is removed. First, the STL method is used to decompose the curve to obtain its trend feature, and then median filtering is used to obtain the median filter feature. Next, the Kmeans clustering feature is constructed. Finally, the feature set of the GR curve is obtained after all the features are normalized.
1. The Trend Component
The STL algorithm is a common time series decomposition algorithm. Based on LOESS, it decomposes the data at a given moment into a trend component, a periodic component and a residual term [8]. According to a preliminary analysis of the GR curve, the additive model meets the decomposition requirements, namely:
$$Y_v = T_v + S_v + R_v$$
where $Y_v$ is the data at time $v$, $T_v$ is the trend component, $S_v$ is the periodic component, $R_v$ is the residual term, and $v$ is a given time.
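As a rough illustration of the smoothing that STL is built on, the sketch below implements a plain LOESS-style local linear regression with tricube weights and uses it as a trend estimate. It is not the full STL procedure (there is no periodic component and no robustness iteration), and the function name `loess_trend` and the window handling at the curve edges are assumptions made for this example only.

```python
def loess_trend(y, window):
    """Estimate a trend with local linear regression (tricube weights).

    This is only the LOESS building block, used here as a simplified
    stand-in for the trend component that STL would return.
    """
    n = len(y)
    half = window // 2
    trend = []
    for i in range(n):
        lo = max(0, i - half)
        hi = min(n, i + half + 1)
        # Largest neighbour distance in the window; +1 keeps all weights positive.
        span = max(i - lo, hi - 1 - i) + 1
        sw = swx = swy = swxx = swxy = 0.0
        for j in range(lo, hi):
            d = abs(j - i) / span
            w = (1.0 - d ** 3) ** 3          # tricube weight
            x = j - i                        # position relative to the target sample
            sw += w
            swx += w * x
            swy += w * y[j]
            swxx += w * x * x
            swxy += w * x * y[j]
        denom = sw * swxx - swx * swx
        if abs(denom) < 1e-12:
            trend.append(swy / sw)           # single-point window: weighted mean
        else:
            # Intercept of the weighted least-squares line, i.e. the fit at x = 0.
            trend.append((swxx * swy - swx * swxy) / denom)
    return trend


# Additive model: whatever the trend does not explain is left in the remainder.
gr = [10.0 + (1.0 if k % 2 == 0 else -1.0) for k in range(41)]
trend = loess_trend(gr, window=19)
remainder = [v - t for v, t in zip(gr, trend)]
```

A larger `window` gives a smoother trend; repeating the call with several window values yields several trend features for the same curve.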

Median Filtering
The basic principle of median filtering [9][10] is as follows: for a given window size, the output signal at a point is replaced by the statistical median of all signal values within the window centered on that point. One-dimensional median filtering is defined as:
$$y_i = \operatorname{median}\left(x_{i-m}, \ldots, x_i, \ldots, x_{i+m}\right)$$
where $x_i$ is the input signal, $y_i$ is the filtered output, and the window length is $2m+1$.

Kmeans Clustering
Kmeans [11] is an unsupervised clustering algorithm, and its steps are as follows:
1) The feature data of the GR curve is clustered into K classes (K=6 gives the best classification effect for the final model). First, 6 GR data points are selected as the initial center points;
2) According to the principle of minimum distance to the center points, every curve data point is assigned to the class of its nearest center point;
3) Each class now contains many data points; the mean of all curve samples in each of the 6 classes is calculated and used as the 6 center points of the next iteration;
4) Steps 2 and 3 are repeated with the new center points until convergence (the center points no longer change, or the specified number of iterations is reached), and then the clustering process ends.
Euclidean distance is easy to understand and is the most commonly used distance formula. It reflects the similarity between GR curve sequences within a small depth section well, and fully reflects the spatial correlation of the GR curve. It is defined as:
$$d(x_1, x_2) = \sqrt{\sum_{k=1}^{n} \left(x_{1k} - x_{2k}\right)^2}$$
where $x_{1k}$ and $x_{2k}$ are the $k$-th components of the two n-dimensional vectors $x_1$ and $x_2$.
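The four clustering steps and the Euclidean distance can be sketched in a few lines of plain Python. This is a minimal illustration rather than the implementation used in the paper: the function names, the random choice of initial centers (controlled by a hypothetical `seed` argument) and the handling of empty clusters are all assumptions.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two n-dimensional vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def kmeans(points, k=6, max_iter=100, seed=0):
    """Cluster feature vectors into k classes; returns (labels, centers)."""
    rng = random.Random(seed)
    # Step 1: pick k data points as the initial center points.
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Step 2: assign every point to its nearest center.
        labels = [min(range(k), key=lambda c: euclidean(p, centers[c]))
                  for p in points]
        # Step 3: recompute each center as the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[c])   # assumption: keep an empty cluster's center
        # Step 4: stop when the centers no longer change.
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers
```

Each row of `points` would hold the features of one depth sample (original GR value plus its filtered versions), and the returned label per sample serves as the clustering feature.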

BiLSTM Principle
GR data can be regarded as a time series that changes with depth, and its shape classification depends on the previous output and the current input. Given a GR sequence, BiLSTM [12][13][14] models it forward and backward at the same time, so that the encoding of each tag contains contextual information from both the past and the future, and the long-term dependence in the time series data is represented in a richer way. Each layer of the BiLSTM network is composed of LSTM units propagating forward and backward (Fig. 2).

Fig. 2 BiLSTM Structure
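Long Short-Term Memory Network (LSTM) solves the problem that an RNN cannot cope with long-distance dependence [15]. The hidden layer of the original RNN has only one state, $h$, which is very sensitive to short-term input. LSTM adds a state $c$, called the cell state, to hold the long-term state. Three gates, the forgetting gate ($f_t$), the input gate ($i_t$) and the output gate ($o_t$), control the cell state:
$$f_t = \sigma\left(W_f x_t + U_f h_{t-1} + b_f\right)$$
$$i_t = \sigma\left(W_i x_t + U_i h_{t-1} + b_i\right)$$
$$o_t = \sigma\left(W_o x_t + U_o h_{t-1} + b_o\right)$$
$$\tilde{c}_t = \tanh\left(W_c x_t + U_c h_{t-1} + b_c\right)$$
$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t$$
$$h_t = o_t \odot \tanh\left(c_t\right)$$
where $\sigma$ and $\tanh$ are the two activation functions, $c_t$ represents the storage unit at time $t$, $\tilde{c}_t$ represents the candidate storage unit at time $t$, $W_f, W_i, W_o, W_c, U_f, U_i, U_o, U_c$ are the weight coefficients, updated by back propagation, $b_f, b_i, b_o, b_c$ are the bias coefficients, $x_t$ represents the sequence value at time $t$, $h_t$ represents the output of the hidden layer, and $\odot$ denotes element-wise multiplication.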
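To make the gate mechanics concrete, the following sketch runs one LSTM step and then combines a forward and a backward pass into a bidirectional output. It follows the standard LSTM formulation; the parameter names (`Wf`, `Uf`, `bf`, and so on), the dictionary layout and the helper `zero_params` are hypothetical and chosen only for this example, and no training is performed.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, x, U, h, b):
    """Compute W x + U h + b with plain lists (W and U are lists of rows)."""
    return [
        sum(w * xi for w, xi in zip(W[r], x))
        + sum(u * hi for u, hi in zip(U[r], h))
        + b[r]
        for r in range(len(b))
    ]


def lstm_step(x, h_prev, c_prev, p):
    """One LSTM time step; p maps names such as 'Wf' to weight lists."""
    f = [sigmoid(z) for z in affine(p["Wf"], x, p["Uf"], h_prev, p["bf"])]   # forgetting gate
    i = [sigmoid(z) for z in affine(p["Wi"], x, p["Ui"], h_prev, p["bi"])]   # input gate
    o = [sigmoid(z) for z in affine(p["Wo"], x, p["Uo"], h_prev, p["bo"])]   # output gate
    c_cand = [math.tanh(z) for z in affine(p["Wc"], x, p["Uc"], h_prev, p["bc"])]
    c = [ft * cp + it * cc for ft, cp, it, cc in zip(f, c_prev, i, c_cand)]
    h = [ot * math.tanh(ct) for ot, ct in zip(o, c)]
    return h, c


def run_lstm(xs, p, hidden):
    """Run an LSTM over a sequence and return the hidden state at every step."""
    h, c = [0.0] * hidden, [0.0] * hidden
    outputs = []
    for x in xs:
        h, c = lstm_step(x, h, c, p)
        outputs.append(h)
    return outputs


def bilstm(xs, p_fwd, p_bwd, hidden):
    """Concatenate forward and backward hidden states for every position."""
    fwd = run_lstm(xs, p_fwd, hidden)
    bwd = run_lstm(xs[::-1], p_bwd, hidden)[::-1]   # re-align to the original order
    return [f + b for f, b in zip(fwd, bwd)]


def zero_params(n_in, hidden):
    """Hypothetical helper: all-zero weights, convenient for checking shapes."""
    p = {}
    for g in "fioc":
        p["W" + g] = [[0.0] * n_in for _ in range(hidden)]
        p["U" + g] = [[0.0] * hidden for _ in range(hidden)]
        p["b" + g] = [0.0] * hidden
    return p
```

Because the backward pass reads the sequence from the deepest sample upward, the vector at each depth combines information from above and below that depth, which is the property the bidirectional structure is chosen for.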

Model Configuration and Training
In this paper, a BiLSTM network is used to identify sedimentary microfacies, and four BiLSTM layers are stacked to learn the GR curve sequence. The learned "distributed feature representation" is mapped to the sample label space through a fully connected layer; finally, the output vector is fed to a Softmax layer with 5 neural units to perform the logging facies classification task.
To avoid model overfitting, the data is divided into a training set and a validation set using the cross-validation method: 75% of the wells are used for training and 25% for validation. A Dropout unit with a coefficient of 0.2 is added to each layer of the network, which improves the generalization ability of the model by dropping some connections. The Adam [16] optimization algorithm is used to update the network parameters, and a learning rate decay schedule is defined to accelerate the convergence of the model.
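The cross-entropy loss function is used as the training loss, as follows:
$$a = \operatorname{softmax}(z), \qquad L = -\sum_{i} y_i \ln a_i$$
where $y$ is the expected output, $a$ is the actual output of the neuron, and $z$ denotes the input to the Softmax layer.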
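A small numerical sketch of the Softmax output and the cross-entropy loss for a five-class problem is given below. The function names and the small constant `eps` that guards against `log(0)` are assumptions for illustration.

```python
import math


def softmax(z):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(z)                                  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(y, a, eps=1e-12):
    """Cross-entropy between a one-hot target y and predicted probabilities a."""
    return -sum(yi * math.log(ai + eps) for yi, ai in zip(y, a))


scores = [2.0, 0.5, 0.1, -1.0, 0.0]             # hypothetical scores for the 5 classes
probs = softmax(scores)
loss = cross_entropy([1, 0, 0, 0, 0], probs)    # the true class is the first one
```

The loss shrinks toward zero as the probability assigned to the true class approaches 1.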

Source of Log Data
The logging data in this experiment is the Gamma Log Facies Type Prediction data set from CrowdAnalytix, an artificial intelligence company in California's Silicon Valley. The variation of GR logging characteristics can be used to characterize grain size, which gives GR an important role in sequence stratigraphic analysis [17]. Abrupt changes in the GR log response are interpreted as sharp lithologic breaks associated with unconformities and sequence boundaries, so the basic shape of the logging curve is often used to explain the sedimentary background of a sedimentary cycle [18]. Table 2 shows the logging curve types of the delta front subfacies.

Divide the Training Set and the Validation Set
The data was collected from 4,000 wells, and each well contains 1,100 data entries. The data of 3,000 wells were selected as the training set, and the remaining 1,000 wells were used as the validation set to verify the ability of the proposed DMC-BiLSTM classification model to predict sedimentary microfacies. Table 3 shows the resulting training set and validation set.
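Splitting at the level of whole wells keeps all samples of one well in the same set. A minimal sketch of such a split is shown below; the random shuffling, the `seed` argument and the function name are assumptions, since the text does not state how the 3,000 training wells were chosen.

```python
import random


def split_wells(well_ids, train_fraction=0.75, seed=0):
    """Split well identifiers into training and validation lists."""
    ids = list(well_ids)
    random.Random(seed).shuffle(ids)            # assumption: wells are assigned at random
    n_train = int(round(len(ids) * train_fraction))
    return ids[:n_train], ids[n_train:]


train_wells, val_wells = split_wells(range(4000))
```

With 4,000 wells and a 75% training fraction this gives the 3,000 / 1,000 partition described above.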

Data Preprocessing
All experiments in this paper were carried out on a machine with an Intel(R) Core(TM) i5-8300 CPU @ 2.3 GHz and 8 GB of RAM. Taking No. 9 Well as an example, the frequency window values of decomposition and filtering are both 19, and the trend component feature, median filter feature and Kmeans clustering feature are constructed for the GR curve (Fig. 4). In the figure, -GR represents the original GR curve data, -decompose represents the trend decomposition curve, -medfilt represents the median filtering curve, -cluster represents the clustering characteristic curve, and -label represents the true label distribution. It can be seen that such a feature set not only removes the high-frequency noise in the curve, but also retains the geological trend characteristics and the effective edge morphological characteristics, and fully reflects the spatial and temporal correlation of the curve data. To strengthen the fitting ability of the model, GR curve features were constructed from the original data for all odd filtering windows in 3-20 (that is, the nine windows 3, 5, ..., 19). Finally, the 20-dimensional feature set was generated together with the original GR curve data.
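The median-filter part of this feature construction can be sketched as follows. The edge handling (repeating the first and last values) and the function names are assumptions. The final count is one reading that is consistent with the stated 20 dimensions: the original GR curve, nine trend features, nine median-filter features and one clustering feature.

```python
import statistics


def median_filter(signal, window):
    """One-dimensional median filter with an odd window length."""
    if window % 2 == 0:
        raise ValueError("window must be odd")
    half = window // 2
    # Assumption: pad by repeating the edge values so the output keeps the input length.
    padded = [signal[0]] * half + list(signal) + [signal[-1]] * half
    return [statistics.median(padded[i:i + window]) for i in range(len(signal))]


# All odd windows in 3-20.
WINDOWS = list(range(3, 20, 2))


def median_features(gr):
    """Return one median-filtered copy of the GR curve per window."""
    return {w: median_filter(gr, w) for w in WINDOWS}


# Assumed layout of the feature set: original + trend + median + cluster.
n_features = 1 + len(WINDOWS) + len(WINDOWS) + 1
```

A single spike such as `[1, 1, 9, 1, 1]` is removed by a window of 3, while a step such as `[0, 0, 0, 5, 5, 5]` passes through unchanged, which is the edge-preserving behavior the section relies on.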

Experimental Results
In the data, samples of the Distributary bay type significantly outnumber those of the other four microfacies types, so the data is imbalanced. A single evaluation index is therefore not sufficient. In this paper, multiple evaluation indexes, including the confusion matrix, precision, recall, F1-score and overall accuracy [19], are used to comprehensively evaluate the prediction results and the classification effect of the model.
The indexes are defined as:
$$\text{Precision} = \frac{TP}{TP + FP}, \qquad \text{Recall} = \frac{TP}{TP + FN}$$
$$F1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}, \qquad \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
where TP is the number of true positive examples, FP is the number of false positive examples, TN is the number of true negative examples, and FN is the number of false negative examples.
From the confusion matrix and the evaluation indexes, it can be seen that the classification accuracy of the model reached 0.98, 0.94, 0.96, 0.95 and 0.95 for Distributary bay, Front sand sheet, Distributary channel, Mouth bar and Channel edge, respectively. Among them, the classification effect is best for the low-amplitude microdentate Distributary bay, because Distributary bay is mainly composed of mudstone deposits, shows high GR values and is easy to identify. Next, the accuracy, recall and F1 value all exceeded 96% in the identification of the Cylinder-shaped Distributary channel. For the other curve shapes, all indexes were slightly lower, probably because the three curve shapes differ only slightly, the only difference being between water progradation and retrogradation during sedimentation; even so, the prediction accuracy for these three curve shapes was also over 94%. In conclusion, the overall prediction ability of the model on the validation set was strong, with average precision, average recall and average F1 value all above 95%, and an overall accuracy of 96.84%.
The DMC-BiLSTM model proposed in this paper is compared with LSTM and the Temporal Convolutional Network (TCN) [20]. All three models used a batch size of 100 and were trained for up to 1000 rounds with an early stopping strategy (training stopped, and the current model was taken as the final model, if the loss did not decrease within 20 rounds), and the same settings, such as the Adam optimization algorithm, were used. The difference is that the input to the DMC-BiLSTM model is the new 20-dimensional feature set. The final results are shown in Table 5.
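The per-class indexes can be read directly off a confusion matrix. The sketch below assumes that rows are true classes and columns are predicted classes; the function name and the two-class example matrix are hypothetical and are not taken from the paper's results.

```python
def per_class_metrics(cm):
    """Compute precision, recall and F1 per class, plus overall accuracy.

    cm[i][j] is the number of samples of true class i predicted as class j.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    results = []
    for k in range(n):
        tp = cm[k][k]
        fp = sum(cm[i][k] for i in range(n)) - tp   # predicted k, but another true class
        fn = sum(cm[k]) - tp                        # true class k, but predicted otherwise
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        results.append({"precision": precision, "recall": recall, "f1": f1})
    accuracy = sum(cm[k][k] for k in range(n)) / total
    return results, accuracy


# Hypothetical 2-class confusion matrix, for illustration only.
metrics, accuracy = per_class_metrics([[8, 2], [1, 9]])
```

Reporting precision and recall per class, rather than accuracy alone, is what exposes weak performance on minority classes when one class dominates the data.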
It can be seen that the DMC-BiLSTM model proposed in this paper performs best, with a final accuracy of 96.84%, while the LSTM model reaches 69.76% and the TCN model 88.68%. The final accuracy is therefore about 27 percentage points higher than that of the LSTM model and about 8 percentage points higher than that of the TCN model. At the same time, compared with the traditional LSTM and TCN models, the prediction precision of the DMC-BiLSTM model is also improved in each category. This shows that the constructed features play an important role in the identification of sedimentary microfacies, and it reflects the importance of long-range sequence information in both directions for the morphological analysis of GR curves. In the prediction result figure, -med is the median filtering curve of the GR curve with a frequency window value of 19, -cluster is the characteristic curve after clustering, and the label column is the true label value. It can be seen that the DMC-BiLSTM model not only has an obvious advantage in the recognition accuracy of sedimentary microfacies, consistent with the manual division, but also distinguishes the boundaries between different categories well.

Conclusion
In this paper, an intelligent identification method of sedimentary microfacies based on DMC-BiLSTM deep learning is proposed. In this method, three kinds of features are constructed: the geological trend feature, the median filter feature and the clustering feature. The median filter feature not only removes the high-frequency noise in the logging curve, but also retains the effective edge information. The clustering feature better reflects the spatio-temporal correlation of the logging curve and distinguishes the boundaries between different curve shapes. Compared with the TCN and LSTM models trained on the original curve input, the classification accuracy of the proposed method is 96.84%, which is higher than the 88.68% of the TCN model and the 69.76% of the LSTM model. The DMC-BiLSTM method realizes end-to-end learning of logging data, and the recognition accuracy for Distributary bay, Front sand sheet, Distributary channel, Mouth bar and Channel edge is 98%, 94%, 96%, 95% and 95%, respectively. Experimental results show that the DMC-BiLSTM model helps to extract the hidden features of logging curve sequences and fully learns the differences between different shapes. It can effectively identify sedimentary microfacies, and the model has strong generalization ability and robustness. It provides a new method for the identification of sedimentary microfacies.