2.1. Machine Learning Methods
Machine learning methods are a family of algorithms that build models which learn from data and improve on the basis of statistical principles. They exploit patterns and regularities in the data, enabling computers to perform tasks such as prediction, classification, clustering, and optimization. This paper uses the random forest (RF), support vector machine (SVM), and extreme gradient boosting (XGB) algorithms for reservoir sweet spot prediction.
2.1.1. Random Forest Algorithm
The Random Forest (RF) algorithm is a widely used supervised learning algorithm that can address both regression and classification problems. As an ensemble learning algorithm, RF integrates multiple decision trees to produce a final prediction. In classification problems, RF determines the output class by taking the mode of the individual tree outputs; in regression problems, it averages the outputs of the individual trees. RF allocates samples to each tree by drawing randomly from the dataset with replacement. A decision tree is built from each draw, and all the decision trees are then integrated to form a "forest" whose final prediction is obtained by aggregating the tree outputs. Unlike a single decision tree, RF introduces two sources of randomness: 1) a training subset is drawn randomly from the entire dataset, and each such subset is used to build one decision tree; 2) a subset of feature attributes is selected randomly from those available in the drawn training subset. These two sources of randomness enable RF to perform better than a single decision tree. RF was first introduced by Breiman (2001) as an extension of the decision tree algorithm to address limitations such as overfitting and high variance. Liaw and Wiener (2002) confirmed the effectiveness of RF on a range of datasets, showing that it outperformed other popular classification algorithms, such as support vector machines and artificial neural networks. Since then, RF has become a favored algorithm in many applications, including image classification, gene expression analysis, and credit scoring.
RF is a versatile supervised learning algorithm that combines the strengths of decision trees with ensemble learning. It can handle regression problems, classification problems, and missing data, and it maintains high performance on datasets with a large number of variables. As a consequence, it is a valuable tool in many research fields.
Figure 2 depicts the regression process of RF, in which multiple decision trees are generated by bootstrap sampling (sampling with replacement) and random feature selection. Although each decision tree is built independently, every tree contributes to the prediction. The final prediction of the RF regression model is the arithmetic mean of the predictions of the individual decision trees. This ensemble approach enhances the prediction accuracy and generalization performance of the model (Liaw & Wiener, 2002; Breiman, 2001). In addition to the number of decision trees, the maximum number of features in a single tree is an important parameter that should be adjusted during RF modeling. Other key parameters include the minimum number of samples required to split a node and the minimum number of samples required at a leaf node. These parameters can be tuned to optimize the performance of the model for different applications.
The RF algorithm consists of three steps. First, the bootstrap resampling method is used to draw k samples from the original training set, each of the same size as the original training set. Second, a decision tree model is built on each of the k samples, giving k different results. Third, the final result is obtained by arithmetically averaging the results of the decision trees.
RF constructs different training sets by randomly drawing samples from the original training set. The classification models trained on these sets therefore differ from one another, which improves the performance of the combined classification model as a whole. Each of the k samples obtained from sampling is used to construct one classification model. Each classifier produces one output, and all the outputs are put to a vote. In regression, the final result is instead the arithmetic mean of the results of the decision trees.
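The tunable parameters named above can be laid out as a small search grid. The sketch below is only illustrative: the candidate values are assumptions, not the settings used in this paper, and the key names follow the arguments of scikit-learn's `RandomForestRegressor`, which is an assumption about tooling since the text does not name a library.

```python
from itertools import product

# Candidate values are illustrative assumptions, not the paper's settings.
# Key names follow scikit-learn's RandomForestRegressor arguments.
param_grid = {
    "n_estimators": [100, 300, 500],   # number of decision trees
    "max_features": [0.3, 0.5, 1.0],   # fraction of features considered per split
    "min_samples_split": [2, 5],       # minimum samples required to split a node
    "min_samples_leaf": [1, 3],        # minimum samples required at a leaf node
}


def iter_param_combinations(grid):
    """Yield one dict per combination of candidate values."""
    names = list(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


combinations = list(iter_param_combinations(param_grid))
print(len(combinations), "candidate settings; first:", combinations[0])
```

Each dictionary could then be passed to a model constructor and scored by cross-validation; that step is omitted here.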
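The three steps can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the implementation used in this study: a depth-1 regression tree (`fit_stump`) stands in for a full decision tree, the data are synthetic, and random feature selection is omitted because the example has a single input feature.

```python
import random
from statistics import mean


def bootstrap_sample(rows, rng):
    """Step 1: draw len(rows) rows with replacement."""
    return [rng.choice(rows) for _ in rows]


def fit_stump(rows):
    """Fit a depth-1 regression tree on (x, y) pairs (stand-in for a full tree)."""
    xs = sorted({x for x, _ in rows})
    fallback = mean(y for _, y in rows)
    best = None
    for lo, hi in zip(xs, xs[1:]):
        threshold = (lo + hi) / 2
        left = [y for x, y in rows if x <= threshold]
        right = [y for x, y in rows if x > threshold]
        l_mean, r_mean = mean(left), mean(right)
        sse = (sum((y - l_mean) ** 2 for y in left)
               + sum((y - r_mean) ** 2 for y in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, l_mean, r_mean)
    if best is None:  # all x values identical: predict the mean everywhere
        return lambda x: fallback
    _, threshold, l_mean, r_mean = best
    return lambda x: l_mean if x <= threshold else r_mean


def fit_forest(rows, k, seed=0):
    """Step 2: build one tree on each of k bootstrap samples."""
    rng = random.Random(seed)
    return [fit_stump(bootstrap_sample(rows, rng)) for _ in range(k)]


def predict_forest(trees, x):
    """Step 3: arithmetic mean of the individual tree outputs."""
    return mean(tree(x) for tree in trees)


# Synthetic step function: y = 0 for x < 5, y = 1 otherwise.
data = [(x / 10, 1.0 if x >= 50 else 0.0) for x in range(100)]
forest = fit_forest(data, k=25)
print(predict_forest(forest, 0.5), predict_forest(forest, 9.0))
```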
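In classification, the category with the most votes is taken as the final classification result. The final classification decision is shown in Eq. (1):
$$H(x) = \arg\max_{Y} \sum_{i=1}^{k} I\left(h_i(x) = Y\right) \tag{1}$$
where $H(x)$ represents the output of the combined classification model, $h_i(x)$ represents the result of a single decision tree, $Y$ is the output class, and $I(\cdot)$ is the indicator function.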
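The majority-vote rule described above can be sketched as follows. The class labels are hypothetical, and ties are broken in favor of the label encountered first, which is a choice of this sketch rather than something the text specifies.

```python
from collections import Counter


def majority_vote(tree_outputs):
    """Return the class predicted by the largest number of trees."""
    counts = Counter(tree_outputs)
    label, _ = counts.most_common(1)[0]
    return label


votes = ["II", "I", "II", "III", "II"]  # hypothetical outputs of five trees
print(majority_vote(votes))
```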
Figure 3.
Application diagram of RF.
RF is an ensemble algorithm that combines multiple decision trees, and the final output of its regression is the average output of all trees. Its randomness is embodied in two aspects: random selection of data and random selection of features. Random selection of data means building a data subset by sampling with replacement from the original dataset and using that subset to build a sub-decision tree; this data selection method is called bootstrap sampling. Random feature selection is then introduced in the training process: a random subset of features is drawn, and the optimal feature is selected from that subset. This operation generates a large number of decision trees that are only weakly correlated with one another, each of which participates in the judgment process. Different trees are good at exploiting different features, so the input data are judged from different angles. Finally, the results of the decision trees are aggregated to determine the final output; the diversity of the trees improves the accuracy of the prediction model. RF is often used in practical analysis. Compared with a single decision tree, it reduces model error and generalizes better. Two parameters are mainly adjusted in RF: the number of decision trees and the maximum number of features in a single tree. Because RF is not sensitive to outliers in the dataset and does not require much parameter tuning, its performance does not fluctuate greatly with the hyper-parameter settings. Even with default parameters it can achieve good results, which makes it robust.
2.1.2. Support Vector Machine Algorithm
The Support Vector Machine (SVM) algorithm is an important machine learning method used for tasks such as classification, regression, and anomaly detection (Cortes & Vapnik, 1995). The fundamental concept of SVM is to identify the optimal hyperplane that separates data points belonging to different classes while maximizing the margin between the closest points of the two classes. SVM is known for its high prediction accuracy, robustness, and generalization ability. Another advantage of SVM is its ability to use kernel functions, which transform nonlinear problems into linear ones (Schölkopf et al., 2002). The linear, polynomial, and radial basis function kernels are the most commonly used, and each suits different types of data. The choice of kernel function and its parameters can significantly influence the performance of the SVM, so selecting an appropriate kernel function is a key step in SVM modeling.
SVM is a popular supervised learning method that has been shown to be effective even with small sample sizes. It was initially proposed for classification, and its success is due to its ability to find the optimal hyperplane that maximizes the margin between the closest points of two different classes.
To expand on the concept of maximizing the margin, Figure 4 shows that countless lines can separate the data samples. Among them, SVM selects the one line with the maximum margin, which is represented by the distance between the two parallel dashed lines. The samples lying on these dashed lines are known as the support vectors and are the critical points used in determining the hyperplane. In practice, it is not always possible to find a hyperplane that perfectly separates the data, so the SVM tolerates a certain degree of misclassification, controlled by a penalty parameter.
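The three kernel functions mentioned above can be written out directly. The sketch below uses their common textbook forms; the parameter names `gamma`, `coef0`, and `degree` and their default values are assumptions of this example, not settings reported in this paper.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def linear_kernel(u, v):
    """K(u, v) = u . v"""
    return dot(u, v)


def polynomial_kernel(u, v, degree=3, gamma=1.0, coef0=1.0):
    """K(u, v) = (gamma * u . v + coef0) ** degree"""
    return (gamma * dot(u, v) + coef0) ** degree


def rbf_kernel(u, v, gamma=1.0):
    """K(u, v) = exp(-gamma * ||u - v||^2)"""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * sq_dist)


u, v = (1.0, 2.0), (3.0, 4.0)
print(linear_kernel(u, v), polynomial_kernel(u, v, degree=2), rbf_kernel(u, v))
```

The radial basis function kernel equals 1 for identical inputs and decays toward 0 as the inputs move apart, with `gamma` controlling how quickly.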
The goal of SVM is to find a hyperplane in d-dimensional space (d is the number of features) that classifies the data points. In the sample space, the separating hyperplane is described by the following linear equation:
$$w^{T}x + b = 0$$
where x is the input vector, a sample in the sample set; $w = (w_1, w_2, w_3, \ldots, w_d)$ is the normal vector, which determines the direction of the hyperplane, and each of its components is an adjustable weight; b is the intercept, also known as the bias, which determines how far the hyperplane is offset from the origin. We denote this hyperplane by (w, b). By the point-to-plane distance formula, the distance between any point x in the sample space and the hyperplane (w, b) can be written as:
$$r = \frac{\left| w^{T}x + b \right|}{\lVert w \rVert}$$
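As a small worked example, the point-to-hyperplane distance |wᵀx + b| / ‖w‖ can be computed as below. The numerical values of `w`, `b`, and `x` are arbitrary illustrations.

```python
import math


def decision_value(w, b, x):
    """Evaluate w^T x + b for one sample."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def distance_to_hyperplane(w, b, x):
    """Distance from x to the hyperplane (w, b): |w^T x + b| / ||w||."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(decision_value(w, b, x)) / norm


w, b = (3.0, 4.0), -5.0   # arbitrary hyperplane 3*x1 + 4*x2 - 5 = 0
x = (3.0, 4.0)            # arbitrary sample
print(distance_to_hyperplane(w, b, x))
```

Here wᵀx + b = 9 + 16 − 5 = 20 and ‖w‖ = 5, so the distance is 4.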
Suppose that the hyperplane (w, b) classifies the training samples correctly, that is, for $(x_i, y_i) \in D$, $w^{T}x_i + b > 0$ if $y_i = +1$ and $w^{T}x_i + b < 0$ if $y_i = -1$. After rescaling w and b, these conditions can be written as:
$$\begin{cases} w^{T}x_i + b \ge +1, & y_i = +1 \\ w^{T}x_i + b \le -1, & y_i = -1 \end{cases}$$
As shown in the figure, the distance of a point from the hyperplane can be interpreted as the degree of confidence of the classification prediction. The samples closest to the hyperplane make the equality in the above conditions hold; these are the support vectors.
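To make the notion of support vectors concrete, the sketch below assumes the usual canonical scaling, in which correctly classified samples satisfy y(wᵀx + b) ≥ 1 and support vectors satisfy it with equality, so that the margin width is 2 / ‖w‖. The hyperplane and the four samples are hypothetical.

```python
import math


def functional_margin(w, b, x, y):
    """y * (w^T x + b); at least 1 for correctly classified samples."""
    return y * (sum(wi * xi for wi, xi in zip(w, x)) + b)


def support_vectors(w, b, samples, tol=1e-9):
    """Samples whose functional margin equals 1 (within a tolerance)."""
    return [(x, y) for x, y in samples
            if abs(functional_margin(w, b, x, y) - 1.0) <= tol]


def margin_width(w):
    """Distance between the two margin boundaries: 2 / ||w||."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))


w, b = (1.0, 0.0), -2.0  # hypothetical hyperplane x1 = 2
samples = [((3.0, 0.0), +1), ((5.0, 1.0), +1),
           ((1.0, 2.0), -1), ((-1.0, 0.0), -1)]
print(support_vectors(w, b, samples), margin_width(w))
```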
Figure 5.
Application diagram of SVM.
SVM is a powerful and widely used machine learning model. It can handle linear and nonlinear classification problems as well as outlier detection, and it is especially suited to complex classification problems with small to medium-sized datasets.
2.1.3. Extreme Gradient Boosting Algorithm
Boosting is an ensemble learning technique that combines multiple weak learners to create a powerful model. It iteratively trains new models, each focusing on the samples that were poorly predicted in the previous iterations. Boosting algorithms fall into two main categories: gradient boosting and adaptive boosting. Extreme Gradient Boosting (XGB) is a gradient boosting algorithm that has been shown to outperform traditional gradient boosting techniques in many machine learning tasks. XGB combines base learners sequentially to improve the model's accuracy: in each iteration it adds a decision tree that fits the residuals of the previous iteration's prediction (Figure 6). During the construction of new decision trees, XGB considers the importance of each feature to optimize the model effectively.
If a base learner makes imperfect predictions due to inherent algorithmic flaws, another base learner can be added to compensate for the "imperfect parts". This is the key principle on which XGB is built. By adding multiple base learners, the algorithm continuously refines these "imperfect parts" and produces an ensemble model with excellent predictive accuracy and generalization performance.
Boosting is a popular ensemble learning technique, and Gradient Boosting Decision Trees (GBDT) is a well-known example (Friedman, 2001). The GBDT algorithm trains a sequence of decision trees, fitting each tree to the residual errors left by the previous trees, so that the overall model error is reduced step by step. XGB was developed by Tianqi Chen and colleagues as an open-source project to minimize model bias in supervised learning (Chen & Guestrin, 2016). While XGB is also a gradient boosting algorithm, it offers several improvements over GBDT. For instance, XGB uses a second-order Taylor expansion of the loss function in its objective, which enhances its ability to model complex relationships among variables. Additionally, XGB introduces a regularization term into the objective function, which limits model complexity and enhances predictive accuracy and generalization performance (Sagi & Rokach, 2021).
XGB is a tree-boosting algorithm. Compared with the traditional gradient boosting decision tree algorithm, XGB makes use of the second-derivative information of the loss function. This makes XGB converge faster, ensures higher solving efficiency, and also increases extensibility: any loss function that is twice differentiable can be used as a custom objective function. Another advantage of XGB is that it borrows the column (feature) sampling method of RF, which further reduces computation and overfitting. The widespread adoption of XGB stems not only from its strong performance and fast processing speed, which enable it to handle large-scale data, but also from its versatility in addressing both classification and regression problems.
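For reference, the second-order expansion and the regularization term can be written as follows, in the notation of Chen & Guestrin (2016). The symbols $g_i$, $h_i$, $T$, $\gamma$, and $\lambda$ are taken from that reference and do not otherwise appear in this section.

```latex
\mathcal{L}^{(t)} \approx \sum_{i=1}^{n} \left[ l\left(y_i, \hat{y}_i^{(t-1)}\right)
  + g_i f_t(x_i) + \tfrac{1}{2} h_i f_t^{2}(x_i) \right] + \Omega(f_t),
\qquad
\Omega(f) = \gamma T + \tfrac{1}{2} \lambda \lVert w \rVert^{2}
```

Here $g_i$ and $h_i$ are the first and second derivatives of the loss $l$ with respect to the previous prediction $\hat{y}_i^{(t-1)}$, $T$ is the number of leaves of the new tree $f_t$, and $w$ is its vector of leaf weights.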
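The role of the second derivative can be illustrated with the closed-form leaf weight and split gain given by Chen & Guestrin (2016). The sketch below is a simplified illustration, not the library's implementation; `reg_lambda` and `gamma` denote the regularization coefficients, and the squared-error loss is one example of a twice-differentiable custom loss.

```python
def squared_error_grad_hess(y_true, y_pred):
    """Gradient and Hessian of 0.5 * (y_pred - y_true)**2 w.r.t. y_pred."""
    return y_pred - y_true, 1.0


def leaf_weight(grads, hessians, reg_lambda=1.0):
    """Optimal leaf weight: -G / (H + lambda)."""
    return -sum(grads) / (sum(hessians) + reg_lambda)


def split_gain(g_left, h_left, g_right, h_right, reg_lambda=1.0, gamma=0.0):
    """Reduction of the regularized objective obtained by splitting a leaf."""
    def score(g, h):
        return g * g / (h + reg_lambda)
    return 0.5 * (score(g_left, h_left) + score(g_right, h_right)
                  - score(g_left + g_right, h_left + h_right)) - gamma


# Two samples with targets 1 and 3, both currently predicted as 0.
pairs = [squared_error_grad_hess(y, 0.0) for y in (1.0, 3.0)]
grads = [g for g, _ in pairs]
hessians = [h for _, h in pairs]
print(leaf_weight(grads, hessians, reg_lambda=0.0))
```

With no regularization the leaf weight reduces to the mean residual (here 2.0); a positive `reg_lambda` shrinks it toward zero.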
The XGB model can be expressed as:
$$\hat{y}_i = \sum_{k=1}^{K} f_k(x_i)$$
where K represents the number of trees, and $f_k(x_i)$ represents the result of the i-th sample in the k-th tree.
As can be seen from this expression, the model is a set of iteratively built residual trees, with one tree added in each iteration. Each tree learns the residual of the trees before it, so that the final model is a linear combination of K trees.
XGB provides a number of feature-importance metrics, including the total number of times each feature is used for splitting (Fcount), the average gain of each feature, and the average coverage of samples at the nodes split on each feature. These metrics reflect the accuracy of node splitting during tree construction, which underlies the good performance of XGB.
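The additive, residual-fitting structure can be sketched in plain Python. This is a minimal illustration of gradient boosting with squared-error loss, not XGB itself: it omits the second-order terms and regularization, uses depth-1 trees as base learners, and introduces a shrinkage factor `LEARNING_RATE` that is an assumption of the sketch.

```python
from statistics import mean

LEARNING_RATE = 0.5  # shrinkage factor; an illustrative assumption


def fit_stump(xs, targets):
    """Depth-1 regression tree fitted to the current residuals."""
    best = None
    cuts = sorted(set(xs))
    for lo, hi in zip(cuts, cuts[1:]):
        t = (lo + hi) / 2
        left = [r for x, r in zip(xs, targets) if x <= t]
        right = [r for x, r in zip(xs, targets) if x > t]
        lm, rm = mean(left), mean(right)
        sse = (sum((r - lm) ** 2 for r in left)
               + sum((r - rm) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    if best is None:  # fewer than two distinct x values
        constant = mean(targets)
        return lambda x: constant
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm


def boost(xs, ys, n_trees):
    """Add one tree per iteration, each fitted to the previous residuals."""
    trees, preds = [], [0.0] * len(xs)
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, preds)]
        tree = fit_stump(xs, residuals)
        trees.append(tree)
        preds = [p + LEARNING_RATE * tree(x) for p, x in zip(preds, xs)]
    return trees


def predict(trees, x):
    """Prediction is the (shrunken) sum of all tree outputs."""
    return sum(LEARNING_RATE * tree(x) for tree in trees)


def mse(trees, xs, ys):
    return mean((y - predict(trees, x)) ** 2 for x, y in zip(xs, ys))


xs = [i / 10 for i in range(50)]
ys = [x * x for x in xs]  # synthetic target
print(mse(boost(xs, ys, 1), xs, ys), mse(boost(xs, ys, 20), xs, ys))
```

On this synthetic data the training error falls as trees are added, because each new tree removes part of the residual left by the previous ones.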
For any tree whose structure is determined, there are:
$$F_{count} = |C|, \qquad AverageGain = \frac{\sum Gain_C}{F_{count}}, \qquad AverageCover = \frac{\sum Cover_C}{F_{count}}$$
where C is the set of features used by all trees to generate nodes, $Gain_C$ is the gain value generated at each node split by a feature in C, and $Cover_C$ is the number of samples falling on each node split by a feature in C.
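The three metrics described above (split count, average gain, and average cover per feature) can be accumulated from a list of split records, as in the sketch below. The record format `(feature, gain, cover)`, the feature names, and the numbers are hypothetical.

```python
from collections import defaultdict


def importance_metrics(splits):
    """splits: iterable of (feature, gain, cover) tuples, one per split node."""
    totals = defaultdict(lambda: {"count": 0, "gain": 0.0, "cover": 0.0})
    for feature, gain, cover in splits:
        entry = totals[feature]
        entry["count"] += 1       # times the feature was used for splitting
        entry["gain"] += gain     # accumulated gain of those splits
        entry["cover"] += cover   # accumulated samples reaching those nodes
    return {
        feature: {
            "count": t["count"],
            "average_gain": t["gain"] / t["count"],
            "average_cover": t["cover"] / t["count"],
        }
        for feature, t in totals.items()
    }


# Hypothetical split records gathered from all trees of a model.
splits = [("porosity", 4.0, 100), ("permeability", 1.0, 60), ("porosity", 2.0, 40)]
metrics = importance_metrics(splits)
print(metrics["porosity"])
```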
Figure 7.
Application diagram of XGB.
XGBoost, which stands for Extreme Gradient Boosting, is a powerful machine learning algorithm widely used for both classification and regression tasks. It is particularly effective at handling structured data and is known for its strong predictive performance. The application diagram in Figure 7 outlines how XGBoost is applied in this context.
In essence, XGBoost is an ensemble learning technique that combines the predictions of multiple weak models, typically decision trees, to create a strong and highly accurate model. It operates by iteratively building and optimizing these decision trees to minimize a specified objective function, such as mean squared error for regression or log-loss for classification.
2.2. Reservoir Classification Method
Reservoir classification is one of the important tasks in subsurface energy exploration and development. It aims to classify underground reservoirs so as to provide valuable information about reservoir properties and hydrodynamic characteristics. To achieve accurate reservoir classification, researchers have proposed various methods. In this paper, classification by the reservoir quality index and optimization inversion of the spherical-tubular model are used to classify and evaluate the reservoir in the study area.
2.2.1. Classification by Reservoir Quality Index
The complex pore structure and strong heterogeneity of reservoirs make it difficult to evaluate them accurately from simple parameters such as porosity or permeability alone. Porosity and permeability are both essential macroscopic parameters for reservoir evaluation, but individually they do not always give a complete picture of the pore structure. The reservoir quality index (RQI) was introduced to provide a more comprehensive assessment: it is a macroscopic parameter that combines porosity with permeability to represent the pore structure and reservoir quality more accurately. The RQI approach has been widely adopted in petrophysical classification and reservoir characterization (Ma, 2010). The RQI also characterizes the micro pore-throat structure, and it has been used to identify complex pore structure and pore heterogeneity in reservoir evaluation.
The reservoir quality index (RQI) is defined as:
$$RQI = \sqrt{\frac{K}{\varphi}}$$
where $\varphi$ is the effective porosity, %;
K is the permeability, $10^{-3}\ \mu m^{2}$.
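A minimal sketch of the RQI calculation follows. It assumes the simplified form RQI = sqrt(K / φ) with K in 10⁻³ μm² (mD) and φ in percent; the `scale` argument is a hypothetical hook for other conventions, such as the form of Amaefule et al. (1993), which uses a factor of 0.0314 with porosity expressed as a fraction. Which convention applies to the values in Table 1 is an assumption to be checked.

```python
import math


def reservoir_quality_index(permeability_md, porosity_pct, scale=1.0):
    """RQI = scale * sqrt(K / phi); unit convention is an assumption (see text)."""
    if permeability_md < 0 or porosity_pct <= 0:
        raise ValueError("permeability must be >= 0 and porosity must be > 0")
    return scale * math.sqrt(permeability_md / porosity_pct)


# Hypothetical core sample: K = 16 mD, effective porosity = 4 %.
print(reservoir_quality_index(16.0, 4.0))
```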
When combined with micro pore-throat structure parameters, the RQI supports evaluation of the complete pore structure in reservoir classification by reflecting both the pore-throat structure and the petrophysical properties of the reservoir (Amaefule et al., 1993; Anovitz et al., 2015). The RQI serves both as an effective petrophysical classification method and as a characteristic parameter of the micro pore-throat structure: the higher the RQI, the better the micro pore-throat structure of the reservoir. The RQI is therefore an important tool in reservoir evaluation and management, allowing potential reservoirs to be identified for development and production strategies to be optimized.
The comprehensive study in this paper shows that the four types of reservoirs correspond well with the reservoir quality index, as shown in Table 1 below.
2.2.2. Optimization Inversion of the NMR Spherical-Tubular Model
The pore structures of real reservoir rocks are too complex to be characterized analytically. However, under reasonable approximations, specific models can be used to approximate the pore structure of rocks. In this study, rock pores are assumed to be approximated by a combination of tubular and spherical pores, and the spherical-tubular model is used to classify the reservoir rocks. By analyzing the parameters of the spherical-tubular model, the characteristics of the rock pore structure can be estimated, providing a valuable reference for the evaluation and development of subsurface reservoirs.
The spherical-tubular model is a useful tool for approximating the complex pore structures of real reservoir rocks. It is based on the idea that rock pores can be approximated as a combination of spherical pores (the pore part) and tubular pores (the throat part) (Liu et al., 2006, 2014). Different ways of matching spherical pores with tubular pores represent different types of pore structure. The model assumes that pores are sorted by volume and that each group contains a set of spherical-tubular models with identical shapes. The spherical-tubular models in different groups have similar shapes, but their tubular and spherical pore radii differ, giving different configurations that represent different numerical relationships between the two radii (Figure 8). Although the spherical-tubular model is an idealization rather than an exact representation of the actual pore structure of rocks, it provides a useful approximation for understanding and classifying the different types of reservoir rock.
Through the optimization inversion of the spherical-tubular model, a mapping relationship is established between the parameters of the spherical-tubular model and core nuclear magnetic resonance (NMR) data or NMR logging data, as suggested by Liu et al. (2006, 2014). The pores within the rock are assumed to be sorted by volume, and each pore component contains different spherical-tubular models. The NMR echo data are inverted to obtain the transverse relaxation time (T2) distribution, and the parameters of the spherical-tubular model are then determined using Eq. (10) and Eq. (11).
where $T_{2i}$ is the i-th distribution point value of the echo signal inversion, ms;
$R_s$, $R_c$, and $R_e$ are the spherical pore radius, tubular pore radius, and equivalent spherical pore radius, respectively, μm;
$C_d$ is the radius ratio of the tubular pore to the spherical pore, dimensionless.
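To give a feel for how pore radius and shape relate to T2, the sketch below uses the standard fast-diffusion surface-relaxation approximation 1/T2 ≈ ρ2 · S/V, with S/V = 3/R for a sphere and 2/R for a long tube. This is a generic relation offered as an assumption; it is not a reproduction of Eq. (10) and Eq. (11), and the surface relaxivity value is hypothetical.

```python
def surface_to_volume(radius_um, shape):
    """Surface-to-volume ratio (1/um) of an idealized pore."""
    if shape == "sphere":
        return 3.0 / radius_um
    if shape == "tube":
        return 2.0 / radius_um  # long cylinder, end caps neglected
    raise ValueError("shape must be 'sphere' or 'tube'")


def t2_surface_ms(radius_um, shape, rho2_um_per_ms):
    """Surface-relaxation T2 (ms) from 1/T2 = rho2 * S/V."""
    return 1.0 / (rho2_um_per_ms * surface_to_volume(radius_um, shape))


def radius_ratio(tube_radius_um, sphere_radius_um):
    """Cd: ratio of the tubular pore radius to the spherical pore radius."""
    return tube_radius_um / sphere_radius_um


rho2 = 0.01  # hypothetical surface relaxivity, um/ms
print(t2_surface_ms(3.0, "sphere", rho2), t2_surface_ms(2.0, "tube", rho2))
```

Under this approximation a 3 μm sphere and a 2 μm tube share the same T2, which illustrates why shape has to be accounted for when converting a T2 distribution into pore and throat radii.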
The optimization inversion based on the spherical-tubular model is a powerful means of understanding the pore-throat structure of reservoirs. The inversion yields many parameters, including the optimized inversion T2 spectrum (ms), the T2 spectrum of spherical pores (T2S, ms), the T2 spectrum of tubular pores (T2C, ms), the geometric mean of the T2 spectrum (T2lm, ms), the sorting coefficient of spherical pores (SPS, dimensionless), the sorting coefficient of tubular pores (SPC, dimensionless), the mean radius of spherical pores (dms, μm), and the mean radius of tubular pores (dmc, μm). These parameters describe the distributions and combinations of spherical and tubular pores in the reservoir from different perspectives, allowing a more comprehensive understanding of the pore-throat structure.
To evaluate the reservoir, we selected the following parameters derived from the optimized inversion: the geometric mean of the T2 spectrum (T2lm, ms), the sorting coefficient of spherical pores (SPS, dimensionless), the sorting coefficient of tubular pores (SPC, dimensionless), the mean radius of spherical pores (dms, μm), and the mean radius of tubular pores (dmc, μm). Together they provide a comprehensive evaluation of the reservoir pore-throat structure from different perspectives.
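Of these parameters, the geometric mean of the T2 spectrum has a simple closed form: the amplitude-weighted logarithmic mean of the T2 bins. The sketch below assumes that standard definition; the bin values and amplitudes are hypothetical.

```python
import math


def t2_log_mean(t2_values_ms, amplitudes):
    """Amplitude-weighted geometric mean of a T2 spectrum, in ms."""
    total = sum(amplitudes)
    if total <= 0:
        raise ValueError("amplitudes must sum to a positive value")
    weighted_log = sum(a * math.log(t) for t, a in zip(t2_values_ms, amplitudes))
    return math.exp(weighted_log / total)


# Hypothetical two-bin spectrum with equal amplitudes at 10 ms and 1000 ms.
print(t2_log_mean([10.0, 1000.0], [1.0, 1.0]))
```

For two equally weighted bins at 10 ms and 1000 ms the result is about 100 ms, the geometric rather than the arithmetic mean.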