Application of the bootstrap method in data analysis of Upper Miocene sandstone reservoirs of the western part of the Sava Depression

Deep geological analyses usually rely on small input data sets, from which reliable estimates of individual geological variables must still be obtained. This paper analyzes the possibility of applying the bootstrap method to variables that are important in the exploration and production of hydrocarbons. Two variables were analyzed: porosity and the total costs of disposal of formation water. The case study used data from reservoir "K" of field "B", located in the western part of the Sava Depression. The results show that the bootstrap method can be applied in the analysis of deep geological data, here tested with three different resampling sizes.


Introduction
The bootstrap is a convenient method for supporting design-of-experiment analysis when non-Gaussian data are met, i.e. when the sample is small or a normality test has failed [1]. It is widely applied in all branches of science. The authors of [2] applied the bootstrap method to assess metasomatic mass and volume changes during deep crustal hydrothermal alteration of marble. A model-free bootstrap (MFB) method that can be used for any time series of data was tested in [3]. The authors of [4,5] applied the bootstrap method in risk management. The method has also been applied to measure the agglomeration of manufacturing industries at the county level in the United States [6]. In the fishing industry, it has been applied to the example of the fish population in Iceland [7]. A bootstrap method was used in [8] to estimate the uncertainty of water quality trends for the Susquehanna River at Conowingo (U.S.A.). Bootstrap analyses were performed for three rainfall events in the upstream part of the Qingjian River basin, a sub-basin of the Yellow River [9]. Bootstrap interval estimates were used for 24-hour annual maximum precipitation records obtained from 21 meteorological stations in Mexico [10]. The method is also used in the capability assessment of production processes [11]. Geological data sets are in many cases small, especially when deep geological data are analyzed. So far, the "jack-knifing" method has been successfully applied in the Croatian part of the Pannonian Basin in order to increase the sample size [12,13]. The bootstrap method is therefore an important statistical tool in the analysis of the obtained geological data. The papers cited above were used as the theoretical basis for calculating bootstrap interval values for the examples of costs of disposal of formation water and porosity.

Mathematical settings of the bootstrap method
Before bootstrap statistics can be calculated, the data must be resampled from the input data set. Nonparametric resampling does not require knowledge of the type of distribution, the model or the sample size. During resampling, the size of the data set does not change: each new data set has the same number of values as the input set, but its values are drawn at random from the input set, so that some input values may be repeated and others left out (Figure 1).
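The resampling step described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `resample`, the fixed seed and the sample values are assumptions made for the example and are not taken from the paper's data.

```python
import random

def resample(data, rng):
    """Draw one bootstrap data set: same size as the input, sampled with replacement."""
    return rng.choices(data, k=len(data))

# Hypothetical porosity-like values, for illustration only (not the paper's data).
sample = [0.21, 0.24, 0.19, 0.26, 0.23, 0.25, 0.22]

rng = random.Random(42)  # fixed seed so the sketch is reproducible
boot = resample(sample, rng)

print(boot)
print(len(boot) == len(sample))   # the size of the data set is unchanged
print(set(boot) <= set(sample))   # every value comes from the input set
```

Because the values are drawn with replacement, a resampled set will usually repeat some input values and omit others, which is what makes the resampled means vary from one set to the next.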

Figure 1.
Creating a bootstrap data set by the resampling process [14]

Several bootstrap methods are described in the literature; for typical deep geological data, the most suitable one is the smooth bootstrap. For each new data set obtained by random selection, the arithmetic mean is calculated by [15,16]:

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} X_i \qquad (1)$$

where: $\bar{x}$ is the mean sample value (after resampling), $X_i$ the data set after resampling, and $n$ the sample size.

The arithmetic bootstrap mean and the standard deviation of the bootstrap are then calculated from the means of all resampled data sets:

$$\bar{x}_B = \frac{1}{m}\sum_{j=1}^{m} \bar{x}_j \qquad (2)$$

$$S_m = \sqrt{\frac{1}{m-1}\sum_{j=1}^{m}\left(\bar{x}_j - \bar{x}_B\right)^2} \qquad (3)$$

where: $S_m$ is the standard deviation of the bootstrap, $\bar{x}_B$ the arithmetic bootstrap mean, $\bar{x}_j$ the mean sample value (after resampling) of an individual data set, and $m$ the number of resampled data sets.

Once the mean and standard deviation of the bootstrap data set have been calculated, the last step is an interval estimate of the expectation $\mu$ of the data set:

$$\bar{x}_B - z\,S_m \le \mu \le \bar{x}_B + z\,S_m \qquad (4)$$

where: $S_m$ is the standard deviation of the bootstrap, $\bar{x}_B$ the arithmetic bootstrap mean, $z$ the value from the normal distribution, and $m$ the number of resampled data sets from which $\bar{x}_B$ and $S_m$ are calculated. The most common reliability of the interval estimate is 90% or 95%. The procedure in expressions (1) to (4) is repeated according to the number of resampled data sets, in order to obtain a reliable interval of data set values. The most common numbers of resampled data sets are 1000 [20] and 2000 [20,21], used to ensure a 95 percent confidence interval of the data set.
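The calculation chain from the resampled means to the interval estimate can be sketched as follows. The sketch rests on stated assumptions: the interval is taken as the bootstrap mean plus or minus z times the standard deviation of the resampled means, the standard deviation uses an m - 1 denominator, and the resampling is plain nonparametric resampling without the kernel noise that a smooth bootstrap would add. The function name `bootstrap_interval` and the sample values are hypothetical.

```python
import random
from statistics import NormalDist, mean, stdev

def bootstrap_interval(data, m=1000, confidence=0.95, seed=0):
    """Normal-approximation bootstrap interval for the mean (illustrative sketch)."""
    rng = random.Random(seed)
    n = len(data)
    # Mean of each resampled data set (resampling with replacement, same size n).
    means = [mean(rng.choices(data, k=n)) for _ in range(m)]
    x_b = mean(means)    # arithmetic bootstrap mean
    s_m = stdev(means)   # standard deviation of the resampled means (m - 1 denominator)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return x_b - z * s_m, x_b + z * s_m

# Hypothetical porosity-like values, for illustration only (not the paper's data).
sample = [0.21, 0.24, 0.19, 0.26, 0.23, 0.25, 0.22]

lower, upper = bootstrap_interval(sample, m=1000)
print(f"95% interval for the mean: {lower:.4f} to {upper:.4f}")
```

Changing `confidence` to 0.90 gives the other commonly used reliability level, and `m` sets the number of resampled data sets.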

Research area and geological settings
The Sava Depression is an integral part of the Croatian Pannonian Basin System (CPBS). The subject of the analysis is reservoir "K" of field "B", which is located in the western part of the Sava Depression (Figure 2).

Figure 2.
Geographical position of the analysed field "B" [22]

To explain the development of the typical Neogene sandstone reservoirs in Northern Croatia (the Croatian part of the Pannonian Basin System, abbreviated CPBS), the depositional model of the area is briefly described here. Tectonics is the cause of today's shape and diversity of the deposits and of the characteristic regional northwest-southeast strike of the structures. As the CPBS depressions were always located on the edge of the Pannonian Basin System (PBS), their area was mostly covered by shallow marine and lake environments, often isolated from the larger bodies of water in the central part of the PBS. Clastic sedimentary environments are dominant throughout the CPBS area, from the Badenian to the Lower Pontian. At that time, extremely large amounts of sandy and silty detritus were transported to the area by turbidite currents. In periods without turbidite current activity, mainly different types of marl were deposited, both in the marine and in the lake environment. Marls are therefore considered rocks of a "quiet" environment, i.e. an environment of "low" energy. The consequence of this development is today's hydrocarbon deposits discovered in Upper Pannonian and Lower Pontian sandstones (Figure 3). All reservoirs are of turbidite origin and are therefore lithologically built of several sandstone lithofacies. Most of the wells drilled only rocks up to and including the Klostar Ivanic Formation, because hydrocarbon reservoirs have been proven within it, and only a few wells reached deeper reservoirs. Therefore, a typical geological column is limited to the Lower Pontian reservoirs.
Development drilling activities in field "B" began in the 1960s. The analysed porosity data were obtained during the development drilling phase of the field, while the data on the costs of disposal of formation water were obtained from the production data.

Results and discussion
Data on the costs of disposal of formation water were taken from papers [24-26], while the porosity values of reservoir "K" were taken from paper [27]. The basic statistics of the input data are shown in Table 1. With respect to the size of the input data, both analysed variables are classified as small data sets, which makes them suitable for applying the bootstrap method. Histograms obtained for these variables by the bootstrap method with 500, 1000 and 2000 resampled data sets are shown in Figure 4. As can be seen from Figure 4, for all resampling sizes the obtained histograms follow the normal distribution curve. The normality of the obtained data is confirmed by the correlation coefficient for normality obtained from the graphic normality test (Table 2). Table 2 also shows the estimated 95% confidence interval for porosity and for the total costs of disposal of formation water (CDFW). The correlation coefficient for normality of the obtained sets of numbers is high (over 0.99 for porosity, over 0.98 for CDFW). The differences in the correlation coefficient between the three resampling sizes are negligible, because the value changes only in the third decimal place.
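A correlation coefficient for normality of the kind mentioned above can be sketched as the Pearson correlation between the sorted bootstrap means and the corresponding standard normal quantiles, i.e. the straightness of a normal probability plot. The paper does not state how its graphic normality test was computed, so the plotting positions (i + 0.5)/m, the function names and the sample values below are all assumptions made for illustration.

```python
import random
from statistics import NormalDist, mean

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def normality_correlation(values):
    """Correlation between sorted values and standard normal quantiles."""
    ordered = sorted(values)
    m = len(ordered)
    # Assumed plotting positions (i + 0.5) / m, which stay strictly inside (0, 1).
    quantiles = [NormalDist().inv_cdf((i + 0.5) / m) for i in range(m)]
    return pearson(ordered, quantiles)

# Hypothetical porosity-like values, for illustration only (not the paper's data).
sample = [0.21, 0.24, 0.19, 0.26, 0.23, 0.25, 0.22]

rng = random.Random(0)
boot_means = [mean(rng.choices(sample, k=len(sample))) for _ in range(500)]
r = normality_correlation(boot_means)
print(f"normality correlation coefficient: {r:.3f}")
```

A value close to 1 indicates that the bootstrap means lie close to a straight line on the normal probability plot.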
By this criterion, all three resampling sizes are satisfactory for both analysed variables. The confidence intervals behave much like the normality correlation coefficient: for both CDFW and porosity, the interval limits change only in the second decimal place between resampling sizes. For both variables, the lower and upper limits of the obtained confidence interval do not reach the minimum and maximum values of the initial data set. This is expected and is a feature of the bootstrap method: the mean of each new resampled data set practically never equals the minimum or maximum of the input data set, so the interval of means lies inside the input range. The resampling size was chosen as the one giving the widest confidence interval, which for the variables analysed in this paper was m = 2000 for porosity and m = 1000 for CDFW, because the wider the confidence interval, the closer it comes to the range of values of the input data set. Therefore, when applying the bootstrap method, it is necessary to analyse the impact of the resampling size on the results and their interpretability. As this analysis shows, three different resampling sizes are enough to make the right decisions when analysing data.
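The comparison of resampling sizes discussed above can be sketched by running the same interval calculation for 500, 1000 and 2000 resampled data sets and checking that each interval stays inside the range of the input data. The interval form (bootstrap mean plus or minus z times the standard deviation of the resampled means), the function name and the cost-like sample values are assumptions made for this sketch and are not the paper's data.

```python
import random
from statistics import NormalDist, mean, stdev

def bootstrap_interval(data, m, confidence=0.95, seed=0):
    """Normal-approximation bootstrap interval for the mean (illustrative sketch)."""
    rng = random.Random(seed)
    means = [mean(rng.choices(data, k=len(data))) for _ in range(m)]
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    centre, spread = mean(means), stdev(means)
    return centre - z * spread, centre + z * spread

# Hypothetical disposal-cost-like values in USD/m3, for illustration only.
sample = [2.1, 2.9, 2.4, 2.6, 2.2, 3.0, 2.5, 2.3]

results = {}
for m in (500, 1000, 2000):
    lower, upper = bootstrap_interval(sample, m)
    results[m] = (lower, upper)
    inside = min(sample) < lower and upper < max(sample)
    print(f"m={m}: [{lower:.4f}, {upper:.4f}] "
          f"width={upper - lower:.4f} inside input range: {inside}")
```

The interval limits typically differ only slightly between the three sizes, which is why a secondary criterion such as interval width is needed to choose among them.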

Conclusions
The conclusions of the paper are the following:
-The bootstrap method is applicable as an additional tool in the analysis of a small input set of variables.
-To apply the bootstrap method, it is sufficient to analyse three different resampling sizes.
-When the correlation coefficients of the bootstrap samples are equal, the wider the confidence interval of the obtained sample, the more acceptable the final result of the bootstrap analysis. When they are not equal, the sample with both the highest correlation coefficient and the widest confidence interval is selected; if these two criteria disagree, the sample with the highest correlation coefficient is taken.
-The porosity of reservoir "K" is 0.2182 to 0.2506 at a resampling size of 2000, while the CDFW value for field "B" is 2.31 to 2.69 USD/m³ at a resampling size of 1000.
-Given the small amount of data that can be collected in the analysis of deep geological data, bootstrap results with a 95% confidence interval are useful information for the analysis and give a clearer picture when mapping deep geological variables.
-Applying the bootstrap method provides a more reliable insight into the uncertainty of the reservoir volume and the possible costs of its production.
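The selection rule stated in the conclusions can be sketched as a small function: compare the correlation coefficients first and fall back on the interval width when they are equal. Treating coefficients that agree to two decimal places as equal is an assumption of this sketch, motivated by the remark that third-decimal differences are negligible; the function name and the numbers in the examples are hypothetical and do not come from the paper's tables.

```python
def select_resampling_size(results, decimals=2):
    """Pick a resampling size from {size: (r, lower, upper)}.

    The highest correlation coefficient r wins; coefficients that agree after
    rounding to `decimals` places count as equal, and the widest interval then decides.
    """
    def key(item):
        _, (r, lower, upper) = item
        return (round(r, decimals), upper - lower)

    best_size, _ = max(results.items(), key=key)
    return best_size

# Hypothetical values: equal correlation coefficients, so the widest interval wins.
equal_r = {
    500: (0.991, 10.2, 11.8),
    1000: (0.992, 10.1, 11.9),
    2000: (0.990, 10.3, 11.7),
}
print(select_resampling_size(equal_r))

# Hypothetical values: the criteria disagree, so the correlation coefficient decides.
conflicting = {
    500: (0.97, 10.0, 12.0),
    1000: (0.99, 10.4, 11.6),
}
print(select_resampling_size(conflicting))
```

Rounding is a simplification: two coefficients on either side of a rounding boundary are treated as different even when they are very close, so a tolerance-based comparison may be preferable in practice.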