ARTICLE | doi:10.20944/preprints202012.0273.v1
Subject: Mathematics & Computer Science, Algebra & Number Theory Keywords: Bootstrap; Bayesian nonparametric learning; Ensemble Models
Online: 11 December 2020 (10:22:05 CET)
Bootstrap resampling techniques, introduced by Efron and Rubin, can be presented in a general Bayesian framework, approximating the distribution of a statistical functional φ(F), where F is a random distribution function. Efron’s and Rubin’s bootstrap procedures can be extended by introducing an informative prior through the Proper Bayesian bootstrap. In this paper, different bootstrap techniques are used and compared in predictive classification and regression models based on ensemble approaches, i.e. bagging models involving decision trees. The Proper Bayesian bootstrap, proposed by Muliere and Secchi, is used to sample the posterior distribution over trees, introducing prior distributions on the covariates and the target variable. The results are compared with those of competing procedures that employ different bootstrap techniques. The empirical analysis reports the results obtained on simulated and real data.
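As a rough illustration of the two classical procedures named in this abstract, the sketch below approximates the distribution of the functional φ(F) = mean in two ways: Efron's bootstrap resamples observations with replacement, while Rubin's Bayesian bootstrap keeps every observation and draws flat Dirichlet weights. The function names and the data are hypothetical, and the Proper Bayesian bootstrap with an informative prior is not implemented here.

```python
import random
import statistics

def efron_bootstrap(data, stat, n_rep=1000, seed=0):
    """Efron's bootstrap: recompute `stat` on samples drawn with replacement."""
    rng = random.Random(seed)
    n = len(data)
    return [stat(rng.choices(data, k=n)) for _ in range(n_rep)]

def dirichlet_weights(n, rng):
    """Flat Dirichlet(1, ..., 1) weights, built from normalised Exp(1) draws."""
    draws = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(draws)
    return [d / total for d in draws]

def bayesian_bootstrap_mean(data, n_rep=1000, seed=0):
    """Rubin's Bayesian bootstrap for the mean: reweight, do not resample."""
    rng = random.Random(seed)
    replicates = []
    for _ in range(n_rep):
        weights = dirichlet_weights(len(data), rng)
        replicates.append(sum(w * x for w, x in zip(weights, data)))
    return replicates

# Hypothetical observations, used only to exercise the two functions.
sample = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8]
efron_reps = efron_bootstrap(sample, statistics.mean)
rubin_reps = bayesian_bootstrap_mean(sample)
```

Both lists of replicates can then be summarised by their spread or by percentiles; for the mean, the two procedures usually give similar pictures on a sample like this one.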
Subject: Mathematics & Computer Science, Algebra & Number Theory Keywords: Neural Networks; Machine Learning; Bootstrap; Resampling; Algorithms
Online: 22 March 2021 (16:09:04 CET)
The results of neural networks depend strongly on the training data, the weight initialisation, and the chosen hyper-parameters. Determining the distribution of a statistical estimator, such as the Mean Squared Error (MSE) or the accuracy, is fundamental to evaluating the performance of a neural network model (NNM). For many machine learning models, such as linear regression, information such as the variance or confidence intervals of the results can be obtained analytically. Neural networks, by contrast, are not analytically tractable because of their complexity, so the distributions of statistical estimators cannot easily be estimated. When estimating the global performance of an NNM through the MSE in a regression problem, for example, it is important to know the variance of the MSE. Bootstrap is one of the most important resampling techniques for estimating averages, variances, and other properties of statistical estimators. In this tutorial, the application of two resampling techniques (including bootstrap) to the evaluation of neural networks’ performance is explained from both a theoretical and a practical point of view. Pseudo-code of the algorithms is provided to facilitate their implementation. Computational aspects, such as training time, are discussed, since resampling techniques require running simulations many thousands of times and are therefore computationally intensive. A specific version of the bootstrap algorithm is presented that allows the distribution of a statistical estimator to be estimated for an NNM in a computationally effective way. Finally, the algorithms are compared on synthetically generated data to demonstrate their performance.
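One inexpensive way to see the idea is to bootstrap the test-set errors of a single, already trained model, which avoids retraining. The sketch below does only that; it is not the tutorial's own algorithm, and the function names, the number of replicates, and the percentile interval are illustrative assumptions.

```python
import random

def bootstrap_mse(y_true, y_pred, n_rep=2000, seed=0):
    """Bootstrap replicates of the MSE of a fixed model on a fixed test set."""
    rng = random.Random(seed)
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    n = len(squared_errors)
    replicates = []
    for _ in range(n_rep):
        # Resample the per-example squared errors with replacement.
        resample = rng.choices(squared_errors, k=n)
        replicates.append(sum(resample) / n)
    return replicates

def percentile_interval(replicates, alpha=0.05):
    """Percentile interval taken directly from the sorted replicates."""
    ordered = sorted(replicates)
    lower = ordered[int((alpha / 2) * len(ordered))]
    upper = ordered[int((1 - alpha / 2) * len(ordered)) - 1]
    return lower, upper

# Hypothetical targets and predictions, used only to run the sketch.
y_true = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y_pred = [1.2, 1.7, 3.3, 3.9, 5.4, 5.8]
reps = bootstrap_mse(y_true, y_pred)
low, high = percentile_interval(reps)
```

This captures only the variability due to the finite test set. Variability from the training data or the weight initialisation would require retraining the model inside the loop, which is where the computational cost discussed in the abstract arises.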
ARTICLE | doi:10.20944/preprints202011.0239.v1
Subject: Earth Sciences, Atmospheric Science Keywords: Sava Depression; bootstrap; geostatistics; disposal formation water; porosity
Online: 6 November 2020 (10:33:27 CET)
Deep geological analyses often rely on few input data, so reliable estimates of individual geological variables must be obtained from small data sets. The paper analyzes the possibility of applying the bootstrap method to variables that are important in the exploration and production of hydrocarbons. The variables analyzed were porosity and the total cost of disposing of formation water. The case study used data from reservoir "K" of field "B", located in the western part of the Sava Depression. The results showed that the bootstrap method can be applied in the analysis of deep geological data, using three different sizes of resampled data set.
ARTICLE | doi:10.20944/preprints202107.0666.v1
Subject: Earth Sciences, Atmospheric Science Keywords: hail prevention; non-normal distributions; permutation; bootstrap; confidence intervals
Online: 29 July 2021 (14:21:23 CEST)
Grossversuch IV is a large and well documented experiment on hail suppression by silver iodide seeding. The original 1986 evaluation remained vague, although it indicated a tendency for hail to increase with seeding. The strategy used to deal with distributions of hail energy far from normal was not optimal. The present re-evaluation sticks to the question asked and avoids both misleading transformations and unsatisfactory meteorological predictors. The raw data show an increase in hail energy by about a factor of 3 when seeding. This is the opposite of what seeding is supposed to do. The probability of obtaining such a result by chance is below 1%, calculated by permutation and bootstrap techniques applied to the raw data. Confidence intervals were approximated by bootstrapping as well as by a new method called "correlation imposed permutation" (CIP).
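A permutation test on raw, non-normal data can be sketched as follows. The statistic is the ratio of mean hail energy between seeded and unseeded groups, and the group labels are shuffled to see how often chance alone gives a ratio at least as large. The numbers are invented, the one-sided test is an assumption, and the CIP method is not reproduced.

```python
import random

def mean_ratio(group_a, group_b):
    """Ratio of group means; infinite if the second mean is zero."""
    mean_a = sum(group_a) / len(group_a)
    mean_b = sum(group_b) / len(group_b)
    return float("inf") if mean_b == 0 else mean_a / mean_b

def permutation_p_value(seeded, unseeded, n_perm=5000, seed=0):
    """One-sided permutation p-value for the observed ratio of means."""
    rng = random.Random(seed)
    observed = mean_ratio(seeded, unseeded)
    pooled = list(seeded) + list(unseeded)
    k = len(seeded)
    at_least_as_large = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random reassignment of the group labels
        if mean_ratio(pooled[:k], pooled[k:]) >= observed:
            at_least_as_large += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (at_least_as_large + 1) / (n_perm + 1)

# Invented hail-energy values, for illustration only.
seeded = [30, 45, 60, 25, 50, 40]
unseeded = [10, 15, 20, 5, 12, 18]
ratio, p_value = permutation_p_value(seeded, unseeded)
```

Because the test works on the raw values and only relabels them, no transformation to normality is needed, which is the property the abstract relies on.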
ARTICLE | doi:10.20944/preprints201911.0129.v1
Subject: Social Sciences, Economics Keywords: global emission reduction; trade; FDI; BRICS countries; Bootstrap ARDL
Online: 12 November 2019 (15:50:57 CET)
We used the Bootstrap ARDL method to test the relationship between export trade, FDI, and CO2 emissions in the BRICS countries. For China, foreign direct investment and one-period-lagged CO2 emissions are cointegrated with exports. For South Africa, foreign direct investment and CO2 emissions are cointegrated with one-period-lagged exports, and one-period-lagged exports and foreign direct investment are cointegrated with one-period-lagged CO2 emissions. For both China and South Africa, however, these three variables have no long-term causal relationship. Among the other BRICS countries, Russia is the only one that showed degenerate case #1 as described by McNown et al. When we examined short-term causality, we found that CO2 emissions and export trade showed a reverse causal relationship, while the relationship between FDI and carbon emissions was less clear. Export trade has a positive causal relationship with FDI. The relationships among these variables differ across countries and situations.
ARTICLE | doi:10.20944/preprints201812.0291.v1
Subject: Earth Sciences, Oceanography Keywords: footprint, constrained Least square, Bootstrap, SST, AMSR-E, MODIS
Online: 24 December 2018 (15:40:37 CET)
This study was undertaken to derive and analyze the Advanced Microwave Scanning Radiometer - EOS (AMSR-E) sea surface temperature (SST) footprint associated with the Remote Sensing Systems (RSS) Level-2 (L2) product. The footprint, in this case, is characterized by the weight attributed to each 4 × 4 km square contributing to the SST value of a given AMSR-E pixel. High-resolution L2 SST fields obtained from the MODerate-resolution Imaging Spectroradiometer (MODIS), carried on the same spacecraft as AMSR-E, are used as the sub-resolution “ground truth” from which the AMSR-E footprint is determined. Mathematically, the approach is equivalent to a linear inversion problem, and its solution is pursued by means of a constrained least squares approximation based on the bootstrap sampling procedure. The method yielded an elliptic-like Gaussian kernel with an aspect ratio of 1.58, very close to the aspect ratio of the AMSR-E 6.93 GHz channel, 1.7. (The 6.93 GHz channel is the primary spectral frequency used to determine SST.) The semi-major axis of the estimated footprint is found to be aligned with the instantaneous field-of-view of the sensor, as expected from the geometric characteristics of AMSR-E. Footprints were also analyzed year by year and as a function of latitude and found to be stable, with no dependence on latitude or on time. Precise knowledge of the footprint is central for any satellite-derived product characterization and, in particular, for efforts to deconvolve the heavily oversampled AMSR-E SST fields and for studies devoted to product validation and comparison. A preliminary analysis suggests that use of the derived footprint will reduce the variance between AMSR-E and MODIS fields compared to the results obtained.
ARTICLE | doi:10.20944/preprints202009.0699.v1
Subject: Mathematics & Computer Science, Algebra & Number Theory Keywords: SVM; MRMR; Bootstrap; Genes; Gene Expression; Biological Relevance; Subject Classification
Online: 29 September 2020 (09:09:52 CEST)
Selection of biologically relevant genes from high-dimensional expression data is a key research problem in gene expression genomics. Most of the available gene selection methods are based on either a relevancy or a redundancy measure, and are usually judged through post-selection classification accuracy. These methods rank genes on a single high-dimensional expression data set, which leads to the selection of spuriously associated and redundant genes. Hence, we developed a statistical approach that combines Support Vector Machine with Maximum Relevance and Minimum Redundancy under a sound statistical setup for the selection of biologically relevant genes. Here, the genes are selected through statistical significance values computed using a non-parametric test statistic under a bootstrap-based subject sampling model. Further, a systematic and rigorous evaluation of the proposed approach against nine existing competitive methods was carried out on six different real crop gene expression datasets. This performance analysis was carried out under three comparison settings, i.e. subject classification, biologically relevant criteria based on quantitative trait loci, and gene ontology. Our analytical results showed that the proposed approach selects genes that are more biologically relevant than those selected by the existing methods. Moreover, the proposed approach was also found to perform better than the competing existing methods. The proposed statistical approach provides a framework for combining filter and wrapper methods of gene selection.
ARTICLE | doi:10.20944/preprints201607.0001.v1
Subject: Social Sciences, Finance Keywords: PUN, artificial intelligence models, regression tree, bootstrap aggregation, forecasting error
Online: 2 July 2016 (03:48:36 CEST)
Electricity price forecasting has become a crucial element for both private and public decision-making. Its importance has been growing since the worldwide wave of deregulation and liberalization of the energy sector in the late 1990s. Given these facts, this paper tries to come up with a precise and flexible forecasting model for the hourly wholesale electricity price in the Italian power market. We utilize artificial intelligence models, such as neural networks and bagged regression trees, that are rarely used to forecast electricity prices. After model calibration, our final model is bagged regression trees with exogenous variables. The selected model outperformed the neural network and the bagged regression with single price used in this paper; it also outperformed other statistical and non-statistical models used in other studies. We also confirm some theoretical specifications of the model. As a policy implication, this model might be used by energy traders, transmission system operators, and energy regulators for an enhanced decision-making process.
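Bootstrap aggregation (bagging) of regression trees can be sketched in a few lines. To stay short, the example below uses depth-1 trees (stumps) on a single hypothetical feature rather than full trees with exogenous variables, so it illustrates the general technique and not the paper's calibrated model.

```python
import random

def fit_stump(xs, ys):
    """Fit a depth-1 regression tree: one threshold and two leaf means."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate identical x values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        mean_left = sum(left) / len(left)
        mean_right = sum(right) / len(right)
        sse = (sum((y - mean_left) ** 2 for y in left)
               + sum((y - mean_right) ** 2 for y in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, mean_left, mean_right)
    if best is None:  # all x identical: fall back to the overall mean
        overall = sum(ys) / len(ys)
        return (float("inf"), overall, overall)
    return best[1:]

def predict_stump(stump, x):
    threshold, mean_left, mean_right = stump
    return mean_left if x <= threshold else mean_right

def fit_bagged(xs, ys, n_trees=50, seed=0):
    """Fit one stump per bootstrap resample of the training pairs."""
    rng = random.Random(seed)
    n = len(xs)
    stumps = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        stumps.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return stumps

def predict_bagged(stumps, x):
    """Average the predictions of all stumps in the ensemble."""
    return sum(predict_stump(s, x) for s in stumps) / len(stumps)

# Hypothetical data: a step from 0 to 10 halfway along the feature.
xs = list(range(20))
ys = [0.0] * 10 + [10.0] * 10
model = fit_bagged(xs, ys)
```

Averaging over resamples smooths the prediction near the step, because each resample places the threshold at a slightly different point.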
ARTICLE | doi:10.20944/preprints201806.0226.v1
Subject: Mathematics & Computer Science, Artificial Intelligence & Robotics Keywords: autogenous intelligence; bootstrap fallacy; recursive self-improvement; self-modifying software; singularity
Online: 14 June 2018 (08:53:23 CEST)
Toby Walsh in “The Singularity May Never Be Near” gives six arguments to support his point of view that technological singularity may happen but that it is unlikely. In this paper, we provide analysis of each one of his arguments and arrive at similar conclusions, but with more weight given to the “likely to happen” probability.
ARTICLE | doi:10.20944/preprints202010.0016.v1
Subject: Mathematics & Computer Science, Algebra & Number Theory Keywords: Bivariate Hermite distribution; Goodness-of-fit; Empirical probability generating function; Bootstrap distribution estimator
Online: 1 October 2020 (13:25:38 CEST)
This paper studies a goodness-of-fit test for the bivariate Hermite distribution. Specifically, we propose and study a Cramér-von Mises-type test based on the empirical probability generating function. The bootstrap can be used to consistently estimate the null distribution of the test statistic. A simulation study investigates the goodness of the bootstrap approach for finite sample sizes.
ARTICLE | doi:10.20944/preprints201807.0299.v1
Subject: Social Sciences, Business And Administrative Sciences Keywords: commuting stress; turnover intention; life satisfaction; mediation model; demographics; ANOVA; hierarchical regression; bootstrap; Turkey
Online: 17 July 2018 (09:49:16 CEST)
Using hierarchical regression analysis within a mediation model framework, the present study explores the direct and indirect (through life satisfaction) causal impacts of commuting stress on the turnover intention of employees from 29 business organizations in six populous cities of Turkey. A semi-random heterogeneous sample of 214 employees with different demographics was surveyed in both winter and summer in order to capture seasonal variations in the variables as well. The results support a partial mediating role of life satisfaction in the positive relationship between commuting stress and turnover intention, implying that commuting stress induces turnover intention both directly and indirectly (by reducing life satisfaction). The analysis of variance reveals that demographic characteristics of employees such as gender, marital status, age, and family size, together with commuting type and commuting duration, matter for their perceived commuting stress, life satisfaction, and turnover intention levels. Commuting stress perception is relatively higher in summer, whereas the other magnitudes show no significant difference between the two survey implementations. The study concludes with a call for the consideration of commuting stress and life satisfaction together with environmental and demographic factors when analyzing the antecedents and consequences of employee turnover intention.
ARTICLE | doi:10.20944/preprints201608.0214.v1
Subject: Earth Sciences, Geoinformatics Keywords: land-use/land-cover (LULC); uncertainty; bootstrap resampling; chi-square threshold; class probability vector (CPV); entropy
Online: 26 August 2016 (11:56:26 CEST)
Supervised land-use/land-cover (LULC) classifications are typically conducted using class assignment rules derived from a set of multiclass training samples. Consequently, classification accuracy varies with the training data set and is thus associated with uncertainty. In this study, we propose a bootstrap resampling and reclassification approach that can be applied for assessing not only the uncertainty in classification results of the bootstrap-training data sets, but also the classification uncertainty of individual pixels in the study area. Two measures of pixel-specific classification uncertainty, namely the maximum class probability and Shannon entropy, were derived from the class probability vector of individual pixels and used for the identification of unclassified pixels. Unclassified pixels that are identified using the traditional chi-square threshold technique represent outliers of individual LULC classes, but they are not necessarily associated with higher classification uncertainty. By contrast, unclassified pixels identified using the equal-likelihood technique are associated with higher classification uncertainty and they mostly occur on or near the borders of different land-cover.
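The two pixel-level uncertainty measures can be computed directly from a class probability vector (CPV). In the sketch below, the CPV is assumed to be the share of bootstrap reclassifications that assign the pixel to each class. The class names and vote counts are hypothetical, the natural logarithm is an assumed choice of base, and neither the chi-square threshold nor the equal-likelihood technique is implemented.

```python
import math
from collections import Counter

def class_probability_vector(labels, classes):
    """Share of bootstrap classifications assigning the pixel to each class."""
    counts = Counter(labels)
    return [counts[c] / len(labels) for c in classes]

def max_class_probability(cpv):
    """Higher values mean the pixel is classified more consistently."""
    return max(cpv)

def shannon_entropy(cpv):
    """Entropy in nats; zero-probability classes contribute nothing."""
    return -sum(p * math.log(p) for p in cpv if p > 0)

# Hypothetical labels for one pixel across 10 bootstrap classifiers.
classes = ["forest", "water", "urban"]
votes = ["forest"] * 8 + ["water"] * 2
cpv = class_probability_vector(votes, classes)
confidence = max_class_probability(cpv)
uncertainty = shannon_entropy(cpv)
```

A pixel near a land-cover border would typically have votes spread across classes, giving a low maximum class probability and a high entropy.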
ARTICLE | doi:10.20944/preprints201612.0002.v1
Subject: Mathematics & Computer Science, Applied Mathematics Keywords: change point; estimation; consistency; panel data; short panels; boundary issue; structural change; bootstrap; non-life insurance; change in claim amounts
Online: 1 December 2016 (10:02:03 CET)
Panel data of our interest consist of a moderate number of panels, while the panels contain a small number of observations. An estimator of common breaks in panel means without a boundary issue for this kind of scenario is proposed. In particular, the novel estimator is able to detect a common break point even when the change happens immediately after the first time point or just before the last observation period. Another advantage of the elaborated change point estimator is that it results in the last observation in situations with no structural breaks. The consistency of the change point estimator in panel data is established. The results are illustrated through a simulation study. As a by-product of the developed estimation technique, a theoretical utilization for correlation structure estimation, hypothesis testing, and bootstrapping in panel data is demonstrated. A practical application to non-life insurance is presented as well.
ARTICLE | doi:10.20944/preprints202008.0330.v1
Subject: Keywords: Skin Detection; Color Space Model; Aggregated Channel Features (ACF) Detector; Histogram Oriented Gradient (HOG) Features Detection; Bootstrap Aggregation Decision Tree Classifier; Spot Detection
Online: 15 August 2020 (03:28:51 CEST)
The human face and its parts are highly significant because they reveal a person’s true identity. They play an important role in various biometric applications such as crowd analysis, human tracking, photography, and cosmetic surgery. Many techniques are available to detect a face in an image; among them, skin detection is the most popular. The aim of this paper is first to identify a person from a facial image and then to check whether any spot is present on the detected person. The first step is to detect the maximum skin region using a combination of the RGB and HSV color space models. Next, the human skin areas are verified through a machine learning approach. The Aggregated Channel Features (ACF) detector is used to identify the different facial parts, such as eye pairs, nose, and mouth. A bootstrap aggregation decision tree classifier is applied to classify the person’s identity based on Histogram Oriented Gradient (HOG) feature values. The experimental results show that the proposed method achieves an average accuracy of 97%.