COMMENT | doi:10.20944/preprints201608.0166.v1
Subject: Social Sciences, Geography Keywords: Regional inequality; Multilevel regression; Markov chain; Guizhou Province
Online: 17 August 2016 (12:58:58 CEST)
This study analyses regional development in Guizhou Province, one of the poorest provinces in China, between 2000 and 2012 using a multiscale and multi-mechanism framework. In general, regional inequality has been declining since 2000. In addition, economic development in Guizhou Province presented spatial agglomeration and club convergence: between 2006 and 2012, a development pattern emerged consisting of one core area, two wing areas, and a contiguous area at the edge of the province. Multilevel regression analysis revealed that industrialization and investment level were the primary driving forces of regional economic disparity in Guizhou Province, while the influences of marketization and decentralization were relatively weak. Investment level reinforced regional economic disparity and the development of a core-periphery structure in the province; however, investment level actually weakened regional economic disparity when the variable of time was considered. Topography and urban-rural differentiation were the two main reasons for the formation of a core-periphery structure in Guizhou Province.
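Club convergence of the kind described above is commonly established with a Markov transition matrix estimated from discretized per-capita income classes. A minimal sketch of that estimation step, using hypothetical county class sequences rather than the study's data:

```python
import numpy as np

# Hypothetical income-class sequences for 6 counties over 5 years,
# discretized into 3 classes (0 = low, 1 = middle, 2 = high).
classes = np.array([
    [0, 0, 0, 1, 1],
    [0, 0, 1, 1, 1],
    [1, 1, 1, 1, 2],
    [1, 1, 2, 2, 2],
    [2, 2, 2, 2, 2],
    [2, 2, 2, 1, 2],
])

# Count one-step transitions pooled over counties and years.
K = 3
counts = np.zeros((K, K))
for row in classes:
    for a, b in zip(row[:-1], row[1:]):
        counts[a, b] += 1

# Row-normalize to obtain the Markov transition probability matrix.
P = counts / counts.sum(axis=1, keepdims=True)
print(P)
```

Large diagonal entries of P indicate that counties tend to stay in their income class, which is the signature of club convergence.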
ARTICLE | doi:10.20944/preprints202011.0297.v1
Online: 10 November 2020 (10:00:37 CET)
In this paper, we present a regression-based modelling approach to analyse multi-series MTC data. A typical application of this modelling approach comprises three steps: first, define a model that approximates the relationship between gene expression and experimental factors, with parameters incorporated to represent the research interest; second, use least-squares and estimating-equation techniques to estimate the parameters and their corresponding standard errors; third, compute test statistics, P-values and NFD as measures of statistical significance. The advantages of this approach are as follows. First, it addresses the research interest in a specific, precise way, and maximally uses all the data and other relevant information. Second, it accounts for both systematic and random variation in the data, and the results of such an analysis provide not only gene-specific information relevant to the research goal but also its reliability, thereby helping investigators make better decisions about follow-up studies. Third, the approach is very flexible and can easily be extended to other types of MTC studies or other microarray experiments by formulating different models based on the experimental design of the studies.
ARTICLE | doi:10.20944/preprints201712.0032.v1
Subject: Engineering, Energy & Fuel Technology Keywords: statistics; uncertainty; regression; sampling; outlier; probabilistic
Online: 6 December 2017 (06:36:02 CET)
Energy Measurement and Verification (M&V) aims to make inferences about the savings achieved in energy projects, given the data and other information at hand. Traditionally, a frequentist approach has been used to quantify these savings and their associated uncertainties. We demonstrate that the Bayesian paradigm is an intuitive, coherent, and powerful alternative framework within which M&V can be done. Its advantages and limitations are discussed, and two examples from the industry-standard International Performance Measurement and Verification Protocol (IPMVP) are solved using the framework. Bayesian analysis is shown to describe the problem more thoroughly and yield richer information and uncertainty quantification than the standard methods while not sacrificing model simplicity. We also show that Bayesian methods can be more robust to outliers. Bayesian alternatives to standard M&V methods are listed, and examples from literature are cited.
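The Bayesian M&V workflow described above can be illustrated with a conjugate Bayesian linear regression for a weather-dependent energy baseline; everything below (the data, the vague prior, the known noise level) is a hypothetical simplification, not one of the IPMVP examples themselves:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical baseline data: monthly energy use (kWh) vs heating degree days.
hdd = np.array([100., 150., 200., 250., 300., 350., 400., 450.])
energy = 500 + 2.0 * hdd + rng.normal(0, 20, hdd.size)

# Conjugate Bayesian linear regression with known noise sd (sigma = 20)
# and a vague Gaussian prior N(0, tau^2 I) on the coefficients.
X = np.column_stack([np.ones_like(hdd), hdd])
sigma2, tau2 = 20.0**2, 1e6
prior_prec = np.eye(2) / tau2
post_cov = np.linalg.inv(prior_prec + X.T @ X / sigma2)
post_mean = post_cov @ (X.T @ energy / sigma2)

# Posterior of the predicted baseline use in the reporting period (hdd = 320).
# Savings = baseline - metered use then inherits a full distribution,
# which is the richer uncertainty quantification the Bayesian route offers.
x_new = np.array([1.0, 320.0])
pred_mean = x_new @ post_mean
pred_sd = np.sqrt(x_new @ post_cov @ x_new + sigma2)
metered = 1000.0
savings_mean = pred_mean - metered
print(savings_mean, pred_sd)
```

In a real analysis the noise level would get its own prior and the posterior would be summarized by credible intervals rather than a point estimate.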
ARTICLE | doi:10.20944/preprints201608.0202.v2
Subject: Earth Sciences, Environmental Sciences Keywords: HR satellite remote sensing; urban fabric vulnerability; UHI & heat waves; landsat & MODIS sensors; LST & urban heating; segmentation & objects classification; data mining; feature extraction & selection; stepwise regression & model calibration
Online: 26 October 2021 (13:11:23 CEST)
Densely urbanized areas with a low percentage of green vegetation are highly exposed to Heat Waves (HW), which are nowadays increasing in frequency and intensity, including in mid-latitude regions, due to ongoing Climate Change (CC). Their negative effects may combine with those of the UHI (Urban Heat Island), a local phenomenon whereby air temperatures in the compact built-up cores of towns increase more than those in the surrounding rural areas, with significant impact on the quality of the urban environment, on citizens' health, and on energy consumption and transport, as occurred in the summer of 2003 in France and in central-northern Italy. In this context, this work aims at designing and developing a methodology, based on aero-spatial Earth Observation (EO) remote sensing at medium-high resolution and recent GIS techniques, for the extensive characterization of the urban fabric response to these temperature-related climatic impacts, within the general framework of supporting local and national strategies and policies of adaptation to CC. Due to its extent and variety of built-up typologies, the municipality of Rome was selected as the test area for the methodology's development and validation. We started by photointerpretation of cartography at a detailed scale (CTR 1:5000) on a reference area consisting of a transect of about 5x20 km, extending from the downtown to the suburbs and including all the built-up classes of interest. The reference built-up vulnerability classes found inside the transect were then exploited as training areas to classify the entire territory of the Rome municipality. To this end, the HR (High Resolution) multispectral satellite data provided by the Landsat sensors were used within a purpose-developed "supervised" classification procedure based on data mining and "object-classification" techniques.
The classification results were then used to implement a calibration method, based on a typical UHI temperature distribution derived from MODIS LST (Land Surface Temperature) data for summer 2003, to obtain an analytical expression of the vulnerability model previously introduced on a semi-empirical basis.
ARTICLE | doi:10.20944/preprints202008.0058.v1
Online: 3 August 2020 (00:37:42 CEST)
Housing is the haven that shelters people from natural and human hazards; it gives them trust, safety, and stability. As one of the most basic human needs, housing has become a key function that cities offer and one of the most important aspects of urban research, drawing on a wide range of architectural, social, and economic indicators. The study aims to provide an overall conception of the residential functions of Rwandz, using a collection of parameters together with GIS and statistical techniques, to help establish plans and future projects that improve the growth of this city and of other towns and cities in the area. The study found that the old parts of Rwandz, located in the core, differ in many properties from the relatively newer outer parts: in general, the core is more densely populated, with larger family sizes, more illiteracy and unemployment, lower incomes, and older and smaller houses, in contrast to the outer parts. The study also tested the correlation coefficients between the criteria and found some strong statistical relationships that reflect real-life properties of the residential function. Lastly, the study designed a regression model to predict the main residential-function criteria.
ARTICLE | doi:10.20944/preprints202106.0497.v1
Subject: Earth Sciences, Atmospheric Science Keywords: Ecosystem services; Benefit transfer; Meta-analysis; Meta-regression function
Online: 21 June 2021 (10:04:14 CEST)
Meta-analysis has increasingly been used to synthesize the ecosystem services literature, with some testing of the use of such analyses to transfer benefits. These are typically based on local primary studies. However, meta-analyses associated with ecosystem services are a potentially powerful tool for transferring benefits, especially for environmental assets for which no primary studies are available. In this study we use the Ecosystem Service Valuation Database (ESVD), which brings together 1350 value estimates from more than 320 studies around the world, to estimate meta-regression functions for provisioning, regulating & maintenance and cultural ecosystem services across 12 biomes. We tested the reliability of these meta-regression functions and found that even using variables with high explanatory power, transfer errors could still be large. We show that meta-analytic transfer performs better than simple value transfer and, in addition, that local meta-analytical transfer (i.e. based on local explanatory variable values) provides more reliable estimates than global meta-analytical transfer (i.e. based on mean global explanatory variable values). Thus, we conclude that when taking into account the characteristics of the study area under analysis, including explanatory variables such as income, population density and protection status, we can determine the value of ecosystem services with greater accuracy.
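A meta-regression transfer of the kind described above can be sketched as an OLS fit of log values on site covariates followed by a "local" prediction for the policy site; the data, covariates, and coefficients below are hypothetical, not ESVD values:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical meta-dataset: log ecosystem-service value per hectare as a
# function of log income and log population density at each study site.
n = 60
log_income = rng.normal(9.0, 0.5, n)
log_popden = rng.normal(4.0, 1.0, n)
log_value = 1.0 + 0.8 * log_income + 0.3 * log_popden + rng.normal(0, 0.4, n)

# Estimate the meta-regression function by OLS.
X = np.column_stack([np.ones(n), log_income, log_popden])
beta, *_ = np.linalg.lstsq(X, log_value, rcond=None)

# "Local" meta-analytic transfer: plug in the policy site's own covariate
# values (rather than global means) to predict its log value.
site = np.array([1.0, 9.2, 5.0])
predicted = site @ beta

# Transfer error against a (hypothetical) observed primary-study value.
observed = 9.9
transfer_error = abs(predicted - observed) / observed
print(beta, predicted, transfer_error)
```

Comparing this relative error across many held-out sites is, in essence, how the reliability of a meta-regression function is tested.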
SHORT NOTE | doi:10.20944/preprints202011.0284.v1
Subject: Medicine & Pharmacology, Allergology Keywords: Covid19; Best fit regression; Hyperbolic fit; Recovery rate; Reproducibility of research
Online: 9 November 2020 (16:15:21 CET)
In this report, the positive cases of Covid-19 in India from 7 September 2020 to 25 October 2020 are analysed for statistical relevance. The scattered data are used to find a model equation correlating the number of recovered Covid-19 patients with time at regular seven-day intervals. The best-fit regression analysis shows a significant correlation, with Pearson coefficient (r) and standard error (s), and a probable lower mortality rate. Finally, the limitations of this analysis are discussed.
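The best-fit diagnostics mentioned here, the Pearson coefficient r and the standard error of the estimate s, can be computed as follows; the weekly recovery counts are hypothetical stand-ins for the Indian data:

```python
import numpy as np

# Hypothetical weekly counts of recovered COVID-19 patients (7-day interval).
week = np.arange(1, 8)
recovered = np.array([52000., 61000., 69000., 76000., 82000., 87000., 91000.])

# Least-squares line and its goodness-of-fit measures.
slope, intercept = np.polyfit(week, recovered, 1)
fitted = slope * week + intercept
r = np.corrcoef(week, recovered)[0, 1]
# Standard error of the estimate: s = sqrt(SSE / (n - 2)).
s = np.sqrt(np.sum((recovered - fitted) ** 2) / (week.size - 2))
print(r, s)
```

A hyperbolic fit, as in the note above, would replace the straight line with a rational model before computing the same two statistics.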
ARTICLE | doi:10.20944/preprints202103.0530.v1
Subject: Mathematics & Computer Science, Algebra & Number Theory Keywords: Multiblock data analysis; redundancy analysis; PLS regression; supervised methods; multicollinearity
Online: 22 March 2021 (12:25:20 CET)
Within the framework of multiblock data analysis, a unified approach to supervised methods is discussed. It encompasses multiblock redundancy analysis (MB-RA) and multiblock partial least squares (MB-PLS) regression. Moreover, we develop new supervised strategies of multiblock data analysis, which can be seen as variants of one or the other of these two methods. They are respectively referred to as multiblock weighted redundancy analysis (MB-WRA) and multiblock weighted covariate analysis (MB-WCov). The four methods are based on the determination of latent variables associated with the various blocks of variables. They are derived from clear optimization criteria whose aim is to maximize either the sum of the covariances or the sum of squared covariances between the latent variable associated with the response block of variables and the block latent variables associated with the various explanatory blocks of variables. We also propose indices to help better interpret the outcomes of the analyses. The methods are illustrated and compared based on simulated and real datasets.
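The covariance-maximizing latent variables underlying such multiblock methods can be sketched, for a first component, via the SVD of each block's cross-product with the response block. This is a simplified single-component illustration on simulated data, not the authors' full algorithm:

```python
import numpy as np

rng = np.random.default_rng(2)

# Two hypothetical explanatory blocks and one response block (then centered).
n = 50
X1 = rng.normal(size=(n, 4))
X2 = rng.normal(size=(n, 3))
Y = 0.9 * X1[:, [0]] + 0.7 * X2[:, [1]] + rng.normal(0, 0.3, (n, 1))
X1 -= X1.mean(0); X2 -= X2.mean(0); Y -= Y.mean(0)

# First-component weights: for each block, maximize cov(X_k w_k, Y v) via the
# dominant singular triplet of X_k' Y (a standard PLS building block).
def first_weights(X, Y):
    U, s, Vt = np.linalg.svd(X.T @ Y, full_matrices=False)
    return U[:, 0], s[0]

w1, s1 = first_weights(X1, Y)
w2, s2 = first_weights(X2, Y)
t1, t2 = X1 @ w1, X2 @ w2   # block latent variables

# Relative block importance from the squared covariances (cf. the
# interpretation indices proposed in the paper, here only loosely imitated).
imp = np.array([s1, s2]) ** 2
imp = imp / imp.sum()
print(imp)
```

Later components would deflate the blocks and repeat, and the weighted variants would rescale the block contributions before the SVD.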
ARTICLE | doi:10.20944/preprints201608.0026.v1
Subject: Engineering, Civil Engineering Keywords: concrete; sustainability; regression analysis; mix design; CO2 emission; cost
Online: 3 August 2016 (06:05:26 CEST)
As argued by the ‘Declaration of Concrete Environment (2010)’ of Korea and the ‘Declaration of Asian Concrete Environment (2011)’ of six Asian countries, concrete as a single material has an extremely large impact on environmental issues such as climate change, and assessing the environmental impact of concrete materials and production is of considerable importance. Concrete is a major construction material that emits a large amount of substances with environmental impacts during its life cycle. Accordingly, technologies for reducing and assessing the environmental impact of concrete from the perspective of Life Cycle Assessment must be developed, and studies on greenhouse gas emissions from concrete are being carried out globally as a countermeasure against climate change. In this study, a sustainable concrete mix design algorithm was designed using correlation analyses, and its carbon emission and cost reduction performances were assessed. Using the correlation analyses, the concrete strength, w/b and s/a ratios, and CO2 emissions were identified as the major variables of concrete mix design that influence the other variables. This study also evaluates the CO2 emission reduction performance of the algorithm-deduced sustainable concrete mix design by comparing its CO2 emissions with those of the actual concrete mix design applied to the construction of office building A in South Korea.
ARTICLE | doi:10.20944/preprints202201.0408.v1
Subject: Medicine & Pharmacology, Nursing & Health Studies Keywords: Indonesia; islands cluster; multiple logistic regression; obesity; risk factor
Online: 27 January 2022 (06:53:58 CET)
Obesity has become a rising global health problem affecting adults’ quality of life. The objective of this study was to describe the prevalence of obesity in Indonesian adults by island cluster. The study also aimed to identify the risk factors of obesity in each island cluster. This study analysed secondary data from the Indonesian Basic Health Research 2018. Our data for analysis comprised 688,638 adults (>=15 years) selected throughout Indonesia using probability-proportional-to-size random sampling. We included 20 sociodemographic and obesity-related risk factor variables in the analysis. Obese status was defined as Body Mass Index (BMI) >= 27.5 kg/m2. Our study defined seven major island clusters, covering the 34 provinces of Indonesia, as the unit of analysis. Descriptive analysis was conducted to determine the characteristics of the population and to calculate the prevalence of obesity in the provinces of each island cluster. Multivariate logistic regression analyses were performed to calculate odds ratios (ORs), using R version 3.6.3. The results showed that all island clusters had at least one province with an obesity prevalence above 20%. Six of the twenty variables, comprising four diet factors (consumption of sweet food, high-salt food, meat, and carbonated drinks) and two other factors (mental health disorders and smoking behaviour), varied across the island clusters. In conclusion, obesity prevalence varied across provinces both within and between island clusters. The variation in risk factors across island clusters suggests that the government should rethink and reframe interventions to address obesity.
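Adjusted odds ratios of the kind reported here are exponentiated logistic-regression coefficients. A minimal sketch on simulated data (the study used R on survey data; this toy Newton-Raphson fit is only an illustration of the OR mechanics):

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical individual-level data: obesity (0/1) vs two binary risk
# factors (e.g. sweet-food consumption, smoking), with true ORs 2.0 and 1.5.
n = 5000
sweet = rng.integers(0, 2, n)
smoke = rng.integers(0, 2, n)
logit = -1.0 + np.log(2.0) * sweet + np.log(1.5) * smoke
y = rng.random(n) < 1 / (1 + np.exp(-logit))

# Fit logistic regression by Newton-Raphson (IRLS).
X = np.column_stack([np.ones(n), sweet, smoke])
beta = np.zeros(3)
for _ in range(25):
    p = 1 / (1 + np.exp(-X @ beta))
    W = p * (1 - p)
    beta += np.linalg.solve(X.T * W @ X, X.T @ (y - p))

# Adjusted odds ratios for the two risk factors.
odds_ratios = np.exp(beta[1:])
print(odds_ratios)
```

With survey data one would additionally apply the sampling weights, which plain IRLS as above does not do.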
ARTICLE | doi:10.20944/preprints202003.0088.v1
Subject: Engineering, Civil Engineering Keywords: Major ions; Physicochemical parameters; Pearson’s correlation matrix; Regression; Water Quality Index (WQI)
Online: 5 March 2020 (12:02:36 CET)
This work evaluates the surface water quality of the Brahmani River, Odisha, in terms of physico-chemical parameters, using statistical analysis involving the calculation of correlation coefficients and regression equations. It also draws attention to the “Water Quality Index” in a simplified format which may be used at large and can represent a reliable picture of water quality. Surface water quality data were taken from the OSPCB for various locations, i.e. Panposh D/S, Rourkela D/S, Rengali, Talcher U/S, Kamalanga D/S, Bhuban and Pattamundai, and were assessed for summer, monsoon and winter for the years 2011 to 2015. Average, minimum and maximum values of the water quality parameters were obtained seasonally over these years, together with the standard deviation, for the parameters pH, Temperature, DO, TDS, Alkalinity, EC, Na+, Ca2+, Mg2+, K+, F-, Cl-, NO3-, SO42- and PO42-. Seasonal changes in the various physical and chemical parameters were analysed. The values obtained were compared with the guideline values for drinking water of the Bureau of Indian Standards (BIS). A systematic correlation and regression study carried out for the three seasons showed linear relationships among different water quality parameters, providing an easy and rapid method of monitoring water quality. Highly significant (0.8 < r < 1.0), moderately significant (0.6 < r < 0.8) and significant (0.5 < r < 0.6) correlations between the parameters have been worked out. High correlation coefficients were observed between TDS/EC and Na+, Ca2+, Cl- and SO42-, and between Na+ and Cl-. From the collected quantities, certain parameters were selected to derive a WQI for the variations in water quality at each designated sampling site.
The WQI of the Brahmani River ranged from 36.7 to 44.1, which falls in the range of good quality water. Panposh D/S and Rourkela D/S showed poor water quality in the summer and winter seasons. WQI may thus be a useful tool for assessing water quality and predicting trends in water quality variation at different locations along the Brahmani River.
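A common way to derive a WQI of this kind is the weighted-arithmetic form; the parameter values, the chosen standards, and the simplified pH sub-index below are hypothetical illustrations, not the OSPCB measurements or the paper's exact formulation:

```python
# Weighted-arithmetic Water Quality Index (WQI), a minimal sketch.
# Sub-index q_i = 100 * C_i / S_i (for pH, 100 * |C - 7| / (S - 7), since the
# ideal pH is 7, not 0); unit weights w_i proportional to 1 / S_i;
# WQI = sum(w_i * q_i) / sum(w_i).

standards = {"pH": 8.5, "TDS": 500.0, "NO3": 45.0, "Cl": 250.0}  # BIS-style limits
measured  = {"pH": 7.4, "TDS": 180.0, "NO3": 12.0, "Cl": 40.0}   # hypothetical sample

def q(p):
    if p == "pH":
        return 100.0 * abs(measured[p] - 7.0) / (standards[p] - 7.0)
    return 100.0 * measured[p] / standards[p]

w = {p: 1.0 / s for p, s in standards.items()}
wqi = sum(w[p] * q(p) for p in standards) / sum(w.values())
print(round(wqi, 1))   # ≈ 26.5, in the conventional "good" band (26-50)
```

Adding parameters such as DO and alkalinity follows the same pattern, with DO usually given its own saturation-based sub-index.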
ARTICLE | doi:10.20944/preprints202006.0073.v1
Subject: Life Sciences, Virology Keywords: Epidemiology; SARS-CoV-2; Multivariable regression; Tuberculosis; Demography; Coronavirus; MMR vaccine
Online: 7 June 2020 (09:25:55 CEST)
COVID-19 pandemic that started in China has spread within 3 months to the entire globe. We tested the hypothesis that the vaccination against tuberculosis by BCG correlates with a better outcome for COVID-19 patients. Our analysis covers 55 countries complying with predetermined thresholds on the population size and number of deaths per million (DPM). We found a strong negative correlation between the years of BCG administration and the DPM along with the progress of the pandemic, corroborated by permutation tests. The results from multivariable regression tests with 23 economic, demographic, health-related, and pandemic restriction quantitative properties, substantiate the dominant contribution of BCG years to the COVID-19 outcomes. The analysis of countries according to an age-group partition reveals that the strongest correlation is attributed to the coverage in BCG vaccination of the young population (0-24 years). Furthermore, a strong correlation and statistical significance are associated with the degree of BCG coverage for the most recent 15 years, but no association was observed in these years for other broadly used vaccination protocols for measles and rubella. We propose that BCG immunization coverage, especially among the most recently vaccinated contributes to attenuation of the spread and severity of the COVID-19 pandemic.
COMMUNICATION | doi:10.20944/preprints202004.0445.v2
Subject: Keywords: COVID-19; Coronavirus; Respiratory Distress; Tobacco Smoking; Correlation Statistics; Conditional Probability; Regression; China; U.S.A.
Online: 27 July 2020 (05:59:51 CEST)
The novel COVID-19 disease is a contagious acute respiratory infectious disease whose causative agent has been demonstrated to be a new virus of the coronavirus family, SARS-CoV-2. Multiple studies have already reported that risk factors for severe disease include older age and the presence of at least one of several underlying health conditions. However, a recent physiopathological report and the French COVID-19 scientific council have postulated a protective effect of tobacco smoking. Through a meta-analysis, we demonstrate the statistical significance of twelve series from China, France and the US reporting three different smoking statuses (current smoker, former smoker, and with a smoking history) as well as disease severity (with odds ratios of 1.78 [1.08-3.10], 4.60 [3.13-7.17] and 2.74 [0.63-5.89], respectively). Subsequently, using a Bayesian approach, we establish that past and present smoking is associated with more severe COVID-19 outcomes. Finally, we refute claims linking general population smoking status (N=O(10^8) or O(10^9)) to much smaller disease course series (N=O(10^4)). The latter point in particular is presented to stimulate academic discussion, and must be further investigated by well-designed studies.
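The pooling of such series can be illustrated with a fixed-effect inverse-variance meta-analysis on the log-odds scale, using the three odds ratios quoted above; the standard errors are approximated from the 95% CIs, and the paper's actual method may differ:

```python
import math

# Reported (OR, 95% CI lower, 95% CI upper) for the three smoking statuses.
series = [(1.78, 1.08, 3.10), (4.60, 3.13, 7.17), (2.74, 0.63, 5.89)]

weights, weighted_logs = [], []
for or_, lo, hi in series:
    # Approximate SE of log(OR) from the CI width on the log scale.
    se = (math.log(hi) - math.log(lo)) / (2 * 1.96)
    w = 1.0 / se**2           # inverse-variance weight
    weights.append(w)
    weighted_logs.append(w * math.log(or_))

pooled_or = math.exp(sum(weighted_logs) / sum(weights))
print(round(pooled_or, 2))
```

A pooled OR well above 1 is consistent with the abstract's conclusion that smoking history is associated with more severe outcomes; a random-effects model would additionally account for between-series heterogeneity.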
ARTICLE | doi:10.20944/preprints202104.0592.v1
Subject: Mathematics & Computer Science, Probability And Statistics Keywords: Flexible count regression; balanced discrete gamma distribution; deviance statistic; latent equidispersion; likelihood ratio
Online: 22 April 2021 (08:55:29 CEST)
Most existing flexible count regression models allow only approximate inference. Balanced discretization is a simple method for producing a mean-parametrizable flexible count distribution from a continuous probability distribution, which makes it easy to define flexible count regression models that allow exact inference under various types of dispersion (equi-, under- and overdispersion). This study describes maximum likelihood (ML) estimation and inference in count regression based on the balanced discrete gamma (BDG) distribution and introduces a likelihood-ratio-based latent equidispersion (LE) test to identify the parsimonious dispersion model for a particular dataset. A series of Monte Carlo experiments was carried out to assess the performance of the ML estimates and the LE test in the BDG regression model, as compared to the popular Conway-Maxwell-Poisson (CMP) model. The results show that the two evaluated models recover population effects even under misspecification of dispersion-related covariates, with coverage rates of the asymptotic 95% confidence interval approaching the nominal level as the sample size increases. The BDG regression approach nevertheless outperforms CMP regression in very small samples (n = 15-30), mostly in overdispersed data. The LE test proves appropriate for detecting latent equidispersion, with rejection rates converging to the nominal level as the sample size increases. Two applications on real data illustrate the use of the proposed approach to count regression analysis.
ARTICLE | doi:10.20944/preprints202108.0111.v2
Subject: Mathematics & Computer Science, Algebra & Number Theory Keywords: SARS-COV-2; Bayesian regression; Changepoint detection; European football championship
Online: 16 August 2021 (10:57:52 CEST)
While Europe was beginning to deal with the resurgence of COVID-19 due to the Delta variant, the European football championship took place, June 11 - July 11, 2021. We studied the inversion in the decrease/increase rate of new SARS-COV-2 infections in the countries of the tournament, investigating the hypothesis of an association. Using Bayesian piecewise regression with a Poisson Generalized Linear Model, we looked for a changepoint in the time series of new SARS-COV-2 cases in each country, expecting it to appear no later than two to three weeks after the date of its first match. The two slopes, before and after the changepoint, were used to assess the reversal from a decreasing to an increasing rate of infections. For 17 out of 22 countries (77%), the changepoint came on average 14.97 days after their first match [95% CI 12.29 to 17.47]. For all 17 of those countries, the changepoint coincides with an inversion from a decreasing to an increasing rate of infections. Before the changepoint, new cases were decreasing, halving on average every 18.07 days [95% CI 11.81 to 29.42]. After the changepoint, cases begin to increase, doubling every 29.10 days [95% CI 14.12 to 49.78]. This inversion in the SARS-COV-2 case rate, which happened during the tournament, provides evidence in favor of an association between the two.
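A changepoint of this kind can also be located by a profile-likelihood search over piecewise Poisson GLMs; the sketch below is a maximum-likelihood simplification of the paper's Bayesian model, run on synthetic counts rather than the national case series:

```python
import numpy as np

rng = np.random.default_rng(4)

# Synthetic daily case counts: log-linearly decreasing rate before day 40,
# increasing after it, mimicking the inversion described above.
days = np.arange(80)
true_cp = 40
log_rate = np.where(days < true_cp, 5.0 - 0.03 * days,
                    5.0 - 0.03 * true_cp + 0.025 * (days - true_cp))
cases = rng.poisson(np.exp(log_rate))

def seg_loglik(t, y):
    # Poisson GLM log-likelihood for log(mu) = a + b*(t - mean(t)),
    # fitted with a small Newton (IRLS) loop; returns (loglik, slope b).
    tc = t - t.mean()
    X = np.column_stack([np.ones(tc.size), tc])
    beta = np.array([np.log(y.mean()), 0.0])
    for _ in range(40):
        mu = np.exp(X @ beta)
        beta += np.linalg.solve(X.T * mu @ X, X.T @ (y - mu))
    mu = np.exp(X @ beta)
    return np.sum(y * np.log(mu) - mu), beta[1]

# Profile-likelihood changepoint search: fit a segment on each side of every
# candidate day and keep the split with the highest total log-likelihood.
best = max(range(10, 70),
           key=lambda c: seg_loglik(days[:c], cases[:c])[0]
                       + seg_loglik(days[c:], cases[c:])[0])
slope_before = seg_loglik(days[:best], cases[:best])[1]
slope_after = seg_loglik(days[best:], cases[best:])[1]
print(best, slope_before, slope_after)
```

Halving and doubling times, as reported in the abstract, are ln(2) divided by the absolute value of the two slopes; the Bayesian version puts priors on the changepoint and slopes and reports credible intervals instead.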
ARTICLE | doi:10.20944/preprints201711.0138.v1
Subject: Keywords: sensitivity analysis; variable fuzzy method; mutual entropy; stepwise regression analysis; mountain flash flood risk
Online: 21 November 2017 (09:28:07 CET)
Flash floods are among the most significant natural disasters in China, particularly in mountainous areas, causing heavy economic damage and loss of life. Accurate risk assessment is critical to efficient flash flood management. There are more than 530,000 small watersheds in 2,058 counties in China where flash floods should be prevented. In practice, with limited funds and differing risk levels, priorities among small watersheds for flash flood prevention and control are also needed for efficient management. This paper, taking Licheng County in China as an example, aims to establish these priorities. First, sensitive indexes are identified within an index system of 9 indexes based on the underlying surface characteristics of small watersheds in hilly regions. Second, the range of each index and its rank divisions for evaluation are determined. Based on these rank divisions, the flash flood risk grade eigenvalue (H) is calculated by the Variable Fuzzy Method (VFM) using 1000 samples generated by the Latin hypercube sampling method. Third, the key sensitivity factors that affect H are assessed by two different global sensitivity analysis methods: stepwise regression analysis and mutual entropy. Both results indicate that watershed slope (S) is the most sensitive factor and the antecedent precipitation index (CN) the second, while the other factors differ slightly in their order of sensitivity. This study shows that stepwise regression analysis and mutual information analysis are appropriate for the sensitivity analysis of mountain flash flood risk. Finally, based on watershed slope (S), priorities for flash flood prevention and control are given for the 119 small watersheds in Licheng County.
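The 1000-sample design step relies on Latin hypercube sampling, which stratifies each input range into equal-probability bins and draws once per bin; a minimal sketch with hypothetical index ranges:

```python
import numpy as np

rng = np.random.default_rng(5)

# Latin hypercube sampling: n samples of k variables; each marginal is
# stratified into n equal-probability bins with exactly one draw per bin.
def latin_hypercube(n, ranges):
    k = len(ranges)
    # Shuffle the bin order independently for each variable, then jitter
    # within each bin to get stratified uniforms in [0, 1).
    u = (rng.permuted(np.tile(np.arange(n), (k, 1)), axis=1).T
         + rng.random((n, k))) / n
    lows = np.array([lo for lo, hi in ranges])
    highs = np.array([hi for lo, hi in ranges])
    return lows + u * (highs - lows)

# e.g. hypothetical ranges for watershed slope S (m/m) and an antecedent
# precipitation index CN; the study's actual 9-index ranges would go here.
samples = latin_hypercube(1000, [(0.05, 0.60), (40.0, 95.0)])
print(samples.shape)
```

Compared with plain Monte Carlo, this guarantees that every tenth of each index range is sampled exactly 100 times out of 1000, which stabilizes the downstream sensitivity estimates.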
ARTICLE | doi:10.20944/preprints202101.0375.v1
Subject: Social Sciences, Business And Administrative Sciences Keywords: cold chain logistics of agricultural products; demand forecast; principal component analysis; multiple linear regression; neural network
Online: 19 January 2021 (11:50:09 CET)
Forecasting the demand for cold chain logistics of agricultural products can provide a scientific basis for the country to formulate logistics strategy, further promoting the development of the social economy and the improvement of living standards in China. In this paper, a new combined mathematical model is proposed to forecast agricultural product demand. Shandong, one of China's provinces, serves as a main producer and distributor of agricultural products. Based on an index system built from multiple factors influencing the cold chain logistics demand for agricultural products in Shandong, this paper employs principal component analysis to reduce the dimension of the indexes and predicts the principal components with time series. Thereafter, a multiple linear regression model and a neural network model were constructed to forecast the cold chain logistics demand for agricultural products in Shandong, and their combined forecast models were compared. Moreover, the paper provides insights for reference and decision-making concerning the development of the cold chain logistics industry for agricultural products in Shandong province.
ARTICLE | doi:10.20944/preprints202112.0007.v1
Subject: Engineering, Energy & Fuel Technology Keywords: SO2; unburned carbon; fly ash; activated carbon; adsorption kinetics; kinetics models; linear regression; non-linear regression; statistical error functions; the sum of normalized error method
Online: 1 December 2021 (10:55:30 CET)
Kinetic parameters of SO2 adsorption on unburned carbons from lignite fly ash and on activated carbons based on hard coal dust were determined. The model studies were performed using linear and non-linear regression for the following models: pseudo-first- and second-order, intraparticle diffusion, and chemisorption on a heterogeneous surface. The quality of the fit of a given model to the empirical data was assessed based on: R2, R, Δq, SSE, ARE, χ2, HYBRID, MPSD, EABS, and SNE. It was clearly shown that linear regression more accurately reflects the behaviour of the adsorption system, which is consistent with first-order kinetics for activated carbons (SO2+Ar), or with chemisorption on a heterogeneous surface for unburned carbons (SO2+Ar and SO2+Ar+H2O(g)+O2) and activated carbons (SO2+Ar+H2O(g)+O2). Importantly, the two approaches (linear/non-linear) usually indicated different mechanisms for the studied phenomenon. A certain universality of the χ2 and HYBRID functions was demonstrated: their minimization repeatedly led to the lowest SNE values for the indicated models. Fitting data with any of the non-linear equations based on the R or R2 functions alone cannot be treated as evidence of, or a prerequisite for, a given adsorption mechanism.
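The linear-regression route for the pseudo-first-order model, together with two of the listed error functions (SSE and χ2), can be sketched on synthetic uptake data; the qe and k1 values and the crude plateau estimate are hypothetical, not the measured SO2 systems:

```python
import numpy as np

rng = np.random.default_rng(6)

# Synthetic uptake data following pseudo-first-order kinetics:
# q(t) = qe * (1 - exp(-k1 * t)), with small multiplicative noise.
qe_true, k1_true = 12.0, 0.05
t = np.array([5., 10., 20., 40., 60., 90., 120.])
q = qe_true * (1 - np.exp(-k1_true * t)) * (1 + rng.normal(0, 0.01, t.size))

# Linearized form: ln(qe - q) = ln(qe) - k1 * t, with qe taken crudely
# from the plateau (a known weakness of the linearized route).
qe_est = q.max() * 1.02
slope, intercept = np.polyfit(t, np.log(qe_est - q), 1)
k1_lin = -slope

# Two of the error functions used to compare fits: SSE and chi-square.
q_fit = qe_est * (1 - np.exp(-k1_lin * t))
sse = np.sum((q - q_fit) ** 2)
chi2 = np.sum((q - q_fit) ** 2 / q_fit)
print(k1_lin, sse, chi2)
```

The non-linear route would instead fit qe and k1 jointly by minimizing one of these error functions directly, which is exactly why the two approaches can point to different mechanisms.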
ARTICLE | doi:10.20944/preprints202002.0200.v1
Subject: Mathematics & Computer Science, Probability And Statistics Keywords: uniqueness; regression depth; maximum depth estimator; regression median; robustness
Online: 15 February 2020 (14:51:15 CET)
The notion of the median in one dimension is a foundational element of nonparametric statistics. It has been extended to multi-dimensional settings, both in location and in regression, via notions of data depth. Regression depth (RD) and projection regression depth (PRD) represent the two most promising notions in regression; Carrizosa depth DC is another depth notion in regression. Depth-induced regression medians (maximum depth estimators) serve as robust alternatives to the classical least squares estimator. The uniqueness of regression medians is indispensable in the discussion of their properties and of the asymptotics (consistency and limiting distribution) of sample regression medians. Are the regression medians induced from RD, PRD, and DC unique? Answering this question is the main goal of this article. It is found that only the regression median induced from PRD possesses the desired uniqueness property. The conventional remedy for non-uniqueness, taking the average of all medians, might yield an estimator that no longer possesses maximum depth in the RD and DC cases. These and other findings indicate that PRD and its induced median are highly favorable among their leading competitors.
ARTICLE | doi:10.20944/preprints201808.0229.v1
Subject: Social Sciences, Economics Keywords: Economic evaluation; Water resource management; Meta-regression analysis; River management funds; Sustainability of water resources
Online: 13 August 2018 (12:26:24 CEST)
Water management can improve the quality of valuable ecosystem services but can be costly to implement, and the management costs are covered by national taxes collected from water users. Based on 30 valuation studies of water quality improvement from the Environmental Valuation Information System (EVIS) database provided by the Korea Environment Institute (KEI), a meta-regression analysis was employed to measure the benefits that major river basins provide to society. We compare these benefits to the costs, namely the River Management Funds (RMFs), which are financial resources supporting a variety of projects for managing and improving upstream water quality. Based on this benefit-cost comparison, the study evaluates the efficiency of water resource management in South Korea and provides policy options to help maintain the sustainability of water resources by improving the planning and performance of water management in the long run.
ARTICLE | doi:10.20944/preprints202201.0209.v1
Subject: Social Sciences, Economics Keywords: Economic Growth; Gross Fixed Capital Formation; Government Expenditure; Government Deficit; Vector Auto-Regression and South Africa
Online: 14 January 2022 (11:36:07 CET)
The study uses annual time series data from the South African Reserve Bank (SARB) from 1980 to 2020 to examine the effectiveness of fiscal policy on economic growth in South Africa. The Augmented Dickey-Fuller (ADF) and Phillips-Perron (PP) unit root tests, as well as the Johansen cointegration test, the Granger causality test, and the Vector Auto-Regression (VAR) method, were used. Real GDP per capita (RGDP) serves as the proxy for economic growth, and gross fixed capital formation (GFCF), government expenditure (GEXP) and government deficit (GOVD) serve as the proxies for fiscal policy. The ADF test results show that all variables are stationary at the first difference, with the exception of GFCF and GEXP, which are stationary at I(0), whereas the PP test results show that all variables are stationary at I(1), with the exception of GEXP, which is stationary at I(0). At the maximum eigenvalue, the four variables are not cointegrated. The Granger causality test demonstrated unidirectional causation from GOVD to RGDP, as well as bidirectional causality between RGDP and both GFCF and GEXP. The error correction model estimated via VAR shows that GFCF and GEXP have a positive effect on RGDP, whereas GOVD has a negative effect on RGDP in the short run. The findings also show that the VAR's residuals are homoscedastic, normally distributed, and free of serial correlation.
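The unit-root screening step described above can be illustrated with a minimal sketch. The study used the ADF and PP tests via econometric software; the pure-Python fragment below implements only the simplest (non-augmented) Dickey–Fuller regression on simulated series, as a hedged illustration of the idea rather than the study's procedure. A strongly negative t-statistic points toward stationarity.

```python
import random

def dickey_fuller_t(y):
    """t-statistic for rho in: dy_t = alpha + rho * y_{t-1} + e_t.
    A strongly negative value suggests the unit-root hypothesis is rejected."""
    x = y[:-1]
    dy = [y[t] - y[t - 1] for t in range(1, len(y))]
    n = len(dy)
    mx, md = sum(x) / n, sum(dy) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (di - md) for xi, di in zip(x, dy))
    rho = sxy / sxx                                  # OLS slope
    alpha = md - rho * mx                            # OLS intercept
    resid = [di - alpha - rho * xi for xi, di in zip(x, dy)]
    s2 = sum(e ** 2 for e in resid) / (n - 2)        # residual variance
    return rho / (s2 / sxx) ** 0.5

random.seed(1)
walk, ar1 = [0.0], [0.0]
for _ in range(500):
    walk.append(walk[-1] + random.gauss(0, 1))       # unit root (random walk)
    ar1.append(0.5 * ar1[-1] + random.gauss(0, 1))   # stationary AR(1)
print(dickey_fuller_t(walk), dickey_fuller_t(ar1))
```

In practice the augmented variant adds lagged differences of the series as extra regressors, and the t-statistic is compared against Dickey–Fuller (not Student-t) critical values.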
ARTICLE | doi:10.3390/sci2040074
Subject: Keywords: trend analysis; Mann–Kendall test; Sen’s slope estimator; linear regression; cereal yield; northern Togo
Online: 24 September 2020 (00:00:00 CEST)
This study investigates trends in monthly and annual rainfall and in minimum and maximum temperature (Tmin and Tmax) using the Mann–Kendall (MK) test and Sen’s slope (SS) method, and evaluates the significance of their variability for maize, sorghum and millet yields in northern Togo employing multiple regression analysis. Historical data from the Kara, Niamtougou, Mango and Dapaong weather stations from 1977 to 2012 were used. Four non-parametric methods—Alexandersson’s Standard Normal Homogeneity Test (SNHT), Buishand’s Range Test (BRT), Pettitt’s Test (PT) and Von Neumann’s Ratio Test (VNRT)—were applied to test the homogeneity of the data. For serially correlated data, a modified version of the MK test (pre-whitening) was utilised. Results showed an increasing trend in annual rainfall at all four locations; however, this trend was significant only at Dapaong (p < 0.1). There was an increasing trend in Tmax at Kara, Mango and Niamtougou, unlike Dapaong where Tmax revealed a significant decreasing trend (p < 0.01). Similarly, there was an increasing trend in Tmin at Kara, Mango and Dapaong, unlike Niamtougou where Tmin showed a non-significant decreasing trend (p > 0.05). Rainfall was found to have increased more at Dapaong (7.79 mm/year) than at the other locations, namely Kara (2.20 mm/year), Niamtougou (4.57 mm/year) and Mango (0.67 mm/year). Tmax increased by 0.13, 0.13 and 0.32 °C per decade at Kara, Niamtougou and Mango, respectively, and decreased by 0.20 °C per decade at Dapaong. Likewise, Tmin increased by 0.07, 0.20 and 0.02 °C per decade at Kara, Mango and Dapaong, respectively, and decreased by 0.01 °C per decade at Niamtougou. Results of the multiple regression analysis revealed nonlinear yield responses to changes in rainfall and temperature. Rainfall and temperature variability affects rainfed cereal crop production, but the effects vary across crops.
The temperature has a positive effect on maize yield in Kara, Niamtougou and Mango but a negative effect on sorghum in Niamtougou and millet in Dapaong, while rainfall has a negative effect on maize yield in Niamtougou and Dapaong and millet yield in Mango. In all locations, rainfall and temperature variability has a significant effect on the cereal crop yields. There is, therefore, a need to adopt some adaptation strategies for sustainable agricultural production in northern Togo.
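The trend machinery named above, the Mann–Kendall S statistic and Sen's slope, is simple enough to sketch directly. The rainfall series below is hypothetical, and the sketch omits the tie correction and the pre-whitening step the study applied:

```python
def mann_kendall_s(x):
    """Mann-Kendall S statistic: concordant minus discordant pairs."""
    n = len(x)
    return sum(
        (x[j] > x[i]) - (x[j] < x[i])
        for i in range(n - 1) for j in range(i + 1, n)
    )

def mk_z(x):
    """Normal-approximation Z score (no tie correction, for brevity)."""
    n = len(x)
    s = mann_kendall_s(x)
    var = n * (n - 1) * (2 * n + 5) / 18
    if s > 0:
        return (s - 1) / var ** 0.5
    if s < 0:
        return (s + 1) / var ** 0.5
    return 0.0

def sens_slope(x):
    """Sen's slope: median of all pairwise slopes (x[j]-x[i])/(j-i)."""
    slopes = sorted(
        (x[j] - x[i]) / (j - i)
        for i in range(len(x) - 1) for j in range(i + 1, len(x))
    )
    m, mid = len(slopes), len(slopes) // 2
    return slopes[mid] if m % 2 else (slopes[mid - 1] + slopes[mid]) / 2

rain = [980, 1010, 995, 1040, 1060, 1055, 1090, 1110]  # hypothetical annual mm
print(mann_kendall_s(rain), round(mk_z(rain), 2), sens_slope(rain))
```

A |Z| above 1.96 corresponds to a trend significant at the 5% level under the normal approximation; the positive Sen's slope gives the trend magnitude in mm/year.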
ARTICLE | doi:10.20944/preprints201807.0299.v1
Subject: Social Sciences, Business And Administrative Sciences Keywords: commuting stress; turnover intention; life satisfaction; mediation model; demographics; ANOVA; hierarchical regression; bootstrap; Turkey
Online: 17 July 2018 (09:49:16 CEST)
Using hierarchical regression analysis within a mediation model framework, the present study explores the direct and indirect (through life satisfaction) causal impacts of commuting stress on the turnover intention of employees from 29 business organizations in six populous cities of Turkey. A semi-random heterogeneous sample of 214 employees with different demographics was surveyed in winter and in summer, also capturing seasonal variation in the variables. The results, supporting a partial mediating role of life satisfaction in the positive relationship between commuting stress and turnover intention, suggest that commuting stress induces turnover intention both directly and indirectly (by reducing life satisfaction). The analysis of variance reveals that demographic characteristics of employees such as gender, marital status, age, and family size, together with commuting type and commuting duration, matter for their perceived commuting stress, life satisfaction, and turnover intention levels. Perceived commuting stress is relatively higher in summer, whereas the other magnitudes are consistently and significantly invariant between the two survey waves. The study concludes with a call for considering commuting stress and life satisfaction together with environmental and demographic factors when analyzing the antecedents and consequences of employee turnover intention.
ARTICLE | doi:10.20944/preprints202007.0008.v1
Subject: Social Sciences, Econometrics & Statistics Keywords: Copula Regression; ICT resources; Middle East; Spatial Analysis; Students Well-being; Sustainable Development Goals
Online: 2 July 2020 (13:18:03 CEST)
Target 9.c of the 2015 United Nations (UN) Sustainable Development Goals (SDGs) specifically addresses increasing access to information and communication technology (ICT) resources and striving for universal access to the internet by 2020. The present study seeks to evaluate the effectiveness of the youth-related national strategies implemented in this regard by a select number of countries in the Middle East region. It does so by relying on a spatial bivariate copula regression analysis of data on youth respondents from five countries, extracted from the 2018 Programme for International Student Assessment (PISA). Focusing specifically on evaluating the availability of ICT resources to the youth population, and on identifying the impact of ICT resources on youth subjective well-being in the region, we find that except for the UAE and Qatar, which have above-OECD-average youth performance on the ICT resource index, youth from the remaining countries reported below-OECD-average access to ICT resources. The within-region cross-country comparison of ICT resource availability to youth at home also highlighted significant heterogeneity across the five countries after the 2015 SDG adoption by UN member countries. Furthermore, looking at the impact of ICT resources on youth well-being, controlling for cross-country spatial correlations and for factors such as home educational resources, cultural possessions at home, parental occupation status, youth expected occupation status, economic and socio-cultural status, age, gender, and grade level in school, we found that every standard deviation increase in ICT resources available to the youth population in the region raises their self-expressed sense of belonging in school by 1.88% of a standard deviation.
Given the empowering nature of ICT resources for youth, and the potential of both to support national as well as regional economic development initiatives, a concerted effort to ease the diffusion of ICT resources by member countries in the Middle East region could assist not only each country in its own development path, but also the region as a whole in living up to its growth potential by 2030.
REVIEW | doi:10.20944/preprints202110.0207.v1
Online: 13 October 2021 (16:28:59 CEST)
Accurate transfer learning of clinical outcomes, e.g., of the effects and side effects of drugs or other interventions, from one cellular context to another (in-vitro versus ex-vivo versus in-vivo, or across tissues), between cell types, developmental stages, omics modalities or species, is considered tremendously useful. Ultimately, it may prevent much drug development from failing in translation despite large investments in the preclinical stages, which include animal experiments requiring careful justification. Thus, when transferring a prediction task from a source (model) domain to a target domain, what counts is the quality of the predictions in the target domain, which requires molecular states or processes common to both source and target that can be learned by the predictor, reflected by latent variables. These latent variables may form a compendium of knowledge learned in the source to enable predictions in the target; usually, there are few, if any, labeled target training samples to learn from. Transductive learning refers to learning the predictor in the source domain and transferring its outcome label calculations to the target domain, considering the same task. Inductive learning covers cases where the target predictor performs a different yet related task compared to the source predictor, making some labeled target data necessary. Often, there is also a need to first map the variables in the input/feature spaces (e.g. gene names to orthologs) and/or the variables in the output/outcome spaces (e.g. by matching of labels). Transfer across omics modalities also requires that the molecular information flow connecting these modalities is sufficiently conserved. Only one of the transfer learning methods we reviewed offers an assessment of the input data, suggesting that transfer learning is unreliable in certain cases.
Moreover, source domains feature their own particularities, such as differences in pharmacokinetics, drug clearance or the microenvironment, and transfer learning should account for these. In light of these general considerations, we here discuss and juxtapose various recent transfer learning approaches, specifically designed (or at least adaptable) to predict clinical (human in-vivo) outcomes based on molecular data, towards finding the right tool for a given task and paving the way for a comprehensive and systematic comparison of the suitability and accuracy of transfer learning of clinical outcomes.
Online: 18 September 2020 (09:40:45 CEST)
The main objective of this article is to explore the causes of household electricity poverty in Spain from an innovative perspective. Based on evidence of energy inequality across households with different income levels, a quantile regression approach was used to better capture the heterogeneity of the determinants of energy poverty across different levels of electricity expenditure. The results illustrate some interesting and counter-intuitive findings about the relationship between household income and electricity poverty, and demonstrate the technical efficiency of quantile regression compared with the imprecise results of a standard single-coefficient OLS approach.
ARTICLE | doi:10.20944/preprints202201.0441.v1
Subject: Mathematics & Computer Science, Artificial Intelligence & Robotics Keywords: Active learning (AL); batch mode; expected model change; linear regression; nonlinear regression
Online: 28 January 2022 (15:03:10 CET)
Training supervised machine learning models requires labeled examples, and a judicious choice of examples is helpful when there is a significant cost associated with assigning labels. This article improves upon a promising extant method – Batch-mode Expected Model Change Maximization (B-EMCM) – for selecting examples to be labeled for regression problems. Specifically, it develops and evaluates alternative strategies for adaptively selecting the batch size in B-EMCM. By bounding the cumulative error that arises from the stochastic gradient descent approximation, a stopping criterion for each batch iteration can be specified to ensure that selected candidates are the most beneficial to model learning. The new methodology is compared to B-EMCM via mean absolute error and root mean square error over ten iterations, benchmarked on standard machine learning data sets. Across multiple data sets and metrics, one variation of AB-EMCM, the max bound of the accumulated error (AB-EMCM Max), showed the best results for an adaptive batch approach: it achieved better root mean squared error (RMSE) and mean absolute error (MAE) than the other adaptive and non-adaptive batch methods while reaching the result in nearly the same number of iterations as the non-adaptive batch methods.
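The adaptive stopping idea can be sketched in outline. The exact AB-EMCM bound is defined in the paper; the fragment below only illustrates the general shape, ranking candidates by expected model change and closing the batch once an accumulated error estimate crosses a bound. All names and numbers here are hypothetical:

```python
def select_batch(change_scores, error_estimates, max_error):
    """Greedy batch-selection sketch: take candidates in order of expected
    model change, accumulating each candidate's estimated SGD-approximation
    error, and stop the batch once the accumulated error would exceed
    max_error. (Illustrative only; the actual AB-EMCM criterion differs.)"""
    order = sorted(range(len(change_scores)),
                   key=lambda i: change_scores[i], reverse=True)
    batch, accumulated = [], 0.0
    for i in order:
        if accumulated + error_estimates[i] > max_error:
            break                      # stopping criterion reached
        batch.append(i)
        accumulated += error_estimates[i]
    return batch

scores = [0.9, 0.3, 0.7, 0.5]    # hypothetical expected-model-change scores
errors = [0.2, 0.1, 0.25, 0.15]  # hypothetical per-candidate error estimates
print(select_batch(scores, errors, max_error=0.5))
```

With these toy numbers the batch closes after the two highest-scoring candidates, since adding a third would push the accumulated error past the bound.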
ARTICLE | doi:10.20944/preprints202110.0127.v1
Subject: Mathematics & Computer Science, Applied Mathematics Keywords: Descriptive analysis; principal components analysis; k-means clustering; data panel regression method; machine learning; XGBoost algorithms; random forest algorithms
Online: 8 October 2021 (08:30:13 CEST)
The aim of this work is to explain the behaviour of the multiresistance percentage of Pseudomonas aeruginosa in several European countries through multivariate statistical analysis and machine learning validation, using data from the European Antimicrobial Resistance Surveillance System, the World Health Organization and the World Bank. First, we use a descriptive analysis and a principal components analysis, followed by k-means clustering, to determine the countries and regions most affected by antibiotic resistance. Second, we expand the database by adding socioeconomic, governance and antibiotic-consumption variables, and run a panel data regression analysis to determine functions that relate the multiresistance percentage to those new variables. Finally, we use machine learning techniques to validate a pooled panel data case, using XGBoost and random forest algorithms. The results of the panel data analysis indicate that the most important variables for the multiresistance percentage are control of corruption and the rule of law. Similar results are found with the machine learning validation analysis, where the human development index is an additional important variable for the multiresistance percentage.
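The clustering step above is standard k-means; a self-contained sketch on hypothetical (resistance %, consumption) country pairs follows. The study's actual feature set is richer, so this only shows the mechanics:

```python
import random

def kmeans(points, k, iters=50, seed=0):
    """Plain k-means: alternate nearest-centroid assignment and mean update."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)          # initialise from the data
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:                       # assignment step
            j = min(range(k),
                    key=lambda c: sum((a - b) ** 2
                                      for a, b in zip(p, centroids[c])))
            clusters[j].append(p)
        centroids = [                          # update step (keep empty as-is)
            [sum(dim) / len(c) for dim in zip(*c)] if c else centroids[j]
            for j, c in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical (multiresistance %, consumption index) pairs, two country groups
data = [[5, 10], [6, 11], [5, 9], [30, 25], [32, 27], [31, 24]]
centroids, clusters = kmeans(data, k=2)
print(sorted(len(c) for c in clusters))
```

On these toy points the two recovered centroids separate the low-resistance group from the high-resistance group regardless of initialisation.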
ARTICLE | doi:10.20944/preprints202208.0222.v1
Subject: Medicine & Pharmacology, General Medical Research Keywords: Tuberculosis; Mortality; Indigenous; Logistic Regression
Online: 11 August 2022 (12:00:20 CEST)
Aim. To identify factors associated with mortality among persons diagnosed with tuberculosis in the indigenous population of Peru, 2015-2019. Methods. Case-control study nested in a retrospective cohort, using the registry of persons belonging to indigenous peoples in the National Tuberculosis Prevention and Control Strategy of the Ministry of Health of Peru. A descriptive analysis was applied, and bivariate and multiple logistic regression were then used to evaluate associations between the variables and the outcome (alive vs. deceased); results are presented as ORs with their respective 95% confidence intervals. Results. The mortality rate of the total indigenous population of Peru was 1.75 deaths per 100,000 indigenous people diagnosed with TB. The Kukama kukamiria - Yagua community reported 505 (28.48%) individuals. The final logistic model showed that being male (OR=1.93; 95% CI: 1.001-3.7), having a history of HIV prior to TB (OR=16.7; 95% CI: 4.7-58.7) and old age (OR=2.95; 95% CI: 1.5-5.7) were factors associated with greater odds of dying from TB. Conclusions. It is important to reorient health services for indigenous populations, especially those related to improving the timely diagnosis and early treatment of TB-HIV co-infection, to ensure comprehensive care for this vulnerable population.
ARTICLE | doi:10.20944/preprints201804.0357.v1
Subject: Engineering, General Engineering Keywords: hydrokinetic; energy assessment; unregulated river; daily water velocity estimation; daily water level estimation; IBM statistical package for social sciences (SPSS); regression analysis; east malaysia
Online: 27 April 2018 (08:39:22 CEST)
Electrification coverage in Sarawak is the lowest at 78.74%, compared with Peninsular Malaysia at 99.62% and Sabah at 82.51%. Kapit, Sarawak, with 88.4% of its population located in rural areas and mostly settled along the main riverbanks, has great potential for generating electrical energy with hydrokinetic systems. Yearly water velocity data are the most significant input for a hydrokinetic analysis study. Nevertheless, the data retrievable from local river databases are inadequate for river energy analysis, hindering its progression; flow rate and rainfall data have instead been utilised to estimate water velocity, meaning that no estimation of water velocity in an unregulated river using water level data had previously been made. Therefore, a novel technique for estimating daily average water velocity in unregulated rivers is proposed. Regression modelling was performed and two regression equations were generated to estimate water level and water velocity on-site; these proved valid, with coefficients of determination of R2 = 87.4% and R2 = 87.9%, respectively. The combination of both regression equations can be used to estimate long-term time-series water velocity data for type-C unregulated rivers in remote areas.
ARTICLE | doi:10.20944/preprints202008.0139.v1
Subject: Engineering, Industrial & Manufacturing Engineering Keywords: copper price; prediction; support vector regression
Online: 6 August 2020 (08:26:35 CEST)
Predicting the copper price is essential for decisions affecting companies and governments dependent on the copper mining industry. Copper prices follow a time series that is non-linear and non-stationary, with periods that change as a result of potential growth, cyclical fluctuation and errors; the trend and cyclical components together are sometimes referred to as a trend-cycle. To make predictions, the different characteristics of the trend-cycle must be considered. In this paper, we study a copper price prediction method using Support Vector Regression (SVR). The work explores the potential of SVR with external recurrences to make predictions 5, 10, 15, 20 and 30 days into the future for the copper closing price at the London Metal Exchange. The best model for each forecast interval is selected using a grid search and balanced cross-validation. In experiments on real data sets, our results indicate that the parameters (C, ε, γ) of the SVR model do not differ between the prediction intervals, and the number of preceding values used to make the estimates does not vary with the predicted interval. The results show that the SVR model has a lower prediction error and is more robust, predicting copper price volatility close to reality with an RMSE of 2.2% or less for prediction periods of 5 and 10 days.
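The phrase "external recurrences" suggests feeding the model lagged closing prices. A hedged sketch of how such a lagged design matrix might be built follows; the prices are invented and the actual lag structure used in the paper may differ:

```python
def make_lagged(series, n_lags, horizon):
    """Build (X, y) pairs where each X row holds the n_lags most recent
    values and y is the closing price `horizon` days ahead -- the shape of
    input a regressor with external recurrences would be trained on."""
    X, y = [], []
    for t in range(n_lags - 1, len(series) - horizon):
        X.append(series[t - n_lags + 1: t + 1])   # lag window ending at day t
        y.append(series[t + horizon])             # target: t + horizon
    return X, y

prices = [2.10, 2.12, 2.15, 2.11, 2.18, 2.20, 2.19, 2.25]  # hypothetical USD/lb
X, y = make_lagged(prices, n_lags=3, horizon=2)
print(len(X), X[0], y[0])
```

Each (X, y) pair can then be handed to any regressor; grid search over the model's hyperparameters and over `n_lags` mirrors the selection procedure described in the abstract.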
ARTICLE | doi:10.20944/preprints201902.0135.v1
Online: 14 February 2019 (11:30:03 CET)
Based on a rich data set of recoveries donated by a debt collection business, recovery rates for non-performing loans taken from a single European country are modelled using linear regression, linear regression with Lasso, beta regression and inflated beta regression. We also propose a two-stage model, a beta mixture model combined with a logistic regression model, which allows us to model the multimodal distribution we find for these recovery rates. All models are built using loan characteristics, default data and collections data prior to purchase by the debt collection business. The intended use of the models is to estimate future recovery rates for improved risk assessment, capital requirement calculations and bad debt management. The models are compared using a range of quantitative performance measures under K-fold cross-validation; among them, the proposed two-stage beta mixture model performs best.
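The K-fold cross-validation used for the model comparison can be sketched as a plain index-splitting routine. This version uses contiguous folds for brevity; the study may well have shuffled the loans first:

```python
def kfold_indices(n, k):
    """Split indices 0..n-1 into k near-equal folds; each fold serves once
    as the validation set while the remaining indices form the training set."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return [(sorted(set(range(n)) - set(f)), f) for f in folds]

# 10 hypothetical loans, 5 folds: each loan is validated on exactly once
for train, valid in kfold_indices(10, 5):
    print(valid)
```

A performance measure (e.g. RMSE of predicted recovery rates) is computed on each validation fold and averaged, giving an out-of-sample estimate for every candidate model.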
ARTICLE | doi:10.20944/preprints201809.0499.v1
Online: 26 September 2018 (05:23:02 CEST)
Understanding the influences of multiple stressors across the landscape on aquatic biota is important for conservation, as it reveals spatial patterns and informs stakeholders of significant conservation value. Data exist for land use/land cover (LULC) and other physicochemical components of the landscape throughout the Appalachian region, yet biological data are sparse. This dearth of biological data relative to LULC and physicochemical data makes informed management and conservation decisions across large landscapes difficult. At the HUC12 watershed scale, we sought to create a single score for both abiotic and biotic values throughout the central and southern Appalachian region. We used boosted regression trees (BRT) to model biological responses (fish and aquatic macroinvertebrate variables) to abiotic variables; variance explained by the BRT models ranged from 62-94%. We categorized predictor and response variables into themes and targets, respectively, to better understand large-scale patterns on the landscape that influence the biological condition of streams, and we combined predicted values for a suite of response variables from the BRT models to create a single watershed score each for aquatic macroinvertebrates and fish. Regional models were developed for fish, but we were unable to develop regional models for aquatic macroinvertebrates due to the low number of sample sites. There was strong correlation between regional and global watershed scores for the fish models, but not between the fish and aquatic macroinvertebrate models. Such multimetric scores can inform managers, NGOs, and private landowners regarding land use practices, thereby contributing to large-scale landscape conservation efforts.
COMMUNICATION | doi:10.20944/preprints202111.0549.v1
Subject: Keywords: Principal Component Regression, Partial Least Squares, Orthogonal Partial Least Squares, multivariate regression, hypothesis generation, Parkinson’s disease
Online: 29 November 2021 (15:42:03 CET)
In the current era of ‘big data’, scientists are able to quickly amass enormous amounts of data in a limited number of experiments. Investigators then try to hypothesize about the root cause based on the observed trends for the predictors and the response variable, which involves identifying the discriminatory predictors most responsible for explaining variation in the response variable. In the current work, we investigated three related multivariate techniques: Principal Component Regression (PCR), Partial Least Squares or Projections to Latent Structures (PLS), and Orthogonal Partial Least Squares (OPLS). To perform a comparative analysis, we used a publicly available dataset for Parkinson’s disease patients. We first performed the analysis using a cross-validated number of principal components for the aforementioned techniques. Our results demonstrated that PLS and OPLS were better suited than PCR for identifying the discriminatory predictors. Since the X data did not exhibit strong correlation, we also performed Multiple Linear Regression (MLR) on the dataset. A comparison of the top five discriminatory predictors identified by the four techniques showed a substantial overlap between the results obtained by PLS, OPLS, and MLR, and a significant divergence of these three techniques from the variables identified by PCR. A further investigation of the data revealed that PCR could successfully identify the discriminatory variables if the number of principal components in the regression model were increased. In summary, we recommend using PLS or OPLS for hypothesis generation, and systematizing the selection of principal components when using PCR.
ARTICLE | doi:10.20944/preprints201907.0351.v1
Subject: Mathematics & Computer Science, Artificial Intelligence & Robotics Keywords: evaporation; meteorological parameters; Gaussian process regression; support vector regression; machine learning modeling; hydrology; prediction; data science; hydroinformatics
Online: 31 July 2019 (10:58:29 CEST)
Evaporation is one of the main processes in the hydrological cycle and one of the most critical factors in agricultural, hydrological, and meteorological studies. Owing to the interactions of multiple climatic factors, evaporation is a complex and nonlinear phenomenon; data-driven methods can therefore be used to estimate it precisely. In the present study, Gaussian Process Regression (GPR), Nearest-Neighbor (IBK), Random Forest (RF) and Support Vector Regression (SVR) were used to estimate pan evaporation (PE) at meteorological stations in Golestan Province, Iran. Meteorological data including PE, temperature (T), relative humidity (RH), wind speed (W) and sunshine hours (S) were collected from the Gonbad-e Kavus, Gorgan and Bandar Torkman stations from 2011 through 2017. The accuracy of the studied methods was determined using the Root Mean Squared Error (RMSE), the correlation coefficient (R) and the Mean Absolute Error (MAE); Taylor diagrams were also utilized to evaluate the models. At the Gonbad-e Kavus, Gorgan and Bandar Torkman stations, respectively, the best-performing configurations yielded error values of 1.521, 1.244, and 1.254 for GPR; 1.991, 1.775, and 1.577 for IBK; 1.614, 1.337, and 1.316 for RF; and 1.55, 1.262, and 1.275 for SVR. GPR with input parameters T, W and S for Gonbad-e Kavus Station, and GPR with input parameters T, RH, W and S for the Gorgan and Bandar Torkman stations, had the most accurate performance and are proposed for precise estimation of PE. 
Given the high rate of evaporation in Iran and the scarcity of measurement instruments, the findings of the current study indicate that PE may be estimated accurately from a few easily measured meteorological parameters.
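The three accuracy indices used in that comparison (RMSE, MAE and the correlation coefficient R) have standard definitions, sketched here on invented pan-evaporation values:

```python
def rmse(obs, est):
    """Root mean squared error."""
    return (sum((o - e) ** 2 for o, e in zip(obs, est)) / len(obs)) ** 0.5

def mae(obs, est):
    """Mean absolute error."""
    return sum(abs(o - e) for o, e in zip(obs, est)) / len(obs)

def corr(obs, est):
    """Pearson correlation coefficient R."""
    n = len(obs)
    mo, me = sum(obs) / n, sum(est) / n
    cov = sum((o - mo) * (e - me) for o, e in zip(obs, est))
    so = sum((o - mo) ** 2 for o in obs) ** 0.5
    se = sum((e - me) ** 2 for e in est) ** 0.5
    return cov / (so * se)

pan = [4.0, 5.5, 6.0, 7.2]   # hypothetical observed pan evaporation (mm/day)
est = [4.2, 5.0, 6.3, 7.0]   # hypothetical model estimates
print(round(rmse(pan, est), 3), round(mae(pan, est), 3), round(corr(pan, est), 3))
```

Lower RMSE and MAE and an R closer to 1 indicate a better model, which is how the GPR, IBK, RF and SVR error values above are ranked.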
ARTICLE | doi:10.20944/preprints202205.0417.v1
Subject: Earth Sciences, Geoinformatics Keywords: COVID-19; Eswatini; risk mapping; Poisson regression
Online: 31 May 2022 (11:04:12 CEST)
COVID-19 national spikes have been reported at varying temporal scales as a result of differences in the driving factors, and the factors affecting case load and mortality rates have varied between countries and regions. We investigated the association of various socio-economic, demographic and health variables with the spread of COVID-19 cases in Eswatini using the maximum likelihood estimation method for count data. A generalized Poisson regression (GPR) model comprising fifteen covariates was fitted to predict COVID-19 risk in Eswatini. The results showed that the key determinants of the spread of the disease included the proportion of elderly above 55 years at 98% (95% CI: 97%-99%) and the proportion of youth below 35 years at 0.08% (95% CI: 0.017%-38%), with a pseudo R-square of 0.72. However, in the early phase of the virus when cases were fewer, results from the Poisson regression showed that household size, household density and poverty index were associated with COVID-19. We produced a risk map of predicted COVID-19 in Eswatini using the variables selected at the 5% significance level. The map could be used by the country to plan and prioritize health interventions against COVID-19, and the identified high-risk areas may be further investigated to find the risk amplifiers and assess what could be done to prevent them.
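A generalized Poisson regression as used in the study involves extra dispersion machinery; the sketch below fits only an ordinary Poisson log-linear model with a single covariate by Newton's method, on simulated counts, to illustrate the maximum-likelihood estimation principle. All data are invented:

```python
import math
import random

def poisson_fit(x, y, iters=25):
    """Newton's-method MLE for log E[y] = b0 + b1*x (plain Poisson GLM;
    the study's *generalized* Poisson model adds a dispersion parameter)."""
    n = len(x)
    b0 = math.log(sum(y) / n + 1e-9)   # intercept-only MLE as a safe start
    b1 = 0.0
    for _ in range(iters):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        g0 = sum(yi - mi for yi, mi in zip(y, mu))              # score
        g1 = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu))
        h00 = sum(mu)                                            # Fisher info
        h01 = sum(mi * xi for xi, mi in zip(x, mu))
        h11 = sum(mi * xi * xi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det                        # Newton step
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

def draw_poisson(mu, rng):
    """Knuth's multiplication method for one Poisson draw."""
    k, p, thresh = 0, 1.0, math.exp(-mu)
    while p > thresh:
        k += 1
        p *= rng.random()
    return k - 1

rng = random.Random(7)
xs = [rng.uniform(0, 2) for _ in range(400)]                     # e.g. a density index
ys = [draw_poisson(math.exp(0.5 + 1.0 * xi), rng) for xi in xs]  # true b0=0.5, b1=1.0
b0, b1 = poisson_fit(xs, ys)
print(round(b0, 2), round(b1, 2))
```

With 400 simulated observations the recovered coefficients land close to the true values (0.5, 1.0), and exp(b1) is interpreted as the multiplicative effect of a one-unit covariate increase on the expected case count.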
ARTICLE | doi:10.20944/preprints202107.0139.v1
Subject: Social Sciences, Accounting Keywords: circularity; waste streams; circular approaches; regression equation
Online: 6 July 2021 (11:40:19 CEST)
In this paper, the authors identified key elements important for circularity. (1) Background: The primary goal of circularity is to eliminate waste and keep resources in continual use. We classify studies according to circular approaches: the authors identified the main elements, classified them into categories important for circularity, starting with managing and reducing waste and the recovery of resources and ending with the circularity of materials and general circularity-related topics, and presented scientific works dedicated to each of these categories. The authors analyzed several core elements from the first category, aiming to investigate and connect different waste streams, and provided a regression model. (2) Methods: The authors used a dynamic regression model to identify relationships among variables and selected those which have an impact on the increase of biowaste. The research covered the 27 European Union countries over the period between 2019 and 2020. (3) Conclusions: The authors found that the recycling rate of waste electrical equipment in the previous year has an impact on the increase in biowaste recycling the following year. This is explained by non-metallic spare parts of electronic equipment being used as biowaste for fuel production, while the separation of the composites of electrical equipment takes some time; on average, the effect becomes evident within one year.
ARTICLE | doi:10.20944/preprints202012.0321.v1
Subject: Earth Sciences, Atmospheric Science Keywords: quantile regression; groundwater; environmental; multivariate; metals; health
Online: 14 December 2020 (10:13:09 CET)
One of the most important defining characteristics of groundwater quality is pH, as it fundamentally controls the amount and chemical form of many organic and inorganic solutes in groundwater. Groundwater data are frequently characterized by a wide degree of variability in the factors that may influence pH distribution. For this reason, it is challenging to link the spatio-temporal dynamics of pH to a single environmental factor using the ordinary least squares regression of the conditional mean. In this study, quantile regression was used to estimate the response of pH to nine environmental factors (As, Cd, Fe, Mn, Pb, turbidity, electrical conductivity, total dissolved solids and nitrates). Results of the 25%, 50% and 75% quantile regressions and ordinary least squares (OLS) regression were compared. The standard regression of the conditional mean (OLS) underestimated the rates of change of pH due to the selected factors in comparison with the regression quantiles. The effect of arsenic increased for sampling locations with higher pH values (higher quantiles), as did the influences of Pb and Mn. However, the effects of Cd and Fe decreased for sampling locations in higher quantiles. These detected heterogeneities would have been missed if this study had focused exclusively on the conditional means of the pH values. Consequently, quantile regression provides a more comprehensive account of possible spatio-temporal relationships between environmental covariates in groundwater. This study is one of the first to apply this technique to groundwater systems in sub-Saharan Africa. The approach has broad application to other mining environments, especially in tropical low-income countries where climatic conditions can drive rapid cycling or transformation of pollutants.
It is also pertinent to geopolitical contexts where regulatory, monitoring and management capacities are weak and where mining pollution of groundwater largely occurs.
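Quantile regression minimizes the asymmetric "pinball" loss instead of squared error. A minimal NumPy/SciPy sketch of the idea follows; the covariate, coefficients and perturbation are synthetic, and the study itself would have used a dedicated statistical package:

```python
import numpy as np
from scipy.optimize import minimize

def pinball_loss(beta, X, y, tau):
    """Check (pinball) loss minimized by the tau-th regression quantile."""
    u = y - X @ beta
    return np.mean(np.where(u >= 0, tau * u, (tau - 1) * u))

def fit_quantile(X, y, tau, beta0):
    """Numerically minimize the pinball loss, warm-started at beta0."""
    res = minimize(pinball_loss, beta0, args=(X, y, tau),
                   method="Nelder-Mead",
                   options={"xatol": 1e-9, "fatol": 1e-9, "maxiter": 20000})
    return res.x

# Toy data: pH rising with a single covariate (a stand-in for, e.g., conductivity)
x = np.linspace(0.0, 10.0, 101)
X = np.column_stack([np.ones_like(x), x])
bump = np.where(np.arange(101) % 2 == 0, 0.05, -0.05)  # symmetric perturbation
y = 6.0 + 0.2 * x + bump

beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)       # conditional-mean fit
beta_med = fit_quantile(X, y, 0.5, beta_ols)           # median (50%) quantile fit
print(np.round(beta_med, 2), np.round(beta_ols, 2))
```

Fitting at tau = 0.25 and 0.75 in the same way reproduces the quantile-to-quantile comparison the abstract describes; here the median fit and OLS agree because the toy perturbation is symmetric.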
Subject: Mathematics & Computer Science, Artificial Intelligence & Robotics Keywords: Crime prediction; Ensemble Learning; Machine Learning; Regression
Online: 14 September 2020 (00:53:30 CEST)
While the use of crime data has been widely advocated in the literature, its availability is often limited to large urban cities, and isolated databases tend not to allow for spatial comparisons. This paper presents an efficient machine learning framework capable of predicting spatial crime occurrences, without using past crime as a predictor, and at a relatively high resolution: the U.S. Census Block Group level. The proposed framework is based on an in-depth multidisciplinary literature review allowing the selection of 188 best-fit crime predictors from socio-economic, demographic, spatial, and environmental data. Such data are published periodically for the entire United States. The selection of the appropriate predictive model was made through a comparative study of different machine learning families of algorithms, including generalized linear models, deep learning, and ensemble learning. The gradient boosting model was found to yield the most accurate predictions for violent crimes, property crimes, motor vehicle thefts, vandalism, and the total count of crimes. Extensive experiments on real-world datasets of crimes reported in 11 U.S. cities demonstrated that the proposed framework achieves an accuracy of 73% and 77% when predicting property crimes and violent crimes, respectively.
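The comparative-model step can be sketched with scikit-learn's GradientBoostingRegressor; the four "block-group" predictors and the response below are invented stand-ins for the 188 real predictors:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
# Hypothetical block-group predictors: income, density, unemployment, median age
X = rng.normal(size=(500, 4))
# Synthetic crime counts driven by those predictors plus noise
y = 50 + 10 * X[:, 0] - 5 * X[:, 1] + 3 * X[:, 2] + rng.normal(scale=2, size=500)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = GradientBoostingRegressor(random_state=0).fit(X_tr, y_tr)
r2 = r2_score(y_te, model.predict(X_te))
print(round(r2, 3))
```

Swapping in other estimators (e.g. a generalized linear model or a deep network) and comparing held-out scores mirrors the model-selection study the abstract describes.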
REVIEW | doi:10.20944/preprints201910.0362.v1
Subject: Mathematics & Computer Science, Probability And Statistics Keywords: tuberculosis (TB); human immunodeficiency virus (HIV); Acquired Immune Deficiency Syndrome (AIDS); World Health Organization (WHO); panel data; poisson; negative binomial; regression
Online: 31 October 2019 (04:33:45 CET)
Tuberculosis is a leading cause of death worldwide and the leading cause from a single infectious agent, ranking above human immunodeficiency virus (HIV) and acquired immune deficiency syndrome (AIDS). The aim of this study is to ascertain the trend of tuberculosis prevalence and the effect of HIV prevalence on tuberculosis cases in some West African countries from 2000 to 2016 using count panel data regression models. The data used were annual HIV and tuberculosis cases spanning 2000 to 2016, extracted from online publications of the World Health Organization (WHO). Panel Poisson regression models and negative binomial regression models with fixed and random effects were used to analyze the count data. The results revealed a positive trend in TB cases, while an increase in HIV cases leads to an increase in TB cases in West African countries. Among the competing models used in this study, the panel negative binomial regression model with fixed effects emerged as the best model, with a log-likelihood value of -1336.554. This study recommends that governments and NGOs need more strategies to fight the HIV menace in West Africa, as this will in turn reduce TB cases in West Africa.
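In practice such models are fitted with dedicated econometric software; the core of a fixed-effects Poisson count regression (of which the negative binomial is an overdispersed extension) can be sketched as a maximum-likelihood fit in SciPy. The two-country panel, dummies and HIV covariate below are entirely synthetic:

```python
import numpy as np
from scipy.optimize import minimize

def poisson_nll(beta, X, y):
    """Poisson negative log-likelihood (log link), dropping the log(y!) constant."""
    eta = X @ beta
    return np.sum(np.exp(eta) - y * eta)

def poisson_grad(beta, X, y):
    return X.T @ (np.exp(X @ beta) - y)

# Toy two-country panel: TB counts rise with an HIV-prevalence covariate,
# with a separate (fixed-effect) intercept per country.
hiv = np.tile(np.arange(6.0), 2)          # covariate, 6 years per country
d0 = np.repeat([1.0, 0.0], 6)             # country dummies (fixed effects)
d1 = 1.0 - d0
X = np.column_stack([d0, d1, hiv])
beta_true = np.array([1.0, 1.5, 0.3])
y = np.round(np.exp(X @ beta_true))       # deterministic toy counts

fit = minimize(poisson_nll, np.zeros(3), args=(X, y),
               jac=poisson_grad, method="BFGS")
print(np.round(fit.x, 2))                 # close to beta_true, up to count rounding
```

A negative binomial fit adds one dispersion parameter to the likelihood but follows the same structure.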
REVIEW | doi:10.20944/preprints202111.0310.v1
Subject: Mathematics & Computer Science, Probability And Statistics Keywords: Functional Data Analysis (FDA); Hybrid Data; Semi-Functional Partial Linear Regression Model (SFPLR); Partial Functional Linear Regression; Literature Review
Online: 17 November 2021 (15:21:19 CET)
Background: In functional data analysis (FDA), hybrid or mixed data combine scalar and functional datasets. The semi-functional partial linear regression model (SFPLR) is one of the first semiparametric models for a scalar response with hybrid covariates. Various extensions of this model are explored and summarized. Methods: The two seminal research articles, “Semi-functional partial linear regression” and “Partial functional linear regression”, have more than 300 citations in Google Scholar. After applying inclusion and exclusion criteria (including only articles published in ISI journals, and excluding non-English articles as well as preprints, slides, and conference papers), only 106 articles remained. We used the PRISMA standard for the systematic review. Results: The articles are categorized into the following main topics: estimation procedures, confidence regions, time series and panel data, Bayesian methods, spatial models, robust methods, testing, quantile regression, varying coefficient models, variable selection, single-index models, measurement error, multiple functions, missing values, rank methods, and others. Applications span various datasets, such as the Tecator dataset, air quality, electricity consumption, and neuroimaging, among others. Conclusions: SFPLR is one of the most popular regression modeling methods for hybrid data and has many extensions relative to other models.
ARTICLE | doi:10.20944/preprints202106.0533.v1
Online: 22 June 2021 (08:30:30 CEST)
The novel coronavirus disease (COVID-19) has created immense threats to public health on various levels around the globe. The unpredictable outbreak of this disease and the pandemic situation are causing severe depression, anxiety and other mental as well as physical health problems among human beings. To combat this disease, vaccination is essential, as it boosts the immune system of people who come into contact with infected individuals. The vaccination process is thus necessary to confront the outbreak of COVID-19. This deadly disease has put the social and economic condition of the entire world under enormous challenge. Worldwide vaccination progress should be tracked to identify how fast economic and social life will be stabilized. To monitor the vaccination progress, a machine learning based regressor model is proposed in this study. This tracking process has been applied to data from 14th December, 2020 to 24th April, 2021. Several ensemble based machine learning regressor models, namely Random Forest, Extra Trees, Gradient Boosting, AdaBoost and Extreme Gradient Boosting, are implemented and their predictive performance compared. The comparative study reveals that the AdaBoost regressor outperforms the others, with a minimized mean absolute error (MAE) of 9.968 and root mean squared error (RMSE) of 11.133.
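A hedged sketch of the model-comparison loop with scikit-learn's ensemble regressors follows. The cumulative-dose series and all settings are synthetic; note also that tree ensembles extrapolate poorly beyond the training range, which is one reason such comparisons need a held-out period:

```python
import numpy as np
from sklearn.ensemble import (RandomForestRegressor, ExtraTreesRegressor,
                              GradientBoostingRegressor, AdaBoostRegressor)
from sklearn.metrics import mean_absolute_error, mean_squared_error

rng = np.random.default_rng(1)
days = np.arange(120.0)                   # days since the start of the toy series
# Synthetic cumulative vaccination doses: quadratic growth plus noise
doses = 0.5 * days**2 + 30 * days + rng.normal(scale=40, size=days.size)

X = days.reshape(-1, 1)
split = 100                               # train on the first 100 days
models = {
    "RandomForest": RandomForestRegressor(random_state=0),
    "ExtraTrees": ExtraTreesRegressor(random_state=0),
    "GradientBoosting": GradientBoostingRegressor(random_state=0),
    "AdaBoost": AdaBoostRegressor(random_state=0),
}
for name, m in models.items():
    m.fit(X[:split], doses[:split])
    pred = m.predict(X[split:])
    mae = mean_absolute_error(doses[split:], pred)
    rmse = np.sqrt(mean_squared_error(doses[split:], pred))
    print(f"{name}: MAE={mae:.1f} RMSE={rmse:.1f}")
```

Extreme Gradient Boosting (the xgboost package) is omitted here to keep the sketch to scikit-learn alone.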
Subject: Medicine & Pharmacology, Allergology Keywords: Diagnosing designs; rare diseases; statistics; regression; block designs
Online: 2 June 2021 (12:14:34 CEST)
Far too often, one meets patients who went for years or even decades from doctor to doctor without getting a valid diagnosis. This brings pain to millions of patients and their families, not to speak of the enormous costs. Often patients cannot tell precisely enough which factors (or combinations thereof) trigger their problems. If conventional methods fail, we propose the use of statistics and algebra to give doctors much more useful input from patients. We use statistical regression for independent triggering factors of medical problems, and “balanced incomplete block designs” for non-independent factors. These methods can supply doctors with much more valuable input, and can also detect combinations of multiple factors with remarkably few tests. To show that these methods do work, we briefly describe a case in which they helped to solve a 60-year-old problem in a patient, and give some more examples where they might be very useful. In conclusion, while regression is used in clinical medicine, it seems to be widely unknown in diagnosing. Statistics and algebra can save health systems much money, and spare patients a lot of pain.
ARTICLE | doi:10.20944/preprints202103.0586.v1
Subject: Earth Sciences, Atmospheric Science Keywords: NVOC; phytoncide; bamboo grove; monoterpene; microclimate; regression analysis
Online: 24 March 2021 (13:10:25 CET)
After the COVID-19 outbreak, more and more people are seeking physiological and psychological healing by visiting forests as stay-at-home periods have become longer. NVOC, a major healing factor of forests, has several positive effects on human health, and this study investigated the NVOC characteristics of bamboo groves. The study revealed that α-pinene, 3-carene, and camphene were emitted the most, and the largest amount of NVOC was emitted in the early morning and late afternoon in bamboo groves. Furthermore, NVOC emission was found to have positive correlations with temperature and humidity, and inverse correlations with solar radiation, PAR and wind speed. A regression analysis conducted to predict the effect of microclimate factors on NVOC emissions yielded a regression equation with 82.9% explanatory power and found that PAR, temperature, and humidity had a significant effect on NVOC emission prediction. In conclusion, this study investigated the NVOC emission characteristics of bamboo groves, examined the relationship between NVOC emissions and microclimate factors, and derived a prediction equation for NVOC emissions to characterize bamboo groves' forest healing effects. These results are expected to provide a basis for establishing more effective forest healing programs in bamboo groves.
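The regression step (explaining NVOC emission from microclimate factors and reporting explained variance) can be sketched with ordinary least squares in NumPy; all measurements below are fabricated for illustration:

```python
import numpy as np

# Toy microclimate records: temperature (°C), humidity (%), PAR (µmol m-2 s-1)
temp = np.array([18, 20, 22, 24, 26, 28, 30, 25, 21, 19], float)
hum  = np.array([80, 75, 70, 65, 60, 55, 50, 62, 72, 78], float)
par  = np.array([100, 300, 500, 700, 900, 1100, 1300, 800, 400, 200], float)
# Synthetic NVOC emission: rises with temperature and humidity, falls with PAR
nvoc = 5 + 0.8 * temp + 0.3 * hum - 0.004 * par

# Multiple linear regression with an intercept, fitted by least squares
X = np.column_stack([np.ones_like(temp), temp, hum, par])
beta, *_ = np.linalg.lstsq(X, nvoc, rcond=None)
pred = X @ beta
ss_res = np.sum((nvoc - pred) ** 2)
ss_tot = np.sum((nvoc - np.mean(nvoc)) ** 2)
r2 = 1 - ss_res / ss_tot      # the "explanatory power" reported in such studies
print(round(r2, 3))
```

With real field data the R² would be below 1 (the abstract reports 82.9%); here it is essentially 1 because the toy response is exactly linear.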
ARTICLE | doi:10.20944/preprints202008.0329.v2
Subject: Medicine & Pharmacology, General Medical Research Keywords: COVID-19; Geospatial Regression; Health Disparities; Public Health
Online: 11 September 2020 (09:48:57 CEST)
COVID-19 is a potentially fatal viral infection. This study investigates geography, demography, socioeconomics, health conditions, hospital characteristics, and politics as potential explanatory variables for death rates at the state and county levels. Data from the Centers for Disease Control and Prevention, the Census Bureau, Centers for Medicare and Medicaid, Definitive Healthcare, and USAfacts.org were used to evaluate regression models. Yearly pneumonia and flu death rates (state level, 2014-2018) were evaluated as a function of the governors’ political party using repeated measures analysis. At the state and county level, spatial regression models were evaluated. At the county level, we discovered a statistically significant model that included geography, population density, racial and ethnic status, three health status variables, and a political factor. State level analysis identified health status, minority status, and the interaction between governors’ parties and health status as important variables. The political factor, however, did not appear in a subsequent analysis of 2014-2018 pneumonia and flu death rates. The pathogenesis of COVID-19 has a greater and disproportionate effect within racial and ethnic minority groups, and the political influence on the reporting of COVID-19 mortality was statistically relevant at the county level and as an interaction term only at the state level.
ARTICLE | doi:10.20944/preprints201906.0291.v1
Subject: Medicine & Pharmacology, Other Keywords: endothelial disorders; glycocalyx injury; syndecan-1; nonlinear regression
Online: 28 June 2019 (07:42:18 CEST)
Endothelial disorders are related to various diseases. An initial endothelial injury is characterized by endothelial glycocalyx injury. We aimed to evaluate endothelial glycocalyx injury by measuring serum syndecan-1 concentrations in patients during comprehensive medical examinations. A single-center, prospective, observational study was conducted at Asahi University Hospital. The participants enrolled in this study were 1313 patients who underwent comprehensive medical examinations at Asahi University Hospital from January 2018 to June 2018. One patient undergoing hemodialysis was excluded from the study. At enrollment, blood samples were obtained, and study personnel collected demographic and clinical data. No treatments or exposures were conducted except for standard medical examinations and blood sample collection. Laboratory data were obtained from blood samples collected at the time of study enrollment. According to nonlinear regression, the concentrations of serum syndecan-1 were significantly related to age (p = 0.016), aspartate aminotransferase concentration (AST, p = 0.020), blood urea nitrogen concentration (BUN, p = 0.013), triglyceride concentration (p < 0.001), and hematocrit (p = 0.006). These relationships were independent associations. Endothelial glycocalyx injury, which is reflected by serum syndecan-1 concentrations, is related to age, hematocrit, AST concentration, BUN concentration, and triglyceride concentration.
ARTICLE | doi:10.20944/preprints201811.0096.v1
Subject: Mathematics & Computer Science, Information Technology & Data Management Keywords: machine learning; stacking; forecasting; regression; sales; time series
Online: 5 November 2018 (09:54:54 CET)
In this paper, we study the use of machine learning models for sales time series forecasting. The effect of machine learning generalization has been considered. A stacking approach for building a regression ensemble of single models has been studied. The results show that using stacking techniques, we can improve the performance of predictive models for sales time series forecasting.
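A minimal sketch of the stacking idea with scikit-learn's StackingRegressor, using lagged synthetic sales as features; the base learners and meta-model here are illustrative choices, not the paper's exact configuration:

```python
import numpy as np
from sklearn.ensemble import StackingRegressor, RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(0)
# Toy weekly sales: linear trend + yearly seasonality + noise
t = np.arange(200.0)
sales = 100 + 0.5 * t + 20 * np.sin(2 * np.pi * t / 52) + rng.normal(scale=5, size=200)

# Lag features: the previous 4 weeks of sales predict the current week
X = np.column_stack([sales[i:i + 196] for i in range(4)])
y = sales[4:]
X_tr, y_tr, X_te, y_te = X[:160], y[:160], X[160:], y[160:]

# The meta-model (Ridge) learns how to combine the base models' predictions
stack = StackingRegressor(
    estimators=[("rf", RandomForestRegressor(random_state=0)),
                ("knn", KNeighborsRegressor())],
    final_estimator=Ridge())
stack.fit(X_tr, y_tr)
mae = mean_absolute_error(y_te, stack.predict(X_te))
print(round(mae, 2))
```

StackingRegressor builds the meta-model's training set with internal cross-validation, which is what gives stacking its generalization benefit over naive averaging.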
ARTICLE | doi:10.20944/preprints201608.0025.v2
Subject: Earth Sciences, Atmospheric Science Keywords: solar variability; NAO; ENSO; volcanic eruptions; multiple regression
Online: 17 May 2017 (06:27:16 CEST)
The roles of natural factors, mainly solar eleven-year cycle variability and volcanic eruptions, in two major modes of climate variability, the North Atlantic Oscillation (NAO) and the El Niño Southern Oscillation (ENSO), are studied for roughly the last 150 years. The NAO is the primary factor regulating Central England Temperature (CET) during winter throughout the period, though the NAO is affected differently by other factors in various time periods. Solar variability indicates a strong positive influence on the NAO during 1978-1997, though the opposite in the earlier period. The solar-NAO lag relationship is also shown to be sensitive to the chosen reference period, which bears on previously proposed mechanisms relating the sun and the NAO. The ENSO is influenced strongly by solar variability and volcanic eruptions in certain periods. This study observes a strong negative association between the sun and ENSO before the 1950s, which reverses during the second half of the 20th century. In the period 1978-1997, when two strong eruptions coincided with active years of strong solar cycles, ENSO and volcanic forcing showed a stronger association, and we discuss the important role played by ENSO. That period showed warming in the central tropical Pacific and cooling in the North Atlantic relative to the later period (1999-2017) and also to the chosen earlier period. We show that the mean atmospheric state is important for understanding the connection between solar variability, the NAO and ENSO and the associated mechanisms. This presents a critical analysis to improve knowledge about major modes of variability and their role in climate. We also discuss the importance of detecting robust signals of natural variability, mainly the sun.
ARTICLE | doi:10.20944/preprints202011.0363.v1
Subject: Chemistry, Analytical Chemistry Keywords: cannabinoid receptor 1; synthetic cannabinoids; quantitative structure-activity relationship; multiple linear regression; partial least squares regression; dependence and abuse potential
Online: 13 November 2020 (07:19:36 CET)
In recent years, there have been frequent reports on the adverse effects of synthetic cannabinoid (SC) abuse. SCs cause psychoactive effects, similar to those caused by marijuana, by binding and activating cannabinoid receptor 1 (CB1R) in the central nervous system. The aim of this study was to establish a reliable quantitative structure-activity relationship (QSAR) model to correlate the structures and physicochemical properties of various SCs with their CB1R-binding affinities. We prepared 15 SCs and their derivatives (tetrahydrocannabinol [THC], naphthoylindoles, and cyclohexylphenols) and determined their binding affinity to CB1R, which is known as a dependence-related target. We calculated the molecular descriptors for dataset compounds using an R/CDK (R package integrated with CDK, version 3.5.0) toolkit to build QSAR regression models. These models were established and statistical evaluations were performed using the mlr and plsr packages in R software. The most reliable QSAR model was obtained from the partial least squares regression method via external validation. This model can be applied in vivo to predict the addictive properties of illicit new SCs. Using a limited number of dataset compounds and our own experimental activity data, we built a QSAR model for SCs with good predictability. This QSAR modeling approach provides a novel strategy for establishing an efficient tool to predict the abuse potential of various SCs and to control their illicit use.
ARTICLE | doi:10.20944/preprints202209.0353.v1
Subject: Medicine & Pharmacology, Obstetrics & Gynaecology Keywords: Africa; Maternal mortality rate; Joinpoint regression analysis; mortality; trends
Online: 23 September 2022 (03:06:07 CEST)
Background: The United Nations Sustainable Development Goals state that by 2030, the global maternal mortality rate (MMR) should be lower than 70 per 100,000 live births. MMR remains one of Africa's leading causes of death among women. This research aims to study regional trends in maternal mortality in Africa. Methods: We extracted data on maternal mortality rates per 100,000 live births from the World Bank database for 1990-2015. Joinpoint regression was used to study the trends and estimate the annual percent change (APC). Results: Maternal mortality decreased in Africa over the study period by an average APC of -2.6%. All regions showed significant downward trends, with the sharpest decreases in East Africa. Only the North African region is close to the United Nations Sustainable Development Goals for maternal mortality. The remaining sub-Saharan African regions are still far from achieving the goals. Conclusions: Maternal mortality has decreased in Africa, especially in East Africa. The only region close to the United Nations target is North Africa; the remaining sub-Saharan African regions are still far from achieving the goals. These results could be used for the development of regional policies.
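Joinpoint software fits segmented log-linear models and reports an APC per segment; for a single segment, the APC computation reduces to a log-linear least-squares fit. A sketch on a synthetic series built to decline 2.6% per year:

```python
import numpy as np

# Synthetic MMR series declining 2.6% per year from 700 per 100,000 in 1990
years = np.arange(1990, 2016)
mmr = 700.0 * (1 - 0.026) ** (years - 1990)

# APC from a log-linear fit: log(rate) = a + b*year, APC = (exp(b) - 1) * 100
b, a = np.polyfit(years, np.log(mmr), 1)
apc = (np.exp(b) - 1) * 100
print(round(apc, 1))   # → -2.6
```

The full joinpoint procedure additionally searches for the change points at which the slope b (and hence the APC) shifts.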
ARTICLE | doi:10.20944/preprints202208.0445.v1
Subject: Social Sciences, Economics Keywords: Adult children's education; parental longevity; truncated regression; emotional support.
Online: 26 August 2022 (04:18:44 CEST)
Background: In some developing countries, such as China, the population is aging rapidly while residents' average years of schooling are constantly increasing. However, the question of whether adult children’s education has an effect on the longevity of their older parents remains inadequately studied. Methods: This paper uses China Health and Retirement Longitudinal Survey (CHARLS) data to estimate the causal impact of adult children's education on their parents' longevity. Identification is achieved by using a truncated regression model and using historical education data as instrumental variables for adult children’s education. Results: For every unit increase in adult children’s education, the father’s and mother’s longevity increased by 0.89 years and 0.75 years, respectively. Mechanism analysis shows that adult children's education has a significant positive impact on parents' emotional support, financial support and self-reported health. Further evidence shows that for every unit increase in adult children’s education, the father-in-law’s and mother-in-law’s longevity increased by 0.40 years and 0.46 years, respectively. Conclusions: We conclude that improving adult children’s education can increase parents’ and parents-in-law’s longevity. Adult children’s education might contribute to the longevity of older parents through three channels: providing emotional support, providing economic support, and affecting parents’ health.
ARTICLE | doi:10.20944/preprints202205.0255.v1
Subject: Life Sciences, Biophysics Keywords: SILCS; hERG channel; Physicochemical properties; Multiple linear regression; FragMaps
Online: 19 May 2022 (08:46:24 CEST)
The human ether-a-go-go-related gene (hERG) potassium channel is a well-known contributor to drug-induced cardiotoxicity and therefore an extremely important target in safety assessments of drug candidates. Ligand-based approaches in connection with quantitative structure-activity relationship (QSAR) analyses have been developed to predict hERG toxicity. The availability of the recently published cryogenic electron microscopy (cryo-EM) structure of the hERG channel has opened the prospect of using structure-based simulation and docking approaches for hERG drug liability predictions. Recently, the idea of combining structure- and ligand-based approaches for modeling hERG drug liability has gained momentum, offering improved predictability compared to ligand-based QSAR practices alone. The present article demonstrates combining the structure-based SILCS (site identification by ligand competitive saturation) approach with physicochemical properties to develop predictive models for hERG blockade. This combination leads to improved model predictability based on Pearson’s R and the percent-correct metric (which represents rank-ordering of ligands) for different validation sets of hERG blockers involving diverse chemical scaffolds and a wide range of pIC50 values. The inclusion of the SILCS structure-based approach allows determination of the hERG region to which compounds bind and the contribution of different chemical moieties to blockade, thereby facilitating rational ligand design to minimize hERG liability.
ARTICLE | doi:10.20944/preprints202205.0240.v1
Subject: Social Sciences, Economics Keywords: Credit constraints; Export; SMEs; Instrumental variable; Probit regression; Vietnam
Online: 18 May 2022 (10:35:32 CEST)
Export participation and restricted access to external formal credit are two factors attracting meticulous attention from researchers and policymakers, especially in developing countries. Exploring the interactive relationship between these factors in both static and dynamic models is the purpose of this study. The study uses data sets from small and medium-sized manufacturing enterprises (SMEs) in Vietnam for the period 2009-2015. The instrumental variable approach is implemented to deal with the endogenous variable problem in the model. The results show an effect of credit constraints on firms’ exporting status, and that continuous exporting is likely to ease credit constraints.
ARTICLE | doi:10.20944/preprints202205.0032.v1
Subject: Social Sciences, Organizational Economics & Management Keywords: digitalisation; sustainability; sustainable development goals; European Union; regression equations
Online: 5 May 2022 (10:24:13 CEST)
Digitalisation provides access to an integrated network of information that can benefit society and business. Building digital networks and society by digital means can create unique opportunities to strategically address the sustainable development challenges of the United Nations Sustainable Development Goals (SDGs), ensuring higher productivity, better education and a more equality-oriented society. This paper describes the potential of digitalisation for the society and business of the future. The authors examine the links between digitalisation and sustainability in the European Union countries. A methodology for the research is suggested in the paper, and the linear regression method is applied. The results showed links with five SDGs, focusing on society and business, and all these links are captured in the constructed equations for each SDG. The suggested solution is statistically valid and supports the novelty of the research. Among the digitalisation indicators, only mobile-cellular subscriptions and fixed-broadband sub-basket prices partly have no effect on the researched sustainable development indicators.
ARTICLE | doi:10.20944/preprints202112.0455.v1
Subject: Medicine & Pharmacology, Other Keywords: COVID- 19; Durbin-Watson statistic; Multiple Linear Regression; Multicollinearity
Online: 28 December 2021 (16:11:44 CET)
This paper discusses the application of statistical modeling to interpret a health system crisis in Sri Lanka due to COVID-19. A strong focus on the preventive approach and contact tracing, with rational utilization of available resources, describes Sri Lanka’s response to COVID-19 prevention and mitigation. Early contact tracing, preemptive quarantining, isolation, and treatment were implemented as a concerted effort. This approach, proven efficient during the early phase of the pandemic, was no longer sustainable once there was a rapid increase in COVID-19 patients from July 2021 onwards, exceeding the health system capacity. The country’s COVID-19 situation during the period from 1st August 2021 to 31st October 2021 was taken into consideration. Variables used for the analysis were the total number of cases, recovered cases, comorbid and O2-dependent patients, ICU patients, and deaths. A regression model was applied to analyze the data using the EViews 12 (x64) software application. The correlation coefficients of all the independent variables under consideration imply that they have a strong positive relationship with the number of deaths that occurred during the said period. According to the computed multiple linear regression model, the number of positive cases and O2-dependent patients have a positive relationship with the dependent variable. Further, the Durbin-Watson statistic of the model and the multicollinearity test indicate that it is free from serial correlation and that the model fits well. From the perspective of epidemiological control, these findings highlight the importance of keeping the number of cases within the limits of health system capacity.
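The reported diagnostics can be sketched directly in NumPy: the Durbin-Watson statistic from the residuals, and variance inflation factors (one common multicollinearity check; the abstract does not specify which test EViews ran) from the predictors. Both examples below use invented data:

```python
import numpy as np

def durbin_watson(resid):
    """DW statistic: ~2 means no serial correlation, <2 positive, >2 negative."""
    return np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)

def vif(X):
    """Variance inflation factor of each column of X (X without intercept column)."""
    n, k = X.shape
    out = []
    for j in range(k):
        # Regress column j on the remaining columns plus an intercept
        Xj = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(Xj, X[:, j], rcond=None)
        resid = X[:, j] - Xj @ beta
        r2 = 1 - np.sum(resid ** 2) / np.sum((X[:, j] - X[:, j].mean()) ** 2)
        out.append(1 / (1 - r2))
    return np.array(out)

# Alternating residuals (strong negative serial correlation) push DW toward 4
e = np.tile([1.0, -1.0], 50)
print(round(durbin_watson(e), 2))   # → 3.96

# Two nearly collinear predictors inflate each other's VIF
z = np.linspace(0, 1, 100)
Xc = np.column_stack([z, z + 1e-3 * np.sin(50 * z), np.cos(7 * z)])
print(np.round(vif(Xc), 1))
```

A DW value near 2 together with modest VIFs (often read against a rule-of-thumb cutoff of 5 or 10) supports the paper's claim that the fitted model is free of serial correlation and severe multicollinearity.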
ARTICLE | doi:10.20944/preprints202111.0227.v1
Subject: Social Sciences, Marketing Keywords: Lolita fashion; multiple regression; decision tree; social media; XGBoost
Online: 12 November 2021 (14:54:04 CET)
Although the impact of social media on the marketing of fashion products has been extensively investigated, little evidence is available on how the platforms influence sales prediction. Focusing on Lolita fashion, this study investigates the impact of social media marketing on the sales volume prediction of fashion products. Essentially, we analyzed marketing data, including comments, likes, and shares from the Weibo social platform, to forecast future sales, examine how to enhance profit performance, and make production decisions. Using a quantitative approach, we tested three different prediction models: multiple regression, decision tree, and XGBoost. The results revealed that increasing comments and decreasing the number of likes could significantly improve the sales volumes of Lolita products. In contrast, shares exerted a less significant impact on sales. Regarding prediction models, XGBoost was found to be the best method. In the fashion industry, social media is a useful tool for forecasting market trends. A limitation of this study is that only one social media platform was used to extract data, which might limit the generalizability of the findings.
ARTICLE | doi:10.20944/preprints202105.0536.v1
Subject: Biology, Anatomy & Morphology Keywords: Argan biosphere reserve; Climate change; Rainfall; Temperature; Woodland regression
Online: 24 May 2021 (07:44:25 CEST)
This paper explores the effect of climate change on the regression of the Argan tree (Argania spinosa L. Skeels) woodland, focusing on the Argan Biosphere Reserve and especially in the Souss plain (Western Morocco). Rainfall and temperature data of four sites within the Argan Biosphere Reserve were analyzed over the last 60 years to assess any climatic change. Regression curves applied to the dataset showed an important decrease in rainfall (18 to 26 %) in the four locations as well as an increase in temperature (1 to 2 °C). These changes may have a detrimental effect on the Argan woodland although human factors have been reported to be the main factor of its regression. It can therefore be concluded that the reduction in rainfall and the increase in temperature should now be considered as factors of Argan woodland regression.
ARTICLE | doi:10.20944/preprints202104.0622.v1
Subject: Engineering, Automotive Engineering Keywords: Complex Regression, Least-Squares Techniques, Advanced Metering Infrastructure (AMI)
Online: 23 April 2021 (09:46:32 CEST)
This paper uses the complex regression analysis method to establish customer load regression models that consider economic indicators, temperature and rainfall. The proposed models are then used to study the feasibility of forecasting future energy sales and summer peak load demand. First, this paper used least-squares techniques to derive regression models, considering economic indicators and temperature, for 34 customer energy sales series and total energy sales. In addition, AMI high-voltage customer demand data and 24-hour system generating capacity were adopted to forecast the summer peak load. The data analysis was carried out with EViews software in order to verify the feasibility of the research framework. The study found that forecasting accuracy is low only when the model mixes temperature with high-voltage demands. When high-voltage demand data and 24-hour system generating capacity are combined to forecast peak load, the average error is ±0.87%, and for the majority of the energy sales forecasting models the average error is within ±3%. These results can serve as a future reference for power companies.
Subject: Mathematics & Computer Science, Artificial Intelligence & Robotics Keywords: Face detection; CSEM; Deep learning; GPU; CPU; Benchmark; Regression
Online: 27 July 2020 (14:54:15 CEST)
Face recognition is a valuable forensic tool for criminal investigators since it helps in identifying individuals in scenarios of criminal activity, such as tracking fugitives or investigating child sexual abuse. It is, however, a very challenging task, as it must handle low-quality images from real-world settings and fulfill real-time requirements. Deep learning approaches for face detection have proven very successful, but they require large computation power and processing time. In this work, we evaluate the speed-accuracy tradeoff of three popular deep-learning-based face detectors on the WIDER Face and UFDD data sets on several CPUs and GPUs. We also develop a regression model capable of estimating the performance, both in terms of processing time and accuracy. We expect this to become a very useful tool for end users in forensic laboratories when estimating the performance of different face detection options. Experimental results showed that the best speed-accuracy tradeoff is achieved with images resized to 50% of the original size on GPUs and to 25% of the original size on CPUs. Moreover, performance can be estimated using multiple linear regression models with a mean absolute error (MAE) of 0.113, which is very promising for the forensic field.
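The performance-estimation model is a multiple linear regression fitted to benchmark measurements; a sketch with invented benchmark records follows (the three predictors are illustrative, not the paper's actual feature set):

```python
import numpy as np

# Hypothetical benchmark records: [image_scale, gpu_tflops, detector_depth]
X = np.array([[0.25, 10, 50], [0.5, 10, 50], [1.0, 10, 50],
              [0.25, 20, 100], [0.5, 20, 100], [1.0, 20, 100],
              [0.25, 5, 50], [0.5, 5, 100], [1.0, 5, 150]])
# Synthetic processing time (ms): grows with scale and depth, shrinks with TFLOPS
time_ms = 20 * X[:, 0] + 0.1 * X[:, 2] - 0.5 * X[:, 1] + 30

# Fit the multiple linear regression and measure the in-sample MAE
A = np.column_stack([np.ones(len(X)), X])
beta, *_ = np.linalg.lstsq(A, time_ms, rcond=None)
mae = np.mean(np.abs(time_ms - A @ beta))
print(round(mae, 3))   # → 0.0 (the toy data are exactly linear)
```

On real benchmark data the residuals would of course be nonzero; the paper reports an MAE of 0.113 on its own measurements.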
ARTICLE | doi:10.20944/preprints202001.0377.v1
Subject: Earth Sciences, Geophysics Keywords: ERT method; regression model; tailings pond; heavy metal; reclamation
Online: 31 January 2020 (05:04:37 CET)
The legacy mining industry has left a large number of tailings ponds exposed to water and wind erosion, causing serious environmental and health problems. Prior to rehabilitation actions, deep sampling of the materials infilling the pond is usually necessary. Thus, the primary objective of this study is to demonstrate the usefulness of the Electrical Resistivity Tomography (ERT) method as a non-invasive tool to determine the physicochemical composition of mine tailings ponds, enabling more efficient and low-cost surveys. To achieve this objective, three ERT profiles were carried out with three boreholes in each profile; from each borehole, three waste samples from different depths were collected, and a geochemical characterization of the samples was carried out. In order to estimate the composition of the infilling wastes in tailings ponds from electrical resistivity measurements, several regression models were calculated for different physicochemical properties and metal concentrations. As a result, a high-resistivity area was depicted in profiles G2 and G3, while a non-resistive area (profile G1) was also found. Relationships among low resistivity values and high salinity, clay content, and high metal concentrations and mobility were established. Specifically, calibrated models were obtained for electrical conductivity, particle sizes of 0.02-50 µm and 50-2000 µm, total Zn and Cd concentrations, and bioavailable Ni, Cd and Fe. Therefore, the ERT technique can be considered a useful tool for mine tailings pond characterization, and it can be used to estimate some physicochemical properties and metal concentrations of this mine waste.
ARTICLE | doi:10.20944/preprints201903.0090.v1
Subject: Engineering, Energy & Fuel Technology Keywords: Sustainable development; House prices; ARIMA; Regression analysis; New Zealand
Online: 7 March 2019 (12:02:50 CET)
The New Zealand housing sector is experiencing rapid growth that boosts the national economy but also results in the loss of valuable resources. In line with this growth, the housing market for both residential and business purposes has been booming, as have house prices. To sustain housing development, it is critical to accurately monitor and predict housing prices so as to support the decision-making process in the housing sector. This study applies mathematical methods to predict housing prices, comparing the forecasting performance of two types of models: ARIMA and multiple linear regression. The ARIMA and regression models are developed based on a training-validation sample method. The results show that the ARIMA model generally performs better than the regression model. However, the regression model reveals, to some extent, significant correlations between house prices in New Zealand and macro-economic conditions.
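The training-validation comparison described above can be illustrated with a toy autoregression (the AR core of ARIMA, without differencing or MA terms) against a simple trend regression. The series below is synthetic, not New Zealand house prices, and the trend benchmark stands in for the study's macro-economic regression:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic quarterly price index: an AR(1)-with-drift process stands in for
# real data; the study fitted ARIMA and multiple regression to actual prices.
n = 120
y = np.empty(n)
y[0] = 100.0
for t in range(1, n):
    y[t] = 5.0 + 0.96 * y[t - 1] + rng.normal(0, 1.0)

train, valid = y[:100], y[100:]

# Model A: AR(1) fitted by least squares on the training window.
X = np.column_stack([np.ones(99), train[:-1]])
c, phi = np.linalg.lstsq(X, train[1:], rcond=None)[0]
pred_ar = c + phi * y[99:-1]          # one-step-ahead forecasts

# Model B: linear time trend, the simplest regression benchmark.
t_idx = np.arange(100)
a, b = np.polyfit(t_idx, train, 1)[::-1]
pred_tr = a + b * np.arange(100, 120)

def rmse(p, o):
    return np.sqrt(np.mean((p - o) ** 2))

print(round(rmse(pred_ar, valid), 2), round(rmse(pred_tr, valid), 2))
```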
ARTICLE | doi:10.20944/preprints201811.0394.v3
Subject: Engineering, Electrical & Electronic Engineering Keywords: marine current turbine; blade attachment; sparse autoencoder; softmax regression
Online: 12 February 2019 (09:59:09 CET)
The development and application of marine current energy are attracting more and more attention around the world. Due to the harshness of the working environment, fault diagnosis of a marine current generation system is both important and difficult. In this paper, an underwater image is chosen as the fault-diagnosis signal after comparing different sensors. This paper proposes a diagnosis method based on a sparse autoencoder (SA) and softmax regression (SR): the SA is used to extract features, and SR is used to classify them. Images are used to monitor whether benthos have attached to the blade and to determine the corresponding degree of attachment. The experimental results show that, compared with other methods, the proposed method can diagnose blade attachment with higher accuracy.
ARTICLE | doi:10.20944/preprints201809.0076.v1
Subject: Medicine & Pharmacology, Pharmacology & Toxicology Keywords: pharmacovigilance; drug safety; segmented regression; interrupted time series; variation
Online: 5 September 2018 (01:27:54 CEST)
Introduction Pharmacovigilance may detect safety issues after the marketing of medications, which can result in regulatory action such as direct healthcare professional communications (DHPCs). DHPCs can be effective in changing prescribing behaviour; however, the extent to which prescribers vary in their response is unknown. This study aims to explore changes in prescribing and in prescribing variation among GP practices following a DHPC on the safety of mirabegron, a medication to treat overactive bladder (OAB). Methods This is an interrupted time series study of English GP practices from 2014 to 2017. NHS Digital provided monthly statistics on aggregate practice-level prescribing and practice characteristics (practice staff and registered patient profiles, Quality & Outcomes Framework indicators, and deprivation of the practice area). The primary outcome was monthly mirabegron items as a percentage of all OAB drug items. The exposure was a DHPC issued by the European Medicines Agency in September 2015. Variation between practices in mirabegron prescribing before and after the DHPC was assessed using the systematic component of variation (SCV). Multilevel segmented regression with random effects quantified the change in the level and trend of prescribing after the DHPC. Practice characteristics were assessed for their association with a reduction in prescribing following the DHPC. Results This study included 7,408 practices. During September 2015, 88.9% of practices prescribed mirabegron, and mirabegron composed a mean of 8.2% (SD 6.8) of OAB items. Variation between practices was classified as very high, and the median SCV did not change significantly (p=0.11) in the 6 months after the September 2015 DHPC (12.4) compared to before (11.6). Before the DHPC, there was a monthly upward trend in the mirabegron percentage of 0.294 (95% CI 0.287 to 0.301) percentage points.
There was no significant change in level in the month immediately after the DHPC (-0.023, 95% CI -0.105 to 0.058); however, there was a significant reduction in trend (-0.036, 95% CI -0.049 to -0.023). Higher numbers of registered patients and of patients aged ≥65 years, and practice area deprivation, were associated with a significant decrease in the level and slope of mirabegron prescribing post-DHPC. Conclusion Variation in mirabegron prescribing was high over the study period and did not change substantively following the DHPC. There was no immediate prescribing change post-DHPC, although monthly growth did slow. Knowledge of the degree of variation in, and determinants of, response to safety communications may allow those who do not change prescribing to be provided with additional support.
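A minimal sketch of the segmented (interrupted time series) regression design, using synthetic prescribing percentages rather than the study's data, estimates the baseline trend plus the post-intervention level and trend changes in one least-squares fit:

```python
import numpy as np

rng = np.random.default_rng(2)

# Monthly prescribing percentage with a pre-intervention upward trend of
# 0.3 points per month and a post-intervention trend reduction; all numbers
# are illustrative, not the study's estimates.
months = np.arange(48)
post = (months >= 24).astype(float)            # DHPC issued at month 24
t_post = np.where(post == 1, months - 24, 0.0)
y = 2.0 + 0.3 * months - 0.05 * t_post + rng.normal(0, 0.1, 48)

# Segmented regression: intercept, baseline trend, level change, trend change.
X = np.column_stack([np.ones(48), months, post, t_post])
b0, trend, level_chg, trend_chg = np.linalg.lstsq(X, y, rcond=None)[0]
print(round(trend, 3), round(level_chg, 3), round(trend_chg, 3))
```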
ARTICLE | doi:10.20944/preprints201807.0353.v1
Subject: Mathematics & Computer Science, Probability And Statistics Keywords: corporate default swap spreads; correlation networks; vector autoregressive regression
Online: 19 July 2018 (10:16:11 CEST)
We propose a novel credit risk measurement model for Corporate Default Swap spreads that combines vector autoregressive regression with correlation networks. We focus on the sovereign CDS spreads of a collection of countries, which can be regarded as idiosyncratic measures of credit risk. We model them by means of a vector autoregressive regression model composed of a time-dependent, country-specific component and a contemporaneous component that describes contagion effects among countries. To disentangle the two components, we employ correlation networks derived from the correlation matrix of the reduced-form residuals. The proposed model is applied to ten countries that are representative of the recent financial crisis: top borrowing/lending countries and peripheral European countries. The empirical findings show that the proposed model is a good predictor of CDS spread movements and that the contemporaneous component decreases prediction errors with respect to a simpler autoregressive model. From an applied viewpoint, core countries appear to import risk, as contagion increases their CDS spreads, whereas peripheral countries appear to be exporters of risk. Greece is an unfortunate exception, as its spreads seem to increase through both idiosyncratic factors and contagion effects.
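The two-component decomposition can be sketched as follows: a VAR(1) captures the country-specific lag structure, and the correlation of the reduced-form residuals captures the contemporaneous (contagion) channel. Data are simulated for two hypothetical countries, not actual CDS spreads:

```python
import numpy as np

rng = np.random.default_rng(3)

# Two synthetic spread series: each depends on its own lag, while the
# innovations are contemporaneously correlated (the "contagion" channel).
n = 500
cov = np.array([[1.0, 0.6], [0.6, 1.0]])
eps = rng.multivariate_normal([0, 0], cov, size=n)
y = np.zeros((n, 2))
for t in range(1, n):
    y[t] = np.array([0.8, 0.5]) * y[t - 1] + eps[t]

# Country-specific component: VAR(1) fitted equation by equation.
X = np.column_stack([np.ones(n - 1), y[:-1]])
B, *_ = np.linalg.lstsq(X, y[1:], rcond=None)
resid = y[1:] - X @ B

# Contemporaneous component: correlation of the reduced-form residuals.
rho = np.corrcoef(resid.T)[0, 1]
print(B[1:].round(2), round(float(rho), 2))
```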
ARTICLE | doi:10.20944/preprints201807.0087.v1
Subject: Keywords: Nigeria; financial development; economic growth; threshold regression; time series
Online: 5 July 2018 (08:39:38 CEST)
The relationship between economic growth, growth volatility and financial sector development continues to attract attention in the theoretical and empirical literature. Over time, some studies have hypothesized that finance has a causal, linear relationship with growth. Recently, several authors have contradicted this claim, arguing that the relationship between finance and growth is nonlinear. We investigate these claims for Nigeria for the period 1970-2015 using semi-parametric econometric methods, namely Hansen's sample-splitting technique and threshold estimator. We find no evidence of the 'too much finance' effect claimed by many researchers in recent times. We show that the relationship between financial development and economic growth is U-shaped; the same is true for the relationship between financial development and growth volatility. We also discuss the policy implications of our findings and recommend financial innovation and the decentralization of stock exchanges to boost access to financial services, in addition to improved regulation to enhance financial market efficiency.
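A Hansen-style sample split can be sketched as a grid search for the threshold that minimises the pooled sum of squared residuals of two regime-specific regressions. The data below are synthetic and the variables hypothetical; the real study uses Nigerian time series:

```python
import numpy as np

rng = np.random.default_rng(4)

# Growth responds to financial development with a different intercept and
# slope once a threshold (here, 0.5) is crossed.
n = 300
fin = rng.uniform(0, 1, n)                       # financial development proxy
growth = np.where(fin < 0.5, 0.5 + 2.0 * fin, 2.5 - 0.5 * fin) \
         + rng.normal(0, 0.05, n)

def ssr_at(tau):
    # total SSR from fitting a separate line in each regime
    total = 0.0
    for mask in (fin < tau, fin >= tau):
        X = np.column_stack([np.ones(mask.sum()), fin[mask]])
        b, *_ = np.linalg.lstsq(X, growth[mask], rcond=None)
        total += ((growth[mask] - X @ b) ** 2).sum()
    return total

# Search candidate thresholds between the 15% and 85% sample quantiles.
grid = np.quantile(fin, np.linspace(0.15, 0.85, 71))
tau_hat = grid[np.argmin([ssr_at(t) for t in grid])]
print(round(float(tau_hat), 2))
```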
ARTICLE | doi:10.20944/preprints201806.0030.v1
Subject: Earth Sciences, Oceanography Keywords: synthetic aperture radar; automatic identification system; ice thickness; regression
Online: 4 June 2018 (10:28:38 CEST)
Ship speeds extracted from AIS data vary with ice conditions. We extrapolated this variation with SAR data to a chart of expected ice-going speed. The study covers the Gulf of Bothnia in March 2013 and ships with ice class 1A Super that are able to navigate without icebreaker assistance. The speed was normalized to 0-10 for each ship. As the matching between AIS and SAR was complicated by ice drift during the time gap, which ranged from hours to two days, we calculated a set of local SAR statistics over several scales. We used random tree regression to estimate the speed. The accuracy was quantified by the mean squared error (MSE) and the fraction of estimates close to the actual speeds; both depended strongly on the route and the day. MSE varied from 0.4 to 2.7 units² for daily routes. 65% of the estimates deviated less than one unit, and 82% less than 1.5 units, from the AIS speeds. The estimated daily mean speeds were close to the observations. The largest speed decreases were reproduced by the estimator only in a dampened form or not at all; this improved when ice chart thickness was included as a predictor.
ARTICLE | doi:10.20944/preprints201803.0093.v1
Subject: Engineering, Control & Systems Engineering Keywords: linear regression; covariance matrix; data association; sensor fusing; SLAM
Online: 13 March 2018 (04:06:56 CET)
Linear regression is a basic tool in mobile robotics, since it enables accurate estimation of straight lines from range-bearing scans or in digital images, which is a prerequisite for reliable data association and sensor fusion in the context of feature-based SLAM. This paper discusses, extends and compares existing algorithms for line fitting that are applicable also in the case of strong covariances between the coordinates at each single data point, which must not be neglected when range-bearing sensors are used. In addition, the determination of the covariance matrix, which is required for stochastic modeling, is considered. The main contribution is a new closed-form error model of straight lines for calculating the covariance matrix quickly and reliably from just a few comprehensible and easily obtainable parameters. The model can be applied widely in any case where a line is fitted to a number of distinct points, even without a-priori knowledge of the specific measurement noise. Extensive simulations demonstrate the performance and robustness of the new model in comparison to existing approaches.
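A common baseline such error models build on is orthogonal (total) least squares line fitting, in which the line normal is the smallest-eigenvalue eigenvector of the point scatter matrix. The sketch below fits a line n·p = d to noisy points; the paper's closed-form covariance of the line parameters is not reproduced here:

```python
import numpy as np

rng = np.random.default_rng(5)

# Noisy points scattered around the line x + 2y = 4.
pts = np.array([[t, (4 - t) / 2] for t in np.linspace(0, 4, 40)])
pts += rng.normal(0, 0.02, pts.shape)

# Total least squares: the normal is the eigenvector of the scatter matrix
# with the smallest eigenvalue.
mean = pts.mean(axis=0)
scatter = (pts - mean).T @ (pts - mean)
w, v = np.linalg.eigh(scatter)       # eigenvalues in ascending order
normal = v[:, 0]
if normal[0] < 0:                    # fix the arbitrary eigenvector sign
    normal = -normal
d = normal @ mean                    # line: normal . p = d
print(normal.round(2), round(float(d), 2))
```

The true parameters are n = (1, 2)/√5 ≈ (0.45, 0.89) and d = 4/√5 ≈ 1.79.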
ARTICLE | doi:10.20944/preprints201803.0084.v1
Subject: Engineering, Civil Engineering Keywords: anfis; missing data; multiple regression; normal ratio method; Yeşilırmak
Online: 12 March 2018 (07:00:46 CET)
Good data analysis is required for the optimal design of water resources projects. However, data are not always collected regularly, due to material or technical reasons, which results in incomplete-data problems. The available data and data length are of great importance for solving these problems, and various studies have been conducted on the treatment of missing data. This study used data from the flow observation stations on the Yeşilırmak River in Turkey. In the first part of the study, models based on ANFIS, multiple regression and the Normal Ratio Method were generated and compared in order to complete missing data. In the second part, the minimum number of data points required for ANFIS models was determined using the optimum ANFIS model. Of all the methods compared in this study, the ANFIS models yielded the most accurate results, and a 10-year training set was found to be sufficient.
ARTICLE | doi:10.20944/preprints201801.0090.v1
Subject: Social Sciences, Econometrics & Statistics Keywords: clustering; curve fitting; nonparametric regression; smoothing data; polynomial approximation
Online: 10 January 2018 (09:48:23 CET)
The nonlinear nonparametric statistics (NNS) algorithm offers new tools for curve fitting. A relationship between k-means clustering and NNS regression points is explored, with graphics showing a perfect fit in the limit. The goal of this paper is to demonstrate NNS as a form of unsupervised learning, and to supply a proof of its limit condition. The procedural similarity NNS shares with vector quantization is also documented, along with identical outputs for NNS and a k-nearest-neighbours classification algorithm under a specific NNS setting. Fisher's iris data and artificial data are used. Even though a perfect fit should obviously be reserved for instances of high signal-to-noise ratio, NNS permits greater flexibility by offering a large spectrum of possible fits, from linear to perfect.
ARTICLE | doi:10.20944/preprints201705.0007.v1
Subject: Social Sciences, Economics Keywords: adoption; land degradation; poisson regression; sustainable land management practices
Online: 1 May 2017 (08:33:17 CEST)
Land degradation is a serious impediment to improving rural livelihoods in Eastern Africa. This paper identifies major land degradation patterns and causes, and analyzes the determinants of sustainable land management (SLM) in three countries (Ethiopia, Malawi and Tanzania). The results show that land degradation hotspots cover about 51%, 41%, 23% and 23% of the terrestrial areas in Tanzania, Malawi and Ethiopia respectively. The analysis of nationally representative household surveys shows that the key drivers of SLM in these countries are biophysical, demographic, regional and socio-economic determinants. Secure land tenure, access to extension services and market access are some of the determinants incentivizing SLM adoption. The implication of this study is that policies and strategies that facilitate secure land tenure and access to SLM information are likely to incentivize investments in SLM. Local institutions providing credit services, inputs such as seeds and fertilizers, and extension services must also not be ignored in development policies.
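Poisson regression, the technique named in this entry's keywords for modelling adoption counts, can be sketched with a Newton-Raphson (IRLS) fit of a log-link GLM. The covariates and coefficients below are invented for illustration, not survey estimates:

```python
import numpy as np

rng = np.random.default_rng(6)

# Count of SLM practices adopted per household as a function of two
# hypothetical covariates (e.g. tenure security, market access), with a log
# link: E[y] = exp(X @ beta). Fitted by Newton-Raphson (IRLS).
n = 1000
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.uniform(size=n)])
beta_true = np.array([0.2, 0.5, -0.3])
y = rng.poisson(np.exp(X @ beta_true))

beta = np.zeros(3)
for _ in range(25):
    mu = np.exp(X @ beta)
    grad = X.T @ (y - mu)                 # score of the Poisson likelihood
    hess = X.T @ (X * mu[:, None])        # Fisher information (W = mu)
    beta = beta + np.linalg.solve(hess, grad)
print(beta.round(2))
```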
ARTICLE | doi:10.20944/preprints202207.0383.v1
Subject: Engineering, Marine Engineering Keywords: machine learning; forecast; regression models; Liquified Natural Gas; maritime transportation
Online: 26 July 2022 (03:50:12 CEST)
Recent maritime legislation demands the transformation of the sector towards greener and more energy-efficient transportation. Liquified Natural Gas (LNG) seems a promising alternative fuel that could replace conventional fuel sources. Various studies have focused on the prediction of the LNG price; however, no previous work has addressed forecasting the spot charter rate of LNG carrier ships, which is important knowledge for maritime industries and companies when it comes to decision-making. Therefore, this study focuses on the development of a machine learning pipeline that addresses this problem by: (i) forming a dataset with variables relevant to LNG; (ii) identifying the variables that impact the freight price of LNG carriers; (iii) developing and evaluating regression models for short- and mid-term forecasts. The results showed that the General Regression Neural Network presented a stable overall performance for 2-, 4- and 6-month forecasts.
ARTICLE | doi:10.20944/preprints202205.0391.v1
Subject: Social Sciences, Business And Administrative Sciences Keywords: circularity of materials; circular activity; recycling; regression model; key elements
Online: 30 May 2022 (09:59:03 CEST)
The authors review the circularity of materials, which is important for stimulating circular activity processes. The theoretical part starts by describing the characteristics of circular activity and comparing circular and linear systems in terms of recycling. The authors then examine key elements important for circularity and the results of an examination of various sectors. They formed a correlation matrix and used a dynamic regression model to identify the circular material use rate. The authors suggest a three-level methodology; applying it yielded a dynamic regression model that could be used to forecast the circular material use rate in European Union countries. The results show that private investment in recycling, the recycling of electronic waste, and the recycling of other municipal waste categories are important in seeking to increase the circular material use rate.
ARTICLE | doi:10.20944/preprints202205.0095.v1
Subject: Mathematics & Computer Science, Information Technology & Data Management Keywords: Regression; AI based Tornado Analysis; Decision Support System; Mobile Application
Online: 9 May 2022 (03:15:11 CEST)
Tropical cyclones devastate large areas, take numerous lives and damage extensive property in Bangladesh. Research on landfalling tropical cyclones affecting Bangladesh has primarily focused on events occurring since AD 1960, with limited work examining earlier historical records. We rectify this gap by developing a new tornado catalogue that includes present and past records of tornados across Bangladesh, maximizing the use of available sources. This new tornado database captures 119 records from 1838 to 2020, covering events that caused 8,735 deaths and 97,868 injuries and left more than 102,776 people affected in total. Moreover, using these new tornado data, we developed an end-to-end system that allows a user to explore and analyze the full range of tornado data in multiple scenarios. A user can select a date range or search a particular location, and all the tornado information, along with Artificial Intelligence (AI) based insights within the selected scope, is dynamically presented on a range of devices, including iOS, Android, and Windows. Using a set of interactive maps, charts, graphs, and visualizations, the user gains a comprehensive understanding of the historical records of tornados, cyclones and associated landfalls, with detailed data distributions and statistics.
ARTICLE | doi:10.20944/preprints202201.0165.v1
Subject: Mathematics & Computer Science, Probability And Statistics Keywords: HIV/TB co-infected Mortality; Residential Variations; Multilevel Logistic Regression
Online: 12 January 2022 (13:34:06 CET)
The purpose of this study was to identify the factors that affect mortality among adult HIV/TB co-infected patients and to examine how the association between nutrition and mortality differs by residence. A retrospective cohort of 417 patients who fulfilled our criteria was included. Multilevel logistic regression models were used, with the MLwiN and SPSS software used to estimate the parameters. The variance of the random factor in the empty model was significant, which indicates that there were residential differences in HIV/TB co-infected mortality and that multilevel analysis was an appropriate approach for further analysis. The prevalence of death among HIV/TB co-infected patients was 12.9% during the study period. Functional status, patient age, WHO clinical stage, nutritional status, CD4 count, regimen, and BMI were found to be significant determinants of HIV/TB co-infected mortality. In our study, bedridden patients, patients in WHO clinical stage IV, older patients, patients on a second-line regimen and patients with low CD4 cell counts were at greater risk of death. The study also revealed that poor nutritional status increased the risk of mortality among HIV/TB co-infected patients and that this risk varied with the residence of the patients (those in rural areas were at greater risk).
ARTICLE | doi:10.20944/preprints202105.0216.v1
Subject: Social Sciences, Accounting Keywords: Built environment; pedestrian volume; stepwise regression; principal component analysis; Melbourne
Online: 10 May 2021 (15:34:00 CEST)
Previous studies have mostly examined how sustainable cities try to promote non-motorized travel by creating a walking-friendly environment. Such studies provide little insight into how the built environment affects pedestrian volume in high-density areas. This paper presents a methodology that combines Pearson correlation analysis, stepwise regression, and principal component analysis to explore the internal correlation and potential impact of built environment variables. To study this relationship, cross-sectional data for the Melbourne central business district were selected. Pearson's correlation coefficient confirmed that the visible green index and intersection density were not correlated with pedestrian volume. The results from stepwise regression showed that land-use mix degree, public transit stop density, and employment density could be associated with pedestrian volume. Moreover, two principal components were extracted by factor analysis. The first component revealed an internal correlation in which land-use and amenity components were positively associated with pedestrian volume; component 2 represents parking facility density, which is negatively related to pedestrian volume. Based on the results, existing street problems and policy recommendations were put forward: diversifying community services within walking distance, improving the service level of the public transit system, and restricting on-street parking in Melbourne.
HYPOTHESIS | doi:10.20944/preprints202104.0516.v1
Subject: Medicine & Pharmacology, Allergology Keywords: spontaneous regression; tumors; cancer; bacterial therapy; Coley; immunotherapy; hyperthermia; oncology
Online: 19 April 2021 (21:03:16 CEST)
Neither tumor growth nor regression is truly spontaneous, but both may under special circumstances be driven by similar events. We describe a sequence of processes that typically leads to tumor progression but may on occasion inadvertently result in regression. A possible procedure for reducing tumor mass through a controlled intervention is also outlined.
ARTICLE | doi:10.20944/preprints202103.0446.v1
Subject: Chemistry, Analytical Chemistry Keywords: acrylamide; coffee; partial least square regression; NMR; LC-MS/MS
Online: 17 March 2021 (14:48:40 CET)
Acrylamide is probably carcinogenic to humans (International Agency for Research on Cancer, group 2A), with major occurrence in heated, mainly carbohydrate-rich foods. For roasted coffee, a European Union benchmark level of 400 µg/kg acrylamide is of importance. Regularly, acrylamide contents are controlled using liquid chromatography combined with tandem mass spectrometry (LC-MS/MS). This reference method is reliable and precise but laborious because of the necessary sample clean-up procedure and instrument requirements. This research investigates the possibility of predicting the acrylamide content from proton nuclear magnetic resonance (NMR) spectra that are already recorded for other purposes of coffee control. In the NMR spectrum acrylamide is not directly quantifiable, so the aim was to establish a correlation between the reference value and the corresponding NMR spectrum by means of partial least squares (PLS) regression. To this end, 40 commercially available coffee samples with already available LC-MS/MS data and NMR spectra were used as calibration data. To test the accuracy and robustness of the model and its limitations, 50 coffee samples with extreme roasting degrees and blends were additionally prepared as a test set. The PLS model is applicable to the varieties C. arabica and C. canephora, medium to very dark roasted using drum or infrared roasters. The root mean square error of prediction (RMSEP) is 79 µg/kg acrylamide (n=32). The PLS model is judged suitable to predict the acrylamide values of commercially available coffee samples. On the other hand, very light roasts containing more than 1000 µg/kg acrylamide are currently not suitable for PLS prediction.
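The calibration idea, predicting a reference value from a whole spectrum with PLS, can be sketched with a single NIPALS-style PLS1 component. The "spectra" below are random surrogates, not NMR measurements, and the single-component fit stands in for a full multi-component PLS model:

```python
import numpy as np

rng = np.random.default_rng(7)

# Surrogate spectra: 40 samples x 60 points, where the target y is a diffuse
# pattern across the spectrum rather than one visible peak.
n, p = 40, 60
X = rng.normal(size=(n, p))
direction = rng.normal(size=p)
y = X @ direction * 0.1 + rng.normal(0, 0.05, n)

# One PLS1 component (NIPALS step): weight vector along the X-y covariance.
Xc, yc = X - X.mean(0), y - y.mean()
w = Xc.T @ yc
w /= np.linalg.norm(w)
t = Xc @ w                      # latent scores
b = (t @ yc) / (t @ t)          # regress y on the score
y_hat = y.mean() + t * b
rmse = float(np.sqrt(np.mean((y_hat - y) ** 2)))
print(round(rmse, 3))
```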
ARTICLE | doi:10.20944/preprints202005.0147.v1
Online: 9 May 2020 (04:30:32 CEST)
The sudden spread of the severe acute respiratory syndrome coronavirus (Covid-19) has led the world into a major crisis. It has influenced every sector, for example industry, agriculture, public transportation, and the economy. In order to understand how Covid-19 affected the globe, we conducted an investigation characterizing the effects of the pandemic worldwide using Machine Learning (ML) methods. Prediction is a typical data science exercise that helps administrations with planning, objective setting, and anomaly detection. We propose an additive regression model with interpretable parameters that can be naturally adjusted by experts with domain intuition about the time series. We focus on global data from 22nd January 2020 to 26th April 2020, perform dynamic map visualization of the global Covid-19 expansion by date, and predict the spread of the virus across all countries and continents. The major advantages of this work include accurate analysis of country-wise as well as province/state-wise confirmed cases, recovered cases and deaths, prediction of the pandemic's spread, and how far it is expanding globally.
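An additive regression model with interpretable components, a linear trend plus Fourier-term seasonality fitted jointly by least squares, can be sketched as follows. The series is synthetic, not Covid-19 case counts, and the weekly period is an assumption for illustration:

```python
import numpy as np

rng = np.random.default_rng(8)

# Additive model y(t) = g(t) + s(t) + noise: linear trend g plus weekly
# seasonality s, both recoverable as interpretable coefficients.
t = np.arange(96)
y = 0.5 * t + 3 * np.sin(2 * np.pi * t / 7) + rng.normal(0, 0.3, 96)

cols = [np.ones(96), t.astype(float)]
for k in (1, 2):                                  # weekly Fourier pairs
    cols += [np.sin(2 * np.pi * k * t / 7), np.cos(2 * np.pi * k * t / 7)]
X = np.column_stack(cols)
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
trend_slope = beta[1]
print(round(float(trend_slope), 2))
```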
ARTICLE | doi:10.20944/preprints202003.0146.v1
Subject: Social Sciences, Other Keywords: conventional agriculture; land degradation; small-holders; multinomial logistic regression; Nepal
Online: 9 March 2020 (01:24:35 CET)
Land degradation is a critical issue globally, putting our future generations at risk. The decrease in farm productivity over the years is evidence of the severity of land degradation in Nepal. Among the many strategies in place, agroforestry, an integrated tree-based farming practice, is widely recommended to address this productivity issue. This paper thoroughly examines what influences farmers' choice to adopt agroforestry and what discourages adoption. For this, a total of 288 households were surveyed using a structured questionnaire. Two agroforestry practices were compared with conventional agriculture with the help of the Multinomial Logistic Regression (MNL) model. The likelihood of adoption was found to be influenced by gender: male-headed households were more likely to adopt the tree-based farming practice. Having a source of off-farm income was positively associated with the adoption decision of farmers. Area of farmland was found to be the major constraint on agroforestry adoption for smallholder farmers. Other variables with positive effects included livestock herd size, provision of extension services, home-to-forest distance, farmers' group membership and farmers' awareness of the environmental benefits of agroforestry. Irrigation was another adoption constraint faced by farmers in the study area. Households with means of transport and larger family (household) sizes were found to be reluctant towards agroforestry adoption. A collective farming practice could be a strategy to engage smallholder farmers in agroforestry.
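A multinomial (softmax) logistic model of the kind used above, with conventional agriculture as the baseline category, can be sketched with plain gradient ascent. The covariates and coefficient values are synthetic, not the survey's estimates:

```python
import numpy as np

rng = np.random.default_rng(9)

# Three practice choices (class 0 = conventional baseline, classes 1-2 =
# two agroforestry types) driven by two hypothetical covariates.
n, k = 600, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
W_true = np.array([[0.0, 0.5, -0.5],
                   [0.0, 1.0, 0.0],
                   [0.0, 0.0, 1.0]])          # column 0: baseline class
P = np.exp(X @ W_true)
P /= P.sum(1, keepdims=True)
ycat = (P.cumsum(1) > rng.random((n, 1))).argmax(1)   # inverse-CDF sampling
Y = np.eye(k)[ycat]                           # one-hot outcomes

# Gradient ascent on the multinomial log-likelihood.
W = np.zeros((3, k))
for _ in range(3000):
    Z = np.exp(X @ W)
    Phat = Z / Z.sum(1, keepdims=True)
    W += (1.0 / n) * (X.T @ (Y - Phat))
W = W - W[:, :1]                              # identify against the baseline
print(W.round(2))
```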
ARTICLE | doi:10.20944/preprints202002.0069.v1
Subject: Mathematics & Computer Science, Applied Mathematics Keywords: coal; supercritical CO2; Gaussian process regression; machine learning; adsorption model
Online: 5 February 2020 (14:09:33 CET)
Deep coal beds have been suggested as possible underground geological locations for carbon dioxide storage. Furthermore, injecting carbon dioxide into coal beds can improve methane recovery. Given the importance of this issue, a novel investigation has been conducted on the adsorption of carbon dioxide on various types of coal seam. This study proposes four Gaussian Process Regression (GPR) approaches with different kernel functions to estimate the excess adsorption of carbon dioxide in terms of temperature, pressure and the composition of coal seams. The comparison of GPR outputs and actual excess adsorption shows that the proposed models are highly accurate and that the exponential GPR approach performs better than the others; for this structure, R²=1, MRE=0.01542, MSE=0, RMSE=0.00019 and STD=0.00014 were determined. Additionally, the impacts of the effective parameters on excess adsorption capacity have been studied for the first time in the literature. These results make the present work a valuable and useful tool for petroleum and chemical engineers who deal with recovery enhancement and environmental protection.
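GP regression with an exponential kernel, the best-performing variant reported above, can be sketched in one dimension. A sine curve stands in for adsorption data here; the real inputs are temperature, pressure and coal composition, and the hyperparameters below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(10)

def kern(a, b, s=1.0, l=1.0):
    # exponential (Ornstein-Uhlenbeck) kernel k(a, b) = s^2 exp(-|a - b| / l)
    return s**2 * np.exp(-np.abs(a[:, None] - b[None, :]) / l)

x = np.linspace(0, 10, 30)                   # training inputs
y = np.sin(x) + rng.normal(0, 0.05, 30)      # noisy observations
xs = np.linspace(0, 10, 100)                 # prediction grid

K = kern(x, x) + 0.05**2 * np.eye(30)        # add the noise variance
alpha = np.linalg.solve(K, y)
mean = kern(xs, x) @ alpha                   # posterior predictive mean
err = float(np.abs(mean - np.sin(xs)).max())
print(round(err, 2))
```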
ARTICLE | doi:10.20944/preprints201910.0321.v1
Subject: Mathematics & Computer Science, Probability And Statistics Keywords: maximum likelihood; logistic regression; firth's correction; separation; penalized likelihood; bias
Online: 28 October 2019 (12:01:17 CET)
The parameters of logistic regression models are usually obtained by the method of maximum likelihood (ML). However, in analyses of small data sets or data sets with unbalanced outcomes or exposures, ML parameter estimates may not exist. This situation has been termed "separation", as the two outcome groups are separated by the values of a covariate or a linear combination of covariates. To overcome the problem of non-existing ML parameter estimates, applying Firth's correction (FC) was proposed. In practice, however, a principal investigator might be advised to "bring more data" in order to solve a separation issue; it is unclear whether such an increasing sample size (ISS) strategy, which keeps sampling new observations until separation is removed, improves estimation compared to applying FC to the original data set. We illustrate the problem by means of examples from colorectal cancer screening and ornithology. We performed an extensive simulation study whose main focus was to estimate the cost-adjusted relative efficiency of ML combined with ISS compared to FC. FC yielded reasonably small root mean squared errors and proved to be the more efficient estimator. Given our findings, we propose not to adapt the sample size when separation is encountered but to use FC as the default method of analysis whenever the number of observations or outcome events is critically low.
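Separation and Firth's correction can be sketched on a tiny, perfectly separated data set, where plain ML would push the slope to infinity but the Firth-modified score U*(b) = X'(y - p + h(0.5 - p)), with h the leverages of the weighted hat matrix, yields a finite estimate:

```python
import numpy as np

# Perfect separation: the covariate splits the outcomes exactly, so the ML
# estimate does not exist; Firth's penalized score keeps Newton steps finite.
x = np.array([-2.0, -1.0, -0.5, 0.5, 1.0, 2.0])
y = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])
X = np.column_stack([np.ones(6), x])

beta = np.zeros(2)
for _ in range(50):
    p = 1 / (1 + np.exp(-X @ beta))
    W = p * (1 - p)
    XtWX = X.T @ (X * W[:, None])
    Xw = X * np.sqrt(W)[:, None]
    h = np.sum((Xw @ np.linalg.inv(XtWX)) * Xw, axis=1)   # leverages
    score = X.T @ (y - p + h * (0.5 - p))                  # Firth-modified
    beta = beta + np.linalg.solve(XtWX, score)
print(beta.round(2))
```

By the symmetry of the data, the intercept stays at zero while the slope converges to a finite positive value instead of diverging.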
ARTICLE | doi:10.20944/preprints201810.0159.v1
Subject: Earth Sciences, Environmental Sciences Keywords: Pollution dispersion; PM10; air quality; Land Use Regression; Symos’97
Online: 8 October 2018 (16:18:22 CEST)
Air pollution dispersion modelling via spatial analyses (Land Use Regression, LUR) is an alternative to standard air pollution dispersion modelling techniques for air quality assessment. Its advantages are a much simpler mathematical apparatus, quicker and simpler calculations, and the possibility to incorporate other factors affecting a pollutant's concentration. The goal of the study was to model PM10 particle dispersion via spatial analyses in the Czech-Polish border area of the Upper Silesian industrial agglomeration and compare the results with those of the standard Gaussian dispersion model SYMOS'97. The results show that the standard Gaussian model with the same data gives better results than the LUR model (determination coefficient of 71% for the Gaussian model versus 48% for the LUR model). When land cover factors were included in the LUR model, its results improved significantly (65% determination coefficient), to a level comparable with the Gaussian model. A hybrid approach combining the Gaussian model with the LUR gives results of superior quality (65% determination coefficient).
TECHNICAL NOTE | doi:10.20944/preprints201809.0539.v1
Subject: Mathematics & Computer Science, Probability And Statistics Keywords: Non-normality; Classical Linear Regression Model; Modified Maximum Likelihood Estimation
Online: 27 September 2018 (10:04:26 CEST)
Regression models form the core of the discipline of econometrics. One of the basic assumptions of the classical linear regression model is that the values of the explanatory variables are fixed in repeated sampling. However, in most real-life cases, particularly in economics, the assumption of fixed regressors is not always tenable: under a non-experimental or uncontrolled environment, the dependent variable is often under the influence of explanatory variables that are stochastic in nature. There is a huge literature related to various aspects of stochastic regressors. In this paper, we attempt to pen down a historical perspective on some of the work related to stochastic regressors, based on a literature search.
ARTICLE | doi:10.20944/preprints201807.0412.v1
Subject: Social Sciences, Econometrics & Statistics Keywords: P.C. regression; AIC criterion; logit function; Pearson's Chi-square use
Online: 23 July 2018 (10:58:36 CEST)
In this paper, we use Principal Components Logistic Regression as a technique to reduce the number of variables used in credit scoring modeling. Specifically, we construct two models in which Greek enterprises are classified by their credit behavior, and we evaluate them on real data. More generally, we propose a way to use PC Regression when the sample contains highly correlated and categorical variables.
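A minimal sketch of the two-step idea — principal components to handle high correlations, then a logistic fit on the component scores (the data, component count, and gradient-ascent fit below are illustrative assumptions, not the paper's procedure):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical credit data: 100 firms, 4 highly correlated financial ratios.
z = rng.normal(size=(100, 2))
X = np.column_stack([z[:, 0], z[:, 0] + 0.05 * rng.normal(size=100),
                     z[:, 1], z[:, 1] + 0.05 * rng.normal(size=100)])
y = (z[:, 0] + z[:, 1] > 0).astype(float)   # 1 = good credit behaviour

# Step 1: principal components (via SVD) to decorrelate/reduce predictors.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt[:2].T                      # keep the first 2 components

# Step 2: logistic regression on the component scores (gradient ascent
# on the log-likelihood).
A = np.column_stack([np.ones(len(scores)), scores])
w = np.zeros(A.shape[1])
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-A @ w))
    w += 0.1 * A.T @ (y - p) / len(y)

acc = float(np.mean(((A @ w) > 0) == (y == 1)))
print(round(acc, 2))
```

Fitting on two decorrelated scores instead of four collinear ratios is the variable-reduction step the abstract describes.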
ARTICLE | doi:10.20944/preprints201806.0467.v1
Subject: Mathematics & Computer Science, Probability And Statistics Keywords: variable annuity; portfolio valuation; linear regression; group-lasso; interaction effect
Online: 28 June 2018 (12:05:27 CEST)
A variable annuity is a popular life insurance product that comes with financial guarantees. Using Monte Carlo simulation to value a large variable annuity portfolio is extremely time-consuming. Metamodeling approaches have been proposed in the literature to speed up the valuation process. In metamodeling, a metamodel is first fitted to a small number of variable annuity contracts and then used to predict the values of all other contracts. However, metamodels that have been investigated in the literature are sophisticated predictive models. In this paper, we investigate the use of linear regression models with interaction effects for the valuation of large variable annuity portfolios. Our numerical results show that linear regression models with interactions are able to produce accurate predictions and can be useful additions to the toolbox of metamodels that insurance companies can use to speed up the valuation of large VA portfolios.
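The value added by interaction effects can be sketched with a toy example (the contract features and the multiplicative "fair value" below are invented for illustration): fitting the same linear model with and without a pairwise interaction term shows why interactions help when the response is not additive in the features.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical VA contracts: policyholder age and account value (both
# rescaled to [0, 1]); the synthetic "value" has an age x account-value
# interaction by construction.
age = rng.uniform(0.0, 1.0, 200)
acct = rng.uniform(0.0, 1.0, 200)
value = 2.0 + 1.5 * age + 0.8 * acct + 3.0 * age * acct \
        + 0.01 * rng.normal(size=200)

def fit_r2(features):
    """Ordinary least squares with intercept; returns R^2."""
    A = np.column_stack([np.ones(len(age))] + features)
    coef, *_ = np.linalg.lstsq(A, value, rcond=None)
    resid = value - A @ coef
    return 1 - resid @ resid / ((value - value.mean()) @ (value - value.mean()))

r2_main = float(fit_r2([age, acct]))                # main effects only
r2_inter = float(fit_r2([age, acct, age * acct]))   # + interaction term
print(round(r2_main, 3), round(r2_inter, 3))
```

Once fitted on a small sample of contracts, such a model is cheap to evaluate over a large portfolio — the metamodeling speed-up the abstract motivates.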
ARTICLE | doi:10.20944/preprints201806.0429.v1
Subject: Biology, Forestry Keywords: Mixed forests; Questionnaire Survey; Ecosystem Services; Stepwise Regression; Climate Change
Online: 26 June 2018 (15:48:31 CEST)
Scientific studies have shown that mixed forests of silver fir (Abies alba Mill.) and European beech (Fagus sylvatica L.) provide higher ecosystem services than monospecific forests. Mixed forests are known for their high resilience to climate change impacts and superior biodiversity compared to monospecific forests. In many countries, the promotion of mixed forests in forest management is becoming government policy, since they can contribute to fulfilling the Sustainable Development Goals set by the United Nations, specifically Goals 13 and 15. However, not much is known about public perceptions of mixed forests compared to monospecific forests. Our study on ecosystem services provided by mixed and monospecific forests in southwest Germany fills this gap. Based on a survey with 520 valid responses, we analyzed people’s perceptions of 18 different supporting, cultural, regulating and provisioning ecosystem services, measured on a Likert scale. Stepwise regression analyses show relations between social profiles (gender, age, education, profession) and respondents’ perceptions. Our findings show that people perceive mixed forests as providing better cultural, regulating and supporting ecosystem services than monospecific forests of fir and beech, whereas provisioning services were perceived as being provided equally well or better by monospecific forests. Significant effects towards a positive perception of ecosystem services provided by mixed forests were mainly influenced by the perceived abundance of old trees, the feeling of pleasantness in mixed forests, age, profession, and education. Our findings indicate that there is strong public support for the promotion of silver fir and beech mixed forests in southwest Germany.
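A toy forward stepwise selection sketch (an assumed generic procedure, not the authors' exact software or data): predictors are added one at a time whenever they meaningfully reduce the residual sum of squares, which is how social-profile variables such as age and education would surface as significant.

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical survey: 150 respondents, 4 candidate social-profile
# predictors; the "perception score" is driven by age and education only.
n = 150
cand = {name: rng.normal(size=n)
        for name in ["age", "gender", "education", "profession"]}
y = 1.0 + 0.8 * cand["age"] + 0.5 * cand["education"] \
    + 0.2 * rng.normal(size=n)

def rss(names):
    """Residual sum of squares of an OLS fit on the named predictors."""
    A = np.column_stack([np.ones(n)] + [cand[m] for m in names])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    r = y - A @ coef
    return float(r @ r)

selected = []
current = rss(selected)
while True:
    rest = [m for m in cand if m not in selected]
    if not rest:
        break
    best = min(rest, key=lambda m: rss(selected + [m]))
    improved = rss(selected + [best])
    if current - improved < 0.05 * current:   # stop: negligible gain
        break
    selected.append(best)
    current = improved
print(sorted(selected))
```

Real stepwise procedures use F-tests or information criteria rather than this fixed 5% threshold; the threshold here is purely illustrative.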
ARTICLE | doi:10.20944/preprints201806.0365.v1
Subject: Engineering, General Engineering Keywords: ARIMA model; data forecasting; multi-objective genetic algorithm; regression model
Online: 24 June 2018 (07:48:49 CEST)
The aim of this study is to develop a novel two-level multi-objective genetic algorithm (GA) to optimize time series forecasting of data for fans used in road tunnels by the Swedish Transport Administration (Trafikverket). Level 1 handles the process of forecasting time series cost data, while level 2 evaluates the forecasts. Level 1 implements either a multi-objective GA based on the ARIMA model or a multi-objective GA based on the dynamic regression model. Level 2 utilises a multi-objective GA based on different forecasting error rates to identify a suitable forecast. Our method is compared with using the ARIMA model alone. The results show the drawbacks of time series forecasting using only the ARIMA model. In addition, the results of the two-level model show the drawbacks of forecasting using a multi-objective GA based on the dynamic regression model; a multi-objective GA based on the ARIMA model produces better forecasting results. In level 2, five forecasting accuracy functions help in selecting the best forecast. Selecting a proper forecasting methodology is based on the averages of the forecasted data, the historical data, the actual data and the polynomial trends. The forecasted data can be used for life cycle cost (LCC) analysis.
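The level-2 idea — ranking candidate forecasts by several accuracy functions — can be sketched as follows (the cost series, the two candidate forecasts, and the three metrics shown are illustrative stand-ins; the study uses five accuracy functions inside a GA):

```python
import math

# Hypothetical actual costs and two candidate forecasts (e.g., one from an
# ARIMA-based GA and one from a dynamic-regression-based GA).
actual = [10.0, 12.0, 11.0, 13.0, 14.0]
candidates = {
    "arima_ga": [10.5, 11.8, 11.2, 13.1, 13.7],
    "dynreg_ga": [9.0, 13.5, 10.0, 14.5, 12.5],
}

def rmse(a, f):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, f)) / len(a))

def mae(a, f):
    return sum(abs(x - y) for x, y in zip(a, f)) / len(a)

def mape(a, f):
    return 100 * sum(abs(x - y) / x for x, y in zip(a, f)) / len(a)

# Rank candidates by each error rate; the forecast winning the most
# metrics is selected, mirroring the evaluation level of the two-level GA.
metrics = [rmse, mae, mape]
wins = {name: 0 for name in candidates}
for m in metrics:
    best = min(candidates, key=lambda n: m(actual, candidates[n]))
    wins[best] += 1
selected = max(wins, key=wins.get)
print(selected)
```

In the study this evaluation is itself driven by a multi-objective GA; the plain vote above only illustrates why multiple error rates are consulted at once.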
ARTICLE | doi:10.20944/preprints201806.0080.v1
Subject: Social Sciences, Economics Keywords: data envelopment analysis; biennial Luenberger index; geographically weighted regression; EEP
Online: 6 June 2018 (09:59:47 CEST)
This paper proposes a new non-radial biennial Luenberger energy and environmental performance index (EEPI) to measure EEP change in various Chinese cities. The sources of EEP change, in terms of technical efficiency change and technological change, are examined by the Luenberger EEPI. The contributions of specific undesirable outputs and energy inputs to the EEP change are identified by means of the non-radial efficiency measure. The proposed approach is applied to evaluate the EEP of the industrial sector in 283 cities in China over 2010-2014. Factors influencing the emission abatement potential are investigated by employing a geographically weighted regression (GWR) model. We find that 1) changes in EEP can be attributed to technological progress, but technological progress slows down across the study period; 2) soot emission performance shows a downtrend among the four specific sub-performances, consistent with the fact that severe haze occurred frequently in China; 3) the best performers begin to move from coastal to inland cities with lower resource consumption and higher ecological quality; 4) cities with the strongest positive effect of pollution intensity on emission abatement potential are located in the areas around the Bohai Gulf, where air pollution is particularly severe.
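The GWR step lets the effect of pollution intensity vary over space, which is how spatial clusters like the Bohai Gulf result can emerge. A minimal sketch (coordinates, kernel, bandwidth, and the synthetic spatially varying effect below are all assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical cities: 2-D coordinates, a pollution-intensity covariate,
# and a response whose coefficient on pollution varies with location.
n = 100
coords = rng.uniform(0, 10, (n, 2))
pollution = rng.uniform(0, 1, n)
beta_true = 1.0 + 0.3 * coords[:, 0]          # effect grows along the x-axis
y = beta_true * pollution + 0.05 * rng.normal(size=n)

def gwr_beta(i, bandwidth=2.0):
    """Local slope at city i: weighted least squares with Gaussian kernel."""
    d2 = ((coords - coords[i]) ** 2).sum(1)
    w = np.exp(-d2 / (2 * bandwidth ** 2))
    A = np.column_stack([np.ones(n), pollution])
    WA = A * w[:, None]
    coef = np.linalg.solve(A.T @ WA, A.T @ (w * y))
    return coef[1]

betas = np.array([gwr_beta(i) for i in range(n)])
# Local estimates should track the spatially varying true effect.
corr = float(np.corrcoef(betas, beta_true)[0, 1])
print(round(corr, 2))
```

A global OLS would return a single averaged coefficient; the per-location slopes are what make the "strongest positive effect around the Bohai Gulf" kind of statement possible.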
ARTICLE | doi:10.20944/preprints201804.0221.v1
Subject: Earth Sciences, Environmental Sciences Keywords: renewable energy sources; life quality; RES public acceptance; logit regression
Online: 17 April 2018 (10:18:42 CEST)
The aim of this paper is to analyze and evaluate Renewable Energy Sources (RES) usage and its contribution to citizens’ quality of life. For this purpose, a survey was conducted using a sample of 400 residents of an urban area in the Attica region of Greece. The methods of Principal Components Analysis and Logit Regression were applied to a dataset containing respondents’ views on various aspects of RES. Two statistical models were constructed to identify the main variables associated with RES usage and with respondents’ opinion of its contribution to quality of life. The conclusions show that the respondents are adequately informed about some RES types, and most of them use at least one of the examined types of RES. The benefits that RES offer were the most important variable in determining respondents’ perceptions both of RES usage and of its contribution to quality of life.
ARTICLE | doi:10.20944/preprints201803.0245.v1
Subject: Earth Sciences, Environmental Sciences Keywords: severity mapping; regression models; maximum likelihood; GeoCBI; dNBR; RdNBR; RBR
Online: 29 March 2018 (06:06:32 CEST)
Mapping the severity of forest fires from remote sensing data for research and management has become increasingly widespread in the last decade; these data typically quantify the pre- and post-fire spectral change between satellite images from multi-spectral sensors. However, there is an active discussion about which of the main indices (dNBR, RdNBR or RBR) is the most adequate for estimating fire severity, as well as about the adjustment model used in the classification of severity levels. This study proposes and evaluates a new technique for mapping severity as an alternative to regression models, based on the maximum likelihood estimation (MLE) machine learning algorithm, using GeoCBI field data and the spectral indices dNBR, RdNBR and RBR applied to Landsat TM and ETM+ images, for two fires in central Spain. We compare the severity discrimination capability of dNBR, RdNBR and RBR through a spectral separability index (M) and then evaluate the concordance of these metrics with field data based on GeoCBI measurements. Specifically, we evaluated the correspondence (R2) between each metric and the continuous measurement of fire severity (GeoCBI), and the overall precision of the regression and MLE models for the four categorized severity levels (Unburned, Low, Moderate, and High). The results show that the RBR has greater spectral separability (average over the two fires M = 2.00) than the dNBR (M = 1.82) and the RdNBR (M = 1.80); additionally, the GeoCBI fits better with the RBR (R2 = 0.73) than with the RdNBR (R2 = 0.72) and the dNBR (R2 = 0.71). Finally, the overall classification accuracy achieved with the MLE (Kappa = 0.65) is better than that of the regression models (Kappa = 0.58), with higher accuracy for individual classes.
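A common form of the spectral separability index used to compare such indices is M = |μ1 − μ2| / (σ1 + σ2), computed between class samples (e.g., burned vs. unburned pixels). A small sketch with invented index values (assuming this standard definition of M; the study's exact sampling is not reproduced here):

```python
import statistics

def separability_m(class_a, class_b):
    """Spectral separability index M = |mu1 - mu2| / (sigma1 + sigma2)."""
    mu1, mu2 = statistics.mean(class_a), statistics.mean(class_b)
    s1, s2 = statistics.stdev(class_a), statistics.stdev(class_b)
    return abs(mu1 - mu2) / (s1 + s2)

# Hypothetical RBR values sampled inside and outside a fire perimeter.
rbr_burned = [0.55, 0.60, 0.52, 0.58, 0.61]
rbr_unburned = [0.05, 0.08, 0.03, 0.06, 0.04]
m = separability_m(rbr_burned, rbr_unburned)
print(round(m, 2))  # M > 1 indicates good separability
```

Values of M above 1 are conventionally read as good class separability, which is how the reported M = 2.00 for RBR versus 1.82 and 1.80 for the alternatives is interpreted.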
ARTICLE | doi:10.20944/preprints201803.0212.v1
Subject: Medicine & Pharmacology, Obstetrics & Gynaecology Keywords: fetal weight estimation; regression model; ultrasound measures; expectation maximization algorithm
Online: 26 March 2018 (09:59:51 CEST)
Fetal weight estimation before delivery is important in obstetrics, as it assists doctors in diagnosing abnormal or diseased cases. Linear regression based on ultrasound measures such as bi-parietal diameter (bpd), head circumference (hc), abdominal circumference (ac), and fetal length (fl) is a common statistical method for weight estimation, but the regression model requires that the time points at which such measures are collected not be too far from the last ultrasound scans. Therefore, this research proposes a method of early weight estimation based on the expectation maximization (EM) algorithm, so that ultrasound measures can be taken at any time point in the gestational period. In other words, the gestational sample can lack some or many fetal weights, which is convenient for practitioners because they need not be concerned with fetal weights when taking ultrasound examinations. The proposed method is called the dual regression expectation maximization (DREM) algorithm. Experimental results indicate that the accuracy of DREM decreases only insignificantly even when the completeness of the ultrasound sample decreases significantly, demonstrating that DREM withstands missing values in incomplete or sparse samples.
ARTICLE | doi:10.20944/preprints201704.0054.v1
Subject: Earth Sciences, Environmental Sciences Keywords: construction industry; energy rebound effect; sustainability; solow remainder; ridge regression
Online: 10 April 2017 (07:35:45 CEST)
As the world's largest energy consumer and carbon emitter, China has made substantial efforts to improve energy efficiency in order to decrease energy consumption, but the energy rebound effect determines the effectiveness of those efforts. The embodied energy consumption of construction projects accounted for nearly one-sixth of the economy's total energy consumption in China. Based on the logical relationship among capital input, technological progress, economic growth, and energy consumption, this paper adapts an alternative estimation model to estimate the energy rebound effect for the construction industry in China for the first time. Empirical results reveal that the energy rebound effect for the construction industry in China is about 59.5% for the period 1990–2014. The results indicate that the energy rebound effect does exist in China’s construction industry and that it presents a fluctuating declining trend. This implies that only about half of the energy savings from technological progress is achieved. In addition, China’s government should implement proper energy pricing reforms and energy taxes to promote the sustainable development of China’s construction industry.
ARTICLE | doi:10.20944/preprints201607.0001.v1
Subject: Social Sciences, Finance Keywords: PUN, artificial intelligence models, regression tree, bootstrap aggregation, forecasting error
Online: 2 July 2016 (03:48:36 CEST)
Electricity price forecasting has become a crucial element of both private and public decision-making. Its importance has been growing since the wave of deregulation and liberalization of the energy sector worldwide in the late 1990s. Given these facts, this paper aims to develop a precise and flexible forecasting model for the wholesale electricity price in the Italian power market on an hourly basis. We utilize artificial intelligence models such as neural networks and bagged regression trees, which are rarely used to forecast electricity prices. After model calibration, our final model is bagged regression trees with exogenous variables. The selected model outperformed the neural network and the single-price bagged regression used in this paper; it also outperformed other statistical and non-statistical models used in other studies. We also confirm some theoretical specifications of the model. As a policy implication, this model might be used by energy traders, transmission system operators and energy regulators for an enhanced decision-making process.
ARTICLE | doi:10.20944/preprints202209.0273.v1
Subject: Earth Sciences, Environmental Sciences Keywords: net ecosystem exchange; eddy-covariance; regression; upscaling; data augmentation; feature selection
Online: 19 September 2022 (10:17:33 CEST)
Despite a rapid rise in the development of nature-based solutions (NBS) in recent years, the methods for evaluating NBS still have certain gaps. We propose an approach based on a combination of remote sensing data and meteorological variables to reconstruct the spatio-temporal variation of net ecosystem exchange from eddy-covariance stations. A Lagrangian particle dispersion model was used for upscaling from satellite images and flux towers. We trained data-driven models based on kernel methods separately for each selected land cover class. The results suggest that the proposed approach to quantifying carbon exchange on a medium-to-large scale by blending eddy covariance flux data with moderate-resolution satellite and weather data provides a set of key advantages over previously deployed methods: (1) scalability, achieved via a validation design based on a separate set of eddy covariance stations; (2) high spatial and temporal resolution due to the use of Landsat imagery; (3) robust and accurate predictions due to improved data quality control, advanced machine learning techniques, and rigorous validation. The machine learning models yielded high cross-validation scores. Overall, we present a globally scalable technology for the land sector based on high-resolution remote sensing imagery, meteorological variables, and direct carbon flux measurements from eddy covariance stations.
SHORT NOTE | doi:10.20944/preprints202207.0302.v1
Subject: Mathematics & Computer Science, Artificial Intelligence & Robotics Keywords: machine learning; artificial intelligence; pattern; models; classification; regression; GIS; remote sensing
Online: 20 July 2022 (10:58:15 CEST)
Machine learning (ML) is a subdivision of artificial intelligence in which the machine learns from machine-readable data and information. It uses data, learns patterns and predicts new outcomes. Its popularity is growing because it helps to understand trends and provides a solution that can be either a model or a product. Applications of ML algorithms have increased drastically in G.I.S. and remote sensing in recent years. ML has a broad range of applications, from developing energy-based models to assessing soil liquefaction to relating air quality and mortality. In this paper, we discuss the most popular supervised ML models (classification and regression) in G.I.S. and remote sensing. The motivation for writing this paper is that ML models produce higher accuracy than traditional parametric classifiers, especially for complex data with many predictor variables. This paper provides a general overview of some popular supervised non-parametric ML models that can be used in most G.I.S. and remote sensing-based projects. We discuss classification models (Naïve Bayes (NB), Support Vector Machine (SVM), Random Forest (RF), Decision Trees (DT)) and regression models (Random Forest (RF), Support Vector Machine (SVM), Linear and Non-Linear). The article can therefore serve as a guide for those interested in using ML models in their G.I.S. and remote sensing-based projects.
ARTICLE | doi:10.20944/preprints202112.0184.v2
Subject: Earth Sciences, Other Keywords: Spectral; Geochemistry; Random Forest; Regression; Whole Rock; MIR; SWIR; VNIR; NMF
Online: 21 December 2021 (12:35:45 CET)
The efficacy of predicting geochemical parameters with a two-step chained workflow using spectral data as the initial input is evaluated. Spectral measurements spanning the approximate 400–25000 nm range are used to train a workflow consisting of a non-negative matrix factorization (NMF) step, for data reduction, and a random forest regression (RFR) to predict 8 geochemical parameters. Approximately 175000 spectra with their corresponding chemical analyses were available for training, testing and validation purposes. The samples and their spectral and chemical parameters represent 9399 drillcores. Of those, approximately 20000 spectra and their accompanying analyses were used for training and 5000 for model validation. The remaining pairwise data (150000 samples) were used for testing of the method. The data are distributed over 2 large spatial extents (980 km2 and 3025 km2 respectively), allowing the proposed method to be tested against samples that are spatially distant from the initial training points. Global R2 scores and wt.% RMSE on the 150000 test samples are Fe(0.95/3.01), SiO2(0.96/3.77), Al2O3(0.92/1.27), TiO(0.68/0.13), CaO(0.89/0.41), MgO(0.87/0.35), K2O(0.65/0.21) and LOI(0.90/1.14), given as Parameter(R2/RMSE), and demonstrate that the proposed method is capable of predicting the 8 parameters and is stable enough, in the environment tested, to extend beyond the training set's initial spatial location.
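The chained workflow can be sketched on synthetic data (invented endmembers and chemistry; an OLS fit stands in for the random forest regressor so the example stays dependency-free): NMF reduces each spectrum to a few non-negative mixing weights, and a regression then predicts chemistry from those weights.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical spectra built from 3 non-negative endmembers; the target
# "chemistry" depends linearly on the (unknown) mixing weights.
W_true = rng.uniform(0, 1, (60, 3))          # mixing weights per sample
H_true = rng.uniform(0, 1, (3, 40))          # endmember spectra
V = W_true @ H_true                          # observed spectra (60 x 40)
chem = W_true @ np.array([10.0, 4.0, 1.0])   # e.g. an oxide wt.%

# Step 1: NMF by Lee-Seung multiplicative updates, for data reduction.
W = rng.uniform(0.1, 1, (60, 3))
H = rng.uniform(0.1, 1, (3, 40))
for _ in range(500):
    H *= (W.T @ V) / (W.T @ W @ H + 1e-9)
    W *= (V @ H.T) / (W @ H @ H.T + 1e-9)

# Step 2: regression on the reduced features (OLS here as a stand-in for
# the random forest regressor used in the study).
A = np.column_stack([np.ones(60), W])
coef, *_ = np.linalg.lstsq(A, chem, rcond=None)
pred = A @ coef
r2 = float(1 - np.sum((chem - pred) ** 2)
           / np.sum((chem - chem.mean()) ** 2))
print(round(r2, 3))
```

Reducing 40 spectral channels to 3 non-negative components before regression is what keeps the second stage tractable at the ~175000-spectrum scale the abstract reports.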
ARTICLE | doi:10.20944/preprints202105.0412.v1
Subject: Mathematics & Computer Science, Algebra & Number Theory Keywords: Expression of multilayer network models, oriented graph, multivariate model, nonlinear regression
Online: 18 May 2021 (10:26:19 CEST)
Neural network models are mostly represented by oriented graphs in which only the components, the constitutive elements of the graph, are transcribed into a mathematical expression. However, accurate knowledge of the full expression of the model is required in certain situations, such as selecting, among several reference models, the one that best fits the available data, or comparing the explanatory and predictive performance of an established model with respect to some reference models. In this paper, we establish a formalism for the mathematical expression of the multilayer perceptron neural network in a general framework, MLP-p-n-q, with p, n and q natural integers, and show its restriction to the cases of one hidden layer with multivariate outputs (MLP-p-1-q) and then a single output (MLP-p-1-1). We then give some specific cases of the most commonly used models. An application is presented in the context of solving a nonlinear regression problem.
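To make the notation concrete, a generic closed-form expression for the single-output case MLP-p-1-1 (with $m$ hidden neurons, activations $\varphi_1,\varphi_2$, weights $w$ and biases $b$ — symbols chosen here for illustration, not the paper's exact notation) is:

\[
  y \;=\; \varphi_2\!\left( b_0 \;+\; \sum_{j=1}^{m} w_j\,
  \varphi_1\!\left( b_j \;+\; \sum_{i=1}^{p} w_{ji}\, x_i \right) \right).
\]

For the multivariate-output case MLP-p-1-q, this expression is repeated for each output $k = 1,\dots,q$ with its own outer weights $w_{kj}$ and bias $b_{0k}$.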
ARTICLE | doi:10.20944/preprints202105.0390.v1
Subject: Mathematics & Computer Science, Algebra & Number Theory Keywords: Multilayer perceptron neural network; regression model; backpropagation; missing data; imputation method
Online: 17 May 2021 (14:35:18 CEST)
Missing observations constitute one of the most important issues in data analysis in applied research studies. Their magnitude and structure affect parameter estimation in modeling, with important consequences for decision-making. This study aims to evaluate the efficiency of imputation methods combined with the backpropagation algorithm in a nonlinear regression context. The evaluation is conducted through a simulation study covering sample sizes (50, 100, 200, 300 and 400) with different missing data rates (10, 20, 30, 40 and 50%) and three missingness mechanisms (MCAR, MAR and MNAR). Four imputation methods (Last Observation Carried Forward, Random Forest, Amelia and MICE) were used to impute datasets before making predictions with backpropagation. A 3-MLP model was used, varying the activation functions (Logistic-Linear, Logistic-Exponential, TanH-Linear and TanH-Exponential), the number of nodes in the hidden layer (3 - 15) and the learning rate (20 - 70%). Analysis of the performance criteria (R2, r and RMSE) of the network revealed good performance when it is trained with TanH-Linear functions, 11 nodes in the hidden layer and a learning rate of 50%. MICE and Random Forest were the most appropriate for data imputation. These methods can support up to a 50% missing rate with an optimal sample size of 200.
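A toy comparison in the spirit of this simulation study (not its actual design): under a MCAR mechanism, regression-based imputation, which exploits an auxiliary variable as MICE-type methods do, recovers missing values better than naive mean imputation.

```python
import random

random.seed(0)

# Synthetic data: x is the variable with missing values; z is an
# auxiliary variable correlated with x.
n = 200
x = [random.gauss(0, 1) for _ in range(n)]
z = [xi * 0.9 + random.gauss(0, 0.3) for xi in x]
miss = [random.random() < 0.3 for _ in range(n)]    # 30% MCAR missingness

obs_x = [xi for xi, m in zip(x, miss) if not m]
obs_z = [zi for zi, m in zip(z, miss) if not m]

# Regression imputation: predict x from z using complete cases only.
mz = sum(obs_z) / len(obs_z)
mx = sum(obs_x) / len(obs_x)
b = sum((zi - mz) * (xi - mx) for zi, xi in zip(obs_z, obs_x)) / \
    sum((zi - mz) ** 2 for zi in obs_z)
a = mx - b * mz

def imputation_rmse(imputed):
    """RMSE of imputed values against the withheld true values."""
    errs = [(ti - xi) ** 2 for ti, xi, m in zip(imputed, x, miss) if m]
    return (sum(errs) / len(errs)) ** 0.5

mean_imp = [mx if m else xi for xi, m in zip(x, miss)]
reg_imp = [a + b * zi if m else xi for xi, zi, m in zip(x, z, miss)]
print(imputation_rmse(mean_imp) > imputation_rmse(reg_imp))
```

This is the intuition behind the study's finding that MICE and Random Forest (both of which model the missing variable from the others) outperform simpler schemes.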
ARTICLE | doi:10.20944/preprints202012.0650.v1
Subject: Earth Sciences, Atmospheric Science Keywords: flood proneness; zoning, CN hydrologic model; curve number (CN); logistic regression
Online: 25 December 2020 (10:36:39 CET)
Spatial evaluation of flood-prone areas in drainage basins is one of the basic strategies in flood risk management. The present study aims to investigate the efficiency of the logistic regression model and the CN hydrological model for predicting and zoning floods. In the first stage, 13 parameters were employed: runoff, hydrologic soil groups (HSGs), slope, lithology, drainage density (DD), land curvature, elevation, distance to waterways/rivers, topographic wetness index (TWI), stream power index (SPI), rainfall, land use, and NDVI. In the SCS-CN model of the drainage basin, the infiltration rate (S) and runoff amount (Q) were determined. The layers used were weighted by the AHP. A flood zoning map of the drainage basin with 5, 15, 25, and 50 year return periods was then drawn by applying the layer weights. To verify the accuracy of the zoning map from the logistic regression model, the ROC curve and the area under the curve were used. The results showed that for the prediction rate the AUC is 0.81, indicating that the model has acceptable accuracy. The most important factors affecting flooding are the geological index, distance to waterways/rivers, and NDVI in the logistic regression model, and slope, DD, rainfall, and land use in the SCS-CN model, respectively. 30 to 46% of the drainage basin area over the 5 to 50 year return periods has moderate flood potential, and 28 to 34% has high potential.
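The ROC-based validation step can be sketched with the Mann-Whitney formulation of AUC (the scores and flood labels below are hypothetical, not the study's cells):

```python
# Hypothetical predicted flood probabilities and observed flood labels
# (1 = observed flood cell) for eight basin cells.
scores = [0.9, 0.8, 0.75, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1,   1,   0,    1,   0,    0,   1,   0]

def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison:
    the probability that a random flooded cell outranks a random
    non-flooded one (ties count half)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

print(auc(scores, labels))  # -> 0.75
```

An AUC of 0.5 is no better than chance and 1.0 is perfect ranking, which is why the study's 0.81 is read as acceptable accuracy.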
SHORT NOTE | doi:10.20944/preprints202011.0267.v1
Subject: Earth Sciences, Atmospheric Science Keywords: Siberian fir; regression model; forest type group; bonitet; growth rate table
Online: 9 November 2020 (09:11:44 CET)
The paper presents an assessment of the growth dynamics of modal fir plantations in the Lower Angara region. At present, a vast area of fir forests in the Lower Angara region is characterised by a significant decrease in sustainability due to periodic forest fires, insect pest outbreaks and diseases, which lead to their natural degradation and death. However, the intensity of coniferous stand growth in given forest site conditions persists in the long term. Therefore, creating regression models of forest growth and development that account for site conditions is very important both from a practical point of view and for environmental monitoring. The materials of a mass inventory of 3491 stands served as the initial data for studying the natural growth processes of fir plantations. The Hoerl model function is best suited for approximating stand growth, since it is characterised by a high levelling factor (from 0.970 to 0.987) and a small standard error (not exceeding 7%). As a result of the research, sketches of growth rate tables have been constructed for the modal Siberian fir stands of the third bonitet class in the forb and mossy forest type groups.
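The Hoerl model in its common form, y = a·bˣ·xᶜ, is linear in its parameters after taking logarithms, so it can be fitted by ordinary least squares. A sketch on synthetic stand data (the age-volume values and coefficients are invented, not the paper's tables):

```python
import numpy as np

# Synthetic stand data generated exactly from a Hoerl curve
# y = a * b**x * x**c, so the log-linear fit should recover a, b, c.
age = np.array([20.0, 40.0, 60.0, 80.0, 100.0, 120.0])   # stand age, years
vol = 5.0 * 0.99 ** age * age ** 1.2                     # "stand volume"

# ln y = ln a + x*ln b + c*ln x  ->  ordinary least squares.
A = np.column_stack([np.ones_like(age), age, np.log(age)])
coef, *_ = np.linalg.lstsq(A, np.log(vol), rcond=None)
a, b, c = np.exp(coef[0]), np.exp(coef[1]), coef[2]

pred = a * b ** age * age ** c
r2 = float(1 - np.sum((vol - pred) ** 2)
           / np.sum((vol - vol.mean()) ** 2))
print(round(r2, 3))
```

The combination of a power term (early rapid growth) and an exponential term (later slowdown) is what makes this family a natural candidate for stand-growth curves.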