ARTICLE | doi:10.20944/preprints202306.1849.v1
Subject: Engineering, Mechanical Engineering Keywords: Machine Learning; Regression Model; XGBoost Regression; Yield Strength
Online: 27 June 2023 (05:25:11 CEST)
Magnesium matrix composites have attracted significant attention due to their lightweight nature and impressive mechanical properties. However, the fabrication process for these alloy composites is often time-consuming, expensive, and labor-intensive. To overcome these challenges, this study employed machine learning (ML) techniques to predict the mechanical properties of magnesium matrix composites. Regression models were utilized to forecast the yield strength of magnesium alloy composites reinforced with various materials. The study incorporated previous research on matrix type, reinforcement type, heat treatment, and mechanical working. The regression models employed in this study included decision tree regression, random forest regression, extra tree regression, and XGBoost regression. Model performance was assessed using metrics such as RMSE and R2. The XGBoost Regression model outperformed others, exhibiting an R2 value of 0.94 and the lowest error rate. Feature importance analysis indicated that the reinforcement particle form had the greatest influence on the mechanical properties. The study identified the optimized parameters for achieving the highest yield strength, which was 186.99 MPa. Overall, this study successfully demonstrates the effectiveness of ML as a valuable tool for optimizing the production parameters of magnesium matrix composites.
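The two metrics named in this abstract, RMSE and R2, can be illustrated with a minimal Python sketch. The function names and the yield-strength values below are assumptions made for illustration; they are not taken from the study.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical yield-strength values in MPa (illustrative only, not from the study)
observed = [120.0, 150.0, 170.0, 185.0]
predicted = [125.0, 145.0, 172.0, 180.0]
print(rmse(observed, predicted), r2_score(observed, predicted))
```

A lower RMSE and an R2 closer to 1 both indicate a better fit; reporting the two together, as the abstract does, shows the error in MPa alongside the share of variance explained.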
ARTICLE | doi:10.20944/preprints202306.0891.v1
Subject: Engineering, Mining And Mineral Processing Keywords: Fragmentation; Artificial neural network; Random Forest regression; Support vector regression; XG Boost Regression; Sensitivity analysis
Online: 13 June 2023 (08:04:17 CEST)
In a limestone quarry, fragmentation is a crucial outcome of blasting operations, and the optimization of those operations benefits greatly from the prediction of rock fragmentation. The main factors that affect fragmentation are rock mass characteristics, blast geometry, and explosive properties. This paper is a step towards the implementation of machine learning and deep learning algorithms for predicting the extent of fragmentation (in percentage) in opencast mining. Initially, ten parameters (spacing, drill hole diameter, burden, average bench height, powder factor, number of holes, charge per delay, uniaxial compressive strength, specific drilling, and stemming) were collected to train the model. However, due to their weak correlation with rock fragmentation, drill hole diameter, average bench height, uniaxial compressive strength, stemming, and charge per delay were eliminated to reduce model complexity. A total of 219 data sets with five input features (number of holes, spacing, burden, specific drilling, and powder factor) were used to develop the models. To predict rock fragmentation due to blasting in limestone quarry mines, machine learning models (Random Forest Regression (bagging), Support Vector Regression, and XG Boost Regression (boosting)) and a deep learning model (Neural Network Regression) were applied. Optimization of the artificial neural network showed that the 64-32-16-1 architecture performs well, giving MSE (mean squared error) values of 41.32 and 28.59 on training and test data, respectively. The R2 value for both training and test data is 0.83. Random Forest regression also performs well compared to SVR and XG Boost, with MSE values of 12.37 and 9.89 on training and test data, respectively. Here, the R2 value for both sets is 94%. 
Based on the permutation importance and Shapley plot values, the powder factor has the highest impact, and the burden has the lowest impact on fragmentation.
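Permutation importance, mentioned above, can be sketched in a few lines of Python: shuffle one feature column at a time and record how much the prediction error grows. The toy model, the synthetic data, and all names below are assumptions for illustration and do not reproduce the study's models or data.

```python
import random

def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def permutation_importance(predict, X, y, n_repeats=20, seed=0):
    """Mean increase in MSE when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = mse(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild the rows with only column j permuted
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(mse(y, [predict(r) for r in X_perm]) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances

# Toy stand-in model (an assumption): strong dependence on feature 0, weak on feature 1
def toy_model(row):
    return 10.0 * row[0] + 0.5 * row[1]

data_rng = random.Random(1)
X = [[data_rng.random(), data_rng.random()] for _ in range(200)]
y = [toy_model(row) for row in X]
scores = permutation_importance(toy_model, X, y)
print(scores)
```

A feature whose shuffling raises the error the most is ranked as most influential, which is the sense in which the abstract ranks powder factor highest and burden lowest.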
ARTICLE | doi:10.20944/preprints202002.0200.v1
Subject: Computer Science And Mathematics, Probability And Statistics Keywords: uniqueness; regression depth; maximum depth estimator; regression median; robustness
Online: 15 February 2020 (14:51:15 CET)
The notion of the median in one dimension is a foundational element in nonparametric statistics. It has been extended to multi-dimensional cases both in location and in regression via notions of data depth. Regression depth (RD) and projection regression depth (PRD) represent the two most promising notions in regression. Carrizosa depth DC is another depth notion in regression. Depth-induced regression medians (maximum depth estimators) serve as robust alternatives to the classical least squares estimator. The uniqueness of regression medians is indispensable in the discussion of their properties and the asymptotics (consistency and limiting distribution) of sample regression medians. Are the regression medians induced from RD, PRD, and DC unique? Answering this question is the main goal of this article. It is found that only the regression median induced from PRD possesses the desired uniqueness property. The conventional remedy for non-uniqueness, taking the average of all medians, might yield an estimator that no longer possesses the maximum depth in both the RD and DC cases. These and other findings indicate that the PRD and its induced median are highly favorable among their leading competitors.
REVIEW | doi:10.20944/preprints202311.0156.v1
Subject: Biology And Life Sciences, Aquatic Science Keywords: tilapia; probiotics; linear regression analysis; hierarchical regression analysis; Pearson correlation
Online: 2 November 2023 (10:29:36 CET)
Data regarding the pandemic's impact on tilapia culture remain limited, but it is known that there was a significant decline in production and marketing since 2020. The post-pandemic challenges confronting tilapia farming necessitate prompt solutions, encompassing the management of bacterial infections and the adoption of more advanced technologies by small-scale producers in developing nations. Probiotics, acknowledged as a viable alternative, are presently extensively employed in tilapia aquaculture. Multiple studies have suggested that the application of diverse probiotics in tilapia culture has yielded favorable outcomes. Nonetheless, only a limited number of studies have employed statistical methods to evaluate such findings. To address this gap, a regression analysis was carried out to investigate the existence of a linear relationship between the probiotic dosage added to the feed and two key dependent variables: the specific growth rate (SGR) and the feed conversion ratio (FCR). Additionally, a hierarchical regression analysis was undertaken to ascertain the extent to which the variance observed in these responses could be explained by the variable "probiotic dosage in feed," after accounting for covariates such as initial weight, test duration, water temperature, and number of replicate tanks. Finally, two Pearson correlation matrices were constructed since different studies were included for the SGR and FCR analyses.
ARTICLE | doi:10.20944/preprints202309.2134.v1
Online: 30 September 2023 (05:42:45 CEST)
Dipteryx spp. is an important species in reforestation in the Amazon. The objective of this study is to characterize and compare the relationships between dendrometric variables in Dipteryx spp. stands in the Western Amazon by fitting linear regression equations for total height and crown diameter. Six forest stands were evaluated in three municipalities. Dendrometric variables collected included diameter at 1.3 m height (dbh), total height (ht) and crown diameter (dc). Simple and multiple linear regression equations were fitted to characterize the relationships between ht and dc. The total aboveground biomass of Dipteryx spp. trees and the carbon stock of the stands were estimated. The general equations showed higher R² values, exceeding 0.7. The general equations for estimating ht and dc were significant for all coefficients. The trees averaged 22 t/ha of aboveground biomass in the stands. There was a variation in carbon sequestration potential among stands, ranging from 5.12 to 88.91 t CO2.ha-1. Single-input equations using dbh as an independent variable are recommended for estimating dc and ht for individual Dipteryx spp. stands. Stands in the Western Amazon play a significant role in carbon sequestration and accumulation. Trees can sequester an average of 4.8 tons of CO2 per year.
REVIEW | doi:10.20944/preprints202110.0207.v1
Subject: Biology And Life Sciences, Biochemistry And Molecular Biology Keywords: transfer learning; classification; regression
Online: 13 October 2021 (16:28:59 CEST)
Accurate transfer learning of clinical outcomes, e.g., of the effects and side effects of drugs or other interventions, from one cellular context to another (in-vitro versus ex-vivo versus in-vivo, or across tissues), between cell-types, developmental stages, omics modalities or species, is considered tremendously useful. Ultimately, it may avoid most drug development failing in translation, despite large investments in the preclinical stages, which includes animal experiments requiring careful justification. Thus, when transferring a prediction task from a source (model) domain to a target domain, what counts is the high quality of the predictions in the target domain, requiring molecular states or processes common to both source and target that can be learned by the predictor, reflected by latent variables. These latent variables may form a compendium of knowledge that is learned in the source, to enable predictions in the target; usually, there are few, if any, labeled target training samples to learn from. Transductive learning then refers to the learning of the predictor in the source domain, transferring its outcome label calculations to the target domain, considering the same task. Inductive learning considers cases where the target predictor is performing a different yet related task as compared to the source predictor, making some labeled target data necessary. Often, there is also a need to first map the variables in the input/feature spaces (e.g. of gene names to orthologs) and/or the variables in the output/outcome spaces (e.g. by matching of labels). Transfer across omics modalities also requires that the molecular information flow connecting these modalities is sufficiently conserved. Only one of the methods for transfer learning we reviewed offers an assessment of input data, suggesting that transfer learning is unreliable in certain cases. 
Moreover, source domains feature their very own particularities, and transfer learning should consider these, e.g., as differences in pharmacokinetics, drug clearance or the microenvironment. In light of these general considerations, we here discuss and juxtapose various recent transfer learning approaches, specifically designed (or at least adaptable) to predict clinical (human in-vivo) outcomes based on molecular data, towards finding the right tool for a given task, and paving the way for a comprehensive and systematic comparison of the suitability and accuracy of transfer learning of clinical outcomes.
Subject: Business, Economics And Management, Economics Keywords: electricity poverty; quantile regression
Online: 18 September 2020 (09:40:45 CEST)
The main objective of this article is to explore the causes of household electricity poverty in Spain from an innovative perspective. Based on evidence of energy inequality across households with different income levels, a quantile regression approach was used to better capture the heterogeneity of determinants of energy poverty across different levels of electricity expenditure. The results illustrate some interesting and counter-intuitive findings about the relationship between household income and electricity poverty, and the technical efficiency of quantile regression compared to the imprecise results of a standard single coefficient/OLS approach.
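Quantile regression differs from OLS in the loss it minimizes: the pinball (check) loss at a chosen quantile level rather than squared error. The sketch below, with hypothetical expenditure values and an intercept-only model, is an illustration of that loss and not the article's specification.

```python
def pinball_loss(y_true, y_pred, tau):
    """Average quantile (pinball) loss at quantile level tau in (0, 1)."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        diff = t - p
        # Under-predictions are weighted by tau, over-predictions by (1 - tau)
        total += tau * diff if diff >= 0 else (tau - 1.0) * diff
    return total / len(y_true)

# Hypothetical monthly electricity expenditures (illustrative values only)
spend = [20.0, 25.0, 30.0, 35.0, 40.0, 60.0, 120.0]

def best_constant(tau):
    """Observed value that minimizes the pinball loss as a constant prediction."""
    return min(spend, key=lambda c: pinball_loss(spend, [c] * len(spend), tau))

print(best_constant(0.5), best_constant(0.9))
```

With these values the minimizer at tau = 0.5 is the median, while at tau = 0.9 it moves to the upper tail. Fitting a separate model per quantile level is what allows determinants to differ across levels of expenditure.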
ARTICLE | doi:10.20944/preprints202201.0441.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Active learning (AL); batch mode; expected model change; linear regression; nonlinear regression
Online: 28 January 2022 (15:03:10 CET)
Training supervised machine learning models requires labeled examples. A judicious choice of examples is helpful when there is a significant cost associated with assigning labels. This article improves upon a promising extant method, Batch-mode Expected Model Change Maximization (B-EMCM), for selecting examples to be labeled for regression problems. Specifically, it develops and evaluates alternate strategies for adaptively selecting batch size in B-EMCM. By determining the cumulative error that occurs from the estimation of the stochastic gradient descent, a stopping criterion for each iteration of the batch can be specified to ensure that selected candidates are the most beneficial to model learning. This new methodology is compared to B-EMCM via mean absolute error and root mean square error over ten iterations benchmarked against machine learning data sets. Using multiple data sets and metrics across all methods, one variation of AB-EMCM, the max bound of the accumulated error (AB-EMCM Max), showed the best results for an adaptive batch approach. It achieved better root mean squared error (RMSE) and mean absolute error (MAE) than the other adaptive and non-adaptive batch methods while reaching the result in nearly the same number of iterations as the non-adaptive batch methods.
ARTICLE | doi:10.20944/preprints202208.0222.v1
Subject: Medicine And Pharmacology, Epidemiology And Infectious Diseases Keywords: Tuberculosis; Mortality; Indigenous; Logistic Regression
Online: 11 August 2022 (12:00:20 CEST)
Aim. To identify factors associated with mortality among people diagnosed with tuberculosis in the indigenous population of Peru, 2015-2019. Methods. Case-control study nested in a retrospective cohort, using the registry of persons belonging to indigenous peoples of the National Tuberculosis Prevention and Control Strategy of the Ministry of Health of Peru. A descriptive analysis was applied, and then bivariate and multiple logistic regression was used to evaluate associations between the variables and the outcome (live-deceased); the results were presented as ORs with their respective 95% confidence intervals. Results. The mortality rate of the total indigenous population of Peru was 1.75 deaths per 100,000 indigenous people diagnosed with TB. The community of Kukama kukamiria - Yagua reported 505 (28.48%) individuals. The final logistic model showed that being an indigenous man (OR=1.93; 95% CI: 1.001-3.7), having a history of HIV prior to TB (OR=16.7; 95% CI: 4.7-58.7) and being in old age (OR=2.95; 95% CI: 1.5-5.7) are factors associated with a greater chance of dying from TB. Conclusions. It is important to reorient health services among indigenous populations, especially those related to improving the timely diagnosis and early treatment of TB-HIV co-infection, to ensure comprehensive care for this population, considering that they are vulnerable groups.
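To show how an odds ratio and its 95% confidence interval are formed, the sketch below computes an unadjusted OR with a Wald interval from a 2x2 table. This is a simplification: the study reports ORs from a multiple logistic regression, and the counts and function name here are hypothetical.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio and Wald 95% CI from a 2x2 table.

    a: exposed cases,   b: exposed controls,
    c: unexposed cases, d: unexposed controls.
    """
    or_ = (a * d) / (b * c)
    # Standard error of log(OR) from the four cell counts
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_) - z * se_log)
    upper = math.exp(math.log(or_) + z * se_log)
    return or_, lower, upper

# Hypothetical counts (not taken from the study)
or_value, ci_low, ci_high = odds_ratio_ci(30, 70, 15, 85)
print(or_value, ci_low, ci_high)
```

Because the interval is built on the log scale, it is asymmetric around the OR on the original scale, which matches the shape of intervals such as 4.7-58.7 around 16.7 in the abstract.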
ARTICLE | doi:10.20944/preprints202011.0297.v1
Subject: Computer Science And Mathematics, Mathematics Keywords: regression; time point data; modelling
Online: 10 November 2020 (10:00:37 CET)
In this paper, we present a regression-based modeling approach to analyze multiple-series MTC data. A typical application of this modeling approach includes three steps: first, define a model that approximates the relationship between gene expression and experimental factors, with parameters incorporated to address the research interest; second, use least-squares and estimating equation techniques to estimate parameters and their corresponding standard errors; third, compute test statistics, P-values and NFD as measures of statistical significance. The advantages of this approach are as follows. First, it addresses the research interest in a specific, precise way, and maximally uses all the data and other relevant information. Second, it accounts for both systematic and random variations associated with the data, and the results of such analysis provide not only gene-specific information relevant to the research objective, but also its reliability, thereby helping investigators to make better decisions for follow-up studies. Third, this approach is highly flexible, and can easily be extended to other types of MTC studies or other microarray experiments by formulating different models based on the experimental design of the studies.
ARTICLE | doi:10.20944/preprints202307.0288.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Idiosyncratic Volatility Estimation/Prediction; Machine Learning; Deep learning Based Regression; Tree-Based Regression; Artificial Intelligence
Online: 6 July 2023 (02:14:16 CEST)
Financial markets require a great deal of decision making from investors and market makers. One metric that can ease the process of decision making is investment risk, which can be measured in two parts: systematic risk and idiosyncratic risk. A clear understanding of the volatilities in each risk component can be a powerful signal in recognizing the right assets to maximize investment returns. In this paper, we focus on idiosyncratic volatility and pre-calculate the idiosyncratic volatility values for 31,198 members of the NYSE, Amex and Nasdaq markets for the trades occurring between January 1963 and December 2019. Utilizing a subset of the dataset, limited to the Nasdaq100 index, we consider the application of machine learning techniques in predicting the idiosyncratic volatility values using the raw trade data, to explore a data extension option for future market trade records that have not yet occurred. We offer a deep learning based regression model and compare it with traditional tree-based methods on a small subset of our pre-calculated idiosyncratic volatility dataset. Our analytical results show that the performance of the deep learning techniques is much more robust in comparison to that of the traditional tree-based baselines.
ARTICLE | doi:10.20944/preprints202310.0202.v1
Subject: Biology And Life Sciences, Agricultural Science And Agronomy Keywords: amaranth; environmental index; linear regression; stability
Online: 4 October 2023 (05:04:02 CEST)
Amaranth has the potential to support Malawi's food and nutrition security, income generation and livelihoods, and climate change resilience efforts. Due to the high genetic variability of Amaranth, there is a need to develop stable and high-yielding genotypes for sustainable production. To determine the degree of genetic stability in different environments, five Amaranth accessions were subjected to stability analysis. The experiment was carried out at three sites (Bunda, Bembeke, and Chipoka) for two seasons in 2020-2021 in the central region of Malawi. It was laid out in a Randomized Complete Block Design (RCBD) with four replicates. The Eberhart and Russell linear regression model was used for stability analysis, and Pearson correlation was used to test the relationship between variables. Environmental variance + (genotype x environment) was significant for four of the parameters studied, namely grain yield, plant height, leaf length, and leaf width, indicating the presence of a remarkable interaction between genotypes and environment. The results of a pooled analysis of variance showed significant differences at a 5% significance level among the Amaranth accessions, indicating inherent genetic variability. Using the linear regression model of Eberhart and Russell, accessions PE-LO-BH-01 and LL-BH-04 were identified as the highest yielding stable genotypes for leaf and grain yield, respectively. In addition, the Bembeke site was the most favourable environment for all the accessions. Thus, to enhance the production of amaranth in Malawi, LL-BH-04 and PE-LO-BH-01 were put forward for release as varieties for grain and leaf, respectively. These results will also guide and support future breeding programs.
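The core of an Eberhart and Russell stability analysis is a regression of each genotype's yield on an environmental index. The following is a minimal sketch under simplifying assumptions: it works on genotype-by-environment means, omits the pooled-error correction of the deviation term, and uses hypothetical yields rather than the study's data.

```python
def eberhart_russell(yields):
    """yields[g][e]: mean yield of genotype g in environment e.

    Returns the environmental indices and, per genotype, a tuple of
    (mean yield, regression coefficient b, mean squared deviation).
    """
    n_gen, n_env = len(yields), len(yields[0])
    grand_mean = sum(map(sum, yields)) / (n_gen * n_env)
    # Environmental index: environment mean minus grand mean (sums to zero)
    index = [sum(row[e] for row in yields) / n_gen - grand_mean
             for e in range(n_env)]
    ss_index = sum(i * i for i in index)
    results = []
    for row in yields:
        mean_g = sum(row) / n_env
        b = sum(y * i for y, i in zip(row, index)) / ss_index
        dev = [y - mean_g - b * i for y, i in zip(row, index)]
        msd = sum(d * d for d in dev) / (n_env - 2)
        results.append((mean_g, b, msd))
    return index, results

# Hypothetical yields: 3 genotypes x 4 environments (illustrative only)
trial = [[1.0, 1.5, 2.0, 2.5],
         [1.2, 1.4, 1.6, 1.8],
         [0.8, 1.6, 2.4, 3.2]]
env_index, stats = eberhart_russell(trial)
for mean_g, b, msd in stats:
    print(round(mean_g, 3), round(b, 3), round(msd, 6))
```

In this framework a genotype is usually regarded as stable when b is close to 1 and its deviation from regression is close to zero, and as desirable when it also has a high mean yield.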
ARTICLE | doi:10.20944/preprints202008.0139.v1
Subject: Engineering, Industrial And Manufacturing Engineering Keywords: copper price; prediction; support vector regression
Online: 6 August 2020 (08:26:35 CEST)
Predicting the copper price is essential for making decisions that can affect companies and governments dependent on the copper mining industry. Copper prices follow a time series that is non-linear and non-stationary, and whose periods change as a result of potential growth, cyclical fluctuation and errors. The trend and cyclical components together are sometimes referred to as a trend-cycle. In order to make predictions, it is necessary to consider the different characteristics of the trend-cycle. In this paper, we study a copper price prediction method using Support Vector Regression. This work explores the potential of Support Vector Regression with external recurrences to make predictions at 5, 10, 15, 20 and 30 days into the future of the copper closing price at the London Metal Exchange. The best model for each forecast interval is selected using a grid search and balanced cross-validation. In experiments on real data sets, our results indicate that the parameters (C, ε, γ) of the Support Vector Regression model do not differ between the different prediction intervals. Additionally, the number of preceding values used to make the estimates does not vary according to the predicted interval. Results show that the support vector regression model has a lower prediction error and is more robust. Our results show that the presented model is able to predict copper price volatilities close to reality, with an RMSE equal to or less than 2.2% for prediction periods of 5 and 10 days.
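Forecasting a price several days ahead from preceding values requires turning the series into (lagged inputs, future target) pairs before any regressor such as SVR is fitted. The sketch below shows one common way to build such pairs; the function name, the lag count, and the price values are assumptions for illustration, not the paper's setup.

```python
def make_lagged_dataset(series, n_lags, horizon):
    """Build (features, target) pairs for direct multi-step forecasting.

    Each feature row holds n_lags consecutive values; the target is the
    value `horizon` steps after the last value in that row.
    """
    X, y = [], []
    for end in range(n_lags, len(series) - horizon + 1):
        X.append(series[end - n_lags:end])
        y.append(series[end + horizon - 1])
    return X, y

# Hypothetical closing prices (illustrative only)
prices = [float(p) for p in range(100, 112)]
X, y = make_lagged_dataset(prices, n_lags=3, horizon=5)
print(len(X), X[0], y[0])
```

One such dataset would be built per forecast horizon (5, 10, 15, 20 and 30 days), and a separate model tuned on each.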
ARTICLE | doi:10.20944/preprints202008.0058.v1
Subject: Environmental And Earth Sciences, Geography Keywords: Rwandz; residential function; GIS; correlation; regression
Online: 3 August 2020 (00:37:42 CEST)
A house is the haven that shelters people from natural and human conditions; it gives them trust, safety, and stability. Housing is one of the most basic human needs, so it has become a major function that cities offer and one of the aspects that most interests urban researchers, who take into consideration a wide range of architectural, social, and economic indicators. The study aims to provide an overall conception of Rwandz residential functions, using a collection of parameters and some GIS and statistical techniques, to help establish plans and future projects to improve the growth of this city and other towns and cities in that area. The study found that the old parts of Rwandz city, which are located in the core, differ in many properties from the outer parts, which are relatively newer. In general, the core is more densely populated than the outer parts and has bigger family sizes, more illiteracy and unemployment, lower incomes, and older and smaller houses; the opposite holds for the outer parts. In addition, the study tested the correlation coefficients between the criteria and found some strong statistical relationships between them, which reflect some real-life properties of the residential function. Lastly, the study designed a regression model to predict the main residential function criteria.
ARTICLE | doi:10.20944/preprints201902.0135.v1
Subject: Business, Economics And Management, Finance Keywords: recovery rates; beta regression; credit risk
Online: 14 February 2019 (11:30:03 CET)
Based on a rich data set of recoveries donated by a debt collection business, recovery rates for non-performing loans taken from a single European country are modelled using linear regression, linear regression with Lasso, beta regression and inflated beta regression. We also propose a two-stage model: beta mixture model combined with a logistic regression model. The proposed model allows us to model the multimodal distribution we find for these recovery rates. All models are built using loan characteristics, default data and collections data prior to purchase by the debt collection business. The intended use of the models is to estimate future recovery rates for improved risk assessment, capital requirement calculations and bad debt management. They are compared using a range of quantitative performance measures under K-fold cross validation. Among all the models, we find that the proposed two-stage beta mixture model performs best.
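K-fold cross validation, used above to compare the models, splits the data into K parts and holds each part out once for testing. A minimal sketch of the index bookkeeping follows; it uses contiguous folds without shuffling, which is an assumption made here for simplicity and not a detail given in the abstract.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for K-fold cross validation."""
    # Spread any remainder over the first folds so sizes differ by at most one
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size

folds = list(k_fold_indices(10, 3))
for train_idx, test_idx in folds:
    print(len(train_idx), test_idx)
```

Each candidate model would be fitted on the training indices of a fold and scored on its test indices, with the scores averaged over the K folds before the models are ranked.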
ARTICLE | doi:10.20944/preprints201809.0499.v1
Subject: Biology And Life Sciences, Ecology, Evolution, Behavior And Systematics Keywords: aquatics; modeling; boosted regression trees; appalachians
Online: 26 September 2018 (05:23:02 CEST)
Understanding the influences of multiple stressors across the landscape on aquatic biota is important for conservation, as it allows for an understanding of spatial patterns and informs stakeholders of significant conservation value. Data exist for land use/landcover (LULC) and other physicochemical components of the landscape throughout the Appalachian region, yet biological data are sparse. This dearth of biological data relative to LULC and physicochemical data creates difficulties in making informed management and conservation decisions across large landscapes. At the HUC12 watershed scale we sought to create a single score for both abiotic and biotic values throughout the central and southern Appalachian region. We used boosted regression trees (BRT) to model biological responses (fish and aquatic macroinvertebrate variables) to abiotic variables. Variance explained by BRT models ranged from 62-94%. We categorized predictor and response variables into themes and targets, respectively, to better understand large-scale patterns on the landscape that influence the biological condition of streams. We combined predicted values for a suite of response variables from BRT models to create a single watershed score for aquatic macroinvertebrates and fish. Regional models were developed for fish, but we were unable to develop regional models for aquatic macroinvertebrates due to the low number of sample sites. There was strong correlation between regional and global watershed scores for fish models but not between fish and aquatic macroinvertebrate models. Use of such multimetric scores can inform managers, NGOs, and private land owners regarding land use practices, thereby contributing to large-scale landscape conservation efforts.
ARTICLE | doi:10.20944/preprints201712.0032.v1
Subject: Engineering, Energy And Fuel Technology Keywords: statistics; uncertainty; regression; sampling; outlier; probabilistic
Online: 6 December 2017 (06:36:02 CET)
Energy Measurement and Verification (M&V) aims to make inferences about the savings achieved in energy projects, given the data and other information at hand. Traditionally, a frequentist approach has been used to quantify these savings and their associated uncertainties. We demonstrate that the Bayesian paradigm is an intuitive, coherent, and powerful alternative framework within which M&V can be done. Its advantages and limitations are discussed, and two examples from the industry-standard International Performance Measurement and Verification Protocol (IPMVP) are solved using the framework. Bayesian analysis is shown to describe the problem more thoroughly and yield richer information and uncertainty quantification than the standard methods while not sacrificing model simplicity. We also show that Bayesian methods can be more robust to outliers. Bayesian alternatives to standard M&V methods are listed, and examples from literature are cited.
COMMUNICATION | doi:10.20944/preprints202111.0549.v1
Subject: Computer Science And Mathematics, Applied Mathematics Keywords: Principal Component Regression, Partial Least Squares, Orthogonal Partial Least Squares, multivariate regression, hypothesis generation, Parkinson’s disease
Online: 29 November 2021 (15:42:03 CET)
In the current era of ‘big data’, scientists are able to quickly amass enormous amounts of data in a limited number of experiments. The investigators then try to hypothesize about the root cause based on the observed trends for the predictors and the response variable. This involves identifying the discriminatory predictors that are most responsible for explaining variation in the response variable. In the current work, we investigated three related multivariate techniques: Principal Component Regression (PCR), Partial Least Squares or Projections to Latent Structures (PLS), and Orthogonal Partial Least Squares (OPLS). To perform a comparative analysis, we used a publicly available dataset for Parkinson’s disease patients. We first performed the analysis using a cross-validated number of principal components for the aforementioned techniques. Our results demonstrated that PLS and OPLS were better suited than PCR for identifying the discriminatory predictors. Since the X data did not exhibit a strong correlation, we also performed Multiple Linear Regression (MLR) on the dataset. A comparison of the top five discriminatory predictors identified by the four techniques showed a substantial overlap between the results obtained by PLS, OPLS, and MLR, and the three techniques exhibited a significant divergence from the variables identified by PCR. A further investigation of the data revealed that PCR could be used to identify the discriminatory variables successfully if the number of principal components in the regression model were increased. In summary, we recommend using PLS or OPLS for hypothesis generation and systemizing the selection process for principal components when using PCR.
ARTICLE | doi:10.20944/preprints201907.0351.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: evaporation; meteorological parameters; Gaussian process regression; support vector regression; machine learning modeling; hydrology; prediction; data science; hydroinformatics
Online: 31 July 2019 (10:58:29 CEST)
Evaporation is one of the main processes in the hydrological cycle, and it is one of the most critical factors in agricultural, hydrological, and meteorological studies. Due to the interactions of multiple climatic factors, evaporation is a complex and nonlinear phenomenon; therefore, data-based methods can be used to obtain precise estimates of it. In the present study, Gaussian Process Regression (GPR), Nearest-Neighbor (IBK), Random Forest (RF) and Support Vector Regression (SVR) were used to estimate the pan evaporation (PE) at meteorological stations in Golestan Province, Iran. For this purpose, meteorological data including PE, temperature (T), relative humidity (RH), wind speed (W) and sunny hours (S) were collected from the Gonbad-e Kavus, Gorgan and Bandar Torkman stations from 2011 through 2017. The accuracy of the studied methods was determined using the statistical indices of Root Mean Squared Error (RMSE), correlation coefficient (R) and Mean Absolute Error (MAE). Furthermore, Taylor charts were utilized for evaluating the accuracy of the mentioned models. In their optimum state, the error values for the Gonbad-e Kavus, Gorgan and Bandar Torkman stations, respectively, were 1.521, 1.244, and 1.254 for Gaussian Process Regression (GPR); 1.991, 1.775, and 1.577 for Nearest-Neighbor (IBK); 1.614, 1.337, and 1.316 for Random Forest (RF); and 1.55, 1.262, and 1.275 for Support Vector Regression (SVR). It was found that GPR for the Gonbad-e Kavus station with input parameters of T, W and S, and GPR for the Gorgan and Bandar Torkman stations with input parameters of T, RH, W, and S, had the most accurate performances and are proposed for precise estimation of PE. 
Due to the high rate of evaporation in Iran and the lack of measurement instruments, the findings of the current study indicated that the PE values might be estimated with few easily measured meteorological parameters accurately.
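The three accuracy indices named in this abstract (RMSE, R and MAE) can be sketched in a few lines of Python. This is only an illustration of the standard definitions; the function names and the sample values are assumptions and do not come from the study.

```python
import math

def rmse(observed, predicted):
    """Root mean squared error between two equal-length sequences."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

def mae(observed, predicted):
    """Mean absolute error between two equal-length sequences."""
    n = len(observed)
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / n

def pearson_r(observed, predicted):
    """Pearson correlation coefficient R between observations and predictions."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)

# Hypothetical pan-evaporation values (mm/day), for illustration only.
obs = [1.0, 2.0, 3.0, 4.0]
pred = [1.0, 2.0, 3.0, 6.0]
print(rmse(obs, pred), mae(obs, pred), pearson_r(obs, pred))
```

RMSE penalizes large individual errors more heavily than MAE, which is why the two are often reported together.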
ARTICLE | doi:10.20944/preprints202307.1405.v1
Subject: Engineering, Chemical Engineering Keywords: neural network regression; wastewater quality; spectral reflectance
Online: 20 July 2023 (10:44:00 CEST)
Wastewater (WW) analysis is a critical step in various operations such as control of a WW treatment facility, and speeding-up the analysis of WW quality can significantly improve such operations. This work demonstrates the capability of neural network (NN) regression models to estimate WW characteristic properties such as biochemical oxygen demand (BOD), chemical oxygen demand (COD), ammonia (NH3-N), total dissolved substances (TDS), total alkalinity (TA), and total hardness (TH) by training on WW spectral reflectance in the visible to near-infrared spectrum (400nm-2000nm). The dataset contains samples of spectral reflectance intensity, which were the inputs, and the WW parameter levels (BOD, COD, NH3-N, TDS, TA, and TH), which were the outputs. Various NN model configurations were evaluated in terms of regression model fitness. The mean-absolute-error (MAE) was used as the metric for training and testing the NN models, and the coefficient of determination (R2) between the model predictions and true values was also computed to measure how well the NN models predict the true values. With online spectral measurements, the trained neural network model can provide non-contact and real-time estimation of WW quality at minimum estimation error.
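As a rough illustration of the coefficient of determination (R2) mentioned in this abstract, the sketch below computes it from its usual definition, one minus the ratio of residual to total sum of squares. The function name and the example values are assumptions, not taken from the paper.

```python
def r_squared(true_values, predictions):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(true_values) / len(true_values)
    ss_res = sum((t - p) ** 2 for t, p in zip(true_values, predictions))
    ss_tot = sum((t - mean_true) ** 2 for t in true_values)
    return 1.0 - ss_res / ss_tot

# Hypothetical parameter levels, for illustration only.
truth = [1.0, 2.0, 3.0]
print(r_squared(truth, [1.0, 2.0, 3.0]))  # perfect predictions
print(r_squared(truth, [2.0, 2.0, 2.0]))  # always predicting the mean
```

A model that always predicts the mean of the true values scores 0, and a perfect model scores 1, so the value indicates how much better the predictions are than that trivial baseline.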
ARTICLE | doi:10.20944/preprints202305.1678.v1
Subject: Business, Economics And Management, Economics Keywords: Europe; Income Distribution; Relative Distribution; RIF-regression
Online: 24 May 2023 (03:34:42 CEST)
The issue of polarization, as opposed to inequality, has been little explored for European countries. Using harmonized data produced by the Luxembourg Income Study Database, this paper observes income trends for 12 European countries and shows an increase in polarization in many of the countries considered. The drivers that led to this concentration of income are also analyzed, noting heterogeneous factors within countries.
ARTICLE | doi:10.20944/preprints202305.0792.v1
Subject: Business, Economics And Management, Business And Management Keywords: Baltic Dry Index; Covid-19; Stepwise Regression
Online: 11 May 2023 (05:11:46 CEST)
The outbreak of COVID-19 in 2020 caused significant disruptions to global shipping and the world economy. This paper aims to investigate the impact of the pandemic on global shipping by analyzing the Baltic Dry Index (BDI). The BDI is a metric that reflects worldwide shipping costs and is directly related to supply and demand conditions, making it an indicator of economic production. The study utilizes data from 2019 to 2021, before and after the outbreak of COVID-19, and considers 13 independent variables, including raw materials, energy, stock market indexes, global port calls, and confirmed COVID-19 cases, to investigate how they influence the BDI. The study employs stepwise regression to select variables and build models before and after the pandemic. The findings reveal that the key factors affecting the BDI before the outbreak were international scrap steel prices, iron ore prices, and the Commodity Research Bureau Index. After the COVID-19 outbreak, however, the factors affecting the BDI changed to the Shanghai Index, global port calls, and the number of confirmed COVID-19 cases.
ARTICLE | doi:10.20944/preprints202305.0096.v1
Subject: Computer Science And Mathematics, Mathematics Keywords: Topological indices; Fibrates; Curvilinear regression; QSPR analysis
Online: 3 May 2023 (04:48:22 CEST)
The paper describes the use of topological indices in conjunction with high cholesterol drugs, specifically Fibrates, to predict their physicochemical properties and biological activities. Fibrates are known to lower high triglycerides, increase HDL cholesterol, and reduce the small dense fraction of LDL cholesterol. The study uses a quantitative structure-property relationship (QSPR) approach, which involves analyzing the relationships between physicochemical properties and topological indices using curvilinear regression. The QSPR model predicts the physicochemical properties of the drugs based on degrees and distances determined from topological indices. The study also conducted density functional theory (DFT) calculations at the B3LYP/6-31G(d,p) level on the four investigated derivatives to gain insights into their optimized geometries, DOS plots, HOMO and LUMO orbital energies, and distribution. The theoretical results presented in the study suggest that the use of topological indices in QSPR models could provide a powerful tool for predicting the physicochemical properties and biological activities of molecules, including drugs. These findings could lead to the development of new cholesterol-lowering drugs with desirable properties.
ARTICLE | doi:10.20944/preprints202205.0417.v1
Subject: Medicine And Pharmacology, Epidemiology And Infectious Diseases Keywords: COVID-19; Eswatini; risk mapping; Poisson regression
Online: 31 May 2022 (11:04:12 CEST)
COVID-19 national spikes have been reported at varying temporal scales as a result of differences in the driving factors. Factors affecting case load and mortality rates have varied between countries and regions. We investigated the association of various socio-economic, demographic and health variables with the spread of COVID-19 cases in Eswatini using the maximum likelihood estimation method for count data. A generalized Poisson regression (GPR) model was fitted to data comprising fifteen covariates to predict COVID-19 risk in Eswatini. The results showed that the key determinants in the spread of the disease were the proportion of elderly above 55 years at 98% (95% CI: 97%-99%) and the proportion of youth below 35 years at 0.08% (95% CI: 0.017%-38%), with a pseudo R-square of 0.72. However, in the early phase of the virus when cases were fewer, results from the Poisson regression showed that household size, household density and poverty index were associated with COVID-19. We produced a risk map of predicted COVID-19 in Eswatini using the variables that were selected at the 5% significance level. The map could be used by the country to plan and prioritize health interventions against COVID-19. The identified areas of high risk may be further investigated in order to find the risk amplifiers and assess what could be done to prevent them.
ARTICLE | doi:10.20944/preprints202107.0139.v1
Subject: Business, Economics And Management, Accounting And Taxation Keywords: circularity; waste streams; circular approaches; regression equation
Online: 6 July 2021 (11:40:19 CEST)
In this paper, the authors identified key elements important for circularity: (1) Background: The primary goal of circularity is to eliminate waste and to prove the constant use of resources. In the paper, we classify studies according to circular approaches. The authors identified the main elements and classified them into categories important for circularity, starting with managing and reducing waste and the recovery of resources, and ending with the circularity of material and general circularity-related topics, and presented scientific works dedicated to each of the above-mentioned categories. The authors analyzed several core elements from the first category, aiming to investigate and connect different waste streams, and provided a regression model; (2) Methods: The authors used a dynamic regression model to identify relationships among variables and selected those that have an impact on the increase of biowaste. The research was carried out for the 27 European Union countries during the period between 2020 and 2019; (3) Conclusions: The authors indicated that the recycling rate of waste electrical equipment in the previous year has an impact on the increase of recycled biowaste in the following year. This is explained by the fact that non-metallic spare parts of electronic equipment are used as biowaste for fuel production. Because the separation of the composites in electrical equipment takes some time, the effect becomes evident, on average, within a one-year period.
ARTICLE | doi:10.20944/preprints202012.0321.v1
Subject: Environmental And Earth Sciences, Atmospheric Science And Meteorology Keywords: quantile regression; groundwater; environmental; multivariate; metals; health
Online: 14 December 2020 (10:13:09 CET)
One of the most important defining characteristics of groundwater quality is pH as it fundamentally controls the amount and chemical form of many organic and inorganic solutes in groundwater. Groundwater data are frequently characterized by a wide degree of variability of the factors which possibly influence pH distribution. For this reason, it is challenging to link the spatio-temporal dynamics of pH to a single environmental factor by the ordinary least squares regression technique of the conditional mean. In this study, quantile regression was used to estimate the response of pH to nine environmental factors (As, Cd, Fe, Mn, Pb, turbidity, electrical conductivity, total dissolved solids and nitrates). Results of 25%, 50%, 75% quantile regression and ordinary least squares (OLS) regression were compared. The standard regression of the conditional means (OLS) underestimated the rates of change of pH due to the selected factors in comparison with the regression quantiles. The effect of arsenic increased for sampling locations with higher pH values (higher quantiles) likewise the influence of Pb and Mn. However, the effects of Cd and Fe decreased for sampling locations in higher quantiles. It can be concluded that these detected heterogeneities would be missed if this study had focused exclusively on the conditional means of the pH values. Consequently, quantile regression provides a more comprehensive account of possible spatio-temporal relationships between environmental covariates in groundwater. This study is one of the first to apply this technique on groundwater systems in sub-Saharan Africa. The approach is useful and interesting and has broad application for other mining environments especially tropical low-income countries where climatic conditions can drive rapid cycling or transformations of pollutants. 
It is also pertinent to geopolitical contexts where regulatory, monitoring and management capacities are weak and where mining pollution of groundwater largely occurs.
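To illustrate why quantile regression differs from regression of the conditional mean, the following sketch shows the "pinball" (check) loss that quantile regression minimizes. It is a deliberately simplified, covariate-free case under assumed sample values: minimizing the loss over a single constant recovers the sample quantile. The function names are hypothetical and the study's actual model, with nine environmental factors, is not reproduced here.

```python
def pinball_loss(values, estimate, tau):
    """Average check loss of a constant estimate at quantile level tau (0 < tau < 1)."""
    total = 0.0
    for v in values:
        diff = v - estimate
        # Under-predictions are weighted by tau, over-predictions by (1 - tau).
        total += tau * diff if diff >= 0 else (tau - 1.0) * diff
    return total / len(values)

def best_constant(values, tau):
    """Data point that minimizes the pinball loss, i.e. a sample tau-quantile."""
    return min(sorted(values), key=lambda c: pinball_loss(values, c, tau))

# Hypothetical measurements with one extreme value, for illustration only.
sample = [1.0, 2.0, 3.0, 4.0, 100.0]
for tau in (0.25, 0.5, 0.75):
    print(tau, best_constant(sample, tau))
```

The mean of this sample is 22, pulled upward by the single extreme value, whereas the three quantile estimates stay within the bulk of the data. This is the kind of heterogeneity across quantiles that a conditional-mean fit alone would not show.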
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Crime prediction; Ensemble Learning; Machine Learning; Regression
Online: 14 September 2020 (00:53:30 CEST)
While the use of crime data has been widely advocated in the literature, its availability is often limited to large urban cities and isolated databases tend not to allow for spatial comparisons. This paper presents an efficient machine learning framework capable of predicting spatial crime occurrences, without using past crime as a predictor, and at a relatively high resolution: the U.S. Census Block Group level. The proposed framework is based on an in-depth multidisciplinary literature review allowing the selection of 188 best-fit crime predictors from socio-economic, demographic, spatial, and environmental data. Such data are published periodically for the entire United States. The selection of the appropriate predictive model was made through a comparative study of different machine learning families of algorithms, including generalized linear models, deep learning, and ensemble learning. The gradient boosting model was found to yield the most accurate predictions for violent crimes, property crimes, motor vehicle thefts, vandalism, and the total count of crimes. Extensive experiments on real-world datasets of crimes reported in 11 U.S. cities demonstrated that the proposed framework achieves an accuracy of 73 and 77% when predicting property crimes and violent crimes, respectively.
REVIEW | doi:10.20944/preprints202111.0310.v1
Subject: Computer Science And Mathematics, Probability And Statistics Keywords: Functional Data Analysis (FDA); Hybrid Data; Semi-Functional Partial Linear Regression Model (SFPLR); Partial Functional Linear Regression; Literature Review
Online: 17 November 2021 (15:21:19 CET)
Background: In functional data analysis (FDA), hybrid or mixed data comprise scalar and functional datasets. The semi-functional partial linear regression model (SFPLR) is one of the first semiparametric models for the scalar response with hybrid covariates. Various extensions of this model are explored and summarized. Methods: The first two research articles, “semi-functional partial linear regression model” and “Partial functional linear regression”, have more than 300 citations in Google Scholar. After applying the inclusion and exclusion criteria, namely 1) including articles published in ISI journals and excluding 2) non-English articles and 3) preprints, slides, and conference papers, only 106 articles remained. We use the PRISMA standard for systematic review. Results: The articles are categorized into the following main topics: estimation procedures, confidence regions, time series and panel data, Bayesian, spatial, robust, testing, quantile regression, varying coefficient models, variable selection, single-index model, measurement error, multiple functions, missing values, rank method and others. There are different applications and datasets such as the Tecator dataset, air quality, electricity consumption, and neuroimaging, among others. Conclusions: SFPLR is one of the most famous regression modeling methods for hybrid data and has many extensions among other models.
ARTICLE | doi:10.20944/preprints202312.0092.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Active Learning; Design of experiments; Regression; s-PGD
Online: 1 December 2023 (15:04:37 CET)
Machine learning approaches are currently used to understand or model complex physical systems. In general, a substantial number of samples must be collected to create a model with reliable results. However, collecting large amounts of data is often time-consuming or expensive. Moreover, problems of industrial interest tend to be increasingly complex and to depend on a high number of parameters. High-dimensional problems intrinsically require large amounts of data because of the curse of dimensionality. For this reason, new approaches based on smart sampling techniques, such as Active Learning methods, are being investigated to minimize the number of samples needed to train the model. Here, we propose a technique based on a combination of the Fisher information matrix and Sparse Proper Generalized Decomposition that enables the definition of a new Active Learning informativeness criterion in high dimensions. We provide examples demonstrating the performance of this technique on a theoretical 5D polynomial function and on an industrial crash simulation application. The results show that the proposed strategy outperforms the usual ones.
ARTICLE | doi:10.20944/preprints202311.1782.v1
Subject: Business, Economics And Management, Economics Keywords: DEA; wood processing enterprises; small enterprises; fractional regression
Online: 28 November 2023 (07:49:48 CET)
Micro and small wood-processing enterprises represent the heart of the European forest-based industries, being among the key drivers of economic growth in rural, mountainous, and poor regions. Their economic efficiency is of fundamental importance for their existence and the provision of income for the local population in rural areas. Data Envelopment Analysis (DEA) is a nonparametric, linear-programming-based approach commonly used to analyse the efficiency of organizational units. This method allows estimating the economic efficiency of a certain economic system without assumptions about the functional form between resources and products. Furthermore, DEA determines the efficiency frontier and indicates whether an enterprise, i.e., a Decision Making Unit (DMU), is efficient or not. The main objective of this study was to investigate and evaluate the economic efficiency of micro and small wood-processing enterprises in the EU countries and reveal the hidden inputs that facilitate efficiency generation. The economic efficiency evaluation was carried out on the basis of the official statistical data for the micro and small wood-processing companies in the EU member states for the period 2015-2020 by performing a two-stage DEA analysis. The data used were standardized by value per employee. In addition to the first stage of DEA, fractional regression probit and logit models with four contextual variables were used to reveal the influence of the hidden inputs in the model. The results showed that the micro and small wood-processing enterprises can be regarded as more scale-efficient than technically efficient entities. The only contextual variable affecting the economic efficiency was Investments per Person Employed, improving the efficiency by 2% per 1% increase of the investments.
ARTICLE | doi:10.20944/preprints202311.1435.v1
Subject: Business, Economics And Management, Finance Keywords: Exchange Rate Volatility; Exports; NARDL; Smooth Threshold Regression
Online: 22 November 2023 (13:48:53 CET)
This research paper aimed to examine the impact of exchange rate volatility on South Africa's exports from 1994 Q1 to 2023 Q2. The study used the Augmented Dickey-Fuller (ADF) and Phillips-Perron (PP) tests to test for stationarity. The nonlinear autoregressive distributed lag (NARDL) model and smooth threshold regression (STR) were employed to analyse the relationship between exchange rate volatility and exports. The GARCH(1,1) technique was used to construct the exchange rate volatility data. The results of the stationarity tests reveal that the variables are integrated of order I(0) or I(1). This implies that the variables used in this study are stationary, which is crucial for conducting accurate analyses. Moreover, the NARDL test approach provided insights into the long-run effects of exchange rate volatility on South Africa's exports. Based on the NARDL test, positive shocks have a greater but statistically insignificant effect on exports than negative shocks. Therefore, a greater level of exchange rate volatility may lead to increased exports from South Africa. Furthermore, the STR also reveals that the impact of exchange rate volatility is insignificant. These findings provide valuable insights for policymakers and firms to make informed decisions regarding exchange rate management and export strategies in South Africa.
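As a minimal sketch of how a GARCH(1,1) model turns a return series into a volatility series, the code below applies the standard conditional-variance recursion sigma2[t] = omega + alpha * r[t-1]^2 + beta * sigma2[t-1]. The parameter values, the zero-mean assumption, the initialisation with the sample variance and the function name are all assumptions made for illustration; in practice the parameters are estimated by maximum likelihood, and the paper's estimates are not given in the abstract.

```python
def garch11_variance(returns, omega, alpha, beta):
    """Conditional variance path of a GARCH(1,1) model for zero-mean returns."""
    if not returns:
        return []
    # Assumption: start the recursion from the sample variance of the returns.
    sigma2 = [sum(r * r for r in returns) / len(returns)]
    for r in returns[:-1]:
        # Today's variance depends on yesterday's squared return and variance.
        sigma2.append(omega + alpha * r * r + beta * sigma2[-1])
    return sigma2

# Hypothetical exchange-rate returns (percent) and parameters, for illustration only.
example_returns = [0.5, -1.2, 0.3, 2.0, -0.4]
variances = garch11_variance(example_returns, omega=0.1, alpha=0.1, beta=0.8)
volatility = [v ** 0.5 for v in variances]
print(volatility)
```

The square root of each conditional variance gives a volatility value per period, which is the kind of series that could then enter a regression as an explanatory variable.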
REVIEW | doi:10.20944/preprints202310.1913.v3
Subject: Engineering, Civil Engineering Keywords: Solar PV system; Regression Model; DOE; Solar energy; Fossil fuels
Online: 9 November 2023 (10:58:47 CET)
The negative impacts on the environment and other problems associated with fossil fuels have forced many countries to investigate and adopt environmentally friendly, renewable alternatives to sustain the increasing energy demand. Solar energy is one of the best renewable energy sources with the least negative impacts on the environment. Different countries have formulated solar energy policies to reduce dependence on fossil fuels and increase domestic energy production from solar energy. According to the 2010 BP Statistical Energy Survey, the world cumulative installed solar energy capacity was 22928.9 MW in 2009, a change of 46.9% compared to 2008. In this study, a PV generation system has been modeled and installed considering uncertain weather, based on the hourly wind speed data of New York City (NYC) for the year 2014. Regression models have been used to forecast the hourly, weekly, and monthly wind speed of NYC for the year 2014. Design of experiments (DOE) has been used to determine the optimal panel size (area), the battery capacity, and the levels of other factors.
ARTICLE | doi:10.20944/preprints202310.0432.v1
Subject: Computer Science And Mathematics, Applied Mathematics Keywords: European Union; public revenues; public expenditures; regression analysis
Online: 8 October 2023 (10:08:59 CEST)
Modern countries generally deal with significant budget deficits and public debt. These countries need to rationalize their expenditures and increase revenue without major interference to economic flows. The aim of this paper is to create a model for forecasting public revenue and expenditure based on data from previous years. In the paper we formulated two hypotheses related to the validity of the set models. After detailed analysis, both hypotheses were accepted. The analysis includes all EU Member States and public revenue and expenditure data for the last decade. The significance of the analysis is reflected on the practical foundation of the pre-set theoretical views, which will have their basis in statistically significant results. By analyzing the model, we formulated the regression formulas of revenues and expenditures, which can be efficiently used in predicting these variables.
ARTICLE | doi:10.20944/preprints202308.0823.v1
Subject: Engineering, Bioengineering Keywords: chicken egg fertility; classification; PLS regression; hyperspectral imaging
Online: 10 August 2023 (08:59:12 CEST)
Partial least squares (PLS) regression is a well-known chemometric method used for predictive modelling, especially in the presence of many variables. Although PLS was not initially developed as a technique for classification tasks, scientists have reportedly used this approach successfully for discrimination purposes. Whereas some unsupervised learning approaches, including but not limited to PCA and k-means clustering, do well in identifying and understanding grouping and clustering patterns in multidimensional data, they are limited when the end target is discrimination, making PLS a preferable alternative. Hyperspectral imaging data from a total of 672 fertilized chicken eggs, consisting of 336 white eggs and 336 brown eggs, were used in this study. Hyperspectral images in the NIR region of 900-1700 nm wavelength range were captured prior to incubation on day 0 and on days 1-4 after incubation. Eggs were candled on incubation day 5 and broken out on day 10 to confirm fertility. While a total of 312 and 314 eggs were found to be fertile in the brown and white egg batches respectively, the total numbers of non-fertile eggs in the same batches were 23 and 21 respectively. Spectral information was extracted from a segmented region of interest (ROI) of each hyperspectral image, and spectral transmission characteristics were obtained by averaging the spectral information. A moving-thresholding technique was implemented for discrimination based on PLS regression results on the calibration set. With true positive rates (TPR) of up to 100% obtained at selected threshold values between 0.50 and 0.85 and on different days of incubation, the results indicated that the proposed PLS technique can accurately discriminate between fertile and non-fertile eggs. The adaptive PLS approach was thereby presented as suitable for handling hyperspectral imaging-based chicken egg fertility data.
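The moving-threshold idea can be sketched as follows: a regression model outputs a continuous score per egg, and a sweep over candidate thresholds converts scores into fertile or non-fertile decisions while tracking the true positive rate. The scores, labels and function names below are hypothetical; the PLS model itself is not fitted here, and the label coding (1 for fertile, 0 for non-fertile) is an assumption.

```python
def true_positive_rate(scores, labels, threshold):
    """Share of fertile eggs (label 1) whose score reaches the threshold."""
    positives = sum(1 for label in labels if label == 1)
    if positives == 0:
        return float("nan")
    hits = sum(1 for s, label in zip(scores, labels) if label == 1 and s >= threshold)
    return hits / positives

def sweep_thresholds(scores, labels, thresholds):
    """True positive rate at each candidate threshold."""
    return {t: true_positive_rate(scores, labels, t) for t in thresholds}

# Hypothetical regression outputs for four eggs, for illustration only.
scores = [0.9, 0.8, 0.6, 0.3]
labels = [1, 1, 1, 0]
print(sweep_thresholds(scores, labels, [0.5, 0.7, 0.85]))
```

Lowering the threshold raises the true positive rate but also admits more non-fertile eggs as false positives, so in practice the sweep would be evaluated alongside a false positive or true negative rate.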
ARTICLE | doi:10.20944/preprints202211.0227.v1
Subject: Medicine And Pharmacology, Orthopedics And Sports Medicine Keywords: Bayesian; cardiovascular disease; CVD; cross-sectional; logistic regression
Online: 14 November 2022 (01:55:06 CET)
Background: Cardiovascular disease (CVD) has been one of the leading causes of death and disability-adjusted life years lost worldwide. Blood pressure, lipid, and cholesterol levels are good predictors of CVD risk and vary with age and physical fitness. However, few studies have explored the trends in CVD risk factors across different populations according to age and muscle strength. Objective: To analyze the trends in blood CVD risk factors according to age and relative grip strength among different populations. Method: 25363 participants were recruited in this cross-sectional study and 24709 were included in the analysis. Logistic regression and a Bayesian probabilistic analysis based on Markov Chain Monte Carlo (MCMC) modeling were conducted to build probability prediction models of hypertension, hyperlipidemia, and hypercholesterolemia according to age, relative grip strength, body weight conditions, and physical activity levels. Results: 1) Age might be the main factor influencing hypertension, which is regarded as one of the primary CVD risk factors. Although maintaining a high level of physical activity might help prevent hypertension, since individuals with normal body weight and higher physical activity show a lower probability of being diagnosed with hypertension, it might not prevent individuals from developing hypertension with age. 2) After 60, individuals of normal body weight seem more likely to have hyperlipidemia than those who are overweight or obese. 3) Larger relative grip strength might not be able to offset the negative effects of obesity, overweight and physical inactivity on hyperlipidemia. 4) The probability of developing hypercholesterolemia varies less with age and relative grip strength. Conclusion: Body weight management and maintaining high levels of physical activity are recommended at any age. Gaining some body weight after 60 years of age might be beneficial.
REVIEW | doi:10.20944/preprints202210.0391.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Tillage; Traction; Compaction; Neural networks; Support vector regression
Online: 26 October 2022 (02:07:19 CEST)
Soil working tools, implements, and machines are inevitable in mechanized agriculture. The soil-tool/machine interaction is a multivariate, dynamic, and intricate process. The accurate interpretation, description, and modeling of soil-machine interaction is key to providing a solution to sustainable crop production by reducing energy input, excessive soil pulverization, and compaction. The traditional methods provide insight into soil-machine interaction but often provide inadequate solutions and lack broad applicability. Computational intelligence (CI) is a comprehensive class of approaches that rely on approximate information to solve complex problems. CI methods have been extensively studied and applied in the soil tillage and traction domain in recent decades. The study critically reviews the CI techniques implemented in soil-machine interactions, especially in the context of tillage, traction, and compaction. The traditional methods and their limitations are discussed. The fundamentals of CI methods and a detailed overview of the most popular methods are provided. The study reviews and summarizes the 50 selected articles on soil-machine interaction studies where CI methods were employed. It discusses the strengths and limitations of the employed CI methods. It also suggests emergent CI methods and discusses future applications. The outlined study would serve as a concise reference and a quick and systematic way to understand the applicable CI methods that allow crucial farm management decision-making.
ARTICLE | doi:10.20944/preprints202106.0533.v1
Subject: Medicine And Pharmacology, Immunology And Allergy Keywords: COVID-19; Vaccine; Prediction; Regression; Ensemble learning; AdaBoost
Online: 22 June 2021 (08:30:30 CEST)
The novel coronavirus disease (COVID-19) has created immense threats to public health on various levels around the globe. The unpredictable outbreak of this disease and the pandemic situation are causing severe depression, anxiety and other mental as well as physical health problems. To combat this disease, vaccination is essential, as it boosts the immune system of people who come into contact with infected individuals. The vaccination process is thus necessary to confront the outbreak of COVID-19. This deadly disease has posed an enormous challenge to the social and economic condition of the entire world. Worldwide vaccination progress should be tracked to identify how fast economic and social life will be stabilized. To monitor the vaccination progress, a machine learning based regressor model is proposed in this study. This tracking process has been applied to data from 14th December, 2020 to 24th April, 2021. Several ensemble-based machine learning regressor models, namely Random Forest, Extra Trees, Gradient Boosting, AdaBoost and Extreme Gradient Boosting, are implemented and their predictive performances are compared. The comparative study reveals that the AdaBoostRegressor performs best, with a minimized mean absolute error (MAE) of 9.968 and root mean squared error (RMSE) of 11.133.
Subject: Medicine And Pharmacology, Immunology And Allergy Keywords: Diagnosing designs; rare diseases; statistics; regression; block designs
Online: 2 June 2021 (12:14:34 CEST)
Far too often, one meets patients who went for years or even decades from doctor to doctor, without getting a valid diagnosis. This brings pain to millions of patients and their families, not to speak of the enormous costs. Often patients cannot tell precisely enough which factors (or combinations thereof) trigger their problems. If conventional methods fail, we propose the use of statistics and algebra to give doctors much more useful inputs from patients. We use statistical regression for independent triggering factors for medical problems, and “balanced incomplete block designs” for non-independent factors. These methods can supply doctors with much more valuable inputs, and can also detect combinations of multiple factors by incredibly few tests. In order to show that these methods do work, we briefly describe a case in which these methods helped to solve a 60 year old problem in a patient, and give some more examples where these methods might be very useful. As a conclusion, while regression is used in clinical medicine, it seems to be widely unknown in diagnosing. Statistics and algebra can save the health systems much money, and the patients also a lot of pain.
ARTICLE | doi:10.20944/preprints202103.0586.v1
Subject: Environmental And Earth Sciences, Atmospheric Science And Meteorology Keywords: NVOC; phytoncide; bamboo grove; monoterpene; microclimate; regression analysis
Online: 24 March 2021 (13:10:25 CET)
After the COVID-19 outbreak, more and more people have been seeking physiological and psychological healing by visiting forests as time spent at home has grown longer. NVOC, a major healing factor of forests, has several positive effects on human health, and this study examined the NVOC characteristics of bamboo groves. The study revealed that α-pinene, 3-carene, and camphene were the most emitted compounds, and that the largest amount of NVOC was emitted in the early morning and late afternoon in bamboo groves. Furthermore, NVOC emission was found to have positive correlations with temperature and humidity, and inverse correlations with solar radiation, PAR and wind speed. A regression analysis conducted to predict the effect of microclimate factors on NVOC emissions resulted in a regression equation with 82.9% explanatory power and found that PAR, temperature, and humidity had a significant effect on NVOC emission prediction. In conclusion, this study investigated the NVOC emission characteristics of bamboo groves, examined the relationship between NVOC emissions and microclimate factors, and derived a prediction equation for NVOC emissions to assess the forest healing effects of bamboo groves. These results are expected to provide a basis for establishing more effective forest healing programs in bamboo groves.
ARTICLE | doi:10.20944/preprints202008.0329.v2
Subject: Medicine And Pharmacology, Epidemiology And Infectious Diseases Keywords: COVID-19; Geospatial Regression; Health Disparities; Public Health
Online: 11 September 2020 (09:48:57 CEST)
COVID-19 is a potentially fatal viral infection. This study investigates geography, demography, socioeconomics, health conditions, hospital characteristics, and politics as potential explanatory variables for death rates at the state and county levels. Data from the Centers for Disease Control and Prevention, the Census Bureau, Centers for Medicare and Medicaid, Definitive Healthcare, and USAfacts.org were used to evaluate regression models. Yearly pneumonia and flu death rates (state level, 2014-2018) were evaluated as a function of the governors’ political party using repeated measures analysis. At the state and county level, spatial regression models were evaluated. At the county level, we discovered a statistically significant model that included geography, population density, racial and ethnic status, and three health status variables, along with a political factor. State level analysis identified health status, minority status, and the interaction between governors’ parties and health status as important variables. The political factor, however, did not appear in a subsequent analysis of 2014-2018 pneumonia and flu death rates. The pathogenesis of COVID-19 has a greater and disproportionate effect within racial and ethnic minority groups, and the political influence on the reporting of COVID-19 mortality was statistically relevant at the county level and as an interaction term only at the state level.
ARTICLE | doi:10.20944/preprints201906.0291.v1
Subject: Medicine And Pharmacology, Internal Medicine Keywords: endothelial disorders; glycocalyx injury; syndecan-1; nonlinear regression
Online: 28 June 2019 (07:42:18 CEST)
Endothelial disorders are related to various diseases. An initial endothelial injury is characterized by endothelial glycocalyx injury. We aimed to evaluate endothelial glycocalyx injury by measuring serum syndecan-1 concentrations in patients during comprehensive medical examinations. A single-center, prospective, observational study was conducted at Asahi University Hospital. The participants enrolled in this study were 1313 patients who underwent comprehensive medical examinations at Asahi University Hospital from January 2018 to June 2018. One patient undergoing hemodialysis was excluded from the study. At enrollment, blood samples were obtained, and study personnel collected demographic and clinical data. No treatments or exposures were conducted except for standard medical examinations and blood sample collection. Laboratory data were obtained from the blood samples collected at the time of study enrollment. According to nonlinear regression, the concentrations of serum syndecan-1 were significantly related to age (p = 0.016), aspartate aminotransferase concentration (AST, p = 0.020), blood urea nitrogen concentration (BUN, p = 0.013), triglyceride concentration (p < 0.001), and hematocrit (p = 0.006). These relationships were independent associations. Endothelial glycocalyx injury, which is reflected by serum syndecan-1 concentrations, is related to age, hematocrit, AST concentration, BUN concentration, and triglyceride concentration.
ARTICLE | doi:10.20944/preprints201811.0096.v1
Subject: Computer Science And Mathematics, Information Systems Keywords: machine learning; stacking; forecasting; regression; sales; time series
Online: 5 November 2018 (09:54:54 CET)
In this paper, we study the use of machine learning models for sales time series forecasting. The effect of machine learning generalization has been considered. A stacking approach for building a regression ensemble of single models has been studied. The results show that, using stacking techniques, we can improve the performance of predictive models for sales time series forecasting.
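As a rough illustration of the stacking idea mentioned in this abstract, the sketch below blends two very simple base forecasters with a weight fitted on their one-step-ahead predictions. The abstract does not name the base models or the meta-learner, so everything here is an assumption: the two forecasters, the single-weight linear blend, the function names, and the made-up sales series are hypothetical and are not the authors' method.

```python
# Minimal stacking-style sketch (assumed setup, not the paper's models).

def last_value_forecast(history):
    """Base model 1: predict the most recent observation."""
    return history[-1]

def moving_average_forecast(history, window=3):
    """Base model 2: predict the mean of the last `window` observations."""
    return sum(history[-window:]) / window

def base_predictions(series, start):
    """Collect one-step-ahead predictions of both base models for t >= start."""
    p1, p2, y = [], [], []
    for t in range(start, len(series)):
        history = series[:t]          # only data available before time t
        p1.append(last_value_forecast(history))
        p2.append(moving_average_forecast(history))
        y.append(series[t])
    return p1, p2, y

def fit_blend_weight(p1, p2, y):
    """Meta-learner: least-squares weight w for y ~ w*p1 + (1 - w)*p2."""
    num = sum((yi - b) * (a - b) for a, b, yi in zip(p1, p2, y))
    den = sum((a - b) ** 2 for a, b in zip(p1, p2))
    return num / den if den else 0.5

def stacked_forecast(history, w):
    """Combine the base forecasts with the fitted weight."""
    return w * last_value_forecast(history) + (1 - w) * moving_average_forecast(history)

# Hypothetical sales series, for illustration only.
sales = [10, 12, 13, 15, 16, 18, 19, 21, 22, 24]
p1, p2, y = base_predictions(sales, start=3)
w = fit_blend_weight(p1, p2, y)
next_sales = stacked_forecast(sales, w)
```

Because the weight is chosen by least squares, the blend's squared error on the fitting window cannot exceed that of either base model alone; whether it generalizes better on new data is exactly what a study like this one would need to test.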
ARTICLE | doi:10.20944/preprints201608.0025.v2
Subject: Environmental And Earth Sciences, Atmospheric Science And Meteorology Keywords: solar variability; NAO; ENSO; volcanic eruptions; multiple regression
Online: 17 May 2017 (06:27:16 CEST)
The role of natural factors, mainly solar eleven-year cycle variability and volcanic eruptions, in two major modes of climate variability, the North Atlantic Oscillation (NAO) and El Niño Southern Oscillation (ENSO), is studied over roughly the last 150 years. The NAO is the primary factor regulating winter Central England Temperature (CET) throughout the period, though the NAO is affected differently by other factors in different time periods. Solar variability shows a strong positive influence on the NAO during 1978-1997, though the opposite is suggested in the earlier period. The solar-NAO lag relationship is also shown to be sensitive to the chosen reference period and thus points towards the previously proposed mechanism/relationship between the sun and the NAO. ENSO is influenced strongly by solar variability and volcanic eruptions in certain periods. This study observes a strong negative association between the sun and ENSO before the 1950s, which even reverses during the second half of the 20th century. During 1978-1997, when two strong eruptions coincided with active years of strong solar cycles, ENSO and volcanic activity showed a stronger association, and we discuss the important role played by ENSO. That period showed warming in the central tropical Pacific and cooling in the North Atlantic relative to both the later period (1999-2017) and the chosen earlier period. Here we show that the mean atmospheric state is important for understanding the connection between solar variability, the NAO and ENSO and the associated mechanisms. The study presents a critical analysis to improve knowledge about major modes of variability and their role in climate. We also discuss the importance of detecting a robust signal of natural variability, mainly from the sun.
COMMENT | doi:10.20944/preprints201608.0166.v1
Subject: Social Sciences, Geography, Planning And Development Keywords: Regional inequality; Multilevel regression; Markov chain; Guizhou Province
Online: 17 August 2016 (12:58:58 CEST)
This study analyses regional development in one of the poorest provinces in China, Guizhou Province, between 2000 and 2012 using a multiscale and multi-mechanism framework. In general, regional inequality has been declining since 2000. In addition, economic development in Guizhou Province presented spatial agglomeration and club convergence, revealing a development pattern of one core area, two wing areas and a contiguous area at the edge of the province that took shape between 2006 and 2012. Multilevel regression analysis revealed that industrialization and investment level were the primary driving forces of regional economic disparity in Guizhou Province. The influences of marketization and decentralization on regional economic disparity were relatively weak. Investment level reinforced regional economic disparity and the development of the core-periphery structure in the province. However, investment level actually weakened the regional economic disparity in Guizhou Province when the variable of time was considered. In addition, topography and urban–rural differentiation were the two main reasons for the formation of a core-periphery structure in Guizhou Province.
ARTICLE | doi:10.20944/preprints202304.1023.v1
Subject: Social Sciences, Safety Research Keywords: vehicle crash data; collision risk; ordinal logistic regression; multinomial logistic regression; proportional odds model (POM); partial proportional odds model (PPOM)
Online: 27 April 2023 (04:02:49 CEST)
The use of logistic regression models in data analysis and machine learning has expanded in recent years, and they have become the primary choice of researchers in risk assessment studies across a wide range of scientific fields, from the assessment of credit risk in financial institutions to the estimation of risk factors for traffic accidents or the identification of etiological factors for chronic diseases. All logistic models are natural extensions of the simple binary model, and their interpretation is based on it. Using the data of a cross-sectional study on the risk factors of traffic collisions, the article presents in detail the two main extensions of the logistic model, multinomial and ordinal logistic regression. Emphasis is placed on the use of ordinal regression since the outcome variable of the collision data is defined as an ordinal measurement reflecting a latent continuous scale.
ARTICLE | doi:10.20944/preprints202011.0363.v1
Subject: Chemistry And Materials Science, Analytical Chemistry Keywords: cannabinoid receptor 1; synthetic cannabinoids; quantitative structure-activity relationship; multiple linear regression; partial least squares regression; dependence and abuse potential
Online: 13 November 2020 (07:19:36 CET)
In recent years, there have been frequent reports on the adverse effects of synthetic cannabinoid (SC) abuse. SCs cause psychoactive effects, similar to those caused by marijuana, by binding and activating cannabinoid receptor 1 (CB1R) in the central nervous system. The aim of this study was to establish a reliable quantitative structure-activity relationship (QSAR) model to correlate the structures and physicochemical properties of various SCs with their CB1R-binding affinities. We prepared 15 SCs and their derivatives (tetrahydrocannabinol [THC], naphthoylindoles, and cyclohexylphenols) and determined their binding affinity to CB1R, which is known as a dependence-related target. We calculated the molecular descriptors for dataset compounds using an R/CDK (R package integrated with CDK, version 3.5.0) toolkit to build QSAR regression models. These models were established and statistical evaluations were performed using the mlr and plsr packages in R software. The most reliable QSAR model was obtained from the partial least squares regression method via external validation. This model can be applied in vivo to predict the addictive properties of illicit new SCs. Using a limited number of dataset compounds and our own experimental activity data, we built a QSAR model for SCs with good predictability. This QSAR modeling approach provides a novel strategy for establishing an efficient tool to predict the abuse potential of various SCs and to control their illicit use.
ARTICLE | doi:10.20944/preprints202311.0350.v1
Subject: Computer Science And Mathematics, Computer Vision And Graphics Keywords: 3D segmentation; feature extraction; regression machine learning; weight estimation
Online: 6 November 2023 (11:20:30 CET)
Accurate weight measurement is pivotal for monitoring the growth and well-being of cattle. However, the conventional weighing process, which involves physically placing cattle on scales, is labor-intensive and distressing for the animals. Hence, the development of automated cattle weight prediction techniques assumes critical significance. This study proposes a weight prediction approach for Korean cattle using 3D segmentation-based feature extraction and regression machine learning techniques from incomplete 3D shapes acquired from real farm environments. In the initial phase, we generated mesh data of 3D Korean cattle shapes using a multiple-camera system. Subsequently, deep learning-based 3D segmentation with the PointNet network model was employed to segment two dominant parts of the cattle. From these segmented parts, three crucial dimensions of Korean cattle were extracted. Finally, we implemented five regression machine learning models (CatBoost regression, LightGBM, Polynomial regression, Random Forest regression, and XGBoost regression) for weight prediction. To validate our approach, we captured 270 Korean cattle in various poses, totaling 1190 poses of 270 cattle. The best result was achieved with mean absolute error (MAE) of 25.2 kg and mean absolute percent error (MAPE) of 5.81% using the random forest regression model.
ARTICLE | doi:10.20944/preprints202310.0938.v1
Subject: Engineering, Mechanical Engineering Keywords: onion; peeling; compressed air; skin; waste; non-linear regression
Online: 16 October 2023 (09:11:18 CEST)
The paper presents the relationship between the efficiency of the onion skin peeling process and the resulting waste. The research was carried out on a pilot test stand for onion peeling. The process variables were the compressed air pressure (p) and the opening time of the flow control valve (t). The experiment took into account the influence of the onion diameter (d0) and its hardness (H). The obtained results were subjected to statistical analysis. Standard deviations of the percentage loss of onion mass (the skin removed during peeling) were calculated in relation to the obtained average values. Tukey's multiple comparison test was performed in order to identify the importance of individual process variables for the final peeling result. This was the basis for the development of a predictive model in the form of a nonlinear regression Mp=f(p,t,d0,H), which is a mathematical description of the onion skin peeling process. Finally, the response surface of the relationship between the analyzed variables was determined. The results showed that the peeling efficiency and the mass of skin waste depend on the compressed air pressure. Extending the onion blowing time does not improve the process efficiency, while the hardness and size of the onion are irrelevant to the process.
ARTICLE | doi:10.20944/preprints202309.0755.v1
Subject: Medicine And Pharmacology, Endocrinology And Metabolism Keywords: diabetes; CGM; hypoglycemia; hyperglycemia; prediction; ARIMA; logistic regression; LSTM
Online: 12 September 2023 (16:53:51 CEST)
Background: Novel technologies like continuous glucose monitoring (CGM) systems are improving diabetes management by means of real-time sensor glucose levels, retrospective glucose courses and trend arrows. CGM offers real-time alerts for (prognostic) hypo- and hyperglycemia and for rapidly falling or rising glucose, thereby improving glycemia under unstable conditions such as meals, physical activity and exercise. Complex CGM systems challenge people with diabetes and health care professionals in interpreting rapid changes, sensor delay (~10-minute difference between interstitial and plasma glucose), and malfunctions. Enhanced prediction models are necessary for optimal insulin dosing, daily activities, and especially for future fully closed-loop systems. Methods: The aim of this study was to investigate the efficacy of three different predictive models for glucose responses, 1) an autoregressive integrated moving average model (ARIMA), 2) logistic regression, and 3) long short-term memory networks (LSTM), in predicting glucose levels after 15 minutes and one hour. We compared and evaluated the performance of these models in predicting hypoglycemia (<70 mg/dL), euglycemia (70-180 mg/dL), and hyperglycemia (>180 mg/dL). In more detail, by assessing metrics such as precision, recall, F1-score, and accuracy, we specifically assessed which model provided the most accurate and reliable predictions for glucose levels. Results: As expected, ARIMA showed the worst accuracy, especially in predicting hypoglycemia within 1 hour (7.3%). For hypoglycemia within the first 15 minutes, the accuracy of the logistic regression model (98%) was higher than that of LSTM (88%). However, for the one-hour horizon, the LSTM model (87%) exceeded the hypoglycemia prediction accuracy of logistic regression (83%). The same pattern was observed for hyperglycemia: ARIMA (60%, 1 hour), logistic regression (96%, 15 minutes) and LSTM (85%, 1 hour). Conclusions: These findings suggest that different models may have varying strengths and weaknesses in predicting glucose levels, and the choice of model should be carefully considered based on the specific requirements and context of the clinical application. The logistic regression model was more accurate for the next 15 minutes, especially in predicting hypoglycemia. However, the LSTM model exceeded logistic regression for the one-hour prediction. Future research could explore hybrid models or ensemble approaches that combine the strengths of multiple models to further improve the accuracy and reliability of glucose predictions.
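The evaluation described in this abstract can be sketched as follows: glucose values are mapped to the three classes using the thresholds stated above (<70, 70-180, >180 mg/dL), and precision, recall and F1 are computed per class. The function names and the glucose readings are hypothetical illustrations; the forecasting models themselves are not reproduced here.

```python
# Minimal sketch of the class thresholds and per-class metrics.
# Thresholds come from the abstract; data and names are made up.

def glycemic_class(glucose_mg_dl):
    """Map a glucose value (mg/dL) to hypo-, eu- or hyperglycemia."""
    if glucose_mg_dl < 70:
        return "hypo"
    if glucose_mg_dl <= 180:
        return "eu"
    return "hyper"

def per_class_metrics(actual, predicted, label):
    """Return (precision, recall, F1) for one class label."""
    tp = sum(a == label and p == label for a, p in zip(actual, predicted))
    fp = sum(a != label and p == label for a, p in zip(actual, predicted))
    fn = sum(a == label and p != label for a, p in zip(actual, predicted))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

def accuracy(actual, predicted):
    """Share of samples whose predicted class matches the actual class."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical measured and forecast glucose values (mg/dL).
measured = [62, 95, 150, 210, 68, 185]
forecast = [75, 100, 140, 190, 65, 170]

actual_classes = [glycemic_class(g) for g in measured]
predicted_classes = [glycemic_class(g) for g in forecast]
hypo_precision, hypo_recall, hypo_f1 = per_class_metrics(
    actual_classes, predicted_classes, "hypo")
overall_accuracy = accuracy(actual_classes, predicted_classes)
```

Reporting recall for the hypoglycemia class separately matters because a missed low is clinically more costly than a false alarm, and overall accuracy can hide it when most readings are euglycemic.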
ARTICLE | doi:10.20944/preprints202309.0302.v1
Subject: Computer Science And Mathematics, Robotics Keywords: stabilization; symbolic regression; synthesized control; evolutionary computations; quadcopter model
Online: 5 September 2023 (10:11:12 CEST)
The development of artificial intelligence systems assumes that a machine can independently generate an algorithm of actions or a control system to solve the tasks. To do this, the machine must have a formal description of the problem and possess computational methods for solving it. The article deals with the problem of optimal control, which is the main task in the development of control systems, insofar as all systems being developed must be optimal with respect to a certain criterion. However, there are certain difficulties in implementing the resulting optimal control modes. The paper considers an extended formulation of the optimal control problem, which implies the creation of systems that have the properties necessary for practical implementation. To solve it, an adaptive synthesized optimal control approach based on numerical machine learning methods is proposed. The method moves the control object by optimally changing the position of the stable equilibrium point in the presence of some uncertainty in the initial position. As a result, from all possible synthesized controls, it chooses one that is less sensitive to changes in the initial states. As an example, the optimal control problem for a quadcopter with complex phase constraints is considered. To solve this problem according to the proposed approach, the control synthesis problem is first solved by a symbolic regression machine learning method to obtain a stable equilibrium point in the state space. After that, optimal positions of the stable equilibrium point are searched for by a particle swarm optimization algorithm according to the original functional of the optimal control problem. It is shown that this approach allows the control system to be generated automatically by computer from the formal statement of the problem and then implemented directly onboard, since a stabilization system is already built in.
ARTICLE | doi:10.20944/preprints202308.1978.v1
Subject: Biology And Life Sciences, Life Sciences Keywords: biomarker; LLM; interpretability; scRNA-seq; machine learning; symbolic regression
Online: 30 August 2023 (03:53:31 CEST)
Single-cell RNA sequencing (scRNA-seq) technology has significantly advanced our understanding of the diversity of cells and how this diversity is implicated in diseases. Yet, translating these findings across various scRNA-seq datasets poses challenges due to technical variability and dataset-specific biases. To overcome this, we present a novel approach that employs both an LLM-based framework and explainable machine learning to facilitate generalization across single-cell datasets and identify gene signatures to capture disease-driven transcriptional changes. Our approach uses scBERT, which harnesses shared transcriptomic features among cell types to establish consistent cell-type annotations across multiple scRNA-seq datasets. Additionally, we employ a symbolic regression algorithm to pinpoint highly relevant yet minimally redundant models and features for inferring a cell type’s disease state based on its transcriptomic profile. We ascertain the versatility of these cell-specific gene signatures across datasets, showcasing their resilience as molecular markers to pinpoint and characterize disease-associated cell types. Validation is carried out using four publicly available scRNA-seq datasets from both healthy individuals and those suffering from ulcerative colitis (UC). This demonstrates our approach’s efficacy in bridging disparities specific to different datasets, fostering comparative analyses. Notably, the simplicity and symbolic nature of the retrieved gene signatures facilitate their interpretability, allowing us to elucidate underlying molecular disease mechanisms using these models.
ARTICLE | doi:10.20944/preprints202308.0314.v1
Subject: Environmental And Earth Sciences, Atmospheric Science And Meteorology Keywords: Hail; Lightning; Climate change; Regression analysis; Trends; Reanalysis data
Online: 3 August 2023 (10:07:40 CEST)
We have developed additive logistic models for the occurrence of lightning, large (≥ 2 cm), and very large (≥ 5 cm) hail to investigate the evolution of these hazards in the past, in the future, and for forecasting applications. The models, trained with lightning observations, hail reports, and predictors from atmospheric reanalysis, assign an hourly probability to any location and time on a 0.25° × 0.25° × 1-hourly grid as a function of reanalysis-derived predictor parameters, selected following an ingredients-based approach. The resulting hail models outperform the Significant Hail Parameter, and the simulated climatological spatial distributions and annual cycles of lightning and hail are consistent with observations from storm report databases, radar, and lightning detection data. As a corollary result, CAPE released above the -10°C isotherm was found to be a more universally skilful predictor for large hail than CAPE. In the period 1950–2021, the models applied to the ERA5 reanalysis indicate significant increases of lightning and hail across most of Europe, primarily due to rising low-level moisture. The strongest modelled hail increases occur in northern Italy, with increasing rapidity after 2010. Here, very large hail has become 3 times more likely than it was in the 1950s. Across North America trends are comparatively small, apart from isolated significant increases in the direct lee of the Rocky Mountains and across the Canadian Plains. In the southern Plains, a period of enhanced storm activity occurred in the 1980s and 1990s.
ARTICLE | doi:10.20944/preprints202307.0405.v1
Subject: Computer Science And Mathematics, Applied Mathematics Keywords: Jackknife; Kibria-Lukman; estimator; Maximum Likelihood; Negative Binomial regression
Online: 6 July 2023 (08:58:10 CEST)
The negative binomial regression model (NBRM) is a generalized linear model that relaxes the restrictive assumption of the Poisson regression model that the variance equals the mean. The parameters of the NBRM are estimated using the maximum likelihood (ML) method. The maximum likelihood estimator becomes unstable when the explanatory variables are linearly dependent, a situation known as multicollinearity. Based on this, we developed a new estimator, the modified jackknifed Negative Binomial Kibria-Lukman (MJNBKL) estimator, to mitigate multicollinearity in the NBRM using four different biasing (shrinkage) parameters. We establish the superiority condition for the MJNBKL estimator over the existing ones. The performance of the MJNBKL estimator was ascertained by comparing it with the existing ones through a Monte Carlo simulation study and two real-life application datasets. The results of the simulation and real-life applications show that the MJNBKL estimator outperformed the other estimators by having the smallest MSE across all sample sizes and for different levels of correlation for the four biasing parameters used, and that the third biasing parameter is the optimal shrinkage parameter, with the lowest MSE.
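The contrast between the Poisson and negative binomial assumptions mentioned in this abstract can be written out explicitly. The abstract does not state which parameterization the authors use; the quadratic-variance (NB2) form with dispersion parameter \alpha and a log link shown below is the common textbook choice and is assumed here for illustration only.

```latex
% Poisson regression: variance tied to the mean
\operatorname{E}(y_i \mid x_i) = \mu_i, \qquad
\operatorname{Var}(y_i \mid x_i) = \mu_i

% Negative binomial regression (NB2 form, assumed): extra dispersion term
\operatorname{E}(y_i \mid x_i) = \mu_i, \qquad
\operatorname{Var}(y_i \mid x_i) = \mu_i + \alpha \mu_i^{2}, \quad \alpha > 0

% Log link shared by both models
\ln \mu_i = x_i^{\top} \beta
```

As \alpha tends to zero the negative binomial variance collapses to the Poisson case, which is why the NBRM is described as relaxing, rather than replacing, the Poisson assumption.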
REVIEW | doi:10.20944/preprints202303.0401.v1
Subject: Computer Science And Mathematics, Applied Mathematics Keywords: Strawman fallacy; UK General Medical Council; autism; regression; MMR
Online: 22 March 2023 (14:39:30 CET)
Background: Articles published in scholarly journals form part of the scientific evidence base. It is the responsibility of the scientific community to maintain its integrity. In 2011 the BMJ commissioned a feature article to draw attention to an article that had appeared in another journal, The Lancet, 13 years previously. The Lancet had already retracted the article. These actions exemplify the best traditions of scientific record-keeping. Objective: This submission examines whether the main claims summarized in the BMJ were factual. Method: We examine what was published in the Lancet against what was published in the BMJ and verify both against the findings in the GMC hearing transcripts and the verdict of the UK High Court. Results: The 6 points highlighted in the BMJ contained errors and need to be corrected. Conclusions: There are significant differences between what was reported in the Lancet paper and what was alleged to be there by the BMJ. This article aims only to point to errors in the BMJ article, to set the record straight. It does not show there was a causal association between MMR vaccination and autism.
ARTICLE | doi:10.20944/preprints202302.0083.v2
Subject: Environmental And Earth Sciences, Environmental Science Keywords: Multilinear Regression; Dissolve Oxygen; Modeling; Machine Learning; Levenberg–Marquardt algorithm; ANN; Urban Lake
Online: 27 February 2023 (07:25:06 CET)
The paper presents predictive models for dissolved oxygen (DO) levels in an urban lake using common water quality parameters such as temperature, pH, conductivity and ORP. Data were sampled using three real-time, industry-standard sensors (OPTOD, CTZN, and PHEHT) and then interpolated using the ArcGIS kriging technique. A correlation study carried out with an ML algorithm indicated a highly positive correlation between DO and the other water parameters, and the model was corroborated by its R-score in order to create the linear regression model. In addition, an artificial neural network, a machine learning method, was trained with the Levenberg-Marquardt algorithm to build a model to predict DO as well. The performance of the models was then validated, and the R2 of the predicted data against the actual data was checked. The discrepancy between the forecasted and actual values is significantly smaller for the ANN model than for the regression model, which indicates the appropriateness of the ANN model for forecasting the investigated attributes. The model can also be used to estimate DO in urban lake water for which DO data are unavailable.
ARTICLE | doi:10.20944/preprints202210.0078.v1
Subject: Medicine And Pharmacology, Obstetrics And Gynaecology Keywords: Africa; Maternal mortality rate; Joinpoint regression analysis; mortality; trends.
Online: 7 October 2022 (10:30:10 CEST)
Background: The United Nations Sustainable Development Goals state that by 2030, the global maternal mortality rate (MMR) should be lower than 70 per 100,000 live births. MMR is still one of Africa's leading causes of death among women. This research aims to study regional trends in maternal mortality in Africa. Methods: We extracted data on maternal mortality rates per 100,000 births from the UNICEF data bank from 2000 to 2017, 2017 being the latest year available. Joinpoint regression was used to study the trends and estimate the annual percent change (APC). Results: Maternal mortality has decreased in Africa over the study period by an average APC of -3.0% (95% CI -3.2% to -2.9%). All regions showed significant downward trends, with the sharpest decreases in the South. Only the North African region is close to the United Nations' sustainable development goals for maternal mortality. The remaining sub-Saharan African regions are still far from achieving the goals. Conclusions: Maternal mortality has decreased in Africa, especially in the South. The only region close to the United Nations target is North Africa. The remaining sub-Saharan African regions are still far from achieving the goals. These results could be used for the development of regional policies.
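The annual percent change (APC) reported in this abstract is conventionally derived from the slope of a straight line fitted to the logarithm of the rate over time, APC = (e^b - 1) × 100. The sketch below shows that calculation for a single segment only; the change-point search that full joinpoint regression performs is omitted, and the rate series is synthetic rather than taken from the study.

```python
import math

def annual_percent_change(years, rates):
    """APC from the least-squares slope of ln(rate) regressed on year."""
    n = len(years)
    log_rates = [math.log(r) for r in rates]
    mean_year = sum(years) / n
    mean_log = sum(log_rates) / n
    sxx = sum((x - mean_year) ** 2 for x in years)
    sxy = sum((x - mean_year) * (y - mean_log)
              for x, y in zip(years, log_rates))
    slope = sxy / sxx
    return (math.exp(slope) - 1) * 100

# Synthetic series: a rate that falls by exactly 3% each year.
years = list(range(2000, 2018))
rates = [800 * 0.97 ** (year - 2000) for year in years]
apc = annual_percent_change(years, rates)
```

Working on the log scale is what makes the APC a constant percentage per year within a segment, so a negative APC describes a steady proportional decline rather than a fixed absolute drop.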
ARTICLE | doi:10.20944/preprints202209.0353.v1
Subject: Medicine And Pharmacology, Obstetrics And Gynaecology Keywords: Africa; Maternal mortality rate; Joinpoint regression analysis; mortality; trends
Online: 23 September 2022 (03:06:07 CEST)
Background: The United Nations Sustainable Development Goals state that by 2030, the global maternal mortality rate (MMR) should be lower than 70 per 100,000 live births. MMR is still one of Africa's leading causes of death among women. This research aims to study regional trends in maternal mortality in Africa. Methods: We extracted data on maternal mortality rates per 100,000 births from the World Bank database from 1990 to 2015. Joinpoint regression was used to study the trends and estimate the annual percent change (APC). Results: Maternal mortality has decreased in Africa over the study period by an average APC of -2.6%. All regions showed significant downward trends, with the sharpest decreases in East Africa. Only the North African region is close to the United Nations' sustainable development goals for maternal mortality. The remaining sub-Saharan African regions are still far from achieving the goals. Conclusions: Maternal mortality has decreased in Africa, especially in East Africa. The only region close to the United Nations target is North Africa. The remaining sub-Saharan African regions are still far from achieving the goals. These results could be used for the development of regional policies.
ARTICLE | doi:10.20944/preprints202208.0445.v1
Subject: Business, Economics And Management, Economics Keywords: Adult children's education; parental longevity; truncated regression; emotional support.
Online: 26 August 2022 (04:18:44 CEST)
Background: In some developing countries, such as China, the population is aging rapidly while the average years of schooling of residents are steadily increasing. However, the question of whether adult children’s education has an effect on the longevity of older parents remains inadequately studied. Methods: This paper uses China Health and Retirement Longitudinal Survey (CHARLS) data to estimate the causal impact of adult children's education on their parents' longevity. Identification is achieved by using the truncated regression model and using historical education data as instrumental variables for adult children’s education. Results: For every unit increase in adult children’s education, the father’s and mother’s longevity increased by 0.89 years and 0.75 years, respectively. Mechanism analysis shows that adult children's education has a significant positive impact on parents' emotional support, financial support and self-reported health. Further evidence shows that for every unit increase in adult children’s education, the father-in-law’s and mother-in-law’s longevity increased by 0.40 years and 0.46 years, respectively. Conclusions: Improving the level of adult children’s education can increase the longevity of parents and parents-in-law. Adult children’s education might contribute to the longevity of older parents through three channels: providing emotional support, providing financial support, and affecting parents’ health.
ARTICLE | doi:10.20944/preprints202205.0255.v1
Subject: Biology And Life Sciences, Biophysics Keywords: SILCS; hERG channel; Physicochemical properties; Multiple linear regression; FragMaps
Online: 19 May 2022 (08:46:24 CEST)
The human ether-a-go-go-related gene (hERG) potassium channel is a well-known contributor to drug-induced cardiotoxicity and therefore an extremely important target when performing safety assessments of drug candidates. Ligand-based approaches in connection with quantitative structure-activity relationship (QSAR) analyses have been developed to predict hERG toxicity. The availability of the recently published cryogenic electron microscopy (cryo-EM) structure of the hERG channel opened the prospect of using structure-based simulation and docking approaches for hERG drug liability predictions. Recently, the idea of combining structure- and ligand-based approaches for modeling hERG drug liability has gained momentum, offering improvements in predictability when compared to ligand-based QSAR practices alone. The present article demonstrates the combination of the structure-based SILCS (site-identification by ligand competitive saturation) approach with physicochemical properties to develop predictive models for hERG blockade. This combination leads to improved model predictability based on Pearson’s R and percent correct (which represents rank-ordering of ligands) metrics for different validation sets of hERG blockers involving diverse chemical scaffolds and a wide range of pIC50 values. The inclusion of the SILCS structure-based approach allows determination of the hERG region to which compounds bind and the contribution of different chemical moieties in the compounds to blockade, thereby facilitating rational ligand design to minimize hERG liability.
ARTICLE | doi:10.20944/preprints202205.0240.v1
Subject: Business, Economics And Management, Economics Keywords: Credit constraints; Export; SMEs; Instrumental variable; Probit regression; Vietnam
Online: 18 May 2022 (10:35:32 CEST)
Export participation and restricted access to external formal credit are two factors attracting meticulous attention from researchers and policymakers, especially in developing countries. Exploring the interactive relationship of these factors in both the static and dynamic models is the purpose of this study. The study uses data sets from small and medium-sized manufacturing enterprises (SMEs) in Vietnam for the period 2009 - 2015. The instrumental variable approach is implemented to deal with the endogenous variable problem in the model. The results show an effect of credit constraint on the firms’ exporting status, and continuous exports are likely to reduce the limit of credit constraint.
ARTICLE | doi:10.20944/preprints202205.0032.v1
Subject: Business, Economics And Management, Business And Management Keywords: digitalisation; sustainability; sustainable development goals; European Union; regression equations
Online: 5 May 2022 (10:24:13 CEST)
Digitalisation provides access to an integrated network of information that can benefit society and business. Building a digital network and society using digital means can create unique opportunities to strategically address sustainable development challenges under the United Nations Sustainable Development Goals (SDGs) and to ensure higher productivity, education and an equality-oriented society. This viewpoint describes the potential of digitalisation for the society and business of the future. The authors revisit the links between digitalisation and sustainability in the European Union countries. A research methodology is proposed in the paper and the linear regression method is applied. The results showed tiers with five SDGs, focusing on society and business, and all these tiers are fixed in the constructed equations for each SDG. The suggested solution is statistically valid and proves the novelty of the research. Among digitalisation indicators, only mobile-cellular subscriptions and fixed-broadband sub-basket prices in part have no effect on the researched sustainable development indicators.
ARTICLE | doi:10.20944/preprints202201.0408.v1
Subject: Medicine And Pharmacology, Dietetics And Nutrition Keywords: Indonesia; islands cluster; multiple logistic regression; obesity; risk factor
Online: 27 January 2022 (06:53:58 CET)
Obesity has become a rising global health problem affecting adults’ quality of life. The objective of this study was to describe the prevalence of obesity in Indonesian adults by island cluster. The study also aimed to identify the risk factors of obesity in each island cluster. This study analysed secondary data from the Indonesian Basic Health Research 2018. Our data for analysis comprised 688,638 adults (>=15 years) randomly selected throughout Indonesia with probability proportionate to population size. We included 20 variables for sociodemographic and obesity-related risk factors for analysis. Obese status was defined as Body Mass Index (BMI) >= 27.5 kg/m2. Our study defined seven major island clusters, consisting of 34 provinces in Indonesia, as the unit of analysis. Descriptive analysis was conducted to determine the characteristics of the population and to calculate the prevalence of obesity within provinces in each of the island clusters. Multivariate logistic regression analyses to calculate odds ratios (ORs) were performed using R version 3.6.3. The study results showed that all island clusters had at least one province with an obesity prevalence of more than 20%. Six out of twenty variables, comprising four diet factors (consumption of sweet food, high-salt food, meat food, and carbonated drinks) and two other factors (mental health disorders and smoking behaviour), varied across the island clusters. In conclusion, obesity prevalence varied among provinces within and between island clusters. The variation in risk factors across island clusters suggests that the government should rethink and reframe its interventions to address obesity.
ARTICLE | doi:10.20944/preprints202112.0455.v1
Subject: Public Health And Healthcare, Public Health And Health Services Keywords: COVID-19; Durbin-Watson statistic; Multiple Linear Regression; Multicollinearity
Online: 28 December 2021 (16:11:44 CET)
This paper discusses the application of statistical modeling to interpret a health system crisis in Sri Lanka due to COVID-19. A strong focus on the preventive approach and on contact tracing, with rational utilization of available resources, describes Sri Lanka’s response towards COVID-19 prevention and mitigation. Early contact tracing, preemptive quarantining, isolation, and treatment were implemented as a concerted effort. This approach, proven efficient during the early phase of the pandemic, was not sustainable once the number of COVID-19 patients increased rapidly from July 2021, exceeding the health system capacity. The country’s COVID-19 situation during the period from 1 August 2021 to 31 October 2021 was taken into consideration. The variables used for analysis were the total number of cases, recovered cases, comorbid and O2-dependent patients, ICU patients, and deaths. The regression model was applied to analyze the data using the EViews 12 (x64) software application. The correlation coefficients of all the independent variables under consideration imply that they have a strong positive relationship with the number of deaths that occurred during the said period. According to the computed multiple linear regression model, the number of positive cases and the number of O2-dependent patients have a positive relationship with the dependent variable. Further, the Durbin-Watson statistic of the model and the multicollinearity test indicate that the model is free from serial correlation and is therefore fit. From the perspective of epidemiological control, these findings highlight the importance of keeping the number of cases within the limits of health system capacity.
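As an illustration of the serial-correlation check mentioned in the abstract above, the Durbin-Watson statistic can be computed directly from regression residuals as the sum of squared successive differences divided by the sum of squared residuals. The sketch below is a minimal pure-Python version; the function name and the residual values are hypothetical and are not taken from the study, which used EViews.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum((e_t - e_{t-1})^2) / sum(e_t^2).

    Values near 2 suggest no first-order serial correlation; values
    toward 0 suggest positive and toward 4 negative autocorrelation.
    """
    numerator = sum(
        (residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals))
    )
    denominator = sum(e * e for e in residuals)
    return numerator / denominator


# Hypothetical residual series (illustrative only, not the paper's data)
alternating = [1.0, -1.0, 1.0, -1.0]   # strongly negatively autocorrelated
constant = [1.0, 1.0, 1.0, 1.0]        # strongly positively autocorrelated

dw_alternating = durbin_watson(alternating)
dw_constant = durbin_watson(constant)
```

In practice the statistic would be read against tabulated critical bounds for the given sample size and number of regressors rather than judged by eye.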
ARTICLE | doi:10.20944/preprints202111.0227.v1
Subject: Business, Economics And Management, Marketing Keywords: Lolita fashion; multiple regression; decision tree; social media; XGBoost
Online: 12 November 2021 (14:54:04 CET)
Although the impact of social media on the marketing of fashion products has been extensively investigated, little evidence is available on how the platforms influence sales prediction. Focusing on Lolita fashion, this study investigates the impact of social media marketing on the sales volume prediction of fashion products. Essentially, we analyzed marketing data, including comments, likes, and shares from the Weibo social platform, to forecast future sales, examine how to enhance profit performance, and make production decisions. Using a quantitative approach, we tested three different prediction models: multiple regression, decision tree, and XGBoost. The results revealed that increasing comments and decreasing the number of likes could significantly improve the sales volumes of Lolita products. In contrast, shares exerted a less significant impact on sales. Regarding prediction models, XGBoost was found to be the best method. In the fashion industry, social media is a useful tool for forecasting market trends. A limitation of this study is that only one social media platform was used to extract data, which might limit the generalization of the findings.
ARTICLE | doi:10.20944/preprints202106.0497.v1
Subject: Environmental And Earth Sciences, Atmospheric Science And Meteorology Keywords: Ecosystem services; Benefit transfer; Meta-analysis; Meta-regression function.
Online: 21 June 2021 (10:04:14 CEST)
Meta-analysis has increasingly been used to synthesize the ecosystem services literature, with some testing of the use of such analyses to transfer benefits. These are typically based on local primary studies. However, meta-analyses associated with ecosystem services are a potentially powerful tool for transferring benefits, especially for environmental assets for which no primary studies are available. In this study we use the Ecosystem Service Valuation Database (ESVD), which brings together 1350 value estimates from more than 320 studies around the world, to estimate meta-regression functions for provisioning, regulating & maintenance and cultural ecosystem services across 12 biomes. We tested the reliability of these meta-regression functions and found that even using variables with high explanatory power, transfer errors could still be large. We show that meta-analytic transfer performs better than simple value transfer and, in addition, that local meta-analytical transfer (i.e. based on local explanatory variable values) provides more reliable estimates than global meta-analytical transfer (i.e. based on mean global explanatory variable values). Thus, we conclude that when taking into account the characteristics of the study area under analysis, including explanatory variables such as income, population density and protection status, we can determine the value of ecosystem services with greater accuracy.
ARTICLE | doi:10.20944/preprints202105.0536.v1
Subject: Biology And Life Sciences, Anatomy And Physiology Keywords: Argan biosphere reserve; Climate change; Rainfall; Temperature; Woodland regression
Online: 24 May 2021 (07:44:25 CEST)
This paper explores the effect of climate change on the regression of the Argan tree (Argania spinosa L. Skeels) woodland, focusing on the Argan Biosphere Reserve and especially in the Souss plain (Western Morocco). Rainfall and temperature data of four sites within the Argan Biosphere Reserve were analyzed over the last 60 years to assess any climatic change. Regression curves applied to the dataset showed an important decrease in rainfall (18 to 26 %) in the four locations as well as an increase in temperature (1 to 2 °C). These changes may have a detrimental effect on the Argan woodland although human factors have been reported to be the main factor of its regression. It can therefore be concluded that the reduction in rainfall and the increase in temperature should now be considered as factors of Argan woodland regression.
ARTICLE | doi:10.20944/preprints202104.0622.v1
Subject: Engineering, Automotive Engineering Keywords: Complex Regression, Least-Squares Techniques, Advanced Metering Infrastructure (AMI)
Online: 23 April 2021 (09:46:32 CEST)
This paper uses the complex regression analysis method to establish customer load regression models that consider economic indicators, temperature and rainfall. Furthermore, the proposed models are used to study the feasibility of forecasting future energy sales and summer peak load demand. First, least-squares techniques were used to derive regression models of the energy sales of 34 customers and of total energy sales, considering economic indicators and temperature. In addition, AMI high-voltage customer demand data and 24-hour system generating capacity were adopted to forecast the summer peak load. The data analysis was carried out with EViews software in order to verify the feasibility of the research framework. The study found that forecasting accuracy is low when only temperature and high-voltage demand are used. When high-voltage demand data are combined with 24-hour system generating capacity to forecast peak load, the average error is ±0.87%, and for the majority of the energy sales forecasting models the average error is ±3%. These results can serve as a future reference for power companies.
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Face detection; CSEM; Deep learning; GPU; CPU; Benchmark; Regression
Online: 27 July 2020 (14:54:15 CEST)
Face recognition is a valuable forensic tool for criminal investigators since it helps in identifying individuals in scenarios of criminal activity, such as fugitives or child sexual abuse. It is, however, a very challenging task, as it must handle low-quality images from real-world settings and fulfill real-time requirements. Deep learning approaches for face detection have proven to be very successful, but they require large computation power and processing time. In this work, we evaluate the speed-accuracy tradeoff of three popular deep-learning-based face detectors on the WIDER Face and UFDD data sets on several CPUs and GPUs. We also develop a regression model capable of estimating the performance, both in terms of processing time and accuracy. We expect this to become a very useful tool for end users in forensic laboratories to estimate the performance of different face detection options. Experimental results showed that the best speed-accuracy tradeoff is achieved with images resized to 50% of the original size on GPUs and images resized to 25% of the original size on CPUs. Moreover, performance can be estimated using multiple linear regression models with a Mean Absolute Error (MAE) of 0.113, which is very promising for the forensic field.
ARTICLE | doi:10.20944/preprints202001.0377.v1
Subject: Environmental And Earth Sciences, Geophysics And Geology Keywords: ERT method; regression model; tailings pond; heavy metal; reclamation
Online: 31 January 2020 (05:04:37 CET)
The legacy mining industry has left a large number of tailings ponds exposed to water and wind erosion, which causes serious environmental and health problems. Prior to rehabilitation actions, deep sampling of the materials infilling the pond is usually necessary. Thus, the primary objective of this study is to demonstrate the usefulness of the Electrical Resistivity Tomography (ERT) method as a non-invasive tool to determine the physicochemical composition of mine tailings ponds, enabling more efficient and low-cost surveys. To achieve this objective, three ERT profiles and three boreholes in each profile were carried out; from each borehole, three waste samples from different depths were collected, and a geochemical characterization of the samples was carried out. In order to estimate the composition of the infilling wastes in tailings ponds from electrical resistivity measurements, several regression models were calculated for different physicochemical properties and metal concentrations. As a result, a high-resistivity area was depicted in profiles G2 and G3, while a non-resistive area (profile G1) was also found. Relationships between low resistivity values and high salinity, clay content, and high metal concentrations and mobility were established. Specifically, calibrated models were obtained for electrical conductivity, particle sizes of 0.02-50 µm and 50-2000 µm, total Zn and Cd concentration, and bioavailable Ni, Cd and Fe. Therefore, the ERT technique could be considered a useful tool for the characterization of mine tailings ponds, and it can be used to estimate some physicochemical properties and metal concentrations of this mine waste.
ARTICLE | doi:10.20944/preprints201903.0090.v1
Subject: Engineering, Energy And Fuel Technology Keywords: Sustainable development; House prices; ARIMA; Regression analysis; New Zealand
Online: 7 March 2019 (12:02:50 CET)
The New Zealand housing sector is experiencing rapid growth that boosts the national economy but also results in the loss of valuable resources. In line with the growth, the housing market for both residential and business purposes has been booming, as have house prices. To sustain housing development, it is critical to accurately monitor and predict housing prices so as to support the decision-making process in the housing sector. This study is devoted to applying a mathematical method to predict housing prices. The forecasting performance of two types of models, ARIMA and multiple linear regression analysis, is compared. The ARIMA and regression models are developed based on a training-validation sample method. The results show that the ARIMA model generally performs better than the regression model. However, the regression model explores, to some extent, the significant correlations between house prices in New Zealand and the macro-economic conditions.
ARTICLE | doi:10.20944/preprints201811.0394.v3
Subject: Engineering, Electrical And Electronic Engineering Keywords: marine current turbine; blade attachment; sparse autoencoder; softmax regression
Online: 12 February 2019 (09:59:09 CET)
The development and application of marine current energy are attracting more and more attention around the world. Due to the harshness of its working environment, it is both important and difficult to study the fault diagnosis of a marine current generation system. In this paper, an underwater image is chosen as the fault-diagnosing signal after different sensors are compared. This paper proposes a diagnosis method based on the sparse autoencoder (SA) and softmax regression (SR). The SA is used to extract the features and SR is used to classify them. Images are used to monitor whether benthos has attached to the blade and to determine the corresponding degree of attachment. The experimental results show that the proposed method can diagnose blade attachment with higher accuracy than other methods.
ARTICLE | doi:10.20944/preprints201809.0076.v1
Subject: Medicine And Pharmacology, Pharmacology And Toxicology Keywords: pharmacovigilance; drug safety; segmented regression; interrupted time series; variation
Online: 5 September 2018 (01:27:54 CEST)
Introduction Pharmacovigilance may detect safety issues after marketing of medications, and this can result in regulatory action such as direct healthcare professional communications (DHPC). DHPC can be effective in changing prescribing behaviour, however the extent to which prescribers vary in their response to DHPC is unknown. This study aims to explore changes in prescribing and prescribing variation among GP practices following a DHPC on the safety of mirabegron, a medication to treat overactive bladder (OAB). Methods This is an interrupted time series study of English GP practices from 2014-2017. NHS Digital provided monthly statistics on aggregate practice-level prescribing and practice characteristics (practice staff and registered patient profiles, Quality & Outcomes Framework indicators, and deprivation of the practice area). The primary outcome was monthly mirabegron items as a percentage of all OAB drug items. The exposure was a DHPC issued by the European Medicines Agency in September 2015. Variation between practices in mirabegron prescribing before and after the DHPC was assessed using the systematic component of variation (SCV). Multilevel segmented regression with random effects quantified the change in level and trend of prescribing after the DHPC. Practice characteristics were assessed for their association with a reduction in prescribing following the DHPC. Results This study included 7,408 practices. During September 2015, 88.9% of practices prescribed mirabegron and mirabegron composed a mean of 8.2% (SD 6.8) of OAB items. Variation between practices was classified as very high and the median SCV did not change significantly (p=0.11) in the 6 months after the September 2015 DHPC (12.4) compared to before (11.6). Before the DHPC, there was a monthly trend of 0.294 (95%CI, 0.287, 0.301) percentage points increase in mirabegron percentage. 
There was no significant change in the month immediately after the DHPC (-0.023, 95% CI -0.105 to 0.058) however there was a significant reduction in trend (-0.036, 95% CI -0.049 to -0.023). Higher numbers of registered patients and patients aged ≥65 years, and practice area deprivation were associated with having a significant decrease in level and slope of mirabegron prescribing post-DHPC. Conclusion Variation in mirabegron prescribing was high over the study period and did not change substantively following the DHPC. There was no immediate prescribing change post-DHPC, although the monthly growth did slow. Knowledge of the degree of variation in and determinants of response to safety communications may allow those that do not change prescribing to be provided with additional supports.
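The "change in level and trend" reported in the abstract above comes from segmented regression of an interrupted time series. The sketch below is a simplified single-series version using ordinary least squares, without the multilevel random effects the study describes. It relies on the fact that, for one series with one break, fitting separate lines before and after the break is equivalent to the usual four-parameter segmented model. The function names and the synthetic series are assumptions made for illustration, not the study's data.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def segmented_fit(ys, t0):
    """Return (pre_trend, level_change, trend_change) for a break at index t0.

    level_change is the gap between the post- and pre-intervention
    fitted lines evaluated at t0; trend_change is the slope difference.
    """
    ts = list(range(len(ys)))
    a_pre, s_pre = fit_line(ts[:t0], ys[:t0])
    a_post, s_post = fit_line(ts[t0:], ys[t0:])
    level_change = (a_post + s_post * t0) - (a_pre + s_pre * t0)
    return s_pre, level_change, s_post - s_pre


# Synthetic, noise-free monthly series (hypothetical values):
# baseline trend 0.3 per month, then a level drop of 0.5 and a
# trend reduction of 0.1 starting at month 12.
series = [5.0 + 0.3 * t for t in range(12)]
series += [5.0 + 0.3 * t - 0.5 - 0.1 * (t - 12) for t in range(12, 24)]

pre_trend, level_change, trend_change = segmented_fit(series, 12)
```

With real practice-level data, confidence intervals and between-practice variation would require a mixed-effects model rather than this per-series fit.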
ARTICLE | doi:10.20944/preprints201807.0353.v1
Subject: Computer Science And Mathematics, Probability And Statistics Keywords: corporate default swap spreads, correlation networks, vector autoregressive regression.
Online: 19 July 2018 (10:16:11 CEST)
We propose a novel credit risk measurement model for Corporate Default Swap spreads, that combines vector autoregressive regression with correlation networks. We focus on the sovereign CDS spreads of a collection of countries, that can be regarded as idiosyncratic measures of credit risk. We model them by means of a vector autoregressive regression model, composed by a time dependent country specific component, and by a contemporaneous component that describes contagion effects among countries. To disentangle the two components, we employ correlation networks, derived from the correlation matrix between the reduced form residuals. The proposed model is applied to ten countries that are representative of the recent financial crisis: top borrowing/lending countries, and peripheral European countries. The empirical findings show that the proposed model is a good predictor of CDS spreads movements, and that the contemporaneous component decreases prediction errors with respect to a simpler autoregressive model. From an applied viewpoint, core countries appear to import risk, as contagion increases their CDS spread, whereas peripheral countries appear as exporters of risk. Greece is an unfortunate exception, as its spreads seem to increase for both idiosyncratic factors and contagion effects.
ARTICLE | doi:10.20944/preprints201807.0087.v1
Subject: Business, Economics And Management, Economics Keywords: Nigeria; financial development; economic growth; threshold regression; time series
Online: 5 July 2018 (08:39:38 CEST)
The relationship between economic growth, growth volatility and financial sector development continues to attract attention in the theoretical and empirical literature. Over time, some studies have hypothesized that finance has a causal linear relationship with growth. Recently, several other authors have contradicted this claim and argued that the relationship between finance and growth is nonlinear. We investigate these claims for Nigeria for the period between 1970 and 2015, using semi-parametric econometric methods, Hansen sample splitting techniques and a threshold estimator. We observed no evidence of ‘Too much finance’ as claimed by many researchers in recent times. We show that the relationship between financial development and economic growth is U-shaped. This is equally true for the relationship between financial development and growth volatility. We also discuss policy implications of our findings and recommend financial innovations and decentralization of stock exchanges to boost access to financial services, in addition to improved regulation to enhance financial market efficiency.
ARTICLE | doi:10.20944/preprints201806.0030.v1
Subject: Environmental And Earth Sciences, Oceanography Keywords: synthetic aperture radar; automatic identification system; ice thickness; regression
Online: 4 June 2018 (10:28:38 CEST)
Ship speeds extracted from AIS data vary with ice conditions. We extrapolated this variation with SAR data to a chart of expected icegoing speed. The study covers the Gulf of Bothnia in March 2013 and ships with ice class 1A Super that are able to navigate without icebreaker assistance. The speed was normalized to 0-10 for each ship. As the matching between AIS and SAR was complicated by ice drift during the time gap, which ranged from hours to two days, we calculated a set of local SAR statistics over several scales. We used random tree regression to estimate the speed. The accuracy was quantified by the mean squared error (MSE) and by the fraction of estimates close to the actual speeds. These depended strongly on the route and the day. MSE varied from 0.4 to 2.7 units² for daily routes. 65 % of the estimates deviated less than one unit and 82 % less than 1.5 units from the AIS speeds. The estimated daily mean speeds were close to the observations. The largest speed decreases were reproduced by the estimator in a dampened form or not at all. This improved when ice chart thickness was included as one predictor.
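The two accuracy measures named in the abstract above, the mean squared error and the fraction of estimates within a given tolerance of the observed speed, can be sketched as follows. The function names and the sample values are hypothetical and only illustrate the arithmetic on the normalized 0-10 speed scale.

```python
def mean_squared_error(actual, predicted):
    """Average of squared differences between observed and estimated values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def fraction_within(actual, predicted, tolerance):
    """Share of estimates whose absolute error is strictly below tolerance."""
    hits = sum(1 for a, p in zip(actual, predicted) if abs(a - p) < tolerance)
    return hits / len(actual)


# Hypothetical normalized speeds (illustrative only)
ais_speed = [5.0, 6.0, 7.0, 8.0]
estimated = [5.5, 6.0, 9.0, 7.0]

mse = mean_squared_error(ais_speed, estimated)
share_within_1 = fraction_within(ais_speed, estimated, 1.0)
share_within_1_5 = fraction_within(ais_speed, estimated, 1.5)
```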
ARTICLE | doi:10.20944/preprints201803.0093.v1
Subject: Engineering, Control And Systems Engineering Keywords: linear regression; covariance matrix; data association; sensor fusing; SLAM
Online: 13 March 2018 (04:06:56 CET)
Linear regression is a basic tool in mobile robotics, since it enables accurate estimation of straight lines from range-bearing scans or in digital images, which is a prerequisite for reliable data association and sensor fusing in the context of feature-based SLAM. This paper discusses, extends and compares existing algorithms for line fitting applicable also in case of strong covariances between the coordinates at each single data point, which must not be neglected if range-bearing sensors are used. Besides, particularly the determination of the covariance matrix is considered, which is required for stochastic modeling. The main contribution is a new error model of straight lines in closed form for calculating fast and reliably the covariance matrix dependent on just a few comprehensible and easily obtainable parameters. The model can be applied widely in any case when a line is fitted from a number of distinct points also without a-priori knowledge of the specific measurement noise. By means of extensive simulations the performance and robustness of the new model in comparison to existing approaches is shown.
ARTICLE | doi:10.20944/preprints201803.0084.v1
Subject: Engineering, Civil Engineering Keywords: anfis; missing data; multiple regression; normal ratio method; Yeşilırmak
Online: 12 March 2018 (07:00:46 CET)
Good data analysis is required for the optimal design of water resources projects. However, data are not regularly collected due to material or technical reasons, which results in incomplete-data problems. Available data and data length are of great importance to solve those problems. Various studies have been conducted on missing data treatment. This study used data from the flow observation stations on Yeşilırmak River in Turkey. In the first part of the study, models were generated and compared in order to complete missing data using ANFIS, multiple regression and Normal Ratio Method. In the second part of the study, the minimum number of data required for ANFIS models was determined using the optimum ANFIS model. Of all methods compared in this study, ANFIS models yielded the most accurate results. A 10-year training set was also found to be sufficient as a data set.
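Of the three gap-filling approaches compared in the abstract above, the Normal Ratio Method is simple enough to sketch. In its usual textbook form, the missing value at a target station is the average of the neighbouring stations' observations, each scaled by the ratio of the target station's long-term normal to that neighbour's normal. The code below assumes this standard formulation; the abstract does not state the exact variant used, and all station values are hypothetical.

```python
def normal_ratio_estimate(target_normal, neighbor_normals, neighbor_values):
    """Estimate a missing observation with the Normal Ratio Method.

    target_normal:    long-term mean (normal) at the station with the gap
    neighbor_normals: long-term means at the neighbouring stations
    neighbor_values:  neighbours' observations for the missing period
    """
    n = len(neighbor_values)
    weighted = (
        (target_normal / normal) * value
        for normal, value in zip(neighbor_normals, neighbor_values)
    )
    return sum(weighted) / n


# Hypothetical flow data for three neighbouring stations (illustrative only)
estimate = normal_ratio_estimate(
    target_normal=100.0,
    neighbor_normals=[80.0, 120.0, 100.0],
    neighbor_values=[40.0, 60.0, 50.0],
)
```

Each neighbour here is running at half of its own normal, so the estimate is half of the target station's normal.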
ARTICLE | doi:10.20944/preprints201801.0090.v1
Subject: Business, Economics And Management, Econometrics And Statistics Keywords: clustering; curve fitting; nonparametric regression; smoothing data; polynomial approximation
Online: 10 January 2018 (09:48:23 CET)
Nonlinear nonparametric statistics (NNS) algorithm offers new tools for curve fitting. A relationship between k-means clustering and NNS regression points is explored with graphics showing a perfect fit in the limit. The goal of this paper is to demonstrate NNS as a form of unsupervised learning, and supply a proof of its limit condition. The procedural similarity NNS shares with vector quantization is also documented, along with identical outputs for NNS and a k nearest neighbours classification algorithm under a specific NNS setting. Fisher's iris data and artificial data are used. Even though a perfect fit should obviously be reserved for instances of high signal to noise ratios, NNS permits greater flexibility by offering a large spectrum of possible fits from linear to perfect.
ARTICLE | doi:10.20944/preprints201705.0007.v1
Subject: Business, Economics And Management, Economics Keywords: adoption; land degradation; poisson regression; sustainable land management practices
Online: 1 May 2017 (08:33:17 CEST)
Land degradation is a serious impediment to improving rural livelihoods in Eastern Africa. This paper identifies major land degradation patterns and causes, and analyzes the determinants of sustainable land management (SLM) in three countries (Ethiopia, Malawi and Tanzania). The results show that land degradation hotspots cover about 51%, 41%, 23% and 23% of the terrestrial areas in Tanzania, Malawi and Ethiopia respectively. The analysis of nationally representative household surveys shows that the key drivers of SLM in these countries are biophysical, demographic, regional and socio-economic determinants. Secure land tenure, access to extension services and market access are some of the determinants incentivizing SLM adoption. The implications of this study are that policies and strategies that facilitate secure land tenure and access to SLM information are likely to incentivize investments in SLM. Local institutions providing credit services, inputs such as seed and fertilizers, and extension services must also not be ignored in development policies.
ARTICLE | doi:10.20944/preprints201608.0026.v1
Subject: Engineering, Civil Engineering Keywords: concrete; sustainability; regression analysis mix design; CO2 emission; cost
Online: 3 August 2016 (06:05:26 CEST)
As argued by ‘Declaration of Concrete Environment (2010)’ of Korea and ‘Declaration of Asian Concrete Environment (2011)’ of six Asian countries, concrete as a single material has lately shown extremely large impact on environmental issues such as climate change. Assessment of environmental impact from concrete material and production has considerable importance. Concrete is a major material used in the construction industry that emits a large amount of substances with environmental impacts during its life cycle. Accordingly, technologies for the reduction in and assessment of the environmental impact of concrete from the perspective of Life Cycle Assessment must be developed. At present, the studies in relation to greenhouse gas emission from concrete are being carried out globally as a countermeasure against climate change. In this study, a sustainable concrete mix design algorithm was designed using correlation analyses, and its carbon emission and cost reduction performances were assessed. Using correlation analyses, the concrete strength, w/b and s/a ratios, and CO2 emissions were identified as major variables of concrete mix design that influenced other variables. Also, this study aims to evaluate the CO2 emission reduction performance of the algorithm-deduced sustainable concrete mix design, and therefore, the CO2 emissions of the sustainable concrete mix design are compared with those of the actual concrete mix design applied to the construction of the office building A in South Korea.
ARTICLE | doi:10.20944/preprints202311.0515.v1
Subject: Engineering, Civil Engineering Keywords: shared micromobility; e-scooters; spatiotemporal analysis; negative binomial regression analysis
Online: 8 November 2023 (04:46:20 CET)
Shared micromobility has gained significant attention in the field of transportation engineering in recent years as an environmentally friendly, convenient, and easily accessible transportation mode. Like other medium-sized cities, Birmingham, Alabama implemented a shared micromobility pilot program in 2021 that captured the attention of local travelers. This study examined shared e-scooter usage and associated travel patterns in Birmingham using 2021-2022 field data. From these data, ArcGIS maps were used to showcase trip origins and destinations. To gain a further understanding of e-scooter travel patterns in the study area, zip code and block group densities were calculated. Additionally, a negative binomial regression model was constructed to identify determinants of shared e-scooter trips. The analysis results showed that the usage of shared e-scooters was the highest during the nighttime, on weekends, and in the fall season. Furthermore, the research findings indicated that shared e-scooters experienced their highest utilization rates in areas with a higher proportion of educated and higher-income individuals. These findings suggest that travelers’ mode choice related to the use of micromobility modes is influenced by environmental and demographic factors. Overall, this case study offers valuable contributions to the understanding of the role of shared e-scooters in Birmingham's transportation landscape and can guide transportation authorities in other medium-sized cities in their efforts to plan for micromobility options.
ARTICLE | doi:10.20944/preprints202310.1324.v1
Subject: Biology And Life Sciences, Animal Science, Veterinary Science And Zoology Keywords: feature selection; milk mid-infrared spectra; fatty acids concentration; regression
Online: 20 October 2023 (10:15:41 CEST)
Milk MIR spectra have been shown to provide valuable information on a wide range of traits to be used in dairy cattle breeding programs. Selecting the most informative variables from complex data can improve prediction accuracy and model robustness and, consequently, the interpretability of MIR spectra. Thus, we aimed to investigate the prediction performance of feature selection methods based on MIR spectra data, using the milk fatty acid (FA) profile as an example to illustrate the evaluated procedure. Data of MIR spectra, milk test-day records, and reference FA concentrations of 155 first-parity Holstein cows were used in the analyses. Four models comprising different explanatory variables and five feature selection methods were evaluated. The results indicated that the Competitive Adaptive Reweighted Sampling (CARS) method can effectively select the most informative variables from the MIR spectra, resulting in higher prediction accuracies than other variable selection approaches. The model including selected MIR spectra and cow information variables [days in milk at the test day, age at the test day, pregnancy stage (in days), number of days open, number of inseminations, and somatic cell count] yielded the best FA profile predictions based on Partial Least Squares regression. In particular, ten FAs (C8:0, C10:0, C14:1, C17:0 isomers, C18:1, C18:1 isomer, medium-chain FA, unsaturated FA, monounsaturated FA, and polyunsaturated FA) presented accuracies based on the determination coefficient (R2cv) ranging from 0.66 to 0.85 in internal validation and from 0.65 to 0.84 in external validation. By running CARS 1,000 times in internal validations, we obtained the frequency of selected milk MIR wavenumbers for 35 FAs. The wavenumbers most related to FAs were found within 1,003 to 1,145 cm-1, while other discrete areas were within 1,651 to 1,797 and 2,834 to 2,954 cm-1. These biomarkers may give insights into the relationship between MIR spectra and FA phenotypes.
In conclusion, using CARS and cow information improved predictions of FAs based on MIR spectra in Chinese Holstein dairy cows. Additional validation studies should be conducted as larger datasets become available.
ARTICLE | doi:10.20944/preprints202310.1008.v1
Subject: Business, Economics And Management, Economics Keywords: agricultural cooperatives; technology adopters; non-technology adopters; panel data regression
Online: 17 October 2023 (08:14:04 CEST)
Technological advances are currently reshaping the world, including Indonesia. Since the arrival of these technologies, many business fields have competed with each other to adopt them. One business sector that cannot be separated from technological support is cooperatives. On the other hand, some cooperatives in the developing phase experience technological lag. At the same time, East Kutai Regency, which is the agricultural center in East Kalimantan Province, tends to rely on the cooperative sector to encourage small and medium-scale economies. This research aims to investigate the causal effect of access to computers (AC), internet networks (IN), digital administration skills (DAS), and financial literacy (FL) on profits (PFT). The study compares agricultural cooperatives that adopt technology with those that do not. Using panel data regression from eighteen sub-districts in East Kutai, it is shown that technology-adopting agricultural cooperatives were more prominent than non-technology-adopting agricultural cooperatives during 2017–2022. However, the statistical findings from both groups agree: access to computers and financial literacy both have a significant effect on profits. Other analysis results show that internet networks and digital administration skills have an insignificant impact on profits. The study's implications provide valuable output for the future sustainability of agricultural cooperatives. The success of agricultural cooperatives depends greatly on the effectiveness of the application of technology.
ARTICLE | doi:10.20944/preprints202310.0871.v1
Subject: Engineering, Aerospace Engineering Keywords: fiber optic gyroscope; thermal errors; prediction model; overfitting; biased regression
Online: 13 October 2023 (08:18:22 CEST)
For a fiber optic gyroscope, thermal deformation of the fiber coil can introduce additional thermal-induced phase errors, commonly referred to as thermal errors. Thermal error compensation techniques are effective means of addressing this issue. The principle behind these techniques involves real-time sensing of thermal errors and correcting them within the output signal. Since it is challenging to directly separate thermal errors from the output signal of the fiber optic gyroscope, it is necessary to predict thermal errors based on temperature. To establish a mathematical model between temperature and thermal errors, this paper measured synchronized data of phase errors and angular velocity for the fiber coil under different temperature conditions and aimed to model it using data-driven methods. Due to the difficulty of conducting tests and the limited number of data samples, an algorithm called TD-model modeling is proposed to address the issue of overfitting, which can reduce the model's generalization ability. First, a theoretical analysis of the phase errors caused by thermal deformation of the fiber coil is performed. Subsequently, the critical parameters, such as the thermal expansion coefficient, are determined, and a theoretical model is established. Finally, the theoretical analysis model is incorporated as a regularization term and combined with the test data to jointly participate in the regression of model coefficients. Through experimental comparative analysis, it is shown that, relative to ordinary regression models, the TD-model effectively mitigates overfitting caused by the limited number of samples, leading to a 58% improvement in predictive accuracy.
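The idea of using a theoretical model as a regularization term can be illustrated with a deliberately simplified sketch. It is not the TD-model itself: it assumes a single temperature feature, no intercept, and a squared penalty that pulls the fitted coefficient toward a theory-derived value. The function name, the data, the theoretical coefficient, and the penalty weight are all hypothetical.

```python
def fit_with_theory_prior(xs, ys, theory_coef, lam):
    """Minimize sum((y - b*x)^2) + lam * (b - theory_coef)^2 over b.

    Setting the derivative to zero gives the closed form
    b = (sum(x*y) + lam * theory_coef) / (sum(x*x) + lam).
    lam = 0 is ordinary least squares; a large lam returns the theory value.
    """
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    return (sxy + lam * theory_coef) / (sxx + lam)


# Hypothetical small sample: temperature change vs. measured thermal error.
xs = [1.0, 2.0, 3.0]
ys = [0.9, 2.3, 2.8]
theory_coef = 1.2  # assumed coefficient from a theoretical analysis

ols_coef = fit_with_theory_prior(xs, ys, theory_coef, lam=0.0)
regularized_coef = fit_with_theory_prior(xs, ys, theory_coef, lam=14.0)
print(ols_coef, regularized_coef)
```

With few samples, the penalty keeps the estimate from chasing noise in the data, which is the overfitting concern the abstract raises; how the penalty weight is chosen is left open here.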
ARTICLE | doi:10.20944/preprints202309.1617.v1
Subject: Social Sciences, Psychology Keywords: nurses; México; logistic regression; predictors; mental health; Spanish burnout inventory
Online: 25 September 2023 (09:48:40 CEST)
The aim of this study was to use latent profile analysis to identify specific profiles of burnout syndrome in combination with work engagement and to identify whether job satisfaction, psychological well-being, and other sociodemographic and work variables affect the probability of presenting a profile of burnout syndrome and low work enthusiasm. A total of 355 healthcare professionals completed the Spanish Burnout Inventory, the Utrecht Work Engagement Scale, the Job Satisfaction Scale, and the Psychological Well-Being Scale for Adults. Latent profile analysis identified 4 profiles: 1) burnout with high indolence (BwHIn); 2) burnout with low indolence (BwLIn); 3) high engagement, low burnout (HeLb); and 4) in the process of burning out (IPB). Multivariate logistic regression showed that a second job in a government health care institution; a shift other than the morning shift; being divorced, separated or widowed; and work load are predictors of burnout profiles with respect to the HeLb profile. These data are useful for designing intervention strategies according to the needs and characteristics of each type of burnout profile.
ARTICLE | doi:10.20944/preprints202309.1143.v1
Subject: Public Health And Healthcare, Public Health And Health Services Keywords: aging; sigmoidal growth function; nonlinear regression; threshold estimation; fractional anisotropy
Online: 18 September 2023 (14:27:39 CEST)
Background: A linear association has widely been assumed for prediction of aging-related fractional anisotropy (FA) decline in white matter of the brain. While useful for testing the significance of the aging effect, it fails to identify a threshold age before and after which the age-FA association changes. Identification of such a threshold is often of clinical interest for timely intervention. Methods: We employed a sigmoidal growth function to test for a threshold effect in age triggering the onset of cerebral decline in 21 white matter tracts, and compared its fitting performance to those of linear and power regression. The study sample was a normal healthy cohort of 106 participants with ages in mid-life ranging from 18 to 60 years. Results: Of the 21 white matter tracts analyzed, the posterior thalamic radiation showed a better fit with the sigmoidal curve model than with linear or power regression. The estimated threshold age in years (95% confidence interval) was 47.2 (44.1-48.4). Conclusion: While available evidence regarding the presence of a specific age threshold for cerebral decline in mid-life based on FA was limited, the posterior thalamic radiation exhibited a threshold age of 47.2. Beyond this age point, we observed a significant change in the FA risk pattern.
ARTICLE | doi:10.20944/preprints202309.0259.v1
Subject: Engineering, Industrial And Manufacturing Engineering Keywords: XGBoost; Computational Fluid Dynamics; Steel Blast Furnace; Machine Learning; Regression
Online: 5 September 2023 (07:58:59 CEST)
Computational Fluid Dynamics (CFD)-based simulation has been the traditional way to model complex industrial systems and processes. One very large and complex industrial system that has benefited from CFD-based simulations is the steel blast furnace. The problem with the CFD-based simulation approach is that it tends to be very slow to generate data. The CFD-only approach may not be fast enough for use in real-time decision-making. To address this issue, in this work, the authors propose the use of machine learning techniques to train and test models based on data generated via CFD simulation. Regression models based on neural networks are compared to tree boosting models. In particular, several areas (tuyere, raceway, and shaft) of the blast furnace are modeled using these approaches. The results of the model training and testing are presented and discussed. The obtained R2 metrics are, in general, very high. The results look promising and may help to improve the efficiency of operator and process engineer decision-making when running a blast furnace.
ARTICLE | doi:10.20944/preprints202308.1881.v1
Subject: Arts And Humanities, Art Keywords: youth; unemployment; urban; determinant factors; regression model; Somali regional state
Online: 29 August 2023 (04:40:47 CEST)
Youth is the essential entrepreneurial force for every country's social, political, and economic development. Urban youth make up about 34.27 percent of the population in Ethiopia. The youth unemployment crisis is one of the serious challenges that Ethiopia is facing nowadays, and it should be mentioned in the nation's public discourse. Therefore, the main objective of this study is to specify the socioeconomic and demographic factors that influence urban youth unemployment in the Somali Regional State, specifically in Jigjiga, Degahbour, Kebridahar, Gode, and Dollo Ado towns. Both primary and secondary data sources were employed. A multi-stage sampling technique was used to select 385 sample respondents from the total population of the study. This study found that almost 63% of youth in the study area were unemployed, and a large number of unemployed youths live in Jigjiga and Dollo Ado towns as compared to other city administrations. The econometric model's findings indicate that factors like gender, education level, work experience, access to credit, access to information, time applied for job vacancy, active job seeker, social network (network with clan elders, government officials, businessmen, and other friends), and residence in Jigjiga and Gode town have a significant association with the likelihood of youth unemployment in the Somali regional state. Therefore, this study recommends that the regional government and local city administration develop effective policies and strategies to address the underlying causes of gender inequality, improve access to and quality of education, support fresh graduates in finding jobs, provide access to credit and job vacancies, ensure fair and free job competition, and manage youth migration for job seeking.
ARTICLE | doi:10.20944/preprints202308.0348.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Logistic regression; Machine learning; Prediction model; ROC curve; Variable selection
Online: 3 August 2023 (14:22:22 CEST)
Machine learning methods have been a standard approach to select features that are associated with an outcome and build a prediction model when the number of candidate features is large. LASSO has been one of the most popular approaches to this end. By imposing an L1-norm penalty to overcome the high dimensionality of the candidate features, LASSO selects features with large regression estimates, rather than features that are statistically significantly associated with the outcome. As a result, LASSO may select insignificant features while possibly missing significant ones. Furthermore, in our experience, LASSO has been found to select too many features. By selecting features that are not associated with the outcome, we may incur more cost to collect and manage them in future use of a fitted prediction model. Using the combination of L1- and L2-norm penalties, elastic net (EN) tends to select more features than LASSO. The overly selected features that are not associated with the outcome act like white noise, so that the fitted prediction model loses prediction accuracy. In this paper, we propose to use standard regression methods (without any penalizing approach) with a stepwise variable selection procedure to overcome these issues. Unlike LASSO and EN, this method selects features based on statistical significance. Through extensive simulations, we show that this maximum likelihood estimation based method selects a very small number of features while maintaining high prediction power, whereas LASSO and EN make a large number of false selections, resulting in a loss of prediction accuracy. In contrast to LASSO and EN, regression combined with stepwise variable selection is a standard statistical method, so that any biostatistician can use it to analyze high dimensional data even without advanced bioinformatics knowledge.
REVIEW | doi:10.20944/preprints202307.0111.v1
Subject: Medicine And Pharmacology, Surgery Keywords: tumor response grade (TRG); gastric cancer; RECIST; tumor regression score
Online: 3 July 2023 (14:25:48 CEST)
Gastric cancer is among the top 5 causes of cancer-related death worldwide. Preoperative chemotherapy has been established as an option in patients with locally advanced gastric cancer. However, chemotherapy yields variable results, owing to the cellular and molecular heterogeneity of this disease. Identifying patients who did or did not respond to preoperative therapy can allow clinicians to alter treatment modalities and provide important information related to prognostication. Pathologic response to preoperative therapies, called Tumor Response Grade (TRG), has been evaluated to quantify treatment response. Multiple systems for TRG have been established. However, the literature has demonstrated inconsistent results for TRG systems and prognosis, possibly due to variability in interpretation of tumor response between systems and interobserver variability. Radiographic response to preoperative therapies using RECIST 1.1 criteria and endoscopically-assessed tumor response have demonstrated association with survival; however, their use in gastric cancer remains challenging given the inability to accurately and consistently identify and measure the tumor, especially in the setting of neoadjuvant therapy where treatment-related changes can obscure the gastric wall layers. This review is focused on summarizing the available literature related to evaluating TRG in gastric cancer, as well as providing a brief overview on the use of radiographic and endoscopic methods to assess response to preoperative therapies. Lastly, we outline future directions regarding the use of a universal TRG system to guide care and assist with prognosis.
ARTICLE | doi:10.20944/preprints202306.2080.v1
Subject: Public Health And Healthcare, Public Health And Health Services Keywords: Africa; ARIMA; Maternal mortality rate; Joinpoint regression analysis; Mortality; trends
Online: 29 June 2023 (09:42:35 CEST)
(1) Background: With the United Nations Sustainable Development Goals (SDG) (2015-2030) focusing on reducing maternal mortality, monitoring and forecasting Maternal Mortality Rates (MMR) in regions like Africa become crucial for health strategy planning by policymakers, international organizations, and NGOs. (2) Methods: We collected maternal mortality rates per 100,000 births from the World Bank database between 1990 and 2015. Joinpoint regression was applied to assess trends, and the autoregressive integrated moving average (ARIMA) model was used on 1990-2015 data to forecast the MMR for the next 15 years. (3) Results: The study found a decline in MMR in Africa with an average annual percentage change (APC) of -2.6% (95% CI -2.7; -2.5). North Africa reported the lowest MMR, while East Africa experienced the sharpest decline. The region-specific ARIMA models predict that the MMR in 2030 will vary across regions, ranging from 65 deaths per 100,000 births in North Africa to 249 deaths per 100,000 births in Central Africa, averaging 197 per 100,000 births for the continent. (4) Conclusions: Despite the observed decreasing trend, the MMR in Africa remains relatively high. The results indicate that MMR in Africa will continue to decrease through 2030. However, only North and South Africa will likely reach the SDG target.
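As a rough illustration of how an annual percentage change relates to a log-linear trend, the sketch below fits a straight line to ln(rate) against year and converts the slope b to APC = (e^b - 1) × 100, the usual conversion for a single trend segment. The series is synthetic (a made-up starting rate of 900 declining by 2.6% per year), not the World Bank data, and the sketch does not search for joinpoints.

```python
import math


def fit_log_linear_slope(years, rates):
    """Ordinary least squares slope of ln(rate) regressed on year."""
    n = len(years)
    logs = [math.log(r) for r in rates]
    mean_x = sum(years) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, logs))
    return sxy / sxx


def annual_percent_change(slope):
    """Convert a log-linear slope into a percentage change per year."""
    return (math.exp(slope) - 1.0) * 100.0


# Synthetic series: hypothetical rate of 900 in 1990, falling 2.6% each year.
years = list(range(1990, 2016))
rates = [900.0 * (1.0 - 0.026) ** (y - 1990) for y in years]

slope = fit_log_linear_slope(years, rates)
apc = annual_percent_change(slope)
print(round(apc, 2))
```

Because the synthetic series is exactly exponential, the fitted slope equals ln(0.974) and the sketch recovers the assumed decline; real data would add noise and possibly several segments with different slopes.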
ARTICLE | doi:10.20944/preprints202305.1462.v2
Subject: Engineering, Safety, Risk, Reliability And Quality Keywords: Pedestrian safety; Traffic characteristics; City of Kigali; Binary logistic regression
Online: 20 June 2023 (03:46:20 CEST)
The safety of a pedestrian crossing may depend on infrastructure and on vehicular and pedestrian traffic characteristics. This research describes the safety challenges caused by vehicles on crosswalks in the City of Kigali. By observing whether drivers stop in pedestrian crossing events, the study aims to evaluate driver behavior against traffic flow parameters. Ten collection sites were selected purposively and randomly for observation and data recording. A total of 10,259 crossing events were recorded over 280 hours. Statistical analysis, tests, and a binary logistic regression model were used to evaluate the behaviors. Overall, 82.4% of drivers violated crosswalks, endangering crossing pedestrians. Motorcyclists exhibited the most aggressive behavior. Car drivers were relatively less aggressive: 60% managed to brake in the events. Buses and bicycles together accounted for a negligible 2% and were aggressive, generally not stopping. Cars were 10.389 times more likely to stop than bicycles. Crossing was safer when more vehicles were in a row: for each unit increase on the vehicle density scale, the chances that a driver would stop were 1.956 times higher. The traffic variables explained 13% to 21% of the variance in the stopping-behavior model.
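To make the reported ratios concrete, the sketch below assumes they are odds ratios from the binary logistic regression, i.e. exponentiated coefficients, and shows how such a ratio shifts a stopping probability. The 5% baseline stopping probability for bicycles is a hypothetical value chosen only for illustration; it is not reported in the abstract.

```python
import math


def odds(probability):
    """Convert a probability into odds."""
    return probability / (1.0 - probability)


def probability_from_odds(odds_value):
    """Convert odds back into a probability."""
    return odds_value / (1.0 + odds_value)


def apply_odds_ratio(baseline_probability, odds_ratio):
    """Probability after multiplying the baseline odds by an odds ratio."""
    return probability_from_odds(odds(baseline_probability) * odds_ratio)


OR_CAR_VS_BICYCLE = 10.389   # reported in the abstract
OR_PER_DENSITY_UNIT = 1.956  # reported in the abstract

# A logistic regression coefficient is the natural log of its odds ratio.
beta_car = math.log(OR_CAR_VS_BICYCLE)

baseline_bicycle = 0.05  # hypothetical baseline, for illustration only
p_car = apply_odds_ratio(baseline_bicycle, OR_CAR_VS_BICYCLE)

# Two extra units of vehicle density multiply the odds by 1.956 squared.
p_two_units = apply_odds_ratio(baseline_bicycle, OR_PER_DENSITY_UNIT ** 2)
print(round(beta_car, 3), round(p_car, 3), round(p_two_units, 3))
```

Note that an odds ratio of about 10 does not mean a tenfold probability: starting from the assumed 5% baseline, the stopping probability rises to roughly 35%.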
ARTICLE | doi:10.20944/preprints202306.1016.v1
Subject: Environmental And Earth Sciences, Water Science And Technology Keywords: Decision Tree; linear regression; Naïve Bayes; Python; Support Vector Machine
Online: 14 June 2023 (08:40:50 CEST)
Water pollution is a common problem for dams situated within an urban or agricultural catchment. This can negatively affect the hydro ecosystem and the drinking, recreational, and other uses of water. In this study, the drinking water quality class of the Roodeplaat Dam, South Africa, which faces pollution problems, was modeled using machine learning algorithms in Python Jupyter Notebook 6.0.0. Eleven monthly water quality parameters recorded at five sampling stations from January 1981 to September 2017 were used for training and testing the model. Five machine learning classifiers, Gaussian Naïve Bayes (GNB), K-nearest neighbors (KNN), Decision Tree (DT), Support Vector Machines (SVM), and Linear Regression (LR), at test sizes of 20%, 25%, 30%, and 40% were used to classify water into five classes (Excellent to Very bad). The analysis found that the dam water falls into only three classes: good, medium, and bad. The prediction accuracies of the machine learning algorithms from the highest to the lowest were 96.39%, 96.17%, 92.25%, 90.20%, and 54.19% for KNN, DT, SVM, GNB, and LR, respectively. Therefore, KNN at a test size of 30% was recommended to classify the water quality of Roodeplaat Dam accurately. Hence, machine learning algorithms can be used to identify the class of water quality before the water is treated and distributed for drinking use.
ARTICLE | doi:10.20944/preprints202306.0933.v1
Subject: Public Health And Healthcare, Public, Environmental And Occupational Health Keywords: Opisthorchis viverrini; geographic weighted regression; sub-basin; Sakon Nakhon, Thailand.
Online: 14 June 2023 (02:11:11 CEST)
Infection of liver flukes (Opisthorchis viverrini) is partly due to their suitability for habitats in sub-basin areas, which causes the intermediate host to remain in the watershed system in all seasons. Spatial monitoring of fluke infection at the small-basin analysis scale is important because this can enable analysis at the level of the spatial factors involved and influencing infections. A geographic weighted regression model was developed to analyze the spatial characteristics of liver fluke infection, aiming to 1. analyze the spatial factors associated with human liver fluke infection according to sub-basin boundaries and 2. generate an alternative model for enhancing the effectiveness of preventive public health management to reduce the risk of liver fluke infection in humans. The number of infected persons was obtained from local authorities and converted into a percentage of infected people and generated as raster data with a heat map so that the data were continuous and defined as dependent variables. The independent set consisted of nine variables, both vector and raster data, that correlated the location with the village location of an infected person. The results showed that the variables X5stream, X7ndmi, and X9savi were statistically significantly correlated to the percentage of infected people, with the t-stat and p-value being (-2.068, 1.875, and -2.661) and (0.048, 0.034, and 0.021), respectively. The GWR model was able to increase accuracy more than the comparable models such as OLS, in all tests of the four alternative models, with an accuracy increase in R2 of 7.69% (0.576 to 0.624). This study confirms that the development of spatial models with GWR models can screen for factors associated with liver fluke infection at the level of small spatial units such as sub-basins.
ARTICLE | doi:10.20944/preprints202304.0268.v1
Subject: Public Health And Healthcare, Other Keywords: wellness; health; user engagement; social media; instagram; negative binomial regression
Online: 12 April 2023 (09:46:13 CEST)
Wellness is a multidimensional concept that touches upon the various physical, mental, emotional, spiritual, social and environmental facets of health. Interest in wellness and its importance have been growing steadily for the past two decades, which makes it crucial for multiple stakeholders to understand which factors affect public engagement with wellness information. The Instagram account that the New York Times (NYT) uses specifically for sharing wellness content, with the handle nyt_well, was selected as the object of study. A total of 773 posts from this account between March 2019 and December 2022 were collected and analyzed to answer the research question of which factors are most influential to public engagement with wellness content. Two negative binomial regressions were run on features including the type of post, length, word count, sentiment score and topic, with the number of likes and the number of comments as the respective dependent variables. Results indicated that the type of post and its sentiment score were the two most influential determinants of public engagement, with p-values smaller than 0.05. While the effects of some of these factors aligned with findings from previous studies conducted on social media content not related to wellness (e.g., marketing), some others affected the two separate public engagement metrics in opposite directions, warranting future studies to investigate the cause of this phenomenon.
ARTICLE | doi:10.20944/preprints202303.0048.v1
Subject: Biology And Life Sciences, Immunology And Microbiology Keywords: Minimum Inhibitory Concentrations; Deep Learning; Regression; Antimicrobial peptides; Drug Discovery
Online: 3 March 2023 (01:29:51 CET)
Antimicrobial peptides (AMPs) are a promising alternative to antibiotics to combat drug resistance in pathogenic bacteria. However, the development of AMPs with high potency and specificity remains a challenge, and new tools to evaluate antimicrobial activity are needed to accelerate the discovery process. As a step toward direct prediction of the experimental minimum inhibitory concentration (MIC) of AMPs, we proposed MBC-Attention, a combination of a multi-branch CNN architecture and attention mechanism. Using a curated dataset of 3929 AMPs against Escherichia coli, the optimal MBC-Attention model achieved an average Pearson correlation coefficient (PCC) of 0.775 and an RMSE of 0.533 (log μM) in three independent tests of 393 sequences each. This corresponds to a 5–12% improvement in PCC and a 7–13% improvement in RMSE compared with RF and SVM models. Ablation studies confirmed that both attention mechanisms contributed to performance improvement.
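The two evaluation metrics named above, the Pearson correlation coefficient and the RMSE, can be computed as in the following minimal sketch. The observed and predicted values are invented log-scale numbers for illustration only and are unrelated to the study's dataset.

```python
import math


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def rmse(observed, predicted):
    """Root mean squared error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


# Hypothetical log-scale MIC values (illustrative only).
observed = [0.5, 1.0, 1.5, 2.0, 2.5]
predicted = [0.7, 0.9, 1.6, 1.8, 2.6]

print(pearson_correlation(observed, predicted), rmse(observed, predicted))
```

The two metrics answer different questions: the correlation measures how well predictions track the ordering and spread of the observations, while the RMSE measures the typical size of the error in the units of the target.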
ARTICLE | doi:10.20944/preprints202301.0216.v1
Subject: Business, Economics And Management, Accounting And Taxation Keywords: Quality of Governance; Tax Revenue; Multiple Regression Analysis; South Africa
Online: 12 January 2023 (08:40:18 CET)
The purpose of the study is to empirically analyze the effect of quality of governance on tax revenue in South Africa. This is done by analyzing a time series dataset covering 1996 to 2020. The study used voice and accountability, regulatory quality, government effectiveness, control of corruption, political stability and rule of law as proxies of quality of governance. Multiple regression analysis was performed to test hypotheses. Based on the regression results, all quality of governance variables in South Africa have a negative effect on tax revenue except corruption control. The findings of this study also include policy recommendations. The government of South Africa must design and implement effective ways to combat poor governance, which results in a tax revenue shortfall.
ARTICLE | doi:10.20944/preprints202211.0360.v1
Subject: Environmental And Earth Sciences, Environmental Science Keywords: Groins; Dredging; Storm events; Natural factors; Anthropogenic factors; Regression model.
Online: 21 November 2022 (01:19:51 CET)
Vagueira Beach, on the Central Portuguese coast, is known as one of the places in Europe most affected by coastal erosion. The area suffered more than 156 meters of coastline retreat over the period 1958 to 2001. To evaluate the influence of local factors on coastal erosion, this paper assesses the anthropogenic and natural factors that are related to the retreat of the coastline by adopting statistical correlation and regression analyses. Using Pearson's correlation coefficient (r), it was observed that local factors such as annual dredging at the Aveiro Port entrance (r = 0.93), the total length of groins in the Espinho-Vagueira section (r = 0.89), and storm events (r = 0.52) are directly related to coastline retreat in the area. A multiple linear regression model was developed in which coastline retreat is explained by these same factors over the period 1980 to 2006. With a coefficient of determination of R² = 0.91, it was observed that length of groins (significant at the 1% level), dredging of the port entrance (significant at the 5% level) and precipitation (as a proxy for storm events; significant at the 10% level) are significantly correlated with coastline retreat. Hence, it is shown that anthropogenic factors are the main drivers of coastline retreat in Vagueira Beach. This study provides an innovative approach for the assessment of coastal erosion, resulting in important information that can be used for decision-making related to coastal zone management, as it allows a more detailed understanding of the main drivers of coastal erosion.
ARTICLE | doi:10.20944/preprints202207.0383.v1
Subject: Engineering, Marine Engineering Keywords: machine learning; forecast; regression models; Liquified Natural Gas; maritime transportation
Online: 26 July 2022 (03:50:12 CEST)
Recent maritime legislation demands the transformation of the sector toward greener and more energy-efficient transportation. Liquified Natural Gas (LNG) seems a promising alternative fuel solution that could replace conventional fuel sources. Various studies have focused on the prediction of LNG price; however, no previous work has addressed the forecast of the spot charter rate of LNG carrier ships, which is important knowledge for maritime industries and companies when it comes to decision-making. Therefore, this study focuses on the development of a machine learning pipeline to address this problem by: (i) forming a dataset with variables relevant to LNG; (ii) identifying the variables that affect the freight price of LNG carriers; (iii) developing and evaluating regression models for short- and mid-term forecasts. The results showed that the General Regression Neural Network presented a stable overall performance for 2-, 4- and 6-month forecasts.
ARTICLE | doi:10.20944/preprints202205.0391.v1
Subject: Business, Economics And Management, Business And Management Keywords: circularity of materials; circular activity; recycling; regression model; key elements
Online: 30 May 2022 (09:59:03 CEST)
The authors have reviewed the circularity of materials, which is important for stimulating circular activity processes. The theoretical part starts by describing the characteristics of circular activity and comparing circular and linear systems in terms of recycling. The authors then examined key elements important for circularity and the results of an examination of various sectors. The authors formed a correlation matrix and used a dynamic regression model to identify the circular material use rate. The authors suggested a three-level methodology; applying it yielded a dynamic regression model that could be used to forecast the circular material use rate in European Union countries. The results show that private investments in recycling, the recycling of electronic waste, and the recycling of other municipal waste categories are important in seeking to increase the usage rate of circular materials.