COMMENT | doi:10.20944/preprints201608.0166.v1
Subject: Social Sciences, Geography, Planning And Development Keywords: Regional inequality; Multilevel regression; Markov chain; Guizhou Province
Online: 17 August 2016 (12:58:58 CEST)
This study analyses regional development in one of the poorest provinces in China, Guizhou Province, between 2000 and 2012 using a multiscale and multi-mechanism framework. In general, regional inequality has been declining since 2000. In addition, economic development in Guizhou Province showed spatial agglomeration and club convergence, with a development pattern of one core area, two wing areas and a contiguous area at the edge of the province emerging between 2006 and 2012. Multilevel regression analysis revealed that industrialization and investment level were the primary driving forces of regional economic disparity in Guizhou Province. The influences of marketization and decentralization on regional economic disparity were relatively weak. Investment level reinforced regional economic disparity and the development of the core-periphery structure in the province. However, investment level actually weakened regional economic disparity in Guizhou Province when the variable of time was considered. In addition, topography and urban–rural differentiation were the two main reasons for the formation of a core-periphery structure in Guizhou Province.
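As a rough illustration of the Markov chain approach named in the keywords, the sketch below estimates a first-order transition matrix from sequences of development classes. The class coding, the county data and the function name `transition_matrix` are hypothetical and are not taken from the study; large diagonal entries would indicate that regions tend to stay in their class, which is one common way to read club convergence.

```python
from collections import Counter


def transition_matrix(sequences, n_classes):
    """Estimate first-order Markov transition probabilities from class sequences."""
    counts = Counter()
    for seq in sequences:
        # Count every move from one period's class to the next period's class.
        for current, following in zip(seq, seq[1:]):
            counts[(current, following)] += 1
    matrix = []
    for i in range(n_classes):
        row_total = sum(counts[(i, j)] for j in range(n_classes))
        matrix.append([
            counts[(i, j)] / row_total if row_total else 0.0
            for j in range(n_classes)
        ])
    return matrix


# Hypothetical data: class of four counties over four periods
# (0 = low, 1 = middle, 2 = high level of development).
county_classes = [
    [0, 0, 0, 1],
    [0, 0, 0, 0],
    [1, 1, 2, 2],
    [2, 2, 2, 2],
]
P = transition_matrix(county_classes, 3)
for row in P:
    print([round(p, 3) for p in row])
```

Each row of `P` sums to one whenever that class is observed as a starting state.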
ARTICLE | doi:10.20944/preprints202011.0297.v1
Subject: Computer Science And Mathematics, Mathematics Keywords: regression; time point data; modelling
Online: 10 November 2020 (10:00:37 CET)
In this paper, we present a regression-based modelling approach to analyse multiple-series MTC data. A typical application of this modelling approach includes three steps: first, define a model that approximates the relationship between gene expression and experimental factors, with parameters incorporated to address the research interest; second, use least-squares and estimating-equation techniques to estimate the parameters and their corresponding standard errors; third, compute test statistics, P-values and NFD as measures of statistical significance. The advantages of this approach are as follows. First, it addresses the research interest in a specific, precise way, and makes maximal use of all the data and other relevant information. Second, it accounts for both systematic and random variation associated with the data, and the results of such analysis provide not only gene-specific information relevant to the research objective, but also its reliability, thereby helping investigators to make better decisions for follow-up studies. Third, this approach is very flexible, and can easily be extended to other types of MTC studies or other microarray experiments by formulating different models based on the experimental design of the studies.
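The second and third steps described above can be sketched for the simplest possible case, a straight-line model fitted to one gene by ordinary least squares. This is only an assumed illustration: the time points, expression values and the function name `ols_fit` are hypothetical, the paper's actual models are not given in the abstract, and the P-value and NFD calculations are left out.

```python
import math


def ols_fit(x, y):
    """Fit y = intercept + slope * x by least squares; return estimates and slope SE."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual variance uses n - 2 degrees of freedom (two fitted parameters).
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se_slope = math.sqrt(rss / (n - 2) / sxx)
    return {
        "slope": slope,
        "intercept": intercept,
        "se_slope": se_slope,
        "t_stat": slope / se_slope,
    }


# Hypothetical expression values of one gene at five time points.
time_points = [0, 1, 2, 3, 4]
expression = [1.0, 2.1, 2.9, 4.2, 4.8]
fit = ols_fit(time_points, expression)
print(fit)
```

The test statistic `t_stat` would then be compared with a t distribution on n - 2 degrees of freedom to obtain a P-value.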
ARTICLE | doi:10.20944/preprints201712.0032.v1
Subject: Engineering, Energy And Fuel Technology Keywords: statistics; uncertainty; regression; sampling; outlier; probabilistic
Online: 6 December 2017 (06:36:02 CET)
Energy Measurement and Verification (M&V) aims to make inferences about the savings achieved in energy projects, given the data and other information at hand. Traditionally, a frequentist approach has been used to quantify these savings and their associated uncertainties. We demonstrate that the Bayesian paradigm is an intuitive, coherent, and powerful alternative framework within which M&V can be done. Its advantages and limitations are discussed, and two examples from the industry-standard International Performance Measurement and Verification Protocol (IPMVP) are solved using the framework. Bayesian analysis is shown to describe the problem more thoroughly and yield richer information and uncertainty quantification than the standard methods while not sacrificing model simplicity. We also show that Bayesian methods can be more robust to outliers. Bayesian alternatives to standard M&V methods are listed, and examples from literature are cited.
ARTICLE | doi:10.20944/preprints201608.0202.v2
Subject: Environmental And Earth Sciences, Environmental Science Keywords: HR satellite remote sensing; urban fabric vulnerability; UHI & heat waves; landsat & MODIS sensors; LST & urban heating; segmentation & objects classification; data mining; feature extraction & selection; stepwise regression & model calibration
Online: 26 October 2021 (13:11:23 CEST)
Densely urbanized areas with a low percentage of green vegetation are highly exposed to Heat Waves (HW), which are now increasing in frequency and intensity in the middle-latitude regions as well, due to ongoing Climate Change (CC). Their negative effects may combine with those of the Urban Heat Island (UHI), a local phenomenon in which air temperatures in the compact built-up cores of towns rise more than those in the surrounding rural areas, with significant impact on the quality of the urban environment, on citizens' health, and on energy consumption and transport, as occurred in the summer of 2003 in France and in central-northern Italy. In this context, this work aims to design and develop a methodology based on aerospace remote sensing (EO) at medium-high resolution and the most recent GIS techniques for the extensive characterization of the urban fabric's response to these temperature-related climatic impacts, within the general framework of supporting local and national strategies and policies of adaptation to CC. Due to its extent and variety of built-up typologies, the municipality of Rome was selected as the test area for the development and validation of the methodology. We started with photointerpretation of detailed-scale cartography (CTR 1:5000) on a reference area consisting of a transect of about 5x20 km, extending from the downtown to the suburbs and including all the built-up classes of interest. The reference built-up vulnerability classes found inside the transect were then used as training areas to classify the entire territory of the municipality of Rome. To this end, the satellite EO HR (High Resolution) multispectral data provided by the Landsat sensors were used within a purpose-built "supervised" classification procedure based on data mining and "object-classification" techniques.
The classification results were then exploited for implementing a calibration method, based on a typical UHI temperature distribution, derived from MODIS satellite sensor LST (Land Surface Temperature) data of the summer 2003, to obtain an analytical expression of the vulnerability model, previously introduced on a semi-empirical basis.
REVIEW | doi:10.20944/preprints202311.0156.v1
Subject: Biology And Life Sciences, Aquatic Science Keywords: tilapia; probiotics; linear regression analysis; hierarchical regression analysis; Pearson correlation
Online: 2 November 2023 (10:29:36 CET)
Data regarding the pandemic's impact on tilapia culture remain limited, but it is known that there was a significant decline in production and marketing since 2020. The post-pandemic challenges confronting tilapia farming necessitate prompt solutions, encompassing the management of bacterial infections and the adoption of more advanced technologies by small-scale producers in developing nations. Probiotics, acknowledged as a viable alternative, are presently extensively employed in tilapia aquaculture. Multiple studies have suggested that the application of diverse probiotics in tilapia culture has yielded favorable outcomes. Nonetheless, only a limited number of studies have employed statistical methods to evaluate such findings. To address this gap, a regression analysis was carried out to investigate the existence of a linear relationship between the probiotic dosage added to the feed and two key dependent variables: the specific growth rate (SGR) and the feed conversion ratio (FCR). Additionally, a hierarchical regression analysis was undertaken to ascertain the extent to which the variance observed in these responses could be explained by the variable "probiotic dosage in feed," after accounting for covariates such as initial weight, test duration, water temperature, and number of replicate tanks. Finally, two Pearson correlation matrices were constructed since different studies were included for the SGR and FCR analyses.
ARTICLE | doi:10.20944/preprints202310.0202.v1
Subject: Biology And Life Sciences, Agricultural Science And Agronomy Keywords: amaranth; environmental index; linear regression; stability
Online: 4 October 2023 (05:04:02 CEST)
Amaranth has the potential to support Malawi's food and nutrition security, income generation and livelihoods, and climate change resilience efforts. Due to the high genetic variability of Amaranth, there is a need to develop stable and high-yielding genotypes for sustainable production. To determine the degree of genetic stability in different environments, five Amaranth accessions were subjected to stability analysis. The experiment was carried out at three sites (Bunda, Bembeke, and Chipoka) for two seasons in 2020-2021 in the central region of Malawi. It was laid out in a Randomized Complete Block Design (RCBD) with four replicates. The Eberhart and Russell linear regression model was used for stability analysis and Pearson correlation was used to test the relationship between variables. Environmental variance + (genotype x environment) was significant for four of the parameters studied, namely grain yield, plant height, leaf length, and leaf width, indicating the presence of a remarkable interaction between genotypes and environment. The results of a pooled analysis of variance showed significant differences at a 5% significance level among the Amaranth accessions, indicating inherent genetic variability. Using the linear regression model of Eberhart and Russell, accessions PE-LO-BH-01 and LL-BH-04 were identified as the highest yielding stable genotypes for leaf and grain yield, respectively. In addition, the Bembeke site was the most favourable environment for all the accessions. Thus, to enhance the production of amaranth in Malawi, LL-BH-04 and PE-LO-BH-01 were put forward for release as varieties for grain and leaf respectively. These results will also guide and support future breeding programs.
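A minimal sketch of the regression step in an Eberhart and Russell style stability analysis is given below. Each genotype's yield is regressed on an environmental index (environment mean minus grand mean); a slope near 1 with small deviations from the regression is usually read as stable. The genotype labels, yields and the function name `eberhart_russell` are hypothetical, and the pooled-error correction of the deviation term is omitted because it needs replicate-level data.

```python
def eberhart_russell(yields):
    """yields: dict mapping genotype -> list of mean yields, one per environment."""
    genotypes = list(yields)
    n_env = len(yields[genotypes[0]])
    # Environmental index: environment mean minus the grand mean.
    env_means = [
        sum(yields[g][j] for g in genotypes) / len(genotypes) for j in range(n_env)
    ]
    grand_mean = sum(env_means) / n_env
    index = [m - grand_mean for m in env_means]
    sum_index_sq = sum(i * i for i in index)

    results = {}
    for g in genotypes:
        y = yields[g]
        mean_y = sum(y) / n_env
        # Regression coefficient of genotype yield on the environmental index.
        b = sum(yij * ij for yij, ij in zip(y, index)) / sum_index_sq
        residuals = [yij - mean_y - b * ij for yij, ij in zip(y, index)]
        # Mean square of deviations from regression (no pooled-error correction).
        dev_ms = sum(r * r for r in residuals) / (n_env - 2)
        results[g] = {"mean": mean_y, "b": b, "dev_ms": dev_ms}
    return results


# Hypothetical mean yields (t/ha) of three genotypes in three environments.
trial = {
    "G1": [2.0, 3.0, 4.0],
    "G2": [2.5, 3.0, 3.5],
    "G3": [1.5, 3.0, 4.5],
}
stability = eberhart_russell(trial)
for name, stats in stability.items():
    print(name, stats)
```

In this toy data set G2 responds less than average to better environments (b below 1) and G3 responds more (b above 1).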
ARTICLE | doi:10.20944/preprints202310.0432.v1
Subject: Computer Science And Mathematics, Applied Mathematics Keywords: European Union; public revenues; public expenditures; regression analysis
Online: 8 October 2023 (10:08:59 CEST)
Modern countries generally deal with significant budget deficits and public debt. These countries need to rationalize their expenditures and increase revenue without major interference to economic flows. The aim of this paper is to create a model for forecasting public revenue and expenditure based on data from previous years. In the paper we formulated two hypotheses related to the validity of the set models. After detailed analysis, both hypotheses were accepted. The analysis includes all EU Member States and public revenue and expenditure data for the last decade. The significance of the analysis is reflected on the practical foundation of the pre-set theoretical views, which will have their basis in statistically significant results. By analyzing the model, we formulated the regression formulas of revenues and expenditures, which can be efficiently used in predicting these variables.
ARTICLE | doi:10.20944/preprints202008.0058.v1
Subject: Environmental And Earth Sciences, Geography Keywords: Rwandz; residential function; GIS; correlation; regression
Online: 3 August 2020 (00:37:42 CEST)
A house is a haven that shelters people from natural and human conditions and gives them trust, safety, and stability. Housing is one of the most basic human needs; providing it has become a major function of cities and one of the aspects that most attracts the interest of urban researchers, who take into consideration a wide range of architectural, social, and economic indicators. The study aims to provide an overall picture of the residential function of Rwandz, using a collection of parameters and some GIS and statistical techniques, to help establish plans and future projects to improve the growth of this city and other towns and cities in the area. The study found that the old parts of Rwandz city, located in the core, differ from the relatively newer outer parts in many properties: in general, the core is more densely populated and has larger families, more illiteracy and unemployment, lower incomes, and older and smaller houses than the outer parts. In addition, the study tested the correlation coefficients between the criteria and found some strong statistical relationships between them, which reflect some real-life properties of the residential function. Lastly, the study designed a regression model to predict the main residential function criteria.
ARTICLE | doi:10.20944/preprints202106.0497.v1
Subject: Environmental And Earth Sciences, Atmospheric Science And Meteorology Keywords: Ecosystem services; Benefit transfer; Meta-analysis; Meta-regression function.
Online: 21 June 2021 (10:04:14 CEST)
Meta-analysis has increasingly been used to synthesize the ecosystem services literature, with some testing of the use of such analyses to transfer benefits. These are typically based on local primary studies. However, meta-analyses associated with ecosystem services are a potentially powerful tool for transferring benefits, especially for environmental assets for which no primary studies are available. In this study we use the Ecosystem Service Valuation Database (ESVD), which brings together 1350 value estimates from more than 320 studies around the world, to estimate meta-regression functions for provisioning, regulating & maintenance and cultural ecosystem services across 12 biomes. We tested the reliability of these meta-regression functions and found that even using variables with high explanatory power, transfer errors could still be large. We show that meta-analytic transfer performs better than simple value transfer and, in addition, that local meta-analytical transfer (i.e. based on local explanatory variable values) provides more reliable estimates than global meta-analytical transfer (i.e. based on mean global explanatory variable values). Thus, we conclude that when taking into account the characteristics of the study area under analysis, including explanatory variables such as income, population density and protection status, we can determine the value of ecosystem services with greater accuracy.
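The reliability check described above is usually expressed as a percentage transfer error, the absolute gap between the transferred and the observed value relative to the observed value. The sketch below assumes that common definition, which the abstract itself does not spell out; the values and the function name `transfer_error` are hypothetical.

```python
def transfer_error(transferred, observed):
    """Percentage transfer error: |transferred - observed| / observed * 100."""
    return abs(transferred - observed) / observed * 100


# Hypothetical values (e.g. currency units per hectare per year) for one study site.
observed_value = 100.0
simple_value_transfer = 160.0   # assumed: mean value of other sites
meta_regression_local = 120.0   # assumed: prediction using local explanatory variables

print(transfer_error(simple_value_transfer, observed_value))
print(transfer_error(meta_regression_local, observed_value))
```

A lower percentage for the meta-regression prediction would correspond to the pattern the authors report, but the numbers here are invented for illustration only.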
ARTICLE | doi:10.20944/preprints202305.0096.v1
Subject: Computer Science And Mathematics, Mathematics Keywords: Topological indices; Fibrates; Curvilinear regression; QSPR analysis
Online: 3 May 2023 (04:48:22 CEST)
The paper describes the use of topological indices in conjunction with drugs for high cholesterol, specifically Fibrates, to predict their physicochemical properties and biological activities. Fibrates are known to lower high triglycerides, increase HDL cholesterol, and reduce the small dense fraction of LDL cholesterol. The study uses a quantitative structure-property relationship (QSPR) approach, which involves analyzing the relationships between physicochemical properties and topological indices using curvilinear regression. The QSPR model predicts the physicochemical properties of the drugs based on degrees and distances determined from topological indices. The study also conducted DFT calculations at the B3LYP/6-31G(d,p) level on the four investigated derivatives to gain insights into their optimized geometries, DOS plots, HOMO and LUMO orbital energies, and distribution. The theoretical results presented in the study suggest that the use of topological indices in QSPR models could provide a powerful tool for predicting the physicochemical properties and biological activities of molecules, including drugs. These findings could lead to the development of new cholesterol-lowering drugs with desirable properties.
SHORT NOTE | doi:10.20944/preprints202011.0284.v1
Subject: Medicine And Pharmacology, Immunology And Allergy Keywords: Covid19; Best fit regression; Hyperbolic fit; Recovery rate; Reproducibility of research
Online: 9 November 2020 (16:15:21 CET)
In this report, the positive cases of Covid19 in India from 7th September 2020 to 25th October 2020 are analysed for statistical relevance. The scattered data are used to find a model equation correlating two variables, the number of recovered Covid patients at regular intervals of seven days. The best-fit regression analysis shows a significant correlation in terms of the Pearson coefficient (r) and standard error (s), with a probable lower mortality rate. Finally, the limitations of this analysis are discussed.
ARTICLE | doi:10.20944/preprints202311.1782.v1
Subject: Business, Economics And Management, Economics Keywords: DEA; wood processing enterprises; small enterprises; fractional regression
Online: 28 November 2023 (07:49:48 CET)
Micro and small wood-processing enterprises represent the heart of the European forest-based industries, being among the key drivers of economic growth in rural, mountainous, and poor regions. Their economic efficiency is of fundamental importance for their existence and the provision of income for the local population in rural areas. Data Envelopment Analysis (DEA) is a nonparametric, linear-programming-based approach, commonly used to analyse the efficiency of organizational units. This method allows estimating the economic efficiency of a certain economic system without assumptions about the functional form between resources and products. Furthermore, DEA determines the efficiency frontier and gives results of whether an enterprise, i.e., a Decision Making Unit (DMU), is efficient or not. The main objective of this study was to investigate and evaluate the economic efficiency of micro and small wood-processing enterprises in the EU countries and reveal the hidden inputs that facilitate efficiency generation. The economic efficiency evaluation was carried out on the basis of the official statistical data for the micro and small wood-processing companies in the EU member states for the period 2015-2020 by performing a two-stage DEA analysis. The data used were standardized by value per employee. In addition to the first stage of DEA, fractional regression probit and logit models with four contextual variables were used to reveal the influence of the hidden inputs in the model. The results showed that the micro and small wood-processing enterprises can be regarded as more scale-efficient than technically-efficient entities. The only contextual variable affecting the economic efficiency was Investments per Person Employed, improving the efficiency by 2% per 1% increase of the investments.
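To make the idea of an efficiency frontier concrete, the sketch below covers only the simplest DEA case: one input, one output and constant returns to scale, where the linear program reduces to each unit's output/input ratio divided by the best ratio in the sample. The units, the figures and the function name `crs_efficiency` are hypothetical; the study itself uses several variables, separates scale from technical efficiency and adds a second-stage fractional regression, none of which is reproduced here.

```python
def crs_efficiency(units):
    """units: dict mapping DMU name -> (input, output). Returns efficiency scores in (0, 1]."""
    ratios = {name: out / inp for name, (inp, out) in units.items()}
    best = max(ratios.values())
    # With one input and one output, the frontier is the ray through the best ratio.
    return {name: ratio / best for name, ratio in ratios.items()}


# Hypothetical per-employee figures: (labour cost, value added) for three DMUs.
dmus = {
    "A": (2.0, 4.0),
    "B": (4.0, 6.0),
    "C": (5.0, 5.0),
}
scores = crs_efficiency(dmus)
print(scores)
```

A score of 1.0 marks a unit on the frontier; lower scores give the proportion of the frontier productivity that a unit achieves.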
ARTICLE | doi:10.20944/preprints202103.0530.v1
Subject: Computer Science And Mathematics, Algebra And Number Theory Keywords: Multiblock data analysis; redundancy analysis; PLS regression; supervised methods; multicollinearity
Online: 22 March 2021 (12:25:20 CET)
Within the framework of multiblock data analysis, a unified approach of supervised methods is discussed. It encompasses multiblock redundancy analysis (MB-RA) and multiblock partial least squares (MB-PLS) regression. Moreover, we develop new supervised strategies of multiblock data analysis, which can be seen as variants of one or the other of these two methods. They are respectively referred to as multiblock weighted redundancy analysis (MB-WRA) and multiblock weighted covariate analysis (MB-WCov). The four methods are based on the determination of latent variables associated with the various blocks of variables. They are derived from clear optimization criteria whose aim is to maximize either the sum of the covariances or the sum of squared covariances between the latent variable associated with the response block of variables and the block latent variables associated with the various explanatory blocks of variables. We also propose indices to help better interpret the outcomes of the analyses. The methods are illustrated and compared based on simulated and real datasets.
ARTICLE | doi:10.20944/preprints201608.0026.v1
Subject: Engineering, Civil Engineering Keywords: concrete; sustainability; regression analysis mix design; CO2 emission; cost
Online: 3 August 2016 (06:05:26 CEST)
As argued by ‘Declaration of Concrete Environment (2010)’ of Korea and ‘Declaration of Asian Concrete Environment (2011)’ of six Asian countries, concrete as a single material has lately shown extremely large impact on environmental issues such as climate change. Assessment of environmental impact from concrete material and production has considerable importance. Concrete is a major material used in the construction industry that emits a large amount of substances with environmental impacts during its life cycle. Accordingly, technologies for the reduction in and assessment of the environmental impact of concrete from the perspective of Life Cycle Assessment must be developed. At present, the studies in relation to greenhouse gas emission from concrete are being carried out globally as a countermeasure against climate change. In this study, a sustainable concrete mix design algorithm was designed using correlation analyses, and its carbon emission and cost reduction performances were assessed. Using correlation analyses, the concrete strength, w/b and s/a ratios, and CO2 emissions were identified as major variables of concrete mix design that influenced other variables. Also, this study aims to evaluate the CO2 emission reduction performance of the algorithm-deduced sustainable concrete mix design, and therefore, the CO2 emissions of the sustainable concrete mix design are compared with those of the actual concrete mix design applied to the construction of the office building A in South Korea.
ARTICLE | doi:10.20944/preprints202310.0938.v1
Subject: Engineering, Mechanical Engineering Keywords: onion; peeling; compressed air; skin; waste; non-linear regression
Online: 16 October 2023 (09:11:18 CEST)
The paper presents the relationship between the efficiency of the onion skin peeling process and its effect in the form of waste. The research was carried out on a pilot test stand for onion peeling. The process variables were the compressed air pressure (p) and the opening time of the flow control valve (t). The experiment took into account the influence of the onion diameter (d0) and its hardness (H). The obtained results were subjected to statistical analysis. Standard deviations of the percentage loss of onion mass, in the form of the skin removed during peeling, were determined in relation to the obtained average values. Tukey's multiple comparison test was performed in order to identify the importance of individual process variables for the final effect of onion peeling. This was the basis for the development of a predictive model in the form of a nonlinear regression Mp=f(p,t,d0,H), which is a mathematical description of the onion skin peeling process. Finally, the response surface of the relationship between the analyzed variables was determined. The results showed that the peeling efficiency and the mass of skin waste depend on the compressed air pressure. Extending the onion blowing time does not improve the process efficiency, while the hardness and size of the onion are irrelevant to the process.
ARTICLE | doi:10.20944/preprints202201.0408.v1
Subject: Medicine And Pharmacology, Dietetics And Nutrition Keywords: Indonesia; islands cluster; multiple logistic regression; obesity; risk factor
Online: 27 January 2022 (06:53:58 CET)
Obesity has become a rising global health problem affecting adults’ quality of life. The objective of this study was to describe the prevalence of obesity in Indonesian adults based on the cluster of islands. The study also aimed to identify the risk factors of obesity in each island cluster. This study analysed secondary data of Indonesian Basic Health Research 2018. Our data for analysis comprised 688,638 adults (>=15 years) randomly selected using proportionate to population size throughout Indonesia. We included 20 variables for sociodemographic and obesity-related risk factors for analysis. Obese status was defined using Body Mass Index (BMI) >= 27.5 kg/m2. Our current study defined seven major island clusters as the unit of analysis, consisting of 34 provinces in Indonesia. Descriptive analysis was conducted to determine the characteristics of the population and to calculate the prevalence of obesity within provinces in each of the island clusters. Multivariate logistic regression analyses to calculate odds ratios (ORs) were performed using R version 3.6.3. The study results showed that all island clusters had at least one province with an obesity prevalence of more than 20%. Six out of twenty variables, comprising four diet factors (consumption of sweet food, high-salt food, meat food, and carbonated drinks) and two other factors (mental health disorders and smoking behaviour), varied across the island clusters. In conclusion, there was a variation of obesity prevalence of the provinces within and between island clusters. The variation of risk factors in each island cluster suggests that the government should rethink and reframe the intervention to address obesity.
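For readers unfamiliar with odds ratios, the sketch below computes an unadjusted OR with a Wald 95% confidence interval from a single 2x2 table. It is a simplified stand-in, not the study's method: the authors fitted multivariable logistic regression models in R, whereas this example ignores all covariates, and the counts and the function name `odds_ratio` are hypothetical.

```python
import math


def odds_ratio(a, b, c, d, z=1.96):
    """Unadjusted odds ratio and Wald confidence interval from a 2x2 table.

    a: exposed and obese        b: exposed and not obese
    c: unexposed and obese      d: unexposed and not obese
    """
    estimate = (a * d) / (b * c)
    # Standard error of log(OR) from the four cell counts.
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - z * se_log)
    upper = math.exp(math.log(estimate) + z * se_log)
    return estimate, lower, upper


# Hypothetical counts: frequent vs. infrequent consumption of carbonated drinks.
or_est, ci_low, ci_high = odds_ratio(30, 70, 15, 85)
print(round(or_est, 2), round(ci_low, 2), round(ci_high, 2))
```

An interval that excludes 1 would indicate an association at the 5% level in this unadjusted setting.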
ARTICLE | doi:10.20944/preprints202309.1617.v1
Subject: Social Sciences, Psychology Keywords: nurses; México; logistic regression; predictors; mental health; Spanish burnout inventory
Online: 25 September 2023 (09:48:40 CEST)
The aim of this study was to use latent profile analysis to identify specific profiles of burnout syndrome in combination with work engagement and to identify whether job satisfaction, psychological well-being, and other sociodemographic and work variables affect the probability of presenting a profile of burnout syndrome and low work enthusiasm. A total of 355 healthcare professionals completed the Spanish Burnout Inventory, the Utrecht Work Engagement Scale, the Job Satisfaction Scale, and the Psychological Well-Being Scale for Adults. Latent profile analysis identified 4 profiles: 1) burnout with high indolence (BwHIn); 2) burnout with low indolence (BwLIn); 3) high engagement, low burnout (HeLb); and 4) in the process of burning out (IPB). Multivariate logistic regression showed that a second job in a government health care institution; a shift other than the morning shift; being divorced, separated or widowed; and work load are predictors of burnout profiles with respect to the HeLb profile. These data are useful for designing intervention strategies according to the needs and characteristics of each type of burnout profile.
ARTICLE | doi:10.20944/preprints202304.0268.v1
Subject: Public Health And Healthcare, Other Keywords: wellness; health; user engagement; social media; instagram; negative binomial regression
Online: 12 April 2023 (09:46:13 CEST)
Wellness is a multidimensional concept that touches upon the various physical, mental, emotional, spiritual, social and environmental facets of health. Interest towards and importance of wellness have been growing constantly for the past two decades and thus makes it crucial to understand which factors affect public engagement with wellness information for multiple stakeholders. The Instagram account of New York Times (NYT) specifically for sharing wellness content with the handle nyt_well was selected as the object of study. 773 posts from this account between March of 2019 and December of 2022 were collected and analyzed to answer the research question of which factors are most influential to public engagement with wellness content. Two negative binomial regressions were run on features including the type of post, length, word count, sentiment score and topic with number of likes and comments as the dependent variables for each of those regression models. Results indicated that the type of post and its sentiment score were the two most influential determinants of public engagement with p-values smaller than 0.05. While the effects of some of these factors aligned with findings from previous studies conducted on social media content not related to wellness (e.g., marketing), some others affected the two separate public engagement metrics in opposite directions, warranting future studies to investigate further on the cause of this phenomenon.
ARTICLE | doi:10.20944/preprints202003.0088.v1
Subject: Engineering, Civil Engineering Keywords: Major ions; Physicochemical parameters; Pearson’s correlation matrix; Regression; Water Quality Index (WQI)
Online: 5 March 2020 (12:02:36 CET)
This work evaluates the surface water quality of the Brahmani River, Odisha, in terms of physico-chemical parameters, using statistical analysis involving the calculation of correlation coefficients and regression equations. In addition, the work draws attention to the “Water Quality Index” in a simplified format that may be used widely and could give a reliable picture of water quality. Surface water quality data were taken from OSPCB for various locations, i.e. Panposh D/S, Rourkela D/S, Rengali, Talcher U/S, Kamalanga D/S, Bhuban and Pattamundai, and were assessed for summer, monsoon and winter for the years 2011, 2012, 2013, 2014 and 2015. Average, minimum and maximum values of the water quality parameters were obtained seasonally over these years. The standard deviation was also obtained for the water quality parameters, namely pH, Temperature, DO, TDS, Alkalinity, EC, Na+, Ca2+, Mg2+, K+, F-, Cl-, NO3-, SO42- and PO42-. Seasonal changes in various physical and chemical parameters were analysed. The values obtained were compared with the guideline values for drinking water of the Bureau of Indian Standards (BIS). A systematic correlation and regression study carried out for the three seasons showed linear relationships among different water quality parameters. This provides an easy and rapid method of monitoring water quality. Highly significant (0.8< r <1.0), moderately significant (0.6< r <0.8) and significant (0.5< r <0.6) correlations between the parameters have been worked out. High correlation coefficients have been observed between TDS, EC-Na+, Ca2+, Cl-, SO42- ; Na+- Cl-. From the collected quantities, certain parameters were selected to derive WQI for the variations in water quality of each designated sampling site.
The WQI of the Brahmani River ranged from 36.7 to 44.1, which falls in the range of good water quality. Panposh D/S and Rourkela D/S showed poor water quality in the summer and winter seasons. It is shown that WQI may be a useful tool for assessing water quality and predicting trends of variation in water quality at different locations in the Brahmani River.
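The correlation bands quoted in this abstract can be applied as in the sketch below, which computes a Pearson coefficient and assigns it to one of the stated ranges. The TDS and EC readings and the function names are hypothetical, and two choices are assumptions of this sketch rather than of the study: the bands are applied to the absolute value of r, and values of 0.5 or less are simply reported as falling below the listed ranges.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def classify(r):
    """Map |r| to the bands quoted in the abstract (band edges treated as assumptions)."""
    strength = abs(r)
    if strength > 0.8:
        return "highly significant"
    if strength > 0.6:
        return "moderately significant"
    if strength > 0.5:
        return "significant"
    return "below the listed ranges"


# Hypothetical seasonal readings of TDS (mg/L) and EC (microsiemens/cm).
tds = [100, 150, 200, 250]
ec = [160, 235, 310, 400]
r = pearson_r(tds, ec)
print(round(r, 4), classify(r))
```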
ARTICLE | doi:10.20944/preprints202308.1469.v1
Subject: Social Sciences, Area Studies Keywords: rural residents; post-retirement migration intention; logistic regression model; influencing variables
Online: 22 August 2023 (04:48:50 CEST)
With the intensification of population aging in rural areas, it becomes increasingly important to analyze the post-retirement migration intention of rural residents and the variables influencing these intentions. In this study, we focus on rural residents aged 45 to 60 and investigate the main variables that influence the post-retirement migration intention of rural residents, using survey data collected from 164 households in three different rural areas and the logistic regression model. We found that gender, part-time employment, savings level, children's residence and occupation stability, and interest in urban living positively affect migration intention. In contrast, the number of rural companions, relationships with others in rural areas, and evaluation of rural living have a negative effect. In addition, we employ age and the proportion of mobile income as control variables to examine the variables that influence the post-retirement migration intention in different age groups and mobile income groups. The analysis reveals that the variables influencing post-retirement migration intention varied across age groups and mobile income groups, and this variation can be attributed to the differences between the groups' characteristics.
BRIEF REPORT | doi:10.20944/preprints202303.0427.v1
Subject: Environmental And Earth Sciences, Water Science And Technology Keywords: Correlation study; Regression analysis; Physicochemical parameters; quality of water; Electrical conductivity
Online: 24 March 2023 (07:50:50 CET)
Water is an essential commodity to sustain life. The condition of water in Tiruchirappalli was measured using different physicochemical parameters: Temperature, pH, TDS, Total Solids, Salinity, Total Hardness, and Electrical Conductivity. Water samples were collected from different places in Tiruchirappalli city, Tamil Nadu, and examined by different chemical methods. According to the results, Thiruvarambur-1 showed comparatively higher values for every parameter than the other sampling stations. The results were further interpreted using statistical tools. Taking ECs as the principal component for the regression and correlation analysis with the other parameters, a significant correlation was found. A strong correlation was observed between ECs and TDS, Total Hardness, Turbidity, and Salinity.
ARTICLE | doi:10.20944/preprints202006.0073.v1
Subject: Biology And Life Sciences, Virology Keywords: Epidemiology; SARS-CoV-2; Multivariable regression; Tuberculosis; Demography; Coronavirus; MMR vaccine
Online: 7 June 2020 (09:25:55 CEST)
The COVID-19 pandemic, which started in China, spread to the entire globe within 3 months. We tested the hypothesis that vaccination against tuberculosis by BCG correlates with a better outcome for COVID-19 patients. Our analysis covers 55 countries complying with predetermined thresholds on the population size and number of deaths per million (DPM). We found a strong negative correlation between the years of BCG administration and the DPM along with the progress of the pandemic, corroborated by permutation tests. The results from multivariable regression tests with 23 economic, demographic, health-related, and pandemic restriction quantitative properties substantiate the dominant contribution of BCG years to the COVID-19 outcomes. The analysis of countries according to an age-group partition reveals that the strongest correlation is attributed to the coverage in BCG vaccination of the young population (0-24 years). Furthermore, a strong correlation and statistical significance are associated with the degree of BCG coverage for the most recent 15 years, but no association was observed in these years for other broadly used vaccination protocols for measles and rubella. We propose that BCG immunization coverage, especially among the most recently vaccinated, contributes to attenuation of the spread and severity of the COVID-19 pandemic.
COMMUNICATION | doi:10.20944/preprints202004.0445.v2
Subject: Medicine And Pharmacology, Pulmonary And Respiratory Medicine Keywords: COVID-19; Coronavirus; Respiratory Distress; Tobacco Smoking; Correlation Statistics; Conditional Probability; Regression; China; U.S.A.
Online: 27 July 2020 (05:59:51 CEST)
The novel COVID-19 disease is a contagious acute respiratory infectious disease whose causative agent has been demonstrated to be a new virus of the coronavirus family, SARS-CoV-2. Multiple studies have already reported that risk factors for severe disease include older age and the presence of at least one of several underlying health conditions. However, a recent physiopathological report and the French COVID-19 scientific council have postulated a protective effect of tobacco smoking. Thanks to a meta-analysis, we have been able to demonstrate the statistical significance in this regard of twelve series from China, France and the US, reporting three different smoking statuses (current smoker, former smoker, with a smoking history) as well as disease severity (with odds ratios of 1.78 [1.08-3.10], 4.60 [3.13-7.17] and 2.74 [0.63-5.89], respectively). Subsequently, using a Bayesian approach, we have established that past and present smoking is associated with more severe COVID-19 outcomes. Finally, we refute claims linking general population smoking status (N=O(10^8) or O(10^9)) to much smaller disease course series (N=O(10^4)). The latter point in particular is presented to stimulate academic discussion, and must be further investigated by well-designed studies.
ARTICLE | doi:10.20944/preprints202306.0933.v1
Subject: Public Health And Healthcare, Public, Environmental And Occupational Health Keywords: Opisthorchis viverrini; geographic weighted regression; sub-basin; Sakon Nakhon, Thailand.
Online: 14 June 2023 (02:11:11 CEST)
Infection with liver flukes (Opisthorchis viverrini) is partly due to their suitability for habitats in sub-basin areas, which causes the intermediate host to remain in the watershed system in all seasons. Spatial monitoring of fluke infection at the small-basin scale is important because it enables analysis of the spatial factors involved in and influencing infections. A geographic weighted regression model was developed to analyze the spatial characteristics of liver fluke infection, aiming to (1) analyze the spatial factors associated with human liver fluke infection according to sub-basin boundaries and (2) generate an alternative model for enhancing the effectiveness of preventive public health management to reduce the risk of liver fluke infection in humans. The number of infected persons was obtained from local authorities, converted into a percentage of infected people and generated as raster data with a heat map so that the data were continuous and could be defined as the dependent variable. The independent set consisted of nine variables, both vector and raster data, that correlated the location with the village location of an infected person. The results showed that the variables X5stream, X7ndmi, and X9savi were statistically significantly correlated with the percentage of infected people, with t-statistics of -2.068, 1.875, and -2.661 and p-values of 0.048, 0.034, and 0.021, respectively. The GWR model was more accurate than comparable models such as OLS in all tests of the four alternative models, with R2 increasing by 7.69% (from 0.576 to 0.624). This study confirms that spatial models developed with GWR can screen for factors associated with liver fluke infection at the level of small spatial units such as sub-basins.
ARTICLE | doi:10.20944/preprints202307.1757.v2
Subject: Business, Economics And Management, Finance Keywords: Gross Value Added; Indian economy; Payment and Settlement Systems; Regression analysis
Online: 16 October 2023 (09:56:09 CEST)
This study explores the correlation between payment statistics and sector-wise Gross Value Added (GVA) in the Indian economy from 2011 to 2022, assessing the importance of payment and settlement systems. Through regression analysis, the impact of payment statistics on GVA across diverse sectors is assessed. The results indicate that, during the specified period, payment statistics exhibit a significant influence on sector-wise GVA. These findings underscore the pivotal role of payment and settlement systems in the Indian financial landscape.
ARTICLE | doi:10.20944/preprints202104.0592.v1
Subject: Computer Science And Mathematics, Discrete Mathematics And Combinatorics Keywords: Flexible count regression; balanced discrete gamma distribution; deviance statistic; latent equidispersion; likelihood ratio
Online: 22 April 2021 (08:55:29 CEST)
Most existing flexible count regression models allow only approximate inference. Balanced discretization is a simple method to produce a mean-parametrizable flexible count distribution starting from a continuous probability distribution. This makes it easy to define flexible count regression models that allow exact inference under various types of dispersion (equi-, under- and overdispersion). This study describes maximum likelihood (ML) estimation and inference in count regression based on the balanced discrete gamma (BDG) distribution and introduces a likelihood-ratio-based latent equidispersion (LE) test to identify the parsimonious dispersion model for a particular dataset. A series of Monte Carlo experiments was carried out to assess the performance of ML estimates and the LE test in the BDG regression model, as compared to the popular Conway-Maxwell-Poisson (CMP) model. The results show that the two evaluated models recover population effects even under misspecification of dispersion-related covariates, with coverage rates of the asymptotic 95% confidence interval approaching the nominal level as the sample size increases. The BDG regression approach, nevertheless, outperforms CMP regression in very small samples (n = 15 − 30), mostly in overdispersed data. The LE test proves appropriate to detect latent equidispersion, with rejection rates converging to the nominal level as the sample size increases. Two applications on real data are given to illustrate the use of the proposed approach to count regression analysis.
ARTICLE | doi:10.20944/preprints202108.0111.v2
Subject: Computer Science And Mathematics, Algebra And Number Theory Keywords: SARS-COV-2; Bayesian regression; Changepoint detection; European football championship
Online: 16 August 2021 (10:57:52 CEST)
While Europe was beginning to deal with the resurgence of COVID-19 due to the Delta variant, the European football championship took place, June 11 - July 11, 2021. We studied the inversion in the decrease/increase rate of new SARS-COV-2 infections in the countries of the tournament, investigating the hypothesis of an association. Using a Bayesian piecewise regression with a Poisson Generalized Linear Model, we looked for a changepoint in the time series of the new SARS-COV-2 cases of each country, expecting it to appear not later than two to three weeks after the date of their first match. The two slopes, before and after the changepoint, were used to discuss the reversal from a decreasing to an increasing rate of the infections. For 17 out of 22 countries (77%) the changepoint came on average 14.97 days after their first match [95% CI 12.29 to 17.47]. For all those 17 countries, the changepoint coincides with an inversion from a decreasing to an increasing rate of the infections. Before the changepoint, the new cases were decreasing, halving on average every 18.07 days [95% CI 11.81 to 29.42]. After the changepoint, the cases began to increase, doubling every 29.10 days [95% CI 14.12 to 49.78]. This inversion in the SARS-COV-2 case rate, which occurred during the tournament, provides evidence in favor of a relationship
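The halving and doubling times quoted in the abstract above relate directly to the regression slopes. As a minimal sketch, assuming the Poisson model uses the usual log link (the abstract does not state the link function), the expected count is exp(a + b·t), so it doubles or halves every ln(2)/|b| days. The function names are illustrative only.

```python
import math


def days_to_double_or_halve(slope_per_day):
    """Days for the expected count to double (slope > 0) or halve (slope < 0)."""
    if slope_per_day == 0:
        raise ValueError("a zero slope never doubles or halves the count")
    return math.log(2) / abs(slope_per_day)


def slope_from_days(days, increasing):
    """Inverse mapping: log-scale slope per day implied by a doubling/halving time."""
    sign = 1.0 if increasing else -1.0
    return sign * math.log(2) / days


# Slopes implied by the average times reported in the abstract.
slope_before = slope_from_days(18.07, increasing=False)
slope_after = slope_from_days(29.10, increasing=True)
print(f"slope before changepoint: {slope_before:.4f} per day")
print(f"slope after changepoint:  {slope_after:.4f} per day")
```

The sign change between the two slopes is what the abstract calls the inversion from a decreasing to an increasing rate.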
ARTICLE | doi:10.20944/preprints202307.0194.v1
Subject: Medicine And Pharmacology, Neuroscience And Neurology Keywords: general taste status; taste loss; supervised learning regression; random forest regressor
Online: 4 July 2023 (10:26:21 CEST)
In healthy humans, taste sensitivity varies widely, influencing food selection and nutritional status. Chemosensory reductions have been associated with numerous pathological disorders or pharmacological interventions. Reliable psychophysical methods are crucial resources to analyze the taste function during routine clinical assessment. However, in the daily clinical routine, they are often considered to be too time-consuming. We used the Supervised Learning (SL) regression method to analyze with high precision the overall taste status of healthy controls (HC) and patients with chemosensory loss and to characterize the combination of responses that can best predict the overall taste status of the two groups. A Random Forest regressor allowed us to achieve our objective. The analysis of the order of importance and impact of each parameter on the prediction of overall taste status in the two groups showed that salty (low concentration) and sour (high concentration) stimuli specifically characterized healthy subjects, while bitter (high concentration) and astringent (high concentration) stimuli identified patients with chemosensory loss. The identification of these distinctions appears to be of interest to the health system, since they may allow the use of specific stimuli during routine clinical assessments of taste function, reducing the commitment in terms of time and costs.
ARTICLE | doi:10.20944/preprints201711.0138.v1
Subject: Engineering, Safety, Risk, Reliability And Quality Keywords: sensitivity analysis; variable fuzzy method; mutual entropy; stepwise regression analysis; mountain flash flood risk
Online: 21 November 2017 (09:28:07 CET)
Flash floods are one of the most significant natural disasters in China, particularly in mountainous areas, causing heavy economic damage and casualties. Accurate risk assessment is critical to efficient flash flood management. There are more than 530,000 small watersheds in 2058 counties in China where flash floods should be prevented. In practice, with limited funds and different risk levels, the priority of each small watershed for flash flood prevention and control is also needed for efficient management. This paper, taking Licheng county in China as an example, aims to determine these priorities. First, sensitive indexes are identified within an index system that includes 9 indexes based on the underlying surface characteristics of small watersheds in hilly regions. Second, the range of each index and its rank division for evaluation are determined. Based on the rank divisions, the flash flood risk grade eigenvalue (H) is calculated by the Variable Fuzzy Method (VFM) using 1000 samples generated by the Latin hypercube sampling method. Third, the key sensitivity factors that affect H are assessed by two different global sensitivity analysis methods: stepwise regression analysis and mutual entropy. Both results indicate that watershed slope (S) is the most sensitive factor and antecedent precipitation index (CN) the second, while the ordering of the other factors differs slightly between the two methods. This study shows that stepwise regression analysis and mutual information analysis are appropriate for the sensitivity analysis of mountain flash flood risk. Finally, based on watershed slope (S), the priorities for flash flood prevention and control of 119 small watersheds in Licheng county are given.
ARTICLE | doi:10.20944/preprints202306.1849.v1
Subject: Engineering, Mechanical Engineering Keywords: Machine Learning; Regression Model; XGBoost Regression; Yield Strength
Online: 27 June 2023 (05:25:11 CEST)
Magnesium matrix composites have attracted significant attention due to their lightweight nature and impressive mechanical properties. However, the fabrication process for these alloy composites is often time-consuming, expensive, and labor-intensive. To overcome these challenges, this study employed machine learning (ML) techniques to predict the mechanical properties of magnesium matrix composites. Regression models were utilized to forecast the yield strength of magnesium alloy composites reinforced with various materials. The study incorporated previous research on matrix type, reinforcement type, heat treatment, and mechanical working. The regression models employed in this study included decision tree regression, random forest regression, extra tree regression, and XGBoost regression. Model performance was assessed using metrics such as RMSE and R2. The XGBoost Regression model outperformed others, exhibiting an R2 value of 0.94 and the lowest error rate. Feature importance analysis indicated that the reinforcement particle form had the greatest influence on the mechanical properties. The study identified the optimized parameters for achieving the highest yield strength, which was 186.99 MPa. Overall, this study successfully demonstrates the effectiveness of ML as a valuable tool for optimizing the production parameters of magnesium matrix composites.
ARTICLE | doi:10.20944/preprints202302.0083.v2
Subject: Environmental And Earth Sciences, Environmental Science Keywords: Multilinear Regression; Dissolved Oxygen; Modeling; Machine Learning; Levenberg–Marquardt algorithm; ANN; Urban Lake
Online: 27 February 2023 (07:25:06 CET)
The paper presents predictive models for dissolved oxygen (DO) levels in an urban lake using common water quality parameters such as Temperature, pH, Conductivity and ORP measured at the same time. Data were sampled using three real-time, industry-standard sensors, OPTOD, CTZN, and PHEHT, and then interpolated using the ArcGIS kriging technique. Correlations were analyzed through the ML algorithm; the correlation study indicated a highly positive correlation between DO and the other water parameters, and the model was corroborated by the R-score in order to create the linear regression model. In addition, an artificial neural network (ANN), a machine learning method, was developed using the Levenberg-Marquardt algorithm to predict DO as well. The performance of the models was then validated, and the R2 accuracy of the predicted data was checked against the actual data. The discrepancy between forecasted and real values is significantly smaller for the ANN model than for the regression model, which indicates that the ANN model is appropriate for forecasting the investigated attributes. The model can also be used to estimate DO data for unknown urban lake water.
ARTICLE | doi:10.20944/preprints202306.0891.v1
Subject: Engineering, Mining And Mineral Processing Keywords: Fragmentation; Artificial neural network; Random Forest regression; Support vector regression; XG Boost Regression; Sensitivity analysis
Online: 13 June 2023 (08:04:17 CEST)
In a limestone quarry mine, fragmentation is a crucial outcome of blasting operations, and the optimization of blasting operations greatly benefits from the prediction of rock fragmentation. The main factors that affect fragmentation are rock mass characteristics, blast geometry, and explosive properties. This paper is a step towards the implementation of machine learning and deep learning algorithms for predicting the extent of fragmentation (in percentage) in opencast mining. Various parameters can affect fragmentation. In this paper, ten parameters (spacing, drill hole diameter, burden, average bench height, powder factor, number of holes, charge per delay, uniaxial compressive strength, specific drilling, and stemming) were initially collected to train the model. However, due to their weak correlation with rock fragmentation, drill hole diameter, average bench height, compressive strength, stemming, and charge per delay were eliminated to reduce model complexity. A total of 219 data sets with five input features (number of holes, spacing, burden, specific drilling, and powder factor) were used to develop the models. To predict rock fragmentation due to blasting in limestone quarry mines, machine learning models (Random Forest Regression (bagging), Support Vector Regression, and XGBoost Regression (boosting)) as well as a deep learning model (Neural Network Regression) were applied. Optimization of the artificial neural network showed that a model with architecture 64-32-16-1 performs well, giving MSE (mean squared error) values of 41.32 and 28.59 on training and test data, respectively, with an R2 value of 0.83 for both. Random Forest regression also performs well compared to SVR and XGBoost, with MSE values of 12.37 and 9.89 on training and test data, respectively, and an R2 value of 0.94 (94%) for both sets. Based on the permutation importance and Shapley plot values, the powder factor has the highest impact and the burden the lowest impact on fragmentation.
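The two metrics reported in the abstract above, MSE and R2, can be sketched in a few lines of Python. This is a generic illustration of the standard definitions, not the authors' evaluation code, and the fragmentation values used are hypothetical.

```python
def mean_squared_error(actual, predicted):
    """Average of the squared prediction errors."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical fragmentation percentages, for illustration only.
observed = [62.0, 70.0, 55.0, 81.0, 74.0]
predicted = [60.0, 73.0, 58.0, 78.0, 75.0]
print(f"MSE = {mean_squared_error(observed, predicted):.2f}")
print(f"R2  = {r_squared(observed, predicted):.3f}")
```

An R2 of 0.94 and an R2 of 94% are the same quantity on different scales, which is why the two models in the abstract can be compared directly.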
ARTICLE | doi:10.20944/preprints202112.0007.v1
Subject: Engineering, Energy And Fuel Technology Keywords: SO2; unburned carbon; fly ash; activated carbon; adsorption kinetics; kinetics models; linear regression; non-linear regression; statistical error functions; the sum of normalized error method
Online: 1 December 2021 (10:55:30 CET)
Kinetic parameters of SO2 adsorption on unburned carbons from lignite fly ash and activated carbons based on hard coal dust were determined. The model studies were performed using linear and non-linear regression for the following models: pseudo-first-order, pseudo-second-order, intraparticle diffusion, and chemisorption on a heterogeneous surface. The quality of the fit of a given model to the empirical data was assessed based on R2, R, Δq, SSE, ARE, χ2, HYBRID, MPSD, EABS, and SNE. It was clearly shown that linear regression more accurately reflects the behaviour of the adsorption system, which is consistent with the first-order kinetic reaction for activated carbons (SO2+Ar) or with chemisorption on a heterogeneous surface for unburned carbons (SO2+Ar and SO2+Ar+H2O(g)+O2) and activated carbons (SO2+Ar+H2O(g)+O2). Importantly, each of the approaches (linear/non-linear) usually indicated a different mechanism of the studied phenomenon. A certain universality of the χ2 and HYBRID functions has been proved, the minimization of which repeatedly led to the lowest SNE values for the indicated models. Fitting data by any of the non-linear equations based on the R or R2 functions only cannot be treated as evidence or a prerequisite of the existence of a given adsorption mechanism.
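To make the linear-regression route mentioned in the abstract above concrete, here is a minimal Python sketch for one of the listed models, the pseudo-second-order equation. It uses the common linearized form t/q_t = 1/(k2·q_e²) + t/q_e and ordinary least squares. The data are synthetic and the parameter values are hypothetical; the non-linear fits and the error functions (χ2, HYBRID, SNE and so on) used in the study are not reproduced here.

```python
def fit_line(x, y):
    """Ordinary least-squares slope and intercept of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def pso_uptake(t, q_e, k2):
    """Pseudo-second-order uptake q_t at time t."""
    return k2 * q_e ** 2 * t / (1.0 + k2 * q_e * t)


def fit_pso_linear(times, uptakes):
    """Estimate (q_e, k2) from the linearized form t/q_t = 1/(k2*q_e^2) + t/q_e."""
    ratios = [t / q for t, q in zip(times, uptakes)]  # requires t > 0 and q_t > 0
    slope, intercept = fit_line(times, ratios)
    q_e = 1.0 / slope
    k2 = slope ** 2 / intercept
    return q_e, k2


# Synthetic, noise-free data from hypothetical parameters (not from the study).
times = [5.0, 10.0, 20.0, 40.0, 80.0, 160.0]
uptakes = [pso_uptake(t, q_e=50.0, k2=0.002) for t in times]
q_e_hat, k2_hat = fit_pso_linear(times, uptakes)
print(f"q_e = {q_e_hat:.3f}, k2 = {k2_hat:.5f}")
```

With noisy measurements the linearized and non-linear fits generally give different estimates, which is the kind of disagreement the abstract reports.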
ARTICLE | doi:10.20944/preprints202101.0375.v1
Subject: Business, Economics And Management, Business And Management Keywords: cold chain logistics of agricultural products; demand forecast; principal component analysis; multiple linear regression; neural network
Online: 19 January 2021 (11:50:09 CET)
Demand forecasting for cold chain logistics of agricultural products can provide a scientific basis for the country to formulate logistics strategy, which further promotes the development of the social economy and the improvement of living standards in China. In this paper, a new combined mathematical model is proposed to forecast agricultural products demand. Shandong, one of China’s provinces, serves as the main producer and distributor of agricultural products. Based on an index system created from multiple related factors influencing the cold chain logistics demand of agricultural products in Shandong, this paper employs principal component analysis to reduce the dimension of the indexes and predicts the principal components with time series. Thereafter, a multiple linear regression model and a neural network model were constructed to forecast the cold chain logistics demand of agricultural products in Shandong, and their combined forecast models were compared. Furthermore, the paper provides insight for reference and decision-making concerning the development of the cold chain logistics industry of agricultural products in Shandong province.
ARTICLE | doi:10.20944/preprints202002.0200.v1
Subject: Computer Science And Mathematics, Probability And Statistics Keywords: uniqueness; regression depth; maximum depth estimator; regression median; robustness
Online: 15 February 2020 (14:51:15 CET)
The notion of the median in one dimension is a foundational element in nonparametric statistics. It has been extended to multi-dimensional cases both in location and in regression via notions of data depth. Regression depth (RD) and projection regression depth (PRD) represent the two most promising notions in regression. Carrizosa depth DC is another depth notion in regression. Depth-induced regression medians (maximum depth estimators) serve as robust alternatives to the classical least squares estimator. The uniqueness of regression medians is indispensable in the discussion of their properties and the asymptotics (consistency and limiting distribution) of sample regression medians. Are the regression medians induced from RD, PRD, and DC unique? Answering this question is the main goal of this article. It is found that only the regression median induced from PRD possesses the desired uniqueness property. The conventional remedy for non-uniqueness, taking the average of all medians, might yield an estimator that no longer possesses the maximum depth in both the RD and DC cases. These and other findings indicate that the PRD and its induced median are highly favorable among their leading competitors.
ARTICLE | doi:10.20944/preprints202305.0010.v1
Subject: Public Health And Healthcare, Health Policy And Services Keywords: DTP vaccine; Africa; COVID-19; Vaccine coverage; Joinpoint regression; Health care system; Vaccination rates
Online: 1 May 2023 (03:49:11 CEST)
Background: Vaccine-related death is one of the leading causes of death among African children. Vaccine coverage is a very important measure to decrease infant mortality. The COVID-19 pandemic has affected the healthcare system and may have disrupted vaccine coverage. Methods: DTP third dose (DTP3) vaccine coverage was extracted from UNICEF databases from 2012 to 2021 (the last available date). Joinpoint regression was performed to detect the point where the trend changed. The annual percentage change (APC) with 95% confidence intervals (95% CI) was calculated for Africa and the regions. We compared DTP3 vaccination coverage in 2019 with 2021 in each country to verify compliance with WHO targets. Results: During the whole period, vaccine coverage in Africa increased with an APC of 1.2% (95% CI 0.9 to 1.5). We detected one joinpoint in 2019. In 2019-2021, there was a decrease in DTP3 coverage with an APC of -3.5 (95% CI -6.0 to -0.9; P < 0.001). Vaccination rates have decreased in many regions and countries during the last two years. Conclusions: COVID-19 has disrupted vaccine coverage, decreasing it all over Africa.
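The annual percentage change reported in the abstract above is conventionally derived from a log-linear fit within one trend segment: if ln(rate) = a + b·year, then APC = 100·(exp(b) − 1). The sketch below shows that single-segment calculation in Python. Locating the joinpoint itself is not shown, and the coverage series is hypothetical, not UNICEF data.

```python
import math


def annual_percent_change(years, rates):
    """APC from the least-squares slope of ln(rate) regressed on year."""
    logs = [math.log(r) for r in rates]
    n = len(years)
    mean_year = sum(years) / n
    mean_log = sum(logs) / n
    sxy = sum((y - mean_year) * (v - mean_log) for y, v in zip(years, logs))
    sxx = sum((y - mean_year) ** 2 for y in years)
    slope = sxy / sxx
    return 100.0 * (math.exp(slope) - 1.0)


# Hypothetical coverage series growing 1.2% per year (illustration only).
years = list(range(2012, 2020))
coverage = [70.0 * 1.012 ** (y - 2012) for y in years]
print(f"APC = {annual_percent_change(years, coverage):.2f}%")
```

In a joinpoint analysis the same formula would be applied separately to each segment on either side of the detected change year.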
ARTICLE | doi:10.20944/preprints201808.0229.v1
Subject: Business, Economics And Management, Economics Keywords: Economic evaluation; Water resource management; Meta-regression analysis; River management funds; Sustainability of water resources
Online: 13 August 2018 (12:26:24 CEST)
Water management can improve the quality of valuable ecosystem services but can be costly to implement, and the management costs are covered by national taxes collected from water users. Based on 30 valuation studies of water quality improvement from the Environmental Valuation Information System (EVIS) database provided by the Korea Environment Institute (KEI), a meta-regression analysis was employed to measure the benefits that major river basins provide to society. We compare these benefits to the costs, namely River Management Funds (RMFs), which are financial resources to support a variety of projects for managing and improving upstream water quality. Based on this benefit-cost comparison, the study evaluates the efficiency of water resource management in South Korea. It also provides policy options that help maintain the sustainability of water resources by improving the planning and performance of water management in the long run.
ARTICLE | doi:10.20944/preprints202201.0209.v1
Subject: Business, Economics And Management, Economics Keywords: Economic Growth; Gross Fixed Capital Formation; Government Expenditure; Government Deficit; Vector Auto-Regression and South Africa
Online: 14 January 2022 (11:36:07 CET)
The study uses annual time series data from the South African Reserve Bank (SARB) from 1980 to 2020 to examine the effectiveness of fiscal policy on economic growth in South Africa. The Augmented Dickey-Fuller (ADF) and Phillips-Perron (PP) unit root tests, as well as the Johansen co-integration test, Granger causality test, and Vector Auto-Regression (VAR) method, were used in the study. Real GDP per capita (RGDP) is used as a proxy for economic growth, and gross fixed capital formation (GFCF), government expenditure (GEXP) and government deficit (GOVD) as proxies for fiscal policy. The ADF test results show that all variables are stationary at the first difference, with the exception of GFCF and GEXP, which are stationary at I(0), whereas the PP test results show that all variables are stationary at I(1), with the exception of GEXP, which is stationary at I(0). According to the Maximum Eigenvalue test, the four variables are not cointegrated. The Granger causality test demonstrated unidirectional causality from GOVD to RGDP, as well as bidirectional causality between RGDP and both GFCF and GEXP. The error correction model estimated using VAR shows that, in the short run, GFCF and GEXP have a positive effect on RGDP whereas GOVD has a negative effect. The findings also show that the VAR's residuals are homoscedastic, normally distributed and free of serial correlation.
ARTICLE | doi:10.3390/sci2040074
Subject: Biology And Life Sciences, Agricultural Science And Agronomy Keywords: trend analysis; Mann–Kendall test; Sen’s slope estimator; linear regression; cereal yield; northern Togo
Online: 24 September 2020 (00:00:00 CEST)
This study investigates the trend in monthly and annual rainfall, minimum and maximum temperature (Tmin and Tmax) using the Mann–Kendall (MK) test and Sen’s slope (SS) method and evaluates the significance of their variability for maize, sorghum and millet yields in northern Togo employing multiple regression analysis. The historical data of Kara, Niamtougou, Mango and Dapaong weather stations from 1977 to 2012 were used. Four non-parametric methods (Alexandersson’s Standard Normal Homogeneity Test (SNHT), Buishand’s Range Test (BRT), Pettitt’s Test (PT) and Von Neumann’s Ratio Test (VNRT)) were applied to test the homogeneity of the data. For the data which were serially correlated, a modified version of the MK test (pre-whitening) was utilised. Results showed an increasing trend in the annual rainfall in all four locations. However, this trend was only significant at Dapaong (p < 0.1). There was an increasing trend in Tmax at Kara, Mango and Niamtougou, unlike Dapaong where Tmax revealed a significant decreasing trend (p < 0.01). Similarly, there was an increasing trend in Tmin at Kara, Mango and Dapaong, unlike Niamtougou where Tmin showed a non-significant decreasing trend (p > 0.05). Rainfall in Dapaong was found to have increased (7.79 mm/year) more than in the other locations: Kara (2.20 mm/year), Niamtougou (4.57 mm/year) and Mango (0.67 mm/year). Tmax increased by 0.13, 0.13 and 0.32 °C per decade at Kara, Niamtougou and Mango, respectively, and decreased by 0.20 °C per decade in Dapaong. Likewise, Tmin increased by 0.07, 0.20 and 0.02 °C per decade at Kara, Mango and Dapaong, respectively, and decreased by 0.01 °C per decade at Niamtougou. Results of multiple regression analysis revealed nonlinear yield responses to changes in rainfall and temperature. Rainfall and temperature variability affects rainfed cereal crop production, but the effects vary across crops. Temperature has a positive effect on maize yield in Kara, Niamtougou and Mango but a negative effect on sorghum in Niamtougou and millet in Dapaong, while rainfall has a negative effect on maize yield in Niamtougou and Dapaong and millet yield in Mango. In all locations, rainfall and temperature variability has a significant effect on the cereal crop yields. There is, therefore, a need to adopt adaptation strategies for sustainable agricultural production in northern Togo.
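The two trend tools named in the abstract above can be sketched in plain Python. The Mann–Kendall statistic S counts concordant minus discordant pairs, and Sen's slope is the median of all pairwise slopes. This sketch assumes no tied values (no tie correction in the variance) and omits the pre-whitening step the abstract mentions for serially correlated data; the rainfall series is hypothetical.

```python
import math
from statistics import median


def mann_kendall_s(values):
    """Mann-Kendall S: sum of sign(x_j - x_i) over all pairs with i < j."""
    s = 0
    n = len(values)
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = values[j] - values[i]
            s += (diff > 0) - (diff < 0)
    return s


def mann_kendall_z(values):
    """Normal approximation of the MK test statistic (assumes no ties)."""
    n = len(values)
    s = mann_kendall_s(values)
    var_s = n * (n - 1) * (2 * n + 5) / 18
    if s > 0:
        return (s - 1) / math.sqrt(var_s)
    if s < 0:
        return (s + 1) / math.sqrt(var_s)
    return 0.0


def sens_slope(times, values):
    """Sen's slope: median of (x_j - x_i) / (t_j - t_i) over all pairs i < j."""
    n = len(values)
    slopes = [
        (values[j] - values[i]) / (times[j] - times[i])
        for i in range(n - 1)
        for j in range(i + 1, n)
    ]
    return median(slopes)


# Hypothetical annual rainfall in mm (illustration only, not station data).
years = [2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007]
rain = [900.0, 915.0, 905.0, 930.0, 940.0, 935.0, 960.0, 970.0]
print(f"S = {mann_kendall_s(rain)}, Z = {mann_kendall_z(rain):.2f}")
print(f"Sen's slope = {sens_slope(years, rain):.2f} mm/year")
```

A positive Z with a positive Sen's slope corresponds to the increasing trends the abstract reports, with the slope giving the magnitude in units per year.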
ARTICLE | doi:10.20944/preprints201807.0299.v1
Subject: Business, Economics And Management, Business And Management Keywords: commuting stress; turnover intention; life satisfaction; mediation model; demographics; ANOVA; hierarchical regression; bootstrap; Turkey
Online: 17 July 2018 (09:49:16 CEST)
Using hierarchical regression analysis within a mediation model framework, the present study explores the direct and indirect (through life satisfaction) causal impacts of commuting stress on the turnover intention of employees from 29 business organizations in six populous cities of Turkey. A semi-random heterogeneous sample of 214 employees with different demographics was surveyed in both winter and summer in order to also capture seasonal variation in the variables. The results support the partial mediating role of life satisfaction in the positive relationship between commuting stress and turnover intention, implying that commuting stress induces turnover intention both directly and indirectly (by reducing life satisfaction). The analysis of variance reveals that demographic characteristics of employees such as gender, marital status, age, and family size, together with commuting type and commuting duration, matter for their perceived commuting stress, life satisfaction, and turnover intention levels. Commuting stress perception is relatively higher in summer, whereas the other magnitudes are consistently and significantly invariant between the two survey implementations. The study concludes with a call for the consideration of commuting stress and life satisfaction together with environmental and demographic factors when analyzing the antecedents and consequences of employee turnover intention.
ARTICLE | doi:10.20944/preprints202007.0008.v1
Subject: Business, Economics And Management, Econometrics And Statistics Keywords: Copula Regression; ICT resources; Middle East; Spatial Analysis; Students Well-being; Sustainable Development Goals
Online: 2 July 2020 (13:18:03 CEST)
Target 9.c of the 2015 United Nations (UN) Sustainable Development Goals (SDGs) specifically addresses increasing access to information and communication technology (ICT) resources and striving for universal access to the internet by 2020. The present study evaluates the effectiveness of the youth-related national strategies implemented in this regard by a select number of countries in the Middle East region. It does so by relying on a spatial bivariate copula regression analysis of data on youth respondents from five countries, extracted from the 2018 Programme for International Student Assessment (PISA). Focusing on the availability of ICT resources to the youth population and on the impact of ICT resources on youth subjective well-being in the region, we find that, except in the UAE and Qatar, where youth score above the OECD average on the ICT resource index, youth in the remaining countries reported access to ICT resources below the OECD average. The within-region cross-country comparison of ICT resource availability to youth at home also highlighted significant heterogeneity across the five countries after the adoption of the 2015 SDGs by UN member countries. Furthermore, looking at the impact of ICT resources on youth well-being, and controlling not only for cross-country spatial correlations but also for factors such as home educational resources, cultural possessions at home, parental occupation status, youth expected occupation status, economic and socio-cultural status, age, gender, and grade level in school, we found that every standard deviation increase in ICT resources to the youth population in the region raises their self-expressed sense of belonging in school by 1.88% standard deviations.
Given the empowering nature of ICT resources to youth, and the potential of both to support national as well as regional economic development initiatives, a concerted effort by member countries in the Middle East region to ease the diffusion of ICT resources could assist not only each country in its own development path, but also the region as a whole in living up to its growth potential by 2030.
ARTICLE | doi:10.20944/preprints202308.1961.v1
Subject: Engineering, Energy And Fuel Technology Keywords: contact pressure; finite element analysis; gasket material; hyperelastic models, PEMFC, polynomial regression, strain functions, von Mises stress
Online: 29 August 2023 (13:56:40 CEST)
Degradation of proton-exchange membrane fuel cell (PEMFC) gasket materials, usually caused by high temperatures, pressures, and hydrogen fuel exposure, is a critical safety concern in electric vehicles because it can lead to hazardous hydrogen fuel leaks. This study utilizes finite element analysis (FEA) to assess the suitability of gasket materials for PEMFC applications, focusing on aging and tensile conditions. A dual degradation framework, incorporating contact pressure and von Mises stress, is employed to evaluate Liquid Silicone Rubber (LSR) and Ethylene Propylene Diene Monomer (EPDM) materials. Under aging conditions, the Yeoh model exhibits the lowest Mean Absolute Percentage Error (MAPE) and the lowest computational cost, 0.27 seconds, while the Ogden model records the highest computational cost, 0.89 seconds. In evaluating Mean Absolute Error (MAE), Root Mean Square Error (RMSE), and R-squared metrics, LSR and EPDM materials demonstrate respective averages of 0.25%, 0.275%, 0.945%, and 0.815%, 0.685%, 0.77%. Tensile testing (uniaxial) reveals RMSE and MAE values of 0.30%, 0.40%, and 0.50%, 0.40%, respectively. FEA proves instrumental in identifying suitable gasket materials for PEMFC applications. LSR emerges as the superior choice, demonstrating better FEA modelling performance under aging and tensile conditions. These findings contribute valuable insights to the design and development of improved gasket materials, bolstering the safety and reliability of electric vehicles.
ARTICLE | doi:10.20944/preprints202309.2134.v1
Online: 30 September 2023 (05:42:45 CEST)
Dipteryx spp. is an important species in reforestation in the Amazon. The objective of this study is to characterize and compare the relationships between dendrometric variables in Dipteryx spp. stands in the Western Amazon by fitting linear regression equations for total height and crown diameter. Six forest stands were evaluated in three municipalities. Dendrometric variables collected included diameter at 1.3 m height (dbh), total height (ht) and crown diameter (dc). Simple and multiple linear regression equations were fitted to characterize the relationships between ht and dc. The total aboveground biomass of Dipteryx spp. trees and the carbon stock of the stands were estimated. The general equations showed higher R² values, exceeding 0.7. The general equations for estimating ht and dc were significant for all coefficients. The trees averaged 22 t/ha of aboveground biomass in the stands. There was a variation in carbon sequestration potential among stands, ranging from 5.12 to 88.91 t CO2.ha-1. Single-input equations using dbh as an independent variable are recommended for estimating dc and ht for individual Dipteryx spp. stands. Stands in the Western Amazon play a significant role in carbon sequestration and accumulation. Trees can sequester an average of 4.8 tons of CO2 per year.
REVIEW | doi:10.20944/preprints202110.0207.v1
Subject: Biology And Life Sciences, Biochemistry And Molecular Biology Keywords: transfer learning; classification; regression
Online: 13 October 2021 (16:28:59 CEST)
Accurate transfer learning of clinical outcomes, e.g., of the effects and side effects of drugs or other interventions, from one cellular context to another (in-vitro versus ex-vivo versus in-vivo, or across tissues), between cell-types, developmental stages, omics modalities or species, is considered tremendously useful. Ultimately, it may avoid most drug development failing in translation, despite large investments in the preclinical stages, which includes animal experiments requiring careful justification. Thus, when transferring a prediction task from a source (model) domain to a target domain, what counts is the high quality of the predictions in the target domain, requiring molecular states or processes common to both source and target that can be learned by the predictor, reflected by latent variables. These latent variables may form a compendium of knowledge that is learned in the source, to enable predictions in the target; usually, there are few, if any, labeled target training samples to learn from. Transductive learning then refers to the learning of the predictor in the source domain, transferring its outcome label calculations to the target domain, considering the same task. Inductive learning considers cases where the target predictor is performing a different yet related task as compared to the source predictor, making some labeled target data necessary. Often, there is also a need to first map the variables in the input/feature spaces (e.g. of gene names to orthologs) and/or the variables in the output/outcome spaces (e.g. by matching of labels). Transfer across omics modalities also requires that the molecular information flow connecting these modalities is sufficiently conserved. Only one of the methods for transfer learning we reviewed offers an assessment of input data, suggesting that transfer learning is unreliable in certain cases. 
Moreover, source domains feature their very own particularities, and transfer learning should consider these, e.g., as differences in pharmacokinetics, drug clearance or the microenvironment. In light of these general considerations, we here discuss and juxtapose various recent transfer learning approaches, specifically designed (or at least adaptable) to predict clinical (human in-vivo) outcomes based on molecular data, towards finding the right tool for a given task, and paving the way for a comprehensive and systematic comparison of the suitability and accuracy of transfer learning of clinical outcomes.
Subject: Business, Economics And Management, Economics Keywords: electricity poverty; quantile regression
Online: 18 September 2020 (09:40:45 CEST)
The main objective of this article is to explore the causes of household electricity poverty in Spain from an innovative perspective. Based on evidence of energy inequality across households with different income levels, a quantile regression approach was used to better capture the heterogeneity of determinants of energy poverty across different levels of electricity expenditure. The results illustrate some interesting and counter-intuitive findings about the relationship between household income and electricity poverty, and the technical efficiency of quantile regression compared to the imprecise results of a standard single coefficient/OLS approach.
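The contrast between quantile regression and a single-coefficient OLS fit can be illustrated with the pinball (check) loss that quantile regression minimises. The sketch below is not the article's model: it only fits a constant at three quantile levels, and the bill values, function names and quantile levels are hypothetical choices made for illustration.

```python
def pinball_loss(y, q, tau):
    """Average pinball (check) loss of predicting the constant q at level tau."""
    total = 0.0
    for v in y:
        diff = v - q
        # Under-predictions are weighted by tau, over-predictions by (1 - tau).
        total += tau * diff if diff >= 0 else (tau - 1.0) * diff
    return total / len(y)


def best_constant(y, tau):
    """Observed value that minimises the pinball loss at level tau."""
    return min(sorted(set(y)), key=lambda q: pinball_loss(y, q, tau))


# Hypothetical, right-skewed monthly electricity bills (illustrative only).
bills = [20, 22, 25, 27, 30, 33, 38, 45, 60, 140]

mean_bill = sum(bills) / len(bills)   # what a single mean-based summary reports
q25 = best_constant(bills, 0.25)      # low-expenditure households
q50 = best_constant(bills, 0.50)      # median household
q75 = best_constant(bills, 0.75)      # high-expenditure households
print(mean_bill, q25, q50, q75)
```

With skewed data the mean is pulled towards the largest bill, while each quantile level gives its own summary. A full quantile regression lets covariate effects vary across those levels in the same way.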
ARTICLE | doi:10.20944/preprints202201.0441.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Active learning (AL); batch mode; expected model change; linear regression; nonlinear regression
Online: 28 January 2022 (15:03:10 CET)
Training supervised machine learning models requires labeled examples. A judicious choice of examples is helpful when there is a significant cost associated with assigning labels. This article improves upon a promising extant method, Batch-mode Expected Model Change Maximization (B-EMCM), for selecting examples to be labeled for regression problems. Specifically, it develops and evaluates alternate strategies for adaptively selecting the batch size in B-EMCM. By determining the cumulative error that arises from the stochastic gradient descent estimate, a stopping criterion for each iteration of the batch can be specified to ensure that the selected candidates are the most beneficial to model learning. This new methodology is compared to B-EMCM via mean absolute error and root mean square error over ten iterations, benchmarked on machine learning data sets. Using multiple data sets and metrics across all methods, one variation of AB-EMCM, the max bound of the accumulated error (AB-EMCM Max), showed the best results for an adaptive batch approach. It achieved better root mean squared error (RMSE) and mean absolute error (MAE) than the other adaptive and non-adaptive batch methods while reaching the result in nearly the same number of iterations as the non-adaptive batch methods.
ARTICLE | doi:10.20944/preprints202305.0917.v1
Subject: Engineering, Electrical And Electronic Engineering Keywords: Machine learning; Geriatric fall detection; Dataset; K Nearest Neighbours; Naive Bayes; Logistic Regression; Random Forest; Support Vector Machine
Online: 12 May 2023 (10:00:16 CEST)
ARTICLE | doi:10.20944/preprints202312.0295.v1
Subject: Engineering, Energy And Fuel Technology Keywords: biomass; ultimate analysis; near-infrared spectroscopy; partial least squares regression; wood; non-wood; scatter plot analysis
Online: 6 December 2023 (09:45:02 CET)
Ultimate analysis parameters of biomass, including carbon (C), hydrogen (H), nitrogen (N), and oxygen (O) content, have rarely been predicted by nondestructive tests to date. In this research, we developed partial least squares regression (PLSR) models to predict the ultimate analysis parameters of chip biomass using near-infrared (NIR) raw spectra of non-wood and wood samples from fast-growing trees and agricultural residues, together with nine different traditional spectral preprocessing techniques. These techniques include first derivative (sd1), second derivative (sd2), constant offset, standard normal variate (SNV), multiplicative scatter correction (MSC), vector normalization, min-max normalization, mean centering, sd1 + vector normalization, and sd1 + MSC. Additionally, we employed a genetic algorithm (GA), the successive projection algorithm (SPA), multi-preprocessing (MP) 5-range, and MP 3-range to develop PLSR models for rapid prediction. A dataset of 120 chip biomass samples, of which 65-67% were non-wood and 33-35% were wood, was used for model development, and model performance was evaluated and compared. The best-performing model was selected mainly on the coefficient of determination of the prediction set (R2P), the root mean square error of the prediction set (RMSEP), and the ratio of prediction to deviation (RPD). The optimal model for weight percentage (wt.%) of C was obtained using GA-PLSR, yielding R2P, RMSEP, and RPD values of 0.6954, 1.1252 wt.%, and 1.8, respectively. Similarly, for wt.% of O, the most effective model was obtained using the MP PLSR 5-range method, with R2P of 0.7150, RMSEP of 1.3088 wt.%, and RPD of 1.9. For wt.% of N, the optimal model was obtained using the MP PLSR 3-range method, resulting in R2P, RMSEP, and RPD values of 0.6073, 0.1008 wt.%, and 1.6, respectively.
The model for wt.% of H, however, yielded R2P, RMSEP, and RPD values of 0.5162, 0.2322 wt.%, and 1.5, respectively. Notably, the limit of quantification (LOQ) values for C, H, and O were lower than the minimum reference values used during model development, indicating a high level of sensitivity. The LOQ for N, however, exceeded the minimum reference value, implying that samples to be predicted by the model must fall within the reference range of the calibration set. Scatter plot analysis was used to investigate the effect of combining non-wood and wood spectra of biomass chips on the rapid prediction of ultimate analysis parameters by NIR spectroscopy. For different species to be included in one model, they must not only span different constituent values, which widens the range and makes the model more robust, but also share trend-line characteristics in the scatter plot: a correlation coefficient (R) approaching 1, a common slope approaching 1, and a common intercept (no gap) approaching zero. The effects of R, slope and intercept on obtaining a better-optimized model were studied. The results show that different species affected the model performance for each parameter in different ways, and scatter plot analysis indicated which species affected the model negatively and how the model could be improved. This is the first study of this effect based on scatter plot analysis.
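The three selection criteria named in this abstract can be computed from a prediction set as sketched below. This is a minimal illustration under common chemometrics conventions, not the authors' code: R2P is taken here as 1 - SS_res/SS_tot (some software reports the squared correlation instead), RPD as the standard deviation of the reference values divided by RMSEP, and the carbon values are hypothetical.

```python
import math


def prediction_metrics(reference, predicted):
    """Return R2P, RMSEP and RPD for a prediction set (conventions assumed above)."""
    n = len(reference)
    mean_ref = sum(reference) / n
    ss_res = sum((r - p) ** 2 for r, p in zip(reference, predicted))
    ss_tot = sum((r - mean_ref) ** 2 for r in reference)
    rmsep = math.sqrt(ss_res / n)            # root mean square error of prediction
    sd_ref = math.sqrt(ss_tot / (n - 1))     # sample SD of the reference values
    return {"R2P": 1.0 - ss_res / ss_tot, "RMSEP": rmsep, "RPD": sd_ref / rmsep}


# Hypothetical carbon contents (wt.%) from the reference method and from a model.
reference = [44.0, 46.0, 48.0, 50.0, 52.0]
predicted = [45.0, 45.0, 49.0, 49.0, 53.0]
metrics = prediction_metrics(reference, predicted)
print(metrics)
```

Because RPD scales the prediction error by the spread of the reference values, a wider reference range raises RPD even when RMSEP is unchanged, which is one reason the range of the calibration set matters.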
ARTICLE | doi:10.20944/preprints202309.0970.v1
Subject: Environmental And Earth Sciences, Remote Sensing Keywords: terraced-field areas (TRAs); machine learning; Yellow River Basin (YRB); linear mixed model (LMM); random forest regression; Google Earth Engine (GEE)
Online: 14 September 2023 (08:54:12 CEST)
The Yellow River Basin (YRB) is a crucial ecological zone and an environmentally vulnerable region in China. Understanding the temporal and spatial trends of terraced-field areas (TRAs) and the factors underlying them in the YRB is essential for improving land use, conserving water resources, promoting biodiversity, and preserving cultural heritage. In this study, we employed machine learning on the Google Earth Engine (GEE) platform to obtain spatial distribution images of TRAs from 1990 to 2020 using Landsat 5 (1990–2010) and Landsat 8 (2015–2020) remote sensing data. The GeoDa software platform was used for spatial autocorrelation analysis, revealing distinct spatial clustering patterns. Mixed linear and random forest models were constructed to identify the driving force factors behind TRA changes. The research findings reveal that TRAs were primarily concentrated in the upper and middle reaches of the YRB, encompassing provinces such as Shaanxi, Shanxi, Qinghai, and Gansu, with areas exceeding 40,000 km2, whereas other provinces had TRAs of less than 30,000 km2 in total. The TRAs exhibited a relatively stable trend, with provinces such as Gansu, Qinghai, and Shaanxi showing an overall upward trajectory. Conversely, Shanxi and Inner Mongolia demonstrated an overall declining trend. When compared with other provinces, the variations in TRAs in Ningxia, Shandong, Sichuan, and Henan appeared to be more stable. The linear mixed model (LMM) revealed that farmland, shrubs, and grassland had significant positive effects on the TRA, explaining 41.6% of the variance. The random forest model also indicated positive effects for these factors, with high R² values of 0.983 and 0.86 for the training and testing sets, respectively, thus outperforming the LMM. The findings of this study can contribute to the restoration of the YRB's ecosystem and support sustainable development.
The insights gained will be valuable for policymaking and decision support in soil and water conservation, agricultural planning, and environmental protection in the region.
ARTICLE | doi:10.20944/preprints202208.0222.v1
Subject: Medicine And Pharmacology, Epidemiology And Infectious Diseases Keywords: Tuberculosis; Mortality; Indigenous; Logistic Regression
Online: 11 August 2022 (12:00:20 CEST)
Aim. To identify factors associated with mortality with tuberculosis diagnosis in the indigenous population in Peru 2015-2019. Methods. Case-control study nested in a retrospective cohort, using the registry of persons belonging to indigenous peoples of the National Tuberculosis Prevention and Control Strategy of the Ministry of Health of Peru. A descriptive analysis was applied, and then bivariate and multiple logistic regression was used to evaluate associations between the variables and the outcome (live-deceased), the results were presented as OR with their respective 95% confidence intervals. Results. The mortality rate of the total indigenous population of Peru was 1.75 deaths per 100,000 indigenous people diagnosed with TB. The community of Kukama kukamiria - Yagua reported 505 (28.48%) individuals. The final logistic model showed that indigenous men (OR=1.93; 95% CI: 1.001-3.7), with a history of HIV prior to TB (OR=16.7; 95% CI: 4.7-58.7) and indigenous people in old age (OR=2.95; 95% CI: 1.5-5.7), are factors associated with a greater chance of dying from TB. Conclusions. It is important to reorient health services among indigenous populations, especially those related to improving the timely diagnosis and early treatment of TB-HIV co-infection, to ensure comprehensive care for this population, considering that they are vulnerable groups.
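The odds ratios and 95% confidence intervals reported in this abstract come from a multiple logistic regression. As a simpler illustration of how such an estimate is formed, the sketch below computes a crude (unadjusted) odds ratio with a Wald interval from a 2x2 case-control table. The counts and the function name are hypothetical and unrelated to the study's data.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Crude odds ratio and Wald confidence interval from a 2x2 table.

    a: exposed cases      b: exposed controls
    c: unexposed cases    d: unexposed controls
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE on the log scale
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, for illustration only.
or_hat, ci_low, ci_high = odds_ratio_ci(a=20, b=10, c=10, d=20)
print(round(or_hat, 2), round(ci_low, 2), round(ci_high, 2))
```

An interval whose lower bound lies only just above 1, such as the one reported for indigenous men (1.001-3.7), indicates an association that is statistically significant but estimated with limited precision.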
ARTICLE | doi:10.20944/preprints202110.0127.v1
Subject: Computer Science And Mathematics, Applied Mathematics Keywords: Descriptive analysis; principal components analysis; k-means clustering; data panel regression method; machine learning; XGBoost algorithms; random forest algorithms
Online: 8 October 2021 (08:30:13 CEST)
The aim of this work is to explain the behaviour of the multiresistance percentage of Pseudomonas aeruginosa in some European countries through a multivariate statistical analysis and machine learning validation, using data from the European Antimicrobial Resistance Surveillance System, the World Health Organization and the World Bank. First, we use a descriptive analysis and a principal components analysis, followed by k-means clustering to determine the countries and regions that are most affected by antibiotic resistance. Second, we expand the database by adding socioeconomic, governance and antibiotic-consumption variables. We then run a panel data regression analysis to determine functions that relate the multiresistance percentage to those new variables. Finally, we use machine learning techniques to validate a pooled panel data case, using XGBoost and random forest algorithms. The results of the panel data analysis indicate that the most important variables for the multiresistance percentage are corruption control and the rule of law. Similar results are found with the machine learning validation analysis, where the human development index is an additional important variable for the multiresistance percentage.
ARTICLE | doi:10.20944/preprints202307.0288.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Idiosyncratic Volatility Estimation/Prediction; Machine Learning; Deep learning Based Regression; Tree-Based Regression; Artificial Intelligence
Online: 6 July 2023 (02:14:16 CEST)
Financial markets require a great deal of decision making from investors and market makers. One metric that can ease this decision making is investment risk, which can be measured in two parts: systematic risk and idiosyncratic risk. A clear understanding of the volatilities in each risk component can be a powerful signal in recognizing the right assets to maximize investment returns. In this paper, we focus on idiosyncratic volatility and pre-calculate the idiosyncratic volatility values for 31,198 members of the NYSE, Amex and Nasdaq markets for trades occurring between January 1963 and December 2019. Using a subset of the dataset, limited to the Nasdaq100 index, we consider the application of machine learning techniques to predicting idiosyncratic volatility values from the raw trade data, exploring a way to extend the data to future market trade records that have not yet occurred. We offer a deep learning based regression model and compare it with traditional tree-based methods on a small subset of our pre-calculated idiosyncratic volatility dataset. Our analytical results show that the performance of the deep learning techniques is much more robust than that of the traditional tree-based baselines.
ARTICLE | doi:10.20944/preprints201804.0357.v1
Subject: Engineering, Control And Systems Engineering Keywords: hydrokinetic; energy assessment; unregulated river; daily water velocity estimation; daily water level estimation; IBM statistical package for social sciences (SPSS); regression analysis; east malaysia
Online: 27 April 2018 (08:39:22 CEST)
Electrification coverage in Sarawak is the lowest in Malaysia at 78.74%, compared to 99.62% in Peninsular Malaysia and 82.51% in Sabah. Kapit, Sarawak, where 88.4% of the population lives in rural areas mostly situated along the main riverbanks, has great potential to generate electrical energy with hydrokinetic systems. Yearly water velocity data is the most significant parameter in a hydrokinetic analysis study. Nevertheless, the data retrieved from local river databases are inadequate for river energy analysis, which hinders its progress. Flow rates and rainfall data have instead been used to estimate water velocity, which means that water velocity in an unregulated river has not previously been estimated from water level data. Therefore, a novel technique for estimating the daily average water velocity in unregulated rivers is proposed. Two regression model equations were generated to estimate water level and water velocity on-site, and both proved valid, with coefficients of determination of R2 = 87.4% and R2 = 87.9%, respectively. The combination of both regression model equations can be used to estimate long-term time series of water velocity for type-C unregulated rivers in remote areas.
ARTICLE | doi:10.20944/preprints202008.0139.v1
Subject: Engineering, Industrial And Manufacturing Engineering Keywords: copper price; prediction; support vector regression
Online: 6 August 2020 (08:26:35 CEST)
Predicting the copper price is essential for making decisions that can affect companies and governments dependent on the copper mining industry. Copper prices follow a time series that is non-linear and non-stationary, with periods that change as a result of potential growth, cyclical fluctuation and errors. The trend and cyclical components together are sometimes referred to as a trend-cycle. In order to make predictions, it is necessary to consider the different characteristics of the trend-cycle. In this paper, we study a copper price prediction method using Support Vector Regression. This work explores the potential of Support Vector Regression with external recurrences to predict the copper closing price at the London Metal Exchange 5, 10, 15, 20 and 30 days into the future. The best model for each forecast interval is selected using a grid search and balanced cross-validation. In experiments on real data sets, our results indicate that the parameters (C, ε, γ) of the Support Vector Regression model do not differ between the prediction intervals. Additionally, the number of preceding values used to make the estimates does not vary with the prediction interval. Results show that the support vector regression model has a lower prediction error and is more robust. The presented model is able to predict copper price volatilities close to the observed ones, with an RMSE of 2.2% or less for prediction periods of 5 and 10 days.
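A common way to frame this kind of forecasting task for a regressor is to turn the price series into (preceding values, future value) pairs for each horizon. The sketch below shows that framing only. It is an assumption about the setup, not the paper's "external recurrences" scheme, and the function name, lag count and toy series are hypothetical.

```python
def make_supervised(series, n_lags, horizon):
    """Build (X, y) pairs: the last n_lags values predict the value horizon steps ahead."""
    X, y = [], []
    # t is the index of the most recent observation available to the model.
    for t in range(n_lags - 1, len(series) - horizon):
        X.append(series[t - n_lags + 1 : t + 1])
        y.append(series[t + horizon])
    return X, y


# Toy series standing in for daily closing prices (illustrative only).
prices = [float(v) for v in range(10)]

# One supervised data set per forecast horizon, sharing the same number of lags.
datasets = {h: make_supervised(prices, n_lags=3, horizon=h) for h in (1, 2, 5)}
X2, y2 = datasets[2]
print(X2[0], y2[0], len(X2))
```

Each (X, y) pair could then be passed to a regressor such as scikit-learn's `SVR`, with `C`, `epsilon` and `gamma` tuned by grid search separately for each horizon.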
ARTICLE | doi:10.20944/preprints201902.0135.v1
Subject: Business, Economics And Management, Finance Keywords: recovery rates; beta regression; credit risk
Online: 14 February 2019 (11:30:03 CET)
Based on a rich data set of recoveries donated by a debt collection business, recovery rates for non-performing loans taken from a single European country are modelled using linear regression, linear regression with Lasso, beta regression and inflated beta regression. We also propose a two-stage model: beta mixture model combined with a logistic regression model. The proposed model allows us to model the multimodal distribution we find for these recovery rates. All models are built using loan characteristics, default data and collections data prior to purchase by the debt collection business. The intended use of the models is to estimate future recovery rates for improved risk assessment, capital requirement calculations and bad debt management. They are compared using a range of quantitative performance measures under K-fold cross validation. Among all the models, we find that the proposed two-stage beta mixture model performs best.
ARTICLE | doi:10.20944/preprints201809.0499.v1
Subject: Biology And Life Sciences, Ecology, Evolution, Behavior And Systematics Keywords: aquatics; modeling; boosted regression trees; appalachians
Online: 26 September 2018 (05:23:02 CEST)
Understanding influences of multiple stressors across the landscape on aquatic biota is important for conservation, as it allows for an understanding of spatial patterns and informs stakeholders of significant conservation value. Data exists for land use/landcover (LULC) and other physicochemical components of the landscape throughout the Appalachian region yet biological data is sparse. This dearth of biological data relative to LULC and physicochemical data creates difficulties in making informed management and conservation decisions across large landscapes. At the HUC12 watershed scale we sought to create a single score for both abiotic and biotic values throughout the central and southern Appalachian region. We used boosted regression trees (BRT) to model biological responses (fish and aquatic macroinvertebrate variables) to abiotic variables. Variance explained by BRT models ranged from 62-94%. We categorized both predictor and response variables into themes and targets respectively to better understand large scale patterns on the landscape that influence biological condition of streams. We combined predicted values for a suite of response variables from BRT models to create a single watershed score for aquatic macroinvertebrates and fish. Regional models were developed for fish but we were unable to develop regional models for aquatic macroinvertebrates due to the low number of sample sites. There was strong correlation between regional and global watershed scores for fish models but not between fish and aquatic macroinvertebrate models. Use of such multimetric scores can inform managers, NGOs, and private land owners regarding land use practices; thereby contributing to largescale landscape scale conservation efforts.
COMMUNICATION | doi:10.20944/preprints202111.0549.v1
Subject: Computer Science And Mathematics, Applied Mathematics Keywords: Principal Component Regression, Partial Least Squares, Orthogonal Partial Least Squares, multivariate regression, hypothesis generation, Parkinson’s disease
Online: 29 November 2021 (15:42:03 CET)
In the current era of ‘big data’, scientists are able to quickly amass enormous amounts of data in a limited number of experiments. The investigators then try to hypothesize about the root cause based on the observed trends for the predictors and the response variable. This involves identifying the discriminatory predictors that are most responsible for explaining variation in the response variable. In the current work, we investigated three related multivariate techniques: Principal Component Regression (PCR), Partial Least Squares or Projections to Latent Structures (PLS), and Orthogonal Partial Least Squares (OPLS). To perform a comparative analysis, we used a publicly available dataset for Parkinson’s disease patients. We first performed the analysis using a cross-validated number of principal components for the aforementioned techniques. Our results demonstrated that PLS and OPLS were better suited than PCR for identifying the discriminatory predictors. Since the X data did not exhibit a strong correlation, we also performed Multiple Linear Regression (MLR) on the dataset. A comparison of the top five discriminatory predictors identified by the four techniques showed a substantial overlap between the results obtained by PLS, OPLS, and MLR, and these three techniques diverged significantly from the variables identified by PCR. A further investigation of the data revealed that PCR could be used to identify the discriminatory variables successfully if the number of principal components in the regression model were increased. In summary, we recommend using PLS or OPLS for hypothesis generation and systemizing the selection process for principal components when using PCR.
ARTICLE | doi:10.20944/preprints202310.1665.v1
Subject: Environmental And Earth Sciences, Remote Sensing Keywords: Unmanned Aerial Systems (UAS); Urea Deep Placement (UDP); Linear Regression; Plot Scale; Field Scale; Crop Health; NDVI; OSAVI; Jenks Natural Breaks Classification
Online: 26 October 2023 (04:58:59 CEST)
The following three objectives were tested in this study: (1) investigate the utility of low-altitude remote sensing using UAS technology to compare the effects of different N application systems in rice production; (2) use spatial extrapolation to scale up plot-level rice yield data to the farmer field level based on crop spectral signatures; and (3) predict and map rice productivity as a function of N placement systems. Images were captured on a UAV platform at midseason of the rice crop. Orthomosaics were developed for selected fields in rice-producing zones. Grain yields were assessed from low, medium, and high crop health plots delineated based on NDVI values. On the plot scale, UDP outyielded non-UDP by 0.84%. Individual plot yield data were scaled up to the farmer field level through Jenks natural breaks classification and by establishing an empirical relationship between OSAVI and plot yields. Assessment of the scaled-up field-level data also confirmed the superiority of UDP N management over the non-UDP systems in promoting rice yields. Scaling up plot-scale data to whole-field levels also facilitated generating expected yield maps for individual farmer fields in the three zones studied. This study has established a simple but tangible protocol for predicting and mapping rice yields in small-scale farmer fields using UAS data.
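The two vegetation indices used in this workflow, and the kind of empirical index-to-yield relationship used for scaling up, can be sketched as follows. NDVI is the standard normalised difference of near-infrared and red reflectance. OSAVI is written here in its common form with a soil adjustment of 0.16 (some implementations multiply the result by 1.16). The reflectances, plot yields and the straight-line fit are hypothetical stand-ins, not the study's data or model.

```python
def ndvi(nir, red):
    """Normalised Difference Vegetation Index from NIR and red reflectance."""
    return (nir - red) / (nir + red)


def osavi(nir, red, soil_adjust=0.16):
    """Optimised Soil-Adjusted Vegetation Index (common form, adjustment 0.16)."""
    return (nir - red) / (nir + red + soil_adjust)


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical plot-level OSAVI values and measured yields (t/ha).
plot_osavi = [0.2, 0.4, 0.6]
plot_yield = [2.0, 3.0, 4.0]
slope, intercept = fit_line(plot_osavi, plot_yield)

# Apply the plot-level relationship to one field pixel (hypothetical reflectances).
pixel_yield = slope * osavi(nir=0.5, red=0.1) + intercept
print(ndvi(0.5, 0.1), osavi(0.5, 0.1), pixel_yield)
```

Applying the fitted line to every pixel of an orthomosaic would give a field-level yield map, which is the scaling-up step the abstract describes.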
ARTICLE | doi:10.20944/preprints201907.0351.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: evaporation; meteorological parameters; Gaussian process regression; support vector regression; machine learning modeling; hydrology; prediction; data science; hydroinformatics
Online: 31 July 2019 (10:58:29 CEST)
Evaporation is one of the main processes in the hydrological cycle, and it is one of the most critical factors in agricultural, hydrological, and meteorological studies. Due to the interactions of multiple climatic factors, evaporation is a complex and nonlinear phenomenon; therefore, data-based methods can be used to estimate it precisely. In the present study, Gaussian Process Regression (GPR), Nearest-Neighbor (IBK), Random Forest (RF) and Support Vector Regression (SVR) were used to estimate the pan evaporation (PE) at the meteorological stations of Golestan Province, Iran. For this purpose, meteorological data including PE, temperature (T), relative humidity (RH), wind speed (W) and sunny hours (S) were collected from the Gonbad-e Kavus, Gorgan and Bandar Torkman stations from 2011 through 2017. The accuracy of the studied methods was determined using the statistical indices of Root Mean Squared Error (RMSE), correlation coefficient (R) and Mean Absolute Error (MAE). Furthermore, Taylor charts were utilized for evaluating the accuracy of the mentioned models. The results indicate that, for the Gonbad-e Kavus, Gorgan and Bandar Torkman stations respectively, the error values were 1.521, 1.244, and 1.254 for GPR; 1.991, 1.775, and 1.577 for IBK; 1.614, 1.337, and 1.316 for RF; and 1.55, 1.262, and 1.275 for SVR. It was found that GPR for the Gonbad-e Kavus station with input parameters of T, W and S, and GPR for the Gorgan and Bandar Torkman stations with input parameters of T, RH, W, and S, had the most accurate performances and are proposed for precise estimation of PE.
Given the high rate of evaporation in Iran and the lack of measurement instruments, the findings of the current study indicate that PE values might be estimated accurately with a few easily measured meteorological parameters.
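The three accuracy indices named in the abstract above (RMSE, R and MAE) can be sketched as follows. This is a minimal illustration, not the authors' code; the function names and the sample values are assumptions made up for the example.

```python
import math

def rmse(observed, predicted):
    """Root mean squared error between two equal-length sequences."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

def mae(observed, predicted):
    """Mean absolute error between two equal-length sequences."""
    n = len(observed)
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / n

def pearson_r(observed, predicted):
    """Pearson correlation coefficient between observations and predictions."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    spread_o = math.sqrt(sum((o - mean_o) ** 2 for o in observed))
    spread_p = math.sqrt(sum((p - mean_p) ** 2 for p in predicted))
    return cov / (spread_o * spread_p)

# Hypothetical pan evaporation values (mm/day), for illustration only.
observed = [4.0, 6.0, 8.0, 10.0]
predicted = [5.0, 5.0, 9.0, 9.0]

print(rmse(observed, predicted), mae(observed, predicted), pearson_r(observed, predicted))
```

RMSE penalizes large errors more heavily than MAE, while R measures only linear association, so the three indices are usually read together rather than in isolation.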
ARTICLE | doi:10.20944/preprints202307.1405.v1
Subject: Engineering, Chemical Engineering Keywords: neural network regression; wastewater quality; spectral reflectance
Online: 20 July 2023 (10:44:00 CEST)
Wastewater (WW) analysis is a critical step in various operations such as control of a WW treatment facility, and speeding-up the analysis of WW quality can significantly improve such operations. This work demonstrates the capability of neural network (NN) regression models to estimate WW characteristic properties such as biochemical oxygen demand (BOD), chemical oxygen demand (COD), ammonia (NH3-N), total dissolved substances (TDS), total alkalinity (TA), and total hardness (TH) by training on WW spectral reflectance in the visible to near-infrared spectrum (400nm-2000nm). The dataset contains samples of spectral reflectance intensity, which were the inputs, and the WW parameter levels (BOD, COD, NH3-N, TDS, TA, and TH), which were the outputs. Various NN model configurations were evaluated in terms of regression model fitness. The mean-absolute-error (MAE) was used as the metric for training and testing the NN models, and the coefficient of determination (R2) between the model predictions and true values was also computed to measure how well the NN models predict the true values. With online spectral measurements, the trained neural network model can provide non-contact and real-time estimation of WW quality at minimum estimation error.
ARTICLE | doi:10.20944/preprints202305.1678.v1
Subject: Business, Economics And Management, Economics Keywords: Europe; Income Distribution; Relative Distribution; RIF-regression
Online: 24 May 2023 (03:34:42 CEST)
The issue of polarization, as opposed to inequality, has been little explored for European countries. This paper, using harmonized data produced by the Luxembourg Income Study Database, observes income trends for 12 European countries, showing an increase in polarization in many of the countries considered. The drivers that led to this concentration of income are also analyzed, noting heterogeneous factors within countries.
ARTICLE | doi:10.20944/preprints202305.0792.v1
Subject: Business, Economics And Management, Business And Management Keywords: Baltic Dry Index; Covid-19; Stepwise Regression
Online: 11 May 2023 (05:11:46 CEST)
The outbreak of COVID-19 in 2020 caused significant disruptions to global shipping and the world economy. This paper aims to investigate the impact of the pandemic on global shipping by analyzing the Baltic Dry Index (BDI). The BDI is a metric that reflects worldwide shipping costs and is directly related to supply and demand conditions, making it an indicator of economic production. The study utilizes data from 2019 to 2021, before and after the outbreak of COVID-19, and considers 13 independent variables, including raw materials, energy, stock market indexes, global port calls, and confirmed COVID-19 cases, to investigate how they influence the BDI. The study employs stepwise regression to select variables and build models before and after the pandemic. The findings reveal that the key factors affecting the BDI before the outbreak were international scrap steel prices, iron ore prices, and the Commodity Research Bureau Index. However, after the COVID-19 outbreak, the factors affecting the BDI changed to the Shanghai Index, global port calls, and the number of confirmed COVID-19 cases.
ARTICLE | doi:10.20944/preprints202205.0417.v1
Subject: Medicine And Pharmacology, Epidemiology And Infectious Diseases Keywords: COVID-19; Eswatini; risk mapping; Poisson regression
Online: 31 May 2022 (11:04:12 CEST)
COVID-19 national spikes have been reported at varying temporal scales as a result of differences in the driving factors. Factors affecting case load and mortality rates have varied between countries and regions. We investigated the association between various socio-economic, demographic and health variables and the spread of COVID-19 cases in Eswatini using the maximum likelihood estimation method for count data. A generalized Poisson regression (GPR) model was fitted with data comprising fifteen covariates to predict COVID-19 risk in Eswatini. The results showed that the key determinants of the spread of the disease included the proportion of elderly above 55 years at 98% (95% CI: 97%-99%) and the proportion of youth below 35 years at 0.08% (95% CI: 0.017%-38%), with a pseudo R-square of 0.72. However, in the early phase of the virus, when cases were fewer, results from the Poisson regression showed that household size, household density and poverty index were associated with COVID-19. We produced a risk map of predicted COVID-19 in Eswatini using the variables that were selected at the 5% significance level. The map could be used by the country to plan and prioritize health interventions against COVID-19. The identified areas of high risk may be further investigated in order to find the risk amplifiers and assess what could be done to prevent them.
ARTICLE | doi:10.20944/preprints202107.0139.v1
Subject: Business, Economics And Management, Accounting And Taxation Keywords: circularity; waste streams; circular approaches; regression equation
Online: 6 July 2021 (11:40:19 CEST)
In this paper, the authors identified key elements important for circularity. (1) Background: The primary goal of circularity is to eliminate waste and to ensure the constant use of resources. In the paper, we classify studies according to circular approaches. The authors identified the main elements and classified them into categories important for circularity, starting with managing and reducing waste and the recovery of resources, and ending with the circularity of material and general circularity-related topics, and presented scientific works dedicated to each of these categories. The authors analyzed several core elements from the first category, aiming to investigate and connect different waste streams, and provided a regression model; (2) Methods: The authors used a dynamic regression model to identify relationships among variables and selected those that have an impact on the increase of biowaste. The research was conducted for the 27 European Union countries during the period between 2020 and 2019; (3) Conclusions: The authors indicated that the recycling rate of waste electrical equipment in the previous year has an impact on the increase of biowaste recycling in the following year. This is explained by the fact that non-metallic spare parts of electronic equipment are used as biowaste for fuel production. Because separating the composites of electric equipment takes some time, the effect becomes evident after about one year on average.
ARTICLE | doi:10.20944/preprints202012.0321.v1
Subject: Environmental And Earth Sciences, Atmospheric Science And Meteorology Keywords: quantile regression; groundwater; environmental; multivariate; metals; health
Online: 14 December 2020 (10:13:09 CET)
One of the most important defining characteristics of groundwater quality is pH, as it fundamentally controls the amount and chemical form of many organic and inorganic solutes in groundwater. Groundwater data are frequently characterized by wide variability in the factors that may influence pH distribution. For this reason, it is challenging to link the spatio-temporal dynamics of pH to a single environmental factor using ordinary least squares regression of the conditional mean. In this study, quantile regression was used to estimate the response of pH to nine environmental factors (As, Cd, Fe, Mn, Pb, turbidity, electrical conductivity, total dissolved solids and nitrates). Results of 25%, 50%, and 75% quantile regression and ordinary least squares (OLS) regression were compared. The standard regression of the conditional mean (OLS) underestimated the rates of change of pH due to the selected factors in comparison with the regression quantiles. The effect of arsenic increased for sampling locations with higher pH values (higher quantiles), as did the influence of Pb and Mn. However, the effects of Cd and Fe decreased for sampling locations in higher quantiles. It can be concluded that these heterogeneities would have been missed if this study had focused exclusively on the conditional mean of the pH values. Consequently, quantile regression provides a more comprehensive account of possible spatio-temporal relationships between environmental covariates in groundwater. This study is one of the first to apply this technique to groundwater systems in sub-Saharan Africa. The approach has broad application to other mining environments, especially in tropical low-income countries where climatic conditions can drive rapid cycling or transformations of pollutants.
It is also pertinent to geopolitical contexts where regulatory, monitoring and management capacities are weak and where mining pollution of groundwater largely occurs.
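To make the contrast between quantile regression and regression of the conditional mean concrete, the sketch below shows the pinball (check) loss that quantile regression minimizes, in the simplest covariate-free case. The function names and pH readings are hypothetical and are not taken from the study; with covariates, the constant would be replaced by a linear predictor.

```python
def pinball_loss(values, prediction, tau):
    """Average pinball (check) loss of a constant prediction at quantile level tau."""
    total = 0.0
    for value in values:
        residual = value - prediction
        # Under-predictions are weighted by tau, over-predictions by (1 - tau).
        total += tau * residual if residual >= 0 else (tau - 1) * residual
    return total / len(values)

def best_constant(values, tau):
    """Sample value minimizing the pinball loss, i.e. an empirical tau-quantile."""
    return min(sorted(values), key=lambda c: pinball_loss(values, c, tau))

# Hypothetical pH readings, for illustration only.
ph = [5.1, 5.8, 6.2, 6.5, 6.9, 7.4, 8.0]

q25 = best_constant(ph, 0.25)
q50 = best_constant(ph, 0.50)
q75 = best_constant(ph, 0.75)
print(q25, q50, q75)
```

Because the loss weights positive and negative residuals asymmetrically, each quantile level can respond differently to a covariate, which is how effects that differ between low-pH and high-pH locations become visible.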
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Crime prediction; Ensemble Learning; Machine Learning; Regression
Online: 14 September 2020 (00:53:30 CEST)
While the use of crime data has been widely advocated in the literature, its availability is often limited to large urban cities and isolated databases tend not to allow for spatial comparisons. This paper presents an efficient machine learning framework capable of predicting spatial crime occurrences, without using past crime as a predictor, and at a relatively high resolution: the U.S. Census Block Group level. The proposed framework is based on an in-depth multidisciplinary literature review allowing the selection of 188 best-fit crime predictors from socio-economic, demographic, spatial, and environmental data. Such data are published periodically for the entire United States. The selection of the appropriate predictive model was made through a comparative study of different machine learning families of algorithms, including generalized linear models, deep learning, and ensemble learning. The gradient boosting model was found to yield the most accurate predictions for violent crimes, property crimes, motor vehicle thefts, vandalism, and the total count of crimes. Extensive experiments on real-world datasets of crimes reported in 11 U.S. cities demonstrated that the proposed framework achieves an accuracy of 73 and 77% when predicting property crimes and violent crimes, respectively.
REVIEW | doi:10.20944/preprints201910.0362.v1
Subject: Computer Science And Mathematics, Probability And Statistics Keywords: tuberculosis (TB); human immunodeficiency virus (HIV); Acquired Immune Deficiency Syndrome (AIDS); World Health Organization (WHO); panel data; poisson; negative binomial; regression
Online: 31 October 2019 (04:33:45 CET)
Tuberculosis is a major cause of death worldwide and the leading cause from a single infectious agent, ranking above Human Immunodeficiency Virus (HIV) and Acquired Immune Deficiency Syndrome (AIDS). The aim of this study is to ascertain the trend of tuberculosis prevalence and the effect of HIV prevalence on tuberculosis cases in some West African countries from 2000 to 2016 using count panel data regression models. The data used were annual HIV and tuberculosis cases spanning 2000 to 2016, extracted from online publications of the World Health Organization (WHO). Panel Poisson regression and negative binomial regression models with fixed and random effects were used to analyze the count data. The results revealed a positive trend in TB cases, and an increase in HIV cases leads to an increase in TB cases in West African countries. Among the competing models used in this study, the panel negative binomial regression model with fixed effects emerged as the best model, with a log-likelihood value of -1336.554. This study recommends that governments and NGOs develop more strategies to fight the HIV menace in West Africa, as this will in turn reduce TB cases in West Africa.
REVIEW | doi:10.20944/preprints202111.0310.v1
Subject: Computer Science And Mathematics, Probability And Statistics Keywords: Functional Data Analysis (FDA); Hybrid Data; Semi-Functional Partial Linear Regression Model (SFPLR); Partial Functional Linear Regression; Literature Review
Online: 17 November 2021 (15:21:19 CET)
Background: In functional data analysis (FDA), hybrid or mixed data consist of scalar and functional datasets. The semi-functional partial linear regression model (SFPLR) is one of the first semiparametric models for a scalar response with hybrid covariates. Various extensions of this model are explored and summarized. Methods: The first two research articles, “semi-functional partial linear regression model” and “Partial functional linear regression”, have more than 300 citations in Google Scholar. Finally, only 106 articles remained according to the inclusion and exclusion criteria: 1) including articles published in ISI journals, and excluding 2) non-English articles and 3) preprints, slides, and conference papers. We use the PRISMA standard for systematic review. Results: The articles are categorized into the following main topics: estimation procedures, confidence regions, time series and panel data, Bayesian, spatial, robust, testing, quantile regression, varying coefficient models, variable selection, single-index model, measurement error, multiple functions, missing values, rank method and others. There are different applications and datasets, such as the Tecator dataset, air quality, electricity consumption, and neuroimaging, among others. Conclusions: SFPLR is one of the best-known regression modeling methods for hybrid data and has many extensions.
ARTICLE | doi:10.20944/preprints202312.0092.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Active Learning; Design of experiments; Regression; s-PGD
Online: 1 December 2023 (15:04:37 CET)
Machine learning approaches are currently used to understand or model complex physical systems. In general, a substantial number of samples must be collected to create a model with reliable results. However, collecting large amounts of data is often time-consuming or expensive. Moreover, problems of industrial interest tend to be increasingly complex and to depend on a large number of parameters. High-dimensional problems intrinsically require large amounts of data because of the curse of dimensionality. For this reason, new approaches based on smart sampling techniques, such as Active Learning methods, are investigated to minimize the number of samples needed to train the model. Here, we propose a technique based on a combination of the Fisher information matrix and Sparse Proper Generalized Decomposition that enables the definition of a new Active Learning informativeness criterion in high dimensions. We provide examples demonstrating the performance of this technique on a theoretical 5D polynomial function and on an industrial crash simulation application. The results show that the proposed strategy outperforms the usual ones.
ARTICLE | doi:10.20944/preprints202311.1435.v1
Subject: Business, Economics And Management, Finance Keywords: Exchange Rate Volatility; Exports; NARDL; Smooth Threshold Regression
Online: 22 November 2023 (13:48:53 CET)
This research paper aimed to examine the impact of exchange rate volatility on South Africa's exports from 1994 Q1 to 2023 Q2. The study used the Augmented Dickey-Fuller (ADF) and Phillips-Perron (PP) tests to test for stationarity. The nonlinear autoregressive distributed lag (NARDL) model and smooth threshold regression (STR) are employed to analyse the relationship between exchange rate volatility and exports. The GARCH(1,1) technique is used to construct the exchange rate volatility data. The results of the stationarity tests reveal that the variables are integrated of order either I(0) or I(1). This implies that the variables used in this study are stationary, which is crucial for conducting accurate analyses. Moreover, the NARDL test approach provided insights into the long-run effects of exchange rate volatility on South Africa's exports. Based on the NARDL test, positive shocks have a greater but statistically insignificant effect on exports than negative shocks. Therefore, a greater level of exchange rate volatility may lead to increased exports from South Africa. Furthermore, the STR also reveals that the impact of exchange rate volatility is insignificant. These findings provide valuable insights for policymakers and firms to make informed decisions regarding exchange rate management and export strategies in South Africa.
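The abstract above builds its volatility series with GARCH(1,1). The sketch below shows only the conditional variance recursion, sigma2[t] = omega + alpha * r[t-1]^2 + beta * sigma2[t-1], for zero-mean returns. The parameter values and returns are hypothetical; in practice omega, alpha and beta are estimated by maximum likelihood, which this sketch does not do.

```python
def garch11_variance(returns, omega, alpha, beta):
    """Conditional variance series of a GARCH(1,1) model for zero-mean returns.

    The recursion is started at the unconditional variance
    omega / (1 - alpha - beta), which requires alpha + beta < 1.
    """
    if alpha + beta >= 1:
        raise ValueError("alpha + beta must be below 1 for a finite unconditional variance")
    sigma2 = [omega / (1.0 - alpha - beta)]
    for previous_return in returns[:-1]:
        sigma2.append(omega + alpha * previous_return ** 2 + beta * sigma2[-1])
    return sigma2

# Hypothetical exchange rate returns and parameters, for illustration only.
returns = [0.01, -0.02, 0.015, -0.03, 0.005]
sigma2 = garch11_variance(returns, omega=1e-5, alpha=0.1, beta=0.8)
volatility = [v ** 0.5 for v in sigma2]
print(volatility)
```

The square root of each conditional variance gives a volatility value per period, which is the kind of series that could then enter an export regression as an explanatory variable.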
REVIEW | doi:10.20944/preprints202310.1913.v3
Subject: Engineering, Civil Engineering Keywords: Solar PV system; Regression Model; DOE; Solar energy; Fossil fuels
Online: 9 November 2023 (10:58:47 CET)
The negative environmental impacts and other problems associated with fossil fuels have forced many countries to investigate and switch to environmentally friendly, renewable alternatives to sustain the increasing energy demand. Solar energy is one of the best renewable energy sources with the least negative impacts on the environment. Different countries have formulated solar energy policies to reduce dependence on fossil fuels and increase domestic energy production from solar energy. According to the 2010 BP Statistical Energy Survey, the world cumulative installed solar energy capacity was 22928.9 MW in 2009, a change of 46.9% compared to 2008. In this study, a PV generation system has been modeled and installed considering uncertain weather, based on the hourly wind speed data of New York City (NYC) for the year 2014. Regression models have been used to forecast the hourly, weekly, and monthly wind speed of NYC for 2014. Design of experiments (DOE) has been used to determine the optimal panel size (area), the battery capacity, and the levels of other factors.
ARTICLE | doi:10.20944/preprints202308.0823.v1
Subject: Engineering, Bioengineering Keywords: chicken egg fertility; classification; PLS regression; hyperspectral imaging
Online: 10 August 2023 (08:59:12 CEST)
Partial least squares (PLS) regression is a well-known chemometric method used for predictive modelling, especially in the presence of many variables. Although PLS was not initially developed as a technique for classification tasks, scientists have reportedly used this approach successfully for discrimination purposes. Whereas some unsupervised learning approaches, including but not limited to PCA and k-means clustering, do well in identifying and understanding grouping and clustering patterns in multidimensional data, they are limited when the end target is discrimination, making PLS a preferable alternative. Hyperspectral imaging data from a total of 672 chicken eggs, consisting of 336 white eggs and 336 brown eggs, were used in this study. Hyperspectral images in the NIR region (900-1700 nm wavelength range) were captured prior to incubation on day 0 and on days 1-4 after incubation. Eggs were candled on incubation day 5 and broken out on day 10 to confirm fertility. While 312 and 314 eggs were found to be fertile in the brown and white egg batches respectively, the numbers of non-fertile eggs in the same batches were 23 and 21 respectively. Spectral information was extracted from a segmented region of interest (ROI) of each hyperspectral image, and spectral transmission characteristics were obtained by averaging the spectral information. A moving-thresholding technique was implemented for discrimination based on PLS regression results on the calibration set. With true positive rates (TPR) of up to 100% obtained at selected threshold values between 0.50 and 0.85 and on different days of incubation, the results indicated that the proposed PLS technique can accurately discriminate between fertile and non-fertile eggs. The adaptive PLS approach was thereby presented as suitable for handling hyperspectral imaging-based chicken egg fertility data.
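The abstract above does not spell out its moving-thresholding procedure, so the following is only one plausible reading: continuous PLS regression outputs are compared against a sweep of cut-off values, and the true positive rate is recorded at each. The scores, labels and function names are hypothetical.

```python
def true_positive_rate(scores, labels, threshold):
    """Fraction of fertile eggs (label 1) whose score is at or above the threshold."""
    fertile_scores = [s for s, label in zip(scores, labels) if label == 1]
    if not fertile_scores:
        raise ValueError("at least one positive (fertile) sample is required")
    detected = sum(1 for s in fertile_scores if s >= threshold)
    return detected / len(fertile_scores)

def sweep_thresholds(scores, labels, thresholds):
    """True positive rate at each candidate threshold."""
    return {t: true_positive_rate(scores, labels, t) for t in thresholds}

# Hypothetical PLS regression outputs (1 = fertile, 0 = non-fertile), for illustration only.
scores = [0.95, 0.88, 0.72, 0.60, 0.40, 0.55]
labels = [1, 1, 1, 1, 0, 0]
thresholds = [0.50, 0.65, 0.85]

tpr_by_threshold = sweep_thresholds(scores, labels, thresholds)
print(tpr_by_threshold)
```

A low threshold maximizes the true positive rate but also admits more non-fertile eggs, so in a class-imbalanced set like this one the true negative rate would normally be tracked alongside it.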
ARTICLE | doi:10.20944/preprints202211.0227.v1
Subject: Medicine And Pharmacology, Orthopedics And Sports Medicine Keywords: Bayesian; cardiovascular disease; CVD; cross-sectional; logistic regression
Online: 14 November 2022 (01:55:06 CET)
Background: Cardiovascular disease (CVD) has been one of the leading causes of death and disability-adjusted life years lost worldwide. Blood pressure, lipid, and cholesterol levels are good predictors of CVD risk and vary with age and physical fitness. However, few studies have explored how CVD risk factors vary with age and muscle strength across different populations. Objective: To analyze the variation in blood-related CVD risk factors according to age and relative grip strength among different populations. Method: 25363 participants were recruited in this cross-sectional study and 24709 were included in the analysis. Logistic regression and a Bayesian probabilistic analysis based on Markov Chain Monte Carlo (MCMC) modeling were conducted to build probability prediction models of hypertension, hyperlipidemia, and hypercholesterolemia according to age, relative grip strength, body weight conditions, and physical activity levels. Results: 1) Age might be the main influencing factor for hypertension, which is regarded as one of the primary CVD risk factors. Although a high level of physical activity might help prevent hypertension, since individuals with normal body weight and higher physical activity show a lower probability of being diagnosed with hypertension, it might not prevent individuals from developing hypertension with age. 2) After age 60, individuals of normal body weight seem more likely to have hyperlipidemia than those who are overweight or obese. 3) Larger relative grip strength might not be able to offset the negative effects of obesity, overweight and physical inactivity on hyperlipidemia. 4) The probability of hypercholesterolemia varies less with age and relative grip strength. Conclusion: Body weight management and keeping high levels of physical activity are recommended at any age. Gaining some body weight after 60 years of age might be beneficial.
REVIEW | doi:10.20944/preprints202210.0391.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Tillage; Traction; Compaction; Neural networks; Support vector regression
Online: 26 October 2022 (02:07:19 CEST)
Soil working tools, implements, and machines are inevitable in mechanized agriculture. The soil-tool/machine interaction is a multivariate, dynamic, and intricate process. The accurate interpretation, description, and modeling of soil-machine interaction is key to providing a solution for sustainable crop production by reducing energy input, excessive soil pulverization, and compaction. Traditional methods provide insight into soil-machine interaction but often provide inadequate solutions and lack broad applicability. Computational intelligence (CI) is a comprehensive class of approaches that rely on approximate information to solve complex problems. CI methods have been extensively studied and applied in the soil tillage and traction domain in recent decades. This study critically reviews the CI techniques implemented in soil-machine interactions, especially in the context of tillage, traction, and compaction. The traditional methods and their limitations are discussed. The fundamentals of CI methods and a detailed overview of the most popular methods are provided. The study reviews and summarizes 50 selected articles on soil-machine interaction studies where CI methods were employed, and discusses the strengths and limitations of the CI methods employed. Emerging CI methods and future applications are also discussed. The study would serve as a concise reference and a quick and systematic way to understand the applicable CI methods that support crucial farm management decision-making.
ARTICLE | doi:10.20944/preprints202106.0533.v1
Subject: Medicine And Pharmacology, Immunology And Allergy Keywords: COVID-19; Vaccine; Prediction; Regression; Ensemble learning; AdaBoost
Online: 22 June 2021 (08:30:30 CEST)
The novel coronavirus disease (COVID-19) has created immense threats to public health on various levels around the globe. The unpredictable outbreak of this disease and the pandemic situation are causing severe depression, anxiety and other mental as well as physical health problems. To combat this disease, vaccination is essential, as it boosts the immune system of people who come into contact with infected individuals. The vaccination process is thus necessary to confront the outbreak of COVID-19. This deadly disease has placed the social and economic condition of the entire world under enormous strain. Worldwide vaccination progress should be tracked to identify how fast economic and social life will be stabilized. To monitor vaccination progress, a machine learning based regressor model is proposed in this study. The tracking process has been applied to data from 14th December, 2020 to 24th April, 2021. Several ensemble-based machine learning regressor models, namely Random Forest, Extra Trees, Gradient Boosting, AdaBoost and Extreme Gradient Boosting, are implemented and their predictive performance is compared. The comparative study reveals that the AdaBoostRegressor performs best, with a minimized mean absolute error (MAE) of 9.968 and root mean squared error (RMSE) of 11.133.
Subject: Medicine And Pharmacology, Immunology And Allergy Keywords: Diagnosing designs; rare diseases; statistics; regression; block designs
Online: 2 June 2021 (12:14:34 CEST)
Far too often, one meets patients who went for years or even decades from doctor to doctor, without getting a valid diagnosis. This brings pain to millions of patients and their families, not to speak of the enormous costs. Often patients cannot tell precisely enough which factors (or combinations thereof) trigger their problems. If conventional methods fail, we propose the use of statistics and algebra to give doctors much more useful inputs from patients. We use statistical regression for independent triggering factors for medical problems, and “balanced incomplete block designs” for non-independent factors. These methods can supply doctors with much more valuable inputs, and can also detect combinations of multiple factors by incredibly few tests. In order to show that these methods do work, we briefly describe a case in which these methods helped to solve a 60 year old problem in a patient, and give some more examples where these methods might be very useful. As a conclusion, while regression is used in clinical medicine, it seems to be widely unknown in diagnosing. Statistics and algebra can save the health systems much money, and the patients also a lot of pain.
ARTICLE | doi:10.20944/preprints202103.0586.v1
Subject: Environmental And Earth Sciences, Atmospheric Science And Meteorology Keywords: NVOC; phytoncide; bamboo grove; monoterpene; microclimate; regression analysis
Online: 24 March 2021 (13:10:25 CET)
After the COVID-19 outbreak, more and more people have been seeking physiological and psychological healing by visiting forests as stay-at-home periods grew longer. NVOC, a major healing factor of forests, has several positive effects on human health, and this study investigated the NVOC characteristics of bamboo groves. The study revealed that α-pinene, 3-carene, and camphene were the most emitted compounds, and that the largest amounts of NVOC were emitted in the early morning and late afternoon in bamboo groves. Furthermore, NVOC emission was found to have positive correlations with temperature and humidity, and inverse correlations with solar radiation, PAR and wind speed. A regression analysis conducted to predict the effect of microclimate factors on NVOC emissions resulted in a regression equation with 82.9% explanatory power and found that PAR, temperature, and humidity had a significant effect on NVOC emission prediction. In conclusion, this study investigated the NVOC emission characteristics of bamboo groves, examined the relationship between NVOC emissions and microclimate factors, and derived a prediction equation for NVOC emissions to assess the forest healing effects of bamboo groves. These results are expected to provide a basis for establishing more effective forest healing programs in bamboo groves.
ARTICLE | doi:10.20944/preprints202008.0329.v2
Subject: Medicine And Pharmacology, Epidemiology And Infectious Diseases Keywords: COVID-19; Geospatial Regression; Health Disparities; Public Health
Online: 11 September 2020 (09:48:57 CEST)
COVID-19 is a potentially fatal viral infection. This study investigates geography, demography, socioeconomics, health conditions, hospital characteristics, and politics as potential explanatory variables for death rates at the state and county levels. Data from the Centers for Disease Control and Prevention, the Census Bureau, Centers for Medicare and Medicaid, Definitive Healthcare, and USAfacts.org were used to evaluate regression models. Yearly pneumonia and flu death rates (state level, 2014-2018) were evaluated as a function of the governors’ political party using repeated measures analysis. At the state and county level, spatial regression models were evaluated. At the county level, we discovered a statistically significant model that included geography, population density, racial and ethnic status, three health status variables along with a political factor. State level analysis identified health status, minority status, and the interaction between governors’ parties and health status as important variables. The political factor, however, did not appear in a subsequent analysis of 2014-2018 pneumonia and flu death rates. The pathogenesis of COVID-19 has greater and disproportionate effect within racial and ethnic minority groups, and the political influence on the reporting of COVID-19 mortality was statistically relevant at the county level and as an interaction term only at the state level.
ARTICLE | doi:10.20944/preprints201906.0291.v1
Subject: Medicine And Pharmacology, Internal Medicine Keywords: endothelial disorders; glycocalyx injury; syndecan-1; nonlinear regression
Online: 28 June 2019 (07:42:18 CEST)
Endothelial disorders are related to various diseases. An initial endothelial injury is characterized by endothelial glycocalyx injury. We aimed to evaluate endothelial glycocalyx injury by measuring serum syndecan-1 concentrations in patients during comprehensive medical examinations. A single-center, prospective, observational study was conducted at Asahi University Hospital. The participants enrolled in this study were 1313 patients who underwent comprehensive medical examinations at Asahi University Hospital from January 2018 to June 2018. One patient undergoing hemodialysis was excluded from the study. At enrollment, blood samples were obtained, and study personnel collected demographic and clinical data. No treatments or exposures were conducted except for standard medical examinations and blood sample collection. Laboratory data were obtained from the blood samples collected at the time of study enrollment. According to nonlinear regression, the concentrations of serum syndecan-1 were significantly related to age (p = 0.016), aspartate aminotransferase concentration (AST, p = 0.020), blood urea nitrogen concentration (BUN, p = 0.013), triglyceride concentration (p < 0.001), and hematocrit (p = 0.006). These associations were independent. Endothelial glycocalyx injury, which is reflected by serum syndecan-1 concentrations, is related to age, hematocrit, AST concentration, BUN concentration, and triglyceride concentration.
ARTICLE | doi:10.20944/preprints201811.0096.v1
Subject: Computer Science And Mathematics, Information Systems Keywords: machine learning; stacking; forecasting; regression; sales; time series
Online: 5 November 2018 (09:54:54 CET)
In this paper, we study the use of machine learning models for sales time series forecasting. The effect of machine learning generalization is considered. A stacking approach for building a regression ensemble of single models is studied. The results show that stacking techniques can improve the performance of predictive models for sales time series forecasting.
ARTICLE | doi:10.20944/preprints201608.0025.v2
Subject: Environmental And Earth Sciences, Atmospheric Science And Meteorology Keywords: solar variability; NAO; ENSO; volcanic eruptions; multiple regression
Online: 17 May 2017 (06:27:16 CEST)
The influence of natural factors, mainly the eleven-year solar cycle and volcanic eruptions, on two major modes of climate variability, the North Atlantic Oscillation (NAO) and the El Niño Southern Oscillation (ENSO), is studied over roughly the last 150 years. The NAO is the primary factor regulating Central England Temperature (CET) during winter throughout the period, though the NAO is affected differently by other factors in different time periods. Solar variability shows a strong positive influence on the NAO during 1978-1997, but the opposite in the earlier period. The lagged solar-NAO relationship is also shown to be sensitive to the chosen reference period and thus points towards the previously proposed mechanism or relationship linking the sun and the NAO. The ENSO is influenced strongly by solar variability and volcanic eruptions in certain periods. This study finds a strong negative association between the sun and ENSO before the 1950s, which reverses during the second half of the 20th century. During 1978-1997, when two strong eruptions coincided with active years of strong solar cycles, ENSO and volcanic activity showed a stronger association, and we discuss the important role played by ENSO. That period showed warming in the central tropical Pacific and cooling in the North Atlantic relative to both the later period (1999-2017) and the chosen earlier period. Here we show that the mean atmospheric state is important for understanding the connection between solar variability, the NAO, and ENSO and the associated mechanisms. The study presents a critical analysis to improve knowledge of the major modes of variability and their role in climate. We also discuss the importance of detecting a robust signal of natural variability, mainly from the sun.
ARTICLE | doi:10.20944/preprints202304.1023.v1
Subject: Social Sciences, Safety Research Keywords: vehicle crash data; collision risk; ordinal logistic regression; multinomial logistic regression; proportional odds model (POM); partial proportional odds model (PPOM)
Online: 27 April 2023 (04:02:49 CEST)
The use of logistic regression models in data analysis and machine learning has expanded in recent years, and these models have become the primary choice of researchers in risk assessment studies across a wide range of scientific fields, from the assessment of credit risk in financial institutions to the estimation of risk factors for traffic accidents and the identification of etiological factors for chronic diseases. All logistic models are natural extensions of the simple binary model, and their interpretation is based on it. Using the data of a cross-sectional study on the risk factors of traffic collisions, the article presents in detail the two main extensions of the logistic model, multinomial and ordinal logistic regression. Emphasis is placed on ordinal regression, since the outcome variable of the collision data is defined as an ordinal measurement reflecting a latent continuous scale.
ARTICLE | doi:10.20944/preprints202011.0363.v1
Subject: Chemistry And Materials Science, Analytical Chemistry Keywords: cannabinoid receptor 1; synthetic cannabinoids; quantitative structure-activity relationship; multiple linear regression; partial least squares regression; dependence and abuse potential
Online: 13 November 2020 (07:19:36 CET)
In recent years, there have been frequent reports on the adverse effects of synthetic cannabinoid (SC) abuse. SCs cause psychoactive effects, similar to those caused by marijuana, by binding and activating cannabinoid receptor 1 (CB1R) in the central nervous system. The aim of this study was to establish a reliable quantitative structure-activity relationship (QSAR) model to correlate the structures and physicochemical properties of various SCs with their CB1R-binding affinities. We prepared 15 SCs and their derivatives (tetrahydrocannabinol [THC], naphthoylindoles, and cyclohexylphenols) and determined their binding affinity to CB1R, which is known as a dependence-related target. We calculated the molecular descriptors for dataset compounds using an R/CDK (R package integrated with CDK, version 3.5.0) toolkit to build QSAR regression models. These models were established and statistical evaluations were performed using the mlr and plsr packages in R software. The most reliable QSAR model was obtained from the partial least squares regression method via external validation. This model can be applied in vivo to predict the addictive properties of illicit new SCs. Using a limited number of dataset compounds and our own experimental activity data, we built a QSAR model for SCs with good predictability. This QSAR modeling approach provides a novel strategy for establishing an efficient tool to predict the abuse potential of various SCs and to control their illicit use.
ARTICLE | doi:10.20944/preprints202312.0131.v1
Subject: Engineering, Civil Engineering Keywords: epoxy resin; grout; creep; strength; permeability; porosity; regression analysis
Online: 5 December 2023 (06:08:16 CET)
The aim of this research was to undertake laboratory testing to investigate the beneficial effects of epoxy resin grouts on the physical and mechanical properties of sands with a wide range of granulometric characteristics. Six sands with different particle sizes and uniformity coefficients were grouted using epoxy resin solutions with three epoxy resin to water ratios (3.0, 2.0 and 1.5). Unconfined compressive strength tests were conducted on grouted samples at different curing periods, and long-term unconfined compressive creep tests in dry and wet conditions were carried out after 180 days of curing, in order to evaluate the development of the mechanical properties of the sands as well as the impact of water on them. The findings showed that the epoxy resin grouts produced appreciable strength in the specimens, especially those of fine sands. In general, after a curing period of 180 days the compressive strength varied between 0.68 and 5.60 MPa and the modulus of elasticity between 75 and 480 MPa. In terms of physical properties, the permeability and porosity were estimated before and after grouting. Grouts with an epoxy resin to water ratio of 3 decreased permeability by up to four orders of magnitude. Using the laboratory results and regression analysis, three mathematical equations were developed that relate each of the dependent variables (compressive strength, elastic modulus, and coefficient of permeability) to particular explanatory variables.
ARTICLE | doi:10.20944/preprints202311.0350.v1
Subject: Computer Science And Mathematics, Computer Vision And Graphics Keywords: 3D segmentation; feature extraction; regression machine learning; weight estimation
Online: 6 November 2023 (11:20:30 CET)
Accurate weight measurement is pivotal for monitoring the growth and well-being of cattle. However, the conventional weighing process, which involves physically placing cattle on scales, is labor-intensive and distressing for the animals. Hence, the development of automated cattle weight prediction techniques is critically important. This study proposes a weight prediction approach for Korean cattle that applies 3D segmentation-based feature extraction and regression machine learning techniques to incomplete 3D shapes acquired in real farm environments. In the initial phase, we generated mesh data of 3D Korean cattle shapes using a multiple-camera system. Subsequently, deep learning-based 3D segmentation with the PointNet network model was employed to segment two dominant parts of the cattle. From these segmented parts, three crucial dimensions of Korean cattle were extracted. Finally, we implemented five regression machine learning models (CatBoost regression, LightGBM, Polynomial regression, Random Forest regression, and XGBoost regression) for weight prediction. To validate our approach, we captured 270 Korean cattle in various poses, for a total of 1190 poses. The best result, a mean absolute error (MAE) of 25.2 kg and a mean absolute percent error (MAPE) of 5.81%, was achieved with the random forest regression model.
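The two error metrics reported above can be sketched as follows. This is a minimal illustration of the standard MAE and MAPE definitions, not the authors' evaluation code; the function names and the three example weights are hypothetical and do not come from the study.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference, in the units of the target (kg here)."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mean_absolute_percent_error(actual, predicted):
    """Average absolute difference relative to the true value, in percent."""
    return 100.0 * sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical weights in kg (illustrative only, not data from the study).
actual_kg = [400.0, 500.0, 625.0]
predicted_kg = [420.0, 475.0, 600.0]

mae = mean_absolute_error(actual_kg, predicted_kg)
mape = mean_absolute_percent_error(actual_kg, predicted_kg)
print(f"MAE = {mae:.2f} kg, MAPE = {mape:.2f}%")
```

MAE stays in kilograms, while MAPE normalizes each error by the animal's true weight, which is why the two figures are usually reported together.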
ARTICLE | doi:10.20944/preprints202309.0755.v1
Subject: Medicine And Pharmacology, Endocrinology And Metabolism Keywords: diabetes; CGM; hypoglycemia; hyperglycemia; prediction; ARIMA; logistic regression; LSTM
Online: 12 September 2023 (16:53:51 CEST)
Background: Novel technologies such as continuous glucose monitoring (CGM) systems are improving diabetes management by providing real-time sensor glucose levels, the retrospective glucose course, and trend arrows. CGM offers real-time alerts for (predicted) hypo- and hyperglycemia and for rapidly falling or rising glucose, thereby improving glycemia under unstable conditions such as meals, physical activity, and exercise. Complex CGM systems challenge people with diabetes and health care professionals in interpreting rapid changes, sensor delay (~10-minute difference between interstitial and plasma glucose), and malfunctions. Enhanced prediction models are necessary for optimal insulin dosing, daily activities, and especially for future fully closed-loop systems. Methods: The aim of this study was to investigate the efficacy of three predictive models for glucose responses, 1) an autoregressive integrated moving average (ARIMA) model, 2) logistic regression, and 3) long short-term memory (LSTM) networks, in predicting glucose levels 15 minutes and one hour ahead. We compared the performance of these models in predicting hypoglycemia (<70 mg/dL), euglycemia (70-180 mg/dL), and hyperglycemia (>180 mg/dL). Specifically, using metrics such as precision, recall, F1-score, and accuracy, we assessed which model provided the most accurate and reliable predictions of glucose levels. Results: As expected, ARIMA showed the worst accuracy, especially in predicting hypoglycemia within 1 hour (7.3%). For the 15-minute horizon, the logistic regression model predicted hypoglycemia more accurately (98%) than the LSTM (88%). For the 1-hour horizon, however, the LSTM model (87%) exceeded the logistic regression model (83%) in hypoglycemia prediction accuracy. The same pattern was observed for hyperglycemia: ARIMA (60%, 1 hour), logistic regression (96%, 15 minutes), and LSTM (85%, 1 hour). Conclusions: These findings suggest that different models have different strengths and weaknesses in predicting glucose levels, and the choice of model should be carefully considered based on the specific requirements and context of the clinical application. The logistic regression model was more accurate for the next 15 minutes, especially in predicting hypoglycemia. However, the LSTM model exceeded logistic regression for the one-hour prediction. Future research could explore hybrid models or ensemble approaches that combine the strengths of multiple models to further improve the accuracy and reliability of glucose predictions.
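The evaluation described above can be sketched as follows: glucose values are mapped to the three classes using the thresholds stated in the abstract, and per-class precision, recall, and F1 plus overall accuracy are computed. This is a minimal sketch, not the study's pipeline; the function names and the example readings are hypothetical.

```python
def glucose_class(mg_dl):
    """Map a glucose value (mg/dL) to a class using the abstract's thresholds."""
    if mg_dl < 70:
        return "hypoglycemia"
    if mg_dl <= 180:
        return "euglycemia"
    return "hyperglycemia"


def class_metrics(true_labels, predicted_labels, label):
    """Precision, recall and F1 for one class, treated as one-vs-rest."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def accuracy(true_labels, predicted_labels):
    """Share of samples whose predicted class matches the true class."""
    hits = sum(1 for t, p in zip(true_labels, predicted_labels) if t == p)
    return hits / len(true_labels)


# Hypothetical observed and forecast glucose values in mg/dL (illustrative only).
observed = [65, 90, 150, 200, 60, 185]
forecast = [72, 95, 140, 190, 66, 175]

true_labels = [glucose_class(v) for v in observed]
predicted_labels = [glucose_class(v) for v in forecast]

precision, recall, f1 = class_metrics(true_labels, predicted_labels, "hypoglycemia")
overall_accuracy = accuracy(true_labels, predicted_labels)
print(precision, recall, f1, overall_accuracy)
```

Because hypoglycemic readings are typically rare, per-class recall is often more informative than overall accuracy when comparing forecasting models on this task.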
ARTICLE | doi:10.20944/preprints202309.0302.v1
Subject: Computer Science And Mathematics, Robotics Keywords: stabilization; symbolic regression; synthesized control; evolutionary computations; quadcopter model
Online: 5 September 2023 (10:11:12 CEST)
The development of artificial intelligence systems assumes that a machine can independently generate an algorithm of actions or a control system to solve a given task. To do this, the machine must have a formal description of the problem and possess computational methods for solving it. The article deals with the optimal control problem, which is the main task in the development of control systems, since every system being developed must be optimal with respect to some criterion. However, there are certain difficulties in implementing the resulting optimal control modes. The paper considers an extended formulation of the optimal control problem, which requires the resulting systems to have the properties necessary for practical implementation. To solve it, an adaptive synthesized optimal control approach based on numerical machine learning methods is proposed. The method moves the control object by optimally changing the position of the stable equilibrium point in the presence of some uncertainty in the initial position. As a result, from all possible synthesized controls, it chooses one that is less sensitive to changes in the initial states. As an example, the optimal control problem of a quadcopter with complex phase constraints is considered. To solve this problem according to the proposed approach, the control synthesis problem is first solved by the machine learning method of symbolic regression to obtain a stable equilibrium point in the state space. Then the optimal positions of the stable equilibrium point are found by a particle swarm optimization algorithm according to the original functional of the optimal control problem. It is shown that this approach allows a computer to generate the control system automatically from the formal statement of the problem and then to implement it directly onboard, since a stabilization system is already built in.
ARTICLE | doi:10.20944/preprints202308.1978.v1
Subject: Biology And Life Sciences, Life Sciences Keywords: biomarker, LLM, interpretability, scRNA-seq, machine learning, symbolic regression
Online: 30 August 2023 (03:53:31 CEST)
Single-cell RNA sequencing (scRNA-seq) technology has significantly advanced our understanding of the diversity of cells and how this diversity is implicated in diseases. Yet, translating these findings across various scRNA-seq datasets poses challenges due to technical variability and dataset-specific biases. To overcome this, we present a novel approach that employs both an LLM-based framework and explainable machine learning to facilitate generalization across single-cell datasets and identify gene signatures to capture disease-driven transcriptional changes. Our approach uses scBERT, which harnesses shared transcriptomic features among cell types to establish consistent cell-type annotations across multiple scRNA-seq datasets. Additionally, we employ a symbolic regression algorithm to pinpoint highly relevant yet minimally redundant models and features for inferring a cell type’s disease state based on its transcriptomic profile. We ascertain the versatility of these cell-specific gene signatures across datasets, showcasing their resilience as molecular markers to pinpoint and characterize disease-associated cell types. Validation is carried out using four publicly available scRNA-seq datasets from both healthy individuals and those suffering from ulcerative colitis (UC). This demonstrates our approach’s efficacy in bridging disparities specific to different datasets, fostering comparative analyses. Notably, the simplicity and symbolic nature of the retrieved gene signatures facilitate their interpretability, allowing us to elucidate underlying molecular disease mechanisms using these models.
ARTICLE | doi:10.20944/preprints202308.0314.v1
Subject: Environmental And Earth Sciences, Atmospheric Science And Meteorology Keywords: Hail; Lightning; Climate change; Regression analysis; Trends; Reanalysis data
Online: 3 August 2023 (10:07:40 CEST)
We have developed additive logistic models for the occurrence of lightning, large (≥ 2 cm), and very large (≥ 5 cm) hail to investigate the evolution of these hazards in the past, in the future, and for forecasting applications. The models, trained with lightning observations, hail reports, and predictors from atmospheric reanalysis, assign an hourly probability to any location and time on a 0.25° × 0.25° × 1-hourly grid as a function of reanalysis-derived predictor parameters, selected following an ingredients-based approach. The resulting hail models outperform the Significant Hail Parameter, and the simulated climatological spatial distributions and annual cycles of lightning and hail are consistent with observations from storm report databases, radar, and lightning detection data. As a corollary result, CAPE released above the -10°C isotherm was found to be a more universally skilful predictor for large hail than CAPE. In the period 1950–2021, the models applied to the ERA5 reanalysis indicate significant increases of lightning and hail across most of Europe, primarily due to rising low-level moisture. The strongest modelled hail increases occur in northern Italy with increasing rapidity after 2010. Here, very large hail has become 3 times more likely than it was in the 1950s. Across North America trends are comparatively small, apart from isolated significant increases in the direct lee of the Rocky Mountains and across the Canadian Plains. In the southern Plains, a period of enhanced storm activity occurred in the 1980s and 1990s.
ARTICLE | doi:10.20944/preprints202307.0405.v1
Subject: Computer Science And Mathematics, Applied Mathematics Keywords: Jackknife; Kibria-Lukman; estimator; Maximum Likelihood; Negative Binomial regression
Online: 6 July 2023 (08:58:10 CEST)
The negative binomial regression model (NBRM) is a generalized linear model that relaxes the restrictive assumption of the Poisson regression model that the variance equals the mean. The parameters of the NBRM are estimated using the maximum likelihood (ML) method. The maximum likelihood estimator becomes unstable when the explanatory variables are linearly dependent, a situation known as multicollinearity. We therefore developed a new estimator, the modified jackknifed Negative Binomial Kibria-Lukman (MJNBKL) estimator, to address multicollinearity in the NBRM using four different biasing (shrinkage) parameters. We establish the conditions under which the MJNBKL estimator is superior to the existing ones. The performance of the MJNBKL estimator was assessed by comparing it with existing estimators through a Monte Carlo simulation study and two real-life application datasets. The results of the simulation and the real-life applications show that the MJNBKL estimator outperformed the other estimators, attaining the smallest MSE across all sample sizes and correlation levels for the four biasing parameters used, and that the third biasing parameter is the optimal shrinkage parameter, with the lowest MSE.
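The relaxed variance assumption mentioned above can be written out explicitly. The sketch below assumes the common NB2 parameterization with a log link and a dispersion parameter $\alpha$; the abstract does not state which parameterization the authors use, so the symbols $\mu_i$, $x_i$, $\beta$, and $\alpha$ are illustrative.

```latex
% Poisson regression: the variance is forced to equal the mean.
\mathbb{E}[y_i \mid x_i] = \mu_i = \exp(x_i^{\top}\beta), \qquad
\operatorname{Var}(y_i \mid x_i) = \mu_i

% Negative binomial (NB2, assumed here): overdispersion enters through alpha > 0.
\mathbb{E}[y_i \mid x_i] = \mu_i = \exp(x_i^{\top}\beta), \qquad
\operatorname{Var}(y_i \mid x_i) = \mu_i + \alpha\,\mu_i^{2}
```

As $\alpha \to 0$ the negative binomial variance reduces to the Poisson variance, so the Poisson model is recovered as a limiting case.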
REVIEW | doi:10.20944/preprints202303.0401.v1
Subject: Computer Science And Mathematics, Applied Mathematics Keywords: Strawman fallacy; UK General Medical Council; autism; regression; MMR
Online: 22 March 2023 (14:39:30 CET)
Background: Articles published in scholarly journals form part of the scientific evidence base. It is the responsibility of the scientific community to maintain its integrity. In 2011 the BMJ commissioned a feature article to draw attention to an article that had appeared in another journal, The Lancet, 13 years previously. The Lancet had already retracted the article. These actions exemplify the best traditions of scientific record-keeping. Objective: This submission examines whether the main claims summarized in the BMJ were factual. Method: We compare what was published in the Lancet with what was published in the BMJ and verify both against the findings in the GMC hearing transcripts and the verdict of the UK High Court. Results: The 6 points highlighted in the BMJ contained errors and need to be corrected. Conclusions: There are significant differences between what was reported in the Lancet paper and what the BMJ alleged it contained. This article aims only to point out errors in the BMJ article, to set the record straight. It does not show there was a causal association between MMR vaccination and autism.
ARTICLE | doi:10.20944/preprints202210.0078.v1
Subject: Medicine And Pharmacology, Obstetrics And Gynaecology Keywords: Africa; Maternal mortality rate; Joinpoint regression analysis; mortality; trends.
Online: 7 October 2022 (10:30:10 CEST)
Background: The United Nations Sustainable Development Goals state that by 2030 the global maternal mortality rate (MMR) should be lower than 70 per 100,000 live births. Maternal mortality is still one of Africa's leading causes of death among women. This research aims to study regional trends in maternal mortality in Africa. Methods: We extracted maternal mortality rates per 100,000 births from the UNICEF data bank for 2000 to 2017, 2017 being the latest year available. Joinpoint regression was used to study the trends and estimate the annual percent change (APC). Results: Maternal mortality decreased in Africa over the study period by an average APC of -3.0% (95% CI -3.2% to -2.9%). All regions showed significant downward trends, with the sharpest decreases in the South. Only the North African region is close to the United Nations Sustainable Development Goal target for maternal mortality. The remaining sub-Saharan African regions are still far from achieving the goal. Conclusions: Maternal mortality has decreased in Africa, especially in the southern region. The only region close to the United Nations target is North Africa. The remaining sub-Saharan African regions are still far from achieving the goal. These results could be used for the development of regional policies.
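The annual percent change reported above is conventionally derived from a log-linear fit within a trend segment: if ln(rate) = a + b·year, then APC = (e^b − 1) × 100. The sketch below covers only that single-segment calculation, not the search for joinpoints, and is not the authors' code; the function name and the synthetic rates are hypothetical.

```python
import math


def annual_percent_change(years, rates):
    """APC from an ordinary least-squares fit of ln(rate) on year (one segment)."""
    n = len(years)
    log_rates = [math.log(r) for r in rates]
    mean_year = sum(years) / n
    mean_log = sum(log_rates) / n
    # Slope of the least-squares line through (year, ln(rate)).
    sxy = sum((x - mean_year) * (y - mean_log) for x, y in zip(years, log_rates))
    sxx = sum((x - mean_year) ** 2 for x in years)
    slope = sxy / sxx
    return (math.exp(slope) - 1.0) * 100.0


# Synthetic series that falls by exactly 3% per year (illustrative only,
# not data from the study).
years = list(range(2000, 2018))
rates = [800.0 * 0.97 ** (year - 2000) for year in years]

apc = annual_percent_change(years, rates)
print(f"APC = {apc:.2f}% per year")
```

A full joinpoint analysis additionally tests whether the series is better described by several such segments with different slopes, and reports an APC for each segment.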
ARTICLE | doi:10.20944/preprints202209.0353.v1
Subject: Medicine And Pharmacology, Obstetrics And Gynaecology Keywords: Africa; Maternal mortality rate; Joinpoint regression analysis; mortality; trends
Online: 23 September 2022 (03:06:07 CEST)
Background: The United Nations Sustainable Development Goals state that by 2030 the global maternal mortality rate (MMR) should be lower than 70 per 100,000 live births. Maternal mortality is still one of Africa's leading causes of death among women. This research aims to study regional trends in maternal mortality in Africa. Methods: We extracted maternal mortality rates per 100,000 births from the World Bank database for 1990-2015. Joinpoint regression was used to study the trends and estimate the annual percent change (APC). Results: Maternal mortality decreased in Africa over the study period by an average APC of -2.6%. All regions showed significant downward trends, with the sharpest decreases in East Africa. Only the North African region is close to the United Nations Sustainable Development Goal target for maternal mortality. The remaining sub-Saharan African regions are still far from achieving the goal. Conclusions: Maternal mortality has decreased in Africa, especially in East Africa. The only region close to the United Nations target is North Africa. The remaining sub-Saharan African regions are still far from achieving the goal. These results could be used for the development of regional policies.
ARTICLE | doi:10.20944/preprints202208.0445.v1
Subject: Business, Economics And Management, Economics Keywords: Adult children's education; parental longevity; truncated regression; emotional support.
Online: 26 August 2022 (04:18:44 CEST)
Background: In some developing countries, such as China, the population is aging rapidly while residents' average years of schooling are steadily increasing. However, whether adult children’s education affects the longevity of older parents remains inadequately studied. Methods: This paper uses China Health and Retirement Longitudinal Survey (CHARLS) data to estimate the causal impact of adult children's education on their parents' longevity. Identification is achieved by using a truncated regression model with historical education data as instrumental variables for adult children’s education. Results: For every unit increase in adult children’s education, the father’s and mother’s longevity increased by 0.89 years and 0.75 years, respectively. Mechanism analysis shows that adult children's education has a significant positive impact on parents' emotional support, financial support and self-reported health. Further evidence shows that for every unit increase in adult children’s education, the father-in-law’s and mother-in-law’s longevity increased by 0.40 years and 0.46 years, respectively. Conclusions: Improving adult children’s education can increase the longevity of parents and parents-in-law. Adult children’s education might contribute to the longevity of older parents through three channels: emotional support, financial support, and effects on parents’ health.
ARTICLE | doi:10.20944/preprints202205.0255.v1
Subject: Biology And Life Sciences, Biophysics Keywords: SILCS; hERG channel; Physicochemical properties; Multiple linear regression; FragMaps
Online: 19 May 2022 (08:46:24 CEST)
The human ether-a-go-go-related gene (hERG) potassium channel is a well-known contributor to drug-induced cardiotoxicity and therefore an extremely important target when performing safety assessments of drug candidates. Ligand-based approaches combined with quantitative structure-activity relationship (QSAR) analyses have been developed to predict hERG toxicity. The recently published cryogenic electron microscopy (cryo-EM) structure of the hERG channel opened the prospect of using structure-based simulation and docking approaches for hERG drug liability predictions. Recently, the idea of combining structure- and ligand-based approaches for modeling hERG drug liability has gained momentum, offering improved predictability compared with ligand-based QSAR practices alone. The present article demonstrates combining the structure-based SILCS (site-identification by ligand competitive saturation) approach with physicochemical properties to develop predictive models for hERG blockade. This combination leads to improved model predictability, based on Pearson’s R and the percent correct metric (which represents the rank-ordering of ligands), for different validation sets of hERG blockers involving diverse chemical scaffolds and a wide range of pIC50 values. The inclusion of the SILCS structure-based approach allows determination of the hERG region to which compounds bind and of the contribution of different chemical moieties in the compounds to blockade, thereby facilitating rational ligand design to minimize hERG liability.
ARTICLE | doi:10.20944/preprints202205.0240.v1
Subject: Business, Economics And Management, Economics Keywords: Credit constraints; Export; SMEs; Instrumental variable; Probit regression; Vietnam
Online: 18 May 2022 (10:35:32 CEST)
Export participation and restricted access to external formal credit are two factors attracting close attention from researchers and policymakers, especially in developing countries. The purpose of this study is to explore the interaction between these factors in both static and dynamic models. The study uses data on small and medium-sized manufacturing enterprises (SMEs) in Vietnam for the period 2009-2015. An instrumental variable approach is used to deal with the endogenous variable problem in the model. The results show an effect of credit constraints on firms’ exporting status, and continuous exporting is likely to ease credit constraints.
ARTICLE | doi:10.20944/preprints202205.0032.v1
Subject: Business, Economics And Management, Business And Management Keywords: digitalisation; sustainability; sustainable development goals; European Union; regression equations
Online: 5 May 2022 (10:24:13 CEST)
Digitalisation provides access to an integrated network of information that can benefit society and business. Building digital networks and a digital society can create unique opportunities to strategically address the sustainable development challenges of the United Nations Sustainable Development Goals (SDGs) and to ensure higher productivity, better education, and an equality-oriented society. This viewpoint describes the potential of digitalisation for the society and business of the future. The authors examine the links between digitalisation and sustainability in European Union countries. The paper proposes a research methodology and applies linear regression. The results showed ties with five SDGs focusing on society and business, and these ties are captured in the equations constructed for each SDG. The suggested solution is statistically valid and supports the novelty of the research. Among the digitalisation indicators, only mobile-cellular subscriptions and, in part, fixed-broadband sub-basket prices have no effect on the sustainable development indicators studied.
ARTICLE | doi:10.20944/preprints202112.0455.v1
Subject: Public Health And Healthcare, Public Health And Health Services Keywords: COVID-19; Durbin-Watson statistic; Multiple Linear Regression; Multicollinearity
Online: 28 December 2021 (16:11:44 CET)
This paper discusses the application of statistical modeling to interpret a health system crisis in Sri Lanka due to COVID-19. A strong focus on prevention and contact tracing, with rational use of available resources, characterizes Sri Lanka’s response to COVID-19 prevention and mitigation. Early contact tracing, preemptive quarantining, isolation, and treatment were implemented as a concerted effort. This approach, which proved efficient during the early phase of the pandemic, became unsustainable when COVID-19 patient numbers rose rapidly from July 2021 and exceeded health system capacity. The country’s COVID-19 situation from 1 August 2021 to 31 October 2021 was considered. The variables used for analysis were the total number of cases, recovered cases, comorbid and O2-dependent patients, ICU patients, and deaths. A regression model was fitted to the data using the EViews 12 (x64) software application. The correlation coefficients of all the independent variables under consideration imply that they have a strong positive relationship with the number of deaths that occurred during the period. According to the fitted multiple linear regression model, the number of positive cases and the number of O2-dependent patients have a positive relationship with the dependent variable. Further, the Durbin-Watson statistic and the multicollinearity test indicate that the model is free from serial correlation, and the model is therefore considered fit. From the perspective of epidemiological control, these findings highlight the importance of keeping the number of cases within the limits of health system capacity.
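The Durbin-Watson statistic named above is computed from the regression residuals as the sum of squared successive differences divided by the sum of squared residuals. The sketch below shows that standard formula only; it is not the EViews procedure the authors ran, and the function name and residual series are hypothetical.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of (e_t - e_{t-1})^2 divided by sum of e_t^2."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Hypothetical residual series (illustrative only, not from the study).
alternating = [1.0, -1.0, 1.0, -1.0]   # sign flips every step: negative autocorrelation
persistent = [1.0, 1.0, -1.0, -1.0]    # runs of equal sign: positive autocorrelation

dw_alternating = durbin_watson(alternating)
dw_persistent = durbin_watson(persistent)
print(dw_alternating, dw_persistent)
```

The statistic ranges from 0 to 4; values near 2 suggest no first-order serial correlation, values well below 2 suggest positive autocorrelation, and values well above 2 suggest negative autocorrelation. It addresses serial correlation only, so multicollinearity has to be checked separately.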
ARTICLE | doi:10.20944/preprints202111.0227.v1
Subject: Business, Economics And Management, Marketing Keywords: Lolita fashion; multiple regression; decision tree; social media; XGBoost
Online: 12 November 2021 (14:54:04 CET)
Despite extensive investigation of the impact of social media on the marketing of fashion products, little evidence is available on how the platforms influence sales prediction. Focusing on Lolita fashion, this study investigates the impact of social media marketing on the sales volume prediction of fashion products. Specifically, we analyzed marketing data, including comments, likes, and shares from the Weibo social platform, to forecast future sales, examine how to enhance profit performance, and make production decisions. Using a quantitative approach, we tested three prediction models: multiple regression, decision tree, and XGBoost. The results revealed that increasing the number of comments and decreasing the number of likes could significantly improve the sales volumes of Lolita products. In contrast, shares exerted a less significant impact on sales. Among the prediction models, XGBoost was found to be the best method. In the fashion industry, social media is a useful tool for forecasting market trends. A limitation of this study is that only one social media platform was used to extract data, which might limit the generalizability of the findings.
ARTICLE | doi:10.20944/preprints202105.0536.v1
Subject: Biology And Life Sciences, Anatomy And Physiology Keywords: Argan biosphere reserve; Climate change; Rainfall; Temperature; Woodland regression
Online: 24 May 2021 (07:44:25 CEST)
This paper explores the effect of climate change on the regression of the Argan tree (Argania spinosa L. Skeels) woodland, focusing on the Argan Biosphere Reserve and especially on the Souss plain (Western Morocco). Rainfall and temperature data from four sites within the Argan Biosphere Reserve were analyzed over the last 60 years to assess any climatic change. Regression curves fitted to the dataset showed a substantial decrease in rainfall (18 to 26%) at the four locations as well as an increase in temperature (1 to 2 °C). These changes may have a detrimental effect on the Argan woodland, although human factors have been reported to be the main cause of its regression. It can therefore be concluded that the reduction in rainfall and the increase in temperature should now be considered as factors of Argan woodland regression.
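As a rough illustration of how a percentage change such as the rainfall decrease above can be read off a fitted trend, the sketch below fits a straight line to an annual series and reports the relative change of the fitted line between the first and last year. A linear fit is an assumption made here; the abstract does not specify the form of the regression curves. The series is synthetic and the function name is hypothetical.

```python
import numpy as np

def linear_trend_change(years, values):
    """Fit values = a + b * (year - first_year); return (b, percent change of the fitted line)."""
    years = np.asarray(years, dtype=float)
    values = np.asarray(values, dtype=float)
    t = years - years[0]  # centre on the first year for numerical stability
    slope, intercept = np.polyfit(t, values, 1)
    fitted_start = intercept
    fitted_end = intercept + slope * t[-1]
    return float(slope), float(100.0 * (fitted_end - fitted_start) / fitted_start)

# Synthetic 60-year annual rainfall series in mm; invented values, NOT the paper's data.
rng = np.random.default_rng(1)
years = np.arange(1960, 2020)
rainfall_mm = 300.0 - 1.0 * (years - 1960) + rng.normal(0.0, 30.0, years.size)

slope, pct_change = linear_trend_change(years, rainfall_mm)
print(f"trend: {slope:.2f} mm/year, change over the period: {pct_change:.1f}%")
```

Reporting the change of the fitted line, rather than the difference between the first and last observations, keeps a single unusually wet or dry year from dominating the estimate.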