Subject: Social Sciences, Accounting Keywords: performance analysis; elite football; multivariate analysis; principal components analysis; LaLiga
Online: 8 February 2021 (16:18:14 CET)
The use of principal components analysis provides information about the main characteristics of teams based on a set of indicators, instead of displaying individualized information for each of these indicators. In this work we reduced an extensive data matrix using principal components analysis in order to improve interpretation. Subsequently, using the new components and multiple linear regression, we carried out a comparative analysis between the best and bottom teams of LaLiga. The sample consisted of the matches corresponding to the 2015/16, 2016/17 and 2017/18 seasons. The results showed that the best teams were characterized by, and differentiated from bottom teams in, a greater number of successful passes and a greater number of dynamic offensive transitions. The bottom teams were characterized by executing more defensive than offensive actions and by showing fewer goals and a greater ball possession time in the final third of the field. Goals, ball possession time in the final third of the field, number of effective shots and crosses are the main performance factors that influence offensive success in football. This information increases knowledge about the key performance indicators in football.
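The two-step workflow described in this abstract (reduce the indicator matrix with PCA, then regress an outcome on the component scores) can be sketched as follows. This is only a minimal illustration on synthetic data, not the authors' implementation: the function names `pca_scores` and `fit_linear_regression`, the number of indicators, the number of retained components, and the synthetic outcome `y` are all assumptions made for the example.

```python
import numpy as np

def pca_scores(X, n_components):
    """Return component scores, loadings and explained-variance ratios for X."""
    Xc = X - X.mean(axis=0)                 # centre each indicator
    Xs = Xc / Xc.std(axis=0, ddof=1)        # standardise each indicator
    U, S, Vt = np.linalg.svd(Xs, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    explained = S**2 / np.sum(S**2)         # share of variance per component
    return scores, Vt[:n_components], explained[:n_components]

def fit_linear_regression(scores, y):
    """Ordinary least squares of y on the component scores plus an intercept."""
    A = np.column_stack([np.ones(len(scores)), scores])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef                             # [intercept, slope_PC1, slope_PC2, ...]

# Synthetic stand-in for a teams-by-indicators matrix (hypothetical sizes).
rng = np.random.default_rng(0)
n_teams, n_indicators = 60, 12
latent = rng.normal(size=(n_teams, 2))
mixing = rng.normal(size=(2, n_indicators))
X = latent @ mixing + 0.3 * rng.normal(size=(n_teams, n_indicators))
y = 1.5 * latent[:, 0] - 0.5 * latent[:, 1] + 0.1 * rng.normal(size=n_teams)

scores, loadings, explained = pca_scores(X, n_components=2)
coef = fit_linear_regression(scores, y)
```

Because the component scores are mutually uncorrelated, the regression coefficients can be read one component at a time, which is one reason this combination is convenient when the original indicators are strongly correlated.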
ARTICLE | doi:10.20944/preprints202201.0160.v1
Subject: Mathematics & Computer Science, Probability And Statistics Keywords: Multilevel Principal Components Analysis (mPCA); 3D shape analysis; Monte Carlo simulations
Online: 12 January 2022 (10:44:23 CET)
3D facial surface imaging is a useful tool in dentistry and in terms of diagnostics and treatment planning. Between-groups PCA (bgPCA) is a method that has been used to analyse shapes in biological morphometrics, although various “pathologies” of bgPCA have recently been proposed. Monte Carlo (MC) simulated datasets were created here in order to explore “pathologies” of multilevel PCA (mPCA), where mPCA with two levels is equivalent to bgPCA. The first set of MC experiments involved 300 uncorrelated normally distributed variables, whereas the second set of MC experiments used correlated multivariate MC data describing 3D facial shape. We confirmed previous results of other researchers that indicated that bgPCA (and so also mPCA) can give a false impression of strong differences in component scores between groups when there are none in reality. These spurious differences in component scores via mPCA reduced strongly as the sample sizes per group were increased. Eigenvalues via mPCA were also found to be strongly affected by imbalances in sample sizes per group, although this problem was removed by using weighted forms of covariance matrices suggested by the maximum likelihood solution of the two-level model. However, this did not solve the problem of spurious differences between groups in these simulations, which was driven by very small sample sizes in one group here. As a “rule of thumb” only, all of our experiments indicate that reasonable results are obtained when sample sizes per group in all groups are at least equal to the number of variables. Interestingly, the sum of all eigenvalues over both levels via mPCA scaled approximately linearly with the inverse of the sample size per group in all experiments. Finally, between-group variation was added explicitly to the MC data generation model in two experiments considered here. Results for the sum of all eigenvalues via mPCA correctly predicted the asymptotic value of the total variance in this case, whereas standard “single-level” PCA underestimated this quantity.
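The spurious group separation described above can be reproduced in a small Monte Carlo sketch. The code below is an illustrative assumption rather than the authors' procedure: it draws pure noise for several equally sized groups, takes the first between-group principal axis from the centred group means, and compares the spread of the group means along that axis with the within-group spread. The function name `bgpca_separation`, the number of groups and the choice of separation measure are hypothetical.

```python
import numpy as np

def bgpca_separation(n_per_group, n_vars=300, n_groups=3, seed=0):
    """Apparent group separation along the first bgPCA axis for pure noise.

    All groups are drawn from the same standard normal distribution,
    so any separation reported here is spurious.
    """
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_groups, n_per_group, n_vars))

    group_means = X.mean(axis=1)                          # (n_groups, n_vars)
    centred_means = group_means - group_means.mean(axis=0)

    # PCA of the group means: first right singular vector = first bgPCA axis.
    _, _, Vt = np.linalg.svd(centred_means, full_matrices=False)
    axis = Vt[0]

    mean_scores = centred_means @ axis                    # group means on the axis
    within_scores = (X - group_means[:, None, :]) @ axis  # deviations on the axis

    between_sd = mean_scores.std()
    within_sd = np.sqrt(np.mean(within_scores**2))
    return between_sd / within_sd

sep_small = bgpca_separation(n_per_group=10)    # far fewer samples than variables
sep_large = bgpca_separation(n_per_group=300)   # samples per group = variables
```

With few samples per group relative to the number of variables, the ratio is large even though no true group difference exists, and it shrinks as the sample size per group grows, in line with the rule of thumb quoted in the abstract.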
ARTICLE | doi:10.20944/preprints202207.0087.v1
Subject: Chemistry, Food Chemistry Keywords: millet porridge; electric cooker; nutritional composition; principal component analysis; cluster analysis
Online: 6 July 2022 (04:38:57 CEST)
(1) Background: In order to study the effects of different electric cookers on the nutritional components of millet porridge, five different electric cookers were selected to cook millet porridge, and sensory and nutritional components in millet porridge, millet soup, and millet grains were analyzed. (2) Methods: Using principal component and cluster analysis, a variety of nutritional components were comprehensively compared. (3) Results: The results showed that among the different cooked samples, the content of amylose and reducing sugar was the highest in the samples cooked by electric cooker no. 3. The electric cooker no. 4 samples had the highest sensory evaluation score, crude fat, and protein content. The contents of ash, fatty acids, bound amino acids, and minerals were the highest in the electric cooker no. 5 samples. The sensory evaluation score and content of crude fat, ash, reducing sugar, direct starch, and Cu were higher in millet grains than in millet soup or porridge. The content of fatty acid, protein, amino acid, Zn, Fe, Mg, Mn, and Ca was highest in millet soup. Different electric cookers produced millet porridge with varying nutritional levels. (4) Conclusions: This study provides a reference for the further development of new electric cookers.
ARTICLE | doi:10.20944/preprints202202.0120.v1
Subject: Earth Sciences, Atmospheric Science Keywords: Meycauayan; aerosols; source apportionment; principal component analysis; MMORS
Online: 8 February 2022 (14:55:34 CET)
This paper focuses on the application of principal component analysis (PCA) to conduct a source apportionment of atmospheric aerosols from 8 sampling locations along the Marilao-Meycauayan-Obando River System (MMORS). Aerosols were collected in May 2016, at the same time that water samples were collected. Elemental analysis was conducted using a scanning electron microscope coupled with energy dispersive x-ray (SEM-EDX). Carbon (C), nitrogen (N), oxygen (O), sodium (Na), magnesium (Mg), aluminum (Al), silicon (Si), sulfur (S), chlorine (Cl), potassium (K), calcium (Ca), titanium (Ti), manganese (Mn), iron (Fe), copper (Cu), zinc (Zn), bromine (Br), niobium (Nb), barium (Ba), mercury (Hg), and lead (Pb) concentrations were measured and used as inputs to the PCA. The aerosol samples showed the presence of the heavy metals Pb and Hg, elements that were also detected in trace amounts in the water measurements. Concentrations of the heavy metals Fe, Pb, and Hg in the aerosols were attributed to industrial sources. However, it was determined that the primary sources of aerosols in the area were traffic and crustal emissions (C, N, O, Si, Al, Ca). Thus, control of traffic emissions would be more beneficial in reducing aerosol emissions in Meycauayan.
ARTICLE | doi:10.20944/preprints201805.0436.v1
Subject: Life Sciences, Other Keywords: coefficient of variation; correlation; factor; jenny; principal components analysis
Online: 30 May 2018 (06:14:46 CEST)
Colostrum is a natural product secreted by mammals, including humans, in the first week of lactation. Among different species, donkey colostrum is considered to have, besides a valuable composition in nutrients and immune factors, an outstanding similarity to human colostrum. In this context, and taking into account the scarcity of available data concerning the interaction between climate factors and colostrum quality, a trial was conducted to identify the possible influence of environmental factors on the nutritional traits of donkey colostrum. A stock of 16 jennies from 2 farms located in the County of Cluj was analyzed during a 7-day postpartum period. During the experimental period, daily temperature, humidity, and wind velocity data were collected. Strong positive correlations were found between fat and lactose and between fat and protein, while a moderate to strong correlation was found between the lactose and protein content of donkey colostrum. Testing the influence of environmental temperature, relative humidity, and wind velocity on the nutritional content of donkey colostrum showed a negligible influence of wind velocity, a negative influence of heat stress on all studied colostrum components, and a complex influence of relative humidity: an increase in relative humidity has a positive influence on fat and lactose but a negative influence on the protein content of donkey colostrum.
ARTICLE | doi:10.20944/preprints202012.0008.v1
Subject: Behavioral Sciences, Applied Psychology Keywords: Organizational factors; employee creativity; employee innovation; workplace innovation; principal component analysis
Online: 1 December 2020 (09:40:43 CET)
Organizations with proper human resources (HR) practices play an exemplary role in developing their employees’ innovation. Though there is extensive literature on managing organizational innovation, even today some organizations stand as a barrier to employees’ growth and innovation at the workplace. This study aimed to holistically explore the organizational factors affecting employee innovation using principal component analysis (PCA) and to condense the dimensionality for a better focus of organizational development. The study administered a survey questionnaire and collected usable data from one hundred and ninety-five (195) respondents from various Indian companies. The study identified forty-six sub-factors and grouped them into nine major organizational factors influencing employee innovation, namely organization structure, organization culture and environment, corporate strategy, innovation process, employee, technology, resources, knowledge management, and management and leadership. The study recommends that any firm focus on these factors to encourage employee innovation leading to overall organizational success. It also provides broad implications for HR managers, firm policymakers and top management to reassess and formulate the best organizational strategies to promote an innovation culture in the organization.
ARTICLE | doi:10.20944/preprints201808.0512.v1
Subject: Engineering, Civil Engineering Keywords: benchmarking; evaluation of performance; performance indicator; principal component analysis
Online: 30 August 2018 (05:16:30 CEST)
Inefficient water use and varying, low productivity in Kenya's public irrigation schemes are a major concern. It is therefore necessary to periodically monitor and evaluate the performance of public irrigation schemes. The performance of public irrigation in western Kenya was assessed by combining benchmarking methodology and principal component analysis. The aim was to quantify and rank the performance of pumped public irrigation schemes in Kenya. Eleven benchmarking indicators were computed for the period from 2012 to 2016 and compared to global benchmark values. The indicators used fall under the agricultural productivity, water supply and financial performance categories. The computed agricultural productivity was 36%–51% in Ahero, 23%–42% in West Kano and 26%–50% in Bunyala irrigation scheme. Water supply performance in Ahero, West Kano and Bunyala irrigation schemes varied from 24% to 58%, 3% to 49% and 19% to 43% respectively. Financial performance varied from 46% to 54% in Ahero, 25% to 32% in West Kano and 54%–56% in Bunyala irrigation scheme. An average overall performance efficiency of 46%, 39% and 31% was obtained in Ahero, Bunyala and West Kano irrigation schemes respectively. The performance of the irrigation schemes is very poor and measures to improve performance are needed.
ARTICLE | doi:10.20944/preprints202011.0586.v1
Subject: Chemistry, Analytical Chemistry Keywords: chemical composition; antioxidant; Citrus; essential oils; Principal Component Analysis.
Online: 23 November 2020 (14:19:03 CET)
Citrus essential oils (EOs) have various bioactivities, such as antioxidant activity, with many applications. Antioxidant activities depend on the chemical compositions of the EOs, which are affected by climate, soil, and geographical region. Thus, investigations on chemical compositions and antioxidant activities of Citrus EOs in different countries are valuable. In this study, we distilled EOs from peels of Indonesian-grown Citrus, including C. nobilis, C. limon, C. aurantifolia, C. amblycarpa, and Citrus spp. Chemical compositions of EOs were analyzed using Gas Chromatography-Mass Spectrometry (GC-MS), whereas the antioxidant activities were determined by employing the 2,2-diphenyl-1-picrylhydrazyl (DPPH) method. Furthermore, principal component analysis (PCA) was applied to elucidate the main contributing compounds for antioxidant activity. The results show that all EOs possess unique chemical characteristics, with limonene as the major constituent. For antioxidant activities, C. limon and C. amblycarpa EOs are the two strongest, with IC50 values below 7.00 μL/mL. The PCA approach suggests that -terpinene mainly contributes to the high antioxidant activities of C. limon and C. amblycarpa. Moreover, o-cymene, thymol, p-cymene, and α-farnesene may also be responsible for the antioxidant activity of C. limon EO. These results are valuable information for the applications of Citrus EOs as antioxidant sources.
ARTICLE | doi:10.20944/preprints202007.0397.v1
Subject: Mathematics & Computer Science, Computational Mathematics Keywords: Machine learning; Dimensionality reduction; Wavelet transform; Water quality; Principal component analysis
Online: 17 July 2020 (15:47:53 CEST)
In this research, an attempt was made to reduce the dimension of wavelet-ANFIS/ANN (adaptive neuro-fuzzy inference system/artificial neural network) models toward reliable forecasts as well as to decrease computational cost. In this regard, principal component analysis was performed on the input time series decomposed by a discrete wavelet transform to feed the ANN/ANFIS models. The models were applied for dissolved oxygen (DO) forecasting in rivers, DO being an important variable affecting aquatic life and water quality. The current values of DO, water surface temperature, salinity, and turbidity were considered as the input variables to forecast DO three time steps ahead. The results of the study revealed that PCA can be employed as a powerful tool for dimension reduction of input variables and also to detect inter-correlation of input variables. Results of the PCA-Wavelet-ANN models are compared with those obtained from Wavelet-ANN models; the former have the advantage of less computational time than the latter. For ANFIS models, PCA is even more beneficial, because it prevents Wavelet-ANFIS models from creating too many rules, which deteriorates the efficiency of the ANFIS models. Moreover, applying PCA to the Wavelet-ANFIS models leads to a significant decrease in computational time. Finally, it was found that the PCA-Wavelet-ANN/ANFIS models can provide reliable forecasts of dissolved oxygen, an important water quality indicator in rivers.
ARTICLE | doi:10.20944/preprints202102.0234.v1
Subject: Biology, Anatomy & Morphology Keywords: Principal Component Analysis, RNA-seq, prostate cancer, biomarkers, RNA genes
Online: 9 February 2021 (10:26:47 CET)
Prostate cancer (Pca) is a highly heterogeneous disease and the second most common tumor in males. Molecular and genetic profiles have been used to identify subtypes and guide therapeutic intervention. However, roughly 26% of primary Pca are driven by unknown molecular lesions. We use Principal Component Analysis (PCA) and custom RNAseq-data normalization to identify a gene expression signature which segregates primary PRAD from normal tissues. This Core-Expression Signature (PRAD-CES) includes 33 genes and accounts for 39% of data complexity along the PC1-cancer axis. The PRAD-CES is populated by protein-coding (AMACR, TP63, HPN) and RNA-genes (PCA3, ARLN1) sparsely found in previous studies, validated/predicted biomarkers (HOXC6, TDRD1, DLX1), and/or cancer drivers (PCA3, ARLN1, PCAT-14). Of note, the PRAD-CES also comprises six over-expressed LncRNAs without previous Pca association, four of them potentially modulating the driver genes TMPRSS2, PRUNE2 and AMACR. Overall, our PCA captures 57% of data complexity within PC1-3. GO enrichment and correlation analysis involving major clinical features (i.e., Gleason Score, AR Score, TMPRSS2-ERG fusion and Tumor Cellularity) suggest that PC2 and PC3 gene signatures might describe more aggressive and inflammation-prone transitional forms of PRAD. Of note, the surfaced genes may entail novel prognostic biomarkers and molecular alterations for intervention. In particular, our work uncovered RNA genes with appealing implications for Pca biology and progression.
Subject: Engineering, Mechanical Engineering Keywords: Laser shock peening; FE simulation; Residual stress; Minimum principal stress; Static damping
Online: 18 August 2021 (10:51:34 CEST)
Laser shock peening is a process which can reduce stress corrosion cracking and improve fatigue life by forming compressive residual stress on the surface of the material. In a computational FE simulation of laser shock peening, while the pressure load generated by the laser pulse is applied to the surface of the simulation geometry, the peening is simulated by explicit analysis, which is then converted to implicit analysis to dissipate the dynamic energy remaining in the geometry. In this study, static damping is applied to dissipate the residual dynamic energy without converting to an implicit analysis. The compressive residual stress distribution is compared between the simulation results for the stainless steel 304 material and the same material subjected to actual laser shock peening. The laser shock peening parameters were 4.2 J laser pulse energy, 50% overlap of the 3 mm diameter laser beam, and water as a confinement layer. As a result, the compressive residual stress from the surface in the depth direction is similar in both the simulation and the experimental result measured by the hole drilling method.
ARTICLE | doi:10.20944/preprints202101.0413.v1
Online: 21 January 2021 (09:31:21 CET)
Urgent environmental challenges and emerging additive manufacturing (AM) technologies push research towards new and better-performing materials. In the field of metallurgy, high entropy alloys (HEAs) have recently been a topic of intense research because of their promising properties, such as high temperature strength and stability. Moreover, this class of multi-principal element alloys (MPEAs) has opened up the research community to unexplored compositional spaces, fostering a growing literature on high-throughput methodologies and tools for rapidly screening large numbers of alloys. However, none of these methods has so far been aimed at designing new MPEAs for the AM process known as selective laser melting (SLM). Here we conducted nanoindentation testing on single scan tracks of elemental powder blends and pre-alloyed powders after ball milling of AlTiCuNb and AlTiVNb. Results show that nanoindentation can be an effective technique for gaining information about phase evolution during laser scanning, helping to accelerate the development of new MPEAs.
ARTICLE | doi:10.20944/preprints202101.0375.v1
Subject: Social Sciences, Business And Administrative Sciences Keywords: cold chain logistics of agricultural products; demand forecast; principal component analysis, multiple linear regression, neural network.
Online: 19 January 2021 (11:50:09 CET)
Demand forecasting for cold chain logistics of agricultural products can provide a scientific basis for the country to formulate logistics strategy, which further promotes the development of the social economy and the improvement of living standards in China. In this paper, a new combined mathematical model is proposed to forecast demand for agricultural products. Shandong, one of China’s provinces, serves as a main producer and distributor of agricultural products. Based on an index system created from multiple related factors influencing the cold chain logistics demand of agricultural products in Shandong, this paper employs principal component analysis to reduce the dimension of the various indexes and predicts the principal components with time series. Thereafter, a multiple linear regression model and a neural network model were constructed to forecast the cold chain logistics demand of agricultural products in Shandong, and their combined forecast models were compared. Moreover, the paper provides insight for reference and decision-making concerning the development of the cold chain logistics industry of agricultural products in Shandong province.
ARTICLE | doi:10.20944/preprints201607.0069.v1
Subject: Mathematics & Computer Science, Applied Mathematics Keywords: Principal of prediction; random sequences; recurrence; Law of Large Numbers; exponential; normal; bivariate; distribution; Entropy; Information
Online: 23 July 2016 (09:25:13 CEST)
The philosophy of testing statistical hypotheses is a natural consequence and functional extension of the mathematical analysis of Probability. Along with the concept of recurrence when applied to random sequences and functions, it leads to the analysis of a priori and posterior, which implies testing statistical hypotheses. Testing statistical hypotheses also involves algebraic, functional and dimensional considerations, which are found in the works of Laplace. Aspects of mathematical analysis such as universality of solutions, Laws of Large Numbers, Entropy, Information, and various functional dependencies are the main factors explained in the five properties that lead to the implication of testing statistical hypotheses. Various interesting examples with modern scientific significance from genetics, astrophysics, and other areas give methodological access to answers to different problems and phenomena which are involved in the logic of testing statistical hypotheses.
REVIEW | doi:10.20944/preprints202205.0171.v1
Subject: Life Sciences, Biophysics Keywords: principal component analysis; collective variables; molecular dynamics; energy landscape; solvent effects; linear response theory; independent component analysis
Online: 12 May 2022 (10:53:37 CEST)
Principal component analysis (PCA) is used to reduce the dimensionality of high-dimensional datasets in a variety of research areas. For example, biological macromolecules, such as proteins, exhibit many degrees of freedom, allowing them to adopt intricate structures and exhibit complex functions by undergoing large conformational changes. Therefore, molecular simulations of and experiments on proteins generate a large number of structure variations in high-dimensional space. PCA and many PCA-related methods have been developed to extract key features from such structural data, and these approaches have been widely applied for over 30 years to elucidate macromolecular dynamics. This review mainly focuses on the methodological aspects of PCA and related methods, and their applications for investigating protein dynamics.
ARTICLE | doi:10.20944/preprints202112.0146.v1
Subject: Biology, Entomology Keywords: Chironomids; taxonomy; morphometry; principal component analysis
Online: 9 December 2021 (08:39:58 CET)
The larvae of some species of the subgenus Orthocladius s. str. (Diptera, Chironomidae) are here described for the first time with corrections and additions to the descriptions of adult males and pupal exuviae. The identification of larvae is generally not possible without association with their pupal exuviae and/or adult males, so the descriptions here are based only on reared material or on pupae with the associated larval exuviae. Usually, Chironomidae larvae can be separated on the basis of morphometric characters, and the most discriminant characters are: 1- the ratio between the width of the median tooth of the mentum (Dm) and the width of the first lateral tooth (Dl) = mental ratio (DmDl), 2- the ratio between the length of the first antennal segment (A1) and the combined length of segments 2-5 (A2-5) = antennal ratio (AR). The shapes of the mandible, maxilla, and other body parts are almost identical in all the species considered in this study. The larva of Orthocladius (Symposiocladius) lignicola is very characteristic and can be separated by the shape of the mentum, and the larvae of all the known species of Symposiocladius are characterized by the presence of large Lauterborn organs on the antennae and of tufts of setae on the abdominal segments. The larvae of Orthocladius (Orthocladius) oblidens and Orthocladius (Orthocladius) rhyacobius can be distinguished from other species based on their large Dm and from each other by AR. A principal component analysis was carried out using 5 characters: 1- Dm, 2- Dl, 3- length of A1, 4- width of A1 (A1W), 5- combined length of segments 2-5 (A2-5). The most discriminant characters were Dm and A1, confirming that DmDl and AR can be used to separate species at the larval stage, but the large overlap of morphometric characters between species confirms that association with pupal exuviae is in any case needed to identify larvae. For the future, the development of reference DNA barcodes from specimens identified by specialists is recommended as possibly the best tool for larval identification, but association of barcodes with morphotypes is in any case fundamental.
ARTICLE | doi:10.20944/preprints202110.0127.v1
Subject: Mathematics & Computer Science, Applied Mathematics Keywords: Descriptive analysis; principal components analysis; k-means clustering; data panel regression method; machine learning; XGBoost algorithms; random forest algorithms
Online: 8 October 2021 (08:30:13 CEST)
The aim of this work is to explain the behaviour of the multiresistance percentage of Pseudomonas aeruginosa in some countries of Europe through a multivariate statistical analysis and machine learning validation, using data from the European Antimicrobial Resistance Surveillance System, the World Health Organization and the World Bank. First, we use a descriptive analysis and a principal components analysis. Then, we use k-means clustering to determine the countries and regions that are most affected by antibiotic resistance. Second, we expand the database by adding some socioeconomic, governance and antibiotic-consumption variables. We then run a data panel regression analysis to determine some functions that relate the multiresistance percentage to those new variables. Finally, we use machine learning techniques to validate a pooling panel data case, using XGBoost and random forest algorithms. The results of the data panel analysis indicate that the most important variables for the multiresistance percentage are corruption control and the rule of law. Similar results are found with the machine learning validation analysis, where the human development index is an additional important variable for the multiresistance percentage.
ARTICLE | doi:10.20944/preprints202206.0033.v1
Subject: Engineering, Industrial & Manufacturing Engineering Keywords: MV20/20; PoDFA; anomaly detection; statistical process control; principal components analysis; K-Means; DBSCAN; multi-layer perceptron; inclusion; receiver operating characteristic; confusion matrix
Online: 2 June 2022 (11:08:35 CEST)
The analysis of data produced by the MV20/20 sensor, tagged with quality outcomes, is presented with the aim of developing a predictor model for real-time anomaly detection and classification. Three types of inclusions, undesired particles that deteriorate the quality of production, are used to tag the quality data using results from the lab. We explore both unsupervised and supervised learning, which both offer advantages in monitoring and controlling the quality of production. It is discovered that the dataset can be clustered using techniques like K-Means and DBSCAN. Bounding the data within a 95% confidence interval ellipse ensures we can detect anomalous events in real time. For supervised learning, a two-stage classifier is explored, which classifies the outcome of a cast and secondly the inclusion responsible for the negative outcome. We explore models from logistic regression and support vector machines, to two neural networks, namely the multi-layer perceptron and the radial basis function network. While the cast outcome is adequately predicted by all the models, the multi-layer perceptron provides a boundary performance for the inclusion type. A more advanced technique for model optimisation, namely grid search, is applied in order to improve on the results. The outcome for the grid search is not much better, which indicates a global maximum in the learning capacity of the model. Recommendations include the addition of sensor systems and an audit of data collection variation.
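The idea of bounding data within a 95% confidence ellipse for real-time anomaly detection can be sketched as below. This is a hedged illustration, not the method used in the study: it assumes two-dimensional, approximately Gaussian features (for example, two principal component scores), treats the estimated mean and covariance as known, and uses the fact that the squared Mahalanobis distance then follows a chi-square distribution with 2 degrees of freedom. The function names `fit_ellipse` and `is_anomalous` and the synthetic data are hypothetical.

```python
import numpy as np

def fit_ellipse(points):
    """Estimate the centre and inverse covariance of 2-D reference data."""
    centre = points.mean(axis=0)
    cov = np.cov(points, rowvar=False)
    return centre, np.linalg.inv(cov)

def is_anomalous(x, centre, cov_inv, level=0.95):
    """Flag points whose squared Mahalanobis distance leaves the ellipse.

    For 2 degrees of freedom the chi-square quantile has the closed form
    -2 * ln(1 - level), so no statistics library is needed.
    """
    d = np.atleast_2d(x) - centre
    d2 = np.einsum("ij,jk,ik->i", d, cov_inv, d)   # squared Mahalanobis distance
    threshold = -2.0 * np.log(1.0 - level)
    return d2 > threshold

# Synthetic correlated reference data standing in for in-control sensor readings.
rng = np.random.default_rng(1)
mixing = np.array([[1.0, 0.6], [0.0, 0.8]])
reference = rng.normal(size=(2000, 2)) @ mixing

centre, cov_inv = fit_ellipse(reference)
flags = is_anomalous(reference, centre, cov_inv)   # about 5% of in-control points
far_point_flag = is_anomalous(np.array([10.0, 10.0]), centre, cov_inv)
```

With small reference samples, a threshold based on Hotelling's T-squared statistic would be more appropriate than the chi-square approximation used here.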
Subject: Medicine & Pharmacology, Pharmacology & Toxicology Keywords: Cannabis; Metabolite; Principal Component Analysis; Random Forest
Online: 5 September 2020 (07:51:50 CEST)
The many strains of Cannabis spp. are associated with many effects on users and contain many different potentially psychoactive metabolites, but the links between metabolite profiles and user effects are unclear. Here we take a statistical approach to linking cause (i.e. metabolites) to effects in Cannabis spp. through the prism of strains, using quantitative data for metabolite composition and user effects. We find that species (indica vs. sativa) explains <2% of the variability in metabolite profiles, while strain explains 1/3 of variability, indicating species is nonindicative of metabolite composition, while strain is approximately indicative. Using random forests we generate a table of potential metabolite-effect links. We also find that effect-weighted metabolite composition can effectively be described in terms of four values representing the concentrations of pairs or triplets of particular compounds.
ARTICLE | doi:10.20944/preprints201801.0043.v1
Subject: Behavioral Sciences, Cognitive & Experimental Psychology Keywords: traditional; local; consumer behavior; principal component analysis
Online: 8 January 2018 (04:17:29 CET)
This study assesses the attitudes of young adults (18-30 years old) towards the consumption of local and traditional products in 7 European countries. A clustered sample (n=836) of natives of Greece, Bulgaria, Romania, Slovenia, Croatia, Denmark and France was collected by distributing questionnaires through social media and university mail services. The data were examined by implementing Principal Component Analysis (PCA) on three different samples: the overall sample and two subgroups, Eastern and Western European countries. Six major factors were revealed: consumer behavior, health issues, cost, influence from media and close environment, and availability in store. As a result, young adults have a positive attitude to local and traditional food products but they express insecurity about health issues. The cost factor influences people from Eastern European countries less than those from the overall sample (3rd and 5th factor respectively). Influence of close environment is a separate factor in Eastern countries, in contrast to Western ones, where it is combined with influence from media. Females and older people (25-30 years old) have fewer doubts about TFPs, while media have a high influence on consumers’ decisions. The aim of this survey is to create consumer profiles of young adults and to create different promotion strategies for local and traditional products among the two groups of countries.
ARTICLE | doi:10.20944/preprints201706.0043.v1
Subject: Earth Sciences, Environmental Sciences Keywords: DPSIR model; green mine; principal component analysis
Online: 8 June 2017 (16:29:56 CEST)
Strategic research on green mine construction is of great theoretical and practical significance to the sustainable development of China's mining industry as well as the great-leap-forward development strategies of China. Strategies of green mine construction in China are methods summarized to solve all potential problems from mine production to ecological restoration. At present, strategies of green mine construction in China have not yet been fully evaluated and studied. Therefore, on the basis of the literature on green mine construction by researchers in China and abroad, this study took the green mine of Yongcheng City in China as the research object to evaluate the current situation of green mine construction in Yongcheng City and put forward corresponding countermeasures. First, the driving force-pressure-state-impact-response (DPSIR) model was introduced for the construction of the evaluation index system; the construction principles and selection methods of the indexes were set out, and the index system based on driving force, pressure, state, impact and response was constructed. Second, principal component analysis (PCA) was adopted to calculate and evaluate data on the green mine of Yongcheng City in recent years, and the construction state of the green mine in Yongcheng City was analyzed concretely according to the evaluation results. Empirical results showed that the evaluation system constructed in this study was feasible and could be applied to evaluate the construction of green mines effectively.
ARTICLE | doi:10.20944/preprints202102.0528.v3
Subject: Biology, Anatomy & Morphology Keywords: Fermentation; Honey production; Principal component analysis; Organoleptic characteristics; Coffea arabica
Online: 30 June 2021 (12:50:43 CEST)
The post-harvest processes of coffee are widely accepted as key factors in determining the quality of the product. In the Cauca department, Southwestern Colombia, this stage is carried out empirically by farmers in the region, using old methods that do not assure consistent quality. This study aims to determine the best post-harvest temperature and time conditions for coffee produced in the region. For this purpose, we carried out the fermentation and honey processes on different coffee samples of the Coffea arabica species of the Castillo variety. Subsequently, the cup profile quality of the coffee samples was determined through a sensory evaluation by experts. Finally, we applied descriptive statistical techniques, principal component analysis and hierarchical cluster analysis to the resulting data to find similarities between the samples. The results suggest that the honey process receives better cup profile evaluations than any condition of temperature and fermentation time.
ARTICLE | doi:10.20944/preprints202010.0375.v1
Subject: Medicine & Pharmacology, Psychiatry & Mental Health Studies Keywords: hope; mental health; reliability; validity; principal component analysis; schizophrenia
Online: 19 October 2020 (11:18:42 CEST)
Hope is important in the rehabilitation of persons with schizophrenia, though scales to measure hope are not appropriate for this population. The purpose of this cross-sectional study was to identify the psychometric properties of the Schizophrenia Hope Scale-9 (SHS-9) using data from 83 people with schizophrenia in four mental health centers and 762 healthy persons from two universities in South Korea. The mean (standard deviation) SHS-9 score of the participants with schizophrenia and healthy participants was 11.24 (4.90) and 14.83 (3.10), respectively. Lower scores indicate a lower level of hope. The internal consistency alpha coefficient was 0.92 with a 4-week test-retest reliability of 0.89. Criterion-related construct validity was established by examining the correlation between the SHS-9 and the State-Trait Hope Inventory scores. Divergent validity was identified through a negative relationship of SHS-9 with the Beck Hopelessness Scale. The construct validity of the SHS-9 was confirmed through principal component analysis with extraction methods, which resulted in a one-factor solution, accounting for 49–60% of the total item variance. This study provided evidence for the validity and reliability of the SHS-9; therefore, it could be used to measure hope in people with schizophrenia.
ARTICLE | doi:10.20944/preprints201906.0308.v1
Subject: Life Sciences, Biochemistry Keywords: chrysanthemum; HPLC; phenolic compounds; principal component analysis; antioxidant capacity
Online: 30 June 2019 (08:55:38 CEST)
This study investigated the phenolic compounds of 15 Chrysanthemum morifolium Ramat cv. ‘Hangbaiju’, including 6 ‘Duoju’ and 9 ‘Taiju’, using high performance liquid chromatography. The antioxidant activities of these ‘Hangbaiju’ were estimated by DPPH, ABTS and FRAP assays. Results showed that a total of 14 phenolic compounds were detected in these flowers, including 3 mono-caffeoylquinic acids, 3 di-caffeoylquinic acids, 1 phenolic acid and 7 flavonoids. ‘Duoju’ and ‘Taiju’ possessed different concentrations of phenolic compounds, and ‘Taiju’ exhibited higher caffeoylquinic acid contents and stronger antioxidant activities than ‘Duoju’. Caffeoylquinic acids showed a strong correlation with the antioxidant activities of the samples. Principal component analysis revealed an obvious separation between ‘Duoju’ and ‘Taiju’ using phenolic compounds as variables. Apigenin-7-O-glucoside, 3,5-di-O-caffeoylquinic acid, luteolin and acacetin were found to be the key phenolic compounds to differentiate ‘Duoju’ from ‘Taiju’.
ARTICLE | doi:10.20944/preprints201712.0127.v1
Subject: Earth Sciences, Environmental Sciences Keywords: vulnerability index; Maasai pastoralists; principal component analysis; climate change
Online: 18 December 2017 (17:21:00 CET)
Human adaptive responses to climate change occur at the local level, where climatic variability is experienced. Therefore, analyzing vulnerability at the local level is important in planning effective adaptation options in a semi-arid environment. This study was conducted to assess the vulnerability of Maasai pastoralist communities in Kajiado County, Kenya to climate change by generating a vulnerability index for the communities. Data were collected using questionnaires that were administered to 305 households in the five different administrative wards (Oloosirkon/Sholinke, Kitengela, Kapetui North, Kenyawa-Poka and Ilmaroro) in Kajiado East. Vulnerability was measured as the net effect of adaptive capacity, sensitivity and exposure to climate change. Principal Component Analysis (PCA) was used to assign weights to the vulnerability indicators used for the study and also to calculate the household vulnerability index. A vulnerability map was produced using the GIS software package ArcGIS 10.2. Results showed that gender of household head, age of household head, educational level, access to extension agents, herd size, livestock diversity and access to credit facility influenced the vulnerability of the Maasai pastoralists to climate change in Kajiado East. The results showed that the most vulnerable communities, with the highest negative vulnerability index values, are Ilpolosat (-2.31), Oloosirikon (-2.22), Lenihani (-2.05), Konza (-1.81) and Oloshaiki (-1.53). The communities with the highest positive vulnerability index values were Kekayaya (4.02), Kepiro (3.47), Omoyi (2.81), Esilanke (2.23), Kisaju (2.16) and Olmerui (2.15). We conclude that provision of basic amenities such as good roads and electricity, access to extension agents, access to credit facilities and herd mobility will reduce the vulnerability of Maasai pastoralists in Kajiado East to climate change and variability.
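The abstract above does not state how the PCA weights were derived. A common construction, assumed here purely for illustration, standardizes each indicator, takes the loadings of the first principal component as weights, and sums the weighted standardized indicators per household. The indicator columns and all numbers below are hypothetical and are not data from the study.

```python
import math

def standardize(column):
    """Return z-scores of one indicator column (sample standard deviation)."""
    n = len(column)
    mean = sum(column) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in column) / (n - 1))
    return [(v - mean) / sd for v in column]

def first_component(matrix, iterations=500):
    """Leading eigenvector of a symmetric PSD matrix via power iteration."""
    p = len(matrix)
    w = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        w_new = [sum(matrix[i][j] * w[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(v * v for v in w_new))
        w = [v / norm for v in w_new]
    return w

def pca_index(rows):
    """Weights = first-PC loadings; index = weighted sum of z-scores per row."""
    cols = [standardize(list(c)) for c in zip(*rows)]
    n, p = len(rows), len(cols)
    corr = [[sum(a * b for a, b in zip(cols[i], cols[j])) / (n - 1)
             for j in range(p)] for i in range(p)]
    weights = first_component(corr)
    scores = [sum(weights[j] * cols[j][k] for j in range(p)) for k in range(n)]
    return weights, scores

# Hypothetical households: [herd size, years of education, access to credit (0/1)]
households = [[10, 2, 0], [25, 6, 1], [40, 8, 1], [15, 4, 0], [60, 12, 1]]
weights, scores = pca_index(households)
for household, score in zip(households, scores):
    print(household, round(score, 2))
```

Because the indicators are standardized, the index is centred on zero, so negative and positive values mark households below and above the sample average under this construction.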
ARTICLE | doi:10.20944/preprints201608.0142.v1
Subject: Mathematics & Computer Science, Information Technology & Data Management Keywords: churn prediction; incremental principal component analysis; stochastic gradient descent
Online: 13 August 2016 (11:28:39 CEST)
Modern companies accumulate a vast amount of customer data that can be used for creating a personalized experience. Analyzing this data is difficult and most business intelligence tools cannot cope with the volume of the data. One example is churn prediction, where the cost of retaining existing customers is less than that of acquiring new ones. Several data mining and machine learning approaches can be used, but there is still little information about the different algorithm settings to be used when the dataset does not fit into the memory of a single computer. Because of the difficulties of applying feature selection techniques at a large scale, Incremental Principal Component Analysis (IPCA) is proposed as a data preprocessing technique. Also, we present a new approach to large scale churn prediction problems based on the mini-batch Stochastic Gradient Descent (SGD) algorithm. Compared to other techniques, the new method facilitates training with large data volumes using a small memory footprint while achieving good prediction results.
ARTICLE | doi:10.20944/preprints202105.0216.v1
Subject: Social Sciences, Accounting Keywords: Built environment; pedestrian volume; stepwise regression; principal component analysis; Melbourne
Online: 10 May 2021 (15:34:00 CEST)
Previous studies have mostly examined how sustainable cities try to promote non-motorized travel by creating a walking-friendly environment. Few existing studies identify how the built environment affects pedestrian volume in high-density areas. This paper presents a methodology that combines Pearson correlation analysis, stepwise regression, and principal component analysis for exploring the internal correlation and potential impact of built environment variables. To study this relationship, cross-sectional data in the Melbourne central business district were selected. Pearson’s correlation coefficient confirmed that visible green index and intersection density were not correlated with pedestrian volume. The results from stepwise regression showed that land-use mix degree, public transit stop density, and employment density could be associated with pedestrian volume. Moreover, two principal components were extracted by factor analysis. The result of the first component yielded an internal correlation where land-use and amenities components were positively associated with the pedestrian volume. Component 2 represents parking facilities density, which is negatively related to the pedestrian volume. Based on the results, existing street problems and policy recommendations were put forward to suggest diversifying community service within walking distance, improving the service level of the public transit system, and restricting on-street parking in Melbourne.
ARTICLE | doi:10.20944/preprints202102.0544.v2
Subject: Mathematics & Computer Science, Algebra & Number Theory Keywords: Principal component; Jarque-Bera statistic; Normality testing; Empirical power; Simulation
Online: 25 February 2021 (08:07:12 CET)
Testing high-dimensional normality is an important issue that has been studied intensively in the literature. Because such tests depend on the variance-covariance matrix of the sample, numerous methods have been proposed to reduce the complexity of that matrix. Principal component analysis (PCA) is widely used since it can project high-dimensional data onto a lower-dimensional orthogonal space, and the normality of the reduced data can be evaluated by the Jarque-Bera (JB) statistic in each principal direction. We propose two combined statistics, the sum and the maximum of the one-way JB statistics, based on the independence of the principal directions, to test the multivariate normality of high-dimensional data. The performance of the proposed methods is illustrated by their empirical power on simulated normal and non-normal data. Two real examples show the validity of our proposed methods.
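The two combined statistics described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the principal component scores have already been computed (one list per principal direction) and uses the textbook form JB = n/6 · (S² + (K − 3)²/4), where S and K are the sample skewness and kurtosis. The score values are hypothetical.

```python
def jarque_bera(x):
    """Jarque-Bera statistic of one sample, using biased central moments."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    return n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)

def combined_jb(score_columns):
    """Return (sum, maximum) of the per-direction JB statistics."""
    stats = [jarque_bera(column) for column in score_columns]
    return sum(stats), max(stats)

# Hypothetical scores along two principal directions (not data from the paper)
pc_scores = [
    [-2.0, -1.0, 0.0, 1.0, 2.0],   # symmetric column
    [0.0, 0.0, 0.0, 0.0, 5.0],     # strongly skewed column
]
jb_sum, jb_max = combined_jb(pc_scores)
print(jb_sum, jb_max)
```

Each one-way JB statistic is asymptotically chi-squared with 2 degrees of freedom under normality, so if the p directions are independent the sum is asymptotically chi-squared with 2p degrees of freedom. Whether the paper calibrates its tests this way or by simulation is not stated in the abstract.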
ARTICLE | doi:10.20944/preprints202009.0605.v1
Subject: Earth Sciences, Atmospheric Science Keywords: hydrometeorological variability; Indian Ocean Dipole; principal component analysis; mutual information
Online: 25 September 2020 (11:45:13 CEST)
In this study, we used statistical models to analyze the nonlinear links, through atmospheric teleconnections, between hydrometeorological variables and the Indian Ocean Dipole (IOD) mode over the East Asia (EA) region. The analysis of atmospheric teleconnections was conducted using principal component analysis and singular spectrum analysis techniques. Moreover, the nonlinear lag-time correlations between climate indices and hydrological variables were calculated using mutual information (MI) techniques. The teleconnection-based nonlinear correlation coefficients (CCs) were higher than the linear CCs at each lag time. Additionally, we documented that the IOD has a direct influence on hydro-meteorological variables, such as precipitation within the Korean Peninsula (KP). Moreover, during the warm season (June to September), hydro-meteorological variables in the KP showed significantly decreasing patterns during positive IOD years and neutral conditions during negative IOD years in comparison with long-term normal conditions. Finally, the revealed relationship between climate indices and hydro-meteorological variables and their possible changes will allow better understanding for stakeholder decision-making on freshwater management over the EA region. It can also provide useful data for long-range water resources prediction, to minimize hydrological uncertainties in a changing climate.
ARTICLE | doi:10.20944/preprints201811.0416.v1
Subject: Medicine & Pharmacology, Dentistry Keywords: multilevel principal components analysis; shape and image texture; facial expression
Online: 19 November 2018 (04:57:36 CET)
Single-level Principal Components Analysis (PCA) and multi-level PCA (mPCA) methods are applied here to a set of (2D frontal) facial images from a group of 80 Finnish subjects (34 male; 46 female) with two different facial expressions (smiling and neutral) per subject. Inspection of eigenvalues gives insight into the importance of different factors affecting shapes, including: biological sex, facial expression (neutral versus smiling), and all other variations. Biological sex and facial expression are shown to be reflected in those components at appropriate levels of the mPCA model. Dynamic 3D shape data for all phases of a smile made up a second dataset sampled from 60 adult British subjects (31 male; 29 female). Modes of variation reflected the act of smiling at the correct level of the mPCA model. Seven phases of the dynamic smiles are identified: rest pre-smile, onset 1 (acceleration), onset 2 (deceleration), apex, offset 1 (acceleration), offset 2 (deceleration), and rest post-smile. A clear cycle is observed in standardized scores at an appropriate level for mPCA and in single-level PCA. mPCA can be used to study static shapes and images, as well as dynamic changes in shape. It gave us much insight into the question “what’s in a smile?”
ARTICLE | doi:10.20944/preprints201802.0110.v1
Subject: Engineering, Mechanical Engineering Keywords: Structural Health Monitoring (SHM), distributed sensing, Principal Component Analysis (PCA)
Online: 16 February 2018 (16:04:42 CET)
Fiber optic sensors cannot measure damage; to obtain information about damage from strain measurements, additional strategies are needed, and several alternatives have been proposed. This paper discusses two independent concepts. The first is based on detecting the new strains appearing around a damage spot; the structure does not need to be under load, the technique is very robust and damage detectability is high, but it requires sensors to be located very close to the damage, so it is a local technique. The second approach offers wider coverage of the structure; it is based on identifying the changes caused by the damage in the strain field of the whole structure for similar external loads. The damage location does not need to be known a priori, and detectability depends on the sensor network density, the damage size and the external loads. Examples of application to real structures are given.
ARTICLE | doi:10.20944/preprints201702.0104.v1
Subject: Social Sciences, Geography Keywords: Sustainable rural development; EAFRD; LEADER Approach; GIS; Principal Component Analysis
Online: 28 February 2017 (12:16:38 CET)
The European Commission has been striving to achieve sustainable development in its rural areas for more than 25 years through funds aimed at modernizing the agricultural and forestry sectors, protecting the environment and improving the quality of life. But is sustainable rural development really being accomplished? This study sets out to answer this question in the case of Extremadura, a Spanish territory with Low Demographic Density and a Gross Domestic Product still below 75 % of the European average. Both qualitative and quantitative methodology have been employed, using a Principal Component Analysis the result of which has provided us with a model which shows how various behaviors coexist in the region in view of the distribution of current funding from the EAFRD. The most dynamic areas have received the largest amounts of funding and these are linked to the agricultural sector and to the protection of the environment, leaving aside the more depressed areas and the implementation of the LEADER Approach as well. Therefore, we have come to the conclusion that the current rural development in Extremadura is not sustainable enough.
ARTICLE | doi:10.20944/preprints201611.0081.v1
Subject: Keywords: symplectic geometry; symplectic principal component analysis; symplectic entropy; complex system.
Online: 16 November 2016 (12:40:50 CET)
Real systems are often complex, nonlinear, and noisy in various areas including mathematics, natural science, and social science. We present the symplectic entropy (SymEn) measure as well as an analysis method based on SymEn to estimate the nonlinearity of a complex system by analyzing a given time series. The SymEn estimation is a kind of entropy based on symplectic principal component analysis (SPCA) that represents organized but unpredictable behaviors of systems. The key to SPCA is to preserve the global submanifold geometrical properties of the systems through a symplectic transform in the phase space, which is a kind of measure-preserving transform. The capability of preserving the global geometrical characteristics makes the SymEn a test statistic to detect the nonlinear characteristics in several typical chaotic time series and the stochastic characteristic in Gaussian white noise. The results are in agreement with findings for the approximate entropy (ApEn), the sample entropy (SampEn) and the fuzzy entropy (FuzzyEn). Moreover, the SymEn method is also used to analyze the nonlinearities of real signals (including EEG signals for ASD and healthy subjects, and sound and vibration signals for mechanical systems). The results indicate that the SymEn estimation can be taken as a measure for describing the nonlinear characteristics in data collected from natural complex systems.
ARTICLE | doi:10.20944/preprints202002.0133.v1
Subject: Social Sciences, Education Studies Keywords: entrepreneurship; entrepreneurial interest; youth, family; entrepreneurial eco-system; principal component analysis
Online: 10 February 2020 (15:52:32 CET)
As entrepreneurial interest is believed to represent a causal factor increasing entrepreneurship, research has begun to explore how family systems affect youth entrepreneurial interests. In the present study, we attempt to identify different types of family influence on the entrepreneurial interests of young people. A questionnaire was used to obtain data from 1,633 Spanish youths, who were 15 to 18 years old, and another questionnaire was used to obtain data from 839 parents. Principal Component Analysis identified unique family types and revealed that they have differential associations with entrepreneurial interest among youths. These findings reaffirm the influence of family on the entrepreneurial ecosystem and the promotion of an entrepreneurial family culture. This study further suggests that early attention should focus on the detection of entrepreneurial interest among youths so that actions can be implemented in the families of low-interest youths to incentivize an entrepreneurial family culture.
ARTICLE | doi:10.20944/preprints201809.0096.v1
Subject: Biology, Plant Sciences Keywords: Montiaceae; phylogeny; phylogeography; long-distance dispersal; idiosyncrasy; Principle of Evolutionary Idiosyncraticity
Online: 5 September 2018 (12:02:42 CEST)
Montiaceae comprise a clade of at least 270 species plus about 20 accepted subspecific taxa, primarily of western America and Australia. The present paper is the first of a two-part work that seeks to evaluate evolutionary theory via metadata analysis of Montiaceae. In particular, it uses metadata analysis to evaluate the theory in theory-laden methods that have been applied in evolutionary analyses of Montiaceae. This part focuses on phylogeny and phylogeography. The second part focuses on phenotypic and ecological diversification. An emergent theme in this paper is the degree to which historical idiosyncrasy during Montiaceae evolution misleads quantitative methods of evolutionary reconstruction and phylogeographic interpretation. This suggests that idiosyncraticity itself is a fundamental property of evolution. The second part of this work elaborates this notion as the Principle of Evolutionary Idiosyncraticity (PEI). The present part describes idiosyncraticity in molecular phylogenetic and phylogeographic data and uses this notion to refine ideas on Montiaceae evolution. Phylogenetic metadata conflicts and conflicting phylogeographic interpretations are discussed. I conclude that, owing to PEI, quantitative methods of evolutionary analysis cannot be globally accurate, though they are useful heuristically. In contrast, classical narrative analysis is robust in the face of PEI.
ARTICLE | doi:10.20944/preprints202106.0211.v1
Subject: Mathematics & Computer Science, Algebra & Number Theory Keywords: COVID-19; spatial; mobility; spatial weight matrices; principal component analysis; hierarchical clustering
Online: 8 June 2021 (10:56:22 CEST)
The COVID-19 pandemic starting in the first half of 2020 has changed the lives of everyone across the world. Reduced mobility was essential due to it being the largest impact possible against the spread of the little understood SARS-CoV-2 virus. To understand the spread, a comprehension of human mobility patterns is needed. The use of mobility data in modelling is thus essential to capture the intrinsic spread through the population. It is necessary to determine to what extent mobility data convey the same message of mobility within a region. This paper compares different mobility data sources by constructing spatial weight matrices and further compares the results through hierarchical clustering. This provides insight for the user into which data provides what type of information and in what situations a particular source is most useful.
ARTICLE | doi:10.20944/preprints202002.0073.v2
Subject: Earth Sciences, Atmospheric Science Keywords: principal component analysis; PCA; directional component analysis; DCA; empirical orthogonal functions; extremes; US rainfall
Online: 11 February 2020 (16:10:09 CET)
Floods and droughts are driven, in part, by spatial patterns of extreme rainfall. Heat waves are driven by spatial patterns of extreme temperature. It is therefore of interest to design statistical methodologies that allow the identification of likely patterns of extreme rain or temperature from observed historical data. The standard work-horse for identifying patterns of climate variability in historical data is Principal Component Analysis (PCA) and its variants. But PCA optimizes for variance, not spatial extremes, and so there is no particular reason why the first PCA spatial pattern should identify, or even approximate, the types of patterns that may drive these phenomena, even if the linear assumptions underlying PCA are correct. We present an alternative pattern identification algorithm that makes the same linear assumptions as PCA, but which can be used to explicitly optimize for spatial extremes. We call the method Directional Component Analysis (DCA), since it involves introducing a preferred direction, or metric, such as ‘sum of all points in the spatial field’. We compare the first PCA and DCA spatial patterns for US rainfall anomalies on a 6 month timescale, using the sum metric for the definition of DCA in order to focus on total rainfall anomaly over the domain, and find that they are somewhat different. The definitions of PCA and DCA result in the first PCA spatial pattern having the larger explained variance of the two patterns, while the first DCA spatial pattern, when scaled appropriately, has a higher likelihood and greater total rainfall anomaly, and indeed is the pattern with the highest total rainfall anomaly for any given likelihood. In combination these two patterns yield more insight into rainfall variability and extremes than either pattern on its own.
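The abstract does not give a formula for the first DCA pattern. The sketch below assumes the anomalies follow a zero-mean multivariate normal distribution with covariance matrix C, so that patterns of equal likelihood satisfy xᵀC⁻¹x = constant. Maximizing the linear metric rᵀx under that constraint with a Lagrange multiplier gives x proportional to C·r, and for the sum metric r is a vector of ones. The function names and the toy anomaly values are hypothetical.

```python
import math

def covariance_matrix(rows):
    """Sample covariance matrix; rows are time steps, columns are grid points."""
    n = len(rows)
    cols = [list(c) for c in zip(*rows)]
    means = [sum(c) / n for c in cols]
    p = len(cols)
    return [[sum((cols[i][k] - means[i]) * (cols[j][k] - means[j])
                 for k in range(n)) / (n - 1)
             for j in range(p)] for i in range(p)]

def first_dca_pattern(cov, direction):
    """Unit-length pattern proportional to cov @ direction."""
    raw = [sum(cov[i][j] * direction[j] for j in range(len(direction)))
           for i in range(len(cov))]
    norm = math.sqrt(sum(v * v for v in raw))
    return [v / norm for v in raw]

# Hypothetical rainfall anomalies at three grid points over five periods
anomalies = [
    [1.0, 0.5, -0.2],
    [-0.8, -0.4, 0.1],
    [0.3, 0.6, 0.4],
    [-1.2, -0.9, -0.5],
    [0.7, 0.2, 0.2],
]
cov = covariance_matrix(anomalies)
sum_metric = [1.0] * len(cov)          # 'sum of all points in the spatial field'
pattern = first_dca_pattern(cov, sum_metric)
print([round(v, 3) for v in pattern])
```

Unlike the first PCA pattern, this requires no eigen-decomposition: a single matrix-vector product gives the direction, which can then be scaled to any chosen likelihood level.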
ARTICLE | doi:10.20944/preprints201910.0018.v1
Subject: Biology, Ecology Keywords: allergenic pollen; ozone; automatic real-time device; image analysis; principal component analysis
Online: 2 October 2019 (06:02:31 CEST)
Alnus glutinosa is an important woody plant in Lithuanian forest ecosystems. Knowledge of the fluorescence properties of black alder pollen is necessary for scientific and practical purposes. In this study we aimed to evaluate the possibilities of identifying Alnus glutinosa pollen fluorescence properties by modeling the effect of ozone and applying two different fluorescence-based devices. For the experiments, black alder pollen was collected in a typical habitat during the annual flowering period in 2018-2019. There were three groups of experimental variants, which differed in the duration of exposure to ozone, the conditions of pollen storage before the start of the experiment, and the experiment start time. Data for pollen fluorescence analysis were collected using two methods. The microscopy method was used in order to evaluate the possibility of employing image analysis systems for investigation of pollen fluorescence. The second data collection method relies on an automatic device that identifies pollen in real time and uses fluorescence in the pollen recognition process. Data were assessed employing image analysis and principal component analysis (PCA) methods. Digital images of ozone-exposed pollen observed under the fluorescence microscope showed a shift of the dominant green colour towards the blue spectrum. Meanwhile, the automatic detector detects more pollen whose fluorescence is in the blue light spectrum. It must be noted that, when pollen fluorescence is assessed several months after exposure to ozone, no effect of ozone on fluorescence remains.
ARTICLE | doi:10.20944/preprints201809.0196.v1
Subject: Social Sciences, Geography Keywords: global indices, global metrics, global society, new global geographies, principal components analysis.
Online: 11 September 2018 (12:05:12 CEST)
There is now a wide variety of global metrics. To find the degree of overlap between these different measures, we apply a principal components analysis (PCA) to 15 indices across 145 countries. Our results demonstrate that the most important underlying dimension highlights that economic development and social progress go hand in hand with state stability. The results are used to produce categorical divisions of the world. The threefold division identifies a world composed of what we describe and map as Rich, Poor and Middle countries. A five-group classification provided a more nuanced categorization, described as: Very Rich, Free and Stable; Affluent and Free; Upper Middle; Lower Middle; and Poor and Not Free.
ARTICLE | doi:10.20944/preprints201806.0194.v1
Subject: Social Sciences, Economics Keywords: Economic growth, Principal Component analysis, Cointegration, Stock market development, financial market development
Online: 12 June 2018 (14:05:09 CEST)
Does the choice of proxy for stock market development matter? This paper suggests that the growth effect of stock market development is sensitive to the choice of proxy, and that using alternative financial development indicators has practically no influence on the results. We found that both the stock market capitalization to GDP ratio and stock market returns have a positive and significant effect on growth. However, we cannot draw the same conclusion when either the ratio of the total value of trades on the major stock exchanges to GDP or the stock market turnover ratio is used as the proxy for stock market development, as the coefficients on these variables were found to be statistically insignificant. The indexes extracted from principal component analysis confirm the sensitivity of the effect to the choice of proxy. This finding suggests that stock market development is a conceptual term; thus, representing it with a single indicator makes it impossible to identify which stock market development indicators have significant positive growth effects.
ARTICLE | doi:10.20944/preprints202107.0509.v1
Subject: Biology, Anatomy & Morphology Keywords: Perilla crop; genetic resources; morphological traits; principal component analysis; SSR marker; genetic variation
Online: 22 July 2021 (08:04:28 CEST)
Using morphological characteristics and SSR markers, we evaluated the morphological and genetic variation of 200 Perilla accessions collected from the five regions of South Korea and elsewhere. In the analysis of morphological characteristics, leaf color, stem color, degree of pubescence, and leaf size in particular were found to be useful for distinguishing the characteristics of native Perilla accessions cultivated in South Korea. A total of 137 alleles were identified in the 20 simple sequence repeat (SSR) markers; the number of alleles per locus ranged from 3 to 13, and the average number of alleles per locus was 6.85. The average gene diversity (GD) was 0.649, with a range of 0.290-0.828. In the analysis of SSR markers, accessions from the Jeolla-do and Gyeongsang-do regions showed comparatively high genetic diversity values compared with those from other regions in South Korea. In the unweighted pair group method with arithmetic mean (UPGMA) analysis, the 200 Perilla accessions were found to cluster into three major groups and an outgroup with a genetic similarity of 42%, and did not show a clear geographic structure across the five regions of South Korea. Therefore, it is believed that landrace Perilla seeds are frequently exchanged by farmers through various routes between the five regions of South Korea. The results of this study are expected to provide useful information for conservation of these genetic resources and selection of useful resources for the development of varieties for seeds and leafy vegetables of cultivated var. frutescens of Perilla crop in South Korea.
ARTICLE | doi:10.20944/preprints201803.0171.v1
Subject: Earth Sciences, Atmospheric Science Keywords: firework displays; toxic metals; principal component analysis; risk assessment; hazard quotient; hazard index
Online: 20 March 2018 (07:19:53 CET)
Bonfire night is a worldwide phenomenon given to numerous annual celebrations characterised by bonfires and fireworks. Since Thailand has no national ambient air quality standards for metal particulates, it is important to investigate the impacts of particulate injections on elevations of air pollutants and ecological health impacts resulting from firework displays. In this investigation, Pb and Ba were considered potential firework tracers because their concentrations were significantly higher during the episode and lower than/comparable with minimum detection limits during other periods, indicating that their elevated concentrations were principally due to pyrotechnic displays. Pb/Ca, Pb/Al, Pb/Mg, and Pb/Cu can be used to pin-point emissions from firework displays. Air mass backward trajectories (72 h) from the Hybrid Single-Particle Lagrangian Integrated Trajectory (HYSPLIT) model indicated that areas east and north-east of the study site were the main sources for the air transportation. Although the combined risk associated with levels of Pb, Cr, Co, Ni, Zn, As, Cd, V, and Mn was far below the standards mentioned in international guidelines, the lifetime cancer risks associated with As and Cr levels exceeded US-EPA guidelines, and may expose inhabitants of surrounding areas of Bangkok to elevated cancer risk.
ARTICLE | doi:10.20944/preprints201803.0109.v1
Subject: Medicine & Pharmacology, Nutrition Keywords: Cervical intraepithelial neoplasia; Mediterranean Diet Score; Principal Component analysis; western diet; prudent diet
Online: 15 March 2018 (03:33:48 CET)
Specific foods and nutrients help prevent the progression from persistent high-risk human papillomavirus (hrHPV) infection to cervical cancer (CC). We aimed to focus on dietary patterns which may be associated with hrHPV status and risk of high-grade cervical intraepithelial neoplasia (CIN2+). Overall, 539 eligible women, including 127 with CIN2+, were enrolled in a cross-sectional study and tested for hrHPV infection. Food intakes were estimated using a food frequency questionnaire. Logistic regression models were applied. Using the Mediterranean Diet Score, we demonstrated that, among 252 women with normal cervical epithelium, medium adherence to the Mediterranean diet decreased the odds of hrHPV infection when compared to low adherence (adjOR=0.40, 95%CI=0.22-0.73). Using principal component analysis, we also identified two dietary patterns which explained 14.31% of variance. Women in the 3rd and 4th quartiles of the “western pattern” had higher odds of hrHPV infection when compared with the 1st quartile (adjOR=1.77, 95%CI=1.04-3.54 and adjOR=1.97, 95%CI=1.14-4.18, respectively). Adjusting for hrHPV status and age, women in the 3rd quartile of the “prudent pattern” had lower odds of CIN2+ when compared with the 1st quartile (OR=0.50, 95%CI=0.26-0.98). Our study is the first to demonstrate the association of dietary patterns with hrHPV infection and CC, discouraging unhealthy habits in favour of a Mediterranean-like diet.
ARTICLE | doi:10.20944/preprints202009.0692.v1
Subject: Earth Sciences, Atmospheric Science Keywords: wheat production; multiple linear regression; soil quality index; principal components analysis; digital soil mapping
Online: 28 September 2020 (17:44:16 CEST)
Soil quality assessment based on crop yields, together with identification of its key indicators, can be used for better management of agricultural production. In the current research, the weighted additive soil quality index (SQIw), factor analysis (FA) and multiple linear regression (MLR) methods are used to assess the soil quality of rainfed winter wheat fields with two soil orders on 53.20 km2 of agricultural land in western Iran. A total of 18 soil quality indicators were determined for 100 soil samples (0-20 cm depth) from two soil orders (Inceptisols and Entisols). The soil properties measured were: pH, soil texture, organic carbon (OC), cation exchange capacity (CEC), electrical conductivity (EC), soil microbial respiration (SMR), calcium carbonate equivalent (CCE), soil porosity (SP), bulk density (BD), exchangeable sodium percentage (ESP), mean weight diameter (MWD), available potassium (AK), total nitrogen (TN), available phosphorus (AP), available Fe (AFe), available Zn (AZn), available Mn (AMn), and available Cu (ACu). Mean wheat grain yield for the two years for all of the 100 sampling sites was also gathered. The SQIw was calculated using two weighting methods (FA and MLR) and maps were created using a digital soil mapping framework. The soil indicators taken in the minimum data set (MDS) were AK, clay, CEC, AP, SMR, and sand. The correlation between the MLR weighting technique (SQI-M) and the rainfed wheat yield (r=0.62) was slightly larger than the correlation of yield with the FA weighting technique (SQI-F) (r=0.58). Results showed that the means of both SQI-M and SQI-F and rainfed wheat yield for Inceptisols were higher than for Entisols, although these differences were not statistically significant. Both SQI-M and SQI-F showed that areas with Entisols had lower proportions of good soil quality grades (Grade I and II), and higher proportions of poor soil quality grades (Grade IV and V) compared to Inceptisols.
Based on these results, soil type must be considered for soil quality assessment in future studies to maintain and enhance soil quality and sustainable production. The overall soil quality of the study region was of poor and moderate grades. To improve soil quality, it is therefore recommended that effective practices such as the implementation of scientific integrated nutrient management involving the combined use of organic and inorganic fertilizers in rainfed wheat fields be promoted.
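As a rough illustration of the weighted additive index named in this abstract, the sketch below computes SQIw as the weighted sum of indicator scores, SQIw = Σ W_i · S_i. The indicator names follow the minimum data set listed above, but every score and weight is a hypothetical value chosen for illustration; the abstract does not report them, and the function name is an assumption.

```python
# Minimal sketch of a weighted additive soil quality index (SQIw).
# All scores and weights below are hypothetical, for illustration only.

def weighted_additive_sqi(scores, weights):
    """Return sum(w_i * s_i), with the weights normalized to sum to 1."""
    total = sum(weights[name] for name in scores)
    return sum(weights[name] / total * scores[name] for name in scores)


# Hypothetical indicator scores for one sampling site, already scaled to [0, 1].
scores = {"AK": 0.8, "clay": 0.5, "CEC": 0.6, "AP": 0.4, "SMR": 0.7, "sand": 0.3}

# Hypothetical weights, standing in for weights derived from FA or MLR.
weights = {"AK": 0.25, "clay": 0.15, "CEC": 0.20, "AP": 0.15, "SMR": 0.15, "sand": 0.10}

sqi = weighted_additive_sqi(scores, weights)
print(f"SQIw = {sqi:.3f}")
```

Swapping in a different weight dictionary is all that distinguishes an FA-weighted from an MLR-weighted index in this sketch.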
ARTICLE | doi:10.20944/preprints202007.0106.v1
Subject: Medicine & Pharmacology, Nutrition Keywords: polygenic risk; wellness; food frequency; principal component analysis; healthy eating index; obesity; food desert
Online: 7 July 2020 (02:36:11 CEST)
Diet influences, and is influenced by, a wide range of socioeconomic, cultural, geographic, and genetic variables. Here we survey a matrix of such interactions as well as their connection to a variety of health outcomes, in a cohort of 689 diverse adults employed at Emory University and enrolled in the Center for Health Discovery and Well-Being (CHDWB) study. Principal component analysis (PCA) of the Block Food Frequency Questionnaire revealed seven PCs, cumulatively explaining 25.8% and each individually explaining at least 2% of the proportional consumption of 110 food items. PC1 is strongly correlated with the Healthy Eating Index-2015 measure, and accordingly healthier scores associate with multiple measures of physical and mental health. It, as well as PC2 (likely a measure of food expense) and PC3 (carbohydrate versus protein consumption), shows significant geographic structure across the Atlanta metropolitan area, correlating with race and ethnicity, income level, age and sex. Notably, a polygenic score for body mass index (BMI) consisting of 281 SNPs explains 2.8% of the variance in PC5, which is as strong as its association with BMI itself. PC5 appears to differentiate participants with respect to conscious eating behavior related to the choice of diet or comfort foods. Our analysis adds to the growing literature on factor analysis of socio-demographic influences on nutrition and health.
CONCEPT PAPER | doi:10.20944/preprints202007.0084.v1
Subject: Engineering, Electrical & Electronic Engineering Keywords: Hyperspectral Imagery (HSI); Hyperspectral Document Imagery (HSDI); k-means clustering; Principal component analysis (PCA)
Online: 5 July 2020 (15:28:52 CEST)
Hyperspectral imaging provides vital information about the objects and elements present inside the image, which makes it very useful in satellite imagery as well as image forensics. Hyperspectral document analysis (HSDI) can be used for document authentication through ink analysis, which can provide sufficient information about the composition and type of ink. In this project, we have implemented an HSDI-based ink classification technique using Principal Component Analysis for dimensionality reduction and K-means clustering for ink classification. This is an unsupervised learning approach, and it is very simple and efficient for classifying a limited number of bands. We have used this technique to classify 33 different bands of ink.
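The pipeline this abstract describes (PCA for dimensionality reduction, then K-means on the reduced scores) can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the two "ink" spectra, the five-band layout, the noise level, and the helper names `pca_project`, `dist2`, and `kmeans` are all hypothetical, and power iteration with deflation stands in for a library eigen-solver.

```python
import random


def pca_project(rows, k, iters=500, seed=0):
    """Project rows onto k principal components (power iteration with deflation)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    rng = random.Random(seed)
    components = []
    for _ in range(k):
        v = [rng.gauss(0, 1) for _ in range(d)]
        norm = sum(c * c for c in v) ** 0.5
        v = [c / norm for c in v]
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = sum(c * c for c in w) ** 0.5
            if norm < 1e-12:  # no variance left to extract
                break
            v = [c / norm for c in w]
        # The Rayleigh quotient estimates the eigenvalue; remove it (deflation).
        lam = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
        components.append(v)
    return [[sum(r[j] * comp[j] for j in range(d)) for comp in components]
            for r in centered]


def dist2(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=50):
    """Lloyd's algorithm with farthest-point initialization (deterministic)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist2(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist2(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical data: 10 pixels of each of two inks, 5 spectral bands per pixel.
rng = random.Random(0)
ink_a = [0.9, 0.7, 0.4, 0.2, 0.1]
ink_b = [0.2, 0.3, 0.5, 0.8, 0.9]
pixels = [[v + rng.gauss(0, 0.02) for v in ink] for ink in [ink_a] * 10 + [ink_b] * 10]

scores = pca_project(pixels, k=2)   # 5 bands reduced to 2 component scores
labels = kmeans(scores, k=2)        # one cluster label per pixel
print(labels)
```

Because no labels are used at any point, the cluster indices are arbitrary; mapping a cluster to a named ink would need reference samples.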
REVIEW | doi:10.20944/preprints201912.0149.v1
Subject: Mathematics & Computer Science, Artificial Intelligence & Robotics Keywords: damage detection; machine learning; principal component analysis; composites; micromechanics of damage; continuum damage mechanics
Online: 11 December 2019 (04:50:48 CET)
The loss of integrity and adverse effects on mechanical properties can be attributed to micro/macro-mechanical damage in structures, especially in composite structures. Damage, as a progressive degradation of material continuity, must be identified by a trustworthy mechanism at every stage of initiation and propagation to guarantee the safety of structures. Besides the materials design, structural integrity and health usually need to be monitored closely. One of the most powerful methods for the detection of damage is machine learning (ML). This paper presents the state of the art of ML methods and their applications in structural damage and prediction. Popular ML methods are identified and the performance and future trends are discussed.
Subject: Earth Sciences, Geology Keywords: gold deposit; alteration information; ASTER image; support vector machine (SVM); principal component analysis (PCA)
Online: 22 October 2019 (04:26:18 CEST)
Dayaoshan, as an important metal ore producing area in China, is faced with the dilemma of resource depletion due to long-term exploitation. In this paper, a remote sensing method is used to delineate the favorable metallogenic areas and find new ore points for Gulong. Firstly, vegetation interference has been removed by using a mixed pixel decomposition method with hyperplane and genetic algorithm (GA) optimization; then, altered mineral distribution information has been extracted based on principal component analysis (PCA) and the support vector machine (SVM) method; thirdly, the favorable areas of gold mining in Gulong have been delineated by using an SVM model optimized by the ant colony algorithm (ACA) to remove false altered minerals; lastly, a field survey verified that the extracted alteration mineralization information is correct and effective. The results show that the mineral alteration extraction method proposed in this paper has certain guiding significance for metallogenic prediction by remote sensing.
ARTICLE | doi:10.20944/preprints201903.0038.v1
Subject: Chemistry, Other Keywords: biomass; chemometrics, genotype; HSQC NMR; lignin; Miscanthus X giganteus; monolignol ratio; principal component analysis
Online: 4 March 2019 (10:30:04 CET)
As a renewable industrial crop, Miscanthus offers numerous advantages, namely high photosynthesis activity (as a C4 plant) and exceptional CO2 fixation rate. These properties make Miscanthus very attractive for industrial exploitation, such as lignin generation. Here, we present a systematic study analyzing the correlation of the lignin structure with Miscanthus genotype and plant portion (stem versus leaf). Specifically, the ratio of the three monolignols and corresponding building blocks as well as the linkages formed between the units have been studied. Depending on the Miscanthus genotype and plant component (leaf versus stem), correlations between chemical structure and properties of the lignins have been determined, i.e. correlations in molecular weight, polydispersity and decomposition temperature. Lignin isolation was performed using non-catalyzed organosolv pulping and the structure analysis includes NREL, FTIR, UV-Vis, HSQC-NMR, TGA, pyrolysis GC/MS. Structural differences were found for stem and leaf-derived lignins. Compared to beech wood lignins, Miscanthus lignins possess lower molecular weight and narrow polydispersities (< 1.5 Miscanthus vs. > 2.5 beech) corresponding to improved homogeneity. In addition to conventional univariate analysis of FTIR spectra, multivariate chemometrics revealed distinct differences for aromatic in-plane deformations of stem versus leaf-derived lignins. These results emphasize the potential of Miscanthus as low-input resource and Miscanthus-derived lignin as promising agricultural feedstock.
ARTICLE | doi:10.20944/preprints201810.0370.v1
Subject: Engineering, Energy & Fuel Technology Keywords: principal component analysis (PCA); fluid catalytic cracking (FCC); waste valorization; scrap tires; polyolefin pyrolysis
Online: 16 October 2018 (17:43:50 CEST)
Associating the most influential parameters with the product distribution is of utmost importance in complex catalytic processes such as fluid catalytic cracking (FCC). These correlations can lead to information-driven catalyst screening, kinetic modeling and reactor design. In this work, a dataset of 104 uncorrelated experiments, with 64 variables, has been obtained in an FCC simulator using 6 types of feedstock (vacuum gasoil, polyethylene pyrolysis waxes, scrap tire pyrolysis oil, dissolved polyethylene and blends of the previous), 36 possible sets of conditions (varying contact time, temperature and catalyst/oil ratio) and 3 industrial catalysts. Principal component analysis (PCA) has been applied over the dataset, showing that the main components are associated with feed composition (27.41% variance), operational conditions (19.09%) and catalyst properties (12.72%). The variables of each component have been correlated with the indexes and yields of the products: conversion, octane number, aromatics, olefins (propylene) or coke, among others.
ARTICLE | doi:10.20944/preprints202107.0657.v1
Subject: Engineering, Other Keywords: underground engineering; numerical simulation; excavation length effect; major principal stress; displacement; damage initiation; CPU time
Online: 29 July 2021 (13:10:23 CEST)
ARTICLE | doi:10.20944/preprints202012.0791.v1
Subject: Biology, Anatomy & Morphology Keywords: Mint; Plant volatiles; Electronic Nose; Principal Component Analysis; Linear Discriminant Analysis; k-Nearest-Neighbors Analysis
Online: 31 December 2020 (11:45:40 CET)
Mints emit diverse scents that exert specific biological functions and are relevant for applications. The current work strives to develop electronic noses that can electronically discriminate the scents emitted by different species of Mint as an alternative to conventional profiling by gas chromatography. Here, 12 different sensing materials, including 4 different metal oxide nanoparticle dispersions (AZO, ZnO, SnO2, ITO), one Metal-Organic Framework, Cu(BPDC), and 7 different polymer films (PVA, PEDOT:PSS, PFO, SB, SW, SG, PB), were used for functionalizing QCM sensors. The purpose was to discriminate six economically relevant Mint species (Mentha x piperita, Mentha spicata, Mentha spicata ssp. crispa, Mentha longifolia, Agastache rugosa, and Nepeta cataria). The adsorption and desorption datasets obtained from each modified QCM sensor were processed by three different classification models: Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), and k-Nearest Neighbor Analysis (k-NN). This allowed discriminating the different Mints with classification accuracies of 97.2% (PCA), 100% (LDA), and 99.9% (k-NN), respectively. Prediction accuracies with a repeating test measurement reached up to 90.6% for LDA, and 85.6% for k-NN. These data demonstrate that this electronic nose can discriminate different Mint scents in a reliable and efficient manner.
TECHNICAL NOTE | doi:10.20944/preprints202201.0353.v1
Subject: Life Sciences, Other Keywords: RNA sequencing; metabolomics; data visualization; overrepresentation analysis; cluster analysis; principal component analysis; scientific plotting; Mapman; Mercator
Online: 24 January 2022 (12:36:04 CET)
Next generation sequencing and metabolomics have become very cost and work efficient and are integrated into an ever growing number of life science research projects. Typically, well established software pipelines provide quantitative data informing about gene expression or concentrations of metabolites from the raw data. This data needs to be visualized and further analyzed in order to support scientific hypothesis building and identification of underlying biological patterns. Some tools exist, but require installation or manual programming. We developed “Gene Expression Plotter” (GXP), an RNA-Seq and Metabolomics data visualization and analysis tool entirely running in the user’s web browsers, thus not needing any custom installation, manual programming or upload of confidential data to third party servers. GXP enables the user to generate interactive plots, visually summarize genetic or metabolic responses in scientific sketches (Mapman), carry out cluster and principal component analysis, and conduct overrepresentation analyses. GXP can be used to publish research data along with interactive plots and results of analyses carried out with it. GXP is freely available on GitHub: https://github.com/usadellab/GeneExpressionPlots
COMMUNICATION | doi:10.20944/preprints202111.0549.v1
Subject: Keywords: Principal Component Regression, Partial Least Squares, Orthogonal Partial Least Squares, multivariate regression, hypothesis generation, Parkinson’s disease
Online: 29 November 2021 (15:42:03 CET)
In the current era of ‘big data’, scientists are able to quickly amass enormous amounts of data in a limited number of experiments. The investigators then try to hypothesize about the root cause based on the observed trends for the predictors and the response variable. This involves identifying the discriminatory predictors that are most responsible for explaining variation in the response variable. In the current work, we investigated three related multivariate techniques: Principal Component Regression (PCR), Partial Least Squares or Projections to Latent Structures (PLS), and Orthogonal Partial Least Squares (OPLS). To perform a comparative analysis, we used a publicly available dataset for Parkinson’s disease patients. We first performed the analysis using a cross-validated number of principal components for the aforementioned techniques. Our results demonstrated that PLS and OPLS were better suited than PCR for identifying the discriminatory predictors. Since the X data did not exhibit a strong correlation, we also performed Multiple Linear Regression (MLR) on the dataset. A comparison of the top five discriminatory predictors identified by the four techniques showed a substantial overlap between the results obtained by PLS, OPLS, and MLR, and the three techniques exhibited a significant divergence from the variables identified by PCR. A further investigation of the data revealed that PCR could be used to identify the discriminatory variables successfully if the number of principal components in the regression model were increased. In summary, we recommend using PLS or OPLS for hypothesis generation and systemizing the selection process for principal components when using PCR.
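For orientation, the standard textbook definitions below show why PCR and PLS can rank predictors differently. The notation (centered predictor matrix X of size n × p, centered response y, k retained components) is an assumption introduced here and does not come from the abstract.

```latex
\begin{align*}
% PCR: components come from X alone, then y is regressed on the scores
X &= U \Sigma P^{\top}, & T_k &= X P_k
  \quad (P_k:\ \text{first } k \text{ columns of } P) \\
\hat{\gamma} &= (T_k^{\top} T_k)^{-1} T_k^{\top} y, &
\hat{\beta}_{\mathrm{PCR}} &= P_k \hat{\gamma} \\[4pt]
% PLS: the first weight vector already uses y
w_1 &= \arg\max_{\|w\| = 1} \operatorname{cov}(Xw,\, y)^{2}
  \;\propto\; X^{\top} y
\end{align*}
```

PCR directions maximize the variance of X without reference to y, so a direction that is predictive of y may only appear among later components. This is consistent with the abstract's observation that PCR recovered the discriminatory variables once more components were included.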
ARTICLE | doi:10.20944/preprints202106.0278.v1
Subject: Materials Science, Biomaterials Keywords: Basil; Mint; Plant volatiles; Electronic Nose; Principal Component Analysis; Linear Discriminant Analysis; k-Nearest-Neighbors Analysis
Online: 10 June 2021 (08:09:36 CEST)
The Lamiaceae are among the most species-rich families of flowering plants and harbor many species used as herbs or for medicinal applications, such as Basils or Mints. Evolution of this group has been driven by chemical speciation, mainly of Volatile Organic Compounds (VOCs). The commercial use of these plants is characterized by a large extent of adulteration and surrogation. Authenticating and discerning the species is thus relevant for consumer safety, but usually requires cumbersome analytics, such as Gas Chromatography, often coupled with Mass Spectrometry. We demonstrate here that quartz-crystal microbalance (QCM)-based electronic noses provide a very cost-efficient alternative, allowing for a fast, automated discrimination of scents emitted from leaves of different plants. To explore the range of this strategy, we used leaf material from four genera of Lamiaceae along with Lemongrass as a similarly scented, but non-related outgroup. In order to unambiguously differentiate the scents from the different plants, the output of the 6 different SURMOF/QCM sensors was analyzed using machine learning (ML) methods, together with a thorough statistical analysis. The exposure and purging datasets (4 cycles) obtained from a QCM-based, low-cost homemade portable e-Nose were analyzed with a Linear Discriminant Analysis (LDA) classification model. Prediction accuracies with repeating test measurements reached values of up to 90%. We show that it is not only possible to discern and identify plants on the genus level, but even to discriminate closely related sister clades within a genus (Basil), demonstrating that e-Noses are a powerful technology to safeguard consumer safety against the challenges of globalized trade.
ARTICLE | doi:10.20944/preprints202101.0590.v1
Subject: Earth Sciences, Atmospheric Science Keywords: Environment; Comprehensive treatment; public-private partnerships (PPP); Full life cycle; Risk assessment; Principal component analysis (PCA)
Online: 28 January 2021 (15:39:03 CET)
China's implementation of public-private partnerships projects has been quite effective, involving infrastructure and other livelihood projects, a total of 19 industries, and an investment of nearly 1.5 trillion yuan. The characteristics of PPP projects such as long construction period and large investment amount determine the risks of PPP projects are also great, and the PPP projects of comprehensive environmental governance are also the same. The government and social capital use the PPP model to cooperate, and use the principal component analysis method to assess the risks of the entire life cycle of the comprehensive environmental governance PPP project. Therefore, it plays an important role in ensuring the smooth implementation of projects and reducing the losses caused by risks. According to the risk factors of the whole life cycle of the comprehensive environmental governance PPP project, an indicator system of 5 first-level indicators, 18 second-level indicators, and 43 third-level indicators has been established. Principal component analysis is used to analyze the influence weight of risk factors at each stage. The analysis shows that among the four stages, environmental pollution risk, project approval delay risk, completion risk, interest rate and financial fluctuation risk, and franchise life risk are the most influential risks in the implementation of PPP projects. Therefore, suggestions are made through the risk factors of each stage in the comprehensive environmental governance PPP project. For example, strengthen the response to the external environment risks of the comprehensive environmental governance PPP project, standardize the bidding and procurement of the comprehensive environmental governance PPP project, and strengthen the subsequent management of the transfer of the comprehensive environmental governance PPP project. 
In this way, the ability to resist risks of comprehensive environmental governance PPP projects is improved; the smooth implementation of the project is guaranteed, and the long-term development of comprehensive environmental governance PPP projects is promoted.
ARTICLE | doi:10.20944/preprints202004.0398.v1
Subject: Social Sciences, Other Keywords: COVID-19; Perception-based questionnaire; principal component analysis (PCA); Linear regression model; social panic; social conflict
Online: 22 April 2020 (09:55:38 CEST)
The COVID-19 pandemic situation, disease intensity, weak healthcare facilities, unawareness, and misinformation led people to fear and anxiety in Bangladesh. This study intended to capture people’s perceptions of the psychosocial, socio-economic and environmental crisis amidst the pandemic. An online questionnaire survey was conducted nationwide (1066 respondents). Datasets were analyzed through Principal Component Analysis (PCA), hierarchical Cluster Analysis (CA), Pearson’s correlation matrix (PCM), and Linear regression analysis (LRA), and psychometric characteristics were included in the Classical Test Theory (CTT) analysis. There were good associations among the psychosocial, socio-economic and environmental parameters. A significant association between fear of COVID-19 and the struggling healthcare system (p<0.05) was found. Also, a negative association between the fragile health system and the government’s ability to deal with the pandemic (p<0.05) was found, revealing poor governance. Again, a positive association of shutdown and social distancing with fear of losing life, including due to lack of health treatment (p<0.05), reveals that shutdown hampers normal activities, which may lead to mental and economic stress. However, a positive association of the socio-economic impact of the shutdown with poor people’s suffering, price hikes of basic needs, and disruption of formal education (p<0.05) may lead to a severe socio-economic and health crisis. There is a possibility of climate-induced disaster during/after the pandemic, which will create severe food insecurity (p<0.01). Daily wage earners and the poor will suffer most from food and nutritional deficiency, and the country may face a huge economic burden. Proper risk assessment and communication are needed to alleviate fear and anxiety. Thus, financial support and mental health support are required.
ARTICLE | doi:10.20944/preprints201812.0024.v1
Subject: Chemistry, Food Chemistry Keywords: extra-virgin olive oil adulteration; vegetables oils; triglycerides; fatty acids; linear discriminant analysis; principal component analysis
Online: 3 December 2018 (13:38:58 CET)
Nowadays, fingerprinting methodologies dominate the analysis of olive oils. They consider the entire analytical signal, which is acquired and recorded by the analytical instrument directly from the olive oil or an isolated fraction, i.e. the chromatogram. The shape and intensity of the recorded signal form the instrumental fingerprint of olive oil adulteration as a whole. Therefore, the methodology is based on the chemical composition (fatty acid and triglyceride compositions). Fatty acid composition as an indicator of purity suggests that linolenic acid content could be used as a parameter for the detection of extra virgin olive oil fraud with 5% of soybean oil. The adulteration could also be detected by the increase of the trans-fatty acid contents with 3% of soybean oil, 2% of corn oil and 4% of sunflower oil. The use of the ∆ECN42 proved to be effective in detecting Chemlali extra-virgin olive oil adulteration even at low levels: 1% of sunflower oil, 3% of soybean oil and 3% of corn oil. Therefore, compared to classical methods, PCA and the new approach of applying LDA could represent an alternative and innovative tool for faster and cheaper evaluation of extra-virgin olive oil adulteration.
ARTICLE | doi:10.20944/preprints201805.0045.v1
Subject: Engineering, Electrical & Electronic Engineering Keywords: robust principal component analysis; video separation; compressive measurements; prior information; optical flow; motion estimation; motion compensation
Online: 2 May 2018 (13:19:49 CEST)
In the context of video background-foreground separation, we propose a compressive online Robust Principal Component Analysis (RPCA) with optical flow that recursively separates a sequence of video frames into foreground (sparse) and background (low-rank) components. This separation method can process each video frame from a small set of measurements, in contrast to conventional batch-based RPCA, which processes the full data. The proposed method also leverages multiple sources of prior information by incorporating previously separated background and foreground frames in an n-l1 minimization problem. Moreover, optical flow is utilized to estimate motion between the previous foreground frames and then compensate for it, yielding higher-quality prior foregrounds that improve the separation. Our method is tested on several video sequences in different scenarios for online background-foreground separation given compressive measurements. The visual and quantitative results show that the proposed method outperforms other existing methods.
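As background for the batch-based RPCA that this abstract contrasts itself with, the usual convex formulation (principal component pursuit) is sketched below. It is the standard baseline, not the compressive online method proposed in the abstract. The symbols are assumptions introduced here: M stacks the vectorized video frames as columns, L is the low-rank background, and S is the sparse foreground.

```latex
\begin{align*}
\min_{L,\,S}\ \ \|L\|_{*} + \lambda \|S\|_{1}
\quad \text{subject to} \quad M = L + S
\end{align*}
% \|L\|_{*}: nuclear norm (sum of singular values), promotes a low-rank background
% \|S\|_{1}: entrywise l1 norm, promotes a sparse foreground
% \lambda > 0: trade-off weight; 1/\sqrt{\max(n_1, n_2)} is a common default
%              for an n_1 x n_2 matrix M
```

Solving this problem needs all frames at once, which is the limitation the online, per-frame approach described in the abstract is meant to remove.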
ARTICLE | doi:10.20944/preprints202206.0202.v1
Subject: Social Sciences, Economics Keywords: Machine Learning; Clusterization; Elbow Method; Prediction; Correlation Matrix; Principal Component Analysis; Binary and non-Binary regression models
Online: 14 June 2022 (09:54:46 CEST)
The following article presents an analysis of the determinants of diabetes using a dataset containing the surveys of 2000 patients from the Frankfurt Hospital in Germany. The data were analyzed using the following models: Tobit, Probit, Logit, Multinomial Logit, OLS, and WLS with heteroskedasticity. The results show that the presence of diabetes is positively associated with "Pregnancies", "Glucose", "BMI", "Diabetes Pedigree Function", "Age" and negatively associated with "Blood Pressure". A cluster analysis was carried out using the fuzzy c-Means algorithm optimized with the Elbow method, and three clusters were found. Finally, eight different machine learning algorithms were compared to select the best-performing algorithm for predicting the probability that a patient develops diabetes.
ARTICLE | doi:10.20944/preprints201911.0079.v1
Subject: Earth Sciences, Environmental Sciences Keywords: environmental fate; Raman spectroscopy; chemometrics; principal component analysis, biodegradation; kinetics; post-processing; Whittaker filter; partial least square
Online: 8 November 2019 (03:03:34 CET)
Surfactants based on polyfluoroalkyl ethers are commonly used in fire-fighting foams on airport platforms, including for training sessions. Because of their persistence in the environment, their toxicity and their bioaccumulation, abnormal amounts can be found in ground and surface water following operations on airport platforms. As with many other anthropogenic organic compounds, concerns have been raised about their biodegradation. That is why the OECD 301 F protocol was implemented to assess the oxygen consumption during the biodegradation of a commercial fire-fighting foam. In addition, Raman spectroscopic monitoring of the process was attached to this experimental procedure to evaluate to what extent a polyfluoroalkyl ether disappeared from the environmental matrix. The relevance of our approach lies in using chemometrics, including Principal Component Analysis (PCA) and Partial Least Squares (PLS), to monitor the kinetics of the biodegradation reaction of one fire-fighting foam, Tridol S3B, containing a polyfluoroalkyl ether. This study provided a better appreciation of the partial biodegradation of some polyfluoroalkyl ethers by coupling Raman spectroscopy and chemometrics. This will ultimately facilitate the design of future purification and remediation devices for airport platforms.
ARTICLE | doi:10.20944/preprints201809.0344.v1
Subject: Social Sciences, Economics Keywords: inclusive growth; CEE countries; sustainable development; globalization; cohesion; public policy; factor analysis; principal component analysis; bibliometric analysis
Online: 18 September 2018 (10:38:58 CEST)
Referring to the concept of inclusive growth, the authors analyse the transition economies of the Central and Eastern European countries which are current EU members (Bulgaria, Croatia, the Czech Republic, Estonia, Latvia, Lithuania, Hungary, Poland, Romania, Slovakia and Slovenia). This region was selected because the CEE countries are characterized by a comparable historic and economic background but now seem to have reached different stages of development. The objective of the study is to identify the level of inclusive growth among the CEE countries, taking into account indicators assigned to its seven pillars. The thesis is that the CEE countries exhibit social and economic heterogeneity as well as different levels of sustainable development. The research methods involved the application of principal components analysis and multivariate analysis. For the literature review, a bibliometric analysis was conducted with the visualization prepared by the VOSviewer software. The main findings suggest that Estonia, Slovenia and the Czech Republic seem to be the ones with the highest inclusive growth. On the other hand, Bulgaria and Romania represent the lowest level of inclusive growth indicators.
ARTICLE | doi:10.20944/preprints202105.0716.v1
Subject: Engineering, Civil Engineering Keywords: hard rock mine; cemented rock fill (CRF); backfilling step scenario; major principal stress; stress concentration factor (SCF); displacement
Online: 31 May 2021 (08:43:52 CEST)
Cemented rock fill (CRF) is commonly used in cut-and-fill stoping operations in underground mining. This allows for the maximum recovery of ore. Backfilling can improve stope stability in underground workings, and in turn improve ground stability of the whole mine site. Backfilling step scenarios vary from site to site. This paper presents the investigation of five different backfilling step scenarios and their impacts on the stability of stopes at four different mining levels. A comprehensive comparison of displacements, major principal stress and stress concentration factor (SCF) was conducted. The results show that different backfilling step scenarios have little influence on the final displacement in the stopes. Among the five backfilling scenarios, the major principal stress and stress concentration factor (SCF) have almost the same final results. The backfilling scenario SCN-1 is the optimum option among these five backfilling scenarios. It can immediately prevent the increase of the displacement and reduce the sidewall stress concentration, thereby preventing possible failures. Using CRF of the same strength achieves the same effect across the four mining levels. Applying backfilling CRF of the same strength at different mining depths is acceptable and feasible to improve the stability of the stopes.
ARTICLE | doi:10.20944/preprints201909.0139.v1
Subject: Mathematics & Computer Science, Other Keywords: computer-aided detection (CAD) system; computerized tomography (CT) scan; acquisition; segmentation; Classification and Principal Components Analysis (PCA)
Online: 14 September 2019 (18:25:45 CEST)
Lung cancer is a deadly disease if not diagnosed in its early stages. However, early detection of lung cancer is a challenging task due to the shape and size of its nodules. Radiologists need support from automated tools to form a precise opinion. Automated detection of affected lung nodules is difficult because of their shape similarity to healthy tissue. Over the years, several expert systems have been developed to help radiologists diagnose lung cancer. In this article, we propose a framework to precisely detect lung cancer by classifying nodules as benign or malignant. The framework is tested using a subset of the publicly available Lung Image Database Consortium image collection (LIDC-IDRI). Multiple techniques, including filtering and noise removal, are applied for pre-processing. Subsequently, OTSU and semantic segmentation are used to accurately detect the unhealthy lung nodules. In total, 13 nodule features were extracted using the Principal Components Analysis (PCA) algorithm. Four optimal features are selected based on the classification performance. In the classification phase, 9 different classifiers are used along with two types of validation schemes, i.e. train-test holdout validation with a 70-30 data split and 10-fold cross-validation. Our experiments show that the proposed system provides 99.23% accuracy using the logic boost classifier.
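The two validation schemes named in the abstract above (a 70-30 holdout split and 10-fold cross-validation) can be sketched in a few lines. This is only an illustration of the general schemes, not the authors' code: the function names `holdout_split` and `kfold_splits`, the shuffling, and the seed are assumptions introduced here.

```python
import random


def holdout_split(n_samples, train_fraction=0.7, seed=0):
    """Shuffle sample indices and split them into train and test parts."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    cut = int(round(train_fraction * n_samples))
    return indices[:cut], indices[cut:]


def kfold_splits(n_samples, k=10, seed=0):
    """Yield (train, test) index lists; every sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index starting at position i.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test
```

With 100 samples, `holdout_split(100)` gives 70 training and 30 test indices, and each of the ten pairs from `kfold_splits(100)` holds out a different tenth of the data. A classifier would be fitted on the training indices and scored on the test indices in each case.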
ARTICLE | doi:10.20944/preprints201906.0095.v1
Subject: Keywords: paracetamol; laser-induced breakdown spectroscopy; cyanide; carbon Swan bands; principal component analysis; Raman spectroscopy; Fourier-Transform-infra-red spectroscopy
Online: 11 June 2019 (10:42:58 CEST)
Laser-induced breakdown spectroscopy (LIBS) of pharmaceutical drugs that contain paracetamol is investigated in air and argon atmospheres. Characteristic neutral and ionic spectral lines of various elements and molecular signatures of the CN violet and C2 Swan band systems are observed. The relative hardness of all drug samples is measured as well. Principal component analysis, a multivariate method, is applied in the data analysis to demarcate the drug samples. The CN violet and C2 Swan spectral radiances are investigated to evaluate a possible correlation with the chemical and molecular structures of the pharmaceuticals. Complementary Raman and Fourier-transform infrared spectroscopies are used to record molecular spectra of the drug samples. The application of the above techniques to drug screening is important for the identification and mitigation of drugs that reveal additives that may cause adverse side-effects.
ARTICLE | doi:10.20944/preprints201711.0020.v2
Subject: Mathematics & Computer Science, Information Technology & Data Management Keywords: the free-energy principle; internal model hypothesis; unconscious inference; infomax principle; predictive information; independent component analysis; principal component analysis
Online: 11 May 2018 (06:24:06 CEST)
The mutual information between the state of a neural network and the state of the external world represents the amount of information stored in the neural network that is associated with the external world. In contrast, the surprise of the sensory input indicates the unpredictability of the current input. In other words, this is a measure of inference ability, and an upper bound of the surprise is known as the variational free energy. According to the free-energy principle (FEP), a neural network continuously minimizes the free energy to perceive the external world. For the survival of animals, inference ability is considered to be more important than simply memorized information. In this study, the free energy is shown to represent the gap between the amount of information stored in the neural network and that available for inference. This concept involves both the FEP and the infomax principle, and will be a useful measure for quantifying the amount of information available for inference.
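The statement above that the variational free energy is an upper bound on the surprise can be written out in the standard form. The symbols are assumptions chosen for illustration and are not taken from the abstract: $s$ is the sensory input, $u$ the hidden state of the external world, $p$ the generative model, and $q$ the recognition density encoded by the network.

```latex
\begin{align}
F(s, q) &= \mathbb{E}_{q(u)}\bigl[\ln q(u) - \ln p(s, u)\bigr] \\
        &= -\ln p(s) + D_{\mathrm{KL}}\bigl[\, q(u) \,\|\, p(u \mid s) \,\bigr] \\
        &\ge -\ln p(s).
\end{align}
```

Because the Kullback-Leibler divergence is non-negative, $F$ can never fall below the surprise $-\ln p(s)$, and the two are equal exactly when $q(u)$ matches the posterior $p(u \mid s)$.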
ARTICLE | doi:10.20944/preprints202108.0325.v1
Subject: Earth Sciences, Environmental Sciences Keywords: Multi-granularity encoding neural networks (MGNNE); feature extraction; multilayer perceptron (MLP); Principal component analysis (PCA); Remote Sensing image classification; LCLU
Online: 16 August 2021 (11:28:21 CEST)
Deep learning is the state of the art in machine learning classification. Earlier work shows that deep convolutional neural networks (CNNs) have been applied successfully to data such as images and video. Here they are applied to recognizing and classifying remote sensing images of the earth's surface in terms of land cover and land use (LCLU). First, this article summarizes emerging remote sensing applications and the challenges they pose for deep learning methods. Second, we propose four approaches to learn efficient and effective CNNs that transfer image representations learned on the ImageNet dataset to LCLU datasets. We use VGG16, Inception-ResNet-V2, Inception-V3, and DenseNet201 models, pre-trained on ImageNet, to extract features from the EACC dataset. For feature selection, we propose principal component analysis (PCA) to improve accuracy and speed up the model. We train a multi-layer perceptron (MLP) as the classifier. Lastly, we apply the multi-granularity encoding ensemble model. We achieve an overall accuracy of 92.3% for the nine-class classification problem. This work will help remote sensing scientists understand deep learning tools and apply them to large-scale remote sensing challenges.
ARTICLE | doi:10.20944/preprints201709.0092.v1
Subject: Engineering, Energy & Fuel Technology Keywords: renewable energy networks; principal component analysis; large-scale integration of renewables; wind power; solar power; super grid; energy system design
Online: 20 September 2017 (04:44:36 CEST)
Due to its spatio-temporal variability, the mismatch between the weather and demand patterns challenges the design of highly renewable energy systems. A principal component analysis is applied to a simplified networked European electricity system with a high share of wind and solar power generation. It reveals a small number of important mismatch patterns, which explain most of the system's required backup and transmission infrastructure. Whereas the first principal component is already able to reproduce most of the temporal mismatch variability for a solar dominated system, a few more principal components are needed for a wind dominated system. Due to its monopole structure the first principal component causes most of the system's backup infrastructure. The next few principal components have a dipole structure and dominate the transmission infrastructure of the renewable electricity network.
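To make the idea of a dominant "monopole" mismatch pattern concrete, the sketch below extracts the first principal component of a small mismatch time series by power iteration on the covariance matrix. It is a minimal illustration under stated assumptions, not the method used in the paper: the function name `leading_component`, the row/column layout (rows are time steps, columns are network nodes), and the pure-Python power iteration are all choices made here.

```python
import math
import random


def leading_component(rows, iterations=200, seed=0):
    """Return (unit vector, explained-variance share) of the first principal component.

    rows: list of time steps, each a list with one mismatch value per node.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix between nodes.
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]  # random start vector
    for _ in range(iterations):
        # Multiply by the covariance matrix and renormalize.
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[a] * sum(cov[a][b] * v[b] for b in range(d)) for a in range(d))
    total_variance = sum(cov[a][a] for a in range(d))
    return v, eigenvalue / total_variance
```

If every node shares one common fluctuation plus a small local term, the returned vector has entries of a single sign (a monopole) and the explained share is close to one. A dipole pattern would instead show entries of opposite sign on different groups of nodes.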
ARTICLE | doi:10.20944/preprints201912.0163.v4
Subject: Mathematics & Computer Science, Probability And Statistics Keywords: ARCH; ARMA; functional data; functional principal components; functional time series; GARCH; invertible linear processes; parameter estimation; stationary solutions; Yule-Walker equation
Online: 23 September 2020 (04:32:09 CEST)
Conditionally heteroskedastic financial time series are commonly modelled by (G)ARCH processes. ARCH(1) and GARCH were recently established in C[0,1] and L^2[0,1]. This article provides sufficient conditions for the existence of strictly stationary solutions, weak dependence and finite moments of (G)ARCH processes of any order in C[0,1] and L^p[0,1]. It deduces explicit asymptotic upper bounds of estimation errors for the shift term, the complete (G)ARCH operators and the projections of ARCH operators on finite-dimensional subspaces. The operator estimation is based on Yule-Walker equations, and estimating the GARCH operators also involves a result on estimating operators in invertible linear processes that is valid beyond the scope of (G)ARCH. Moreover, our results regarding (G)ARCH can be transferred to functional AR(MA).
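For orientation, a functional GARCH(p, q) process of the kind discussed above is commonly written as follows. The notation is an assumption made for this note, not quoted from the article: $X_k$ and $\sigma_k$ are functions on $[0,1]$, products and squares are taken pointwise, $\varepsilon_k$ is an i.i.d. innovation sequence, $\delta$ is the shift term, and $\alpha_i$, $\beta_j$ are the (G)ARCH operators.

```latex
\begin{align}
X_k &= \sigma_k \, \varepsilon_k, \\
\sigma_k^2 &= \delta + \sum_{i=1}^{p} \alpha_i\bigl(X_{k-i}^2\bigr)
            + \sum_{j=1}^{q} \beta_j\bigl(\sigma_{k-j}^2\bigr).
\end{align}
```

Setting all $\beta_j = 0$ gives the ARCH(p) case, and $p = 1$ with no $\beta$ terms recovers the functional ARCH(1) model mentioned at the start of the abstract.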
ARTICLE | doi:10.20944/preprints202005.0356.v1
Subject: Mathematics & Computer Science, Artificial Intelligence & Robotics Keywords: Supervised Learning; Time Series Classification; Jamming Detection; Automatic Modulation Classification; Feature Selection; Genetic Algorithm; Principal Component Analysis; QPSK modulation; APSK modulation
Online: 23 May 2020 (05:10:36 CEST)
Satellite communication (Satcom) relies on artificial geostationary satellites to facilitate a wide range of telecommunications. Its quality of service (QoS) and security are crucial in government/military applications. The most challenging situation for efficient Satcom is a radio frequency interference (RFI) environment. Thus, it is necessary to ensure that transmissions are incorruptible, or at least to sense the quality of the spectrum. This paper presents a new method to recognize received signal characteristics using hierarchical classification in a multi-layer perceptron neural network. We consider the signal modulation and the type of RFI as the characteristics of a real-time video stream transmitted by a direct broadcast satellite. Four different modulation types are investigated in this study. Moreover, the combination of the communication signal with various kinds of interference, and its effect on the classification method, has been widely analyzed. In addition, two robust feature selection techniques have been developed to reduce the dimensionality of the data set, which optimizes the classification process. The results show that the Genetic Algorithm (GA) slightly outperforms Principal Component Analysis (PCA) for feature selection. Furthermore, the robustness of the proposed techniques in detecting unknown signals is assessed at different signal-to-noise ratios.
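Feature selection with a genetic algorithm, as mentioned above, is usually framed as a search over bit masks in which a 1 keeps a feature and a 0 drops it. The sketch below shows that general idea only. The operators (truncation selection, one-point crossover, bit-flip mutation, elitism), the parameter values, and the function names are assumptions made here, and `toy_fitness` is a hypothetical stand-in for what would in practice be a classifier's validation accuracy.

```python
import random


def toy_fitness(mask, informative=(0, 2, 5), penalty=0.2):
    """Hypothetical score: reward informative features, penalize mask size."""
    return sum(mask[i] for i in informative) - penalty * sum(mask)


def ga_select(n_features, fitness, pop_size=20, generations=30,
              mutation_rate=0.1, seed=0):
    """Evolve feature masks; return the best mask and the best-fitness history."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        parents = population[: pop_size // 2]   # truncation selection
        children = [population[0][:]]           # elitism: keep the best mask
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randint(1, n_features - 1)  # one-point crossover
            child = a[:cut] + b[cut:]
            # Flip each bit independently with probability mutation_rate.
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child]
            children.append(child)
        population = children
    best = max(population, key=fitness)
    history.append(fitness(best))
    return best, history
```

Calling `ga_select(8, toy_fitness)` returns an 8-bit mask and a history of best scores. Because the best mask is always carried over unchanged, that history never decreases from one generation to the next.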
ARTICLE | doi:10.20944/preprints201907.0033.v1
Subject: Physical Sciences, Atomic & Molecular Physics Keywords: laser-induced plasma; atomic spectroscopy; laser-induced breakdown spectroscopy; principal component analysis; partial least-square regression; gypsum; Mars
Online: 2 July 2019 (08:03:52 CEST)
The first detection of gypsum (CaSO4·2H2O) by the Mars Science Laboratory (MSL) rover Curiosity in the Gale Crater, Mars, had a profound impact on planetary science and exploration. The unique capability of plasma spectroscopy for in situ elemental analysis in extraterrestrial environments suggested the presence of water on the red planet based on phase characterization and provided a clue to the Martian paleoclimate. The key to gypsum as an ideal paleoclimate proxy lies in its textural variants, and in this study terrestrial gypsum samples from varied locations and textural types have been analyzed by the laser-induced breakdown spectroscopy (LIBS) technique. Petrographic, sub-microscopic and powder X-ray diffraction characterizations confirm the presence of gypsum (hydrated calcium sulphate; CaSO4·2H2O), bassanite (semi-hydrated calcium sulphate; CaSO4·1/2H2O) and anhydrite (anhydrous calcium sulphate; CaSO4) along with accessory phases (quartz and jarosite). Principal component analysis of the LIBS spectra shows that texturally varied gypsums can be differentiated from one another because of the chemical variability in their elemental concentrations. The concentration of gypsum is determined from a partial least-squares regression model. Rapid characterization of gypsum samples with LIBS is expected to work well in extraterrestrial environments.
Subject: Mathematics & Computer Science, Applied Mathematics Keywords: Approach Path Management; Atypical Flight Event; Non-Compliant Approach; Real Time; Anomaly Detection; Functional Principal Component Analysis; Unsupervised Learning; Dubins Path
Online: 12 March 2021 (21:17:22 CET)
In this paper, a complete tool for real-time detection of atypical energy behaviors of airplanes is presented. The methodology extends in real time an existing offline process using Dubins trajectories as a predictor of the remaining distance to the runway threshold. Two major contributions are presented in this paper. First, a real-time measure of the aircraft energy behaviour is defined, indicating whether the aircraft is in good condition to intercept the extended runway centreline from its current position. Secondly, a 2D trajectory suggestion is given, allowing safe management of the approach path according to atypical criteria of historical data. Finally, this document proposes a comprehensive tool for air traffic controllers, which is a major step forward in understanding, becoming aware of and resolving critical situations that could lead to accidents.
ARTICLE | doi:10.20944/preprints201810.0367.v1
Subject: Mathematics & Computer Science, Applied Mathematics Keywords: action integral, fiber bundle, connection in a principal fiber bundle and its curvature, pull-back of forms, Lie groups and their algebras.
Online: 16 October 2018 (16:44:23 CEST)
In this paper we show that general relativity in the recent Einstein-Palatini formulation is equivalent to a gauge field. We begin with a brief account of the Einstein-Palatini formulation and derive the Einstein field equations from it. In the next section, we consider general relativity with a positive cosmological constant in terms of the corrected curvature. We show that, in terms of the corrected curvature, general relativity takes the form typical of a gauge field. Finally, we give a geometrical interpretation of the corrected curvature.
ARTICLE | doi:10.20944/preprints202012.0001.v1
Subject: Life Sciences, Biochemistry Keywords: Mungbean; low phosphorus; drought stress; organic acid exudation; photosynthetic rate; relative water content; membrane stability index; stress susceptibility index; principal component analysis ranking
Online: 1 December 2020 (08:05:27 CET)
To understand the physiological basis of tolerance to the combined stresses of low phosphorus (P) and drought in mungbean (Vigna radiata (L.) R. Wilczek), a core set of 100 accessions was evaluated in hydroponics at sufficient (250 μM) and low (3 μM) P, and exposed to drought (dehydration) stress. The principal component analysis and ranking of accessions based on relative values revealed that IC280489, EC397142, IC76415, IC333090, IC507340 and IC121316 performed well while IC119005, IC73401, IC488526 and IC325853 performed poorly in all treatments. Selected accessions were evaluated in soil under control (sufficient P, irrigated), low P (without P, irrigated), drought (sufficient P, withholding irrigation) and combined stress (low P, withholding irrigation). Under combined stress, a significant reduction in gas exchange traits (photosynthesis, stomatal conductance, transpiration, instantaneous water use efficiency) and in P uptake in seed and shoot was observed as compared to the individual stresses. Among accessions, IC488526 was the most sensitive while IC333090 and IC507340 exhibited tolerance to individual or combined stress. Water balance and low P adaptation traits such as membrane stability index, relative water content, specific leaf weight, organic acid exudation, biomass, grain yield and P uptake can be used as physiological markers to evaluate agronomic performance. Accessions with considerable resilience to low P and drought stress can either be used as ‘donors’ in Vigna breeding programs or cultivated in areas with limited P availability, limited water availability, or both.
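Two of the markers named above have widely used textbook definitions, sketched here for readers unfamiliar with them. The abstract itself gives no formulas, so these are assumptions about the conventional forms: relative water content computed from fresh, turgid and dry leaf weights, and the stress susceptibility index computed from an accession's yield relative to the mean yield of all accessions. The function and argument names are chosen for this note.

```python
def relative_water_content(fresh, turgid, dry):
    """Relative water content in percent from leaf weights (same units)."""
    return 100.0 * (fresh - dry) / (turgid - dry)


def stress_susceptibility_index(y_stress, y_control, mean_stress, mean_control):
    """Yield loss of one accession relative to the average yield loss.

    y_stress, y_control: yield of the accession under stress and control.
    mean_stress, mean_control: mean yields over all accessions.
    """
    stress_intensity = 1.0 - mean_stress / mean_control
    return (1.0 - y_stress / y_control) / stress_intensity
```

Under this convention, an index below 1 means the accession loses proportionally less yield than the average accession, and an index above 1 means it loses more.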
ARTICLE | doi:10.20944/preprints202111.0519.v1
Subject: Engineering, Biomedical & Chemical Engineering Keywords: principal component analysis (PCA); motion model; respiratory-correlated four-dimensional cone-beam CT (4D-CBCT); lung cancer; stereotactic body radiotherapy (SBRT); image-guided radiation therapy (IGRT)
Online: 29 November 2021 (10:04:11 CET)
A method for generating fluoroscopic (time-varying) volumetric images using patient-specific motion models derived from 4-dimensional cone-beam CT (4D-CBCT) images is developed. 4D-CBCT images acquired immediately prior to treatment have the potential to accurately represent patient anatomy and respiration during treatment. Fluoroscopic 3D image estimation is done in two steps: 1) deriving motion models and 2) optimization. To derive motion models, every phase in a 4D-CBCT set is registered to a reference phase chosen from the same set using deformable image registration (DIR). Principal components analysis (PCA) is used to reduce the dimensionality of the displacement vector fields (DVFs) resulting from DIR into a few vectors representing organ motion found in the DVFs. The PCA motion models are optimized iteratively by comparing a cone-beam CT (CBCT) projection to a simulated projection computed from both the motion model and a reference 4D-CBCT phase, resulting in a sequence of fluoroscopic 3D images. Patient datasets were used to evaluate the method by estimating the tumor location in the generated images compared to manually defined ground truth positions. Experimental results showed that the average tumor mean absolute error (MAE) along the superior-inferior (SI) direction and the 95th percentile in two patient datasets were (2.29 mm and 5.79 mm) for patient 1 and (1.89 mm and 4.82 mm) for patient 2. This study has demonstrated the feasibility of deriving 4D-CBCT-based PCA motion models that have the potential to account for the 3D non-rigid patient motion and localize tumors and other patient anatomical structures on the day of treatment.
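The two steps described above (a PCA motion model followed by projection-based optimization) can be summarized in equations. The notation is an assumption introduced here for illustration, not taken from the article: $\mathbf{d}$ is a displacement vector field, $\bar{\mathbf{d}}$ the mean DVF over the 4D-CBCT phases, $\mathbf{u}_k$ the $k$-th PCA eigenvector, $w_k$ its weight, $I_{\mathrm{ref}}$ the reference phase, $\mathcal{P}$ the cone-beam projection operator, and $p_t$ the measured projection at time $t$.

```latex
\begin{align}
\mathbf{d}(\mathbf{w}) &= \bar{\mathbf{d}} + \sum_{k=1}^{K} w_k \, \mathbf{u}_k, \\
\mathbf{w}_t^{*} &= \arg\min_{\mathbf{w}}
  \bigl\| \mathcal{P}\bigl[\, I_{\mathrm{ref}}\bigl(\mathbf{x} + \mathbf{d}(\mathbf{w})\bigr) \bigr] - p_t \bigr\|^2 .
\end{align}
```

Only the $K$ weights are optimized for each projection, which is what makes the problem tractable: the fluoroscopic 3D image at time $t$ is the reference phase deformed by $\mathbf{d}(\mathbf{w}_t^{*})$.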