TECHNICAL NOTE | doi:10.20944/preprints202209.0404.v1
Subject: Engineering, Energy And Fuel Technology Keywords: Recurrent Neural Network; Renewable Energy; Power consumption; Open Power System Data; Multivariate Exploratory; Time series forecasting
Online: 27 September 2022 (02:44:29 CEST)
The environmental issues we are currently facing require long-term prospective efforts for sustainable growth. Renewable energy sources seem to be one of the most practical and efficient alternatives in this regard. Understanding a nation's pattern of energy use and renewable energy production is crucial for developing strategic plans. No previous study has been performed to explore the dynamics of power consumption with the change in renewable energy production on a country-wide scale. In contrast, a number of deep learning algorithms demonstrated acceptable performance while handling sequential data in the era of data-driven predictions. In this study, we developed a scheme to investigate and predict total power consumption and renewable energy production time series for eleven years of data using a Recurrent Neural Network (RNN). The dynamics of the interaction between the total annual power consumption and renewable energy production are investigated through extensive Exploratory Data Analysis (EDA) and a feature engineering framework. The performance of the model is found satisfactory through the comparison of the predicted data with the observed data, visualization of the distribution of the errors and Root Mean Squared Error (RMSE) value of 0.084. Higher performance is achieved through the increase in the number of epochs and hyperparameter tuning. The proposed framework can be used and transferred to investigate the trend of renewable energy production and power consumption and predict the future scenarios for different communities. Incorporation of the cloud-based platform into the proposed pipeline may lead to real-time forecasting.
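The RMSE reported above compares a predicted series with the observed one, point by point. A minimal sketch of that metric follows; the function name `rmse` and the sample values are illustrative assumptions and are not taken from the study.

```python
import math

def rmse(observed, predicted):
    """Root Mean Squared Error between two equal-length sequences."""
    if len(observed) != len(predicted):
        raise ValueError("series must have the same length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical normalized consumption values, for illustration only.
observed = [0.52, 0.61, 0.58, 0.70]
predicted = [0.50, 0.65, 0.55, 0.72]
print(round(rmse(observed, predicted), 4))
```

Because RMSE is expressed in the units of the series, a value such as 0.084 is only meaningful relative to the scale (for example, normalized data) on which the model was evaluated.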
ARTICLE | doi:10.20944/preprints202303.0026.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: Time Series Segmentation; Deep Learning; Multivariate Time Series; Transfer Learning; End-of-Line Testing
Online: 2 March 2023 (01:19:34 CET)
Industrial data scarcity is one of the largest factors holding back the widespread use of machine learning in manufacturing. To overcome this problem, the concept of transfer learning was developed, and it has attracted considerable attention in recent industrial research. Our paper focuses on the problem of time series segmentation and presents the first in-depth research on transfer learning for deep-learning-based time series segmentation, using industrial end-of-line pump testing as an example. In particular, we investigate whether the performance of deep learning models can be increased by pretraining the network with data from other domains. Three different scenarios are analyzed: source and target data being closely related, source and target data being distantly related, and source and target data being non-related. The results demonstrate that transfer learning can enhance the performance of time series segmentation models with respect to accuracy and training speed. The benefit is most clearly seen in scenarios where source and target data are closely related and the number of target training samples is lowest. However, in the scenario of non-related datasets, cases of negative transfer learning were observed as well. Thus, the research emphasizes the potential, but also the challenges, of industrial transfer learning.
ARTICLE | doi:10.20944/preprints202211.0247.v1
Subject: Environmental And Earth Sciences, Environmental Science Keywords: Chemometrics; Wastewater; Hydrochemical Characterization; North Africa; Environmental Science; Groundwater; Data Analysis; Time Series; Multivariate Analysis; Statistics
Online: 14 November 2022 (09:25:04 CET)
Drinking water quality is a major concern, especially in African countries. This manuscript aims to analyze the chemical composition of Lioua’s groundwater in order to determine the geological processes influencing the composition and origin of its chemical elements. To this end, chemometric techniques such as multivariate statistical analysis (MSA) and time series methods (TSM) are used. MSA includes principal component analysis (PCA) and cluster analysis (CA), while autocorrelation analysis (AA) supplemented by simple spectral density analysis (SDA) is used for TSM. PCA displays three main factors explaining a total variance (TV) of 85.01%. Factors 1, 2, and 3 account for 68.72%, 11.96%, and 8.89% of TV, respectively. In the CA, three groups were controlled by TDS and EC. G1 reveals a close association between SO₄²⁻, K⁺, Ca²⁺, and TDS; G2 reveals a close association between Na⁺, Cl⁻, Mg²⁺, and EC; G3 shows the dissociation of bicarbonate (HCO₃⁻) and NO₃⁻ from the other chemical elements. AA shows a linear interrelationship among EC, Mg²⁺, Na⁺, K⁺, Cl⁻, and SO₄²⁻. However, NO₃⁻ and HCO₃⁻ appear uncorrelated with the other parameters. For SDA, the correlograms of Mg²⁺, Na⁺, K⁺, Cl⁻, and SO₄²⁻ have a trend similar to that of EC. Nonetheless, pH, Ca²⁺, HCO₃⁻ and NO₃⁻ exhibit multiple peaks related to the presence of several distinct cyclic mechanisms. The methods enabled the authors to conclude that the geochemical processes influencing the chemical composition are: (i) dissolution of evaporated mineral deposits, (ii) water-rock interaction, and (iii) evaporation. In addition, the groundwater exhibits two bipolar characteristics, one recorded with negative and positive charges on pH and Ca²⁺ and another recorded only with negative charges on HCO₃⁻ and NO₃⁻. On the other hand, SO₄²⁻, K⁺, Ca²⁺, and TDS are the predominant elements in the groundwater’s chemical composition. Salts and chlorides contribute most to the electrical conductivity of the water.
The lithological factor dominates the overall mineralization of the Plio-Quaternary surface aquifer waters. The origins of HCO₃⁻ and NO₃⁻ are different: HCO₃⁻ has a carbonate origin, whereas NO₃⁻ has an anthropogenic origin. The salinity was affected by Mg²⁺, SO₄²⁻, Cl⁻, Na⁺, K⁺, and EC. Ca²⁺, HCO₃⁻ and NO₃⁻ result from fertilizers used in human activity, carbonate facies outcrops, and domestic sewage.
ARTICLE | doi:10.20944/preprints202201.0447.v1
Subject: Medicine And Pharmacology, Psychiatry And Mental Health Keywords: mass multivariate analysis; neuroimaging, depression, schizophrenia
Online: 31 January 2022 (11:07:48 CET)
We applied the Mass Multivariate Method to structural, resting-state and task-related fMRI data from two groups of patients, with schizophrenia and depression respectively, in order to define several regions of significant relevance to the differential diagnosis between those conditions. The regions included the left planum polare, the left opercular part of the inferior frontal gyrus (OpIFG), the medial orbital gyrus (MOrG), the posterior insula (PIns), and the parahippocampal gyrus (PHG). This study delivers evidence that a multimodal neuroimaging approach can potentially enhance the validity of psychiatric diagnosis. Neither structural, resting-state, nor task-related functional MRI alone can provide independent biomarkers. Further studies need to consider and implement a model of incremental validity that combines clinical measures with different neuroimaging modalities to discriminate depressive disorders from schizophrenia. Biological signatures of disease at the level of neuroimaging are more likely to underpin broader nosological entities in psychiatry.
ARTICLE | doi:10.20944/preprints201711.0191.v1
Subject: Business, Economics And Management, Econometrics And Statistics Keywords: Pesticides, Vegetable, Nepal, Determinant, Multivariate Probit
Online: 29 November 2017 (13:27:57 CET)
Pesticides are currently a core global concern: they are a boon to farmers facing increasing disease and pest pressure, while pesticide residues are a major worry for human health. For that reason, identifying the factors that affect pesticide application is essential. To identify and evaluate the determinants of pesticide application in Nepal, a survey of 300 households was carried out and an empirical analysis was performed using a multivariate probit model. Powder and liquid forms of pesticides, applied in the summer and winter seasons in vegetable farming, were used as outcome variables. Socio-economic, demographic, farm-level and perception data were used as explanatory variables. Use of chemical fertilizers, age and gender of the head of household, household size and access to weather information were found to be the most influential factors. Moreover, the forms of pesticide and the growing seasons were found to be complementary to each other. Therefore, policy options should be devised to balance the needs of farmers and the health of consumers.
ARTICLE | doi:10.20944/preprints201807.0215.v1
Subject: Engineering, Electrical And Electronic Engineering Keywords: multivariate gaussian mixture model (MVGMM); multivariate linear regression; expectation-maximization imputation; WiFi localization; hidden markov model (HMM)
Online: 12 July 2018 (08:24:06 CEST)
The extensive deployment of wireless infrastructure provides a low-cost way to track mobile users in indoor environments. This paper demonstrates a prototype of an accurate and reliable room location awareness system in a real public environment, where three typical problems arise. First, a massive number of access points (APs) can be sensed, leading to a high-dimensional classification problem. Second, heterogeneous devices record different received signal strength (RSS) levels due to variations in chipset and antenna attenuation. Third, APs are not necessarily visible in every scanning cycle, leading to missing data. This paper presents a probabilistic Wi-Fi fingerprinting method in a hidden Markov model (HMM) framework for mobile user tracking. Considering the spatial correlation of the signal strengths from multiple APs, a Multivariate Gaussian Mixture Model (MVGMM) is fitted to model the probability distribution of RSS measurements in each cell. Furthermore, the unseen property of invisible APs is investigated and shown to be efficient for differentiating between cells. The proposed system is able to achieve comparable localization performance. The field test results show a reliable room-level localization accuracy of 97% for multiple mobile users in a real university campus Wi-Fi network without any prior knowledge of the environment.
ARTICLE | doi:10.20944/preprints202012.0321.v1
Subject: Environmental And Earth Sciences, Atmospheric Science And Meteorology Keywords: quantile regression; groundwater; environmental; multivariate; metals; health
Online: 14 December 2020 (10:13:09 CET)
One of the most important defining characteristics of groundwater quality is pH, as it fundamentally controls the amount and chemical form of many organic and inorganic solutes in groundwater. Groundwater data are frequently characterized by wide variability in the factors that may influence the pH distribution. For this reason, it is challenging to link the spatio-temporal dynamics of pH to a single environmental factor using ordinary least squares regression of the conditional mean. In this study, quantile regression was used to estimate the response of pH to nine environmental factors (As, Cd, Fe, Mn, Pb, turbidity, electrical conductivity, total dissolved solids and nitrates). Results of 25%, 50% and 75% quantile regression and ordinary least squares (OLS) regression were compared. The standard regression of the conditional means (OLS) underestimated the rates of change of pH due to the selected factors in comparison with the regression quantiles. The effect of arsenic increased for sampling locations with higher pH values (higher quantiles), as did the influence of Pb and Mn. However, the effects of Cd and Fe decreased for sampling locations in higher quantiles. It can be concluded that these heterogeneities would have been missed had this study focused exclusively on the conditional means of the pH values. Consequently, quantile regression provides a more comprehensive account of possible spatio-temporal relationships between environmental covariates in groundwater. This study is one of the first to apply this technique to groundwater systems in sub-Saharan Africa. The approach is useful and has broad application to other mining environments, especially in tropical low-income countries where climatic conditions can drive rapid cycling or transformation of pollutants.
It is also pertinent to geopolitical contexts where regulatory, monitoring and management capacities are weak and where mining pollution of groundwater largely occurs.
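The contrast drawn above between conditional-mean regression and quantile regression comes down to the loss function: quantile regression minimizes the asymmetric "pinball" loss rather than squared error. The sketch below is an illustration under simplifying assumptions (a constant-only model and made-up values), not the study's model; the names `pinball_loss` and `best_constant_quantile` are hypothetical.

```python
def pinball_loss(values, prediction, tau):
    """Mean pinball (quantile) loss of a constant prediction at quantile tau."""
    total = 0.0
    for y in values:
        residual = y - prediction
        # Under-predictions are weighted by tau, over-predictions by (1 - tau).
        total += tau * residual if residual >= 0 else (tau - 1) * residual
    return total / len(values)

def best_constant_quantile(values, tau):
    """Pick the sample value that minimizes the pinball loss at quantile tau."""
    return min(values, key=lambda c: pinball_loss(values, c, tau))

# Made-up sample with one extreme value, for illustration only.
sample = [1, 2, 3, 4, 100]
print(best_constant_quantile(sample, 0.5))   # median-type fit
print(best_constant_quantile(sample, 0.25))  # lower-quartile fit
print(sum(sample) / len(sample))             # mean, pulled up by the outlier
```

Fitting the same loss with covariates, one quantile at a time, is what lets the estimated effect of a factor differ between low-pH and high-pH locations.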
ARTICLE | doi:10.3390/sci1030057
Subject: Environmental And Earth Sciences, Atmospheric Science And Meteorology Keywords: adaptation; perception; climate change; Nepal; multivariate probit
Online: 20 September 2019 (00:00:00 CEST)
This study assessed farmers’ perception of climate change, estimated the determinants of adaptation practices, and evaluated the relationship among them using the multivariate probit model. A survey of 300 agricultural households was carried out covering 10 sample districts, considering five agro-ecological zones and a vulnerability index. Four adaptation choices (change in planting date, crop variety, crop type and investment in irrigation) were used as outcome variables, and socioeconomic, demographic, institutional, farm-level and perception variables were used as explanatory variables. Their marginal effects were determined for three climatic variables: temperature, precipitation and drought. Age, gender and education of the head of household, credit access, farm area, rain-fed farming and tenure were found to be more influential than other factors. All four adaptation options were found to be complementary to each other. Importantly, the intensity of impact of dependent variables in different models, and for available adaptation options, was found to be unequal. Therefore, policy options and support facilities should be devised according to climatic variables and adaptation options to achieve superior results.
ARTICLE | doi:10.20944/preprints201808.0118.v1
Subject: Computer Science And Mathematics, Probability And Statistics Keywords: Archimedean Copula; Elliptical Copula; Multivariate Distribution; Hydrology
Online: 6 August 2018 (11:39:25 CEST)
This study identifies the copula that best characterizes the joint probability distribution of rainfall severity and duration in Peninsular Malaysia using two-dimensional copulas. To construct the copulas, the Inference Function for Margins (IFM) and Canonical Maximum Likelihood (CML) methods were used. For copula fitting, the rainfall variables derived from the Standardized Precipitation Index (SPI) were fitted to several distributions. Five copulas, namely Gaussian, Clayton, Frank, Joe and Gumbel, were tested to establish the copula that best fits the data. The tests produced satisfactory copula-fitting results for rainfall severity and duration. According to the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC), only three copulas produced a better fit for the parametric and semi-parametric approaches. Finally, two consistency tests were conducted, and the results showed that the Frank copula produced consistent results.
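Ranking candidate copulas by AIC and BIC, as described above, needs only each fitted model's maximized log-likelihood, its number of parameters and the sample size. The following is a minimal sketch; the log-likelihood values and the sample size are hypothetical placeholders, not results from the study.

```python
import math

def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood

def bic(log_likelihood, n_params, n_obs):
    """Bayesian Information Criterion: k ln n - 2 ln L (lower is better)."""
    return n_params * math.log(n_obs) - 2 * log_likelihood

# Hypothetical fits: name -> (maximized log-likelihood, number of parameters).
fits = {
    "Gaussian": (41.2, 1),
    "Clayton": (35.7, 1),
    "Frank": (44.9, 1),
}
n_obs = 120  # assumed number of severity/duration pairs

ranking = sorted(fits, key=lambda name: aic(*fits[name]))
for name in ranking:
    ll, k = fits[name]
    print(name, round(aic(ll, k), 2), round(bic(ll, k, n_obs), 2))
```

When all candidates have the same number of parameters, as with these one-parameter bivariate families, AIC and BIC give the same ordering as the log-likelihood itself; the penalties matter only when model complexity differs.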
ARTICLE | doi:10.20944/preprints202305.1793.v1
Subject: Public Health And Healthcare, Public Health And Health Services Keywords: COVID-19; Clusters; Multivariate geospatial model; Minas Gerais state
Online: 25 May 2023 (10:12:34 CEST)
Abstract. Background: The first COVID-19 victim was announced by Chinese health authorities on January 11, 2020. On January 13, the first official case outside China was reported, in Thailand. On January 25, the same occurred in São Paulo, and on March 8 the first case was recorded in Minas Gerais. From that point until October 3, 2020, a total of 370,911 cases and 9,204 deaths were recorded in the state. This study aims to investigate spatiotemporal patterns of COVID-19 incidence from March 22 to October 3, 2020. Methods: The database was obtained from the Health Division of Minas Gerais state. The vulnerability index was calculated using principal component analysis. Spatial autocorrelation was tested with Moran's I, using z-scores and a significance threshold of P < 0.05. Results: From March 22 to October 3, 2020, the incidence varied from 45.680/100,000 to 312.130/100,000. The most influential variables were illiteracy, gross domestic product and breathing apparatus per municipality. The clusters were concentrated in the metropolitan area of Belo Horizonte, Zona da Mata and Triangulo Mineiro. Conclusion: The spatial distribution of COVID-19 from week 13 until week 40 showed different levels of endemicity and mesoregional vulnerabilities, as represented in the resulting maps.
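Moran's I, the spatial autocorrelation statistic named in the Methods, can be computed directly from a vector of area values and a spatial weights matrix. The sketch below assumes a tiny chain of four neighbouring areas with binary adjacency weights; the function name `morans_i` and the data are illustrative, and the significance test (z-score and P-value) is omitted.

```python
def morans_i(values, weights):
    """Global Moran's I for values x_i and a spatial weights matrix w_ij."""
    n = len(values)
    mean = sum(values) / n
    deviations = [v - mean for v in values]
    weight_sum = sum(sum(row) for row in weights)
    cross = sum(
        weights[i][j] * deviations[i] * deviations[j]
        for i in range(n)
        for j in range(n)
    )
    variance_term = sum(d * d for d in deviations)
    return (n / weight_sum) * (cross / variance_term)

# Four areas in a line; each area neighbours the next one (assumed layout).
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
print(morans_i([1, 2, 3, 4], adjacency))    # smooth gradient: positive
print(morans_i([1, -1, 1, -1], adjacency))  # alternating pattern: negative
```

Positive values indicate that similar incidence levels cluster in neighbouring areas, which is the pattern a cluster analysis of municipalities is looking for.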
ARTICLE | doi:10.20944/preprints202109.0081.v1
Subject: Chemistry And Materials Science, Food Chemistry Keywords: rootstocks; untargeted metabolomics; features; grafted; multivariate analysis; volatile compounds
Online: 6 September 2021 (09:52:40 CEST)
To allow for a broad survey of subtle metabolic shifts in wine caused by rootstock and irrigation, an integrated metabolomics-based workflow followed by quantitation was developed. This workflow was particularly useful when applied to the poorly studied variety cv. Chambourcin. Because it includes volatile metabolites that might otherwise have been missed by a targeted analysis, the approach allowed deeper modeling of treatment differences, which could then be used to identify important compounds. Wines produced on a per-vine basis over two years were analyzed using SPME-GC-MS/MS. From the 382 and 221 features that differed significantly among rootstocks in 2017 and 2018, respectively, we tentatively identified 94 compounds by library search and retention index, with 22 confirmed and quantified using authentic standards. Own-rooted Chambourcin differed from other root systems for multiple volatile compounds, with fewer differences among grafted vines. For example, the average concentration of β-damascenone was 9.49 µg/L in own-rooted vines and significantly lower (8.59 µg/L) in other rootstocks, whereas mean linalool was significantly higher in the 1103P rootstock than in own-rooted vines. β-Damascenone was higher in regulated deficit irrigation (RDI) than in other treatments. The workflow outlined was shown to be useful not only for scientific investigation but also for creating an analysis protocol that ensures differences of interest to industry are not missed.
ARTICLE | doi:10.20944/preprints202106.0530.v1
Subject: Environmental And Earth Sciences, Atmospheric Science And Meteorology Keywords: airborne LiDAR; forest attributes; multivariate power model; sample size
Online: 22 June 2021 (13:03:33 CEST)
Exploring the effect of sample size on the estimation accuracy of forest attributes from airborne LiDAR over a large area can help optimize the technical application scheme of operational ALS-based large-scale forest stand inventories. In our study, sample datasets composed of different numbers of sample plots were constructed by repeated sampling from 1003 sample plots in a subtropical study area covering 2376 × 10³ km². Sixteen multiplicative power models were built, one for each of four forest attributes in each forest type. Through these models, the variations of the standard deviation (SD) and coefficient of variation (CV) of R² and rRMSE of the forest attribute estimation models for different numbers of sample plots were also analyzed. The results showed that, first, when the sample size increased from 30 to the upper limit, the SD of the forest attributes and LiDAR variables showed a decreasing trend. Second, as the sample size increased, the rRMSE of the 16 forest attribute estimation models gradually decreased, while the R² gradually increased. Third, when the sample size was small, the SDs of both R² and rRMSE of the models were large, and they gradually decreased as the sample size increased. Among the 50 models fitted for each attribute at the same sample size, the mean standard deviation of the forest attributes was lower for the ten best-performing models than for all 50 models, and higher for the ten worst-performing models. As the sample size increased, the accuracy of each forest attribute estimation model for each forest type gradually improved. The variation of the forest attributes and of the LiDAR variables used to build the model are critical factors that affect the model’s accuracy. To efficiently apply airborne LiDAR to survey large-scale subtropical forest resources, the sample sizes for Chinese fir forest, pine forest, eucalyptus forest, and broad-leaved forest should be 110, 80, 85, and 70, respectively.
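A multiplicative power model of the kind mentioned above has the form y = a · x1^b1 · x2^b2 · … and becomes linear after a log transform. The sketch below is deliberately simplified to a single predictor and uses synthetic data; the function names are hypothetical, and rRMSE is assumed here to mean RMSE divided by the mean of the observations, which may differ from the study's definition.

```python
import math

def fit_power_model(x, y):
    """Fit y = a * x**b by least squares on log-transformed data."""
    log_x = [math.log(v) for v in x]
    log_y = [math.log(v) for v in y]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    sxx = sum((u - mean_x) ** 2 for u in log_x)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(log_x, log_y))
    b = sxy / sxx
    a = math.exp(mean_y - b * mean_x)
    return a, b

def r2_and_rrmse(observed, predicted):
    """Coefficient of determination and relative RMSE (RMSE / mean observed)."""
    n = len(observed)
    mean_obs = sum(observed) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot, math.sqrt(ss_res / n) / mean_obs

# Synthetic plots: a LiDAR height metric (m) and a stand attribute, exact power law.
heights = [5.0, 10.0, 15.0, 20.0, 25.0]
volumes = [2.0 * h ** 1.5 for h in heights]
a, b = fit_power_model(heights, volumes)
fitted = [a * h ** b for h in heights]
r2, rrmse = r2_and_rrmse(volumes, fitted)
```

Repeating such a fit on many random subsamples of a given size, and recording the spread of R² and rRMSE across the repetitions, is one way to reproduce the sample-size analysis described in the abstract.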
ARTICLE | doi:10.20944/preprints201810.0374.v1
Subject: Biology And Life Sciences, Biology And Biotechnology Keywords: mini-bioreactors; parallelization; automation; digitalization; multivariate analysis; dynamic processes
Online: 17 October 2018 (06:19:46 CEST)
Mini-bioreactor systems enabling automatized operation of numerous parallel cultivations have been used to accelerate and optimize bioprocess development. As implementation of fed-batch conditions, multiple options of process control and sample analysis are possible, these systems represent valuable screening tools for large-scale production. However, the dynamic behavior of cultivations has not yet been considered regarding data evaluation and decision making during high-throughput screening in mini-bioreactors. In this study, the characterization of Saccharomyces cerevisiae AH22 secreting recombinant endopolygalacturonase is performed in 48 parallel fed-batch cultivations regarding 16 experimental conditions. Automated parallel process control, frequent sampling and analysis were implemented. Data-driven multivariate methods were developed to allow for fast, automated decision making as well as online predictive data analysis regarding endopolygalacturonase production. Using dynamic process information, a cultivation with abnormal behavior could be detected by principal component analysis as well as two clusters of similarly behaving cultivations, later classified according to the feeding rate. By decision tree analysis, cultivation conditions leading to an optimal recombinant product formation could be identified automatically. The developed method is easily adaptable and suitable for automatized process development reducing the experimental times and costs.
ARTICLE | doi:10.20944/preprints201805.0126.v1
Subject: Computer Science And Mathematics, Computer Vision And Graphics Keywords: graph isomorphism problem; multivariate polynomial system; zero-knowledge proof
Online: 8 May 2018 (09:35:15 CEST)
Zero-Knowledge Proofs (ZKPs) provide a reliable way to verify that a claim is true without revealing any information other than the answer. A classical example is the ZKP based on the Graph Isomorphism problem (GI), where a prover must convince the verifier that he knows an isomorphism between two isomorphic graphs without publishing the bijection. We design a novel ZKP that exploits the NP-hard problem of finding the algebraic ideal of a multivariate polynomial set and is consequently resistant to quantum computer attacks. Since this polynomial set is obtained from instances of GI, we guarantee that the protocol is at least as secure as the GI-based protocol.
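The classical GI-based protocol referred to above proceeds in rounds: the prover sends a randomly relabelled copy of one graph, the verifier picks one of the two public graphs, and the prover reveals a relabelling from that graph to the copy. The sketch below simulates one such round with an honest prover. It illustrates only the textbook GI protocol, not the polynomial-ideal scheme proposed in the paper, and all names in it are hypothetical.

```python
import random

def apply_perm(edges, perm):
    """Relabel every vertex v of an undirected graph as perm[v]."""
    return {tuple(sorted((perm[u], perm[v]))) for u, v in edges}

def compose(outer, inner):
    """Permutation that applies `inner` first and `outer` second."""
    return [outer[inner[v]] for v in range(len(inner))]

def zkp_round(g0, g1, sigma, rng):
    """One round; sigma is the prover's secret with g1 == apply_perm(g0, sigma)."""
    rho = list(range(len(sigma)))
    rng.shuffle(rho)
    commitment = apply_perm(g1, rho)   # prover sends a scrambled copy of g1
    challenge = rng.randrange(2)       # verifier picks graph 0 or graph 1
    if challenge == 1:
        response, source = rho, g1                  # reveals only rho
    else:
        response, source = compose(rho, sigma), g0  # reveals rho after sigma
    # Verifier checks that the response maps the chosen graph to the commitment.
    return apply_perm(source, response) == commitment

rng = random.Random(0)
g0 = {(0, 1), (1, 2), (2, 3)}
secret = [2, 0, 3, 1]
g1 = apply_perm(g0, secret)
accepted = all(zkp_round(g0, g1, secret, rng) for _ in range(50))
```

A prover who does not know the isomorphism can prepare a valid answer for at most one of the two challenges, so each round catches a cheater with probability 1/2, and repeating the round drives the cheating probability down exponentially.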
ARTICLE | doi:10.20944/preprints202201.0472.v1
Subject: Medicine And Pharmacology, Pharmacology And Toxicology Keywords: aspirin; pharmacometabolomic; nuclear magnetic resonance; spectroscopy; gastric toxicity; multivariate analysis
Online: 31 January 2022 (17:26:48 CET)
Background: Low-dose aspirin (LDA) is the backbone of secondary prevention of coronary artery disease, though its use is limited by gastric toxicity. This study aimed to identify novel metabolites that could predict LDA-induced gastric toxicity using pharmacometabolomics. Methods: Pre-dose urine samples were collected from male Sprague-Dawley rats. The rats were treated orally with either LDA (10 mg/kg) or 1% methylcellulose (10 ml/kg) for 28 days. The rats' stomachs were examined for gastric toxicity using a stereomicroscope. The urine samples were analyzed using proton nuclear magnetic resonance spectroscopy. Metabolites were systematically identified by exploring established databases, and multivariate analyses were used to identify the spectral pattern of metabolites related to LDA-induced gastric toxicity. Results: Treatment with LDA resulted in gastric toxicity in 20/32 rats (62.5%). The orthogonal projections to latent structures discriminant analysis (OPLS-DA) model displayed a goodness-of-fit (R2Y) value of 0.947, suggesting near-perfect reproducibility, and a goodness-of-prediction (Q2Y) value of -0.185, with perfect sensitivity, specificity and accuracy (100%). Furthermore, the area under the receiver operating characteristic curve (AUROC) was 1. The final OPLS-DA model had an R2Y value of 0.726 and a Q2Y of 0.142, with sensitivity (100%), specificity (95.0%) and accuracy (96.9%). Citrate, hippurate, methylamine, trimethylamine N-oxide and alpha-keto-glutarate were identified as the possible metabolites implicated in LDA-induced gastric toxicity. Conclusion: The study identified metabolic signatures that correlated with the development of low-dose aspirin-induced gastric toxicity in rats. This pharmacometabolomic approach could be further validated to predict LDA-induced gastric toxicity in patients with coronary artery disease.
REVIEW | doi:10.20944/preprints202105.0194.v1
Subject: Medicine And Pharmacology, Immunology And Allergy Keywords: aphthous stomatitis, risk factors, genetic polymorphisms, multivariate analysis, systematic review
Online: 10 May 2021 (13:55:48 CEST)
The cause and prevention of recurrent aphthous stomatitis (also called aphthous ulcers or canker sores) are still unknown. This may be due in part to ignorance of the risk factors present in susceptible people. In this systematic review (PROSPERO record #CRD42019122214), we show that most of the risk factors for the disease are single nucleotide genetic polymorphisms in genes related to the functioning of immune system (TLR4, MMP9, E-selectin, IL-1 beta and TNF-alpha). Single nucleotide genetic polymorphisms do not constitute a modifiable risk. This indicates that, at least in part, susceptibility to recurrent aphthous stomatitis is hereditary, and that these factors cannot be modified.
Subject: Business, Economics And Management, Accounting And Taxation Keywords: performance analysis; elite football; multivariate analysis; principal components analysis; LaLiga
Online: 8 February 2021 (16:18:14 CET)
Principal components analysis provides information about the main characteristics of teams based on a set of indicators, instead of displaying individualized information for each indicator. In this work we used principal components analysis to reduce an extensive data matrix and improve its interpretation. Subsequently, using the new components and multiple linear regression, we carried out a comparative analysis between the best and bottom teams of LaLiga. The sample consisted of the matches of the 2015/16, 2016/17 and 2017/18 seasons. The results showed that the best teams were characterized by, and differentiated from bottom teams in, making a greater number of successful passes and executing a greater number of dynamic offensive transitions. The bottom teams were characterized by executing more defensive than offensive actions and showing fewer goals and a greater ball possession time in the final third of the field. Goals, ball possession time in the final third of the field, number of effective shots and crosses are the main performance factors that influence offensive success in football. This information increases knowledge about the key performance indicators in football.
ARTICLE | doi:10.20944/preprints201608.0118.v1
Subject: Biology And Life Sciences, Anatomy And Physiology Keywords: acclimation; coral reefs; endosymbiosis; molecular biology; multivariate statistics; temperature; upwelling
Online: 11 August 2016 (11:03:03 CEST)
Multivariate statistical approaches (MSA), such as principal components analysis and multidimensional scaling, seek to uncover meaningful patterns within datasets by considering multiple response variables in a concerted fashion. Although these techniques are readily used by ecologists to visualize and explain differences between study sites, they could theoretically be employed to differentiate organisms within an experimental framework while simultaneously identifying response variables that drive documented experimental differences. Therefore, MSA were used herein to attempt to understand the response of the common, Indo-Pacific reef coral Seriatopora hystrix to temperature changes using data from laboratory-based temperature challenge studies performed in Southern Taiwan. Gene expression and physiological data partitioned experimental specimens by time of sampling, treatment temperature, and site of origin upon employing MSA, signifying that S. hystrix and its dinoflagellate endosymbionts display physiological and molecular signatures that are characteristic of sampling time, site of colony origin, and/or temperature regime. These findings promote the utility of MSA for documenting biologically meaningful shifts in the physiological and/or sub-cellular response of marine invertebrates exposed to environmental change.
CONCEPT PAPER | doi:10.20944/preprints202305.1909.v1
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: multivariate Gaussians; correlated random variables; visualization; entropy; relative entropy; mutual information
Online: 26 May 2023 (10:00:47 CEST)
The fundamental objective is to study the application of multivariate data sets under a Gaussian distribution. This paper examines broad measures of structure for both Gaussian and non-Gaussian distributions and shows that they can be described in information-theoretic terms (relative entropy) relating the given covariance matrix and the correlated random variables. To develop the multivariate Gaussian distribution together with entropy and mutual information, several significant methodologies are presented, with the discussion supported by illustrations, both technical and statistical. The content helps readers grasp the fundamental concepts, comprehend the techniques, and properly execute software programs for future study of the topic's science and implementations. A wide range of material is addressed, from basic to applied concerns, involving relative entropy and mutual information as well as correlated covariance analysis based on differential equations.
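For Gaussian variables, the entropy and mutual information discussed above have closed forms that depend only on the covariance matrix. The sketch below works through the bivariate case using the standard formulas h = ½ ln((2πe)^d · det Σ) and I(X;Y) = −½ ln(1 − ρ²), in nats; the function names and the chosen variances and correlation are illustrative assumptions.

```python
import math

def gaussian_entropy_1d(variance):
    """Differential entropy (nats) of a univariate Gaussian."""
    return 0.5 * math.log(2 * math.pi * math.e * variance)

def gaussian_entropy_2d(var_x, var_y, rho):
    """Differential entropy (nats) of a bivariate Gaussian with correlation rho."""
    determinant = var_x * var_y * (1 - rho ** 2)  # det of the 2x2 covariance
    return 0.5 * math.log((2 * math.pi * math.e) ** 2 * determinant)

def gaussian_mutual_information(rho):
    """Mutual information (nats) between two jointly Gaussian variables."""
    return -0.5 * math.log(1 - rho ** 2)

var_x, var_y, rho = 2.0, 0.5, 0.8
from_entropies = (
    gaussian_entropy_1d(var_x)
    + gaussian_entropy_1d(var_y)
    - gaussian_entropy_2d(var_x, var_y, rho)
)
print(from_entropies, gaussian_mutual_information(rho))
```

The two printed values agree because I(X;Y) = h(X) + h(Y) − h(X,Y); the variances cancel, so the mutual information of a Gaussian pair depends only on the correlation coefficient.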
ARTICLE | doi:10.20944/preprints202206.0328.v1
Subject: Biology And Life Sciences, Ecology, Evolution, Behavior And Systematics Keywords: mangrove forests; Marine Protected Areas; α-diversity; β-diversity; multivariate analyses
Online: 24 June 2022 (03:28:50 CEST)
Differences in fish assemblages’ structure and their relation with environmental variables (due to the variations in sampled seasons, habitats, and zones), were analyzed in two adjacent estuaries on the north Pacific coast of Mexico. Environmental variables and fish catches were registered monthly between August 2018 and October 2020. Multivariate analyses were conducted to define habitats and zones based on their environmental characteristics, and the effect of this variability on fish assemblages’ composition, biomass, and diversity (α and β) was evaluated. A total of 12,008 fish individuals of 143 species were collected using different fishing nets. Multivariate analyses indicated that fish assemblages’ structure was different between zones due to the presence, height, and coverage of distinct mangrove species. Additionally, factors such as depth and salinity showed effects on fish assemblages’ diversity (α and β-nestedness), which presented higher values in the ocean and remained similar in the rest of the analyzed zones and habitats. These results and the differences in species replacement (β-turnover) indicate the singularity of fish assemblages at estuaries (even in areas very close to the ocean), and the necessity to establish local management strategies for these ecosystems.
ARTICLE | doi:10.20944/preprints202106.0470.v1
Subject: Environmental And Earth Sciences, Atmospheric Science And Meteorology Keywords: heavy metals; surface sediment; Manila Bay; pollution; multivariate analysis; ecological risk
Online: 18 June 2021 (08:32:18 CEST)
Recent work on heavy metal pollution in Manila Bay suggests elevated concentrations in the surface sediments. It is critical to identify the sources of these heavy metals to effectively rehabilitate the bay. Our study investigated the sources of the heavy metal pollution that ended up in Manila Bay and the risks associated with these toxic metals based on a recently conducted survey. Surface sediment samples with higher heavy metal concentrations were found in the upper to middle parts of the bay while lower concentrations were in the southeast areas. Multivariate analyses such as hierarchical cluster analysis (HCA), principal component analysis (PCA), and Pearson correlation analysis were used to identify the sources of the heavy metals. The heavy metal pollution in Manila Bay is attributed to several rivers draining northeast of Manila Bay, particularly the Marilao-Meycauayan-Obando River System (MMORS), which is cited as one of the 30 dirtiest river systems in the world. The assessment of ecological risks associated with heavy metals in the sediments found higher incidences of toxicity in the north and middle parts of Manila Bay. Cu and Cr posed higher risks of toxicity than any other heavy metals. Based on our analysis, the counterclockwise water gyre of the bay can explain the distribution and ecological risks associated with the heavy metals, as supported by the findings of the PCA. Given the high priority the Philippine government places on rehabilitating the bay, our study strongly shows that efforts to restore the ecological status of Manila Bay will only succeed if the pollution from major rivers draining into it is properly addressed.
ARTICLE | doi:10.20944/preprints202105.0412.v1
Subject: Computer Science And Mathematics, Algebra And Number Theory Keywords: Expression of multilayer network models, oriented graph, multivariate model, nonlinear regression
Online: 18 May 2021 (10:26:19 CEST)
Neural network models are mostly represented by oriented graphs where only the components, the constitutive elements of the graph, are transcribed into mathematical expressions. However, accurate knowledge of the full expression of the model is required in certain situations, such as selecting, among several reference models, the one that best fits the available data, or comparing the explanatory and predictive performance of an established model with respect to some reference models. In this paper, we establish a formalism of the mathematical expression for the multilayer perceptron neural network in a general framework, MLP-p-n-q, with p, n and q natural integers, and show its restriction to cases where one has a hidden layer and multivariate outputs (MLP-p-1-q), and then a single output (MLP-p-1-1). Then, we give some specific cases of the most commonly used models. An application case is presented in the context of solving a nonlinear regression problem.
ARTICLE | doi:10.20944/preprints202012.0080.v1
Subject: Medicine And Pharmacology, Immunology And Allergy Keywords: multivariate linear method; validation; diagnosis; discriminative; signatures of disease; schizophrenia; depression
Online: 3 December 2020 (10:38:31 CET)
In order to overcome this problem our group designed a novel machine learning technique, multivariate linear method (MLM) which can capture convergent data from voxel-based morphometry, functional resting state and task-related neuroimaging and the relevant clinical measures. In this paper we report results from convergent cross-validation of biological signatures of disease in a sample of patients with schizophrenia as compared to depression. Our model provides evidence that the combination of the neuroimaging and clinical data in MLM analysis can inform the differential diagnosis in terms of incremental validity to reach 90 % accuracy of the prediction.
ARTICLE | doi:10.20944/preprints202007.0195.v1
Subject: Biology And Life Sciences, Animal Science, Veterinary Science And Zoology Keywords: cranial variation; otters (Lutra lutra); 3D surface scanning; multivariate statistical methods
Online: 9 July 2020 (12:52:26 CEST)
3D surface scans were carried out to determine the shapes of the upper sections of (skeletal) crania of adult Eurasian otters (Lutra lutra) from Great Britain. Landmark points were placed on these shapes by using a graphical user interface (GUI) and distance measurements (i.e., the length, height, and width of the crania) could be found by using the landmark points. These “GUI-based” distances were shown to be accurate and reliable in comparison to physical measurements taken on the crania directly by using a digital calliper. The crania of males were 6.85mm, 5.44mm, 1.66mm larger in terms of length, width and height, respectively, than females in our sample (P < 0.001), i.e., male otters had significantly larger skulls than females. Significant differences in size occurred also by geographical area in Great Britain (P < 0.05). Multilevel Principal Components Analysis (mPCA) indicated that sex and geographical area explained 31.1% and 9.6% of shape variation in “unscaled” shape data and that they explained 17.2% and 9.7% of variation in “scaled” data. The first mode of variation at level 1 (sex) correctly reflected size changes between males and females for “unscaled” shape data. Modes at level 2 (geographical area) also showed possible changes in size and shape. Clustering by sex and geographical area was observed in standardised component scores. Such clustering in cranial shape by geographical area might reflect genetic differences that are known to occur in otter populations in Great Britain, although other potentially confounding factors (e.g. population age-structure, diet, etc.) might also drive regional differences. Furthermore, sample sizes per group were small for geographical comparisons. However, this work provides a successful first test of the effectiveness of 3D surface scans and multivariate methods such as mPCA to study the cranial morphology of otters.
ARTICLE | doi:10.20944/preprints202004.0392.v1
Subject: Computer Science And Mathematics, Applied Mathematics Keywords: Multivariate Public Key Cryptosystem; Random polynomial; Oil Vinegar signature; Provable Security
Online: 22 April 2020 (06:09:50 CEST)
An oil and vinegar scheme is a signature scheme based on multivariate quadratic polynomials over finite fields. The system of polynomials contains $n$ variables, divided into two groups: $v$ vinegar variables and $o$ oil variables. The scheme is called balanced (OV) or unbalanced (UOV), depending on whether $v = o$ or not, respectively. These schemes are very fast and require modest computational resources, which makes them ideal for low-cost devices such as smart cards. However, the OV scheme has already been proven to be insecure and the UOV scheme has been proven to be very vulnerable for many parameter choices. In this paper, we propose a new multivariate public key signature whose central map consists of a set of polynomials obtained from the multiplication of block matrices. Our construction is motivated by the design of the Simple Matrix Scheme for Encryption and the UOV scheme. We show that it is secure against the Separation Method, which can be used to attack the UOV scheme, and against the Rank Attack, which is one of the deadliest attacks against multivariate public-key cryptosystems. Some theoretical results on matrices with polynomial entries are also given, to support the construction of the scheme.
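The structural property that makes oil and vinegar central maps invertible can be shown in a toy setting: a central polynomial has no oil-times-oil terms, so once the vinegar variables are fixed it becomes affine in the oil variables. The sketch below is only an illustration of that property, not the scheme proposed in the paper; the field size, the helper names, and the omission of linear and constant terms are simplifying assumptions.

```python
import random

Q = 7  # small prime field GF(7), chosen only for illustration

def random_ov_polynomial(v, o, rng):
    """Random quadratic form with vinegar*vinegar and vinegar*oil terms only.

    Variables 0..v-1 are vinegar, v..v+o-1 are oil; no oil*oil term is created.
    """
    n = v + o
    return {(i, j): rng.randrange(Q) for i in range(v) for j in range(i, n)}

def evaluate(coeffs, x):
    """Evaluate the quadratic form at the point x over GF(Q)."""
    return sum(c * x[i] * x[j] for (i, j), c in coeffs.items()) % Q

def is_affine_in_oil(coeffs, vinegar, o, rng, trials=20):
    """Check g(a+b) - g(a) - g(b) + g(0) == 0 for random oil vectors a, b."""
    zero = [0] * o
    g = lambda oil: evaluate(coeffs, vinegar + oil)
    for _ in range(trials):
        a = [rng.randrange(Q) for _ in range(o)]
        b = [rng.randrange(Q) for _ in range(o)]
        a_plus_b = [(s + t) % Q for s, t in zip(a, b)]
        if (g(a_plus_b) - g(a) - g(b) + g(zero)) % Q != 0:
            return False
    return True

rng = random.Random(1)
poly = random_ov_polynomial(v=3, o=2, rng=rng)
print(is_affine_in_oil(poly, vinegar=[1, 2, 3], o=2, rng=rng))
```

Because each polynomial is affine in the oil variables after the vinegar values are chosen, a signer can solve a linear system for the oil values, which is the usual reason given for the speed of these schemes.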
ARTICLE | doi:10.20944/preprints202301.0007.v1
Subject: Biology And Life Sciences, Biochemistry And Molecular Biology Keywords: systolic blood pressure; diastolic blood pressure; GWAS; high blood pressure; multivariate; univariate
Online: 3 January 2023 (07:28:58 CET)
Background: High blood pressure (BP) has been implicated as a major risk factor for cardiovascular diseases in several global populations, including in individuals of African ancestry. Despite the elevated burden of high BP-induced cardiovascular diseases in Africa and other global populations with African ancestry, limited genetic studies have been carried out to explore the genetic machinery driving this phenomenon. Methods: We performed univariate and multivariate analyses using genome-wide association study (GWAS) summary statistics data of 77,850 individuals of African ancestry for systolic (SBP) and diastolic blood pressure (DBP) traits. The six independent cohorts used included individuals derived from the African Partnership for Chronic Disease Research (APCDR), the UK Biobank, and the Million Veteran Program (MVP). Subsequently, we annotated, prioritized, visualized, and interpreted our meta-analysis results using FUMA, to gain further insight into the molecular mechanism(s) that contribute to the genetics of BP traits. Finally, loci attaining genome-wide significance (GWS; p < 5×10⁻⁸) were also followed up with Bayesian fine-mapping to identify potential causal variants. Results: Our meta-analyses altogether identified 350 GWAS SNPs for SBP (166 SNPs) and DBP (184 SNPs, including two novel loci), whilst our multivariate GWAS method identified 166 SNPs (including three novel loci). Interestingly, in FUMA there was significant tissue enrichment of up-regulated differentially expressed genes (DEGs) in the sigmoid and transverse colon for SBP, as well as 10 significant gene sets from MAGMA gene set analyses. However, for DBP, neither significant DEGs nor significant gene sets in MAGMA were found; instead, in the gene property analysis for tissue specificity for DBP, nine candidates were found to be significant and all nine were in different brain regions. Finally, Bayesian fine-mapping revealed that only 11 variants from the lead SNPs had >50% posterior probability (PP) of being causal, and they included the novel variant rs562545 (MOBP, PP = 77%) and 10 other previously published variants. Conclusion: Our results demonstrate the importance of performing GWAS in large sample sizes of global populations of African ancestry, including continental Africans, which yields novel insights, from novel loci to novel pathways/tissue expression candidates. Large-scale genomic datasets are required to enhance further discovery and fine-mapping of high-risk loci/variants in highly susceptible groups for cardiovascular disease and other related traits. Our study highlights the need for diversity in genetic research and the importance of expanding large GWASs to include ancestrally diverse populations.
ARTICLE | doi:10.20944/preprints201911.0025.v1
Subject: Chemistry And Materials Science, Physical Chemistry Keywords: macro-minerals; micro-minerals; environmental-minerals; beef quality; beef production; multivariate analysis
Online: 3 November 2019 (17:38:11 CET)
The mineral profile of beef is of interest for human health, but also for animal performance and meat quality. This study analyzes the relationships of 20 minerals in beef (ICP-OES) with 3 animal performance and 13 meat quality traits analyzed on 182 samples of Longissimus thoracis. Animals’ breed and sex showed limited effects. The major sources of variation (farm/date of slaughter, individual animal within group and side/sample within animal) differed greatly from trait to trait. Mineral contents were correlated with animal performance and meat quality, with 52 of the 320 correlations significant at the farm/date level and 101 of the 320 at the individual animal level. Five latent factors explained 69% of mineral co-variation. The most important factor, “Mineral quantity”, correlated with age at slaughter and with the meat color traits. Two latent factors (“Na+Fe+Cu” and “Fe+Mn”) correlated with performance and meat color traits. Two others (“K-B-Pb” and “Zn”) correlated with meat chemical composition, and the latter also with carcass weight, daily gain, and meat color traits. Meat cooking losses correlated with “K-B-Pb”. Latent factor analysis appears to be a useful means of disentangling the very complex relationships that the minerals in meat have with animal performance and meat quality traits.
ARTICLE | doi:10.20944/preprints201810.0669.v1
Subject: Biology And Life Sciences, Ecology, Evolution, Behavior And Systematics Keywords: scorpion ecology; multivariate statistics; body size; offspring characteristics, K and r strategists
Online: 29 October 2018 (10:09:45 CET)
There are no studies that quantitatively compare life histories among scorpion species. Statistical procedures applied to 94 scorpion species indicate that those with larger bodies do not necessarily have larger litters or longer life cycles, opposite to some theoretical predictions.
ARTICLE | doi:10.20944/preprints201810.0176.v1
Subject: Biology And Life Sciences, Agricultural Science And Agronomy Keywords: agricultural stakeholders; extension; multivariate analysis; socio-ecological systems; mental models; sustainable agriculture
Online: 9 October 2018 (06:03:38 CEST)
The sustainability of agriculture depends as much on the natural resources required for production as it does on the stakeholders that manage those resources. It is thus essential to understand the variables that influence the decision-making process of agricultural stakeholders to design educational programs, interventions, and policies geared towards their specific needs, a required step to enhance agricultural sustainability. We examined the perceptions, experiences, and priorities that influence management decisions of five major groups of agricultural stakeholders (conventional small grain producers, organic small grain producers, organic vegetable producers, extension agents and agro-industry crop consultants, and researchers) across Montana, United States. Results revealed that while stakeholder groups have distinct perceptions, experiences, and priorities, there were similarities across groups. Specifically, organic vegetable and organic small grain producers showed similar responses that were, in turn, divergent from those of conventional producers, researchers, and crop consultants. Conventional small grain producers and researchers showed overlapping response patterns while crop consultants formed an isolated group. Our results reinforce the need for agricultural education and programs that address unique and shared experiences, priorities, and concerns of multiple stakeholder groups. This study endorses the call for a paradigm shift from the traditional top-down agricultural extension model to one that accounts for participants’ socio-ecological contexts to facilitate the adoption of sustainable agricultural systems that support environmental and human wellbeing.
ARTICLE | doi:10.20944/preprints201809.0224.v1
Subject: Biology And Life Sciences, Biochemistry And Molecular Biology Keywords: aging; muscle; protein; metabolism; metabolomics; profiling; biomarkers; multi-marker; physical performance; multivariate
Online: 12 September 2018 (17:11:33 CEST)
Physical frailty and sarcopenia (PF&S) are hallmarks of aging that share a common pathogenic background. Perturbations in protein/amino acid metabolism may play a role in the development of PF&S. In this preliminary study, 68 community-dwellers aged 70 years and older, 38 with PF&S and 30 non-sarcopenic, non-frail controls (nonPF&S), were enrolled. A panel of 37 serum amino acids and derivatives was assayed by UPLC-MS. Partial Least Squares Discriminant Analysis (PLS-DA) was used to characterize the amino acid profile of PF&S. The optimal complexity of the PLS-DA model was found to be three latent variables. The proportion of correct classification was 76.6 ± 3.9% (75.1 ± 4.6% for enrollees with PF&S; 78.5 ± 6.0% for controls). Older adults with PF&S were characterized by higher levels of asparagine, aspartic acid, citrulline, ethanolamine, glutamic acid, sarcosine, and taurine. The profile of nonPF&S individuals was defined by higher levels of α-aminobutyric acid and methionine. Distinct profiles of circulating amino acids and derivatives characterize older individuals with PF&S. The dissection of these patterns may provide novel insights into the role played by protein/amino acid perturbations in the disabling cascade and possible new targets for interventions.
ARTICLE | doi:10.20944/preprints201806.0482.v1
Subject: Computer Science And Mathematics, Probability And Statistics Keywords: Monte Carlo; regime-switching multivariate black-scholes; metamodeling; variable annuity; portfolio valuation
Online: 29 June 2018 (11:31:49 CEST)
Dynamic hedging has been adopted by many insurance companies to mitigate the financial risks associated with variable annuity guarantees. In order to simulate the performance of dynamic hedging for variable annuity products, insurance companies rely on nested stochastic projections, which are highly computationally intensive and often prohibitive for large variable annuity portfolios. Metamodeling techniques have recently been proposed to address the computational issues. However, it is difficult for researchers to obtain real datasets from insurance companies to test metamodeling techniques and publish the results in academic journals. In this paper, we create synthetic datasets that can be used for the purpose of addressing the computational issues associated with the nested stochastic valuation of large variable annuity portfolios. The runtime used to create these synthetic datasets would be about 3 years if a single CPU were used. These datasets are readily available to researchers and practitioners so that they can focus on testing metamodeling techniques.
ARTICLE | doi:10.20944/preprints202105.0105.v1
Subject: Environmental And Earth Sciences, Atmospheric Science And Meteorology Keywords: data scarcity; water quality; missing data; univariate imputation; multivariate imputation; machine learning; hydroinformatics.
Online: 6 May 2021 (15:18:23 CEST)
The monitoring of surface-water quality followed by water-quality modeling and analysis is essential for generating effective strategies in water-resource management. However, worldwide, particularly in developing countries, water-quality studies are limited due to the lack of a complete and reliable dataset of surface-water-quality variables. In this context, several statistical and machine-learning models were assessed for imputing water-quality data at six monitoring stations located in the Santa Lucía Chico river (Uruguay), a mixed lotic and lentic river system. The challenge of this study is represented by the high percentage of missing data (between 50% and 70%) and the high temporal and spatial variability that characterizes the water-quality variables. The competing algorithms implemented belonged to both univariate and multivariate imputation methods (inverse distance weighting (IDW), Random Forest Regressor (RFR), Ridge (R), Bayesian Ridge (BR), AdaBoost (AB), Huber Regressor (HR), Support Vector Regressor (SVR), and K-nearest neighbors Regressor (KNNR)). According to the results, more than 76% of the imputation outcomes are considered satisfactory (NSE > 0.45). The imputation performance shows better results at the monitoring stations located inside the reservoir than at the ones positioned along the mainstream. IDW was the most frequently chosen model for data imputation.
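Two of the ingredients named in this abstract, inverse distance weighting and the Nash-Sutcliffe efficiency (NSE) used as the acceptance score, have simple textbook forms. The sketch below shows those forms under stated assumptions: stations are hypothetical (x, y, value) tuples, the distance is planar, and the power of 2 is a common default rather than a value reported by the authors.

```python
from math import hypot

def idw_estimate(target, stations, power=2.0):
    """Inverse-distance-weighted estimate at `target` from (x, y, value) stations."""
    numerator = 0.0
    denominator = 0.0
    for x, y, value in stations:
        distance = hypot(x - target[0], y - target[1])
        if distance == 0.0:
            return value  # target coincides with a station
        weight = 1.0 / distance ** power
        numerator += weight * value
        denominator += weight
    return numerator / denominator

def nash_sutcliffe(observed, simulated):
    """NSE = 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2)."""
    mean_obs = sum(observed) / len(observed)
    residual = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    spread = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - residual / spread

# Hypothetical stations and a held-out series, for illustration only.
stations = [(1.0, 0.0, 2.0), (-1.0, 0.0, 4.0)]
print(idw_estimate((0.0, 0.0), stations))
print(nash_sutcliffe([1.0, 2.0, 3.0], [1.1, 1.9, 3.2]))
```

An NSE of 1 means a perfect match and an NSE of 0 means the imputation is no better than the mean of the observations, which gives some context for the 0.45 threshold quoted above.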
ARTICLE | doi:10.20944/preprints201808.0415.v1
Subject: Environmental And Earth Sciences, Environmental Science Keywords: Taiwan rivers; water quality; multivariate statistical analysis; river pollution index; pollution source apportionment
Online: 23 August 2018 (11:54:51 CEST)
This study applied multivariate statistical techniques, including cluster analysis to evaluate and classify river pollution levels in Taiwan, and principal component analysis-multiple linear regression (PCA-MLR) to identify possible pollution sources. Water quality and heavy metal monitoring data from the Taiwan Environmental Protection Administration (EPA) were evaluated for 14 rivers in the four regions of Taiwan. The Erren River was classified as the most polluted river in Taiwan. Biochemical oxygen demand, ammonia, and total phosphate concentrations in this river were the highest of the 14 rivers evaluated. In addition, heavy metal levels of the following rivers exceeded the Taiwan EPA standard limit: lead - in the Dongshan, Jhuoshuei, and Xinhuwei Rivers; copper - in the Dahan, Laojie, and Erren Rivers; and manganese - in all rivers. Water pollution in the Erren River was estimated to originate 72% from industrial sources, 16% from domestic black water, and 12% from natural sources and runoff from other tributaries. Our research showed that PCA-MLR and the cluster analysis model accomplished our study objectives and will be helpful tools to evaluate water quality in rivers, and we suggest that continuous monitoring be conducted to track water pollution from anthropogenic activities.
ARTICLE | doi:10.20944/preprints202104.0499.v1
Subject: Engineering, Automotive Engineering Keywords: fault detection; induced draft fan; multivariate state estimation technique (MSET); model update; power plant
Online: 19 April 2021 (14:36:56 CEST)
The induced draft (ID) fan is important auxiliary equipment in the thermal power plant. It is of great significance to monitor the operation of the ID fan for safe and efficient production. In this paper, an adaptive warning model is proposed to detect early faults of ID fans. First, a non-parametric monitoring model is constructed to describe the normal operation states with the multivariate state estimation technique (MSET). Then, an early warning approach is presented to identify abnormal behaviors based on the results of the MSET model. As the performance of the MSET model is heavily influenced by the normal operation data in the historic memory matrix, an adaptive strategy is proposed by using the samples with a high data quality index (DQI) to manage the memory matrix and update the model. The proposed method is applied to a 300 MW coal-fired power plant for early fault detection, and it is compared with the model without an update. Results show that the proposed method can detect the fault earlier and more accurately.
ARTICLE | doi:10.20944/preprints201808.0072.v4
Subject: Engineering, Civil Engineering Keywords: flood risk; copula; compound events; multivariate; storm surge; spatial dependence; coastal catchment; Bayesian Network.
Online: 11 September 2018 (14:19:43 CEST)
Traditional flood hazard analyses often rely on univariate probability distributions; however, in many coastal catchments, flooding is the result of complex hydrodynamic interactions between multiple drivers. For example, synoptic meteorological conditions can produce considerable rainfall-runoff, while also generating wind-driven elevated sea levels. When these drivers interact in space and time, they can exacerbate flood impacts; this phenomenon is known as compound flooding. In this paper, we build a Bayesian Network based on Gaussian copulas to generate the equivalent of 500 years of daily stochastic boundary conditions for a coastal watershed in Southeast Texas. In doing so, we overcome many of the limitations of conventional univariate approaches and are able to probabilistically represent compound floods caused by riverine and coastal interactions. We calculate the resulting water levels using a 1D steady-state hydraulic model and find that flood stages in the catchment are strongly affected by backwater effects from tributary inflows and downstream water levels. By comparing with a bathtub modeling approach, we show that simplifying the multivariate dependence between flood drivers can lead to an underestimation of flood impacts, highlighting that accounting for multivariate dependence is critical for the accurate representation of flood risk in coastal catchments prone to compound events.
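The dependence idea behind a Gaussian copula can be sketched in a few lines: draw correlated standard normals, then push each through the standard normal CDF to obtain dependent uniforms, which can afterwards be mapped to any marginal distribution. The code below is a two-variable illustration only, not the authors' Bayesian Network; the correlation value, the sample size, and the function name are assumptions.

```python
import random
from math import sqrt
from statistics import NormalDist

def gaussian_copula_pairs(rho, n, seed=0):
    """Draw n pairs (u1, u2) of dependent uniforms from a bivariate Gaussian copula."""
    rng = random.Random(seed)
    standard_normal = NormalDist()
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        noise = rng.gauss(0.0, 1.0)
        # z2 has unit variance and correlation rho with z1
        z2 = rho * z1 + sqrt(1.0 - rho * rho) * noise
        pairs.append((standard_normal.cdf(z1), standard_normal.cdf(z2)))
    return pairs

# Hypothetical drivers, e.g. river discharge and sea level, with rho = 0.8.
samples = gaussian_copula_pairs(rho=0.8, n=2000, seed=42)
same_side = sum((u1 > 0.5) == (u2 > 0.5) for u1, u2 in samples) / len(samples)
print(same_side)
```

With independent drivers the printed fraction would be close to 0.5; a value well above that reflects the joint tendency of the two drivers to be high or low together, which is the effect a univariate analysis misses.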
ARTICLE | doi:10.20944/preprints201709.0099.v1
Subject: Biology And Life Sciences, Forestry Keywords: near-infrared spectroscopy; multivariate analysis; partial least-squares regression; floor litter; optimal wavelength selection
Online: 21 September 2017 (04:36:21 CEST)
Near-infrared spectroscopy (NIRS) was implemented to monitor the moisture content of broadleaf litters. Partial least-squares regression (PLSR) models, incorporating optimal wavelength selection techniques, have been proposed to better predict the litter moisture of the forest floor. Three broadleaf litters were used to sample the reflection spectra corresponding to different degrees of litter moisture. The maximum normalization preprocessing technique was successfully applied to remove unwanted noise from the reflectance spectra of litters. Four variable selection methods were also employed to extract the optimal subset of measured spectra for establishing the best prediction model. The results showed that the PLSR model with the peak of beta coefficients method was the best predictor among all candidate models. The proposed NIRS procedure is thought to be a suitable technique for on-the-spot evaluation of litter moisture.
ARTICLE | doi:10.20944/preprints202305.0634.v1
Subject: Chemistry And Materials Science, Analytical Chemistry Keywords: virgin coconut oil; oil adulteration; Fourier transform infrared; multivariate curve resolution; genetic algorithm; control chart.
Online: 9 May 2023 (10:22:27 CEST)
Virgin coconut oil (VCO) is a functional food with important health benefits. Its economic interest encourages fraudsters to deliberately adulterate VCO with cheap and low-quality vegetable oils for financial gain, causing health and safety problems for consumers. In this context, there is an urgent need for rapid, accurate and precise analytical techniques to detect VCO adulteration. In this study, the use of Fourier Transform Infrared (FTIR) spectroscopy combined with Multivariate Curve Resolution - Alternating Least Squares (MCR-ALS) methodology was evaluated to verify purity or adulteration of VCO with reference to several low-cost commercial oils such as sunflower, maize and peanut oils. Control charts were developed to assess the purity of oil samples using MCR-ALS score values calculated from a data set of pure and adulterated oils. In addition, quantification models were developed using MCR-ALS with correlation constraint for adulterated coconut oil to assess the blend composition. Different data pre-treatment strategies were tested in order to best extract the information contained in the sample fingerprints, and the calibration models were optimised using a genetic algorithm (GA) to select the most important variables. The models gave satisfactory results in the external validation procedure, with absolute errors of less than 4.6 % for samples adulterated with sunflower, maize and peanut oils.
ARTICLE | doi:10.20944/preprints202208.0283.v1
Subject: Engineering, Industrial And Manufacturing Engineering Keywords: grinding; multivariate statistics; maintenance decision; condition-based maintenance; condition monitoring; health management; prognostics; fault diagnosis
Online: 16 August 2022 (09:44:46 CEST)
Grinding processes’ stochastic nature poses a challenge in predicting the quality of the resulting surfaces. Post-production measurements for form, surface roughness, and circumferential waviness are commonly performed because it is infeasible to measure all quality parameters during the grinding operation. Therefore, it is challenging to diagnose, in real time, the root cause of quality deviations resulting from variations in the machine’s operating condition. This paper introduces a novel approach to predicting the overall quality of the individual parts. The grinder is equipped with sensors to implement condition-based maintenance and is induced with five frequently occurring failure conditions for the experimental test runs. The crucial quality parameters are measured for the produced parts. Fuzzy c-means (FCM) and Hotelling’s T-squared (T2) have been evaluated to generate quality labels from the multivariate quality data. Benchmarked random forest regression models are trained using the fault diagnosis feature set and the quality labels. Quality labels from the T2 statistic of quality parameters are preferred over the FCM approach for their repeatability. The model trained from T2 labels achieves more than 94% accuracy when compared to the measured ring disposition. The predicted overall quality using the sensors’ feature set is compared against the threshold to reach a trustworthy maintenance decision.
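Hotelling's T-squared, which the abstract uses to turn several quality parameters into one label, is the squared Mahalanobis distance of a measurement from a reference mean under the reference covariance. A minimal two-parameter sketch is given below; the reference measurements and function names are hypothetical, and the paper's actual quality parameters and thresholds are not reproduced.

```python
from statistics import mean

def mean_and_covariance_2d(rows):
    """Sample mean and 2x2 sample covariance (n - 1 denominator) of (x, y) rows."""
    xs = [r[0] for r in rows]
    ys = [r[1] for r in rows]
    mx, my = mean(xs), mean(ys)
    n = len(rows)
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in rows) / (n - 1)
    return (mx, my), ((sxx, sxy), (sxy, syy))

def hotelling_t2(point, center, cov):
    """T^2 = d^T S^{-1} d with d = point - center, using the 2x2 inverse formula."""
    dx = point[0] - center[0]
    dy = point[1] - center[1]
    (a, b), (_, d) = cov
    det = a * d - b * b
    return (d * dx * dx - 2.0 * b * dx * dy + a * dy * dy) / det

# Hypothetical in-control parts: (roughness, waviness) in arbitrary units.
reference = [(1.0, 2.0), (2.0, 1.5), (3.0, 3.5), (4.0, 3.0), (5.0, 5.0)]
center, cov = mean_and_covariance_2d(reference)
print(hotelling_t2((4.0, 4.0), center, cov), hotelling_t2((5.0, 1.0), center, cov))
```

The second point lies against the correlation pattern of the reference parts and therefore scores a much larger T2 than the first, even though both are a similar Euclidean distance from the mean.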
ARTICLE | doi:10.20944/preprints202203.0205.v1
Subject: Environmental And Earth Sciences, Environmental Science Keywords: heavy metals; abandoned mine; soil pollution; potential ecological risk; multivariate analysis; health index; soil; sediments
Online: 15 March 2022 (10:58:46 CET)
A recent survey that determined heavy metal concentrations in an abandoned Hg mine in Palawan, Philippines, found the occurrence of Hg with As, Ba, Cd, Co, Cr, Cu, Fe, Hg, Mn, Ni, Pb, Sb, Tl, V, and Zn. While the Hg originated from the mine waste calcines, as supported by previous studies, the origin of the other heavy metals remains unknown. Our study investigated the sources of heavy metal pollution surrounding the abandoned Hg mine and assessed the soil and sediment quality, ecological risks, and health risks associated with these toxic metals. Multivariate analyses, such as hierarchical cluster analysis (HCA), principal component analysis (PCA), and Pearson correlation analysis, were used to identify the heavy metal sources from the results of a previous paper. Our results showed that Fe, Ni, Cr, Co, and Mn are associated with the ultramafic geology of the study area, whereas As, Ba, Cd, Cu, Pb, Sb, Tl, V, and Zn are likely due to historical mining and processing of cinnabar from 1953 to 1976. The mine waste calcines were used as construction material for the wharf and as land filler for the adjacent communities. The modified contamination factor (mCdeg) showed that the coast of Honda Bay is highly contaminated, while the inland areas, including the rivers, are very- to ultra-highly contaminated. There is a considerable ecological risk associated with the heavy metals, wherein Ni, Hg, Cr, and Mn contribute an average of 46.3 %, 26.3 %, 11.2 %, and 9.3 % to the potential ecological risk index (RI), respectively. The overall mean hazard index (HI) for both adults (1.4) and children (12.1) exceeded 1, implying the probability of non-carcinogenic adverse effects. The mean total cancer risk over a lifetime (LCR) for both adults (1.19×10⁻³) and children (2.89×10⁻³) exceeded the tolerable threshold of 10⁻⁴, suggesting a potentially high risk for developing cancer, mainly through Ni, Co, and Cr exposure.
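The hazard index quoted in this abstract is conventionally the sum of per-metal hazard quotients, each being an estimated daily dose divided by a reference dose, with a total above 1 read as a possible non-carcinogenic concern. The sketch below shows that arithmetic only; the doses and reference doses are made-up placeholder numbers, not values from the study, and the function names are hypothetical.

```python
def hazard_quotients(daily_dose, reference_dose):
    """Hazard quotient per metal: estimated daily dose / reference dose (same units)."""
    return {metal: daily_dose[metal] / reference_dose[metal] for metal in daily_dose}

def hazard_index(daily_dose, reference_dose):
    """Hazard index: sum of the hazard quotients over all metals considered."""
    return sum(hazard_quotients(daily_dose, reference_dose).values())

# Placeholder values in mg/kg/day, for illustration only.
dose = {"Ni": 0.010, "Cr": 0.003}
rfd = {"Ni": 0.020, "Cr": 0.003}
hi = hazard_index(dose, rfd)
print(hi, "possible concern" if hi > 1.0 else "below the threshold of 1")
```

Summing quotients in this way assumes the metals act additively, which is a screening-level simplification rather than a toxicological finding.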
ARTICLE | doi:10.20944/preprints202106.0439.v1
Subject: Environmental And Earth Sciences, Atmospheric Science And Meteorology Keywords: heavy metals; MMORS, Meycauayan River; soil pollution; multivariate analysis; Sediment Quality Guidelines; Single Pollution Index
Online: 16 June 2021 (10:34:20 CEST)
The City of Meycauayan is considered one of the most polluted cities in the developing world on account of industrial discharges of toxic materials to the environment. This work investigated the sources of the heavy metal pollution by analyzing soil and sediment samples collected along the Meycauayan River for heavy metals (Cr, Hg, Ni, and Pb) together with selected environmental indicators (TN, TOM, and TP). Hierarchical cluster analysis (HCA), principal components analysis (PCA), and Pearson correlation analysis (CA) were used to identify the sources of the metals. Results delineated locations of severe heavy metal pollution downstream, where industrial activities are concentrated. Cr contributed more than any other heavy metal analyzed due to the proliferation of tanneries discharging untreated wastewaters to the river. Significant inputs of Pb and Hg from Pb-acid battery recycling and gold smelting industries were also found. Risk assessments indicated severe levels of heavy metal pollution where industrial activities are concentrated. Cr, Pb, Ni, and Hg in the sampling locations have mean incidences of toxicity of 91.7 %, 53.6 %, 27.7 %, and 70.0 %, respectively. Our study showed a serious need to address heavy metal pollution in Meycauayan.
ARTICLE | doi:10.20944/preprints201809.0038.v1
Subject: Environmental And Earth Sciences, Environmental Science Keywords: Karenia brevis, harmful algal bloom (HAB), moderate resolution imaging Spectroradiometer (MODIS), prediction, chlorophyll, multivariate regression
Online: 3 September 2018 (13:52:41 CEST)
Over the past two decades, persistent occurrences of harmful algal blooms (HAB; Karenia brevis) have been reported in Charlotte County, southwestern Florida. We developed data-driven models that rely on spatiotemporal remote sensing and field data to identify factors controlling HAB propagation, provide a same-day distribution (nowcasting), and forecast their occurrences up to three days in advance. We constructed multivariate regression models using historical HAB occurrences (213 events reported from January 2010 to October 2017) compiled by the Florida Fish and Wildlife Conservation Commission and validated the models against a subset (20%) of the reported historical events. The models were designed to specifically capture the onset of the HABs instead of those that developed days earlier and continued thereafter. A prototype of an early warning system was developed through a threefold exercise. The first step involved the automatic downloading and processing of daily Moderate Resolution Imaging Spectroradiometer (MODIS) Aqua products using SeaDAS ocean color processing software to extract temporal and spatial variations of remote sensing-based variables over the study area. The second step involved the development of a multivariate regression model for same-day mapping of HABs and similar subsequent models for forecasting HAB occurrences one, two, and three days in advance. Eleven remote sensing variables and two non-remote sensing variables were used as inputs for the generated models. In the third and final step, model outputs (same-day and forecasted distribution of HABs) were posted automatically on a web-based GIS (http://www.esrs.wmich.edu/webmap/bloom/). 
Our findings include the following: (1) the variables most indicative of the timing of bloom propagation are bathymetry, euphotic depth, wind direction, SST, chlorophyll-a [OC3M] and distance from the river mouth, and (2) the model predictions were 90% successful for same-day mapping and 65%, 72% and 71% for the one-, two- and three-day advance predictions, respectively. The adopted methodologies are reliable, dependent on readily available remote sensing data sets, and cost-effective and thus could potentially be used to map and forecast algal bloom occurrences in data-scarce regions.
COMMUNICATION | doi:10.20944/preprints202111.0549.v1
Subject: Computer Science And Mathematics, Applied Mathematics Keywords: Principal Component Regression, Partial Least Squares, Orthogonal Partial Least Squares, multivariate regression, hypothesis generation, Parkinson’s disease
Online: 29 November 2021 (15:42:03 CET)
In the current era of ‘big data’, scientists are able to quickly amass enormous amounts of data in a limited number of experiments. The investigators then try to hypothesize about the root cause based on the observed trends for the predictors and the response variable. This involves identifying the discriminatory predictors that are most responsible for explaining variation in the response variable. In the current work, we investigated three related multivariate techniques: Principal Component Regression (PCR), Partial Least Squares or Projections to Latent Structures (PLS), and Orthogonal Partial Least Squares (OPLS). To perform a comparative analysis, we used a publicly available dataset for Parkinson’s disease patients. We first performed the analysis using a cross-validated number of principal components for the aforementioned techniques. Our results demonstrated that PLS and OPLS were better suited than PCR for identifying the discriminatory predictors. Since the X data did not exhibit a strong correlation, we also performed Multiple Linear Regression (MLR) on the dataset. A comparison of the top five discriminatory predictors identified by the four techniques showed a substantial overlap between the results obtained by PLS, OPLS, and MLR, and the three techniques exhibited a significant divergence from the variables identified by PCR. A further investigation of the data revealed that PCR could be used to identify the discriminatory variables successfully if the number of principal components in the regression model were increased. In summary, we recommend using PLS or OPLS for hypothesis generation and systemizing the selection process for principal components when using PCR.
ARTICLE | doi:10.20944/preprints201810.0523.v1
Subject: Biology And Life Sciences, Neuroscience And Neurology Keywords: spatiotemporal neural dynamics; vision; dorsal and ventral streams; multivariate pattern analysis; representational similarity analysis; fMRI; MEG
Online: 23 October 2018 (06:41:16 CEST)
To build a representation of what we see, the human brain recruits regions throughout the visual cortex in a cascading sequence. Recently, an approach was proposed to evaluate the dynamics of visual perception in high spatiotemporal resolution at the scale of the whole brain. This method combined functional magnetic resonance imaging (fMRI) data with magnetoencephalography (MEG) data using representational similarity analysis and revealed a hierarchical progression from primary visual cortex through the dorsal and ventral streams. To assess the replicability of this method, here we present results of a visual recognition neuro-imaging fusion experiment, and compare them within and across experimental settings. We evaluated the reliability of this method by assessing the consistency of the results under similar test conditions, showing high agreement within participants. We then generalized these results to a separate group of individuals and visual input by comparing them to the fMRI-MEG fusion data of Cichy et al. (2016), revealing a highly similar temporal progression recruiting both the dorsal and ventral streams. Together these results are a testament to the reproducibility of the fMRI-MEG fusion approach and allow for the interpretation of these spatiotemporal dynamics in a broader context.
REVIEW | doi:10.20944/preprints201807.0241.v1
Subject: Chemistry And Materials Science, Biomaterials Keywords: biomaterial; bone regeneration; drug release; hydrogel; lignin; multivariate data processing; osteogenesis; scaffolds; stem cells; tissue engineering
Online: 13 July 2018 (15:07:37 CEST)
Renewable resources are gaining increasing interest as a source of environmentally benign biomaterials, such as drug encapsulation/release compounds and scaffolds for tissue engineering in regenerative medicine. Because lignin is the second most abundant natural polymer, interest in its valorization for biomedical use is growing rapidly. Depending on the resource and isolation procedure, lignin shows specific antioxidant and antimicrobial activity. Today, efforts in research and industry are directed toward using lignin as a renewable macromolecular building block for the preparation of polymeric drug encapsulation and scaffold materials. Within the last five years, remarkable progress has been made in the isolation, functionalization and modification of lignin and lignin-derived compounds. However, the literature so far focuses mainly on lignin-derived fuels, lubricants and resins. The purpose of this review is to summarize the current state of the art and to highlight the most important results in the field of lignin-based materials for potential use in biomedicine (reported in 2014–2018). Special focus is placed on lignin-derived nanomaterials for drug encapsulation and release as well as lignin hybrid materials used as scaffolds for guided bone regeneration in stem cell-based therapies.
ARTICLE | doi:10.20944/preprints201804.0161.v1
Subject: Computer Science And Mathematics, Mathematical And Computational Biology Keywords: electronic nose; nanowire gas sensors; food quality control; Parmigiano Reggiano; multivariate data analysis; artificial neural network
Online: 12 April 2018 (06:25:29 CEST)
Parmigiano Reggiano cheese is one of the most appreciated and consumed foods worldwide, especially in Italy, for its high nutrient content and its taste. However, these characteristics make the product subject to counterfeiting in different forms. In this study, a novel method based on an electronic nose was developed to investigate the potential of this tool to distinguish the rind percentage in grated Parmigiano Reggiano packages, which should be lower than 18%. Samples differing in rind percentage, seasoning and rind working process were considered in order to address the problem comprehensively. In parallel, the GC-MS technique was used to identify the compounds that characterize Parmigiano and to relate them to the sensor responses. Data analysis consisted of two stages: multivariate analysis (PLS) and hierarchical classification with PLS-DA and ANNs. Results are promising in terms of correct classification of the samples. The classification rate is higher for ANNs than for PLS-DA, reaching values as high as 100%.
ARTICLE | doi:10.20944/preprints201808.0519.v1
Subject: Chemistry And Materials Science, Analytical Chemistry Keywords: metabolomics; γ-Hydroxybutyric acid; polyamine profiling analysis, gas chromatography-mass spectrometry; star pattern recognition analysis; multivariate analysis
Online: 30 August 2018 (08:14:55 CEST)
1) Background: Recently, illegal abuse of γ-hydroxybutyric acid (GHB) has increased in drug-facilitated crimes, but determination of GHB exposure and intoxication is difficult due to the rapid metabolism of GHB. Its biochemical mechanism has not been completely investigated. Moreover, no metabolomic study using polyamine profile and pattern analyses has been performed in rat urine following intraperitoneal injection with GHB. 2) Methods: Polyamine profiling analysis by gas chromatography-mass spectrometry combined with star pattern recognition analysis was performed in this study. Multivariate statistical analysis was used to evaluate discrimination between control and GHB administration groups. 3) Results: Six polyamines were determined in control, single and multiple GHB administration groups. Star patterns showed distorted hexagonal shapes with characteristic and readily distinguishable patterns for each group. N1-Acetylspermine (p < 0.001), putrescine (p < 0.006), N1-acetylspermidine (p < 0.009), and spermine (p < 0.027) were significantly increased in the single administration group but were significantly lower in the multiple administration group than in the control group. N1-Acetylspermine was the main polyamine for discrimination between control, single and multiple administration groups. Spermine showed similar levels in single and multiple administration groups. 4) Conclusions: The polyamine metabolic pattern was monitored in GHB administration groups. N1-Acetylspermine and spermine were evaluated as potential biomarkers of GHB exposure and addiction.
ARTICLE | doi:10.20944/preprints201902.0039.v1
Subject: Computer Science And Mathematics, Probability And Statistics Keywords: Volatility; Stocks; Persistence; Exchange Rate, Inflation Rate; Financial Time Series; Generalized Autoregressive Conditional Heteroscedasticity (GARCH); Multivariate GARCH (MGARCH)
Online: 4 February 2019 (14:56:07 CET)
The aim of this research was to provide a model for predicting stock volatility in the Nigerian stock market. To achieve this, monthly data for the Nigerian Stock Exchange, exchange rate, share index, and inflation rate were collected for the period January 1990 to December 2016. The descriptive statistics revealed that these variables exhibit volatility, a characteristic of time-varying financial series. A DCC model was fitted, in which the coefficients for all the parameters and that of the correlation targeting (rho_21) are both negative and positive and lie very close to 1 and -1, indicating high persistence in the conditional variances. The DCC model satisfied the properties of a good model of conditional mean and variance within the confidence interval (C.I.) of 1 and -1; that is, the conditional variances are finite and their series are strictly stationary. This implies that the Nigerian Stock Exchange, exchange rate, share index, and inflation rate will experience non-steady shocks in the stock market. However, each of these variables has a different length of recovery (volatility half-life): 1.5 months, 6.5 months, 6 months, and 2.4 months for the stock exchange, exchange rate, share index, and inflation rate, respectively. By implication, the volatility of these variables has long memory, is persistent, and is mean-reverting.
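The volatility half-life mentioned above is commonly computed from the persistence of the conditional variance. For a GARCH(1,1)-type process with persistence p = alpha + beta, the half-life is ln(0.5) / ln(p), expressed in the sampling period of the data (months here). The sketch below illustrates that standard formula. It assumes a GARCH(1,1)-style persistence measure, which the abstract does not confirm, and the function name and example values are hypothetical.

```python
import math


def volatility_half_life(persistence: float) -> float:
    """Number of periods for a variance shock to decay to half its size.

    `persistence` is assumed to be alpha + beta of a GARCH(1,1)-type model
    and must lie strictly between 0 and 1 (a mean-reverting process).
    """
    if not 0.0 < persistence < 1.0:
        raise ValueError("persistence must lie strictly between 0 and 1")
    return math.log(0.5) / math.log(persistence)


if __name__ == "__main__":
    # Hypothetical persistence values, not estimates from the study.
    for p in (0.5, 0.9, 0.99):
        print(f"persistence={p:.2f} -> half-life={volatility_half_life(p):.2f} periods")
```

The closer the persistence is to 1, the longer a shock takes to die out, which is why persistence and half-life are usually reported together.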
ARTICLE | doi:10.20944/preprints202304.0021.v1
Subject: Chemistry And Materials Science, Food Chemistry Keywords: Hass avocado dry matter gradient; near infrared spectroscopy; Hass avocado harvest; fruit quality; multivariate data analysis; root mean squares
Online: 3 April 2023 (10:22:00 CEST)
Knowing, with reasonable accuracy, the dry matter (DM) content of Hass avocado fruit will help determine when the fruit must be harvested. The reliability of predictive models based on near infrared spectra for DM quantification depends on the ability of the spectra to be representative of the DM gradient within a whole fruit. The aim of this work was to develop a methodology to determine the optimum number of spectra to develop a robust model for DM content quantification. Three spectra were recorded for each zone of the intact fruit: peduncle, equator, and base. Each scanning point was sampled, and the DM content was determined using oven drying. Two-way ANOVA confirmed the DM gradient within the whole fruit. This gradient was observed within spectra using the RMS (root mean square) criterion and PCA. The PLS models showed that at least one spectrum per zone could be enough to construct an efficient and robust model for dry matter quantification.
ARTICLE | doi:10.20944/preprints202301.0358.v1
Subject: Environmental And Earth Sciences, Geophysics And Geology Keywords: Landslide susceptibility; Multivariate Adaptive Regression Splines (MARS); GIS; earthquake; earthquake-induced landslides; rainfall-induced landslides; El Salvador; Central America
Online: 19 January 2023 (11:54:14 CET)
In January and February 2001, El Salvador was hit by two strong earthquakes that triggered thousands of landslides, causing 1,259 fatalities and extensive damage. The analysis of aerial and SPOT-4 satellite images taken a few days after the events allowed us to map 6,491 coseismic landslides, which occurred in 14 study areas extending over about 400 km². Four different Multivariate Adaptive Regression Splines (MARS) models were produced using different covariate sets and landslide inventories, the latter containing the slope failures triggered by an extreme rainfall event in November 2009 and those induced by the earthquakes of 2001. Moreover, two validation scenarios were employed, including 25% and 95% of the mapped landslides, respectively. The results of our experiment revealed that: (i) the MARS algorithm provides reliable predictions of coseismic landslides; (ii) models calibrated with rainfall-induced landslides predict with acceptable accuracy landslides caused by deep earthquakes and distributed over vast areas; (iii) the best accuracy is achieved by models trained with both preparatory and trigger variables; (iv) a small portion of the landslides produced by an earthquake can be used to calibrate MARS predictive models that help to identify slopes where as yet unreported landslides may have occurred.
ARTICLE | doi:10.20944/preprints201711.0114.v1
Subject: Environmental And Earth Sciences, Environmental Science Keywords: Hydrochemical characteristics; water-rock interaction; multivariate statistical analysis; mixing model; δD and δ18O isotopes; natural water system; Kangding County
Online: 17 November 2017 (12:34:26 CET)
The utilization of water resources is of great concern to human life. To assess the natural water system in Kangding County, integrated hydrochemical analysis, multivariate statistics and geochemical modelling were applied to surface water, groundwater and thermal water samples. Surface water and groundwater were dominated by the Ca-HCO3 type, while thermal water belonged to the Ca-HCO3 and Na-Cl types. The analytical results identified the driving factors that affect the hydrochemical components. According to the combined assessments, hydrochemical processes were controlled by the dissolution of carbonate and silicate minerals, with a slight influence from anthropogenic activity. The mixing model of groundwater and thermal water was calculated using the silica-enthalpy method, yielding a cold-water fraction of 0.56-0.79 and an estimated reservoir temperature of 130-199 °C, respectively. δD and δ18O isotopes suggested that surface water, groundwater and thermal springs were of meteoric origin. Thermal water should circulate deeply through the Xianshuihe fault zone, while groundwater flows through secondary fractures where it recharges with thermal water. These analytical results were used to construct a hydrological conceptual model, providing a better understanding of the natural water system in Kangding County.
ARTICLE | doi:10.20944/preprints202010.0436.v1
Subject: Computer Science And Mathematics, Mathematical And Computational Biology Keywords: Naïve Bayes Classification; Eulers Strength Formula; Cricket Prediction; Supervised Learning; KNIME Tool; Cricket prediction; sports analytics; multivariate regression; neural network
Online: 21 October 2020 (12:34:00 CEST)
In cricket, the Twenty20 format is the most watched and loved, because no one can guess who will win the match until the last ball of the last over. In India, the Indian Premier League (IPL) started in 2008 and is now the most popular T20 league in the world, so we decided to develop a machine learning model for predicting the outcome of its matches. Winning a cricket match depends on many key factors, such as home-ground advantage, past performances on that ground, records at the same venue, the overall experience of the players, the record against a particular opposition, and the current form of the team and of individual players. This paper describes the key factors that affect the result of a cricket match and the regression model that best fits the data and gives the best predictions. Cricket is the mainstream, most widely played sport in India and has the largest fan base there, and the IPL follows the Twenty20 format, which is very unpredictable. The IPL match predictor is an ML-based approach in which the data sets and previous statistics are used for training across all important factors, such as toss, home ground, captains, favorite players, opposition battle, and previous statistics, with each factor carrying a different strength; it is built with the KNIME tool, a Naive Bayes network, and the Eulers strength calculation formula.
ARTICLE | doi:10.20944/preprints202010.0328.v1
Subject: Engineering, Automotive Engineering Keywords: wearable biosensors; wireless technology; human grip force; motor control; complex task-user systems; expertise; multivariate data; correlation analysis; functional analysis
Online: 15 October 2020 (15:13:43 CEST)
Biosensors and wearable sensor systems with transmitting capabilities are currently developed and used for the monitoring of health data, exercise activities, and other performance data. Unlike conventional approaches, these devices enable convenient, continuous, and unobtrusive monitoring of a user’s behavioral signals in real time. Examples include signals relating to hand and finger movement/pressure control reflected by individual grip force data. As will be shown here, these directly translate into task-, skill- and hand-specific (dominant versus non-dominant hand) grip force profiles for different measurement loci in the fingers and palm of the hand. On the basis of thousands of sensor data from multiple sensor locations, individual grip force profiles of a task expert, a trained user and a highly proficient user (expert) performing an image-guided and robot-assisted precision task with the dominant or the non-dominant hand are analyzed in several steps following Tukey’s “detective work” approach. Correlation analyses (Pearson’s Product Moment) reveal skill-specific differences in individual grip force profiles across multiple sources of variation, functionally mapped to the somatosensory brain networks which ensure grip force control and its evolution with control expertise. Implications for the real-time monitoring of individual grip force profiles and their evolution with training in complex task-user systems are brought forward.
ARTICLE | doi:10.20944/preprints201904.0218.v1
Subject: Business, Economics And Management, Business And Management Keywords: eco-innovation; anticipated regulation; self-regulation; industry-specific characteristics; information sourcing openness; multivariate probit model; zero inflated negative binomial model
Online: 19 April 2019 (11:25:06 CEST)
The move to a low-carbon economy is very important for enhancing international competitiveness. Eco-innovation is the critical factor of the green paradigm. This study investigates in depth the determinants of eco-innovation among manufacturing firms in Korea, proposing anticipated regulations, self-regulations, and industry-specific characteristics as external factors and open information sources as internal factors. The data used in the analysis cover 1,946 sample firms from the Korean Innovation Survey 2010, which is based on the Oslo Manual. Using multivariate probit analysis and zero-inflated negative binomial (ZINB) regression analysis, we found that anticipated regulations and self-regulations have significant influences on both eco-process innovation and eco-product innovation, while industrial characteristics have no effect. The empirical results also show that the breadth of information sources has a positive effect on businesses implementing eco-innovations. Our findings suggest that the Korean government should provide a good platform where firms can better understand the future trends of environmental policies, particularly policies on anticipated regulations and self-regulations. At the same time, Korean firms should establish a voluntary system to control environmental activities so that they can improve eco-innovation by integrating external information.
ARTICLE | doi:10.20944/preprints202201.0317.v1
Subject: Business, Economics And Management, Econometrics And Statistics Keywords: Cohort-Component Method; Multivariate Methods; Time Series Analysis; Monte Carlo Methods; Stochastic Forecasting; Demography; Statistical Epidemiology; Labor Market Research; Health Economics
Online: 21 January 2022 (10:32:54 CET)
Demographic change is leading to the aging of German society. As long as the baby boom cohorts are still of working age, the working population will also age, and it will decline as soon as this baby boom generation gradually reaches retirement age. At the same time, there has been a trend towards increasing absenteeism (times of inability to work) in companies since the 2000s, with the number of days of absence increasing with age. We present a novel stochastic forecast approach that combines population forecasting with forecasts of labor force participation trends, considering epidemiological aspects. For this, we combine a stochastic Monte Carlo-based cohort-component forecast of the population with projections of labor force participation rates and morbidity rates. This article examines the purely demographic effect on the economic costs associated with such absenteeism due to the inability to work. Under expected future employment patterns and constant morbidity patterns, absenteeism is expected to increase by close to 5 percent by 2050 relative to 2020, associated with an increase in economic costs of almost 3 percent. Our results illustrate how strongly the pronounced baby boom/baby bust phenomenon determines demographic development in Germany in the midterm.
ARTICLE | doi:10.20944/preprints201804.0157.v1
Subject: Computer Science And Mathematics, Probability And Statistics Keywords: information theory; cohomology; algebraic topology; topological data analysis; genetic expression; epigenetics; machine learning; statistical physics; multivariate mutual-information; complex systems; biodiversity
Online: 12 April 2018 (05:35:26 CEST)
This paper establishes methods that quantify the structure of statistical interactions within a given data set using the characterization of information theory in cohomology by finite methods, and provides their expression in terms of statistical physics and machine learning. Following [1–3], we show directly that k multivariate mutual-informations (I_k) are k-coboundaries. The k-cocycles are given by I_k = 0, which generalizes statistical independence to arbitrary dimension k. The topological approach allows investigation of Shannon’s information in the multivariate case without the assumption of independent identically distributed variables. We develop the computationally tractable subcase of simplicial information cohomology represented by entropy H_k and information I_k landscapes. The I_1 component defines a self-internal energy functional U_k, and the (−1)^k I_k, k ≥ 2, components define the contribution to a free energy functional G_k of the k-body interactions. The set of information paths in simplicial structures is in bijection with the symmetric group and random processes, provides a topological expression of the 2nd law and points toward a discrete Noether theorem (1st law). The local minima of free energy, related to conditional information negativity and the non-Shannonian cone of Yeung, characterize a minimum free energy complex. This complex formalizes the minimum free-energy principle in topology, provides a definition of a complex system, and characterizes a multiplicity of local minima that quantifies the diversity observed in biology. Finite data size effects and estimation bias severely constrain the effective computation of the information topology on data, and we provide simple statistical tests for the undersampling bias and for the k-dependences following . We give an example of application of these methods to genetic expression and cell-type classification.
The maximal positive I_k identifies the variables that co-vary the most in the population, whereas the minimal negative I_k identifies clusters and the variables that differentiate-segregate the most. The methods unravel biologically relevant I_10 with a sample size of 41. The approach establishes generic methods to quantify epigenetic information storage and a unified epigenetic unsupervised learning formalism.
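The k-variate mutual information discussed above can be written as an alternating sum of joint entropies over all non-empty subsets of the k variables, with sign (−1)^(|S|−1) for a subset S. The sketch below applies that formula with naive plug-in entropy estimates on discrete data. The function names and the toy XOR data set are hypothetical, and the naive estimator ignores the undersampling bias the abstract warns about.

```python
import math
from collections import Counter
from itertools import combinations, product


def entropy(rows, cols):
    """Plug-in Shannon entropy (bits) of the joint distribution of `cols`."""
    counts = Counter(tuple(row[c] for c in cols) for row in rows)
    n = len(rows)
    return -sum((k / n) * math.log2(k / n) for k in counts.values())


def multivariate_mutual_information(rows, cols):
    """Alternating sum of joint entropies over non-empty subsets of `cols`."""
    total = 0.0
    for size in range(1, len(cols) + 1):
        sign = (-1) ** (size - 1)
        for subset in combinations(cols, size):
            total += sign * entropy(rows, subset)
    return total


# Toy data: two independent fair bits and their XOR (all four cases once).
xor_rows = [(x, y, x ^ y) for x, y in product((0, 1), repeat=2)]

if __name__ == "__main__":
    print("I_2(X;Y)   =", multivariate_mutual_information(xor_rows, (0, 1)))
    print("I_3(X;Y;Z) =", multivariate_mutual_information(xor_rows, (0, 1, 2)))
```

In the XOR example every pair of variables is independent (I_2 = 0 bits) while the triple gives I_3 = −1 bit, a small instance of the negative values the abstract associates with variables that segregate. The number of subsets grows as 2^k, so this brute-force form is only practical for small k.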
ARTICLE | doi:10.20944/preprints202305.0786.v1
Subject: Computer Science And Mathematics, Applied Mathematics Keywords: multitouch option; n-touch option; baseball option; barrier option; step barrier option; first passage time; boundary crossing probability; dimension; multivariate gaussian integral
Online: 11 May 2023 (04:42:23 CEST)
In this article, the multitouch option, also called the n-touch option (or the “baseball” option when n = 3), is analyzed and valued in closed form. This is a kind of barrier option that sets a gradual knock-out / knock-in mechanism based on the number of times the underlying asset has crossed a predefined barrier in various time intervals before expiry. The higher the number of predefined time intervals during which the barrier has been touched, the lower the value of a knock-out contract at expiry, and conversely for a knock-in one. Multitouch options can be viewed as an extension of step barrier options, preserving the ability of the latter to adjust the exposure to risk over time, while eliminating the notorious danger of “sudden death” that holders of step barrier options are faced with. Unlike occupation time derivatives, the payoff at expiry does not depend on the amount of time spent outside the authorized range, but on the number of passages beyond the authorized range.
ARTICLE | doi:10.20944/preprints201709.0112.v1
Subject: Computer Science And Mathematics, Computational Mathematics Keywords: multivariate logarithmic polynomial; generating function; completely monotonic function; Bernstein function; integral representation; Lévy-Khintchine representation; real part; imaginary part; uniform convergence; recurrence relation; mathematical induction
Online: 23 September 2017 (10:55:57 CEST)
In the paper, by induction and recursively, the author proves that the generating function of multivariate logarithmic polynomials and its reciprocal are a Bernstein function and a completely monotonic function respectively, establishes a Lévy-Khintchine representation for the generating function of multivariate logarithmic polynomials, deduces an integral representation for multivariate logarithmic polynomials, presents an integral representation for the reciprocal of the generating function of multivariate logarithmic polynomials, computes real and imaginary parts for the generating function of multivariate logarithmic polynomials, derives two integral formulas, and denies the uniform convergence of a known integral representation for Bernstein functions.