ARTICLE | doi:10.20944/preprints201806.0247.v1
Subject: Mathematics & Computer Science, Other Keywords: data mining; association rule learning; policyholder lapse; auto insurance; market inefficiency
Online: 15 June 2018 (09:01:03 CEST)
For automobile insurance, it has long been implied that when a policyholder makes at least one claim in the prior year, the subsequent premium is likely to increase. When this happens, the policyholder may seek to switch to another insurance company to possibly avoid paying a higher premium. In such situations, insurers may face the challenge of retaining policyholders by keeping premiums low in the face of competition. In this paper, we seek empirical evidence of a possible association between policyholder switching after a claim and the associated change in premium. To accomplish this goal, we employ association rule learning, a data mining technique that has its origins in marketing for analyzing and understanding consumer purchase behavior. We apply this technique in two stages. In the first stage, we identify policyholder and vehicle characteristics that affect the size of the claim and resulting change in premium regardless of policy switch. In the second stage, together with policyholder and vehicle characteristics, we identify the association among the size of the claim, the level of premium increase and policy switch. This empirical process is often challenging to insurers because they are unable to observe the new premium for those policyholders who switched. However, we used 9 years of claims data for the entire Singapore automobile insurance market, which allowed us to track information before and after the switch. Our results provide evidence of a strong association among the size of the claim, the level of premium increase and policy switch. We attribute this to the possible inefficiency of the insurance market because of the lack of sharing and exchange of claims history among the companies.
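As an illustration of how an association rule of this kind is usually scored, the sketch below computes support, confidence and lift for a single rule. The item names (`large_claim`, `premium_up`, `switched`) and the six toy records are hypothetical and are not taken from the paper's data; the function name `rule_metrics` is likewise an assumption made for this example.

```python
from typing import Dict, Iterable, List, Set


def rule_metrics(
    transactions: List[Set[str]],
    antecedent: Iterable[str],
    consequent: Iterable[str],
) -> Dict[str, float]:
    """Support, confidence and lift of the rule antecedent -> consequent."""
    lhs = frozenset(antecedent)
    rhs = frozenset(consequent)
    n = len(transactions)
    n_lhs = sum(1 for t in transactions if lhs <= t)            # records matching the antecedent
    n_rhs = sum(1 for t in transactions if rhs <= t)            # records matching the consequent
    n_both = sum(1 for t in transactions if (lhs | rhs) <= t)   # records matching both sides
    support = n_both / n
    confidence = n_both / n_lhs if n_lhs else 0.0
    lift = confidence / (n_rhs / n) if n_rhs else 0.0
    return {"support": support, "confidence": confidence, "lift": lift}


# Hypothetical policy records, one set of nominal items per policyholder-year.
records = [
    {"large_claim", "premium_up", "switched"},
    {"large_claim", "premium_up", "switched"},
    {"large_claim", "premium_up"},
    {"small_claim", "premium_up"},
    {"small_claim"},
    {"small_claim", "switched"},
]

metrics = rule_metrics(records, {"large_claim", "premium_up"}, {"switched"})
print(metrics)
```

A lift above 1 means the consequent occurs more often among records matching the antecedent than in the data as a whole, which is the usual reading of a positive association.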
ARTICLE | doi:10.20944/preprints202201.0331.v1
Subject: Earth Sciences, Geoinformatics Keywords: Concentration field; Spatial auto-correlation; Association rules; Apriori algorithm; Element co-occurrence
Online: 21 January 2022 (13:42:44 CET)
The spatial distribution of elements can be regarded as a numerical field of concentration values with a continuous spatial coverage. An active area of research is to discover geologically meaningful relationships among elements from their spatial distribution. To solve this problem, we propose an association rule mining method based on clustered events of spatial auto-correlation and applied it to the polymetallic deposits of the Chahanwusu River area, Qinghai Province, China. The elemental data for stream sediments were first clustered into HH (high-high), LL (low-low), HL (high-low), and LH (low-high) groups by using local Moran’s I clustering map (LMIC). Then the Apriori algorithm was used to mine the association rules among different elements in these clusters. More than 86% of the mined rule points are located within 1000 m of faults and near known ore occurrences, and occur in the upper reaches of the stream and catchment areas. In addition, we found that the Indosinian granodiorite is enriched in sulfophile elements, e.g., Zn, Ag and Cd, and the Variscan granite quartz diorite (P1γδο) coexists with Cu and associated elements. Therefore, the proposed algorithm is an effective method for mining co-existence patterns of elements and provides an insight into their enrichment mechanisms.
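The HH/LL/HL/LH grouping can be sketched as follows, assuming the standard local Moran's I with row-standardized weights. The significance test that a cluster map normally applies is omitted, and the five sample values and the chain-shaped neighbour matrix are hypothetical, not data from the study.

```python
from typing import List, Tuple


def local_moran_quadrants(
    values: List[float], weights: List[List[float]]
) -> List[Tuple[float, str]]:
    """Return (local Moran's I, quadrant label) for every sample point."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]          # deviations from the mean
    m2 = sum(d * d for d in z) / n          # second moment used for scaling
    results = []
    for i in range(n):
        row_sum = sum(weights[i])
        # Spatial lag: weighted mean deviation of the neighbours (row-standardized).
        lag = sum(weights[i][j] * z[j] for j in range(n)) / row_sum if row_sum else 0.0
        local_i = z[i] * lag / m2 if m2 else 0.0
        own = "H" if z[i] > 0 else "L"
        neighbours = "H" if lag > 0 else "L"
        results.append((local_i, own + neighbours))
    return results


# Hypothetical concentrations at five points along a stream; neighbours are adjacent points.
concentrations = [10.0, 9.0, 8.0, 1.0, 2.0]
adjacency = [
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
]

labels = [label for _, label in local_moran_quadrants(concentrations, adjacency)]
print(labels)
```

Under this scheme each sample point would contribute one item per element (for example a hypothetical `Zn=HH`), and those item sets could then be passed to a frequent-itemset miner such as Apriori.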
ARTICLE | doi:10.20944/preprints202209.0087.v1
Subject: Medicine & Pharmacology, Pediatrics Keywords: Malnutrition; association; feeding; practice; infants; Pakistan
Online: 6 September 2022 (10:13:18 CEST)
Breastmilk is the only recommended source of nutrition for infants below six months of age. However, a significant proportion of children are either on supplemental breastfeeding (SBF) or weaned due to the early introduction of solid/semi-solid/soft food and liquids (SSF) before 6 months of age. There is good evidence that exclusive breastfeeding (EBF) in infants below six months of age protects them from preventable illnesses, including malnutrition. The relationship between infant feeding practices and coexisting forms of malnutrition (CFM) has not yet been explored. This study examined the association of different feeding indicators (continuation of breastfeeding, predominant feeding, and SSF) and feeding practices (EBF, SBF, and complete weaning) with CFM in infants aged below six months in Pakistan. National and regional datasets of Pakistan from the last ten years were retrieved from the Demographic Health Surveys (DHS) and UNICEF data repositories. In Pakistan, 34.5% (n=6131) of infants have some form of malnutrition. Among malnourished infants, 44.7% (~15.4% of the total sample) had a CFM. Continuation of breastfeeding was observed in more than 85% of infants, but less than a quarter were on EBF, and the rest were either SBF (65.4%) or weaned infants (13.7%). Compared to EBF, complete weaning increased the odds of coexistence of underweight with wasting and underweight with both wasting and stunting by 1.96 (1.12-3.47) and 2.25 (1.16-4.36), respectively. Overall, breastfed children had lower odds of various forms of CFM (compared to non-breastfed), except for the coexistence of stunting with overweight/obesity. Continuation of any breastfeeding protects infants in Pakistan from various types of CFM during the first six months of life.
ARTICLE | doi:10.20944/preprints202208.0537.v1
Online: 31 August 2022 (08:18:32 CEST)
Different soil nutrients affect the accumulation characteristics of plant metabolites. The main soil nutrients and their correlation with Pepino metabolites were investigated in this study to evaluate differences between greenhouses on the Loess Plateau in northwest China. A total of 269 Pepino metabolites in the fruits were identified using a UPLC-QTOF-MS approach from plants grown in three major Pepino growing regions. Their differential distribution characteristics were analyzed. Ninety-nine metabolites differed among the Pepino fruits in the three regions. The main classes of the differentially accumulated metabolites were ranked as amino acids and derivatives, nucleotides and derivatives, organic acids, alkaloids, vitamins, saccharides and alcohols, phenolic acids, lipids, and others. Environmental factor analysis indicated that soil nutrients were the primary differentiating factor. Five soil nutrient indicators, TN (total nitrogen), TP (total phosphorus), AP (available phosphorus), AK (available potassium), and OM (organic matter), exhibited significant differences among the three growing sites. Metabolite and soil nutrient association analysis using redundancy analysis (RDA) and the Mantel test indicated that TN and OM contributed to the accumulation of amino acids and derivatives, nucleotides and derivatives, and alkaloids while inhibiting organic acids and vitamins coagulation biosynthesis. Moreover, AP and TP were associated with the highest accumulation of saccharides and alcohols, and phenolic acids. Consequently, differences in soil nutrients were reflected in the variability of Pepino metabolites. This study clarified the metabolite variability and the relationship between Pepino and soil nutrients in the main planting areas of northwest China. It provides a theoretical basis for the subsequent development of Pepino's nutritional value and cultivation management.
ARTICLE | doi:10.20944/preprints202009.0012.v1
Online: 1 September 2020 (11:42:14 CEST)
The specificity and potency of venom components give them a unique advantage in the development of various pharmaceutical drugs. Although venom is a cocktail of proteins, the synergy and association between venom components are rarely studied. Understanding the relationship between various components is critical in medical research. Using meta-analysis, we found underlying patterns and associations in the appearance of the toxin families. For Crotalus, Dis has the most associations with the following toxins: PDE; BPP; CRL; CRiSP; LAAO; SVMP P-I & LAAO; SVMP P-III and LAAO. In Sistrurus venom, CTL and NGF had the most associations. These associations can be used to predict the presence of proteins in novel venom and to understand synergies between venom components for enhanced bioactivity. This approach highlights the need to revisit the classification of proteins as major or minor components. The revised classification of venom components needs to be based on ubiquity, bioactivity, number of associations and synergies. The revised classification will help to increase research on venom components such as NGF, which have high medical importance.
ARTICLE | doi:10.20944/preprints201909.0040.v1
Subject: Social Sciences, Business And Administrative Sciences Keywords: data mining; security; association rule; ECLAT
Online: 4 September 2019 (03:48:58 CEST)
The purpose of this paper is to develop the WebSecuDMiner algorithm to discover unusual web access patterns by analysing the potential rules hidden in the web server log and user navigation history. Design/methodology/approach: WebSecuDMiner uses the equivalence class transformation (ECLAT) algorithm to extract user access patterns from the web log data, which are used to identify user access behaviour patterns and detect unusual ones. Data extracted from the web server log and user browsing behaviour is exploited to retrieve the web access pattern produced by the same user. Findings: WebSecuDMiner is used to detect whether any unauthorized access has occurred and to take appropriate decisions regarding the review of the original rights of the suspicious user. Research limitations/implications: The present work uses a database extracted from the web server log file and user browsing behaviour. A page may be viewed by the user without the visit being recorded in the server log file, since it can be accessed from the browser's cache.
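For readers unfamiliar with ECLAT, the following is a minimal sketch of its core idea: store, for each item, the set of transaction ids containing it, and grow itemsets depth-first by intersecting those sets. It is not the WebSecuDMiner implementation; the page paths and the four toy sessions are hypothetical, and `min_support` here is an absolute count.

```python
from typing import Dict, List, Set, Tuple


def eclat(transactions: List[Set[str]], min_support: int) -> Dict[Tuple[str, ...], int]:
    """Frequent itemsets and their support counts via tidset intersection."""
    # Vertical layout: item -> ids of the transactions that contain it.
    tidsets: Dict[str, Set[int]] = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets.setdefault(item, set()).add(tid)

    frequent: Dict[Tuple[str, ...], int] = {}

    def extend(prefix: Tuple[str, ...], candidates: List[Tuple[str, Set[int]]]) -> None:
        for i, (item, tids) in enumerate(candidates):
            itemset = prefix + (item,)
            frequent[itemset] = len(tids)
            # Intersect with the remaining candidates to form the next equivalence class.
            suffix = []
            for other, other_tids in candidates[i + 1:]:
                shared = tids & other_tids
                if len(shared) >= min_support:
                    suffix.append((other, shared))
            if suffix:
                extend(itemset, suffix)

    start = sorted(
        ((item, tids) for item, tids in tidsets.items() if len(tids) >= min_support),
        key=lambda pair: pair[0],
    )
    extend((), start)
    return frequent


# Hypothetical sessions: the set of pages requested by one user in one visit.
sessions = [
    {"/login", "/home", "/admin"},
    {"/login", "/home"},
    {"/login", "/home", "/profile"},
    {"/home", "/profile"},
]

patterns = eclat(sessions, min_support=2)
print(patterns)
```

A session whose pages fall outside the frequent patterns (here the single visit to the hypothetical `/admin` page) is the kind of access that a pattern-based detector would flag for review.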
ARTICLE | doi:10.20944/preprints202111.0386.v1
Subject: Life Sciences, Genetics Keywords: genome-wide association study; transcriptome-wide association study; meta-analysis; expression quantitative trait loci; nicotine addiction
Online: 22 November 2021 (11:46:13 CET)
Genome-wide association studies (GWAS) have identified and reproduced thousands of disease-associated loci, but many of them are not directly interpretable due to the strong linkage disequilibrium among variants. Transcriptome-wide association studies (TWAS) incorporate expression quantitative trait loci (eQTL) cohorts as a reference panel to detect associations with the phenotype at the gene level and have been gaining popularity in recent years. For nicotine addiction, several important genetic susceptibility variants have been identified by GWAS, but TWAS that detect genes associated with nicotine addiction and unveil the underlying molecular mechanism are still lacking. In this study, we used eQTL data from the Genotype-Tissue Expression (GTEx) consortium as a reference panel to conduct tissue-specific TWAS on cigarettes per day (CPD) over 13 brain tissues in two large cohorts: UK Biobank (UKBB; N=142,202) and the GWAS & Sequencing Consortium of Alcohol and Nicotine use (GSCAN; N=143,210), and then meta-analyzed the results across tissues while considering the heterogeneity across tissues. We identified three major clusters of genes with different meta-patterns across tissues that were consistent in both cohorts: homogeneous genes associated with CPD in all brain tissues, partially homogeneous genes associated with CPD in cortex, cerebellum and hippocampus tissues, and tissue-specific genes associated with CPD in only a few specific brain tissues. Downstream enrichment analyses on each gene cluster identified unique biological pathways associated with CPD and provided important biological insights into the regulatory mechanism of nicotine dependence in the brain.
ARTICLE | doi:10.20944/preprints201906.0144.v1
Subject: Mathematics & Computer Science, Information Technology & Data Management Keywords: data mining; network security; association rules; DDoS
Online: 16 June 2019 (02:42:59 CEST)
Typical modern information systems are required to process copious data. Conventional manual approaches can no longer effectively analyze such massive amounts of data, and thus humans resort to smart techniques and tools to complement human effort. Currently, network security events occur frequently, and generate abundant log and alert files. Processing such vast quantities of data particularly requires smart techniques. This study reviewed several crucial developments of existent data mining algorithms, including those that compile alerts generated by heterogeneous IDSs into scenarios and employ various HMMs to detect complex network attacks. Moreover, sequential pattern mining algorithms were examined to develop multi-step intrusion detection. These studies can focus on applying these algorithms in practical settings to effectively reduce the occurrence of false alerts. This article researched the application of data mining algorithms in network security. The academic community has recently generated numerous studies on this topic.
ARTICLE | doi:10.20944/preprints202111.0326.v1
Subject: Life Sciences, Genetics Keywords: SNP; calpain-calpastatin system genes; genomic association; tenderization; ageing
Online: 18 November 2021 (13:48:09 CET)
The most important factor that determines beef tenderness is its proteolytic activity, and the balance between calpain1 protease activity and calpastatin inhibition is especially important, while contributions could arise from calpain2 and possibly calpain3. These processes are, however, affected by the meat aging process itself. To determine whether genotypes in the calpain-calpastatin system can enhance tenderness throughout a 20 day aging period, South African purebred beef bulls (n=166) were genotyped using the Illumina BovineHD SNP BeadChip, through gene-based association analysis targeting the cast, capn3, capn2 and capn1 genes. The Warner-Bratzler shear force (WBSF) and myofibril fragment length (MFL) of Longissimus thoracis et lumborum (LTL) steaks were evaluated between d 3 and d 20 of aging, with protease enzyme activity in the first 20 h postmortem. Although several of the 134 SNP were associated with tenderness, only seven SNP in the cast, capn2 and capn1 genes sustained genetic associations, additive to aging-associated increases in tenderness, for at least three of the four aging periods. While most genomic associations were relatively stable over time, some genotypes within SNP responded differently to aging, resulting in altered genomic effects over time. The level of aging at which genomic associations are performed is an important factor that determines whether SNP affect tenderness phenotypes.
ARTICLE | doi:10.20944/preprints202010.0001.v1
Subject: Biology, Agricultural Sciences & Agronomy Keywords: Association mapping; chromosomes; drought tolerance; markers, structure; traits
Online: 1 October 2020 (08:40:47 CEST)
The objective of this study was to conduct association mapping for drought tolerance at the seedling stage and for yield-related traits. Sixty cowpea accessions were used in the study. Single-nucleotide polymorphisms (SNPs) discovered through genotyping by sequencing (GBS) were used for genotyping. Association mapping was conducted using single-marker regression (SMR) in Q Gene, and the general linear model (GLM) and mixed linear model (MLM) built into TASSEL. The population structure of the cowpea accessions was analysed using STRUCTURE 2.3.4; the peak of delta K in the greenhouse showed seven population types, whereas the peak of delta K in the glasshouse indicated the presence of six population types. One SNP marker, 14083649|F|0-9, was associated with NP with a p value <0.001. Fifty SNP markers were associated with PWT at p <0.001. Four SNP markers, 14074781|F|0-16, 100047392|F|0-36, 14083801|F|0-28 and 100051488|F|0-49, were associated with AVSPD at p <0.001. SNP markers 14074781|F|0-16, 14083801|F|0-28 and 100051488|F|0-49 were associated with PL at p <0.001. Five SNP markers, 100047392|F|0-36, 14083801|F|0-28, 100072738|F|0-34, 14076881|F|0-49 and 14076881|F|0-49, were associated with PWDTH at p <0.001. The 65 SNP markers identified can be used in cowpea molecular breeding to select for AVSPD, NP, PL, PWDTH, PWT, and RR through marker assisted selection (MAS).
REVIEW | doi:10.20944/preprints202007.0583.v1
Subject: Life Sciences, Genetics Keywords: genetic association studies; extreme phenotype; genetic epidemiology; tinnitus
Online: 24 July 2020 (13:43:00 CEST)
Exome sequencing has been commonly used in rare diseases by selecting multiplex families or singletons with an extreme phenotype (EP) to search for rare variants in coding regions. The EP strategy covers both extreme ends of a disease spectrum and it has been also used to investigate the contribution of rare variants to heritability in complex clinical traits. We have conducted a systematic review to find evidence supporting the use of EP strategies to search for rare variants in genetic studies of complex diseases, to highlight the contribution of rare variation to the genetic structure of multiallelic conditions. After performing the quality assessment of the retrieved records, we selected 19 genetic studies considering EP to demonstrate genetic association. All the studies successfully identified several rare variants, de novo mutations and many novel candidate genes were also identified by selecting an EP. There is enough evidence to support that the EP approach in patients with an early onset of the disease can contribute to the identification of rare variants in candidate genes or pathways involved in complex diseases. EP patients may contribute to a better understanding of the underlying genetic architecture of common heterogeneous disorders such as tinnitus or age-related hearing loss.
DATASET | doi:10.20944/preprints202006.0226.v1
Subject: Medicine & Pharmacology, Psychiatry & Mental Health Studies Keywords: candidate-gene association; estimation; bias; confounding; case study
Online: 18 June 2020 (07:50:33 CEST)
Estimation of the reality can easily be flawed, hence, in order to result in accurate and useful estimates the process has to be protected from bias and confounding and should follow other methodological milestones inherent to different types of empirical observations. Candidate-gene association studies are a specific form of observations that have been rather extensively applied in psychiatry yielding valuable information on various aspects – when methodologically adequate and used in appropriate settings. However, certain flaws that may occur in such studies might not be bluntly obvious, at least not at first glance, and may pass unnoticed by researchers and reviewers. This case study uses two recent published candidate-gene association reports suggesting involvement of cannabinoid receptor type 1 and of heat shock protein single nucleotide polymorphisms in development of neurocognitive performance and psychopathology in a cohort of adult first episode psychosis patients to point-out the types of flaws inevitably resulting in inaccurate and useless estimates.
ARTICLE | doi:10.20944/preprints202204.0256.v1
Subject: Medicine & Pharmacology, Nutrition Keywords: tea intake; fracture; Mendelian randomization; genome-wide association studies
Online: 27 April 2022 (10:40:34 CEST)
Fracture is a global public health disease. Bone health and fracture risk have become the focus of public and scientific attention. Observational studies have reported that tea consumption is associated with fracture risk, but the results are inconsistent. The present study was conducted to evaluate whether tea consumption was causally associated with the risk of bone fracture through two-sample Mendelian Randomization (MR) analysis. We included a large genome-wide association study (GWAS) associated with tea consumption of 447,485 individuals and analyzed the effects of genetic instruments on fractures using fracture cases from the UK Biobank dataset (n=361,194). Inverse variance weighted (IVW) indicated no causal effects of tea consumption on fractures of the skull and face, shoulder and upper arm, hand and wrist, femur, calf, and ankle (odds ratio=1.000, P=0.881; OR=1.000, P=0.857; OR=1.002, P=0.339; OR=0.997, P=0.054; OR=0.998, P=0.569, respectively). Consistent results were also found in MR-Egger, weighted median, and weighted mode. Our research provided evidence that tea consumption is unlikely to affect the incidence of fractures.
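The inverse variance weighted (IVW) estimate named above can be sketched with the standard fixed-effect formula, in which each variant's ratio estimate is weighted by the squared SNP-exposure effect over the squared standard error of the SNP-outcome effect. The three summary statistics below are made-up numbers chosen so the result is easy to check; they are not from the study, and reporting `exp(beta)` as an odds ratio assumes the outcome effects are on the log-odds scale.

```python
import math
from typing import List, Tuple


def ivw_estimate(
    beta_exposure: List[float],
    beta_outcome: List[float],
    se_outcome: List[float],
) -> Tuple[float, float, float]:
    """Fixed-effect IVW causal estimate, its standard error and exp(estimate)."""
    numerator = 0.0
    denominator = 0.0
    for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome):
        numerator += bx * by / (se * se)     # weighted SNP-outcome effect
        denominator += bx * bx / (se * se)   # total weight
    beta = numerator / denominator
    standard_error = math.sqrt(1.0 / denominator)
    return beta, standard_error, math.exp(beta)


# Hypothetical per-variant summary statistics (not taken from any GWAS).
snp_exposure = [0.10, 0.20, 0.40]
snp_outcome = [0.05, 0.10, 0.20]     # exactly half of the exposure effects
snp_outcome_se = [0.01, 0.02, 0.01]

beta_ivw, se_ivw, odds_ratio = ivw_estimate(snp_exposure, snp_outcome, snp_outcome_se)
print(beta_ivw, se_ivw, odds_ratio)
```

An odds ratio close to 1 with a non-significant P value, as reported in the abstract, corresponds to an IVW estimate close to zero on the log-odds scale.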
REVIEW | doi:10.20944/preprints202009.0348.v2
Subject: Life Sciences, Biochemistry Keywords: DNA methylation; epialleles; epiRILs; epigenetics; Epigenome-Wide Association Studies.
Online: 26 September 2020 (08:08:27 CEST)
Plant breeding conventionally depends on genetic variability available in a species to improve a particular trait in the crop. However, epigenetic diversity may provide an additional tier of variation. The recent advent of epigenome technologies has elucidated the role of epigenetic variation in shaping phenotype. Further, the development of epigenetic recombinant inbred lines (epi-RILs) in the model species such as Arabidopsis has enabled accurate genetic analysis of epigenetic variation. Subsequently, mapping of epigenetic quantitative trait loci (epiQTL) allowed association between epialleles and phenotypic traits. Thus, quantitative epigenetics provides ample opportunities to dissect the role of epigenetic variation in trait regulation, which can be eventually utilized in crop improvement programs. Moreover, locus-specific manipulation of DNA methylation by epigenome-editing tools such as clustered regularly interspaced short palindromic repeats/CRISPR-associated protein 9 (CRISPR/Cas9) can facilitate epigenetic based molecular breeding of important crop plants.
ARTICLE | doi:10.20944/preprints201803.0093.v1
Subject: Engineering, Control & Systems Engineering Keywords: linear regression; covariance matrix; data association; sensor fusing; SLAM
Online: 13 March 2018 (04:06:56 CET)
Linear regression is a basic tool in mobile robotics, since it enables accurate estimation of straight lines from range-bearing scans or in digital images, which is a prerequisite for reliable data association and sensor fusing in the context of feature-based SLAM. This paper discusses, extends and compares existing algorithms for line fitting applicable also in case of strong covariances between the coordinates at each single data point, which must not be neglected if range-bearing sensors are used. Besides, particularly the determination of the covariance matrix is considered, which is required for stochastic modeling. The main contribution is a new error model of straight lines in closed form for calculating fast and reliably the covariance matrix dependent on just a few comprehensible and easily obtainable parameters. The model can be applied widely in any case when a line is fitted from a number of distinct points also without a-priori knowledge of the specific measurement noise. By means of extensive simulations the performance and robustness of the new model in comparison to existing approaches is shown.
TECHNICAL NOTE | doi:10.20944/preprints201901.0126.v1
Subject: Life Sciences, Genetics Keywords: flax; association mapping; genome-wide association study (GWAS); simple sequence repeat (SSR); single nucleotide polymorphism (SNP); quantitative trait loci (QTL); chromosome-scale pseudomolecules
Online: 14 January 2019 (07:19:08 CET)
Quantitative trait loci (QTL) are genomic regions associated with phenotype variation of quantitative traits in a population. To date, a total of 267 QTL for 29 quantitative traits have been reported in 13 studies on flax. Of these, 200 QTL from 12 studies were identified based on genetic maps, scaffold sequences, or pre-released chromosome-scale pseudomolecules. Molecular markers for QTL identification differed across studies but were mainly based on simple sequence repeat (SSR) or single nucleotide polymorphism (SNP) markers. This article provides methods with software tools and database files to uniquely map SSR and SNP markers from different references onto the recently released chromosome-scale pseudomolecules. Using these methods, 195 QTL were successfully sorted onto the 15 flax chromosomes and grouped into 133 co-located QTL clusters. Mapping of QTL from different studies to the same reference enables comparisons and facilitates genome-wide QTL analysis, candidate gene scanning, and breeding applications.
ARTICLE | doi:10.20944/preprints202105.0102.v1
Subject: Mathematics & Computer Science, Algebra & Number Theory Keywords: Market basket analysis; association rule mining; buying pattern; data mining
Online: 6 May 2021 (15:14:25 CEST)
Buyer practices have changed as individuals are figuring out how to live with the new reality of COVID-19. Take-out and delivery orders have expanded, and our customer has added new items to their menu because of new client preferences. With all of the continuous changes, the customer had numerous unanswered questions, for example: Are the most popular items still unchanged after COVID? Which are the most sold item combinations now? What is the acceptance of new items? What are clients purchasing alongside new items? How have alcohol sales changed? The customer already had reports that followed item sales and operational metrics; however, there was a need to get a deeper insight into item analysis. The customer needed to recognize which items and presentations were being sold more frequently, measure the acceptance of new items, and figure out which items clients buy together in order to improve marketing efforts, promotions, and sales. The e-business industry is growing immensely in the Indian market. The cheap 4G web packages in India clearly give a push to these ventures. Thus, when COVID-19 first hit, individuals became afraid to leave their homes for fear of the virus. They even hesitated to go out to purchase essential (FMCG) products. Panic buying has also been seen, and to avoid this fear of COVID-19, individuals are giving preference to e-commerce sites to purchase essential products; some clients are new and joined to purchase essential goods during this pandemic lockdown period. Numerous clients are moving their purchasing behaviour from offline retail stores to online stores. This paper examines the customer buying pattern during lockdown.
ARTICLE | doi:10.20944/preprints202001.0384.v1
Subject: Social Sciences, Business And Administrative Sciences Keywords: Mixed reality; Interactivity; Vividness; Brand loyalty; Brand awareness; Brand association
Online: 31 January 2020 (11:35:28 CET)
Mixed reality technology is being increasingly used in cultural heritage attractions to enhance visitors’ experience. However, how the characteristics of mixed reality affect satisfaction and brand loyalty has not been explored in previous research. The purpose of this study is to identify factors affecting satisfaction with mixed reality experiences at cultural and artistic visitor attractions and their influence on brand loyalty, which is connected with management performance. We propose a theoretical model based on brand equity theory in the context of mixed reality experience. Survey data were gathered from 251 respondents visiting a cultural and artistic visitor attraction in Seoul, Korea using a stratified sampling method. PLS-SEM was employed for the data analysis. The results suggest that the characteristics of mixed reality (interactivity, vividness) not only influence the affective aspects (perceived immersion, perceived enjoyment) of visitors’ experience, but also positively affect brand awareness, brand association, and brand loyalty.
ARTICLE | doi:10.20944/preprints201911.0117.v1
Subject: Medicine & Pharmacology, General Medical Research Keywords: malignant mesothelioma; epidemiology; association rule mining; Apriori method; imbalanced dataset
Online: 10 November 2019 (16:15:14 CET)
Malignant mesothelioma is a rare proliferative cancer that develops in the thin layer of tissues surrounding the lungs. Malignant mesothelioma is associated with an extremely poor prognosis and the majority of patients do not show symptoms. The epidemiology of mesothelioma is important for the identification of the disease. The primary aim of this study is to explore the risk factors associated with mesothelioma. The dataset consists of healthy and mesothelioma patients, but only mesothelioma patients were selected for the identification of symptoms. The raw dataset was pre-processed and then the Apriori method was used to mine association rules with various configurations. The pre-processing task involved removing duplicated and irrelevant attributes, balancing the dataset, converting numerical attributes to nominal ones and creating the association rules in the dataset. Strong associations among disease factors (asbestos exposure, duration of asbestos exposure, duration of symptoms, erythrocyte sedimentation rate and pleural to serum LDH ratio) were determined via the Apriori algorithm. The identification of risk factors associated with mesothelioma may prevent patients from progressing to the high-risk stage of the disease. This will also help to control the comorbidities associated with mesothelioma, which are cardiovascular diseases, cancer-related emotional distress, diabetes, anemia, and hypothyroidism.
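The level-wise search that Apriori performs over nominal attributes can be sketched as below. This is a generic textbook version, not the configuration used in the study; the attribute values such as `asbestos=yes` and the four toy patient records are hypothetical, and `min_support` is a fraction of the records.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set


def apriori(transactions: List[Set[str]], min_support: float) -> Dict[FrozenSet[str], float]:
    """Frequent itemsets with their relative support, found level by level."""
    n = len(transactions)

    def support(itemset: FrozenSet[str]) -> float:
        return sum(1 for t in transactions if itemset <= t) / n

    items = {item for t in transactions for item in t}
    current = {frozenset([i]): support(frozenset([i])) for i in items}
    current = {s: sup for s, sup in current.items() if sup >= min_support}

    frequent: Dict[FrozenSet[str], float] = {}
    k = 1
    while current:
        frequent.update(current)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        scored = {c: support(c) for c in candidates}
        current = {c: sup for c, sup in scored.items() if sup >= min_support}
    return frequent


# Hypothetical nominal attributes after numerical-to-nominal conversion.
patients = [
    {"asbestos=yes", "esr=high", "ldh_ratio=high"},
    {"asbestos=yes", "esr=high"},
    {"asbestos=yes", "ldh_ratio=high"},
    {"asbestos=no", "esr=high"},
]

frequent_sets = apriori(patients, min_support=0.5)
print(frequent_sets)
```

Rules are then read off the frequent itemsets by splitting each one into an antecedent and a consequent and keeping the splits whose confidence exceeds a chosen threshold.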
ARTICLE | doi:10.20944/preprints201807.0238.v1
Subject: Mathematics & Computer Science, Other Keywords: Multiple object tracking; Airborne video; Tracklet confidence; Hierarchical association framework
Online: 13 July 2018 (14:27:22 CEST)
Multi-object tracking (MOT) in airborne videos is a challenging problem due to the uncertain airborne vehicle motion, vibrations of the mounted camera, unreliable detections, size, appearance and motion of the moving objects as well as occlusions due to the interaction between the moving objects and with other static objects in the scene. To deal with these problems, this work proposes a four-stage Hierarchical Association framework for multiple object Tracking in Airborne video (HATA). The proposed framework combines data association-based tracking (DAT) methods and target tracking using a Compressive Tracking approach, to robustly track objects in complex airborne surveillance scenes. In each association stage, different sets of tracklets and detections are associated to efficiently handle local tracklet generation, local trajectory construction, global drifting tracklet correction and global fragmented tracklet linking. Experiments with challenging airborne video datasets show significant tracking improvement compared to existing state-of-the-art methods.
ARTICLE | doi:10.20944/preprints201801.0231.v1
Subject: Engineering, Other Keywords: Data mining; Association rules; Previous Cause; Type of Accident; Overexertion
Online: 24 January 2018 (19:40:52 CET)
An analysis of workplace accidents in the mining sector has been carried out using the database from the Spanish administration for the period 2005-2015 and applying data mining techniques. Data has been processed by means of the software Weka. Two scenarios were chosen from the accidents database: surface and underground mining. The most important variables involved in occupational accidents and their association rules have been determined. These rules are formed by several predictor variables that cause an accident, defining its characteristics and context. This study presents the 20 most important association rules of the sector, for either surface or underground mining, based on the statistical confidence level of each rule obtained by Weka. The outcomes display the most typical immediate causes with the percentage of accidents covered by each association rule. The most typical immediate cause is body movement with physical effort or overexertion, and the most typical type of accident is physical effort or overexertion. On the other hand, the second most important immediate cause and type of accident differ between the two scenarios. Data mining techniques have proved to be a very powerful tool to find the root of the accidents, apply corrective measures and verify their effectiveness, for either public or private companies.
ARTICLE | doi:10.20944/preprints202208.0178.v1
Subject: Medicine & Pharmacology, Psychiatry & Mental Health Studies Keywords: alcohol dependence; comorbidity; gene network; genome-wide association study; sex differences
Online: 9 August 2022 (10:35:29 CEST)
At least 50% of factors predisposing to alcohol dependence (AD) are genetic and women affected with this disorder present with more psychiatric comorbidities, probably indicating different genetic factors involved. We aimed to run a genome-wide association study (GWAS) followed by a bioinformatic functional annotation of associated genomic regions in male and female patients with AD and eight related clinical measures. A genome-wide significant association of rs220677 with AD (p-value = 1.33×10^-8 calculated with the Yates-corrected Chi-square test under the assumption of dominant inheritance) was discovered in female patients. Associations of AD and related clinical measures with seven other single nucleotide polymorphisms listed in previous GWAS of psychiatric and addiction traits were differently replicated in male and female patients. The bioinformatic analysis showed that regulatory elements in the eight associated linkage disequilibrium blocks define the expression of 80 protein-coding genes. Nearly 68% of these and of 120 previously published coding genes associated with alcohol phenotypes directly interact in a single network. This study indicates that a number of genes behind the pathogenesis of AD are different in male and female patients, but implicated molecular mechanisms are functionally connected. The results also suggest the genetic basis of sex-specific psychiatric comorbidities of AD.
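For readers unfamiliar with the test named in this abstract, a minimal sketch of a Yates-corrected chi-square test on a 2x2 table follows. It assumes the dominant model is coded as carriers versus non-carriers of the allele, which is an assumption about the layout rather than a detail taken from the study; the counts and the function name `yates_chi_square` are hypothetical.

```python
import math


def yates_chi_square(a, b, c, d):
    """Yates-corrected chi-square statistic and p-value for a 2x2 table.

    Assumed (hypothetical) layout under a dominant model:
                  carriers  non-carriers
        cases        a           b
        controls     c           d
    """
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("every row and column must have a non-zero total")
    # Continuity correction: shrink |ad - bc| by n/2, never below zero.
    corrected = max(0.0, abs(a * d - b * c) - n / 2)
    chi2 = n * corrected ** 2 / denominator
    # With 1 degree of freedom, the chi-square survival function
    # equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Invented counts, not data from the study.
chi2, p_value = yates_chi_square(10, 20, 30, 40)
```

A genome-wide significant result would correspond to a p-value below the conventional threshold of 5×10^-8.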
ARTICLE | doi:10.20944/preprints202206.0360.v1
Subject: Social Sciences, Business And Administrative Sciences Keywords: tourism and related; SMEs; small particulate matters; association rules; data mining
Online: 27 June 2022 (10:24:27 CEST)
In northern Thailand, the problem of small particulate matter occurs every year, with the primary sources being agricultural weed burning and wildfires. The tourism industry is strongly affected and has been in the spotlight for the past few years. This study therefore investigates the effect of small particulate matter on tourism and related SMEs in Chiang Mai, Thailand. Data were collected from 286 entrepreneurs in the tourism and related SME sectors and analyzed using data mining and association rule techniques. The study revealed that small particulate matter has a considerable impact on customer factors, in particular a decrease in the number of customers. Operational factors and product/service factors are also affected by the dust, in the form of adjustments to keep the business running and to protect the health of employees and customers. Financial factors are likewise affected by the small particulate matter situation, through both lower revenues and higher costs.
ARTICLE | doi:10.20944/preprints202205.0258.v1
Subject: Life Sciences, Genetics Keywords: Mendelian randomisation; Alcohol Consumption; UK Biobank; Phenome wide association studies; Biomarker
Online: 19 May 2022 (09:09:35 CEST)
Background: Alcohol consumption is associated with the development of cardiovascular diseases, cancer, and liver disease. The biological mechanisms are still largely unclear. Here, we aimed to use an agnostic approach to identify phenotypes mediating the effect of alcohol on various diseases. Methods: We performed an agnostic association analysis between alcohol consumption (red, and white wine, beer/cider, fortified wine, and spirits) with over 7,800 phenotypes from the UK biobank comprising 223,728 participants. We performed Mendelian randomisation analysis to infer causality. We additionally performed a Phenome-wide association analysis and a mediation analysis between alcohol consumption as exposure, traits in causal relationship with alcohol consumption as mediators, and various diseases as outcome. Results: Of 45 traits in association with alcohol consumption, 20 were in causal relationship with alcohol consumption. Gamma glutamyltransferase (GGT; β=9.44; CI,5.94-12.93; Pfdr=9.04×10^-7), mean sphered cell volume (β=0.189; CI,0.11-0.27; Pfdr=1.00×10^-4), mean corpuscular volume (β=0.271; CI,0.19-0.35; Pfdr=7.09×10^-10) and mean corpuscular haemoglobin (β=0.278; CI,0.19-0.36; Pfdr=1.60×10^-6) showed the strongest causal relationships. We also identified GGT and physical activity as mediators causing liver cirrhosis and alcohol dependence. Conclusion: Our study provides evidence of causality between alcohol consumption and 20 traits and a mediation effect for physical activity on health consequences of alcohol consumption.
Subject: Keywords: membrane theory; Association-Induction Hypothesis; ion transport, ion adsorption; membrane potential
Online: 13 August 2021 (08:53:04 CEST)
Accurate prediction of the membrane potential by membrane theory is possible on the basis that the plasma membrane is selectively permeable to ions and that permeability determines the characteristics of the membrane potential. However, an experimental and artificial cell system with an impermeable membrane serving as a model plasma membrane has a non-zero membrane potential, and this potential generated across the membrane is somehow consistent with the potential characteristics predicted by the membrane theory, despite the impermeability of the membrane to ions. A long-forgotten theory, called the association-induction hypothesis (AIH), has emerged as a more plausible mechanism for generating the membrane potential than the membrane theory to explain this unexpected behavior. The AIH asserts that ion-selective membrane permeability is not necessary for the generation of the membrane potential, which is contrary to the membrane theory. Although such an idea is not easy to accept, the experimental results clearly suggest the correctness of the AIH.
REVIEW | doi:10.20944/preprints202103.0066.v1
Subject: Biology, Anatomy & Morphology Keywords: GRAS protein, DELLA, Intrinsically Disordered Proteins, Arbuscular Mycorrhizal association, abiotic stress
Online: 2 March 2021 (10:01:42 CET)
The GAI, RGA and SCR (GRAS) proteins belong to a plant-specific transcription factor gene family and are involved in several developmental processes, phytohormone and phytochrome signaling, symbiosis, stress responses, etc. GRAS proteins have a conserved GRAS domain at the C-terminus and a hypervariable N-terminus. The conserved C-terminal domain directly affects the function of GRAS proteins. For instance, in Arabidopsis, mutations in this domain in Slender rice 1 (SLR1) and Repressor of GA (RGA) proteins cause significant phenotypic changes. GRAS proteins have been reported in more than 30 plant species and have so far been divided into 17 subfamilies. This review highlights the importance of GRAS proteins in several biological processes in plants, the structural features of GRAS proteins, their expansion and diversification in plants, and GRAS-interacting protein complexes and their role in biological processes. We also summarize recent research that used CRISPR-Cas9 technology to manipulate GRAS genes in plants for different traits. Further, the exploitation of GRAS genes in crop improvement programs is discussed.
ARTICLE | doi:10.20944/preprints202007.0077.v1
Subject: Behavioral Sciences, Cognitive & Experimental Psychology Keywords: Second Language Learning; Word Learning; Cognate Effect; Synonymy; Picture Word Association
Online: 5 July 2020 (15:00:57 CEST)
The effects of cognate synonymy in L2 word learning are explored. Participants learned the names of well-known concrete concepts in a new fictional language following a picture-word association paradigm. Half of the concepts (set A) had two possible translations in the new language (i.e., both words were synonyms): one was a cognate in participants’ L1 and the other one was not. The other half of the concepts (set B) had only one possible translation in the new language, a non-cognate word. After learning the new words, participants’ memory was tested in a picture-word matching task and a translation recognition task. In line with previous findings, our results clearly indicate that cognates are much easier to learn, as we found that the cognate translation was remembered much better than both its non-cognate synonym and the non-cognate from set B. Our results also seem to suggest that non-cognates without cognate synonyms (set B) are better learned than non-cognates with cognate synonyms (set A). This suggests that, at early stages of L2 acquisition, learning a cognate leads to poorer acquisition of its non-cognate synonym than of a non-cognate learned on its own. These results are discussed in light of different theories and models of the bilingual mental lexicon.
ARTICLE | doi:10.20944/preprints201906.0235.v1
Subject: Mathematics & Computer Science, Probability And Statistics Keywords: bivariate Copula; measures of association; dependence modeling; Kendall’s τ; Blomqvist’s β
Online: 24 June 2019 (08:58:06 CEST)
Copulas are useful tools for modeling the dependence structure between two or more variables. They are becoming a flexible tool for modeling dependence among the components of a multivariate vector, in particular for predicting losses in insurance and finance. In this article, we study the dependence structure of some well-known real-life insurance data (mainly with two components) and subsequently identify the best bivariate copula to model such a scenario via the VineCopula package in R. Associated structural properties of these bivariate copulas are also discussed.
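The two measures of association named in the keywords can be sketched for a small bivariate sample as follows. This is an illustrative Python sketch under stated assumptions, not the VineCopula workflow used in the article: the function names are hypothetical, the sample values are invented, and Kendall's tau is computed in its simple tau-a form without a tie correction.

```python
from itertools import combinations
from statistics import median


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


def blomqvist_beta(x, y):
    """Sample Blomqvist's beta (medial correlation).

    Counts points in the quadrants defined by the two medians; points
    lying exactly on a median are ignored, and at least one point must
    fall strictly inside a quadrant.
    """
    mx, my = median(x), median(y)
    signs = [(a - mx) * (b - my) for a, b in zip(x, y)]
    same = sum(1 for s in signs if s > 0)
    opposite = sum(1 for s in signs if s < 0)
    return (same - opposite) / (same + opposite)


# Invented toy sample, not the insurance data analysed in the article.
loss = [1, 2, 3, 4]
expense = [1, 3, 2, 4]
tau = kendall_tau(loss, expense)
beta = blomqvist_beta(loss, expense)
```

Both quantities depend only on the ranks of the observations, which is why they can be expressed through the copula alone.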
Subject: Biology, Animal Sciences & Zoology Keywords: Genome-wide association studies (GWAS); post-GWAS; sheep; tail fat deposition
Online: 11 June 2019 (10:04:39 CEST)
Tail type is an important economic trait in sheep. However, the candidate genes associated with tail type are uncertain. The objective of this study was to identify the genetic regions and genotypes responsible for the tail type phenotype. We performed a genome-wide association study (GWAS) with 40 large-tailed Han sheep and 40 Altay sheep as cases and 40 Tibetan sheep as controls. A total of 31 genome-wide significant SNPs associated with tail type traits were detected. For the significant SNP loci, we determined their physical locations and screened for candidate genes within the corresponding regions. By combining information on previously reported and annotated functional genes, we identified SPAG17, Tbx15, VRTN, NPC2, BMP2 and PDGFD as the most promising candidate genes for tail type traits. From these candidates, we selected BMP2 and PDGFD for genetic effect analysis in a large population of Altay and Tibetan sheep. One SNP in exon 1 of the BMP2 gene (rs119 T>C) and one SNP in exon 4 of the PDGFD gene (rs69 C>A) were detected. At rs119 of BMP2, Altay sheep had the TT genotype, whereas Tibetan sheep had the CC genotype. At rs69 of PDGFD, Altay sheep had the CC genotype, whereas Tibetan sheep had the AA genotype. These results indicate that the significant SNP associations detected in the GWAS were indirectly caused by the genetic effects of BMP2 and PDGFD on sheep tail fat deposition.
ARTICLE | doi:10.20944/preprints201804.0227.v1
Subject: Mathematics & Computer Science, Numerical Analysis & Optimization Keywords: Ensemble clustering; cluster stability; F-measure; co-association matrix; genetic algorithm
Online: 17 April 2018 (15:58:22 CEST)
Nowadays, people are faced with large amounts of data that must be stored or displayed. One of the key methods for controlling and managing such data is to group and classify them into clusters. Clustering plays a critical role in information retrieval methods for organizing large collections into a few significant clusters. One of the main motivations for using clustering is to reveal the hidden, inherent structure of a data set. Ensemble clustering algorithms combine multiple clustering algorithms to reach an overall clustering. Lacking further information, ensemble clustering methods fuse several primary partitions of the data to find better solutions. Because different clustering algorithms look at the data points differently, they can produce different partitions of the same data. By combining the partitions obtained from different algorithms, it is possible to create a high-performance partition, even if the clusters lie very close to each other. Most studies in this area have used all the initial clusters. In this study, a new method is used in which only the most stable clusters are utilized instead of all the primary clusters produced. A consensus function based on the co-association matrix is used to select the more stable clusters. The most stable clusters are selected by a cluster stability criterion based on the F-measure. Optimization functions are used to optimize the final clusters obtained. A genetic algorithm is the optimizer used in this article to find the final clusters that participate in the consensus. Experimental results on several datasets show that the proposed method produces clusters with high stability.
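The co-association matrix at the core of such a consensus function can be sketched as follows. This is a minimal illustration of the general idea only; it leaves out the stability criterion, the F-measure and the genetic algorithm described in the abstract, and the partitions and the function name `co_association` are invented.

```python
def co_association(partitions):
    """Fraction of partitions in which each pair of points shares a cluster.

    partitions: list of label lists, one per base clustering, all of the
    same length (one label per data point).
    """
    n_points = len(partitions[0])
    counts = [[0] * n_points for _ in range(n_points)]
    for labels in partitions:
        for i in range(n_points):
            for j in range(n_points):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    n_parts = len(partitions)
    return [[c / n_parts for c in row] for row in counts]


# Four invented base partitions of four data points.
partitions = [
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
    [0, 1, 1, 1],
]
matrix = co_association(partitions)
```

Label values need not agree across partitions, since only co-membership within each partition is counted. Entries close to 1 mark pairs of points that almost every base clustering keeps together.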
ARTICLE | doi:10.20944/preprints202202.0164.v1
Subject: Life Sciences, Genetics Keywords: rare variants; genome-wide association study; validation test; SNP chip; genomic selection
Online: 11 February 2022 (15:59:26 CET)
The experiments described in this research article were designed to test the effect of rare variants on genomic prediction in dairy cattle. Common polymorphisms explain only a small proportion of the underlying genetic variation of complex phenotypes. Variants representing functional mutations with large effects on complex phenotypes are expected to be rare due to natural (humans) or artificial (livestock) selection pressure. Therefore, it is important to check whether the use of rare variants could increase the accuracy of animal ranking by providing a tool for more precise differentiation among bulls with high additive genetic merit. The goal of our study was to verify whether including rare variants in a genomic selection model allows for a more accurate description of the additive genetic background of traits under selection in dairy cattle. We used a linear mixed model to compare SNP effect estimates for Holstein-Friesian cattle between two data sets: a set containing only single nucleotide polymorphisms with minor allele frequency ≥ 0.01, which is routinely used in the Polish genomic evaluation system (46,216 SNPs), and a set containing SNPs selected based only on the call rate (54,378 SNPs). Based on the SNP estimates, we also calculated DGV and GEBV and compared them between the two data sets. In all the analyses we used production, fertility, conformation and udder health traits. We also compared, between the two data sets, the time required for the two most computationally demanding components of genomic selection: preparing genotype data and estimating SNP effects. The results of our study indicated that including rare variants changed the individual ranking of the top 100 male and female candidates, but had no effect on the quality of EBV prediction as expressed by the Interbull validation test.
Subject: Medicine & Pharmacology, Psychiatry & Mental Health Studies Keywords: feeding and eating disorder; genome-wide association study; methylation quantitative trait loci
Online: 8 October 2021 (14:23:39 CEST)
Eating disorders (ED) are characterized by alterations in eating behavior. The genetic factors shared between ED diagnoses have been underexplored. The present study aimed to perform a genome-wide association study on individuals with disordered eating behaviors in the Mexican population, a blood methylation quantitative trait loci (blood-meQTL) analysis, and in silico function prediction using different algorithms. The analysis included a total of 1803 individuals. The genome-wide association study and the blood-meQTL analysis were performed by logistic and linear regression. In silico functional variant prediction and phenome-wide and transcriptome-wide association studies were carried out using different algorithms. In the genome-wide association study, we identified 44 single-nucleotide polymorphisms (SNPs) associated at a nominal significance level and 7 blood-meQTL at a genome-wide threshold. The SNPs were enriched in genome-wide associations of the metabolic and immunologic domains. In the in silico analysis, the SNP rs10419198, located on an enhancer mark, could change the expression of PRR12 in blood, adipocytes, and brain areas that regulate food intake. The present study supports the previous associations of genetic variation in the metabolic domain with ED.
REVIEW | doi:10.20944/preprints202107.0045.v1
Subject: Medicine & Pharmacology, Allergology Keywords: genome wide association studies (GWAS); single nucleotide polymorphism (SNP); oestrogen; ESR1; HOXA10
Online: 2 July 2021 (09:59:27 CEST)
Endometriosis is a chronic neuro-inflammatory disorder the defining feature of which is the growth of tissue (lesions) that resembles the endometrium in sites outside the uterus. Estimates of prevalence typically quote rates of ~10% of women of reproductive age, equating to ~190 million women world-wide. Three subtypes of endometriosis are usually considered when discussing the aetiology of the disorder - superficial peritoneal, ovarian (endometrioma cysts), and deep (infiltrating). Genetic, hormonal and immunological factors have all been proposed as contributing to risk factors associated with the development of lesions. Twin studies report the heritable component of endometriosis as ~50%. Genome wide association studies (GWAS) have been conducted allowing unbiased scanning of the genome for single nucleotide polymorphisms (SNPs) in many thousands of individuals. These studies have identified SNPs that appear over-represented in patients with endometriosis, particularly those with more extensive disease (stage III/IV). Amongst the larger scale GWAS there has been replication of SNPs near genes involved in oestrogen and other signalling pathways including ESR1 (oestrogen receptor alpha), GREB1, HOXA10, WNT4 and MAPK kinase signalling. The results from patients with endometriosis have also provided an opportunity to make comparisons with GWAS conducted on other patient cohorts including those with reproductive traits (age at menarche) and disorders (fibroids, endometrial and ovarian cancer) and conditions that are reported by women with endometriosis (migraine, depression). These comparative studies have highlighted some shared genetically-controlled biological mechanisms, including hormone-regulated pathways which might explain the co-occurrence of endometriosis with these disorders. In summary, unbiased genetic analysis has provided new insights into the genetic factors that may contribute to increased risk of developing endometriosis. 
New studies are needed to broaden the range of patients contributing to these datasets and to improve integration with non-genomic and tissue expression data before their full potential for diagnosis and improvements in patient care can be fully realised.
ARTICLE | doi:10.20944/preprints202008.0268.v1
Subject: Mathematics & Computer Science, General & Theoretical Computer Science Keywords: Authentication; Password Authentication; Password Strength; Password Memorability; Association Password Technique; Computer Security
Online: 11 August 2020 (15:08:21 CEST)
This study proposes a possible solution for enhancing password authentication using an association technique based on the ecological theory of memory and data, at the Presbyterian University College, Ghana. The study used a deductive research approach and comprised two empirical studies using a non-probability sampling technique, in which a small number of respondents were selected by category from the population on the basis of openness, in order to obtain comparable categories of respondents across age groups and educational backgrounds. Both empirical studies also used a quasi-experimental approach whose structure incorporates observation, experimental treatment and timing. The first empirical study investigated the password authentication techniques currently used by end users, as well as their password usage behavior, through self-completed questionnaires that were analyzed using SPSS version 21. The second empirical study was an experiment that compared three kinds of password construction (own set, modified dictionary and association) to see which would best satisfy the ecological theory of memory and data, which aims at creating a secure password that is easy to recall. Password construction was computed and evaluated using the My1login password meter, while Levenshtein Distance String Edit Software was used to compute the memorability of each password. A cross-tabulation of the experimental results was then produced using SPSS version 21. The analysis revealed that the majority of respondents have weak passwords and only a few passwords, which they reuse and even share with family and friends. This confirms statements by researchers that human beings are the weakest link in information system security.
To maintain the confidentiality, availability and integrity of data, the study therefore recommends the use of the association password technique, which makes it easier to develop a more secure password that is difficult to crack but easy to remember.
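The Levenshtein distance used as a memorability measure in this study counts the minimum number of single-character insertions, deletions and substitutions needed to turn one string into another. A minimal sketch follows; it is not the software used by the authors, and the example strings and the function name `levenshtein` are hypothetical.

```python
def levenshtein(a, b):
    """Minimum number of single-character edits turning string a into b."""
    # previous[j] holds the distance between the processed prefix of a
    # and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(
                min(
                    previous[j] + 1,         # delete ch_a
                    current[j - 1] + 1,      # insert ch_b
                    previous[j - 1] + cost,  # substitute or match
                )
            )
        previous = current
    return previous[-1]


# Hypothetical example: a password as set and as later recalled.
distance = levenshtein("password", "passw0rd")
```

A distance of 0 would mean the password was recalled exactly; larger values indicate a less accurate recall.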
ARTICLE | doi:10.20944/preprints202002.0202.v1
Subject: Biology, Agricultural Sciences & Agronomy Keywords: Napier grass; elephant grass; EMBRAPA; forage yield; feed quality; marker trait association
Online: 15 February 2020 (15:01:37 CET)
The evaluation of forage crops for adaptability and performance across production systems and environments is one of the main strategies used to improve forage production. To enhance the genetic resource base and identify traits responsible for increased feed potential of Napier grass, forty-five genotypes from EMBRAPA, Brazil, were evaluated for forage biomass yield and feed nutritional quality in a replicated trial under wet and dry season conditions in Ethiopia. The results revealed significant variation in forage yield and feed nutritional qualities among the genotypes and between the wet and dry seasons. Feed fibre components were lower in the dry season while crude protein, in vitro organic matter digestibility and metabolizable energy were higher. Based on the cumulative biomass yield and metabolizable energy yield, top performing genotypes were identified that are candidates for future forage improvement studies. Furthermore, the marker-trait association study identified diagnostic SNP and SilicoDArT markers and potential candidate genes that could differentiate high biomass yielding and high metabolizable energy genotypes in the collection.
ARTICLE | doi:10.20944/preprints201907.0166.v1
Subject: Biology, Agricultural Sciences & Agronomy Keywords: DArTseq; Groundnut; Linkage disequilibrium; Marker assisted selection; Marker trait association; Physiological traits
Online: 12 July 2019 (11:42:33 CEST)
In order to integrate genomics in breeding and development of drought tolerant groundnut genotypes, identification of genomic regions/genetic markers for drought surrogate traits is essential. We used SNP markers for a genetic analysis of the ICRISAT groundnut minicore collection for genome-wide marker-trait association for some physiological traits and to determine the magnitude of linkage disequilibrium (LD) present in the genetic resources. The LD analysis showed that about 36% of loci pairs were in significant LD (P < 0.05 and r² > 0.2) and 3.14% of the pairs were in complete LD. There was a rapid decline in LD with distance, and the LD was <0.2 at a distance of 41635 bp. The marker-trait association (MTA) studies revealed 20 significant MTAs (p < 0.001) with 11 markers for leaf area index (4), canopy temperature (13), chlorophyll content (1) and NDVI (2). The markers explained 2 to 21% of the phenotypic variation observed. Most of the MTAs identified on the A subgenome were also identified on the respective homeologous chromosome on the B subgenome. The duplication of effects observed could be due to a common ancestor of the A and B genomes, which explains the linkage detected in the current study between markers lying on different chromosomes. The present study identified a total of 20 highly significant marker-trait associations with 11 markers for four physiological traits of importance in groundnut: LAI, CT, SCMR and NDVI. The markers identified in this study can serve as useful genomic resources to initiate marker-assisted selection and trait introgression of groundnut for drought tolerance. The identified markers may be useful for marker-assisted selection after further validation.
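The r² statistic used above as the LD threshold can be illustrated for a single pair of biallelic loci. The sketch assumes phased haplotypes coded 0/1, which is a simplifying assumption and may differ from how LD was estimated in the study; the haplotypes and the function name `ld_r2` are invented.

```python
def ld_r2(haplotypes):
    """Squared correlation r^2 between two biallelic loci.

    haplotypes: list of (a, b) tuples, each allele coded 0 or 1
    (hypothetical phased data).
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    # D measures the departure of the joint frequency from independence.
    d = p_ab - p_a * p_b
    denominator = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denominator == 0:
        raise ValueError("both loci must be polymorphic")
    return d * d / denominator


# Invented haplotypes: the first pair of loci is in complete LD,
# the second pair is independent.
r2_complete = ld_r2([(1, 1), (1, 1), (0, 0), (0, 0)])
r2_independent = ld_r2([(1, 1), (1, 0), (0, 1), (0, 0)])
```

Under this definition, a pair of loci would count towards the reported 36% when r² exceeds 0.2 and the accompanying test is significant.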
ARTICLE | doi:10.20944/preprints201806.0491.v1
Subject: Medicine & Pharmacology, Other Keywords: epidemiology, causality, association, smoking, lung cancer, vitamin D, sun exposure, multiple sclerosis
Online: 29 June 2018 (15:42:02 CEST)
If environmental exposures are shown to cause an adverse health outcome, reducing exposure should reduce the disease risk. Links between exposures and outcomes are typically based on ‘associations’ derived from observational studies, and causality may not be clear. Randomised controlled trials to ‘prove’ causality are often not feasible or ethical. Here the history of evidence that tobacco smoking causes lung cancer – in observational studies – is compared to that of low sun exposure and/or low vitamin D status as causal risk factors for the autoimmune disease, multiple sclerosis. Evidence derives from in vitro and animal studies, as well as ecological, case-control and cohort studies, in order of increasing strength. For smoking and lung cancer, the associations are strong, consistent, and biologically plausible – the evidence is coherent or ‘in harmony’. For low sun exposure/vitamin D as risk factors for MS, the evidence is weaker, with smaller effect sizes, but coherent across a range of sources of evidence, and biologically plausible. The association is less direct – smoking is directly toxic and carcinogenic to the lung, but sun exposure/vitamin D modulate the immune system, which in turn may reduce the risk of immune attack on self-proteins in the central nervous system. Opinion about whether there is sufficient evidence to conclude that low sun exposure/vitamin D increase the risk of multiple sclerosis, is divided. General public health advice to receive sufficient sun exposure to avoid vitamin D deficiency (<50nmol/L) should also ensure any benefits for multiple sclerosis.
ARTICLE | doi:10.20944/preprints202008.0210.v1
Subject: Mathematics & Computer Science, Artificial Intelligence & Robotics Keywords: Textual Entailment by Generality; Asymmetric Word Similarities; Asymmetric Association Measure; Informative Asymmetric Measure
Online: 8 August 2020 (17:45:18 CEST)
In this work we present a new unsupervised and language-independent methodology to detect relations of textual generality. To this end, we introduce a particular case of textual entailment (TE), namely Textual Entailment by Generality (TEG). TE aims to capture primary semantic inference needs across applications in Natural Language Processing (NLP). Since 2005, in the TE recognition (RTE) task, systems have been asked to automatically judge whether the meaning of a portion of text, the Text (T), entails the meaning of another text, the Hypothesis (H). Several novel approaches and improvements in TE technologies demonstrated in RTE Challenges signal renewed interest in a deeper and better understanding of the core phenomena involved in TE. In line with this direction, we focus on a particular case of entailment, entailment by generality, to detect relations of textual generality. In text, there are different kinds of entailment, arising from different types of implicative reasoning (lexical, syntactic, common-sense based), but here we focus only on TEG, which can be defined as an entailment from a specific statement towards a relatively more general one. Therefore, we have T →G H whenever the premise T entails the hypothesis H and H is also more general than the premise. We propose an unsupervised and language-independent method to recognize TEGs from a pair 〈T,H〉 having an entailment relation. To this end, we introduce an Informative Asymmetric Measure (IAM) called Simplified Asymmetric InfoSimba (AISs), which we combine with different Asymmetric Association Measures (AAM). In this work, we hypothesize the existence of a particular mode of TE, namely TEG. Thus, the main contribution of our study is to highlight the importance of this inference mechanism. Consequently, the new annotation data seems to be a valuable resource for the community.
REVIEW | doi:10.20944/preprints202007.0516.v1
Subject: Medicine & Pharmacology, Other Keywords: Megakaryocyte, IFITM3, VWF, ADAMTS13, emperipolesis, self-association, unfractionated heparin (UFH), histone, NETs, Thrombin
Online: 22 July 2020 (11:03:08 CEST)
COVID-19 thromboembolic disease has brought all of us back to the drawing board. In COVID-19, pre-existing activated endothelium with increased Von Willebrand factor (VWF), low density lipoprotein (LDL) promoting “self-association” and “sticking” of long VWF strings to the vascular endothelial wall, suppressed ADAMTS13 cleavage of VWF, hypoxia induced upregulation and activation of VWF, fibrous network from neutrophil extracellular traps (NETs) with free DNA and histone, all appear to be initiating the thrombogenesis. Worsening complement activation, cytokine storm and resulting endothelial destruction, unregulated thrombogenesis leads to vascular occlusions and hypoxia. At this stage, the presence of abundant extracellular DNA, histone and -defensins appears worse than the SARS-CoV-2 itself. Previously observed in vitro mechanisms like histone “auto-activating” prothrombin, histone activated platelets generating thrombin without FXII, thrombin and plasmin cleaving complement C5 appears highly likely in COVID-19. Megakaryocytes are actively producing platelets in the lungs and appear to play a major role in thrombogenesis of COVID-19 raising suspicion of emperipolesis. This focused review is a compilation of my observations in relation to the pathophysiology of the intravascular environment, mainly in COVID-19 lungs. Pathophysiology based clinical trials are paramount in reducing morbidity and mortality in COVID-19.
ARTICLE | doi:10.20944/preprints201810.0221.v1
Subject: Mathematics & Computer Science, Information Technology & Data Management Keywords: Smart Cities; Internet of things; Bicycle sharing systems; Machine learning; Association rule mining
Online: 10 October 2018 (14:24:32 CEST)
Bike sharing systems are a key element of a smart city as they have the potential for reducing pollutant emissions and traffic congestion thus substantially improving citizens’ quality of life. In these systems, bicycles are made available for shared use to individuals on a very short-term basis. They are rented in a station and returned in any other station with free docks. However, to achieve a satisfactory user experience, all the stations in the system must be neither overloaded nor empty. The occupancy level of the stations can be constantly monitored through IoT-based services. The goal of this work is to analyze occupancy level data acquired from real systems to discover situations of dock overload in multiple stations which could lead to service disruption. The proposed methodology relies on a pattern mining approach. A new pattern type, called Occupancy Monitoring Pattern (OMPs), is proposed to characterize situations of dock overload in multiple stations. Since stations are geo-referenced and their occupancy levels are periodically monitored, OMPs can be filtered and evaluated by considering also the spatial and temporal correlation of the acquired measurements. The results achieved on real Smart City data highlight the potential of these techniques in supporting domain experts in maintenance activities, such as periodic re-balancing of the occupancy levels of the stations, as well as in improving the user experience, such as suggesting alternative stations in the neighborhood.
ARTICLE | doi:10.20944/preprints201803.0107.v2
Subject: Earth Sciences, Environmental Sciences Keywords: fire management; human activities; participation; firewood; charcoal; grazing; water; honey; farming; community forest association
Online: 12 June 2018 (11:20:43 CEST)
This paper proposes an Integrated Fire Management (IFM) framework that can be used to support communities and resource managers in finding effective and efficient approaches to prevent damaging fires, as well as maintain desirable fire regimes in Kenya. Designing and implementing an IFM approach in Kenya calls for a systematic understanding of the various uses of fire and the underlying perceptions and traditional ecological knowledge of the local people. The proposed IFM framework allows an evaluation of the risks posed by fires, balanced against their beneficial ecological and economic effects, and thus supports the development of effective fire management approaches. A case study of the proposed IFM framework was conducted in Gathiuru Forest, which is part of the larger Mt. Kenya Forest Ecosystem. Focus group discussions were held with key resource persons, primary and secondary data on socio-economic activities were studied, fire and weather records were analyzed and the current fire management plans were consulted. Questionnaires were used to assess how the IFM is implemented in the Gathiuru Forest Station. The results show that the proposed IFM framework is scalable and can be applied in places with fire-dependent ecosystems as well as in places with fire-sensitive ecosystems in Kenya. Its effectiveness depends on the active participation of the main stakeholder groups (Kenya Forest Service (KFS), Kenya Wildlife Service (KWS), and the Community Forest Associations (CFA)) in formulating and implementing the IFM activities. The proposed IFM framework helps in implementing cost-effective approaches to prevent damaging fires and maintain desirable fire regimes in Kenya.
ARTICLE | doi:10.20944/preprints201801.0136.v1
Subject: Materials Science, Nanotechnology Keywords: polymer-drug association; inclusion nano-complex; an amphiphilic polymer; polysoaps; antibiotic resistance; ampicillin trihydrate
Online: 16 January 2018 (07:56:15 CET)
Biocompatible polymeric materials that can form functional structures in association with different therapeutic molecules have a high potential for biological, medical and pharmaceutical applications. Therefore, the protective capability of the inclusion nano-complex formed between the sodium salt of poly(maleic acid-alt-octadecene) (PAM-18Na) and a β-lactam drug (ampicillin trihydrate) against chemical, enzymatic and biological degradation was evaluated. PAM-18Na was produced and characterized as reported previously. The formation of polymeric hydrophobic aggregates in aqueous solution was determined using pyrene as a fluorescent probe. Furthermore, the formation of polymer-drug nano-complexes was characterized by differential scanning calorimetry (DSC), viscometry, ultrafiltration/centrifugation assays, zeta potential and size measurements by dynamic light scattering (DLS). The capacity of PAM-18Na to prevent chemical degradation was studied through stress stability tests. Enzymatic degradation was evaluated using a pure β-lactamase, while biological degradation was determined using different β-lactamase-producing Staphylococcus aureus strains. When ampicillin was associated with PAM-18Na, the half-life in acidic conditions increased, whereas the enzymatic degradation and the minimum inhibitory concentration decreased to 90% and 75%, respectively. These results suggest a promising capability of this polymer to protect β-lactam drugs against chemical, enzymatic and biological degradation.
REVIEW | doi:10.20944/preprints202111.0253.v1
Subject: Medicine & Pharmacology, Cardiology Keywords: Cell therapy; chronic limb-threatening ischemia; peripheral artery disease; diabetes; atherosclerosis obliterans; thromboangiitis obliterans; personalized medicine; artificial intelligence; machine learning; genome-wide association studies; transcriptome-wide association studies; clonal hematopoiesis of indeterminate potential.
Online: 15 November 2021 (11:18:43 CET)
Stem/progenitor cell transplantation is a potential novel therapeutic strategy to induce angiogenesis in ischemic tissue, which can prevent major amputation in patients with advanced peripheral artery disease (PAD). Thus, clinicians can use cell therapies worldwide to treat PAD. However, some cell therapy studies did not report beneficial outcomes. Clinical researchers suggested that classical risk factors and comorbidities may adversely affect the efficacy of cell therapy. Some studies have indicated that the response to stem cell therapy varies among patients even in those harboring limited risk factors. This suggested the role of undetermined risk factors, including genetic alterations, somatic mutations, and clonal hematopoiesis. Personalized stem cell-based therapy can be developed by analyzing individual risk factors. These approaches must consider several clinical biomarkers and perform studies (such as genome-wide association studies (GWAS)) on disease-related genetic traits and integrate the findings with those of transcriptome-wide association studies (TWAS) and whole-genome sequencing in PAD. Additional unbiased analyses with state-of-the-art computational methods, such as machine learning-based patient stratification, are suited for predictions in clinical investigations. The integration of these complex approaches into a unified analysis procedure for the identification of responders and non-responders before stem cell therapy, which can decrease treatment expenditure, is a major challenge to increase the efficacy of therapies.
CASE REPORT | doi:10.20944/preprints202206.0400.v1
Subject: Earth Sciences, Environmental Sciences Keywords: Ganga; environmental flows; river conservation; Ramganga; Karula; irrigation water use efficiency; Water Users Association; minor canal
Online: 29 June 2022 (08:52:50 CEST)
The pressure on freshwater resources is leading to diminishing flows in some of the critical river systems across the globe, and India is no exception. This is mainly because of water withdrawal for irrigation, which is often in the range of 70% to 80%, with some proportion for domestic and industrial use. Having moved on from the concept of environmental flows and its assessment methodologies in India, water managers, researchers and conservationists are now turning to the next question: if the rivers are to be revived, where will the water come from, especially in the case of over-allocated rivers, including the river Ganga? While the logical way is to look at the biggest user of water, i.e. irrigation, it remains to be seen whether irrigation water savings will actually enhance flows in a river, complementing the efforts towards maintaining e-flows in rivers, or whether they will lead to more area under agriculture, shift cropping patterns towards more water-intensive crops or result in something else. This is a growing debate across the globe, where India is no exception, and there has been a wide range of opinions in this regard. This paper discusses the process, findings and lessons from a joint initiative involving farmers, the Uttar Pradesh state Irrigation and Water Resources Department, Bijnor District Administration and a conservation organisation to enhance flows in a rivulet, called the Karula River, which is part of the Ganga river system.
ARTICLE | doi:10.20944/preprints202112.0305.v1
Subject: Biology, Horticulture Keywords: replant disease; Malus; free-living nematodes; bacteria; fungi; rhizosphere; nematode-microbe association; disease complex; metabarcoding; nematode community
Online: 20 December 2021 (10:34:03 CET)
Apple replant disease is a severe problem in orchards and tree nurseries. Evidence for the involvement of a nematode-microbe disease complex was reported. To search for this complex, plots with a history of apple replanting, and control plots cultivated for the first time with apple were sampled in two fields in two years. Shoot weight drastically decreased with each replanting. Nematodes were extracted from soil samples by floatation-centrifugation, washed on a 20 µm-sieve, and used for DNA extraction. Nematode communities and co-extracted fungi and bacteria were analyzed by high-throughput sequencing of amplified ribosomal fragments. The nematode community and co-extracted fungal and bacterial communities significantly differed between replanted and control plots. Free-living nematodes of the genera Aphelenchus, Cephalenchus, and an unidentified Dorylaimida were associated with replanted plots, as indicated by linear discriminant analysis effect size. Among the co-extracted fungi and bacteria, Mortierella was most indicative of replanting. Some genera, mostly Rhabditis, indicated healthy control plots. Isolating and investigating the putative disease complexes will help to understand and alleviate stress-induced root damage of apple in replanted soil.
ARTICLE | doi:10.20944/preprints202212.0065.v1
Subject: Life Sciences, Genetics Keywords: pulmonary tuberculosis; lymph node tuberculosis; extra-pulmonary tuberculosis; single nucleotide polymorphisms; cytokine; innate immunity; genetic association; genotype; serum
Online: 5 December 2022 (08:00:15 CET)
Background: Tuberculosis (TB) manifests itself primarily in the lungs as pulmonary disease (PTB) and sometimes disseminates to other organs to cause extra-pulmonary TB, such as lymph node TB (LNTB). This study aimed to investigate the role of host genetic polymorphism in immunity-related genes to find a genetic basis for such differences. Methods: Sixty-three single nucleotide polymorphisms (SNPs) in twenty-three TB-immunity related genes, including eleven innate immunity genes (SLCA11, VDR, TLR2, TLR4, TLR8, IRGM, P2RX7, LTA4H, SP110, DCSIGN and NOS2A) and twelve cytokine genes (TNFA, IFNG, IL2, IL12, IL18, IL1B, IL10, IL6, IL4, IL1RA, IL8 and TNFB), were investigated to find genetic associations in both PTB and LNTB as compared to healthy community controls. The serum cytokine levels were tested for association with the genotypes. Results: PTB and LNTB showed differential genetic associations. The genetic variants in the cytokine genes (IFNG, IL12, IL4, TNFB and IL1RA) and in TLR2 and TLR4 were associated with PTB susceptibility and cytokine levels but not with LNTB (p < 0.05). Similarly, genetic variants in LTA4H, P2RX7, DCSIGN and SP110 showed susceptibility to LNTB and not PTB. Pathway analysis showed abundance of cytokine-related variants for PTB and apoptosis-related variants for LNTB. Conclusions: PTB and LNTB outcomes of TB infection have a genetic component, which should be considered in any future susceptibility and functional studies.
ARTICLE | doi:10.20944/preprints202205.0277.v1
Subject: Life Sciences, Genetics Keywords: immunoglobulin A nephropathy; expression quantitative trait loci; summary data-based Mendelian randomization; genome-wide association study; functional mapping
Online: 20 May 2022 (12:13:06 CEST)
Background: Immunoglobulin A nephropathy (IgAN) is a complex autoimmune disease, and the exact pathogenesis remains to be elucidated. Methods: We conducted summary data-based Mendelian randomization (SMR) analysis and performed functional mapping and annotation using FUMA to explore genetic loci that are potentially involved in the pathogenesis of IgAN. Both analyses used summarized data of a recent genome-wide association study (GWAS) on IgAN, which included 477,784 Europeans (15,587 cases and 462,197 controls) and 175,359 East Asians (71 cases and 175,288 controls). We performed separate SMR analysis using CAGE and GTEx eQTL data. Results: Using the CAGE eQTL data, our SMR analysis identified 32 probes tagging 25 unique genes that were pleiotropically/potentially causally associated with IgAN, with the top three probes being ILMN_2150787 (tagging HLA-C, P_SMR = 2.10×10⁻¹⁸), ILMN_1682717 (tagging IER3, P_SMR = 1.07×10⁻¹⁶) and ILMN_1661439 (tagging FLOT1, P_SMR = 1.16×10⁻¹⁴). Using GTEx eQTL data, our SMR analysis identified 24 probes tagging 24 unique genes, with the top three probes being ENSG00000271581.1 (tagging XXbac-BPG248L24.12, P_SMR = 1.44×10⁻¹⁰), ENSG00000186470.9 (tagging BTN3A2, P_SMR = 2.28×10⁻¹⁰), and ENSG00000224389.4 (tagging C4B, P_SMR = 1.23×10⁻⁹). FUMA analysis identified 3 independent, significant and lead SNPs, 2 genomic risk loci and 39 genes. Conclusion: We identified many genetic variants/loci that are potentially involved in the pathogenesis of IgAN.
ARTICLE | doi:10.20944/preprints202003.0127.v1
Subject: Biology, Entomology Keywords: plant-insect interaction; host shift; parallel evolution; detoxification; experimental evolution; population genomics; genome-wide association mapping; gene expression; Callosobruchus maculatus
Online: 8 March 2020 (01:52:10 CET)
Genes that affect adaptive traits have been identified, but our knowledge of the genetic basis of adaptation in a more general sense (across multiple traits) remains limited. We combined population-genomic analyses of evolve and resequence experiments, genome-wide association mapping of performance traits, and analyses of gene expression to fill this knowledge gap, and shed light on the genomics of adaptation to a marginal host (lentil) by the seed beetle Callosobruchus maculatus. Using population-genomic approaches, we detected modest parallelism in allele frequency change across replicate lines during adaptation to lentil. Mapping populations derived from each lentil-adapted line revealed a polygenic basis for two host-specific performance traits (weight and development time), which had low to modest heritabilities. We found less evidence of parallelism in genotype-phenotype associations across these lines than in allele frequency changes during the experiments. Differential gene expression caused by differences in recent evolutionary history exceeded that caused by immediate rearing host. Together, the three genomic data sets suggest that genes affecting traits other than weight and development time are likely to be the main causes of parallel evolution, and that detoxification genes (especially cytochrome P450s and beta-glucosidase) could be especially important for colonization of lentil by C. maculatus.
REVIEW | doi:10.20944/preprints201812.0267.v1
Subject: Medicine & Pharmacology, Clinical Neurology Keywords: Alzheimer’s disease; CTH gene; DNA methylation; epigenetics; epigenome-wide association study; methylome; MTHFR gene; nutrition; S-adenosylmethionine; vitamin B complex
Online: 24 December 2018 (04:48:53 CET)
DNA methylation and other epigenetic factors are important in the pathogenesis of late-onset Alzheimer’s disease (LOAD). Methylenetetrahydrofolate reductase (MTHFR) gene mutations occur in most elderly patients with memory loss. MTHFR is critical for production of S-adenosyl-L-methionine (SAM), the principal methyl donor. A common mutation (1364T/T) of the cystathionine-γ-lyase (CTH) gene affects the enzyme that converts cystathionine to cysteine in the trans-sulfuration pathway causing plasma elevation of total homocysteine (tHcy) or hyperhomocysteinemia – a strong and independent risk factor for cognitive loss and AD. Other causes of hyperhomocysteinemia include aging, nutritional factors, and deficiencies of B vitamins. We emphasize the importance of supplementing vitamin B12 (methylcobalamin), vitamin B9 (folic acid), vitamin B6 (pyridoxine), and SAM to patients in early stages of LOAD.
Subject: Life Sciences, Other Keywords: Analysis of variance; Variance-decomposition; The Bayesian brain; High-dimensional data; Association; Explanation; Prediction; Causation; The neural law of large numbers
Online: 23 September 2021 (11:13:08 CEST)
We discuss what we believe could be an improvement in future discussions of the ever-changing brain. We do so by distinguishing different types of brain variability and outlining methods suitable to analyse them. We argue that, when studying brain and behaviour data, classical methods such as regression analysis and more advanced approaches both aim to decompose the total variance into sensible variance components. In parallel, we argue that a distinction needs to be made between innate and acquired brain variability. For varying high-dimensional brain data, we present methods useful to extract their low-dimensional representations. Finally, to trace potential causes and predict plausible consequences of brain variability, we discuss how to combine statistical principles and neurobiological insights to make associative, explanatory, predictive, and causal enquiries; caution is needed, however, before raising association- or prediction-based neurobiological findings to causal claims.
ARTICLE | doi:10.20944/preprints202104.0791.v1
Subject: Medicine & Pharmacology, Other Keywords: COVID-19 Vaccines; Cross-Sectional Studies; Decision Making; Dental Education; Dental Students; International Association of Dental Students; Mass Vaccination; Multicenter Study; Social Determinants of Health
Online: 30 April 2021 (15:26:07 CEST)
Background: Acceleration of mass vaccination strategies is the only pathway to overcome the COVID-19 pandemic. Healthcare professionals and students have a key role in shaping public opinion about vaccines. This study aimed to evaluate the attitudes of dental students globally towards COVID-19 vaccines and explore the potential drivers for students' acceptance levels; Methods: A global cross-sectional study was carried out in February 2021 using an online questionnaire. The study was liaised by the scientific committee of the International Association of Dental Students (IADS), and data was collected through the national and local coordinators of IADS member organizations. The dependent variable was the willingness to take the COVID-19 vaccine, and the independent variables included demographic characteristics, COVID-19-related experience, and the drivers of COVID-19 vaccine-related attitude suggested by the WHO-SAGE; Results: A total of 6639 students from 22 countries representing all world regions responded to the questionnaire properly. Their mean age was 22.06 ± 2.79 (17-40) years, and the majority were females (70.5%), in clinical years (66.8%), and from upper-middle-income economies (45.7%). In general, 22.5% of dental students worldwide were hesitant, and 13.9% rejected COVID-19 vaccines. The students in low- and lower-middle-income (LLMI) economies had significantly higher levels of vaccine hesitancy compared to their peers in upper-middle- and high-income (UMHI) economies (30.4% vs 19.8%; p < 0.001); Conclusions: The global acceptance level of dental students for COVID-19 vaccines was suboptimal, and their worrisome level of vaccine hesitancy was influenced by the socioeconomic context where the dental students live and study. The media and social media, public figures, insufficient knowledge about vaccines, and mistrust of governments and the pharmaceutical industry were barriers to vaccination. The findings of this study call for further implementation of epidemiology (infectious diseases) education within undergraduate dental curricula.
ARTICLE | doi:10.20944/preprints202109.0021.v1
Subject: Mathematics & Computer Science, Probability And Statistics Keywords: Effect size; correlation coefficient; association measure; covariance; mean square contingency coefficient; mean square effect half-size; Pearson’s Phi; 2 × 2 table; binary crosstab; gross crosstab; contingency table
Online: 1 September 2021 (14:28:47 CEST)
Evidence-based medicine (EBM) is in crisis, in part due to bad methods, which are understood as misuse of statistics that is considered correct in itself. This article exposes two related common misconceptions in statistics, the effect size (ES) based on correlation (CBES) and a misconception of contingency tables (MCT). CBES is a fallacy based on misunderstanding of correlation and ES and confusion with 2 × 2 tables, which makes no distinction between gross crosstabs (GCTs) and contingency tables (CTs). This leads to misapplication of Pearson’s Phi, designed for CTs, to GCTs and confusion of the resulting gross Pearson Phi, or mean-square effect half-size, with the implied Pearson mean square contingency coefficient. Generalizing this binary fallacy to continuous data and the correlation in general (Pearson’s r) resulted in flawed equations directly expressing ES in terms of the correlation coefficient, which is impossible without including covariance, so these equations and the whole CBES concept are fundamentally wrong. MCT is a series of related misconceptions due to confusion with 2 × 2 tables and misapplication of related statistics. The misconceptions are threatening because most of the findings from contingency tables, including CBES-based meta-analyses, can be misleading. Problems arising from these fallacies are discussed and the necessary changes to the corpus of statistics are proposed resolving the problem of correlation and ES in paired binary data. Since exposing these fallacies casts doubt on the reliability of the statistical foundations of EBM in general, we urgently need to revise them.
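For readers unfamiliar with the statistic under discussion, the conventional textbook Pearson's Phi for a 2 × 2 table with cells a, b (first row) and c, d (second row) is (ad − bc) / √((a+b)(c+d)(a+c)(b+d)). The sketch below computes only this standard quantity; it does not implement the article's distinction between gross crosstabs and contingency tables or its proposed corrections, and the function name and example counts are illustrative assumptions.

```python
import math


def phi_coefficient(a, b, c, d):
    """Conventional Pearson's Phi for the 2x2 table [[a, b], [c, d]]."""
    # Product of the two row totals and the two column totals.
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    if denom == 0:
        raise ValueError("Phi is undefined when a row or column total is zero")
    return (a * d - b * c) / denom


# Invented counts for illustration: (30*30 - 10*10) / sqrt(40*40*40*40) = 800 / 1600
phi = phi_coefficient(30, 10, 10, 30)  # 0.5
```

A table with identical rows, such as (10, 10, 10, 10), gives a Phi of zero under this formula.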
ARTICLE | doi:10.20944/preprints201807.0397.v2
Subject: Biology, Agricultural Sciences & Agronomy Keywords: flax; genome-wide association study (GWAS); selective sweep; genotyping by sequencing (GBS); bi-parental population; single nucleotide polymorphism (SNP); seed yield; plant height; maturity; fatty acid composition
Online: 3 August 2018 (15:34:24 CEST)
A genome-wide association study (GWAS) was performed on a set of 260 lines belonging to three different bi-parental flax mapping populations. These lines were sequenced to an average genome coverage of 19× using the Illumina HiSeq platform. Phenotypic data for 11 seed yield and oil quality traits were collected in eight year/location environments. A total of 17,288 single nucleotide polymorphisms were identified, which explained more than 80% of the phenotypic variation for days to maturity (DTM), iodine value (IOD), palmitic (PAL), stearic, linoleic (LIO) and linolenic (LIN) acid contents. Twenty-three unique genomic regions associated with 33 QTL for the studied traits were detected, thereby validating four genomic regions previously identified. The 33 QTL explained 48-73% of the phenotypic variation for oil content, IOD, PAL, LIO and LIN but only 8-14% for plant height, DTM and seed yield. A genome-wide selective sweep scan for selection signatures detected 114 genomic regions that accounted for 7.82% of the flax pseudomolecule and overlapped with the 11 GWAS-detected genomic regions associated with 18 QTL for 11 traits. The results demonstrate the utility of GWAS combined with selection signatures for dissection of the genetic structure of traits and for pinpointing genomic regions for breeding improvement.