ARTICLE | doi:10.20944/preprints202208.0178.v1
Subject: Medicine & Pharmacology, Psychiatry & Mental Health Studies Keywords: alcohol dependence; comorbidity; gene network; genome-wide association study; sex differences
Online: 9 August 2022 (10:35:29 CEST)
At least 50% of factors predisposing to alcohol dependence (AD) are genetic and women affected with this disorder present with more psychiatric comorbidities, probably indicating different genetic factors involved. We aimed to run a genome-wide association study (GWAS) followed by a bioinformatic functional annotation of associated genomic regions in male and female patients with AD and eight related clinical measures. A genome-wide significant association of rs220677 with AD (p-value = 1.33×10^-8 calculated with the Yates-corrected Chi-square test under the assumption of dominant inheritance) was discovered in female patients. Associations of AD and related clinical measures with seven other single nucleotide polymorphisms listed in previous GWAS of psychiatric and addiction traits were differently replicated in male and female patients. The bioinformatic analysis showed that regulatory elements in the eight associated linkage disequilibrium blocks define the expression of 80 protein-coding genes. Nearly 68% of these and of 120 previously published coding genes associated with alcohol phenotypes directly interact in a single network. This study indicates that a number of genes behind the pathogenesis of AD are different in male and female patients, but implicated molecular mechanisms are functionally connected. The results also suggest the genetic basis of sex-specific psychiatric comorbidities of AD.
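The test named in this abstract, a Yates-corrected chi-square under a dominant model, can be illustrated with a minimal sketch. The genotype counts below are hypothetical and are not taken from the study. The function names are illustrative, and the genotype order (non-carrier homozygote, heterozygote, risk-allele homozygote) is an assumption.

```python
import math

def dominant_table(case_counts, control_counts):
    """Collapse (hom_ref, het, hom_alt) genotype counts into a 2x2
    carrier / non-carrier table, as a dominant model does."""
    a = case_counts[1] + case_counts[2]        # cases carrying the allele
    b = case_counts[0]                         # cases not carrying it
    c = control_counts[1] + control_counts[2]  # control carriers
    d = control_counts[0]                      # control non-carriers
    return a, b, c, d

def yates_chi_square(a, b, c, d):
    """Yates-corrected chi-square (1 df) and p-value for [[a, b], [c, d]]."""
    n = a + b + c + d
    # Continuity correction: shrink |ad - bc| by n/2, but never below zero.
    diff = max(abs(a * d - b * c) - n / 2, 0.0)
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    chi2 = n * diff ** 2 / denom
    # With 1 degree of freedom the upper tail equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical counts, for illustration only.
table = dominant_table(case_counts=(60, 30, 10), control_counts=(80, 15, 5))
chi2, p_value = yates_chi_square(*table)
```

A genome-wide significance threshold such as 5×10^-8 would then be applied to `p_value`. That threshold is a common convention and is not stated in the abstract.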
Subject: Medicine & Pharmacology, Psychiatry & Mental Health Studies Keywords: feeding and eating disorder; genome-wide association study; methylation quantitative trait loci
Online: 8 October 2021 (14:23:39 CEST)
Eating disorders (ED) are characterized by alterations in eating behavior. The genetic factors shared between ED diagnoses have been underexplored. The present study aimed to perform a genome-wide association study of individuals with disordered eating behaviors in the Mexican population, together with a blood methylation quantitative trait loci (blood-meQTL) analysis and in silico function prediction using different algorithms. The analysis included a total of 1803 individuals. The genome-wide association study and blood-meQTL analysis were performed by logistic and linear regression. In silico functional variant prediction and phenome-wide and transcriptome-wide association studies were carried out using different algorithms. In the genome-wide association study, we identified 44 single-nucleotide polymorphisms (SNPs) associated at a nominal significance level and 7 blood-meQTL at a genome-wide threshold. The SNPs were enriched in genome-wide associations of the metabolic and immunologic domains. In the in silico analysis, the SNP rs10419198, located on an enhancer mark, could change the expression of PRR12 in blood, adipocytes, and brain areas that regulate food intake. The present study supports the previous associations of genetic variation in the metabolic domain with ED.
ARTICLE | doi:10.20944/preprints202111.0386.v1
Subject: Life Sciences, Genetics Keywords: genome-wide association study; transcriptome-wide association study; meta-analysis; expression quantitative trait loci; nicotine addiction
Online: 22 November 2021 (11:46:13 CET)
Genome-wide association studies (GWAS) have identified and reproduced thousands of disease-associated loci, but many of them are not directly interpretable because of strong linkage disequilibrium among variants. Transcriptome-wide association studies (TWAS) incorporate expression quantitative trait loci (eQTL) cohorts as a reference panel to detect associations with the phenotype at the gene level and have gained popularity in recent years. For nicotine addiction, several important susceptibility variants have been identified by GWAS, but TWAS that detect genes associated with nicotine addiction and unveil the underlying molecular mechanisms are still lacking. In this study, we used eQTL data from the Genotype-Tissue Expression (GTEx) consortium as a reference panel to conduct tissue-specific TWAS on cigarettes per day (CPD) over 13 brain tissues in two large cohorts: UK Biobank (UKBB; N=142,202) and the GWAS & Sequencing Consortium of Alcohol and Nicotine use (GSCAN; N=143,210), and then meta-analyzed the results across tissues while accounting for heterogeneity across tissues. We identified three major clusters of genes with different meta-patterns across tissues that were consistent in both cohorts: homogeneous genes associated with CPD in all brain tissues, partially homogeneous genes associated with CPD in cortex, cerebellum and hippocampus tissues, and tissue-specific genes associated with CPD in only a few specific brain tissues. Downstream enrichment analyses on each gene cluster identified unique biological pathways associated with CPD and provided important biological insights into the regulatory mechanism of nicotine dependence in the brain.
ARTICLE | doi:10.20944/preprints202204.0256.v1
Subject: Medicine & Pharmacology, Nutrition Keywords: tea intake; fracture; Mendelian randomization; genome-wide association studies
Online: 27 April 2022 (10:40:34 CEST)
Fracture is a global public health problem. Bone health and fracture risk have become the focus of public and scientific attention. Observational studies have reported that tea consumption is associated with fracture risk, but the results are inconsistent. The present study was conducted to evaluate whether tea consumption was causally associated with the risk of bone fracture through two-sample Mendelian Randomization (MR) analysis. We included a large genome-wide association study (GWAS) of tea consumption in 447,485 individuals and analyzed the effects of genetic instruments on fractures using fracture cases from the UK Biobank dataset (n=361,194). Inverse variance weighted (IVW) analysis indicated no causal effects of tea consumption on fractures of the skull and face, shoulder and upper arm, hand and wrist, femur, and calf and ankle (odds ratio (OR)=1.000, P=0.881; OR=1.000, P=0.857; OR=1.002, P=0.339; OR=0.997, P=0.054; OR=0.998, P=0.569, respectively). Consistent results were also found with the MR-Egger, weighted median, and weighted mode methods. Our research provided evidence that tea consumption is unlikely to affect the incidence of fractures.
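The inverse variance weighted (IVW) estimate reported above can be sketched from per-variant summary statistics. The code below is the textbook fixed-effect IVW estimator and is not necessarily the exact variant or software the authors used. All effect sizes are hypothetical, and the function name is illustrative.

```python
import math

def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW causal estimate and its standard error.

    Each variant contributes the Wald ratio beta_outcome / beta_exposure,
    weighted by beta_exposure**2 / se_outcome**2.
    """
    triples = list(zip(beta_exposure, beta_outcome, se_outcome))
    numerator = sum(bx * by / se ** 2 for bx, by, se in triples)
    denominator = sum(bx ** 2 / se ** 2 for bx, _, se in triples)
    estimate = numerator / denominator
    std_error = math.sqrt(1.0 / denominator)
    return estimate, std_error

# Hypothetical summary statistics for three instruments (illustration only).
bx = [0.10, 0.20, 0.30]    # variant effects on the exposure
by = [0.05, 0.10, 0.15]    # variant effects on the outcome (log-odds scale)
se = [0.01, 0.02, 0.01]    # standard errors of the outcome effects
estimate, std_error = ivw_estimate(bx, by, se)
odds_ratio = math.exp(estimate)  # valid only if outcome effects are log-odds
```

Sensitivity methods such as MR-Egger or the weighted median relax the assumption, built into this estimator, that every instrument is valid.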
ARTICLE | doi:10.20944/preprints201807.0397.v2
Subject: Biology, Agricultural Sciences & Agronomy Keywords: flax; genome-wide association study (GWAS); selective sweep; genotyping by sequencing (GBS); bi-parental population; single nucleotide polymorphism (SNP); seed yield; plant height; maturity; fatty acid composition
Online: 3 August 2018 (15:34:24 CEST)
A genome-wide association study (GWAS) was performed on a set of 260 lines that belong to three different bi-parental flax mapping populations. These lines were sequenced to an average genome coverage of 19× using the Illumina Hi-Seq platform. Phenotypic data for 11 seed yield and oil quality traits were collected in eight year/location environments. A total of 17,288 single nucleotide polymorphisms were identified, which explained more than 80% of the phenotypic variation for days to maturity (DTM), iodine value (IOD), palmitic (PAL), stearic, linoleic (LIO) and linolenic (LIN) acid contents. Twenty-three unique genomic regions associated with 33 QTL for the studied traits were detected, thereby validating four genomic regions previously identified. The 33 QTL explained 48-73% of the phenotypic variation for oil content, IOD, PAL, LIO and LIN but only 8-14% for plant height, DTM and seed yield. A genome-wide selective sweep scan for selection signatures detected 114 genomic regions that accounted for 7.82% of the flax pseudomolecule and overlapped with the 11 GWAS-detected genomic regions associated with 18 QTL for 11 traits. The results demonstrate the utility of GWAS combined with selection signatures for dissection of the genetic structure of traits and for pinpointing genomic regions for breeding improvement.
ARTICLE | doi:10.20944/preprints202107.0311.v1
Online: 13 July 2021 (15:11:54 CEST)
The SnRK gene family is a key regulator of plant stress responses, phosphorylating target proteins to regulate signalling pathways. The function of the SnRK gene family has been reported in many species, but information is limited for Triticum aestivum. In this study, the SnRK gene family in the wheat genome was identified and its structural characteristics were described. One hundred forty-seven SnRK genes distributed across 21 chromosomes were identified in the Triticum aestivum genome and categorised into three subgroups (SnRK1/2/3) based on phylogenetic analyses and domain types. The gene intron-exon structure and protein-motif composition of SnRKs were similar within each subgroup but different amongst the groups. Gene duplication between the wheat, Arabidopsis, rice and barley genomes was also investigated to gain insight into the evolutionary aspects of the TaSnRK family genes. SnRK genes showed differential expression patterns in leaves, roots, spike, and grains. Redundant stress-related cis-elements were also found in the promoters of 129 SnRK genes, and their expression levels varied widely in relation to drought-, ABA- and light-regulated elements. In particular, TaSnRK2.11 showed higher and increasing expression under abiotic stresses and may be a candidate gene for abiotic stress tolerance. The findings will aid in the functional characterization of TaSnRK genes for further research.
ARTICLE | doi:10.20944/preprints202206.0376.v1
Subject: Biology, Other Keywords: effector proteins; genome-wide analysis; Ganoderma boninense; basal stem rot; genome architecture
Online: 28 June 2022 (04:59:14 CEST)
Ganoderma boninense is the major causal agent of basal stem rot (BSR) disease in oil palm, causing the progressive rot of the basal part of the stem. Despite its prominence, key pathogenicity determinants for the aggressive nature of hemibiotrophic infection remain unknown. In this study, genome sequencing and annotation of G. boninense T10 were carried out using the Illumina sequencing platform and comparative genome analysis was performed with previously reported G. boninense strains (NJ3 and G3). The pan-secretome of G. boninense was constructed and comprised 937 core orthogroups, 243 accessory orthogroups, and 84 strain-specific orthogroups. A set of core candidate effector proteins (CEPs) was found to be enriched with catalytic proteins classified as carbohydrate-active enzymes and hydrolases, as well as non-catalytic proteins. Differential expression analysis revealed an upregulation of CEP genes that was linked to the suppression of the PTI signaling cascade, while the downregulation of CEP genes was linked to the inhibition of PTI by preventing host defense elicitation. Genome architecture analysis revealed the one-speed architecture of the G. boninense genome and the lack of preferential association of CEP genes with transposable elements. The findings obtained from this study would aid in the characterization of pathogenicity determinants and molecular biomarkers of BSR disease.
ARTICLE | doi:10.20944/preprints201810.0171.v3
Subject: Medicine & Pharmacology, General Medical Research Keywords: genome-wide polygenic score; coronary artery disease; AUC
Online: 6 December 2018 (07:06:32 CET)
A recent study claimed that genome-wide polygenic scores (GPSs) for five common diseases could identify individuals with risk equivalent to monogenic mutations. Receiver operator curve analyses were reported to have areas under the curve (AUCs) ranging from 0.63 for inflammatory bowel disease up to 0.81 for coronary artery disease (CAD) but these models also included age and sex, themselves strong predictors of risk. The GPS for CAD identified 8% of the population at threefold increased risk, which it was claimed was comparable to the excess risk from monogenic mutations. In the present study attempts were made to model the distribution of the GPS for CAD to match the information provided. These models were based on the reported distribution of prevalence by centile of GPS and on the distribution of GPS in controls and cases and were fitted to the reported results using linear approximations to the distributions and using simulations of a liability-threshold model. It was impossible to produce a compatible model in which the GPS produced an AUC as high as 0.81 and the most plausible estimate was that the true AUC was only 0.65. The reported distributions of the GPS in cases and controls overlap so much that they are not compatible with an AUC of 0.7 or higher. The AUC of the GPS for these diseases is modest. Furthermore, the literature robustly demonstrates that true CAD risk associated with monogenic mutations is much higher than the threefold increase which is predicted by the GPS. Together, these findings cast doubt on the clinical utility of the GPS.
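The link between case/control overlap and AUC that this abstract relies on can be made concrete with a simplified model. The sketch below assumes the score is normally distributed with equal variance in cases and controls, in which case AUC = Φ(d/√2), where d is the standardized difference in means. This is a substitute for, not a reproduction of, the linear approximations and liability-threshold simulations the author describes. The function names are illustrative.

```python
import math
from statistics import NormalDist

_STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def auc_from_separation(d):
    """AUC when case and control scores are equal-variance normals whose
    means differ by d standard deviations."""
    return _STANDARD_NORMAL.cdf(d / math.sqrt(2))

def separation_for_auc(auc):
    """Inverse relation: mean separation (in SD units) implied by an AUC."""
    return math.sqrt(2) * _STANDARD_NORMAL.inv_cdf(auc)

# Separation implied by the two AUC values discussed in the abstract.
d_for_065 = separation_for_auc(0.65)
d_for_081 = separation_for_auc(0.81)
```

Under this assumption a higher AUC requires case and control distributions that sit visibly further apart, which is the kind of check the abstract applies to the reported distributions.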
ARTICLE | doi:10.20944/preprints201805.0330.v1
Subject: Life Sciences, Molecular Biology Keywords: fibrillin; cucumber; genome-wide; gene expression; high light stress
Online: 24 May 2018 (05:24:00 CEST)
Fibrillin (FBN) is a plastid lipid-associated protein found in photosynthetic organisms from cyanobacteria to plants. In this study, 10 CsaFBN genes were identified in genomic DNA sequences of cucumber (Chinese long and Gy14) through database searches using the conserved domain of FBN and the 14 FBN genes of Arabidopsis. Phylogenetic analysis of CsaFBN protein sequences showed that there was no counterpart of Arabidopsis and rice FBN5 in the cucumber genome. FBN5 is essential for growth in Arabidopsis and rice; its absence in cucumber may be because of incomplete genome sequences or that another FBN carries out its functions. Among the 10 CsaFBN genes, CsaFBN1 and CsaFBN9 were the most divergent in terms of nucleotide sequences. Most of the CsaFBN genes were expressed in the leaf, stem, and fruit. CsaFBN4 showed the highest mRNA expression levels in various tissues, followed by CsaFBN6, CsaFBN1, and CsaFBN9. High-light stress combined with low temperature decreased photosynthetic efficiency and highly induced transcript levels of CsaFBN1, CsaFBN6, and CsaFBN11, which decreased after 24 h treatment. Transcript levels of the other seven genes were changed only slightly. This result suggests that CsaFBN1, CsaFBN6, and CsaFBN11 may be involved in photoprotection under high-light conditions at low temperature.
Subject: Biology, Animal Sciences & Zoology Keywords: Genome-wide association studies (GWAS); post-GWAS; sheep; tail fat deposition
Online: 11 June 2019 (10:04:39 CEST)
Tail type is an important economic trait in sheep. However, the candidate genes associated with tail type remain uncertain. The objective of this study was to identify the genetic regions and genotypes responsible for the tail type phenotype. We performed a genome-wide association study (GWAS) using 40 large-tailed Han sheep and 40 Altay sheep as cases and 40 Tibetan sheep as controls. A total of 31 genome-wide significant SNPs associated with tail type were detected. For the significant SNP loci, we determined their physical locations and screened for candidate genes within the corresponding regions. By combining information on previously reported and annotated functional genes, we identified SPAG17, Tbx15, VRTN, NPC2, BMP2 and PDGFD as the most promising candidate genes for tail type. From these candidates, we selected BMP2 and PDGFD for genetic effect analysis in a large population of Altay and Tibetan sheep. One SNP in exon 1 of the BMP2 gene (rs119 T>C) and one SNP in exon 4 of the PDGFD gene (rs69 C>A) were detected. At rs119 in BMP2, Altay sheep carried the TT genotype whereas Tibetan sheep carried the CC genotype. At rs69 in PDGFD, Altay sheep carried the CC genotype whereas Tibetan sheep carried the AA genotype. These results indicate that the significant SNP associations detected in the GWAS were indirectly caused by the genetic effects of BMP2 and PDGFD on sheep tail fat deposition.
ARTICLE | doi:10.20944/preprints202202.0164.v1
Subject: Life Sciences, Genetics Keywords: rare variants; genome-wide association study; validation test; SNP chip; genomic selection
Online: 11 February 2022 (15:59:26 CET)
The experiments described in this research article were designed to test the effect of rare variants on genomic prediction in dairy cattle. Common polymorphisms are able to explain only a small proportion of the underlying genetic variation of complex phenotypes. Variants representing functional mutations with large effects on complex phenotypes are expected to be rare due to natural (humans) or artificial (livestock) selection pressure. Therefore, it is important to check whether the use of rare variants could increase the accuracy of animal ranking by providing a tool for more precise differentiation among bulls with high additive genetic merit. The goal of our study was to verify whether including rare variants in a genomic selection model allows for a more accurate description of the additive genetic background of traits under selection in dairy cattle. We used a linear mixed model to compare SNP estimates for Holstein-Friesian cattle between two data sets: a set containing only single nucleotide polymorphisms with minor allele frequency ≥ 0.01, which is routinely used in the Polish genomic evaluation system (46,216 SNPs), and a set containing SNPs selected based only on the call rate (54,378 SNPs). Based on the SNP estimates we also calculated DGV and GEBV and compared them between both data sets. In all the analyses we used production, fertility, conformation and udder health traits. We also compared, between the two data sets, the time required for the two most computationally demanding components of genomic selection: preparing genotype data and estimating SNP effects. The results of our study indicated that the analysis including rare variants resulted in changes in the individual ranking of the top 100 male and female candidates, but had no effect on the quality of EBV prediction as expressed by the Interbull validation test.
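The two SNP selection criteria contrasted above, minor allele frequency (MAF) and call rate, can be sketched as follows. Genotypes are assumed to be coded as 0, 1 or 2 copies of one allele, with `None` for a missing call. The coding, the function names and the toy data are assumptions for illustration and do not describe the Polish evaluation pipeline.

```python
def call_rate(genotypes):
    """Fraction of animals with a non-missing genotype call."""
    called = [g for g in genotypes if g is not None]
    return len(called) / len(genotypes)

def minor_allele_frequency(genotypes):
    """MAF from 0/1/2 allele counts, ignoring missing calls."""
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no called genotypes")
    freq = sum(called) / (2 * len(called))  # frequency of the counted allele
    return min(freq, 1 - freq)

def passes_maf_filter(genotypes, threshold=0.01):
    """True if the SNP would be kept under a MAF >= threshold rule."""
    return minor_allele_frequency(genotypes) >= threshold

# Toy SNP: one heterozygote among 200 animals, so MAF = 1/400 = 0.0025.
rare_snp = [1] + [0] * 199
common_snp = [0, 0, 1, 2]
```

A SNP such as `rare_snp` is dropped by the MAF rule but retained when only the call rate is checked. That is the category of marker the second data set adds.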
REVIEW | doi:10.20944/preprints202107.0045.v1
Subject: Medicine & Pharmacology, Allergology Keywords: genome wide association studies (GWAS); single nucleotide polymorphism (SNP); oestrogen; ESR1; HOXA10
Online: 2 July 2021 (09:59:27 CEST)
Endometriosis is a chronic neuro-inflammatory disorder the defining feature of which is the growth of tissue (lesions) that resembles the endometrium in sites outside the uterus. Estimates of prevalence typically quote rates of ~10% of women of reproductive age, equating to ~190 million women world-wide. Three subtypes of endometriosis are usually considered when discussing the aetiology of the disorder - superficial peritoneal, ovarian (endometrioma cysts), and deep (infiltrating). Genetic, hormonal and immunological factors have all been proposed as contributing to risk factors associated with the development of lesions. Twin studies report the heritable component of endometriosis as ~50%. Genome wide association studies (GWAS) have been conducted allowing unbiased scanning of the genome for single nucleotide polymorphisms (SNPs) in many thousands of individuals. These studies have identified SNPs that appear over-represented in patients with endometriosis, particularly those with more extensive disease (stage III/IV). Amongst the larger scale GWAS there has been replication of SNPs near genes involved in oestrogen and other signalling pathways including ESR1 (oestrogen receptor alpha), GREB1, HOXA10, WNT4 and MAPK kinase signalling. The results from patients with endometriosis have also provided an opportunity to make comparisons with GWAS conducted on other patient cohorts including those with reproductive traits (age at menarche) and disorders (fibroids, endometrial and ovarian cancer) and conditions that are reported by women with endometriosis (migraine, depression). These comparative studies have highlighted some shared genetically-controlled biological mechanisms, including hormone-regulated pathways which might explain the co-occurrence of endometriosis with these disorders. In summary, unbiased genetic analysis has provided new insights into the genetic factors that may contribute to increased risk of developing endometriosis. 
New studies are needed to broaden the range of patients contributing to these datasets and to improve integration with non-genomic and tissue expression data before their full potential for diagnosis and improvements in patient care can be fully realised.
ARTICLE | doi:10.20944/preprints202005.0463.v3
Subject: Biology, Anatomy & Morphology Keywords: SARS-CoV-2; genome-wide mutations; transition; transversion; nonsynonymous and synonymous mutations; microevolution
Online: 5 October 2020 (10:56:36 CEST)
To understand SARS-CoV-2 microevolution, this study explored the genome-wide frequency, gene-wise distribution, and molecular nature of all point-mutations detected across its 71,703 RNA-genomes deposited in the GISAID repository up to 21 August 2020. Globally, nsp1/nsp2/nsp3/nsp11 and orf7a/orf3a/S were the most mutation-ridden non-structural and structural genes respectively. Phylogeny based on 4,618 spatiotemporally-representative genomes revealed that entities belonging to the early lineages are mostly spread over Asian countries (including India, the biggest hotspot of the pandemic) whereas the recently-derived lineages are more globally distributed. Of the total 16,602 polymorphism-bearing sites in the pan-genome, 11,037 and 4,965 involved transitions and transversions, which in turn were predominated by cytidine-to-uridine and guanosine-to-uridine conversions, respectively. Positive selection of nonsynonymous mutations (dN/dS >1) in most of the structural, but not non-structural, genes indicated that SARS-CoV-2 has already harmonized its replication/transcription machineries with the host’s metabolic system, while it is still redefining virulence/transmissibility strategies at the molecular level.
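The transition/transversion split used in this abstract follows from base chemistry. A substitution within purines (A, G) or within pyrimidines (C, U) is a transition, and a substitution between the two classes is a transversion. A minimal sketch with illustrative function names:

```python
from collections import Counter

PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CU")
BASES = PURINES | PYRIMIDINES

def classify_substitution(ref, alt):
    """Return 'transition' or 'transversion' for a single-base change.
    DNA-style T is treated as RNA U."""
    ref = ref.upper().replace("T", "U")
    alt = alt.upper().replace("T", "U")
    if ref not in BASES or alt not in BASES or ref == alt:
        raise ValueError(f"not a valid point substitution: {ref}>{alt}")
    same_class = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_class else "transversion"

def count_substitutions(pairs):
    """Tally transitions and transversions over (ref, alt) pairs."""
    return Counter(classify_substitution(ref, alt) for ref, alt in pairs)

# The two conversions highlighted in the abstract, plus one more example.
tally = count_substitutions([("C", "U"), ("G", "U"), ("A", "G")])
```

Under this rule cytidine-to-uridine is a transition and guanosine-to-uridine is a transversion, matching how the abstract groups them.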
ARTICLE | doi:10.20944/preprints202205.0258.v1
Subject: Life Sciences, Genetics Keywords: Mendelian randomisation; Alcohol Consumption; UK Biobank; Phenome wide association studies; Biomarker
Online: 19 May 2022 (09:09:35 CEST)
Background: Alcohol consumption is associated with the development of cardiovascular diseases, cancer, and liver disease. The biological mechanisms are still largely unclear. Here, we aimed to use an agnostic approach to identify phenotypes mediating the effect of alcohol on various diseases. Methods: We performed an agnostic association analysis between alcohol consumption (red and white wine, beer/cider, fortified wine, and spirits) and over 7,800 phenotypes from the UK Biobank comprising 223,728 participants. We performed Mendelian randomisation analysis to infer causality. We additionally performed a phenome-wide association analysis and a mediation analysis with alcohol consumption as exposure, traits in causal relationship with alcohol consumption as mediators, and various diseases as outcome. Results: Of 45 traits associated with alcohol consumption, 20 were in causal relationship with alcohol consumption. Gamma glutamyltransferase (GGT; β=9.44; CI, 5.94-12.93; Pfdr=9.04×10^-7), mean sphered cell volume (β=0.189; CI, 0.11-0.27; Pfdr=1.00×10^-4), mean corpuscular volume (β=0.271; CI, 0.19-0.35; Pfdr=7.09×10^-10) and mean corpuscular haemoglobin (β=0.278; CI, 0.19-0.36; Pfdr=1.60×10^-6) showed the strongest causal relationships. We also identified GGT and physical activity as mediators causing liver cirrhosis and alcohol dependence. Conclusion: Our study provides evidence of causality between alcohol consumption and 20 traits and a mediation effect for physical activity on health consequences of alcohol consumption.
ARTICLE | doi:10.20944/preprints202205.0277.v1
Subject: Life Sciences, Genetics Keywords: immunoglobulin A nephropathy; expression quantitative trait loci; summary data-based Mendelian randomization; genome-wide association study; functional mapping
Online: 20 May 2022 (12:13:06 CEST)
Background: Immunoglobulin A nephropathy (IgAN) is a complex autoimmune disease, and the exact pathogenesis remains to be elucidated. Methods: We conducted summary data-based Mendelian randomization (SMR) analysis and performed functional mapping and annotation using FUMA to explore genetic loci that are potentially involved in the pathogenesis of IgAN. Both analyses used summarized data of a recent genome-wide association study (GWAS) on IgAN, which included 477,784 Europeans (15,587 cases and 462,197 controls) and 175,359 East Asians (71 cases and 175,288 controls). We performed separate SMR analyses using CAGE and GTEx eQTL data. Results: Using the CAGE eQTL data, our SMR analysis identified 32 probes tagging 25 unique genes that were pleiotropically/potentially causally associated with IgAN, with the top three probes being ILMN_2150787 (tagging HLA-C, PSMR=2.10×10^-18), ILMN_1682717 (tagging IER3, PSMR=1.07×10^-16) and ILMN_1661439 (tagging FLOT1, PSMR=1.16×10^-14). Using GTEx eQTL data, our SMR analysis identified 24 probes tagging 24 unique genes, with the top three probes being ENSG00000271581.1 (tagging XXbac-BPG248L24.12, PSMR=1.44×10^-10), ENSG00000186470.9 (tagging BTN3A2, PSMR=2.28×10^-10), and ENSG00000224389.4 (tagging C4B, PSMR=1.23×10^-9). FUMA analysis identified 3 independent, significant and lead SNPs, 2 genomic risk loci and 39 genes. Conclusion: We identified many genetic variants/loci that are potentially involved in the pathogenesis of IgAN.
ARTICLE | doi:10.20944/preprints202003.0127.v1
Subject: Biology, Entomology Keywords: plant-insect interaction; host shift; parallel evolution; detoxification; experimental evolution; population genomics; genome-wide association mapping; gene expression; Callosobruchus maculatus
Online: 8 March 2020 (01:52:10 CET)
Genes that affect adaptive traits have been identified, but our knowledge of the genetic basis of adaptation in a more general sense (across multiple traits) remains limited. We combined population-genomic analyses of evolve and resequence experiments, genome-wide association mapping of performance traits, and analyses of gene expression to fill this knowledge gap, and shed light on the genomics of adaptation to a marginal host (lentil) by the seed beetle Callosobruchus maculatus. Using population-genomic approaches, we detected modest parallelism in allele frequency change across replicate lines during adaptation to lentil. Mapping populations derived from each lentil-adapted line revealed a polygenic basis for two host-specific performance traits (weight and development time), which had low to modest heritabilities. We found less evidence of parallelism in genotype-phenotype associations across these lines than in allele frequency changes during the experiments. Differential gene expression caused by differences in recent evolutionary history exceeded that caused by immediate rearing host. Together, the three genomic data sets suggest that genes affecting traits other than weight and development time are likely to be the main causes of parallel evolution, and that detoxification genes (especially cytochrome P450s and beta-glucosidase) could be especially important for colonization of lentil by C. maculatus.
REVIEW | doi:10.20944/preprints202111.0253.v1
Subject: Medicine & Pharmacology, Cardiology Keywords: Cell therapy; chronic limb-threatening ischemia; peripheral artery disease; diabetes; atherosclerosis obliterans; thromboangiitis obliterans; personalized medicine; artificial intelligence; machine learning; genome-wide association studies; transcriptome-wide association studies; clonal hematopoiesis of indeterminate potential
Online: 15 November 2021 (11:18:43 CET)
Stem/progenitor cell transplantation is a potential novel therapeutic strategy to induce angiogenesis in ischemic tissue, which can prevent major amputation in patients with advanced peripheral artery disease (PAD). Thus, clinicians can use cell therapies worldwide to treat PAD. However, some cell therapy studies did not report beneficial outcomes. Clinical researchers suggested that classical risk factors and comorbidities may adversely affect the efficacy of cell therapy. Some studies have indicated that the response to stem cell therapy varies among patients even in those harboring limited risk factors. This suggested the role of undetermined risk factors, including genetic alterations, somatic mutations, and clonal hematopoiesis. Personalized stem cell-based therapy can be developed by analyzing individual risk factors. These approaches must consider several clinical biomarkers and perform studies (such as genome-wide association studies (GWAS)) on disease-related genetic traits and integrate the findings with those of transcriptome-wide association studies (TWAS) and whole-genome sequencing in PAD. Additional unbiased analyses with state-of-the-art computational methods, such as machine learning-based patient stratification, are suited for predictions in clinical investigations. The integration of these complex approaches into a unified analysis procedure for the identification of responders and non-responders before stem cell therapy, which can decrease treatment expenditure, is a major challenge to increase the efficacy of therapies.
TECHNICAL NOTE | doi:10.20944/preprints201901.0126.v1
Subject: Life Sciences, Genetics Keywords: flax; association mapping; genome-wide association study (GWAS); simple sequence repeat (SSR); single nucleotide polymorphism (SNP); quantitative trait loci (QTL); chromosome-scale pseudomolecules
Online: 14 January 2019 (07:19:08 CET)
Quantitative trait loci (QTL) are genomic regions associated with phenotype variation of quantitative traits in a population. To date, a total of 267 QTL for 29 quantitative traits have been reported in 13 studies on flax. Of these, 200 QTL from 12 studies were identified based on genetic maps, scaffold sequences, or pre-released chromosome-scale pseudomolecules. Molecular markers for QTL identification differed across studies but were mainly based on simple sequence repeat (SSR) or single nucleotide polymorphism (SNP) markers. This article provides methods with software tools and database files to uniquely map SSR and SNP markers from different references onto the recently released chromosome-scale pseudomolecules. Using these methods, 195 QTL were successfully sorted onto the 15 flax chromosomes and grouped into 133 co-located QTL clusters. Mapping of QTL from different studies to the same reference enables comparisons and facilitates genome-wide QTL analysis, candidate gene scanning, and breeding applications.
REVIEW | doi:10.20944/preprints202006.0324.v1
Subject: Life Sciences, Genetics Keywords: De-novo Genome Assembly; Short Read Genome Assembly; Long Read Genome Assembly; Hybrid Genome Assembly
Online: 28 June 2020 (08:56:09 CEST)
Despite advances in algorithms and computational platforms, de-novo genome assembly remains a challenging process. Due to the constant innovation in sequencing technologies (Sanger, SOLiD, Illumina, 454, PacBio and Oxford Nanopore), genome assembly has evolved to respond to the changes in input data type. This paper includes a broad and comparative review of the most recent short-read, long-read and hybrid assembly techniques. In this review, we provide (1) an algorithmic description of the important processes in the workflow that introduces fundamental concepts and improvements; (2) a review of existing software that explains possible options for genome assembly; and (3) a comparison of the accuracy and the performance of existing methods executed on the same computer using the same processing capabilities and using the same set of real and synthetic datasets. Such evaluation allows a fair and precise comparison of accuracy in all aspects. As a result, this paper identifies both the strengths and weaknesses of each method. This comparative review is unique in providing a detailed comparison of a broad spectrum of cutting-edge algorithms and methods.
ARTICLE | doi:10.20944/preprints202107.0400.v1
Subject: Life Sciences, Biochemistry Keywords: Pangenome; horizontal gene transfer (HGT); core genome; accessory genome
Online: 19 July 2021 (10:19:29 CEST)
Pantoea stewartii subsp. indologenes (Psi) is a causative agent of leafspot of foxtail millet and pearl millet; however, novel strains were recently identified that are pathogenic on onion. Our recent host range evaluation study identified two pathovars: P. stewartii subsp. indologenes pv. cepacicola pv. nov. and P. stewartii subsp. indologenes pv. setariae pv. nov., which are pathogenic on onion and millets or on millets only, respectively. In the current study we developed a pan-genome using whole genome sequencing of newly identified/classified Psi strains from both pathovars [pv. cepacicola (n = 4) and pv. setariae (n = 13)]. The full spectrum of the pan-genome contained 7,030 genes. Among these, 3,546 (present in genomes of all 17 strains) were the core genes, a subset of the 3,682 soft-core genes (present in ≥16 strains). The accessory genome included 1,308 shell genes and 2,040 cloud genes (present in ≤2 strains). The pan-genome showed a clear linear progression with >6,000 genes, suggesting the pan-genome of Psi is open. Comparative phylogenetic analysis showed differences in phylogenetic clustering of Pantoea spp. using the PAVs/wgMLST approach in comparison to core genome SNP-based phylogeny. Further, we conducted a horizontal gene transfer (HGT) study including four other Pantoea species, namely P. stewartii subsp. stewartii LMG 2715T, P. ananatis LMG 2665T, P. agglomerans LMG L15, and P. allii LMG 24248T. A total of 317 HGT events among four Pantoea species were identified, with most gene transfers observed between Psi pv. cepacicola and Psi pv. setariae. Pan-GWAS analysis predicted a total of 154 genes, including seven clusters of genes, associated with the pathogenicity phenotype on onion. One of the clusters contains 11 genes with known functions that are found to be chromosomally located.
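The category thresholds quoted in this abstract (core: all 17 strains; soft-core: ≥16 strains; cloud: ≤2 strains; shell: everything in between) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names `classify_gene` and `summarise` are hypothetical, and the label "soft-core" is used here only for genes that miss exactly one strain, whereas the abstract counts the core genes as a subset of the soft-core set.

```python
from collections import Counter


def classify_gene(n_present: int, n_strains: int = 17) -> str:
    """Assign a pan-genome category from the number of strains carrying a gene.

    Thresholds follow the abstract; the function itself is illustrative.
    """
    if not 1 <= n_present <= n_strains:
        raise ValueError("n_present must be between 1 and n_strains")
    if n_present == n_strains:
        return "core"        # present in every strain
    if n_present >= n_strains - 1:
        return "soft-core"   # missing from exactly one strain
    if n_present <= 2:
        return "cloud"       # present in at most two strains
    return "shell"           # everything in between


def summarise(presence_counts, n_strains: int = 17) -> Counter:
    """Count how many genes fall into each category."""
    return Counter(classify_gene(n, n_strains) for n in presence_counts)
```

For example, `summarise([17, 17, 16, 9, 2, 1])` counts two core genes, one soft-core gene, one shell gene, and two cloud genes.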
ARTICLE | doi:10.20944/preprints201910.0271.v1
Subject: Life Sciences, Microbiology Keywords: genome assembly; monoxenous trypanosomatids; insect trypanosomatids; trypanosomatidae; whole genome
Online: 24 October 2019 (05:20:52 CEST)
We present here the first draft genome sequence of the trypanosomatid Herpetomonas muscarum ingenoplastis. This parasite was isolated repeatedly from the black blowfly, Phormia regina. This is the first draft genome of a flagellate from the phylogenetically distinct clade of Trypanosomatidae.
ARTICLE | doi:10.20944/preprints201808.0423.v1
Subject: Biology, Animal Sciences & Zoology Keywords: mitochondrial DNA; mitochondrial genome; genome assembly; genome annotation; next generation sequencing; animal genomics; partial genomics; bioinformatics
Online: 24 August 2018 (03:24:37 CEST)
Next-generation sequencing is now a mature technology, allowing partial animal genomes to be produced for many clades. Although many software tools exist for genome assembly and annotation, a simple pipeline that takes raw sequencing reads in fastq format as input and returns a completely assembled and annotated mitochondrial genome is still missing. mitoMaker 1.0 is a pipeline developed in Python that (i) performs recursive de novo assembly of mitochondrial genomes using a set of increasing k-mers; (ii) searches for the best matching result to a target mitogenome; and (iii) applies iterative reference-based strategies to optimize the assembly. After (iv) checking for circularization and (v) positioning tRNA-Phe at the beginning, (vi) the geneChecker.py module performs a complete annotation of the mitochondrial genome and provides a GenBank-formatted file as output.
REVIEW | doi:10.20944/preprints202009.0348.v2
Subject: Life Sciences, Biochemistry Keywords: DNA methylation; epialleles; epiRILs; epigenetics; Epigenome-Wide Association Studies.
Online: 26 September 2020 (08:08:27 CEST)
Plant breeding conventionally depends on the genetic variability available in a species to improve a particular trait in the crop. However, epigenetic diversity may provide an additional tier of variation. The recent advent of epigenome technologies has elucidated the role of epigenetic variation in shaping phenotype. Further, the development of epigenetic recombinant inbred lines (epi-RILs) in model species such as Arabidopsis has enabled accurate genetic analysis of epigenetic variation. Subsequently, mapping of epigenetic quantitative trait loci (epiQTL) allowed association between epialleles and phenotypic traits. Thus, quantitative epigenetics provides ample opportunities to dissect the role of epigenetic variation in trait regulation, which can eventually be utilized in crop improvement programs. Moreover, locus-specific manipulation of DNA methylation by epigenome-editing tools such as clustered regularly interspaced short palindromic repeats/CRISPR-associated protein 9 (CRISPR/Cas9) can facilitate epigenetic-based molecular breeding of important crop plants.
ARTICLE | doi:10.20944/preprints202204.0298.v1
Subject: Life Sciences, Microbiology Keywords: genome; accessory; core genome; Fusarium circinatum; structural variants; inversions; indels; pangenome
Online: 29 April 2022 (10:47:31 CEST)
Fusarium circinatum is an important global pathogen of pine trees. Genome plasticity has been observed in different isolates of the fungus, but no genome comparisons are available. To address this gap, we sequenced and assembled to chromosome level five isolates of F. circinatum. These genomes were analysed together with previously published genomes of F. circinatum isolates FSP34 and KS17. Multi-sample variant calling identified a total of 461683 micro variants (SNPs and small indels) and a total of 1828 macro structural variants of which 1717 were copy number variants and 111 were inversions. Variant density was higher on sub-telomeric regions of chromosomes. Variant annotation revealed that genes involved in transcription, transport, metabolism and transmembrane proteins were overrepresented in gene sets affected by high impact variants. A core genome representing genomic elements conserved in all the isolates and a non-redundant pangenome representing all genomic elements is presented. Whole genome alignments showed that an average of 93% of the genomic elements are present in all isolates. The results of this study reveal that some genomic elements are not conserved within the isolates and some variants are high impact. The described genome-scale variations will help inform novel disease management strategies against the pathogen.
ARTICLE | doi:10.20944/preprints201811.0518.v1
Subject: Engineering, Energy & Fuel Technology Keywords: solar; LiDAR; rooftop photovoltaics; building characteristics; wide-area solar yield
Online: 21 November 2018 (06:59:32 CET)
A new method for wide-area assessment of the suitability of urban roofs for solar photovoltaics is introduced and validated. Knowledge of roof geometry and physical features is essential for evaluating the impact of multiple rooftop solar photovoltaic (PV) system installations on local electricity networks. This paper begins by reviewing and testing a range of existing techniques for identifying roof characteristics. It was found that no current method is capable of delivering accurate results with publicly available input data. Hence a different approach is developed, based on slope and aspect using LIDAR data, building footprint data, GIS tools and aerial photographs. It assesses each roof’s suitability for PV installation; that is, whether its properties allow the installation of at least a minimum-size photovoltaic system. In this way the minimum potential solar yield for a region or city may be obtained. The accuracy of the new method is then established by ground-truthing against a database of 886 household systems. This is the largest validation of a rooftop assessment method to date. The method is flexible, with few prior assumptions. It is based on separate consideration of buildings and can therefore generate data for various PV scenarios and future analyses.
ARTICLE | doi:10.20944/preprints201703.0042.v1
Subject: Physical Sciences, Optics Keywords: infrared imaging; wide field of view; athermalization; two-piece lens
Online: 8 March 2017 (04:40:47 CET)
For a wide field of view (FoV) wavefront coding athermalized infrared imaging system with a single decoding kernel, the off-axis aberration tends to cause artefacts. Correcting off-axis aberration with many lens elements reduces the transmission efficiency and increases the weight and cost. To meet the requirements for wide FoV, wide operating temperature range and low weight of infrared imaging systems, this paper reports a wide-FoV wavefront coding athermalized infrared imaging system with a two-piece lens. Its principle, design, manufacture, measurement and performance validation are discussed in turn. This paper constructs an optimization problem which maximizes the weighted mean of PSF consistency over both the FoV and the operating temperature range. The two-piece lens contains four surfaces, where three aspheric surfaces are introduced to reduce optical off-axis aberrations and a cubic surface is introduced to achieve athermalization. The optical phase mask containing an aspheric surface and a cubic surface is manufactured by nano-metric machining of ion-implanted material (NiIM). Experimental results validate that our wide-FoV wavefront coding athermalized infrared imaging system has a full FoV of 26.10° and an operating temperature range of -20°C to +70°C.
ARTICLE | doi:10.20944/preprints202005.0417.v4
Online: 20 August 2020 (04:20:16 CEST)
Nyssa yunnanensis is a deciduous tree species in the family Nyssaceae within the order Cornales. As only eight individual trees and two populations have been recorded in China’s Yunnan province, this species has been listed among China’s national Class I protection species since 1999 and also among 120 PSESP (Plant Species with Extremely Small Populations) in the Implementation Plan of Rescuing and Conserving China’s Plant Species with Extremely Small Populations (PSESP) (2011-2-15). Here, we present the draft genome assembly of N. yunnanensis. Using 10X Genomics linked-reads sequencing data, we carried out the de novo assembly and annotation analysis. The N. yunnanensis genome assembly is 1475 Mb in length, containing 288,519 scaffolds with a scaffold N50 length of 985.59 kb. Within the assembled genome, 799.51 Mb was identified as repetitive elements, accounting for 54.24% of the sequenced genome, and a total of 39,803 protein-coding genes were predicted. With the genomic characteristics of N. yunnanensis available, our study might facilitate future conservation biology studies to help protect this extremely threatened tree species.
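The scaffold N50 reported in this abstract is a standard assembly statistic: the length L such that scaffolds of length at least L together cover at least half of the total assembly. A minimal Python sketch of that definition follows; the function name `n50` is an illustrative choice and is not taken from the authors' tooling.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    Sequences are visited from longest to shortest; the N50 is the length
    of the sequence at which the running total first reaches half of the
    summed length of all sequences.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer-safe test for running >= total / 2
            return length
```

With lengths 2, 3, 4, 5 and 6 (total 20), the two longest sequences sum to 11, which passes the halfway mark of 10, so the N50 is 5.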
COMMUNICATION | doi:10.20944/preprints201808.0480.v1
Online: 29 August 2018 (04:50:36 CEST)
The recent report that DNA extracted from ancient bone must have come from the offspring of a female Neanderthal and a male Denisovan depends on the inference that the subject has a high level of heterozygosity for Neanderthal and Denisovan alleles across the genome. Here I point out that the relative frequencies of derived transversion polymorphisms vary markedly between the new specimen, Denisova 11, and two high-coverage Neanderthal genomes. In Denisova 11 the AC and CG polymorphisms are much commoner than the others and are almost twice as common as the AT polymorphism. In the high-coverage Neanderthal genomes the four types of transversion are about equally common, with the AT being slightly commoner than the others. These results suggest that allele-calling errors are frequent and that this may provide an alternative explanation for the observed heterozygosity.
ARTICLE | doi:10.20944/preprints201810.0609.v1
Subject: Engineering, Electrical & Electronic Engineering Keywords: positioning; ultra-wide band; filtration; Kalman filter; smart city; industry 4.0
Online: 25 October 2018 (14:04:47 CEST)
In this article, the authors present a comprehensive analysis of movement data from a positioning system based on ultra-wide band (UWB) technology. For the purpose of this article, a test was carried out in which a car equipped with cruise control travelled a given path at speeds from 10 km/h to 60 km/h. The obtained motion data (position information) were passed through a series of filters, from fundamental filters with a variable window (median, moving average, Savitzky-Golay filter) to more complex ones such as the Wiener or Kalman filter. As a result, the authors propose a form of data analysis and filtration depending on the speed of the moving object. In addition, the maximum accuracy that can be obtained for a given traffic model was determined. The research shows that a system based on UWB technology can be used to position objects in urban applications (smart city), in industry 4.0 applications, and to position autonomous vehicles in urban settings as well as on highways to maintain the cohesion of vehicle convoys.
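Two of the window-based filters named in this abstract, the median and the moving average, can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the authors' processing chain: positions are treated as a one-dimensional list of samples, the window is odd, and the window is simply shortened at the edges of the series. The helper names `sliding_filter`, `moving_average` and `median_filter` are hypothetical.

```python
from statistics import median


def sliding_filter(samples, window, reducer):
    """Apply `reducer` to a centred window around every sample.

    The window is truncated at both ends of the series, so the output has
    the same length as the input.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    filtered = []
    for i in range(len(samples)):
        lo = max(0, i - half)
        hi = min(len(samples), i + half + 1)
        filtered.append(reducer(samples[lo:hi]))
    return filtered


def moving_average(samples, window):
    """Smooth the series with the arithmetic mean of each window."""
    return sliding_filter(samples, window, lambda w: sum(w) / len(w))


def median_filter(samples, window):
    """Suppress isolated outliers with the median of each window."""
    return sliding_filter(samples, window, median)
```

A single outlier such as the 9 in `[1, 1, 9, 1, 1]` is removed entirely by `median_filter(..., 3)`, whereas `moving_average` only spreads it over neighbouring samples; this difference is one reason both filter types are commonly compared on noisy position data.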
ARTICLE | doi:10.20944/preprints201809.0580.v1
Subject: Engineering, Other Keywords: disaster telecommunication; rescue; UWB (ultra-wide band); enclosed space; wireless telecommunication
Online: 29 September 2018 (05:29:26 CEST)
When an earthquake or a large fire occurs, it is difficult to secure communication networks for rescue inside a building because commercial communication networks are destroyed. Although analog radio systems such as VHF (very high frequency) and UHF (ultra-high frequency) are generally used for rescue operations, communication failures occur in closed spaces and hinder smooth rescue operations. When the communication infrastructure in a building has been destroyed in a disaster, an emergency wireless telecommunication environment should be constructed to secure a safer disaster response environment. In this study, along with a comparison of the performance of diverse communication frequencies, UWB (Ultra-Wide Band) wireless telecommunication networks were evaluated under five indoor building environment conditions, including open spaces. UWB communication modules were fabricated to satisfy the IEEE (The Institute of Electrical and Electronics Engineers) 802.15.4a standard and used to measure the distances over which communication is possible in each indoor environment for each of six channels with different UWB communication frequencies. The results indicated that the communication distances across the six channels were on average 15.5 m (maximum 20 m) in open spaces; on average 17.33 m (maximum 20 m) in corridors; on average 15.3 m (maximum 20 m) in indoor office environments with office fixtures; on average 4.33 m (maximum 6 m) in vertical spaces of stairs; and on average 6.5 m (maximum 17 m) in closed horizontal spaces with a fire door. The communication performance and distance performance were best at a centre frequency of 6489.6 MHz and a band of 5980.3–6998.9 MHz, which is UWB channel 7.
In conclusion, it is judged that if UWB communication modules are installed in the disaster area at intervals of 20 m and multiple channels are used, communication environments can be constructed even in closed spaces.
DATA DESCRIPTOR | doi:10.20944/preprints202208.0349.v1
Online: 18 August 2022 (11:12:25 CEST)
The Peruvian creole cattle (PCC) is a neglected breed and an essential livestock resource in the Andean region of Peru. To develop a modern breeding program and conservation strategies for the PCC, a better understanding of the genetics of this breed is needed. We sequenced the whole genome of the PCC using a paired-end 150 strategy on the Illumina HiSeq 2500 platform, obtaining 320 GB of sequencing data. The obtained genome size of the PCC was 2.77 Gb, with a contig N50 of 108 Mb and 92.59% complete BUSCOs. We also identified 40.22% of the genome assembly as repetitive DNA, of which retroelements occupy 32.39% of the total genome. A total of 19,803 protein-coding genes were annotated in the PCC genome. We downloaded proteomes and genomes of the Bovinae subfamily and conducted a comparative analysis with our draft genome. Phylogenomic analysis showed that PCC is related to Bos indicus. We also identified 7,746 gene families shared among the Bovinae subfamily. This first PCC genome is expected to contribute to a better understanding of the breed's genetics, its adaptation to the tough conditions of the Andean ecosystem, and its evolution.
ARTICLE | doi:10.20944/preprints201807.0156.v1
Online: 9 July 2018 (16:08:26 CEST)
Escherichia coli phage Eco_BIFF was isolated from several laboratory stocks of E. coli K-12 MG1655 derivatives. The source of the contamination is unknown. Eco_BIFF is a lytic phage that shows effective growth inhibition of E. coli K-12. Here, we announce the complete genome sequence of Eco_BIFF, and major findings from its genome annotation.
ARTICLE | doi:10.20944/preprints202008.0567.v1
Subject: Materials Science, General Materials Science Keywords: Wide band gap semiconductor; Elastic modulus; Optic-electronic properties; Ab–initio calculations
Online: 26 August 2020 (08:57:57 CEST)
The electronic structure of the Li2CaGeO4 compound and some of its derived properties have been investigated. The calculations have been performed using the full-potential linearized augmented plane wave plus local orbitals method and ultra-soft pseudo-potentials. The optimized lattice parameters are found to be in good accord with experiment. Features such as the bulk modulus and its pressure derivative, the electronic band structure and the density of states are reported. The elastic anisotropy of the crystal is discussed and visualized. Moreover, the optical properties reveal that the Li2CaGeO4 compound is a suitable candidate for optoelectronic devices in the visible and ultraviolet (UV) regions.
ARTICLE | doi:10.20944/preprints201805.0157.v2
Subject: Earth Sciences, Space Science Keywords: geometry-free; geometry-based; wide-lane ambiguity; orbit and clock residual error
Online: 28 May 2018 (06:06:06 CEST)
Orbit and clock products are used in real-time GNSS precise point positioning (PPP) without knowledge of their quality. This study develops a new approach to detect orbit and clock errors by comparing geometry-free and geometry-based wide-lane ambiguities in the PPP model. The reparameterization and estimation procedures of the geometry-free and geometry-based ambiguities are described in detail. The effects of orbit and clock errors on ambiguities are given in analytical expressions. The numerical similarities and differences of geometry-free and geometry-based wide-lane ambiguities are analyzed using different orbit and clock products. Furthermore, two types of typical errors in orbit and clock are simulated, and their effects on wide-lane ambiguities are numerically produced and analyzed. The contribution shows that the geometry-free and geometry-based wide-lane ambiguities are equivalent in terms of their formal errors. Although their estimates are very close when the orbit and clock used for the geometry-based ambiguities are precise enough, they are not the same, particularly when the orbit and clock used, as a combination, contain significant errors. It is found that the discrepancies between geometry-free and geometry-based wide-lane ambiguities coincide with the actual time-variant errors in the orbit and clock used, in the line-of-sight direction. This provides a quality index for real-time users to detect errors in real-time orbit and clock products, which potentially improves the accuracy of positioning.
ARTICLE | doi:10.20944/preprints201609.0038.v2
Subject: Earth Sciences, Environmental Sciences Keywords: SAR offset and speckle tracking; glacier velocity; Radarsat-2 Wide Fine; Svalbard
Online: 10 September 2016 (05:03:14 CEST)
Glacier dynamics play an important role in the mass balance of many glaciers, ice caps and ice sheets. In this study we exploit Radarsat-2 (RS-2) Wide Fine (WF) data to determine the surface speed of Svalbard glaciers in the winters of 2012/2013 and 2013/2014 using Synthetic Aperture Radar (SAR) offset and speckle tracking. The RS-2 WF mode combines the advantages of the large spatial coverage of the Wide mode (150 × 150 km) and the high pixel resolution (9 m) of the Fine mode and thus has major potential for glacier velocity monitoring from space through offset and speckle tracking. The faster flowing glaciers (1.95–2.55 m d⁻¹) that are studied in detail are Nathorstbreen, Kronebreen, Kongsbreen and Monacobreen. Using our Radarsat-2 WF dataset, we compare the performance of two SAR tracking algorithms, namely the GAMMA Remote Sensing Software and a custom-written MATLAB script (GRAY method) that has primarily been used in the Canadian Arctic. Both algorithms provide comparable results, especially for the faster flowing glaciers and the termini of slower tidewater glaciers. A comparison of the WF data to RS-2 Ultrafine and Wide mode data reveals the superiority of RS-2 WF data over the Wide mode data.
REVIEW | doi:10.20944/preprints202111.0170.v3
Online: 5 May 2022 (10:38:09 CEST)
Non-vertebrate species represent about 95% of known metazoan (animal) diversity. They remain to this day relatively unexplored genetically, but understanding their genome structure and function is pivotal for expanding our current knowledge of evolution, ecology and biodiversity. Following the continuous improvements and decreasing costs of sequencing technologies, many genome assembly tools have been released, leading to a significant number of genome projects being completed in recent years. In this review, we examine the current state of genome projects of non-vertebrate animal species. We present an overview of available sequencing technologies, assembly approaches, pre- and post-processing steps, and genome assembly evaluation methods, and their application to non-vertebrate animal genomes.
REVIEW | doi:10.20944/preprints202111.0350.v1
Online: 19 November 2021 (12:33:53 CET)
The newly established virus family Phenuiviridae in Bunyavirales harbors viruses infecting three kingdoms of host organisms (animals, plants, and fungi), which is rare among known virus families. Many phenuiviruses are arboviruses and replicate in two distinct hosts (e.g., insects and humans or rice). Multiple phenuiviruses, such as Dabie bandavirus, Rift Valley fever phlebovirus, and Rice stripe tenuivirus, are highly pathogenic to humans, animals, or plants. They impose heavy global burdens on human health, the livestock industry, and agriculture and are research hotspots. In recent years the taxonomy of Phenuiviridae has been expanded greatly, and research on phenuiviruses has made significant progress. Building on these advances, this review draws a novel panorama of the biomedical significance, distribution, morphology, genomics, taxonomy, evolution, replication, transmission, pathogenesis, and control of phenuiviruses, to help researchers in various fields recognize this highly adaptive and very important virus family.
ARTICLE | doi:10.20944/preprints202110.0027.v1
Subject: Life Sciences, Other Keywords: eukaryogenesis; genome complexification; atmospheric oxidation; macroevolution
Online: 1 October 2021 (15:26:03 CEST)
The origin of the nucleus remains a great mystery in life science, although nearly two centuries have passed since the discovery of nuclei. To date, studies of eukaryogenesis have focused largely on micro-evolutionary explanations. Here, we examined macro-patterns of C-values (the total amount of DNA within the haploid chromosome set of an organism) for over 110,000 species and the chromosome numbers for over 11,000 species and their potential links with the state of atmospheric oxidation over geological time. Eukaryogenesis was in sync with an over 2.5 order-of-magnitude increase in genome size from prokaryote to eukaryote, and also with a rapid rise of atmospheric oxidation, suggesting that eukaryogenesis would have resulted from a regime shift of genomes driven by the oxidation-driven complexification and structuralization (e.g. chromatin packing).
ARTICLE | doi:10.20944/preprints202009.0207.v1
Online: 9 September 2020 (10:48:24 CEST)
Long-read single-molecule sequencing (SMS) has revolutionized de novo genome assembly and enabled the automated reconstruction of reference-quality genomes. It has also been widely used to study structural variants, phase haplotypes and more. Here, we introduce SMARTdenovo, an SMS assembler that follows the overlap-layout-consensus (OLC) paradigm. SMARTdenovo (RRID: SCR_017622) was designed to be a fast assembler that does not require highly accurate raw reads for error correction, unlike other, contemporaneous SMS assemblers. It has performed well for evaluating congeneric assemblers and has been successful for a variety of assembly projects. It is compatible with Canu for assembling high-quality genomes, and several of the assembly strategies in this program have been incorporated into subsequent popular assemblers. The assembler has been in use since 2015, and here we provide information on the development of SMARTdenovo and how to implement its algorithms in current projects.
ARTICLE | doi:10.20944/preprints202008.0275.v1
Subject: Biology, Plant Sciences Keywords: transposable elements; genome annotation; software evaluation
Online: 12 August 2020 (08:07:14 CEST)
Background: Transposable elements (TEs) constitute the vast majority of all eukaryotic DNA and display extreme diversity, with thousands of families. Given their abundance and diversity, TE discovery and annotation are challenging. At present, tools and databases have built libraries to mask TEs in genomes based on de novo and homology-based identification strategies, but no consensus criteria about which tools should be used have been proposed. Results: In the de novo strategy, we compared the performance of TE libraries developed by four commonly used tools, RepeatModeler, LTR_FINDER, LTRharvest, and MITE_Hunter, using a simulated genome as a standard control. The results showed that the performance of RepeatModeler decreased when it was combined with either LTR_FINDER or LTRharvest. The combination of RepeatModeler and MITE_Hunter performed better than either RepeatModeler or MITE_Hunter alone. In the homology-based strategy, we evaluated different sources from a taxonomic point of view to build an accurate TE library. When we selected a library from databases to identify TEs in the Arabidopsis thaliana genome, the library from a genus genetically closer to Arabidopsis achieved better performance than those from genera at a greater genetic distance. Without Arabidopsis itself, the combination of the three genera closest to Arabidopsis performed better than the combination of all genera. Conclusion: This study proposes a series of recommendations for accurate TE annotation: 1) for the de novo strategy, RepeatModeler and MITE_Hunter are suggested for building a TE library; 2) for the homology-based strategy, it is recommended to use the library of a genus genetically close to the species rather than a combined library from all genera.
BRIEF REPORT | doi:10.20944/preprints201911.0214.v1
Subject: Biology, Animal Sciences & Zoology Keywords: shark; genome; longevity; gigantism; positive selection
Online: 18 November 2019 (07:46:50 CET)
A previous study involving whole genome sequencing of the white shark suggested unique molecular evolution accounting for gigantism and the enhanced longevity of sharks including positive selection of dozens of protein-coding genes potentially involved in genome stability. We performed a reanalysis on some of the genes and identified serious flaws in their results. In this short article, we scrutinize one of the serious problems we identified, report other concerns, and point out a potential bias in analyzing iconic shark species in general.
ARTICLE | doi:10.20944/preprints201802.0098.v1
Subject: Biology, Plant Sciences Keywords: Boechera; Brassicaceae; genome; assembly; annotation; apomixis
Online: 14 February 2018 (07:29:29 CET)
Closely related to the model plant Arabidopsis thaliana, the genus Boechera is known to contain both sexual and apomictic species or accessions. Boechera retrofracta is a diploid sexually reproducing species and is thought to be an ancestral parent species of the apomictic species Boechera divaricarpa. Here we report the de novo assembly of the B. retrofracta genome using short Illumina and Roche reads from 1 paired-end and 3 mate-pair libraries. The distribution of 23-mers from the paired-end library indicated a low level of heterozygosity and the presence of detectable duplications and triplications. The genome size was estimated at 227 Mb. The N50 of the assembled scaffolds was 2.3 Mb. A total of 27,048 protein-coding genes were predicted using a hybrid approach that combines homology-based and de novo methods. Repeats, tRNA and rRNA genes were also annotated. Finally, genes of B. retrofracta and 6 other Brassicaceae species were used for phylogenetic tree reconstruction, and a detailed analysis of the evolution of the APOLLO apomixis-associated locus was performed. The assembled genome of B. retrofracta will help in the challenging assembly of the highly heterozygous genomes of hybrid apomictic species such as B. divaricarpa.
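The 23-mer distribution mentioned in this abstract is a k-mer spectrum: a histogram of how many distinct k-mers occur once, twice, and so on in the reads. A minimal Python sketch of the idea follows. It is a simplification and not the authors' method: it counts k-mers on the forward strand only (production tools usually merge each k-mer with its reverse complement), and the names `kmer_counts` and `kmer_spectrum` are hypothetical.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Count every k-mer occurring in an iterable of read strings."""
    if k < 1:
        raise ValueError("k must be positive")
    counts = Counter()
    for read in reads:
        # A read shorter than k contributes no k-mers (empty range).
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def kmer_spectrum(counts):
    """Map each multiplicity to the number of distinct k-mers having it."""
    return Counter(counts.values())
```

For the single read `ACGTACGT` with k = 4, the k-mer `ACGT` occurs twice and `CGTA`, `GTAC` and `TACG` once each, so the spectrum records three k-mers of multiplicity 1 and one of multiplicity 2. In real data, a secondary peak at half the main coverage is the usual signature of heterozygosity, and peaks at multiples of it point to duplicated sequence.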
ARTICLE | doi:10.20944/preprints201707.0034.v1
Subject: Keywords: localization; internet of things; low power wide area networks; Wi-Fi; sigfox; fingerprinting
Online: 14 July 2017 (11:30:28 CEST)
Supply chain management requires regular updates of the location of assets, which can be enabled by low power wide area networks such as Sigfox. While it is useful to localize a device simply by its communication signals, this is very difficult to do with Sigfox because of its wide-area, ultra-narrowband nature. On the other hand, installing a satellite localization element on the device greatly increases its power consumption. We investigated using information about nearby Wi-Fi access points as a way to localize the asset over the Sigfox network, without connecting to those Wi-Fi networks. This paper reports the location error that can be achieved by this type of outdoor localization. By using a combination of two databases, we could localize the device at all 36 test locations with a median location error of 39 m. This shows that the localization accuracy of this method is promising enough to warrant further study, most specifically of the minimal power consumption.
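A median location error such as the one reported in this abstract is typically obtained by computing the great-circle distance between each estimated position and its ground truth, then taking the median over all test locations. The Python sketch below shows one way to do this with the haversine formula. It is an assumption-laden illustration, not the authors' evaluation code: the Earth is treated as a sphere of radius 6,371 km, coordinates are (latitude, longitude) pairs in decimal degrees, and the function names are hypothetical.

```python
import math
from statistics import median

EARTH_RADIUS_M = 6_371_000  # mean Earth radius; spherical approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def median_location_error(truth, estimates):
    """Median distance in metres between paired true and estimated positions."""
    if len(truth) != len(estimates) or not truth:
        raise ValueError("need two non-empty sequences of equal length")
    return median(haversine_m(*t, *e) for t, e in zip(truth, estimates))
```

The median is often preferred over the mean for this kind of evaluation because a few badly mislocated points, for instance from stale access-point database entries, do not dominate the summary.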
ARTICLE | doi:10.20944/preprints201811.0183.v2
Subject: Life Sciences, Molecular Biology Keywords: sequencing technologies; NGS; genome research; genome assembly; variant calling; RNA-Seq; transcriptome assembly; bioinformatics; molecular biology; education
Online: 13 November 2018 (10:22:06 CET)
Combined awareness about the power and limitations of bioinformatics and molecular biology enables advanced research based on high-throughput data. Despite an increasing demand for scientists with a combined background in both fields, the education in dry lab and wet lab is often separated. This work describes an example of integrated education with focus on genomics and transcriptomics. Participants learn computational and molecular biology methods in the same practical course. Peer-review is applied as a teaching method to foster cooperative learning of students with heterogeneous backgrounds. Evaluation results indicate acceptance and appreciation of this approach.
TECHNICAL NOTE | doi:10.20944/preprints202009.0678.v1
Subject: Mathematics & Computer Science, Algebra & Number Theory Keywords: multi-frame super resolution; wide activation super resolution; 3D convolutional neural network; deep learning
Online: 27 September 2020 (11:54:56 CEST)
The small satellite market continues to grow year after year. A compound annual growth rate of 17% is estimated for the period between 2020 and 2025. Low-cost satellites can send a vast number of images to be post-processed on the ground to improve their quality and extract detailed information. In this domain lies the resolution enhancement task, where a low-resolution image is automatically converted to a higher resolution. Deep learning approaches to Super-Resolution (SR) have reached the state of the art in multiple benchmarks; however, most of them were studied in a single-frame fashion. With satellite imagery, multi-frame images can be obtained under different conditions, making it possible to add more information per image and improve the final analysis. In this context, we developed and applied to the PROBA-V dataset of multi-frame satellite images a model that recently topped the European Space Agency’s Multi-frame Super Resolution (MFSR) competition. The model is based on proven methods for 2D images, adapted to work on 3D: the Wide Activation Super Resolution (WDSR) family. We show that a simple 3D CNN residual architecture with WDSR blocks, combined with a frame permutation technique for data augmentation, can achieve better scores than more complex models. Moreover, the model requires few hardware resources for both training and evaluation, so it can be run directly on a personal laptop.
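The frame permutation idea can be sketched as follows. The abstract does not describe how the permutation is implemented, so this is only an assumed reading: the order of the low-resolution frames of one scene is shuffled to produce additional training samples that share the same high-resolution target. The function names and the list-based frame representation are hypothetical.

```python
import random


def permute_frames(frames, rng):
    """Return a new list holding the same frames in a random order."""
    shuffled = list(frames)
    rng.shuffle(shuffled)
    return shuffled


def augmented_samples(frames, target, n_permutations, seed=0):
    """Build (permuted_frames, target) pairs for one multi-frame scene.

    Each pair keeps the same high-resolution target; only the order of
    the low-resolution frames changes.
    """
    rng = random.Random(seed)
    return [(permute_frames(frames, rng), target)
            for _ in range(n_permutations)]
```

Because a 3D convolution treats the frame axis as ordered, presenting the same frames in different orders may discourage the network from relying on any particular frame position.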
REVIEW | doi:10.20944/preprints202111.0084.v1
Subject: Medicine & Pharmacology, Clinical Neurology Keywords: Parkinson’s disease; gene therapy; mitochondria; genome editing
Online: 3 November 2021 (14:17:16 CET)
Background. Mitochondrial dysfunction has been identified as a pathophysiological hallmark of disease onset and progression in patients with Parkinsonian disorders. Despite the overall emergence of gene therapies in treating these patients, this highly relevant molecular concept has not yet been defined as a target for gene therapeutic approaches. Methods. This narrative review will discuss the experimental evidence suggesting mitochondrial dysfunction as a viable treatment target in patients with monogenic and idiopathic Parkinson’s disease. In addition, we will focus on general treatment strategies and crucial challenges which need to be overcome. Results. Our current understanding of mitochondrial biology in Parkinsonian disorders opens up an avenue for viable treatment strategies. Insights can be obtained from primary mitochondrial diseases. However, substantial knowledge gaps and unique challenges of mitochondria-targeted gene therapies need to be addressed to provide innovative treatments in the future. Conclusions. Mitochondria-targeted gene therapies are a potential strategy to improve an important primary disease mechanism in Parkinsonian disorders. However, further studies are needed to address the unique design challenges for mitochondria-targeted gene therapies.
REVIEW | doi:10.20944/preprints202109.0264.v1
Subject: Biology, Other Keywords: choanoflagellates; multicellularity; animal origins; genome editing; electroporation
Online: 15 September 2021 (14:39:19 CEST)
Choanoflagellates, the closest living relatives of animals, have the potential to reveal the genetic and cell biological foundations of complex multicellular development in animals. Here we describe the history of research on the choanoflagellate Salpingoeca rosetta. From its original isolation in 2000 to the establishment of CRISPR-mediated genome editing in 2020, S. rosetta provides an instructive case study in the establishment of a new model organism.
ARTICLE | doi:10.20944/preprints202105.0422.v1
Subject: Medicine & Pharmacology, Allergology Keywords: genome editing; CRISPR; Cas9; in vivo editing
Online: 18 May 2021 (11:27:46 CEST)
The development of CRISPR-associated proteins, such as Cas9, has led to increased accessibility and ease of use in genome editing. However, additional tools are needed to quantify and identify successful genome editing events in living animals. We developed a method to rapidly and quantitatively monitor gene editing activity non-invasively in living animals that also facilitates confocal microscopy and nucleotide-level analyses at the end of the study. Here we report a new CRISPR “footprinting” approach to activate luciferase and fluorescent proteins in mice as a function of gene editing. This system is based on experience with our prior Cre-detector system and is designed for Cas editors whose gRNAs are able to target LoxP, including SaCas9 and ErCas12a [1, 2]. These CRISPRs cut specifically within LoxP, an approach that is a departure from previous in vivo gene editing activity detection techniques that targeted adjacent stop sequences. In this sensor paradigm, CRISPR activity was monitored non-invasively in living Cre reporter mice (FVB.129S6(B6)-Gt(ROSA)26Sortm1(Luc)Kael/J and Gt(ROSA)26Sortm4(ACTB-tdTomato,-EGFP)Luo/J, which will be referred to as LSL and mT/mG throughout the paper) after intramuscular or intravenous hydrodynamic plasmid injections, demonstrating utility in two diverse organ systems. The same genome-editing event was examined at the cellular level in specific tissues by confocal microscopy to determine the identity and frequency of successfully genome-edited cells. Further, SaCas9 induced targeted editing at efficiencies that were comparable to Cre recombinase, demonstrating highly effective delivery and activity in a whole animal. This work establishes genome editing tools and models to track CRISPR editing in vivo non-invasively and to fingerprint the identity of targeted cells. This approach also enables similar utility for any of the thousands of previously generated LoxP animal models.
REVIEW | doi:10.20944/preprints202103.0070.v1
Subject: Biology, Anatomy & Morphology Keywords: Genome; gene families; Transposable elements; Entamoeba histolytica
Online: 2 March 2021 (10:11:58 CET)
Entamoeba histolytica, like other organisms, is characterized by diversity and heterogeneity in its genetic content, which is one of the most important reasons for its survival and for the increase in susceptibility to infection. Non-condensation of chromosomes during cell division and the ambiguity of the chromosomal ploidy make predicting the exact chromosome number difficult. Genes are distributed across 14 chromosomes as well as many extra-chromosomal elements. Most genes are composed of one exon only, with introns present in 25% of genes. This genome is characterized by the presence of polymorphic internal repeat regions and several gene families, including large families encoding transmembrane kinases, cysteine proteases (CP), the SREHP protein, and others.
Subject: Life Sciences, Biochemistry Keywords: human metapneumovirus; whole genome sequencing; genomic epidemiology
Online: 3 February 2021 (10:08:44 CET)
Human metapneumovirus (HMPV) is an important cause of upper and lower respiratory tract disease in individuals of all ages. It is estimated that most individuals will be infected by HMPV by the age of 5 years. Despite this burden of disease, there remain gaps in our knowledge of the global genetic diversity of the virus due to a lack of HMPV sequencing, particularly at whole genome scale. The purpose of this study was to create a simple and robust approach for HMPV whole genome sequencing to be used for genomic epidemiological studies. To design our assay, all available HMPV full-length genome sequences were downloaded from the NCBI GenBank database and used to design four primer sets to amplify long, overlapping amplicons spanning the viral genome and, importantly, specific to all known HMPV subtypes. These amplicons were then pooled and sequenced on an Illumina iSeq; however, the approach is suitable for other common NGS platforms. We demonstrate the utility of this method using a representative subset of clinical samples and examine these sequences using a phylogenetic approach. Here we present an amplicon-based method for the whole genome sequencing of HMPV from clinical extracts that can be used to better inform genomic studies of HMPV epidemiology and evolution.
REVIEW | doi:10.20944/preprints202101.0212.v1
Online: 12 January 2021 (10:14:46 CET)
The constitutively active tyrosine kinase BCR/ABL1 oncogene plays a key role in human chronic myeloid leukemia (CML) development and disease maintenance, and determines most of the features of this leukemia. For this reason, tyrosine kinase inhibitors are the first-line treatment, offering most patients a life expectancy like that of an equivalent healthy person. However, since the oncogene is not destroyed, lifelong oral medication is essential, even though this triggers adverse effects in many patients. Furthermore, leukemic stem cells remain quiescent and resistance is observed in approximately 25% of patients. Thus, new therapeutic alternatives are still needed. In this scenario, the emergence of CRISPR technology can offer a definitive treatment based on its capacity to disrupt coding sequences. This review describes CML and the main advances in the genome-editing field by which it may be treated in the future.
ARTICLE | doi:10.20944/preprints201907.0169.v1
Subject: Life Sciences, Microbiology Keywords: polyvalent bacteriophage FP01, Escherichia coli, Salmonella, genome
Online: 12 July 2019 (13:07:09 CEST)
Recently, the polyvalent bacteriophage FP01, isolated from wastewater in Valparaiso, Chile, was described to have lytic activity across species against Escherichia coli and Salmonella enterica serovars. Due to its polyvalent nature, the bacteriophage FP01 could have potential applications in the food and agricultural industries. Also, fundamental aspects of polyvalent bacteriophage biology are not well known. In this study, we sequenced and described the complete genome of the polyvalent phage FP01 (MH745368) using nanopore technology. The bacteriophage FP01 genome is a 44,900 bp double-stranded DNA with an average G+C content of 49.41% and 90 coding sequences (CDSs). We found that the phage FP01 critically depends on host factors for replication and transcription. Also, it has a critical lysogenic repressor pseudogene. Phylogenetic analyses indicated that the phage FP01 is closely related to phages lambda and P22. These results suggest that the phage FP01 could be a lytic variant of a lysogenic phage or could have acquired genes from lysogenic phages during host infection.
ARTICLE | doi:10.20944/preprints201906.0310.v1
Subject: Life Sciences, Microbiology Keywords: cyanobacteria; secondary metabolite; genome mining; molecular networking
Online: 30 June 2019 (10:42:22 CEST)
Cyanobacteria are an ancient lineage of slow-growing photosynthetic bacteria and a prolific source of natural products with diverse chemical structures and potent biological activities and toxicities. The chemical identification of these compounds remains a major bottleneck. Strategies that can prioritize the most prolific strains and novel compounds are of great interest. Here, we combine chemical analysis and genomics to investigate the chemodiversity of secondary metabolites based on their pattern of distribution within some cyanobacteria. Because Planktothrix is a cyanobacterial genus known to form blooms worldwide and to produce a broad spectrum of toxins and other bioactive compounds, we applied this combined approach to four closely related strains of Planktothrix. The chemical diversity of the metabolites produced by the four strains was evaluated using an untargeted metabolomics strategy with high-resolution LC-MS. Metabolite profiles were correlated with the potential for metabolite production identified by genomics for the different strains. Although the Planktothrix strains are globally similar in terms of biosynthetic gene clusters, for example those for microcystin, aeruginosin and prenylagaramide, we found remarkable strain-specific chemodiversity. Only a few of the chemical features were common to the four studied strains. Additionally, the MS/MS data were analyzed using Global Natural Products Social Molecular Networking (GNPS) to identify molecular families of the same biosynthetic origin. In conclusion, we present an efficient integrative strategy for elucidating the chemical diversity of a given genus and linking the data obtained from analytical chemistry to biosynthetic genes of cyanobacteria.
ARTICLE | doi:10.20944/preprints201905.0199.v1
Subject: Engineering, Electrical & Electronic Engineering Keywords: wide area protection system (WAPS); reliability; Fault Tree Analysis (FTA) model; information flow; multi-service
Online: 16 May 2019 (10:10:54 CEST)
Based on the topology of a wide area protection system (WAPS), the reliability of the hardware system and of the information flow in the WAPS is studied, a reliability assessment model is established, and a multi-service reliability analysis method for a WAPS with multiple monitoring and protection tasks is proposed. The model accounts for the impact of network quality of service (QoS), such as information flow loss and delay. On the basis of the model, the multi-service reliability evaluation method is applied to analyze the reliability of the WAPS of an IEEE 14-node power system, and the key nodes of the WAPS are identified, which provides a basis for improving the reliability of the WAPS.
ARTICLE | doi:10.20944/preprints202207.0292.v1
Subject: Medicine & Pharmacology, Other Keywords: Cryptococcus; Whole-Genome Sequencing; VGVI; phylogenomics; Molecular Type
Online: 20 July 2022 (03:16:00 CEST)
Whole-genome sequencing has advanced our understanding of the population structure of the pathogenic species complex Cryptococcus gattii, which has allowed for the phylogenomic specification of previously described major molecular type groupings and novel lineages. Recently, isolates collected in Mexico in the 1960s were determined to be genetically distant from other known molecular types and were classified as VGVI. We sequenced four clinical isolates and one veterinary isolate collected in the southwestern U.S. and Argentina during 2012-2021. Phylogenomic analysis groups these genomes with those of the Mexican VGVI isolates, expanding VGVI into a clade and establishing this molecular type as a clinically important population. These findings also potentially expand the known Cryptococcus ecological range with a previously unrecognized endemic area.
ARTICLE | doi:10.20944/preprints202205.0225.v1
Subject: Life Sciences, Genetics Keywords: chloroplast; genome; sweet cucumber; Solanaceae; next-generation sequencing
Online: 17 May 2022 (08:38:03 CEST)
Sweet cucumber (Solanum muricatum, sect. Basarthrum) is a neglected horticultural crop native to the Andean region. It is naturally distributed very close to potatoes (Solanum sect. Petota) and tomatoes (Solanum sect. Lycopersicon), two groups of high economic importance. To date, molecular tools for this crop are still undetermined. Here we obtained the first complete chloroplast (cp) genome of sweet cucumber and compared it with those of seven Solanaceae species. Paired-end clean reads were obtained from a PE 150 library on the Illumina HiSeq 2500 platform. The complete cp genome of S. muricatum is 155,681 bp long with a typical quadripartite structure, containing a large single-copy (LSC) region (86,182 bp) and a small single-copy (SSC) region (18,360 bp), separated by two inverted repeat (IR) regions (25,568 bp). Annotation of the chloroplast genome predicted 88 protein-coding genes (CDS), 8 ribosomal RNA (rRNA) genes, 37 transfer RNA (tRNA) genes, and one pseudogene. A total of 48 perfect microsatellites were identified. Mononucleotide repeats were the most frequent (32), followed by tetranucleotide (6) and dinucleotide (5) repeats; SSRs with trinucleotide (3), pentanucleotide (1) and hexanucleotide (1) repeat motifs were identified in lower numbers. Most of these repeats were distributed in the noncoding regions. Whole chloroplast genome comparison with the other seven Solanaceae species revealed that the small and large single-copy regions showed more divergence than the inverted repeat regions. Finally, phylogenetic analysis resolved S. muricatum as a sister species to members of sections Petota + Lycopersicum + Etuberosum. This study reports for the first time the genome organization, gene content, and structural features of the cp genome of S. muricatum. This study may also provide the basis for evaluating genetic diversity within Solanum, and will be useful for examining the evolutionary processes in sweet cucumber landraces.
ARTICLE | doi:10.20944/preprints202201.0333.v1
Subject: Mathematics & Computer Science, Artificial Intelligence & Robotics Keywords: Covid-19; Ensemble; Genome sequencing; Machine learning; Variant
Online: 21 January 2022 (15:17:58 CET)
Covid-19 has caused infections and deaths worldwide. While research in the field of Data Science has contributed good predictions of positive Covid-19 case numbers, this study's review of the literature shows there is little research on the use of variants of the virus in predictions. We set out to define and evaluate novel variant features. We find that features relating to variant trends, thresholds and amino acid substitutions are especially powerful in two tasks. In the first task, predicting Covid-19 case numbers, accuracy improved from 71.53% without variant features to 82.12% with variant features. In the second task, predicting the transmission severity of variants between two classes, we created a method to build variable ensembles by selecting appropriate models generated with variant features. The test results showed that our ensembles are more accurate and reliable. One particular ensemble of 14 models correctly classified 90.91% of variants, outperforming other models including the popular Random Forest ensemble. In addition, because the variant features represent more underlying information about Covid-19 pathophysiology, our ensemble methods use only a few data samples to achieve an accurate prediction. The ensemble of 14 models uses only 50 cases of each variant, an ability that could be exploited for early detection of highly infectious variants. These research findings may benefit public health professionals, policy makers, and the research community in the collective efforts to overcome this disease.
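To make the ensemble idea concrete, the sketch below combines the class labels predicted by several models through a simple majority vote. The abstract does not state how the selected models are combined, so majority voting is an assumption here, and the callables standing in for trained models are hypothetical.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the most frequent label among individual model predictions.

    On a tie, the label that appeared first in `predictions` is returned,
    because Counter preserves insertion order for equal counts.
    """
    counts = Counter(predictions)
    label, _ = counts.most_common(1)[0]
    return label


def ensemble_predict(models, features):
    """Apply every model (any callable returning a label) and vote."""
    return majority_vote([model(features) for model in models])
```

With an even number of models, such as the 14 mentioned above, ties between two classes are possible, so a real ensemble would need an explicit tie-breaking rule; the first-seen rule used here is only a placeholder.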
ARTICLE | doi:10.20944/preprints202111.0167.v1
Subject: Biology, Plant Sciences Keywords: chloroplast genome; Compositae; phylogenetic incongruence; plastid DNA; Senecioneae
Online: 9 November 2021 (12:51:07 CET)
Plastid genomes are in general highly conserved given their slow evolutionary rate, thus large changes in their structure are unusual. However, when specific rearrangements are present, they are often phylogenetically informative. Asteraceae is a highly diverse family whose evolution has long been driven by polyploidy (up to 48x) and hybridisation, both processes that usually complicate systematic inferences. In this study, we have generated one of the most comprehensive plastome-based phylogenies of the family Asteraceae, providing information about the structure, genetic diversity, and repeat composition of these sequences. By comparing the whole plastome sequences obtained, we confirmed the double inversion located in the long single copy region for most of the species analysed (with the exception of basal tribes), a well-known feature of Asteraceae plastomes. We also show that genome size, gene order and gene content are highly conserved across the family. However, species representative of the basal subfamily Barnadesioideae, as well as of the sister family Calyceraceae, lack the pseudogene rps19 located in one inverted repeat. The phylogenomic analysis conducted here, based on 63 protein-coding genes, 30 transfer RNA genes and 21 ribosomal RNA genes from 36 species of Asteraceae, is overall consistent with the general consensus for the family’s phylogeny, while resolving the position of tribe Senecioneae and revealing some incongruences at tribe level between reconstructions based on nuclear and plastid DNA data.
ARTICLE | doi:10.20944/preprints202110.0367.v1
Subject: Biology, Other Keywords: Bacteria; culturomics; genome; species; sp. nov.; taxono-genomics
Online: 25 October 2021 (15:47:32 CEST)
Marseille-Q4369 is a strain that we isolated from healthy human skin and characterized by a taxono-genomic approach. Marseille-Q4369 exhibited 99.80% 16S rRNA sequence similarity with Agrococcus pavilionensisT, the phylogenetically closest bacterium with standing in nomenclature. Furthermore, digital DNA–DNA hybridization revealed a maximum similarity of only 52.4%, and OrthoANI analysis gave a value of 93.63% between the novel organism and Agrococcus pavilionensisT. Marseille-Q4369 was observed to be a yellowish-pigmented, Gram-positive, coccoid, facultative aerobic bacterium belonging to the Microbacteriaceae family. The major fatty acids detected were 12-methyl-tetradecanoic acid (66%) and 14-methyl-hexadecanoic acid (24%), followed by 13-methyl-tetradecanoic acid (5%). The genome of strain Marseille-Q4369 was 2,737,735 bp long with a 72.27% G+C content. Taken together, these results confirm the status of this strain as a new member of the Agrococcus genus, for which the name Agrococcus massiliensis is proposed (=CSUR-Q4369 = DSM112404).
ARTICLE | doi:10.20944/preprints202106.0039.v1
Subject: Life Sciences, Virology Keywords: LAIV, Influenza, HA, IgA, IgG, vaccine, genome rearrangement
Online: 1 June 2021 (15:02:27 CEST)
Influenza B virus (IBV) is considered a major respiratory pathogen responsible for seasonal respiratory disease in humans, which is particularly severe in children and the elderly. Seasonal influenza vaccination is considered the most efficient strategy to prevent and control IBV infections. Live attenuated influenza virus vaccines (LAIVs) are thought to induce both humoral and cellular immune responses by mimicking a natural infection, but their effectiveness has recently come into question. Thus, the opportunity exists to find alternative approaches to improve overall influenza vaccine effectiveness. Two alternative IBV backbones were developed with re-arranged genomes: a re-arranged M (FluB-RAM) and a re-arranged NS (FluB-RANS). Both re-arranged viruses showed temperature sensitivity in vitro compared to the wild-type (WT) B/Bris strain, were genetically stable over multiple passages in embryonated chicken eggs and were attenuated in vivo in mice. In a prime-boost regimen in naïve mice, both re-arranged viruses induced antibodies against HA with hemagglutination inhibition titers considered of protective value. In addition, antibodies against NA and NP with potential protective value were readily detected. Upon lethal IBV challenge, mice previously vaccinated with either FluB-RAM or FluB-RANS were completely protected against clinical disease and mortality. In conclusion, genome re-arrangement renders efficacious LAIV candidates to protect mice against IBV.
ARTICLE | doi:10.20944/preprints202101.0526.v1
Subject: Biology, Anatomy & Morphology Keywords: Asaia; paratransgenesis; symbiotic traits; Anopheles stephensi; genome features
Online: 26 January 2021 (08:19:00 CET)
Asaia bacteria commonly form part of the microbiome of many mosquito species in the genera Anopheles and Aedes, including important vectors of infectious agents. Their close association with multiple organs and tissues of their mosquito hosts enhances the potential for paratransgenesis for delivery of anti-malaria or anti-virus effectors. The molecular mechanisms involved in the interactions between Asaia and mosquito hosts, as well as between Asaia and other bacterial members of the mosquito microbiome, remain unexplored. Here, we determined the genome sequence of the strain W12 isolated from Anopheles stephensi mosquitoes, compared it to those of other Asaia species associated with plants or insects, and investigated some properties of the bacteria relevant to their symbiosis with host mosquitoes. The assembled genome of strain W12 has a size of 3.94 Mb, which is the largest among the Asaia spp. studied so far. At least 3,585 coding sequences were predicted. The insect-associated Asaia, including strain W12, carried more glycoside hydrolase (GH) encoding genes (31 per genome) than those isolated from plants (22 per genome). W12 had the most predicted regulatory protein components (213) among the selected Asaia (ranging from 131 to 211), indicating its great capability to adapt to frequent environmental changes in the mosquito gut. Two complete operons encoding cytochrome bo3-type ubiquinol terminal oxidases (cyoABCD-1 and cyoABCD-2) were found in most of the Asaia genomes, which possibly offer alternative terminal oxidases and allow the flexible transition of respiratory pathways. Genes involved in the production of acetoin and 2,3-butanediol were identified in Asaia sp. W12.
REVIEW | doi:10.20944/preprints202011.0603.v1
Subject: Biology, Agricultural Sciences & Agronomy Keywords: genome editing, agriculture, crispr, talen, specificity, off-target
Online: 24 November 2020 (08:35:00 CET)
We are in a new chapter of crop and livestock improvement with the emergence of genome editing. This latest generation of molecular tools can be used to make targeted changes in a genome including insertions, deletions, and mutations. With new advances come new risks of unintended changes and impacts, and thus the need for appropriate risk assessment for product development and to inform regulatory measures. Though CRISPR/Cas has arisen as the predominant technology, there are multiple types of genome editing tools, each with pros and cons depending on the organism and desired outcome. Furthermore, each editing tool differs in specificity, as it may edit non-intended sites, referred to as off-target edits. The consensus of the agricultural editing community is to avoid off-target editing through design and detection, instead of determining whether off-target editing in each case is detrimental. The design of a targeting component, the tool chosen, and the identification of the edit(s) made are the critical factors in avoiding off-target edits and confirming intended edits in final products that are released commercially. The limited number of head-to-head comparisons of genome editing tools in diverse crops and livestock makes it difficult to develop broad conclusions and best practices, which is further compounded by the diversity of techniques, targets, and processes. Developers and breeders should consult the literature and test as needed to determine which editing technology will be the most effective for their purposes, especially as more tools with altered efficiency and specificity become available. Yet, the lack of off-target edits in studies that employed careful design of targeting components followed by wide testing for on- and off-target edits bodes well for the use of genome editing with proper precautions of target selection and screening.
ARTICLE | doi:10.20944/preprints202011.0237.v1
Subject: Life Sciences, Biochemistry Keywords: Saccharomyces cerevisiae; SCRaMbLE; genome evolution; industrial yeast strains
Online: 6 November 2020 (10:30:45 CET)
Genome-scale engineering and custom synthetic genomes are reshaping the next generation of industrial yeast strains. The Cre-recombinase mediated chromosomal rearrangement mechanism of designer synthetic Saccharomyces cerevisiae chromosomes, known as SCRaMbLE, is a powerful tool which allows rapid genome evolution on command. This system is able to generate millions of novel genomes with potentially valuable phenotypes, but the excessive loss of essential genes often results in poor growth or even the death of cells with useful phenotypes. In this study we expanded the versatility of SCRaMbLE to industrial strains, and evaluated different control measures to optimise genomic rearrangement whilst limiting cell death. To achieve this, we have developed RED (Rapid Evolution Detection), a simple colorimetric plate-assay procedure to rapidly quantify the degree of genomic rearrangements within a post-SCRaMbLE yeast population. RED-enabled semi-synthetic strains were mated with haploid progeny of industrial yeast strains to produce stress-tolerant heterozygous diploid strains. Analysis of these heterozygous strains with the RED-assay, genome sequencing and custom bioinformatics scripts demonstrated a correlation between RED-assay frequencies and physical genomic rearrangements. Here we show that RED is a fast and effective method to evaluate optimal SCRaMbLE induction times of different Cre-recombinase expression systems for the development of industrial strains.
BRIEF REPORT | doi:10.20944/preprints202010.0601.v1
Subject: Biology, Anatomy & Morphology Keywords: Diptera; Calliphoridae; Luciliinae; complete mitochondrial genome; Lucilia sericata
Online: 29 October 2020 (09:22:19 CET)
In the present study, the complete mitochondrial genome of the New Zealand parasitic blowfly Lucilia sericata (green bottle blowfly) field strain NZ_LucSer_NP was generated using next-generation sequencing technology. The length of the complete mitochondrial genome is 15,938 bp, with a nucleotide distribution of 39.4% A, 13.0% C, 9.3% G, and 38.2% T. The complete mitochondrial genome consists of 13 protein-coding genes, two ribosomal RNAs, 22 transfer RNAs, and a 1,124 bp non-coding region, similar to most metazoan mitochondrial genomes. Phylogenetic analysis showed that L. sericata NZ_LucSer_NP forms a monophyletic cluster with the remaining six Lucilia species and that the Calliphoridae are polyphyletic. This study provides the first complete mitochondrial genome sequence for an L. sericata blowfly derived from New Zealand to facilitate species identification and phylogenetic analysis.
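A nucleotide distribution like the one reported above can be derived from a sequence with a few lines of code. The sketch below is illustrative only: the function name is hypothetical, the input is a short toy string rather than the actual genome, and ambiguous bases are simply ignored, which is an assumption of this example.

```python
from collections import Counter


def base_composition(sequence):
    """Return the percentage of A, C, G and T in a DNA sequence.

    Characters other than A, C, G, T (for example N) are ignored.
    """
    seq = sequence.upper()
    counts = Counter(base for base in seq if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    return {base: 100.0 * counts[base] / total for base in "ACGT"}


# Toy input: four A, four T, one C, one G.
composition = base_composition("AATTAACGTT")
```

The four reported percentages add up to 99.9%, which is consistent with each value having been rounded to one decimal place.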
CONCEPT PAPER | doi:10.20944/preprints202010.0160.v1
Subject: Biology, Anatomy & Morphology Keywords: nomenclature; Candidatus; metagenome-assembled genomes; genome-based taxonomy
Online: 7 October 2020 (15:08:01 CEST)
Latin binomials, popularised in the eighteenth century by the Swedish naturalist Linnaeus, have stood the test of time in providing a stable, clear and memorable system of nomenclature across biology. However, relentless and ever-deeper exploration and analysis of the microbial world has created an urgent unmet need for huge numbers of new names for Archaea and Bacteria. Manual creation of such names remains difficult and slow and typically relies on expert-driven nomenclatural quality control. Keen to ensure the legacy of Linnaeus lives on in the age of microbial genomics and metagenomics, we propose an automated approach, employing combinatorial concatenation of roots from Latin and Greek to create linguistically correct names for genera and species that can be used off the shelf as needed. As proof of principle, we document over a million new names for Bacteria and Archaea. We are confident that our approach provides a road map for how to create new names for decades to come.
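The combinatorial concatenation described above can be sketched in a few lines. The root lists below are small hypothetical placeholders chosen for this example, not the authors' lists; some resulting combinations may coincide with names that are already published, and the sketch performs no checks of grammatical gender or connecting vowels, which a real pipeline would need.

```python
from itertools import product

# Hypothetical toy roots for illustration only.
PREFIX_ROOTS = ["Aqua", "Terra", "Micro"]
SUFFIX_ROOTS = ["bacter", "coccus", "monas"]
EPITHETS = ["minor", "major"]


def genus_names(prefixes, suffixes):
    """Concatenate every prefix root with every suffix root."""
    return [prefix + suffix for prefix, suffix in product(prefixes, suffixes)]


def binomials(genera, epithets):
    """Pair every genus name with every species epithet."""
    return [f"{genus} {epithet}" for genus, epithet in product(genera, epithets)]


genera = genus_names(PREFIX_ROOTS, SUFFIX_ROOTS)
names = binomials(genera, EPITHETS)
```

The number of names grows multiplicatively with the size of each root list, so, for instance, 1,000 prefix roots combined with 1,000 suffix roots would already yield a million candidate genus names.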
ARTICLE | doi:10.20944/preprints202005.0140.v1
Subject: Biology, Other Keywords: Genome; fimbrial; plasmid; ST131; Escherichia coli; evolution; infection
Online: 8 May 2020 (09:39:35 CEST)
The human gut microbiome includes beneficial, commensal and pathogenic bacteria that possess antimicrobial resistance (AMR) genes and exchange these predominantly through conjugative plasmids. Escherichia coli is a significant component of the gastrointestinal microbiome and is typically non-pathogenic in this niche. In contrast, extra-intestinal pathogenic E. coli (ExPEC) including ST131 may occupy other environments like the urinary tract or bloodstream, where they express genes enabling AMR and host adhesion, like type 1 fimbriae. The extent to which non-pathogenic gut E. coli and infectious ST131 share AMR genes and key associated plasmids remains understudied at a genomic level. Here, we examined AMR gene sharing between gut E. coli and ST131 to discover an extensive shared preterm infant resistome. In addition, individual ST131 show extensive AMR gene diversity, highlighting that analyses restricted to the core genome may be limiting and could miss AMR gene transfer patterns. We show that pEK499-like segments are ancestral to most ST131 Clade C isolates, contrasting with a minority with substantial pEK204-like regions encoding a type IV fimbriae operon. Moreover, ST131 possess extensive diversity at genes encoding type 1, type IV, P and F17-like fimbriae, particularly within subclade C2. The type, structure and composition of AMR genes, plasmids and fimbriae vary widely in ST131 and this may mediate pathogenicity and infection outcomes.
REVIEW | doi:10.20944/preprints201911.0337.v1
Subject: Biology, Other Keywords: leishmania; visceral leishmaniasis; Americas; genome instability; fitness gain
Online: 27 November 2019 (09:27:16 CET)
Pathogen fitness landscapes change when transmission cycles establish in non-native environments or spill over into new vectors and hosts. The introduction of Leishmania infantum into the Neotropics during European colonization of the Americas represents a unique case study to investigate mechanisms of ecological adaptation of this important parasite. Defining the evolutionary trajectories that drive L. infantum fitness in this new environment is of great public health importance, as it will allow unique insight into pathways of host/pathogen co-evolution and their consequences for region-specific changes in disease manifestation. This review summarizes current knowledge on L. infantum genetic and phenotypic diversity in the Americas and its possible role in the unique epidemiology of visceral leishmaniasis (VL) in the New World. We highlight the importance of appreciating adaptive molecular mechanisms in L. infantum to understand the parasite’s successful establishment on the continent.
ARTICLE | doi:10.20944/preprints201811.0508.v1
Subject: Life Sciences, Other Keywords: Muller's ratchet; genome decay; ribosome; protein synthesis, rudiment
Online: 20 November 2018 (16:30:43 CET)
Microsporidia are fungi-like parasites that have the smallest known eukaryotic genome, and for that reason they are used as a model to study the phenomenon of genome decay in parasitic forms of life. Similar to other intracellular parasites that reproduce asexually in an environment with alleviated natural selection, Microsporidia experience continuous genome decay driven by Muller's ratchet - an evolutionary process of irreversible accumulation of deleterious mutations, which leads to gene loss and miniaturization of cellular components. In particular, Microsporidia have remarkably small ribosomes in which the rRNA is reduced to the minimal enzymatic core. To better understand the impact of Muller's ratchet on RNA and protein molecules in parasitic organisms, particularly regarding their ribosome structure, we have explored an apparent effect of Muller's ratchet on microsporidian ribosomal proteins. Through mass spectrometry, analysis of microsporidian genome sequences and analysis of ribosome structure from non-parasitic eukaryotes, we found that massive rRNA reduction in microsporidian ribosomes appears to annihilate binding sites for ribosomal proteins eL8, eL27, and eS31, suggesting that these proteins are no longer bound to the ribosome in microsporidian species. We then provided evidence that protein eS31 is retained in Microsporidia due to its non-ribosomal function in ubiquitin biogenesis. To sum up, our study illustrates that while Microsporidia carry the same set of ribosomal proteins as non-parasitic eukaryotes, some ribosomal proteins no longer participate in protein synthesis in Microsporidia and are preserved from genome decay by having extra-ribosomal functions.
ARTICLE | doi:10.20944/preprints201810.0054.v1
Subject: Life Sciences, Virology Keywords: Bovine enterovirus, EV-E, Nigeria, Sewage, Complete Genome
Online: 3 October 2018 (14:24:49 CEST)
We describe the draft genome of a Bovine enterovirus (EV) recovered from sewage in Nigeria. The virus replicates on both RD and L20B cell lines, but is negative for all EV screens in use by the GPEI. It contains 7,368nt, with 50.2% G+C content and an ORF with 6,525nt (2,174aa).
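The reported figures can be sanity-checked with simple arithmetic. Assuming the 6,525 nt ORF length includes the stop codon (an assumption, since the abstract does not say), 6,525 / 3 gives 2,175 codons, one of which is the stop, leaving 2,174 amino acids. The sketch below shows this check together with a generic G+C calculation; the function names and the toy sequence are illustrative and not part of the original work.

```python
def gc_content(seq: str) -> float:
    """Return the G+C percentage of a nucleotide sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence is empty")
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / len(seq)


def orf_protein_length(orf_nt: int) -> int:
    """Amino acids encoded by an ORF whose length includes the stop codon."""
    if orf_nt % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    return orf_nt // 3 - 1  # subtract the stop codon (assumed to be counted)


print(orf_protein_length(6525))  # ORF length from the abstract
print(gc_content("ATGC"))        # toy sequence, not viral data
```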
ARTICLE | doi:10.20944/preprints201804.0326.v1
Subject: Biology, Horticulture Keywords: DNA markers; edible mushroom; genome stability; protoplast regeneration
Online: 25 April 2018 (08:26:25 CEST)
Of twenty-two colonies regenerated from Termitomyces protoplasts, five protoclones were successfully cultured on PDA medium and studied further. Mycelial tissue grown in liquid MYG was used for protoplast isolation by enzymatic digestion in a mixture containing 2% Lysing enzyme and 2% Cellulase R10 in 0.6 M mannitol. The incubation temperature, shaking speed and time were standardized at 24 °C, 60 rpm and 10 hours, respectively, for the liberation of healthy protoplasts. The purified protoplasts showed an average yield of 1.2 × 10^7 cells/g tissue, with 31.60 ± 9.31% regeneration efficiency on specific medium and 77.12 ± 2.72% viability by the FDA test. Four ISSR primers were used in this study, yielding a total of 27 reproducible bands (a mean of 6.75 bands per primer). The bands ranged from 280 bp to 2700 bp and showed the same banding pattern in all the lines, with zero percent polymorphism. The amplified rRNA-ITS gene was ~600 bp in size on the gel and contained a single restriction site for the enzyme HaeIII in all the protoclones and the parent, with similar fragment sizes in all.
ARTICLE | doi:10.20944/preprints201703.0182.v1
Subject: Biology, Entomology Keywords: Lauxanioidea; Cyclorrhapha; mitochondrial genome; phylogeny; RNAs; intergenic sequences
Online: 24 March 2017 (08:03:42 CET)
The superfamily Lauxanioidea is a significant dipteran clade including over 2500 known species in three families: Lauxaniidae, Celyphidae and Chamaemyiidae. We sequenced the first five (three complete and two partial) lauxanioid mitochondrial (mt) genomes, and used them to reconstruct the phylogeny of this group. The lauxanioid mt genomes are typical of the Diptera, containing all 37 genes usually present in bilaterian animals. A total of three conserved intergenic sequences have been reported across the Cyclorrhapha. The inferred secondary structure of 22 tRNAs suggested five substitution patterns among the Cyclorrhapha. The control region in the Lauxanioidea has apparently evolved very fast, but four conserved structural elements were detected in all three complete mt genome sequences. Phylogenetic relationships based on the mt genome data were inferred by Maximum Likelihood and Bayesian methods. The traditional relationships between families within the Lauxanioidea, (Chamaemyiidae + (Lauxaniidae + Celyphidae)), were corroborated; however, the higher-level relationships between cyclorrhaphan superfamilies are mostly poorly supported.
ARTICLE | doi:10.20944/preprints202208.0057.v1
Subject: Biology, Animal Sciences & Zoology Keywords: infectious bronchitis; viral evolution; whole genome sequencing; DMV; QX.
Online: 2 August 2022 (09:27:23 CEST)
Infectious bronchitis virus (IBV) is a highly variable RNA virus that affects chickens worldwide. Due to its inherent tendency to undergo point mutations and recombination events during viral replication, emergent IBV strains have been linked to nephropathogenic and reproductive diseases that are more severe than the typical respiratory disease, leading, in some cases, to mortality, severe production losses, and/or unsuccessful vaccination. QX and DMV/1639 strains are examples of the above-mentioned IBV evolutionary pathway and clinical outcome. In this study, our purpose was to systematically compare whole genomes of QX and DMV strains, looking at each IBV gene individually. Phylogenetic analyses and amino acid site searches were performed in datasets obtained from GenBank accounting for all IBV genes and using our own relevant sequences as a basis. The QX dataset studied is more genetically diverse than the DMV dataset, partially due to the greater epidemiological diversity within the five QX strains used as a basis compared to the four DMV strains from our study. Historically, QX strains emerged and spread earlier than DMV strains in Europe and Asia. Consequently, there are more QX sequences than DMV sequences deposited in GenBank, assisting in the identification of a larger pool of QX strains. It is likely that a similar evolutionary pattern will be observed among DMV strains as they develop and spread in North America.
ARTICLE | doi:10.20944/preprints202206.0298.v1
Subject: Biology, Ecology Keywords: cyanosphere; cyanobacteria; Cyanocohniella; Llayta; macrocolonies; metagenomic-assembled genome; microbiome
Online: 21 June 2022 (16:11:44 CEST)
Cyanobacterial macrocolonies known as Llayta are found in Andean wetlands and have been consumed since pre-Columbian times in South America. Macrocolonies of filamentous cyanobacteria are niches for colonization by other microorganisms; however, the microbiome of edible Llayta has not been explored. Based on a culture-independent approach, we report the presence, identification and metagenomic genome reconstruction of Cyanocohniella sp. LLY associated with Llayta trichomes. The assembled genome of strain LLY is now available for further inquiries, and may be instrumental for taxonomic advances on this genus. All known members of the Cyanocohniella genus have been isolated from salty European habitats. A biogeographic gap for the Cyanocohniella genus is partially filled by the existence of strain LLY in wetlands of the Andes Mountains in South America, a new habitat. This is the first genome available for members of this genus. Genes involved in primary and secondary metabolism are described, providing new insights into the putative metabolic capabilities of Cyanocohniella sp. LLY.
ARTICLE | doi:10.20944/preprints202112.0354.v1
Subject: Life Sciences, Genetics Keywords: whole genome sequencing; cancer predisposition; mucin; reactive oxygen species
Online: 22 December 2021 (11:44:20 CET)
Familial colorectal cancer (CRC) is only partially explained by known germline predisposing genes. We performed whole genome sequencing in 15 Polish families with many affected individuals and without mutations in known CRC predisposing genes. We focused on loss-of-function variants and functionally characterized them. We identified a frameshift variant in the CYBA gene (c.246delC) in one family and a splice site variant in the TRPM4 gene (c.25-1 G>T) in another family. While both variants were absent or extremely rare in gene variant databases, we identified four additional Polish familial CRC cases and two healthy elderly individuals with the CYBA variant (odds ratio 2.46, 95% confidence interval 0.48-12.69). Both variants led to a premature stop codon and to a truncated protein. Functional characterization of the variants showed that knockdown of CYBA or TRPM4 depressed generation of reactive oxygen species (ROS) in LS174T and HT-29 cell lines. Knockdown of TRPM4 resulted in decreased MUC2 protein production. CYBA encodes a component in the NADPH oxidase system which generates ROS and controls, e.g., bacterial colonization in the gut. Germline CYBA variants are associated with early onset inflammatory bowel disease, supported by experimental evidence on loss of intestinal mucus barrier function due to ROS deficiency. TRPM4 encodes a calcium-activated ion channel, which in a human colonic cancer cell line controls calcium-mediated secretion of MUC2, a major component of the intestinal mucus barrier. We suggest that the gene defects in CYBA and TRPM4 mechanistically involve intestinal barrier integrity through ROS and mucus biology, which converge in chronic bowel inflammation.
REVIEW | doi:10.20944/preprints202111.0385.v1
Subject: Life Sciences, Virology Keywords: RNA genome; Viruses; host-viruses interactions; RNA world
Online: 22 November 2021 (11:43:15 CET)
In recent years, the role of non-coding RNAs (ncRNAs) in regulating cell physiology has begun to be better understood. Recent discoveries in viral molecular biology have revealed that such cellular functions are disturbed during viral infections mainly due to host cell ncRNAs, cellular factors, and virus-derived ncRNAs. Apart from the interplay between those molecules, other interactions derive from the specific folding of RNA virus genomes. These fulfill canonical regulation functions such as replication, translation, and viral packaging. In some cases, folds serve as precursors of small viral RNAs whose biogenesis is not yet clearly understood. Since ncRNAs and RNA viral genomes modulate complex molecular and cellular processes in viral infections, a new taxonomy is being proposed here overarching three main categories, considering the current information about ncRNA interactions in some well-known viral infections. The first category shows examples of host ncRNAs associated with the trigger of the immune response under viral infections. The second category describes interactions between the virus and host ncRNAs. The last category shows how the shape of the RNA viral genome is essential in processing RNAs derived from viruses. Finally, we introduce evidence of how these three categories can also work as a framework in order to organize known interactions of ncRNAs and cellular factors under DENV infection. This new taxonomy of interactions provides a comprehensive framework for organizing the ncRNA regulatory roles in the context of viral interactions and an RNA world.
ARTICLE | doi:10.20944/preprints202103.0121.v1
Subject: Life Sciences, Biochemistry Keywords: Familial colorectal cancer; SRC; germline variant; whole genome sequencing
Online: 3 March 2021 (09:52:06 CET)
Colorectal cancer (CRC) shows one of the largest proportions of familial cases among different malignancies, but only 5-10% of all CRC cases are linked to mutations in established predisposition genes. Thus, familial CRC constitutes a promising target for the identification of novel, high- to moderate-penetrance germline variants underlying cancer susceptibility by next generation sequencing. In this study, we performed whole genome sequencing on 3 members of a family with CRC aggregation. Subsequent integrative in silico analysis using our in-house developed variant prioritization pipeline resulted in the identification of a novel germline missense variant in the SRC gene (V177M), a proto-oncogene highly upregulated in CRC. Functional validation experiments in HT-29 cells showed that introduction of SRCV177M resulted in increased cell proliferation and enhanced protein expression of phospho-SRC (Y419), a potential marker for SRC activity. Upregulation of paxillin, β-Catenin and STAT3 mRNA levels, increased levels of phospho-ERK, CREB and CCND1 proteins and downregulation of the tumor suppressor p53 further suggested the activation of several pathways due to the SRCV177M variant. The findings of our pedigree-based study contribute to the exploration of the genetic background of familial CRC and bring insights into the molecular basis of upregulated SRC activity and downstream pathways in colorectal carcinogenesis.
REVIEW | doi:10.20944/preprints202009.0604.v1
Subject: Life Sciences, Biochemistry Keywords: Nucleus; Nuclear envelope; Lamins; Genome organization; Chromatin; Gene expression
Online: 25 September 2020 (11:03:59 CEST)
Nuclear lamins are type V intermediate filament proteins that form a filamentous meshwork beneath the inner nuclear membrane. Additionally, a sub-population of A-type and B-type lamins is localized in the nuclear interior. The nuclear lamina protects the nucleus from mechanical stress and mediates nucleo-cytoskeletal coupling. Lamins form a scaffold that partially tethers chromatin at the nuclear envelope. The nuclear lamina also stabilizes protein-protein interactions involved in gene regulation and DNA repair. The lamin-based protein sub-complexes are implicated in both nuclear and cytoskeletal organization, the mechanical stability of the nucleus, genome organization, transcriptional regulation, genome stability, and cellular differentiation. Here we review recent research in the field of nuclear lamins and their role in modulating various nuclear processes and their impact on cell function.
REVIEW | doi:10.20944/preprints202009.0279.v1
Subject: Biology, Other Keywords: selection; mutation; genetic drift; adaptation; ploidy drive; genome instability
Online: 13 September 2020 (11:48:30 CEST)
Ploidy is a significant type of genetic variation, describing the number of chromosome sets per cell. Ploidy evolves in natural populations, clinical populations, and lab experiments, particularly in fungi. Despite a long history of theoretical work on this topic, predicting how ploidy will evolve has proven difficult, as it is often unclear why one ploidy state outperforms another. Here, we review what is known about contemporary ploidy evolution in diverse fungal species through the lens of population genetics. As with typical genetic variants, ploidy evolution depends on the rate that new ploidy states arise by mutation, natural selection on alternative ploidy states, and random genetic drift. However, ploidy variation also has unique impacts on evolution, with the potential to alter chromosomal stability, the rate and patterns of point mutation, and the nature of selection on all loci in the genome. We discuss how ploidy evolution depends on these general and unique factors and highlight areas where additional experimental evidence is required to comprehensively explain the ploidy transitions observed in the field and the lab.
Subject: Keywords: tetraodon palembangensis; chromosome-level genome; genomic annotation; gene family
Online: 31 August 2020 (04:28:47 CEST)
The humpback puffer, Tetraodon palembangensis, also known as Pao palembangensis, is a species of poisonous freshwater pufferfish mainly distributed in Southeast Asia (Thailand, Laos, Malaysia and Indonesia). Despite interesting biological features, such as its very inactive nature, tetrodotoxin production and body expansion mechanisms, molecular research on the humpback puffer is still rare because of the lack of a high-quality reference genome. Here, we report the first chromosome-level genome assembly of an adult humpback puffer, with a genome size of 362 Mb, a contig N50 of ~1.78 Mb and a scaffold N50 of ~15.8 Mb. Based on the genome, ~61.5 Mb (18.11%) of repeat sequences were identified and a total of 19,925 genes were annotated, for 99.20% of which a function could be predicted using protein function databases. Finally, a phylogenetic tree was constructed with single-copy gene families from ten teleost fishes. The humpback puffer genome will be a valuable genomic resource to illustrate possible mechanisms of tetrodotoxin synthesis and tolerance, providing clues for future detailed studies of biological toxins.
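For readers unfamiliar with the N50 statistic quoted above, the following minimal sketch shows how it is conventionally computed: sort the contig (or scaffold) lengths from longest to shortest and report the length at which the running total first reaches half of the assembly size. The function name and the toy lengths are illustrative and are not data from this assembly.

```python
def n50(lengths):
    """Length L such that sequences of length >= L cover at least half the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("no sequence lengths given")


# Toy contig lengths: total 290, half is 145; 80 + 70 = 150 reaches it.
print(n50([80, 70, 50, 40, 30, 20]))  # 70
```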
ARTICLE | doi:10.20944/preprints202007.0251.v1
Subject: Life Sciences, Molecular Biology Keywords: SARS-CoV-2; COVID-19; Spike protein; Mutant; Genome
Online: 12 July 2020 (12:03:16 CEST)
The severity of coronavirus disease 2019 (COVID-19), caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), varies greatly from patient to patient. In the present study, we compared the mutation profiles of SARS-CoV-2 isolated from mildly affected and severely affected COVID-19 patients in order to explore any relationship between mutation profile and disease severity. Genomic sequences of SARS-CoV-2 were downloaded from the GISAID database. With the help of the Genome Detective Coronavirus Typing Tool, genomic sequences were aligned with the Wuhan seafood market pneumonia virus reference sequence and all the mutations were identified. The distribution of mutant variants was then compared between the mildly and severely affected groups. Among the numerous mutations detected, the 14,408C>T and 23,403A>G mutations, resulting in the RNA-dependent RNA polymerase (RdRp) P323L and spike protein D614G mutations, respectively, were found predominantly in the severely affected group (>82%) compared with the mildly affected group (<46%, p<0.001). The 241C>T mutation in the non-coding region of the genome was also found predominantly in the severely affected group. The silent mutation 3,037C>T also appeared at relatively high frequency in the severely affected group. We concluded that the RdRp P323L and spike protein D614G mutations predominate in severely affected COVID-19 patients. Further studies will be required to explore whether these mutations have any impact on the severity of COVID-19.
ARTICLE | doi:10.20944/preprints201912.0024.v1
Subject: Biology, Agricultural Sciences & Agronomy Keywords: wheat variety BN207; genome composition; FISH; SNP; chromosomal variations
Online: 3 December 2019 (11:49:52 CET)
The development and deployment of wheat varieties with high yields, wide adaptability, good quality, multiple resistance to abiotic and biotic stresses, and efficient response to fertilizers have greatly contributed to sustainable global wheat production. Knowing the genomic composition of a key commercial wheat variety can help in understanding the genetic basis underlying the development of new varieties and permit increased breeding efficiency. In this study, we report the chromosomal and genomic compositions of BN207, presently the leading wheat variety in the southern region of the Huang-Huai River Valley, the most important wheat-producing area in China, through an integrated analysis using fluorescent in situ hybridization (FISH) and the wheat 15 K SNP array. Our results showed that BN207 inherited 55.3% and 40.7% of its genome from its male parent BN64 and female parent ZM16, respectively, and that 64 novel or recombined loci were generated. In addition, we detected nine chromosomal variations in BN207, its parents and ten sister lines. By integrating FISH and SNP loci recombination analyses, we physically mapped two of these variations, a pericentric inversion of chromosome 6B and a large tandem repeat sequence block on the long arm of 5A, both of which had positive effects on agronomic traits. These results will provide a reference for the breeding of high-yield wheat varieties such as BN207 and for the application of the founder parents BN64 and ZM16, which are frequently utilized in wheat breeding programs in Henan Province and surrounding areas.
ARTICLE | doi:10.20944/preprints201807.0380.v1
Subject: Engineering, Electrical & Electronic Engineering Keywords: Ultra-Wide Band; wireless sensor networks; monitoring; warning system; ground instability; landslide; Time Of Flight, Two-way ranging.
Online: 20 July 2018 (11:56:07 CEST)
An innovative wireless sensor network (WSN) based on Ultra-Wide Band (UWB) technology is proposed for accurate 3D surface monitoring of ground deformations, such as landslides and subsidence. The system has been designed and developed as part of a European Life+ project called Wi-GIM (Wireless Sensor Network for Ground Instability Monitoring). The details of the architecture, the localization via wireless technology and the data processing protocols are described. The flexibility and accuracy achieved by the UWB two-way ranging technique are analysed and compared with traditional systems, such as robotic total stations (RTSs) and Ground-based Interferometric Synthetic Aperture Radar (GB-InSAR), highlighting the pros and cons of the UWB solution for detecting surface movements. An extensive field trial campaign allows the validation of the system and the analysis of its sensitivity to different factors (e.g., sensor node inter-visibility, effects of temperature, etc.). The Wi-GIM system represents a promising solution for landslide monitoring, and it can be adopted in conjunction with traditional systems or as an alternative in areas where the available resources are inadequate. The versatility, easy and fast deployment, and cost-effectiveness, together with the good accuracy, make the Wi-GIM system a possible solution for municipalities that cannot afford expensive or complex systems to monitor potential landslides in their territory.
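The principle behind two-way ranging can be sketched as follows. This assumes the simplest single-sided variant, in which one node timestamps a poll and the matching response and the other node reports its reply delay; the abstract does not state which variant Wi-GIM uses, and the function name and timing values are hypothetical.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def twr_distance(t_round_s: float, t_reply_s: float) -> float:
    """Estimate node-to-node distance from single-sided two-way ranging.

    t_round_s: time from sending the poll to receiving the response.
    t_reply_s: processing delay reported by the responding node.
    """
    if t_round_s < t_reply_s:
        raise ValueError("round-trip time cannot be shorter than reply delay")
    time_of_flight_s = (t_round_s - t_reply_s) / 2.0  # one-way flight time
    return SPEED_OF_LIGHT_M_PER_S * time_of_flight_s


# Hypothetical timing: nodes 100 m apart, responder replies after 200 microseconds.
reply_delay = 200e-6
round_trip = reply_delay + 2 * 100.0 / SPEED_OF_LIGHT_M_PER_S
print(round(twr_distance(round_trip, reply_delay), 3))
```

Because light covers about 30 cm per nanosecond, small clock errors translate directly into ranging errors, which is one reason temperature and node inter-visibility matter in field trials.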
ARTICLE | doi:10.20944/preprints202209.0004.v1
Subject: Biology, Other Keywords: phage Rih21; MRSA; novel bacteriophage; S. aureus; bacteriophage; genome analysis
Online: 1 September 2022 (04:11:06 CEST)
A novel bacteriophage was isolated from hospital wastewater and characterized. According to its characterization, this bacteriophage belongs to the Siphoviridae family, and its maximum titer was recorded at 37 °C and a pH of 7.2. It has a 44,789 bp linear double-stranded DNA genome containing 61 genes, all of which encode proteins. This bacteriophage does not carry any virulence factors or antimicrobial resistance genes, and it has specific lytic activity against some antimicrobial-resistant S. aureus clinical isolates.
ARTICLE | doi:10.20944/preprints202204.0005.v1
Subject: Medicine & Pharmacology, Other Keywords: genome mining; marine environments; molecular networking; bacterial extremophiles; secondary metabolites
Online: 1 April 2022 (10:21:11 CEST)
Understanding extremophiles and their usefulness in biotechnology involves studying their habitat, physiology and biochemical adaptations, as well as their ability to produce biocatalysts, in environments that are still poorly explored. We studied two such sites: saline lagoons of marine (Pacific Ocean) origin in northwestern Peru and a site on the Atlantic coast of Brazil. Both environments are considered extreme. The objective of the present work was to compare, at the metabolic level, two strains isolated from these extreme environments using molecular networking through the Global Natural Products Social Molecular Networking (GNPS) platform. In our study, the MS/MS spectra from the network were compared with GNPS spectral libraries, and the matching metabolites were annotated. Differences were observed between the molecular networks of the two strains of Streptomyces spp. from these two environments. Among the annotated compounds from marine bacteria, the metabolites characterized for Streptomyces sp. B-81 from Peruvian marshes were lobophorins A (1) and H (2), as well as divergolides A (3), B (4) and C (5). Streptomyces sp. 796.1 produced different compounds, such as glucopiericidin A (6) and dehydro-piericidin A1a (7). The search for new metabolites in underexplored environments may therefore reveal compounds with potential application in different areas of biotechnology.
BRIEF REPORT | doi:10.20944/preprints202201.0057.v1
Subject: Life Sciences, Virology Keywords: Dengue virus; complete genome; Cosmopolitan genotype; Senegal; 2018; Regional diversification
Online: 6 January 2022 (09:56:19 CET)
To assess the genetic diversity of dengue virus 2 (DENV-2) circulating in Senegal in 2018, we performed molecular characterization by complete genome sequencing and phylogenetic analysis. The sequenced strains belong to the Cosmopolitan genotype of DENV-2, and we observed intra-genotype variability leading to a divergence into two clades with differential geographic distribution. We report two variants: the “Northern variant”, harbouring three nonsynonymous mutations (V1183M, R1405K, P2266T) located in NS2A, NS2B and NS4A, respectively, and the “Western variant”, with two nonsynonymous mutations (V1185E, V3214E) located in the NS2A and NS5 genes, respectively. These findings call for in-depth in vitro and functional studies to elucidate the impact of the observed mutations on viral fitness, spread, epidemiology and disease outcome.
ARTICLE | doi:10.20944/preprints202111.0557.v1
Subject: Biology, Other Keywords: Bacterial nomenclature; archaeal nomenclature; genome taxonomy; shotgun metagenomics; Candidatus names
Online: 30 November 2021 (10:53:50 CET)
Thousands of new bacterial and archaeal species and higher-level taxa are discovered each year through the analysis of genomes and metagenomes. The Genome Taxonomy Database (GTDB) provides hierarchical sequence-based descriptions and classifications for new and as-yet-unnamed taxa. However, bacterial nomenclature, as currently configured, cannot keep up with the need for new well-formed names. Instead, microbiologists have been forced to use hard-to-remember alphanumeric placeholder labels. Here, we exploit an approach to the generation of well-formed arbitrary Latinate names at a scale sufficient to name tens of thousands of unnamed taxa within GTDB. These newly created names represent an important resource for the microbiology community, facilitating communication between bioinformaticians, microbiologists and taxonomists, while populating the emerging landscape of microbial taxonomic and functional discovery with accessible and memorable linguistic labels.
ARTICLE | doi:10.20944/preprints202111.0517.v1
Subject: Biology, Other Keywords: Rhodotorula babjevae; de-novo hybrid assembly; Nanopore sequencing; genome divergence
Online: 29 November 2021 (07:57:39 CET)
The genus Rhodotorula includes basidiomycetous oleaginous yeast species. R. babjevae can produce compounds of biotechnological interest such as lipids, carotenoids and biosurfactants from low value substrates such as lignocellulose hydrolysate. High-quality genome assemblies are needed to develop genetic tools and to understand fungal evolution and genetics. Here, we combined short- and long-read sequencing to resolve the genomes of two R. babjevae strains, CBS 7808 (type strain) and DBVPG 8058 at chromosomal level. Both genomes have a size of 21 Mbp and a GC content of 68.2%. Allele frequency analysis indicated tetraploidy in both strains. They harbor 21 putative chromosomes with sizes ranging from 0.4 to 2.4 Mb. In both assemblies, the mitochondrial genome was recovered in a single contig, which shared 97% pairwise identity. The pairwise identity between the majority of chromosomes ranges from 82% to 87%. We found indications for strain-specific extrachromosomal endogenous DNA. 7,591 protein-coding genes and 7,607 associated transcripts were annotated in CBS 7808 and 7,481 protein-coding genes and 7,516 associated transcripts in DBVPG 8058. CBS 7808 has accumulated a higher number of tandem duplications than DBVPG 8058. We identified large translocation events between putative chromosomes and a high genetic divergence between the two strains.
ARTICLE | doi:10.20944/preprints202110.0093.v1
Subject: Life Sciences, Genetics Keywords: genome, DNA, alphabet, matrices, tensor product, quantum informatics, stochastic resonance.
Online: 5 October 2021 (16:25:34 CEST)
The article is devoted to new results of the author, which add to his previously published ones, on hidden rules and symmetries in the structures of long single-stranded DNA sequences in eukaryotic and prokaryotic genomes. The author uses the existence of different alphabets of n-plets in DNA: the alphabet of 4 nucleotides, the alphabet of 16 doublets, the alphabet of 64 triplets, etc. Each of these DNA alphabets of n-plets can serve for constructing a text as a chain of these n-plets. Using this possibility, the author represents any long DNA nucleotide sequence as a bunch of many so-called n-texts, each of which is written on the basis of one of these alphabets of n-plets. Each of these n-texts has its individual percentages of different n-plets in its genomic DNA. But it turns out that in such a multi-alphabetical or multilayer presentation of each of the many genomic DNAs analyzed by the author, universal rules of probabilities and symmetry exist in the interrelations of its different n-texts regarding their percentages of n-plets. In this study, the tensor product of matrices and vectors is used as an effective analytical tool borrowed from the arsenal of quantum mechanics. Some additions to the topic of algebra-holographic principles in genetics are also presented. Taking into account the described genomic rules of probability, the author also puts forward a concept of the important role of stochastic resonances in genetic informatics.
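The notion of an n-text can be made concrete with a small sketch that splits a sequence into consecutive, non-overlapping n-plets and reports the percentage of each. The choice to start reading at the first base and to drop any incomplete trailing n-plet is an assumption made here for simplicity, and the function name and toy sequence are illustrative rather than taken from the article.

```python
from collections import Counter


def nplet_percentages(seq: str, n: int) -> dict:
    """Percentage of each n-plet when seq is read as a chain of n-plets."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    seq = seq.upper()
    usable = len(seq) - len(seq) % n  # drop an incomplete trailing n-plet
    nplets = [seq[i:i + n] for i in range(0, usable, n)]
    counts = Counter(nplets)
    total = len(nplets)
    return {nplet: 100.0 * count / total for nplet, count in counts.items()}


toy = "ACGTACGA"  # toy sequence, not genomic data
print(nplet_percentages(toy, 1))  # 1-text: single nucleotides
print(nplet_percentages(toy, 2))  # 2-text: doublets AC, GT, AC, GA
```

Running the same function with n = 1, 2, 3 and so on yields the layered percentages whose interrelations the article examines.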
ARTICLE | doi:10.20944/preprints202102.0604.v1
Subject: Life Sciences, Biochemistry Keywords: West Nile Virus; outbreak; meningoencephalitis; epidemiology; phylogeny; whole genome sequencing
Online: 26 February 2021 (09:46:38 CET)
During the last decades, West Nile Virus (WNV) outbreaks have continuously occurred in the Mediterranean area. In August 2020, a new WNV outbreak caused meningoencephalitis in 71 people in Andalusia and 6 more in Extremadura (south-west Spain), with a total of eight deaths. The whole genomes of four viral isolates were obtained and phylogenetically analyzed in the context of recent outbreaks. The Andalusian viral samples belonged to lineage 1 and were relatively similar to those from previous outbreaks in the Mediterranean region. Here we present a detailed analysis of the outbreak, including an extensive phylogenetic study.
REVIEW | doi:10.20944/preprints202101.0110.v2
Subject: Biology, Anatomy & Morphology Keywords: Amphiploidy; Disomic Polyploidy; Plant Genome Evolution; Neo-polyploidy; Polysomic Polyploidy
Online: 23 February 2021 (14:25:28 CET)
Polyploidy means having more than two basic sets of chromosomes. Polyploid plants may be artificially obtained through chemical, physical and biological (2n gametes) methods. This approach allows an increased gene scope and expression, thus resulting in phenotypic changes such as yield and product quality. Nonetheless, breeding new cultivars through induced polyploidy must overcome deleterious effects that are partly caused by genome and epigenome instability after polyploidization. Furthermore, shortening the time required from early chromosome set doubling to the final selection of high-yielding superior polyploids is a must. Despite these hurdles, plant breeders have successfully obtained polyploid bred-germplasm in a broad range of forages after optimizing methods, concentration and time, particularly when using colchicine. These experimental polyploids are a valuable tool for understanding gene expression, which seems to be driven by dosage-dependent gene expression, altered gene regulation and epigenetic changes. Isozymes and DNA-based markers facilitated the identification of rare alleles for particular loci when compared with diploids, and also explained their heterozygosity, phenotypic plasticity and adaptability to diverse environments. Experimentally induced polyploid germplasm could enhance fresh herbage yield and quality, e.g. leaf protein content, leaf total soluble solids, water soluble carbohydrates and sucrose content. Offspring of experimentally obtained hybrids should undergo selection for several generations to improve their performance and stability.
ARTICLE | doi:10.20944/preprints202012.0421.v1
Online: 17 December 2020 (09:13:29 CET)
Whole genome pooled sequence data of 12 Pakistani Teddy goats are analyzed for positive selection signatures as breed-defining characteristics. Selection imprints left in the Teddy genome are unveiled by genomic differentiation after the successful paired-end alignment of 635,357,043 reads to the ARS1 reference genome assembly. Pooled heterozygosity and Tajima’s D (TD) are applied for validation and to obtain better hits of selection signals, while pairwise FST statistics are computed for Teddy vs. Bezoar (wild goat ancestor) for genomic differentiation. Annotation of regions under positive selection reveals 59 genes underlying production and adaptive traits. A pooled-heterozygosity score ≥ 5 detected six windows with the highest scores on Chr. 29, 9, 25, 15 and 14, which harbor the HRASLS5, LACE1 and AXIN1 genes, candidates for embryonic development, lactation and body height. Secondly, a TD value of ≤ -2.2 showed 4 windows with very strong hits on Chr. 5 and 9, which harbor the STIM1 and ADM genes related to body mass and weight. Lastly, FST analysis generated three strong signals with threshold ≤ 0.42 on Chr. 12 and 5, which harbor the ITGB1 gene associated with milk production and lactation traits. Other significant selection signatures encompass genes associated with wool production, prolificacy, immunity and coat colors. In brief, this study identified the genes under selection in this Pakistani goat breed, which will help to refine future breeding policies, to converge required productive traits within and across other goat breeds, and to explore the full genetic potential of this valued livestock species.
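The abstract does not give the formula behind its pooled-heterozygosity scan. A minimal sketch, assuming the commonly used window-based definition Hp = 2·Σn_major·Σn_minor / (Σn_major + Σn_minor)², followed by a Z-transformation across windows, is shown below. The read counts are hypothetical, and the function names and window layout are illustrative assumptions rather than the authors' pipeline.

```python
from statistics import mean, pstdev


def pooled_heterozygosity(allele_counts):
    """Hp for one window; allele_counts is a list of (major_reads, minor_reads) per SNP."""
    n_major = sum(major for major, _ in allele_counts)
    n_minor = sum(minor for _, minor in allele_counts)
    total = n_major + n_minor
    if total == 0:
        raise ValueError("window contains no reads")
    return 2 * n_major * n_minor / total ** 2


def z_transform(values):
    """Standardize per-window values to zero mean and unit (population) variance."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("all windows have the same value")
    return [(value - mu) / sigma for value in values]


# Hypothetical windows: each inner list holds (major, minor) read counts per SNP.
windows = [
    [(10, 10), (12, 8)],   # diverse window
    [(20, 0), (19, 1)],    # nearly fixed window, candidate sweep
    [(11, 9), (10, 10)],   # diverse window
]
hp_values = [pooled_heterozygosity(window) for window in windows]
z_values = z_transform(hp_values)
```

Under this definition, windows with unusually low heterozygosity receive strongly negative Z values, so scans of this kind often report the negated score; whether the threshold quoted in the abstract refers to such a score is not stated there.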
CONCEPT PAPER | doi:10.20944/preprints202010.0368.v1
Subject: Life Sciences, Biochemistry Keywords: proteoform; human genome project; proteomics; post-translational modification; human proteome
Online: 19 October 2020 (10:49:39 CEST)
Proteins are the primary effectors of function in biology, and thus complete knowledge of their structure and properties is fundamental to deciphering function in basic and translational research. The chemical diversity of proteins is expressed in their many proteoforms, which result from combinations of genetic polymorphisms, RNA splice variants and post-translational modifications. This knowledge is foundational for the biological complexes and networks that control biology, yet remains largely unknown. We propose here an ambitious initiative to define the human proteome; that is to generate a definitive reference set of the proteoforms produced from the genome. Several examples of the power and importance of proteoform-level knowledge in disease-based research are presented, along with a call for improved technologies in a two-pronged strategy to accomplish the Human Proteoform Project.
REVIEW | doi:10.20944/preprints202006.0086.v2
Subject: Life Sciences, Virology Keywords: SARS-CoV-2; Genome organisation and expression; Polyproteins; Prevention strategies
Online: 14 June 2020 (16:49:10 CEST)
COVID-19 manifests as severe acute respiratory conditions caused by a novel betacoronavirus (SARS-CoV-2), which is reported to be the seventh coronavirus to infect humans. Like other SARS-CoVs, it has a large positive-stranded RNA genome. However, a specific furin site in the spike protein and a mutation-prone, phylogenetically messy Orf1ab distinguish SARS-CoV-2 from other RNA viruses. Since the outbreak (February - March 2020), which originated in China, researchers, scientists, and medical professionals have been examining the virus from every possible angle, including its replication, detection, and prevention strategies. This has led to the prompt characterization of its basic biology, its genome, and the structure-based functional information on its proteins, and to strategies to prevent its spread. Due to the rapid mutation rate, the functional characterization of a few proteins is still lagging. This review summarizes recent updates on the basic molecular biology of SARS-CoV-2 and the prevention strategies undertaken worldwide to tackle COVID-19. This information can be applied to the design and development of therapeutics against SARS-CoV-2.
ARTICLE | doi:10.20944/preprints202006.0089.v1
Subject: Life Sciences, Genetics Keywords: Wuhua yellow chicken; whole genome resequencing; heritable variation; selection signal
Online: 7 June 2020 (14:42:23 CEST)
Chickens have extensive phenotypic variation. The Wuhua yellow chicken (WHYC) is an important traditional yellow-feathered chicken in China, characterized by white tail feathers, white flight feathers, and strong disease resistance. However, the genomic basis of traits associated with WHYC is still poorly understood. In this study, whole genome resequencing was performed with an average coverage of 20.77-fold to investigate heritable variation and identify selection signals in WHYC. Reads were mapped onto the chicken reference genome (Galgal5) with a coverage of 85.95%. After quality control, 11,953,471 SNPs and 1,069,574 InDels were obtained. In addition, 41,408 structural variants and 33,278 copy number variants were found. A comparative genomic analysis of WHYC and other yellow-feathered chicken showed that selected regions were enriched in genes involved in transport and catabolism, immune system, infectious diseases, signal transduction, and signaling molecules and interaction. Several genes associated with disease resistance were identified, including IFNA, IFNB, CD86, IL18, IL11RA, VEGFC, and ATG10. Furthermore, PMEL and TYRP1 may contribute to the coloring of white feathers in WHYC. These findings improve our understanding of the genetic characteristics of WHYC and may contribute to future breed improvement.
Subject: Medicine & Pharmacology, General Medical Research Keywords: COVID-19; SARS-CoV-2; virus; mutation; polymorphism; genome sequence
Online: 21 May 2020 (04:09:53 CEST)
Background: SARS-CoV-2 infection has spread to over 200 countries since it was first reported in December of 2019. Significant country-specific variations in infection and mortality rate have been noted. Although country-specific differences in public health response have had a large impact on infection rate control, it is currently unclear whether evolution of the virus itself has also contributed to variations in infection and mortality rate. Previous studies on SARS-CoV-2 mutations were based on the analysis of ~160 SARS-CoV-2 sequences available until mid-February 2020 [2-5]. By mid-April, > 550 SARS-CoV-2 sequences had been deposited in GenBank, and over 8,200 in the GISAID database. Methods: We performed a sequence analysis on 474 SARS-CoV-2 genomes submitted to GenBank up to April 11, 2020 by multiple alignment using Map to a Reference Assembly and Variants/SNP identification. The results were verified on a larger scale with 8,126 hCoV-19 (SARS-CoV-2) sequences from the GISAID database. Results: We identified 5 frequent new mutations, present in up to 40% of isolates, that have emerged since late February 2020. These mutations are: one missense (non-synonymous) mutation each in orf1ab (C1059T), orf3 (G25563T) and orf8 (C27964T), one in the 5’UTR (C241T), and one in a non-coding region (G29553A). The final mutation (G29553A) was found to be almost exclusive to the US isolates. The first 3 mutations are non-synonymous, leading to amino acid substitutions in the viral protein sequence. Except for C241T, all the novel mutations identified are absent in the isolates from Italy and Spain among the SARS-CoV-2 genomes deposited in GenBank and GISAID by April 13, 2020. Conclusion: The results of the current study indicate that new mutations are emerging as the COVID-19 pandemic spreads to different countries and that geography-specific mutants may exist.
The findings of the current study lay the foundation for further investigation into the impact of SARS-CoV-2 mutations on disease incidence, severity, and host immune response. In addition, they may also provide insights into vaccine development and serological response detection for the virus.
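Mutation names such as C241T encode the reference base, the 1-based reference position, and the observed base. A minimal sketch of how such calls and their frequencies across isolates could be derived from sequences already aligned to a reference is shown below. The toy sequences, isolate names, and function names are hypothetical, and the sketch does not reproduce the alignment or variant-identification software used in the study.

```python
from collections import Counter


def call_substitutions(reference, aligned):
    """Name substitutions like 'C241T' from a pairwise alignment ('-' marks a gap)."""
    if len(reference) != len(aligned):
        raise ValueError("aligned sequences must have equal length")
    calls = []
    ref_pos = 0  # 1-based position in the ungapped reference
    for ref_base, alt_base in zip(reference, aligned):
        if ref_base == "-":  # insertion relative to the reference: no reference position
            continue
        ref_pos += 1
        if alt_base in "-N" or alt_base == ref_base:  # deletion, ambiguous, or unchanged
            continue
        calls.append(f"{ref_base}{ref_pos}{alt_base}")
    return calls


def mutation_frequencies(reference, isolates):
    """Fraction of isolates carrying each substitution."""
    counts = Counter(
        call for sequence in isolates.values()
        for call in call_substitutions(reference, sequence)
    )
    return {call: n / len(isolates) for call, n in counts.items()}


# Hypothetical toy alignment, not real SARS-CoV-2 data.
reference = "ACGTAC"
isolates = {
    "iso1": "ACGTAC",
    "iso2": "ATGTAC",
    "iso3": "ATGTAA",
    "iso4": "ACG-AC",
}
frequencies = mutation_frequencies(reference, isolates)
```

Grouping the isolates by country before calling `mutation_frequencies` would give the kind of geography-specific comparison the abstract describes.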
REVIEW | doi:10.20944/preprints202004.0359.v1
Subject: Life Sciences, Microbiology Keywords: SARS-CoV-2; genetic diversity; genome evolution; diagnostics; therapeutics; vaccines
Online: 20 April 2020 (02:33:15 CEST)
A novel coronavirus disease, COVID-19, first emerged in Wuhan city, Hubei Province, China, in December 2019. COVID-19 has since spread to 213 countries and territories and has become a pandemic. Genomic analyses have indicated that the virus, popularly called corona, originated through a natural process and is probably not a purposefully manipulated laboratory construct. However, currently available data are not sufficient to determine the origin of the virus precisely. Genome-wide annotation of thousands of genomes revealed that more than 1,407 nucleotide mutations and 722 amino acid replacements have occurred at different positions in SARS-CoV-2. The spike (S) glycoprotein of SARS-CoV-2 possesses a functional polybasic (furin) cleavage site at the S1-S2 boundary through the insertion of 12 nucleotides. This insertion also leads to the predicted acquisition of three O-linked glycans around the cleavage site. Although real-time RT-PCR methods targeting specific gene(s) have been widely used to diagnose COVID-19 patients, recently developed, more convenient, rapid, and specific diagnostic tools targeting IgM/IgG, or newly developed plug-and-play methods, should be made available to resource-poor developing countries. Some drugs, vaccines and therapies have shown great promise in early trials; however, these candidate preventive or therapeutic agents must pass through a long series of trials before being released for practical application against COVID-19. This review updates current knowledge on the origin, genomic evolution, diagnostic tool development, and preventive or therapeutic remedies for COVID-19, and discusses the scope for further research and for effective management and surveillance of COVID-19.
REVIEW | doi:10.20944/preprints201911.0076.v3
Subject: Life Sciences, Genetics Keywords: phase separation; nuclear bodies; self-assembly; genome organization; gene expression
Online: 11 December 2019 (11:17:34 CET)
The importance of genome organization at the supranucleosomal scale in the control of gene expression is increasingly recognized today. In mammals, Topologically Associating Domains (TADs) and the active / inactive chromosomal compartments are two of the main nuclear structures that contribute to this organization level. However, recent works reviewed here indicate that, at specific loci, chromatin interactions with nuclear bodies could also be crucial to regulate genome functions, in particular transcription. They moreover suggest that these nuclear bodies are membrane-less organelles dynamically self-assembled and disassembled through mechanisms of phase separation. We have recently developed a novel genome-wide experimental method, High-salt Recovered Sequences sequencing (HRS-seq), which allows the identification of chromatin regions associated with large ribonucleoprotein (RNP) complexes and nuclear bodies. We argue that the physical nature of such RNP complexes and nuclear bodies appears to be central in their ability to promote efficient interactions between distant genomic regions. The development of novel experimental approaches, including our HRS-seq method, is opening new avenues to understand how self-assembly of phase separated nuclear bodies possibly contributes to mammalian genome organization and gene expression.
ARTICLE | doi:10.20944/preprints201809.0378.v1
Subject: Biology, Other Keywords: enterobacteriaceae; antibiotics; beta-lactamases; beta-lactam resistome; whole genome sequencing
Online: 19 September 2018 (09:47:42 CEST)
Beta-lactam resistant bacteria, commonly resident in tertiary hospitals, have emerged as a worldwide health problem because of ready-to-eat vegetable intake. We aimed to characterize the genes conferring resistance to beta-lactam antibiotics in Enterobacteriaceae isolated from five commercial salad brands for human consumption in Mexico City. Twenty-five samples were collected and grown on blood agar plates; the bacteria were identified biochemically and antimicrobial susceptibility testing was performed; the gene families carried were identified by endpoint PCR, and the specific genes were confirmed by whole genome sequencing (WGS) using next-generation sequencing (NGS). Twelve positive cultures were identified, with the following microbiological distribution: 8.3% Enterobacter aerogenes (n=1), 8.3% Serratia fonticola (n=1), 16.7% Serratia marcescens (n=2), 16.7% Klebsiella pneumoniae (n=2), and 50% Enterobacter cloacae (n=6). Endpoint PCR showed 11 colonies positive for blaBIL (91.7%), 11 for blaSHV (91.7%), 11 for blaCTX (91.7%), 12 for blaDHA (100%), 4 for blaVIM (33.3%), 2 for blaOXA (16.7%), 2 for blaIMP (16.7%), 1 for blaKPC (8.3%) and 1 for blaTEM (8.3%); all samples were negative for blaROB, blaCMY, blaP, blaCFX and blaLAP. Sequencing analysis revealed specific genotypes for Enterobacter cloacae (blaSHV-12, blaCTX-M-15, blaDHA-1, blaKPC-2); Serratia marcescens (blaSHV-1, blaCTX-M-3, blaDHA-1, blaVIM-2); Klebsiella pneumoniae (blaSHV-12, blaCTX-M-15, blaDHA-1); Serratia fonticola (blaSHV-12, blaVIM-1, blaDHA-1) and Enterobacter aerogenes (blaSHV-1, blaCTX-M-1, blaDHA-1, blaVIM-2, blaOXA-9). Our results indicate that beta-lactam resistant bacteria have acquired integrons with different numbers of genes that provide pan-resistance to beta-lactam antibiotics, including penicillins, oxacillins, cephalosporins, monobactams, carbapenems and imipenems.
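The reported percentages are positive-colony counts divided by the 12 positive cultures. A minimal sketch that recomputes them from the counts given in the abstract is shown below; the variable names are illustrative, and rounding to one decimal place is an assumption based on how the figures are reported.

```python
# Counts of PCR-positive colonies per gene family, taken from the abstract.
positive_cultures = 12
gene_counts = {
    "blaBIL": 11, "blaSHV": 11, "blaCTX": 11, "blaDHA": 12, "blaVIM": 4,
    "blaOXA": 2, "blaIMP": 2, "blaKPC": 1, "blaTEM": 1,
}

# Percentage of positive cultures carrying each gene family, to one decimal place.
percentages = {
    gene: round(100 * count / positive_cultures, 1)
    for gene, count in gene_counts.items()
}

for gene, percent in percentages.items():
    print(f"{gene}: {gene_counts[gene]}/{positive_cultures} = {percent}%")
```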
ARTICLE | doi:10.20944/preprints201809.0337.v1
Subject: Life Sciences, Virology Keywords: Echovirus 7; Echovirus 19; Nigeria; Enterovirus Species B; Complete Genome
Online: 18 September 2018 (09:39:11 CEST)
We describe the genomes of two Echovirus isolates from Nigeria as reference enterovirus species B genomes for the region. These Echovirus 7 and 19 genomes are 7,411 nt and 7,426 nt long and were recovered from sewage-contaminated water (in 2010) and an acute flaccid paralysis case (in 2014), respectively.
ARTICLE | doi:10.20944/preprints201804.0106.v1
Subject: Biology, Plant Sciences Keywords: Clematis; chloroplast genome; rearrangement; inversion; IR expansion; synonymous substitution rate
Online: 9 April 2018 (10:34:28 CEST)
The genus Clematis is one of the largest within Ranunculaceae. Here we report the chloroplast genomes of two Clematis species endemic to Korea, C. brachyura and C. trichotoma. The chloroplast genome lengths of C. brachyura and C. trichotoma are 159,532 bp and 159,170 bp, respectively. The gene content of the complete chloroplast genomes of these two Clematis species is identical to that of most Ranunculaceae and other angiosperms. However, our results demonstrate that the genus Clematis has undergone inversion and rearrangement events involving the rps4 gene, the rps16 to trnH region, and the trnL to ndhC region, as well as IR expansion. Comparison of IR regions among Ranunculaceae species revealed that the IRs of Clematis species contain six protein-coding genes (infA, rps8, rpl14, rpl16, rps3, and rpl22) usually found in the large single copy (LSC) region of other species. Phylogenetic analysis demonstrated that the genus Clematis is closely related to the genus Ranunculus. Differences in repeat structure, substitution rates, and IR expansion between the genera Clematis and Ranunculus explained their relationship. Clematis species showed slightly higher tandem repeat content than Ranunculus species. The six protein-coding genes showed lower synonymous substitution rates in the IR of Clematis species than in the LSC of Ranunculus species. Overall, the chloroplast genomes and results presented here provide important information on the evolution of Ranunculaceae.