ARTICLE | doi:10.20944/preprints201910.0271.v1
Subject: Life Sciences, Microbiology Keywords: genome assembly; monoxenous trypanosomatids; insect trypanosomatids; trypanosomatidae; whole genome
Online: 24 October 2019 (05:20:52 CEST)
We present here the first draft genome sequence of the trypanosomatid Herpetomonas muscarum ingenoplastis. This parasite was isolated repeatedly from the black blowfly, Phormia regina. This is the first draft genome of a flagellate from the phylogenetically distinct clade of Trypanosomatidae.
Subject: Life Sciences, Biochemistry Keywords: human metapneumovirus; whole genome sequencing; genomic epidemiology
Online: 3 February 2021 (10:08:44 CET)
Human metapneumovirus (HMPV) is an important cause of upper and lower respiratory tract disease in individuals of all ages. It is estimated that most individuals will be infected by HMPV by the age of 5 years. Despite this burden of disease, there remain gaps in our knowledge of the virus's global genetic diversity due to a lack of HMPV sequencing, particularly at whole genome scale. The purpose of this study was to create a simple and robust approach for HMPV whole genome sequencing to be used for genomic epidemiological studies. To design our assay, all available HMPV full length genome sequences were downloaded from the NCBI GenBank database and used to design four primer sets to amplify long, overlapping amplicons spanning the viral genome and, importantly, specific to all known HMPV subtypes. These amplicons were then pooled and sequenced on an Illumina iSeq; however, the approach is suitable for other common NGS platforms. We demonstrate the utility of this method using a representative subset of clinical samples and examine these sequences using a phylogenetic approach. Here we present an amplicon-based method for the whole genome sequencing of HMPV from clinical extracts that can be used to better inform genomic studies of HMPV epidemiology and evolution.
ARTICLE | doi:10.20944/preprints201912.0354.v1
Subject: Biology, Other Keywords: Lactobacillus helveticus; probiotics; whole genome sequencing; PacBio; probiotic genes; bacteriocins; gene expression
Online: 26 December 2019 (10:56:44 CET)
Whole-genome DNA sequencing of Lactobacillus D75 and D76 strains (Vitaflor, Russia) was performed using the PacBio RS II platform, followed by de novo assembly with SMRT Portal 2.3.0. The average nucleotide identity (ANI) test showed that both strains belong to Lactobacillus helveticus, not L. acidophilus as previously assumed. The genomes of both strains contain 31 exopolysaccharide (EPS) production genes (nine of which form a single genetic cluster), 13 adhesion genes, 38 milk protein and 11 milk sugar utilization genes, 13 genes for and against specific antagonistic activity, eight antibiotic resistance genes, three CRISPR blocks, and eight Cas I-B system genes. The expression of some genes was confirmed. The presence of the identified genes suggests that L. helveticus D75 and D76 are able to form biofilms on the outer mucin layer, inhibit the growth of pathogens and pathobionts, utilize milk substrates with the formation of digestible milk sugars and bioactive peptides, resist bacteriophages, show some genome-determined resistance to antibiotics, and stimulate the host’s immune system. Pathogenicity genes have not been identified. The study results confirm the safety and high probiotic potential of the strains.
ARTICLE | doi:10.20944/preprints202012.0473.v1
Subject: Life Sciences, Biochemistry Keywords: transposable elements; mobile element insertion events; next generation sequencing (NGS); genome evolution
Online: 18 December 2020 (14:53:44 CET)
Transposable elements (TEs) are mobile genetic elements capable of rapidly altering the genome through their movements. The importance of TE activity has been documented in many biological processes, such as introducing genetic instability, altering patterns of gene expression, and accelerating genome evolution. Increasing appreciation of TEs has led to a growing number of bioinformatics tools for identifying insertion events. However, the application of existing TE-finding tools is limited by narrowly focused package design, numerous dependencies on other tools, or prior knowledge required as input files that may not be readily available to all users. Here, we report a simple pipeline, TEfinder, developed for the detection of new TE insertions with minimal software dependencies using four inputs that can be easily generated with popular variant calling pipelines. The external software requirements are BEDTools, SAMtools, and Picard. Necessary inputs include TEs present in the reference genome, binary paired-end alignment, reference genome index, and a list of TE names. We tested the TEfinder pipeline on several evolving populations of Fusarium oxysporum generated through a short-term adaptation study. Our results demonstrate that this easy-to-use tool can effectively detect new TE insertion events, making it accessible and practical for TE analysis.
ARTICLE | doi:10.20944/preprints202301.0480.v1
Subject: Life Sciences, Virology Keywords: avian influenza; highly pathogenic avian influenza; next generation sequencing; whole genome sequencing; nanopore technology; methods comparison; clinical validation
Online: 26 January 2023 (15:19:53 CET)
As exemplified by the global response to the SARS-CoV-2 pandemic, whole genome sequencing played an important role in monitoring the evolution of novel viral variants and provided guidance on potential antiviral treatments. The recent rapid and extensive introduction and spread of highly pathogenic avian influenza virus in Europe, North America and elsewhere raises the need for similarly rapid sequencing to aid in appropriate response and mitigation activities. To facilitate this objective, we investigated a next generation sequencing platform that uses a portable nanopore sequencing device to generate and present data in real time. This platform offers the potential to extend in-house sequencing capacities to laboratories that may otherwise lack resources to adopt sequencing technologies requiring large benchtop instruments. To assess the platform for routine use in a diagnostic laboratory, we compared different primer sets for the whole genome amplification of influenza A virus and five different library preparation approaches for sequencing on the nanopore platform using the MinION flow-cell. A limited amplification procedure and a rapid procedure were found to be best among the approaches taken.
ARTICLE | doi:10.20944/preprints201910.0154.v1
Subject: Life Sciences, Genetics Keywords: papillary thyroid cancer; germline mutations; whole genome sequencing; predisposition markers; pathway analysis
Online: 13 October 2019 (17:07:34 CEST)
Evidence of familial inheritance in non-medullary thyroid cancer (NMTC) has accumulated over the last few decades. However, known variants account for a very small percentage of the genetic burden. Here, we focused on the identification of common pathways and networks enriched in NMTC families to better understand its pathogenesis with the final aim of identifying one novel high/moderate-penetrance germline predisposition variant segregating with the disease in each studied family. We performed whole genome sequencing on 23 affected and 3 unaffected family members from five NMTC-prone families and prioritized the identified variants using our Familial Cancer Variant Prioritization Pipeline (FCVPPv2). In total, 31 coding variants and 39 variants located in upstream, downstream, 5′ or 3′ untranslated regions passed FCVPPv2 filtering. Altogether, 210 genes affected by variants that passed the first three steps of the FCVPPv2 were analyzed using Ingenuity Pathway Analysis software. These genes were enriched in tumorigenic signaling pathways mediated by receptor tyrosine kinases and G-protein coupled receptors, implicating a central role of PI3K/AKT and MAPK/ERK signaling in familial NMTC. Our approach can facilitate the identification and functional validation of causal variants in each family as well as the screening and genetic counseling of other individuals at risk of developing NMTC.
ARTICLE | doi:10.20944/preprints202107.0275.v1
Subject: Biology, Anatomy & Morphology Keywords: Genome sequencing; de novo Assembly; Scaffolding; Chromosome-scale; Nanopore sequencing; Long reads; Optical maps; Bionano Genomics; Hi-C; Omni-C; Pore-C; Plant genomes
Online: 12 July 2021 (22:55:37 CEST)
With the rise of long-read sequencers and long-range technologies, delivering high-quality plant genome assemblies is no longer reserved for large consortia. Indeed, both sequencing techniques and computer algorithms have reached a point where the reconstruction of assemblies at the chromosome-scale is now feasible at the laboratory scale. Current technologies, and especially long-range technologies, are numerous, and selecting the most promising one for the genome of interest is crucial to obtain optimal results. In this study, we resequenced the genome of the yellow sarson, Brassica rapa cv. Z1, using the Oxford Nanopore PromethION sequencer and assembled the sequenced data using current assemblers. To reconstruct complete chromosomes, we used and compared three long-range techniques (optical mapping, Omni-C and Pore-C sequencing libraries, commercialized by Bionano Genomics, Dovetail Genomics and Oxford Nanopore Technologies respectively), or a combination of the three, in order to evaluate the capability of each technology.
Subject: Biology, Agricultural Sciences & Agronomy Keywords: Pantoea agglomerans; plant growth-promotion; Solanum lycopersicum L.; indole-3-acetic acid; siderophores; arsenic resistance; complete genome; horizontal gene transfer
Online: 24 November 2019 (15:29:55 CET)
Distinctive strains of Pantoea are used as soil inoculants for their ability to promote plant growth. Pantoea agglomerans strain C1, previously isolated from the phyllosphere of lettuce, can produce indole-3-acetic acid (IAA), solubilize phosphate, and inhibit plant pathogens, such as Erwinia amylovora. In this paper, the complete genome sequence of strain C1 is reported. In addition, experimental evidence is provided on how the strain tolerates arsenate up to 100 mM, and on how secreted metabolites like IAA and siderophores act as biostimulants in tomato cuttings. The strain has a circular chromosome and two prophages for a total genome of 4,846,925 bp, with a GC content of 55.2%. Genes related to plant growth promotion and biocontrol activity, such as those associated with IAA and spermidine synthesis, solubilization of inorganic phosphate, acquisition of ferrous iron, and production of volatile organic compounds, siderophores and GABA, were found in the genome of strain C1. Genome analysis also provided better understanding of the mechanisms underlying strain resistance to multiple toxic heavy metals and transmission of these genes by horizontal gene transfer. Findings suggested that strain C1 exhibits high biotechnological potential as a plant growth-promoting bacterium in heavy metal polluted soils.
REVIEW | doi:10.20944/preprints202006.0324.v1
Subject: Life Sciences, Genetics Keywords: De-novo Genome Assembly; Short Read Genome Assembly; Long Read Genome Assembly; Hybrid Genome Assembly
Online: 28 June 2020 (08:56:09 CEST)
Despite advances in algorithms and computational platforms, de-novo genome assembly remains a challenging process. Due to the constant innovation in sequencing technologies (Sanger, SOLiD, Illumina, 454, PacBio and Oxford Nanopore), genome assembly has evolved to respond to the changes in input data type. This paper includes a broad and comparative review of the most recent short-read, long-read and hybrid assembly techniques. In this review, we provide (1) an algorithmic description of the important processes in the workflow that introduces fundamental concepts and improvements; (2) a review of existing software that explains possible options for genome assembly; and (3) a comparison of the accuracy and the performance of existing methods executed on the same computer using the same processing capabilities and using the same set of real and synthetic datasets. Such evaluation allows a fair and precise comparison of accuracy in all aspects. As a result, this paper identifies both the strengths and weaknesses of each method. This comparative review is unique in providing a detailed comparison of a broad spectrum of cutting-edge algorithms and methods.
ARTICLE | doi:10.20944/preprints202107.0400.v1
Subject: Life Sciences, Biochemistry Keywords: Pangenome; horizontal gene transfer (HGT); core genome; accessory genome
Online: 19 July 2021 (10:19:29 CEST)
Pantoea stewartii subsp. indologenes (Psi) is a causative agent of leafspot of foxtail millet and pearl millet; however, novel strains were recently identified that are pathogenic on onion. Our recent host range evaluation study identified two pathovars, P. stewartii subsp. indologenes pv. cepacicola pv. nov. and P. stewartii subsp. indologenes pv. setariae pv. nov., that are pathogenic on onion and millets or on millets only, respectively. In the current study we developed a pan-genome using the whole genome sequencing of newly identified/classified Psi strains from both pathovars [pv. cepacicola (n=4) and pv. setariae (n=13)]. The full spectrum of the pan-genome contained 7,030 genes. Among these, 3,546 (present in genomes of all 17 strains) were the core genes that were a subset of 3,682 soft-core genes (present in ≥16 strains). The accessory genome included 1,308 shell genes and 2,040 cloud genes (present in ≤2 strains). The pan-genome showed a clear linear progression with >6,000 genes, suggesting the pan-genome of Psi is open. Comparative phylogenetic analysis showed differences in phylogenetic clustering of Pantoea spp. using PAVs/wgMLST approach in comparison to core genome SNP-based phylogeny. Further, we conducted a horizontal gene transfer (HGT) study including four other Pantoea species, namely P. stewartii subsp. stewartii LMG 2715T, P. ananatis LMG 2665T, P. agglomerans LMG L15, and P. allii LMG 24248T. A total of 317 HGT events among four Pantoea species were identified, with most gene transfers observed between Psi pv. cepacicola and Psi pv. setariae. Pan-GWAS analysis predicted a total of 154 genes, including seven clusters of genes, associated with the pathogenicity phenotype on onion. One of the clusters contains 11 genes with known functions and is found to be chromosomally located.
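The pan-genome categories in the abstract above depend only on how many of the 17 strains carry each gene. The Python sketch below illustrates that binning; it is not the authors' pipeline. The function names are hypothetical, and the shell range (3 to 15 strains) is an assumption inferred from the stated soft-core and cloud thresholds. Unlike the abstract, which counts core genes as a subset of soft-core genes, the sketch reports "core" (all strains) and "soft-core" (exactly one strain missing) as separate bins. The reported totals are consistent with this partition: 3,682 + 1,308 + 2,040 = 7,030.

```python
from collections import Counter


def classify_gene(n_present: int, n_strains: int = 17) -> str:
    """Bin a gene by the number of strains in which it is present.

    Thresholds follow the abstract (core: all strains, soft-core: >= n-1,
    cloud: <= 2). The shell range in between is an assumption.
    """
    if not 1 <= n_present <= n_strains:
        raise ValueError("n_present must be between 1 and n_strains")
    if n_present == n_strains:
        return "core"
    if n_present >= n_strains - 1:
        return "soft-core"  # here: present in exactly n_strains - 1 strains
    if n_present <= 2:
        return "cloud"
    return "shell"


def summarize(presence_counts, n_strains: int = 17) -> Counter:
    """Count genes per category from an iterable of per-gene strain counts."""
    return Counter(classify_gene(n, n_strains) for n in presence_counts)
```

For example, `summarize([17, 17, 16, 5, 1])` yields two core genes and one gene each in the soft-core, shell and cloud bins.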
ARTICLE | doi:10.20944/preprints201808.0423.v1
Subject: Biology, Animal Sciences & Zoology Keywords: mitochondrial DNA; mitochondrial genome; genome assembly; genome annotation; next generation sequencing; animal genomics; partial genomics; bioinformatics
Online: 24 August 2018 (03:24:37 CEST)
Next-generation sequencing is now a mature technology, allowing partial animal genomes to be produced for many clades. Although many software tools exist for genome assembly and annotation, a simple pipeline that takes raw sequencing reads in fastq format as input and returns a completely assembled and annotated mitochondrial genome is still missing. mitoMaker 1.0 is a pipeline developed in Python that (i) performs recursive de novo assembly of mitochondrial genomes using a set of increasing k-mers; (ii) searches for the best matching result to a target mitogenome; and (iii) applies iterative reference-based strategies to optimize the assembly. After (iv) checking for circularization and (v) positioning tRNA-Phe at the beginning, (vi) the geneChecker.py module performs a complete annotation of the mitochondrial genome and provides a GenBank-formatted file as output.
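Steps (iv) and (v) above can be pictured with a small Python sketch. The abstract does not describe how mitoMaker implements them, so everything below is an assumption for illustration: `terminal_overlap` treats a contig as circular when its end repeats its start, and `rotate_to_motif` re-origins a circular sequence at a given subsequence, which stands in here for the annotated start of tRNA-Phe.

```python
def terminal_overlap(seq: str, min_len: int = 3) -> int:
    """Length of the longest prefix that is repeated as a suffix (0 if none).

    A non-zero value hints that the contig wraps around a circular molecule.
    """
    for k in range(len(seq) // 2, min_len - 1, -1):
        if seq[:k] == seq[-k:]:
            return k
    return 0


def rotate_circular(seq: str, start: int) -> str:
    """Rotate a circular sequence so that position `start` becomes index 0."""
    if not seq:
        return seq
    start %= len(seq)
    return seq[start:] + seq[:start]


def rotate_to_motif(seq: str, motif: str) -> str:
    """Rotate a circular sequence so that it begins with `motif`.

    The search also covers matches that span the end/start junction.
    """
    doubled = seq + seq[: len(motif) - 1]
    idx = doubled.find(motif)
    if idx == -1:
        raise ValueError("motif not found in circular sequence")
    return rotate_circular(seq, idx)
```

In practice the redundant terminal overlap would be trimmed before rotation, and the rotation point would come from an annotation coordinate rather than a literal string match.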
REVIEW | doi:10.20944/preprints202211.0318.v1
Subject: Life Sciences, Molecular Biology Keywords: Genome editing; C. elegans; Genome engineering; CRISPR; CRISPR-Cas
Online: 17 November 2022 (02:22:43 CET)
CRISPR-Cas allows us to introduce desired genome edits, including mutations, epitopes, and deletions, with unprecedented efficiency. The development of CRISPR-Cas has progressed to such an extent that it is now applicable in various fields with the help of model organisms. C. elegans is one of the pioneering animals in which numerous CRISPR-Cas strategies have been rapidly established over the past decade. Ironically, the emergence of numerous methods makes the right choice of method difficult. Choosing an appropriate selection or screening approach is the first step in planning a genome modification. This report summarizes the key features and applications of CRISPR-Cas methods using C. elegans and illustrates key strategies. Our overview of significant advances in CRISPR-Cas will help readers to understand current advances in genome editing and navigate various methods of CRISPR-Cas genome editing.
ARTICLE | doi:10.20944/preprints202204.0298.v1
Subject: Life Sciences, Microbiology Keywords: genome; accessory; core genome; Fusarium circinatum; structural variants; inversions; indels; pangenome
Online: 29 April 2022 (10:47:31 CEST)
Fusarium circinatum is an important global pathogen of pine trees. Genome plasticity has been observed in different isolates of the fungus, but no genome comparisons are available. To address this gap, we sequenced and assembled to chromosome level five isolates of F. circinatum. These genomes were analysed together with previously published genomes of F. circinatum isolates FSP34 and KS17. Multi-sample variant calling identified a total of 461,683 micro variants (SNPs and small indels) and a total of 1,828 macro structural variants, of which 1,717 were copy number variants and 111 were inversions. Variant density was higher on sub-telomeric regions of chromosomes. Variant annotation revealed that genes involved in transcription, transport, metabolism and transmembrane proteins were overrepresented in gene sets affected by high impact variants. A core genome representing genomic elements conserved in all the isolates and a non-redundant pangenome representing all genomic elements are presented. Whole genome alignments showed that an average of 93% of the genomic elements are present in all isolates. The results of this study reveal that some genomic elements are not conserved within the isolates and some variants are high impact. The described genome-scale variations will help inform novel disease management strategies against the pathogen.
ARTICLE | doi:10.20944/preprints202005.0417.v4
Online: 20 August 2020 (04:20:16 CEST)
Nyssa yunnanensis is a deciduous tree species in the family Nyssaceae within the order Cornales. As only eight individual trees and two populations have been recorded in China’s Yunnan province, this species has been listed among China’s national Class I protection species since 1999 and also among 120 PSESP (Plant Species with Extremely Small Populations) in the Implementation Plan of Rescuing and Conserving China’s Plant Species with Extremely Small Populations (PSESP) (2011-2-15). Here, we present the draft genome assembly of N. yunnanensis. Using 10X Genomics linked-reads sequencing data, we carried out the de novo assembly and annotation analysis. The N. yunnanensis genome assembly is 1475 Mb in length, containing 288,519 scaffolds with a scaffold N50 length of 985.59 kb. Within the assembled genome, 799.51 Mb was identified as repetitive elements, accounting for 54.24% of the sequenced genome, and a total of 39,803 protein-coding genes were predicted. With the genomic characteristics of N. yunnanensis available, our study might facilitate future conservation biology studies to help protect this extremely threatened tree species.
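Several abstracts in this listing report a scaffold or contig N50. As a reminder of what that statistic measures, here is a minimal Python sketch using the common definition: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly length. The function name is hypothetical, and assembly tools may differ in details such as minimum-length filters.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sequences are visited from longest to shortest; the N50 is the length
    at which the running sum first reaches half of the total length.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
```

With scaffold lengths `[2, 3, 4, 5, 6]` (total 20), the two longest scaffolds sum to 11, which passes the halfway mark of 10, so the N50 is 5.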
COMMUNICATION | doi:10.20944/preprints201808.0480.v1
Online: 29 August 2018 (04:50:36 CEST)
The recent report that DNA extracted from ancient bone must have come from the offspring of a female Neanderthal and a male Denisovan depends on the inference that the subject has a high level of heterozygosity for Neanderthal and Denisovan alleles across the genome. Here I point out that the relative frequencies of derived transversion polymorphisms vary markedly between the new specimen, Denisova 11, and two high-coverage Neanderthal genomes. In Denisova 11 the AC and CG polymorphisms are much commoner than the others and are almost twice as common as the AT polymorphism. In the high-coverage Neanderthal genomes the four types of transversion are about equally common, with the AT being slightly commoner than the others. These results suggest that allele-calling errors are frequent and that this may provide an alternative explanation for the observed heterozygosity.
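The comparison described above rests on tallying biallelic sites by transversion type. The Python sketch below shows one way such a tally could be made; it is an illustration under stated assumptions, not the author's method. Sites are assumed to be given as pairs of alleles, the four unordered transversion pairs (AC, AT, CG, GT) are kept distinct as in the abstract, and no ancestral/derived polarization or strand folding is applied. The function names are hypothetical.

```python
from collections import Counter

# The four unordered purine/pyrimidine allele pairs; AG and CT are transitions.
TRANSVERSIONS = ("AC", "AT", "CG", "GT")


def transversion_type(allele_a: str, allele_b: str):
    """Return the transversion label for two alleles, or None otherwise."""
    pair = "".join(sorted((allele_a.upper(), allele_b.upper())))
    return pair if pair in TRANSVERSIONS else None


def transversion_frequencies(sites):
    """Relative frequency of each transversion type among (a, b) allele pairs.

    Transitions and non-variable sites are ignored. Returns an empty dict
    when no transversions are present.
    """
    counts = Counter()
    for allele_a, allele_b in sites:
        label = transversion_type(allele_a, allele_b)
        if label is not None:
            counts[label] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: counts[label] / total for label in TRANSVERSIONS}
```

Roughly equal frequencies across the four labels would match the pattern the abstract reports for the high-coverage Neanderthal genomes, whereas an excess of AC and CG would match the pattern reported for Denisova 11.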
ARTICLE | doi:10.20944/preprints202206.0376.v1
Subject: Biology, Other Keywords: effector proteins; genome-wide analysis; Ganoderma boninense; basal stem rot; genome architecture
Online: 28 June 2022 (04:59:14 CEST)
Ganoderma boninense is the major causal agent of basal stem rot (BSR) disease in oil palm, causing the progressive rot of the basal part of the stem. Despite its prominence, key pathogenicity determinants for the aggressive nature of hemibiotrophic infection remain unknown. In this study, genome sequencing and annotation of G. boninense T10 were carried out using the Illumina sequencing platform and comparative genome analysis was performed with previously reported G. boninense strains (NJ3 and G3). The pan-secretome of G. boninense was constructed and comprised 937 core orthogroups, 243 accessory orthogroups, and 84 strain-specific orthogroups. A set of core candidate effector proteins (CEPs) was found to be enriched with catalytic proteins classified as carbohydrate-active enzymes and hydrolases, as well as non-catalytic proteins. Differential expression analysis revealed an upregulation of CEP genes which was linked to the suppression of the PTI signaling cascade, while the downregulation of CEP genes was linked to the inhibition of PTI by preventing host defense elicitation. Genome architecture analysis revealed the one-speed architecture of the G. boninense genome and the lack of preferential association of CEP genes with the transposable elements. The findings obtained from this study would aid in the characterization of pathogenicity determinants and molecular biomarkers of BSR disease.
DATA DESCRIPTOR | doi:10.20944/preprints202208.0349.v1
Online: 18 August 2022 (11:12:25 CEST)
The Peruvian creole cattle (PCC) is a neglected breed and an essential livestock resource in the Andean region of Peru. To develop a modern breeding program and conservation strategies for the PCC, a better understanding of the genetics of this breed is needed. We sequenced the whole genome of the PCC using a paired-end 150 strategy on the Illumina HiSeq 2500 platform, obtaining 320 GB of sequencing data. The obtained genome size of the PCC was 2.77 Gb with a contig N50 of 108 Mb and 92.59% complete BUSCOs. Also, we identified 40.22% of the genome assembly as repetitive DNA, of which retroelements occupy 32.39% of the total genome. A total of 19,803 protein-coding genes were annotated in the PCC genome. We downloaded proteomes and genomes of the Bovinae subfamily, and conducted a comparative analysis with our draft genome. Phylogenomic analysis showed that PCC is related to Bos indicus. Also, we identified 7,746 gene families shared among the Bovinae subfamily. This first PCC genome is expected to contribute to a better understanding of the genetic basis of its adaptation to the tough conditions of the Andean ecosystem, and of its evolution.
ARTICLE | doi:10.20944/preprints201807.0156.v1
Online: 9 July 2018 (16:08:26 CEST)
Escherichia coli phage Eco_BIFF was isolated from several laboratory stocks of E. coli K-12 MG1655 derivatives. The source of the contamination is unknown. Eco_BIFF is a lytic phage that shows effective growth inhibition of E. coli K-12. Here, we announce the complete genome sequence of Eco_BIFF, and major findings from its genome annotation.
REVIEW | doi:10.20944/preprints202111.0170.v3
Online: 5 May 2022 (10:38:09 CEST)
Non-vertebrate species represent about 95% of known metazoan (animal) diversity. They remain to this day relatively unexplored genetically, but understanding their genome structure and function is pivotal for expanding our current knowledge of evolution, ecology and biodiversity. Following the continuous improvements and decreasing costs of sequencing technologies, many genome assembly tools have been released, leading to a significant number of genome projects being completed in recent years. In this review, we examine the current state of genome projects of non-vertebrate animal species. We present an overview of available sequencing technologies, assembly approaches, as well as pre- and post-processing steps, genome assembly evaluation methods, and their application to non-vertebrate animal genomes.
REVIEW | doi:10.20944/preprints202111.0350.v1
Online: 19 November 2021 (12:33:53 CET)
The newly established virus family Phenuiviridae in Bunyavirales harbors viruses infecting three kingdoms of host organisms (animals, plants, and fungi), which is rare in known virus families. Many phenuiviruses are arboviruses and replicate in two distinct hosts (e.g., insects and humans or rice). Multiple phenuiviruses, such as Dabie bandavirus, Rift Valley fever phlebovirus, and Rice stripe tenuivirus, are highly pathogenic to humans, animals, or plants. They impose heavy global burdens on human health, the livestock industry, and agriculture and are research hotspots. In recent years the taxonomy of Phenuiviridae has been expanded greatly, and research on phenuiviruses has made significant progress. With these advances, this review draws a novel panorama regarding the biomedical significance, distribution, morphology, genomics, taxonomy, evolution, replication, transmission, pathogenesis, and control of phenuiviruses, to aid researchers in various fields to recognize this highly adaptive and very important virus family.
ARTICLE | doi:10.20944/preprints202110.0027.v1
Subject: Life Sciences, Other Keywords: eukaryogenesis; genome complexification; atmospheric oxidation; macroevolution
Online: 1 October 2021 (15:26:03 CEST)
The origin of the nucleus remains a great mystery in life science, although nearly two centuries have passed since the discovery of nuclei. To date, studies of eukaryogenesis have focused largely on micro-evolutionary explanations. Here, we examined macro-patterns of C-values (the total amount of DNA within the haploid chromosome set of an organism) for over 110,000 species and the chromosome numbers for over 11,000 species and their potential links with the state of atmospheric oxidation over geological time. Eukaryogenesis was in sync with an over 2.5 order-of-magnitude increase in genome size from prokaryote to eukaryote, and also with a rapid rise of atmospheric oxidation, suggesting that eukaryogenesis would have resulted from a regime shift of genomes driven by the oxidation-driven complexification and structuralization (e.g. chromatin packing).
ARTICLE | doi:10.20944/preprints202009.0207.v1
Online: 9 September 2020 (10:48:24 CEST)
Long-read single-molecule sequencing has revolutionized de novo genome assembly and enabled the automated reconstruction of reference-quality genomes. It also has been widely used to study structural variants, phase haplotypes and more. Here, we introduce SMARTdenovo, a single-molecule sequencing (SMS) assembler that follows the overlap-layout-consensus (OLC) paradigm. SMARTdenovo (RRID: SCR_017622) was designed to be a fast assembler that did not require highly accurate raw reads for error correction, unlike other, contemporaneous SMS assemblers. It has performed well for evaluating congeneric assemblers and has been successful for a variety of assembly projects. It is compatible with Canu for assembling high-quality genomes, and several of the assembly strategies in this program have been incorporated into subsequent popular assemblers. The assembler has been in use since 2015, and here we provide information on the development of SMARTdenovo and how to implement its algorithms into current projects.
ARTICLE | doi:10.20944/preprints202008.0275.v1
Subject: Biology, Plant Sciences Keywords: transposable elements; genome annotation; software evaluation
Online: 12 August 2020 (08:07:14 CEST)
Background: Transposable elements (TEs) constitute the vast majority of all eukaryotic DNA, and display extreme diversity, with thousands of families. Given their abundance and diversity, TE discovery and annotation are challenging. At present, tools and databases have built libraries to mask TEs in genomes based on de novo- and homology-based identification strategies, but no consensus criteria about which tools should be used have been proposed. Results: In the de novo-based strategy, we compared the performance of TE libraries developed by four commonly used tools, RepeatModeler, LTR_FINDER, LTRharvest, and MITE_Hunter, using a simulated genome as a standard control. The results showed that the performance of RepeatModeler decreased when it was combined with either LTR_FINDER or LTRharvest. The combination of RepeatModeler and MITE_Hunter performed better than either RepeatModeler or MITE_Hunter alone. In the homology-based strategy, we evaluated different sources from a taxonomic point of view to build an accurate TE library. When we selected a library from databases to identify TEs in the Arabidopsis thaliana genome, the library from a genus genetically closer to Arabidopsis achieved better performance than those from genera at greater genetic distance. When the Arabidopsis library itself was excluded, the combination of the top three genera closest to Arabidopsis performed better than the combination of all genera. Conclusion: This study proposes a series of recommendations for accurate TE annotation: 1) for the de novo-based strategy, RepeatModeler and MITE_Hunter are suggested to build a TE library; 2) for the homology-based strategy, it is recommended to use the library of a genus genetically close to the species rather than a combined library from all genera.
BRIEF REPORT | doi:10.20944/preprints201911.0214.v1
Subject: Biology, Animal Sciences & Zoology Keywords: shark; genome; longevity; gigantism; positive selection
Online: 18 November 2019 (07:46:50 CET)
A previous study involving whole genome sequencing of the white shark suggested unique molecular evolution accounting for gigantism and the enhanced longevity of sharks including positive selection of dozens of protein-coding genes potentially involved in genome stability. We performed a reanalysis on some of the genes and identified serious flaws in their results. In this short article, we scrutinize one of the serious problems we identified, report other concerns, and point out a potential bias in analyzing iconic shark species in general.
ARTICLE | doi:10.20944/preprints201802.0098.v1
Subject: Biology, Plant Sciences Keywords: Boechera; Brassicaceae; genome; assembly; annotation; apomixis
Online: 14 February 2018 (07:29:29 CET)
Closely related to the model plant Arabidopsis thaliana, the genus Boechera is known to contain both sexual and apomictic species or accessions. Boechera retrofracta is a diploid sexually reproducing species and is thought to be an ancestral parent species of the apomictic species Boechera divaricarpa. Here we report the de novo assembly of the B. retrofracta genome using short Illumina and Roche reads from 1 paired-end and 3 mate pair libraries. The distribution of 23-mers from the paired-end library indicated a low level of heterozygosity and the presence of detectable duplications and triplications. The genome size was estimated to be 227 Mb. The N50 of the assembled scaffolds was 2.3 Mb. A total of 27,048 protein-coding genes were predicted using a hybrid approach that combines homology-based and de novo methods. Repeats, tRNA and rRNA genes were also annotated. Finally, genes of B. retrofracta and 6 other Brassicaceae species were used for phylogenetic tree reconstruction, and a detailed analysis of the evolution of the APOLLO apomixis-associated locus was performed. An assembled genome of B. retrofracta will help in the challenging assembly of the highly heterozygous genomes of hybrid apomictic species such as B. divaricarpa.
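The 23-mer distribution mentioned above is a k-mer spectrum: a histogram of how many distinct k-mers occur once, twice, and so on in the reads. Heterozygosity and duplications show up as extra peaks at about half or at multiples of the main coverage peak. The Python sketch below shows the counting step only, under simplifying assumptions: reads are plain strings, k-mers containing N are skipped, and forward and reverse-complement k-mers are not merged (dedicated k-mer counters usually merge them). The function names are hypothetical, and the abstract does not say which tool the authors used.

```python
from collections import Counter


def count_kmers(reads, k: int = 23) -> Counter:
    """Count every k-mer occurring in an iterable of read sequences."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if "N" not in kmer:  # skip k-mers with ambiguous bases
                counts[kmer] += 1
    return counts


def kmer_spectrum(counts: Counter) -> Counter:
    """Map each multiplicity to the number of distinct k-mers seen that often."""
    return Counter(counts.values())
```

A rough genome size estimate then follows from dividing the total number of counted k-mers by the multiplicity at the main peak of the spectrum.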
ARTICLE | doi:10.20944/preprints201811.0183.v2
Subject: Life Sciences, Molecular Biology Keywords: sequencing technologies; NGS; genome research; genome assembly; variant calling; RNA-Seq; transcriptome assembly; bioinformatics; molecular biology; education
Online: 13 November 2018 (10:22:06 CET)
Combined awareness about the power and limitations of bioinformatics and molecular biology enables advanced research based on high-throughput data. Despite an increasing demand for scientists with a combined background in both fields, the education in dry lab and wet lab is often separated. This work describes an example of integrated education with focus on genomics and transcriptomics. Participants learn computational and molecular biology methods in the same practical course. Peer-review is applied as a teaching method to foster cooperative learning of students with heterogeneous backgrounds. Evaluation results indicate acceptance and appreciation of this approach.
ARTICLE | doi:10.20944/preprints202212.0185.v1
Online: 12 December 2022 (01:46:15 CET)
Poplar and willow species in the Salicaceae are dioecious, yet have been shown to use different sex determination systems located on different chromosomes. Willows in the section Vetrix are interesting for comparative studies of sex determination systems, yet genomic resources for these species are still quite limited. Only a few annotated reference genome assemblies are available, despite many species in use in breeding programs. Here we present de novo assemblies and annotations of 11 shrub willow genomes from six species. Copy number variation of candidate sex determination genes within each genome was characterized and revealed remarkable differences in putative master regulator gene duplication and deletion. We also analyzed copy number and expression of candidate genes involved in floral secondary metabolism, and identified substantial variation across genotypes, which can be used for parental selection in breeding programs. Lastly, we report on a genotype that produces only female descendants and identified gene presence/absence variation in the mitochondrial genome that may be responsible for this unusual inheritance.
REVIEW | doi:10.20944/preprints202111.0084.v1
Subject: Medicine & Pharmacology, Clinical Neurology Keywords: Parkinson’s disease; gene therapy; mitochondria; genome editing
Online: 3 November 2021 (14:17:16 CET)
Background. Mitochondrial dysfunction has been identified as a pathophysiological hallmark of disease onset and progression in patients with Parkinsonian disorders. Despite the overall emergence of gene therapies in treating these patients, this highly relevant molecular concept has not yet been defined as a target for gene therapeutic approaches. Methods. This narrative review will discuss the experimental evidence suggesting mitochondrial dysfunction as a viable treatment target in patients with monogenic and idiopathic Parkinson’s disease. In addition, we will focus on general treatment strategies and crucial challenges which need to be overcome. Results. Our current understanding of mitochondrial biology in Parkinsonian disorders opens up an avenue for viable treatment strategies. Insights can be obtained from primary mitochondrial diseases. However, substantial knowledge gaps and unique challenges of mitochondria-targeted gene therapies need to be addressed to provide innovative treatments in the future. Conclusions. Mitochondria-targeted gene therapies are a potential strategy to improve an important primary disease mechanism in Parkinsonian disorders. However, further studies are needed to address the unique design challenges for mitochondria-targeted gene therapies.
REVIEW | doi:10.20944/preprints202109.0264.v1
Subject: Biology, Other Keywords: choanoflagellates; multicellularity; animal origins; genome editing; electroporation
Online: 15 September 2021 (14:39:19 CEST)
Choanoflagellates, the closest living relatives of animals, have the potential to reveal the genetic and cell biological foundations of complex multicellular development in animals. Here we describe the history of research on the choanoflagellate Salpingoeca rosetta. From its original isolation in 2000 to the establishment of CRISPR-mediated genome editing in 2020, S. rosetta provides an instructive case study in the establishment of a new model organism.
ARTICLE | doi:10.20944/preprints202107.0311.v1
Online: 13 July 2021 (15:11:54 CEST)
The SnRK gene family is a key regulator of plant stress response, phosphorylating target proteins to regulate signalling pathways. The function of the SnRK gene family has been reported in many species but remains poorly characterised in Triticum aestivum. In this study, the SnRK gene family in the wheat genome was identified and its structural characteristics were described. One hundred forty-seven SnRK genes distributed across 21 chromosomes were identified in the Triticum aestivum genome and categorised into three subgroups (SnRK1/2/3) based on phylogenetic analyses and domain types. The intron-exon structure and protein-motif composition of SnRKs were similar within each subgroup but differed amongst the groups. Gene duplication between the wheat, Arabidopsis, rice and barley genomes was also investigated to gain insight into the evolution of the TaSnRK family genes. SnRK genes showed differential expression patterns in leaves, roots, spike and grains. Redundant stress-related cis-elements were also found in the promoters of 129 SnRK genes, and their expression levels varied widely in relation to drought-, ABA- and light-regulated elements. In particular, TaSnRK2.11 showed higher, increased expression under abiotic stresses and may be a candidate gene for abiotic stress tolerance. The findings will aid in the functional characterization of TaSnRK genes in further research.
ARTICLE | doi:10.20944/preprints202105.0422.v1
Subject: Medicine & Pharmacology, Allergology Keywords: genome editing; CRISPR; Cas9; in vivo editing
Online: 18 May 2021 (11:27:46 CEST)
The development of CRISPR-associated proteins, such as Cas9, has led to increased accessibility and ease of use in genome editing. However, additional tools are needed to quantify and identify successful genome editing events in living animals. We developed a method to rapidly and quantitatively monitor gene editing activity non-invasively in living animals that also facilitates confocal microscopy and nucleotide-level analyses at the end of the study. Here we report a new CRISPR “footprinting” approach to activate luciferase and fluorescent proteins in mice as a function of gene editing. This system is based on experience with our prior Cre-detector system and is designed for Cas editors, including SaCas9 and ErCas12a, whose gRNAs can target LoxP [1, 2]. These CRISPRs cut specifically within LoxP, a departure from previous techniques for detecting in vivo gene editing activity, which targeted adjacent stop sequences. In this sensor paradigm, CRISPR activity was monitored non-invasively in living Cre reporter mice (FVB.129S6(B6)-Gt(ROSA)26Sortm1(Luc)Kael/J and Gt(ROSA)26Sortm4(ACTB-tdTomato,-EGFP)Luo/J, which will be referred to as LSL and mT/mG throughout the paper) after intramuscular or intravenous hydrodynamic plasmid injections, demonstrating utility in two diverse organ systems. The same genome-editing event was examined at the cellular level in specific tissues by confocal microscopy to determine the identity and frequency of successfully genome-edited cells. Further, SaCas9 induced targeted editing at efficiencies comparable to Cre recombinase, demonstrating highly effective delivery and activity in a whole animal. This work establishes genome editing tools and models to track CRISPR editing in vivo non-invasively and to fingerprint the identity of targeted cells. This approach also enables similar utility for any of the thousands of previously generated LoxP animal models.
REVIEW | doi:10.20944/preprints202103.0070.v1
Subject: Biology, Anatomy & Morphology Keywords: Genome; gene families; Transposable elements; Entamoeba histolytica
Online: 2 March 2021 (10:11:58 CET)
Entamoeba histolytica, like other organisms, is characterized by diversity and heterogeneity in its genetic content, which is one of the most important reasons for its survival and for the increase in susceptibility to infection. Non-condensation of chromosomes during cell division and the ambiguity of chromosomal ploidy make predicting the exact chromosome number difficult. Genes are distributed across 14 chromosomes as well as many extrachromosomal elements. Most genes are composed of one exon only, with introns in 25% of genes. The genome is characterized by the presence of polymorphic internal repeat regions and several gene families, including large families encoding transmembrane kinases, cysteine proteases (CP), SREHP protein, and others.
REVIEW | doi:10.20944/preprints202101.0212.v1
Online: 12 January 2021 (10:14:46 CET)
The constitutively active tyrosine kinase BCR/ABL1 oncogene plays a key role in human chronic myeloid leukemia (CML) development and disease maintenance, and determines most of the features of this leukemia. For this reason, tyrosine kinase inhibitors are the first-line treatment, offering most patients a life expectancy like that of an equivalent healthy person. However, since the oncogene is not destroyed, lifelong oral medication is essential, even though this triggers adverse effects in many patients. Furthermore, leukemic stem cells remain quiescent and resistance is observed in approximately 25% of patients. Thus, new therapeutic alternatives are still needed. In this scenario, the emergence of CRISPR technology can offer a definitive treatment based on its capacity to disrupt coding sequences. This review describes CML disease and the main advances in the genome-editing field by which it may be treated in the future.
ARTICLE | doi:10.20944/preprints201907.0169.v1
Subject: Life Sciences, Microbiology Keywords: polyvalent bacteriophage FP01, Escherichia coli, Salmonella, genome
Online: 12 July 2019 (13:07:09 CEST)
Recently the polyvalent bacteriophage FP01, isolated from wastewater in Valparaiso, Chile, was described to have lytic activity across species against Escherichia coli and Salmonella enterica serovars. Due to its polyvalent nature, the bacteriophage FP01 could have potential applications in the food and agricultural industries. Moreover, fundamental aspects of polyvalent bacteriophage biology are not well known. In this study we sequenced and described the complete genome of the polyvalent phage FP01 (MH745368) using nanopore technology. The bacteriophage FP01 genome is a 44,900 bp double-stranded DNA with an average G+C content of 49.41% and 90 coding sequences (CDSs). We found that the phage FP01 critically depends on host factors for replication and transcription. It also carries a critical lysogenic repressor pseudogene. Phylogenetic analyses indicated that the phage FP01 is closely related to phages lambda and P22. These results suggest that the phage FP01 could be a lytic variant of a lysogenic phage or could have acquired genes from lysogenic phages during host infection.
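The average G+C content reported in the abstract above is a simple base-composition ratio. The sketch below shows one common way to compute it; the function name and the short example sequences are hypothetical, and ambiguous bases such as N are excluded from the denominator by assumption.

```python
def gc_content(seq):
    """Return the G+C percentage of a DNA sequence.

    Only unambiguous bases (A, C, G, T) are counted, so runs of N
    or other ambiguity codes do not distort the ratio.
    """
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / counted


# Hypothetical toy sequence, not taken from the FP01 genome.
print(gc_content("GGCCAATT"))
```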
ARTICLE | doi:10.20944/preprints201906.0310.v1
Subject: Life Sciences, Microbiology Keywords: cyanobacteria; secondary metabolite; genome mining; molecular networking
Online: 30 June 2019 (10:42:22 CEST)
Cyanobacteria are an ancient lineage of slow-growing photosynthetic bacteria and a prolific source of natural products with diverse chemical structures and potent biological activities and toxicities. The chemical identification of these compounds remains a major bottleneck. Strategies that can prioritize the most prolific strains and novel compounds are of great interest. Here, we combine chemical analysis and genomics to investigate the chemodiversity of secondary metabolites based on their pattern of distribution within some cyanobacteria. Because Planktothrix is a cyanobacterial genus known to form blooms worldwide and to produce a broad spectrum of toxins and other bioactive compounds, we applied this combined approach to four closely related strains of Planktothrix. The chemical diversity of the metabolites produced by the four strains was evaluated using an untargeted metabolomics strategy with high-resolution LC-MS. Metabolite profiles were correlated with the potential for metabolite production identified by genomics for the different strains. Although the Planktothrix strains are broadly similar in terms of biosynthetic gene clusters, for example those for microcystin, aeruginosin and prenylagaramide, we found remarkable strain-specific chemodiversity. Only a few of the chemical features were common to the four studied strains. Additionally, the MS/MS data were analyzed using Global Natural Products Social Molecular Networking (GNPS) to identify molecular families of the same biosynthetic origin. In conclusion, we present an efficient integrative strategy for elucidating the chemical diversity of a given genus and linking the data obtained from analytical chemistry to biosynthetic genes of cyanobacteria.
REVIEW | doi:10.20944/preprints202212.0453.v1
Subject: Biology, Agricultural Sciences & Agronomy Keywords: wheat; resistance; leaf rust; genetic loci; genome-wide
Online: 23 December 2022 (08:11:48 CET)
Due to global warming and dynamic changes in pathogen virulence, leaf rust caused by Puccinia triticina has greatly expanded its epidemic region and become a severe threat to global wheat production. The genetic basis of wheat resistance to leaf rust relies mainly on leaf rust resistance (Lr) genes or quantitative trait loci (QLr). Although these genetic loci have been intensively studied during the last two decades, an updated overview of Lr/QLr at a genome-wide level is urgently needed. This review summarizes recent progress in genetic studies of wheat resistance to leaf rust. Wheat germplasms with great potential for genetic improvement of resistance to leaf rust are highlighted. Key information about the genetic loci carrying Lr/QLr is summarized. A genome-wide chromosome distribution map for all the Lr/QLr was generated based on the released wheat reference genome. In conclusion, this review provides valuable resources for both wheat breeders and researchers to understand the genetics of resistance to leaf rust in wheat.
ARTICLE | doi:10.20944/preprints202211.0382.v1
Subject: Life Sciences, Biotechnology Keywords: Bacillus; bacterial antagonist; genome sequence; antimicrobial peptide; biologicals
Online: 21 November 2022 (07:43:01 CET)
Plant diseases are among the major factors affecting plant productivity. Biological control of plant diseases is preferred over chemical control as it is environment-friendly, cost-effective, and sustainable. Among the many microbes capable of providing biological control of plant diseases, probiotic Bacillus species are the most promising, as they can survive in adverse conditions and provide plants with a wide range of benefits, including protection from phytopathogens. Wheat blast caused by the Magnaporthe oryzae Triticum pathotype (MoT) has emerged as a potential threat to global wheat production. Given the unreliability of fungicides and limited cultivar resistance, we aimed to screen and identify potential antagonist bacteria collected from internal tissues of rice and wheat seeds and to determine their in vitro and in vivo inhibitory effects against MoT. Dual culture and seedling assays were performed to evaluate the efficacy of probiotic bacteria. Out of 170 bacterial isolates, three bacteria (BTS-3, BTS-4, and BTLK6A) were screened as potential antagonists against MoT in vitro. Artificial inoculation at the seedling stage showed that the isolates BTS-4, BTS-3, and BTLK6A reduced wheat blast disease severity by 89, 88, and 85%, respectively, compared to the mock-inoculated control. The bacterial isolates were identified as Bacillus subtilis (BTS-3) and B. velezensis (BTS-4 and BTLK6A) through genome phylogeny. The whole genome sequences of these three bacterial strains revealed a number of orthologs of genes for antimicrobial peptides, antioxidant defense enzymes, cell wall degrading enzymes, compounds involved in the induction of systemic resistance (ISR) in host plants, and volatile compounds, making them promising biologicals to control MoT in wheat. Combined in vitro and in vivo data, along with genome analysis, suggest that Bacillus spp. suppress the destructive wheat blast disease likely through antibiosis and ISR in the host plants.
Further field evaluation and characterization of antimicrobial compounds are needed for a better understanding of the mode of action and practical recommendation of these bacteria for wheat blast control in the farmers’ fields.
HYPOTHESIS | doi:10.20944/preprints202211.0211.v1
Subject: Medicine & Pharmacology, Oncology & Oncogenics Keywords: cancer morphospace; microenvironmental complexity; genome instability; developmental abnormalities
Online: 11 November 2022 (03:18:06 CET)
Human cancers comprise a heterogeneous array of diseases with different progression patterns and responses to therapy. However, they all develop within a host context that constrains their natural history. As occurs with the diversity of organisms, one can conjecture that there is order in the cancer multiverse. Is there a way to capture the broad range of tumor types within a space of the possible? Here we define the oncospace, a coordinate system that integrates the ecological, evolutionary and developmental components of cancer complexity. The spatial position of a tumor results from its departure from the healthy tissue along these three axes, and progression trajectories inform about the components driving malignancy across cancer subtypes. We postulate that the oncospace topology encodes new information regarding tumorigenic pathways, subtype prognosis and therapeutic opportunities: treatment design could benefit from considering how to nudge tumors towards empty evolutionary deserts in the oncospace.
REVIEW | doi:10.20944/preprints202210.0034.v1
Subject: Biology, Agricultural Sciences & Agronomy Keywords: insect; genome; biopesticide; silencing; topical; gene target; validation
Online: 5 October 2022 (10:57:47 CEST)
Global crop yields are estimated to be reduced by 30–40% per year on account of plant pests and pathogens. Agricultural insect pests threaten global food security, and climate change is contributing to rising infestation. Current management relies on plant breeding, with or without transgenes, and on chemical pesticides. Both approaches face serious technological obsolescence in the field due to resistance breakdown or the development of insecticide resistance. The need for new Mode of Action (MoA) approaches in managing crop health grows each year, driven by market demands to reduce economic losses and by phytosanitary requirements to meet consumer expectations. Disabling pest genes by sequence-specific expression silencing is considered a promising tool in the development of biopesticides that respect the environment and human health. The specificity conferred by long dsRNA-based solutions helps minimize effects on off-target genes in the insect pest genome and on the target gene in non-target organisms (NTOs). In this review, we summarize the current status of gene silencing by RNA interference (RNAi) for agricultural control. More specifically, we focus on the engineering, development and application of gene silencing to control Lepidoptera using non-transforming dsRNA technologies. Despite some delivery and stability drawbacks of topical applications, we review works showing convincing proof-of-concept results that point to imminent innovative solutions. Considerations about the regulation of ongoing research on dsRNA-based pesticides intended to yield commercial products for exogenous application are discussed. Academic and industry initiatives reveal a worthy effort to control Lepidoptera pests with this new mode of action and to provide more sustainable and reliable technologies for field management. New genomic data on this taxon encourage the expansion of a customized portfolio of target genes.
As a case study, we illustrate how dsRNA and associated methodologies could be applied to control an important Lepidopteran coffee pest.
ARTICLE | doi:10.20944/preprints202207.0292.v1
Subject: Medicine & Pharmacology, Other Keywords: Cryptococcus; Whole-Genome Sequencing; VGVI; phylogenomics; Molecular Type
Online: 20 July 2022 (03:16:00 CEST)
Whole-genome sequencing has advanced our understanding of the population structure of the pathogenic species complex Cryptococcus gattii, which has allowed for the phylogenomic specification of previously described major molecular type groupings and novel lineages. Recently, isolates collected in Mexico in the 1960s were determined to be genetically distant from other known molecular types and were classified as VGVI. We sequenced four clinical isolates and one veterinary isolate collected in the southwestern U.S. and Argentina during 2012-2021. Phylogenomic analysis groups these genomes with those of the Mexican VGVI isolates, expanding VGVI into a clade and establishing this molecular type as a clinically important population. These findings also potentially expand the known Cryptococcus ecological range with a previously unrecognized endemic area.
ARTICLE | doi:10.20944/preprints202205.0225.v1
Subject: Life Sciences, Genetics Keywords: chloroplast; genome; sweet cucumber; Solanaceae; next-generation sequencing
Online: 17 May 2022 (08:38:03 CEST)
Sweet cucumber (Solanum muricatum, sect. Basarthrum) is a neglected horticultural crop native to the Andean region. It is naturally distributed very close to potatoes (Solanum sect. Petota) and tomatoes (Solanum sect. Lycopersicon), two groups of high economic importance. To date, molecular tools for this crop are still lacking. Here we obtained the first complete chloroplast (cp) genome of sweet cucumber and compared it with those of seven Solanaceae species. Paired-end clean reads were obtained from a PE150 library on the Illumina HiSeq 2500 platform. The complete cp genome of S. muricatum was 155,681 bp long with a typical quadripartite structure, containing a large single-copy (LSC) region (86,182 bp) and a small single-copy (SSC) region (18,360 bp), separated by two inverted repeat (IR) regions (25,568 bp each). Annotation of the chloroplast genome predicted 88 protein-coding genes (CDS), 8 ribosomal RNA (rRNA) genes, 37 transfer RNA (tRNA) genes, and one pseudogene. A total of 48 perfect microsatellites were identified, most of them mononucleotide repeats (32), followed by tetranucleotide (6) and dinucleotide (5) repeats. SSRs with trinucleotide (3), pentanucleotide (1) and hexanucleotide (1) repeat motifs were identified in lower numbers. Most of these repeats were distributed in the noncoding regions. Whole chloroplast genome comparison with the other seven Solanaceae species revealed that the small and large single-copy regions showed more divergence than the inverted repeat regions. Finally, phylogenetic analysis resolved S. muricatum as a sister species to members of sections Petota + Lycopersicon + Etuberosum. This study reports for the first time the genome organization, gene content, and structural features of the cp genome of S. muricatum. It may also provide a basis for evaluating genetic diversity within Solanum and will be useful for examining the evolutionary processes in sweet cucumber landraces.
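The abstract above counts perfect microsatellites (SSRs) by motif length. As a rough illustration of what such a search involves, the sketch below scans a sequence for uninterrupted tandem repeats using regular expressions. The minimum repeat thresholds, the function name and the toy sequence are all assumptions made for illustration; the abstract does not state which tool or thresholds the authors used.

```python
import re

# Hypothetical minimum repeat counts per motif length (1-6 bp).
# These are illustrative assumptions, not the study's settings.
MIN_REPEATS = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}


def find_perfect_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in sorted(min_repeats.items()):
        # A motif of unit_len bases followed by at least min_rep - 1 copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            unit = match.group(1)
            # Skip motifs that are themselves repeats of a shorter unit,
            # e.g. "AA" (mononucleotide) or "ATAT" (dinucleotide).
            if any(
                unit == unit[:k] * (unit_len // k)
                for k in range(1, unit_len)
                if unit_len % k == 0
            ):
                continue
            hits.append((match.start(), unit, len(match.group(0)) // unit_len))
    return sorted(hits)


# Hypothetical toy sequence containing three perfect SSRs.
toy = "G" + "A" * 12 + "C" + "AT" * 6 + "G" + "AGC" * 4 + "T"
for start, motif, count in find_perfect_ssrs(toy):
    print(start, motif, count)
```

Dedicated SSR tools handle compound and overlapping repeats more carefully; this sketch only reports simple, non-overlapping perfect repeats.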
ARTICLE | doi:10.20944/preprints202201.0333.v1
Subject: Mathematics & Computer Science, Artificial Intelligence & Robotics Keywords: Covid-19; Ensemble; Genome sequencing; Machine learning; Variant
Online: 21 January 2022 (15:17:58 CET)
Covid-19 has caused infections and deaths worldwide. While research in the field of Data Science has contributed good predictions of positive Covid-19 case numbers, this study's review of literature shows there is little research in the use of variants of the virus in predictions. We set out to define and evaluate novel variant features. We find that features relating to variant trends, thresholds and amino acid substitutions are especially powerful in two tasks. In the first task, predicting Covid-19 case numbers, accuracy improved from 71.53% without variant features to 82.12% with variant features. In the second task, predicting transmission severity of variants between two classes, we created a method to build some variable ensembles through selecting appropriate models that are generated with variant features. The test results showed that our ensembles are more accurate and reliable. One particular ensemble of 14 models correctly classified 90.91% of variants, outperforming other models including the popular Random Forest ensemble. In addition, as the variant features have represented more underlying information about Covid-19 pathophysiology, our ensemble methods use only a few data samples to achieve an accurate prediction. The ensemble of 14 models uses only 50 cases of each variant, an ability that could be exploited for early detection of highly infectious variants. These research findings may benefit public health professionals, policy makers, and the research community in the collective efforts to overcome this disease.
ARTICLE | doi:10.20944/preprints202111.0167.v1
Subject: Biology, Plant Sciences Keywords: chloroplast genome; Compositae; phylogenetic incongruence; plastid DNA; Senecioneae
Online: 9 November 2021 (12:51:07 CET)
Plastid genomes are in general highly conserved given their slow evolutionary rate, so large changes in their structure are unusual. However, when specific rearrangements are present, they are often phylogenetically informative. Asteraceae is a highly diverse family whose evolution has long been driven by polyploidy (up to 48x) and hybridisation, both processes that usually complicate systematic inferences. In this study, we have generated one of the most comprehensive plastome-based phylogenies of the family Asteraceae, providing information about the structure, genetic diversity, and repeat composition of these sequences. By comparing the whole plastome sequences obtained, we confirmed the double inversion located in the long single copy region for most of the species analysed (with the exception of basal tribes), a well-known feature of Asteraceae plastomes. We also show that genome size, gene order and gene content are highly conserved across the family. However, species representative of the basal subfamily Barnadesioideae, as well as of the sister family Calyceraceae, lack the pseudogene rps19 located in one inverted repeat. The phylogenomic analysis conducted here, based on 63 protein-coding genes, 30 transfer RNA genes and 21 ribosomal RNA genes from 36 species of Asteraceae, is overall consistent with the general consensus for the family’s phylogeny, while resolving the position of tribe Senecioneae and revealing some incongruences at tribe level between reconstructions based on nuclear and plastid DNA data.
ARTICLE | doi:10.20944/preprints202110.0367.v1
Subject: Biology, Other Keywords: Bacteria; culturomics; genome; species; sp. nov.,; taxono-genomics
Online: 25 October 2021 (15:47:32 CEST)
Marseille-Q4369 is a strain that we isolated from healthy human skin and characterized by a taxono-genomic approach. Marseille-Q4369 exhibited 99.80% 16S rRNA sequence similarity with the type strain of Agrococcus pavilionensis, the phylogenetically closest bacterium with standing in nomenclature. Furthermore, digital DNA–DNA hybridization revealed a maximum similarity of only 52.4%, and OrthoANI gave a value of 93.63% between the novel organism and the type strain of Agrococcus pavilionensis. Marseille-Q4369 was observed to be a yellowish-pigmented, Gram-positive, coccoid, facultatively aerobic bacterium belonging to the Microbacteriaceae family. The major fatty acids detected were 12-methyl-tetradecanoic acid (66%) and 14-methyl-hexadecanoic acid (24%), followed by 13-methyl-tetradecanoic acid (5%). The genome of strain Marseille-Q4369 was 2,737,735 bp long with a 72.27% G+C content. Taken together, these results confirm the status of this strain as a new member of the Agrococcus genus, for which the name Agrococcus massiliensis is proposed (=CSUR-Q4369 = DSM112404).
ARTICLE | doi:10.20944/preprints202106.0039.v1
Subject: Life Sciences, Virology Keywords: LAIV, Influenza, HA, IgA, IgG, vaccine, genome rearrangement
Online: 1 June 2021 (15:02:27 CEST)
Influenza B virus (IBV) is considered a major respiratory pathogen responsible for seasonal respiratory disease in humans, which is particularly severe in children and the elderly. Seasonal influenza vaccination is considered the most efficient strategy to prevent and control IBV infections. Live attenuated influenza virus vaccines (LAIVs) are thought to induce both humoral and cellular immune responses by mimicking a natural infection, but their effectiveness has recently come into question. Thus, the opportunity exists to find alternative approaches to improve overall influenza vaccine effectiveness. Two alternative IBV backbones were developed with re-arranged genomes: a re-arranged M (FluB-RAM) and a re-arranged NS (FluB-RANS). Both re-arranged viruses showed temperature sensitivity in vitro compared to the wild-type (WT) B/Bris strain, were genetically stable over multiple passages in embryonated chicken eggs and were attenuated in vivo in mice. In a prime-boost regime in naïve mice, both re-arranged viruses induced antibodies against HA with hemagglutination inhibition titers considered of protective value. In addition, antibodies against NA and NP with potential protective value were readily detected. Upon lethal IBV challenge, mice previously vaccinated with either FluB-RAM or FluB-RANS were completely protected against clinical disease and mortality. In conclusion, genome re-arrangement renders efficacious LAIV candidates to protect mice against IBV.
ARTICLE | doi:10.20944/preprints202101.0526.v1
Subject: Biology, Anatomy & Morphology Keywords: Asaia; paratransgenesis; symbiotic traits; Anopheles stephensi; genome features
Online: 26 January 2021 (08:19:00 CET)
Asaia bacteria commonly form part of the microbiome of many mosquito species in the genera Anopheles and Aedes, including important vectors of infectious agents. Their close association with multiple organs and tissues of their mosquito hosts enhances the potential for paratransgenesis for delivery of anti-malaria or anti-virus effectors. The molecular mechanisms involved in the interactions between Asaia and mosquito hosts, as well as between Asaia and other bacterial members of the mosquito microbiome, have remained unexplored. Here, we determined the genome sequence of strain W12 isolated from Anopheles stephensi mosquitoes, compared it to those of other Asaia species associated with plants or insects, and investigated some properties of the bacteria relevant to their symbiosis with host mosquitoes. The assembled genome of strain W12 has a size of 3.94 Mb, which is the largest among the Asaia spp. studied so far. At least 3,585 coding sequences were predicted. The insect-associated Asaia, including strain W12, carried more glycoside hydrolase (GH) encoding genes (31 per genome) than those isolated from plants (22 per genome). W12 had the most predicted regulatory protein components (213) among the selected Asaia (ranging from 131 to 211), indicating its great capability to adapt to frequent environmental changes in the mosquito gut. Two complete operons encoding cytochrome bo3-type ubiquinol terminal oxidases (cyoABCD-1 and cyoABCD-2) were found in most Asaia genomes, which possibly offer alternative terminal oxidases and allow the flexible transition of respiratory pathways. Genes involved in the production of acetoin and 2,3-butanediol were identified in Asaia sp. W12.
REVIEW | doi:10.20944/preprints202011.0603.v1
Subject: Biology, Agricultural Sciences & Agronomy Keywords: genome editing, agriculture, crispr, talen, specificity, off-target
Online: 24 November 2020 (08:35:00 CET)
We are in a new chapter of crop and livestock improvement with the emergence of genome editing. This latest generation of molecular tools can be used to make targeted changes in a genome including insertions, deletions, and mutations. With new advances come new risks of unintended changes and impacts, and thus the need for appropriate risk assessment for product development and to inform regulatory measures. Though CRISPR/Cas has arisen as the predominant technology, there are multiple types of genome editing tools, each with pros and cons depending on the organism and desired outcome. Furthermore, each editing tool differs in specificity, as they may edit non-intended sites, referred to as off-target edits. The consensus of the agricultural editing community is to avoid off-target editing through design and detection, instead of determining whether off-target editing in each case is detrimental. The design of a targeting component, the tool chosen, and the identification of the edit(s) made are the critical factors in avoiding off-target edits and confirming intended edits in final products that are released commercially. The limited number of head-to-head comparisons of genome editing tools in diverse crops and livestock makes it difficult to develop broad conclusions and best practices, which is further compounded by the diversity of techniques, targets, and processes. Developers and breeders should consult the literature and test as needed to determine which editing technology will be the most effective for their purposes, especially as more tools with altered efficiency and specificity become available. Yet, the lack of off-target edits in studies that employed careful design of targeting components followed by wide testing for on- and off-target edits bodes well for the use of genome editing with proper precautions of target selection and screening.
ARTICLE | doi:10.20944/preprints202011.0237.v1
Subject: Life Sciences, Biochemistry Keywords: Saccharomyces cerevisiae; SCRaMbLE; genome evolution; industrial yeast strains
Online: 6 November 2020 (10:30:45 CET)
Genome-scale engineering and custom synthetic genomes are reshaping the next generation of industrial yeast strains. The Cre-recombinase mediated chromosomal rearrangement mechanism of designer synthetic Saccharomyces cerevisiae chromosomes, known as SCRaMbLE, is a powerful tool which allows rapid genome evolution upon command. This system is able to generate millions of novel genomes with potentially valuable phenotypes, but the excessive loss of essential genes often results in poor growth or even the death of cells with useful phenotypes. In this study we expanded the versatility of SCRaMbLE to industrial strains, and evaluated different control measures to optimise genomic rearrangement, whilst limiting cell death. To achieve this, we have developed RED (Rapid Evolution Detection), a simple colorimetric plate-assay procedure to rapidly quantify the degree of genomic rearrangements within a post-SCRaMbLE yeast population. RED-enabled semi-synthetic strains were mated with haploid progeny of industrial yeast strains to produce stress tolerant heterozygous diploid strains. Analysis of these heterozygous strains with the RED-assay, genome sequencing and custom bioinformatics scripts demonstrated a correlation between RED-assay frequencies and physical genomic rearrangements. Here we show that RED is a fast and effective method to evaluate optimal SCRaMbLE induction times of different Cre-recombinase expression systems for the development of industrial strains.
BRIEF REPORT | doi:10.20944/preprints202010.0601.v1
Subject: Biology, Anatomy & Morphology Keywords: Diptera; Calliphoridae; Luciliinae; complete mitochondrial genome; Lucilia sericata
Online: 29 October 2020 (09:22:19 CET)
In the present study, the complete mitochondrial genome of the New Zealand parasitic blowfly Lucilia sericata (green bottle blowfly) field strain NZ_LucSer_NP was generated using next-generation sequencing technology. The length of the complete mitochondrial genome is 15,938 bp, with a nucleotide distribution of 39.4% A, 13.0% C, 9.3% G, and 38.2% T. The complete mitochondrial genome consists of 13 protein-coding genes, two ribosomal RNAs, 22 transfer RNAs, and a 1,124 bp non-coding region, similar to most metazoan mitochondrial genomes. Phylogenetic analysis showed that L. sericata NZ_LucSer_NP forms a monophyletic cluster with the remaining six Lucilia species and that the Calliphoridae are polyphyletic. This study provides the first complete mitochondrial genome sequence for an L. sericata blowfly derived from New Zealand to facilitate species identification and phylogenetic analysis.
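The nucleotide distribution quoted above can be summarised as an A+T content, a figure often reported for insect mitochondrial genomes. The following is a minimal Python sketch, not the authors' pipeline: the function names are hypothetical, and only the four percentages come from the abstract.

```python
from collections import Counter


def base_composition(seq: str) -> dict[str, float]:
    """Return the percentage of A, C, G and T in a nucleotide sequence.

    Ambiguous symbols (N, R, Y, ...) are ignored, which is an assumption
    of this sketch rather than a documented choice of the study.
    """
    counts = Counter(base for base in seq.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return {base: 100.0 * counts[base] / total for base in "ACGT"}


def at_content(composition: dict[str, float]) -> float:
    """Sum the A and T percentages of a composition dictionary."""
    return composition["A"] + composition["T"]


# Percentages as reported in the abstract (they sum to 99.9 after rounding).
reported = {"A": 39.4, "C": 13.0, "G": 9.3, "T": 38.2}
reported_at = round(at_content(reported), 1)
```

Applied to the reported values, `at_content` simply adds 39.4 and 38.2; applied to a real sequence, `base_composition` would first derive the percentages from the bases themselves.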
CONCEPT PAPER | doi:10.20944/preprints202010.0160.v1
Subject: Biology, Anatomy & Morphology Keywords: nomenclature; Candidatus; metagenome-assembled genomes; genome-based taxonomy
Online: 7 October 2020 (15:08:01 CEST)
Latin binomials, popularised in the eighteenth century by the Swedish naturalist Linnaeus, have stood the test of time in providing a stable, clear and memorable system of nomenclature across biology. However, relentless and ever-deeper exploration and analysis of the microbial world has created an urgent unmet need for huge numbers of new names for Archaea and Bacteria. Manual creation of such names remains difficult and slow and typically relies on expert-driven nomenclatural quality control. Keen to ensure the legacy of Linnaeus lives on in the age of microbial genomics and metagenomics, we propose an automated approach, employing combinatorial concatenation of roots from Latin and Greek to create linguistically correct names for genera and species that can be used off the shelf as needed. As proof of principle, we document over a million new names for Bacteria and Archaea. We are confident that our approach provides a road map for how to create new names for decades to come.
ARTICLE | doi:10.20944/preprints202005.0140.v1
Subject: Biology, Other Keywords: Genome; fimbrial; plasmid; ST131; Escherichia coli; evolution; infection
Online: 8 May 2020 (09:39:35 CEST)
The human gut microbiome includes beneficial, commensal and pathogenic bacteria that possess antimicrobial resistance (AMR) genes and exchange these predominantly through conjugative plasmids. Escherichia coli is a significant component of the gastrointestinal microbiome and is typically non-pathogenic in this niche. In contrast, extra-intestinal pathogenic E. coli (ExPEC) including ST131 may occupy other environments like the urinary tract or bloodstream where they express genes enabling AMR and host adhesion like type 1 fimbriae. The extent to which non-pathogenic gut E. coli and infectious ST131 share AMR genes and key associated plasmids remains understudied at a genomic level. Here, we examined AMR gene sharing between gut E. coli and ST131 to discover an extensive shared preterm infant resistome. In addition, individual ST131 show extensive AMR gene diversity, highlighting that analyses restricted to the core genome may be limiting and could miss AMR gene transfer patterns. We show that pEK499-like segments are ancestral to most ST131 Clade C isolates, contrasting with a minority with substantial pEK204-like regions encoding a type IV fimbriae operon. Moreover, ST131 possess extensive diversity at genes encoding type 1, type IV, P and F17-like fimbriae, particularly within subclade C2. The type, structure and composition of AMR genes, plasmids and fimbriae vary widely in ST131 and this may mediate pathogenicity and infection outcomes.
REVIEW | doi:10.20944/preprints201911.0337.v1
Subject: Biology, Other Keywords: leishmania; visceral leishmaniasis; Americas; genome instability; fitness gain
Online: 27 November 2019 (09:27:16 CET)
Pathogen fitness landscapes change when transmission cycles establish in non-native environments or spill over into new vectors and hosts. The introduction of Leishmania infantum into the Neotropics during European colonization of the Americas represents a unique case study to investigate mechanisms of ecological adaptation of this important parasite. Defining the evolutionary trajectories that drive L. infantum fitness in this new environment is of great public health importance, as it will allow unique insight into pathways of host/pathogen co-evolution and their consequences for region-specific changes in disease manifestation. This review summarizes current knowledge on L. infantum genetic and phenotypic diversity in the Americas and its possible role in the unique epidemiology of visceral leishmaniasis (VL) in the New World. We highlight the importance of appreciating adaptive molecular mechanisms in L. infantum to understand the parasite's successful establishment on the continent.
ARTICLE | doi:10.20944/preprints201810.0171.v3
Subject: Medicine & Pharmacology, General Medical Research Keywords: genome-wide polygenic score; coronary artery disease; AUC
Online: 6 December 2018 (07:06:32 CET)
A recent study claimed that genome-wide polygenic scores (GPSs) for five common diseases could identify individuals with risk equivalent to monogenic mutations. Receiver operating characteristic curve analyses were reported to have areas under the curve (AUCs) ranging from 0.63 for inflammatory bowel disease up to 0.81 for coronary artery disease (CAD), but these models also included age and sex, themselves strong predictors of risk. The GPS for CAD identified 8% of the population at threefold increased risk, which it was claimed was comparable to the excess risk from monogenic mutations. In the present study attempts were made to model the distribution of the GPS for CAD to match the information provided. These models were based on the reported distribution of prevalence by centile of GPS and on the distribution of GPS in controls and cases, and were fitted to the reported results using linear approximations to the distributions and using simulations of a liability-threshold model. It was impossible to produce a compatible model in which the GPS produced an AUC as high as 0.81, and the most plausible estimate was that the true AUC was only 0.65. The reported distributions of the GPS in cases and controls overlap so much that they are not compatible with an AUC of 0.7 or higher. The AUC of the GPS for these diseases is modest. Furthermore, the literature robustly demonstrates that true CAD risk associated with monogenic mutations is much higher than the threefold increase which is predicted by the GPS. Together, these findings cast doubt on the clinical utility of the GPS.
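To make the link between distribution overlap and AUC concrete, the sketch below uses the standard equal-variance binormal relation, AUC = Φ(d/√2), where d is the difference between the case and control means in standard-deviation units. This is a simplifying assumption chosen for illustration; it is not the linear-approximation or liability-threshold modelling described in the abstract, and the function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def auc_from_separation(d: float) -> float:
    """AUC when case and control scores are normal with equal variance
    and means d standard deviations apart (binormal assumption)."""
    return _STD_NORMAL.cdf(d / sqrt(2))


def separation_from_auc(auc: float) -> float:
    """Inverse of auc_from_separation: the mean shift, in standard
    deviations, needed to reach a given AUC under the same assumption."""
    if not 0.0 < auc < 1.0:
        raise ValueError("AUC must lie strictly between 0 and 1")
    return sqrt(2) * _STD_NORMAL.inv_cdf(auc)


if __name__ == "__main__":
    # AUC values quoted in the abstract.
    for auc in (0.65, 0.70, 0.81):
        print(f"AUC {auc:.2f} -> separation {separation_from_auc(auc):.2f} SD")
```

Under this assumption, a higher AUC requires a larger separation between the case and control distributions, which is why heavily overlapping distributions put a ceiling on the attainable AUC.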
ARTICLE | doi:10.20944/preprints201811.0508.v1
Subject: Life Sciences, Other Keywords: Muller's ratchet; genome decay; ribosome; protein synthesis, rudiment
Online: 20 November 2018 (16:30:43 CET)
Microsporidia are fungi-like parasites that have the smallest known eukaryotic genome, and for that reason they are used as a model to study the phenomenon of genome decay in parasitic forms of life. Similar to other intracellular parasites that reproduce asexually in an environment with alleviated natural selection, Microsporidia experience continuous genome decay driven by Muller's ratchet - an evolutionary process of irreversible accumulation of deleterious mutations, which leads to gene loss and miniaturization of cellular components. In particular, Microsporidia have remarkably small ribosomes in which the rRNA is reduced to the minimal enzymatic core. To better understand the impact of Muller's ratchet on RNA and protein molecules in parasitic organisms, particularly regarding their ribosome structure, we have explored an apparent effect of Muller's ratchet on microsporidian ribosomal proteins. Through mass spectrometry, analysis of microsporidian genome sequences and analysis of ribosome structure from non-parasitic eukaryotes, we found that massive rRNA reduction in microsporidian ribosomes appears to annihilate binding sites for ribosomal proteins eL8, eL27, and eS31, suggesting that these proteins are no longer bound to the ribosome in microsporidian species. We then provide evidence that protein eS31 is retained in Microsporidia due to its non-ribosomal function in ubiquitin biogenesis. To sum up, our study illustrates that while Microsporidia carry the same set of ribosomal proteins as non-parasitic eukaryotes, some ribosomal proteins no longer participate in protein synthesis in Microsporidia and are preserved from genome decay by having extra-ribosomal functions.
ARTICLE | doi:10.20944/preprints201810.0054.v1
Subject: Life Sciences, Virology Keywords: Bovine enterovirus, EV-E, Nigeria, Sewage, Complete Genome
Online: 3 October 2018 (14:24:49 CEST)
We describe the draft genome of a Bovine enterovirus (EV) recovered from sewage in Nigeria. The virus replicates on both RD and L20B cell lines, but is negative for all EV screens in use by the GPEI. It contains 7,368 nt, with 50.2% G+C content and an ORF of 6,525 nt (2,174 aa).
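The ORF and polyprotein lengths quoted above are related by simple codon arithmetic. The short Python sketch below checks that relation, assuming (the abstract does not say so explicitly) that the 6,525 nt figure includes the terminal stop codon; the function names are hypothetical.

```python
def protein_length_from_orf(orf_nt: int, includes_stop: bool = True) -> int:
    """Number of amino acids encoded by an ORF of the given length.

    Assumes the ORF length is a whole number of codons; if the stop
    codon is counted in the ORF, it is subtracted from the product.
    """
    if orf_nt <= 0 or orf_nt % 3 != 0:
        raise ValueError("ORF length must be a positive multiple of 3")
    codons = orf_nt // 3
    return codons - 1 if includes_stop else codons


def coding_fraction(orf_nt: int, genome_nt: int) -> float:
    """Fraction of the genome occupied by the ORF."""
    if genome_nt <= 0 or orf_nt > genome_nt:
        raise ValueError("ORF must fit inside a genome of positive length")
    return orf_nt / genome_nt


# Figures from the abstract: 7,368 nt genome, 6,525 nt ORF, 2,174 aa.
polyprotein_aa = protein_length_from_orf(6525)
orf_share = coding_fraction(6525, 7368)
```

With the stop codon included, 6,525 nt corresponds to 2,175 codons and therefore 2,174 amino acids, which matches the reported polyprotein length.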
ARTICLE | doi:10.20944/preprints201804.0326.v1
Subject: Biology, Horticulture Keywords: DNA markers; edible mushroom; genome stability; protoplast regeneration
Online: 25 April 2018 (08:26:25 CEST)
A total of five protoclones were successfully cultured on PDA medium out of twenty-two regenerated colonies of Termitomyces protoplasts and were studied further. Mycelial tissue grown in liquid MYG was used for protoplast isolation by enzymatic digestion in a mixture containing 2% Lysing enzyme and 2% Cellulase R10 in 0.6 M mannitol. The incubation conditions of temperature, shaking and time were standardized at 24ºC, 60 rpm and 10 hours, respectively, for the liberation of healthy protoplasts. The purified protoplasts showed an average yield of 1.2 × 10^7 cells/g tissue with 31.60 ± 9.31% regeneration efficiency on specific medium and 77.12 ± 2.72% viability by FDA test. Four ISSR primers were used in this study, resulting in a total of 27 reproducible bands (mean 6.75 per primer) ranging from 280 bp–2700 bp. They showed a similar banding pattern in all the lines, with zero percent polymorphism. The amplified rRNA-ITS gene showed a size of ~600 bp on the gel, and a single restriction site for the enzyme HaeIII was found in all the protoclones and the parent, with similar fragment sizes in all.
ARTICLE | doi:10.20944/preprints201703.0182.v1
Subject: Biology, Entomology Keywords: Lauxanioidea; Cyclorrhapha; mitochondrial genome; phylogeny; RNAs; intergenic sequences
Online: 24 March 2017 (08:03:42 CET)
The superfamily Lauxanioidea is a significant dipteran clade including over 2500 known species in three families: Lauxaniidae, Celyphidae and Chamaemyiidae. We sequenced the first five (three complete and two partial) lauxanioid mitochondrial (mt) genomes, and used them to reconstruct the phylogeny of this group. The lauxanioid mt genomes are typical of the Diptera, containing all 37 genes usually present in bilaterian animals. A total of three conserved intergenic sequences have been reported across the Cyclorrhapha. The inferred secondary structure of 22 tRNAs suggested five substitution patterns among the Cyclorrhapha. The control region in the Lauxanioidea has apparently evolved very fast, but four conserved structural elements were detected in all three complete mt genome sequences. Phylogenetic relationships based on the mt genome data were inferred by Maximum Likelihood and Bayesian methods. The traditional relationships between families within the Lauxanioidea, (Chamaemyiidae + (Lauxaniidae + Celyphidae)), were corroborated; however, the higher-level relationships between cyclorrhaphan superfamilies are mostly poorly supported.
ARTICLE | doi:10.20944/preprints202208.0057.v1
Subject: Biology, Animal Sciences & Zoology Keywords: infectious bronchitis; viral evolution; whole genome sequencing; DMV; QX.
Online: 2 August 2022 (09:27:23 CEST)
Infectious bronchitis virus (IBV) is a highly variable RNA virus that affects chickens worldwide. Due to its inherent tendency to undergo point mutations and recombination events during viral replication, emergent IBV strains have been linked to nephropathogenic and reproductive diseases that are more severe than the typical respiratory disease, leading, in some cases, to mortality, severe production losses, and/or unsuccessful vaccination. QX and DMV/1639 strains are examples of the above-mentioned IBV evolutionary pathway and clinical outcome. In this study, our purpose was to systematically compare whole genomes of QX and DMV strains, looking at each IBV gene individually. Phylogenetic analyses and amino acid site searches were performed in datasets obtained from GenBank accounting for all IBV genes and using our own relevant sequences as a basis. The QX dataset studied is more genetically diverse than the DMV dataset, partially due to the greater epidemiological diversity within the five QX strains used as a basis compared to the four DMV strains from our study. Historically, QX strains emerged and spread earlier in Europe and Asia than DMV strains. Consequently, there are more QX sequences deposited in GenBank than DMV sequences, assisting in the identification of a larger pool of QX strains. It is likely that a similar evolutionary pattern will be observed among DMV strains as they develop and spread in North America.
ARTICLE | doi:10.20944/preprints202206.0298.v1
Subject: Biology, Ecology Keywords: cyanosphere; cyanobacteria; Cyanocohniella; Llayta; macrocolonies; metagenomic-assembled genome; microbiome
Online: 21 June 2022 (16:11:44 CEST)
Cyanobacterial macrocolonies known as Llayta are found in Andean wetlands and have been consumed since pre-Columbian times in South America. Macrocolonies of filamentous cyanobacteria are niches for colonization by other microorganisms; however, the microbiome of edible Llayta has not been explored. Based on a culture-independent approach, we report the presence, identification and metagenomic genome reconstruction of Cyanocohniella sp. LLY associated with Llayta trichomes. The assembled genome of strain LLY is now available for further inquiries, and may be instrumental for taxonomic advances on this genus. All known members of the Cyanocohniella genus have been isolated from salty European habitats. A biogeographic gap for the Cyanocohniella genus is partially filled by the existence of strain LLY at Andes Mountains wetlands in South America as a new habitat. This is the first genome available for members of this genus. Genes involved in primary and secondary metabolism are described, providing new insights into the putative metabolic capabilities of Cyanocohniella sp. LLY. The reconstructed genome of strain LLY is now available and instrumental for further inquiries and taxonomic advances on the genus Cyanocohniella.
ARTICLE | doi:10.20944/preprints202204.0256.v1
Subject: Medicine & Pharmacology, Nutrition Keywords: tea intake; fracture; Mendelian randomization; genome-wide association studies
Online: 27 April 2022 (10:40:34 CEST)
Fracture is a global public health problem. Bone health and fracture risk have become the focus of public and scientific attention. Observational studies have reported that tea consumption is associated with fracture risk, but the results are inconsistent. The present study was conducted to evaluate whether tea consumption was causally associated with the risk of bone fracture through two-sample Mendelian Randomization (MR) analysis. We included a large genome-wide association study (GWAS) of tea consumption in 447,485 individuals and analyzed the effects of genetic instruments on fractures using fracture cases from the UK Biobank dataset (n=361,194). Inverse variance weighted (IVW) analysis indicated no causal effects of tea consumption on fractures of the skull and face, shoulder and upper arm, hand and wrist, femur, and calf and ankle (odds ratio (OR)=1.000, P=0.881; OR=1.000, P=0.857; OR=1.002, P=0.339; OR=0.997, P=0.054; OR=0.998, P=0.569, respectively). Consistent results were also found with the MR-Egger, weighted median, and weighted mode methods. Our research provides evidence that tea consumption is unlikely to affect the incidence of fractures.
ARTICLE | doi:10.20944/preprints202112.0354.v1
Subject: Life Sciences, Genetics Keywords: whole genome sequencing; cancer predisposition; mucin; reactive oxygen species
Online: 22 December 2021 (11:44:20 CET)
Familial colorectal cancer (CRC) is only partially explained by known germline predisposing genes. We performed whole genome sequencing in 15 Polish families with many affected individuals, without mutations in known CRC predisposing genes. We focused on loss-of-function variants and functionally characterized them. We identified a frameshift variant in the CYBA gene (c.246delC) in one family and a splice site variant in the TRPM4 gene (c.25-1 G>T) in another family. While both variants were absent or extremely rare in gene variant databases, we identified four additional Polish familial CRC cases and two healthy elderly individuals with the CYBA variant (odds ratio 2.46, 95% confidence interval 0.48-12.69). Both variants led to a premature stop codon and to a truncated protein. Functional characterization of the variants showed that knockdown of CYBA or TRPM4 depressed generation of reactive oxygen species (ROS) in LS174T and HT-29 cell lines. Knockdown of TRPM4 resulted in decreased MUC2 protein production. CYBA encodes a component of the NADPH oxidase system, which generates ROS and controls, e.g., bacterial colonization in the gut. Germline CYBA variants are associated with early onset inflammatory bowel disease, supported by experimental evidence of loss of intestinal mucus barrier function due to ROS deficiency. TRPM4 encodes a calcium-activated ion channel, which in a human colonic cancer cell line controls calcium-mediated secretion of MUC2, a major component of the intestinal mucus barrier. We suggest that the gene defects in CYBA and TRPM4 mechanistically involve intestinal barrier integrity through ROS and mucus biology, which converge in chronic bowel inflammation.
REVIEW | doi:10.20944/preprints202111.0385.v1
Subject: Life Sciences, Virology Keywords: RNA genome; Viruses; host-viruses interactions; RNA world
Online: 22 November 2021 (11:43:15 CET)
In recent years, the role of non-coding RNAs (ncRNAs) in regulating cell physiology has begun to be better understood. Recent discoveries in viral molecular biology have revealed that such cellular functions are disturbed during viral infections mainly due to host cell ncRNAs, cellular factors, and virus-derived ncRNAs. Apart from the interplay between those molecules, other interactions derive from the specific folding of RNA virus genomes. These fulfill canonical regulation functions such as replication, translation, and viral packaging. In some cases, folds serve as precursors of small viral RNAs whose biogenesis is not yet clearly understood. Since ncRNAs and RNA viral genomes modulate complex molecular and cellular processes in viral infections, a new taxonomy is being proposed here overarching three main categories, considering the current information about ncRNA interactions in some well-known viral infections. The first category shows examples of host ncRNAs associated with the trigger of the immune response under viral infections. The second category describes interactions between the virus and host ncRNAs. The last category shows how the shape of the RNA viral genome is essential in processing RNAs derived from viruses. Finally, we introduce evidence of how these three categories can also work as a framework in order to organize known interactions of ncRNAs and cellular factors under DENV infection. This new taxonomy of interactions provides a comprehensive framework for organizing the ncRNA regulatory roles in the context of viral interactions and an RNA world.
ARTICLE | doi:10.20944/preprints202103.0121.v1
Subject: Life Sciences, Biochemistry Keywords: Familial colorectal cancer; SRC; germline variant; whole genome sequencing
Online: 3 March 2021 (09:52:06 CET)
Colorectal cancer (CRC) shows one of the largest proportions of familial cases among different malignancies, but only 5-10% of all CRC cases are linked to mutations in established predisposition genes. Thus, familial CRC constitutes a promising target for the identification of novel, high- to moderate-penetrance germline variants underlying cancer susceptibility by next generation sequencing. In this study, we performed whole genome sequencing on 3 members of a family with CRC aggregation. Subsequent integrative in silico analysis using our in-house developed variant prioritization pipeline resulted in the identification of a novel germline missense variant (V177M) in the SRC gene, a proto-oncogene highly upregulated in CRC. Functional validation experiments in HT-29 cells showed that introduction of SRCV177M resulted in increased cell proliferation and enhanced protein expression of phospho-SRC (Y419), a potential marker for SRC activity. Upregulation of paxillin, β-Catenin and STAT3 mRNA levels, increased levels of phospho-ERK, CREB and CCND1 proteins and downregulation of the tumor suppressor p53 further suggested the activation of several pathways due to the SRCV177M variant. The findings of our pedigree-based study contribute to the exploration of the genetic background of familial CRC and bring insights into the molecular basis of upregulated SRC activity and downstream pathways in colorectal carcinogenesis.
REVIEW | doi:10.20944/preprints202009.0604.v1
Subject: Life Sciences, Biochemistry Keywords: Nucleus; Nuclear envelope; Lamins; Genome organization; Chromatin; Gene expression
Online: 25 September 2020 (11:03:59 CEST)
Nuclear lamins are type V intermediate filament proteins that form a filamentous meshwork beneath the inner nuclear membrane. Additionally, a sub-population of A-type and B-type lamins is localized in the nuclear interior. The nuclear lamina protects the nucleus from mechanical stress and mediates nucleo-cytoskeletal coupling. Lamins form a scaffold that partially tethers chromatin at the nuclear envelope. The nuclear lamina also stabilizes protein-protein interactions involved in gene regulation and DNA repair. The lamin-based protein sub-complexes are implicated in both nuclear and cytoskeletal organization, the mechanical stability of the nucleus, genome organization, transcriptional regulation, genome stability, and cellular differentiation. Here we review recent research in the field of nuclear lamins and their role in modulating various nuclear processes and their impact on cell function.
REVIEW | doi:10.20944/preprints202009.0279.v1
Subject: Biology, Other Keywords: selection; mutation; genetic drift; adaptation; ploidy drive; genome instability
Online: 13 September 2020 (11:48:30 CEST)
Ploidy is a significant type of genetic variation, describing the number of chromosome sets per cell. Ploidy evolves in natural populations, clinical populations, and lab experiments, particularly in fungi. Despite a long history of theoretical work on this topic, predicting how ploidy will evolve has proven difficult, as it is often unclear why one ploidy state outperforms another. Here, we review what is known about contemporary ploidy evolution in diverse fungal species through the lens of population genetics. As with typical genetic variants, ploidy evolution depends on the rate that new ploidy states arise by mutation, natural selection on alternative ploidy states, and random genetic drift. However, ploidy variation also has unique impacts on evolution, with the potential to alter chromosomal stability, the rate and patterns of point mutation, and the nature of selection on all loci in the genome. We discuss how ploidy evolution depends on these general and unique factors and highlight areas where additional experimental evidence is required to comprehensively explain the ploidy transitions observed in the field and the lab.
Subject: Keywords: tetraodon palembangensis; chromosome-level genome; genomic annotation; gene family
Online: 31 August 2020 (04:28:47 CEST)
The humpback puffer, Tetraodon palembangensis, also known as Pao palembangensis, is a species of poisonous freshwater pufferfish mainly distributed in Southeast Asia (Thailand, Laos, Malaysia and Indonesia). Despite interesting biological features, such as its very inactive nature, tetrodotoxin production and body expansion mechanisms, molecular research on the humpback puffer is still rare because of the lack of a high-quality reference genome. Here, we report the first chromosome-level genome assembly of an adult humpback puffer, with a genome size of 362 Mb, a contig N50 of ~1.78 Mb and a scaffold N50 of ~15.8 Mb. Based on the genome, ~61.5 Mb (18.11%) of repeat sequences were also identified, and a total of 19,925 genes were annotated, 99.20% of which could be assigned a predicted function using protein-coding function databases. Finally, a phylogenetic tree was constructed with single-copy gene families from ten teleost fishes. The humpback puffer genome will be a valuable genomic resource to illustrate possible mechanisms of tetrodotoxin synthesis and tolerance, providing clues for future detailed studies of biological toxins.
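The contig and scaffold N50 values quoted above follow the usual definition: the length L such that sequences of length L or longer together cover at least half of the assembly. A minimal Python sketch of that definition is given below; it is not the assembler's own reporting code, and the toy lengths in the usage note are hypothetical.

```python
def n50(lengths: list[int]) -> int:
    """Return the N50 of a set of contig or scaffold lengths.

    Sequences are taken from longest to shortest; the N50 is the length
    of the sequence at which the running total first reaches half of
    the summed assembly length.
    """
    if not lengths:
        raise ValueError("at least one sequence length is required")
    if any(length <= 0 for length in lengths):
        raise ValueError("sequence lengths must be positive")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise AssertionError("unreachable: running total always reaches half")
```

For example, with hypothetical lengths `[2, 3, 4, 5, 6]` (total 20), the two longest sequences sum to 11, which passes the halfway mark of 10, so `n50` returns 5.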
ARTICLE | doi:10.20944/preprints202007.0251.v1
Subject: Life Sciences, Molecular Biology Keywords: SARS-CoV-2; COVID-19; Spike protein; Mutant; Genome
Online: 12 July 2020 (12:03:16 CEST)
The severity of coronavirus disease 2019 (COVID-19), caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), varies greatly from patient to patient. In the present study, we explored and compared mutation profiles of SARS-CoV-2 isolated from mildly affected and severely affected COVID-19 patients in order to explore any relationship between mutation profile and disease severity. Genomic sequences of SARS-CoV-2 were downloaded from the GISAID database. With the help of the Genome Detective Coronavirus Typing Tool, genomic sequences were aligned with the Wuhan seafood market pneumonia virus reference sequence and all the mutations were identified. The distribution of mutant variants was then compared between the mildly and severely affected groups. Among the numerous mutations detected, the 14,408C>T and 23,403A>G mutations, resulting in the RNA-dependent RNA polymerase (RdRp) P323L and spike protein D614G mutations, respectively, were found predominantly in the severely affected group (>82%) compared with the mildly affected group (<46%, p<0.001). The 241C>T mutation in the non-coding region of the genome was also found predominantly in the severely affected group. The 3,037C>T mutation, a silent mutation, also appeared at relatively high frequency in the severely affected group. We concluded that the RdRp P323L and spike protein D614G mutations predominate in severely affected COVID-19 patients. Further studies will be required to explore whether these mutations have any impact on the severity of COVID-19.
ARTICLE | doi:10.20944/preprints201912.0024.v1
Subject: Biology, Agricultural Sciences & Agronomy Keywords: wheat variety bn207; genome composition; fish; snp; chromosomal variations
Online: 3 December 2019 (11:49:52 CET)
Development and deployment of wheat varieties with high yields, wide adaptability, good quality, multiple resistance to abiotic and biotic stresses, and efficient response to fertilizers have greatly contributed to sustainable global wheat production. The genomic composition of a key commercial wheat variety can help in understanding the genetic basis underlying the development of new varieties and permit increased breeding efficiency. In this study, we report the chromosomal and genomic compositions of BN207, presently the leading wheat variety in the southern region of the Huang-Huai River Valley, the most important wheat producing area in China, through an integrated analysis using fluorescent in situ hybridization (FISH) and the wheat 15 K SNP array. Our results showed that BN207 inherited 55.3% and 40.7% of its genome from its male parent BN64 and female parent ZM16, respectively, and generated 64 novel or recombined loci. In addition, we detected nine chromosomal variations in BN207, its parents and ten sister lines, and, by integrating FISH and SNP loci recombination analyses, physically mapped two variations, the pericentric inversion of chromosome 6B and a large tandem repeat sequence block on the long arm of 5A, both of which had positive effects on agronomic traits. These results will provide a reference for the breeding of high-yield wheat varieties such as BN207, and for the application of the founder parents BN64 and ZM16, which are being utilized frequently in wheat breeding programs in Henan Province and surrounding areas.
ARTICLE | doi:10.20944/preprints201805.0330.v1
Subject: Life Sciences, Molecular Biology Keywords: fibrillin; cucumber; genome-wide; gene expression; high light stress
Online: 24 May 2018 (05:24:00 CEST)
Fibrillin (FBN) is a plastid lipid-associated protein found in photosynthetic organisms from cyanobacteria to plants. In this study, 10 CsaFBN genes were identified in genomic DNA sequences of cucumber (Chinese long and Gy14) through database searches using the conserved domain of FBN and the 14 FBN genes of Arabidopsis. Phylogenetic analysis of CsaFBN protein sequences showed that there was no counterpart of Arabidopsis and rice FBN5 in the cucumber genome. FBN5 is essential for growth in Arabidopsis and rice; its absence in cucumber may be because of incomplete genome sequences or that another FBN carries out its functions. Among the 10 CsaFBN genes, CsaFBN1 and CsaFBN9 were the most divergent in terms of nucleotide sequences. Most of the CsaFBN genes were expressed in the leaf, stem, and fruit. CsaFBN4 showed the highest mRNA expression levels in various tissues, followed by CsaFBN6, CsaFBN1, and CsaFBN9. High-light stress combined with low temperature decreased photosynthetic efficiency and highly induced transcript levels of CsaFBN1, CsaFBN6, and CsaFBN11, which decreased after 24 h treatment. Transcript levels of the other seven genes were changed only slightly. This result suggests that CsaFBN1, CsaFBN6, and CsaFBN11 may be involved in photoprotection under high-light conditions at low temperature.
ARTICLE | doi:10.20944/preprints202301.0350.v1
Subject: Life Sciences, Microbiology Keywords: whole genome sequencing; β-lactamases; MLST; plasmid replicons; Klebsiella pneumoniae
Online: 19 January 2023 (09:06:46 CET)
Klebsiella pneumoniae (Kp) has gained prominence in the last two decades due to its global spread as a multi-drug resistant (MDR) pathogen. Further, carbapenem-resistant Kp are emerging at an alarming rate. The objectives of this study were (1) to evaluate the prevalence of β-lactamases, especially carbapenemases, in Kp isolates from India, and (2) to determine the most prevalent sequence types (STs) and plasmids, and their association with β-lactamases. Clinical samples of K. pneumoniae (n=65) were collected from various pathology labs, and drug susceptibility and minimum inhibitory concentrations (MIC) were determined. Whole genome sequencing (WGS) was done for resistant isolates (n=22), and WGS analysis was performed using various bioinformatics tools. Additional Indian MDR Kp genomes (n=187) were retrieved from the Pathosystems Resource Integration Center (PATRIC) database. Detection of β-lactamase genes, their location, plasmid replicons, and STs of the genomes was carried out using CARD, mlplasmids, PlasmidFinder, and PubMLST, respectively. All data were analyzed and summarized using the iTOL tool. ST231 was the most frequent among Indian isolates, followed by ST147, ST2096 and ST14. blaAmpH was detected as the most prevalent gene, followed by blaCTX-M-15 and blaTEM-1. Among carbapenemase genes, blaOXA-232 was prevalent and associated with ST231, ST2096 and ST14, followed by blaNDM-5, which was observed to be prevalent in ST147, ST395 and ST437. ST231 genomes were most commonly found to carry Col440I and ColKP3 plasmids. ST16 carried mainly ColKP3, and Col(BS512) was abundantly present in ST147 genomes. One Kp isolate with a novel MLST profile was identified, which carried blaCTX-M-15, blaOXA-1 and blaTEM-1. ST16 and ST14 from this study, which mostly carry both carbapenemase and ESBL genes, could be emerging high-risk clones in India.
REVIEW | doi:10.20944/preprints202212.0184.v1
Subject: Biology, Plant Sciences Keywords: CRISPR; genome editing; gene editing; forage grass; abiotic stress; plant
Online: 12 December 2022 (01:38:47 CET)
Due to an increase in the consumption of food, feed, and fuel, and to meet global food security needs for the rapidly growing human population, there is a necessity to obtain high-yielding crops that can adapt to future climate changes. Currently, the main feed source used for ruminant livestock production is forage grasses. In temperate climate zones, perennial grasses grown for feed are widely distributed and tend to suffer under unfavorable environmental conditions. Gene editing has been shown to be an effective tool for the development of abiotic stress-resistant plants. The highly versatile CRISPR-Cas system enables increasingly complex modifications in genomes while maintaining precision and a low frequency of off-target mutations. In this review, we provide an overview of forage grass species that have been subjected to gene editing. We offer a perspective on the generation of plants resilient to abiotic stresses. Because many factors contribute to these stresses, the review focuses on drought, salt, heat, and cold stresses. The application of new genomic techniques (e.g., CRISPR-Cas) makes it possible to address several challenges caused by climate change and abiotic stresses in developing forage grass cultivars with improved adaptation to future climatic conditions. Gene editing will contribute towards developing safe and sustainable food systems.
ARTICLE | doi:10.20944/preprints202211.0039.v1
Subject: Life Sciences, Genetics Keywords: Staphylococcus aureus, MRSA ST239, osteomyelitis, genome features, adaptation; chronic infection
Online: 2 November 2022 (03:34:29 CET)
The increasing frequency of isolation of methicillin-resistant Staphylococcus aureus (MRSA) limits the chances of effective antibacterial therapy of staphylococcal diseases and results in the development of persistent infections such as bacteremia and osteomyelitis. The aim of this study was to identify features of the MRSA ST239 0943-1505-2016 (SA943) genome that contribute to the formation of both acute and chronic musculoskeletal infections. The analysis was performed using comparative genomics data of the dominant epidemic S. aureus lineages, namely ST1, ST8, ST30, ST36 and ST239. The SA943 genome encodes proteins that provide resistance to the host immune system, suppress immunological memory and form biofilms. The molecular mechanisms of adaptation responsible for the development of persistent infection were as follows: amino acid substitutions in PBP2 and PBP2a, providing resistance to ceftaroline; loss of a large part of prophage DNA and restoration of the nucleotide sequence of beta-hemolysin, which greatly facilitates escape of phagocytosed bacteria from the phagosome and formation of biofilms; dysfunction of the AgrA system due to the presence of psm-mec and several amino acid substitutions in AgrC; partial deletion of the nucleotide sequence in genomic island vSAβ, resulting in the loss of two proteases of the Spl operon; and deletion of SD repeats in the SdrE amino acid sequence.
ARTICLE | doi:10.20944/preprints202210.0220.v1
Subject: Biology, Animal Sciences & Zoology Keywords: Haemonchus contortus; nematode; genome; transcriptome; microbiome; host-parasite interactions; vaccine
Online: 17 October 2022 (02:09:27 CEST)
The emergence of drug-resistant parasitic nematodes of both humans and livestock calls for the development of alternative and cost-effective control strategies. For the economically important ruminant strongylid Haemonchus contortus, Barbervax® remains the only registered vaccine available. Here we compared the microbiome, genome-wide diversity and transcriptome of H. contortus adult male populations that survived vaccination with an experimental vaccine after inoculation in sheep. Our genome-wide SNP analysis revealed 16 putative candidate vaccine evasion genes. However, we did not identify any evidence for changes in microbial community profiles, based on 16S rRNA gene sequencing results, in vaccine-surviving parasite populations. A total of 58 genes were identified as significantly differentially expressed, six of which were long non-coding (lnc) RNAs and none of which were putative candidate SNP-associated genes. The genes highly upregulated in surviving parasites from vaccinated animals were associated with GO terms belonging predominantly to molecular function and a few biological processes that may have facilitated evasion or potentially lessened the effect of the vaccine. These included five targets: astacin (ASTL), carbonate dehydratase (CA2), phospholipase A2 (PLA2), glutamine synthetase (GLUL) and fatty acid-binding protein (FABP3). We searched all five DEG targets against the proteomes of selected Nematoda (Clades III, V, IV, C, I) and Platyhelminthes (Clades Monogenea, Trematoda, Cestoda, Rhabditophora) to determine homologs within the H. contortus NZ_HCO_NP v1.0 genome and identified single-copy orthologous groups (OGs) in selected proteomes.
All but one (FABP3) demonstrated high levels of duplication and wide-spread occurrence in closely related Caenorhabditis elegans and Pristionchus pacificus, with complete absence of all five gene targets among other Clade III (Toxocara canis) and V (Ascaris suum, Ascaris lumbricoides and Parascaris univalens) nematodes, further supporting their vital biological functions in nematodes. Phylogenetic analyses inferred the presence of only ASTL and CA2 in almost all Nematoda, platyhelminthes and metazoans examined, with loss of GLULs observed among all outgroup vertebrate species and the presence of FABP3 in only three other species (Schmidtea mediterranea, Fasciola gigantica and F. hepatica). Our tertiary structure predictions and modelling analyses were used to perform in silico searches of all published and commercially available inhibitor molecules or substrate analogues with potential broad-spectrum efficacy against nematodes of human and veterinary importance.
ARTICLE | doi:10.20944/preprints202209.0004.v2
Subject: Life Sciences, Virology Keywords: phage Rih21; MRSA; novel bacteriophage; S. aureus; bacteriophage; phage genome analysis
Online: 5 October 2022 (10:13:04 CEST)
A novel bacteriophage was isolated from hospital wastewater and characterized. According to its characterization, this bacteriophage belongs to the Siphoviridae family; its maximum titer was recorded at 37°C and pH 7.2; and it has a 44,789 bp linear double-stranded DNA genome containing 61 genes, all of which encode proteins. The bacteriophage does not carry any virulence factors or antimicrobial resistance genes and had specific lytic activity against some antimicrobial-resistant S. aureus clinical isolates.
ARTICLE | doi:10.20944/preprints202204.0005.v1
Subject: Medicine & Pharmacology, Other Keywords: genome mining; marine environments; molecular networking; bacterial extremophiles; secondary metabolites
Online: 1 April 2022 (10:21:11 CEST)
Understanding extremophiles and their usefulness in biotechnology involves studying their habitat, physiology and biochemical adaptations, as well as their ability to produce biocatalysts, in environments that are still poorly explored. One site studied here comprises saline lagoons of marine origin on the Pacific Ocean coast of northwestern Peru; the other lies on the Atlantic Ocean coast of Brazil. Both environments are considered extreme. The objective of the present work was to compare, at the metabolic level, two different strains isolated from these extreme environments using molecular networking methodology through the Global Natural Products Social Molecular Networking (GNPS) platform. In our study, the MS/MS spectra from the network were compared with GNPS spectral libraries, where the metabolites were annotated. Differences were observed between the molecular networks of the two strains of Streptomyces spp. from these two different environments. Among the annotated compounds from marine bacteria, the metabolites characterized for Streptomyces sp. B-81 from Peruvian marshes were lobophorins A (1) and H (2), as well as divergolides A (3), B (4) and C (5). Streptomyces sp. 796.1 produced different compounds, such as glucopiericidin A (6) and dehydro-piericidin A1a (7). The search for new metabolites in underexplored environments may therefore reveal compounds with potential application in different areas of biotechnology.
BRIEF REPORT | doi:10.20944/preprints202201.0057.v1
Subject: Life Sciences, Virology Keywords: Dengue virus; complete genome; Cosmopolitan genotype; Senegal; 2018; Regional diversification
Online: 6 January 2022 (09:56:19 CET)
To assess the genetic diversity of dengue virus 2 (DENV-2) circulating in Senegal in 2018, we performed molecular characterization by complete genome sequencing and phylogenetic analysis. The sequenced strains belong to the Cosmopolitan genotype of DENV-2, and we observed intra-genotype variability leading to a divergence into two clades with differential geographic distribution. We report two variants: the “Northern variant”, harbouring three nonsynonymous mutations (V1183M, R1405K, P2266T) located in NS2A, NS2B and NS4A, respectively, and the “Western variant”, with two nonsynonymous mutations (V1185E, V3214E) located in the NS2A and NS5 genes, respectively. These findings call for in-depth in vitro and functional studies to elucidate the impact of the observed mutations on viral fitness, spread, epidemiology and disease outcome.
ARTICLE | doi:10.20944/preprints202111.0557.v1
Subject: Biology, Other Keywords: Bacterial nomenclature; archaeal nomenclature; genome taxonomy; shotgun metagenomics; Candidatus names
Online: 30 November 2021 (10:53:50 CET)
Thousands of new bacterial and archaeal species and higher-level taxa are discovered each year through the analysis of genomes and metagenomes. The Genome Taxonomy Database (GTDB) provides hierarchical sequence-based descriptions and classifications for new and as-yet-unnamed taxa. However, bacterial nomenclature, as currently configured, cannot keep up with the need for new well-formed names. Instead, microbiologists have been forced to use hard-to-remember alphanumeric placeholder labels. Here, we exploit an approach to the generation of well-formed arbitrary Latinate names at a scale sufficient to name tens of thousands of unnamed taxa within GTDB. These newly created names represent an important resource for the microbiology community, facilitating communication between bioinformaticians, microbiologists and taxonomists, while populating the emerging landscape of microbial taxonomic and functional discovery with accessible and memorable linguistic labels.
ARTICLE | doi:10.20944/preprints202111.0517.v1
Subject: Biology, Other Keywords: Rhodotorula babjevae; de-novo hybrid assembly; Nanopore sequencing; genome divergence
Online: 29 November 2021 (07:57:39 CET)
The genus Rhodotorula includes basidiomycetous oleaginous yeast species. R. babjevae can produce compounds of biotechnological interest such as lipids, carotenoids and biosurfactants from low value substrates such as lignocellulose hydrolysate. High-quality genome assemblies are needed to develop genetic tools and to understand fungal evolution and genetics. Here, we combined short- and long-read sequencing to resolve the genomes of two R. babjevae strains, CBS 7808 (type strain) and DBVPG 8058 at chromosomal level. Both genomes have a size of 21 Mbp and a GC content of 68.2%. Allele frequency analysis indicated tetraploidy in both strains. They harbor 21 putative chromosomes with sizes ranging from 0.4 to 2.4 Mb. In both assemblies, the mitochondrial genome was recovered in a single contig, which shared 97% pairwise identity. The pairwise identity between the majority of chromosomes ranges from 82% to 87%. We found indications for strain-specific extrachromosomal endogenous DNA. 7,591 protein-coding genes and 7,607 associated transcripts were annotated in CBS 7808 and 7,481 protein-coding genes and 7,516 associated transcripts in DBVPG 8058. CBS 7808 has accumulated a higher number of tandem duplications than DBVPG 8058. We identified large translocation events between putative chromosomes and a high genetic divergence between the two strains.
ARTICLE | doi:10.20944/preprints202110.0093.v1
Subject: Life Sciences, Genetics Keywords: genome, DNA, alphabet, matrices, tensor product, quantum informatics, stochastic resonance.
Online: 5 October 2021 (16:25:34 CEST)
The article presents new results of the author, which add to his previously published ones, from studying hidden rules and symmetries in the structures of long single-stranded DNA sequences in eukaryotic and prokaryotic genomes. The author uses the existence of different alphabets of n-plets in DNA: the alphabet of 4 nucleotides, the alphabet of 16 duplets, the alphabet of 64 triplets, etc. Each of these DNA alphabets of n-plets can serve for constructing a text as a chain of these n-plets. Using this possibility, the author represents any long DNA nucleotide sequence as a bundle of many so-called n-texts, each of which is written on the basis of one of these alphabets of n-plets. Each such n-text has its own individual percentages of the different n-plets in its genomic DNA. It turns out, however, that in such a multi-alphabetical or multilayer presentation of each of the many genomic DNAs analyzed by the author, universal rules of probabilities and symmetry exist in the interrelations of its different n-texts regarding their percentages of n-plets. In this study, the tensor product of matrices and vectors is used as an effective analytical tool borrowed from the arsenal of quantum mechanics. Some additions to the topic of algebra-holographic principles in genetics are also presented. Taking into account the described genomic rules of probability, the author also puts forward a concept of the important role of stochastic resonances in genetic informatics.
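The idea of reading one DNA sequence as several n-texts can be sketched in a few lines. The sketch assumes that an n-text is the sequence cut into consecutive, non-overlapping n-plets starting at the first nucleotide, with any incomplete trailing n-plet dropped; the function name and the toy sequence are illustrative and do not come from the article.

```python
from collections import Counter


def nplet_percentages(sequence, n):
    """Percentage of each n-plet when the sequence is read as an n-text.

    Assumption: the n-text is the chain of consecutive, non-overlapping
    n-plets starting at position 0; a trailing fragment shorter than n
    is ignored.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    seq = sequence.upper()
    usable = len(seq) - len(seq) % n
    nplets = [seq[i:i + n] for i in range(0, usable, n)]
    if not nplets:
        return {}
    counts = Counter(nplets)
    return {nplet: 100.0 * count / len(nplets) for nplet, count in counts.items()}


if __name__ == "__main__":
    toy = "ACGTACGGTTACGA"  # toy sequence, not genomic data
    for n in (1, 2, 3):
        print(n, nplet_percentages(toy, n))
```

Comparing the dictionaries returned for n = 1, 2, 3 and so on is the kind of cross-layer comparison of n-plet percentages that the abstract refers to.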
ARTICLE | doi:10.20944/preprints202102.0604.v1
Subject: Life Sciences, Biochemistry Keywords: West Nile Virus; outbreak; meningoencephalitis; epidemiology; phylogeny; whole genome sequencing
Online: 26 February 2021 (09:46:38 CET)
During the last decades, West Nile Virus (WNV) outbreaks have continuously occurred in the Mediterranean area. In August 2020, a new WNV outbreak affected 71 people with meningoencephalitis in Andalusia, with 6 more cases in Extremadura (south-west of Spain), causing a total of eight deaths. The whole genomes of four viral isolates were obtained and phylogenetically analyzed in the context of recent outbreaks. The Andalusian viral samples belonged to lineage 1 and were relatively similar to those from previous outbreaks that occurred in the Mediterranean region. Here we present a detailed analysis of the outbreak, including an extensive phylogenetic study.
REVIEW | doi:10.20944/preprints202101.0110.v2
Subject: Biology, Anatomy & Morphology Keywords: Amphiploidy; Disomic Polyploidy; Plant Genome Evolution; Neo-polyploidy; Polysomic Polyploidy
Online: 23 February 2021 (14:25:28 CET)
Polyploidy means having more than two basic sets of chromosomes. Polyploid plants may be artificially obtained through chemical, physical and biological (2n gametes) methods. This approach allows an increased gene scope and expression, thus resulting in phenotypic changes such as yield and product quality. Nonetheless, breeding new cultivars through induced polyploidy should overcome deleterious effects that are partly caused by genome and epigenome instability after polyploidization. Furthermore, shortening the time required from early chromosome set doubling to the final selection of high-yielding superior polyploids is a must. Despite these hurdles, plant breeders have successfully obtained polyploid bred germplasm in a broad range of forages after optimizing methods, concentration and time, particularly when using colchicine. These experimental polyploids are a valuable tool for understanding gene expression, which seems to be driven by dosage-dependent gene expression, altered gene regulation and epigenetic changes. Isozymes and DNA-based markers facilitated the identification of rare alleles for particular loci when compared with diploids, and also explained their heterozygosity, phenotypic plasticity and adaptability to diverse environments. Experimentally induced polyploid germplasm could enhance fresh herbage yield and quality, e.g. leaf protein content, leaf total soluble solids, water-soluble carbohydrates and sucrose content. Offspring of experimentally obtained hybrids should undergo selection for several generations to improve their performance and stability.
ARTICLE | doi:10.20944/preprints202012.0421.v1
Online: 17 December 2020 (09:13:29 CET)
Whole genome pooled sequence data of 12 Pakistani Teddy goats were analyzed for positive selection signatures underlying their breed-defining characteristics. Selection imprints left in the Teddy genome were unveiled by genomic differentiation after the successful paired-end alignment of 635,357,043 reads with the ARS1 reference genome assembly. Pooled heterozygosity (Hp) and Tajima’s D (TD) were applied for validation and to obtain better hits of selection signals, while pairwise FST statistics were computed for Teddy vs. Bezoar (wild goat ancestor) for genomic differentiation. Annotation of regions under positive selection revealed 59 genes underlying production and adaptive traits. A pooled heterozygosity score ≥ 5 detected six windows with the highest scores on Chr. 29, 9, 25, 15 and 14, which harbor the HRASLS5, LACE1 and AXIN1 genes, candidates for embryonic development, lactation and body height. Secondly, a TD value ≤ -2.2 showed four windows with very strong hits on Chr. 5 and 9, which harbor the STIM1 and ADM genes related to body mass and weight. Lastly, FST analysis generated three strong signals with a threshold ≤ 0.42 on Chr. 12 and 5, which harbor the ITGB1 gene associated with milk production and lactation traits. Other significant selection signatures encompass genes associated with wool production, prolificacy, immunity and coat colors. In brief, this study identified the genes under selection in this Pakistani goat breed, which will be helpful for refining future breeding policies, converging required productive traits within and across other goat breeds, and exploring the full genetic potential of this valued livestock species.
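For readers unfamiliar with pooled heterozygosity, a minimal sketch follows. It assumes the commonly used per-window definition Hp = 2 * sum(n_MAJ) * sum(n_MIN) / (sum(n_MAJ) + sum(n_MIN))^2, where n_MAJ and n_MIN are the major- and minor-allele read counts at each SNP in the window, followed by a Z-transformation across windows. Whether the study used exactly this definition, window size, or the population rather than sample standard deviation is an assumption, and the read counts below are made up.

```python
from statistics import mean, pstdev


def pooled_heterozygosity(snp_counts):
    """Hp for one window from (major_reads, minor_reads) pairs, one per SNP."""
    counts = list(snp_counts)
    sum_major = sum(major for major, _ in counts)
    sum_minor = sum(minor for _, minor in counts)
    total = sum_major + sum_minor
    if total == 0:
        raise ValueError("window contains no reads")
    return 2.0 * sum_major * sum_minor / total ** 2


def z_transform(values):
    """Z-scores of per-window values (population standard deviation assumed)."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        raise ValueError("all windows have the same value")
    return [(v - mu) / sigma for v in values]


if __name__ == "__main__":
    # Hypothetical windows, each a list of (major, minor) read counts per SNP.
    windows = [
        [(10, 9), (12, 11)],   # high diversity
        [(15, 8), (14, 10)],
        [(29, 1), (30, 0)],    # near fixation, as expected under a sweep
    ]
    hp = [pooled_heterozygosity(w) for w in windows]
    print(hp)
    print(z_transform(hp))
```

Windows with strongly negative Z-scores have unusually low heterozygosity; studies often report the negated score so that candidate sweeps appear as large positive values.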
CONCEPT PAPER | doi:10.20944/preprints202010.0368.v1
Subject: Life Sciences, Biochemistry Keywords: proteoform; human genome project; proteomics; post-translational modification; human proteome
Online: 19 October 2020 (10:49:39 CEST)
Proteins are the primary effectors of function in biology, and thus complete knowledge of their structure and properties is fundamental to deciphering function in basic and translational research. The chemical diversity of proteins is expressed in their many proteoforms, which result from combinations of genetic polymorphisms, RNA splice variants and post-translational modifications. This knowledge is foundational for the biological complexes and networks that control biology, yet remains largely unknown. We propose here an ambitious initiative to define the human proteome; that is to generate a definitive reference set of the proteoforms produced from the genome. Several examples of the power and importance of proteoform-level knowledge in disease-based research are presented, along with a call for improved technologies in a two-pronged strategy to accomplish the Human Proteoform Project.
REVIEW | doi:10.20944/preprints202006.0086.v2
Subject: Life Sciences, Virology Keywords: SARS-CoV-2; Genome organisation and expression; Polyproteins; Prevention strategies
Online: 14 June 2020 (16:49:10 CEST)
COVID-19 manifests as a severe acute respiratory condition caused by a novel beta coronavirus (SARS-CoV-2), which is reported to be the seventh coronavirus to infect humans. Like other SARS-CoVs, it has a large positive-stranded RNA genome. However, a specific furin site in the spike protein and a mutation-prone, phylogenetically messy Orf1ab separate SARS-CoV-2 from other RNA viruses. Since the outbreak (February - March 2020), which originated in China, researchers, scientists, and medical professionals have been examining the virus from every possible aspect, including its replication, detection, and prevention strategies. This led to the prompt identification of its basic biology, genome characterization, structure-based functional information on its proteins, and strategies to prevent its spread. Due to the rapid mutation rate, the functional characterization of a few proteins is still lagging. This review summarizes recent updates on the basic molecular biology of SARS-CoV-2 and the prevention strategies undertaken worldwide to tackle COVID-19. This recent information can be used for the design and development of therapeutics against SARS-CoV-2.
ARTICLE | doi:10.20944/preprints202006.0089.v1
Subject: Life Sciences, Genetics Keywords: Wuhua yellow chicken; whole genome resequencing; heritable variation; selection signal
Online: 7 June 2020 (14:42:23 CEST)
Chickens have extensive phenotypic variation. The Wuhua yellow chicken (WHYC) is an important traditional yellow-feathered chicken in China, characterized by white tail feathers, white flight feathers, and strong disease resistance. However, the genomic basis of traits associated with WHYC is still poorly understood. In this study, whole genome resequencing was performed with an average coverage of 20.77-fold to investigate heritable variation and identify selection signals in WHYC. Reads were mapped onto the chicken reference genome (Galgal5) with a coverage of 85.95%. After quality control, 11,953,471 SNPs and 1,069,574 InDels were obtained. In addition, 41,408 structural variants and 33,278 copy number variants were found. A comparative genomic analysis of WHYC and other yellow-feathered chickens showed that selected regions were enriched in genes involved in transport and catabolism, immune system, infectious diseases, signal transduction, and signaling molecules and interaction. Several genes associated with disease resistance were identified, including IFNA, IFNB, CD86, IL18, IL11RA, VEGFC, and ATG10. Furthermore, PMEL and TYRP1 may contribute to the coloring of white feathers in WHYC. These findings improve our understanding of the genetic characteristics of WHYC and may contribute to future breed improvement.
Subject: Medicine & Pharmacology, General Medical Research Keywords: COVID-19; SARS-CoV-2; virus; mutation; polymorphism; genome sequence
Online: 21 May 2020 (04:09:53 CEST)
Background: SARS-CoV-2 infection has spread to over 200 countries since it was first reported in December of 2019. Significant country-specific variations in infection and mortality rate have been noted. Although country-specific differences in public health response have had a large impact on infection rate control, it is currently unclear whether evolution of the virus itself has also contributed to variations in infection and mortality rate. Previous studies on SARS-CoV-2 mutations were based on the analysis of ~160 SARS-CoV-2 sequences available until mid-February 2020 [2-5]. By mid-April, >550 SARS-CoV-2 sequences had been deposited in GenBank, and over 8,200 in the GISAID database. Methods: We performed a sequence analysis on 474 SARS-CoV-2 genomes submitted to GenBank up to April 11, 2020 by multiple alignment using Map to a Reference Assembly and Variants/SNP identification. The results were verified on a larger scale, using 8,126 hCoV-19 (SARS-CoV-2) sequences from the GISAID database. Results: Our analysis highlights 5 frequent new mutations that have emerged since late February 2020 and are present in many isolates (up to 40%). These mutations are: one missense (non-synonymous) mutation each in orf1ab (C1059T), orf3 (G25563T) and orf8 (C27964T), one in the 5’UTR (C241T), and one in a non-coding region (G29553A). The final mutation (G29553A) was found to be almost exclusive to the US isolates. The first 3 mutations are non-synonymous, leading to amino acid substitutions in the viral protein sequence. Except for C241T, all the novel mutations identified are absent in the isolates from Italy and Spain among the SARS-CoV-2 genomes deposited in GenBank and GISAID by April 13, 2020. Conclusion: The results of the current study indicate that new mutations are emerging as the COVID-19 pandemic spreads to different countries and that geography-specific mutants may exist.
The findings of the current study lay the foundation for further investigation into the impact of SARS-CoV-2 mutations on disease incidence, severity, and host immune response. In addition, they may also provide insights into vaccine development and serological response detection for the virus.
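The variant identification step described in the Methods can be pictured with a simplified sketch. It assumes each genome has already been aligned to the reference so that reference and query strings have equal length with "-" marking gaps; it reports only single-nucleotide substitutions in the REF-position-ALT notation used above (for example C241T) and ignores insertions, deletions and ambiguous bases. The function names and toy sequences are illustrative and are not the tool used in the study.

```python
from collections import Counter


def call_substitutions(reference, aligned_query):
    """List substitutions such as 'C241T' from a pairwise-aligned sequence pair."""
    if len(reference) != len(aligned_query):
        raise ValueError("sequences must be aligned to the same length")
    variants = []
    ref_pos = 0  # 1-based coordinate on the ungapped reference
    for ref_base, query_base in zip(reference.upper(), aligned_query.upper()):
        if ref_base == "-":
            continue  # insertion in the query: no reference coordinate
        ref_pos += 1
        if ref_base in "ACGT" and query_base in "ACGT" and ref_base != query_base:
            variants.append(f"{ref_base}{ref_pos}{query_base}")
    return variants


def mutation_frequencies(reference, aligned_queries):
    """Fraction of genomes carrying each substitution."""
    if not aligned_queries:
        return {}
    counts = Counter()
    for query in aligned_queries:
        counts.update(set(call_substitutions(reference, query)))
    return {mut: n / len(aligned_queries) for mut, n in counts.items()}


if __name__ == "__main__":
    reference = "ACGTACGT"                       # toy reference
    genomes = ["ATGTACGT", "ATGTACGA", "ACGTACGT"]  # toy aligned genomes
    print(mutation_frequencies(reference, genomes))
```

Grouping the input genomes by country before calling `mutation_frequencies` would give the kind of geography-specific comparison the abstract describes.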
REVIEW | doi:10.20944/preprints202004.0359.v1
Subject: Life Sciences, Microbiology Keywords: SARS-CoV-2; genetic diversity; genome evolution; diagnostics; therapeutics; vaccines
Online: 20 April 2020 (02:33:15 CEST)
A novel coronavirus disease, COVID-19, first emerged in Wuhan city of Hubei Province in China in December 2019. COVID-19 has since spread to 213 countries and territories and has become a pandemic. Genomic analyses have indicated that the virus, popularly referred to as corona, originated through a natural process and is probably not a purposefully manipulated laboratory construct. However, currently available data are not sufficient to precisely determine the origin of this fearsome virus. Genome-wide annotation of thousands of genomes revealed that more than 1,407 nucleotide mutations and 722 amino acid replacements occurred at different positions of SARS-CoV-2. The spike (S) glycoprotein of SARS-CoV-2 possesses a functional polybasic (furin) cleavage site at the S1-S2 boundary through the insertion of 12 nucleotides. This leads to the predicted acquisition of three O-linked glycans around the cleavage site. Although real-time RT-PCR methods targeting specific gene(s) have been widely used to diagnose COVID-19 patients, recently developed, more convenient, rapid, and specific diagnostic tools targeting IgM/IgG, or newly developed plug-and-play methods, should be made available for resource-poor developing countries. Some drugs, vaccines and therapies have shown great promise in early trials; however, these candidate preventive or therapeutic agents have to pass through a long series of trials before being released for practical application against COVID-19. This review updates current knowledge on the origin, genomic evolution, development of diagnostic tools, and preventive or therapeutic remedies for COVID-19, and discusses the scope for further research and for effective management and surveillance of COVID-19.
REVIEW | doi:10.20944/preprints201911.0076.v3
Subject: Life Sciences, Genetics Keywords: phase separation; nuclear bodies; self-assembly; genome organization; gene expression
Online: 11 December 2019 (11:17:34 CET)
The importance of genome organization at the supranucleosomal scale in the control of gene expression is increasingly recognized today. In mammals, Topologically Associating Domains (TADs) and the active / inactive chromosomal compartments are two of the main nuclear structures that contribute to this organization level. However, recent works reviewed here indicate that, at specific loci, chromatin interactions with nuclear bodies could also be crucial to regulate genome functions, in particular transcription. They moreover suggest that these nuclear bodies are membrane-less organelles dynamically self-assembled and disassembled through mechanisms of phase separation. We have recently developed a novel genome-wide experimental method, High-salt Recovered Sequences sequencing (HRS-seq), which allows the identification of chromatin regions associated with large ribonucleoprotein (RNP) complexes and nuclear bodies. We argue that the physical nature of such RNP complexes and nuclear bodies appears to be central in their ability to promote efficient interactions between distant genomic regions. The development of novel experimental approaches, including our HRS-seq method, is opening new avenues to understand how self-assembly of phase separated nuclear bodies possibly contributes to mammalian genome organization and gene expression.
ARTICLE | doi:10.20944/preprints201809.0378.v1
Subject: Biology, Other Keywords: enterobacteriaceae; antibiotics; beta-lactamases; beta-lactam resistome; whole genome sequencing
Online: 19 September 2018 (09:47:42 CEST)
Beta-lactam resistant bacteria, commonly resident in tertiary hospitals, have emerged as a worldwide health problem linked to the intake of ready-to-eat vegetables. We aimed to characterize the genes providing resistance to beta-lactam antibiotics in Enterobacteriaceae isolated from five commercial salad brands for human consumption in Mexico City. Twenty-five samples were collected and grown on blood agar plates; the bacteria were identified biochemically and antimicrobial susceptibility testing was performed. The gene families carried were identified by endpoint PCR, and the specific genes were confirmed by whole genome sequencing (WGS) using next-generation sequencing (NGS). Twelve positive cultures were identified, with the following microbiological distribution: 8.3% Enterobacter aerogenes (n=1), 8.3% Serratia fonticola (n=1), 16.7% Serratia marcescens (n=2), 16.7% Klebsiella pneumoniae (n=2), and 50% Enterobacter cloacae (n=6). The endpoint PCR results showed 11 colonies positive for blaBIL (91.7%), 11 for blaSHV (91.7%), 11 for blaCTX (91.7%), 12 for blaDHA (100%), 4 for blaVIM (33.3%), 2 for blaOXA (16.7%), 2 for blaIMP (16.7%), 1 for blaKPC (8.3%) and 1 for blaTEM (8.3%); all samples were negative for the blaROB, blaCMY, blaP, blaCFX and blaLAP genes. The sequencing analysis reveals specific genotypes for Enterobacter cloacae (blaSHV-12, blaCTX-M-15, blaDHA-1, blaKPC-2); Serratia marcescens (blaSHV-1, blaCTX-M-3, blaDHA-1, blaVIM-2); Klebsiella pneumoniae (blaSHV-12, blaCTX-M-15, blaDHA-1); Serratia fonticola (blaSHV-12, blaVIM-1, blaDHA-1) and Enterobacter aerogenes (blaSHV-1, blaCTX-M-1, blaDHA-1, blaVIM-2, blaOXA-9). Our results indicate that beta-lactam resistant bacteria have acquired integrons with different numbers of genes that provide panresistance to beta-lactam antibiotics, including penicillins, oxacillins, cephalosporins, monobactams, carbapenems and imipenems.
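The PCR percentages in the abstract above are proportions of the 12 positive cultures. As a minimal sketch of that arithmetic, the counts below are copied from the abstract, while the function name `prevalence` and the dictionary layout are illustrative assumptions rather than anything the authors describe:

```python
def prevalence(count, total):
    """Return count/total as a percentage rounded to one decimal place."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100.0 * count / total, 1)


# Number of positive cultures reported in the abstract.
TOTAL_POSITIVE_CULTURES = 12

# Colonies positive per gene family by endpoint PCR (counts from the abstract).
pcr_positive = {
    "blaBIL": 11, "blaSHV": 11, "blaCTX": 11, "blaDHA": 12, "blaVIM": 4,
    "blaOXA": 2, "blaIMP": 2, "blaKPC": 1, "blaTEM": 1,
}

# Percentage of positive cultures carrying each gene family.
percentages = {
    gene: prevalence(n, TOTAL_POSITIVE_CULTURES)
    for gene, n in pcr_positive.items()
}
```

Recomputing the proportions from the raw counts in this way is a quick consistency check on any reported percentage.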
ARTICLE | doi:10.20944/preprints201809.0337.v1
Subject: Life Sciences, Virology Keywords: Echovirus 7; Echovirus 19; Nigeria; Enterovirus Species B; Complete Genome
Online: 18 September 2018 (09:39:11 CEST)
We describe the genomes of two Echovirus isolates from Nigeria as reference enterovirus species B genomes for the region. These Echovirus 7 and 19 genomes are 7,411 nt and 7,426 nt long, and were recovered from sewage-contaminated water (in 2010) and an acute flaccid paralysis case (in 2014), respectively.
ARTICLE | doi:10.20944/preprints201804.0106.v1
Subject: Biology, Plant Sciences Keywords: Clematis; chloroplast genome; rearrangement; inversion; IR expansion; synonymous substitution rate
Online: 9 April 2018 (10:34:28 CEST)
Genus Clematis is one of the largest within Ranunculaceae. Here we report the chloroplast genomes of two Clematis species endemic to Korea, C. brachyura and C. trichotoma. The chloroplast genome lengths of C. brachyura and C. trichotoma are 159,532 bp and 159,170 bp, respectively. Gene content in the complete chloroplast genomes of these two Clematis species is identical to that of most Ranunculaceae and other angiosperms. However, our results demonstrate that genus Clematis has inversion and rearrangement events involving the rps4 gene, the rps16 to trnH region, and the trnL to ndhC region, as well as expansion of the inverted repeat (IR) regions. Comparison of IR regions among Ranunculaceae species revealed that the IR of Clematis species contains six protein-coding genes (infA, rps8, rpl14, rpl16, rps3, and rpl22) usually found in the large single copy (LSC) region of other species. Phylogenetic analysis demonstrated that genus Clematis is closely related to genus Ranunculus. Differences in repeat structure, substitution rates, and IR expansion between genera Clematis and Ranunculus explained their relationship. Clematis species showed slightly higher tandem repeat content than Ranunculus species. The six protein-coding genes showed lower synonymous substitution rates in the IR of Clematis species than in the LSC of Ranunculus species. Overall, the chloroplast genomes and results presented here provide important information on the evolution of Ranunculaceae.
ARTICLE | doi:10.20944/preprints202208.0178.v1
Subject: Medicine & Pharmacology, Psychiatry & Mental Health Studies Keywords: alcohol dependence; comorbidity; gene network; genome-wide association study; sex differences
Online: 9 August 2022 (10:35:29 CEST)
At least 50% of factors predisposing to alcohol dependence (AD) are genetic and women affected with this disorder present with more psychiatric comorbidities, probably indicating different genetic factors involved. We aimed to run a genome-wide association study (GWAS) followed by a bioinformatic functional annotation of associated genomic regions in male and female patients with AD and eight related clinical measures. A genome-wide significant association of rs220677 with AD (p-value = 1.33×10^-8 calculated with the Yates-corrected Chi-square test under the assumption of dominant inheritance) was discovered in female patients. Associations of AD and related clinical measures with seven other single nucleotide polymorphisms listed in previous GWAS of psychiatric and addiction traits were differently replicated in male and female patients. The bioinformatic analysis showed that regulatory elements in the eight associated linkage disequilibrium blocks define the expression of 80 protein-coding genes. Nearly 68% of these and of 120 previously published coding genes associated with alcohol phenotypes directly interact in a single network. This study indicates that a number of genes behind the pathogenesis of AD are different in male and female patients, but implicated molecular mechanisms are functionally connected. The results also suggest the genetic basis of sex-specific psychiatric comorbidities of AD.
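The association test named in the abstract above, a Yates-corrected Chi-square test under dominant inheritance, can be sketched for a single SNP as a 2×2 table of carriers versus non-carriers of the risk allele in cases and controls. The sketch below is only an illustration: the function name and the example counts are hypothetical, not data from the study, and the p-value uses the closed form for one degree of freedom.

```python
import math


def yates_chi_square(case_carriers, case_noncarriers,
                     control_carriers, control_noncarriers):
    """Yates-corrected chi-square statistic and p-value for a 2x2 table.

    Under a dominant model, 'carriers' pools heterozygotes and
    risk-allele homozygotes. Returns (chi2, p) with 1 degree of freedom.
    """
    a, b, c, d = (case_carriers, case_noncarriers,
                  control_carriers, control_noncarriers)
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be non-zero")
    # Continuity correction: shrink |ad - bc| by n/2, but never below zero.
    diff = max(0.0, abs(a * d - b * c) - n / 2.0)
    chi2 = n * diff ** 2 / (row1 * row2 * col1 * col2)
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x/2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts, for illustration only.
chi2, p = yates_chi_square(60, 40, 30, 70)
```

In a GWAS the resulting p-value is then compared with a genome-wide significance threshold, conventionally 5×10^-8, which the p-value reported in the abstract falls below.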
ARTICLE | doi:10.20944/preprints202202.0166.v1
Subject: Life Sciences, Microbiology Keywords: microbiological characterization; safety; VanZ; isolation; vancomycin resistant gene; genome; bee; honey
Online: 11 February 2022 (21:17:45 CET)
Bifidobacteria have long been recognized as bacteria with probiotic and therapeutic features. The aim of this work is to characterize the Bifidobacterium asteroides BA15 and BA17 strains, isolated from honeybee gut. An in-depth assessment was carried out on safety properties (antibiotic resistance profiling, β-haemolytic, DNAse and gelatinase activities and virulence factor presence) and other properties (antimicrobial activity, auto-aggregation, co-aggregation and hydrophobicity). Based on phenotypic and genotypic characterization, both strains satisfied all the safety requirements. More specifically, genome analysis showed the absence of genes encoding resistance to glycopeptides (vanA, vanB, vanC-1, vanC-2, vanD, vanE, vanG) and to tetracycline (tet-M, tet-L and tetO), as well as the absence of virulence genes (asa1, gelE, cylA, esp, hyl).
ARTICLE | doi:10.20944/preprints202111.0467.v1
Subject: Chemistry, Organic Chemistry Keywords: myxobacteria; secondary metabolites; biarylitide; natural product discovery; RiPPs; genome mining; myxarylin
Online: 25 November 2021 (10:42:39 CET)
Ribosomally synthesized and post-translationally modified peptides (RiPPs) are a structurally diverse group of natural products. They feature a wide range of intriguing posttranslational modifications as exemplified by the biarylitides. These are a family of cyclic tripeptides found in Planomonospora, carrying a biaryl-linkage between two aromatic amino acids. Recent genomic analyses revealed the minimal biosynthetic prerequisite of biarylitide biosynthesis consisting of only one ribosomally synthesized pentapeptide precursor as substrate and a modifying cytochrome P450 dependent enzyme. In silico analyses revealed that the minimal biarylitide RiPP clusters are widespread among natural product producers across phylogenetic borders including myxobacteria. We report here the genome-guided discovery of the first myxobacterial biarylitide MeYLH termed Myxarylin from Pyxidicoccus fallax An d48. Myxarylin was found to be an N-methylated tripeptide surprisingly exhibiting a C–N biaryl crosslink. In contrast to Myxarylin, previously isolated biarylitides are N-acetylated tripeptides featuring a C–C biaryl crosslink. Furthermore, the formation of Myxarylin was confirmed by heterologous expression of the identified biosynthetic genes in Myxococcus xanthus DK1622. These findings expand the structural and biosynthetic scope of biarylitide type RiPPs and emphasize the distinct biochemistry found in the myxobacterial realm.
Subject: Biology, Agricultural Sciences & Agronomy Keywords: Salmonella enterica; food safety; genome; theory; single nucleotide polymorphisms; recombination; serotype
Online: 31 August 2021 (12:47:33 CEST)
Adenine and thymine homopolymer strings of at least 8 nucleotides (AT 8+mers) were characterized in Salmonella enterica subspecies I. The motif differed between other taxonomic classes but not between Salmonella enterica serovars. The motif in plasmids was associated with serovar. Approximately 12.3% of the S. enterica motif loci had mutations. Mutability of AT 8+mers suggests that genomes undergo frequent repair to maintain optimal gene content, and that the motif facilitates self-recognition; in addition, serovar diversity is associated with plasmid content. A theory that genome regeneration accounts for both persistence of predominant Salmonella serovars and serovar diversity provides a new framework for investigating root causes of foodborne illness.
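Locating the motif described in the abstract above amounts to scanning a sequence for uninterrupted runs of one base. The sketch below assumes that an "AT 8+mer" means a run of eight or more adenines or eight or more thymines; the function name, the regular expression and the toy sequence are illustrative assumptions, not the authors' pipeline.

```python
import re

# Assumption: a motif locus is a run of >= 8 A's or >= 8 T's (single base).
AT_HOMOPOLYMER = re.compile(r"A{8,}|T{8,}")


def find_at_homopolymers(sequence):
    """Return (start, end, run) for each A or T run of length >= 8.

    Coordinates are 0-based and half-open, like Python slices.
    """
    return [
        (match.start(), match.end(), match.group())
        for match in AT_HOMOPOLYMER.finditer(sequence.upper())
    ]


# Toy sequence: an 8-A run, a 10-T run, then a 7-A run that is too short.
toy = "GC" + "A" * 8 + "CG" + "T" * 10 + "A" * 7 + "G"
loci = find_at_homopolymers(toy)
```

Comparing the runs found at the same locus across genomes would then show which motif loci carry mutations.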
REVIEW | doi:10.20944/preprints202107.0182.v1
Subject: Life Sciences, Biochemistry Keywords: CRISPR-Cas9; Genome editing; plant editing; bacterial immune system; genetic disease
Online: 8 July 2021 (09:50:22 CEST)
Clustered regularly interspaced short palindromic repeats, or CRISPR, one of the major technological tools from nature's toolbox, has revolutionized the scientific world with its potential use in humans and plants. CRISPR-Cas9 was first known as an adaptive immune system of bacteria that cleaves foreign DNA. It has been exploited as a genome editing tool for correcting genetic diseases in humans, for creating stress-resistant plants, and for a variety of other purposes. This review provides a basic overview of its applications in different areas of biological research. It has immense potential for a wide range of research, yet much about it remains poorly understood; scientists have so far seen only the tip of the iceberg.
DATA DESCRIPTOR | doi:10.20944/preprints202106.0368.v1
Subject: Life Sciences, Biochemistry Keywords: Microbial Mash database; Mash distance; Genome containment; Type material; Microbial taxonomy
Online: 14 June 2021 (14:54:32 CEST)
The analysis of curated genomic, metagenomic, and proteomic data is of paramount importance in the fields of biology, medicine, education, and bioinformatics. Although this type of data is usually hosted in raw form in free international repositories, accessing it requires substantial computing, storage, and processing capacity from the domestic user. The purpose of this study is to offer the scientific community a comprehensive set of genomic and proteomic reference data in an accessible and easy-to-use form. A representative type material set of genomes, proteomes and metagenomes, associated with the major groups of Bacteria, Archaea, Viruses, and Fungi, was downloaded directly from https://www.ncbi.nlm.nih.gov/assembly/ and from the Genome Taxonomy Database. Sketched databases were subsequently created with the Mash software and stored as compact reduced representations. Our dataset reduces nearly 100 GB of disk space to 585.78 MB and represents 87,476 genomic/proteomic records from eight informative contexts, which have been prefiltered to make them accessible, usable, and user-friendly with limited computational resources. Potential uses of this dataset include, but are not limited to, microbial species delimitation, estimation of genomic distances, detection of genomic novelties, and paired comparisons between proteomes, genomes, and metagenomes.
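The idea behind the sketched databases above is that a genome can be reduced to the smallest hashes of its k-mers and two such sketches compared to estimate a distance. The sketch below illustrates that principle only: it uses `hashlib.blake2b` as a stand-in hash and ignores reverse complements, both simplifying assumptions that differ from the Mash software itself, and all function names are hypothetical. The distance formula D = -(1/k) ln(2j / (1 + j)) is the published Mash distance.

```python
import hashlib
import math


def kmer_hashes(sequence, k):
    """Hash every k-mer of the sequence to a 64-bit integer."""
    sequence = sequence.upper()
    return {
        int.from_bytes(
            hashlib.blake2b(sequence[i:i + k].encode(), digest_size=8).digest(),
            "big",
        )
        for i in range(len(sequence) - k + 1)
    }


def sketch(sequence, k=21, size=1000):
    """Bottom-s MinHash sketch: the `size` smallest k-mer hashes."""
    return sorted(kmer_hashes(sequence, k))[:size]


def jaccard_estimate(sketch_a, sketch_b, size):
    """Estimate the Jaccard index from the bottom `size` hashes of the union."""
    union_bottom = sorted(set(sketch_a) | set(sketch_b))[:size]
    if not union_bottom:
        return 0.0
    shared = set(sketch_a) & set(sketch_b)
    return sum(1 for h in union_bottom if h in shared) / len(union_bottom)


def mash_distance(jaccard, k):
    """Mash distance D = -(1/k) * ln(2j / (1 + j)); 1.0 when nothing is shared."""
    if jaccard <= 0.0:
        return 1.0
    return max(0.0, -math.log(2.0 * jaccard / (1.0 + jaccard)) / k)
```

Because only the sketches need to be stored, the comparison works on a small fraction of the original data, which is what makes the reduced representation practical on modest hardware.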
HYPOTHESIS | doi:10.20944/preprints202105.0520.v1
Subject: Life Sciences, Biochemistry Keywords: genome evolution; ribozymes; RNA ligase; early Earth; autocatalytic sets; RNA world
Online: 21 May 2021 (10:16:35 CEST)
The evolutionary origin of the genome remains elusive. Here, I hypothesize that its first iteration, the protogenome, was a multi-ribozyme RNA. It evolved, likely within liposomes (the protocells) forming in dry-wet cycling environments, through the random fusion of ribozymes by a ligase and was amplified by a polymerase. The protogenome thereby linked, in one molecule, the information required to seed the protometabolism (a combination of RNA-based autocatalytic sets) in newly forming protocells. If this combination of autocatalytic sets was evolutionarily advantageous, the protogenome would have amplified in a population of multiplying protocells. It likely was a quasispecies with redundant information, e.g., multiple copies of one ribozyme. As such, new functionalities could evolve, including a genetic code. Once one or more components of the protometabolism were templated by the protogenome (e.g., when a ribozyme was replaced by a protein enzyme), and/or addiction modules evolved, the protometabolism became dependent on the protogenome. Along with increasing fidelity of the RNA polymerase, the protogenome could grow, e.g., by incorporating additional ribozyme domains. Finally, the protogenome could have evolved into a DNA genome with increased stability and storage capacity. I will provide suggestions for experiments to test some aspects of this hypothesis.
SHORT NOTE | doi:10.20944/preprints202006.0225.v1
Subject: Life Sciences, Virology Keywords: SARS-CoV-2; web application; virus genome; lineage assignment; amino acids
Online: 18 June 2020 (06:24:20 CEST)
Summary: CoV-GLUE is an online web application for the interpretation and analysis of SARS-CoV-2 virus genome sequences, with a focus on amino acid sequence variation. It is based on the GLUE data-centric bioinformatics environment and provides a browsable database of amino acid replacements and coding region indels that have been observed in sequences from the pandemic. Users may also analyse their own SARS-CoV-2 sequences by submitting them to the web application to receive an interactive report containing visualisations of phylogenetic classification and highlighting genomic variation of potentially high impact, for example linked to primer mismatches. Availability and implementation: Available at http://cov-glue.cvr.gla.ac.uk. Implemented using GLUE, an open source framework for the development of virus sequence data resources. Contact: email@example.com
REVIEW | doi:10.20944/preprints201906.0106.v2
Subject: Biology, Other Keywords: virus evolution; cheat; cooperation; social evolution; defective interfering genome; satellite virus
Online: 22 April 2020 (06:02:05 CEST)
The success of many viruses depends upon cooperative interactions between viral genomes. For example, viruses that coinfect the same cell can share essential gene products, such as replicase, the enzyme that replicates the viral genome. However, when cooperation occurs, there is the potential for ‘cheats’ to exploit that cooperation. We suggest that: (1) the biology of viruses makes viral cooperation particularly susceptible to cheating; (2) cheats are common across a wide range of viruses, including viral entities that are already well studied, such as defective interfering genomes, and satellite viruses. Consequently, evolutionary theory developed to explain cheating offers a conceptual framework for understanding and manipulating viral dynamics. At the same time, viruses offer unique opportunities to study how cheats evolve, because cheating is relatively common in viruses, compared with taxa where cooperation is more usually studied, such as animals.