ARTICLE | doi:10.20944/preprints201905.0182.v1
Online: 15 May 2019 (10:09:48 CEST)
A remarkable feature of US federal investments in human genetics has been the availability of parallel funding for studies examining ethical, legal and social implications (ELSI). This funding has allowed ELSI researchers to develop new strategies to understand genetics, evaluate the benefits of genetic testing, and propose health policies that maximize the promise while minimizing harms. Despite these successes, one consequence of this investment is a preoccupation with what is arguably the least actionable system of biomolecules, human DNA. In contrast, the most actionable system of biomolecules, the metabolome, is grossly understudied, despite its often more alarming ELSI.
REVIEW | doi:10.20944/preprints202011.0501.v1
Online: 19 November 2020 (10:41:35 CET)
To fully appreciate genetics, one must understand the link between genotype (DNA sequence) and phenotype (observable characteristics). Advances in high-throughput genomic sequencing technologies and applications, so-called “-omics”, have made genetic sequencing readily available across biology, from non-traditional study organisms to precision medicine. Understanding these tools is therefore critical for any biologist, especially those early in their career. This comprehensive review discusses the chronological development of different sequencing methods, the bioinformatics steps for analyzing the resulting data, and the social and ethical issues raised by these techniques that must be discussed and evaluated.
ARTICLE | doi:10.20944/preprints202105.0750.v1
Subject: Life Sciences, Biochemistry Keywords: COVID-19; SARS-CoV-2 genomics; spike protein; epitope prediction; coronavirus comparative genomics
Online: 31 May 2021 (11:36:29 CEST)
Challenges posed by the coronavirus disease 2019 (COVID-19) pandemic, caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), include understanding what triggered the emergence of SARS-CoV-2, how this RNA virus is evolving, and how its genomic variability may affect the primary structure of proteins that are vaccine targets. We analyzed 19,471 SARS-CoV-2 genomes and 199,984 spike glycoprotein sequences from around the world available in the GISAID database, together with 3,335 genomes of other Coronaviridae family members available in GenBank, collecting high-quality SARS-CoV-2 genomes and distinct Coronaviridae family genomes. Here, we identify a cluster associated with SARS-CoV-2 emergence, containing 13 closely related genomes isolated from bats and pangolins, that showed evidence of recombination, which may have contributed to the emergence of SARS-CoV-2. The analyzed SARS-CoV-2 genomes presented 9,632 single nucleotide polymorphisms (SNPs), corresponding to a variant density of 0.3 over the genome, with a clear geographic distribution. SNPs are unevenly distributed throughout the genome, and mutation hotspots were found in the spike gene and ORF1ab. We describe a set of predicted spike protein epitopes whose variability is negligible. All predicted epitopes for the structural E, M and N proteins are highly conserved. This result favors the continued efficacy of the available vaccines.
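The abstract does not define "variant density". Assuming it means the number of SNP positions divided by the length of the reference genome (29,903 nt for the Wuhan-Hu-1 reference, GenBank NC_045512.2), the reported figure can be checked with a few lines of Python; the function name is illustrative.

```python
# Assumption: "variant density" = SNP positions / reference genome length.
REFERENCE_LENGTH_NT = 29_903  # SARS-CoV-2 Wuhan-Hu-1 reference (NC_045512.2)
SNP_COUNT = 9_632             # SNPs reported in the abstract

def variant_density(n_snps: int, genome_length: int) -> float:
    """Fraction of genome positions carrying a SNP (each SNP = one site)."""
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    return n_snps / genome_length

density = variant_density(SNP_COUNT, REFERENCE_LENGTH_NT)
print(f"{density:.2f} SNPs per nucleotide")  # prints "0.32 SNPs per nucleotide"
```

Rounded to one decimal place this gives the 0.3 quoted in the abstract.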
CONCEPT PAPER | doi:10.20944/preprints202203.0069.v1
Online: 3 March 2022 (17:18:57 CET)
Genomics has put prokaryotic rank-based taxonomy on a solid phylogenetic foundation. However, most taxonomic ranks were set long before the advent of DNA sequencing and genomics. In this concept paper, we therefore ask a simple yet profound question: Should prokaryotic classification schemes other than the current phylum-to-species ranks be explored, developed, and incorporated into scientific discourse? Could such alternative schemes better serve the basic need of science and society for which taxonomy was developed, namely, precise and meaningful identification? We then describe a neutral, genome-similarity-based framework that could allow alternative classification schemes to be explored, compared, and translated into one another without having to choose only one as the gold standard. Classification schemes could thus continue to evolve and be selected according to their benefits and how well they fulfill the need for prokaryotic identification.
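The abstract does not spell out the framework's algorithm. As one hypothetical illustration of classification schemes defined purely by genome similarity, the sketch below groups genomes by single linkage at several similarity cut-offs; the similarity values are invented, ANI-like percentages, and all names are illustrative. Because single-linkage groups at a higher cut-off always nest inside the groups at a lower one, each group in a fine scheme maps to exactly one group in a coarser scheme, which is one simple way schemes could be "translated" into each other.

```python
from itertools import combinations
from typing import Callable, Sequence

def cluster_at_threshold(
    genomes: Sequence[str],
    similarity: Callable[[str, str], float],
    threshold: float,
) -> list[list[str]]:
    """Single-linkage groups: genomes share a group if a chain of
    pairwise similarities >= threshold connects them."""
    parent = {g: g for g in genomes}

    def find(g: str) -> str:
        while parent[g] != g:
            parent[g] = parent[parent[g]]  # path halving keeps trees shallow
            g = parent[g]
        return g

    for a, b in combinations(genomes, 2):
        if similarity(a, b) >= threshold:
            parent[find(a)] = find(b)  # union the two groups

    groups: dict[str, list[str]] = {}
    for g in genomes:
        groups.setdefault(find(g), []).append(g)
    return sorted(sorted(members) for members in groups.values())

# Invented pairwise similarities (percent) for four hypothetical genomes.
SIMS = {
    frozenset(("A", "B")): 99.0, frozenset(("A", "C")): 96.0,
    frozenset(("B", "C")): 95.5, frozenset(("A", "D")): 80.0,
    frozenset(("B", "D")): 80.0, frozenset(("C", "D")): 81.0,
}

def sim(a: str, b: str) -> float:
    return SIMS[frozenset((a, b))]

# Each cut-off defines one classification scheme over the same genomes.
GENOMES = ["A", "B", "C", "D"]
schemes = {t: cluster_at_threshold(GENOMES, sim, t) for t in (98.0, 95.0, 75.0)}
for t, groups in schemes.items():
    print(t, groups)
```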
CONCEPT PAPER | doi:10.20944/preprints202107.0546.v1
Subject: Biology, Anatomy & Morphology Keywords: Bacterial nomenclature; taxonomy; microbial genomics
Online: 23 July 2021 (14:22:59 CEST)
The remarkable success of taxonomic discovery, powered by culturomics, genomics and metagenomics, creates a pressing need for new bacterial names, while holding a mirror up to the slow pace of change in bacterial nomenclature. Here, I take a fresh look at bacterial nomenclature, exploring how we might create a system fit for the age of genomics, playing to the strengths of current practice while minimising difficulties. Adoption of linguistic pragmatism, obeying the rules while treating recommendations as merely optional, will make it easier to create names derived from descriptions, from people or places, or even arbitrarily. Simpler protologues and a relaxed approach to recommendations will also remove much of the need for expert linguistic quality control. Automated computer-based approaches will allow names to be created en masse before they are needed, while also relieving microbiologists of the need for competence in Latin. The result will be a system that is accessible, inclusive and digital, while also fully capable of naming the unnamed millions of bacteria.
ARTICLE | doi:10.20944/preprints201912.0183.v2
Subject: Medicine & Pharmacology, Oncology & Oncogenics Keywords: Bioinformatics; Genomics; TCGA; Cox Model
Online: 21 August 2020 (11:26:33 CEST)
This study aimed to rank cancers by the strength of the relationship between patient survival and the comprehensive mRNA expression levels of the most harmful or protective genes. Using The Cancer Genome Atlas (TCGA) dataset, which includes RNA sequencing and clinical data, we investigated not only the gene-specific prognostic value of individual genes but also the combined prognostic value of prognostic genes filtered by their Cox coefficients, and we ranked cancers using a specially designed prognostic indicator. Using Kaplan-Meier plots, we found that cancers vary in the strength of the influence of their prognostic genes and can be ranked on this basis. There is a high probability that treatments developed using methods that reduce or increase the expression levels of biomarkers will not be effective for the cancers ranked at the bottom. The results of this study could serve as scientific evidence for this conclusion.
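The study's own prognostic indicator is not specified here. As background only: in a Cox model, a positive coefficient (hazard ratio above 1) marks a gene whose higher expression goes with shorter survival (harmful), and a negative coefficient a protective gene. Kaplan-Meier curves for patient groups, such as high versus low expression, come from the product-limit estimator. A minimal sketch of that estimator on invented toy data (the function name is illustrative):

```python
from typing import Sequence

def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> list[tuple[float, float]]:
    """Product-limit survival estimate at each distinct event time.

    events[i] is 1 if death was observed at times[i], 0 if censored.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have equal length")
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all subjects (deaths and censorings) sharing time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= removed  # censored subjects leave the risk set too
    return curve

# Invented follow-up times (months) and event flags for five patients.
curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
print([(t, round(s, 3)) for t, s in curve])  # prints "[(1, 0.8), (2, 0.6), (3, 0.3)]"
```

Comparing two such curves (for example with a log-rank test) is what a Kaplan-Meier plot of high- versus low-expression groups visualizes.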
TECHNICAL NOTE | doi:10.20944/preprints202007.0179.v1
Online: 9 July 2020 (07:59:18 CEST)
Motivation: The UCSC Xena platform provides huge amounts of processed cancer omics data from large public projects such as TCGA, as well as from individual research groups, enabling unprecedented research opportunities. In 2019, we developed UCSCXenaTools, an R package for retrieving UCSC Xena data. However, an easier tool for dataset exploration and analysis is still lacking, especially for researchers without programming experience. Results: We developed UCSCXenaShiny, an R Shiny package to quickly explore and download all datasets from UCSC Xena data hubs. In addition, a module-based analysis framework was constructed to analyze and visualize data. Availability: https://github.com/openbiox/UCSCXenaShiny or https://cran.r-project.org/package=UCSCXenaShiny.
HYPOTHESIS | doi:10.20944/preprints202003.0392.v1
Online: 26 March 2020 (14:58:35 CET)
SARS-CoV-2, the coronavirus that causes COVID-19, is spreading rapidly; it emerged in China in December 2019, and the disease has now attained the status of a pandemic. There are currently no drugs or vaccines against it, and diagnostic tests are limited. Additionally, these tests are expensive and are therefore reserved for very highly suspected cases, especially in developing countries. This is causing under-diagnosis, an alarming state of affairs, as even a single missed SARS-CoV-2 case could spread the disease exponentially and keep it circulating in the community. Through this entirely in silico study, we have developed a cheaper and faster diagnostic method based on simple PCR and restriction enzyme digestion, as commonly used in restriction fragment length polymorphism (RFLP) tests. Through comparative genomics, we identified the closest neighbours of SARS-CoV-2 and then identified highly conserved regions of its genome that are absent in SARS-CoV-1, its closest neighbour. We then located restriction sites for various enzymes and designed PCR primers flanking those sites. The selected primer pair produces a 401 bp amplicon which, when digested with the SwaI enzyme, yields two fragments of 216 bp and 185 bp. As an internal control, GAPDH primers are pooled with the SARS-CoV-2 primers, since the patient sample will also contain human RNA mixed with the viral RNA. This primer pair gives an amplicon of 131 bp; hence a negative sample should show a single band of 131 bp, while a positive digested sample will give three bands of 401 bp, 216 bp and 131 bp. The primers are specific to SARS-CoV-2 and can additionally be used for SYBR Green-based real-time quantification of viral load. The developed tests have not yet been validated in vitro because of the heavy workload in the only laboratory at our institute equipped to handle pathogenic viruses.
Nonetheless, this study provides a head start for other laboratories to rapidly test the suggested protocols in vitro and to make available a cheaper alternative test for SARS-CoV-2, which would be especially beneficial for low- and middle-income countries.
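The in silico digestion step can be illustrated with a short simulation. SwaI recognises the palindromic site ATTTAAAT and cuts bluntly in its middle (ATTT^AAAT). The amplicon below is a hypothetical placeholder built only to match the reported 401 bp length and 216 bp + 185 bp fragments; it is not the SARS-CoV-2 target sequence, and the function name is illustrative.

```python
SWAI_SITE = "ATTTAAAT"  # SwaI recognition sequence (palindromic)
SWAI_CUT = 4            # blunt cut after the 4th base: ATTT^AAAT

def digest_lengths(seq: str, site: str = SWAI_SITE, cut: int = SWAI_CUT) -> list[int]:
    """Fragment lengths after complete digestion of a linear DNA sequence.

    Because the SwaI site is palindromic, scanning one strand finds every site.
    """
    seq = seq.upper()
    cut_positions = []
    hit = seq.find(site)
    while hit != -1:
        cut_positions.append(hit + cut)
        hit = seq.find(site, hit + 1)
    bounds = [0, *cut_positions, len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]

# Hypothetical 401 bp placeholder: 212 filler bases + SwaI site + 181 filler bases.
amplicon = "GC" * 106 + SWAI_SITE + "G" + "CG" * 90
print(len(amplicon), digest_lengths(amplicon))  # prints "401 [216, 185]"
```

An uncut sequence simply returns its own length, which is the expected pattern for a target lacking the site.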
Online: 11 January 2020 (11:30:10 CET)
Public health agencies are increasingly using pathogen whole genome sequencing (WGS) to support surveillance and epidemiologic investigations. As access to WGS has grown, greater amounts of molecular data have helped improve our ability to detect outbreaks, investigate transmission chains, and explore large-scale population dynamics, such as the spread of antibiotic resistance. However, the wide adoption of WGS also poses challenges due to the amount of data generated and the need to transform raw data prior to analysis. This complexity means that public health agencies may need more advanced computational infrastructure, a broader technical workforce, and new approaches to data management and stewardship. As both a guide for how this development could occur, and a place to initiate discussion, we describe ten proposals for developing and supporting an informatics infrastructure for public health.
ARTICLE | doi:10.20944/preprints201905.0113.v1
Online: 9 May 2019 (12:57:09 CEST)
Experiential learning in the field is an opportunity for students to enter the heart of a scientific discipline. Through such experience, they can extract conceptual clues and discover motivational stepping stones that may influence the rest of their education and their career choices. Unfortunately, in biology, the inescapable topic of next-generation sequencing represents a challenge when it comes to creating an educational curriculum that gives students hands-on experience with sequencers. The task is even more difficult if the goal is to deliver such a curriculum in a field setting. In recent years, however, educators have been able to bring next-generation sequencing within students' reach more easily with the Oxford Nanopore MinION, a low-budget, user-friendly, hand-held sequencer. Academic researchers have demonstrated the performance of this device in the field, and their work is inspiring curricula that aim to take the next generation of scientists outdoors. We designed a modular 5-day workshop with nanopore sequencing performed under field conditions. Here we describe the materials and methods that led the students and instructors from sample collection, DNA extraction and preparation for nanopore sequencing with the MinION to real-time analysis of the collected data. This curriculum was implemented for the first time aboard Research Vessel Sikuliaq during a transit organized by the STEMSEAS program at Columbia University in collaboration with the University of Alaska BLaST program. The line of investigation formulated for the workshop was an open-ended question that led the students to establish a proof of concept for deploying the technology at sea: what will metagenomic results show for DNA obtained from seawater and sequenced with the Oxford Nanopore MinION? The workshop took place in October 2018, while Research Vessel Sikuliaq sailed Alaskan seas for 7 days.
Students successfully used nanopore sequencing for multiple metagenomic seawater samples. Their introductory analysis was consistent with environmental conditions and they were able to present their results by the end of the workshop.
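For the real-time analysis step, a first sanity check on nanopore output is often simply how many reads and bases have arrived and how long the reads are. The sketch below assumes plain four-line FASTQ records and reports read count, total bases and read-length N50; the function names are illustrative, and this is not the workshop's actual analysis pipeline.

```python
import io
from typing import Iterable, TextIO

def read_lengths(handle: TextIO) -> list[int]:
    """Read lengths from a FASTQ stream with four lines per record."""
    lengths = []
    while True:
        header = handle.readline()
        if not header:
            break  # end of stream
        if not header.startswith("@"):
            raise ValueError(f"malformed FASTQ header: {header!r}")
        sequence = handle.readline().strip()
        handle.readline()  # '+' separator line
        handle.readline()  # per-base quality line
        lengths.append(len(sequence))
    return lengths

def n50(lengths: Iterable[int]) -> int:
    """Largest length L such that reads of length >= L hold at least half the bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # no reads

example = io.StringIO("@r1\nACGTACGT\n+\n########\n@r2\nACG\n+\n###\n")
lengths = read_lengths(example)
print(len(lengths), sum(lengths), n50(lengths))  # prints "2 11 8"
```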
ARTICLE | doi:10.20944/preprints201807.0618.v1
Subject: Medicine & Pharmacology, General Medical Research Keywords: Cancer genomics; CNA; CGH; bioinformatics
Online: 31 July 2018 (10:21:40 CEST)
Cancers arise from the accumulation of somatic genome mutations, with varying contributions of intrinsic (i.e. genetic predisposition) and extrinsic (i.e. environmental) factors. To understand malignant clones, precise information about their genomic composition has to be correlated with morphological, clinical and individual features, in the context of the available medical knowledge. Rapid improvements in molecular profiling techniques, the accumulation of large amounts of data on genomic alterations in human malignancies, and the expansion of bioinformatic tools and methodologies have facilitated the understanding of the molecular changes during oncogenesis and their correlation with clinico-pathological phenotypes. Far beyond a limited set of "driver" genes, oncogenomic profiling has identified a large variety of somatic mutations, and whole genome sequencing studies of healthy individuals have improved the knowledge of heritable genome variation. Nevertheless, major challenges arise from the skewed representation of individuals from different population backgrounds in biomedical studies, and from the limited extent to which some cancer entities are represented in the scientific literature. Content analyses of oncogenomic publications could provide guidance for planning and supporting future studies aimed at filling prominent knowledge gaps.
REVIEW | doi:10.20944/preprints202209.0050.v1
Online: 5 September 2022 (07:49:30 CEST)
Bunyaviruses represent the largest group of RNA viruses and are the causative agents of a variety of febrile and hemorrhagic illnesses. Originally characterized as a single serotype in Africa, described bunyaviruses now number more than 500, and they have been detected around the world. These predominantly tri-segmented, single-stranded RNA viruses are transmitted primarily through arthropod and rodent vectors and can infect a wide variety of animals and plants. Although they encode only a small number of proteins, these viruses can cause potentially fatal disease and have even developed strategies to suppress the innate antiviral immune mechanisms of the infected host. This short review will attempt to provide an overall description of the order Bunyavirales, describing the mechanisms behind their infection, replication and evasion of the host immune response. Furthermore, the historical context of these viruses will be presented, from their original discovery almost 80 years ago to the most recent research on viral replication and the host immune response.
ARTICLE | doi:10.20944/preprints202208.0337.v1
Subject: Life Sciences, Genetics Keywords: breast cancer; polymorphism; mitochondrial genomics; D310
Online: 18 August 2022 (10:17:28 CEST)
Breast cancer has a high incidence in the female population worldwide. Although alterations in the mitochondrial genome probably play an important role in carcinogenesis, the current evidence is ambiguous and inconclusive. The purpose of the present work was to explore mitochondrial sequences from clinical breast cancer cases of different origins and to identify the associated polymorphisms. Complete and partial mtDNA sequences obtained from breast cancer patients and controls were searched for in the NCBI GenBank database. We identified 124 mtDNA sequences associated with breast cancer cases, of which 86 were complete and 38 partial sequences. Of these 86 complete sequences, 52 belong to patients with a confirmed diagnosis of breast cancer and 34 were obtained from healthy mammary tissue of the same patients, used as controls. The mtDNA analysis revealed two polymorphisms in the D310 region with statistically significant differences: m.310del (rs869289246) in 34.6% (27/78) of breast cancer cases and 61.7% (21/34) of controls; and m.315dup (rs369786048) in 60.2% (47/78) of breast cancer cases and 38.2% (13/34) of controls. In addition, the variant m.16519T>C (rs3937033) was found in 59% of control sequences and 52% of breast cancer sequences, a statistically significant difference. These polymorphic changes are evolutionarily related to haplogroup H, of Indo-European and Euro-Asiatic origin; however, they were found in all non-European sequences from breast cancer cases.
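The abstract does not name the statistical test behind these comparisons. One common choice for comparing carrier frequencies between two groups is Fisher's exact test on a 2×2 table; a minimal pure-Python sketch (function name illustrative), applied to the m.310del counts reported above, could look like this:

```python
from math import comb

def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed table.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x: int) -> float:
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    total = 0.0
    for x in range(lo, hi + 1):
        p = prob(x)
        if p <= p_obs * (1 + 1e-7):  # tolerance so float ties are kept
            total += p
    return min(1.0, total)

# m.310del: carriers / non-carriers among breast cancer cases and controls.
cases = (27, 78 - 27)
controls = (21, 34 - 21)
p = fisher_exact_two_sided(*cases, *controls)
print(f"two-sided p = {p:.4g}")
```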
REVIEW | doi:10.20944/preprints202204.0149.v1
Online: 15 April 2022 (14:58:08 CEST)
Sweet potatoes are a crucial crop for Asian and African countries. Their nutritional content and health benefits have attracted growing attention in recent years, and their fibre also supports gut health. Most sweet potato varieties do not flower. Because of pollination problems, sweet potatoes are also incompatible with each other. Sweet potato flowers are self-sterile, so they do not combine well in breeding programmes. Traditional and modern breeding procedures have not always worked with sweet potatoes, although some have. Using molecular biology methods, some plants have been made more resistant to diseases by eliminating particular genes. The crop's characteristics and growth should be improved, including the acquisition of new traits through crossing. Sweet potatoes are a superb tuberous crop, but they have problems with pollination and with adapting to new breeding procedures. Modern breeding and biotechnology methods can be used to get the most out of this crop. These are "chronological" ways to get the most out of farming.
ARTICLE | doi:10.20944/preprints202203.0112.v1
Online: 8 March 2022 (01:58:22 CET)
Phthalate esters, commonly used as chemical plasticizers in plastic products, have become a serious and ubiquitous environmental pollutant, for example in soils under plastic film mulching. Microbial degradation or transformation is regarded as a suitable strategy for addressing phthalate ester pollution. In this study, we isolated a new phthalate ester-degrading strain, Gordonia sp. GZ-YC7, which exhibited the highest di-(2-ethylhexyl) phthalate degradation efficiency at 1000 mg/L and the strongest tolerance, up to 4000 mg/L. Comparative genomic analysis showed that Gordonia sp. GZ-YC7 possesses diverse degradation pathways for various phthalate esters, such as di-(2-ethylhexyl) phthalate and dibutyl phthalate, which may contribute to its broad substrate spectrum, high degradation efficiency and high tolerance to phthalate esters. Gordonia sp. GZ-YC7 has potential for the bioremediation of phthalate esters in polluted soil environments.
ARTICLE | doi:10.20944/preprints202111.0533.v1
Subject: Life Sciences, Genetics Keywords: chloroplast; genetic resources; genomics; capirona; phylogenomics
Online: 29 November 2021 (12:32:24 CET)
Capirona (Calycophyllum spruceanum Benth.) belongs to the subfamily Ixoroideae, one of the major lineages in the Rubiaceae family. It is an important timber tree that originated in the Amazon Basin and is widely distributed in Bolivia, Peru, Colombia, and Brazil. In this study, we obtained the first complete chloroplast (cp) genome of capirona, from the department of Madre de Dios in the Peruvian Amazon. High-quality genomic DNA was used to construct libraries, and paired-end clean reads were obtained from a PE150 library sequenced on the Illumina HiSeq 2500 platform. The complete cp genome of C. spruceanum is 154,480 bp in length with a typical quadripartite structure, containing a large single-copy (LSC) region (84,813 bp) and a small single-copy (SSC) region (18,101 bp) separated by two inverted repeat (IR) regions (25,783 bp each). Annotation of the C. spruceanum cp genome predicted 87 protein-coding genes (CDS), 8 ribosomal RNA (rRNA) genes, 37 transfer RNA (tRNA) genes and 1 pseudogene. The 41 simple sequence repeats (SSRs) in this cp genome comprised mononucleotides (29), dinucleotides (5), trinucleotides (3), and tetranucleotides (4); most of these repeats were located in noncoding regions. Whole chloroplast genome comparison with six other Ixoroideae species revealed that the small single-copy and large single-copy regions were more divergent than the inverted repeat regions. Finally, phylogenetic analysis resolved C. spruceanum as sister to Emmenopterys henryi and confirmed its position within the subfamily Ixoroideae. This study reports for the first time the genome organization, gene content, and structural features of the chloroplast genome of C. spruceanum, providing valuable information for genetic and evolutionary studies in the genus Calycophyllum and beyond.
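The reported region sizes are internally consistent: 84,813 + 18,101 + 2 × 25,783 = 154,480 bp. SSR detection is usually done with dedicated tools; the sketch below only illustrates the idea of scanning a sequence for perfect tandem mono- to tetranucleotide repeats. The minimum repeat counts, function names and toy sequence are illustrative assumptions, not the parameters or data of this study.

```python
import re
from dataclasses import dataclass

# Illustrative minimum repeat counts per motif length (assumed, not the study's).
MIN_REPEATS = {1: 10, 2: 5, 3: 4, 4: 3}

@dataclass
class SSR:
    start: int    # 0-based position of the first base
    motif: str    # repeat unit
    repeats: int  # number of tandem copies

def minimal_period(motif: str) -> int:
    """Length of the shortest unit that tiles the motif ('ATAT' -> 2)."""
    for p in range(1, len(motif) + 1):
        if len(motif) % p == 0 and motif[:p] * (len(motif) // p) == motif:
            return p
    return len(motif)

def find_ssrs(seq: str, min_repeats: dict[int, int] = MIN_REPEATS) -> list[SSR]:
    """Scan for perfect tandem repeats of 1-4 nt motifs."""
    seq = seq.upper()
    hits = []
    for k, n in sorted(min_repeats.items()):
        # A k-mer followed by at least n - 1 further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if minimal_period(motif) < k:
                continue  # e.g. 'AA' is really part of a mononucleotide run
            hits.append(SSR(m.start(), motif, len(m.group(0)) // k))
    return sorted(hits, key=lambda s: s.start)

toy = "G" + "A" * 12 + "GC" + "AT" * 6 + "GC" + "CAG" * 5 + "T" + "GATC" * 3 + "G"
for ssr in find_ssrs(toy):
    print(ssr)
```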
ARTICLE | doi:10.20944/preprints202001.0064.v1
Subject: Medicine & Pharmacology, Dentistry Keywords: dental enamel; dental caries; genomics; odontogenesis
Online: 8 January 2020 (06:35:12 CET)
Objectives: The hierarchical structure of enamel gives insight into its properties and can influence its strength and, ultimately, caries experience. Presently, past caries experience is quantified using decayed, missing, filled teeth/decayed, missing, filled surfaces scores (DMFT/DMFS for permanent teeth; dmft/dmfs for primary teeth) or International Caries Detection and Assessment System (ICDAS) scores. By analyzing the structure of enamel, a new clinical measurement could be used to predict susceptibility to future caries experience based on a patient's individual biomarkers. The purpose of this study was to test the hypothesis that the number of prisms per square millimeter in enamel and the average gap distance between prisms and interprismatic areas influence caries experience through genetic variation in the genes involved in enamel formation. Materials and Methods: Scanning electron microscopy (SEM) images of enamel from primary teeth were used to measure the number of prisms per square millimeter, interprismatic spaces, prism density and gap distances between prisms in the enamel samples. The measurements were tested for genetic association with variants of selected genes and for correlation with caries experience (based on each individual's DMFT + dmft score) and with enamel microhardness at baseline, after an artificial lesion was created, and after the artificial lesion was treated with fluoride. Results: Associations were found between the measured enamel structure variables and variants of genes including ameloblastin, amelogenin, enamelin, tuftelin, tuftelin interacting protein 11, beta defensin 1 and matrix metallopeptidase 20. Significant correlations were found between caries experience, microhardness and enamel structure.
Negative correlations were found between the number of prisms per square millimeter and high caries experience (r = -0.71), between inter-prism gap distance and enamel microhardness after an artificial lesion was created (r = -0.70), and between inter-prism gap distance and enamel microhardness after the artificial lesion was created and then treated with fluoride (r = -0.81). There was a positive correlation between the number of prisms per square millimeter and the prism density of the enamel (r = 0.82). Conclusions: Our data support that genetic variation may affect enamel formation and therefore influence susceptibility to dental decay and future caries experience. Clinical Relevance: Evaluating the enamel structure that may affect caries experience allows us to hypothesize that identifying individuals at higher risk of dental caries and implementing personalized preventive treatments may one day become a reality.
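Assuming the r values above are Pearson product-moment correlation coefficients (the abstract does not say which coefficient was used), each one summarizes paired measurements as in the sketch below; the function name and the sample values in the usage lines are invented for illustration.

```python
from math import sqrt
from typing import Sequence

def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation coefficient of paired samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))  # co-deviation
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / sqrt(sxx * syy)

prisms_per_mm2 = [410.0, 455.0, 470.0, 520.0]  # invented toy values
caries_score = [9.0, 6.0, 7.0, 2.0]            # invented toy values
print(round(pearson_r(prisms_per_mm2, caries_score), 2))
```

A value near -1 indicates that one measure falls almost linearly as the other rises, which is how the negative r values above should be read.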
ARTICLE | doi:10.20944/preprints201808.0423.v1
Subject: Biology, Animal Sciences & Zoology Keywords: mitochondrial DNA; mitochondrial genome; genome assembly; genome annotation; next generation sequencing; animal genomics; partial genomics; bioinformatics
Online: 24 August 2018 (03:24:37 CEST)
Next-generation sequencing is now a mature technology, allowing partial animal genomes to be produced for many clades. Although many software tools exist for genome assembly and annotation, a simple pipeline that takes raw sequencing reads in FASTQ format as input and returns a completely assembled and annotated mitochondrial genome is still missing. mitoMaker 1.0 is a pipeline developed in Python that implements (i) recursive de novo assembly of mitochondrial genomes using a set of increasing k-mer sizes; (ii) a search for the best-matching result to a target mitogenome; and (iii) iterative reference-based strategies to optimize the assembly. After (iv) checking for circularization and (v) positioning tRNA-Phe at the beginning, (vi) the geneChecker.py module performs a complete annotation of the mitochondrial genome and provides a GenBank-formatted file as output.
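Steps (iv) and (v) can be illustrated with a minimal sketch. It assumes that a circular mitogenome assembled as a linear contig repeats its first bases at its end, and that the tRNA-Phe start coordinate is already known from annotation; the function names, the overlap threshold and the toy contig are hypothetical and are not mitoMaker's code.

```python
from typing import Optional

def terminal_overlap(seq: str, min_overlap: int = 20) -> int:
    """Length of the longest prefix that also ends the sequence, or 0."""
    min_overlap = max(1, min_overlap)  # k = 0 would trivially "match"
    for k in range(len(seq) // 2, min_overlap - 1, -1):
        if seq[:k] == seq[-k:]:
            return k
    return 0

def circularize(seq: str, min_overlap: int = 20) -> Optional[str]:
    """Drop the duplicated end of a contig that closes on itself; else None."""
    k = terminal_overlap(seq, min_overlap)
    return seq[:-k] if k else None

def rotate_to(seq: str, start: int) -> str:
    """Rotate a circular sequence so that index `start` becomes position 0."""
    start %= len(seq)
    return seq[start:] + seq[:start]

# Toy contig whose last 5 bases repeat its first 5 (hypothetical data).
contig = "GATTACACCGTTAGCAATGATTA"
circular = circularize(contig, min_overlap=4)
print(circular)                              # prints "GATTACACCGTTAGCAAT"
trna_phe_start = 4                           # hypothetical coordinate from annotation
print(rotate_to(circular, trna_phe_start))   # prints "ACACCGTTAGCAATGATT"
```

Returning `None` for contigs that do not close lets a pipeline flag them for further assembly rounds instead of silently rotating a linear fragment.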
COMMUNICATION | doi:10.20944/preprints202109.0485.v2
Subject: Life Sciences, Genetics Keywords: gene nomenclature; vertebrate genomics; oxytocin; arginine vasopressin
Online: 29 April 2022 (08:09:45 CEST)
Standardized gene nomenclature supports unambiguous communication and identification of the scientific literature associated with genes. To support the increasing number of annotated genomes that are now available for comparative studies, gene nomenclature authorities coordinate the assignment of approved gene names that can be readily propagated across species. Theofanopoulou et al. (2021) propose a new nomenclature for the genes encoding oxytocin and arginine vasopressin and their receptors. Rather than changing to a different nomenclature system, we propose minor updates to the currently approved nomenclature of these vertebrate genes to better reflect their evolutionary history. We call on authors, journal editors and reviewers to help support communication and indexing of gene-related publications by working with existing gene nomenclature committees and ensuring that standardized gene nomenclature is routinely used.
REVIEW | doi:10.20944/preprints202101.0521.v1
Subject: Life Sciences, Molecular Biology Keywords: Data integration; multi-omics; integration strategies; genomics
Online: 25 January 2021 (16:19:31 CET)
Metabolomics deals with multiple and complex chemical reactions within living organisms and how these are influenced by external or internal perturbations. It lies at the heart of omics profiling technologies, not only as the underlying biochemical layer that reflects information expressed by the genome, the transcriptome and the proteome, but also as the layer closest to the phenome. Combining metabolomics data with information from genomics, transcriptomics, and proteomics offers unprecedented possibilities to enhance the current understanding of biological functions, elucidate their underlying mechanisms and uncover hidden associations between omics variables. As a result, a vast array of computational tools has been developed to assist with the integrative analysis of metabolomics data with other omics. Here, we review and propose five criteria (hypothesis, data types, strategies, study design and study focus) to classify statistical multi-omics data integration approaches into state-of-the-art classes under which all existing statistical methods fall. The purpose of this review is to examine the various aspects that guide the choice of a statistical integrative analysis pipeline in terms of these classes. We pay particular attention to metabolomics and genomics data to help those new to this field choose an integrative analysis pipeline.
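One widely used strategy class is early, or concatenation-based, integration. As an illustrative sketch only (not a method taken from this review; the function names and data are hypothetical), each omics block is autoscaled and then weighted by 1/sqrt(number of features), so that a block with many features, such as a transcriptome, does not dominate a smaller metabolomics block in the combined matrix.

```python
from math import sqrt
from statistics import mean, pstdev

Matrix = list[list[float]]  # rows = samples, columns = features

def autoscale(block: Matrix) -> Matrix:
    """Z-score each feature (column) to mean 0 and unit population SD."""
    stats = [(mean(col), pstdev(col)) for col in zip(*block)]
    return [[(v - m) / s if s > 0 else 0.0 for v, (m, s) in zip(row, stats)]
            for row in block]

def concatenate_blocks(*blocks: Matrix) -> Matrix:
    """Early (concatenation-based) integration with block scaling.

    Dividing an autoscaled block by sqrt(n_features) gives every block the
    same total variance (1), however many features it contains.
    """
    n_samples = len(blocks[0])
    if any(len(b) != n_samples for b in blocks):
        raise ValueError("all blocks must share the same samples (rows)")
    scaled = []
    for block in blocks:
        weight = 1 / sqrt(len(block[0]))
        scaled.append([[v * weight for v in row] for row in autoscale(block)])
    # Join the i-th row of every block into one combined sample row.
    return [sum((blk[i] for blk in scaled), []) for i in range(n_samples)]

genes = [[1.0, 2.0], [3.0, 4.0], [5.0, 9.0]]  # 3 samples x 2 invented gene features
metabolites = [[10.0], [20.0], [30.0]]         # same 3 samples x 1 invented metabolite
combined = concatenate_blocks(genes, metabolites)
print(len(combined), len(combined[0]))  # prints "3 3"
```

The combined matrix can then be passed to any single-table method (for example PCA), which is what distinguishes early integration from strategies that model each block separately.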
REVIEW | doi:10.20944/preprints202010.0118.v1
Online: 6 October 2020 (10:53:17 CEST)
Here we review and describe a set of research priorities to meet present and future challenges posed to farmed animal production that build on progress, successes and resources from the Functional Annotation of ANimal Genomes (FAANG) project.
REVIEW | doi:10.20944/preprints202009.0073.v1
Subject: Medicine & Pharmacology, Pediatrics Keywords: genomics; pediatrics; lung disease; pulmonary arterial hypertension
Online: 3 September 2020 (15:29:36 CEST)
Pulmonary arterial hypertension (PAH) is a rare disease with high mortality despite recent therapeutic advances. The disease is caused by both genetic and environmental factors, and likely by gene × environment interactions. While PAH can manifest across the lifespan, pediatric-onset disease is particularly challenging because it is frequently associated with a more severe clinical course and with comorbidities including lung and heart developmental anomalies. In light of these differences, it is perhaps not surprising that emerging data from genetic studies of pediatric-onset PAH indicate that its genetic basis differs from that of adult-onset disease. There is a greater genetic burden in children, with rare genetic factors contributing to at least 36% of pediatric-onset idiopathic PAH (IPAH) compared with ~11% of adult-onset IPAH. De novo variants are frequently associated with PAH in children and contribute to at least 15% of all pediatric cases. The standard of medical care for pediatric PAH patients is based on extrapolations from adult data. However, the increased etiologic heterogeneity, poorer prognosis and increased genetic burden of pediatric-onset PAH call for a dedicated pediatric research agenda to improve molecular diagnosis and clinical management. A genomics-first approach will improve the understanding of pediatric PAH and how it is related to other rare pediatric genetic disorders.
Subject: Medicine & Pharmacology, Oncology & Oncogenics Keywords: cancer biomarkers; BRCA1/2; genomics; European recommendation
Online: 18 August 2020 (16:30:04 CEST)
Rapid and continuing advances in biomarker testing are not being matched by uptake in health systems, and this is hampering both patient care and innovation. It also risks costing health systems the opportunity to make their services more efficient and, over time, more economical. The potential that genomics has brought to biomarker testing in diagnosis, prediction and research is being realised, pre-eminently in many cancers, but also in an ever-wider range of conditions – notably BRCA1/2 testing in ovarian, breast, pancreatic and prostate cancers. Nevertheless, the implementation of genetic testing in routine clinical settings remains challenging. Development is impeded by country-related heterogeneity, data deficiencies, and lack of policy alignment on standards, approval – and the role of real-world evidence in the process – and reimbursement. The acute nature of the problem is compellingly illustrated by the particular challenges facing the development and use of tumour agnostic therapies, where the gaps in preparedness for taking advantage of this innovative approach to cancer therapy are sharply exposed. Europe should already have in place a guarantee of universal access to a minimum suite of biomarker tests and should be planning for an optimum testing scenario with a wider range of biomarker tests integrated into a more sophisticated health system articulated around personalised medicine. Improving healthcare and winning advantages for Europe's industrial competitiveness and innovation require an appropriate policy framework – starting with an update to outdated recommendations.
ARTICLE | doi:10.20944/preprints202008.0307.v1
Subject: Medicine & Pharmacology, Oncology & Oncogenics Keywords: cancer biomarkers; BRCA1/2; genomics; European recommendation
Online: 14 August 2020 (04:27:45 CEST)
Rapid and continuing advances in biomarker testing are not being matched by take-up in health systems, and this is hampering both patient care and innovation. It also risks costing health systems the opportunity to make their services more efficient and, over time, more economical. The potential that genomics has brought to biomarker testing in diagnosis, prediction and research is being realised, pre-eminently in many cancers but also in an ever-wider range of conditions. One of the paradigmatic examples is BRCA1/2 testing in ovarian, breast, pancreatic and prostate cancers. Nevertheless, development is impeded by data deficiencies and a lack of policy alignment on standards, approval (including the role of real-world evidence in the process) and reimbursement. The acute nature of the problem is compellingly illustrated by the particular challenges facing the development and use of tumour-agnostic therapies, where the gaps in preparedness for taking advantage of this innovative approach to cancer therapy are sharply exposed. Europe should already have in place a guarantee of universal access to a minimum suite of biomarker tests and should be planning for an optimum testing scenario, with a wider range of biomarker tests integrated into a more sophisticated health system articulated around personalised medicine. Improving healthcare and winning advantages for Europe's industrial competitiveness and innovation require an appropriate policy framework, starting with an update to outdated recommendations.
REVIEW | doi:10.20944/preprints201810.0708.v1
Subject: Life Sciences, Genetics Keywords: circular visualization; circos; genomics; next-generation sequencing
Online: 30 October 2018 (07:06:35 CET)
Since the sequencing of the human genome and the rapid changes in sequencing methods that followed, we have entered an era of rapidly accumulating genome-sequencing data. This has prompted the development of several types of methods for representing the results of genome sequencing. Circular genome visualization tools are critical in this area, as they enable rapid interpretation and simple visualization of the overall data. Over the last 15 years, following the development of Circos, circular visualization tools have changed rapidly, with one or two new tools published per year. Herein, we summarize and revisit all of these tools up to the third quarter of 2018.
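Circular tools in the Circos tradition share a basic layout step: each chromosome is given an arc whose angle is proportional to its length, and genomic positions are then mapped onto that arc. The sketch below illustrates only that idea; it is not taken from Circos or any other reviewed tool, and the function names, the gap parameter and the example chromosome lengths are hypothetical.

```python
import math

def circular_layout(chrom_lengths: dict[str, int], gap_deg: float = 2.0) -> dict[str, tuple[float, float]]:
    """Give each chromosome an arc (start_deg, end_deg) proportional to its length,
    leaving gap_deg degrees of empty space after every chromosome."""
    total = sum(chrom_lengths.values())
    usable = 360.0 - gap_deg * len(chrom_lengths)  # degrees left for the data arcs
    arcs, angle = {}, 0.0
    for name, length in chrom_lengths.items():
        span = usable * length / total
        arcs[name] = (angle, angle + span)
        angle += span + gap_deg
    return arcs

def position_to_xy(arcs: dict[str, tuple[float, float]], chrom_lengths: dict[str, int],
                   chrom: str, pos: int, radius: float = 1.0) -> tuple[float, float]:
    """Cartesian point for a genomic position: 0 degrees at 12 o'clock, increasing clockwise."""
    start, end = arcs[chrom]
    theta = math.radians(start + (end - start) * pos / chrom_lengths[chrom])
    return radius * math.sin(theta), radius * math.cos(theta)

# Hypothetical two-chromosome genome (lengths in bp)
lengths = {"chrA": 300, "chrB": 100}
arcs = circular_layout(lengths, gap_deg=10.0)
print(arcs)  # {'chrA': (0.0, 255.0), 'chrB': (265.0, 350.0)}
print(position_to_xy(arcs, lengths, "chrA", 0))  # (0.0, 1.0)
```

Data tracks (histograms, heat maps, links between loci) can then be drawn at different radii using the same angle mapping.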
ARTICLE | doi:10.20944/preprints201809.0388.v1
Subject: Medicine & Pharmacology, General Medical Research Keywords: biobanks, electronic health records, Michigan Genomics Initiative
Online: 19 September 2018 (14:57:30 CEST)
Biobanks linked to electronic health records provide a rich data resource for health-related research. With the establishment of large-scale infrastructure, the availability and utility of data from biobanks have dramatically increased over time. As more researchers become interested in using biobank data to explore a diverse spectrum of scientific questions, resources guiding data access, study design and analysis for biobank-based studies will be crucial. The first aim of this review is to characterize the types of biobanks discussed in the recent literature and to provide detailed descriptions of specific biobanks, including their location, size, data access, data linkages and more. The development and accessibility of large-scale biorepositories provide the opportunity to accelerate agnostic searches, new discoveries and hypothesis-generating studies of disease-treatment, disease-exposure and disease-gene associations. Rather than spending time and money designing and implementing a single study with predefined objectives, researchers can use biobanks' existing data-rich resources to answer scientific questions as quickly as they can analyze the data. While the data are becoming increasingly available, additional thought is needed to address issues related to the design of such studies and the analysis of these data. As the second aim of this review, we discuss statistical issues related to biobank research in general, including study design, sampling strategy, phenotype identification and missing data. These issues are illustrated using data from the Michigan Genomics Initiative, UK Biobank and Genes for Good. We summarize the current body of statistical literature aimed at addressing some of these challenges and discuss some of the standing open problems in this area.
This work serves to complement and extend recent reviews about biobank-based research and aims to provide a resource catalog with statistical and practical guidance to researchers pursuing biobank-based research.
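Sampling strategy is one of the statistical issues raised above: biobank participants are rarely a random sample of the target population. One standard correction, named here as an illustration rather than as the review's own recommendation, is inverse probability weighting (IPW). The sketch assumes the participation probabilities are already known (in practice they must be estimated, for example from census or registry data); the function name and toy numbers are hypothetical.

```python
def ipw_mean(values: list[float], participation_prob: list[float]) -> float:
    """Inverse-probability-weighted (Hajek) mean.

    Each participant is weighted by 1 / P(participation), so groups that are
    under-represented in the biobank count for more in the estimate."""
    if len(values) != len(participation_prob):
        raise ValueError("one probability per participant is required")
    if any(not 0.0 < p <= 1.0 for p in participation_prob):
        raise ValueError("probabilities must lie in (0, 1]")
    weights = [1.0 / p for p in participation_prob]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

# Hypothetical: disease indicator (1 = case) for four participants; the two
# cases come from a group that enrols five times more readily than the controls.
disease = [1, 1, 0, 0]
p_enrol = [0.5, 0.5, 0.1, 0.1]
naive = sum(disease) / len(disease)
print(naive, round(ipw_mean(disease, p_enrol), 4))  # 0.5 0.1667
```

The naive prevalence overstates disease frequency because cases are over-sampled; weighting restores the contribution of the under-represented group.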
REVIEW | doi:10.20944/preprints201806.0191.v1
Subject: Life Sciences, Genetics Keywords: rare disease; functional genomics; genetic variant validation
Online: 12 June 2018 (12:36:08 CEST)
Many insights into human disease have been built on experimental results in Drosophila, and research in fruit flies is often justified on the basis of its predictive value for questions related to human health. Additionally, there is now growing recognition of the value of Drosophila for the study of rare human genetic diseases, either as a means of validating the causative nature of a candidate genetic variant found in patients, or as a means of obtaining functional information about a novel disease-linked gene about which little is known. For these reasons, funders in the US, Europe and Canada have launched targeted programs to link human geneticists working on discovering new rare disease loci with researchers who work on the counterpart genes in Drosophila and other model organisms. Several of these initiatives are described here, as are a number of resulting publications that validate this new approach.
ARTICLE | doi:10.20944/preprints201612.0098.v1
Subject: Medicine & Pharmacology, Oncology & Oncogenics Keywords: lung cancer; adenovirus; E1b; UV-irradiation; genomics
Online: 19 December 2016 (09:33:20 CET)
Adenoviruses (Ads) have been extensively engineered to replicate selectively in cancer cells, leading to cancer cell death, or oncolysis. Clinical studies using E1-modified oncolytic Ads have shown that this therapeutic platform is safe but of limited efficacy, indicating the need to target other viral genes for manipulation. To improve the therapeutic efficacy of oncolytic Ads, we repeatedly treated the entire Ad genome with UV light and isolated AdUV, which efficiently lyses cancer cells, as reported previously. In this report, we show that no mutations were observed in the early genes (E1 or E4) of AdUV, whereas several mutations were observed within the Ad late genes, which have structural or viral DNA packaging functions. This study also shows increased release of AdUV from cancer cells, and we found that AdUV inhibits tumor growth following intratumoral injection. These results indicate a potentially significant role for the viral late genes, in particular the DNA packaging genes, in enhancing Ad oncolysis.
ARTICLE | doi:10.20944/preprints202209.0248.v1
Subject: Life Sciences, Genetics Keywords: Fruitless; genomics; An. gambiae s.l; vector control; Africa
Online: 16 September 2022 (11:33:36 CEST)
Targeting genes involved in sex determination for vector or pest control requires a better understanding of their polymorphism in natural populations in order to ensure rapid spread of the construct. Using genomic data from An. gambiae s.l., we analyzed the genetic variation and the conservation score of the fru gene in 18 natural populations across Africa. A total of 34,339 SNPs were identified, including 3.11% non-synonymous segregating sites. Overall, nucleotide diversity was low and Tajima's D neutrality test was negative, indicating an excess of low-frequency SNPs in the fru gene. The allele frequencies of the non-synonymous SNPs were low (freq < 0.26), except for two SNPs found at high frequency (freq > 0.8) in the zinc-finger A and B protein domains. The conservation score varied along the fru gene, with maximum values in the exonic compared with the intronic regions. These results show low genetic variation across the exonic regions overall, especially the male sex-specific exon and BTB exon 1 of the fru gene. These findings are crucial for the development of a gene drive construct targeting the fru gene that can spread rapidly without encountering resistance in wild populations.
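A negative Tajima's D arises because the statistic compares two estimates of diversity: the mean number of pairwise differences between sequences and the number of segregating sites S. An excess of rare variants inflates S relative to the pairwise differences and pushes D below zero. The following sketch computes D from a handful of aligned haplotypes using Tajima's standard normalising constants; it is a toy illustration, not the pipeline applied to the An. gambiae data, and the example haplotypes are invented.

```python
import math
from itertools import combinations

def tajimas_d(haplotypes: list[str]) -> float:
    """Tajima's D for equal-length aligned haplotypes (no missing-data handling)."""
    n = len(haplotypes)
    if n < 4:
        raise ValueError("need at least four sequences (the variance term vanishes for n < 4)")
    # S: number of segregating (polymorphic) sites
    S = sum(1 for col in zip(*haplotypes) if len(set(col)) > 1)
    if S == 0:
        return float("nan")  # D is undefined without polymorphism
    # k: mean number of pairwise differences between sequences
    pairs = list(combinations(haplotypes, 2))
    k = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    # Tajima's normalising constants
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1, e2 = c1 / a1, c2 / (a1**2 + a2)
    return (k - S / a1) / math.sqrt(e1 * S + e2 * S * (S - 1))

rare = ["TAAA", "ATAA", "AATA", "AAAT"]      # every variant is a singleton
common = ["TTAA", "TTAA", "AAAA", "AAAA"]    # variants at intermediate frequency
print(round(tajimas_d(rare), 3), round(tajimas_d(common), 3))
```

Running it gives a negative value for the singleton-rich set and a positive one for the intermediate-frequency set, mirroring the interpretation in the abstract.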
ARTICLE | doi:10.20944/preprints202203.0224.v1
Subject: Life Sciences, Genetics Keywords: zoogenetic resources; organelle; genomics; NGS; cattle; Bos taurus
Online: 16 March 2022 (07:36:01 CET)
Cattle spread throughout the American continent during the colonial period, giving rise to creole breeds that adapted to a wide range of climatic conditions. The population of creole cattle in Peru is decreasing, mainly because of the introduction of more productive breeds in recent years. Over the last 15 years, there has been significant progress in cattle genomics. However, little is known about the genetics of Peruvian creole cattle (PCC), despite their importance for (i) improving productivity in the Andean region, (ii) agricultural labor and (iii) cultural traditions. In addition, the origin and phylogenetic relationships of PCC are still unclear. To promote the conservation of PCC, we sequenced, for the first time, the mitochondrial genome of a creole bull from the highlands of Arequipa, which also possessed exceptional fighting skills and was employed for agricultural tasks. The complete mitochondrial genome sequence is 16,339 bp long, with a base composition of 31.43% A, 28.64% T, 26.81% C and 13.12% G. It contains 13 protein-coding genes, two ribosomal RNA genes, 22 transfer RNA genes and a control region. Of the 37 genes, 28 are located on the H-strand and nine on the L-strand. The most frequently used codons were CUA (Leucine), AUA (Isoleucine), AUU (Isoleucine), AUC (Isoleucine) and ACA (Threonine). Maximum likelihood reconstruction using complete mitochondrial genome sequences clearly demonstrated that PCC are closely related to native African breeds, giving insights into the ancestry of PCC. The annotated mitochondrial genome of PCC will serve as an important genetic data set for further breeding work and conservation strategies.
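The composition and codon-usage figures above are simple counts over the assembled sequence. As a hedged illustration (the function names are hypothetical and the toy sequences are not PCC data), the sketch below computes base percentages and in-frame codon counts, and derives the G+C content implied by the reported composition (26.81% + 13.12% = 39.93%; the four reported percentages sum to 100.00%).

```python
from collections import Counter

def base_composition(seq: str) -> dict[str, float]:
    """Percentage of A, T, C and G in a DNA sequence (other symbols are ignored)."""
    counts = Counter(b for b in seq.upper() if b in "ATCG")
    total = sum(counts.values())
    return {b: 100.0 * counts[b] / total for b in "ATCG"}

def codon_usage(cds: str) -> Counter:
    """Count codons in an in-frame coding sequence, read 5'->3' in steps of 3.
    Codons are reported in the RNA alphabet (U instead of T), as in the abstract;
    a trailing incomplete codon is ignored."""
    cds = cds.upper().replace("T", "U")
    return Counter(cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3))

# Reported PCC composition, used here only to derive the implied G+C content
reported = {"A": 31.43, "T": 28.64, "C": 26.81, "G": 13.12}
gc = reported["G"] + reported["C"]
print(round(gc, 2))  # 39.93

print(base_composition("AATTCG"))                        # toy sequence
print(codon_usage("CTAATAATTCTA").most_common(1))        # [('CUA', 2)]
```

Codon counting itself does not depend on which genetic code is used; the code only matters when codons are translated into amino acids.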
ARTICLE | doi:10.20944/preprints202110.0367.v1
Subject: Biology, Other Keywords: Bacteria; culturomics; genome; species; sp. nov.; taxono-genomics
Online: 25 October 2021 (15:47:32 CEST)
Marseille-Q4369 is a strain that we isolated from healthy human skin and characterized using a taxono-genomic approach. Marseille-Q4369 exhibited 99.80% 16S rRNA sequence similarity with Agrococcus pavilionensisT, the phylogenetically closest bacterium with standing in nomenclature. Furthermore, digital DNA–DNA hybridization revealed a maximum similarity of only 52.4%, and OrthoANI gave a value of 93.63% between the novel organism and Agrococcus pavilionensisT. Marseille-Q4369 was observed to be a yellow-pigmented, Gram-positive, coccoid, facultatively aerobic bacterium belonging to the family Microbacteriaceae. The major fatty acids detected were 12-methyl-tetradecanoic acid (66%) and 14-methyl-hexadecanoic acid (24%), followed by 13-methyl-tetradecanoic acid (5%). The genome of strain Marseille-Q4369 is 2,737,735 bp long, with a G+C content of 72.27%. Taken together, these results confirm the status of this strain as a new member of the genus Agrococcus, for which the name Agrococcus massiliensis is proposed (= CSUR-Q4369 = DSM112404).
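The species call rests on genome-relatedness values rather than on 16S rRNA, because a 16S similarity of 99.80% is above the usual 16S species cut-offs and would not by itself separate the strain from A. pavilionensis. A hedged sketch of the genome-based decision rule, assuming the commonly cited cut-offs of 70% for dDDH and about 95% for ANI (some authors use 95-96%); the constant and function names are illustrative.

```python
# Commonly cited genomic species cut-offs (assumed values; ANI cut-offs of
# 95-96% appear in the literature).
DDDH_CUTOFF = 70.0  # percent digital DNA-DNA hybridization
ANI_CUTOFF = 95.0   # percent average nucleotide identity

def supports_new_species(dddh: float, ani: float) -> bool:
    """True when both relatedness values to the closest described species
    fall below the species cut-offs."""
    return dddh < DDDH_CUTOFF and ani < ANI_CUTOFF

# Values reported for Marseille-Q4369 versus Agrococcus pavilionensis
print(supports_new_species(dddh=52.4, ani=93.63))  # True
```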
Subject: Medicine & Pharmacology, Oncology & Oncogenics Keywords: Pediatric Acute Lymphoblastic Leukemia; Genomics; Epigenetics; Targeted Therapy
Online: 1 October 2021 (12:23:33 CEST)
Acute lymphoblastic leukemia (ALL) is the most common malignancy in children and is characterized by numerous genetic and epigenetic abnormalities. Epigenetic mechanisms, which involve DNA methylation and histone modifications, result in heritable silencing of genes without a change in their coding sequence. Emerging studies are increasing our understanding of the role of epigenetics in leukemogenesis and have demonstrated the potential of DNA methylation and histone modifications as biomarkers for lineage and subtype classification and for predicting relapse and disease progression in ALL. Compared with genetic alterations, epigenetic abnormalities are relatively reversible when treated with certain small-molecule agents. In this review, we summarize the genetic and epigenetic characteristics of ALL, discuss the future role of DNA methylation and histone modifications in predicting relapse, and finally focus on individualized, precision therapy targeting epigenetic alterations.
ARTICLE | doi:10.20944/preprints202107.0281.v1
Subject: Life Sciences, Biochemistry Keywords: Glioblastoma; Precision Medicine; Targeted Therapy; Genomics; Neuro-Oncology
Online: 13 July 2021 (09:28:35 CEST)
BACKGROUND: Glioblastoma (GBM) is driven by various genomic alterations. Next-generation sequencing (NGS) could identify targetable alterations that may impact outcomes. The goal of this study was to describe how NGS can inform targeted therapy (TT) in this patient population. METHODS: The medical records of patients (pts) diagnosed with GBM from 2017 to 2019 were reviewed. Records of patients with recurrent GBM and genomic alterations were evaluated, and objective response rates and disease control rates were determined. RESULTS: A total of 87 pts with GBM underwent NGS. Forty percent (n = 35) were considered to have actionable alterations. Of these 35 pts, 40% (n = 14) had their treatment changed because of an alteration. The objective response rate (ORR) in this population was 43%, and the disease control rate (DCR) was 100%. The absolute mean decrease in contrast-enhancing disease was 50.7% (95% CI 34.8–66.6). CONCLUSION: NGS for GBM, particularly in the recurrent setting, yields a high rate of actionable alterations. We observed a high ORR and DCR, reflecting the value of NGS in selecting TT matched to alterations that are likely to respond. Patient selection and the availability of NGS may therefore impact outcomes in select pts with recurrent GBM.
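ORR and DCR are simple proportions of patients by best-response category: ORR counts complete (CR) and partial (PR) responses, and DCR additionally counts stable disease (SD). The sketch below shows the arithmetic. The category labels follow common response-criteria usage and the function name is hypothetical; it assumes the denominator is the 14 pts whose treatment changed, and because the abstract does not break those patients down by category, the cohort used here is an invented split chosen only because it reproduces 43% and 100%.

```python
from collections import Counter

def response_rates(best_responses: list[str]) -> tuple[float, float]:
    """Return (ORR, DCR) as fractions from best-response categories.

    ORR counts complete (CR) and partial (PR) responses; DCR additionally
    counts stable disease (SD). PD (progressive disease) counts in neither."""
    if not best_responses:
        raise ValueError("no patients")
    n = len(best_responses)
    c = Counter(best_responses)
    orr = (c["CR"] + c["PR"]) / n
    dcr = (c["CR"] + c["PR"] + c["SD"]) / n
    return orr, dcr

# Invented split of 14 patients, NOT the study's patient-level data
cohort = ["PR"] * 6 + ["SD"] * 8
orr, dcr = response_rates(cohort)
print(f"ORR = {orr:.0%}, DCR = {dcr:.0%}")  # ORR = 43%, DCR = 100%
```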
CASE REPORT | doi:10.20944/preprints202009.0543.v1
Subject: Medicine & Pharmacology, Other Keywords: genetics; comparative genomics; phylogenetic analysis; osteopetrosis; CLCN7 gene
Online: 23 September 2020 (07:56:30 CEST)
Osteopetrosis is a group of rare inheritable skeletal disorders characterized by increased bone density. The disease is remarkably heterogeneous in clinical presentation and often misdiagnosed. Therefore, genetic testing and molecular pathogenicity analysis are essential for precise diagnosis and for identifying new targets for preventive pharmacotherapy. Mutations in the CLCN7 gene give rise to the complete spectrum of osteopetrosis phenotypes and are responsible for about 75% of cases of autosomal dominant osteopetrosis. In this study, we report the identification of a novel CLCN7 variant in a patient diagnosed with osteopetrosis and provide evidence for its significance (likely deleterious) based on extensive comparative genomics and on protein sequence and structure analysis. A set of automated bioinformatics tools used to predict the consequences of this variant identified it as deleterious or pathogenic. Structure analysis revealed that the variant is located in the same “hot spot” as the most common CLCN7 mutations causing osteopetrosis. Deep phylogenetic reconstruction showed that not only Leu614Arg but any non-aliphatic substitution at this position is evolutionarily intolerable, further supporting the deleterious nature of the variant. The present study provides further evidence that reconstructing a precise evolutionary history of a gene helps predict the phenotypic consequences of variants of uncertain significance.
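The conservation argument can be illustrated with two simple per-position measures on one column of a multiple sequence alignment: Shannon entropy (low values mean the position is conserved) and a class check on whether only aliphatic residues have been observed there. This is a hedged sketch, far simpler than the deep phylogenetic reconstruction used in the study; the aliphatic set, the function names and the example column are assumptions.

```python
import math
from collections import Counter
from typing import Optional

ALIPHATIC = set("AILV")  # assumption: strict aliphatic set; some schemes add G, P or M

def column_entropy(column: str) -> float:
    """Shannon entropy (bits) of one alignment column; gaps are ignored.
    0 means the position is perfectly conserved."""
    residues = [r for r in column.upper() if r != "-"]
    n = len(residues)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(residues).values())

def aliphatic_rule(column: str, alt: str) -> Optional[bool]:
    """Crude class-based rule: if every residue seen at this position is aliphatic,
    an alternative residue is judged tolerated only if it is also aliphatic.
    Returns None when the column is not purely aliphatic (no call is made)."""
    observed = {r for r in column.upper() if r != "-"}
    if not observed <= ALIPHATIC:
        return None
    return alt.upper() in ALIPHATIC

# Hypothetical orthologue column at the position equivalent to human Leu614
col = "LLLLIVLLL-L"
print(round(column_entropy(col), 3))   # 0.922
print(aliphatic_rule(col, "R"))        # False: Arg is not aliphatic
print(aliphatic_rule(col, "I"))        # True
```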
ARTICLE | doi:10.20944/preprints202008.0220.v1
Online: 9 August 2020 (15:53:58 CEST)
The Public Health Alliance for Genomic Epidemiology (PHA4GE) (https://pha4ge.org) is a global coalition that is actively working to establish consensus standards, document and share best practices, improve the availability of critical bioinformatic tools and resources, and advocate for greater openness, interoperability, accessibility and reproducibility in public health microbial bioinformatics. In the face of the current pandemic, PHA4GE has identified a clear and present need for a fit-for-purpose, open source SARS-CoV-2 contextual data standard. As such, we have developed an extension to the INSDC pathogen package, providing a SARS-CoV-2 contextual data specification based on harmonisable, publicly available, community standards. The specification is implementable via a collection template, as well as an array of protocols and tools to support the harmonisation and submission of sequence data and contextual information to public repositories. Well-structured, rich contextual data adds value, promotes reuse, and enables aggregation and integration of disparate data sets. Adoption of the proposed standard and practices will better enable interoperability between datasets and systems, improve the consistency and utility of generated data, and ultimately facilitate novel insights and discoveries in SARS-CoV-2 and COVID-19.
Online: 3 July 2020 (09:45:43 CEST)
The novel respiratory disease COVID-19 has reached the status of a worldwide pandemic, and large efforts are currently under way to molecularly characterize the virus causing it, SARS-CoV-2. The genomic variability of SARS-CoV-2 specimens scattered across the globe may underlie geographically specific etiological effects. In the present study, we gather the 48,635 complete SARS-CoV-2 genomes currently available thanks to the collection efforts of the GISAID consortium and thousands of contributing laboratories. We analyze and annotate all SARS-CoV-2 mutations relative to the reference Wuhan genome NC_045512.2, observing an average of 7.23 mutations per sample. Our analysis shows that single-nucleotide transitions are the predominant mutation type across the world. There are at least three clades characterized by geographic and genomic specificity. In particular, clade G, prevalent in Europe, carries a D614G mutation in the Spike protein, which is responsible for the initial interaction of the virus with the human host cell. Our analysis may inform local modulation of antiviral strategies based on the molecular specificities of this novel virus.
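Counting mutations per sample and splitting them into transitions and transversions are the two steps behind the figures above. A minimal sketch, assuming genomes have already been aligned to the reference (real pipelines align first and also handle insertions and deletions); the function names and toy sequences are hypothetical.

```python
PURINES, PYRIMIDINES = set("AG"), set("CT")

def substitutions(ref: str, sample: str) -> list[tuple[int, str, str]]:
    """1-based (position, ref_base, alt_base) for every single-base difference.

    Assumes the sample is already aligned to the reference (equal length);
    gaps ('-') and ambiguous bases such as 'N' are skipped."""
    if len(ref) != len(sample):
        raise ValueError("sequences must be aligned to equal length")
    return [(i + 1, r, s) for i, (r, s) in enumerate(zip(ref.upper(), sample.upper()))
            if r != s and r in "ACGT" and s in "ACGT"]

def is_transition(ref_base: str, alt_base: str) -> bool:
    """Transitions stay within purines (A<->G) or within pyrimidines (C<->T)."""
    pair = {ref_base, alt_base}
    return pair <= PURINES or pair <= PYRIMIDINES

ref = "ATGCGTACCA"  # toy reference, not NC_045512.2
samples = ["ATGCGTACCA", "ATGAGTACCA", "GTGCGTNCTA"]
calls = [substitutions(ref, s) for s in samples]
mean_per_sample = sum(len(c) for c in calls) / len(calls)
transitions = sum(is_transition(r, a) for c in calls for _, r, a in c)
transversions = sum(not is_transition(r, a) for c in calls for _, r, a in c)
print(mean_per_sample, transitions, transversions)  # 1.0 2 1
```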
COMMUNICATION | doi:10.20944/preprints202005.0428.v1
Subject: Biology, Animal Sciences & Zoology Keywords: evolutionary genomics; Gasterosteus aculeatus; gene flow; hybridization; phylogeny
Online: 26 May 2020 (08:44:35 CEST)
Where genetic variation promoting speciation originates is a crucial question in evolutionary genomics. In a recent article, Marques et al. (2019) seek to address this question in lake and stream threespine stickleback fish from the Lake Constance (hereafter LC) basin in Central Europe. Based on population genetic methods, they conclude that incipient speciation between lake and stream stickleback was facilitated by the mixing of genetic variation from old lineages evolved in isolation (i.e., admixture following secondary contact). In this comment, I discuss conceptual and methodological problems and unrecognized conflicts with existing evidence that cast doubt on Marques et al.’s conclusion.
ARTICLE | doi:10.20944/preprints201906.0293.v1
Subject: Biology, Plant Sciences Keywords: kiwifruit; genomics; polyploidy; breeding; ascorbic acid; vitamin C
Online: 28 June 2019 (08:09:04 CEST)
During analysis of kiwifruit derived from hybrids between the high-ascorbic acid (AsA) species Actinidia eriantha and A. chinensis var. chinensis, we observed bimodal segregation of fruit AsA concentration, suggesting segregation of a major gene. To test this hypothesis, we performed whole-genome sequencing on pools of high- and low-AsA fruit from tetraploid A. chinensis var. deliciosa × A. eriantha backcross families. Pool-GWAS revealed a single QTL spanning more than 5 Mbp on chromosome 26, which we denote qAsA26.1. A co-dominant PCR marker was used to validate this association in four diploid (A. chinensis × A. eriantha) × A. chinensis backcross families, showing that the eriantha allele at this locus increases fruit AsA levels by 250 mg/100 g fresh weight. Inspection of genome composition and recombination in other A. chinensis genetic maps confirmed that the qAsA26.1 region bears hallmarks of suppressed recombination. The molecular fingerprint of this locus was examined by RNA-seq in leaves of the backcross validation families. This confirmed strong allelic expression bias across the region, as well as differential expression of transcripts on other chromosomes. This evidence suggests that the region harboring qAsA26.1 constitutes a supergene, which may condition multiple pleiotropic effects on metabolism.
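Pool-GWAS looks for genomic regions where allele frequencies differ sharply between the high and low pools; a QTL such as qAsA26.1 shows up as a run of windows with large differences. The sketch below shows that core comparison under simplifying assumptions: per-SNP read counts are given, the names and counts are hypothetical, and no statistical test or ploidy-aware model is applied, so it is not the study's pipeline.

```python
def alt_freq(alt_reads: int, depth: int) -> float:
    """Alternative-allele frequency in a pool, estimated from read counts."""
    if depth <= 0:
        raise ValueError("depth must be positive")
    return alt_reads / depth

def delta_af(high_pool: list[tuple[int, int]], low_pool: list[tuple[int, int]]) -> list[float]:
    """Per-SNP allele-frequency difference, high-AsA pool minus low-AsA pool.
    Each pool is a list of (alt_reads, depth) at the same SNPs, in the same order."""
    return [alt_freq(*hp) - alt_freq(*lp) for hp, lp in zip(high_pool, low_pool)]

def window_means(positions: list[int], values: list[float], size: int) -> dict[int, float]:
    """Mean value in non-overlapping windows of `size` bp, keyed by window start."""
    bins: dict[int, list[float]] = {}
    for pos, v in zip(positions, values):
        bins.setdefault(pos // size * size, []).append(v)
    return {start: sum(vs) / len(vs) for start, vs in sorted(bins.items())}

# Hypothetical read counts at four SNPs on one chromosome
positions = [100, 200, 1_100, 1_300]
high = [(18, 20), (15, 20), (10, 20), (9, 20)]
low = [(2, 20), (4, 20), (10, 20), (11, 20)]
deltas = delta_af(high, low)
print(window_means(positions, deltas, size=1_000))
```

Here the first window shows a large positive difference (the high pool is enriched for the alternative allele) while the second is close to zero.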
ARTICLE | doi:10.20944/preprints202102.0135.v1
Subject: Mathematics & Computer Science, Applied Mathematics Keywords: algorithmic information theory; universal distribution; Kolmogorov complexity; quantum algorithms; quantum circuit model; quantum Turing machine; genomics; viral genomics; meta-biology
Online: 4 February 2021 (12:04:02 CET)
Inferring algorithmic structure in data is essential for discovering causal generative models. In this research, we present a quantum computing framework, using the circuit model, for estimating algorithmic information metrics. The canonical computation model of the Turing machine is restricted in time and space resources to make the target metrics computable under realistic assumptions. The universal prior distribution for the automata is obtained as a quantum superposition, which is further conditioned to estimate the metrics. Specific cases are explored where the quantum implementation offers a polynomial advantage, in contrast to the exhaustive enumeration required in the corresponding classical case. The unstructured output data and the computational irreducibility of Turing machines make this algorithm impossible to approximate using heuristics. Thus, exploring the space of program-output relations is one of the most promising problems for demonstrating quantum supremacy using Grover search that cannot be dequantized. Experimental use cases for quantum acceleration are developed for self-replicating programs and the algorithmic complexity of short strings. With quantum computing hardware rapidly attaining technological maturity, we discuss how this framework will offer a significant advantage for various genomics applications in meta-biology, phylogenetic tree analysis, protein-protein interaction mapping and synthetic biology. This is the first time experimental algorithmic information theory has been implemented using quantum computation. Our implementation on the Qiskit quantum programming platform is copyleft and is available at https://github.com/Advanced-Research-Centre/QPULBA
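The polynomial advantage of Grover search is quadratic: it finds one marked item among N in roughly (π/4)√N oracle calls, whereas unstructured classical search needs about N/2 queries on average. The sketch below simulates this on a classical computer with plain state-vector arithmetic (oracle phase flip followed by inversion about the mean). It illustrates Grover's algorithm in general, not the QPULBA circuits from the repository, and the function name is hypothetical.

```python
import math

def grover_success_probability(n_qubits: int, marked: int) -> tuple[int, float]:
    """Classically simulate Grover search over N = 2**n_qubits basis states with a
    single marked state; return (iterations used, probability of measuring it)."""
    N = 2 ** n_qubits
    amps = [1 / math.sqrt(N)] * N                    # uniform superposition
    iterations = math.floor(math.pi / 4 * math.sqrt(N))
    for _ in range(iterations):
        amps[marked] = -amps[marked]                 # oracle: phase-flip the marked state
        mean = sum(amps) / N
        amps = [2 * mean - a for a in amps]          # diffusion: inversion about the mean
    return iterations, amps[marked] ** 2

for n in (4, 8, 10):
    k, p = grover_success_probability(n, marked=3)
    # classical unstructured search needs about N/2 queries on average
    print(f"N={2**n:5d}  Grover iterations={k:3d}  classical~{2**(n-1):4d}  P(success)={p:.3f}")
```

The iteration count grows with √N while the classical expectation grows with N, which is the quadratic gap.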
ARTICLE | doi:10.20944/preprints202208.0191.v1
Subject: Life Sciences, Molecular Biology Keywords: bacterial genomics; de novo assembly; Oxford Nanopore Technologies; Snakemake
Online: 10 August 2022 (04:37:01 CEST)
With the advancement of long-read sequencing technologies and their more widespread use in bacterial genomics, several methods for generating genome assemblies from error-prone long reads have been developed. These are complemented by various tools for assembly polishing using long reads, short reads or reference genomes. End users are therefore left with a plethora of possible program combinations for obtaining a final trusted assembly. Hence, there is also a need to measure the completeness and accuracy of such assemblies, for which, again, several evaluation methods implemented in various programs are available. To run all these programs automatically, I developed two workflows for the workflow management system Snakemake, one for bacterial genome assembly and one for assembly evaluation, which provide end users with an easy-to-run method for both tasks. The workflows are available as open-source software under the MIT license at https://github.com/pmenzel/ont-assembly-snake and https://github.com/pmenzel/score-assemblies.
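The abstract does not list the metrics computed by the evaluation workflow. As a hedged illustration of one widely used contiguity measure that assembly-evaluation tools commonly report, the sketch below computes N50; the function name and contig lengths are made up for the example.

```python
def n50(contig_lengths: list[int]) -> int:
    """Smallest contig length L such that contigs of length >= L together
    cover at least half of the total assembly length."""
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    raise ValueError("empty assembly")

# Hypothetical fragmented assembly (contig lengths in kb)
print(n50([80, 70, 50, 40, 30, 20]))  # 70
```

For a complete bacterial assembly with one circular chromosome, N50 simply equals the chromosome length, which is why contiguity metrics are usually combined with accuracy and completeness checks.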
REVIEW | doi:10.20944/preprints202111.0203.v1
Subject: Biology, Agricultural Sciences & Agronomy Keywords: flower development; epigenetics; RNA biology; Genomics; single cell biology
Online: 10 November 2021 (11:00:03 CET)
The rise of data science in biology stimulates interdisciplinary collaborations to address fundamental questions. Here, we report the outcome of the first SINFONIA symposium focused on revealing the mechanisms governing plant reproductive development across biological scales. The intricate and dynamic target networks of known regulators of flower development remain poorly understood. To analyze development from the genome to the final floral organ morphology, high-resolution data that capture spatiotemporal regulatory activities are necessary and require advanced computational methods for analysis and modeling. Moreover, frameworks to share data, practices and approaches that facilitate the combination of varied expertise to advance the field are called for. Training young researchers in interdisciplinary approaches and science communication offers the opportunity to establish a collaborative mindset to shape future research.
REVIEW | doi:10.20944/preprints202108.0514.v1
Subject: Keywords: radish; breeding; interspecific hybridization; molecular breeding; genomics; genetic engineering
Online: 26 August 2021 (16:46:36 CEST)
Radish is an annual herbaceous plant of the family Cruciferae grown as a root, fruit and oil crop. Important traits for radish breeding include high yield, early maturity, late bolting, pungency, cold hardiness, drought resistance, heat tolerance and soil adaptability. Successful radish production requires an understanding of the nature and behavior of the flower, and it is very important to identify the S haplotypes of parental lines in order to produce F1 hybrids based on self-incompatibility, which avoids laborious hand emasculation. Some desirable genes are not present within radish varieties. Therefore, further breeding programmes depend on interspecific and intraspecific hybridization, which plays a vital role in genomic studies and crop improvement by introducing desirable agronomic characters. It is essential to acquire detailed genetic information on chromosomes and on inheritance. Genomics is now at the core of crop improvement, and radish is being exploited to study the differences underlying genotypes. In addition, some monogenic characters have been improved by genetic engineering. In the three decades since the first documented instance of genetic engineering, its application has grown at an unprecedented rate. Over the last decade, researchers have successfully produced transgenic radishes with various agronomic characteristics.
REVIEW | doi:10.20944/preprints202010.0149.v1
Subject: Biology, Anatomy & Morphology Keywords: breeding; diversity; genetic engineering; genomics; male sterility; melon; QTLs
Online: 7 October 2020 (09:22:33 CEST)
Melon (Cucumis melo L.), a member of the family Cucurbitaceae, is extensively cultivated for its fleshy fruit. Depending on the agro-climatic zone of cultivation and on regional preferences, melon displays significant variability in phenotypic and biochemical attributes. Here, we evaluate the achievements in the development and implementation of melon breeding programs that employ the newest approaches. Melon breeding has achieved critical milestones throughout the previous century, and we hope this trend will persist. However, studies are needed to identify new genetic information on genes associated with the challenges imposed by climate change. Identifying valuable genetic and metabolic variability in landraces and wild relatives of melon will be useful for crop diversification and for broadening the genetic base of cultivated melon. Moreover, the considerable information available on melon genomics and metabolomics is valuable for dissecting the inheritance of important traits and their impact on other characteristics. Overall, we hope this review will serve as an important resource for melon breeders.
ARTICLE | doi:10.20944/preprints202005.0413.v1
Subject: Life Sciences, Virology Keywords: SARS-CoV-2; nucleocapsid (N); genomics; coronavirus; Wuhan; Pandemic
Online: 25 May 2020 (17:45:40 CEST)
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has caused the COVID-19 pandemic, the most notorious global public health crisis of the last 100 years. SARS-CoV-2 is composed of four structural proteins and several non-structural proteins. The multifaceted nucleocapsid (N) protein is the major structural protein of CoVs; however, no dedicated genomic, sequence and structural analyses have focused on the potential roles of the N protein. Hence, a detailed study of the N protein of SARS-CoV-2 is urgently required. Herein, we present a comprehensive study of the N protein of SARS-CoV-2. We have identified seven motifs conserved in its three major regions, namely the N-terminal domain, the linker region and the C-terminal domain. Six of the seven motifs are conserved across different members of the Coronaviridae, while motif4 is specific to SARS CoVs and has potential amyloidogenic properties. We also report that this protein has large disordered regions flanking these seven motifs. These motifs are epitope hubs, containing 67 experimentally verified epitopes from related viruses. We report the presence of three nuclear localization signals (NLS1-NLS3, mapped to residues 36-41, 256-26 and 363-389, respectively) and two nuclear export signals (NES1-NES2, mapped to residues 151-161 and 217-230, respectively) in the N protein of SARS-CoV-2. We further deciphered two Q-patches, Q-patch1 and Q-patch2, mapped to residues 266-306 and 361-418, which, together with the 219LALLLLDR226 patch, potentially help in the aggregation of the viral proteins. Additionally, using docking-based drug discovery methods, we have identified 14 antiviral drugs that potentially bind to the seven motifs of the N protein.
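The abstract does not name the tools used to predict the NLS and Q-patch positions. As a much cruder, hedged illustration of motif scanning, the sketch below searches a protein sequence for the classical monopartite NLS consensus K-(K/R)-X-(K/R) and for glutamine-rich windows reminiscent of Q-patches; the window size, threshold, function names and toy sequence are arbitrary assumptions, and dedicated NLS predictors use richer models.

```python
import re

# Classical monopartite NLS consensus K-(K/R)-X-(K/R); the lookahead lets
# overlapping candidates be reported. A crude screen, not a predictor.
NLS_PATTERN = re.compile(r"(?=(K[KR].[KR]))")

def find_nls_candidates(protein: str) -> list[tuple[int, int, str]]:
    """Return 1-based inclusive (start, end, motif) for each consensus match."""
    return [(m.start() + 1, m.start() + 4, m.group(1))
            for m in NLS_PATTERN.finditer(protein.upper())]

def q_rich_windows(protein: str, size: int = 20, min_fraction: float = 0.25) -> list[int]:
    """1-based start positions of windows whose glutamine (Q) fraction is >= min_fraction."""
    protein = protein.upper()
    return [i + 1 for i in range(len(protein) - size + 1)
            if protein[i:i + size].count("Q") / size >= min_fraction]

# Toy sequence, not the SARS-CoV-2 N protein
print(find_nls_candidates("MAGKKSKRPQQ"))                          # [(4, 7, 'KKSK')]
print(q_rich_windows("Q" * 5 + "A" * 15, size=10, min_fraction=0.5))  # [1]
```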
REVIEW | doi:10.20944/preprints202004.0005.v1
Subject: Life Sciences, Virology Keywords: SARS-CoV-2; COVID-19; Coronavirus; Pandemic; Viral Genomics
Online: 1 April 2020 (09:22:38 CEST)
The COVID-19 pandemic is caused by infection with the novel SARS-CoV-2, which affects the lower respiratory tract. The spectrum of symptoms ranges from asymptomatic infection and mild respiratory symptoms to the lethal form of COVID-19, which is associated with severe pneumonia, acute respiratory distress and death. At present, the global case fatality rate among laboratory-confirmed COVID-19 cases is ~4.7%, ranging from ~0.3-0.4% in Chile and Israel to ~10.8% in Italy. To address this global crisis, up-to-date information on viral genomics and transcriptomics is crucial for understanding the origins and global dispersal of the virus, providing insight into viral pathogenicity, transmission and epidemiology, and enabling strategies for therapeutic interventions, drug discovery and vaccine development. Therefore, this review provides a comprehensive overview of COVID-19 epidemiology, genomic etiology, findings from recent transcriptomic map analysis, viral-human protein interactions, molecular diagnostics, and the current status of vaccine and novel therapeutic intervention development. Moreover, we provide an extensive list of resources that will help the scientific community access numerous types of databases related to SARS-CoV-2 OMICs and approaches to therapeutics related to COVID-19 treatment.
ARTICLE | doi:10.20944/preprints202003.0336.v1
Subject: Life Sciences, Virology Keywords: Manihot esculenta Crantz; potexvirus; Cassava common mosaic virus; genomics
Online: 23 March 2020 (05:46:35 CET)
The complete genomic sequence of a Cassava common mosaic virus Linggao isolate (CsCMV-LG) was determined from cassava (Manihot esculenta Crantz) showing mild leaf mosaic symptoms or no symptoms in China. Excluding the poly(A) tail, the CsCMV-LG genome (GenBank accession No. MT038420) is 6374 nucleotides (nts) in length, with five major open reading frames encoding a 1450-amino-acid (aa) RNA-dependent RNA polymerase (RdRp), three triple gene block (TGB) proteins (231-aa, 110-aa and 95-aa), and a 229-aa coat protein (CP). Phylogenetic analysis indicated that the complete genome of CsCMV-LG is closely related to that of CsCMV-Brazilian, which has been assigned to the genus Potexvirus, but the two share only 88.0% sequence identity. Notably, the mild CsCMV-LG isolate can also infect Nicotiana benthamiana in the laboratory through rub inoculation, causing mild vein yellowing at 15 days post-inoculation. This is the first full-length genome sequence of a distinct isolate of Cassava common mosaic virus (CsCMV) infecting cassava in Hainan, China.
REVIEW | doi:10.20944/preprints201612.0061.v1
Subject: Biology, Agricultural Sciences & Agronomy Keywords: alcohol; aroma; bioengineering; flavour; synthetic genomics; taste; wine; yeast
Online: 10 December 2016 (09:09:54 CET)
A perfectly balanced wine can be said to create a symphony in the mouth. To achieve the sublime, both in wine and music, requires imagination and skilled orchestration of artistic craftsmanship. For wine, inventiveness starts in the vineyard. Similar to a composer of music, the grapegrower produces grapes through a multitude of specifications to achieve a quality result. Different Vitis vinifera grape varieties allow the creation of wine of different genres. Akin to a conductor of music, the winemaker decides what genre to create and considers resources required to realise the grape’s potential. A primary consideration is the yeast: inoculate the grape juice or leave it ‘wild’; which specific or combined Saccharomyces strain(s) should be used; or proceed with a non-Saccharomyces species? Whilst the various Saccharomyces and non-Saccharomyces yeasts perform their role during fermentation, the performance is not over until the ‘fat lady’ (S. cerevisiae) has sung (i.e. the grape sugar has been fermented to specified dryness and alcoholic fermentation is complete). Is the wine harmonious or discordant? Will the consumer demand an encore and make a repeat purchase? Understanding consumer needs lets winemakers orchestrate different symphonies (i.e. wine styles) using single- or multi-species ferments. Some consumers will choose the sounds of a philharmonic orchestra comprising a great range of diverse instrumentalists (as is the case with wine created from spontaneous fermentation); some will prefer to listen to a smaller ensemble (analogous to wine produced by a selected group of non-Saccharomyces and Saccharomyces yeast); and others will favour the well-known and reliable superstar soprano (i.e. S. cerevisiae). But what if a digital music synthesiser, such as a synthetic yeast, becomes available that can produce any music genre with the purest of sounds at the touch of a few buttons?
Will synthesisers spoil the character of the music and lead to the loss of the much-lauded romantic mystique? Or will music synthesisers support composers and conductors to create novel compositions and even higher quality performances that will thrill audiences? This article explores these and other relevant questions in the context of winemaking and the role that yeast and its genomics play in the betterment of wine quality.
REVIEW | doi:10.20944/preprints202201.0084.v1
Subject: Life Sciences, Biotechnology Keywords: hybrid; lager; yeast; introgression; interspecific; domestication; phylogeny; brewing; molecular; genomics
Online: 6 January 2022 (11:55:13 CET)
Microbiology has long been a keystone in fermentation, and the utilization of yeast biology reinforces molecular biotechnology as the pioneering frontier in brewing science. Consequently, modern understanding of the brewer’s yeast has undergone significant refinement over the last few decades. This publication presents a condensed summation of Saccharomyces species dynamics, with an emphasis on the relationship between traditional ale yeast, Saccharomyces cerevisiae, and the interspecific hybrids used in lager beer production, S. pastorianus. Introgression from other Saccharomyces species is also touched on. The unique history of Saccharomyces cerevisiae and Saccharomyces hybrids is exemplified by recent genomic sequencing studies aimed at categorizing brewing strains through phylogeny and redefining Saccharomyces species boundaries. Phylogenetic investigations highlight the genomic diversity of Saccharomyces cerevisiae ale strains long known to brewers by their fermentation characteristics and phenotypes. Genomic contributions from other Saccharomyces species to the genomes of S. cerevisiae strains are ever more apparent as investigations into the hybrid nature of modern industrial and historical fermentation yeasts increase.
ARTICLE | doi:10.20944/preprints202107.0457.v1
Subject: Life Sciences, Microbiology Keywords: Latilactobacillus sakei; comparative genomics; carbohydrate utilization; antibiotic tolerance; CRISPR-Cas
Online: 20 July 2021 (15:02:42 CEST)
Increasing attention has been paid to the potential probiotic effects of Latilactobacillus sakei. To explore the genetic diversity of L. sakei, 14 strains isolated from different niches (feces, fermented kimchi and meat products) and 54 published strains were compared and analyzed. The results showed that the average genome size and GC content of L. sakei were 1.98 Mb and 41.22%, respectively. Its core genome mainly encodes functions in translation and transcription, amino acid synthesis, glucose metabolism and defense. L. sakei has an open pan-genome, and its pan-genome curve continues to rise. The genetic diversity of L. sakei is mainly reflected in carbohydrate utilization, antibiotic tolerance, and immune/competition-related factors, such as clustered regularly interspaced short palindromic repeats (CRISPR)-Cas. The CRISPR systems are mainly type II-A, with a few of type II-C. This work provides a basis for further study of this species.
ARTICLE | doi:10.20944/preprints202102.0060.v1
Subject: Biology, Anatomy & Morphology Keywords: Escherichia coli; magnetite nanoparticles; metals; antibiotics; genomics; pleiotropy; cell morphology
Online: 1 February 2021 (15:58:10 CET)
Experimental evolution was used to produce 5 magnetite nanoparticle-resistant (FeNP1-5) populations of Escherichia coli. The control populations were not exposed to magnetite nanoparticles. The 24-hour growth of these replicates was evaluated in the presence of increasing concentrations of magnetite NPs as well as other ionic metals (gallium III, iron II, iron III, silver I) and antibiotics (ampicillin, chloramphenicol, rifampicin, sulfanilamide, tetracycline). Scanning electron microscopy was used to determine cell size and shape in response to magnetite nanoparticle selection. Whole genome sequencing was carried out to identify genomic changes resulting from magnetite nanoparticle resistance. After 25 days of selection, magnetite resistance was evident in the FeNP treatment. The FeNP populations also showed highly significantly (p < 0.0001) greater 24-hour growth, as measured by optical density, in metals (Fe (II), Fe (III), Ga (III), Ag and Cu (II)) as well as antibiotics (ampicillin, chloramphenicol, rifampicin, sulfanilamide, and tetracycline). The FeNP-resistant populations also showed significantly greater cell length compared to controls (p < 0.001). Genomic analysis of FeNP identified both polymorphisms and hard selective sweeps in the RNA polymerase genes rpoA, rpoB, and rpoC. Collectively, our results show that E. coli can rapidly evolve resistance to magnetite nanoparticles and that this resistance is correlated with resistance to other metals and antibiotics. There were also changes in cell morphology resulting from adaptation to magnetite NPs. Thus, the various applications of magnetite nanoparticles could result in unanticipated changes in resistance to both metals and antibiotics.
REVIEW | doi:10.20944/preprints202008.0133.v1
Subject: Life Sciences, Virology Keywords: epidemic; viral sequences; genomics; metadata; data harmonization; integration and search
Online: 5 August 2020 (10:58:27 CEST)
With the outbreak of the COVID-19 disease, the research community is making unprecedented efforts to better understand and mitigate the effects of the pandemic. In this context, we review the data integration efforts required for accessing and searching genome sequences and metadata of SARS-CoV-2, the virus responsible for the COVID-19 disease, which have been deposited into the most important repositories of viral sequences. Organizations that were already present in the virus domain are now dedicating special interest to the COVID-19 pandemic by emphasizing specific SARS-CoV-2 data and services. At the same time, novel organizations and resources were born in this critical period to serve specifically the purposes of COVID-19 mitigation, while setting the research ground for countering possible future pandemics. Accessibility and integration of viral sequence data, possibly in conjunction with the human host genotype and clinical data, are paramount to better understand the COVID-19 disease and mitigate its effects.
REVIEW | doi:10.20944/preprints201811.0571.v2
Subject: Life Sciences, Genetics Keywords: prostate cancer; prostate-specific antigen; incidence; genomics; next generation sequencing
Online: 3 April 2019 (10:15:50 CEST)
In the recent past, there has been a rise in prostate cancer (PCa) in Asia, particularly India. Although systematic reviews on PCa have dealt with the genetics, genomics and environmental causes of PCa, no predictive analysis comparing PCa in Caucasian, American and Asian populations has been attempted. In this review article, we elaborate on this aspect of PCa and discuss the challenges of using next-generation sequencing methods to study PCa manifestation in Asia compared with the West.
ARTICLE | doi:10.20944/preprints201811.0071.v1
Subject: Life Sciences, Other Keywords: basal angiosperms; chloroplast; comparative genomics; Nymphaeales; Nymphaeaceae; phylogenomics; water lily
Online: 2 November 2018 (16:20:31 CET)
The order Nymphaeales, consisting of three families with a record of eight genera, has gained significant interest from botanists, probably due to its position as a basal angiosperm lineage. The phylogenetic relationships within the order have been well studied and resolved; however, a few controversial nodes remain in the Nymphaeaceae, including the position of the genus Nuphar and the monophyly of the family Nymphaeaceae. This study adds to the increasing number of completely sequenced plastid genomes of the Nymphaeales and applies a large chloroplast gene data set to reconstructing the intergeneric relationships within the Nymphaeaceae. Five complete chloroplast genomes were newly generated, including the first for the monotypic genus Euryale. Using a set of 66 protein-coding genes from the chloroplast genomes of 17 taxa, the phylogenetic position of Nuphar was determined and a monophyletic Nymphaeaceae was obtained with convincing statistical support from both partitioned and unpartitioned data schemes. Although comparative genomic analyses revealed a high degree of synteny among the chloroplast genomes of the ancient angiosperms, key minor variations were evident, particularly in the contraction/expansion of the Inverted Repeat regions and in RNA editing events. Genome structure, gene content and arrangement were highly conserved among the chloroplast genomes.
REVIEW | doi:10.20944/preprints202203.0388.v1
Subject: Biology, Ecology Keywords: phenotypic polymorphism; structural genomics; chromosomal inversion; supergene; functional genomics; hormonal plasticity; frequency-dependent selection; cryptic female choice of sperm; sexual selection; eco-evolutionary dynamics
Online: 30 March 2022 (10:15:55 CEST)
Only a few empirical examples document fixed alternative male mating strategies in animals. Here we focus on the polymorphism of male mating strategies in the ruff (Calidris pugnax, Aves Charadriiformes). In ruffs, three fixed alternative male mating strategies coexist and are signaled by extreme plumage polymorphism. We first present relevant data on the biology of the species. Then we review the available knowledge of the behavioral ecology of ruffs during the breeding season and detail the characteristics of each of the three known fixed male mating strategies. We next turn to the exceptionally high-quality results accumulated on both the structural and functional genomics of the ruff over the past few years. We show how these genomic data can shed new, mechanistic light on the evolution and maintenance of the three fixed alternative male mating strategies. We then examine whether there is sufficient evidence to support frequency-dependent selection as a key mechanism in maintaining these three strategies. Specifically, we search for evidence of equal fitness among individuals using each of the three strategies. Finally, we propose three lines of research that will help to understand the eco-evolutionary dynamics of phenotypic differences within natural populations of this iconic model species.
ARTICLE | doi:10.20944/preprints202206.0105.v1
Subject: Biology, Other Keywords: Smear-ripened cheese; virulent phages; rind bacteria; phage reservoirs; viral genomics
Online: 7 June 2022 (10:37:21 CEST)
Smear-ripened cheeses host complex microbial communities that play a crucial role in the ripening process. Although bacteriophages have been frequently isolated from dairy products, their diversity and ecological role in this type of cheese remain underexplored. To fill this gap, the main objective of this study was to isolate and characterize bacteriophages from the rind of a smear-ripened cheese. Thus, viral particles extracted from cheese rind were tested against a collection of bacterial isolates through a spot assay. In total, five virulent bacteriophages infecting Brevibacterium aurantiacum, Glutamicibacter arilaitensis, Leuconostoc falkenbergense and Psychrobacter aquimaris species were obtained. All exhibit a narrow host range, being able to infect only a few cheese-rind isolates within the same species. The complete genome of each phage was sequenced using both Nanopore and Illumina technologies, assembled and annotated. Sequence comparison with known phages revealed that four of them may represent at least new genera. The distribution of the five virulent phages in the dairy-plant environment was also investigated by PCR, and three potential reservoirs were identified. This work provides new knowledge on the cheese rind viral community and an overview of the distribution of phages within a cheese factory.
ARTICLE | doi:10.20944/preprints202205.0070.v1
Subject: Life Sciences, Microbiology Keywords: phototrophic bacteria; phototrophic extracellular electron uptake; comparative genomics; transcriptomics; environmental microbiology
Online: 6 May 2022 (09:35:45 CEST)
Rhodovulum spp. are anoxygenic photosynthetic purple bacteria with versatile metabolisms, including the ability to obtain electrons from minerals in their environment to drive photosynthesis, a relatively novel process called phototrophic extracellular electron uptake (pEEU). Recently, our group isolated 15 strains of R. sulfidophilum to observe this metabolism in marine phototrophs. We previously observed carbon dioxide fixation coupled to phototrophic iron oxidation (photoferrotrophy) and pEEU in AB26, and identified a novel di-heme c-type cytochrome, EeuP, that is important for pEEU but not for photoferrotrophy. Taxonomic re-evaluation based on 16S and pufM phylogenetic analyses led us to re-classify two isolates, AB26 and AB19, as Rhodovulum visakhapatnamense. The AB26 genome consists of 4,380,746 base pairs, including two plasmids, and encodes 4,296 predicted protein-coding genes. AB26 contains 22 histidine kinases and 20 response regulators, and dedicates ~16% of its genome to transport. Transcriptomic data under aerobic, photoheterotrophic, photoautotrophic, and pEEU conditions reveal how gene expression varies between metabolisms. Lastly, we use transcriptomic data for a comparative genomic analysis of potential pEEU-relevant genes across all 15 isolates. With these data we identify potential pEEU-capable phototrophs within these isolates, and likely molecular mechanisms of pEEU.
ARTICLE | doi:10.20944/preprints202109.0102.v1
Subject: Life Sciences, Biotechnology Keywords: abiotic stress; HSFs; genomics; gene ontology; maize breeding; protein 3D structures
Online: 6 September 2021 (13:57:37 CEST)
Heat shock transcription factors (HSFs) participate in regulating many environmental stress responses and biological processes in plants. Maize (Zea mays L.) is a major cash crop that is grown worldwide. However, the growth and yield of maize are affected by several adverse environmental factors. Therefore, investigating the factors that regulate maize growth, development and resistance to abiotic stress is an essential task for developing stress-resilient maize varieties. Thus, a comprehensive genome-wide analysis was performed to identify HSFs in the maize genome. The current study identified 25 ZmHSFs, randomly distributed throughout the maize genome. Phylogenetic analysis revealed that ZmHSFs are divided into three classes and 13 sub-classes. Gene structure and protein motif analysis supported the results obtained through the phylogenetic analysis. Domain analysis showed the DNA-binding domain to be the most conserved region of ZmHSFs. Segmental duplication is shown to be responsible for the expansion of ZmHSFs. Most of the ZmHSFs are localized in the nucleus, and ZmHSFs belonging to the same group show similar physicochemical properties. The 3D structures revealed comparably conserved ZmHSF protein structures. RNA-seq analysis revealed a major role of class A HSFs, including ZmHSFA-1a and ZmHSFA-2a, in all maize growth stages, i.e., seed, vegetative, and reproductive development. Furthermore, ZmHSFs displayed obvious spatiotemporal expression. Under abiotic stress conditions (heat, drought, cold, UV, and salinity), members of class A and B ZmHSFs are induced. Gene ontology (GO) annotation analysis indicated a major role of ZmHSFs in resistance to environmental stress and regulation of primary metabolism. Further, protein-protein interaction analysis showed that ZmHSFs interact with several molecular chaperones and major stress-responsive proteins.
In summary, this study provides novel insights to guide functional studies of ZmHSFs in maize breeding programs.
ARTICLE | doi:10.20944/preprints202103.0103.v1
Subject: Medicine & Pharmacology, Allergology Keywords: Next Generation Sequencing; Laboratory automation; Hereditary Cancer; Genetic Testing; Clinical Genomics.
Online: 2 March 2021 (16:00:24 CET)
(1) Background: NGS-based mutational analysis of hereditary cancer genes is crucial to design tailored prevention strategies for subjects with different hereditary cancer risks. The ease of amplicon-based NGS library construction protocols contrasts with the greater enrichment uniformity of capture-based protocols, which gives the latter greater chances of detecting large genomic rearrangements and copy-number variations. Capture-based protocols, however, involve more complex sample handling, which is highly susceptible to human error. Robotic platforms can help address these limits by reducing hands-on time, limiting random errors and guaranteeing process standardization. (2) Methods: We implemented and validated the complete automation of the SOPHiA GENETICS’ CE-IVD Hereditary Cancer Solution™ (HCS) library preparation workflow on the Hamilton STARlet platform. (3) Results: We demonstrate that this automated workflow, used for more than 1000 samples, achieved the same performance as the manual setup in terms of coverage and read uniformity, with much lower variability in the rate of reads mapping onto the regions of interest. (4) Conclusions: This automated solution offers the same reliable and affordable NGS data, with the essential advantages of a flexible, automated and integrated framework that minimizes possible human errors and enables a walk-away laboratory scenario.
REVIEW | doi:10.20944/preprints202010.0301.v1
Subject: Life Sciences, Genetics Keywords: Obesity; Genetics; Companion Animals; Metabolic Disease; Comparative Genomics; Dogs; Cats; Horses
Online: 14 October 2020 (10:51:29 CEST)
Obesity is one of the most prevalent health conditions in humans and companion animals across the world. Obesity is associated with multiple health conditions across species, as well as with premature mortality. It is therefore important in both medicine and veterinary medicine. The regulation of body weight is a homeostatic process vulnerable to disruption by genetic and environmental factors. It is well established that the heritability of obesity is high in humans and laboratory animals, with ample evidence that the same is true in companion animals. In this review, we provide an overview of how genes link to obesity in humans, drawing on a wealth of information from laboratory animal models, and summarise the mechanisms by which obesity causes related disease. Throughout, we focus on how large-scale human studies and niche investigations of rare mutations in severely affected patients have improved our understanding of obesity biology and can inform our ability to interpret results of animal studies. For dogs, cats and horses, we review the similarities in obesity pathophysiology to humans and review the genetic studies that have been done to investigate them. Finally, we discuss how veterinary genetics may learn from humans about studying precise, nuanced phenotypes and implementing large-scale studies, but also how veterinary studies may be able to look past clinical findings to mechanistic ones and demonstrate translational benefits to human research.
REVIEW | doi:10.20944/preprints202005.0448.v1
Subject: Life Sciences, Virology Keywords: betacoronaviruses; genomics; SARS-CoV; MERS-CoV; SARS-CoV-2; COVID-19
Online: 27 May 2020 (08:50:46 CEST)
In the 21st century, three highly pathogenic betacoronaviruses have emerged, with alarming rates of human morbidity and case fatality. Genomic information has been widely used to understand the pathogenesis, animal origin and mode of transmission of betacoronaviruses in the aftermath of the 2002-03 severe acute respiratory syndrome (SARS) and 2012 Middle East respiratory syndrome (MERS) outbreaks. Furthermore, genome sequencing and bioinformatic analysis have had unprecedented relevance in the battle against the 2019-20 coronavirus disease 2019 (COVID-19) pandemic, the newest and most devastating outbreak caused by a coronavirus in the history of mankind, allowing disease spread and transmission dynamics to be followed in near real time. Here, we review how genomic information has been used to tackle outbreaks caused by emerging, highly pathogenic betacoronavirus strains, with an emphasis on SARS-CoV, MERS-CoV and SARS-CoV-2.
REVIEW | doi:10.20944/preprints202004.0333.v1
Subject: Life Sciences, Genetics Keywords: bitter gourd; breeding; genetic diversity; genomics; heterosis; molecular breeding; mutation breeding
Online: 19 April 2020 (06:00:00 CEST)
Bitter gourd is an important vegetable of the family Cucurbitaceae, cultivated mainly in humid and subtropical Asia. Bitter gourd is a vegetable with immense health benefits due to the presence of medicinal compounds such as charantin, vicine, and polypeptide-p, which play an essential role in lowering blood glucose levels. Moreover, bitter gourd fruits are particularly rich in vitamin C, minerals, and carotenes. Here, we critically evaluate the achievements of bitter gourd breeding programs that use the latest technologies. The genetic base of cultivated bitter gourd varieties can be broadened by enriching the existing resources with wild species in breeding programs. Practical seed production know-how, together with male sterility (MS) systems or chemically induced sterility, remains vital to meet market demands. Higher-yielding bitter gourd hybrids that combine early maturity with resistance to biotic and abiotic stresses are continually needed to meet the challenges of bitter gourd production.
REVIEW | doi:10.20944/preprints201912.0316.v1
Subject: Biology, Agricultural Sciences & Agronomy Keywords: Peanut; plant breeding; research; funding; genomics; INERA; cultivar; selection; Arachis hypogaea
Online: 24 December 2019 (11:07:38 CET)
Groundnut (Arachis hypogaea L.) is a major food and cash crop in Burkina Faso. Due to growing demand for raw oilseeds, there is increasing interest in extending groundnut production from traditional rain-fed areas to irrigated environments. However, despite the implementation of many initiatives in the past to increase groundnut productivity and production, the groundnut industry still struggles to prosper due to several constraints, including minimal development research and fluctuating markets. Yield penalties due to drought and biotic stresses continue to be a major drawback for groundnut production. This review traces progress in groundnut breeding from its start in Burkina Faso before the country’s political independence in 1960 through to the present. Up to the 1980s, groundnut improvement was led by international research institutions such as IRHO (Institute of Oils and Oleaginous Research) and ICRISAT (International Crops Research Institute for the Semi-Arid Tropics). However, international breeding initiatives were not sufficient to establish a robust domestic groundnut breeding programme. This review also provides essential information about opportunities and challenges of groundnut research in Burkina Faso, emphasising the need for institutional attention to genetic improvement of the crop.
ARTICLE | doi:10.20944/preprints201812.0026.v2
Subject: Life Sciences, Virology Keywords: Lactobacillus plantarum; phage; new genus; annotation; comparative genomics; phylogenetics; isolation; diversity
Online: 11 June 2019 (09:54:23 CEST)
Lactobacillus plantarum is a bacterium with probiotic properties and promising applications in the food industry and agriculture. So far, bacteriophages of this bacterium have received only moderate attention. We examined the diversity of five new L. plantarum phages via whole genome shotgun sequencing and in silico protein predictions. Moreover, we looked into their phylogeny and their potential genomic similarities to other complete phage genome records through extensive nucleotide and protein comparisons. These analyses revealed a high degree of similarity among the five phages, which extended to the vast majority of predicted virion-associated proteins. Based on these, we selected one of the phages as a representative and performed transmission electron microscopy and structural protein sequencing. Overall, the results suggested that the five phages belong to the family Myoviridae, have long genomes of 137,973-141,344 bp, a G/C content of 36.3-36.6% that is quite distinct from their host’s, and, surprisingly, seven to 15 tRNAs. On average, only 41 of their 174 predicted genes were assigned a function. The comparative analyses unraveled considerable genetic diversity for the five L. plantarum phages of this study. Hence, the new genus “Semelevirus” was proposed, which comprises exclusively the five phages. This novel lineage of Lactobacillus phages provides further insight into the genetic heterogeneity of phages infecting Lactobacillus spp. The five new Lactobacillus phages have potential value for the development of more robust starters through, for example, the selection of mutants insensitive to phage infection. The five phages could also form part of phage cocktails, which producers would apply at different stages of L. plantarum fermentations in order to create a range of organoleptic outputs.
ARTICLE | doi:10.20944/preprints201905.0284.v1
Subject: Life Sciences, Genetics Keywords: personal genomics, DNA, polygenic, risk, regulation, discrimination, calibration, prediction, transparency, autonomy
Online: 23 May 2019 (16:23:57 CEST)
Direct-to-consumer genetic testing companies aim to predict the risks of complex diseases using proprietary algorithms. Companies keep algorithms as trade secrets for competitive advantage, but a market that thrives on the premise that customers can make their own decisions about genetic testing should respect customer autonomy and informed decision making and maximize opportunities for transparency. The algorithm itself is only one piece of the information that is deemed essential for understanding how prediction algorithms are developed and evaluated. Companies should be encouraged to disclose everything else, including the expected risk distribution of the algorithm when applied in the population, using a benchmark DNA dataset. A standardized presentation of information and risk distributions allows customers to compare test offers and scientists to verify whether the undisclosed algorithms could be valid. A new model of oversight in which stakeholders collaboratively keep a check on the commercial market is needed.
ARTICLE | doi:10.20944/preprints201803.0145.v1
Subject: Life Sciences, Genetics Keywords: repetitive elements; RNA-Seq; genomics; evolution; cytogenetics; supernumerary elements; extra chromosomes
Online: 19 March 2018 (08:33:48 CET)
B chromosomes (B) are supernumerary elements found in many taxonomic groups. Most B chromosomes are rich in heterochromatin and composed of abundant repetitive sequences, especially transposable elements (TEs). The origin of Bs is generally linked to the A chromosome complement (A). The first report of a B chromosome in African cichlids was in Astatotilapia latifasciata, which can harbor 0, 1 or 2 B chromosomes. Classical cytogenetic studies found high TE content on the B chromosome of this species. In this study, we aim to understand TE composition and expression in the A. latifasciata genome and its relation to the B chromosome. We use bioinformatic analysis to explore the genomic organization of TEs and their composition on the B chromosome. Bioinformatic findings were validated by fluorescent in situ hybridization (FISH) and real-time PCR (qPCR). A. latifasciata has a TE content similar to other cichlid fishes and several expanded elements on its B chromosome. With RNA sequencing data (RNA-seq) we showed that all major TE classes are transcribed in brain, muscle and male/female gonads. The evaluation of TE expression between B- and B+ individuals showed that few elements are differentially expressed between groups and that expanded B elements were not highly transcribed. Putative silencing mechanisms may be acting on the B chromosome of A. latifasciata to prevent adverse consequences of repeat transcription and mobilization in the genome.
REVIEW | doi:10.20944/preprints201612.0113.v1
Subject: Medicine & Pharmacology, Oncology & Oncogenics Keywords: breast cancer; brain metastases; clonal evolution; precision medicine; genomics; tumour microenvironment
Online: 22 December 2016 (09:57:33 CET)
Brain metastases are highly evolved manifestations of breast cancer arising in a unique microenvironment, giving them exceptional adaptability in the face of new extrinsic pressures. The incidence is rising in line with population ageing, and use of newer therapies that stabilise metastatic disease burden with variable efficacy throughout the body. Historically, there has been a widely held view that brain metastases do not respond to circulating therapeutics because the blood-brain-barrier (BBB) restricts their uptake. However, emerging data are beginning to paint a far more complex picture where the brain acts as a sanctuary for dormant, subclinical proliferations that are initially protected by the BBB, but then exposed to dynamic selection pressures as tumours mature and vascular permeability increases. Here, we review key experimental approaches and landmark studies that have charted the genomic landscape of breast cancer brain metastases. These findings are contextualised with the factors impacting on clonal outgrowth in the brain: intrinsic breast tumour cell capabilities required for brain metastatic fitness, and the neural niche, which is initially hostile to invading cells but then engineered into a tumour-support vehicle by the successful minority. We also discuss how late detection, abnormal vascular perfusion and interstitial fluid dynamics underpin the recalcitrant clinical behaviour of brain metastases, and outline active clinical trials in the context of precision management.
ARTICLE | doi:10.20944/preprints202107.0280.v1
Subject: Life Sciences, Biochemistry Keywords: comparative genomics; metabolic reconstruction; bioinformatics; conserved unknowns; function prediction; functional annotation; orthology
Online: 13 July 2021 (08:58:31 CEST)
Members of the DUF34 (domain of unknown function 34) family, also known as the NIF3 protein superfamily, are ubiquitous across superkingdoms. Proteins of this family have been widely annotated as “GTP cyclohydrolase I type 2” through electronic propagation based on one study. Here, the annotation status of this protein family was examined through comprehensive literature review and integrative bioinformatic analyses that revealed varied pleiotropic associations and phenotypes. This analysis combined with functional complementation studies strongly challenges the current annotation and suggests that DUF34 family members may serve as metal ion insertases, chaperones, or metallocofactor maturases. This general molecular function could explain how DUF34 subgroups participate in highly diversified pathways such as cell differentiation, metal ion homeostasis, pathogen virulence, redox and universal stress responses.
Subject: Biology, Anatomy & Morphology Keywords: mitochondria; mitochondrial DNA; nervous tissue; OxPhos complexes; bioenergetics; genomics; proteomics; mitochondrial diseases
Online: 17 June 2021 (15:12:01 CEST)
Oxidative phosphorylation (OxPhos) is the basic function of mitochondria, although the landscape of mitochondrial functions is continuously growing to include more aspects of cellular homeostasis. Thanks to the application of -omics technologies to the study of the OxPhos system, novel features are emerging from the cataloging of novel proteins as mitochondrial, adding details to the mitochondrial proteome and defining novel metabolic cellular interrelations, especially in the human brain. We focussed on the diversity of bioenergetic demand and on different aspects of mitochondrial structure, function, and dysfunction in the brain. Definitions such as ‘mitoexome’, ‘mitoproteome’ and ‘mitointeractome’ have entered the field of ‘mitochondrial medicine’. In this context, we reviewed several genetic defects that hamper the last step of aerobic metabolism, mostly involving the nervous tissue, which is one of the most energy-dependent tissues and, as a consequence, a primary target of mitochondrial dysfunction. The dual genetic determination of the OxPhos complexes is one of the reasons for the complexity of the genotype-phenotype correlation in human diseases associated with mitochondrial defects; clinically, these diseases are characterized by extremely heterogeneous symptoms, ranging from organ-specific to multisystemic dysfunction with different clinical courses. Finally, we briefly discuss the future directions of the multi-omics study of human brain disorders.
REVIEW | doi:10.20944/preprints202106.0363.v1
Subject: Biology, Anatomy & Morphology Keywords: Traditional food crops; Climate change; Food security; Omics; Translational genomics; Gene editing
Online: 14 June 2021 (13:02:24 CEST)
Indigenous communities across the globe, especially in rural areas, consume locally available plants known as Traditional Food Plants (TFPs) for their nutritional and health-related needs. Recent research shows that many traditional food plants are highly nutritious, as they contain health-beneficial metabolites, vitamins, mineral elements and other nutrients. Excessive reliance on mainstream staple crops has its own disadvantages. TFPs are now considered important crops of the future and can act as supplementary foods for the burgeoning global population. They can also serve as emergency foods during pandemics such as COVID-19 and in other crises. The current situation calls for locally available, nutritious TFPs as alternatives for sustainable food production. To increase the cultivation of TFPs or improve their traits, it is essential to understand the molecular basis of the genes that regulate important traits such as nutritional components and resilience to biotic and abiotic stresses. The integrated use of modern omics and gene editing technologies provides great opportunities to better understand the genetic and molecular basis of superior nutrient content, climate-resilient traits and adaptation to local agroclimatic zones. Recently, recognising the importance and benefits of TFPs, scientists have shown interest in the prospection and sequencing of traditional food plants for their improvement, further cultivation and mainstreaming. Integrated omics approaches such as genomics, transcriptomics, proteomics, metabolomics and ionomics have been used successfully in plants and have provided a comprehensive understanding of gene-protein-metabolite networks. The combined use of omics and editing tools has led to successful editing of beneficial traits in a few TFPs. This suggests that there is ample scope for the integrated use of modern omics and editing tools/techniques to improve TFPs and use them for sustainable food production.
In this article, we highlight the importance, scope and progress towards improvement of TFPs for valuable traits by integrated use of omics and gene editing techniques.
ARTICLE | doi:10.20944/preprints202106.0214.v1
Subject: Biology, Anatomy & Morphology Keywords: formae speciales; horizontal gene transfer; endophytic; pathogenic; Fusarium; RNAseq; comparative genomics; vanilla
Online: 8 June 2021 (11:37:26 CEST)
Members of the Fusarium oxysporum species complex (FOSC) have the capacity to specialize into host-specific pathogens known as formae speciales through horizontal gene transfer between pathogenic and endophytic individuals. To this day, the origin of these formae speciales and the genetic determinants dictating the switch from endophytic to pathogenic Fusarium oxysporum (Fox) remain unknown. F. oxysporum f. sp. vanillae (Fov), a member of the FOSC, is the causal agent of root and stem rot disease, the main phytosanitary problem in vanilla plantations worldwide. Here we analyzed the RNA-seq libraries resulting from the vanilla-Fov interaction at early and late stages of infection, together with the libraries initially treated as controls in a previous study, in which we detected the presence of Fox endophytes. We identified virulence, hypervirulence, sporulation, conidiation, necrosis, and production of fusaric acid as key processes taking place during the Fov-vanilla interaction. Through comparison with endophytic Fox, we found that Fov can infect vanilla thanks to the presence of pathogenicity islands and genomic regions associated with supernumerary chromosomes. These play a central role as carriers of genes involved in pathogenic activity and could have been obtained by Fov through horizontal gene transfer. We also found that, unlike other pathogenic members of the FOSC, Fov does not use Secreted in Xylem (SIX) proteins to infect vanilla.
ARTICLE | doi:10.20944/preprints202012.0387.v1
Subject: Life Sciences, Biochemistry Keywords: next-generation sequencing; database; variant annotation; variant classification; data management; clinical genomics
Online: 15 December 2020 (13:14:21 CET)
The rapid evolution of Next Generation Sequencing in clinical settings, and the resulting challenge of interpreting variants in the light of constantly updated information, require robust data management systems and organized approaches to variant reinterpretation. In this paper, we present iVar: a freely available and highly customizable tool with a user-friendly web interface. It provides a platform for the unified management of variants identified by different sequencing technologies. iVar accepts VCF files and text annotation files as input and processes them, optimizing data organization and avoiding redundancy. Updated annotations can be periodically re-uploaded and associated with variants as historicized attributes. Data can be visualized through variant-centered and sample-centered interfaces. A customizable search function can be used to periodically check whether the pathogenicity-related data of a variant have changed over time. iVar makes patient recontacting after variant reinterpretation easier by efficiently identifying all patients in the database who carry a specific variant. We tested iVar by uploading 4171 VCF files and 1463 annotation files, obtaining a database of 4166 samples and 22569 unique variants. iVar has proven to be a useful tool with good performance for collecting and managing data from medium-throughput
REVIEW | doi:10.20944/preprints202011.0060.v1
Subject: Life Sciences, Biochemistry Keywords: Cotton; Fiber initiation; Genomics; Epigenomics; Phytohormones; Transcription factors; MicroRNAs; Gene expression regulation
Online: 2 November 2020 (15:50:39 CET)
The epidermal cells on the surface of cotton ovules undergo differentiation to produce fibers, which are single-celled, hair-like protrusions resembling plant trichomes. The initiation of these unicellular fibers from the cotton ovule surface is a complex and tightly regulated process. Initiation is the cell fate-determining stage that commits cells to eventually develop into fibers, which makes it the most crucial phase of fiber development. In-depth knowledge of its molecular regulation is a prerequisite for a clear view of the genetic and epigenetic control of fiber initiation. The identification and functional validation of cotton fiber initiation-related genes, a few fiberless mutants, transcription factors, microRNAs and epigenetic regulators, as well as the elucidation of the role of phytohormones as signaling molecules, have played a significant role in understanding the cotton fiber initiation process at the molecular level. This review compiles comprehensive information on the genetic and epigenetic regulation of cotton fiber initiation, providing readers with insight into the mechanistic details that operate during this process.
Subject: Life Sciences, Other Keywords: data science; reuse; sequencing data; genomics; bioinformatics; databases; computational biology; open science
Online: 16 July 2020 (12:39:43 CEST)
The 'big data revolution' has enabled novel types of analyses in the life sciences, facilitated by public sharing and reuse of datasets. Here, we review the prodigious potential of reusing publicly available datasets and the challenges, limitations and risks associated with it. Possible solutions to issues and research integrity considerations are also discussed. Due to the prominence, abundance and wide distribution of sequencing data, we focus on the reuse of publicly available sequence datasets. We define ‘successful reuse’ as the use of previously published data to enable novel scientific findings and use selected examples of such reuse from different disciplines to illustrate the enormous potential of the practice, while acknowledging their respective limitations and risks. A checklist to determine the reuse value and potential of a particular dataset is also provided. The open discussion of data reuse and the establishment of the practice as a norm has the potential to benefit all stakeholders in the life sciences.
REVIEW | doi:10.20944/preprints201911.0300.v1
Subject: Biology, Entomology Keywords: artificial selection; biological control; genetics; genome assembly; genomics; insect breeding; microbiome; modelling
Online: 24 November 2019 (17:10:31 CET)
Biological control is widely successful for controlling pests, but effective biocontrol agents are now more difficult to obtain due to more restrictive international trade laws. Given these restrictions and increasing demand, the efficacy of existing and new biocontrol agents needs to be improved with genetic and genomic approaches. Although they have been underutilised in the past, genetic and genomic techniques are becoming more feasible to apply from both technological and economic perspectives. We review current methods and provide a framework for using them that incorporates evolutionary and ecological principles. First, it is necessary to identify which biocontrol trait to select and in what direction. Next, the genes or markers linked to these traits need to be determined to better target their selection, and this information must then be implemented in a breeding program. Choosing a trait can be assisted by modelling to account for the proper agro-ecological context, and by knowing which traits have sufficiently high heritability values. We provide guidelines for designing genomic strategies in biocontrol programs, which depend on the organism, budget, and desired objective. Genomic approaches start with genome sequencing and assembly, and we provide a guide for choosing the most successful sequencing strategy for biocontrol agents. Gene discovery involves quantitative trait loci (QTL) analyses, transcriptomic and proteomic studies, and gene editing. Approaches to improving biocontrol practices include marker-assisted selection, genomic selection and microbiome manipulation of biocontrol agents, and monitoring for genetic variation during rearing and post-release. We conclude by identifying the most promising applications of genetic and genomic methods to improve biological control efficacy.
ARTICLE | doi:10.20944/preprints201809.0169.v1
Subject: Biology, Animal Sciences & Zoology Keywords: European sardine; draft genome; teleosts; comparative genomics; long chain polyunsaturated fatty acids
Online: 10 September 2018 (12:37:23 CEST)
Clupeiformes, such as sardines and herrings, represent an important share of worldwide fisheries. Among them, the European sardine (Sardina pilchardus, Walbaum 1792) has significant commercial relevance. While the last decade has shown a steady and sharp decline in capture levels, recent advances in culture husbandry represent promising research avenues. Yet the complete absence of genomic resources for the sardine imposes a severe bottleneck on understanding its physiological and ecological requirements. We generated 69 Gbp of paired-end reads using Illumina HiSeq X Ten and assembled a draft genome with an N50 scaffold length of 25579 bp and a BUSCO completeness of 82.1% (Actinopterygii). The estimated genome size ranges between 655 and 850 Mb. Additionally, we generated a relatively high-level liver transcriptome. To deliver a proof of principle of the value of this dataset, we established the presence and function of enzymes (elovl2, elovl5 and fads2) that have pivotal roles in the biosynthesis of long chain polyunsaturated fatty acids, essential nutrients that are particularly abundant in oily fish such as sardines. Our study provides the first omics dataset for the European sardine, a marine teleost of high economic value, and an essential resource for its effective conservation, management and sustainable exploitation.
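The N50 reported above is the scaffold length at which scaffolds of that length or longer together contain at least half of the assembled bases. A minimal Python sketch of that calculation follows; it is a generic illustration, not the authors' assembly pipeline, and the toy lengths are hypothetical.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths in bp.

    Lengths are sorted from longest to shortest and summed until the
    running total reaches at least half of the total assembly length;
    the length at which that happens is the N50.
    """
    if not lengths:
        raise ValueError("at least one sequence length is required")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to avoid fractional halves.
        if 2 * running >= total:
            return length


# Hypothetical toy assembly (bp); not data from the sardine genome.
print(n50([2, 2, 2, 3, 3, 4, 8, 8]))  # 8
```

Sorting in descending order is what distinguishes N50 from a simple median: a few long scaffolds can carry half of the assembled sequence even when most scaffolds are short.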
ARTICLE | doi:10.20944/preprints201805.0471.v1
Subject: Medicine & Pharmacology, General Medical Research Keywords: genomics; genomic medicine; health outcomes; evidence; standards; eMERGE; ClinGen; precision public health
Online: 31 May 2018 (11:27:23 CEST)
Genomic medicine is moving from research to the clinic. There is a lack of evidence about the impact of genomic medicine interventions on health outcomes. This is due in part to a lack of standardized outcome measures that can be used across different programs to evaluate the impact of interventions targeted to specific genetic conditions. The eMERGE Outcomes working group (OWG) developed measures to collect information on outcomes following the return of genomic results to participants for several genetic disorders. These outcomes were compared with outcome-intervention pairs for genetic disorders developed independently by the ClinGen Actionability working group (AWG). In general, the outcomes defined by the two groups were concordant. The ClinGen outcomes tended to be higher level, and the AWG-scored outcomes represented a subset of the outcomes referenced in the accompanying AWG evidence review. The eMERGE OWG outcomes were more detailed and discrete, facilitating collection of relevant information from health records. This paper demonstrates that common outcomes for genomic medicine interventions can be identified. Further work is needed to standardize outcomes across genomic medicine implementation projects and make them publicly available to enhance dissemination and help make precision public health a reality.
ARTICLE | doi:10.20944/preprints201803.0009.v1
Subject: Biology, Other Keywords: gene flow; sympatry; parapatry; simulation model; population genomics; Heliconius; coupling; nonlinear transitions
Online: 1 March 2018 (15:23:13 CET)
During speciation-with-gene-flow, a transition from single-locus to multi-locus processes can occur, as strong coupling of multiple loci creates a barrier to gene flow. Testing predictions about such transitions with empirical data requires building upon past theoretical work and the continued development of quantitative approaches. We simulated genomes under different evolutionary scenarios of gene flow and divergent selection, extending previous work with the additions of neutral sites and coupling statistics, allowing us to investigate if and how selected and neutral sites differ in the conditions they require for transitions during speciation. As the per-locus strength of selection grew and/or migration decreased, it became easier for selected sites to show divergence – and thus to rise in linkage disequilibrium (LD) with each other as a statistical consequence – farther in advance of the conditions under which neutral sites could diverge. Indeed, even very low rates of gene flow were sufficient to prevent differentiation at neutral sites. However, once strong enough, coupling among selected sites eventually reduced gene flow at neutral sites as well. To explore whether similar transitions might be detectable in empirical data, we used published genome resequencing data from three taxa of Heliconius butterflies. We found that allele-frequency outliers and FST outliers exhibited stronger patterns of LD than the genomic background, as expected. The statistical characteristics of LD – likely indicative of the strength of coupling of barrier loci – varied between chromosomes and taxonomic comparisons. Broad qualitative agreement between the patterns we observed in the empirical data and our simulations suggests that selection drives rapid genome-wide transitions to multi-locus coupling, illustrating how divergence and gene flow interact along the speciation continuum.
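The abstract above uses linkage disequilibrium (LD) among selected loci as a statistical signature of coupling. One standard pairwise LD measure is r² = D² / (pA(1 - pA) pB(1 - pB)), where D = pAB - pA·pB. The Python sketch below computes it from phased haplotypes at two biallelic loci; the 0/1 allele coding and input format are assumptions of this sketch, and the study's own coupling statistics are not reproduced here.

```python
def ld_r2(haplotypes):
    """Pairwise LD (r^2) between two biallelic loci A and B.

    `haplotypes` is a list of phased (allele_at_A, allele_at_B) tuples,
    with alleles coded 0 or 1 (an assumed coding for this sketch).
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("no haplotypes supplied")
    p_a = sum(a for a, _ in haplotypes) / n  # frequency of allele 1 at locus A
    p_b = sum(b for _, b in haplotypes) / n  # frequency of allele 1 at locus B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # coefficient of linkage disequilibrium
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("a locus is monomorphic, so r^2 is undefined")
    return d * d / denom


# Toy data: perfectly coupled loci versus independently assorting loci.
print(ld_r2([(0, 0), (1, 1), (0, 0), (1, 1)]))  # 1.0
print(ld_r2([(0, 0), (0, 1), (1, 0), (1, 1)]))  # 0.0
```

Because r² is normalised by allele frequencies, it is comparable across loci with different frequencies, which is convenient when contrasting outlier loci with the genomic background.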
REVIEW | doi:10.20944/preprints201608.0054.v1
Subject: Biology, Other Keywords: influenza virus; antiviral agent; proteomics; phosphoproteomics; metabolomics; transcriptomics; genomics; virtual ligand screening
Online: 5 August 2016 (12:41:07 CEST)
Human influenza A viruses (IAVs) cause global pandemics and epidemics. These viruses evolve rapidly, making current treatment options ineffective. To identify novel modulators of IAV-host interactions, we re-analyzed our recent transcriptomics, metabolomics, proteomics, phosphoproteomics, and genomics/virtual ligand screening data. We identified 713 potential modulators targeting 200 cellular and two viral proteins. Anti-influenza activity for 48 of them has been reported previously, whereas the antiviral efficacy of the remaining 665 is unknown. Studying anti-influenza efficacy, immuno-modulating properties and potential resistance of these compounds or their combinations may lead to the discovery of novel modulators of IAV-host interactions, which might be more effective than the currently available anti-influenza therapeutics.
REVIEW | doi:10.20944/preprints202207.0432.v2
Subject: Medicine & Pharmacology, Oncology & Oncogenics Keywords: mutation; gastric cancer; p53; K-Ras; c-Myc; cancer genomics; targeted therapy; immunotherapy
Online: 16 August 2022 (10:35:16 CEST)
The genetic changes that appear in the cell's information system and program its unregulated growth and proliferation gradually lead to cancer, and treatment options need to be guided accordingly. The critical roles played by some of the molecules associated with signaling pathways and the cell microenvironment that often induce tumorigenesis and metastasis have been described precisely in recent years, based on findings of the Human Genome Project and related initiatives undertaken afterward to understand the molecular basis of cancer cell behavior. Genomic studies of cancer cells are needed to fully understand prognosis and the pathways involved in disease progression, so that these pathways can be selectively targeted for a cure. Furthermore, patients with the same cancer type often respond differently to cancer therapies, indicating the need for patient-specific treatment regimens. In this direction, precision oncology, defined as the molecular profiling of tumors to identify targetable alterations for custom-tailored personalized treatment, is gaining ground as a potential means of cancer treatment and has started to influence the way cancer is treated in clinics. This article aims to comprehensively elucidate the foundations and frontiers of precision oncology in the context of recent advances in cancer genomics and single-cell technologies, to assess its scope and importance in realizing a proper cure for cancer.
ARTICLE | doi:10.20944/preprints202109.0051.v1
Subject: Biology, Entomology Keywords: Helicoverpa zea; Bollworm; CRISPR; Cry1A; Bt Toxin; Genome Editing; Knockout; Functional Genomics; Resistance
Online: 3 September 2021 (08:19:44 CEST)
Members of the insect ATP-binding cassette transporter subfamily C2 (ABCC2) in several moth species are known to act as receptors for the Cry1Ac insecticidal protein from Bacillus thuringiensis (Bt). Mutations that abolish the functional domains of ABCC2 are known to cause resistance to Cry1Ac, although the reported levels of resistance vary widely depending on the insect species. In this study, the function of the ABCC2 gene as a putative Cry1Ac receptor in Helicoverpa zea, a major pest of over 300 crops, was evaluated using CRISPR/Cas9 to progressively eliminate different functional ABCC2 domains. Results from bioassays with edited insect lines support that mutations in ABCC2 were associated with Cry1Ac resistance ratios (RR) ranging from 7.3- to 39.8-fold. No significant differences in susceptibility to Cry1Ac were detected between H. zea with partial or complete ABCC2 knockout, although the highest levels of tolerance were observed when half of ABCC2 was knocked out. Given the >500- to 1,000-fold RRs reported in similar studies of closely related moth species, the low RRs observed in H. zea knockouts suggest that ABCC2 is not a major Cry1Ac receptor in this insect.
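A resistance ratio (RR) such as those reported above is conventionally the lethal concentration (e.g. LC50) for a test line divided by that of a susceptible reference line. The short Python sketch below shows that arithmetic; the LC50 values and line names are hypothetical placeholders, not data from this study.

```python
def resistance_ratio(lc50_test, lc50_reference):
    """Fold-change in LC50 of a test (e.g. edited) line over a susceptible reference."""
    if lc50_test <= 0 or lc50_reference <= 0:
        raise ValueError("LC50 values must be positive")
    return lc50_test / lc50_reference


# Hypothetical LC50 values; both lines must be measured in the same units.
reference_lc50 = 2.0
edited_lines = {"hypothetical_line_A": 20.0, "hypothetical_line_B": 60.0}
for name, lc50 in edited_lines.items():
    print(f"{name}: RR = {resistance_ratio(lc50, reference_lc50):.1f}-fold")
```

Since RR is a ratio, it is unitless, which is what allows fold-resistance values to be compared across studies and closely related species.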
REVIEW | doi:10.20944/preprints202009.0091.v1
Subject: Life Sciences, Other Keywords: Evolutionary dynamics; life-history stages; mating systems; biotic interactions; climatic variability; ecological genomics
Online: 4 September 2020 (08:13:40 CEST)
Contemporary climate change is exposing plant populations to novel combinations of temperatures, drought stress, [CO2] and other abiotic and biotic conditions. These changes are rapidly disrupting the evolutionary dynamics of plants. Despite the multifactorial nature of climate change, most studies typically manipulate only one climatic factor. In this opinion piece, we seek to explore how climate change factors interact with each other and with biotic pressures to alter evolutionary processes. We first explore the ramifications of climate change for key life history stages (germination, growth and reproduction). We then examine how mating system variation influences population persistence under rapid environmental change and propose that mixed mating could be advantageous in future climates. Furthermore, we discuss how spatial and temporal mismatches between plants and their mutualists and antagonists could promote or constrain adaptive responses to climate change. For example, plant-virus interactions vary from highly pathogenic to mildly facilitative, and are partly mediated by temperature, moisture availability and [CO2]. Will host plants exposed to novel, stressful abiotic conditions be more susceptible to viral pathogens? Finally, we propose novel experimental approaches that could illuminate how plants will cope with unprecedented global change, such as resurrection studies combined with experimental evolution, genomics or epigenetics.
ARTICLE | doi:10.20944/preprints202005.0463.v1
Subject: Biology, Other Keywords: comparative genomics; SARS-CoV-2; microevolution; quasi-species; point mutation; disinfectants as mutagens
Online: 28 May 2020 (16:21:05 CEST)
In the wake of the current SARS-CoV-2 pandemic devastating the world, it is imperative to elucidate the comparative genomics of geographically diverse strains of this novel coronavirus to gain insights into its microevolution, pathogenesis and control. Here we explore the molecular nature, genome-wide frequency, and gene-wise distribution of mutations in three distinct datasets encompassing 68 SARS-CoV-2 RNA genomes altogether. While phylogenomic analysis revealed parallelism between the evolutionary paths charted by distinct quasispecies clusters of the virus, the occurrence of mutations across genomes was found to be non-random. Deletion mutations were extremely scarce and insertions totally absent; of all the single nucleotide substitutions detected, the overwhelming majority were transitions, with cytidine to uridine being the most prevalent type. The propensity for this transition could be attributed to hydrolytic deamination mediated by ultraviolet irradiation or bisulfite reagent, both of which are widely used as sterilizers/disinfectants. Transversions, albeit few and predominated by the guanosine to uridine form, were found concentrated in loci encoding the structural proteins of the virus, and so might confer versatile tissue-colonization potential. The mutation frequency of the three distinct genome sets ranged narrowly between 0.07 and 1.08 × 10⁻⁴ nucleotide positions mutated per nucleotide aligned. Gene-wise mapping of the global mutations illuminated the highly conserved nature of the genes encoding the non-structural proteins Nsp7 and Nsp8 (two essential cofactors of the viral RNA-dependent RNA polymerase) and Nsp9 (an Nsp8-interacting single-strand RNA-binding protein), plus the envelope protein E (involved in SARS-CoV-2 assembly, budding and pathogenesis). These mutation-free genomic loci and/or their protein products could be potent targets for future drug design.
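The abstract above expresses mutation frequency as nucleotide positions mutated per nucleotide aligned, and splits substitutions into transitions (purine to purine or pyrimidine to pyrimidine, such as C to U) and transversions. The Python sketch below computes both for one query genome against a reference, assuming the two sequences are already aligned to equal length. Skipping gaps and ambiguous bases, and treating T and U as the same base, are choices made for this sketch, not the authors' documented procedure.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}  # U is normalised to T before comparison


def substitution_profile(reference, query):
    """Count transitions/transversions between two pre-aligned sequences.

    Positions where either base is not A/C/G/T/U (gaps, N, ...) are skipped.
    Returns the counts plus the mutated-positions-per-aligned-nucleotide frequency.
    """
    if len(reference) != len(query):
        raise ValueError("sequences must be aligned to the same length")
    ref = reference.upper().replace("U", "T")
    qry = query.upper().replace("U", "T")
    transitions = transversions = compared = 0
    for r, q in zip(ref, qry):
        if r not in "ACGT" or q not in "ACGT":
            continue  # gap or ambiguous base: not counted as aligned
        compared += 1
        if r == q:
            continue
        # Same chemical class (both purines or both pyrimidines) = transition.
        if {r, q} <= PURINES or {r, q} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    frequency = (transitions + transversions) / compared if compared else 0.0
    return {"transitions": transitions, "transversions": transversions,
            "aligned": compared, "frequency": frequency}


# Hypothetical toy RNA sequences: one C->U transition and one G->U transversion.
print(substitution_profile("ACGUACGUAC", "AUGUACUUAC"))
```

Aggregating these per-genome counts over a whole genome set, as in the three datasets above, would simply sum the mutated positions and aligned positions before dividing; how the authors aggregated is not stated here.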
ARTICLE | doi:10.20944/preprints202005.0227.v1
Subject: Life Sciences, Genetics Keywords: meta-analysis; transcription factor; binding site; genomics; transcriptomics; chilling stress; CBF; DREB; CAMTA
Online: 13 May 2020 (15:17:16 CEST)
At the molecular level, the response to an external factor or an internal condition causes reprogramming of temporal and spatial transcription. When an organism undergoes physiological and/or morphological changes, several signaling pathways are activated simultaneously. Examples of such complex reactions are the responses to temperature changes, dehydration, various biologically active substances, and others. The synergistic action of multiple pathways greatly complicates experimental study of the molecular genetic mechanisms underlying the organism's reactions. As a result, a significant part of the regulatory ensemble in such complex reactions remains unidentified. We developed metaRE, an R package for the systematic search for cis-regulatory elements enriched in the promoters of genes whose transcription changes significantly in a complex reaction. metaRE mines multiple expression profiling datasets generated to test the same organism's response and identifies simple and composite cis-regulatory elements systematically associated with differential gene expression. Here we demonstrate metaRE's performance in identifying the cold stress-responsive cis-regulatory code in Arabidopsis thaliana. metaRE identified potential binding sites for known as well as unknown cold response regulators. The software, with source files, documentation, and example data files, is freely available online at the repository (https://github.com/cheburechko/MetaRE).
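The abstract above describes searching for cis-regulatory elements enriched in the promoters of differentially expressed genes across several expression datasets. One common way to test such enrichment within a single dataset is a one-sided hypergeometric test, after which per-dataset p-values can be pooled with Fisher's method. The Python sketch below shows that generic combination under those assumptions; it is not taken from metaRE, whose actual statistics are documented in its repository, and the counts in the usage lines are hypothetical.

```python
import math


def hypergeom_enrichment_p(n_genes, n_with_element, n_changed, n_changed_with_element):
    """One-sided P(X >= observed): changed genes carry the element this often by chance."""
    total = math.comb(n_genes, n_changed)
    upper = min(n_with_element, n_changed)
    tail = sum(
        math.comb(n_with_element, k) * math.comb(n_genes - n_with_element, n_changed - k)
        for k in range(n_changed_with_element, upper + 1)
    )
    return tail / total


def fisher_combined_p(p_values):
    """Fisher's method: X = -2 * sum(ln p) follows chi-square with 2m degrees of freedom.

    For even degrees of freedom the chi-square survival function has the
    closed form exp(-X/2) * sum_{i<m} (X/2)^i / i!.
    """
    if not p_values or any(not 0 < p <= 1 for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    half_x = -sum(math.log(p) for p in p_values)
    term, acc = 1.0, 1.0
    for i in range(1, len(p_values)):
        term *= half_x / i  # builds (X/2)^i / i! incrementally
        acc += term
    return math.exp(-half_x) * acc


# Hypothetical counts for one element in three datasets:
# (genes in genome, genes with element, changed genes, changed genes with element)
datasets = [(1000, 100, 50, 12), (1000, 100, 60, 11), (1000, 100, 40, 9)]
per_dataset = [hypergeom_enrichment_p(*d) for d in datasets]
print(per_dataset, fisher_combined_p(per_dataset))
```

Pooling across datasets lets an element that is only modestly enriched in each experiment still stand out when the enrichment is consistent, which matches the motivation given above for mining multiple datasets of the same response.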
ARTICLE | doi:10.20944/preprints202002.0388.v1
Subject: Biology, Other Keywords: Biosynthetic gene clusters; Eurotium rubrum; halophilic; marine genomics; nonribosomal peptide synthetase; T1pks; Terpene
Online: 26 February 2020 (10:58:18 CET)
Eurotium rubrum is a halophilic marine ascomycete that can tolerate the hypersaline waters of the Red Sea and proliferate there, whereas most organisms cannot withstand these conditions. Recently, a 26.2 Mb assembled genome of this fungus became available. Marine fungi are fascinating organisms that can harbor several biosynthetic gene clusters (BGCs), which enable them to produce natural compounds with antibiotic and anticancer properties. Understanding BGCs is critically important for the development of biotechnological applications and the discovery of future drugs. No knowledge is available on the BGCs of this halophilic marine ascomycete. Herein, we set out to explore and characterize BGCs and the corresponding genes in E. rubrum using bioinformatic methods. We identified 36 BGCs in the genome of E. rubrum. These 36 BGCs can be grouped into four non-ribosomal peptide synthetase (NRPS) clusters, eight NRPS-like (NRPSL) BGCs, eight type I polyketide synthase (T1PKS) BGCs, 11 terpene BGCs including one β-lactone cluster, four hybrid BGCs, and two siderophore BGCs. This study is an example of the application of marine genomics to the discovery of potential drug-like compounds.
ARTICLE | doi:10.20944/preprints202102.0089.v1
Subject: Life Sciences, Biochemistry Keywords: Crimean-Congo hemorrhagic fever virus; hemorrhagic fever; viral genomics; Rhipicephalus (Boophilus) decoloratus; Rhipicephalus; ticks
Online: 2 February 2021 (13:32:42 CET)
Crimean-Congo haemorrhagic fever virus (CCHFV) is the most geographically widespread tick-borne virus. However, African strains are poorly represented in sequence databases. In addition, almost all sequence data have been obtained from cases of human disease, while information regarding circulation of the virus in tick and animal reservoirs is severely lacking. Here, we characterise the complete coding region of a novel CCHFV strain, detected in African blue ticks (Rhipicephalus (Boophilus) decoloratus) feeding on cattle in an abattoir in Kampala, Uganda. These cattle originated from a farm in Mbarara, a major cattle-trading hub for much of Uganda. Phylogenetic analysis indicates that the newly sequenced strain belongs to the African genotype II clade, which predominantly contains sequences of strains isolated in West Africa in the 1950s and South Africa in the 1980s. Whilst the viral S (nucleoprotein) and L (RNA polymerase) genome segments shared >90% nucleotide similarity with previously reported genotype II strains, the glycoprotein-coding M segment shared only 80% nucleotide similarity with the next most closely related strains, which were from India and China. This segment also displayed a large number of non-synonymous mutations previously unreported in genotype II strains. Characterisation of this novel strain adds to our limited understanding of the natural diversity of CCHFV circulating in ticks and in Africa. Such data can inform the design of vaccines and diagnostics, as well as studies exploring the epidemiology and evolution of the virus, to establish future CCHFV control strategies.
REVIEW | doi:10.20944/preprints202006.0284.v1
Subject: Medicine & Pharmacology, General Medical Research Keywords: SARS-CoV-2; RT-PCR; antibody; zoonotic; animal transmission; genomics; asymptomatic fraction; herd immunity
Online: 23 June 2020 (13:30:11 CEST)
Since December 2019, a rapid increase in the number of SARS-CoV-2 (COVID-19) cases has been reported worldwide, despite strict infection control and lockdown measures. This paper investigates the actual factors behind this rapid increase. Genomic sequence studies reveal that domestic and wild animals were likely ancestors and zoonotic sources of SARS-CoVs, MERS-CoVs and SARS-CoV-2. Strong evidence suggests that these viruses have existed and replicated in animals and humans over the past several decades, exhibiting diverse mutations and evolution and causing self-limiting disease, except during outbreaks. Serious investigations of zoonotic reservoirs are required to understand animal transmission of SARS-CoVs and SARS-CoV-2 and to limit the current pandemic. Transmission via animals might be one reason for the increasing number of cases. Different studies have retrospectively isolated SARS-CoV-2 from August 2019, several months before the Wuhan announcement. Hence, it is possible that these viruses existed for several years, going undetected while infecting subclinically, and that SARS-CoV-2 antigens and neutralizing antibodies have been present in humans for a long time. This might be another reason for the increasing number of cases found by screening, as mass screening and antigen or antibody testing were not carried out in past years. Randomized controlled trials are required to investigate human-to-human transmission by touch, as current evidence is limited and conflicting. As all SARS-CoVs are primarily respiratory viruses, droplet precautions and infection control measures are essential, especially for hospital staff. An increasing number of asymptomatic or subclinical SARS-CoV-2 cases are being detected worldwide. This silent phase of transmission can be beneficial for humans: the lack of symptoms eventually lessens virus transmission, reduces the pathogen's long-term survival and provides humoral herd immunity for up to several years.
Hence, seropositivity with diverse antibodies develops against mutating SARS-CoVs, which will confer strong immunity during epidemics. Strategies such as identification, contact tracing and quarantine are costly and practically difficult. Hence, asymptomatic persons can continue their work with droplet precautions and standard infection control procedures, while symptomatic or sick persons can isolate themselves at home until clinical recovery without the need for strict quarantine, reducing hospital visits and minimizing the chances of hospital-acquired infections. RT-PCR has low sensitivity and specificity, carries a high risk of handling live virus antigens, and requires difficult protocols. As the viral load also declines sharply a few days after the onset of infection, this technique might miss infections. Furthermore, SARS-CoV-2 infection may be present in blood when oropharyngeal swabs are negative by RT-PCR. Additionally, RT-PCR usually gives false-negative and false-positive results, which must be interpreted cautiously. False-positive RT-PCR reporting might again be a reason for the increasing number of cases. Moreover, antibodies against SARS-CoVs develop robustly in serum even with small amounts of antigen. In contrast to RT-PCR, ELISA for antibodies against SARS-CoV-2 demonstrates 100% specificity and 100% sensitivity, even in clinically asymptomatic individuals. These antibodies can be used for serologic surveys, monitoring and screening. However, screening for SARS-CoV-2 by nasopharyngeal swabs should be avoided in unhygienic public places, as it carries a high risk of further transmission, co-infection or superinfection. Such a highly infectious virus must be isolated and tested in highly sterilized laboratories. Furthermore, strict international laws and policies are required to stop the possible spread of experimental viruses, biological warfare and bioterrorism.
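Whether a positive or negative result can be trusted depends on the prevalence of infection in the tested population as well as on the test's sensitivity and specificity. The Python sketch below applies the standard Bayes calculation for positive and negative predictive value (PPV, NPV); it is a generic illustration, and the input values in the usage line are hypothetical, not figures from the paper.

```python
def predictive_values(sensitivity: float, specificity: float, prevalence: float):
    """Return (PPV, NPV) for a binary test via Bayes' theorem; inputs are proportions in [0, 1]."""
    for name, value in (("sensitivity", sensitivity),
                        ("specificity", specificity),
                        ("prevalence", prevalence)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos) if true_pos + false_pos else float("nan")
    npv = true_neg / (true_neg + false_neg) if true_neg + false_neg else float("nan")
    return ppv, npv


if __name__ == "__main__":
    # Hypothetical inputs: a highly specific test applied where only 1% of people are infected.
    ppv, npv = predictive_values(sensitivity=1.0, specificity=0.99, prevalence=0.01)
    print(f"PPV={ppv:.3f}, NPV={npv:.3f}")  # PPV=0.503, NPV=1.000
```

With these hypothetical numbers, roughly half of all positive results would be false positives even though specificity is 99%, which is why identical test characteristics can behave very differently in low- and high-prevalence settings.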
ARTICLE | doi:10.20944/preprints201607.0096.v1
Subject: Life Sciences, Other Keywords: number of paralogs; comparative genomics; combinatorial optimization; Mycoplasmas; Halophiles; Orientia; Mycobacterium leprae; genome size
Online: 29 July 2016 (16:24:29 CEST)
The existence of multiple copies of genes is a well-known phenomenon. A gene family is a set of sufficiently similar genes formed by gene duplication. Earlier work on a limited number of completely sequenced and annotated genomes found that gene family size and genome size are positively correlated, and that several atypical microbes deviate from this general trend. In this study, we re-examined these associations on a larger dataset of 1484 prokaryotic genomes using several ranking approaches, applied so that genomes with fewer paralogs receive lower ranks. Whereas previous studies used only simple ranking methods, we also applied the Kemeny optimal aggregation approach. Regression and correlation analyses were used to quantify and characterize the relationships between paralog indices and genome size, and boxplot analysis was used to detect outliers. We found that, in general, all paralog indices correlate positively with genome size. As expected, different groups of atypical prokaryotic genomes were found for different types of paralog quantities.
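Kemeny optimal aggregation combines several rankings into the single ordering that minimizes the total number of pairwise disagreements (the Kendall tau distance) with the input rankings. The Python sketch below solves it by brute force, which is feasible only for a handful of items because the problem is NP-hard in general; the abstract does not say which solver was used for 1484 genomes, so this only illustrates the objective. It also includes the common 1.5 × IQR (Tukey fence) boxplot rule for flagging outliers. Genome labels and values are hypothetical.

```python
from itertools import combinations, permutations
from statistics import quantiles


def kendall_tau_distance(r1, r2):
    """Number of item pairs ordered differently in two rankings of the same items."""
    pos1 = {item: i for i, item in enumerate(r1)}
    pos2 = {item: i for i, item in enumerate(r2)}
    return sum(
        1
        for a, b in combinations(r1, 2)
        if (pos1[a] - pos1[b]) * (pos2[a] - pos2[b]) < 0
    )


def kemeny_optimal(rankings):
    """Brute-force Kemeny aggregation: the ordering with minimum total Kendall tau distance.

    Exponential in the number of items; usable only for very small examples.
    """
    items = rankings[0]
    best = min(
        permutations(items),
        key=lambda candidate: sum(kendall_tau_distance(candidate, r) for r in rankings),
    )
    return list(best)


def boxplot_outliers(values):
    """Values outside the Tukey fences Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    q1, _, q3 = quantiles(values, n=4)  # default 'exclusive' method; other tools may differ
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low or v > high]


if __name__ == "__main__":
    # Hypothetical rankings of four genomes by three paralog indices (fewest paralogs first).
    rankings = [
        ["g1", "g2", "g3", "g4"],
        ["g1", "g3", "g2", "g4"],
        ["g2", "g1", "g3", "g4"],
    ]
    print(kemeny_optimal(rankings))  # ['g1', 'g2', 'g3', 'g4']
    print(boxplot_outliers([1, 2, 3, 4, 5, 6, 7, 8, 100]))  # [100]
```

The aggregated ranking follows the majority order on every pair here, which is why it attains the lowest possible total distance (2).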
REVIEW | doi:10.20944/preprints202208.0154.v1
Subject: Biology, Agricultural Sciences & Agronomy Keywords: Genome selection; Rice breeding; Genetic analysis; Omics assisted markers; Nutritional quality; Genomics and pangenomics; Biofortification
Online: 8 August 2022 (10:53:16 CEST)
The primary considerations in rice (Oryza sativa L.) production include improving its nutritional quality and yield. To tackle widespread global hunger, high-yielding rice cultivars with better nutritional quality need to be developed. The conventional approaches are to increase rice production and to add balanced nutrients to the daily diet to meet the needs for yield and nutritional quality. This article focuses on nutritional strategies for rice and illustrates the available omics technologies. Recent advances that provide methodologies and approaches for exploring genetic resources and for understanding the molecular mechanisms of trait formation are highlighted. Studying the genetic basis of various traits has been shown to expedite crop breeding. In this perspective, genome-wide association studies (GWAS), genomic selection (GS) and QTL mapping are all genetic analyses that help increase the nutritional content of rice. Implementing several omic techniques is an effective approach to enhance and regulate the nutritional quality of rice cultivars. Advances in different types of omics, including genomics and pangenomics, transcriptomics, metabolomics, nutrigenomics and proteomics, are also relevant to rice development initiatives. This review compiles genes, loci, mutants and omic approaches for rice improvement. This knowledge should be useful for current and future rice research.
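Genomic selection, one of the analyses named above, typically estimates marker effects from a genotyped and phenotyped training population and then predicts genomic estimated breeding values (GEBVs) for untested lines. The review does not prescribe a model; the Python sketch below uses ridge regression on marker dosages (the idea behind RR-BLUP), assumes phenotypes are already centred so no intercept is fitted, and solves the normal equations with plain Gaussian elimination to stay dependency-free. The function names and the shrinkage parameter `lam` are illustrative.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def ridge_marker_effects(X, y, lam):
    """Marker effects beta solving (X^T X + lam * I) beta = X^T y.

    X: list of rows of marker dosages (0/1/2) per line; y: centred phenotypes.
    """
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


def gebv(markers, effects):
    """Genomic estimated breeding value: sum of dosage times estimated effect."""
    return sum(m * e for m, e in zip(markers, effects))
```

Larger values of `lam` shrink the marker effects toward zero, which is what keeps the model stable when there are far more markers than phenotyped lines.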
REVIEW | doi:10.20944/preprints202108.0411.v1
Subject: Life Sciences, Biotechnology Keywords: abiotic stresses; gene-expression; genomics; ion homeostasis; plant growth and development; plasma membrane; sugar translocation
Online: 20 August 2021 (11:43:31 CEST)
Membrane transporters (MTs) are mainly localized at the plasma membrane (PM) and the tonoplast, or vacuolar membrane (VM), of cells in all plant organs. They maintain cellular homeostasis by controlling ion movement through PM channels from the roots to upper plant parts, by xylem loading, and by remobilizing sugar molecules from photosynthetic tissues in the leaf (source) to roots, stems and seeds (sink) via phloem loading. The plant’s whole source-to-sink relationship is regulated by multiple transport proteins in a highly sophisticated manner and varies with the stage of plant growth and development (PG&D) and with environmental changes. MTs play a pivotal role in PG&D, contributing to increased plant height, branch/tiller number, panicle number and length, number of filled panicles per plant, seed yield and grain quality. Dynamic climatic changes disturb the ionic balance (salt, drought and heavy metals) and sugar supply (cold and heat stress). Owing to poor selectivity, some MTs also take up toxic elements in the roots, which impair PG&D and, once exported to upper plant parts, deteriorate grain quality. As an adaptive strategy in response to salt and heavy metals (HMs), plants activate PM- and VM-localized MTs that export toxic elements into the vacuole and also translocate them to root tips and shoots. In the case of drought, cold and heat stress, however, MTs increase the water and sugar supply to all organs. In this review, we survey recent literature from Arabidopsis, halophytes and major field crops such as rice, wheat, maize and oilseed rape to discuss the global role of MTs in PG&D and abiotic stress tolerance. We also discuss gene expression changes and genomic variation within species and within families in response to developmental and environmental cues.
REVIEW | doi:10.20944/preprints202011.0599.v1
Subject: Life Sciences, Biochemistry Keywords: Precision medicine; Inborn errors of metabolism; Pharmacogenomics; Ethics; Genomics; Learning healthcare; Machine learning; Computational biology
Online: 23 November 2020 (20:44:20 CET)
Genome sequencing is enabling precision medicine: tailoring treatment to the unique constellation of variants in an individual’s genome. While the impact of recurrent pathogenic variants is often understood, a long tail of rare genetic variants remains uncharacterized. The problem of uncharacterized rare variation is especially acute when it occurs in genes of known clinical importance with functionally consequent frequent variants and associated mechanisms. Variants of unknown significance (VUS) in these genes are discovered at a rate that outpaces current ability to classify them using databases of previous cases, experimental evaluation, and computational predictors. Clinicians are thus left without guidance about the significance of variants that may have actionable consequences. Computational prediction of the impact of rare genetic variation is therefore becoming an increasingly important capability. In this paper, we review the technical and ethical challenges of interpreting the function of rare variants in two settings: inborn errors of metabolism in newborns, and pharmacogenomics. We propose a framework for a genomic learning healthcare system with an initial focus on early-onset treatable disease in newborns and actionable pharmacogenomics. We argue that (1) a genomic learning healthcare system must allow for continuous collection and assessment of rare variants, (2) emerging machine learning methods will enable algorithms to predict the clinical impact of rare variants on protein function, and (3) ethical considerations must inform the construction and deployment of all rare-variation triage strategies, particularly with respect to health disparities arising from unbalanced ancestry representation.
REVIEW | doi:10.20944/preprints202008.0320.v1
Subject: Mathematics & Computer Science, Artificial Intelligence & Robotics Keywords: Data Science; Machine Learning; Deep Learning; Genomics; COVID-19; Drug Discovery; Image Analysis; Interactomics; Epidemiology
Online: 14 August 2020 (11:01:56 CEST)
The outbreak of novel coronavirus (SARS-CoV-2) disease (COVID-19) in Wuhan has attracted worldwide attention. SARS-CoV-2 infection is known to present with a range of symptoms such as pneumonia, fever and breathing difficulty; in particular, SARS-CoV-2 can also cause a severe inflammatory state that leads to death. Consequently, massive and rapid research growth has been observed across the globe to elucidate the mechanisms of infection and disease progression at the genotype and phenotype scales. Data science is playing a pivotal role in in-silico analysis to draw hidden and novel insights about SARS-CoV-2 origin, pathogenesis, COVID-19 outbreak forecasting, medical diagnosis and drug discovery. The availability of multi-omics, radiological, biomolecular and medical data creates an urgent need to develop novel exploratory and predictive models, or to customise existing learning models to fit the current problem domain. The presence of many approaches creates a need for systematic surveys to guide both data scientists and medical practitioners. We perform an elaborate study of the state-of-the-art data science methodologies in action to tackle the current pandemic. We consider various active COVID-19 data analytics domains such as phylogeny analysis, SARS-CoV-2 genome identification, protein structure prediction, host-viral protein interactomics, clinical imaging, epidemiological analysis and, most importantly, (existing) drug discovery. We highlight types of data, their generation pipelines and the data science models in use. We believe that the current study will give a detailed sketch of the road map towards handling COVID-19-like situations by leveraging data science in the future. We conclude our review by focusing on prime challenges and possible future research directions.
ARTICLE | doi:10.20944/preprints202001.0274.v1
Subject: Mathematics & Computer Science, Other Keywords: bioinformatics; computational genomics; computational medicine; data science; data visualization; parallel processing; grid computing; fog computing
Online: 24 January 2020 (10:26:26 CET)
Conventional data visualization software has greatly improved the efficiency of mining and visualizing biomedical data. However, applying a grid computing approach could, hypothetically, increase the efficiency and sophistication of such visualization and thus expand research opportunities. This paper presents data visualization examples in conventional networks and then describes in more detail more complex techniques that leverage parallel processing architectures. These complex techniques include an attempt to build a basic generative adversarial network (GAN) to enlarge the statistical pool of biomedical data available for analysis, as well as an introduction to a project using the decentralized-internet SDK. The paper first shows the conventional examples and then details the deeper experimentation and its self-contained results.
ARTICLE | doi:10.20944/preprints201908.0288.v1
Subject: Life Sciences, Genetics Keywords: array-comparative genomic; gliomas; Cell culture; Cancer genomics; Cancer Transcriptomics; brain tumors; cell line; glioblastoma
Online: 27 August 2019 (16:34:22 CEST)
Cancer cell lines are widely used as in vitro models of tumorigenesis, facilitating fundamental discoveries in cancer biology and translational medicine. Currently, there are few options for glioblastoma (GBM) treatment and few in vitro models with accurate genomic and transcriptomic characterization. Here, we characterized in detail the molecular composition of a new GBM cell line, AHOL1, based on copy number alteration (CNA) and transcriptome profiling, followed by validation of key elements associated with GBM tumorigenesis. Large numbers of CNAs and differentially expressed genes (DEGs) were identified. CNAs were distributed throughout the genome, including gains at Xq11.1-q28, Xp22.33-p11.1, Xq21.1-q21.33, 4p15.1-p14 and 8q23.2-q23.3 and losses at Yq11.21-q12, Yp11.31-p11.2 and 12p13.31. Nine druggable genes were identified: HCRTR2, ETV1, PTPRD, PRKX, STS, RPS6KA6, ZFY, USP9Y and KDM5D. By integrating DEGs and CNAs, we identified 57 overlapping genes enriched in fourteen pathways. Altered expression of several cancer-related candidates in the DEG-CNA dataset was confirmed by RT-qPCR. Taken together, this first comprehensive genomic and transcriptomic landscape of AHOL1 provides a unique resource for further studies and identifies several druggable targets that may be useful for therapeutics and for biological and molecular investigation of GBM.
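The DEG-CNA integration step amounts to intersecting genes that lie in altered regions with genes that are differentially expressed. The Python sketch below shows that intersection with an optional filter for dosage-concordant changes (gain with up-regulation, loss with down-regulation); whether the study applied such a filter is not stated, so it is off by default. Coordinates are treated as inclusive, and all names and data in the test are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CnaSegment:
    chrom: str
    start: int  # inclusive genomic coordinate
    end: int    # inclusive genomic coordinate
    kind: str   # "gain" or "loss"


def genes_in_cnas(gene_coords, segments):
    """Map each gene that overlaps any CNA segment to that segment's kind."""
    hits = {}
    for gene, (chrom, start, end) in gene_coords.items():
        for seg in segments:
            if seg.chrom == chrom and start <= seg.end and end >= seg.start:
                hits[gene] = seg.kind
                break  # first overlapping segment wins
    return hits


def integrate_cna_deg(gene_coords, segments, deg_log2fc, concordant_only=False):
    """Genes that are both inside a CNA and differentially expressed.

    With concordant_only=True, keep only gains with up-regulation and losses with
    down-regulation (an optional filter, not necessarily the one used in the study).
    """
    overlap = {}
    for gene, kind in genes_in_cnas(gene_coords, segments).items():
        if gene not in deg_log2fc:
            continue
        lfc = deg_log2fc[gene]
        concordant = (kind == "gain" and lfc > 0) or (kind == "loss" and lfc < 0)
        if concordant or not concordant_only:
            overlap[gene] = (kind, lfc)
    return overlap
```

The resulting gene list is what would then be passed to a pathway enrichment tool.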
ARTICLE | doi:10.20944/preprints201904.0014.v1
Subject: Life Sciences, Other Keywords: Alignment; assembly; taxonomic classification; time series; data transformation; DWT; DFT; PAA; data compression; compressive genomics
Online: 1 April 2019 (13:29:58 CEST)
Advances in DNA sequencing technology are enabling genomic analyses of unprecedented scope and scale, widening the gap between our ability to generate biological sequence data and our ability to fully exploit it. Comparable analytical challenges arise in other data-intensive fields involving sequential data, such as signal processing, where dimensionality reduction (i.e., compression) methods are routinely used to lessen the computational burden of analyses. In this work we explore the application of dimensionality reduction methods to numerically represent high-throughput sequence data for three important applications of virus sequence data: reference-based mapping, short-sequence classification and de novo assembly. Despite using highly compressed sequence transformations to accelerate these processes, our sequence processing approach yielded accuracy comparable to existing approaches and is ideally suited for sequences originating from highly diverse virus populations. We demonstrate the application of our methodology to both synthetic and real viral pathogen sequence data. Our results show that highly compressed sequence approximations can provide accurate results and that useful analytical performance can be retained, and even enhanced, through appropriate dimensionality reduction of sequence data.
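Piecewise aggregate approximation (PAA), one of the transformations listed in the keywords, compresses a numeric series by averaging it over w segments. The Python sketch below assumes a simple integer encoding of bases (A=1, C=2, G=3, T=4, ambiguous bases at the midpoint 2.5), which is a hypothetical choice and may differ from the paper's representation. When w divides n, the scaled distance between PAA vectors never exceeds the Euclidean distance between the original encodings, which is why compressed comparisons can be used as a fast filter.

```python
import math

# Hypothetical numeric encoding; the original work may use a different mapping.
ENCODING = {"A": 1.0, "C": 2.0, "G": 3.0, "T": 4.0}
AMBIGUOUS = 2.5  # assumption: N and other ambiguity codes map to the midpoint


def encode(seq: str) -> list:
    """Turn a nucleotide string into a numeric series."""
    return [ENCODING.get(base, AMBIGUOUS) for base in seq.upper()]


def paa(values: list, w: int) -> list:
    """Average the series over w contiguous segments (sizes differ by at most one)."""
    n = len(values)
    if not 0 < w <= n:
        raise ValueError("need 0 < w <= len(values)")
    out = []
    for i in range(w):
        start, end = i * n // w, (i + 1) * n // w
        segment = values[start:end]
        out.append(sum(segment) / len(segment))
    return out


def euclidean(x: list, y: list) -> float:
    """Euclidean distance between two equal-length numeric series."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def paa_distance(px: list, py: list, n: int) -> float:
    """Scaled PAA distance; lower-bounds euclidean() on the originals when w divides n."""
    w = len(px)
    return math.sqrt(n / w) * euclidean(px, py)


if __name__ == "__main__":
    x, y = encode("AAGG"), encode("ACGT")
    print(paa(x, 2), paa(y, 2))  # [1.0, 3.0] [1.5, 3.5]
    print(paa_distance(paa(x, 2), paa(y, 2), len(x)), euclidean(x, y))
```

Halving the length, as in the example, halves the work per comparison; the trade-off is that the compressed distance is only a lower bound on the full one.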
ARTICLE | doi:10.20944/preprints201808.0539.v1
Subject: Life Sciences, Genetics Keywords: computational genomics; genome comparison; algorithms; genetic testing; privacy; direct-to-consumer; study design; population genetics
Online: 31 August 2018 (04:59:28 CEST)
Genetic testing has expanded out of the research laboratory into medical practice and the direct-to-consumer market, and rapid analysis of the resulting genotype data can now have significant impact. We present a method for summarizing personal genotypes as ‘genotype fingerprints’ that meet these needs. Genotype fingerprints can be derived from any single nucleotide polymorphism (SNP)-based assay and remain comparable as chip designs evolve to higher marker densities. We demonstrate that they support distinguishing among types of relationships between closely related individuals, and distinguishing closely related individuals from other individuals of the same background population, as well as high-throughput identification of identical genotypes, assignment of individuals to known background populations, and de novo separation of subpopulations within a large cohort through extremely rapid comparisons. While fingerprints do not preserve anonymity, they provide a useful degree of privacy by summarizing a genotype in a way that prevents reconstruction of individual marker states. Genotype fingerprints are therefore well suited as a format for public aggregation of genetic information to support ancestry and relatedness determination without revealing personal health risk status.
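The abstract does not describe how the fingerprints are computed. Purely to illustrate the general idea of a compact, fixed-length summary that is fast to compare and does not store individual marker states, the Python sketch below uses feature hashing: each SNP identifier is hashed (SHA-256) to one of k bins with a pseudo-random sign, and centred allele dosages are summed per bin. This is a hypothetical stand-in, not the authors' method; the bin count k=64 and the dosage coding are assumptions, and in this sketch two fingerprints are only comparable when built from the same marker set.

```python
import hashlib
import math


def _bin_and_sign(snp_id: str, k: int):
    """Deterministically map a SNP identifier to a bin index and a +1/-1 sign."""
    digest = hashlib.sha256(snp_id.encode("utf-8")).digest()
    bin_index = int.from_bytes(digest[:4], "big") % k
    sign = 1 if digest[4] & 1 else -1
    return bin_index, sign


def fingerprint(genotypes: dict, k: int = 64) -> list:
    """Summarize {snp_id: alt-allele dosage 0/1/2} as a length-k vector of per-bin sums."""
    fp = [0.0] * k
    for snp_id, dosage in genotypes.items():
        b, sign = _bin_and_sign(snp_id, k)
        fp[b] += sign * (dosage - 1)  # centred dosage: homozygotes give -1/+1, heterozygotes 0
    return fp


def cosine_similarity(u: list, v: list) -> float:
    """Cosine similarity of two fingerprints; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0
```

Because many markers collapse into each bin, a single fingerprint value cannot be decoded back into one SNP's genotype, while identical or related genotypes still produce highly similar vectors.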
ARTICLE | doi:10.20944/preprints202205.0362.v1
Subject: Life Sciences, Microbiology Keywords: Synechococcus sp. PCC 11901; CodA selection; SacB selection; Vitamin B12; Photosynthesis; Photoinhibition; Comparative genomics; Cellular metabolism
Online: 26 May 2022 (10:30:28 CEST)
Synechococcus sp. PCC 11901 reportedly shows the highest sustained growth of any known cyanobacterium under optimized conditions. Owing to its recent discovery, our knowledge of its biology, including the factors underlying its sustained fast growth, is limited. Furthermore, tools for genetic manipulation specific to PCC 11901 have not been established. Here, we demonstrate that PCC 11901 grows faster than other model cyanobacteria, including the fast-growing species Synechococcus elongatus UTEX 2973, under the optimal growth conditions for UTEX 2973. Comparative genomics between PCC 11901 and Synechocystis sp. PCC 6803 reveals conservation of most metabolic pathways, but PCC 11901 has a simplified electron transport chain and a reduced light-harvesting complex. This may underlie its superior light utilization, reduced photoinhibition and higher photosynthetic and respiratory rates. To aid biotechnology applications, we developed a vitamin B12 auxotrophic mutant but were unable to generate unmarked knockouts using two negative selectable markers, suggesting that recombinase- or CRISPR-based approaches may be required for repeated genetic manipulation. Overall, this study establishes PCC 11901 as one of the most promising species currently available for cyanobacterial biotechnology and provides a useful set of bioinformatics tools and strains for advancing this field, as well as insights into the factors underlying its fast-growth phenotype.
CASE REPORT | doi:10.20944/preprints202012.0548.v1
Subject: Medicine & Pharmacology, Allergology Keywords: next-generation sequencing; hereditary breast cancer; Homologous Recombination Repair; hereditary cancer syndrome; Clinical genomics; Molecular diagnostics
Online: 22 December 2020 (10:17:48 CET)
Next-generation sequencing (NGS)-based cancer risk screening with multigene panels has become the most successful method for planning cancer prevention strategies. ATM germline heterozygosity has been reported to increase tumor susceptibility. In particular, families carrying heterozygous germline variants of the ATM gene show a 5- to 9-fold increased risk of developing breast cancer. Recent studies identified ATM as the second most frequently mutated gene, after CHEK2, in BRCA-negative patients. To date, more than 170 potential missense variants and several truncating mutations have been identified in the ATM gene. Here we present the molecular characterization of a new ATM deletion, identified by the copy number variation (CNV) algorithm implemented in the NGS analysis pipeline. An automated workflow implementing the SOPHiA Genetics’ Hereditary Cancer Solution (HCS) protocol was used to generate NGS libraries, which were sequenced on the Illumina MiSeq platform. NGS data analysis identified a new inactivating deletion of exons 19-27 of the ATM gene. The deletion breakpoint was characterized at both the DNA and RNA levels.
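The CNV algorithm in the SOPHiA pipeline is proprietary and is not described in the abstract. As a generic illustration of how read depth can reveal a multi-exon deletion such as exons 19-27, the Python sketch below normalizes per-exon depth by total depth, divides by the median of copy-neutral reference samples, and reports runs of consecutive exons whose ratio falls below a threshold (a heterozygous deletion is expected near 0.5). The 0.7 threshold, the two-exon minimum and the function names are assumptions.

```python
from statistics import median


def exon_depth_ratios(sample_depths, reference_depths):
    """Per-exon ratio of the sample's normalized depth to the median normalized reference depth.

    sample_depths: per-exon mean depths for the test sample (same exon order everywhere).
    reference_depths: list of such lists, one per reference sample assumed copy-neutral.
    """
    sample_total = sum(sample_depths)
    sample_norm = [d / sample_total for d in sample_depths]
    ref_norm = []
    for ref in reference_depths:
        total = sum(ref)
        ref_norm.append([d / total for d in ref])
    ratios = []
    for i, observed in enumerate(sample_norm):
        expected = median(ref[i] for ref in ref_norm)
        ratios.append(observed / expected if expected > 0 else float("nan"))
    return ratios


def call_deletions(ratios, exon_numbers, threshold=0.7, min_exons=2):
    """Return (first_exon, last_exon) for runs of consecutive exons with ratio below threshold."""
    calls, run = [], []
    for exon, ratio in zip(exon_numbers, ratios):
        if ratio < threshold:  # NaN compares False, so it ends a run
            run.append(exon)
            continue
        if len(run) >= min_exons:
            calls.append((run[0], run[-1]))
        run = []
    if len(run) >= min_exons:
        calls.append((run[0], run[-1]))
    return calls
```

Normalizing by total depth is the simplest choice; a deletion slightly inflates the ratios of the remaining exons, so production callers usually use more robust normalization and GC/probe-efficiency corrections.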
ARTICLE | doi:10.20944/preprints202006.0228.v1
Subject: Biology, Horticulture Keywords: crop genetics; Solanum tuberosum; abiotic stress; phenylpropanoids; essential amino acid; transcriptome; small RNA; comparative genomics; nutrition
Online: 18 June 2020 (09:15:21 CEST)
Potato is one of the most important food crops, yet maintaining productivity in this drought-sensitive crop has become a challenge. Competition for scarce water resources and the continued effects of global warming exacerbate current constraints on crop production. While plants’ responses to drought in above-ground tissues have been well documented, the regulatory cascades in developing tubers remain largely unexplored. Plants of the commercial Canadian cultivar ‘Vigor’ were subjected to a drought treatment under high tunnels, which caused a 4 ℃ increase in canopy temperature compared with the well-watered control. Tubers were sampled for RNA-seq and metabolite analysis. Approximately 2600 genes and 3898 transcripts were differentially expressed by at least four-fold in drought-stressed potato tubers, of which 75% and 69%, respectively, were down-regulated. A further 229 small RNAs were implicated in gene regulation during drought. Comparison of protein homologues between Solanum tuberosum L. and Arabidopsis thaliana L. indicates that down-regulated genes are associated with phenylpropanoid, carotenoid and patatin biosynthesis. This suggests that drought stress during the tuber bulking phase may have nutritive implications in sensitive cultivars.
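A four-fold cut-off corresponds to |log2 fold change| ≥ 2. The Python sketch below applies only that fold-change filter, with a pseudocount to avoid division by zero, and splits genes into up- and down-regulated sets. It does not reproduce the study's statistics: a real RNA-seq analysis would also model replicate variance and control the false discovery rate (for example with DESeq2 or edgeR). Gene names and counts in the test are hypothetical.

```python
import math


def fold_change_filter(counts_ctrl: dict, counts_trt: dict,
                       min_fold: float = 4.0, pseudocount: float = 1.0):
    """Split genes into (up, down) lists by |log2 fold change| >= log2(min_fold).

    counts_ctrl / counts_trt: {gene: normalized expression} for control and treatment.
    """
    threshold = math.log2(min_fold)
    up, down = [], []
    for gene, ctrl in counts_ctrl.items():
        lfc = math.log2((counts_trt[gene] + pseudocount) / (ctrl + pseudocount))
        if lfc >= threshold:
            up.append(gene)
        elif lfc <= -threshold:
            down.append(gene)
    return up, down


def fraction_down(up: list, down: list) -> float:
    """Share of threshold-passing genes that are down-regulated."""
    total = len(up) + len(down)
    return len(down) / total if total else 0.0
```

The down-regulated share reported in the abstract (75% of genes, 69% of transcripts) is the kind of quantity `fraction_down` returns once such a filter has been applied.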
REVIEW | doi:10.20944/preprints201911.0252.v1
Subject: Biology, Ecology Keywords: data; sequence; information; entropy; genome; gene; proteins; time-series; modeling; meta-genomics; transcriptomics; proteomics; bioinformatics; DNA
Online: 22 November 2019 (02:28:15 CET)
Today, massive amounts of sequenced metagenomic and metatranscriptomic data from different ecological niches and environmental locations are available. Scientific progress depends critically on methods that allow useful information to be extracted from the various types of sequence data. Here, we first discuss the types of information contained in the various flavours of biological sequence data, and how this information can be interpreted to increase our scientific knowledge and understanding. We argue that a mechanistic understanding is required to consistently interpret experimental observations, and that this understanding is greatly facilitated by the generation and analysis of dynamic mathematical models. We conclude that, in order to construct mathematical models and to test mechanistic hypotheses, time-series data are of critical importance. We review diverse techniques to analyse time-series data and discuss various approaches by which time series of biological sequence data have been successfully used to derive and test mechanistic hypotheses. Analysing the bottlenecks of current strategies for extracting knowledge and understanding from data, we conclude that combined experimental and theoretical efforts should be implemented as early as possible, during the planning phase of individual experiments and scientific research projects.
ARTICLE | doi:10.20944/preprints201809.0383.v1
Subject: Life Sciences, Biochemistry Keywords: fragment screening; XChem; protein crystallisation; X-ray crystallography; diamond light source; I04-1; structural genomics consortium
Online: 19 September 2018 (11:37:41 CEST)
The XChem facility at Diamond Light Source offers fragment screening by X-ray crystallography as a general access user programme. The main advantage of X-ray crystallography as a primary fragment screen is that it directly yields the location and pose of fragment hits, whether within pockets of interest or merely at surface sites; this is the key information for structure-based design and for enabling synthesis of follow-up molecules. Extensive streamlining of the screening experiment at XChem has engendered a very active user programme that is generating large amounts of data: in 2017, 36 academic and industry groups generated 35,000 datasets of uniquely soaked crystals. It has also yielded many lessons concerning the main remaining bottleneck, namely obtaining a suitable crystal system that will support a successful fragment screen. Here we discuss the practicalities of generating screen-ready crystals that give useful electron density maps, and how to ensure that they can be reproduced and used successfully at a facility outside the home lab.