ARTICLE | doi:10.20944/preprints202301.0123.v1
Subject: Biology And Life Sciences, Biochemistry And Molecular Biology Keywords: Prostate cancer; Systems genomics; Functional genomics; Pathways; biomarkers
Online: 6 January 2023 (09:58:26 CET)
Prostate cancer (PCa) is one of the most prevalent cancers among men in India. Although studies on PCa have dealt with the genetics, genomics, and environmental influences in the causality of PCa, few studies have applied next generation sequencing (NGS) approaches to PCa. In our previous study, we identified some causal genes and mutations specific to Indian PCa using Whole-Exome Sequencing (WES). In the recent past, with the help of cancer consortiums such as The Cancer Genome Atlas (TCGA) and the International Cancer Genome Consortium (ICGC), many cancer-associated novel non-coding RNAs have been identified as biomarkers alongside differentially expressed genes (DEGs). In this work, we attempt to identify DEGs as well as long non-coding RNAs (lncRNAs) associated with signature pathways from an Indian PCa cohort using an RNA-sequencing (RNA-seq) approach. From a cohort of 60, we screened 6 patients who underwent prostatectomy and performed whole transcriptome shotgun sequencing (WTSS)/RNA-sequencing to decipher the DEGs. We further normalized the read counts using fragments per kilobase of transcript per million mapped reads (FPKM) and analyzed the DEGs using a set of downstream regulatory tools, viz. GeneMANIA, Stringdb, Cytoscape-Cytohubba, and cbioportal, to map the inherent signatures associated with PCa. By comparing the RNA-seq data obtained from the pairs of normal and PCa tissue samples using our benchmarked in-house cuffdiff pipeline, we observed some important genes specific to PCa, such as STEAP2, APP, PMEPA1, PABPC1, NFE2L2, and HN1L, and some other important genes known to be involved in different cancer pathways, such as COL6A1, DOK5, STX6, BCAS1, BACE1, BACE2, LMOD1, SNX9, and CTNND1. We also identified a few novel lncRNAs such as LINC01440, SOX2OT, ENSG00000232855, and ENST00000647843.1 that need to be characterized further. 
Deregulation of SOX2OT is observed in various tumors, including lung cancer, gastric cancer, esophageal cancer, breast cancer, hepatocellular carcinoma, ovarian cancer, pancreatic, laryngeal squamous cell carcinoma, osteosarcoma, nasopharyngeal carcinoma, and glioblastoma. It would be interesting to characterize its function in PCa as well. In comparison with publicly available datasets, we have identified characteristic DEGs and novel lncRNAs implicated in signature PCa pathways in an Indian PCa cohort which have perhaps not been reported. As a pilot study, this has set a precedent for us to validate further experimentally, and we firmly believe this will pave a way towards discovery of biomarkers and development of novel therapies.
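The FPKM normalization mentioned in the abstract above can be illustrated with a minimal sketch. The function name and the example numbers are hypothetical and are not taken from the study; the formula is the standard FPKM definition, not the authors' in-house pipeline.

```python
def fpkm(fragment_count, transcript_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments.

    FPKM = fragments * 1e9 / (transcript length in bp * total mapped fragments)
    """
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and library size must be positive")
    return fragment_count * 1e9 / (transcript_length_bp * total_mapped_fragments)


# Hypothetical gene: 500 fragments on a 2,000 bp transcript
# in a library of 10 million mapped fragments.
example_fpkm = fpkm(500, 2_000, 10_000_000)
print(example_fpkm)
```

Dividing by both transcript length and library size is what makes values comparable across genes of different lengths and samples of different sequencing depth.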
ARTICLE | doi:10.20944/preprints201905.0182.v1
Subject: Biology And Life Sciences, Endocrinology And Metabolism Keywords: metabolomics; genomics; ethics
Online: 15 May 2019 (10:09:48 CEST)
A remarkable feature of US federal investments in human genetics has been the availability of parallel funding for studies examining ethical, legal and social implications (ELSI). This funding has allowed ELSI researchers to develop new strategies to understand genetics, evaluate the benefits of genetic testing, and propose health policies that maximize the promise while minimizing harms. Despite successes, a consequence of this investment is the preoccupation with what is arguably the least actionable system of biomolecules, human DNA. In contrast, the most actionable system of biomolecules, the metabolome, is grossly understudied, despite its often more alarming ELSI.
BRIEF REPORT | doi:10.20944/preprints202309.1740.v1
Subject: Biology And Life Sciences, Biology And Biotechnology Keywords: Histone Modifiers; cancer; Genomics;
Online: 26 September 2023 (07:15:04 CEST)
Histone modifications, especially H3K27 and H3K36 methylations, are crucial for epigenetic regulation and are often dysregulated in cancer. In a comprehensive analysis of 11,194 patient samples across 32 cancer types, we identified significant genomic alterations in H3K27 and H3K36 modifiers, with the most common being in KDM6B, BRPF1, KDM6A, SETD2, and NSD1. Patients with these alterations also frequently exhibited mutations in genes like TP53 and PIK3CA. Moreover, the presence of these histone modifier alterations correlated with poorer overall survival, emphasizing their potential as both oncogenic drivers and prognostic markers in various cancers.
REVIEW | doi:10.20944/preprints202307.0870.v1
Subject: Biology And Life Sciences, Animal Science, Veterinary Science And Zoology Keywords: Cryptosporidium spp.; genomics; proteomics
Online: 13 July 2023 (10:03:56 CEST)
Cryptosporidiosis is a widespread disease caused by the parasitic protozoan Cryptosporidium spp., which infects various vertebrate species, including humans. Once unknown as a gastroenteritis-causing agent, Cryptosporidium spp. is now recognized as a pathogen causing life-threatening disease, especially in immunocompromised individuals such as AIDS patients. Advances in diagnostic methods and increased awareness have led to a significant shift in the perception of Cryptosporidium spp. as a pathogen. Nowadays, genomic and proteomic studies have played a main role in understanding the molecular biology of this complex-life-cycle parasite. Genomics has enabled the identification of numerous genes involved in the parasite's development and interaction with hosts. Proteomics has allowed for the identification of protein interactions, their function, structure, and cellular activity. The combination of these two approaches has significantly contributed to the development of new diagnostic tools, vaccines, and drugs for cryptosporidiosis. In this review, we summarize the major accomplishments in the field of Cryptosporidium research using genomics and proteomics.
REVIEW | doi:10.20944/preprints202011.0501.v1
Subject: Biology And Life Sciences, Biochemistry And Molecular Biology Keywords: genomics; sequencing; ethics; bioinformatics
Online: 19 November 2020 (10:41:35 CET)
To fully appreciate genetics, one must understand the link between genotype (DNA sequence) and phenotype (observable characteristics). Advances in high-throughput genomic sequencing technologies and applications, so-called “-omics”, have made genetic sequencing readily available across fields in biology from applications in non-traditional study organisms to precision medicine. Thus, understanding these tools is critical for any biologist, especially those early in their career. This comprehensive review discusses the chronological development of different sequencing methods, the bioinformatics steps to analyzing this data, and social and ethical issues raised by these techniques that must be discussed and evaluated.
ARTICLE | doi:10.20944/preprints202105.0750.v1
Subject: Biology And Life Sciences, Biochemistry And Molecular Biology Keywords: COVID-19; SARS-CoV-2 genomics; spike protein; epitope prediction; coronavirus comparative genomics
Online: 31 May 2021 (11:36:29 CEST)
The challenges of the coronavirus disease 2019 (COVID-19) pandemic, caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), include understanding what triggered SARS-CoV-2 emergence, how this RNA virus is evolving, and how genomic variability may impact the primary structure of proteins that are vaccine targets. We analyzed 19,471 SARS-CoV-2 genomes and 199,984 spike glycoprotein sequences available at the GISAID database from all over the world and 3,335 genomes of other Coronaviridae family members available at GenBank, collecting SARS-CoV-2 high-quality genomes and distinct Coronaviridae family genomes. Here, we identify a SARS-CoV-2 emerging cluster containing 13 closely related genomes isolated from bat and pangolin that showed evidence of recombination, which may have contributed to the emergence of SARS-CoV-2. The analyzed SARS-CoV-2 genomes presented 9,632 single nucleotide polymorphisms (SNPs), corresponding to a variant density of 0.3 over the genome, and a clear geographic distribution. SNPs are unevenly distributed throughout the genome, and mutation hotspots were found in the spike gene and ORF 1ab. We describe a set of predicted spike protein epitopes whose variability is negligible. All predicted epitopes for the structural E, M and N proteins are highly conserved. This result favors the continued efficacy of the available vaccines.
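The variant density quoted above can be reproduced approximately with a short calculation. This is a sketch under one stated assumption: the abstract does not give the genome length used, so the 29,903 nt length of the Wuhan-Hu-1 reference (NC_045512.2) is assumed here.

```python
# Assumption: length of the SARS-CoV-2 reference genome (Wuhan-Hu-1,
# NC_045512.2); the abstract does not state which length was used.
REFERENCE_GENOME_LENGTH_NT = 29_903


def variant_density(n_variant_sites, genome_length_nt):
    """Number of variant sites per nucleotide of genome."""
    if genome_length_nt <= 0:
        raise ValueError("genome length must be positive")
    return n_variant_sites / genome_length_nt


# 9,632 SNPs is the count reported in the abstract.
density = variant_density(9_632, REFERENCE_GENOME_LENGTH_NT)
print(round(density, 1))
```

Read this way, the reported density means that roughly one in three genome positions carried a SNP in at least one of the analyzed genomes.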
ARTICLE | doi:10.20944/preprints202306.0046.v1
Subject: Biology And Life Sciences, Biology And Biotechnology Keywords: ethics; genomics; neonates; outcomes; economics
Online: 1 June 2023 (07:23:07 CEST)
GS provides exciting opportunities to rapidly identify a diagnosis in critically ill newborns and children with rare genetic conditions. But there are good reasons to remain cautious about the broad implementation of these strategies in babies and children. At best, rapid GS leads to diagnoses in many infants in highly selected populations. It sometimes leads to beneficial changes in management. Parents and physicians both often find these results useful. We don’t know how useful such testing will be in the general population. It is almost inevitable that genetic counseling will be more challenging in a more general population. We don’t know how often GS helps improve outcome and survival or reduce symptoms in babies who receive a molecular diagnosis. We don’t know the relative cost-effectiveness of whole genome, whole exome, or targeted panels in different populations. We don’t know the relative contribution of a molecular diagnosis to the decision to withdraw life support. Each of these concerns will require careful study of both the technology and the ethical issues to allow us to harness the potential of these new technologies while avoiding foreseeable problems. Studies are underway to see how the tests are used in general populations. These studies should generate important information to guide clinicians and policymakers. In the meantime, parents should know that genetic results sometimes confirm an already suspected dismal prognosis and sometimes yield only ambiguous findings. Anticipatory discussions should try to give parents a realistic understanding of the likely impact of a genetic diagnosis. Both doctors and parents should recognize that the clinical usefulness of diagnostic genomic testing in newborns is a work in progress. We are only a decade into the age of true genomic medicine. We still have much to learn.
CONCEPT PAPER | doi:10.20944/preprints202203.0069.v1
Subject: Biology And Life Sciences, Ecology, Evolution, Behavior And Systematics Keywords: prokaryotic taxonomy; classification; identification; genomics
Online: 3 March 2022 (17:18:57 CET)
Genomics has put prokaryotic rank-based taxonomy on a solid phylogenetic foundation. However, most taxonomic ranks were set long before the advent of DNA sequencing and genomics. In this concept paper, we thus ask the simple yet profound question: Should prokaryotic classification schemes besides the current phylum-to-species ranks be explored, developed, and incorporated into scientific discourse? Could such alternative schemes provide better solutions to the basic need of science and society for which taxonomy was developed, namely, precise and meaningful identification? A neutral genome-similarity based framework is then described that could allow alternative classification schemes to be explored, compared, and translated into each other without having to choose only one as the gold standard. Classification schemes could thus continue to evolve and be selected according to their benefits and based on how well they fulfill the need for prokaryotic identification.
CONCEPT PAPER | doi:10.20944/preprints202107.0546.v1
Subject: Biology And Life Sciences, Anatomy And Physiology Keywords: Bacterial nomenclature; taxonomy; microbial genomics
Online: 23 July 2021 (14:22:59 CEST)
The remarkable success of taxonomic discovery, powered by culturomics, genomics and metagenomics, creates a pressing need for new bacterial names, while holding a mirror up to the slow pace of change in bacterial nomenclature. Here, I take a fresh look at bacterial nomenclature, exploring how we might create a system fit for the age of genomics, playing to the strengths of current practice, while minimising difficulties. Adoption of linguistic pragmatism, obeying the rules while treating recommendations as merely optional will make it easier to create names derived from descriptions, from people or places or even arbitrarily. Simpler protologues and a relaxed approach to recommendations will also remove much of the need for expert linguistic quality control. Automated computer-based approaches will allow names to be created en masse before they are needed, while also relieving microbiologists of the need for competence in Latin. The result will be a system that is accessible, inclusive and digital, while also fully capable of naming the unnamed millions of bacteria.
ARTICLE | doi:10.20944/preprints201912.0183.v2
Subject: Medicine And Pharmacology, Oncology And Oncogenics Keywords: Bioinformatics, Genomics, TCGA, Cox Model.
Online: 21 August 2020 (11:26:33 CEST)
This study aimed to rank cancers based on the strength of the relationship between the comprehensive mRNA expression levels of the most harmful or protective genes and patient survival. Using The Cancer Genome Atlas dataset that includes the RNA sequencing and clinical data, we investigated not only gene specific prognostic availability, but also comprehensive prognostic availability of prognostic genes filtered by the Cox coefficient values, and ranked cancers using a specially designed prognostic indicator. Using Kaplan Meier plots, we found that cancers vary in the strength of the influence of their prognostic genes, and can be ranked based on this finding. There is a high probability that the treatment developed by using methods that reduce or increase the expression levels of biomarkers, for cancers that ranked at the bottom will not be efficient. The results of this study could be used as scientific evidence for the same.
TECHNICAL NOTE | doi:10.20944/preprints202007.0179.v1
Subject: Medicine And Pharmacology, Oncology And Oncogenics Keywords: UCSC Xena, cancer genomics, TCGA
Online: 9 July 2020 (07:59:18 CEST)
Motivation: The UCSC Xena platform provides huge amounts of processed cancer omics data from big public projects like TCGA or individual research groups, enabling unprecedented research opportunities. In 2019, we developed UCSCXenaTools, an R package for retrieval of UCSC Xena data. However, an easier dataset exploration and analysis tool is still lacking, especially for researchers without programming experience. Results: We developed UCSCXenaShiny, an R Shiny package to quickly explore and download all datasets from UCSC Xena data hubs. In addition, a module-based analysis framework is constructed to analyze and visualize data. Availability: https://github.com/openbiox/UCSCXenaShiny or https://cran.r-project.org/package=UCSCXenaShiny.
HYPOTHESIS | doi:10.20944/preprints202003.0392.v1
Online: 26 March 2020 (14:58:35 CET)
SARS-CoV 2, also known as COVID-19, is a fast-spreading coronavirus-related disease that emerged from China in December 2019 and has since attained the status of a pandemic. There are currently no drugs or vaccines against it and only limited diagnostic tests to identify the infection. Additionally, these tests are expensive and are therefore reserved for highly suspected cases of the disease, especially in developing countries. This is causing under-diagnosis, which is an alarming state of affairs, as even a single missed SARS-CoV 2 case would spread the disease exponentially and keep it in the community. Through this entirely in silico study, we have developed a cheaper and faster diagnostic method based on simple PCR and restriction enzyme digestion, commonly used in restriction fragment length polymorphism (RFLP) tests. Through comparative genomics, we found the closest neighbours of SARS-CoV 2 and then the highly conserved regions of the genome that were absent in SARS-CoV 1, its closest neighbour. We then found restriction sites for various enzymes and designed PCR primers flanking those sites. The primer pair we found produces a 401 bp amplicon which, when digested by the SwaI enzyme, yields two fragments of lengths 216 bp and 185 bp. As an internal control, GAPDH primers are pooled with the SARS-CoV 2 primers, as the patient sample will also include human RNA mixed with the viral RNA. This primer pair gives an amplicon of 131 bp and hence a negative sample should show a single band of 131 bp while a positive digested sample will give three bands of 401 bp, 216 bp and 131 bp. The primers are specific to SARS-CoV 2 only and can additionally be used for SYBR green based real time quantification of viral load. The developed tests have not yet been tested in vitro due to stressed out working hours in the only pathogenic virus handling laboratory in our institute. 
Nonetheless, this study works as a head start for other laboratories to rapidly test the suggested protocols in vitro and make available a cheaper alternative test for SARS-CoV 2 which would especially be beneficial for the lower to middle income countries.
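The in silico digestion step described in this abstract can be sketched as follows. This is an illustrative sketch only: the amplicon sequences are synthetic placeholders (the real primer and amplicon sequences are not given in the abstract), and the function name is hypothetical. The SwaI recognition site ATTTAAAT with a blunt cut in the middle (ATTT^AAAT) is the only enzyme fact relied on.

```python
SWAI_SITE = "ATTTAAAT"   # SwaI recognition sequence
SWAI_CUT_OFFSET = 4      # blunt cut between ATTT and AAAT


def digest(sequence, site=SWAI_SITE, cut_offset=SWAI_CUT_OFFSET):
    """Return fragment lengths (5' to 3') after cutting a linear sequence."""
    sequence = sequence.upper()
    cut_positions = []
    start = sequence.find(site)
    while start != -1:
        cut_positions.append(start + cut_offset)
        start = sequence.find(site, start + 1)

    fragments = []
    previous = 0
    for cut in cut_positions:
        fragments.append(cut - previous)
        previous = cut
    fragments.append(len(sequence) - previous)
    return fragments


# Synthetic 401 bp stand-in for the viral amplicon, with one SwaI site
# placed so that the cut falls after position 216 (hypothetical sequence).
viral_amplicon = "G" * 212 + SWAI_SITE + "C" * 181

# Synthetic 131 bp stand-in for the GAPDH control amplicon (no SwaI site).
control_amplicon = "G" * 131

print(digest(viral_amplicon))    # fragment lengths of the digested amplicon
print(digest(control_amplicon))  # control stays uncut
```

Searching a candidate amplicon for the recognition site before ordering primers is a quick way to confirm that the expected fragment sizes are distinguishable on a gel.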
Subject: Public Health And Healthcare, Public Health And Health Services Keywords: pathogen genomics; public health; bioinformatics
Online: 11 January 2020 (11:30:10 CET)
Public health agencies are increasingly using pathogen whole genome sequencing (WGS) to support surveillance and epidemiologic investigations. As access to WGS has grown, greater amounts of molecular data have helped improve our ability to detect outbreaks, investigate transmission chains, and explore large-scale population dynamics, such as the spread of antibiotic resistance. However, the wide adoption of WGS also poses challenges due to the amount of data generated and the need to transform raw data prior to analysis. This complexity means that public health agencies may need more advanced computational infrastructure, a broader technical workforce, and new approaches to data management and stewardship. As both a guide for how this development could occur, and a place to initiate discussion, we describe ten proposals for developing and supporting an informatics infrastructure for public health.
ARTICLE | doi:10.20944/preprints201905.0113.v1
Subject: Biology And Life Sciences, Biochemistry And Molecular Biology Keywords: education; genomics; nanopore; oceanography; fieldwork
Online: 9 May 2019 (12:57:09 CEST)
Experiential learning in the field is an opportunity for students to enter the heart of a scientific discipline. Through such experience, they can extract conceptual clues and discover motivational stepping stones that will potentially influence the rest of their education and career choice. Unfortunately, in Biology, the inescapable topic of Next-Generation Sequencing represents a challenge when it comes to creating an educational curriculum that aims to provide students with hands-on experience on sequencers. It is an even more difficult task if the purpose is to set such a curriculum in a field situation. However, in recent years, educators have seen the possibility of bringing Next-Generation Sequencing within the reach of students more easily with the Oxford Nanopore MinION, a low-budget, user-friendly, hand-held sequencer. Academic researchers have illustrated the performance of this device in the field and are an inspiration for curricula aiming to take the next generation of scientists outdoors. We designed a modular 5-day workshop, with nanopore sequencing to be performed in field conditions. Here we describe the material and methods that lead the students and instructors from sample collection, DNA extraction and preparation for nanopore sequencing with MinION to real-time analysis of the data collected. This curriculum was implemented for the first time aboard Research Vessel Sikuliaq during a transit organized by the STEMSEAS program at Columbia University in collaboration with the University of Alaska BLaST program. The line of investigation formulated for the workshop was an open-ended question that led the students to establish a proof of concept in terms of technology deployment at sea: what will metagenomic results show for DNA obtained from sea water and sequenced with Oxford Nanopore MinION? The workshop took place in October 2018 while Research Vessel Sikuliaq sailed the Alaskan seas for 7 days. 
Students successfully used nanopore sequencing for multiple metagenomic seawater samples. Their introductory analysis was consistent with environmental conditions and they were able to present their results by the end of the workshop.
ARTICLE | doi:10.20944/preprints201807.0618.v1
Subject: Medicine And Pharmacology, Oncology And Oncogenics Keywords: Cancer genomics; CNA; CGH; bioinformatics
Online: 31 July 2018 (10:21:40 CEST)
Cancers arise from the accumulation of somatic genome mutations, with varying contributions of intrinsic (i.e. genetic predisposition) and extrinsic (i.e. environmental) factors. For the understanding of malignant clones, precise information about their genomic composition has to be correlated with morphological, clinical and individual features, in the context of the available medical knowledge. Rapid improvements in molecular profiling techniques, the accumulation of large amounts of data on genomic alterations in human malignancies and the expansion of bioinformatic tools and methodologies have facilitated the understanding of the molecular changes during oncogenesis, and their correlation with clinico-pathological phenotypes. Far beyond a limited set of "driver" genes, oncogenomic profiling has identified a large variety of somatic mutations; and whole genome sequencing studies of healthy individuals have improved the knowledge of heritable genome variation. Nevertheless, major challenges arise from the skewed representation of individuals from varying population backgrounds in biomedical studies, and also from the limited extent to which some cancer entities are represented in the scientific literature. Content analyses of oncogenomic publications could provide guidance for the planning and support of future studies aiming at filling prominent knowledge gaps.
ARTICLE | doi:10.20944/preprints202308.1797.v1
Subject: Biology And Life Sciences, Life Sciences Keywords: Genetics and Genomics; Zoology; Animal Genetics
Online: 25 August 2023 (09:35:48 CEST)
The king ratsnake (Elaphe carinata) of the genus Elaphe is a common large non-venomous snake that is widely distributed in Southeast and East Asia, and is an economically important farmed snake species. Although non-venomous, the king ratsnake preys on venomous snakes such as cobras and pit vipers. The underlying immune mechanisms remain unclear. Despite its economic and research importance, genomic resources that would benefit studies in toxicology, phylogeography and immunogenetics are lacking. In this study, we use single-tube long fragment read (stLFR) sequencing to present the first complete genome of a king ratsnake from Huangshan City, Anhui province, China. The genome size is 1.56 Gb with a scaffold N50 of 6.53 Mb, the total length of the genome is approximately 621 Mb, and the repeat content is 38.90%. Additionally, we predicted 22,339 protein-coding genes, of which 22,065 had functional annotations. Our genome is a potentially useful addition to those currently available for snakes.
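For readers unfamiliar with the scaffold N50 statistic quoted above, a minimal sketch of how it is commonly computed follows. The scaffold lengths are toy values, not data from this assembly, and the function name is hypothetical.

```python
def n50(lengths):
    """Length L such that scaffolds of length >= L cover at least half the assembly."""
    if not lengths:
        raise ValueError("at least one scaffold length is required")
    half_total = sum(lengths) / 2
    running_total = 0
    # Walk from the longest scaffold down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running_total += length
        if running_total >= half_total:
            return length


# Toy scaffold lengths in bp (hypothetical).
toy_scaffolds = [80, 70, 50, 40, 30, 20]
print(n50(toy_scaffolds))
```

A larger N50 indicates that more of the assembly sits in long scaffolds, which is why it is reported as a contiguity measure alongside total genome size.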
ARTICLE | doi:10.20944/preprints202308.1451.v1
Subject: Biology And Life Sciences, Other Keywords: Genetics and Genomics; Evolutionary Biology; Zoology
Online: 22 August 2023 (03:43:10 CEST)
The Oriental ratsnake Ptyas mucosa is a common non-venomous snake of the colubrid family, with a wide geographic range spanning much of South and Southeast Asia. P. mucosa is widely cultivated due to its use in traditional medicine, scientific research, and handicrafts. Therefore, genome resources could play an important role in the efficacy of traditional medicine and the analysis of the living environment of the species. We collected a snake sample in Hezhou, Guangxi, China, which was identified as P. mucosa by morphological identification. Here we present a highly continuous P. mucosa genome with a genome size of 1.74 Gb. The scaffold N50 length is 9.57 Mb and the maximal scaffold length is 78.3 Mb; the P. mucosa genome has a GC content of 37.9% and the integrity of the gene reached 86.6%. Assembled using long reads, the total length of the repeat sequence in the genome reached 735 Mb, and its repeat content was as high as 42.19%. A total of 24,869 functional genes were annotated. This study will assist in the understanding of P. mucosa and also provide a basis for medicinal research.
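The GC (guanine plus cytosine) content reported above is a simple base-composition percentage. A minimal sketch, using a toy sequence and a hypothetical function name, and ignoring ambiguous bases such as N:

```python
def gc_content(sequence):
    """Percentage of G and C among unambiguous bases (A, C, G, T)."""
    counted = [base for base in sequence.upper() if base in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no unambiguous bases")
    gc_bases = sum(1 for base in counted if base in "GC")
    return 100.0 * gc_bases / len(counted)


# Toy sequence (hypothetical); the two N bases are excluded from the count.
print(gc_content("ATGCGCNNAT"))
```

Whether gap characters and ambiguous bases are excluded from the denominator is a convention that varies between tools, so published values may differ slightly depending on the choice.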
DATA DESCRIPTOR | doi:10.20944/preprints202306.0658.v1
Subject: Biology And Life Sciences, Ecology, Evolution, Behavior And Systematics Keywords: Genetics and Genomics; Evolutionary Biology; Zoology
Online: 9 June 2023 (03:30:56 CEST)
Snakes are a vital component of wildlife resources and are widely distributed across the globe. Bungarus multicinctus, a highly venomous snake, is found in central and southern China. Snakes are an ancient group of reptiles, and their genome resources can provide important clues for understanding the evolutionary history of reptiles. Meanwhile, genomic resources play a crucial role in comprehending the evolution of species. So far, genomic resources for snakes remain scarce. In 2021, a snake sample was collected from Beiliu Longgukeng, Guangxi, which was identified as B. multicinctus through morphological identification. In this study, we present a highly contiguous genome of B. multicinctus with a size of 1.51 Gb. The genome contains a repeat content of 40.15%, with a total length exceeding 620 Mb. Additionally, we annotated a total of 24,869 functional genes. This research is of great significance for comprehending the evolution of B. multicinctus and provides a genomic basis for the genes involved in venom gland function.
REVIEW | doi:10.20944/preprints202303.0336.v1
Subject: Biology And Life Sciences, Aging Keywords: ocean; phytoplankton; bacteria; biogeochemistry; rheology; genomics
Online: 20 March 2023 (03:07:57 CET)
Dissolved organic matter (DOM) in the ocean represents about 662 billion tonnes of C, 200 times more than the living biomass. It is produced mainly by microbial primary production. The largest fraction of this DOM is old (>weeks to months) and both chemically and biologically recalcitrant. The remainder is young (seconds to weeks), more labile and surface active. It also changes the rheological properties in the bulk phase of the water and at interfaces including the sea surface microlayer (SML). In order of abundance, this DOM consists of sugars, amino acids, fatty acids and nucleic acids, often incorporated into complex polymers. The DOM molecules are produced by microbial genes, and are further modified by enzymes themselves produced by genes. The properties of ocean water and its interfaces as well as biogeochemical fluxes may thus be modified by ocean microbial genes. These fluxes influence ocean and atmospheric climate, which in return acts on the biota. Therefore the ocean microbial genomes and the fluxes and climates they influence may be subject to Darwinian-type selection. Research programmes need to integrate ocean ecology, rheology, biogeochemistry and genomics, to find the associations among them. Discovery of commercial bioactive molecules may be a bonus.
REVIEW | doi:10.20944/preprints202209.0050.v1
Subject: Biology And Life Sciences, Virology Keywords: bunyavirus; structure; genomics; immune response; review
Online: 5 September 2022 (07:49:30 CEST)
Bunyaviruses represent the largest group of RNA viruses and are the causative agents of a variety of febrile and hemorrhagic illnesses. Originally characterized as a single serotype in Africa, the described bunyaviruses now number over 500 and have been detected around the world. These predominantly tri-segmented, single-stranded RNA viruses are transmitted primarily through arthropod and rodent vectors, and can infect a wide variety of animals and plants. Although they encode only a small number of proteins, these viruses can inflict potentially fatal disease outcomes, and have even developed strategies to suppress the innate antiviral immune mechanisms of the infected host. This short review will attempt to provide an overall description of the order Bunyavirales, describing the mechanisms behind their infection, replication and evasion of the host immune response. Furthermore, the historical context of these viruses will be presented, from their original discovery almost 80 years ago to the most recent research pertaining to viral replication and host immune response.
ARTICLE | doi:10.20944/preprints202208.0337.v1
Subject: Biology And Life Sciences, Biochemistry And Molecular Biology Keywords: breast cancer; polymorphism; mitochondrial genomics; D310
Online: 18 August 2022 (10:17:28 CEST)
Breast cancer has a high incidence in the worldwide female population. Although alterations in the mitochondrial genome probably play an important role in carcinogenesis, the current evidence is ambiguous and inconclusive. The purpose of the present work was to explore mitochondrial sequences of clinical breast cancer cases from different origins and determine the associated polymorphisms. The search for complete and partial mtDNA sequences obtained from breast cancer patients and controls was performed in the NCBI GenBank database. We identified 124 mtDNA sequences associated with breast cancer cases, of which 86 were complete and 38 were partial sequences. Of these 86 complete sequences, 52 belong to patients with a confirmed diagnosis of breast cancer and 34 sequences were obtained from healthy mammary tissue of the same patients used as controls. From the mtDNA analysis, two polymorphisms with statistically significant differences were found in D310 in the sequences analyzed: m.310del (rs869289246) in 34.6% (27/78) of breast cancer cases and 61.7% (21/34) of controls; and m.315dup (rs369786048) in 60.2% (47/78) of breast cancer cases and 38.2% (13/34) of controls. Also, the variant m.16519T>C (rs3937033) was found in 59% of control sequences and 52% of breast cancer sequences, with a statistically significant difference. The polymorphic changes are evolutionarily related to haplogroup H of Indo-European and Euro-Asiatic origins; however, they were found in all non-European sequences with breast cancer.
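The case-versus-control comparison for m.310del can be illustrated with a 2x2 contingency test. The abstract does not say which statistical test the authors used, so the Pearson chi-square test below (without continuity correction) is an assumption chosen purely for illustration; only the counts 27/78 and 21/34 come from the abstract.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (1 degree of freedom, no continuity correction).

    Table layout:
                 variant   no variant
        cases       a          b
        controls    c          d
    Returns (statistic, p_value).
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column must have a non-zero total")
    statistic = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom, the upper tail equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# m.310del: 27 of 78 cases and 21 of 34 controls carry the variant.
statistic, p_value = chi_square_2x2(27, 78 - 27, 21, 34 - 21)
print(round(statistic, 2), p_value < 0.05)
```

With small cell counts an exact test would usually be preferred; the chi-square form is shown here only because it needs nothing beyond the standard library.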
REVIEW | doi:10.20944/preprints202204.0149.v1
Subject: Biology And Life Sciences, Agricultural Science And Agronomy Keywords: sweet potato; breeding; genetics; genomics; health
Online: 15 April 2022 (14:58:08 CEST)
Sweet potatoes are a crucial crop for Asian and African countries. Its nutritional content and capacity to keep you healthy have increased in recent years. Moreover, sweet potatoes' fibre also keeps your gut happy. Most sweet potato varieties don't bloom. Due to pollination issues, sweet potatoes are also incompatible with each other. Sweet potato blooms are self-sterile, so they don't mix well in breeding programmes. Traditional and modern breeding procedures didn't always work with sweet potatoes, but some did. Using molecular biology methods, some individuals become more resistant to illnesses by eliminating particular genes. The crop's nature and growth should be improved. All of this should be done to acquire new characteristics in sweet potatoes by crossing them. Sweet potatoes are a superb tuberous crop, but they have issues with pollination and adjusting to new breeding procedures. Modern breeding and biotechnology methods can be used to get the most out of this crop. These are "chronological" ways to get the most out of farming.
ARTICLE | doi:10.20944/preprints202203.0112.v1
Subject: Biology And Life Sciences, Agricultural Science And Agronomy Keywords: PAEs; Comparative genomics analysis; DEHP; Bioremediation
Online: 8 March 2022 (01:58:22 CET)
As commonly used chemical plasticizers in plastic products, phthalate esters have become a serious and ubiquitous environmental pollutant, for example in soil under plastic film mulch culture. Microbial degradation or transformation is regarded as a suitable strategy to address phthalate ester pollution. Thus, a new phthalate ester-degrading strain, Gordonia sp. GZ-YC7, was isolated in this study, which exhibited the highest di-(2-ethylhexyl) phthalate degradation efficiency under 1000 mg/L and the strongest tolerance to 4000 mg/L. Comparative genomic analysis showed that diverse degradation pathways exist in Gordonia sp. GZ-YC7 for various phthalate esters such as di-(2-ethylhexyl) phthalate and dibutyl phthalate, which possibly contributes to its broad substrate spectrum, high degrading efficiency and high tolerance to phthalate esters. Gordonia sp. GZ-YC7 has potential for bioremediation of phthalate esters in polluted soil environments.
ARTICLE | doi:10.20944/preprints202111.0533.v1
Subject: Biology And Life Sciences, Biochemistry And Molecular Biology Keywords: chloroplast; genetic resources; genomics capirona; phylogenomics
Online: 29 November 2021 (12:32:24 CET)
Capirona (Calycophyllum spruceanum Benth.) belongs to subfamily Ixoroideae, one of the major lineages in the Rubiaceae family, and is an important timber tree that originated in the Amazon Basin and is widely distributed in Bolivia, Peru, Colombia, and Brazil. In this study, we obtained the first complete chloroplast (cp) genome of capirona from the department of Madre de Dios, located in the Peruvian Amazon. High-quality genomic DNA was used to construct libraries. Paired-end clean reads were obtained from a PE 150 library on the Illumina HiSeq 2500 platform. The complete cp genome of C. spruceanum is 154,480 bp in length with a typical quadripartite structure, containing a large single-copy (LSC) region (84,813 bp) and a small single-copy (SSC) region (18,101 bp), separated by two inverted repeat (IR) regions (25,783 bp each). The annotation of the C. spruceanum cp genome predicted 87 protein-coding genes (CDS), 8 ribosomal RNA (rRNA) genes, 37 transfer RNA (tRNA) genes and 1 pseudogene. A total of 41 simple sequence repeats (SSRs) in this cp genome were classified as mononucleotides (29), dinucleotides (5), trinucleotides (3), and tetranucleotides (4). Most of these repeats were distributed in the noncoding regions. Whole chloroplast genome comparison with six other Ixoroideae species revealed that the small single-copy and large single-copy regions showed more divergence than the inverted repeat regions. Finally, phylogenetic analysis resolved C. spruceanum as a sister species to Emmenopterys henryi and confirmed its position within the subfamily Ixoroideae. This study reports for the first time the genome organization, gene content, and structural features of the chloroplast genome of C. spruceanum, providing valuable information for genetic and evolutionary studies in the genus Calycophyllum and beyond.
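As a quick consistency check on the figures above, a quadripartite chloroplast genome should have a total length equal to LSC + SSC + 2 × IR, and the SSR classes should add up to the reported total. The short sketch below does that arithmetic; the function and variable names are illustrative and do not come from the study.

```python
def quadripartite_length(lsc: int, ssc: int, ir: int) -> int:
    """Total length of a quadripartite plastome: LSC + SSC + two IR copies."""
    return lsc + ssc + 2 * ir


# Region sizes (bp) as reported in the abstract.
total_bp = quadripartite_length(lsc=84_813, ssc=18_101, ir=25_783)
print(total_bp)  # 154480, matching the reported genome length

# SSR counts per motif class as reported in the abstract.
ssr_counts = {"mono": 29, "di": 5, "tri": 3, "tetra": 4}
print(sum(ssr_counts.values()))  # 41, matching the reported SSR total
```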
ARTICLE | doi:10.20944/preprints202001.0064.v1
Subject: Medicine And Pharmacology, Dentistry And Oral Surgery Keywords: dental enamel; dental caries; genomics; odontogenesis
Online: 8 January 2020 (06:35:12 CET)
Objectives: The hierarchical structure of enamel gives insight into the properties of enamel and can influence its strength and ultimately caries experience. Presently, past caries experience is quantified using the decayed, missing, filled teeth/decayed, missing, filled surface (DMFT/DMFS for permanent teeth; dmft/dmfs for primary teeth) or international caries detection and assessment system (ICDAS) scores. By analyzing the structure of enamel, a new measurement can be utilized clinically to predict susceptibility to future caries experience based on a patient’s individual biomarkers. The purpose of this study was to test the hypothesis that the number of prisms per square millimeter in enamel and the average gap distance between prisms and interprismatic areas influence caries experience through genetic variation of the genes involved in enamel formation. Materials and Methods: Scanning electron microscopy (SEM) images of enamel from primary teeth were used to measure the number of prisms per square millimeter and interprismatic spaces, prism density and gap distances between prisms in the enamel samples. The measurements were tested to explore a genetic association with variants of selected genes and correlations with caries experience based on the individual’s DMFT + dmft score and enamel microhardness at baseline, after an artificial lesion was created and after the artificial lesion was treated with fluoride. Results: Associations were found between variants of genes including ameloblastin, amelogenin, enamelin, tuftelin, tuftelin interactive protein 11, beta defensin 1, matrix metallopeptidase 20 and the enamel structure variables measured. Significant correlations were found between caries experience and microhardness and enamel structure. 
Negative correlations were found between the number of prisms per square millimeter and high caries experience (r = -0.71), gap distance between prisms and the enamel microhardness after an artificial lesion was created (r = -0.70), and gap distance between prisms and the enamel microhardness after an artificial lesion was created and then treated with fluoride (r = -0.81). There was a positive correlation between the number of prisms per square millimeter and prism density of the enamel (r = 0.82). Conclusions: Our data support that genetic variation may impact enamel formation, and therefore influence susceptibility to dental decay and future caries experience. Clinical Relevance: The evaluation of enamel structure that may impact caries experience allows for hypothesizing that the identification of individuals at higher risk for dental caries and implementation of personalized preventative treatments may one day become a reality.
ARTICLE | doi:10.20944/preprints201808.0423.v1
Subject: Biology And Life Sciences, Animal Science, Veterinary Science And Zoology Keywords: mitochondrial DNA; mitochondrial genome; genome assembly; genome annotation; next generation sequencing; animal genomics; partial genomics; bioinformatics
Online: 24 August 2018 (03:24:37 CEST)
Next-generation sequencing is now a mature technology, allowing partial animal genomes to be produced for many clades. Though many software tools exist for genome assembly and annotation, a simple pipeline that allows researchers to input raw sequencing reads in fastq format and retrieve a completely assembled and annotated mitochondrial genome is still missing. mitoMaker 1.0 is a pipeline developed in Python that (i) performs recursive de novo assembly of mitochondrial genomes using a set of increasing k-mers; (ii) searches for the best matching result to a target mitogenome; and (iii) performs iterative reference-based strategies to optimize the assembly. After (iv) checking for circularization and (v) positioning tRNA-Phe at the beginning, (vi) the geneChecker.py module performs a complete annotation of the mitochondrial genome and provides a GenBank-formatted file as output.
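Step (v) amounts to rotating a circular sequence so that it begins at a chosen coordinate. The sketch below illustrates that idea only: `rotate_circular` is a hypothetical helper written for this example and is not taken from mitoMaker, and the start index of tRNA-Phe is assumed to be already known from annotation.

```python
def rotate_circular(sequence: str, start: int) -> str:
    """Rotate a circular sequence so that 0-based position `start` comes first.

    The sequence is treated as circular, so the bases before `start`
    are appended to the end rather than discarded.
    """
    if not sequence:
        raise ValueError("sequence must not be empty")
    start %= len(sequence)  # tolerate coordinates beyond one full turn
    return sequence[start:] + sequence[:start]


# Toy example: suppose the feature of interest starts at index 4.
toy_genome = "GGTTAACC"
print(rotate_circular(toy_genome, 4))  # AACCGGTT
```

The rotation leaves the sequence length and content unchanged, so any downstream annotation coordinates only need to be shifted by the same offset.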
ARTICLE | doi:10.20944/preprints202308.0775.v1
Subject: Biology And Life Sciences, Ecology, Evolution, Behavior And Systematics Keywords: Genetics and Genomics; Animal Genetics; Evolutionary Biology
Online: 9 August 2023 (10:51:51 CEST)
The study of the currently known >3,000 species of snakes can provide valuable insights into the evolution of their genomes. Deinagkistrodon acutus, also known as Sharp-nosed Pit Viper, one hundred-pacer viper or five-pacer viper, is a venomous snake with significant economic, medicinal and scientific importance. Widely distributed in southeastern China and South-East Asia, D. acutus has been primarily studied for its venom. Here, we employed next-generation sequencing to assemble and annotate a highly continuous genome of D. acutus. The genome size is 1.46 Gb; its scaffold N50 length is 6.21 Mb, the repeat content is 42.81%, and 24,402 functional genes were annotated. This study helps to further understand and utilize D. acutus and its venom at the genetic level.
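For readers unfamiliar with the scaffold N50 statistic quoted above: it is the length of the scaffold at which the running total, taken from the longest scaffold downward, first reaches half of the total assembly length. The following is a minimal sketch with made-up scaffold lengths; the function name `n50` and the toy data are illustrative and unrelated to the D. acutus assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths.

    Scaffolds are sorted from longest to shortest and accumulated until
    the running sum reaches at least half of the total length.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must not be empty")


# Toy scaffold lengths (total 290): 80 + 70 = 150 first exceeds 145.
print(n50([80, 70, 50, 40, 30, 20]))  # 70
```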
REVIEW | doi:10.20944/preprints202302.0405.v1
Subject: Biology And Life Sciences, Biochemistry And Molecular Biology Keywords: Explainability; Deep Learning; Artificial Intelligence; Genomics; Transcriptomics
Online: 23 February 2023 (09:32:09 CET)
Deep learning has already revolutionised the way we process a wide range of data in many areas of our daily life. The ability to learn abstractions and relationships from heterogeneous data has provided impressively accurate prediction and classification tools to handle increasingly big datasets. This has a significant impact on the growing wealth of omics datasets, with the unprecedented opportunity for a better understanding of the complexity of living organisms. While this revolution is transforming the way we analyse these data, explainable deep learning is emerging as an additional tool with the potential to change the way we interpret biological data. Explainability addresses critical issues such as transparency, which is especially important when computational tools are introduced in clinical environments. Moreover, it empowers artificial intelligence with the capability to provide new insights into the input data, thus adding an element of discovery to these already powerful resources. In this review we provide an overview of the transformative effects explainable deep learning is having on multiple sectors, ranging from genome engineering and genomics to radiomics, drug design and clinical trials. We offer life scientists a perspective to better understand the potential of these tools, and a motivation to implement them in their research, by suggesting learning resources they can use to take their first steps in this field.
ARTICLE | doi:10.20944/preprints202301.0345.v1
Subject: Biology And Life Sciences, Biochemistry And Molecular Biology Keywords: Fabaceae; bioinformatics; molecular markers; neglected crop; genomics
Online: 19 January 2023 (03:57:25 CET)
Lupinus mutabilis Sweet (Fabaceae), “tarwi” or “chocho”, is an important grain legume in the Andean region. In Peru, studies on tarwi have mainly focused on morphological features; however, they have not been molecularly characterized. Currently, it is possible to explore genetic parameters of plants with reliable and modern methods like genotyping-by-sequencing (GBS). Here, for the first time, we used single nucleotide polymorphism (SNP) markers to infer the genetic diversity and population structure of 89 accessions of tarwi from nine Andean regions of Peru. A total of 5922 SNPs distributed along all chromosomes of tarwi were identified. STRUCTURE analysis revealed that this crop is grouped into two clusters. A dendrogram was generated using the UPGMA clustering algorithm and, similar to the principal coordinate analysis (PCoA), it showed two groups that correspond to the geographic origin of the tarwi samples. AMOVA showed a reduced variation between clusters (7.59 %) and indicated that variability within populations is 92.41 %. Population divergence (Fst) between clusters 1 and 2 revealed low genetic difference (0.019). We also detected a negative Fis for both populations, demonstrating that, similar to other Lupinus species, tarwi also depends on cross-pollination. SNP markers were powerful and effective for the genotyping process in this germplasm. We hope that this information is the beginning of the path towards modern genetic improvement and conservation strategies for this important Andean legume.
COMMUNICATION | doi:10.20944/preprints202109.0485.v2
Subject: Biology And Life Sciences, Biochemistry And Molecular Biology Keywords: gene nomenclature; vertebrate genomics; oxytocin; arginine vasopressin
Online: 29 April 2022 (08:09:45 CEST)
Standardized gene nomenclature supports unambiguous communication and identification of the scientific literature associated with genes. To support the increasing number of annotated genomes that are now available for comparative studies, gene nomenclature authorities coordinate the assignment of approved gene names that can be readily propagated across species. Theofanopoulou et al. (2021) propose a new nomenclature for the genes encoding oxytocin and arginine vasopressin and their receptors. Rather than changing to a different nomenclature system, we propose minor updates to the current approved nomenclature of these vertebrate genes to better reflect their evolutionary history. We call on authors, journal editors and reviewers to help support communication and indexing of gene-related publications by working with existing gene nomenclature committees and ensuring that standardized gene nomenclature is routinely used.
REVIEW | doi:10.20944/preprints202101.0521.v1
Subject: Biology And Life Sciences, Biochemistry And Molecular Biology Keywords: Data integration; multi-omics; integration strategies; genomics
Online: 25 January 2021 (16:19:31 CET)
Metabolomics deals with multiple and complex chemical reactions within living organisms and how these are influenced by external or internal perturbations. It lies at the heart of omics profiling technologies not only as the underlying biochemical layer that reflects information expressed by the genome, the transcriptome and the proteome, but also as the closest layer to the phenome. The combination of metabolomics data with the information available from genomics, transcriptomics, and proteomics offers unprecedented possibilities to enhance current understanding of biological functions, elucidate their underlying mechanisms and uncover hidden associations between omics variables. As a result, a vast array of computational tools has been developed to assist with integrative analysis of metabolomics data with different omics. Here, we review and propose five criteria – hypothesis, data types, strategies, study design and study focus – to classify statistical multi-omics data integration approaches into state-of-the-art classes under which all existing statistical methods fall. The purpose of this review is to look at various aspects that guide the choice of the statistical integrative analysis pipeline in terms of the different classes. We draw particular attention to metabolomics and genomics data to assist those new to this field in the choice of the integrative analysis pipeline.
REVIEW | doi:10.20944/preprints202010.0118.v1
Subject: Biology And Life Sciences, Animal Science, Veterinary Science And Zoology Keywords: FAANG; farmed animal; genomics; genotype-to-phenotype
Online: 6 October 2020 (10:53:17 CEST)
Here we review and describe a set of research priorities to meet present and future challenges posed to farmed animal production that build on progress, successes and resources from the Functional Annotation of ANimal Genomes (FAANG) project.
REVIEW | doi:10.20944/preprints202009.0073.v1
Subject: Medicine And Pharmacology, Pediatrics, Perinatology And Child Health Keywords: genomics; pediatrics; lung disease; pulmonary arterial hypertension
Online: 3 September 2020 (15:29:36 CEST)
Pulmonary arterial hypertension is a rare disease with high mortality despite recent therapeutic advances. The disease is caused by both genetic and environmental factors, and likely gene x environment interactions. While PAH can manifest across the lifespan, pediatric-onset disease is particularly challenging because it is frequently associated with a more severe clinical course and comorbidities including lung/heart developmental anomalies. In light of these differences, it is perhaps not surprising that emerging data from genetic studies of pediatric-onset PAH indicate that the genetic basis is different than that of adults. There is a greater genetic burden in children, with rare genetic factors contributing to at least 36% of pediatric-onset idiopathic PAH (IPAH) compared to ~11% of adult-onset IPAH. De novo variants are frequently associated with PAH in children, and contribute to at least 15% of all pediatric cases. The standard of medical care for pediatric PAH patients is based on extrapolations from adult data. However, the increased etiologic heterogeneity, poorer prognosis and increased genetic burden for pediatric-onset PAH calls for a dedicated pediatric research agenda to improve molecular diagnosis and clinical management. A genomics-first approach will improve the understanding of pediatric PAH and how it is related to other rare pediatric genetic disorders.
Subject: Medicine And Pharmacology, Oncology And Oncogenics Keywords: cancer biomarkers; BRCA1/2; genomics; European recommendation
Online: 18 August 2020 (16:30:04 CEST)
Rapid and continuing advances in biomarker testing are not being matched by uptake in health systems, and this is hampering both patient care and innovation. It also risks costing health systems the opportunity to make their services more efficient and, over time, more economical. The potential that genomics has brought to biomarker testing in diagnosis, prediction and research is being realised, pre-eminently in many cancers, but also in an ever-wider range of conditions – notably BRCA1/2 testing in ovarian, breast, pancreatic and prostate cancers. Nevertheless, the implementation of genetic testing in routine clinical settings remains challenging. Development is impeded by country-related heterogeneity, data deficiencies, and lack of policy alignment on standards, approval – and the role of real-world evidence in the process – and reimbursement. The acute nature of the problem is compellingly illustrated by the particular challenges facing the development and use of tumour agnostic therapies, where the gaps in preparedness for taking advantage of this innovative approach to cancer therapy are sharply exposed. Europe should already have in place a guarantee of universal access to a minimum suite of biomarker tests and should be planning for an optimum testing scenario with a wider range of biomarker tests integrated into a more sophisticated health system articulated around personalised medicine. Improving healthcare and winning advantages for Europe's industrial competitiveness and innovation require an appropriate policy framework – starting with an update to outdated recommendations.
ARTICLE | doi:10.20944/preprints202008.0307.v1
Subject: Medicine And Pharmacology, Oncology And Oncogenics Keywords: cancer biomarkers; BRCA1/2; genomics; European recommendation
Online: 14 August 2020 (04:27:45 CEST)
Rapid and continuing advances in biomarker testing are not being matched by take-up in health systems, and this is hampering both patient care and innovation. It also risks costing health systems the opportunity to make their services more efficient and, over time, more economical. The potential that genomics has brought to biomarker testing in diagnosis, prediction and research is being realised, pre-eminently in many cancers, but also in an ever-wider range of conditions. One of the paradigmatic examples is BRCA1/2 testing in ovarian, breast, pancreatic and prostate cancers. Nevertheless, development is impeded by data deficiencies, and lack of policy alignment on standards, approval – and the role of real-world evidence in the process – and reimbursement. The acute nature of the problem is compellingly illustrated by the particular challenges facing the development and use of tumour agnostic therapies, where the gaps in preparedness for taking advantage of this innovative approach to cancer therapy are sharply exposed. Europe should already have in place a guarantee of universal access to a minimum suite of biomarker tests and should be planning for an optimum testing scenario with a wider range of biomarker tests integrated into a more sophisticated health system articulated around personalised medicine. Improving healthcare and winning advantages for Europe's industrial competitiveness and innovation require an appropriate policy framework – starting with an update to outdated recommendations.
REVIEW | doi:10.20944/preprints201810.0708.v1
Subject: Biology And Life Sciences, Biochemistry And Molecular Biology Keywords: circular visualization; circos; genomics; next-generation sequencing
Online: 30 October 2018 (07:06:35 CET)
Following the sequencing of the human genome and rapid changes in genome sequencing methods, we have entered an era of rapidly accumulating genome-sequencing data. This has prompted the development of several types of methods for representing the results of genome sequencing data. Circular genome visualization tools are also critical in this area as they provide rapid interpretation and simple visualization of overall data. In the last 15 years, we have seen rapid changes in circular visualization tools after the development of the circos tool, with 1–2 tools published per year. Herein we have summarized and revisited all these tools until the third quarter of 2018.
ARTICLE | doi:10.20944/preprints201809.0388.v1
Subject: Medicine And Pharmacology, Other Keywords: biobanks; electronic health records; Michigan Genomics Initiative
Online: 19 September 2018 (14:57:30 CEST)
Biobanks linked to electronic health records provide a rich data resource for health-related research. With the establishment of large-scale infrastructure, the availability and utility of data from biobanks has dramatically increased over time. As more researchers become interested in using biobank data to explore a diverse spectrum of scientific questions, resources guiding the data access, design, and analysis of biobank-based studies will be crucial. The first aim of this review is to characterize the types of biobanks that are discussed in the recent literature and provide detailed descriptions of specific biobanks including their location, size, data access, data linkages and more. The development and accessibility of large-scale biorepositories provide the opportunity to accelerate agnostic searches, new discoveries, and hypothesis-generating studies of disease-treatment, disease-exposure and disease-gene associations. Rather than spending time and money designing and implementing a single study with pre-defined objectives, researchers can use biobanks’ existing data-rich resources to answer scientific questions as quickly as they can analyze them. While the data are becoming increasingly available, additional thought is needed to address issues related to the design of such studies and analysis of these data. In the second aim of this review, we discuss statistical issues related to biobank research in general including study design, sampling strategy, phenotype identification, and missing data. These issues are illustrated using data from the Michigan Genomics Initiative, UK Biobank, and Genes for Good. We summarize the current body of statistical literature aimed at addressing some of these challenges and discuss some of the standing open problems in this area. 
This work serves to complement and extend recent reviews about biobank-based research and aims to provide a resource catalog with statistical and practical guidance to researchers pursuing biobank-based research.
REVIEW | doi:10.20944/preprints201806.0191.v1
Subject: Biology And Life Sciences, Biochemistry And Molecular Biology Keywords: rare disease; functional genomics; genetic variant validation
Online: 12 June 2018 (12:36:08 CEST)
Many insights into human disease have been built on experimental results in Drosophila, and research in fruit flies is often justified on the basis of its predictive value for questions related to human health. Additionally, there is now a growing recognition of the value of Drosophila for the study of rare human genetic diseases, either as a means of validating the causative nature of a candidate genetic variant found in patients, or as a means of obtaining functional information about a novel disease-linked gene when there is little known about it. For these reasons, funders in the US, Europe, and Canada have launched targeted programs to link human geneticists working on discovering new rare disease loci with researchers who work on the counterpart genes in Drosophila and other model organisms. Several of these initiatives are described here, as are a number of output publications that validate this new approach.
ARTICLE | doi:10.20944/preprints201612.0098.v1
Subject: Medicine And Pharmacology, Oncology And Oncogenics Keywords: lung cancer; adenovirus; E1b; UV-irradiation; genomics
Online: 19 December 2016 (09:33:20 CET)
Adenoviruses (Ads) have been extensively manipulated for the development of cancer-selective replication, leading to cancer cell death or oncolysis. Clinical studies using E1-modified oncolytic Ads have shown that this therapeutic platform was safe, but with limited efficacy, indicating the necessity of targeting other viral genes for manipulation. To improve the therapeutic efficacy of oncolytic Ads, we treated the entire Ad genome repeatedly with UV-light and have isolated AdUV, which efficiently lyses cancer cells as reported previously. In this report, we show that no mutations were observed in the early genes (E1 or E4) of AdUV while several mutations were observed within the Ad late genes which have structural or viral DNA packaging functions. This study also reported the increased release of AdUV from cancer cells. In this study, we found that AdUV inhibits tumor growth following intratumoral injection. These results indicate the potentially significant role of the viral late genes, in particular the DNA packaging genes, to enhance Ad oncolysis.
COMMUNICATION | doi:10.20944/preprints202309.0079.v1
Subject: Biology And Life Sciences, Life Sciences Keywords: computational biology; genomics; sequencing; data literacy; bioinformatics; education
Online: 4 September 2023 (04:32:42 CEST)
With an ever increasing amount of research data available, it becomes constantly more important to possess data literacy skills to benefit from this valuable resource. An integrative course was developed to teach students the fundamentals of data literacy through an engaging genome sequencing project. Each cohort of students performed planning of the experiment, DNA extraction, nanopore sequencing, genome sequence assembly, prediction of genes in the assembled sequence, and assignment of functional annotation terms to predicted genes. Students learned how to communicate science through writing a protocol in the form of a scientific paper, providing comments during a peer-review process, and presenting their findings as part of an international symposium. Many students enjoyed the opportunity to own a project and to work towards a meaningful objective.
ARTICLE | doi:10.20944/preprints202307.0696.v1
Subject: Biology And Life Sciences, Life Sciences Keywords: MALDI-TOF; environmental; Rivers; Genomics method; Dominican Republic
Online: 11 July 2023 (09:30:41 CEST)
We compared the performance of matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) mass spectrometry with that of genomic DNA extraction followed by sequencing, assembly, and alignment for phylogenetic assessment (Genomics method). We collected samples from four contaminated rivers in the Dominican Republic and analyzed the efficacy and accuracy of MALDI-TOF for identifying bacteria in the samples. We evaluated the results of both methods (MALDI-TOF and Genomics) and reported the percentage of similarity between them. The MALDI-TOF method showed 72.41% agreement with the Genomics method. The discrepancy could have been due to sequence contamination found in the Genomics method. When this contamination was later filtered out, the agreement rate rose to 90%.
ARTICLE | doi:10.20944/preprints202305.0805.v1
Subject: Biology And Life Sciences, Ecology, Evolution, Behavior And Systematics Keywords: biogeographic barrier; phylogeographic break; conservation genomics; Caribbean Sea
Online: 11 May 2023 (05:42:49 CEST)
The comparative phylogeography of marine species with contrasting dispersal potential across the southern Caribbean Sea was evaluated by the presence of two putative barriers: the Magdalena River plume (MRP) and the combination of the absence of a rocky bottom and permanent upwelling in the La Guajira Peninsula (ARB+PUG). Three species of rocky shallow bottoms were selected with different dispersal potentials: Acanthemblemaria rivasi (PLD < 22 days), Cittarium pica (PLD < 6 days), and Nerita tessellata (PLD > 60 days). We generated a set of SNPs for the three species using the ddRad-seq technique. Samples of each species were collected in five locations from Capurganá to La Guajira. For the first time, evidence of a phylogeographic break caused by MRP is provided, mainly for A. rivasi (AMOVA: ФCT = 0.420). The ARB+PUG barrier causes another break for A. rivasi (ФCT = 0.406) and C. pica (ФCT = 0.224). Three populations (K = 3) were identified for A. rivasi and C. pica, while N. tessellata presented one population (K = 1). The Mantel correlogram indicated that A. rivasi and C. pica fit the hierarchical population model, and only the A. rivasi and C. pica comparisons showed phylogeographic congruence. Our results demonstrate how the biological traits of these three species and the biogeographic barriers have influenced their phylogeographic structure.
ARTICLE | doi:10.20944/preprints202210.0470.v1
Subject: Medicine And Pharmacology, Oncology And Oncogenics Keywords: metastasis; cancer evolution; bioinformatics; cancer biology; cancer genomics
Online: 31 October 2022 (07:06:45 CET)
Cancer metastasis is the lethal developmental step in cancer, responsible for the majority of cancer deaths. To metastasize, cancer cells must acquire the ability to disseminate systemically and to escape an activated immune response. Here, we endeavoured to investigate whether metastatic dissemination reflects acquisition of genomic traits that are selected for. We acquired mutation and copy number data from 8,332 tumours representing 19 cancer types acquired from The Cancer Genome Atlas and the Hartwig Medical Foundation. A total of 827,344 non-synonymous mutations across 8,332 tumour samples representing 19 cancer types were timed as early or late relative to copy number alterations, and potential driver events were annotated. We found that metastatic cancers had a significantly higher proportion of clonal mutations and a general enrichment of early mutations in p53 and RTK/KRAS pathways. However, while individual pathways demonstrated a clear time-separated preference for specific events, the relative timing did not vary between primary and metastatic cancers. These results indicate that the selective pressure that drives cancer development does not change dramatically between primary and metastatic cancer on a genomic level, and is mainly focused on alterations that increase proliferation.
ARTICLE | doi:10.20944/preprints202209.0248.v1
Subject: Biology And Life Sciences, Biochemistry And Molecular Biology Keywords: Fruitless; genomics; An. gambiae s.l; vector control; Africa
Online: 16 September 2022 (11:33:36 CEST)
Targeting genes involved in sex determination for vector or pest control purposes requires a better understanding of their polymorphism in natural populations in order to ensure a rapid spread of the construct. Using genomic data from An. gambiae s.l., we analyzed the genetic variation and the conservation score of the fru gene in 18 natural populations across Africa. A total of 34339 SNPs were identified, including 3.11% non-synonymous segregating sites. Overall, the nucleotide diversity was low and the Tajima's D neutrality test was negative, indicating an excess of low-frequency SNPs in the fru gene. The allelic frequencies of the non-synonymous SNPs were low (freq < 0.26), except for two SNPs identified at high frequencies (freq > 0.8) in the Zinc-finger A and B protein domains. The conservation score was variable throughout the fru gene, with maximum values in the exonic compared to the intronic regions. These results showed low genetic variation overall in the exonic regions, especially the male sex-specific exon and the BTB-exon 1 of the fru gene. These findings are crucial for the development of a gene drive construct targeting the fru gene that can rapidly spread without encountering resistance in wild populations.
ARTICLE | doi:10.20944/preprints202203.0224.v1
Subject: Biology And Life Sciences, Biochemistry And Molecular Biology Keywords: zoogenetic resources; organelle; genomics; NGS; cattle; Bos taurus
Online: 16 March 2022 (07:36:01 CET)
Cattle spread throughout the American continent during the colonization years, giving rise to creole breeds that adapted to a wide range of climatic conditions. The population of creole cattle in Peru is decreasing, mainly due to the introduction of more productive breeds in recent years. During the last 15 years, there has been significant progress in cattle genomics. However, little is known about the genetics of the Peruvian creole cattle (PCC) despite its importance for (i) improving productivity in the Andean region, (ii) agricultural labor, and (iii) cultural traditions. In addition, the origin and phylogenetic relationships of the PCC are still unclear. In order to promote the conservation of the PCC, we sequenced for the first time the mitochondrial genome of a creole bull from the highlands of Arequipa, which also possessed exceptional fighting skills and was employed for agricultural tasks. The total mitochondrial genome sequence is 16,339 bp in length, with a base composition of 31.43% A, 28.64% T, 26.81% C, and 13.12% G. It contains 13 protein-coding genes, two ribosomal RNA genes, 22 transfer RNA genes and a control region. Among the 37 genes, 28 are positioned on the H-strand and nine on the L-strand. The most frequently used codons were CUA (Leucine), AUA (Isoleucine), AUU (Isoleucine), AUC (Isoleucine), and ACA (Threonine). Maximum likelihood reconstruction using complete mitochondrial genome sequences clearly demonstrated that the PCC is strongly related to native African breeds, giving insights into the ancestry of the PCC. The annotated mitochondrial genome of the PCC will serve as an important genetic data set for further breeding work and conservation strategies.
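The base composition reported in this abstract is a simple per-nucleotide percentage. A minimal sketch of that calculation is given here, assuming the sequence is available as a plain string; the function name `base_composition` and the toy sequence are hypothetical and stand in for the actual mitochondrial genome.

```python
from collections import Counter


def base_composition(seq):
    """Percentage of A, T, C and G in a DNA sequence.

    Symbols other than A/T/C/G (for example N) are ignored.
    """
    counts = Counter(base for base in seq.upper() if base in "ATCG")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A/T/C/G bases")
    return {base: 100.0 * counts[base] / total for base in "ATCG"}


# Toy sequence, not the PCC mitochondrial genome
composition = base_composition("AATTCCGGAA")
```

For a real genome, the same function could be applied to the sequence string read from a FASTA file.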
ARTICLE | doi:10.20944/preprints202110.0367.v1
Subject: Biology And Life Sciences, Immunology And Microbiology Keywords: Bacteria; culturomics; genome; species; sp. nov.,; taxono-genomics
Online: 25 October 2021 (15:47:32 CEST)
Marseille-Q4369 is a strain that we isolated from healthy human skin and characterized by a taxono-genomic approach. Marseille-Q4369 exhibited 99.80% 16S rRNA sequence similarity with Agrococcus pavilionensisT, the phylogenetically closest bacterium with standing in nomenclature. Furthermore, digital DNA–DNA hybridization revealed a maximum similarity of only 52.4%, and OrthoANI analysis gave a value of 93.63% between the novel organism and Agrococcus pavilionensisT. Marseille-Q4369 was observed to be a yellowish-pigmented, Gram-positive, coccoid, facultatively aerobic bacterium belonging to the Microbacteriaceae family. The major fatty acids detected are 12-methyl-tetradecanoic acid (66%) and 14-methyl-hexadecanoic acid (24%), followed by 13-methyl-tetradecanoic acid (5%). The genome of strain Marseille-Q4369 was 2,737,735 bp long with a 72.27% G+C content. Taken together, these results confirm the status of this strain as a new member of the Agrococcus genus, for which the name Agrococcus massiliensis is proposed (=CSUR-Q4369 = DSM112404).
Subject: Medicine And Pharmacology, Oncology And Oncogenics Keywords: Pediatric Acute Lymphoblastic Leukemia; Genomics; Epigenetics; Targeted Therapy
Online: 1 October 2021 (12:23:33 CEST)
Acute lymphoblastic leukemia (ALL) is the most common malignancy in children and is characterized by numerous genetic and epigenetic abnormalities. Epigenetic mechanisms, which involve DNA methylation and histone modifications, result in the heritable silencing of genes without a change in their coding sequence. Emerging studies are increasing our understanding of the role of epigenetics in leukemogenesis and have demonstrated the potential of DNA methylation and histone modifications as biomarkers for lineage and subtype classification and for predicting relapse and disease progression in ALL. Compared with genetic alterations, epigenetic abnormalities are relatively reversible when treated with some small-molecule agents. In this review, we summarize the genetic and epigenetic characteristics of ALL, discuss the future role of DNA methylation and histone modifications in predicting relapse, and finally focus on individualized and precision therapy targeting epigenetic alterations.
ARTICLE | doi:10.20944/preprints202107.0281.v1
Subject: Biology And Life Sciences, Biochemistry And Molecular Biology Keywords: Glioblastoma; Precision Medicine; Targeted Therapy; Genomics; Neuro-Oncology
Online: 13 July 2021 (09:28:35 CEST)
BACKGROUND: Glioblastoma (GBM) is driven by various genomic alterations. Next generation sequencing (NGS) could yield targetable alterations that may impact outcomes. The goal of this study was to describe how NGS can inform targeted therapy (TT) in this patient population. METHODS: The medical records of patients (pts) with a diagnosis of GBM from 2017-2019 were reviewed. Records of patients with recurrent GBM and genomic alterations were evaluated. Objective response rates and disease control rates were determined. RESULTS: A total of 87 pts with GBM underwent NGS. Forty percent (n = 35) were considered to have actionable alterations. Of the 35, 40% (n = 14) pts had their treatment changed due to an alteration. The objective response rate (ORR) of this population was 43%. The disease control rate (DCR) was 100%. The absolute mean decrease in contrast enhancing disease was 50.7% (95% CI 34.8 – 66.6). CONCLUSION: NGS for GBM, particularly in the recurrent setting, yields a high rate of actionable alterations. We observed a high ORR and DCR, reflecting the value of NGS in deciding on TT to match alterations that are likely to respond. In conclusion, patient selection and availability of NGS may impact outcomes in select pts with recurrent GBM.
CASE REPORT | doi:10.20944/preprints202009.0543.v1
Subject: Biology And Life Sciences, Biochemistry And Molecular Biology Keywords: genetics; comparative genomics; phylogenetic analysis; osteopetrosis; CLCN7 gene
Online: 23 September 2020 (07:56:30 CEST)
Osteopetrosis is a group of rare inheritable disorders of the skeleton characterized by increased bone density. The disease is remarkably heterogeneous in clinical presentation and often misdiagnosed. Therefore, genetic testing and molecular pathogenicity analysis are essential for precise diagnosis and for identifying new targets for preventive pharmacotherapy. Mutations in the CLCN7 gene give rise to the complete spectrum of osteopetrosis phenotypes and are responsible for about 75% of cases of autosomal dominant osteopetrosis. In this study, we report the identification of a novel variant in the CLCN7 gene in a patient diagnosed with osteopetrosis and provide evidence for its significance (likely deleterious) based on extensive comparative genomics and protein sequence and structure analysis. A set of automated bioinformatics tools used to predict the consequences of this variant identified it as deleterious or pathogenic. Structure analysis revealed that the variant is located at the same “hot spot” as the most common CLCN7 mutations causing osteopetrosis. Deep phylogenetic reconstruction showed that not only Leu614Arg, but any non-aliphatic substitution at this position is not tolerated evolutionarily, further supporting the deleterious nature of the variant. The present study provides further evidence that reconstructing a precise evolutionary history of a gene helps predict the phenotypic consequences of variants of uncertain significance.
ARTICLE | doi:10.20944/preprints202008.0220.v1
Subject: Biology And Life Sciences, Biology And Biotechnology Keywords: genomics; metadata; SARS-CoV-2; bioinformatics; data standards
Online: 9 August 2020 (15:53:58 CEST)
The Public Health Alliance for Genomic Epidemiology (PHA4GE) (https://pha4ge.org) is a global coalition that is actively working to establish consensus standards, document and share best practices, improve the availability of critical bioinformatic tools and resources, and advocate for greater openness, interoperability, accessibility and reproducibility in public health microbial bioinformatics. In the face of the current pandemic, PHA4GE has identified a clear and present need for a fit-for-purpose, open source SARS-CoV-2 contextual data standard. As such, we have developed an extension to the INSDC pathogen package, providing a SARS-CoV-2 contextual data specification based on harmonisable, publicly available, community standards. The specification is implementable via a collection template, as well as an array of protocols and tools to support the harmonisation and submission of sequence data and contextual information to public repositories. Well-structured, rich contextual data adds value, promotes reuse, and enables aggregation and integration of disparate data sets. Adoption of the proposed standard and practices will better enable interoperability between datasets and systems, improve the consistency and utility of generated data, and ultimately facilitate novel insights and discoveries in SARS-CoV-2 and COVID-19.
Subject: Biology And Life Sciences, Virology Keywords: SARS-CoV-2; genomics coronavirus; COVID-19 evolution
Online: 3 July 2020 (09:45:43 CEST)
The novel respiratory disease COVID-19 has reached the status of a worldwide pandemic, and large efforts are currently being undertaken to molecularly characterize the virus causing it, SARS-CoV-2. The genomic variability of SARS-CoV-2 specimens scattered across the globe may underlie geographically specific etiological effects. In the present study, we gather the 48,635 SARS-CoV-2 complete genomes currently available thanks to the collection endeavor of the GISAID consortium and thousands of contributing laboratories. We analyze and annotate all SARS-CoV-2 mutations compared with the reference Wuhan genome NC_045512.2, observing an average of 7.23 mutations per sample. Our analysis shows the prevalence of single nucleotide transitions as the major mutational type across the world. There exist at least three clades characterized by geographic and genomic specificity. In particular, clade G, prevalent in Europe, carries a D614G mutation in the Spike protein, which is responsible for the initial interaction of the virus with the host human cell. Our analysis may drive local modulation of antiviral strategies based on the molecular specificities of this novel virus.
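To make the two summary statistics in this abstract concrete (mutations per sample and the share of transitions), here is a minimal sketch. It assumes mutations have already been called against a reference and are held as (reference base, alternate base) pairs per sample; the function names and the toy data are hypothetical and do not reproduce the authors' annotation workflow.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_snv(ref, alt):
    """Label a single-nucleotide change as a transition or a transversion."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid or ref == alt:
        raise ValueError(f"not a valid SNV: {ref}>{alt}")
    # Transition: purine<->purine or pyrimidine<->pyrimidine
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def mean_mutations_per_sample(mutations_by_sample):
    """Average number of called mutations across samples."""
    if not mutations_by_sample:
        raise ValueError("no samples given")
    total = sum(len(muts) for muts in mutations_by_sample.values())
    return total / len(mutations_by_sample)


# Toy input: sample name -> list of (ref, alt) SNVs; not real GISAID data
toy = {
    "sample_1": [("C", "T"), ("A", "G")],
    "sample_2": [("G", "T")],
}
labels = [classify_snv(r, a) for muts in toy.values() for r, a in muts]
average = mean_mutations_per_sample(toy)
```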
COMMUNICATION | doi:10.20944/preprints202005.0428.v1
Subject: Biology And Life Sciences, Animal Science, Veterinary Science And Zoology Keywords: evolutionary genomics; Gasterosteus aculeatus; gene flow; hybridization; phylogeny
Online: 26 May 2020 (08:44:35 CEST)
Where genetic variation promoting speciation originates is a crucial question in evolutionary genomics. In a recent article, Marques et al. (2019) seek to address this question in lake and stream threespine stickleback fish from the Lake Constance (hereafter LC) basin in Central Europe. Based on population genetic methods, they conclude that incipient speciation between lake and stream stickleback was facilitated by the mixing of genetic variation from old lineages evolved in isolation (i.e., admixture following secondary contact). In this comment, I discuss conceptual and methodological problems and unrecognized conflicts with existing evidence that cast doubt on Marques et al.’s conclusion.
ARTICLE | doi:10.20944/preprints201906.0293.v1
Subject: Biology And Life Sciences, Plant Sciences Keywords: kiwifruit; genomics; polyploidy; breeding; ascorbic acid; vitamin C
Online: 28 June 2019 (08:09:04 CEST)
During analysis of kiwifruit derived from hybrids between the high-AsA species Actinidia eriantha and A. chinensis var. chinensis, we observed bimodal segregation of fruit AsA concentration, suggesting major gene segregation. To test this hypothesis, we performed whole-genome sequencing on pools of high- and low-AsA fruit from tetraploid A. chinensis var. deliciosa x A. eriantha backcross families. Pool-GWAS revealed a single QTL spanning more than 5 Mbp on chromosome 26, which we denote as qAsA26.1. A co-dominant PCR marker was used to validate this association in four diploid (A. chinensis x A. eriantha) x A. chinensis backcross families, showing that the eriantha allele at this locus increases fruit AsA levels by 250 mg/100 g fresh weight. Inspection of genome composition and recombination in other A. chinensis genetic maps confirmed that the qAsA26.1 region bears hallmarks of suppressed recombination. The molecular fingerprint of this locus was examined in leaves of backcross validation families by RNA-seq. This confirmed strong allelic expression bias across this region as well as differential expression of transcripts on other chromosomes. This evidence suggests that the region harboring qAsA26.1 constitutes a supergene, which may condition multiple pleiotropic effects on metabolism.
ARTICLE | doi:10.20944/preprints202102.0135.v1
Subject: Computer Science And Mathematics, Applied Mathematics Keywords: algorithmic information theory; universal distribution; Kolmogorov complexity; quantum algorithms; quantum circuit model; quantum Turing machine; genomics; viral genomics; meta-biology
Online: 4 February 2021 (12:04:02 CET)
Inferring algorithmic structure in data is essential for discovering causal generative models. In this research, we present a quantum computing framework using the circuit model, for estimating algorithmic information metrics. The canonical computation model of the Turing machine is restricted in time and space resources, to make the target metrics computable under realistic assumptions. The universal prior distribution for the automata is obtained as a quantum superposition, which is further conditioned to estimate the metrics. Specific cases are explored where the quantum implementation offers polynomial advantage, in contrast to an indispensable exhaustive enumeration in the corresponding classical case. The unstructured output data and the computational irreducibility of Turing machines make this algorithm impossible to approximate using heuristics. Thus, exploring the space of program-output relations is one of the most promising problems for demonstrating quantum supremacy using Grover search that cannot be dequantized. Experimental use cases for quantum acceleration are developed for self-replicating programs and algorithmic complexity of short strings. With quantum computing hardware rapidly attaining technological maturity, we discuss how this framework will have significant advantage for various genomics applications in meta-biology, phylogenetic tree analysis, protein-protein interaction mapping and synthetic biology. This is the first time experimental algorithmic information theory is implemented using quantum computation. Our implementation on the Qiskit quantum programming platform is copy-left and can be found on https://github.com/Advanced-Research-Centre/QPULBA
REVIEW | doi:10.20944/preprints202309.0496.v1
Subject: Biology And Life Sciences, Agricultural Science And Agronomy Keywords: plant genetic resources; population genomics; crop evolution; food legumes
Online: 7 September 2023 (09:23:38 CEST)
Paleogenomics focuses on the recovery, manipulation, and analysis of ancient DNA (aDNA) from historical or long-dead organisms to reconstruct and analyze their genomes. The aDNA is commonly obtained from remains found in paleontological and archaeological sites or conserved in museums and other archival collections. Herbarium collections represent a great source of phenotypic and genotypic information, and their exploitation has made it possible to infer and clarify previously unresolved taxonomic and systematic relationships. Moreover, Herbarium specimens have offered a new source for studying phenological traits in plants and for disentangling the biogeography and evolutionary scenarios of species. More recently, advances in molecular technologies have gone hand in hand with the decreasing costs of next-generation sequencing (NGS) approaches, paving the way for the use of aDNA in whole-genome studies. Although many studies have combined modern analytical techniques with Herbarium specimens, this research field is still relatively unexplored because strategies for aDNA manipulation and exploitation need to be improved. The higher susceptibility of aDNA to degradation and contamination during Herbarium conservation and manipulation, and the occurrence of biochemical post-mortem damage, can make reconstruction of the original DNA sequence more challenging. Here, we review the methodological approaches that have been developed for the exploitation of historical Herbarium plant materials, such as the best practices for aDNA extraction, amplification and genotyping. We also focus on some strategies to overcome the main problems related to the use of Herbarium specimens in plant evolutionary studies.
ARTICLE | doi:10.20944/preprints202307.0398.v1
Subject: Medicine And Pharmacology, Medicine And Pharmacology Keywords: Nephrotoxicity; Methotrexate; Genomics; Machine Learning; Explainable Artificial Intelligence; Biomarker
Online: 6 July 2023 (08:41:31 CEST)
Background: The purpose of this study is to carry out a bioinformatic analysis of lncRNA data obtained from genomic analysis of kidney tissue samples taken from rats with methotrexate (MTX)-induced nephrotoxicity and from rats without pathology, and to model these data with a tree-based machine learning method. A further aim is to identify potential biomarkers for the diagnosis of nephrotoxicity and to provide a better understanding of how nephrotoxicity develops by making the model interpretable with explainable artificial intelligence methods. Methods: To identify potential indicators of drug-induced nephrotoxicity, 20 female Wistar Albino rats were separated into two groups: nephrotoxicity and control. Kidney tissue samples were collected from the rats, and genomic, histological, and immunohistochemical analyses were performed. The data set obtained from the genomic analysis was modeled with Random Forest (RF), one of the tree-based methods. Modeling results were evaluated with sensitivity (Se), specificity (Sp), balanced accuracy (B-Acc), negative predictive value (Npv), accuracy (Acc), positive predictive value (Ppv), and F1-score performance metrics. The Local Interpretable Model-Agnostic Explanations (LIME) method was used to make the RF model interpretable and so determine the lncRNAs that could be biomarkers for nephrotoxicity. Results: The outcomes of the histological and immunohistochemical analyses supported the conclusion that MTX use caused kidney injury. According to the results of the bioinformatics analysis, 52 lncRNAs were differentially expressed between the groups. For the RF model built on lncRNAs selected with Boruta variable selection, the B-Acc, Acc, Sp, Se, Npv, Ppv, and F1-score were 88.9%, 90%, 90.9%, 88.9%, 90.9%, 88.9% and 88.9%, respectively. The lncRNAs with IDs rnaXR_591534.3, rnaXR_005503408.1, rnaXR_005495645.1, rnaXR_001839007.2, rnaXR_005492056.1 and rna_XR_005492522.1, which had the highest variable importance values in the RF model, can be used as nephrotoxicity biomarker candidates. Also, according to the LIME results, high levels of the lncRNAs with IDs rnaXR_591534.3 and rnaXR_005503408.1 in particular increased the probability of nephrotoxicity. Conclusions: The possible biomarkers identified in this study may allow procedures for the diagnosis of drug-induced nephrotoxicity to be carried out easily, quickly and effectively.
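The performance metrics named in this abstract all derive from the four cells of a binary confusion matrix. The sketch below shows the standard definitions; the function name `classification_metrics` and the counts passed to it are illustrative assumptions for a 20-sample data set and are not the study's confusion matrix.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    se = tp / (tp + fn)        # sensitivity (recall)
    sp = tn / (tn + fp)        # specificity
    ppv = tp / (tp + fp)       # positive predictive value (precision)
    npv = tn / (tn + fn)       # negative predictive value
    acc = (tp + tn) / (tp + fp + tn + fn)
    b_acc = (se + sp) / 2      # balanced accuracy: mean of Se and Sp
    f1 = 2 * ppv * se / (ppv + se)
    return {"Se": se, "Sp": sp, "Ppv": ppv, "Npv": npv,
            "Acc": acc, "B-Acc": b_acc, "F1": f1}


# Hypothetical counts for 20 samples (not taken from the study)
metrics = classification_metrics(tp=8, fp=1, tn=10, fn=1)
for name, value in metrics.items():
    print(f"{name}: {100 * value:.1f}%")
```

Balanced accuracy is computed here as the mean of sensitivity and specificity, which is the usual definition for a two-class problem.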
COMMUNICATION | doi:10.20944/preprints202211.0440.v1
Subject: Biology And Life Sciences, Biochemistry And Molecular Biology Keywords: Cervical cancer; Functional genomics; Bioinformatics; Next Generation Sequencing; Exome
Online: 23 November 2022 (07:42:19 CET)
We attempted to characterize cervical cancer patient samples through Whole Exome Sequencing. We derived the variants from raw reads via our in-house benchmarked pipeline and validated the variants with IGV. This is the first cervical cancer exome data from an Indian cohort. Background: Cervical cancer (CC) is caused mainly by persistent infections of high-risk HPV, reduced parity, and factors like a decrease in average socioeconomic levels. We undertook this study because no cervical cancer exome data are available from an Indian cohort. Methods: DNA was extracted from the CC patient clinical samples using Qiagen DNA extraction and quantified, and libraries were prepared using the Agilent target enrichment system. Exome sequencing on the Illumina platform (100X) was followed by quality checks, alignment, and variant calling, then by downstream analysis and visualization in IGV. Results: We observed a large number of SNVs in this Indian cohort, in genes such as KMT2C, OR4M1, PDPR2P, EPHB1, FAS, OPCML, MGST1, C1QTNF9, HS6ST3, OR4K2, PRPSAP2, KCNJ12, FIGNL1, SFXN1, BAGE2, ARVCF, and NAMPT; variants of significance, of unknown significance, and of possible significance are reported. Conclusion: For the first time, KMT2C is observed to carry a novel, potent and pathogenic mutation in CC in the Indian context, at variant position (7:152265091, T>A, SNV 62478356). Further, we visualized and validated the mutations using the Integrative Genomics Viewer (IGV) browser. Finally, we discuss the inherent challenges posed by KMT2C mutations.
ARTICLE | doi:10.20944/preprints202208.0191.v1
Subject: Biology And Life Sciences, Biochemistry And Molecular Biology Keywords: bacterial genomics; de novo assembly; Oxford Nanopore Technologies; Snakemake
Online: 10 August 2022 (04:37:01 CEST)
With the advancement of long-read sequencing technologies and their more widespread use for bacterial genomics, several methods for generating genome assemblies from error-prone long reads have been developed. These are complemented by various tools for assembly polishing using either long reads, short reads, or reference genomes. End users are therefore left with a plethora of possible combinations of programs for obtaining a final trusted assembly. Hence, there is also the need for measuring completeness and accuracy of such assemblies, for which, again, several evaluation methods implemented in various programs are available. In order to automatically run all these programs, I developed two workflows for the workflow management system Snakemake for bacterial genome assembly and evaluation of assemblies, which provide end users with an easy-to-run method for both tasks. The workflows are available as open source software under the MIT license at https://github.com/pmenzel/ont-assembly-snake and https://github.com/pmenzel/score-assemblies.
REVIEW | doi:10.20944/preprints202111.0203.v1
Subject: Biology And Life Sciences, Agricultural Science And Agronomy Keywords: flower development; epigenetics; RNA biology; Genomics; single cell biology
Online: 10 November 2021 (11:00:03 CET)
The rise of data science in biology stimulates interdisciplinary collaborations to address fundamental questions. Here, we report the outcome of the first SINFONIA symposium focused on revealing the mechanisms governing plant reproductive development across biological scales. The intricate and dynamic target networks of known regulators of flower development remain poorly understood. To analyze development from the genome to the final floral organ morphology, high-resolution data that capture spatiotemporal regulatory activities are necessary and require advanced computational methods for analysis and modeling. Moreover, frameworks to share data, practices and approaches that facilitate the combination of varied expertise to advance the field are called for. Training young researchers in interdisciplinary approaches and science communication offers the opportunity to establish a collaborative mindset to shape future research.
REVIEW | doi:10.20944/preprints202108.0514.v1
Subject: Biology And Life Sciences, Biology And Biotechnology Keywords: radish; breeding; interspecific hybridization; molecular breeding; genomics; genetic engineering
Online: 26 August 2021 (16:46:36 CEST)
Radish is an annual herbaceous plant of the Cruciferae family, grown as a root, fruit, and oil crop. The important traits for radish breeding include high yield, early maturity, late bolting, pungency, cold-hardiness, drought resistance, heat tolerance, and soil adaptability. For successful radish production, it is necessary to understand the nature and behavior of the flower, and it is very important to identify the S haplotypes of parental lines in order to produce F1 hybrids based on self-incompatibility and so avoid laborious hand emasculation. In radish, some desirable genes are not present within varieties. Therefore, further breeding programmes depend on inter-specific and intra-specific hybridization, which has a vital role in genomic studies and crop improvement by introducing desirable agronomic characters. It is essential to acquire detailed genetic information on chromosomes and information on inheritance. Genomics is now at the core of crop improvement, and it is used in radish to study the underlying differences between genotypes. Some monogenic characters, however, are improved by genetic engineering. The three decades since the first documented instance of genetic engineering have witnessed unprecedented growth in its application. Researchers have successfully produced transgenic radishes with various agronomic characteristics over the last decade.
REVIEW | doi:10.20944/preprints202010.0149.v1
Subject: Biology And Life Sciences, Anatomy And Physiology Keywords: breeding; diversity; genetic engineering; genomics; male sterility; melon; QTLs
Online: 7 October 2020 (09:22:33 CEST)
Melon (Cucumis melo L.), a member of the family Cucurbitaceae, is extensively cultivated for its fleshy fruits. Depending on the agro-climatic zone of cultivation and on regional preferences, melon displays significant variability in phenotypic and biochemical attributes. Here, we evaluate the scope of achievements in the development and implementation of melon breeding programs that employ the newest technologies. Melon breeding has achieved critical milestones over the previous century, and we expect this trend to continue. However, studies are still needed to uncover new genetic information on genes associated with the challenges imposed by climate change. The identification of valuable genetic and metabolic variability in landraces and melon wild relatives will be useful for crop diversification and for broadening the genetic base of cultivated melon. In addition, the considerable information available on melon genomics and metabolomics is beneficial for dissecting the basis of the inheritance of important traits and their impact on these characteristics. Overall, we hope the manuscript will serve as a crucial resource for melon breeders.
ARTICLE | doi:10.20944/preprints202005.0413.v1
Subject: Biology And Life Sciences, Virology Keywords: SARS-CoV-2; nucleocapsid (N); genomics; coronavirus; Wuhan; Pandemic
Online: 25 May 2020 (17:45:40 CEST)
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has caused the global COVID-19 pandemic, the most notorious global public health crisis in the last 100 years. SARS-CoV-2 is composed of four structural proteins and several non-structural proteins. The multifaceted nucleocapsid (N) protein is the major component of the structural proteins of CoVs. However, there are no dedicated genomic, sequence and structural analyses focusing on the potential roles of the N protein. Hence, there is an urgent need for a detailed study of the N protein of SARS-CoV-2. Herein, we present a comprehensive study of the N protein from SARS-CoV-2. We have identified seven motifs conserved in the three major domains, namely the N-terminal domain, the linker region and the C-terminal domain. Of the seven motifs, six are conserved across different members of the Coronaviridae, while motif4 is specific to SARS CoVs and has potential amyloidogenic properties. Additionally, we report that this protein has large patches of disordered regions flanking these seven motifs. These motifs are hubs of epitopes, with 67 experimentally verified epitopes from related viruses. We report the presence of three nuclear localization signals (NLS1-NLS3, mapped to residues 36-41, 256-26, and 363-389, respectively) and two nuclear export signals (NES1-NES2, at residues 151-161 and 217-230, respectively) in the N protein of SARS-CoV-2. We also deciphered two Q-patches, Q-patch1 and Q-patch2, mapped to residues 266-306 and 361-418, which potentially help in the aggregation of the viral proteins along with the 219LALLLLDR226 patch. Additionally, we have identified 14 antiviral drugs potentially binding to the seven motifs of the N protein using docking-based drug discovery methods.
REVIEW | doi:10.20944/preprints202004.0005.v1
Subject: Biology And Life Sciences, Virology Keywords: SARS-CoV-2; COVID-19; Coronavirus; Pandemic; Viral Genomics
Online: 1 April 2020 (09:22:38 CEST)
The COVID-19 pandemic is due to infection caused by the novel SARS-CoV-2 that impacts the lower respiratory tract. The spectrum of symptoms ranges from asymptomatic infections to mild respiratory symptoms to the lethal form of COVID-19 which is associated with severe pneumonia, acute respiratory distress and fatality. At present, the global case fatality rate of COVID-19 laboratory confirmed cases is ~4.7% ranging from ~0.3-0.4% in Chile and Israel to ~10.8% in Italy. To address this global crisis, up-to-date information on the viral genomics and transcriptomics is crucial for understanding the origins and global dispersal of the virus, providing insight into viral pathogenicity, transmission and epidemiology, and enabling strategies for therapeutic interventions, drug discovery and vaccine development. Therefore, this review provides a comprehensive overview of COVID-19 epidemiology, genomic etiology, findings from recent transcriptomic map analysis, viral-human protein interactions, molecular diagnostics, and the current status of vaccine and novel therapeutic intervention development. Moreover, we provide an extensive list of resources that will help the scientific community access numerous types of databases related to SARS-CoV-2 OMICs and approaches to therapeutics related to COVID-19 treatment.
ARTICLE | doi:10.20944/preprints202003.0336.v1
Subject: Biology And Life Sciences, Virology Keywords: Manihot esculenta Crantz; potexvirus; Cassava common mosaic virus; genomics
Online: 23 March 2020 (05:46:35 CET)
The complete genomic sequence of a Linggao isolate of Cassava common mosaic virus (CsCMV-LG) was determined from cassava (Manihot esculenta Crantz) plants in China showing mild leaf mosaic symptoms or no symptoms. Excluding the poly(A) tail, the CsCMV-LG genome (GenBank accession No. MT038420) is 6374 nucleotides (nts) in length, with five major open reading frames encoding a 1450-amino-acid (aa) RNA-dependent RNA polymerase (RdRp), three triple gene block (TGB) proteins (231 aa, 110 aa and 95 aa), and a 229-aa coat protein (CP). Phylogenetic analysis indicated that the complete genome of CsCMV-LG is closely related to that of CsCMV-Brazilian, which has been assigned to the genus Potexvirus, but the two share only 88.0% sequence identity. Notably, the mild CsCMV-LG isolate can also infect Nicotiana benthamiana in the laboratory through rub inoculation, causing mild vein yellowing at 15 days post inoculation. This is the first full-length genome sequence of a distinct isolate of Cassava common mosaic virus (CsCMV) infecting cassava in Hainan, China.
REVIEW | doi:10.20944/preprints201612.0061.v1
Subject: Biology And Life Sciences, Agricultural Science And Agronomy Keywords: alcohol; aroma; bioengineering; flavour; synthetic genomics; taste; wine; yeast
Online: 10 December 2016 (09:09:54 CET)
A perfectly balanced wine can be said to create a symphony in the mouth. To achieve the sublime, both in wine and music, requires imagination and skilled orchestration of artistic craftmanship. For wine, inventiveness starts in the vineyard. Similar to a composer of music, the grapegrower produces grapes through a multitude of specifications to achieve a quality result. Different Vitis vinifera grape varieties allow the creation of wine of different genres. Akin to a conductor of music, the winemaker decides what genre to create and considers resources required to realise the grape’s potential. A primary consideration is the yeast: inoculate the grape juice or leave it ‘wild’; which specific or combined Saccharomyces strain(s) should be used; or proceed with a non-Saccharomyces species? Whilst the various Saccharomyces and non-Saccharomyces yeasts perform their role during fermentation, the performance is not over until the ‘fat lady’ (S. cerevisiae) has sung (i.e. the grape sugar has been fermented to specified dryness and alcoholic fermentation is complete). Is the wine harmonious or discordant? Will the consumer demand an encore and make a repeat purchase? Understanding consumer needs lets winemakers orchestrate different symphonies (i.e. wine styles) using single- or multi-species ferments. Some consumers will choose the sounds of a philharmonic orchestra comprising a great range of diverse instrumentalists (as is the case with wine created from spontaneous fermentation); some will prefer to listen to a smaller ensemble (analogous to wine produced by a selected group of non-Saccharomyces and Saccharomyces yeast); and others will favour the well-known and reliable superstar soprano (i.e. S. cerevisiae). But what if a digital music synthesiser ‒ such as a synthetic yeast ‒ becomes available that can produce any music genre with the purest of sounds by the touch of a few buttons? 
Will synthesisers spoil the character of the music and lead to the loss of the much-lauded romantic mystique? Or will music synthesisers support composers and conductors to create novel compositions and even higher quality performances that will thrill audiences? This article explores these and other relevant questions in the context of winemaking and the role that yeast and its genomics play in the betterment of wine quality.
REVIEW | doi:10.20944/preprints202304.0798.v1
Subject: Biology And Life Sciences, Biochemistry And Molecular Biology Keywords: Head and neck cancer; oral mucositis; pain; genomics; transcriptomics; microbiomics
Online: 23 April 2023 (12:29:17 CEST)
Oral mucositis (OM) is inflammation of the mouth caused by damage to the mucous membranes that line the mouth and throat. It is a side effect of cancer treatment, particularly in patients with head and neck squamous cell carcinoma (HNSCC) who undergo radiotherapy, chemotherapy, and/or immunotherapy with immune checkpoint inhibitors. The etiology and pathogenic mechanisms of OM are complex and multifaceted, involving cytotoxicity (cell death), inflammation, infection, changes in the microbiome, and immune-mediated cytotoxicity. We summarize the literature on attempts to use various omics methodologies (genomics, transcriptomics, microbiomics and metabolomics) to elucidate the biological pathways associated with the development or severity of OM. Integrating different omics into multi-omics approaches has the potential to reveal links among host factors (genomics), host responses (transcriptomics, metabolomics), and the local environment (microbiomics).
REVIEW | doi:10.20944/preprints202201.0084.v1
Subject: Biology And Life Sciences, Biology And Biotechnology Keywords: hybrid; lager; yeast; introgression; interspecific; domestication; phylogeny; brewing; molecular; genomics
Online: 6 January 2022 (11:55:13 CET)
Microbiology has long been a keystone in fermentation, and the utilization of yeast biology reinforces molecular biotechnology as the pioneering frontier in brewing science. Consequently, modern understanding of the brewer’s yeast has undergone significant refinement over the last few decades. This publication presents a condensed summation of Saccharomyces species dynamics with an emphasis on the relationship between traditional ale yeast, Saccharomyces cerevisiae, and the interspecific hybrids used in lager beer production, S. pastorianus. Introgression from other Saccharomyces species is also touched on. The unique histories of Saccharomyces cerevisiae and Saccharomyces hybrids are exemplified by recent genomic sequencing studies aimed at categorizing brewing strains through phylogeny and redefining Saccharomyces species boundaries. Phylogenetic investigations highlight the genomic diversity of Saccharomyces cerevisiae ale strains long known to brewers by their fermentation characteristics and phenotypes. Genomic contributions from other Saccharomyces species to the genomes of S. cerevisiae strains are ever more apparent with increased investigation of the hybrid nature of modern industrial and historical fermentation yeast.
ARTICLE | doi:10.20944/preprints202107.0457.v1
Subject: Biology And Life Sciences, Immunology And Microbiology Keywords: Latilactobacillus sakei; comparative genomics; carbohydrate utilization; antibiotic tolerance; CRISPR-Cas
Online: 20 July 2021 (15:02:42 CEST)
Increasing attention has been paid to the potential probiotic effects of Latilactobacillus sakei. To explore the genetic diversity of L. sakei, 14 strains isolated from different niches (feces, fermented kimchi and meat products) and 54 published strains were compared and analyzed. The results showed that the average genome size and GC content of L. sakei were 1.98 Mb and 41.22%, respectively. Its core genome mainly encodes translation and transcription, amino acid synthesis, glucose metabolism and defense functions. L. sakei has an open pan-genome, and its pan-genome curve shows an upward trend. The genetic diversity of L. sakei is mainly reflected in carbohydrate utilization, antibiotic tolerance, and immune/competition-related factors, such as clustered regularly interspaced short palindromic repeats (CRISPR)-Cas. The CRISPR systems are mainly of type II-A, with a few of type II-C. This work provides a basis for the study of this species.
ARTICLE | doi:10.20944/preprints202102.0060.v1
Subject: Biology And Life Sciences, Anatomy And Physiology Keywords: Escherichia coli; magnetite nanoparticles; metals; antibiotics; genomics; pleiotropy; cell morphology
Online: 1 February 2021 (15:58:10 CET)
Experimental evolution was used to produce 5 magnetite nanoparticle-resistant (FeNP1-5) populations of Escherichia coli. The control populations were not exposed to magnetite nanoparticles. The 24-hour growth of these replicates was evaluated in the presence of increasing concentrations of magnetite NPs as well as other ionic metals (gallium III, iron II, iron III, silver I) and antibiotics (ampicillin, chloramphenicol, rifampicin, sulfanilamide, tetracycline). Scanning electron microscopy was used to determine cell size and shape in response to magnetite nanoparticle selection. Whole genome sequencing was carried out to determine whether any genomic changes resulted from magnetite nanoparticle resistance. After 25 days of selection, magnetite resistance was evident in the FeNP treatment. The FeNP populations also showed highly significantly (p < 0.0001) greater 24-hour growth, as measured by optical density, in metals (Fe (II), Fe (III), Ga (III), Ag and Cu II) as well as antibiotics (ampicillin, chloramphenicol, rifampicin, sulfanilamide, and tetracycline). The FeNP-resistant populations also showed a significantly greater cell length compared to controls (p < 0.001). Genomic analysis of FeNP identified both polymorphisms and hard selective sweeps in the RNA polymerase genes rpoA, rpoB, and rpoC. Collectively, our results show that E. coli can rapidly evolve resistance to magnetite nanoparticles and that this resistance is correlated with resistance to other metals and antibiotics. There were also changes in cell morphology resulting from adaptation to magnetite NPs. Thus, the various applications of magnetite nanoparticles could result in unanticipated changes in resistance to both metals and antibiotics.
REVIEW | doi:10.20944/preprints202008.0133.v1
Subject: Biology And Life Sciences, Virology Keywords: epidemic; viral sequences; genomics; metadata; data harmonization; integration and search
Online: 5 August 2020 (10:58:27 CEST)
With the outbreak of the COVID-19 disease, the research community is making unprecedented efforts to better understand and mitigate the effects of the pandemic. In this context, we review the data integration efforts required for accessing and searching genome sequences and metadata of SARS-CoV-2, the virus responsible for the COVID-19 disease, which have been deposited into the most important repositories of viral sequences. Organizations that were already present in the virus domain are now dedicating special interest to the emergence of the COVID-19 pandemic by emphasizing specific SARS-CoV-2 data and services. At the same time, novel organizations and resources were born in this critical period to serve specifically the purposes of COVID-19 mitigation, while laying the research groundwork for countering possible future pandemics. Accessibility and integration of viral sequence data, possibly in conjunction with the human host genotype and clinical data, are paramount to better understand the COVID-19 disease and mitigate its effects.
REVIEW | doi:10.20944/preprints201811.0571.v2
Subject: Biology And Life Sciences, Biochemistry And Molecular Biology Keywords: prostate cancer; prostate-specific antigen; incidence; genomics; next generation sequencing
Online: 3 April 2019 (10:15:50 CEST)
In the recent past, there has been a rise in Prostate Cancer (PCa) in Asia, particularly India. Although systematic reviews on PCa have dealt with the genetics, genomics and the environmental influence in the causality of PCa, no predictive analytics comparing PCa in Caucasian and American populations with PCa in Asian populations has been attempted. In this review article, we elaborate on this aspect of PCa and deliberate on challenges related to next generation sequencing methods for PCa’s manifestation when compared to the West.
ARTICLE | doi:10.20944/preprints201811.0071.v1
Subject: Biology And Life Sciences, Plant Sciences Keywords: basal angiosperms; chloroplast; comparative genomics; Nymphaeales; Nymphaeaceae; phylogenomics; water lily
Online: 2 November 2018 (16:20:31 CET)
The order Nymphaeales, consisting of three families with a record of eight genera, has gained significant interest from botanists, probably due to its position as a basal angiosperm. The phylogenetic relationships within the order have been well studied and resolved; however, a few controversial nodes still remain in the Nymphaeaceae: the position of the genus Nuphar and the monophyly of the family remain uncertain. This study adds to the increasing number of completely sequenced plastid genomes of the Nymphaeales and applies a large chloroplast gene data set to reconstructing the intergeneric relationships within the Nymphaeaceae. Five complete chloroplast genomes were newly generated, including the first one for the monotypic genus Euryale. Using a set of 66 protein-coding genes from the chloroplast genomes of 17 taxa, the phylogenetic position of Nuphar was determined and a monophyletic Nymphaeaceae family was obtained with convincing statistical support from both partitioned and unpartitioned data schemes. Although genomic comparative analyses revealed a high degree of synteny among the chloroplast genomes of the ancient angiosperms, key minor variations were evident, particularly in the contraction/expansion of the Inverted Repeat regions and in RNA editing events. Genome structure, gene content and arrangement were highly conserved among the chloroplast genomes.
REVIEW | doi:10.20944/preprints202203.0388.v1
Subject: Biology And Life Sciences, Ecology, Evolution, Behavior And Systematics Keywords: phenotypic polymorphism; structural genomics; chromosomal inversion; supergene; functional genomics; hormonal plasticity; frequency-dependent selection; cryptic female choice of sperm; sexual selection; eco-evolutionary dynamics
Online: 30 March 2022 (10:15:55 CEST)
A few empirical examples document fixed alternative male mating strategies in animals. Here we focus on the polymorphism of male mating strategies in the ruff (Calidris pugnax, Aves Charadriiformes). In ruffs, three fixed alternative male mating strategies coexist and are signaled by extreme plumage polymorphism. We first present relevant data on the biology of the species. Then we review the available knowledge of the behavioral ecology of ruffs during the breeding season and detail the characteristics of each of the three known fixed male mating strategies. We next turn to the exceptionally high-quality results accumulated on both the structural and functional genomics of the ruff over the past few years. We show how these genomic data can shed a new, mechanistic light on the evolution and maintenance of the three fixed alternative male mating strategies. We then examine whether there is sufficient indication to support frequency-dependent selection as a key mechanism in maintaining these three strategies. Specifically, we search for evidence of equal fitness among individuals using each of the three strategies. Finally, we propose three lines of research that will help to understand the eco-evolutionary dynamics of phenotypic differences within natural populations of this iconic model species.
REVIEW | doi:10.20944/preprints202308.1641.v1
Subject: Biology And Life Sciences, Plant Sciences Keywords: pan-genomes; comparative genomics; plant pathways; genomic databases; gravitropism; Gene Ontology
Online: 23 August 2023 (09:29:18 CEST)
The availability of multiple sequenced genomes from a single species has made it possible to explore intra- and inter-specific genomic comparisons at higher resolution and to build clade-specific pan-genomes of several crops. The pan-genomes of crops constructed from various cultivars/accessions, landraces, and wild ancestral species represent a compendium of genes and structural variations and allow researchers to search for novel genes and alleles that were inadvertently lost in domesticated crops during the historical process of crop domestication or in the process of extensive plant breeding. Fortunately, many valuable genes and alleles associated with desirable traits like disease resistance, abiotic stress tolerance, plant architecture, and nutritional quality exist in landraces, ancestral species, and crop wild relatives. The novel genes from the wild ancestors and landraces can be introduced back into high-yielding varieties of modern crops by implementing classical plant breeding, genomic selection, and transgenic/gene editing approaches. Thus, pan-genomics represents a great leap in plant research and offers new avenues for targeted breeding to mitigate the impact of global climate change. Here we summarize the tools used for pan-genome assembly and annotation and the web portals hosting plant pan-genomes. Furthermore, we highlight a few discoveries made in crops using the pan-genomic approach and its future potential.
ARTICLE | doi:10.20944/preprints202306.0526.v1
Subject: Biology And Life Sciences, Ecology, Evolution, Behavior And Systematics Keywords: Dispersal; Connectivity; Movement; Conservation Genomics; Madagascar; Habitat Loss and Fragmentation; Rodents
Online: 7 June 2023 (09:30:53 CEST)
Habitat loss and fragmentation are of concern to conservation biologists worldwide. However, not all organisms are affected equally by these processes, thus it is important to study the effects of living in fragmented habitats on species that differ in lifestyle and habitat requirements. In this study we examined dispersal and connectivity patterns of rodents, one endemic (Eliurus myoxinus) and one invasive (Rattus rattus), in two landscapes containing forest fragments and adjacent continuous forest patches in northwestern Madagascar. We generated genomic (RADseq) data for 66 E. myoxinus and 81 R. rattus individuals to evaluate differences in genetic diversity as well as inbreeding and connectivity in two landscapes. We found higher levels of inbreeding and lower levels of genetic diversity in E. myoxinus compared with R. rattus. We observed related dyads both within and between habitat patches and positive spatial autocorrelation at lower distance classes for both species, with a stronger pattern of spatial autocorrelation in R. rattus. Across each site we identified contrasting migration rates for each species, but these did not correspond to habitat-matrix dichotomies. The relatively low genetic diversity in the endemic E. myoxinus suggests ecological constraints that require further investigation.
ARTICLE | doi:10.20944/preprints202206.0105.v1
Subject: Biology And Life Sciences, Virology Keywords: Smear-ripened cheese; virulent phages; rind bacteria; phage reservoirs; viral genomics
Online: 7 June 2022 (10:37:21 CEST)
Smear-ripened cheeses host complex microbial communities that play a crucial role in the ripening process. Although bacteriophages have been frequently isolated from dairy products, their diversity and ecological role in this type of cheese remain underexplored. To fill this gap, the main objective of this study was to isolate and characterize bacteriophages from the rind of a smear-ripened cheese. Viral particles extracted from cheese rind were tested against a collection of bacterial isolates through a spot assay. In total, five virulent bacteriophages infecting Brevibacterium aurantiacum, Glutamicibacter arilaitensis, Leuconostoc falkenbergense and Psychrobacter aquimaris species were obtained. All exhibit a narrow host range, being able to infect only a few cheese-rind isolates within the same species. The complete genome of each phage was sequenced using both Nanopore and Illumina technologies, assembled and annotated. Sequence comparison with known phages revealed that four of them may represent new taxa at the genus level or above. The distribution of the five virulent phages in the dairy-plant environment was also investigated by PCR, and three potential reservoirs were identified. This work provides new knowledge on the cheese rind viral community and an overview of the distribution of phages within a cheese factory.
ARTICLE | doi:10.20944/preprints202205.0070.v1
Subject: Biology And Life Sciences, Immunology And Microbiology Keywords: phototrophic bacteria; phototrophic extracellular electron uptake; comparative genomics; transcriptomics; environmental microbiology
Online: 6 May 2022 (09:35:45 CEST)
Rhodovulum spp. are anoxygenic photosynthetic purple bacteria with versatile metabolisms, including the ability to obtain electrons from minerals in their environment to drive photosynthesis, a relatively novel process called phototrophic extracellular electron uptake (pEEU). Recently, our group isolated 15 strains of R. sulfidophilum to observe this metabolism in marine phototrophs. Our group previously observed carbon dioxide fixation coupled to phototrophic iron oxidation (photoferrotrophy) and pEEU in AB26 and identified a novel di-heme c-type cytochrome, EeuP, important for pEEU but not photoferrotrophy. Taxonomic re-evaluation based on 16S and pufM phylogenetic analyses led us to re-classify two isolates, AB26 and AB19, as Rhodovulum visakhapatnamense. The AB26 genome consists of 4,380,746 base-pairs, including two plasmids, and encodes 4,296 predicted protein-coding genes. AB26 contains 22 histidine kinases and 20 response regulators, and dedicates ~16% of its genome to transport. Transcriptomic data from aerobic, photoheterotrophic, photoautotrophic, and pEEU conditions reveal how gene expression varies between metabolisms. Lastly, we use transcriptomic data for a comparative genomic analysis of potential pEEU-relevant genes across all 15 isolates. With these data we identify potential pEEU-capable phototrophs among these isolates, and likely molecular mechanisms of pEEU.
ARTICLE | doi:10.20944/preprints202109.0102.v1
Subject: Biology And Life Sciences, Biology And Biotechnology Keywords: abiotic stress; HSFs; genomics; gene ontology; maize breeding; protein 3D structures
Online: 6 September 2021 (13:57:37 CEST)
Heat shock transcription factors (HSFs) participate in regulating many environmental stress responses and biological processes in plants. Maize (Zea mays L.) is a major cash crop that is grown worldwide. However, the growth and yield of maize are affected by several adverse environmental inputs. Therefore, investigating the factors that regulate maize growth and development and resistance to abiotic stress is an essential task for developing stress-resilient maize varieties. Thus, a comprehensive genome-wide identification analysis was performed to identify HSFs in the maize genome. The current study identified 25 ZmHSFs, randomly distributed throughout the maize genome. Phylogenetic analysis revealed that ZmHSFs are divided into three classes and 13 sub-classes. Gene structure and protein motif analysis supported the results obtained through the phylogenetic analysis. Domain analysis showed the DNA-binding domain to be the most conserved region of ZmHSFs. Segmental duplication is shown to be responsible for the expansion of ZmHSFs. Most of the ZmHSFs are localized inside the nucleus, and the ZmHSFs that belong to the same group show similar physicochemical properties. The 3D structures revealed comparable, conserved ZmHSF protein structures. RNA-seq analysis revealed a major role of class A HSFs, including ZmHSFA-1a and ZmHSFA-2a, in all the maize growth stages, i.e., seed, vegetative, and reproductive development. Furthermore, ZmHSFs displayed clear spatiotemporal expression patterns. Under abiotic stress conditions (heat, drought, cold, UV, and salinity), members of class A and B ZmHSFs are induced. Gene ontology (GO) annotation analysis indicated a major role of ZmHSFs in resistance to environmental stress and regulation of primary metabolism. Further, the protein-protein interaction analysis showed that ZmHSFs interact with several molecular chaperones and major stress-responsive proteins.
To summarize, this study provides novel insights for functional studies on the ZmHSFs in maize breeding programs.
ARTICLE | doi:10.20944/preprints202103.0103.v1
Subject: Medicine And Pharmacology, Immunology And Allergy Keywords: Next Generation Sequencing; Laboratory automation; Hereditary Cancer; Genetic Testing; Clinical Genomics.
Online: 2 March 2021 (16:00:24 CET)
(1) Background: NGS-based mutational analysis of hereditary cancer genes is crucial for designing tailored prevention strategies for subjects with different hereditary cancer risks. The ease of amplicon-based NGS library construction protocols contrasts with the greater enrichment uniformity provided by capture-based protocols, which gives a better chance of detecting larger genomic rearrangements and copy-number variations. Capture-based protocols, however, involve more complex sample handling and are extremely susceptible to human bias. Robotic platforms can help overcome these limits by reducing hands-on time, limiting random errors and guaranteeing process standardization. (2) Methods: We implemented and validated the complete automation of the SOPHiA GENETICS’ CE-IVD Hereditary Cancer Solution™ (HCS) library preparation workflow on the Hamilton’s STARlet platform. (3) Results: We demonstrate that this automated workflow, used for more than 1000 samples, achieved the same performance as the manual setup in terms of coverage and read uniformity, with much lower variability in the read mapping rate onto the regions of interest. (4) Conclusions: This automated solution offers equally reliable and affordable NGS data, with the added advantages of a flexible, automated and integrated framework that minimizes possible human errors and enables a walk-away laboratory scenario.
REVIEW | doi:10.20944/preprints202010.0301.v1
Subject: Biology And Life Sciences, Biochemistry And Molecular Biology Keywords: Obesity; Genetics; Companion Animals; Metabolic Disease; Comparative Genomics; Dogs; Cats; Horses
Online: 14 October 2020 (10:51:29 CEST)
Obesity is one of the most prevalent health conditions in humans and companion animals across the world. Obesity is associated with multiple health conditions across species including premature mortality. It is therefore of importance across the fields of medicine and veterinary medicine. The regulation of body weight is a homeostatic process vulnerable to disruption by genetic and environmental factors. It is well established that the heritability of obesity is high in humans and laboratory animals, with ample evidence that the same is true in companion animals. In this review, we provide an overview of how genes link to obesity in humans, drawing on a wealth of information from laboratory animal models, and summarising the mechanisms by which obesity causes related disease. Throughout, we focus on how large-scale human studies and niche investigations of rare mutations in severely affected patients have improved our understanding of obesity biology and can inform our ability to interpret results of animal studies. For dogs, cats and horses, we review the similarities in obesity pathophysiology to humans and review those genetic studies that have been done to investigate them. Finally, we discuss how veterinary genetics may learn from humans about studying precise, nuanced phenotypes and implementing large-scale studies, but also how veterinary studies may be able to look past clinical findings to mechanistic ones and demonstrate translational benefits to human research.
REVIEW | doi:10.20944/preprints202005.0448.v1
Subject: Biology And Life Sciences, Virology Keywords: betacoronaviruses; genomics; SARS-CoV; MERS-CoV; SARS-CoV-2; COVID-19
Online: 27 May 2020 (08:50:46 CEST)
In the 21st century, three highly pathogenic betacoronaviruses have emerged, with alarming rates of human morbidity and case fatality. Genomic information has been widely used to understand the pathogenesis, animal origin and mode of transmission of betacoronaviruses in the aftermath of the 2002-03 severe acute respiratory syndrome (SARS) and 2012 Middle East respiratory syndrome (MERS) outbreaks. Furthermore, genome sequencing and bioinformatic analysis have had unprecedented relevance in the battle against the 2019-20 coronavirus disease 2019 (COVID-19) pandemic, the newest and most devastating outbreak caused by a coronavirus in the history of mankind, allowing disease spread and transmission dynamics to be followed in near real time. Here, we review how genomic information has been used to tackle outbreaks caused by emerging, highly pathogenic betacoronavirus strains, with emphasis on SARS-CoV, MERS-CoV and SARS-CoV-2.
REVIEW | doi:10.20944/preprints202004.0333.v1
Subject: Biology And Life Sciences, Biochemistry And Molecular Biology Keywords: bitter gourd; breeding; genetic diversity; genomics; heterosis; molecular breeding; mutation breeding
Online: 19 April 2020 (06:00:00 CEST)
Bitter gourd is an important vegetable of the family Cucurbitaceae, cultivated mainly in humid and subtropical Asia. It has immense health benefits owing to medicinal compounds such as charantin, vicine, and polypeptide-p, which play an essential role in lowering blood glucose levels. Moreover, bitter gourd fruits are particularly rich in vitamin C, minerals, and carotenes. Here, we critically evaluate the achievements made in developing and implementing bitter gourd breeding programs using the latest technologies. We discuss the broadening of the genetic base of cultivated bitter gourd varieties through enrichment of the existing resources with wild species in breeding programs. Practical seed production know-how, along with the use of the male sterility (MS) system or chemically induced sterility, remains vital to meet market demands. High-yielding bitter gourd hybrids that combine early maturity with resistance to biotic and abiotic stresses are continually needed to meet the challenges of bitter gourd production.
REVIEW | doi:10.20944/preprints201912.0316.v1
Subject: Biology And Life Sciences, Agricultural Science And Agronomy Keywords: Peanut; plant breeding; research; funding; genomics; INERA; cultivar; selection; Arachis hypogaea
Online: 24 December 2019 (11:07:38 CET)
Groundnut (Arachis hypogaea L.) is a major food and cash crop in Burkina Faso. Due to growing demand for raw oilseeds, there is increasing interest in extending groundnut production from traditional rain-fed areas to irrigated environments. However, despite many past initiatives to increase groundnut productivity and production, the groundnut industry still struggles to prosper, due to several constraints including minimal development research and fluctuating markets. Yield penalties due to drought and biotic stresses continue to be a major drawback for groundnut production. This review traces progress in groundnut breeding, which started in Burkina Faso before the country’s political independence in 1960, through to the present. Up to the 1980s, groundnut improvement was led by international research institutions such as IRHO (Institute of Oils and Oleaginous Research) and ICRISAT (International Crops Research Institute for the Semi-Arid Tropics). However, international breeding initiatives were not sufficient to establish a robust domestic groundnut breeding programme. This review also provides essential information about opportunities and challenges of groundnut research in Burkina Faso, emphasising the need for institutional attention to genetic improvement of the crop.
ARTICLE | doi:10.20944/preprints201812.0026.v2
Subject: Biology And Life Sciences, Virology Keywords: Lactobacillus plantarum; phage; new genus; annotation; comparative genomics; phylogenetics; isolation; diversity
Online: 11 June 2019 (09:54:23 CEST)
Lactobacillus plantarum is a bacterium with probiotic properties and promising applications in the food industry and agriculture. So far, bacteriophages of this bacterium have received only moderate attention. We examined the diversity of five new L. plantarum phages via whole genome shotgun sequencing and in silico protein predictions. Moreover, we looked into their phylogeny and their potential genomic similarities to other complete phage genome records through extensive nucleotide and protein comparisons. These analyses revealed a high degree of similarity among the five phages, which extended to the vast majority of predicted virion-associated proteins. Based on these, we selected one of the phages as a representative and performed transmission electron microscopy and structural protein sequencing tests. Overall, the results suggested that the five phages belong to the family Myoviridae and have a long genome of 137,973-141,344 bp, a G/C content of 36.3-36.6% that is quite distinct from their host’s, and, surprisingly, 7 to 15 tRNAs. On average, only 41 of 174 predicted genes were assigned a function. The comparative analyses revealed that the five L. plantarum phages of this study are genetically quite distinct from other known phages. Hence, the new genus “Semelevirus” was proposed, which comprises exclusively the five phages. This novel lineage of Lactobacillus phages provides further insight into the genetic heterogeneity of phages infecting Lactobacillus sp. The five new Lactobacillus phages have potential value for the development of more robust starters through, for example, the selection of mutants insensitive to phage infections. The five phages could also form part of phage cocktails, which producers would apply at different stages of L. plantarum fermentations in order to create a range of organoleptic outputs.
ARTICLE | doi:10.20944/preprints201905.0284.v1
Subject: Biology And Life Sciences, Biochemistry And Molecular Biology Keywords: personal genomics, DNA, polygenic, risk, regulation, discrimination, calibration, prediction, transparency, autonomy
Online: 23 May 2019 (16:23:57 CEST)
Direct-to-consumer genetic testing companies aim to predict the risks of complex diseases using proprietary algorithms. Companies keep algorithms as trade secrets for competitive advantage, but a market that thrives on the premise that customers can make their own decisions about genetic testing should respect customer autonomy and informed decision making and maximize opportunities for transparency. The algorithm itself is only one piece of the information that is deemed essential for understanding how prediction algorithms are developed and evaluated. Companies should be encouraged to disclose everything else, including the expected risk distribution of the algorithm when applied in the population, using a benchmark DNA dataset. A standardized presentation of information and risk distributions allows customers to compare test offers and scientists to verify whether the undisclosed algorithms could be valid. A new model of oversight in which stakeholders collaboratively keep a check on the commercial market is needed.
ARTICLE | doi:10.20944/preprints201803.0145.v1
Subject: Biology And Life Sciences, Biochemistry And Molecular Biology Keywords: repetitive elements; RNA-Seq; genomics; evolution; cytogenetics; supernumerary elements; extra chromosomes
Online: 19 March 2018 (08:33:48 CET)
B chromosomes (B) are supernumerary elements found in many taxonomic groups. Most B chromosomes are rich in heterochromatin and composed of abundant repetitive sequences, especially transposable elements (TEs). The origin of Bs is generally linked to the A chromosome complement (A). The first report of a B chromosome in African cichlids was on Astatotilapia latifasciata, which can harbor 0, 1 or 2 B chromosomes. Classical cytogenetic studies found high TE content on the species' B chromosome. In this study, we aim to understand TE composition and expression in the A. latifasciata genome and its relation to the B chromosome. We use bioinformatics analysis to explore the genomic organization of TEs and also their composition on the B chromosome. Bioinformatics findings were validated by fluorescent in situ hybridization (FISH) and real-time PCR (qPCR). A. latifasciata has a TE content similar to other cichlid fishes and several expanded elements on its B chromosome. With RNA sequencing data (RNA-seq) we showed that all major TE classes are transcribed in brain, muscle and male/female gonads. The evaluation of TE expression between B- and B+ individuals showed that few elements have differential expression among groups and expanded B elements were not highly transcribed. Putative silencing mechanisms may be acting on the B chromosome of A. latifasciata to prevent adverse consequences of repeat transcription and mobilization in the genome.
REVIEW | doi:10.20944/preprints201612.0113.v1
Subject: Medicine And Pharmacology, Oncology And Oncogenics Keywords: breast cancer; brain metastases; clonal evolution; precision medicine; genomics; tumour microenvironment
Online: 22 December 2016 (09:57:33 CET)
Brain metastases are highly evolved manifestations of breast cancer arising in a unique microenvironment, giving them exceptional adaptability in the face of new extrinsic pressures. The incidence is rising in line with population ageing, and use of newer therapies that stabilise metastatic disease burden with variable efficacy throughout the body. Historically, there has been a widely held view that brain metastases do not respond to circulating therapeutics because the blood-brain-barrier (BBB) restricts their uptake. However, emerging data are beginning to paint a far more complex picture where the brain acts as a sanctuary for dormant, subclinical proliferations that are initially protected by the BBB, but then exposed to dynamic selection pressures as tumours mature and vascular permeability increases. Here, we review key experimental approaches and landmark studies that have charted the genomic landscape of breast cancer brain metastases. These findings are contextualised with the factors impacting on clonal outgrowth in the brain: intrinsic breast tumour cell capabilities required for brain metastatic fitness, and the neural niche, which is initially hostile to invading cells but then engineered into a tumour-support vehicle by the successful minority. We also discuss how late detection, abnormal vascular perfusion and interstitial fluid dynamics underpin the recalcitrant clinical behaviour of brain metastases, and outline active clinical trials in the context of precision management.
REVIEW | doi:10.20944/preprints202308.1703.v1
Subject: Biology And Life Sciences, Life Sciences Keywords: neurodegenerative diseases; cardiovascular diseases; fructose metabolism; adaptation; evolutionary biology; genomics; gout; hyperuricemia
Online: 24 August 2023 (03:53:34 CEST)
The accumulation of loss-of-function mutations in the uricase gene explains the high urate levels in hominoids (apes and humans) compared to other mammals. The loss of human uricase activity may have allowed humans to survive environmental stressors, evolutionary bottlenecks, and life-threatening pathogens. While high urate levels may contribute to developing cardiometabolic disorders such as hypertension and insulin resistance, low urate levels may increase the risk for neurodegenerative diseases. The double-edged sword effect of uric acid has renewed interest in the antioxidant role of urate and the role of the uricase gene in modulating the risk of obesity. Characterizing the effects of both uric acid levels and the uricase gene in different animal models may provide new insights into the potential therapeutic benefits of uric acid and novel uricase-based therapy.
ARTICLE | doi:10.20944/preprints202306.1828.v1
Subject: Biology And Life Sciences, Virology Keywords: Rabies; lyssaviruses; RABV; dog; jackal; canine; neutralization; genomics; next-generation sequencing; phylogeny
Online: 26 June 2023 (14:44:40 CEST)
Rabies is a fatal zoonosis that is considered a re-emerging infectious disease. Although rabies remains endemic in canines throughout much of the world, vaccination programs have essentially eliminated dog rabies in the Americas and much of Europe. However, despite the goal of eradicating dog rabies in the European Union by 2020, sporadic cases of dog rabies still occur in Eastern Europe, including Georgia. To assess the genetic diversity of strains recently circulating in Georgia, we sequenced 78 RABV-positive samples from brain tissues of rabid dogs and jackals using Illumina short-read sequencing of total RNA shotgun libraries. Seventy-seven RABV genomes were successfully assembled and annotated, 74 of them to the coding complete status. Phylogenetic analyses of the nucleoprotein (N) and attachment glycoprotein (G) genes placed all the assembled genomes into the Cosmopolitan clade, consistent with the Georgian origin of the samples. Amino acid alignment of the G glycoprotein ectodomain identified twelve different sequences for this domain among the samples. Only one of the ectodomain groups contained a residue change in an antigenic site, an R264H change in the G5 antigenic site. Three isolates were cultured, and these were found to be efficiently neutralized by human monoclonal antibody A6. Overall, our data show that recently circulating RABV isolates from Georgian canines are predominantly closely related phylogroup I viruses of the Cosmopolitan clade. Current rabies vaccines should offer protection against infection by Georgian canine RABVs. The genomes have been deposited in GenBank (accessions: OQ603609-OQ603685).
ARTICLE | doi:10.20944/preprints202107.0280.v1
Subject: Biology And Life Sciences, Biochemistry And Molecular Biology Keywords: comparative genomics; metabolic reconstruction; bioinformatics; conserved unknowns; function prediction; functional annotation; orthology
Online: 13 July 2021 (08:58:31 CEST)
Members of the DUF34 (domain of unknown function 34) family, also known as the NIF3 protein superfamily, are ubiquitous across superkingdoms. Proteins of this family have been widely annotated as “GTP cyclohydrolase I type 2” through electronic propagation based on one study. Here, the annotation status of this protein family was examined through comprehensive literature review and integrative bioinformatic analyses that revealed varied pleiotropic associations and phenotypes. This analysis combined with functional complementation studies strongly challenges the current annotation and suggests that DUF34 family members may serve as metal ion insertases, chaperones, or metallocofactor maturases. This general molecular function could explain how DUF34 subgroups participate in highly diversified pathways such as cell differentiation, metal ion homeostasis, pathogen virulence, redox and universal stress responses.
Subject: Biology And Life Sciences, Anatomy And Physiology Keywords: mitochondria; mitochondrial DNA; nervous tissue, OxPhos complexes; bioenergetics; genomics; proteomics; mitochondrial diseases
Online: 17 June 2021 (15:12:01 CEST)
Oxidative phosphorylation (OxPhos) is the basic function of mitochondria, although the landscape of mitochondrial functions is continuously growing to include more aspects of cellular homeostasis. Thanks to the application of -omics technologies to the study of the OxPhos system, novel features emerge from the cataloging of novel proteins as mitochondrial, thus adding details to the mitochondrial proteome and defining novel metabolic cellular interrelations, especially in the human brain. We focussed on the diversity of bioenergetic demand and different aspects of mitochondrial structure, functions, and dysfunction in the brain. Terms such as ‘mitoexome’, ‘mitoproteome’ and ‘mitointeractome’ have entered the field of ‘mitochondrial medicine’. In this context, we reviewed several genetic defects that hamper the last step of aerobic metabolism, mostly involving the nervous tissue as one of the most prominent energy-dependent tissues and, as a consequence, as a primary target of mitochondrial dysfunction. The dual genetic determination of the OxPhos complexes is one of the reasons for the complexity of the genotype-phenotype correlation when facing human diseases associated with mitochondrial defects; clinically, these are characterized by extremely heterogeneous symptoms, ranging from organ-specific to multisystemic dysfunction with different clinical courses. Finally, we briefly discuss the future directions of the multi-omics study of human brain disorders.
REVIEW | doi:10.20944/preprints202106.0363.v1
Subject: Biology And Life Sciences, Anatomy And Physiology Keywords: Traditional food crops; Climate change; Food security; Omics; Translational genomics; Gene editing
Online: 14 June 2021 (13:02:24 CEST)
Indigenous communities across the globe, especially in rural areas, consume locally available plants known as Traditional Food Plants (TFPs) for their nutritional and health-related needs. Recent research shows that many traditional food plants are highly nutritious, as they contain health-beneficial metabolites, vitamins, mineral elements and other nutrients. Excessive reliance on the mainstream staple crops has its own disadvantages. TFPs are nowadays considered important crops of the future and can act as supplementary foods for the burgeoning global population. They can also act as emergency foods in times of crisis, such as the COVID-19 pandemic. The current situation necessitates locally available alternative nutritious TFPs for sustainable food production. To increase the cultivation or improve the traits of TFPs, it is essential to understand the molecular basis of the genes that regulate important traits such as nutritional components and resilience to biotic and abiotic stresses. The integrated use of modern omics and gene editing technologies provides great opportunities to better understand the genetic and molecular basis of superior nutrient content, climate-resilient traits and adaptation to local agroclimatic zones. Recently, realising the importance and benefits of TFPs, scientists have shown interest in the prospection and sequencing of traditional food plants for their improvement, further cultivation and mainstreaming. Integrated omics such as genomics, transcriptomics, proteomics, metabolomics and ionomics are successfully used in plants and have provided a comprehensive understanding of gene-protein-metabolite networks. Combined use of omics and editing tools has led to successful editing of beneficial traits in a few TFPs. This suggests that there is ample scope for the integrated use of modern omics and editing tools/techniques for the improvement of TFPs and their use for sustainable food production.
In this article, we highlight the importance, scope and progress towards improvement of TFPs for valuable traits by integrated use of omics and gene editing techniques.
ARTICLE | doi:10.20944/preprints202106.0214.v1
Subject: Biology And Life Sciences, Anatomy And Physiology Keywords: formae speciales; horizontal gene transfer; endophytic; pathogenic; Fusarium; RNAseq; comparative genomics; vanilla
Online: 8 June 2021 (11:37:26 CEST)
Members of the Fusarium oxysporum species complex (FOSC) have the capacity to specialize into host-specific pathogens known as formae speciales through horizontal gene transfer between pathogenic and endophytic individuals. To this day, the origin of these formae speciales and the genetic determinants dictating the switch from endophytic to pathogenic Fusarium oxysporum (Fox) are still unknown. F. oxysporum f. sp. vanillae (Fov), a member of FOSC, is the causal agent of root and stem rot disease, the main phytosanitary problem in vanilla plantations worldwide. Here we analyzed the RNA-seq libraries resulting from the vanilla-Fov interaction at early and late stages of the infection, and from what we initially identified as controls in a previous study, detecting the presence of Fox endophytes. We identified virulence, hypervirulence, sporulation, conidiation, necrosis, and production of fusaric acid as key processes taking place during the Fov-vanilla interaction. Through comparison with endophytic Fox, we found that Fov can infect vanilla thanks to the presence of pathogenicity islands and genomic regions associated with supernumerary chromosomes. These play a central role as carriers of genes involved in pathogenic activity and could have been obtained by Fov through horizontal gene transfer. We also found that, unlike other pathogenic members of FOSC, Fov does not use Secreted in Xylem (SIX) proteins to infect vanilla.
ARTICLE | doi:10.20944/preprints202012.0387.v1
Subject: Biology And Life Sciences, Biochemistry And Molecular Biology Keywords: next-generation sequencing; database; variant annotation; variant classification; data management; clinical genomics
Online: 15 December 2020 (13:14:21 CET)
The rapid evolution of Next Generation Sequencing in clinical settings, and the resulting challenge of variant interpretation in the light of constantly updated information, require robust data management systems and organized approaches to variant reinterpretation. In this paper, we present iVar: a freely available and highly customizable tool provided with a user-friendly web interface. It represents a platform for the unified management of variants identified by different sequencing technologies. iVar accepts VCF files and text annotation files as input and processes them, optimizing data organization and avoiding redundancies. Updated annotations can be periodically re-uploaded and associated with variants as historicized attributes. Data can be visualized through variant-centered and sample-centered interfaces. A customizable search functionality can be used to periodically check whether pathogenicity-related data of a variant have changed over time. Patient recontacting ensuing from variant reinterpretation is made easier by iVar through the effective identification of all patients present in the database and carrying a specific variant. We tested iVar by uploading 4171 VCF files and 1463 annotation files, obtaining a database of 4166 samples and 22569 unique variants. iVar has proven to be a useful tool with good performance for collecting and managing data from medium-throughput
REVIEW | doi:10.20944/preprints202011.0060.v1
Subject: Biology And Life Sciences, Biochemistry And Molecular Biology Keywords: Cotton; Fiber initiation; Genomics; Epigenomics; Phytohormones; Transcription factors; MicroRNAs; Gene expression regulation
Online: 2 November 2020 (15:50:39 CET)
The epidermal cells on the surface of cotton ovules undergo differentiation to produce fibers, which are single-celled hair-like protrusions resembling plant trichomes. The initiation of these unicellular fibers from the cotton ovule surface is a complex and tightly regulated process. The initiation step is the cell fate-determining stage, which commits cells to eventually develop into fibers, and is thus the most crucial phase in fiber development. In-depth knowledge of molecular regulation is a prerequisite for a clear view of the genetic and epigenetic control of the fiber initiation process. The identification and functional validation of cotton fiber initiation-related genes, a few fiberless mutants, transcription factors, microRNAs and epigenetic regulators, as well as the elucidation of the role of phytohormones as signaling molecules, have played a significant role in understanding the cotton fiber initiation process at the molecular level. This review focuses on the genetic and epigenetic regulation of cotton fiber initiation, giving readers insight into the mechanistic details that operate during this process.
Subject: Biology And Life Sciences, Biochemistry And Molecular Biology Keywords: data science; reuse; sequencing data; genomics; bioinformatics; databases; computational biology; open science
Online: 16 July 2020 (12:39:43 CEST)
The 'big data revolution' has enabled novel types of analyses in the life sciences, facilitated by public sharing and reuse of datasets. Here, we review the prodigious potential of reusing publicly available datasets and the challenges, limitations and risks associated with it. Possible solutions to issues and research integrity considerations are also discussed. Due to the prominence, abundance and wide distribution of sequencing data, we focus on the reuse of publicly available sequence datasets. We define ‘successful reuse’ as the use of previously published data to enable novel scientific findings and use selected examples of such reuse from different disciplines to illustrate the enormous potential of the practice, while acknowledging their respective limitations and risks. A checklist to determine the reuse value and potential of a particular dataset is also provided. The open discussion of data reuse and the establishment of the practice as a norm has the potential to benefit all stakeholders in the life sciences.
REVIEW | doi:10.20944/preprints201911.0300.v1
Subject: Biology And Life Sciences, Insect Science Keywords: artificial selection; biological control; genetics; genome assembly; genomics; insect breeding; microbiome; modelling
Online: 24 November 2019 (17:10:31 CET)
Biological control is widely successful for controlling pests, but effective biocontrol agents are now more difficult to obtain due to more restrictive international trade laws. Coupled with increasing demand, the efficacy of existing and new biocontrol agents needs to be improved with genetic and genomic approaches. Although they have been underutilised in the past, applying genetic and genomic techniques is becoming more feasible from both technological and economic perspectives. We review current methods and provide a framework for using them, incorporating evolutionary and ecological principles. First, it is necessary to identify which biocontrol trait to select and in what direction. Next, the genes or markers linked to these traits need to be determined to better target their selection, followed by how to implement this information in a breeding program. Choosing a trait can be assisted by modelling to account for the proper agro-ecological context, and by knowing which traits have sufficiently high heritability values. We provide guidelines for designing genomic strategies in biocontrol programs, which depend on the organism, budget, and desired objective. Genomic approaches start with genome sequencing and assembly. We provide a guide for deciding the most successful sequencing strategy for biocontrol agents. Gene discovery involves quantitative trait loci (QTL) analyses, transcriptomic and proteomic studies, and gene editing. Improving biocontrol practices includes marker-assisted selection, genomic selection and microbiome manipulation of biocontrol agents, and monitoring for genetic variation during rearing and post-release. We conclude by identifying the most promising applications of genetic and genomic methods to improve biological control efficacy.
ARTICLE | doi:10.20944/preprints201809.0169.v1
Subject: Biology And Life Sciences, Animal Science, Veterinary Science And Zoology Keywords: European sardine; draft genome; teleosts; comparative genomics; long chain polyunsaturated fatty acids
Online: 10 September 2018 (12:37:23 CEST)
Clupeiformes, such as sardines and herrings, represent an important share of worldwide fisheries. Among those, the European sardine (Sardina pilchardus, Walbaum 1792) exhibits significant commercial relevance. While the last decade showed a steady and sharp decline in capture levels, recent advances in culture husbandry represent promising research avenues. Yet, the complete absence of genomic resources from sardine imposes a severe bottleneck to understand its physiological and ecological requirements. We generated 69 Gbp of paired-end reads using Illumina HiSeq X Ten and assembled a draft genome assembly with an N50 scaffold length of 25579 bp and BUSCO completeness of 82.1% (Actinopterygii). The estimated size of the genome ranges between 655 and 850 Mb. Additionally, we generated a relatively high-level liver transcriptome. To deliver a proof of principle of the value of this dataset, we established the presence and function of enzymes (elovl2, elovl5 and fads2) that have pivotal roles in the biosynthesis of long chain polyunsaturated fatty acids, essential nutrients particularly abundant in oily fish such as sardines. Our study provides the first omics dataset from a valuable economic marine teleost species, the European sardine, an essential resource for their effective conservation, management and sustainable exploitation.
ARTICLE | doi:10.20944/preprints201805.0471.v1
Subject: Medicine And Pharmacology, Clinical Medicine Keywords: genomics; genomic medicine; health outcomes; evidence; standards; eMERGE; ClinGen; precision public health
Online: 31 May 2018 (11:27:23 CEST)
Genomic medicine is moving from research to the clinic. There is a lack of evidence about the impact of genomic medicine interventions on health outcomes. This is due in part to a lack of standardized outcome measures that can be used across different programs to evaluate the impact of interventions targeted to specific genetic conditions. The eMERGE Outcomes working group (OWG) developed measures to collect information on outcomes following the return of genomic results to participants for several genetic disorders. These outcomes were compared to outcome-intervention pairs for genetic disorders developed independently by the ClinGen Actionability working group (AWG). In general, the outcomes defined by the two groups were concordant. The ClinGen outcomes tended to be higher level and the AWG scored outcomes represented a subset of outcomes referenced in the accompanying AWG evidence review. eMERGE OWG outcomes were more detailed and discrete, facilitating collection of relevant information from health records. This paper demonstrates that common outcomes for genomic medicine interventions can be identified. Further work is needed to standardize outcomes across genomic medicine implementation projects and make these publicly available to enhance dissemination and assist in making precision public health a reality.
ARTICLE | doi:10.20944/preprints201803.0009.v1
Subject: Biology And Life Sciences, Biochemistry And Molecular Biology Keywords: gene flow; sympatry; parapatry; simulation model; population genomics; Heliconius; coupling; nonlinear transitions
Online: 1 March 2018 (15:23:13 CET)
During speciation-with-gene-flow, a transition from single-locus to multi-locus processes can occur, as strong coupling of multiple loci creates a barrier to gene flow. Testing predictions about such transitions with empirical data requires building upon past theoretical work and the continued development of quantitative approaches. We simulated genomes under different evolutionary scenarios of gene flow and divergent selection, extending previous work with the additions of neutral sites and coupling statistics, allowing us to investigate if and how selected and neutral sites differ in the conditions they require for transitions during speciation. As the per-locus strength of selection grew and/or migration decreased, it became easier for selected sites to show divergence – and thus to rise in linkage disequilibrium (LD) with each other as a statistical consequence – farther in advance of the conditions under which neutral sites could diverge. Indeed, even very low rates of gene flow were sufficient to prevent differentiation at neutral sites. However, once strong enough, coupling among selected sites eventually reduced gene flow at neutral sites as well. To explore whether similar transitions might be detectable in empirical data, we used published genome resequencing data from three taxa of Heliconius butterflies. We found that allele-frequency outliers and FST outliers exhibited stronger patterns of LD than the genomic background, as expected. The statistical characteristics of LD – likely indicative of the strength of coupling of barrier loci – varied between chromosomes and taxonomic comparisons. Broad qualitative agreement between the patterns we observed in the empirical data and our simulations suggests that selection drives rapid genome-wide transitions to multi-locus coupling, illustrating how divergence and gene flow interact along the speciation continuum.