Preprint Data Descriptor Version 1 Preserved in Portico This version is not peer-reviewed

Mash Sketched Reference Dataset for Genome-Based Taxonomy and Comparative Genomics

Version 1 : Received: 12 June 2021 / Approved: 14 June 2021 / Online: 14 June 2021 (14:54:32 CEST)

How to cite: Sánchez-Reyes, A.; Fernández-López, M.G. Mash Sketched Reference Dataset for Genome-Based Taxonomy and Comparative Genomics. Preprints 2021, 2021060368. https://doi.org/10.20944/preprints202106.0368.v1 Sánchez-Reyes, A.; Fernández-López, M.G. Mash Sketched Reference Dataset for Genome-Based Taxonomy and Comparative Genomics. Preprints 2021, 2021060368. https://doi.org/10.20944/preprints202106.0368.v1

Abstract

The analysis of curated genomic, metagenomic, and proteomic data are of paramount importance in the fields of biology, medicine, education, and bioinformatics. Although this type of data is usually hosted in raw form in free international repositories, its access requires plenty of computing, storage, and processing capacities for the domestic user. The purpose of the study is to offer a comprehensive set of genomic and proteomic reference data, in an accessible and easy-to-use form to the scientific community. A representative type material set of genomes, proteomes and metagenomes were directly downloaded from the site: https://www.ncbi.nlm.nih.gov/assembly/ and from Genome Taxonomy Database, associated with the major groups of Bacteria, Archaea, Virus, and Fungi. Sketched databases were subsequently created and stored on handy raw reduced representations, by using Mash software. Our dataset contains near to 100 GB of space disk reduced to 585.78 MB and represents 87,476 genomics/proteomic records from eight informative contexts, which have been prefiltered to make them accessible, usable, and user-friendly with computational resources. Potential uses of this dataset include but are not limited to, microbial species delimitation, estimation of genomic distances, genomic novelties, paired comparisons between proteomes, genomes, and metagenomes.

Keywords

Microbial Mash database, Mash distance, Genome containment, Type material, Microbial taxonomy

Subject

Biology and Life Sciences, Biochemistry and Molecular Biology

Comments (0)

We encourage comments and feedback from a broad range of readers. See criteria for comments and our Diversity statement.

Leave a public comment
Send a private comment to the author(s)
* All users must log in before leaving a comment
Views 0
Downloads 0
Comments 0
Metrics 0


×
Alerts
Notify me about updates to this article or when a peer-reviewed version is published.
We use cookies on our website to ensure you get the best experience.
Read more about our cookies here.