Preprint Article Version 1 Preserved in Portico This version is not peer-reviewed

HoSeIn: A Workflow for Integrating Various Homology Search Results from a High-Throughput Sequence Dataset

Version 1 : Received: 15 December 2018 / Approved: 17 December 2018 / Online: 17 December 2018 (09:51:22 CET)

How to cite: McCarthy, C.; Rozadilla, G.; Clemente, J.M. HoSeIn: A Workflow for Integrating Various Homology Search Results from a High-Throughput Sequence Dataset. Preprints 2018, 2018120184. https://doi.org/10.20944/preprints201812.0184.v1

Abstract

Data generated by metagenomic and metatranscriptomic experiments are both enormous and inherently noisy [1]. When reads are classified and labeled with taxonomy-dependent, alignment-based methods such as MEGAN [2], the first step is to perform homology searches against sequence databases. To obtain the most information from the samples, nucleotide sequences are usually compared to various databases (i.e., nucleotide and protein) using local sequence aligners such as BLASTN and BLASTX [3]. Nevertheless, analyzing and integrating these results can be problematic because the outputs of these searches usually differ, and the differences can be pronounced when working with RNA-seq (personal observation; Graphical abstract). These inconsistencies led us to develop the HoSeIn workflow to determine the unequivocal taxonomic and functional profile of environmental samples. The workflow is based on the assumption that the sequences that correspond to a certain taxon are composed of (Graphical abstract): 1) sequences that were assigned to the same taxon by both homology searches, plus 2) sequences that were assigned to that taxon by either one of the homology searches but returned no hits in the other.
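The integration rule stated in the abstract can be illustrated with a minimal Python sketch. This is not the HoSeIn implementation: the function name `integrate_assignments`, the dictionary inputs (read identifier mapped to a taxon, or `None` for no hit), and the handling of reads that the two searches assign to different taxa are all assumptions made for illustration. The abstract does not say what happens to such conflicting reads, so the sketch simply sets them aside.

```python
from collections import defaultdict
from typing import Dict, Optional, Set, Tuple


def integrate_assignments(
    blastn: Dict[str, Optional[str]],
    blastx: Dict[str, Optional[str]],
) -> Tuple[Dict[str, Set[str]], Set[str]]:
    """Combine per-read taxon assignments from two homology searches.

    Hypothetical helper: each input maps a read ID to a taxon name,
    or to None when the search returned no hit for that read.
    Returns (profile, conflicts), where profile maps taxon -> read IDs.
    """
    profile: Dict[str, Set[str]] = defaultdict(set)
    conflicts: Set[str] = set()

    for read_id in set(blastn) | set(blastx):
        n_taxon = blastn.get(read_id)  # None if missing or no hit
        x_taxon = blastx.get(read_id)

        if n_taxon is None and x_taxon is None:
            continue  # no hit in either search: nothing to assign
        if n_taxon is not None and x_taxon is not None:
            if n_taxon == x_taxon:
                # Case 1: both searches agree on the taxon.
                profile[n_taxon].add(read_id)
            else:
                # Assumption: disagreeing reads are set aside, not assigned.
                conflicts.add(read_id)
        else:
            # Case 2: one search assigned a taxon, the other had no hit.
            taxon = n_taxon if n_taxon is not None else x_taxon
            profile[taxon].add(read_id)

    return dict(profile), conflicts


# Toy data (invented for illustration, not taken from the preprint).
blastn_hits = {"read1": "Bacteria", "read2": "Fungi", "read3": None, "read4": "Bacteria"}
blastx_hits = {"read1": "Bacteria", "read2": None, "read3": "Metazoa", "read4": "Fungi"}
profile, conflicts = integrate_assignments(blastn_hits, blastx_hits)
```

With this toy input, `read1` falls under case 1, `read2` and `read3` under case 2, and `read4` is reported as a conflict because the two searches disagree.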

Keywords

Metagenomics; Metatranscriptomics; Environmental sample; Homology searches; Taxonomic profile; Functional profile

Subject

Computer Science and Mathematics, Mathematical and Computational Biology
