Preprint Article, Version 1. Preserved in Portico. This version is not peer-reviewed.

Guideline for genome transposon annotation derived from evaluation of popular TE identification tools

Version 1: Received: 10 August 2020 / Approved: 12 August 2020 / Online: 12 August 2020 (08:07:14 CEST)

How to cite: Yan, H.; Torchiana, F.; Bombarely, A. Guideline for genome transposon annotation derived from evaluation of popular TE identification tools. Preprints 2020, 2020080275 (doi: 10.20944/preprints202008.0275.v1).

Abstract

Background: Transposable elements (TEs) make up a large fraction of many eukaryotic genomes and are extremely diverse, comprising thousands of families. This abundance and diversity make TE discovery and annotation challenging. Existing tools and databases build libraries for masking TEs in genomes using de novo and homology-based identification strategies, but no consensus criteria have been proposed for choosing among them. Results: For the de novo strategy, we compared the performance of TE libraries built by four commonly used tools (RepeatModeler, LTR_FINDER, LTRharvest, and MITE_Hunter), using a simulated genome as a standard control. The performance of RepeatModeler decreased when it was combined with either LTR_FINDER or LTRharvest. The combination of RepeatModeler and MITE_Hunter performed better than either tool alone. For the homology-based strategy, we evaluated library sources from a taxonomic point of view in order to build an accurate TE library. When a database library was selected to identify TEs in the Arabidopsis thaliana genome, a library from a genus genetically closer to Arabidopsis performed better than libraries from more distantly related genera. When Arabidopsis itself was excluded, a combination of the three genera closest to Arabidopsis performed better than a combination of all genera. Conclusion: This study proposes two recommendations for accurate TE annotation: 1) for the de novo strategy, build the TE library with RepeatModeler and MITE_Hunter; 2) for the homology-based strategy, use a library from a genus genetically close to the target species rather than a combined library from all genera.

Subject Areas

transposable elements; genome annotation; software evaluation
