Preprint
Article

Guideline for genome transposon annotation derived from evaluation of popular TE identification tools

This version is not peer-reviewed.

Submitted: 10 August 2020

Posted: 12 August 2020

Abstract
Background: Transposable elements (TEs) constitute the vast majority of all eukaryotic DNA and display extreme diversity, with thousands of families. Given this abundance and diversity, TE discovery and annotation are challenging. Existing tools and databases build libraries for masking TEs in genomes using de novo and homology-based identification strategies, but no consensus criteria have been proposed for choosing among them. Results: For the de novo strategy, we compared the performance of TE libraries produced by four commonly used tools (RepeatModeler, LTR_FINDER, LTRharvest, and MITE_Hunter), using a simulated genome as a standard control. The performance of RepeatModeler decreased when it was combined with either LTR_FINDER or LTRharvest. The combination of RepeatModeler and MITE_Hunter performed better than either tool alone. For the homology-based strategy, we evaluated library sources from a taxonomic point of view to build an accurate TE library. When a database library was selected to identify TEs in the Arabidopsis thaliana genome, a library from a genus genetically closer to Arabidopsis performed better than libraries from more distantly related genera. When Arabidopsis itself was excluded, the combination of the three genera closest to Arabidopsis performed better than the combination of all genera. Conclusion: This study proposes recommendations for accurate TE annotation: 1) for the de novo strategy, RepeatModeler and MITE_Hunter are suggested for building a TE library; 2) for the homology-based strategy, a library from a genus genetically close to the target species is recommended over a combined library from all genera.
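
The de novo recommendation above amounts to combining the output libraries of two tools into one TE library. The abstract does not say how that combination was done, so the following is only a minimal sketch under stated assumptions: both libraries are plain FASTA, and sequences that are exactly identical (ignoring case) are collapsed to one entry. The function names and the example headers are hypothetical; a real pipeline would more likely cluster near-identical sequences than rely on exact matching.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (header without '>', sequence)


def parse_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def merge_libraries(*libraries: Iterable[str]) -> List[Record]:
    """Merge FASTA libraries in order, dropping exact duplicate sequences.

    Assumption for illustration only: duplicates are detected by exact,
    case-insensitive sequence identity, and the first header seen is kept.
    """
    seen = set()
    merged: List[Record] = []
    for library in libraries:
        for header, seq in parse_fasta(library):
            key = seq.upper()
            if key and key not in seen:
                seen.add(key)
                merged.append((header, key))
    return merged


def to_fasta(records: Iterable[Record], width: int = 60) -> str:
    """Serialize records back to FASTA text, wrapping sequences at `width`."""
    out: List[str] = []
    for header, seq in records:
        out.append(">" + header)
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # Hypothetical toy libraries standing in for the two tools' outputs.
    repeatmodeler_lib = [">rm_family1", "ACGTACGT", "TTGA", ">rm_family2", "GGGCCC"]
    mite_hunter_lib = [">mite_1", "gggccc", ">mite_2", "ATATAT"]
    print(to_fasta(merge_libraries(repeatmodeler_lib, mite_hunter_lib)), end="")
```

In practice each library would be read from the file written by the corresponding tool, for example by passing an open file handle to `merge_libraries`, since any iterable of lines is accepted.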
Subject: Biology and Life Sciences - Plant Sciences
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
Preprints.org is a free preprint server supported by MDPI in Basel, Switzerland.