Preprint Article Version 1 Preserved in Portico This version is not peer-reviewed

Gene Annotation and Transcriptome Delineation on a de novo Genome Assembly for the Reference Leishmania major Friedlin Strain

Version 1 : Received: 21 July 2021 / Approved: 26 July 2021 / Online: 26 July 2021 (10:23:40 CEST)

A peer-reviewed article of this Preprint also exists.

Camacho, E.; González-de la Fuente, S.G.-D.; Solana, J.C.; Rastrojo, A.; Carrasco-Ramiro, F.; Requena, J.M.; Aguado, B. Gene Annotation and Transcriptome Delineation on a De Novo Genome Assembly for the Reference Leishmania major Friedlin Strain. Genes 2021, 12, 1359.

Abstract

Leishmania major is the main causative agent of cutaneous leishmaniasis in humans. The Friedlin strain of this species (LmjF) was chosen when a multi-laboratory consortium set out to decipher the first genome sequence for a parasite of the genus Leishmania. That objective was attained in 2005, a milestone for Leishmania molecular biology studies around the world. Although the LmjF genome sequence was obtained by a shotgun strategy using classical Sanger sequencing, the results were excellent, and this assembly served as the reference for subsequent genome assemblies in other Leishmania species. Here, we present a new assembly for the genome of this strain (named LMJFC for clarity), generated by combining two high-throughput sequencing platforms: Illumina short-read sequencing and PacBio Single Molecule Real-Time (SMRT) sequencing, which provides long reads. Apart from resolving uncertain nucleotide positions, several genomic regions have been reorganized and a more precise composition of tandemly repeated gene loci has been attained. Additionally, the genome annotation has been improved by adding 542 genes and by defining more accurate coding sequences for around two hundred genes, based on the transcriptome delimitation also carried out in this work. As a result, we provide gene models (including untranslated regions and introns) for 11,238 genes. Genomic information ultimately determines the biology of every organism; therefore, our understanding of molecular mechanisms depends on the availability of precise genome sequences and accurate gene annotations. In this regard, this work provides an improved genome sequence and updated transcriptome annotations for the reference L. major Friedlin strain.

Keywords

genome; transcriptome; gene models; Leishmania; Illumina sequencing; PacBio sequencing; expression levels; untranslated regions (UTRs); SL-addition sites; polyadenylation sites

Subject

Biology and Life Sciences, Biochemistry and Molecular Biology
