Preprint Article, Version 1. Preserved in Portico. This version is not peer-reviewed.

Aligning the Aligners: Comparison of RNA Sequencing Data Alignment and Gene Expression Quantification Tools for Clinical Breast Cancer Research

Version 1 : Received: 1 March 2019 / Approved: 4 March 2019 / Online: 4 March 2019 (10:23:25 CET)

A peer-reviewed article of this Preprint also exists.

Raplee, I.D.; Evsikov, A.V.; Marín de Evsikova, C. Aligning the Aligners: Comparison of RNA Sequencing Data Alignment and Gene Expression Quantification Tools for Clinical Breast Cancer Research. J. Pers. Med. 2019, 9, 18.

Abstract

The rapid expansion of transcriptomics, driven by the increased affordability of next-generation sequencing (NGS) technologies, generates rapidly growing amounts of gene expression data across biology and medicine, notably in cancer research. Concomitantly, many bioinformatics tools have been developed to streamline gene expression analysis and quantification. We tested the concordance of RNA sequencing (RNA-seq) analysis outcomes between the two predominant programs for read alignment, HISAT2 and STAR, and the two most popular programs for quantifying gene expression in NGS experiments, edgeR and DESeq2. We used RNA-seq data from a series of breast cancer progression specimens, which included histologically confirmed normal, early neoplasia, ductal carcinoma in situ (DCIS) and infiltrating ductal carcinoma (IDC) samples microdissected from formalin-fixed, paraffin-embedded (FFPE) breast tissue blocks. We identified significant differences in the aligners' performance: HISAT2 was prone to misaligning reads to retrogene genomic loci, whereas STAR generated more precise alignments, especially for early neoplasia samples. edgeR and DESeq2 produced similar lists of differentially expressed genes in stage comparisons, with edgeR producing more conservative, and therefore shorter, lists of genes. Nevertheless, Gene Ontology (GO) enrichment analysis revealed no skew in the significant GO categories identified among differentially expressed genes by edgeR versus DESeq2. As transcriptome analysis of archived FFPE samples moves to the vanguard of precision medicine, identifying and fine-tuning bioinformatics tools becomes critical for clinical research. Our results indicate that STAR and edgeR are well-suited tools for differential gene expression analysis of FFPE samples.
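One simple way to express the concordance between two lists of differentially expressed genes, such as those reported by edgeR and DESeq2, is the overlap of the two gene sets. The sketch below is illustrative only and is not the authors' method: the function name `deg_concordance`, the use of the Jaccard index as the concordance measure, and the gene symbols are all assumptions introduced here, and the example values are not results from the study.

```python
def deg_concordance(list_a, list_b):
    """Compare two differentially expressed gene lists.

    Returns the shared genes, the genes unique to each list, and the
    Jaccard index (size of intersection divided by size of union).
    """
    a, b = set(list_a), set(list_b)
    shared = a & b
    union = a | b
    # Assumption: two empty lists are treated as fully concordant.
    jaccard = len(shared) / len(union) if union else 1.0
    return {
        "shared": sorted(shared),
        "only_a": sorted(a - b),
        "only_b": sorted(b - a),
        "jaccard": jaccard,
    }


# Hypothetical gene symbols, for illustration only (not data from the study).
edger_genes = ["ESR1", "ERBB2", "MKI67"]
deseq2_genes = ["ESR1", "ERBB2", "MKI67", "TP53"]

result = deg_concordance(edger_genes, deseq2_genes)
print(result["only_b"])   # genes called only in the second list
print(result["jaccard"])  # 3 shared genes out of 4 in the union: 0.75
```

In this toy input the first list is a strict subset of the second, which mirrors the situation where one tool returns a shorter, more conservative list that is largely contained in the other tool's list.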

Keywords

breast neoplasms, ductal carcinoma in situ (DCIS), gene expression profiling, high-throughput nucleotide sequencing, infiltrating ductal carcinoma (IDC), paraffin embedding, sequence alignment, transcriptome

Subject

Medicine and Pharmacology, Oncology and Oncogenics
