Preprint Article Version 1 Preserved in Portico This version is not peer-reviewed

Annotation of Human Exome Gene Variants with Consensus Pathogenicity

Version 1 : Received: 29 July 2020 / Approved: 31 July 2020 / Online: 31 July 2020 (06:13:53 CEST)

A peer-reviewed article of this Preprint also exists.

Jaravine, V.; Balmford, J.; Metzger, P.; Boerries, M.; Binder, H.; Böker, M. Annotation of Human Exome Gene Variants with Consensus Pathogenicity. Genes 2020, 11, 1076.

Abstract

Pathogenicity is unknown for the majority of human gene variants. In silico approaches can be used to prioritize sequenced somatic and germline variants. In this study, 84 million non-synonymous Single Nucleotide Variants (SNVs) in the human coding genome were annotated using a consensus Variant Effect Prediction (cVEP) method. An algorithm, implemented as a stacked ensemble of supervised learners, combined 39 functional and conservation mutation impact scores from dbNSFP4.0. Adding a gene indispensability score, which accounts for differences in the pathogenicity of variants in essential and mutation-tolerant genes, improved the predictions. For each SNV, the consensus combination gives either a continuous-value pathogenicity score or a categorical score in five classes: pathogenic, likely pathogenic, uncertain significance, likely benign, and benign. The provided class database is intended for direct use in clinical practice. The trained prediction models were 5-fold cross-validated on the evidence-based categorical annotations from the ClinVar database. The individual scores were ranked by their ability to predict pathogenicity. A two-step strategy using the rankings, scores, and class annotations is suggested for filtering and prioritizing human exome mutations in clinical and biological applications of NGS technology.
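
As an illustration of how a continuous pathogenicity score relates to the five categorical classes, and of how the two might be used together for filtering and prioritization, here is a minimal Python sketch. It is not the authors' implementation: the cut points in `CUTS`, the function names `to_class` and `prioritize`, the variant identifiers, and the particular filter-then-rank reading of the two-step strategy are all assumptions made for illustration, since the abstract does not specify them.

```python
from bisect import bisect_right
from typing import Dict, List, Tuple

# Hypothetical cut points on a 0..1 score; the abstract does not give the real ones.
CUTS = [0.2, 0.4, 0.6, 0.8]
CLASSES = [
    "benign",
    "likely benign",
    "uncertain significance",
    "likely pathogenic",
    "pathogenic",
]


def to_class(score: float) -> str:
    """Map a continuous score in [0, 1] to one of the five classes."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie in [0, 1]")
    # bisect_right counts how many cut points are <= score (0..4).
    return CLASSES[bisect_right(CUTS, score)]


def prioritize(
    variants: Dict[str, float],
    keep: Tuple[str, ...] = ("pathogenic", "likely pathogenic"),
) -> List[Tuple[str, float, str]]:
    """Assumed two-step use: filter by class, then rank by score (highest first)."""
    annotated = [(vid, s, to_class(s)) for vid, s in variants.items()]
    kept = [v for v in annotated if v[2] in keep]
    return sorted(kept, key=lambda v: v[1], reverse=True)


# Made-up variant identifiers and scores, for demonstration only.
example = {"var_a": 0.93, "var_b": 0.15, "var_c": 0.71, "var_d": 0.50}
ranked = prioritize(example)
```

With these assumed cut points, `ranked` retains only `var_a` and `var_c`, ordered by decreasing score. In practice the thresholds and the filtering rule would need to follow the published method rather than this sketch.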

Keywords

Variant of Unknown Significance (VUS); Single-Nucleotide Variant (SNV); Variant Effect Prediction (VEP); Stacked Ensemble of Supervised Deep Learners (SESDL); Next Generation Sequencing (NGS); Alternative Allele Frequency (AAF).

Subject

Biology and Life Sciences, Biochemistry and Molecular Biology
