Preprint
Article

This version is not peer-reviewed.

AmazonForest: In-silico Meta-Prediction of Pathogenic Variants

Submitted:

18 November 2020

Posted:

19 November 2020

You are already at the latest version

Abstract
ClinVar is a web platform that stores around 774k curated entries, which allows exploring genetic variants and their associations with complex phenotypes. A partial set of ClinVar’s genetic associations were reported with conflict of interpretation or uncertain clinical impact significance, which currently challenges clinicians and geneticists. Here, we evaluate the performance of data pre-processing methods combined with classical prediction methods, such as Naive Bayes, Random Forest, and Support Vector Machine to build a meta-prediction model aiming to improve genetic pathogenicity interpretation. Models were trained with ClinVar data (September 2020), and genetic variants were annotated with eight functional impact predictors catalogued with SnpEff/SnpSift (v4.3). A 10-fold cross-validation strategy was performed for evaluation by accuracy, F1-Score, Receiver Operating Characteristic, Area Under Curve. The best meta-prediction model raises by combining one-hot encoding with tree-based classifiers as Random Forest, which shows Area Under Curve ≥ 0,93. We predict pathogenicity for 109k genetic variants, which were found labeled as uncertain significance or conflict of interpretation. Additionally, we implemented AmazonForest (https://www.lghm.ufpa.br/amazonforest), a web tool to query data for a set of 5k variants that were predicted with high pathogenic probability (RFprob >= 0.9).
Keywords: 
;  ;  ;  ;  ;  ;  
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permit the free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
Prerpints.org logo

Preprints.org is a free preprint server supported by MDPI in Basel, Switzerland.

Subscribe

Disclaimer

Terms of Use

Privacy Policy

Privacy Settings

© 2025 MDPI (Basel, Switzerland) unless otherwise stated