Preprint Article, Version 1. This version is not peer-reviewed.

Assessing the Performances of Protein Function Prediction Algorithms from the Perspectives of Identification Accuracy and False Discovery Rate

Version 1 : Received: 19 November 2017 / Approved: 24 November 2017 / Online: 24 November 2017 (11:14:05 CET)

A peer-reviewed article of this Preprint also exists.

Yu, C.Y.; Li, X.X.; Yang, H.; Li, Y.H.; Xue, W.W.; Chen, Y.Z.; Tao, L.; Zhu, F. Assessing the Performances of Protein Function Prediction Algorithms from the Perspectives of Identification Accuracy and False Discovery Rate. Int. J. Mol. Sci. 2018, 19, 183.

Journal reference: Int. J. Mol. Sci. 2018, 19, 183
DOI: 10.3390/ijms19010183

Abstract

Knowledge of protein function is essential for studying biological processes, understanding disease mechanisms and exploring novel therapeutic targets. Apart from experimental methods, a number of in silico approaches have been developed and extensively used for protein function prediction. Among these approaches, BLAST predicts function based on protein sequence similarity, whereas machine learning predicts functional families from protein sequences irrespective of their similarity. Machine learning thus complements BLAST and other methods in predicting diverse classes of proteins, including distantly related proteins and homologous proteins of different functions. However, the identification accuracies and false discovery rates of these algorithms have not yet been assessed, which greatly limits their usage. Herein, a comprehensive comparison of the performance of four popular function prediction algorithms (BLAST, SVM, PNN and KNN) was conducted. In particular, the performance of these algorithms was systematically assessed by four metrics (sensitivity, specificity, accuracy and Matthews correlation coefficient) based on independent test datasets generated from 93 protein families defined by UniProtKB Keywords. Moreover, the false discovery rates of these algorithms were evaluated by scanning the genomes of four representative model species (Homo sapiens, Arabidopsis thaliana, Saccharomyces cerevisiae and Mycobacterium tuberculosis). As a result, substantially higher sensitivity and stability were observed for BLAST and SVM than for PNN and KNN. However, the machine learning algorithms (PNN, KNN and SVM) were found capable of significantly reducing the false discovery rate (SVM < PNN ≈ KNN).
In summary, this study comprehensively assessed the performance of four popular algorithms applied to protein function prediction, which could facilitate the selection of the most appropriate method in related biomedical research.
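
For readers unfamiliar with the four metrics named in the abstract, the following is a minimal sketch of their standard textbook definitions, computed from the counts of a binary confusion matrix. The function name, the zero-denominator handling and the example counts are illustrative assumptions. They are not taken from the study, whose exact evaluation pipeline is not described on this page.

```python
import math


def classification_metrics(tp, tn, fp, fn):
    """Return sensitivity, specificity, accuracy and MCC for one protein family.

    tp/fn: family members predicted as members / non-members.
    tn/fp: non-members predicted as non-members / members.
    Ratios with a zero denominator are reported as 0.0 (an assumption of
    this sketch, not a convention stated by the source).
    """
    total = tp + tn + fp + fn
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    accuracy = (tp + tn) / total if total else 0.0

    # Matthews correlation coefficient: ranges from -1 to +1 and stays
    # informative when members and non-members are heavily imbalanced.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


if __name__ == "__main__":
    # Hypothetical counts for a single family's independent test set.
    for name, value in classification_metrics(tp=90, tn=80, fp=20, fn=10).items():
        print(f"{name}: {value:.3f}")
```

Reporting all four values per family matters because accuracy alone can look high on an imbalanced test set, where non-members greatly outnumber members, even when few true members are recovered.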

Subject Areas

false discovery rate; machine learning; protein function prediction; support vector machine; BLAST
