Preprint
Article

Effects of Data Quality and Quantity on Deep Learning for Protein-Ligand Binding Affinity Prediction

This version is not peer-reviewed.

Submitted: 16 February 2022

Posted: 16 February 2022

A peer-reviewed article of this preprint also exists.

Abstract
Prediction of protein-ligand binding affinities is crucial for computational drug discovery. A number of deep learning approaches have been developed in recent years to improve the accuracy of such affinity prediction. While the predictive power of these systems has advanced to varying degrees depending on the dataset used for model training and testing, the effects of the quality and quantity of the underlying data have not been thoroughly examined. In this study, we employed erroneous datasets and data subsets of different sizes, created from one of the largest databases of experimental binding affinities, to train and evaluate a deep learning system based on convolutional neural networks. Our results show that data quality and quantity have significant impacts on the performance of trained models. Depending on the variations in data quality and quantity, the performance discrepancies can be comparable to or even larger than those observed among different deep learning approaches. This implies that continued accumulation of high-quality affinity data, especially for proteins without any affinity data, is important for improving deep learning models to better predict protein-ligand binding affinities.
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.

Preprints.org is a free preprint server supported by MDPI in Basel, Switzerland.