Beyond Random Splits: A Critical Evaluation of Graph Learning Models in Predicting Mutation-Induced Drug Resistance

Zongrui Cheng; Haoxin Wu; Dengming Ming

doi:10.20944/preprints202604.0069.v1

Submitted: 31 March 2026

Posted: 02 April 2026


Abstract

Background: Deep learning has become an important tool for predicting mutation-induced changes in binding free energy (ΔΔG). However, most current state-of-the-art methods rely heavily on paired wild-type (WT) and mutant (MT) complex structures during both training and inference. This dependence on post-mutation structural information substantially limits their practical utility in real-world scenarios such as clinical diagnosis and early-stage drug screening, where mutant structures are difficult to obtain experimentally in a timely manner. Methods: To evaluate model performance in more realistic and challenging translational settings, we conducted a systematic benchmark of graph-based deep learning models under a WT-only inductive setting. We constructed a full-protein heterogeneous graph framework that incorporates long-range spatial constraints to implicitly infer mutational effects from static wild-type structures, and compared it against a sequence-based vector baseline model. Results: A systematic evaluation on the MdrDB dataset revealed a critical generalization gap. Random splitting yielded relatively high predictive correlation (Pearson R ≈ 0.55) due to homologous data leakage, but performance dropped sharply (Pearson R ≈ 0.15) under a strict UniProt-based cross-protein split designed to simulate prediction on truly unseen targets. Although absolute performance remained limited, the graph-based model showed a weak but consistent improvement over the sequence baseline, whose performance was close to random guessing (Pearson R ≈ 0.04). Conclusions: Further analyses suggest that the performance bottleneck may arise partly from intrinsic experimental noise in the dataset (i.e., label inconsistency) and partly from the absence of dynamic (conformational entropy) information in static WT structures. This study indicates that random splitting can lead to substantial overestimation of model generalizability. It highlights the need to integrate physical priors and dynamic features to overcome the current limitations of drug resistance prediction when explicit mutant structures are unavailable.
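The difference between the two evaluation protocols named in the abstract can be illustrated with a short sketch. It assumes each mutation record is a dict with hypothetical `uniprot` (protein accession) and `ddg` keys; the function names and the greedy group assignment are illustrative choices, not the authors' pipeline. A random split shuffles individual records, so the same protein can appear on both sides. A UniProt-based split assigns whole accessions to one side, so every test protein is unseen during training. A plain Pearson R helper is included because that is the metric the abstract reports.

```python
import random
from collections import defaultdict


def random_split(records, test_frac=0.2, seed=0):
    """Shuffle individual mutation records; one protein may land in both sets."""
    rng = random.Random(seed)
    idx = list(range(len(records)))
    rng.shuffle(idx)
    n_test = int(round(len(idx) * test_frac))
    test = [records[i] for i in idx[:n_test]]
    train = [records[i] for i in idx[n_test:]]
    return train, test


def uniprot_split(records, test_frac=0.2, seed=0):
    """Assign whole UniProt accessions (hypothetical 'uniprot' key) to one side only."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec["uniprot"]].append(rec)
    accessions = sorted(groups)
    random.Random(seed).shuffle(accessions)
    target = test_frac * len(records)
    train, test = [], []
    # Greedily fill the test set with complete proteins until it reaches the target size.
    for acc in accessions:
        (test if len(test) < target else train).extend(groups[acc])
    return train, test


def protein_overlap(train, test):
    """Accessions present in both sets: the source of homologous leakage."""
    return {r["uniprot"] for r in train} & {r["uniprot"] for r in test}


def pearson_r(xs, ys):
    """Pearson correlation between predicted and measured ddG values."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


if __name__ == "__main__":
    # Toy data: 10 hypothetical proteins with 5 mutations each.
    records = [{"uniprot": f"PROT{p}", "ddg": float(m)} for p in range(10) for m in range(5)]
    for name, split in [("random", random_split), ("uniprot", uniprot_split)]:
        train, test = split(records)
        print(name, "shared proteins:", len(protein_overlap(train, test)))
```

On such toy data the random split typically leaves many proteins shared between training and test sets, whereas the grouped split leaves none. A model scored on the random split can therefore benefit from having already seen close relatives of the test mutations, which is the leakage effect the abstract describes.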

Keywords: drug resistance prediction; graph neural networks (GNN); inductive generalization; data leakage; ΔΔG prediction

Subject: Biology and Life Sciences - Biochemistry and Molecular Biology

Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
