Advances in Molecular Docking Methodologies

Ekaterina Grigorenko; Alexander Novikov

doi:10.20944/preprints202604.1840.v1

Submitted: 24 April 2026

Posted: 27 April 2026


Abstract

Computer-aided drug design (CADD) is undergoing a fundamental paradigm shift driven by the transition from classical biophysical methods to deep learning architectures and generative artificial intelligence. This review analyzes the evolution of molecular docking algorithms. We examine traditional programs (AutoDock Vina, Glide, GOLD) based on stochastic conformational search and empirical scoring functions, which retain the status of gold standard due to the high physical validity of the generated predictions. Software solutions for high-throughput virtual screening, such as distributed pipelines like EasyDock and graphical interfaces like EasyDockVina, are analyzed. Particular attention is paid to the latest generative AI models (DiffDock, GNINA, AlphaFold 3, DynamicBind, FABFlex), which address the computational challenges of blind docking and macromolecular receptor flexibility. We assess the systemic crisis of neural network generalization ability identified in independent benchmarks (PoseBusters, Bento, NextTopDocker) and substantiate the need to integrate the laws of molecular physics into the latent spaces of models. We conclude that the formation of hybrid pipelines, combining the speed of AI with the rigor of classical mechanics, is a necessary development.

Keywords: molecular docking; virtual screening; deep learning; generative AI; AutoDock Vina; AlphaFold 3; receptor flexibility; structure-based drug design; physical plausibility; computer-aided drug design

Subject: Chemistry and Materials Science - Other

1. Introduction

The process of developing new pharmacological agents has historically been associated with colossal time, computational, and financial costs. Modern scientific and industrial statistics show that the development of a single therapeutic agent, from the initial identification of the biological target to successful market launch, takes an average of over a decade and requires investments exceeding one billion US dollars. [1] In the early stages of this lengthy cycle, specifically in hit discovery and lead optimization, virtual screening (VS) plays a crucial and irreplaceable role. [2] This process represents a large-scale computational filtration of giant chemical compound libraries, ranging from hundreds of thousands to tens of billions of molecules, to identify unique structures with high thermodynamic affinity for a specific therapeutic protein target.

Molecular docking is the central computational method in structure-based drug design (SBDD). [3] The fundamental physicochemical task of molecular docking is the algorithmic prediction of the optimal three-dimensional conformation and spatial orientation (pose) of a small organic molecule (ligand) within the active or allosteric binding site of a macromolecule (receptor). Beyond the purely geometric problem, docking requires an accurate quantitative assessment of the thermodynamic favorability of forming such a protein-ligand complex. [4] For over thirty years, this complex multidimensional computational problem was solved almost exclusively by methods of classical computational physics and empirical quantum chemistry. The traditional software architecture strictly divided the computational process into two independent stages: a conformational search algorithm for finding the global energy minimum, and the application of a mathematical scoring function for the final ranking of the found configurations.

However, these classical biophysical approaches inevitably encountered fundamental computational barriers. The exponential growth in the number of degrees of freedom when accounting for the natural mobility of the protein chain (conformational receptor flexibility) made physically accurate simulations impractical for large-scale screening. [3] Algorithms were forced to make drastic simplifications: treat the protein as a completely rigid body, ignore quantum-chemical polarization effects, and neglect changes in system entropy.

The technological breakthrough in artificial intelligence (AI), marked by the success of the AlphaFold 2 system, [5] triggered rapid development of deep learning (DL) architectures for predicting intermolecular interactions. Between 2024 and 2026, the scientific community observed a pronounced shift from classical models to generative AI methods. These models can approximate fundamental physicochemical laws directly from large arrays of structural data, generating highly accurate binding poses in fractions of a second while simultaneously addressing the long-standing problem of accounting for protein dynamics. [6] This review provides a comprehensive analysis of this technological transition.

2. Materials and Methods

This review was conducted in accordance with the PRISMA-ScR (Preferred Reporting Items for Systematic reviews and Meta-Analyses extension for Scoping Reviews) guidelines. [7] The review protocol was not registered.

2.1. Research Questions

The aim of this work was to map the current landscape of molecular docking tools, trace the evolution from classical biophysical algorithms to generative artificial intelligence methods, and identify key unresolved issues. To achieve this aim, the following research questions were formulated:

What are the main classes of molecular docking algorithms currently in existence, and what is their architectural evolution?
What are the comparative characteristics of accuracy, computational scalability, and physical validity of classical and AI-oriented tools according to independent benchmarks?
What infrastructure solutions (pipelines, graphical interfaces) have been developed for high-throughput virtual screening, and what is their efficiency?
What approaches are used to model conformational receptor flexibility and perform blind docking?
What are the current trends and predicted directions for the development of hybrid pipelines that combine classical force fields and neural network models?

2.2. Information Sources and Search Strategy

The literature search was conducted from November 2025 to February 2026 in the following electronic databases: PubMed, Scopus, Web of Science, arXiv.org, ChemRxiv, and the Google Scholar search engine. Additionally, the bibliographies of selected publications and software repositories (GitHub, Zenodo) were analyzed to identify tools not described in peer-reviewed articles.

The search strategy included combinations of keywords and their synonyms in English (as the vast majority of publications on the topic are in English). The following search queries were used:

(“molecular docking” OR “protein-ligand docking”) AND (“virtual screening” OR “high-throughput screening”)
(“docking software” OR “docking tools”) AND (“AutoDock Vina” OR “Glide” OR “GOLD”)
(“deep learning” OR “machine learning” OR “generative AI”) AND (“molecular docking” OR “drug design”)
(“DiffDock” OR “GNINA” OR “AlphaFold 3” OR “DynamicBind” OR “FABFlex”) AND (“docking” OR “binding pose”)
(“benchmark” OR “PoseBusters” OR “comparative assessment”) AND (“docking accuracy” OR “scoring function”)
(“receptor flexibility” OR “induced fit” OR “blind docking”) AND (“molecular docking”)
(“pipeline” OR “workflow” OR “high-throughput”) AND (“EasyDock” OR “virtual screening”)

The search was limited to publications released from 2015 onwards to cover the period of active integration of machine learning into docking. The languages of publication were English and Russian (for describing tools developed in Russian-speaking laboratories).

2.3. Selection Criteria

The following types of publications were included:

Original research articles describing new algorithms, programs, or web servers for molecular docking.
Review articles that systematically analyze docking methods.
Preprints (arXiv, ChemRxiv, bioRxiv) if they contained a description of a tool or benchmark that, at the time of the search, did not have a peer-reviewed version.
Conference proceedings (ICLR, NeurIPS, MLSB) describing new models.
Software documentation and repositories to clarify technical specifications.

Excluded were:

Publications focused solely on the application of existing tools for a specific pharmacological problem (case studies) without methodological innovations.
Conference abstracts of less than 2 pages.
Works that did not contain information on evaluating the accuracy or speed of the tool.

2.4. Selection Process and Data Extraction

The selection of publications was conducted in two stages. At the first stage, titles and abstracts of all identified records were screened. At the second stage, the full texts of potentially relevant articles were analyzed.

For each publication included in the final list, structured data extraction was performed. The recorded parameters were: name of the software tool or algorithm; year of first publication; methodology type (classical, hybrid, generative); availability of public source code; method for accounting for receptor conformational mobility; scoring function architecture; reported efficiency (accuracy in reproducing experimental poses and computational performance); and the results of third-party comparative tests (benchmarks), if available in the text.

2.5. Synthesis of Results

The obtained data were grouped into thematic categories corresponding to the structure of the review: classical platforms, infrastructure solutions, hybrid approaches with machine learning, generative models, methods for accounting for receptor flexibility, and specialized modalities (covalent docking, peptides, etc.). For each tool or approach, a qualitative comparison was made based on the criteria of accuracy, physical realism, and computational efficiency, using data from original sources and independent benchmarks (PoseBusters, Bento, NextTopDocker). Particular attention was paid to analyzing limitations and identifying unresolved problems.

3. Classical Docking Platforms

Despite the active and widespread integration of neural networks, classical software packages retain their status as the industry standard. This is due to their exceptional interpretability, predictable algorithmic reliability, and strict adherence to the fundamental laws of stereochemistry.

3.1. AutoDock Vina and Empirical Approaches

The AutoDock Vina software suite is historically one of the most cited and methodologically significant tools in academia. [8] Vina’s key technological advantage is its computational efficiency: the program computes its internal potential grids “on the fly,” which sharply reduces preparation time for virtual screening. Vina’s conformational search is an iterated local search that combines stochastic perturbation of the pose with local optimization by the quasi-Newton BFGS method. [9] Although AutoDock Vina consistently reproduces geometrically correct poses with high accuracy, its linear scoring function is regularly criticized for failing to rank compounds correctly by true thermodynamic affinity in difficult cases.
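The search-then-score scheme described above can be illustrated with a minimal Python sketch. It is not Vina's implementation: the one-dimensional "pose", the two toy energy terms, their weights, and the finite-difference gradient step that stands in for BFGS are all assumptions chosen for brevity.

```python
import math
import random

# Hypothetical weights of a linear (weighted-sum) scoring function.
WEIGHTS = {"attraction": -1.0, "repulsion": 0.5}

def score(x):
    """Toy score of a 1D pose coordinate x: a weighted sum of two terms."""
    attraction = math.exp(-(x - 2.0) ** 2) + 0.6 * math.exp(-(x + 1.5) ** 2)
    repulsion = 0.01 * x * x
    return WEIGHTS["attraction"] * attraction + WEIGHTS["repulsion"] * repulsion

def local_minimize(x, step=0.05, iters=200, h=1e-5):
    """Plain gradient descent with a finite-difference gradient.
    This is a simple stand-in for a quasi-Newton optimizer such as BFGS."""
    for _ in range(iters):
        grad = (score(x + h) - score(x - h)) / (2 * h)
        x -= step * grad
    return x

def iterated_local_search(n_steps=100, temperature=0.3, seed=0):
    """Alternate random perturbations with local refinement and keep the best pose."""
    rng = random.Random(seed)
    current = local_minimize(rng.uniform(-4.0, 4.0))
    best = current
    for _ in range(n_steps):
        candidate = local_minimize(current + rng.gauss(0.0, 1.5))
        delta = score(candidate) - score(current)
        # Metropolis criterion: accept improvements, occasionally accept worse poses.
        if delta < 0 or rng.random() < math.exp(-delta / temperature):
            current = candidate
        if score(current) < score(best):
            best = current
    return best, score(best)
```

The toy landscape has a shallow well near x = -1.5 and a deeper one near x = 2; the random perturbations are what allow the search to leave the shallow well, while the local optimizer only refines whichever well it starts in.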

3.2. Commercial Solutions: Glide and GOLD

In the corporate sector, proprietary platforms dominate. The Glide program from Schrödinger is based on the concept of hierarchical funnels for the systematic pruning of unpromising regions of conformational space. [10] Glide’s main technological advantage is its Extra Precision (XP) mode, which includes advanced descriptors for modeling hydrophobic enclosure and the displacement of high-energy structural water molecules. [9]

In turn, the GOLD package is unique in its approach based on genetic algorithms (GA). Translation, rotation, and torsion angle parameters are encoded as “chromosomes” that evolve to minimize a fitness function. [11] The tool stands out for its exceptional ability to predict complex multicentric hydrogen bond networks.
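The chromosome encoding can be sketched with a toy genetic algorithm in Python. This is an illustration under stated assumptions, not GOLD's algorithm: the eight-gene layout, the TARGET pose, the squared-deviation fitness, and the truncation selection scheme are hypothetical choices.

```python
import random

# Hypothetical "ideal" pose: 3 translation, 3 rotation and 2 torsion genes.
TARGET = [1.0, -2.0, 0.5, 0.3, 1.2, -0.7, 2.0, -1.0]

def fitness(chromosome):
    """Toy fitness to minimize: squared deviation from the hypothetical target pose."""
    return sum((g - t) ** 2 for g, t in zip(chromosome, TARGET))

def crossover(a, b, rng):
    """One-point crossover: the head of parent a joined to the tail of parent b."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]

def mutate(chromosome, rng, rate=0.2, sigma=0.3):
    """Add Gaussian noise to each gene with probability `rate`."""
    return [g + rng.gauss(0.0, sigma) if rng.random() < rate else g
            for g in chromosome]

def evolve(pop_size=40, generations=150, seed=1):
    rng = random.Random(seed)
    population = [[rng.uniform(-3.0, 3.0) for _ in TARGET] for _ in range(pop_size)]
    initial_best = min(fitness(c) for c in population)
    for _ in range(generations):
        population.sort(key=fitness)
        parents = population[: pop_size // 2]  # truncation selection, keeps the elite
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children
    best = min(population, key=fitness)
    return initial_best, fitness(best), best
```

Because the better half of each generation is carried over unchanged, the best fitness can never get worse from one generation to the next.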

Comparative analysis shows that classical tools maintain an undeniable advantage over AI models in terms of physical validity. When tested on the rigorous PoseBusters dataset, classical programs generated chemically unrealistic structures in only 2-3% of cases, whereas many neural network models allowed atom collisions in an attempt to minimize the RMSD metric. [12]
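One of the physical-validity checks discussed here, the detection of overlapping atoms, can be sketched as follows. This is a simplified illustration, not the PoseBusters code: the element subset, the approximate van der Waals radii, and the 0.75 tolerance factor are assumptions, and real validity suites also check bond lengths, angles, ring planarity, and internal ligand clashes.

```python
import math

# Approximate van der Waals radii in angstroms (illustrative subset of elements).
VDW_RADII = {"H": 1.20, "C": 1.70, "N": 1.55, "O": 1.52, "S": 1.80}

def find_clashes(ligand, protein, tolerance=0.75):
    """Return (ligand_index, protein_index, distance) for every atom pair closer
    than tolerance * (r_i + r_j). Atoms are (element, (x, y, z)) tuples."""
    clashes = []
    for i, (elem_l, pos_l) in enumerate(ligand):
        for j, (elem_p, pos_p) in enumerate(protein):
            d = math.dist(pos_l, pos_p)
            if d < tolerance * (VDW_RADII[elem_l] + VDW_RADII[elem_p]):
                clashes.append((i, j, d))
    return clashes

def is_clash_free(ligand, protein, tolerance=0.75):
    """True when no ligand atom overlaps a protein atom."""
    return not find_clashes(ligand, protein, tolerance)
```

A pose can have a low RMSD to the crystal structure and still fail such a check, which is why RMSD alone does not guarantee a chemically meaningful result.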

4. Scalable Infrastructure Solutions for Screening

A key technological limitation of basic classical tools is the lack of built-in support for automated high-throughput virtual screening (HTVS). To systematically address the logistical problems when working with millions of molecules, powerful infrastructure pipelines were developed.

4.1. Local Interfaces: EasyDockVina

The EasyDockVina tool is a graphical user interface (GUI) created for local screening. [13] Its goal is to democratize computer-aided drug design for specialists without programming skills. The program reads more than 100 input file types and converts them into the .pdbqt format. It also automates batch structure preparation, applying standardized procedures for adding polar hydrogen atoms and assigning partial charges.

4.2. Distributed Computing: EasyDock

While EasyDockVina focuses on the simplicity of local use, the EasyDock software suite is a highly modular Python framework designed to automate molecular docking in scalable computing environments. EasyDock’s architecture allows deployment on clusters using the Dask parallel computing library, enabling operation without strict binding to traditional task schedulers like SLURM. The system core provides load distribution: EasyDock can dynamically distribute calculations across multiple nodes, ensuring high throughput for virtual screening.

On such clusters the pipeline scales well: docking 5,000 compounds takes only 22 minutes when the load is distributed across 640 cores. [14] The pipeline automates the entire processing cycle: generating 3D conformations from SMILES via RDKit, desalting, and generating tautomers for physiological pH 7.4. Fault tolerance is provided by atomically saving results to an SQLite database, which allows calculations to be resumed after a failure. A comparison of solutions for high-throughput screening is presented in Table 1.
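The resumability idea can be sketched with Python's standard sqlite3 module. This is not EasyDock's code or database schema: the results table, the dock placeholder, and the fail_after switch used to simulate a crash are hypothetical, and the sketch runs serially rather than through Dask.

```python
import sqlite3

def dock(smiles):
    """Hypothetical stand-in for a docking engine call; returns a fake score."""
    return -0.1 * len(smiles)

def init_db(conn):
    conn.execute(
        "CREATE TABLE IF NOT EXISTS results (smiles TEXT PRIMARY KEY, score REAL)"
    )
    conn.commit()

def run_screen(conn, library, fail_after=None):
    """Dock every molecule that is not yet stored in the database.
    Each result is committed in its own transaction, so an interrupted
    run keeps all finished work. Returns the number of molecules docked."""
    done = {row[0] for row in conn.execute("SELECT smiles FROM results")}
    processed = 0
    for smiles in library:
        if smiles in done:
            continue  # already docked in an earlier run
        if fail_after is not None and processed >= fail_after:
            raise RuntimeError("simulated node failure")
        score = dock(smiles)
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute("INSERT INTO results VALUES (?, ?)", (smiles, score))
        processed += 1
    return processed
```

Calling run_screen a second time on the same connection after a failure docks only the molecules that are still missing, which is the behavior the text describes as resumable calculation.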

5. Paradigm Shift: Machine Learning and Hybrid Architectures

Classical conformational search provides excellent physical realism of poses; however, empirical scoring functions often fail to predict affinity accurately because they neglect non-linear and quantum-mechanical effects.

5.1. GNINA and Convolutional Neural Networks

The GNINA software suite transfers advanced computer vision concepts to the field of structural biology. [15] GNINA uses a classical Markov chain Monte Carlo method with the empirical Vina scoring function for rapid generation of a pool of plausible conformations. Then, an ensemble of three-dimensional convolutional neural networks (3D CNN) comes into play. The three-dimensional space of the site is divided into voxels, where each atom type is assigned an independent “data channel.” The network extracts complex spatial interaction patterns, non-linearly modeling affinity. [15]
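The voxel representation can be illustrated with a short Python sketch. It is a simplification under stated assumptions, not GNINA's featurization: the three-element channel set, the 8 Å box, the 1 Å resolution, and the binary occupancy (instead of a smooth atomic density) are illustrative choices.

```python
ATOM_TYPES = ["C", "N", "O"]  # illustrative channels, not GNINA's atom typing scheme

def voxelize(atoms, center, box_size=8.0, resolution=1.0):
    """Map (element, (x, y, z)) atoms onto grid[channel][ix][iy][iz].
    Each atom type gets its own channel; a voxel is 1.0 if it contains an atom."""
    n = int(round(box_size / resolution))
    grid = [[[[0.0] * n for _ in range(n)] for _ in range(n)] for _ in ATOM_TYPES]
    origin = [c - box_size / 2.0 for c in center]
    for element, pos in atoms:
        if element not in ATOM_TYPES:
            continue  # atom type without a channel in this toy scheme
        idx = [int((p - o) // resolution) for p, o in zip(pos, origin)]
        if all(0 <= i < n for i in idx):  # skip atoms outside the box
            grid[ATOM_TYPES.index(element)][idx[0]][idx[1]][idx[2]] = 1.0
    return grid
```

The resulting four-dimensional array (channels by three spatial axes) has the same layout as a multi-channel 3D image, which is what allows convolutional filters to be applied to it.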

5.2. ArtiDock and Optimization for Industrial Scales

To address the trade-off between accuracy and speed in ultra-large industrial tasks, the ArtiDock platform was developed. [16] Trained on the novel PLINDER dataset, ArtiDock uses ML algorithms for pocket-specific docking. When tested on realistic scenarios—using apo protein structures and in the presence of water molecules in the binding site—ArtiDock demonstrated pose prediction accuracy surpassing classical programs by 29-38%. [16]

6. Generative Artificial Intelligence and End-to-End Modeling

A class of models has emerged that predicts the atomic coordinates of the complex directly, end-to-end, bypassing the stage of stochastic conformational search.

6.1. Diffusion Generative Models: DiffDock

A fundamental shift was initiated by the emergence of the DiffDock architecture. [17] DiffDock was the first to apply score-based diffusion models to the docking problem. During training, the ligand pose (its translation, rotation, and torsion angles) is progressively perturbed with random noise. An equivariant graph neural network learns the corresponding denoising score function, which at each step of the reverse process predicts the direction that moves the ligand back toward a plausible pose. [17] A crucial feature is the ability to perform blind docking: the model analyzes the entire protein surface and identifies the binding site without a predefined pocket.
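The noising and denoising loop can be illustrated with a toy Python sketch. It is not DiffDock: the pose is reduced to a three-dimensional position, TARGET_POSE and the noise schedule are hypothetical, and the score is written in closed form for a single known target, whereas in a real model a neural network has to learn it from data.

```python
import math
import random

TARGET_POSE = [1.0, -2.0, 0.5]  # hypothetical reference ligand position

def noise_schedule(sigma_max=5.0, sigma_min=0.01, steps=50):
    """Geometric sequence of noise levels from sigma_max down to sigma_min."""
    ratio = (sigma_min / sigma_max) ** (1.0 / (steps - 1))
    return [sigma_max * ratio ** i for i in range(steps)]

def forward_noise(pose, sigma, rng):
    """Forward process: perturb a clean pose with Gaussian noise of scale sigma."""
    return [p + sigma * rng.gauss(0.0, 1.0) for p in pose]

def score(x, sigma):
    """Exact score (gradient of the log-density) of N(TARGET_POSE, sigma^2 I).
    A trained network would approximate this function."""
    return [-(xi - ti) / sigma ** 2 for xi, ti in zip(x, TARGET_POSE)]

def reverse_sample(rng):
    """Start from pure noise and follow the score through decreasing noise levels."""
    sigmas = noise_schedule()
    x = [sigmas[0] * rng.gauss(0.0, 1.0) for _ in TARGET_POSE]
    for s_now, s_next in zip(sigmas[:-1], sigmas[1:]):
        dvar = s_now ** 2 - s_next ** 2  # variance removed in this step
        drift = score(x, s_now)
        x = [xi + dvar * di + math.sqrt(dvar) * rng.gauss(0.0, 1.0)
             for xi, di in zip(x, drift)]
    return x
```

Each reverse step moves the sample along the score and re-injects a smaller amount of noise, so the sample contracts toward the target as the noise level decreases.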

6.2. Flow Matching and Geometric Innovations

Models based on Riemannian flow matching have become an alternative to diffusion. They construct direct deterministic trajectories between the initial distribution of unbound molecules and the target complex conformation, which gives smooth convergence and top-tier results in virtual screening tasks. [6]
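The idea of a direct deterministic trajectory can be sketched in a few lines of Python. The sketch assumes flat Euclidean space and a straight-line path, which is a deliberate simplification of the Riemannian setting mentioned above; the function names are illustrative, and the velocity field is supplied analytically where a real model would use a trained network.

```python
def interpolate(x0, x1, t):
    """Point on the straight path x_t = (1 - t) * x0 + t * x1."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]

def target_velocity(x0, x1):
    """Regression target for the learned vector field along that path: x1 - x0."""
    return [b - a for a, b in zip(x0, x1)]

def integrate(x0, velocity_fn, steps=10):
    """Deterministic Euler integration of dx/dt = v(x, t) from t = 0 to t = 1."""
    x, dt = list(x0), 1.0 / steps
    for k in range(steps):
        v = velocity_fn(x, k * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x
```

With the exact constant velocity, integrating from the unbound position arrives at the bound position without any injected noise, which is the main contrast with the stochastic reverse process of diffusion models.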

7. Overcoming the Problem of Receptor Flexibility

Under physiological conditions, ligand binding induces conformational changes in the protein (induced fit). Ignoring these changes led to the failure of classical programs in cross-docking scenarios.

7.1. Backbone Transformation and DynamicBind

A breakthrough in modeling large-scale protein flexibility, including the backbone, was the DynamicBind system. [18] The method uses equivariant diffusion networks to simulate transformations between the thermodynamic states of the protein (from the apo to the holo form). A unique property of DynamicBind is its ability to detect hidden (cryptic) allosteric pockets that open only upon ligand interaction. [18]

7.2. High-Speed Alternatives: FABFlex

For tasks requiring greater computational speed, the AI model FABFlex (2025) was developed. By separating the process into pocket prediction, ligand docking, and pocket adaptation, FABFlex operates 208 times faster than DynamicBind while providing superior prediction accuracy. [19]

8. Co-Folding: Ab Initio Prediction of Macromolecular Complexes

Co-folding algorithms form an emerging direction: they predict the 3D structure of a complex starting solely from the linear amino acid sequence of the target and the SMILES string of the ligand. The flagship model is AlphaFold 3 (AF3) [20]. Its diffusion-based structure module iteratively refines the coordinates of all atoms in the system (proteins, nucleic acids, small molecules, ions). However, the high computational cost makes such models unsuitable for mass screening, relegating them to the role of a precision tool for the hit-to-lead optimization stage.

9. Specialized Therapeutic Modalities

9.1. Covalent Docking

Covalent inhibitors form a chemical bond with the target, which often makes target engagement irreversible. CarsiDock-Cov is described as the first DL-guided platform for covalent docking: the network predicts interatomic distance matrices, which are then converted into a 3D pose of the covalent complex. [21]

9.2. Peptides, Metalloproteins, and Macrocycles

For docking highly flexible polypeptide chains, AI models based on SE(3)-equivariant diffusion (DiffPepDock) doubled prediction accuracy and enabled de novo peptide design. [22] For metalloproteins, the MetalloDock tool integrates autoregressive spatial decoding to reconstruct the complex geometry of metal coordination. [23] For macrocycles, reinforcement learning algorithms (Macro-Hop) are applied to handle the conformational flexibility of large rings. [24]

10. Discussion: Independent Benchmarking and the Neural Network Crisis

Claims of the complete superiority of AI models over traditional programs were critically re-evaluated during rigorous benchmarks, revealing fundamental vulnerabilities.

Neural networks tend to overfit. The Bento benchmark demonstrated that the accuracy of pure AI models drops sharply when they are presented with unseen pocket structures. [25] Evaluation based solely on root mean square deviation (RMSD) proved insufficient: testing showed that generative diffusion models often produce physically impossible structures in which atoms interpenetrate, ignoring van der Waals radii. The NextTopDocker study confirmed that classical logistic regression often outperforms the raw predictions of the most complex end-to-end AI frameworks, which therefore require post-processing with force fields. [26]
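The difference between an RMSD-only metric and a metric that also requires physical validity can be made concrete with a small Python sketch. The function names and input format are illustrative assumptions; the RMSD here is computed over pre-matched atoms without alignment or symmetry correction, and the 2 Å cutoff is the threshold commonly used in docking benchmarks.

```python
import math

def rmsd(pred, ref):
    """Root mean square deviation between matched atom coordinates
    (no superposition and no symmetry correction in this sketch)."""
    sq = sum(math.dist(p, r) ** 2 for p, r in zip(pred, ref))
    return math.sqrt(sq / len(ref))

def success_rates(results, cutoff=2.0):
    """results: list of (rmsd_value, is_physically_valid) pairs, one per complex.
    Returns (fraction with RMSD <= cutoff,
             fraction with RMSD <= cutoff and a physically valid pose)."""
    n = len(results)
    accurate = sum(1 for r, _ in results if r <= cutoff)
    accurate_valid = sum(1 for r, ok in results if r <= cutoff and ok)
    return accurate / n, accurate_valid / n
```

The gap between the two returned fractions is the share of poses that look correct by RMSD but would be rejected by a validity check; the stricter metric can only lower the reported success rate, never raise it.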

11. Future Research Perspectives

The dichotomy of “Classical Methods vs. AI” is recognized as false. Pure AI models suffer from unrealistic chemistry, while classical algorithms lack speed. [6] The optimal pipeline of the future will be constructed as a deeply integrated process:

Use of AI for full-atom modeling of receptor flexibility.
Sieving of colossal libraries through distributed cluster pipelines with basic physical evaluation.
Hybrid docking combining diffusion models with local optimization.
Rescoring with an ensemble of 3D-CNNs followed by physical minimization.
Application of co-folding models to the final group of ligands.
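The filtering stages of such a pipeline can be sketched as a generic screening funnel in Python. This is a structural illustration only: the funnel function, the stage names, and the placeholder scorers are hypothetical, and in a real pipeline each scorer would call a docking engine, a neural network, or a co-folding model.

```python
def funnel(library, stages):
    """Apply (name, score_fn, keep) stages in order. Each stage ranks the
    surviving molecules by score_fn (lower is better) and keeps the best
    `keep` of them. Returns the final survivors and a per-stage count."""
    survivors = list(library)
    history = []
    for name, score_fn, keep in stages:
        survivors = sorted(survivors, key=score_fn)[:keep]
        history.append((name, len(survivors)))
    return survivors, history

# Hypothetical placeholder scorers operating on SMILES strings.
def fast_physics_score(mol):
    return -len(mol)

def hybrid_docking_score(mol):
    return -mol.count("N")

def cnn_rescoring_score(mol):
    return -mol.count("O")

STAGES = [
    ("fast physical evaluation", fast_physics_score, 4),
    ("hybrid docking", hybrid_docking_score, 2),
    ("CNN rescoring", cnn_rescoring_score, 1),
]
```

The cheap stage runs on the whole library and the expensive stages only on the survivors, so the cost of each method is matched to the number of molecules it has to process.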

12. Conclusions

This review has fulfilled its stated aim: mapping the current landscape of molecular docking tools and analyzing their evolution. Based on a systematic analysis of the literature, answers were obtained for all the formulated research questions.

A classification of docking methods is proposed, reflecting their architectural evolution. Three main generations of tools are identified: (i) classical biophysical platforms (AutoDock Vina, Glide, GOLD), based on stochastic search and empirical scoring functions; (ii) hybrid approaches integrating machine learning for rescoring or pose refinement (GNINA, ArtiDock); (iii) end-to-end generative models (DiffDock, AlphaFold 3, DynamicBind) that predict complex structure directly, without iterative conformational search.
Analysis of independent benchmarks (PoseBusters, Bento, NextTopDocker) allowed for a balanced assessment of the capabilities of different approaches. It is shown that claims of the total superiority of AI models are premature. Classical algorithms maintain leadership in terms of physical validity (<3% incorrect structures), whereas generative models, while minimizing RMSD, often generate stereochemically impossible poses with atom collisions. At the same time, deep learning methods demonstrate unprecedented speed and the ability to account for macroscopic receptor flexibility, confirming their high potential when integrated correctly.
The review revealed that specialized pipelines have been developed to address the challenge of high-throughput screening. It is shown that graphical interfaces (EasyDockVina) democratize access to docking for researchers without programming skills, while distributed frameworks (EasyDock) ensure unprecedented scalability (docking thousands of compounds in minutes on cluster architectures), automating the entire data processing cycle—from 3D structure generation to ensuring fault tolerance.
Approaches to solving the fundamental problem of protein flexibility were analyzed in detail. It is demonstrated that the latest diffusion models (DynamicBind, FABFlex) overcome the limitations of classical “induced fit” modeling, allowing simulation of protein backbone transformations and detection of cryptic pockets. At the same time, a trade-off between accuracy (DynamicBind) and computational speed (FABFlex) was identified, which determines the choice of tool depending on the scale of the task.
The main result of this work is the substantiation of the thesis regarding the inevitability of forming hybrid pipelines. The “classical vs. AI” dichotomy is recognized as false. The optimal future strategy lies in deep integration, where generative models provide rapid hypothesis generation and account for conformational mobility, classical force fields ensure physical realism and correct neural network artifacts, and distributed infrastructures (EasyDock) enable scaling this process to billion-sized libraries.

Thus, the analysis confirms that the synergy of proven biophysical methods and advanced generative artificial intelligence technologies is the main path for the development of computer-aided drug design, capable of substantially increasing the efficiency of translating in silico discoveries into clinical practice.

Author Contributions

Conceptualization, E.G. and A.N.; methodology, E.G.; formal analysis, E.G.; data curation, E.G.; writing—original draft preparation, E.G.; writing—review and editing, A.N.; visualization, E.G. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Conflicts of Interest

The authors declare no conflict of interest.

References

DiMasi, J.A.; Grabowski, H.G.; Hansen, R.W. Innovation in the pharmaceutical industry: New estimates of R&D costs. J. Health Econ. 2016, 47, 20–33.
Meng, X.Y.; Zhang, H.X.; Mezei, M.; Cui, M. Molecular Docking: A Powerful Approach for Structure-Based Drug Discovery. Curr. Comput. Aided Drug Des. 2011, 7, 146–157.
Paggi, J.M.; Pandit, A.; Dror, R.O. The Art and Science of Molecular Docking. Annu. Rev. Biochem. 2024, 93, 389–410.
Oliveira, A.S. Bridging traditional and contemporary approaches in computational medicinal chemistry: opportunities for innovation in drug discovery. RSC Med. Chem. 2025, 16, 5953–5963.
Jumper, J.; Evans, R.; Pritzel, A.; et al. Highly accurate protein structure prediction with AlphaFold. Nature 2021, 596, 583–589.
Li, Y.; Yi, J.; Li, H.; et al. Decoding the limits of deep learning in molecular docking for drug discovery. Chem. Sci. 2025, 16, 17374–17390.
Tricco, A.C.; Lillie, E.; Zarin, W.; et al. PRISMA Extension for Scoping Reviews (PRISMA-ScR): Checklist and Explanation. Ann. Intern. Med. 2018, 169, 467–473.
Trott, O.; Olson, A.J. AutoDock Vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading. J. Comput. Chem. 2010, 31, 455–461.
Friesner, R.A.; Banks, J.L.; Murphy, R.B.; et al. Glide: a new approach for rapid, accurate docking and scoring. 1. Method and assessment of docking accuracy. J. Med. Chem. 2004, 47, 1739–1749.
Jones, G.; Willett, P.; Glen, R.C.; Leach, A.R.; Taylor, R. Development and validation of a genetic algorithm for flexible docking. J. Mol. Biol. 1997, 267, 727–748.
Buttenschoen, M.; Morris, G.M.; Deane, C.M. PoseBusters: AI-based docking methods fail to generate physically valid poses or generalise to novel sequences. Chem. Sci. 2024, 15, 3130–3139.
ElTijani, A.; Alsafi, M.Y.; Ahmed, A.F. EasyDockVina: Graphical Interface for Ligand Optimization and High Throughput Virtual Screening with Vina. Zenodo 2019, v2.2.
Minibaeva, G.; Ivanova, A.; Polishchuk, P. EasyDock: customizable and scalable docking tool. J. Cheminform. 2023, 15, 102.
McNutt, A.T.; Li, Y.; Meli, R.; Aggarwal, R.; Koes, D.R. GNINA 1.3: the next increment in molecular docking with deep learning. J. Cheminform. 2025, 17, 28.
Voitsitskyi, T.; Koleiev, I.; Stratiichuk, R.; et al. ArtiDock: Accurate Machine Learning Approach to Protein–Ligand Docking Optimized for High-Throughput Virtual Screening. J. Chem. Inf. Model. 2026, 66, 1–15.
Corso, G.; Stärk, H.; Jing, B.; Barzilay, R.; Jaakkola, T. DiffDock: Diffusion Steps, Twists, and Turns for Molecular Docking. In Proceedings of the International Conference on Learning Representations (ICLR), Kigali, Rwanda, 1–5 May 2023.
Lu, W.; Zhang, J.; Huang, W.; et al. DynamicBind: predicting ligand-specific protein-ligand complex structure with a deep equivariant generative model. Nat. Commun. 2024, 15, 1071.
Zhang, Z.; Wu, L.; Gao, K.; Yao, J.; Qin, T.; Han, B. Fast and Accurate Blind Flexible Docking. In Proceedings of the Thirteenth International Conference on Learning Representations (ICLR), Singapore, 24–28 April 2025.
Abramson, J.; Adler, J.; Dunger, J.; et al. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature 2024, 630, 493–500.
Shen, C.; Du, H.; Zhang, X.; et al. CarsiDock-Cov: A deep learning-guided approach for automated covalent docking and screening. Acta Pharm. Sin. B 2025, 15, 5758–5771.
Wang, Y.; Wang, F.; Feng, L.; Zhang, C.; Lai, L. DiffPepDock: Efficient protein–peptide docking and binder screening via SE(3)-equivariant diffusion. Protein Sci. 2025, 34, e70338.
Zhang, H.; Su, Q.; Zheng, Y.; et al. MetalloDock: Decoding Metalloprotein–Ligand Interactions via Physics-Aware Deep Learning for Metalloprotein Drug Discovery. J. Am. Chem. Soc. 2026.
Liang, H.; Huang, S.; Xu, X.; et al. Designing Macrocyclic Kinase Inhibitors Using Macrocycle Scaffold Hopping with Reinforced Learning (Macro-Hop). J. Med. Chem. 2025, 68, 6698–6717.
Pak, M.A.; Frolova, D.; Nikolenko, S.A.; et al. Bento: Benchmarking Classical and AI Docking on Drug Design–Relevant Data. bioRxiv 2025, 2025.12.30.696741.
Alcaide, E.; Gao, Z. NextTopDocker: the largest-to-date docking power benchmark reveals that deep learning performs generally much worse than logistic regression models. ChemRxiv 2024.

Table 1. Comparison of infrastructure solutions for high-throughput screening [13].

Characteristic	EasyDockVina (2019)	EasyDock (2023)
Interface	Graphical (GUI)	Command Line (CLI) / Python Module
Scalability	Local Workstation	Cluster Architecture (Dask, SSH)
Data Processing	File Conversion (.mol → .pdbqt)	3D Generation from SMILES, RDKit, Desalting
Chemical Management	Absent	Tautomer Generation, pH 7.4
Fault Tolerance	Basic (write to TXT/CSV)	Advanced (SQLite DB, Resumability)
Supported Engines	AutoDock Vina	Vina, Smina, GNINA, QVina
Specialized Functions	Standard Parameterization	Boron Compound Docking (Covalent Inhibitors)

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
