Submitted: 03 June 2026
Posted: 05 June 2026
Abstract
Keywords:
1. Introduction

2. Related Work
- Protein Structure Prediction and Representation Learning. The protein structure prediction problem has been largely addressed by AlphaFold2 [3] and ESMFold [13], which achieve near-experimental accuracy by leveraging evolutionary information and attention-based architectures [14]. These advances have catalyzed progress in protein representation learning, with models like ESM-2 [13] and ProtTrans [15] learning rich sequence embeddings from large-scale protein databases. Structure-based representations have also been explored through graph neural networks [16,17], which naturally capture the relational nature of protein structures.
- Generative Models for Protein Structure. Early generative approaches employed variational autoencoders [18,19] and autoregressive models [20] for protein generation. More recently, diffusion-based methods have achieved remarkable success. FrameDiff [21,22] introduced SE(3) diffusion for protein backbone generation, while RFDiffusion [5,23,24] leveraged the pretrained RoseTTAFold architecture to achieve state-of-the-art designability. Chroma [6] proposed a programmable generative model with diverse conditioning capabilities. Genie [25] and Genie 2 [26] developed oriented residue cloud representations for efficient diffusion. Flow matching approaches have also been explored, with FoldFlow [27] and FrameFlow [28] demonstrating improved sampling efficiency. Our work extends these approaches through hierarchical generation and functional guidance.
- Conditional Protein Generation. Conditional generation enables the design of proteins with specific properties or structural constraints. Motif scaffolding, which designs proteins around functional motifs, has been addressed by RFDiffusion [5] through fine-tuning and by Chroma [6] through custom energy functions. ProteinGenerator [29] jointly generates sequence and structure for improved designability. Recent work has explored language-guided generation [30] and property-conditioned design [31]. Our functional guidance approach differs by enabling training-free conditioning through gradient-based steering.
- SE(3)-Equivariant Neural Networks. Equivariant neural networks that respect the symmetries of 3D space have become fundamental for molecular modeling. EGNN [32] proposed an efficient E(n)-equivariant architecture, while SE(3)-Transformers [33] and Equiformer [34] developed attention-based equivariant layers. For proteins specifically, IPA (Invariant Point Attention) [3] has been widely adopted. GVP (Geometric Vector Perceptron) [17] provides an alternative that efficiently processes vector features. Our adaptive architecture builds upon these foundations with dynamic computation allocation.
3. Preliminaries
3.1. Protein Structure Representation
3.2. Flow Matching
3.3. SE(3) Flow Matching for Proteins
4. Method
4.1. Overview
4.2. Hierarchical Flow Matching
- Stage 1: Backbone Generation. The first stage generates residue frames from a prior distribution. Each frame is initialized on SE(3) by drawing its rotation from the uniform distribution on SO(3) and its translation from a centered Gaussian. The vector field network then predicts a frame update for each residue i, consisting of an angular velocity and a linear velocity.
- Stage 2: All-Atom Refinement. Given the generated backbone frames, the second stage generates all-atom coordinates conditioned on the backbone structure. We parameterize atoms relative to their residue frames: each atomic position is obtained by applying the frame of residue i to the atom's position in that residue's local frame.
- Conditioning Mechanism. The backbone information is injected into Stage 2 through three complementary mechanisms: (1) Frame embedding: Each residue’s frame is encoded into a 128-dimensional embedding via invariant features (inter-frame distances and relative orientations), concatenated with atom features; (2) Cross-attention: Atomic features attend to a sequence of backbone frame representations through multi-head cross-attention layers, enabling long-range backbone-atom communication; (3) Local geometric features: For each atom, we compute distances to the three backbone atoms (N, Cα, C) of its residue and neighboring residues, providing fine-grained positional context. The refinement network predicts displacements in local coordinates, ensuring SE(3) equivariance of the overall generation.
- Hierarchical Training. Both stages are trained independently with conditional flow matching objectives. The hierarchical decomposition reduces the effective dimensionality at each stage, enabling more efficient learning and sampling.
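The two parameterizations described in this subsection (prior frames in Stage 1, frame-relative atoms in Stage 2) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the standard deviation `sigma`, and the quaternion-based rotation sampler are all introduced here for illustration.

```python
import math
import random


def random_unit_quaternion(rng):
    """Draw a quaternion (w, x, y, z) uniformly from the unit 3-sphere.

    Normalizing four i.i.d. standard normals gives a uniform point on S^3,
    which corresponds to a uniformly distributed rotation in SO(3).
    """
    while True:
        q = [rng.gauss(0.0, 1.0) for _ in range(4)]
        norm = math.sqrt(sum(c * c for c in q))
        if norm > 1e-12:
            return [c / norm for c in q]


def quaternion_to_matrix(q):
    """Convert a unit quaternion (w, x, y, z) to a 3x3 rotation matrix."""
    w, x, y, z = q
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


def sample_prior_frames(num_residues, sigma=1.0, seed=0):
    """Sample one (rotation, translation) frame per residue.

    Rotations are uniform on SO(3); translations are zero-mean Gaussian.
    The scale `sigma` is an assumed parameter, not a value from the paper.
    """
    rng = random.Random(seed)
    frames = []
    for _ in range(num_residues):
        rotation = quaternion_to_matrix(random_unit_quaternion(rng))
        translation = [rng.gauss(0.0, sigma) for _ in range(3)]
        frames.append((rotation, translation))
    return frames


def local_to_global(frame, local_position):
    """Map an atom position from a residue's local frame to global coordinates."""
    rotation, translation = frame
    return [
        sum(rotation[r][c] * local_position[c] for c in range(3)) + translation[r]
        for r in range(3)
    ]
```

Because atoms are stored in local coordinates, applying a global rigid motion to every frame moves all atoms rigidly with it, which is the property the refinement stage relies on.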
4.3. Functional Guidance
- Guidance Mechanism. At each sampling step, we compute the gradient of the property score with respect to the current structure and add it to the flow velocity, scaled by a coefficient that controls the guidance strength. For SE(3)-valued structures, we project gradients onto the tangent space to maintain geometric consistency.
- Multi-Property Guidance. Multiple property predictors can be combined through a weighted sum of their scores. This enables simultaneous optimization of multiple objectives such as stability, binding affinity, and solubility.
- Practical Considerations. To ensure stable guidance, we clip gradients and anneal the guidance strength over the sampling trajectory. Specifically, the guidance strength follows a schedule whose annealing rate is controlled by γ (see Appendix E.2), applying stronger guidance early in sampling when the structure is more malleable.
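A guided sampling step of this kind can be sketched as follows. The sketch makes several assumptions that are not taken from the paper: the annealing schedule is written as an exponential decay `lambda0 * exp(-gamma * t)` (chosen only because it reduces to constant guidance at `gamma = 0`), the state is a plain Euclidean vector so the tangent-space projection for rotations is omitted, and all function and parameter names are hypothetical.

```python
import math


def clip_by_norm(vector, max_norm):
    """Rescale `vector` so its Euclidean norm does not exceed `max_norm`."""
    norm = math.sqrt(sum(v * v for v in vector))
    if norm <= max_norm or norm == 0.0:
        return list(vector)
    scale = max_norm / norm
    return [v * scale for v in vector]


def guidance_strength(t, lambda0, gamma):
    """Assumed annealing schedule: strongest at t = 0, decaying with rate gamma."""
    return lambda0 * math.exp(-gamma * t)


def guided_euler_step(x, t, dt, velocity_fn, score_grad_fns, weights,
                      lambda0=1.0, gamma=1.0, max_grad_norm=1.0):
    """One Euler step of the flow with property guidance added to the velocity.

    velocity_fn(x, t) returns the learned flow velocity; each entry of
    score_grad_fns returns the gradient of one property score at x.
    """
    velocity = velocity_fn(x, t)
    # Weighted sum of property gradients (multi-property guidance).
    grad = [0.0] * len(x)
    for weight, grad_fn in zip(weights, score_grad_fns):
        grad = [g + weight * gi for g, gi in zip(grad, grad_fn(x))]
    grad = clip_by_norm(grad, max_grad_norm)
    lam = guidance_strength(t, lambda0, gamma)
    return [xi + dt * (vi + lam * gi) for xi, vi, gi in zip(x, velocity, grad)]
```

With a zero velocity field and the toy score s(x) = -||x - target||², one step moves the state toward the target by at most `dt * lambda0 * max_grad_norm`, which makes the effect of clipping easy to inspect.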
4.4. Adaptive SE(3)-Equivariant Architecture
- Multi-Scale Graph Construction. We construct a hierarchical graph with nodes representing atoms at different resolutions. Edges connect nodes based on spatial proximity, with a separate cutoff distance (in Å) at each resolution level: one for coarse backbone interactions, one for residue-level contacts, and one for fine-grained atomic interactions. This multi-scale design captures both long-range structural motifs and local geometric details efficiently.
- Adaptive Message Passing. Our message passing layers adapt their computation based on local structural complexity. We define a complexity score for each node from its neighbor count, its local density, and a measure of geometric curvature. The number of message passing iterations for each node is then set from this score, bounded by minimum and maximum iteration counts (ablated in Appendix E.3).
- Implementation Details. To efficiently handle variable iteration counts within batches, we implement adaptive message passing through masked operations: all nodes undergo the maximum number of iterations, but updates are masked out for nodes that have reached their allocated iteration count. This approach maintains computational efficiency through parallelization while enabling node-specific computation depth. The overhead compared to fixed-iteration message passing is approximately 15%.
- SE(3)-Equivariant Updates. Node features are updated through equivariant message passing built from three learnable functions. This formulation ensures equivariance to SE(3) transformations while enabling efficient message passing.
- Vector Feature Channels. Following GVP [17], we maintain both scalar features and vector features at each node. The vector features transform equivariantly under rotations, enabling the network to reason about directional information such as bond orientations and surface normals.
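The masked scheme for node-specific iteration counts can be sketched as below. This is a toy illustration under stated assumptions: the complexity score here uses only the normalized neighbor count (the paper's score also involves local density and curvature), the mapping from score to iteration count is a hypothetical linear rule, and the message function is a plain scalar average rather than an equivariant update.

```python
import math


def neighbor_lists(coords, cutoff):
    """Build symmetric neighbor lists for points within `cutoff` of each other."""
    n = len(coords)
    neighbors = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                neighbors[i].append(j)
                neighbors[j].append(i)
    return neighbors


def iteration_counts(neighbors, k_min=2, k_max=6):
    """Assign each node an iteration count between k_min and k_max.

    Hypothetical rule: score = degree / max degree, mapped linearly to the range.
    """
    max_degree = max((len(nb) for nb in neighbors), default=0)
    counts = []
    for nb in neighbors:
        score = len(nb) / max_degree if max_degree > 0 else 0.0
        counts.append(k_min + round(score * (k_max - k_min)))
    return counts


def masked_message_passing(features, neighbors, counts):
    """Run max(counts) synchronous rounds; node i is updated only in its first counts[i] rounds."""
    h = list(features)
    updates_applied = [0] * len(h)
    for step in range(max(counts, default=0)):
        new_h = list(h)
        for i, nb in enumerate(neighbors):
            if step >= counts[i]:
                continue  # masked: this node has used up its allocated iterations
            if nb:
                mean_neighbor = sum(h[j] for j in nb) / len(nb)
                new_h[i] = 0.5 * h[i] + 0.5 * mean_neighbor
            updates_applied[i] += 1
        h = new_h
    return h, updates_applied
```

In a batched tensor implementation the `continue` branch would become a multiplication by a 0/1 mask, so every node runs through the same number of rounds and only the masked updates are discarded.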
4.5. Training Details
- Dataset. We train on a filtered subset of the Protein Data Bank (PDB) [35] containing 73,582 high-resolution structures, selected by resolution and R-free cutoffs and clustered at 40% sequence identity using MMseqs2 [36]. We exclude structures with missing residues or chain breaks. We further augment the training set with 127,418 high-confidence AlphaFold2 predictions (selected by a pLDDT cutoff) for Swiss-Prot entries from the AlphaFold Protein Structure Database [37]. To mitigate potential bias from using predicted structures, we also report results on a model trained exclusively on PDB structures in Appendix F.
- Optimization. Both stages are trained using the AdamW optimizer with weight decay and a cosine annealing learning-rate schedule with 10,000 warmup steps. Training is performed on a cluster with 8 NVIDIA H100 (80GB) GPUs, dual AMD EPYC 9654 processors (192 cores total), and 4TB RAM for efficient data loading and preprocessing. Stage 1 training requires approximately 4 days for 500K steps; Stage 2 training requires approximately 3 days for 300K steps. Total training time is approximately 7 days on this configuration, corresponding to roughly 1,344 H100 GPU-hours (7 days × 24 hours × 8 GPUs).
- Sampling. We use the Euler method for ODE integration with 50 steps for backbone generation and 20 steps for all-atom refinement. Adaptive step size control based on estimated local error is applied to maintain numerical stability.
- Numerical Stability. The rotation interpolation can be numerically unstable when the source and target rotations are nearly opposite (relative rotation angle close to π). We address this through: (1) quaternion-based interpolation with proper handling of the double-cover of SO(3); (2) clamping the rotation angle to stay slightly below π; (3) regularization during training that penalizes very large rotation differences. In practice, such near-π rotations are rare in protein structures due to physical constraints.
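Points (1) and (2) of the numerical stability discussion can be illustrated with a short quaternion sketch. It is a generic spherical linear interpolation (slerp), not the authors' code; the helper names and the margin `eps` are assumptions, and where exactly the clamp is applied in the real pipeline is not specified in the text.

```python
import math


def slerp(q0, q1, t, small=1e-6):
    """Spherical linear interpolation between unit quaternions (w, x, y, z)."""
    dot = sum(a * b for a, b in zip(q0, q1))
    # q and -q encode the same rotation (double cover); flip to take the shorter arc.
    if dot < 0.0:
        q1 = [-c for c in q1]
        dot = -dot
    dot = min(dot, 1.0)
    half_angle = math.acos(dot)  # half of the relative rotation angle
    if half_angle < small:
        # Nearly identical rotations: fall back to linear interpolation.
        q = [(1 - t) * a + t * b for a, b in zip(q0, q1)]
    else:
        s = math.sin(half_angle)
        w0 = math.sin((1 - t) * half_angle) / s
        w1 = math.sin(t * half_angle) / s
        q = [w0 * a + w1 * b for a, b in zip(q0, q1)]
    norm = math.sqrt(sum(c * c for c in q))
    return [c / norm for c in q]


def clamped_rotation_angle(q0, q1, eps=1e-3):
    """Relative rotation angle between q0 and q1, clamped to at most pi - eps."""
    dot = abs(sum(a * b for a, b in zip(q0, q1)))
    angle = 2.0 * math.acos(min(dot, 1.0))
    return min(angle, math.pi - eps)
```

Taking the absolute value (or flipping the sign) of the quaternion dot product is what removes the double-cover ambiguity: interpolating toward `q1` and toward `-q1` then gives the same rotation path.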

5. Experiments
5.1. Evaluation Metrics
5.2. Unconditional Backbone Generation
5.3. Motif Scaffolding
5.4. Functional Protein Design
5.5. Ablation Studies
5.6. Computational Efficiency
5.7. Case Studies
- Enzyme Active Site Design. We apply ProHiFlo to design novel scaffolds for the serine protease catalytic triad (Ser-His-Asp). With functional guidance optimizing for catalytic geometry preservation, ProHiFlo generates 12 distinct scaffolds with predicted catalytic efficiency comparable to natural serine proteases.
- De Novo Binder Design. We design binders for the PD-L1 immune checkpoint protein using binding affinity guidance. Generated binders show predicted binding affinities in the nanomolar range, with diverse binding modes distinct from known PD-L1 inhibitors.
6. Limitations
- Hierarchical Inconsistency. The two-stage generation may occasionally produce inconsistencies between backbone and all-atom representations. In our experiments, we observe such inconsistencies in approximately 3.2% of generated structures (defined as cases where any sidechain atom deviates from its expected position given ideal bond geometry by more than a fixed distance threshold). These are addressed through a lightweight post-processing step consisting of: (1) energy minimization using OpenMM [42] with the AMBER14 force field (500 steps); (2) sidechain repacking using Rosetta’s [43] PackRotamers protocol. Post-processing adds approximately 0.8 seconds per structure. Inconsistencies occur more frequently for longer proteins and for proteins with high loop content.
- Guidance Generalization. Functional guidance depends on the quality of pretrained predictors. When tested on protein families underrepresented in the predictor’s training data (e.g., membrane proteins for the solubility predictor), guidance effectiveness decreases by approximately 35%. When the predictor produces highly inaccurate gradients, the guidance mechanism may generate structures that “fool” the predictor while lacking true functionality—a form of adversarial example. Users should validate guided designs with orthogonal methods.
- Experimental Validation. All evaluations are computational. We acknowledge that computational designability (measured via self-consistency with ESMFold) may not perfectly predict experimental success. Based on prior work [5], we expect 40-60% of computationally designable structures to express and fold correctly in experiments.
- Scale Limitations. Generation of very large proteins or multi-chain complexes requires further optimization. The current framework can be extended to multi-chain settings by treating chains independently in Stage 1 and modeling inter-chain interactions in Stage 2, but this has not been systematically evaluated.
7. Conclusion
- Code and Data Availability. Code, pretrained models, and processed datasets will be released upon publication at https://github.com/anonymous/prohiflo. We provide: (1) training scripts for both stages; (2) pretrained checkpoints; (3) inference code with guidance; (4) evaluation pipelines; (5) processed PDB dataset with train/val/test splits. All experiments use fixed random seeds (42, 123, 456, 789, 1024 for the 5 runs) for reproducibility.
- Reproducibility Statement. We have made extensive efforts to ensure reproducibility. All hyperparameters are reported in Appendix C. Evaluation uses ProteinMPNN v1.0.1 and ESMFold v2.0 with default parameters. Structure alignment uses TM-align [44]. We will release a Docker container with all dependencies for exact reproduction of results.
Appendix A. Theoretical Analysis
Appendix A.1. Convergence Guarantee for Hierarchical Flow Matching
- (R1) The data distribution has finite second moments over SE(3)^N;
- (R2) The neural networks are Lipschitz continuous with finite constants;
- (R3) The training uses conditional flow matching with optimal transport paths.
Appendix A.2. Complexity Analysis
- Adaptive Overhead. The adaptive message passing mechanism introduces additional overhead that scales with the average iteration count per node. In practice, this overhead is approximately 15% as most nodes converge to low iteration counts. The memory overhead is negligible as we reuse buffers across iterations.
Appendix A.3. Guidance Optimality
- Relaxing the Lipschitz Assumption. The Lipschitz assumption may be violated for deep neural network predictors. In this case, we can replace the global Lipschitz constant L with a local estimate and scale the guidance strength accordingly. Empirically, we find that clipping the gradient norm to a fixed maximum effectively handles non-Lipschitz predictors while preserving guidance effectiveness.
Appendix B. Detailed Derivations
Appendix B.1. SE(3) Flow Matching on Protein Frames
- Parameterization. A residue frame T = (R, x) consists of a rotation R ∈ SO(3) and a translation x ∈ ℝ³. The tangent space at T is isomorphic to so(3) × ℝ³.
- Interpolation Path. For a source frame and a target frame, we define an interpolation path between them using the exponential and logarithm maps.
- Vector Field. The conditional vector field that generates this path is its time derivative.
- Training Objective. The SE(3) flow matching loss is the expected squared error between the predicted and conditional vector fields, with the rotational component measured in the Frobenius norm on so(3).
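For reference, one standard way to write these three objects for a single residue frame, following the geodesic construction common in SE(3) flow matching [27,28], is sketched below. The notation is assumed here and may differ from the authors' original equations: $T_0 = (R_0, x_0)$ is the source frame, $T_1 = (R_1, x_1)$ the target frame, and $\omega_\theta$, $\nu_\theta$ are the predicted angular (body-frame, in so(3)) and linear velocities.

```latex
\begin{aligned}
R_t &= R_0 \exp\!\big(t\,\log(R_0^{\top} R_1)\big), \qquad x_t = (1-t)\,x_0 + t\,x_1, \\
\dot{R}_t &= R_t\,\log(R_0^{\top} R_1), \qquad\qquad\;\; \dot{x}_t = x_1 - x_0, \\
\mathcal{L}(\theta) &= \mathbb{E}_{t,\,T_0,\,T_1}\Big[\big\|\omega_\theta(T_t,t) - \log(R_0^{\top} R_1)\big\|_F^2
  + \big\|\nu_\theta(T_t,t) - (x_1 - x_0)\big\|_2^2\Big].
\end{aligned}
```

The second line follows from the first by differentiating in $t$, since $\frac{d}{dt}\exp(tA) = \exp(tA)\,A$ for a fixed $A \in \mathfrak{so}(3)$. For a protein with N residues the loss would be summed over the per-residue frames.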
Appendix B.2. Adaptive Message Passing Derivation
Appendix C. Additional Experimental Details
Appendix C.1. Dataset Statistics
| Source | Structures | Avg. Length |
|---|---|---|
| PDB (filtered) | 73,582 | 187.3 |
| AlphaFold DB | 127,418 | 234.6 |
| Total | 201,000 | 217.2 |
Appendix C.2. Hyperparameter Settings
| Parameter | Stage 1 | Stage 2 |
|---|---|---|
| Hidden dimension | 384 | 256 |
| Number of layers | 12 | 8 |
| Attention heads | 12 | 8 |
| Dropout | 0.1 | 0.1 |
| Learning rate | ||
| Batch size | 256 | 128 |
| Training steps | 500K | 300K |
Appendix C.3. Evaluation Protocol
1. Design 8 sequences using ProteinMPNN with sampling temperature 0.1
2. Predict structures for all sequences using ESMFold
3. Compute the scTM score between the generated backbone and each predicted structure
4. Report designability as the fraction of backbones whose maximum scTM exceeds the designability threshold
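The final aggregation step of this protocol can be sketched as follows. The function name and input layout are hypothetical, and the default threshold of 0.5 is an assumption based on the cutoff commonly used in the designability literature; it is not a value taken from this paper.

```python
def designability(sctm_scores_per_backbone, threshold=0.5):
    """Fraction of backbones whose best scTM score exceeds `threshold`.

    sctm_scores_per_backbone: one list of scTM scores per generated backbone,
    with one score per designed sequence (8 per backbone in the protocol above).
    """
    if not sctm_scores_per_backbone:
        raise ValueError("at least one backbone is required")
    hits = sum(1 for scores in sctm_scores_per_backbone if max(scores) > threshold)
    return hits / len(sctm_scores_per_backbone)
```

Taking the maximum over the designed sequences means a backbone counts as designable if at least one of its sequences refolds to it.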
Appendix D. Additional Results
Appendix D.1. Per-Length Designability Breakdown
| Method | 50-100 | 100-150 | 150-200 | 200-250 | 250-300 |
|---|---|---|---|---|---|
| RFDiffusion | 0.912 | 0.867 | 0.798 | 0.723 | 0.651 |
| Chroma | 0.894 | 0.834 | 0.756 | 0.689 | 0.612 |
| FoldFlow-2 | 0.923 | 0.889 | 0.834 | 0.778 | 0.712 |
| ProHiFlo | 0.967 | 0.945 | 0.912 | 0.878 | 0.834 |
Appendix D.2. Functional Guidance with Different Predictors
| Predictor | Base Score | Guided Score | Improvement |
|---|---|---|---|
| ESM-2 Stability | 0.698 | 0.867 | +24.2% |
| GVP Binding | 0.612 | 0.784 | +28.1% |
| DeepSol Solubility | 0.654 | 0.823 | +25.8% |
| ProteinMPNN pLDDT | 0.756 | 0.891 | +17.9% |
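The Improvement column is consistent with a relative change, (guided − base) / base, which the short check below reproduces from the table values. The dictionary layout and function name are illustrative only.

```python
# (base score, guided score) pairs copied from the table above.
RESULTS = {
    "ESM-2 Stability": (0.698, 0.867),
    "GVP Binding": (0.612, 0.784),
    "DeepSol Solubility": (0.654, 0.823),
    "ProteinMPNN pLDDT": (0.756, 0.891),
}


def relative_improvement_percent(base, guided):
    """Relative improvement of the guided score over the base score, in percent."""
    return 100.0 * (guided - base) / base


for name, (base, guided) in RESULTS.items():
    print(f"{name}: +{relative_improvement_percent(base, guided):.1f}%")
```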
Appendix E. Hyperparameter Ablation Studies
Appendix E.1. Sampling Steps Ablation
| Stage 1 Steps | Designability | Novelty | Time (s) | Validity |
|---|---|---|---|---|
| 20 | 0.856±.024 | 0.712±.028 | 1.2 | 0.978 |
| 30 | 0.889±.019 | 0.734±.024 | 1.5 | 0.986 |
| 50 | 0.924±.012 | 0.758±.018 | 2.1 | 0.994 |
| 75 | 0.927±.011 | 0.761±.017 | 2.9 | 0.995 |
| 100 | 0.928±.011 | 0.762±.016 | 3.8 | 0.995 |

| Stage 2 Steps | Designability | All-Atom RMSD | Time (s) | Validity |
|---|---|---|---|---|
| 10 | 0.912±.015 | 0.42±.08 | 1.8 | 0.987 |
| 20 | 0.924±.012 | 0.31±.06 | 2.1 | 0.994 |
| 30 | 0.925±.012 | 0.29±.05 | 2.5 | 0.994 |
| 50 | 0.926±.011 | 0.28±.05 | 3.2 | 0.995 |
Appendix E.2. Guidance Annealing Parameter γ
| γ | Stability | Designability | Diversity | Mode Collapse |
|---|---|---|---|---|
| 0.0 (no annealing) | 0.823±.028 | 0.856±.021 | 0.612±.034 | 12.3% |
| 0.5 | 0.856±.024 | 0.878±.018 | 0.698±.028 | 6.8% |
| 1.0 | 0.867±.019 | 0.897±.014 | 0.734±.024 | 3.2% |
| 2.0 | 0.854±.022 | 0.889±.016 | 0.756±.021 | 2.1% |
Appendix E.3. Adaptive Message Passing Bounds
| Min. Iterations | Max. Iterations | Designability | Time (s) | Avg. Iterations |
|---|---|---|---|---|
| 1 | 4 | 0.878±.022 | 1.6 | 2.1 |
| 2 | 4 | 0.901±.018 | 1.8 | 2.8 |
| 2 | 6 | 0.924±.012 | 2.1 | 3.4 |
| 2 | 8 | 0.926±.011 | 2.6 | 4.1 |
| 4 | 8 | 0.921±.013 | 3.1 | 5.2 |
Appendix F. PDB-Only Training Results
| Training Data | Designability | Novelty | Diversity | Validity |
|---|---|---|---|---|
| PDB + AlphaFold | 0.924±.012 | 0.758±.018 | 0.769±.015 | 0.994±.003 |
| PDB only | 0.901±.016 | 0.782±.021 | 0.791±.018 | 0.989±.005 |
Appendix G. Fair Comparison at Equal Sampling Steps
| Method | Designability | Validity | Time (s) |
|---|---|---|---|
| RFDiffusion (50 steps) | 0.623±.034 | 0.912±.018 | 5.6 |
| Chroma (50 steps) | 0.598±.038 | 0.897±.021 | 4.8 |
| FoldFlow-2 (50 steps) | 0.756±.028 | 0.956±.012 | 3.2 |
| ProHiFlo (50 steps) | 0.924±.012 | 0.994±.003 | 2.1 |
Appendix H. Failure Case Analysis
- Long loops (42% of failures): Structures with extended loop regions show reduced designability due to conformational flexibility.
- Unusual topologies (28%): Novel fold topologies not well-represented in the training data.
- High β-sheet content (18%): All-β structures are more challenging due to long-range hydrogen bonding patterns.
- Hierarchical inconsistency (12%): Cases where backbone and all-atom stages produce conflicting local geometries.
References
- Huang, Po-Ssu; Boyken, Scott E; Baker, David. The coming of age of de novo protein design. Nature 2016, 537, 320–327. [Google Scholar] [CrossRef]
- Zhang, Yichao; Deng, Ningyuan; Song, Xinyuan; Bi, Ziqian; Wang, Tianyang; Yao, Zheyu; Chen, Keyu; Li, Ming; Niu, Qian; Liu, Junyu; et al. Advanced deep learning methods for protein structure prediction and design. BIO Integration, 2025. [Google Scholar]
- Jumper, John; Evans, Richard; Pritzel, Alexander; Green, Tim; Figurnov, Michael; Ronneberger, Olaf; Tunyasuvunakool, Kathryn; Bates, Russ; Žídek, Augustin; Potapenko, Anna; et al. Highly accurate protein structure prediction with AlphaFold. Nature 2021, 596, 583–589. [Google Scholar] [CrossRef]
- Chen, Kaijie; Lin, Zihao; Xu, Zhiyang; Shen, Ying; Yao, Yuguang; Rimchala, Joy; Zhang, Jiaxin; Huang, Lifu. R2i-bench: Benchmarking reasoning-driven text-to-image generation. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025; pp. 12606–12641. [Google Scholar]
- Watson, Joseph L; Juergens, David; Bennett, Nathaniel R; Trippe, Brian L; Yim, Jason; Eisenach, Helen E; Ahern, Woody; Borst, Andrew J; Ragotte, Robert J; Milles, Lukas F; et al. De novo design of protein structure and function with RFdiffusion. Nature 2023, 620, 1089–1100. [Google Scholar] [CrossRef]
- Ingraham, John B; Barber, Max; Wilber, Greta; Strom, Luke; Theesfeld, Chandra; Listgarten, Julia; Corso, Gabriele; Jaakkola, Tommi; Barzilay, Regina. Illuminating protein space with a programmable generative model. Nature 2023, 623, 1070–1078. [Google Scholar] [CrossRef]
- Alamdari, Sarah; Thakkar, Nitya; van den Berg, Rianne; Lu, Alex X; Fusi, Nicolo; Amini, Ava P; Yang, Kevin K. Protein generation with evolutionary diffusion: sequence is all you need. bioRxiv 2023, pages 2023–09. [Google Scholar] [CrossRef]
- Peng, Benji; Liang, Chia Xin; Bi, Ziqian; Liu, Ming; Zhang, Yichao; Wang, Tianyang; Chen, Keyu; Song, Xinyuan; Feng, Pohsun. From noise to nuance: Advances in deep generative image models. arXiv 2024, arXiv:2412.09656. [Google Scholar] [CrossRef]
- You, Mingjie; Chen, Kaijie; Cheng, Dawei. Drdgrl: Dual-relational dynamic graph representation learning for delay-sensitive stock trend prediction. International Conference on Database Systems for Advanced Applications, 2026; Springer; pp. 35–50. [Google Scholar]
- Zhang, Haobo; Mao, Xutao; Dong, Guangyuan; Li, Ziwei; Su, Xuanbo; Chen, Kaijie; Yang, Jing; Lin, Zheng. Memmark: State-evolution attribution watermarking for agent long-term memory systems. arXiv 2026, arXiv:2605.25002. [Google Scholar]
- Ho, Jonathan; Jain, Ajay; Abbeel, Pieter. Denoising diffusion probabilistic models. Adv. Neural Inf. Process. Syst. 2020, 33, 6840–6851. [Google Scholar]
- Lipman, Yaron; Chen, Ricky TQ; Ben-Hamu, Heli; Nickel, Maximilian; Le, Matthew. Flow matching for generative modeling. International Conference on Learning Representations, 2023. [Google Scholar]
- Lin, Zeming; Akin, Halil; Rao, Roshan; Hie, Brian; Zhu, Zhongkai; Lu, Wenting; Smetanin, Nikita; Verkuil, Robert; Kabeli, Ori; Shmueli, Yaniv; et al. Evolutionary-scale prediction of atomic-level protein structure with a language model. Science 2023, 379, 1123–1130. [Google Scholar] [CrossRef] [PubMed]
- Vaswani, Ashish; Shazeer, Noam; Parmar, Niki; Uszkoreit, Jakob; Jones, Llion; Gomez, Aidan N; Kaiser, Łukasz; Polosukhin, Illia. Attention is all you need. Advances in Neural Information Processing Systems, 2017; 30. [Google Scholar]
- Elnaggar, Ahmed; Heinzinger, Michael; Dallago, Christian; Rehawi, Ghalia; Wang, Yu; Jones, Llion; Gibbs, Tom; Feher, Tamas; Angerer, Christoph; Steinegger, Martin; et al. Prottrans: Toward understanding the language of life through self-supervised learning. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 44, 7112–7127. [Google Scholar] [CrossRef]
- Gligorijević, Vladimir; Renfrew, P Douglas; Kosciolek, Tomasz; Koehler Ber, Julia; Berenberg, Daniel; Vatez, Tommi; Chandler, Chris; Taylor-Compston, Andre; Frey, Brendan J; Bonneau, Richard. Structure-based protein function prediction using graph convolutional networks. Nat. Commun. 2021, 12, 3168. [Google Scholar] [CrossRef]
- Jing, Bowen; Eismann, Stephan; Suriana, Patricia; Townshend, Raphael JL; Dror, Ron. Equivariant graph neural networks for 3d macromolecular structure. arXiv 2021, arXiv:2106.03843. [Google Scholar] [CrossRef]
- Hawkins-Hooker, Alex; Depardieu, Florence; Baez-Ortega, Adrian; Touchon, Marie; Rocha, Eduardo PC; Granata, Ilaria; Brown, Michael PH; Savageau, Michael A. Generating functional protein variants with variational autoencoders. PLoS Comput. Biol. 2021, 17, e1008736. [Google Scholar] [CrossRef]
- Eguchi, Raphael R; Choe, Christian A; Huang, Po-Ssu. Ig-vae: Generative modeling of protein structure by direct 3d coordinate generation. PLoS Comput. Biol. 2022, 18, e1010271. [Google Scholar] [CrossRef]
- Ingraham, John; Garg, Vikas; Barzilay, Regina; Jaakkola, Tommi. Generative models for graph-based protein design. Adv. Neural Inf. Process. Syst. 2019, 32. [Google Scholar]
- Yim, Jason; Trippe, Brian L; De Bortoli, Valentin; Mathieu, Emile; Doucet, Arnaud; Barzilay, Regina; Jaakkola, Tommi. Se(3) diffusion model with application to protein backbone generation. International Conference on Machine Learning, 2023; PMLR; pp. 40001–40039. [Google Scholar]
- Chen, Huiyi; Peng, Jiawei; Min, Dehai; Sun, Changchang; Chen, Kaijie; Yan, Yan; Yang, Xu; Cheng, Lu. Mvi-bench: A comprehensive benchmark for evaluating robustness to misleading visual inputs in lvlms. arXiv 2025, arXiv:2511.14159. [Google Scholar] [CrossRef]
- Huang, Yixu; Li, Bo; Li, Na; Wang, Zhe; Chen, Kaijie; Ge, Haonan; Si, Qingyi; Shen, Yuanzhe; Yang, Ruihan; Wang, Guangjing; et al. Gui agents for continual game generation. arXiv 2026, arXiv:2605.28258. [Google Scholar] [CrossRef]
- Chen, Kaijie; Xu, Zhiyang; Shen, Ying; Lin, Zihao; Yao, Yuguang; Huang, Lifu. Superflow: Training flow matching models with rl on the fly. arXiv 2025, arXiv:2512.17951. [Google Scholar]
- Lin, Yeqing; AlQuraishi, Mohammed. Generating novel, designable, and diverse protein structures by equivariantly diffusing oriented residue clouds. International Conference on Machine Learning, 2023; PMLR; pp. 21312–21333. [Google Scholar]
- Lin, Yeqing; Lee, Minji; Zhang, Zhao; AlQuraishi, Mohammed. Out of many, one: Designing and scaffolding proteins at the scale of the structural universe with Genie 2. arXiv 2024, arXiv:2405.15489. [Google Scholar] [CrossRef]
- Bose, Avishek Joey; Akhound-Sadegh, Tara; Fatras, Kilian; Huguet, Guillaume; Rector-Brooks, Jarrid; Liu, Cheng-Hao; Nica, Andrei Cristian; Korablyov, Maksym; Bronstein, Michael; Tong, Alexander. Se(3)-stochastic flow matching for protein backbone generation. International Conference on Learning Representations, 2024. [Google Scholar]
- Yim, Jason; Campbell, Andrew; Foong, Andrew YK; Gastegger, Michael; Jiménez-Luna, José; Lewis, Sarah; Garcia Satorras, Victor; Veeling, Bastiaan S; Barzilay, Regina; Jaakkola, Tommi; et al. Fast protein backbone generation with SE(3) flow matching. arXiv 2023, arXiv:2310.05297. [Google Scholar] [CrossRef]
- Lisanza, Sidney Lyayuga; Gershon, Jake M; Tipps, Sam WK; Arnesen, Jerald A; Zhu, Chenlin; Zandberg, Samuel J; Raman, Rishi; Bakker, Casper; Koska, W Sebastian; Lehnert, Dustin; et al. Joint generation of protein sequence and structure with RoseTTAFold sequence space diffusion. bioRxiv 2023, 2023–05. [Google Scholar] [CrossRef]
- Ferruz, Noelia; Schmidt, Steffen; Höcker, Birte. Protgpt2 is a deep unsupervised language model for protein design. Nat. Commun. 2022, 13, 4348. [Google Scholar] [CrossRef] [PubMed]
- Gruver, Nate; Stanton, Samuel; Frey, Nathan C; Rudner, Tim GJ; Hotzel, Isidro; Lafrance-Vanasse, Julien; Rajpal, Arvind; Cho, Kyunghyun; Wilson, Andrew Gordon. Protein design with guided discrete diffusion. Adv. Neural Inf. Process. Syst. 2024, 36. [Google Scholar]
- Garcia Satorras, Víctor; Hoogeboom, Emiel; Welling, Max. E(n) equivariant graph neural networks. International Conference on Machine Learning, 2021; PMLR; pp. 9323–9332. [Google Scholar]
- Fuchs, Fabian; Worrall, Daniel; Fischer, Volker; Welling, Max. Se(3)-transformers: 3d roto-translation equivariant attention networks. Adv. Neural Inf. Process. Syst. 2020, 33, 1970–1981. [Google Scholar]
- Liao, Yi-Lun; Smidt, Tess. Equiformer: Equivariant graph attention transformer for 3d atomistic graphs. International Conference on Learning Representations, 2023. [Google Scholar]
- Berman, Helen M; Westbrook, John; Feng, Zukang; Gilliland, Gary; Bhat, Talapady N; Weissig, Helge; Shindyalov, Ilya N; Bourne, Philip E. The protein data bank. Nucleic Acids Res. 2000, 28, 235–242. [Google Scholar] [CrossRef]
- Steinegger, Martin; Söding, Johannes. Mmseqs2 enables sensitive protein sequence searching for the analysis of massive data sets. Nat. Biotechnol. 2017, 35, 1026–1028. [Google Scholar] [CrossRef]
- Varadi, Mihaly; Anyango, Stephen; Deshpande, Mandar; Nair, Sreenath; Natassia, Cyrus; Yordanova, Galabina; Yuan, David; Stroe, Oana; Wood, Gemma; Laydon, Agata; et al. Alphafold protein structure database: massively expanding the structural coverage of protein-sequence space with high-accuracy models. Nucleic Acids Res. 2022, 50, D419–D427. [Google Scholar] [CrossRef]
- Dauparas, Justas; Anishchenko, Ivan; Bennett, Nathaniel; Bai, Hua; Ragotte, Robert J; Milles, Lukas F; Wicky, Basile IM; Courbet, Alexis; de Haas, Rob J; Bethel, Neville; et al. Robust deep learning–based protein sequence design using ProteinMPNN. Science 2022, 378, 49–56. [Google Scholar] [CrossRef]
- Corso, Gabriele; Stärk, Hannes; Jing, Bowen; Barzilay, Regina; Jaakkola, Tommi. Diffdock: Diffusion steps, twists, and turns for molecular docking. International Conference on Learning Representations, 2023. [Google Scholar]
- Rives, Alexander; Meier, Joshua; Sercu, Tom; Goyal, Siddharth; Lin, Zeming; Liu, Jason; Guo, Demi; Ott, Myle; Zitnick, C Lawrence; Ma, Jerry; Fergus, Rob. Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences. Proc. Natl. Acad. Sci. 2021, 118, e2016239118. [Google Scholar] [CrossRef] [PubMed]
- Khurana, Sameer; Rawi, Reda; Kuber, Kumardeep; Hadar, Shaomin; Manor, Ohad; Orengo, Christine; Pires, Douglas EV; Ascher, David B; Cowen, Lenore; Bhardwaj, Gaurav. Deepsol: a deep learning framework for sequence-based protein solubility prediction. Bioinformatics 2018, 34, 2605–2613. [Google Scholar] [CrossRef]
- Eastman, Peter; Swails, Jason; Chodera, John D; McGibbon, Robert T; Zhao, Yutong; Beauchamp, Kyle A; Wang, Lee-Ping; Simmonett, Andrew C; Harrigan, Matthew P; Stern, Chaya D; et al. Openmm 7: Rapid development of high performance algorithms for molecular dynamics. PLoS Comput. Biol. 2017, 13, e1005659. [Google Scholar] [CrossRef]
- Leman, Julia Koehler; Weitzner, Brian D; Lewis, Steven M; Adolf-Bryfogle, Jared; Alam, Nawsad; Alford, Rebecca F; Aprahamian, Melanie; Baker, David; Barlow, Kyle A; Barth, Patrick; et al. Macromolecular modeling and design in Rosetta: recent methods and frameworks. Nat. Methods 2020, 17, 665–680. [Google Scholar] [CrossRef] [PubMed]
- Zhang, Yang; Skolnick, Jeffrey. Tm-align: a protein structure alignment algorithm based on the tm-score. Nucleic Acids Res. 2005, 33, 2302–2309. [Google Scholar] [CrossRef] [PubMed]
- Chen, Ricky TQ; Rubanova, Yulia; Bettencourt, Jesse; Duvenaud, David K. Neural ordinary differential equations. Advances in Neural Information Processing Systems 31.





| Method | Designability ↑ | Novelty ↑ | Diversity ↑ | Validity ↑ | Steps ↓ |
|---|---|---|---|---|---|
| FrameDiff | 0.612±.031 | 0.724±.035 | 0.681±.033 | 0.943±.012 | 500 |
| Genie 2 | 0.734±.028 | 0.698±.029 | 0.712±.026 | 0.967±.009 | 500 |
| RFDiffusion | 0.823±.019 | 0.631±.023 | 0.658±.028 | 0.982±.006 | 200 |
| Chroma | 0.796±.024 | 0.687±.027 | 0.723±.024 | 0.971±.008 | 500 |
| FoldFlow-2 | 0.851±.021 | 0.703±.025 | 0.695±.027 | 0.978±.007 | 100 |
| EvoDiff | 0.789±.026 | 0.712±.024 | 0.698±.029 | 0.963±.011 | 200 |
| ProHiFlo (Ours) | 0.924±.012 | 0.758±.018 | 0.769±.015 | 0.994±.003 | 50 |

| Method | Active Sites (n=24) | Binding (n=18) | Structural (n=32) | Average |
|---|---|---|---|---|
| RFDiffusion | 0.412±.045 | 0.523±.038 | 0.687±.032 | 0.541 |
| Chroma | 0.378±.052 | 0.489±.044 | 0.654±.039 | 0.507 |
| Genie 2 | 0.445±.041 | 0.534±.036 | 0.712±.028 | 0.564 |
| EvoDiff | 0.398±.048 | 0.512±.041 | 0.678±.034 | 0.529 |
| FoldFlow-2 | 0.467±.038 | 0.556±.033 | 0.698±.031 | 0.574 |
| ProHiFlo | 0.589±.028 | 0.672±.024 | 0.812±.021 | 0.691 |

| Method | Stability | Binding | Solubility | Designability |
|---|---|---|---|---|
| RFDiffusion | 0.623±.034 | 0.534±.041 | 0.587±.038 | 0.812±.022 |
| Chroma | 0.645±.031 | 0.567±.037 | 0.612±.035 | 0.789±.025 |
| EvoDiff | 0.634±.033 | 0.545±.039 | 0.598±.036 | 0.798±.024 |
| ProHiFlo (no guidance) | 0.698±.028 | 0.612±.032 | 0.654±.029 | 0.924±.012 |
| ProHiFlo + Guidance | 0.867±.019 | 0.784±.023 | 0.823±.021 | 0.897±.014 |

| Variant | Designability | Novelty | Steps | Time (s) |
|---|---|---|---|---|
| Full model | 0.924±.012 | 0.758±.018 | 50 | 2.1 |
| w/o Hierarchical | 0.867±.024 | 0.723±.027 | 120 | 8.7 |
| w/o Adaptive arch. | 0.892±.019 | 0.741±.022 | 50 | 4.2 |
| w/o Multi-scale | 0.878±.021 | 0.719±.024 | 50 | 2.8 |
| Single-stage all-atom | 0.821±.028 | 0.697±.031 | 150 | 12.3 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).