Submitted: 05 June 2026
Posted: 08 June 2026
Abstract
Keywords:
Key Points
- Microbiome data are naturally relational, but many computational pipelines still treat microbial profiles as flat abundance vectors.
- Graph contrastive learning can exploit unlabelled microbiome data by aligning biologically related graph views while reducing dependence on expensive clinical labels.
- Microbiome-oriented GCL requires biologically informed augmentations because generic perturbations may destroy ecological, phylogenetic, functional or compositional meaning.
- A reliable benchmark should combine leave-one-study-out testing, cross-cohort transfer, graph-construction stress tests, biological plausibility checks and reproducibility reporting.
- Clinical use requires explicit control of batch effects, confounding, dataset shift, false invariances and the uncertainty of inferred microbial networks.
1. Introduction

2. Microbiome Data as Graphs
3. Foundations of Graph Contrastive Learning

| Method | Contrastive level | View construction | Objective | Microbiome implication |
|---|---|---|---|---|
| DGI [17] | Local–global | Original and corrupted graph | Mutual-information-inspired discrimination | Useful for aligning taxa/subgraph embeddings with whole-community summaries. |
| InfoGraph [28] | Graph-level | Graph and substructure summaries | MI maximization across scales | Relevant for sample-level representations and microbial community classification. |
| GraphCL [18] | Graph-level | Node drop, edge perturbation, attribute masking, subgraph sampling | NT-Xent contrast | Provides the canonical augmentation vocabulary, but requires biological safeguards. |
| GRACE [19] | Node-level | Edge dropping and feature masking | Symmetric InfoNCE | Relevant for taxon embeddings and node-level biomarker discovery. |
| MVGRL [20] | Node and graph | Neighborhood and diffusion views | Multi-view contrast | Suggests contrasting co-abundance graphs with diffusion or functional similarity views. |
| GCA [21] | Node-level | Adaptive topology/feature perturbation | InfoNCE | Motivates confidence-aware edge dropping and prevalence-aware masking. |
| GCC [29] | Subgraph/ego-network | Random-walk-based subgraphs | Contrastive pretraining | Useful for transferring structural motifs across studies or body sites. |
| JOAO [30] | Graph-level | Automatically selected augmentations | Bilevel/automated augmentation search | Suggests data-driven search constrained by compositional and ecological rules. |
| AD-GCL [31] | Graph-level | Learned adversarial augmentations | Contrastive adversarial objective | Useful for stress-testing whether augmentations are too weak or biologically destructive. |
| BGRL [32] | Node-level | Two augmented views | Negative-free bootstrap | Attractive when false negatives are likely across cohorts and phenotypes. |
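The NT-Xent objective listed for GraphCL, of which the InfoNCE losses used by GRACE and GCA are close relatives, can be illustrated with a minimal NumPy sketch. The function name `nt_xent_loss` and the temperature of 0.5 are illustrative assumptions rather than values taken from the cited methods; the sketch assumes that row `i` of `z1` and row `i` of `z2` are embeddings of two augmented views of the same graph or node.

```python
import numpy as np


def nt_xent_loss(z1: np.ndarray, z2: np.ndarray, temperature: float = 0.5) -> float:
    """NT-Xent loss for a batch of paired view embeddings (hypothetical helper).

    z1[i] and z2[i] form a positive pair; every other embedding in the
    concatenated batch acts as a negative.
    """
    n = z1.shape[0]
    z = np.concatenate([z1, z2], axis=0)
    # L2-normalise so that dot products are cosine similarities.
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = (z @ z.T) / temperature
    # Exclude self-similarity from the softmax denominator.
    np.fill_diagonal(sim, -np.inf)
    # Index of each row's positive partner: i <-> i + n.
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])
    # Numerically stable log-softmax over each row.
    row_max = sim.max(axis=1, keepdims=True)
    shifted = sim - row_max
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_prob[np.arange(2 * n), pos].mean())
```

Because every other sample in the batch is treated as a negative, two samples with the same phenotype from different cohorts are pushed apart; this is the false-negative concern that makes negative-free objectives such as BGRL attractive.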

4. Biologically Informed GCL for Microbiome Data
5. Applications

| Application | Representative study | Graph representation | Main methodological idea | Relevance for microbiome GCL |
|---|---|---|---|---|
| Disease classification | Khan et al. [9] | Phylogenetic graph of taxa | GCN for multiclass metagenomic classification | Taxonomic and phylogenetic structure can define biologically meaningful views. |
| CRC classification | CACONET [10] | Compositional-aware microbial correlation networks | Graph-level classification of posterior networks | Posterior or phenotype-specific networks can be contrasted as views. |
| Biomarker discovery | WSGMB [11] | Weighted signed microbial co-occurrence graphs | Signed GNN and node-importance scoring | Supports contrastive biomarker stability under perturbations. |
| Microbe–disease prediction | GCATCMDA [15] | Microbe similarity, disease similarity and association graphs | GNN plus contrastive learning | Direct example of contrastive graph learning for microbiome-related link prediction. |
| Higher-order association | LSCHNN [40] | Food–microbe–disease hypergraph | Single-view contrastive hypergraph neural network | Illustrates sparse higher-order microbiome-related GCL. |
| Microbe–drug prediction | SMMDA [41] | Heterogeneous drug–microbe graph | Transformer plus multi-view GCL | Shows relevance to pharmacomicrobiomics and drug discovery. |
| Community prediction | SIMBA-GNN [42] | Mechanistic microbe–metabolite–pathway graph | Simulation-augmented edge-aware graph transformer | Motivates mechanistic and temporal GCL with simulation-derived views. |

6. Evaluation and Benchmarking
7. Reporting Standards, Ethics and Clinical Translation
8. Future Directions
9. Conclusion
Data and Software Availability
References
- Turnbaugh, P.J.; Ley, R.E.; Hamady, M.; Fraser-Liggett, C.M.; Knight, R.; Gordon, J.I. The human microbiome project. Nature 2007, 449, 804–810. [CrossRef]
- Human Microbiome Project Consortium. Structure, function and diversity of the healthy human microbiome. Nature 2012, 486, 207–214. [CrossRef]
- Qin, J.; Li, R.; Raes, J.; Arumugam, M.; Burgdorf, K.S.; Manichanh, C.; Nielsen, T.; Pons, N.; Levenez, F.; Yamada, T.; et al. A human gut microbial gene catalogue established by metagenomic sequencing. Nature 2010, 464, 59–65. [CrossRef]
- Quince, C.; Walker, A.W.; Simpson, J.T.; Loman, N.J.; Segata, N. Shotgun metagenomics, from sampling to analysis. Nature Biotechnology 2017, 35, 833–844. [CrossRef]
- Friedman, J.; Alm, E.J. Inferring correlation networks from genomic survey data. PLoS Computational Biology 2012, 8, e1002687. [CrossRef]
- Kurtz, Z.D.; Müller, C.L.; Miraldi, E.R.; Littman, D.R.; Blaser, M.J.; Bonneau, R.A. Sparse and compositionally robust inference of microbial ecological networks. PLoS Computational Biology 2015, 11, e1004226. [CrossRef]
- Gloor, G.B.; Macklaim, J.M.; Pawlowsky-Glahn, V.; Egozcue, J.J. Microbiome datasets are compositional: and this is not optional. Frontiers in Microbiology 2017, 8, 2224. [CrossRef]
- Quinn, T.P.; Erb, I.; Gloor, G.; Notredame, C.; Richardson, M.F.; Crowley, T.M. A field guide for the compositional analysis of any-omics data. GigaScience 2019, 8, giz107. [CrossRef]
- Khan, S.; Kelly, L.; Glickman, J.; Ghaoui, L.E.; et al. Multiclass disease classification from microbial whole-community metagenomes using graph convolutional neural networks. Pacific Symposium on Biocomputing 2020, 25, 223–234.
- Xu, Y.; et al. CACONET: a novel classification framework for microbial correlation networks. Bioinformatics 2022, 38, 1639–1648. [CrossRef]
- Pan, S.; Jiang, X.; Zhang, K. WSGMB: weight signed graph neural network for microbial biomarker identification. Briefings in Bioinformatics 2024, 25, bbad448. [CrossRef]
- Fioravanti, D.; Giarratano, Y.; Maggio, V.; Agostinelli, C.; Chierici, M.; Jurman, G.; Furlanello, C. Phylogenetic convolutional neural networks in metagenomics. BMC Bioinformatics 2018, 19, 49. [CrossRef]
- Reiman, D.; Layden, B.T.; Dai, Y. PopPhy-CNN: a phylogenetic tree embedded architecture for convolutional neural networks to predict host phenotype from metagenomic data. IEEE Journal of Biomedical and Health Informatics 2020, 24, 2993–3001. [CrossRef]
- Irwin, C.; Mignone, F.; Montani, S.; Portinale, L. Graph Neural Networks for Gut Microbiome Metaomic Data: A Preliminary Work. arXiv preprint arXiv:2407.00142 2024. [CrossRef]
- Jiang, C.; et al. Predicting microbe-disease associations via graph neural network and contrastive learning. Frontiers in Microbiology 2024, 15, 1483983. [CrossRef]
- He, L.; et al. Adversarial regularized autoencoder graph neural network for predicting microbe-disease associations. Briefings in Bioinformatics 2024, 25, bbae584. [CrossRef]
- Veličković, P.; Fedus, W.; Hamilton, W.L.; Liò, P.; Bengio, Y.; Hjelm, R.D. Deep Graph Infomax. In Proceedings of the International Conference on Learning Representations, 2019.
- You, Y.; Chen, T.; Sui, Y.; Chen, T.; Wang, Z.; Shen, Y. Graph Contrastive Learning with Augmentations. In Proceedings of the Advances in Neural Information Processing Systems, 2020, Vol. 33, pp. 5812–5823.
- Zhu, Y.; Xu, Y.; Yu, F.; Liu, Q.; Wu, S.; Wang, L. Deep Graph Contrastive Representation Learning. arXiv preprint arXiv:2006.04131 2020.
- Hassani, K.; Khasahmadi, A.H. Contrastive Multi-View Representation Learning on Graphs. In Proceedings of the 37th International Conference on Machine Learning, 2020, Vol. 119, pp. 4116–4126.
- Zhu, Y.; Xu, Y.; Yu, F.; Liu, Q.; Wu, S.; Wang, L. Graph Contrastive Learning with Adaptive Augmentation. In Proceedings of the Web Conference 2021, 2021, pp. 2069–2080. [CrossRef]
- Ju, W.; Wang, Y.; Qin, Y.; Mao, Z.; Xiao, Z.; Luo, J.; Yang, J.; Gu, Y.; Wang, D.; Long, Q.; et al. Towards Graph Contrastive Learning: A Survey and Beyond. arXiv preprint arXiv:2405.11868 2024.
- Peschel, S.; Müller, C.L.; von Mutius, E.; Boulesteix, A.L.; Depner, M. NetCoMi: network construction and comparison for microbiome data in R. Briefings in Bioinformatics 2021, 22, bbaa290. [CrossRef]
- Pasolli, E.; Schiffer, L.; Manghi, P.; Renson, A.; Obenchain, V.; Truong, D.T.; Beghini, F.; Malik, F.; Ramos, M.; Dowd, J.B.; et al. Accessible, curated metagenomic data through ExperimentHub. Nature Methods 2017, 14, 1023–1024. [CrossRef]
- Dai, D.; Zhu, J.; Sun, C.; Li, M.; Liu, J.; Wu, S.; Ning, K. GMrepo v2: a curated human gut microbiome database with special focus on disease markers and cross-dataset comparison. Nucleic Acids Research 2022, 50, D777–D784. [CrossRef]
- Janssens, Y.; Nielandt, J.; Bronselaer, A.; Debunne, N.; Verbeke, F.; Wynendaele, E.; Van Immerseel, F.; Vandewynckel, Y.P.; De Tré, G.; De Spiegeleer, B. Disbiome database: linking the microbiome to disease. BMC Microbiology 2018, 18, 50. [CrossRef]
- Mitchell, A.L.; Almeida, A.; Beracochea, M.; Boland, M.; Burgin, J.; Cochrane, G.; Crusoe, M.R.; Kale, V.; Potter, S.C.; Richardson, L.J.; et al. MGnify: the microbiome analysis resource in 2020. Nucleic Acids Research 2020, 48, D570–D578. [CrossRef]
- Sun, F.Y.; Hoffmann, J.; Verma, V.; Tang, J. InfoGraph: Unsupervised and Semi-supervised Graph-Level Representation Learning via Mutual Information Maximization. In Proceedings of the International Conference on Learning Representations, 2020. arXiv:1908.01000.
- Qiu, J.; Chen, Q.; Dong, Y.; Zhang, J.; Yang, H.; Ding, M.; Wang, K.; Tang, J. GCC: Graph Contrastive Coding for Graph Neural Network Pre-Training. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2020, pp. 1150–1160. [CrossRef]
- You, Y.; Chen, T.; Shen, Y.; Wang, Z. Graph Contrastive Learning Automated. In Proceedings of the 38th International Conference on Machine Learning. PMLR, 2021, Vol. 139, Proceedings of Machine Learning Research, pp. 12121–12132.
- Suresh, S.; Li, P.; Hao, C.; Neville, J. Adversarial Graph Augmentation to Improve Graph Contrastive Learning. In Proceedings of the Advances in Neural Information Processing Systems, 2021, Vol. 34, pp. 15920–15933.
- Thakoor, S.; Tallec, C.; Azar, M.G.; Azabou, M.; Dyer, E.L.; Munos, R.; Veličković, P.; Valko, M. Large-Scale Representation Learning on Graphs via Bootstrapping. In Proceedings of the International Conference on Learning Representations, 2022. arXiv:2102.06514.
- Wang, T.; Isola, P. Understanding Contrastive Representation Learning through Alignment and Uniformity on the Hypersphere. In Proceedings of the 37th International Conference on Machine Learning. PMLR, 2020, Vol. 119, Proceedings of Machine Learning Research, pp. 9929–9939.
- Aitchison, J. The Statistical Analysis of Compositional Data; Chapman and Hall: London, 1986. [CrossRef]
- Silverman, J.D.; Washburne, A.D.; Mukherjee, S.; David, L.A. A phylogenetic transform enhances analysis of compositional microbiota data. eLife 2017, 6, e21887. [CrossRef]
- Lin, H.; Peddada, S.D. Analysis of compositions of microbiomes with bias correction. Nature Communications 2020, 11, 3514. [CrossRef]
- Wirbel, J.; Zych, K.; Essex, M.; Karcher, N.; Kartal, E.; Salazar, G.; Bork, P.; Sunagawa, S.; Zeller, G. Microbiome meta-analysis and cross-disease comparison enabled by the SIAMCAT machine learning toolbox. Genome Biology 2021, 22, 93. [CrossRef]
- Ma, S.; Shungin, D.; Mallick, H.; Schirmer, M.; Nguyen, L.H.; Kolde, R.; Franzosa, E.A.; Vlamakis, H.; Xavier, R.J.; Huttenhower, C. Population structure discovery in meta-analyzed microbial communities and inflammatory bowel disease using MMUPHin. Genome Biology 2022, 23, 208. [CrossRef]
- Ling, W.; et al. Batch effects removal for microbiome data via conditional quantile regression. Nature Communications 2022, 13, 5418. [CrossRef]
- Hu, J.; Hu, M.; Wu, Y.; Mu, S.; Huang, D.; Wang, B.; Gao, Y.; Gu, S.; Zhu, J. A lightweight single-view contrastive learning hypergraph neural network for food–microbe–disease association prediction. BMC Bioinformatics 2025, 26, 283. [CrossRef]
- Xuan, P.; Wang, R.; Gu, J.; Cui, H.; Zhang, T. Structure-sensitive transformer and multi-view graph contrastive learning enhanced prediction of drug-related microbes. BMC Bioinformatics 2025, 26, 231. [CrossRef]
- Aminian-Dehkordi, J.; Parsa, M.; Dickson, A.; Mofrad, M.R.K. SIMBA-GNN: mechanistic graph learning for microbiome prediction. npj Systems Biology and Applications 2025, 11, 120. [CrossRef]
- Li, M.; et al. Performance of gut microbiome as an independent diagnostic tool for 20 diseases: cross-cohort validation of machine-learning classifiers. Gut Microbes 2023, 15, 2157684. [CrossRef]
- Chen, Z.; Wu, Z.; Zhong, L.; Plant, C.; Wang, S.; Guo, W. Attributed Multi-order Graph Convolutional Network for Heterogeneous Graphs. arXiv preprint arXiv:2304.06336 2023.

| Graph type | Nodes | Edges | Typical use |
|---|---|---|---|
| Taxa–taxa graph | OTUs, ASVs, species or genera | Correlation, partial correlation, proportionality, co-abundance or inferred ecological association | Microbial network analysis, disease-specific community comparison, biomarker modules. |
| Phylogenetic graph | Taxa or internal tree nodes | Ancestor–descendant relationships or phylogenetic distance | Taxonomy-aware representation learning and phenotype prediction. |
| Sample–sample graph | Patients, samples or time points | Similarity in taxonomic, functional or multi-omics profiles | Patient stratification, disease classification and cohort alignment. |
| Microbe–disease graph | Microbes and diseases | Known or predicted associations | Microbe–disease association prediction and hypothesis generation. |
| Microbe–metabolite/pathway graph | Taxa, metabolites, genes or pathways | Functional annotation, metabolic exchange or correlation | Multi-omics integration and mechanism discovery. |
| Host–microbe graph | Host genes, immune markers, clinical variables and taxa | Statistical, mechanistic or literature-derived relationships | Precision medicine and host–microbiome interaction modelling. |
| Temporal graph | Taxa, samples or subject states over time | Longitudinal transitions or dynamic associations | Dysbiosis trajectories, intervention response and microbial stability. |
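As a rough illustration of the taxa–taxa row, the sketch below builds a signed co-abundance adjacency matrix from a samples × taxa count matrix. It applies a centred log-ratio (CLR) transform and thresholds the Pearson correlation of the CLR values. This is a deliberately simple stand-in for dedicated estimators such as SparCC or SPIEC-EASI; the pseudocount of 0.5, the threshold of 0.6 and the function names are illustrative assumptions.

```python
import numpy as np


def clr(counts: np.ndarray, pseudocount: float = 0.5) -> np.ndarray:
    """Centred log-ratio transform of a samples x taxa count matrix.

    The pseudocount (an assumed value) avoids log(0) for unobserved taxa.
    """
    logged = np.log(counts + pseudocount)
    return logged - logged.mean(axis=1, keepdims=True)


def taxa_graph(counts: np.ndarray, threshold: float = 0.6) -> np.ndarray:
    """Signed taxa-taxa adjacency from thresholded correlation of CLR values.

    Entries keep the sign and magnitude of the correlation when its absolute
    value reaches the (assumed) threshold, and are zero otherwise.
    """
    corr = np.corrcoef(clr(counts), rowvar=False)
    adj = np.where(np.abs(corr) >= threshold, corr, 0.0)
    np.fill_diagonal(adj, 0.0)  # no self-loops
    return adj
```

Both the pseudocount and the threshold change which edges appear, so a graph-based pipeline would normally report them and repeat the analysis across several settings.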

| Resource | Data type | Candidate task | Graph formulation | Candidate augmentation policy |
|---|---|---|---|---|
| Human Microbiome Project [2] | Multi-body-site 16S and shotgun profiles | Body-site classification, baseline pretraining | Sample–sample, taxa–taxa, body-site graphs | Body-site-preserving feature masking; cross-body-site transfer tests. |
| curatedMetagenomicData [24] | Uniformly processed human metagenomes | Disease classification, LOSO validation | Taxa–taxa, sample–sample and study-aware graphs | Study-balanced batches; edge-confidence perturbation; leave-one-study-out splits. |
| GMrepo [25] | Curated gut microbiome profiles and disease metadata | Cross-disease phenotype prediction | Microbe–disease and sample–disease graphs | Disease-aware contrast; hard negative sampling among related phenotypes. |
| Disbiome [26] | Literature-curated microbe–disease links | Link prediction and biological validation | Bipartite microbe–disease graph | Similarity-view contrast; evaluation against held-out known associations. |
| MGnify [27] | Public metagenomic/metabarcoding analyses | Environmental transfer and pretraining | Taxa, functional and environmental graphs | Environment-aware positives; domain adaptation across habitats. |
| Disease-specific cohorts | 16S or shotgun profiles plus metadata | IBD, CRC, metabolic disease, infection | Phenotype-specific co-abundance networks | Confounder-aware pair selection; graph robustness across inference methods. |
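The "study-balanced batches" policy in the table can be sketched as a sampler that draws the same number of samples from every study in each batch, so that the largest cohort does not dominate the contrastive negatives. The function name, its parameters and the choice to sample with replacement from small studies are assumptions made for illustration.

```python
import random
from collections import defaultdict


def study_balanced_batches(study_ids, per_study, n_batches, seed=None):
    """Yield batches of sample indices with `per_study` samples from each study.

    study_ids: one study label per sample, in sample order.
    Studies with fewer than `per_study` samples are sampled with replacement
    (an assumed design choice; oversampling small cohorts can also be avoided
    by lowering `per_study`).
    """
    rng = random.Random(seed)
    by_study = defaultdict(list)
    for idx, study in enumerate(study_ids):
        by_study[study].append(idx)
    for _ in range(n_batches):
        batch = []
        for study in sorted(by_study):
            pool = by_study[study]
            if len(pool) >= per_study:
                batch.extend(rng.sample(pool, per_study))
            else:
                batch.extend(rng.choices(pool, k=per_study))
        yield batch
```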

| Augmentation | Implementation | Biological meaning | Suggested safeguard | Risk |
|---|---|---|---|---|
| Confidence-aware edge dropping | Preferentially drop edges with low bootstrap support or weak association score | Robustness to uncertain microbial associations | Drop 5–30% of low-confidence edges; preserve high-confidence hubs | May remove weak but meaningful interactions. |
| Prevalence-aware taxon masking | Mask taxa with probability conditioned on prevalence and abundance | Robustness to sparsity and dropout | Avoid always masking rare disease markers; report prevalence threshold | May suppress low-abundance biomarkers. |
| Phylogenetic subgraph sampling | Sample clades or phylogenetically coherent neighborhoods | Preservation of evolutionary structure | Sample at multiple depths; compare species/genus/family views | May overemphasize taxonomy over function. |
| Log-ratio feature perturbation | Add noise or mask features in CLR/ILR/PhILR space | Compositionality-aware robustness | Perform sensitivity analysis across zero-handling strategies | Poor pseudo-count choice may distort rare taxa. |
| Cross-omics contrast | Contrast taxonomic, pathway, metabolite or host graphs | Integration of complementary biological layers | Use paired samples only or model missing modalities explicitly | Omics layers may have different noise models. |
| Disease-aware graph views | Contrast healthy and disease-specific networks | Identification of dysbiosis-related modules | Balance cohorts and metadata; avoid study-label shortcuts | Confounding by treatment, geography or lifestyle. |
| Temporal contrast | Contrast nearby time points or intervention states | Learning stability and transition patterns | Use subject-level splits and time-aware negatives | Requires dense longitudinal sampling. |
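The first two augmentations in the table can be sketched in a few lines of NumPy. The sketch assumes that each edge carries a confidence score in [0, 1] (for example a bootstrap support fraction) and that taxa are columns of a count matrix. The function names, the protection cut-off of 0.9 and the masking probabilities are illustrative assumptions, not recommended settings.

```python
import numpy as np


def drop_low_confidence_edges(confidence, drop_fraction=0.2,
                              protect_above=0.9, rng=None):
    """Return a boolean keep-mask over edges.

    Edges with confidence >= protect_above are never dropped. Among the
    remaining edges, a fraction `drop_fraction` is removed, sampled with
    probability proportional to (1 - confidence).
    """
    rng = np.random.default_rng(rng)
    confidence = np.asarray(confidence, dtype=float)
    candidates = np.flatnonzero(confidence < protect_above)
    n_drop = int(round(drop_fraction * candidates.size))
    keep = np.ones(confidence.size, dtype=bool)
    if n_drop > 0:
        weights = 1.0 - confidence[candidates]
        dropped = rng.choice(candidates, size=n_drop, replace=False,
                             p=weights / weights.sum())
        keep[dropped] = False
    return keep


def prevalence_aware_taxon_mask(counts, base_rate=0.15, max_prob=0.5, rng=None):
    """Return a boolean mask over taxa (True = masked in this view).

    Masking probability rises linearly from base_rate (taxon present in all
    samples) to max_prob (taxon present in none), so rare taxa are masked
    more often but never always.
    """
    rng = np.random.default_rng(rng)
    prevalence = (np.asarray(counts) > 0).mean(axis=0)
    prob = base_rate + (max_prob - base_rate) * (1.0 - prevalence)
    return rng.random(prob.size) < prob
```

Capping the masking probability below 1 is one way to respect the safeguard of not always masking rare disease markers; the cap and the prevalence definition would need to be reported alongside results.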

| Dimension | Metrics or criteria | Concrete protocol | GCL-specific check |
|---|---|---|---|
| Predictive performance | Accuracy, balanced accuracy, F1-score, AUROC, AUPRC, calibration | Compare non-graph ML, supervised GNN and GCL-pretrained models on identical splits; use link-prediction metrics for association tasks [15,40,41]. | Test whether gains come from contrastive pretraining rather than only architecture or classifier tuning. |
| External validation | LOSO AUROC/AUPRC, transfer gap, calibration shift | Train on multiple cohorts and test on a held-out study using SIAMCAT-like cross-study designs [37,43]. | Verify that embeddings transfer across studies and do not encode cohort identity. |
| Graph robustness | Stability across inference method, threshold, graph density and taxonomic level | Reconstruct graphs using Spearman/SparCC/SPIEC-EASI/NetCoMi and repeat evaluation [5,6,23]. | Measure prediction and embedding stability under graph-construction stress tests. |
| Biological plausibility | Enrichment of known taxa, pathways, edges or modules | Compare salient nodes/edges/subgraphs with Disbiome, GMrepo and disease literature [25,26]. | Check whether learned invariances correspond to plausible biology. |
| Interpretability | Stability of node, edge and subgraph explanations | Repeat explanations across seeds, folds, cohorts and graph perturbations; compare with CACONET/WSGMB-style network biomarkers [10,11]. | Determine whether biomarkers and microbial modules are stable rather than augmentation artifacts. |
| Reproducibility | Code, scripts, splits, graph parameters, seeds and model checkpoints | Release preprocessing, graph construction, view generation, training and evaluation scripts; use standardized resources such as curatedMetagenomicData [24]. | Allows separation of the effects of preprocessing, graph construction, augmentation and contrastive loss. |
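The leave-one-study-out (LOSO) protocol in the external-validation row can be expressed as a small split generator. This is a minimal sketch using only the standard library; the function name is hypothetical, and it assumes that every sample carries exactly one study label.

```python
from collections import defaultdict


def leave_one_study_out(study_ids):
    """Yield (held_out_study, train_indices, test_indices) for each study.

    study_ids: one study label per sample, in sample order.
    Each study is held out exactly once; all remaining studies form the
    training set for that fold.
    """
    by_study = defaultdict(list)
    for idx, study in enumerate(study_ids):
        by_study[study].append(idx)
    for held_out in sorted(by_study):
        test_idx = by_study[held_out]
        train_idx = sorted(
            i for study, idxs in by_study.items() if study != held_out for i in idxs
        )
        yield held_out, train_idx, test_idx
```

For a GCL pipeline, a conservative reading of this protocol is that the held-out study is excluded from contrastive pretraining and graph construction as well as from supervised fine-tuning; otherwise cohort-specific structure can leak into the embeddings.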

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).