Submitted: 21 May 2026
Posted: 22 May 2026
Abstract
Keywords:
1. Introduction
2. Background
2.1. Structured Data: Types, Properties, and Inductive Biases
2.1.1. Tabular Data
2.1.2. Time Series Data
2.1.3. Graph Data
2.2. Foundation Models: Core Concepts and Characteristics
2.2.1. General-Purpose Representations
2.2.2. Efficient and Robust Adaptability
2.3. Why Structured Data Breaks Conventional Foundation Model Assumptions
2.3.1. Data Heterogeneity and Inductive Biases
2.3.2. Lack of Canonical Tokenization
2.3.3. Coupling Between Structural Semantics and Tasks
3. Common Principles
3.1. Structured, Heterogeneous, and Semantically Constrained Data
3.2. Tokenization as Structure-Preserving Abstraction
3.3. Data-Native vs. LLM-Based Architecture Paradigms
3.4. Pre-Training by Learning Generalizable Structural Priors
3.5. Efficient and Structure-Preserving Adaptation
4. Tabular Foundation Models
4.1. Pre-Training Data Construction
4.1.1. Synthetic Data from Priors
4.1.2. Real-World Data Across Diverse Domains
4.1.3. Large Knowledge Base
4.2. Pre-Training Objectives and Tasks
4.2.1. Supervised Tasks
4.2.2. Masked Reconstruction
4.2.3. Contrastive Learning
4.3. Data Tokenization and Representation
4.3.1. Cell-Level Tokenization
4.3.2. Row-Level Tokenization
4.3.3. Column Name-Value Tuple
4.4. Model Architecture
4.4.1. Transformer-Based Architecture
4.4.2. Pre-Trained Language Model-Based Architecture
4.4.3. Hybrid Architecture
4.5. Adaptation Strategy
4.5.1. In-Context Learning
4.5.2. Prompt-Based Adaptation
4.5.3. Fine-Tuning
4.6. Recent Advances and Applications
4.6.1. Trustworthiness
4.6.2. Applications
4.7. Datasets and Benchmarks
5. Time Series Foundation Models
5.1. Pre-Training Data Construction
5.1.1. Real-World Datasets
5.1.2. Synthetic Data
5.2. Pre-Training Objectives and Tasks
5.2.1. Supervised Predictive Tasks
5.2.2. Masked Reconstruction
5.2.3. Next Token Prediction
5.3. Data Tokenization and Representation
5.3.1. Point-Level Tokenization
5.3.2. Patch-Level Tokenization
5.3.3. Temporal Feature Representation
5.4. Model Architecture
5.4.1. Transformer-Based Architecture
5.4.2. LLM-Based Architecture
5.5. Adaptation Strategy
5.5.1. Direct Inference
5.5.2. Prompt-Based Adaptation
5.5.3. Fine-Tuning
5.6. Domain and Task Transferability
5.6.1. Domain Transferability
5.6.2. Task Transferability
5.7. Recent Advances and Applications
5.7.1. Vision Models
5.7.2. Vision-Time Fusion
5.7.3. Retrieval Augmentation
5.7.4. Interactive Systems
5.8. Datasets, Benchmarks and Frameworks
6. Graph Foundation Models
6.1. Pre-Training Data Construction
6.1.1. Text-Attributed Graphs (TAGs)
6.1.2. Text-Free Graphs
6.1.3. Others
6.2. Pre-Training Objectives and Tasks
6.2.1. Contrastive and Alignment
6.2.2. Reconstruction and Generative
6.2.3. Objective-Agnostic
6.3. Data Tokenization and Representation
6.3.1. Graph-Structure Tokens
6.3.2. Auxiliary Abstract Tokens
6.3.3. Language and Sequence Tokens
6.4. Model Architecture
6.4.1. GNN-Based Architecture
6.4.2. LLM-Based Architecture
6.4.3. GNN-LLM Hybrid Architecture
6.5. Adaptation Strategy
6.5.1. Prompt-Based
6.5.2. Fine-Tuning-Based
6.5.3. Instruction- and In-Context Adaptation
6.5.4. Adaptation-Free Strategies
6.6. Recent Advances and Applications
6.6.1. Reasoning
6.6.2. Federated Learning
6.6.3. Trustworthiness
6.6.4. Biomedicine
6.7. Datasets, Benchmarks and Frameworks
7. Challenges and Future Directions
7.1. Scaling Laws for Structured Data Foundation Models
7.2. Unified Foundation Models for Structured Data
7.3. Integration with LLMs, VLMs, and Emerging Agentic Systems
8. Conclusions
Acknowledgments
References
- Minaee, S.; et al. Large language models: A survey. arXiv 2024, arXiv:2402.06196. [Google Scholar]
- Li, Z.; et al. A survey of state of the art large vision language models: Alignment, benchmark, evaluations and challenges. arXiv 2025, arXiv:2501.02189. [Google Scholar] [CrossRef]
- Van Breugel, B.; Van Der Schaar, M. Position: Why tabular foundation models should be a research priority. arXiv 2024, arXiv:2405.01147. [Google Scholar]
- Hollmann, N.; et al. Accurate predictions on small data with a tabular foundation model. Nature 2025, 637, 319–326. [Google Scholar] [CrossRef] [PubMed]
- Zhang, X.; et al. Limix: Unleashing structured-data modeling capability for generalist intelligence. arXiv 2025, arXiv:2509.03505. [Google Scholar]
- Das, A.; et al. A decoder-only foundation model for time-series forecasting. In Proceedings of the ICML, 2024. [Google Scholar]
- Ansari, A.F.; Stella, L.; Turkmen, A.C.; et al. Chronos: Learning the Language of Time Series. TMLR, 2024. [Google Scholar]
- Zhao, H.; et al. All in one and one for all: A simple yet effective method towards cross-domain graph pretraining. In Proceedings of the KDD, 2024; pp. 4443–4454. [Google Scholar]
- He, Y.; et al. Unigraph: Learning a unified cross-domain foundation model for text-attributed graphs. In Proceedings of the KDD, 2025; pp. 448–459. [Google Scholar]
- Somvanshi, S.; et al. A survey on deep tabular learning. arXiv 2024, arXiv:2410.12034. [Google Scholar] [CrossRef]
- Ye, J.; et al. A survey of time series foundation models: Generalizing time series representation with large language model. arXiv 2024, arXiv:2405.02358. [Google Scholar]
- Liang, Y.; et al. Foundation models for time series analysis: A tutorial and survey. In Proceedings of the KDD, 2024; pp. 6555–6565. [Google Scholar]
- Wang, Z.; et al. Graph Foundation Models: A Comprehensive Survey. arXiv 2025, arXiv:2505.15116. [Google Scholar] [CrossRef]
- Liu, J.; et al. Graph foundation models: Concepts, opportunities and challenges. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025. [Google Scholar]
- Yan, J.; et al. Making pre-trained language models great on tabular prediction. In Proceedings of the ICLR, 2024. [Google Scholar]
- Hegselmann, S.; et al. Tabllm: Few-shot classification of tabular data with large language models. In Proceedings of the AISTATS, 2023; pp. 5549–5581. [Google Scholar]
- Goswami, M.; et al. MOMENT: A Family of Open Time-series Foundation Models. In Proceedings of the ICML, 2024. [Google Scholar]
- Yuan, H.; et al. GRAVER: Generative Graph Vocabularies for Robust Graph Foundation Models Fine-tuning. In Proceedings of the NeurIPS, 2025. [Google Scholar]
- Liu, H.; et al. One For All: Towards Training One Graph Model For All Classification Tasks. In Proceedings of the ICLR, 2024. [Google Scholar]
- Liu, X.; et al. Unitime: A language-empowered unified model for cross-domain time series forecasting. In Proceedings of the WWW, 2024; pp. 4095–4106. [Google Scholar]
- Yuan, H.; et al. How Much Can Transfer? BRIDGE: Bounded Multi-Domain Graph Foundation Model with Generalization Guarantees. In Proceedings of the ICML, 2025. [Google Scholar]
- Gruver, N.; et al. Large language models are zero-shot time series forecasters. NeurIPS 2023, 36, 19622–19635. [Google Scholar]
- Chen, R.; et al. Llaga: Large language and graph assistant. arXiv 2024, arXiv:2402.08170. [Google Scholar] [CrossRef]
- Yak, S.; et al. IngesTables: scalable and efficient training of LLM-enabled tabular foundation models. In Proceedings of the TRL Workshop in NeurIPS, 2023. [Google Scholar]
- Cao, D.; et al. TEMPO: Prompt-based Generative Pre-trained Transformer for Time Series Forecasting. In Proceedings of the ICLR, 2024. [Google Scholar]
- Shi, J.; et al. SA²GFM: Enhancing Robust Graph Foundation Models with Structure-Aware Semantic Augmentation. In Proceedings of the AAAI, 2026. [Google Scholar]
- Ma, J.; et al. Tabdpt: Scaling tabular foundation models on real data. In Proceedings of the NeurIPS, 2025. [Google Scholar]
- Liu, Y.; et al. Timer: Generative Pre-trained Transformers Are Large Time Series Models. In Proceedings of the ICML, 2024; pp. 32369–32399. [Google Scholar]
- Yang, Y.; et al. Unitabe: A universal pretraining protocol for tabular foundation model in data science. In Proceedings of the ICLR, 2024. [Google Scholar]
- Xia, L.; Huang, C. Anygraph: Graph foundation model in the wild. arXiv 2024, arXiv:2408.10700. [Google Scholar] [CrossRef]
- Xue, H.; Salim, F.D. Promptcast: A new prompt-based learning paradigm for time series forecasting. IEEE TKDE 2023, 36, 6851–6864. [Google Scholar] [CrossRef]
- Liu, Z.; et al. Graphprompt: Unifying pre-training and downstream tasks for graph neural networks. In Proceedings of the WWW, 2023; pp. 417–428. [Google Scholar]
- Qu, J.; et al. TabICL: A Tabular Foundation Model for In-Context Learning on Large Data. In Proceedings of the ICML, 2025. [Google Scholar]
- Zhang, X.; et al. Mitra: Mixed synthetic priors for enhancing tabular foundation models. In Proceedings of the NeurIPS, 2025. [Google Scholar]
- Kim, M.J.; et al. CARTE: Pretraining and Transfer for Tabular Learning. In Proceedings of the ICML, 2024; pp. 23843–23866. [Google Scholar]
- Spinaci, M.; et al. Portal: Scalable tabular foundation models via content-specific tokenization. arXiv 2024, arXiv:2410.13516. [Google Scholar] [CrossRef]
- Breejen, F.d.; et al. Fine-tuned In-Context Learning Transformers are Excellent Tabular Data Classifiers. arXiv 2024, arXiv:2405.13396. [Google Scholar]
- Arazi, A.; et al. TabSTAR: A Tabular Foundation Model for Tabular Data with Text Fields. In Proceedings of the NeurIPS, 2025. [Google Scholar]
- Ding, J.; et al. Tabula: A Tabular Self-Supervised Foundation Model for Single-Cell Transcriptomics. In Proceedings of the NeurIPS, 2025. [Google Scholar]
- Kim, M.J.; et al. Table Foundation Models: on knowledge pre-training for tabular learning. TMLR, 2025. [Google Scholar]
- Garg, A.; et al. Real-tabpfn: Improving tabular foundation models via continued pre-training with real-world data. arXiv 2025, arXiv:2507.03971. [Google Scholar]
- Wang, R.; et al. Unipredict: Large language models are universal tabular classifiers. arXiv 2023, arXiv:2310.03266. [Google Scholar]
- Gardner, J.; et al. Large scale transfer learning for tabular data via language modeling. Adv. Neural Inf. Process. Syst. 2024, 37, 45155–45205. [Google Scholar]
- Hu, E.J.; et al. Lora: Low-rank adaptation of large language models. In Proceedings of the ICLR, 2022. [Google Scholar]
- Peroni, M.; et al. Robust Tabular Foundation Models. arXiv 2025, arXiv:2512.03307. [Google Scholar] [CrossRef]
- Djilani, M.; et al. On the Robustness of Tabular Foundation Models: Test-Time Attacks and In-Context Defenses. arXiv 2025, arXiv:2506.02978. [Google Scholar]
- Saito, T.; et al. Applying a Tabular Foundation Model to Geotechnical Site Characterization. Geod. AI 2025, 100040. [Google Scholar] [CrossRef]
- Ye, C.; et al. Towards cross-table masked pretraining for web data mining. In Proceedings of the WWW, 2024; pp. 4449–4459. [Google Scholar]
- Erickson, N.; et al. Tabarena: A living benchmark for machine learning on tabular data. In Proceedings of the NeurIPS, 2025. [Google Scholar]
- Tran, Q.M.; et al. TabularFM: An open framework for tabular foundational models. In Proceedings of the IEEE BigData, 2024; pp. 1694–1699. [Google Scholar]
- Byun, J.; et al. Risk In Context: Benchmarking Privacy Leakage of Foundation Models in Synthetic Tabular Data Generation. arXiv 2025, arXiv:2507.17066. [Google Scholar] [CrossRef]
- Dooley, S.; et al. Forecastpfn: Synthetically-trained zero-shot forecasting. NeurIPS 2023, 36, 2403–2426. [Google Scholar]
- Rasul, K.; et al. Lag-llama: Towards foundation models for time series forecasting. In Proceedings of the R0-FoMo Workshop at NeurIPS, 2023. [Google Scholar]
- Garza, A.; et al. TimeGPT-1. arXiv 2023, arXiv:2310.03589. [Google Scholar]
- Woo, G.; et al. Unified Training of Universal Time Series Forecasting Transformers. In Proceedings of the ICML, 2024; pp. 53140–53164. [Google Scholar]
- Gao, S.; et al. Units: A unified multi-task time series model. NeurIPS 2024, 37, 140589–140631. [Google Scholar]
- Shi, X.; et al. Time-MoE: Billion-Scale Time Series Foundation Models with Mixture of Experts. In Proceedings of the ICLR, 2025. [Google Scholar]
- Masserano, L.; et al. Enhancing foundation models for time series forecasting via Wavelet-based tokenization. arXiv 2024, arXiv:2412.05244. [Google Scholar] [CrossRef]
- Wang, Y.; et al. Towards a General Time Series Forecasting Model with Unified Representation and Adaptive Transfer. In Proceedings of the ICML, 2025. [Google Scholar]
- Zhou, T.; et al. One fits all: Power general time series analysis by pretrained lm. NeurIPS 2023, 36, 43322–43355. [Google Scholar]
- Jia, F.; et al. Gpt4mts: Prompt-based large language model for multimodal time-series forecasting. Proc. AAAI 2024, Vol. 38, 23343–23351. [Google Scholar] [CrossRef]
- Jin, M.; et al. Time-LLM: Time Series Forecasting by Reprogramming Large Language Models. In Proceedings of the ICLR, 2024. [Google Scholar]
- Liu, Y.; et al. Autotimes: Autoregressive time series forecasters via large language models. NeurIPS 2024, 37, 122154–122184. [Google Scholar]
- Liu, P.; et al. Calf: Aligning llms for time series forecasting via cross-modal fine-tuning. Proc. AAAI 2025, Vol. 39, 18915–18923. [Google Scholar] [CrossRef]
- Chang, C.; et al. Llm4ts: Aligning pre-trained llms as data-efficient time-series forecasters. ACM Trans. Intell. Syst. Technol. 2025, 16, 1–20. [Google Scholar] [CrossRef]
- Kowsher, M.; et al. Llm-mixer: Multiscale mixing in llms for time series forecasting. In Proceedings of the TRL Workshop at ACL, 2025; pp. 156–165. [Google Scholar]
- Woo, G.; et al. Pushing the limits of pre-training for time series forecasting in the cloudops domain. arXiv 2023, arXiv:2310.05063. [Google Scholar]
- Lu, K.; et al. Frozen pretrained transformers as universal computation engines. Proc. AAAI 2022, Vol. 36, 7628–7636. [Google Scholar] [CrossRef]
- Chen, M.; et al. VisionTS: Visual Masked Autoencoders Are Free-Lunch Zero-Shot Time Series Forecasters. In Proceedings of the ICML, 2025; pp. 8979–9007. [Google Scholar]
- Shen, L.; et al. VisionTS++: Cross-Modal Time Series Foundation Model with Continual Pre-trained Vision Backbones. arXiv 2025, arXiv:2508.04379. [Google Scholar]
- Zhong, S.; et al. Time-VLM: Exploring Multimodal Vision-Language Models for Augmented Time Series Forecasting. In Proceedings of the ICML, 2025; pp. 78478–78497. [Google Scholar]
- Khezresmaeilzadeh, T.; et al. Vista: Vision-language inference for training-free stock time-series analysis. arXiv 2025, arXiv:2505.18570. [Google Scholar]
- Yang, S.; et al. Timerag: Boosting llm time series forecasting via retrieval-augmented generation. In Proceedings of the ICASSP. IEEE, 2025; pp. 1–5. [Google Scholar]
- Ning, K.; et al. Ts-rag: Retrieval-augmented generation based time series foundation models are stronger zero-shot forecaster. arXiv 2025, arXiv:2503.07649. [Google Scholar]
- Liang, K.Y.; et al. Retrieval-Augmented Generation with Covariate Time Series. arXiv 2026, arXiv:2603.04951. [Google Scholar] [CrossRef]
- Zhang, H.; et al. Timeraf: Retrieval-augmented foundation model for zero-shot time series forecasting. IEEE Transactions on Knowledge and Data Engineering, 2025. [Google Scholar]
- Lee, S.; et al. Cross-RAG: Zero-Shot Retrieval-Augmented Time Series Forecasting via Cross-Attention. arXiv 2026, arXiv:2603.14709. [Google Scholar]
- Ravuru, C.; et al. Agentic retrieval-augmented generation for time series analysis. arXiv 2024, arXiv:2408.14484. [Google Scholar] [CrossRef]
- Ang, Y.; et al. Tsgassist: An interactive assistant harnessing llms and rag for time series generation recommendations and benchmarking. VLDB 2024, 17, 4309–4312. [Google Scholar] [CrossRef]
- Ang, Y.; et al. TSGBench: Time Series Generation Benchmark. VLDB 2023, 17, 305–318. [Google Scholar] [CrossRef]
- Li, Z.; et al. Tsfm-bench: A comprehensive and unified benchmark of foundation models for time series forecasting. In Proceedings of the KDD, 2025; pp. 5595–5606. [Google Scholar]
- Chuang, Y.N.; et al. Ltsm-bundle: A toolbox and benchmark on large language models for time series forecasting. ACM SIGKDD Explor. Newsl. 2025, 27, 43–61. [Google Scholar] [CrossRef]
- Meyer, M.; et al. Benchmarking time series foundation models for short-term household electricity load forecasting. arXiv 2024, arXiv:2410.09487. [Google Scholar] [CrossRef]
- Franco, A.C.; et al. Forecasting Oil Production with Time-Series Foundation Models-A Benchmark Study Against Classical Machine Learning Models. In Proceedings of the SPE Annual Technical Conference and Exhibition, 2025; p. D011S010R003. [Google Scholar]
- Marchesi, G.; et al. Assessing Time Series Foundation Models for Probabilistic Electricity Price Forecasting: Toward a Unified Benchmark. Energies 2025, 18, 6269. [Google Scholar] [CrossRef]
- Yu, X.; et al. Hgprompt: Bridging homogeneous and heterogeneous graphs for few-shot prompt learning. In Proceedings of the AAAI, 2024. [Google Scholar]
- Yu, X.; et al. Multigprompt for multi-task pre-training and prompting on graphs. In Proceedings of the WWW, 2024; pp. 515–526. [Google Scholar]
- Xia, L.; et al. Opengraph: Towards open graph foundation models. arXiv 2024, arXiv:2403.01121. [Google Scholar] [CrossRef]
- Wang, Z.; et al. Gft: Graph foundation model with transferable tree vocabulary. NeurIPS 2024, 37, 107403–107443. [Google Scholar]
- Yu, X.; et al. Text-free multi-domain graph pre-training: Toward graph foundation models. arXiv 2024, arXiv:2405.13934. [Google Scholar]
- Liu, J.; et al. One model for one graph: A new perspective for pretraining with cross-domain graphs. arXiv 2024, arXiv:2412.00315. [Google Scholar]
- Guo, Z.; et al. Graphmore: Mitigating topological heterogeneity via mixture of riemannian experts. Proc. AAAI 2025, Vol. 39, 11754–11762. [Google Scholar] [CrossRef]
- Zhao, J.; et al. Fully-inductive Node Classification on Arbitrary Graphs. In Proceedings of the ICLR, 2025. [Google Scholar]
- Lv, R.; et al. Graphprompter: Multi-stage adaptive prompt optimization for graph in-context learning. In Proceedings of the ICDE, 2025; pp. 3917–3930. [Google Scholar]
- Wang, Z.; et al. Towards Graph Foundation Models: Learning Generalities Across Graphs via Task-Trees. In Proceedings of the ICML, 2025. [Google Scholar]
- Wang, S.; et al. Multi-Domain Graph Foundation Models: Robust Knowledge Transfer via Topology Alignment. In Proceedings of the ICML, 2025. [Google Scholar]
- Chen, H.; et al. Autogfm: Automated graph foundation model with adaptive architecture customization. In Proceedings of the ICML, 2025. [Google Scholar]
- Yu, X.; et al. GCoT: Chain-of-thought prompt learning for graphs. In Proceedings of the KDD, 2025; pp. 3669–3679. [Google Scholar]
- Sun, Y.; et al. Handling feature heterogeneity with learnable graph patches. In Proceedings of the KDD, 2025; pp. 1313–1324. [Google Scholar]
- Yu, X.; et al. Samgpt: Text-free graph foundation model for multi-domain pre-training and cross-domain adaptation. In Proceedings of the WWW, 2025; pp. 1142–1153. [Google Scholar]
- He, Y.; et al. Unigraph2: Learning a unified embedding space to bind multimodal graphs. In Proceedings of the WWW, 2025; pp. 1759–1770. [Google Scholar]
- Sun, L.; et al. Riemanngfm: Learning a graph foundation model from riemannian geometry. In Proceedings of the WWW, 2025; pp. 1154–1165. [Google Scholar]
- Zhu, Y.; et al. Graphclip: Enhancing transferability in graph foundation models for text-attributed graphs. In Proceedings of the WWW, 2025; pp. 2183–2197. [Google Scholar]
- Huang, Y.; et al. One Prompt Fits All: Universal Graph Adaptation for Pretrained Models. In Proceedings of the NeurIPS, 2025. [Google Scholar]
- Guo, Z.; et al. GraphKeeper: Graph Domain-Incremental Learning via Knowledge Disentanglement and Preservation. In Proceedings of the NeurIPS, 2025. [Google Scholar]
- Nguyen, T.K.; et al. H2GFM: Towards unifying Homogeneity and Heterogeneity on Text-Attributed Graphs. arXiv 2025, arXiv:2506.08298. [Google Scholar]
- Tang, Z.; Chen, J. Toward a Graph Foundation Model: Pre-Training Transformers With Random Walks. arXiv 2025, arXiv:2506.14098. [Google Scholar] [CrossRef]
- Zhao, Z.; et al. Towards Text-free Graph Foundation Models: Rethinking Multi-Domain Graph Contrastive Learning. arXiv 2025, arXiv:2506.22510. [Google Scholar]
- Ma, W.; et al. GILT: An LLM-Free, Tuning-Free Graph Foundational Model for In-Context Learning. arXiv 2025, arXiv:2510.04567. [Google Scholar]
- Wang, Z.; et al. GMoPE: A Prompt-Expert Mixture Framework for Graph Foundation Models. arXiv 2025, arXiv:2511.03251. [Google Scholar]
- Sun, L.; et al. Multi-Domain Transferable Graph Gluing for Building Graph Foundation Models. In Proceedings of the ICLR, 2026. [Google Scholar]
- Lin, T.; et al. Langgfm: A large language model alone can be a powerful graph foundation model. arXiv 2024, arXiv:2410.14961. [Google Scholar] [CrossRef]
- Zhu, X.; et al. Llm as gnn: Graph vocabulary learning for text-attributed graph foundation models. arXiv 2025, arXiv:2503.03313. [Google Scholar] [CrossRef]
- Tang, J.; et al. Graphgpt: Graph instruction tuning for large language models. In Proceedings of the SIGIR, 2024; pp. 491–500. [Google Scholar]
- Li, Y.; et al. Zerog: Investigating cross-dataset zero-shot transferability in graphs. In Proceedings of the KDD, 2024; pp. 1725–1735. [Google Scholar]
- Kong, L.; et al. GOFA: A Generative One-For-All Model for Joint Graph Language Modeling. In Proceedings of the ICLR, 2025. [Google Scholar]
- Cheng, Y.; et al. Boosting Cross-Domain and Cross-Task Generalization for Text-Attributed Graphs from Structural Perspective. Frontiers of Computer Science, 2025. [Google Scholar]
- Gao, Y.; et al. Hypergraph foundation model. TPAMI, 2025. [Google Scholar]
- Sun, L.; et al. Deeper with Riemannian Geometry: Overcoming Oversmoothing and Oversquashing for Graph Foundation Models. In Proceedings of the NeurIPS, 2025. [Google Scholar]
- Sun, L.; Yu, P.S. A Riemannian perspective on graph foundation models: curvature as a guiding principle. Front. Comput. Sci. 2026, 20, 2012370. [Google Scholar] [CrossRef]
- Eremeev, D.; et al. Turning tabular foundation models into graph foundation models. arXiv 2025, arXiv:2508.20906. [Google Scholar] [CrossRef]
- Veličković, P.; et al. Deep Graph Infomax. In Proceedings of the ICLR, 2019. [Google Scholar]
- Zhu, Y.; et al. Deep graph contrastive representation learning. arXiv 2020, arXiv:2006.04131. [Google Scholar] [CrossRef]
- Hou, Z.; et al. Graphmae: Self-supervised masked graph autoencoders. In Proceedings of the KDD, 2022; pp. 594–604. [Google Scholar]
- Zhao, J.; et al. Fug: Feature-universal graph contrastive pre-training for graphs with diverse node features. NeurIPS 2024, 37, 4003–4034. [Google Scholar]
- Li, Y.; et al. Advancing graph foundation models: A data-centric perspective. In Proceedings of the KDD, 2025; pp. 1635–1646. [Google Scholar]
- Cui, Y.; et al. A prompt-based knowledge graph foundation model for universal in-context reasoning. NeurIPS 2024, 37, 7095–7124. [Google Scholar]
- Luo, L.; et al. GFM-RAG: Graph Foundation Model for Retrieval Augmented Generation. In Proceedings of the NeurIPS, 2025. [Google Scholar]
- Luo, L.; et al. G-reasoner: Foundation Models for Unified Reasoning over Graph-structured Knowledge. In Proceedings of the ICLR, 2025. [Google Scholar]
- Yuan, H.; et al. RAG-GFM: Overcoming In-Memory Bottlenecks in Graph Foundation Models via Retrieval-Augmented Generation. In Proceedings of the WWW, 2026. [Google Scholar]
- Yuan, H.; et al. Retrieving Minimal and Sufficient Reasoning Subgraphs with Graph Foundation Models for Path-aware GraphRAG. arXiv 2026, arXiv:2603.07179. [Google Scholar] [CrossRef]
- Zhu, Y.; et al. Towards Effective Federated Graph Foundation Model via Mitigating Knowledge Entanglement. In Proceedings of the NeurIPS, 2025. [Google Scholar]
- Wu, Z.; et al. FedBook: A Unified Federated Graph Foundation Codebook with Intra-domain and Inter-domain Knowledge Modeling. arXiv 2025, arXiv:2510.07755. [Google Scholar]
- Zhu, Y.; et al. Rethinking Federated Graph Foundation Models: A Graph-Language Alignment-based Approach. arXiv 2026, arXiv:2601.21369. [Google Scholar]
- Qiao, H.; et al. Anomalygfm: Graph foundation model for zero/few-shot anomaly detection. In Proceedings of the KDD, 2025; pp. 2326–2337. [Google Scholar]
- Xu, H.; et al. GLIP-OOD: Zero-Shot Graph OOD Detection with Graph Foundation Model. arXiv 2025, arXiv:2504.21186. [Google Scholar]
- Xu, H.; et al. A Systematic Study of Model Extraction Attacks on Graph Foundation Models. arXiv 2025, arXiv:2511.11912. [Google Scholar] [CrossRef]
- Xue, X.; et al. Stealthy Dual-Trigger Backdoors: Attacking Prompt Tuning in LM-Empowered Graph Foundation Models. arXiv 2025, arXiv:2510.14470. [Google Scholar]
- Wang, Y.; et al. HeTa: relation-wise heterogeneous graph foundation attack model. In Proceedings of the IJCAI, 2025; pp. 3453–3461. [Google Scholar]
- Luo, J.; et al. Towards Effective, Stealthy, and Persistent Backdoor Attacks Targeting Graph Foundation Models. In Proceedings of the AAAI, 2026. [Google Scholar]
- Luo, J.; et al. Privacy auditing of multi-domain graph pre-trained model under membership inference attacks. Proc. AAAI 2026, Vol. 40, 15483–15491. [Google Scholar] [CrossRef]
- Chen, J.; et al. GFM4GA: Graph Foundation Model for Group Anomaly Detection. arXiv 2026, arXiv:2601.10193. [Google Scholar] [CrossRef]
- King, I.J.; et al. CyberGFM: Graph Foundation Models for Lateral Movement Detection in Enterprise Networks. arXiv 2026, arXiv:2601.05988. [Google Scholar] [CrossRef]
- Huang, K.; et al. A foundation model for clinician-centered drug repurposing. Nat. Med. 2024, 30, 3601–3613. [Google Scholar] [CrossRef]
- Qin, Z.; et al. GraphMSR: A graph foundation model-based approach for MRI image super-resolution with multimodal semantic integration. Pattern Recognit. 2025, 112178. [Google Scholar] [CrossRef]
- Wei, X.; et al. A Brain Graph Foundation Model: Pre-Training and Prompt-Tuning across Broad Atlases and Disorders. In Proceedings of the ICLR, 2026. [Google Scholar]
- Zhang, X.; et al. CellAwareGNN: Single-Cell Enhanced Knowledge Graph Foundation Model for Drug Indication Prediction. bioRxiv 2026, 2026–02. [Google Scholar]
- Chen, Z.; et al. Text-space graph foundation models: Comprehensive benchmarks and new insights. NeurIPS 2024, 37, 7464–7492. [Google Scholar]
- Yang, J.; et al. Benchmarking Graph Foundation Models. Proc. KDD 2025, 5866–5875. [Google Scholar]
- Yu, X.; et al. Evaluating Progress in Graph Foundation Models: A Comprehensive Benchmark and New Insights. arXiv 2026, arXiv:2603.10033. [Google Scholar] [CrossRef]
- Brown, T.; et al. Language models are few-shot learners. NeurIPS 2020, 33, 1877–1901. [Google Scholar]
- Kaplan, J.; et al. Scaling laws for neural language models. arXiv 2020, arXiv:2001.08361. [Google Scholar] [CrossRef]
- Mao, H.; et al. Position: Graph foundation models are already here. In Proceedings of the ICML, 2024. [Google Scholar]
- Yao, Q.; et al. Towards neural scaling laws for time series foundation models. In Proceedings of the ICLR, 2025. [Google Scholar]
- Hayler, A.; et al. Bringing Graphs to the Table: Zero-shot Node Classification via Tabular Foundation Models. arXiv 2025, arXiv:2509.07143. [Google Scholar] [CrossRef]
- Hayler, A.; et al. Of graphs and tables: Zero-shot node classification with tabular foundation models. In Proceedings of the NPGML Workshop in NeurIPS, 2025. [Google Scholar]
- Latif-Martínez, H.; et al. Tsgfm - towards a graph foundation model for time series analysis in network monitoring. In Proceedings of the TMA. IEEE, 2025; pp. 1–4. [Google Scholar]


| Method | Pre-training Data | Pre-training Objective & Task | Tokenization | Model Architecture | Adaptation Strategy | Domain Transferability | Downstream Task | Venue |
|---|---|---|---|---|---|---|---|---|
| MITRA [34] | synthetic data (SCM, tree-based priors) | classification, regression | cell | Transformer | FT, ICL | 1:N | CLS, REG | NeurIPS 2025 |
| UniTabE [29] | real-world tabular datasets | masked cell prediction, row-wise contrastive learning | name-value tuple | Transformer | FT | N:N | CLS, REG | ICLR 2024 |
| CARTE [35] | large knowledge base | contrastive learning of graphlet and truncation pairs | row | Transformer | FT | N:N | CLS, REG | ICML 2024 |
| PORTAL [36] | real-world tabular datasets | masked cell modeling | row | Transformer | FT | N:N | CLS, REG | NeurIPS 2024 Workshop |
| TabForestPFN [37] | synthetic data (SCM, tree-based priors) | classification | cell | Transformer | FT, ICL | 1:N | CLS | arXiv 2024 |
| TabPFNv2 [4] | synthetic data (SCM) | masked cell prediction | cell | Transformer | ICL | 1:N | CLS, REG | Nature 2025 |
| TabICL [33] | synthetic data (SCM, tree-based priors) | classification | row | Transformer | ICL | 1:N | CLS | ICML 2025 |
| TabDPT [27] | real-world tabular datasets | masked column prediction | row | Transformer | ICL | N:N | CLS, REG | NeurIPS 2025 |
| TabSTAR [38] | real-world tabular datasets | classification, regression | name-value tuple | Transformer | FT | N:N | CLS, REG | NeurIPS 2025 |
| TABULA [39] | real-world tabular datasets | column-wise reconstruction | name-value tuple | Transformer | FT | N:N | IMP | NeurIPS 2025 |
| TARTE [40] | large knowledge base | contrastive learning of entities and facts | name-value tuple | Transformer | FT | N:N | CLS, REG | TMLR 2025 |
| LimiX [5] | synthetic data (SCM) | context-conditional masked modeling | cell | Transformer | ICL | 1:N | CLS, REG, IMP, GEN | arXiv 2025 |
| Real-TabPFN [41] | synthetic, real-world tabular datasets | classification | cell | Transformer | ICL | 1:N | CLS | arXiv 2025 |
| TabLLM [16] | text | table-to-text generation | name-value tuple | LLM | FT | 1:N | CLS | AISTATS 2023 |
| UniPredict [42] | real-world tabular datasets | table-to-text generation | name-value tuple | LLM | IT | 1:N | CLS, REG | arXiv 2023 |
| TP-BERTa [15] | real-world tabular datasets | classification, regression | name-value tuple | LLM | FT | N:N | CLS, REG | ICLR 2024 |
| TABULA-8B [43] | real-world tabular datasets | tabular prediction | row | LLM | ICL | 1:N | CLS, REG | NeurIPS 2024 |
| IngesTables [24] | real-world tabular datasets | attention-based tabular modeling | name-value tuple | Transformer + LLM | FT | N:N | CLS, REG | NeurIPS 2023 Workshop |

| Method | Pre-training Data | Pre-training Objective & Task | Tokenization | Model Architecture | Adaptation Strategy | Domain Transferability | Downstream Task | Venue |
|---|---|---|---|---|---|---|---|---|
| ForecastPFN [52] | synthetic data (periodicity) | point forecasting | point | Transformer | / | 1:N | FCT | NeurIPS 2023 |
| Lag-Llama [53] | real-world time series datasets | probabilistic forecasting | lag-feature vector | Transformer | / | N:N | FCT | NeurIPS 2023 Workshop |
| TimeGPT-1 [54] | real-world time series datasets | forecasting | sliding window | Transformer | FT | N:N | FCT | arXiv 2023 |
| UniTime [20] | real-world time series datasets | point forecasting, reconstruction | patch with fixed length | Transformer | ICL | N:N | FCT | WWW 2024 |
| TimesFM [6] | synthetic data and real-world time series datasets | point forecasting | patch with fixed length | Transformer | FT | N:N | FCT | ICML 2024 |
| MOMENT [17] | real-world time series datasets | masked reconstruction | patch with fixed length | Transformer | FT | N:N | FCT, CLS, IMP, AD | ICML 2024 |
| MOIRAI [55] | real-world time series datasets | probabilistic forecasting | patch with adaptive length | Transformer | ICL | N:N | FCT | ICML 2024 |
| Timer [28] | real-world time series datasets | next token prediction | patch with fixed length | Transformer | / | N:N | FCT, IMP, AD | ICML 2024 |
| UNITS [56] | real-world time series datasets | masked reconstruction | patch with fixed length | Transformer | PL | N:N | FCT, CLS, IMP, AD | NeurIPS 2024 |
| Time-MoE [57] | real-world time series datasets | multi-resolution forecasting | point | Transformer | FT, ICL | N:N | FCT | ICLR 2025 |
| WaveToken [58] | real-world time series datasets | next token prediction | wavelet | Transformer | ICL | N:N | FCT | ICML 2025 |
| ROSE [59] | real-world time series datasets | masked reconstruction | patch with fixed length | Transformer | FT | N:N | FCT | ICML 2025 |
| GPT4TS [60] | / | / | patch with fixed length | LLM | FT | 1:N | FCT, CLS, IMP, AD | NeurIPS 2023 |
| LLMTime [22] | / | / | point, strings of digits | LLM | / | 1:N | FCT | NeurIPS 2023 |
| PromptCast [31] | / | / | point, strings of digits | LLM | FT | 1:N | FCT | TKDE 2023 |
| GPT4MTS [61] | real-world large event datasets | multimodal forecasting | patch with reversible instance normalization | LLM | PL | 1:1 | FCT | AAAI 2024 |
| TIME-LLM [62] | / | / | patch with fixed length | LLM | PL | 1:N | FCT | ICLR 2024 |
| AutoTimes [63] | real-world time series datasets | next token prediction | patch with fixed length | LLM | PL, ICL | N:N | FCT | NeurIPS 2024 |
| Chronos [7] | real-world time series datasets and synthetic data | autoregressive density estimation | quantization | LLM | / | N:N | FCT | TMLR 2024 |
| CALF [64] | / | / | text and time series embedding | LLM | FT | N:N | FCT | AAAI 2025 |
| LLM4TS [65] | / | autoregressive time-series alignment | patch with fixed length | LLM | FT | 1:N | FCT | TIST 2025 |
| LLM-Mixer [66] | / | / | text and time series embedding | LLM | FT | 1:N | FCT | ACL 2025 Workshop |
| TEMPO [25] | / | point forecasting | patch with fixed length | Transformer + LLM | PL | N:N | FCT | ICLR 2024 |

| Method | Pretraining Data | Pre-training Objective & Task | Tokenization | Model Architecture | Adaptation Strategy | Domain Transferability | Downstream Task | Venue |
|---|---|---|---|---|---|---|---|---|
| GraphPrompt [32] | text-free | subgraph similarity | subgraph | GNN | PL | 1:1 | NC, GC | WWW 2023 |
| HGPrompt [86] | text-free | subgraph similarity | subgraph | GNN | PL | 1:1 | NC, GC | AAAI 2024 |
| GCOPE [8] | text-free | contrastive pretraining, feature reconstruction | node | GNN | FT, PL | N:N | NC | KDD 2024 |
| MultiGPrompt [87] | text-free | subgraph similarity | encoder layer | GNN | PL | 1:1 | NC, GC | WWW 2024 |
| OpenGraph [88] | text-free | masked autoencoding | node | GNN | / | N:N | NC, LP | EMNLP 2024 |
| GFT [89] | text-attributed | tree reconstruction | computation tree | GNN | FT | N:N | NC, GC, LP | NeurIPS 2024 |
| AnyGraph [30] | text-free | link prediction | node | GNN | FT | 1:N | NC, GC, LP | arXiv 2024 |
| MDGPT [90] | text-free | subgraph similarity | domain | GNN | PL | N:N | NC, GC | arXiv 2024 |
| OMOG [91] | text-attributed | contrastive pretraining | node | GNN | / | N:N | NC, LP | arXiv 2024 |
| GraphMoRE [92] | text-free | topology heterogeneity modeling | node | GNN | FT | 1:1 | NC, LP | AAAI 2025 |
| GraphAny [93] | text-free | node classification | node | GNN | / | 1:N | NC | ICLR 2025 |
| GraphPrompter [94] | text-free | neighbor matching, subgraph reconstruction | subgraph | GNN | ICL | N:N | NC, GC, LP | ICDE 2025 |
| BRIDGE [21] | text-free | subgraph similarity | aligner | GNN | PL | N:N | NC, GC | ICML 2025 |
| GIT [95] | text-attributed | tree reconstruction | task tree | GNN | FT, IT, ICL | N:N | NC, GC, LP | ICML 2025 |
| MDGFM [96] | text-free | subgraph similarity | domain | GNN | PL | N:N | NC | ICML 2025 |
| AutoGFM [97] | text-attributed | disentangled contrastive representation learning | subgraph | GNN | FT | N:N | NC, GC, LP | ICML 2025 |
| GCoT [98] | text-free | link prediction | node | GNN | PL | 1:1 | NC, GC | KDD 2025 |
| PatchNet [99] | text-free | attribute masking, context prediction | node patch | GNN | FT | N:N | NC, GC | KDD 2025 |
| SAMGPT [100] | text-free | subgraph similarity | structure token | GNN | PL | N:N | NC, GC | WWW 2025 |
| UniGraph2 [101] | multimodal | reconstruction | node | GNN | / | N:N | Multimodal Tasks | WWW 2025 |
| RiemannGFM [102] | text-attributed, text-free | geometric contrastive learning | subgraph | GNN | FT | N:N | NC, LP | WWW 2025 |
| GraphCLIP [103] | text-attributed | contrastive learning invariant alignment | subgraph | GNN | PL | N:N | NC, LP | WWW 2025 |
| UniPrompt [104] | text-free | / | prompt graph | GNN | PL | N:N | NC | NeurIPS 2025 |
| GraphKeeper [105] | text-free | continual graph pretraining | node | GNN | FT | N:1 | NC, GC | NeurIPS 2025 |
| H2GFM [106] | text-attributed | text-space node encoding, context-path modeling | node | GNN | / | N:N | NC, LP | arXiv 2025 |
| RWPT [107] | text-attributed | contrastive pretraining | node sequence | GNN | FT | N:N | NC, GC, LP | arXiv 2025 |
| MDGCL [108] | text-free | contrastive pretraining | subgraph | GNN | FT | N:1 | NC, GC | arXiv 2025 |
| GILT [109] | text-free | few-shot meta-pretraining | node, edge, graph | GNN | ICL | N:N | NC, GC, LP | arXiv 2025 |
| GMoPE [110] | text-free | contrastive pretraining | node | GNN | FT | N:N | NC, GC, LP | arXiv 2025 |
| GraphGlue [111] | text-free | geometric pretraining | manifold patch | GNN | FT, PL | N:1 | NC, GC, LP | ICLR 2026 |
| LLaGA [23] | text-attributed | alignment tuning | node sequence | LLM | IT | N:N | NC, LP | ICML 2024 |
| LangGFM [112] | text-attributed, text-free | instruction tuning | text | LLM | IT, ICL | N:N | NC, GC, LP | arXiv 2024 |
| PromptGFM [113] | text-attributed | multi-task instruction tuning | node | LLM | IT | N:N | NC, LP | arXiv 2025 |
| OFA [19] | text-attributed | graph classification | subgraph | GNN + LLM | PL, ICL | N:N | NC, GC, LP | ICLR 2024 |
| GraphGPT [114] | text-attributed | contrastive alignment, graph matching | subgraph | GNN + LLM | IT | N:N | NC, LP | SIGIR 2024 |
| ZeroG [115] | text-attributed | semantic similarity | prompting node, subgraph | GNN + LLM | / | N:N | NC | KDD 2024 |
| GOFA [116] | text-attributed | generative modeling | node, edge | GNN + LLM | IT | N:N | NC, GC, LP | ICLR 2025 |
| UniGraph [9] | text-attributed | text reconstruction | node, subgraph | GNN + LLM | IT, ICL | N:N | NC, GC, LP | KDD 2025 |
| BooG [117] | text-attributed | super-node/class-hypothesis matching | subgraph | GNN + LLM | FT | N:N | NC, GC, LP | FCS 2025 |
| GRAVER [18] | text-attributed | subgraph similarity | subgraph | GNN + LLM | PL | N:N | NC, GC | NeurIPS 2025 |
| SA2GFM [26] | text-attributed | subgraph similarity | node, structural entropy | GNN + LLM | PL | N:N | NC, GC | AAAI 2026 |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.