Submitted:
25 May 2025
Posted:
26 May 2025
Abstract
Keywords:
1. Introduction
2. Background and Related Work
3. Preliminaries and Problem Definition
4. SubGraph Retrieval Augmented Generation
4.1. Overview of the SG-RAG
4.2. Experimental Setup and Main Results
- Direct: The direct method generates an answer to q based solely on the internal knowledge of the LLM stored in its weights. This method is important because it tests how knowledgeable the underlying LLM is in the targeted domain.
- Retrieval Augmented Generation (RAG) [4]: This is the traditional RAG method, in which the external knowledge is a set of textual documents about the targeted domain, stored in a vector database. Knowledge retrieval is based on the semantic similarity between q and the documents. The top-k documents most similar to q are sent to the LLM as context to generate an answer.
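
The retrieval step of the RAG baseline can be illustrated with a minimal sketch. It is not the authors' implementation: the bag-of-words `embed` function stands in for a real sentence encoder, the in-memory list stands in for a vector database, and the example documents, the question, and the `build_prompt` layout are all hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words embedding (assumption: a real system would use a neural encoder)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_top_k(question, documents, k):
    """Return the k documents most similar to the question, best first."""
    q_vec = embed(question)
    scored = [(cosine(q_vec, embed(doc)), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:k]]


def build_prompt(question, context_docs):
    """Hypothetical prompt layout: retrieved context followed by the question."""
    context = "\n".join(f"- {doc}" for doc in context_docs)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


# Hypothetical documents and question, for illustration only.
documents = [
    "Inception is a film directed by Christopher Nolan",
    "The Matrix is a film directed by the Wachowskis",
    "Paris is the capital of France",
]
question = "who directed Inception"
top_docs = retrieve_top_k(question, documents, k=2)
prompt = build_prompt(question, top_docs)
```

The same ranking logic would apply unchanged if the retrieved items were something other than documents; only the text passed to `embed` differs.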
4.3. Potential Improvements
5. Merging and Ordering Triplets
5.1. Merging Subgraphs (MS)
Algorithm 1: Merging Subgraphs
5.2. Ordering Triplets (OT)
Algorithm 2: BFS Ordering Triplets
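
As a rough illustration of what a breadth-first ordering of triplets can look like, the sketch below orders (head, relation, tail) triplets by the order in which a BFS from a start entity first reaches them. It is a generic sketch and not a reproduction of Algorithm 2: the function name, the choice of start entity, the undirected traversal, and the handling of disconnected triplets are all assumptions.

```python
from collections import defaultdict, deque


def bfs_order_triplets(triplets, start):
    """Order triplets by BFS discovery from `start` (assumed to be a question entity)."""
    # Map each entity to the indices of the triplets that mention it.
    # Assumption: edges are traversed in both directions.
    adjacency = defaultdict(list)
    for idx, (head, _, tail) in enumerate(triplets):
        adjacency[head].append(idx)
        adjacency[tail].append(idx)

    ordered = []
    emitted = set()      # indices of triplets already placed in the output
    visited = {start}    # entities already queued
    queue = deque([start])
    while queue:
        entity = queue.popleft()
        for idx in adjacency[entity]:
            if idx in emitted:
                continue
            emitted.add(idx)
            ordered.append(triplets[idx])
            head, _, tail = triplets[idx]
            neighbor = tail if head == entity else head
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(neighbor)

    # Assumption: triplets unreachable from `start` keep their input order at the end.
    ordered.extend(t for i, t in enumerate(triplets) if i not in emitted)
    return ordered


# Hypothetical subgraph, for illustration only.
triplets = [
    ("Nolan", "born_in", "London"),
    ("Inception", "directed_by", "Nolan"),
    ("Inception", "released_in", "2010"),
]
ordered = bfs_order_triplets(triplets, start="Inception")
```

With this ordering, triplets one hop from the start entity appear before triplets two hops away, so the context reads outward from the question entity.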
6. Experimental Settings
6.1. Baselines
- Chain-of-Thought (CoT) [28]: In the CoT method, the LLM answers the question q in a step-by-step approach based on its internal knowledge until it reaches the final answer to q. To give the LLM this ability, we applied the few-shot setup by providing 7 examples as context in the prompt. The prompt template for CoT is in Figure A4.
- Triplet-based RAG: This method adds the top-k most relevant triplets, retrieved from the knowledge graph (KG), to the LLM prompt to generate the answer to q. Retrieval is based on the semantic similarity between q and the triplets in the KG. In our experiments, we applied this method three times, with k set to 5, 10, and 20. The prompt used with this method is the same as for SG-RAG, shown in Figure A2.
6.2. Dataset and Evaluation Metric
6.3. Experimental Setup
7. Results and Discussion
7.1. Overall Performance
7.2. Ablation Study
7.3. Case Studies
8. Conclusions and Future Work
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
| LLM | Large Language Model |
| KG | Knowledge Graph |
| RAG | Retrieval Augmented Generation |
| SG-RAG | SubGraph Retrieval Augmented Generation |
| MOT | Merging and Ordering Triplets |
| MS | Merging Subgraphs |
| OT | Ordering Triplets |
| BFS | Breadth First Search |
| DFS | Depth First Search |
| CoT | Chain-of-Thought |
| ToG | Think-on-Graph |
Appendix A. Cypher Query Templates

Appendix B. Prompt Templates



References
- Touvron, H.; Martin, L.; Stone, K.; Albert, P.; Almahairi, A.; Babaei, Y.; Bashlykov, N.; Batra, S.; Bhargava, P.; Bhosale, S.; et al. Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288 2023.
- Tonmoy, S.; Zaman, S.; Jain, V.; Rani, A.; Rawte, V.; Chadha, A.; Das, A. A comprehensive survey of hallucination mitigation techniques in large language models. arXiv preprint arXiv:2401.01313 2024.
- Li, J.; Chen, J.; Ren, R.; Cheng, X.; Zhao, W.X.; Nie, J.Y.; Wen, J.R. The dawn after the dark: An empirical study on factuality hallucination in large language models. arXiv preprint arXiv:2401.03205 2024.
- Lewis, P.; Perez, E.; Piktus, A.; Petroni, F.; Karpukhin, V.; Goyal, N.; Küttler, H.; Lewis, M.; Yih, W.t.; Rocktäschel, T.; et al. Retrieval-augmented generation for knowledge-intensive nlp tasks. Advances in Neural Information Processing Systems 2020, 33, 9459–9474.
- Setty, S.; Jijo, K.; Chung, E.; Vidra, N. Improving Retrieval for RAG based Question Answering Models on Financial Documents. arXiv preprint arXiv:2404.07221 2024.
- Zakka, C.; Shad, R.; Chaurasia, A.; Dalal, A.R.; Kim, J.L.; Moor, M.; Fong, R.; Phillips, C.; Alexander, K.; Ashley, E.; et al. Almanac—retrieval-augmented language models for clinical medicine. NEJM AI 2024, 1, AIoa2300068.
- Alan, A.Y.; Karaarslan, E.; Aydin, O. A RAG-based Question Answering System Proposal for Understanding Islam: MufassirQAS LLM. arXiv preprint arXiv:2401.15378 2024.
- Liu, N.F.; Lin, K.; Hewitt, J.; Paranjape, A.; Bevilacqua, M.; Petroni, F.; Liang, P. Lost in the middle: How language models use long contexts. arXiv preprint arXiv:2307.03172 2023.
- Zhang, Q.; Chen, S.; Bei, Y.; Yuan, Z.; Zhou, H.; Hong, Z.; Dong, J.; Chen, H.; Chang, Y.; Huang, X. A Survey of Graph Retrieval-Augmented Generation for Customized Large Language Models. arXiv preprint arXiv:2501.13958 2025.
- Larson, J.; Truitt, S. GraphRAG: Unlocking LLM discovery on narrative private data, 2024. Accessed 25/06/2024.
- Saleh, A.O.; Tür, G.; Saygin, Y. SG-RAG: Multi-Hop Question Answering With Large Language Models Through Knowledge Graphs. In Proceedings of the 7th International Conference on Natural Language and Speech Processing (ICNLSP 2024), 2024, pp. 439–448.
- Reid, M.; Savinov, N.; Teplyashin, D.; Lepikhin, D.; Lillicrap, T.; Alayrac, J.b.; Soricut, R.; Lazaridou, A.; Firat, O.; Schrittwieser, J.; et al. Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context. arXiv preprint arXiv:2403.05530 2024.
- Gao, Y.; Xiong, Y.; Gao, X.; Jia, K.; Pan, J.; Bi, Y.; Dai, Y.; Sun, J.; Wang, H. Retrieval-augmented generation for large language models: A survey. arXiv preprint arXiv:2312.10997 2023.
- Jin, B.; Liu, G.; Han, C.; Jiang, M.; Ji, H.; Han, J. Large language models on graphs: A comprehensive survey. arXiv preprint arXiv:2312.02783 2023.
- Chen, Z.; Mao, H.; Li, H.; Jin, W.; Wen, H.; Wei, X.; Wang, S.; Yin, D.; Fan, W.; Liu, H.; et al. Exploring the potential of large language models (llms) in learning on graphs. ACM SIGKDD Explorations Newsletter 2024, 25, 42–61.
- Edge, D.; Trinh, H.; Cheng, N.; Bradley, J.; Chao, A.; Mody, A.; Truitt, S.; Larson, J. From local to global: A graph rag approach to query-focused summarization. arXiv preprint arXiv:2404.16130 2024.
- Jin, B.; Xie, C.; Zhang, J.; Roy, K.K.; Zhang, Y.; Li, Z.; Li, R.; Tang, X.; Wang, S.; Meng, Y.; et al. Graph chain-of-thought: Augmenting large language models by reasoning on graphs. arXiv preprint arXiv:2404.07103 2024.
- Sun, J.; Xu, C.; Tang, L.; Wang, S.; Lin, C.; Gong, Y.; Ni, L.M.; Shum, H.Y.; Guo, J. Think-on-graph: Deep and responsible reasoning of large language model on knowledge graph. arXiv preprint arXiv:2307.07697 2023.
- Shafie, T. A multigraph approach to social network analysis 2015.
- Francis, N.; Green, A.; Guagliardo, P.; Libkin, L.; Lindaaker, T.; Marsault, V.; Plantikow, S.; Rydberg, M.; Selmer, P.; Taylor, A. Cypher: An evolving query language for property graphs. In Proceedings of the 2018 international conference on management of data, 2018, pp. 1433–1445.
- Zhang, Y.; Dai, H.; Kozareva, Z.; Smola, A.J.; Song, L. Variational Reasoning for Question Answering with Knowledge Graph. In Proceedings of the AAAI, 2018.
- Wen, T.H.; Vandyke, D.; Mrkšić, N.; Gašić, M.; Rojas-Barahona, L.M.; Su, P.H.; Ultes, S.; Young, S. A Network-based End-to-End Trainable Task-oriented Dialogue System. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers; Lapata, M.; Blunsom, P.; Koller, A., Eds., Valencia, Spain, 2017; pp. 438–449.
- AI@Meta. Llama 3 Model Card 2024.
- Achiam, J.; Adler, S.; Agarwal, S.; Ahmad, L.; Akkaya, I.; Aleman, F.L.; Almeida, D.; Altenschmidt, J.; Altman, S.; Anadkat, S.; et al. Gpt-4 technical report. arXiv preprint arXiv:2303.08774 2023.
- Baker, G.A.; Raut, A.; Shaier, S.; Hunter, L.E.; von der Wense, K. Lost in the Middle, and In-Between: Enhancing Language Models’ Ability to Reason Over Long Contexts in Multi-Hop QA. arXiv preprint arXiv:2412.10079 2024.
- Peysakhovich, A.; Lerer, A. Attention sorting combats recency bias in long context language models. arXiv preprint arXiv:2310.01427 2023.
- Tang, R.; Zhang, X.; Ma, X.; Lin, J.; Ture, F. Found in the middle: Permutation self-consistency improves listwise ranking in large language models. arXiv preprint arXiv:2310.07712 2023.
- Wei, J.; Wang, X.; Schuurmans, D.; Bosma, M.; Xia, F.; Chi, E.; Le, Q.V.; Zhou, D.; et al. Chain-of-thought prompting elicits reasoning in large language models. Advances in neural information processing systems 2022, 35, 24824–24837.
- Team, Q. Qwen2.5: A Party of Foundation Models, 2024.
| 2 | The p-value is calculated using a one-tailed paired t-test. |







| Method | 1-hop | 2-hop | 3-hop |
|---|---|---|---|
| Direct | 0.24 | 0.13 | 0.17 |
| RAG-Wiki Top-1 | 0.33 | 0.19 | 0.21 |
| RAG-Wiki Top-2 | 0.36 | 0.20 | 0.20 |
| RAG-Wiki Top-3 | 0.38 | 0.22 | 0.20 |
| RAG-Wiki Top-5 | 0.40 | 0.23 | 0.18 |
| RAG-Wiki Top-10 | 0.42 | 0.27 | 0.19 |
| SG-RAG | 0.90 | 0.73 | 0.58 |

| Method | 1-hop | 2-hop | 3-hop |
|---|---|---|---|
| RAG-Wiki Top-1 | 0.33 | 0.19 | 0.21 |
| RAG-Wiki Top-2 | 0.35 | 0.20 | 0.20 |
| RAG-Wiki Top-3 | 0.36 | 0.22 | 0.20 |
| RAG-Gen Top-1 | 0.64 | 0.15 | 0.17 |
| RAG-Gen Top-2 | 0.66 | 0.12 | 0.13 |
| RAG-Gen Top-3 | 0.66 | 0.12 | 0.16 |
| SG-RAG | 0.91 | 0.72 | 0.60 |

| Method | 1-hop | 2-hop | 3-hop |
|---|---|---|---|
| RAG-Gen Top-1 | 0.765 | 0.286 | 0.204 |
| RAG-Gen Top-2 | 0.776 | 0.181 | 0.177 |
| RAG-Gen Top-3 | 0.784 | 0.179 | 0.180 |
| SG-RAG | 0.941 | 0.815 | 0.520 |

| Method | LLM | 1-hop | 2-hop | 3-hop |
|---|---|---|---|---|
| Direct | Llama-3.1 8B Instruct | 36.43 | 22.77 | 18.01 |
| | Llama-3.2 3B Instruct | 21.30 | 12.13 | 8.72 |
| | Qwen-2.5 7B Instruct | 16.46 | 18.61 | 15.14 |
| | Qwen-2.5 3B Instruct | 11.85 | 12.96 | 8.05 |
| CoT | Llama-3.1 8B Instruct | 39.27 | 21.81 | 14.25 |
| | Llama-3.2 3B Instruct | 23.01 | 13.95 | 13.38 |
| | Qwen-2.5 7B Instruct | 18.45 | 18.99 | 15.62 |
| | Qwen-2.5 3B Instruct | 13.08 | 14.36 | 8.79 |
| Triplet RAG Top 5 | Llama-3.1 8B Instruct | 54.94 | 4.58 | 9.85 |
| | Llama-3.2 3B Instruct | 52.24 | 5.27 | 12.83 |
| | Qwen-2.5 7B Instruct | 56.73 | 5.91 | 11.68 |
| | Qwen-2.5 3B Instruct | 53.60 | 3.71 | 12.13 |
| Triplet RAG Top 10 | Llama-3.1 8B Instruct | 60.28 | 6.06 | 12.66 |
| | Llama-3.2 3B Instruct | 58.25 | 6.42 | 15.23 |
| | Qwen-2.5 7B Instruct | 61.82 | 6.53 | 13.20 |
| | Qwen-2.5 3B Instruct | 57.31 | 4.27 | 13.87 |
| Triplet RAG Top 20 | Llama-3.1 8B Instruct | 63.87 | 7.18 | 14.63 |
| | Llama-3.2 3B Instruct | 61.46 | 7.12 | 16.79 |
| | Qwen-2.5 7B Instruct | 64.68 | 6.75 | 14.14 |
| | Qwen-2.5 3B Instruct | 58.55 | 5.58 | 14.62 |
| Graph-CoT | Llama-3.1 8B Instruct | 47.98 | 15.38 | 4.33 |
| | Llama-3.2 3B Instruct | 25.11 | 9.92 | 6.41 |
| | Qwen-2.5 7B Instruct | 81.40 | 57.42 | 25.35 |
| | Qwen-2.5 3B Instruct | 51.83 | 13.65 | 6.48 |
| SG-RAG MOT | Llama-3.1 8B Instruct | 85.26 | 77.27 | 65.63 |

| LLM | 1-hop | 2-hop | 3-hop |
|---|---|---|---|
| Llama-3.1 8B Instruct | 85.26 | 77.27 | 65.63 |
| Llama-3.2 3B Instruct | 72.62 | 77.43 | 65.75 |
| Qwen-2.5 7B Instruct | 88.80 | 86.52 | 68.50 |
| Qwen-2.5 3B Instruct | 81.40 | 75.25 | 57.75 |

| LLM | Ordering Strategy | 2-hop | 3-hop |
|---|---|---|---|
| Llama-3.1 8B Instruct | Random | 67.69 | 45.81 |
| | DFS | 73.24 | 50.98 |
| | Reverse DFS | 68.82 | 46.68 |
| | BFS | 72.64 | 51.59 |
| | Reverse BFS | 69.39 | 50.37 |
| Llama-3.2 3B Instruct | Random | 69.75 | 42.93 |
| | DFS | 73.98 | 46.52 |
| | Reverse DFS | 69.44 | 43.60 |
| | BFS | 74.78 | 46.29 |
| | Reverse BFS | 69.65 | 43.34 |
| Qwen-2.5 7B Instruct | Random | 77.07 | 43.81 |
| | DFS | 82.58 | 50.45 |
| | Reverse DFS | 80.98 | 45.98 |
| | BFS | 81.88 | 48.23 |
| | Reverse BFS | 79.76 | 49.36 |
| Qwen-2.5 3B Instruct | Random | 65.12 | 36.98 |
| | DFS | 69.47 | 41.25 |
| | Reverse DFS | 67.77 | 38.42 |
| | BFS | 70.49 | 42.42 |
| | Reverse BFS | 67.28 | 38.26 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

