Submitted:
05 April 2025
Posted:
08 April 2025
Abstract
Keywords:
1. Introduction
2. Background and Foundations
2.1. Traditional Information Retrieval Methods
2.2. The Evolution of Neural Language Models
2.3. Knowledge Augmentation Strategies
- Explicit Knowledge Injection: Approaches that integrate structured or unstructured external knowledge sources directly into the model. This includes knowledge graph embeddings, entity linking, and rule-based enhancements.
- Retrieval-Augmented Approaches: Techniques that dynamically retrieve relevant documents or passages at inference time, grounding the model’s outputs in real-world data.
- Memory-Augmented Neural Networks: Models that incorporate external memory modules to store and retrieve information dynamically.
- Hybrid Techniques: Methods that combine retrieval with fine-tuning strategies, optimizing models to effectively incorporate retrieved evidence.
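Memory-augmented models commonly use a soft read operation. The query is compared with every stored key, the similarities are normalized with a softmax, and the output is the weighted sum of the stored values. The sketch below assumes dot-product addressing and a tiny in-memory store. `memory_read` and the example vectors are illustrative only and do not correspond to any specific published architecture.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def memory_read(query: Vector, keys: Sequence[Vector], values: Sequence[Vector]) -> List[float]:
    """Soft read from a key-value memory (illustrative, dot-product addressing).

    Each slot i holds a key k_i and a value v_i; the read vector is
    sum_i softmax(q . k_i) * v_i, so slots whose keys match the query dominate.
    """
    weights = softmax([dot(query, k) for k in keys])
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


if __name__ == "__main__":
    keys = [[1.0, 0.0], [0.0, 1.0]]
    values = [[10.0, 0.0], [0.0, 10.0]]
    # The query aligns with the first key, so the read is close to the first value.
    print(memory_read([5.0, 0.0], keys, values))
```

Because the read is differentiable in the query and keys, such a memory can be trained end to end, unlike a hard lookup.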
2.4. Challenges in Integrating Retrieval with Generation
- Efficient and Scalable Retrieval: Retrieving accurate results from large-scale knowledge bases while keeping latency low [27].
- Relevance and Ranking: Optimizing retrieval models to return the most relevant information, minimizing noisy or irrelevant results.
- Fusion Strategies: Determining how retrieved knowledge should be incorporated into the generation process, balancing factual accuracy with natural language fluency [28].
- Security and Robustness: Preventing adversarial attacks and misinformation propagation by ensuring retrieved knowledge is trustworthy.
2.5. Summary
3. Architectures and Methodologies of Retrieval-Augmented Generation
3.1. General Framework of Retrieval-Augmented Generation
3.2. Retrieval Mechanisms in RAG
3.2.1. Sparse Retrieval
3.2.2. Dense Retrieval
3.3. Fusion Strategies for Retrieval and Generation
3.3.1. Concatenation-Based Fusion
3.3.2. Attention-Based Fusion
3.3.3. Reinforcement Learning-Based Fusion
3.4. Trade-Offs and Challenges
- Latency: Retrieving documents at query time adds delay to every request, necessitating efficient indexing and retrieval mechanisms [47].
- Noisy Retrieval: Retrieved documents may contain irrelevant or conflicting information, requiring robust filtering strategies [48].
- Knowledge Conflicts: When retrieved knowledge contradicts the LLM’s internal knowledge, resolving inconsistencies remains a complex problem.
- Scalability: Maintaining large-scale knowledge bases for retrieval without excessive computational overhead is an ongoing research challenge [49].
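The noisy-retrieval trade-off above is often handled with a lightweight post-retrieval filter before any passage reaches the generator. The sketch below is one minimal version and assumes that the retriever attaches a relevance score to each passage. The `Passage` type, the `min_score` threshold, and duplicate detection on normalized text are hypothetical choices. A production pipeline would more likely rescore candidates with a cross-encoder reranker.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Passage:
    doc_id: str
    text: str
    score: float  # relevance score from the retriever (assumed to be available)


def filter_passages(passages: List[Passage], min_score: float, max_passages: int) -> List[Passage]:
    """Keep at most max_passages passages, best first, dropping low scores and duplicates."""
    seen_texts = set()
    kept: List[Passage] = []
    for p in sorted(passages, key=lambda p: p.score, reverse=True):
        if len(kept) >= max_passages:
            break
        # Normalize case and whitespace so trivially different copies count as duplicates.
        normalized = " ".join(p.text.lower().split())
        if p.score < min_score or normalized in seen_texts:
            continue
        seen_texts.add(normalized)
        kept.append(p)
    return kept


if __name__ == "__main__":
    candidates = [
        Passage("a", "Paris is the capital of France.", 0.9),
        Passage("b", "paris is the  capital of france.", 0.85),  # near-verbatim duplicate
        Passage("c", "France borders Spain.", 0.6),
        Passage("d", "Unrelated text.", 0.2),  # below threshold
    ]
    print([p.doc_id for p in filter_passages(candidates, min_score=0.5, max_passages=3)])
```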
3.5. Summary
4. Applications of Retrieval-Augmented Generation
4.1. Question Answering Systems
4.1.1. Open-Domain Question Answering
4.1.2. Domain-Specific Question Answering
4.2. Conversational Agents and Chatbots
4.2.1. Personalized Assistants
4.2.2. Customer Support Systems
4.3. Document Summarization and Knowledge Synthesis
- Multi-Document Summarization: Retrieving multiple related documents and generating a coherent summary [62].
- Scientific Literature Reviews: Extracting key findings from multiple research papers.
- Legal Document Analysis: Summarizing case laws, contracts, and regulations based on retrieved precedents.
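In the multi-document case, the simplest fusion strategy is to concatenate the retrieved documents into the generator's input and ask for a summary that cites them. The sketch below assumes that documents arrive as (title, text) pairs already ranked by relevance. `build_summary_prompt`, the instruction wording, and the character budget are illustrative assumptions rather than a prescribed template. The character budget is only a crude stand-in for a token limit.

```python
from typing import List, Tuple


def build_summary_prompt(task: str, documents: List[Tuple[str, str]], max_chars: int = 4000) -> str:
    """Concatenate ranked documents into one prompt, tagging each with a numbered
    source ID so the generated summary can cite [1], [2], ... back to its evidence."""
    parts: List[str] = []
    used = 0
    for i, (title, text) in enumerate(documents, start=1):
        block = f"[{i}] {title}\n{text.strip()}\n"
        if used + len(block) > max_chars:
            break  # stop before exceeding the (hypothetical) context budget
        parts.append(block)
        used += len(block)
    context = "\n".join(parts)
    return (
        f"{task}\n"
        "Use only the sources below and cite them as [n].\n\n"
        f"{context}"
    )


if __name__ == "__main__":
    docs = [("Paper A", "Result A."), ("Paper B", "Result B.")]
    print(build_summary_prompt("Summarize the findings.", docs))
```

Numbering each source lets a reader check every cited statement against the passage it came from. This matters for the scientific and legal use cases listed above.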
4.4. Code Generation and Software Development
4.4.1. Code Completion and Debugging
- Autocompleting partially written code.
- Suggesting optimized implementations.
- Identifying potential bugs by referencing previous issues [65].
4.4.2. API and Library Recommendations
4.5. Biomedical Applications
4.5.1. Clinical Decision Support
4.5.2. Drug Discovery and Literature Mining
4.6. Legal and Compliance Analysis
4.6.1. Legal Case Analysis
4.6.2. Regulatory Compliance
4.7. Fake News Detection and Misinformation Mitigation
4.7.1. Fact-Checking Systems
4.8. Summary
5. Challenges and Future Research Directions
5.1. Challenges in Retrieval-Augmented Generation
5.1.1. Retrieval Quality and Relevance
- Irrelevant Retrieval: Retrieved documents may not contain the information necessary to answer the query accurately, leading to incorrect or unhelpful responses [73].
- Misinformation and Conflicting Sources: If the retrieved documents contain conflicting or unreliable information, the generative model may produce misleading outputs.
- Noisy Retrieval: Some retrieval methods, particularly those based on dense embeddings, may return semantically related but contextually irrelevant documents.
Promising directions for addressing these issues include:
- Enhancing hybrid retrieval techniques that combine sparse (e.g., BM25) and dense (e.g., DPR) retrieval for improved recall and precision.
- Incorporating retrieval reranking models, such as cross-encoders, that refine initial retrieval results before passing them to the generative model [74].
- Developing retrieval-augmented contrastive learning techniques to better differentiate between highly relevant and marginally related documents.
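The hybrid strategy in the first item above can be sketched as follows. BM25 produces the sparse ranking, a dense retriever such as DPR produces a second ranking, and the two rankings are merged. In this sketch the dense ranking is simply passed in as a list, because computing it requires an encoder model. Whitespace tokenization is also an assumption. The merge uses reciprocal rank fusion with the commonly used constant k = 60. This is one simple fusion rule among several, not the only way to combine the two signals.

```python
import math
from collections import Counter
from typing import Dict, List


def bm25_scores(query: str, docs: Dict[str, str], k1: float = 1.5, b: float = 0.75) -> Dict[str, float]:
    """Okapi BM25 over a small in-memory corpus (assumes a non-empty corpus)."""
    tokenized = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    avgdl = sum(len(toks) for toks in tokenized.values()) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized.values() for term in set(toks))
    scores: Dict[str, float] = {}
    for doc_id, toks in tokenized.items():
        tf = Counter(toks)
        s = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Non-negative IDF variant: ln((N - n + 0.5) / (n + 0.5) + 1).
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            f = tf[term]
            # Term-frequency saturation (k1) and document-length normalization (b).
            s += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * len(toks) / avgdl))
        scores[doc_id] = s
    return scores


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Merge rankings: each document earns 1 / (k + rank) from every list it appears in."""
    fused: Counter = Counter()
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    return [doc_id for doc_id, _ in fused.most_common()]


if __name__ == "__main__":
    docs = {
        "d1": "bm25 is a sparse ranking function",
        "d2": "dense retrieval uses neural embeddings",
        "d3": "hybrid retrieval combines sparse and dense signals",
    }
    sparse = bm25_scores("sparse dense retrieval", docs)
    sparse_ranking = sorted(sparse, key=sparse.get, reverse=True)
    dense_ranking = ["d2", "d3", "d1"]  # hypothetical output of a dense retriever
    print(reciprocal_rank_fusion([sparse_ranking, dense_ranking]))
```

A cross-encoder reranker, as in the second item, would then typically rescore only the top few fused candidates, because scoring every query-document pair jointly is expensive.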
5.1.2. Fusion of Retrieved Knowledge and Text Generation
- Over-Reliance on Parametric Knowledge: Generative models sometimes ignore retrieved information and rely on their internal knowledge, leading to outdated or hallucinated responses.
- Fragmented Knowledge Integration: When multiple documents are retrieved, models may struggle to synthesize information coherently, leading to contradictory or disjointed responses [75].
- Loss of Granular Context: Retrieved passages may contain useful details that are overlooked when aggregated or truncated for input into the generative model.
Potential remedies include:
- Developing adaptive attention mechanisms that dynamically weigh retrieved evidence based on relevance scores [76].
- Implementing memory-augmented architectures that store and refine retrieved information over multiple turns in dialogue-based applications.
- Exploring multi-step reasoning frameworks that enable models to iteratively refine their responses based on retrieved content [77].
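The RAG model of Lewis et al. weighs retrieved evidence by relevance through marginalization. Its RAG-Sequence variant approximates p(y | x) ≈ Σ_z p(z | x) · p(y | x, z) over the top-k passages z. Here p(z | x) is a softmax over retrieval scores, and p(y | x, z) is the generator's probability of answer y given passage z. The sketch below assumes both quantities are already computed, and the input numbers are made up for illustration. It shows fixed relevance weighting, not a learned adaptive attention mechanism.

```python
import math
from typing import List, Sequence


def retrieval_weights(scores: Sequence[float]) -> List[float]:
    """p(z | x): softmax over the retriever's scores for the top-k passages."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def marginal_answer_prob(retrieval_scores: Sequence[float], answer_probs: Sequence[float]) -> float:
    """p(y | x) ~= sum_z p(z | x) * p(y | x, z), one answer probability per passage."""
    if len(retrieval_scores) != len(answer_probs):
        raise ValueError("need one answer probability per retrieved passage")
    weights = retrieval_weights(retrieval_scores)
    return sum(w * p for w, p in zip(weights, answer_probs))


if __name__ == "__main__":
    # Hypothetical scores for two passages, and the generator's probability
    # of the same candidate answer given each passage.
    print(marginal_answer_prob([2.0, 0.0], [0.9, 0.1]))  # ≈ 0.805
```

The passage with the higher retrieval score dominates the sum. As a result, an answer supported only by a weakly scored passage receives little probability mass.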
5.1.3. Efficiency and Latency
- Utilizing approximate nearest neighbor (ANN) search techniques to accelerate retrieval while maintaining high recall.
- Exploring caching mechanisms that store frequently accessed documents for rapid retrieval.
- Investigating knowledge distillation techniques to precompute and embed commonly retrieved information, reducing the need for real-time document lookup.
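The caching idea above can be sketched as a least-recently-used (LRU) cache placed in front of the retriever, so that repeated queries skip the index lookup entirely. This minimal sketch assumes exact-match caching on lowercased, whitespace-normalized query strings. `RetrievalCache` and the wrapped `retrieve` callable are hypothetical. A real deployment would also need to invalidate entries when the underlying index is updated.

```python
from collections import OrderedDict
from typing import Callable, List


class RetrievalCache:
    """LRU cache mapping normalized queries to retrieved document IDs."""

    def __init__(self, retrieve: Callable[[str], List[str]], capacity: int = 1024):
        self._retrieve = retrieve  # hypothetical underlying retriever
        self._capacity = capacity
        self._store: "OrderedDict[str, List[str]]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, query: str) -> List[str]:
        key = " ".join(query.lower().split())
        if key in self._store:
            self._store.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return list(self._store[key])  # copy so callers cannot mutate the cache
        self.misses += 1
        result = self._retrieve(query)
        self._store[key] = list(result)
        if len(self._store) > self._capacity:
            self._store.popitem(last=False)  # evict the least recently used entry
        return list(result)


if __name__ == "__main__":
    cache = RetrievalCache(lambda q: ["doc-for:" + q], capacity=2)
    cache.get("What is RAG?")
    cache.get("what is   rag?")  # same normalized key, served from the cache
    print(cache.hits, cache.misses)
```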
5.1.4. Scalability Issues in Large Knowledge Bases
- Indexing Complexity: Maintaining efficient indices for retrieval across vast document collections is computationally expensive.
- Storage Constraints: Storing large-scale dense embeddings requires significant memory resources.
- Incremental Updates: Updating knowledge bases dynamically without requiring complete re-indexing remains an open problem [81].
Research directions to address these constraints include:
- Developing hierarchical indexing methods that allow efficient retrieval at multiple levels of granularity [82].
- Leveraging vector compression techniques to reduce the memory footprint of large-scale embeddings [83].
- Designing streaming knowledge retrieval frameworks that support continuous updates without full re-indexing.
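The storage constraint is easy to quantify. Assume 10 million passages embedded as 768-dimensional float32 vectors; 768 is the hidden size of BERT-base, on which DPR builds. The raw index then needs 10,000,000 × 768 × 4 bytes ≈ 30.7 GB. Storing each component as a single signed byte cuts this to about 7.7 GB, plus one scale factor per vector. The sketch below shows this symmetric per-vector int8 scalar quantization, one of the simplest vector compression schemes. Product quantization, as implemented in libraries such as FAISS, compresses further. The function names are illustrative.

```python
from array import array
from typing import List, Tuple


def quantize_int8(vec: List[float]) -> Tuple[array, float]:
    """Symmetric per-vector scalar quantization: map [-max|x|, max|x|] onto [-127, 127]."""
    max_abs = max(abs(x) for x in vec)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # array("b") stores signed 8-bit integers, i.e. one byte per component.
    codes = array("b", (max(-127, min(127, round(x / scale))) for x in vec))
    return codes, scale


def dequantize(codes: array, scale: float) -> List[float]:
    """Approximate reconstruction; the error per component is at most scale / 2."""
    return [c * scale for c in codes]


if __name__ == "__main__":
    vec = [0.5, -1.0, 0.25, 0.0]
    codes, scale = quantize_int8(vec)
    print(list(codes), dequantize(codes, scale))
    # One byte per component versus four bytes for float32.
    print(codes.itemsize, array("f", vec).itemsize)
```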
5.1.5. Security, Bias, and Ethical Concerns
- Exposure of Sensitive Information: If the knowledge base contains private or proprietary data, retrieved documents may inadvertently expose sensitive content.
- Bias in Retrieved Content: The retrieval model may reinforce societal biases if its training data or the indexed corpus contains skewed or imbalanced representations.
- Adversarial Attacks on Retrieval: Malicious actors could manipulate retrieval results by poisoning indexed documents or injecting misleading content.
Possible countermeasures include:
- Implementing differential privacy techniques to prevent leakage of sensitive information [84].
- Introducing bias mitigation frameworks that assess and balance retrieval outputs before text generation [85].
- Developing robust adversarial detection mechanisms to identify and filter manipulated knowledge sources.
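One simple form of the bias-mitigation step above is a re-ranker that limits how many passages any single source contributes to the final context. The sketch assumes that each ranked result carries a source label, for example a domain or publisher. `cap_per_source` and the cap value are hypothetical. Source diversity is only a proxy: it does not by itself detect biased content.

```python
from collections import defaultdict
from typing import DefaultDict, List, Tuple


def cap_per_source(ranked: List[Tuple[str, str]], k: int, max_per_source: int) -> List[Tuple[str, str]]:
    """Greedy top-k selection keeping at most max_per_source results per source.

    ranked: (doc_id, source) pairs, best first.
    """
    counts: DefaultDict[str, int] = defaultdict(int)
    selected: List[Tuple[str, str]] = []
    for doc_id, source in ranked:
        if len(selected) >= k:
            break
        if counts[source] >= max_per_source:
            continue  # this source already fills its quota
        counts[source] += 1
        selected.append((doc_id, source))
    return selected


if __name__ == "__main__":
    ranked = [("d1", "siteA"), ("d2", "siteA"), ("d3", "siteA"), ("d4", "siteB"), ("d5", "siteC")]
    print(cap_per_source(ranked, k=3, max_per_source=1))
```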
5.2. Future Research Directions
5.2.1. Neural-Symbolic Hybrid Models
- Leverage structured knowledge bases (e.g., Wikidata, knowledge graphs) alongside traditional document retrieval.
- Employ logic-based reasoning modules to validate generated responses [86].
- Enhance interpretability by explicitly linking outputs to retrieved knowledge sources.
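A minimal form of the logic-based validation described above checks each generated factual claim against a structured knowledge base before returning it. The sketch assumes that claims have already been extracted as (subject, relation, object) triples; that extraction step is not shown. It also assumes each relation is single-valued, so a different object for the same subject and relation counts as a contradiction. The in-memory set stands in for a real store such as Wikidata, and `check_claim` is a hypothetical helper.

```python
from typing import Optional, Set, Tuple

Triple = Tuple[str, str, str]


def check_claim(claim: Triple, kb: Set[Triple]) -> Optional[bool]:
    """True if the KB contains the claim, False if it holds a different object for the
    same (subject, relation), None if the KB says nothing about it."""
    subject, relation, _ = claim
    if claim in kb:
        return True
    if any(s == subject and r == relation for s, r, _ in kb):
        return False  # contradiction, under the single-valued-relation assumption
    return None


if __name__ == "__main__":
    kb = {("Paris", "capital_of", "France"), ("Berlin", "capital_of", "Germany")}
    for claim in [("Paris", "capital_of", "France"),
                  ("Paris", "capital_of", "Germany"),
                  ("Rome", "capital_of", "Italy")]:
        print(claim, check_claim(claim, kb))
```

The three-way result keeps "unsupported" separate from "contradicted." A system can therefore flag unverifiable claims differently from claims that conflict with the knowledge base.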
5.2.2. Continual Learning and Adaptive Retrieval
- Lifelong learning approaches that allow models to dynamically update retrieval strategies based on new information [88].
- Self-improving retrieval mechanisms that refine document selection based on user feedback.
- Incremental learning pipelines that enable RAG systems to incorporate real-time knowledge without full retraining [89].
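The feedback-driven refinement in the second item can be sketched as a per-document score adjustment learned from user judgments. The adjustment is applied on top of the retriever's scores, without retraining the retriever itself. Binary helpful/unhelpful feedback and a fixed additive step size are simplifying assumptions, and `FeedbackReranker` is hypothetical. Because such feedback can be gamed, it would need the adversarial safeguards discussed in Section 5.1.5.

```python
from typing import Dict, List


class FeedbackReranker:
    """Re-ranks retriever scores using a bonus accumulated from user feedback."""

    def __init__(self, step: float = 0.1):
        self.step = step  # hypothetical fixed update size
        self.bonus: Dict[str, float] = {}

    def record(self, doc_id: str, helpful: bool) -> None:
        """Nudge a document's bonus up (helpful) or down (unhelpful)."""
        delta = self.step if helpful else -self.step
        self.bonus[doc_id] = self.bonus.get(doc_id, 0.0) + delta

    def rerank(self, scores: Dict[str, float]) -> List[str]:
        """Order document IDs by retriever score plus learned bonus, best first."""
        return sorted(scores, key=lambda d: scores[d] + self.bonus.get(d, 0.0), reverse=True)


if __name__ == "__main__":
    reranker = FeedbackReranker(step=0.2)
    scores = {"a": 1.0, "b": 0.9}  # hypothetical retriever scores
    print(reranker.rerank(scores))
    reranker.record("b", helpful=True)
    print(reranker.rerank(scores))
```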
5.2.3. Multimodal RAG Models
- Image and Video Retrieval: Integrating visual knowledge for applications in medical imaging, forensic analysis, and autonomous systems.
- Audio and Speech Retrieval: Enhancing conversational AI with audio-based search capabilities.
- Cross-Modal Retrieval Fusion: Developing architectures that combine text, images, and structured data for richer knowledge grounding [90].
5.2.4. Explainability and Interpretability in RAG
5.3. Summary
6. Conclusion
6.1. Summary of Key Insights
- Hybrid Approach for Enhanced Performance: RAG leverages the strengths of both retrieval-based and generation-based models. It dynamically retrieves relevant documents at inference time and uses them to condition text generation, leading to more informed and contextually relevant responses.
- Improved Accuracy and Knowledge Grounding: By incorporating external knowledge sources, RAG can reduce the hallucinations commonly observed in LLMs. Because each response is conditioned on retrieved evidence, it can be more reliable and easier to verify against its sources.
- Broad Applicability Across Domains: RAG has demonstrated significant benefits across multiple applications, including open-domain question answering, scientific literature summarization, conversational AI, legal document analysis, and biomedical research.
- Scalability and Efficiency Challenges: Despite its advantages, RAG introduces additional computational complexity due to the retrieval process. Efficient indexing, caching, and hybrid search methods are crucial for optimizing performance.
- Ethical Considerations and Robustness: Issues such as retrieval bias, exposure of sensitive data, and adversarial manipulation remain open problems. Future research should focus on building secure, transparent, and fair RAG-based systems.
6.2. The Broader Impact of RAG
6.2.1. Democratization of Knowledge
6.2.2. Towards Explainable AI
6.2.3. Bridging Structured and Unstructured Data
6.3. Future of Retrieval-Augmented Generation
- Neural-Symbolic Integration: Combining symbolic reasoning with neural retrieval mechanisms could lead to more robust and interpretable RAG systems that leverage structured knowledge bases alongside free-text retrieval.
- Efficient and Scalable Retrieval: The development of advanced indexing techniques, approximate nearest neighbor (ANN) search, and knowledge compression strategies will be critical for making retrieval more efficient, enabling real-time applications.
- Continual Learning and Adaptive Retrieval: Future RAG models could evolve dynamically by learning from user interactions, updating retrieval strategies based on feedback, and incorporating the latest knowledge without requiring frequent retraining.
- Multimodal and Cross-Domain RAG: Expanding RAG beyond text-based retrieval to incorporate images, videos, structured data, and multimodal inputs could unlock novel applications in areas such as autonomous systems, education, and healthcare.
- Ethical AI and Bias Mitigation: Ensuring fairness and reducing biases in retrieved documents remains a critical research challenge. Developing adversarially robust retrieval methods and fairness-aware ranking algorithms will be essential for building responsible AI systems.
6.4. Final Remarks
References
- Ilin, I. Advanced RAG Techniques: an Illustrated Overview. https://pub.towardsai.net/advanced-rag-techniques-an-illustrated-overview-04d193d8fec6, 2023.
- Husain, H.; Wu, H.H.; Gazit, T.; Allamanis, M.; Brockschmidt, M. CodeSearchNet challenge: Evaluating the state of semantic code search. arXiv preprint arXiv:1909.09436 2019. [CrossRef]
- Gottlob, G.; Leone, N.; Scarcello, F. Hypertree Decompositions and Tractable Queries. Journal of Computer and System Sciences 2002, 64, 579–627. [CrossRef]
- Berant, J.; Chou, A.K.; Frostig, R.; Liang, P. Semantic Parsing on Freebase from Question-Answer Pairs. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2013.
- Bang, Y.; Cahyawijaya, S.; Lee, N.; Dai, W.; Su, D.; Wilie, B.; Lovenia, H.; Ji, Z.; Yu, T.; Chung, W.; et al. A multitask, multilingual, multimodal evaluation of ChatGPT on reasoning, hallucination, and interactivity. arXiv preprint arXiv:2302.04023 2023. [CrossRef]
- Wang, Y.; Lipka, N.; Rossi, R.A.; Siu, A.; Zhang, R.; Derr, T. Knowledge graph prompting for multi-document question answering. arXiv preprint arXiv:2308.11730 2023. [CrossRef]
- Bai, Y.; Kadavath, S.; Kundu, S.; Askell, A.; Kernion, J.; Jones, A.; Chen, A.; Goldie, A.; Mirhoseini, A.; McKinnon, C.; et al. Constitutional AI: Harmlessness from AI feedback. arXiv preprint arXiv:2212.08073 2022. [CrossRef]
- Karpukhin, V.; Oğuz, B.; Min, S.; Lewis, P.; Wu, L.; Edunov, S.; Chen, D.; tau Yih, W. Dense Passage Retrieval for Open-Domain Question Answering, 2020, [arXiv:cs.CL/2004.04906]. [CrossRef]
- Saha, S.; Junaed, J.A.; Saleki, M.; Sharma, A.S.; Rifat, M.R.; Rahouti, M.; Ahmed, S.I.; Mohammed, N.; Amin, M.R. Vio-Lens: A novel dataset of annotated social network posts leading to different forms of communal violence and its evaluation. In Proceedings of the First Workshop on Bangla Language Processing (BLP-2023), 2023, pp. 72–84.
- Levesque, H.J. A logic of implicit and explicit belief. In Proceedings of the Fourth National Conference on Artificial Intelligence, Austin, Texas, August 1984; pp. 198–202.
- Saha, A.; Pahuja, V.; Khapra, M.M.; Sankaranarayanan, K.; Chandar, S. Complex Sequential Question Answering: Towards Learning to Converse Over Linked Question Answer Pairs with a Knowledge Graph 2018. [arXiv:1801.10314]. [CrossRef]
- NVIDIA. Spectrum-X: End-to-End Networking for AI and High-Performance Computing. https://www.nvidia.com/en-us/networking/spectrumx/, 2025. Accessed: 2025-01-28.
- Han, Y.; Liu, C.; Wang, P. A Comprehensive Survey on Vector Database: Storage and Retrieval Technique, Challenge, 2023, [arXiv:cs.DB/2310.11703]. [CrossRef]
- Zhao, W.X.; Zhou, K.; Li, J.; Tang, T.; Wang, X.; Hou, Y.; Min, Y.; Zhang, B.; Zhang, J.; Dong, Z.; et al. A Survey of Large Language Models, 2024, [arXiv:cs.CL/2303.18223]. [CrossRef]
- Kim, G.; Kim, S.; Jeon, B.; Park, J.; Kang, J. Tree of Clarifications: Answering Ambiguous Questions with Retrieval-Augmented Large Language Models. arXiv preprint arXiv:2310.14696 2023. [CrossRef]
- Welbl, J.; Stenetorp, P.; et al. 2WikiMultiHopQA: Multihop Question Answering over Wikipedia Articles. EMNLP 2018.
- LangChain. LangSmith: The Ultimate Toolkit for Debugging and Monitoring LLM Applications. https://www.langchain.com/langsmith, 2025. Accessed: 2025-01-28.
- Cohere. Say Goodbye to Irrelevant Search Results: Cohere Rerank Is Here. https://txt.cohere.com/rerank/, 2023.
- Wang, L.; Yang, N.; Wei, F. Query2doc: Query Expansion with Large Language Models. arXiv preprint arXiv:2303.07678 2023. [CrossRef]
- Yan, S.Q.; Gu, J.C.; Zhu, Y.; Ling, Z.H. Corrective Retrieval Augmented Generation, 2024, [arXiv:cs.CL/2401.15884]. [CrossRef]
- Xu, F.; Shi, W.; Choi, E. RECOMP: Improving Retrieval-Augmented LMs with Compression and Selective Augmentation. arXiv preprint arXiv:2310.04408 2023. [CrossRef]
- An, Z.; Ding, X.; Fu, Y.C.; Chu, C.C.; Li, Y.; Du, W. Golden-Retriever: High-Fidelity Agentic Retrieval Augmented Generation for Industrial Knowledge Base, 2024, [arXiv:cs.IR/2408.00798]. [CrossRef]
- Elsahar, H.; Vougiouklis, P.; Remaci, A.; Gravier, C.; Hare, J.; Laforest, F.; Simperl, E. T-REx: A large scale alignment of natural language with knowledge base triples. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), 2018.
- Blagojević, V. Enhancing RAG Pipelines in Haystack: Introducing DiversityRanker and LostInTheMiddleRanker. https://towardsdatascience.com/enhancing-rag-pipelines-in-haystack-45f14e2bc9f5, 2023.
- LangChain. LangGraph Workflows Tutorial, 2025. https://langchain-ai.github.io/langgraph/tutorials/workflows/. Accessed: February 2, 2025.
- Lee, M.C.; Zhu, Q.; Mavromatis, C.; Han, Z.; Adeshina, S.; Ioannidis, V.N.; Rangwala, H.; Faloutsos, C. Agent-G: An Agentic Framework for Graph Retrieval Augmented Generation, 2024.
- Lewis, P.; Perez, E.; Piktus, A.; Petroni, F.; Karpukhin, V.; Goyal, N.; Küttler, H.; Lewis, M.; tau Yih, W.; Rocktäschel, T.; et al. Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks, 2021, [arXiv:cs.CL/2005.11401]. [CrossRef]
- Nebel, B. On the compilability and expressive power of propositional planning formalisms. Journal of Artificial Intelligence Research 2000, 12, 271–315.
- Saha, S.; Junaed, J.A.; Saleki, M.; Sen Sharma, A.; Rifat, M.R.; Rahouti, M.; Ahmed, S.I.; Mohammed, N.; Amin, M.R. Vio-Lens: A Novel Dataset of Annotated Social Network Posts Leading to Different Forms of Communal Violence and its Evaluation. In Proceedings of the First Workshop on Bangla Language Processing (BLP-2023); Alam, F.; Kar, S.; Chowdhury, S.A.; Sadeque, F.; Amin, R., Eds., Singapore, 2023; pp. 72–84. [CrossRef]
- Berchansky, M.; Izsak, P.; Caciularu, A.; Dagan, I.; Wasserblat, M. Optimizing Retrieval-augmented Reader Models via Token Elimination. arXiv preprint arXiv:2310.13682 2023. [CrossRef]
- Sciavolino, C.; Zhong, Z.; Lee, J.; Chen, D. Simple entity-centric questions challenge dense retrievers. arXiv preprint arXiv:2109.08535 2021. [CrossRef]
- Dasigi, P.; Lo, K.; Beltagy, I.; Cohan, A.; Smith, N.A.; Gardner, M. A dataset of information-seeking questions and answers anchored in research papers. arXiv preprint arXiv:2105.03011 2021. [CrossRef]
- He, R.; McAuley, J. Ups and downs: Modeling the visual evolution of fashion trends with one-class collaborative filtering. In Proceedings of the 25th International Conference on World Wide Web, 2016, pp. 507–517.
- Ram, O.; Levine, Y.; Dalmedigos, I.; Muhlgay, D.; Shashua, A.; Leyton-Brown, K.; Shoham, Y. In-context retrieval-augmented language models. arXiv preprint arXiv:2302.00083 2023. [CrossRef]
- Du, X.; Ji, H. Retrieval-Augmented Generative Question Answering for Event Argument Extraction. arXiv preprint arXiv:2211.07067 2022. [CrossRef]
- Lewis, P.; Perez, E.; Piktus, A.; Petroni, F.; Karpukhin, V.; Goyal, N.; Küttler, H.; Lewis, M.; Yih, W.t.; Rocktäschel, T.; et al. Retrieval-augmented generation for knowledge-intensive NLP tasks. Advances in Neural Information Processing Systems 2020, 33, 9459–9474.
- LlamaIndex. Introducing Agentic Document Workflows. https://www.llamaindex.ai/blog/introducing-agentic-document-workflows, 2025. Accessed: 2025-01-13.
- crewAI Inc. crewAI: A GitHub Repository for AI Projects. https://github.com/crewAIInc/crewAI, 2025. Accessed: 2025-01-15.
- Singh, A.; Kumar, S.; Ehtesham, A.; Khoei, T.T.; Bhati, D. Large Language Model-Driven Immersive Agent. In Proceedings of the 2024 IEEE World AI IoT Congress (AIIoT), 2024, pp. 0619–0624. [CrossRef]
- Clark, C.; Lee, K.; Chang, M.W.; Kwiatkowski, T.; Collins, M.; Toutanova, K. BoolQ: Exploring the surprising difficulty of natural yes/no questions. arXiv preprint arXiv:1905.10044 2019. [CrossRef]
- Xu, X.; Gou, Z.; Wu, W.; Niu, Z.Y.; Wu, H.; Wang, H.; Wang, S. Long time no see! Open-domain conversation with long-term persona memory. arXiv preprint arXiv:2203.05797 2022. [CrossRef]
- Xiao, G.; Tian, Y.; Chen, B.; Han, S.; Lewis, M. Efficient streaming language models with attention sinks. arXiv preprint arXiv:2309.17453 2023. [CrossRef]
- Zniyed, Y.; Nguyen, T.P.; et al. Enhanced network compression through tensor decompositions and pruning. IEEE Transactions on Neural Networks and Learning Systems 2024. [CrossRef]
- Leng, Q.; Uhlenhuth, K.; Polyzotis, A. Best Practices for LLM Evaluation of RAG Applications. https://www.databricks.com/blog/LLM-auto-eval-best-practices-RAG, 2023.
- Steinberger, R.; Pouliquen, B.; Widiger, A.; Ignat, C.; Erjavec, T.; Tufis, D.; Varga, D. The JRC-Acquis: A multilingual aligned parallel corpus with 20+ languages. arXiv preprint cs/0609058 2006. [CrossRef]
- Hoshi, Y.; Miyashita, D.; Ng, Y.; Tatsuno, K.; Morioka, Y.; Torii, O.; Deguchi, J. RaLLe: A Framework for Developing and Evaluating Retrieval-Augmented Large Language Models. arXiv preprint arXiv:2308.10633 2023. [CrossRef]
- Nguyen, I. Evaluating RAG Part I: How to Evaluate Document Retrieval. https://www.deepset.ai/blog/rag-evaluation-retrieval, 2023.
- Ren, R.; Wang, Y.; Qu, Y.; Zhao, W.X.; Liu, J.; Tian, H.; Wu, H.; Wen, J.R.; Wang, H. Investigating the factual knowledge boundary of large language models with retrieval augmentation. arXiv preprint arXiv:2307.11019 2023. [CrossRef]
- Rau, D.; Déjean, H.; Chirkova, N.; Formal, T.; Wang, S.; Nikoulina, V.; Clinchant, S. BERGEN: A Benchmarking Library for Retrieval-Augmented Generation, 2024, [arXiv:cs.CL/2407.01102]. [CrossRef]
- Kandpal, N.; Deng, H.; Roberts, A.; Wallace, E.; Raffel, C. Large language models struggle to learn long-tail knowledge. In Proceedings of the International Conference on Machine Learning. PMLR, 2023, pp. 15696–15707.
- Mallen, A.; Asai, A.; Zhong, V.; Das, R.; Hajishirzi, H.; Khashabi, D. When not to trust language models: Investigating effectiveness and limitations of parametric and non-parametric memories. arXiv preprint arXiv:2212.10511 2022. [CrossRef]
- Madaan, A.; Tandon, N.; Gupta, P.; Hallinan, S.; Gao, L.; Wiegreffe, S.; Alon, U.; Dziri, N.; Prabhumoye, S.; Yang, Y.; et al. Self-Refine: Iterative Refinement with Self-Feedback, 2023, [arXiv:cs.CL/2303.17651]. [CrossRef]
- Huang, J.; Chang, K.C.C. Towards Reasoning in Large Language Models: A Survey, 2023, [arXiv:cs.CL/2212.10403]. [CrossRef]
- Saad-Falcon, J.; Khattab, O.; Potts, C.; Zaharia, M. ARES: An Automated Evaluation Framework for Retrieval-Augmented Generation Systems. arXiv preprint arXiv:2311.09476 2023. [CrossRef]
- Zhang, J. Graph-ToolFormer: To Empower LLMs with Graph Reasoning Ability via Prompt Augmented by ChatGPT. arXiv preprint arXiv:2304.11116 2023. [CrossRef]
- Jiang, X.; Zhang, R.; Xu, Y.; Qiu, R.; Fang, Y.; Wang, Z.; Tang, J.; Ding, H.; Chu, X.; Zhao, J.; et al. Think and Retrieval: A Hypothesis Knowledge Graph Enhanced Medical Large Language Models. arXiv preprint arXiv:2312.15883 2023. [CrossRef]
- Ye, D.; Lin, Y.; Du, J.; Liu, Z.; Li, P.; Sun, M.; Liu, Z. Coreferential reasoning learning for language representation. arXiv preprint arXiv:2004.06870 2020. [CrossRef]
- Zeng, H. Measuring massive multitask Chinese understanding. arXiv preprint arXiv:2304.12986 2023. [CrossRef]
- He, X.; Tian, Y.; Sun, Y.; Chawla, N.V.; Laurent, T.; LeCun, Y.; Bresson, X.; Hooi, B. G-Retriever: Retrieval-Augmented Generation for Textual Graph Understanding and Question Answering. arXiv preprint arXiv:2402.07630 2024. [CrossRef]
- Jiang, H.; Wu, Q.; Lin, C.Y.; Yang, Y.; Qiu, L. LLMLingua: Compressing prompts for accelerated inference of large language models. arXiv preprint arXiv:2310.05736 2023. [CrossRef]
- Gao, Y.; Xiong, Y.; Gao, X.; Jia, K.; Pan, J.; Bi, Y.; Dai, Y.; Sun, J.; Wang, M.; Wang, H. Retrieval-Augmented Generation for Large Language Models: A Survey, 2024, [arXiv:cs.CL/2312.10997]. [CrossRef]
- Yang, A.; Nagrani, A.; Seo, P.H.; Miech, A.; Pont-Tuset, J.; Laptev, I.; Sivic, J.; Schmid, C. Vid2Seq: Large-scale pretraining of a visual language model for dense video captioning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 10714–10726.
- Shi, T.; Li, L.; Lin, Z.; Yang, T.; Quan, X.; Wang, Q. Dual-Feedback Knowledge Retrieval for Task-Oriented Dialogue Systems. arXiv preprint arXiv:2310.14528 2023. [CrossRef]
- Mallen, A.; Asai, A.; Zhong, V.; Das, R.; Khashabi, D.; Hajishirzi, H. When Not to Trust Language Models: Investigating Effectiveness of Parametric and Non-Parametric Memories. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers); Rogers, A.; Boyd-Graber, J.; Okazaki, N., Eds., Toronto, Canada, 2023; pp. 9802–9822. [CrossRef]
- Xia, M.; Huang, G.; Liu, L.; Shi, S. Graph based translation memory for neural machine translation. In Proceedings of the AAAI Conference on Artificial Intelligence, 2019, Vol. 33, pp. 7297–7304.
- Li, X.; Zhao, R.; Chia, Y.K.; Ding, B.; Bing, L.; Joty, S.; Poria, S. Chain of Knowledge: A Framework for Grounding Large Language Models with Structured Knowledge Bases. arXiv preprint arXiv:2305.13269 2023. [CrossRef]
- Bajaj, P.; Campos, D.; Craswell, N.; Deng, L.; Gao, J.; Liu, X.; Majumder, R.; McNamara, A.; Mitra, B.; Nguyen, T.; et al. MS MARCO: A Human Generated MAchine Reading COmprehension Dataset, 2018, [arXiv:cs.CL/1611.09268]. [CrossRef]
- Hugging Face Cookbook. Agentic RAG: Turbocharge Your Retrieval-Augmented Generation with Query Reformulation and Self-Query. https://huggingface.co/learn/cookbook/en/agent_rag. Accessed: 2025-01-14.
- Seo, M.; Baek, J.; Thorne, J.; Hwang, S.J. Retrieval-Augmented Data Augmentation for Low-Resource Domain Tasks. arXiv preprint arXiv:2402.13482 2024. [CrossRef]
- Ma, Y.; Cao, Y.; Hong, Y.; Sun, A. Large Language Model Is Not a Good Few-shot Information Extractor, but a Good Reranker for Hard Samples! ArXiv 2023, abs/2303.08559. [CrossRef]
- Huang, L.; Yu, W.; Ma, W.; Zhong, W.; Feng, Z.; Wang, H.; Chen, Q.; Peng, W.; Feng, X.; Qin, B.; et al. A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions. ACM Transactions on Information Systems 2024. [CrossRef]
- Asai, A.; Min, S.; Zhong, Z.; Chen, D. Retrieval-based language models and applications. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 6: Tutorial Abstracts), 2023, pp. 41–46.
- Dasigi, P.; Lo, K.; Beltagy, I.; Cohan, A.; Smith, N.A.; Gardner, M. A Dataset of Information-Seeking Questions and Answers Anchored in Research Papers. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies; Toutanova, K.; Rumshisky, A.; Zettlemoyer, L.; Hakkani-Tur, D.; Beltagy, I.; Bethard, S.; Cotterell, R.; Chakraborty, T.; Zhou, Y., Eds., Online, 2021; pp. 4599–4610. [CrossRef]
- Zheng, H.S.; Mishra, S.; Chen, X.; Cheng, H.T.; Chi, E.H.; Le, Q.V.; Zhou, D. Take a Step Back: Evoking Reasoning via Abstraction in Large Language Models. arXiv preprint arXiv:2310.06117 2023. [CrossRef]
- Yang, S. Advanced RAG 01: Small-to-Big Retrieval. https://towardsdatascience.com/advanced-rag-01-small-to-big-retrieval-172181b396d4, 2023.
- Pang, R.Y.; Parrish, A.; Joshi, N.; Nangia, N.; Phang, J.; Chen, A.; Padmakumar, V.; Ma, J.; Thompson, J.; He, H.; et al. QuALITY: Question Answering with Long Input Texts, Yes!, 2022, [arXiv:cs.CL/2112.08608]. [CrossRef]
- Yan, S.Q.; Gu, J.C.; Zhu, Y.; Ling, Z.H. Corrective Retrieval Augmented Generation, 2024, [arXiv:cs.CL/2401.15884]. [CrossRef]
- Wang, X.; Chen, G.H.; Song, D.; Zhang, Z.; Chen, Z.; Xiao, Q.; Jiang, F.; Li, J.; Wan, X.; Wang, B.; et al. CMB: A Comprehensive Medical Benchmark in Chinese, 2024, [arXiv:cs.CL/2308.08833]. [CrossRef]
- Zniyed, Y.; Nguyen, T.P.; et al. Efficient tensor decomposition-based filter pruning. Neural Networks 2024, 178, 106393. [CrossRef]
- Anderson, N.; Wilson, C.; Richardson, S.D. Lingua: Addressing Scenarios for Live Interpretation and Automatic Dubbing. In Proceedings of the 15th Biennial Conference of the Association for Machine Translation in the Americas (Volume 2: Users and Providers Track and Government Track); Campbell, J.; Larocca, S.; Marciano, J.; Savenkov, K.; Yanishevsky, A., Eds., Orlando, USA, 2022; pp. 202–209.
- Raudaschl, A.H. Forget RAG, the Future is RAG-Fusion. https://towardsdatascience.com/forget-rag-the-future-is-rag-fusion-1147298d8ad1, 2023.
- Koehn, P. Europarl: A Parallel Corpus for Statistical Machine Translation. MT Summit 2005.
- Pang, R.Y.; Parrish, A.; Joshi, N.; Nangia, N.; Phang, J.; Chen, A.; Padmakumar, V.; Ma, J.; Thompson, J.; He, H.; et al. QuALITY: Question answering with long input texts, yes! arXiv preprint arXiv:2112.08608 2021. [CrossRef]
- Lyu, Y.; Li, Z.; Niu, S.; Xiong, F.; Tang, B.; Wang, W.; Wu, H.; Liu, H.; Xu, T.; Chen, E. CRUD-RAG: A comprehensive Chinese benchmark for retrieval-augmented generation of large language models. arXiv preprint arXiv:2401.17043 2024. [CrossRef]
- Chen, D.; Yih, W.t. Open-domain question answering. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: Tutorial Abstracts, 2020, pp. 34–37.
- Li, X.; Nie, E.; Liang, S. From Classification to Generation: Insights into Crosslingual Retrieval Augmented ICL. arXiv preprint arXiv:2311.06595 2023. [CrossRef]
- Kim, S.; Joo, S.J.; Kim, D.; Jang, J.; Ye, S.; Shin, J.; Seo, M. The CoT Collection: Improving Zero-shot and Few-shot Learning of Language Models via Chain-of-Thought Fine-Tuning. arXiv preprint arXiv:2305.14045 2023. [CrossRef]
- Ouyang, L.; Wu, J.; Jiang, X.; Almeida, D.; Wainwright, C.; Mishkin, P.; Zhang, C.; Agarwal, S.; Slama, K.; Ray, A.; et al. Training language models to follow instructions with human feedback. Advances in neural information processing systems 2022, 35, 27730–27744.
- Qin, Y.; Cai, Z.; Jin, D.; Yan, L.; Liang, S.; Zhu, K.; Lin, Y.; Han, X.; Ding, N.; Wang, H.; et al. WebCPM: Interactive Web Search for Chinese Long-form Question Answering. arXiv preprint arXiv:2305.06849 2023. [CrossRef]
- LlamaCloud Demo Repository. Contract Review Workflow using LlamaCloud. https://github.com/run-llama/llamacloud-demo/blob/main/examples/document_workflows/contract_review/contract_review.ipynb, 2025. Accessed: 2025-01-13.
- Minaee, S.; Mikolov, T.; Nikzad, N.; Chenaghlu, M.; Socher, R.; Amatriain, X.; Gao, J. Large Language Models: A Survey, 2024, [arXiv:cs.CL/2402.06196]. [CrossRef]
- IBM Granite Community. Agentic RAG: AI Agents with IBM Granite Models. https://github.com/ibm-granite-community/granite-snack-cookbook/blob/main/recipes/AI-Agents/Agentic_RAG.ipynb. Accessed: 2025-01-14.
- Lin, X.V.; Chen, X.; Chen, M.; Shi, W.; Lomeli, M.; James, R.; Rodriguez, P.; Kahn, J.; Szilvasy, G.; Lewis, M.; et al. RA-DIT: Retrieval-Augmented Dual Instruction Tuning. arXiv preprint arXiv:2310.01352 2023. [CrossRef]
- Thakur, N.; Bonifacio, L.; Zhang, X.; Ogundepo, O.; Kamalloo, E.; Alfonso-Hermelo, D.; Li, X.; Liu, Q.; Chen, B.; Rezagholizadeh, M.; et al. "Knowing When You Don’t Know": A Multilingual Relevance Assessment Dataset for Robust Retrieval-Augmented Generation, 2024, [arXiv:cs.CL/2312.11361]. [CrossRef]
- Chan, D.M.; Ghosh, S.; Rastrow, A.; Hoffmeister, B. Using External Off-Policy Speech-To-Text Mappings in Contextual End-To-End Automated Speech Recognition. arXiv preprint arXiv:2301.02736 2023. [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
