Submitted: 29 August 2024
Posted: 30 August 2024
Abstract
Keywords:
1. Introduction
1.1. Our Viewpoints and Contributions
- The proposed framework mitigates hallucination in LLMs by using RAG. Through the notion of reflection tags, the model controls retrieval and self-evaluates the retrieved content, selecting only high-scoring information for generation and thereby further reducing hallucinations.
- By incorporating RAG, the model can retrieve answers from external databases when it judges that it cannot answer effectively on its own. Because the external data can be updated continuously, retrieval also improves the timeliness of the model’s responses.
- The framework allows parallel content generation for the retrieved documents and step-by-step evaluation using reflection tags to assess relevance, validity, and accuracy. This process ensures that only the most critical and valid information is selected, filtering out irrelevant or unreliable information and yielding more accurate responses.
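The control flow described in these bullets can be sketched as follows. This is an illustrative outline, not the authors' implementation: `retrieve`, `generate`, and `score_document` are hypothetical stand-ins for the retriever, the LLM, and the reflection-tag evaluator, and the `[Retrieve]` marker and `threshold` value are assumptions.

```python
from dataclasses import dataclass

@dataclass
class ScoredDoc:
    text: str
    score: float  # combined score derived from the reflection tags

def answer(query, retrieve, generate, score_document, threshold=0.5, k=5):
    """Generate an answer, retrieving only when the model signals it must."""
    draft = generate(query)            # first attempt without retrieval
    if "[Retrieve]" not in draft:      # model judged it can answer on its own
        return draft
    docs = retrieve(query, k=k)        # fetch k candidate passages
    scored = [ScoredDoc(d, score_document(query, d)) for d in docs]
    kept = [s.text for s in scored if s.score >= threshold]  # keep high scorers
    return generate(query, context=kept)  # grounded generation
```

The key design point is that retrieval is conditional (the model opts in) and the retrieved passages are filtered by their evaluation scores before any generation is conditioned on them.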
2. Retrieval-Augmented Generation (RAG)
2.1. Workflow of RAG
2.2. Existing Research on Improving RAGs
3. Proposed Method
3.1. Document Retrieval
3.2. Control the Document Retrieval Using Reflection Tags
3.3. Evaluation of Retrieved Documents
- Evaluate whether the document is relevant to the user query and assign one of the RLV tags: “Relevant” or “Irrelevant”. The cosine similarity of feature vectors is used to assess relevance.
- Assess whether the document supports the input and, based on the degree of support, assign one of the SPT tags: “Fully Supported”, “Partially Supported”, or “Not Supported”.
- Determine the suitability of the document for use in the RAG framework and assign one of the PIT tags, an integer from 1 to 5, where 5 means highly appropriate and 1 means least appropriate.
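The RLV assignment above reduces to a cosine-similarity check between the query and document feature vectors. A minimal sketch, assuming a generic embedding space and a hypothetical relevance threshold of 0.5 (the paper's actual threshold and embedding model are not stated here):

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rlv_tag(query_vec, doc_vec, threshold=0.5):
    # "Relevant" when the embeddings point in a sufficiently similar direction
    return "Relevant" if cosine_similarity(query_vec, doc_vec) >= threshold else "Irrelevant"
```

The SPT and PIT tags would be assigned by the model itself rather than by a geometric test, so they are not sketched here.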
3.4. Contents Generation
4. Evaluation
4.1. Setup
- ARC-Challenge: A multiple-choice question-answering dataset of science questions from elementary through high school levels. The more challenging ARC-Challenge subset, which requires advanced reasoning, was used. Preprocessing resulted in 1,172 data points.
- PubHealth: A fact-checking dataset containing public health statements, corresponding articles, and fact-checking annotations. After preprocessing, 987 data points remained.
- PopQA: An open-domain question-answering dataset covering various domains. A long-tail subset, primarily from Wikipedia, was selected. Preprocessing yielded 1,399 data points.
- TriviaQA: An open-domain question-answering dataset with 95,000 question-answer pairs, known for its challenging long contexts and inference requirements. Preprocessing reduced the dataset to 11,313 data points.
4.2. Comparison of Four Models
4.3. Impact of the Number of Reference Passages K
4.4. Significance of Reflection Tags
4.5. Qualitative Analysis
5. Concluding Remarks
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Appendix A
Appendix A.1. Context Construction through Chain of Thought (CoT)
Appendix A.1.1. Overview
Appendix A.1.2. Advantages of CoT
References
| Tag | Content |
| --- | --- |
| Q0 | How did Van Gogh create "The Starry Night"? |
| I0 | Split the problem based on the steps in the sample below. Example: Why does the price of gold rise? Split the example question into: What are the main factors that affect the price of gold? How does economic uncertainty affect the price of gold? How does inflation affect the price of gold? How do supply and demand affect the price of gold? |
| Q1 | What is the background of the creation of "The Starry Night"? |
| Q2 | What techniques did Van Gogh use to create this painting? |
| Q3 | What are the characteristics of the color and composition of this painting? |
| Q4 | What is the significance of this painting in art history? |
| S0 | "The Starry Night" is an oil painting created by the Dutch post-impressionist painter Vincent van Gogh. |
| S1 | Vincent van Gogh created "The Starry Night" in 1889 in a mental hospital in Saint-Rémy, France. |
| S2 | It is now in the collection of the Museum of Modern Art in New York. |
| S3 | The inspiration came from the scenery he saw from the window of his room, combined with his memory and imagination. |
| S4 | The scene, detached from reality, reflects Van Gogh's restless emotions and hallucinatory inner world. |
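The decomposition pattern in the table (instruction I0 splitting Q0 into Q1-Q4, with S0-S4 as the retrieved sub-answers) can be sketched as a one-shot prompt builder. The wording constants and the two helper functions below are illustrative assumptions, not the authors' exact prompts:

```python
# Worked one-shot example quoted from instruction I0 above.
EXAMPLE_DECOMPOSITION = (
    "Example: Why does the price of gold rise? Split the example question into: "
    "What are the main factors that affect the price of gold? "
    "How does economic uncertainty affect the price of gold? "
    "How does inflation affect the price of gold? "
    "How do supply and demand affect the price of gold?"
)

def build_decomposition_prompt(question):
    # Show the worked decomposition, then ask the model to split the new question.
    return (
        "Split the problem based on the steps in the sample below.\n"
        + EXAMPLE_DECOMPOSITION + "\n"
        + "Question: " + question + "\n"
        + "Sub-questions:"
    )

def assemble_context(sub_answers):
    # Merge the retrieved sub-answers (the S0-S4 rows) into one context block
    # that is passed to the generator for the final answer.
    return "\n".join("- " + s for s in sub_answers)
```

Each sub-question (Q1-Q4) is then answered via retrieval, and the collected statements (S0-S4) form the context for answering the original question Q0.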
| Model | ARC-Challenge | PubHealth | PopQA-longtail | TriviaQA |
| --- | --- | --- | --- | --- |
| Qwen | 81.83 | 58.42 | 59.97 | 64.64 |
| GPT-3.5 | 77.99 | 68.70 | 56.90 | 72.86 |
| Qwen+RAG | 83.02 (+1.19) | 65.51 (+7.09) | 62.26 (+2.29) | 66.42 (+1.78) |
| GPT-3.5+RAG | 81.57 (+3.58) | 72.26 (+3.56) | 57.68 (+0.78) | 73.76 (+0.90) |
| Id | Question |
| --- | --- |
| Mercury_7234308 | A scientist maps a long region in which earthquakes originate and determines this region is a transform plate boundary. Which evidence would cause the scientist to reevaluate this determination? |
| Mercury_184975 | To determine how closely related organisms are, scientists consider all of the following. |
| Mercury_SC_400578 | Which is an example of learned behavior? |
| AKDE&ED_2008_4_26 | Which example shows a relationship between a living thing and a nonliving thing? |
| Id | Reference | Prediction | RAG Result |
| --- | --- | --- | --- |
| Mercury_7234308 | A | B | A |
| Mercury_184975 | C | B | C |
| Mercury_SC_400578 | A | C | A |
| AKDE&ED_2008_4_26 | C | B | C |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).