Submitted: 10 September 2025
Posted: 11 September 2025
Abstract
Keywords:
1. Introduction
- We formulate abductive inference within the RAG framework, defining the task of generating and validating missing premises.
- We propose a modular pipeline that detects insufficiency, performs abductive generation, and validates candidate premises via entailment and retrieval-based checks.
- We demonstrate improvements on abductive reasoning and multi-hop QA benchmarks, showing that our approach reduces hallucination and increases answer accuracy.
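The modular pipeline described above can be sketched as follows. The stage functions here (`detect_insufficiency`, `generate_premises`, `validate_premise`) are hypothetical placeholders with trivial heuristics standing in for the model-backed components; only the control flow reflects the proposed design.

```python
from dataclasses import dataclass, field

@dataclass
class AbductiveRAGResult:
    answer: str
    added_premises: list = field(default_factory=list)

def detect_insufficiency(question, passages):
    # Placeholder detector: flag insufficiency when no retrieved passage
    # shares a surface term with the question. A real system would use a
    # learned sufficiency classifier.
    q_terms = set(question.lower().split())
    return all(q_terms.isdisjoint(p.lower().split()) for p in passages)

def generate_premises(question, passages):
    # Placeholder for abductive generation by an LLM: propose candidate
    # missing premises that would make the evidence sufficient.
    return [f"Assumed missing link between '{question}' and the retrieved context."]

def validate_premise(premise, passages):
    # Placeholder for the entailment and retrieval-based checks.
    return True

def answer(question, passages, generate_fn):
    # Orchestrate: detect insufficiency -> abduce premises -> validate ->
    # generate the answer from the (possibly augmented) context.
    result = AbductiveRAGResult(answer="")
    if detect_insufficiency(question, passages):
        for p in generate_premises(question, passages):
            if validate_premise(p, passages):
                result.added_premises.append(p)
    result.answer = generate_fn(question, passages + result.added_premises)
    return result
```

In this sketch, `generate_fn` abstracts over the downstream generator, so the augmentation step can be tested independently of any particular LLM.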
2. Related Work
2.1. Retrieval-Augmented Generation
2.2. Abductive and Multi-hop Reasoning
2.3. Premise Validation and Context Modeling
2.4. Theoretical Perspectives
2.5. Premise Validation and Faithfulness
2.6. Problem Definition
2.7. Insufficiency Detection
2.8. Abductive Premise Generation
2.9. Premise Validation
- Consistency Check: Using an NLI model, we test whether the candidate premise contradicts the question or the retrieved evidence.
- Plausibility Check: We query an external retriever or knowledge base to verify whether the candidate premise has empirical support.
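A minimal sketch of the two checks, assuming an `nli_contradiction_prob` scorer and a `retriever` callable supplied by the system; both scorers are stubbed here with trivial heuristics purely for illustration, and the thresholds are arbitrary choices, not values from the paper.

```python
def nli_contradiction_prob(premise, context):
    # Stub for an NLI model's P(contradiction); a real system would call
    # a trained NLI classifier here. This toy version only fires when the
    # context asserts the un-negated form of a "not ..." premise.
    negated = premise.lower().replace("not ", "")
    return 1.0 if negated != premise.lower() and negated in context.lower() else 0.0

def retrieval_support(premise, retriever, k=5):
    # Stub plausibility score: fraction of top-k retrieved passages that
    # lexically overlap with the candidate premise.
    hits = retriever(premise)[:k]
    terms = set(premise.lower().split())
    return sum(1 for h in hits if terms & set(h.lower().split())) / max(len(hits), 1)

def validate(premise, context, retriever,
             max_contradiction=0.5, min_support=0.2):
    # A premise passes only if it is both consistent and plausible.
    consistent = nli_contradiction_prob(premise, context) < max_contradiction
    plausible = retrieval_support(premise, retriever) >= min_support
    return consistent and plausible
```

Keeping the two checks as separate scores makes it easy to ablate each filter, which mirrors how the validation module is evaluated later.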
2.10. Answer Generation
3. Experiments
3.1. Datasets
- Robust RAG Benchmarks (Sang, 2025) [2]: Designed to test the robustness of RAG systems under noisy retrieval inputs. This benchmark is especially relevant for premise validation, since abductive inference must handle retrieval imperfections.
- Explainable RAG Evaluation (Sang, 2025) [3]: Focuses on tracing how retrieved passages influence generation. We use this benchmark to evaluate whether abductively generated premises improve explainability and reduce spurious influences from irrelevant passages.
- Context-Aware Dialogue Benchmarks (Liu et al., 2024) [8]: Multi-turn chat tasks where maintaining consistency across turns is crucial. We evaluate whether abductive premises help bridge missing context between utterances.
3.2. Baselines
- Robust-RAG (Sang, 2025) [2]: A retrieval-augmented baseline evaluated under noisy retrieval settings, representing the robustness frontier.
- Explainable-RAG (Sang, 2025) [3]: A framework that traces the influence of retrieved passages on generation, serving as a state-of-the-art faithfulness-oriented baseline.
- Reward-Shaped Multi-hop Reasoning (Li et al., 2024) [4]: Enhances reasoning across knowledge graphs through reinforcement learning with reward shaping, offering strong performance on multi-hop tasks.
- Compressed-Context KG Reasoning (Quach et al., 2024) [5]: Integrates compressed contexts into knowledge graph reasoning, showing gains in efficiency and reasoning accuracy.
- Context-Aware Dialogue Models (Liu et al., 2024) [8]: Models long conversational context explicitly, reducing contradictions in multi-turn interactions.
- Transformer-based Context Modeling (Wu et al., 2025) [7]: A baseline highlighting architectural improvements for contextual understanding in LLMs.
3.3. Evaluation Metrics
- Answer Accuracy: Exact Match (EM) and F1 scores for QA tasks.
- Premise Plausibility: Human evaluation on a 5-point Likert scale assessing whether generated premises are reasonable and non-contradictory.
- Faithfulness: Contradiction rate measured via NLI, i.e., percentage of generated answers contradicting retrieved evidence.
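Exact Match and token-level F1 can be computed as in standard extractive-QA evaluation. The normalization below (lowercasing, stripping punctuation and articles) follows common practice for these metrics and is an implementation choice, not something specified above.

```python
import re
import string
from collections import Counter

def normalize(text):
    # Lowercase, drop punctuation and English articles, collapse whitespace.
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction, gold):
    # 1.0 iff the normalized strings are identical.
    return float(normalize(prediction) == normalize(gold))

def token_f1(prediction, gold):
    # Harmonic mean of token-level precision and recall over the
    # multiset intersection of normalized tokens.
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    common = Counter(pred_tokens) & Counter(gold_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```

For example, `token_f1("the cat sat", "cat sat down")` yields 0.8 (precision 1.0, recall 2/3), while Exact Match would score it 0.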
3.4. Implementation Details
4. Results and Discussion
4.1. Quantitative Results
4.2. Ablation Study
4.3. Case Study
4.4. Discussion
5. Conclusion
References
1. He, B.; Chen, N.; He, X.; Yan, L.; Wei, Z.; Luo, J.; Ling, Z.H. Retrieving, Rethinking and Revising: The Chain-of-Verification Can Improve Retrieval Augmented Generation. In Findings of the Association for Computational Linguistics: EMNLP 2024; Association for Computational Linguistics, 2024; pp. 10371–10393.
2. Sang, Y. Robustness of Fine-Tuned LLMs under Noisy Retrieval Inputs; 2025.
3. Sang, Y. Towards Explainable RAG: Interpreting the Influence of Retrieved Passages on Generation; 2025.
4. Li, C.; Zheng, H.; Sun, Y.; Wang, C.; Yu, L.; Chang, C.; Tian, X.; Liu, B. Enhancing Multi-hop Knowledge Graph Reasoning through Reward Shaping Techniques. In Proceedings of the 2024 4th International Conference on Machine Learning and Intelligent Systems Engineering (MLISE); IEEE, 2024; pp. 1–5.
5. Quach, N.; Wang, Q.; Gao, Z.; Sun, Q.; Guan, B.; Floyd, L. Reinforcement Learning Approach for Integrating Compressed Contexts into Knowledge Graphs. In Proceedings of the 2024 5th International Conference on Computer Vision, Image and Deep Learning (CVIDL); 2024; pp. 862–866.
6. Wang, C.; Yang, Y.; Li, R.; Sun, D.; Cai, R.; Zhang, Y.; Fu, C. Adapting LLMs for Efficient Context Processing through Soft Prompt Compression. In Proceedings of the International Conference on Modeling, Natural Language Processing and Machine Learning; 2024; pp. 91–97.
7. Wu, T.; Wang, Y.; Quach, N. Advancements in Natural Language Processing: Exploring Transformer-Based Architectures for Text Understanding. In Proceedings of the 2025 5th International Conference on Artificial Intelligence and Industrial Technology Applications (AIITA); IEEE, 2025; pp. 1384–1388.
8. Liu, M.; Sui, M.; Nian, Y.; Wang, C.; Zhou, Z. CA-BERT: Leveraging Context Awareness for Enhanced Multi-Turn Chat Interaction. In Proceedings of the 2024 5th International Conference on Big Data & Artificial Intelligence & Software Engineering (ICBASE); IEEE, 2024; pp. 388–392.
9. Gao, Z. Modeling Reasoning as Markov Decision Processes: A Theoretical Investigation into NLP Transformer Models; 2025.
10. Wang, C.; Sui, M.; Sun, D.; Zhang, Z.; Zhou, Y. Theoretical Analysis of Meta Reinforcement Learning: Generalization Bounds and Convergence Guarantees. In Proceedings of the International Conference on Modeling, Natural Language Processing and Machine Learning; 2024; pp. 153–159.
11. Sheng, Y. Evaluating Generalization Capability of Language Models: Abductive, Inductive, and Deductive Reasoning. In Proceedings of COLING 2025; 2025.
12. Qin, Y.; Li, S.; Nian, Y.; Yu, X.V.; Zhao, Y.; Ma, X. Don’t Let It Hallucinate: Premise Verification via Retrieval-Augmented Logical Reasoning. arXiv preprint arXiv:2504.06438, 2025.
13. Lee, Z.; Cao, S.; Liu, J.; Zhang, J.; Liu, W.; Che, X.; Hou, L.; Li, J. ReaRAG: Knowledge-guided Reasoning Enhances Factuality of Large Reasoning Models with Iterative Retrieval Augmented Generation. arXiv preprint arXiv:2503.21729, 2025.
14. Das, D.; O’Nuallain, S.; Rahimi, R. RaDeR: Reasoning-aware Dense Retrieval Models. arXiv preprint arXiv:2505.18405, 2025.


| Model | HotpotQA (F1) | EntailmentBank (EM) | ART (Plausibility, 1–5) |
|---|---|---|---|
| LLM-only | 51.2 | 38.5 | 2.9 |
| RAG | 67.8 | 54.3 | 3.1 |
| FiD | 71.4 | 57.6 | 3.2 |
| HyDE | 72.0 | 59.1 | 3.4 |
| Ours-Abductive RAG | 75.3 | 61.5 | 4.1 |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).