Submitted: 01 December 2025
Posted: 02 December 2025
Abstract
Domain-specific question answering (QA) systems for services face unique challenges in integrating heterogeneous knowledge sources while ensuring both accuracy and safety. Existing large language models often struggle with factual consistency and context alignment in sensitive domains such as healthcare policies and government welfare. In this work, we introduce Knowledge-Aware Reasoning and Memory-Augmented Adaptation (KARMA), a novel framework designed to enhance QA performance in care scenarios. KARMA incorporates a dual-encoder architecture to fuse structured and unstructured knowledge sources, a gated memory unit to dynamically regulate external knowledge integration, and a safety-aware controllable decoder that mitigates unsafe outputs using safety classification and guided generation techniques. Extensive experiments on a proprietary QA dataset show that KARMA outperforms strong baselines in both answer quality and safety, reaching 86.1% accuracy and 85.0% F1 against 71.2% and 69.8% for the baseline LLaMA model. This study offers a comprehensive solution for building trustworthy and adaptive QA systems in service contexts.
Keywords:
1. Introduction
2. Related Work
3. Methodology
4. Algorithm and Model
4.1. Multi-Source Knowledge Fusion
4.2. Gated Memory Unit
4.3. Safety-Aware Controllable Decoder
4.4. Objective Function
4.5. Implementation Details
5. Loss Function Design
5.1. Language Modeling Loss
5.2. Safety Classification Loss
5.3. Contrastive Knowledge Alignment Loss
5.4. Gating Regularization Loss
6. Prompt Design Strategy
- Instructional Prompts: State the task and constraints in plain language, e.g., “You are a digital assistant helping elderly users with government services. Answer clearly and safely.”
- Knowledge Injection Prompts: Prepend retrieved knowledge in a fixed segment: [KNOWLEDGE]: <SEP><SEP>…<SEP> then append the query: [QUESTION]: “How can I apply for a senior transit card?”
- Safety Tokens: Insert control tokens <SAFE> and <REJECT> during decoding. Train the decoder to emit <REJECT> when .
- Multi-Turn Prompts: Use history-aware templates to model dialogue: [USER]: “I want to get a health subsidy.” [SYSTEM]: “You can apply through the local government portal. Would you like help?” [USER]: “Yes, please show me how.” [SYSTEM]: ____ (Generated response)
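The templates above can be assembled with a few lines of string handling. The following is a minimal sketch, not the authors' implementation: the function names `build_knowledge_prompt` and `build_multiturn_prompt` are hypothetical, the placement of `<SEP>` between snippets (rather than after each one) is an assumption, and the knowledge snippets in the usage example are placeholders rather than data from the paper.

```python
from typing import List, Tuple

# Instruction text taken from the instructional-prompt example above.
INSTRUCTION = (
    "You are a digital assistant helping elderly users with government "
    "services. Answer clearly and safely."
)


def build_knowledge_prompt(
    snippets: List[str], question: str, instruction: str = INSTRUCTION
) -> str:
    """Instruction, then retrieved knowledge, then the user query.

    Assumption: snippets are joined with <SEP> between them; the source
    template does not show exactly where the separators sit.
    """
    knowledge = "<SEP>".join(s.strip() for s in snippets)
    return (
        f"{instruction}\n"
        f"[KNOWLEDGE]: {knowledge}\n"
        f"[QUESTION]: {question}"
    )


def build_multiturn_prompt(history: List[Tuple[str, str]]) -> str:
    """Render (role, text) turns and end with an open [SYSTEM] slot."""
    lines = [f"[{role.upper()}]: {text}" for role, text in history]
    lines.append("[SYSTEM]:")  # the model generates the next system turn here
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder snippets, purely illustrative.
    print(build_knowledge_prompt(
        ["placeholder snippet A", "placeholder snippet B"],
        "How can I apply for a senior transit card?",
    ))
    print(build_multiturn_prompt([
        ("user", "I want to get a health subsidy."),
        ("system", "You can apply through the local government portal. "
                   "Would you like help?"),
        ("user", "Yes, please show me how."),
    ]))
```

Keeping the knowledge segment and the question in fixed, tagged positions means the same template can be reused at training and inference time. The `<SAFE>` and `<REJECT>` control tokens are emitted by the decoder and are therefore not part of the prompt builder.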
7. Evaluation Metrics
7.1. Accuracy
7.2. F1 Score
7.3. Rejection Rate
7.4. Knowledge Relevance Score
8. Experiment Results
9. Conclusions




| Model | Accuracy | F1 Score | Rejection Rate (RR) | Knowledge Relevance Score (KRS) |
|---|---|---|---|---|
| Baseline LLaMA | 71.2% | 69.8% | 2.1% | 0.642 |
| + MKF | 82.9% | 81.5% | 4.3% | 0.819 |
| + MKF + GMU | 84.7% | 83.2% | 6.0% | 0.845 |
| KARMA (Full) | 86.1% | 85.0% | 12.4% | 0.882 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).