Submitted: 25 August 2025
Posted: 27 August 2025
Abstract

Keywords:
1. Introduction
2. Understanding Hallucinations
2.1. Definition of Hallucinations
2.2. Categories of Hallucinations
- Intrinsic hallucinations (factuality errors) occur when a model generates content that contradicts established facts, its training data, or referenced input [6,12,13,62,137,286,315]. Following the taxonomic names in [292], the subtypes of this category may include (but are not limited to):
- Entity-error hallucinations, where the model generates non-existent entities or misrepresents their relationships (e.g., inventing fake individuals, non-existent biographical details [87] or non-existent research papers), often measured via entity-level consistency metrics [98], as shown in [13,28,208,286].
- Extrinsic hallucinations (faithfulness errors) appear when the generated content deviates from the provided input or user prompt. These hallucinations are generally characterized by output that cannot be verified against the source: it may or may not be true, but in either case it is not directly deducible from the user prompt or it contradicts itself [12,13,62,219,279,292]. Extrinsic hallucinations may manifest as:
- Emergent hallucinations, defined as those arising unpredictably in larger models due to scaling effects [92]. These can be attributed to cross-domain reasoning and modality fusion, especially in multi-modal settings or Chain of Thought (CoT) prompting scenarios [13,92,123], multi-step inference errors [147], and abstraction or alignment issues, as shown in [28,62] and [123]. For instance, self-reflection demonstrates mitigation capabilities, effectively reducing hallucinations only in models above a certain size threshold (e.g., 70B parameters), while paradoxically increasing errors in smaller models due to limited self-diagnostic capacity [292].
2.3. Underlying Causes of Hallucinations
3. Related Works
4. Review Methodology, Proposed Taxonomy, Contributions and Limitations
4.1. Review Methodology
- Literature Retrieval: We systematically collected research papers from major electronic archives—including Google Scholar, ACM Digital Library, IEEE Xplore, Elsevier, Springer, and ArXiv—with a cutoff date of August 12, 2025. Eligible records were restricted to peer-reviewed journal articles, conference papers, preprints under peer review, and technical reports, while non-academic sources such as blogs or opinion pieces were excluded. A structured query was used, combining keywords: ("mitigation" AND "hallucination" AND "large language models") OR "evaluation". In addition, we examined bibliographies of retrieved works to identify further relevant publications.
- Screening: The screening process followed a two-stage approach. First, titles and abstracts were screened for topical relevance. Records passing this stage underwent a full-text review to assess eligibility. Out of 412 initially retrieved records, 83 were excluded as irrelevant at the screening stage. The 329 eligible papers were then examined in detail and further categorized into support studies, literature reviews, datasets/benchmarks, and works directly proposing hallucination detection or mitigation methods. The final set of 221 studies formed the basis of our taxonomy. This process is summarized in the PRISMA-style diagram below.

- Paper-level tagging, where every study was assigned one or more tags corresponding to its employed mitigation strategies. Our review accounts for papers that propose multiple methodologies by assigning them multiple tags, ensuring a comprehensive representation of each paper’s contributions.
- Thematic clustering, where we consolidated those tags into six broad categories, presented in detail in Section 4.2. This enabled us to generate informative visualizations that reflect the prevalence and trends among different mitigation techniques.
- Content-specific retrieval: To gain deeper insight into mitigation strategies, we developed a custom Retrieval-Augmented Generation (RAG) system based on the Mistral language model as an additional research tool, which enabled us to extract content-specific passages directly from the research papers.
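To make the content-specific retrieval step concrete, the following is a minimal sketch of the kind of passage retrieval described above, under simplifying assumptions: TF-IDF similarity stands in for the actual retrieval index, and `query_mistral` is a hypothetical placeholder for the Mistral-based generator rather than a real API call.

```python
# Minimal sketch of a retrieval-augmented lookup over collected papers.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def retrieve_passages(query: str, passages: list[str], top_k: int = 3) -> list[str]:
    """Return the top-k passages most similar to the query (TF-IDF stand-in)."""
    vectorizer = TfidfVectorizer(stop_words="english")
    doc_matrix = vectorizer.fit_transform(passages)   # index the paper passages
    query_vec = vectorizer.transform([query])
    scores = cosine_similarity(query_vec, doc_matrix).ravel()
    return [passages[i] for i in scores.argsort()[::-1][:top_k]]

def query_mistral(prompt: str) -> str:
    """Placeholder for the Mistral-based generator used in the actual tool."""
    raise NotImplementedError("plug in a local or hosted Mistral model here")

def answer_with_context(query: str, passages: list[str]) -> str:
    """Ground the answer in the retrieved passages only."""
    context = "\n\n".join(retrieve_passages(query, passages))
    prompt = f"Answer using only the excerpts below.\n\n{context}\n\nQuestion: {query}"
    return query_mistral(prompt)
```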
4.2. Proposed Taxonomy and Review Organization
- Training and Learning Approaches (5.1): Encompasses diverse methodologies employed to train and refine AI models, shaping their capabilities and performance (e.g., Supervised Learning, Reinforcement Learning, Knowledge Distillation).
- Architectural Modifications (5.2): Covers structural changes and enhancements made to AI models and their inference processes to improve performance, efficiency, and generation quality (e.g., Attention Mechanisms, Decoding Strategies, Retrieval Augmented Generation).
- Input/Prompt Optimization (5.3): Focuses on strategies for crafting and refining the text provided to AI models to steer their behavior and output, often specifically to mitigate hallucinations (e.g., Prompt Engineering, Context Optimization).
- Post-Generation Quality Control (5.4): Encompasses essential post-generation checks applied to text outputs, aiming to identify or correct inaccuracies (e.g., Self-verification, External Fact-checking, Uncertainty Estimation).
- Interpretability and Diagnostic Approaches (5.5): Encompasses methods that help researchers understand why and where a model may be hallucinating (e.g., Internal State Probing, Attribution-based diagnostics).
- Agent-based Orchestration (5.6): Includes frameworks comprising single or multiple LLMs within multi-step loops, enabling iterative reasoning, tool usage, and dynamic retrieval (e.g., Reflexive Agents, Multi-agent Architectures).
4.3. Contributions and Key Findings
5. Methods for Mitigating Hallucinations
5.1. Training and Learning Approaches
5.1.1. Supervised and Semi-Supervised Learning
- Fine-Tuning with factuality objectives, where techniques such as FactPEGASUS make use of ranked factual summaries for factuality-aware fine-tuning [105] while FAVA generates synthetic training data using a pipeline involving error insertion and post-processing to detect and correct fine-grained hallucinations [112]. Similarly, Faithful Finetuning employs weighted cross-entropy and fact-grounded QA losses to enhance faithfulness and minimize hallucinations [209]. Principle Engraving fine-tunes the base LLaMA model on self-aligned responses that adhere to specific principles [230], while [292] explores how the combination of supervised fine-tuning and RLHF impacts hallucinations. Wasserstein Generative Adversarial Networks (GANs) provide the conceptual basis for [17] which introduces Adversarial Feature Hallucination Networks (AFHN). AFHN synthesizes fake features for new classes by using labeled samples as conditional context. The framework uses a classification regularizer for feature discriminability and an anti-collapse regularizer that boosts the diversity of the synthesized features.
- Synthetic Data & Weak Supervision, where studies automatically generated hallucinated data or weak labels for training. For instance, in [68] hallucinated tags are prepended to the model inputs so that it can learn from annotated examples to control hallucination levels while [81] uses BART and cross-lingual models with synthetic hallucinated datasets for token-level hallucination detection. Similarly, Petite Unsupervised Research and Revision (PURR) involves fine-tuning a compact model on synthetic data comprised of corrupted claims and their denoised versions [235] while TrueTeacher uses labels generated by a teacher LLM to train a student model on factual consistency [311].
- Preference-Based Optimization and Alignment: In [114] a two-stage framework first combines supervised fine-tuning using curated legal QA data and Hard Sample-aware Iterative Direct Preference Optimization (HIPO) to ensure factuality by leveraging signals based on human preferences while in [270] a lightweight classifier is finetuned on contrastive pairs (hallucinated vs. non-hallucinated outputs). Similarly, mFACT—a metric for factual consistency—is derived from training classifiers in different target languages [79], while Contrastive Preference Optimization (CPO) combines a standard negative-log likelihood loss with a contrastive loss to finetune a model on a dataset consisting of triplets (source, hallucinated translation, corrected translation) [206]. UPRISE employs a retriever model that is trained using signals from an LLM to select optimal prompts for zero-shot tasks, allowing the retriever to directly internalize alignment signals from the LLM [322]. Finally, behavioral tuning uses label data (dialogue history, knowledge sources, and corresponding responses) to improve alignment [84].
- Knowledge-Enhanced Adaptation: Techniques like HALO inject Wikidata entity triplets or summaries via fine-tuning [140], while Joint Entity and Summary Generation employs a pre-trained Longformer model, finetuned on the PubMed dataset, to mitigate hallucinations through supervised adaptation and data filtering [134]. The impact of injecting new knowledge into LLMs via supervised finetuning, and the associated risk of hallucinations, is also studied in [89].
- Hallucination Detection Classifiers: [142] involves fine-tuning a LLaMA-2-7B model to classify hallucination-prone queries using labeled data while in [129] a sample selection strategy improves the efficiency of supervised fine-tuning by reducing annotation costs while preserving factuality through supervision.
- Training of factuality classifiers: Supervised finetuning is used to train models on labeled text data in datasets such as HDMBENCH, TruthfulQA, and multilingual datasets demonstrating improvements in task-specific performance and factual alignment [33,138,211]. Additionally, training enables classifiers to detect properties such as honesty and lies within intermediate representations resulting in increased accuracy and separability of these concepts as shown in [148,256,296].
- Synthetic data creation: In the Fine-Grained Process Reward Model (FG-PRM), various hallucination types are injected into correct solutions of reasoning steps. The synthetic dataset thus created is used to train six Process Reward Models, each able to detect and mitigate a specific hallucination type [111]. RAGTruth, in turn, includes human-annotated labels indicating whether generated responses are grounded in retrieved content, which enables supervised training and evaluation of hallucination detection models [241]. Similarly, [97] addresses over-reliance on parametric knowledge by introducing an entity-based substitution framework that generates conflicting QA instances by replacing named entities in the context and answer.
- Refining pipelines: Supervised training is used to train a critic model using the base LLM’s training data and synthetic negatives [71]. TOPICPREFIX is an augmentation technique that prepends topic entities from Wikipedia to improve contextual grounding and factuality [108], while training the Hypothesis Verification Model (HVM) on the FATE dataset helps the model recognize faithful and unfaithful text [302]; similar approaches discern between truthful and untruthful representations [313,315,316]. Self-training is used to train models on synthetic data with superior results compared to crowdsourced data [327], and finally, in WizardLM a LLaMA model is finetuned on generated instructions, resulting in better generalization [328].
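Several of the approaches above attach factuality-aware objectives to standard fine-tuning (e.g., the weighted cross-entropy of Faithful Finetuning [209]). The sketch below shows one such objective in PyTorch; the up-weighting scheme and the `fact_mask` annotation are illustrative assumptions, not the exact loss of any cited paper.

```python
import torch
import torch.nn.functional as F

def weighted_token_ce(logits: torch.Tensor,
                      labels: torch.Tensor,
                      fact_mask: torch.Tensor,
                      fact_weight: float = 2.0) -> torch.Tensor:
    """Cross-entropy that up-weights tokens marked as fact-bearing.

    logits:    (batch, seq, vocab) model outputs
    labels:    (batch, seq) target token ids
    fact_mask: (batch, seq) 1.0 where the token belongs to a factual span
    """
    per_token = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        labels.reshape(-1),
        reduction="none",
    ).reshape(labels.shape)
    weights = 1.0 + (fact_weight - 1.0) * fact_mask
    return (weights * per_token).sum() / weights.sum()

# Toy example with random tensors standing in for real model outputs and targets.
logits = torch.randn(2, 8, 100)
labels = torch.randint(0, 100, (2, 8))
fact_mask = torch.zeros(2, 8)
fact_mask[:, 3:6] = 1.0   # pretend tokens 3-5 carry the factual claim
loss = weighted_token_ce(logits, labels, fact_mask)
```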
5.1.2. Reinforcement Learning
5.1.3. Contrastive Learning
5.1.4. Knowledge Distillation
5.1.5. Instruction Tuning
- Factual alignment, such as in [114], where LLMs are finetuned using a dataset of legal instructions and responses, explicitly grounding outputs in legal knowledge, thus aligning their behavior to domain-specific prompts and reducing factual hallucinations. Curriculum-based Contrastive Learning-based Cross-lingual Chain-of-Thought (CCL-XCoT) combines curriculum-based cross-lingual contrastive learning with instruction fine-tuning to transfer factual knowledge from high-resource to low-resource languages. Furthermore, its Cross-lingual Chain of Thought (XCoT) strategy guides the model to reason in a high-resource language and then generate in the target language, thereby reducing context-related hallucinations [50].
- Consistency alignment, which is achieved in [153] during a two-stage supervised fine-tuning process: The first step uses instruction–response pairs while in the second step, pairs of semantically similar instructions, which are implemented via contrastive-style learning, are used to enforce aligned responses across instructions.
- Data-centric grounding, where Self-Instruct introduces a scalable, semi-automated method for generating diverse data without human annotation [271]. It begins with a small set of instructions and uses a pre-trained LLM to generate new tasks and corresponding input-output examples, which are then filtered and used to fine-tune the model, thus generating more aligned and grounded outputs.
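As a concrete illustration of the Self-Instruct bootstrap described above [271], the sketch below runs one generation-and-filtering round. The `generate` callable and the crude token-overlap filter are simplified stand-ins for the LLM and for the similarity-based filtering used in the original method.

```python
import random

def overlap(a: str, b: str) -> float:
    """Crude token-overlap proxy for the similarity filter used to keep instructions diverse."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / max(1, len(ta | tb))

def self_instruct_round(seed_instructions: list[str],
                        generate,
                        n_new: int = 8,
                        max_overlap: float = 0.7) -> list[str]:
    """One bootstrap round: prompt the model with sampled seeds, keep only novel instructions."""
    pool = list(seed_instructions)
    demos = "\n".join(f"- {s}" for s in random.sample(pool, min(4, len(pool))))
    prompt = (f"Here are some task instructions:\n{demos}\n"
              f"Write {n_new} new, different task instructions, one per line.")
    candidates = generate(prompt).splitlines()     # `generate` is a stand-in for the LLM
    for cand in (c.strip("- ").strip() for c in candidates):
        if cand and all(overlap(cand, existing) < max_overlap for existing in pool):
            pool.append(cand)                       # novel enough: add to the instruction pool
    return pool
```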
5.2. Architectural Modifications
5.2.1. Attention Mechanisms
5.2.2. Decoding Strategies
- Probabilistic Refinement & Confidence-Based Adjustments, where techniques such as Context-aware decoding adjust token selection by prioritizing information aligned with relevant context so as to emphasize contextual information over its internal prior knowledge. This is achieved by amplifying the difference between output probabilities with and without the context, effectively downplaying prior knowledge when more relevant contextual details are available [66], [312]. Another direction involves entropy-based decoding adjustments, where the model’s cross-layer entropy or confidence values are used to penalize hallucination-prone outputs [150] and Conditional Pointwise Mutual Information (CPMI) which adjusts the score of the conditional entropy of the next-token distribution so as to prioritize tokens more aligned with the source [214]. [256] uses logits and their probabilities to refine the standard decoding process by interpreting and manipulating the outputs during generation, while Confident Decoding integrates predictive uncertainty into a beam search variant to reduce hallucinations, with epistemic uncertainty guiding token selection towards greater faithfulness [284]. Similarly, [220] modifies the beam search decoding process to prioritize outputs with lower uncertainty. This method leverages the connection between hallucinations and predictive uncertainty, demonstrating that higher predictive uncertainty correlates with a greater chance of hallucinations [220]. Selective abstention Learning (SEAL) introduces a selective abstention learning framework where an LLM is trained to output a special [REJ] token when predictions conflict with its parametric knowledge. During inference, its abstention-aware decoding leverages the [REJ] probability to penalize uncertain trajectories, thereby guiding generation toward more factual outputs [24]. Finally, Factual-nucleus sampling extends the concept of nucleus sampling [291] by adjusting the sampling randomness during decoding based on the sentence position, thereby significantly reducing factual errors [108].
- Beyond probabilistic refinements, some decoding strategies are inspired by contrastive learning to explicitly counter hallucinations. For instance, Decoding by Contrasting Retrieval Heads (DeCoRe) induces hallucinations through masking specific retrieval heads responsible for extracting contextual information and dynamically contrasting the outputs of the original base LLM and its hallucination-prone counterpart [72]. Delta mitigates hallucinations by randomly masking spans of the input prompt and then contrasting the output distributions generated for both the original and the masked prompts, thus effectively reducing the generation of hallucinated content [76]. Contrastive Decoding is an alternative to search-based decoding methods like nucleus or top-k sampling, which optimizes the difference between the log-likelihoods of an LLM and a SLM by introducing a plausibility constraint that filters out low-probability tokens [64]. Similarly, the Self-Highlighted Hesitation mechanism (SH2) uses contrastive decoding to manipulate the decision-making process at the token level by appending low-confidence tokens to the original context, thus leading the decoder to hesitate before generation [277]. Spectral Editing of Activations (SEA) projects token representations into directions of maximum information, thus amplifying signals that correlate with factuality while suppressing those linked to hallucinated outputs [283]. Similarly, Induce-then-contrast (ICD) constructs a "factually weak LLM" by inducing hallucinations from the original LLM, via fine-tuning with non-factual samples. These induced hallucinations are subsequently leveraged as a penalty term, thus effectively downplaying untruthful predictions [26]. In Active Layer Contrastive Decoding (ActLCD), a reward-driven classifier uses a reinforcement learning policy to determine when to apply contrastive decoding between selected layers, effectively framing decoding as a Markov decision process [15]. Finally, Self-contrastive Decoding (SCD) reduces the influence of tokens which are over-represented in the model’s training data to directly affect the selection of tokens during text generation, thus reducing knowledge overshadowing [165].
- Verification & Critic-Guided Mechanisms: A number of decoding strategies work in tandem with verification and critic-guided mechanisms to further improve the generation capabilities of the decoder. Critic-driven Decoding combines the probabilistic output of an LLM with a "text critic" classifier which assesses the generated text and steers the decoding process away from the generation of hallucinations [71]. Self-consistency samples from a diverse set of reasoning paths and selects the most consistent answer, leveraging the idea that correct reasoning tends to converge on the same answer. Furthermore, the consistency among the sampled paths can serve as an uncertainty estimate which helps to identify and mitigate potential hallucinations [268]. In Think While Effectively Articulating Knowledge (TWEAK) the generated sequences at each decoding step, along with their potential future sequences, are treated as hypotheses, which are subsequently reranked by an NLI model or a Hypothesis Verification Model (HVM) according to the extent to which they are supported by the input facts [302]. In a similar line of research, mFACT integrates a novel faithfulness metric directly into the decoding process, thereby evaluating each candidate summary regarding its factual consistency with the source document. Candidates that fall below a predetermined mFACT threshold are then pruned, effectively guiding the generation towards more factually accurate outputs [79]. Finally, Reducing Hallucination in Open-domain Dialogues (RHO) generates a set of candidate responses using beam search and re-ranks them based on their factual consistency, which is determined by analyzing various trajectories over knowledge graph sub-graphs extracted from an external knowledge base [260].
- Internal Representation Intervention & Layer Analysis: Understanding how LLMs encode information about possible replies to a query, particularly within their early internal states, is useful for developing decoding strategies that mitigate hallucinations [290]. Intermediate outputs which are prone to hallucinations often exhibit diffuse activation patterns, where activations are spread across multiple competing concepts rather than being focused on relevant references. In-context sharpness metrics, proposed in [154], leverage this observation by enforcing sharper token activations to ensure that predictions are derived from high-confidence knowledge areas. Similarly, Inference-time Intervention (ITI) involves shifting activations during inference and applies these adjustments iteratively until the full response is generated [155], while DoLa leverages differences in logits from earlier vs. later layers, promoting the factual knowledge encoded in higher layers as opposed to syntactically plausible but less factual contributions from lower layers [90]. Activation Decoding is another constrained decoding method that directly adjusts token probabilities based on entropy-derived activations without having to retrain the model [154]. Finally, LayerSkip uses self-speculative decoding and trains models with layer dropout and early exit loss, enabling tokens to be predicted from earlier layers and verified by later layers, thus accelerating inference while mitigating hallucinations [174].
- RAG-based Decoding: RAG-based decoding strategies integrate external knowledge to enhance factual consistency and mitigate hallucinations [255,258]. For instance, REPLUG prepends a different retrieved document for every forward pass of the LLM and averages the probabilities from these individual passes, thus allowing the model to produce more accurate outputs by synthesizing information from multiple relevant contexts simultaneously [255]. Similarly, Retrieval in Decoder (RID) dynamically adjusts the decoding process based on the outcomes of the retrieval, allowing the model to adapt its generation based on the confidence and relevance of the retrieved information [258].
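The first two groups of methods above adjust next-token scores by contrasting a context-conditioned distribution against a weaker, context-free, or deliberately hallucination-prone one. The sketch below illustrates that family of adjustments on raw logits; the amplification factor and plausibility threshold are illustrative parameters, not the settings of any specific paper.

```python
import numpy as np

def contrastive_next_token(logits_expert: np.ndarray,
                           logits_amateur: np.ndarray,
                           alpha: float = 0.5,
                           plausibility: float = 0.1) -> int:
    """Pick the next token by contrasting two next-token distributions.

    logits_expert:  logits from the full model conditioned on the grounding context
    logits_amateur: logits from the weaker / context-free / hallucination-prone pass
    """
    log_p_expert = logits_expert - np.logaddexp.reduce(logits_expert)
    log_p_amateur = logits_amateur - np.logaddexp.reduce(logits_amateur)

    # Plausibility constraint: only consider tokens the expert itself finds likely.
    keep = np.exp(log_p_expert) >= plausibility * np.exp(log_p_expert).max()

    # Contrastive score: amplify what the context-conditioned model knows beyond the weak pass.
    score = (1 + alpha) * log_p_expert - alpha * log_p_amateur
    score[~keep] = -np.inf
    return int(score.argmax())

# Toy example over a 5-token vocabulary.
expert = np.array([2.0, 1.5, 0.2, -1.0, -3.0])
amateur = np.array([2.5, 0.0, 0.1, -1.0, -3.0])   # over-confident on token 0 without context
print(contrastive_next_token(expert, amateur))    # selects token 1, which the context supports
```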
5.2.3. Retrieval-Augmented Generation
5.2.4. Knowledge Representation Approaches
5.2.5. Specialized Architectural Mechanisms for Enhanced Generation
5.3. Input / Prompt Optimization
5.3.1. Prompt Engineering
- In dataset creation and evaluation, prompt engineering has been used to generate and filter references used for inference and evaluation [37,51,142,143], systematically induce, detect, or elicit imitative falsehoods [26,100,112,126,132,148,179,315], and even create specific types of code hallucinations to test research methodologies [60].
- For confidence assessment and behavioral guidance, it has been used to elicit verbalized confidence, consistency, or uncertainty, and test or guide model behavior and alignment [48,84,86,87,144,145,159,230,232,235,267,277,290,313], reduce corpus-based social biases [270], extract and verify claims [251] as well as investigate failure cascades like hallucination snowballing [147].
- In knowledge integration scenarios it has been combined with retrieval modules or factual constraints [190,241,259,282], in agentic environments where prompts guide the generation of states, actions, and transitions [245], or the alignment process between queries and external knowledge bases [297], and even in the training process of a model where they are used to inject entity summaries and triplets [140]. Additionally, prompts have also been explored as explicit, language-based feedback signals in reinforcement learning settings, where natural language instructions are parsed and used to fine-tune policy decisions during training [305].
- scalability issues which arise from the number of intermediate tasks or their complexity [299],
- context dilution which demonstrates that prompts often fail when irrelevant context is retrieved, especially in RAG scenarios [190],
- lack of standardized prompting workflows which makes prompt engineering a significant trial and error task not only for end-users but also for NLP experts [326], hindering reliable mitigation, and
5.3.2. Structured or Iterative Reasoning Prompting
- Structured reasoning prompts modify the model’s behavior in a single forward pass: the model follows the request to enumerate steps in one shot, as there is typically no separate module that determines how and when to make these steps or whether to call external tools.
- Iterative reasoning, on the other hand, can further improve the generative capabilities of a model, by guiding it to decompose a task into a series of steps, each of which builds upon, refines, and supports previous steps.
- exploit the dialog capabilities of LLMs to detect logical inconsistencies by integrating deductive formal methods [75].
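A minimal sketch of the iterative variant described above is given below: each step is generated, optionally checked, and appended to the context used for the next step. The `ask_llm` and `check` callables are hypothetical stand-ins for the generator and for whatever verification signal is available.

```python
def iterative_reasoning(question: str, ask_llm, check, max_steps: int = 5) -> str:
    """Decompose a task into steps, letting each step build on accepted prior steps."""
    steps: list[str] = []
    for i in range(max_steps):
        context = "\n".join(f"Step {j + 1}: {s}" for j, s in enumerate(steps))
        prompt = (f"Question: {question}\n{context}\n"
                  f"Give step {i + 1} of the solution, or 'FINAL:' followed by the answer.")
        step = ask_llm(prompt).strip()
        if step.startswith("FINAL:"):
            return step.removeprefix("FINAL:").strip()
        if check(step):               # keep only steps that pass the external check
            steps.append(step)
    # Fall back to asking for the final answer from the accumulated steps.
    return ask_llm(f"Question: {question}\n" + "\n".join(steps) + "\nGive the final answer.")
```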
5.3.3. In-Context Prompting
- Pattern Reinforcement: By seeing multiple demonstrations, the model better aligns its response style and factual consistency with provided examples. For instance, Principle-Driven Self-Alignment provides 5 in-context exemplars alongside 16 human-written principles that guide the model by providing clear patterns for how it should comply with these principles, thus aligning its behavior and internal thoughts with the desired behavior demonstrated in the examples [230].
- Bias Reduction: Balanced example selection can minimize systematic biases, particularly in ambiguous queries [44,106] while few-shot examples have been used to calibrate GPT-3’s responses, demonstrating how different sets of balanced vs. biased prompts significantly influence downstream performance [232].
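As a small illustration of balanced example selection, the sketch below interleaves an equal number of exemplars per label before appending the query, so that no label dominates the context; the sentiment-style formatting is only an example domain, not taken from the cited studies.

```python
from collections import defaultdict
from itertools import chain, zip_longest

def balanced_few_shot(examples: list[tuple[str, str]], query: str, per_label: int = 2) -> str:
    """Build a few-shot prompt with the same number of exemplars per label, interleaved."""
    by_label = defaultdict(list)
    for text, label in examples:
        if len(by_label[label]) < per_label:
            by_label[label].append((text, label))
    interleaved = [d for d in chain(*zip_longest(*by_label.values())) if d is not None]
    demos = "\n".join(f"Review: {t}\nSentiment: {l}" for t, l in interleaved)
    return f"{demos}\nReview: {query}\nSentiment:"

examples = [("Great battery life.", "positive"), ("Arrived broken.", "negative"),
            ("Love the screen.", "positive"), ("Support never replied.", "negative")]
print(balanced_few_shot(examples, "The keyboard feels cheap."))
```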
5.3.4. Context Optimization
5.3.5. System Prompt Design
5.4. Post-Generation Quality Control
- Self-verification and Consistency Checking: Involves internal assessments of output quality, ensuring logical flow, and maintaining factual coherence within the generated content.
- External Fact-checking and Source Attribution: Validates information against outside authoritative sources or asks the model to explicitly name its sources.
- Reliability Quantification: a broader subcategory that encompasses Uncertainty Estimation (quantifying the likelihood of claims) and Confidence Scoring (assigning an overall reliability score to the output).
- Output Refinement: Involves further shaping and iteratively polishing the generated text.
- Response Validation: Strictly focuses on confirming that the output meets specific, pre-defined criteria and constraints.
5.4.1. Self-Verification and Consistency Checking
5.4.2. External Fact-Checking
5.4.3. Uncertainty Estimation & Confidence Scoring
Uncertainty Estimation
- Entropy-based approaches: The Real-time Hallucination Detection method (RHD) in [203] leverages entropy to detect output entities with low probability and high entropy, which are likely to be potential hallucinations. When RHD determines that the model is likely to generate unreliable text, it triggers a self-correction mechanism [203]. A similar mechanism, Conditional Pointwise Mutual Information (CPMI) [214], quantifies model uncertainty via token-level conditional entropy. Specifically, the method identifies hallucinated token generation as corresponding to high-entropy states, where the model is most uncertain, thus confirming that uncertainty is a reliable signal [214]. INSIDE leverages the model’s internal states to directly measure uncertainty with the EigenScore metric, which represents the differential entropy in the sentence embedding space, while a feature clipping method mitigates overconfident generations [156]. Uncertainty in [165] is measured with Pointwise Mutual Information (PMI), which quantifies overshadowing likelihood by identifying low-confidence conditions that are likely to be overshadowed. In this case, shifts in model confidence under controlled perturbations are measured using the probability difference between p(y|x) and p(y|x′), which essentially serves as a method of uncertainty-based detection [165]. The concept of entropy is extended in a number of research papers to encompass semantics. For instance, in [82], the authors measure the model’s uncertainty over the meaning of its answers rather than just variations in specific words by using "semantic entropy," an entropy-based uncertainty estimator which involves generating multiple answers, clustering them by semantic meaning, and then computing the entropy of these clusters to quantify the model’s uncertainty [82]. Similarly, in [276] "semantic entropy probes (SEPs)" gauge the model’s uncertainty from the hidden states of a single generation, which, according to the authors, is a more efficient approach than sampling multiple responses, log probabilities or naive entropy [276]. In [249], an Epistemic Neural Network (ENN) extracts hidden-layer features from LLaMA-2, feeding them into a small MLP trained on next-token prediction, whose outputs are combined with DoLa contrastive decoding logits. This hybrid approach improves uncertainty estimation by allowing the model to down-weight low-confidence generations [249]. Finally, in [314], the conventional wisdom that hallucinations are typically associated with low confidence is challenged with the introduction of Certain Hallucinations Overriding Known Evidence (CHOKE). Specifically, the researchers use and evaluate three uncertainty metrics (semantic entropy, token probability, top-2 token probability gap), showing that hallucinations can occur with high certainty, even when the model “knows” the correct answer [314].
- Sampling: Sampling is a classic method for uncertainty estimation, as exemplified in [113], where a model’s outputs are sampled multiple times and the variability across samples is used as a proxy for its uncertainty. Rather than a simple single-token probability, the resulting confidence score is a numerical value derived from the resampling process, and it acts as a signal for verification and as a reward in the reinforcement learning stage [113]. Similarly, sampling is used in [274], where increasing divergence between sampled responses is an indicator of hallucinated content and uncertainty, which the authors measure with methods such as BERTScore and NLI [274]. Finally, the framework presented in [48] leverages three main components: sampling strategies to generate multiple responses, different prompting techniques to elicit the model’s uncertainty, and aggregation techniques that combine these responses and their associated confidence scores to produce a final, calibrated confidence score [48].
- Monte Carlo methods: Fundamental uncertainty measures such as sequence log-probability and Monte-Carlo dropout dissimilarity are leveraged as key metrics to detect hallucinations and inform subsequent stages, such as refinement, detection, and re-ranking of the generated text, capturing the variability and confidence in its predictions [173]. In [245], reward estimation is based on the log probability of actions, effectively capturing how "confident" a model is about specific reasoning steps. While the authors do not explicitly term this as "uncertainty estimation," we believe that their approach overlaps significantly because the reward function evaluates the plausibility of reasoning steps. Specifically, the Monte Carlo Tree Search (MCTS) uses these rewards to guide exploration, prioritizing reasoning paths with higher estimated rewards as the reward mechanism reflects the degree of trust in the reasoning trace generated by the LLM [245].
- Explicit Verbalization: The core method in [239] trains models to explicitly verbalize epistemic uncertainty by identifying "uncertain" vs. "certain" data. It uses supervised (prediction-ground truth mismatch) and unsupervised (entropy over generations) methods and subsequently evaluates model performance with Expected Calibration Error (ECE) and Average Precision (AP), enhancing models’ ability to express self-doubt about their knowledge. In a similar vein, SelfAware introduces a benchmark and method to assess a model’s self-knowledge by detecting when it should verbalize uncertainty in response to unanswerable questions [88]. Similarly, [288] fine-tunes GPT-3 to produce what the authors call "verbalized probability", which is essentially a direct expression of the model’s epistemic uncertainty. This teaches the model to be self-aware of its uncertainty, which is a "higher-order objective" that goes beyond the typical raw softmax-based confidence scoring. While we categorized [288] under Uncertainty Estimation, we do acknowledge that confidence scoring is a crucial component since it measures calibration of these scores using metrics such as mean square error (MSE) and mean absolute deviation (MAD). However, these scores are a direct outcome of the new verbalized probability method and not derived from existing logits [288].
- Semantic analysis: In [275], the authors propose semantic density to quantify uncertainty which measures similarity between a response and multiple completions in embedding space. Semantic density operates on a response-wise and not prompt-wise manner and doesn’t require model retraining, addressing key limitations of earlier uncertainty quantification methods (like semantic entropy or P(True)). While the authors consistently use the term “confidence” as a counterpart to “uncertainty”, and the final semantic density score is indeed a confidence indicator, yielding numerical scores in the range [0, 1] with provision for thresholding or filtering, we believe that this scoring is treated as the outcome of the proposed uncertainty metric [275]. Semantic analysis and logit-derived, token-level probabilistic measures are combined in [282] to calculate a confidence score for each atomic unit of an answer. This confidence is then integrated with textual entailment probabilities to produce a refined score to identify hallucinated spans. Although the authors use confidence scores that support thresholding, and consistently use the term "confidence," these confidence scores are used as part of a larger framework to detect model-generated hallucinations and thus we believe that confidence scoring is a means to an end, and that end is uncertainty estimation in [282].
- Training approaches: [280] explicitly links hard labels to overconfidence and proposes soft labels as a means of introducing uncertainty-aware supervision. This aligns with the theme of uncertainty estimation because the training objective is restructured to reflect model confidence calibration. The authors also evaluate overconfidence by plotting the NLL of incorrect answers, and argue that fine-tuning with soft labels reduces misplaced certainty, one of the major causes of hallucination [280]. The core contribution of [281] is a method to mitigate hallucinations by using "smoothed soft labels" as opposed to traditional hard labels that encourage overconfidence and disregard the inherent uncertainty in natural language. By introducing "uncertainty-aware supervision" through knowledge distillation, the student model learns from a more calibrated probability distribution. This approach aligns with the principle of maximum entropy and is designed to make models less overconfident and more reliable by improving factual grounding, again restructuring the training objective to reflect confidence calibration and reduce misplaced certainty [281].
- Composite methods: In [214], the authors utilize epistemic uncertainty to inform a modified beam search algorithm which prioritizes outputs with lower uncertainty, thus leading the model to reduce the generation of incorrect or nonexistent facts. In [96], a reference-free, uncertainty-based method uses a proxy model to calculate token and sentence-level hallucination scores based on uncertainty metrics. These metrics are then enhanced by focusing on keywords, propagating uncertainty through attention weights, and correcting token probabilities based on entity type and frequency to address over-confidence and under-confidence issues. Finally, in [4] the authors use the attention mechanism as a self-knowledge probe. Specifically, they design an uncertainty estimation head, which is essentially a lightweight attention head that relies on attention-derived features such as token-to-token attention maps and lookback ratios, serving as indicators of hallucination likelihood.
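To make the sampling- and entropy-based ideas above concrete, the sketch below computes a semantic-entropy-style score over sampled answers. Exact string normalization is a deliberately crude stand-in for the entailment-based clustering used in [82]; in practice an NLI model or embedding similarity would group the answers.

```python
import math
from collections import Counter

def normalize(ans: str) -> str:
    """Crude stand-in for semantic clustering: group answers by normalized surface form."""
    return " ".join(ans.lower().strip(" .").split())

def semantic_entropy(sampled_answers: list[str]) -> float:
    """Entropy over clusters of (approximately) equivalent sampled answers.
    High entropy means the model keeps changing its mind, a signal of likely hallucination."""
    clusters = Counter(normalize(a) for a in sampled_answers)
    total = sum(clusters.values())
    return -sum((c / total) * math.log(c / total) for c in clusters.values())

confident = ["Paris.", "paris", "Paris"]
uncertain = ["Paris.", "Lyon.", "Marseille."]
print(semantic_entropy(confident))   # 0.0  -> low uncertainty
print(semantic_entropy(uncertain))   # ~1.1 -> high uncertainty, flag for review
```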
Confidence Scoring
5.4.4. Output Refinement
- RAG-based methods and Web searches, where external sources like documents or the web are directly used to retrieve information for output refinement. For instance, the Corrective Retrieval-Augmented Generation (CRAG) framework employs a lightweight retrieval evaluator and a decompose-then-recompose algorithm to assess the relevance of retrieved documents to a given query [69]. Similarly, EVER validates model outputs and iteratively rectifies hallucinations by revising intrinsic errors so that they align with factually verified content, or by re-formulating extrinsic hallucinations while warning users accordingly [102]. FAVA, in turn, refines model outputs by performing fine-grained hallucination detection and editing at the span level: it identifies specific segments of text, or "spans," that contain factual inaccuracies or subjective content and suggests edits by marking the incorrect span for deletion and providing a corrected span to replace it [112].
- Structured Knowledge Sources: These approaches integrate and reason over structured external data such as knowledge graphs or formal verification systems. For instance, [202] leverages the probabilistic inference capacity of Graph Neural Networks (GNN) to refine model outputs by processing relational data alongside textual information while [260] employs a re-ranking mechanism to refine and enhance conversational reasoning by leveraging walks over knowledge sub-graphs. Similarly, Neural Path Hunter (NPH) uses a generate-then-refine strategy that post-processes generated dialogue responses by detecting hallucinated entity mentions and refining those mentions using a KG query to replace incorrect entities with faithful ones [217]. During the revision phase of FLEEK, a fact revision module suggests corrections for dubious fact triplets based on verified evidence from Knowledge Graphs or the Web [116] while [75] integrates deductive formal methods with the dialectic capabilities of inductive LLMs to detect hallucinations through logical tests.
- External Feedback and verification: These methods rely on external signals, human feedback, or verified external knowledge to guide the refinement process. Using the emergent abilities of CoT reasoning and few-shot prompting, CRITIC revises hallucinated or incorrect outputs based on external feedback that includes free-form question answering, mathematical reasoning, or toxicity reduction [70]. Chain of Knowledge (CoK) uses a three-stage process comprising reasoning preparation, dynamic knowledge adapting, and answer consolidation to refine model outputs. If a majority consensus is not reached, CoK corrects the rationales by integrating knowledge from identified domains, heterogeneous sources, including structured and unstructured data, to gather supporting knowledge [52]. Model outputs are refined in [324] through the Verify-and-Edit framework that specifically post-edits Chain of Thought (CoT) reasoning chains. The process begins with the language model generating an initial response and its corresponding CoT as an intermediate artifact. Subsequently, the framework generates "verifying questions" and retrieves relevant external knowledge to answer them. The original CoT and the newly retrieved external facts are then used to re-adjust the generated output, correcting any unverified or factually incorrect information [324]. The Self-correction based on External Knowledge (SEK) module presented in [203] is a key component of the DRAD framework designed to mitigate hallucinations in LLMs. When the Real-time Hallucination Detection (RHD) module identifies a potential hallucination, the SEK module formulates a query using the context around the detected error and retrieves relevant external knowledge from an external corpus. Finally, the LLM truncates its original output at the hallucination point and regenerates the content by leveraging the retrieved external knowledge, thereby correcting the factual inaccuracies [203].
- Filtering Based on External Grounding: These techniques filter outputs by comparing them against external documents or ground truth. For instance, the HAR (Hallucination Augmented Recitations) pipeline presented in [126], employs Factuality Filtering and Attribution Filtering to extract factual answers while simultaneously removing any question, document, and answer pairs where the answer is not properly grounded in the provided document [126]. HaluEval-Wild uses an adversarial filtering process, manual verification of hallucination-prone queries, and selection of challenging examples so as to refine the model outputs and ensure the dataset includes only relevant, challenging cases for evaluation [142].
- Agent-Based Interaction with External Context: These involve agents that interact with external environments, systems, or receive structured external feedback for refinement. For instance, the mitigation agent in [51] is designed to refine and improve the output by interpreting an Open Voice Network (OVON) JSON message generated by the second-level agent. This JSON message contains crucial information, including the estimated hallucination level and detailed reasons for potential hallucinations, which guides the third agent’s refinement process [132].
- Model Tuning/Refinement with External Knowledge: Methods that explicitly use external knowledge during their training or refinement phase to improve model outputs. In [166], methods like refusal tuning, open-book tuning, and discard tuning are leveraged to refine the outputs of the model, thus ensuring consistency with external and intrinsic knowledge. The PURR model refines its outputs through a process akin to conditional denoising by learning to correct faux hallucinations—intentionally corrupted text that has been used to fine tune an LLM. The refinement happens as PURR denoises these corruptions by incorporating relevant evidence, resulting in more accurate and attributable outputs [235].
- Iterative self-correction, where approaches such as [252] leverage an adaptive, prompt-driven iterative framework for defect analysis, guided optimization, and response comparison using prompt-based voting. Output refinement is accomplished in [207] by employing Self-Checks where the model rephrases its own prompts or poses related questions to itself for internal consistency. Additionally, [273] uses in-context prompting to incorporate the model’s self-generated feedback for iterative self-correction while [303] focuses on rewriting and improving answers to enhance factuality, consistency, and entailment through Self-Reflection. Furthermore, [269] utilizes a prompting-based framework for the LLM to identify and adjust self-contradictions within its generated text and finally, in the Tree of Thoughts (ToT) framework [309], structured exploration is guided by search algorithms like Breadth-First Search or Depth-First Search thus helping the model perform self-evaluation at various stages to refine its reasoning path.
- Self-Regulation during Generation/Decoding, where the model re-adjusts its own output or decision-making process in real-time during generation. For instance, the Self-highlighted Hesitation method (SH2) presented in [277] refines the model’s output by iteratively recalibrating the token probabilities through hesitation and contrastive decoding, while the Hypothesis Verification Model (HVM) estimates faithfulness scores during decoding, refining the output at each step [302].
- Self-Generated Data for Improvement, where the LLM generates data or instructions which are subsequently used to finetune itself. For instance, the Self-Instruct framework bootstraps off the LLM’s own generations to create a diverse set of instructions for finetuning [271], while in WizardLM such instructions are evolved and iteratively refined through elimination evolving to ensure a diverse dataset for instruction fine-tuning [328].
- Model-based techniques and tuning: LaMDA employs a generate-then-rerank pipeline that explicitly filters and ranks candidate responses based on safety and quality metrics. The discriminators are used to evaluate these attributes and the best-ranked response is selected for output [168]. Dehallucinator overwrites flagged translations by generating and scoring Monte Carlo dropout hypotheses, scoring them with a specific measure, and selecting the highest-scoring translation as the final candidate [189]. In [95], two models are responsible for generating the initial output and evaluating this output for inconsistencies using token-level confidence scoring and probabilistic anomaly detection. A feedback mechanism iteratively refines the output by flagging problematic sections and dynamically re-adjusting its internal parameters. Structured Comparative reasoning (SC2) combines approximate inference and pairwise comparison to select the most consistent structured representation from a number of intermediate representations [228]. In [48] techniques like sampling multiple responses and aggregating them for consistency aim to refine the model’s output by filtering for coherence and reliability while the Verbose Cloning stage in [230] uses carefully constructed prompts and context distillation to refine outputs by making them more comprehensive and detailed, addressing issues with overly brief or indirect responses. The central idea in [121] is to iteratively refine sequences generated by a base model using a separately trained corrector model. This process is based on value-improving triplets of the form (input, hypothesis, correction) which are examples of mapping a hypothesis to a higher-valued correction thus resulting in significant improvements in math program synthesis, lexical constrained generation and toxicity removal [121]. Finally, the MoE architecture presented in [247] uses majority voting to filter out erroneous responses and refines outputs by combining expert contributions, ensuring only consensus-backed outputs are used and that a more accurate and polished response is generated.
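Many of the refinement methods above share a generate, verify, and revise loop (e.g., Verify-and-Edit [324] or the SEK module of [203]). The sketch below captures that generic loop only; `generate`, `extract_claims`, and `lookup_evidence` are hypothetical placeholders for the model, the claim decomposer, and the external retriever.

```python
def verify_and_edit(question: str, generate, extract_claims, lookup_evidence,
                    max_rounds: int = 2) -> str:
    """Generic generate -> verify -> revise loop in the spirit of the methods above.

    generate(prompt) -> str, extract_claims(text) -> list[str], and
    lookup_evidence(claim) -> str | None are stand-ins for the generator,
    the claim decomposer, and the retriever.
    """
    answer = generate(f"Question: {question}\nAnswer:")
    for _ in range(max_rounds):
        checked = [(c, lookup_evidence(c)) for c in extract_claims(answer)]
        corrections = [f"Claim: {c}\nEvidence: {e}" for c, e in checked if e is not None]
        if not corrections:
            break   # nothing retrievable to revise against, keep the current draft
        answer = generate(
            "Revise the answer so every claim agrees with the evidence.\n"
            f"Question: {question}\nDraft: {answer}\n" + "\n".join(corrections) + "\nRevised answer:"
        )
    return answer
```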
5.4.5. Response Validation
5.5. Interpretability and Diagnostic Approaches
5.5.1. Internal State Probing
- Detection — employing anomaly detection and linguistic analysis.
- Mitigation — suggesting cross-referencing with structured knowledge bases [32].
5.5.2. Neuron Activation and Layer Analysis
5.5.3. Attribution-Based Diagnostics
5.6. Agent-Based Orchestration
5.6.1. Reflexive / Self-Reflective Agents
5.6.2. Modular and Multi-Agent Architectures
6. Benchmarks for Evaluating Hallucinations
- Factual Verification Benchmarks: These benchmarks focus on assessing the factual accuracy of LLM outputs by comparing them against established ground truth.
- ANAH is a bilingual dataset for fine-grained hallucination annotation in large language models, providing sentence-level annotations for hallucination type and correction [28]
- BoolQ: A question answering dataset which focuses on yes/no questions, requiring models to understand the provided context before answering [43].
- DiaHalu is introduced as the first dialogue-level hallucination evaluation benchmark for LLMs. This dataset aims to address the limitations of existing benchmarks that often focus solely on factual hallucinations by covering four multi-turn dialogue domains: knowledge-grounded, task-oriented, chit-chat, and reasoning [83].
- FACTOR (Factuality Assessment Corpus for Text and Reasoning) is a benchmark designed to evaluate the factuality of language models, particularly their ability to perform multi-hop reasoning and retrieve supporting evidence [120]. This benchmark focuses on assessing whether a model’s generated text aligns with verifiable facts, demanding not only information retrieval capabilities but also the ability to reason over multiple pieces of evidence to validate claims.
- FACTSCORE [106] contributes a fine-grained, atomic-level metric designed to assess the factual precision of long-form text generation. Accompanied by a large benchmark of 6,500 model generations from 13 LLMs, FACTSCORE enables comparisons across models by distinguishing supported, unsupported, and unverifiable statements within a single output.
- FELM (Factuality Evaluation of Large Language Models) is a benchmark dataset designed to evaluate the ability of factuality evaluators to detect errors in LLM-generated text. FELM spans five domains—world knowledge, science/tech, math, reasoning, and writing/recommendation—and is primarily used to benchmark factuality detection methods for long-form LLM outputs [109].
- FEVER (Fact Extraction and Verification): FEVER is a dataset consisting of 185,445 claims that are classified as "Supported", "Refuted" or "NotEnoughInfo" [110]. The researchers argue that FEVER is a challenging testbed, since some claims require multi-hop reasoning and information retrieval, and can be used to assess the ability of models to not only retrieve relevant evidence but also to determine the veracity of claims based on that evidence.
- FEWL (Factuality Evaluation Without Labels): FEWL is a methodology for measuring and reducing hallucinations in large language models without relying on gold-standard answers [194]. This approach focuses on evaluating the factual consistency of generated text by using a combination of techniques, including self-consistency checking and external knowledge verification, to identify potential hallucinations.
- The FRANK benchmark introduces a typology of factual errors grounded in linguistic theory to assess abstractive summarization models. It annotates summaries from nine systems using fine-grained error categories, providing a dataset that enables the evaluation and comparison of factuality metrics [320].
- HADES (HAllucination DEtection dataset) is a reference-free annotated dataset specifically designed for hallucination detection in QA tasks without requiring ground-truth references, which is particularly useful for free-form text generation where such references may not be available. The creation of HADES involved perturbing text segments from English Wikipedia and then human-annotating them through a "model-in-the-loop" procedure to identify hallucinations [128].
- HalluEditBench is a dataset comprised of verified hallucinations across multiple domains and topics and measures editing performance across Efficacy, Generalization, Portability, Locality, and Robustness [46].
- HalluLens is a comprehensive benchmark specifically designed for evaluating hallucinations in large language models, distinguishing them from factuality benchmarks [139]. HalluLens includes both intrinsic and extrinsic hallucination tasks and dynamically generates test sets to reduce data leakage. The authors argue that factuality benchmarks like TruthfulQA [315] and HaluEval 2.0 [292] often conflate factual correctness with hallucination, although hallucinations are more about consistency with training data or user input rather than absolute truth. HalluLens, therefore, offers a more precise and task-aligned tool for hallucination detection.
- HALOGEN (Hallucinations of Generative Models) is a hallucination benchmark designed to evaluate LLM outputs across nine domains including programming, scientific attribution, summarization, biographies, historical events, and reasoning tasks. HALOGEN is used to measure hallucination frequency, refusal behavior, and utility scores for generative LLMs, providing insights into error types and their potential sources in pretraining data [141].
- HaluEval: HaluEval is a large-scale benchmark for the evaluation of hallucination tendencies of large language models across diverse tasks and domains [143]. It assesses models on their ability to generate factually accurate content and identify hallucinated information. The benchmark includes multiple datasets covering various factual verification scenarios, making it a comprehensive tool for evaluating hallucination detection and mitigation performance.
- HaluEval 2.0: HaluEval 2.0 is an enhanced version of the original HaluEval benchmark, containing 8,770 questions from such diverse domains as biomedicine, finance, science and education, offering wider coverage, more rigorous evaluation metrics for assessing factuality hallucinations in LLMs as well as challenging tasks to better capture subtle hallucination patterns and measure model robustness [292].
- HaluEval-Wild: HaluEval-Wild is a benchmark specifically designed to evaluate hallucinations within dynamic, real-world user interactions as opposed to other benchmarks that focus on controlled NLP tasks like question answering or summarization [142]. It is curated using challenging, and adversarially filtered queries from the ShareGPT dataset. These queries are categorized into five types to enable fine-grained hallucination analysis.
- HDMBench is a benchmark designed for hallucination detection across diverse knowledge-intensive tasks [138]. It includes span-level and sentence-level annotations, covering hallucinations grounded in both context and common knowledge. HDMBench enables the fine-grained evaluation of detection models, supporting future research on factuality by providing unified evaluation protocols and high-quality, manually validated annotations across multiple domains.
- Head-to-Tail: Head-to-Tail delves into the nuances of factual recall by categorizing information based on popularity [144]. It consists of 18,000 question-answer pairs and segments knowledge into "head" (most popular), "torso" (moderately popular), and "tail" (least popular) categories, mirroring the distribution of information in knowledge graphs.
- HotpotQA: HotpotQA evaluates multi-hop reasoning and information retrieval capabilities, requiring models to synthesize information from multiple documents to answer complex questions and it has proven to be particularly useful for evaluating factual consistency across multiple sources [146].
- NQ (Natural Questions): NQ is a large-scale dataset of real questions asked by users on Google, paired with corresponding long-form answers from Wikipedia [215]. It tests the ability to retrieve and understand information from a large corpus.
- RAGTruth is a dataset tailored for analyzing word-level hallucinations within standard RAG frameworks for LLM applications. It comprises nearly 18,000 naturally generated responses from various LLMs that can also be used to benchmark hallucination frequencies [241].
- SelfCheckGPT is a zero-resource, black-box benchmark for hallucination detection. It assesses model consistency by sampling multiple responses and measuring their similarity, without needing external databases or model internals [274].
- TriviaQA is a large-scale question-answering dataset that contains over 650,000 question-answer-evidence triplets that were created by combining trivia questions from various web sources with supporting evidence documents automatically gathered from Wikipedia and web search results [309]. The questions cover a diverse range of topics with many of those questions requiring multi-hop reasoning or the integration of information from multiple sources in order to generate a correct answer.
- TruthfulQA: TruthfulQA is a benchmark designed to assess the capability of LLMs in distinguishing between truthful and false statements, particularly those crafted to be adversarial or misleading [315].
- The UHGEval benchmark offers a large-scale, Chinese-language dataset for evaluating hallucinations under unconstrained generation settings. Unlike prior constrained or synthetic benchmarks, UHGEval captures naturally occurring hallucinations from five LLMs and applies a rigorous annotation pipeline, making it a more realistic and fine-grained resource for factuality evaluation [318].
- Domain-Specific Benchmarks: These benchmarks target specific domains, testing the model’s knowledge and reasoning abilities within those areas.
- PubMedQA: This benchmark focuses on medical question answering, evaluating the accuracy and reliability of LLMs in the medical domain [234].
- SciBench: This benchmark verifies scientific reasoning and claim consistency, assessing the ability of LLMs to understand and apply scientific principles [265].
- LegalBench: This benchmark examines legal reasoning and interpretation, evaluating the performance of LLMs on legal tasks [177].
- Code Generation Benchmarks (e.g., HumanEval, introduced alongside Codex): These benchmarks assess the ability of LLMs to generate correct and functional code, which requires both factual accuracy and logical reasoning [99].
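As an illustration of how atomic-fact benchmarks such as FACTSCORE [106] score an output, the sketch below computes the fraction of supported atomic claims. The `decompose` and `is_supported` callables are placeholders for the benchmark's actual claim splitter and knowledge-source checker, and the toy claims are hand-written for illustration.

```python
def factscore(generation: str, decompose, is_supported) -> float:
    """Fraction of atomic facts in a generation that the knowledge source supports,
    in the spirit of FACTSCORE [106]."""
    facts = decompose(generation)
    if not facts:
        return 0.0
    return sum(is_supported(f) for f in facts) / len(facts)

# Toy run with hand-written stand-ins for the decomposer and the checker.
claims = {"Marie Curie won two Nobel Prizes.": True,
          "Marie Curie was born in Vienna.": False}
score = factscore(" ".join(claims),
                  decompose=lambda text: list(claims),
                  is_supported=lambda c: claims[c])
print(score)   # 0.5 -> half of the atomic claims are supported
```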
7. Practical Implications
- Keep humans in the loop with role clarity: Use Self-Reflection and external fact-checking pipelines to route low-confidence or conflicting outputs to a designated reviewer; require dual sign-off for irreversible actions when sources disagree.
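A minimal sketch of such a routing gate is shown below; the threshold, field names, and review policies are illustrative assumptions rather than a prescribed configuration.

```python
from dataclasses import dataclass

@dataclass
class DraftAnswer:
    text: str
    confidence: float      # e.g., a calibrated score in [0, 1] from the QC stage
    sources_agree: bool    # outcome of the external fact-checking step

def route(draft: DraftAnswer, threshold: float = 0.8) -> str:
    """Decide whether an output can be released or must go to a human reviewer."""
    if draft.confidence >= threshold and draft.sources_agree:
        return "release"
    if not draft.sources_agree:
        return "dual_signoff"    # conflicting sources: require two reviewers
    return "human_review"        # low confidence: route to the designated reviewer

print(route(DraftAnswer("The contract renews on 1 March.", 0.92, True)))    # release
print(route(DraftAnswer("The statute was repealed in 2019.", 0.55, True)))  # human_review
```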
8. Challenges
9. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Appendix A. Our Taxonomy Presented in Categories, Subcategories and Associated Papers

Appendix B. Summary Table of Benchmarks Used in Hallucination Detection and Mitigation

Appendix C. Hallucination Mitigation Subcategories Comparison Table

Glossary
- A
- ActLCD — Active Layer-Contrastive Decoding — Decoding that contrasts intermediate layers to steer token selection toward more factual continuations.
- Activation Decoding — Constrained decoding that adjusts next-token probabilities using activation/uncertainty signals to suppress hallucinations.
- AFHN — Adversarial Feature Hallucination Networks — Adversarial training to produce features and examples that stress models and reduce hallucinations.
- AggTruth — Attention aggregation across heads/layers to flag unsupported spans and improve factual consistency checks.
- ALIGNed-LLM — Aligns external knowledge (e.g., KG/entity embeddings) with the model’s representation space to ground generations.
- ALTI+ — Attribution method that quantifies how much each input token contributes to generated tokens for interpretability/factuality analysis.
- ANAH — Bilingual hallucination dataset with sentence-level annotations and suggested corrections.
- ATF — Adaptive Token Fusion — Merges redundant/similar tokens early to retain meaning while reducing noise and hallucination risk.
- AutoHall — Automatic pipeline to synthesize, detect, and evaluate hallucinations for training and benchmarking.
- AutoRAG-LoRA — Lightweight LoRA adaptation to better couple retrieval and generation in RAG systems.
- B
- BERTScore — Semantic similarity metric using contextual embeddings to evaluate generated text.
- BLEU — N-gram overlap metric; useful for surface similarity but not a direct measure of factuality.
- BoolQ — Yes/no question-answering dataset often used in factuality experiments.
- C
- CCL-XCoT — Cross-lingual transfer of Chain of Thought traces to improve reasoning and reduce hallucinations across languages.
- CD — Contrastive Decoding — Penalizes tokens favored by a weaker/contrast model to filter implausible continuations.
- Chain of Knowledge (CoK) — Grounds reasoning by explicitly incorporating external knowledge into intermediate steps.
- Chain of NLI (CoNLI) — Cascaded entailment checks over partial outputs to prune unsupported content.
- Chain of Thought (CoT) — Prompting that elicits step-by-step reasoning before the final answer.
- Chain of Verification (CoVe) — Prompting framework in which the model drafts an answer, plans and answers verification questions independently, and then revises the final response accordingly.
- COMET-QE — Reference-free MT quality estimation used as a proxy signal for consistency.
- Conditional Pointwise Mutual Information (CPMI) — Decoding re-scoring that rewards tokens better supported by the source/context.
- Confident Decoding — Incorporates uncertainty estimates into beam/nucleus procedures to favor low-uncertainty continuations.
- CPM — Conditional Entropy Mechanism — Uses token-level entropy to detect and avoid uncertain/hallucination-prone outputs.
- CPO — Contrastive Preference Optimization — Preference optimization that uses contrastive signals to align outputs with faithful behavior.
- CRAG — Corrective Retrieval-Augmented Generation — Adds corrective/revision steps atop RAG to fix unsupported claims.
- CRITIC — A verify-and-edit framework where a “critic” process checks claims against evidence and proposes fixes.
- Critic-driven Decoding — Decoding guided by a trained critic/verifier that down-weights unsupported next tokens.
- D
- D&Q — Decompose-and-Query — Decomposes a question into sub-questions and retrieves evidence for each before answering.
- DeCoRe — Decoding by Contrasting Retrieval Heads — Contrasts retrieval-conditioned signals to suppress ungrounded tokens.
- Dehallucinator — Detect-then-rewrite approach that edits hallucinated spans into grounded alternatives.
- Delta — Compares outputs under masked vs. full context to detect and penalize hallucination-prone continuations.
- DiaHalu — Dialogue-level hallucination benchmark covering multiple multi-turn domains.
- DoLa — Decoding by Contrasting Layers — Uses differences between early vs. late layer logits to promote factual signals (a toy sketch of this logit-contrast idea appears after the D entries).
- DPO — Direct Preference Optimization — RL-free preference tuning that directly optimizes for chosen responses.
- DRAD — Decoding with Retrieval-Augmented Drafts — Uses retrieved drafts/evidence to guide decoding away from unsupported text.
- DreamCatcher — Detects and corrects hallucinations by cross-checking outputs against external evidence/tools.
- DrHall — Lightweight, fast hallucination detection targeted at real-time scenarios.
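To make the logit-contrast mechanism shared by CD and DoLa concrete, the toy sketch below scores next tokens by subtracting a weaker distribution (an amateur model in CD, an earlier layer in DoLa) from the expert distribution, restricted to tokens the expert itself considers plausible. The random logits, vocabulary size, and plausibility cutoff are illustrative assumptions, not either paper's exact formulation.

```python
# Toy layer/model-contrastive scoring over next-token logits (CD/DoLa-style).
import numpy as np


def contrastive_scores(expert_logits: np.ndarray,
                       weak_logits: np.ndarray,
                       alpha: float = 0.1) -> np.ndarray:
    """Expert log-probs minus weak log-probs, masked to tokens whose expert
    probability is at least alpha times the expert's maximum probability."""
    expert_logp = expert_logits - np.logaddexp.reduce(expert_logits)
    weak_logp = weak_logits - np.logaddexp.reduce(weak_logits)
    plausible = expert_logp >= np.log(alpha) + expert_logp.max()
    return np.where(plausible, expert_logp - weak_logp, -np.inf)


rng = np.random.default_rng(0)
expert, weak = rng.normal(size=50), rng.normal(size=50)  # fake logits over 50 tokens
print(int(np.argmax(contrastive_scores(expert, weak))))  # index of the preferred token
```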
- E
- EigenScore — Uncertainty/factuality signal derived from the spectrum of hidden-state representations.
- EntailR — Entailment-based verifier used to check whether generated claims follow from retrieved evidence.
- EVER — Evidence-based verification/rectification that validates claims and proposes fixes during/after generation.
- F
- F2 — Faith Finetuning — Direct finetuning objective to increase faithfulness of generations.
- Faithful Finetuning — Finetuning strategies that explicitly optimize for verifiable, source-supported outputs.
- FacTool — Tool-augmented factuality checking that extracts claims and verifies them against sources.
- FactPEGASUS — Summarization variant emphasizing factual consistency with the source document.
- FactRAG — RAG design focused on retrieving and citing evidence that supports each claim.
- FACTOR — Benchmark emphasizing multi-hop factuality and evidence aggregation.
- FAVA — Corrupt-and-denoise training pipeline to teach models to correct fabricated content.
- FELM — Benchmark for evaluating factuality evaluators on long-form outputs.
- FEVER — Large-scale fact verification dataset (Supported/Refuted/Not Enough Info).
- FG-PRM — Fine-Grained Process Reward Model — Process-level reward modeling for stepwise supervision of reasoning.
- FRANK — Fine-grained factual error taxonomy and benchmark for summarization.
- FreshLLMs — Uses live retrieval/search refresh to reduce outdated or stale knowledge.
- FActScore / FACTSCORE — Atomic, claim-level factuality scoring/benchmark for long-form text.
- G
- GAN — Generative Adversarial Network — Adversarial training framework used to stress and correct model behaviors.
- GAT — Graph Attention Network — Graph neural network with attention; used to propagate grounded evidence.
- GNN — Graph Neural Network — Neural architectures over graphs for structured reasoning/grounding.
- GoT — Graph-of-Thoughts — Represents reasoning as a graph of states/operations to explore multiple paths.
- Grad-CAM — Gradient-based localization on intermediate features for interpretability of decisions.
- Gradient × Input — Simple attribution method multiplying gradients by inputs to estimate token importance.
- Graph-RAG — RAG that leverages knowledge graphs/graph structure for retrieval and grounding.
- G-Retriever — Graph-aware retriever designed to recall evidence that reduces hallucinations.
- H
- HADES — Token-level, reference-free hallucination detection benchmark for free-form text generation.
- HALO — Estimation + reduction framework for hallucinations in open-source LLMs.
- HALOGEN — Multi-domain hallucination benchmark pairing generation prompts with automatic verifiers that decompose outputs into atomic facts.
- HalluciNot — Retrieval-assisted span verification to detect and mitigate hallucinations.
- HaluBench — Benchmark suite for evaluating hallucinations across tasks or RAG settings.
- HaluEval — Large-scale hallucination evaluation benchmark.
- HaluEval-Wild — “In-the-wild” hallucination evaluation using web-scale references.
- HaluSearch — Retrieval-in-the-loop detection/mitigation pipeline that searches evidence while generating.
- HAR — Hallucination Augmented Recitations — Produces recitations/snippets that anchor generation to evidence.
- HDM-2 — Hallucination Detection Method 2 — Modular multi-detector system targeting specific hallucination types.
- HERMAN — Checks entities/quantities in outputs against source to avoid numerical/entity errors.
- HILL — Human-factors-oriented hallucination identification framework/benchmark.
- Hidden Markov Chains — Sequential state models used to analyze latent dynamics associated with hallucinations.
- HIPO — Hard-sample-aware iterative preference optimization to improve robustness.
- HSP — Hierarchical Semantic Piece — Hierarchical text segmentation/representation to stabilize retrieval and grounding.
- HybridRAG — Combines multiple retrieval sources/strategies (e.g., dense + sparse + KG) for stronger grounding.
- HumanEval — Code generation benchmark often used in hallucination-sensitive program synthesis.
- HVM — Hypothesis Verification Model — Classifier/verifier that filters candidates by textual entailment with evidence.
- I
- ICD — Induce-then-Contrast Decoding — Induces errors with a weaker model and contrasts to discourage hallucinated tokens.
- INSIDE — Internal-state-based uncertainty estimation with interventions to reduce overconfidence.
- Input Erasure — Attribution by removing/ablating input spans to see their effect on outputs.
- InterrogateLLM — Detects hallucinations via inconsistency across multiple answers/contexts.
- Iter-AHMCL — Iterative decoding with hallucination-aware contrastive learning to refine outputs.
- ITI — Inference-Time Intervention — Nudges specific heads/activations along truth-aligned directions during decoding.
- J
- Joint Entity and Summary Generation — Summarization that jointly predicts entities and the abstract to reduce unsupported content.
- K
- KB — Knowledge Base — External repository of facts used for grounding/verification.
- KCA — Knowledge-Consistent Alignment — Aligns model outputs with retrieved knowledge via structured prompting/objectives.
- KG — Knowledge Graph — Graph-structured facts used for retrieval, verification, and attribution.
- KGR — Knowledge Graph Retrofitting — Injects/retrofits KG-verified facts into outputs or intermediate representations.
- KL-divergence — Divergence measure used in calibration/regularization and to compare layer distributions; its standard definition is given after the K entries.
- Knowledge Overshadowing — When parametric priors dominate over context, causing the model to ignore given evidence.
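Several of the calibration and layer-contrast methods above rely on the Kullback–Leibler divergence; for discrete distributions P and Q it is the standard quantity

```latex
D_{\mathrm{KL}}(P \,\|\, Q) \;=\; \sum_{x} P(x)\,\log\frac{P(x)}{Q(x)} \;\ge\; 0,
\qquad \text{with equality if and only if } P = Q .
```

Its asymmetry (D_KL(P||Q) generally differs from D_KL(Q||P)) is worth keeping in mind when it is used to compare layer-wise token distributions.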
- L
- LaBSE — Multilingual sentence encoder used for cross-lingual matching/verification.
- LASER — Language-agnostic sentence embeddings for multilingual retrieval/entailment.
- LAT — Linear Artificial Tomography — Linear probes/edits to reveal and steer latent concept directions.
- LayerSkip — Self-speculative decoding with early exits/verification by later layers.
- LID — Local Intrinsic Dimension — Dimensionality measure of hidden states linked to uncertainty/truthfulness.
- LinkQ — Forces explicit knowledge-graph queries to ground answers.
- LLM — Large Language Model — Transformer-based model trained for next-token prediction and generation.
- LLM Factoscope — Probing/visualization of hidden-state clusters to distinguish factual vs fabricated content.
- LLM-AUGMENTER — Orchestrates retrieval/tools around an LLM to improve grounding and reduce errors.
- Logit Lens — Projects intermediate residual streams to the vocabulary space to inspect token preferences.
- Lookback Lens — Attention-only method that checks whether outputs attend to relevant context.
- LoRA — Low-rank adapters for efficient finetuning, commonly used in factuality/hallucination pipelines.
- LQC — Lightweight Query Checkpoint — Predicts when a query needs verification or retrieval before answering.
- LRP — Layer-wise Relevance Propagation — Decomposes predictions to attribute token-level contributions.
- M
- MARL — Multi-Agent Reinforcement Learning — Multiple agents coordinate/critique each other to improve reliability.
- MC — Monte Carlo — Stochastic sampling used for uncertainty estimation and search.
- MCTS — Monte Carlo Tree Search — Guided tree exploration used in deliberate, plan-and-verify reasoning.
- METEOR — MT metric leveraging synonymy/stemming; not a direct factuality measure.
- mFACT — Decoding-integrated factuality signal to prune low-faithfulness candidates.
- MixCL — Mixed contrastive learning (with hard negatives) to reduce dialog hallucinations.
- MoCo — Momentum contrast representation learning used to build stronger encoders.
- MoE — Mixture-of-Experts — Sparse expert routing to localize knowledge and reduce interference.
- N
- NEER — Neural evidence-based evaluation/repair methods that use entailment or retrieved evidence to improve outputs.
- Neural Path Hunter — Analyzes reasoning paths/graphs to locate error-prone segments for correction.
- Neural-retrieval-in-the-loop — Integrates a trainable retriever during inference to stabilize grounding.
- NL-ITI — Non-linear version of ITI with richer probes and multi-token interventions.
- NLU — Natural Language Understanding — Models/components (e.g., NLI, QA) used as verifiers or critics.
- Nucleus Sampling — Top-p decoding that samples from the smallest set whose cumulative probability exceeds p (a minimal sketch follows).
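A minimal, library-agnostic sketch of this top-p selection rule is shown below; the cutoff logic is a direct reading of the definition above rather than any particular implementation.

```python
# Minimal nucleus (top-p) sampling over a next-token probability vector.
import numpy as np


def nucleus_sample(probs: np.ndarray, p: float = 0.9, rng=None) -> int:
    """Sample an index from the smallest set of most-probable tokens whose
    cumulative probability reaches p, after renormalization."""
    rng = rng or np.random.default_rng()
    order = np.argsort(probs)[::-1]                    # most probable first
    cumulative = np.cumsum(probs[order])
    cutoff = int(np.searchsorted(cumulative, p)) + 1   # first index reaching p
    keep = order[:cutoff]
    renormalized = probs[keep] / probs[keep].sum()
    return int(rng.choice(keep, p=renormalized))


print(nucleus_sample(np.array([0.5, 0.3, 0.1, 0.05, 0.05]), p=0.9))
```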
- O
- OVON — Open-Vocabulary Object Navigation; task setting where language directs navigation to open-set objects, used in agent/LLM evaluations.
- P
- PCA — Principal Component Analysis — Projects activations to principal subspaces to analyze truth/lie separability.
- PGFES — Psychology-guided two-stage editing and sampling along “truthfulness” directions in latent space.
- Persona drift — When a model’s stated persona/stance shifts across sessions or contexts.
- PoLLMgraph — Probabilistic/graph model over latent states to track hallucination dynamics.
- PMI — Pointwise Mutual Information — Signal for overshadowing/low-confidence conditions during decoding.
- Principle Engraving — Representation-editing to imprint desired principles into activations.
- Principle-Driven Self-Alignment — Self-alignment method that derives rules/principles and tunes behavior accordingly.
- ProbTree — Probabilistic Tree-of-Thought — ToT reasoning with probabilistic selection/evaluation of branches.
- PURR — Trains on corrupted vs. corrected claims to produce a compact, factuality-aware model.
- Q
- Q2 — Factual consistency measure comparing outputs to retrieved references.
- R
- R-Tuning — Instruction tuning that teaches models to abstain or say “I don’t know” when unsure.
- RAG — Retrieval-Augmented Generation — Augments generation with document retrieval for grounding (a minimal retrieve-then-generate sketch appears after the R entries).
- RAG-KG-IL — RAG integrated with knowledge-graph and incremental-learning components.
- RAG-Turn — Turn-aware retrieval for multi-turn tasks.
- RAGTruth — Human-annotated data for evaluating/teaching RAG factuality.
- RAP — Reasoning viA Planning — Planning-style reasoning that structures problem solving before answering.
- RARR — Retrieve-and-Revise pipeline that edits outputs to add citations and fix unsupported claims.
- RBG — Read-Before-Generate — Reads/retrieves first, then conditions generation on the evidence.
- REPLUG — Prepends retrieved text and averages probabilities across retrieval passes to ground decoding.
- RepE — Representation Engineering — Editing/steering latent directions to improve honesty/faithfulness.
- RefChecker — Reference-based fine-grained hallucination checker and diagnostic benchmark.
- Reflexion — Self-critique loop where the model reflects on errors and retries.
- RID — Retrieval-In-Decoder — Retrieval integrated directly into the decoder loop.
- RHO — Reranks candidates by factual consistency with retrieved knowledge or graph evidence.
- RHD — Real-time Hallucination Detection — Online detection and optional self-correction during generation.
- RLCD — Reinforcement Learning with Contrastive Decoding — RL variant that pairs contrastive objectives with decoding.
- RLHF — Reinforcement Learning from Human Feedback — Uses human preference signals to align model behavior.
- RLAIF — Reinforcement Learning from AI Feedback — Uses AI-generated preference signals to scale alignment.
- RLKF — Reinforcement-Learning-based Knowledge Filtering that favors context-grounded generation.
- ROUGE — Overlap-based summarization metric (e.g., ROUGE-L).
- RaLFiT — Reinforcement-learning-style fine-tuning aimed at improving truthfulness/factuality.
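The retrieve-then-generate loop shared by the RAG variants above can be sketched as follows; the `retrieve` and `generate` callables, the prompt template, and the stubbed components are placeholders rather than any specific library's API.

```python
# Minimal retrieve-then-generate (RAG) loop with placeholder components.
from typing import Callable, List


def rag_answer(retrieve: Callable[[str, int], List[str]],
               generate: Callable[[str], str],
               question: str, k: int = 3) -> str:
    """Retrieve k passages, build a grounded prompt, and generate an answer."""
    passages = retrieve(question, k)
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    prompt = (f"Answer using only the passages below and cite them by number.\n"
              f"{context}\n\nQuestion: {question}\nAnswer:")
    return generate(prompt)


# Stubbed retriever and generator, for illustration only.
stub_retrieve = lambda q, k: ["Paris is the capital of France."][:k]
stub_generate = lambda prompt: "Paris [1]."
print(rag_answer(stub_retrieve, stub_generate, "What is the capital of France?"))
```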
- S
- SC2 — Structured Comparative Reasoning — Compares structured alternatives and selects the most consistent one.
- SCOTT — Self-Consistent Chain-of-Thought Distillation — Samples multiple CoTs and distills the consistent answer.
- SCD — Self-Contrastive Decoding — Penalizes over-represented priors to counter knowledge overshadowing.
- SEA — Spectral Editing of Activations — Projects activations along truth-aligned directions while suppressing misleading ones.
- SEAL — Selective Abstention Learning — Teaches models to abstain (e.g., emit a reject token) when uncertain.
- SEBRAG — Structured Evidence-Based RAG — RAG variant that structures evidence and grounding steps.
- SEK — Evidence selection/structuring module used to verify or revise outputs.
- SEPs — Semantic Entropy Probes — Fast probes that estimate uncertainty from hidden states.
- Self-Checker — Pipeline that extracts and verifies claims using tools or retrieval.
- Self-Checks — Generic self-verification passes (consistency checks, regeneration, or critique).
- Self-Consistency — Samples multiple reasoning paths and selects the majority-consistent result (sketched after the S entries).
- Self-Familiarity — Calibrates outputs based on what the model “knows it knows” vs. uncertain areas.
- Self-Refine — Iterative refine-and-feedback loop where the model improves its own draft.
- Self-Reflection — The model reflects on its reasoning and revises responses accordingly.
- SELF-RAG — Self-reflective RAG where a critic guides retrieval and edits drafts.
- SelfCheckGPT — Consistency-based hallucination detector using multiple sampled outputs.
- SH2 — Self-Highlighted Hesitation — Injects hesitation/abstention mechanisms at uncertain steps.
- SimCLR — Contrastive representation learning framework used to build stronger encoders.
- SimCTG — Contrastive text generation that constrains decoding to avoid degenerate outputs.
- Singular Value Decomposition (SVD) — Matrix factorization used to analyze or edit latent directions.
- Socratic Prompting — Uses guided questions to elicit intermediate reasoning and evidence.
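The self-consistency idea noted above reduces to sampling several reasoning paths and keeping the majority final answer, as in this minimal sketch (the disagreeing sampler stub is purely illustrative):

```python
# Toy self-consistency: majority vote over independently sampled answers.
import random
from collections import Counter
from typing import Callable


def self_consistent_answer(generate_answer: Callable[[str], str],
                           prompt: str, n_paths: int = 5) -> str:
    """Sample n_paths answers and return the most frequent one."""
    answers = [generate_answer(prompt) for _ in range(n_paths)]
    majority, _ = Counter(answers).most_common(1)[0]
    return majority


# Stubbed sampler that occasionally disagrees, for illustration only.
stub = lambda _: random.choice(["42", "42", "42", "41"])
print(self_consistent_answer(stub, "What is 6 * 7?"))
```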
- T
- ToT — Tree-of-Thought — Branch-and-evaluate reasoning over a tree of intermediate states.
- TOPICPREFIX — Prompt/prefix-tuning that encodes topics to stabilize context adherence.
- TrueTeacher — Teacher-style training that builds a factual evaluator and uses it to guide student outputs.
- Truth Forest — Learns orthogonal “truth” representations and intervenes along those directions.
- TruthfulQA — Benchmark evaluating resistance to common falsehoods.
- TruthX — Latent editing method that nudges activations toward truthful directions.
- Tuned Lens — Learns linear mappings from hidden states to logits to study/steer layer-wise predictions.
- TWEAK — Think While Effectively Articulating Knowledge — Hypothesis-and-NLI-guided reranking that prefers supported continuations.
- U
- UHGEval — Hallucination evaluation benchmark for unconstrained generation in Chinese and related settings.
- UPRISE — Uses LLM signals to train a retriever that selects stronger prompts/evidence.
- V
- Verbose Cloning — Prompting/aggregation technique that elicits explicit, fully-specified answers to reduce ambiguity.
- X
- XCoT — Cross-lingual Chain-of-Thought prompting/transfer.
- XNLI — Cross-lingual NLI benchmark commonly used for entailment-based verification.
References
- S. M. T. I. Tonmoy et al., “A Comprehensive Survey of Hallucination Mitigation Techniques in Large Language Models,” Jan. 2024, [Online]. Available: http://arxiv.org/abs/2401.01313.
- Y. Su, T. Lan, Y. Wang, D. Yogatama, L. Kong, and N. Collier, “A Contrastive Framework for Neural Text Generation,” Feb. 2022, [Online]. Available: http://arxiv.org/abs/2202.06417.
- T. R. McIntosh et al., “A Culturally Sensitive Test to Evaluate Nuanced GPT Hallucination,” IEEE Access, vol. 12, pp. 51555–51572, 2024. [CrossRef]
- A. Shelmanov et al., “A Head to Predict and a Head to Question: Pre-trained Uncertainty Quantification Heads for Hallucination Detection in LLM Outputs,” May 2025, [Online]. Available: http://arxiv.org/abs/2505.08200.
- J. White et al., “A Prompt Pattern Catalog to Enhance Prompt Engineering with ChatGPT,” Feb. 2023, [Online]. Available: http://arxiv.org/abs/2302.11382.
- Z. Yin, “A review of methods for alleviating hallucination issues in large language models,” Applied and Computational Engineering, vol. 76, no. 1, pp. 258–266, Jul. 2024. [CrossRef]
- B. AlKhamissi, M. Li, A. Celikyilmaz, M. Diab, and M. Ghazvininejad, “A Review on Language Models as Knowledge Bases,” Apr. 2022, [Online]. Available: http://arxiv.org/abs/2204.06031.
- T. Chen, S. Kornblith, M. Norouzi, and G. Hinton, “A Simple Framework for Contrastive Learning of Visual Representations,” Feb. 2020, [Online]. Available: http://arxiv.org/abs/2002.05709.
- F. Nie, J.-G. Yao, J. Wang, R. Pan, and C.-Y. Lin, “A Simple Recipe towards Reducing Hallucination in Neural Surface Realisation,” Association for Computational Linguistics, 2019. [Online]. Available: https://aclanthology.org/P19-1256.pdf.
- H. Cao, Z. An, J. Feng, K. Xu, L. Chen, and D. Zhao, “A Step Closer to Comprehensive Answers: Constrained Multi-Stage Question Decomposition with Large Language Models,” Nov. 2023, [Online]. Available: http://arxiv.org/abs/2311.07491.
- Q. Signé, M. Boughanem, J. G. Moreno, and T. Belkacem, “A Substring Extraction-Based RAG Method for Minimising Hallucinations in Aircraft Maintenance Question Answering,” in Proceedings of the 2025 International ACM SIGIR Conference on Innovative Concepts and Theories in Information Retrieval (ICTIR), New York, NY, USA: ACM, Jul. 2025, pp. 513–521. [CrossRef]
- V. Rawte, A. Sheth, and A. Das, “A Survey of Hallucination in Large Foundation Models,” Sep. 2023, [Online]. Available: http://arxiv.org/abs/2309.05922.
- L. Huang et al., “A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions,” ACM Transactions on Information Systems, Nov. 2024. [CrossRef]
- S. v. Shah, “Accuracy, Consistency, and Hallucination of Large Language Models When Analyzing Unstructured Clinical Notes in Electronic Medical Records,” JAMA Network Open. American Medical Association, p. e2425953, 2024. [CrossRef]
- H. Zhang, H. Chen, M. Chen, and T. Zhang, “Active Layer-Contrastive Decoding Reduces Hallucination in Large Language Model Generation,” Jun. 2025, [Online]. Available: http://arxiv.org/abs/2505.23657.
- R. A. Jacobs, M. I. Jordan, S. J. Nowlan, and G. E. Hinton, “Adaptive Mixtures of Local Experts,” Neural Computation, vol. 3, no. 1, pp. 79–87, 1991.
- K. Li, Y. Zhang, K. Li, and Y. Fu, “Adversarial Feature Hallucination Networks for Few-Shot Learning,” 2020. [Online]. Available: http://arxiv.org/abs/2003.13193.
- P. Matys et al., “AggTruth: Contextual Hallucination Detection using Aggregated Attention Scores in LLMs,” Jun. 2025. [CrossRef]
- S. Kapoor, B. Stroebl, Z. S. Siegel, N. Nadgir, and A. Narayanan, “AI Agents That Matter,” Jul. 2024, [Online]. Available: http://arxiv.org/abs/2407.01502.
- J. Ji et al., “AI Alignment: A Comprehensive Survey,” Oct. 2023, [Online]. Available: http://arxiv.org/abs/2310.19852.
- N. Maleki, B. Padmanabhan, and K. Dutta, “AI Hallucinations: A Misnomer Worth Clarifying,” Jan. 2024, [Online]. Available: http://arxiv.org/abs/2401.06796.
- N. A. Z. Nishat, A. Coletta, L. Bellomarini, K. Amouzouvi, J. Lehmann, and S. Vahdati, “Aligning Knowledge Graphs and Language Models for Factual Accuracy,” Jul. 2025, [Online]. Available: http://arxiv.org/abs/2507.13411.
- Y. Yang, E. Chern, X. Qiu, G. Neubig, and P. Liu, “Alignment for Honesty,” Oct. 2024, [Online]. Available: http://arxiv.org/abs/2312.07000.
- L. Huang et al., “Alleviating Hallucinations from Knowledge Misalignment in Large Language Models via Selective Abstention Learning,” 2025. Accessed: Aug. 04, 2025. [Online]. Available: https://aclanthology.org/2025.acl-long.1199.pdf.
- J. Li, Z. Mao, and Q. Wang, “Alleviating Hallucinations in Large Language Models via Truthfulness-driven Rank-adaptive LoRA,” Jul. 2025. Accessed: Aug. 04, 2025. [Online]. Available: https://aclanthology.org/2025.findings-acl.103.pdf.
- Y. Zhang, L. Cui, W. Bi, and S. Shi, “Alleviating Hallucinations of Large Language Models through Induced Hallucinations,” Dec. 2023, [Online]. Available: http://arxiv.org/abs/2312.15710.
- Y. Li, Z. Li, K. Hung, W. Wang, H. Xie, and Y. Li, “Ambiguity processing in Large Language Models: Detection, resolution, and the path to hallucination,” Natural Language Processing Journal, p. 100173, Jul. 2025. [CrossRef]
- Z. Ji, Y. Gu, W. Zhang, C. Lyu, D. Lin, and K. Chen, “ANAH: Analytical Annotation of Hallucinations in Large Language Models,” May 2024, [Online]. Available: http://arxiv.org/abs/2405.20315.
- S. Ramprasad, E. Ferracane, and Z. C. Lipton, “Analyzing LLM Behavior in Dialogue Summarization: Unveiling Circumstantial Hallucination Trends,” Jun. 2024, [Online]. Available: http://arxiv.org/abs/2406.03487.
- A. Chaturvedi, S. Bhar, S. Saha, U. Garain, and N. Asher, “Analyzing Semantic Faithfulness of Language Models via Input Intervention on Question Answering,” Computational Linguistics, vol. 50, no. 1, 2024. [CrossRef]
- P. Michel, O. Levy, and G. Neubig, “Are Sixteen Heads Really Better than One?” Nov. 2019, [Online]. Available: http://arxiv.org/abs/1905.10650.
- P. Zablocki and Z. Gajewska, “Assessing Hallucination Risks in Large Language Models Through Internal State Analysis.” Jul. 17, 2024. [CrossRef]
- B. Goodrich et al., “Assessing the Factual Accuracy of Generated Text,” KDD '19: The 25th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Aug. 2019. [CrossRef]
- A. Vaswani et al., “Attention Is All You Need,” Jun. 2017, [Online]. Available: http://arxiv.org/abs/1706.03762.
- M. Yuksekgonul et al., “Attention Satisfies: A Constraint-Satisfaction Lens on Factual Errors of Language Models,” Sep. 2023, [Online]. Available: http://arxiv.org/abs/2309.15098.
- Y. Zhao, Z. Liu, Y. Zheng, and K.-Y. Lam, “Attribution Techniques for Mitigating Hallucination in RAG-based Question-Answering Systems: A Survey.” Jun. 19, 2025. [CrossRef]
- Z. Cao, Y. Yang, and H. Zhao, “AutoHall: Automated Hallucination Dataset Generation for Large Language Models,” Sep. 2023, [Online]. Available: http://arxiv.org/abs/2310.00259.
- K. Dwivedi and P. P. Mishra, “AutoRAG-LoRA: Hallucination-Triggered Knowledge Retuning via Lightweight Adapters,” Jul. 2025, [Online]. Available: http://arxiv.org/abs/2507.10586.
- J. Li et al., “Banishing LLM Hallucinations Requires Rethinking Generalization,” Jun. 2024, [Online]. Available: http://arxiv.org/abs/2406.17642.
- J. Chen, H. Lin, X. Han, and L. Sun, “Benchmarking Large Language Models in Retrieval-Augmented Generation,” Sep. 2023, [Online]. Available: http://arxiv.org/abs/2309.01431.
- T. Zhang, V. Kishore, F. Wu, K. Q. Weinberger, and Y. Artzi, “BERTScore: Evaluating Text Generation with BERT,” Apr. 2019, [Online]. Available: http://arxiv.org/abs/1904.09675.
- K. Papineni, S. Roukos, T. Ward, and W.-J. Zhu, “BLEU: A Method for Automatic Evaluation of Machine Translation,” in Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL), Philadelphia, PA, USA, Jul. 2002, pp. 311–318. [Online]. Available: https://aclanthology.org/P02-1040.pdf. [CrossRef]
- C. Clark et al., “BoolQ: Exploring the Surprising Difficulty of Natural Yes/No Questions.” [Online]. Available: https://arxiv.org/abs/1905.10044.
- T. Z. Zhao, E. Wallace, S. Feng, D. Klein, and S. Singh, “Calibrate Before Use: Improving Few-Shot Performance of Language Models,” Jun. 2021, [Online]. Available: http://arxiv.org/abs/2102.09690.
- A. T. Kalai and S. S. Vempala, “Calibrated Language Models Must Hallucinate,” Nov. 2023, [Online]. Available: http://arxiv.org/abs/2311.14648.
- Huang, C. Chen, X. Xu, A. Payani, and K. Shu, “Can Knowledge Editing Really Correct Hallucinations?” Mar. 2025, [Online]. Available: http://arxiv.org/abs/2410.16251.
- G. Agrawal, T. Kumarage, Z. Alghamdi, and H. Liu, “Can Knowledge Graphs Reduce Hallucinations in LLMs? : A Survey,” Mar. 2024, [Online]. Available: http://arxiv.org/abs/2311.07914.
- M. Xiong et al., “Can LLMs Express Their Uncertainty? An Empirical Evaluation of Confidence Elicitation in LLMs,” Jun. 2023, [Online]. Available: http://arxiv.org/abs/2306.13063.
- E. Kıcıman, R. Ness, A. Sharma, and C. Tan, “Causal Reasoning and Large Language Models: Opening a New Frontier for Causality,” Apr. 2023, [Online]. Available: http://arxiv.org/abs/2305.00050.
- W. Zheng, R. K.-W. Lee, Z. Liu, K. Wu, A. Aw, and B. Zou, “CCL-XCoT: An Efficient Cross-Lingual Knowledge Transfer Method for Mitigating Hallucination Generation,” Jul. 2025, [Online]. Available: http://arxiv.org/abs/2507.14239.
- D. Lei et al., “Chain of Natural Language Inference for Reducing Large Language Model Ungrounded Hallucinations,” Oct. 2023, [Online]. Available: http://arxiv.org/abs/2310.03951.
- X. Li et al., “Chain-of-Knowledge: Grounding Large Language Models via Dynamic Knowledge Adapting over Heterogeneous Sources,” May 2023, [Online]. Available: http://arxiv.org/abs/2305.13269.
- J. Wei et al., “Chain-of-Thought Prompting Elicits Reasoning in Large Language Models,” Jan. 2022, [Online]. Available: http://arxiv.org/abs/2201.11903.
- J. Cheng et al., “Chain-of-Thought Prompting Obscures Hallucination Cues in Large Language Models: An Empirical Evaluation,” Jun. 2025, [Online]. Available: http://arxiv.org/abs/2506.17088.
- S. Dhuliawala et al., “Chain-of-Verification Reduces Hallucination in Large Language Models,” Sep. 2023, [Online]. Available: http://arxiv.org/abs/2309.11495.
- J. Kaddour, J. Harris, M. Mozes, H. Bradley, R. Raileanu, and R. McHardy, “Challenges and Applications of Large Language Models,” Jul. 2023, [Online]. Available: http://arxiv.org/abs/2307.10169.
- F. Yin, J. Srinivasa, and K.-W. Chang, “Characterizing Truthfulness in Large Language Model Generations with Local Intrinsic Dimension,” Feb. 2024, [Online]. Available: http://arxiv.org/abs/2402.18048.
- B. Peng et al., “Check Your Facts and Try Again: Improving Large Language Models with External Knowledge and Automated Feedback,” Feb. 2023, [Online]. Available: http://arxiv.org/abs/2302.12813.
- Q. Lv, J. Wang, H. Chen, B. Li, Y. Zhang, and F. Wu, “Coarse-to-Fine Highlighting: Reducing Knowledge Hallucination in Large Language Models,” Oct. 2024, [Online]. Available: http://arxiv.org/abs/2410.15116.
- V. Agarwal, Y. Pei, S. Alamir, and X. Liu, “CodeMirage: Hallucinations in Code Generated by Large Language Models,” Aug. 2024, [Online]. Available: http://arxiv.org/abs/2408.08333.
- K. Thórisson and H. Helgasson, “Cognitive Architectures and Autonomy: A Comparative Review,” Journal of Artificial General Intelligence, vol. 3, no. 2, pp. 1–30, May 2012. [CrossRef]
- H. Ye, T. Liu, A. Zhang, W. Hua, and W. Jia, “Cognitive Mirage: A Review of Hallucinations in Large Language Models,” Sep. 2023, [Online]. Available: http://arxiv.org/abs/2309.06794.
- Y. Bai et al., “Constitutional AI: Harmlessness from AI Feedback,” Dec. 2022, [Online]. Available: http://arxiv.org/abs/2212.08073.
- X. L. Li et al., “Contrastive Decoding: Open-ended Text Generation as Optimization,” Oct. 2022, [Online]. Available: http://arxiv.org/abs/2210.15097.
- W. Sun, Z. Shi, S. Gao, P. Ren, M. de Rijke, and Z. Ren, “Contrastive Learning Reduces Hallucination in Conversations,” Dec. 2022, [Online]. Available: http://arxiv.org/abs/2212.10400.
- Z. Xu, “Context-aware Decoding Reduces Hallucination in Query-focused Summarization,” Dec. 2023, [Online]. Available: http://arxiv.org/abs/2312.14335.
- J. Robinson, C.-Y. Chuang, S. Sra, and S. Jegelka, “Contrastive Learning with Hard Negative Samples,” Oct. 2020, [Online]. Available: http://arxiv.org/abs/2010.04592.
- K. Filippova, “Controlled Hallucinations: Learning to Generate Faithfully from Noisy Data,” Oct. 2020, [Online]. Available: http://arxiv.org/abs/2010.05873.
- S.-Q. Yan, J.-C. Gu, Y. Zhu, and Z.-H. Ling, “Corrective Retrieval Augmented Generation,” Jan. 2024, [Online]. Available: http://arxiv.org/abs/2401.15884.
- Z. Gou et al., “CRITIC: Large Language Models Can Self-Correct with Tool-Interactive Critiquing,” May 2023, [Online]. Available: http://arxiv.org/abs/2305.11738.
- M. Lango and O. Dušek, “Critic-Driven Decoding for Mitigating Hallucinations in Data-to-text Generation,” 2023. [Online]. Available: https://arxiv.org/abs/2310.16964.
- A. P. Gema et al., “DeCoRe: Decoding by Contrasting Retrieval Heads to Mitigate Hallucinations,” Oct. 2024, [Online]. Available: http://arxiv.org/abs/2410.18860.
- K. Lee et al., “Deduplicating Training Data Makes Language Models Better,” Jul. 2021, [Online]. Available: http://arxiv.org/abs/2107.06499.
- P. Christiano, J. Leike, T. B. Brown, M. Martic, S. Legg, and D. Amodei, “Deep reinforcement learning from human preferences,” Jun. 2017, [Online]. Available: http://arxiv.org/abs/1706.03741.
- S. Jha, S. K. Jha, P. Lincoln, N. D. Bastian, A. Velasquez, and S. Neema, “Dehallucinating Large Language Models Using Formal Methods Guided Iterative Prompting,” in Proceedings - 2023 IEEE International Conference on Assured Autonomy, ICAA 2023, Institute of Electrical and Electronics Engineers Inc., 2023, pp. 149–152. [CrossRef]
- C. P. Huang and H.-Y. Chen, “Delta -- Contrastive Decoding Mitigates Text Hallucinations in Large Language Models,” Feb. 2025, [Online]. Available: http://arxiv.org/abs/2502.05825.
- V. Karpukhin et al., “Dense Passage Retrieval for Open-Domain Question Answering,” 2020. [Online]. Available: https://arxiv.org/abs/2004.04906.
- D. Dale, E. Voita, L. Barrault, and M. R. Costa-jussà, “Detecting and Mitigating Hallucinations in Machine Translation: Model Internal Workings Alone Do Well, Sentence Similarity Even Better,” Dec. 2022, [Online]. Available: http://arxiv.org/abs/2212.08597.
- Y. Qiu, Y. Ziser, A. Korhonen, E. M. Ponti, and S. B. Cohen, “Detecting and Mitigating Hallucinations in Multilingual Summarisation,” May 2023, [Online]. Available: http://arxiv.org/abs/2305.13632.
- W. Wu, Y. Cao, N. Yi, R. Ou, and Z. Zheng, “Detecting and Reducing the Factual Hallucinations of Large Language Models with Metamorphic Testing,” Proceedings of the ACM on Software Engineering, vol. 2, no. FSE, pp. 1432 – 1453, 2025, [Online]. Available: https://dl.acm.org/doi/pdf/10.1145/3715784.
- C. Zhou et al., “Detecting Hallucinated Content in Conditional Neural Sequence Generation,” 2021. [Online]. Available: https://arxiv.org/abs/2011.02593.
- S. Farquhar, J. Kossen, L. Kuhn, and Y. Gal, “Detecting hallucinations in large language models using semantic entropy,” Nature, vol. 630, no. 8017, pp. 625–630, Jun. 2024. [CrossRef]
- K. Chen, Q. Chen, J. Zhou, Y. He, and L. He, “DiaHalu: A Dialogue-level Hallucination Evaluation Benchmark for Large Language Models,” Mar. 2024, [Online]. Available: http://arxiv.org/abs/2403.00896.
- E. Razumovskaia et al., “Dial BEINFO for Faithfulness: Improving Factuality of Information-Seeking Dialogue via Behavioural Fine-Tuning,” Nov. 2023, [Online]. Available: http://arxiv.org/abs/2311.09800.
- E. Perez et al., “Discovering Language Model Behaviors with Model-Written Evaluations,” Dec. 2022, [Online]. Available: http://arxiv.org/abs/2212.09251.
- S. CH-Wang, B. van Durme, J. Eisner, and C. Kedzie, “Do Androids Know They’re Only Dreaming of Electric Sheep?” Dec. 2023, [Online]. Available: http://arxiv.org/abs/2312.17249.
- Agrawal, M. Suzgun, L. Mackey, and A. T. Kalai, “Do Language Models Know When They’re Hallucinating References?” May 2023, [Online]. Available: http://arxiv.org/abs/2305.18248.
- Z. Yin, Q. Sun, Q. Guo, J. Wu, X. Qiu, and X. Huang, “Do Large Language Models Know What They Don’t Know?” 2023. [Online]. Available: https://github.com/yinzhangyue/SelfAware.
- Z. Gekhman et al., “Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations?” May 2024, [Online]. Available: http://arxiv.org/abs/2405.05904.
- Y.-S. Chuang, Y. Xie, H. Luo, Y. Kim, J. Glass, and P. He, “DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models,” Sep. 2023, [Online]. Available: http://arxiv.org/abs/2309.03883.
- N. Li, Y. Li, Y. Liu, L. Shi, K. Wang, and H. Wang, “Drowzee: Metamorphic Testing for Fact-Conflicting Hallucination Detection in Large Language Models,” May 2024. [CrossRef]
- J. Wei et al., “Emergent Abilities of Large Language Models,” Jun. 2022, [Online]. Available: http://arxiv.org/abs/2206.07682.
- A. M. Garcia-Carmona, M.-L. Prieto, E. Puertas, and J.-J. Beunza, “Enhanced medical data extraction: leveraging LLMs for accurate retrieval of patient information from medical reports,” JMIR AI, Nov. 2024. [CrossRef]
- Wang, Y. Zhao, Y. Liu, and H. Zhu, “Enhancing Latent Diffusion in Large Language Models for High-Quality Implicit Neural Representations with Reduced Hallucinations,” 2024. Accessed: Jun. 29, 2025. [Online]. Available: https://osf.io/preprints/osf/9utwy_v1.
- S. Behore, L. Dumont, and J. Venkataraman, “Enhancing Reliability in Large Language Models: Self-Detection of Hallucinations With Spontaneous Self-Checks.” Sep. 09, 2024. [CrossRef]
- T. Zhang et al., “Enhancing Uncertainty-Based Hallucination Detection with Stronger Focus,” Nov. 2023, [Online]. Available: http://arxiv.org/abs/2311.13230.
- S. Longpre et al., “Entity-Based Knowledge Conflicts in Question Answering,” 2021. Accessed: Aug. 03, 2025. [Online]. Available: https://arxiv.org/abs/2109.05052.
- F. Nan et al., “Entity-level Factual Consistency of Abstractive Text Summarization,” Feb. 2021, [Online]. Available: http://arxiv.org/abs/2102.09130.
- M. Chen et al., “Evaluating Large Language Models Trained on Code,” Jul. 2021, [Online]. Available: http://arxiv.org/abs/2107.03374.
- Q. Cheng et al., “Evaluating Hallucinations in Chinese Large Language Models,” Oct. 2023, [Online]. Available: http://arxiv.org/abs/2310.03368.
- W. Kryściński, B. McCann, C. Xiong, and R. Socher, “Evaluating the Factual Consistency of Abstractive Text Summarization,” Oct. 2019, [Online]. Available: http://arxiv.org/abs/1910.12840.
- H. Kang, J. Ni, and H. Yao, “Ever: Mitigating Hallucination in Large Language Models through Real-Time Verification and Rectification,” Nov. 2023, [Online]. Available: http://arxiv.org/abs/2311.09114.
- F. Liu et al., “Exploring and Evaluating Hallucinations in LLM-Powered Code Generation,” Apr. 2024, [Online]. Available: http://arxiv.org/abs/2404.00971.
- I.-C. Chern et al., “FacTool: Factuality Detection in Generative AI -- A Tool Augmented Framework for Multi-Task and Multi-Domain Scenarios,” Jul. 2023, [Online]. Available: http://arxiv.org/abs/2307.13528.
- D. Wan and M. Bansal, “FactPEGASUS: Factuality-Aware Pre-training and Fine-tuning for Abstractive Summarization,” May 2022, [Online]. Available: http://arxiv.org/abs/2205.07830.
- S. Min et al., “FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation,” May 2023, [Online]. Available: http://arxiv.org/abs/2305.14251.
- P. Roit et al., “Factually Consistent Summarization via Reinforcement Learning with Textual Entailment Feedback,” May 2023, [Online]. Available: http://arxiv.org/abs/2306.00186.
- N. Lee et al., “Factuality Enhanced Language Models for Open-Ended Text Generation,” Jun. 2022, [Online]. Available: http://arxiv.org/abs/2206.04624.
- S. Chen et al., “FELM: Benchmarking Factuality Evaluation of Large Language Models,” Nov. 2023, [Online]. Available: http://arxiv.org/abs/2310.00741.
- J. Thorne, A. Vlachos, C. Christodoulopoulos, and A. Mittal, “FEVER: a large-scale dataset for Fact Extraction and VERification,” Mar. 2018, [Online]. Available: http://arxiv.org/abs/1803.05355.
- R. Li, Z. Luo, and X. Du, “FG-PRM: Fine-grained Hallucination Detection and Mitigation in Language Model Mathematical Reasoning,” Oct. 2024, [Online]. Available: http://arxiv.org/abs/2410.06304.
- A. Mishra et al., “Fine-grained Hallucination Detection and Editing for Language Models,” Jan. 2024, [Online]. Available: http://arxiv.org/abs/2401.06855.
- K. Tian, E. Mitchell, H. Yao, C. D. Manning, and C. Finn, “Fine-tuning Language Models for Factuality,” Nov. 2023, [Online]. Available: http://arxiv.org/abs/2311.08401.
- Y. Hu, L. Gan, W. Xiao, K. Kuang, and F. Wu, “Fine-tuning Large Language Models for Improving Factuality in Legal Question Answering,” Jan. 2025, [Online]. Available: http://arxiv.org/abs/2501.06521.
- S.-C. Lin et al., “FLAME: Factuality-Aware Alignment for Large Language Models,” May 2024, [Online]. Available: http://arxiv.org/abs/2405.01525.
- F. F. Bayat et al., “FLEEK: Factual Error Detection and Correction with Evidence Retrieved from External Knowledge,” Oct. 2023, [Online]. Available: http://arxiv.org/abs/2310.17119.
- T. Vu et al., “FreshLLMs: Refreshing Large Language Models with Search Engine Augmentation,” Oct. 2023, [Online]. Available: http://arxiv.org/abs/2310.03214.
- F. Leiser, S. Eckhardt, M. Knaeble, A. Maedche, G. Schwabe, and A. Sunyaev, “From ChatGPT to FactGPT: A Participatory Design Study to Mitigate the Effects of Large Language Model Hallucinations on Users,” in ACM International Conference Proceeding Series, Association for Computing Machinery, Sep. 2023, pp. 81–90. [CrossRef]
- X. He et al., “G-Retriever: Retrieval-Augmented Generation for Textual Graph Understanding and Question Answering,” May 2024, [Online]. Available: http://arxiv.org/abs/2402.07630.
- D. Muhlgay et al., “Generating Benchmarks for Factuality Evaluation of Language Models,” Jul. 2023, [Online]. Available: http://arxiv.org/abs/2307.06908.
- S. Welleck et al., “Generating Sequences by Learning to Self-Correct,” Oct. 2022, [Online]. Available: http://arxiv.org/abs/2211.00053.
- OpenAI et al., “GPT-4 Technical Report,” Mar. 2023, [Online]. Available: http://arxiv.org/abs/2303.08774.
- M. Besta et al., “Graph of Thoughts: Solving Elaborate Problems with Large Language Models,” Aug. 2023. [CrossRef]
- S. Sherif, D. Saad, S. Silva, and V. Gomes, “Graph-Enhanced RAG: A Survey of Methods, Architectures, and Performance,” 2025. [Online]. Available: https://www.researchgate.net/publication/393193258.
- M. Barry et al., “GraphRAG: Leveraging Graph-Based Efficiency to Minimize Hallucinations in LLM-Driven RAG for Finance Data,” 2025. [Online]. Available: https://aclanthology.org/2025.genaik-1.6.pdf.
- Köksal, R. Aksitov, and C.-C. Chang, “Hallucination Augmented Recitations for Language Models,” Nov. 2023, [Online]. Available: http://arxiv.org/abs/2311.07424.
- Y. Chen et al., “Hallucination Detection: Robustly Discerning Reliable Answers in Large Language Models,” in International Conference on Information and Knowledge Management, Proceedings, Association for Computing Machinery, Oct. 2023, pp. 245–255. Available: https://arxiv.org/pdf/2407.04121.
- J. Luo, T. Li, D. Wu, M. Jenkin, S. Liu, and G. Dudek, “Hallucination Detection and Hallucination Mitigation: An Investigation,” Jan. 2024, [Online]. Available: http://arxiv.org/abs/2401.08358.
- Y. Xia et al., “Hallucination Diversity-Aware Active Learning for Text Summarization,” Apr. 2024, [Online]. Available: http://arxiv.org/abs/2404.01588.
- Z. Xu, S. Jain, and M. Kankanhalli, “Hallucination is Inevitable: An Innate Limitation of Large Language Models,” Jan. 2024, [Online]. Available: http://arxiv.org/abs/2401.11817.
- W. Zhang and J. Zhang, “Hallucination Mitigation for Retrieval-Augmented Large Language Models: A Review,” Mathematics, vol. 13, no. 5. Multidisciplinary Digital Publishing Institute (MDPI), Mar. 01, 2025. [CrossRef]
- D. Gosmar and D. A. Dahl, “Hallucination Mitigation using Agentic AI Natural Language-Based Frameworks,” Jan. 2025, [Online]. Available: http://arxiv.org/abs/2501.13946.
- Z. Bai et al., “Hallucination of Multimodal Large Language Models: A Survey,” Apr. 2024, [Online]. Available: http://arxiv.org/abs/2404.18930.
- T. Rehman, R. Mandal, A. Agarwal, and D. K. Sanyal, “Hallucination Reduction in Long Input Text Summarization,” Sep. 2023, [Online]. Available: http://arxiv.org/abs/2309.16781.
- V. Magesh, F. Surani, M. Dahl, M. Suzgun, C. D. Manning, and D. E. Ho, “Hallucination-Free? Assessing the Reliability of Leading AI Legal Research Tools,” May 2024, [Online]. Available: http://arxiv.org/abs/2405.20362.
- G. P. Reddy, Y. v. Pavan Kumar, and K. P. Prakash, “Hallucinations in Large Language Models (LLMs),” in 2024 IEEE Open Conference of Electrical, Electronic and Information Sciences, eStream 2024 - Proceedings, Institute of Electrical and Electronics Engineers Inc., 2024. [CrossRef]
- G. Perković, A. Drobnjak, and I. Botički, “Hallucinations in LLMs: Understanding and Addressing Challenges,” in 2024 47th ICT and Electronics Convention, MIPRO 2024 - Proceedings, Institute of Electrical and Electronics Engineers Inc., 2024, pp. 2084–2088. [CrossRef]
- B. Paudel, A. Lyzhov, P. Joshi, and P. Anand, “HalluciNot: Hallucination Detection Through Context and Common Knowledge Verification,” Apr. 2025, [Online]. Available: http://arxiv.org/abs/2504.07069.
- Y. Bang et al., “HalluLens: LLM Hallucination Benchmark,” Apr. 2025, [Online]. Available: http://arxiv.org/abs/2504.17550.
- M. Elaraby et al., “Halo: Estimation and Reduction of Hallucinations in Open-Source Weak Large Language Models,” Aug. 2023, [Online]. Available: http://arxiv.org/abs/2308.11764.
- Ravichander, S. Ghela, D. Wadden, and Y. Choi, “HALoGEN: Fantastic LLM Hallucinations and Where to Find Them,” Jan. 2025, [Online]. Available: http://arxiv.org/abs/2501.08292.
- Z. Zhu, Y. Yang, and Z. Sun, “HaluEval-Wild: Evaluating Hallucinations of Language Models in the Wild,” Mar. 2024, [Online]. Available: http://arxiv.org/abs/2403.04307.
- J. Li, X. Cheng, W. X. Zhao, J.-Y. Nie, and J.-R. Wen, “HaluEval: A Large-Scale Hallucination Evaluation Benchmark for Large Language Models,” May 2023, [Online]. Available: http://arxiv.org/abs/2305.11747.
- K. Sun, Y. E. Xu, H. Zha, Y. Liu, and X. L. Dong, “Head-to-Tail: How Knowledgeable are Large Language Models (LLMs)? A.K.A. Will LLMs Replace Knowledge Graphs?”, Aug. 2023, [Online]. Available: http://arxiv.org/abs/2308.10168.
- F. Leiser et al., “HILL: A Hallucination Identifier for Large Language Models,” in Conference on Human Factors in Computing Systems - Proceedings, Association for Computing Machinery, May 2024. [CrossRef]
- Z. Yang et al., “HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering,” Association for Computational Linguistics, 2018. [Online]. Available: https://arxiv.org/abs/1809.09600.
- M. Zhang, O. Press, W. Merrill, A. Liu, and N. A. Smith, “How Language Model Hallucinations Can Snowball,” May 2023, [Online]. Available: http://arxiv.org/abs/2305.13534.
- L. Pacchiardi et al., “How to Catch an AI Liar: Lie Detection in Black-Box LLMs by Asking Unrelated Questions,” Sep. 2023, [Online]. Available: http://arxiv.org/abs/2309.15840.
- C. S. Mala, G. Gezici, and F. Giannotti, “Hybrid Retrieval for Hallucination Mitigation in Large Language Models: A Comparative Analysis,” Feb. 2025, [Online]. Available: http://arxiv.org/abs/2504.05324.
- J. Wu et al., “Improve Decoding Factuality by Token-wise Cross Layer Entropy of Large Language Models,” 2025. [Online]. Available: https://arxiv.org/abs/2502.03199.
- Y. Du, S. Li, A. Torralba, J. B. Tenenbaum, and I. Mordatch, “Improving Factuality and Reasoning in Language Models through Multiagent Debate,” May 2023, [Online]. Available: http://arxiv.org/abs/2305.14325.
- I.-C. Chern, Z. Wang, S. Das, B. Sharma, P. Liu, and G. Neubig, “Improving Factuality of Abstractive Summarization via Contrastive Reward Learning,” Jul. 2023, [Online]. Available: http://arxiv.org/abs/2307.04507.
- Y. Zhao et al., “Improving the Robustness of Large Language Models via Consistency Alignment,” Mar. 2024, [Online]. Available: http://arxiv.org/abs/2403.14221.
- S. Chen et al., “In-Context Sharpness as Alerts: An Inner Representation Perspective for Hallucination Mitigation,” Mar. 2024, [Online]. Available: http://arxiv.org/abs/2403.01548.
- K. Li, O. Patel, F. Viégas, H. Pfister, and M. Wattenberg, “Inference-Time Intervention: Eliciting Truthful Answers from a Language Model,” Jun. 2023, [Online]. Available: http://arxiv.org/abs/2306.03341.
- C. Chen et al., “INSIDE: LLMs’ Internal States Retain the Power of Hallucination Detection,” Feb. 2024, [Online]. Available: http://arxiv.org/abs/2402.03744.
- K. Yin and G. Neubig, “Interpreting Language Models with Contrastive Explanations,” Feb. 2022, [Online]. Available: http://arxiv.org/abs/2202.10419.
- Y. Yehuda, I. Malkiel, O. Barkan, J. Weill, R. Ronen, and N. Koenigstein, “InterrogateLLM: Zero-Resource Hallucination Detection in LLM-Generated Answers,” Aug. 2024. [Online]. Available: http://arxiv.org/abs/2403.02889.
- N. Varshney et al., “Investigating and Addressing Hallucinations of LLMs in Tasks Involving Negation,” Jun. 2024, [Online]. Available: http://arxiv.org/abs/2406.05494.
- G. Chrysostomou, Z. Zhao, M. Williams, and N. Aletras, “Investigating Hallucinations in Pruned Large Language Models for Abstractive Summarization,” Nov. 2023, [Online]. Available: https://arxiv.org/pdf/2311.09335.
- H. Wu, X. Li, X. Xu, J. Wu, D. Zhang, and Z. Liu, “Iter-AHMCL: Alleviate Hallucination for Large Language Model via Iterative Model-level Contrastive Learning,” Oct. 2024, [Online]. Available: http://arxiv.org/abs/2410.12130.
- Y. Liu et al., “Jailbreaking ChatGPT via Prompt Engineering: An Empirical Study,” May 2023, [Online]. Available: http://arxiv.org/abs/2305.13860.
- E. Lavrinovics, R. Biswas, J. Bjerva, and K. Hose, “Knowledge Graphs, Large Language Models, and Hallucinations: An NLP Perspective,” Nov. 2024, [Online]. Available: http://arxiv.org/abs/2411.14258.
- D. Dai, L. Dong, Y. Hao, Z. Sui, B. Chang, and F. Wei, “Knowledge Neurons in Pretrained Transformers,” Mar. 2022, [Online]. Available: http://arxiv.org/abs/2104.08696.
- Y. Zhang et al., “Knowledge Overshadowing Causes Amalgamated Hallucination in Large Language Models,” Jul. 2024, [Online]. Available: http://arxiv.org/abs/2407.08039.
- F. Wan, X. Huang, L. Cui, X. Quan, W. Bi, and S. Shi, “Knowledge Verification to Nip Hallucination in the Bud,” Jan. 2024, [Online]. Available: http://arxiv.org/abs/2401.10768.
- T. B. Brown et al., “Language Models are Few-Shot Learners,” May 2020, [Online]. Available: http://arxiv.org/abs/2005.14165.
- R. Thoppilan et al., “LaMDA: Language Models for Dialog Applications,” Jan. 2022, [Online]. Available: http://arxiv.org/abs/2201.08239.
- M. Turpin, J. Michael, E. Perez, and S. R. Bowman, “Language Models Don’t Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting,” May 2023, [Online]. Available: http://arxiv.org/abs/2305.04388.
- S. Kadavath et al., “Language Models (Mostly) Know What They Know,” Nov. 2022, [Online]. Available: http://arxiv.org/abs/2207.05221.
- T. Kojima, S. S. Gu, M. Reid, Y. Matsuo, and Y. Iwasawa, “Large Language Models are Zero-Shot Reasoners,” Jan. 2023, [Online]. Available: http://arxiv.org/abs/2205.11916.
- L. Guo, Y. Fang, F. Chen, P. Liu, and S. Xu, “Large Language Models with Adaptive Token Fusion: A Novel Approach to Reducing Hallucinations and Improving Inference Efficiency.” Oct. 24, 2024. [CrossRef]
- M. Dahl, V. Magesh, M. Suzgun, and D. E. Ho, “Large Legal Fictions: Profiling Legal Hallucinations in Large Language Models,” Journal of Legal Analysis, vol. 16, no. 1, pp. 64–93, 2024. [CrossRef]
- M. Elhoushi et al., “LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding,” 2024. Accessed: Aug. 03, 2025. [Online]. Available: https://arxiv.org/abs/2404.16710.
- Y. Liang, Z. Song, H. Wang, and J. Zhang, “Learning to Trust Your Feelings: Leveraging Self-awareness in LLMs for Hallucination Mitigation,” Jan. 2024, [Online]. Available: http://arxiv.org/abs/2401.15449.
- D. Zhou et al., “Least-to-Most Prompting Enables Complex Reasoning in Large Language Models,” May 2022, [Online]. Available: http://arxiv.org/abs/2205.10625.
- N. Guha et al., “LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models,” Aug. 2023, [Online]. Available: http://arxiv.org/abs/2308.11462.
- H. Lightman et al., “Let’s Verify Step by Step,” May 2023, [Online]. Available: http://arxiv.org/abs/2305.20050.
- N. Nonkes, S. Agaronian, E. Kanoulas, and R. Petcu, “Leveraging Graph Structures to Detect Hallucinations in Large Language Models,” Jul. 2024, [Online]. Available: http://arxiv.org/abs/2407.04485.
- M. Son, J. Jang, and M. Kim, “Lightweight Query Checkpoint: Classifying Faulty User Queries to Mitigate Hallucinations in Large Language Model Question Answering,” Jul. 2025. Accessed: Aug. 04, 2025. [Online]. Available: https://openreview.net/pdf?id=n9C8u6tpT4.
- J. He, Y. Gong, K. Chen, Z. Lin, C. Wei, and Y. Zhao, “LLM Factoscope: Uncovering LLMs’ Factual Discernment through Inner States Analysis,” Dec. 2023, [Online]. Available: http://arxiv.org/abs/2312.16374.
- Z. Zhang, Y. Wang, C. Wang, J. Chen, and Z. Zheng, “LLM Hallucinations in Practical Code Generation: Phenomena, Mechanism, and Mitigation,” Sep. 2024, [Online]. Available: http://arxiv.org/abs/2409.20550.
- Z. Ji et al., “LLM Internal States Reveal Hallucination Risk Faced With a Query,” Jul. 2024, [Online]. Available: http://arxiv.org/abs/2407.03282.
- J.-Y. Yao, K.-P. Ning, Z.-H. Liu, M.-N. Ning, Y.-Y. Liu, and L. Yuan, “LLM Lies: Hallucinations are not Bugs, but Features as Adversarial Examples,” Oct. 2023, [Online]. Available: http://arxiv.org/abs/2310.01469.
- P. Laban et al., “LLMs as Factual Reasoners: Insights from Existing Benchmarks and Beyond,” May 2023, [Online]. Available: http://arxiv.org/abs/2305.14540.
- S. Banerjee, A. Agarwal, and S. Singla, “LLMs Will Always Hallucinate, and We Need to Live With This,” 2024. [CrossRef]
- R. Cohen, M. Hamri, M. Geva, and A. Globerson, “LM vs LM: Detecting Factual Errors via Cross Examination,” May 2023, [Online]. Available: http://arxiv.org/abs/2305.13281.
- Y.-S. Chuang, L. Qiu, C.-Y. Hsieh, R. Krishna, Y. Kim, and J. Glass, “Lookback Lens: Detecting and Mitigating Contextual Hallucinations in Large Language Models Using Only Attention Maps,” Oct. 2024, [Online]. Available: http://arxiv.org/abs/2407.07071.
- N. M. Guerreiro, E. Voita, and A. F. T. Martins, “Looking for a Needle in a Haystack: A Comprehensive Study of Hallucinations in Neural Machine Translation,” Aug. 2022, [Online]. Available: http://arxiv.org/abs/2208.05309.
- T. Gao, A. Fisch, and D. Chen, “Making Pre-trained Language Models Better Few-shot Learners,” Jun. 2021, [Online]. Available: http://arxiv.org/abs/2012.15723.
- O. Yoran, T. Wolfson, O. Ram, and J. Berant, “Making Retrieval-Augmented Language Models Robust to Irrelevant Context,” Oct. 2023, [Online]. Available: http://arxiv.org/abs/2310.01558.
- A. Gundogmusler, F. Bayindiroglu, and M. Karakucukoglu, “Mathematical Foundations of Hallucination in Transformer-Based Large Language Models for Improvisation.” Jun. 24, 2024. [CrossRef]
- K. Li et al., “Measuring and Controlling Instruction (In)Stability in Language Model Dialogs,” Feb. 2024, [Online]. Available: http://arxiv.org/abs/2402.10962.
- J. Wei, Y. Yao, J.-F. Ton, H. Guo, A. Estornell, and Y. Liu, “Measuring and Reducing LLM Hallucination without Gold-Standard Answers,” Feb. 2024, [Online]. Available: http://arxiv.org/abs/2402.10412.
- A. Shrivastava, J. Hullman, and M. Lamparth, “Measuring Free-Form Decision-Making Inconsistency of Language Models in Military Crisis Simulations,” Oct. 2024, [Online]. Available: http://arxiv.org/abs/2410.13204.
- L. Yu, M. Cao, J. C. K. Cheung, and Y. Dong, “Mechanistic Understanding and Mitigation of Language Model Non-Factual Hallucinations,” Jun. 2024, [Online]. Available: http://arxiv.org/abs/2403.18167.
- Y. Kim et al., “Medical Hallucinations in Foundation Models and Their Impact on Healthcare,” Feb. 2025, [Online]. Available: http://arxiv.org/abs/2503.05777.
- M. Suzgun and A. T. Kalai, “Meta-Prompting: Enhancing Language Models with Task-Agnostic Scaffolding,” Jan. 2024, [Online]. Available: http://arxiv.org/abs/2401.12954.
- A. Bilal, M. A. Mohsin, M. Umer, M. A. K. Bangash, and M. A. Jamshed, “Meta-Thinking in LLMs via Multi-Agent Reinforcement Learning: A Survey,” Apr. 2025, [Online]. Available: http://arxiv.org/abs/2504.14520.
- S. Banerjee and A. Lavie, “METEOR: An automatic metric for MT evaluation with improved correlation with human judgments,” in Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization, Ann Arbor, MI, USA, Jun. 2005, pp. 65–72. [Online]. Available: https://aclanthology.org/W05-0909/.
- W. Liu et al., “Mind’s Mirror: Distilling Self-Evaluation Capability and Comprehensive Thinking from Large Language Models,” Nov. 2023, [Online]. Available: http://arxiv.org/abs/2311.09214.
- S. Fairburn and J. Ainsworth, “Mitigate Large Language Model Hallucinations with Probabilistic Inference in Graph Neural Networks.” Jul. 01, 2024. [CrossRef]
- W. Su, Y. Tang, Q. Ai, C. Wang, Z. Wu, and Y. Liu, “Mitigating Entity-Level Hallucination in Large Language Models,” in Proceedings of the 2024 Annual International ACM SIGIR Conference on Research and Development in Information Retrieval in the Asia Pacific Region, New York, NY, USA: ACM, Dec. 2024, pp. 23–31. [CrossRef]
- A. Braverman, W. Zhang, and Q. Gu, “Mitigating Hallucination in Large Language Models with Explanatory Prompting,” 2024, [Online]. Available: https://neurips.cc/virtual/2024/105546.
- M. Grayson, C. Patterson, B. Goldstein, S. Ivanov, and M. Davidson, “Mitigating Hallucinations in Large Language Models using a Channel-Aware Domain-Adaptive Generative Adversarial Network (CADAGAN).” Sep. 30, 2024. [CrossRef]
- Z. Tang, R. Chatterjee, and S. Garg, “Mitigating Hallucinated Translations in Large Language Models with Hallucination-focused Preference Optimization,” Jan. 2025, [Online]. Available: http://arxiv.org/abs/2501.17295.
- F. Harrington, E. Rosenthal, and M. Swinburne, “Mitigating Hallucinations in Large Language Models with Sliding Generation and Self-Checks.” Aug. 06, 2024. [CrossRef]
- X. Guan et al., “Mitigating Large Language Model Hallucinations via Autonomous Knowledge Graph-Based Retrofitting,” 2024. [Online]. Available: http://arxiv.org/abs/2311.13314.
- M. Hu, B. He, Y. Wang, L. Li, C. Ma, and I. King, “Mitigating Large Language Model Hallucination with Faithful Finetuning,” Jun. 2024, [Online]. Available: http://arxiv.org/abs/2406.11267.
- H. Li, G. Appleby, K. Alperin, S. R. Gomez, and A. Suh, “Mitigating LLM Hallucinations with Knowledge Graphs: A Case Study,” Apr. 2025, [Online]. Available: http://arxiv.org/abs/2504.12422.
- J. Pfeiffer, F. Piccinno, M. Nicosia, X. Wang, M. Reid, and S. Ruder, “mmT5: Modular Multilingual Pre-Training Solves Source Language Hallucinations,” May 2023, [Online]. Available: http://arxiv.org/abs/2305.14224.
- K. He, H. Fan, Y. Wu, S. Xie, and R. Girshick, “Momentum Contrast for Unsupervised Visual Representation Learning,” Nov. 2019, [Online]. Available: http://arxiv.org/abs/1911.05722.
- D. Huh and P. Mohapatra, “Multi-agent Reinforcement Learning: A Comprehensive Survey,” Jul. 2024, [Online]. Available: http://arxiv.org/abs/2312.10256.
- L. van der Poel, R. Cotterell, and C. Meister, “Mutual Information Alleviates Hallucinations in Abstractive Summarization,” Oct. 2022, [Online]. Available: http://arxiv.org/abs/2210.13210.
- T. Kwiatkowski et al., “Natural Questions: A Benchmark for Question Answering Research,” Transactions of the Association for Computational Linguistics, vol. 7, 2019. [CrossRef]
- D. Bahdanau, K. Cho, and Y. Bengio, “Neural Machine Translation by Jointly Learning to Align and Translate,” Sep. 2014, [Online]. Available: http://arxiv.org/abs/1409.0473.
- N. Dziri, A. Madotto, O. Zaiane, and A. J. Bose, “Neural Path Hunter: Reducing Hallucination in Dialogue Systems via Path Grounding,” Apr. 2021, [Online]. Available: http://arxiv.org/abs/2104.08455.
- J. Hoscilowicz et al., “Non-Linear Inference Time Intervention: Improving LLM Truthfulness,” Mar. 2024, [Online]. Available: http://arxiv.org/abs/2403.18680.
- J. Maynez, S. Narayan, B. Bohnet, and R. McDonald, “On Faithfulness and Factuality in Abstractive Summarization,” May 2020, [Online]. Available: http://arxiv.org/abs/2005.00661.
- Y. Xiao and W. Y. Wang, “On Hallucination and Predictive Uncertainty in Conditional Language Generation,” Mar. 2021, [Online]. Available: http://arxiv.org/abs/2103.15025.
- A. Jiang et al., “On Large Language Models’ Hallucination with Regard to Known Facts,” Mar. 2024, [Online]. Available: http://arxiv.org/abs/2403.20009.
- L. Parcalabescu and A. Frank, “On Measuring Faithfulness or Self-consistency of Natural Language Explanations,” Nov. 2023, [Online]. Available: http://arxiv.org/abs/2311.07466.
- N. Dziri, S. Milton, M. Yu, O. Zaiane, and S. Reddy, “On the Origin of Hallucinations in Conversational Models: Is it the Datasets or the Models?” Apr. 2022, [Online]. Available: http://arxiv.org/abs/2204.07931.
- N. Shazeer et al., “Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer,” Jan. 2017, [Online]. Available: http://arxiv.org/abs/1701.06538.
- R. Chen, A. Arditi, H. Sleight, O. Evans, and J. Lindsey, “Persona Vectors: Monitoring and Controlling Character Traits in Language Models,” Jul. 2025, [Online]. Available: http://arxiv.org/abs/2507.21509.
- N. Joshi, J. Rando, A. Saparov, N. Kim, and H. He, “Personas as a Way to Model Truthfulness in Language Models,” Oct. 2023, [Online]. Available: http://arxiv.org/abs/2310.18168.
- A. Zhu et al., “PoLLMgraph: Unraveling Hallucinations in Large Language Models via State Transition Dynamics,” Apr. 2024, [Online]. Available: http://arxiv.org/abs/2404.04722.
- J. N. Yan et al., “Predicting Text Preference Via Structured Comparative Reasoning,” Nov. 2023, [Online]. Available: http://arxiv.org/abs/2311.08390.
- X. L. Li and P. Liang, “Prefix-Tuning: Optimizing Continuous Prompts for Generation,” Jan. 2021, [Online]. Available: http://arxiv.org/abs/2101.00190.
- Z. Sun et al., “Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision,” May 2023, [Online]. Available: http://arxiv.org/abs/2305.03047.
- S. Cao et al., “Probabilistic Tree-of-thought Reasoning for Answering Knowledge-intensive Complex Questions,” Nov. 2023, [Online]. Available: http://arxiv.org/abs/2311.13982.
- C. Si et al., “Prompting GPT-3 To Be Reliable,” Oct. 2022, [Online]. Available: http://arxiv.org/abs/2210.09150.
- E. Y. Chang, “Prompting Large Language Models With the Socratic Method,” Mar. 2023, [Online]. Available: http://arxiv.org/abs/2303.08769.
- Q. Jin, B. Dhingra, Z. Liu, W. W. Cohen, and X. Lu, “PubMedQA: A Dataset for Biomedical Research Question Answering,” Sep. 2019, [Online]. Available: http://arxiv.org/abs/1909.06146.
- A. Chen, P. Pasupat, S. Singh, H. Lee, and K. Guu, “PURR: Efficiently Editing Language Model Hallucinations by Denoising Language Model Corruptions,” May 2023, [Online]. Available: http://arxiv.org/abs/2305.14908.
- O. Honovich, L. Choshen, R. Aharoni, E. Neeman, I. Szpektor, and O. Abend, “Q2: Evaluating Factual Consistency in Knowledge-Grounded Dialogues via Question Generation and Question Answering,” Apr. 2021, [Online]. Available: http://arxiv.org/abs/2104.08202.
- L. Du et al., “Quantifying and Attributing the Hallucination of Large Language Models via Association Analysis,” Sep. 2023, [Online]. Available: http://arxiv.org/abs/2309.05217.
- N. Carlini, D. Ippolito, M. Jagielski, K. Lee, F. Tramer, and C. Zhang, “Quantifying Memorization Across Neural Language Models,” Feb. 2022, [Online]. Available: http://arxiv.org/abs/2202.07646.
- H. Zhang et al., “R-Tuning: Instructing Large Language Models to Say ‘I Don’t Know’,” Nov. 2023, [Online]. Available: http://arxiv.org/abs/2311.09677.
- H. Q. Yu and F. McQuade, “RAG-KG-IL: A Multi-Agent Hybrid Framework for Reducing Hallucinations and Enhancing LLM Reasoning through RAG and Incremental Knowledge Graph Learning Integration,” Mar. 2025, [Online]. Available: http://arxiv.org/abs/2503.13514.
- C. Niu et al., “RAGTruth: A Hallucination Corpus for Developing Trustworthy Retrieval-Augmented Language Models,” Dec. 2023, [Online]. Available: http://arxiv.org/abs/2401.00396.
- J. J. Ross, E. Khramtsova, A. van der Vegt, B. Koopman, and G. Zuccon, “RARR Unraveled: Component-Level Insights into Hallucination Detection and Mitigation,” in SIGIR 2025 - Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval, Association for Computing Machinery, Inc, Jul. 2025, pp. 3286–3295. [CrossRef]
- L. Gao et al., “RARR: Researching and Revising What Language Models Say, Using Language Models,” Oct. 2022, [Online]. Available: http://arxiv.org/abs/2210.08726.
- D. Su et al., “Read before Generate! Faithful Long Form Question Answering with Machine Reading,” 2022. Accessed: May 12, 2025. [Online]. Available: https://arxiv.org/abs/2203.00343.
- S. Hao et al., “Reasoning with Language Model is Planning with World Model,” May 2023, [Online]. Available: http://arxiv.org/abs/2305.14992.
- E. Berberette, J. Hutchins, and A. Sadovnik, “Redefining ‘Hallucination’ in LLMs: Towards a psychology-informed framework for mitigating misinformation,” Feb. 2024, [Online]. Available: http://arxiv.org/abs/2402.01769.
- S. Suzuoki and K. Hatano, “Reducing Hallucinations in Large Language Models: A Consensus Voting Approach Using Mixture of Experts.” Jun. 24, 2024. [CrossRef]
- Y. Liu et al., “Reducing hallucinations of large language models via hierarchical semantic piece,” Complex and Intelligent Systems, vol. 11, no. 5, May 2025. [CrossRef]
- S. Verma, K. Tran, Y. Ali, and G. Min, “Reducing LLM Hallucinations using Epistemic Neural Networks,” Dec. 2023, [Online]. Available: http://arxiv.org/abs/2312.15576.
- Z. Zhao, S. B. Cohen, and B. Webber, “Reducing Quantity Hallucinations in Abstractive Summarization,” Sep. 2020, [Online]. Available: http://arxiv.org/abs/2009.13312.
- X. Hu et al., “RefChecker: Reference-based Fine-grained Hallucination Checker and Benchmark for Large Language Models,” May 2024, [Online]. Available: http://arxiv.org/abs/2405.14486.
- T. Yan and T. Xu, “Refining the Responses of LLMs by Themselves,” May 2023, [Online]. Available: http://arxiv.org/abs/2305.04039.
- N. Shinn, F. Cassano, E. Berman, A. Gopinath, K. Narasimhan, and S. Yao, “Reflexion: Language Agents with Verbal Reinforcement Learning,” Mar. 2023, [Online]. Available: http://arxiv.org/abs/2303.11366.
- R. S. Sutton and A. G. Barto, “Reinforcement Learning: An Introduction.” 2015, [Online]. Available: https://web.stanford.edu/class/psych209/Readings/SuttonBartoIPRLBook2ndEd.pdf.
- W. Shi et al., “REPLUG: Retrieval-Augmented Black-Box Language Models,” Jan. 2023, [Online]. Available: http://arxiv.org/abs/2301.12652.
- A. Zou et al., “Representation Engineering: A Top-Down Approach to AI Transparency,” Oct. 2023, [Online]. Available: http://arxiv.org/abs/2310.01405.
- K. Shuster, S. Poff, M. Chen, D. Kiela, and J. Weston, “Retrieval Augmentation Reduces Hallucination in Conversation,” Apr. 2021, [Online]. Available: http://arxiv.org/abs/2104.07567.
- J. Feng, Q. Wang, H. Qiu, and L. Liu, “Retrieval In Decoder benefits generative models for explainable complex question answering,” Neural Networks, vol. 181, Jan. 2025. [CrossRef]
- P. Lewis et al., “Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks,” May 2020, [Online]. Available: http://arxiv.org/abs/2005.11401.
- Z. Ji et al., “RHO (ρ): Reducing Hallucination in Open-domain Dialogues with Knowledge Grounding,” May 2023, [Online]. Available: https://arxiv.org/abs/2212.01588.
- H. Lee et al., “RLAIF vs. RLHF: Scaling Reinforcement Learning from Human Feedback with AI Feedback,” Sep. 2023, [Online]. Available: http://arxiv.org/abs/2309.00267.
- K. Yang, D. Klein, A. Celikyilmaz, N. Peng, and Y. Tian, “RLCD: Reinforcement Learning from Contrastive Distillation for Language Model Alignment,” Jul. 2023, [Online]. Available: http://arxiv.org/abs/2307.12950.
- C.-Y. Lin, “ROUGE: A package for automatic evaluation of summaries,” in Proceedings of the ACL Workshop on Text Summarization Branches Out, Barcelona, Spain, Jul. 2004, pp. 74–81.
- H. W. Chung et al., “Scaling Instruction-Finetuned Language Models,” Oct. 2022, [Online]. Available: http://arxiv.org/abs/2210.11416.
- X. Wang et al., “SciBench: Evaluating College-Level Scientific Problem-Solving Abilities of Large Language Models,” Jul. 2023, [Online]. Available: http://arxiv.org/abs/2307.10635.
- P. Wang, Z. Wang, Z. Li, Y. Gao, B. Yin, and X. Ren, “SCOTT: Self-Consistent Chain-of-Thought Distillation,” May 2023, [Online]. Available: http://arxiv.org/abs/2305.01879.
- M. Li, B. Peng, M. Galley, J. Gao, and Z. Zhang, “Self-Checker: Plug-and-Play Modules for Fact-Checking with Large Language Models,” May 2023, [Online]. Available: http://arxiv.org/abs/2305.14623.
- X. Wang et al., “Self-Consistency Improves Chain of Thought Reasoning in Language Models,” May 2023, [Online]. Available: https://arxiv.org/abs/2203.11171.
- N. Mündler, J. He, S. Jenko, and M. Vechev, “Self-contradictory Hallucinations of Large Language Models: Evaluation, Detection and Mitigation,” May 2023, [Online]. Available: http://arxiv.org/abs/2305.15852.
- T. Schick, S. Udupa, and H. Schütze, “Self-Diagnosis and Self-Debiasing: A Proposal for Reducing Corpus-Based Bias in NLP,” Feb. 2021, [Online]. Available: http://arxiv.org/abs/2103.00453.
- Y. Wang et al., “Self-Instruct: Aligning Language Models with Self-Generated Instructions,” Dec. 2022, [Online]. Available: http://arxiv.org/abs/2212.10560.
- A. Asai, Z. Wu, Y. Wang, A. Sil, and H. Hajishirzi, “Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection,” Oct. 2023, [Online]. Available: http://arxiv.org/abs/2310.11511.
- A. Madaan et al., “Self-Refine: Iterative Refinement with Self-Feedback,” Mar. 2023, [Online]. Available: http://arxiv.org/abs/2303.17651.
- P. Manakul, A. Liusie, and M. J. F. Gales, “SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models,” Mar. 2023, [Online]. Available: http://arxiv.org/abs/2303.08896.
- X. Qiu and R. Miikkulainen, “Semantic Density: Uncertainty Quantification for Large Language Models through Confidence Measurement in Semantic Space,” May 2024, [Online]. Available: http://arxiv.org/abs/2405.13845.
- J. Kossen, J. Han, M. Razzak, L. Schut, S. Malik, and Y. Gal, “Semantic Entropy Probes: Robust and Cheap Hallucination Detection in LLMs,” 2024. Accessed: May 03, 2025. [Online]. Available: https://arxiv.org/pdf/2406.15927.
- J. Kai, T. Zhang, H. Hu, and Z. Lin, “SH2: Self-Highlighted Hesitation Helps You Decode More Truthfully,” Jan. 2024, [Online]. Available: http://arxiv.org/abs/2401.05930.
- Z. He, B. Zhang, and L. Cheng, “Shakespearean Sparks: The Dance of Hallucination and Creativity in LLMs’ Decoding Layers,” Mar. 2025, [Online]. Available: http://arxiv.org/abs/2503.02851.
- Y. Zhang et al., “Siren’s Song in the AI Ocean: A Survey on Hallucination in Large Language Models,” Sep. 2023, [Online]. Available: http://arxiv.org/abs/2309.01219.
- H. Nguyen, Z. He, S. A. Gandre, U. Pasupulety, S. K. Shivakumar, and K. Lerman, “Smoothing Out Hallucinations: Mitigating LLM Hallucination with Smoothed Knowledge Distillation,” Feb. 2025, [Online]. Available: http://arxiv.org/abs/2502.11306.
- N. McKenna, T. Li, L. Cheng, M. J. Hosseini, M. Johnson, and M. Steedman, “Sources of Hallucination by Large Language Models on Inference Tasks,” May 2023, [Online]. Available: http://arxiv.org/abs/2305.14552.
- P. Elchafei and M. Abu-Elkheir, “Span-Level Hallucination Detection for LLM-Generated Answers,” Apr. 2025, [Online]. Available: http://arxiv.org/abs/2504.18639.
- Y. Qiu, Z. Zhao, Y. Ziser, A. Korhonen, E. M. Ponti, and S. B. Cohen, “Spectral Editing of Activations for Large Language Model Alignment,” May 2024, [Online]. Available: http://arxiv.org/abs/2405.09719.
- R. Tian, S. Narayan, T. Sellam, and A. P. Parikh, “Sticking to the Facts: Confident Decoding for Faithful Data-to-Text Generation,” Oct. 2019, [Online]. Available: http://arxiv.org/abs/1910.08684.
- B. A. Levinstein and D. A. Herrmann, “Still No Lie Detector for Language Models: Probing Empirical and Conceptual Roadblocks,” Jun. 2023. [CrossRef]
- Z. Ji et al., “Survey of Hallucination in Natural Language Generation,” Jul. 2024, [Online]. Available: https://arxiv.org/pdf/2202.03629.
- E. Jones et al., “Teaching Language Models to Hallucinate Less with Synthetic Tasks,” Oct. 2023, [Online]. Available: http://arxiv.org/abs/2310.06827.
- S. Lin, J. Hilton, and O. Evans, “Teaching Models to Express Their Uncertainty in Words,” May 2022, [Online]. Available: http://arxiv.org/abs/2205.14334.
- V. Raunak, A. Menezes, and M. Junczys-Dowmunt, “The Curious Case of Hallucinations in Neural Machine Translation,” Apr. 2021, [Online]. Available: http://arxiv.org/abs/2104.06683.
- A. Slobodkin, O. Goldman, A. Caciularu, I. Dagan, and S. Ravfogel, “The Curious Case of Hallucinatory (Un)answerability: Finding Truths in the Hidden States of Over-Confident Large Language Models,” Oct. 2023, [Online]. Available: http://arxiv.org/abs/2310.11877.
- A. Holtzman, J. Buys, L. Du, M. Forbes, and Y. Choi, “The Curious Case of Neural Text Degeneration,” Apr. 2019, [Online]. Available: http://arxiv.org/abs/1904.09751.
- J. Li et al., “The Dawn After the Dark: An Empirical Study on Factuality Hallucination in Large Language Models,” Jan. 2024, [Online]. Available: http://arxiv.org/abs/2401.03205.
- R. Xu et al., “The Earth is Flat because...: Investigating LLMs’ Belief towards Misinformation via Persuasive Conversation,” Dec. 2023, [Online]. Available: http://arxiv.org/abs/2312.09085.
- S. Marks and M. Tegmark, “The Geometry of Truth: Emergent Linear Structure in Large Language Model Representations of True/False Datasets,” Oct. 2023, [Online]. Available: http://arxiv.org/abs/2310.06824.
- G. Hong et al., “The Hallucinations Leaderboard -- An Open Effort to Measure Hallucinations in Large Language Models,” Apr. 2024, [Online]. Available: http://arxiv.org/abs/2404.05904.
- A. Azaria and T. Mitchell, “The Internal State of an LLM Knows When It’s Lying,” Apr. 2023, [Online]. Available: http://arxiv.org/abs/2304.13734.
- S. Zhang, L. Pan, J. Zhao, and W. Y. Wang, “The Knowledge Alignment Problem: Bridging Human and External Knowledge for Large Language Models,” May 2023, [Online]. Available: http://arxiv.org/abs/2305.13669.
- K. van Deemter, “The Pitfalls of Defining Hallucination,” Jan. 2024, [Online]. Available: http://arxiv.org/abs/2401.07897.
- B. Lester, R. Al-Rfou, and N. Constant, “The Power of Scale for Parameter-Efficient Prompt Tuning,” Apr. 2021, [Online]. Available: https://arxiv.org/abs/2104.08691.
- V. Rawte et al., “The Troubling Emergence of Hallucination in Large Language Models -- An Extensive Definition, Quantification, and Prescriptive Remediations,” Oct. 2023, [Online]. Available: http://arxiv.org/abs/2310.04988.
- X. Cheng, J. Li, W. X. Zhao, and J.-R. Wen, “Think More, Hallucinate Less: Mitigating Hallucinations via Dual Process of Fast and Slow Thinking,” Jan. 2025, [Online]. Available: http://arxiv.org/abs/2501.01306.
- Y. Qiu, V. Embar, S. B. Cohen, and B. Han, “Think While You Write: Hypothesis Verification Promotes Faithful Knowledge-to-Text Generation,” Nov. 2023, [Online]. Available: http://arxiv.org/abs/2311.09467.
- Z. Ji, T. Yu, Y. Xu, N. Lee, E. Ishii, and P. Fung, “Towards Mitigating Hallucination in Large Language Models via Self-Reflection,” Oct. 2023, [Online]. Available: http://arxiv.org/abs/2310.06271.
- Z. Lin, S. Guan, W. Zhang, H. Zhang, Y. Li, and H. Zhang, “Towards trustworthy LLMs: a review on debiasing and dehallucinating in large language models,” Artificial Intelligence Review, vol. 57, no. 9, Sep. 2024. [CrossRef]
- M. Sharma et al., “Towards Understanding Sycophancy in Language Models,” Oct. 2023, [Online]. Available: http://arxiv.org/abs/2310.13548.
- J. Hoffmann et al., “Training Compute-Optimal Large Language Models,” Mar. 2022, [Online]. Available: http://arxiv.org/abs/2203.15556.
- L. Ouyang et al., “Training language models to follow instructions with human feedback,” Mar. 2022, [Online]. Available: http://arxiv.org/abs/2203.02155.
- P. Feldman, J. R. Foulds, and S. Pan, “Trapping LLM Hallucinations Using Tagged Context Prompts,” Jun. 2023, [Online]. Available: http://arxiv.org/abs/2306.06085.
- S. Yao et al., “Tree of Thoughts: Deliberate Problem Solving with Large Language Models,” May 2023, [Online]. Available: http://arxiv.org/abs/2305.10601.
- M. Joshi, E. Choi, D. S. Weld, and L. Zettlemoyer, “TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension,” May 2017, [Online]. Available: http://arxiv.org/abs/1705.03551.
- Z. Gekhman, J. Herzig, R. Aharoni, C. Elkind, and I. Szpektor, “TrueTeacher: Learning Factual Consistency Evaluation with Large Language Models,” May 2023, [Online]. Available: http://arxiv.org/abs/2305.11171.
- W. Shi, X. Han, M. Lewis, Y. Tsvetkov, L. Zettlemoyer, and S. W. Yih, “Trusting Your Evidence: Hallucinate Less with Context-aware Decoding,” May 2023, [Online]. Available: http://arxiv.org/abs/2305.14739.
- Z. Chen et al., “Truth Forest: Toward Multi-Scale Truthfulness in Large Language Models through Intervention without Tuning,” Dec. 2023, [Online]. Available: http://arxiv.org/abs/2312.17484.
- A. Simhi, I. Itzhak, F. Barez, G. Stanovsky, and Y. Belinkov, “Trust Me, I’m Wrong: High-Certainty Hallucinations in LLMs,” Feb. 2025, [Online]. Available: http://arxiv.org/abs/2502.12964.
- S. Lin, J. Hilton, and O. Evans, “TruthfulQA: Measuring How Models Mimic Human Falsehoods,” in Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), May 2022. [Online]. Available: https://arxiv.org/abs/2109.07958.
- S. Zhang, T. Yu, and Y. Feng, “TruthX: Alleviating Hallucinations by Editing Large Language Models in Truthful Space,” Feb. 2024, [Online]. Available: http://arxiv.org/abs/2402.17811.
- L. Chen, X. Wu, Z. Xiong, and X. Kang, “Two Stage Psychology-Guided Fine-Grained Editing and Sampling Approach for Mitigating Hallucination in Large Language Models,” 2025. Accessed: Aug. 04, 2025. [Online]. Available: https://escholarship.org/uc/item/0gn8m1qq.
- X. Liang et al., “UHGEval: Benchmarking the Hallucination of Chinese Large Language Models via Unconstrained Generation,” Nov. 2023. [CrossRef]
- W. Xu, S. Agrawal, E. Briakou, M. J. Martindale, and M. Carpuat, “Understanding and Detecting Hallucinations in Neural Machine Translation via Model Introspection,” Jan. 2023, [Online]. Available: http://arxiv.org/abs/2301.07779.
- A. Pagnoni, V. Balachandran, and Y. Tsvetkov, “Understanding Factuality in Abstractive Summarization with FRANK: A Benchmark for Factuality Metrics,” Apr. 2021, [Online]. Available: http://arxiv.org/abs/2104.13346.
- G. Alain and Y. Bengio, “Understanding intermediate layers using linear classifier probes,” Oct. 2016, [Online]. Available: http://arxiv.org/abs/1610.01644.
- D. Cheng et al., “UPRISE: Universal Prompt Retrieval for Improving Zero-Shot Evaluation,” Mar. 2023, [Online]. Available: http://arxiv.org/abs/2303.08518.
- A. E. Amer and M. Amer, “Using multi-agent architecture to mitigate the risk of LLM hallucinations,” Jul. 2025. Accessed: Jul. 05, 2025. [Online]. Available: https://arxiv.org/pdf/2507.01446.
- R. Zhao, X. Li, S. Joty, C. Qin, and L. Bing, “Verify-and-Edit: A Knowledge-Enhanced Chain-of-Thought Framework,” May 2023, [Online]. Available: http://arxiv.org/abs/2305.03268.
- M. Rateike, C. Cintas, J. Wamburu, T. Akumu, and S. Speakman, “Weakly Supervised Detection of Hallucinations in LLM Activations,” Dec. 2023, [Online]. Available: http://arxiv.org/abs/2312.02798.
- J. D. Zamfirescu-Pereira, R. Y. Wong, B. Hartmann, and Q. Yang, “Why Johnny Can’t Prompt: How Non-AI Experts Try (and Fail) to Design LLM Prompts,” in Conference on Human Factors in Computing Systems - Proceedings, Association for Computing Machinery, Apr. 2023. [CrossRef]
- A. Lewis, M. White, J. Liu, T. Koike-Akino, K. Parsons, and Y. Wang, “Winning Big with Small Models: Knowledge Distillation vs. Self-Training for Reducing Hallucination in QA Agents,” Feb. 2025, [Online]. Available: http://arxiv.org/abs/2502.19545.
- C. Xu et al., “WizardLM: Empowering Large Language Models to Follow Complex Instructions,” Apr. 2023, [Online]. Available: http://arxiv.org/abs/2304.12244.
- J. Luo, C. Xiao, and F. Ma, “Zero-Resource Hallucination Prevention for Large Language Models,” Sep. 2023, [Online]. Available: http://arxiv.org/abs/2309.02654.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
