Limits of Self-Correction in LLMs: An Information-Theoretic Analysis of Correlated Errors

Andrew Michael Brilliant

doi:10.20944/preprints202601.0892.v4

Submitted: 09 April 2026

Posted: 09 April 2026

Abstract

We develop a diagnostic framework for evaluating when LLM self-evaluation can be trusted. The framework has two central results. (1) Under a shared-blind-spot modeling assumption, with joint conditional independence of evaluations given the shared failure structure, the information that k rounds of self-critique provide about correctness is bounded by what the shared latent failure variable Z mediates, not by any independent channel. Confidence accumulated through repeated self-evaluation therefore reflects the shared failure structure rather than independently accumulated evidence. (2) A selector satisfying two independently measurable sufficient conditions (bounded false-acceptance, and true-acceptance exceeding that bound) provides a quantifiable lower bound on evidence about correctness. Both results are conditional on explicit modeling assumptions. We also prove an information-theoretic bound showing that self-evaluation is limited in what it can add when a shared latent failure structure mediates both generation and evaluation errors; we present this bound as scaffolding rather than a primary contribution, since the latent variable requires independent operationalization to give the bound empirical bite. The diagnostic framework identifies what to measure to determine whether a deployed system is in the failure regime, and what properties an external selector must have to escape it. We describe design principles for an architecture motivated by this analysis; same-model context separation is an engineering heuristic, not a theoretical solution, and we present it as a practical starting point pending empirical validation.
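As an illustration of the second result, the sketch below shows one standard way a selector with bounded false-acceptance and higher true-acceptance yields a lower bound on evidence. It is plain Bayes' rule, not the paper's stated theorem. Reading the two conditions as acceptance probabilities and measuring evidence as a log-likelihood ratio are assumptions made here, and the function names and numbers are hypothetical.

```python
import math


def acceptance_evidence_bits(true_accept: float, false_accept_bound: float) -> float:
    """Lower bound, in bits, on the log-likelihood ratio of one acceptance.

    Assumption (not from the paper): true_accept = P(accept | correct) and
    false_accept_bound is an upper bound on P(accept | incorrect).
    """
    if not 0.0 < false_accept_bound < true_accept <= 1.0:
        raise ValueError("need 0 < false_accept_bound < true_accept <= 1")
    return math.log2(true_accept / false_accept_bound)


def posterior_correct(prior: float, true_accept: float, false_accept: float) -> float:
    """P(correct | accepted) by Bayes' rule."""
    accepted_and_correct = prior * true_accept
    accepted_and_wrong = (1.0 - prior) * false_accept
    return accepted_and_correct / (accepted_and_correct + accepted_and_wrong)


if __name__ == "__main__":
    # Hypothetical selector: accepts 80% of correct outputs, at most 10% of wrong ones.
    print(acceptance_evidence_bits(0.8, 0.1))
    print(posterior_correct(0.5, 0.8, 0.1))
    # A selector that accepts correct and wrong outputs equally often adds nothing.
    print(posterior_correct(0.3, 0.5, 0.5))
```

Because the false-acceptance probability enters only through its upper bound, the computed value is a floor on the evidence: a selector whose actual false-acceptance is lower carries more. When the two acceptance probabilities are equal, the posterior equals the prior, which is the uninformative case the second condition rules out.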

Keywords: LLM; self-correction; information theory; error correlation; external selection; multi-agent verification; context separation; language models; reasoning; validation

Subject: Computer Science and Mathematics - Artificial Intelligence and Machine Learning

Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
