Dynamic Closed-Loop Steering for Robust and Interpretable System-2 Reasoning in Large Language Models

Deyu Meng; Tongchuan Xia; Yuanxin Cai

doi:10.20944/preprints202605.1567.v1

Submitted: 22 May 2026

Posted: 25 May 2026


Abstract

Large language models (LLMs) are moving from fast token generation toward deliberate multi-step reasoning. Scaling test-time compute has become a key way to improve performance on complex tasks because it gives models more opportunity to develop intermediate reasoning before producing an answer. However, unconstrained compute scaling frequently leads to a practical failure mode known as "overthinking": models become trapped in high-entropy local minima and repeatedly elaborate on incorrect logical trajectories. Although recent inference-time interventions and budget constraints have shown promise in regulating this extended reasoning, they face two critical challenges: (1) external budget forcing applies coarse token truncation that disrupts logical coherence, and (2) static internal interventions rely on linear hidden-state additions that distort the model's inherent activation norms. Such geometric distortion can cause orthogonal noise, semantic drift, and state shock. We therefore propose Dynamic Closed-Loop Steering, an activation-engineering framework that unifies manifold geometry and feedback control to regulate long-chain test-time reasoning without retraining. Its core principle is to monitor overthinking as latent trajectory divergence while preserving the LayerNorm-induced hyperspherical geometry needed for safe correction. Specifically, the framework uses an interpretable dual-sensor system to monitor reasoning states in real time: an exponential moving average (EMA) of entropy tracks divergence, and a ThinkBrake logit margin detects endogenous convergence. Driven by an Adaptive Non-linear PD Regulator, the system corrects deviations via a geometry-consistent actuation path. By performing PCA-based manifold projection and norm-preserving spherical steering, it redirects drifting hidden states back toward valid reasoning manifolds without destructive perturbations.
Empirical evaluations on challenging benchmarks, including MATH500 and AIME, show that our framework delivers a strict Pareto improvement over raw compute scaling. Beyond overall accuracy gains, micro-level trajectory analysis shows that the approach improves the model's self-correction stability during long-horizon reasoning. On AIME2024, it achieves a 10.47-fold improvement in the hard-tier Transition Advantage Ratio, a rescue-to-break ratio of roughly 10:1 (51 rescued, 5 broken), and an immediate 71.44% recovery from error episodes. By suppressing entropy escalation to near-baseline safety levels, this work establishes a practical, geometry-preserving foundation for robust AI reasoning systems.
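The control loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the smoothing factor `alpha`, the gains `kp` and `kd`, and the tanh-squashed PD law are all assumptions chosen for illustration, and the ThinkBrake margin and PCA projection steps are omitted. The sketch shows three pieces: an EMA-smoothed entropy sensor, a bounded PD-style mapping from entropy error to steering strength, and a spherical interpolation that rotates a hidden state toward a target direction while keeping its norm fixed.

```python
import math

def token_entropy(logits):
    """Shannon entropy (in nats) of the softmax distribution over logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]  # shift by max for stability
    total = sum(exps)
    probs = [e / total for e in exps]
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def ema_update(prev, value, alpha=0.1):
    """Exponential moving average; alpha is an assumed smoothing factor."""
    return value if prev is None else (1.0 - alpha) * prev + alpha * value

def pd_strength(err, prev_err, kp=1.0, kd=0.5):
    """Hypothetical non-linear PD law: squash the PD signal into [0, 1]."""
    raw = kp * err + kd * (err - prev_err)
    return min(1.0, max(0.0, math.tanh(raw)))

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def spherical_steer(h, target, t):
    """Rotate h toward target by fraction t of their angle, keeping ||h||."""
    r = norm(h)
    tn = norm(target)
    u = [x / r for x in h]
    v = [x / tn for x in target]
    cos_theta = max(-1.0, min(1.0, sum(a * b for a, b in zip(u, v))))
    theta = math.acos(cos_theta)
    s = math.sin(theta)
    if s < 1e-8:
        # Parallel or antipodal directions: the rotation is undefined, skip.
        return list(h)
    a = math.sin((1.0 - t) * theta) / s
    b = math.sin(t * theta) / s
    return [r * (a * x + b * y) for x, y in zip(u, v)]

# Usage: compare norm-preserving steering with a plain linear addition.
h = [3.0, 4.0, 0.0]
direction = [0.0, 0.0, 1.0]
steered = spherical_steer(h, direction, 0.5)
added = [x + 2.5 * y for x, y in zip(h, direction)]
print(norm(h), norm(steered), norm(added))
```

In this toy case the linear addition lengthens the vector, while the spherical update changes only its direction. That norm change is the distortion the abstract attributes to static linear interventions.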

Keywords: trustworthy AI; model robustness; interpretability; inference-time intervention; test-time compute; overthinking; manifold projection

Subject: Computer Science and Mathematics - Artificial Intelligence and Machine Learning

Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
