From Verification to Internalization: A Cognitive Science Perspective on Verifier-Free Reinforcement Learning

Qian Zha; Yuan Wu; Yi Chang

doi:10.20944/preprints202606.0529.v1

Submitted: 05 June 2026

Posted: 08 June 2026

Abstract

Reinforcement Learning with Verifiable Rewards (RLVR) has advanced the reasoning of Large Language Models (LLMs) by providing objective feedback, yet it remains fundamentally dependent on external verifiers, which limits self-regulated reasoning and generalization. We propose a shift toward internalization: relocating verification from external infrastructure into model-internal signals. We formalize this paradigm through a four-dimensional taxonomy of Probabilistic, Uncertainty, Process, and Interaction Internalization, which captures how verifier-free reinforcement learning (VFRL) derives evaluative signals from likelihood, uncertainty, intermediate reasoning steps, and candidate interactions. This perspective enables dense, scalable, and model-driven supervision while highlighting characteristic failure modes such as proxy misalignment, miscalibration, local-process errors, and preference drift. Our analysis systematizes recent VFRL methods, delineates their strengths and limitations, and outlines research directions for building reliable, auditable, and self-supervised reasoning agents.

Keywords: verifier-free reinforcement learning; reinforcement learning with verifiable rewards; large language models; internalized verification; LLM reasoning

Subject: Computer Science and Mathematics - Artificial Intelligence and Machine Learning

Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
