Preprint
Article

This version is not peer-reviewed.

From Verification to Internalization: A Cognitive Science Perspective on Verifier-Free Reinforcement Learning

Submitted: 05 June 2026

Posted: 08 June 2026


Abstract
Reinforcement Learning with Verifiable Rewards (RLVR) has advanced the reasoning abilities of Large Language Models (LLMs) by providing objective feedback, yet it remains fundamentally dependent on external verifiers, which limits self-regulated reasoning and generalization. We propose a shift toward internalization: relocating verification from external infrastructure into model-internal signals. We formalize this paradigm through a four-dimensional taxonomy of Probabilistic, Uncertainty, Process, and Interaction Internalization, which captures how verifier-free reinforcement learning (VFRL) derives evaluative signals from likelihood, uncertainty, intermediate reasoning steps, and candidate interactions, respectively. This perspective enables dense, scalable, and model-driven supervision while highlighting characteristic failure modes such as proxy misalignment, miscalibration, local-process errors, and preference drift. Our analysis systematizes recent VFRL methods, delineates their strengths and limitations, and outlines research directions for building reliable, auditable, and self-supervised reasoning agents.
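As a rough illustration of the four signal sources named in the abstract, the sketch below computes one toy score per taxonomy dimension. It is not taken from the article: the function names, the choice of mean log-probability, negative entropy, minimum step score, and agreement fraction, and the assumption that per-step scores are already available are all hypothetical simplifications.

```python
import math
from collections import Counter

# Hypothetical toy scores, one per taxonomy dimension (assumptions, not the article's method).

def likelihood_reward(token_logprobs):
    """Probabilistic: mean log-probability the model assigns to its own answer tokens."""
    return sum(token_logprobs) / len(token_logprobs)

def entropy_reward(probs):
    """Uncertainty: negative Shannon entropy of a distribution (closer to 0 = more confident)."""
    return sum(p * math.log(p) for p in probs if p > 0)

def process_reward(step_scores):
    """Process: score a reasoning chain by its weakest intermediate step.

    Assumes each step already carries a model-assigned score in [0, 1].
    """
    return min(step_scores)

def agreement_reward(answers):
    """Interaction: for each sampled candidate, the fraction of candidates sharing its answer."""
    counts = Counter(answers)
    return [counts[a] / len(answers) for a in answers]

if __name__ == "__main__":
    print(likelihood_reward([-0.1, -0.3]))
    print(entropy_reward([0.5, 0.5]))
    print(process_reward([0.9, 0.4, 0.8]))
    print(agreement_reward(["42", "42", "17"]))
```

None of these scores consults an external verifier, which is the point of the sketch. Each is also a proxy, so a confidently wrong answer can still score highly on likelihood, entropy, or agreement, matching the failure modes the abstract lists.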
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.

Preprints.org is a free preprint server supported by MDPI in Basel, Switzerland.


© 2026 MDPI (Basel, Switzerland) unless otherwise stated