Preprint
Article

This version is not peer-reviewed.

Rote Memorization or Intelligence: An Assessment of Inferential Reasoning in Large Language Models

Submitted: 30 March 2026

Posted: 01 April 2026


Abstract
The rapid advancement of Large Language Models (LLMs) has sparked a debate on whether their performance reflects genuine inferential reasoning or sophisticated rote memorization of internet-scale datasets. While LLMs achieve high scores on standardized benchmarks, these metrics often fail to distinguish between the retrieval of learned patterns and the application of underlying logical principles. This study provides a diagnostic characterization of LLM behavior through a series of targeted probes designed to isolate structural reasoning breaks. Our experiments reveal a persistent "grounding gap" across contemporary models, where surface-level linguistic fluency masks failures in mechanical plausibility, geometric transformation, and multi-entity relational consistency. We identify a computational analog of the Einstellung effect, wherein models default to high-probability training templates even when presented with explicit counterfactual constraints. Furthermore, our analysis of the Abstraction and Reasoning Corpus (ARC-AGI) and proprietary cross-modal probes demonstrates that model performance is often "jagged"—highly sensitive to prompt structure and prone to context misattribution across conversation turns. These findings suggest that current architectures remain tightly coupled to training-time statistical distributions and lack stable mechanisms for internal verification or adaptive restructuring. In light of these findings, we advocate for a shift in AI evaluation from static, outcome-oriented benchmarks toward diagnostic, novelty-persistent frameworks that prioritize cognitive autonomy and introspective self-auditing. By mapping the boundaries where probabilistic pattern matching diverges from functional reasoning, this work underscores a critical requirement for architectural paradigms that move beyond mere parameter scaling. 
We conclude that achieving grounded, self-regulating intelligence necessitates systems capable of maintaining structural invariants and verifying internal logic independently of training-time statistical frequencies. “Language serves as a medium for expressing intelligence, not as a substrate for its storage”.
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.

Preprints.org is a free preprint server supported by MDPI in Basel, Switzerland.


© 2026 MDPI (Basel, Switzerland) unless otherwise stated