The rapid advancement of Large Language Models (LLMs) has sparked debate over whether their performance reflects genuine inferential reasoning or sophisticated rote memorization of internet-scale datasets. While LLMs achieve high scores on standardized benchmarks, these metrics often fail to distinguish the retrieval of learned patterns from the application of underlying logical principles. This study provides a diagnostic characterization of LLM behavior through a series of targeted probes designed to isolate structural reasoning breaks. Our experiments reveal a persistent "grounding gap" across contemporary models, in which surface-level linguistic fluency masks failures in mechanical plausibility, geometric transformation, and multi-entity relational consistency. We identify a computational analog of the Einstellung effect, wherein models default to high-probability training templates even when presented with explicit counterfactual constraints. Furthermore, our analysis of the Abstraction and Reasoning Corpus (ARC-AGI) and proprietary cross-modal probes demonstrates that model performance is often "jagged": highly sensitive to prompt structure and prone to context misattribution across conversation turns. These findings suggest that current architectures remain tightly coupled to training-time statistical distributions and lack stable mechanisms for internal verification or adaptive restructuring. Accordingly, we advocate a shift in AI evaluation from static, outcome-oriented benchmarks toward diagnostic, novelty-persistent frameworks that prioritize cognitive autonomy and introspective self-auditing. By mapping the boundaries where probabilistic pattern matching diverges from functional reasoning, this work underscores a critical requirement for architectural paradigms that move beyond mere parameter scaling.
We conclude that achieving grounded, self-regulating intelligence necessitates systems capable of maintaining structural invariants and verifying internal logic independently of training-time statistical frequencies. “Language serves as a medium for expressing intelligence, not as a substrate for its storage.”