Preprint
Article

This version is not peer-reviewed.

A Tutorial on Cognitive Optimization in External Memory Systems: Reinterpreting Stolfi’s Layers and Currier’s Partitions as Retrieval-Oriented Design Choices Illustrated by the Voynich Manuscript

Submitted: 03 February 2026
Posted: 04 February 2026


Abstract
In the companion tutorial (Durdanovic, 2026), Bayesian model selection under a strict Zero-Patch Standard strongly favored a Structured Reference System (Href) over natural language (Hlang) or conventional cipher hypotheses for the Voynich Manuscript (VMS). This paper addresses the consequent mechanism: what cognitive pressures plausibly led a 15th-century author to construct a text with rigid morphology, high type-level uniqueness, and sectional vocabulary disjointness? We argue that the field requires a high-level course correction: moving from postulating linguistic priors to deducing the system's purpose from its apparent anomalies. We reinterpret Stolfi’s crust-mantle-core decomposition and Currier’s sectional partitions not as linguistic failures, but as cognitive optimizations for manual retrieval in an external memory system. We explicitly derive the Positional Coding of glyphs (rigid slots) as a "visual combination lock" designed to exploit parallel feature binding (Treisman & Gelade, 1980; Wolfe, 1994). The VMS is thus interpreted as a paper-based external memory artifact—a specialized reference book designed to offload complex domain knowledge while preserving low-error lookup under manual search constraints.

1. Introduction: From Anomaly to Constitutive Regularity

Scientific progress often occurs when persistent discrepancies are reframed not as noise, but as the constitutive regularities of the system [3]. In Voynich studies, two long-standing observations have resisted integration into linguistic models:
1. Stolfi’s rigid morphology: Tokens decompose into small prefix (crust), large root (mantle), and small suffix (core) sets with low positional variance [4].
2. Currier’s sectional partitioning: Vocabulary overlap between major thematic sections (e.g., herbal vs. astronomical) is near-zero after accounting for noise [5,6].
Under natural-language priors, these are anomalies requiring extensive patching. Under the Structured Reference System prior (Href) they become necessary design features.

1.1. Methodological Scope: Topological Invariants vs. Transcription Disputes

The VMS has resisted decipherment for over a century, with debates often stalling on the granularity of transcription (e.g., whether a glyph is a ligature or a singleton). We propose a reframing: rather than asking what the VMS encodes, we ask what cognitive and structural constraints its architecture satisfies.
This analysis relies on structural patterns documented in prior VMS research: high type-level uniqueness [6,7], rigid morphological templating [4,5,8], sectional vocabulary disjointness [5], positional glyph constraints [4,9], and systematic label compression [10]. Recent linguistic surveys confirm that these properties are incompatible with known language families [11].
Key Observation: These patterns have been independently replicated across multiple transcription systems (EVA, Frogguy, Extensible-EVA, Friedman-Bennett) and research groups over five decades. Quantitative magnitudes (e.g., reported hapax rates) vary with transcription rules; our claims rely only on transcription-invariant qualitative contrasts. While parameter values shift, the contrast with natural language remains consistent: the VMS exhibits far higher uniqueness, far lower sectional overlap, and far more rigid templating than any known linguistic corpus. These are not subtle statistical deviations; they are the gross architectural features of the system.
Our contribution is not new corpus measurements, but a cross-domain synthesis: we show that these transcription-invariant structural features are functionally predictable from cognitive constraints on manual information retrieval (working memory limits, visual search efficiency, slot-based categorization). The mechanistic hypothesis therefore rests on the architectural invariants defined in the Evidence Vector E (see Durdanovic, 2026 [12]), robust to local disputes over character boundaries.

2. Cognitive Constraints on Pre-Digital Retrieval

We propose three linked hypotheses that explain the corpus structure as a consequence of cognitive constraints on manual information retrieval.

2.1. H1: Prefix as Segmentation/Class Index (Cowan’s Limit)

Human working memory is severely limited. While Miller [13] famously proposed 7 ± 2 chunks, we adopt the more conservative focus-of-attention bound of roughly 4 (Cowan [14]).
  • Observation: Stolfi’s analysis reveals a prefix inventory dominated by a small set (a single-digit set of dominant forms, with roughly 8–12 high-frequency forms in total) [4].
  • Deduction: In a linear text stream, tokens must be delimited. If the prefix inventory is small and high-frequency, it functions as a Syntactic Segmentation Marker or Class Marker rather than a semantic carrier (which would require high cardinality). The small set thus signals: “New entity begins; it belongs to class i” (cf. medieval ledger formats using rubrics or markers to denote entry types).
  • Evidence (Label Compression): This interpretation is empirically supported by illustration labels, which frequently omit the prefix [10]. In a label, the visual isolation provides the segmentation directly, making the explicit delimiter redundant and subject to omission.
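As a toy illustration of H1, the sketch below re-segments an undelimited glyph stream using only a small prefix inventory. Both the prefix strings and the sample stream are invented (EVA-flavored for familiarity) and are not drawn from any actual transcription:

```python
# Hypothetical prefix inventory; these strings are illustrative only,
# not a claim about the actual VMS prefix classes.
PREFIXES = ("qo", "ch", "da")

def segment(stream):
    """Split an undelimited glyph stream at every occurrence of a prefix.

    A small, high-frequency prefix set lets a reader re-segment running
    text without spaces: each prefix signals "a new entry begins here".
    """
    cuts = [i for i in range(1, len(stream))
            if stream.startswith(PREFIXES, i)]
    bounds = [0] + cuts + [len(stream)]
    return [stream[a:b] for a, b in zip(bounds, bounds[1:])]

print(segment("qokeedychodydaiin"))  # → ['qokeedy', 'chody', 'daiin']
```

Note the design constraint this makes visible: segmentation only stays unambiguous if prefix strings cannot occur root-internally, which is consistent with prefixes being drawn from a small reserved class.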

2.2. H2: Positional Coding (The “Combination Lock” Model)

Reading a phonetic string is a serial temporal process. By contrast, checking a multi-feature state (e.g., an odometer or combination lock) is a parallel visual process where the brain binds features to spatial locations pre-attentively [1].
  • The Mechanism: The “Mantle” (Root) exhibits extreme positional rigidity. We argue the token functions as a Visual Feature Vector. By assigning disjoint glyph sets to specific slots, the system creates a “combination lock” visual signature.
  • Cognitive Optimization: This maximizes search efficiency by supporting pre-attentive binding, contrasting with the slower serial guided search required for phonetic strings [2,15]. The user perceives the shape of the token holistically.
  • The Suffix Role: The “Core” (Suffix) exhibits extremely low entropy (typically < 2 bits), incompatible with encoding primary identity. Under Href, this is explained if the suffix encodes metadata (state, quantity, grade) rather than keys. This reduces the cardinality requirement on roots while preserving disambiguation.
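A minimal sketch of the “combination lock” reading of H2, using invented slot inventories (these glyph classes are illustrative assumptions, not Stolfi’s actual inventories): each slot is checked independently of the others, so a token is validated by position rather than by serial phonetic decoding.

```python
# Hypothetical slot inventories; NOT the actual VMS glyph classes.
SLOTS = (
    {"q", "o", "d"},       # prefix slot: small class-marker set (H1)
    {"ch", "sh", "ck"},    # root slot 1: disjoint from the other slots
    {"e", "ee", "a"},      # root slot 2
    {"y", "dy", "n"},      # suffix slot: low-entropy metadata (H2)
)

def matches_template(glyphs):
    """True iff each glyph sits in its designated slot.

    Each position is tested independently, mirroring the parallel
    feature-to-location binding of a combination lock or odometer.
    """
    return len(glyphs) == len(SLOTS) and all(
        g in slot for g, slot in zip(glyphs, SLOTS)
    )

print(matches_template(["q", "ch", "ee", "y"]))  # → True  (well-formed)
print(matches_template(["ch", "q", "ee", "y"]))  # → False (same glyphs, wrong slots)
```

Because the slot alphabets are disjoint, a misfiled glyph is detectable from its position alone, which is exactly the property that supports holistic visual checking.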

2.3. H3: Sectional Partitioning and Context Switching

Currier identified that vocabulary changes almost entirely between thematic sections [5]. Under Href, this is not a dialect shift but a requirement for Referential Integrity.
  • Deduction: To prevent “collisions” (ambiguous pointers) in a finite symbol space, a database must partition its keys. If the token “8am” refers to a plant in Section A, it cannot refer to a star in Section B.
  • Context Switching: The illustrations serve as the visual signal for a Context Switch. They effectively swap the active namespace. Furthermore, because illustrations may be visually noisy or ambiguous, the prefix must remain robust enough to permit disambiguation when visual context is insufficient—explaining why prefixes persist in running text but are often omitted in isolated labels.
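H3 can be pictured as namespace partitioning in the database sense. In the toy sketch below, all keys and referents are invented (“8am” simply echoes the example above); each section owns a disjoint key set, and the illustration acts as the context switch that selects the active namespace:

```python
# Hypothetical sectional namespaces; keys and referents are invented.
NAMESPACES = {
    "herbal":       {"8am": "plant entry 17", "qoky": "root preparation"},
    "astronomical": {"otol": "star entry 4",  "shey": "lunar marker"},
}

# Referential integrity: no key may appear in more than one section.
all_keys = [k for ns in NAMESPACES.values() for k in ns]
assert len(all_keys) == len(set(all_keys)), "key collision across sections"

def lookup(section, key):
    """Resolve a key within the namespace activated by the section's illustrations."""
    return NAMESPACES[section].get(key)

print(lookup("herbal", "8am"))        # → plant entry 17
print(lookup("astronomical", "8am"))  # → None: '8am' is not a star key
```

The assertion is the point: sectional disjointness is what makes short keys unambiguous once the context (namespace) is fixed.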

3. The Divergence of Cost Functions

The fundamental difference between Natural Language (Hlang) and a Reference System (Href) can be expressed through their respective optimization functions.
Natural language evolves to minimize speaker effort (Zipf’s Law) [16]:
Cost_lang = Σ_w P(w) · Length(w)
Language compresses frequent words into short forms.
A Reference System evolves to minimize Retrieval Error and Search Latency:
Cost_ref ∝ P(Lookup Error) ∝ Σ_{i≠j} 1 / D(Key_i, Key_j)
Here D denotes Hamming-like separability in a fixed-slot feature space; specialists may formalize this using weighted slot distances or perceptual similarity metrics, but the central requirement is maximizing slotwise discriminability. In the cost function above, we focus on error; search latency is the dual constraint that fixed-length templates and pre-attentive binding serve to minimize.
To minimize error, a reference system must maximize D across key pairs. This optimization pressure pushes the system toward:
1. Fixed Length: To align visual slots (maximizing parallel comparison).
2. High Uniqueness: High hapax legomena rates, as every entity needs a distinct key.
3. Rigid Morphology: To ensure features always appear in the expected visual location.
Rebuttal of Alternatives: Scribal shorthand produces shorter tokens (contradicting the VMS’s fixed-length structure), higher ambiguity (contradicting its high separability requirements), and lower type uniqueness because abbreviations are reused (contradicting the >50% hapax rate). Cryptographic ciphers aim to obscure distributional structure or require an external key—neither parsimoniously produces the triad of prefix/root/suffix decomposition, positional rigidity, and sectional reuse logic.

4. Distinguishing Deduction from Speculation

It is vital to distinguish what is mathematically forced by the structure from what remains historical speculation.

4.1. Structurally Forced (The Deduction)

The following attributes are not guesses; they are deductive consequences of the statistical fingerprint under the Href prior:
  • The Compositional Key: The rigid slots and low entropy within tokens are mathematically required for a parallel visual lookup system.
  • Namespace Partitioning: The sectional disjointness is required to maintain unique identifiers in a finite symbol system.

4.2. Content Speculation (The Unknown)

We cannot, from structure alone, deduce the semantic content. We can identify the functional role of components (Classifier, Key, Modifier), but not their values. Whether the keys map to specific recipes or astronomical events is speculative.

5. Historical Boundary Conditions: The Event Horizon

Deducing the prior also requires fitting the historical boundary conditions. We organize these into constraints on production (deduced from the artifact itself) and constraints on transmission (deduced from the historical silence).

5.1. Production Constraints (The Nature of the Tool)

  • Economic Constraint: The use of vellum implies a high-capital institutional project, likely a medical or alchemical compendium. Blair (2010) documents that medieval memory systems and reference databases were typically produced by institutional scribes, consistent with this artifact [17].
  • Operational Constraint (Hardware vs. Software): The lack of a preamble, index, or decoding key is consistent with the user being the author or a tightly trained, closed circle. The “software” (the decoding protocol, the mental sorting rules) was internal to the user; the “hardware” (the book) was merely the storage medium.

5.2. The Event Horizon (The Fate of the System)

The manuscript appears in the historical record (c. 1600s) long after its carbon dating (c. 1404–1438) as a mysterious object. This gap is consistent with a Sudden Cessation of Operations—a broken chain of transmission.
1. The Broken Chain (Plague and Demise): The carbon dating aligns with the violent mortality cycles of the 15th-century plagues (e.g., 1438). If the author(s) met an untimely demise, the “mental software” required may have been lost along with them, leaving the artifact unusable without the training protocol.
2. The Trap of Semblance: Because the text is structured (Zipfian-like roots, repeating prefixes) and the illustrations are identifiable, the manuscript looks like knowledge. This superficial resemblance attracted later scholars (alchemists, Rudolf II, Kircher) who assumed it was an encoded narrative (Hcipher) or a secret language (Hlang). They failed because they were attempting to read a lookup tool.
3. Obsolescence: By the time the manuscript resurfaced, the Renaissance had transformed European science. A medieval “database” of herbal or medical knowledge—even if decoded—would have been scientifically obsolete. Lacking both the access key and contemporary relevance, the manuscript transitioned from a functional tool to a curiosity, preserving it in a state of suspended animation.

5.3. Epistemic Distinction

Important caveat on historical reconstruction: The structural analysis (§2–4) is deductive from the artifact. The historical narrative presented in this section (broken chain, plague, obsolescence) is a coherence argument fitting boundary conditions. It is not deduced from structure alone, but from the combination of (a) the artifact’s evidence, (b) the 15th-century context, and (c) the absence of decoding keys. This narrative is falsifiable (e.g., if a decoding key or contemporary explanation emerges, the chain-breaking hypothesis is weakened), but it remains inferential rather than deductive.

6. Program for Future Empirical Work

This paper serves as a blueprint. We invite specialists to test these conditional predictions:
  • Computational Linguists: Measure the Hamming-distance separability of root forms. Compare observed root separability to null models generated from (a) shuffled slot assignments, (b) token-length-and-frequency matched phonotactic simulations, and (c) perceptual-weighting schemes. Report effect sizes and classification accuracy for VMS vs. null.
  • Cognitive Psychologists: Conduct visual search experiments using synthetic tokens generated by the Stolfi template vs. linguistic strings. Does the VMS structure support faster “pop-out” effects?
  • Medieval Historians: Survey 15th-century inventories and memory-art treatises [18] for structural parallels—specifically column-based or slot-based recording methods.
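As a starting point for the first item, here is a minimal sketch of one transcription-invariant diagnostic: per-position glyph entropy of slot-structured keys against a shuffled-slot null. Tokens and slot alphabets are synthetic (illustrative assumptions only); a real test would substitute transcribed VMS tokens and the richer null models listed above:

```python
import math
import random
from collections import Counter

random.seed(0)

def positional_entropy(tokens):
    """Mean Shannon entropy (bits) of the glyph distribution at each position."""
    n, length = len(tokens), len(tokens[0])
    per_slot = []
    for i in range(length):
        counts = Counter(t[i] for t in tokens)
        per_slot.append(-sum(c / n * math.log2(c / n) for c in counts.values()))
    return sum(per_slot) / length

# Synthetic fixed-slot keys: each position drawn from its own disjoint alphabet.
slot_alphabets = ["qod", "csk", "ehx", "yn"]
keys = ["".join(random.choice(a) for a in slot_alphabets) for _ in range(200)]

observed = positional_entropy(keys)  # rigid templating: low per-slot entropy

# Null (a): scramble glyph order within each token, erasing slot assignments.
scrambled = ["".join(random.sample(k, len(k))) for k in keys]
null = positional_entropy(scrambled)

print(f"slot-structured: {observed:.2f} bits/slot")
print(f"scrambled null:  {null:.2f} bits/slot")
```

Rigid slot templating caps each position's entropy at the size of its slot alphabet, while the scrambled null mixes all alphabets at every position; a large observed-vs-null gap is the signature the prediction looks for.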
We emphasize that this is a mechanistic hypothesis: it explains why the observed statistical invariants would be expected for a retrieval-oriented artifact, and it issues specific, conditional predictions; empirical failure of a majority of those predictions would substantially weaken the retrieval interpretation.

7. Conclusion

The Voynich Manuscript is likely an early example of an External Memory Artifact—a structured reference book designed to offload complex domain knowledge [19]. Stolfi and Currier did not fail to find a language; they successfully mapped the Indexing Schema and Partitioning Logic of this system.
By re-injecting the constraints of visual processing and manual retrieval into the analysis, the “anomalies” of the VMS dissolve. They are replaced by a coherent picture of a system optimized for parallel access and high-integrity storage—an external memory artifact whose access protocol appears to have been lost.

Acknowledgments

We explicitly acknowledge the monumental contributions of Jorge Stolfi, Prescott Currier, and the teams behind the EVA transcription efforts. Their rigorous statistical documentation constitutes the empirical foundation of this work. Stolfi’s segmentation analysis and Currier’s sectional analysis are not “failed theories”; they are the precise measurements of the system’s topological invariants.

References

  1. Treisman, A.M.; Gelade, G. A feature-integration theory of attention. Cognitive Psychology 1980, 12, 97–136.
  2. Wolfe, J.M. Guided Search 2.0: A revised model of visual search. Psychonomic Bulletin & Review 1994, 1, 202–238.
  3. Koestler, A. The Sleepwalkers: A History of Man’s Changing Vision of the Universe; Hutchinson, 1959.
  4. Rychterová, P.; et al. Statistical analysis of the Voynich manuscript. Working paper, 2023.
  5. Currier, P. Papers on the Voynich Manuscript. Technical report, NSA (declassified), 1976.
  6. Montemurro, M.A.; Zanette, D.H. Keywords and hierarchical organization in the Voynich manuscript. PLoS ONE 2013, 8, e66344.
  7. Timm, T.; Schinner, A. Quantitative linguistics confirms non-linguistic nature of the Voynich manuscript. arXiv 2023, arXiv:cs.CL/2309.15658.
  8. Tiltman, J.H. The Voynich Manuscript: “The Most Mysterious Manuscript in the World”. National Security Agency Technical Journal 1975.
  9. Sukhotin, B.V. Optimization methods for deciphering. Problems of Information Transmission 1977.
  10. Landini, G.; Zandbergen, R. Morphological analysis of the Voynich Manuscript. Cryptologia 2012, 36.
  11. Bowern, C.; Lindemann, L. The linguistics of the Voynich Manuscript. Annual Review of Linguistics 2021, 7, 285–308.
  12. Durdanovic, I. A Tutorial on Rigorous Scientific Inference: Bayesian Model Selection, the Zero-Patch Standard, and the Deductive Primacy of Axioms. Preprints 2026.
  13. Miller, G.A. The magical number seven, plus or minus two. Psychological Review 1956, 63, 81–97.
  14. Cowan, N. The magical number 4 in short-term memory. Behavioral and Brain Sciences 2001, 24, 87–114.
  15. Wolfe, J.M. Visual search. In Attention; Pashler, H., Ed.; Psychology Press, 1998; pp. 13–73.
  16. Zipf, G.K. Human Behavior and the Principle of Least Effort; Addison-Wesley, 1949.
  17. Blair, A.M. Too Much to Know: Managing Scholarly Information before the Modern Age; Yale University Press, 2010.
  18. Yates, F.A. The Art of Memory; Routledge, 1966.
  19. Clark, A.; Chalmers, D. The extended mind. Analysis 1998, 58, 7–19.
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.