Preprint
Essay

This version is not peer-reviewed.

From the Encyclopaedia to the Vector: The Corpus Pipeline as Epistemic Act in Psychoanalytic AI

  † PhD in Psychoanalysis (Universidade Veiga de Almeida) and MSc in Engineering (COPPE/UFRJ). Founder and researcher at TMU-LAB (The Machine Unconscious Lab), dedicated to the intersection of psychoanalysis and artificial intelligence.

Submitted: 22 June 2026

Posted: 24 June 2026


Abstract
This essay argues that, in artificial intelligence systems applied to psychoanalytic domains, the knowledge-acquisition pipeline can weigh as heavily as the language model itself in determining the quality of the system, and sometimes more heavily. It presents a four-stage pipeline that transforms publicly accessible academic encyclopaedias into a structured vector database: standardised crawling with traceability; hierarchical chunking that preserves the semantic structure of the original text; enrichment through knowledge graphs and fragmented LLM extraction; and local multilingual vectorisation. It also documents the Terminology Guard, a protection layer against the terminological poisoning that machine translation models introduce in specialised domains. It proposes four concepts: the pipeline as epistemic act, hierarchical chunking as preservation of meaning, the Terminology Guard as defence against terminological epistemicide, and traceability as the ethical condition of citation.
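The abstract does not say how the hierarchical chunking stage is implemented. As a minimal sketch under stated assumptions, the code below assumes the crawled encyclopaedia entry has already been converted to Markdown-style headings. It splits the text at headings, attaches to each chunk its full heading path (the semantic structure) and the source URL (traceability), and breaks long sections only at paragraph boundaries. The names `Chunk`, `hierarchical_chunks` and `max_chars` are hypothetical illustrations, not the author's pipeline.

```python
import re
from dataclasses import dataclass

# Hypothetical sketch: names and parameters here are illustrative assumptions,
# not the pipeline described in the essay.
HEADING = re.compile(r"^(#{1,6})\s+(.*)$")  # Markdown-style headings (assumed input format)


@dataclass
class Chunk:
    text: str
    heading_path: list[str]  # e.g. ["Repression", "Freud"]: the section context
    source_url: str          # traceability: the page the text was crawled from


def hierarchical_chunks(markdown: str, source_url: str, max_chars: int = 800) -> list[Chunk]:
    """Split text at headings, keeping each chunk's heading path and source URL."""
    chunks: list[Chunk] = []
    path: list[str] = []
    buffer: list[str] = []

    def flush() -> None:
        text = "\n".join(buffer).strip()
        buffer.clear()
        if not text:
            return
        # Oversized sections are split only at paragraph boundaries; a single
        # paragraph longer than max_chars is kept whole rather than cut mid-sentence.
        paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
        current = ""
        for p in paragraphs:
            if current and len(current) + 2 + len(p) > max_chars:
                chunks.append(Chunk(current, list(path), source_url))
                current = p
            else:
                current = f"{current}\n\n{p}" if current else p
        if current:
            chunks.append(Chunk(current, list(path), source_url))

    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            flush()  # close the previous section under its own heading path
            level = len(match.group(1))
            del path[level - 1:]  # drop headings at this level or deeper
            path.append(match.group(2).strip())
        else:
            buffer.append(line)
    flush()
    return chunks
```

For an entry with a top-level heading "Repression" and subsections "Freud" and "Lacan", each subsection chunk carries the path `["Repression", "Freud"]` or `["Repression", "Lacan"]` together with the entry's URL, so a retrieved vector can still be read in its section context and cited back to its source. Splitting at paragraph boundaries rather than at a fixed character count is one way to avoid cutting an argument in half; it is a design assumption here, not a claim about the essay's implementation.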
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.

Preprints.org is a free preprint server supported by MDPI in Basel, Switzerland.

© 2026 MDPI (Basel, Switzerland) unless otherwise stated