Democritus: Inferring Causality from Language

Sridhar Mahadevan

doi:10.20944/preprints202604.1011.v1

Submitted: 13 April 2026

Posted: 14 April 2026


Abstract

We describe the evolution of DEMOCRITUS, a system for inferring causality from language. Extracting causal claims from natural language is unstable under paraphrase, granularity shifts, and context drift. A document collection may express the same causal statement in many surface forms, while neighboring studies may agree locally yet fail to glue globally because relation families or polarities change across regimes. This paper studies that problem through successive versions of DEMOCRITUS, an implemented system for compiling documents into local causal models, causal databases, and interactive diagnostic artifacts. Our central claim is that categorical homotopy offers a useful computational language for finding equivalence classes of paraphrastic causal statements without collapsing genuinely distinct claims. We formalize weak equivalence between causal mentions via a normalization functor, motivate localization into homotopy classes of extracted claims, and connect missing higher-order coherence to failures of causal gluing. We then describe how these ideas are realized in the current DEMOCRITUS pipeline, which uses an AGI chatbot named CLIFF (Consciousness Layer Interface to Functor Flow), through homotopy-localized claim classes, regime-gluing diagnostics, topic partitions, archived experimental artifacts, topos-style study collation via soft pullbacks and pushout merges, and an underlying categorical learning stack based on Diagrammatic Backpropagation, Geometric Transformers, and Kan Extension Transformers. Finally, we report focused case studies, including Mediterranean-diet, red-wine cardiovascular, and rising-ocean-temperature corpora, showing that homotopy localization reduces paraphrase inflation while preserving diagnostically important regime-sensitive and obstructed claims.
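As a rough illustration of the normalization idea in the abstract, the following Python sketch groups extracted causal claims by a normalized form, so that paraphrases share one class while claims of opposite polarity stay apart. It is not the DEMOCRITUS implementation: the `Claim` type, the `RELATION_FAMILY` and `SYNONYMS` lookup tables, and the functions `normalize` and `localize` are all hypothetical names introduced here, and a plain dictionary lookup stands in for whatever normalization functor the paper actually defines.

```python
from collections import defaultdict
from typing import NamedTuple


class Claim(NamedTuple):
    """A hypothetical extracted causal mention: cause --relation--> effect."""
    cause: str
    relation: str
    effect: str


# Assumed toy lookup tables; a real system would learn or curate these.
RELATION_FAMILY = {
    "reduces": ("influences", "-"),
    "lowers": ("influences", "-"),
    "decreases": ("influences", "-"),
    "increases": ("influences", "+"),
    "raises": ("influences", "+"),
}
SYNONYMS = {
    "med diet": "mediterranean diet",
    "heart disease risk": "cardiovascular risk",
}


def normalize_term(term: str) -> str:
    """Lowercase, collapse whitespace, and map known synonyms to one form."""
    cleaned = " ".join(term.lower().split())
    return SYNONYMS.get(cleaned, cleaned)


def normalize(claim: Claim) -> tuple:
    """Map a claim to its normal form (cause, family, polarity, effect)."""
    relation = claim.relation.lower()
    # Unknown relations keep their own name and an undetermined polarity.
    family, polarity = RELATION_FAMILY.get(relation, (relation, "?"))
    return (normalize_term(claim.cause), family, polarity,
            normalize_term(claim.effect))


def localize(claims):
    """Group claims that share a normal form into one equivalence class."""
    classes = defaultdict(list)
    for claim in claims:
        classes[normalize(claim)].append(claim)
    return dict(classes)
```

Because polarity is part of the normal form, "Mediterranean diet reduces cardiovascular risk" and "Med diet lowers heart disease risk" fall into one class, while a claim that the diet increases cardiovascular risk remains a separate class instead of being merged away.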

Keywords: causal inference; large language models; category theory; homotopy; natural language processing; causal extraction

Subject: Computer Science and Mathematics - Artificial Intelligence and Machine Learning

Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
