Submitted: 13 April 2026
Posted: 14 April 2026
Abstract
Keywords:
1. Introduction
1.1. Tier 1 Claims
- Rising ocean temperatures drive krill into deeper waters, which reduces food availability for Antarctic fur seals. Supported by: Habitat loss and fragmentation driven by climate change; Sea ice role in global cooling; Sentinel species as climate indicators.
- Global warming causes the loss of Antarctic sea ice, which degrades emperor penguin breeding habitats. Supported by: Habitat loss and fragmentation driven by climate change.
1.2. Tier 2 Claims
- Loss of Antarctic sea ice leads to emperor penguin breeding habitat loss, which increases chick mortality. Supported by: Sea ice role in global cooling.
- The reduction in krill near the surface caused by warming ocean temperatures affects the foraging success of Antarctic fur seals, contributing to their population decline. Supported by: Sea ice role in global cooling.
- Loss of Antarctic sea ice leads to the destruction of emperor penguin breeding habitats, which reduces suitable nesting areas. Supported by: Sea ice role in global cooling.
1.3. Tier 3 Claims
- The loss of emperor penguin breeding habitat increases chick mortality by reducing safe and suitable nesting areas. Supported by: Habitat loss and fragmentation driven by climate change.
2. Scientific Challenges in Building Democritus
1. We formulate causal extraction from language as a passage from textual mentions to normalized causal claims, and use this to define weak equivalence and homotopy localization for causal discourse.
2. We describe how these ideas are implemented in Democritus and surfaced in Cliff through homotopy localization, regime gluing, topic partitions, archived run artifacts, and a categorical learning stack built from Diagrammatic Backpropagation, Geometric Transformers, and Kan Extension Transformers.
3. We present selected empirical results from saved runs showing both the success and the limits of the approach: focused corpora yield stable multi-regime glued claims, red-wine study collation yields auditable pullback and pushout witnesses, and broader corpora reveal fragmented covers and failures of descent.
4. We argue that reproducible interactive artifacts are themselves a useful experimental method for studying causality from language, because they preserve the local covers, gluing failures, and supporting causal models that are usually flattened away in static summaries.
3. Homotopy Localization of Causal Mentions
4. A Categorical Bridge from Discourse to Models
4.1. Discourse Space and Model Space
4.2. The Discourse–Model Adjunction

4.3. A Universal Property for Causal Grounding
4.4. Worked Grounding Examples
4.5. Simplicial Diagnostics for Localized Claim Families
5. Categorical Machine-Learning Machinery Behind Democritus
6. System Realization in Democritus and Cliff
6.1. Cliff as the Interface Layer
1. Homotopy localization. The Csql bundle materializes canonical subject–relation–object classes and localizes paraphrastic claim families directly inside the database.
2. Regime gluing. A second family of views preserves canonical regimes and relation families, allowing the system to classify localized claims as multi-regime glued, regime-sensitive, or obstructed.
3. Topic partitions. Broad queries often retrieve heterogeneous document sets. Instead of forcing a flat synthesis, Cliff first decomposes the corpus into local topic covers.
4. Archived interactive artifacts. Runs are saved and indexed, so the full set of generated artifacts can be reopened later, reused in papers, and reproduced from the public code base.
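The homotopy-localization step in the first item can be pictured with a small sketch. It is not the Csql implementation: the `Mention` type, the `CANONICAL_RELATION` synonym table, and the `localize` function are hypothetical names introduced here, and the sketch assumes that paraphrastic mentions are collapsed simply by mapping each one to a normalized subject–relation–object key.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical synonym table (an assumption, not part of Csql):
# surface relation phrases mapped to one canonical relation label.
CANONICAL_RELATION = {
    "leads to": "causes",
    "results in": "causes",
    "causes": "causes",
    "reduces": "decreases",
    "lowers": "decreases",
}


@dataclass(frozen=True)
class Mention:
    """One textual causal mention extracted from a document."""
    doc_id: str
    subject: str
    relation: str
    obj: str


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial variants coincide."""
    return " ".join(text.lower().split())


def canonical_key(m: Mention) -> tuple:
    """Map a mention to its canonical (subject, relation, object) class."""
    rel = normalize(m.relation)
    return (normalize(m.subject), CANONICAL_RELATION.get(rel, rel), normalize(m.obj))


def localize(mentions):
    """Group mentions into paraphrase classes keyed by canonical triple."""
    classes = defaultdict(list)
    for m in mentions:
        classes[canonical_key(m)].append(m)
    return dict(classes)


mentions = [
    Mention("doc1", "Loss of Antarctic sea ice", "leads to",
            "emperor penguin breeding habitat loss"),
    Mention("doc1", "loss of antarctic  sea ice", "results in",
            "Emperor penguin breeding habitat loss"),
    Mention("doc1", "Warming oceans", "reduces", "krill near the surface"),
]
classes = localize(mentions)
for key, members in classes.items():
    print(key, len(members))
```

In this toy input the first two mentions differ only in casing, spacing, and relation phrasing, so they fall into one class. A real system would need a much richer notion of equivalence than string normalization; the point is only that localization turns many mentions into fewer canonical claims.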
7. Experimental Framing
1. Focused corpora. Narrow domains such as the Mediterranean diet tend to produce coherent local basins, interpretable topic partitions, and stable regime-glued claims.
2. Broad but recoverable corpora. Queries such as "rising ocean temperatures" can still produce useful gluing results once topic filtering and atlas sanitation are improved, even though the underlying domain remains multi-regime.
3. Overbroad corpora. Queries such as "climate change" or "economic inflation" often decompose into multiple local covers. In these cases the atlas and partition layers are more informative than any single flattened global summary.
7.1. Mediterranean Diet as a Focused Corpus
| Mediterranean diet run | Count |
|---|---|
| Cross-document classes | 0 |
| Within-document families | 978 |
| Coherent classes | 958 |
| Partially glued classes | 2 |
| Disconnected classes | 18 |
| Multi-regime glued claims | 26 |
| Regime-sensitive claims | 4 |
| Obstructed claims | 0 |
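The counts in this table can be cross-checked with a few lines of arithmetic. The sketch below assumes, based only on the reported numbers, that the coherent, partially glued, and disconnected classes partition the within-document families; the dictionary keys and the `coherence_shares` helper are illustrative names, not identifiers from Cliff.

```python
# Counts copied from the Mediterranean diet table above.
med_diet = {
    "within_document_families": 978,
    "coherent": 958,
    "partially_glued": 2,
    "disconnected": 18,
}


def coherence_shares(counts):
    """Return the class total and each category's share of that total."""
    categories = ("coherent", "partially_glued", "disconnected")
    total = sum(counts[c] for c in categories)
    shares = {c: counts[c] / total for c in categories}
    return total, shares


total, shares = coherence_shares(med_diet)

# Under the partition assumption the class total equals the family count.
print(total == med_diet["within_document_families"])
for name, share in shares.items():
    print(f"{name}: {share:.2%}")
```

The three categories sum to 978, matching the within-document family count, and coherent classes make up roughly 98% of them, which is consistent with the description of this run as a focused corpus.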
7.2. Red-Wine Study Collation as Topos-Style Gluing

| Object / category | Count | Interpretation |
|---|---|---|
| Edges in atlas A | 300 | Cardiovascular red-wine discourse |
| Edges in atlas B | 268 | Resveratrol-review discourse |
| Soft pullback matches | 2 | Cross-study aligned mechanisms |
| Pushout edges | 566 | Union modulo matched overlaps |
| CONSENSUS | 1 | High-similarity glued claim |
| WEAK_CONSENSUS | 1 | Lower-similarity but auditable glued claim |
| A_ONLY | 298 | Study-specific claim retained from atlas A |
| B_ONLY | 266 | Study-specific claim retained from atlas B |
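The "union modulo matched overlaps" reading of the pushout row can be made concrete with a set-level sketch. It assumes each soft pullback match identifies exactly one edge of atlas A with exactly one edge of atlas B; the `glue` function and the placeholder edge identifiers are hypothetical and stand in for the real atlas edges, which are not reproduced here.

```python
def glue(atlas_a, atlas_b, matches):
    """Glue two edge sets along one-to-one matches.

    matches maps an edge of A to the edge of B it is identified with.
    Returns the A-only edges, the B-only edges, and the glued pairs.
    """
    matched_a = set(matches)
    matched_b = set(matches.values())
    if not matched_a <= atlas_a or not matched_b <= atlas_b:
        raise ValueError("every match must pair an edge of A with an edge of B")
    if len(matched_b) != len(matches):
        raise ValueError("matches must be one-to-one")
    a_only = atlas_a - matched_a
    b_only = atlas_b - matched_b
    glued = sorted(matches.items())
    return a_only, b_only, glued


# Placeholder edges sized to the reported counts (300 and 268 edges, 2 matches).
atlas_a = {("A", i) for i in range(300)}
atlas_b = {("B", j) for j in range(268)}
matches = {("A", 0): ("B", 0), ("A", 1): ("B", 1)}

a_only, b_only, glued = glue(atlas_a, atlas_b, matches)
pushout_size = len(a_only) + len(b_only) + len(glued)
print(len(a_only), len(b_only), len(glued), pushout_size)
```

Under the one-to-one assumption this gives 298 A-only edges, 266 B-only edges, 2 glued claims, and 566 pushout edges, which agrees with the table: the two glued claims correspond to the CONSENSUS and WEAK_CONSENSUS rows.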
7.3. A Single-Document PDF Experiment: Emperor Penguins
| Emperor penguin PDF run | Count |
|---|---|
| Documents | 1 |
| Extracted claims | 330 |
| Domains | 18 |
| Unique aggregated edges | 318 |
| Repeated within-document edges | 10 |
| Maximum support for one edge | 4 |
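The relationship between extracted claims and aggregated edges can be illustrated with a toy aggregation. This is a sketch under assumptions, not the Democritus pipeline: it supposes each claim contributes to exactly one cause-effect edge, that edges are compared after simple lowercasing, and the `aggregate_edges` function and the toy claims are invented for illustration.

```python
from collections import Counter


def aggregate_edges(claims):
    """Count how many claims support each normalized (cause, effect) edge."""
    support = Counter(
        (c["cause"].strip().lower(), c["effect"].strip().lower()) for c in claims
    )
    return {
        "extracted_claims": sum(support.values()),
        "unique_edges": len(support),
        "repeated_edges": sum(1 for n in support.values() if n > 1),
        "max_support": max(support.values(), default=0),
    }


# Toy claims (illustrative only, not taken from the archived run).
toy_claims = [
    {"cause": "sea ice loss", "effect": "breeding habitat loss"},
    {"cause": "Sea ice loss", "effect": "breeding habitat loss"},
    {"cause": "breeding habitat loss", "effect": "chick mortality"},
    {"cause": "ocean warming", "effect": "krill moving deeper"},
]
summary = aggregate_edges(toy_claims)
print(summary)
```

If the same one-claim-one-edge assumption holds for the reported run, the table implies 308 edges supported by a single claim and 22 claims spread over the 10 repeated edges, which is compatible with a maximum support of 4.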
7.4. Rising Ocean Temperatures as a Multi-Regime Corpus
| Rising ocean temperatures run | Count |
|---|---|
| Documents | 6 |
| Cross-document classes | 0 |
| Within-document families | 1570 |
| Coherent classes | 1522 |
| Partially glued classes | 7 |
| Disconnected classes | 41 |
7.5. Broad Corpora and Topic Partitions
8. Related Work
9. Discussion and Limitations
9.1. From Static Causal Snapshots to Temporal Diffusion and Workflow Repair
10. Conclusions
Appendix A. System and Database Appendix
Appendix B. Reproducing the Results with the GitHub Cliff Implementation
Appendix C. Archived Source Covers and Manifold Diagnostics
Appendix D. Additional Categorical Background
Appendix E. Additional Experimental Directions




Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).