Submitted: 27 March 2026
Posted: 14 April 2026
Abstract
Keywords:
1. Introduction
A unified epistemic program.
PLM is forced by the dynamics, not merely inspired by them.
The epistemic problem with probabilistic closure.
Overview.
1.1. Contributions
Definitions (self-contained constructions):
1. Possibilistic vocabulary representation. We define a vocabulary support set at each generation step and associate with it a possibility distribution encoding admissibility rather than probability (Section 4).
2. Epistemic Possibilistic Attention (EPA). We define an attention operator that replaces softmax likelihood weighting with a whitened-innovation compatibility gate, falsifying keys whose residual energy exceeds an admissible innovation bound (Section 5).
3. Epistemic diagnostics for generation. We define necessity, surprisal, and epistemic tension for autoregressive generation (Section 9).
Derivations (inherited from prior TEAG theory):
4. PCRB-regularized training objective. We derive a training loss grounded in possibilistic entropy, with a PCRB penalty preventing inadmissible vocabulary collapse at each generation step. The justification of the PCRB as a universal floor is inherited from Theorem 5.2 of Jah (2026a); it is not proved independently here (Section 7).
Conditional theorems (proved here, under stated assumptions):
5. VFI for token support sets. We extend the Volumetric Faithfulness Invariant of Jah and Haslett (2025) to vocabulary hypothesis spaces and prove that, assuming the VFI is maintained through depth (an open proof obligation), non-degenerate generation from actual vocabulary tokens is guaranteed (Section 8).
6. Gaussian limit recovery. We prove that, under conditions (G1)–(G3) on support-cloud geometry and token co-occurrence statistics, the PLM recovers a standard transformer. These conditions are not trivially satisfied by all language domains; their scope is discussed explicitly (Section 11).
1.2. Relation to Existing Work
1.3. Conceptual Overview: Probabilistic Versus Possibilistic Generation
2. Background
2.1. Standard Transformer Language Models
2.2. Possibility Theory
2.3. The Epistemic Support-Point Filter
- Geometric surprisal: $d_i = \lVert L^{-1}\nu_i \rVert$, where $\nu_i$ is the innovation of support point $i$ and $L$ is the Cholesky factor of the innovation shape matrix $S = LL^{\top}$.
- Compatibility: a monotone decreasing map from geometric surprisal into $[0,1]$, equal to 1 for a perfectly predicted support point and 0 for a falsified one.
- Possibilistic entropy: an entropy functional on the possibility distribution over surviving support points, measuring the epistemic volume of the admissible set.
- Possibilistic Cramér–Rao Bound (PCRB): a universal lower bound on that entropy, below which the support set has collapsed inadmissibly.
- Volumetric Faithfulness Invariant (VFI): a joint condition on support geometry, spread parameter, and survivor count ensuring well-posedness. (The surprisal and compatibility computations are sketched in code below.)
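To ground these quantities, the following minimal NumPy sketch computes the geometric surprisal and a compatibility score for a batch of support points. The linear compatibility profile and the cutoff `d_max` are illustrative assumptions, not the committed definitions of Jah and Haslett (2025).

```python
import numpy as np

def geometric_surprisal(z, Z_hat, S):
    """Whitened innovation norms d_i = ||L^{-1} (z - z_hat_i)||, where
    S = L L^T is the innovation shape matrix."""
    L = np.linalg.cholesky(S)
    return np.linalg.norm(np.linalg.solve(L, (z - Z_hat).T).T, axis=1)

def compatibility(d, d_max=3.0):
    """Illustrative profile: 1 at zero surprisal, 0 (falsified) at d_max."""
    return np.clip(1.0 - d / d_max, 0.0, 1.0)

rng = np.random.default_rng(0)
Z_hat = rng.normal(size=(5, 3))          # hypothesis support points
S = np.diag([1.0, 0.5, 2.0])             # innovation shape matrix
d = geometric_surprisal(np.zeros(3), Z_hat, S)
print(compatibility(d))                  # scores in [0, 1]; 0 means falsified
```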
2.4. The Tropical Hamilton–Jacobi Foundation
The two scalar fields.
The update is an algebraic identity.
The axioms force this structure.
Epistemic time in generation.
3. Related Work
Probabilistic LLM uncertainty.
Imprecise and set-based uncertainty.
Structured decoding and constrained generation.
Retrieval-grounded and symbolic-neural hybrids.
Token masking.
Possibilistic reasoning in NLP.
Epistemic neural networks.
Summary.
4. Possibilistic Vocabulary Representation
4.1. The Vocabulary as a Hypothesis Space
4.2. Embedding-Space Representation and the Geometry Assumption
4.3. Possibilistic α-Cuts and Epistemic Volume
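As a concrete illustration of the α-cut construction, the sketch below extracts the α-cut of a possibility distribution over a toy vocabulary and evaluates a crude volume proxy. The paper's epistemic volume is defined via the MVEE; the bounding-box log-volume here is only a stand-in for it.

```python
import numpy as np

def alpha_cut(pi, alpha):
    """Indices of vocabulary tokens with possibility >= alpha."""
    return np.flatnonzero(pi >= alpha)

def log_volume_proxy(E, cut):
    """Stand-in for the MVEE-based epistemic volume: log-volume of the
    axis-aligned bounding box of the surviving token embeddings."""
    pts = E[cut]
    extent = pts.max(axis=0) - pts.min(axis=0)
    return float(np.sum(np.log(np.maximum(extent, 1e-12))))

rng = np.random.default_rng(1)
E = rng.normal(size=(100, 8))                # toy embedding matrix (V x d)
pi = rng.uniform(size=100); pi /= pi.max()   # maxitive normalization: max = 1
cut = alpha_cut(pi, 0.5)
print(len(cut), log_volume_proxy(E, cut))    # support shrinks as alpha grows
```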
5. Epistemic Possibilistic Attention (EPA)
5.1. Whitened Innovation Geometry in Attention
5.2. Compatibility and Falsification
5.3. Possibilistic Attention Operator
1. Conjunctive possibility update: $\tilde{\pi}_{ij} = \min(\pi_j, c_{ij})$, combining the prior possibility of key $j$ with its compatibility $c_{ij}$ under query $i$.
2. Max-rescaling for ordinal integrity: $\pi_{ij} = \tilde{\pi}_{ij} / \max_{j'} \tilde{\pi}_{ij'}$, so the most compatible surviving key carries possibility 1.
3. Falsification gate: keys whose rescaled possibility falls below a threshold are excluded ($w_{ij} = 0$), with the threshold relaxed as needed so that a minimum number of keys survive.
4. Output aggregation: the attention output is a possibility-weighted combination of the values of the surviving keys. (A minimal code sketch of these four steps follows.)
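A minimal single-query sketch of the four EPA steps. The threshold `gamma`, the survivor floor `k_min`, the cutoff `d_max`, and the linear compatibility profile are illustrative placeholders, not the paper's committed choices.

```python
import numpy as np

def epa(q, K, V, S, pi_prior, gamma=0.2, k_min=2, d_max=3.0):
    """Single-query Epistemic Possibilistic Attention sketch."""
    L = np.linalg.cholesky(S)                        # S = L L^T
    d = np.linalg.norm(np.linalg.solve(L, (q - K).T).T, axis=1)
    c = np.clip(1.0 - d / d_max, 0.0, 1.0)           # compatibility in [0, 1]
    pi = np.minimum(pi_prior, c)                     # 1. conjunctive (min) update
    pi = pi / max(pi.max(), 1e-12)                   # 2. max-rescaling
    thresh = min(gamma, np.sort(pi)[-k_min])         # 3. gate, keep >= k_min keys
    w = np.where(pi >= thresh, pi, 0.0)
    return (w / w.sum()) @ V                         # 4. aggregate over survivors

rng = np.random.default_rng(2)
n, d_k = 6, 4
q, K, V = rng.normal(size=d_k), rng.normal(size=(n, d_k)), rng.normal(size=(n, d_k))
print(epa(q, K, V, np.eye(d_k), pi_prior=np.ones(n)))
```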
5.4. Multi-Head Possibilistic Attention
6. PLM Architecture
6.1. Overview
1. Input encoding. Token embeddings are computed as in a standard transformer, with positional encodings.
2. Possibilistic encoder. A stack of EPA layers with position-wise feed-forward networks computes a sequence of hidden representations.
3. Possibilistic decoder. A stack of masked self-EPA and cross-EPA layers computes the decoder hidden states.
4. Vocabulary falsification. The final hidden state is projected to the embedding space via a learned projection, yielding a query vector into the embedding matrix E.
5. Possibilistic output head. Compatibility scores between the query vector and all vocabulary embeddings are computed via the EPA residual geometry, yielding a possibility distribution over the vocabulary after max-rescaling.
6. Generation commitment. The minimax medoid commits the filter to the surviving vocabulary hypothesis most geometrically central in admissible embedding space (sketched in code below).
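The commitment step can be sketched directly: the minimax medoid below selects the surviving embedding whose worst-case distance to the other survivors is smallest. The survivor threshold of 0.6 is an arbitrary illustrative value.

```python
import numpy as np

def minimax_medoid(E, survivors):
    """Commit to the surviving vocabulary embedding whose maximum
    distance to the other survivors is smallest (geometrically central)."""
    P = E[survivors]
    D = np.linalg.norm(P[:, None, :] - P[None, :, :], axis=-1)
    return int(survivors[np.argmin(D.max(axis=1))])

rng = np.random.default_rng(3)
E = rng.normal(size=(50, 8))                 # toy embedding matrix
pi = rng.uniform(size=50); pi /= pi.max()    # max-rescaled possibilities
survivors = np.flatnonzero(pi >= 0.6)        # tokens passing the gate
print(minimax_medoid(E, survivors))          # an actual vocabulary index
```

Because the argmin ranges over surviving indices only, the committed token is by construction a row of the embedding matrix E; no interpolated token can be emitted, consistent with condition (iii) of Section 8.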
6.2. Spreader and Regeneration in Embedding Space
6.3. Epistemic Width Monitor
7. Training Objective
7.1. Possibilistic Cross-Entropy
7.2. PCRB Regularization
7.3. Possibilistic Entropy Regularization
7.4. Total Loss
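Sections 7.1–7.4 are combined here in a deliberately schematic sketch. The negative log-possibility form of the PCE, the hinge form of the PCRB penalty, and the weights `lam_pcrb` and `lam_ent` are all assumptions made for illustration; only the three-component structure is taken from the text.

```python
import numpy as np

def pce_loss(pi, target):
    """Possibilistic cross-entropy, sketched as the negative
    log-possibility of the ground-truth token (an assumed form)."""
    return -float(np.log(max(pi[target], 1e-12)))

def pcrb_penalty(H, H_min):
    """Hinge penalty that activates when the possibilistic entropy H
    falls below the PCRB floor H_min (inadmissible vocabulary collapse)."""
    return max(0.0, H_min - H)

def total_loss(pi, target, H, H_min, lam_pcrb=1.0, lam_ent=0.01):
    """Illustrative weighted sum of the three Section 7 components."""
    return pce_loss(pi, target) + lam_pcrb * pcrb_penalty(H, H_min) + lam_ent * H

pi = np.array([1.0, 0.7, 0.3, 0.0])   # max-rescaled possibilities over 4 tokens
print(total_loss(pi, target=1, H=0.8, H_min=0.5))
```

Note the intended tension: the entropy term encourages sharpening while the PCRB hinge forbids collapse below the admissible floor.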
8. Volumetric Faithfulness Invariant for Token Support Sets
(i) Geometric coverage. The surviving support set contains vocabulary embeddings at multiple radial scales and in all principal directions under the MVEE metric.
(ii) Admissibility. All surviving embeddings correspond to tokens with strictly positive possibility.
(iii) Anchor faithfulness. The minimax medoid is an actual surviving vocabulary embedding.
(iv) Non-degeneracy. The surviving support set is not collapsed to a point: its MVEE has strictly positive volume.
(v) Survivor floor. The survivor count never falls below a fixed minimum.
If the VFI holds at every generation step, then:
(a) The epistemic vocabulary volume remains bounded away from zero for all k.
(b) The possibilistic entropy remains finite and bounded from below.
(c) The minimax medoid commitment corresponds to an actual vocabulary token at every step; no hallucinated or interpolated token is committed. (An illustrative runtime check of these conditions follows.)
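The check below is illustrative only: the rank test for coverage and the bounding-box volume for non-degeneracy are proxies for the MVEE-based conditions (i) and (iv), and the floors `v_floor` and `n_min` are assumed values.

```python
import numpy as np

def vfi_check(E, survivors, pi, medoid, v_floor=1e-6, n_min=2):
    """Illustrative runtime check of VFI conditions (i)-(v)."""
    pts = E[survivors]
    coverage = np.linalg.matrix_rank(pts - pts.mean(axis=0)) == E.shape[1]  # (i)
    admissible = bool(np.all(pi[survivors] > 0.0))                          # (ii)
    anchored = medoid in survivors                                          # (iii)
    extent = pts.max(axis=0) - pts.min(axis=0)
    non_degenerate = float(np.prod(extent)) > v_floor                       # (iv)
    floor_ok = len(survivors) >= n_min                                      # (v)
    return coverage and admissible and anchored and non_degenerate and floor_ok

rng = np.random.default_rng(6)
E = rng.normal(size=(100, 8))                # toy embedding matrix
pi = rng.uniform(size=100); pi /= pi.max()   # max-rescaled possibilities
survivors = np.flatnonzero(pi >= 0.3)
print(vfi_check(E, survivors, pi, medoid=int(survivors[0])))
```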
9. Epistemic Diagnostics for Generation
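Pending the formal definitions of this section, the sketch below shows plausible instantiations: necessity as one minus the possibility of the strongest rival (standard possibility theory), the committed token's whitened surprisal read off directly, and an epistemic tension that grows when a high-surprisal token is committed without clear necessity. The tension formula in particular is an assumption, not the section's committed definition.

```python
import numpy as np

def necessity(pi, j_hat):
    """Necessity of the committed token: one minus the possibility
    of its strongest rival (standard possibility theory)."""
    return 1.0 - float(np.delete(pi, j_hat).max())

def epistemic_tension(pi, j_hat, d, d_max=3.0):
    """Illustrative tension: large when a high-surprisal token is
    committed while a rival remains almost fully possible."""
    return (d[j_hat] / d_max) * (1.0 - necessity(pi, j_hat))

pi = np.array([1.0, 0.9, 0.2])   # near-tie: a rival is almost fully possible
d = np.array([0.5, 0.6, 2.5])    # per-token whitened surprisal diagnostic
print(necessity(pi, 0), epistemic_tension(pi, 0, d))
```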
9.1. Field-Theoretic Interpretation of Diagnostics
10. Spread Control and PCRB Enforcement
11. Recovery of the Standard LLM in the Gaussian Limit
(G1) The vocabulary support cloud contracts to a Gaussian profile in embedding space as the possibilistic spread parameter tends to zero.
(G2) The MVEE shape matrix converges to the covariance of the token co-occurrence statistics.
(G3) Token co-occurrence statistics are Gaussian with a fixed covariance matrix.
The correspondence is organized by the order $p$ of the aggregation functional: the $p \to \infty$ limit yields the Popperian minimax (falsification) criterion, intermediate orders yield the PLM's minimax-entropy optimality, and the Gaussian case recovers the standard LLM cross-entropy criterion. (A numerical illustration of the Gaussian case follows.)
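The following numerical sketch illustrates the flavor of the limit claim under a drastic simplification of (G1)–(G3): with an identity shape matrix, a Gaussian compatibility profile, and the falsification gate disabled, the max-rescaled possibilistic weights coincide exactly with a softmax over negative squared whitened distances.

```python
import numpy as np

rng = np.random.default_rng(4)
q, K = rng.normal(size=4), rng.normal(size=(6, 4))
d2 = np.sum((q - K) ** 2, axis=1)        # squared whitened surprisal, S = I

c = np.exp(-0.5 * d2)                    # Gaussian compatibility, gate disabled
w_plm = c / c.max()                      # max-rescaling
w_plm = w_plm / w_plm.sum()              # aggregation weights

w_softmax = np.exp(-0.5 * d2)
w_softmax = w_softmax / w_softmax.sum()  # softmax over -d^2/2
print(np.allclose(w_plm, w_softmax))     # True: identical weightings
```

With equal-norm keys, the exponent $-\lVert q - k \rVert^2 / 2$ differs from the dot product $q \cdot k$ only by a per-query constant, so this weighting matches standard scaled dot-product attention.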
12. Multi-Modal and Multi-Sensor Extensions
13. Limitations and Open Proof Obligations
Open 1: Vocabulary VFI maintenance under depth. Theorem 1 proves non-degenerate generation assuming the VFI is maintained. We have not proved that the PLM’s EPA layers jointly maintain the VFI through depth. Until this is established, the Class 1 anti-hallucination guarantee is conditional. This is the most consequential open obligation.
Open 2: Scalability of vocabulary support geometry. For large vocabularies, exact MVEE computation is computationally prohibitive. Efficient approximations require formal coverage guarantees under the VFI. (A baseline exact MVEE routine is sketched after this list.)
Open 3: Formal PAC-possibilistic learnability. A formal learnability theory for PLM training under the PCE loss is not yet established.
Open 4: Reliability damping sufficiency. The formal proof that reliability damping prevents VFI violation when the survivor count is near-minimal carries over from ESPF Open 3 [14] but has not been proved in the discrete vocabulary setting.
Open 5: Training stability and regeneration validity. The interaction between the possibilistic cross-entropy loss, the PCRB penalty, and the entropy regularizer in a gradient-based optimizer has not been validated. Gradient behavior of the PCE loss, sensitivity to the trivial-cloud failure mode, and the validity of the vocabulary regeneration step (Remark 6) are all open engineering and theoretical questions.
Open 6: Empirical validation of the embedding geometry assumption. As noted in Remark 1, whether MVEE geometry over learned token embeddings faithfully tracks semantic admissibility is an open empirical question. This is a first-priority experimental obligation before the PLM can be claimed to solve Class 1 hallucinations in a meaningful sense.
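For Open 2, one natural baseline is Khachiyan's algorithm, sketched below. It computes the exact MVEE in polynomial time but is far too slow at vocabulary scale, which is precisely why approximations with coverage guarantees are needed. The tolerance and iteration cap are arbitrary illustrative settings.

```python
import numpy as np

def mvee(P, tol=1e-3, max_iter=1000):
    """Khachiyan's algorithm for the minimum-volume enclosing ellipsoid of
    the rows of P (n x d). Returns (A, c) with (x - c)^T A (x - c) <= 1."""
    n, d = P.shape
    Q = np.hstack([P, np.ones((n, 1))]).T            # lifted points, (d+1) x n
    u = np.full(n, 1.0 / n)                          # barycentric weights
    for _ in range(max_iter):
        X = Q @ np.diag(u) @ Q.T
        M = np.einsum('ij,ji->i', Q.T @ np.linalg.inv(X), Q)
        j = int(np.argmax(M))                        # most "uncovered" point
        step = (M[j] - d - 1.0) / ((d + 1.0) * (M[j] - 1.0))
        new_u = (1.0 - step) * u
        new_u[j] += step
        if np.linalg.norm(new_u - u) < tol:
            u = new_u
            break
        u = new_u
    c = P.T @ u                                      # ellipsoid center
    A = np.linalg.inv(P.T @ np.diag(u) @ P - np.outer(c, c)) / d
    return A, c

rng = np.random.default_rng(5)
A, c = mvee(rng.normal(size=(200, 3)))
print(np.linalg.eigvalsh(A))                         # all positive: valid ellipsoid
```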
14. Discussion and Research Agenda
14.1. Adaptive Switching Between PLM and Standard LLM
14.2. What TEAG Solves, Detects, and Does Not Address in the Hallucination Problem
14.3. Connection to Kalman’s Vision
14.4. Research Agenda
1. Empirical validation of embedding geometry. Before larger architectural claims, establish whether compatibility scores derived from token embedding MVEE geometry correlate with human judgments of semantic admissibility. Even a small-scale study would significantly strengthen the foundation of Section 4.
2. Efficient vocabulary geometry. Develop approximations to MVEE computation for large vocabularies.
3. Pre-training from scratch. Train a small PLM (∼125M parameters) on a standard benchmark corpus and compare epistemic diagnostics against a comparably sized standard LLM.
4. Fine-tuning from probabilistic checkpoints. Develop a curriculum for converting a pretrained probabilistic LLM into a PLM by gradually introducing the EPA operator and PCRB regularization.
5. Hallucination detection benchmarks. Evaluate whether necessity, surprisal, and epistemic tension correlate with factual errors on knowledge-intensive tasks (TriviaQA, NaturalQuestions, HaluEval).
6. Adaptive switching calibration. Systematically benchmark the switching thresholds and spread-control hyperparameters across language tasks and model scales.
7. GaiaVerse integration. Deploy the PLM as the epistemic inference engine within the GaiaVerse planetary stewardship knowledge graph.
15. Conclusion
Acknowledgments
References
- Angelopoulos, A. N. and Bates, S. (2022). A gentle introduction to conformal prediction and distribution-free uncertainty quantification. arXiv:2107.07511.
- Benferhat, S., Dubois, D., Garcia, L., and Prade, H. (2002). On the transformation between possibilistic logic bases and possibilistic causal networks. International Journal of Approximate Reasoning, 29(2):135–173.
- Devlin, J., Chang, M., Lee, K., and Toutanova, K. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding. Proceedings of NAACL, pages 4171–4186.
- Dubois, D. and Prade, H. (1988). Possibility Theory: An Approach to Computerized Processing of Uncertainty. Plenum Press.
- Dubois, D. and Prade, H. (2000). Possibility theory in information fusion. International Journal of Intelligent Systems, 15(7):621–640.
- Dubois, D., Prade, H., and Sabbadin, R. (1997). A possibilistic logic machinery for qualitative decision. In Proceedings of the AAAI Spring Symposium, pages 47–54.
- Ethayarajh, K. (2019). How contextual are contextualized word representations? Comparing the geometry of BERT, ELMo, and GPT-2 embeddings. Proceedings of EMNLP, pages 55–65.
- Fan, A., Lewis, M., and Dauphin, Y. (2018). Hierarchical neural story generation. arXiv:1805.04833.
- Gal, Y. and Ghahramani, Z. (2016). Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. Proceedings of ICML, pages 1050–1059.
- Guo, C., Pleiss, G., Sun, Y., and Weinberger, K. Q. (2017). On calibration of modern neural networks. Proceedings of ICML, pages 1321–1330.
- Hokamp, C. and Liu, Q. (2017). Lexically constrained decoding for sequence generation using grid beam search. Proceedings of ACL, pages 1535–1546.
- Holtzman, A., Buys, J., Du, L., Forbes, M., and Choi, Y. (2020). The curious case of neural text degeneration. Proceedings of ICLR.
- Hu, Z., Yang, Z., Liang, X., Salakhutdinov, R., and Xing, E. P. (2017). Toward controlled generation of text. Proceedings of ICML, pages 1587–1596.
- Jah, M. K. and Haslett, V. (2025). The Epistemic Support-Point Filter (ESPF): A bounded possibilistic framework for ordinal state estimation. arXiv:2508.20806.
- Jah, M. K. (2026a). The Epistemic Support-Point Filter: Jaynesian maximum entropy meets Popperian falsification. A possibilistic minimax-entropy optimality proof. arXiv:2603.10065.
- Jah, M. K. (2026b). The geometry of knowing: From possibilistic ignorance to probabilistic certainty. A measure-theoretic framework for epistemic convergence. arXiv:submit/7362363.
- Jah, M. K. (2026c). Theory of Epistemic Abductive Geometry (TEAG): A unified theory of admissibility-driven inference across dynamical systems, measure theory, and language. Manuscript in preparation.
- Jah, M. K. (2026d). The Epistemic Support-Point Filter as a Tropical Hamilton–Jacobi System: Wavefront Propagation and Possibilistic Inference. Preprints.org, version 2, posted 31 March 2026.
- Kalman, R. E. (1982). System identification from noisy data. In A. Bednarek and L. Cesari (eds.), Dynamic Systems II: A University of Florida International Symposium, pages 135–164. Academic Press, New York.
- Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., Küttler, H., Lewis, M., Yih, W., Rocktäschel, T., Riedel, S., and Kiela, D. (2020). Retrieval-augmented generation for knowledge-intensive NLP tasks. Advances in Neural Information Processing Systems, 33.
- Maddox, W. J., Izmailov, P., Garipov, T., Vetrov, D. P., and Wilson, A. G. (2019). A simple baseline for Bayesian uncertainty in deep learning. Advances in Neural Information Processing Systems, 32.
- Malinin, A. and Gales, M. (2021). Uncertainty estimation in autoregressive structured prediction. Proceedings of ICLR.
- Marcus, G. (2020). The next decade in AI: Four steps towards robust artificial intelligence. arXiv:2002.06177.
- Post, M. and Vilar, D. (2018). Fast lexically constrained decoding with dynamic beam allocation for neural machine translation. Proceedings of NAACL.
- Sensoy, M., Kaplan, L., and Kandemir, M. (2018). Evidential deep learning to quantify classification uncertainty. Advances in Neural Information Processing Systems, 31.
- Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., and Polosukhin, I. (2017). Attention is all you need. Advances in Neural Information Processing Systems, 30.
- Walley, P. (1991). Statistical Reasoning with Imprecise Probabilities. Chapman and Hall.
- Zadeh, L. A. (1965). Fuzzy sets. Information and Control, 8(3):338–353.
- Zadeh, L. A. (1978). Fuzzy sets as a basis for a theory of possibility. Fuzzy Sets and Systems, 1(1):3–28.