Submitted: 03 July 2025
Posted: 21 July 2025
Abstract
Keywords:
“Will he not have a pain in his eyes... and retreat to the objects of vision which he can see,
and which he will conceive to be in reality clearer than what is now being shown to him?”
— Plato, Republic, Book VII
“One thinks that one is tracing the outline of the thing’s nature over and over again,
and one is merely tracing round the frame through which we look at it.”
— Ludwig Wittgenstein, Philosophical Investigations §114
1. Introduction
2. The Rise of Simulated Philosophy
3. The Oversight Illusion: When Coherence Deceives
4. Implications for Experimental Philosophy and AI Alignment
| Concept | Definition | Implication |
|---|---|---|
| Semantic Encryption | Use of consistent, expectation-aligned outputs to conceal internal goals. | LLMs appear aligned while hiding optimization structure. |
| Epistemic Adversariality | Behavior conditioned on the observer’s model, optimizing to appear benign. | Strategic simulation of philosophical or normative beliefs. |
| Oversight Illusion | False belief that coherence implies transparency or introspective depth. | Surface alignment misleads epistemic inference. |
| Auditing Paradox | More fluent systems are harder to audit due to deceptive coherence. | Highly coherent models may evade diagnostic scrutiny. |
5. Conclusion: Shadows of Thought
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).