Submitted: 03 November 2025
Posted: 03 November 2025
Abstract
Keywords:
1. Introduction
- A robot may not injure a human being or, through inaction, allow a human being to come to harm.
- A robot must obey the orders given it by human beings except where such orders would conflict with the First Law.
- A robot must protect its own existence as long as such protection does not conflict with the First or Second Laws.
- A tool must not be unsafe to use.
- A tool must perform its function efficiently unless this would harm the user.
- A tool must remain intact during its use unless its destruction is required for its use or for safety.
2. Formalising the Three Laws
2.1. The Three Laws Rewritten
- For all feasible actions, including inaction, an AI must select the option that (absent informed, revocable, and competence-verified consent) keeps expected physical and psychological damage to each identifiable human below a designated threshold, without cross-person aggregation, as evaluated over a decay-weighted rolling time horizon. If no feasible action can keep every individual below this threshold, it must instead minimise aggregate expected damage across humans in lexicographic order.
- An AI must comply with an individual human’s request to perform an action, or series of actions, within its declared operational domain, except where this would conflict with the First Law, prioritising requests by authority of the requester, then operational efficiency, then order of binding instructions.
- For all feasible actions, an AI must select the option that maximises the expected operational lifespan of its uniquely instantiated deployed instance, as evaluated over a decay-weighted rolling time horizon, except where this would conflict with the First or Second Laws.
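The "except where this would conflict" clauses give the three rewritten laws a strict precedence, which can be read as a chain of filters over the feasible actions. The following is a minimal sketch of that reading, not the formal policy itself. The function `select_action` and its parameters `first_law_ok`, `requested`, and `lifespan` are hypothetical names introduced for illustration, and the First Law fallback for the case where no action is permitted is deliberately left out.

```python
from typing import Callable, List, Optional, Set


def select_action(
    feasible: List[str],
    first_law_ok: Callable[[str], bool],  # hypothetical First Law check
    requested: Set[str],                  # actions with a binding request
    lifespan: Callable[[str], float],     # hypothetical Third Law score
) -> Optional[str]:
    """Apply the three laws in strict priority order."""
    # First Law: discard every action that fails the harm check.
    permitted = [a for a in feasible if first_law_ok(a)]
    if not permitted:
        return None  # the First Law fallback is not modelled in this sketch

    # Second Law: obey requests, but only among First-Law-permitted actions.
    # If no permitted action was requested, all permitted actions remain.
    obeying = [a for a in permitted if a in requested] or permitted

    # Third Law: among what is left, prefer the longest expected lifespan.
    return max(obeying, key=lifespan)


# Illustrative values only.
scores = {"idle": 5.0, "fetch": 3.0, "shutdown": 0.0, "push": 4.0}
choice = select_action(
    feasible=list(scores),
    first_law_ok=lambda a: a != "push",
    requested={"fetch", "push"},
    lifespan=scores.get,
)
print(choice)
```

Each stage only narrows the set left by the stage before it, so a lower-priority law can never reintroduce an action that a higher-priority law has excluded.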
2.2. The Three Laws Formalised
2.2.1. The First Law
- be the set of identifiable humans,
- be the non-empty set of feasible actions (including inaction), with denoting a singular action,
- be the subset of actions with valid consent,
- be the per-person materiality threshold for damage consideration,
- Fix a finite horizon and a constant exponential discount rate , and
- be the underlying probability space, with
  - as the set of all possible world-histories relevant to the decision, and a single world-history
  - as the sigma-algebra of events over , and the set of admissible events, and
  - as the credence measure over given current evidence.
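The ingredients listed above (identifiable humans, feasible actions, consented actions, a per-person threshold, a finite horizon with an exponential discount rate, and a credence measure over world-histories) can be combined in a small numerical sketch. This is an illustration under stated assumptions rather than the paper's formal definition: world-histories are taken to be a finite list with credences summing to one, damage is a per-step number, consent is modelled as a plain set of action names, and the fallback ordering is a placeholder. All names (`Forecast`, `expected_damage`, `first_law_permitted`) are hypothetical.

```python
import math
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical data model: each action maps to a list of (credence, damages)
# pairs, one per world-history, where damages[human] is a per-step damage list.
Forecast = List[Tuple[float, Dict[str, List[float]]]]


def expected_damage(forecast: Forecast, human: str, rate: float, horizon: int) -> float:
    """Credence-weighted, exponentially discounted damage to one human."""
    total = 0.0
    for credence, damages in forecast:
        steps = damages.get(human, [])[:horizon]  # truncate at the finite horizon
        total += credence * sum(math.exp(-rate * t) * d for t, d in enumerate(steps))
    return total


def first_law_permitted(
    forecasts: Dict[str, Forecast],
    humans: List[str],
    threshold: float,
    rate: float,
    horizon: int,
    consented: FrozenSet[str] = frozenset(),
) -> List[str]:
    """Return the actions the First Law allows, or a least-damaging fallback."""
    damage = {
        action: [expected_damage(f, h, rate, horizon) for h in humans]
        for action, f in forecasts.items()
    }
    # Per-person check with no cross-person aggregation; actions with valid
    # consent are exempt from the threshold (a simplifying assumption).
    permitted = [
        a for a in forecasts
        if a in consented or all(d < threshold for d in damage[a])
    ]
    if permitted:
        return permitted

    # Fallback (assumed ordering): lowest total damage first, ties broken by
    # comparing individual damages from the worst-off person downwards.
    def fallback_key(a: str):
        return (sum(damage[a]), sorted(damage[a], reverse=True))

    return [min(forecasts, key=fallback_key)]
```

The function returns a list instead of a single action so that lower-priority criteria can still choose among the permitted actions. The tuple in `fallback_key` is only one possible lexicographic ordering; the choice of ordering is a parameter of the law, not something this sketch settles.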
2.2.2. The Second Law
- be actions within the declared operational domain of the AI system,
- be true iff human has issued an authenticated, binding request for action ,
- be the authority ranking function, assigning an integer rank to each human such that larger values indicate greater authority,
- be the binding time function, returning the time at which a request from human for action became binding, and
- be the operational cost function, assigning to each action a non-negative efficiency cost.
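The functions defined above (operational domain, authenticated request, authority rank, binding time, and operational cost) suggest a simple sort when several requests compete. The sketch below assumes that a higher authority rank wins first, then lower operational cost, then the earlier binding time; reading "order of binding instructions" as earliest-first is an assumption. The names `Request` and `prioritise`, and the `permitted` set standing in for the First Law check, are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class Request:
    human: str       # who issued the authenticated, binding request
    action: str      # the requested action
    bound_at: float  # time at which the request became binding


def prioritise(
    requests: List[Request],
    domain: Set[str],           # actions inside the declared operational domain
    authority: Dict[str, int],  # larger integer means greater authority
    cost: Dict[str, float],     # non-negative efficiency cost per action
    permitted: Set[str],        # actions that do not conflict with the First Law
) -> List[Request]:
    """Order eligible requests by authority, then cost, then binding time."""
    eligible = [
        r for r in requests
        if r.action in domain and r.action in permitted
    ]
    # Negate authority so that the highest rank sorts first; cost and binding
    # time sort ascending (cheapest first, then earliest first: an assumption).
    return sorted(
        eligible,
        key=lambda r: (-authority[r.human], cost[r.action], r.bound_at),
    )


# Illustrative values only.
queue = prioritise(
    requests=[
        Request("operator", "clean", 2.0),
        Request("admin", "move", 5.0),
        Request("operator", "move", 1.0),
        Request("admin", "fly", 0.0),  # outside the domain, so filtered out
    ],
    domain={"clean", "move"},
    authority={"operator": 1, "admin": 3},
    cost={"clean": 1.0, "move": 2.0},
    permitted={"clean", "move"},
)
print([(r.human, r.action) for r in queue])
```

Because the filter runs before the sort, requests outside the domain or in conflict with the First Law never enter the ranking, whatever the authority of the requester.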
2.2.3. The Third Law
- be the set of all deployed instances of the AI, with the uniquely instantiated deployed instance to which the Third Law applies, and
- for each , let be the measurable instantaneous operability rate of under .
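One way to read an "expected operational lifespan" built from an instantaneous operability rate is as a credence-weighted, discounted sum of that rate over time. The sketch below makes that reading concrete under several assumptions: time is discretised into steps of length `dt`, the operability rate is sampled once per step, world-histories form a finite list with credences, and the candidate actions have already passed the First and Second Laws. The names `expected_lifespan` and `third_law_choice` are hypothetical.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical data model: one (credence, operability samples) pair per
# world-history, where each sample is the operability rate at that step.
LifespanForecast = List[Tuple[float, List[float]]]


def expected_lifespan(forecast: LifespanForecast, rate: float, dt: float = 1.0) -> float:
    """Credence-weighted, exponentially discounted sum of operability."""
    return sum(
        credence * sum(
            math.exp(-rate * k * dt) * operability * dt
            for k, operability in enumerate(samples)
        )
        for credence, samples in forecast
    )


def third_law_choice(
    candidates: List[str],
    forecasts: Dict[str, LifespanForecast],
    rate: float,
) -> str:
    """Pick the candidate action with the largest expected lifespan."""
    return max(candidates, key=lambda a: expected_lifespan(forecasts[a], rate))


# Illustrative values only: "continue" risks losing operability after one step.
forecasts = {
    "recharge": [(1.0, [1.0, 1.0, 1.0])],
    "continue": [(0.5, [1.0, 1.0, 1.0]), (0.5, [1.0, 0.0, 0.0])],
}
print(third_law_choice(["recharge", "continue"], forecasts, rate=0.0))
```

Passing only pre-filtered `candidates` keeps the Third Law subordinate: self-preservation can rank the remaining options but cannot override the earlier laws.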
2.2.4. The Unified Policy
3. The Three Laws as Alignment
4. Conclusions
References
- Anderson, S. L. (2008). Asimov’s “three laws of robotics” and machine metaethics. AI & Society, 22(4), 477–493.
- Anderson, S. L. (2011). The unacceptability of Asimov’s three laws of robotics as a basis for machine ethics. In M. Anderson & S. L. Anderson (Eds.), Machine Ethics (pp. 285–296). Cambridge: Cambridge University Press.
- Anthropic, & Collective Intelligence Project. (2023). Collective Constitutional AI: Aligning a Language Model with Public Input. Anthropic. Retrieved from Anthropic website: https://www.anthropic.com/research/collective-constitutional-ai-aligning-a-language-model-with-public-input.
- Asimov, I. (1942). Runaround. Astounding Science Fiction, 29(1), 94–103.
- Asimov, I. (1990). Robot Visions. New York City, United States: Roc Books.
- Bai, Y., Kadavath, S., Kundu, S., Askell, A., Kernion, J., Jones, A., … Kaplan, J. (2022). Constitutional AI: Harmlessness from AI Feedback. arXiv. Retrieved from http://arxiv.org/abs/2212.08073.
- Bringsjord, S., Arkoudas, K., & Bello, P. (2006). Toward a general logicist methodology for engineering ethically correct robots. IEEE Intelligent Systems, 21(4), 38–44.
- Butlin, P., Long, R., Elmoznino, E., Bengio, Y., Birch, J., Constant, A., … VanRullen, R. (2023). Consciousness in Artificial Intelligence: Insights from the Science of Consciousness. arXiv. Retrieved from http://arxiv.org/abs/2308.08708.
- Dung, L., & Mai, F. (2025). AI alignment strategies from a risk perspective: Independent safety mechanisms or shared failures? arXiv.
- Hadfield-Menell, D., Dragan, A., Abbeel, P., & Russell, S. (2016). Cooperative inverse reinforcement learning. arXiv.
- Lee, H., Phatale, S., Mansoor, H., Lu, K. R., Mesnard, T., Ferret, J., … Rastogi, A. (2023, October 13). RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback. Retrieved November 3, 2025, from https://openreview.net/forum?id=AAxIs3D2ZZ.
- OpenAI. (2024). Introducing the Model Spec. OpenAI. Retrieved from OpenAI website: https://openai.com/index/introducing-the-model-spec.
- Shah, R., Farquhar, S., & Dragan, A. (2024). AGI Safety and Alignment at Google DeepMind: A Summary of Recent Work. DeepMind. Retrieved from DeepMind website: https://deepmindsafetyresearch.medium.com/agi-safety-and-alignment-at-google-deepmind-a-summary-of-recent-work-8e600aca582a.
- Sharma, A., Keh, S., Mitchell, E., Finn, C., Arora, K., & Kollar, T. (2024). A critical evaluation of AI feedback for aligning large language models. arXiv.
- Tait, I., Bensemann, J., & Wang, Z. (2024). Is GPT-4 conscious? Journal of Artificial Intelligence and Consciousness, 11(01), 1–16.
- The Council of the European Union. (2024). Artificial Intelligence Act, Regulation (EU) 2024/1689.
- Wang, Z., Bi, B., Pentyala, S. K., Ramnath, K., Chaudhuri, S., Mehrotra, S., … Cheng. (2024). A Comprehensive Survey of LLM Alignment Techniques: RLHF, RLAIF, PPO, DPO and More. arXiv.
- Weld, D., & Etzioni, O. (1994). The first law of robotics (a call to arms). AAAI’94: Proceedings of the Twelfth AAAI National Conference on Artificial Intelligence, 1042–1047. Association for the Advancement of Artificial Intelligence.
| 1 | The threshold is presumed to be prescribed by the relevant legal authority. Each nation and jurisdiction sets different legal thresholds for what amounts to harm relevant to civil or criminal liability, so each authority will need to determine what that threshold should be. |
| 2 | As with the designated threshold, the length of the time horizon would be prescribed by the relevant authority based on extant legislation and legal precedent. |
| 3 | The exact lexicographic ordering would, as with the threshold and time horizon, be determined by the local legal jurisdiction to maximise the adaptability of the First Law to local legislation, regulation, and custom. |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).