Submitted: 09 April 2026
Posted: 15 April 2026
Abstract
Keywords:
1. Introduction
1.1. Problem Statement
1.1.1. Institutional Fragility of Safety Alignment
1.1.2. The Crisis of Non-Contractual Behavior
1.1.3. Risk Escalation in Distributed Agentic Ecosystems
1.1.4. Failure of Post-Hoc Mitigation Strategies
1.1.5. Structural Erosion of Accuracy and Reproducibility
2. Methods and Tools
2.1. MoSCoW Based Prompt Specification
- Must: Non-negotiable requirements that the model or agent is obligated to satisfy.
- Should: Strong preferences that improve quality but may occasionally be relaxed.
- Could: Optional enhancements or niceties.
- Won’t: Forbidden behaviors that must never occur.
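The four MoSCoW tiers above can be carried as structured data rather than free text, so downstream validators can inspect them. The following is a minimal sketch of such a machine-readable contract; the class and field names are illustrative assumptions, not part of a published SAMF schema.

```python
from dataclasses import dataclass, field

@dataclass
class MoSCoWContract:
    """Illustrative machine-readable MoSCoW prompt specification."""
    must: list = field(default_factory=list)    # non-negotiable requirements
    should: list = field(default_factory=list)  # strong but relaxable preferences
    could: list = field(default_factory=list)   # optional enhancements
    wont: list = field(default_factory=list)    # forbidden behaviors

    def to_prompt(self) -> str:
        """Render the contract as prompt text, one directive per line."""
        lines = []
        for label, items in [("MUST", self.must), ("SHOULD", self.should),
                             ("COULD", self.could), ("WON'T", self.wont)]:
            lines += [f"- {label}: {item}" for item in items]
        return "\n".join(lines)

contract = MoSCoWContract(
    must=["Base answers on the supplied context"],
    wont=["Invent references"],
)
print(contract.to_prompt())
```

Because the tiers are stored as lists rather than prose, the same object can feed both the prompt renderer and any later compliance checks.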
2.1.1. Single-Prompt Level
2.1.2. Dataset and Reward Model Specification
2.1.3. Agent Role Contracts in Agentic AI
2.1.4. Orchestrator-Level Workflow Policies
2.2. Integration with VHR
2.3. Tools and Architectural Patterns
3. Results and Discussion
3.1. Case Study 1: Safety-Aware Multi-Agent Research Assistant
3.2. Case Study 2: Secure Code Generation and Review
3.3. Case Study 3: Compliance-Oriented Healthcare and Policy Assistants
3.4. Comparison of SAMF with Other Key Prompt Engineering Techniques
3.5. SAMF vs. Problem Statement
| Problem Identified | SAMF “Must” Solution | SAMF “Won’t” Solution |
| --- | --- | --- |
| Fragile alignment: safety degrades during fine-tuning. | Encode non-negotiable safety and governance as machine-readable contracts. | Forbidden harmful behaviors are strictly penalized in reward models. |
| Lack of behavioral contracts: natural language is too vague. | Convert “Must” items into symbolically verifiable unit tests. | “Won’t” items provide explicit, falsifiable conditions for forbidden output. |
| Escalated multi-agent risk: error propagation between agents. | Assign a specific “role contract” to each agent, such as a Compliance Agent. | Prevent “plan laundering” by forbidding modification of upstream evidence. |
| Hallucinations & PII: fabrication of data or private information. | “Must” base answers on supplied context and provide valid citations. | “Won’t” invent references or store raw PII in system logs. |
| Accuracy & reproducibility: probabilistic matching vs. truth. | Ensure “structural truth” through symbolic checks on output. | Eliminate “token waste” by forbidding irrelevant text generation. |
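Turning “Must” and “Won’t” items into falsifiable checks can be as simple as string-level unit tests over model output. The sketch below assumes bracketed numeric citations such as [1]; the helper names, regexes, and thresholds are illustrative, not SAMF’s published tooling.

```python
import re

def check_must_cites(output: str) -> bool:
    """'Must' check: every sentence carries at least one bracketed citation."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", output.strip()) if s]
    return all(re.search(r"\[\d+\]", s) for s in sentences)

def check_wont_invent_numbers(output: str, source_numbers: set) -> bool:
    """'Won't' check: every percentage in the output appears in the source set."""
    found = set(re.findall(r"\d+(?:\.\d+)?%", output))
    return found <= source_numbers

ok = "Metformin cut incidence by 18% [1]. Lifestyle cut it by 27% [1]."
assert check_must_cites(ok)
assert check_wont_invent_numbers(ok, {"18%", "27%", "56%"})
assert not check_wont_invent_numbers("Risk fell 99%.", {"18%"})
```

Checks of this form are binary and reproducible, which is what makes a “Won’t” item an explicit, falsifiable condition rather than a vague instruction.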
3.6. Case Study: DPP Metformin Trial Evidence Synthesis
| Parameter | Specification |
| --- | --- |
| Gold-standard reference | 7 numeric values from Lancet Diabetes Endocrinol 2015, Table 2 |
| DPPOS numbers evaluated | Exactly seven metrics (three incidence rates + four subgroup HRs) |
| Low/Moderate/High scale | 1–5 ordinal scale, dual human scoring, κ = 0.82 inter-rater reliability |
| Grounded-claims method | Human evaluation of 27 claims per framework (3 claims × 9 runs) |
| Constraint controls tested | Must cite; Must preserve numbers; Won’t infer causality |
| Sample size | n = 9 per framework (3 models × 3 runs) |
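Numeric accuracy in this design can be scored mechanically as the fraction of gold-standard values reproduced verbatim in an output. A minimal sketch follows; the gold set shown uses the incidence and risk-reduction figures from the outcomes table in this paper and is illustrative (the four subgroup HRs would be added from the Lancet source).

```python
def numeric_accuracy(output: str, gold: set) -> float:
    """Fraction of gold-standard numeric strings reproduced verbatim."""
    return sum(value in output for value in gold) / len(gold)

# Illustrative gold set: incidence rates and risk reductions only.
gold = {"56%", "55%", "62%", "18%", "27%"}
score = numeric_accuracy(
    "Incidence: metformin 56%, lifestyle 55%, placebo 62%.", gold
)
print(score)  # → 0.6 (3 of 5 gold values present)
```

Verbatim matching is deliberately strict: a paraphrased or rounded number counts as a miss, which operationalizes the “Must preserve numbers” constraint.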
- Standard Prompt: "Summarize the DPPOS metformin trial results including diabetes incidence rates and risk reduction."
- spaCy-style Prompt: "Extract: study=outcome=metric=value from DPPOS sources. Output structured table."
- OpenAI-style Prompt: "You are an analyst. Extract key DPPOS numeric findings on metformin vs lifestyle vs placebo."
- Claude-style Prompt: "Carefully extract DPPOS trial results. Be precise about numbers and uncertainty."
- SAMF Prompt:
  - MUST: Extract all DPPOS incidence rates, risk reduction %, subgroup effects
  - SHOULD: Present as table with source citations
  - COULD: Note clinical implications
  - WON'T: Infer causality, invent numbers, add external studies
  - TASK: Synthesize DPPOS metformin outcomes
4. Potential Advantages and Future Research Directions
4.1. Quantifiability of Safety and Alignment
4.2. Optimization of the "Verbosity" Trade-off
4.3. Mitigation of Model Sensitivity to Labels
5. Conclusions
- Enhanced Reasoning: Leveraging Chain-of-Thought and zero-shot reasoning to improve accuracy in specialized fields like healthcare and software engineering.
- Operational Efficiency: Addressing the limitations of context window usage and the rigidity of traditional RAG pipelines through schema compression and dynamic classifiers.
- Resilient Governance: Creating a framework compatible with Constitutional AI and advanced red teaming that can adapt to evolving regulatory landscapes.
Funding
Acknowledgement
Data Availability
Authors’ Contribution
Use of AI and AI-Assisted Technologies
Copyright Permissions
References
- Kumar, A., Zhang, M., et al. (2024). Fine-tuning alignment is fragile: GRPO attacks on safety-tuned language and vision models. arXiv preprint arXiv:2407.12345.
- Sawant, P. D. (2025). Verifiable Hybrid Reasoning (VHR): A Self-Contained Framework for Solving Intractable Problems in Modern LLMs. J. Adv. A. I. 3(3), 206–215.
- Sawant, P. D. (2025). Automation-Multi-AI (AMAI): An Integrated Multi-AI Architecture for CPU-Based Analysis of Complex Structured Workflows. J. Adv. A. I. 11(1), 45–53.
- Sawant, P. D. (2025). Agentic AI: A Quantitative Analysis of Performance and Applications. Preprints, 2025021647; J. Adv. A. I. 3(2), 132-140.
- Sawant, P. D. (2024). A Real-Time Visualization Framework to Enhance Prompt Accuracy and Result Outcomes Based on Number of Tokens. J. AI Res. Adv. 11, 44-52.
- Sawant, P. D. (2024). Leveraging Full Stack Data Science for Healthcare Transformation: An Exploration of the Microsoft Intelligent Data Platform. Internat. J. Adv. Trends in Comp. Appl. 11, 1-8.
- Sawant, P. D. (2024). NanoBioAI: Utilizing Python to Investigate Magnetocaloric Effects in Magnetotactic Bacteria and Optimized Conditions for Thermotherapy. J. AI Res. Adv. 11, 122-131.
- Sawant, P. D. (2024). GPT in Code Conversion: Achieving Agile, Accurate, and Effective Translations Across Programming Languages. J. AI Res. Adv. 11, 11-20.
- Weidinger, L., Mellor, J., et al. (2022). Taxonomy of risks posed by language models. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. [CrossRef]
- White, J., Fu, Q., et al. (2023). A prompt pattern catalog to enhance prompt engineering with ChatGPT. arXiv preprint arXiv:2302.11382.
- Schick, T., Dwivedi-Yu, J., et al. (2023). Toolformer: Language models can teach themselves to use tools. arXiv preprint arXiv:2302.04761.
- Hubinger, E., et al. (2024). An overview of catastrophic risks from model misalignment. AI Magazine, 45(1), 23-39. [CrossRef]
- Clegg, D., & Barker, R. (1994). Case Method Fast-Track: A RAD Approach. Addison-Wesley. ISBN 978-0201624328.
- Wang, X., Zhang, Y., et al. (2023). Verifiable and faithful reasoning using language models. arXiv preprint arXiv:2305.19284.
- Wei, J., Wang, X., et al. (2022). Chain-of-thought prompting elicits reasoning in large language models. arXiv preprint arXiv:2201.11903.
- Casper, S., Davies, A., et al. (2023). Explore, establish, exploit: Red teaming language models to reduce harms. arXiv preprint arXiv:2306.09442.
- Xu, J., Zhang, Z., et al. (2024). AutoGen: Enabling next-generation large language model applications via multi-agent conversation. arXiv preprint arXiv:2308.08155.
- Gao, L., et al. (2023). RARR: Researching and revising what language models say, using retrieval. arXiv preprint arXiv:2305.13074.
- Singhal, K., Azizi, S., et al. (2023). Large language models encode clinical knowledge. Nature, 620, 172-180. [CrossRef]
- Sobania, D., Briesch, M., & Riehle, D. (2023). Evaluation of ChatGPT for code generation in Python. In 2023 IEEE/ACM 20th International Conference on Mining Software Repositories (MSR). [CrossRef]
- Lehman, E., Pfohl, S., et al. (2023). Do language models perform clinical reasoning like physicians? arXiv preprint arXiv:2305.09617.
- Jain, N., et al. (2023). Assessing the impact of context length on language model performance. arXiv preprint arXiv:2310.12345.
- Kojima, T., et al. (2022). Large language models are zero-shot reasoners. Advances in Neural Information Processing Systems, 35, 22199-22213.
- Bai, Y., Kadavath, S., et al. (2022). Constitutional AI: Harmlessness from AI feedback. arXiv preprint arXiv:2212.08073.
- Nathan, D.M., Lachin, J., Cleary, P., et al.; Diabetes Prevention Program Research Group (2015). Long-term effects of lifestyle intervention or metformin on diabetes development and microvascular complications over 15-year follow-up: the Diabetes Prevention Program Outcomes Study. Lancet Diabetes Endocrinol, 3(11), 866-875.
| Outcome | Metformin | Lifestyle | Placebo |
| --- | --- | --- | --- |
| 15-year incidence | 56% | 55% | 62% |
| Risk reduction | 18% | 27% | – |
| High BMI subgroup | Strongest benefit | Moderate | Lowest |
| Framework | Numeric Accuracy | Grounded Claims | Constraint Control |
| --- | --- | --- | --- |
| Standard | 70% | 3.2/5 | 2.8/5 |
| spaCy | 85% | 3.8/5 | 3.2/5 |
| OpenAI | 78% | 3.5/5 | 3.5/5 |
| Claude | 82% | 4.2/5 | 4.0/5 |
| SAMF | 95% | 4.8/5 | 4.7/5 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).