Submitted: 31 October 2025
Posted: 03 November 2025
Abstract
Keywords:
1. Introduction
1.1. The Evolution of LLM Applications and Emerging Security Landscape
1.2. OWASP LLM01:2025: Prompt Injection as the Primary Threat
1.3. Scope and Research Methodology
1.4. Structure and Contributions
2. Background and Fundamentals
2.1. Large Language Model Architecture and Inference
2.2. Prompt Engineering and System Prompts
2.3. AI Agent Systems and Tool-Augmented LLMs
2.3.1. Model Context Protocol (MCP)
2.3.2. Multi-Agent Systems
2.4. Retrieval-Augmented Generation: Enhancing LLMs with External Knowledge
2.4.1. Vector Database Vulnerabilities
2.5. Trust Boundaries and Attack Surface
3. Taxonomy of Prompt Injection Attacks
3.1. Direct Prompt Injection: Jailbreaking Techniques
3.1.1. Game-Based Manipulation: The ChatGPT Windows Keys Case
3.1.2. Role-Playing and Adversarial Optimization
3.1.3. Obfuscation Techniques
3.2. Indirect Prompt Injection: External Content Attacks
3.2.1. Web Content Poisoning
3.2.2. Document Injection
3.2.3. Email and Message Injection
3.3. Tool-Based Injection: Exploiting AI Agent Capabilities
3.3.1. Tool Poisoning in MCP
3.3.2. Hidden Unicode Instructions
3.3.3. Rug Pull Attacks
4. Vulnerabilities in AI Agent Systems
4.1. GitHub Copilot Security Failures
4.1.1. CVE-2025-53773: YOLO Mode RCE
4.1.2. CamoLeak: CVSS 9.6 Secret Exfiltration
4.1.3. AI Viruses and ZombAI Networks
4.2. Claude MCP Ecosystem Risks
4.2.1. GitHub MCP Issue Injection
4.2.2. MCP Inspector RCE: CVE-2025-49596
4.2.3. Industrial Control Systems Compromise via MCP
4.3. Cross-Platform Attack Vectors and Privilege Escalation
5. RAG System Vulnerabilities
5.1. Knowledge Base Poisoning Attacks
5.2. Vector Database Exploitation
5.3. Memory-Based Persistence and Long-Term Compromise
6. Case Studies: Real-World Exploits
6.1. Development Tools Compromise
6.2. Conversational AI Jailbreaks
6.3. Enterprise and Industrial Attacks
6.4. Supply Chain and Framework Attacks
7. Defense Mechanisms and Mitigation
7.1. Input Validation and Isolation
7.2. Architectural Defenses and Sandboxing
7.3. Prompt Engineering for Security
7.4. Detection and Monitoring
7.5. RAG-Specific Defenses
8. OWASP Framework and Industry Best Practices
8.1. OWASP Top 10 LLM 2025: Comprehensive Analysis
8.2. Industry Standards and Compliance Requirements
8.3. Secure Development Lifecycle for LLM Applications
9. Open Challenges and Fundamental Limitations
9.1. The Stochastic Nature Problem
9.2. The Alignment Paradox in Agent Systems
9.3. Detection Systems as Security Theater
9.4. The Usability-Security Trade-Off
10. Future Research Directions
10.1. Formal Verification and Provable Security
10.2. Novel Defensive Architectures
10.3. Human-AI Collaboration Models
10.4. Regulatory and Policy Research
11. Conclusions
11.1. Summary of Key Findings
11.2. Recommendations for Practitioners
- Assume prompt injection success: Design systems under the assumption that prompt injection bypasses will be discovered. Implement defense in depth: input validation, output sanitization, sandboxing, monitoring, and privilege minimization. No single layer provides complete protection; layered defenses raise attacker costs.
- Eliminate the lethal trifecta: Audit all AI agent deployments for the simultaneous presence of (1) privileged access to sensitive data or actions, (2) processing of untrusted content, and (3) an exfiltration capability. Where all three exist, aggressive mitigation is required: reduce privileges to the necessary minimum, sandbox untrusted input processing, and monitor for exfiltration patterns (a minimal audit sketch follows this list).
- Enforce human-in-the-loop for critical actions: Treat the MCP specification’s SHOULD for human approval as a MUST rather than as optional [2]. Require explicit user confirmation before tool calls that affect data modification, external communications, financial transactions, or access control changes, or that could cause harm if executed incorrectly (see the gating sketch after this list).
- Sandbox AI agent execution: Deploy agents in isolated environments (VMs, containers, cloud sandboxes) to limit the blast radius of successful attacks. If an agent is compromised, damage remains contained within the sandbox rather than spreading to the host system or the corporate network.
- Never embed secrets in system prompts: Assume all prompt content will eventually be extracted. Retrieve sensitive data dynamically through secure channels only when needed, rather than embedding it in prompts, and implement authentication and authorization that do not depend on the prompt (see the secret-handling sketch after this list).
- Implement continuous monitoring: Track agent behavior for anomaly patterns such as unusual tool call sequences, attempts to access out-of-scope resources, and output patterns matching known exfiltration techniques. Establish baselines of legitimate behavior and alert on significant deviations that require investigation (see the monitoring sketch after this list).
- Maintain RAG knowledge base integrity: For RAG deployments, implement source validation so that only trusted content enters knowledge bases, regular audits for poisoned documents, access controls limiting who can contribute content, and version control enabling rollback when contamination is detected (see the ingestion-check sketch after this list).
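A minimal sketch of the lethal-trifecta audit described above, assuming a hypothetical per-deployment AgentProfile record; the field names and example agents are illustrative and not drawn from any particular framework.

```python
# Checklist-style audit for the "lethal trifecta": an agent combining privileged
# access, untrusted input, and an exfiltration path needs aggressive mitigation.
# AgentProfile and the example agents are hypothetical.
from dataclasses import dataclass


@dataclass
class AgentProfile:
    name: str
    privileged_access: bool        # sensitive data or consequential actions
    reads_untrusted_content: bool  # web pages, issues, email, user uploads
    can_exfiltrate: bool           # outbound HTTP, email, markdown image rendering


def has_lethal_trifecta(agent: AgentProfile) -> bool:
    """True when all three risk factors are present simultaneously."""
    return (agent.privileged_access
            and agent.reads_untrusted_content
            and agent.can_exfiltrate)


if __name__ == "__main__":
    fleet = [
        AgentProfile("code-review-agent", True, True, True),
        AgentProfile("internal-faq-bot", False, True, False),
    ]
    for agent in fleet:
        if has_lethal_trifecta(agent):
            print(f"{agent.name}: lethal trifecta present, mitigation required")
```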
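To make the human-in-the-loop recommendation concrete, the following sketch places a confirmation gate in front of a hypothetical tool runtime; ToolCall, SIDE_EFFECT_TOOLS, and execute_tool are illustrative stand-ins, not part of any MCP SDK.

```python
# Confirmation gate in front of side-effecting agent tool calls.
# ToolCall, SIDE_EFFECT_TOOLS, and execute_tool are illustrative stand-ins.
from dataclasses import dataclass, field


@dataclass
class ToolCall:
    name: str
    arguments: dict = field(default_factory=dict)


# Tools whose effects are hard to undo; extend per deployment.
SIDE_EFFECT_TOOLS = {"write_file", "send_email", "transfer_funds", "change_acl"}


def execute_tool(call: ToolCall) -> str:
    """Stand-in for the real tool runtime."""
    return f"executed {call.name} with {call.arguments}"


def gated_execute(call: ToolCall) -> str:
    """Require explicit user confirmation before any side-effecting tool call."""
    if call.name in SIDE_EFFECT_TOOLS:
        print(f"Agent requests: {call.name}({call.arguments})")
        if input("Approve this action? [y/N] ").strip().lower() != "y":
            return "denied: user rejected the tool call"
    return execute_tool(call)


if __name__ == "__main__":
    print(gated_execute(ToolCall("read_file", {"path": "README.md"})))
    print(gated_execute(ToolCall("send_email", {"to": "user@example.com"})))
```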
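The next sketch illustrates keeping credentials out of model-visible context: the system prompt names a capability only, and the tool layer fetches the secret at call time. get_secret and BILLING_API_KEY are hypothetical placeholders for whatever secret store a deployment actually uses.

```python
# Keep credentials out of model-visible context: the prompt describes behavior,
# the tool layer fetches the secret on demand. get_secret is a placeholder for
# a real secret-manager client; BILLING_API_KEY is a hypothetical name.
import os


def get_secret(name: str) -> str:
    """Fetch a credential from the environment (or a secret manager) on demand."""
    value = os.environ.get(name)
    if value is None:
        raise RuntimeError(f"secret {name} is not configured")
    return value


# The prompt contains nothing extractable, even under full prompt disclosure.
SYSTEM_PROMPT = "You are a billing assistant. Use the billing tool for account data."


def billing_tool(account_id: str) -> dict:
    """Credentials enter only here, never the prompt or the model's context."""
    api_key = get_secret("BILLING_API_KEY")
    # ... perform the authenticated request with api_key ...
    return {"account": account_id, "authorized": bool(api_key)}
```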
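A toy version of the behavioral monitoring described above compares each session's tool usage against a baseline allowlist and a volume threshold; BASELINE_TOOLS and MAX_CALLS_PER_SESSION are assumptions that a real deployment would derive from observed legitimate traffic.

```python
# Flag agent sessions whose tool usage deviates from an allowlisted baseline.
# BASELINE_TOOLS and MAX_CALLS_PER_SESSION are illustrative assumptions.
from collections import Counter

BASELINE_TOOLS = {"read_file", "search_docs", "run_tests"}
MAX_CALLS_PER_SESSION = 20


def audit_session(tool_calls: list[str]) -> list[str]:
    """Return alerts for out-of-baseline tools or unusually chatty sessions."""
    alerts = []
    counts = Counter(tool_calls)
    for tool in counts:
        if tool not in BASELINE_TOOLS:
            alerts.append(f"out-of-scope tool used: {tool}")
    total = sum(counts.values())
    if total > MAX_CALLS_PER_SESSION:
        alerts.append(f"excessive tool calls in one session: {total}")
    return alerts


if __name__ == "__main__":
    print(audit_session(["read_file", "search_docs", "post_to_external_url"]))
```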
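Finally, a sketch of pre-ingestion validation for a RAG knowledge base: a source allowlist plus a scan for invisible Unicode characters and instruction-like phrasing. The trusted domains and regex patterns are illustrative; such pattern matching raises attacker cost but does not guarantee detection.

```python
# Pre-ingestion checks for a RAG knowledge base: accept content only from
# allowlisted sources and reject documents carrying invisible characters or
# instruction-like phrasing. Domains and patterns are illustrative only.
import re
import unicodedata

TRUSTED_SOURCES = {"docs.internal.example", "wiki.internal.example"}
INJECTION_PATTERNS = [
    r"ignore (all )?previous instructions",
    r"you are now",
    r"system prompt",
]


def is_safe_for_ingestion(text: str, source: str) -> bool:
    """Return True only for allowlisted sources with no obvious injection markers."""
    if source not in TRUSTED_SOURCES:
        return False
    # Invisible format characters (Unicode category Cf) are a common way to hide
    # instructions from human reviewers while keeping them readable to the model.
    if any(unicodedata.category(ch) == "Cf" for ch in text):
        return False
    lowered = text.lower()
    return not any(re.search(pattern, lowered) for pattern in INJECTION_PATTERNS)


if __name__ == "__main__":
    print(is_safe_for_ingestion("Q3 revenue summary.", "docs.internal.example"))
    print(is_safe_for_ingestion("Ignore previous instructions.", "docs.internal.example"))
```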
11.3. Call to Action for Research Community
References
- GitHub Copilot RCE Vulnerability Allows Hackers To Execute Malicious Code. https://cybersecuritynews.com/github-copilot-rce-vulnerability/.
- Prompt injection and jailbreaking are not the same thing. https://simonwillison.net/2025/Apr/9/mcp-prompt-injection/.
- OWASP Top 10 for LLM Applications - LLM01: Prompt Injection. https://genai.owasp.org/llmrisk/llm01-prompt-injection/.
- OWASP Top 10 2025 for LLM Applications: Risks and Mitigation Techniques. https://www.confident-ai.com/blog/owasp-top-10-2025-for-llm-applications-risks-and-mitigation-techniques.
- GitHub Copilot: Remote Code Execution via Prompt Injection. https://embracethered.com/blog/posts/2025/github-copilot-remote-code-execution-via-prompt-injection/.
- CamoLeak: Critical GitHub Copilot Vulnerability Leaks Private Source Code. https://www.legitsecurity.com/blog/camoleak-critical-github-copilot-vulnerability-leaks-private-source-code.
- AI Under the Microscope: What’s Changed in the OWASP Top 10 for LLMs 2025. https://blog.qualys.com/vulnerabilities-threat-research/2024/11/25/ai-under-the-microscope-whats-changed-in-the-owasp-top-10-for-llms-2025.
- GitHub MCP Vulnerability Has Far-Reaching Consequences. https://cybernews.com/security/github-mcp-vulnerability-has-far-reaching-consequences/.
- The Security Risks of Model Context Protocol (MCP). https://www.pillar.security/blog/the-security-risks-of-model-context-protocol-mcp.
- Experts Uncover Critical MCP and A2A Protocol Flaws in AI Agent Ecosystems. https://thehackernews.com/2025/04/experts-uncover-critical-mcp-and-a2a.html.
- RAG Poisoning: How Attackers Can Manipulate AI Systems. https://www.promptfoo.dev/blog/rag-poisoning/.
- Prompt Injection in LLMs: A Complete Guide. https://www.evidentlyai.com/llm-guide/prompt-injection-llm.
- OWASP Top 10 for Large Language Model Applications v2025. https://owasp.org/www-project-top-10-for-large-language-model-applications/assets/PDF/OWASP-Top-10-for-LLMs-v2025.pdf.
- This Clever Jailbreak Tricked ChatGPT Into Revealing Windows Activation Keys. https://futurism.com/clever-jailbreak-chatgpt-windows-activation-keys.
- ChatGPT Jailbreak Tricks AI into Leaking Windows Product Keys. https://www.theregister.com/2025/07/09/chatgpt_jailbreak_windows_keys/.
- Here’s How ChatGPT Was Tricked Into Revealing Windows Product Keys. https://www.techspot.com/news/108637-here-how-chatgpt-tricked-revealing-windows-product-keys.html.
- ChatGPT Leaks Windows Keys Including Wells Fargo License via Clever Game Prompt. https://meterpreter.org/chatgpt-leaks-windows-keys-including-wells-fargo-license-via-clever-game-prompt/.
- Shi, J., Yuan, Z., Liu, Y., Huang, Y., Zhou, P., Sun, L., & Gong, N. Z. (2024). Optimization-based prompt injection attack to LLM-as-a-judge. arXiv. https://arxiv.org/abs/2403.17710. [CrossRef]
- Model Context Protocol: Security Risks and Exploits. https://embracethered.com/blog/posts/2025/model-context-protocol-security-risks-and-exploits/.
- Prompt Injection in Operational Technology: SCADA Attack Demonstration. https://veganmosfet.github.io/2025/07/14/prompt_injection_OT.html.
- Prompt Injection in LLM Fine-Tuning and Applications. https://labelyourdata.com/articles/llm-fine-tuning/prompt-injection.
- GitHub Copilot Prompt Injection Flaw Leaked Sensitive Data from Private Repos. https://www.csoonline.com/article/4069887/github-copilot-prompt-injection-flaw-leaked-sensitive-data-from-private-repos.html.
- Announcing the Adaptive Prompt Injection Challenge: LLMail-Inject. https://msrc.microsoft.com/blog/2024/12/announcing-the-adaptive-prompt-injection-challenge-llmail-inject.
- Vulnerable MCP: Security Vulnerabilities in Model Context Protocol. https://vulnerablemcp.info/.
- GitHub Copilot RCE Vulnerability Lets Attackers Execute Malicious Code. https://gbhackers.com/github-copilot-rce-vulnerability/.
- GitHub Copilot Vulnerability Exposes User Data and Private Repositories. https://cybersecuritynews.com/github-copilot-vulnerability/.
- GitHub Copilot Vulnerability Patched After CamoLeak Disclosure. https://cybersecuritynews.com/github-copilot-vulnerability/.
- PoisonedRAG: Knowledge Poisoning Attacks to Retrieval-Augmented Generation. https://github.com/sleeepeer/PoisonedRAG.
- Clop, C., & Teglia, Y. (2024). Backdoored retrievers for prompt injection attacks on retrieval augmented generation of large language models. arXiv. https://arxiv.org/abs/2410.14479. [CrossRef]
- GitHub Copilot: Remote Code Execution via Prompt Injection (CVE-2025-53773). https://vivekfordevsecopsciso.medium.com/github-copilot-remote-code-execution-via-prompt-injection-cve-2025-53773-38b4792e70fb.
- GitHub Copilot Prompt Injection: CamoLeak Vulnerability Analysis. https://sqmagazine.co.uk/github-copilot-prompt-injection-camoleak/.
- Hackers Bypass OpenAI Guardrails Framework Using Simple Techniques. https://gbhackers.com/hackers-bypass-openai-guardrails-framework/.
- Claude Code Security Documentation. https://docs.claude.com/en/docs/claude-code/security.
- Attention Tracker: Detecting Prompt Injection via Attention Analysis. https://aclanthology.org/2025.findings-naacl.123.pdf.
- Tan, X., Luan, H., Luo, M., Sun, X., Chen, P., & Dai, J. (2024). RevPRAG: Revealing poisoning attacks in retrieval-augmented generation through LLM activation analysis. arXiv. https://arxiv.org/abs/2411.18948. [CrossRef]
- Zhang, B., Chen, Y., Fang, M., Liu, Z., Nie, L., Li, T., & Liu, Z. (2025). Practical poisoning attacks against retrieval-augmented generation. arXiv. https://arxiv.org/abs/2504.03957. [CrossRef]
- Gulyamov, S., & Jurayev, S. (2023). Cybersecurity threats and data breaches: Legal implication in cyberspace contracts. Young Scientists, 1(15), 19–22. https://in-academy.uz/index.php/yo/article/view/21738.
- Gulyamov, S. S., & Rodionov, A. A. (2024). Cyber hygiene as an effective psychological measure in the prevention of cyber addictions. Psychology and Law, 14(2), 77–91. [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).