Preprint
Review

This version is not peer-reviewed.

Understanding Large Language Model Attacks: A Beginner-Friendly Introduction

Submitted: 01 April 2026

Posted: 02 April 2026


Abstract
Large language models (LLMs) are now used in chatbots, search engines, writing assistants, coding tools, educational systems, and AI agents. At the same time, they are vulnerable to a wide range of attacks. Some attacks attempt to make the model ignore its rules and produce harmful or manipulated outputs, while others aim to extract private or sensitive information from the model or its training data. This paper presents a concept-level survey of major LLM attack methods in language that is simple enough for a broad readership while remaining structured like a research paper. We organize the literature into two high-level groups: security attacks and privacy attacks. Under security attacks, we discuss prompt injection, jailbreaking, backdoor attacks, and data poisoning attacks. Under privacy attacks, we discuss gradient leakage, membership inference, and personally identifiable information (PII) leakage. For each family, we explain the core idea, summarize representative methods from the literature, and provide descriptive toy examples that help readers understand the mechanism without requiring advanced background knowledge. The goal of this paper is pedagogical: to help new researchers, students, and general readers build a clear mental model of the LLM attack landscape.
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.