Preprint
Review

This version is not peer-reviewed.

Understanding Large Language Model Attacks: A Beginner-Friendly Introduction

Submitted: 01 April 2026

Posted: 02 April 2026


Abstract
Large language models (LLMs) are now used in chatbots, search engines, writing assistants, coding tools, educational systems, and AI agents. At the same time, they are vulnerable to a wide range of attacks. Some attacks attempt to make the model ignore its rules and produce harmful or manipulated outputs, while others aim to extract private or sensitive information from the model or its training data. This paper presents a concept-level survey of major LLM attack methods in language that is simple enough for a broad readership while remaining structured like a research paper. We organize the literature into two high-level groups: security attacks and privacy attacks. Under security attacks, we discuss prompt injection, jailbreaking, backdoor attacks, and data poisoning attacks. Under privacy attacks, we discuss gradient leakage, membership inference, and personally identifiable information (PII) leakage. For each family, we explain the core idea, summarize representative methods from the literature, and provide descriptive toy examples that help readers understand the mechanism without requiring advanced background knowledge. The goal of this paper is pedagogical: to help new researchers, students, and general readers build a clear mental model of the LLM attack landscape.
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.