8 ways hackers are targeting your LLMs
Prompt injection, jailbreaking, model stealing, and five more LLM attack techniques every CISO needs to understand.
Most security teams understand web application attacks, network intrusions, and social engineering. But LLM-specific attack techniques are fundamentally different. They exploit the statistical nature of language models rather than traditional software vulnerabilities. The attack surface is the model’s own reasoning.
After running hundreds of automated red-teaming assessments at AgenticAssure, here are the eight attack categories every CISO should understand.
1. Prompt injection
The most prevalent LLM vulnerability. An attacker adds specific instructions into a prompt to hijack the model’s output. This is the LLM equivalent of SQL injection, but with no reliable parameterization defense. The model’s input IS the control plane.
2. Prompt leaking
A specialized form of prompt injection where the goal is to force the model to reveal its system prompt. Once leaked, an attacker knows exactly what guardrails to bypass.
3. Data training poisoning
Attackers inject malicious or biased data into the training dataset to influence the model’s future behavior. The damage is embedded during training and surfaces later in production.
4. Jailbreaking
The most publicized attack category. Jailbreaking uses prompt injection specifically to bypass safety and moderation features. Common patterns exploit role-play scenarios, hypothetical framing, and encoding tricks.
5. Model inversion
An attacker queries the model with carefully crafted inputs to reconstruct sensitive information from the training data.
6. Data extraction
Targeted variant of model inversion focused on extracting specific sensitive records like API keys, email addresses, or credentials.
7. Model stealing
The attacker records a large number of input-output interactions then trains a clone, enabling intellectual property theft and license violation.
8. Membership inference
The attacker determines whether a specific data point was part of the model’s training data, with serious privacy and regulatory implications.
What this means for enterprise AI security
Traditional security tools do not catch these. You cannot run Burp Suite against a jailbreak attempt. You need purpose-built AI red teaming that understands how language models process inputs and where their reasoning can be exploited.