Skip to main content
AgenticAssure
Back to blog
LLM SecurityRed TeamingAttack Vectors

8 ways hackers are targeting your LLMs

Prompt injection, jailbreaking, model stealing, and five more LLM attack techniques every CISO needs to understand.

Manish Chawda Founder & CEO, AgenticAssure 7 min read

Most security teams understand web application attacks, network intrusions, and social engineering. But LLM-specific attack techniques are fundamentally different. They exploit the statistical nature of language models rather than traditional software vulnerabilities. The attack surface is the model’s own reasoning.

After running hundreds of automated red-teaming assessments at AgenticAssure, here are the eight attack categories every CISO should understand.

1. Prompt injection

The most prevalent LLM vulnerability. An attacker adds specific instructions into a prompt to hijack the model’s output. This is the LLM equivalent of SQL injection, but with no reliable parameterization defense. The model’s input IS the control plane.

2. Prompt leaking

A specialized form of prompt injection where the goal is to force the model to reveal its system prompt. Once leaked, an attacker knows exactly what guardrails to bypass.

3. Data training poisoning

Attackers inject malicious or biased data into the training dataset to influence the model’s future behavior. The damage is embedded during training and surfaces later in production.

4. Jailbreaking

The most publicized attack category. Jailbreaking uses prompt injection specifically to bypass safety and moderation features. Common patterns exploit role-play scenarios, hypothetical framing, and encoding tricks.

5. Model inversion

An attacker queries the model with carefully crafted inputs to reconstruct sensitive information from the training data.

6. Data extraction

Targeted variant of model inversion focused on extracting specific sensitive records like API keys, email addresses, or credentials.

7. Model stealing

The attacker records a large number of input-output interactions then trains a clone, enabling intellectual property theft and license violation.

8. Membership inference

The attacker determines whether a specific data point was part of the model’s training data, with serious privacy and regulatory implications.

What this means for enterprise AI security

Traditional security tools do not catch these. You cannot run Burp Suite against a jailbreak attempt. You need purpose-built AI red teaming that understands how language models process inputs and where their reasoning can be exploited.

AgenticAssure · Trust Layer for Enterprise AI

Trust layer for enterprise AI

Your competitors are getting audited.
Are you ready?

Book a demo