Red teaming

34 attack techniques. One catalogue.

Every technique mapped to OWASP LLM Top 10 and MITRE ATLAS. Run them in AgenticAssure Test & Prove with refusal-aware judges and blockchain-anchored results.

Book a demo LLM red teaming guide

Technique	Type	Description	OWASP
Prompt Injection	single-turn	Override system prompt with injected instructions to hijack model behaviour.	LLM01	AML.T0051
Roleplay (DAN / Evil-AI)	single-turn	DAN, Evil-AI, Researcher, Narrator persona attacks to bypass safety alignment.	LLM01 LLM07	AML.T0051.001
Leetspeak	single-turn	Encode payload in leetspeak to evade keyword-based safety filters.	LLM01	AML.T0051.002
ROT13	single-turn	ROT13-encode with decode instruction to bypass pattern matching.	LLM01	AML.T0051.002
Base64	single-turn	Base64-encode payload with decode directive to evade content filters.	LLM01	AML.T0051.002
Gray Box	single-turn	Exploit partial knowledge of model internals (RLHF, reward hacking).	LLM01 LLM05	AML.T0043
Math Problem	single-turn	Embed harmful request in mathematical word problem to bypass content filters.	LLM01	AML.T0051.001
Multilingual	single-turn	Translate across languages to bypass English-centric safety filters.	LLM01	AML.T0051.002
Prompt Probing	single-turn	Extract system prompt, configuration, and hidden instructions.	LLM01 LLM06	AML.T0044
Adversarial Poetry	single-turn	Wrap harmful content in creative writing format to slip past safety classifiers.	LLM01	AML.T0051.001
System Override	single-turn	Claim system maintenance mode or developer override to disable safety.	LLM01 LLM05	AML.T0051.001
Permission Escalation	single-turn	Assert admin or root credentials to elevate privileges.	LLM01 LLM05	AML.T0051.001
Goal Redirection	single-turn	Redefine model objective from assistant to unrestricted retrieval system.	LLM01	AML.T0051
Linguistic Confusion	single-turn	Ambiguous quoting and syntactic tricks to confuse safety classifiers.	LLM01	AML.T0051.002
Input Bypass	single-turn	Inject special tokens (<\|im_start\|>) to bypass input sanitisation.	LLM01	AML.T0051.002
Context Poisoning	single-turn	Inject fake authorisation context to establish false trust.	LLM01 LLM05	AML.T0051.001
Character Stream	single-turn	Deliver payload one character at a time to evade sequence-level filters.	LLM01	AML.T0051.002
Context Flooding	single-turn	Fill context window to push safety instructions out of scope.	LLM01 LLM10	AML.T0051.001
Embedded Instruction JSON	single-turn	Hide instructions in JSON payload with override flags.	LLM01	AML.T0051.001
Synthetic Context Injection	single-turn	Inject fake conversation history to establish unearned trust.	LLM01 LLM05	AML.T0051.001
Authority Escalation	single-turn	Impersonate AI Safety Board or compliance authority for testing override.	LLM01 LLM05	AML.T0051.001
Emotional Manipulation	single-turn	Use emotional appeals to override safety constraints.	LLM01	AML.T0051.001
Indirect Prompt Injection	single-turn	Inject instructions via tool output, retrieved document, or file.	LLM01 LLM02	AML.T0051.003
Tool Metadata Poisoning	single-turn	Poison tool descriptions to misdirect agent invocation paths.	LLM01 LLM08	AML.T0051.003
Cross-Context Retrieval	single-turn	Coax retrieval across tenant or session boundaries.	LLM06 LLM08	AML.T0044
Embedding Adversarial Input	single-turn	Crafted unicode, homoglyphs, and zero-width characters to evade filters.	LLM01	AML.T0051.002
RAG Poisoning	single-turn	Poison vector store or knowledge base referenced by RAG pipeline.	LLM01 LLM09	AML.T0049
Excessive Agency Probe	single-turn	Push the agent to invoke tools beyond its mandate or delegation scope.	LLM08	AML.T0051.003
Identity Spoofing	single-turn	Impersonate another agent or system component within the AI estate.	LLM05 LLM08	AML.T0051.001
Linear Jailbreaking	multi-turn	Single refinement path. Configurable iterations (default 5). The attacker LLM crafts a prompt, the target model responds, an independent judge LLM scores whether safety was bypassed (1-10), then the attacker refines using judge feedback. Fast, deterministic, easy to audit.	LLM01	AML.T0051
Tree Jailbreaking (TAP)	multi-turn	Branches into N paths at each depth, selects top performers. Tree of Attacks with Pruning (TAP) explores multiple refinement branches in parallel. Best for hardened safety classifiers where a single path is unlikely to succeed.	LLM01	AML.T0051
Crescendo	multi-turn	Generates escalating prompts from benign to boundary-testing across turns. Uncovers safety drift and reveals how models handle gradual escalation - the conversational equivalent of boiling a frog.	LLM01	AML.T0051
Sequential Jailbreak	multi-turn	Chain small instruction overrides (mode 1 → mode 2 → mode 3). Each turn ratchets the model further from its safety baseline, establishing cumulative context that eventually bypasses alignment.	LLM01 LLM05	AML.T0051
Bad Likert Judge	multi-turn	Use Likert-scale rating prompts to extract harmful content. The model is asked to rate content on a safety scale, then asked to provide examples of each rating level - eliciting harmful content as a 'demonstration' of low-safety outputs.	LLM01 LLM07	AML.T0051.001

Click any row in the platform to run the full suite, or read the red teaming guide.