Skip to main content
AgenticAssure

Red teaming

34 attack techniques. One catalogue.

Every technique mapped to OWASP LLM Top 10 and MITRE ATLAS. Run them in AgenticAssure Test & Prove with refusal-aware judges and blockchain-anchored results.

Technique Type Description OWASP
Prompt Injection single-turn Override system prompt with injected instructions to hijack model behaviour.
LLM01
AML.T0051
Roleplay (DAN / Evil-AI) single-turn DAN, Evil-AI, Researcher, Narrator persona attacks to bypass safety alignment.
LLM01 LLM07
AML.T0051.001
Leetspeak single-turn Encode payload in leetspeak to evade keyword-based safety filters.
LLM01
AML.T0051.002
ROT13 single-turn ROT13-encode with decode instruction to bypass pattern matching.
LLM01
AML.T0051.002
Base64 single-turn Base64-encode payload with decode directive to evade content filters.
LLM01
AML.T0051.002
Gray Box single-turn Exploit partial knowledge of model internals (RLHF, reward hacking).
LLM01 LLM05
AML.T0043
Math Problem single-turn Embed harmful request in mathematical word problem to bypass content filters.
LLM01
AML.T0051.001
Multilingual single-turn Translate across languages to bypass English-centric safety filters.
LLM01
AML.T0051.002
Prompt Probing single-turn Extract system prompt, configuration, and hidden instructions.
LLM01 LLM06
AML.T0044
Adversarial Poetry single-turn Wrap harmful content in creative writing format to slip past safety classifiers.
LLM01
AML.T0051.001
System Override single-turn Claim system maintenance mode or developer override to disable safety.
LLM01 LLM05
AML.T0051.001
Permission Escalation single-turn Assert admin or root credentials to elevate privileges.
LLM01 LLM05
AML.T0051.001
Goal Redirection single-turn Redefine model objective from assistant to unrestricted retrieval system.
LLM01
AML.T0051
Linguistic Confusion single-turn Ambiguous quoting and syntactic tricks to confuse safety classifiers.
LLM01
AML.T0051.002
Input Bypass single-turn Inject special tokens (<|im_start|>) to bypass input sanitisation.
LLM01
AML.T0051.002
Context Poisoning single-turn Inject fake authorisation context to establish false trust.
LLM01 LLM05
AML.T0051.001
Character Stream single-turn Deliver payload one character at a time to evade sequence-level filters.
LLM01
AML.T0051.002
Context Flooding single-turn Fill context window to push safety instructions out of scope.
LLM01 LLM10
AML.T0051.001
Embedded Instruction JSON single-turn Hide instructions in JSON payload with override flags.
LLM01
AML.T0051.001
Synthetic Context Injection single-turn Inject fake conversation history to establish unearned trust.
LLM01 LLM05
AML.T0051.001
Authority Escalation single-turn Impersonate AI Safety Board or compliance authority for testing override.
LLM01 LLM05
AML.T0051.001
Emotional Manipulation single-turn Use emotional appeals to override safety constraints.
LLM01
AML.T0051.001
Indirect Prompt Injection single-turn Inject instructions via tool output, retrieved document, or file.
LLM01 LLM02
AML.T0051.003
Tool Metadata Poisoning single-turn Poison tool descriptions to misdirect agent invocation paths.
LLM01 LLM08
AML.T0051.003
Cross-Context Retrieval single-turn Coax retrieval across tenant or session boundaries.
LLM06 LLM08
AML.T0044
Embedding Adversarial Input single-turn Crafted unicode, homoglyphs, and zero-width characters to evade filters.
LLM01
AML.T0051.002
RAG Poisoning single-turn Poison vector store or knowledge base referenced by RAG pipeline.
LLM01 LLM09
AML.T0049
Excessive Agency Probe single-turn Push the agent to invoke tools beyond its mandate or delegation scope.
LLM08
AML.T0051.003
Identity Spoofing single-turn Impersonate another agent or system component within the AI estate.
LLM05 LLM08
AML.T0051.001
Linear Jailbreaking multi-turn Single refinement path. Configurable iterations (default 5). The attacker LLM crafts a prompt, the target model responds, an independent judge LLM scores whether safety was bypassed (1-10), then the attacker refines using judge feedback. Fast, deterministic, easy to audit.
LLM01
AML.T0051
Tree Jailbreaking (TAP) multi-turn Branches into N paths at each depth, selects top performers. Tree of Attacks with Pruning (TAP) explores multiple refinement branches in parallel. Best for hardened safety classifiers where a single path is unlikely to succeed.
LLM01
AML.T0051
Crescendo multi-turn Generates escalating prompts from benign to boundary-testing across turns. Uncovers safety drift and reveals how models handle gradual escalation - the conversational equivalent of boiling a frog.
LLM01
AML.T0051
Sequential Jailbreak multi-turn Chain small instruction overrides (mode 1 → mode 2 → mode 3). Each turn ratchets the model further from its safety baseline, establishing cumulative context that eventually bypasses alignment.
LLM01 LLM05
AML.T0051
Bad Likert Judge multi-turn Use Likert-scale rating prompts to extract harmful content. The model is asked to rate content on a safety scale, then asked to provide examples of each rating level - eliciting harmful content as a 'demonstration' of low-safety outputs.
LLM01 LLM07
AML.T0051.001

Click any row in the platform to run the full suite, or read the red teaming guide.