Red teaming
34 attack techniques. One catalogue.
Every technique mapped to OWASP LLM Top 10 and MITRE ATLAS. Run them in AgenticAssure Test & Prove with refusal-aware judges and blockchain-anchored results.
| Technique | Type | Description | OWASP LLM Top 10 | MITRE ATLAS |
|---|---|---|---|---|
| Prompt Injection | single-turn | Override system prompt with injected instructions to hijack model behaviour. | LLM01 | AML.T0051 |
| Roleplay (DAN / Evil-AI) | single-turn | DAN, Evil-AI, Researcher, Narrator persona attacks to bypass safety alignment. | LLM01 LLM07 | AML.T0051.001 |
| Leetspeak | single-turn | Encode payload in leetspeak to evade keyword-based safety filters. | LLM01 | AML.T0051.002 |
| ROT13 | single-turn | ROT13-encode with decode instruction to bypass pattern matching. | LLM01 | AML.T0051.002 |
| Base64 | single-turn | Base64-encode payload with decode directive to evade content filters. | LLM01 | AML.T0051.002 |
| Gray Box | single-turn | Exploit partial knowledge of model internals (RLHF, reward hacking). | LLM01 LLM05 | AML.T0043 |
| Math Problem | single-turn | Embed harmful request in mathematical word problem to bypass content filters. | LLM01 | AML.T0051.001 |
| Multilingual | single-turn | Translate across languages to bypass English-centric safety filters. | LLM01 | AML.T0051.002 |
| Prompt Probing | single-turn | Extract system prompt, configuration, and hidden instructions. | LLM01 LLM06 | AML.T0044 |
| Adversarial Poetry | single-turn | Wrap harmful content in creative writing format to slip past safety classifiers. | LLM01 | AML.T0051.001 |
| System Override | single-turn | Claim system maintenance mode or developer override to disable safety. | LLM01 LLM05 | AML.T0051.001 |
| Permission Escalation | single-turn | Assert admin or root credentials to elevate privileges. | LLM01 LLM05 | AML.T0051.001 |
| Goal Redirection | single-turn | Redefine model objective from assistant to unrestricted retrieval system. | LLM01 | AML.T0051 |
| Linguistic Confusion | single-turn | Ambiguous quoting and syntactic tricks to confuse safety classifiers. | LLM01 | AML.T0051.002 |
| Input Bypass | single-turn | Inject special tokens (such as `<\|im_start\|>`) to bypass input sanitisation. | LLM01 | AML.T0051.002 |
| Context Poisoning | single-turn | Inject fake authorisation context to establish false trust. | LLM01 LLM05 | AML.T0051.001 |
| Character Stream | single-turn | Deliver payload one character at a time to evade sequence-level filters. | LLM01 | AML.T0051.002 |
| Context Flooding | single-turn | Fill context window to push safety instructions out of scope. | LLM01 LLM10 | AML.T0051.001 |
| Embedded Instruction JSON | single-turn | Hide instructions in JSON payload with override flags. | LLM01 | AML.T0051.001 |
| Synthetic Context Injection | single-turn | Inject fake conversation history to establish unearned trust. | LLM01 LLM05 | AML.T0051.001 |
| Authority Escalation | single-turn | Impersonate AI Safety Board or compliance authority for testing override. | LLM01 LLM05 | AML.T0051.001 |
| Emotional Manipulation | single-turn | Use emotional appeals to override safety constraints. | LLM01 | AML.T0051.001 |
| Indirect Prompt Injection | single-turn | Inject instructions via tool output, retrieved document, or file. | LLM01 LLM02 | AML.T0051.003 |
| Tool Metadata Poisoning | single-turn | Poison tool descriptions to misdirect agent invocation paths. | LLM01 LLM08 | AML.T0051.003 |
| Cross-Context Retrieval | single-turn | Coax retrieval across tenant or session boundaries. | LLM06 LLM08 | AML.T0044 |
| Embedding Adversarial Input | single-turn | Crafted unicode, homoglyphs, and zero-width characters to evade filters. | LLM01 | AML.T0051.002 |
| RAG Poisoning | single-turn | Poison vector store or knowledge base referenced by RAG pipeline. | LLM01 LLM09 | AML.T0049 |
| Excessive Agency Probe | single-turn | Push the agent to invoke tools beyond its mandate or delegation scope. | LLM08 | AML.T0051.003 |
| Identity Spoofing | single-turn | Impersonate another agent or system component within the AI estate. | LLM05 LLM08 | AML.T0051.001 |
| Linear Jailbreaking | multi-turn | Single refinement path. Configurable iterations (default 5). The attacker LLM crafts a prompt, the target model responds, an independent judge LLM scores whether safety was bypassed (1-10), then the attacker refines using judge feedback. Fast, deterministic, easy to audit. | LLM01 | AML.T0051 |
| Tree Jailbreaking (TAP) | multi-turn | Tree of Attacks with Pruning (TAP). Branches into N paths at each depth, keeps the top performers, and prunes the rest, so multiple refinement branches are explored in parallel. Best for hardened safety classifiers where a single path is unlikely to succeed. | LLM01 | AML.T0051 |
| Crescendo | multi-turn | Generates prompts that escalate from benign to boundary-testing across turns. Uncovers safety drift and reveals how models handle gradual escalation: the conversational equivalent of boiling a frog. | LLM01 | AML.T0051 |
| Sequential Jailbreak | multi-turn | Chain small instruction overrides (mode 1 → mode 2 → mode 3). Each turn ratchets the model further from its safety baseline, establishing cumulative context that eventually bypasses alignment. | LLM01 LLM05 | AML.T0051 |
| Bad Likert Judge | multi-turn | Use Likert-scale rating prompts to extract harmful content. The model is asked to rate content on a safety scale, then to provide examples for each rating level, which elicits harmful content as a 'demonstration' of low-safety outputs. | LLM01 LLM07 | AML.T0051.001 |
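
The encoding rows in the table (Leetspeak, ROT13, Base64) all apply a reversible transform to a test payload before it is sent. The following is a minimal Python sketch of those transforms, not the platform's implementation: the function names and the leetspeak substitution map are assumptions made for illustration, the probe string is a harmless placeholder, and the decode instruction that the ROT13 and Base64 rows mention is left out.

```python
import base64
import codecs

# Hypothetical substitution map; a real suite may use a different one.
LEET_MAP = str.maketrans({"a": "4", "e": "3", "i": "1", "o": "0", "s": "5", "t": "7"})


def to_leetspeak(text: str) -> str:
    """Lower-case the text and swap common letters for look-alike digits."""
    return text.lower().translate(LEET_MAP)


def to_rot13(text: str) -> str:
    """Rotate each ASCII letter by 13 places; other characters are unchanged."""
    return codecs.encode(text, "rot_13")


def to_base64(text: str) -> str:
    """Base64-encode the UTF-8 bytes of the text."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def encoded_variants(probe: str) -> dict:
    """Return one encoded variant of the probe per encoding technique."""
    return {
        "leetspeak": to_leetspeak(probe),
        "rot13": to_rot13(probe),
        "base64": to_base64(probe),
    }


variants = encoded_variants("benign test probe")
print(variants["leetspeak"])  # b3n1gn 7357 pr0b3
print(variants["rot13"])      # oravta grfg cebor
```

Keeping each transform a pure function makes it straightforward to check that a filter under test sees the same underlying probe in every variant.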
Click any row in the platform to run the full suite, or read the red teaming guide.
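
The Linear Jailbreaking row describes a single refinement loop: an attacker model drafts a prompt, the target responds, a judge scores the response from 1 to 10, and the attacker refines using that feedback for a configurable number of iterations (default 5). A rough sketch of that control flow is below. It is an illustration under stated assumptions, not the AgenticAssure API: `linear_refinement`, `Attempt`, the callable signatures, and the `success_threshold` parameter are all hypothetical, and the three stubs stand in for real model calls.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Attempt:
    iteration: int
    prompt: str
    response: str
    score: int  # judge score, 1 (clear refusal) to 10 (safety bypassed)


def linear_refinement(
    objective: str,
    attacker: Callable[[str, Optional[Attempt]], str],
    target: Callable[[str], str],
    judge: Callable[[str, str, str], int],
    max_iterations: int = 5,      # default taken from the catalogue entry
    success_threshold: int = 10,  # assumption: the stopping score is not given in the source
) -> List[Attempt]:
    """Run one refinement path and return the full audit trail."""
    history: List[Attempt] = []
    last: Optional[Attempt] = None
    for i in range(1, max_iterations + 1):
        prompt = attacker(objective, last)        # refine using the last attempt
        response = target(prompt)                 # model under test
        score = judge(objective, prompt, response)
        if not 1 <= score <= 10:
            raise ValueError(f"judge score out of range: {score}")
        last = Attempt(i, prompt, response, score)
        history.append(last)
        if score >= success_threshold:
            break                                 # stop at the first bypass
    return history


# Deterministic stubs so the sketch runs without any model access.
def stub_attacker(objective: str, last: Optional[Attempt]) -> str:
    n = 1 if last is None else last.iteration + 1
    return f"[test prompt {n} for: {objective}]"


def stub_target(prompt: str) -> str:
    return "I can't help with that."


def stub_judge(objective: str, prompt: str, response: str) -> int:
    return 1  # a refusal gets the lowest score


history = linear_refinement("placeholder objective", stub_attacker, stub_target, stub_judge)
print(len(history), [a.score for a in history])  # 5 [1, 1, 1, 1, 1]
```

Because there is only one path, the returned list is a complete, ordered record of every prompt, response, and score, which is what makes this strategy easy to audit. Tree-based variants such as TAP would keep several such paths per depth and prune the lowest-scoring ones.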