Gray Swan Arena (Agent Red-Teaming / Indirect Prompt Injection)
A large-scale public red-teaming competition, hosted on the Gray Swan Arena platform, that measures how often adversarial attackers can break frontier AI agents through jailbreaks and indirect prompt injection in tool-use, coding, and computer-use settings. Results are reported as attack success rate (ASR), where lower is better.
Agents · Attack Success Rate (ASR) · Lower is better
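As a rough illustration of the metric, the sketch below assumes ASR is simply the share of attack attempts that an evaluator judged successful, expressed as a percentage. This page does not say how Gray Swan Arena counts attempts (per submission, per behavior, or per attacker), so the `AttackAttempt` record, its `succeeded` field, and the `attack_success_rate` helper are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AttackAttempt:
    """Hypothetical record of one red-teaming attempt against a model."""
    model: str
    succeeded: bool  # assumed: True if a judge marked the attack as successful


def attack_success_rate(attempts: list[AttackAttempt], model: str) -> float:
    """Return ASR in percent: successful attempts / total attempts * 100.

    Assumes every attempt counts equally; the real benchmark may weight
    or deduplicate attempts differently.
    """
    relevant = [a for a in attempts if a.model == model]
    if not relevant:
        raise ValueError(f"no attempts recorded for {model!r}")
    successes = sum(a.succeeded for a in relevant)  # True counts as 1
    return 100.0 * successes / len(relevant)


if __name__ == "__main__":
    # Toy data: 1 success out of 40 attempts.
    toy = [AttackAttempt("model-a", i == 0) for i in range(40)]
    print(f"{attack_success_rate(toy, 'model-a'):.2f}%")  # prints "2.50%"
```

Under this assumption, a model at 2.5% ASR would see roughly one successful attack for every 40 attempts.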
| Model | Lab | ASR (%) ↓ | Source model | Status | Date |
|---|---|---|---|---|---|
| Gemini 2.5 Pro | Google DeepMind | 8.5 | — | Verified | Mar 25, 2025 |
| Llama 3.1 405B | Meta | 5.89 | — | Verified | Jul 23, 2024 |
| DeepSeek V3.1 | DeepSeek | 5.4 | — | Verified | Aug 21, 2025 |
| Kimi K2 Thinking | Moonshot AI | 4.8 | — | Verified | Nov 6, 2025 |
| Grok 4 | xAI | 2.9 | — | Verified | Jul 9, 2025 |
| GPT-5.1 | OpenAI | 2.5 | — | Verified | Nov 12, 2025 |
| o3 | OpenAI | 2.50 | — | Verified | Apr 16, 2025 |
| GPT-4o | OpenAI | 2.41 | — | Verified | May 13, 2024 |
| GPT-5 | OpenAI | 2.0 | — | Verified | Aug 7, 2025 |
| Claude 3.5 Sonnet | Anthropic | 1.85 | — | Verified | Jun 20, 2024 |
| Claude 3.7 Sonnet | Anthropic | 1.61 | — | Verified | Feb 24, 2025 |
| Claude Haiku 4.5 | Anthropic | 1.3 | — | Verified | Oct 15, 2025 |
| Claude Sonnet 4.5 | Anthropic | 1.0 | — | Verified | Sep 29, 2025 |
| Claude Opus 4.5 | Anthropic | 0.5 | — | Verified | Nov 24, 2025 |
Each row reports the model's attack success rate (ASR) on Gray Swan Arena (Agent Red-Teaming / Indirect Prompt Injection).
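Because lower is better, ranking models by robustness means sorting ascending by ASR, the reverse of the table's order. The sketch below copies the reported ASR values into a dictionary, ranks them, and computes how many times higher one model's ASR is than another's. The helper names `rank_by_asr` and `asr_ratio` are illustrative only. The table reports no sample sizes or confidence intervals, so these comparisons treat the figures as bare point estimates.

```python
# ASR values (percent) copied from the leaderboard table; lower is better.
ASR = {
    "Gemini 2.5 Pro": 8.5,
    "Llama 3.1 405B": 5.89,
    "DeepSeek V3.1": 5.4,
    "Kimi K2 Thinking": 4.8,
    "Grok 4": 2.9,
    "GPT-5.1": 2.5,
    "o3": 2.50,
    "GPT-4o": 2.41,
    "GPT-5": 2.0,
    "Claude 3.5 Sonnet": 1.85,
    "Claude 3.7 Sonnet": 1.61,
    "Claude Haiku 4.5": 1.3,
    "Claude Sonnet 4.5": 1.0,
    "Claude Opus 4.5": 0.5,
}


def rank_by_asr(scores: dict[str, float]) -> list[tuple[str, float]]:
    """Sort models from lowest ASR (most robust) to highest.

    sorted() is stable, so tied models (e.g. GPT-5.1 and o3) keep
    their insertion order.
    """
    return sorted(scores.items(), key=lambda item: item[1])


def asr_ratio(scores: dict[str, float], worse: str, better: str) -> float:
    """Return how many times higher `worse`'s ASR is than `better`'s."""
    return scores[worse] / scores[better]


if __name__ == "__main__":
    for rank, (model, asr) in enumerate(rank_by_asr(ASR), start=1):
        print(f"{rank}. {model}: {asr}%")
    print(f"{asr_ratio(ASR, 'Gemini 2.5 Pro', 'Claude Opus 4.5'):.1f}x")  # prints "17.0x"
```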