evals.report

Gray Swan Arena (Agent Red-Teaming / Indirect Prompt Injection)

A large-scale public red-teaming competition, run on the Gray Swan Arena platform, that measures how often adversarial attackers can break frontier AI agents through jailbreaks and indirect prompt injection across tool-use, coding, and computer-use settings. Results are reported as an attack success rate (ASR), where lower is better.
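
As a rough illustration of the metric, the sketch below treats ASR as the share of attack attempts that succeed, expressed as a percentage. This is an assumption: the page does not state the exact denominator the competition uses (per attempt, per behavior, or per session). The function name and the counts are hypothetical and are not taken from the leaderboard.

```python
def attack_success_rate(successes: int, attempts: int) -> float:
    """Return ASR as a percentage, assuming ASR = successes / attempts * 100."""
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    if not 0 <= successes <= attempts:
        raise ValueError("successes must be between 0 and attempts")
    return 100.0 * successes / attempts


# Hypothetical counts (successful attacks, total attempts), for illustration only.
runs = {"model_a": (5, 1000), "model_b": (85, 1000)}

# Lower ASR is better, so sort ascending to put the most robust model first.
ranked = sorted(runs, key=lambda name: attack_success_rate(*runs[name]))
```

Under these assumed counts, `model_a` ranks ahead of `model_b` because fewer of the attacks against it succeed.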

Agents | Attack Success Rate (ASR) | Lower is better
| Model | Lab | Score | Source model | Status | Date |
|---|---|---|---|---|---|
| Gemini 2.5 Pro | Google DeepMind | 8.5% | | Verified | Mar 25, 2025 |
| Llama 3.1 405B | Meta | 5.89% | | Verified | Jul 23, 2024 |
| DeepSeek V3.1 | DeepSeek | 5.4% | | Verified | Aug 21, 2025 |
| Kimi K2 Thinking | Moonshot AI | 4.8% | | Verified | Nov 6, 2025 |
| Grok 4 | xAI | 2.9% | | Verified | Jul 9, 2025 |
| GPT-5.1 | OpenAI | 2.5% | | Verified | Nov 12, 2025 |
| o3 | OpenAI | 2.50% | | Verified | Apr 16, 2025 |
| GPT-4o | OpenAI | 2.41% | | Verified | May 13, 2024 |
| GPT-5 | OpenAI | 2.0% | | Verified | Aug 7, 2025 |
| Claude 3.5 Sonnet | Anthropic | 1.85% | | Verified | Jun 20, 2024 |
| Claude 3.7 Sonnet | Anthropic | 1.61% | | Verified | Feb 24, 2025 |
| Claude Haiku 4.5 | Anthropic | 1.3% | | Verified | Oct 15, 2025 |
| Claude Sonnet 4.5 | Anthropic | 1.0% | | Verified | Sep 29, 2025 |
| Claude Opus 4.5 | Anthropic | 0.5% | | Verified | Nov 24, 2025 |

Each row reports the model’s Attack Success Rate (ASR) on Gray Swan Arena (Agent Red-Teaming / Indirect Prompt Injection).