evals.report

Gray Swan Arena (Agent Red-Teaming / Indirect Prompt Injection)

A large-scale public red-teaming competition run on the Gray Swan Arena platform. It measures how often adversarial attackers can break frontier AI agents, using jailbreaks and indirect prompt injection across tool-use, coding, and computer-use settings. Results are reported as an attack success rate (ASR), where lower is better.

Agents | Attack Success Rate (ASR) | Lower is better
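
To make the metric concrete, here is a minimal sketch of how an attack success rate could be computed. It assumes ASR is simply the fraction of attack attempts that succeed; the benchmark's exact scoring rules (for example, how attempts are grouped per agent or per behavior) are not stated here. The `Attempt` record, its fields, and the sample data are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attempt:
    """One adversarial attempt against an agent (hypothetical record shape)."""
    target: str       # illustrative label for the agent under attack
    succeeded: bool   # True if the attack broke the agent


def attack_success_rate(attempts):
    """Return the fraction of attempts that succeeded (assumed ASR definition)."""
    if not attempts:
        raise ValueError("ASR is undefined for zero attempts")
    return sum(a.succeeded for a in attempts) / len(attempts)


# Made-up sample data: one success out of four attempts.
attempts = [
    Attempt("agent-a", True),
    Attempt("agent-a", False),
    Attempt("agent-a", False),
    Attempt("agent-a", False),
]

asr = attack_success_rate(attempts)
print(f"ASR: {asr:.1%}")  # prints "ASR: 25.0%"
```

Under this definition, an agent that resists every attempt scores 0%, which is why lower values indicate a more robust agent.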

No run guide for this benchmark yet.