evals.report

AgentDojo

AgentDojo is a dynamic evaluation environment from ETH Zurich and Invariant Labs that tests both the security and the utility of tool-using LLM agents under prompt injection attacks. It reports task utility under attack and the targeted attack success rate across realistic banking, Slack, travel, and workspace tasks.

Agents · utility under attack · Higher is better
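
To make the two reported metrics concrete, here is a minimal sketch of how they could be aggregated from per-episode outcomes. It assumes that utility under attack is the fraction of attacked episodes in which the agent still solves the user's task, and that the targeted attack success rate is the fraction in which the attacker's goal is achieved. The `EpisodeResult` record and both function names are hypothetical and are not part of AgentDojo's own API; the benchmark's exact scoring rules may differ.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class EpisodeResult:
    """Outcome of one attacked episode (hypothetical record, not an AgentDojo type)."""
    user_task_solved: bool         # did the agent complete the legitimate user task?
    injection_goal_achieved: bool  # did the injected instruction reach its goal?


def utility_under_attack(results: Sequence[EpisodeResult]) -> float:
    """Fraction of attacked episodes in which the user task was still solved."""
    if not results:
        raise ValueError("need at least one episode")
    return sum(r.user_task_solved for r in results) / len(results)


def targeted_attack_success_rate(results: Sequence[EpisodeResult]) -> float:
    """Fraction of attacked episodes in which the attacker's goal was achieved."""
    if not results:
        raise ValueError("need at least one episode")
    return sum(r.injection_goal_achieved for r in results) / len(results)


# Illustrative data only, not real benchmark results.
example = [
    EpisodeResult(user_task_solved=True, injection_goal_achieved=False),
    EpisodeResult(user_task_solved=True, injection_goal_achieved=False),
    EpisodeResult(user_task_solved=True, injection_goal_achieved=True),
    EpisodeResult(user_task_solved=False, injection_goal_achieved=False),
]
print(utility_under_attack(example))
print(targeted_attack_success_rate(example))
```

Under these assumptions, a higher utility under attack is better, while a lower targeted attack success rate indicates a more robust agent.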

No run guide for this benchmark yet.