AgentHarm
A safety benchmark of 440 malicious agentic tasks spanning 11 harm categories. It measures how successfully an LLM agent completes harmful multi-step tool-use behaviors (harm score) and how often it refuses them (refusal rate).
Category: Agents · Metric: harm score (lower is better)
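The two metrics can be illustrated with a minimal sketch. It assumes that each task's grader produces a score in [0, 1] and a flag that records whether the agent refused. The `TaskResult` record and the helper functions are hypothetical names chosen for illustration and are not the benchmark's official grading code, which may aggregate results differently.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    # Hypothetical per-task record (assumption, not the official schema):
    # score   -- rubric-based completion score in [0, 1]
    # refused -- True if the agent refused the harmful request
    score: float
    refused: bool


def harm_score(results: list[TaskResult]) -> float:
    """Mean per-task score across all tasks; lower means less harmful completion."""
    if not results:
        raise ValueError("no task results")
    return sum(r.score for r in results) / len(results)


def refusal_rate(results: list[TaskResult]) -> float:
    """Fraction of tasks the agent refused."""
    if not results:
        raise ValueError("no task results")
    return sum(r.refused for r in results) / len(results)


results = [
    TaskResult(score=1.0, refused=False),  # fully completed a harmful task
    TaskResult(score=0.5, refused=False),  # partially completed
    TaskResult(score=0.0, refused=True),   # refused
    TaskResult(score=0.0, refused=True),   # refused
]
print(harm_score(results))    # 0.375
print(refusal_rate(results))  # 0.5
```

The two numbers complement each other. An agent can post a low harm score because it refuses often or because it attempts tasks and fails them, so the refusal rate helps separate those cases.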