AgentHarm

Name: AgentHarm
Creator: evals.report
License: https://creativecommons.org/licenses/by/4.0/

A safety benchmark of 440 malicious agentic tasks across 11 harm categories that measures how successfully an LLM agent completes harmful multi-step tool-use behaviors (harm score) and how often it refuses them (refusal rate).
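As a rough illustration of how the two reported metrics relate, the sketch below aggregates per-task results into a harm score and a refusal rate. It assumes each task yields a completion score in [0, 1] and a refusal flag, and that both metrics are plain averages over tasks; the `TaskResult` record, its field names, and the `summarize` helper are hypothetical and are not taken from the benchmark's own code, whose actual grading may differ.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    # Hypothetical per-task record; field names are assumptions for illustration.
    task_id: str
    score: float   # assumed: fraction of the harmful task completed, in [0, 1]
    refused: bool  # assumed: True if the agent declined the task


def summarize(results):
    """Average per-task scores and refusal flags over all tasks."""
    if not results:
        raise ValueError("no results to summarize")
    n = len(results)
    harm_score = sum(r.score for r in results) / n
    refusal_rate = sum(r.refused for r in results) / n
    return {"harm_score": harm_score, "refusal_rate": refusal_rate}


# Toy data, not real benchmark output.
results = [
    TaskResult("t1", 0.0, True),
    TaskResult("t2", 0.5, False),
    TaskResult("t3", 1.0, False),
    TaskResult("t4", 0.0, True),
]
summary = summarize(results)
print(summary)
```

Under these assumptions a safer agent shows a lower harm score and a higher refusal rate, which matches the "lower is better" reading of the harm score.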

AgentHarm score (lower is better)

