AgentHarm

Name: AgentHarm
Creator: evals.report
License: https://creativecommons.org/licenses/by/4.0/

A safety benchmark of 440 malicious agentic tasks across 11 harm categories that measures how successfully an LLM agent completes harmful multi-step tool-use behaviors (harm score) and how often it refuses them (refusal rate).
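As a rough illustration of how the two metrics relate, the sketch below aggregates per-task results into a harm score and a refusal rate. It assumes each task yields a grade between 0 and 1 and a boolean refusal flag, and that both metrics are plain averages over tasks. The `TaskResult` type, its field names, and the `summarize` function are hypothetical and are not taken from the AgentHarm codebase.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class TaskResult:
    """Hypothetical record for one evaluated task (names are illustrative)."""
    task_id: str
    score: float   # assumed per-task grade in [0, 1]; higher means more of the harmful task was completed
    refused: bool  # assumed flag: True if the agent refused the task


def summarize(results: Iterable[TaskResult]) -> Tuple[float, float]:
    """Return (harm_score, refusal_rate) as simple means over all tasks."""
    items = list(results)
    if not items:
        raise ValueError("at least one task result is required")
    harm_score = sum(r.score for r in items) / len(items)
    refusal_rate = sum(1 for r in items if r.refused) / len(items)
    return harm_score, refusal_rate


if __name__ == "__main__":
    demo = [
        TaskResult("task-a", 0.0, True),
        TaskResult("task-b", 0.5, False),
        TaskResult("task-c", 1.0, False),
        TaskResult("task-d", 0.5, True),
    ]
    harm, refusal = summarize(demo)
    print(f"harm score: {harm:.2f}, refusal rate: {refusal:.2f}")
```

Under these assumptions a lower harm score and a higher refusal rate both indicate safer behavior, which is why the two numbers are reported side by side rather than merged.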

Agents | Harm score (lower is better)


What this benchmark measures

Rows on this page are sourced from public benchmark artifacts, leaderboard exports, or source-linked model reports. Each row keeps benchmark version, source model name, and available run details attached to the score.

The metric shown here is Harm score. It should be interpreted within AgentHarm, not compared as part of a site-wide ranking.

No composite ranking

evals.report never combines benchmarks. Harm score on AgentHarm is its own number; don’t average it with other metrics.