LiveBench
A frequently updated public benchmark suite spanning reasoning, coding, math, language, and instruction-following tasks.
Category: Reasoning. Metric: score. Higher is better.
What this benchmark measures
Rows on this page are sourced from public benchmark artifacts, leaderboard exports, or source-linked model reports. Each row keeps benchmark version, source model name, and available run details attached to the score.
The metric shown here is score. Interpret it within LiveBench and its source context; it is not part of a site-wide ranking.
What to be careful about
The LiveBench global average is a source-scoped LiveBench metric only; do not mix it with scores from unrelated benchmarks.
No composite ranking
evals.report never combines benchmarks. The score on LiveBench is its own number; don't average it with other metrics.
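For readers who process exported rows themselves, the scoping rule above can be sketched roughly as follows. This is a minimal illustration, not the site's implementation: the `ScoreRow` fields and the `rank_within_benchmark` helper are hypothetical names chosen to mirror the row details described on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoreRow:
    # Hypothetical row shape; field names are illustrative, not the site's schema.
    benchmark: str
    benchmark_version: str
    source_model_name: str
    score: float


def rank_within_benchmark(rows):
    """Sort rows by score, highest first, for one benchmark and version only.

    Raises ValueError if the rows mix benchmarks or versions, since scores
    from different sources are not comparable.
    """
    scopes = {(row.benchmark, row.benchmark_version) for row in rows}
    if len(scopes) > 1:
        raise ValueError(f"rows span multiple benchmark scopes: {sorted(scopes)}")
    # Higher is better, so sort in descending order of score.
    return sorted(rows, key=lambda row: row.score, reverse=True)
```

Raising an error instead of silently merging rows keeps an accidental cross-benchmark comparison from producing a plausible-looking ranking.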