evals.report

LiveBench

Category: Reasoning | Metric: score | Higher is better

What this benchmark measures

A frequently updated public benchmark suite spanning reasoning, coding, math, language, and instruction-following tasks.

Rows on this page are sourced from public benchmark artifacts, leaderboard exports, or source-linked model reports. Each row keeps benchmark version, source model name, and available run details attached to the score.
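As a rough illustration of the row structure described above, here is a minimal Python sketch. The class name `ScoreRow`, its field names, and the helper `rank_within_benchmark` are hypothetical and are not taken from evals.report; the model names, version label, and scores are placeholders, not real results.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ScoreRow:
    """Hypothetical shape of one row; field names are illustrative only."""
    benchmark: str                      # e.g. "LiveBench"
    benchmark_version: str              # version label as published by the source
    source_model_name: str              # model name exactly as the source reports it
    score: float                        # higher is better
    run_details: Optional[dict] = None  # run metadata, when the source provides any


def rank_within_benchmark(rows, benchmark):
    """Sort the rows of a single benchmark by score, highest first.

    Rows from other benchmarks are filtered out rather than merged in.
    """
    scoped = [r for r in rows if r.benchmark == benchmark]
    return sorted(scoped, key=lambda r: r.score, reverse=True)


# Placeholder data only; these are not real scores.
rows = [
    ScoreRow("LiveBench", "version-placeholder", "model-a", 1.0),
    ScoreRow("LiveBench", "version-placeholder", "model-b", 2.0),
    ScoreRow("OtherBench", "version-placeholder", "model-a", 99.0),
]

ranked = rank_within_benchmark(rows, "LiveBench")
ranked_names = [r.source_model_name for r in ranked]
```

Keeping the benchmark name and version on every row is what makes it possible to filter before comparing, so a score never gets ranked against a number from a different benchmark.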

The metric shown here is score. Interpret it only within LiveBench and its source context; it is not part of any site-wide ranking.

What to be careful about

The LiveBench global average is shown as a source-scoped LiveBench metric only; do not mix it with scores from unrelated benchmarks.

No composite ranking
evals.report never combines benchmarks. The LiveBench score is its own number; don’t average it with other metrics.