evals.report

LiveBench

A frequently updated public benchmark suite spanning reasoning, coding, math, data analysis, language, and instruction-following tasks.

Reasoning · score · Higher is better

Known official sources: 1

Ready now · Result archive · Structured data · Partial run guide · Public data

LiveBench

Broad public eval with frequently updated releases across reasoning, coding, math, and instruction following.

Category: Reasoning
Owner: LiveBench
Data path: Use the current release table CSV; the headline score is the global average across the six task categories.
Known caveat: Show the LiveBench global average as a source-scoped LiveBench metric only; do not mix with unrelated benchmarks.
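
The data path and caveat above can be sketched in a few lines of Python. This is a minimal illustration, not the official LiveBench tooling: it assumes the release CSV has already been reduced to one score column per category, and the column names, the `livebench_global_average` helper, and the sample rows are all hypothetical. Check the header of the actual release file before reusing it.

```python
import csv
import io
from statistics import mean

# Hypothetical column names (assumption): one score column per task category.
CATEGORY_COLUMNS = [
    "reasoning",
    "coding",
    "mathematics",
    "data_analysis",
    "language",
    "instruction_following",
]


def livebench_global_average(csv_text, model_column="model",
                             category_columns=CATEGORY_COLUMNS):
    """Return a source-scoped global average per model.

    The global average is the unweighted mean of the category scores.
    Each result is tagged with its source so it is not mixed with
    scores from unrelated benchmarks.
    """
    results = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        scores = [float(row[col]) for col in category_columns]
        results[row[model_column]] = {
            "source": "LiveBench",
            "metric": "global_average",
            "value": round(mean(scores), 2),
        }
    return results


# Made-up sample data for illustration only; these are not real results.
SAMPLE_CSV = """model,reasoning,coding,mathematics,data_analysis,language,instruction_following
model-a,60,50,40,30,20,10
model-b,70,70,70,70,70,70
"""

results = livebench_global_average(SAMPLE_CSV)
for model, entry in results.items():
    print(model, entry["source"], entry["metric"], entry["value"])
```

Keeping the `source` and `metric` fields on each result is one way to honor the caveat: downstream tables can group or filter by source instead of averaging LiveBench values together with other benchmarks.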