LiveBench

A frequently updated public benchmark suite spanning reasoning, coding, math, language, and instruction-following tasks.

Reasoning · score · Higher is better

Known official sources: 1

Ready now · Result archive · Structured data · Partial run guide · Public data

Broad public eval with frequently updated releases across reasoning, coding, math, and instruction following.

Category: Reasoning
Owner: LiveBench
Data path: Use the current release table CSV; the headline score is the global average across the six task categories.
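As a rough sketch of the data path above, the headline score can be recomputed from a release table by averaging the per-category scores for each model. The column names (`model`, `reasoning`, `data_analysis`, and so on), the helper name `global_average`, and the sample rows are all hypothetical placeholders, not the real CSV header or real results. The sketch also assumes the global average is an unweighted mean of the category scores.

```python
import csv
import io
from statistics import mean

# Hypothetical column names: check them against the header of the actual release CSV.
CATEGORY_COLUMNS = [
    "reasoning",
    "coding",
    "math",
    "data_analysis",
    "language",
    "instruction_following",
]


def global_average(csv_text, category_columns=CATEGORY_COLUMNS, model_column="model"):
    """Return {model: unweighted mean of its category scores}."""
    reader = csv.DictReader(io.StringIO(csv_text))
    averages = {}
    for row in reader:
        # Assumes every category cell holds a numeric score.
        averages[row[model_column]] = mean(float(row[col]) for col in category_columns)
    return averages


# Made-up rows for illustration only; these are not LiveBench results.
SAMPLE_CSV = """model,reasoning,coding,math,data_analysis,language,instruction_following
model-a,60,50,40,30,20,70
model-b,10,20,30,40,50,60
"""

if __name__ == "__main__":
    for model, score in global_average(SAMPLE_CSV).items():
        print(f"{model}: {score:.1f}")
```

Passing the column list as a parameter keeps the helper usable if a release renames or adds categories; only the list needs to change.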

Known caveat

Show the LiveBench global average as a source-scoped LiveBench metric only; do not average it with, or rank it against, scores from unrelated benchmarks.