LiveBench
Broad public eval with frequently updated releases across reasoning, coding, math, and instruction following.
Ready now · Result archive · Structured data · Partial run guide · Public data
Source detail
Score source
The livebench.ai site loads per-release table CSVs from livebench.github.io/public (latest release 2026-01-08); the Hugging Face model_judgment split is frozen at 2025-04.
Run guide
Official repo includes run_livebench.py, scoring utilities, and download_leaderboard.py.
How it can be used
Use the current release table CSV; the headline score is the global average across the six task categories.
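A minimal sketch of the averaging step, assuming the table has already been reduced to one numeric column per category. The column layout, the `model` column name, the placeholder category names, and the sample numbers are all hypothetical and not taken from the actual release CSV, which may need per-task scores grouped into categories first.

```python
import csv
import io
from statistics import mean

# Made-up sample standing in for a release table. The layout (a "model"
# column plus one numeric column per category) is an assumption.
SAMPLE_CSV = """model,cat_a,cat_b,cat_c,cat_d,cat_e,cat_f
model-x,60.0,70.0,80.0,50.0,40.0,90.0
model-y,55.0,65.0,75.0,45.0,35.0,85.0
"""


def global_averages(csv_text, model_column="model"):
    """Return {model: unweighted mean of every non-model column}."""
    reader = csv.DictReader(io.StringIO(csv_text))
    results = {}
    for row in reader:
        # Every column except the model name is treated as a category score.
        scores = [float(v) for k, v in row.items() if k != model_column]
        results[row[model_column]] = mean(scores)
    return results


if __name__ == "__main__":
    for model, score in global_averages(SAMPLE_CSV).items():
        print(f"{model}: {score:.2f}")
```

The mean here is unweighted, so each category counts equally regardless of how many tasks it contains; check this against the official scoring utilities before reporting any number.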
Caveat
Show the LiveBench global average as a source-scoped LiveBench metric only; do not mix it with scores from unrelated benchmarks.