evals.report

LiveBench

A frequently updated public benchmark suite spanning reasoning, coding, math, language, and instruction-following tasks.

Category: Reasoning | Metric: score | Higher is better

The official repo includes run_livebench.py, scoring utilities, and download_leaderboard.py.

Benchmark: LiveBench
Dataset: huggingface.co/datasets/livebench/model_judgment
Metric: score

1. Expected output

Use the official source links for current output format, submission steps, and benchmark-specific result files.

2. Submit results

Keep source URL, source model name, benchmark version, harness, and run context attached to any reported score.
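
One way to keep that provenance attached is to store each score as a single record rather than a bare number. The sketch below is illustrative only: ScoreRecord, its field names, and to_json are hypothetical and are not part of the LiveBench repo or any official submission format.

```python
from dataclasses import dataclass, asdict
import json


@dataclass(frozen=True)
class ScoreRecord:
    """One reported score plus the provenance it should travel with.

    Hypothetical structure for illustration; not an official LiveBench schema.
    """
    benchmark: str
    benchmark_version: str
    metric: str
    value: float
    source_url: str
    source_model_name: str
    harness: str
    run_context: str

    def __post_init__(self):
        # Reject records whose text fields were left blank.
        for name, field_value in asdict(self).items():
            if isinstance(field_value, str) and not field_value.strip():
                raise ValueError(f"missing provenance field: {name}")


def to_json(record: ScoreRecord) -> str:
    """Serialize the record so the score never travels without its context."""
    return json.dumps(asdict(record), sort_keys=True)
```

Because construction fails when any text field is blank, a score cannot be stored without its source URL, model name, version, harness, and run context.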

Gotchas

Report the LiveBench global average only as a LiveBench-scoped metric; do not mix or average it with metrics from unrelated benchmarks.
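
The scoping rule can also be enforced in code when aggregating results. The following is a minimal sketch that assumes scores are held as (benchmark, metric, value) tuples; the function name and tuple layout are hypothetical and do not come from LiveBench tooling.

```python
from statistics import mean


def average_within_benchmark(scores):
    """Average (benchmark, metric, value) tuples that share one benchmark and metric.

    Raises ValueError instead of silently blending unrelated metrics.
    """
    if not scores:
        raise ValueError("no scores to average")
    # Collect every distinct (benchmark, metric) pair present in the input.
    scopes = {(benchmark, metric) for benchmark, metric, _ in scores}
    if len(scopes) != 1:
        raise ValueError(f"refusing to mix metrics from different scopes: {sorted(scopes)}")
    return mean(value for _, _, value in scores)
```

Raising an error rather than returning a blended number makes accidental cross-benchmark averaging visible at the point where it happens.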