evals.report

SimpleQA Verified

Factual-accuracy / hallucination benchmark with a consistent independent leaderboard across frontier models.


Source detail

Score source

Epoch AI Benchmarking Hub publishes per-model accuracy (epoch.ai/data/benchmarks.csv).

Run guide

Problems and methodology are documented on the Epoch AI benchmarks hub.

How it can be used

Take each model's SimpleQA Verified score from the per-model accuracy that Epoch publishes.
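As a rough illustration, pulling one benchmark's rows out of a CSV export could look like the sketch below. The column names (`benchmark`, `model`, `accuracy`), the helper `scores_for`, and the sample rows are all assumptions made for this example. They are placeholder data, not real results, and the actual layout of epoch.ai/data/benchmarks.csv may differ, so check the file's header before relying on it.

```python
import csv
import io

# Placeholder data for illustration only. These are NOT real scores, and the
# column names are assumed; the real benchmarks.csv may use different ones.
SAMPLE_CSV = """benchmark,model,accuracy
SimpleQA Verified,model-a,0.52
SimpleQA Verified,model-b,0.31
Other Benchmark,model-a,0.90
"""


def scores_for(csv_text, benchmark,
               benchmark_col="benchmark",
               model_col="model",
               score_col="accuracy"):
    """Return (model, accuracy) pairs for one benchmark, best score first."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [
        (row[model_col], float(row[score_col]))
        for row in reader
        if row[benchmark_col] == benchmark
    ]
    # Sort by accuracy, highest first.
    return sorted(rows, key=lambda pair: pair[1], reverse=True)


ranked = scores_for(SAMPLE_CSV, "SimpleQA Verified")
for model, accuracy in ranked:
    print(f"{model}: {accuracy:.2f}")
```

The column names are passed as parameters so the same function can be pointed at the real file once its header is known.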

Caveat

Scores depend on how strictly answers are graded; keep the source methodology attached to any score you report.
