evals.report

Humanity's Last Exam

High-visibility frontier benchmark with difficult expert questions.

Next, Manual curated, Watchlist, Partial run guide, Page-backed data

Source detail

Score source

Public pages and releases exist, but the exact provenance of a given score is often documented only on the benchmark's own page or on the reporting lab's page.

Run guide

The dataset and eval are publicly accessible enough to document, but official run details vary.

How it can be used

Use only after each score row has source verification and retrieved-at metadata.
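One way to enforce this rule is a per-row check before any score is displayed or compared. The sketch below is illustrative only: the `ScoreRow` type, its field names, and the `is_usable` helper are hypothetical and not part of any evals.report schema, and the example URL and values are placeholders.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class ScoreRow:
    # All field names are hypothetical, chosen for illustration only.
    model: str
    score: float
    source_url: Optional[str] = None         # where the score was published
    source_verified: bool = False             # set once a person has checked the source
    retrieved_at: Optional[datetime] = None   # when the score was read from the source


def is_usable(row: ScoreRow) -> bool:
    """Return True only if the row has a verified source and a retrieval timestamp."""
    return bool(row.source_url) and row.source_verified and row.retrieved_at is not None


# Placeholder rows: one complete, one missing its provenance fields.
complete = ScoreRow(
    model="model-a",
    score=0.0,
    source_url="https://example.com/source",
    source_verified=True,
    retrieved_at=datetime(2025, 1, 1, tzinfo=timezone.utc),
)
incomplete = ScoreRow(model="model-b", score=0.0)

usable_rows = [r for r in (complete, incomplete) if is_usable(r)]
```

Here only `complete` would pass the filter; a row with a source URL but no verification flag or timestamp is rejected the same way as a row with nothing.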

Caveat

Avoid stale scraped tables without retrieved-at metadata.
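A staleness check along these lines could treat a missing timestamp as stale by default. This is a minimal sketch under stated assumptions: the `is_stale` helper and the 30-day `MAX_AGE` threshold are hypothetical and not taken from the source.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumed freshness window; the source does not specify one.
MAX_AGE = timedelta(days=30)


def is_stale(
    retrieved_at: Optional[datetime],
    now: datetime,
    max_age: timedelta = MAX_AGE,
) -> bool:
    """Treat a table as stale if it has no retrieval time or is older than max_age.

    Both datetimes should be timezone-aware (or both naive); mixing the two
    raises TypeError on subtraction.
    """
    if retrieved_at is None:
        return True  # no retrieved-at metadata: cannot be trusted as fresh
    return now - retrieved_at > max_age


now = datetime(2025, 3, 1, tzinfo=timezone.utc)
missing_metadata = is_stale(None, now)
recent_table = is_stale(now - timedelta(days=14), now)
old_table = is_stale(now - timedelta(days=59), now)
```

Passing `now` explicitly keeps the check deterministic, which makes it easier to test than calling the clock inside the function.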
