evals.report

The dataset and evaluation are public enough to document, but official run details vary.

Benchmark: Humanity's Last Exam
Repository: Not provided
Dataset: lastexam.ai
Metric: accuracy

1. Expected output

Use the official source links for current output format, submission steps, and benchmark-specific result files.

2. Submit results

Keep source URL, source model name, benchmark version, harness, and run context attached to any reported score.
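
One way to keep that context attached is to store each score as a single record that cannot be created with a blank provenance field. The sketch below is illustrative only: `ScoreRecord`, its field names, and `to_json` are hypothetical and do not reflect an official submission schema, and every value in the usage lines is a placeholder rather than a real result.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class ScoreRecord:
    """Hypothetical record; field names are illustrative, not an official schema."""
    benchmark: str
    metric: str
    score: float
    source_url: str
    source_model_name: str
    benchmark_version: str
    harness: str
    run_context: str

    def __post_init__(self):
        # Reject records where any text field is empty or whitespace only.
        for name, value in asdict(self).items():
            if isinstance(value, str) and not value.strip():
                raise ValueError(f"missing provenance field: {name}")


def to_json(record: ScoreRecord) -> str:
    """Serialize the score together with its provenance."""
    return json.dumps(asdict(record), sort_keys=True)


# Placeholder values only; none of these describe a real run.
record = ScoreRecord(
    benchmark="Humanity's Last Exam",
    metric="accuracy",
    score=0.5,
    source_url="https://lastexam.ai",
    source_model_name="example-model",
    benchmark_version="placeholder-version",
    harness="example-harness",
    run_context="placeholder run context",
)
print(to_json(record))
```

Keeping the score and its provenance in one immutable object means a number cannot be copied into a table without the fields that explain where it came from.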

Gotchas

Avoid stale scraped tables without retrieved-at metadata.
Do not mix this benchmark's metric with unrelated benchmark metrics.
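
Both gotchas can be enforced mechanically before any scraped rows are compared. The following is a minimal sketch under stated assumptions: rows are plain dicts with hypothetical keys `benchmark`, `metric`, and `retrieved_at`, the timestamp is a timezone-aware ISO 8601 string, and the 30-day freshness window is an arbitrary default rather than a recommendation from the benchmark's maintainers.

```python
from datetime import datetime, timedelta, timezone


def usable_rows(rows, benchmark, metric, now, max_age_days=30):
    """Keep rows for one benchmark and metric that carry a fresh retrieved_at.

    Row keys and the freshness window are assumptions made for illustration.
    """
    cutoff = now - timedelta(days=max_age_days)
    kept = []
    for row in rows:
        retrieved_at = row.get("retrieved_at")
        if not retrieved_at:
            continue  # no retrieved-at metadata: treat the row as stale
        if datetime.fromisoformat(retrieved_at) < cutoff:
            continue  # retrieved too long ago
        if row.get("benchmark") != benchmark or row.get("metric") != metric:
            continue  # different benchmark or metric: do not mix
        kept.append(row)
    return kept


# Placeholder rows; the model names are made up for the example.
now = datetime(2024, 6, 1, tzinfo=timezone.utc)
rows = [
    {"model": "example-a", "benchmark": "Humanity's Last Exam",
     "metric": "accuracy", "retrieved_at": "2024-05-20T00:00:00+00:00"},
    {"model": "example-b", "benchmark": "Humanity's Last Exam",
     "metric": "accuracy"},
    {"model": "example-c", "benchmark": "Humanity's Last Exam",
     "metric": "accuracy", "retrieved_at": "2023-01-01T00:00:00+00:00"},
    {"model": "example-d", "benchmark": "Other Benchmark",
     "metric": "pass@1", "retrieved_at": "2024-05-20T00:00:00+00:00"},
]
fresh = usable_rows(rows, "Humanity's Last Exam", "accuracy", now)
print([row["model"] for row in fresh])
```

Passing `now` explicitly keeps the filter deterministic, which makes it easier to reproduce which rows were considered usable on a given date.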