evals.report

SuperGPQA

A large-scale knowledge-and-reasoning benchmark of ~26,000 graduate-level multiple-choice questions (up to 10 answer options each) spanning 285 academic disciplines, measuring overall answer accuracy.

Reasoning · accuracy · Higher is better

What this benchmark measures

Rows on this page are sourced from public benchmark artifacts, leaderboard exports, or source-linked model reports. Each row keeps benchmark version, source model name, and available run details attached to the score.

The metric shown here is accuracy: the share of questions answered correctly. Interpret it within SuperGPQA only; it is not part of a site-wide ranking.
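As a rough illustration of the metric, the sketch below computes accuracy over a handful of multiple-choice answers. It assumes option labels run from A to J (matching the "up to 10 answer options" description) and that predictions have already been reduced to a single letter. The function name `accuracy` and the sample data are hypothetical, and the scoring harness behind any given row may extract and normalize answers differently.

```python
from typing import Sequence


def accuracy(predictions: Sequence[str], answers: Sequence[str]) -> float:
    """Return the fraction of predictions that match the gold answer label.

    Assumption: both sequences hold single option letters (e.g. "A".."J");
    comparison ignores case and surrounding whitespace.
    """
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("cannot compute accuracy over zero questions")
    correct = sum(
        p.strip().upper() == a.strip().upper()
        for p, a in zip(predictions, answers)
    )
    return correct / len(answers)


# Hypothetical sample: four questions, three answered correctly.
preds = ["A", "c", "J", "B"]
gold = ["A", "C", "D", "B"]
print(f"{accuracy(preds, gold):.2%}")  # prints "75.00%"
```

For scale, uniform random guessing on a question with 10 options gets 1 in 10 right, so scores are best read against that floor and not against zero.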

No composite ranking
evals.report never combines benchmarks. Accuracy on SuperGPQA is its own number; don't average it with other metrics.