evals.report

SuperGPQA

A large-scale knowledge-and-reasoning benchmark of ~26,000 graduate-level multiple-choice questions (up to 10 answer options each) spanning 285 academic disciplines, measuring overall answer accuracy.
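As a rough illustration of the metric described above, overall answer accuracy is the fraction of questions where the predicted option matches the reference option. The sketch below is not an official scorer: the function name `accuracy`, the use of letter labels (A through J for up to 10 options), and the case and whitespace normalization are all assumptions made for the example.

```python
from typing import Sequence


def accuracy(predictions: Sequence[str], gold: Sequence[str]) -> float:
    """Fraction of questions whose predicted option label matches the gold label.

    Assumption: options are identified by single letters (A-J for up to
    10 options); labels are compared after trimming whitespace and
    uppercasing. This is a hypothetical helper, not a published scorer.
    """
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        raise ValueError("at least one question is required")
    correct = sum(
        p.strip().upper() == g.strip().upper()
        for p, g in zip(predictions, gold)
    )
    return correct / len(gold)


# Four made-up questions: three predictions match, one does not.
score = accuracy(["a", "J", "B", "D "], ["A", "J", "C", "D"])
print(score)
```

With up to 10 options per question, uniform random guessing would score about 10% on the 10-option items, which is a useful floor to keep in mind when reading reported accuracies.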

Reasoning · accuracy · Higher is better

No run guide for this benchmark yet.