evals.report

IMO-Bench

A suite of IMO-level mathematical reasoning benchmarks from Google DeepMind, whose IMO-AnswerBench component tests models on 400 robustified Olympiad problems (Algebra, Combinatorics, Geometry, Number Theory) with verifiable short answers graded by an autograder.

Reasoning · accuracy · Higher is better
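As a rough illustration of how an accuracy score over short, verifiable answers can be computed, here is a minimal sketch. It is not the benchmark's actual autograder: the helper names `normalize`, `is_correct`, and `accuracy` are hypothetical, and grading is reduced to normalized string equality plus an exact rational-number comparison. A real grader would need to recognize many more equivalent answer forms.

```python
from fractions import Fraction


def normalize(answer: str) -> str:
    """Strip surrounding whitespace and '$', drop inner spaces, lowercase."""
    return answer.strip().strip("$").replace(" ", "").lower()


def is_correct(predicted: str, reference: str) -> bool:
    """Hypothetical stand-in grader: string match, then numeric match."""
    p, r = normalize(predicted), normalize(reference)
    if p == r:
        return True
    try:
        # Treat "0.5" and "1/2" as the same value when both parse as rationals.
        return Fraction(p) == Fraction(r)
    except (ValueError, ZeroDivisionError):
        # Non-numeric answers (e.g. symbolic expressions) fall back to mismatch.
        return False


def accuracy(predictions: list[str], references: list[str]) -> float:
    """Fraction of problems graded correct; higher is better."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have equal length")
    if not references:
        return 0.0
    correct = sum(is_correct(p, r) for p, r in zip(predictions, references))
    return correct / len(references)


if __name__ == "__main__":
    # Made-up answers for illustration only, not benchmark data.
    preds = ["0.5", " 42 ", "n+1"]
    refs = ["1/2", "42", "n^2"]
    print(f"accuracy = {accuracy(preds, refs):.3f}")
```

Under these assumptions each problem contributes either 0 or 1 to the score, so the reported number is simply the share of problems answered correctly.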

No run guide for this benchmark yet.