IMO-Bench
A suite of IMO-level mathematical reasoning benchmarks from Google DeepMind. Its IMO-AnswerBench component tests models on 400 robustified Olympiad problems across Algebra, Combinatorics, Geometry, and Number Theory, each with a verifiable short answer graded by an autograder.
Category: Reasoning. Metric: accuracy (higher is better).
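As a rough illustration of how an accuracy score over short-answer problems can be computed, here is a minimal sketch. It is not the benchmark's autograder: the normalized exact-match check is a simplifying stand-in, and the `normalize` and `accuracy_by_category` helpers and the sample records are hypothetical and made up for this example.

```python
from collections import defaultdict


def normalize(answer: str) -> str:
    """Crude normalization (assumption): drop all whitespace and lowercase."""
    return "".join(answer.split()).lower()


def accuracy_by_category(records):
    """Compute overall and per-category accuracy.

    records: iterable of (category, model_answer, reference_answer) tuples.
    Exact match after normalization stands in for a real autograder.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for category, model_answer, reference in records:
        total[category] += 1
        if normalize(model_answer) == normalize(reference):
            correct[category] += 1
    per_category = {c: correct[c] / total[c] for c in total}
    n = sum(total.values())
    overall = sum(correct.values()) / n if n else 0.0
    return overall, per_category


# Made-up records for illustration only, not benchmark data.
records = [
    ("Algebra", "42", "42"),
    ("Geometry", " 3 / 2 ", "3/2"),
    ("Number Theory", "7", "11"),
    ("Combinatorics", "120", "120"),
]

overall, per_category = accuracy_by_category(records)
print(f"overall accuracy: {overall:.2f}")
for category, acc in sorted(per_category.items()):
    print(f"{category}: {acc:.2f}")
```

Exact match treats mathematically equivalent forms such as `1/2` and `0.5` as different answers, so a practical grader needs a more tolerant equivalence check than this sketch provides.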
No run guide for this benchmark yet.