AIME (OTIS Mock)
Competition mathematics in the AIME format (Epoch AI's OTIS Mock AIME 2024-2025 set), a high-signal short-answer math reasoning benchmark.
Category: Reasoning | Metric: accuracy (higher is better)
Problems and methodology are documented on the Epoch AI benchmarks hub.
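The accuracy metric can be illustrated with a minimal single-attempt, exact-match scorer. This is a sketch under stated assumptions, not Epoch AI's grader. It assumes the mock set follows the standard AIME convention that every answer is an integer from 0 to 999. It also assumes one final answer per problem. Any repeated sampling or answer extraction that the official methodology uses is not modeled here. The names `normalize_answer` and `accuracy` are hypothetical.

```python
import re
from typing import List, Optional


def normalize_answer(raw: str) -> Optional[int]:
    """Parse a final answer string into an integer in [0, 999], or None if invalid.

    Assumes AIME-style answers: 1 to 3 ASCII digits, leading zeros allowed ("007" -> 7).
    """
    text = raw.strip()
    if not re.fullmatch(r"[0-9]{1,3}", text):
        return None
    return int(text)


def accuracy(predictions: List[str], references: List[int]) -> float:
    """Fraction of problems whose parsed prediction exactly equals the reference answer."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("need at least one problem")
    correct = sum(
        normalize_answer(pred) == ref for pred, ref in zip(predictions, references)
    )
    return correct / len(references)
```

For example, `accuracy(["042", "7", "abc"], [42, 7, 100])` scores two of three problems as correct. An unparseable answer counts as wrong rather than raising an error.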
1. Expected output
Consult the official source links for the current output format, submission steps, and benchmark-specific result files.
2. Submit results
Keep the source URL, source model name, benchmark version, harness, and run context attached to every reported score.
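One way to keep that provenance attached is to make it mandatory in the score record itself. The sketch below is not an official submission schema. The class name `ScoreReport`, its field names, and the assumption that accuracy is stored as a fraction in [0, 1] are all hypothetical. The record refuses to be built if any provenance field is blank.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict


@dataclass(frozen=True)
class ScoreReport:
    """Hypothetical score record that carries its provenance with it."""

    benchmark: str
    accuracy: float  # assumed to be a fraction in [0, 1], not a percentage
    source_url: str
    source_model_name: str
    benchmark_version: str
    harness: str
    run_context: Dict[str, str] = field(default_factory=dict)  # e.g. effort settings

    def __post_init__(self) -> None:
        # Reject records whose required provenance fields are blank.
        for name in ("source_url", "source_model_name", "benchmark_version", "harness"):
            if not getattr(self, name).strip():
                raise ValueError(f"{name} must not be empty")
        if not 0.0 <= self.accuracy <= 1.0:
            raise ValueError("accuracy must be in [0, 1]")

    def to_json(self) -> str:
        """Serialize the score and all of its provenance fields together."""
        return json.dumps(asdict(self), sort_keys=True)
```

Because the score and its context are serialized together, a number cannot be copied out of a report without its provenance travelling with it.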
Gotchas
AIME-style benchmarks are saturating at the top end, so report the reasoning-effort setting and run configuration with every score.
Do not mix this benchmark's metric with unrelated benchmark metrics.
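Both gotchas amount to comparing scores only within a like-for-like group. The minimal guard below shows one way to enforce that. The dictionary keys (`benchmark`, `metric`, `benchmark_version`, `reasoning_effort`, `score`) are hypothetical, and the choice of which fields define comparability is an assumption to adjust for your own reporting.

```python
from typing import Any, Dict, Tuple

# Hypothetical fields that must match before two scores are compared.
COMPARABILITY_KEYS = ("benchmark", "metric", "benchmark_version", "reasoning_effort")


def comparability_key(report: Dict[str, Any]) -> Tuple[Any, ...]:
    """Return the fields that define a like-for-like group (KeyError if any is missing)."""
    return tuple(report[k] for k in COMPARABILITY_KEYS)


def score_difference(a: Dict[str, Any], b: Dict[str, Any]) -> float:
    """Difference a - b, refusing to compare scores from different benchmarks or settings."""
    if comparability_key(a) != comparability_key(b):
        raise ValueError(
            f"not comparable: {comparability_key(a)} vs {comparability_key(b)}"
        )
    return a["score"] - b["score"]
```

A missing field raises `KeyError` instead of silently matching. This prevents two reports that both omit the effort setting from being treated as comparable.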