evals.report

AIME (OTIS Mock)

Competition mathematics in the AIME format (Epoch AI's OTIS Mock AIME 2024-2025 set), a high-signal short-answer math reasoning benchmark.

Reasoning · accuracy · higher is better

Problems and methodology are documented on the Epoch AI benchmarks hub.

Benchmark: AIME (OTIS Mock)
Repository: Not provided
Dataset: epoch.ai/benchmarks/otis-mock-aime-2024-2025
Metric: accuracy

1. Expected output

Consult the official source links for the current output format, submission steps, and benchmark-specific result files.
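As a rough illustration of how an accuracy score on AIME-format problems can be computed, here is a minimal sketch. It assumes each reference answer is an integer from 0 to 999 (the AIME answer format), that a `\boxed{...}` answer takes precedence when present, and that otherwise the last standalone integer in the response is the final answer. These extraction rules and the names `extract_answer` and `accuracy` are hypothetical; they are not Epoch AI's harness, whose actual answer format is documented at the official source.

```python
import re
from typing import List, Optional

# Hypothetical extraction rules for illustration only; the official harness may differ.
BOXED_ANSWER = re.compile(r"\\boxed\{\s*(\d{1,3})\s*\}")
# A 1-3 digit run that is not part of a longer digit run (AIME answers are 0-999).
STANDALONE_INT = re.compile(r"(?<!\d)\d{1,3}(?!\d)")


def extract_answer(response: str) -> Optional[int]:
    """Return the model's final answer as an int, or None if none is found."""
    boxed = BOXED_ANSWER.findall(response)
    if boxed:
        # Prefer the last \boxed{...} answer when the response contains one.
        return int(boxed[-1])
    # Otherwise fall back to the last standalone integer in the text.
    candidates = STANDALONE_INT.findall(response)
    return int(candidates[-1]) if candidates else None


def accuracy(responses: List[str], answers: List[int]) -> float:
    """Fraction of problems whose extracted answer equals the reference answer."""
    if len(responses) != len(answers):
        raise ValueError("responses and answers must have the same length")
    if not answers:
        raise ValueError("need at least one problem to score")
    correct = sum(extract_answer(r) == a for r, a in zip(responses, answers))
    return correct / len(answers)


if __name__ == "__main__":
    responses = [
        "... so the answer is \\boxed{204}.",
        "I get 17 and then 33.",
        "No idea.",
    ]
    answers = [204, 33, 5]
    print(f"{accuracy(responses, answers):.3f}")  # prints 0.667
```

An unparseable response counts as wrong rather than being dropped, so the denominator is always the full problem count.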

2. Submit results

Keep the source URL, source model name, benchmark version, harness, and run context attached to every reported score.
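One way to keep those fields attached is to store each score in a record that cannot be created without them. The sketch below assumes a hypothetical `ReportedScore` schema; it is not an official evals.report submission format, and the field names are illustrative.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict


@dataclass(frozen=True)
class ReportedScore:
    """A score bundled with its provenance (hypothetical schema, not an official format)."""

    benchmark: str
    metric: str
    score: float
    source_url: str
    source_model_name: str
    benchmark_version: str
    harness: str
    # Free-form run settings, e.g. reasoning effort or sampling parameters.
    run_context: Dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Reject records that would leave any provenance field blank.
        text_fields = {
            "source_url": self.source_url,
            "source_model_name": self.source_model_name,
            "benchmark_version": self.benchmark_version,
            "harness": self.harness,
        }
        missing = [name for name, value in text_fields.items() if not value.strip()]
        if not self.run_context:
            missing.append("run_context")
        if missing:
            raise ValueError(f"missing provenance: {', '.join(missing)}")


if __name__ == "__main__":
    # All values below are placeholders, not real results.
    record = ReportedScore(
        benchmark="AIME (OTIS Mock)",
        metric="accuracy",
        score=0.5,
        source_url="https://example.com/model-card",
        source_model_name="example-model",
        benchmark_version="2024-2025",
        harness="example-harness",
        run_context={"reasoning_effort": "high"},
    )
    print(json.dumps(asdict(record), indent=2))
```

Making the record frozen means provenance cannot be stripped or edited after the score is recorded; serializing it with `asdict` keeps the score and its context together in one file.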

Gotchas

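AIME-style benchmarks are saturating at the top end, so keep reasoning-effort and other configuration settings attached to every score; they can matter when comparing scores near the ceiling.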
Do not compare this benchmark's accuracy with metrics from unrelated benchmarks.
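Both gotchas can be enforced mechanically when comparing two scores. This is a minimal sketch under stated assumptions: `Score` and `score_delta` are hypothetical names, a score is identified by its benchmark name and metric, and its configuration is a flat string dictionary. The guard raises `ValueError` when the benchmark or metric differs and emits a warning when the configurations differ.

```python
import warnings
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Score:
    """Minimal score record (hypothetical; field names are illustrative)."""

    benchmark: str
    metric: str
    value: float
    config: Dict[str, str] = field(default_factory=dict)


def score_delta(a: Score, b: Score) -> float:
    """Return b.value - a.value, refusing to compare across benchmarks or metrics."""
    if (a.benchmark, a.metric) != (b.benchmark, b.metric):
        raise ValueError(
            f"cannot compare {a.benchmark}/{a.metric} with {b.benchmark}/{b.metric}"
        )
    if a.config != b.config:
        # Near the ceiling, config differences (e.g. reasoning effort) can drive the gap.
        warnings.warn(f"configs differ: {a.config} vs {b.config}", stacklevel=2)
    return b.value - a.value
```

Warning instead of raising on a config mismatch is a deliberate choice in this sketch: cross-config comparisons can still be informative, but they should never pass silently.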