evals.report

FrontierMath Tier 4

Category: Reasoning. Metric: accuracy (higher is better).

What this benchmark measures

FrontierMath Tier 4 is Epoch AI's expansion set of 50 exceptionally difficult, original research-level mathematics problems. The problems were crafted and vetted by expert mathematicians, and some can take a specialist days to solve. The benchmark measures a model's advanced mathematical reasoning by exact-answer accuracy: the fraction of problems for which the model's final answer matches the verified answer.
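A minimal Python sketch of how an exact-answer accuracy score can be computed. It assumes every answer has already been normalized to a canonical string and compares answers with plain equality; the function name `exact_answer_accuracy` and the dict-of-answers input are hypothetical, and this is not Epoch AI's grading code, which may check answers differently.

```python
from collections.abc import Mapping


def exact_answer_accuracy(predictions: Mapping[str, str], reference: Mapping[str, str]) -> float:
    """Return the fraction of problems whose predicted answer equals the reference answer.

    Hypothetical helper: answers are assumed to be pre-normalized strings.
    A problem with no prediction counts as incorrect.
    """
    if not reference:
        raise ValueError("reference answer set is empty")
    correct = sum(
        1
        for problem_id, answer in reference.items()
        if predictions.get(problem_id) == answer  # exact match only, no partial credit
    )
    return correct / len(reference)
```

With 50 problems, each problem is worth 1/50, or 2 percentage points, so a gap of a few points between two rows corresponds to only one or two problems.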

Rows on this page are sourced from public benchmark artifacts, leaderboard exports, or source-linked model reports. Each row keeps the benchmark version, the source model name, and any available run details attached to its score.

The metric shown here is accuracy. It should be interpreted within FrontierMath Tier 4, not compared as part of a site-wide ranking.
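The sourcing and interpretation rules above can be sketched as a record type plus a ranking helper that only ever compares rows from a single benchmark and metric. This is an illustration under stated assumptions: `ScoreRow`, `rank_within_benchmark`, and all field names are hypothetical and do not describe evals.report's actual export format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoreRow:
    # Hypothetical schema: field names are illustrative only.
    benchmark: str              # e.g. "FrontierMath Tier 4"
    benchmark_version: str      # version string as reported by the source
    source_model_name: str      # model name exactly as the source reports it
    metric: str                 # e.g. "accuracy"
    score: float                # for accuracy, higher is better
    run_details: tuple[tuple[str, str], ...] = ()  # optional (key, value) pairs from the source


def rank_within_benchmark(rows: list[ScoreRow], benchmark: str, metric: str) -> list[ScoreRow]:
    """Rank rows for one benchmark and metric, highest score first.

    Rows from other benchmarks or metrics are filtered out rather than merged,
    so no cross-benchmark average or composite score is ever computed.
    """
    selected = [r for r in rows if r.benchmark == benchmark and r.metric == metric]
    return sorted(selected, key=lambda r: r.score, reverse=True)
```

Keeping the score inside the same immutable record as its provenance means a score cannot be passed around without its benchmark version and source model name.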

No composite ranking
evals.report never combines benchmarks. Accuracy on FrontierMath Tier 4 is its own number; don’t average it with other metrics.