Question 1

What is FrontierMath Tier 4?

Accepted Answer

FrontierMath Tier 4 is Epoch AI's expansion set of 50 exceptionally difficult, original research-level mathematics problems. Expert mathematicians crafted and vetted them, and a single problem can take a specialist days to solve. It is a reasoning benchmark that measures an AI model's advanced mathematical reasoning by exact-answer accuracy.

Question 2

What does accuracy mean on FrontierMath Tier 4?

Accepted Answer

FrontierMath Tier 4 reports accuracy (%): the percentage of problems for which the model's final answer is exactly correct. Higher is better. Scores are shown only within FrontierMath Tier 4 and are never averaged with other benchmarks.
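As a minimal sketch, assuming accuracy is simply the share of problems whose final answer exactly matches the reference, the percentage can be computed as below. Epoch AI's actual grading pipeline may differ in detail. The function name `accuracy_percent` and the sample results are hypothetical.

```python
def accuracy_percent(results: list[bool]) -> float:
    """Return exact-answer accuracy as a percentage.

    `results` holds one boolean per problem: True if the model's final
    answer exactly matched the reference answer, False otherwise.
    """
    if not results:
        raise ValueError("need at least one graded problem")
    return 100.0 * sum(results) / len(results)


# Hypothetical grading outcome for a 50-problem set: 20 solved, 30 not.
graded = [True] * 20 + [False] * 30
print(f"{accuracy_percent(graded):.1f}%")  # prints 40.0%
```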

Question 3

What is the top reported FrontierMath Tier 4 score?

Accepted Answer

GPT-5.5 Pro has the top reported score on FrontierMath Tier 4: 39.6% (accuracy).

Question 4

Why do FrontierMath Tier 4 scores differ across runs?

Accepted Answer

The evaluation harness, scaffold, reasoning-effort setting, and prompt setup all affect results, so two runs of the same model can produce different scores. evals.report stores each score together with its run context so those differences stay visible.
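One way to picture "keeping each score with its run context" is a record that never separates a score from the settings that produced it. This is a hypothetical sketch, not evals.report's actual schema: the class names, field names, model name, and scores below are made up for illustration. Runs are returned side by side rather than averaged, so a change in reasoning effort stays visible.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunContext:
    """Settings that can change a score (hypothetical field names)."""
    harness: str
    scaffold: str
    reasoning_effort: str
    prompt_setup: str


@dataclass(frozen=True)
class ScoreRecord:
    """One score, always stored alongside its run context."""
    model: str
    benchmark: str
    accuracy_pct: float
    context: RunContext


def runs_for(records: list[ScoreRecord], model: str,
             benchmark: str) -> list[tuple[RunContext, float]]:
    """List every run of one model on one benchmark, without averaging."""
    return [(r.context, r.accuracy_pct) for r in records
            if r.model == model and r.benchmark == benchmark]


# Made-up example: the same model under two reasoning-effort settings.
low = RunContext("harness-x", "none", "low", "default")
high = RunContext("harness-x", "none", "high", "default")
records = [
    ScoreRecord("example-model", "FrontierMath Tier 4", 30.0, low),
    ScoreRecord("example-model", "FrontierMath Tier 4", 35.0, high),
]
for ctx, score in runs_for(records, "example-model", "FrontierMath Tier 4"):
    print(ctx.reasoning_effort, score)
# prints:
# low 30.0
# high 35.0
```

Returning a list of (context, score) pairs, rather than a single number per model, is the design choice that keeps run-to-run differences from being hidden.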

Question 5

Does evals.report rank models across benchmarks?

Accepted Answer

No. FrontierMath Tier 4 scores are shown within their own metric; evals.report never combines benchmarks into a composite ranking or a single "best model".
