Question 1

What is MathArena HMMT February 2026?

Accepted Answer

MathArena HMMT February 2026 is a contamination-free evaluation of large language models on the 33 problems of the HMMT February 2026 mathematics competition. It scores final-answer accuracy (pass@1, estimated from 4 samples per problem) on problems released after model training. It is a reasoning benchmark measured by accuracy.
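As a rough illustration of how a pass@1 estimate from 4 samples per problem turns into a single accuracy figure, here is a minimal sketch. It assumes pass@1 is the mean, over problems, of the fraction of correct samples. The function name `pass_at_1` and the per-problem counts are hypothetical and are not taken from MathArena's own code or results.

```python
from fractions import Fraction


def pass_at_1(correct_counts, samples_per_problem=4):
    """Estimate pass@1 as the mean per-problem fraction of correct samples.

    correct_counts: one integer per problem, the number of sampled answers
    whose final answer was correct (an assumed input format).
    """
    if not correct_counts:
        raise ValueError("need at least one problem")
    for count in correct_counts:
        if not 0 <= count <= samples_per_problem:
            raise ValueError(f"count {count} outside 0..{samples_per_problem}")
    # Exact rational arithmetic avoids rounding drift before the final conversion.
    total = sum(Fraction(count, samples_per_problem) for count in correct_counts)
    return float(total / len(correct_counts))


# Hypothetical run over 33 problems: 30 solved in all 4 samples,
# 2 solved in 2 of 4 samples, and 1 never solved.
hypothetical_counts = [4] * 30 + [2] * 2 + [0] * 1
score = pass_at_1(hypothetical_counts)
print(f"{score * 100:.2f}%")  # prints 93.94%
```

With 33 problems and 4 samples each, every reported score under this assumption is a multiple of 1/132, which is why values carry two decimal places.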

Question 2

What does accuracy mean on MathArena HMMT February 2026?

Accepted Answer

MathArena HMMT February 2026 reports accuracy (%); higher is better. Scores are shown only within MathArena HMMT February 2026 and are never averaged with other benchmarks.

Question 3

What is the top reported MathArena HMMT February 2026 score?

Accepted Answer

GPT-5.4 has the top reported score on MathArena HMMT February 2026: 97.73% (accuracy).

Question 4

Why do MathArena HMMT February 2026 scores differ across runs?

Accepted Answer

Differences in harness, scaffold, reasoning effort, and prompt setup change results, so two runs of the same model can score differently. evals.report keeps each score with its run context so the differences stay visible.
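One way to picture "a score kept with its run context" is a record that stores the context next to the number and is never averaged across contexts. The sketch below is an assumption for illustration only: the class names, field names, model name, and scores are all hypothetical and do not describe evals.report's actual schema or data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunContext:
    # Hypothetical fields mirroring the four factors named above.
    harness: str
    scaffold: str
    reasoning_effort: str
    prompt_setup: str


@dataclass(frozen=True)
class ScoreRecord:
    model: str
    benchmark: str
    accuracy_pct: float
    context: RunContext


def scores_by_context(records, model, benchmark):
    """Return one score per run context instead of collapsing runs into a mean."""
    return {
        record.context: record.accuracy_pct
        for record in records
        if record.model == model and record.benchmark == benchmark
    }


# Hypothetical records: same model, same benchmark, different reasoning effort.
records = [
    ScoreRecord("example-model", "MathArena HMMT February 2026", 90.15,
                RunContext("harness-a", "none", "high", "default")),
    ScoreRecord("example-model", "MathArena HMMT February 2026", 84.09,
                RunContext("harness-a", "none", "low", "default")),
]

for context, accuracy in scores_by_context(
        records, "example-model", "MathArena HMMT February 2026").items():
    print(f"{context.reasoning_effort}: {accuracy:.2f}%")
```

Keying by the full context keeps both runs visible side by side, rather than hiding the gap behind a single number.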

Question 5

Does evals.report rank models across benchmarks?

Accepted Answer

No. MathArena HMMT February 2026 scores are shown within their own metric; evals.report never combines benchmarks into a composite ranking or a single "best model".
