Question 1

What is MathArena HMMT February 2026?

Accepted Answer

MathArena HMMT February 2026 is a contamination-free evaluation of large language models on the 33 problems of the HMMT February 2026 mathematics competition. It scores final-answer accuracy (pass@1, estimated from 4 samples per problem) on problems released after model training. It is a reasoning benchmark, and accuracy is its metric.
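
To make the pass@1 estimate concrete, here is a minimal Python sketch. It assumes every problem is weighted equally and that each sample is simply marked correct or incorrect on its final answer. The function name `pass_at_1` and the sample data are hypothetical; the data is one made-up pattern of outcomes, not the actual per-problem results of any model.

```python
from fractions import Fraction


def pass_at_1(results):
    """Estimate pass@1 as the mean, over problems, of the fraction of correct samples.

    `results` holds one list per problem; each inner list holds one boolean
    per sample (True means the final answer was correct).
    """
    if not results:
        raise ValueError("no problems given")
    per_problem = [Fraction(sum(samples), len(samples)) for samples in results]
    return sum(per_problem) / len(per_problem)


# Hypothetical run: 33 problems with 4 samples each.
# 30 problems are solved in all 4 samples; 3 problems are solved in 3 of 4.
results = [[True] * 4 for _ in range(30)]
results += [[True, True, True, False] for _ in range(3)]

score = pass_at_1(results)
print(f"{float(score) * 100:.2f}%")  # 97.73%
```

With 33 problems and 4 samples each there are 132 graded samples, so under these assumptions a score moves in steps of 1/132 (about 0.76 percentage points), and 97.73% corresponds to 129 correct samples out of 132.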

Question 2

What does accuracy mean on MathArena HMMT February 2026?

Accepted Answer

Accuracy is the percentage of final answers that are correct, reported as pass@1 estimated from 4 samples per problem; higher is better. Scores are shown only within MathArena HMMT February 2026 and are never averaged with other benchmarks.

Question 3

What is the top reported MathArena HMMT February 2026 score?

Accepted Answer

GPT-5.4 and GPT-5.5 share the top reported score on MathArena HMMT February 2026: 97.73% accuracy.

Question 4

Why do MathArena HMMT February 2026 scores differ across runs?

Accepted Answer

Harness, scaffold, reasoning effort, and prompt setup change results, so two runs of the same model can differ. evals.report keeps each score with its run context so the differences stay visible.
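
One way to picture "keeping each score with its run context" is a record that stores the context next to the number, so two runs of the same model stay separate instead of being merged. The Python sketch below is an illustration only: the class names, field names, model name, and scores are all hypothetical and do not describe how evals.report actually stores data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunContext:
    # Hypothetical fields mirroring the factors named in the answer above.
    harness: str
    scaffold: str
    reasoning_effort: str
    prompt_setup: str


@dataclass(frozen=True)
class ScoreRecord:
    model: str
    benchmark: str
    accuracy_pct: float
    context: RunContext


def runs_for(records, model, benchmark):
    """Return every recorded run of `model` on `benchmark`, without averaging."""
    return [r for r in records if r.model == model and r.benchmark == benchmark]


BENCHMARK = "MathArena HMMT February 2026"

# Made-up scores for a made-up model, used only to show the structure.
records = [
    ScoreRecord("example-model", BENCHMARK, 90.0,
                RunContext("harness-a", "none", "high", "zero-shot")),
    ScoreRecord("example-model", BENCHMARK, 92.0,
                RunContext("harness-b", "none", "max", "zero-shot")),
]

runs = runs_for(records, "example-model", BENCHMARK)
for run in runs:
    print(run.accuracy_pct, run.context)
```

Because each record carries its own context, a reader can see which setup produced which number rather than a single blended figure.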

Question 5

Does evals.report rank models across benchmarks?

Accepted Answer

No. MathArena HMMT February 2026 scores are shown within their own metric; evals.report never combines benchmarks into a composite ranking or a single "best model".

Model	Lab	Score	Source model	Status	Date
GPT-5.4	OpenAI	97.73%	—	Official	Mar 5, 2026
GPT-5.5	OpenAI	97.73%	—	Official	Apr 23, 2026
GPT-5.2	OpenAI	96.97%	—	Official	Dec 11, 2025
Claude Opus 4.6	Anthropic	96.21%	—	Official	Feb 5, 2026
Gemini 3.5 Flash	Google DeepMind	95.45%	—	Official	May 19, 2026
Claude Opus 4.8	Anthropic	95.45%	—	Official	May 28, 2026
Kimi K2.6 (Open)	Moonshot AI	94.70%	—	Official	Apr 20, 2026
Gemini 3.1 Pro Preview	Google DeepMind	94.70%	—	Official	Feb 19, 2026
DeepSeek V4 Flash (Open)	DeepSeek	93.94%	—	Official	Apr 24, 2026
DeepSeek V4 Pro (Open)	DeepSeek	93.94%	—	Official	Apr 24, 2026
Claude Opus 4.7	Anthropic	93.94%	—	Official	Apr 16, 2026
GLM-5.2 (Open)	Z.ai	92.5%	GLM-5.2	Verified	Jun 16, 2026
Gemini 3 Flash	Google DeepMind	89.39%	—	Official	Dec 17, 2025
GLM-5.1 (Open)	Z.ai	89.39%	—	Official	Apr 7, 2026
Qwen3.5-397B-A17B (Open)	Alibaba / Qwen	87.88%	—	Official	Feb 16, 2026
Kimi K2.5 (Open)	Moonshot AI	87.12%	—	Official	Jan 27, 2026
Grok 4.1 fast reasoning	xAI	86.36%	—	Official	Nov 19, 2025
GLM-5 (Open)	Z.ai	86.36%	—	Official	Feb 11, 2026
Gemini 3 Pro	Google DeepMind	86.36%	—	Official	Nov 18, 2025
NVIDIA Nemotron 3 Super 120B-A12B (Open)	NVIDIA	84.85%	—	Official	Mar 10, 2026
DeepSeek V3.2 (Open)	DeepSeek	84.09%	—	Official	Dec 1, 2025
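
Several rows in the table share a score, including the top one, so a lookup of "the top model" should return every tied entry. The Python sketch below shows one way to do that for tab-separated rows like those above. The function name `top_models` is hypothetical, the sample contains only the first three rows of the table, and the column header `Score` is an assumption about how the header is labeled.

```python
import csv
import io

# First three leaderboard rows, tab-separated, copied from the table above.
SAMPLE = (
    "Model\tLab\tScore\tSource model\tStatus\tDate\n"
    "GPT-5.4\tOpenAI\t97.73%\t—\tOfficial\tMar 5, 2026\n"
    "GPT-5.5\tOpenAI\t97.73%\t—\tOfficial\tApr 23, 2026\n"
    "GPT-5.2\tOpenAI\t96.97%\t—\tOfficial\tDec 11, 2025\n"
)


def top_models(tsv_text):
    """Return all models that share the highest score, in table order."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    rows = [(row["Model"], float(row["Score"].rstrip("%"))) for row in reader]
    best = max(score for _, score in rows)
    return [model for model, score in rows if score == best]


print(top_models(SAMPLE))  # ['GPT-5.4', 'GPT-5.5']
```

Returning a list rather than a single name keeps ties visible instead of letting row order decide a winner.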