MathArena HMMT February 2026
Contamination-free evaluation of large language models on the 33 problems of the HMMT February 2026 mathematics competition, scoring final-answer accuracy (pass@1 estimated from 4 samples per problem) on problems released after model training.
Reasoning · accuracy · Higher is better
| Model | Lab | Score | Source model | Status | Date |
|---|---|---|---|---|---|
| GPT-5.4 | OpenAI | 97.73% | — | Official | Mar 5, 2026 |
| GPT-5.5 | OpenAI | 97.73% | — | Official | Apr 23, 2026 |
| GPT-5.2 | OpenAI | 96.97% | — | Official | Dec 11, 2025 |
| Claude Opus 4.6 | Anthropic | 96.21% | — | Official | Feb 5, 2026 |
| Gemini 3.5 Flash | Google DeepMind | 95.45% | — | Official | May 19, 2026 |
| Claude Opus 4.8 | Anthropic | 95.45% | — | Official | May 28, 2026 |
| Kimi K2.6 | Moonshot AI | 94.70% | — | Official | Apr 20, 2026 |
| Gemini 3.1 Pro Preview | Google DeepMind | 94.70% | — | Official | Feb 19, 2026 |
| DeepSeek V4 Flash | DeepSeek | 93.94% | — | Official | Apr 24, 2026 |
| DeepSeek V4 Pro | DeepSeek | 93.94% | — | Official | Apr 24, 2026 |
| Claude Opus 4.7 | Anthropic | 93.94% | — | Official | Apr 16, 2026 |
| Gemini 3 Flash | Google DeepMind | 89.39% | — | Official | Dec 17, 2025 |
| GLM-5.1 | Z.ai | 89.39% | — | Official | Apr 7, 2026 |
| Qwen3.5-397B-A17B | Alibaba / Qwen | 87.88% | — | Official | Feb 16, 2026 |
| Kimi K2.5 | Moonshot AI | 87.12% | — | Official | Jan 27, 2026 |
| Grok 4.1 fast reasoning | xAI | 86.36% | — | Official | Nov 19, 2025 |
| GLM-5 | Z.ai | 86.36% | — | Official | Feb 11, 2026 |
| Gemini 3 Pro | Google DeepMind | 86.36% | — | Official | Nov 18, 2025 |
| NVIDIA Nemotron 3 Super 120B-A12B | NVIDIA | 84.85% | — | Official | Mar 10, 2026 |
| DeepSeek V3.2 | DeepSeek | 84.09% | — | Official | Dec 1, 2025 |
Each row reports the model’s accuracy on MathArena HMMT February 2026.
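The description states that pass@1 is estimated from 4 samples on each of the 33 problems. A minimal sketch of that estimate follows, assuming it is the plain average of per-problem correct fractions. The function names `pass_at_1` and `as_percent` and the per-problem counts are hypothetical, chosen for illustration; the page does not publish per-problem results or the scoring code.

```python
from fractions import Fraction

NUM_PROBLEMS = 33         # from the benchmark description
SAMPLES_PER_PROBLEM = 4   # from the benchmark description


def pass_at_1(correct_counts, samples_per_problem=SAMPLES_PER_PROBLEM):
    """Average, over problems, of the fraction of samples with a correct final answer.

    Assumption: this simple mean is how the estimate is formed; the source
    only says "pass@1 estimated from 4 samples per problem".
    """
    if not correct_counts:
        raise ValueError("need at least one problem")
    for count in correct_counts:
        if not 0 <= count <= samples_per_problem:
            raise ValueError(f"correct count {count} is outside 0..{samples_per_problem}")
    total = sum(Fraction(count, samples_per_problem) for count in correct_counts)
    return total / len(correct_counts)


def as_percent(score):
    """Format a fractional score the way the table does, e.g. '97.73%'."""
    return f"{float(score) * 100:.2f}%"


# Hypothetical run (not real data): 30 problems correct in all 4 samples,
# 3 problems correct in 3 of 4 samples, i.e. 129 of 132 samples correct.
hypothetical_counts = [4] * 30 + [3] * 3
assert len(hypothetical_counts) == NUM_PROBLEMS
print(as_percent(pass_at_1(hypothetical_counts)))  # prints 97.73%
```

Under this assumption a score can only move in steps of 1/132 (about 0.76 percentage points), which is consistent with the values in the table: one additional correct sample separates 96.97% from 97.73%.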