FrontierMath Tier 4
FrontierMath Tier 4 is Epoch AI's expansion set of 50 exceptionally difficult, original, research-level mathematics problems, crafted and vetted by expert mathematicians, that can take a specialist days to solve. It measures an AI model's advanced mathematical reasoning by exact-answer accuracy.
Category: Reasoning. Metric: accuracy (higher is better).
| Model | Lab | Score | Source model | Status | Date |
|---|---|---|---|---|---|
| GPT-5.5 Pro | OpenAI | 39.6% | — | Official | Apr 23, 2026 |
| GPT-5.4 Pro | OpenAI | 37.5% | — | Official | Mar 5, 2026 |
| GPT-5.5 | OpenAI | 35.4% | — | Official | Apr 23, 2026 |
| GPT-5.4 | OpenAI | 27.1% | — | Official | Mar 5, 2026 |
| Claude Opus 4.7 | Anthropic | 22.9% | — | Official | Apr 16, 2026 |
| Claude Opus 4.6 | Anthropic | 22.9% | — | Official | Feb 5, 2026 |
| GPT-5.2 | OpenAI | 18.8% | — | Official | Dec 11, 2025 |
| Gemini 3 Pro | Google DeepMind | 18.8% | — | Official | Nov 18, 2025 |
| Gemini 3.1 Pro Preview | Google DeepMind | 16.7% | — | Official | Feb 19, 2026 |
| Muse Spark | Meta | 14.6% | — | Official | Apr 8, 2026 |
| Gemini 3.5 Flash | Google DeepMind | 14.6% | — | Official | May 19, 2026 |
| Kimi K2.6 | Moonshot AI | 14.6% | — | Official | Apr 20, 2026 |
| GLM-5.1 | Z.ai | 12.5% | — | Official | Apr 7, 2026 |
| GPT-5.1 | OpenAI | 12.5% | — | Official | Nov 12, 2025 |
| GPT-5 | OpenAI | 12.5% | — | Official | Aug 7, 2025 |
| Qwen 3.6 Plus | Alibaba / Qwen | 8.3% | — | Official | Apr 2, 2026 |
| Claude Sonnet 4.6 | Anthropic | 8.3% | — | Official | Feb 17, 2026 |
| GPT-5 mini | OpenAI | 6.3% | — | Official | Aug 7, 2025 |
| o4-mini | OpenAI | 6.3% | — | Official | Apr 16, 2025 |
| Kimi K2.5 | Moonshot AI | 4.2% | — | Official | Jan 27, 2026 |
| Qwen 3.6 Max Preview | Alibaba / Qwen | 4.2% | — | Official | Apr 20, 2026 |
| Gemini 2.5 Flash | Google DeepMind | 4.2% | — | Official | Apr 17, 2025 |
| Gemini 3 Flash | Google DeepMind | 4.2% | — | Official | Dec 17, 2025 |
| Claude Opus 4.5 | Anthropic | 4.2% | — | Official | Nov 24, 2025 |
| Claude Sonnet 4.5 | Anthropic | 4.2% | — | Official | Sep 29, 2025 |
| Claude Opus 4.1 | Anthropic | 4.2% | — | Official | Aug 5, 2025 |
| Gemini 2.5 Pro | Google DeepMind | 4.2% | — | Official | Mar 25, 2025 |
| Claude Opus 4 | Anthropic | 4.2% | — | Official | May 22, 2025 |
| GLM-4.6 | Z.ai | 2.1% | — | Official | Sep 30, 2025 |
| GLM-5 | Z.ai | 2.1% | — | Official | Feb 11, 2026 |
| DeepSeek V3.2 | DeepSeek | 2.1% | — | Official | Dec 1, 2025 |
| Qwen3.5-397B-A17B | Alibaba / Qwen | 2.1% | — | Official | Feb 16, 2026 |
| Claude Haiku 4.5 | Anthropic | 2.1% | — | Official | Oct 15, 2025 |
| Grok 4 | xAI | 2.1% | — | Official | Jul 9, 2025 |
| o3 | OpenAI | 2.1% | — | Official | Apr 16, 2025 |
Each row reports the model's accuracy on FrontierMath Tier 4, that is, the share of problems for which its final answer exactly matched the reference answer. Rows are sorted by score, highest first.
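To make the accuracy figures concrete, here is a minimal Python sketch of how a count of correct answers maps to a listed percentage. The denominator of 48 is an assumption, not something this page states: every score in the table is consistent with a multiple of 1/48 rounded to one decimal place (2.1% is about 1/48, 39.6% about 19/48), whereas several are not reachable with a denominator of 50. The function names and the half-up rounding rule are likewise illustrative choices.

```python
from decimal import Decimal, ROUND_HALF_UP

# Assumption: 48 graded problems, inferred from the score granularity in the
# table. The page itself only describes a set of 50 problems.
ASSUMED_GRADED_PROBLEMS = 48


def accuracy_percent(correct: int, total: int = ASSUMED_GRADED_PROBLEMS) -> Decimal:
    """Exact-answer accuracy as a percentage, rounded half-up to one decimal."""
    if total <= 0 or not 0 <= correct <= total:
        raise ValueError("need total > 0 and 0 <= correct <= total")
    pct = Decimal(100 * correct) / Decimal(total)
    return pct.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


def implied_correct(percent: str, total: int = ASSUMED_GRADED_PROBLEMS) -> list[int]:
    """All correct-answer counts that would display as the given percentage."""
    target = Decimal(percent)
    return [k for k in range(total + 1) if accuracy_percent(k, total) == target]


if __name__ == "__main__":
    # Check a few scores from the table against both candidate denominators.
    for score in ["39.6", "22.9", "6.3", "2.1"]:
        print(score, implied_correct(score, 48), implied_correct(score, 50))
```

Under the 48-problem assumption, adjacent scores differ by roughly 2.1 percentage points per problem, so small gaps between rows can come down to a single problem.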