FrontierMath
A frontier math benchmark with constrained public access and source-linked result claims.
Category: Reasoning. Metric: accuracy (higher is better).
| Model | Lab | Score (accuracy) | Source model | Status | Date |
|---|---|---|---|---|---|
| GPT-5.5 Pro | OpenAI | 52.4% | GPT-5.5 Pro | Official | May 30, 2026 |
| GPT-5.5 | OpenAI | 51.7% | GPT-5.5 | Official | May 30, 2026 |
| GPT-5.4 Pro | OpenAI | 50.0% | GPT-5.4 Pro | Official | May 30, 2026 |
| GPT-5.4 | OpenAI | 47.6% | GPT-5.4 | Official | May 30, 2026 |
| Claude Opus 4.7 | Anthropic | 43.79% | Claude Opus 4.7 | Official | May 30, 2026 |
| Claude Opus 4.6 | Anthropic | 40.7% | Claude Opus 4.6 | Official | May 30, 2026 |
| GPT-5.2 | OpenAI | 40.7% | GPT-5.2 | Official | May 30, 2026 |
| Gemini 3.5 Flash | Google DeepMind | 38.97% | Gemini 3.5 Flash | Official | May 30, 2026 |
| Kimi K2.6 | Moonshot AI | 38.97% | Kimi K2.6 | Official | May 30, 2026 |
| Gemini 3 Pro | Google DeepMind | 37.6% | Gemini 3 Pro | Official | May 30, 2026 |
| Gemini 3.1 Pro Preview | Google DeepMind | 36.9% | Gemini 3.1 Pro | Official | May 30, 2026 |
| Gemini 3 Flash | Google DeepMind | 35.64% | Gemini 3 Flash | Official | May 30, 2026 |
| GLM-5.1 | Z.ai | 33.45% | GLM-5.1 | Official | May 30, 2026 |
| GPT-5 | OpenAI | 32.41% | GPT-5 | Official | May 30, 2026 |
| Claude Sonnet 4.6 | Anthropic | 32.4% | Claude Sonnet 4.6 | Official | May 30, 2026 |
| GPT-5.1 | OpenAI | 31.03% | GPT-5.1 | Official | May 30, 2026 |
| Kimi K2.5 | Moonshot AI | 27.9% | Kimi K2.5 | Official | May 30, 2026 |
| o4-mini | OpenAI | 24.83% | o4-mini | Official | May 30, 2026 |
| DeepSeek V3.2 | DeepSeek | 22.1% | DeepSeek-V3.2 | Official | May 30, 2026 |
| Claude Opus 4.5 | Anthropic | 20.69% | Claude Opus 4.5 | Official | May 30, 2026 |
| Grok 4 | xAI | 19.66% | Grok 4 | Official | May 30, 2026 |
| o3 | OpenAI | 18.69% | o3 | Official | May 30, 2026 |
Each row reports the model’s accuracy on FrontierMath.
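Several rows in the table share the same score (for example 40.7% and 38.97%), so a reader who ranks these results may want tied models to share a rank. The following is a minimal sketch of one way to do that, not part of the leaderboard itself. It uses a handful of rows copied from the table; the names `ROWS`, `parse_score`, and `rank_rows` are hypothetical, and the choice of "competition" ranking (1, 2, 2, 4) is an assumption rather than the site's stated method.

```python
from decimal import Decimal

# A few rows copied from the table above: (model, lab, score string).
ROWS = [
    ("GPT-5.5 Pro", "OpenAI", "52.4%"),
    ("Claude Opus 4.6", "Anthropic", "40.7%"),
    ("GPT-5.2", "OpenAI", "40.7%"),
    ("Gemini 3.5 Flash", "Google DeepMind", "38.97%"),
    ("Kimi K2.6", "Moonshot AI", "38.97%"),
    ("o3", "OpenAI", "18.69%"),
]


def parse_score(text):
    """Convert a score string such as '40.7%' to an exact Decimal."""
    return Decimal(text.rstrip("%"))


def rank_rows(rows):
    """Sort by score, highest first, and give tied scores the same rank."""
    # sorted() is stable, so tied rows keep their original relative order.
    ordered = sorted(rows, key=lambda r: parse_score(r[2]), reverse=True)
    ranked = []
    prev_score, prev_rank = None, 0
    for position, (model, _lab, score_text) in enumerate(ordered, start=1):
        score = parse_score(score_text)
        # A tie reuses the previous rank; otherwise the rank is the position.
        rank = prev_rank if score == prev_score else position
        ranked.append((rank, model, score))
        prev_score, prev_rank = score, rank
    return ranked


if __name__ == "__main__":
    for rank, model, score in rank_rows(ROWS):
        print(f"{rank:>2}  {model:<18} {score}%")
```

`Decimal` is used instead of `float` so that scores reported with different precision (such as 32.41% and 32.4%) compare exactly as written.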