BenchmarksReasoning
Epoch Capabilities Index
A composite capability index from Epoch AI that statistically stitches together scores from 40+ benchmarks (using an Item-Response-Theory-style model) into a single saturation-resistant general-capability scale, calibrated so Claude 3.5 Sonnet=130 and GPT-5=150.
ReasoningIndexHigher is better
| Model | Lab | Score↓ | Source model | Status | Date | |
|---|---|---|---|---|---|---|
| GPT-5.5 Pro | OpenAI | 159.3 | — | Official | Apr 23, 2026 | Details |
| GPT-5.5 | OpenAI | 158.2 | — | Official | Apr 23, 2026 | Details |
| GPT-5.4 Pro | OpenAI | 157.7 | — | Official | Mar 5, 2026 | Details |
| Gemini 3.1 Pro Preview | Google DeepMind | 156.6 | — | Official | Feb 19, 2026 | Details |
| Gemini 3.5 Flash | Google DeepMind | 156.3 | — | Official | May 19, 2026 | Details |
| Claude Opus 4.7 | Anthropic | 156.2 | — | Official | Apr 16, 2026 | Details |
| GPT-5.4 | OpenAI | 156.1 | — | Official | Mar 5, 2026 | Details |
| GPT-5.3-Codex | OpenAI | 155.9 | — | Official | Feb 5, 2026 | Details |
| Claude Opus 4.6 | Anthropic | 155.3 | — | Official | Feb 5, 2026 | Details |
| Muse Spark | Meta | 155.1 | — | Official | Apr 8, 2026 | Details |
| Grok 4.20 beta reasoning | xAI | 154.3 | — | Official | Mar 9, 2026 | Details |
| GPT-5.2 | OpenAI | 153.7 | — | Official | Dec 11, 2025 | Details |
| Gemini 3 Pro | Google DeepMind | 153.4 | — | Official | Nov 18, 2025 | Details |
| Claude Sonnet 4.6 | Anthropic | 152.6 | — | Official | Feb 17, 2026 | Details |
| Kimi K2.6 | Moonshot AI | 151.6 | — | Official | Apr 20, 2026 | Details |
| Gemini 3 Flash | Google DeepMind | 150.9 | — | Official | Dec 17, 2025 | Details |
| Qwen 3.6 Max Preview | Alibaba / Qwen | 150.2 | — | Official | Apr 20, 2026 | Details |
| GPT-5 | OpenAI | 150.0 | — | Official | Aug 7, 2025 | Details |
| GLM-5.1 | Z.ai | 149.9 | — | Official | Apr 7, 2026 | Details |
| Claude Opus 4.5 | Anthropic | 149.9 | — | Official | Nov 24, 2025 | Details |
| GPT-5.1 | OpenAI | 149.7 | — | Official | Nov 12, 2025 | Details |
| Qwen 3.6 Plus | Alibaba / Qwen | 149.1 | — | Official | Apr 2, 2026 | Details |
| Kimi K2.5 | Moonshot AI | 148.2 | — | Official | Jan 27, 2026 | Details |
| OpenAI o3-pro | OpenAI | 148.1 | — | Official | Jun 10, 2025 | Details |
| MiniMax M2.5 | MiniMax | 147.4 | — | Official | Feb 12, 2026 | Details |
| Grok 4 | xAI | 147.4 | — | Official | Jul 9, 2025 | Details |
| o3 | OpenAI | 147.3 | — | Official | Apr 16, 2025 | Details |
| Claude Sonnet 4.5 | Anthropic | 147.2 | — | Official | Sep 29, 2025 | Details |
| o4-mini | OpenAI | 146.9 | — | Official | Apr 16, 2025 | Details |
| Gemini 2.5 Pro | Google DeepMind | 146.7 | — | Official | Mar 25, 2025 | Details |
| GLM-5 | Z.ai | 146.6 | — | Official | Feb 11, 2026 | Details |
| DeepSeek V3.2 | DeepSeek | 146.5 | — | Official | Dec 1, 2025 | Details |
| Qwen3.5-397B-A17B | Alibaba / Qwen | 146.1 | — | Official | Feb 16, 2026 | Details |
| Kimi K2 Thinking | Moonshot AI | 145.6 | — | Official | Nov 6, 2025 | Details |
| GPT-5 mini | OpenAI | 145.6 | — | Official | Aug 7, 2025 | Details |
| Qwen3 Max | Alibaba / Qwen | 144.9 | — | Official | Sep 5, 2025 | Details |
| Claude Opus 4.1 | Anthropic | 144.7 | — | Official | Aug 5, 2025 | Details |
| GLM-4.7 | Z.ai | 144.6 | — | Official | Dec 22, 2025 | Details |
| Gemini 2.5 Flash | Google DeepMind | 143.4 | — | Official | Apr 17, 2025 | Details |
| Claude Opus 4 | Anthropic | 143.4 | — | Official | May 22, 2025 | Details |
| Claude Haiku 4.5 | Anthropic | 143.1 | — | Official | Oct 15, 2025 | Details |
| Claude Sonnet 4 | Anthropic | 142.6 | — | Official | May 22, 2025 | Details |
| DeepSeek R1 | DeepSeek | 142.2 | — | Official | Jan 20, 2025 | Details |
| Claude 3.7 Sonnet | Anthropic | 142.0 | — | Official | Feb 24, 2025 | Details |
| GLM-4.6 | Z.ai | 141.4 | — | Official | Sep 30, 2025 | Details |
| Kimi K2 Instruct | Moonshot AI | 141.2 | — | Official | Jul 11, 2025 | Details |
| GPT-OSS-120B | OpenAI | 140.8 | — | Official | Aug 5, 2025 | Details |
| Qwen3 235B A22B Instruct 2507 | Alibaba / Qwen | 139.1 | — | Official | Jul 21, 2025 | Details |
| DeepSeek V3.1 | DeepSeek | 138.9 | — | Official | Aug 21, 2025 | Details |
| GPT-4.1 | OpenAI | 137.6 | — | Official | Apr 14, 2025 | Details |
| DeepSeek V3 0324 | DeepSeek | 137.2 | — | Official | Mar 24, 2025 | Details |
| Gemini 2.0 Flash | Google DeepMind | 135.9 | — | Official | Dec 11, 2024 | Details |
| Claude 3.5 Sonnet | Anthropic | 134.4 | — | Official | Jun 20, 2024 | Details |
| Llama 4 Maverick | Meta | 133.1 | — | Official | Apr 5, 2025 | Details |
| DeepSeek V3 | DeepSeek | 133.1 | — | Official | Dec 26, 2024 | Details |
| Gemini 1.5 Pro | Google DeepMind | 132.8 | — | Official | Feb 15, 2024 | Details |
| Llama 4 Scout | Meta | 130.6 | — | Official | Apr 5, 2025 | Details |
| GPT-4o | OpenAI | 129.4 | — | Official | May 13, 2024 | Details |
| Llama 3.1 405B | Meta | 129.1 | — | Official | Jul 23, 2024 | Details |
| Mistral Large | Mistral AI | 121.0 | — | Official | Feb 26, 2024 | Details |
Each row reports the model’s Index on Epoch Capabilities Index. Click a row for the full run context.