GDPval
GDPval evaluates AI models agentically, with shell and web access provided through a sandbox harness, on real-world, economically valuable knowledge-work deliverables (documents, spreadsheets, slides, diagrams). The tasks span 44 occupations across 9 major U.S. GDP industries, and outputs are scored by blind pairwise quality comparison. The Artificial Analysis GDPval-AA variant reports results as an Elo rating.
Category: Agents · Metric: Elo · Higher is better
| Model | Lab | Elo score | Source model | Status | Date |
|---|---|---|---|---|---|
| Claude Opus 4.8 | Anthropic | 1890 | — | Official | May 28, 2026 |
| GPT-5.5 | OpenAI | 1769 | — | Official | Apr 23, 2026 |
| Claude Opus 4.7 | Anthropic | 1753 | — | Official | Apr 16, 2026 |
| Claude Sonnet 4.6 | Anthropic | 1676 | — | Official | Feb 17, 2026 |
| GPT-5.4 | OpenAI | 1674 | — | Official | Mar 5, 2026 |
| Gemini 3.5 Flash | Google DeepMind | 1659 | — | Official | May 19, 2026 |
| Claude Opus 4.6 | Anthropic | 1619 | — | Official | Feb 5, 2026 |
| MiMo-V2.5-Pro | Xiaomi | 1571 | — | Official | Apr 22, 2026 |
| DeepSeek V4 Pro | DeepSeek | 1558 | — | Official | Apr 24, 2026 |
| MiMo-V2.5 | Xiaomi | 1551 | — | Official | Apr 22, 2026 |
| GLM-5.1 | Z.ai | 1535 | — | Official | Apr 7, 2026 |
| MiniMax M2.7 | MiniMax | 1505 | — | Official | Mar 18, 2026 |
| Qwen 3.6 Max Preview | Alibaba / Qwen | 1504 | — | Official | Apr 20, 2026 |
| Grok 4.3 | xAI | 1495 | — | Official | Apr 17, 2026 |
| GPT-5.3-Codex | OpenAI | 1482 | — | Official | Feb 5, 2026 |
| Kimi K2.6 | Moonshot AI | 1481 | — | Official | Apr 20, 2026 |
| GPT-5.2 | OpenAI | 1467 | — | Official | Dec 11, 2025 |
| Claude Opus 4.5 | Anthropic | 1452 | — | Official | Nov 24, 2025 |
| Muse Spark | Meta | 1417 | — | Official | Apr 8, 2026 |
| DeepSeek V4 Flash | DeepSeek | 1414 | — | Official | Apr 24, 2026 |
| GLM-5 | Z.ai | 1395 | — | Official | Feb 11, 2026 |
| Qwen 3.6 Plus | Alibaba / Qwen | 1354 | — | Official | Apr 2, 2026 |
| Gemini 3 Deep Think | Google DeepMind | 1324 | — | Official | Dec 4, 2025 |
| Claude Sonnet 4.5 | Anthropic | 1317 | — | Official | Sep 29, 2025 |
| Gemini 3.1 Pro Preview | Google DeepMind | 1314 | — | Official | Feb 19, 2026 |
| Step 3.7 Flash | StepFun | 1298 | — | Official | May 29, 2026 |
| GPT-5 | OpenAI | 1294 | — | Official | Aug 7, 2025 |
| GPT-5.2-Codex | OpenAI | 1288 | — | Official | Dec 18, 2025 |
| Kimi K2.5 | Moonshot AI | 1285 | — | Official | Jan 27, 2026 |
| GPT-5.1 | OpenAI | 1227 | — | Official | Nov 12, 2025 |
| Qwen3.5-397B-A17B | Alibaba / Qwen | 1220 | — | Official | Feb 16, 2026 |
| Gemini 3 Flash | Google DeepMind | 1204 | — | Official | Dec 17, 2025 |
| DeepSeek V3.2 | DeepSeek | 1197 | — | Official | Dec 1, 2025 |
| GLM-4.7 | Z.ai | 1185 | — | Official | Dec 22, 2025 |
| GPT-5 mini | OpenAI | 1184 | — | Official | Aug 7, 2025 |
| Gemini 3 Pro | Google DeepMind | 1184 | — | Official | Nov 18, 2025 |
| MiniMax M2.5 | MiniMax | 1176 | — | Official | Feb 12, 2026 |
| Claude Haiku 4.5 | Anthropic | 1171 | — | Official | Oct 15, 2025 |
| Mistral Medium 3.5 | Mistral AI | 1168 | — | Official | Apr 28, 2026 |
| Claude Sonnet 4 | Anthropic | 1133 | — | Official | May 22, 2025 |
| MiniMax M2.1 | MiniMax | 1091 | — | Official | Dec 23, 2025 |
| DeepSeek V3.1 | DeepSeek | 1080 | — | Official | Aug 21, 2025 |
| Gemini 2.5 Flash | Google DeepMind | 1071 | — | Official | Apr 17, 2025 |
| Claude 3.7 Sonnet | Anthropic | 1048 | — | Official | Feb 24, 2025 |
| Grok 4.1 fast reasoning | xAI | 1046 | — | Official | Nov 19, 2025 |
| Qwen3 Max | Alibaba / Qwen | 1038 | — | Official | Sep 5, 2025 |
| GLM-4.6 | Z.ai | 1029 | — | Official | Sep 30, 2025 |
| o4-mini | OpenAI | 1008 | — | Official | Apr 16, 2025 |
| Kimi K2 Thinking | Moonshot AI | 992 | — | Official | Nov 6, 2025 |
| Grok 4 | xAI | 989 | — | Official | Jul 9, 2025 |
| GPT-OSS-120B | OpenAI | 947 | — | Official | Aug 5, 2025 |
| Gemini 2.5 Pro | Google DeepMind | 919 | — | Official | Mar 25, 2025 |
| Mistral Large | Mistral AI | 864 | — | Official | Feb 26, 2024 |
| K-EXAONE | LG AI Research | 825 | — | Official | Jan 12, 2026 |
| Qwen3 235B A22B Instruct 2507 | Alibaba / Qwen | 778 | — | Official | Jul 21, 2025 |
| GPT-4.1 | OpenAI | 776 | — | Official | Apr 14, 2025 |
| o3 | OpenAI | 753 | — | Official | Apr 16, 2025 |
| Gemini 2.0 Flash | Google DeepMind | 566 | — | Official | Dec 11, 2024 |
| Qwen 3 Coder 480B | Alibaba / Qwen | 506 | — | Official | Jul 22, 2025 |
| Solar Pro 2 | Upstage | 449 | — | Official | Jul 10, 2025 |
| Llama 4 Maverick | Meta | 435 | — | Official | Apr 5, 2025 |
| DeepSeek V3 0324 | DeepSeek | 407 | — | Official | Mar 24, 2025 |
| GPT-4o | OpenAI | 378 | — | Official | May 13, 2024 |
| Jamba 1.7 Large | AI21 Labs | 282 | — | Official | Jul 3, 2025 |
| Llama 4 Scout | Meta | 270 | — | Official | Apr 5, 2025 |
| Llama 3.1 405B | Meta | 255 | — | Official | Jul 23, 2024 |
| DeepSeek R1 | DeepSeek | 248 | — | Official | Jan 20, 2025 |
Each row reports the model’s Elo on GDPval.
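To give a rough sense of what a gap between two Elo ratings in the table might mean, the sketch below converts a rating difference into an expected pairwise preference rate. It assumes the conventional Elo logistic curve with a 400-point scale. That is an assumption, since this page does not state how GDPval-AA fits or scales its ratings, and the function name `expected_win_rate` is hypothetical. The ratings are copied from the table above.

```python
def expected_win_rate(elo_a: float, elo_b: float, scale: float = 400.0) -> float:
    """Expected rate at which A's deliverable is preferred over B's.

    Uses the conventional Elo logistic formula. The 400-point scale is an
    assumption; the leaderboard's actual fitting procedure is not given here.
    """
    return 1.0 / (1.0 + 10.0 ** ((elo_b - elo_a) / scale))


# Ratings copied from the leaderboard table.
ratings = {
    "Claude Opus 4.8": 1890,
    "GPT-5.5": 1769,
    "GPT-5 mini": 1184,
    "Gemini 3 Pro": 1184,
}

pairs = [
    ("Claude Opus 4.8", "GPT-5.5"),   # 121-point gap
    ("GPT-5 mini", "Gemini 3 Pro"),   # tied ratings, so exactly 50%
]

for a, b in pairs:
    p = expected_win_rate(ratings[a], ratings[b])
    print(f"{a} vs {b}: {p:.1%} expected preference for {a}")
```

Under this assumed model, the 121-point lead at the top of the table corresponds to the higher-rated model being preferred in roughly two out of three blind comparisons, while tied ratings imply an even split.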