Artificial Analysis Intelligence Index
AAII v4.0 is a composite intelligence score that aggregates a model's performance on 10 challenging evaluations into a single index on a roughly 0–100 scale. The evaluations span reasoning, knowledge, coding, agentic tasks, and instruction-following: GDPval-AA, τ²-Bench Telecom, Terminal-Bench Hard, SciCode, AA-LCR, AA-Omniscience, IFBench, Humanity's Last Exam, GPQA Diamond, and CritPt.
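The description does not say how the ten evaluation scores are weighted when they are combined. The sketch below assumes, purely for illustration, an equal-weighted arithmetic mean of per-evaluation scores that are already on a common 0–100 scale. The function name `composite_index` and the example scores are hypothetical, and this is not Artificial Analysis's published methodology.

```python
from statistics import fmean

# The ten evaluations named in the AAII v4.0 description.
EVALUATIONS = (
    "GDPval-AA",
    "τ²-Bench Telecom",
    "Terminal-Bench Hard",
    "SciCode",
    "AA-LCR",
    "AA-Omniscience",
    "IFBench",
    "Humanity's Last Exam",
    "GPQA Diamond",
    "CritPt",
)


def composite_index(scores: dict[str, float]) -> float:
    """Hypothetical aggregation: equal-weighted mean of per-evaluation scores.

    ASSUMPTION: every evaluation carries the same weight and each score is
    already on a common 0-100 scale. This illustrates the idea of a composite
    index; it is not Artificial Analysis's actual weighting scheme.
    """
    missing = [name for name in EVALUATIONS if name not in scores]
    if missing:
        raise ValueError(f"missing scores for: {', '.join(missing)}")
    # Average only the ten named evaluations; any extra keys are ignored.
    return round(fmean(scores[name] for name in EVALUATIONS), 1)


# Hypothetical per-evaluation scores, for illustration only.
example = dict(zip(EVALUATIONS, [70, 60, 40, 45, 65, 30, 55, 25, 85, 48]))
print(composite_index(example))  # 52.3
```

Under a different (for example, category-based) weighting, the same inputs would produce a different index, so composite scores are only comparable when they are computed with the same index version.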
Category: Reasoning. Metric: Index (higher is better).
| Model | Lab | Score | Source model | Status | Date |
|---|---|---|---|---|---|
| Claude Opus 4.8 | Anthropic | 61 | — | Official | May 28, 2026 |
| GPT-5.5 | OpenAI | 60 | — | Official | Apr 23, 2026 |
| Claude Opus 4.7 | Anthropic | 57 | — | Official | Apr 16, 2026 |
| Gemini 3.1 Pro Preview | Google DeepMind | 57 | — | Official | Feb 19, 2026 |
| GPT-5.4 | OpenAI | 56.8 | — | Unverified | Mar 5, 2026 |
| Qwen3.7 Max Preview | Alibaba / Qwen | 56.6 | — | Unverified | May 14, 2026 |
| Gemini 3.5 Flash | Google DeepMind | 55.3 | — | Unverified | May 19, 2026 |
| Kimi K2.6 | Moonshot AI | 53.9 | — | Unverified | Apr 20, 2026 |
| MiMo-V2.5-Pro | Xiaomi | 53.8 | — | Unverified | Apr 22, 2026 |
| GPT-5.3-Codex | OpenAI | 53.6 | — | Unverified | Feb 5, 2026 |
| Grok 4.3 | xAI | 53.2 | — | Unverified | Apr 17, 2026 |
| Claude Opus 4.6 | Anthropic | 53 | — | Unverified | Feb 5, 2026 |
| Muse Spark | Meta | 52.1 | — | Unverified | Apr 8, 2026 |
| Qwen 3.6 Max Preview | Alibaba / Qwen | 51.8 | — | Unverified | Apr 20, 2026 |
| DeepSeek V4 Pro | DeepSeek | 51.5 | — | Unverified | Apr 24, 2026 |
| GLM-5.1 | Z.ai | 51.4 | — | Unverified | Apr 7, 2026 |
| GPT-5.2 | OpenAI | 51.3 | — | Unverified | Dec 11, 2025 |
| Qwen 3.6 Plus | Alibaba / Qwen | 50 | — | Unverified | Apr 2, 2026 |
| GLM-5 | Z.ai | 49.8 | — | Unverified | Feb 11, 2026 |
| Claude Opus 4.5 | Anthropic | 49.7 | — | Unverified | Nov 24, 2025 |
| MiniMax M2.7 | MiniMax | 49.6 | — | Unverified | Mar 18, 2026 |
| GPT-5.2-Codex | OpenAI | 49 | — | Unverified | Dec 18, 2025 |
| Gemini 3 Pro | Google DeepMind | 48.4 | — | Unverified | Nov 18, 2025 |
| GPT-5.1 | OpenAI | 47.7 | — | Unverified | Nov 12, 2025 |
| Kimi K2.5 | Moonshot AI | 46.8 | — | Unverified | Jan 27, 2026 |
| DeepSeek V4 Flash | DeepSeek | 46.5 | — | Unverified | Apr 24, 2026 |
| GPT-5 | OpenAI | 44.6 | — | Unverified | Aug 7, 2025 |
| Claude Sonnet 4.6 | Anthropic | 44.4 | — | Unverified | Feb 17, 2026 |
| Step 3.7 Flash | StepFun | 42.6 | — | Unverified | May 29, 2026 |
| GLM-4.7 | Z.ai | 42.1 | — | Unverified | Dec 22, 2025 |
| Claude Opus 4.1 | Anthropic | 42 | — | Unverified | Aug 5, 2025 |
| Grok 4 | xAI | 41.5 | — | Unverified | Jul 9, 2025 |
| OpenAI o3-pro | OpenAI | 40.7 | — | Unverified | Jun 10, 2025 |
| Qwen3.5-397B-A17B | Alibaba / Qwen | 40.1 | — | Unverified | Feb 16, 2026 |
| Mistral Medium 3.5 | Mistral AI | 39.2 | — | Unverified | Apr 28, 2026 |
| Grok 4.1 fast reasoning | xAI | 38.6 | — | Unverified | Nov 19, 2025 |
| o3 | OpenAI | 38.4 | — | Unverified | Apr 16, 2025 |
| Gemini 3 Flash | Google DeepMind | 35 | — | Unverified | Dec 17, 2025 |
| Gemini 2.5 Pro | Google DeepMind | 34.6 | — | Unverified | Mar 25, 2025 |
| GPT-OSS-120B | OpenAI | 33.3 | — | Unverified | Aug 5, 2025 |
| Claude Sonnet 4 | Anthropic | 33 | — | Unverified | May 22, 2025 |
| DeepSeek V3.2 | DeepSeek | 32 | — | Official | Dec 1, 2025 |
| Qwen3 Max | Alibaba / Qwen | 31.4 | — | Unverified | Sep 5, 2025 |
| GLM-4.6 | Z.ai | 30.2 | — | Unverified | Sep 30, 2025 |
| DeepSeek V3.1 | DeepSeek | 28.1 | — | Unverified | Aug 21, 2025 |
| DeepSeek R1 | DeepSeek | 27.1 | — | Unverified | Jan 20, 2025 |
| GPT-4.1 | OpenAI | 26.3 | — | Unverified | Apr 14, 2025 |
| Kimi K2 Instruct | Moonshot AI | 26.3 | — | Unverified | Jul 11, 2025 |
| Gemini 2.5 Flash | Google DeepMind | 20.6 | — | Unverified | Apr 17, 2025 |
| Llama 4 Maverick | Meta | 18.4 | — | Unverified | Apr 5, 2025 |
| Llama 3.1 405B | Meta | 17.4 | — | Unverified | Jul 23, 2024 |
| GPT-4o | OpenAI | 17.3 | — | Unverified | May 13, 2024 |
| DeepSeek V3 | DeepSeek | 16.5 | — | Unverified | Dec 26, 2024 |
| Gemini 1.5 Pro | Google DeepMind | 16 | — | Unverified | Feb 15, 2024 |
| Solar Pro 2 | Upstage | 13.6 | — | Unverified | Jul 10, 2025 |
| Llama 4 Scout | Meta | 13.5 | — | Unverified | Apr 5, 2025 |
Each row reports the model's score on the Artificial Analysis Intelligence Index.
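The table is sorted by score but has no rank column, and some entries tie: Claude Opus 4.7 and Gemini 3.1 Pro Preview both show 57, and GPT-4.1 and Kimi K2 Instruct both show 26.3. The sketch below assigns standard competition ranks ("1224" style) to a few rows copied from the table. The ranking convention and the helper name `competition_ranks` are our own illustrative choices, not something the leaderboard defines. Ties are judged on the displayed values only, and scores appear with mixed precision (57 vs 56.8), so a displayed tie may not be a tie in the underlying data.

```python
# A few (model, score) rows copied from the leaderboard table.
ROWS = [
    ("Claude Opus 4.8", 61.0),
    ("GPT-5.5", 60.0),
    ("Claude Opus 4.7", 57.0),
    ("Gemini 3.1 Pro Preview", 57.0),
    ("GPT-5.4", 56.8),
]


def competition_ranks(rows):
    """Return (rank, model, score) tuples using standard competition ranking.

    Tied scores share a rank, and the next distinct score skips ahead by the
    number of tied entries (1, 2, 3, 3, 5, ...). Python's sort is stable, so
    tied rows keep their input order.
    """
    ordered = sorted(rows, key=lambda row: row[1], reverse=True)
    ranked = []
    for position, (model, score) in enumerate(ordered, start=1):
        if ranked and score == ranked[-1][2]:
            rank = ranked[-1][0]  # same score as previous row: reuse its rank
        else:
            rank = position       # new score: rank equals 1-based position
        ranked.append((rank, model, score))
    return ranked


for rank, model, score in competition_ranks(ROWS):
    print(rank, model, score)
```

Dense ranking (1, 2, 3, 3, 4) is an equally valid alternative; competition ranking is used here because it keeps each rank equal to the number of models scoring at least as high.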