evals.report

Artificial Analysis Intelligence Index

A composite intelligence score (AAII v4.0) that aggregates a model's performance across 10 challenging evaluations into a single index on a roughly 0–100 scale. The evaluations span reasoning, knowledge, coding, agentic tasks, and instruction-following: GDPval-AA, τ²-Bench Telecom, Terminal-Bench Hard, SciCode, AA-LCR, AA-Omniscience, IFBench, Humanity's Last Exam, GPQA Diamond, and CritPt.
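As a rough illustration of what "aggregating into a single index" can mean, the sketch below averages ten per-evaluation scores with equal weights. The equal weighting is an assumption made for illustration only; this page does not state how AAII v4.0 actually weights its components, and the function name `composite_index` and the example scores are hypothetical.

```python
from statistics import mean

# The ten component evaluations named in the description.
COMPONENTS = [
    "GDPval-AA", "τ²-Bench Telecom", "Terminal-Bench Hard", "SciCode", "AA-LCR",
    "AA-Omniscience", "IFBench", "Humanity's Last Exam", "GPQA Diamond", "CritPt",
]


def composite_index(scores: dict[str, float]) -> float:
    """Return the unweighted mean of per-evaluation scores, rounded to 1 decimal.

    ASSUMPTION: every component is on a 0-100 scale and carries equal weight.
    The real AAII v4.0 weighting is not given on this page.
    """
    missing = [name for name in COMPONENTS if name not in scores]
    if missing:
        raise ValueError(f"missing component scores: {missing}")
    return round(mean(scores[name] for name in COMPONENTS), 1)


# Made-up per-evaluation scores, for illustration only (not real results).
example = {name: 50.0 for name in COMPONENTS}
example["GPQA Diamond"] = 90.0
print(composite_index(example))  # 54.0
```

Under this assumption a single strong component moves the index only modestly, which is one reason a composite can differ noticeably from any individual benchmark score.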

Reasoning · Index · Higher is better
| Model | Lab | Score | Status | Date |
| --- | --- | --- | --- | --- |
| Claude Opus 4.8 | Anthropic | 61 | Official | May 28, 2026 |
| GPT-5.5 | OpenAI | 60 | Official | Apr 23, 2026 |
| Claude Opus 4.7 | Anthropic | 57 | Official | Apr 16, 2026 |
| Gemini 3.1 Pro Preview | Google DeepMind | 57 | Official | Feb 19, 2026 |
| GPT-5.4 | OpenAI | 56.8 | Unverified | Mar 5, 2026 |
| Qwen3.7 Max Preview | Alibaba / Qwen | 56.6 | Unverified | May 14, 2026 |
| Gemini 3.5 Flash | Google DeepMind | 55.3 | Unverified | May 19, 2026 |
| Kimi K2.6 | Moonshot AI | 53.9 | Unverified | Apr 20, 2026 |
| MiMo-V2.5-Pro | Xiaomi | 53.8 | Unverified | Apr 22, 2026 |
| GPT-5.3-Codex | OpenAI | 53.6 | Unverified | Feb 5, 2026 |
| Grok 4.3 | xAI | 53.2 | Unverified | Apr 17, 2026 |
| Claude Opus 4.6 | Anthropic | 53 | Unverified | Feb 5, 2026 |
| Muse Spark | Meta | 52.1 | Unverified | Apr 8, 2026 |
| Qwen 3.6 Max Preview | Alibaba / Qwen | 51.8 | Unverified | Apr 20, 2026 |
| DeepSeek V4 Pro | DeepSeek | 51.5 | Unverified | Apr 24, 2026 |
| GLM-5.1 | Z.ai | 51.4 | Unverified | Apr 7, 2026 |
| GPT-5.2 | OpenAI | 51.3 | Unverified | Dec 11, 2025 |
| Qwen 3.6 Plus | Alibaba / Qwen | 50 | Unverified | Apr 2, 2026 |
| GLM-5 | Z.ai | 49.8 | Unverified | Feb 11, 2026 |
| Claude Opus 4.5 | Anthropic | 49.7 | Unverified | Nov 24, 2025 |
| MiniMax M2.7 | MiniMax | 49.6 | Unverified | Mar 18, 2026 |
| GPT-5.2-Codex | OpenAI | 49 | Unverified | Dec 18, 2025 |
| Gemini 3 Pro | Google DeepMind | 48.4 | Unverified | Nov 18, 2025 |
| GPT-5.1 | OpenAI | 47.7 | Unverified | Nov 12, 2025 |
| Kimi K2.5 | Moonshot AI | 46.8 | Unverified | Jan 27, 2026 |
| DeepSeek V4 Flash | DeepSeek | 46.5 | Unverified | Apr 24, 2026 |
| GPT-5 | OpenAI | 44.6 | Unverified | Aug 7, 2025 |
| Claude Sonnet 4.6 | Anthropic | 44.4 | Unverified | Feb 17, 2026 |
| Step 3.7 Flash | StepFun | 42.6 | Unverified | May 29, 2026 |
| GLM-4.7 | Z.ai | 42.1 | Unverified | Dec 22, 2025 |
| Claude Opus 4.1 | Anthropic | 42 | Unverified | Aug 5, 2025 |
| Grok 4 | xAI | 41.5 | Unverified | Jul 9, 2025 |
| OpenAI o3-pro | OpenAI | 40.7 | Unverified | Jun 10, 2025 |
| Qwen3.5-397B-A17B | Alibaba / Qwen | 40.1 | Unverified | Feb 16, 2026 |
| Mistral Medium 3.5 | Mistral AI | 39.2 | Unverified | Apr 28, 2026 |
| Grok 4.1 fast reasoning | xAI | 38.6 | Unverified | Nov 19, 2025 |
| o3 | OpenAI | 38.4 | Unverified | Apr 16, 2025 |
| Gemini 3 Flash | Google DeepMind | 35 | Unverified | Dec 17, 2025 |
| Gemini 2.5 Pro | Google DeepMind | 34.6 | Unverified | Mar 25, 2025 |
| GPT-OSS-120B | OpenAI | 33.3 | Unverified | Aug 5, 2025 |
| Claude Sonnet 4 | Anthropic | 33 | Unverified | May 22, 2025 |
| DeepSeek V3.2 | DeepSeek | 32 | Official | Dec 1, 2025 |
| Qwen3 Max | Alibaba / Qwen | 31.4 | Unverified | Sep 5, 2025 |
| GLM-4.6 | Z.ai | 30.2 | Unverified | Sep 30, 2025 |
| DeepSeek V3.1 | DeepSeek | 28.1 | Unverified | Aug 21, 2025 |
| DeepSeek R1 | DeepSeek | 27.1 | Unverified | Jan 20, 2025 |
| GPT-4.1 | OpenAI | 26.3 | Unverified | Apr 14, 2025 |
| Kimi K2 Instruct | Moonshot AI | 26.3 | Unverified | Jul 11, 2025 |
| Gemini 2.5 Flash | Google DeepMind | 20.6 | Unverified | Apr 17, 2025 |
| Llama 4 Maverick | Meta | 18.4 | Unverified | Apr 5, 2025 |
| Llama 3.1 405B | Meta | 17.4 | Unverified | Jul 23, 2024 |
| GPT-4o | OpenAI | 17.3 | Unverified | May 13, 2024 |
| DeepSeek V3 | DeepSeek | 16.5 | Unverified | Dec 26, 2024 |
| Gemini 1.5 Pro | Google DeepMind | 16 | Unverified | Feb 15, 2024 |
| Solar Pro 2 | Upstage | 13.6 | Unverified | Jul 10, 2025 |
| Llama 4 Scout | Meta | 13.5 | Unverified | Apr 5, 2025 |

Each row reports the model's Index score on the Artificial Analysis Intelligence Index.
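For readers who want to work with the leaderboard programmatically, here is a minimal sketch of one way to represent rows and pick the top-scoring model per lab, optionally restricted by status. The `Row` type and `best_per_lab` helper are hypothetical names introduced for illustration, not part of any evals.report API; the four sample rows are copied from the leaderboard above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Row:
    model: str
    lab: str
    score: float
    status: str  # "Official" or "Unverified", as shown on the leaderboard
    date: str


# A few rows copied from the leaderboard, not the full list.
ROWS = [
    Row("Claude Opus 4.8", "Anthropic", 61, "Official", "May 28, 2026"),
    Row("GPT-5.5", "OpenAI", 60, "Official", "Apr 23, 2026"),
    Row("GPT-5.4", "OpenAI", 56.8, "Unverified", "Mar 5, 2026"),
    Row("DeepSeek V3.2", "DeepSeek", 32, "Official", "Dec 1, 2025"),
]


def best_per_lab(rows: list[Row], status: Optional[str] = None) -> dict[str, Row]:
    """Map each lab to its highest-scoring row, optionally filtered by status."""
    best: dict[str, Row] = {}
    for row in rows:
        if status is not None and row.status != status:
            continue
        current = best.get(row.lab)
        if current is None or row.score > current.score:
            best[row.lab] = row
    return best


for lab, row in best_per_lab(ROWS, status="Official").items():
    print(f"{lab}: {row.model} ({row.score})")
```

Filtering on `status` keeps Official and Unverified scores from being mixed when comparing labs.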