evals.report

Epoch Capabilities Index

A composite capability index from Epoch AI. It statistically stitches together scores from 40+ benchmarks, using an Item-Response-Theory-style model, into a single saturation-resistant general-capability scale. The scale is calibrated so that Claude 3.5 Sonnet = 130 and GPT-5 = 150.
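The two ideas in this description, an IRT-style link between a model's latent ability and its benchmark scores, and a two-anchor calibration of the final scale, can be illustrated with a minimal sketch. This is not Epoch AI's actual implementation: the logistic form, the function names `expected_score` and `calibrate`, the latent ability values, and "Model X" are all assumptions made up for illustration.

```python
import math
from typing import Dict


def expected_score(ability: float, difficulty: float, slope: float) -> float:
    """Logistic (2PL-style) curve: expected benchmark score in [0, 1].

    Assumed functional form for illustration only.
    """
    return 1.0 / (1.0 + math.exp(-slope * (ability - difficulty)))


def calibrate(latent: Dict[str, float], anchors: Dict[str, float]) -> Dict[str, float]:
    """Affine-rescale latent abilities so two anchor models land on fixed index values."""
    (name_a, target_a), (name_b, target_b) = anchors.items()
    scale = (target_b - target_a) / (latent[name_b] - latent[name_a])
    offset = target_a - scale * latent[name_a]
    return {name: scale * value + offset for name, value in latent.items()}


# Hypothetical latent abilities (made-up numbers, not fitted values).
latent = {"Claude 3.5 Sonnet": 0.0, "GPT-5": 2.0, "Model X": 1.0}

# Anchor values taken from the description above.
index = calibrate(latent, {"Claude 3.5 Sonnet": 130.0, "GPT-5": 150.0})

for name, value in index.items():
    print(f"{name}: {value:.1f}")
```

In a setup like this, a single benchmark flattens out once a model's ability is well above its difficulty (the logistic curve approaches 1), while the latent ability estimated jointly across many benchmarks of differing difficulty can keep separating models. That is one plausible reading of "saturation-resistant".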

Reasoning · Index · Higher is better
| Model | Lab | Score | Status | Date |
| --- | --- | --- | --- | --- |
| GPT-5.5 Pro | OpenAI | 159.3 | Official | Apr 23, 2026 |
| GPT-5.5 | OpenAI | 158.2 | Official | Apr 23, 2026 |
| GPT-5.4 Pro | OpenAI | 157.7 | Official | Mar 5, 2026 |
| Gemini 3.1 Pro Preview | Google DeepMind | 156.6 | Official | Feb 19, 2026 |
| Gemini 3.5 Flash | Google DeepMind | 156.3 | Official | May 19, 2026 |
| Claude Opus 4.7 | Anthropic | 156.2 | Official | Apr 16, 2026 |
| GPT-5.4 | OpenAI | 156.1 | Official | Mar 5, 2026 |
| GPT-5.3-Codex | OpenAI | 155.9 | Official | Feb 5, 2026 |
| Claude Opus 4.6 | Anthropic | 155.3 | Official | Feb 5, 2026 |
| Muse Spark | Meta | 155.1 | Official | Apr 8, 2026 |
| Grok 4.20 beta reasoning | xAI | 154.3 | Official | Mar 9, 2026 |
| GPT-5.2 | OpenAI | 153.7 | Official | Dec 11, 2025 |
| Gemini 3 Pro | Google DeepMind | 153.4 | Official | Nov 18, 2025 |
| Claude Sonnet 4.6 | Anthropic | 152.6 | Official | Feb 17, 2026 |
| Kimi K2.6 | Moonshot AI | 151.6 | Official | Apr 20, 2026 |
| Gemini 3 Flash | Google DeepMind | 150.9 | Official | Dec 17, 2025 |
| Qwen 3.6 Max Preview | Alibaba / Qwen | 150.2 | Official | Apr 20, 2026 |
| GPT-5 | OpenAI | 150.0 | Official | Aug 7, 2025 |
| GLM-5.1 | Z.ai | 149.9 | Official | Apr 7, 2026 |
| Claude Opus 4.5 | Anthropic | 149.9 | Official | Nov 24, 2025 |
| GPT-5.1 | OpenAI | 149.7 | Official | Nov 12, 2025 |
| Qwen 3.6 Plus | Alibaba / Qwen | 149.1 | Official | Apr 2, 2026 |
| Kimi K2.5 | Moonshot AI | 148.2 | Official | Jan 27, 2026 |
| OpenAI o3-pro | OpenAI | 148.1 | Official | Jun 10, 2025 |
| MiniMax M2.5 | MiniMax | 147.4 | Official | Feb 12, 2026 |
| Grok 4 | xAI | 147.4 | Official | Jul 9, 2025 |
| o3 | OpenAI | 147.3 | Official | Apr 16, 2025 |
| Claude Sonnet 4.5 | Anthropic | 147.2 | Official | Sep 29, 2025 |
| o4-mini | OpenAI | 146.9 | Official | Apr 16, 2025 |
| Gemini 2.5 Pro | Google DeepMind | 146.7 | Official | Mar 25, 2025 |
| GLM-5 | Z.ai | 146.6 | Official | Feb 11, 2026 |
| DeepSeek V3.2 | DeepSeek | 146.5 | Official | Dec 1, 2025 |
| Qwen3.5-397B-A17B | Alibaba / Qwen | 146.1 | Official | Feb 16, 2026 |
| Kimi K2 Thinking | Moonshot AI | 145.6 | Official | Nov 6, 2025 |
| GPT-5 mini | OpenAI | 145.6 | Official | Aug 7, 2025 |
| Qwen3 Max | Alibaba / Qwen | 144.9 | Official | Sep 5, 2025 |
| Claude Opus 4.1 | Anthropic | 144.7 | Official | Aug 5, 2025 |
| GLM-4.7 | Z.ai | 144.6 | Official | Dec 22, 2025 |
| Gemini 2.5 Flash | Google DeepMind | 143.4 | Official | Apr 17, 2025 |
| Claude Opus 4 | Anthropic | 143.4 | Official | May 22, 2025 |
| Claude Haiku 4.5 | Anthropic | 143.1 | Official | Oct 15, 2025 |
| Claude Sonnet 4 | Anthropic | 142.6 | Official | May 22, 2025 |
| DeepSeek R1 | DeepSeek | 142.2 | Official | Jan 20, 2025 |
| Claude 3.7 Sonnet | Anthropic | 142.0 | Official | Feb 24, 2025 |
| GLM-4.6 | Z.ai | 141.4 | Official | Sep 30, 2025 |
| Kimi K2 Instruct | Moonshot AI | 141.2 | Official | Jul 11, 2025 |
| GPT-OSS-120B | OpenAI | 140.8 | Official | Aug 5, 2025 |
| Qwen3 235B A22B Instruct 2507 | Alibaba / Qwen | 139.1 | Official | Jul 21, 2025 |
| DeepSeek V3.1 | DeepSeek | 138.9 | Official | Aug 21, 2025 |
| GPT-4.1 | OpenAI | 137.6 | Official | Apr 14, 2025 |
| DeepSeek V3 0324 | DeepSeek | 137.2 | Official | Mar 24, 2025 |
| Gemini 2.0 Flash | Google DeepMind | 135.9 | Official | Dec 11, 2024 |
| Claude 3.5 Sonnet | Anthropic | 134.4 | Official | Jun 20, 2024 |
| Llama 4 Maverick | Meta | 133.1 | Official | Apr 5, 2025 |
| DeepSeek V3 | DeepSeek | 133.1 | Official | Dec 26, 2024 |
| Gemini 1.5 Pro | Google DeepMind | 132.8 | Official | Feb 15, 2024 |
| Llama 4 Scout | Meta | 130.6 | Official | Apr 5, 2025 |
| GPT-4o | OpenAI | 129.4 | Official | May 13, 2024 |
| Llama 3.1 405B | Meta | 129.1 | Official | Jul 23, 2024 |
| Mistral Large | Mistral AI | 121.0 | Official | Feb 26, 2024 |

Each row reports the model's Index score on the Epoch Capabilities Index.