Global-MMLU
A multilingual extension of MMLU covering 42 languages, with both culturally sensitive and culturally agnostic multiple-choice knowledge questions. It measures accuracy across high-, mid-, and low-resource languages.
Category: Reasoning · Metric: accuracy · Higher is better
| Model | Lab | Score | Source model | Status | Date |
|---|---|---|---|---|---|
| Gemini 3.1 Pro Preview | Google DeepMind | 93.2% | — | Unverified | Feb 19, 2026 |
| Gemini 3 Pro | Google DeepMind | 92.2% | — | Unverified | Nov 18, 2025 |
| Claude Opus 4.6 | Anthropic | 92.2% | — | Unverified | Feb 5, 2026 |
| Gemini 3 Flash | Google DeepMind | 91.4% | — | Unverified | Dec 17, 2025 |
| Claude Opus 4.5 | Anthropic | 91.3% | — | Unverified | Nov 24, 2025 |
| GPT-5 | OpenAI | 90.7% | — | Unverified | Aug 7, 2025 |
| GPT-5.1 | OpenAI | 90.6% | — | Unverified | Nov 12, 2025 |
| Claude Sonnet 4.6 | Anthropic | 90.5% | — | Unverified | Feb 17, 2026 |
| Gemini 2.5 Pro | Google DeepMind | 90.3% | — | Unverified | Mar 25, 2025 |
| Qwen3.5-397B-A17B | Alibaba / Qwen | 90.0% | — | Unverified | Feb 16, 2026 |
| GPT-5.2 | OpenAI | 89.8% | — | Unverified | Dec 11, 2025 |
| Grok 4 | xAI | 89.5% | — | Unverified | Jul 9, 2025 |
| Grok 4.20 beta reasoning | xAI | 89.5% | — | Unverified | Mar 9, 2026 |
| Claude Sonnet 4.5 | Anthropic | 89.3% | — | Unverified | Sep 29, 2025 |
| GPT-5 mini | OpenAI | 87.4% | — | Unverified | Aug 7, 2025 |
| DeepSeek V3.2 | DeepSeek | 86.5% | — | Unverified | Dec 1, 2025 |
| DeepSeek R1 | DeepSeek | 86.0% | — | Unverified | Jan 20, 2025 |
| GLM-4.6 | Z.ai | 85.6% | — | Unverified | Sep 30, 2025 |
| Grok 4.1 fast reasoning | xAI | 85.6% | — | Unverified | Nov 19, 2025 |
| MiniMax M2.5 | MiniMax | 84.2% | — | Unverified | Feb 12, 2026 |
| Kimi K2.5 | Moonshot AI | 84.0% | — | Unverified | Jan 27, 2026 |
| MiniMax M2.1 | MiniMax | 84.0% | — | Unverified | Dec 23, 2025 |
| Claude Haiku 4.5 | Anthropic | 83.4% | — | Unverified | Oct 15, 2025 |
| Qwen3 Max | Alibaba / Qwen | 83.3% | — | Unverified | Sep 5, 2025 |
| GPT-OSS-120B | OpenAI | 82.8% | — | Unverified | Aug 5, 2025 |
| DeepSeek V3.1 | DeepSeek | 82.7% | — | Unverified | Aug 21, 2025 |
| Llama 4 Maverick | Meta | 82.5% | — | Unverified | Apr 5, 2025 |
| GLM-5 | Z.ai | 81.9% | — | Unverified | Feb 11, 2026 |
| Mistral Large | Mistral AI | 80.3% | — | Unverified | Feb 26, 2024 |
| GLM-4.7 | Z.ai | 79.9% | — | Unverified | Dec 22, 2025 |
| Llama 4 Scout | Meta | 74.1% | — | Unverified | Apr 5, 2025 |
| Kimi K2 Thinking | Moonshot AI | 73.5% | — | Unverified | Nov 6, 2025 |
Each row reports the model’s accuracy on Global-MMLU.
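
For readers unfamiliar with the metric, here is a minimal Python sketch of how multiple-choice accuracy could be computed per language and overall. The record layout `(language, predicted, gold)` and the function names are hypothetical, and pooling all questions into one overall figure is an assumption: this page does not state how the listed scores were aggregated across the 42 languages.

```python
from collections import defaultdict


def accuracy_by_language(records):
    """Return {language: fraction of questions answered correctly}.

    `records` is an iterable of (language, predicted, gold) tuples.
    This layout is hypothetical, chosen only for illustration.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for language, predicted, gold in records:
        total[language] += 1
        if predicted == gold:
            correct[language] += 1
    return {language: correct[language] / total[language] for language in total}


def overall_accuracy(records):
    """Pooled accuracy over all questions, regardless of language.

    Assumption: the leaderboard may aggregate differently
    (for example, by averaging per-language accuracies).
    """
    records = list(records)
    if not records:
        raise ValueError("no records to score")
    hits = sum(1 for _, predicted, gold in records if predicted == gold)
    return hits / len(records)


# Toy data: two English and two Swahili questions with answer letters.
sample = [
    ("en", "A", "A"),
    ("en", "B", "C"),
    ("sw", "D", "D"),
    ("sw", "A", "A"),
]

per_language = accuracy_by_language(sample)
pooled = overall_accuracy(sample)
print(per_language)
print(f"{pooled:.1%}")
```

Reporting per-language accuracy alongside the pooled figure makes it easier to see whether a high overall score hides weak results on low-resource languages.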