evals.report

Global-MMLU

A multilingual extension of MMLU covering 42 languages with culturally-sensitive and culturally-agnostic multiple-choice knowledge questions, measuring accuracy across diverse high-, mid-, and low-resource languages.

Reasoning · accuracy · Higher is better
| Model | Lab | Score | Source model | Status | Date |
|---|---|---|---|---|---|
| Gemini 3.1 Pro Preview | Google DeepMind | 93.2% | | Unverified | Feb 19, 2026 |
| Gemini 3 Pro | Google DeepMind | 92.2% | | Unverified | Nov 18, 2025 |
| Claude Opus 4.6 | Anthropic | 92.2% | | Unverified | Feb 5, 2026 |
| Gemini 3 Flash | Google DeepMind | 91.4% | | Unverified | Dec 17, 2025 |
| Claude Opus 4.5 | Anthropic | 91.3% | | Unverified | Nov 24, 2025 |
| GPT-5 | OpenAI | 90.7% | | Unverified | Aug 7, 2025 |
| GPT-5.1 | OpenAI | 90.6% | | Unverified | Nov 12, 2025 |
| Claude Sonnet 4.6 | Anthropic | 90.5% | | Unverified | Feb 17, 2026 |
| Gemini 2.5 Pro | Google DeepMind | 90.3% | | Unverified | Mar 25, 2025 |
| Qwen3.5-397B-A17B | Alibaba / Qwen | 90.0% | | Unverified | Feb 16, 2026 |
| GPT-5.2 | OpenAI | 89.8% | | Unverified | Dec 11, 2025 |
| Grok 4 | xAI | 89.5% | | Unverified | Jul 9, 2025 |
| Grok 4.20 beta reasoning | xAI | 89.5% | | Unverified | Mar 9, 2026 |
| Claude Sonnet 4.5 | Anthropic | 89.3% | | Unverified | Sep 29, 2025 |
| GPT-5 mini | OpenAI | 87.4% | | Unverified | Aug 7, 2025 |
| DeepSeek V3.2 | DeepSeek | 86.5% | | Unverified | Dec 1, 2025 |
| DeepSeek R1 | DeepSeek | 86.0% | | Unverified | Jan 20, 2025 |
| GLM-4.6 | Z.ai | 85.6% | | Unverified | Sep 30, 2025 |
| Grok 4.1 fast reasoning | xAI | 85.6% | | Unverified | Nov 19, 2025 |
| MiniMax M2.5 | MiniMax | 84.2% | | Unverified | Feb 12, 2026 |
| Kimi K2.5 | Moonshot AI | 84.0% | | Unverified | Jan 27, 2026 |
| MiniMax M2.1 | MiniMax | 84.0% | | Unverified | Dec 23, 2025 |
| Claude Haiku 4.5 | Anthropic | 83.4% | | Unverified | Oct 15, 2025 |
| Qwen3 Max | Alibaba / Qwen | 83.3% | | Unverified | Sep 5, 2025 |
| GPT-OSS-120B | OpenAI | 82.8% | | Unverified | Aug 5, 2025 |
| DeepSeek V3.1 | DeepSeek | 82.7% | | Unverified | Aug 21, 2025 |
| Llama 4 Maverick | Meta | 82.5% | | Unverified | Apr 5, 2025 |
| GLM-5 | Z.ai | 81.9% | | Unverified | Feb 11, 2026 |
| Mistral Large | Mistral AI | 80.3% | | Unverified | Feb 26, 2024 |
| GLM-4.7 | Z.ai | 79.9% | | Unverified | Dec 22, 2025 |
| Llama 4 Scout | Meta | 74.1% | | Unverified | Apr 5, 2025 |
| Kimi K2 Thinking | Moonshot AI | 73.5% | | Unverified | Nov 6, 2025 |

Each row reports the model’s accuracy on Global-MMLU.
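For readers unfamiliar with how a multiple-choice accuracy figure is formed, the sketch below may help. It is only an illustration under stated assumptions: the record fields (`language`, `subset`, `answer`, `predicted`) and the toy data are hypothetical, and the page does not say whether its scores pool all questions together or average per-language results, so both views are shown.

```python
from collections import defaultdict


def overall_accuracy(records):
    """Fraction of questions answered correctly, pooled over all records."""
    if not records:
        raise ValueError("no records to score")
    hits = sum(r["predicted"] == r["answer"] for r in records)
    return hits / len(records)


def accuracy_by_group(records, key):
    """Return {group value: fraction correct}, grouping records by `key`."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for r in records:
        total[r[key]] += 1
        if r["predicted"] == r["answer"]:
            correct[r[key]] += 1
    return {group: correct[group] / total[group] for group in total}


# Hypothetical toy records; real benchmark data would have many more rows.
records = [
    {"language": "en", "subset": "agnostic", "answer": "A", "predicted": "A"},
    {"language": "en", "subset": "sensitive", "answer": "C", "predicted": "B"},
    {"language": "sw", "subset": "agnostic", "answer": "D", "predicted": "D"},
    {"language": "sw", "subset": "sensitive", "answer": "B", "predicted": "B"},
]

print(f"{overall_accuracy(records):.1%}")      # 75.0%
print(accuracy_by_group(records, "language"))  # {'en': 0.5, 'sw': 1.0}
print(accuracy_by_group(records, "subset"))    # {'agnostic': 1.0, 'sensitive': 0.5}
```

A pooled score and a per-language average can differ when languages have unequal question counts, which is one reason unverified scores from different sources may not be directly comparable.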