MMLU-ProX

Name: MMLU-ProX
Creator: evals.report
License: https://creativecommons.org/licenses/by/4.0/

A multilingual extension of MMLU-Pro spanning 29 typologically diverse languages with 11,829 parallel reasoning-focused multiple-choice questions (10 answer choices) per language, measuring LLM reasoning and knowledge across linguistic and cultural boundaries.
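As a rough sense of scale, the figures in the description can be turned into a total question count and a random-guess baseline. This is a minimal sketch: the variable names are illustrative, and the baseline assumes every question has exactly 10 equally likely choices, as the description states.

```python
# Figures taken from the benchmark description above.
NUM_LANGUAGES = 29
QUESTIONS_PER_LANGUAGE = 11_829
NUM_CHOICES = 10

# Total parallel questions across all languages.
total_questions = NUM_LANGUAGES * QUESTIONS_PER_LANGUAGE

# Expected accuracy of uniform random guessing (assumes 10 choices everywhere).
chance_accuracy = 1 / NUM_CHOICES

print(f"total questions: {total_questions:,}")
print(f"chance accuracy: {chance_accuracy:.1%}")
```

Under these assumptions, every score in the table below sits well above the 10% chance level.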

Category: Reasoning. Metric: accuracy (higher is better).


Model	Lab	Score	Source model	Status	Date
DeepSeek R1	DeepSeek	75.5%	—	Official	Jan 20, 2025
GPT-4.1	OpenAI	72.7%	—	Official	Apr 14, 2025
DeepSeek V3	DeepSeek	70.5%	—	Official	Dec 26, 2024
o4-mini	OpenAI	69.3%	—	Official	Apr 16, 2025
Llama 3.1 405B	Meta	60.1%	—	Verified	Jul 23, 2024

Each row reports the model’s accuracy on MMLU-ProX.
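For readers who want to see what an accuracy score of this kind involves, here is a minimal sketch of scoring multiple-choice answers per language and averaging the results. The function names, the toy data, and the choice to average across languages are assumptions for illustration; this page does not state how the per-language results are aggregated.

```python
from statistics import mean


def accuracy(predictions, gold):
    """Fraction of predictions that exactly match the gold answer letters."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        raise ValueError("cannot score an empty question set")
    correct = sum(p == g for p, g in zip(predictions, gold))
    return correct / len(gold)


def score_by_language(runs):
    """Return per-language accuracy and the unweighted mean across languages.

    `runs` maps a language code to a (predictions, gold) pair.
    Averaging across languages is an assumption, not the documented method.
    """
    per_language = {lang: accuracy(p, g) for lang, (p, g) in runs.items()}
    return per_language, mean(per_language.values())


# Toy data only: answer letters A-J stand in for the 10 choices.
toy_runs = {
    "en": (["A", "C", "J", "B"], ["A", "C", "J", "D"]),
    "ja": (["B", "B", "E", "A"], ["A", "B", "E", "F"]),
}
per_language, overall = score_by_language(toy_runs)
print(per_language, overall)
```

Because every language has the same number of questions, the unweighted mean across languages equals the pooled accuracy over all questions whenever every question is scored.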