MMMU-Pro

Name: MMMU-Pro
Creator: evals.report

MMMU-Pro is the harder variant of the MMMU multimodal reasoning benchmark (college-level subject tasks that combine text and images). It is the variant that current frontier models report.

Multimodal · accuracy · Higher is better

Model	Lab	Score	Source model	Status	Date
GPT-5.4	OpenAI	82.1%	GPT-5.4 Thinking w/ tools	Official	Apr 8, 2026
Gemini 3 Pro	Google DeepMind	81.0%	Gemini 3.0 Pro	Official	Apr 8, 2026
Gemini 3.1 Pro Preview	Google DeepMind	80.5%	Gemini 3.1 Pro Thinking (High)	Official	Apr 8, 2026
Muse Spark	Meta	80.4%	Muse Spark Thinking	Official	Apr 8, 2026
GPT-5.2	OpenAI	80.4%	GPT-5.2 Thinking w/o Python	Official	Apr 8, 2026
GPT-5.1	OpenAI	79.0%	GPT-5.1 Thinking	Official	Apr 8, 2026
GPT-5 high	OpenAI	78.4%	GPT-5 w/ thinking	Official	Apr 8, 2026
Claude Opus 4.6	Anthropic	77.3%	Claude Opus 4.6 w/ tools	Official	Apr 8, 2026
o3	OpenAI	76.4%	o3	Official	Apr 8, 2026
Claude Sonnet 4.6	Anthropic	75.6%	Claude Sonnet 4.6 w/ tools	Official	Apr 8, 2026
Claude Opus 4.5	Anthropic	73.9%	Claude Opus 4.5	Official	Apr 8, 2026
Claude Sonnet 4.5	Anthropic	68.9%	Claude Sonnet 4.5	Official	Apr 8, 2026
Gemini 2.5 Pro	Google DeepMind	68.0%	Gemini 2.5 Pro 05-06	Official	Apr 8, 2026

Each row reports the model’s accuracy on MMMU-Pro.
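For readers who want to work with these rows programmatically, the following is a minimal Python sketch. It assumes the table is available as tab-separated text in the column order shown above. The field names and the helpers `parse_rows` and `best_per_lab` are hypothetical names chosen for illustration and are not part of any evals.report tooling. The sample holds five rows copied from the table. Note that the "Source model" column records different run configurations (for example, with or without tools), so scores are not strictly like-for-like.

```python
import csv
import io

# Five rows copied from the table above (tab-separated).
SAMPLE = (
    "GPT-5.4\tOpenAI\t82.1%\tGPT-5.4 Thinking w/ tools\tOfficial\tApr 8, 2026\n"
    "Gemini 3 Pro\tGoogle DeepMind\t81.0%\tGemini 3.0 Pro\tOfficial\tApr 8, 2026\n"
    "GPT-5.2\tOpenAI\t80.4%\tGPT-5.2 Thinking w/o Python\tOfficial\tApr 8, 2026\n"
    "Claude Opus 4.6\tAnthropic\t77.3%\tClaude Opus 4.6 w/ tools\tOfficial\tApr 8, 2026\n"
    "Claude Sonnet 4.6\tAnthropic\t75.6%\tClaude Sonnet 4.6 w/ tools\tOfficial\tApr 8, 2026\n"
)

# Hypothetical field names, one per column of the table.
FIELDS = ("model", "lab", "score", "source_model", "status", "date")


def parse_rows(text):
    """Parse tab-separated leaderboard rows into dicts with a float score."""
    rows = []
    for cells in csv.reader(io.StringIO(text), delimiter="\t"):
        # Skip blank or short lines, and any line whose score cell is not
        # a percentage (such as a header row).
        if len(cells) < len(FIELDS) or not cells[2].endswith("%"):
            continue
        # zip() stops at six fields, so any extra trailing cell is ignored.
        row = dict(zip(FIELDS, cells))
        row["score"] = float(row["score"].rstrip("%"))
        rows.append(row)
    return rows


def best_per_lab(rows):
    """Return the highest-scoring row for each lab."""
    best = {}
    for row in rows:
        current = best.get(row["lab"])
        if current is None or row["score"] > current["score"]:
            best[row["lab"]] = row
    return best


if __name__ == "__main__":
    for lab, row in best_per_lab(parse_rows(SAMPLE)).items():
        print(f"{lab}: {row['model']} ({row['score']}%, {row['source_model']})")
```

Printing the source model next to each score keeps the run configuration visible when comparing labs.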