MMMU-Pro
MMMU-Pro is the harder variant of the MMMU multimodal reasoning benchmark (college-level subject tasks that combine text and images), and it is the variant that current frontier models report.
Multimodal · Accuracy · Higher is better
| Model | Lab | Score | Source model | Status | Date |
|---|---|---|---|---|---|
| GPT-5.4 | OpenAI | 82.1% | GPT-5.4 Thinking w/ tools | Official | Apr 8, 2026 |
| Gemini 3 Pro | Google DeepMind | 81.0% | Gemini 3.0 Pro | Official | Apr 8, 2026 |
| Gemini 3.1 Pro Preview | Google DeepMind | 80.5% | Gemini 3.1 Pro Thinking (High) | Official | Apr 8, 2026 |
| Muse Spark | Meta | 80.4% | Muse Spark Thinking | Official | Apr 8, 2026 |
| GPT-5.2 | OpenAI | 80.4% | GPT-5.2 Thinking w/o Python | Official | Apr 8, 2026 |
| GPT-5.1 | OpenAI | 79.0% | GPT-5.1 Thinking | Official | Apr 8, 2026 |
| GPT-5 high | OpenAI | 78.4% | GPT-5 w/ thinking | Official | Apr 8, 2026 |
| Claude Opus 4.6 | Anthropic | 77.3% | Claude Opus 4.6 w/ tools | Official | Apr 8, 2026 |
| o3 | OpenAI | 76.4% | o3 | Official | Apr 8, 2026 |
| Claude Sonnet 4.6 | Anthropic | 75.6% | Claude Sonnet 4.6 w/ tools | Official | Apr 8, 2026 |
| Claude Opus 4.5 | Anthropic | 73.9% | Claude Opus 4.5 | Official | Apr 8, 2026 |
| Claude Sonnet 4.5 | Anthropic | 68.9% | Claude Sonnet 4.5 | Official | Apr 8, 2026 |
| Gemini 2.5 Pro | Google DeepMind | 68.0% | Gemini 2.5 Pro 05-06 | Official | Apr 8, 2026 |
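Each row reports the model’s accuracy on MMMU-Pro, sorted from highest to lowest. The Source model column names the specific configuration (for example, thinking mode or tool use) that the score was reported for.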