Video-MMMU
A multi-discipline benchmark that evaluates how well large multimodal models acquire and apply knowledge from expert-level professional videos. It spans six disciplines and three cognitive stages (Perception, Comprehension, Adaptation), and is scored by question-answering accuracy.
Multimodal · accuracy · Higher is better
| Model | Lab | Score (accuracy) | Source model | Status | Date |
|---|---|---|---|---|---|
| Gemini 3 Pro | Google DeepMind | 87.6% | — | Verified | Nov 18, 2025 |
| Gemini 3 Flash | Google DeepMind | 86.9% | — | Unverified | Dec 17, 2025 |
| Kimi K2.5 | Moonshot AI | 86.6% | — | Unverified | Jan 27, 2026 |
| GPT-5.2 | OpenAI | 85.9% | — | Unverified | Dec 11, 2025 |
| GPT-5 | OpenAI | 84.6% | — | Unverified | Aug 7, 2025 |
| Gemini 2.5 Pro | Google DeepMind | 83.6% | — | Verified | Mar 25, 2025 |
| o3 | OpenAI | 83.3% | — | Unverified | Apr 16, 2025 |
| Claude 3.5 Sonnet | Anthropic | 65.8% | — | Official | Jun 20, 2024 |
| GPT-4o | OpenAI | 61.2% | — | Official | May 13, 2024 |
| Gemini 1.5 Pro | Google DeepMind | 53.9% | — | Official | Feb 15, 2024 |
Each row reports the model’s accuracy on Video-MMMU.
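Accuracy here is the share of questions answered correctly. The snippet below is a minimal sketch of that arithmetic, assuming predictions and gold answers are available as parallel lists and compared by exact match. The function name and the sample data are hypothetical and are not taken from the benchmark's own evaluation harness, which may match answers differently.

```python
def accuracy(predictions, answers):
    """Return the fraction of predictions that exactly match the gold answers."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("cannot compute accuracy on an empty set")
    # Booleans sum as 0/1, so this counts the exact matches.
    correct = sum(p == a for p, a in zip(predictions, answers))
    return correct / len(answers)


# Hypothetical multiple-choice responses, for illustration only.
preds = ["A", "C", "B", "D"]
gold = ["A", "C", "D", "D"]
print(f"{accuracy(preds, gold):.1%}")  # 3 of 4 correct -> 75.0%
```

A score such as 87.6% in the table would correspond to this ratio computed over the benchmark's full question set.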