evals.report

Video-MMMU

A multi-discipline benchmark that evaluates how well large multimodal models acquire and apply knowledge from expert-level professional videos. It covers six disciplines and three cognitive stages (Perception, Comprehension, and Adaptation), and performance is measured as question-answering accuracy.

Category: Multimodal · Metric: accuracy (higher is better)

| Model             | Lab             | Accuracy | Status     | Date         |
|-------------------|-----------------|----------|------------|--------------|
| Gemini 3 Pro      | Google DeepMind | 87.6%    | Verified   | Nov 18, 2025 |
| Gemini 3 Flash    | Google DeepMind | 86.9%    | Unverified | Dec 17, 2025 |
| Kimi K2.5         | Moonshot AI     | 86.6%    | Unverified | Jan 27, 2026 |
| GPT-5.2           | OpenAI          | 85.9%    | Unverified | Dec 11, 2025 |
| GPT-5             | OpenAI          | 84.6%    | Unverified | Aug 7, 2025  |
| Gemini 2.5 Pro    | Google DeepMind | 83.6%    | Verified   | Mar 25, 2025 |
| o3                | OpenAI          | 83.3%    | Unverified | Apr 16, 2025 |
| Claude 3.5 Sonnet | Anthropic       | 65.8%    | Official   | Jun 20, 2024 |
| GPT-4o            | OpenAI          | 61.2%    | Official   | May 13, 2024 |
| Gemini 1.5 Pro    | Google DeepMind | 53.9%    | Official   | Feb 15, 2024 |

Each row reports the model’s accuracy on Video-MMMU.