MMMU (Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark)
A benchmark of ~11.5K college-level multimodal questions spanning 30 subjects and 183 subfields across six disciplines, measuring a vision-language model's accuracy at jointly perceiving images (charts, diagrams, maps, tables, etc.) and reasoning with domain knowledge.
Category: Multimodal · Metric: accuracy (higher is better)
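The headline metric is plain accuracy, so a score is the fraction of questions a model answers correctly. MMMU also groups questions by subject, so per-subject breakdowns are a natural addition. The sketch below assumes model outputs have already been parsed into option letters. The `Item` record, the helper names, and the toy predictions are hypothetical, and this is not MMMU's official evaluation script, which also has to extract answers from free-form model output.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Item:
    """Hypothetical scored question: its subject plus gold and predicted option letters."""
    subject: str
    gold: str
    pred: str


def accuracy(items):
    """Fraction of items whose prediction matches the gold letter (case-insensitive); 0.0 if empty."""
    if not items:
        return 0.0
    correct = sum(1 for it in items if it.pred.strip().upper() == it.gold.strip().upper())
    return correct / len(items)


def accuracy_by_subject(items):
    """Group items by subject and compute accuracy within each group."""
    groups = defaultdict(list)
    for it in items:
        groups[it.subject].append(it)
    return {subject: accuracy(group) for subject, group in groups.items()}


if __name__ == "__main__":
    # Made-up toy predictions, for illustration only.
    results = [
        Item("Art", "A", "A"),
        Item("Art", "B", "C"),
        Item("Physics", "D", "d"),
        Item("Physics", "C", "C"),
    ]
    print(f"overall: {accuracy(results):.2f}")  # overall: 0.75
    print(accuracy_by_subject(results))         # {'Art': 0.5, 'Physics': 1.0}
```

Because every question counts equally, this overall figure is a micro-average. Subjects with more questions weigh more heavily than they would in a mean of the per-subject scores.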
No run guide for this benchmark yet.