MMMU-Pro
MMMU-Pro is the harder variant of the MMMU multimodal reasoning benchmark (college-level subject tasks that combine text and images), and it is the variant that current frontier models report.
Multimodal · accuracy · Higher is better
The repository provides evaluation scripts and prompts for MMMU-Pro.
1. Expected output
Use the official source links for current output format, submission steps, and benchmark-specific result files.
2. Submit results
Keep source URL, source model name, benchmark version, harness, and run context attached to any reported score.
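One way to keep those fields attached to a score is to store them together in a single record. The sketch below is only an illustration: the `ScoreRecord` class, its field names, and every placeholder value are assumptions made for this example, not a format defined by the benchmark or its repository.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass(frozen=True)
class ScoreRecord:
    """Hypothetical container that keeps a score together with its provenance."""

    benchmark: str
    benchmark_version: str
    metric: str
    score: float
    source_url: str
    source_model_name: str
    harness: str
    run_context: dict = field(default_factory=dict)

    def __post_init__(self):
        # Refuse to build a record whose provenance fields are empty.
        required = ("source_url", "source_model_name", "benchmark_version", "harness")
        missing = [name for name in required if not getattr(self, name)]
        if missing:
            raise ValueError(f"missing provenance fields: {missing}")


# All values below are placeholders, not real results or real links.
record = ScoreRecord(
    benchmark="MMMU-Pro",
    benchmark_version="<benchmark version>",
    metric="accuracy",
    score=0.0,  # placeholder, not a measured score
    source_url="<official source URL>",
    source_model_name="<model name as given by the source>",
    harness="<evaluation harness name and version>",
    run_context={"image_handling": "<how images were passed to the model>"},
)

print(json.dumps(asdict(record), indent=2))
```

Serializing the whole record, rather than the score alone, means the number cannot be copied elsewhere without its source and run context.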
Gotchas
Record how multimodal prompts were packaged and how images were handled as part of the run context.
Do not mix this benchmark's metric with unrelated benchmark metrics.
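The second gotcha can be enforced mechanically when scores are collected from several sources. The following is a minimal sketch, assuming each score is a plain dictionary with `benchmark`, `metric`, and `score` keys; those key names, the `best_score` helper, and the numbers are hypothetical and chosen only for illustration.

```python
def best_score(records):
    """Return the highest score, but only if all records share one benchmark and metric."""
    keys = {(r["benchmark"], r["metric"]) for r in records}
    if len(keys) != 1:
        # Either the list is empty or it mixes benchmarks or metrics.
        raise ValueError(f"cannot compare scores across {sorted(keys)}")
    # Accuracy is a higher-is-better metric, so the best score is the maximum.
    return max(r["score"] for r in records)


# Illustrative placeholder numbers, not real results.
same_metric = [
    {"benchmark": "MMMU-Pro", "metric": "accuracy", "score": 0.5},
    {"benchmark": "MMMU-Pro", "metric": "accuracy", "score": 0.6},
]
mixed = same_metric + [
    {"benchmark": "other-benchmark", "metric": "pass@1", "score": 0.9},
]

print(best_score(same_metric))

try:
    best_score(mixed)
except ValueError as err:
    print("rejected:", err)
```

The same check could be extended to the run context, for example by refusing to compare runs whose image handling differs.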