evals.report

MMMU-Pro

MMMU-Pro is the harder variant of the MMMU multimodal reasoning benchmark: college-level subject tasks that combine text and images. It is the variant that current frontier models report.

Multimodal | accuracy | Higher is better

The benchmark's repository provides evaluation scripts and prompts for MMMU-Pro.

Benchmark: MMMU-Pro
Dataset: huggingface.co/datasets/MMMU/MMMU_Pro
Metric: accuracy
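
As a rough illustration of the accuracy metric, the sketch below scores a list of predicted answers against reference answers. It assumes that each answer is a multiple-choice option letter and that matching is case-insensitive after trimming whitespace. The function name `accuracy` and this matching rule are illustrative assumptions; the official evaluation scripts may parse and normalize responses differently.

```python
from typing import Sequence


def accuracy(predictions: Sequence[str], answers: Sequence[str]) -> float:
    """Return the fraction of predictions that match the reference answer.

    Assumption: answers are option letters compared case-insensitively.
    """
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(
        p.strip().upper() == a.strip().upper()
        for p, a in zip(predictions, answers)
    )
    return correct / len(answers)
```

Because higher is better, a score from this kind of function is only comparable to another score computed with the same answer-matching rule.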

1. Expected output

Consult the official source links for the current output format, submission steps, and benchmark-specific result files.

2. Submit results

Keep source URL, source model name, benchmark version, harness, and run context attached to any reported score.
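
One way to keep that provenance attached is to store the score and its metadata in a single record, so neither can be reported without the other. The sketch below is a minimal example of that idea. The class name `ReportedScore` and its field names are hypothetical and do not represent an official submission format.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass(frozen=True)
class ReportedScore:
    """A score bundled with its provenance (hypothetical schema)."""

    benchmark: str
    metric: str
    score: float
    source_url: str
    source_model_name: str
    benchmark_version: str
    harness: str
    run_context: dict = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Refuse to build a record whose provenance fields are blank.
        for name in ("source_url", "source_model_name",
                     "benchmark_version", "harness"):
            if not getattr(self, name).strip():
                raise ValueError(f"missing provenance field: {name}")

    def to_json(self) -> str:
        """Serialize the full record, provenance included."""
        return json.dumps(asdict(self), sort_keys=True)
```

Serializing the whole record with `to_json()` means the score never travels separately from the fields that explain where it came from.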

Gotchas

Record how multimodal prompts were packaged and how images were handled as part of the run context.
Do not mix this benchmark's metric with unrelated benchmark metrics.
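
Both gotchas can be checked mechanically before scores are compared. The sketch below assumes that each result is a plain dictionary with `benchmark`, `metric`, and `run_context` keys, and that the run context uses the keys `prompt_packaging` and `image_handling`. All of these names are hypothetical and chosen only for illustration.

```python
# Hypothetical key names for the run-context details mentioned above.
REQUIRED_CONTEXT_KEYS = ("prompt_packaging", "image_handling")


def check_comparable(results: list) -> None:
    """Raise ValueError if results mix metrics or lack run context."""
    if not results:
        raise ValueError("no results to compare")
    # All results must share one (benchmark, metric) pair.
    pairs = {(r["benchmark"], r["metric"]) for r in results}
    if len(pairs) > 1:
        raise ValueError(f"mixed benchmark/metric pairs: {sorted(pairs)}")
    # Every result must record how prompts and images were handled.
    for r in results:
        context = r.get("run_context", {})
        missing = [k for k in REQUIRED_CONTEXT_KEYS if k not in context]
        if missing:
            raise ValueError(f"run context is missing: {missing}")
```

Calling `check_comparable` before building a comparison table surfaces a mismatched metric or an undocumented image pipeline as an error rather than a silently misleading number.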