evals.report

MMMU-Pro

MMMU is a leading multimodal reasoning benchmark; MMMU-Pro is its harder variant, and the one frontier models still report.

Status: structured data, run guide ready, public data.

Source detail

Score source

The official MMMU site publishes a maintained leaderboard JSON (leaderboard_data.json) with MMMU-Pro overall scores.
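
As a rough sketch of how such a file could be consumed, the Python below pulls one overall MMMU-Pro score per model out of a parsed leaderboard document. The field names (`leaderboardData`, `info.name`, `pro.overall`) are assumptions made for illustration, not a confirmed schema, so check them against the actual leaderboard_data.json before relying on this. The sample entries and values are placeholders, not real scores.

```python
import json
from typing import Optional

# Hypothetical sample that mirrors an ASSUMED schema; names and values are placeholders.
SAMPLE = """
{
  "leaderboardData": [
    {"info": {"name": "example-model-a"}, "pro": {"overall": "51.0"}},
    {"info": {"name": "example-model-b"}, "pro": {"overall": "-"}},
    {"info": {"name": "example-model-c"}}
  ]
}
"""


def to_float(value) -> Optional[float]:
    """Convert a score cell to float; return None for blanks, dashes, or missing cells."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def mmmu_pro_overall(doc: dict) -> dict:
    """Map model name -> MMMU-Pro overall accuracy, skipping entries without a score."""
    scores = {}
    for entry in doc.get("leaderboardData", []):
        name = entry.get("info", {}).get("name")
        score = to_float(entry.get("pro", {}).get("overall"))
        if name is not None and score is not None:
            scores[name] = score
    return scores


scores = mmmu_pro_overall(json.loads(SAMPLE))
print(scores)
```

Entries with a missing or non-numeric overall score are dropped rather than treated as zero, so an unreported result is never mistaken for a low one.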

Run guide

The MMMU repository includes evaluation scripts and prompts for MMMU-Pro.

How it can be used

Take the MMMU-Pro overall accuracy from the official leaderboard as the score, and record whether the run used tools or thinking as run context.

Caveat

Record how the multimodal prompt was packaged and how images were handled as part of the run context.
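
One way to keep these settings attached to a score is a small record type. The sketch below is illustrative only: the class name `MMMUProResult`, the helper `same_run_context`, and every field name are hypothetical and not part of any official format, and the example values are placeholders.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MMMUProResult:
    """Hypothetical record pairing a score with the context needed to compare it."""
    model: str
    overall_accuracy: float                 # assumed to be a percentage
    tool_use: Optional[bool] = None         # None means the source did not say
    thinking: Optional[bool] = None
    prompt_packaging: Optional[str] = None  # free-text note on how the prompt was built
    image_handling: Optional[str] = None    # free-text note on image preprocessing


# Fields that describe how the run was performed rather than its outcome.
CONTEXT_FIELDS = ("tool_use", "thinking", "prompt_packaging", "image_handling")


def same_run_context(a: MMMUProResult, b: MMMUProResult) -> bool:
    """Return True only if every run-context field matches between the two results."""
    return all(getattr(a, f) == getattr(b, f) for f in CONTEXT_FIELDS)


# Placeholder results, not real scores.
a = MMMUProResult("example-model-a", 51.0, tool_use=False, thinking=True)
b = MMMUProResult("example-model-b", 48.0, tool_use=False, thinking=False)
print(same_run_context(a, b))
```

Keeping the context on the same record as the score makes it harder to compare two numbers that were produced under different settings without noticing.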
