MMMU-Pro
MMMU is a leading multimodal reasoning benchmark; MMMU-Pro is its harder variant, and the one frontier models still report.
Structured data · Run guide ready · Public data
Source detail
Score source
The official MMMU site publishes a maintained leaderboard JSON (leaderboard_data.json) with MMMU-Pro overall scores.
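As a rough sketch of reading such a file, the snippet below pulls one overall MMMU-Pro score per model out of already-parsed JSON. The schema is an assumption made for illustration: the keys `models`, `name`, and `mmmu_pro_overall` are hypothetical, as is the sample data, and should be checked against the actual leaderboard_data.json before use.

```python
import json
from typing import Any

# Hypothetical sample; the real leaderboard_data.json layout may differ.
SAMPLE = """
{
  "models": [
    {"name": "model-a", "mmmu_pro_overall": "41.5"},
    {"name": "model-b", "mmmu_pro_overall": "-"},
    {"name": "model-c", "mmmu_pro_overall": 38.0}
  ]
}
"""


def extract_overall_scores(data: dict[str, Any]) -> dict[str, float]:
    """Map model name to MMMU-Pro overall accuracy, skipping unusable entries."""
    scores: dict[str, float] = {}
    for entry in data.get("models", []):
        raw = entry.get("mmmu_pro_overall")
        try:
            scores[entry["name"]] = float(raw)
        except (TypeError, ValueError):
            # Missing value or a placeholder such as "-": leave the model out.
            continue
    return scores


scores = extract_overall_scores(json.loads(SAMPLE))
```

Skipping entries that are not numeric, rather than coercing them to zero, keeps an unreported score from being mistaken for a real result.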
Run guide
The repository includes evaluation scripts and prompts for MMMU-Pro.
How it can be used
Use the MMMU-Pro overall accuracy from the official leaderboard as the score; record tool use and thinking settings as run context.
Caveat
Record the multimodal prompt packaging and image handling used for each run as run context.
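One possible way to keep that context attached to a score is a small record type, sketched below. All field names, the metric label, and the example values are hypothetical placeholders chosen for illustration; they are not taken from the leaderboard or the evaluation repository.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class RunContext:
    """Settings that affect how a reported score should be read."""
    tool_use: bool
    thinking: bool
    prompt_packaging: str  # e.g. how text and images were combined in the prompt
    image_handling: str    # e.g. resizing or format conversion applied to images


@dataclass(frozen=True)
class ScoreRecord:
    benchmark: str
    metric: str
    value: float
    context: RunContext


record = ScoreRecord(
    benchmark="MMMU-Pro",
    metric="overall_accuracy",
    value=41.5,  # placeholder value, not a real result
    context=RunContext(
        tool_use=False,
        thinking=True,
        prompt_packaging="interleaved text and images",
        image_handling="original resolution",
    ),
)

# asdict() converts nested dataclasses too, so the row serialises cleanly.
row = asdict(record)
```

Keeping the context in a nested structure, rather than folding it into the score field, lets runs with different settings be filtered or compared without re-parsing free text.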