evals.report

Video-MME

Video-MME is a comprehensive evaluation benchmark for multimodal LLMs in video analysis. It comprises 900 videos (254 hours in total) and 2,700 human-annotated multiple-choice QA pairs spanning short, medium, and long durations. Models are scored by answer accuracy, both with and without subtitles.
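
As a rough illustration of the scoring described above, the sketch below computes multiple-choice accuracy per duration bucket and per subtitle setting. It is not the official Video-MME evaluation script: the record layout and the field names (`duration`, `subtitles`, `predicted`, `answer`) are assumptions made for this example, and the sample records are made up.

```python
from collections import defaultdict


def accuracy_by_group(records):
    """Return accuracy keyed by (duration, subtitles), plus ("overall", subtitles).

    Each record is a dict with hypothetical fields:
      duration  -- "short", "medium", or "long"
      subtitles -- True if the model was given subtitles
      predicted -- the option letter the model chose
      answer    -- the annotated correct option letter
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for r in records:
        # Compare option letters case-insensitively, ignoring stray whitespace.
        hit = r["predicted"].strip().upper() == r["answer"].strip().upper()
        # Count the record once in its duration bucket and once in the overall bucket.
        for key in ((r["duration"], r["subtitles"]), ("overall", r["subtitles"])):
            total[key] += 1
            correct[key] += int(hit)
    return {key: correct[key] / total[key] for key in total}


# Made-up records, for illustration only.
records = [
    {"duration": "short", "subtitles": False, "predicted": "A", "answer": "A"},
    {"duration": "short", "subtitles": True, "predicted": "b", "answer": "B"},
    {"duration": "long", "subtitles": False, "predicted": "C", "answer": "D"},
    {"duration": "long", "subtitles": True, "predicted": "D", "answer": "D"},
]
scores = accuracy_by_group(records)
```

Keeping the subtitle setting in the key means the two conditions are never averaged together, which matches how the benchmark reports results with and without subtitles.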

Multimodal | accuracy | Higher is better

No run guide for this benchmark yet.