Video-MME
A comprehensive evaluation benchmark for multimodal LLMs in video analysis, using 900 videos (254 hours) and 2,700 human-annotated multiple-choice QA pairs across short, medium, and long durations, scored by answer accuracy with and without subtitles.
Multimodal | accuracy | Higher is better
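As a rough illustration of the scoring described above, the sketch below computes multiple-choice accuracy separately for each duration bucket and subtitle setting. The record layout (`duration`, `subtitles`, `prediction`, `answer`) and the function name `accuracy_by_group` are assumptions made for this example, not the benchmark's official data format or evaluation script, and the sample records are toy values.

```python
from collections import defaultdict


def accuracy_by_group(records):
    """Return {(duration, subtitles): accuracy} for graded multiple-choice answers.

    Each record is a dict with hypothetical keys:
    duration (str), subtitles (bool), prediction (str), answer (str).
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for r in records:
        key = (r["duration"], r["subtitles"])
        total[key] += 1
        # Compare option letters case-insensitively, ignoring stray whitespace.
        if r["prediction"].strip().upper() == r["answer"].strip().upper():
            correct[key] += 1
    return {key: correct[key] / total[key] for key in total}


# Toy records for illustration only; not real benchmark data.
records = [
    {"duration": "short", "subtitles": False, "prediction": "A", "answer": "A"},
    {"duration": "short", "subtitles": False, "prediction": "b", "answer": "C"},
    {"duration": "long", "subtitles": True, "prediction": "d", "answer": "D"},
]

scores = accuracy_by_group(records)
for (duration, subtitles), acc in sorted(scores.items()):
    setting = "with subtitles" if subtitles else "without subtitles"
    print(f"{duration:>6} ({setting}): {acc:.1%}")
```

Reporting the groups separately keeps the with-subtitles and without-subtitles settings from being averaged together, which is how the description above frames the metric.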
No run guide for this benchmark yet.