ZeroBench
An intentionally 'impossible' visual reasoning benchmark of 100 hand-crafted main questions (plus 334 subquestions) on which contemporary large multimodal models score near zero, designed to provide maximum headroom for measuring genuine multi-step visual understanding.
Multimodal · Accuracy · Higher is better
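The score is reported as accuracy, where higher is better. The following is a minimal sketch of how such a score could be computed. It assumes each question has a single reference answer and that scoring is an exact match after light normalization. Both are assumptions: this entry does not describe the official ZeroBench evaluation protocol. The helper names `normalize` and `accuracy` are hypothetical.

```python
from typing import Mapping


def normalize(answer: str) -> str:
    # Hypothetical normalization (an assumption, not the official rule):
    # trim surrounding whitespace and compare case-insensitively.
    return answer.strip().lower()


def accuracy(predictions: Mapping[str, str], gold: Mapping[str, str]) -> float:
    """Return the fraction of gold questions answered correctly.

    `gold` maps question IDs to reference answers; `predictions` maps
    question IDs to model answers. A missing prediction counts as wrong.
    """
    if not gold:
        raise ValueError("gold answers must be non-empty")
    correct = sum(
        1
        for qid, ref in gold.items()
        if qid in predictions and normalize(predictions[qid]) == normalize(ref)
    )
    return correct / len(gold)


if __name__ == "__main__":
    gold = {"q1": "42", "q2": "red"}
    preds = {"q1": " 42 ", "q2": "blue"}
    print(accuracy(preds, gold))  # 0.5
```

Because the denominator is the full set of reference questions, a model that answers none of the 100 main questions correctly scores exactly 0.0. Each correct answer adds 0.01.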
No run guide for this benchmark yet.