evals.report

ZeroBench

An intentionally 'impossible' visual reasoning benchmark of 100 hand-crafted main questions (plus 334 subquestions) on which contemporary large multimodal models score near zero, designed to provide maximum headroom for measuring genuine multi-step visual understanding.

Multimodal · accuracy · Higher is better

No run guide for this benchmark yet.