evals.report

OCRBench v2

A large-scale bilingual (English/Chinese) text-centric benchmark of ~10,000 human-verified QA pairs across 31 scenarios that evaluates large multimodal models on visual text localization, recognition, parsing, and reasoning.

Multimodal · accuracy · Higher is better
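As a rough illustration of how an accuracy score over QA pairs grouped by scenario might be computed, here is a minimal sketch. It assumes exact-match scoring after case and whitespace normalization, which is a simplification chosen for the example and not necessarily how OCRBench v2 scores its tasks. The record layout and the function names (`normalize`, `accuracy_by_scenario`, `overall_accuracy`) are hypothetical.

```python
from collections import defaultdict


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace (assumed normalization, not the official one)."""
    return " ".join(text.lower().split())


def accuracy_by_scenario(records):
    """Return {scenario: fraction of exact matches}.

    `records` is an iterable of (scenario, prediction, reference) tuples;
    this layout is hypothetical and chosen only for the example.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for scenario, prediction, reference in records:
        totals[scenario] += 1
        if normalize(prediction) == normalize(reference):
            hits[scenario] += 1
    return {scenario: hits[scenario] / totals[scenario] for scenario in totals}


def overall_accuracy(records):
    """Fraction of all records answered correctly (micro average)."""
    records = list(records)
    if not records:
        return 0.0
    correct = sum(
        normalize(prediction) == normalize(reference)
        for _, prediction, reference in records
    )
    return correct / len(records)


if __name__ == "__main__":
    # Toy data invented for the example; not taken from the benchmark.
    sample = [
        ("receipt", "Total: 42", "total: 42"),
        ("receipt", "x", "y"),
        ("sign", "出口", "出口"),
    ]
    print(accuracy_by_scenario(sample))
    print(overall_accuracy(sample))
```

The micro average weights every QA pair equally; averaging the per-scenario values instead would weight every scenario equally, and which of the two a leaderboard reports is not stated here.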

No run guide for this benchmark yet.