OCRBench v2
A large-scale bilingual (English/Chinese), text-centric benchmark for large multimodal models. Its ~10,000 human-verified QA pairs span 31 scenarios and test visual text localization, recognition, parsing, and reasoning.
Multimodal | Metric: accuracy (higher is better)
No run guide for this benchmark yet.