evals.report

CharXiv

CharXiv is a multimodal benchmark of 2,323 real scientific charts drawn from arXiv papers. It evaluates chart understanding in multimodal large language models (MLLMs) with two question types: descriptive questions and complex reasoning questions. The reasoning split (CharXiv-R) measures accuracy on questions that require synthesizing information across chart elements.
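As a rough illustration of the accuracy metric, the sketch below computes the fraction of correctly answered questions per split. It assumes each question has already been graded as correct or incorrect; how CharXiv grades answers is not described here. The record layout (`split`, `correct`) and the function name `accuracy_by_split` are hypothetical and not part of any official tooling.

```python
from collections import defaultdict


def accuracy_by_split(records):
    """Return {split: fraction of correct answers} for already-graded records.

    Each record is a dict with a hypothetical layout:
      "split":   e.g. "reasoning" or "descriptive"
      "correct": True if the model's answer was graded correct
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for rec in records:
        totals[rec["split"]] += 1
        correct[rec["split"]] += int(rec["correct"])
    return {split: correct[split] / totals[split] for split in totals}


# Toy graded results (made-up values, for illustration only).
records = [
    {"split": "reasoning", "correct": True},
    {"split": "reasoning", "correct": False},
    {"split": "descriptive", "correct": True},
]
scores = accuracy_by_split(records)
print(scores)
```

Reporting the reasoning split separately, as this sketch does, keeps a CharXiv-R style score from being mixed with results on the descriptive questions.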

Multimodal | accuracy | Higher is better
