CharXiv
A multimodal benchmark of 2,323 real scientific charts drawn from arXiv papers. It evaluates chart understanding in multimodal large language models (MLLMs) with two question types: descriptive questions and complex reasoning questions. The reasoning split (CharXiv-R) measures accuracy on questions that require synthesizing information across chart elements.
Multimodal | accuracy | Higher is better
No run guide for this benchmark yet.