Question 1

What is MathVista?

Accepted Answer

MathVista is a multimodal benchmark of 6,141 examples that measures mathematical reasoning in visual contexts. It spans figure QA, geometry, math word problems, textbook QA, and visual QA. Models are evaluated on the 1,000-example testmini split, and results are reported as answer accuracy.

Question 2

What does accuracy mean on MathVista?

Accepted Answer

MathVista reports accuracy (%); higher is better. Scores are shown only within MathVista and are never averaged with other benchmarks.
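As a rough illustration of how an accuracy percentage is derived, here is a minimal sketch. It assumes answers have already been extracted and normalized to comparable strings, and that scoring is a plain exact match; the function name `accuracy_percent` is hypothetical and is not taken from the MathVista tooling or from evals.report.

```python
def accuracy_percent(predictions, references):
    """Return the share of predictions that match their reference, as a percentage.

    Assumption: both lists hold already-normalized answer strings, so a
    plain equality check is enough to decide whether an answer is correct.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("cannot compute accuracy on an empty set")

    # Count the positions where the predicted answer equals the reference.
    correct = sum(p == r for p, r in zip(predictions, references))
    return 100.0 * correct / len(references)
```

With four examples of which three match, the function returns 75.0. A real evaluation harness may normalize or extract answers differently, which is one reason the same model can score differently across setups.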

Question 3

What is the top reported MathVista score?

Accepted Answer

o3 has the top reported score on MathVista: 86.8% (accuracy).

Question 4

Why do MathVista scores differ across runs?

Accepted Answer

Harness, scaffold, reasoning effort, and prompt setup change results, so two runs of the same model can differ. evals.report keeps each score with its run context so the differences stay visible.
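One way to picture a score kept together with its run context is a small record type. The sketch below is an assumption for illustration only: `RunRecord`, its field names, and the helper functions are hypothetical and do not describe the actual schema used by evals.report.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class RunRecord:
    """A single benchmark result plus the setup that produced it (hypothetical schema)."""
    model: str
    accuracy: float          # accuracy in percent
    harness: str
    scaffold: str
    reasoning_effort: str
    prompt_setup: str


def runs_by_model(records):
    """Group records by model name without merging or averaging them."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record.model].append(record)
    return dict(grouped)


def score_spread(records):
    """Return the gap between the highest and lowest accuracy in a set of runs."""
    scores = [record.accuracy for record in records]
    return max(scores) - min(scores)
```

Because each run stays a separate record, two results for the same model can be listed side by side with the harness, scaffold, reasoning effort, and prompt setup that distinguish them, rather than being collapsed into one number.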

Question 5

Does evals.report rank models across benchmarks?

Accepted Answer

No. MathVista scores are shown within their own metric; evals.report never combines benchmarks into a composite ranking or a single "best model".
