Question 1

What is GDPval?

Accepted Answer

GDPval evaluates AI models agentically (shell and web access through a sandbox harness) on real-world, economically valuable knowledge-work deliverables such as documents, spreadsheets, slides, and diagrams. It spans 44 occupations across 9 major U.S. GDP industries, and deliverables are scored by blind pairwise quality comparison; the Artificial Analysis GDPval-AA variant reports results as an Elo rating. It is an agents benchmark measured by Elo.

Question 2

What does Elo mean on GDPval?

Accepted Answer

GDPval reports Elo; higher is better. Scores are shown only within GDPval and are never averaged with other benchmarks.
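To make "higher is better" concrete, the sketch below uses the standard Elo model (logistic curve, 400-point scale) to turn a rating gap into the expected probability that one model's deliverable is preferred in a blind pairwise comparison. This is an assumption for illustration only: the exact fitting procedure and constants used by GDPval-AA are not described here, and the function names and the K-factor of 32 are hypothetical.

```python
def expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected probability that A is preferred over B under the standard Elo model.

    Assumes the conventional 400-point logistic scale; GDPval-AA may differ.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


def elo_update(rating_a: float, rating_b: float, score_a: float, k: float = 32.0) -> float:
    """Return A's new rating after one pairwise judgment.

    score_a is 1.0 if A's deliverable was preferred, 0.0 if B's was, 0.5 for a tie.
    The K-factor of 32 is a hypothetical choice for illustration.
    """
    return rating_a + k * (score_a - expected_score(rating_a, rating_b))


if __name__ == "__main__":
    # A 100-point lead corresponds to being preferred in roughly 64% of comparisons.
    print(round(expected_score(1600, 1500), 3))
    # Two equally rated models: the winner of a single comparison gains k/2 points.
    print(elo_update(1500, 1500, 1.0))
```

Under this model only rating differences matter, which is one reason an Elo score is meaningful within GDPval but not directly comparable to scores on other benchmarks.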

Question 3

What is the top reported GDPval score?

Accepted Answer

Claude Fable 5 has the top reported score on GDPval: 1932 (Elo).

Question 4

Why do GDPval scores differ across runs?

Accepted Answer

Harness, scaffold, reasoning effort, and prompt setup change results, so two runs of the same model can differ. evals.report keeps each score with its run context so the differences stay visible.

Question 5

Does evals.report rank models across benchmarks?

Accepted Answer

No. GDPval scores are shown within their own metric; evals.report never combines benchmarks into a composite ranking or a single "best model".
