GDPval
What this benchmark measures
GDPval evaluates AI models on real-world, economically valuable knowledge-work deliverables such as documents, spreadsheets, slides, and diagrams. The tasks span 44 occupations across 9 major U.S. GDP industries. Models run agentically, with shell and web access through a sandbox harness, and their outputs are scored by blind pairwise quality comparison. The Artificial Analysis GDPval-AA variant reports results as an Elo rating.
Rows on this page are sourced from public benchmark artifacts, leaderboard exports, or source-linked model reports. Each row keeps benchmark version, source model name, and available run details attached to the score.
The metric shown here is Elo. An Elo rating is relative to the pool of models compared within GDPval, so it should be interpreted within this benchmark only and not compared as part of a site-wide ranking.
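To make the Elo metric concrete, here is a minimal sketch of how pairwise comparison outcomes can be turned into ratings using the textbook Elo formula. It illustrates the general technique only: the source does not describe how GDPval-AA actually fits its ratings, and the scale of 400, the K-factor of 32, and the function names `expected_score` and `update_ratings` are assumptions chosen for illustration.

```python
def expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Probability that A's output is preferred over B's under the standard Elo model.

    The scale of 400 is the conventional Elo constant (an assumption here,
    not a documented GDPval-AA setting).
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


def update_ratings(
    rating_a: float,
    rating_b: float,
    outcome_a: float,
    k: float = 32.0,
) -> tuple[float, float]:
    """Apply one pairwise comparison result.

    outcome_a is 1.0 if A's deliverable was preferred, 0.0 if B's was,
    and 0.5 for a tie. The K-factor of 32 is an illustrative assumption.
    """
    delta = k * (outcome_a - expected_score(rating_a, rating_b))
    # The update is zero-sum: whatever A gains, B loses.
    return rating_a + delta, rating_b - delta


if __name__ == "__main__":
    # Two hypothetical models start level; A wins one blind comparison.
    a, b = update_ratings(1000.0, 1000.0, outcome_a=1.0)
    print(a, b)
```

Because each update depends only on the rating difference between the two models compared, the resulting numbers carry meaning only relative to other models rated in the same pool, which is why an Elo value from this page should not be lined up against scores from other benchmarks.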