WebDev Arena
A live, community-driven leaderboard where two LLMs compete head-to-head to build interactive web applications from user-submitted prompts, with human votes ranking models by a Bradley-Terry (Elo-like) score.
What this benchmark measures
A live, community-driven leaderboard where two LLMs compete head-to-head to build interactive web applications from user-submitted prompts, with human votes ranking models by a Bradley-Terry (Elo-like) score.
Rows on this page are sourced from public benchmark artifacts, leaderboard exports, or source-linked model reports. Each row keeps benchmark version, source model name, and available run details attached to the score.
The metric shown here is Elo. It should be interpreted within WebDev Arena, not compared as part of a site-wide ranking.