evals.report

WebDev Arena

A live, community-driven leaderboard where pairs of LLMs compete head-to-head to build interactive web applications from user-submitted prompts. Human votes on the results are aggregated into a Bradley-Terry (Elo-like) score that ranks the models.
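The Bradley-Terry model assumes that model i beats model j with probability p_i / (p_i + p_j), and fitting it to pairwise votes yields one strength per model. Below is a minimal sketch of turning votes into Elo-like ratings. It assumes votes arrive as hypothetical (winner, loser) pairs with ties dropped, and it uses the minorization-maximization (MM) update for the Bradley-Terry likelihood. Strengths are then mapped to an Elo-style scale (400 points per factor of 10 in odds, centered at 1000). The function names, vote format, scale, and centering are illustrative assumptions. This page does not describe the leaderboard's actual procedure (tie handling, confidence intervals, anchoring).

```python
import math
from collections import Counter


def fit_bradley_terry(votes, iters=500, scale=400.0, base=1000.0):
    """Illustrative sketch: fit Bradley-Terry strengths, return Elo-like ratings.

    votes: list of (winner, loser) model-name pairs (hypothetical format; ties
    are not modeled). Assumes every model has at least one win and one loss
    and that the comparison graph is connected, so the MLE exists.
    """
    models = sorted({m for pair in votes for m in pair})
    wins = Counter(w for w, _ in votes)
    losses = Counter(l for _, l in votes)
    for m in models:
        if wins[m] == 0 or losses[m] == 0:
            raise ValueError(f"{m} needs at least one win and one loss")
    # Number of head-to-head games between each unordered pair of models.
    games = Counter(frozenset(pair) for pair in votes)

    p = {m: 1.0 for m in models}  # strengths; only ratios p_i / p_j matter
    for _ in range(iters):
        # MM update: p_i <- W_i / sum_j n_ij / (p_i + p_j)
        new_p = {}
        for i in models:
            denom = sum(
                games[frozenset((i, j))] / (p[i] + p[j])
                for j in models
                if j != i
            )
            new_p[i] = wins[i] / denom
        # Fix the arbitrary overall scale: geometric mean of strengths = 1.
        log_gm = sum(math.log(v) for v in new_p.values()) / len(new_p)
        p = {m: v / math.exp(log_gm) for m, v in new_p.items()}

    # Elo-like rating, so that P(i beats j) = 1 / (1 + 10 ** ((R_j - R_i) / scale)).
    return {m: base + scale * math.log10(v) for m, v in p.items()}


def win_prob(r_i, r_j, scale=400.0):
    """Predicted probability that the model rated r_i beats the model rated r_j."""
    return 1.0 / (1.0 + 10 ** ((r_j - r_i) / scale))


# Usage with made-up votes: A wins most of its matchups, C loses most of its.
votes = (
    [("A", "B")] * 3 + [("B", "A")]
    + [("B", "C")] * 3 + [("C", "B")]
    + [("A", "C")] * 3 + [("C", "A")]
)
ratings = fit_bradley_terry(votes)
for name, r in sorted(ratings.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {r:.1f}")  # A ranks above B, which ranks above C
```

MM is used here because each update is closed-form and never decreases the likelihood. An unregularized logistic regression on the same pairs reaches the same maximum-likelihood ratings and is a common alternative when ties or regularization need handling.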

Chat preference · Elo · Higher is better

No run guide for this benchmark yet.