WebDev Arena
A live, community-driven leaderboard where two LLMs compete head-to-head to build interactive web applications from user-submitted prompts, with human votes ranking models by a Bradley-Terry (Elo-like) score.
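To make the scoring idea concrete, here is a minimal sketch of fitting Bradley-Terry strengths from pairwise votes and mapping them onto an Elo-like scale. It is an illustration only, not WebDev Arena's actual pipeline: the vote data, the function names `fit_bradley_terry` and `elo_like`, the base of 1000, and the 400-point scale are all assumptions made for this example. It also assumes no ties and that every model has at least one win.

```python
import math
from collections import defaultdict


def fit_bradley_terry(votes, iters=500):
    """Estimate Bradley-Terry strengths from (winner, loser) pairs.

    Uses the classic iterative (minorization-maximization) update.
    Assumes no ties and that every model wins at least once.
    """
    wins = defaultdict(int)   # total wins per model
    games = defaultdict(int)  # number of battles per unordered pair
    for winner, loser in votes:
        wins[winner] += 1
        games[frozenset((winner, loser))] += 1

    models = sorted({m for pair in games for m in pair})
    strength = {m: 1.0 for m in models}

    for _ in range(iters):
        updated = {}
        for m in models:
            denom = 0.0
            for pair, n in games.items():
                if m in pair:
                    (other,) = pair - {m}
                    denom += n / (strength[m] + strength[other])
            updated[m] = wins[m] / denom
        # Strengths are only defined up to a constant factor, so
        # normalise the geometric mean to 1 after every pass.
        log_mean = sum(math.log(s) for s in updated.values()) / len(updated)
        scale = math.exp(log_mean)
        strength = {m: s / scale for m, s in updated.items()}
    return strength


def elo_like(strength, base=1000.0, scale=400.0):
    """Map strengths to an Elo-style scale (base and scale are assumed)."""
    return {m: base + scale * math.log10(s) for m, s in strength.items()}


# Hypothetical votes: each tuple is (winner, loser) for one battle.
votes = (
    [("A", "B")] * 3 + [("B", "A")] * 1
    + [("B", "C")] * 2 + [("C", "B")] * 1
    + [("A", "C")] * 2 + [("C", "A")] * 1
)

strengths = fit_bradley_terry(votes)
scores = elo_like(strengths)

# Under the model, P(A beats B) = s_A / (s_A + s_B).
p_a_beats_b = strengths["A"] / (strengths["A"] + strengths["B"])

for model in sorted(scores, key=scores.get, reverse=True):
    print(model, round(scores[model], 1))
```

Unlike online Elo, which updates ratings one battle at a time and depends on vote order, a Bradley-Terry fit uses all votes at once, so the resulting ranking does not depend on the order in which votes arrived.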
Chat preference · Elo · Higher is better
No run guide for this benchmark yet.