evals.report

WebDev Arena

A live, community-driven leaderboard where two LLMs compete head-to-head to build interactive web applications from user-submitted prompts, with human votes ranking models by a Bradley-Terry (Elo-like) score.
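As a rough illustration of how head-to-head votes can become a Bradley-Terry score, the sketch below fits model strengths from a list of (winner, loser) votes with the classic minorization-maximization update and maps them onto an Elo-like scale. The function names, the base of 1000 and the scale of 400 are assumptions made for this example; the page does not state how WebDev Arena fits or scales its scores, and the votes are invented.

```python
import math
from collections import defaultdict


def fit_bradley_terry(votes, iterations=200):
    """Estimate Bradley-Terry strengths from (winner, loser) vote pairs.

    Hypothetical helper: assumes every model has at least one win and
    that ties are excluded from the vote list.
    """
    wins = defaultdict(float)
    pair_counts = defaultdict(float)
    models = set()
    for winner, loser in votes:
        wins[winner] += 1
        pair_counts[frozenset((winner, loser))] += 1
        models.update((winner, loser))

    strength = {m: 1.0 for m in models}
    for _ in range(iterations):
        updated = {}
        for m in models:
            denom = 0.0
            for other in models:
                if other == m:
                    continue
                n = pair_counts.get(frozenset((m, other)), 0.0)
                if n:
                    denom += n / (strength[m] + strength[other])
            updated[m] = wins[m] / denom if denom else strength[m]
        # Rescale so the strengths average to 1 (the model is scale-invariant).
        total = sum(updated.values())
        strength = {m: s * len(models) / total for m, s in updated.items()}
    return strength


def to_elo_scale(strength, base=1000.0, scale=400.0):
    """Map strengths to an Elo-like scale (base and scale are assumed values)."""
    return {m: base + scale * math.log10(s) for m, s in strength.items()}


# Invented votes for three placeholder models, 10 comparisons per pair.
votes = (
    [("model_a", "model_b")] * 7 + [("model_b", "model_a")] * 3
    + [("model_b", "model_c")] * 6 + [("model_c", "model_b")] * 4
    + [("model_a", "model_c")] * 8 + [("model_c", "model_a")] * 2
)
ratings = to_elo_scale(fit_bradley_terry(votes))
for name, rating in sorted(ratings.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {rating:.0f}")
```

Only differences between ratings carry meaning in this model, so the absolute numbers depend entirely on the assumed base and scale.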

Chat preference · Elo · Higher is better
| Model | Lab | Score | Source model | Status | Date |
| --- | --- | --- | --- | --- | --- |
| Claude Opus 4.7 | Anthropic | 1562 | | Verified | Apr 16, 2026 |
| Qwen3.7 Max Preview | Alibaba / Qwen | 1541 | | Verified | May 14, 2026 |
| Claude Opus 4.6 | Anthropic | 1538 | | Verified | Feb 5, 2026 |
| GLM-5.1 | Z.ai | 1533 | | Verified | Apr 7, 2026 |
| Claude Sonnet 4.6 | Anthropic | 1523 | | Verified | Feb 17, 2026 |
| Kimi K2.6 | Moonshot AI | 1518 | | Verified | Apr 20, 2026 |
| Muse Spark | Meta | 1508 | | Verified | Apr 8, 2026 |
| Gemini 3.5 Flash | Google DeepMind | 1506 | | Verified | May 19, 2026 |
| GPT-5.5 | OpenAI | 1505 | | Verified | Apr 23, 2026 |
| Qwen 3.6 Max Preview | Alibaba / Qwen | 1486 | | Verified | Apr 20, 2026 |
| MiMo-V2.5-Pro | Xiaomi | 1471 | | Verified | Apr 22, 2026 |
| Claude Opus 4.5 | Anthropic | 1467 | | Verified | Nov 24, 2025 |
| DeepSeek V4 Pro | DeepSeek | 1464 | | Verified | Apr 24, 2026 |
| Qwen 3.6 Plus | Alibaba / Qwen | 1460 | | Verified | Apr 2, 2026 |
| Gemini 3.1 Pro Preview | Google DeepMind | 1448 | | Verified | Feb 19, 2026 |
| GLM-4.7 | Z.ai | 1440 | | Verified | Dec 22, 2025 |
| MiMo-V2.5 | Xiaomi | 1440 | | Verified | Apr 22, 2026 |
| Gemini 3 Pro | Google DeepMind | 1438 | | Verified | Nov 18, 2025 |
| Gemini 3 Flash | Google DeepMind | 1437 | | Verified | Dec 17, 2025 |
| GLM-5 | Z.ai | 1436 | | Verified | Feb 11, 2026 |
| Kimi K2.5 | Moonshot AI | 1431 | | Verified | Jan 27, 2026 |
| GPT-5.3-Codex | OpenAI | 1407 | | Verified | Feb 5, 2026 |
| GPT-5.2 | OpenAI | 1404 | | Verified | Dec 11, 2025 |
| MiniMax M2.7 | MiniMax | 1401 | | Verified | Mar 18, 2026 |
| Grok 4.20 beta reasoning | xAI | 1395 | | Verified | Mar 9, 2026 |
| GPT-5 | OpenAI | 1394 | | Verified | Aug 7, 2025 |
| Qwen3.5-397B-A17B | Alibaba / Qwen | 1393 | | Verified | Feb 16, 2026 |
| MiniMax M2.1 | MiniMax | 1392 | | Verified | Dec 23, 2025 |
| GPT-5.1 | OpenAI | 1391 | | Verified | Nov 12, 2025 |
| GPT-5.4 | OpenAI | 1388 | | Verified | Mar 5, 2026 |
| Claude Sonnet 4.5 | Anthropic | 1386 | | Verified | Sep 29, 2025 |
| Claude Opus 4.1 | Anthropic | 1386 | | Verified | Aug 5, 2025 |
| MiniMax M2.5 | MiniMax | 1382 | | Verified | Feb 12, 2026 |
| Grok 4.3 | xAI | 1377 | | Verified | Apr 17, 2026 |
| GLM-4.6 | Z.ai | 1355 | | Verified | Sep 30, 2025 |
| GPT-5.2-Codex | OpenAI | 1335 | | Verified | Dec 18, 2025 |
| DeepSeek V3.2 | DeepSeek | 1332 | | Verified | Dec 1, 2025 |
| Kimi K2 Thinking | Moonshot AI | 1329 | | Verified | Nov 6, 2025 |
| Claude Haiku 4.5 | Anthropic | 1322 | | Verified | Oct 15, 2025 |
| Qwen 3 Coder 480B | Alibaba / Qwen | 1282 | | Verified | Jul 22, 2025 |
| Grok 4.1 fast reasoning | xAI | 1234 | | Verified | Nov 19, 2025 |
| Mistral Large | Mistral AI | 1223 | | Verified | Feb 26, 2024 |
| Gemini 2.5 Pro | Google DeepMind | 1204 | | Verified | Mar 25, 2025 |

Each row reports the model’s Elo on WebDev Arena.
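To get a feel for what a gap between two rows might mean, the sketch below converts a score difference into an expected head-to-head win rate. It assumes the conventional Elo logistic curve with a 400-point scale; the page does not confirm that WebDev Arena uses this exact scaling, and the function name is hypothetical.

```python
def expected_win_rate(score_a: float, score_b: float, scale: float = 400.0) -> float:
    """Expected share of votes won by A over B under the standard Elo curve.

    Assumption: a 400-point logistic scale, which the leaderboard page
    itself does not specify.
    """
    return 1.0 / (1.0 + 10 ** ((score_b - score_a) / scale))


# Top two rows of the table: 1562 vs 1541.
print(f"{expected_win_rate(1562, 1541):.3f}")
# Top row vs bottom row: 1562 vs 1204.
print(f"{expected_win_rate(1562, 1204):.3f}")
```

Under that assumption the 21-point gap between the top two rows corresponds to about a 53% expected win rate, so small differences near the top of the table are close to a coin flip, and the page gives no confidence intervals to judge them by.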