WebDev Arena
A live, community-driven leaderboard where two LLMs compete head-to-head to build interactive web applications from user-submitted prompts, with human votes ranking models by a Bradley-Terry (Elo-like) score.
Chat preference · Elo · Higher is better
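
To make the ranking method concrete, here is a minimal sketch of how a Bradley-Terry fit can turn pairwise human votes into Elo-like scores. It is an illustration under stated assumptions, not the leaderboard's actual pipeline: the function name `fit_bradley_terry`, the model names `"A"` and `"B"`, the 1000-point anchor, and the 400-point base-10 scale are all hypothetical choices, and ties are ignored.

```python
import math
from collections import defaultdict


def fit_bradley_terry(votes, iters=500, base=1000.0, scale=400.0):
    """Fit Bradley-Terry strengths from (winner, loser) votes.

    Returns a dict mapping model name to an Elo-like score.
    `base` and `scale` are illustrative assumptions, not the leaderboard's.
    """
    wins = defaultdict(float)   # total wins per model
    games = defaultdict(float)  # number of battles per unordered pair
    models = set()
    for winner, loser in votes:
        wins[winner] += 1.0
        games[frozenset((winner, loser))] += 1.0
        models.update((winner, loser))

    strength = {m: 1.0 for m in models}
    for _ in range(iters):
        updated = {}
        for i in models:
            # Minorization-maximization update: wins_i / sum_j n_ij / (p_i + p_j)
            denom = sum(
                games[frozenset((i, j))] / (strength[i] + strength[j])
                for j in models
                if j != i and frozenset((i, j)) in games
            )
            # The floor keeps a winless model finite instead of hitting log(0).
            updated[i] = max(wins[i], 1e-9) / denom if denom else strength[i]
        # Strengths are only identified up to a constant factor, so pin the
        # geometric mean to 1 after every pass.
        log_mean = sum(math.log(s) for s in updated.values()) / len(updated)
        strength = {m: s / math.exp(log_mean) for m, s in updated.items()}

    return {m: base + scale * math.log10(s) for m, s in strength.items()}


# Hypothetical votes: "A" is preferred over "B" in 3 of 4 battles.
votes = [("A", "B")] * 3 + [("B", "A")]
scores = fit_bradley_terry(votes)
print(round(scores["A"] - scores["B"], 1))  # 190.8, i.e. 400 * log10(3)
```

With a 3-to-1 vote split the fitted strength ratio is 3, so the score gap is 400 · log10(3). Only differences between scores carry meaning; the absolute level depends on the chosen anchor.
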
| Model | Lab | Score (Elo) | Source model | Status | Date | |
|---|---|---|---|---|---|---|
| Claude Opus 4.7 | Anthropic | 1562 | — | Verified | Apr 16, 2026 | Details |
| Qwen3.7 Max Preview | Alibaba / Qwen | 1541 | — | Verified | May 14, 2026 | Details |
| Claude Opus 4.6 | Anthropic | 1538 | — | Verified | Feb 5, 2026 | Details |
| GLM-5.1 | Z.ai | 1533 | — | Verified | Apr 7, 2026 | Details |
| Claude Sonnet 4.6 | Anthropic | 1523 | — | Verified | Feb 17, 2026 | Details |
| Kimi K2.6 | Moonshot AI | 1518 | — | Verified | Apr 20, 2026 | Details |
| Muse Spark | Meta | 1508 | — | Verified | Apr 8, 2026 | Details |
| Gemini 3.5 Flash | Google DeepMind | 1506 | — | Verified | May 19, 2026 | Details |
| GPT-5.5 | OpenAI | 1505 | — | Verified | Apr 23, 2026 | Details |
| Qwen 3.6 Max Preview | Alibaba / Qwen | 1486 | — | Verified | Apr 20, 2026 | Details |
| MiMo-V2.5-Pro | Xiaomi | 1471 | — | Verified | Apr 22, 2026 | Details |
| Claude Opus 4.5 | Anthropic | 1467 | — | Verified | Nov 24, 2025 | Details |
| DeepSeek V4 Pro | DeepSeek | 1464 | — | Verified | Apr 24, 2026 | Details |
| Qwen 3.6 Plus | Alibaba / Qwen | 1460 | — | Verified | Apr 2, 2026 | Details |
| Gemini 3.1 Pro Preview | Google DeepMind | 1448 | — | Verified | Feb 19, 2026 | Details |
| GLM-4.7 | Z.ai | 1440 | — | Verified | Dec 22, 2025 | Details |
| MiMo-V2.5 | Xiaomi | 1440 | — | Verified | Apr 22, 2026 | Details |
| Gemini 3 Pro | Google DeepMind | 1438 | — | Verified | Nov 18, 2025 | Details |
| Gemini 3 Flash | Google DeepMind | 1437 | — | Verified | Dec 17, 2025 | Details |
| GLM-5 | Z.ai | 1436 | — | Verified | Feb 11, 2026 | Details |
| Kimi K2.5 | Moonshot AI | 1431 | — | Verified | Jan 27, 2026 | Details |
| GPT-5.3-Codex | OpenAI | 1407 | — | Verified | Feb 5, 2026 | Details |
| GPT-5.2 | OpenAI | 1404 | — | Verified | Dec 11, 2025 | Details |
| MiniMax M2.7 | MiniMax | 1401 | — | Verified | Mar 18, 2026 | Details |
| Grok 4.20 beta reasoning | xAI | 1395 | — | Verified | Mar 9, 2026 | Details |
| GPT-5 | OpenAI | 1394 | — | Verified | Aug 7, 2025 | Details |
| Qwen3.5-397B-A17B | Alibaba / Qwen | 1393 | — | Verified | Feb 16, 2026 | Details |
| MiniMax M2.1 | MiniMax | 1392 | — | Verified | Dec 23, 2025 | Details |
| GPT-5.1 | OpenAI | 1391 | — | Verified | Nov 12, 2025 | Details |
| GPT-5.4 | OpenAI | 1388 | — | Verified | Mar 5, 2026 | Details |
| Claude Sonnet 4.5 | Anthropic | 1386 | — | Verified | Sep 29, 2025 | Details |
| Claude Opus 4.1 | Anthropic | 1386 | — | Verified | Aug 5, 2025 | Details |
| MiniMax M2.5 | MiniMax | 1382 | — | Verified | Feb 12, 2026 | Details |
| Grok 4.3 | xAI | 1377 | — | Verified | Apr 17, 2026 | Details |
| GLM-4.6 | Z.ai | 1355 | — | Verified | Sep 30, 2025 | Details |
| GPT-5.2-Codex | OpenAI | 1335 | — | Verified | Dec 18, 2025 | Details |
| DeepSeek V3.2 | DeepSeek | 1332 | — | Verified | Dec 1, 2025 | Details |
| Kimi K2 Thinking | Moonshot AI | 1329 | — | Verified | Nov 6, 2025 | Details |
| Claude Haiku 4.5 | Anthropic | 1322 | — | Verified | Oct 15, 2025 | Details |
| Qwen 3 Coder 480B | Alibaba / Qwen | 1282 | — | Verified | Jul 22, 2025 | Details |
| Grok 4.1 fast reasoning | xAI | 1234 | — | Verified | Nov 19, 2025 | Details |
| Mistral Large | Mistral AI | 1223 | — | Verified | Feb 26, 2024 | Details |
| Gemini 2.5 Pro | Google DeepMind | 1204 | — | Verified | Mar 25, 2025 | Details |
Each row reports the model’s Elo-like (Bradley-Terry) score on WebDev Arena; a higher score is better.
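
To read a gap between two rows, the score difference can be converted into an expected head-to-head preference rate. The sketch below assumes the conventional Elo logistic curve with a 400-point base-10 scale; the page does not state which scale the leaderboard uses, so the function `expected_win_rate` and its `scale` default are assumptions, and ties are ignored.

```python
def expected_win_rate(score_a, score_b, scale=400.0):
    """Probability that A is preferred over B under an Elo-style logistic curve.

    `scale=400.0` is the conventional Elo constant, assumed here for illustration.
    """
    return 1.0 / (1.0 + 10.0 ** ((score_b - score_a) / scale))


# Scores taken from the table above.
top_two = expected_win_rate(1562, 1541)      # Claude Opus 4.7 vs Qwen3.7 Max Preview
top_vs_last = expected_win_rate(1562, 1204)  # Claude Opus 4.7 vs Gemini 2.5 Pro
print(f"{top_two:.2f} {top_vs_last:.2f}")    # 0.53 0.89
```

Under this assumption, the 21-point gap between the top two rows corresponds to only a slight edge in head-to-head votes, while the 358-point gap between the first and last rows implies a strong preference.
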