Design Arena
A crowdsourced human-preference benchmark in which AI models receive identical design and frontend prompts and users vote head-to-head on the anonymized outputs. The votes produce a Bradley-Terry (Elo-style) ranking of design taste across categories such as websites, UI components, games, and data visualization.
Chat preference · Elo · Higher is better
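To make the ranking method concrete, here is a minimal sketch of how head-to-head votes can be turned into Bradley-Terry strengths and then placed on an Elo-like scale. It is not Design Arena's actual pipeline, which this page does not describe. The function names, the toy vote data, the base of 1000, and the 400-point logarithmic scale are all assumptions chosen for illustration.

```python
import math
from collections import defaultdict


def fit_bradley_terry(votes, iterations=200):
    """Estimate Bradley-Terry strengths from (winner, loser) vote pairs.

    Uses the classic minorization-maximization update. Assumes every model
    has at least one win; a model with zero wins would collapse to strength 0.
    """
    wins = defaultdict(float)
    games = defaultdict(float)  # number of comparisons per unordered pair
    models = set()
    for winner, loser in votes:
        wins[winner] += 1.0
        games[frozenset((winner, loser))] += 1.0
        models.update((winner, loser))

    strength = {m: 1.0 for m in models}
    for _ in range(iterations):
        updated = {}
        for m in models:
            denom = sum(
                games[frozenset((m, other))] / (strength[m] + strength[other])
                for other in models
                if other != m and frozenset((m, other)) in games
            )
            updated[m] = wins[m] / denom
        # Strengths are only defined up to a common factor, so fix the
        # geometric mean at 1 after every pass.
        log_mean = sum(math.log(s) for s in updated.values()) / len(updated)
        strength = {m: s / math.exp(log_mean) for m, s in updated.items()}
    return strength


def to_elo_scale(strength, base=1000.0, scale=400.0):
    """Map strengths to an Elo-like scale (base and scale are assumptions)."""
    return {m: base + scale * math.log10(s) for m, s in strength.items()}


# Hypothetical votes: each tuple is (winner, loser) for one comparison.
votes = (
    [("model_a", "model_b")] * 7 + [("model_b", "model_a")] * 3
    + [("model_b", "model_c")] * 6 + [("model_c", "model_b")] * 4
    + [("model_a", "model_c")] * 8 + [("model_c", "model_a")] * 2
)

ratings = to_elo_scale(fit_bradley_terry(votes))
for model, rating in sorted(ratings.items(), key=lambda kv: -kv[1]):
    print(f"{model}: {rating:.0f}")
```

Unlike online Elo updates, a Bradley-Terry fit uses all votes at once, so the result does not depend on the order in which votes arrived.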
| Model | Lab | Score (Elo) | Source model | Status | Date |
|---|---|---|---|---|---|
| Claude Opus 4.6 | Anthropic | 1344 | — | Verified | Feb 5, 2026 |
| GLM-5.1 | Z.ai | 1335 | — | Verified | Apr 7, 2026 |
| Kimi K2.6 | Moonshot AI | 1335 | — | Verified | Apr 20, 2026 |
| Claude Opus 4.7 | Anthropic | 1328 | — | Verified | Apr 16, 2026 |
| Claude Sonnet 4.6 | Anthropic | 1327 | — | Verified | Feb 17, 2026 |
| MiMo-V2.5-Pro | Xiaomi | 1325 | — | Verified | Apr 22, 2026 |
| MiniMax M3 | MiniMax | 1321 | — | Verified | Jun 1, 2026 |
| MiMo-V2.5 | Xiaomi | 1309 | — | Verified | Apr 22, 2026 |
| Muse Spark | Meta | 1306 | — | Verified | Apr 8, 2026 |
| DeepSeek V4 Pro | DeepSeek | 1302 | — | Verified | Apr 24, 2026 |
| GPT-5.5 | OpenAI | 1301 | — | Verified | Apr 23, 2026 |
| GLM-5 | Z.ai | 1300 | — | Verified | Feb 11, 2026 |
| Gemini 3.5 Flash | Google DeepMind | 1297 | — | Verified | May 19, 2026 |
| Claude Opus 4.5 | Anthropic | 1295 | — | Verified | Nov 24, 2025 |
| Gemini 3 Pro | Google DeepMind | 1295 | — | Verified | Nov 18, 2025 |
| Kimi K2.5 | Moonshot AI | 1292 | — | Verified | Jan 27, 2026 |
| MiniMax M2.7 | MiniMax | 1285 | — | Verified | Mar 18, 2026 |
| Claude Opus 4.8 | Anthropic | 1282 | — | Verified | May 28, 2026 |
| Gemini 3.1 Pro Preview | Google DeepMind | 1282 | — | Verified | Feb 19, 2026 |
| Qwen 3.6 Plus | Alibaba / Qwen | 1281 | — | Verified | Apr 2, 2026 |
| GLM-4.7 | Z.ai | 1273 | — | Verified | Dec 22, 2025 |
| Grok 4.20 beta reasoning | xAI | 1271 | — | Verified | Mar 9, 2026 |
| DeepSeek V4 Flash | DeepSeek | 1268 | — | Verified | Apr 24, 2026 |
| GPT-5.4 | OpenAI | 1264 | — | Verified | Mar 5, 2026 |
| MiniMax M2.5 | MiniMax | 1261 | — | Verified | Feb 12, 2026 |
| Grok 4.3 | xAI | 1260 | — | Verified | Apr 17, 2026 |
| MiniMax M2.1 | MiniMax | 1245 | — | Verified | Dec 23, 2025 |
| Gemini 3 Flash | Google DeepMind | 1244 | — | Verified | Dec 17, 2025 |
| Claude Sonnet 4.5 | Anthropic | 1235 | — | Verified | Sep 29, 2025 |
| Qwen3.5-397B-A17B | Alibaba / Qwen | 1233 | — | Verified | Feb 16, 2026 |
| Claude 3.7 Sonnet | Anthropic | 1231 | — | Verified | Feb 24, 2025 |
| GPT-5.2 | OpenAI | 1224 | — | Verified | Dec 11, 2025 |
| GPT-5 | OpenAI | 1223 | — | Verified | Aug 7, 2025 |
| DeepSeek V3.2 | DeepSeek | 1220 | — | Verified | Dec 1, 2025 |
| GLM-4.6 | Z.ai | 1220 | — | Verified | Sep 30, 2025 |
| Claude Opus 4.1 | Anthropic | 1219 | — | Verified | Aug 5, 2025 |
| GPT-5.1 | OpenAI | 1216 | — | Verified | Nov 12, 2025 |
| Claude Opus 4 | Anthropic | 1215 | — | Verified | May 22, 2025 |
| Gemini 2.5 Pro | Google DeepMind | 1208 | — | Verified | Mar 25, 2025 |
| GPT-5.3-Codex | OpenAI | 1199 | — | Verified | Feb 5, 2026 |
| Qwen 3 Coder 480B | Alibaba / Qwen | 1197 | — | Verified | Jul 22, 2025 |
| Claude Sonnet 4 | Anthropic | 1196 | — | Verified | May 22, 2025 |
| DeepSeek R1 | DeepSeek | 1193 | — | Verified | Jan 20, 2025 |
| Mistral Medium 3.5 | Mistral AI | 1176 | — | Verified | Apr 28, 2026 |
| GPT-5 mini | OpenAI | 1170 | — | Verified | Aug 7, 2025 |
| Claude Haiku 4.5 | Anthropic | 1169 | — | Verified | Oct 15, 2025 |
| DeepSeek V3.1 | DeepSeek | 1166 | — | Verified | Aug 21, 2025 |
| Qwen3 Max | Alibaba / Qwen | 1165 | — | Verified | Sep 5, 2025 |
| DeepSeek V3 0324 | DeepSeek | 1163 | — | Verified | Mar 24, 2025 |
| Grok 4.1 fast reasoning | xAI | 1142 | — | Verified | Nov 19, 2025 |
| Gemini 2.5 Flash | Google DeepMind | 1113 | — | Verified | Apr 17, 2025 |
| Qwen3 235B A22B Instruct 2507 | Alibaba / Qwen | 1093 | — | Verified | Jul 21, 2025 |
| Kimi K2 Instruct | Moonshot AI | 1088 | — | Verified | Jul 11, 2025 |
| GPT-4.1 | OpenAI | 1080 | — | Verified | Apr 14, 2025 |
| o3 | OpenAI | 1074 | — | Verified | Apr 16, 2025 |
| Grok 4 | xAI | 1070 | — | Verified | Jul 9, 2025 |
| o4-mini | OpenAI | 1030 | — | Verified | Apr 16, 2025 |
| OLMo 3.1-Think 32B | Allen Institute for AI | 1029 | — | Verified | Dec 12, 2025 |
| GPT-OSS-120B | OpenAI | 1017 | — | Verified | Aug 5, 2025 |
| Llama 4 Maverick | Meta | 934 | — | Verified | Apr 5, 2025 |
| GPT-4o | OpenAI | 915 | — | Verified | May 13, 2024 |
| Llama 4 Scout | Meta | 844 | — | Verified | Apr 5, 2025 |
Each row reports the model’s Elo on Design Arena.
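As a rough guide to reading score gaps, the sketch below converts an Elo difference into an expected head-to-head win rate. It assumes the conventional logistic Elo curve with a 400-point scale, which this page does not confirm for Design Arena. The function name `expected_win_probability` is hypothetical, and the scores are taken from the table above.

```python
def expected_win_probability(rating_a, rating_b, scale=400.0):
    """Expected share of votes for A over B under a logistic Elo curve.

    The 400-point scale is the conventional Elo choice, assumed here.
    """
    return 1.0 / (1.0 + 10.0 ** (-(rating_a - rating_b) / scale))


# A 9-point gap is close to a coin flip (about 0.51).
top_vs_second = expected_win_probability(1344, 1335)

# A 500-point gap implies a heavily lopsided matchup (about 0.95).
top_vs_bottom = expected_win_probability(1344, 844)

print(f"1344 vs 1335: {top_vs_second:.3f}")
print(f"1344 vs 844:  {top_vs_bottom:.3f}")
```

Under this assumption, small gaps near the top of the table correspond to win rates only slightly above 50%, so adjacent ranks should not be over-interpreted.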