evals.report

LMArena

A public chat-preference evaluation surface with source-defined preference ratings and model comparisons.

Chat preference · source-defined rating · Higher is better
| Model | Lab | Score | Source model | Status | Date |
|---|---|---|---|---|---|
| Claude Opus 4.6 thinking | Anthropic | 1499 | claude-opus-4-6-thinking | Official | May 27, 2026 |
| Claude Opus 4.6 | Anthropic | 1497 | claude-opus-4-6 | Official | May 27, 2026 |
| Claude Opus 4.7 thinking | Anthropic | 1486 | claude-opus-4-7-thinking | Official | May 27, 2026 |
| Gemini 3.5 Flash | Google DeepMind | 1482 | gemini-3.5-flash | Official | May 27, 2026 |
| Gemini 3.1 Pro Preview | Google DeepMind | 1481 | gemini-3.1-pro-preview | Official | May 27, 2026 |
| Claude Opus 4.7 | Anthropic | 1480 | claude-opus-4-7 | Official | May 27, 2026 |
| Gemini 3 Pro | Google DeepMind | 1479 | gemini-3-pro | Official | May 27, 2026 |
| Qwen3.7 Max Preview | Alibaba / Qwen | 1474 | qwen3.7-max-preview | Official | May 27, 2026 |
| Muse Spark | Meta | 1474 | muse-spark | Official | May 27, 2026 |
| GPT-5.4 | OpenAI | 1472 | gpt-5.4-high | Official | May 27, 2026 |
| Qwen3.5 Max Preview | Alibaba / Qwen | 1470 | qwen3.5-max-preview | Official | May 27, 2026 |
| ERNIE 5.1 | Baidu | 1469 | ernie-5.1 | Official | May 27, 2026 |
| GLM-5.1 | Z.ai | 1469 | glm-5.1 | Official | May 27, 2026 |
| GPT-5.5 high | OpenAI | 1468 | gpt-5.5-high | Official | May 27, 2026 |
| Gemini 3 Flash | Google DeepMind | 1466 | gemini-3-flash | Official | May 27, 2026 |
| GPT-5.5 | OpenAI | 1463 | gpt-5.5 | Official | May 27, 2026 |
| Gemini 2.5 Pro | Google DeepMind | 1457 | gemini-2.5-pro | Official | May 27, 2026 |
| Claude Sonnet 4.6 | Anthropic | 1454 | claude-sonnet-4-6 | Official | May 27, 2026 |
| Grok 4.20 beta reasoning | xAI | 1453 | grok-4.20-beta-0309-reasoning | Official | May 27, 2026 |
| DeepSeek V4 Pro | DeepSeek | 1446 | deepseek-v4-pro | Official | May 27, 2026 |

Each row reports the model’s source-defined rating on LMArena.