evals.report

Search Arena

A crowdsourced human-preference leaderboard from LMArena that ranks search-augmented LLMs via blind pairwise votes on grounded, web-search answers, reported as Bradley-Terry Elo-scale ratings.
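Search Arena turns pairwise votes into ratings by fitting a Bradley-Terry model. The sketch below shows one standard way to fit such a model: Hunter's minorization-maximization (MM) updates, followed by a mapping onto an Elo-style scale. It is not LMArena's pipeline. The `(winner, loser)` vote format, dropping ties, the base-10 400-point scale, and anchoring the mean rating at 1000 are all assumptions, and `fit_bradley_terry` and `to_elo_scale` are hypothetical names.

```python
import math
from collections import defaultdict


def fit_bradley_terry(votes, iters=1000, tol=1e-10):
    """Fit Bradley-Terry strengths from (winner, loser) pairs via Hunter's MM updates.

    Hypothetical helper. Ties are assumed to have been dropped before calling.
    """
    wins = defaultdict(int)    # total wins per model
    losses = defaultdict(int)  # total losses per model
    games = defaultdict(int)   # number of votes per unordered model pair
    for winner, loser in votes:
        if winner == loser:
            continue  # ignore malformed self-comparisons
        wins[winner] += 1
        losses[loser] += 1
        games[frozenset((winner, loser))] += 1
    models = set(wins) | set(losses)
    # The MLE only exists if every model wins and loses at least once.
    # This check is necessary but not sufficient: the win graph must also be strongly connected.
    if any(wins[m] == 0 or losses[m] == 0 for m in models):
        raise ValueError("every model needs at least one win and one loss")

    strength = {m: 1.0 for m in models}
    for _ in range(iters):
        new = {}
        for i in models:
            # MM update: p_i <- W_i / sum_j n_ij / (p_i + p_j)
            denom = sum(
                n / (strength[i] + strength[j])
                for pair, n in games.items() if i in pair
                for j in pair - {i}
            )
            new[i] = wins[i] / denom
        # Fix the model's free scale: rescale so the geometric mean strength is 1.
        log_mean = sum(math.log(p) for p in new.values()) / len(new)
        new = {m: p / math.exp(log_mean) for m, p in new.items()}
        delta = max(abs(new[m] - strength[m]) for m in models)
        strength = new
        if delta < tol:
            break
    return strength


def to_elo_scale(strength, scale=400.0, anchor=1000.0):
    """Map BT strengths to Elo-style ratings: R = anchor + scale * log10(p).

    With this mapping, p_i / (p_i + p_j) = 1 / (1 + 10 ** ((R_j - R_i) / scale)).
    """
    return {m: anchor + scale * math.log10(p) for m, p in strength.items()}


# Made-up toy votes: A beats B 7-3, B beats C 6-4, A beats C 8-2.
votes = ([("A", "B")] * 7 + [("B", "A")] * 3
         + [("B", "C")] * 6 + [("C", "B")] * 4
         + [("A", "C")] * 8 + [("C", "A")] * 2)
ratings = to_elo_scale(fit_bradley_terry(votes))
for model, r in sorted(ratings.items(), key=lambda kv: -kv[1]):
    print(f"{model}: {r:.0f}")
```

With these toy votes the fit ranks A above B above C, and the ratings average exactly 1000 because of the geometric-mean normalization. Fitting a logistic regression on the same votes should reach the same maximum-likelihood estimate. MM was chosen here only because it needs nothing beyond the standard library.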

Category: Chat preference. Metric: Elo (higher is better).
| Model | Lab | Score (Elo) | Status | Date |
|---|---|---|---|---|
| Claude Opus 4.6 | Anthropic | 1251 | Verified | Feb 5, 2026 |
| GPT-5.5 | OpenAI | 1239 | Verified | Apr 23, 2026 |
| Claude Opus 4.7 | Anthropic | 1237 | Verified | Apr 16, 2026 |
| ERNIE 5.1 | Baidu | 1226 | Verified | May 8, 2026 |
| Claude Sonnet 4.6 | Anthropic | 1219 | Verified | Feb 17, 2026 |
| Gemini 3.1 Pro Preview | Google DeepMind | 1216 | Verified | Feb 19, 2026 |
| GPT-5.2 | OpenAI | 1210 | Verified | Dec 11, 2025 |
| Gemini 3 Pro | Google DeepMind | 1208 | Verified | Nov 18, 2025 |
| Gemini 3 Flash | Google DeepMind | 1206 | Verified | Dec 17, 2025 |
| GPT-5.1 | OpenAI | 1199 | Verified | Nov 12, 2025 |
| GPT-5.4 | OpenAI | 1199 | Verified | Mar 5, 2026 |
| Grok 4.20 beta reasoning | xAI | 1193 | Verified | Mar 9, 2026 |
| Grok 4.3 | xAI | 1189 | Verified | Apr 17, 2026 |
| Claude Opus 4.5 | Anthropic | 1182 | Verified | Nov 24, 2025 |
| Grok 4.1 fast reasoning | xAI | 1175 | Verified | Nov 19, 2025 |
| Claude Sonnet 4.5 | Anthropic | 1152 | Verified | Sep 29, 2025 |
| Claude Opus 4.1 | Anthropic | 1148 | Verified | Aug 5, 2025 |
| o3 | OpenAI | 1144 | Verified | Apr 16, 2025 |
| Gemini 2.5 Pro | Google DeepMind | 1143 | Verified | Mar 25, 2025 |
| Grok 4 | xAI | 1143 | Verified | Jul 9, 2025 |
| GPT-5 | OpenAI | 1134 | Verified | Aug 7, 2025 |
| Claude Opus 4 | Anthropic | 1128 | Verified | May 22, 2025 |
| GPT-4o | OpenAI | 1006 | Verified | May 13, 2024 |

Each row reports the model’s Elo rating on Search Arena.
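Because the ratings are on an Elo scale, the gap between two rows can be read as an expected preference rate in a head-to-head vote. The sketch below assumes the conventional base-10, 400-point mapping, although the source says only "Elo-scale". It also treats the listed scores as exact and ignores sampling uncertainty. `expected_win_prob` is a hypothetical helper name.

```python
def expected_win_prob(r_a: float, r_b: float, scale: float = 400.0) -> float:
    """Probability that A is preferred over B under the Elo / Bradley-Terry model (assumed scale)."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / scale))


# Scores copied from the leaderboard rows above.
ratings = {
    "Claude Opus 4.6": 1251,
    "GPT-5.5": 1239,
    "GPT-4o": 1006,
}
for a, b in [("Claude Opus 4.6", "GPT-5.5"), ("Claude Opus 4.6", "GPT-4o")]:
    print(f"{a} vs {b}: {expected_win_prob(ratings[a], ratings[b]):.3f}")
```

Under these assumptions, the 12-point gap between the top two rows implies a preference rate of only about 52%. The 245-point gap between Claude Opus 4.6 and GPT-4o implies roughly 80%.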