evals.report

Search Arena

A crowdsourced human-preference leaderboard from LMArena that ranks search-augmented LLMs via blind pairwise votes on grounded, web-search answers, reported as Bradley-Terry Elo-scale ratings.
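To make "Bradley-Terry Elo-scale ratings" concrete, here is a minimal sketch of how pairwise votes can be turned into such ratings. It is an illustration only, not LMArena's actual pipeline: the function names, the 1000-point anchor, and the 400-point scale are assumptions, ties are ignored, and every model is assumed to have at least one win.

```python
import math
from collections import defaultdict


def fit_bradley_terry(votes, iters=200):
    """Estimate Bradley-Terry strengths from (winner, loser) vote pairs.

    Uses the classic minorization-maximization update. Assumes every
    model has at least one win, so all strengths stay positive.
    """
    models = sorted({m for pair in votes for m in pair})
    wins = defaultdict(int)    # total wins per model
    games = defaultdict(int)   # number of comparisons per unordered pair
    for winner, loser in votes:
        wins[winner] += 1
        games[frozenset((winner, loser))] += 1

    p = {m: 1.0 for m in models}
    for _ in range(iters):
        new_p = {}
        for i in models:
            denom = sum(
                games[frozenset((i, j))] / (p[i] + p[j])
                for j in models
                if j != i
            )
            new_p[i] = wins[i] / denom
        # Fix the scale: normalize so the geometric mean strength is 1.
        log_mean = sum(math.log(v) for v in new_p.values()) / len(new_p)
        p = {m: v / math.exp(log_mean) for m, v in new_p.items()}
    return p


def to_elo_scale(strengths, base=1000.0, scale=400.0):
    """Map strengths onto an Elo-like scale (base and scale are assumed values)."""
    return {m: base + scale * math.log10(s) for m, s in strengths.items()}


def win_probability(rating_a, rating_b, scale=400.0):
    """Probability that A is preferred over B under the fitted model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


# Hypothetical votes: model "A" is preferred over model "B" in 3 of 4 battles.
votes = [("A", "B"), ("A", "B"), ("A", "B"), ("B", "A")]
ratings = to_elo_scale(fit_bradley_terry(votes))
```

On this scale a rating gap maps directly to a preference probability, so `win_probability(ratings["A"], ratings["B"])` recovers the 3-in-4 vote share from the toy data. Only rating differences carry meaning; the anchor value is arbitrary.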

Chat preference | Elo | Higher is better

No run guide for this benchmark yet.