Search Arena
A crowdsourced human-preference leaderboard from LMArena that ranks search-augmented LLMs by blind pairwise votes on their web-grounded answers. Results are reported as Bradley-Terry ratings on an Elo scale.
Chat preference · Elo · Higher is better
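To make the scoring concrete, here is a minimal sketch of how pairwise votes could be turned into Bradley-Terry ratings on an Elo scale. It is an illustration only, not LMArena's actual pipeline: the function name `bradley_terry_elo`, the scale of 400, the anchor of 1000, the iterative fitting method, and the decision to ignore ties are all assumptions made for this example.

```python
import math
from collections import defaultdict


def bradley_terry_elo(votes, scale=400.0, base=1000.0, iters=200):
    """Fit Bradley-Terry strengths from (winner, loser) pairs and map them
    to an Elo-like scale. Hypothetical helper; scale and base are assumed."""
    wins = defaultdict(float)
    losses = defaultdict(float)
    # games[a][b] = number of votes in which a and b were compared
    games = defaultdict(lambda: defaultdict(float))
    for winner, loser in votes:
        if winner == loser:
            raise ValueError("a vote must compare two different models")
        wins[winner] += 1.0
        losses[loser] += 1.0
        games[winner][loser] += 1.0
        games[loser][winner] += 1.0

    models = list(games)
    # Simple necessary check: a model that never wins or never loses
    # has no finite maximum-likelihood rating.
    for m in models:
        if wins[m] == 0 or losses[m] == 0:
            raise ValueError(f"{m!r} needs at least one win and one loss")

    strength = {m: 1.0 for m in models}
    for _ in range(iters):
        # Minorization-maximization update for the Bradley-Terry likelihood.
        updated = {
            m: wins[m] / sum(n / (strength[m] + strength[o])
                             for o, n in games[m].items())
            for m in models
        }
        # Strengths are only defined up to a constant factor, so fix the
        # geometric mean at 1 to keep the iteration numerically stable.
        log_mean = sum(math.log(s) for s in updated.values()) / len(updated)
        strength = {m: s / math.exp(log_mean) for m, s in updated.items()}

    # On this scale a 400-point gap corresponds to 10:1 odds of winning a vote.
    return {m: base + scale * math.log10(s) for m, s in strength.items()}


if __name__ == "__main__":
    # Toy data with made-up model names: model-a wins 3 of 4 votes.
    toy_votes = [("model-a", "model-b")] * 3 + [("model-b", "model-a")]
    for name, rating in sorted(bradley_terry_elo(toy_votes).items()):
        print(name, round(rating, 1))
```

In the toy data the winner takes 3 of 4 votes, so the fitted strength ratio is 3:1 and the rating gap is 400 · log10(3), roughly 191 points. Only rating differences carry meaning; the absolute level depends entirely on the chosen anchor.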
No run guide for this benchmark yet.