LMArena
Chat preference · source-defined rating · Higher is better
What this benchmark measures
A public chat-preference evaluation surface with source-defined preference ratings and model comparisons.
Rows on this page are sourced from public benchmark artifacts, leaderboard exports, or source-linked model reports. Each row keeps benchmark version, source model name, and available run details attached to the score.
The metric shown here is the source-defined rating. Interpret it within LMArena only; it is not part of a site-wide ranking.
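As a rough illustration of the row shape described above, a record might look like the sketch below. The class name `BenchmarkRow`, its field names, the `label` helper, and the example values are all hypothetical placeholders; this is not the actual evals.report schema and the score is not real LMArena data.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class BenchmarkRow:
    """Hypothetical row: one score with its provenance kept alongside it."""
    benchmark: str                     # e.g. "LMArena"
    benchmark_version: Optional[str]   # None when the source gives no version
    source_model_name: str             # model name exactly as the source reports it
    metric: str                        # e.g. "source-defined rating"
    score: float
    higher_is_better: bool
    run_details: dict = field(default_factory=dict)  # whatever run info is available


def label(row: BenchmarkRow) -> str:
    """Render a row so the score is never shown without its benchmark and metric."""
    version = row.benchmark_version or "unversioned"
    return (
        f"{row.source_model_name} | {row.benchmark} ({version}) | "
        f"{row.metric}: {row.score}"
    )


# Placeholder values for illustration only.
example = BenchmarkRow(
    benchmark="LMArena",
    benchmark_version=None,
    source_model_name="example-model",
    metric="source-defined rating",
    score=1234.0,
    higher_is_better=True,
)
print(label(example))
```

Keeping the benchmark and metric on every row, rather than storing a bare number, is one way to make it hard to read a score outside its own benchmark.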
What to be careful about
LMArena's ranking-first presentation conflicts with the tone of evals.report, so it is included only with careful framing.
No composite ranking
evals.report never combines benchmarks. The source-defined rating on LMArena is its own number; don't average it with other metrics.
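For readers who process such rows themselves, the no-composite rule can be enforced with a small guard. The following is a minimal sketch under assumed conventions: rows are plain dictionaries with hypothetical keys `benchmark`, `metric`, and `score`, and the function name is invented for illustration. It is not how evals.report is implemented.

```python
from statistics import mean


def mean_within_benchmark(rows):
    """Average scores only when every row shares one benchmark and one metric.

    Raises ValueError for mixed input (and for empty input), so a composite
    across benchmarks cannot be produced by accident.
    """
    keys = {(row["benchmark"], row["metric"]) for row in rows}
    if len(keys) != 1:
        raise ValueError(
            f"refusing to combine scores across benchmarks or metrics: {sorted(keys)}"
        )
    return mean(row["score"] for row in rows)


# Placeholder rows; "OtherBench" and all scores are made up for illustration.
same = [
    {"benchmark": "LMArena", "metric": "source-defined rating", "score": 1200.0},
    {"benchmark": "LMArena", "metric": "source-defined rating", "score": 1300.0},
]
mixed = same + [{"benchmark": "OtherBench", "metric": "accuracy", "score": 0.5}]

print(mean_within_benchmark(same))
try:
    mean_within_benchmark(mixed)
except ValueError as err:
    print(err)
```

Failing loudly on mixed input is a deliberate choice here: a silent average of a rating and an accuracy figure would still produce a number, but not one that means anything.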