evals.report

LMArena

A public chat-preference evaluation surface with source-defined preference ratings and model comparisons.

Chat preference | source-defined rating | Higher is better

No run guide for this benchmark yet.