LMArena
A public chat-preference evaluation surface with source-defined preference ratings and model comparisons.
Chat preference | Source-defined rating | Higher is better
No run guide for this benchmark yet.