evals.report

Design Arena

A crowdsourced human-preference benchmark. Top AI models receive identical design and frontend prompts, and users vote head-to-head on the anonymized outputs. The votes are fit with a Bradley-Terry model and reported as Elo-style ratings, ranking design taste across categories such as websites, UI components, games, and data visualization.
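To make the scoring idea concrete, here is a minimal sketch of how head-to-head votes can be turned into Bradley-Terry strengths and then mapped onto an Elo-style scale. It is not Design Arena's actual pipeline: the function names, the vote format `(winner, loser)`, the base rating of 1000, and the 400-point scale are all assumptions chosen for illustration, and ties are ignored.

```python
import math


def fit_bradley_terry(votes, iters=200):
    """Fit Bradley-Terry strengths from (winner, loser) pairs.

    Uses the classic minorization-maximization update. This is a
    hypothetical helper for illustration, not the benchmark's own code.
    """
    models = sorted({m for pair in votes for m in pair})
    wins = {m: 0 for m in models}   # total wins per model
    games = {}                      # comparisons per unordered pair
    for winner, loser in votes:
        wins[winner] += 1
        key = frozenset((winner, loser))
        games[key] = games.get(key, 0) + 1

    strength = {m: 1.0 for m in models}
    for _ in range(iters):
        updated = {}
        for i in models:
            denom = sum(
                games.get(frozenset((i, j)), 0) / (strength[i] + strength[j])
                for j in models
                if j != i
            )
            # Floor keeps a winless model from collapsing to exactly zero.
            updated[i] = max(wins[i] / denom, 1e-12) if denom > 0 else strength[i]
        # Strengths are only defined up to scale: fix the geometric mean at 1.
        log_mean = sum(math.log(s) for s in updated.values()) / len(updated)
        strength = {m: s / math.exp(log_mean) for m, s in updated.items()}
    return strength


def to_elo(strength, base=1000.0, scale=400.0):
    """Map strengths to an Elo-style scale (base and scale are assumed values)."""
    return {m: base + scale * math.log10(s) for m, s in strength.items()}


# Toy data: model "A" wins 3 of 4 votes against model "B".
votes = [("A", "B")] * 3 + [("B", "A")]
ratings = to_elo(fit_bradley_terry(votes))
for model, rating in sorted(ratings.items(), key=lambda kv: -kv[1]):
    print(f"{model}: {rating:.1f}")
```

With this mapping, the modeled probability that one model beats another depends only on their rating gap, so a 3-to-1 win rate corresponds to a gap of 400 · log10(3), roughly 191 points. Higher ratings mean the model's outputs were preferred more often.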

Chat preference · Elo · Higher is better

No run guide for this benchmark yet.