Design Arena
A crowdsourced human-preference benchmark: top AI models receive identical design and frontend prompts, users vote head-to-head on the anonymized outputs, and the votes are fit with a Bradley-Terry model to produce an Elo-style ranking of design taste across categories such as websites, UI components, games, and data visualization.
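To make the ranking step concrete, here is a minimal sketch of how head-to-head votes can be turned into Bradley-Terry strengths and then into Elo-style scores. It is illustrative only and not Design Arena's actual pipeline: the function names `fit_bradley_terry` and `to_elo_scale`, the vote data, the base rating of 1000, and the choice of the standard minorization-maximization (MM) fitting update are all assumptions. Ties are ignored, and the sketch assumes the models are connected through comparisons.

```python
import math
from collections import defaultdict


def fit_bradley_terry(votes, iterations=200):
    """Fit Bradley-Terry strengths from (winner, loser) pairs using MM updates.

    Hypothetical helper for illustration; ties are not modeled.
    """
    models = sorted({m for pair in votes for m in pair})
    wins = defaultdict(int)    # total wins per model
    games = defaultdict(int)   # number of comparisons per unordered pair
    for winner, loser in votes:
        wins[winner] += 1
        games[frozenset((winner, loser))] += 1

    strength = {m: 1.0 for m in models}
    for _ in range(iterations):
        updated = {}
        for i in models:
            denom = sum(
                games[frozenset((i, j))] / (strength[i] + strength[j])
                for j in models
                if j != i
            )
            # Floor keeps winless models strictly positive so later
            # divisions and logarithms stay defined.
            updated[i] = max(wins[i] / denom, 1e-12) if denom > 0 else strength[i]
        # Strengths are only defined up to a constant factor, so rescale
        # to a geometric mean of 1.
        log_mean = sum(math.log(v) for v in updated.values()) / len(updated)
        scale = math.exp(log_mean)
        strength = {m: v / scale for m, v in updated.items()}
    return strength


def to_elo_scale(strength, base=1000.0):
    """Map strengths to Elo-style scores (400 points per factor of 10).

    The base of 1000 is an arbitrary assumption.
    """
    return {m: base + 400.0 * math.log10(s) for m, s in strength.items()}


# Made-up votes: each tuple is (winner, loser) for one anonymized comparison.
example_votes = [("model_a", "model_b")] * 8 + [("model_b", "model_a")] * 2
example_ratings = to_elo_scale(fit_bradley_terry(example_votes))
for name, score in sorted(example_ratings.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.1f}")
```

Under this model, the probability that one model's output is preferred over another's is its strength divided by the sum of the two strengths, which is why a higher score is better.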
Chat preference | Elo | Higher is better
No run guide for this benchmark yet.