Vibe Code Bench
An end-to-end web application development benchmark (by Vals AI / Replit) where models build complete full-stack web apps from natural-language specifications in a sandboxed environment with production services (Supabase, Stripe, email), then are scored by an autonomous browser agent on overall application pass accuracy.
Category: Coding. Metric: Overall accuracy (higher is better).