evals.report

Vibe Code Bench

An end-to-end web application development benchmark (by Vals AI / Replit). Models build complete full-stack web apps from natural-language specifications in a sandboxed environment with production services (Supabase, Stripe, email). An autonomous browser agent then scores the resulting apps, and the result is reported as overall application pass accuracy.

Coding · Overall accuracy · Higher is better
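
As a rough illustration of how a pass-accuracy figure of this kind could be aggregated, here is a minimal sketch. It assumes each built app receives a single pass/fail verdict from the browser agent and that overall accuracy is the fraction of apps that pass. The benchmark's actual scoring rules may differ (for example, per-test or weighted scoring), and the `AppResult` type, its field names, and the sample app names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AppResult:
    """Hypothetical record for one generated application."""
    app_id: str
    passed: bool  # assumed: a single pass/fail verdict from the browser agent


def overall_accuracy(results: list[AppResult]) -> float:
    """Fraction of applications that passed (assumed definition of the metric)."""
    if not results:
        raise ValueError("no results to score")
    return sum(r.passed for r in results) / len(results)


# Made-up sample verdicts, for illustration only.
results = [
    AppResult("todo-app", True),
    AppResult("storefront", False),
    AppResult("newsletter", True),
    AppResult("booking", True),
]

print(f"{overall_accuracy(results):.1%}")  # prints "75.0%"
```

Under this assumed definition a higher value is better, which matches the metric direction listed above.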

No run guide for this benchmark yet.