Aider Polyglot
A coding benchmark that measures how reliably an LLM can solve problems and apply its solutions as diff-based code edits. It covers 225 challenging Exercism exercises in C++, Go, Java, JavaScript, Python, and Rust, and allows up to two attempts per problem.
Category: Coding. Metric: % correct (higher is better).
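As a rough illustration of how a "% correct" score with up to two attempts per problem could be computed, here is a minimal sketch. It assumes an exercise counts as solved if any allowed attempt passes. That scoring rule, the function name `percent_correct`, and the input layout are all hypothetical and are not taken from the benchmark's own harness.

```python
from typing import Sequence


def percent_correct(results: Sequence[Sequence[bool]], max_attempts: int = 2) -> float:
    """Return the percentage of exercises solved within max_attempts.

    results[i] lists the pass/fail outcome of each attempt on exercise i,
    in the order the attempts were made. Assumption: an exercise is solved
    if any of its first max_attempts attempts passed.
    """
    if not results:
        raise ValueError("results must contain at least one exercise")
    # Attempts beyond max_attempts are ignored by the slice.
    solved = sum(1 for attempts in results if any(attempts[:max_attempts]))
    return 100.0 * solved / len(results)


# Hypothetical outcomes for four exercises.
example = [
    [True],                # solved on the first attempt
    [False, True],         # solved on the second attempt
    [False, False],        # not solved
    [False, False, True],  # passing third attempt is outside the limit
]
score = percent_correct(example)
```

Under this rule, a full run would pass one outcome list per exercise, 225 in total.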
No run guide for this benchmark yet.