Aider Polyglot
A coding benchmark that measures how reliably an LLM can solve and apply diff-based code edits across 225 challenging Exercism exercises spanning C++, Go, Java, JavaScript, Python, and Rust, with up to two attempts per problem.
What this benchmark measures
A coding benchmark that measures how reliably an LLM can solve and apply diff-based code edits across 225 challenging Exercism exercises spanning C++, Go, Java, JavaScript, Python, and Rust, with up to two attempts per problem.
Rows on this page are sourced from public benchmark artifacts, leaderboard exports, or source-linked model reports. Each row keeps benchmark version, source model name, and available run details attached to the score.
The metric shown here is % correct. It should be interpreted within Aider Polyglot, not compared as part of a site-wide ranking.