SWE-bench Multilingual
A software-engineering benchmark of 300 curated GitHub issue-resolution tasks drawn from 42 repositories in 9 programming languages (C, C++, Go, Java, JavaScript, TypeScript, PHP, Ruby, Rust). It measures the percentage of real-world issues a model resolves. An issue counts as resolved only when, after the model's patch is applied, the fail-to-pass tests (which failed before the fix) now pass and the pass-to-pass tests (which passed before the fix) still pass.
Category: Coding
Metric: % resolved (higher is better)
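The resolution criterion and the "% resolved" metric can be sketched as follows. This is a minimal illustration, not the official SWE-bench evaluation harness. The `TaskResult` record, its field names, and the helper functions are hypothetical. The sketch also assumes that per-test pass/fail outcomes have already been collected after applying the model's patch. Tasks with no result are counted as unresolved, so the denominator is the full task count.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TaskResult:
    """Hypothetical per-task record: test name -> passed after the model's patch?"""
    instance_id: str
    fail_to_pass: dict[str, bool] = field(default_factory=dict)
    pass_to_pass: dict[str, bool] = field(default_factory=dict)


def is_resolved(result: TaskResult) -> bool:
    # A task needs at least one fail-to-pass test; otherwise all() on an
    # empty dict would trivially return True.
    if not result.fail_to_pass:
        return False
    # Every fail-to-pass test must now pass (the issue is fixed), and every
    # pass-to-pass test must still pass (nothing else regressed).
    return all(result.fail_to_pass.values()) and all(result.pass_to_pass.values())


def percent_resolved(results: list[TaskResult], total_tasks: int = 300) -> float:
    # Divide by the full benchmark size so missing submissions count as failures.
    resolved = sum(1 for r in results if is_resolved(r))
    return 100.0 * resolved / total_tasks


if __name__ == "__main__":
    results = [
        TaskResult("a", {"t1": True}, {"t2": True}),   # fixed, no regressions
        TaskResult("b", {"t1": True}, {"t2": False}),  # fixed, but regressed
        TaskResult("c", {"t1": False}, {"t2": True}),  # issue not fixed
    ]
    # The fourth task has no result at all, so it also counts as unresolved.
    print(percent_resolved(results, total_tasks=4))  # 25.0
```

Requiring both test sets reflects the benchmark's definition. A patch that fixes the reported bug but breaks existing behavior is not credited.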
No run guide for this benchmark yet.