SWE-fficiency
Measures whether coding agents can optimize real-world repositories for performance. Each task asks the agent to generate a pull request that speeds up a target workload while keeping the repository's existing tests passing (498 tasks across 9 large Python repositories).
Category: Coding. Metric: speedup score (higher is better).
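The listing does not say how the speedup score is computed. The sketch below shows one plausible per-task scoring rule under stated assumptions. It treats the score as baseline runtime divided by patched runtime, and it gives zero credit when the existing tests fail. The helpers `time_workload` and `speedup_score` are hypothetical. The benchmark's actual timing protocol, failure handling, and cross-task aggregation may differ.

```python
import time
from typing import Callable


def time_workload(workload: Callable[[], None], repeats: int = 5) -> float:
    """Return the best wall-clock time in seconds over several runs (hypothetical helper)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        workload()
        best = min(best, time.perf_counter() - start)
    return best


def speedup_score(baseline_seconds: float, patched_seconds: float, tests_pass: bool) -> float:
    """Hypothetical per-task score: runtime ratio, gated on the repo's tests passing.

    Assumption: a patch that breaks existing tests earns no credit (0.0).
    """
    if not tests_pass:
        return 0.0
    if patched_seconds <= 0:
        raise ValueError("patched runtime must be positive")
    return baseline_seconds / patched_seconds
```

Under these assumptions, a workload that drops from 2.0 s to 0.5 s with all tests passing scores `speedup_score(2.0, 0.5, True) == 4.0`. If any test fails, the same patch scores 0.0. Requiring passing tests keeps an agent from "speeding up" code by breaking behavior. Taking the best of several runs reduces timing noise.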