SWE-fficiency

Name: SWE-fficiency
Creator: evals.report

Measures whether coding agents can optimize real-world repositories for performance. Each task asks the agent to generate a pull request that speeds up a target workload while keeping the repository's existing tests passing. The benchmark comprises 498 tasks across 9 large Python repositories.
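As a rough illustration of the idea described above (faster workload, unchanged behavior), the following minimal sketch times a workload before and after an optimization and reports a speedup ratio only when a correctness check passes. It is not the benchmark's actual harness or scoring formula: the function names `baseline_workload`, `optimized_workload`, `outputs_match`, and `speedup`, and the plain before/after time ratio, are assumptions made for this example.

```python
import timeit


def baseline_workload(n=50_000):
    # Hypothetical "before" code: sum of squares with an explicit loop.
    total = 0
    for i in range(n):
        total += i * i
    return total


def optimized_workload(n=50_000):
    # Hypothetical "after" code: closed form for sum of i^2, i = 0..n-1.
    return (n - 1) * n * (2 * n - 1) // 6


def outputs_match():
    # Stand-in for "the existing tests still pass" (an assumption for this sketch).
    return baseline_workload() == optimized_workload()


def speedup(before, after, check, number=20, repeats=3):
    """Return before_time / after_time, or None if the correctness check fails."""
    if not check():
        return None
    # Take the best of several repeats to reduce timing noise.
    t_before = min(timeit.repeat(before, number=number, repeat=repeats))
    t_after = min(timeit.repeat(after, number=number, repeat=repeats))
    return t_before / t_after


if __name__ == "__main__":
    ratio = speedup(baseline_workload, optimized_workload, outputs_match)
    print("speedup ratio:", ratio)
```

Gating the ratio on the check mirrors the task description: a patch that runs faster but breaks behavior earns no credit in this sketch.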

Category: Coding
Metric: speedup score (higher is better)


No run guide for this benchmark yet.