GSO: Software Optimization Benchmark for SWE-Agents
GSO evaluates AI coding agents on 102 challenging real-world software performance optimization tasks across 10 codebases in 5 languages, measuring whether an agent's patch matches expert-developer speedups while remaining correct.
Coding · Opt@1 · Higher is better
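To make the scoring idea concrete, here is a minimal sketch of how an Opt@1-style score could be computed from per-task timings. It is not the official GSO harness. The `TaskResult` fields, the helper names, and the 0.95 threshold for "matching" the expert speedup are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    """Hypothetical record for a single agent attempt on one task."""
    task_id: str
    baseline_seconds: float  # runtime before any optimization
    agent_seconds: float     # runtime with the agent's patch applied
    expert_seconds: float    # runtime with the expert developer's patch
    tests_passed: bool       # whether the agent's patch stays correct


def speedup(baseline: float, optimized: float) -> float:
    """Speedup relative to the unoptimized baseline."""
    return baseline / optimized


def is_success(r: TaskResult, threshold: float = 0.95) -> bool:
    """Correct patch that reaches `threshold` of the expert speedup.

    The 0.95 default is an assumed value, not taken from the text above.
    """
    if not r.tests_passed:
        return False
    agent = speedup(r.baseline_seconds, r.agent_seconds)
    expert = speedup(r.baseline_seconds, r.expert_seconds)
    return agent >= threshold * expert


def opt_at_1(results: list[TaskResult], threshold: float = 0.95) -> float:
    """Fraction of tasks solved with one attempt per task."""
    if not results:
        return 0.0
    return sum(is_success(r, threshold) for r in results) / len(results)


# Made-up timings, for illustration only.
results = [
    TaskResult("a", 10.0, 2.0, 2.0, True),   # matches the expert
    TaskResult("b", 10.0, 5.0, 2.0, True),   # correct but too slow
    TaskResult("c", 10.0, 1.0, 2.0, False),  # fast but fails tests
    TaskResult("d", 8.0, 4.0, 4.0, True),    # matches the expert
]
print(opt_at_1(results))
```

In this sketch a patch that is faster than the expert's but breaks the tests counts as a failure, which mirrors the requirement that speedups only count when the patch remains correct.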
No run guide for this benchmark yet.