GSO: Software Optimization Benchmark for SWE-Agents
What this benchmark measures
GSO evaluates AI coding agents on 102 challenging real-world software performance optimization tasks across 10 codebases in 5 languages, measuring whether an agent's patch matches expert-developer speedups while remaining correct.
Rows on this page are sourced from public benchmark artifacts, leaderboard exports, or source-linked model reports. Each row records the benchmark version, the model name as given by the source, and any available run details alongside the score.
The metric shown here is Opt@1. Interpret scores only within GSO: Software Optimization Benchmark for SWE-Agents; they are not comparable across benchmarks and should not be read as part of a site-wide ranking.
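To make the metric concrete, here is a minimal sketch of how an Opt@1-style score could be computed from per-task results, assuming one attempt per task. The `TaskResult` record, its field names, and the `opt_at_1` function are hypothetical and are not part of the GSO harness. The 0.95 threshold (the agent's patch must reach at least 95% of the expert's speedup) is an assumption about how "matches expert-developer speedups" is operationalized, and the real benchmark may aggregate speedups differently.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class TaskResult:
    """Hypothetical record for one agent attempt on one task."""
    task_id: str
    passed_tests: bool      # did the patched code remain correct?
    agent_speedup: float    # speedup of the agent's patch over the original code
    expert_speedup: float   # speedup of the expert developer's patch over the original code


def opt_at_1(results: Sequence[TaskResult], threshold: float = 0.95) -> float:
    """Fraction of tasks where the single attempt is correct and reaches
    at least `threshold` times the expert speedup.

    The default threshold of 0.95 is an assumption, not a value taken
    from this page.
    """
    if not results:
        return 0.0
    solved = sum(
        1
        for r in results
        if r.passed_tests and r.agent_speedup >= threshold * r.expert_speedup
    )
    return solved / len(results)


if __name__ == "__main__":
    demo = [
        TaskResult("task-a", True, 3.0, 2.0),   # correct and faster than the expert
        TaskResult("task-b", True, 1.2, 2.0),   # correct but too slow
        TaskResult("task-c", False, 5.0, 2.0),  # fast but fails correctness tests
        TaskResult("task-d", True, 2.0, 2.0),   # correct and matches the expert
    ]
    print(f"Opt@1 = {opt_at_1(demo):.2f}")
```

Because a task counts only when the patch is both correct and fast enough, a large speedup on a failing patch contributes nothing to the score.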