GSO: Software Optimization Benchmark for SWE-Agents
GSO evaluates AI coding agents on 102 challenging real-world software performance optimization tasks across 10 codebases in 5 languages, measuring whether an agent's patch matches expert-developer speedups while remaining correct.
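As a rough illustration of the scoring idea described above, the sketch below counts a task as solved only when the patch stays correct and reaches some fraction of the expert speedup. The names `TaskResult`, `is_solved`, and `opt_at_1`, and the `threshold` value of 0.95, are assumptions made for this example. This page does not state the exact success criterion, so treat the code as a sketch under those assumptions and not as the benchmark's own implementation.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class TaskResult:
    """Outcome of one agent attempt on one task (hypothetical structure)."""
    passed_tests: bool     # the patch preserved correctness
    agent_speedup: float   # baseline runtime / runtime with the agent's patch
    expert_speedup: float  # baseline runtime / runtime with the expert's patch


def is_solved(result: TaskResult, threshold: float = 0.95) -> bool:
    """A task counts only if the patch is correct AND fast enough.

    `threshold` is an assumed fraction of the expert speedup that the
    agent must reach; it is a placeholder, not a value taken from this page.
    """
    fast_enough = result.agent_speedup >= threshold * result.expert_speedup
    return result.passed_tests and fast_enough


def opt_at_1(results: Iterable[TaskResult], threshold: float = 0.95) -> float:
    """Percentage of tasks solved with a single attempt per task."""
    results = list(results)
    if not results:
        return 0.0
    solved = sum(is_solved(r, threshold) for r in results)
    return 100.0 * solved / len(results)
```

Under this reading, a patch that is faster than the expert's but breaks the tests earns no credit, and neither does a correct patch with only a modest speedup.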
Category: Coding · Metric: Opt@1 · Higher is better
| Model | Lab | Opt@1 | Source model | Status | Date |
|---|---|---|---|---|---|
| Claude Opus 4.7 | Anthropic | 44.12% | — | Official | Apr 16, 2026 |
| Claude Opus 4.6 | Anthropic | 41.18% | — | Official | Feb 5, 2026 |
| GPT-5.5 | OpenAI | 40.2% | — | Official | Apr 23, 2026 |
| GPT-5.4 | OpenAI | 31.37% | — | Official | Mar 5, 2026 |
| GPT-5.2 | OpenAI | 27.45% | — | Official | Dec 11, 2025 |
| Claude Opus 4.5 | Anthropic | 26.47% | — | Official | Nov 24, 2025 |
| Gemini 3.1 Pro Preview | Google DeepMind | 22.55% | — | Official | Feb 19, 2026 |
| Gemini 3 Pro | Google DeepMind | 18.63% | — | Official | Nov 18, 2025 |
| Claude Sonnet 4.5 | Anthropic | 14.71% | — | Official | Sep 29, 2025 |
| GPT-5.1 | OpenAI | 13.73% | — | Official | Nov 12, 2025 |
| Gemini 3 Flash | Google DeepMind | 9.8% | — | Official | Dec 17, 2025 |
| o3 | OpenAI | 8.82% | — | Official | Apr 16, 2025 |
| GPT-5 | OpenAI | 6.86% | — | Official | Aug 7, 2025 |
| Claude Opus 4 | Anthropic | 6.86% | — | Official | May 22, 2025 |
| Qwen 3 Coder 480B | Alibaba / Qwen | 4.9% | — | Official | Jul 22, 2025 |
| Kimi K2 Instruct | Moonshot AI | 4.9% | — | Official | Jul 11, 2025 |
| Claude Sonnet 4 | Anthropic | 4.9% | — | Official | May 22, 2025 |
| Claude 3.5 Sonnet | Anthropic | 4.6% | — | Official | Jun 20, 2024 |
| Gemini 2.5 Pro | Google DeepMind | 3.92% | — | Official | Mar 25, 2025 |
| Claude 3.7 Sonnet | Anthropic | 3.8% | — | Official | Feb 24, 2025 |
| o4-mini | OpenAI | 3.6% | — | Official | Apr 16, 2025 |
| GPT-4o | OpenAI | 0.0% | — | Official | May 13, 2024 |
Each row reports the model’s Opt@1 score on GSO, sorted from highest to lowest.
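Because the benchmark has 102 tasks, a percentage from a single pass over all of them should correspond to a whole number of solved tasks. The sketch below recovers that count, assuming each score is one run over the full 102-task set and is rounded to two decimals. The helper name `implied_solved` and the tolerance are illustrative choices, not part of the benchmark.

```python
from typing import Optional

TOTAL_TASKS = 102  # task count stated in the benchmark description


def implied_solved(score_percent: float,
                   total: int = TOTAL_TASKS,
                   tol: float = 0.005) -> Optional[int]:
    """Return the whole number of solved tasks implied by a percentage.

    Returns None when no whole count reproduces the score within `tol`
    percentage points, i.e. within the rounding error of a score
    reported to two decimals.
    """
    k = round(score_percent * total / 100)
    exact = 100 * k / total
    # Small epsilon guards against floating-point noise at the boundary.
    if abs(exact - score_percent) <= tol + 1e-9:
        return k
    return None


if __name__ == "__main__":
    for name, score in [("Claude Opus 4.7", 44.12),
                        ("GPT-5.5", 40.2),
                        ("Claude 3.5 Sonnet", 4.6)]:
        print(name, score, implied_solved(score))
```

Under that assumption, 44.12% corresponds to 45 of 102 tasks and 40.2% to 41 of 102. A few entries (4.6%, 3.8%, 3.6%) do not map to a whole count out of 102, so they may have been computed differently, for example averaged over several runs. The page does not say which.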