SWE-bench Verified
A curated SWE-bench split for evaluating systems that resolve real software engineering issues.
Category: Coding. Metric: % resolved (higher is better).
| Model | Lab | Score (% resolved) | Source model | Status | Date |
|---|---|---|---|---|---|
| Claude Opus 4.8 | Anthropic | 88.6% | Claude Opus 4.8 | Verified | May 28, 2026 |
| Claude Opus 4.7 | Anthropic | 83.5% | Claude Opus 4.7 | Official | May 30, 2026 |
| GPT-5.5 | OpenAI | 80.6% | GPT-5.5 | Official | May 30, 2026 |
| Claude Opus 4.6 | Anthropic | 78.7% | Claude Opus 4.6 | Official | May 30, 2026 |
| GPT-5.4 | OpenAI | 76.9% | GPT-5.4 | Official | May 30, 2026 |
| Kimi K2.6 | Moonshot AI | 76.7% | Kimi K2.6 | Official | May 30, 2026 |
| Claude Opus 4.5 | Anthropic | 76.7% | Claude Opus 4.5 | Official | May 30, 2026 |
| Gemini 3.1 Pro Preview | Google DeepMind | 75.6% | Gemini 3.1 Pro | Official | May 30, 2026 |
| Gemini 3 Flash | Google DeepMind | 75.4% | Gemini 3 Flash | Official | May 30, 2026 |
| Claude Sonnet 4.6 | Anthropic | 75.2% | Claude Sonnet 4.6 | Official | May 30, 2026 |
| GPT-5.3-Codex | OpenAI | 74.8% | GPT-5.3 Codex | Official | May 30, 2026 |
| GLM-5.1 | Z.ai | 74.2% | GLM-5.1 | Official | May 30, 2026 |
| Kimi K2.5 | Moonshot AI | 73.8% | Kimi K2.5 | Official | May 30, 2026 |
| GPT-5.2 | OpenAI | 73.8% | GPT-5.2 | Official | May 30, 2026 |
| GPT-5 high | OpenAI | 73.6% | GPT-5 | Official | May 30, 2026 |
| Claude Opus 4.1 | Anthropic | 73.3% | Claude Opus 4.1 | Official | May 30, 2026 |
| Gemini 3 Pro | Google DeepMind | 72.9% | Gemini 3 Pro | Official | May 30, 2026 |
| GLM-5 | Z.ai | 72.1% | GLM-5 | Official | May 30, 2026 |
| Claude Sonnet 4.5 | Anthropic | 71.3% | Claude Sonnet 4.5 | Official | May 30, 2026 |
| Claude Opus 4 | Anthropic | 70.7% | Claude Opus 4 | Official | May 30, 2026 |
| GPT-5.1 | OpenAI | 68.0% | GPT-5.1 | Official | May 30, 2026 |
| GPT-5 mini | OpenAI | 64.7% | GPT-5 mini | Official | May 30, 2026 |
| o3 | OpenAI | 62.3% | o3 | Official | May 30, 2026 |
| Claude 3.7 Sonnet | Anthropic | 61.0% | Claude 3.7 Sonnet | Official | May 30, 2026 |
Each row reports the percentage of SWE-bench Verified issues that the model resolved.
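For readers unfamiliar with the metric, the sketch below shows how a "% resolved" figure can be derived from raw counts. It is only an illustration: the function name `percent_resolved` is hypothetical, the counts are made up rather than taken from any run in the table, and the total of 500 is the commonly cited size of the Verified split rather than something this page states.

```python
def percent_resolved(resolved: int, total: int) -> float:
    """Return the share of resolved tasks as a percentage, rounded to one decimal."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= resolved <= total:
        raise ValueError("resolved must be between 0 and total")
    return round(100 * resolved / total, 1)


# Illustrative counts only (not from any listed run): 250 of 500 tasks resolved.
print(f"{percent_resolved(250, 500):.1f}%")  # prints "50.0%"
```

Because the score is a simple ratio over a fixed task set, a gap of 0.2 percentage points corresponds to a single task when the set contains 500 tasks.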