Labs · Anthropic
Models 14
| Model | Family | Identifier | Release date | Results |
|---|---|---|---|---|
| Claude 3.5 Sonnet | Claude Sonnet | claude 3.5 sonnet | 2024-06-20 | 0 |
| Claude 3.7 Sonnet | Claude Sonnet | claude 3.7 sonnet | 2025-02-24 | 1 |
| Claude Sonnet 4 | Claude Sonnet | claude sonnet 4 | 2025-05-22 | 2 |
| Claude Opus 4 | Claude Opus | claude opus 4 | 2025-05-22 | 1 |
| Claude Opus 4.1 | Claude Opus | claude opus 4.1 | 2025-08-05 | 1 |
| Claude Sonnet 4.5 | Claude Sonnet | claude sonnet 4.5 | 2025-09-29 | 5 |
| Claude Haiku 4.5 | Claude Haiku | claude haiku 4.5 | 2025-10-01 | 3 |
| Claude Opus 4.5 | Claude Opus | claude opus 4.5 | 2025-11-01 | 8 |
| Claude Opus 4.6 | Claude Opus | claude opus 4.6 | 2026-02-05 | 12 |
| Claude Opus 4.6 thinking | Claude Opus | claude opus 4.6 thinking | 2026-02-05 | 2 |
| Claude Sonnet 4.6 | Claude Sonnet | claude sonnet 4.6 | 2026-02-05 | 8 |
| Claude Opus 4.7 | Claude Opus | claude opus 4.7 | 2026-04-16 | 11 |
| Claude Opus 4.7 thinking | Claude Opus | claude opus 4.7 thinking | 2026-04-16 | 1 |
| Claude Opus 4.8 | Claude Opus | claude opus 4.8 | 2026-05-28 | 4 |
Progress by benchmark
| Model | Release date | SWE-bench Verified · % resolved |
|---|---|---|
| Claude 3.5 Sonnet | Jun 20, 2024 | — |
| Claude 3.7 Sonnet | Feb 24, 2025 | 61.0% |
| Claude Sonnet 4 | May 22, 2025 | — |
| Claude Opus 4 | May 22, 2025 | 70.7% |
| Claude Opus 4.1 | Aug 5, 2025 | 73.3% |
| Claude Sonnet 4.5 | Sep 29, 2025 | 71.3% |
| Claude Haiku 4.5 | Oct 1, 2025 | — |
| Claude Opus 4.5 | Nov 1, 2025 | 76.7% |
| Claude Opus 4.6 | Feb 5, 2026 | 78.7% |
| Claude Opus 4.6 thinking | Feb 5, 2026 | — |
| Claude Sonnet 4.6 | Feb 5, 2026 | 75.2% |
| Claude Opus 4.7 | Apr 16, 2026 | 83.5% |
| Claude Opus 4.7 thinking | Apr 16, 2026 | — |
| Claude Opus 4.8 | May 28, 2026 | 88.6% |
Single benchmark only
This view shows SWE-bench Verified (% resolved) only. Other benchmarks use different metrics and are not directly comparable.
Progress matrix
| Model | Family | SWE-bench Verified · % resolved | GPQA Diamond · accuracy | LiveCodeBench Pro · Codeforces Elo | Berkeley Function Calling Leaderboard · accuracy | LiveBench · score | Terminal-Bench 2.1 · task success | SWE-bench Pro · % resolved | DeepSWE · % resolved | Humanity's Last Exam · accuracy | MMMU-Pro · accuracy | LMArena · source-defined rating | ARC-AGI-3 · accuracy | ARC-AGI-2 · accuracy | FrontierMath · accuracy | AIME (OTIS Mock) · accuracy | SimpleQA Verified · accuracy |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Claude 3.5 Sonnet | Claude Sonnet | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — |
| Claude 3.7 Sonnet | Claude Sonnet | 61.0% | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — |
| Claude Sonnet 4 | Claude Sonnet | — | — | — | — | — | — | 42.70% | — | — | — | — | — | 5.93% | — | — | — |
| Claude Opus 4 | Claude Opus | 70.7% | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — |
| Claude Opus 4.1 | Claude Opus | 73.3% | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — |
| Claude Sonnet 4.5 | Claude Sonnet | 71.3% | — | 1412 | 73.24% | — | — | 43.60% | — | — | 68.9% | — | — | — | — | — | — |
| Claude Haiku 4.5 | Claude Haiku | — | — | — | 68.70% | — | — | 39.45% | 0.22% | — | — | — | — | — | — | — | — |
| Claude Opus 4.5 | Claude Opus | 76.7% | 86.0% | — | 77.47% | 75.96% | — | 45.89% | — | 25.8% | 73.9% | — | — | — | 20.69% | — | — |
| Claude Opus 4.6 | Claude Opus | 78.7% | 90.5% | — | — | 76.33% | — | — | 27.06% | 34.2% | 77.3% | 1497 | 0.51% | 69.17% | 40.7% | 94.4% | 46.5% |
| Claude Opus 4.6 thinking | Claude Opus | — | — | — | — | — | — | 51.90% | — | — | — | 1499 | — | — | — | — | — |
| Claude Sonnet 4.6 | Claude Sonnet | 75.2% | 87.4% | — | — | 75.47% | — | — | 31.56% | 21.07% | 75.6% | 1454 | — | — | 32.4% | — | — |
| Claude Opus 4.7 | Claude Opus | 83.5% | 90.2% | — | — | 76.91% | — | — | 54.20% | 39.04% | — | 1480 | 0.18% | 75.83% | 43.79% | 97.8% | 50.6% |
| Claude Opus 4.7 thinking | Claude Opus | — | — | — | — | — | — | — | — | — | — | 1486 | — | — | — | — | — |
| Claude Opus 4.8 | Claude Opus | 88.6% | 93.6% | — | — | 77.22% | — | — | — | 49.8% | — | — | — | — | — | — | — |
Scores are not normalised across benchmarks. Each column uses its own metric. Compare columns independently.
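
One way to follow the "compare columns independently" advice is to rank models inside a single column and never combine values from two columns. The sketch below does this for two columns copied from the matrix (SWE-bench Verified and LMArena). The dictionary layout and the helper name `rank_within_column` are illustrative assumptions, not part of the page.

```python
from typing import Optional

# Two columns copied from the matrix above; None stands for a missing score.
# The data layout and helper name are assumptions made for this example.
COLUMNS: dict[str, dict[str, Optional[float]]] = {
    "SWE-bench Verified": {  # metric: % resolved
        "Claude 3.7 Sonnet": 61.0,
        "Claude Opus 4": 70.7,
        "Claude Opus 4.1": 73.3,
        "Claude Sonnet 4.5": 71.3,
        "Claude Opus 4.5": 76.7,
        "Claude Opus 4.6": 78.7,
        "Claude Opus 4.6 thinking": None,
        "Claude Sonnet 4.6": 75.2,
        "Claude Opus 4.7": 83.5,
        "Claude Opus 4.8": 88.6,
    },
    "LMArena": {  # metric: source-defined rating
        "Claude Opus 4.6": 1497.0,
        "Claude Opus 4.6 thinking": 1499.0,
        "Claude Sonnet 4.6": 1454.0,
        "Claude Opus 4.7": 1480.0,
        "Claude Opus 4.7 thinking": 1486.0,
        "Claude Opus 4.8": None,
    },
}


def rank_within_column(column: dict[str, Optional[float]]) -> list[tuple[str, float]]:
    """Sort the models that have a score in this column, highest first."""
    scored = [(model, score) for model, score in column.items() if score is not None]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Each benchmark is ranked on its own; values from different
    # columns are never added, averaged, or compared with each other.
    for benchmark, column in COLUMNS.items():
        print(benchmark)
        for position, (model, score) in enumerate(rank_within_column(column), start=1):
            print(f"  {position}. {model}: {score}")
```

Models with a missing score are simply left out of that column's ranking, so a gap in one benchmark does not affect how a model is placed in another.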