Google DeepMind
Models (9)
| Model | Date | Results |
|---|---|---|
| Gemini 1.5 Pro | 2024-02-15 | 0 |
| Gemini 2.0 Flash | 2024-12-11 | 0 |
| Gemini 2.5 Pro | 2025-03-25 | 6 |
| Gemini 2.5 Flash | 2025-04-17 | 1 |
| Gemini 3 Pro | 2025-11-18 | 12 |
| Gemini 3 Deep Think | 2026-02-01 | 2 |
| Gemini 3 Flash | 2026-02-17 | 9 |
| Gemini 3.1 Pro Preview | 2026-03-05 | 14 |
| Gemini 3.5 Flash | 2026-04-20 | 9 |
Progress by benchmark
| Model | Date | SWE-bench Verified (% resolved) |
|---|---|---|
| Gemini 1.5 Pro | Feb 15, 2024 | — |
| Gemini 2.0 Flash | Dec 11, 2024 | — |
| Gemini 2.5 Pro | Mar 25, 2025 | — |
| Gemini 2.5 Flash | Apr 17, 2025 | — |
| Gemini 3 Pro | Nov 18, 2025 | 72.9% |
| Gemini 3 Deep Think | Feb 1, 2026 | — |
| Gemini 3 Flash | Feb 17, 2026 | 75.4% |
| Gemini 3.1 Pro Preview | Mar 5, 2026 | 75.6% |
| Gemini 3.5 Flash | Apr 20, 2026 | — |
Single benchmark only
This view shows SWE-bench Verified (% resolved) only. Other benchmarks use different metrics and are not directly comparable.
Progress matrix
| Model | SWE-bench Verified (% resolved) | GPQA Diamond (accuracy) | LiveCodeBench Pro (Codeforces Elo) | Berkeley Function Calling Leaderboard (accuracy) | LiveBench (score) | Terminal-Bench 2.1 (task success) | SWE-bench Pro (% resolved) | DeepSWE (% resolved) | Humanity's Last Exam (accuracy) | MMMU-Pro (accuracy) | LMArena (source-defined rating) | ARC-AGI-3 (accuracy) | ARC-AGI-2 (accuracy) | FrontierMath (accuracy) | AIME, OTIS Mock (accuracy) | SimpleQA Verified (accuracy) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Gemini 1.5 Pro | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — |
| Gemini 2.0 Flash | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — |
| Gemini 2.5 Pro | — | 85.3% | 1769 | — | — | — | — | — | 21.64% | 68.0% | 1457 | — | — | — | — | 56.0% |
| Gemini 2.5 Flash | — | — | — | 56.24% | — | — | — | — | — | — | — | — | — | — | — | — |
| Gemini 3 Pro | 72.9% | 92.6% | 2439 | 72.51% | 73.39% | — | 43.30% | — | 38.3% | 81.0% | 1479 | — | — | 37.6% | 91.4% | 72.9% |
| Gemini 3 Deep Think | — | — | 3298 | — | — | — | — | — | — | — | — | — | 84.58% | — | — | — |
| Gemini 3 Flash | 75.4% | — | 2316 | — | — | — | 34.63% | 5.16% | 36.6% | — | 1466 | — | — | 35.64% | 92.8% | 67.4% |
| Gemini 3.1 Pro Preview | 75.6% | 94.1% | 2887 | — | 79.93% | — | 46.10% | 9.88% | 45.9% | 80.5% | 1481 | 0.42% | 77.08% | 36.9% | 95.6% | 77.3% |
| Gemini 3.5 Flash | — | 92.8% | — | — | 75.02% | — | — | 28.32% | 42.5% | — | 1482 | — | 72.08% | 38.97% | 95.6% | 68.4% |
Scores are not normalised across benchmarks. Each column uses its own metric. Compare columns independently.
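
As a minimal sketch of what comparing columns independently can look like, the Python below ranks models within one benchmark at a time and never mixes metrics. The scores are copied from three columns of the matrix. The function names `rank_within_benchmark` and `leaders` are illustrative, and the code assumes that higher is better for each of these three metrics.

```python
# Scores copied from three columns of the progress matrix.
# Models with no reported score ("—") are simply left out of a column.
scores = {
    "SWE-bench Verified (% resolved)": {
        "Gemini 3 Pro": 72.9,
        "Gemini 3 Flash": 75.4,
        "Gemini 3.1 Pro Preview": 75.6,
    },
    "LiveCodeBench Pro (Codeforces Elo)": {
        "Gemini 2.5 Pro": 1769,
        "Gemini 3 Pro": 2439,
        "Gemini 3 Deep Think": 3298,
        "Gemini 3 Flash": 2316,
        "Gemini 3.1 Pro Preview": 2887,
    },
    "ARC-AGI-2 (accuracy)": {
        "Gemini 3 Deep Think": 84.58,
        "Gemini 3.1 Pro Preview": 77.08,
        "Gemini 3.5 Flash": 72.08,
    },
}


def rank_within_benchmark(column):
    """Return model names sorted best-first using only this column's metric.

    Assumption: a higher value is better for the metric in this column.
    """
    return sorted(column, key=column.get, reverse=True)


def leaders(table):
    """Map each benchmark to its top model, one column at a time."""
    return {bench: rank_within_benchmark(col)[0] for bench, col in table.items()}


if __name__ == "__main__":
    for bench, model in leaders(scores).items():
        print(f"{bench}: {model}")
```

Each ranking uses a single column, so an Elo rating is never weighed against a percentage.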