evals.report

Google DeepMind

Model lab behind Gemini. This page lists the public benchmark results for its models.

9 models, 53 results, deepmind.google

Models 9

Progress by benchmark

Gemini 1.5 Pro (Feb 15, 2024)
Gemini 2.0 Flash (Dec 11, 2024)
Gemini 2.5 Pro (Mar 25, 2025)
Gemini 2.5 Flash (Apr 17, 2025)
Gemini 3 Pro (Nov 18, 2025): 72.9%
Gemini 3 Deep Think (Feb 1, 2026)
Gemini 3 Flash (Feb 17, 2026): 75.4%
Gemini 3.1 Pro Preview (Mar 5, 2026): 75.6%
Gemini 3.5 Flash (Apr 20, 2026)

Single benchmark only
This view shows SWE-bench Verified (% resolved) only. Other benchmarks use different metrics and are not directly comparable.
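
Because this view covers a single metric, release-to-release changes can be computed directly. A minimal sketch in Python, using only the three SWE-bench Verified scores shown above; the `gains` helper and the data layout are illustrative assumptions, not part of evals.report:

```python
from datetime import date

# SWE-bench Verified (% resolved) scores copied from the progress view above.
# Models with no score in this view are left out.
swe_bench_verified = [
    ("Gemini 3 Pro", date(2025, 11, 18), 72.9),
    ("Gemini 3 Flash", date(2026, 2, 17), 75.4),
    ("Gemini 3.1 Pro Preview", date(2026, 3, 5), 75.6),
]


def gains(rows):
    """Return (model, percentage-point change vs. the previous scored release).

    Hypothetical helper: sorts by release date, then diffs neighbouring scores.
    """
    ordered = sorted(rows, key=lambda r: r[1])
    return [
        (cur[0], round(cur[2] - prev[2], 1))
        for prev, cur in zip(ordered, ordered[1:])
    ]


for model, delta in gains(swe_bench_verified):
    print(f"{model}: {delta:+.1f} points")
```

The differences are percentage points on one benchmark. The same subtraction across two different benchmarks would not be meaningful, since each uses its own metric.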

Progress matrix

Model
SWE-bench Verified (% resolved)
GPQA Diamond (accuracy)
LiveCodeBench Pro (Codeforces Elo)
Berkeley Function Calling Leaderboard (accuracy)
LiveBench (score)
Terminal-Bench 2.1 (task success)
SWE-bench Pro (% resolved)
DeepSWE (% resolved)
Humanity's Last Exam (accuracy)
MMMU-Pro (accuracy)
LMArena (source-defined rating)
ARC-AGI-3 (accuracy)
ARC-AGI-2 (accuracy)
FrontierMath (accuracy)
AIME (OTIS Mock) (accuracy)
SimpleQA Verified (accuracy)

Results per model, in column order with empty cells omitted:
Gemini 1.5 Pro (Gemini): no results
Gemini 2.0 Flash (Gemini): no results
Gemini 2.5 Pro (Gemini): 85.3%, 1769, 21.64%, 68.0%, 1457, 56.0%
Gemini 2.5 Flash (Gemini): 56.24%
Gemini 3 Pro (Gemini): 72.9%, 92.6%, 2439, 72.51%, 73.39%, 43.30%, 38.3%, 81.0%, 1479, 37.6%, 91.4%, 72.9%
Gemini 3 Deep Think (Gemini): 3298, 84.58%
Gemini 3 Flash (Gemini): 75.4%, 2316, 34.63%, 5.16%, 36.6%, 1466, 35.64%, 92.8%, 67.4%
Gemini 3.1 Pro Preview (Gemini): 75.6%, 94.1%, 2887, 79.93%, 46.10%, 9.88%, 45.9%, 80.5%, 1481, 0.42%, 77.08%, 36.9%, 95.6%, 77.3%
Gemini 3.5 Flash (Gemini): 92.8%, 75.02%, 28.32%, 42.5%, 1482, 72.08%, 38.97%, 95.6%, 68.4%

Scores are not normalised across benchmarks. Each column uses its own metric. Compare columns independently.