evals.report

DeepSeek

Model lab for DeepSeek public benchmark rows.

5 models · 10 results · deepseek.com

Models 5

Progress by benchmark

Model               Date
DeepSeek V3         Dec 26, 2024
DeepSeek R1         Jan 20, 2025
DeepSeek V3 0324    Mar 24, 2025
DeepSeek V3.2       Sep 29, 2025
DeepSeek V4 Pro     Mar 1, 2026

Single benchmark only
This view shows SWE-bench Verified (% resolved) only. Other benchmarks use different metrics and are not directly comparable.

Progress matrix

Benchmark                                Metric
SWE-bench Verified                       % resolved
GPQA Diamond                             accuracy
LiveCodeBench Pro                        Codeforces Elo
Berkeley Function Calling Leaderboard    accuracy
LiveBench                                score
Terminal-Bench 2.1                       task success
SWE-bench Pro                            % resolved
DeepSWE                                  % resolved
Humanity's Last Exam                     accuracy
MMMU-Pro                                 accuracy
LMArena                                  source-defined rating
ARC-AGI-3                                accuracy
ARC-AGI-2                                accuracy
FrontierMath                             accuracy
AIME (OTIS Mock)                         accuracy
SimpleQA Verified                        accuracy

Model               Lab         Reported values
DeepSeek V3         DeepSeek    (none listed)
DeepSeek R1         DeepSeek    1284
DeepSeek V3 0324    DeepSeek    1124
DeepSeek V3.2       DeepSeek    56.73%, 15.56%, 22.1%, 87.8%
DeepSeek V4 Pro     DeepSeek    73.58%, 7.52%, 32.4%, 1446

Scores are not normalised across benchmarks. Each column uses its own metric. Compare columns independently.
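
As a minimal sketch of what comparing columns independently can look like in practice, the snippet below groups results by benchmark and metric and ranks models only inside each group. The rows, model names (`model_x`, `model_y`), and benchmark names (`bench_a`, `bench_b`) are hypothetical placeholders, not values from this page, and the sketch assumes that a higher value is better for every metric.

```python
from collections import defaultdict

# Hypothetical rows: (model, benchmark, metric, value). Not taken from the page.
ROWS = [
    ("model_x", "bench_a", "% resolved", 40.0),
    ("model_y", "bench_a", "% resolved", 55.0),
    ("model_x", "bench_b", "Elo", 1200.0),
    ("model_y", "bench_b", "Elo", 1100.0),
]


def rank_within_benchmark(rows):
    """Group rows by (benchmark, metric) and rank models inside each group only."""
    groups = defaultdict(list)
    for model, benchmark, metric, value in rows:
        groups[(benchmark, metric)].append((model, value))
    # Assumption: higher is better. Groups are sorted separately and never merged,
    # so a percentage is never compared against an Elo-style rating.
    return {
        key: sorted(entries, key=lambda entry: entry[1], reverse=True)
        for key, entries in groups.items()
    }


rankings = rank_within_benchmark(ROWS)
for (benchmark, metric), entries in rankings.items():
    line = ", ".join(f"{model}={value}" for model, value in entries)
    print(f"{benchmark} ({metric}): {line}")
```

Keying each group on both the benchmark and its metric keeps two columns that share a unit (for example two accuracy columns) from being pooled together.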