evals.report

DeepSeek V4 Flash

DeepSeek · DeepSeek-V4. Released Apr 24, 2026.

Benchmark results (16)

Benchmark | Category | Score | Metric | Status | Date
SWE-bench Verified | Coding | 79.0% | % resolved | Verified | Apr 24, 2026
SWE-bench Pro | Coding | 52.6% | % resolved | Verified | Apr 24, 2026
GPQA Diamond | Reasoning | 88.1% | accuracy | Verified | Apr 24, 2026
Humanity's Last Exam | Reasoning | 34.8% | accuracy | Verified | Apr 24, 2026
SimpleQA Verified | Other | 34.1% | accuracy | Verified | Apr 24, 2026
MCP Atlas | Tool use | 69.0% | pass rate | Verified | Apr 24, 2026
Artificial Analysis Intelligence Index | Reasoning | 46.5 | Index | Unverified | Apr 24, 2026
τ²-bench (Telecom) | Tool use | 95.0% | pass^1 | Official | Apr 24, 2026
AIME 2026 | Reasoning | 95.83% | accuracy | Official | Apr 24, 2026
GDPval | Agents | 1414 | Elo | Official | Apr 24, 2026
SciCode | Coding | 44.9% | accuracy | Unverified | Apr 24, 2026
AA-Omniscience: Knowledge and Hallucination Benchmark | Reasoning | -23 | AA-Omniscience Index | Official | Apr 24, 2026
IFBench | Reasoning | 79.2% | accuracy | Official | Apr 24, 2026
EQ-Bench Creative Writing v3 | Chat preference | 1556 | Elo | Verified | Apr 24, 2026
Design Arena | Chat preference | 1268 | Elo | Verified | Apr 24, 2026
MathArena HMMT February 2026 | Reasoning | 93.94% | accuracy | Official | Apr 24, 2026