DeepSeek V4 Flash
DeepSeek · DeepSeek-V4. Released Apr 24, 2026.
Benchmark results (16)
| Benchmark | Category | Score | Metric | Status | Date |
|---|---|---|---|---|---|
| SWE-bench Verified | Coding | 79.0% | % resolved | Verified | Apr 24, 2026 |
| SWE-bench Pro | Coding | 52.6% | % resolved | Verified | Apr 24, 2026 |
| GPQA Diamond | Reasoning | 88.1% | accuracy | Verified | Apr 24, 2026 |
| Humanity's Last Exam | Reasoning | 34.8% | accuracy | Verified | Apr 24, 2026 |
| SimpleQA Verified | Other | 34.1% | accuracy | Verified | Apr 24, 2026 |
| MCP Atlas | Tool use | 69.0% | pass rate | Verified | Apr 24, 2026 |
| Artificial Analysis Intelligence Index | Reasoning | 46.5 | Index | Unverified | Apr 24, 2026 |
| τ²-bench (Telecom) | Tool use | 95.0% | pass^1 | Official | Apr 24, 2026 |
| AIME 2026 | Reasoning | 95.83% | accuracy | Official | Apr 24, 2026 |
| GDPval | Agents | 1414 | Elo | Official | Apr 24, 2026 |
| SciCode | Coding | 44.9% | accuracy | Unverified | Apr 24, 2026 |
| AA-Omniscience: Knowledge and Hallucination Benchmark | Reasoning | -23 | AA-Omniscience Index | Official | Apr 24, 2026 |
| IFBench | Reasoning | 79.2% | accuracy | Official | Apr 24, 2026 |
| EQ-Bench Creative Writing v3 | Chat preference | 1556 | Elo | Verified | Apr 24, 2026 |
| Design Arena | Chat preference | 1268 | Elo | Verified | Apr 24, 2026 |
| MathArena HMMT February 2026 | Reasoning | 93.94% | accuracy | Official | Apr 24, 2026 |