DeepSeek V3.1
DeepSeek · DeepSeek V3. Released Aug 21, 2025.
Benchmark results (19)
| Benchmark | Category | Score | Metric | Status | Date |
|---|---|---|---|---|---|
| SWE-bench Verified | Coding | 66.0% | % resolved | Verified | Aug 21, 2025 |
| GPQA Diamond | Reasoning | 80.1% | accuracy | Verified | Aug 21, 2025 |
| Humanity's Last Exam | Reasoning | 15.9% | accuracy | Verified | Aug 21, 2025 |
| Artificial Analysis Intelligence Index | Reasoning | 28.1 | index | Unverified | Aug 21, 2025 |
| Epoch Capabilities Index | Reasoning | 138.9 | index | Official | Aug 21, 2025 |
| Aider Polyglot | Coding | 68.4% | % correct | Unverified | Aug 21, 2025 |
| MMLU-Pro | Reasoning | 85.1% | accuracy | Verified | Aug 21, 2025 |
| GAIA: A Benchmark for General AI Assistants | Agents | 11.5% | accuracy | Unverified | Aug 21, 2025 |
| GDPval | Agents | 1080 | Elo | Official | Aug 21, 2025 |
| LiveCodeBench | Coding | 57.7% | Pass@1 | Unverified | Aug 21, 2025 |
| SciCode | Coding | 36.7% | accuracy | Unverified | Aug 21, 2025 |
| MultiChallenge | Reasoning | 46.10% | accuracy | Verified | Aug 21, 2025 |
| Global-MMLU | Reasoning | 82.7% | accuracy | Unverified | Aug 21, 2025 |
| EQ-Bench Creative Writing v3 | Chat preference | 1420 | Elo | Verified | Aug 21, 2025 |
| Design Arena | Chat preference | 1166 | Elo | Verified | Aug 21, 2025 |
| MASK (Model Alignment between Statements and Knowledge) | Other | 46.27 | honesty score | Verified | Aug 21, 2025 |
| MCP-Universe | Tool use | 22.08% | overall success rate | Verified | Aug 21, 2025 |
| Vectara Hallucination Leaderboard | Other | 5.5% | hallucination rate (lower is better) | Official | Aug 21, 2025 |
| Gray Swan Arena (Agent Red-Teaming / Indirect Prompt Injection) | Agents | 5.4% | attack success rate, ASR (lower is better) | Verified | Aug 21, 2025 |