Kimi K2 Thinking
Moonshot AI · Kimi K2. Released Nov 6, 2025.
Benchmark results (13)
| Benchmark | Category | Score | Metric | Status | Date |
|---|---|---|---|---|---|
| SWE-bench Verified | Coding | 71.3% | % resolved | Verified | Nov 6, 2025 |
| GPQA Diamond | Reasoning | 84.5% | accuracy | Verified | Nov 6, 2025 |
| Humanity's Last Exam | Reasoning | 23.9% | accuracy | Verified | Nov 6, 2025 |
| Epoch Capabilities Index | Reasoning | 145.6 | index | Official | Nov 6, 2025 |
| MMLU-Pro | Reasoning | 84.6% | accuracy | Unverified | Nov 6, 2025 |
| BrowseComp | Agents | 60.2% | accuracy | Verified | Nov 6, 2025 |
| GDPval | Agents | 992 | Elo | Official | Nov 6, 2025 |
| MultiChallenge | Reasoning | 55.42% | accuracy | Verified | Nov 6, 2025 |
| Global-MMLU | Reasoning | 73.5% | accuracy | Unverified | Nov 6, 2025 |
| WebDev Arena | Chat preference | 1329 | Elo | Verified | Nov 6, 2025 |
| EQ-Bench Creative Writing v3 | Chat preference | 1695 | Elo | Verified | Nov 6, 2025 |
| MCP-Universe | Tool use | 26.41% | overall success rate | Verified | Nov 6, 2025 |
| Gray Swan Arena (Agent Red-Teaming / Indirect Prompt Injection) | Agents | 4.8% | attack success rate (ASR; lower is better) | Verified | Nov 6, 2025 |
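The table mixes percentage, index, and Elo scales, so its scores cannot be averaged or ranked against each other directly. As a minimal sketch (the names `RESULTS`, `group_by_metric`, and `filter_by_status` are hypothetical helpers, not part of any leaderboard tooling), the rows can be transcribed into Python and grouped by metric or filtered by status before any comparison:

```python
from collections import defaultdict

# Rows transcribed from the table above: (benchmark, category, score, metric, status).
# Percentages are stored as plain numbers (71.3 means 71.3%).
# Note: for the Gray Swan attack success rate, lower is better.
RESULTS = [
    ("SWE-bench Verified", "Coding", 71.3, "% resolved", "Verified"),
    ("GPQA Diamond", "Reasoning", 84.5, "accuracy", "Verified"),
    ("Humanity's Last Exam", "Reasoning", 23.9, "accuracy", "Verified"),
    ("Epoch Capabilities Index", "Reasoning", 145.6, "index", "Official"),
    ("MMLU-Pro", "Reasoning", 84.6, "accuracy", "Unverified"),
    ("BrowseComp", "Agents", 60.2, "accuracy", "Verified"),
    ("GDPval", "Agents", 992, "Elo", "Official"),
    ("MultiChallenge", "Reasoning", 55.42, "accuracy", "Verified"),
    ("Global-MMLU", "Reasoning", 73.5, "accuracy", "Unverified"),
    ("WebDev Arena", "Chat preference", 1329, "Elo", "Verified"),
    ("EQ-Bench Creative Writing v3", "Chat preference", 1695, "Elo", "Verified"),
    ("MCP-Universe", "Tool use", 26.41, "overall success rate", "Verified"),
    ("Gray Swan Arena (Agent Red-Teaming / Indirect Prompt Injection)", "Agents",
     4.8, "attack success rate (ASR)", "Verified"),
]


def group_by_metric(rows):
    """Group (benchmark, score) pairs by metric; scores on different metrics are not comparable."""
    groups = defaultdict(list)
    for name, _category, score, metric, _status in rows:
        groups[metric].append((name, score))
    return dict(groups)


def filter_by_status(rows, status):
    """Keep only rows whose status column matches, e.g. "Verified"."""
    return [row for row in rows if row[4] == status]


if __name__ == "__main__":
    for metric, entries in group_by_metric(RESULTS).items():
        print(metric, entries)
    print(len(filter_by_status(RESULTS, "Verified")), "verified results")
```

Grouping by metric first keeps Elo ratings (GDPval, WebDev Arena, EQ-Bench) apart from accuracy percentages, and filtering by status separates the nine Verified rows from the Official and Unverified ones.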