Qwen3.5-397B-A17B
Alibaba / Qwen · Qwen3.5. Released Feb 16, 2026.
Benchmark results (21)

| Benchmark | Category | Score | Metric | Status | Date | |
|---|---|---|---|---|---|---|
| SWE-bench Verified | Coding | 76.4% | % resolved | Unverified | Feb 16, 2026 | Details |
| GPQA Diamond | Reasoning | 88.4% | accuracy | Unverified | Feb 16, 2026 | Details |
| Humanity's Last Exam | Reasoning | 28.7% | accuracy | Unverified | Feb 16, 2026 | Details |
| Berkeley Function Calling Leaderboard | Tool use | 72.9% | accuracy | Unverified | Feb 16, 2026 | Details |
| MMMU-Pro | Multimodal | 79.0% | accuracy | Unverified | Feb 16, 2026 | Details |
| Artificial Analysis Intelligence Index | Reasoning | 40.1 | Index | Unverified | Feb 16, 2026 | Details |
| Epoch Capabilities Index | Reasoning | 146.1 | Index | Official | Feb 16, 2026 | Details |
| τ²-bench (Telecom) | Tool use | 95.6% | pass^1 | Official | Feb 16, 2026 | Details |
| AIME 2026 | Reasoning | 93.33% | accuracy | Official | Feb 16, 2026 | Details |
| GDPval | Agents | 1220 | Elo | Official | Feb 16, 2026 | Details |
| SciCode | Coding | 42.0% | accuracy | Unverified | Feb 16, 2026 | Details |
| AA-Omniscience: Knowledge and Hallucination Benchmark | Reasoning | -30 | AA-Omniscience Index | Official | Feb 16, 2026 | Details |
| IFBench | Reasoning | 78.8% | accuracy | Official | Feb 16, 2026 | Details |
| Global-MMLU | Reasoning | 90.0% | accuracy | Unverified | Feb 16, 2026 | Details |
| WebDev Arena | Chat preference | 1393 | Elo | Verified | Feb 16, 2026 | Details |
| EQ-Bench Creative Writing v3 | Chat preference | 1469 | Elo | Verified | Feb 16, 2026 | Details |
| Design Arena | Chat preference | 1233 | Elo | Verified | Feb 16, 2026 | Details |
| ScreenSpot-Pro | Multimodal | 65.6% | accuracy | Unverified | Feb 16, 2026 | Details |
| SuperGPQA | Reasoning | 70.4% | accuracy | Unverified | Feb 16, 2026 | Details |
| MathArena HMMT February 2026 | Reasoning | 87.88% | accuracy | Official | Feb 16, 2026 | Details |
| FrontierMath Tier 4 | Reasoning | 2.1% | accuracy | Official | Feb 16, 2026 | Details |