evals.report

Alibaba / Qwen

Model provider for Qwen-family public benchmark rows.

7 models · 15 results · qwen.ai


Progress by benchmark

Qwen 3 Coder 480B: Jul 22, 2025
Qwen3 235B A22B Instruct 2507: Jul 25, 2025
Qwen3 Max: Sep 23, 2025
Qwen3.5 Max Preview: Mar 1, 2026
Qwen 3.6 Plus: Mar 31, 2026
Qwen 3.6 Max Preview: Apr 20, 2026
Qwen3.7 Max Preview: May 1, 2026
Single benchmark only
This view shows SWE-bench Verified (% resolved) only. Other benchmarks use different metrics and are not directly comparable.

Progress matrix

Columns: Model, then each benchmark with its metric:
SWE-bench Verified (% resolved)
GPQA Diamond (accuracy)
LiveCodeBench Pro (Codeforces Elo)
Berkeley Function Calling Leaderboard (accuracy)
LiveBench (score)
Terminal-Bench 2.1 (task success)
SWE-bench Pro (% resolved)
DeepSWE (% resolved)
Humanity's Last Exam (accuracy)
MMMU-Pro (accuracy)
LMArena (source-defined rating)
ARC-AGI-3 (accuracy)
ARC-AGI-2 (accuracy)
FrontierMath (accuracy)
AIME (OTIS Mock) (accuracy)
SimpleQA Verified (accuracy)
Qwen 3 Coder 480B (Qwen): 38.70%
Qwen3 235B A22B Instruct 2507 (Qwen): 1673, 52.15%, 21.41%, 50.1%
Qwen3 Max (Qwen): 67.5%
Qwen3.5 Max Preview (Qwen): 1470
Qwen 3.6 Plus (Qwen): 87.4%, 90.6%, 49.1%
Qwen 3.6 Max Preview (Qwen): 89.1%, 91.1%, 56.9%
Qwen3.7 Max Preview (Qwen): 74.29%, 1474

Scores are not normalised across benchmarks. Each column uses its own metric. Compare columns independently.
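
Comparing columns independently can be sketched as follows. This is a minimal illustration, not how the site computes anything: the model names, benchmark names, and values are hypothetical placeholders rather than rows from the matrix, and the sketch assumes higher is better for every metric.

```python
from collections import defaultdict

# Hypothetical placeholder rows: (model, benchmark, metric, value).
# These values are invented for illustration and are NOT taken from the matrix.
RESULTS = [
    ("model-a", "bench-x", "% resolved", 40.0),
    ("model-b", "bench-x", "% resolved", 55.0),
    ("model-a", "bench-y", "Elo", 1500.0),
    ("model-b", "bench-y", "Elo", 1450.0),
]


def rank_within_benchmark(results):
    """Group results by benchmark and sort each group from best to worst.

    A value is only ever compared with other values from the same benchmark,
    so differing metrics (percentages, Elo, ratings) are never mixed.
    Assumes higher is better for every metric.
    """
    by_benchmark = defaultdict(list)
    for model, benchmark, metric, value in results:
        by_benchmark[benchmark].append((model, value, metric))
    return {
        benchmark: sorted(rows, key=lambda row: row[1], reverse=True)
        for benchmark, rows in by_benchmark.items()
    }


rankings = rank_within_benchmark(RESULTS)
for benchmark, rows in rankings.items():
    for position, (model, value, metric) in enumerate(rows, start=1):
        print(f"{benchmark}: {position}. {model} {value} ({metric})")
```

Each benchmark gets its own ranking, so a model can lead one column and trail another without either number being rescaled against the other.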