evals.report

Qwen3.5-397B-A17B

Alibaba / Qwen · Qwen3.5. Released Feb 16, 2026.


Benchmark results (21)

| Benchmark | Category | Score | Metric | Status | Date |
| --- | --- | --- | --- | --- | --- |
| SWE-bench Verified | Coding | 76.4% | % resolved | Unverified | Feb 16, 2026 |
| GPQA Diamond | Reasoning | 88.4% | accuracy | Unverified | Feb 16, 2026 |
| Humanity's Last Exam | Reasoning | 28.7% | accuracy | Unverified | Feb 16, 2026 |
| Berkeley Function Calling Leaderboard | Tool use | 72.9% | accuracy | Unverified | Feb 16, 2026 |
| MMMU-Pro | Multimodal | 79.0% | accuracy | Unverified | Feb 16, 2026 |
| Artificial Analysis Intelligence Index | Reasoning | 40.1 | Index | Unverified | Feb 16, 2026 |
| Epoch Capabilities Index | Reasoning | 146.1 | Index | Official | Feb 16, 2026 |
| τ²-bench (Telecom) | Tool use | 95.6% | pass^1 | Official | Feb 16, 2026 |
| AIME 2026 | Reasoning | 93.33% | accuracy | Official | Feb 16, 2026 |
| GDPval | Agents | 1220 | Elo | Official | Feb 16, 2026 |
| SciCode | Coding | 42.0% | accuracy | Unverified | Feb 16, 2026 |
| AA-Omniscience: Knowledge and Hallucination Benchmark | Reasoning | -30 | AA-Omniscience Index | Official | Feb 16, 2026 |
| IFBench | Reasoning | 78.8% | accuracy | Official | Feb 16, 2026 |
| Global-MMLU | Reasoning | 90.0% | accuracy | Unverified | Feb 16, 2026 |
| WebDev Arena | Chat preference | 1393 | Elo | Verified | Feb 16, 2026 |
| EQ-Bench Creative Writing v3 | Chat preference | 1469 | Elo | Verified | Feb 16, 2026 |
| Design Arena | Chat preference | 1233 | Elo | Verified | Feb 16, 2026 |
| ScreenSpot-Pro | Multimodal | 65.6% | accuracy | Unverified | Feb 16, 2026 |
| SuperGPQA | Reasoning | 70.4% | accuracy | Unverified | Feb 16, 2026 |
| MathArena HMMT February 2026 | Reasoning | 87.88% | accuracy | Official | Feb 16, 2026 |
| FrontierMath Tier 4 | Reasoning | 2.1% | accuracy | Official | Feb 16, 2026 |