evals.report
Models (1 selected): Qwen3 235B A22B Instruct 2507 (Alibaba / Qwen)
Benchmarks (2 selected): SWE-bench Verified (Coding), GPQA Diamond (Reasoning)

| Benchmark | Metric | Qwen3 235B A22B Instruct 2507 (Alibaba / Qwen) |
| --- | --- | --- |
| SWE-bench Verified | % resolved | (no value shown) |
| GPQA Diamond | accuracy | 80.1% |

No aggregate score is calculated. Each row uses its benchmark’s own metric. Compare rows independently.
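
As a rough illustration of the note above, the comparison can be held as one record per benchmark, each carrying its own metric and no combined score. This is a minimal sketch, not the site's implementation: the `Row` class and `format_row` helper are hypothetical names, and a missing score is modeled as `None` because the page shows no value for SWE-bench Verified.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Row:
    """One benchmark result for one model (hypothetical structure)."""
    benchmark: str
    metric: str
    score: Optional[float]  # None when the page shows no value


# Values copied from the comparison above; nothing is filled in or averaged.
rows = [
    Row("SWE-bench Verified", "% resolved", None),
    Row("GPQA Diamond", "accuracy", 80.1),
]


def format_row(row: Row) -> str:
    """Render a row with its own metric; never mix metrics across rows."""
    value = "no value shown" if row.score is None else f"{row.score:.1f}%"
    return f"{row.benchmark} ({row.metric}): {value}"


for row in rows:
    print(format_row(row))
```

Keeping the metric name on every record makes it harder to average "% resolved" and "accuracy" together by accident, and the missing score stays visibly missing rather than being treated as zero.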