Z.ai
Models (3)
- GLM-4.6 (GLM · glm-4.6): 2025-09-30, 2 results
- GLM-5 (GLM · glm-5): 2026-02-11, 2 results
- GLM-5.1 (GLM · glm-5.1): 2026-04-01, 8 results
Progress by benchmark
| Model | Date | SWE-bench Verified (% resolved) |
|---|---|---|
| GLM-4.6 | Sep 30, 2025 | — |
| GLM-5 | Feb 11, 2026 | 72.1% |
| GLM-5.1 | Apr 1, 2026 | 74.2% |
Only SWE-bench Verified (% resolved) is charted here. Other benchmarks use different metrics and are not directly comparable.
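The two SWE-bench Verified scores above can be expressed as a change between releases. A small Python sketch, using only the GLM-5 and GLM-5.1 values listed here; the helper names `point_change` and `relative_change` are illustrative, not part of the leaderboard. It separates the absolute change in percentage points from the relative change, since the two are easy to conflate.

```python
def point_change(old_pct: float, new_pct: float) -> float:
    """Absolute change, in percentage points (pp)."""
    return new_pct - old_pct


def relative_change(old_pct: float, new_pct: float) -> float:
    """Change relative to the old score, expressed as a percentage."""
    return (new_pct - old_pct) / old_pct * 100


# SWE-bench Verified (% resolved): GLM-5 -> GLM-5.1
glm5, glm51 = 72.1, 74.2
print(f"{point_change(glm5, glm51):.1f} pp")   # 2.1 pp
print(f"{relative_change(glm5, glm51):.1f}%")  # 2.9%
```

A gain of 2.1 points on a 72.1% baseline is roughly a 2.9% relative improvement; quoting one as the other overstates or understates the progress.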
Progress matrix
| Benchmark | Metric | GLM-4.6 | GLM-5 | GLM-5.1 |
|---|---|---|---|---|
| SWE-bench Verified | % resolved | — | 72.1% | 74.2% |
| GPQA Diamond | accuracy | — | 87.8% | 85.5% |
| LiveCodeBench Pro | Codeforces Elo | — | — | — |
| Berkeley Function Calling Leaderboard | accuracy | 72.38% | — | — |
| LiveBench | score | — | — | 70.18% |
| Terminal-Bench 2.1 | task success | — | — | — |
| SWE-bench Pro | % resolved | 9.67% | — | — |
| DeepSWE | % resolved | — | — | 17.48% |
| Humanity's Last Exam | accuracy | — | — | 25.63% |
| MMMU-Pro | accuracy | — | — | — |
| LMArena | source-defined rating | — | — | 1469 |
| ARC-AGI-3 | accuracy | — | — | — |
| ARC-AGI-2 | accuracy | — | — | — |
| FrontierMath | accuracy | — | — | 33.45% |
| AIME (OTIS Mock) | accuracy | — | — | 92.2% |
| SimpleQA Verified | accuracy | — | — | — |
Scores are not normalised across benchmarks. Each benchmark uses its own metric, so compare results only within a single benchmark.
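A minimal Python sketch of that rule, assuming a hypothetical `Result` record per (benchmark, model) cell; the names and structure are illustrative, not the leaderboard's data format. Only a handful of values from the matrix are included. Differences are computed only when benchmark and metric match; anything else is rejected rather than silently subtracted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    benchmark: str
    metric: str
    model: str
    value: float  # percentages stored as plain numbers, e.g. 85.5 for 85.5%


# A few entries copied from the progress matrix above.
RESULTS = [
    Result("SWE-bench Verified", "% resolved", "GLM-5", 72.1),
    Result("SWE-bench Verified", "% resolved", "GLM-5.1", 74.2),
    Result("GPQA Diamond", "accuracy", "GLM-5", 87.8),
    Result("GPQA Diamond", "accuracy", "GLM-5.1", 85.5),
    Result("LMArena", "source-defined rating", "GLM-5.1", 1469.0),
]


def lookup(benchmark: str, model: str) -> Result:
    """Return the result for a (benchmark, model) pair; raises StopIteration if absent."""
    return next(r for r in RESULTS if r.benchmark == benchmark and r.model == model)


def compare(a: Result, b: Result) -> float:
    """Return b.value - a.value, refusing cross-benchmark or cross-metric comparisons."""
    if (a.benchmark, a.metric) != (b.benchmark, b.metric):
        raise ValueError(
            f"not comparable: {a.benchmark} ({a.metric}) vs {b.benchmark} ({b.metric})"
        )
    return b.value - a.value


# Same benchmark and metric: a meaningful difference (GLM-5 -> GLM-5.1 on GPQA Diamond).
print(round(compare(lookup("GPQA Diamond", "GLM-5"), lookup("GPQA Diamond", "GLM-5.1")), 1))  # -2.3

# Different benchmarks: rejected.
try:
    compare(lookup("GPQA Diamond", "GLM-5.1"), lookup("LMArena", "GLM-5.1"))
except ValueError as err:
    print(err)
```

Keying the check on both benchmark and metric means two "accuracy" columns from different benchmarks are still treated as incomparable, which matches the note above.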