xAI
Models 5
Grok 4
Grok · grok-4
2025-07-09
5 results
Grok 4.1 fast reasoning
Grok · grok-4.1-fast-reasoning
2025-11-01
1 result
Grok 4.20 beta reasoning
Grok · grok-4.20
2026-03-05
3 results
Grok 4.2
Grok · grok 4.2
2026-03-09
1 result
Grok 4.3
Grok · grok 4.3
2026-04-17
1 result
Progress by benchmark
| Model | Date | SWE-bench Verified (% resolved) |
|---|---|---|
| Grok 4 | Jul 9, 2025 | — |
| Grok 4.1 fast reasoning | Nov 1, 2025 | — |
| Grok 4.20 beta reasoning | Mar 5, 2026 | — |
| Grok 4.2 | Mar 9, 2026 | — |
| Grok 4.3 | Apr 17, 2026 | — |
Single benchmark only
This view shows SWE-bench Verified (% resolved) only. Other benchmarks use different metrics and are not directly comparable.
Progress matrix
| Model | SWE-bench Verified (% resolved) | GPQA Diamond (accuracy) | LiveCodeBench Pro (Codeforces Elo) | Berkeley Function Calling Leaderboard (accuracy) | LiveBench (score) | Terminal-Bench 2.1 (task success) | SWE-bench Pro (% resolved) | DeepSWE (% resolved) | Humanity's Last Exam (accuracy) | MMMU-Pro (accuracy) | LMArena (source-defined rating) | ARC-AGI-3 (accuracy) | ARC-AGI-2 (accuracy) | FrontierMath (accuracy) | AIME (OTIS Mock) (accuracy) | SimpleQA Verified (accuracy) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Grok 4 | — | 87.0% | — | 62.97% | — | — | — | — | 24.52% | — | — | — | — | 19.66% | — | 47.9% |
| Grok 4.1 fast reasoning | — | — | — | 69.57% | — | — | — | — | — | — | — | — | — | — | — | — |
| Grok 4.20 beta reasoning | — | — | — | — | 67.96% | — | — | — | — | — | 1453 | 0.09% | — | — | — | — |
| Grok 4.2 | — | — | — | — | — | — | — | — | 30.2% | — | — | — | — | — | — | — |
| Grok 4.3 | — | — | — | — | — | — | — | — | 33.12% | — | — | — | — | — | — | — |
Scores are not normalised across benchmarks. Each column uses its own metric. Compare columns independently.
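
As a rough illustration of comparing columns independently, the sketch below computes score changes within a single benchmark column and never across columns. It uses only the two columns of the matrix that have more than one reported result. The `SCORES` layout and the `column_deltas` helper are hypothetical names introduced for this example; they are not part of the site or any published API.

```python
# Scores copied from the progress matrix above; None marks a missing result.
# SCORES and column_deltas are illustrative names, not part of the site.
SCORES = {
    "Humanity's Last Exam (accuracy, %)": {
        "Grok 4": 24.52,
        "Grok 4.1 fast reasoning": None,
        "Grok 4.20 beta reasoning": None,
        "Grok 4.2": 30.2,
        "Grok 4.3": 33.12,
    },
    "Berkeley Function Calling Leaderboard (accuracy, %)": {
        "Grok 4": 62.97,
        "Grok 4.1 fast reasoning": 69.57,
        "Grok 4.20 beta reasoning": None,
        "Grok 4.2": None,
        "Grok 4.3": None,
    },
}


def column_deltas(column):
    """Return (earlier, later, change) for consecutive reported scores in one column.

    Models are assumed to be listed in date order; missing results are skipped.
    """
    reported = [(model, score) for model, score in column.items() if score is not None]
    return [
        (earlier, later, round(later_score - earlier_score, 2))
        for (earlier, earlier_score), (later, later_score) in zip(reported, reported[1:])
    ]


# Each benchmark is handled on its own; values from different columns are never mixed.
for benchmark, column in SCORES.items():
    print(benchmark)
    for earlier, later, change in column_deltas(column):
        print(f"  {earlier} -> {later}: {change:+.2f} points")
```

The changes are expressed in percentage points of each benchmark's own metric, so a change in one column says nothing about the size of a change in another.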