Labs: Cohere
Models: 1
Progress by benchmark
Command A Reasoning (Aug 1, 2025): no SWE-bench Verified score listed
Single benchmark only
This view shows SWE-bench Verified (% resolved) only. Other benchmarks use different metrics and are not directly comparable.
Progress matrix
| Model | SWE-bench Verified (% resolved) | GPQA Diamond (accuracy) | LiveCodeBench Pro (Codeforces Elo) | Berkeley Function Calling Leaderboard (accuracy) | LiveBench (score) | Terminal-Bench 2.1 (task success) | SWE-bench Pro (% resolved) | DeepSWE (% resolved) | Humanity's Last Exam (accuracy) | MMMU-Pro (accuracy) | LMArena (source-defined rating) | ARC-AGI-3 (accuracy) | ARC-AGI-2 (accuracy) | FrontierMath (accuracy) | AIME, OTIS Mock (accuracy) | SimpleQA Verified (accuracy) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Command A Reasoning (Command) | — | — | — | 57.06% | — | — | — | — | — | — | — | — | — | — | — | — |
Scores are not normalised across benchmarks. Each column uses its own metric. Compare columns independently.