Labs / Upstage
Models: 1
Progress by benchmark
Single benchmark only
This view shows SWE-bench Verified (% resolved) only. Other benchmarks use different metrics and are not directly comparable.
Progress matrix
Scores are not normalised across benchmarks. Each column uses its own metric, so compare values only within a column, never across columns.