evals.report

SWE-bench Pro

A harder public follow-on to SWE-bench, built from professional software engineering tasks.

Ready now · Static HTML · Review needed · Run guide ready · Public data
Official source: Benchmark page

Source detail

Score source

The official Scale page and the public repo include result JSON and agent trajectories.

Run guide

The repo includes harness scripts, Dockerfiles, and run scripts.

How it can be used

Use the rows on the official page together with the repo's result JSON wherever both are available.
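One way to combine the two sources is to join them by model name and flag disagreements. The sketch below assumes hypothetical shapes: page rows as {"model", "score"} dicts (score in percent) and repo results as a mapping from model name to {"resolved", "total"} counts. These field names and the 0.5-point tolerance are illustrative assumptions, not the actual SWE-bench Pro schema.

```python
def merge_sources(page_rows, repo_results, tolerance=0.5):
    """Join official-page rows with repo result data by model name.

    Assumed (hypothetical) inputs:
      page_rows:    list of {"model": str, "score": float} (percent)
      repo_results: dict of model -> {"resolved": int, "total": int}
    Returns one record per model found in either source.
    """
    merged = {}
    # Start from the official page rows.
    for row in page_rows:
        merged[row["model"]] = {
            "model": row["model"],
            "page_score": row["score"],
            "repo_score": None,
        }
    # Recompute a resolve rate from the repo counts and attach it.
    for model, res in repo_results.items():
        rec = merged.setdefault(
            model, {"model": model, "page_score": None, "repo_score": None}
        )
        rec["repo_score"] = round(100.0 * res["resolved"] / res["total"], 1)
    # Flag whether both sources exist and agree within the tolerance.
    for rec in merged.values():
        both = rec["page_score"] is not None and rec["repo_score"] is not None
        rec["agrees"] = both and abs(rec["page_score"] - rec["repo_score"]) <= tolerance
    return sorted(merged.values(), key=lambda r: r["model"])
```

In practice `repo_results` would be built from the repo's JSON files (for example with `json.load`). Models that appear in only one source come back with `agrees` set to False, which makes them easy to pick out for manual review.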

Caveat

Record the max-turn limit and the agent configuration for each run, because results depend on the scaffold.
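Because scores are scaffold-dependent, it can help to store the configuration alongside every score and refuse direct comparisons across different setups. This is a minimal sketch under assumed names: `RunConfig`, `RunResult`, and their fields (`scaffold`, `max_turns`, `resolve_rate`) are hypothetical and not part of the SWE-bench Pro repo.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunConfig:
    """Hypothetical record of how an agent was run."""
    model: str
    scaffold: str   # name of the agent harness/scaffold used
    max_turns: int  # turn limit given to the agent


@dataclass(frozen=True)
class RunResult:
    config: RunConfig
    resolve_rate: float  # percent of tasks resolved


def comparable(a: RunResult, b: RunResult) -> bool:
    """Treat results as directly comparable only if scaffold and turn limit match."""
    return (a.config.scaffold == b.config.scaffold
            and a.config.max_turns == b.config.max_turns)


def score_gap(a: RunResult, b: RunResult) -> float:
    """Return a's resolve rate minus b's, refusing mismatched configurations."""
    if not comparable(a, b):
        raise ValueError("runs use different scaffolds or max-turn limits")
    return a.resolve_rate - b.resolve_rate
```

Keying on the scaffold and turn limit keeps model-to-model comparisons honest. A mismatched pair raises an error instead of silently producing a misleading gap.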

Evidence links (2)