SWE-bench Pro
A harder public follow-on to SWE-bench, built around professional software tasks.
Ready now | Static HTML | Review needed | Run guide ready | Public data
Source detail
Score source
Official Scale page and public repo include result JSON and trajectories.
Run guide
Repo includes harness scripts, Dockerfiles, and run scripts.
How it can be used
Use official page rows and repo result JSON together where both are available.
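One way to combine the two sources is to join them by model name and record which source each score came from. The sketch below is only illustrative: the field names `model`, `score`, and `resolved_rate`, and the shape of the JSON (a mapping from model name to a result object), are assumptions and not the actual schema of the official page or the repo.

```python
import json


def merge_scores(page_rows, repo_results_json):
    """Join page rows with repo result JSON by model name.

    All field names here are hypothetical placeholders; adapt them to
    whatever the official page and the repo files actually contain.
    """
    repo = json.loads(repo_results_json)
    merged = []
    seen = set()

    # Start from the page rows and attach the repo score where one exists.
    for row in page_rows:
        name = row["model"]
        seen.add(name)
        entry = {
            "model": name,
            "page_score": row.get("score"),
            "repo_score": None,
            "sources": ["page"],
        }
        if name in repo:
            entry["repo_score"] = repo[name].get("resolved_rate")
            entry["sources"].append("repo")
        merged.append(entry)

    # Keep models that appear only in the repo result JSON.
    for name, result in repo.items():
        if name not in seen:
            merged.append({
                "model": name,
                "page_score": None,
                "repo_score": result.get("resolved_rate"),
                "sources": ["repo"],
            })
    return merged
```

Keeping a `sources` list per entry makes it easy to see later which scores were cross-checked against both sources and which rest on a single one.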
Caveat
Track max turns and agent configuration because results are scaffold-dependent.
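A minimal sketch of how that tracking could look: store the scaffold name and max turns alongside every score, and only compare scores within the same setup. The `RunRecord` type, its field names, and the `group_by_setup` helper are hypothetical and not part of the benchmark's own tooling.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class RunRecord:
    """One reported result plus the setup it was produced under (hypothetical schema)."""
    model: str
    score: float
    scaffold: str   # name of the agent scaffold, as reported by the source
    max_turns: int  # turn limit used for the run


def group_by_setup(records):
    """Group records by (scaffold, max_turns) so only like-for-like runs are compared."""
    groups = defaultdict(list)
    for record in records:
        groups[(record.scaffold, record.max_turns)].append(record)
    return dict(groups)
```

Grouping on the setup key rather than on the model alone avoids ranking two runs against each other when their scaffolds or turn limits differ.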