SWE-bench Pro
A harder public software-engineering agent benchmark built around professional repository tasks.
Category: Coding. Metric: % resolved (higher is better).
The repository includes harness scripts, Dockerfiles, and run scripts.
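As a rough illustration of the % resolved metric, the sketch below computes the share of tasks marked as resolved. The input shape (a mapping from task id to a boolean) and the task ids are assumptions made for this example, not the benchmark's official result format.

```python
def percent_resolved(outcomes: dict[str, bool]) -> float:
    """Return the share of tasks marked resolved, as a percentage."""
    if not outcomes:
        raise ValueError("no task outcomes supplied")
    # True counts as 1 and False as 0, so sum() gives the resolved count.
    return 100.0 * sum(outcomes.values()) / len(outcomes)


# Hypothetical task ids and outcomes, for illustration only.
example = {"task-a": True, "task-b": False, "task-c": True, "task-d": False}
print(percent_resolved(example))  # 50.0
```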
1. Expected output
Consult the official source links for the current output format, submission steps, and benchmark-specific result files.
2. Submit results
Keep source URL, source model name, benchmark version, harness, and run context attached to any reported score.
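One way to keep that provenance attached is to store the score and its metadata in a single record that refuses to be created without them. This is a minimal sketch: the class name, field names, and all example values are hypothetical and are not part of any official submission format.

```python
from dataclasses import dataclass, field, asdict


@dataclass(frozen=True)
class ReportedScore:
    """A score bundled with the provenance needed to interpret it."""
    percent_resolved: float
    source_url: str
    source_model_name: str
    benchmark_version: str
    harness: str
    run_context: dict = field(default_factory=dict)

    def __post_init__(self):
        # Reject records that would circulate without provenance.
        for name in ("source_url", "source_model_name",
                     "benchmark_version", "harness"):
            if not getattr(self, name).strip():
                raise ValueError(f"missing provenance field: {name}")
        if not 0.0 <= self.percent_resolved <= 100.0:
            raise ValueError("percent_resolved must be between 0 and 100")


# Placeholder values, for illustration only.
record = ReportedScore(
    percent_resolved=12.5,
    source_url="https://example.com/results",
    source_model_name="example-model",
    benchmark_version="example-version",
    harness="example-harness",
    run_context={"max_turns": 50},
)
print(asdict(record)["harness"])  # example-harness
```

Making the record immutable (`frozen=True`) is a design choice here: the score cannot later be edited without its provenance being re-stated.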
Gotchas
Track max turns and agent configuration because results are scaffold-dependent.
Do not mix this benchmark's metric with unrelated benchmark metrics.
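Both gotchas can be checked mechanically before two scores are placed side by side. The sketch below lists the settings on which two runs differ. The dictionary keys (`benchmark`, `metric`, `max_turns`, `agent_config`) and the example values are assumptions chosen for illustration, not fields defined by the benchmark.

```python
def comparability_issues(
    run_a: dict,
    run_b: dict,
    keys: tuple = ("benchmark", "metric", "max_turns", "agent_config"),
) -> list:
    """Return the keys on which two runs differ; empty means comparable."""
    return [k for k in keys if run_a.get(k) != run_b.get(k)]


# Hypothetical run descriptions, for illustration only.
run_a = {"benchmark": "SWE-bench Pro", "metric": "% resolved",
         "max_turns": 50, "agent_config": "scaffold-x"}
run_b = {"benchmark": "SWE-bench Pro", "metric": "% resolved",
         "max_turns": 100, "agent_config": "scaffold-x"}

print(comparability_issues(run_a, run_b))  # ['max_turns']
```

A non-empty result signals that the two scores should not be compared directly without noting the difference.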