evals.report

Terminus 2 + DeepSeek V4 Pro

Agent systems · Agent.

1 result

Benchmark results 1

Benchmark | Category | Score | Metric | Status | Date
SWE-Marathon | Agents | 4.0% | resolution rate (pass@1) | Official |