evals.report

Terminus 2 + Claude Opus 4.8

Agent systems · Agent.

1 result

Benchmark results (1)

Benchmark           Category  Score  Metric        Status    Date
Terminal-Bench 2.1  Agents    74.6%  task success  Verified  May 29, 2026