τ²-bench (Telecom)
A dual-control, multi-turn tool-agent-user benchmark (telecom split) where both the AI agent and a simulated user invoke tools to coordinate and resolve technical-support troubleshooting tasks in a shared, dynamic environment.
Category: Tool use | Metric: pass^1 | Higher is better
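To make the pass^1 metric concrete, here is a minimal sketch of how a pass^k score could be computed. It assumes the pass^k definition from the original τ-bench work: for each task, the fraction of k-sized subsets of its n trials in which every trial succeeded, averaged over tasks. The function name `pass_hat_k` and the trial outcomes are hypothetical and are not taken from this benchmark's tooling.

```python
from math import comb


def pass_hat_k(results, k=1):
    """Estimate pass^k from per-task trial outcomes.

    results: list of lists of booleans, one inner list per task,
             one boolean per independent trial (True = task solved).
    Assumed definition: mean over tasks of C(c, k) / C(n, k), where
    n is the number of trials and c the number of successful trials.
    """
    per_task = []
    for trials in results:
        n = len(trials)
        c = sum(trials)
        if k > n:
            raise ValueError("k cannot exceed the number of trials per task")
        per_task.append(comb(c, k) / comb(n, k))
    return sum(per_task) / len(per_task)


# Hypothetical outcomes: 3 tasks, 4 trials each.
example = [
    [True, True, True, True],      # always solved
    [True, False, True, False],    # solved half the time
    [False, False, False, False],  # never solved
]

score_1 = pass_hat_k(example, k=1)  # (1 + 0.5 + 0) / 3
score_2 = pass_hat_k(example, k=2)  # (1 + 1/6 + 0) / 3
print(f"pass^1 = {score_1:.3f}, pass^2 = {score_2:.3f}")
```

Under this assumed definition, pass^1 reduces to the plain average success rate over single trials, while larger k rewards agents that solve the same task consistently across repeated runs.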
No run guide for this benchmark yet.