MCP-Universe
A benchmark from Salesforce AI Research that evaluates LLMs and agents on real-world Model Context Protocol (MCP) server tasks across six domains (location navigation, repository management, financial analysis, 3D design, browser automation, web searching), measuring end-to-end task success rate.
Tool use · Metric: Overall Success Rate (higher is better)
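In its simplest form, an overall success rate is the share of tasks an agent completes end to end. The minimal Python sketch below rests on three assumptions. Each task yields a single pass/fail outcome. The overall figure is pooled across all tasks (a micro-average). Per-domain rates are reported alongside it. MCP-Universe's official aggregation and task-level evaluators may differ from this. The function name `success_rates` and the sample records are hypothetical.

```python
from collections import defaultdict


def success_rates(results):
    """Return (overall, per_domain) success rates from (domain, succeeded) pairs.

    Assumption: the overall rate is pooled over every task (micro-average),
    which is not necessarily how MCP-Universe computes its official score.
    """
    totals = defaultdict(int)  # tasks attempted per domain
    wins = defaultdict(int)    # tasks passed per domain
    for domain, succeeded in results:
        totals[domain] += 1
        wins[domain] += int(succeeded)

    n = sum(totals.values())
    if n == 0:
        raise ValueError("no task results")

    overall = sum(wins.values()) / n
    per_domain = {d: wins[d] / totals[d] for d in totals}
    return overall, per_domain


if __name__ == "__main__":
    # Made-up placeholder outcomes, labelled with domains from the description above.
    demo = [
        ("location navigation", True),
        ("location navigation", False),
        ("repository management", True),
        ("financial analysis", False),
    ]
    overall, per_domain = success_rates(demo)
    print(f"overall: {overall:.2%}")
    for domain, rate in sorted(per_domain.items()):
        print(f"{domain}: {rate:.2%}")
```

A macro-average (the mean of the per-domain rates) weights each domain equally regardless of task count. When domains have different numbers of tasks, it can differ noticeably from the pooled figure.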