evals.report

MCP-Universe

MCP-Universe is a benchmark from Salesforce AI Research that evaluates LLMs and agents on real-world tasks against Model Context Protocol (MCP) servers. It covers six domains (location navigation, repository management, financial analysis, 3D design, browser automation, and web searching) and reports end-to-end task success rate.

Tool use | Overall Success Rate | Higher is better
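As a rough illustration of how an overall success rate might be derived from per-task outcomes, the sketch below pools pass/fail results across domains and also reports a per-domain breakdown. It is a minimal sketch under stated assumptions, not the benchmark's own scoring code: the function name `success_rates`, the `(domain, passed)` record shape, and the choice to pool all tasks rather than average per-domain rates are all hypothetical, and the sample results are made up.

```python
from collections import defaultdict


def success_rates(results):
    """Compute per-domain and pooled success rates.

    `results` is an iterable of (domain, passed) pairs, where `passed`
    is truthy if the task was completed end to end. This record shape
    is an assumption made for illustration.
    """
    totals = defaultdict(int)
    passes = defaultdict(int)
    for domain, passed in results:
        totals[domain] += 1
        passes[domain] += int(bool(passed))

    # Fraction of tasks passed within each domain.
    per_domain = {d: passes[d] / totals[d] for d in totals}

    # Pooled rate over all tasks (assumed aggregation; the benchmark
    # may weight domains differently).
    n_tasks = sum(totals.values())
    overall = sum(passes.values()) / n_tasks if n_tasks else 0.0
    return per_domain, overall


# Hypothetical outcomes, not real benchmark results.
sample = [
    ("web searching", True),
    ("web searching", False),
    ("3D design", True),
    ("financial analysis", False),
]
per_domain, overall = success_rates(sample)
print(per_domain)
print(f"overall success rate: {overall:.2%}")
```

Pooling weights each task equally, so domains with more tasks influence the overall figure more; averaging the per-domain rates instead would weight each domain equally.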
