Berkeley Function Calling Leaderboard

Name: Berkeley Function Calling Leaderboard
Creator: evals.report

A function-calling and tool-use benchmark covering single-turn, multi-turn, live, and agentic scenarios.

Tool use | accuracy | Higher is better

Known official sources: 1

Ready now | Result archive | Review needed | Run guide ready | Public data

Strong public benchmark for function calling, with multi-turn, live, and agentic tool categories.

Category: Tool use
Owner: UC Berkeley Gorilla
Data path: Use the latest dated result archive after matching it to the public leaderboard. Prefer category rows first.
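
A minimal sketch of the data path above, assuming each archive name embeds a YYYY-MM-DD date and each result row carries a `kind` field that marks category rows. The archive names, the `kind` field, and both helper functions are hypothetical illustrations, not part of BFCL or evals.report tooling. Matching the chosen archive to the public leaderboard is left as a manual step.

```python
import re
from datetime import date

# Assumption: archive names contain an ISO-style date such as 2024-11-30.
DATE_RE = re.compile(r"(\d{4})-(\d{2})-(\d{2})")


def latest_archive(names):
    """Return the name with the most recent embedded date, or None if none is dated."""
    dated = []
    for name in names:
        match = DATE_RE.search(name)
        if not match:
            continue
        try:
            stamp = date(*map(int, match.groups()))
        except ValueError:
            # Skip names whose digits do not form a real calendar date.
            continue
        dated.append((stamp, name))
    return max(dated)[1] if dated else None


def category_rows_first(rows):
    """Stable sort that puts rows with kind == "category" ahead of all others."""
    return sorted(rows, key=lambda row: row.get("kind") != "category")
```

The sort is stable, so rows keep their original relative order within each group.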

Known caveat

BFCL includes source-provided within-benchmark aggregates; label them as BFCL metrics, never as evals.report composites.
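
One way to enforce the caveat in code is to attach provenance when an aggregate is recorded and to reject any record that lacks it. This is only a sketch: the record layout, the `source` field, and both function names are assumptions made for illustration, not an evals.report or BFCL interface.

```python
BFCL_SOURCE = "BFCL"  # hypothetical provenance tag


def label_aggregate(metric_name, value):
    """Wrap a source-provided aggregate so its BFCL origin travels with it."""
    return {
        "metric": metric_name,
        "value": value,
        "source": BFCL_SOURCE,
        "label": f"BFCL metric: {metric_name}",
    }


def require_bfcl_label(record):
    """Raise ValueError if a record is not explicitly labeled as a BFCL metric."""
    if record.get("source") != BFCL_SOURCE:
        raise ValueError(f"aggregate {record.get('metric')!r} is not labeled as a BFCL metric")
    return record
```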