Berkeley Function Calling Leaderboard
Strong public benchmark for function calling, multi-turn, live, and agentic tool categories.
Ready now | Result archive | Review needed | Run guide ready | Public data
Source detail
Score source
The harness writes score files and CSVs; the public, dated BFCL-Result archive contains score and result JSON.
Run guide
The official BFCL README documents installation, generation, evaluation, and score output.
How it can be used
Use the latest dated result archive once it has been matched against the public leaderboard. Use category rows first.
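Picking the latest dated archive can be sketched as below. This is a minimal illustration, not the BFCL tooling: it assumes archive folders are named with an ISO-style date (`YYYY-MM-DD`), which should be checked against the actual archive layout, and the helper name `latest_dated_archive` and the sample folder names are hypothetical.

```python
from datetime import datetime
from typing import Iterable, Optional


def latest_dated_archive(names: Iterable[str], fmt: str = "%Y-%m-%d") -> Optional[str]:
    """Return the name carrying the most recent date, or None if none parse.

    Assumption: archive folders are named by date in `fmt`. Names that do
    not parse as dates (README files, etc.) are skipped.
    """
    dated = []
    for name in names:
        try:
            dated.append((datetime.strptime(name, fmt).date(), name))
        except ValueError:
            continue  # not a dated folder
    return max(dated)[1] if dated else None


if __name__ == "__main__":
    # Hypothetical folder listing, for illustration only.
    folders = ["2024-01-15", "README.md", "2024-03-02", "2023-12-30"]
    print(latest_dated_archive(folders))
```

The returned folder is only a candidate; it still has to be matched against the public leaderboard before its scores are used.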
Caveat
BFCL includes source-provided, within-benchmark aggregates; label them as BFCL metrics, never as evals.report composites.
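One way to keep that distinction explicit in code is to tag each row with its origin and sort category rows ahead of aggregates. The sketch below is an assumption about how a consumer might model this; `MetricRow`, `label_row`, `order_rows`, the row names, and the numeric values are all hypothetical placeholders, not BFCL fields or results.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class MetricRow:
    name: str
    value: float
    kind: str   # "category" or "bfcl_aggregate"
    label: str  # display label that always names BFCL as the source


def label_row(name: str, value: float, is_aggregate: bool) -> MetricRow:
    """Tag a row so a BFCL aggregate is never presented as a site composite."""
    kind = "bfcl_aggregate" if is_aggregate else "category"
    prefix = "BFCL aggregate" if is_aggregate else "BFCL category"
    return MetricRow(name, value, kind, f"{prefix}: {name}")


def order_rows(rows: List[MetricRow]) -> List[MetricRow]:
    """Category rows first, source-provided aggregates last."""
    return sorted(rows, key=lambda r: (r.kind != "category", r.name))


if __name__ == "__main__":
    # Placeholder names and values, for illustration only.
    rows = [
        label_row("overall", 0.0, is_aggregate=True),
        label_row("multi_turn", 0.0, is_aggregate=False),
        label_row("live", 0.0, is_aggregate=False),
    ]
    for row in order_rows(rows):
        print(row.label)
```

Carrying the source in the label itself means a downstream table cannot drop the attribution by accident.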