Berkeley Function Calling Leaderboard
A function-calling and tool-use benchmark covering single-turn, multi-turn, live, and agentic scenarios.
Category: Tool use. Metric: accuracy. Higher is better.
The official BFCL README documents installation, generation, evaluation, and score output.
1. Expected output
Use the official source links for current output format, submission steps, and benchmark-specific result files.
2. Submit results
Keep source URL, source model name, benchmark version, harness, and run context attached to any reported score.
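One way to keep that provenance attached is to store each score in a record that cannot be built without it. The sketch below is illustrative only: the class name ReportedScore, its field names, and the placeholder values are assumptions made for this example, not part of BFCL or any official harness.

```python
from dataclasses import dataclass, asdict
import json


@dataclass(frozen=True)
class ReportedScore:
    """Hypothetical record tying a score to its provenance (not a BFCL API)."""
    benchmark: str
    metric: str
    value: float
    source_url: str
    source_model_name: str
    benchmark_version: str
    harness: str
    run_context: str

    def __post_init__(self):
        # Reject records where any text field is blank, so a score
        # can never be reported without its provenance.
        missing = [
            name for name, v in asdict(self).items()
            if isinstance(v, str) and not v.strip()
        ]
        if missing:
            raise ValueError(f"missing provenance fields: {', '.join(missing)}")

    def to_json(self) -> str:
        # Serialize the score together with its provenance.
        return json.dumps(asdict(self), sort_keys=True)


# Placeholder values only; substitute the real source details for a real run.
example = ReportedScore(
    benchmark="BFCL",
    metric="accuracy",
    value=0.0,  # placeholder, not a real result
    source_url="https://example.invalid/bfcl-source",
    source_model_name="example-model",
    benchmark_version="placeholder-version",
    harness="placeholder-harness",
    run_context="placeholder run notes",
)
```

Serializing with example.to_json() keeps the score and its provenance in a single object, so the two cannot drift apart when results are copied between reports.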
Gotchas
BFCL includes source-provided within-benchmark aggregates; label them as BFCL metrics, never as evals.report composites.
Do not mix this benchmark's metric with unrelated benchmark metrics.
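The no-mixing rule can be enforced mechanically before any scores are combined. The following is a minimal sketch, assuming scores are held as (benchmark, metric, value) tuples; the function name mean_within_benchmark and that tuple layout are hypothetical, and the helper does not replace the aggregates that BFCL itself publishes.

```python
from statistics import mean


def mean_within_benchmark(scores):
    """Average values only if every entry shares one (benchmark, metric) pair.

    `scores` is a list of (benchmark, metric, value) tuples; this layout
    is an assumption made for the example.
    """
    if not scores:
        raise ValueError("no scores given")
    keys = {(benchmark, metric) for benchmark, metric, _ in scores}
    if len(keys) != 1:
        # Refuse to combine metrics from different benchmarks or of different kinds.
        raise ValueError(f"refusing to mix metrics: {sorted(keys)}")
    return mean(value for _, _, value in scores)
```

Calling it with entries from two different benchmarks raises a ValueError instead of silently producing a blended number.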