evals.report

Berkeley Function Calling Leaderboard

A function-calling and tool-use benchmark covering single-turn, multi-turn, live, and agentic scenarios.

Tool use · accuracy · Higher is better

The official BFCL README documents installation, generation, evaluation, and score output.

Benchmark: Berkeley Function Calling Leaderboard
Dataset: Not provided
Metric: accuracy

1. Expected output

Use the official source links for current output format, submission steps, and benchmark-specific result files.

2. Submit results

Keep source URL, source model name, benchmark version, harness, and run context attached to any reported score.
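One way to keep that provenance attached is to store each score as a single record that cannot be created without it. The sketch below is illustrative only: the `ReportedScore` class and its field names are assumptions made for this example, not a format defined by BFCL or evals.report, and every value shown is a placeholder.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class ReportedScore:
    """Hypothetical record tying a score to its provenance."""
    benchmark: str
    metric: str
    value: float
    source_url: str
    source_model_name: str
    benchmark_version: str
    harness: str
    run_context: str

    def __post_init__(self):
        # Reject records where any text field is empty or whitespace only.
        missing = [
            name for name, field_value in asdict(self).items()
            if isinstance(field_value, str) and not field_value.strip()
        ]
        if missing:
            raise ValueError(f"missing provenance fields: {missing}")


# Placeholder values only; substitute the details of the actual run.
score = ReportedScore(
    benchmark="Berkeley Function Calling Leaderboard",
    metric="accuracy",
    value=0.5,
    source_url="https://example.com/source",
    source_model_name="example-model",
    benchmark_version="example-version",
    harness="example-harness",
    run_context="example run notes",
)
```

Because the record is frozen, the score and its context travel together and cannot drift apart after creation.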

Gotchas

BFCL includes source-provided within-benchmark aggregates; label them as BFCL metrics, never as evals.report composites.
Do not mix this benchmark's metric with unrelated benchmark metrics.
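The two gotchas above can be enforced mechanically when scores are combined in a script. The following is a minimal sketch under stated assumptions: `average_same_metric` is a hypothetical helper, the `(benchmark, metric, value)` tuple layout is invented for this example, and the numbers are placeholders rather than real results. It is not part of the BFCL tooling.

```python
from statistics import mean


def average_same_metric(scores):
    """Average (benchmark, metric, value) tuples, refusing mixed inputs.

    Returns a label that names the benchmark, so the result is never
    presented as a cross-benchmark composite.
    """
    keys = {(benchmark, metric) for benchmark, metric, _ in scores}
    if len(keys) != 1:
        raise ValueError(f"expected one benchmark/metric pair, got {sorted(keys)}")
    (benchmark, metric), = keys
    label = f"{benchmark} {metric}"
    return label, mean(value for _, _, value in scores)


# Placeholder values: two runs of the same benchmark and metric.
label, value = average_same_metric([
    ("BFCL", "accuracy", 0.25),
    ("BFCL", "accuracy", 0.75),
])
print(label, value)
```

Passing scores from two different benchmarks, or two different metrics, raises `ValueError` instead of silently producing a blended number.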