evals.report

FACTS Grounding

A Google DeepMind benchmark that measures how well an LLM's long-form responses are factually grounded in a provided source document. The score is the share of responses that are both eligible (they adequately address the user's request) and fully supported by the context, with no hallucinations.

Reasoning | Grounding accuracy | Higher is better
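The scoring rule described above reduces to a simple ratio. The Python sketch below assumes each response already carries two boolean verdicts, `eligible` and `grounded`. These are hypothetical field names. The sketch also does not model how those verdicts are produced (the benchmark relies on LLM judges for that step). It only shows how they combine into a single grounding-accuracy score.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Judgment:
    """Hypothetical per-response verdict; field names are illustrative only."""
    eligible: bool  # response adequately addresses the user's request
    grounded: bool  # every claim is supported by the provided source document


def grounding_score(judgments: Sequence[Judgment]) -> float:
    """Return the share of responses that are both eligible and fully grounded."""
    if not judgments:
        raise ValueError("need at least one judgment")
    passed = sum(1 for j in judgments if j.eligible and j.grounded)
    return passed / len(judgments)


if __name__ == "__main__":
    sample = [
        Judgment(eligible=True, grounded=True),
        Judgment(eligible=True, grounded=False),  # contains a hallucination
        Judgment(eligible=False, grounded=True),  # grounded but does not answer the request
        Judgment(eligible=True, grounded=True),
    ]
    print(grounding_score(sample))  # 0.5
```

In this sketch, a response that is well grounded but fails to answer the request still counts against the score. This matches the "eligible and fully supported" requirement: grounding alone is not enough.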

No run guide for this benchmark yet.