FACTS Grounding
A Google DeepMind benchmark that measures how well an LLM's long-form responses are grounded in a provided source document. It scores the share of responses that are both eligible (they address the user's request) and fully supported by that document, with no hallucinated claims.
Category: Reasoning · Metric: grounding accuracy (higher is better)
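The scoring rule described above can be sketched in a few lines. This is a minimal illustration, not the official scorer: it assumes each response already carries one eligibility verdict and one grounding verdict (how those verdicts are produced, for example by LLM judges, is out of scope here), and the `Verdict` class and `grounding_accuracy` function are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Verdict:
    """Hypothetical per-response labels (assumed to be given, not computed here)."""
    eligible: bool  # the response adequately addresses the user's request
    grounded: bool  # every claim is supported by the provided source document


def grounding_accuracy(verdicts: Iterable[Verdict]) -> float:
    """Share of responses that are both eligible and fully grounded."""
    verdicts = list(verdicts)
    if not verdicts:
        raise ValueError("need at least one verdict")
    passed = sum(1 for v in verdicts if v.eligible and v.grounded)
    return passed / len(verdicts)


if __name__ == "__main__":
    sample = [
        Verdict(eligible=True, grounded=True),
        Verdict(eligible=True, grounded=False),  # contains an unsupported claim
        Verdict(eligible=False, grounded=True),  # grounded but does not answer the request
        Verdict(eligible=True, grounded=True),
    ]
    print(grounding_accuracy(sample))  # 0.5
```

Note that an ineligible response counts as a failure even when every sentence is supported; requiring both conditions keeps a model from scoring well by giving short or evasive answers that are trivially grounded.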