evals.report

IFBench

IFBench is Ai2's benchmark for precise instruction following. It measures how well models generalize to 58 diverse, verifiable, out-of-domain output constraints, testing whether they can obey novel rules rather than overfit to familiar constraint templates.

Reasoning · accuracy · Higher is better
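
To make "verifiable output constraint" and an accuracy score concrete, here is a minimal sketch in Python. The two constraints (`has_exact_word_count`, `ends_with`) and the sample responses are hypothetical illustrations, not taken from IFBench, and the benchmark's actual constraints and scoring rules may differ. The sketch assumes that each response is paired with one programmatic check and that accuracy is the fraction of checks that pass.

```python
from typing import Callable, List, Tuple

Checker = Callable[[str], bool]


# Hypothetical constraints for illustration only; not IFBench's own.
def has_exact_word_count(n: int) -> Checker:
    """Return a checker that passes when the response has exactly n words."""
    return lambda response: len(response.split()) == n


def ends_with(suffix: str) -> Checker:
    """Return a checker that passes when the response ends with suffix."""
    return lambda response: response.rstrip().endswith(suffix)


def accuracy(cases: List[Tuple[str, Checker]]) -> float:
    """Fraction of responses that satisfy their paired constraint."""
    if not cases:
        return 0.0
    passed = sum(1 for response, check in cases if check(response))
    return passed / len(cases)


# Made-up responses paired with the constraint each one was asked to obey.
cases = [
    ("one two three", has_exact_word_count(3)),  # satisfies the constraint
    ("one two", has_exact_word_count(3)),        # too few words
    ("All done. END", ends_with("END")),         # satisfies the constraint
    ("All done.", ends_with("END")),             # missing the required suffix
]

score = accuracy(cases)
print(f"accuracy = {score:.2f}")
```

Because each check is a plain function of the response text, no human or model judge is needed to decide whether an instruction was followed, which is what makes constraints of this kind verifiable.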
