IFBench
IFBench is Ai2's benchmark for precise instruction following. It measures how well that ability generalizes, using 58 diverse, verifiable, out-of-domain output constraints that test whether a model can obey novel rules rather than overfit to familiar constraint templates.
Category: Reasoning | Metric: accuracy (higher is better)
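To make "verifiable constraint" and "accuracy" concrete, here is a minimal sketch. It assumes a verifiable constraint is one that a program can check against the model's output, and that accuracy is the fraction of responses passing their check. The constraints `max_words` and `ends_with`, the `accuracy` helper, and the sample responses are all hypothetical; they are not taken from IFBench, and the benchmark's actual scoring may differ.

```python
from typing import Callable, List, Tuple

Check = Callable[[str], bool]


def max_words(limit: int) -> Check:
    """Hypothetical constraint: the response has at most `limit` words."""
    return lambda text: len(text.split()) <= limit


def ends_with(suffix: str) -> Check:
    """Hypothetical constraint: the response ends with `suffix`."""
    return lambda text: text.rstrip().endswith(suffix)


def accuracy(cases: List[Tuple[str, Check]]) -> float:
    """Fraction of (response, check) pairs whose check passes."""
    if not cases:
        return 0.0
    passed = sum(1 for response, check in cases if check(response))
    return passed / len(cases)


# Made-up responses paired with the constraint each was asked to follow.
cases = [
    ("Short answer. Done", max_words(5)),                        # 3 words: pass
    ("This reply is far too long for the limit", max_words(5)),  # 9 words: fail
    ("All finished. END", ends_with("END")),                     # pass
    ("All finished.", ends_with("END")),                         # fail
]

print(accuracy(cases))  # 2 of 4 checks pass
```

Because each check is a plain function from text to a boolean, scoring needs no human or model judge, which is the property the word "verifiable" is assumed to refer to here.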