BenchmarksReasoning
PutnamBench
A multilingual formal theorem-proving benchmark of hand-formalized William Lowell Putnam Mathematical Competition problems in Lean 4, Isabelle, and Coq, where a model's output is scored by whether the proof assistant's compiler verifies the proof.
ReasoningProblems solvedHigher is better
No run guide for this benchmark yet.