MathArena HMMT February 2026
Contamination-free evaluation of large language models on the 33 problems of the HMMT February 2026 mathematics competition. The problems were released after the evaluated models were trained, so they could not have been seen during training. Models are scored on final-answer accuracy, reported as pass@1 estimated from 4 samples per problem.
Category: Reasoning · Metric: accuracy (higher is better)
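With 4 samples per problem, pass@1 for a single problem reduces to the fraction of its samples whose final answer is correct. The benchmark score is then the average of that fraction over all problems. The sketch below illustrates this under two assumptions: each sample is graded simply as correct or incorrect, and all problems are weighted equally. MathArena's exact aggregation may differ. It uses the standard unbiased pass@k estimator, 1 - C(n-c, k) / C(n, k), which equals c/n when k = 1. The function names `pass_at_k` and `benchmark_accuracy` are hypothetical and are not part of any MathArena API.

```python
from math import comb
from typing import Sequence


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem from n samples, c of them correct."""
    if n - c < k:
        # Every size-k subset of the samples contains at least one correct answer.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_accuracy(results: Sequence[Sequence[bool]], k: int = 1) -> float:
    """Mean pass@k over problems; results[i] lists per-sample correctness for problem i.

    Hypothetical helper: assumes binary grading and equal weight per problem.
    """
    scores = [pass_at_k(len(samples), sum(samples), k) for samples in results]
    return sum(scores) / len(scores)


# Toy data (not real benchmark results): 3 problems with 4 samples each.
toy_results = [
    [True, True, True, False],    # 3/4 correct -> pass@1 = 0.75
    [False, False, False, False], # 0/4 correct -> pass@1 = 0.0
    [True, True, True, True],     # 4/4 correct -> pass@1 = 1.0
]
print(benchmark_accuracy(toy_results))  # mean of 0.75, 0.0, 1.0
```

With k = 1, the estimator is just the per-problem success rate, so averaging 4 samples mainly reduces the variance of the reported score compared with a single sample per problem.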