evals.report

AIME 2026

Accuracy of LLMs on the 30 problems of the 2026 American Invitational Mathematics Examination (AIME I and II, 15 problems each), a contamination-free competition-math benchmark in which every answer is an integer from 0 to 999, evaluated live by MathArena.
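
Since every answer is an integer from 0 to 999, accuracy on this benchmark comes down to exact-match scoring. The following is a minimal sketch of that computation, not MathArena's actual grading code: the function names `parse_aime_answer` and `accuracy` are hypothetical, and it assumes each model output has already been reduced to a final-answer string.

```python
from typing import Optional, Sequence


def parse_aime_answer(raw: str) -> Optional[int]:
    """Return the answer as an int in [0, 999], or None if it is not valid.

    Hypothetical helper: assumes `raw` is the model's final answer string.
    """
    text = raw.strip()
    # Accept only plain ASCII digits; leading zeros such as "007" are allowed.
    if not (text.isascii() and text.isdigit()):
        return None
    value = int(text)
    return value if 0 <= value <= 999 else None


def accuracy(predictions: Sequence[str], reference: Sequence[int]) -> float:
    """Fraction of problems whose parsed prediction equals the reference answer."""
    if not reference or len(predictions) != len(reference):
        raise ValueError("predictions and reference must be non-empty and equal in length")
    # An unparseable prediction becomes None, which never equals an integer answer.
    correct = sum(parse_aime_answer(p) == r for p, r in zip(predictions, reference))
    return correct / len(reference)
```

Under these assumptions, a model that solves 24 of the 30 problems scores 24 / 30 = 0.8. Out-of-range or non-numeric outputs count as wrong rather than raising an error.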

Category: Reasoning. Metric: accuracy (higher is better).