evals.report

ARC-AGI-1

The original ARC-AGI-1 abstract-reasoning puzzle benchmark (semi-private set): few-shot grid transformations that are easy for humans but resist memorization. Largely cleared by 2026 frontier reasoning models, which is what motivated the harder ARC-AGI-2.
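As a rough illustration of what "few-shot grid transformations" scored by accuracy can look like, here is a minimal sketch. It assumes the public ARC JSON layout (a task holds "train" and "test" lists of input/output grids of small integers) and single-attempt exact-match scoring, which may differ from how this site computes its numbers. The toy task, the rule, and the helper names (`mirror_rows`, `fits_demonstrations`, `accuracy`) are hypothetical and invented for this example.

```python
from typing import Callable, Dict, List

Grid = List[List[int]]
Task = Dict[str, List[Dict[str, Grid]]]

# Toy task in the ARC JSON layout: a few demonstration pairs plus held-out test pairs.
# The grids and the hidden rule (mirror each row left to right) are made up.
toy_task: Task = {
    "train": [
        {"input": [[1, 0, 0], [0, 2, 0]], "output": [[0, 0, 1], [0, 2, 0]]},
        {"input": [[3, 3, 0]], "output": [[0, 3, 3]]},
    ],
    "test": [
        {"input": [[0, 5], [6, 0]], "output": [[5, 0], [0, 6]]},
    ],
}


def mirror_rows(grid: Grid) -> Grid:
    """Candidate rule: reverse every row of the grid."""
    return [list(reversed(row)) for row in grid]


def fits_demonstrations(solver: Callable[[Grid], Grid], task: Task) -> bool:
    """Check that a candidate rule reproduces every demonstration pair."""
    return all(solver(pair["input"]) == pair["output"] for pair in task["train"])


def accuracy(solver: Callable[[Grid], Grid], tasks: List[Task]) -> float:
    """Fraction of test grids reproduced exactly (assumed: one attempt, no partial credit)."""
    total = 0
    correct = 0
    for task in tasks:
        for pair in task["test"]:
            total += 1
            correct += int(solver(pair["input"]) == pair["output"])
    return correct / total if total else 0.0


if __name__ == "__main__":
    print(fits_demonstrations(mirror_rows, toy_task))
    print(accuracy(mirror_rows, [toy_task]))
```

Because a prediction only counts when the whole output grid matches, a solver that gets most cells right still scores zero on that grid, which is part of why the tasks resist pattern memorization.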

Category: Reasoning. Metric: accuracy (higher is better).

No run guide for this benchmark yet.