ARC-AGI-1
The original ARC-AGI-1 abstract-reasoning puzzle benchmark (semi-private set): few-shot grid transformations that are easy for humans but resist memorization. Largely cleared by 2026 frontier reasoning models, which is what motivated the harder ARC-AGI-2.
Reasoning · accuracy · Higher is better
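To make the accuracy metric concrete, here is a minimal sketch of how exact-match scoring on grid tasks might be computed. It assumes the public ARC task layout (a JSON object with `train` and `test` lists of `input`/`output` grids of integers) and a single attempt per test grid, which is a simplification. The toy task, the `flip_horizontal` solver, and the `exact_match_accuracy` helper are hypothetical names invented for illustration, not part of any official harness.

```python
from typing import Dict, List

Grid = List[List[int]]

# Toy task in the assumed ARC layout: a few demonstration pairs plus test pairs.
# The hidden rule in this made-up task is "mirror each row left to right".
task: Dict[str, List[Dict[str, Grid]]] = {
    "train": [
        {"input": [[1, 0], [2, 0]], "output": [[0, 1], [0, 2]]},
    ],
    "test": [
        {"input": [[3, 0, 0], [0, 4, 0]], "output": [[0, 0, 3], [0, 4, 0]]},
    ],
}


def flip_horizontal(grid: Grid) -> Grid:
    """Hypothetical solver: mirror each row left to right."""
    return [list(reversed(row)) for row in grid]


def exact_match_accuracy(predictions: List[Grid], targets: List[Grid]) -> float:
    """Fraction of predicted grids identical to the target grid, cell for cell.

    No partial credit: a grid with one wrong cell counts as incorrect.
    """
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        return 0.0
    correct = sum(1 for pred, tgt in zip(predictions, targets) if pred == tgt)
    return correct / len(targets)


predictions = [flip_horizontal(pair["input"]) for pair in task["test"]]
targets = [pair["output"] for pair in task["test"]]
accuracy = exact_match_accuracy(predictions, targets)
```

Scoring whole grids rather than individual cells fits the "higher is better" accuracy framing: a near-miss grid earns no credit, so the score reflects how many tasks were actually solved.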
No run guide for this benchmark yet.