ARC-AGI-2
ARC-AGI-2 is an abstract-reasoning puzzle benchmark, evaluated here on its semi-private set. It is the harder, static successor to ARC-AGI-1.
Category: Reasoning
Metric: accuracy (higher is better)
Tasks and evaluation are public; frontier scores are ARC-Prize-verified.
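To show roughly how an accuracy score of this kind can be computed, here is a minimal sketch. It assumes that each test output is scored by exact grid match, that credit is given if either of up to two attempts matches, and that the result is averaged over all test outputs. This follows the pass@2 convention used in ARC Prize competitions, but the official rules should be checked. The grid representation and the functions task_output_correct and accuracy are hypothetical.

```python
from typing import List, Sequence

# Assumed grid representation: rows of integer colour codes (0-9).
Grid = List[List[int]]


def task_output_correct(attempts: Sequence[Grid], expected: Grid, max_attempts: int = 2) -> bool:
    """Exact match: credit if any of the first `max_attempts` attempts equals the target grid."""
    return any(attempt == expected for attempt in attempts[:max_attempts])


def accuracy(predictions: Sequence[Sequence[Grid]], targets: Sequence[Grid], max_attempts: int = 2) -> float:
    """Fraction of test outputs solved; predictions[i] holds the attempts for targets[i]."""
    if len(predictions) != len(targets):
        raise ValueError("need one list of attempts per test output")
    if not targets:
        raise ValueError("no test outputs to score")
    solved = sum(task_output_correct(p, t, max_attempts) for p, t in zip(predictions, targets))
    return solved / len(targets)


if __name__ == "__main__":
    target = [[1, 0], [0, 1]]
    preds = [
        [[[0, 0], [0, 1]], [[1, 0], [0, 1]]],  # second attempt matches: solved
        [[[1, 1], [1, 1]]],                     # single wrong attempt: unsolved
    ]
    print(accuracy(preds, [target, target]))  # 0.5
```

Because of exact matching, a grid that is off by a single cell earns no credit. Any attempt beyond the allowed count is ignored and does not rescue the output.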
1. Expected output
Use the official source links for the current output format, submission steps, and benchmark-specific result files.
2. Submit results
Keep the source URL, source model name, benchmark version, harness, and run context attached to any reported score.
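One way to keep this provenance attached to a score is to store everything in a single record and reject incomplete records. The following is a minimal sketch. The class ReportedScore and its field names are hypothetical, and it assumes scores are stored as fractions in [0, 1] rather than percentages.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ReportedScore:
    """Hypothetical record that keeps a score together with its provenance."""
    score: float               # accuracy as a fraction in [0, 1] (assumption), higher is better
    source_url: str            # where the score was published
    source_model_name: str     # model name exactly as the source gives it
    benchmark_version: str     # e.g. "ARC-AGI-2"
    harness: str               # evaluation harness used for the run
    run_context: Dict[str, str]  # e.g. split, reported effort/compute

    def __post_init__(self) -> None:
        # Refuse to create a record with blank provenance fields.
        for name in ("source_url", "source_model_name", "benchmark_version", "harness"):
            if not getattr(self, name).strip():
                raise ValueError(f"missing provenance field: {name}")
        if not self.run_context:
            raise ValueError("missing run context")
        if not 0.0 <= self.score <= 1.0:
            raise ValueError("score must be a fraction in [0, 1]")
```

Validating when the record is created means a score cannot circulate without its context. Freezing the record prevents the fields from being changed after the check.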
Gotchas
The public and semi-private splits differ; record the reported effort/compute level as part of the run context.
Do not mix this benchmark's metric with unrelated benchmark metrics.
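Taken together, these gotchas define which scores may be compared. The sketch below enforces that rule by raising an error when two scores differ in benchmark, split, or metric. It uses a hypothetical ScoreKey, and the split labels are illustrative rather than official names.

```python
from typing import NamedTuple


class ScoreKey(NamedTuple):
    """Hypothetical key: two scores are comparable only if all fields match."""
    benchmark: str  # e.g. "ARC-AGI-2"
    split: str      # e.g. "public-eval" or "semi-private" (illustrative labels)
    metric: str     # e.g. "accuracy"


def check_comparable(a: ScoreKey, b: ScoreKey) -> None:
    """Raise instead of silently ranking scores from different benchmarks, splits, or metrics."""
    mismatched = [name for name in ScoreKey._fields if getattr(a, name) != getattr(b, name)]
    if mismatched:
        raise ValueError(f"scores not comparable; differing fields: {', '.join(mismatched)}")


semi = ScoreKey("ARC-AGI-2", "semi-private", "accuracy")
public = ScoreKey("ARC-AGI-2", "public-eval", "accuracy")
check_comparable(semi, semi)  # same benchmark, split, and metric: no error
try:
    check_comparable(semi, public)
except ValueError as err:
    print(err)  # scores not comparable; differing fields: split
```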