evals.report

ARC-AGI-3

The interactive ARC-AGI-3 generalization benchmark: agents must learn novel game environments from scratch (semi-private set).

Category: Reasoning. Metric: accuracy (higher is better).

The dataset and task execution are documented, but frontier results come from competition-style submissions.

Benchmark: ARC-AGI-3
Repository: Not provided
Dataset: arcprize.org
Metric: accuracy

1. Expected output

Use the official source links for current output format, submission steps, and benchmark-specific result files.

2. Submit results

Keep source URL, source model name, benchmark version, harness, and run context attached to any reported score.
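One way to keep those provenance fields attached to a score is to store them in a single record type. The sketch below is illustrative only: the class name `ScoreRecord`, its field names, and the blank-field check are assumptions, not an official submission schema, and every value in the example is a placeholder rather than a real result.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class ScoreRecord:
    """Hypothetical record: one reported score plus its provenance fields."""
    benchmark: str
    metric: str
    score: float
    source_url: str
    source_model_name: str
    benchmark_version: str
    harness: str
    run_context: str

    def __post_init__(self):
        # Reject records where any text field is empty or whitespace only.
        for name, value in asdict(self).items():
            if isinstance(value, str) and not value.strip():
                raise ValueError(f"missing provenance field: {name}")


# Placeholder values only; none of these describe a real run.
example = ScoreRecord(
    benchmark="ARC-AGI-3",
    metric="accuracy",
    score=0.0,
    source_url="https://example.com/placeholder",
    source_model_name="placeholder-model",
    benchmark_version="placeholder-version",
    harness="placeholder-harness",
    run_context="placeholder run notes",
)
print(example.benchmark, example.metric)
```

Making the record frozen means the score cannot later be separated from, or edited independently of, the context it was reported with.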

Gotchas

Competition submissions and private/evaluation splits make provenance important.
Do not mix this benchmark's metric with unrelated benchmark metrics.
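The second gotcha can be enforced mechanically before any scores are compared. This is a minimal sketch, assuming each score is a plain dict with `benchmark` and `metric` keys; the function name `check_same_metric`, the dict layout, and the `OTHER-BENCH` row are hypothetical, and the scores are placeholders.

```python
def check_same_metric(records):
    """Return records unchanged, or raise if they mix benchmark/metric pairs."""
    keys = {(r["benchmark"], r["metric"]) for r in records}
    if len(keys) > 1:
        raise ValueError(f"mixed benchmark/metric pairs: {sorted(keys)}")
    return records


# Placeholder rows; the scores are not real results.
same = [
    {"benchmark": "ARC-AGI-3", "metric": "accuracy", "score": 0.0},
    {"benchmark": "ARC-AGI-3", "metric": "accuracy", "score": 0.0},
]
mixed = same + [{"benchmark": "OTHER-BENCH", "metric": "pass@1", "score": 0.0}]

check_same_metric(same)  # passes silently
try:
    check_same_metric(mixed)
except ValueError as err:
    print("refused:", err)
```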