AI21 Labs
Models: 1
Progress by benchmark
Single benchmark only
This view shows GPQA Diamond (accuracy) only. Other benchmarks use different metrics and are not directly comparable.
Progress matrix
Scores are not normalised across benchmarks. Each column uses its own metric. Compare columns independently.