Artificial Analysis Intelligence Index
A composite intelligence score (AAII v4.0) that aggregates a model's performance across 10 challenging evaluations, spanning reasoning, knowledge, coding, agentic tasks, and instruction-following, into a single index on a roughly 0–100 scale. The 10 evaluations are GDPval-AA, τ²-Bench Telecom, Terminal-Bench Hard, SciCode, AA-LCR, AA-Omniscience, IFBench, Humanity's Last Exam, GPQA Diamond, and CritPt.
Reasoning | Index | Higher is better
No run guide for this benchmark yet.
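
To illustrate the kind of aggregation described above, here is a minimal sketch that combines per-evaluation scores into one composite number. It assumes a plain unweighted mean over the 10 evaluations, which is an assumption made for illustration only: the source does not state the weighting scheme AAII v4.0 actually uses. The function name `composite_index` and all score values are hypothetical placeholders, not published results.

```python
from statistics import mean

# The 10 evaluations named in the index description.
EVALS = [
    "GDPval-AA",
    "τ²-Bench Telecom",
    "Terminal-Bench Hard",
    "SciCode",
    "AA-LCR",
    "AA-Omniscience",
    "IFBench",
    "Humanity's Last Exam",
    "GPQA Diamond",
    "CritPt",
]


def composite_index(scores: dict[str, float]) -> float:
    """Combine per-evaluation scores into a single number.

    ASSUMPTION: an unweighted mean is used here purely for illustration;
    the real index may weight or group evaluations differently.
    """
    missing = [name for name in EVALS if name not in scores]
    if missing:
        raise ValueError(f"missing scores for: {missing}")
    return mean(scores[name] for name in EVALS)


# Hypothetical placeholder scores, not real benchmark results.
example_scores = {name: 50.0 for name in EVALS}
example_scores["GPQA Diamond"] = 80.0
example_scores["CritPt"] = 20.0

print(composite_index(example_scores))
```

Requiring a score for every evaluation, rather than averaging whatever is available, keeps composites comparable across models in this sketch.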