BenchmarksReasoning
Epoch Capabilities Index
A composite capability index from Epoch AI that statistically stitches together scores from 40+ benchmarks (using an Item-Response-Theory-style model) into a single saturation-resistant general-capability scale, calibrated so Claude 3.5 Sonnet=130 and GPT-5=150.
ReasoningIndexHigher is better
No run guide for this benchmark yet.