evals.report
BenchmarksLabsCompareRun guides
BenchmarksReasoning

Epoch Capabilities Index

A composite capability index from Epoch AI that statistically stitches together scores from 40+ benchmarks (using an Item-Response-Theory-style model) into a single saturation-resistant general-capability scale, calibrated so Claude 3.5 Sonnet=130 and GPT-5=150.

ReasoningIndexHigher is better

No run guide for this benchmark yet.