LongBench v2
A long-context benchmark of 503 challenging multiple-choice questions, with contexts ranging from 8k to 2M words across six task categories. It is designed to test deep understanding and reasoning on realistic, multitask long-context problems.
Category: Reasoning. Metric: accuracy (higher is better).
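Accuracy here is the share of questions for which the model picks the correct option. The following is a minimal sketch of that computation. It assumes each model response has already been reduced to a single option letter, with None when no answer could be parsed. The function name is hypothetical, and this is not the official LongBench v2 evaluation script.

```python
from typing import Optional, Sequence


def multiple_choice_accuracy(
    predictions: Sequence[Optional[str]], gold: Sequence[str]
) -> float:
    """Return the fraction of questions whose predicted option letter matches the gold letter.

    Hypothetical helper: predictions are assumed to be pre-extracted option
    letters (e.g. "A".."D"); None marks an unparseable answer and counts as wrong.
    """
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        raise ValueError("need at least one question")
    # Compare case-insensitively after trimming whitespace.
    correct = sum(
        1
        for pred, ans in zip(predictions, gold)
        if pred is not None and pred.strip().upper() == ans.strip().upper()
    )
    return correct / len(gold)
```

For example, `multiple_choice_accuracy(["A", "C", None, "D"], ["A", "B", "C", "D"])` returns 0.5. Two of the four answers match, and the unparsed answer is scored as incorrect rather than skipped, so it stays in the denominator.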