evals.report
BenchmarksLabsCompareRun guides
BenchmarksReasoning

OpenAI-MRCR v2 (Multi-Round Coreference Resolution)

A long-context retrieval benchmark in which a model must locate and reproduce a specific instance (the i-th 'needle') of repeated similar requests buried in a long synthetic multi-turn conversation, scored on the 8-needle variant across context lengths up to 1M tokens.

Reasoningaccuracy (mean SequenceMatcher similarity)Higher is better

No run guide for this benchmark yet.