OpenAI-MRCR v2 (Multi-Round Coreference Resolution)
A long-context retrieval benchmark in which a model must locate and reproduce a specific instance (the i-th 'needle') of repeated similar requests buried in a long synthetic multi-turn conversation, scored on the 8-needle variant across context lengths up to 1M tokens.
What this benchmark measures
The task tests long-context retrieval. A long synthetic multi-turn conversation contains several similar requests, and the model must locate and reproduce one specific instance of them (the i-th 'needle'). Scores on this page use the 8-needle variant, at context lengths of up to 1M tokens.
Rows on this page are sourced from public benchmark artifacts, leaderboard exports, or source-linked model reports. Each row keeps benchmark version, source model name, and available run details attached to the score.
The metric shown here is accuracy, computed as the mean SequenceMatcher similarity between the model's output and the reference text. Interpret it only within OpenAI-MRCR v2 (Multi-Round Coreference Resolution); it is not meant to be compared across benchmarks as part of a site-wide ranking.
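As a rough illustration of what a mean SequenceMatcher similarity looks like, the sketch below averages the ratio from Python's standard-library difflib.SequenceMatcher over a few (response, reference) pairs. The function names and the toy strings are hypothetical, and the benchmark's own grader may apply extra rules that are not described on this page.

```python
from difflib import SequenceMatcher
from statistics import mean


def similarity(response: str, reference: str) -> float:
    """Return the SequenceMatcher ratio (between 0 and 1) for one sample."""
    return SequenceMatcher(None, response, reference).ratio()


def mean_similarity(pairs: list[tuple[str, str]]) -> float:
    """Average the per-sample similarity over (response, reference) pairs."""
    return mean(similarity(response, reference) for response, reference in pairs)


# Toy data, not benchmark output: one exact match and one near match.
toy_pairs = [
    ("abcd", "abcd"),
    ("abcd", "abxd"),
]
print(mean_similarity(toy_pairs))
```

Because the ratio rewards partial overlap, a response that reproduces most of the target text still earns partial credit, so the score behaves as a graded similarity and not as an exact-match rate.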