evals.report

MultiNRC

Reasoning | accuracy | Higher is better

What this benchmark measures

MultiNRC is a native (non-translated) multilingual reasoning benchmark of 1,000+ questions written by native speakers in French, Spanish, and Chinese. The questions span four categories: language-specific linguistic reasoning, wordplay and riddles, cultural and tradition reasoning, and culturally relevant math. LLMs are scored on accuracy.

Rows on this page are sourced from public benchmark artifacts, leaderboard exports, or source-linked model reports. Each row keeps benchmark version, source model name, and available run details attached to the score.

The metric shown here is accuracy. It should be interpreted within MultiNRC, not compared as part of a site-wide ranking.
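
As a rough illustration of the metric, the sketch below computes accuracy as the fraction of questions judged correct, overall and per language. It assumes each question has already been graded to a boolean; how MultiNRC itself grades answers is not described here. The record fields (`language`, `category`, `correct`), the sample rows, and the helper names are hypothetical, not part of any evals.report or MultiNRC API.

```python
from collections import defaultdict


def accuracy(judgments):
    """Return the fraction of items judged correct (booleans)."""
    judgments = list(judgments)
    if not judgments:
        raise ValueError("no judgments to score")
    return sum(judgments) / len(judgments)


def accuracy_by_group(records, key):
    """Group records by `key` and compute accuracy within each group."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record["correct"])
    return {group: accuracy(values) for group, values in groups.items()}


# Hypothetical graded results; field names and values are illustrative only.
records = [
    {"language": "French", "category": "wordplay", "correct": True},
    {"language": "French", "category": "math", "correct": False},
    {"language": "Spanish", "category": "cultural", "correct": True},
    {"language": "Chinese", "category": "linguistic", "correct": True},
]

overall = accuracy(r["correct"] for r in records)
per_language = accuracy_by_group(records, "language")
```

Per-language or per-category breakdowns computed this way stay inside MultiNRC; they are not meant to be merged with scores from other benchmarks.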

No composite ranking
evals.report never combines benchmarks. Accuracy on MultiNRC is its own number; don't average it with other metrics.