MultiNRC
A native (non-translated) multilingual reasoning benchmark of 1,000+ questions written by native speakers in French, Spanish, and Chinese. The questions span four categories: language-specific linguistic reasoning, wordplay and riddles, cultural and tradition reasoning, and culturally relevant math. LLMs are scored on accuracy.
Reasoning · Accuracy · Higher is better
| Model | Lab | Score (accuracy) | Source model | Status | Date | Details |
|---|---|---|---|---|---|---|
| Gemini 3.1 Pro Preview | Google DeepMind | 64.74% | — | Official | Feb 19, 2026 | Details |
| GPT-5.4 Pro | OpenAI | 62.27% | — | Official | Mar 5, 2026 | Details |
| Muse Spark | Meta | 59.05% | — | Official | Apr 8, 2026 | Details |
| Gemini 3 Pro | Google DeepMind | 58.96% | — | Official | Nov 18, 2025 | Details |
| GPT-5.4 | OpenAI | 58.29% | — | Official | Mar 5, 2026 | Details |
| Claude Opus 4.6 | Anthropic | 57.06% | — | Official | Feb 5, 2026 | Details |
| GPT-5 | OpenAI | 52.13% | — | Official | Aug 7, 2025 | Details |
| GPT-5.1 | OpenAI | 49.00% | — | Official | Nov 12, 2025 | Details |
| OpenAI o3-pro | OpenAI | 49.00% | — | Official | Jun 10, 2025 | Details |
| Claude Opus 4.5 | Anthropic | 48.63% | — | Official | Nov 24, 2025 | Details |
| o3 | OpenAI | 45.50% | — | Official | Apr 16, 2025 | Details |
| Gemini 2.5 Pro | Google DeepMind | 45.12% | — | Official | Mar 25, 2025 | Details |
| GPT-5.2 | OpenAI | 42.18% | — | Official | Dec 11, 2025 | Details |
| Claude Opus 4.1 | Anthropic | 38.39% | — | Official | Aug 5, 2025 | Details |
| Claude Sonnet 4.5 | Anthropic | 35.83% | — | Official | Sep 29, 2025 | Details |
| Kimi K2.5 | Moonshot AI | 35.17% | — | Official | Jan 27, 2026 | Details |
| Claude Opus 4 | Anthropic | 33.93% | — | Official | May 22, 2025 | Details |
| Claude 3.7 Sonnet | Anthropic | 27.77% | — | Official | Feb 24, 2025 | Details |
| DeepSeek R1 | DeepSeek | 24.27% | — | Official | Jan 20, 2025 | Details |
| GPT-5 mini | OpenAI | 23.89% | — | Official | Aug 7, 2025 | Details |
| DeepSeek V3.1 | DeepSeek | 23.60% | — | Official | Aug 21, 2025 | Details |
| o4-mini | OpenAI | 22.18% | — | Official | Apr 16, 2025 | Details |
| GPT-4.1 | OpenAI | 21.23% | — | Official | Apr 14, 2025 | Details |
| Claude Sonnet 4 | Anthropic | 18.39% | — | Official | May 22, 2025 | Details |
| GPT-OSS-120B | OpenAI | 15.17% | — | Official | Aug 5, 2025 | Details |
| GPT-4o | OpenAI | 12.42% | — | Official | May 13, 2024 | Details |
| Llama 4 Maverick | Meta | 8.44% | — | Official | Apr 5, 2025 | Details |
Each row reports the model’s accuracy on MultiNRC.
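As a rough illustration of how an accuracy percentage like those in the table could be derived, the sketch below computes overall and per-category accuracy from graded answers. It assumes each question is graded as simply correct or incorrect. The function name `accuracy_by_group`, the `(group, is_correct)` input shape, and the sample data are hypothetical, and none of this reflects the benchmark's actual grading pipeline.

```python
from collections import defaultdict


def accuracy_by_group(results):
    """Return (overall, per_group) accuracy in percent, rounded to 2 decimals.

    `results` is an iterable of (group, is_correct) pairs. This input shape
    is a hypothetical one chosen for illustration only.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for group, is_correct in results:
        totals[group] += 1
        correct[group] += int(is_correct)

    # Per-group accuracy: correct answers divided by questions in that group.
    per_group = {g: round(100.0 * correct[g] / totals[g], 2) for g in totals}

    # Overall accuracy pools all questions rather than averaging group scores.
    n = sum(totals.values())
    overall = round(100.0 * sum(correct.values()) / n, 2) if n else 0.0
    return overall, per_group


# Made-up graded answers, not real benchmark data.
graded = [
    ("wordplay", True),
    ("wordplay", False),
    ("math", True),
    ("math", True),
]
overall, per_group = accuracy_by_group(graded)
print(f"{overall:.2f}%", per_group)
```

Pooling all questions for the overall figure weights each question equally. Averaging the per-category scores instead would weight each category equally, which gives a different number when categories differ in size.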