SimpleQA Verified
A factual short-answer QA benchmark measuring parametric knowledge and hallucination resistance (Epoch AI's SimpleQA Verified).
Other · accuracy · Higher is better
| Model | Lab | Score | Source model | Status | Date |
|---|---|---|---|---|---|
| Gemini 3.1 Pro Preview | Google DeepMind | 77.3% | Gemini 3.1 Pro | Official | May 30, 2026 |
| Gemini 3 Pro | Google DeepMind | 72.9% | Gemini 3 Pro | Official | May 30, 2026 |
| Gemini 3.5 Flash | Google DeepMind | 68.4% | Gemini 3.5 Flash | Official | May 30, 2026 |
| Qwen3 Max | Alibaba / Qwen | 67.5% | Qwen3-Max | Official | May 30, 2026 |
| Gemini 3 Flash | Google DeepMind | 67.4% | Gemini 3 Flash | Official | May 30, 2026 |
| Muse Spark | Meta | 66.3% | Muse Spark | Official | May 30, 2026 |
| GPT-5.5 Pro | OpenAI | 64.5% | GPT-5.5 Pro | Official | May 30, 2026 |
| GPT-5.5 | OpenAI | 63.1% | GPT-5.5 | Official | May 30, 2026 |
| Qwen 3.6 Max Preview | Alibaba / Qwen | 56.9% | Qwen 3.6 Max (Preview) | Official | May 30, 2026 |
| Gemini 2.5 Pro | Google DeepMind | 56.0% | Gemini 2.5 Pro (Jun 2025) | Official | May 30, 2026 |
| o3 | OpenAI | 53.0% | o3 | Official | May 30, 2026 |
| Claude Opus 4.7 | Anthropic | 50.6% | Claude Opus 4.7 | Official | May 30, 2026 |
| GPT-5 high | OpenAI | 50.6% | GPT-5 | Official | May 30, 2026 |
| Qwen3 235B A22B Instruct 2507 | Alibaba / Qwen | 50.1% | Qwen3-235B-A22B (Jul 2025) | Official | May 30, 2026 |
| Qwen 3.6 Plus | Alibaba / Qwen | 49.1% | Qwen 3.6 Plus | Official | May 30, 2026 |
| GPT-5.1 | OpenAI | 48.9% | GPT-5.1 | Official | May 30, 2026 |
| Grok 4 | xAI | 47.9% | Grok 4 | Official | May 30, 2026 |
| GPT-5.4 Pro | OpenAI | 47.8% | GPT-5.4 Pro | Official | May 30, 2026 |
| Claude Opus 4.6 | Anthropic | 46.5% | Claude Opus 4.6 | Official | May 30, 2026 |
| GPT-5.4 xHigh | OpenAI | 44.8% | GPT-5.4 | Official | May 30, 2026 |
Each row reports the model’s accuracy on SimpleQA Verified.
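For readers who want to work with these numbers, the following is a minimal sketch of how the rows could be re-ranked or reduced to each lab's best entry. It uses only a handful of rows copied from the table; the names `ROWS`, `ranked`, and `best_per_lab` are illustrative assumptions, not part of any published tooling for this leaderboard.

```python
# A few rows copied from the table: (model, lab, accuracy in percent).
# This is a partial sample for illustration, not the full leaderboard.
ROWS = [
    ("Claude Opus 4.6", "Anthropic", 46.5),
    ("GPT-5.5", "OpenAI", 63.1),
    ("Gemini 3.1 Pro Preview", "Google DeepMind", 77.3),
    ("Grok 4", "xAI", 47.9),
    ("Qwen3 Max", "Alibaba / Qwen", 67.5),
    ("GPT-5.5 Pro", "OpenAI", 64.5),
    ("Muse Spark", "Meta", 66.3),
    ("Claude Opus 4.7", "Anthropic", 50.6),
]


def ranked(rows):
    """Sort rows by accuracy, highest first (higher is better)."""
    return sorted(rows, key=lambda row: row[2], reverse=True)


def best_per_lab(rows):
    """Return {lab: (model, score)}, keeping each lab's top-scoring model."""
    best = {}
    for model, lab, score in rows:
        if lab not in best or score > best[lab][1]:
            best[lab] = (model, score)
    return best


if __name__ == "__main__":
    for model, lab, score in ranked(ROWS):
        print(f"{score:5.1f}%  {model} ({lab})")
    for lab, (model, score) in best_per_lab(ROWS).items():
        print(f"{lab}: {model} at {score}%")
```

Storing scores as floats rather than strings such as `"77.3%"` keeps sorting and comparison straightforward; the percent sign is only added back when printing.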