Vectara Hallucination Leaderboard
Measures how often LLMs introduce hallucinations when summarizing short documents, scored by Vectara's HHEM-2.3 factual-consistency model, reported as a hallucination rate.
Other · Hallucination Rate · Lower is better
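As a rough illustration of how a hallucination rate can be derived from a factual-consistency model, the sketch below counts the share of summaries whose consistency score falls below a cutoff. The function name `hallucination_rate`, the assumption that scores lie in [0, 1], the 0.5 threshold, and the sample scores are all made up for this example; the leaderboard's actual scoring pipeline is not described here.

```python
from typing import Sequence


def hallucination_rate(consistency_scores: Sequence[float], threshold: float = 0.5) -> float:
    """Return the fraction of summaries scored below `threshold`.

    Assumption: each score is in [0, 1], where higher means the summary is
    more consistent with its source document. The 0.5 cutoff is illustrative.
    """
    if not consistency_scores:
        raise ValueError("need at least one score")
    flagged = sum(1 for score in consistency_scores if score < threshold)
    return flagged / len(consistency_scores)


# Hypothetical scores for eight summaries (not real leaderboard data).
scores = [0.97, 0.12, 0.88, 0.45, 0.91, 0.76, 0.99, 0.83]
rate = hallucination_rate(scores)
print(f"{rate:.1%}")  # 2 of 8 scores are below 0.5, so this prints 25.0%
```

A lower result is better under this definition, which matches the direction of the Score column in the table.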
| Model | Lab | Score↓ | Source model | Status | Date | |
|---|---|---|---|---|---|---|
| OpenAI o3-pro | OpenAI | 23.3% | — | Official | Jun 10, 2025 | Details |
| Grok 4.1 fast reasoning | xAI | 19.2% | — | Official | Nov 19, 2025 | Details |
| o4-mini | OpenAI | 18.6% | — | Official | Apr 16, 2025 | Details |
| GPT-5 | OpenAI | 15.1% | — | Official | Aug 7, 2025 | Details |
| GPT-OSS-120B | OpenAI | 14.2% | — | Official | Aug 5, 2025 | Details |
| Kimi K2.5 | Moonshot AI | 14.2% | — | Official | Jan 27, 2026 | Details |
| Gemini 3 Pro | Google DeepMind | 13.6% | — | Official | Nov 18, 2025 | Details |
| Gemini 3 Flash | Google DeepMind | 13.5% | — | Official | Dec 17, 2025 | Details |
| GPT-5 mini | OpenAI | 12.9% | — | Official | Aug 7, 2025 | Details |
| MiniMax M2.7 | MiniMax | 12.9% | — | Official | Mar 18, 2026 | Details |
| Claude Opus 4.6 | Anthropic | 12.2% | — | Official | Feb 5, 2026 | Details |
| GPT-5.1 | OpenAI | 12.1% | — | Official | Nov 12, 2025 | Details |
| Claude Sonnet 4.5 | Anthropic | 12.0% | — | Official | Sep 29, 2025 | Details |
| Claude Opus 4 | Anthropic | 12.0% | — | Official | May 22, 2025 | Details |
| Claude Opus 4.7 | Anthropic | 12.0% | — | Official | Apr 16, 2026 | Details |
| Claude Opus 4.1 | Anthropic | 11.8% | — | Official | Aug 5, 2025 | Details |
| MiniMax M2.1 | MiniMax | 11.8% | — | Official | Dec 23, 2025 | Details |
| GLM-4.7 | Z.ai | 11.7% | — | Official | Dec 22, 2025 | Details |
| DeepSeek R1 | DeepSeek | 11.3% | — | Official | Jan 20, 2025 | Details |
| Claude Opus 4.5 | Anthropic | 10.9% | — | Official | Nov 24, 2025 | Details |
| GPT-5.2 | OpenAI | 10.8% | — | Official | Dec 11, 2025 | Details |
| Kimi K2.6 | Moonshot AI | 10.8% | — | Official | Apr 20, 2026 | Details |
| Claude Sonnet 4.6 | Anthropic | 10.6% | — | Official | Feb 17, 2026 | Details |
| Gemini 3.1 Pro Preview | Google DeepMind | 10.4% | — | Official | Feb 19, 2026 | Details |
| Claude Sonnet 4 | Anthropic | 10.3% | — | Official | May 22, 2025 | Details |
| GLM-5 | Z.ai | 10.1% | — | Official | Feb 11, 2026 | Details |
| Claude Haiku 4.5 | Anthropic | 9.8% | — | Official | Oct 15, 2025 | Details |
| Jamba 1.7 Large | AI21 Labs | 9.7% | — | Official | Jul 3, 2025 | Details |
| GPT-4o | OpenAI | 9.6% | — | Official | May 13, 2024 | Details |
| GLM-4.6 | Z.ai | 9.5% | — | Official | Sep 30, 2025 | Details |
| GPT-5.5 | OpenAI | 9.3% | — | Official | Apr 23, 2026 | Details |
| MiniMax M2.5 | MiniMax | 9.1% | — | Official | Feb 12, 2026 | Details |
| DeepSeek V4 Pro | DeepSeek | 8.6% | — | Official | Apr 24, 2026 | Details |
| GPT-5.4 Pro | OpenAI | 8.3% | — | Official | Mar 5, 2026 | Details |
| Llama 4 Maverick | Meta | 8.2% | — | Official | Apr 5, 2025 | Details |
| Gemini 2.5 Flash | Google DeepMind | 7.8% | — | Official | Apr 17, 2025 | Details |
| Llama 4 Scout | Meta | 7.7% | — | Official | Apr 5, 2025 | Details |
| GPT-5.4 | OpenAI | 7.0% | — | Official | Mar 5, 2026 | Details |
| Gemini 2.5 Pro | Google DeepMind | 7.0% | — | Official | Mar 25, 2025 | Details |
| DeepSeek V3.2 | DeepSeek | 6.3% | — | Official | Dec 1, 2025 | Details |
| DeepSeek V3 | DeepSeek | 6.1% | — | Official | Dec 26, 2024 | Details |
| GPT-4.1 | OpenAI | 5.6% | — | Official | Apr 14, 2025 | Details |
| DeepSeek V3.1 | DeepSeek | 5.5% | — | Official | Aug 21, 2025 | Details |
| Amazon Nova 2 Lite | Amazon | 5.1% | — | Official | Dec 2, 2025 | Details |
| Mistral Large | Mistral AI | 4.5% | — | Official | Feb 26, 2024 | Details |
Each row reports the model’s Hallucination Rate on the Vectara Hallucination Leaderboard.