evals.report

Vectara Hallucination Leaderboard

Measures how often LLMs introduce hallucinations when summarizing short documents, scored by Vectara's HHEM-2.3 factual-consistency model, reported as a hallucination rate.
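As a rough illustration of how a rate like this can be derived, the sketch below assumes each summary receives a factual-consistency score between 0 and 1 and is counted as a hallucination when that score falls below a cutoff. The function name `hallucination_rate`, the 0.5 cutoff, and the sample scores are hypothetical choices for this example and are not taken from the leaderboard's own pipeline.

```python
from typing import Sequence


def hallucination_rate(consistency_scores: Sequence[float], threshold: float = 0.5) -> float:
    """Return the percentage of summaries scored below `threshold`.

    Assumption: a lower score means the summary is less consistent with
    its source document. The 0.5 default is illustrative only.
    """
    if not consistency_scores:
        raise ValueError("need at least one scored summary")
    flagged = sum(1 for score in consistency_scores if score < threshold)
    return 100.0 * flagged / len(consistency_scores)


# Hypothetical per-summary scores for one model (made-up values).
scores = [0.91, 0.12, 0.78, 0.45, 0.66, 0.97, 0.83, 0.30]
rate = hallucination_rate(scores)
print(f"{rate:.1f}%")  # 3 of 8 summaries fall below 0.5, so this prints 37.5%
```

Because the result is a share of flagged summaries, a lower value means fewer inconsistent summaries, which matches the "lower is better" reading of the table.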

Other | Hallucination Rate | Lower is better
| Model | Lab | Score | Source model | Status | Date |
| --- | --- | --- | --- | --- | --- |
| OpenAI o3-pro | OpenAI | 23.3% | | Official | Jun 10, 2025 |
| Grok 4.1 fast reasoning | xAI | 19.2% | | Official | Nov 19, 2025 |
| o4-mini | OpenAI | 18.6% | | Official | Apr 16, 2025 |
| GPT-5 | OpenAI | 15.1% | | Official | Aug 7, 2025 |
| GPT-OSS-120B | OpenAI | 14.2% | | Official | Aug 5, 2025 |
| Kimi K2.5 | Moonshot AI | 14.2% | | Official | Jan 27, 2026 |
| Gemini 3 Pro | Google DeepMind | 13.6% | | Official | Nov 18, 2025 |
| Gemini 3 Flash | Google DeepMind | 13.5% | | Official | Dec 17, 2025 |
| GPT-5 mini | OpenAI | 12.9% | | Official | Aug 7, 2025 |
| MiniMax M2.7 | MiniMax | 12.9% | | Official | Mar 18, 2026 |
| Claude Opus 4.6 | Anthropic | 12.2% | | Official | Feb 5, 2026 |
| GPT-5.1 | OpenAI | 12.1% | | Official | Nov 12, 2025 |
| Claude Sonnet 4.5 | Anthropic | 12.0% | | Official | Sep 29, 2025 |
| Claude Opus 4 | Anthropic | 12.0% | | Official | May 22, 2025 |
| Claude Opus 4.7 | Anthropic | 12.0% | | Official | Apr 16, 2026 |
| Claude Opus 4.1 | Anthropic | 11.8% | | Official | Aug 5, 2025 |
| MiniMax M2.1 | MiniMax | 11.8% | | Official | Dec 23, 2025 |
| GLM-4.7 | Z.ai | 11.7% | | Official | Dec 22, 2025 |
| DeepSeek R1 | DeepSeek | 11.3% | | Official | Jan 20, 2025 |
| Claude Opus 4.5 | Anthropic | 10.9% | | Official | Nov 24, 2025 |
| GPT-5.2 | OpenAI | 10.8% | | Official | Dec 11, 2025 |
| Kimi K2.6 | Moonshot AI | 10.8% | | Official | Apr 20, 2026 |
| Claude Sonnet 4.6 | Anthropic | 10.6% | | Official | Feb 17, 2026 |
| Gemini 3.1 Pro Preview | Google DeepMind | 10.4% | | Official | Feb 19, 2026 |
| Claude Sonnet 4 | Anthropic | 10.3% | | Official | May 22, 2025 |
| GLM-5 | Z.ai | 10.1% | | Official | Feb 11, 2026 |
| Claude Haiku 4.5 | Anthropic | 9.8% | | Official | Oct 15, 2025 |
| Jamba 1.7 Large | AI21 Labs | 9.7% | | Official | Jul 3, 2025 |
| GPT-4o | OpenAI | 9.6% | | Official | May 13, 2024 |
| GLM-4.6 | Z.ai | 9.5% | | Official | Sep 30, 2025 |
| GPT-5.5 | OpenAI | 9.3% | | Official | Apr 23, 2026 |
| MiniMax M2.5 | MiniMax | 9.1% | | Official | Feb 12, 2026 |
| DeepSeek V4 Pro | DeepSeek | 8.6% | | Official | Apr 24, 2026 |
| GPT-5.4 Pro | OpenAI | 8.3% | | Official | Mar 5, 2026 |
| Llama 4 Maverick | Meta | 8.2% | | Official | Apr 5, 2025 |
| Gemini 2.5 Flash | Google DeepMind | 7.8% | | Official | Apr 17, 2025 |
| Llama 4 Scout | Meta | 7.7% | | Official | Apr 5, 2025 |
| GPT-5.4 | OpenAI | 7.0% | | Official | Mar 5, 2026 |
| Gemini 2.5 Pro | Google DeepMind | 7.0% | | Official | Mar 25, 2025 |
| DeepSeek V3.2 | DeepSeek | 6.3% | | Official | Dec 1, 2025 |
| DeepSeek V3 | DeepSeek | 6.1% | | Official | Dec 26, 2024 |
| GPT-4.1 | OpenAI | 5.6% | | Official | Apr 14, 2025 |
| DeepSeek V3.1 | DeepSeek | 5.5% | | Official | Aug 21, 2025 |
| Amazon Nova 2 Lite | Amazon | 5.1% | | Official | Dec 2, 2025 |
| Mistral Large | Mistral AI | 4.5% | | Official | Feb 26, 2024 |

Each row reports the model's hallucination rate on the Vectara Hallucination Leaderboard.