evals.report

AA-Omniscience: Knowledge and Hallucination Benchmark

A factuality and knowledge benchmark of 6,000 questions across 42 economically relevant topics in six domains, scoring models on the AA-Omniscience Index (-100 to 100) that rewards correct answers, penalizes hallucinations, and applies no penalty for abstaining.
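The scoring rule described above can be sketched in a few lines. This is a minimal illustration, not the benchmark's official implementation: it assumes the index is 100 × (correct − incorrect) / total questions, which matches the stated range of -100 to 100, rewards correct answers, penalizes hallucinations, and leaves abstentions at zero. The function name `omniscience_index` and the three outcome labels are hypothetical, and the sample counts are made up for illustration.

```python
from collections import Counter

# Outcome labels are an assumption made for this sketch.
VALID_OUTCOMES = {"correct", "incorrect", "abstain"}


def omniscience_index(outcomes):
    """Return an index in [-100, 100] from per-question outcomes.

    Assumed rule: +1 per correct answer, -1 per incorrect answer
    (hallucination), 0 per abstention, averaged over all questions
    and scaled by 100.
    """
    counts = Counter(outcomes)
    unknown = set(counts) - VALID_OUTCOMES
    if unknown:
        raise ValueError(f"unexpected outcome labels: {sorted(unknown)}")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no graded questions")
    return 100.0 * (counts["correct"] - counts["incorrect"]) / total


# Illustrative (made-up) split of 6,000 questions.
sample = ["correct"] * 3000 + ["incorrect"] * 1200 + ["abstain"] * 1800
print(omniscience_index(sample))
```

Under this assumed rule, a model that abstains when unsure scores higher than one that guesses wrong on the same questions, which is why a negative index is possible even for a model with many correct answers.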

Reasoning · AA-Omniscience Index · Higher is better
| Model | Lab | Score | Source model | Status | Date |
|---|---|---|---|---|---|
| Gemini 3.1 Pro Preview | Google DeepMind | 33 | | Official | Feb 19, 2026 |
| Claude Opus 4.8 | Anthropic | 27 | | Official | May 28, 2026 |
| Claude Opus 4.7 | Anthropic | 26 | | Official | Apr 16, 2026 |
| Gemini 3.5 Flash | Google DeepMind | 23 | | Official | May 19, 2026 |
| GPT-5.5 | OpenAI | 20 | | Official | Apr 23, 2026 |
| Grok 4.3 | xAI | 18 | | Official | Apr 17, 2026 |
| Claude Sonnet 4.6 | Anthropic | 12 | | Official | Feb 17, 2026 |
| Kimi K2.6 | Moonshot AI | 6 | | Official | Apr 20, 2026 |
| GPT-5.4 | OpenAI | 6 | | Official | Mar 5, 2026 |
| Muse Spark | Meta | 4 | | Official | Apr 8, 2026 |
| MiMo-V2.5-Pro | Xiaomi | 4 | | Official | Apr 22, 2026 |
| GLM-5.1 | Z.ai | 2 | | Official | Apr 7, 2026 |
| MiniMax M2.7 | MiniMax | 1 | | Official | Mar 18, 2026 |
| Claude Haiku 4.5 | Anthropic | -4 | | Official | Oct 15, 2025 |
| DeepSeek V4 Pro | DeepSeek | -10 | | Official | Apr 24, 2026 |
| Llama 3.1 405B | Meta | -17 | | Official | Jul 23, 2024 |
| DeepSeek V4 Flash | DeepSeek | -23 | | Official | Apr 24, 2026 |
| Qwen3.5-397B-A17B | Alibaba / Qwen | -30 | | Official | Feb 16, 2026 |
| Mistral Medium 3.5 | Mistral AI | -36 | | Official | Apr 28, 2026 |
| NVIDIA Nemotron 3 Super 120B-A12B | NVIDIA | -42 | | Official | Mar 10, 2026 |
| GPT-OSS-120B | OpenAI | -50 | | Official | Aug 5, 2025 |

Each row reports the model's AA-Omniscience Index on the AA-Omniscience: Knowledge and Hallucination Benchmark.