SimpleQA Verified
A factual short-answer QA benchmark that measures parametric knowledge and resistance to hallucination (SimpleQA Verified, as tracked by Epoch AI).
Category: Other | Metric: accuracy | Higher is better
Problems and methodology are documented on the Epoch AI benchmarks hub.
1. Expected output
Consult the official source links for the current output format, submission steps, and benchmark-specific result files.
2. Submit results
Keep source URL, source model name, benchmark version, harness, and run context attached to any reported score.
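One way to keep that provenance from drifting away from the number is to store the score and its metadata in a single record. The sketch below is illustrative only: the `ReportedScore` class, its field names, and every value in the example are hypothetical placeholders, not a format defined by the benchmark or by Epoch AI.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class ReportedScore:
    """Hypothetical record pairing a score with its provenance."""
    benchmark: str
    benchmark_version: str
    metric: str
    value: float            # accuracy as a fraction in [0, 1]
    source_url: str
    source_model_name: str
    harness: str
    run_context: str

    def __post_init__(self):
        # Reject scores that are not a valid accuracy fraction.
        if not 0.0 <= self.value <= 1.0:
            raise ValueError("accuracy must be a fraction in [0, 1]")
        # Reject records with any empty provenance field.
        missing = [k for k, v in asdict(self).items() if v == ""]
        if missing:
            raise ValueError(f"missing provenance fields: {missing}")


# Placeholder values only; none of these are real results.
record = ReportedScore(
    benchmark="SimpleQA Verified",
    benchmark_version="placeholder-version",
    metric="accuracy",
    value=0.5,
    source_url="https://example.com/placeholder",
    source_model_name="placeholder-model",
    harness="example-harness",
    run_context="placeholder run notes",
)
print(json.dumps(asdict(record), indent=2))
```

Making the record immutable and validating it at construction time means a score cannot be created, or later edited, without its source fields.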
Gotchas
Scores depend on how strictly answers are graded, so report the source's grading methodology alongside any score.
Do not compare or average this benchmark's accuracy with metrics from unrelated benchmarks.
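To illustrate why grading strictness matters, the toy sketch below scores the same predictions twice, once with exact string matching and once after light normalization. It is not the benchmark's actual grader; the `normalize` and `accuracy` helpers and the three example items are made up for illustration.

```python
def normalize(text: str) -> str:
    """Lowercase, trim, drop trailing periods, and collapse whitespace."""
    return " ".join(text.lower().strip().rstrip(".").split())


def accuracy(predictions: list[str], targets: list[str], strict: bool) -> float:
    """Fraction of predictions that match their target.

    strict=True compares raw strings; strict=False compares normalized strings.
    """
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        raise ValueError("need at least one item to score")
    correct = 0
    for pred, target in zip(predictions, targets):
        if strict:
            correct += pred == target
        else:
            correct += normalize(pred) == normalize(target)
    return correct / len(targets)


# Toy data, not drawn from the benchmark.
targets = ["Paris", "1969", "Marie Curie"]
predictions = ["paris.", "1969", "Pierre Curie"]

strict_score = accuracy(predictions, targets, strict=True)
lenient_score = accuracy(predictions, targets, strict=False)
print(f"strict={strict_score:.3f} lenient={lenient_score:.3f}")
```

The same outputs score one of three under strict matching and two of three under lenient matching, which is why a number quoted without its grading method is hard to compare.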