evals.report

EnigmaEval

A benchmark of 1,184 puzzle-hunt challenges spanning text and images that probes models' ability to perform implicit knowledge synthesis, lateral thinking, and multi-step deductive reasoning to uncover hidden solution paths.

Category: Reasoning. Metric: accuracy (higher is better).

No run guide for this benchmark yet.