evals.report

Fugu Ultra

Sakana AI · Fugu. Released Jun 15, 2026.

Fugu Ultra is a model from Sakana AI in the Fugu family, released Jun 15, 2026. evals.report tracks 7 reported Fugu Ultra benchmark scores across SWE-bench Pro, Terminal-Bench 2.1, Humanity's Last Exam, GPQA Diamond, CharXiv, SciCode, and LiveCodeBench. Each score is shown with its benchmark, metric, source status, and date, and the scores are never combined into a single ranking.


Benchmark results (7)

| Benchmark | Category | Score | Metric | Status | Date |
| --- | --- | --- | --- | --- | --- |
| SWE-bench Pro | Coding | 73.7% | % resolved | Verified | Jun 15, 2026 |
| Terminal-Bench 2.1 | Agents | 82.1% | task success | Verified | Jun 15, 2026 |
| Humanity's Last Exam | Reasoning | 50.0% | accuracy | Verified | Jun 15, 2026 |
| GPQA Diamond | Reasoning | 95.5% | accuracy | Verified | Jun 15, 2026 |
| CharXiv | Multimodal | 86.6% | accuracy | Verified | Jun 15, 2026 |
| SciCode | Coding | 58.7% | accuracy | Verified | Jun 15, 2026 |
| LiveCodeBench | Coding | 93.2% | Pass@1 | Verified | Jun 15, 2026 |
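
For readers who want to work with these rows programmatically, here is a minimal Python sketch of one way to hold them. The `Result` class, the `by_category` helper, and the ISO date strings are illustrative assumptions, not an evals.report API or export format. In keeping with the page's approach, the sketch only groups rows by category and does not average or rank scores that use different metrics.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    """One reported score (hypothetical record shape, not a site schema)."""
    benchmark: str
    category: str
    score: float  # percentage as listed on the page
    metric: str
    status: str
    date: str     # ISO form of "Jun 15, 2026" (assumed convention)


RESULTS = [
    Result("SWE-bench Pro", "Coding", 73.7, "% resolved", "Verified", "2026-06-15"),
    Result("Terminal-Bench 2.1", "Agents", 82.1, "task success", "Verified", "2026-06-15"),
    Result("Humanity's Last Exam", "Reasoning", 50.0, "accuracy", "Verified", "2026-06-15"),
    Result("GPQA Diamond", "Reasoning", 95.5, "accuracy", "Verified", "2026-06-15"),
    Result("CharXiv", "Multimodal", 86.6, "accuracy", "Verified", "2026-06-15"),
    Result("SciCode", "Coding", 58.7, "accuracy", "Verified", "2026-06-15"),
    Result("LiveCodeBench", "Coding", 93.2, "Pass@1", "Verified", "2026-06-15"),
]


def by_category(results):
    """Group results by category without combining their scores."""
    groups = defaultdict(list)
    for r in results:
        groups[r.category].append(r)
    return dict(groups)


if __name__ == "__main__":
    for category, rows in by_category(RESULTS).items():
        for r in rows:
            print(f"{category}: {r.benchmark} = {r.score}% ({r.metric})")
```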

In the wild (1)


Real-world feedback on Fugu Ultra from people using it on actual prompts — praise and criticism alike, each linked to its source. Qualitative, never scored.

am.will
X · @LLMJunky
Negative
The game was pretty bad and notably worse than GPT 5.5. … GPT 5.5 by contrast did a pretty good job and required no follow ups.

On: Asked it to build a Three.js replica of Rocket League via Codex.