Sakana AI · Fugu. Released Jun 15, 2026.
Fugu Ultra is a model from Sakana AI in the Fugu family, released Jun 15, 2026. evals.report tracks 7 reported Fugu Ultra benchmark scores: SWE-bench Pro, Terminal-Bench 2.1, Humanity's Last Exam, GPQA Diamond, CharXiv, SciCode, and LiveCodeBench. Each score is shown with its benchmark, metric, source status, and date, and scores are never combined into a single ranking.
| Benchmark | Category | Score | Metric | Status | Date |
|---|---|---|---|---|---|
| SWE-bench Pro | Coding | 73.7% | % resolved | Verified | Jun 15, 2026 |
| Terminal-Bench 2.1 | Agents | 82.1% | task success | Verified | Jun 15, 2026 |
| Humanity's Last Exam | Reasoning | 50.0% | accuracy | Verified | Jun 15, 2026 |
| GPQA Diamond | Reasoning | 95.5% | accuracy | Verified | Jun 15, 2026 |
| CharXiv | Multimodal | 86.6% | accuracy | Verified | Jun 15, 2026 |
| SciCode | Coding | 58.7% | accuracy | Verified | Jun 15, 2026 |
| LiveCodeBench | Coding | 93.2% | Pass@1 | Verified | Jun 15, 2026 |
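For readers who want to work with these numbers programmatically, the rows above can be sketched as plain records. This is a minimal illustration only: the `Score` class, its field names, and the `by_category` helper are hypothetical and are not an evals.report API. In keeping with the page's rule, the sketch groups scores by category but never averages or ranks them.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Score:
    """One reported benchmark score (hypothetical record layout)."""
    benchmark: str
    category: str
    value: float  # percentage as listed in the table
    metric: str
    status: str
    date: str


# Values copied from the table above.
SCORES = [
    Score("SWE-bench Pro", "Coding", 73.7, "% resolved", "Verified", "Jun 15, 2026"),
    Score("Terminal-Bench 2.1", "Agents", 82.1, "task success", "Verified", "Jun 15, 2026"),
    Score("Humanity's Last Exam", "Reasoning", 50.0, "accuracy", "Verified", "Jun 15, 2026"),
    Score("GPQA Diamond", "Reasoning", 95.5, "accuracy", "Verified", "Jun 15, 2026"),
    Score("CharXiv", "Multimodal", 86.6, "accuracy", "Verified", "Jun 15, 2026"),
    Score("SciCode", "Coding", 58.7, "accuracy", "Verified", "Jun 15, 2026"),
    Score("LiveCodeBench", "Coding", 93.2, "Pass@1", "Verified", "Jun 15, 2026"),
]


def by_category(scores):
    """Group scores by category, preserving table order; no aggregation."""
    groups = defaultdict(list)
    for score in scores:
        groups[score.category].append(score)
    return dict(groups)


if __name__ == "__main__":
    for category, rows in by_category(SCORES).items():
        print(category)
        for s in rows:
            print(f"  {s.benchmark}: {s.value}% ({s.metric}, {s.status}, {s.date})")
```

Keeping the metric next to each value matters here because the metrics differ (% resolved, task success, accuracy, Pass@1), so the numbers are not directly comparable across benchmarks.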
Real-world feedback on Fugu Ultra from people using it on actual prompts — praise and criticism alike, each linked to its source. Qualitative, never scored.
am.will: "The game was pretty bad and notably worse than GPT 5.5. … GPT 5.5 by contrast did a pretty good job and required no follow ups."
Context: asked it to build a Three.js replica of Rocket League via Codex.