In the wild — real-world feedback on AI models

Official benchmarks show reported scores. In-the-wild reports show what users hit after release: latency, cost, quota, regressions, surprising wins, and task-specific failures. These are source-linked anecdotes, not benchmark scores.

Models

Fugu Ultra · Sakana AI

AIAcademy · on Fugu Ultra

X · @AIAcademykorea · Jun 22, 2026

Mixed

The 5-hour limit has been exceeded, so I have to wait 4 hours. However, it kindly provides guidance … I like this one better because it is user-oriented, offering friendly guidance for beginners and general users.

anecdotal

AM9:21 · on Fugu Ultra

X · @AM921543266 · Jun 22, 2026

Positive

It discovered 27 bugs that Fable 5 couldn't find and fixed all of them. The code quality is impeccable … it implemented about 70,000 lines of new features, resolved 4 issues, and created 7 PRs.

Task: Introduced Fugu into a repo previously worked on with Claude Fable 5; ~1 hour of use.

anecdotal · output shown

am.will · on Fugu Ultra

X · @LLMJunky · Jun 22, 2026

Negative

The game was pretty bad and notably worse than GPT 5.5. … GPT 5.5 by contrast did a pretty good job and required no follow ups.

Task: Asked it to build a Three.js replica of Rocket League via Codex.

anecdotal · prompt shown · single run

Mark Santos · on Fugu Ultra

X · @markksantos · Jun 22, 2026

Mixed

In terms of model speed and performance, Fugu on Opencode won … inverted directional turn, wonky camera, no sfx, not identical to Crossy Road game.

Task: Head-to-head vs Claude Opus 4.8: a single-file Three.js Crossy Road game.

prompt shown · single run