In the wild — real-world feedback on AI models

Official benchmarks show reported scores. In-the-wild reports show what users hit after release: latency, cost, quota, regressions, surprising wins, and task-specific failures. These are source-linked anecdotes, not benchmark scores.

Prem · on Claude Opus 4.8

X·@btrmasaladosa·Jun 22, 2026

Positive

i was stuck on a landing page redesign with gpt 5.5 and opus 4.6 since a couple of days. gave a fresh try with opus 4.8 and it one shotted what i was looking for

Task: A landing-page redesign that GPT-5.5 and Opus 4.6 hadn't cracked over several days.

anecdotal

Mark Santos · on Claude Opus 4.8

X·@markksantos·Jun 22, 2026

Mixed

I think in terms of application functionality, quality, and design, Opus won … got stuck twice in a retry loop (had to prompt to self-correct).

Task: Head-to-head vs Sakana Fugu Ultra: a single-file Three.js Crossy Road game.

prompt shown · single run

Igor Kotenkov · on Gemini 3 Pro

X·@stalkermustang·Jun 22, 2026

Mixed

It is great at writing - i'm using it to this day. It was good in one-shotting front-end. But agentic? … in my memory it was never a catch up in the most important and money making areas

anecdotal · high-signal user

AIAcademy · on Fugu Ultra

X·@AIAcademykorea·Jun 22, 2026

Mixed

The 5-hour limit has been exceeded, so I have to wait 4 hours. However, it kindly provides guidance … I like this one better because it is user-oriented, offering friendly guidance for beginners and general users.

anecdotal

AM9:21 · on Fugu Ultra

X·@AM921543266·Jun 22, 2026

Positive

It discovered 27 bugs that Fable 5 couldn't find and fixed all of them. The code quality is impeccable … it implemented about 70,000 lines of new features, resolved 4 issues, and created 7 PRs.

Task: Introduced Fugu into a repo previously worked on with Claude Fable 5; ~1 hour of use.

anecdotal · output shown

am.will · on Fugu Ultra

X·@LLMJunky·Jun 22, 2026

Negative

The game was pretty bad and notably worse than GPT 5.5. … GPT 5.5 by contrast did a pretty good job and required no follow ups.

Task: Asked it to build a Three.js replica of Rocket League via Codex.

anecdotal · prompt shown · single run

Mark Santos · on Fugu Ultra

X·@markksantos·Jun 22, 2026

Mixed

In terms of model speed and performance, Fugu on Opencode won … inverted directional turn, wonky camera, no sfx, not identical to Crossy Road game.

Task: Head-to-head vs Claude Opus 4.8: a single-file Three.js Crossy Road game.

prompt shown · single run

Pranav Sriram · on GLM-5.2

X·@PranavSriram1·Jun 22, 2026

Negative

For my research, Fable felt like a clear step change … I was excited about the GLM 5.2 hype and tried it; sadly it's nowhere close

Task: Evaluating models for research work (alongside Fable and GPT-5.5 Pro).

anecdotal

Machina · on Claude Opus 4.8

X·@EXM7777·Jun 21, 2026

Mixed

Opus 4.8 in the last 48hrs is amazing … it's just very sad to go from godlike performance to barely usable some days.

anecdotal

@ceo_tommy1 · on GPT-5.5 Pro

X·@ceo_tommy1·Jun 21, 2026

Positive

It's way too convenient to make Codex handle GPT5.5Pro work, and it makes my tasks infinitely more productive.

Task: Using GPT-5.5 Pro from the Codex CLI for day-to-day work.

anecdotal · paid user

@Hesamation · on GLM-5.2

X·@Hesamation·Jun 21, 2026

Positive

GLM 5.2 ranks unusually high on FrontierSWE (long-horizon agentic engineering) … using it with OpenCode is also not far from the quality of Claude Code or Codex.

Task: Day-to-day agentic coding with GLM-5.2 in OpenCode.

anecdotal · high-signal user

Guillermo Rauch · on GLM-5.2

X·@rauchg·Jun 21, 2026

Positive

Genuinely impressed, almost shocked, at how good GLM-5.2 … is at coding. This changes things.

anecdotal · high-signal user

Theo · on GLM-5.2

X·@theo·Jun 21, 2026

Mixed

Having an open weight model surpass GPT-5.4 and every Gemini model is dope. That said - it's not cheap. Both Opus 4.8 and GPT-5.5 set to "medium" are cheaper and smarter than GLM-5.2

anecdotal · high-signal user

@spoobsV1 · on Claude Fable 5

X·@spoobsV1·Jun 9, 2026

Positive

Wow Claude Fable 5 is insane!! It just recreated the 2011 game of the year … The Elder Scrolls V: Skyrim in ONE prompt.

Task: The entire prompt was "make skyrim".

anecdotal · prompt shown

elvis · on DeepSeek V4 Pro

X·@omarsar0·May 1, 2026

Positive

I have been testing DeepSeek-V4-Pro with the Pi coding agent. I am mindblown by how well it works out of the box.

Task: Built an LLM wiki with an agent powered entirely by DeepSeek-V4-Pro.

anecdotal · high-signal user