GLM-5.2

Z.ai · GLM. Released Jun 16, 2026.

GLM-5.2 is a model from Z.ai in the GLM family, released Jun 16, 2026. evals.report tracks 14 reported GLM-5.2 benchmark scores across ARC-AGI-1, ARC-AGI-2, Humanity's Last Exam, AIME 2026, GPQA Diamond, MathArena HMMT February 2026, SWE-bench Pro, DeepSWE, and 6 more — each shown with its benchmark, metric, source status, and date, and never combined into a single ranking.

Benchmark results (14)

Benchmark	Category	Score	Metric	Status	Date
ARC-AGI-1	Reasoning	77%	accuracy	Official	Jun 16, 2026
ARC-AGI-2	Reasoning	22.78%	accuracy	Official	Jun 16, 2026
Humanity's Last Exam	Reasoning	40.5%	accuracy	Verified	Jun 16, 2026
AIME 2026	Reasoning	99.2%	accuracy	Verified	Jun 16, 2026
GPQA Diamond	Reasoning	91.2%	accuracy	Verified	Jun 16, 2026
MathArena HMMT February 2026	Reasoning	92.5%	accuracy	Verified	Jun 16, 2026
SWE-bench Pro	Coding	62.1%	% resolved	Verified	Jun 16, 2026
DeepSWE	Coding	46.2%	% resolved	Verified	Jun 16, 2026
Terminal-Bench 2.1	Agents	81.0%	task success	Verified	Jun 16, 2026
MCP Atlas	Tool use	76.8%	pass rate	Verified	Jun 16, 2026
SWE-Marathon	Agents	13.0%	resolution rate (pass@1)	Verified	Jun 16, 2026
FrontierSWE	Agents	74%	dominance score	Official	Jun 16, 2026
PostTrainBench	Agents	34.3%	weighted average score	Verified	Jun 16, 2026
FrontierCode	Coding	24.5%	weighted score (Main)	Official	Jun 16, 2026

In the wild (5)

Real-world feedback on GLM-5.2 from people using it on actual prompts — praise and criticism alike, each linked to its source. Qualitative, never scored.

Pranav Sriram

X · @PranavSriram1 · Jun 22, 2026

Negative

For my research, Fable felt like a clear step change … I was excited about the GLM 5.2 hype and tried it; sadly it's nowhere close

Task: Evaluating models for research work (alongside Fable and GPT-5.5 Pro).

anecdotal

@Hesamation

X · @Hesamation · Jun 21, 2026

Positive

GLM 5.2 ranks unusually high on FrontierSWE (long-horizon agentic engineering) … using it with OpenCode is also not far from the quality of Claude Code or Codex.

Task: Day-to-day agentic coding with GLM-5.2 in OpenCode.

anecdotal · high-signal user

Guillermo Rauch

X · @rauchg · Jun 21, 2026

Positive

Genuinely impressed, almost shocked, at how good GLM-5.2 … is at coding. This changes things.

anecdotal · high-signal user

Theo

X · @theo · Jun 21, 2026

Mixed

Having an open weight model surpass GPT-5.4 and every Gemini model is dope. That said - it's not cheap. Both Opus 4.8 and GPT-5.5 set to "medium" are cheaper and smarter than GLM-5.2

anecdotal · high-signal user

Hassan

X · @nutlope · Jun 16, 2026

Positive

Asked GLM 5.2 (left) and Opus 4.8 (right) to design a menu and GLM has even better taste while being 4x cheaper. GLM also nailed a lot of small details like adding "chef's pick" and "vegetatian" tags to some dishes.

Task: Asked GLM-5.2 and Opus 4.8 to design a restaurant menu (UI/design).

high-signal user · output shown