Model profile · Z.AI

GLM-5.2

Name: GLM-5.2
Author: Z.AI

Current · Released Jun 16, 2026 · Open weight · Reasoning · 1M context

GLM-5.2 scores 62.9 out of 100 and ranks #41 of 214. This profile shows 18 source-displayable benchmark rows; its strongest eligible category is Knowledge at #9. API pricing is $1.4 input and $4.4 output per million tokens.

Data as of July 31, 2026


Strongest published evidence

Knowledge ranks #9 of 55 eligible models, which points to knowledge-intensive tasks such as research, analysis, and factual Q&A as a good fit.

Validate before choosing

18 published rows leave some tracked benchmark slots empty. Independent runtime speed has not been measured.

Decision snapshot

Each value is shown against a field reference rather than on its own. Markers compare this model with the current ranked and priced catalog; they are not absolute quality thresholds.

Capability

62.9/100

field median 57.3

#41 of 214 ranked models

Price

$1.40 input / $4.40 output

input median $1

blended $2.90

Speed

Not measured

field median 108 tok/s

Time to first token not measured

Context

1M tokens

field median 256,000 tokens

Reported for this model; direct source link not stored

Capability shape

Each axis shows percentile within that category’s eligible cohort. The comparison outline is the median of the six nearest public-score peers; a collapsed vertex means the category is not rank-eligible.

Agentic: 84th percentile
Coding: 91st percentile
Reasoning: Not eligible
Knowledge: 85th percentile
Math: Not eligible
Multilingual: Not eligible
Multimodal: Not eligible
Instruction following: Not eligible


Legend: Top decile · Top quartile · Mid-field · Not eligible

What it costs to get this score

Published API price against the public score. The x-axis uses a log scale; the dashed path marks models that are not beaten by a cheaper, higher-scoring option. Price is the average of the published input and output rates.
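The blended price and the frontier line can be sketched in a few lines of Python. This is a minimal illustration, not BenchLM's code: the `PricedModel` class and the `dominates` and `frontier` helpers are hypothetical names, the frontier test assumes weak dominance (no more expensive, no lower-scoring, and better on at least one axis) because the page does not say how ties are handled, and the input is limited to the three GLM rows from the Lineage section rather than the full catalog.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PricedModel:
    name: str
    score: float         # public score, 0-100
    input_price: float   # USD per million input tokens
    output_price: float  # USD per million output tokens

    @property
    def blended_price(self) -> float:
        # Simple average of the two published rates, as the chart describes.
        return (self.input_price + self.output_price) / 2


def dominates(a: PricedModel, b: PricedModel) -> bool:
    """Assumed rule: a beats b if it is no pricier, no lower-scoring, and better on one axis."""
    no_worse = a.blended_price <= b.blended_price and a.score >= b.score
    strictly_better = a.blended_price < b.blended_price or a.score > b.score
    return no_worse and strictly_better


def frontier(models: list[PricedModel]) -> list[PricedModel]:
    """Models not dominated by any other model, ordered by blended price."""
    kept = [m for m in models if not any(dominates(o, m) for o in models if o is not m)]
    return sorted(kept, key=lambda m: m.blended_price)


# Rows from the Lineage section of this profile (family only, not the full catalog).
lineage = [
    PricedModel("GLM-5", 65.2, 1.00, 3.20),
    PricedModel("GLM-5.1", 66.8, 1.40, 4.40),
    PricedModel("GLM-5.2", 62.9, 1.40, 4.40),
]

for m in lineage:
    print(f"{m.name}: blended ${m.blended_price:.2f}")
print([m.name for m in frontier(lineage)])
```

Within these three rows, GLM-5.2 falls off the frontier because GLM-5 is cheaper and higher-scoring. Because the blended figure is a plain average, it matches real spend only when a workload sends roughly as many input tokens as it receives output tokens; output-heavy workloads pay more than $2.90 per million tokens at these rates.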


Current model: GLM-5.2 · score 62.9 · $2.90 blended per million tokens

Horizontal: blended price per million tokens, log scale · Vertical: public score

How much of this is verified

Coverage is split by category so a strong number never hides a thin evidence base. Verified means the row is tied to a published source; provisional rows remain visible but separate.

Agentic: 4/4 verified
Coding: 5/5 verified
Reasoning: 1/1 verified
Knowledge: 4/4 verified
Math: 4/4 verified
Multilingual: Not measured
Multimodal: Not measured
Inst. following: Not measured

Legend: Verified source · Provisional · Not measured

Spec sheet

Each documented value carries its source. Missing fields stay visible as not sourced or not published, rather than disappearing from the page.

API model ID: Not published
Context window: 1M
Maximum output: Not sourced yet
Knowledge cutoff: Not sourced yet
Input modalities: Not sourced yet
Output modalities: Not sourced yet
Parameters: Not sourced yet

Availability: Not sourced yet
Cloud regions: Not tracked yet
Lifecycle: Current
API capabilities: Tool calling, structured outputs, and batch support are not tracked yet
Prompt caching: Not documented in the pricing record
Self-host: Open weights available; hardware estimate not sourced
Rate limits: Not tracked yet

Deployment options

Self-host and provider-specific paths stay separate from benchmark evidence so operating constraints are visible before a score becomes the whole decision.

Published weights are available, but BenchLM does not yet have a sourced parameter and VRAM profile for this exact model. Hardware cost estimates stay unavailable until that sizing record is complete.
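Until that sizing record exists, a common rough lower bound for weight memory is parameter count times bytes per parameter. The sketch below assumes that rule of thumb; `weight_memory_gib` and the byte table are illustrative, and the 100-billion parameter figure is a placeholder, not GLM-5.2's size. It ignores KV cache, activations, and runtime overhead, which can be substantial at long context lengths such as 1M tokens.

```python
# Bytes per parameter for common weight formats (weights only).
BYTES_PER_PARAM = {"fp16": 2.0, "bf16": 2.0, "fp8": 1.0, "int4": 0.5}


def weight_memory_gib(params_billion: float, dtype: str = "bf16") -> float:
    """Lower-bound memory for model weights alone, in GiB.

    Excludes KV cache, activations, and framework overhead, all of which
    grow with context length and batch size.
    """
    if dtype not in BYTES_PER_PARAM:
        raise ValueError(f"unknown dtype: {dtype}")
    total_bytes = params_billion * 1e9 * BYTES_PER_PARAM[dtype]
    return total_bytes / 2**30


# Hypothetical parameter count for illustration only; GLM-5.2's size is not sourced.
example_params_billion = 100
for dtype in ("bf16", "fp8", "int4"):
    print(dtype, round(weight_memory_gib(example_params_billion, dtype), 1))
```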


Category score record

Scores and ranks appear only where published evidence can be displayed. The table keeps the score, weight, cohort, and evidence state together.

Category scores, ranks, weighting, benchmark coverage, and evidence status
Category	Score	Rank	Percentile	Weight	Benchmarks	Evidence
Agentic	56.8	#21 of 128	84th	22%	4 benchmarks	Verified
Coding	64.4	#12 of 129	91st	20%	5 benchmarks	Verified
Reasoning	Score pending	Not ranked	Not available	17%	1 benchmark	Verified
Knowledge	82.4	#9 of 55	85th	12%	4 benchmarks	Verified
Math	81.4	Not ranked	Not available	5%	4 benchmarks	Verified
Multilingual	Not measured	Not ranked	Not available	7%	0 benchmarks	Not measured
Multimodal	Not measured	Not ranked	Not available	12%	0 benchmarks	Not measured
Inst. following	Not measured	Not ranked	Not available	5%	0 benchmarks	Not measured
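The displayed percentiles are consistent with a simple rank-to-percentile mapping, 100 * (1 - (rank - 1) / cohort_size), rounded to a whole number: the share of the eligible cohort not ranked above the model. That formula is inferred from the three ranked rows in the table, not a documented BenchLM method, and `cohort_percentile` is a hypothetical helper.

```python
def cohort_percentile(rank: int, cohort_size: int) -> int:
    """Share of the cohort not ranked above this model, as a whole-number percentile.

    Rank 1 maps to 100; the last-ranked model maps to round(100 / cohort_size).
    """
    if not 1 <= rank <= cohort_size:
        raise ValueError("rank must be between 1 and cohort_size")
    return round(100 * (1 - (rank - 1) / cohort_size))


# Ranked rows from the category table: (category, rank, eligible cohort size).
ranked_rows = [("Agentic", 21, 128), ("Coding", 12, 129), ("Knowledge", 9, 55)]
for category, rank, cohort in ranked_rows:
    print(category, cohort_percentile(rank, cohort))
```

Under this mapping the same rank means different things in different cohorts: #9 of 55 lands at the 85th percentile, while #12 of 129 lands at the 91st.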

Benchmark ledger

The gap column compares each value with the best source-verified result in the catalog; provisional leaders do not set the reference. Every published row is listed below, grouped by category.
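The gap column can be sketched as follows, assuming each row reduces to a model name, a score, and a verified flag. `Row` and `gap_to_best_verified` are hypothetical names, not BenchLM's code. Provisional rows are filtered out before the maximum is taken, so they cannot set the reference; the example includes one made-up provisional row to show that.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    model: str
    score: float    # percent
    verified: bool  # False for provisional rows


def gap_to_best_verified(score: float, rows: list[Row]) -> str:
    """Describe how far `score` trails the best source-verified row."""
    verified_scores = [r.score for r in rows if r.verified]
    if not verified_scores:
        return "No verified reference"
    best = max(verified_scores)
    if score >= best:
        return "Best verified"
    return f"{best - score:.1f} behind"


# SWE-bench Pro values from the Coding table, plus one hypothetical provisional row.
swe_rows = [
    Row("Claude Mythos 5", 80.3, True),
    Row("GLM-5.2", 62.1, True),
    Row("Hypothetical provisional entry", 85.0, False),  # ignored as a reference
]
print(gap_to_best_verified(62.1, swe_rows))
```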

Coding · 5 rows

Coding benchmark values, best verified comparison, weight, and source status
Benchmark	Score	Versus best verified row	Gap	Weight	Evidence
SWE-bench Pro	62.1%	Claude Mythos 5 · 80.3%	18.2 behind	10%	Provider exact: Z.AI GLM-5.2 model card
NL2Repo	48.9%	DeepSeek V4 Flash (Max) · 54.2%	5.3 behind	Display only	Provider exact: Z.AI GLM-5.2 model card
Terminal-Bench 2.0	81.0%	GPT-5.6 Sol · 91.9%	10.9 behind	Display only	Provider exact: Z.AI GLM-5.2 model card
ProgramBench (Can Language Models Rebuild Programs From Scratch?)	63.7%	Claude Opus 5 · 93.0%	29.3 behind	Display only	Provider exact: Z.AI GLM-5.2 model card
CursorBench 3.2	55.0%	Claude Fable 5 · 70.5%	15.5 behind	Display only	Benchmark exact: Cursor evals, CursorBench 3.2

Agentic · 4 rows

Agentic benchmark values, best verified comparison, weight, and source status
Benchmark	Score	Versus best verified row	Gap	Weight	Evidence
Terminal-Bench 2.0	81.0%	GPT-5.6 Sol · 91.9%	10.9 behind	38%	Provider exact: Z.AI GLM-5.2 model card
MCP Atlas	76.8%	Muse Spark 1.1 · 88.1%	11.3 behind	Display only	Provider exact: Z.AI GLM-5.2 model card
Toolathlon	48.2%	Muse Spark 1.1 · 75.6%	27.4 behind	Display only	Provider exact: Z.AI GLM-5.2 model card
ResearchClawBench	20.7%	Claude Opus 4.8 · 21.1%	0.4 behind	Display only	Benchmark exact: ResearchClawBench leaderboard

Reasoning · 1 row

Reasoning benchmark values, best verified comparison, weight, and source status
Benchmark	Score	Versus best verified row	Gap	Weight	Evidence
CritPt (Critical Physics Tasks)	20.9%	GLM-5.2 · 20.9%	Best verified	Display only	Provider exact: Z.AI GLM-5.2 model card

Knowledge · 4 rows

Knowledge benchmark values, best verified comparison, weight, and source status
Benchmark	Score	Versus best verified row	Gap	Weight	Evidence
HLE (Humanity's Last Exam)	54.7%	Claude Opus 5 · 64.7%	10.0 behind	45%	Provider exact: Z.AI GLM-5.2 model card
GPQA (Graduate-Level Google-Proof Q&A)	91.2%	Sakana Fugu-Ultra · 95.5%	4.3 behind	7%	Provider exact: Z.AI GLM-5.2 model card
GPQA-D (GPQA Diamond)	91.2%	Sakana Fugu-Ultra · 95.5%	4.3 behind	Display only	Provider exact: Z.AI GLM-5.2 model card
HLE w/o tools (Humanity's Last Exam without tools)	40.5%	Claude Mythos 5 · 59.0%	18.5 behind	Display only	Provider exact: Z.AI GLM-5.2 model card

Math · 4 rows

Math benchmark values, best verified comparison, weight, and source status
Benchmark	Score	Versus best verified row	Gap	Weight	Evidence
AIME26 (AIME 2026)	99.2%	GLM-5.2 · 99.2%	Best verified	25%	Provider exact: Z.AI GLM-5.2 model card
HMMT Feb 2026 (Harvard-MIT Mathematics Tournament, February 2026)	92.5%	Qwen3.7 Max · 97.1%	4.6 behind	25%	Provider exact: Z.AI GLM-5.2 model card
HMMT Nov 2025 (Harvard-MIT Mathematics Tournament, November 2025)	94.4%	Qwen3.6 Plus · 94.6%	0.2 behind	Display only	Provider exact: Z.AI GLM-5.2 model card
MMAnswerBench	91.0%	GLM-5.2 · 91.0%	Best verified	Display only	Provider exact: Z.AI GLM-5.2 model card

Lineage

The sequence follows explicit supersedes links. Scores and prices remain blank when the corresponding public row or first-party rate is unavailable.

Mar 1, 2026

GLM-5

Score 65.2 · $1 / $3.2

Apr 7, 2026

GLM-5.1

Score 66.8 · $1.4 / $4.4

Jun 16, 2026 · you are here

GLM-5.2

Score 62.9 · $1.4 / $4.4

flagship · 5.2

GLM-5 family: GLM-5 · 65.2, GLM-5.1 · 66.8, GLM-5 (Reasoning) · 59.0, GLM-5-Turbo · 65.9, GLM-5V-Turbo · 62.1

How to read this profile

The visual layer above carries the decisions. These notes preserve the model, ranking, coverage, and family context behind the numbers.

GLM-5.2 ranks #41 of 214 on the public leaderboard with a score of 62.94/100. It does not yet have enough sourced coverage for a verified position.

GLM-5.2 is an open-weight model with a 1M-token context window. It uses an explicit reasoning mode, which can improve complex problem solving while adding latency and token use.

Benchmark values come from an official exact-value snapshot of Z.AI's GLM-5.2 launch materials and Hugging Face model card. BenchLM maps directly comparable rows onto existing benchmark keys; DeepSWE, FrontierSWE, PostTrainBench, SWE-Marathon, and other unsupported long-horizon rows remain excluded or external-only until stable local keys exist.

GLM-5.2 sits in the GLM-5 family with GLM-5, GLM-5.1, GLM-5 (Reasoning), GLM-5-Turbo, and GLM-5V-Turbo. Its explicit predecessor is GLM-5.1. Only 18 of 376 tracked benchmark slots currently have displayable evidence, and missing categories stay blank.

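By rank, its strongest eligible category is Knowledge at #9 of 55, while its lowest eligible position is Agentic at #21 of 128; by percentile, Coding (91st) leads because its cohort is larger. The Knowledge result points to knowledge-intensive tasks such as research, analysis, and factual Q&A as a good fit.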


GLM-5.2 release history


GLM-5.2 released

Z.AI · Model release

Source confirmed · Jun 16, 2026

Frequently asked questions

How does GLM-5.2 perform overall in AI benchmarks?

GLM-5.2 ranks #41 out of 214 models on the public BenchLM leaderboard, with a score of 62.94/100. Its evidence status is Estimated, and this profile shows 18 source-displayable benchmark rows. The label describes evidence depth, not a provider quality claim; inspect the category rows before choosing a model for a workload.

Is GLM-5.2 good for knowledge and understanding?

GLM-5.2 ranks #9 out of 55 eligible models for knowledge and understanding, with a public category score of 82.4/100. That places it in the current top ten for this category. Check the underlying rows before treating the aggregate as a workload guarantee.

Is GLM-5.2 good for coding and programming?

GLM-5.2 ranks #12 out of 129 eligible models for coding and programming, with a public category score of 64.4/100. Higher-ranked alternatives are available for workloads where this category decides the choice. Check the underlying rows before treating the aggregate as a workload guarantee.

Is GLM-5.2 good for mathematics?

GLM-5.2 has source-displayable benchmark coverage for mathematics, but the public category table does not assign it a rank there. The individual rows remain available for inspection. A missing category position means the evidence threshold was not met; it does not convert the model's unmeasured work into a zero.

Is GLM-5.2 good for reasoning and logic?

GLM-5.2 has source-displayable benchmark coverage for reasoning and logic, but the public category table does not assign it a rank there. The individual rows remain available for inspection. A missing category position means the evidence threshold was not met; it does not convert the model's unmeasured work into a zero.

Is GLM-5.2 good for agentic tool use and computer tasks?

GLM-5.2 ranks #21 out of 128 eligible models for agentic tool use and computer tasks, with a public category score of 56.8/100. Higher-ranked alternatives are available for workloads where this category decides the choice. Check the underlying rows before treating the aggregate as a workload guarantee.

Is GLM-5.2 open source?

GLM-5.2 is an open-weight model from Z.AI. Its weights can be downloaded for local or hosted deployment, subject to the published license. Open weight does not automatically mean open source: training data and training code may remain private, and commercial restrictions can still apply.

Which sibling models are related to GLM-5.2?

GLM-5.2 belongs to the GLM-5 family. Related tracked variants include GLM-5, GLM-5.1, GLM-5 (Reasoning), plus 2 more. A sibling link indicates shared lineage or a documented configuration relationship; it does not mean the variants have identical pricing, context limits, benchmark evidence, or deployment behavior. Compare before switching.

Does GLM-5.2 have full benchmark coverage on BenchLM?

No. GLM-5.2 currently has 18 source-displayable rows across 376 tracked benchmark slots. The profile shows published, non-generated evidence and leaves missing categories blank until an exact evaluation is available. Coverage describes how much was measured; it is not a penalty applied to an individual benchmark result.

What is the context window size of GLM-5.2?

GLM-5.2 has a reported context window of 1M tokens in the exact-model catalog record. The value stays visible, but the profile marks its source link as unavailable rather than presenting it as directly documented. Maximum output length is tracked separately because providers often publish a different limit.

Last updated July 31, 2026. Runtime fields remain blank until a sourced snapshot exists.
