GLM-4.7: Advancing the Coding Capability

39 comments

·December 22, 2025

jtrn

My quickie: MoE model heavily optimized for coding agents, complex reasoning, and tool use. 358B/32B active. vLLM/SGLang only supported on the main branch of these engines, not the stable releases. Supports tool calling in OpenAI-style format. Multilingual English/Chinese primary. Context window: 200k. Claims Claude 3.5 Sonnet/GPT-5 level performance. 716GB in FP16, probably ca 220GB for Q4_K_M.

My most important takeaway is that, in theory, I could get a "relatively" cheap Mac Studio and run this locally, and get usable coding assistance without being dependent on any of the large LLM providers. Maybe utilizing Kimik2 in addition. I like that open-weight models are nipping at the feet of the proprietary models.

__natty__

I can imagine someone from the past reading this comment and having a moment of doubt

embedding-shape

> Supports tool calling in OpenAI-style format

So Harmony? Or something older? Since Z.ai also claim the thinking mode does tool calling and reasoning interwoven, would make sense it was straight up OpenAI's Harmony.

> in theory, I could get a "relatively" cheap Mac Studio and run this locally

In practice, it'll be incredible slow and you'll quickly regret spending that much money on it instead of just using paid APIs until proper hardware gets cheaper / models get smaller.

biddit

> In practice, it'll be incredible slow and you'll quickly regret spending that much money on it instead of just using paid APIs until proper hardware gets cheaper / models get smaller.

Yes, as someone who spent several thousand $ on a multi-GPU setup, the only reason to run local codegen inference right now is privacy or deep integration with the model itself.

It’s decidedly more cost efficient to use frontier model APIs. Frontier models trained to work with their tightly-coupled harnesses are worlds ahead of quantized models with generic harnesses.

theLiminator

Yeah, I think without a setup that costs 10k+ you can't even get remotely close in performance to something like claude code with opus 4.5.

reissbaker

No, it's not Harmony; Z.ai has their own format, which they modified slightly for this release (by removing the required newlines from their previous format). You can see their tool call parsing code here: https://github.com/sgl-project/sglang/blob/34013d9d5a591e3c0...

rz2k

In practice the 4bit MLX version runs at 20t/s for general chat. Do you consider that too slow for practical use?

What example tasks would you try?

reissbaker

s/Sonnet 3.5/Sonnet 4.5

The model output also IMO look significantly more beautiful than GLM-4.6; no doubt in part helped by ample distillation data from the closed-source models. Still, not complaining, I'd much prefer a cheap and open-source model vs. a more-expensive closed-source one.

buppermint

I've been playing around with this in z-ai and I'm very impressed. For my math/research heavy applications it is up there with GPT-5.2 thinking and Gemini 3 Pro. And its well ahead of K2 thinking and Opus 4.5.

Tiberium

The frontend examples, especially the first one, look uncannily similar to what Gemini 3 Pro usually produces. Make of that what you will :)

EDIT: Also checked the chats they shared, and the thinking process is very similar to the raw (not the summarized) Gemini 3 CoT. All the bold sections, numbered lists. It's a very unique CoT style that only Gemini 3 had before today :)

reissbaker

I don't mind if they're distilling frontier models to make them cheaper, and open-sourcing the weights!

Imustaskforhelp

Same, although gemini 3 flash already gives a run for the cheaper aspect but a part of me really wants to get open source too because that way if I really want to some day, I can have privacy or get my own hardware to run it

I genuinely hope that gemini 3 flash gets open sourced but I feel like that can actually crash the AI bubble if something like this happens because I genuinely feel like although there are still some issues of vibing with the overall model itself, I find it very competent overall and fast and I genuinely feel like at this point, there might be some placebo effects too but in reality, the model feels really solid.

Like all of western countries (mostly) wouldn't really have a point to compete or incentives if someone open sources the model because then the competition would rather be on providers/ their speeds (like how groq,cerebras have an insane speed)

I had heard that google would allow institutions like universities to self host gemini models or similar so there are chances as to what if the AI bubble actually pops up if gemini models or top tier models accidentally get leaked or similar but I genuinely doubt of it as happening and there are many other ways that the AI bubble will pop.

XCSme

Funny how they didn't include Gemini 3.0 Pro in the bar chart comparison, considering that it seems to do the best in the table view.

jychang

Also, funny how they included GPT-5.0 and 5.1 but not 5.2... I'm pretty sure they ran the benchmarks for 5.0, then 5.1 came out, so they ran the benchmarks for 5.1... and then 5.2 came out and they threw their hands up in the air and said "fuck it".

XCSme

I didn't even notice that, I assumed it was the latest GPT version.

guluarte

Gemini is garbage and does it's own thing most of the time ignoring the instructions

null

[deleted]

esafak

The terminal bench scores look weak but nice otherwise. I hope once the benchmarks are saturated, companies can focus on shrinking the models. Until then, let the games continue.

theshrike79

z.ai models are crazy cheap. The one year lite plan is like 30€ (on sale though).

Complete no-brainer to get it as a backup with Crush. I've been using it for read-only analysis and implementing already planned tasks with pretty good results. It has a slight habit of expanding scope without being asked. Sometimes it's a good thing, sometimes it does useless work or messes things up a bit.

maxdo

I tried several times . It is no match in my personal experience with Claude models . There’s almost no place for second spot from my point of view . You are doing things for work each bug is hours of work, potentially lost customer etc . Why would you trust your money … just to back up ?

CuriouslyC

We're not gonna see significant model shrinkage until the money tap dries up. Between now and then, we'll see new benchmarks/evals that push the holes in model capabilities in cycles as they saturate each new round.

lanthissa

isn't gemini 3 flash already model shrinkage that does well in coding?

Imustaskforhelp

How much billion parameter model is gemini 3 flash, I can't seem to find info about it online.

hedgehog

Smaller open-weights models are also improving noticeably (like Qwen3 Coder 30B), the improvements are happening at all sizes.

bigyabai

It's a good model, for what it is. Z.ai's big business prop is that you can get Claude Code with their GLM models at much lower prices than what Anthropic charges. This model is going to be great for that agentic coding application.

maxdo

… and wake up every night because you saved a few dollars , there are bugs and they are due to this decision?

Imustaskforhelp

well I feel like all models are converging and maybe claude is good but only time will tell as gemini flash and GLM put pressure on claude/anthropic models

People (here) are definitely comparing it to sonnet so if you take this stance of saving a few dollars, I am sure that you must be having the same opinion of using opus model and nobody should use sonnet too

Personally I am interested in open source models because they would be something which would have genuine value and competition after the bubble bursts

gigatexal

Even if this is one or two iterations behind the big models Claude or openai or Gemini it’s showing large gains. Here’s hoping this gets even better and better and I can run this locally and also that it doesn’t melt my PC.

Imustaskforhelp

Although one would hope they can run it locally (which I hope so too but I doubt that with the increase of ram prices, I feel like its possible around 2027-2028). but Even if in the meanwhile we can't, I am sure that competition in general (on places like Openrouter and others) would give a meaningful way to cheapen the prices overall even further than the monopolistic ways of claude (let's say).

It does feel like these models are only behind 6 months tho as many like to say and for some things its 100% reasonable to use it and for some others not so much.

larodi

From my limited exposure to these models, they seem very very very promising.

maxdo

Funny enough they excluded 4.5 opus :)

cmrdporcupine

Running it in Crush right now and so far fairly impressed. It seems roughly in the same zone as Sonnet, but not as good as Opus or GPT 5.2.

null

[deleted]

observationist

Grok 4 Heavy wasn't considered in comparisons. Grok meets or exceeds the same benchmarks that Gemini 3 excels at, saturating mmlu, scoring highest on many of the coding specific benchmarks. Overall better than Claude 4.5, in my experience, not just with the benchmarks.

Benchmarks aren't everything, but if you're going to contrast performance against a selection of top models, then pick the top models? I've seen a handful of companies do this, including big labs, where they conveniently leave out significant competitors, and it comes across as insecure and petty.

Claude has better tooling and UX. xAI isn't nearly as focused on the app and the ecosystem of tools around it and so on, so a lot of things end up more or less an afterthought, with nearly all the focus going toward the AI development.

$300/month is a lot, and it's not as fast as other models, so it should be easy to sell GLM as almost as good as the very expensive, slow, Grok Heavy, or so on.

GLM has 128k, grok 4 heavy 256k, etc.

Nitpicking aside, the fact that they've got an open model that is just a smidge less capable than the multibillion dollar state of the art models is fantastic. Should hopefully see GLM 4.7 showing up on the private hosting platforms before long. We're still a year or two from consumer gear starting to get enough memory and power to handle the big models. Prosumer mac rigs can get up there, quantized, but quantized performance is rickety at best, and at that point you look at the costs of self hosting vs private hosts vs $200/$300 a month (+ continual upgrades)

Frontier labs only have a few years left where they can continue to charge a pile for the flagship heavyweight models, I don't think most people will be willing to pay $300 for a 5 or 10% boost over what they can run locally.

kristianp

Perhaps people are steering clear of grok due to its extremist political training.

lame-robot-hoax

Grok, in my experience, is extremely prone to hallucinations when not used for coding. It will readily claim to have access to internal Slack channels at companies, it will hallucinate scientific papers that do not exist, etc. to back its claims.

I don’t know if the hallucinations extend to code, but it makes me unwilling to consider using it.

HN

GLM-4.7: Advancing the Coding Capability

GLM-4.7: Advancing the Coding Capability