GLM-5.2 is a step change for open agents

(interconnects.ai)

104 points | by vantareed 1 day ago

11 comments

  • jerojero 1 day ago
    Open weight models from Chinese labs tend to be significantly cheaper.

    I think they're absolutely needed. I can't afford 200 USD a month for personal use of coding AI, and I don't think such prices are reasonable for most of the world economy anyway. Not to mention US firms might be giving their employees a lot more than that.

    It's increasingly feeling, to me, that there's a gap building up between haves and have-nots. But then we get news of these open weight models that are reasonably priced for inference and have reasonable capabilities. Yes, they take maybe 6-9 months to get there, but tbh, that's not a bad trade-off at all.

    • ImaCake 32 minutes ago
      Significantly cheaper than comparable models if you are using OpenRouter [0]. Just yesterday I spent roughly 13 cents centering some divs using DeepSeek in a personal project. It would have been north of $1 to do that with a US frontier model.

      0. https://openrouter.ai/compare/z-ai/glm-5.2/anthropic/claude-...
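To make the per-task comparison above concrete: API usage is typically billed per million input and output tokens, so a task's cost is a weighted sum of the two. This is a minimal sketch assuming that pricing model; `task_cost` and `cost_ratio` are hypothetical helpers, and any prices you pass in should come from the provider's own listing rather than from this example.

```python
def task_cost(input_tokens: int, output_tokens: int,
              usd_per_m_input: float, usd_per_m_output: float) -> float:
    """Cost in USD of one task, assuming per-million-token pricing."""
    return (input_tokens * usd_per_m_input
            + output_tokens * usd_per_m_output) / 1_000_000


def cost_ratio(cost_a: float, cost_b: float) -> float:
    """How many times more expensive task A was than task B."""
    return cost_a / cost_b


# Figures from the comment: roughly $0.13 vs. "north of $1" for the same task.
print(round(cost_ratio(1.00, 0.13), 1))  # 7.7
```

Note that the output term usually dominates for reasoning models, so a model with a lower per-token price can still cost more per task if it emits many more tokens.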

    • arikrahman 32 minutes ago
      Someone else on this forum put it well: the U.S. is trying to achieve AGI at all costs, while Chinese models are seeking widespread adoption.
      • azinman2 13 minutes ago
        I don't think Anthropic/OpenAI/Google aren't also seeking widespread adoption. In fact, they already have the market share.
    • tacomagick 17 hours ago
      DeepSeek through their own API has saved me tons of tokens, honestly. Even though it is not as smart as Kimi or Claude, the barrier to entry is very low: a $2 top-up and pay-as-you-go, compared to Claude's subscription or Kimi's $20 top-up.
      • praveer13 2 hours ago
        For personal use, I’m considering using the frontier models from OpenAI or Anthropic to create a plan (research, brainstorming, etc.) with enough detail for cheap models (GLM, DeepSeek, etc.) to follow, via OpenRouter. I'll monitor how cheap and effective that turns out to be.
        • ImaCake 37 minutes ago
          You should try out the cheaper models first. I find DeepSeek V4 models pretty comparable to Sonnet 4.6 but at a fraction of the cost. You might find you just don't need to use the American models at all.
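The plan-then-execute split described in this sub-thread can be sketched as a two-stage pipeline: one call to an expensive planner, then one call to a cheap executor per plan step. This is a minimal sketch, assuming a generic `complete(model, prompt)` function that you would wire to whatever API you use (OpenRouter exposes an OpenAI-compatible endpoint, for example); that signature, the prompts, and the model names in the usage example are hypothetical placeholders.

```python
from typing import Callable, List

# Hypothetical client signature: (model_id, prompt) -> completion text.
Complete = Callable[[str, str], str]


def plan_then_execute(task: str, complete: Complete,
                      planner_model: str, executor_model: str) -> List[str]:
    """Ask an expensive model for a plan once, then have a cheap model
    carry out each step of that plan."""
    plan = complete(
        planner_model,
        "Research this task and write a numbered plan, one step per line, "
        "detailed enough for a weaker model to follow:\n" + task,
    )
    steps = [line.strip() for line in plan.splitlines() if line.strip()]
    results = []
    for step in steps:
        # The full plan is passed along so each step keeps its context.
        results.append(complete(
            executor_model,
            f"Overall plan:\n{plan}\n\nDo only this step:\n{step}",
        ))
    return results


# Usage with a fake client standing in for real API calls.
calls: List[str] = []


def fake_complete(model: str, prompt: str) -> str:
    calls.append(model)
    if model == "frontier-planner":
        return "1. Write the function\n2. Add tests\n"
    return "done"


print(plan_then_execute("Add a CLI flag", fake_complete,
                        "frontier-planner", "cheap-executor"))  # ['done', 'done']
```

Because the planner is called once and the executor once per step, the expensive model's share of the bill stays fixed as the plan grows, which is the point of splitting the work this way.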
    • matheusmoreira 5 minutes ago
      > It's increasingly feeling, to me, that theres a gap building up between haves and have nots.

      People speak of a permanent underclass.

      https://www.nytimes.com/2026/04/30/opinion/ai-labor-work-for...

    • Fr0styMatt88 45 minutes ago
      If we can agree that the AI model is at least as capable as a junior engineer or new contractor, how’s that different to saying “software engineering isn’t worth $200 a month”?

      Has a very race-to-the-bottom feel to it.

      Though in the grand scheme of it, $200/mo probably isn’t the real price either. Also looking at it not just in a vacuum - paying for a product that can change what you get from under you doesn’t seem great anyway.

      At least with a locally-hosted model you know what you’re getting.

      • matheusmoreira 1 minute ago
        Yeah. The real future is running these models at home. Opus level inference on our own hardware would be a dream come true.
    • throwaway-blaze 24 minutes ago
      Just don't ask it to tell you the events of June 4, 1989.
    • ttoinou 1 day ago
      200 is much less than the value you’re supposed to get out of it. If it isn't, then yeah, go ahead and use cheaper models with worse quality.
      • martinjc 2 hours ago
        Are you aware of how much purchasing power 200 dollars has in China, Brazil, Thailand, or India? This is an extremely arrogant take.
        • nwienert 17 minutes ago
          I’ve hired many Asian developers at anywhere from 1-4k a month.

          I get a lot more out of a 200/mo subscription now in a week than I did from them in a month.

          Now, obviously, in today’s world they’d be using a 200/mo subscription themselves. But it’s not like money is nothing; software development doesn’t scale down below 1k/mo for anyone competent, even in the poorest areas.

      • Dayshine 1 day ago
        I'm not sure how I'm supposed to get $200 of value out of personal use!
        • LPisGood 2 hours ago
          Note that 200 dollars of value is different than 200 dollars of profit.
        • devmor 2 hours ago
          I personally don’t find it that useful for most tasks, but if say, you get paid $50/hr for your work and it saves you more than 4 hours of work in a month, there you go.
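The break-even argument above is a single division: the subscription price over the hourly value of your time gives the number of hours it has to save each month. A minimal sketch, assuming every saved hour is worth the full hourly rate (the comment's own simplification); `break_even_hours` is a hypothetical helper.

```python
def break_even_hours(monthly_price_usd: float, hourly_rate_usd: float) -> float:
    """Hours of work a subscription must save each month to pay for itself."""
    if hourly_rate_usd <= 0:
        raise ValueError("hourly rate must be positive")
    return monthly_price_usd / hourly_rate_usd


print(break_even_hours(200, 50))            # 4.0
print(round(break_even_hours(200, 28), 2))  # 7.14
```

The second line applies the same formula to the $28/h median wage quoted further down the thread, which works out to just over 7 hours a month.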
        • holoduke 2 hours ago
          Here, most of my colleagues have $200+ rates. It's really a no-brainer. But sure, in South America or some Asian countries, maybe it's a different story. Still, most devs need it anyway, even in the poorer regions.
          • folkrav 2 minutes ago
            Most of the world's developers, even in not-poor regions, make significantly less than what your colleagues charge.
          • HDBaseT 1 hour ago
            $200/h is on the extreme end and I would argue most people here aren't anywhere close to that.

            The median hourly wage in the US is $28/h, so $200 equates to just over 7 hours: roughly a full day of work a month for the average person to use Claude with reasonable limits.

            Yes, the people on $28/h may not be the software development types, so their income might not be as high, but these are the people who would probably be vibe coding the most since they aren't day to day programmers!

            • ray_kay777 30 minutes ago
              I suspect the reply above is referring to charge out rates rather than wages.
      • uberex 2 hours ago
        Unless that value is $200 cash in hand it will be hard to afford it for people who just don't have $200.
      • margalabargala 1 hour ago
        Last time you bought a computer, did you buy the absolute fastest best CPU available?
        • girvo 1 hour ago
          Yes, but that was because I could see the writing on the wall with respect to hardware prices being cooked by AI demand, so I built the best computer possible at the time, knowing it'd probably need to last me the next 5+ years.

          So, not really comparable. I use Step 3.7 Flash locally; models are good enough for so many coding tasks, even at the lower end! (Though I note that calling a 200B model "lower end" is kind of amusing.)

      • smrtinsert 1 hour ago
        I've actually come to believe the overwhelming majority of use cases require nowhere near frontier quality, so there's that. Much faster execution is just a bonus on top of the much-reduced cost.
  • guybedo 5 minutes ago
    GLM-5.2 has been a step change in how fast I can burn through tokens.

    I subscribed to their max plan to try it out. It counted 700M tokens against me and drained my weekly quota in under 2 days.

    The quota reset less than 24h ago and I'm already at >60% of my weekly quota.

    For reference, the kind of work I did would have used somewhere between 3% and 5% of Codex max or Claude max.

    The model is good, the plan is a scam

    • jubilanti 1 minute ago
      > The model is good, the plan is a scam

      If it is needing to generate that many tokens to do the same tasks, then it probably has higher inference costs. So (for you) the model is bad, the plan is the same plan.
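The quota numbers in this sub-thread can be turned into a rough projection. This is a minimal sketch, assuming usage grows linearly over time (a simplification) and working in fractions of a quota, since the plan's absolute token allowance is not stated; `days_until_exhausted` and `relative_capacity` are hypothetical helpers.

```python
def days_until_exhausted(fraction_used: float, days_elapsed: float) -> float:
    """Days to burn a full quota at the observed, assumed-constant rate."""
    daily_rate = fraction_used / days_elapsed
    return 1.0 / daily_rate


def relative_capacity(fraction_on_a: float, fraction_on_b: float) -> float:
    """How many times more of the same work plan B fits in one quota period,
    given the fraction of each plan's quota that the work consumed."""
    return fraction_on_a / fraction_on_b


# ">60% within 24h": 60% in one day projects to roughly 1.7 days per quota.
print(round(days_until_exhausted(0.60, 1.0), 2))  # 1.67
# Work that used 100% of the GLM plan but 3-5% of a Codex/Claude max plan.
print(round(relative_capacity(1.0, 0.05)), round(relative_capacity(1.0, 0.03)))  # 20 33
```

This compares quota allowances, not per-token prices; as the reply above points out, needing more tokens for the same task also means higher inference cost for the provider, so the two effects are hard to separate from the outside.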

  • fraywing 16 minutes ago
    It feels like the gap is closing from an intelligence perspective. Or at least doing some kind of log flattening.

    Been playing with GLM 5.2 in different contexts. It's less good if you don't max out thinking, but at xhigh it's been able to solve most problems I was throwing at Opus in about the same amount of time (via OpenRouter).

    Wild time to be alive.

  • dools 0 minutes ago
    Is z.ai

    Is 2 better than x.ai

  • christophilus 13 minutes ago
    I've been working with Deepseek V4 Flash (with opencode as the harness). It's been almost indistinguishable from Codex / Claude Code for me. I'm sure I'll run into problems when I get to a stickier ticket to tackle. But so far, it's been quite good, and I find it writes straightforward code.

    I do think the Chinese models are good enough for an 80/20 rule use case.

  • aunty_helen 1 hour ago
    I signed up to a z.ai max account, $144. Hardly been able to use it as it 429s on most requests. They’re also refusing to refund me.
    • guybedo 3 minutes ago
      Same here. Barely usable due to API connection issues.

      And when I can use it, it just drains the quota 5 times faster than Codex or Claude.

      Their plan is a scam

    • osti 37 minutes ago
      Even as a GLM z.ai fan, I wouldn't pay for their plans. They are just much worse value than GPT or Anthropic plans, in terms of both usage and capabilities.
    • sergiotapia 43 minutes ago
      My experience as well unfortunately :(
  • neosat 25 minutes ago
    I've been using GLM 5.2 recently (company hosted, for non-coding tasks) and it's been strong and reliable. There are areas where GPT 5.5 and Opus 4.x still feel marginally better, but only marginally. For most tasks, if GLM 5.2 is the only model I have to use, I'm productive and happy. This was not true before GLM 5.2. No doubt in my mind that the gap is closing quickly, and for most tasks that are not very specialized, open models will be usably on par with flagship closed models and have an edge once cost is factored in.

    For coding I still use 5.5 w/ Codex and prefer that to other models + harness combinations.

  • themgt 2 hours ago
    I just tested GLM 5.2 out via Z.ai in pi for a little one-off project that was already scoped. It actually did a relatively decent job starting out, and figured important things out from context.

    But the reasoning traces became increasingly hilarious, with it getting confused and going in loops, doubting itself. I began to feel almost sad, it was like listening to the internal monologue of someone with anxiety disorder.

    It made pretty good progress but wound up going in a lot of goofy loops and doing things a bit "off" from standards I'd hoped it would infer, and finally started going a bit nuts, "This is very confusing.", "OH WAIT", seemingly hallucinating a whole side-quest that didn't make sense and looking at making internal system changes to try to achieve its (now very confused) goal when I pulled the plug.

    Without seeing the reasoning traces from Claude/GPT it's hard to really know, but it definitely didn't feel like the same quality of reasoning, even if dogged persistence does wind up actually working eventually.

    • jauntywundrkind 2 hours ago
      I think the self-doubt might actually be a very crucial part of its capability. I often feel compelled to interrupt when I'm watching it think (which, thank the stars, it lets us do, unlike the big American models!!), but usually it makes the right pick!

      Being willing and able to reconsider seems very good. Going around and around, pulling in more thinking, integrating it: maybe that's why it's as good as it is.

      I want to emphasize again how excellent it is that we can see the thinking. I think this makes GLM so much better an experience for me. It gives me such insight into what is being considered, helps me see where things go wrong. It grounds me, gives me the notion of where the results come from. It was so jarring to switch to GPT and Opus and find that they won't discuss with me, won't reveal their thinking: that feels fundamentally unsafe, for me, for society, to have such a severe black box. I don't think it should be allowed, honestly.

      Many thanks to this recent submission, which is the first time I've seen anyone blog about this core difference: The text in Claude Code’s “Extended Thinking” output is not authentic. https://patrickmccanna.net/the-text-in-claude-codes-extended... https://news.ycombinator.com/item?id=48630535

      • wuhhh 1 hour ago
        Your post made me laugh because I experienced the same as you but the other way around. I switched from Claude to a multi model harness a couple of days ago and the first model I tried was GLM5.2.

        I gave it some simple code porting exercises and watched dumbfounded at the reasoning, which was more like the ravings of a lunatic - but lo and behold, after much confusion and a dizzying number of eureka moments the task was completed very successfully.

        I tried Kimi on a similar task, much faster, a little more reassuring somehow in its ramblings, also surprisingly good results.

        To be clear, I was surprised the results were good not because they’re not GPT or Claude, but because the line of reasoning was so bonkers. Coming from Claude, I was just not used to seeing this, but I’ll bet it’s just as nuts with the frontier models and we’re just not allowed to see it (I’m about to read the links you shared).

        Agree wholeheartedly that transparency is of grave importance.

        • rainmaking 1 hour ago
          Yeah isn't that thinking weird?

          Now I see the issue clearly! But wait... now I have the full picture! But wait... Found it!

          I gave up a few times because of it at first until I realized I just had to let GLM get on with it and what came out was great!

          But once it was outright endearing: on a challenging bug, it said "I have been very thorough." Then it escalated where to look and aced it. Built-in Confucian values.

          • wuhhh 44 minutes ago
            If there’s one thing I’ve learned these past couple of days, it’s to resist the temptation to jab the escape button and start waving my arms! I wonder how much of this cyclical self doubt / self congratulating I go through in my own thoughts without even realising it. If you could verbalise or articulate all the half thoughts, snatches of ideas, feelings and ruminations the human mind goes through on some tasks it might be even more bizarre (or could just be me)
  • timcobb 1 hour ago
    Can people share their GLM and open model setups in general, please? What provider do you use? Why do you trust it to serve full quality? What harness do you use? Why do you trust it not to have malware (most harnesses are TS apps)? I am just trying GLM 5.1 from Nvidia build in OpenCode and would love to hear how you all do it, thanks.
    • michimagdesign 1 hour ago
      Next to my Claude Pro plan, I have subbed to OpenCode Go. I find the OpenCode UX much better than the Claude Code CLI's. As for models, I started a few months ago with GLM 5.1; it was solid and could handle near Sonnet-level tasks. It weirdly sputtered out Chinese characters sometimes. Then I switched to Kimi K2.6, which is the Chinese model I have used the most until now. It used way too many reasoning tokens (improved in K2.7), but executed Claude-created plans reliably. Now I’m back with GLM 5.2 and it’s really solid (among other things, it’s good at design), and I get good usage with the $10 plan. The Claude models still have fewer hiccups, but the Chinese models are getting really close.
    • gandreani 51 minutes ago
      I use both the openai subscription and the opencode go subscription. I use the go subscription for my personal work and the openai subscription for my consulting work.

      The differences between the models are minimal, but I usually stick with gpt-5.4-mini, gpt-5.4, mimo-pro-2.5, and deepseek-v4-pro. The latter two come with way more usage than even 5.4-mini, so I tend to use them in personal projects for that reason.

      My harness is https://github.com/can1357/oh-my-pi. I trust it...enough. It updates very frequently, so as a safeguard I run it sandboxed with https://github.com/containers/bubblewrap so it can only access the project folder and some whitelisted config files.

      • timcobb 42 minutes ago
        Thanks. I was looking at OpenCode Go yesterday and I couldn't figure out whether the base price includes usage or whether you have to pay for usage on top. How does it work? It is very cheap.
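For readers who want to replicate the bubblewrap sandboxing described above, here is a minimal sketch of how such a command line might be assembled. It is an illustration under stated assumptions, not the commenter's actual configuration: it assumes a merged-/usr Linux distribution, keeps network access so the harness can reach its API, and `harness-cmd` plus the config path in the usage line are hypothetical placeholders. The flags used (`--unshare-all`, `--share-net`, `--die-with-parent`, `--ro-bind`, `--ro-bind-try`, `--bind`, `--symlink`, `--proc`, `--dev`, `--tmpfs`, `--chdir`) are standard bubblewrap options, but check `bwrap --help` on your version.

```python
import shlex
from pathlib import Path
from typing import List, Sequence


def bwrap_argv(project_dir: str, config_files: Sequence[str],
               command: Sequence[str]) -> List[str]:
    """Build a bubblewrap command line that exposes the project folder
    read-write and a few config files read-only; other host paths are
    simply not mounted inside the sandbox."""
    project = str(Path(project_dir).expanduser().resolve())
    argv = [
        "bwrap",
        "--unshare-all", "--share-net",  # new namespaces, but keep network for API calls
        "--die-with-parent",
        "--ro-bind", "/usr", "/usr",     # system binaries and libraries, read-only
        "--symlink", "usr/bin", "/bin",  # assumes a merged-/usr distro
        "--symlink", "usr/lib", "/lib",
        "--symlink", "usr/lib64", "/lib64",
        "--ro-bind-try", "/etc/resolv.conf", "/etc/resolv.conf",  # DNS
        "--ro-bind-try", "/etc/ssl", "/etc/ssl",                  # TLS certificates
        "--proc", "/proc",
        "--dev", "/dev",
        "--tmpfs", "/tmp",
        "--bind", project, project,      # the only writable host path
    ]
    for cfg in config_files:
        # Whitelisted config files, read-only; skipped if they don't exist.
        path = str(Path(cfg).expanduser().resolve())
        argv += ["--ro-bind-try", path, path]
    argv += ["--chdir", project]
    argv += list(command)
    return argv


# Hypothetical usage: print the command instead of running it.
print(shlex.join(bwrap_argv("~/code/myproj",
                            ["~/.config/harness/config.json"],
                            ["harness-cmd"])))
```

Passing the returned list to `subprocess.run` would launch the harness. Anything not bound (the rest of your home directory, SSH keys, other repositories) does not exist inside the sandbox, which is the property the comment relies on.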
    • smoe 1 hour ago
      For work, I mostly use Codex and some Claude. For personal use, I’ve started using Chinese models directly through their respective providers, mostly for automation tasks and experiments so far, either via the API directly or through the Pi harness.

      I do not trust any of them. Everything runs inside virtual machines, not just the sandboxes provided by the harnesses. I also do not run Claude or Codex directly on the host machine. Not just because of supply chain fears, but also because of how incredibly user hostile the VC funded companies are when it comes to installing random stuff on your machine.

    • rainmaking 1 hour ago
      GLM 5.2 coding plan- I'll post the agent as soon as I can! But opencode works and their own zcode is really good as well.
  • citizenpaul 2 hours ago
    I've been using glm5 since its release and still prefer it to glm5.1 and, so far, to glm5.2.

    Perhaps it is just my harness and workflow, but the older model still seems to work better. The token cost is also significantly lower. I rarely spend more than $20 a week with a $50 cap. That's not even half of Claude's ambiguous minimum $200-a-month plan.

    • rainmaking 40 minutes ago
      Now that's a tremendous pointer, I'm going to have to try that.

      Do you full on let GLM5 get stuff done on its own or is it more like a guided workflow? The former's what the point releases doubled down on and is also something that uses a lot of juice.

  • Balinares 1 day ago
    I can't help wondering what kind of models we'll see coming out of China once it gets its own chip fabs up and running. Right now it sounds like the US's export ban is not slowing them down a whole lot.
    • pianopatrick 2 minutes ago
      There does not seem to be a big penalty for going slow anyways. People seem to just switch on cost as soon as a model can do a task well enough. There do not seem to be strong network effects or vendor lock in.

      Seems to me that going slow is the better long term tactic. China can just let the USA pay the high R&D costs to figure out what works, then just copy what works.

    • briga 18 minutes ago
      With subsidization from the Chinese government, they will probably be equal to or better than the models here. I mean, have you looked at the author list of any given AI paper published within, say, the past 5 years? I wouldn't be surprised if half or more of AI researchers are from China.
    • ceejayoz 2 hours ago
      > Right now it sounds like the US's export ban is not slowing them down a whole lot.

      It may wind up being a massive boost to them in the long run, even.

      Necessity is the mother of invention.