
Apple just released a weirdly interesting coding language model

vessenes

Short version: A Qwen-2.5 7b model that has been turned into a diffusion model.

A couple of notable things: first, that you can do this at all (left-to-right model -> out-of-order diffusion via finetuning), which is really interesting. Second, the final version beats the original by a small margin on some benchmarks. Third, it's in the ballpark of Gemini Diffusion, although not competitive, which is to be expected for any 7B-parameter model.

A diffusion model comes with a lot of benefits in terms of parallelization and therefore speed; to my mind the architecture is a better fit for coding than strict left to right generation.
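Roughly where the parallelism comes from, as a toy sketch (the predict() below is a dummy stand-in for a real model, and this is not DiffuCoder's actual decoding loop): an autoregressive decoder needs one sequential model call per token, while a mask-based diffusion decoder can commit several positions per denoising step.

    import random

    VOCAB = ["def", "add", "(", "a", ",", "b", ")", ":", "return", "a", "+", "b"]
    MASK = "<mask>"

    def predict(context, position):
        # Dummy model: just look up the "true" token. A real model would score
        # the whole vocabulary for this position given the current context.
        return VOCAB[position]

    def autoregressive_decode(length):
        # Strictly left to right: `length` sequential model calls.
        out = []
        for i in range(length):
            out.append(predict(out, i))
        return out, length  # tokens, sequential steps

    def diffusion_decode(length, tokens_per_step=4):
        # Start fully masked, then unmask several positions per step, in any
        # order. Sequential depth is roughly length / tokens_per_step, and the
        # predictions within one step are independent, so they can be batched.
        out = [MASK] * length
        steps = 0
        while MASK in out:
            masked = [i for i, t in enumerate(out) if t == MASK]
            for i in random.sample(masked, min(tokens_per_step, len(masked))):
                out[i] = predict(out, i)
            steps += 1
        return out, steps

    print(autoregressive_decode(len(VOCAB)))  # 12 sequential steps
    print(diffusion_decode(len(VOCAB)))       # ~3 sequential steps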

Overall, interesting. At some point these local models will get good enough for ‘real work’ and they will be slotted in at API providers rapidly. Apple’s game is on-device; I think we’ll see descendants of these start shipping with Xcode in the next year as just part of the coding experience.

baobun

Without having tried it, what keeps surprising me is how apparently widely different architectures (and, in other cases, training data) lead to very similar outcomes. I'd expect results to vary a lot more.

IMTDb

I would expect a lot of attempts to fail, and those tend not to be published, or gather less attention. So if we have reached a local optimum, any technique that gets close to the current benchmarks is worth publishing as soon as results reach that point. All the ones that are too distant are discarded. In the end, all the papers you see are close to the current status quo.

It's possible that some of these new architectures / optimizations would allow us to go beyond the current benchmark scores, but probably with more training data, and money. But to get money you need to show results, which is what you see today. Scaling remains king; maybe one of these techniques is 2025's "attention" paper, but even that one needed a lot of scaling to go from the 2017 version to ChatGPT.

viraptor

It doesn't look like it got pushed that much unfortunately. The article says they only added 20k examples to fine tune at the end, but maybe the ceiling is much higher for diffusion?

But yeah, RWKV also ends up in a similar performance area with similar sizes - I wish someone started using it at scale finally...

miroljub

When we look at the small models suitable for running locally, by far the best programming model is DeepSeek-R1-0528-Qwen3-8B. It is quite comparable in real world usage even to much bigger models.

jeswin

> to my mind the architecture is a better fit for coding

We have to see if it produces better results. Humans have a planning phase, followed by a part-by-part implementation phase. This is reasonably well emulated by plan/architect + codegen tools.

dboreham

It's delusional to think that most software projects can be planned in advance beyond "there will be a beginning, a middle, and an end". People do it, but their efforts are in my experience generally ignored once implementation gets underway.

koakuma-chan

> At some point these local models will get good enough for ‘real work’

Are these small models good enough for anything but autocomplete?

MangoToupe

Given that's 99% of my usage of it, that alone would make me quite happy.

_heimdall

Isn't that all they're designed for?

They predict more than just the second half of a word you are typing, but at the end of the day they're still just predicting what a human would have typed.

koakuma-chan

I'm disappointed because I don't use autocomplete.

iwontberude

I think Apple will ultimately destroy the data center, and I hope they succeed.

lxgr

Maybe for compute, but not for storage.

Why can’t I backup an iOS device to a local NAS in the way I can use Time Machine, for example? (Rhetorical question; the answer is obviously that they want to sell more iCloud storage for that all-important services revenue).


hiatus

Synology supports exactly that, and I'm sure they're not the only one.

nxobject

Shades of 1980s Apple v. Big Blue. I can't wait for the rehash of the "1984" ad.

andsoitis

> The result is faster code generation, at a performance that rivals top open-source coding models

So even though it is faster (than what?) it still doesn’t beat top models?

ellisv

I often write out of order so it’s interesting to me that we have a model that can do so as well.

quaintdev

Jetbrains ships ~100MB per-language models with their IDEs that can autocomplete single lines. It's good, but I think we can do better for local code autocomplete. I hope Apple succeeds in their on-device AI attempts.

infecto

Are any of the Jetbrains offerings even competitive? I jumped ship from PyCharm and have tried their AI offerings a few times since release, but was always amazed at how far behind the competition they were.

crappybird

Junie is so amazing. It burns through credits like hellfire, but I have only seen claude-code and opencode coming anywhere close to it.

Did you use the normal Jetbrains AI assistant, or was it junie?

nsm

I've had good experiences with Junie and AI assistant at their bread and butter languages of Java and Kotlin. I haven't tried it in anger though.

skylerwiernik

Here's the paper that they wrote: https://arxiv.org/pdf/2506.20639

It's notable that this was an intern project.

supriyo-biswas

Either this or https://huggingface.co/apple/DiffuCoder-7B-cpGRPO should be replaced with the current article.
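If you want to poke at the weights, they load through the standard Hugging Face transformers API; the repo ships custom modeling code, so trust_remote_code is needed, and the actual generation call is repo-specific (check the model card rather than assuming plain .generate() works). A minimal loading sketch:

    # Minimal sketch; assumes `pip install torch transformers` and enough
    # RAM/VRAM for a 7B model (~15 GB in bf16). Generation goes through the
    # repo's custom code, so see the model card for the exact sampling call.
    import torch
    from transformers import AutoModel, AutoTokenizer

    model_id = "apple/DiffuCoder-7B-cpGRPO"
    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModel.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,
        trust_remote_code=True,
    ).eval()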

runeblaze

I mean tbh industry research labs pump out a lot of good research due to them being intern projects (as in you have an army of passionate interns)

jbellis

Looks like this is targeted more at "better autocomplete" than "powers your next agent." Makes sense given Apple's interest in the on-device experience.

vintagedave

Does diffusion allow for 'size editing'? Unsure how to ask this, or if this (most likely) reveals a fundamental misunderstanding on my part, but: for an image, the size is set (say, 256x256). For text, if each token were a pixel, it's very small. The article image showed text colour-coded by generation order. What if it needed to, say, insert another line for the rest of a comment sentence? How would it even know the size upfront, the way an image size is known?

mjp

Yes, block diffusion for example generates fixed-size blocks of text, but can dynamically change the number of blocks.
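As a toy sketch of the idea (dummy denoiser, not any particular paper's algorithm): each block has a fixed size, but blocks keep getting appended until an end-of-sequence token shows up, so the total length isn't fixed up front.

    BLOCK_SIZE = 8
    EOS = "<eos>"

    def denoise_block(prefix, block_size):
        # Stand-in for a real denoiser, which would iteratively unmask
        # `block_size` positions conditioned on everything in `prefix`.
        tokens = [f"tok{len(prefix) + i}" for i in range(block_size)]
        if len(prefix) + block_size >= 24:  # pretend the model decides to stop here
            tokens[-1] = EOS
        return tokens

    def generate():
        out = []
        while EOS not in out:
            out.extend(denoise_block(out, BLOCK_SIZE))  # one fixed-size block at a time
        return out[: out.index(EOS) + 1]

    print(generate())  # variable total length, built from fixed-size blocks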

The Zed team recently posted a pretty good intro to diffusion models for text: https://www.youtube.com/watch?v=oot4O9wMohw

amelius

Points I thought were interesting:

> Apple’s model is built on top of Qwen2.5‑7B, an open-source foundation model from Alibaba. Alibaba first fine-tuned that model for better code generation (as Qwen2.5‑Coder‑7B), then Apple took it and made its own adjustments.

> it still doesn’t quite reach the level of GPT-4 or Gemini Diffusion.

WillAdams

Articles such as this really make me wish that the current generation of LLMs were more often described as workable implementations of "the infinite monkey theorem" --- the descriptions/images in this article are especially worth showing to a person to whom one is trying to describe how an AI model "creates" an image.
