Developers are choosing older AI models, and the data explains why
34 comments · October 29, 2025
KronisLV
For development use cases, I switched to Sonnet 4.5 and haven't looked back. Sure, sometimes I also use GPT-5 (and mini) and Gemini 2.5 Pro (and Flash), and Cerebras Code just switched from Qwen3 Coder to GLM 4.6, so those as well. But in general the frontier models are pretty good for development, and I wouldn't have much reason to use something like Sonnet 4 or 3.7.
JanSt
I have canceled my Claude Max subscription because Sonnet 4.5 is just too unreliable. For the rest of the month I'm using Opus 4.1, which is much better but seems to have much lower usage limits than before Sonnet 4.5 was released. When I hit the Opus 4.1 limits I use Codex. I will probably go through with the Codex Pro subscription.
ashirviskas
Just one week of data right after the release, when it is already one month later?
This data is basically meaningless; show us the latest stats.
xiphias2
Even for non-developer use cases, o3 is a much better model for me than GPT-5 on any setting.
30 seconds to 1 minute is about as long as I'm patient enough to wait, since that's the time I spend writing the question anyway.
Faster models just make too many mistakes / don't understand the question.
tifa2up
We tried GPT-5 for a RAG use case, and found that it performs worse than 4.1. We reverted and didn't look back.
sigmoid10
4.1 is such an amazing model in so many ways. It's still my No. 1 choice for many automation tasks. Even the mini version works quite well, and it has the same massive context window (nearly 8x GPT-5's). Definitely the best non-reasoning model out there for real-world tasks.
teekert
So… you did look back, and then didn't look forward anymore… sorry, couldn't resist.
l5870uoo9y
To those who complain about GPT-5 being slow: I recently migrated https://app.sqlai.ai and found that setting service_tier = "priority" makes it reason about twice as fast.
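For context, a minimal sketch of what that looks like with the OpenAI Python SDK; the model name and prompt are illustrative, and note that priority processing is billed at a higher rate:

    # Minimal sketch, assuming the openai Python SDK and an OPENAI_API_KEY
    # in the environment; model name and prompt are illustrative.
    from openai import OpenAI

    client = OpenAI()

    response = client.chat.completions.create(
        model="gpt-5",
        service_tier="priority",  # lower latency, billed at a higher rate
        messages=[{"role": "user", "content": "Generate a SQL query for ..."}],
    )
    print(response.choices[0].message.content)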
s1mplicissimus
Seems to completely ignore usage of local/free models, as well as anything but Sonnet/ChatGPT. So my confidence in the good faith of the author is... rather limited.
nicce
Most people can't afford the GPUs for local models if you want to get close to cloud capabilities.
EagnaIonat
The more recent LLMs work fine on an M1 Mac. Can't speak for Windows/Linux.
There was even a recent release of Granite 4 that runs on a Raspberry Pi.
https://github.com/Jewelzufo/granitepi-4-nano
For my local work I use Ollama (M4 Max, 128GB):
- gpt-oss, 20b or 120b depending on the complexity of the use case.
- granite4 for speed and lower complexity (around the same as gpt-oss 20b).
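If you haven't tried Ollama, a minimal sketch of calling a local model from Python, assuming `ollama serve` is running and the model has been pulled (e.g. `ollama pull gpt-oss:20b`):

    # Minimal sketch using the ollama Python client; the model name is an example.
    import ollama

    response = ollama.chat(
        model="gpt-oss:20b",
        messages=[{"role": "user", "content": "Explain what a Mixture of Experts model is."}],
    )
    print(response["message"]["content"])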
rhdunn
A 4090 has 24GB of VRAM, allowing you to run a 22B model entirely in memory at FP8, or a 24B model at Q6_K (~19GB).
A 5090 has 32GB of VRAM, allowing you to run a 32B model in memory at Q6_K.
You can run larger models by splitting the layers between VRAM and system RAM. That is slower, but still viable.
This means you can run the Qwen3-Coder-30B-A3B model locally on a 4090 or 5090. That model is a Mixture of Experts model with only 3B active parameters per token, so the compute demands are modest, and with layer offloading an older card like a 3090 can also run it.
The Qwen3-Coder-480B-A35B model could in principle be run the same way, keeping as many layers as possible in VRAM and the rest in (a lot of) system RAM.
Yes, it will be slower than running it in the cloud. But you can get a long way with a high-end gaming rig.
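A minimal sketch of that VRAM/RAM split with llama-cpp-python; the GGUF filename and layer count are assumptions you would tune to your own card:

    # Sketch only: offload some transformer layers to the GPU, keep the rest
    # in system RAM. The file name is hypothetical; n_gpu_layers depends on VRAM.
    from llama_cpp import Llama

    llm = Llama(
        model_path="qwen3-coder-30b-a3b-q6_k.gguf",  # hypothetical local GGUF
        n_gpu_layers=30,  # layers held in VRAM; the remainder stays in RAM
        n_ctx=8192,       # context length; longer contexts also cost VRAM
    )

    out = llm("Write a Python function that parses an nginx log line.", max_tokens=256)
    print(out["choices"][0]["text"])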
ashirviskas
How much context do you get with 2GB of leftover VRAM on an Nvidia GPU?
Foobar8568
Yes, but they are really much less capable than Claude Code or Codex. I really cried with the 20-25GB models (30B Qwen, Devstral, etc.). They just don't hold a candle to them; I didn't think the gap was this large, or maybe Claude Code and GPT perform much better than I imagined.
jen729w
Honestly though, how many people reading this do you think have that setup, vs. the 85% of us on an MBx?
> The Qwen3-Coder-480B-A35B model could also be run on a 4090 or 5090 by splitting the active 35B parameters across VRAM and RAM.
Reminds me of running Doom, when I had to hack config.sys to free up 640KB of memory.
Less than 0.1% of the people reading this are doing that. Me, I gave $20 to some cloud service and I can do whatever the hell I want from this M1 MBA in a hotel room in Japan.
s1mplicissimus
Most people I know can't afford to leak business insider information to 3rd party SaaS providers, so it's unfortunately not really an option.
Tepix
Isn't the point that you don't need SOTA capabilities all the time?
NumerousProcess
Augment doesn't support local models, or anything other than Claude/GPT.
pistoriusp
Do you use a local/free model?
s1mplicissimus
Yes; for the little it's good for, I'm currently using LM Studio with various models.
busymom0
I am currently using a local model, qwen3:8b, running on a 2020 Mac mini (2018 Intel chip) to classify news headlines, and it's working decently well for my task. Each headline takes about 2-3 seconds but is pretty accurate. It uses about 5.3GB of RAM.
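For anyone curious, a sketch of what that kind of headline classification can look like against Ollama's HTTP API; the prompt and categories here are illustrative, not the actual setup described above:

    # Illustrative only: classify a headline with a local qwen3:8b served by
    # Ollama's HTTP API on its default port.
    import requests

    def classify(headline: str) -> str:
        prompt = (
            "Classify this news headline as one of: politics, business, tech, sports.\n"
            f"Headline: {headline}\n"
            "Answer with a single word."
        )
        r = requests.post(
            "http://localhost:11434/api/generate",
            json={"model": "qwen3:8b", "prompt": prompt, "stream": False},
            timeout=60,
        )
        return r.json()["response"].strip()

    print(classify("Central bank holds interest rates steady"))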
darkwater
Can you expand a bit on your software setup? I thought running local models required expensive GPUs or the latest Apple Silicon with unified memory. I have an Intel 11th-gen home server which I would like to use to run some local model for tinkering, if possible.
blitzar
GPT-5 usage is 20% higher on days that start with "S"
Nevertheless, 7 data points do not a trend make. The daily variation is more than I would have expected, but it could also come down to which day of the week the pizza party or the weekly scrum meeting falls on at a few of their customers' workplaces.
Manfred
It could be an interesting data point, but without correcting for absolute usage figures and their customer mix, it's hard to make general statements.
jonplackett
Isn't this obvious? When you have a task you think is hard, you give it to a cleverer model. When a task is straightforward, you give it to an older one.
PeterStuer
Not really. Most developers would prefer one model that does everything best. That is the easiest: set it and forget it, no manual decision required.
What is unclear from the presentation is whether they do this or not. Do teams that use Sonnet 4.5 just always use it, and teams on Sonnet 4.0 likewise? Or do individuals decide which model to use on a per-task basis?
Personally I tend to default to just one, and only go to an alternative if it gets stuck or doesn't get me what I want.
hn_throw2025
Not sure why you were downvoted... I think you are correct.
As evidenced by furious posters on r/cursor, who send every prompt to super-opus-thinking-max+++ and are astonished when they have blown their monthly request allowance in about a day.
If I need another pair of (artificial) eyes on a difficult debugging problem, I’ll occasionally use a premium model sparingly. For chore tasks or UI layout tweaks, I’ll use something more economical (like grok-4-fast or claude-4.5-haiku - not old models but much cheaper).
rcarmo
I think this is somewhat disingenuous since not everyone uses the latest thing, and people tend to stick to “what works” for them.
Models are picky enough about prompting styles that changing to a new model every week/month becomes an added chunk of cognitive overhead, testing, and experimentation. Even in developer tooling there have been minor grating changes in API invocations and in the use of parameters like temperature (I have a fairly low-level wrapper for OpenAI, and I had to tweak the JSON handling for GPT-5).
Also, there are just too many variations in API endpoints, providers, etc. We don't really have a uniform standard. Since I don't use "just" OpenAI, every single tool I try out requires me to jump through a bunch of hoops to grab a new API key, specify an endpoint, etc., and it just gets worse if you use a non-mainstream AI endpoint.
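To illustrate the parameter drift (a hypothetical sketch, not the wrapper described above): newer reasoning models reportedly reject sampling knobs like temperature that older chat models accept, so a wrapper ends up branching per model family.

    # Hypothetical sketch of per-model branching; the model-family prefix check
    # is an assumption, not an official SDK capability flag.
    from openai import OpenAI

    client = OpenAI()

    def complete(model: str, prompt: str) -> str:
        kwargs = {
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
        }
        # Reasoning models (o-series, GPT-5) reportedly reject non-default
        # temperature values, so only pass it to older chat models.
        if not model.startswith(("o1", "o3", "o4", "gpt-5")):
            kwargs["temperature"] = 0.2
        response = client.chat.completions.create(**kwargs)
        return response.choices[0].message.content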
rafaelmn
> I think this is somewhat disingenuous since not everyone uses the latest thing, and people tend to stick to “what works” for them.
They say that the number of users on Claude 4.5 spiked and that a significant number of users then reverted to 4.0, with that trend going up, and they are talking about their own usage metrics. So I don't get how your comment is relevant to the article?
dotancohen
His comment is relevant to the headline. You must be new here.
gptfiveslow
GPT-5 is HELLISHLY slow. That's all there is to it.
It loves doing a whole bunch of reasoning steps and proclaiming what a very good job it did clearing up its own todo steps and all that mumbo jumbo, but at the end of the day, I only asked it for a small piece of information about nginx try_files that even GPT-3 could answer instantly.
Maybe before you make reasoning models that go on funny little sidequests where they multiply numbers by 0 a couple of times, make them good at identifying the length of a task. Until then, I'll ask little bro and escalate only when necessity arrives. And if it ends up gathering dust, well... yeah.
Tepix
The article(§) talks about going from Sonnet 4.5 back to Sonnet 4.0.
(§) You do know that it's a hyperlink, right? /s
To the authors of the site: please know that your current "Cookiebot by Usercentrics" version is old and pretty much illegal. You shouldn't need to click 5 times to "Reject all" when accepting all is one click. Newer versions have a "Deny" button.