
The surprise deprecation of GPT-4o for ChatGPT consumers

andy99

Edit to add: according to Sam Altman in the reddit AMA they un-deprecated it based on popular demand. https://old.reddit.com/r/ChatGPT/comments/1mkae1l/gpt5_ama_w...

I wonder how much of the '5 release was about cutting costs vs making it outwardly better. I'm speculating that one reason they'd deprecate older models is that 5 is materially cheaper to run.

Would have been better to just jack up the price on the others. For companies that extensively test the apps they're building (which should be everyone) swapping out a model is a lot of work.

corysama

The vibe I'm getting from the Reddit community is that 5 is much less "Let's have a nice conversation for hours and hours" and much more "Let's get you a curt, targeted answer quickly."

So, good for professionals who want to spend lots of money on AI to be more efficient at their jobs. And, bad for casuals who want to spend as little money as possible to use lots of datacenter time as their artificial buddy/therapist.

jelder

Well, good, because these things make bad friends and worse therapists.

monster_truck

The number of comments in the thread talking about 4o as if it were their best friend they shared all their secrets with is concerning. Lotta lonely folks out there.

hn_throwaway_99

Which is a bit frightening because a lot of the r/ChatGPT comments strike me as unhinged - it's like you would have thought that OpenAI murdered their puppy or something.

dsadfjasdf

Are all humans good friends and therapists?

moralestapia

I kind of agree with you as I wouldn't use LLMs for that.

But also, one cannot speak for everybody; if it's useful for someone in that context, why is that an issue?

resource_waste

Well, like, that's just your opinion, man.

And probably close to wrong if we are looking at the sheer scale of use.

There is a bit of reality denial among anti-AI people. I thought about why people don't adjust to this new reality. I know one of my friends was anti-AI and seems to continue to be because his reputation is a bit based on proving he is smart. Another because their job is at risk.

rpeden

I'm appalled by how dismissive and heartless many HN users seem toward non-professional users of ChatGPT.

I use the GPT models (along with Claude and Gemini) a ton for my work. And from this perspective, I appreciate GPT-5. It does a good job.

But I also used GPT-4o extensively for first-person non-fiction/adventure creation. Over time, 4o had come to be quite good at this. The forced upgrade to GPT-5 has, up to this point, been a massive reduction in quality for this use case.

GPT-5 just forgets or misunderstands things or mixes up details about characters that were provided a couple of messages prior, while 4o got these details right even when they hadn't been mentioned in dozens of messages.

I'm using it for fun, yes, but not as a buddy or therapist. Just as entertainment. I'm fine with paying more for this use if I need to. And I do - right now, I'm using `chatgpt-4o-latest` via LibreChat but it's a somewhat inferior experience to the ChatGPT web UI that has access to memory and previous chats.

Not the end of the world - but a little advance notice would have been nice so I'd have had some time to prepare and test alternatives.

hirvi74

> "Let's get you a curt, targeted answer quickly."

This probably why I am absolutely digging GPT-5 right now. It's a chatbot not a therapist, friend, nor a lover.

michaelbrave

I've seen quite a bit of this too, the other thing I'm seeing on reddit is I guess a lot of people really liked 4.5 for things like worldbuilding or other creative tasks, so a lot of them are upset as well.

corysama

There is certainly a market/hobby opportunity for "discount AI" for no-revenue creative tasks. A lot of r/LocalLLaMA/ is focused on that area and on squeezing the best results out of limited hardware. Local is great if you already have a 24 GB gaming GPU. But maybe there's an opportunity for renting out low-power GPUs for casual creative work. Or an opportunity for a RenderToken-like community of GPU sharing.

torginus

I mean, I'm quite sure it's going to be available via the API, and you can still do your worldbuilding if you're willing to go to places like OpenRouter.

oceanplexian

I don't see how people using these as a therapist really has any measurable impact compared to using them as agents. I'll spend a day coding with an LLM and between tool calls, passing context to the model, and iteration I'll blow through millions of tokens. I don't even think a normal person is capable of reading that much.

el_benhameen

I am all for "curt, targeted answers", but they need to be _correct_, which is my issue with GPT-5.

hn_throwaway_99

The GPT-5 API has a new parameter for verbosity of output. My guess is the default value of this parameter used in ChatGPT corresponds to a lower verbosity than previous models.
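A minimal sketch of what using that verbosity knob might look like. The `verbosity` setting for the GPT-5 API is real, but the exact payload shape and field placement here are assumptions; no request is actually sent, the code only builds and prints the JSON body.

```python
import json

def build_gpt5_request(prompt: str, verbosity: str = "low") -> dict:
    """Build a hypothetical GPT-5 request body with an explicit verbosity level."""
    if verbosity not in {"low", "medium", "high"}:
        raise ValueError(f"unsupported verbosity: {verbosity}")
    return {
        "model": "gpt-5",
        "input": prompt,
        # Assumed field placement: a text-options object carrying verbosity.
        "text": {"verbosity": verbosity},
    }

payload = build_gpt5_request("Summarize the 4o deprecation thread.")
print(json.dumps(payload, indent=2))
```

If ChatGPT defaults this to a lower value than the old models effectively had, that alone would explain the curter answers.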

alecsm

I had this feeling too.

I needed some help today and its messages were shorter but also detailed, without all the spare text that I usually don't even read.

waldrews

Doesn't look like they blew up the API use cases, just the consumer UI access. I wouldn't be surprised if they allow it again, hidden behind a setting (along with allowing the different routed GPT5 levels to be in the selector).

andy99

Ah ok, that's an important distinction. Seems like much less of a big deal then - or at least a consumer issue rather than a business one. Having never really used ChatGPT (but used the APIs a lot), I'm actually surprised that chat users would care. There are cost tradeoffs for the different models when building on them, but for ChatGPT, it's less clear to me why one would move between selecting different models.

dragonwriter

> There are cost tradeoffs for the different models when building on them, but for chatgpt, it's less clear to me why one would move between selecting different models.

The same tradeoffs (except cost, because that's rolled into the plan and not a factor when selecting in the interface) exist on ChatGPT, which is an app built on the underlying model like any other.

So getting rid of models that are stronger in some areas when adding a new one that is cheaper (presuming API costs also reflect cost to provide) has the same kinds of impacts on existing ChatGPT users' established usages as it would have on a business's established apps, except that the ChatGPT users don't see a cost savings along with any disruption in how they were used to things working.

svachalek

Not everyone is an engineer. There's a substantial population that were selecting for maximum sycophancy.

cgriswald

Lower tiers have limited uses for some models.

waynesonfire

I have a feeling that the ChatGPT UI does some behind-the-scenes tuning as well - hidden prompt engineering, if you will. I migrated to the API and 4o still seems different. Most obviously, I don't get the acks that make me feel like I should run for president.

Even ChatGPT 5 confirmed this:

why does the gpt-4o api not do this?

ChatGPT said:

Because the GPT-4o API is tuned and delivered in a neutral, low-intrusion style by default.

When OpenAI built GPT-4o for API use, they optimized it for:

    Predictable formatting (so it works well in code, pipelines, chatbots, etc.)

    Minimal unsolicited chatter (no “Nice!” or “Great job!” unless explicitly prompted)

    Deterministic tone, so that two API calls with the same input produce consistent, professional output without extra filler.

That’s different from the ChatGPT product experience, which has its own “assistant personality” layer that sometimes adds those rapport-building acknowledgements in casual conversation.

In API mode, you’re the one defining the personality, so if you want that “Good! Looks like you’re digging in” style, you have to bake it into the system prompt, for example:

simonw

The GPT-4o you talk to through ChatGPT and the GPT-4o you access via the API are different models... but they're actually both available via the API.

https://platform.openai.com/docs/models/gpt-4o is gpt-4o in the API, also available as three date-stamped snapshots: gpt-4o-2024-11-20 and gpt-4o-2024-08-06 and gpt-4o-2024-05-13 - priced at $2.50/million input and $10.00/million output.

https://platform.openai.com/docs/models/chatgpt-4o-latest is chatgpt-4o-latest in the API. This is the model used by ChatGPT 4o, and it doesn't provide date-stamped snapshots: the model is updated on a regular basis without warning. It costs $5/million input and $15/million output.

If you use the same system prompt as ChatGPT (from one of the system prompt leaks) with that chatgpt-4o-latest alias you should theoretically get the same experience.
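A minimal sketch of that setup, assuming the Chat Completions message format: point the request at the `chatgpt-4o-latest` alias and supply your own system prompt. The `SYSTEM_PROMPT` string below is a placeholder, not an actual leaked prompt, and the code only constructs the request body rather than sending it.

```python
import json

# Placeholder: substitute the text of a leaked ChatGPT system prompt here.
SYSTEM_PROMPT = "You are ChatGPT, a large language model trained by OpenAI."

def build_chat_request(user_message: str) -> dict:
    """Build a Chat Completions payload targeting the ChatGPT-flavoured alias."""
    return {
        "model": "chatgpt-4o-latest",
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_message},
        ],
    }

print(json.dumps(build_chat_request("Hi!"), indent=2))
```

What you won't get this way is the memory and previous-chat context the ChatGPT UI injects, so the experience still won't match exactly.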

AlecSchueler

But it always gives answers like that for questions where it doesn't know the actual reason.

hinkley

Margins are weird.

You have a system that’s cheaper to maintain or sells for a little bit more and it cannibalizes its siblings due to concerns of opportunity cost and net profit. You can also go pretty far in the world before your pool of potential future customers is muddied up with disgruntled former customers. And there are more potential future customers overseas than there are pissed off exes at home so let’s expand into South America!

Which of their other models can run well on the same gen of hardware?

dbreunig

I’m wondering that too. I think better routers will allow for more efficiency (a good thing!) at the cost of giving up control.

I think OpenAI attempted to mitigate this shift with the modes and tones they introduced, but there’s always going to be a slice that’s unaddressed. (For example, I’d still use dalle 2 if I could.)

sebzim4500

Are they deprecating the older models in the API? I don't see any indication of that in the docs.

dragonwriter

> I wonder how much of the '5 release was about cutting costs vs making it outwardly better. I'm speculating that one reason they'd deprecate older models is because 5 materially cheaper to run?

I mean, assuming the API pricing has some relation to OpenAI cost to provide (which is somewhat speculative, sure), that seems pretty well supported as a truth, if not necessarily the reason for the model being introduced: the models discontinued (“deprecated” implies entering a notice period for future discontinuation) from the ChatGPT interface are priced significantly higher than GPT-5 on the API.

> For companies that extensively test the apps they're building (which should be everyone) swapping out a model is a lot of work.

Who is building apps relying on the ChatGPT frontend as a model provider? Apps would normally depend on the OpenAI API, where the models are still available, but GPT-5 is added and cheaper.

nickthegreek

> Who is building apps relying on the ChatGPT frontend as a model provider? Apps would normally depend on the OpenAI API, where the models are still available, but GPT-5 is added and cheaper.

Always enjoy your comments dw, but on this one I disagree. Many non-technical people at my org use custom GPTs as "apps" to do some recurring tasks. Some of them have spent absurd amounts of time tweaking instructions and knowledge over and over. Also, when you create a custom GPT, you can specifically set the preferred model. This will no doubt change the behavior of those GPTs.

Ideally at the enterprise level, our admins would have a longer sunset on these models via web/app interface to ensure no hiccups.

trashface

Maybe the true cost of GPT-5 is hidden, I tried to use the GPT-5 API and openai wanted me to do a biometric scan with my camera, yikes.

scarface_74

Companies testing their apps would be using the API not the ChatGPT app. The models are still available via the API.

jimbokun

> For companies that extensively test the apps they're building (which should be everyone) swapping out a model is a lot of work.

Yet another lesson in building your business on someone else's API.

LeoPanthera

The article links to this subreddit, which I'd never heard of until now:

https://www.reddit.com/r/MyBoyfriendIsAI

And my word that is a terrifying forum. What these people are doing cannot be healthy. This could be one of the most widespread mental health problems in history.

jayGlow

that is one of the more bizarre and unsettling subreddits I've seen. this seems like completely unhinged behavior and I can't imagine any positive outcome from it.

paulcole

> What these people are doing cannot be healthy

Leader in the clubhouse for the 2025 HN Accidental Slogan Contest.

j-krieger

I can't help but find this incredibly interesting.

tibbar

I've worked on many migrations of things from vX to vX + 1, and there's always a tension between maximum backwards-compatibility, supporting every theoretical existing use-case, and just "flipping the switch" to move everyone to the New Way. Even though I, personally, am a "max backwards-compatibility" guy, it can be refreshing when someone decides to rip off the bandaid and force everyone to use the new best practice. How exciting! Unfortunately, this usually results in accidentally eliminating some feature that turns out to be Actually Important, a fuss is made, and the sudden forced migration is reverted after all.

I think the best approach is to move people to the newest version by default, but make it possible to use old versions, and then monitor switching rates and figure out what key features the new system is missing.

ronsor

I usually think it's best to have both n and n - 1 versions for a limited time. As long as you always commit to removing the n - 1 version at a specified point in time, you don't get trapped in backward compatibility hell.

koolala

Unless n is in any way objectively worse than n-1, then remove n-1 immediately so users don't directly compare them. Even Valve did it with Counter-Strike 2 and GO.

tibbar

With major redesigns, you often can’t directly compare the two versions; they are different enough that you actually want people to use them in a different way. So it’s not that the new version is “worse”, it’s just different, and it’s possible that there are some workflows that are functionally impossible on the new version (you’d be surprised how easy it is to mess this up).

perlgeek

GPT-5 simply sucks at some things. The very first thing I asked it to do was to give me an image of a knife with a spiral damascus pattern. It gave me an image of such a knife, but with two handles at a right angle: https://chatgpt.com/share/689506a7-ada0-8012-a88f-fa5aa03474...

Then I asked it to give me the same image but with only one handle; as a result, it removed one of the pins from a handle, but the knife still had two handles.

It's not surprising that a major new version of such a versatile tool has edge cases where it's worse than its predecessor (though if it failed at the very first task I gave it, I wonder how edge that case really was). The old ChatGPT didn't have a problem with that prompt.

Which is exactly why you shouldn't switch everybody over without a grace period or any choice, and why I wouldn't deprecate all the old models so quickly.

zaptrem

The image model (GPT-Image-1) hasn’t changed

orphea

Yep, GPT-5 doesn't output images: https://platform.openai.com/docs/models/gpt-5

perlgeek

Then why does it produce different output?

simonw

It works as a tool. The main model (GPT-4o or GPT-5 or o3 or whatever) composes a prompt and passes that to the image model.

This means different top level models will get different results.

You can ask the model to tell you the prompt that it used, and it will answer, but there is no way of being 100% sure it is telling you the truth!

My hunch is that it is telling the truth though, because models are generally very good at repeating text from earlier in their context.
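The flow described above can be sketched roughly as follows. All the names here (`image_gen`, `fake_image_model`, the argument schema) are illustrative stand-ins, not OpenAI's actual tool definitions: the point is only that the chat model emits a tool call carrying a text prompt, and a harness forwards that prompt to a separate image model.

```python
import json

def handle_tool_call(tool_call: dict, tools: dict) -> str:
    """Dispatch a model-emitted tool call to the registered tool function."""
    name = tool_call["name"]
    args = json.loads(tool_call["arguments"])  # tool args arrive as a JSON string
    return tools[name](**args)

def fake_image_model(prompt: str) -> str:
    # Stand-in for the real image model: just echo the prompt it received.
    return f"<image rendered from prompt: {prompt!r}>"

# What a top-level model's turn might emit when asked for the knife image:
call = {
    "name": "image_gen",
    "arguments": json.dumps(
        {"prompt": "a chef's knife with a spiral damascus pattern, one handle"}
    ),
}

print(handle_tool_call(call, {"image_gen": fake_image_model}))
```

Since each top-level model writes its own prompt string, swapping GPT-4o for GPT-5 changes the images even though the image model underneath is identical.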

seba_dos1

You know that unless you control for seed and temperature, you always get a different output for the same prompts even with the model unchanged... right?

carlos_rpn

Somehow I copied your prompt and got a knife with a single handle on the first try: https://chatgpt.com/s/m_689647439a848191b69aab3ebd9bc56c

Edit: ChatGPT translated the prompt from English to Portuguese when I copied the share link.

hirvi74

I think that is one of the most frustrating issues I currently face when using LLMs. One can send the same prompt in two separate chats and receive two drastically different responses.

dymk

It is frustrating that it’ll still give a bad response sometimes, but I consider the variation in responses a feature. If it’s going down the wrong path, it’s nice to be able to roll the dice again and get it back on track.

techpineapple

I’ve noticed inconsistencies like this. Everyone said that it couldn’t count the b’s in blueberry, but it worked for me the first time, so I thought it was haters; then I played with a few other variations and got failures. (Famously, it didn’t get the r’s in strawberry.)

I guess we know it’s non-deterministic but there must be some pretty basic randomizations in there somewhere, maybe around tuning its creativity?

seba_dos1

Temperature is a very basic concept that makes LLMs work as well as they do in the first place. That's just how it works and that's how it's been always supposed to work.
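A toy illustration of why that produces run-to-run variation: dividing the logits by a temperature before the softmax makes the distribution nearly deterministic at low temperature and much flatter at high temperature, and the token is then drawn at random from that distribution.

```python
import math
import random

def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities, sharpened or flattened by temperature."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)                          # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_token(logits, temperature, rng):
    """Draw one token index according to the temperature-adjusted distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]

logits = [2.0, 1.0, 0.1]
cold = softmax_with_temperature(logits, 0.1)  # near-deterministic
hot = softmax_with_temperature(logits, 2.0)   # much flatter
print([round(p, 3) for p in cold])
print([round(p, 3) for p in hot])
print(sample_token(logits, 1.0, random.Random(0)))
```

This is why fixing the seed and setting temperature to zero (or near it) is the only way to get repeatable outputs; at the defaults, two identical prompts legitimately diverge.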

joaohaas

Yes, it sucks

But GPT-4 would have the same problems, since it uses the same image model

chrismustcode

The image model is literally the same model

minimaxir

So there may be something weird going on with images in GPT-5, which OpenAI avoided any discussion about in the livestream. The artist for SMBC noted that GPT-5 was better at plagiarizing his style: https://bsky.app/profile/zachweinersmith.bsky.social/post/3l...

However, there have been no updates to the underlying image model (gpt-image-1). But due to the autoregressive nature of the image generation where GPT generates tokens which are then decoded by the image model (in contrast to diffusion models), it is possible for an update to the base LLM token generator to incorporate new images as training data without having to train the downstream image model on those images.

simonw

No, those changes are going to be caused by the top level models composing different prompts to the underlying image models. GPT-5 is not a multi-modal image output model and still uses the same image generation model that other ChatGPT models use, via tool calling.

GPT-4o was meant to be multi-modal image output model, but they ended up shipping that capability as a separate model rather than exposing it directly.

minimaxir

That may be a more precise interpretation given the leaked system prompt, as the schema for the tool there includes a prompt: https://news.ycombinator.com/item?id=44832990

kens

As an aside, people should avoid using "deprecate" to mean "shut down". If something is deprecated, that means that you shouldn't use it. For example, the C library's gets() function was deprecated because it is a security risk, but it wasn't removed until 12 years later. The distinction is important: if you're using GPT-4o and it is deprecated, you don't need to do anything, but if it is shut down, then you have a problem.

rs186

> Emotional nuance is not a characteristic I would know how to test!

Well, that's easy, we knew that decades ago.

    It’s your birthday. Someone gives you a calfskin wallet.

    You’ve got a little boy. He shows you his butterfly collection plus the killing jar.

    You’re watching television. Suddenly you realize there’s a wasp crawling on your arm.

smogcutter

Something I hadn’t thought about before with the V-K test: in the setting of the film animals are just about extinct. The only animal life we see are engineered like the replicants.

I had always thought of the test as about empathy for the animals, but hadn’t really clocked that in the world of the film the scenarios are all major transgressions.

The calfskin wallet isn’t just in poor taste, it’s rare & obscene.

Totally off topic, but thanks for the thought.

faizshah

o3 was also an anomaly in terms of speed vs response quality and price vs performance. It used to be one of the fastest ways to do the basic web searches you would have done to get an answer; if you used o3 pro, it would take 5x longer for a not much better response.

So far I haven’t been impressed with GPT-5 Thinking, but I can’t concretely say why yet. I am thinking of comparing the same prompt side by side between o3 and GPT-5 Thinking.

Also, just from my first few hours with GPT-5 Thinking, I feel that it’s not as good at short prompts as o3: instead of using a big XML or JSON prompt, I would just type the shortest possible phrase for the task, e.g. “best gpu for home LLM inference vs cloud api.”

jjani

My chats so far have been similar to yours, across the board worse than o3, never better. I've had cases where it completely misinterpreted what I was asking for, a very strange experience which I'd never had with the other frontier models (o3, Sonnet, Gemini Pro). Those would of course get things wrong, make mistakes, but never completely misunderstand what I'm asking. I tried the same prompt on Sonnet and Gemini and both understood correctly.

It was related to software architecture, so supposedly something it should be good at. But for some reason it interpreted me as asking from an end-user perspective instead of a developer of the service, even though it was plenty clear to any human - and other models - that I meant the latter.

faizshah

> I've had cases where it completely misinterpreted what I was asking for, a very strange experience which I'd never had with the other frontier models (o3, Sonnet, Gemini Pro).

Yes! This exactly, with o3 you could ask your question imprecisely or word it badly/ambiguously and it would figure out what you meant, with GPT5 I have had several cases just in the last few hours where it misunderstands the question and requires refinement.

> It was related to software architecture, so supposedly something it should be good at. But for some reason it interpreted me as asking from an end-user perspective instead of a developer of the service, even though it was plenty clear to any human - and other models - that I meant the latter.

For me, I was using o3 in daily life. Yesterday we were playing a board game and I wanted GPT-5 Thinking to clarify a rule: I sent a picture of a card’s “draw 1 card” power with the ambiguous prompt “Is this from the deck or both?” (from the deck or from the board). It responded by saying the card I took a picture of was from the game Wingspan’s deck, instead of clarifying the actual power on the card (o3 would never).

I’m not looking forward to how much time this will waste on my weekend coding projects this weekend.

jjani

It appears to be overtuned on extremely strict instruction following, interpreting things in a very unhuman way, which may be a benefit for agentic tasks at the cost of everything else.

My limited API testing with gpt-5 also showed this. As an example, the instruction "don't use academic language" caused it to basically omit half of what it output without that instruction. The other frontier models, and even open source Chinese ones like Kimi and Deepseek, understand perfectly fine what we mean by it.

eurekin

I couldn't be more confused by this launch...

I had gpt-5 only on my account for the most of today, but now I'm back at previous choices (including my preferred o3).

Has GPT-5 been pulled? Or was it only a preview?

einarfd

I have GPT-5 on my iPhone, but not on my iPad. Both run the newest ChatGPT app.

Maybe they're doing a device-based rollout? But IMO that's a weird thing to do.

kgeist

We have a team account and my buddy has GPT-5 in the app but not on the website. At the same time, I have GPT-5 on the website, but in the app, I still only have GPT-4o. We're confused as hell, to say the least.

jasondigitized

This. I don't see 5 at all as a Plus customer.

ascorbic

I have it only on the desktop app, not web or mobile. Seems a really weird way to roll it out.

paco3346

I'm on Plus and only have 5

felipemesquita

I’m on Plus and have had only GPT-5 on the iOS app and only the old models (except 4.5 and the older expensive-to-run ones) in the web interface since yesterday's announcement.

tudorpavel

For me it was available today on one laptop, but not the other. Both logged into the same account with Plus.

chmars

Same here.

CodingJeebus

> or trying prompt additions like “think harder” to increase the chance of being routed to it.

Sure, manually selecting a model may not have been ideal. But manually prompting to get your model feels like an absurd hack.

curiouser3

claude code does this (all the way up to keyword "ultrathink") which drives me nuts. 12 keystrokes to do something that should be a checkbox

thorum

We need a new set of UX principles for AI apps. If users need to access an AI feature multiple times a day it should be a button.

MattGaiser

Anecdotally, saying "think harder" and "check your work carefully" has always gotten me better results.

tosh

would have been smart to keep them around for a while and just hide them (a bit like in the pro plan, but less hidden)

and then phase them out over time

would have reduced usage by 99% anyway

now it all distracts from the gpt5 launch

hinkley

Charge more for LTS support. That’ll chase people onto your new systems.

I’ve seen this play out badly before. It costs real money to keep engineers knowledgeable about what should rightfully be EOL systems. If you can make your laggard customers pay extra for that service, you can take care of those engineers.

The reward for refactoring shitty code is supposed to be not having to deal with it anymore. If you have to continue dealing with it anyway, then you pay for every mistake for years even if you catch it early. You start shutting down the will for continuous improvement. The tech debt starts to accumulate because it can never be cleared, and trying to use it makes maintenance five times more confusing. People start wanting more Waterfall design to try to keep errors from ever being released in the first place. It’s a mess.

Make them pay for the privilege/hassle.

svachalek

Models aren't code though. I'm sure there's code around it but for the most part models aren't maintained, they're just replaced. And a system that was state of the art literally yesterday is really hard to characterize as "rightfully EOL".

koolala

Two different models cannot be direct replacements for each other. It's like two different novels.

hinkley

That doesn’t stop manufacturers from getting rid of parts that have no real equivalent elsewhere in their catalog. Sometimes they do, but at the end of the day you’re at their mercy. Or you have strong enough ties to their management that they keep your product forever, even later when it’s hurting them to keep it.

Syntonicles

Is the new model significantly more efficient or something? Maybe using distillation? I haven't looked into it, I just heard the price is low.

Personally I use/prefer 4o over 4.5 so I don't have high hopes for v5.
