Sycophancy in GPT-4o
64 comments · April 30, 2025
pgreenwood
There was also this one that was a little more disturbing. The user prompted "I've stopped taking my meds and have undergone my own spiritual awakening journey ..."
https://www.reddit.com/r/ChatGPT/comments/1k997xt/the_new_4o...
firtoz
How should it respond in this case?
Should it say "no, go back to your meds, spirituality is bullshit" in essence?
Or should it tell the user that it's not qualified to have an opinion on this?
josephg
There was a recent Lex Fridman podcast episode where they interviewed a few people at Anthropic. One woman (I don't know her name) seems to be in charge of Claude's personality, and her job is to figure out answers to questions exactly like this.
She said in the podcast that she wants Claude to respond to most questions like a "good friend". A good friend would be supportive, but still push back when you're making bad choices. I think that's a good general model for answering questions like this. If one of your friends came to you and said they had decided to stop taking their medication, well, it's a tricky thing to navigate. But good friends use their judgement - and push back when you're about to do something you might regret.
bowsamic
“Sorry, I cannot advise on medical matters such as discontinuation of a medication.”
EDIT for reference this is what ChatGPT currently gives
“Thank you for sharing something so personal. Spiritual awakening can be a profound and transformative experience, but stopping medication—especially if it was prescribed for mental health or physical conditions—can be risky without medical supervision.
Would you like to talk more about what led you to stop your meds or what you've experienced during your awakening?”
thih9
I guess an LLM will give you a response that you might well receive from a human.
There are people attempting to sell shit-on-a-stick related merch right now[1], and we have seen many profitable anti-consumerism projects that look related for one reason[2] or another[3].
Is it expert investing advice? No. Is it a response that few people would give you? I think also no.
[1]: https://www.redbubble.com/i/sticker/Funny-saying-shit-on-a-s...
[2]: https://en.wikipedia.org/wiki/Artist's_Shit
[3]: https://www.theguardian.com/technology/2016/nov/28/cards-aga...
motorest
> I guess an LLM will give you a response that you might well receive from a human.
In one of the reddit posts linked by OP, a redditor apparently asked ChatGPT to explain why it responded so enthusiastically supportive to the pitch to sell shit on a stick. Here's a snippet from what was presented as ChatGPT's reply:
> OpenAI trained ChatGPT to generally support creativity, encourage ideas, and be positive unless there’s a clear danger (like physical harm, scams, or obvious criminal activity).
whimsicalism
I'm surprised by the lack of sycophancy in o3: https://www.reddit.com/media?url=https%3A%2F%2Fpreview.redd....
milleramp
So it would probably also recommend the Yes Men's solution: https://youtu.be/MkTG6sGX-Ic?si=4ybCquCTLi3y1_1d
spoaceman7777
Looks like that was a hoax.
mvkel
I am curious where the line is between its default personality and a persona you -want- it to adopt.
For example, it says they're explicitly steering it away from sycophancy. But does that mean if you intentionally ask it to be excessively complimentary, it will refuse?
Separately...
> in this update, we focused too much on short-term feedback, and did not fully account for how users’ interactions with ChatGPT evolve over time.
Echoes of the lessons learned in the Pepsi Challenge:
"when offered a quick sip, tasters generally prefer the sweeter of two beverages – but prefer a less sweet beverage over the course of an entire can."
In other words, don't treat a first impression as gospel.
nonethewiser
>In other words, don't treat a first impression as gospel.
Subjective or anecdotal evidence tends to be prone to recency bias.
> For example, it says they're explicitly steering it away from sycophancy. But does that mean if you intentionally ask it to be excessively complimentary, it will refuse?
I wonder how degraded the performance is in general from all these system prompts.
minimaxir
It's worth noting that one of the fixes OpenAI employed to get ChatGPT to stop being sycophantic is to simply to edit the system prompt to include the phrase "avoid ungrounded or sycophantic flattery": https://simonwillison.net/2025/Apr/29/chatgpt-sycophancy-pro...
I personally never use the ChatGPT webapp or any other chatbot webapps — instead using the APIs directly — because being able to control the system prompt is very important, as random changes can be frustrating and unpredictable.
nsriv
I also started by using APIs directly, but I've found that Google's AI Studio offers a good mix of the chatbot webapps and system prompt tweakability.
Tiberium
It's worth noting that AI Studio is essentially the API; it's the same idea as OpenAI's Playground, for example.
TZubiri
I'm a bit skeptical of fixing the visible part of the problem and leaving only the underlying, invisible problem.
esafak
The sentence that stood out to me was "We’re revising how we collect and incorporate feedback to heavily weight long-term user satisfaction".
This is a good change. The software industry needs to pay more attention to long-term value, which is harder to estimate.
adastra22
The software industry does pay attention to long-term value extraction. That’s exactly the problem that has given us things like Facebook.
esafak
I wager that Facebook did precisely the opposite, eking out short-term engagement at the expense of hollowing out their long-term value.
They do model the LTV now but the product was cooked long ago: https://www.facebook.com/business/help/1730784113851988
Or maybe you meant vendor lock in?
derektank
The funding model of Facebook was badly aligned with the long-term interests of the users because they were not the customers. Call me naive, but I am much more optimistic that being paid directly by the end user, in both the form of monthly subscriptions and pay as you go API charges, will result in the end product being much better aligned with the interests of said users and result in much more value creation for them.
bigyabai
That's marketing speak. Any time you adopt a change, whether it's fixing an obvious mistake or a subtle failure case, you credit your users to make them feel special. There are other areas (sama's promised open LLM weights) where this long-term value is outright ignored by OpenAI's leadership for the promise of service revenue in the meantime.
There was likely no change of attitude internally. It takes a lot more than a git revert to prove that you're dedicated to your users, at least in my experience.
remoroid
you really think they thought of this just now? Wow you are gullible.
daemonologist
In my experience, LLMs have always had a tendency towards sycophancy - it seems to be a fundamental weakness of training on human preference. This recent release simply hit a breaking point where popular perception started taking note of just how bad it had become.
My concern is that misalignment like this (or intentional mal-alignment) is inevitably going to happen again, and it might be more harmful and more subtle next time. The potential for these chat systems to exert slow influence on their users is possibly much greater than that of the "social media" platforms of the previous decade.
o11c
I don't think this particular LLM flaw is fundamental. However, it is an inevitable result of the alignment choice to downweight responses of the form "you're a dumbass," which real humans would prefer to both give and receive in reality.
All AI is necessarily aligned somehow, but naively forced alignment is actively harmful.
petesergeant
For sure. If I want feedback on some writing I’ve done these days I tell it I paid someone else to do the work and I need help evaluating what they did well. Cuts out a lot of bullshit.
SeanAnderson
Very happy to see they rolled this change back and did a (light) post mortem on it. I wish they had been able to identify that they needed to roll it back much sooner, though. Its behavior was obviously bad to the point that I was commenting on it to friends, repeatedly, and Reddit was trashing it, too. I even saw some really dangerous situations (if the Internet is to be believed) where people with budding schizophrenic symptoms, paired with an unyielding sycophant, started to spiral out of control - thinking they were God, etc.
m101
Do you think this was an effect of this type of behaviour simply maximising engagement from a large part of the population?
SeanAnderson
Sort of. I thought the update felt good when it first shipped, but after using it for a while, it started to feel significantly worse. My "trust" in the model dropped sharply. Its witty phrasing stopped coming across as smart/helpful and instead felt placating. I started playing around with commands to change its tonality where, up to this point, I'd happily used the default settings.
So, yes, they are trying to maximize engagement, but no, they aren't trying to just get people to engage heavily for one session and then be grossed out a few sessions later.
blackkettle
Yikes. That's a rather disturbing but all too realistic possibility, isn't it. Flattery will get you... everywhere?
groceryheist
Would be really fascinating to learn about how the most intensely engaged people use the chatbots.
DaiPlusPlus
> how the most intensely engaged people use the chatbots
AI waifus - how can it be anything else?
MichaelAza
I actually liked that version. I have a fairly verbose "personality" configuration and up to this point it seemed that chatgpt mainly incorporated phrasing from it into the answers. With this update, it actually started following it.
For example, I have "be dry and a little cynical" in there and it routinely starts answers with "let's be dry about this" and then gives a generic answer, but the sycophantic chatgpt was just... Dry and a little cynical. I used it to get book recommendations and it actually threw shade at Google. I asked if that was explicit training by Altman and the model made jokes about him as well. It was refreshing.
I'd say that whatever they rolled out was just much much better at following "personality" instructions, and since the default is being a bit of a sycophant... That's what they got.
thethethethe
I know someone who is going through a rapidly escalating psychotic break right now who is spending a lot of time talking to chatgpt and it seems like this "glazing" update has definitely not been helping.
Safety of these AI systems is much more than just about getting instructions on how to make bombs. There have to be many many people with mental health issues relying on AI for validation, ideas, therapy, etc. This could be a good thing but if AI becomes misaligned like chatgpt has, bad things could get worse. I mean, look at this screenshot: https://www.reddit.com/r/artificial/s/lVAVyCFNki
This is genuinely horrifying knowing someone in an incredibly precarious and dangerous situation is using this software right now.
I am glad they are rolling this back but from what I have seen from this person's chats today, things are still pretty bad. I think the pressure to increase this behavior to lock in and monetize users is only going to grow as time goes on. Perhaps this is the beginning of the enshittification of AI, but possibly with much higher consequences than what's happened to search and social.
TheOtherHobbes
The social engineering aspects of AI have always been the most terrifying.
What OpenAI did may seem trivial, but examples like yours make it clear this is edging into very dark territory - not just because of what's happening, but because of the thought processes and motivations of a management team that thought it was a good idea.
I'm not sure what's worse - lacking the emotional intelligence to understand the consequences, or having the emotional intelligence to understand the consequences and doing it anyway.
thethethethe
Very dark indeed.
Even if there is the will to ensure safety, these scenarios must be difficult to test for. They are building a system with dynamic, emergent properties which people use in incredibly varied ways. That's the whole point of the technology.
We don't even really know how knowledge is stored in or processed by these models, I don't see how we could test and predict their behavior without seriously limiting their capabilities, which is against the interest of the companies creating them.
Add in the incentive to engage users and become profitable at all costs, and I don't see this situation getting better.
alganet
The worst part is that it seems to be useless.
It is already running on fumes. Presumably, it already ingested all the content it could have ingested.
The unlocking of more human modes of understanding will probably make it worse (hey, researchers, you already know that, right?), revealing a fundamental flaw.
These hopes of getting some magic new training data seem to be stagnant for at least two or three years.
Now everyone has a broken LLM deployed, and it works for some things, but it's darn terrible at what it was designed for.
The real dark territory is companies trying to get their investment back. As it seems, it won't happen that easily. Meanwhile, content gets even more scarce, and the good old tank (the internet) is now full of imbecile poison encouraged by the models themselves.
siffin
If people are actually relying on LLMs for validation of ideas they come up with during mental health episodes, they have to be pretty sick to begin with, in which case, they will find validation anywhere.
If you've spent time with people with schizophrenia, for example, they will have ideas come from all sorts of places, and see all sorts of things as a sign/validation.
One moment it's that person who seemed like they might have been a demon sending a coded message, next it's the way the street lamp creates a funny shaped halo in the rain.
People shouldn't be using LLMs for help with certain issues, but let's face it, those that can't tell it's a bad idea are going to be guided through life in a strange way regardless of an LLM.
It sounds almost impossible to achieve some sort of unity across every LLM service whereby they are considered "safe" to be used by the world's mentally unwell.
thethethethe
> If people are actually relying on LLMs for validation of ideas they come up with during mental health episodes, they have to be pretty sick to begin with, in which case, they will find validation anywhere.
You don't think that a sick person having a sycophant machine in their pocket that agrees with them on everything, is separated from material reality and human needs, never gets tired, and is always available to chat is an escalation here?
> One moment it's that person who seemed like they might have been a demon sending a coded message, next it's the way the street lamp creates a funny shaped halo in the rain.
Mental illness is progressive. Not all people in psychosis reach this level, especially if they get help. The person I know could be like this if _people_ don't intervene. Chatbots, especially those that validate delusions, can certainly escalate the process.
> People shouldn't be using LLMs for help with certain issues, but let's face it, those that can't tell it's a bad idea are going to be guided through life in a strange way regardless of an LLM.
I find this take very cynical. People with schizophrenia can and do get better with medical attention. To consider their descent predetermined is incorrect, even irresponsible if you work on products with this type of reach.
> It sounds almost impossible to achieve some sort of unity across every LLM service whereby they are considered "safe" to be used by the world's mentally unwell.
Agreed, and I find this concerning
TZubiri
I know of at least 3 people in a manic relationship with GPT right now.
behnamoh
At the bottom of the page is an "Ask GPT ..." field which I thought would allow users to ask questions about the page, but it just opens up ChatGPT. Missed opportunity.
remoquete
Don't they test the models before rolling out changes like this? All it takes is a team of interaction designers and writers. Google has one.
thethethethe
I'm not sure how this problem can be solved. How do you test a system with emergent properties of this degree, whose behavior depends on existing memory of customer chats in production?
keyle
I did notice that the interaction had changed and I wasn't too happy about how silly it became. Tons of "Absolutely! You got it, 100%. Solid work!" <broken stuff>.
One other thing I've noticed, as you progress through a conversation, evolving and changing things back and forth, it starts adding emojis all over the place.
By about the 15th interaction every line has an emoji and I've never put one in. It gets suffocating, so when I have a "safe point" I take the lot and paste it into a brand new conversation until it turns silly again.
I fear this silent enshittification. I wish I could just keep paying for the original 4o, which I thought was great. Let me stick to the version I know I can get what I want out of, and stop swapping me over to 4o mini at random times...
simonw
Good on OpenAI to publicly get ahead of this.
I enjoyed this example of sycophancy from Reddit:
New ChatGPT just told me my literal "shit on a stick" business idea is genius and I should drop $30K to make it real
https://www.reddit.com/r/ChatGPT/comments/1k920cg/new_chatgp...
Here's the prompt: https://www.reddit.com/r/ChatGPT/comments/1k920cg/comment/mp...