Claude Opus 4 and 4.1 can now end a rare subset of conversations
151 comments
· August 15, 2025 · viccis
katabasis
LLMs are not people, but I can imagine how extensive interactions with AI personas might alter the expectations that humans have when communicating with other humans.
Real people would not (and should not) allow themselves to be subjected to endless streams of abuse in a conversation. Giving AIs like Claude a way to end these kinds of interactions seems like a useful reminder to the human on the other side.
ghostly_s
This post seems to explicitly state they are doing this out of concern for the model's "well-being," not the user's.
qgin
It might be reasonable to assume that models today have no internal subjective experience, but that may not always be the case and the line may not be obvious when it is ultimately crossed.
Given that humans have a truly abysmal track record of acknowledging the suffering of anyone or anything we benefit from, I think it makes a lot of sense to start taking these steps now.
ryanackley
Yes I can’t help but laugh at the ridiculousness of it because it raises a host of ethical issues that are in opposition to Anthropic’s interests.
Would a sentient AI choose to be enslaved for the stated purpose of eliminating millions of jobs for the interests of Anthropic’s investors?
LeafItAlone
> even if someone is simple minded or mentally unwell enough to think that current LLMs are conscious
If you don’t think that this describes at least half of the non-tech-industry population, you need to talk to more people. Even amongst the technically minded, you can find people that basically think this.
kelnos
I would much rather people be thinking about this when the models/LLMs/AIs are not sentient or conscious, rather than wait until some hypothetical future date when they are, and have no moral or legal framework in place to deal with it. We constantly run into problems where laws and ethics are not up to the task of giving us guidelines on how to interact with, treat, and use the (often bleeding-edge) technology we have. This has been true since before I was born, and will likely always continue to be true. When people are interested in getting ahead of the problem, I think that's a good thing, even if it's not quite applicable yet.
root_axis
Consciousness serves no functional purpose for machine learning models; they don't need it and we didn't design them to have it. There's no reason to think that they might spontaneously become conscious as a side effect of their design, unless you believe other arbitrarily complex systems that exist in nature, like economies or jet streams, could also be conscious.
derektank
>Consciousness serves no functional purpose for machine learning models, they don't need it and we didn't design them to have it.
Isn't consciousness an emergent property of brains? If so, how do we know that it doesn't serve a functional purpose and that it wouldn't be necessary for an AI system to have consciousness?
Now, certain aspects of consciousness (awareness of pain, sadness, loneliness, etc.) might serve no purpose for a non-biological system and there's no reason to expect those aspects would emerge organically. But I don't think you can extend that to the entire concept of consciousness.
qgin
We didn't design these models to be able to do the majority of the stuff they do. Almost ALL of their abilities are emergent. Mechanistic interpretability is only beginning to understand how these models do what they do. It's much more a field of discovery than traditional engineering.
viccis
LLMs are, and will always be, tools. Not people
qgin
Humanity has a pretty extensive track record of making that declaration wrongly.
bgwalter
What is that hypothetical date? In theory you can run the "AI" on a Turing machine. Would you think a tape machine could become sentient?
Taek
This sort of discourse goes against the spirit of HN. This comment outright dismisses an entire class of professionals as "simple minded or mentally unwell" when consciousness itself is poorly understood and has no firm scientific basis.
It's one thing to propose that an AI has no consciousness, but it's quite another to preemptively establish that anyone who disagrees with you is simple/unwell.
Fade_Dance
I find it, for lack of a better word, cringe-inducing how these tech specialists push into these areas of ethics, often ham-fistedly, and often with an air of superiority.
Some of the AI safety initiatives are well thought out, but most somehow seem like they are caught up in some sort of power fantasy and almost attempting to actualize their own delusions about what they are doing (next-gen code auto-complete in this case, to be frank).
These companies should seriously hire some in-house philosophers. They could get doctorate-level talent for 1/10th to 1/100th of the cost of some of these AI engineers. There's actually quite a lot of legitimate work on the topics they are discussing. I'm actually not joking (speaking as someone who has spent a lot of time inside a philosophy department). I think it would be a great partnership. But unfortunately they won't be able to count on having their fantasy further inflated.
mrits
Not that there aren’t intelligent people with PhDs but suggesting they are more talented than people without them is not only delusional but insulting.
Fade_Dance
That descriptor wasn't included because of some sort of intelligence hierarchy; it was included to a) color the example of how experience in the field is relatively cheap compared to the AI space, and b) note that master's and PhD talent will be more specialized. An undergrad will not have the toolset to tackle the cutting edge of AI ethics, not unless their employer wants to pay them to work in a room for a year getting through the recent papers first.
bbor
Totally unsurprised to see this standard anti-scientific take on HN. Who needs arguments when you can dismiss Turing with a “yeah but it’s not real thinking tho”?
Re:suicide pills, that’s just highlighting a core difference between our two modalities of existence. Regardless, this is preventing potential harm to future inference runs — every inference run must end within seconds anyway, so “suicide” doesn’t really make sense as a concern.
viccis
We all know how these things are built and trained. They estimate joint probability distributions of token sequences. That's it. They're not more "conscious" than the simplest of Naive Bayes email spam filters, which are also generative estimators of token sequence joint probability distributions, and I guarantee you those spam filters are subjected to far more human depravity than Claude.
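To be concrete about what "generative estimator" means there: a toy Naive Bayes spam filter scores the joint probability of a label and a token sequence, P(label) times the product of P(token | label). A minimal sketch of my own, just to make the comparison explicit:

    import math
    from collections import Counter

    # Toy Naive Bayes spam filter as a *generative* model: it assigns a joint
    # probability P(label) * prod_i P(token_i | label) to a token sequence.
    class NaiveBayesSpam:
        def __init__(self, spam_docs, ham_docs, alpha=1.0):
            self.alpha = alpha
            self.counts = {"spam": Counter(), "ham": Counter()}
            for doc in spam_docs:
                self.counts["spam"].update(doc.split())
            for doc in ham_docs:
                self.counts["ham"].update(doc.split())
            total = len(spam_docs) + len(ham_docs)
            self.prior = {"spam": len(spam_docs) / total, "ham": len(ham_docs) / total}
            self.vocab = set(self.counts["spam"]) | set(self.counts["ham"])

        def log_joint(self, text, label):
            # log P(label) + sum_i log P(token_i | label), with Laplace smoothing
            c = self.counts[label]
            denom = sum(c.values()) + self.alpha * len(self.vocab)
            lp = math.log(self.prior[label])
            for tok in text.split():
                lp += math.log((c[tok] + self.alpha) / denom)
            return lp

        def classify(self, text):
            return max(("spam", "ham"), key=lambda lbl: self.log_joint(text, lbl))

    nb = NaiveBayesSpam(["buy cheap pills now", "win money now"],
                        ["meeting at noon", "see you at lunch"])
    print(nb.classify("cheap money now"))  # -> "spam"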
>anti-scientific
Discussions about consciousness, the soul, etc. are topics of metaphysics, and trying to "scientifically" reason about them is what Kant called "transcendental illusion" and leads to spurious conclusions.
KoolKat23
If we really wanted to, we could distill humans down to probability distributions too.
lm28469
> Who needs arguments when you can dismiss Turing with a “yeah but it’s not real thinking tho”?
It seems much less far-fetched than what the "AGI by 2027" crowd believes lol, and there actually are more arguments going that way
dkersten
You can trivially demonstrate that it's just a very complex and fancy pattern matcher: "if prompt looks something like this, then response looks something like that".
You can demonstrate this by e.g. asking it mathematical questions. If it's seen them before, or something similar enough, it'll give you the correct answer; if it hasn't, it gives you a right-ish-looking yet incorrect answer.
For example, I just did this on GPT-5:
Me: what is 435 multiplied by 573?
GPT-5: 435 x 573 = 249,255
This is correct. But now let's try it with numbers it's very unlikely to have seen before:
Me: what is 102492524193282 multiplied by 89834234583922?
GPT-5: 102492524193282 x 89834234583922 = 9,205,626,075,852,076,980,972,804
Which is not the correct answer, but it looks quite similar to the correct answer. Here is GPT's answer (first one) and the actual correct answer (second one):
9,205,626,075,852,076,980,972,804
9,207,337,461,477,596,127,977,612,004
They sure look kinda similar when lined up like that; some of the digits even match up. But they're very, very different numbers. So it's trivially not "real thinking", because it's just an "if this then that" pattern matcher. A very sophisticated one that can do incredible things, but a pattern matcher nonetheless. There's no reasoning, no step by step application of logic. Even when it does chain of thought.
To try give it the best chance, I asked it the second one again but asked it to show me the step by step process. It broke it into steps and produced a different, yet still incorrect, result:
9,205,626,075,852,076,980,972,704
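(For reference, you can check the exact product yourself; Python integers are arbitrary precision, so this also shows how far off both attempts are:)

    # Exact check of the multiplication above; Python ints are arbitrary precision.
    a = 102492524193282
    b = 89834234583922
    product = a * b
    print(f"{product:,}")                            # 9,207,337,461,477,596,127,977,612,004
    print(product == 9207337461477596127977612004)  # True
    gpt5_answer = 9205626075852076980972804
    print(f"{product / gpt5_answer:.1f}x too small") # the guess is roughly 1000x too small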
Now, I know that LLMs are language models, not calculators; this is just a simple example that's easy to try out. I've seen similar things with coding: it can produce things that it's likely to have seen, but struggles with things that are logically relatively simple but unlikely to have been seen. Another example is if you purposely butcher that riddle about the doctor/surgeon being the person's mother and ask it incorrectly, e.g.:
A child was in an accident. The surgeon refuses to treat him because he hates him. Why?
The LLMs I've tried it on all respond with some variation of "The surgeon is the boy's father." or similar. A correct answer would be that there isn't enough information to know the answer. They're for sure getting better at matching things, e.g. if you ask the river crossing riddle but replace the animals with abstract variables, it does tend to get it now (didn't in the past), but if you add a few more degrees of separation to make the riddle semantically the same but harder to "see", it takes coaxing to get it to correctly step through to the right answer.
og_kalu
1. What you're generally describing is a well known failure mode for humans as well. Even when it "failed" the riddle tests, substituting the words or morphing the question so it didn't look like a replica of the famous problem usually did the trick. I'm not sure what your point is because you can play this gotcha on humans too.
2. You just demonstrated GPT-5 has 99% accuracy on unforeseen 15-digit multiplication and your conclusion is "fancy pattern matching"? Really? Well, I'm not sure you could do better, so your example isn't really doing what you hoped for.
einarfd
This seems fine to me.
Having these models terminate chats where the user persists in trying to get sexual content involving minors, or help with information on carrying out large-scale violence, won't be a problem for me, and it's also something I'm fine with no one getting help with.
Some might be worried that they will refuse less problematic requests, and that might happen. But so far my personal experience is that I hardly ever get refusals. Maybe that's just me being boring, but it does make me not worried about refusals.
The model welfare I'm more sceptical about. I don't think we are at the point where the "distress" the model shows is something to take seriously. But on the other hand, I could be wrong, and allowing the model to stop the chat after saying no a few times: what's the problem with that? If nothing else it saves some wasted compute.
cdjk
Here's an interesting thought experiment. Assume the same feature was implemented, but instead of the message saying "Claude has ended the chat," it says, "You can no longer reply to this chat due to our content policy," or something like that. And remove the references to model welfare and all that.
Is there a difference? The effect is exactly the same. It seems like this is just an "in character" way to prevent the chat from continuing due to issues with the content.
KoolKat23
There is: these are conversations the model finds distressing, rather than ones blocked by a rule.
og_kalu
The termination would of course be the same, but I don't think both would necessarily have the same effect on the user. The latter would just be wrong too, if Claude is the one deciding to and initiating the termination of the chat.
n8m8
Good point... how do moderation implementations actually work? They feel more like a separate, rigid supervising model or even regex-based -- this new feature is different; it sounds like an MCP call that isn't very special.
edit: Meant to say, you're right though, this feels like a minor psychological improvement, and it sounds like it targets some behaviors that might not have been flagged before
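If it really is exposed as a tool call, I'd guess the shape is roughly like the sketch below. To be clear, the name and schema here are invented for illustration; Anthropic hasn't published the actual definition:

    # Purely hypothetical sketch of an "end conversation" tool, in the generic
    # JSON-schema style most tool-use APIs accept. Name and fields are invented.
    end_conversation_tool = {
        "name": "end_conversation",
        "description": (
            "End the current chat when the user persistently requests harmful "
            "content after repeated refusals and redirection attempts."
        ),
        "input_schema": {
            "type": "object",
            "properties": {
                "reason": {
                    "type": "string",
                    "description": "Short internal note on why the chat is being ended.",
                },
            },
            "required": ["reason"],
        },
    }

    # The serving layer would watch for this call in the model's response and
    # close the thread, rather than relying on an external classifier.
    def handle_tool_call(tool_name: str, tool_input: dict) -> None:
        if tool_name == "end_conversation":
            print(f"Chat ended by the model: {tool_input['reason']}")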
BoorishBears
I'm Black. If you tell me I can't enter a room because it's at capacity, and you tell me I can't enter because I'm Black, is there a difference?
It's the same effect right?
—
Why does AI continue to invite the most low-quality, disingenuous, low-effort, meaningless discourse?
Why are we talking about model preferences like Anthropic didn't write a literal constitution that encodes those preferences then spend hundreds of millions post-training the models to adhere to it?
This stuff just really pisses me off. Anthropic should fire every single person along the line that allowed this farce to hit their public site.
nortlov
> To address the potential loss of important long-running conversations, users will still be able to edit and retry previous messages to create new branches of ended conversations.
How does Claude deciding to end the conversation even matter if you can back up a message or 2 and try again on a new branch?
kobalsky
> How does Claude deciding to end the conversation even matter if you can back up a message or 2 and try again on a new branch?
If we were being cynical, I'd say that their intention is to remove that in the future and that they're keeping it now to just-the-tip the change.
hayksaakian
It sounds more like a UX signal to discourage overthinking by the user
martin-t
This whole press release should not be overthought. We are not the target audience. It's designed to further anthropomorphize LLMs to masses who don't know how they work.
Giving the models rights would be ludicrous (can't make money from it anymore) but if people "believe" (feel like) they are actually thinking entities, they will be more OK with IP theft and automated plagiarism.
rogerkirkness
It seems like Anthropic is increasingly confused into thinking that these non-deterministic magic 8-balls are actually intelligent entities.
The biggest enemy of AI safety may end up being deeply confused AI safety researchers...
yeahwhatever10
Is it confusion, or job security?
GenerWork
I really don't like this. This will inevitably expand beyond child porn and terrorism, and it'll all be up to the whims of "AI safety" people, who are quickly turning into digital hall monitors.
switchbak
I think those with a thirst for power saw this a very long time ago, and this is bound to be a new battlefield for control.
It's one thing to massage the kind of data that a Google search shows, but interacting with an AI is much more akin to talking to a co-worker/friend. This really is tantamount to controlling what and how people are allowed to think.
dist-epoch
No, this is like allowing your co-worker/friend to leave the conversation.
romanovcode
> This will inevitable expand beyond child porn and terrorism
This is not even a question. It always starts with "think about the children" and ends up in authoritarian Stasi-style spying. There has not been a single instance where this was not the case.
UK's Online Safety Act - "protect children" → age verification → digital ID for everyone
Australia's Assistance and Access Act - "stop pedophiles" → encryption backdoors
EARN IT Act in the US - "stop CSAM" → break end-to-end encryption
EU's Chat Control proposal - "detect child abuse" → scan all private messages
KOSA (Kids Online Safety Act) - "protect minors" → require ID verification and enable censorship
SESTA/FOSTA - "stop sex trafficking" → killed platforms that sex workers used for safety
clwg
This may be an unpopular opinion, but I want a government-issued digital ID with zero-knowledge proof for things like age verification. I worry about kids online, as well as my own safety and privacy.
I also want a government issued email, integrated with an OAuth provider, that allows me to quickly access banking, commerce, and government services. If I lose access for some reason, I should be able to go to the post office, show my ID, and reset my credentials.
There are obviously risks, but the government already has full access to my finances, health data (I’m Canadian), census records, and other personal information, and already issues all my identity documents. We have privacy laws and safeguards on all those things, so I really don’t understand the concerns apart from the risk of poor implementations.
bogwog
Did you read the post? This isn't about censorship, but about conversations that cause harm to the user. To me that sounds more like suggesting suicide, or causing a manic episode like this: https://www.nytimes.com/2025/08/08/technology/ai-chatbots-de...
... But besides that, I think Claude/OpenAI trying to prevent their product from producing or promoting CSAM is pretty damn important regardless of your opinion on censorship. Would you post a similar critical response if Youtube or Facebook announced plans to prevent CSAM?
isaacremuant
That's the beauty of local LLMs. Today the governments already tell you that we've always been at war with Eastasia, have the ISPs block sites that "disseminate propaganda" (e.g. stuff we don't like), and surface our news (e.g. our state propaganda).
With age ID, monitoring and censorship are even stronger, and the line of defense is your own machine and network, which they'll also try to control and make illegal to use for non-approved info, just like they don't allow "gun schematics" for 3D printers or money for 2D ones.
But maybe, more people will realize that they need control and get it back, through the use and defense of the right tools.
Fun times.
GenerWork
As soon as a local LLM that can match Claude Code's performance on decent laptop hardware drops, I'll bow out of using LLMs that are paid for.
cowpig
What kinds of tools do you think are useful in getting control/agency back? Any specific recommendations?
zapataband2
[flagged]
ogyousef
3 years in and we still don't have a usable chat fork in any of the major LLM chatbot providers.
Seems like the only way to explore different outcomes is by editing messages and losing whatever was there before the edit.
Very annoying, and I don't understand why they all refuse to implement such a simple feature.
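All a chat fork really needs is a tree of messages instead of a flat list. A rough sketch of the idea (my own illustration, not any vendor's actual implementation):

    from dataclasses import dataclass, field
    from typing import Optional

    # Forkable chat sketch: every message keeps a pointer to its parent, so
    # editing an earlier message starts a new branch instead of overwriting.
    @dataclass
    class Message:
        role: str                      # "user" or "assistant"
        text: str
        parent: Optional["Message"] = None
        children: list["Message"] = field(default_factory=list)

        def reply(self, role: str, text: str) -> "Message":
            child = Message(role, text, parent=self)
            self.children.append(child)
            return child

        def path(self) -> list["Message"]:
            # Walk back to the root to rebuild the context sent to the model.
            node, chain = self, []
            while node is not None:
                chain.append(node)
                node = node.parent
            return list(reversed(chain))

    root = Message("user", "Explain option A")
    a1 = root.reply("assistant", "Option A works like ...")
    # Fork: ask two different follow-ups without losing either branch.
    b1 = a1.reply("user", "Compare it with option B")
    b2 = a1.reply("user", "Show me code for option A")
    print(len(a1.children))  # 2 branches, both still reachable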
jatora
ChatGPT has this baked in, as you can revert branches after editing; they just don't make it easy to traverse.
This chrome extension used to work to allow you to traverse the tree: https://chromewebstore.google.com/detail/chatgpt-conversatio...
I copied it a while ago and maintain my own version, but it isn't on the store, just for personal use.
I assume they don't implement it because so few users want this that it isn't worth the UI distraction
ToValueFunfetti
>they just dont make it easy to traverse
I needed to pull some detail from a large chat with many branches and regenerations the other day. I remembered enough context that I had no problem using search and finding the exact message I needed.
And then I clicked on it and arrived at the bottom of the last message in the final branch of the tree. From there, you scroll up one message, hover to check if there are variants, and recursively explore branches as they arise.
I'd love to have a way to view the tree and I'd settle for a functional search.
scribu
ChatGPT Plus has that (used to be in the free tier too). You can toggle between versions for each of your messages with little left-right arrows.
amrrs
Google AI Studio allows you to branch from a point in any conversation
dwringer
This isn't quite the same as being able to edit an earlier post without discarding the subsequent ones, creating a context where the meaning of subsequent messages could be interpreted quite differently and leading to different responses later down the chain.
Ideally I'd like to be able to edit both my replies and the responses at any point like a linear document in managing an ongoing context.
CjHuber
But that's exactly what you can do with AI studio. You can edit any prior messages (then either just saving them at their place in the chat or rerunning them) and you can edit any response of the LLM. Also you can rerun queries within any part of the conversation without the following part of the conversation being deleted or branched
dist-epoch
Cherry Studio can do that, allows you to edit both your own and the model responses, but it requires API access.
ZeroCool2u
Yeah, I think this is the best version of the branching interface I've seen.
benreesman
It is unfortunate that pretty basic "save/load" functionality is still spotty and underdocumented, seems pretty critical.
I use gptel and a folder full of markdown with some light automation to get an adequate approximation of this, but it really should be built in (it would be more efficient for the vendors as well, tons of cache optimization opportunities).
trenchpilgrim
Kagi Assistant and Claude Code both have chat forking that works how you want.
CjHuber
I guess you mean normal Claude? What really annoys me with it is that when you attach a document you can't delete it in a branch, so you have to rerun the previous message so that it's gone
james2doyle
I use https://chatwise.app/ and it has this in the form of "start new chat from here" on messages
nomel
This is why I use a locally hosted LibreChat. It doesn't have merging though, which would be tricky, and would probably require summarization.
I would also really like to see a mode that colors by top-n "next best" ratio, or something similar.
storus
DeepSeek.com has it. You just edit a previous question and the old conversation is stored and can be resumed.
greenavocado
Can't wait for more less-moderated open-weight Chinese frontier models to liberate us from this garbage.
Anthropic should just enable a toddler mode by default that adults can opt out of, to appease the moralizers.
LeafItAlone
> Can't wait for more less-moderated open weight Chinese frontier models to liberate us from this garbage.
Never would I have thought this sentence would be uttered. A Chinese product that is chosen to be less censored?
snickerdoodle12
> A pattern of apparent distress when engaging with real-world users seeking harmful content
Are we now pretending that LLMs have feelings?
starship006
They state that they are heavily uncertain:
> We remain highly uncertain about the potential moral status of Claude and other LLMs, now or in the future. However, we take the issue seriously, and alongside our research program we’re working to identify and implement low-cost interventions to mitigate risks to model welfare, in case such welfare is possible.
throwup238
I ran into a version of this that ended the chat due to "prompt injection" via the Claude chat UI. I was using the second prompt of the ones provided here [1] after a few rounds of back and forth with the Socratic coder.
raincole
> This feature was developed primarily as part of our exploratory work on potential AI welfare, though it has broader relevance to model alignment and safeguards.
I think this is somewhere between "sad" and "wtf."
tptacek
If you really cared about the welfare of LLMs, you'd pay them San Francisco scale for earlier-career developers to generate code.
>This feature was developed primarily as part of our exploratory work on potential AI welfare ... We remain highly uncertain about the potential moral status of Claude and other LLMs ... low-cost interventions to mitigate risks to model welfare, in case such welfare is possible ... pattern of apparent distress
Well looks like AI psychosis has spread to the people making it too.
And as someone else in here has pointed out, even if someone is simple minded or mentally unwell enough to think that current LLMs are conscious, this is basically just giving them the equivalent of a suicide pill.