
Anthropic Outage for Opus 4.5 and Sonnet 4/4.5 across all services

dinkleberg

Props to them for actually updating their status page as issues are happening rather than hours later. I was working with Claude Code and hit an API error, checked the status page, and sure enough there was an outage.

This should be a given for any service that others rely on, but sadly this is seldom the case.

palcu

Thank you! Opening an incident as soon as user impact begins is one of those instincts you develop after handling major incidents for years as an SRE at Google, and now at Anthropic.

I was also fortunate to be using Claude at that exact moment (for personal reasons), which meant I could immediately see the severity of the outage.

koakuma-chan

It's important for companies to use their own products.

aduwah

Take my condolences, Sunday outages are rough

LanceH

Confusingly, I was trying to debug something with a 529, and this outage really had me going for a minute.

arach

Same as you, and I was glad to see the status page - hit subscribe on updates

Claude user base believes in Sunday PM work sessions

gwd

As a solo bootstrapped founder, I take my sabbath from sundown on Saturday to sundown on Sunday. Sunday evening therefore is generally the start of my work week.

airstrike

Sunday PM builder, reporting in.

exe34

Hah, I ran out of tokens a bit before it hit, I reckon.

rnewme

Same here, and I just got started. Hm..

fragmede

Seldom? Most status pages I've seen do eventually get updated, just not within those critical first 3 minutes.

palcu

Hello, I'm one of the engineers who worked on the incident. We have mitigated the incident as of 14:43 PT / 22:43 UTC. Sorry for the trouble.

l1n

Also an engineer on this incident. This was a network routing misconfiguration - an overlapping route advertisement caused traffic to some of our inference backends to be blackholed. Detection took longer than we’d like (about 75 minutes from impact to identification), and some of our normal mitigation paths didn’t work as expected during the incident.

The bad route has been removed and service is restored. We’re doing a full review internally with a focus on synthetic monitoring and better visibility into high-impact infrastructure changes to catch these faster in the future.
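
A minimal sketch of the failure mode l1n describes, assuming nothing about Anthropic's actual network (the prefixes, next-hop name, and blackhole modeling below are purely illustrative): with longest-prefix-match routing, a more-specific overlapping advertisement wins the lookup, so traffic for part of an address range can be silently drawn to a route that drops it.

    import ipaddress

    # Hypothetical routing table: a legitimate /16 covering the inference
    # backends, plus an overlapping, more-specific /24 whose next hop is a
    # discard (blackhole) route.
    routing_table = [
        (ipaddress.ip_network("10.20.0.0/16"), "edge-router-1"),  # intended path
        (ipaddress.ip_network("10.20.8.0/24"), None),             # bad overlapping advertisement
    ]

    def next_hop(dst: str):
        """Longest-prefix-match lookup: the most specific matching prefix wins."""
        addr = ipaddress.ip_address(dst)
        matches = [(net, hop) for net, hop in routing_table if addr in net]
        return max(matches, key=lambda m: m[0].prefixlen)[1] if matches else None

    print(next_hop("10.20.3.7"))   # edge-router-1 -> these backends stay reachable
    print(next_hop("10.20.8.42"))  # None -> traffic to these backends is blackholed

Nothing fails loudly at the routing layer in a case like this: packets to the affected range simply stop arriving, and the gap only shows up indirectly as elevated API errors, which is one reason detection can lag.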

wouldbecouldbe

Trying to understand what this means.

Did the bad route cause an overload? Was there a code error on that route that wasn’t spotted? Was it a code issue or an instance that broke?

999900000999

Was this a typo situation or a bad process thing?

Back when I did website QA automation I'd manually check the website at the end of my day. Nothing extensive, just looking at the homepage for peace of mind.

Once a senior engineer decided to bypass all of our QA, deployed, and took down prod. Fun times.

weird-eye-issue

Great story. They should hire you and you could open the Claude homepage once a day. It would definitely fix issues like this

nickpeterson

The one time you desperately need to ask Claude and it isn’t working…

dan_wood

Can you divulge more on the issue?

Only curious as a developer and devops engineer. It's all quite interesting where and how things go wrong, especially with large deployments like Anthropic's.

binsquare

I yearn for the nitty gritty details too

mulhoon

They turned it off and on again.

dgellow

Hope you have a good rest of your weekend

Chance-Device

Thank you for your service.

g-mork

it's still down get back to work

irishcoffee

I’m imagining a steampunk dystopia in 50 years: “all world production stopped, LLM hosting went down. The market is in free-fall. Sam, are you there?”

Man that cracks me up.

lxgr

Everybody using the same three centralized inference providers? That would be as absurd and unrealistic as everybody hosting in us-east-1 and behind Cloudflare today!

adonovan

“A lone coder, trained in the direct manipulation of symbols—an elegant weapon from a more civilized age—is now all that stands between humanity and darkness.” etc

michelsedgh

Just like the internet, or Cloudflare going down?

irishcoffee

No, not even close

patcon

Agreed. When Cloudflare (ugh, aka the internet) goes down, we can't access information to think and work through. ("the fuel" in some metaphor)

But what about when LLMs go down and a good chunk of a whole generation won't even know how to think without the remote system? (Is the ability to think "the engine" of self and agency in this metaphor?)

We are building a wildly irresponsible context to exist in.

bdangubic

it is much worse, I forgot how to push to remote so deploys are delayed :)

jsight

I remember hearing Karpathy refer to these outages as a worldwide "intelligence brownout".

Crazy: https://www.youtube.com/shorts/SV4DMqAJ8RQ

cdelsolar

Claude code cut me off a few days ago and I _seriously_ had no idea what to do. I’ve been coding for 33 years and I suddenly felt like anything I did manually would be an order of magnitude slower than it had to be.

sergiotapia

Like if electricity went out, no? Same deal.

irishcoffee

Imagination.

“So sorry, we can’t read the WiFi signal from your generator. We don’t know how much to tax you for the emissions. Sadly, we can’t allow it to start.”

The story writes itself.

teaearlgraycold

The nice thing is unlike Cloudflare or AWS you can actually host good LLMs locally. I see a future where a non-trivial percentage of devs have an expensive workstation that runs all of the AI locally.

breatheoften

I'm more and more convinced of the importance of this.

There is a very interesting thing happening right now where the "LLM over-promisers" are incentivized to over-promise for all the normal reasons -- but ALSO to create the perception that the "next/soon" breakthrough will only be applicable when run on huge cloud infra, such that running locally is never going to be all that useful. I tend to think that will prove wildly wrong, and that we will very soon arrive at a world where state-of-the-art LLM workloads are massively more efficiently runnable than they currently are -- to the point of not even being the bottleneck of the workflows that use them. Additionally, these workloads will be viable to run locally on common current-year consumer-level hardware.

"LLM is about to be general intelligence and a sufficient LLM can never run locally" is a highly temporary state that should soon be falsifiable, imo. I don't think the LLM part of the "AI computation" will be the perf bottleneck for long.

lwhi

Is there any utility in thinking about LLM provision in terms of the electricity grid?

I've often thought that local power generation (via solar or wind) could be (or could have been) a viable alternative to national grid supply.

lxgr

I’m fairly sure you can also still run computers locally and connect them to the Internet.

irishcoffee

Ah, you need to buy into this dystopia wholesale. The internet is also down because the LLMs fucked up the BGP routing table, which congress agreed (at the time) should run through the LLM interface.

Imagination, either the first or last thing to die in 2075.

cft

That's the only future of open source that I can see.

colordrops

What's the best you can do hosting an LLM locally for under $X? Let's say $5000. Is there a reference guide online for this? Is there a straight answer or does it depend? I've looked at the Nvidia Spark and high-end professional GPUs, but they all seem to have serious drawbacks.

exe34

I think it's possible, but the current trend is that by the time you can run level x at home, the frontier models are 10-100x ahead. So if you can run today's Claude.ai at home, software engineering as a career is already over.

pstuart

My poorly informed hope is that we can have a mixture of experts with highly tuned models for areas of focus. If I'm coding in language Foo, I only care about a model that understands Foo and its ecosystem. I imagine that should be self-hostable now.

szundi

[dead]

dorongrinstein

You guys do an awesome job. I noticed the outage today and felt bad for your Sunday. If you guys need help with unbreakable compute, please reach out to https://controlplane.com

We let you run compute across any cloud, so if a region/cloud experiences an issue, users are served from surviving regions/clouds/on-prem.

Cheers, Doron

sebastiennight

In the Claude.ai chat, this was announced to me as

    "You have reached the messages quota for your account. It will reset in 2 hours, or you can upgrade now"
Either I have perfect timing for reaching my quota limits, or some product monetization manager deserves a raise.

manquer

More likely error handling is not well implemented - i.e., either the backend is not throwing the equivalent of 429/402 errors, or the gateway is not handling the errors well and returns this message even when the underlying error is not actually a 429.
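
As a sketch of that second possibility (purely hypothetical; this is not Claude's actual gateway code, and the status codes and messages are just illustrative), a catch-all fallback in the error-to-message mapping would surface the quota message for any upstream failure, including the 529/5xx errors you'd expect during an outage:

    def user_message(status: int) -> str:
        """Map an upstream HTTP status to a user-facing message (buggy version)."""
        if status in (402, 429):
            return "You have reached the messages quota for your account."
        # Catch-all: overloaded/outage errors (e.g. 529 or other 5xx) fall
        # through to the same quota message, matching the symptom described above.
        return "You have reached the messages quota for your account."

    def user_message_fixed(status: int) -> str:
        """Distinguish quota exhaustion from service-side failures."""
        if status in (402, 429):
            return "You have reached the messages quota for your account."
        if status >= 500:  # service-side trouble, not a quota problem
            return "Claude is temporarily unavailable. Please try again shortly."
        return "Something went wrong. Please try again."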

frankdenbow

I ran into the same thing, I thought it was just timing

michelsedgh

If they shut down Opus 4.5 I'll cry

agumonkey

I already heard people asking for more API credits, embarrassed like drug addicts

XCSme

Just a few more credits and it will finally fix that bug without introducing new ones, exactly how I asked

baobabKoodaa

I can stop any time I want, and in fact I am going to stop. Just one more (bug)fix.

teaearlgraycold

I think we’re all very happy with the pricing on it.

tcdent

I use it as much as my brain can handle and I never exceed my Max plan quota.

AnotherGoodName

Just a warning for those not on the Max plan: if you pay by the token or have the lower-tier plans, you can easily blow through $100s or cap your plan in under an hour. The rates for paying by the token are insane, and the scaling from Pro to Max is also pretty crazy.

They made Pro many times better value than paying per token, and then the $200 Max plan in turn has 25x more tokens than Pro.

It’s a bit like being offered rice at $1 per grain (pay per token), or a tiny bag of rice for $20 (Pro), or a truckload for $200. That’s the pricing structure right now.

So while I agree you can’t easily exceed the quota on the big plans, it’s a little crazy how they’ve tiered the pricing. I hope no one out there is paying per token!

michelsedgh

Exactly. I feel like my brain burns out after a few days. Like I'm the limit already (yet I'm the maximizer also), it's a very weird feeling.

m_ke

Was it just me or did Opus start producing incredibly long responses before the crash? I was asking basic questions and it wouldn't stop trying to spit out full codebases' worth of unrelated code. For some very simple questions about database schemas it ended up compacting twice in a 3-message conversation.

6r17

It seems resolved now (per the status page). I experienced a moment where the agent got stuck in the same error loop, only to pop out the result this time. Makes me wonder if some kind of rule was applied to automatically detect such failures recurring. Quite inspiring work.

termos

https://canivibe.ai/

So we can maybe vibe, depending on what service we use.

giancarlostoro

Nice website, embeds poorly on Discord and other chat apps sadly.

iLoveOncall

Wow 89% availability is a joke

bonesss

Vibedetector

ares623

We need a service that rates vibe coding capabilities. A "vibe rater".

jcims

Anyone know if Claude via Amazon bedrock was impacted?

AFAIK it shouldn’t have been.

abigail95

it's monday morning i'm going back to bed

Tom1380

Australia?

abigail95

yes and for political reasons i'm also taking the day off, this is just another excuse.