Claude Code weekly rate limits
494 comments · July 28, 2025
Wowfunhappy
I'm probably not going to hit the weekly limit, but it makes me nervous that the limit is weekly as opposed to every 36 hours or something. If I do hit the limit, that's it for the entire week—a long time to be without a tool I've grown accustomed to!
I feel like someone is going to reply that I'm too reliant on Claude or something. Maybe that's true, but I'd feel the same about the prospect of losing ripgrep for a week, or whatever. Losing it for a couple of days is more palatable.
Also, I find it notable they said this will affect "less than 5% of users". I'm used to these types of announcements claiming they'll affect less than 1%. Anthropic is saying that one out of every 20 users will hit the new limit.
el_benhameen
This is how I feel about the 100 msg/wk limit on o3 for the ChatGPT plus plan. There’s no way to see how much I’ve used, and it’s an important enough resource that my lizard brain wants to hoard it. The result is that I way underutilize my plan and go for one of the o4-mini models instead. I would much prefer a lower daily limit, but maybe the underutilization is the point of the weekly limit.
*edited to change “pro” to “plus”
landl0rd
You can tell it's intentional with both OpenAI and Anthropic by how the limits are made opaque. I can't see a nice little bar showing how much I've used versus how much I have left of a given rate limit, so it pressures users to hoard, because it prevents them from budgeting it out and saying "okay, I've used 1/3 of my quota and it's Wednesday, I can use more faster."
canada_dry
OpenAI's "PRO" subscription is really a waste of money IMHO for this and other reasons.
Decided to give PRO a try when I kept getting terrible results from the $20 option.
So far it's perhaps 20% improved in complex code generation.
It still has the extremely annoying ~350 line limit in its output.
It still IGNORES EXPLICIT, CONTINUAL INSTRUCTIONS, e.g.: do not remove existing comments.
The opaque overriding rules - despite it begging forgiveness when it ignores instructions - are extremely frustrating!!
xpe
> pressures users to hoard
As a pedantic note, I would say 'ration'. Things you hoard don't magically go away after some period of time.
sothatsit
Anthropic also does this because they will dynamically change the limits to manage load. Tools like ccusage show you how much you've used and I can tell sometimes that I get limited with significantly lower usage than I would usually get limited for.
brookst
I think the simple product prioritization explanation makes way more sense than a second-order conspiracy to trick people into hoarding.
Reality is probably that there’s a backlog item to implement a view, but it’s hard to prioritize over core features.
sitkack
Working as Intended.
Wowfunhappy
Well, kind of. If you don't use it at all you're going to unsubscribe.
This isn't like a gym membership where people join aspirationally. No one's new year's resolution is "I'm going to use o3 more often."
gfiorav
I nervously hover over the VSCode Copilot icon, watching the premium requests slowly accumulate. It’s not an enjoyable experience (whether you know how much you've used or not :) )
milankragujevic
Where did you find this info? I am unable to find it on OpenAI's website. https://help.openai.com/en/articles/6950777-what-is-chatgpt-...
I haven't yet run into this limit...
oc1
They know this psychology. This dark pattern is intentional so you will use their costly service less.
hn_throwaway_99
I don't think this counts as a "dark pattern". The reality is that these services are resource constrained, so they are trying to build in resource limits that are as fair as possible and prevent people from gaming the system.
artursapek
Just curious, what do people use these expensive reasoning models for?
littlestymaar
> This is how I feel about the 100 msg/wk limit on o3 for the ChatGPT
Do I read this correctly? Only 100 messages per week, on the pro plan worth a few hundred bucks a month?!
CSMastermind
That's definitely not correct because I'm on the pro plan and make extensive use of o3-pro for coding. I've sent 100 messages in a single day with no limitation.
Per their website: https://help.openai.com/en/articles/9793128-what-is-chatgpt-...
There are no usage caps on pro users (subject to some common sense terms of use).
mhl47
No it's 100 a week for plus users.
doorhammer
I think it’s just a typo.
I have a pro plan and I hammer o3 - I’d guess more than a hundred messages a day sometimes - and have never run into limits personally
Wouldn’t shock me if something like that happened but haven’t seen evidence of it yet
jstummbillig
If it behaves anything like the GPT-4.5 Limit, it will let you know when you near the limit.
yencabulator
Claude daily limit sure didn't.
forty
It makes me sad that devs start relying on proprietary online services to be able to work. We have been lucky enough to have FOSS tools to do everything and not to have to rely on any specific service or company to work and some of us are deciding to become like Monsanto-addicted farmers who forgot how to do their jobs without something they have to pay every month.
pythonguython
Do you mind sharing what industry you’re in where you can fully rely on FOSS? In my industry we’re dependent on MATLAB, Xilinx tools, closed source embedded software and more. To name a few industries: game devs might be stuck with unity, finance quant devs might be stuck with Bloomberg terminals, iOS app devs are stuck with apple’s tooling etc… this isn’t just an LLM problem IMO.
brookst
I remember being beholden to commercial (cough, pirated, cough) compilers and assemblers back in the day. FOSS is awesome but often lags because capital sees a chance to make money and can move faster.
It will change. There will be FOSS models, once it no longer takes hundreds of millions of dollars to train them.
danielbln
The very recent agentic open-weights models seem to be shaping up, so if all else fails you can host one of these yourself (if you have the VRAM) or have it hosted somewhere.
sneak
We can and did work without it. It just makes us many times faster.
Nothing about using an LLM removes skills and abilities you already had before it.
3836293648
For the industry as a whole it absolutely does. And for the individual it absolutely does kill your ability to do it unless you actually do practice.
And yes, the goal might be to only use it for boilerplate or first draft. But that's today, people are lazy, just wait for the you of tomorrow
hoppp
My experience was that reviewing generated code can take longer than writing it from scratch.
There was research about vibe coding that had similar conclusion. Feels productive but can take longer to review.
The moment you generate code you don't instantly understand, you are better off reading the docs and writing it yourself.
umbra07
If you don't use a skill, it atrophies.
Now, maybe that is the future (no more/extremely little human-written code). Maybe that's a good thing in the same way that "x technological advancement means y skill is no longer necessary" - like how the advent of readily-accessible live maps means you don't need to memorize street intersections and directions or whatever. But it is true.
blitzar
> Anthropic is saying that one out of every 20 users will hit the new limit.
I regularly hit the Pro limits 3 times a day using Sonnet. If I use Claude Code & Claude, it's over in about 30 minutes. No multi-agent 24/7 whatever, no multiple windows open (except using Claude to write a letter between Claude Code thoughts).
I highly doubt I am a top 5%er - but I won't be shocked if my week ends on a Wednesday. I was just starting to use Claude chat more since it's in my subscription, but if I can't rely on it to be available for multiple days, it's functionally useless - I won't even bother.
Aurornis
> If I use claude code & claude its over in about 30 minutes.
Can you share what you're doing? I've been experimenting with Claude Code and I feel like I have to be doing a lot with it before I even start seeing the usage warning limits on the $20/month plan.
When I see people claiming they're getting rate limited after 30 minutes on the $100/month plan I have a hard time understanding what they're doing so different.
For what it's worth I don't use it every day, so maybe there's a separate rate that applies to heavy and frequent users?
flutas
$20/mo plan doesn't include opus (the larger model) like the $100+ plans do, it's likely they are hitting the opus limit which is fairly low.
bogtog
> I highly doubt I am a top 5%er - but wont be shocked if my week ends on a wednessday. I was just starting to use Claude chat more as it is in my subscription but if I can not rely on it to be available for multiple days its functionally useless - I wont even bother.
You very well might be a top 5%er among people only on the Pro rather than Max plan
ketzo
What does your Claude code usage look like if you’re getting limited in 30 minutes without running multiple instances? Massive codebase or something?
blitzar
I set Claude to writing docstrings on a handful of files - 4 or 5 files, a couple hundred lines each, a couple of classes in each - it didn't need to scan the codebase (much).
Low-danger task, so I let it do as it pleased - 30 minutes and I was maxed out. Could probably have reduced context with a /clear after every file, but then I would have had to participate.
rapind
I think you'll want to set your /model to not use Opus. Strangely unintuitive, but I opted out of Opus on the Max plan myself and haven't really had any usage issues since.
_jab
> Anthropic is saying that one out of every 20 users will hit the new limit.
Very good point, I find it unlikely that 1/20 users is account sharing or running 24/7 agentic workflows.
Terretta
Moreover, if you run a SaaS, generally somewhere from 1 in 5 to 1 in 20 users are using you for real, while the others are mostly not using you.
The stat would be more interesting if instead of 1 in 20 users, they said x in y of users with at least one commit per business day, or with at least one coding question per day, or whatever.
I suspect this could be a significantly higher percentage of professional users they plan to throttle. Be careful of defining Pro like Apple does if you market to actual professionals who earn based on using your product. Your DAUs might be a different ratio than you expect.
0cf8612b2e1e
… if you run a SaaS, generally somewhere from 1 in 5 to 1 in 20 users are using you for real, while the others are mostly not using you.
That is a hilarious and believable stat. Has anyone published such numbers, or is it a dirty secret how many corporate licenses are purchased and never used by the rank and file? I can personally think of a few internally licensed products, announced with huge fanfare, which never got used beyond the demo to a VP.
rapind
A decent chunk of users (more than 1/20) account-shared Netflix. Also there are probably some who are account sharing with more than one other person. I don't really doubt it.
furyofantares
> I'm probably not going to hit the weekly limit, but it makes me nervous that the limit is weekly as opposed to every 36 hours or something. If I do hit the limit, that's it for the entire week—a long time to be without a tool I've grown accustomed to!
Well, not the entire week, however much of it is left. You said you probably won't hit it -- if you do, it's very likely to be in the last 36 hours (20% of a week) then, right? And you can pay for API usage anyway if you want.
arghwhat
> but I'd feel the same about the prospect of loosing ripgrep for a week, or whatever. Loosing it for a couple of days is more palatable.
Just to nitpick: when the limit is a week, going over it does not mean losing access for a week, but only for the remaining time - which, assuming the limits aren't overly aggressive, would mean losing access for at most a couple of days (which you say is more palatable).
I wouldn't say you're too reliant, but it's still good to stay sharp by coding manually every once in a while.
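The "remaining time" point depends on how the window works. A rolling (sliding) window can be sketched in a few lines - quota and window length here are made-up illustrative numbers, not Anthropic's actual logic:

```python
from collections import deque


class RollingWindowLimiter:
    """Allow at most `quota` units within any trailing `window_s` seconds.

    Capacity returns as old usage ages out of the window, so exhausting
    the quota rarely means waiting out the full window length.
    """

    def __init__(self, quota: int, window_s: float):
        self.quota = quota
        self.window_s = window_s
        self.events = deque()  # (timestamp, cost) pairs inside the window

    def allow(self, now: float, cost: int = 1) -> bool:
        # Drop usage that has aged out of the trailing window
        while self.events and now - self.events[0][0] >= self.window_s:
            self.events.popleft()
        used = sum(c for _, c in self.events)
        if used + cost > self.quota:
            return False  # over quota until older usage ages out
        self.events.append((now, cost))
        return True
```

With a week-long window, a user who burned their quota gradually gets it back gradually; only someone who spent it all in one burst waits close to the full week.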
arach
if it affects only a minority of accounts, why not figure out how to special-case them without affecting everyone else? that's the primary question I would ask myself if I worked on this
the principle: let's protect against outliers without rocking the behavior of the majority, not at this stage of PMF and market discovery
i'd also project out just how much the compute would cost for the outlier cohort - are we talking $5M, $100M, $1B per year? And then what behaviors will simply be missed by putting these caps in now - is it worth missing out on success stories coming from elite and creative users?
I'm sure this debate was held internally but still...
vineyardmike
Because the goal is to extract more money from the people who have significant usage. These users are the actual targets of the product. The idea that it’s a few bad actors is misdirection of blame to distract “power users”.
They undercharged for this product to collect usage data to build better coding agents in the future. It was a ploy for data.
Anecdotally, I use Claude Code with the $20/mo subscription. I just use it for personal projects, so I figured $20 was my limit on what I’d be willing to spend to play around with it. I historically hit my limits just a few times, after ~4hrs of usage (resets every 5hrs). They recently updated the system and I hit my limits consistently within an hour or two. I’m guessing this weekly limit will affect me.
I found a CLI tool (which I found in this thread today) that estimates I’m using ~$150/mo in usage if I paid through the API. Obviously this is very different from my payments. If this was a professional tool, maybe I’d pay, but not as a hobbyist.
Uehreka
> why not figure out how to special case them without affecting everyone else
I’m guessing that they did, and that that’s what this policy is.
If you’re talking about detecting account sharing/reselling, I’m guessing they have some heuristics, but they really don’t want the bad press from falsely accusing people of that stuff.
arach
fair enough - DS probably ran through data and came up with 5% and some weekly cutoff as a good starting point until they have better measures in place
my point is that 5% is still a large cohort, and they happen to be your most excited/creative cohort. they might not all want to pay a surcharge yet while everyone is discovering the use cases / patterns / etc
having said that, entirely possible burn rate math and urgency requires this approach
data-ottawa
They did have several outages last week, it would be good to find better plans for those huge users but I can also see them wanting to just stop the bleeding.
arach
I've noticed the frequent perf issues and I'm on the 20x plan myself - good point that you'd want to stop the bleeding or bad actors to make sure the majority have a better experience
Aurornis
> why not figure out how to special case them without affecting everyone else is the primary question I would ask myself if I worked on this
The announcement says that using historical data less than 5% of users would even be impacted.
That seems kind of clear: The majority of users will never notice.
arach
5% of a large number is a large number - this is why it's both a significant problem for them and why I'm thinking out loud about the downsides of discouraging good actors who happen to be power users.
that 5% is probably the most creative and excited cohort. obviously it's critical to not make the experience terrible for the 95% core, but i'd hate to lose even a minority of the power users who want to build incredible things on the platform
having said that, the team is elite - I'm sure they are thinking about all angles of this issue
nharada
What do you think they should have done instead?
actsasbuffoon
At a bare minimum there needs to be some way to understand how close you are to these limits. People shouldn’t be wondering if this is going to impact them or not.
arach
It’s tricky without seeing the actual data. 5% of a massive user base can still be a huge number so I get that it’s hard to be surgical.
But those power users are often your most creative, most productive, and most likely to generate standout use cases or case studies. Unless they’re outright abusing the system, I’d lean toward designing for them, not against them.
if the concern is genuine abuse, that feels like something you handle with escalation protocols: flag unusual usage, notify users, and apply adaptive caps if needed. Blanket restrictions risk penalizing your most valuable contributors before you’ve even discovered what they might build
bananapub
> if it affects only a minority of accounts, why not figure out how to special case them without affecting everyone else
that's exactly what they have done - the minority of accounts that consume many standard deviations above the mean of resources will be limited, everyone else will be unaffected.
arach
"You're absolutely right!" I misread the announcement - I thought everyone moved primarily to a weekly window, but it seems like the 5hr window is still in place and they're adding another level of granularity that DS teams will adjust to cut off mostly bad actors.
correct me if I'm wrong, but it's not like we have visibility into the token limit logic, even on the 5hr window?
jonas21
You can use an API key to get more usage on a pay-as-you-go basis.
blitzar
You can set cash on fire if you want to.
thejazzman
i've gotten months of usage out of openai and claude where i seeded each with only $5
but if you use an agent and it tries to include a 500kb json file, yeah, you will light cash on fire
(this happened to me today, but the rate limit brought it to my attention.)
fullstackwife
you spend $400 per month on API usage, but your AI builds the next unicorn worth billions - where is the problem?
Wowfunhappy
Yes, but that's so expensive I will never do it!
tqwhite
With one months exception, I've never gotten past $150 with API. I plan to do the $100 plan and use the API for overflow. I think I will come out ahead.
slimebot80
Overall I think this is a positive - protecting the system from users hitting it heavily 24/7 with multiple agents might make it more sustainable for the wider population of users.
The one thing that bugs me is visibility into how far through your usage you are. Only being told when you're close to the end means I cannot plan. I'm not expecting an exact %, but a few notices at intervals (eg: halfway through) would help a lot. Not providing this kinda makes me worry they don't want us to measure. (I don't want to closely measure, but I do want to have a sense of where I am at)
jimbo808
I'm not sure how this will play out long term, but I really am not a fan of having to feel like I'm using a limited resource whenever I use an LLM. People like unlimited plans, we are used to them for internet, text messaging, etc. The current pricing models just feel bad.
andruby
Unlimited works well for everything that is “too cheap to meter”.
Internet, text messages, etc are roughly that: the direct costs are so cheap.
That’s not the case with LLM’s at this moment. There are significant direct costs to each long-running agent.
rmujica
Internet and SMS used to be expensive and metered until they weren't thanks to technological advances and expanded use. I think LLMs will follow the same path, maybe on a shorter timespan.
cmsjustin
They were not expensive to operate, they were only expensive for consumers
alwillis
Yes and no.
It’s very expensive to create these models and serve them at scale.
Eventually the processing power required to create them will come down, but that’s going to be a while.
Even if there was a breakthrough GPU technology announced tomorrow, it would take several years before it could be put into production.
And pretty much only TSMC can produce cutting edge chips at scale and they have their hands full.
Between Anthropic, xAI and OpenAI, these companies have raised about $84 billion dollars in venture capital… VCs are going to want a return on their investment.
So it’s going to be a while…
margalabargala
SMS was designed from the start to fit in the handful of unused bytes in the tower handshake that was happening anyway, hence the 160-char limit. Its marginal cost has always been essentially zero on the supply side.
xtracto
I think LLMs follow more of an Energy analogy: Gas or Electricity, or even water.
How much has any of these decreased over the last 5 decades? The problem is that, as of right now, LLM cost is linearly (if not superlinearly) related to the output. It's basically "transferring energy" converted into bytes. So unless we see some breakthrough in energy generation, or use it better, it will be difficult to scale.
This makes me wonder: would it be possible to pre-compute some kind of "rainbow tables" equivalent for LLMs? Either stored on the client or the server, so as to reduce the computing needed for inference.
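Something in that direction already exists in a limited form: memoizing whole responses by prompt hash. A toy sketch of that idea (the `run_model` callable stands in for an actual, expensive inference call - it is not a real API):

```python
import hashlib


def prompt_key(prompt: str) -> str:
    # Stable key for an exact prompt; a real system would normalize first
    return hashlib.sha256(prompt.encode("utf-8")).hexdigest()


class CachedLLM:
    """Memoize full responses so repeated identical prompts skip inference."""

    def __init__(self, run_model):
        self.run_model = run_model  # the expensive inference call
        self.cache = {}
        self.hits = 0

    def generate(self, prompt: str) -> str:
        key = prompt_key(prompt)
        if key in self.cache:
            self.hits += 1
            return self.cache[key]
        response = self.run_model(prompt)
        self.cache[key] = response
        return response
```

The catch, and why the rainbow-table analogy breaks down: this only pays off for exact repeats, and the input space of an LLM is effectively unbounded, whereas a password hash space is enumerable.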
whimsicalism
maybe, but they are not nearly as comparable as you’re making it out to be
MuffinFlavored
> That’s not the case with LLM’s at this moment.
I'd be curious to know how many tokens the average $200/mo user uses and what the cost on their end for it is.
KronisLV
I personally take an issue with them expecting that your usage would be more or less consistent throughout the month. Instead, I might have low usage throughout most of the month and then an 11 hour binge a few days, which in most cases would involve running into rate limits (either that, or just token limitations for inputs).
That's why using the API directly and paying for tokens past that basic usage feels a bit nicer, since it's my wallet that becomes the limitation, not some arbitrary limits dreamed up by others. Plus, with something like OpenRouter, you can also avoid subscription-tier-related limits like https://docs.anthropic.com/en/api/rate-limits#rate-limits
Though for now Gemini 2.5 Pro seems to work a bit better than Claude for my code writing/refactoring/explanation/exploration needs. Curious what other cost competitive options are out there.
tqwhite
This is my strategy as well. I definitely have surges of usage.
Except for one catastrophic month where I accidentally left Opus on for a whole binge (KILL ME!!!), I use around $150/month. I like having the spigot off when I am not working.
Would the $100/month plan plus API for overflow come out ahead? Certainly on some months. Over the year, I don't know. I'll let you know.
bugglebeetle
Gemini 2.5 Pro is a better coding model, but Gemini CLI is way behind Claude Code, perhaps because the model itself isn’t well-tuned for agentic work. If you’re just doing targeted refactoring and exploration, you can copy and paste back and forth from the web app for $20 a month.
KronisLV
I mostly use RooCode nowadays, which works well enough with both Claude and Gemini and other models, even locally hosted ones. Decoupling the LLM vendor from the tools might miss out on some finer features, but also gives me a little bit more freedom, much like how you can also do with the Copilot plugins and Continue.dev and a few others.
Note: all of them sometimes screw up applying diffs, but in general are good enough.
ewoodrich
Gemini 2.5 Pro made some big post-launch improvements for tool calling/agentic usage that made it go from “pulling teeth” to “almost as smooth as Claude” in Cline/Roo Code (which is saying something since Cline was originally built around Claude tool use specifically).
So the team at least seems to be aware of its shortcomings in that area and working to improve it with some success which I appreciate.
But you are correct that Gemini CLI still lags behind for whatever reason. It gets stuck in endless thought loops way too often for me, like maybe 1/15 tasks hits a thought loop burning API credits or it just never exits from the “Completing task, Verifying completion, Reviewing completion, Assessing completion status…” phase (watching the comical number of ways it rephrases it is pretty funny though).
Meanwhile I’ve only had maybe one loop over a period of a couple months using Gemini 2.5 Pro heavily in Roo Code with the most recent version so it seems like an issue with the CLI specifically.
j45
Can anyone offer a cost comparison between Gemini 2.5 Pro and Claude Code, on a plan or via the API?
Jcampuzano2
My opinion is that all of these tools should completely get rid of the "pay 20/month, 200/month", etc. tiers that just grant access to some opaque, rate-limited allotment that's hard to track.
Mask off completely and just make it completely usage based for everyone. You could do something for trial users like first 20 (pick your number here) requests are free if you really need to in order to get people on board. Or you could do tiered pricing like first 20 free, next 200 for X rate, next 200 for X*1.25 rate, and then for really high usage users charge the full cost to make up for their extreme patterns. With this they can still subsidize for the people who stay lower on usage rates for market share. Of course you can replace 200 requests with just token usage if that makes sense but I'm sure they can do the math to make it work with request limits if they work hard enough.
Offer better than open-router pricing and that keeps people in your system instead of reaching for 3rd party tools.
If your tool is that good, even with usage based it will get users. The issue is all the providers are both subsidizing users to get market share, but also trying to prohibit bad actors and the most egregious usage patterns. The only way this 100% becomes a non-issue is usage based for everything with no entry fee.
But this also hurts some who pay a subscription but DON'T use enough to account for the usage-based fees. So some salespeople probably don't like that option either. It also makes it easier for people to shop around instead of feeling stuck for a month or two, since most people don't want multiple subs at once.
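The tiered scheme described above is easy to make precise. A sketch, where every tier boundary and rate is a made-up illustrative number:

```python
def tiered_price(requests: int) -> float:
    """Price a month of usage under illustrative tiers:
    first 20 requests free, next 200 at $0.10 each,
    next 200 at $0.125 each, everything beyond at $0.20 each."""
    tiers = [                 # (tier size, price per request)
        (20, 0.0),
        (200, 0.10),
        (200, 0.125),
        (float("inf"), 0.20),
    ]
    total = 0.0
    remaining = requests
    for size, rate in tiers:
        in_tier = min(remaining, size)
        total += in_tier * rate
        remaining -= in_tier
        if remaining <= 0:
            break
    return total
```

So a light user pays nothing, a moderate user at 220 requests pays $20 (roughly today's subscription), and the heaviest users pay their way instead of being cut off.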
vineyardmike
> My opinion is all of these tools should completely get rid of the "pay 20/month, 200/month", etc just to get access.
I think that you should just subscribe to a preset allotment of tokens at a certain price, or a base tier with incremental usage costs for models that aren’t tiny (like paid per minute “long distance calling”).
I use an LLM tool that shows the cost associated with each message/request and most are pennies each. There’s a point where the friction of paying is a disincentive to using it. Imagine you had to pay $0.01 every time you Google searched something? Most people would never use the product because trying to pay $0.30/mo for one day a month of usage is annoying. And no one would want to prepay and fund an account if you weren’t familiar with the product. No consumer likes micro transactions
No one wants to hear this, but the answer is advertising and it will change the game of LLMs. Once you can subsidize the lowest end usage, the incentive for businesses to offer these $20 subscriptions will change, and they’d charge per-usage rates for commercial users.
troupo
> you should just subscribe to a preset allotment of tokens at a certain price
The problem is that there's no way to gauge or control token usage.
I have no idea why Claude Code wrote that it consumed X tokens now, and Y tokens later, and what to do about it
CodeBrad
I think Claude Code also already has the option to provide an API key directly for usage based pricing.
I'm a fan of having both a subscription and a usage based plan available. The subscription is effectively a built in spending limit. If I regularly hit it and need more value, I can switch to an API key for unlimited usage.
The downside is you are potentially paying for something you don't use, but that is the same for all subscription services.
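For reference, the switch to usage-based billing is just an environment variable - Claude Code picks up an API key from the environment if one is set (assuming you've generated a key in the Anthropic console; current behavior, subject to change):

```shell
# Pay-as-you-go: Claude Code bills the API key instead of the subscription
export ANTHROPIC_API_KEY="sk-ant-..."   # your key from the Anthropic console
claude                                  # session is now billed per token
```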
raincole
Given how expensive Claude Code is if you use an API key, I think it's safe to assume the subscription model is bleeding money.
tqwhite
I use API but think about getting the $100/mo plan and using API for overflow if it occurs.
But I have slow months and think that might not actually be the winner. Basically I'm going to wait and see before I sign up for auto-pay.
jononor
Investors love MRR/ARR, so I do not think we will see that as the main option anytime soon. That said, you can use the Claude API to get usage-based billing.
bananapub
> Mask off completely and just make it completely usage based for everyone.
you can already pay per token by giving Claude Code an API key, if you want.
thus, the subtext of every complaint on this thread is that people want "unlimited" and they want their particular use to be under whatever the cap is, and they want it to be cheap.
Wowfunhappy
No, I'm explicitly not saying that! I'm saying that I'd really like the rolling window to be less than a full week, because that's such a long time to wait if I exhaust the limit!
thorum
The long term is unlimited access to local LLMs that are better than 2025’s best cloud models and good enough for 99% of your needs, and limited access to cloud models for when you need to bring more intelligence to bear on a problem.
LLMs will become more efficient, GPUs, memory and storage will continue to become cheaper and more commonplace. We’re just in the awkward early days where things are still being figured out.
pakitan
I'm often using LLMs for stuff that requires recent data. No way I'm running a web crawler in addition to my local LLM. For coding it could theoretically work, since you don't always need the latest and greatest, but it would still make me anxious.
data-ottawa
That’s a perfect use case with MCP though.
My biggest issue is local models I can run on my m1/m4 mbp are not smart enough to use tools consistently, and the context windows are too small for iterative uses.
The last year has seen a lot of improvement in small models though (gemma 3n is fantastic), so hopefully it’s only a matter of time.
qiller
I'm ok using a limited resource _if_ I know how much of it I am using. The lack of visible progress towards limits is annoying.
steveklabnik
npx ccusage@latest
I'm assuming it'll get updated to include these windows as well. Pass in "blocks --live" to get a live dashboard!
data-ottawa
Oh wow, this showed me the usage stats for the period before ccusage was installed, that’s very helpful especially considering this change.
ETA: You don’t need to authenticate or share your login with this utility, basically zero setup.
mtmail
Package page (with screenshot) https://www.npmjs.com/package/ccusage
bravura
Does ccusage (or Claude Code with a subscription) actually tell you what the limits are or how close you are to them?
blalezarian
Totally agree with this. I live in constant anxiety, never knowing how far into my usage I am.
flkiwi
It's not exactly the same thing, but imagine my complete surprise when, in the middle of a discussion with Copilot and without warning, it announced that the conversation had reached its length limit and I had to start a new one with absolutely no context from the current one. Copilot has many, many usability quirks, but that was the first that actually made me mad.
jononor
ChatGPT and Claude do the same. And I have noticed that model performance can often degrade a lot before such a hard limit. So even when not hitting the hard limit, splitting out to a new session can be useful. Context management is the new prompt engineering...
stronglikedan
The craziest thing to me is that it actually completely stopped you in your tracks instead of upselling you on the spot to continue.
andix
I guess you need to get used to it. LLM token usage directly translates to energy consumption. There are also no flat fee electricity plans, it doesn't make any sense.
idunnoboutthat
that's true of everything on the internet.
andix
Yes, but for most things it's not significant.
For example Stack Overflow used to handle all their traffic from 9 on-prem servers (not sure if this is still the case). Millions of daily users. Power consumption and hardware cost is completely insignificant in this case.
LLM inference pricing is mostly driven by power consumption and hardware cost (which also takes a lot of power/heat to manufacture).
tracker1
An "AI" box with a few high-end GPU/NPU cards takes more energy in a 4U chassis than an entire rack of commodity hardware. It's not nearly comparable... meaning entirely new and expensive infrastructure to support the high energy draw. That also doesn't count the need for really high-bandwidth networking to these systems. Not to mention the insanely more expensive hardware costs.
The infrastructure and hardware costs are seriously more costly than typical internet apps and storage.
jm4
Blame the idiots who abused it. Like that guy who posted a video a couple weeks ago where he had like 6 instances going nonstop and he was controlling it with his voice. There was some other project posted recently that was queuing up requests so that you could hit the limits in every time block. I've seen reddit posts where people were looking for others to share team accounts. It's always the morons who ruin a good thing.
Unless/until I start having problems with limits, I'm willing to reserve judgment. On a max plan, I expect to be able to use it throughout my workday without hitting limits. Occasionally, I run a couple instances because I'm multitasking and those were the only times I would hit limits on the 5x plan. I can live with that. I don't hit limits on the 20x plan.
xtracto
This is actually so fascinating to me. I remember when we had metered & very expensive long distance calls, "metered" dial-up Internet (Todito Card!), then capped DSL internet, then metered Mobile calls, SMSs and then Metered Mobile internet (that last one we still do).
The stuff that we do now, my 13 year old self in 1994 would never dream of! When I dialed my 33.6kbps modem and left it going the whole night, to download an mp3.
It's exciting that nowadays we complain about Intelligent Agents bandwidth plans!! Can you imagine! I cannot imagine the stuff that will be built when this tech has the same availability as The Internet, or POTS!
strictnein
Confused on the Max 5x vs Max 20x. I'm on the latter, and in my email it says:
> "Most Max 20x users can expect 240-480 hours of Sonnet 4 and 24-40 hours of Opus 4 within their weekly rate limits."
In this post it says:
> "Most Max 5x users can expect 140-280 hours of Sonnet 4 and 15-35 hours of Opus 4 within their weekly rate limits."
How is the "Max 20x" only an additional 5-9 hours of Opus 4, and not 4x that of "Max 5x"? At least I'd expect a doubling, since I'm paying twice as much.
thomasfromcdnjs
Would love more feedback on this, I will definitely downgrade from Max 20x if it is the case. Cost me $350 a month in Australia...
akmarinov
I upgraded to 20x because I was constantly running into Opus limits, and now it seems the 20x is almost equal to the 5x in that regard.
lvl155
This is why I stopped using the MAX. Downgraded to Pro and started using o3 and others via API. I really don’t need that many hours to game plan in the beginning. At most it will cost me $10 between o3, Gemini, and Opus per project. There are new model releases every couple of weeks and I’d hate to get stuck with just one provider.
foota
You're paying for prioritization during high traffic periods, not for 2x usage.
strictnein
That's not what they claim:
https://www.anthropic.com/pricing
> Max
> Choose 5x or 20x more usage per session than Pro*
> Higher output limits for all tasks
> Priority access at high traffic times
That first bullet pretty clearly implies 4x the usage and the last one implies that Max gets priority over Pro, not that 20x gets priority over 5x.
foota
That is sort of what it implies, but I don't think that's what's actually happening on the backend. I was looking at this yesterday though and I agree that it's all a bit hand-wavy. I feel for them somewhat though because it's hand-wavy because it's a difficult problem to solve. They're essentially offering spot instances.
yobid20
Someone should do a study then file a class action if their marketing material is false.
gabriel666smith
I've been tracking usage in my first month of "20x" Max (which was, unfortunately, this month). Depending on how this usage is amortised (working days, which is what matters to me, or 5 hour periods, or I guess, now weeks..?) their marketing material has been varying degrees of false. This has ranged from 'a bit false' to 'extremely false'.
That is true both on a relative scale ("20x") compared to my previous use of the $20 plan, but - more dishonestly, in my opinion - absolutely false when comparing my (minimal, single-session, tiny codebase) usage to the approximate usage numbers quoted in the marketing materials. The actual usage provided has regularly been 10% of the quoted allowance before caps are hit.
I have had responses from their CS team, having pointed this out, in the hope they would _at least_ flag to users times that usage limits are dramatically lower so that I can plan my working day a little better. I haven't received any sort of acknowledgement of the mismatch between marketing copy and delivered product, beyond promised "future fixes". I have, of course, pointed out that promised and hypothetical future fixes do not have any bearing on a period of paid usage that exists in the past. No dice!
I'm, unfortunately, a UK customer, and from my research any sort of recourse is pretty limited. But it has - without question - been one of the least pleasant customer experiences I've had with a company in some time, even allowing for Anthropic experiencing extremely high-growth.
Claude Code Router has been a Godsend for my usage level. I'm not sure I can justify the time and effort to care and pursue Anthropic's functional bait-and-switch offering more than I already have, because being annoyed about things doesn't make me happy.
But I completely second this: it's not acceptable to sell customers a certain amount of a thing and then deliver another, and I hope US customers (who I believe should have more recourse) take action. There are few other industries where "it's a language and compute black box!" would be a reasonable defence, and I think it sets a really bad precedent for LLM providers going forward.
One might imagine that Anthropic's recent ~$200m US gov contract (iirc) might allow for a bit of spare cash to, for example, provide customers with the product they paid for (let alone refund them, where necessary) but that does not seem to be the case.
It makes me sad to see a great product undermined like this, which is, I think, a feeling lots of people share. If anyone is actually working towards wider recourse, and would find my (UK) usage data useful, they're very welcome to get in touch.
serf
200 bucks a month isn't enough. Fine. Make a plan that is enough so that I will be left alone about time limits and enforced breaks.
NOTHING breaks flow better than "Whoops! Time's up!"; it's worse than credit quotas -- at least then I can make a conscious decision about whether to spend more money on the project.
This whole 'twiddle your thumbs for 5 hours while the gpus cool off' concept isn't productive for me.
'35 hours' is absolutely nothing when you spawn lots of agents, and the damn thing is built to support that behavior.
Aurornis
> '35 hours' is absolutely nothing when you spawn lots of agents, and the damn thing is built to support that behavior.
I wouldn't call "spawning a lot of agents" to be a typical use case of the personal plan.
That was always in the domain of switching to a pay as you go API. It's nice that they allowed it on the fixed rate plans, but those plans were always advertised as higher limits, not unlimited.
nojs
It’s literally recommended in “Best practices”: https://www.anthropic.com/engineering/claude-code-best-pract...
ankit219
API has fewer limits, and practically limitless. Claude is also on Aws and gcp, where you get more quotas (probably credits as well) and different rate limits.
ChadMoran
This. Optimize for the good actors, not the bad ones.
chomp
Just use the API
whalesalad
I use the API. Just pay-per-use. Refill it $100 at a time.
nojito
Use the API.
strictnein
The API is far more expensive. For Opus 4 it's almost priced in a way that says "don't use this".
chomp
That’s not what the parent commenter asked though, they wanted a price for not being concerned about limits. The API pricing is that.
awestroke
I use Claude Code authenticated via the API (Anthropic Console). There's no limits for me. And I also assume API-metered requests are prioritized, so it's faster as well.
data-ottawa
The API does have limits but they’re determined by your monthly spend. I did a trial of tier 1 spend and did hit the limits, but on on tier two spending it was much much better.
https://docs.anthropic.com/en/api/rate-limits#requirements-t...
porridgebrain
[flagged]
tough
Or just use any other CLI and API provider and let Anthropic hang with their -alignment- and selling out to warmongers
Actually, you can keep using the claude code (CLI) tool with any provider; just replace the env endpoints.
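for anyone curious, that env swap looks roughly like this (the variable names are the ones recent claude code builds read, but treat them as assumptions and check the docs; the endpoint and key here are made up):

```shell
# Point the claude CLI at an alternative Anthropic-compatible provider
export ANTHROPIC_BASE_URL="https://llm-gateway.example.com"   # hypothetical endpoint
export ANTHROPIC_AUTH_TOKEN="sk-your-provider-key"            # hypothetical key

claude   # now talks to the configured endpoint instead of Anthropic's
```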
garciasn
I know how to program; I just don’t have the time to do it all and my company doesn’t have the revenue to support more devs. So; this is the best way to make do with what I have.
steve_adams_86
What if they're using it to help them learn to program? There are plenty of valid uses people might have. And ultimately, it's their call, right? Capitalism and all that. I suppose the argument then is "just use the API", and sure, that's a solution. Yet it's odd to have an expensive subscription that's heavily rate limited like this.
It's a non-issue for me. When I hit the limit, which is rare, I go back to analog life where I use my head's brain to do the heavy lifting and that has all kinds of perks that Claude doesn't have. I get why people are frustrated, though.
alwillis
From Anthropic’s Reddit account:
One user consumed tens of thousands in model usage on a $200 plan. Though we're developing solutions for these advanced use cases, our new rate limits will ensure a more equitable experience for all users while also preventing policy violations like account sharing and reselling access.
This is why we can’t have nice things.
Aurornis
I worked at a startup that offered an unlimited option.
It's amazing how fast you go from thinking nobody could ever use that much of your service to discovering how many of your users are creatively abusing the service.
Accounts will start using your service 24/7, with their request rate coming in at 95% of your rate limiter setting. They're accessing it from a diverse set of IPs. Depending on the type of service and your privacy guarantees, you might not be able to see exactly what they're doing, but it's clearly not the human usage pattern you intended.
At first you think you can absorb the outliers. Then they start multiplying. You suspect batches of accounts are actually other companies load-splitting their workload across several accounts to stay under your rate limits.
Then someone shows a chart of average profit or loss per user, and there's a giant island of these users deep into the loss end of the spectrum consuming dollar amounts approaching the theoretical maximum. So the policy changes. You lose those 'customers' while 90+% of your normal users are unaffected. The rest of the people might experience better performance, lower latencies, or other benefits because the service isn't being bombarded by requests all day long.
Basically every startup with high usage limits goes through this.
0xbadcafebee
If you launch your service without knowing how much it costs to offer your service at the maximum rate it could be used, then this will definitely happen. Engineering directors need to require performance testing benchmarks and do the math to figure out where the ceiling is. If you happen to be "lucky" enough to scale very fast, you don't want to then bang your customer's heads repeatedly on a ceiling.
tomwphillips
I think it might actually be because they're selling services at a loss.
eldenring
I don't understand why the current setup for rate limits wouldn't be sufficient to stop this kind of thing.
Tokumei-no-hito
guy was bragging about it on twitter yesterday. $13,200 of spend for his $200 account. he said he had like 4-5 opus only agents running nonstop and calling each other recursively.
clearly that's abusive and should be targeted. but in general idk how else any inference provider can handle this situation.
cursor is fucked because they are a whole layer of premium above the at-cost pricing of anthropic / openai etc. so everyone leaves and goes to cc. now anthropic is in the same position but they can't cut any premium off.
you can't practically put a dollar cap on monthly plans because they are self exposing. if you say $20/mo caps at $500/mo of usage then that's the same as a 480/500 (96%) discount against raw API calls. that's obviously not sustainable.
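quick sanity check on that discount figure in code (plan price and API-equivalent value taken straight from the example above):

```python
# Effective discount when a $20/mo plan delivers $500/mo of API-equivalent usage
plan_price = 20    # monthly subscription cost, dollars
api_value = 500    # equivalent pay-per-use API spend, dollars

discount = (api_value - plan_price) / api_value
print(f"{discount:.0%}")  # → 96%
```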
there's a real entitled chanting going on too. i get that it sucks to get used to something and have it taken away, but does anyone understand that the cap/opex alone is unsustainable, let alone the R&D to make the models and tools?
I’m not really sure what can be done besides a constant churn of "fuck [whoever had to implement sustainable pricing], i'm going to [next co who wants to subsidize temporarily in exchange for growth]".
i think it's shitty the way it's playing out though. these cos should list these as trial periods and be up front about subsidizing. people can still use and enjoy the model(s) during the trial, and some / most will leave at the end, but at least you don't get the uproar.
maybe it would go a long way to be fully transparent about the capex/opex/R&D spend. nobody is expecting a charity, we understand you need a profit margin. but it turns the entitled "they're just being greedy" chanting into "ok, that makes sense why i need to pay X to have 1+ tireless senior engineers on tap".
const_cast
> clearly that's abusive and should be targeted.
You can't abuse a company by buying their services and using them to their own terms and conditions. The T&C is already stacked against you, you're in a position of no leverage.
The correct solution is what Anthropic is doing here - change the T&C so you can make money. If you offer unlimited stuff, people will use it... unlimitedly. So, don't let them call your bluff.
Tokumei-no-hito
we differ on the opinion that you can be abusive without breaking ToS. perhaps a charitable view is that this type of [abuse || acceptable use] helps lawyers stay employed so they can [eliminate exploitation of || more adequately describe] their ToS.
Aurornis
> guy was bragging about it on twitter yesterday. $13,200 of spend for his $200 account. he said he had like 4-5 opus only agents running nonstop and calling each other recursively.
Do you have a link?
I'm always curious to see these users after working at a startup that was the target of some creative use from some outlier customers.
jjmarr
If VCs want to give me free money in exchange for using their product, do you expect me to say no?
OtherShrezzing
This email could have been a lot more helpful if it read “in the following months, your account entered one of these rate limits: Aug 2024, Jan 2025, May 2025” or similar.
I have no idea if I’m in the top 5% of users. Top 1% seems sensible to rate limit, but top 5% at most SaaS businesses is the entire daily-active-users pool.
jjcm
They need metered billing for their plans.
All AI companies are hitting the same thing and dealing with the same play - they don't want users to think about cost when they're prompting, so they offer high cost flat fee plans.
The reality is though there will always be a cohort of absolute power users who will push the limits of those flat fee plans to the logical extremes. Startups like Terragon are specifically engineered to help you optimize your plan usage. This causes a cat and mouse game where they have to keep lowering limits as people work around them, which often results in people thinking about price more, not less.
Cursor has adjusted their limits several times, now Anthropic is, others will soon follow as they decide to stop subsidizing the 10% of extreme power users.
Just offer metered plans that let me use the web interface.
paxys
The API exists. You can generate a token and use Claude Code with it directly, no plan needed.
bravesoul2
It's 1990s shared hosting again!
richwater
> Just offer metered plans that let me use the web interface.
The problem is this would reveal how expensive it _actually_ is to serve inference right now at the scale that people use it for productive things.
throwdbaaway
Another problem is that it works like a slot machine -- sometimes the code is good, most of the time the code is mediocre and full of bugs.
Last Friday I spent about $15 in 1 hour using claude code with API key, and the code doesn't really work, even though all the unit tests passed. I am not going to touch it for weeks, while the loss is fresh in my mind.
With a subscription though, you can keep on gambling, until you get a hit.
rstupek
"... and advanced usage patterns like running Claude 24/7 in the background" this is why we can't have nice things
serial_dev
All of these AI services tell everyone how amazing AI is, it can run things, solve things on its own, while the developers are drinking coffee or sleeping. Some developers could actually do that with the service they paid for, fully in agreement with the terms and now it is their fault?
OtherShrezzing
Anthropic put out a press release over the weekend describing the internal team’s hints and tips to make CC useful. The 2nd tip was “run it in a bunch of different features at once”.
furyofantares
Many such people have been in HN threads bragging about having servers running 24/7 and how they're getting $10,000 worth of compute (based on API pricing) for $200 per month. If anyone doing that is surprised that it wasn't going to last, then lmfao.
ohdeargodno
"they paid for"
$100 doesn't even cover the electricity of running the servers every night, they were abusing a service and now everyone suffers because of them.
serial_dev
It is still not the users' fault; pricing is not their responsibility. As a user, I check the price and what the service offers, then I subscribe and use it. If these users did something illegal or broke some conditions, any service would be free to block them. But they didn't, meaning the AI tools promised too much for the price, so now they're updating their conditions; they are basically still figuring out the pricing.
I don’t know what is there to be mad about, and using dramatic language like “everyone suffers because of them”
Dylan16807
Any good numbers on what it costs? I can look up how many watts a GPU needs but I don't know how the batching is typically done to understand how many users are sharing those watts.
But a compute-focused datacenter is probably not paying more than 10 cents per kWh, so $100 would pay for more than a 24/7 kilowatt of GPU plus cooling plus other overhead.
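back-of-envelope for that, in code (the 10 cents/kWh rate and the sustained 1 kW draw are the assumptions from above, not measured numbers):

```python
# What does ~$100/month buy in continuous GPU power at datacenter rates?
price_per_kwh = 0.10       # assumed datacenter electricity rate, $/kWh
hours_per_month = 24 * 30  # 720 hours of 24/7 operation
draw_kw = 1.0              # assumed sustained draw, roughly a few inference GPUs

cost = draw_kw * hours_per_month * price_per_kwh
print(f"${cost:.0f}")  # → $72, leaving headroom for cooling and other overhead
```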
Modified3019
Yeah that part made me laugh. Clearly the work of Benevolent World Exploders trying to hasten the heat death of the universe.
taylorbuley
I imagine this was not surprising. This had to have been well-considered by the teams in the first round of pricing. I'm guessing they just didn't want it to be a blocker for release and the implementation is now catching up.
bad_haircut72
They set the pricing, so how is this even wrong? I will run my Claude subscription non-stop until they cut me off; I paid for it, and they should honor what they sold. And yes, I'm a Max subscriber who still frequently hits limits.
volkk
i mean, this is exactly how price discovery works. if you give loose usage requirements, you'll have actors who take full advantage of it. not on the people using it but ultimately on the company that pretends they can sustain something like this, and then claw back the niceties
404mm
I got the same email (for my Pro account). And all the limits they set have nothing to do with their reason for setting them. Pro is so limited already that people “running 24/7” is a total nonsense.
0xbadcafebee
Possibly dumb suggestion, but what about adaptive limits?
Option 1: You start out bursting requests, and then slow them down gradually, and after a "cool-down period" they can burst again. This way users can still be productive for a short time without churning your servers, then take a break and come back.
Option 2: "Data cap": like mobile providers, a certain number of high requests, and after that you're capped to a very slow rate, unless you pay for more. (this one makes you more money)
Option 3: Infrastructure and network level adaptive limits. You can throttle process priority to de-prioritize certain non-GPU tasks (though I imagine the bulk of your processing is GPU?), and you can apply adaptive QoS rules to throttle network requests for certain streams. Another one might be different pools of servers (assuming you're using k8s or similar), and based on incoming request criteria, schedule the high-usage jobs to slower servers and prioritize faster shorter jobs to the faster servers.
And aside from limits, it's worth spending a day tracing the most taxing requests to find whatever the least efficient code paths are and see if you can squash them with a small code or infra change. It's not unusual for there to be inefficient code that gives you tons of extra headroom once patched.
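A minimal sketch of what Option 1 could look like server-side (all class and parameter names invented for illustration; real limiters would persist state and handle concurrency):

```python
import time

class BurstCooldownLimiter:
    """Option 1 sketch: allow a burst of requests, then force a cool-down."""

    def __init__(self, burst_size=10, cooldown_seconds=60.0):
        self.burst_size = burst_size
        self.cooldown_seconds = cooldown_seconds
        self.used = 0
        self.cooldown_until = 0.0

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        # Cool-down has elapsed: restore full burst capacity.
        if now >= self.cooldown_until and self.used >= self.burst_size:
            self.used = 0
        # Still cooling down: reject the request.
        if now < self.cooldown_until:
            return False
        self.used += 1
        # Burst exhausted: start the cool-down clock.
        if self.used >= self.burst_size:
            self.cooldown_until = now + self.cooldown_seconds
        return True
```

Usage is one `allow()` call per incoming request; Option 2 differs mainly in replacing the hard `False` with a slow-lane rate instead of an outright rejection.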
swalsh
That's fine, please make it VERY CLEAR how much of my limit is left, and how much i've used.
steve_adams_86
I'm well within the 95%. I might lack an imagination here, but... What are you guys doing that you hit or exceed limits so easily, and if you do... Why does it matter? Sometimes I'd like to continue exploring ideas with Claude, but once I hit the limit I make a mental note of the time it'll come back and carry on planning and speccing without it. That's fine. If anything, some time away from the slot machine often helps with ensuring I stay on course.
throwup238
Opus + extended thinking + deep research = 3-5 messages/reports per five hour limit. That’s the fastest way I’ve found to blow through the Pro plan.
Some stuff I’ve used it for in the last day: figuring out what a family member needs for FAFSA as a nontraditional student, help identify and authenticate some rare first editions and incunabula for a museum collection I volunteer at, find a list of social events in my area (based on my preferences) that are coming up in the next week (Chatgpt Agent works surprisingly well for this too), adapting Directus and Medusa to my project’s existing schema and writing up everything I need to migrate, and so on.
Deep research really hits the Claude limits hard, and it's the best way to avoid hallucinations when asking an important question or making it write complex code. I just switch from Claude to ChatGPT/Gemini until the limits reset, but Claude's deep research seems to handily beat Gemini (and OpenAI isn't even in the running). DR queries take much longer (5-10 min on average) but give much more in-depth and accurate answers.
steve_adams_86
I hadn't considered that. I'm using it almost exclusively to validate logic, kind of like a fuzzer in nature ("What if we need to do this with this logic/someone tries to do that/what am I missing/etc"), or to fill in specifications ("what feature would complement this/what could be trimmed to achieve MVP more easily/does this spec appear to be missing anything according to this set of requirements"), which requires a lot of review, and using more expensive models like Opus doesn't appear to provide meaningfully better results. After prompting it, I typically have a lot to think about and the terminal goes quiet, or I prompt it on a related matter that will similarly require my eyes and brain for long enough that I won't be able to limit out.
I can see how work involving larger contexts and deeper consideration would lead to exhausting limits a lot faster though, even if you aren't using it like a slot machine.
theshrike79
"find a list of social events in my area"
Isn't this something you can do with a simple Google search? Or Perplexity?
No need to shove by far the most expensive LLM (Claude Opus 4) at it.
throwup238
Not for the Los Angeles metro area. There isn’t a single calendar or event aggregator that covers the entire area and with an LLM I can give it complex schedules (i.e. a dump of my calendar for that week) and preferences to filter the list of events for the stuff I like, including vague stuff like “I like country music in the style of ‘Take Me Home, Country Roads’ but not modern country radio”.
Terretta
When you say not even in the running, is that including Deep Research on o3-pro?
throwup238
I haven't tried o3-pro, but my fundamental problem with ChatGPT Deep Research is that it only searches for a few dozen sources, whereas Claude and Gemini regularly use 400+ sources.
zarzavat
I agree. I'm on the base plan, yet to hit any limits. The bottleneck is my ability to review the code it writes, and to write prompts detailed enough for the output to be useful to me.
I assume that the people hitting limits are just letting it cycle, but doesn't that just create garbage if you don't keep it on a tight leash? It's very eager but not always intelligent.
loufe
Switching to Opus is an eye-opening experience. You hit limits often, and need to get creative to avoid burning through limits, but the difference is seriously impressive. You'll waste a lot less time with dead ends and bad code.
steve_adams_86
I think this is it. They use it like a slot machine, and when something isn't quite what they wanted, they provide broad instructions to refine and do better. Progress is slow and questionable, but anticipation and (sometimes) reward is increased.
The issue could be, in part, that a lot of users don't care to be efficient with token usage and maintaining condensed, efficient, focused contexts to work with.
Tokumei-no-hito
i wonder how many are negligent vs ignorant. negligence would be senior engineers that could scope and work with context properly but are lazy or don't care. ignorance would be vibe coders that genuinely can't express anything beyond plain english and superficial descriptions of issues and changes.
mendor
I've found that asking for deep research consumes my quota quite fast, so if I run 2 or 3 of those plus normal use, I hit the limit and have to wait for the reset.
steve_adams_86
Me too. I've also found that even when trying to restrict models meant for these tasks, they tend to go on tangents and waste tremendous amounts of tokens without providing meaningfully better outputs. I'm not yet sold on these models for anything outside of fuzzy tasks like "does this logic seem sound?". They tend to be good at that (though they often want to elaborate excessively or propose solutions excessively).
SaucyWrong
One way I've seen personally is that folks are using tools that drive many Claude Code sessions at once via something like git-worktree as a way of multitasking in a single codebase. Even with garden-variety model use, these folks are hitting the existing 5-hourly rate limits routinely.
steve_adams_86
I use this approach because I like to work on features or logical components in isolation and then bring them together. I still can't limit out most of the time because I need to actually look at the outputs and think about what I'm building. At the moment I have 3 directories in my work tree. Sometimes I prompt in more than one at a time, especially at interfacing code, but that could mean 30–90 minutes of reviewing and implementing things in each directory. Over a work day I apparently send an average of ~40 messages according to `claude --resume`
bad_haircut72
I'm not a Formula 1 driver, but why do they have those big paddle things on the back? Looks dumb IMHO, I just don't get it.
steve_adams_86
I respectfully consider this analogy void, but welcome an explanation of why I'm wrong.
I haven't yet seen anyone doing anything remarkable with their extensive use of Claude. Without frequent human intervention, all of it looks like rapid regression to the mean, or worse.
Hi there,
Next month, we're introducing new weekly rate limits for Claude subscribers, affecting less than 5% of users based on current usage patterns.
Claude Code, especially as part of our subscription bundle, has seen unprecedented growth. At the same time, we’ve identified policy violations like account sharing and reselling access—and advanced usage patterns like running Claude 24/7 in the background—that are impacting system capacity for all. Our new rate limits address these issues and provide a more equitable experience for all users.
What’s changing: Starting August 28, we're introducing weekly usage limits alongside our existing 5-hour limits:
- Current: Usage limit that resets every 5 hours (no change)
- New: Overall weekly limit that resets every 7 days
- New: Claude Opus 4 weekly limit that resets every 7 days

As we learn more about how developers use Claude Code, we may adjust usage limits to better serve our community.

What this means for you: Most users won't notice any difference. The weekly limits are designed to support typical daily use across your projects. Most Max 5x users can expect 140-280 hours of Sonnet 4 and 15-35 hours of Opus 4 within their weekly rate limits. Heavy Opus users with large codebases or those running multiple Claude Code instances in parallel will hit their limits sooner. You can manage or cancel your subscription anytime in Settings.

We take these decisions seriously. We're committed to supporting long-running use cases through other options in the future, but until then, weekly limits will help us maintain reliable service for everyone.
We also recognize that during this same period, users have encountered several reliability and performance issues. We've been working to fix these as quickly as possible, and will continue addressing any remaining issues over the coming days and weeks.
–The Anthropic Team