Blocking LLMs from your website cuts you off from next-generation search
55 comments · August 6, 2025 · jerf
andrewmutz
Why would removing your content from LLM training data cause people to go and seek it out directly from you?
Would removing your website from google search results cause people to go directly to your website?
piker
Fair, if your content is your product, but I’m more than happy for every LLM on the planet to summarize my page and hype the virtues of my product to its user.
Disposal8433
Why do tech bros assume that every site is selling a product? There are blogs, personal web sites, communities, and open-source projects out there.
piker
If there's no product, and it's free, why would one care about it appearing in the output of an LLM? If it's so secret that it shouldn't, then perhaps it should be behind some auth anyway.
shortstuffsushi
I'm surprised I don't see any comments here to this effect yet: isn't this just AMP 2.0? Website authors don't want their content scraped and rehosted by a third party, even when that third party claims it's for their own benefit. We had a whole kerfuffle about this nearly a decade ago. The arguments on both sides don't appear to have changed.
nerdjon
That is basically several paragraphs to just say "well you should just adapt to the new world instead of pushing against bad practices". There is barely any "why" actually said here.
We just had the article about how AI search is leading to fewer clicks, so where is that supposed "pipeline"?
It also completely ignores that you may not want your information misrepresented (lied about, basically) to the user, alongside a helpful link to the source that they may never click. Worse, if they realize the information they were given is wrong, they may assume your site was wrong and trust you less, all without ever clicking that link.
riffraff
> LLMs are the next generation’s search layer. They’re already generating massive amounts of pipeline for the companies and websites that have gotten good at getting their content displayed in LLMs
[citation needed]
cpursley
Just check your analytics dashboards and see where hits are now starting to come from. I saw on LinkedIn the other day that a company in the space I serve had a new customer find them via ChatGPT.
eric-burel
The first sentence of the article is literally wrong: it conflates an LLM with the search part of RAG (retrieval-augmented generation, where you combine a web search with an LLM). Blocking bots cuts you off from next-generation search because it cuts you off from search altogether. So far, blocking LLMs simply keeps you out of the training dataset, which is not the same thing. Please stop upvoting such bad content; it really makes Hacker News a terrible place for staying informed on LLMs.
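To make the training-versus-search distinction concrete, here is a minimal sketch using Python's standard `urllib.robotparser`. It assumes a site wants to refuse a training crawler while still admitting a search/retrieval crawler. The user-agent tokens (`GPTBot`, `OAI-SearchBot`), the example URL, and the `allowed` helper are illustrative assumptions, not something the article or this thread specifies; check each vendor's current crawler documentation before relying on any token.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: refuse a training crawler, admit a search crawler.
# The user-agent tokens below are assumptions for illustration only.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: *
Allow: /
"""


def allowed(agent: str, url: str) -> bool:
    """Return True if the robots.txt rules above permit `agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for agent in ("GPTBot", "OAI-SearchBot", "SomeOtherBot"):
        print(agent, allowed(agent, "https://example.com/post"))
```

Note that robots.txt is advisory: it only affects crawlers that choose to honor it, so this shows how the two kinds of bot can be addressed separately, not a guarantee of enforcement.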
bellBivDinesh
Incredibly simplistic. I’m having a hard time believing a real person wrote this, read it over and decided they had made anything resembling a point.
How about the fact that Google (ideally) sends users to you rather than sharing your work unattributed?
endemic
Heck, Google mostly shows "AI Summaries" and ads -- you'd be lucky to get traffic from 'em now!
ryandrake
Like everything else on the web, LLMs are going to eventually be ruined by marketing teams trying to get them to say "Pepsi" instead of "Coke."
tartoran
Long live local LLMs!
jdiff
Open models still have to get their data from somewhere; the only thing they're any more immune to is direct corruption. But marketers have shown time and time again that if there's any algorithmic crack in the wall, they will find it.
SideburnsOfDoom
I don't know what you mean ... by including "eventually" in that sentence.
ayaros
Screw this. I didn't put effort into writing many paragraphs of content for my own websites just so it could be summarized by an LLM. I wrote it because I wanted other human beings to read it.
This is just yet another person running an AI company telling me why I should provide free data and labor to the LLMs that power their company. These AI companies are acting as middlemen between the end user and the content creator; it's the latest iteration of an age-old business model which works out great for the middlemen. Meanwhile, people on either side are taken advantage of.
If the "next-generation" of search is accessed mostly through an LLM, then there's no incentive to participate in it unless you're directly selling a product or service... and then you have to hope and pray the LLM doesn't lie and misrepresent you. Otherwise, if you're making a website to share information or show off your own work, there's zero incentive to participate.
If AI companies want to pay me cold hard cash every time they query my site, then we can negotiate.
mflaherty22
Very reductionist - so much so that I'm not even sure you understand why websites block LLMs.
JSR_FDED
Nonsensical article. Even if your goal is to create something on the web “for others” (as the article asserts), when 99.9% of your costs go to serving LLM crawlers, it puts that very objective at risk.
merelysounds
> most LLMs have an agentic web-search component that will actively generate links
I guess that’s the problem - search being only a component.
Is the possible search traffic worth having your content become part of an LLM’s training set and possibly used elsewhere?
I guess the answer depends on the content and the website’s business model.
Mars008
Instead of fighting, you can submit text advertising to LLM bots, and sell it. This 'knowledge' will be embedded into the next models.
This post gets the reason why people are cutting off LLMs exactly backwards and consequently completely fails to address the core issue. The whole reason people are blocking LLMs is precisely that they believe it kills the flow of readers to your content. The LLMs present your ideas and content, maybe with super-tiny attribution that nobody notices or uses [1], maybe with no attribution at all, and you get nothing. People are blocking LLMs with the precise intent of trying to preserve the flow to their content, be it commercially, reputationally, whatever.
[1]: https://www.pewresearch.org/short-reads/2025/07/22/google-us...