
Claude can now search the web

757 comments · March 20, 2025

tcdent

Searching the web is a great feature in theory, but every implementation I've used so far looks at the top X hits and then interprets it to be the correct answer.

When you're talking to an LLM about popular topics or common errors, the top results are often just blogspam or unresolved forum posts, so you never get an answer to your problem.

It's more an indicator that web search is more unusable than ever, but it's interesting that it affects the performance of generative systems nonetheless.

Almondsetat

>looks at the top X hits and then interprets it to be the correct answer.

LLMs are truly reaching human-like behavior then

yoyohello13

The longer I've been in the workforce, the more I realize most humans actually kind of suck at their jobs. LLMs being more human like is the opposite of what I want.

_heimdall

That could very well be because the jobs are effectively useless. That by no means implies the people are useless, nor what the income allows them to do. But most jobs sure do seem pointless.

shreezus

This is why agentic AI will likely cause a cataclysm in white-collar labor soon. The reality is, a lot of jobs just need "OK" performers, not excellent ones, and the tipping point will be when the average AI is more useful than the average human.

bobxmax

It's quite odd that people think of hallucinations as a dealbreaker for LLMs. Have they ever even met a human being?

CooCooCaCha

People suck at intellectual tasks but for stuff like locomotion and basic planning we humans are geniuses compared to machines. There isn't a robot today that could get in a car, drive to the grocery store, pick stuff off the shelf, buy it, and bring it back home. That's so easy it's automatic for us.

lutusp

> The longer I've been in the workforce, the more I realize most humans actually kind of suck at their jobs.

And if they don't suck at their job, they get promoted until they do: https://en.wikipedia.org/wiki/Peter_principle .

wickedsight

Ugh... I've been in IT for over a decade now, and for many of the vacancies I see I don't consider myself/my CV good enough. Then I work with the people who get hired for these jobs and see how low they set the bar, even though their CV might tick all the boxes.

metek

I try to apply my layman's understanding of whichever law of thermodynamics states that a minimum of <x> percent of a reaction's energy is lost as waste heat: whatever you try to do in life, <x> percent of your effort is going to be spent dealing with people who are utterly incompetent. I try to apply it to myself as well; there are certainly many things I'm utterly helpless with, and I want to account for the extra effort required to carry out a given task despite those shortcomings.

adverbly

Do they suck at their jobs or do their jobs suck?

dartos

Splitting hairs, but LLMs themselves don’t search.

LLMs themselves don’t choose the top X.

That’s all regular flows written by humans run via tool calls after the intent of your message has been funneled into one of a few pre-defined intents.
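
Roughly, the flow looks something like this. A minimal sketch, not Anthropic's actual code; classify_intent() and web_search() are made-up helpers, and llm is any prompt-to-text callable:

    def classify_intent(message: str) -> str:
        """Funnel the user's message into one of a few pre-defined intents."""
        if any(kw in message.lower() for kw in ("latest", "news", "today", "price")):
            return "needs_web_search"
        return "answer_from_weights"

    def web_search(query: str, top_k: int = 5) -> list[dict]:
        """Call some search API and return the top_k hits (stubbed out here)."""
        return [{"title": f"Result {i}", "url": f"https://example.com/{i}", "snippet": "..."}
                for i in range(top_k)]

    def answer(message: str, llm) -> str:
        """The model never fetches anything itself; ordinary code routes and feeds it text."""
        if classify_intent(message) == "needs_web_search":
            hits = web_search(message)
            context = "\n".join(f"{h['title']} ({h['url']}): {h['snippet']}" for h in hits)
            return llm(f"Using only these search results:\n{context}\n\nAnswer: {message}")
        return llm(message)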

eddd-ddde

How do you know? You could 100% create the tool to search and choose results, go through links, read more pages, etc.

pizza

It would probably be really great for web-searching LLMs to let you calibrate how they should look for info by letting you do a small demonstration of how you would pick options yourself, then storing that preference feedback in your profile's system prompt somehow.

rendaw

Here though they're not replacing a random person, they're replacing _you_ (doing the search yourself). _You_ wouldn't look at the top X hits then assume it's the correct answer.

ChrisRR

Bold of you to assume that most people even bother googling simple questions

wvh

Be careful what you call AI, you might just get what you wish for...

LightBug1

Degenerative AI ?

johntb86

I've found that OpenAI's Deep Research seems to be much better at this, including finding an obscure StackOverflow post that solved a problem I had, or finding travel wiki sites that actually answered questions I had around traveling around Poland. However it finds its pages, they're much better than just the top N Google results.

wongarsu

Grok's DeepSearch and DeeperSearch are also pretty good, and you can look at their stream of thought to see how it reaches its results.

Not sure how OpenAI's version works, but Grok's approach is to do multiple rounds of searches, each round more specific and informed by previous results.
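
Presumably something like this loop. A sketch of how such a multi-round search might be wired up, not Grok's actual implementation; llm() and web_search() are stand-in callables:

    def deep_search(question: str, llm, web_search, rounds: int = 3) -> str:
        """llm: prompt -> text; web_search: query -> list of hits."""
        notes = []
        query = question
        for _ in range(rounds):
            hits = web_search(query)  # one round of searching
            notes.append(llm(f"Summarize what these results say about '{question}':\n{hits}"))
            # the next query is narrower, informed by everything learned so far
            query = llm("Given these notes:\n" + "\n".join(notes) +
                        f"\nWrite one more specific search query for: {question}")
        return llm(f"Answer '{question}' using only these notes:\n" + "\n".join(notes))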

infecto

Grok is still lightyears behind OpenAI when it comes to deep research capabilities. While its model might hold up reasonably well against something like o1, the research functionality feels rudimentary, almost elementary compared to what OpenAI offers. It might serve as a decent middle ground for basic search tasks.

labrador

My disgust and hatred for Elon Musk prevents me from giving Grok a fair chance. I realize this is my psychological problem. I own it, but as far as I can tell, I'm not missing much.

dontlikeyoueith

They're probably doing RAG on a huge chunk of the internet, i.e. they built their own task-specific search engine.
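
The retrieval half of that is conceptually simple. A toy sketch, with embed() standing in for whatever learned embedding model they'd actually use:

    import numpy as np

    def embed(text: str) -> np.ndarray:
        """Placeholder: a real system would use a trained embedding model."""
        rng = np.random.default_rng(abs(hash(text)) % (2**32))
        v = rng.standard_normal(384)
        return v / np.linalg.norm(v)

    def retrieve(query: str, docs: list[str], k: int = 3) -> list[str]:
        q = embed(query)
        scores = [float(q @ embed(d)) for d in docs]  # cosine similarity (unit vectors)
        best = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)[:k]
        return [docs[i] for i in best]

    corpus = ["page about pelicans", "page about postgres", "page about derailleurs"]
    context = "\n".join(retrieve("how do I adjust a front derailleur?", corpus))
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: ..."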

matwood

I'm glad you mentioned this. I asked Deep Research to lay out a tax strategy in a foreign country and it cited a ton of great research I hadn't yet found.

HankWozHere

Kagi Assistant allows you to do search with LLM queries. So far I feel it yields reliable results. For instance, I tried a couple of queries for product suggestions and got some good results. Whilst it's a premium service, I find the offering to be of good value.

chrisweekly

Yeah, Kagi's search results are so much better than Google's, it defies comparison.

rglover

Just switched my default to Kagi based on this comment and you're right. It honestly feels like old-school Google before all of the algo changes.

eli

It's neat but I've found the value kinda variable. It seems heavily influenced by whatever the first few hits are for a query based on your question, so if it's the kind of question that can be answered with a simple search it works well. But of course those are the kinds of questions where you need it the least.

I find myself much more often using their "Quick Answer" feature, which shows a brief LLM answer above the results themselves. Makes it easier to see where it's getting things from and whether I need to try the question a different way.

wongarsu

The quick answer (ending searches in a question mark) also seems pretty resilient to hallucinations. It prefers telling you that something wasn't mentioned in the top search results over just making something up

szszrk

There is one more aspect of Kagi assistant that I don't see discussed here. I'd love to support some "mass tipping jar" service and/or "self hosted agent" that would benefit site owners after my AI actions spammed them.

You can simply pass it a direct link to some data, if you feel it's more appropriate. It works amazingly well in their multistep Ki model.

It's capable of creating code that does the analysis I asked for, with a moderate amount of issues (mostly things like using the wrong file extracted from a .zip, but its math/code is in general correct). It scrapes the URL, downloads files, unarchives them, analyses the content, creates code to produce the result I asked for, and runs that code.

This is the first time I really see AI helping me do tasks I would otherwise not attempt due to lack of experience or time.

dmazin

Has anyone compared Perplexity with Kagi Assistant?

I am always looking for Perplexity alternatives. I already pay for Kagi and would be happy to upgrade to the ultimate plan if it truly can replace Perplexity.

Zambyte

I had been paying for both for several months, and I decided to cancel Perplexity about a month ago. First and foremost, I feel like the goals of Kagi align more with my goals. Perplexity is not afraid of ads and nagware (their discover feed was like 30% nags to turn on notifications at one point if you had them disabled, and it's still an annoying amount).

I also really like the custom assistants in Kagi. I made a GNU Guix lens that limits my search results to resources related to Guix (official docs, mailing list and IRC archives, etc.) which I can access with !guix, and I made an assistant that uses that lens for web results that I can access with !guixc. I can ask something like "how do I install nginx?" and the answer will be about Guix.

You can do some customization with your bio on Perplexity, but it kind of sucks tbh. It would randomly inject info about me into completely unrelated queries, and not inject the info when I wanted it to.

pbronez

I got a free year of Perplexity thanks to owning an R1. I already had a Kagi subscription, but decided to give Perplexity a try.

I found Perplexity was slower and delivered lower quality results relative to Kagi. After a week of experimenting, I forgot about Perplexity until they charged me $200 to renew my free year. I promptly cancelled the heck out of it and secured a refund.

hooli_gan

Does it just start a search or does the chat continue with the results? It would be cool to continue the chat with results which were filtered according to the blacklist.

lemming

The chat continues with the results, and I often explicitly tell it "search to make sure your answer is correct" if I see it making stuff up without actually searching. I use it multiple times a day for all sorts of things.

KoolKat23

I have a subscription; could I please ask how you do this? I only know of the append-? feature.

arandomusername

You need their ultimate plan

https://kagi.com/assistant

mavamaarten

Oh yeah this is very much the case. Every time I ask ChatGPT something simple (thinking it'd be a perfect fit for an LLM, not for a google search) and it starts searching, I already know the "answer" is going to be garbage.

spoaceman7777

I have in my prompt for it to always use search, no matter what, and I get pretty decent results. Of course, I also question most of its answers, forcing it to prove to me that its answer is correct.

Just takes some prompt tweaking, redos, and followups.

It's like having a really smart human skim the first page of Google and give me its take, and then I can ask it to do more searches to corroborate what it said.

NavinF

Try their Deep Research or grok's DeepSearch. Both do many searches and read many articles over a couple of minutes

lee-rhapsody

The "Deep" search features hallucinate like crazy, I've found.

osigurdson

That is interesting. I have often been amazed at how good it is at picking up when to search vs use its weights. My biggest problem with ChatGPT is the horrendous glitchiness.

bambax

"Searching" doesn't mean much without information about the ranking algorithm or the search provider, because with most searches there will be millions of results and it's important to know how the first results have been determined.

It's amazing that the post by Anthropic doesn't say anything about that. Do they maintain their own index and search infrastructure? (Probably not?) Or do they have a partnership with Bing or Google or some other player?

andai

>top results are blogspam

It gets even better. When I first tested this feature in Bard, it gave me an obviously wrong answer. But it provided two references. Which turned out to be AI generated web pages.

Oddly enough, in my own Google searches I could not even find those pages in the results.

dspillett

> Bard […] it provided two references. Which turned out to be AI generated web pages.

Welcome to the Habsburg Internet.

kelseyfrog

Search engines now have an incentive to offer a B2B search product that solves the blogspam problem. Don't worry, the AIs will get good search results, and you'll still get the version that's SEOed to the point of uselessness.

wenc

I just tried Claude’s web search. It works pretty well.

I’m not sure if Claude does any reranking (see Cohere Reranker) where it reorders the top n results or just relies on Google’s ranking.

But a web search that does re-ranking should reduce the amount of blogspam or incomplete answers. Web search isn’t inherently a lost cause.
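
The re-ranking step itself is simple to sketch. Here score() stands in for a cross-encoder relevance model like Cohere's reranker; this is an illustration of the idea, not what Claude actually does:

    def rerank(query: str, hits: list[dict], score, keep: int = 5) -> list[dict]:
        """score(query, text) -> float; higher means more relevant."""
        scored = [(score(query, h["title"] + " " + h["snippet"]), h) for h in hits]
        scored.sort(key=lambda pair: pair[0], reverse=True)  # best match first
        return [h for _, h in scored[:keep]]

    # Usage: hits = web_search("rust crate postgres arrow"); top = rerank(query, hits, score)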

joshstrange

Massive props to Anthropic for announcing a feature _and_ making it available for everyone right away.

OpenAI is so annoying in this aspect. They will regularly give timelines for rollout that are not met or are simply wrong.

Edit: "Everyone" = Everyone who pays. Sorry if this sounds mean but I don't care about what the free tier gets or when. As a paying user for both Anthropic and OpenAI I was just pointing out the rollout differences.

Edit2: My US-bias is showing, sorry I didn't even parse that in the message.

bryan0

> Web search is available now in feature preview for all paid Claude users in the United States. Support for users on our free plan and more countries is coming soon.

AcquiescentWolf

People outside the US obviously don't exist, therefore the statement is correct.

mpalmer

Easy to believe our weak privacy laws are part of the reason we get tech features first. Huzzah...

willio58

> OpenAI is so annoying in this aspect. They will regularly give timelines for rollout that are not met or are simply wrong.

I have empathy for the engineers in this case. You know it’s a combination of sales/marketing/product getting WAY ahead of themselves by doing this. Then the engineers have to explain why they cannot in fact reach an arbitrary deadline.

Meanwhile, the people not doing the work get to blame those working on the code for not hitting deadlines.

nilkn

Many of OpenAI's announcements seem to be timed almost perfectly as responses to other events in the industry or market. I think Sam just likes to keep the company in the news and the cultural zeitgeist, and he doesn't really care if what he's announcing is ready to scale to users yet or not.

wongarsu

To be fair, being in the cultural zeitgeist is a huge part of their current moat. To people in the street OpenAI is the company making LLMs. Sam has to make sure it stays that way

underdeserver

It's not available for everyone.

joshstrange

> Web search is available now in feature preview for all paid Claude users in the United States.

It is for all paid users, something OpenAI is slow on. I pay for both and I often forget to try OpenAI's new things because they roll out so slow. Sometimes it's same-day but they are all over the map in how long it takes to roll out.

deivid

For all paid users _in America_. It's not available for me in Europe.

mvdtnz

You can't be serious with this reply. You simply can not.

zelphirkalt

When am I getting paid for them gobbling up my code and using it to cash out? It is not so one-sided, this whole matter.

simonw

The search index is provided by Brave: https://simonwillison.net/2025/Mar/21/anthropic-use-brave/

- Brave is now listed as a subprocessor on the Anthropic Trust Center portal

- Search results for "interesting pelican facts" from Claude and Brave were an exact match

- If you ask Claude for the definition of its web_search tool one of the properties is called "BraveSearchParams"

sebmellen

Remarkably, it looks like Brave will survive even while Basic Attention Token is essentially dead. What an interesting pivot.

davidcbc

Very disappointing, Brave is the last company I want my data going to

exhaze

Install MCP plugin and call a search engine of your choice.

If you’re unhappy about something, try to first think of a solution before expressing your discontent.

davidcbc

Wow, so condescending

I don't use the desktop app and I don't want to use the desktop app or jump through a bunch of hoops to support basic functionality without having my data sent to a sketchy company.

herdcall

It badly hallucinated in my test. I asked it "Rust crate to access Postgres with Arrow support" and it made up an arrow-postgres crate. It even gave sample Rust code using this fictional crate! Below is its response (code example omitted):

I can recommend a Rust crate for accessing PostgreSQL with Arrow support. The primary crate you'll want to use is arrow-postgres, which combines the PostgreSQL connectivity of the popular postgres crate with Apache Arrow data format support. This crate allows you to:

- Query PostgreSQL databases using SQL
- Return results as Arrow record batches
- Use strongly-typed Arrow schemas
- Convert between PostgreSQL and Arrow data types efficiently

yakz

Are you sure it searched the web? You have to go and turn on the web search feature, and then the interface is a bit different while it's searching. The results will also have links to what it found.

shortrounddev2

> I asked it "Rust crate to access Postgres with Arrow support"

Is that how you actually use llms? Like a Google search box?

CamperBob2

Exactly. An LLM is not a conventional search engine and shouldn't be prompted as if it were one. The difference between "Rust crate to access Postgres with Arrow support" and "What would a hypothetical Rust crate to access Postgres with Arrow support look like?" isn't that profound from the perspective of a language model. You'll get an answer, but it's entirely possible that you'll get the answer to a question that isn't the one you thought you were asking.

Some people aren't very good at using tools. You can usually identify them without much difficulty, because they're the ones blaming the tools.

Sharlin

It's absolutely how LLMs should work, and IME they do. Why write a full question if a search phrase works just as well? Everything in "Could you recommend xyz to me?" except "xyz" is redundant and only useful when you talk to actual humans with actual social norms to observe. (Sure, there used to be a time when LLMs would give better answers if you were polite to them, but I doubt that matters anymore.) Indeed I've been thinking of codifying this by adding a system prompt that says something like "If the user makes a query that looks like a search phrase, phrase your response non-conversationally as well".

thrwthsnw

every token contributes to the output

timdellinger

Totally agree here. I tried the following and had a very different experience:

"Answer as if you're a senior software engineer giving advice to a less experienced software engineer. I'm looking for a Rust crate to access PostgreSQL with Apache Arrow support. How should I proceed? What are the pluses and minuses of my various options?"

elicksaur

“Prompting” is kind of a myth honestly.

Think about it, how much marginal influence does it really have if you say OP’s version vs a fully formed sentence? The keywords are what gets it in the area.

CamperBob2

That is not correct. The keywords mean nothing by themselves. To a transformer model, the relationships between words are where meaning resides. The model wants to answer your prompt with something that makes sense in context, so you have to help it out by providing that context. Feeding it a sentence fragment or a disjoint series of keywords may not have the desired effect.

To mix clichés, "I'm feeling lucky" isn't compatible with "Attention is all you need."

op00to

I find that providing more context and details initially leads to far more success for my uses. Once there’s a bit of context, I can start barking terms and commands tersely.

globular-toast

Is this really the case, or is it the case with Claude etc because they've already been prompted to act as an "helpful assistant"? If you take a raw LLM and just type Google search style it might just continue it as a story or something.

borgdefenser

Prompting is not a myth. The words of the prompt matter huge.

The problem with this prompt to me is not that it is not in a full sentence but that it isn't exact enough.

Probabilistically, "rust" is not about the programming language but the corrosion of metal. Then arrow.

Give the model basically nothing to work with then complain it doesn't do exactly what you want. Good luck with that.

globular-toast

It's funny because many people type full sentence questions into search engines too. It's usually a sign of being older and/or not very experienced with computers. One thing about geeks like me is we will always figure out what the bare minimum is (at least for work, I hope everyone has at least a few things they enjoy and don't try to optimise).

whatevertrevor

It's not about being young or old, search engines have moved away from pure keyword searches and often typing your actual query gives better results than searching for keywords, especially with Google.

herdcall

Well, compare it to the really good answer from Grok (https://x.com/i/grok/share/MMGiwgwSlEhGP6BJzKdtYQaXD) for the same prompt. Also, framing as a question still pointed to the non-existent postgres-arrow with Claude.

unshavedyak

That's primarily how I do it, though it depends on the search of course. I use Kagi, though.

I've not yet found much value in the LLM itself. Facts/math/etc are too likely incorrect; I need it to make some attempt at hydrating real information into the response. And linking sources.

keeran

This was pretty much my first experience with LLM code generation when these things first came out.

It's still a present issue whenever I go light on prompt details and I _always_ get caught out by it and it _always_ infuriates me.

I'm sure there are endless discussions on front running overconfident false positives and being better at prompting and seeding a project context, but 1-2 years into this world is like 20 in regular space, and it shouldn't be happening any more.

op00to

Often times I come up with a prompt, then stick the prompt in an LLM to enhance / identify what I’ve left out, then finally actually execute the prompt.

exhaze

Cite things from ID-based specs. You're facing a skill issue. The reason most people don't see it as such is because an LLM doesn't just "fail to run" here. If this were code you wrote in a compiled language, would you post and say the language infuriates you because it won't compile your syntax errors? As this kind of dev style becomes prevalent and output expectations adjust, work performance reviews won't care that you're mad. So my advice is:

1. Treat it like regular software dev where you define tasks with ID prefixes for everything, acceptance criteria, exceptions. Ask the LLM to reference them in code right before the implementation code.

2. "Debug" by asking the LLM to self-reflect on the decision-making process that caused the issue - this can give you useful heuristics to use later to further reduce the issues you mentioned.

“It” happening is a result of your lack of time investment into systematically addressing this.

_You_ should have learned this by now. Complain less, learn more.

matt3210

That crate knowledge is probably from a proprietary private GitHub repo given to it by Microsoft

noisy_boy

Maybe you can retry with lower temperature?

zarathustreal

You “asked it” a statement?

Cort3z

I usually find Claude to be my favourite flavor of LLMs, but I still pay for ChatGPT because their voice offering is so great! I regularly use it as an "expert on the side" when I do other things, like doing bike repairs. I ask it things like "how do I find the min/max adjustments on my particular flavor of front derailleur", or when cooking, and my hands are dirty, I can ask stuff like "how much X do I usually need for Y people", and so on. The hands-off feature is so great when my hands are literally busy doing some other thing.

I really wish Claude had something similar.

mock-possum

ChatGPT advanced voice mode really is surprisingly excellent - I just wish it:

1) would give you more time to pause when you’re talking before it immediately launches into an answer

2) would actually try to say the symbols in code blocks verbatim - it’s basically useless for looking up anything to do with code, because it will omit parts of the answer from its speech.

barfingclouds

Yeah I have to manually hold it down every time I talk. I have a lot of pauses and simply would not be able to interface with that without that option. It’s why I essentially can’t use Gemini voice mode

rhubarbtree

I think voice interface is the real killer app of LLMs. And the advance voice mode was exactly what I was waiting for. The pause between words issue is still a problem though, I think being able to just hit enter when done would work best.

Pro tip; if you’re preparing for a big meeting eg an interview, tell ChatGPT to play the part of an evil interviewer. Give it your CV and the job description etc. ask it to find the hardest questions it can. Ask it to coach you and review your answers afterwards, give ideal answers etc

After a couple of hours of grilling, the real interview will seem like a doddle.

eraserj

> There's less usage of voice mode on the enterprise and power users side but that will happen eventually. - Anthropic CEO 21 jan. [0]

[0] https://youtu.be/snkOMOjiVOk 01:30

lamtung

Is it possible to use the ChatGPT voice feature in a similar manner to Alexa, where I only need to say an activation word? I'm aiming to set up a system for my 7-year-old son to let him engage in conversations with ChatGPT as he does with Alexa.

Cort3z

I assume it would be possible to build yourself with the OpenAI API together with a locally run voice model that only detects the activation word. There might be off-the-shelf solutions for this, but I am not aware of any.

NBJack

I wonder if it will actually respect the robots.txt this time.

creddit

I don't think it should. If a user asks the AI to read the web for them, it should read the web for them. This isn't a vacuum charged with crawling the web, it's an adhoc GET request.

birken

The AI isn't "reading the web" though, they are reading the top hits on the search results, and are free-riding on the access that Google/Bing gets in order to provide actual user traffic to their sites. Many webmasters specifically opt their pages out of being in the search results (via robots.txt and/or "noindex" directives) when they believe the cost/benefit of the bot traffic isn't worth the user traffic they may get from being in the search results.

One of my websites that gets a decent amount of traffic has pretty close to a 1-1 ratio of Googlebot accesses compared to real user traffic referred from Google. As a webmaster I'm happy with this and continue to allow Google to access the site.

If ChatGPT is giving my website a ratio of 100 bot accesses (or more) compared to 1 actual user sent to my site, I very much should have to right to decline their access.

jsbg

> If ChatGPT is giving my website a ratio of 100 bot accesses (or more) compared to 1 actual user sent to my site

are you trying to collect ad revenue from the actual users? otherwise a chatbot reading your page because it found it by searching google and then relaying the info, with a link, to the user who asked for it seems reasonable

nextts

Well you have no rights when you expose a server to the internet. Other than copyright of course.

1shooner

>You can now use Claude to search the internet to provide more up-to-date and relevant responses.

It's a search engine. You 'ask it to read the web' just like you asked Google to, except Google used to actually give the website traffic.

I appreciate the concept of an AI User-agent, but without a business model that pays for the content creation, this is just going to lead to the death of anonymously accessible content.

darepublic

Well I expect eventually the agent will be able to act on your behalf with your credentials.

beeflet

IDK bittorrent is pretty effective at hosting bytes. I think if something like IPFS takes off in our generation there will be no need for advertising as an excuse for covering hosting costs in the client-server model.

As for funding "content creation" itself, you have patronage.

losteric

What was the web like before wide spread internet ads, auth, and search engines?

Did all those old sites have “business models”? What did the web feel like back then?

(This is rhetorical - I had niche hobby sites back then, in the same way some people put out free zines, and wouldn’t give a damn about today’s AI agents so long as they were respectful.

The web was better back then, and I believe AI slop and agents brings us closer to full circle)

wraptile

You can't expect the benefits of the public web without bearing the costs. Just put your stuff under an auth wall (can even be free) and no one will crawl it.

internetter

You could make this justification for a lot of unapproved bot activity.

taskforcegemini

you could, but this article is about claude.

scoofy

Many if not most websites are paid for by eyeballs not by get requests. A bot is a bot is a bot. Respect robots.txt or expect to have your IPs banned.

danenania

It may not be very long before the big majority of web searches are via AI. If that happens, blocking AI will mean blocking most people too.

You’d already be blocking me as I’d guess I now search via AI >90% of the time between perplexity, chatgpt, deep research, and google search AI.

theshackleford

What are you even talking about?

robots.txt is not a security mechanism, and it doesn’t “control bots.” It’s a voluntary convention mainly followed by well behaved search engine crawlers like Google and ignored by everything else.

If you’re relying on robots.txt to prevent access from non human users, you’re fundamentally misunderstanding its purpose. It’s a polite request to crawlers, not an enforcement mechanism against any and all forms of automated access.

bayindirh

How can you be so sure? Processors love locality, so they fetch the data around the requested address. Intel even used to give names to that.

So, similarly, LLM companies can see this as a signal to crawl the whole site to add to their training sets and learn from it, if the same URL is hit a couple of times in a relatively short time period.

usrbinbash

> This isn't a vacuum charged with crawling the web, it's an adhoc GET request.

Doesn't matter. The robots-exclusion-standard is not just about webcrawlers. A `robots.txt` can list arbitrary UserAgents.

Of course, an AI with automated websearch could ignore that, as can webcrawlers.

If they choose to do that, then at some point some server admins might (again, same as with non-compliant webcrawlers) use more drastic measures to reduce the load, by simply blocking these accesses.

For that reason alone, it will pay off to comply with established standards in the long run.
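
Checking the standard is also trivial for a well-behaved client. A minimal sketch using Python's standard library robots.txt parser; "MyAIFetcher" is a made-up agent name for illustration:

    from urllib.parse import urlsplit, urlunsplit
    from urllib.robotparser import RobotFileParser

    def allowed(url: str, user_agent: str = "MyAIFetcher") -> bool:
        parts = urlsplit(url)
        robots_url = urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))
        rp = RobotFileParser()
        rp.set_url(robots_url)
        rp.read()  # fetches and parses the site's robots.txt
        return rp.can_fetch(user_agent, url)

    # allowed("https://example.com/some/page") -> True/False per the site's own rules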

renewiltord

In the limit of the arms race it's sufficient for the robot to use the user's local environment to do the browsing. At that point you can't distinguish the human from the robot.

mvdtnz

No thank you, when I define a robots.txt file I expect all automated systems to respect it.

navigate8310

Think of the "searching" LLM as a peon of the user: the user asks, the peon performs. In that sense, searching by the LLM is human-driven and should not be blocked. It's not just an automated system doing the search; it's your personal peon.

theshackleford

Then you’ve fundamentally misunderstood what a robots.txt file does or is even intended to do and should reevaluate if you should be in charge of how access is or is not prevented to such systems.

Absolutely nothing has to obey robots.txt. It’s a politeness guideline for crawlers, not a rule, and anyone expecting bots to universally respect it is misunderstanding its purpose.

TheDudeMan

But this isn't automated. This is user-driven.

beeflet

Someone should call the robots.txt police then, there's a bandit on the loose!

victorbjorklund

A browser is automated too.

Sargos

Any AI tool I make will ignore robots.txt on principle. Artificial humans should have equal rights as real humans.

GuinansEyebrows

Someday I’ll have enough “karma” to downvote things like this.

The agent should respect robots.txt no matter who is using the Robot.

JimDabell

The LLM shouldn’t.

robots.txt is intended to control recursive fetches. It is not intended to block any and all access.

You can test this out using wget. Fetch a URL with wget. You will see that it only fetches that URL. Now pass it the --recursive flag. It will now fetch that URL, parse the links, fetch robots.txt, then fetch the permitted links. And so on.

wget respects robots.txt. But it doesn’t even bother looking at it if it’s only fetching a single URL because it isn’t acting recursively, so robots.txt does not apply.

The same applies to Claude. Whatever search index they are using, the crawler for that search index needs to respect robots.txt because it’s acting recursively. But when the user asks the LLM to look at web results, it’s just getting a single set of URLs from that index and fetching them – assuming it’s even doing that and not using a cached version. It’s not acting recursively, so robots.txt does not apply.

I know a lot of people want to block any and all AI fetches from their sites, but robots.txt is the wrong mechanism if you want to do that. It’s simply not designed to do that. It is only designed for crawlers, i.e. software that automatically fetches links recursively.

manquer

While robots.txt is not there to directly prevent automated requests, it does prevent crawling which is needed for search indices.

Without recursive crawling, it will not be possible for an engine to know which URLs are valid[1]. They would otherwise either have to brute-force, say, HEAD calls for all/common string combinations and see if they return 404s, or, more realistically, have to crawl the site to "discover" pages.

The issue of summarizing a specific URL on demand is a different problem[2] and not related to the issue at hand of search tools crawling at scale and depriving sites of all traffic.

Robots.txt absolutely does apply to LLM engines and search engines equally. All types of engines create indices of some nature (RAG, inverted index, whatever) by crawling; sometimes LLM engines have been very aggressive without respecting robots.txt limits, as many webmasters have reported over the last couple of years.

---

[1] Unless published in sitemap.xml of course.

[2] You need to have the unique URL to ask the LLM to summarize it in the first place, which means you likely visited the page already. Someone sharing a link with you and a tool automatically summarizing the page, on the other hand, deprives the webmaster of impressions and thus ad revenue or sales.

This is common usage pattern in messaging apps from Slack to iMessages and been so for a decade or more, also in news aggregators to social media sites, and webmasters have managed to live with this one way or another already.

JimDabell

> Robots.txt does absolutely apply to LLMs engines and search engines equally.

It does not. It applies to whatever crawler built the search index the LLM accesses, and it would apply to an AI agent using an LLM to work recursively, but it does not apply to the LLM itself or the feature being discussed here.

The rest of your comment seems to just be repeating what I already said:

> Whatever search index they are using, the crawler for that search index needs to respect robots.txt because it’s acting recursively. But when the user asks the LLM to look at web results, it’s just getting a single set of URLs from that index and fetching them – assuming it’s even doing that and not using a cached version. It’s not acting recursively, so robots.txt does not apply.

There is a difference between an LLM, an index that it consults, and the crawler that builds that index, and I was drawing that distinction. You can’t just lump an LLM into the same category, because it’s doing a different thing.

theshackleford

> it does prevent crawling

No it doesn’t. It politely requests that crawlers do not, and if said crawlers choose to honour it then those specific crawlers will not crawl. That’s it. It can be and is ignored without penalty or enforcement.

It’s like suggesting that putting a sign in your front yard saying “please don’t rob my house” prevents burglaries.

> Robots.txt does absolutely apply to LLMs engines and search engines equally

No it doesn’t because again, it’s a request system. It applies only to whatever chooses to pay attention to it, and further, decides to abide by any request within it which there is no requirement to do.

From google themselves:

“The instructions in robots.txt files CANNOT ENFORCE crawler behavior to your site; it's up to the crawler to obey them.”

And as already pointed out, there is no requirement a crawler follow them, let alone anything else.

If you want to control access, and you’re using robots.txt, you’ve no idea what you’re doing and probably shouldn’t be in charge of doing it.

mtkd

Do you really think LLM vendors that download 80TB+ of data over torrents are going to be labeling their crawler agents correctly and running them out of known datacenters?

Arnt

The ones I noticed in my logfiles behave impeccably: retrieve robots.txt every week or so and act on it.

(I noticed Claude, OpenAI and a couple of others whose names were less familiar to me.)

teh_infallible

Apparently they use smart appliances to scrape websites from residential accounts.

SoftTalker

Maybe we need a new "ai.txt" that says "yes I mean you, ChatGPT et. al."

verdverm

Bluesky / ATProto has a proposal for User Intents for data. More semantics than robots.txt, but equally unenforceable. Usage with AI is one of the intents to be signaled by users

https://github.com/bluesky-social/proposals/tree/main/0008-u...

whoami_nr

Small difference. It's called llms.txt

https://llmstxt.org/

jsheard

If they don't comply with robots.txt, why would they comply with anything else?

furyofantares

Presumably the crawler that produces whatever index it uses does, which is how it knows what sites to read. Unless you provide it a URL yourself I guess, in which case, it shouldn't.

explain

robots.txt is meant for automated crawlers, not human-driven actions.

zupa-hu

Every automated crawler follows human-driven actions.

josh-sematic

Conversely, every browser is a program that automatically executes HTTP requests.

gopher_space

Welcome to "Context".

nicce

It must form the search index somehow. That happens prior to the human action. It simply would not find the page at all if it respected robots.txt.

pests

I remember in late 90s/early 2000 as a teen going to robots.txt to specifically see what they were trying to hide and exploring those urls.

What is the difference if I use a browser or an LLM tool (or curl, or wget, etc.) to make those requests?

Tostino

Let's say you had a local model with the ability to do tool calls. You give that LLM the ability to use a browser. The LLM opens that browser, goes to Google or Bing, and does whatever searches it needs to do.

Why would that be an issue?

bayindirh

So, do you mean LLMs are human-like and conscious?

I thought they were just machine code running on part GPU and part CPU.

Ukv

I think they mean that it's a tool accessing URLs in response to a user request to present to the user live - with that user being a human. Like if you used some webpage translation service, or non-ML summarizer.

There's some gray area though, and the search engine indexing in advance (not sure if they've partnered with Bing/Google/...) should still follow robots.txt.

Filligree

There’s a human using the LLM. In a live web browsing session like this, the LLM stands in for the browser.


postexitus

If a human triggers the web crawlers by pressing a button, should they ignore robots.txt?

Filligree

If a human triggers a browser by pressing a button, should it ignore robots.txt?

dudeinjapan

In practice, robots.txt is to control which pages appear in Google results, which is respected as a matter of courtesy, not legality. It doesn't prevent proxies etc. from accessing your sites.

micromacrofoot

almost no one does, robots.txt is practically a joke at this point — right up there with autocomplete=off

Demiurge

In what circles is it a joke? Google bots seem to respect it on my sites according to logs.

mediumsmart

I know an artist that had noindex turned on by mistake in robots.txt for the last 5 years - google, kagi and duckduckgo find tons of links relevant to the artist and the artwork but not a single one from the website.

So not "seems to" or "apparently", but as a matter of fact: robots.txt works for its intended audience.

joecool1029

It's in a small circle of those that do. Blame the internet archive for starting this trend: https://blog.archive.org/2017/04/17/robots-txt-meant-for-sea...

otikik

Apparently, the regular search crawler does it, but the ai thingie doesn't.

micromacrofoot

A small number of search engines respect it, no one else does. Just about every content scraping bot ignores it, including a number of Google's.

geekrax

I have replaced all robots.txt rules with simple WAF rules, which are cheaper to maintain than dealing with offending bots.

claudiulodro

I do essentially both: robots.txt backed by actual server-level enforcement of the rules in robots.txt. You'd think there would be zero hits on the server-level blocking since crawlers are supposed to read and respect robots.txt, but unsurprisingly they don't always. I don't know why this isn't a standard feature in web hosting.
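
The enforcement side can be a few lines of middleware. A rough sketch assuming Flask, with an illustrative (not exhaustive) agent list that mirrors the robots.txt rules:

    from flask import Flask, abort, request

    app = Flask(__name__)
    DISALLOWED_AGENTS = ("GPTBot", "CCBot", "Bytespider")  # mirror your robots.txt rules

    @app.before_request
    def enforce_robots_rules():
        ua = request.headers.get("User-Agent", "")
        if any(bot.lower() in ua.lower() for bot in DISALLOWED_AGENTS):
            abort(403)  # agents robots.txt disallows get blocked at the server too

    @app.route("/")
    def index():
        return "hello"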

Joe_Cool

For my personal stuff I also included a Nepenthes tarpit. Works great and slows the bots down while feeding them garbage. Not my fault when they consume stuff robots.txt says they shouldn't.

I'm just not sure if legal would love me doing that on our corporate servers...

rustc

The WAF rule matches based on the user agent header? Perplexity is known to use generic browser user agents to bypass that.

jsight

I really want these to be able to find and even redisplay images. "Search all the hotels within 5 miles of this address and show me detailed pictures of the rooms and restrooms"

Hotels would much rather show you the outside, the lobby, and a conference room, so finding what the actual living space will look like is often surprisingly difficult.

dgs_sgd

I've been looking for this as well. I want a reliable image search tool. I tried a combination of perplexity web search tool use with the Anthropic conversations API but it's been lackluster.

tjsk

I’ve been experimenting with different LLM + search combos too, but results have been mixed. One thing I’m particularly interested in is improving retrieval for both images and videos. Right now, most tools seem to rely heavily on metadata or simple embeddings, but I wonder if there’s a better way to handle complex visual queries. Have you tried anything for video search as well, or are you mainly focused on images? Also, what kinds of queries have you tested?
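
For images, the thing I keep coming back to is embedding the query and the images into a shared space and ranking by similarity, rather than relying on metadata. A sketch using Hugging Face's CLIP; the file names and query are placeholders, and for video you'd sample frames and score them the same way:

    import torch
    from PIL import Image
    from transformers import CLIPModel, CLIPProcessor

    model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
    processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

    images = [Image.open(p) for p in ["room1.jpg", "room2.jpg", "lobby.jpg"]]
    inputs = processor(text=["hotel bathroom with a walk-in shower"],
                       images=images, return_tensors="pt", padding=True)
    with torch.no_grad():
        sims = model(**inputs).logits_per_text[0]  # similarity of the query to each image
    best = int(sims.argmax())  # index of the best-matching image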


CalChris

I find myself Googling less often these days. Frustrated with both the poor search results and impressed with the quality of AI to do the same thing and more, I think search's days are numbered. AOL lasted as an email address for quite some time after America Online ceased to be a relevant portal. Maybe Gmail will as well.

whalesalad

Kagi has been really really good.

noisy_boy

I am still googling for less in-depth queries, because the AI-generated summary at the top of the results is good enough most of the time and the actual results are just below in case I want to see them.

For more in-depth stuff, it is LLMs by default and I only go to Google when the LLM isn't getting me what I need.

borgdefenser

I notice I have been using the Google AI summary more and more for quick things.

I had subscribed to Perplexity for a month to use their deep research. I think it ran out earlier this week but I am really missing it Saturday morning here.

That thing is awesome. Sonnet 3.7 is more in the middle of this to me. It can help me understand all the things I found from my deep research requests.

I am surprised the hype is not more for Sonnet 3.7 honestly.

puttycat

Agree and I'm pretty sure Google is seeing this drop internally in usage stats and are panicking. I'm also certain (but hope to be wrong) that because of this they'll be monetizing the hell out of every remaining piece of product they have (not by charging for it of course).

msp26

> in feature preview for all paid Claude users in the United States. Support for users on our free plan and more countries is coming soon

US only

smca

More countries soon.

tantalor

It says a lot about their product vision and intended market that the example query is a TypeScript migration question.

Do they not care about typical search users? Only developers?

mindwok

Compared to OpenAI, who seem keen to maintain the mindshare of everyone, IMO Anthropic are far more considered about their audience. They released a report recently on who was using AI professionally, and it was something like 40% developers and a single-digit percentage for basically every other profession. I think they're focusing on the professional use cases.

throw234234234

Pretty much. Claude, judging from their announcements, seems to me to be about SWEs and coding at the moment. Personally, while I understand their decision, I find it a bit limiting and a little targeted against the SWE profession. If all AI does is disrupt SWEs without really adding new products or new possibilities, then it feels IMO like a bit of a waste, and quite uneven in its societal disruption.

At least in my circle, SWEs are either excited about or completely fearful of the new technology, while to every other profession it feels like just hype that hasn't really changed anything. They've tried it, sure, but it didn't really have the data to help with even simpler domains than SWE. Anecdotally, I've had the comment "my easy {insert job here} will last longer than your tech job" from many people I know, both white- and blue-collar workers. It's definitely reduced the respect for SWEs in general, at least where I'm located.

I would like to see improvements in people's quality of life and new possibilities/frontiers from the technology, not just "more efficiencies" and disruption. It feels like there's a lack of imagination with the tech.

gizmodo59

I know people in other industries who use AI a lot and like it. Accounting, legal, writing (a lot here). I agree that companies that focus on all verticals, like OpenAI, are definitely the way to go. Claude's coding capabilities are not a very significant advantage over OpenAI's, though. There is no big moat, and a lot of it is perception and marketing.

picafrost

Do users pay for LLMs? I haven't seen much concrete data indicating that they do. I don't think the casual utility gains of LLMs have gotten average people so much value that they're paying $20/mo+ for it. Certainly not for search in the age of [doom] scrolling.

I would guess that Anthropic wants developers talking about how good Claude is in their company Slack channels. That's the smart thing to do.

disiplus

I would say no. While I pay for ChatGPT, Claude, and Perplexity monthly (I don't know why anymore), my wife does not use any at all. She has around 5-10 things she uses on her smartphone, and if she needs something new there is still Google.

I on the other side reduced my googling by 95%

pixl97

Have you actually done any kind of study on the utility the 'average user' has received, or is this just guessing?

picafrost

I have only anecdotal data from non-technical friends and family.

I’m referring to average people who may not be average users because they’re barely using LLMs in the first place, if at all.

They have maybe tried ChatGPT a few times to generate some silly stories, and maybe come back to it once or twice a month for a question or two, but that’s it.

We’re all colored by our bubbles, and that’s not a study, but it’s something.

Matl

I'd guess they showed that query because LLMs are a lot better at answering translation/migration type stuff without hallucinating too much.

dontlikeyoueith

That's because the attention mechanism was designed for Seq2Seq models (i.e. translation in its most general form).

Any other use of it is a case of "I have a hammer, so that's a nail".


ubicomp

Excited to see this. I've really been enjoying Claude. It feels like a different, more creative flavor of experience than GPT. I use Claude a lot for dialogues and exploring ideas, like a conversational partner. Having web access will add an interesting dimension to this.

robwwilliams

Ditto. I use Claude 3.7 to refine drafts of research papers and ask it “What have I missed?”.

Now I can prompt Claude to ping PubMed and make sure that its suggested references are verified. Each citation/claim should be accompanied by a PMID or a DOI.

I hope this works!

ubicomp

That's a great way to use it!

lovehashbrowns

That's how I use it as well! It'll also occasionally hallucinate things, but much less often than other AI tools I've tried. But typically I'll just run things by it that I'm questioning myself about, or if I want to solidify a concept I'll ask it if my understanding is correct.

It's also fun to ask the same question to multiple AI tools and see how the answers differ. Usually Claude is the most accurate and helpful, though.

agentultra

They need to stop or else make their crawlers easy to identify and block. However I have no faith that AI companies will play by the rules.

They already cost people time, money, and their mental health by using adversarial tactics to evade blocking and ignoring robots.txt

https://drewdevault.com/2025/03/17/2025-03-17-Stop-externali...