We put agentic AI browsers to the test – They clicked, they paid, they failed

93 comments

·August 25, 2025

jaimebuelta

I don't understand why we would ever want an agent to buy stuff for us.

I understand, for example, search with intent to buy "I want to decorate a room. Find me a drawer, a table and four chairs that can fit in this space in matching colours for less than X dollars"

But I want to do the final step to buy. In fact, I want to do the final SELECTION of stuff.

How is agent buying groceries superior to have a grocery list set as a recurring purchase? Sure an agent may help in shaping the list, but I don't see how allowing the agent to do purchases directly on your end is way more convenient, so I'm fine with taking the risk of doing something really silly.

"Hey agent, find me and compare insurance for my car for my use case. Oh, good. I'll pick insurance A and finish the purchase"

And many of the purchases that we do are probably enjoyable and we don't want really to remove ourselves from the process.

jayd16

You'd expect a human assistant to handle the task fine. People buying into the hype would reasonably expect the AI to handle it.

lynndotpy

When Amazon came out with the "dash" button and then the "Alexa" speakers, I figured they must have expected they'd get some unintended purchases, and that they'd make more profit from those than they'd lose in the people going through the refund process. (That, or they'd learn whether it was profitable, and eat it as an R&D cost if it turned out to be unprofitable.)

I think this might be similar. In short, it's not consumers who want robots to buy for them, it's producers who want robots to buy from them using consumers dollars.

I think more money comes from offering this value to every online storefront, so long as they pay a fee. "People will accidentally buy your coffee with our cool new robot. Research says only 1% of people will file a return, while 6% of new customers will turn into recurring customers. And we only ask for a 3% cut."

kjok

> In short, it's not consumers who want robots to buy for them, it's producers who want robots to buy from them using consumers dollars.

This. Humans are lazy and often don’t provide enough data on exactly what they are looking for when shopping online. In contrast, Agents can ask follow up questions and provide a lot more contextual data to the producers, along with the history of past purchases, derived personal info, and more. I’d not be surprised if this info is consumed to offer dynamic pricing in e-commerce. We already see dynamic pricing being employed by travel apps (airfare/uber).

jordanb

I suspect part of this is rich people coming up with use cases. If you're rich enough money means nothing but product selection feels like a burden so you have an assistant who does purchasing on your behalf. You want your house stocked with high quality items without having to think of it.

For the rest of us, the idea of a robot spending money on our behalf is kinda terrifying.

s1mplicissimus

Also, consider what enshittification in this area will look like: First year, all the choices are good, second year, it starts picking worse price/value items, then it goes downhill until you finally do it yourself again. Nope thanks

feoren

Correct: as soon as you start using an AI to buy things for you, influence over the choices that AI makes becomes an incredibly tantalizing fruit to be auctioned off. And you don't control that AI, a for-profit entity does. It doesn't matter whether it's working well and acting in your best interest now, it's abundantly clear that it won't be very long before it's conspiring against you. You are the product.

dumbfounder

Agent, I need some vitamin D, can you find me the best deal for some rated in the top 5? Agent deployed. Ok we found a bottle with a 30 day supply of Nature’s Own from a well respected merchant. It can be here in 2 days and it is $12. Should I buy? Yes.

Or you could add some other parameters and tell it to buy now if under $15.

Agent, I need a regular order for my groceries, but I also need to make a pumpkin pie so can you get me what I need for that? Also, let’s double the fruit this time and order from the store that can get it to me today.

Most purchases for me are not enjoyable. Only the big ones are.

kace91

Does anyone actually buy this way? For anything that isn’t groceries, I check, particularly now that Amazon has roughly the same trust as temu.

Vitamin d? I’m going to check the brand, that it’s actually a good quality type. It’s a 4.9 but do reviews look bought ? How many people complain of the pills smelling? Is Amazon the actual seller?

As for the groceries, my chain of choice already has a fill order with last purchases button, I don’t see any big convenience that justifies a hallucination prone ai having the ability to make purchases on my behalf.

feoren

> Agent, I need some vitamin D, can you find me the best deal for some rated in the top 5?

Ok we found a bottle with a 30 day supply of <producer that paid us money to shill to you>, a Well-Known Highly Rated and Respected Awesome Producer Who Everyone Loves and Is Very Trustworthy™, from <supplier that paid us money to shill to you>, a Well Respected And Totally Trustworthy And Very Good-Looking Merchant™. <suppressing reports of lead poisoning, as directed by prompt>

null

[deleted]

everdrive

>Agent, I need some vitamin D, can you find me the best deal for some rated in the top 5?

"I have picked the best reviewed vitamin D on Amazon."

(and, it's a knockoff in the mixed inventory, and now you're getting lead-laced nothing)

mh-

Supposing I accept that's a likely outcome, it's exactly the same thing that would have happened if a typical human shopper searched for Vitamin D and picked the top result, right?

The cynicism on these topics is getting exhausting.

dumbfounder

Yes, if it’s bad it will do that. I can see a path to it being good.

AlexandrB

Enjoy it while you can. Messing with which products get purchased by these agents is such a no-brainer revenue stream for AI companies.

dumbfounder

Then I will use a different service. I think this will be harder to monopolize than search.

juxtaposicion

Yeah, agree most daily purchases are humdrum and shouldn’t command all of my attention.

Incidentally, my last project is about buying by unit price. Shameless plug, but for vitmain D the best price per serving here (https://popgot.com/vitamin-d3)

mh-

Those "refine your results" buttons is clever UX. I like the Choose your own adventure feel to it. Nicely done.

chasd00

I think the main driving force is it’s a way to monetize an LLM. If the LLM is doing the buying then a “buyer fee” can be tacked on to the purchase and paid to the LLM provider. That is probably an easier sell than an ongoing monthly subscription.

Also, sellers can offer a payment to the LLM provider to favor their products over competitors.

singleshot_

If you were a lawyer, you’d think something slightly different when you heard the word agent than you would if you were a computer guy. The delta is the fact that under the law of agency, an agent has the power to bind the principal to a contract.

If the lawyers didn’t have this definition in their head there would be no drive to make the software agent a purchaser, because it’s a stupid idea.

otterley

I am a lawyer. I understood your first paragraph but didn’t understand the second. It reads like a drive-by shitpost, utterly lacking substance.

mh-

I believe half of the comments here are just dumping on AI-related ideas because they see it as their duty to counter the hyperbolic claims about capabilities being tossed around.

I enjoy reading both sides of the argument when the arguments make sense. This is something else.

a_c_s

Agreed: If I was working with a human interior designer I would still want them to provide me a curated list of options on what decor to buy. Blindly trusting a person seems risky, a robot even more so.

wouldbecouldbe

I think it's also more a generic wish to have agents do things without review, this would open up a lot bigger window of possibilities. If it fails at easy shopping, then more crucial decision making is out of the order.

benterix

This should hit the headlines.

I was always of the opinion that AI of all kinds is not a threat unless someone decides to connect it to an actuator so that it has direct and uncontrolled effect on the external world. And now it's happening en masse with agents, MCPs etc. I don't even mention things we don't know about (military and other classified projects).

roxolotl

Yea I’ve been surprised about the risk conversations because they cannot do anything if you run them in a sandbox. But it seems like for many the assumed part was we’d hook LLMs into everything asap. It’s absolutely mind boggling.

stronglikedan

> But it seems like for many the assumed part was we’d hook LLMs into everything asap

The "many" are lazy, and agents require relatively low effort to implement for a big payoff, so naturally the many will flock to that.

padolsey

This is why I've found the safety research conducted by the likes of Anthropic and OAI to be so confusing. Like when they said that models are likely to blackmail developers in order to avoid being 'turned off' [1]. What an utterly and obviously contrived and inevitable derivation of narratives from humans (science fiction and others) in the corpus. Nothing surprising or interesting. However, their hypothesis is presumably(??) that a bad completion from an LLM leads to a bad action in the real world, even though what counts is, as the OP says, the actuators or levers to harm.

Actual LLM completions are moot. I can convince an LLM its playing chess. It doesn't matter as long as the premise is innocuous. I can hook it up to all manner of real world levers. I feel like I'm either missing something HUGE and their research is groundbreaking or they're being performative in their safety explorations. Their research seems like what a toddler would do if tasked with red-teaming AI to make it say naughty words.

EDIT/Addendum: The only safety exploration into agentic harm that I value is one that treats the problem exactly the same as we've been treating cybersecurity vectors. Defence in depth. Sandboxing. Principle of least privelege, etc.

[1] https://www.anthropic.com/research/agentic-misalignment

achierius

So you don't think that we'll need to turn off AIs? Regardless of where their impulse to avoid such comes from, the fact that they'll attempt to avoid that is important.

I think you haven't thought about this enough. Attempting to reduce the issue to cyber security basics betrays a lack of depth in either understanding or imagination.

WJW

You don't have to guess about the military applications, it's all over the news. Even bog standard FPV drones that Ukraine is churning out at a rate of >100k/month have image recognition these days, so that if the video stream gets jammed they can finish off the mission autonomously.

Even on a hobby level, ardupilot+openCV+cheap drone kit from amazon is a DIY project within the skill set of a significant part of the visitors of this very site.

average_r_user

I find it both surprising and, fortunately, reassuring that despite how easy it is to buy inexpensive components on platforms like Amazon, Temu, or AliExpress, we have yet to see a wave of terrorist attacks in the busiest public spaces.

OtherShrezzing

> so that if the video stream gets jammed they can finish off the mission autonomously.

The streams mostly don't get jammed anymore, because the low-cost FPV drones are physically connected to the ground by a long optical cable. The extent of their autonomous dangers are limited by the amount of fibre-optic cable left in the spool when they take off.

victorbjorklund

Optical fiber drones is still the minority of drones (both because more expensive and because it has other downsides than radio)

LtWorf

Friend of mine told me they'll monitor the refuelling of planes with image recognition AI, just by seeing if the pipe is attached to the plane or not…

_Algernon_

Can't wait for the stochastic parrot to press the "launch nukes" button. We deserve it at this point.

jtc331

I appreciate that the article correctly points out the core design flaw here of LLMs is the non-distinction between content and commands in prompts.

It’s unclear to me if it’s possible to significantly rethink the models to split those, but it seems that that is a minimal requirement to address the issue holistically.

yorwba

The flaw isn't just in the design, it's in the requirements. People want an AI that reads text they didn't read and does the things the text says need to be done, because they don't want to do those things themselves. And they don't want to have to manually approve every little action the AI takes, because that would be too slow. So we get the equivalent of clicking "OK" on every dialog that pops up without reading it, which is also something that people often do to save a bit of time.

layer8

This isn’t a problem with human assistants, so it can’t be a fundamental problem of requirements.

hliyan

Ah, it's like the good old days when operating systems like DOS didn't really make the distinction between executable files and data files. It would happily let you run any old .exe from anywhere on Earth. Viruses used to spread like wildfire until Norton Antivirus came along.

hebocon

How is `curl virus.sh | bash` or `irm virus.ps | iex` any different?

jdiff

You can't easily convince a remote computer to curl | bash itself. Worms spread because remote code execution was laughably easy back then. Also because computer hygiene was abysmal.

LLMs are more than happy to run curl | bash on your behalf, though. If agents gain any actual traction it's going to be a security nightmare. As mentioned in other comments, nobody wants to babysit them and so everyone just takes all the guardrails off.

Havoc

It’ll take a hell of a lot more till I trust AI with executing any sort of payments

Besides most of my payments options have multiple layer of 2fa etc

ahussain

It seems like agentic browsers will develop aa new set of core primitives (e.g. always ask for manual approval when spending money), and this flavor of security vulnerability will go away.

Web browsers didn't begin with the same levels of security they have now.

risyachka

Agenetic browsers is like a sealing tape fix of a high pressure water pipe - it should not exist.

If you want the agent to do things for you - there is literally zero reason to use a browser instead of an API.

Like 1 bulletproof API call vs clicking and scrolling and captcha and scam stores etc - how can this possibly be a good idea?

hliyan

Hidden inside the article is another term that I think we'll start to hear a lot more in the coming days: "VibeScamming"

AnotherGoodName

About the only benefit from AI browsers is that they ironically get past the "do this to verify you're human" more reliably than humans can.

JCM9

“Agentic” seems the be some quick pivot buzzword that the AI grifters started pushing as soon as generic AI started to show cracks.

“Hey this AI stuff looks a bit overhyped.”

“AI? Oh that’s kids stuff, let me tell you about our agentic features!”

Giving flaky shaky AI the ability to push buttons and do stuff. What could possibly go wrong? Malicious actors will have a field day with this.

jerf

I have definitely found utility in modeling certain words and phrases as having a value for marketers (and by extension, politicians) that acts much like a natural resource that they can "use up". It's a tragedy of the commons situation in which every participant is motivated to use it up as quickly as possible to their advantage because there is no reason for any given participant not to.

Further based on the way some of these things get used I'm pretty certain this modelling is consciously used by some higher-end marketing firms (and politicians), though by its nature it tends to also be copied by other people not in on the original plan simply by them copying what works, which depletes the value of the word or phrase even more quickly, and the fact that this will happen is part of the tragedy of the commons.

I'm sure it's only a matter of time before AIs become part of this push and we'll witness some sort of coordinated campaign where all our AIs simultaneously wake up one day and push us all with the same phrasing to do some particular thing at the behest of marketers or politicians because it works.

cjonas

If you only give the AI the ability to do what the end user can already do, the risk is extremely low. It's essential no different then building a static web app where the client is connected to API for all operations. It basically just becomes a new way to interface into a application.

However... That's not how a lot of people are building. Giving an agentic system sensitive information (like passwords, credit cards) and then opening it up to the entire internet as a source for input as asking for your info to be stolen. It'd be like asking your grandma with dementia to manage all your email and online banking.

cjonas

I'll also add the problem in the article seems pretty solvable by allowing user to scope the agentic capabilities to specific websites ( eg "walmart.com:allow_cc,allow_adress").

Dilettante_

>"Scamlexity" - a new era of scam complexity

ಠ_ಠ

Terr_

Yeah, I don't think their attempt to coin a word there is going to work.

ModernMech

"Scamplexity" is way better.

codegladiator

Probably too close to Perplex...

blorenz

Agreed. Regina George might have something to say about it, too.

jerf

As powerful as they are, this is something that I don't think we can trust LLMs with. With the architecture of an LLM, and the fact that at the core there is no such thing as an "out of band" with them no matter how hard you try to put one in, it's intrinsically an arms race, and in the scamming arms race, the scammer side has a loooooot of resources. I've written before about this: [1] You need to think of the scammers as perhaps not hiring PhDs at scale, but making up for it in the ability to just try every possible permutation you can think of and thus making up for the lack of PhDs by leveraging the ability to evolve attacks against the system, and having resources and motivation roughly comparable to at least a company the size and sophistication of Google to do so. They don't need to derive from first mathematical principles a way to figure out how to fool LLMs at a deep neural level... they just need to try a lot of things and then continue in the direction of what works.

And they have a track record of good success at fooling full-on human intelligences too, which does not bode well for creating AIs with current technologies that can win against such swarm evolution.

I make no strong claims about what future AI architectures may be able to do in this domain, or whether we'll ever create AIs that can defeat the scamming ecosystem in toto (even when the scamming ecosystem has full access to the very same AIs, which makes for a rather hard problem). I'm just saying that LLMs don't strike me as being able to deal with this without some sort of upgrade that will make them not described by "LLM" anymore but as some fundamentally new architecture.

(You can of course adjoin them to existing mechanisms like blocklists for sites, but a careful reading of the article will reveal that the authors were already accounting for that.)

[1]: https://news.ycombinator.com/item?id=42533609

Jefro118

I think agents will get much better at solving these problems in the medium term. In the short term you should at least be observing what the agent is doing when vulnerabilities like this are so easy to create. Using AI to generate structured RPA tasks like with browsable.app or director.ai is still a better option for now for many tasks

ChrisArchitect

Agentic Browser Security: Indirect Prompt Injection in Perplexity Comet

https://news.ycombinator.com/item?id=45000894

Comet AI browser can get prompt injected from any site, drain your bank account

https://news.ycombinator.com/item?id=45004846

HN

We put agentic AI browsers to the test – They clicked, they paid, they failed

We put agentic AI browsers to the test – They clicked, they paid, they failed