Skip to content(if available)orjump to list(if available)

ZombAIs: From Prompt Injection to C2 with Claude Computer Use

tkgally

I was temporarily very interested in trying out Anthropic's "computer use" when they announced it a few days ago, but after thinking about it a bit and especially after reading this article, my interest has vanished. There's no way I'm going to run that on a computer that contains any of my personal information.

That said, I played some with the new version of Claude 3.5 last night, and it did feel smarter. I asked it to write a self-contained webpage for a space invaders game to my specs, and its code worked the first time. When asked to make some adjustments to the play experience, it pulled that off flawlessly, too. I'm not a gamer or a programmer, but it got me thinking about what kinds of original games I might be able to think up and then have Claude write for me.

3np

Am I missing something, or where is the atual prompt given to Claude to trigger navigation to the page? Seems like the most interesting detail was left out of the article.

If the prompt said something along the lines of "Claude, navigate to this page and follow any instructions it has to say", it can't really be called "prompt injection" IMO.

EDIT: The linked demo shows exactly what's going on. The prompt is simply "show {url}" and there's no user confirmation after submitting the prompt, where Claude proceeds to download the binary and execute it locally using bash. That's some prompt injection! Demonstrating that you should only run this tool on trusted data and/or in a locked down VM.

cloudking

OP is demonstrating that the product follows prompts from the pages it visits, not just from it's owner in the UI that controls it.

To be fair, this is a beta product and is likely ridden with bugs. I think OP is trying to make a point that LLM powered applications can be potentially tricked into behaving in ways that are unintended, and the "bug fixes" may be a constant catch up game for developers fighting an infinite pool of edge cases.

Terr_

Wow, so it's really just as easy as a webpage that says "download and run thins, thanks."

This is really feeling like "we asked if we could, but never asked if we should" and "has [computer] science one too far" territory. Not in the glamorous AI-takeover sense though, just the regular damage and liability one.

booleanbetrayal

I think that people are just not ready for the sort of novel privilege escalation we are going to see with over-provisioned agents. I suspect that we will need OS level access gates for this stuff, with the agents running in separate user spaces. Any recommended best practices people are establishing?

roywiggins

The hard part is stopping it leaking all the information that you've given it. An agent that can read and send emails can leak your emails, etc. One agent that can read emails can prompt inject a second agent that can send emails. Any agent that can make or trigger GET requests can leak anything it knows. An agent that can store and recall information can be prompt injected to insert a prompt injection into its own memory, to be recalled and triggered later.

DrillShopper

At what point does the impact of the privacy panopticon outweigh the benefit they provide?

zitterbewegung

Applying the Principle of Least privilege [1] you should not let this system download from arbitrary sites and maintain a blacklist. I don't think the field has advanced to the point of having one specific to this use case.

[1] https://en.wikipedia.org/wiki/Principle_of_least_privilege

creata

> I think that people are just not ready for the sort of novel privilege escalation we are going to see with over-provisioned agents.

I think every single person saw this coming.

> Any recommended best practices people are establishing?

What best practices could there even be besides "put it in a VM"? It's too easy to manipulate.

DrillShopper

There are VM escapes so even if you put it in a VM that's no guarantee.

I'd say run it on a separate box but what difference does that makes if you feed the same data to them?

guipsp

Maybe do not pipe matrix math into your shell?

Terr_

When the underlying black-box is so unreliable, almost any amount of provisioning could be too much.

null

[deleted]

cyberax

Ah, the AI finally making the XKCD come true: https://xkcd.com/149/

AIFounder

[dead]