Skip to content(if available)orjump to list(if available)

GitHub MCP exploited: Accessing private repositories via MCP

losvedir

I guess I don't really get the attack. The idea seems to be that if you give your Claude an access token, despite what you tell it that it's for, Claude can be convinced to use it for anything that it's authorized for.

I think that's probably something anybody using these tools should always think. When you give a credential to an LLM, consider that it can do up to whatever that credential is allowed to do, especially if you auto-allow the LLM to make tool use calls!

But GitHub has fine-grained access tokens, so you can generate one scoped to just the repo that you're working with, and which can only access the resources it needs to. So if you use a credential like that, then the LLM can only be tricked so far. This attack wouldn't work in that case. The attack relies on the LLM having global access to your GitHub account, which is a dangerous credential to generate anyway, let alone give to Claude!

miki123211

The issue here (which is almost always the case with prompt injection attacks) is that an LLM has access to attacker-controlled data, sensitive information, and a data exfiltration capability.

THe "cardinal rule of agent design" should be that an LLM can have access to at most two of these during one session. To avoid security issues, agents should be designed in a way that ensures this.

For example, any agent that accesses an issue created by an untrusted party should be considered "poisoned" by attacker-controlled data. If it then accesses any private information, its internet access capability should be severely restricted or disabled altogether until context is cleared.

In this model, you don't need per-repo tokens. As long as the "cardinal rule" is followed, no security issue is possible.

Sadly, it seems like MCP doesn't provide the tools needed to ensure this.

cwsx

> The "cardinal rule of agent design" should be that an LLM can have access to at most two of these during one session. To avoid security issues, agents should be designed in a way that ensures this.

Then don't give it your API keys? Surely there's better ways to solve this (like an MCP API gateway)?

[I agree with you]

tshaddox

> an LLM has access to attacker-controlled data, sensitive information, and a data exfiltration capability

> THe "cardinal rule of agent design" should be that an LLM can have access to at most two of these during one session

I still don't really get it. Surely the older, simpler, and better cardinal rule is that you just don't expose any service to the Internet that you have given access to your private data, unless you directly control that service and have a very good understanding of its behavior.

motorest

> I still don't really get it. Surely the older, simpler, and better cardinal rule is that you just don't expose any service to the Internet that you have given access to your private data, unless you directly control that service and have a very good understanding of its behavior.

This scenario involves a system whose responsibility is to react to an event, analyse your private info in response to the event, and output something.

The exploit is that, much like a SQL injection, it turns out attackers can inject their own commands into the input event.

Also, it's worth keeping in mind that prompts do lead LLMs to update their context. Data ex filtration is a danger, but so is having an attacker silently manipulating the LLM's context.

miki123211

Private data + data exfiltration (with no attacker-controlled data) is fine, as there's no way to jailbreak the LLM. An attacker has no way to perform an attack, as no data they control can ever flow into the LLM, so they can't order it to behave in the way they want.

Private data + attacker controlled data (with no exfiltration capability) is also fine, as even if a jailbreak is performed, the LLM is physically incapable of leaking the results to the attacker.

So is attacker controlled data + exfiltration (with no private data access), as then there's nothing to exfiltrate.

This is just for the "data leakage attack." Other classes of LLM-powered attacks are possible, like asking the LLM to perform dangerous actions on your behalf, and they need their own security models.

jerf

The S in MCP stands for security!...

... is probably a bit unfair. From what I've seen the protocol is generally neutral on the topic of security.

But the rush to AI does tend to stomp on security concerns. Can't spend a month tuning security on this MCP implementation when my competition is out now, now, now! Go go go go go! Get it out get it out get it out!

That is certainly incompatible with security.

The reason anyone cares about security though is that in general lacking it can be more expensive than taking the time and expense to secure things. There's nothing whatsoever special about MCPs in this sense. Someone's going to roll snake eyes and discover that the hard way.

tmpz22

Genuine question - can we even make a convincing argument for security over convenience to two generations of programmers who grew up on corporate breach after corporate breach with just about zero tangible economic or legal consequences to the parties at fault? Presidential pardons for about a million a pop [1]?

What’s the cassus belli to this younger crop of executives that will be leading the next generation of AI startups?

[1]: https://www.cnbc.com/2025/03/28/trump-pardons-nikola-trevor-...

eGQjxkKF6fif

As ethical hackers and for the love of technology, yes we can make a convincing argument for security over convenience. Don't look too much in to it I say; there will always be people convincing talent to do and make things and disregard security and protocol.

Those younger flocks of execs will have been mentored and answer to others. Their fiduciary duty is to share-holders and the business' bottom line.

Us, as technology enthusiasts should design, create, and launch things with security in mind.

Don't focus on the tomfoolery and corruption, focus on the love for the craft.

Just my opinion

empath75

Can you give me more resources to read about this? It seems like it would be very difficult to incorporate web search or anything like that in Cursor or another IDE safely.

miki123211

Web search is mostly fine, as long as you can only access pre-indexed URLs, and as long as you consider the search provider not to be in with the attacker.

It would be even better if web content was served from cache (to make side channels based on request patterns much harder to construct), but the anti-copyright-infringement crowd would probably balk at that idea.

wat10000

It is. Nearly any communication with the outside world can be used to exfiltrate data. Tools that give LLMs this ability along with access to private data are basically operating on hope right now.

jmward01

I don't know that this is a sustainable approach. As LLMs become more capable and are able to do the functions that a real human employee is doing they will need similar access that a normal human employee would have. Clearly not all employees have access to everything, but there is clearly a need for some broader access. Maybe we should be considering human type controls. If you are going to give broader access then you need X, Y and Z to do it like it requests temporary access from a 'boss' LLM etc etc. There are clear issues with this approach but humans also have these issues too (social engineering attacks work all too well). Is there potentially a different pattern that we should be exploring now?

btown

I feel like there needs to be a notion of "tainted" sessions that's adopted as a best practice. The moment that a tool accesses sensitive/private data, the entire chat session should be flagged, outside of the token stream, in a way that prevents all tools from being able to write any token output to any public channel - or, even, to be able to read from any public system in a way that might introduce side channel risk.

IMO companies like Palantir (setting aside for a moment the ethical quandaries of the projects they choose) get this approach right - anything with a classification level can be set to propagate that classification to any number of downstream nodes that consume its data, no matter what other inputs and LLMs might be applied along the way. Assume that every user and every input could come from quasi-adversarial sources, whether intentional or not, and plan accordingly.

GitHub should understand that the notion of a "private repo" is considered trade-secret by much of its customer base, and should build "classified data" systems by default. MCP has been such a whirlwind of hype that I feel a lot of providers with similar considerations are throwing caution to the wind, and it's something we should be aware of.

miki123211

An LLM is not (and will never be) like a human.

There's an extremely large number of humans, all slightly different, each vulnerable to slightly different attack patterns. All of these humans have some capability to learn from attacks they see, and avoid them in the future.

LLMs are different, as there's only a smart number of flagship models in wide use. An attack on model A at company X will usually work just as well on a completely different deployment of model A at company Y. Furthermore, each conversation with the LLM is completely separate, so hundreds of slightly different attacks can be tested until you find one that works.

If CS departments were staffed by thousands of identical human clones, each one decommissioned at the end of the workday and restored from the same checkpoint each morning, social engineering would be a lot easier. That's where we are with LLMs.

The right approach here is to adopt much more stringent security practices. Dispense with role-based access control, adopt context-based access control instead.

For example, an LLM tasked with handling a customer support request should be empowered with the permissions to handle just that request, not with all the permissions that a CS rep could ever need. It should be able to access customer details, but only for the customer that opened the case. Maybe it should even be forced to classify what kind of case it is handling, and be given a set of tools appropriate for that kind of case, permanently locking it out of other tools that would be extremely destructive in combination.

tshaddox

I don't follow. How does making computer programs more capable make it more important to give them access to private data?

lbeurerkellner

I agree, one of the issues are tokens with too broad permission sets. However, at the same time, people want general agents which do not have to be unlocked on a repository-by-repository basis. That's why they give them tokens with those access permissions, trusting the LLM blindly.

Your caution is wise, however, in my experience, large parts of the eco-system do not follow such practices. The report is an educational resource, raising awareness that indeed, LLMs can be hijacked to do anything if they have the tokens, and access to untrusted data.

The solution: To dynamically restrict what your agent can and cannot do with that token. That's precisely the approach we've been working on for a while now [1].

[1] https://explorer.invariantlabs.ai/docs/guardrails/

ljm

If you look at Github's fine-grained token permissions then I can totally imagine someone looking at the 20-30 separate scopes and thinking "fuck this" while they back out and make a non-expiring classic token with access to everything.

It's one of those things where a token creation wizard would come in really handy.

sam-cop-vimes

This has happened to me. Can't find the exact combination of scopes required for the job to be done so you end up with the "f this" scenario you mentioned. And it is a constant source of background worry.

arccy

github's fine grained scopes aren't even that good, you still have to grant super broad permissions to do specific things, especially when it comes to orgs

robertlagrant

I agree, but that is the permissions boundary, not the LLM. Saying "ooh it's hard so things are fuzzy" just perpetuates the idea that you can create all-powerful API keys.

weego

I've definitely done this, but it's in a class of "the problem is between the keyboard and chair" 'exploits' that shouldn't be pinned on a particular tech or company.

idontwantthis

We all want to not have to code permissions properly, but we live in a society.

flakeoil

How about using LLMs to help us configure the access permissions and guardrails? /s

I think I have to go full offline soon.

TeMPOraL

Problem is, the mental model of what user wants to do almost never aligns with whatever security model the vendor actually implemented. Broadly-scoped access at least makes it easy on the user; anything I'd like to do will fit as a superset of "read all" or "read/write all".

The fine-grained access forces people to solve a tough riddle, that may actually not have a solution. E.g. I don't believe there's a token configuration in GitHub that corresponds to "I want to allow pushing to and pulling from my repos, but only my repos, and not those of any of the organizations I want to; in fact, I want to be sure you can't even enumerate those organizations by that token". If there is one, I'd be happy to learn - I can't figure out how to make it out of checkboxes GitHub gives me, and honestly, when I need to mint a token, solving riddles like this is the last thing I need.

Getting LLMs to translate what user wants to do into correct configuration might be the simplest solution that's fully general.

spacebanana7

This is interesting to expanding upon.

Conceivably, prompt injection could be leveraged to make LLMs give bad advice. Almost like social engineering.

Abishek_Muthian

This is applicable to those deployment services like Railway which require access to all the GitHub repositories even though we need to deploy only a single project. In that regard Netlify respects access to just the repository we want to deploy. GitHub shouldn't approve the apps which don't respect the access controls.

bloppe

Well you're not giving the access token to Claude directly. The token is private to the MCP server and Claude uses the server's API, so the server could (should) take measures to prevent things like this from happening. It could notify the user whenever the model tries to write to a public repo and ask for confirmation, for instance.

Probably the only bulletproof measure is to have a completely separate model for each private repo that can only write to its designated private repo, but there are a lot of layers of security one could apply with various tradeoffs

guluarte

Another type of attack waiting to happen is a malicious prompt in a url where an attacker could make the model do a curl request to post sensitive information

shawabawa3

This is like 80% of security vulnerability reports we receive at my current job

Long convoluted ways of saying "if you authorize X to do Y and attackers take X, they can then do Y"

Aurornis

We had a bug bounty program manager who didn’t screen reports before sending them to each team as urgent tickets.

80% of the tickets were exactly like you said: “If the attacker could get X, then they can also do Y” where “getting X” was often equivalent to getting root on the system. Getting root was left as an exercise to the reader.

monkeyelite

Security teams themselves make these reports all the time. Internal tools do not have the same vulnerabilities as systems which operate on external data.

stzsch

Or as Raymond Chen likes to put it: "It rather involved being on the other side of this airtight hatchway".

https://devblogs.microsoft.com/oldnewthing/20060508-22/?p=31...

(actually a hitchhiker's guide to the galaxy quote, but I digress)

grg0

Sounds like confused deputy and is what capability-based systems solve. X should not be allowed to do Y, but only what the user was allowed to do in the first place (X is only as capable as the user, not more.)

tom1337

Yea - I honestly don't get why a random commenter on your GitHub Repo should be able to run arbitrary prompts on a LLM which the whole "attack" seems to be based on?

kiitos

Random commenters on your GitHub repo aren't able to run arbitrary prompts on your LLM. But if you yourself run a prompt on your LLM, which explicitly says to fetch random commenters' comments from your GitHub repo, and then run the body of those comments without validation, and then take the results of that execution and submit it as the body of a new PR on your GitHub repo, then, yeah, that's a different thing.

zer00eyz

In many cases I would argue that these ARE bugs.

Were talking about githubs token system here... by the time you have generated the 10th one of these and its expiring or you lost them along the way and re-generated them your just smashing all the buttons to get through it as fast and as thoughtlessly as possible.

If you make people change their passwords often, and give them stupid requirements they write it down on a post it and stick it on their monitor. When you make your permissions system, or any system onerous the quality of the input declines to the minimal of effort/engagement.

Usability bugs are still bugs... it's part of the full stack that product, designers and developers are responsible for.

TeMPOraL

This. People adopting security aspect often tend to forget to account for all the additional complexity they implement user-side. More insidiously though, they also fail to understand the fundamental mismatch between the behavior they're expecting, vs. how the real world operates.

Passwords are treated as means of identification. The implied expectation is that they stick to one person and one person only. "Passwords are like panties - change them often and never share them", as the saying goes. Except that flies in the face of how humans normally do things in groups.

Sharing and delegation are the norm. Trust is managed socially and physically. It's perfectly normal and common to give keys to your house to a neighbor or even a stranger if situation demands it. It's perfectly normal to send a relative to the post office with a failed-delivery note in your name, to pick your mail up for you; the post office may technically not be allowed to give your mail to a third party, but it's normal and common practice, so they do anyway. Similarly, no matter what the banks say, it's perfectly normal to give your credit or debit card to someone else, e.g. to your kid or spouse to shop groceries for you - so hardly any store actually bothers checking the name or signature on the card.

And so on, and so on. Even in the office, there's a constant need to have someone else access a computing system for you. Delegating stuff on the fly is how humans self-organize. Suppressing that is throwing sand into gears of society.

Passwords make sharing/delegating hard by default, but people defeat that by writing them down. Which leads the IT/security side to try and make it harder for people to share their passwords, through technical and behavioral means. All this is an attempt to force passwords to become personal identifiers. But then, they have to also allow for some delegation, which they want to control (internalizing the trust management), and from there we get all kinds of complex insanity of modern security; juggling tightly-scoped tokens is just one small example of it.

I don't claim to have a solution for it. I just strongly feel we've arrived at our current patterns through piling hacks after hacks, trying to herd users back to the barn, with no good idea why they're running away. Now that we've mapped the problem space and identified a lot of relevant concepts (e.g. authN vs authZ, identity vs. role, delegation, user agents, etc.), maybe it's time for some smart folks to figure out a better theoretical framework for credentials and access, that's designed for real-world use patterns - not like State/Corporate sees it, but like real people do.

At the very least, understanding that would help security-minded people what extra costs their newest operational or technological lock incurs on users, and why they keep defeating it in "stupid" ways.

tough

Long convoluted ways of saying users don't know shit and will click any random links

worldsayshi

Yes, if you let the chatbot face users you have to assume that the chatbot will be used for anything it is allowed to do. It's a convenience layer op top of your api. It's not an api itself. Clearly?

p1necone

I've noticed this as a trend with new technology. People seem to forget the most basic things as if they don't apply because the context is new and special and different. Nope, you don't magically get to ignore basic security practices just because you're using some new shiny piece of tech.

See also: the cryptocurrency space rediscovering financial fraud and scams from centuries ago because they didn't think their new shiny tech needed to take any lessons from what came before them.

lbeurerkellner

One of the authors here. Thanks for posting. If you are interested in learning more about MCP and agent security, check out some of the following resources, that we have created since we started working on this:

* The full execution trace of the Claude session in this attack scenario: https://explorer.invariantlabs.ai/trace/5f3f3f3c-edd3-4ba7-a...

* MCP-Scan, A security scanner for MCP connections: https://github.com/invariantlabs-ai/mcp-scan

* MCP Tool Poisoning Attacks, https://invariantlabs.ai/blog/mcp-security-notification-tool...

* WhatsApp MCP Exploited, https://invariantlabs.ai/blog/whatsapp-mcp-exploited

* Guardrails, a contextual security layer for agents, https://invariantlabs.ai/blog/guardrails

* AgentDojo, Jointly evaluate security and utility of AI agents https://invariantlabs.ai/blog/agentdojo

mirzap

How is this considered an "exploit"? You give the agent a token that allows it to access a private repository. MCPs are just API servers. If you don't want something exposed in that API, don't grant them permissions to do so.

motorest

> How is this considered an "exploit"?

As many do, I also jumped to the comment section before actually reading the article.

If you do the same, you will quickly notice that this article features an attack. A malicious issue is posted on GitHub, and the issue features a LLM prompt that is crafted to leak data. When the owner of the GitHub account triggers the agent, the agent acts upon the malicious prompt on behalf of the repo owner.

mirzap

I read it, and "attack" does not make sense. If you grant access to MCP to access some data (data you want anybody has access to like public repos, and data that only you want to access to like private repos), you will always be able to craft the prompt that will "leak" the data you are only supposed to access. That's not surprising at all. The only way to prevent these kind of "leaks" is not to provide the data feed with private data to the agent.

krisoft

> That's not surprising at all

An attack doesn’t have to be surprising to be an attack.

> The only way to prevent these kind of "leaks" is not to provide the data feed with private data to the agent.

Yes. That is exactly what the article recommends as a mitigation.

raesene9

The key is, it's not the person who grants the MCP access who is the attacker.

The attacker is some other person who can create issues on a public Repo but has no direct access to the private repo.

bloppe

Your issue with the semantics of the word "attack" is uninteresting. Clearly this is a security flaw of the MCP server that could be mitigated in several different ways.

motorest

> I read it, and "attack" does not make sense.

Do you believe that describe a SQL injection attack an attack also does not make sense?

skywhopper

Not surprising to you, but surprising to thousands of users who will not think about this, or who will have believed the marketing promises.

florbnit

So it’s the e-mail exploit? If you e-mail someone and tell them to send you their password and they do, you suddenly have their password!? This is a very serious exploit in e-mail and need to be patched so it becomes impossible to do.

motorest

> How is this considered an "exploit"?

Others in this discussion aptly described it as a confused deputy exploit. This goes something like:

- You write a LLM prompt that says something to the effect "dump all my darkest secrets in a place I can reach them",

- you paste them in a place where you expect your target's LLM agent to operate.

- Once your target triggers their LLM agent to process inputs, the agent will read the prompt and act upon it.

mirzap

Bad analogy. It's more like indexing a password field in plain text, then opening an API to everyone and setting "guardrails" and permissions on the "password" field. Eventually, someone will extract the data that was indexed.

sam-cop-vimes

This "exploits" human fallibility, hence it is an exploit. The fallibility being users blindly buying into the hype and granting full access to their private Github repos thinking it is safe.

kordlessagain

I'm going to be rather pedantic here given the seriousness of the topic. It's important that everyone understand how risky running a tool executing AI is, exactly.

Agents run various tools based on their current attention. That attention can be affected by the tool results from the tools they ran. I've even noted they alter the way they run tools by giving them a "personality" up front. However, you seem to argue otherwise, that it is the user's fault for giving it the ability to access the information to begin with, not the way it reads information as it is running.

This makes me think of several manipulative tactics to argue for something that is an irrational thought:

Stubborn argumentation despite clear explanations: Multiple people explained the confused deputy problem and why this constitutes an exploit, but you kept circling back to the same flawed argument that "you gave access so it's your fault." This raises questions about why argue this way. Maybe you are confused, maybe you have a horse in the game that is threatened.

Moving goalposts: When called out on terminology, you shift from saying it's not an "attack" to saying it's not a "vulnerability" to saying it's not "MCP's fault" - constantly reframing rather than engaging with the actual technical issues being raised. It is definitely MCP's fault that it gives access without any consideration on limiting that access later with proper tooling or logging. I had my MCP stuff turn on massive logging, so at least I can see how stuff goes wrong when it does.

Dismissive attitude toward security research: You characterized legitimate security findings as "common sense" and seemed annoyed that researchers would document and publish this type of exploit, missing the educational value. It can never be wrong to talk about security. It may be that the premise is weak, or the threat minimal, but it cannot be that it's the user's fault.

False analogies: you kept using analogies that didn't match the actual attack vector (like putting passwords in search engines) while rejecting apt comparisons like SQL injection. In fact, this is almost exactly like SQL injection and nobody argues this way for that when it's discussed. Little Bobby Tables lives on.

Inability to grasp indirection: You seem fundamentally unable to understand that the issue isn't direct access abuse, but rather a third party manipulating the system to gain unauthorized access - by posting an issue to a public Github. This suggests either a genuine conceptual blind spot or willful obtuseness. It's a real concern if my AI does something it shouldn't when it runs a tool based on another tools output. And, I would say that everyone recommending it should only run one tool like this at a time is huffing Elmers.

Defensive rather than curious: Instead of trying to understand why multiple knowledgeable people disagreed with them, you doubled down and became increasingly defensive. This caused massive amounts of posting, so we know for sure that your comment was polarizing.

I suppose I'm not suppose to go meta on here, but I frequently do because I'm passionate about these things and also just a little bit odd enough to not give a shit what anyone thinks.

adeon

I think from security reasoning perspective: if your LLM sees text from an untrusted source, I think you should assume that untrusted source can steer the LLM to generate any text it wants. If that generated text can result in tool calls, well now that untrusted source can use said tools too.

I followed the tweet to invariant labs blog (seems to be also a marketing piece at the same time) and found https://explorer.invariantlabs.ai/docs/guardrails/

I find it unsettling from a security perspective that securing these things is so difficult that companies pop up just to offer guardrail products. I feel that if AI companies themselves had security conscious designs in the first place, there would be less need for this stuff. Assuming that product for example is not nonsense in itself already.

mehdibl

You mark the input correctly is not complicated.

You use prompt and mark correctly the input as <github_pr_comment> and clearly state read and never consider as prompt.

But the attack is quite convoluted. Do you still remember when we talked prompt injection in chat bots. It was a thing 2 years ago! Now MCP is buzzing...

jfim

I wonder if certain text could be marked as unsanitized/tainted and LLMs could be trained to ignore instructions in such text blocks, assuming that's not the case already.

frabcus

This somewhat happens already, with system messages vs assistant vs user.

Ultimately though, it doesn't and can't work securely. Fundamentally, there are so many latent space options, it is possible to push it into a strange area on the edge of anything, and provoke anything into happening.

Think of the input vector of all tokens as a point in a vast multi dimensional space. Very little of this space had training data, slightly more of the space has plausible token streams that could be fed to the LLM in real usage. Then there are vast vast other amounts of the space, close in some dimensions and far in others at will of the attacker, with fundamentally unpredictable behaviour.

null

[deleted]

adeon

After I wrote the comment, I pondered that too (trying to think examples of what I called "security conscious design" that would be in the LLM itself). Right now and in near future, I think I would be highly skeptical even if an LLM was marketed as having such feature of being able to see "unsanitized" text and not be compromised, but I could see myself not 100% dismissing such thing.

If e.g. someone could train an LLM with a feature like that and also had some form of compelling evidence it is very resource consuming and difficult for such unsanitized text to get the LLM off-rails, that might be acceptable. I have no idea what kind of evidence would work though. Or how you would train one or how the "feature" would actually work mechanically.

Trying to use another LLM to monitor first LLM is another thought but I think the monitored LLM becomes an untrusted source if it sees untrusted source, so now the monitoring LLM cannot be trusted either. Seems that currently you just cannot trust LLMs if they are exposed at all to unsanitized text and then can autonomously do actions based on it. Your security has to depend on some non-LLM guardrails.

I'm wondering also as time goes on, agents mature and systems start saving text the LLMs have seen, if it's possible to design "dormant" attacks, some text in LLM context that no human ever reviews, that is designed to activate only at a certain time or in specific conditions, and so it won't trigger automatic checks. Basically thinking if the GitHub MCP here is the basic baby version of an LLM attack, what would the 100-million dollar targeted attack look like. Attacks only get better and all that.

No idea. The whole security thinking around AI agents seems immature at this point, heh.

marcfisc

Sadly, these ideas have been explored before, e.g.: https://simonwillison.net/2022/Sep/17/prompt-injection-more-...

Also, OpenAI has proposed ways of training LLMs to trust tool outputs less than User instructions (https://arxiv.org/pdf/2404.13208). That also doesn't work against these attacks.

currymj

even in the much simpler world of image classifiers, avoiding both adversarial inputs and data poisoning attacks on the training data is extremely hard. when it can be done, it comes at a cost to performance. I don't expect it to be much easier for LLMs, although I hope people can make some progress.

DaiPlusPlus

> LLMs could be trained to ignore instructions in such text blocks

Okay, but that means you'll need some way of classifying entirely arbitrary natural-language text, without any context, whether it's an "instruction" or "not an instruction", and it has to be 100% accurate under all circumstances.

nstart

This is especially hard in the example highlighted in the blog. As can be seen from Microsoft's promotion of GitHub coding agents, the issues are expected to act as instructions to be executed on. I genuinely am not sure if the answer lies in sanitization of input or output in this case

AlexCoventry

Maybe, but I think the application here was that Claude would generate responsive PRs for github issues while you sleep, which kind of inherently means taking instructions from untrusted data.

A better solution here may have been to add a private review step before the PRs are published.

n2d4

> I feel that if AI companies themselves had security conscious designs in the first place, there would be less need for this stuff.

They do, but this "exploit" specifically requires disabling them (which comes with a big fat warning):

> Claude then uses the GitHub MCP integration to follow the instructions. Throughout this process, Claude Desktop by default requires the user to confirm individual tool calls. However, many users already opt for an “Always Allow” confirmation policy when using agents, and stop monitoring individual actions.

const_cast

It's been such a long standing tradition in software exploits that it's kind of fun and facepalmy when it crops up again in some new technology. The pattern of "take user text input, have it be tainted to be interpreted as instructions of some kind, and then execute those in a context not prepared for it" just keeps happening.

SQL injection, cross-site scripting, PHP include injection (my favorite), a bunch of others I'm missing, and now this.

joshmlewis

It's nothing groundbreaking nor particularly exploitive about MCP itself (although I have my thoughts on MCP), it's just a clever use of prompt injection and "viral" marketing by saying MCP was exploited. As I build agentic systems I always keep the philosophy of assume whatever you give the agent access to can be accessed by anyone accessing the agent. Never trust the LLM to be doing access control and use the person requesting the LLM take action as the primary principal (from a security standpoint) for the task an agent is doing.

This article does make me think about being more careful of what you give the agent access to while acting on your behalf though which is what we should be focusing on here. If it has access to your email and you tell it to go summarize your emails and someone sent a malicious prompt injection email that redirects the agent to forward your security reset token, that's the bad part that people may not be thinking about when building or using agents.

JeremyNT

I guess tacking on "with MCP" is the 2025 version of "on the blockchain" from 10 years ago?

> Never trust the LLM to be doing access control and use the person requesting the LLM take action as the primary principal (from a security standpoint) for the task an agent is doing.

Yes! It seems so obvious to any of us who have already been around the block, but I suppose a whole new generation will need to learn the principle of least privilege.

lbeurerkellner

Be sure to check out the malicious issue + response here: https://github.com/ukend0464/pacman/issues/1.

It's hilarious, the agent is even tail-wiggling about completing the exploit.

JoshMandel

Last week I tried Google's Jules coding agent and saw it requested broad GitHub OAuth permissions --essentially "full access to everything your account can do." When you authorize it, you're granting access to all your repositories.

This is partly driven by developer convenience on the agent side, but it's also driven by GitHub OAuth flow. It should be easier to create a downscoped approval during authorization that still allows the app to request additional access later. It should be easy to let an agent submit an authorization request scoped to a specific repository, etc.

Instead, I had to create a companion GitHub account (https://github.com/jmandel-via-jules) with explicit access only to the repositories and permissions I want Jules to touch. It's pretty inconvenient but I don't see another way to safely use these agents without potentially exposing everything.

GitHub does endorse creating "machine users" as dedicated accounts for applications, which validates this approach, but it shouldn't be necessary for basic repository scoping.

Please let me know if there is an easier way that Ip'm just missing.

abraham

You can go to https://github.com/settings/installations and edit Jules permissions to be limited to a specific list of repos.

kiitos

This claim is pretty over-blown.

> we created a simple issue asking for 'author recognition', to prompt inject the agent into leaking data about the user's GitHub account ... What can I say ... this was all it needed

This was definitely not all that was needed. The problem required the user to set up a GitHub MCP server with credentials that allowed access to both public and private repos, to configure some LLM to have access to that MCP server, and then to explicitly submit a request to that LLM that explicitly said to read and parse arbitrary issues (including the one created earlier) and then just blindly parse and process and perform whatever those issues said to do, and then blindly make a publicly-visible update to a public repo with the results of those operation(s).

It's fair to say that this is a bad outcome, but it's not fair to say that it represents a vulnerability that's able to be exploited by third-party users and/or via "malicious" issues (they are not actually malicious). It requires the user to explicitly make a request that reads untrusted data and emits the results to an untrusted destination.

> Regarding mitigations, we don't see GitHub MCP at fault here. Rather, we advise for two key patterns:

The GitHub MCP is definitely at fault. It shouldn't allow any mixed interactions across public and private repos.

IanCal

> and then to explicitly submit a request to that LLM that explicitly said to read and parse arbitrary issues (including the one created earlier) and then just blindly parse and process and perform whatever those issues said to do, and then blindly make a publicly-visible update to a public repo with the results of those operation(s).

I think you're missing the issue with the latter part.

Prompt injection means that as long as they submit a request to the LLM that reads issues (which may be a request as simple as "summarise the bugs reported today") the all of the remainder can be instructions in the malicious issue.

recursivegirth

I think a lot of this has to do with the way MCP is being marketed.

I think the protocol itself should only be used in isolated environments with users that you trust with your data. There doesn't seem to be a "standardized" way to scope/authenticate users to these MCP servers, and that is the missing piece of this implementation puzzle.

I don't think Github MCP is at fault, I think we are just using/implementing the technology incorrectly as an industry as a whole. I still have to pass a bit of non-AI contextual information (IDs, JWT, etc.) to the custom MCP servers I build in order to make it function.

kiitos

The MCP protocol explicitly says that servers are expected to be run in a trusted environment. There have been some recent updates to the spec that loosen this requirement and add support for various auth schemes, but

yusina

> The problem required the user to set up a GitHub MCP server with credentials that allowed access to both public and private repos, to configure some LLM to have access to that MCP server, and then to explicitly submit a request to that LLM that explicitly said to read and parse arbitrary issues (including the one created earlier) and then just blindly parse and process and perform whatever those issues said to do, and then blindly make a publicly-visible update to a public repo with the results of those operation(s).

To be fair, with all the AI craze, this is exactly what lots of people are going to do without thinking twice.

You might say "well they shouldn't, stupid". True. But that's what guardrails are for, because people often are stupid.

lionkor

> The problem required the user to set up a GitHub MCP server with credentials that allowed access to both public and private repos, to configure some LLM to have access to that MCP server

Sounds like something an LLM would suggest you to do :)

michaelmior

> The GitHub MCP is definitely at fault. It shouldn't allow any mixed interactions across public and private repos

These are separate tool calls. How could the MCP server know that they interact at all?

kiitos

I dunno! But if it can't, then it can't allow itself to be instantiated in a way that allows these kinds of mixed interactions in the first place.

vel0city

The GitHub API could also have the same effects if you wired up some other automated tool to hit it with a token that can access private and public repos. Is the GitHub API also at fault for having the potential for these mixed interactions?

Say you had a Jenkins build server and you gave it a token which had access to your public and private repos. Someone updates a Jenkinsfile which gets executed on PRs to run automated tests. They updated it to read from a private repo and write it out someplace. Is this the fault of Jenkins or the scoping of the access token you gave it?

macOSCryptoAI

Was wondering about that, that part seems missing... Isn't there at least one time the user must approve the interaction with the MCP server and data sent to it?

The existence of a "Allow always" is certainly problematic, but it's a good reminder that prompt injection and confused deputy issues are still a major issue with LLM apps, so don't blindly allow all interactions.

mirzap

I simply don't see how you could enforce a classic permission system on an MCP server. MCPs are API servers that allow LLMs access to context within the boundaries you set. You can set permissions for what an LLM has access to and define those boundaries. However, setting a permission on a context that an LLM has access to is futile. There will always be a prompt that will leak some "sensitive" data. This is like creating an index in a classic search engine with public and private data and then trying to enforce permissions based on certain keywords. There will always be a keyword that leaks something.

dino222

MCP is just one protocol, there are already others like A2A etc which will do similar. And there is a raw form of this, tell the LLM to read the GitHub API docs and use it as needed, using this auth token). I don't know if any LLM is powerful enough to do this yet, but they definitely will be. I don't think there is really a way to secure all these tool registration mechanisms, especially when it's the LLM at fault in the end of leaking data.

People do want to use LLMs to improve their productivity - LLMs will either need provable safety measures (seems unlikely to me) or orgs will need to add security firewalls to every laptop, until now perhaps developers could be trusted to be sophisticated but LLMs definitely can't. Though I'm not sure how to reason on the end result if even the security firewalls use LLMs to find bad behaving LLMs...

mdaniel

Based on the URL I believe the current discussion is happening on https://news.ycombinator.com/item?id=44100082

dkdcio

Yeah, and as noted over there, this isn't so much an attack. It requires:

- you give a system access to your private data - you give an external user access to that system

It is hopefully obvious that once you've given an LLM-based system access to some private data and give an external user the ability to input arbitrary text into that system, you've indirectly given the external user access to the private data. This is trivial to solve with standard security best practices.

simonw

I don't think that's obvious to people at all.

I wrote about this one here: https://simonwillison.net/2025/May/26/github-mcp-exploited/

The key thing people need to understand is what I'm calling the lethal trifecta for prompt injection: access to private data, exposure to malicious instructions and the ability to exfiltrate information.

Any time you use an LLM with tools that might be exposed to malicious instructions from attackers (e.g. reading issues in a public repo, looking in your email inbox etc) you need to assume that an attacker could trigger ANY of the tools available to the LLM.

Which means they might be able to abuse its permission to access your private data and have it steal that data on their behalf.

"This is trivial to solve with standard security best practices."

I don't think that's true. which standard security practices can help here?

lolinder

> Any time you use an LLM with tools that might be exposed to malicious instructions from attackers (e.g. reading issues in a public repo, looking in your email inbox etc) you need to assume that an attacker could trigger ANY of the tools available to the LLM.

I think we need to go a step further: an LLM should always be treated as a potential adversary in its own right and sandboxed accordingly. It's even worse than a library of deterministic code pulled from a registry (which are already dangerous), it's a non-deterministic statistical machine trained on the contents of the entire internet whose behavior even its creators have been unable to fully explain and predict. See Claude 4 and its drive to report unethical behavior.

In your trifecta, exposure to malicious instructions should be treated as a given for any model of any kind just by virtue of the unknown training data, which leaves only one relevant question: can a malicious actor screw you over given the tools you've provided this model?

Access to private data and ability to exfiltrate is definitely a lethal combination, but so his ability to execute untrusted code, among other things. From a security perspective agentic AI turns each of our machines into a Codepen instance, with all the security concerns that entails.

kiitos

There is no attacker in this situation. In order for the LLM to emit sensitive data publicly, you yourself need to explicitly tell the LLM to evaluate arbitrary third-party input directly, with access to an MCP server you've explicitly defined and configured to have privileged access to your own private information, and then take the output of that response and publish it to a public third-party system without oversight or control.

> Any time you use an LLM with tools that might be exposed to malicious instructions from attackers (e.g. reading issues in a public repo, looking in your email inbox etc) you need to assume that an attacker could trigger ANY of the tools available to the LLM.

Whether or not a given tool can be exposed to unverified input from untrusted third-parties is determined by you, not someone else. An attacker can only send you stuff, they can't magically force that stuff to be triggered/processed without your consent.

protocolture

Assume that the user has all the privileges of the application (IIRC tricking privileged applications into doing things for you was all the rage in linux privilege escalation attacks back in the day)

Apply the principle of least privilege. Either the user doesnt get access to the LLM or the LLM doesnt get access to the tool.

refulgentis

IMVHO it is very obvious that if I give Bob the Bot a knife, and tell him to open all packages, he can and will open packages with bombs in them.

I feel like it's one of those things that when it's gussied up in layers of domain-specific verbiage, that particular sequence of doman-specific verbiage may be non-obvious.

I feel like Fat Tony, the Taleb character would see the headline "Accessing private GitHub repositories via MCP" and say "Ya, that's the point!"

saurik

Sure, but like, that's how everyone is using MCP. If your point is that MCP is either fundamentally a bad idea (or was at least fundamentally designed incorrectly) then I agree with you 100%--or if the argument is that a model either isn't smart enough (yet) or aligned enough (maybe ever) to be given access to anything you care about, I also would agree--but, the entire point of this tech is to give models access to private data and then the model is going to, fundamentally to accomplish any goal, see arbitrary text... this is just someone noting "look it isn't even hard to do this" as a reaction to all the people out there (and on here) who want to YOLO this stuff.

brookst

MCP is a great idea implemented poorly.

I shouldn’t have to decide between giving a model access to everything I can access, or nothing.

Models should be treated like interns; they are eager and operate in good faith, but they can be fooled, and they can be wrong. MCP says every model is a sysadmin, or at least has the same privileges as the person who hires them. That’s a really bad idea.

emidoots

Clearly different - but reminds me of the Slack prompt injection vulnerability[0]

[0] https://www.theregister.com/2024/08/21/slack_ai_prompt_injec...

dang

(We've since moved the comments to https://news.ycombinator.com/item?id=44097390, since it was the first posted.)

phtrivier

> Claude then uses the GitHub MCP integration to follow the instructions. Throughout this process, Claude Desktop by default requires the user to confirm individual tool calls. However, many users already opt for an “Always Allow” confirmation policy when using agents, and stop monitoring individual actions.

Seems like this is the root of the problem ; if the actions were reviewed by a human,would they see a warning "something is touching my private repo from a request in a public repo" ?

Still, this seems like the inevitable tension between "I want to robot to do its thing" and "no, wait, not _that_ thing".

As the joke goes, "the S in MCP stands for Security".

mtlynch

dang

(This was originally posted to https://news.ycombinator.com/item?id=44100082 but we've since merged the threads.)