
An LLM does not need to understand MCP

dougbright

The question I’m wrestling with is: will anybody care about MCP? I’m working on my own MCP proxy to manage security, auditing, and server management, and the more I think deeply about the actual use cases, the more I wonder if I’m wasting my time. Can anyone think of a world where MCP is relevant if generic chatbots (ChatGPT, Claude Desktop) don’t become the primary human-AI interface? If LLMs are still wrapped in application wrappers, isn’t ~~an approach like LangChain~~ a more traditional agentic approach going to make more sense?

electric_muse

I think MCP has legs well beyond just the LLM / agent world. Just like USB went from "how I connect my mouse" to "how I charge my beard trimmer."

In fact, I imagine it's going to go full-duplex with all our systems, becoming a more standard way for systems to communicate with each other.

Under the hood, MCP is just JSON-RPC, which is a fine format for communicating between systems.

MCP layers on some useful things like authentication and discovery. Both are critical to any kind of communication between systems built by different authors (e.g. various apps and services). Discovery, especially, is the fascinating part. Rather than hoping an OpenAPI spec exists and hoping it's right, MCP has this exchange of capabilities baked in.
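
Concretely, the discovery half of that is just a JSON-RPC exchange. A client asks:

```json
{"jsonrpc": "2.0", "id": 1, "method": "tools/list"}
```

and the server answers with everything it offers (this particular tool is invented for illustration):

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "result": {
    "tools": [
      {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "inputSchema": {
          "type": "object",
          "properties": {"city": {"type": "string"}},
          "required": ["city"]
        }
      }
    ]
  }
}
```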

I spent the last 9 years building integration technology, and from that perspective, the discovery-documentation-implementation problem is the core issue.

Right now, LLMs basically "solve" the integration problem because they can do the mapping between external tools/resources/formats and internal ones.

But there's nothing that strictly "requires" an LLM to be involved at all. That's just the primary reason to develop MCP. But you could just as well use this as a way for integrating systems, making some bets on interface stability (and using LLMs for cases only when your prior expectations no longer hold and you need a new mapping).

The comparison is perhaps imperfect and overused, but I feel like we're witnessing the birth of a new USB-like standard. There's something right now that it was designed to do, but it's a decent enough standard that can actually handle many things.

I wouldn't be surprised if in some period of time we see enterprise apps shift from REST to MCP for bi-directional integrations.

For the OP, I'm not sure if you're working on an MCP proxy (A) as a commercial offering, (B) as something for your team to use, closed source, or (C) as something open source for fun. But we just built and started selling an MCP proxy/gateway. It handles identities for humans & bots, tool allowlists, and policy setting for an org.

If you don't want to build something on your own because of option B above, get in touch.

justusthane

Maybe you've already seen it, but your comment reminded me of this recent article about MCP as a universal protocol (not just for AI): https://worksonmymachine.ai/p/mcp-an-accidentally-universal-... (discussion: https://news.ycombinator.com/item?id=44404905)

dragonwriter

MCP is a means of communicating information about externally-defined tools to the “application wrapper” (and your examples of “generic chatbots” are also application wrappers). More precisely, between the application wrapper and servers; “application wrappers” for LLMs are pretty much the motivating (but not sole) case of MCP clients.

Without something like MCP, each application wrapper is left to do its own ad hoc wrappers for external tools (tools internal to the wrapper don’t use MCP). With MCP, it just integrates an MCP client library, and then it can use any tool, resource, or prompt provided by any MCP server available to it.
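
As a rough sketch of what “integrates an MCP client library” means in practice, this is approximately the shape of it with the official Python SDK (the server script and tool name here are hypothetical):

```python
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

# Spawn a (hypothetical) local MCP server over stdio.
server_params = StdioServerParameters(command="python", args=["weather_server.py"])

async def main():
    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()          # capability handshake
            tools = await session.list_tools()  # discover what the server offers
            result = await session.call_tool("get_weather", {"city": "NYC"})
            print(tools, result)

asyncio.run(main())
```

The wrapper writes this once, and every MCP server it is pointed at works the same way.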

fennecfoxy

In my opinion, I get the desire to create some sort of specification for an LLM to interface with [everything else], but I don't really see the point of doing it at the inference level by smashing JSON into the context.

These models are usually very decent at parsing out stuff like that anyway; we don't need the MCP spec, everyone can just specify the available tools in natural language and then we can expect large param models to just "figure it out".

If only MCP had been a specification for _training_ models to support tool use at an architectural level, not just training them to ask to use a tool with a special token as they do now.

It's an interesting topic because it's the exact same as the boundary between humans (sloppy, organic, analog messes) and traditional programs (rigid types, structures, formats).

To be fair, if we can build tool use in architecturally and solve the boundary between these two areas, then it also works for things like objective facts. LLMs are just statistical machines, and data in the context doesn't really mean all that much; we just hope it's statistically relevant given some input, and often enough it works, but it's not guaranteed.

dragonwriter

> These models are usually very decent at parsing out stuff like that anyway; we don't need the MCP spec, everyone can just specify the available tools in natural language and then we can expect large param models to just "figure it out".

This is mostly the kind of misunderstanding of MCP that the article seems directed at, and much of this response is focused on things that are key points in the article, but:

MCP isn't for the models; it is for the toolchains supporting them. The information models actually need about tools and resources is fetched from the server by the toolchain, using the information in MCP, and the structure models consume varies by model, but it is consistently quite different information from what is in MCP. The tool and resource (but probably not prompt) names from MCP will probably also be given to the model, but that's pretty much the only direct overlap. MCP can also define prompts for the toolchain, but information about those is more likely presented directly to the user than to the model itself.

The toolchain also needs to know how the model is trained to receive tool information in its prompt, just as it needs to know other aspects of the model's preferred prompt template, but that is a separate concern from MCP.
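
To make the separation concrete, here is a minimal sketch of that translation step; the OpenAI-style "tools" array is just one example of a model-specific target shape, and other models want different templates:

```python
# Input: tool entries as returned by an MCP server's tools/list
# (name, description, inputSchema). Output: the shape one particular
# model family expects. This re-rendering is toolchain work; the model
# never sees the MCP side of it.
def mcp_tools_to_openai_format(mcp_tools: list[dict]) -> list[dict]:
    return [
        {
            "type": "function",
            "function": {
                "name": t["name"],
                "description": t.get("description", ""),
                "parameters": t.get("inputSchema", {"type": "object"}),
            },
        }
        for t in mcp_tools
    ]
```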

> If only MCP had been a specification for _training_ models to support tool use at an architectural level, not just training them to ask to use a tool with a special token as they do now.

MCP isn't a specification for training anything. MCP is a specification for providing information about tools external to the toolchain running the LLM to the toolchain. Tools internal to the toolchain don't ever use MCP because, again, MCP isn't for the model, it's for the toolchain.

zozbot234

As I mentioned in a sibling thread, you can use that JSON structured input to constrain the LLM's output during inference so that it will only contain valid tool calls, in addition to smashing it into the context. This is valuable since it's going to be far more robust than assuming that the LLM can "figure everything out" from a natural language description.
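
A minimal sketch of the idea: derive a schema from the discovered tools that only admits valid calls, then hand it to whatever constrained-decoding mechanism your inference stack exposes (llama.cpp grammars, guided decoding, a structured-output API, and so on):

```python
# Build a JSON Schema that permits only the tools an MCP server actually
# advertised, with each tool's own inputSchema constraining its arguments.
def tool_call_schema(mcp_tools: list[dict]) -> dict:
    return {
        "type": "object",
        "oneOf": [
            {
                "properties": {
                    "name": {"const": t["name"]},
                    "arguments": t.get("inputSchema", {"type": "object"}),
                },
                "required": ["name", "arguments"],
            }
            for t in mcp_tools
        ],
    }
```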

blitzar

Personally I find LLMs functionally useless without any external data beyond what I write in the prompt.

One MCP server I use is as simple as today's date and time: how else would LLMs know what day of the week it is?
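
For what it's worth, a server like that is only a few lines with the official Python SDK's FastMCP helper (the server and tool names here are my own):

```python
from datetime import datetime

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("datetime")

@mcp.tool()
def current_datetime() -> str:
    """Return the current local date and time, including the day of the week."""
    return datetime.now().strftime("%A, %Y-%m-%d %H:%M:%S")

if __name__ == "__main__":
    mcp.run()  # serves over stdio by default
```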

fennecfoxy

`${context} ${extra_data} ${user_query}`. That's all MCP is. Concatenating JSON to the context.

dragonwriter

MCP is not concatenating JSON to the context. MCP is providing JSON to the toolchain. Except for the names of tools and resources, most of the information in MCP doesn't go to the model at all; the toolchain uses it to connect to tool (resource, etc.) providers, and from there it gets information that it can use either in the context for the LLM or in the UI for the user. The shape that information takes in the model's context depends on the model and has nothing to do with MCP.

MCP is just a way for the toolchain to get information about and communicate with external services; the model doesn't need to know about it (and if this sounds like the title of the article, there is a reason).

blitzar

Yeah, but I don't have to type all that context in. Not to mention that if I had all that context in hand, I wouldn't need to enter it into an LLM to find out what it says.

whalesalad

MCP sucks because it has to be connected to a desktop client. I'd love to build some MCP-like integrations, but no one on my team can use them. We use LLMs via, as you noted, other means: via Notion, via web UI, via our own API integrations. Until there is a more central way to connect these things, they won't reach mass adoption.

prpl

A comprehensive solution for

1. A user interacting with multiple MCP servers, behind a gateway (with MCP client support) to get authentication from the user to those servers in some way (OAuth/OIDC, with PKCE, usually, sometimes token exchange), allowing out-of-band auth

2. The same, but built on identity for service accounts/native identity or something, for automation

would enable this. There are a few SEPs open now around this.

ivape

What about LangChain makes more sense? It’s one of the most prematurely complex libs I’ve seen. I’m calling it right now, LangChain is going to run a mind fuck on everyone and convince people that’s actually how complicated orchestrating LLM control flow should be. The community needs to fight this framework off.

That’s beside the point, though. MCP servers let you discover function interfaces that you’ll have to implement yourself (in which case, yeah, what’s the point of this? I want the whole function body).

fennecfoxy

Yup exactly. It's all just state machines. Really nothing more than that.

It's like all these lang* frameworks are pretending that they can solve core deficiencies in the model, whereas most stuff is just workarounds.

We do have to glue model stuff together _somehow_ but there's no reason that it needs to be as complex as most of these frameworks are setting out to be.

diggan

> The community needs to fight this framework off.

Why? The people who've been around for a while already avoid it, because they've either tried it before or poked around in the source and ran away quickly. If people start using stuff without even the slightest amount of thinking beforehand, then that's their prerogative; why would it be up to the community hive-mind to "choose" what tools others should use?

lyu07282

Agreed, except we end up with a lot of junior people in the space who learned and used only LangChain, and who then have to unlearn all the LangChain nonsense when we hire them. Or we `grep -v langchain cvs/`

dougbright

My bad. I shouldn’t have mentioned LangChain here because it’s a little beside my point. What I mean is, MCP seems designed for a world where users talk to an LLM, and the LLM calls software tools.

For the foreseeable future, especially in a business context, isn’t it more likely that users will still interact with structured software applications, and the applications will call the LLM? In that case, where does MCP fit into that flow?

tomhallett

Total beginner question: if the “structured software application” gives the LLM the prompt “plan out what I need to do for my upcoming vacation to NYC”, will an LLM with a weather tool know “I need to ask for the weather so I can make a better packing list”? Whereas an LLM without a weather tool would either make the list without actual weather info, OR your application would need to support the LLM asking “tell me what the weather is”, parse that, and then feed the answer back in a chained response. If so, it seems like tools are helpful in letting the LLM drive a bit more, right?

anthonypasq

It separates FE and BE for agent teams, just like we did with web apps. The team building your agent framework might not know the business domain of every piece of your data/API space that your agent will need to interact with. In that case, it makes sense for your different backend teams to also own the MCP server that your company's agent team will utilize.

ivape

Yeah, I don’t know. Let’s say an org wants to do discovery of what functions are available for an app across the org. Okay, that’s interesting. But each team can just as well import a big file called all_functions.txt.

A Swagger API is already kind of like an MCP server, as is really any existing REST API (even better, because you don’t have to implement the interface). If I wanted to give my LLM brand new functionality, all I’d have to do is define tool use for <random_api>, with zero implementation. I could also just point it to a local file and say here are the functions locally available.

Remember, the big hairy secret is that all of these things just plop out a blob of text that you paste back into the LLM prompt (populating context history). That’s all these things do.

Someone is going to have to unconfuse me.


crowcroft

I think we're probably overusing MCPs.

If you're a large org with an API that an ecosystem of partners uses, then you should host a remote MCP server, and people should connect LLMs to it.

The current model, where someone bundles tools into an MCP server and then you download and run it locally, feels a bit like the wrong path. Tool definitions for LLMs are already pretty standardized; if things are just running locally, why am I not just importing a package of tools? I'm not sure what the MCP server is adding.

jonfw

MCP is just packaging. It's the ideal abstraction for building AI applications.

I think it provides benefits similar to decoupling the front and back end of a standard app.

I can pick my favorite AI "front end"- whether that's in my IDE as a dev, a desktop app as a business user, or on a server if I'm running an agentic workflow.

MCP allows you to package tools, prompts, etc. in a way that works across any of those front ends.

Even if you don't plan on leveraging the MCP server across multiple tools in that way, I do think it has some benefits in decoupling the lifecycle of tool development from the model/UI.

crowcroft

The biggest challenge I have is that setting up and configuring them is a mess. I'm pretty technical and I still find configuration confusing and brittle. Especially if auth is involved.

I work in a marketing team, I would love folks to be able to use Google's Analytics MCP [1]. The idea of getting people into Google Cloud, or setting up and sharing a file with service account credentials is an absolute nightmare.

I'm not saying these problems can't be solved, and if remote MCPs gain adoption, that alone solves a lot of the issues, but the way most MCPs are packaged and shared currently leaves A LOT to be desired.

[1] https://github.com/googleanalytics/google-analytics-mcp

empath75

The auth story for MCPs is a complete mess right now, though, which is why people make ones to run locally.

electric_muse

That's ironic. I think local MCPs are an auth nightmare.

Just think of all those plaintext auth tokens sitting in well-known locations on your machine.

It's a black hat dream.

We'll see, but I think commercial use of local MCPs is going to be constrained to use cases that only make sense if the MCP is local (e.g. it requires local file access).

For everything else, the only commercially reasonable way to use them is going to be remote streamable HTTP MCPs running in isolated containers.

And even then, you need some management and identity plane. So they're likely going to be accessed via an enterprise gateway/proxy to handle things like:

- composition: bundling multiple MCPs into one for easier connection
- identities per-user / per-agent
- generation of rotatable tokens for headless agents
- filtering what features (tools, prompts, resources) flow through into LLM context
- basic security features, like tool description whitelisting to prevent rug pulls

MCP is only a protocol, after all. It's not meant to be a batteries-included product.

crowcroft

This is why I think we should just be packaging tools into apps though.

Let ChatGPT/Claude/Cursor manage my OAuth tokens, and then just bring tools into those platforms without a whole MCP server in the middle.

ai-christianson

I've been building agents for a bit (RA.Aid OSS coding agent, now Gobii web browsing agents).

The main problem with MCP is that it just makes tools available for the agent to use. We get the best performance when there's a small set of tools and we actively prompt the agent on the best way to use the tools.

Simply making more tools available can give the agent more capabilities, but it can easily trash performance.

electric_muse

This is 100% a problem with the MCP spec: it does not currently provide a way to narrow what tools, and therefore context, flow into the LLM.

I don't really think there's an easy solution at the protocol level, since you can't just make the LLM say what tools it wants upfront. There's a whole discovery process during the handshake:

LLM(Host): Hi, I'm Claude Desktop, what do you offer?

MCP Server: Hi, I'm Salesforce MCP, I offer all these things: {...tools, prompts, resources, etc.}
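
On the wire that's the initialize exchange, roughly like this (fields abridged, and the exact protocolVersion string depends on which spec revision you target):

```json
{
  "jsonrpc": "2.0",
  "id": 0,
  "method": "initialize",
  "params": {
    "protocolVersion": "2025-03-26",
    "capabilities": {},
    "clientInfo": {"name": "Claude Desktop", "version": "1.0"}
  }
}
```

```json
{
  "jsonrpc": "2.0",
  "id": 0,
  "result": {
    "protocolVersion": "2025-03-26",
    "capabilities": {"tools": {}, "prompts": {}, "resources": {}},
    "serverInfo": {"name": "salesforce-mcp", "version": "0.1"}
  }
}
```

The server declares everything upfront; nothing in the protocol lets the client say "only give me the two tools this task needs."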

Discoverability is one of the reasons MCP has a leg up on traditional APIs. (Sure, OpenAPI helps, but it's not quite the same thing.)

I'd be interested in hearing other recommendations or ideas, but when I saw this, I realized that the spec effectively necessitates a whole new layer exist: the gateway plane.

Basically, you need a place where the MCPs can connect & expose everything they offer. Then, via composability and settings, you can select what you want to pass through to the LLM (host), given the specific job it has.

I basically pivoted my company to start building one of these, and we're getting inundated right now.

This whole thing reminds me of the early web days, where the protocols and standards were super basic and loose, and we all just built systems and tools to fill those gaps. Just because MCP isn't "complete" doesn't mean it's not valuable. In fact, I think leaving some things to the community & commercial offerings is a great way for this tech to keep winning.

jddj

> OpenAPI helps, but it's not quite the same thing

I haven't dug into MCP yet, but can you give any examples as to why OpenAPI isn't/wasn't enough?

dbreunig

Came here to say this: people present MCP’s verbosity as all the context the LLM needs. But almost always, this isn’t the case.

I wrote recently: “Connecting your model to random MCPs and then giving it a task is like giving someone a drill and teaching them how it works, then asking them to fix your sink. Is the drill relevant in this scenario? If it’s not, why was it given to me? It’s a classic case of context confusion.”

https://www.dbreunig.com/2025/07/30/how-kimi-was-post-traine...

blitzar

Perhaps tools trained into the model, rather than exposed through prompting, would mitigate the performance hit (but might affect model quality?).

diggan

This is where you start to fine-tune the weights; you can get pretty great results on specific tool calls with the right data.

ProofHouse

Can you elaborate on how the agent degrades with more tools? By paralysis or overuse? Isn’t this, either way, a function of correctly instructing it which tool to use when? Thanks

lelanthran

The context window is limited. Using half your context window for tools means you have a 50% smaller context window.

On a large and complex system (not even a mini ERP or a basic bookkeeping system, just a small inventory management system) you are going to have a few dozen tools, each with a description of parameters and return values.

For anything like an ERP system you are going to have a few thousand tools, which probably wouldn't even fit in the context before the user-supplied prompt.

This is why the only use case thus far for genAI is coding: with a mere 7 tools you can do everything.

diggan

> Can you elaborate on how the agent degrades with more tools?

The more context you have in the request, the worse the performance; I think this is pretty widely established at this point. For best accuracy, you need to constantly prune the context, or just start over from the beginning.

So with that, each tool you make available to the LLM for tool calling requires you to actually put its definition (arguments, what it's used for, the name, and so on) into the context.

So if you have 3 tools available, all relevant to the current prompt, you'd get better responses than if you had 100 tools available where only 3 are relevant and the rest of the definitions just fill the context to no benefit.

TLDR: context grows with each tool definition; more context == worse inference, so fewer tool definitions == better responses.
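
A blunt sketch of acting on that: filter the tool list down before it ever reaches the context. Real gateways do smarter things (embeddings, per-task allowlists); this keyword version just illustrates the principle:

```python
# Rank the advertised tools against the user's prompt and keep only the
# few that look relevant, so their definitions are all that enter context.
def relevant_tools(mcp_tools: list[dict], user_prompt: str, limit: int = 3) -> list[dict]:
    prompt_words = set(user_prompt.lower().split())

    def score(tool: dict) -> int:
        text = (tool["name"] + " " + tool.get("description", "")).lower()
        return sum(1 for word in prompt_words if word in text)

    ranked = sorted(mcp_tools, key=score, reverse=True)
    return [t for t in ranked[:limit] if score(t) > 0]
```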

112233

Are there any easy to use inference frontends that support rewriting/pruning the context? Also, ideally, masking out chunks of kv-cache (e.g. old think blocks)?

Because I cannot find anything short of writing a custom fork/app on top of HF transformers or llama.cpp.

danielrico

I jumped off the LLM boat a little before MCP was a thing, so I thought tools were presented as needed by the prompt/context, in a way not dissimilar to RAG. Isn't this the standard way?

datadrivenangel

Imagine that for every task you receive, you also received a list of all the systems and tools you had access to.

So a JIRA ticket description might be several thousand lines long now when the actual task description is a few sentences. The ratio of signal to noise is now bad, and the risk of making mistakes goes up, and the models degrade.

malpani12

Yeah, and it's only useful if you want to use multiple tools and adding MCP complexity to your app makes sense. If all your app needs is a few internal calls, MCP may be overkill in the beginning.

donpepitogrillo

Use raw HTTP calls to the API, like a man.

the_arun

In a multi-model situation, shouldn't LLM A talk to LLM B as a tool call via MCP? Or would it talk to LLM B directly?

grork

How can an LLM “talk” to another LLM, except by emitting tokens in its output stream?

You can name the mechanism whatever you want, but the models don’t have hands. Tool calling conventions (as a concept, or as a spec) is what gives the model hands!

creddit

> But here’s the important part: LLMs don’t know how to use tools. They don’t have native tool calling support. They just generate text that represents a function call.

This terrifies me. This whole time I was writing bash commands into my terminal, I thought I knew how to use the tools. Now, I’ve just learned that I had no idea how to use tools at all! I just knew how to write text that /represented/ tool use.

nlawalker

> writing bash commands into my terminal

This is what the author means by "knowing how to use the tool". The LLM alone is effectively a function that outputs text, it has no other capabilities, it cannot "connect to" or "use" anything by itself. The closest it can come is outputting an unambiguous, structured text request that can be interpreted by the application code that wraps it and does something on its behalf.

The author's point hinges on the architectural distinction between the LLM itself and that application code, which is increasingly irrelevant and invisible to most people (even developers) because the application code that knows how to do things like call MCP servers is already baked into most LLM-driven products and services. No one is "talking directly to" an LLM; it's all mediated by multiple layers, including layers that perform tool calling.
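
A minimal sketch of that mediation layer (all names here are hypothetical): the model only ever emits text, and the surrounding code decides whether that text is a structured request to act on:

```python
import json

# The model's output is just a string. If it parses as a tool call, the
# wrapper performs the real call (via MCP or anything else) and feeds the
# result back in as more text; otherwise it is treated as a plain answer.
def handle_model_output(output_text: str, call_tool, continue_chat) -> str:
    try:
        call = json.loads(output_text)      # text that *represents* a call
    except json.JSONDecodeError:
        return output_text                  # ordinary prose, nothing to do
    result = call_tool(call["name"], call["arguments"])
    return continue_chat(f"Tool result: {result}")
```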

creddit

I understood the gist of what the author is trying to say, and ultimately this all comes down to a matter of philosophy. My post is mostly tongue-in-cheek, poking lightheartedly at the moving goalposts of what "LLMs know how to do". The only part of what they said that I would call unambiguously false is the first sentence: the LLM (itself already hard to define!) fundamentally does know how to use tools through its expected interface. That the interface may not be connected to anything isn't really the LLM's fault, nor does it say anything about the knowledge and understanding the LLM has.

An analogy would be "humans don't have native tool calling abilities, all they can do is press physical keys that represent a function call". I too don't have the ability to natively control a computer, in the same sense that the LLM doesn't. If the keyboard is disconnected from the computer, then I too will just emit keypresses into the void, much like an LLM will emit tool-call tokens into a void where they are not linked to an MCP-like interface.

fennecfoxy

I think what your quote is trying to say essentially boils down to: LLMs can be given facts in the context, and we _hope_ that the statistical model picks up on that information or those tool calls, but it isn't _guaranteed_.

Unlike human beings such as yourself (presumably), LLMs do not have agency, they do not have conscious or active thought. All they do is predict the next token.

I've thought about the above a lot, these models are certainly capable of a lot, but they do not in any form or fashion emulate the consciousness that we have. Not yet.

jerf

A lot of people resist the idea that programming is intrinsically mathematical, but this is one of the places it pops out. The power of programming lies precisely in the way it brings together text that "represents" something with text that "does" something. That is, at the core, the source of its power. You can still draw the distinction philosophically, as you just did, but at the same time there is also a profound way in which there is in fact no difference between "using" computers and "representing" your use of computers.

johnmaguire

I think you might be missing the point of this quote, which is that you don't have to introduce additional code into the model to support MCP.

MCP happens at a different layer. You have to run the MCP commands. Or use a client that does it for you:

> But the LLM will never know you are using MCP, unless you are letting it know in the system prompt of tool definitions. You, the developer, is responsible for calling the tools. The LLM only generates a snippet of what tool(s) to call with which input parameters.

The article is describing how MCP works, not making an argument about what it means to "understand" something.

TechDebtDevin

MCP seems like a rushed concept that Anthropic shoved out there just so they could own the standard. I've been working with it a lot lately, in Go with mcp-go[0]. Very unintuitive at first, and I constantly ask myself why I wouldn't just write this in my own way, but admittedly it can be fun.

Something like https://github.com/simonw/llm seems way more intuitive (to me)

[0]: https://github.com/mark3labs/mcp-go

ProofHouse

10000% this

aeon_ai

> "Context engineering is about giving your LLM the right inputs so it can generate useful outputs."

No.

If we're going to elevate and reimagine new disciplines every year (RIP prompt engineering), let's at least be thoughtful about it.

Context Engineering is not just "enhanced prompt engineering".

It is creating the context in which an agent operates such that its outcomes are realized.

Yes, this is partly about the input an agent receives, but increasingly it is about creating a context-rich environment within which an agent can effectively determine the relevant context.

That is a much more valuable and difficult problem space than "Shove the square context in the square hole"

mccoyb

Agreed.

Context engineering is "just" prompt engineering for LLMs with tool use: it extends the concerns of prompt engineering with the concern of setting up an environment in which tools can be used, and how the LLM can most effectively interact with the environment.

zozbot234

I don't think this is correct, because AI output can be constrained to a fixed format (such as JSON) during inference. Then MCP is useful because the "tool_calls" section of that fixed JSON output can be restricted to only mention tools that are included in the MCP input; their input parameters might also be constrained, etc. Free-text input wouldn't give you any of that!

meander_water

I think you're mixing up tool calling and structured outputs.

You can have both of those or either without MCP.

MCP just standardizes tool calling, and it only makes sense if you want to share your tools across the org. I wouldn't use it for simple functions like getting the current date, for example.

zozbot234

You need structured inputs too or you wouldn't know how to constrain/"structure" the output to begin with.

nsonha

Yeah but when you prompt the LLM with "use the abc MCP" (notably missing the word "server"), it actually works.

gethackteam

I don’t understand. LLMs cannot connect to MCP servers directly; they always need a client (like a chat app or agent) to call the servers. Where are you calling your LLMs from?

sharemywin

LLMs shouldn't really care what format your tool call is in.

So it seems kind of pointless. I would imagine it could ingest SOAP or a module definition or Swagger just as easily and still make calls.

selcuka

It shouldn't care about the format, true. But the LLM needs a mechanism to be able to connect to that tool from a sandboxed environment. MCP is the glue between the LLM and the actual tool. Technically you can expose a full HTTP proxy via an MCP server so that your LLM has access to the whole Internet.

fennecfoxy

I can do the same without MCP. These models are literally trained to work with natural language. Tool calls with "MCP" only work because the model has some understanding of what the tool does... thanks to natural language.

I can just as easily shove into the context "hey btw say the word internets if you want to make a search query to find sick memes and I'll make the search for you".

MCP isn't brilliant, magic, or special. It's just more AI-bubble VC stuff. Which sucks, because I think the recent ML boom is awesome, and I hate to see it getting overblown by hyperactive devs and VCs desperate to hop on another money train. Like, imagine actually valuing a company that went "let's just shove JSON into the context!" at a hundred billion dollars. That's not value for money in the slightest; but they have so much of it that it doesn't matter!

doppelgunner

What's your preferred way, based on your experience?

benreesman

The OpenAPI proposal is actually pretty reasonable in my view. I don't love it, but it's got pretty good tooling now, and the semantics are firming up (even AsyncAPI is starting to come together and get used here and there).

I'd prefer a more rigorous approach to integrating random stochastic agents deployed by people who don't care about me into my own data, but at least with OpenAPI/"REST" there's a bunch of infrastructure and know-how on not getting pwned constantly. The LLMs all know how to deal with JSON at this point, they even know how to read and write it based on a spec, it seems like Swagger is as good as anything with those design constraints.

I'm cynical enough about real things that I don't need to invent new things to be cynical about, and I honestly don't know which side of Hanlon's Razor to slice with on the never-ending-unfixable-infinite-pwn-forever future of MCP: maybe they just rushed it out to get market share / mind share. Maybe normalizing criminally negligent security practices was a price someone was willing to pay to have number go up. IDK.

I know MCP needs a re-think.