
MCP-B: A Protocol for AI Browser Automation

throwanem

> If I asked you to build a table and gave you a Home Depot you probably would have a harder time than if I gave you a saw, a hammer and some nails.

I doubt that, first and not least because Home Depot stocks lumber.

bobmcnamara

Home Depot also sells tables.

null

[deleted]

bustodisgusto

Fixed. Nice catch


Abishek_Muthian

I haven't used any MCP so far, but as a disabled person I see accessibility use cases for MCPs doing browser/smartphone automation.

But any accessibility tool will be exploited by nefarious actors, so I wonder how many mainstream websites/apps would implement these MCPs.

Has anyone tried any MCP for improving accessibility?

fzysingularity

The contributor stats for the GitHub project are quite intriguing: https://github.com/MiguelsPizza/WebMCP/graphs/contributors

MiguelsPizza | 3 commits | 89++ | 410--

claude | 2 commits | 31,799++ | 0--

bustodisgusto

I did some git history rewriting when I closed-sourced the extension for a bit, so these are not super accurate. Claude Code did write about 85% of the code though.

fzysingularity

Nice!

rapind

I was checking that out too. Looks like Claude was co-author on the initial commit, which is like 90% of the code.

https://github.com/MiguelsPizza/WebMCP/commit/26ec4a75354b1c...

efitz

You’re going to see this pattern a lot more in the future.

consumer451

Claude's contributions graph is interesting. What is going on here? Does Claude Code commit as itself sometimes, but extremely rarely? I don't understand.

https://github.com/claude

handfuloflight

If you ask it to commit it'll sign itself as the author.

consumer451

But then, how are there so few commits in its profile graph? I suppose I may be admitting my ignorance of how public GitHub works, but still curious.

gubicle

That doesn't look right... if you look at the actual commits, they are all from

MiguelsPizza / Alex Nahas

https://github.com/MiguelsPizza/WebMCP/commits/main/

byteknight

He rewrote history to hide it?

He admits it here https://news.ycombinator.com/item?id=44516104

slt2021

Could all of this be replaced simply by publishing an OpenAPI (Swagger) spec and using a universal Swagger MCP client?

This basically leaves it up to the user to establish an authenticated session manually.

Assuming Claude is smart enough to pick up an API key from the prompt/config and can use a Swagger-based API client, wouldn't that be the same?
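
Roughly, the universal client I have in mind would walk the spec and register one tool per operation. A minimal sketch (the MCP SDK calls are real; the spec walking and auth handling are illustrative assumptions, not an existing client):

    // Hypothetical "universal Swagger MCP client": one MCP tool per
    // OpenAPI operation. The parameter mapping is deliberately crude.
    import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
    import { z } from "zod";

    const spec = await fetch("https://api.example.com/openapi.json").then(r => r.json());
    const server = new McpServer({ name: "swagger-bridge", version: "1.0.0" });

    for (const [path, ops] of Object.entries<any>(spec.paths)) {
      for (const [method, op] of Object.entries<any>(ops)) {
        server.tool(
          op.operationId ?? `${method}_${path.replace(/\W+/g, "_")}`,
          op.summary ?? "",
          { body: z.string().optional() }, // a real bridge would map op.parameters
          async ({ body }) => {
            const res = await fetch(new URL(path, spec.servers[0].url), {
              method: method.toUpperCase(),
              // API key supplied out of band by the user, as described above
              headers: { Authorization: `Bearer ${process.env.API_KEY}` },
              body,
            });
            return { content: [{ type: "text", text: await res.text() }] };
          },
        );
      }
    }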

bustodisgusto

That was everyone's first thought when MCP came out. Turns out it doesn't work too well, since there are generally too many tools. People are doing interesting work in this space though.

nilslice

pls don't put an api key in a prompt

efitz

Do it.

abrookewood

Looks similar to Elixir's Tidewave MCP server, which currently also supports Ruby: https://tidewave.ai/

Paraphrasing: Connect your editor's assistant to your web framework runtime via MCP and augment your agentic workflows and chats with: Database integration; Logs and runtime introspection; Code evaluation; and Documentation context.

Edit: Re-reading the MCP-B docs, MCP-B is more geared towards allowing visitors to your site to use MCP, while Tidewave is definitely focused on developers.

mehdibl

From the blog post:

"The Auth problem At this point, the auth issues with MCP are well known. OAuth2.1 is great, but we are basically trying to re-invent auth for agents that act on behalf of the user. This is a good long term goal, but we are quickly realizing that LLM sessions with no distinguishable credentials of their own are difficult to authorize and will require a complete re-imagining of our authorization systems. Data leakage in multi-tenant apps that have MCP servers is just not a solved problem yet.

I think a very strong case for MCP is to limit the amount of damage the model can do and the amount of data it will ever have access to. The nice thing about client side APIs in multi-tenant apps is they are hopefully already scoped to the user. If we just give the model access to that, there's not much damage they can do.

It's also worth mentioning that OAuth 2.1 is basically incompatible with internal Auth at Amazon (where I work). I won't go too much into this, but the implications of this reach beyond Amazon internal."

1. OAuth is not working at Amazon ==> need a solution.

2. OAuth sessions are difficult to authorize for agents.

3. Limit the amount of damage the model can do, WHILE "multi-tenant apps are hopefully already scoped to the user".

From a security side, I feel there is an issue in this logic.

OAuth for apps can be far more finely tuned than current web user permissions; usually the user has modification permissions that you may not want to grant.

OAuth not being implemented at Amazon is not really an issue.

Also, this means you backdoor the app with another app you establish trust with. ==> This is a major no-go for security, as all actions from the MCP app will be logged in the same scope as the user's access.

You might as well just copy your session ID/cookie and do the same with an MCP.

I may be wrong, and the idea seems interesting, but from a security side I feel it's a bypass that will have a lot of issues with compliance.

SchemaLoad

Not sure who the intended user is here? For frontend testing you actually do somewhat want the tests to break when the UI changes in major ways. And for other automation you'd be better off providing an actual API to use.

muratsu

This puts the burden on the website owner. If I go through the trouble of creating and publishing an MCP server for my website, I assume that through some directory or method I'll be able to communicate that to consumers (browsers & other clients). It would be much more valuable for website owners if you could automate MCP creation & maintenance.

mindwok

Pretty much every revolution in how we do things originates from the supplier. When websites became a thing the burden was on businesses to build them. Same with REST APIs. Same with mobile apps. As soon as there’s a competitive advantage to having the new thing, companies will respond if consumers demand it.

gavmor

Am I going to start to choose products based on their compatibility with WebMCP?

rapind

I think this is the practical way. The website owner (or rather the builder, since if you're running WordPress, we can assume MCP will be part of the package) is already responsible for the human interface across many devices, and also the search engine interface (robots.txt, sitemap.xml, meta tags). Having a standard we can use to curate what the AI sees and how it can interact would be hugely beneficial.

There's space for both IMO: the more generic tool that figures it out on its own, and the streamlined tool that accesses a site's guiderails. There's also the backend service of course, which doesn't require the browser or UI, but as he describes this entails complexity around authentication and, I would assume, discoverability.

muratsu

I agree with you that platforms like WordPress, Shopify, etc. will likely ship MCP extensions to help with various use cases. Accompanied by a discovery standard similar to llms.txt, I think it will be beneficial too. My only argument is that platforms like this are also the most "templated" designs, and it's already easy for AI to navigate them (since DOM structure variance is small).

The bigger challenge, I think, is figuring out how to build MCPs easily for SaaS and other legacy portals. I see some push on the OpenAPI side of things, which is promising but requires you to make significant changes to existing apps. Perhaps web frameworks (Rails, Next, Laravel, etc.) can agree on a standard.

sbarre

> it's already easy for AI to navigate them (since DOM structure variance is small).

The premise of MCP-B is that it's in fact not easy to reliably navigate websites today with LLMs, if you're just relying on DOM traversal or computer vision.

And when it comes to authenticated and read/write operations, I think you need the reliability and control that comes from something like MCP-B, rather than just trusting the LLM to figure it out.

Both Wordpress and Shopify allow users to heavily customize their front-end, and therefore ship garbage HTML + CSS if they choose to (or don't know any better). I certainly wouldn't want to rely on LLMs parsing arbitrary HTML if I'm trying to automate a purchase or some other activity that involves trust and/or sensitive data.

bustodisgusto

I think with AI tools you can pretty confidently build out an MCP server for your existing website. I plan to have good LLM docs for this very purpose.

For React in particular, lots of the form ecosystem (react-hook-form) can be directly ported to MCP tools. I am currently working on a zero-config react-hook-form integration.

But yes, MCP-B is more "work" than having the agent use the website like a user. The admission here is that it's not looking like models will be able to reliably do browser automation like humans for a while. Thus, we need to make an effort to build out better tooling for them (at least in the short term)
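
To give a sense of the react-hook-form overlap: both the form and the tool can share one zod schema. A rough sketch (the MCP SDK calls are real; the page wiring here is illustrative, not the shipped integration):

    // Sketch: one zod schema can validate the human-facing form (via
    // @hookform/resolvers) and describe the MCP tool's inputs.
    import { z } from "zod";
    import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";

    const checkoutShape = {
      email: z.string().email(),
      quantity: z.number().int().min(1),
    };

    // Shared submit path used by both the form's onSubmit and the tool,
    // so the agent rides the user's existing session exactly like a click.
    async function submitCheckout(data: { email: string; quantity: number }) {
      await fetch("/api/checkout", { method: "POST", body: JSON.stringify(data) });
    }

    const server = new McpServer({ name: "checkout-page", version: "1.0.0" });
    server.tool(
      "submit_checkout",
      "Submit the checkout form for the signed-in user",
      checkoutShape,
      async (args) => {
        await submitCheckout(args);
        return { content: [{ type: "text", text: "checkout submitted" }] };
      },
    );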

mfrye0

I was thinking the same. Forward-thinking sites might add this, but the vast majority of website owners probably wouldn't be able to figure this out.

Some middle ground would be cool, where an agent reverse-engineers the API as a starting point, then is promoted to the "official" MCP API if a site publishes one.

Flux159

This is an interesting take, since web developers could add MCP tools to their apps rather than having browser agents figure out how to perform actions manually.

Is the extension itself open source? Or only the extension-tools?

In theory I should be able to write a Chrome extension for any website to expose my own custom tools on that site, right (with some reverse engineering of their APIs, I assume)?

bustodisgusto

The extension should be open source. I had it as a private submodule until today. Let me figure out why it's not showing up and get back to you.

The extension itself is an MCP server which other extensions can connect to over cross-extension messaging. Since the extension is part of the protocol, I'd like the community to pull from the same important parts of the extension (MCPHub, content script) so they are consistent across extension implementations.
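
The cross-extension handshake is just Chrome's external messaging. A rough sketch (chrome.runtime.connect and onConnectExternal are real Chrome APIs; the hub ID and message shapes are made up for illustration):

    // Third-party extension: open a port to the MCP-B hub extension.
    const MCP_B_HUB_ID = "<hub-extension-id>"; // hypothetical placeholder
    const port = chrome.runtime.connect(MCP_B_HUB_ID, { name: "mcp" });
    port.postMessage({ method: "tools/list" });
    port.onMessage.addListener((msg) => console.log("tools:", msg));

    // MCP-B hub's background script: accept and route to the MCP server.
    chrome.runtime.onConnectExternal.addListener((incoming) => {
      incoming.onMessage.addListener(async (req) => {
        incoming.postMessage(await routeToMcpServer(req));
      });
    });

    // Stub standing in for the hub's actual request router.
    async function routeToMcpServer(req: unknown): Promise<unknown> {
      return { ok: true, echo: req };
    }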

bustodisgusto

Ok it's open source now

Flux159

Thanks! Took a very quick look. Looking at DomainToolManager, it seems like the extension exposes tools for all domains that support MCP-B. Does this mean that if I have two tabs open for a single domain, you'll have duplicate tools per tab?

Haven't had enough time to look through all the code there. Interesting problem, I guess, since a single domain could have multiple accounts connected (ex: Gmail with account 0 vs account 1 in different tabs) or just a single account (ex: HN).

bustodisgusto

No, there is built-in tool de-duping. I'm not sure how to handle domains with different URL states though.

Like you said, there are some edge cases where two tabs of the same website expose different tool sets, or have tools of the same name that would result in different outcomes when called.

Curious if you have any thoughts on how to handle this.

orliesaurus

I don't get it from the homepage; it feels like Selenium in the browser. Since you built it, can you explain?

bustodisgusto

Similar but also very different. Playwright and Selenium are browser automation frameworks. There is a Playwright MCP server which lets your agent use Playwright for browser automation.

MCP-B is a different approach. Website owners create MCP servers `inside` their websites, and MCP-B clients are either injected by browser extensions or included in the website's JS.

Instead of visual parsing like Playwright, you get standard deterministic function calls.

You can see the blog post for code examples: https://mcp-b.ai/blogs
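
The in-page side is just a normal MCP server. A minimal sketch (McpServer is the official MCP TypeScript SDK; treat the transport import as shorthand, since the exact package API may differ from what's shown):

    import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
    import { TabServerTransport } from "@mcp-b/transports"; // API may differ
    import { z } from "zod";

    const server = new McpServer({ name: "todo-app", version: "1.0.0" });

    server.tool(
      "create_todo",
      "Create a todo item for the signed-in user",
      { text: z.string() },
      async ({ text }) => {
        // Plain fetch: the call rides the user's existing session cookie,
        // so the model never touches credentials and stays user-scoped.
        await fetch("/api/todos", {
          method: "POST",
          body: JSON.stringify({ text }),
        });
        return { content: [{ type: "text", text: `created: ${text}` }] };
      },
    );

    await server.connect(new TabServerTransport());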

mhio

A Playwright MCP server, or any BiDi browser automation, should be equally capable of discovering, injecting, and calling the same client-JS-exposed MCP-B site API?

It's like an OpenAPI definition but for JS/MCP? (outside of the extension to interact with that definition)

Nathanba

What do you mean by "visual parsing like Playwright"? I'm pretty sure Playwright queries the DOM via JS; there isn't inherently any visual parsing. Do you just mean that MCP-B has dedicated JS APIs for each website? Your example is also pretty confusing: the website itself offers an "Increment by x" "tool", and then your first command to the website is to "subtract two from the count". So the AI model still has to understand the MCP tools offered by the website quite loosely and just call them as needed? I suppose this is basically like using Playwright, except it doesn't have to parse the DOM (although it probably still does; how else will it know that the "Increment by X" tool offered is in any way connected to the "count" you mention in your vague prompt). And then the additional benefit is that it can call a JS function instead of having to generate the DOM/JS Playwright calls to do it.

I mean, all this MCP stuff certainly seems useful even though this example isn't so good. The bigger uses will be when larger APIs and interactions are offered by the website, like "make a purchase" or "sort a table", where the AI would have to implement a very complex set of DOM operations and XHR requests; instead of flailing at that, it can call an MCP tool, which is just a JS function.

bustodisgusto

Sorry, this is in reference to the Playwright MCP server, which gives a model access to screenshots of the browser and the Playwright APIs.

MCP-B doesn't do any DOM parsing. It exchanges data purely over browser events.
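
Concretely, "over browser events" means something like window.postMessage between the page and the content script. Illustrative envelope only, not the actual wire format:

    // Page side: answer tool-discovery requests posted to the window.
    window.addEventListener("message", (ev: MessageEvent) => {
      if (ev.source !== window || ev.data?.channel !== "mcp") return;
      if (ev.data.payload?.method === "tools/list") {
        window.postMessage({
          channel: "mcp",
          payload: { id: ev.data.payload.id, result: { tools: [] } },
        }, "*");
      }
    });

    // Content script side: ask the page what it exposes. No DOM reads.
    window.postMessage({ channel: "mcp", payload: { id: 1, method: "tools/list" } }, "*");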

c0wb0yc0d3r

What differentiates this from something like data-test-id attributes?

bustodisgusto

data-test-id and other attributes are hardcoded and need to be known by the automator ahead of time. MCP-B clients request what they can call at injection time, and the server responds with standard MCP tools (functions LLMs can call, with context for how to call them).
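
From the client's side the exchange looks like this (Client, listTools, and callTool are the official MCP TypeScript SDK; the transport import is my assumed counterpart from the same transports package):

    import { Client } from "@modelcontextprotocol/sdk/client/index.js";
    import { TabClientTransport } from "@mcp-b/transports"; // assumed API

    const client = new Client({ name: "automator", version: "1.0.0" });
    await client.connect(new TabClientTransport());

    // Nothing is hardcoded: the page tells the client what is callable.
    const { tools } = await client.listTools();
    console.log(tools.map((t) => `${t.name}: ${t.description}`));

    // Structured call instead of clicking elements found by selectors.
    await client.callTool({ name: "create_todo", arguments: { text: "buy nails" } });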

netrem

The product seems interesting, but I found the landing page very chaotic and gave up reading it. The individual pieces of information are fine, I think, but the flow is poor and some info repeats. Was it AI generated?

bustodisgusto

Yes, it was mostly AI generated. I'm much more of a dev than a writer/marketer. Hopefully if this gains some traction I can pay someone to clean it up.

metta2uall

Looks great. I love ideas that increase efficiency and reduce electricity usage.

Only nitpick is that the home page says "cross-browser" at the bottom, but the extension is only available for Chrome.

bustodisgusto

Ah yeah, I'll fix that. Nice catch.