Show HN: Stagehand – an open source browser automation framework powered by AI
47 comments
·January 8, 2025
dchuk
This looks awesome.
What I would love to see either as something leveraging this, or built in to this, is if you prompt stagehand to extract data from a page, it also returns the xpath elements you'd use to re-scrape the page without having to use an LLM to do that second scraping.
So basically, you can scrape new pages never before seen with the non-deterministic LLM tool, and then when you need to rescrape the page again to update content for example, you can use the cheaper old-school scraping method.
Not sure how brittle this would be, both in going reliably from the LLM version to the xpath version and in falling back to the LLM version if your xpath script fails, but conceptually, being able to scrape using the smart tools while building up a library of dumb scraping scripts over time would be killer.
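A minimal sketch of that idea, assuming the LLM extraction can return the xpath it used alongside the value. Every name here (`ScrapablePage`, `textAt`, `llmExtract`, `selectorLibrary`) is hypothetical and not part of Stagehand, and the calls are synchronous only to keep the sketch short; real browser and LLM calls would be async.

```typescript
// Hypothetical shapes for illustration; none of these come from Stagehand.
type Extraction = { value: string; xpath: string };

interface ScrapablePage {
  url: string;
  // Returns the text found at an xpath, or null when nothing matches.
  textAt(xpath: string): string | null;
}

// The growing "library of dumb scraping scripts": one xpath per url + field.
const selectorLibrary = new Map<string, string>();

function scrapeField(
  page: ScrapablePage,
  field: string,
  llmExtract: (page: ScrapablePage, field: string) => Extraction,
): string {
  const key = `${page.url}|${field}`;
  const knownXpath = selectorLibrary.get(key);
  if (knownXpath !== undefined) {
    const cheap = page.textAt(knownXpath);
    if (cheap !== null) return cheap; // old-school path worked, no LLM call
  }
  // First visit, or the stored xpath no longer matches: fall back to the LLM
  // and remember the xpath it reports for next time.
  const fresh = llmExtract(page, field);
  selectorLibrary.set(key, fresh.xpath);
  return fresh.value;
}
```

The brittleness question stays open: a stored xpath that still matches but now points at the wrong element would not trigger the fallback, so some validation of the cheap result would probably be needed.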
hackgician
Yeah, I think someone opened a similar issue on GitHub: https://github.com/browserbase/stagehand/issues/389
Repeatability of extract() is definitely super interesting and something we're looking into
9dev
Cache the response for a given query-page hash pair maybe? So the LLM will only be consulted when the page content hash changes, and the previous answer is reused otherwise.
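Roughly, as a sketch: key the cache on a hash of the query plus a hash of the page content. `cachedAnswer` and `askLlm` are made-up names, and FNV-1a (over UTF-16 code units) is used only to keep the example dependency-free; a real version would likely use something like SHA-256, since a 32-bit hash can collide.

```typescript
// FNV-1a, 32-bit, over UTF-16 code units. Non-cryptographic; a stand-in here.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return (hash >>> 0).toString(16);
}

const answers = new Map<string, unknown>();

// askLlm is a hypothetical callback; it may return a Promise, in which case
// the Promise itself is what gets cached and reused.
function cachedAnswer<T>(query: string, pageContent: string, askLlm: () => T): T {
  const key = `${fnv1a(query)}:${fnv1a(pageContent)}`;
  if (answers.has(key)) return answers.get(key) as T; // same query, same page
  const answer = askLlm(); // page content (or query) changed: consult the LLM
  answers.set(key, answer);
  return answer;
}
```

In practice the raw HTML often changes on every load (timestamps, ad slots, CSRF tokens), so the content would probably need normalizing before hashing for the cache to hit at all.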
matsemann
Agree. The worst part of integration tests is how brittle they often are. I don't want to introduce yet another thing that could give false test errors.
But of course, the way it works now could also help reduce the brittleness. With an xpath or selector, it quickly breaks when the design changes or things are moved around. With this, it might overcome this.
So tradeoffs, I guess.
ushakov
there’s also llm-scraper: https://github.com/mishushakov/llm-scraper
disclaimer: i am the author
asar
This looks really cool, thanks for sharing!
I recently tried to implement a workflow automation using similar frameworks that were playwright or puppeteer based. My goal was to log into a bunch of vendor backends and extract values for reporting (no APIs available). What stopped me entirely were websites that implemented an invisible captcha. They can detect a playwright instance by how it interacts with the DOM. Pretty frustrating, but I can totally see this becoming a standard as crawling and scraping is getting out of control.
mpalmer
This looks very cool and makes a lot of sense, except for the idea that it should take the place of Playwright et al.
Personally I'd love to use this as an intermediate workflow for producing deterministic playwright code, but it looks like this is intended for running directly.
I don't think I could plausibly argue for using LLMs at runtime in our test suite at work...
Klaster_1
It's funny you mentioned "deterministic Playwright code," because in my experience, that’s one of the most frustrating challenges of writing integration tests with browser automation tools. Authoring tests is relatively easy, but creating reliable, deterministic tests is much harder.
Most of my test failures come down to timing issues—CPU load subtly affects execution, leading to random timeouts. This makes it difficult to run tests both quickly and consistently. While proactive load-testing of the test environment and introducing artificial random delays during test authoring can help, these steps often end up taking more time than writing the tests themselves.
It would be amazing if tools were smart enough to detect these false positives automatically. After all, if a human can spot them, shouldn’t AI be able to as well?
Kostarrr
Hi! Kosta from Octomind here.
We built basically this: Let an LLM agent take a look at your web page and generate the playwright code to test it. Running the test is just running the deterministic playwright code.
Of course, the actual hard work is _maintaining_ end-to-end tests so our agent can do that for you as well.
Feel free to check us out, we have a no-hassle free tier.
hackgician
Interesting, thanks for the feedback! By "taking the place of Playwright," we don't mean the AI itself is going to replace Playwright. Rather, you can continue to use existing Playwright code with new AI functionalities. In addition, we don't really intend for Stagehand to be used in a test suite (though you could!).
Rather, we want Stagehand to assist people who want to build web agents. For example, I was using headless browsers earlier in 2024 to do real-time RAG on e-commerce websites that could aggregate results for vibes-based search queries. These sites might have random DOM changes over time that make it hard to write sustainable DOM selectors, or annoying pop-ups that are hard to deterministically code against.
This is the perfect use for Stagehand! If you're doing QA on your own site, then base Playwright (as you mention) is likely the better solution
cjonas
How do you get by when every major site starts blocking headless browsers? A good example right now is Zillow, but I foresee a world where big chunks of the internet are behind captcha and bot detection
andrewmcwatters
That's not really a problem for Stagehand. It's a problem for Selenium, Playwright, Puppeteer and others at the browser automation library level.
andrewmcwatters
It seems to me like Selenium would have been a more appropriate API to extend from, then. Playwright, despite whatever people want it to be otherwise, is explicitly positioned for testing, first.
People in the browser automation space consistently ignore this, for whatever reason. Though, it's right on their site in black and white.
hackgician
Appreciate the feedback. Our take is that Playwright is an open-sourced library with a lot of built-in features that make building with it a lot easier, so it's definitely an easier starting point for us
z3t4
My kneejerk reflex: "create-browser-app" is a very generic name, should just have called it "stagehand"
fbouvier
Hey Anirudh, Stagehand looks awesome, congrats. Really love the focus on making browser automations more resilient to DOM changes. The act, extract, and observe methods are super clean.
You might want to check out Lightpanda (https://github.com/lightpanda-io/browser). It's an open-source, lightweight headless browser built from scratch for AI and web automation. It's focused on skipping graphical rendering to make it faster and lighter than Chrome headless.
TheTaytay
Lightpanda does look promising, but this is an important note from the readme: " You should expect most websites to fail or crash."
fbouvier
You're absolutely right, the 'most websites will fail' note is there because we're still in development, and the browser doesn't yet handle the long tail of web APIs.
That said, the architecture's coming together and the performance gains we're seeing make us excited about what's possible as we keep building. Feedback is very welcome, especially on what APIs you'd like to see us prioritize for specific workflows and use cases.
bluelightning2k
Does this open up the possibility of automating an existing open browser tab? (Instead of a headless or specifically opened instance of chrome?)
namanyayg
Afaik no. But if it's access to authenticated resources that you want, you can do so by copying over cookies.
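One way to do the cookie copying, as a sketch: take the `Cookie` header string from the logged-in session and turn it into the `{ name, value, domain, path }` objects that Playwright's `browserContext.addCookies()` accepts. The helper name `cookiesFromHeader` and the choice of `/` as the path are assumptions for illustration.

```typescript
// Subset of the cookie fields that Playwright's addCookies() accepts.
type CookieInput = { name: string; value: string; domain: string; path: string };

// Hypothetical helper: parse "a=1; b=2" into cookie objects for one domain.
function cookiesFromHeader(header: string, domain: string): CookieInput[] {
  return header
    .split(";")
    .map((pair) => pair.trim())
    .filter((pair) => pair.includes("="))
    .map((pair) => {
      const eq = pair.indexOf("="); // split on the first "=" only
      return {
        name: pair.slice(0, eq),
        value: pair.slice(eq + 1),
        domain,
        path: "/", // assumed: cookies apply to the whole site
      };
    });
}
```

With a real Playwright context, the result could be passed to `await context.addCookies(cookies)` before navigating to the authenticated page.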
sparab18
I've been playing around with Stagehand for a minute now, actually a useful abstraction here. We build scrapers for websites that are pretty adversarial, so having built in proxies and captcha is delightful.
Do you guys ever think you'll do a similar abstraction for MCP and computer use more broadly?
hackgician
Thanks so much! Our Stagehand MCP server actually won Anthropic's Claude MCP hackathon :) Check it out: https://github.com/browserbase/mcp-server-browserbase/tree/m...
We're working on a better computer use integration using Stagehand, def a lot of interesting potential there
jimmySixDOF
interesting and hope to see this improve with open source GUI Agent vision model projects like OS-Atlas
zanesabbagh
Have been on the Slack for a while and this crew has had an insane product velocity. Excited to see where it goes!
hackgician
Thanks so much Zane!!
jsdalton
Does it operate by translating your higher level AI methods into lower level Playwright methods, and if so is it possible to debug the actual methods those methods were translated to?
Also is there some level of deterministic behavior here or might every test run result in a different underlying command if your wording isn’t precise enough?
hackgician
It's a little hacky, but we have a method in the act() handler called performPlaywrightMethod that takes in a playwright method + xpath and executes the playwright method on the xpath. There's definitely a lot of room for improvement here, and we're working on making observe() fill those gaps. I think observe() aims to be like GitHub Copilot's gray suggested text that you can then confirm in a secondary step, whereas act() takes on a more agentic workflow in which you let the underlying agent loop make decisions on your behalf
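To make the "method + xpath" shape concrete, here is a rough sketch. It is not Stagehand's code: only the idea of running a Playwright method against an xpath comes from the comment above, while `performStep`, `Step`, `LocatorLike`, `AutomationPage`, and the trace array are hypothetical. The stand-in methods are synchronous for brevity; real Playwright locator methods return Promises.

```typescript
// Minimal stand-ins for the Playwright objects involved (assumed shapes).
interface LocatorLike {
  click(): void;
  fill(value: string): void;
}

interface AutomationPage {
  locator(selector: string): LocatorLike;
}

// What the LLM would hand back: a method name plus the xpath to run it on.
type Step =
  | { method: "click"; xpath: string }
  | { method: "fill"; xpath: string; value: string };

// Executes one step and records it, so the chosen low-level call can be
// inspected afterwards when a run misbehaves.
function performStep(page: AutomationPage, step: Step, trace: string[]): void {
  // Playwright treats the "xpath=" prefix as an XPath selector.
  const target = page.locator(`xpath=${step.xpath}`);
  if (step.method === "click") {
    target.click();
  } else {
    target.fill(step.value);
  }
  trace.push(`${step.method} on ${step.xpath}`);
}
```

Keeping a trace like this is one possible answer to the debugging question: the recorded steps show which underlying call each high-level instruction turned into on a given run.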
pryelluw
Can it be adapted to use ollama? Seems like a good tool to set up locally as a navigation tool.
hackgician
Yes, you can certainly use Ollama! However, we strongly recommend using a more beefed up model to get sustainable results. Check out our external_client.ts file in examples/ that shows you how to set up a custom LLMClient: <https://github.com/browserbase/stagehand/blob/main/examples/...>
jerrygoyal
wow. It's like cursor vs vscode movement but for browser automation and scraping. Kudos to the author. Are there any other similar tools?
andrethegiant
https://crawlspace.dev is similar [disclaimer: I’m the founder]
righthand
I’m curious how this compares to Playwright's already built-in codegen:
https://playwright.dev/docs/codegen-intro
Is a chat bot an easier way to iterate on a test?
hackgician
Playwright codegen is incredibly powerful, but still pretty brittle. Its DOM selectors are still hardcoded, so you run the risk of Playwright selecting an unsustainable DOM selector. With Stagehand, the code is self-healing since it's dynamically generating Playwright every time, making it much more resilient to minor DOM changes
kevmo314
How do you avoid this becoming horrendously expensive per run? Are the results cached if the DOM doesn't change?
hackgician
The purpose of using Playwright is to basically write deterministic workflows in deterministic automation code. We have basic prompt caching right now that works if the DOM doesn't change (as you mention), but also the best way to reduce token cost is to reduce reliance on AI itself. You have the most control over how much you want to rely on AI v. how much you want to write repeatable Playwright code.
Hi HN! I’m Anirudh — longtime lurker, first time poster, and I couldn’t be more excited to show you Stagehand.
Stagehand is a TypeScript project that extends Playwright with three simple AI methods — act, extract, and observe. We’d love for you to try it out using the command below:
Here’s a sample workflow:
We built Stagehand because we loved building browser automations using Playwright and Selenium, but we grew frustrated at how cumbersome it is to just get started and write simple browser automations. These frameworks, while incredibly powerful, are built for QA testing and are thus notoriously prone to fail if there are minor changes in the UI or underlying DOM structure.
The goal of Stagehand is twofold:
1. Make browser automations easier to write
2. Make browser automations more resilient to DOM changes.
We were super energized by what we’ve been seeing with vision-based computer use agents. We think with a browser, you can provide even richer data by leveraging the information in the DOM + a11y tree in addition to what’s rendered on the page. However, we didn’t want to go so far as to build an agent, since we wanted fine-grained control over each step that an agent can take.
Therefore, the happy medium we built was to extend the existing powerful functionalities of Playwright with simple and extensible AI APIs that return the decision-making power back to the developer at each step.
Check out our docs: https://docs.stagehand.dev
We’d love for you to join and give us feedback on Slack as well: https://stagehand.dev/slack