Show HN: Nxtscape – an open-source agentic browser
94 comments
·June 20, 2025hannob
Okay, maybe this is a stupid question, but: what is an agentic browser? You seem to assume that everyone knows what that means.
Is this a common and well-defined term that people use? I've never heard it.
It would appear to me from the context that it means something like "web browser with AI stuff tackled on".
felarof
Thanks for asking - not a stupid question at all! I should have probably explained it at the top of my post.
By "agentic browser" we basically mean a browser with AI agents that can do web navigation tasks for you. So instead of you manually clicking around to reorder something on Amazon or fill out forms, the AI agent can actually navigate the site and do those tasks.
wild_egg
Not to pull a "why should I use Dropbox when I have rsync" but why should we use this over adding a Playwright MCP to Claude Desktop or similar?
Does having access to Chromium internals give you any super powers over connecting over the Chrome Devtools Protocol?
felarof
Yes, eventually we think there is more value of owning the entire stack than just be a MCP connector.
Few ideas we were thinking of: integrating a small LLM, building MCP store into browser, building a more AI friendly DOM, etc.
Even today, we use chrome's accessibility tree (a better representation of DOM for LLMs) which is not exposed via chrome extension APIs.
shortrounddev2
I would take the position of "why use this when I have eyes and hands and a brain?"
al_borland
I first heard the term agentic about a month ago. I went from never hearing it, to hearing it 3 or 4 times in 2 days... one of which was on an internal town hall where I work, where leadership was simply using it as if the whole world already knew what it meant, instead of literally being the first time it was ever mentioned.
The tl;dr is that it's AI that makes decisions on its own.
tcdent
[flagged]
kordlessagain
Agents are LLM responses that are feed with tools, like calculate(expression). When it encounters a thing it needs to do to meet desired output, it will run the tool. That is defining a simple agentic workflow.
A complicated workflow may involve other tools. For example, the input to the LLM may produce something that tells it to set the user-agent to such and such as string:
set_user_agent("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36");
Other tools could be clicking on things in the page, or even injecting custom JavaScript when a page loads.ugh123
That tab grouping capability is leaps and bounds better than the junk Google put into Chrome for "AI suggested" grouping.
felarof
+1, I use AI tab grouping feature in our browser quite a lot.
jklinger410
What is with Mac users forking Chromium and then only making releases for Mac?
felarof
Haha, was easier to build and we were the first users :)
have linux next on our radar. What build do you want?
jtolly710
.deb would be great to see next :)
shortrounddev2
Windows
awongh
I think LLMs could have a reasonable chance at solving tab-related workflows (keeping track of tabs or the idea/concept of tabs) - that is tracking and sorting lots of small related research ideas.
Sort of like a backwards perplexity search. (LLM context is from open tabs rather than the tool that brings you to those tabs)
I built a tab manager extension a long time ago that people used but ran into the same problem- the concept of tab management runs deeper than just the tabs themselves.
felarof
Yeah, I feel LLMs can finally solve the tab overload issue. I suffer from this constantly.
I added few features which I felt would be useful - easy way to organise and group tabs - simple way to save and resume sessions with selective context.
What are your problems that you would like to see solved?
awongh
I don't like the idea of letting the LLM run wild and categorize things directly, but in a tab-organizing view it would be useful to add more semantic sorting of the tabs- maybe it would enable something like multiple tab-view control panel: Show all the AI tabs. Show all the image diffusion tabs. Show all the LLM tabs. (so overlapping views of sets of tabs)
This would of course apply to not just open tabs but tabs I used to have open, where the LLM knows about my browsing history.
But I think I would want a non-chat interface for this. (of course at any time I could chat/ask a question as well)
psychoslave
I think a large part of it is us, as user, we lake the appropriate discipline.
Resist the call to open in a tab every link in this article, overcome the fear of losing something if all these tabs lagging behind are closed right now without further consideration.
xena
Do you respect robots.txt?
felarof
No, not today.
But wonder if it matter if it the agent is mostly using it for "human" use cases and not scrapping?
xena
You should, because universities are starting to get legal involved due to mass scraping taking down their systems.
dotancohen
Yes it would matter. The AI might be I in your eyes, but it is still A.
mattigames
What do you mean? This AI cannot scrape multiple links automatically? Like "make a summary of all the recipes linked in this page" kind of stuff? If it can it definitely meets the definition of scraping.
grepexdev
I think what he means is it is not just generally crawling and scraping, and uses a more targeted approach. Equivalent to a user going to each of those sites, just more efficiently.
b0a04gl
so agents can control tabs, forms, clicks—like a real user would.so what about undo. if an agent clicks the wrong thing, how do you roll that back without reloading the world?
felarof
There is a big red button to always stop the agent.
finolex
This is cool! Congrats on launch! How do you store user data? Do you write to device? Curious if there's a basic.tech x nxtscape collab possible here where you can store each user's info to their dedicated PDS
felarof
Thank you! Yeah all user data is just stored locally on device.
Oh cool, will look into basic.tech to understand more.
anilgulecha
Very interesting approach. Why a browser, and not a fantastic chrome extension? Grouping tabs, summarizing, even taking open ended actions, seem very doable with permissions extensions have..
edit: Just read about the accessibility thing, but that's thin. Is there any usecase in the future that a browser can, but an extension can't?
esafak
It sounds like something that needs to be dealt with in Chromium rather than forked. I am sure lots of developers want such functionality, if it is missing. I found:
https://developer.chrome.com/docs/extensions/ai
Don't any of these fit the bill? Are they Gemini-locked and you want something else? I am not familiar with the Chrome API, so pardon my ignorance.
dataviz1000
> Is there any usecase in the future that a browser can, but an extension can't?
The only reason to use a browser over a chrome extension is to bypass security features, for example, trusted events. If a user wants the browser window to go to full screen or play a video, a physical mouse click or key press is required. Moreover, some websites do not want to be automated like ChatGPT web console and Chase.com which checks if the event was a trusted event before accepting a button click or key press. This means that a Chrome extension can not automate voice commands inferred with audio to text. However, to get a trusted event only requires the user to press a button, any button, so message or dialog prompt that says, "Press to go full screen," is all that is required. This can be down with a remote bluetooth keyboard also.
The way I see it, these limitations are in place for very, very good reasons and should not be bypassed. Moreover, there are much larger security issues using a agentic browser which is sending entire contents of a bank website or health records in a hospital patient portal to a third party server. It is possible to run OpenAI's whisper on webgpu on a Macbook Pro M3 but most text generation models over 300M will cause it to heat up enough to cook a steak. There are even bigger issues with potential prompt injection attacks from third party websites that know agentic browsers are visiting their sites.
The first step in mitigating these security vulnerabilities is preventing the automation from doing anything a Chrome extension can't already do. The second is blacklisting or opt in only allowing the agents to read and especially to write (fill in form is a write) any webpage without explicit permission. I've started to use VSCode's copilot for command line action and it works with permissions the same way such as only session only access.
I've already solved a lot of the problems associated with using a Chrome extension for agentic browser automation. I really would like to be having this conversation with people.
EDIT: I forgot the most important part. There are 3,500,000,000 Chrome users on Earth. Getting them to install a Chrome extension is much, much easier than getting them to install a new browser.
felarof
Yeah accessibility is one such usecase, but in future we have few other ideaswhere having a fork makes it lot easier. Few ideas:
- Ship a small LLM along with browser - MCP store built in
null
lxe
Before I dive into the source code... how do you pass the page content, and the locations of interactive components to the LLM? And how do you dispatch events to interact with the page? I just want to verify if it's ARIA tree like the others, or it's something else.
felarof
Today, we connect to chrome using CDP and use Puppeteer to send clicks and other operations. Also, using browser use DOM tree highlighting, which works great.
To get the page content we parse accessibility tree.
rodolphoarruda
This is the missing piece from Karpathy's keynote: the browser.
mahoro
This is great, I'd like to test! Is there any recommendations on which ollama models works best with this kind of tasks?
OsrsNeedsf2P
Is this only for MacOS? If it's a Chromium fork, what's the reason for no Linux/Windows?
Also what's the business model?
felarof
Yes MacOS for now, but looking into getting Linux binary next.
> what's the reason for no Linux/Windows?
Sorry, just lack of time. Also we use Sparkle for distributing updates, which is MacOS only.
> Also what's the business model?
We are considering an enterprise version of the browser for teams.
Hi HN - we're Nithin and Nikhil, twin brothers and founders of nxtscape.ai (YC S24). We're building Nxtscape ("next-scape") - an open-source, agentic browser for the AI era.
-- Why bother building a new browser? For the first time since Netscape was released in 1994, it feels like we can reimagine browsers from scratch for the age of AI agents. The web browser of tomorrow might not look like what we have today.
We saw how tools like Cursor gave developers a 10x productivity boost, yet the browser—where everyone else spends their entire workday—hasn't fundamentally changed.
And honestly, we feel like we're constantly fighting the browser we use every day. It's not one big thing, but a series of small, constant frustrations. I'll have 70+ tabs open from three different projects and completely lose my train of thought. And simple stuff like reordering tide pods from amazon or filling out forms shouldn't need our full attention anymore. AI can handle all of this, and that's exactly what we're building.
Here’s a demo of our early version https://dub.sh/nxtscape-demo
-- What makes us different We know others are exploring this space (Perplexity, Dia), but we want to build something open-source and community-driven. We're not a search or ads company, so we can focus on being privacy-first – Ollama integration, BYOK (Bring Your Own Keys), ad-blocker.
Btw we love what Brave started and stood for, but they've now spread themselves too thin across crypto, search, etc. We are laser-focused on one thing: making browsers work for YOU with AI. And unlike Arc (which we loved too but got abandoned), we're 100% open source. Fork us if you don't like our direction.
-- Our journey hacking a new browser To build this, we had to fork Chromium. Honestly, it feels like the only viable path today—we've seen others like Brave (started with electron) and Microsoft Edge learn this the hard way.
We also started with why not just build an extension. But realized we needed more control. Similar to the reason why Cursor forked VSCode. For example, Chrome has this thing called the Accessibility Tree - basically a cleaner, semantic version of the DOM that screen readers use. Perfect for AI agents to understand pages, but you can't use it through extension APIs.
That said, working with the 15M-line C++ chromium codebase has been an adventure. We've both worked on infra at Google and Meta, but Chromium is a different beast. Tools like Cursor's indexing completely break at this scale, so we've had to get really good with grep and vim. And the build times are brutal—even with our maxed-out M4 Max MacBook, a full build takes about 3 hours.
Full disclosure: we are still very early, but we have a working prototype on GitHub. It includes an early version of a "local Manus" style agent that can automate simple web tasks, plus an AI sidebar for questions, and other productivity features (grouping tabs, saving/resuming sessions, etc.).
Looking forward to any and all comments!
You can download the browser from our github page: https://github.com/nxtscape/nxtscape