
Show HN: Aura – Like robots.txt, but for AI actions


21 comments · August 6, 2025

Hi,

I've been watching the rise of AI agents with a mix of excitement and dread. We're building incredible tools that can browse the web, but we're forcing them to navigate a world built for human eyes. They scrape screens and parse fragile DOMs.

We're trying to tame them to act like humans. I believe this is fundamentally wrong. The goal isn't to make AI operate at a human level, but to unlock its super-human potential.

The current path is dangerous. When agents from OpenAI, Google, and others start browsing at scale and speed, concepts like UI/UX will lose meaning for them. The entire model of the web is threatened. Website owners are losing control over how their sites are used, and no one is offering a real solution. The W3C is thinking about it. I decided to build it.

That's why I created AURA (Agent-Usable Resource Assertion).

It's an open protocol with a simple, powerful idea: let website owners declare what an AI can and cannot do. Instead of letting an agent guess, the site provides a simple aura.json manifest.

This gives control back to the site owner. It's a shift from letting AIs scrape data to being granted capabilities. We get to define the rules of engagement. This allows us to increase what AIs can do, not by letting them run wild, but by giving them clear, structured paths to follow.
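
To make that concrete, here is a rough sketch of the kind of thing an aura.json could declare. The field names below are illustrative only, not the actual schema; the real format lives in the repo:

    {
      "version": "1.0",
      "capabilities": {
        "list_posts":  { "method": "GET",  "path": "/api/posts", "auth": false },
        "create_post": { "method": "POST", "path": "/api/posts", "auth": true  }
      }
    }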

A confession: I'm not a hardcore programmer; I consider myself more of a systems thinker. I actually used AI extensively to help me write the reference implementation for AURA. It felt fitting to use the tool to build its own guardrails.

The core of the protocol, a reference server, and a client are all open source on GitHub. You can see it work in 5 minutes:

Clone & Install: git clone https://github.com/osmandkitay/aura.git && cd aura && pnpm install

Run the Server: pnpm --filter aura-reference-server dev

Run the Agent: (in a new terminal) pnpm --filter aura-reference-client agent -- http://localhost:3000 "list all the blog posts"

You'll see the agent execute the task directly, no scraping or DOM parsing involved.

The GitHub repo is here: https://github.com/osmandkitay/aura

I don't know if AURA will become the standard, but I believe it's my duty to raise this issue and start the conversation. This is a foundational problem for the future of the web. It needs to be a community effort.

The project is MIT licensed. I'm here all day to answer questions and listen to your feedback—especially the critical kind. Let's discuss it.

input_sh

How is it different than llms.txt? https://llmstxt.org/

Are any websites actually using it (for llms.txt: https://llmstxt.site/)? Why do I need to npm install anything instead of writing a text file?

tempfile

> While websites serve both human readers and LLMs, the latter benefit from more concise, expert-level information gathered in a single, accessible location.

This honestly made me laugh out loud.

pbronez

Honestly I love the llmstxt idea if only because it implies that I can get an unbloated version of the web for my own use. Reader mode gets way easier if a .md is available everywhere.

Which of course means this is never going to fly with any site that needs to show you ads.

1gn15

As others have said, this is less of a robots.txt and more of a sitemap.

The issue with this is that website owners don't want to do this. Take Reddit removing the API for example. Everyone just switched to scraping the Reddit website instead for any remaining third party clients.

Yes, APIs were supposed to be a compromise to lower the resources needed on both sides, but Reddit's stock price is linked to the value of "their" data, so...

Alternatively, malicious website owners may make incorrect Aura files to mislead user agents. Then we're back to screen scraping as the ground truth, because behaving like a human is the best way to avoid discrimination.

flufluflufluffy

I will purposefully implement this with a bunch of incorrect information for every website I make

JimDabell

I’m not sure this is targeted correctly. As a general rule, new protocols for the web work best when they are associated with individual URLs, not with the somewhat nebulous concept of a site.

Have you considered something like <script type="text/llm"> or Link: <https://api.example.com/llms/foo>; rel="llm:foo", or just normal content negotiation on individual pages?
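
To sketch the Link-header variant (the rel value and media type here are hypothetical, not from any existing spec), an individual page could advertise its machine-readable counterpart in the response:

    HTTP/1.1 200 OK
    Content-Type: text/html
    Link: <https://api.example.com/llms/foo>; rel="llm:foo"; type="application/json"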

JohnFen

Like robots.txt, it has the fatal flaw of being unenforceable.

OsmanDKitay

You're right... The aura.json file itself is completely voluntary; a badly behaved agent can just ignore it. But this is where the model differs from robots.txt. Aura isn't the fence, it's the official map to the gates. The real enforcement happens on the backend at the API endpoint, just like any normal web app.

For example, the aura manifest says the create_post capability needs auth. If an agent ignores that and POSTs to /api/posts without a valid cookie, our server's API will reject it with a 401. The manifest doesn't do the blocking, the backend does. It just tells cooperative agents the rules of the road ahead of time.
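
In code terms, a minimal sketch of that check (assuming an Express-style handler; this is not the reference server's actual code) looks like any ordinary auth guard:

    import express from "express";

    const app = express();
    app.use(express.json());

    // The manifest only advertises that create_post exists and requires auth.
    // The real gate is this ordinary check on the endpoint itself.
    app.post("/api/posts", (req, res) => {
      const session = req.headers.cookie; // stand-in for real session validation
      if (!session) {
        return res.status(401).json({ error: "authentication required" });
      }
      // ...create the post...
      return res.status(201).json({ ok: true });
    });

    app.listen(3000);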

So the real incentive for an agent to use Aura isn't about avoiding punishment, it's about the huge upside in efficiency and reliability. Why scrape a page and guess at DOM elements when you can make a single, clean API call that you know will work? It saves the agent developer time, compute resources, and the headache of maintaining brittle scrapers.

So:

robots.txt tells good bots what they shouldn't do.

aura.json tells them what they can do, and gives them the most efficient way to do it, all backed by the server's actual security logic.

JohnFen

The primary purpose of robots.txt isn't to deny access. That's just a sideline. The intended purpose is to do exactly what this aura proposal does: to provide guidance to crawlers as to what parts of the site are valuable to crawl. That's why it's voluntary: its main reason for existing is to benefit the crawlers in the first place.

In that light, I guess your proposal makes a certain amount of sense. I don't think it addresses what a lot of web sites want, but that's not necessarily a bad thing. Own your niche.

OsmanDKitay

You're right that not every website needs this today. My bet is that this becomes essential for any site that wants to be a verb (a place to do actions), not just a noun (a place to read content), in the emerging agent-driven web. Thanks for the thoughtful discussion.

paulryanrogers

Are you thinking of sitemap?

Nikkau

You should add a section to explain why OpenAPI isn't enough (narrator's voice: it is).

Otherwise, it just seems you vibecoded the wheel.

OsmanDKitay

The comparison to OpenAPI is the main thing to address, and you're right to ask why it isn't enough.

OpenAPI is fantastic for describing a static API for a developer to read. But the web is more than that: it's a dynamic, stateful environment built for human interaction. The current trend of forcing AI agents to navigate this human-centric web with screen scraping and DOM manipulation is brittle and, I believe, unsustainable. It's like sending a robot into a grocery store to read the label on every single can instead of just asking the manager for the inventory list.

This is where Aura tries to be different in two key ways:

Control & permission, not just documentation: Aura is designed from the website owner's perspective. It's a way for a site to say "This is my property, and here are the explicit rules for how an automated agent can interact with it." The aura.json file is a handshake, a declaration of consent. It gives control back to the site owner.

Statefulness (this is the big one): An OpenAPI spec is stateless. It can't tell an agent what it can do right now based on its current context. This is what the AURA-State header solves. For example, before you log in, the AURA-State header might only show you the list_posts and login capabilities. After you successfully call login, the very next response from the server includes a new AURA-State header that unlocks capabilities like create_post and update_profile. The agent discovers its new powers dynamically. This state management is core to the protocol and doesn't really have a parallel in OpenAPI.
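
As a rough illustration of the flow (the AURA-State header name is from the protocol, but the response format and capability ids below are placeholder assumptions), an agent could simply re-read its capabilities after every call:

    // Minimal sketch: track capabilities advertised via the AURA-State header.
    // Assumes, purely for illustration, a comma-separated list of capability ids.
    async function callAndReadState(url: string, init?: RequestInit): Promise<string[]> {
      const res = await fetch(url, init);
      const state = res.headers.get("AURA-State") ?? "";
      return state.split(",").map((s) => s.trim()).filter(Boolean);
    }

    // Before login this might return ["list_posts", "login"]; after a successful
    // login call, the next response's AURA-State could also include
    // "create_post" and "update_profile".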

You're right to be skeptical, and as I said in my post, maybe Aura isn't the final answer. But I strongly believe the web needs a native, capability-aware layer for the coming wave of AI agents. The current path of brute-force interaction feels like it will break the open, human-centric web we've all built.

YVoyiatzis

"vibecoded the wheel". Where did you get this from‽

Nikkau

> I actually used AI extensively to help me write the reference implementation for AURA.

zveyaeyv3sfye

How about we don't cater the web to the handful of companies hell-bent on killing it?

> I actually used AI extensively to help me write the reference implementation for AURA.

So that's why. You drank the kool aid.

OsmanDKitay

The current way agents interact with the web is a problem. My view isn't that we should "cater" to them, but that we should define the terms of engagement before they define them for us. Right now they're using scraping. That gives site owners zero control. Aura is an attempt to hand control back to the site owner by providing a clear aura.json manifest. It's about consent. As for using AI to build it, I believe you have to deeply understand a technology to help steer it.

jakeydus

> The project is MIT licensed. I'm here all day to answer questions and listen to your feedback—especially the critical kind. Let's discuss it.

You can tell that OP is a big AI believer by the final sentence. That's gotta be one of the most ChatGPT lines I've ever read.

tempfile

> It proposes a new standard for AI-web interaction that moves beyond fragile screen scraping and DOM manipulation towards a robust, secure, and efficient machine-readable layer for the internet.

This is nothing like robots.txt, it is much more like a sitemap. In fact, this design goal is almost word for word the point of the semantic web in general. You may find that there are existing working groups for similar resource description frameworks. Given how poor adoption of semantic tagging has been, I somewhat doubt sites start doing it just for LLMs.

Incidentally, I thought the whole point of an AI agent was that it could read and understand things by itself. I welcome any improvement in the semantic content of the web, but isn't scraping kind of the point?
