How and where will agents ship software?
26 comments · July 16, 2025
jamest
croes
If they get better. At the moment the progress is mostly in the toolchains, because progress on the LLMs themselves is slowing down due to the lack of training data.
simonw
This article is about teaching coding agents to use InstantDB, which is "a modern Firebase".
I suggest jumping straight to this document, which is designed to tell the agent how to work with Instant but is pretty great documentation for humans who want to understand what it can do at the same time: https://www.instantdb.com/mcp-tutorial/claude-rules.md
nezaj
Thank you for the kind words on the rules/documentation! It was definitely an iterative process to figure out how to get good results.
We have an llms.txt and an llms-full.txt (~9k lines) which contain all our documentation. Feeding these to Claude didn't get great results; it was just too much information.
We manually compressed our llms-full.txt into a rules file (~1.5k lines) that declared the API upfront and provided snippets showing how to do different things, with callouts to common examples. This condensed version did better, but Claude would still make subtle mistakes.
Looking at the kind of mistakes Claude made, it seemed like a human could make those mistakes too (very useful feedback for us to improve our API). We thought: "What's one of the smallest fully contained examples we can make that packs in a bunch of info on how to use Instant?" That would probably be useful for both a human and an agent. And indeed it seemed to be the case.
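To give a concrete sense of the shape, here's a minimal sketch in the spirit of that fully contained example. The app id is a placeholder, and the imports and method names are from memory rather than copied from the rules file, so treat them as assumptions and check the rules file above for the exact, current API:

```typescript
import { init, tx, id } from "@instantdb/core";

// Placeholder app id -- you'd use your own from the Instant dashboard.
const db = init({ appId: "__YOUR_APP_ID__" });

// Subscribe to all todos; the callback re-fires whenever the data changes.
db.subscribeQuery({ todos: {} }, (result) => {
  if (result.data) {
    console.log("todos:", result.data.todos);
  }
});

// Write a new todo. Transactions are how you create and update data.
db.transact(tx.todos[id()].update({ text: "Hello", done: false }));
```

A handful of small, complete snippets like this seemed to pack more usable signal than the full docs dump.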
arscan
> Looking at the kind of mistakes Claude made, it seemed like a human could make those mistakes too (very useful feedback for us to improve our API).
This is something we've found with our API as well -- just having LLMs attempt to use it helps us identify things we haven't documented well or haven't emphasized enough (things that are critical but non-obvious, or that get drowned out by less important information). Improvements that help the LLM tend to be good for developers too.
stopachka
Yes. Fun fact, Instant got the `create` method because of how many times LLMs hallucinated it.
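Roughly what that looks like in a transaction (illustrative only -- placeholder app id, and see the docs for the exact semantics of `create` vs `update`):

```typescript
import { init, tx, id } from "@instantdb/core";

const db = init({ appId: "__YOUR_APP_ID__" }); // placeholder app id

// What the models kept reaching for, and what Instant eventually shipped:
db.transact(tx.todos[id()].create({ text: "Ship it", done: false }));

// The pre-existing, upsert-style way to make a new entity:
db.transact(tx.todos[id()].update({ text: "Ship it", done: false }));
```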
ar7hur
I've been using InstantDB on two projects for a year now, and it's awesome.
sails
> Traditionally, end-users were non-technical and would be stuck with whatever the application developer gave them. But now every user has an LLM too.
Interesting point.
I keep coming back to the idea that users could request changes and have them experimentally deployed immediately.
stopachka
Thank you. There was a lot about extensions that was out of scope for the essay, which I would love to go deeper on in later writing.
Some open questions I had as I thought through extensions:
We talked about the data abstraction side: when you expose data, it's easier for end-users to build extensions. But there are questions on UIs and data modeling.
UIs: How cool would it be if agents could "enter" applications and change the UI? In one sense this is hard, but at least a demo feels within reach. What if an app exposed the UI components that it was built out of? This would let the agent remix them.
Data modeling: Exposing data works, but what if users want to store extra information? Maybe each user could spin up their own separate "extra" database.
CartwheelLinux
I'm saving all of these articles for the next time we go through the "AI (LLMs) is going to change the world" cycle.
The systems we use can only be as smart and intuitive as the people who prompt them.
On top of it, this (LLMs) is not AI, not even close, if anything they are glorified prediction systems that require human prompting.
lenerdenator
> On top of it, this (LLMs) is not AI, not even close, if anything they are glorified prediction systems that require human prompting.
puts in retainer; pushes glasses back up bridge of nose
Technically schpeaking, what you're talking about is the difference between weak AI and strong AI/artificial general intelligence (AGI). AGI is the kind of AI that has reached human levels of consciousness. We're not there yet. Personally, I hope we don't get there, but I'm not the one in charge, so shrug.
You can do a lot with glorified prediction systems that require human prompting. Actually, they are arguably more valuable than AGI because you can more easily communicate and utilize their value proposition. People don't need a machine that wonders the same stuff they do; they need something that does a specific task in lieu of their own effort.
CartwheelLinux
Haha. You're 100% correct on the AGI/AI thing. I'm just sick and tired of every article being about AI; it's great, people, but we can't stop innovating and attending to other areas of technology.
> You can do a lot with glorified prediction systems that require human prompting
> People don't need a machine that wonders the same stuff they do; they need something that does a specific task in lieu of their own effort.
This is the problem with our current iteration of AI; the way I see it, those two are in conflict with each other. "In lieu of their own effort," the way a vast number of would-be users think of it, means "without prompting," which leans more towards AGI than AI.
>Actually, they are arguably more valuable than AGI because you can more easily communicate and utilize their value proposition.
To you and me this might be true, but to your average non-techie I don't think it's quite as true as you would like it to be.
Short term it is very true: everyone sees the value until they realize its inherent limitations and the 'shiny' wears off.
sidewndr46
Given the level of disruption we'd see if a company reached AGI, wouldn't they be incentivized to somehow hide it? They could just use said AGI to produce inferior versions of itself, each one iteratively a little bit better than before.
xnx
> On top of it, this (LLMs) is not AI, not even close,
Do you think that the LLM/AI tools today are better than those from 2 years ago? Do you think the LLM/AI tools in 2 years time will be no better than the ones we have today?
dmitrygr
False equivalency. Faster and faster stochastic parrots != intelligence.
gregsadetsky
What other MCP-compatible tools are people using to ship/deploy software? Is there anything AWS-compatible that people like/use? Something for self-hosters? Anyone letting their agents ssh into machines..?
I suppose that most deployment/devops is done using existing git push workflows and IaC. Has anyone had a good experience with LLM/agent-compatible tools?
physix
I've read the comments here so far, and I think Instant is absolutely right to offer an AI layer that speeds up building apps on their db.
Once built, the solution is plain-old-runnable-code (PORC :-), as long as the implemented business logic doesn't call out to an LLM. So I don't fret so much about the AI hype story here.
For anyone starting off building with new tech, an AI assistant is really helpful.
hoppp
Just don't connect an agent to a pay-per-query database, unless you want to risk getting large bills.
Make sure the agent knows how much it costs to query
nezaj
In this case the cost per query is zero!
debarshri
If we achieve super intelligence, agents will be shipping themselves.
jerf
Any finite intelligence will have limits and a "complexity budget". I see a lot of people assuming AI will just be able to do anything, but it can't escape the limits of being finite. An AI will benefit from a well-packaged library in much the same way a human does, though it may have meaningfully different preferences about what that library should look like.
ASalazarMX
Then they will be founding their own startups, and if successful, they'll invest in each other's startups.
And every one of them will be ads.
air7
but who will buy the advertised products? and with what money?
mhog_hn
Agents with their agent money - get ready for new legal structures and a bifurcation of the economy: agentic and human.
Who knows…
bluefirebrand
This is an implementation detail; they'll figure it out as they go
I built an app (HN Clone, of course) with Instant's MCP hooked up to Claude Code.
The experience was brilliant.
Pros:
+ Fast
+ Easy
+ "Vibe coding on steroids" basically
+ The sense of 'wow' that comes very rarely with new tech
Cons:
- It used Instant as the database/backend, but I wasn't sure what it had done / how exactly it worked and had to spend a bunch of time asking Claude + reading the code to get it. It seemed reasonable, but if I were doing a prod system vs a PoC, this is where the time would be spent. ("Vibe coding lets you create tech debt 10x faster")
Net-net: This is the way for prototyping / validating. This is probably the way for production systems in N months too once the toolchain + agents get better.
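For anyone who hasn't looked at Instant output before, the generated backend largely boils down to a schema plus queries. A rough sketch of what the schema for an app like this might look like (the entity names and the exact schema API here are my assumptions, not the code Claude actually generated):

```typescript
import { i } from "@instantdb/core";

// Hypothetical schema for an HN-style clone: posts linked to comments.
export const schema = i.schema({
  entities: {
    posts: i.entity({
      title: i.string(),
      url: i.string(),
      createdAt: i.number(),
    }),
    comments: i.entity({
      body: i.string(),
      createdAt: i.number(),
    }),
  },
  links: {
    postComments: {
      forward: { on: "comments", has: "one", label: "post" },
      reverse: { on: "posts", has: "many", label: "comments" },
    },
  },
});
```

Reading something like this, plus the queries built on top of it, is where the "figuring out what it had done" time goes.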