Show HN: Why write code if the LLM can just do the thing? (web app experiment)
133 comments
· November 1, 2025
unbehagen
Amazing! Very similar approach, would love to hear what you think: https://github.com/gerkensm/vaporvibe
Finbarr
If you added a few more tools that let the LLM modify code files that would directly serve requests, that would significantly speed up future responses and also ensure consistency. Code would act like memory. A direct HTTP request to the LLM is like a cache miss. You could still have the feedback mechanism allowing a bypass that causes an update to the code. Perhaps code just becomes a store of consistency for LLMs over time.
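A minimal sketch of that cache-miss idea, assuming a hypothetical llm_generate_handler() that asks the model to write a real handler file (none of this is from the actual project):

    # "Code as memory": serve from a generated handler file if one exists
    # (cache hit); otherwise ask the LLM to write one (cache miss), save
    # it, and reuse it for every future request to that route.
    import importlib.util
    from pathlib import Path

    HANDLER_DIR = Path("handlers")

    def load_or_generate(route: str):
        name = route.strip("/").replace("/", "_") or "index"
        path = HANDLER_DIR / f"{name}.py"
        if not path.exists():  # cache miss: have the LLM write real code
            path.write_text(llm_generate_handler(route))  # hypothetical LLM call
        spec = importlib.util.spec_from_file_location(name, path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        return module.handle  # generated file must define handle(request)

The feedback mechanism would just delete (or regenerate) the file, which is the cache-invalidation path.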
ychen306
It's orders of magnitude cheaper to serve requests with conventional methods than directly with an LLM. My back-of-envelope calculation says that, optimistically, it takes more than 100 GFLOPs to generate 10 tokens with a 7-billion-parameter LLM. There are better ways to use electricity.
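The arithmetic, using the common ~2 FLOPs-per-parameter-per-token rule of thumb (a rough assumption that ignores attention and KV-cache overhead):

    # Back-of-envelope: a forward pass costs roughly 2 FLOPs per parameter
    # per generated token (a common rule of thumb, not an exact figure).
    params = 7e9                        # 7B-parameter model
    flops_per_token = 2 * params        # ~14 GFLOPs per token
    tokens = 10
    total = flops_per_token * tokens    # ~140 GFLOPs for 10 tokens
    print(f"{total / 1e9:.0f} GFLOPs")  # -> 140, i.e. "more than 100"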
sramam
I work in enterprise IT and sometimes wonder if we should add the equivalent energy calculations of human effort - both productive and unproductive - that underlies these "output/cost" comparisons.
I realize it sounds inhuman, but so is working in enterprise IT! :)
sunaurus
The question posed sounds like "why should we have deterministic behavior if we can have non-deterministic behavior instead?"
Am I wrong to think that the answer is obvious? I mean, who wants web apps to behave differently every time you interact with them?
reissbaker
I think it's actually conceptually pretty different. LLMs today are usually constrained to:
1. Outputting text (or, sometimes, images).
2. No long term storage except, rarely, closed-source "memory" implementations that just paste stuff into context without much user or LLM control.
This is a really neat glimpse of a future where LLMs can have much richer output and storage. I don't think this is interesting because you can recreate existing apps without coding... But I think it's really interesting as a view of a future with much richer, app-like responses from LLMs, and richer interactions — e.g. rather than needing to format everything as a question, the LLM could generate links that you click on to drill into more information on a subject, which end up querying the LLM itself! And similarly it can ad-hoc manage databases for memory+storage, etc etc.
ddalex
Like, for sure you can ask the AI to save its "settings" or "context" to a local file in a format of its own choosing, and then bring that back in the next prompt; couple this with temperature 0 and you should get to a fixed-point deterministic app immediately
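A sketch of that loop, assuming an OpenAI-style chat client (`client` and the model name are placeholders, and the JSON-in/JSON-out convention is an assumption):

    # Persist the model's own "state" between requests and replay it
    # at temperature 0, aiming for a fixed point.
    import json
    from pathlib import Path

    STATE = Path("app_state.json")

    def handle(prompt: str) -> str:
        state = json.loads(STATE.read_text()) if STATE.exists() else {}
        resp = client.chat.completions.create(  # placeholder client
            model="some-model",
            temperature=0,                      # greedy decoding
            messages=[
                {"role": "system", "content": f"Prior state: {json.dumps(state)}"},
                {"role": "user", "content": prompt},
            ],
        )
        # Convention: the model returns JSON with "reply" and updated "state".
        out = json.loads(resp.choices[0].message.content)
        STATE.write_text(json.dumps(out["state"]))
        return out["reply"]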
dehsge
There may still be some variance at temperature 0. The outputted code could still have errors. LLMs are still bounded by undecidable problems in computability theory, like Rice's theorem.
geraneum
> couple this with temperature 0
Not quite the case. Temperature 0 is not the same as a fixed random seed. Also, there are downsides to lowering temperature (always choosing the most probable next token).
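For intuition, a toy decoder showing what temperature actually does to the logits: T -> 0 collapses toward argmax, while a seed is a separate knob entirely (and served models can still vary run-to-run from batching and floating-point effects):

    import numpy as np

    def sample(logits, temperature, rng):
        if temperature == 0:
            return int(np.argmax(logits))  # greedy: no randomness left
        p = np.exp(logits / temperature)   # temperature rescales logits
        p /= p.sum()                       # softmax over scaled logits
        return int(rng.choice(len(logits), p=p))

    rng = np.random.default_rng(42)        # the seed is a separate knob
    print(sample(np.array([2.0, 1.0, 0.5]), 0, rng))    # always token 0
    print(sample(np.array([2.0, 1.0, 0.5]), 1.0, rng))  # varies with rng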
guelo
Why wouldn't the LLM codify that "context" into code so it doesn't have to rethink it over and over? Just like humans would. Imagine if you were manually operating a website and every time a request came in you had to come up with SQL queries (without remembering how you did it last time) and manually type the responses. You wouldn't last long before you started automating.
jstummbillig
Because nobody actually wants a "web app". People want food, love, sex or: solutions.
You or your coworker are not a web app. You can do some of the things that web apps can, and many things that a web app can't, but neither is because of the modality.
Coded determinism is hard for many problems, and I find it entirely plausible that it could turn out to be the wrong approach in software that is designed to solve some level of complex problems more generally. Average humans are pretty great at solving a certain class of complex problems that we tried to tackle unsuccessfully with many millions of lines of deterministic code, or simply have not had a handle on at all (like building a great software CEO).
cheema33
> Average humans are pretty great at solving a certain class of complex problems that we tried to tackle unsuccessfully with many millions of lines of deterministic code..
Are you suggesting that an average user would want to precisely describe in detail what they want, every single time, instead of clicking on a link that gives them what they want?
113
> Because nobody actually wants a "web app". People want food, love, sex or: solutions.
Okay but when I start my car I want to drive it, not fuck it.
hinkley
Christine didn’t end well for anyone.
jstummbillig
Most of us actually drive a car to get somewhere. The car, and the driving, are just a modality. Which is the point.
OJFord
...so that you can get to the supermarket for food, to meet someone you love, meet someone you may or may not love, or to solve the problem of how to get to work; etc.
Your ancestors didn't want horses and carts, bicycles, shoes - they wanted the solutions of the day to the same scenarios above.
ozim
I feel like this is the point where we start to make jokes about Honda owners.
stirfish
But do you want to drive, or do you want to be wherever you need to be to fuck?
lazide
Even if it purred real nice when it started up? (I’m sorry)
myhf
Designing a system with deterministic behavior would require the developer to think. Human-Computer Interaction experts agree that a better policy is to "Don't Make Me Think" [1]
_se
This is such a massive misunderstanding of the book. Have you even read it? The developer needs to think so that the user doesn't have to...
finnborge
My most charitable interpretation of the perceived misunderstanding is that the intent was to frame developers as "the user."
This project would be the developer tool used to produce interactive tools for end users.
More practically, it just redefines the developer's position; the developer and end-user are both "users". So the developer doesn't need to think AND the user doesn't need to think.
krapp
That book is talking about user interaction and application design, not development.
We absolutely should want developers to think.
crabmusket
As experiments like TFA become more common, the argument will shift to whether anybody should think about anything at all.
AstroBen
..is this an AI comment?
admax88qqq
Web apps kind of already do that with most companies shipping constant UX redesigns, A/B tests, new features, etc.
For a typical user today’s software isn’t particularly deterministic. Auto updates mean your software is constantly changing under you.
Jaygles
I don't think that is what the original commenter was getting at. In your case, the company is actively choosing to make changes. Whether it's for a good reason, or leads to a good outcome, is beside the point.
LLMs being inherently non-deterministic means using this technology as the foundation of your UI will mean your UI is also non-deterministic. The changes that stem from that are NOT from any active participation of the authors/providers.
This opens a can of worms where there will always be a potential for the LLM to spit out extremely undesirable changes without anyone knowing. Maybe your bank app one day doesn't let you access your money. This is a danger inherent and fundamental to LLMs.
admax88qqq
Right, I get that. The point I'm making is that from a user's perspective it's functionally very similar: a non-deterministic LLM or a non-deterministic company full of designers and engineers.
paulhebert
The rate of change is so different it seems absurd to compare the two in that way.
The LLM example gives you a completely different UI on _every_ page load.
That’s very different from companies moving around buttons occasionally and rarely doing full redesigns
jeltz
And most end users hate it.
samrolken
No, I wouldn’t say that my hypothesis is that non-deterministic behavior is good. It’s an undesirable side effect and illustrates the gap we have between now and the coming post-code world.
killingtime74
AI wouldn't be intelligent though if it was deterministic. It would just be information retrieval
finnborge
It already is "just" information retrieval, just with stochastic threads refining the geometry of the information.
thih9
> who wants web apps to behave differently every time you interact with them?
Technically everyone; we stopped using static pages a while ago.
Imagine pages that can now show you e.g. infinitely customizable UI; or, more likely, extremely personalized ads.
ehutch79
No.
When I go to the dmv website to renew my license, I want it to renew my license every single time
finnborge
This is amazing. It very creatively emphasizes how our definition of "boilerplate code" will shift over time. Another layer of abstraction would be running N of these, sandboxed, responding to each request, and then serving whichever instance is internally evaluated to have done the best. Then you're kind of performing meta reinforcement learning with each whole system as a head.
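A sketch of that best-of-N layer (all names hypothetical: generate_response() stands in for one sandboxed instance, score() for whatever internal evaluator you trust):

    # Run N sandboxed instances per request, serve the best-scoring one.
    from concurrent.futures import ThreadPoolExecutor

    def best_of_n(request, n=4):
        with ThreadPoolExecutor(max_workers=n) as pool:
            candidates = list(pool.map(lambda _: generate_response(request), range(n)))
        # The (candidate, score) pairs are exactly the signal you'd feed
        # back for the meta-reinforcement-learning step described above.
        return max(candidates, key=score)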
The hard part (coming from this direction) is enshrining the translation of specific user intentions into deterministic outputs, as others here have already mentioned. The hard part when coming from the other direction (traditional web apps) is responding fluidly/flexibly, or resolving the variance in each user's ability to express their intent.
Stability/consistency could be introduced through traditional mechanisms (encoded instructions, systematically evaluated) or, via the LLM's language interface, through intent-focusing mechanisms: increasing the prompt length / hydrating the user request with additional context and intent ("use this UI, don't drop the db").
From where I'm sitting, LLMs provide a new modality for evaluating intent. How we act on that intent can be totally fluid, totally rigid, or, perhaps obviously, somewhere in between.
Very provocative to see this near-maximum example of non-deterministic fluid intent interpretation>execution. Thanks, I hate how much I love it!
SkiFire13
> serving whichever instance is internally evaluated to have done the best. Then you're kind of performing meta reinforcement learning
I thought this didn't work? You basically end up fitting your AI models to whatever the internal evaluation method is, and creating a good evaluation method most often ends up having similar complexity to creating the initial AI model you wanted to train.
siliconc0w
Wrote a similar PoC here: https://github.com/s1liconcow/autoapp
Some ideas: use a slower 'design' model at startup to generate the initial app theme and DB schema, and a 'fast' model for responses. I tried a version using PostgREST so the logic was entirely in the DB, but then it got too complicated: either the design model failed to one-shot a valid schema or the fast model kept generating invalid queries.
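A sketch of that two-model split (generate() and the model names are placeholders for whatever provider you use):

    DESIGN_MODEL = "big-slow-model"   # runs once, at startup
    SERVE_MODEL = "small-fast-model"  # runs on every request

    def startup():
        # One expensive call pins down the things that must stay consistent.
        return generate(DESIGN_MODEL, "Design a DB schema and CSS theme for this app.")

    def handle_request(design, request, recent_pages):
        # Every request reuses the frozen design, plus recently served
        # pages, to keep the UI consistent across requests.
        prompt = f"Schema/theme:\n{design}\nRecent pages:\n{recent_pages}\nRequest:\n{request}"
        return generate(SERVE_MODEL, prompt)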
I also use some well known CSS libraries and remember previous pages to maintain some UI consistency.
It could be an interesting benchmark, an "App Bench": how well can an LLM one-shot a working application?
diwank
Just-in-time UI is an incredibly promising direction. I don't expect (in the near term) that entire apps would do this, but many small parts of them would really benefit. For instance, website/app tours could be just generated atop the existing UI.
zmmmmm
Yes, why not burn a forest to make a cup of tea, if we can fully externalise the cost.
Even if LLMs do get 10x as fast, that's not even remotely enough. They are 1e9 times as compute intensive.
jasonthorsness
I tried this as well at https://github.com/jasonthorsness/ginprov (hosted at https://ginprov.com). After a while it sort of starts to all look the same though.
SamInTheShell
Today, I would say these models can be used by someone with minimal knowledge to churn out SPAs with React. They can probably get pretty far into making logins, message systems, and so on, because there is lots of training data for those things. They can also struggle through building desktop apps with relative ease, compared to how I had to learn in years long past.
What these LLMs continue to prove, though, is that they are no substitute for real domain knowledge. To date, I've yet to have a model implement RAFT consensus correctly in my testing of whether they can build a database.
The way I interact with these models is almost adversarial in nature. I prompt them with the bare minimum that a developer might get in a feature request. I may even have a planning session to populate the context before I set it off on a task.
The bias in these LLMs really shines through, and proves their autocomplete properties, when they have a strong bias toward changing the one snippet of code I wrote because it doesn't fit how their training data would suggest the shape of the code should be. Most models will course-correct with instructions that they are wrong and I am right, though.
One thing I've noted is that if you let it generate choices for you from the start of a project, it will make poor choices in nearly every language. You can be using uv to manage a Python project and it will continue to try using pip or python commands. You can start an Electron app and it will continuously botch whether it's using CommonJS or some other standard. It persistently wants to download Go modules before coding instead of just writing the code and running `go mod tidy` after (it literally doesn't need the module in advance; it doesn't even have tools to probe the module before writing the code anyway).
RAFT consensus is my go-to test because there is no one-size-fits-all way to implement it. It might get an in-memory key-store system right, but what if you want it to organize etcd/raft/v3 in a way that lets you do multi-group RAFT? What if you need RAFT to coordinate some other form of data replication? None of these LLMs can really do it without a lot of prep work.
This is across all the models available from OpenAI, Claude, and Google.
zkmon
Kind of similar to the Minecraft game which computed frames on the fly without any code behind the visuals?
I don't see the point in using probabilistic methods to perform deterministic logic. Even if its output is correct, it's wasteful.
I spent a few hours last weekend testing whether AI can replace code by executing directly. Built a contact manager where every HTTP request goes to an LLM with three tools: database (SQLite), webResponse (HTML/JSON/JS), and updateMemory (feedback). No routes, no controllers, no business logic. The AI designs schemas on first request, generates UIs from paths alone, and evolves based on natural language feedback. It works—forms submit, data persists, APIs return JSON—but it's catastrophically slow (30-60s per request), absurdly expensive ($0.05/request), and has zero UI consistency between requests. The capability exists; performance is the problem. When inference gets 10x faster, maybe the question shifts from "how do we generate better code?" to "why generate code at all?"
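For readers skimming: a sketch of what that request loop might look like, given the three tools described (the tool wiring and call_llm() are assumptions about the shape, not the actual repo):

    # Every HTTP request goes straight to the LLM with three tools;
    # the model decides which to call, designing schemas and UI on the fly.
    import sqlite3

    def database(sql):                     # tool 1: SQLite access
        with sqlite3.connect("app.db") as db:
            return db.execute(sql).fetchall()

    def web_response(body, content_type):  # tool 2: HTML/JSON/JS back to client
        return {"body": body, "content_type": content_type}

    def update_memory(note):               # tool 3: persist feedback for later requests
        with open("memory.txt", "a") as f:
            f.write(note + "\n")

    def handle(method, path, params):
        # No routes, no controllers: the model sees the raw request.
        return call_llm(                   # hypothetical tool-using LLM client
            prompt=f"{method} {path} {params}",
            tools=[database, web_response, update_memory],
        )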
I spent a few hours last weekend testing whether AI can replace code by executing directly. Built a contact manager where every HTTP request goes to an LLM with three tools: database (SQLite), webResponse (HTML/JSON/JS), and updateMemory (feedback). No routes, no controllers, no business logic. The AI designs schemas on first request, generates UIs from paths alone, and evolves based on natural language feedback. It works—forms submit, data persists, APIs return JSON—but it's catastrophically slow (30-60s per request), absurdly expensive ($0.05/request), and has zero UI consistency between requests. The capability exists; performance is the problem. When inference gets 10x faster, maybe the question shifts from "how do we generate better code?" to "why generate code at all?"