
Perverse incentives of vibe coding

238 comments

May 14, 2025

brooke2k

I don't understand the productivity that people get out of these AI tools. I've tried it and I just can't get anything remotely worthwhile unless it's something very simple or something completely new being built from the ground up.

Like sure, I can ask claude to give me the barebones of a web service that does some simple task. Or a webpage with some information on it.

But any time I've tried to get AI services to help with bugfixing/feature development on a large, complex, potentially multi-language codebase, it's useless.

And those tasks are the ones that actually take up the majority of my time. On the occasion that I'm spinning a new thing up quickly, I don't really need an AI to do it for me -- I mean, that's the easy part!

Is there something I'm missing? Am I just not using it right? I keep seeing people talk about how addictive it is, how the productivity boost is insane, how all their code is now written by AI and then audited, and I just don't see how that's possible outside of really simple rote programming.

tptacek

The first and most important question to ask here is: are you using a coding agent? A lot of times, people who aren't getting much out of LLM-assisted coding are just asking Claude or GPT for code snippets, and pasting and building them themselves (or, equivalently, they're using LLM-augmented autocomplete in their editor).

Almost everybody doing serious work with LLMs is using an agent, which means that the LLM is authoring files, linting them, compiling them, and iterating when it spots problems.

There's more to using LLMs well than this, but this is the high-order bit.
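
To make "agent" concrete: the loop is roughly "model writes the file, the tooling checks it, the errors go back into the prompt." Here's a minimal hand-written sketch in Python, where `call_llm` and the choice of ruff as the linter are placeholders, not any particular product's internals:

```
# Hand-rolled illustration of an agent loop: write the file, lint it, feed errors back.
# call_llm is a placeholder for whatever model API you use; ruff is just an example linter.
import subprocess

def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your model API here")

def agent_edit(path: str, task: str, max_rounds: int = 5) -> bool:
    for _ in range(max_rounds):
        source = open(path).read()
        # Ask for the whole corrected file, given the task and the current contents.
        new_source = call_llm(
            f"Task: {task}\n\nCurrent contents of {path}:\n{source}\n"
            "Return the complete, corrected file."
        )
        with open(path, "w") as f:
            f.write(new_source)
        # Lint/compile; if clean we are done, otherwise the errors become the next task.
        check = subprocess.run(["ruff", "check", path], capture_output=True, text=True)
        if check.returncode == 0:
            return True
        task = "Fix these errors:\n" + check.stdout + check.stderr
    return False
```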

lexandstuff

Funny, I would give the absolute opposite advice. In my experience, the use of agents (mainly Cursor) is a sure-fire way to have a really painful experience with LLM-assisted coding. I much prefer to use AI as a pair programmer that I talk to and sometimes let write entire files, but I'm always the one doing the driving, and mostly the one writing the code.

If you aren't building up mental models of the problem as you go, you end up in a situation where the LLM gets stuck at the edges of its capability, and you have no idea even how to help it overcome the hurdle. Then you spend hours backtracking through what it's done, building up the mental model you need, before you can move on. The process is slower and more frustrating than not using AI in the first place.

I guess the reality is, your luck with AI-assisted coding really comes down to the problem you're working on, and how much of it is prior art the LLM has seen in training.

tptacek

I mean, it might depend, but many of the most common complaints about LLM coding (most notably hallucination) are essentially solved problems if you're using agents. Whatever works for you! I don't even like autocomplete, so I sympathize with not liking agents.

If it helps, for context: I'll go round and round with an agent until I've got roughly what I want, and then I go through and beat everything into my own idiom. I don't push code I don't understand and most of the code gets moved or reworked a bit. I don't expect good structure from LLMs (but I also don't invest the time to improve structure until I've done a bunch of edit/compile/test cycles).

I think of LLMs mostly as a way of unsticking and overcoming inertia (and writing tests). "Writing code", once I'm in flow, has always been pleasant and fast; the LLMs just get me to that state much faster.

I'm sure training data matters, but I think static typing and language tooling matters much more. By way of example: I routinely use LLMs to extend intensely domain-specific code internal to our project.

tom_m

Cursor is pretty bad in my experience. I don't know why, because I find Windsurf better and they both use Claude.

Regardless, Gemini 2.5 Pro is far, far better and I use that with the open-source, free Roo Code. You can use the Gemini 2.5 Pro experimental model for free (rate limited) to get a completely free experience and a taste of it.

Cursor was great and started it all off, but others took notice and now they're all more or less the same. It comes down to UX and preference, but I think Windsurf and Roo Code just did a better job here than Cursor, personally.

mnoronha

Agree. My favorite workflow has been chatting with the LLM in the assistant panel of Zed, then making inline edits by prompting the AI with the context of that chat. That way, I can align with the AI on how the problem should be solved before letting it loose. What's great about this is that, depending on how easy or hard the problem is for the LLM, I can shift between handholding / manual coding and vibe coding.

theshrike79

Agents make it easier for you to give context to the LLM, or for it to grab some by itself like Cline/Claude/Cursor/Windsurf can do.

With a web-based system you need repomix or something similar to give the whole project (or parts of it, if you can be bothered to filter) as context, which isn't exactly nifty.
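
The do-it-yourself version of that packing step is only a few lines; a rough Python sketch with placeholder extension and ignore lists (the same idea, not what repomix itself does internally):

```
# Pack a project into one text blob with per-file headers, for pasting into a web chat.
from pathlib import Path

EXTENSIONS = {".go", ".ts", ".py"}                    # whatever you care about
IGNORE_PARTS = {".git", "node_modules", "vendor"}     # directories to skip

def pack_repo(root: str = ".") -> str:
    chunks = []
    for path in sorted(Path(root).rglob("*")):
        if (path.is_file()
                and path.suffix in EXTENSIONS
                and not any(part in IGNORE_PARTS for part in path.parts)):
            chunks.append(f"// FILE: {path}\n{path.read_text(errors='ignore')}")
    return "\n\n".join(chunks)

if __name__ == "__main__":
    print(pack_repo())   # pipe into pbcopy/xclip, or paste straight into the chat
```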

WD-42

It’s because the people doing rote programming with them don’t think they are doing rote programming, they think it’s exceptional.

khazhoux

My sweet spot is Cursor to generate/tweak code, but I do all the execution and debugging iteration myself.

__mharrison__

What agent do you recommend?

tptacek

I think they're all fine. Cursor is popular and charges a flat fee for model calls (interposed through their model call router, however that works). Aider is probably the most popular open source command line one. Claude Code is probably the most popular command line agent overall; Codex is the OpenAI equivalent (I like Codex fine).

later

Oh, I like Zed a lot too. People complain that Zed's agent (the back-and-forth with the model) is noticeably slower than the other agents, but to me, it doesn't matter: all the agents are slow enough that I can't sit there and wait for them to finish, and Zed has nice desktop notifications for when the agent finishes.

Plus you get a pretty nice editor --- I still write exclusively in Emacs, but I think of Zed as being a particularly nice code UI for an LLM agent.

theshrike79

I've settled on Cline for now, with openrouter as the backend for LLMs, Gemini 2.5 for planning and Claude 3.7 for act mode.

Cursor is fine, Claude Code and Aider are a bit too janky for me - and tend to go overboard (making full-ass git commits without prompting) and I can't be arsed to rein them in.

physix

I use Augment Code as a plugin in IntelliJ and PyCharm. It's quite good, but I only use it for narrow, targeted objectives, agent mode or not.

I haven't seen any mentions of Augment code yet in comment threads on HN. Does anyone else use Augment Code?

kbaker

Try https://aider.chat + OpenRouter.ai, pay-as-you-go, use any model you want, I use Claude Sonnet.

It has a very good system prompt so the code is pretty good without a lot of fluff.

haiku2077

I've been having okayish results with Zed + Claude 3.7

kasey_junk

Speaking up for Devin.ai here. What I like about it is that after the initial prompt nearly all of my interaction with it is via pull request comments.

I have this workflow where I trigger a bunch of prompts in the morning, lunch and at the end of the day. At those same times I give it feedback. The async nature really means I can have it work on things I can’t be bothered with myself.

lukan

Yesterday I gave Cursor a try and made my first (intentionally very lazy) vibe coding attempt (a simple threejs project). It accepted the task and did things, failed, did things, failed, did things ... failed for good.

I guess I could work on the magic incantations to tweak here and there a bit until it works and I guess that's the way it is done. But I wasn't hooked.

I do get value out of LLMs for isolated, broken-down subtasks, where asking an LLM is quicker than googling.

For me, AI will probably become really useful once I can scan and integrate my own complex codebase, so it gives me solutions that work there and doesn't hallucinate API endpoints or jump between incompatible library versions (my main issue).

rockemsockem

I did almost the same thing and had pretty much the same experience. A lot of times it felt so close to being great, but it ultimately wasted more time than if I had just worked on the project and occasionally asked ChatGPT to generate some sample code to figure out an API.

christophilus

I’ve had the same experience with Cursor. Claude Code, though, has been a game changer. It is really excellent.

tauoverpi

I've had the same issue every time I've tried it. The code I generally work on is embedded C/C++ with in-house libraries where the tools are less than useful as they try to generate non-existent interfaces and generally generate worse code than I'd write by hand. There's a need for correctness and being able to explain the code, thus use of those tools is also detrimental to explainability unless I hand-hold it to the point where I'm writing all of the code myself.

Generating function documentation hasn't been that useful either as the doc comments generated offer no insight and often the amount I'd have to write to get it to produce anything of value is more effort than just writing the doc comments myself.

For my personal project in Zig they either get lost completely or give me terrible code (my code isn't _that_ bad!). There seems to be no middle ground here. I've even tried the tools as pair programmers, but they often get lost or stuck in loops of repeating the same thing that's already been mentioned (it likely falls out of the context window).

When it comes to others using such tools, I've had to ask them to stop using them to think, as it becomes next to impossible to teach / mentor if they're passing what I say to the LLM or trying to have it perform the work. I'm confident in debugging people when it comes to math / programming, but with an LLM in between it's just not possible to guess where they went wrong or how to bring them back to the right path, as the thought process is lost (or there wasn't one to begin with).

This is not even "vibe coding"; I've just never found it generally useful enough to use day-to-day for any task, and my primary use of, say, Phind has been as an alternative to Qwant when I cannot game the search query well enough to get the search results I'm looking for (i.e. I ignore the LLM output and just look at the references).

motorest

> I've had the same issue every time I've tried it. The code I generally work on is embedded C/C++ with in-house libraries where the tools are less than useful as they try to generate non-existent interfaces and generally generate worse code than I'd write by hand.

That's because whatever training the model had, it didn't cover anything remotely similar to the codebase you worked on.

We get this issue even with obscure FLOSS libraries.

When we fail to provide context to LLMs, they generate examples by following superficial cues like coding conventions. In extreme cases, such as code that employs source code generators or templates, LLMs even fill in function bodies that the code generators are designed to generate for you. That's because, if LLMs are oblivious to the context, they resort to hallucinating their way into something seemingly coherent. Unless you provide them with context or instruct them not to make up stuff, they will bullshit their way into an example.

What's truly impressive about this is that often times the hallucinated code actually works.

> Generating function documentation hasn't been that useful either as the doc comments generated offer no insight and often the amount I'd have to write to get it to produce anything of value is more effort than just writing the doc comments myself.

Again, this suggests a failure on your side to provide any context.

If you give them enough context, LLMs synthesize and present it almost instantly. If you're prompting an LLM to generate documentation, which boils down to synthesizing what an implementation does and what its purpose is, and the LLM comes up empty, that means you failed to give it anything to work on.

The bulk of your comment screams failure to provide any context. If your code steers far away from what it expects, fails to follow any discernible structure, and doesn't even convey purpose and meaning in little things like naming conventions, you're not giving the LLM anything to work on.

tauoverpi

I'm aware of the _why_ but this is why the tools aren't useful for my case. If they cannot consume the codebase in a reasonable amount of time and provide value from that then they generally aren't useful in areas where I would want to use them (navigating large codebases). If the codebase is relatively small or the problem is known then an LLM is not any better than tab-complete and arguably worse in many cases as the generated result has to be parsed and added to my mental model of the problem rather than the mental model being constructed while working on the code itself.

I guess my point is, I have no use for LLMs in their current state.

> That's because whatever training the model had, it didn't cover anything remotely similar to the codebase you worked on.

> We get this issue even with obscure FLOSS libraries.

This is the issue, however, as unfamiliar codebases are exactly where I'd want to use such tooling. Not working in those cases makes it less than useful.

> Unless you provide them with context or instruct them not to make up stuff, they will bullshit their way into an example.

In all cases context was provided extensively, but at some point it's easier to just write the code directly. The context is in the surrounding code, and if the tool cannot pick up on that even when combined with direction, it is again less than useful.

> What's truly impressive about this is that often times the hallucinated code actually works.

I haven't experienced the same. It fails more often than not, and the result is much worse than the hand-written solution regardless of the level of direction. This may be due to unfamiliar code, but again, if code is common then I'm likely familiar with it already, which lowers the value of the tool.

> Again, this suggests a failure on your side to provide any context.

This feels like a case of blaming the user without full context of the situation. There are comments, the names are descriptive and within reason, and there's annotation of why certain things are done the way they are. The purpose of a doc comment is not "this does X" but rather _why_ you want to use this function and what its purpose is, which is something LLMs struggle to derive, from my testing of them. Adding enough direction to describe that is effectively writing the documentation with a crude English-to-English compiler in between. This is the same problem with unit test generation, where unit tests are not there to game code coverage but to provide meaningful tests of the domain and the known edge cases of a function, which is again something the LLM struggles with.

For any non-junior task LLM tools are practically useless (from what I've tested) and for junior level tasks it would be better to train someone to do better.

Starlevel004

> Is there something I'm missing? Am I just not using it right?

The talk about it makes more sense when you remember most developers are primarily writing CRUD webapps or adware, which is essentially a solved problem already.

tptacek

I'm not doing either of those things with it.

rockemsockem

What are some examples of things you are doing with it?

tom_m

They aren't increasing productivity. In the short term.

They are very handy tools that can help you learn a foreign codebase faster. They can help you when you run into those annoying blockers that usually take hours or days, or a second set of eyes, to figure out. They give you a sounding board and help you ask questions and think about the code more.

Big IF here. IF you bother to read. The danger is some people just keep clicking and re-prompting until something works, but they have zero clue what it is and how it works. This is going to be the biggest problem with AI code editors. People just let Jesus take the wheel, and in the process, inefficient usage of the tools leads to slower throughput and a higher bill. AI costs a good chunk of change per token and that's only going up.

I do think it's addictive for sure. I also think the "productivity boost" is a feeling people get, but no one measures it. I mean, it's hard to measure. Then again, if you spend an hour on a problem you got stuck on vs 3 days, then sure, it helped productivity. In that particular scenario. Averaged out? Who knows.

They are useful tools, they are just also very misunderstood and many people are too lazy to take the time to understand them. They read headlines and unsubstantiated claims and get overwhelmed by hype and FOMO. So here we are. Another tech bubble. A super bubble really. It's not that the tools won't be with us for a long time or that they aren't useful. It's that they are way way overvalued right now.

danbolt

I appreciate you voicing your feelings here. My previous employer requested we try AI tooling for productivity purposes, and I was finding myself in similar scenarios to what you mention. The parts that would have benefitted from a productivity gain weren’t seeing any improvement, while the areas that saw a speedup weren’t terribly mission-critical.

The one thing I really appreciated though was the AI’s ability to do a “fuzzy” search in occasional moments of need. For example, sometimes the colloquial term for a feature didn’t match naming conventions in source code. The AI could find associations in commit messages and review information to save me time rummaging through git-blame. Like I said though, that sort of problem wasn’t necessarily a bottleneck and could often be solved much more cheaply by asking a coworker on Slack.

hx8

Probably 80% of the time I spend coding, I'm inside a code file I haven't read in the last month. If I need to spend more than 30 seconds reading a section of code before I understand it, I'll ask AI to explain it to me. Usually, it does a good job of explaining code at a level of complexity that would take me 1-15 minutes to understand, but does a poor job of answering more complex questions or at understanding more complex code.

It's a moderately useful tool for me. I suspect the people that get the most use out of it are those that would take more than 1 hour to read code I would take 10 minutes to read. Which is to say, the least experienced people get the most value.

etler

I find it's incredibly helpful for prototyping. These tools quickly reach a limit of complexity and put out subpar code, but for a greenfield prototype that's OK.

I've successfully been able to test out new libraries and do explorations quickly with AI coding tools and I can then take those working examples and fix them up manually to bring them up to my coding standards. I can also extend the lifespan of coding tools by doing cleanup cycles where I manually clean up the code since they work better with cleaner encapsulation, and you can use them to work on one scoped component at a time.

I've found that they're great to test out ideas and learn more quickly, but my goal is to better understand the technologies I'm prototyping myself, I'm not trying to get it to output production quality code.

I do think there's a future where LLMs can operate in a well architected production codebase with proper type safe compilation, linting, testing, encapsulation, code review, etc, but with a very tight leash because without oversight and quality control and correction it'll quickly degrade your codebase.

andy99

I wish more had been written about the first assertion that using an LLM to code is like gambling and you're always hoping that just one more prompt will get you what you want.

It really captures how little control one has over the process, while simultaneously having the illusion of control.

I don't really believe that code is being made verbose to make more profits. There's probably some element of model providers not prioritizing concise code, but if conciseness while maintaining "quality" were possible, it would give one model a sufficient edge over the others that I suspect providers would do it.

meander_water

Agreed, I've been thinking about the first assertion a lot recently as I've been using Cursor to create a react app. I think it's more prevalent in frontend development because it tightens the feedback loop considerably, and the more positive feedback you get, the more conditioned you get to reach for it anytime you need to do anything in code.

I think there's another perverse incentive here - organisations want to produce features/products fast, which LLMs help with, but it comes at the cost of reduced cognitive capabilities/skills in the developers over the longer term as they've given that up through lack of use/practice.

tptacek

That's not a great argument for talking down their utility for experienced developers, though.

meander_water

I'm not so sure, I think skills atrophy with disuse no matter what level of experience you have. Like I have around 15 years of experience, but if I stepped away from coding for even just a year a lot of those years of experience will count for nothing.

Rastonbury

I don't believe there are perverse incentives yet; right now it's the arms-race, burn-money, operate-at-a-loss days. There is no moat, only quality and price per token, and the leader moves around too quickly. Also, the author should really look into Cursor at $20 with unlimited slow requests. I imagine paying per token hurts when it spits out garbage even when you thought you'd provided enough context but it wasn't enough.

Someone needs to make a plugin to count lines of discarded code and prompts.

techpineapple

Something I caught about Andrej Karpathy’s original tweet, was he said “give into the vibes”, and I wonder if he meant that about outcomes too.

andy99

I still think the original tweet was tongue-in-cheek and not really meant to be a serious description of how to do things.

nico

> It really captures how little control one has over the process, while simultaneously having the illusion of control.

This is actually a big insight about life that, in some Eastern philosophies, you are supposed to arrive at

We love the illusion of control, even though we don’t really have it. Life mostly just unfolds as we experience it

nativeit

This has certainly been my own experience in life. My step-father was a very studious and responsible person. He worked 30 years, from the age of 19, with the state as an HVAC service tech until he retired at 49 with a full state pension, and then went to work for a private company. His plan was to earn as much as he could until he turned 55, and then retire to live and work on the small farm he and my mother had just purchased. Everything was coming together: his new job placed him in a senior project management position and gave him a considerable salary compared with the state.

Shortly after he turned 50, he was diagnosed with pancreatic cancer, and he died several months later, following a very painful and difficult attempt to treat it.

In my mind, this kind of thing is the height of tragedy—he did everything right. He exhibited an incredible amount of self-control and deferred his happiness, ensuring that his family and finances were well-cared for and secured, and then having fulfilled his obligations, he was almost immediately robbed of a life that he’d worked so hard to earn.

I experienced a few more object lessons in the same vein myself, namely having been diagnosed with multiple sclerosis at the age of 18, and readjusting my life’s goals to accommodate the prospect of disability. I’m thankfully still churning along under my own capacities, now at 41yo, but MS can be unpredictable, and I find it is necessary to remind myself of this from time to time. I am grateful for every day that I have, and to the extent it’s possible, I try to find nearer-term sources for happiness and fulfillment.

Don’t waste any time planning for more than the next five years (with the obvious exceptions for things like financial planning), as you can’t possibly know what’s coming. Even if the unexpected event is a happy one, like an unexpected child or a sudden financial windfall, your perspective will almost certainly be dramatically altered 1-2x each decade.

nicbou

I've experienced this for the first time with a close friend, and it really stays on your mind. There was no reason it had to be him. He didn't roll the dice wrong.

It created a sense of urgency in my own life. You have this idea that you will be the same person until you die of old age, and suddenly you realise that the current year is worth much more than another year two decades from now. A bird in the hand is worth two in the bush.

theshrike79

But just like gambling, there are ways to do it correctly.

Yes, there are the grandmas in a trance vibe-gambling by shoving a bucket of quarters in a slot machine.

But you also have people playing Blackjack and beating the averages by knowing how it's played, maybe having a "feel" for the deck (or counting cards...), and most importantly knowing when to fold and walk away.

Same with LLMs, you need to understand context sizes and prompts and you need to have a feel for when the model is just chasing its own tail or trying to force a "solution" just to please the user.

matsemann

While I get your point, this also kinda sounds like a gambling addict trying to explain how they're not an addict and how they're losing money the correct way, heh.

erulabs

These perverse incentives run at the heart of almost all Developer Software as a Service tooling. Using someone else's hosted model incentivizes increasing token usage, but it's nothing special about AI.

Consider Database-as-a-service companies: They're not incentivized to optimize on CPU usage, they charge per cpu. They're not incentivized to improve disk compression, they charge for disk-usage. There are several DB vendors who explicitly disable disk compression and happily charge for storage capacity.

When you run the software yourself, or the model yourself, the incentives are aligned: use less power, use less memory, use less disk, etc.

tmpz22

> When you run the software yourself, or the model yourself, the incentives are aligned: use less power, use less memory, use less disk, etc.

But my team's time is soooo valuable. It's sooo sooo sooo valuable. Oh and we can't afford to hire anyone else either. But our time its sooo valuable. We need these tools!

alternatex

Opens PR with quadruple-nested for-loop running synchronous DB queries.

- Premature optimization is the root of all evil, can't waste expensive dev hours on that..

jiggawatts

My favourite example of this is the recent trend towards “wide events” replacing logs and metrics… spearheaded and popularised by companies that charge by the gigabytes ingested.

tptacek

Companies that ingest logs generally rip their customers' faces off with their pricing. At least OTel spans can be tail-sampled.

jiggawatts

I worked out that it's cheaper to write logs to high-end Samsung SSDs and then throw them away every month than to retain them in the log analytics systems of some cloud services for the same period of time.

Wait, no, sorry... that doesn't quite "paint the right picture".

The "single use" SSDs are 75 times cheaper than storing the data in the cloud.

chaboud

1. Yes. I've spent several late nights nudging Cline and Claude (and other systems) to the right answers. And being able to use AWS Bedrock to do this has been great (note: I work at Amazon).

2. I've had good fortunes keeping the agents to constrained areas, working on functions, or objects, with clearly defined (by me) boundaries. If the measure of a junior engineer is that you correct them once a day, an engineer once a week, a senior once a month, a principal once a quarter... Treat these agents like hyper-energetic interns. Nudge frequently.

3. Standard org management coding practices apply. Force the agents to show work, plan, unit test, investigate.

And, basically, I've described that we're becoming Software Development Managers with teams of on-demand low-quality interns. That's an incredibly powerful tool, but don't expect hyper-elegant and compact code from them. Keep that for the senior engineering staff (humans) for now.

(Note: The AlphaEvolve announcement makes me wonder if I'm going to have hyper-energetic applied science interns next...)

lubujackson

I feel like "vibe coding" as a "no look" sort of way to produce anything is bad and will probably remain bad for some time.

However... "vibe architecting" is likely going to be the way forward. I have had success with generating/tuning an architecture plan with AI, having it create stub files/functions then filling them out individually. I can get pretty much the whole way without typing code, but it does require a fair bit more architectural thinking than usual and a good bit of reading code (then telling the AI to "do better").
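
To make that concrete, the stub pass tends to produce files shaped roughly like this (a hand-written example with made-up names, not actual model output); each body then gets filled in with its own focused prompt:

```
# Stubs generated during the architecture pass; each function is filled in later,
# one prompt at a time, so the model only works inside one small boundary.
from dataclasses import dataclass

@dataclass
class Invoice:
    customer_id: str
    amount: float

def load_invoices(path: str) -> list[Invoice]:
    """Parse the export file into Invoice records."""
    raise NotImplementedError  # prompt 1

def total_by_customer(invoices: list[Invoice]) -> dict[str, float]:
    """Sum invoice amounts per customer."""
    raise NotImplementedError  # prompt 2

def render_report(totals: dict[str, float]) -> str:
    """Format the totals as a plain-text report."""
    raise NotImplementedError  # prompt 3
```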

I think of it like the analogy of blind men describing an elephant when they can only feel a single part. AI is decent at high level architecture and decent at low level production but you need a human to understand the big picture and how the pieces fit (and which ones are missing).

nowittyusername

What you are talking about is the "proper" way of vibe coding. Most of the issues with vibe coding stem from users misunderstanding the capabilities of the technology they are using. They are overestimating the capabilities of current systems and are essentially asking for magic to happen. They don't give proper guidance, context, or anything of value for the coding IDE to work with. They are relying on a mindset of the 2030s to work with systems from 2025. We ain't there yet, folks; give as much guidance and context as you can and you will have a better time.

xianshou

Amusingly, about 90% of my rat's-nest problems with Sonnet 3.7 are solved by simply appending a few words to the end of the prompt:

"write minimum code required"

It's not even that sensitive to the wording - "be terse" or "make minimal changes" amount to the same thing - but the resulting code will often be at least 50% shorter than the un-guided version.
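
If you drive the model from a script rather than a chat window, that guidance is just a constant appended to every request; a trivial sketch, with `call_llm` as a placeholder for whatever API you actually use:

```
# Tack a terseness instruction onto every coding prompt before it goes to the model.
TERSE_SUFFIX = "\n\nWrite minimum code required. Make minimal changes."

def call_llm(prompt: str) -> str:
    raise NotImplementedError("placeholder for whatever model API you actually use")

def ask_for_code(task: str) -> str:
    return call_llm(task + TERSE_SUFFIX)
```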

panstromek

Well, the article mentions that this reduces accuracy. Do you hit that problem often then?

SteveMqz

The study the article cited is specifically about asking LLMs about misinformation. I think on coding tasks and such, shorter answers are usually more accurate.

theshrike79

Gemini on the other hand has a tendency for super-defensive coding.

It'll check _EVERY_ edge case separately, even in situations where it will never ever happen and if it does, it's a NOP anyway.

YossarianFrPrez

There are two sets of perverse incentives at play. The main one the author focuses on is that LLM companies are incentivized to produce verbose answers, so that when you task an LLM with extending an already verbose project, the tokens used, and therefore the cost, increase.

The second one is more intra/interpersonal: under pressure to produce, it's very easy to rely on LLMs to get one 80% of the way there and polish the remaining 20%. I'm in a new domain that requires learning a new language. So something I've started doing is asking ChatGPT to come up with exercises / coding etudes / homework for me based on past interactions.

vanschelven

> Its “almost there” quality — the feeling we’re just one prompt away from the perfect solution — is what makes it so addicting. Vibe coding operates on the principle of variable-ratio reinforcement, a powerful form of operant conditioning where rewards come unpredictably. Unlike fixed rewards, this intermittent success pattern (“the code works! it’s brilliant! it just broke! wtf!”), triggers stronger dopamine responses in our brain’s reward pathways, similar to gambling behaviors.

Though I'm not a "vibe coder" myself I very much recognize this as part of the "appeal" of GenAI tools more generally. Trying to get Image Generators to do what I want has a very "gambling-like" quality to it.

Suppafly

>Trying to get Image Generators to do what I want has a very "gambling-like" quality to it.

Especially when you try to get them to generate something they explicitly tell you they won't, like nudity. It feels akin to hacking.

dingnuts

it's not like gambling, it is gambling. You exchange dollars for chips (tokens -- some casinos even call the chips tokens) and insert them into the machine in exchange for the chance of a prize.

if it doesn't work the first time you pull the lever, it might the second time, and it might not. Either way, the house wins.

It should be regulated as gambling, because it is. There's no metaphor, the only difference from a slot machine is that AI will never output cash directly, only the possibility of an output that could make money. So if you're lucky with your first gamble, it'll give you a second one to try.

Gambling all the way down.

NathanKP

This only makes sense if you have an all or nothing concept of the value of output from AI.

Every prompt and answer is contributing value toward your progress toward the final solution, even if that value is just narrowing the latent space of potential outputs by keeping track of failed paths in the context window, so that it can avoid that path in a future answer after you provide followup feedback.

The vast majority of slot machine pulls produce no value to the player. Every single prompt into an LLM tool produces some form of value. I have never once had an entirely wasted prompt unless you count the AI service literally crashing and returning a "Service Unavailable" type error.

One of the stupidest takes about AI is that a partial hallucination or a single bug destroys the value of the tool. If a response is 90% of the way there and I have to fix the 10% of it that doesn't meet my expectations, then I still got 90% value from that answer.

NegativeLatency

> Every prompt and answer is contributing value toward your progress toward the final solution

This has not been my experience, maybe sometimes, but certainly not always.

As an example: asking chatgpt/gemini about how to accomplish some sql data transformation set me back in finding the right answer because the answer it did give me was so plausible but also super duper not correct in the end. Would've been better off not using it in that case.

Brings to mind "You can't build a ladder to the moon"

secabeen

> One of the stupidest takes about AI is that a partial hallucination or a single bug destroys the value of the tool. If a response is 90% of the way there and I have to fix the 10% of it that doesn't meet my expectations, then I still got 90% value from that answer.

That assumes that the value of a solution is linear with the amount completed. If the Pareto Principle holds (80% of effects come from 20% of causes), then not getting that critical 10+% likely has an outsized effect on the value of the solution. If I have to do the 20% of the work that's hard and important after taking what the LLM did for the remainder, I haven't gained as much because I still have to build the state machine in my head to understand the problem-space well enough to do that coding.

PaulDavisThe1st

This assumes you can easily and reliably identify the 10% you need to fix.

princealiiiii

> It should be regulated as gambling, because it is.

That's wild. Anything with non-deterministic output will have this.

martin-t

That's incorrect, gambling is about waiting.

Brain scans have revealed that waiting for a potential win stimulates the same areas as the win itself. That's the "appeal" of gambling. Your brain literally feels like it's winning while waiting because it _might_ win.

kagevf

> "Anything with non-deterministic output will have this.

Anything with non-deterministic output that charges money ...

Edit: Added words to clarify what I meant.

GuinansEyebrows

maybe more accurately anything with non-deterministic output that you have to pay-per-use instead of paying by outcome.

rapind

By this logic:

- I buy stock that doesn't perform how I expected.

- I hire someone to produce art.

- I pay a lawyer to represent me in court.

- I pay a registration fee to play a sport expecting to win.

- I buy a gift for someone expecting friendship.

Are all gambas.

You aren't paying for the result (the win), you are paying for the service that may produce the desired result, and in some cases one of many possibly desirable results.

rjbwork

>I buy stock that doesn't perform how I expected.

Hence the adage "sir, this is a casino"

nkrisc

None of those are games of chance, except the first.

squeaky-clean

So how exactly does that work for the $25/mo flat fee that I pay OpenAI for chatgpt. They want me to keep getting the wrong output and burning money on their backend without any additional payment from me?

dwringer

Something of an aside, but this is sort of equivalent to asking "how does that work for the $50 the casino gave me to gamble with for free"? I once made 50 dollars exactly that way by taking the casino's free tokens and putting them all on black in a single roulette spin. People like that are not the ones companies like that make money off of.

kimixa

For the amount of money OpenAI burns that $25/mo is functionally the same as zero - they're still in the "first one is free" phase.

Though you could say the same thing about pretty much any VC funded sector in the "Growth" phase. And I probably will.

csallen

Books are not like gambling, they are gambling. you exchange dollars for chips (money — some libraries even give you digital credits for "tokens") and spend them on a book in exchange for the chance of getting something good out of it.

If you don't get something good the first time you buy a book, you might with the next book, or you might not. Either way, the house wins.

It should be regulated as gambling, because it is. There's no metaphor — the only difference from a slot machine is that books will never output cash directly, only the possibility of an insight or idea that could make money. So if you're lucky with your first gamble, you'll want to try another.

Gambling all the way down.

mystified5016

I run genAI models on my own hardware for free. How does that fit into your argument?

codr7

The fact that you can get your drugs for free doesn't exactly make you less of an addict.


abletonlive

Yikes. The reactionary reach for more regulation from a certain group is just so tiresome. This is the real mind virus that I wish would be contained in Europe.

I almost can't believe this idea is being seriously considered by anybody. By that logic buying any CPU is gambling because it's not deterministic how far you can overclock it.

Just so you know, not every LLM use case requires paying for tokens. You can even run a local LLM and use Cline with it for all your coding needs. Pull that slot machine lever as many times as you like without spending a dollar.

slurpyb

Do you understand what electricity is?

yewW0tm8

Same with anything though? Startups, marriages, kids.

All those laid off coders gambled on a career that didn’t pan out.

Want more certainty in life, gonna have to get political.

And even then there is no guarantee the future gives a crap. Society may well collapse in 30 years, or 100…

This is all just role play to satisfy the prior generations story driven illusions.

Inityx

If you consider your marriage or your kids to be a gamble, that's a sign that you shouldn't get married or have kids.

bitwize

"Vibe coding as gacha game" is a new wrinkle I didn't expect. It certainly explains why I see people who should know better talking up AI and LLMs like they're the second coming: it's like how stoners talk about weed as a cancer cure.

flashgordon

This addiction and fear of things-going-bad-if-I-don't-listen-to-the-copilot is precisely why my workflow is a bit simpler and more caveman-ish:

1. Start a project with a vague README (or take an existing one).

2. Create a makefile with a "prompt" target that looks something like this (I might put it in a script to work around tabs etc.):

```
prompt:
	# gather the files I care about, prefix each with its path, and copy the lot to the clipboard
	# (in an actual makefile, $f would need to be written as $$f)
	for f in `find . -type f \( -name '*.go' -o -name '*.ts' -o -name '*.other_files_i_care_about' \) | grep -v 'files_to_ignore'`; do \
		echo "// FILE: $f"; \
		cat $f; \
	done | pbcopy
```

3. Run `make prompt` to get a fresh starting prompt, then go to Gemini (AI Studio) and use the prompt:

```
You have the following files. Understand them and we will start building some features.

<Ctrl-v to paste the files copied above>
```

4. It thinks, understands and gives me the "I am ready" line.

5. To build feature X I simply prompt it with:

```
I want to build feature X. Understand it, plan it, and do not regenerate entire files. Just give me unix style diffs.
```

6. Iterate on what I like and don't (including refactors, etc.)

7. Copy patches and apply locally

8. Repeat steps 5 - 7.

9. After about 300-400k tokens generated (say, over 20-40 features) I snapshot with the prompt:

```
Great, now is a great time to checkpoint. Generate a SUMMARY.md on a per-folder basis of your understanding of the current state of the project, along with a roadmap of next steps.
```

10. I save/update the SUMMARY.md and go to bed. When I come back I repeat from step 2 - and voila, the SUMMARY.md files generated before are included too.

I have generated about 20M tokens so far at a cost of 0. For me, "copy/pasting" diffs is not a big deal. Getting clean code and having a nice custom workflow is more important. I am still not ready to relinquish control fully to an agent. I just want a really good code search/auto-complete out of the LLM that adheres to *my* interfaces and constraints.

insane_dreamer

> In an effort to impress the user and over-deliver, LLMs end up creating a rat’s nest of ultra-defensive code littered with debugging statements

This has been my experience as well. I have to continuously explicitly instruct Claude to be more concise (though that often leads to broken code ...). Gemini is even more verbose.

I'm not sure, in the end, how much time is saved over simple good auto-completes (for method syntax lookups), other than for rote tasks like "replicate this pattern across X" (and even then it doesn't get it 100% right), and for quick answers to specific questions, usually in frameworks I'm not that well versed in, that I would have searched SO for ("how do I do X in Qt?", "how do I do the equivalent of Y in Linux on Windows") -- but even then I have to verify the answer, whereas if it's a highly voted answer on SO I'll know it works (or there will be helpful comments to the contrary under the reply).

Most of the "it can build X app for you automatically" comments I read remind me of "build a Rails app in 5 lines" (back in the day).