I know when you're vibe coding
171 comments · July 31, 2025 · lmm
elashri
To be honest, most of these things can happen with any poorly documented large codebase. I work on an academic research project whose docs basically tell you that the code is self-documented. They give one or two pages about configuring CMake and building the project, and another page on how to benchmark the throughput. But the internal quirks and the expected conventions you will need to figure out on your own.
New people contributing usually reinvent many things and change global configuration because they don't know they can use something already there.
Ironically, indexing the codebase and asking an LLM questions about specific things is the best thing you can do, because the only three people you could ask have either left the project or are busy and will reply within a week.
godelski
> docs that tells you basically that the code is self documented
Anytime someone tells me the code is self-documented I hear "there's no documentation." The most common programmer's footgun:

    I don't have time to document
        |                        ^
        v                        |
    Spends lots of time trying to understand code
We constantly say we don't have time to document the code. So instead we spend all our time reading code and trying to figure out what it does, to get the minimal amount of understanding needed to implement whatever thing we need to implement.

This, of course, is itself naive, because you can't know what the minimal necessary information is without knowing something about the whole codebase. Which is also why institutional knowledge is so important, and why it is also weird that we'd rather get pay raises by switching companies than through internal raises. That's like trying to fix the damage from the footgun with a footgun.
apabepa
For me it is difficult to write good code comments right when the code is written. The problem is solved; the tricky parts, if any, are internalized. I don't mind reading code, so just documenting what the code is doing seldom brings value. The important thing is to document why the code does things in a non-obvious way, and to call out unintuitive scenarios, edge cases, etc.

Revisiting code is the best time to add comments, because that is when you find out what is tricky and what is obvious.
Code reviews are also good for adding code comments. If the people reviewing are doing their job and actually trying to understand the code, then it is a good time to get feedback on where to add comments.
avhception
Sometimes I feel like I'm watching people dig a hole until it starts filling with groundwater. They then start bailing the water out with buckets. They're very busy doing that, so the actual digging work slowly grinds to almost zero. I stand at the edge of the pit, trying to talk to them about electrical pumps and drainage solutions, and get yelled at: "I don't have time for your nonsense, can't you see I'm busy bailing water here!?"
fsloth
In some cases, like cleanly written geometry algorithms, the code _is_ the best technical documentation, and attempts at a verbal description would sound awkward and plausibly become dated. In that case the purpose of the written docs is to offer enough context (possibly quite a lot) to understand the _why_, but the how is easiest to understand by reading the code.
I’m not arguing about your personal experience but these things are not absolutes.
The key thing is: can a new developer jump in and understand the thing? Add enough docs to facilitate that understanding as well as possible. Then stop documenting and point the reader to the code.
aspenmayer
> why it is also weird that we'd rather have pay raises through switching companies than through internal raises
How does the saying go, something like “show me the incentives and I’ll show you the outcome?”
> That's like trying to fix the damage from the footgun with a footgun.
If you value your money/time/etc, wouldn't the best way to fix the damage from footguns be by preventing the damage to you in the first place by not being there if/when it goes off?
I think your point is well put, I’m just trying to follow your reasoning to a conclusion logical to me, though I don't know if mine is the most helpful framing. I didn’t pick the footgun metaphor, but it is a somewhat useful model here for explaining why people may act the way they do.
nosianu
> most of these things can happen for poorly documented large codebase.
Documentation does not help beyond a point. Nobody reads the documentation repeatedly, which would be needed.
When you keep working on a project, and you need a new function, you would need to check or remember every single time that such a function already exists or might exist somewhere. You may have found it when you read the docs months ago, but since you had no need for that function at the time your brain just dismissed it and tossed that knowledge out.
For example, I had a well-documented utils/ folder with just a few useful modules, but they kept getting reimplemented by various programmers. I did not fault them, they would have had to remember every single time they needed some utility to first check that folder. All while keeping up that diligence forever, and while working on a number of projects. It is just too hard. Most of the time you would not find what you need, so most of the time that extra check would be a waste. Even the most diligent person would at some point reimplement something that already exists, no matter how well-documented it is. It's about that extra search step itself.
The closer you want to get to 100% perfection, the more exponentially the effort increases. So we have some duplication; not a big deal. Overall architectural quality is more important than squeezing out those last, not really important, few percent of perfection.
globnomulous
In my experience, the usefulness of documentation in code declines as familiarity with a codebase increases. The result: people ignore it; it becomes outdated; now it's debt. Similarly, non-intralinear documentation (documentation that isn't in the code) tends to grow with a codebase. Meanwhile, the codebase changes, personnel change, and more and more of the documentation becomes noise, a historical artifact of solving problems that either no longer exist or can no longer be solved the same way.
That being said, good documentation is worth its weight in gold and supports the overall health and quality of a codebase/project. Open-source projects that succeed often seem to have unusually strong, disciplined documentation practices. Maybe that's just a by-product of engineering discipline, but I don't think it is -- at least not entirely.
JoRyGu
I'm sorry, but this is selling good engineers very short. If you didn't nest your utils folder 8 folders deep, it seems pretty obvious that one should check the utils folder before writing another utility function. This stuff should also be caught in code reviews. Maybe the new guy didn't know that util function existed, but surely you did when you reviewed their MR? Obviously mistakes like that can happen, but I've found that to be the exception rather than the rule, even in some of the gnarlier codebases I've worked in.
michaelsalim
I for one think that this discipline is what separates a good developer from a good engineer. This kind of rigorous process is the kind of thing I'd expect from most devs but is sadly missing most of the time.
yunohn
I agree with you completely, but would also posit that this is exactly what agentic LLMs should solve?
Claude Code's Plan mode kind of does this research before coding, but to be fair the Search tool seemingly fails half the time with zero results, and then it gets confused and reimplements things too…
godelski
I think you're missing the author's thesis
> Is speed the greatest virtue?
If speed is the greatest virtue then yeah, all that stuff will happen. But if it isn't, then that stuff will happen at a much lower frequency. Because all the stuff mentioned is just tech debt, and debt doesn't go away; it accrues interest.

If speed is all that matters, then you need exponential output, as your output needs to offset the debt. If speed is a factor but isn't the only factor, then you need to weigh it against the other things. Take on debt wisely and pay it off when you can. But there does seem to be a trend of just taking on as much debt as possible and hoping for the best. Last I checked, most people aren't really good at handling debt.
lbriner
I think you are misusing the phrase "tech debt" like many people do.
Not everything that is not perfect is tech debt; some of it is just pragmatism. If you end up with two methods doing the same thing, who cares? As long as they are both correct, they cost nothing, might never need any maintenance attention, and will never be paid down before the codebase is replaced in 10 years' time.
Same with people writing code in a different style from others. If it is unreadable, that isn't tech debt either; it's just a lack of process, or a lack of someone following the process. Shouldn't be merged = no tech debt.
Adding some code to check edge cases that are already handled elsewhere: again, who cares? If the code makes things unreadable, delete it if you know it isn't needed; it only took 10 seconds to generate. If it stays in place and is understandable, it's not tech debt. Again, it's not going to be paid down, it doesn't cost anything, and the worst case is you change one validation and not the other and a test fails, which shouldn't take long to track down.
Tech debt is specifically borrowing against the right way to do something in order to speed up delivery but knowing that either the code will need updating later to cope with future requirements or that it is definitely not done in a reliable/performant/safe way and almost certainly will need visiting again.
darkwater
> Tech debt is specifically borrowing against the right way to do something in order to speed up delivery but knowing that either the code will need updating later to cope with future requirements or that it is definitely not done in a reliable/performant/safe way and almost certainly will need visiting again.
Which is something many people do without even realizing they are incurring tech debt. These are the kind of developers who will just generate even more tech debt with an LLM in their hands (at least for now).
godelski
I'm a bit confused because you start by disagreeing with me but then end up agreeing with me.
> If you end up with two methods doing the same thing, who cares? As long as they are both correct, they cost nothing
To be clear, tech debt isn't "code that doesn't run". It's, as you later say, "borrowing against the right way to do something in order to speed up delivery", which is what I said the author's thesis was.

No need for perfection. Perfection doesn't exist in code. The environment is constantly moving, so all code eventually needs to be maintained.
But I also want to be very, very clear here: just because two functions have the same output doesn't mean they're the same and no one should care. I'll reference Knuth on premature optimization here. You grab a profiler, find the bottleneck in the code, and it's written with a function that's O(n^3) but could be written in O(n log n). Who cares? The customer cares. Or maybe your manager who's budgeting that AWS bill does. You're right that both are logically "correct", but one of them is not what you want in your code.
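(To make that gap concrete, here is a toy Python sketch, not from the comment: maximum subarray sum computed two "correct" ways, a brute-force triple loop versus Kadane's single pass. The specific problem and exact exponents are only illustrative of the kind of asymptotic difference being described.)

    # Two functions with identical output and wildly different cost.

    def max_subarray_bruteforce(xs):
        """Try every (start, end) slice and re-sum it: roughly cubic time."""
        best = xs[0]
        for i in range(len(xs)):
            for j in range(i, len(xs)):
                best = max(best, sum(xs[i:j + 1]))
        return best

    def max_subarray_kadane(xs):
        """Kadane's algorithm: a single linear pass, same answer."""
        best = current = xs[0]
        for x in xs[1:]:
            current = max(x, current + x)
            best = max(best, current)
        return best

    data = [3, -2, 5, -1, 4, -7, 2]
    assert max_subarray_bruteforce(data) == max_subarray_kadane(data) == 9

Both are "correct"; only one of them survives a profiler on real input sizes.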
Similarly, code that is held together with spaghetti and duct tape is tech debt. It runs. It gives the correct output. But it is brittle, hard to figure out (in context), and will likely rot. "There's nothing more permanent than a temporary fix that works," as the saying goes. I guess I'll also include the saying "why is there never time to do things right, but always time to do things twice?"
Code can be broken in many ways. Both of those situations have real costs, in terms of both time and money. It's naïve to think that the only way code can be broken is by not passing tests. It's naïve to think you've tested everything that needs to be tested. Idk about you, but when I code I learn more about the problem, often with the design changing as a result. Most people I know code this way. Which is why it is always good to write flexible code, because the only thing you can rely on with high confidence is that it's going to change.
whilenot-dev
Code that demands to have its debt paid off isn't non-working code; it's rather code that exceeds one's ability to maintain it properly (your mention of "unreadability" included). Whether a PR introduces debt isn't always known and often has to be discovered later on, depending on how fast its maintainers fluctuate and the ecosystem advances.
That said, tech debt isn't paid by developers individually; it's paid by organizations in developers' time. Only in rare cases can you make a deliberate decision to take it on, as it grows organically within any project. For example, most Python 2 code that used niche libraries whose outdated docs have since been taken offline has to be considered expensive tech debt nowadays.
0x_rs
This. Reinventing the wheel at every opportunity, forgetting about or ignoring the expected way to do something, mixing patterns, you name it. The author may call it "vibe coding"; that's fine, but it has little to do with LLMs. The tool shows the same amount of care as anyone rushing to get something done, or who hasn't built the project themselves, or who maybe doesn't have enough experience. I can only assume it's a not-very-subtle complaint about a specific person on their team; "written in a way no developer on the team would" is telling.
I'd be extremely careful about applying this thinking anywhere else. There's enough baseless finger-pointing in academia and arts already.
noodletheworld
> The author may call it "vibe coding", that's fine but it has little to do with LLMs.
Humm.
Maybe if we say that this is not an issue with vibe coding, it won't be?
Maybe if we pretend that a naive junior would make these mistakes (true), we should be happy to accept them from senior developers (false)?
LLMs are extraordinarily prone to doing these things.
I’ve seen it.
You've seen it.
The OP has seen it.
You’re in a rush so you wrote some classes in a code base in a language which supports classes but has no classes in it?
Really? Did that get past code review before? Did you deliberately put up a code review that you knew would be rejected and take longer to merge as a result because you were in a hurry?
Of course not.
You did the bare minimum that still met the basic quality standards expected of you.
I get it. We all get it. When you're in a rush you cut corners to move faster.
…but that's not what the OP is talking about, and it's not what I see either:
It's people putting up AI slop and not caring at all what the content is.
Just a quick check that it compiled and the tests pass, if you're lucky.
Too lazy to even put a "don't use classes" in their Cursor rules file.
Come on. The OP isn't saying don't use AI.
They're saying: care, just a little bit, about your craft, ffs.
perrygeo
I've seen developers add a second ORM library as a dependency, not because the first didn't do the job but because they just "forgot" about the first one and wanted to use the new hotness. Developers, just like LLMs, have biases that taint the solution space.
The key is that we all have an intuitive sense that this behavior is wrong - building a project means working within the established patterns of that project, or at least being aware of them! Going off half-cocked and building a solution without considering the context is extremely bad form.
In the case of human developers, this can be fixed on the code review level, encouraging a culture of reading not just writing code. Without proper guardrails, they can create code that's dissonant with the existing project.
In the case of LLMs, the only recourse is context engineering. You need to make everything explicit. You need to teach the LLM all the patterns that matter. Their responses will always be probabilistic token salad, by definition. Without proper guardrails, it will create code that's dissonant with the existing project.
Either way, it's a question of subjective values. The patterns that are important need to be articulated, otherwise you get token salad randomly sampling the solution space.
eddd-ddde
Heavy agree on your first paragraph. You spend one evening removing one unnecessary dependency only to see 5 more were added by next Monday.
I think soon enough we'll have a decent LLM that's capable of reviewing ALL changes to ensure they follow the "culture" we expect to see.
Msurrow
I have worked on such teams. Mostly, even. I would not accept any PRs with code doing any of those things (human or machine made). Small(er) teams on small to medium sized projects.
Critical solutions, but small(er) projects with 2-4 devs, that's where it's at. I feel like it's because then it's actually possible to build a dev-team culture and consensus that has the wanted balance of quality and delivery speed.
shepherdjerred
You can prevent quite a lot of these issues if you write rules for Cursor or your preferred IDE
Linters can also help quite a bit. In the end, you either have your rules enforced programmatically or by a human in review.
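(As a minimal sketch of what "enforced programmatically" could look like, with everything here hypothetical: a tiny CI check that fails when someone calls requests.get/post directly instead of going through a project's existing fetch helper, one of the examples from the article.)

    # check_no_raw_http.py -- hypothetical project-specific rule enforced in CI:
    # "use our data-fetching helper, don't roll your own HTTP".
    import pathlib
    import re
    import sys

    FORBIDDEN = re.compile(r"\brequests\.(get|post)\(")        # pattern is an assumption
    ALLOWED_FILE = pathlib.Path("myproject/http_client.py")    # hypothetical helper module

    violations = []
    for path in pathlib.Path("myproject").rglob("*.py"):
        if path == ALLOWED_FILE:
            continue
        for lineno, line in enumerate(path.read_text().splitlines(), start=1):
            if FORBIDDEN.search(line):
                violations.append(f"{path}:{lineno}: use http_client.fetch() instead")

    if violations:
        print("\n".join(violations))
        sys.exit(1)

The same script can be handed to a coding agent as the command it must run (and pass) before finishing a task.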
I think it’s a very different (and so far, for me, uncomfortable) way of working, but I think there can be benefits especially as tooling improves
sshine
It seems like people who use AI for coding need to reinvent a lot of the same basic principles of software engineering before they gradually propagate into the mainstream agentic frameworks.
Coding agents come with a lot of good behavior built in.
Like "planning mode" where they create a strong picture of what's to be made before touching files. This has honestly improved my workflow at programming from wanting to jump into prototyping before I even have a clear idea, to being very spec-oriented: Of course there needs to be a plan, especially when it will be drafted for me in seconds.
But the amount of preventable dumb things coding agents will do that need to be explicitly stated and meticulously repeated in their contexts reveals how simply training on the world's knowledge does not capture senior software engineer workflows entirely, and captures a lot of human averageness that is frowned upon.
shepherdjerred
100%
A lot has to be re-done. Using an IDE like Cursor is really a skill in its own right, and you likely won't see a productivity boost using agents without moderate investment; even then there are tradeoffs.
I think the real benefit comes in a few years, when more engineering has been done and the tools are more polished. The way I look at it is that these tools are the worst they'll ever be.
cardanome
Do those rules really work? I have added the rule to not add comments, and I still have to constantly remind the model not to add comments despite it.
ewoodrich
I have a .roorules file with only about four instructions, one of which is an (unintentional) binary canary of very simple rule following at the end of a task. And another rule that’s a fuzzier canary as it is not always applicable but usually occurs a few times in a task so helps me confirm the rules are being parsed at all in case Roo has a bug.
All the models I’ve used (yes, including all the biggest, newest, smartest ones) follow the binary rule about 75% of the time at the very most. Usually closer to 50% on average, with odds significantly decreasing the longer the context increases as it occurs at the end of a task but other than that seems to have no predictable pattern.
The fuzzier rule is slightly better, I'm guessing because it applies earlier in the context window, at around 80% compliance, and it uses lots of caps and emphasis. This one has a more predictable failure mode, tied to the ratio of reading code vs. thinking/troubleshooting/time the model spends "in its own head". When it's mostly reading code or my instructions, compliance is very high; when it's doing extended troubleshooting or anything that starts to veer away from the project itself and toward training data, compliance is much lower.
So it's hit and miss and does help, but it's definitely not something I'd rely on as a hard guardrail, like not executing commands, which Roo has a non-LLM tool config to control. So over time I hope agentic runners add more deterministic config outside the model itself, because instructions still aren't as reliable as they should be and don't seem to be getting substantially better in real use.
shepherdjerred
I've found that it mostly works, though it can still make mistakes.
e.g. I had a lint rule enabled that the AI would always violate and then have to iterate to fix; I added a rule telling it not to do that particular thing, and most of the time it then wrote code that passed the linter on the first try.
hakunin
My entire (decades-long) career I have worked primarily in small startup teams, and even when people didn't see eye to eye, they always maintained these kinds of basic practices. I think a lot of the disagreement over these expectations is rooted in the size and "tight-knit"-ness of your team.
camdenreslink
I think size of the code base matters a lot too. In a multi-100k line of code application, you might not know a utility function is already defined somewhere and reinvent the wheel.
bux93
Not only do people do these things, they put their work on github. Which is where the LLM learns to do it!
buserror
I personally treat the LLM as a very junior programmer. He's willing to work and will take instructions, but his knowledge of the codebase and the patterns we use is strongly lacking. So it needs a LOT of handholding: very clear instructions, descriptions of potential pitfalls, smaller scoped tasks, and careful review to catch any straying off pattern.
Also, I make it work the same way I do: I first come up with the data model until it "works" in my head, before writing any "code" to deal with it. Again, clear instructions.
Oh, another thing: one of my "golden rules" is that it needs to keep a block comment at the top of the file describing what's going on in that file. It acts as a second "prompt" when I restart a session.
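(A hypothetical sketch of what such a header block might look like; the file name and every detail in it are invented.)

    # inventory_sync.py
    #
    # Purpose: reconciles the local inventory cache with the upstream warehouse API.
    # Data model: items are keyed by SKU; quantities are always non-negative ints.
    # Conventions: all HTTP goes through http_client.fetch(); no direct requests calls.
    # Pitfalls: the upstream API paginates at 500 items and silently truncates beyond
    #           that, so sync_all() must loop until an empty page is returned.
    #
    # This header doubles as context for the agent when a session is restarted.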
It works pretty well, it doesn't appear as "magic" as the "make it so!" approach people think they can get away with, but it works for me.
But yes, I still also spend maybe 30% of the time cleaning up, renaming stuff, and doing more general rework of the code before it becomes "presentable", but it still allows me to work pretty quickly, a lot quicker than if I were to do it all by hand.
otikik
I think "junior programmer" (or "copilot") oversells the AI in some cases and undersells it in others. It does forget things that a regular person wouldn't, and it does very basic coding mistakes sometimes. At the same time it's better than me at some things (getting off-by-one errors when dealing with algorithms that work on arrays). It also has encyclopedic knowledge about basically anything out there on the internet. Red-black Trees? Sure thing. ECS systems for game programming? No problemo, here are the most used libraries.
I have ended up thinking about it as a "hunting dog". It can do some things better than me. It can get into tiny crevasses and bushes. It doesn't mind getting wet or dirty. It will smell the prey better than me.
But I should make the kill. And I should be leading the hunt, not the other way around.
jeanloolz
That hunting dog analogy is epic and perfectly matches my experience.
pulse7
The difference between an LLM and a very junior programmer: the junior programmer will learn and change; the LLM won't. The more instructions you put in the prompt, the more will be forgotten and the more it will bounce back to the "general worldwide average". And on the next prompt you must start all over again... Not so with junior programmers...
irb
This is the only thing that makes junior programmers worthwhile. Any task will take longer and probably be more work for me if I give it to a junior programmer vs just doing it myself. The reason I give tasks to junior programmers is so that they eventually become less junior, and can actually be useful.
Having a junior programmer assistant who never gets better sounds like hell.
presentation
The tech might get better eventually; it has gotten better rapidly up to this point, and everyone working on the models is aware of these problems. There's a strong incentive to figure something out.
Or maybe this is it. Who knows.
buserror
Ahaha, you likely haven't seen as many junior programmers as I have then! </jk>
But I agree completely, some juniors are a pleasure to see bloom; it's nice when one day you see their eyes shine and hear "wow, this is so cool, I never realized you made that like THAT for THAT reason" :-)
n4r9
The other big difference is that you can spin up an LLM instantly. You can scale up your use of LLMs far more quickly and conveniently than you can hire junior devs. What used to be an occasional annoyance risks becoming a widespread rot.
muzani
They're automations. You have to program them like every other script.
freilanzer
The learning is in the model versions.
relistan
My guess is that you're letting the context get polluted with all the stuff it's reading in your repo. Try using subagents to keep the top level context clean. It only starts to forget rules (mostly) when the context is too full of other stuff and the amount taken up by the rules is small.
animal531
Definitely. To be honest I don't think LLMs are any different from googling and copying code off the Internet. It's still up to the developer to take the code, go over it, make sure it's doing what it's supposed to be doing (and only that), etc.
As for the last part, I've recently been getting close to 50 and my eyes aren't what they used to be. In order to fight off eye-strain I now have to tightly ration whatever I do into 20 minute blocks, before having to take appropriate breaks etc.
As a result, time has become one of the biggest factors for me. An LLM can output code 1000x faster than a human, so if I can wrangle it somehow to do the basics for me, that's a huge bonus. At the moment I'm busy generating appropriate struct-of-arrays for SIMD from input AoS structs, and I'm using Unity C# with LINQ to output the text (I need it to be editable by anyone, so I didn't want to go down the Roslyn or T4 route).
The queries are relatively simple: take the list of data elements and select the correct entries, then take whatever fields and construct strings with them. Even so, copying/editing them takes a lot longer than telling GPT to select this, exclude that, and make the string look like ABC.
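(To illustrate the idea rather than the commenter's actual C#/LINQ setup, here is a rough Python sketch of generating struct-of-arrays source text from a list of AoS fields; the names and output format are invented.)

    # Hypothetical sketch: emit a struct-of-arrays declaration from AoS field specs.
    aos_fields = [("position_x", "float"), ("position_y", "float"), ("health", "int")]

    def emit_soa(struct_name, fields, exclude=()):
        lines = [f"public struct {struct_name}SoA", "{"]
        for name, ctype in fields:
            if name in exclude:
                continue  # "select this, exclude that"
            lines.append(f"    public {ctype}[] {name};")
        lines.append("}")
        return "\n".join(lines)

    print(emit_soa("Enemy", aos_fields, exclude=("health",)))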
I think there was a post yesterday about AIs as HUDs, and that makes a lot of sense to me. We don't need an all-powerful model that can write the whole program; what we need is a super-powered assistant that can write and refactor on a very small and local scale.
uyzstvqs
I personally see the LLM as a (considerably better) alternative to StackOverflow. I ask it questions, and it immediately has answers for my exact questions. Most often I then write my own code based on the answer. Sometimes I have the LLM generate functions that I can use in my code, but I always make sure to fully understand how it works before copy-pasting it into my codebase.
But sometimes I wonder if pushing a 400,000+ line PR to an open-source project in a programming language that I don't understand is more beneficial to my career than being honest and quality-driven. In the same way that YoE takes precedence over actual skill in hiring at most companies.
presentation
Unlike stack overflow, if it doesn’t know the answer it’ll just confidently spit out some nonsense and you might fall for it or waste a lot of time figuring out that it’s clueless.
You might get the same on Stack Overflow too, but more likely I've found either no response, or someone pretty competent actually does come out of the woodwork.
presentation
I find success by basically limiting it to the literal coding but not the thinking: chop tasks down to specific, constrained changes; write detailed specs including what files should be changed, how I want it to write the code, specific examples of other places to emulate, and so on. It doesn't have to be insanely granular, but the more breadcrumbs, the higher the chance it'll work; you find a balance. And whatever it produces, I git add -p one chunk at a time to make sure each chunk makes sense.
More work up front and some work after, but still saves time and brain power vs doing it all myself or letting it vibe out some garbage.
_def
I agree. It brings me to the question, though: how do you deal with team members who are less experienced and use LLMs? Code review then needs much more work to teach these principles. And most of the time people won't bother to do that and will just rubber-stamp the working solution.
buserror
In my experience, this is a problem even without LLMs; many times you cannot just tell coworkers (junior or not) to completely trash their patch and do it again (even using nicer words).
Very often it comes down to HR issues in the end, so you end up having to take that code anyway, and either sneakily revert it or secretly rework it...
neurostimulant
> Also, I make it work the same way I do: I first come up with the data model until it "works" in my head, before writing any "code" to deal with it. Again, clear instructions.
But then it's not vibe coding anymore :)
presentation
Is that a bad thing? What do we call this?
x3n0ph3n3
I have never seen a junior engineer make up API calls or arguments.
palata
A risk with vibe coding is that it may make a good developer slightly faster, but it will make bad developers waaaay faster. Resulting in more bad code being produced.
The question then is: do the bad developers improve by vibe coding, or are they stuck in a local optimum?
woolion
So, I was wondering when I would see that... From my experience, I would say it also turns mediocre developers into bad ones very fast. The reason being a false sense of confidence, but mostly it's the sheer volume that is produced.
If we want to be more precise, I think the main issue is that the AI-generated code lacks a clear architecture. It has no (or very little) respect for overall information flow, and single-responsibility principle.
Since the AI wants you to have "safe" code, it will catch things and return non-results instead. In practice, that means the calling code has to inspect the result to see whether it's a placeholder or not, instead of being confident because you'd get an exception otherwise.
Similarly, to avoid problems the AI might tweak some parameter. If for example you were to design a program to process something with AI, you might go gather_parameters -> call -> process_results. Call should not try to do funky things with parameters, because that should be fixed at the gathering step. But locally the AI is always going to suggest having a bunch of "if this parameter is not good, swap it silently so that it can go through anyway".
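(A minimal Python sketch of the two failure modes described above; all function names are invented.)

    def run_inference(params):
        # stand-in for the actual model call
        return {"ok": True, "temperature": params["temperature"]}

    # What the comment says the AI tends to produce: swallow problems locally.
    def call_model_defensive(params):
        if params.get("temperature", 0) > 1.0:
            params["temperature"] = 1.0        # silently "fixes" the input
        try:
            return run_inference(params)
        except Exception:
            return {}                          # placeholder result the caller must inspect

    print(call_model_defensive({"temperature": 3.0}))   # {'ok': True, 'temperature': 1.0}

    # The structure the comment argues for: validate at the gathering step, fail loudly.
    def gather_parameters(raw):
        if raw.get("temperature", 0) > 1.0:
            raise ValueError(f"temperature out of range: {raw['temperature']}")
        return raw

    def call_model(params):
        return run_inference(params)           # any failure propagates as an exception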
Then tests are such a problem it would require an even longer explanation...
palata
To echo the article, I don't want to know it was written with an AI. Just like I don't want to see that it was obviously copy-pasted from StackOverflow.
The developer can do whatever they want, but at the end, what I review is their code. If that code is bad, it is the developer's responsibility. No amount of "the agent did it" matters to me. If the code written by the agent requires heavy refactoring, then the developer has to do it, period.
woolion
100% agree.
However, you'll probably get an angry answer that it's management's fault, or something of the sort (because there isn't enough time). Responsibility would have to be taken earlier, by pushing back if some objectives truly are not reasonable.
oc1
A whole new generation will discover the term net-negative programmer again ;)
bubblyworld
Personally I think caring is the resource in extremely short supply here and I don't think vibe coding has much to do with it causally. AIs are just tools - basically all of the issues the author has raised are present with human juniors too, and can be resolved quite easily with a little more guidance/interaction in both cases. I don't think AIs are universally causing people to care less about quality output, although that may be true for some people who didn't care much in the first place.
The common counter-argument here is that you miss out on training juniors, which is true, but it's not always an option (we are really struggling to hire at my startup, for instance, so I'm experimenting with AI to work on tasks I would otherwise give to a junior as a stop-gap).
Another aspect to consider is that what we used to consider important for software quality may change a lot in light of AI tooling. These things aren't absolutes. I think this is already happening, but it's early days, so I'm not sure what will play out here.
philipp-gayret
All major AI assistants already come with ways to not have any of these issues.
Claude Code has /init, Cursor comes with /Generate Cursor Rules, and so on. It's not even context engineering: there are out-of-the-box tools you can use to keep this from happening. And even if it does happen, you can make it never happen again, with these same tools, for your entire organization, if you had invested the time to learn how to use them.
It is interesting how these tools split up the development community.
ceuk
CC very regularly ignores very explicit stuff in CLAUDE.md for me, and I know I'm not the only one. The cycle of compacting/starting new conversations feels like a Sisyphean spiral of the same undesirable behaviour, and I've yet to find a satisfactory solution despite a lot of effort to that end.
I don't think it's fair to dismiss this article as a superficial anti-ai knee jerk. The solutions you describe are far from perfect
philipp-gayret
Fair enough. For me, compacting the conversation always feels a bit weird; I have no way to tell what it effectively deleted from the context. But I (very) regularly have it re-read and update CLAUDE.md as part of the process, or after "discussions" with the LLM, so I would guess that might be why it follows the patterns in it a bit more strictly for me than for most. It would be nice if the tool took care of that automatically.
woolion
Serious question: I'm currently re-evaluating whether Cursor can speed up my daily work. Currently it does not, because of the many subtle errors (like switching a ":" for a ","). But mostly the problem I face is that the code base is big, with entirely outdated parts and poorly coded ones. So the AI favors the most common patterns, which are the bad ones. Even with basic instructions like "take inspiration from <part of the code that is very similar and well-written>", it still mostly takes from the overall codebase (which, by the way, was worsened by a big chunk of vibe-coded output that was hastily merged). My understanding is that a rule should essentially do the same as putting it in the prompt directly. Is there a solution to that?
philipp-gayret
I recently switched to Claude Code, I much prefer it (I end up less in cycles of Cursor getting stuck on problems). Before I used Cursor for some months.
> My understanding is that a rule should essentially do the same as if it is put in the prompt directly. Is there a solution to that?
Yes, from my understanding Cursor rule files are essentially an invisible prefix to every prompt. I had some issues in the past with Cursor not picking up rule files until I restarted it (some glitch, probably gone by now). So put something simple like a "version" in your rules file and ask what version of the rules you are following for this conversation, just to validate that the process is working.
For Cursor with larger projects I use a set of larger rule files that always apply. Recently I worked with Spotify's Backstage, for example, and I had it index online documentation on architecture, build instructions, design, development of certain components, and project layout. Easily 500+ lines worth of markdown. I tell Cursor where to look, i.e. online documentation of the libraries you use, reference implementations if you have any, good code examples and why they are good, and then it writes its own rule files; I don't write them manually anymore. That has been working really well for me. If you have a common technology stack or way of working you can also try throwing in some examples from https://github.com/PatrickJS/awesome-cursorrules
For a codebase containing both good and bad code: maybe you can point it to a past change where code was refactored from bad to good, so it can write out why you prefer which style and how to manage the migration from bad to good. That said, the tools are not perfect. Even with rules, bad output can still happen, but larger rule files describing what you'd like to do and what to avoid make the chance significantly smaller and the tool more pleasant to work with. I recently switched to Claude Code because Cursor tended to get "stuck" on the same problem, which I don't really experience with Claude Code, but YMMV.
croes
The issue isn’t in the tool but in the vibe „coder“.
They care like they code: not.
Zanfa
LLMs would also need to use historic commits as context, rather than just the current state of the codebase in isolation. Most codebases I've worked with go through migrations from a legacy pattern A to a newer and better pattern B, used across different parts of the codebase. Rarely can these migrations be done in a single go, so both patterns tend to stick around for a while as old code is revisited. Like the HTTP example, even if LLMs pick up a pattern to follow (which they often don't), it's a coin flip whether they pick the right one or not.
dwd
This...
I once worked on a massive codebase that had survived multiple acquisitions, renames and mergers over a 20 year period. By the time I left it had finally passed into the hands of a Fortune 500 global company.
You would often find code that matched an API call you required that was last updated in the mid-2000s, but there was a good chance it was not the most recent code for that task; it still existed because it was needed for some bespoke function a single client used.
There could also be similar API calls with no documentation, and you had to pick the one that returned the data fields that you wanted.
antihero
You can craft a nice CLAUDE.md saying "write code like this bit, avoid writing code like this legacy bit", etc.
manmal
Better to tell them exactly how this and that is done, with some examples.
croes
But that kind of awareness is what vibe coders often lack.
Many didn’t code (much) before.
anshumankmr
That would assume a commit message is written properly and isn't just "Updated this file" or "Bugfix".
wldlyinaccurate
I think the parent comment means "commits" in the sense of the actual changeset; not just the message.
anshumankmr
That is also problematic, because a git diff will probably require an exponential gain in context length AND the ability for the LLM to use said context effectively.
That being said, the context length problem could potentially be solved, but it will take a bit of time. I think Llama 4 had a 10M context length (not sure if anyone tried prompting it with that much data to see how effective it really is).
subarctic
Setting up a linter and formatter and having a lot of strict type checking are really helpful when using an LLM to generate code, just like they are when you start receiving contributions from people who don't know or don't agree with the style rules that you implicitly follow. As are tests. Basically anything automated that helps ensure the right style and behaviour of your code becomes very useful with coding agents, as they can just run whatever command you tell them to check for issues and/or fix them if possible.
arduanika
I believe you that these tools help a lot, but they would not prevent ~any of the examples listed in the article (under "The smell of vibe coding").
helloplanets
Most of those look like context issues to me. A repo map (using Tree-sitter, etc.) and documentation would already do wonders. Feeding 32-64k tokens of context directly into a model like Gemini Pro 2.5 is something more people should try out in situations like this. Or even 128k+ tokens.
Cthulhu_
Probably not, but that's where code review comes in. Which can also partially be done by AI, ironically.
dkdbejwi383
With Cursor at least, it recognises linter errors and failing tests and attempts to correct its own problems, with varying levels of success.
antihero
This just masks the problem.
nojs
In my experience pretty much all of these issues stem from a combination of short context windows and suboptimal “context engineering”.
If the agent has a clean, relevant context explaining what global functions are available it tends to use them properly.
The biggest challenge is how to construct the right context for each request, and how to keep it clean until the feature is finished. I expect we will see a lot of improvements in this area in the coming months (sub-agents being an obvious example).
frizlab
> If the agent has a clean, relevant context explaining what global functions are available it tends to use them properly.
STOP! The agent does not exist. There are no agents; only mathematical functions that have an input and produce an output.
Stop anthropomorphizing LLMs, they are not human, they don’t do anything.
It might seem like it does not matter; my take is that it's fundamental. Humans are not machines, and vice versa.
johnisgood
We have used the term "agent" in AI for some time.
> The main unifying theme is the idea of an intelligent agent. We define AI as the study of agents that receive percepts from the environment and perform actions.
This is from Artificial Intelligence: A Modern Approach by Stuart J. Russell and Peter Norvig.
Sebalf
Frankly, this take is so reductionistic that it's useless. You can substitute "mathematical functions" with "biochemistry" and apply the exact same argument to human beings.
What I'd like is for people to stop pretending we have any idea what the hidden layer of an LLM is actually doing. We do not know at all. Yes, words like "statistics" and "mathematical functions" can accurately describe the underlying architecture of LLMs, but the actual mechanism of knowledge processing is not understood at all. It is exactly analogous to how we understand quite a lot about how neurons function at the cellular level (but far from everything, seeing as how complicated and opaque nature tends to be), but that we have no idea whatsoever what exactly is happening when a human being is doing a cognitive task.
It is a fallacy to confuse the surface level understanding of how a transformer functions, to the unknown mechanisms that LLMs employ.
atleastoptimal
This is all true. The best way to treat LLMs as they are now is as one step above the abstraction offered by compiled languages over assembly. You can describe something in plain English, note its explicit requirements, inputs, and outputs, and an LLM can effectively write the code as a translation of the logic you are specifying. Using LLMs, you are best served by minimizing the entropy they have to deal with. The transformer is essentially a translation engine, so use it as a translator, not as a generator.
That being said, every few months a new model comes out that is a little less encumbered by the typical flaws of LLMs, a little more "intuitively" smart and less in need of hand-holding, a little more reliable. I feel that this is simply a natural course of evolution: as more money is put into LLMs they get better, because they're essentially a giant association machine, and those associations give rise to larger abstractions, more robust conceptions of how to wield the tools of understanding the world, etc. Over time it seems inevitable that, for any task you give an LLM, it will be able to perform it better than any human programmer given the same task, and the same will go for the rest of what humans do.
nickm12
This is a false analogy. LLMs do not "compile" natural language to high level code in the same way that a compiler or interpreter implements a high-level programming language in terms of machine instructions (or, for that matter, how a CPU implements machine instructions in hardware).
Programming and machine languages aim for a precise and unambiguous semantics, such that it's meaningful to talk about things like whether the semantics are actually precise or whether the compiler has a bug in failing to implement the spec.
Natural language is not just a higher level of abstraction on our existing stack. If a new model comes out, or you even run an existing model with a new seed, you can get different code out that behaves differently. This is not how compilers work.
atleastoptimal
If a program calls an API like
    search_engine.get_search_results(query, length, order)
It doesn't "care" about the algorithm that produced that list of results, only that it fits the approximation of how the algorithm works as defined by the schema. There are thousands of ways the engine could have been implemented to produce the schema that returns relevance-based results from a web-crawler-sourced database.
In the same way, if I prompt an LLM "design a schema with [list of requirements] that works in [code context and API calls]", there are thousands of ways it could produce that code, but within a margin of error a high quality LLM should be able to produce the code that fits those requirements.
Of course the difference is that there is a stochastic element to LLM-generated code. However, it is useful to think of LLMs this way because it lets you leverage their probability of being correct, even if they aren't as precise as calling APIs while being explicit in how those abstractions are used.
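(A small Python sketch of the schema-over-implementation point: the caller depends only on the signature and its contract, not on which of many possible implementations sits behind it. All names here are invented.)

    from typing import Protocol

    class SearchEngine(Protocol):
        # the "schema": callers depend only on this signature and its contract
        def get_search_results(self, query: str, length: int, order: str) -> list[str]: ...

    class InvertedIndexEngine:
        def get_search_results(self, query, length, order):
            # one of many possible implementations behind the same contract
            hits = [doc for doc in ["a cat", "a dog", "a catfish"] if query in doc]
            return sorted(hits, reverse=(order == "desc"))[:length]

    class GrepEverythingEngine:
        def get_search_results(self, query, length, order):
            # a completely different implementation; the caller can't tell the difference
            corpus = "a cat\na dog\na catfish".splitlines()
            hits = sorted(line for line in corpus if query in line)
            if order == "desc":
                hits.reverse()
            return hits[:length]

    def caller(engine: SearchEngine):
        return engine.get_search_results("cat", length=2, order="asc")

    assert caller(InvertedIndexEngine()) == caller(GrepEverythingEngine())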
devnullbrain
This is a false interpretation; you've put "compile" in quotes when it doesn't appear in the parent comment, and the actual phrasing used is more accurate.
voxl
No, LLMs are not an "abstraction" like a compiler is. This is bullshit. LLMs are stochastic token generators. I have NEVER met someone in real life that has produced something I wouldn't throw in the trash using LLMs, and I have had the displeasure of eating cookies baked from an LLM recipe.
No, LLMs will not get better. The singularity bullshit has been active since the 2010s. LLMs have consumed the entire fucking Internet and are still useless. Where the fuck is the rest of the data going to come from? All these emails from people wanting high-quality data from PhDs, only for them to be scammy. People only want to train these things on easily stolen garbage, not quality input, because quality is expensive. Go figure!
This optimistic horseshit hype is embarrassing.
atleastoptimal
>No, LLMs will not get better.
What makes you so sure of this? They've been getting better like clockwork every few months for the past 5 years.
bigstrat2003
I don't claim that they won't get better, but they certainly haven't gotten better. From the original release of ChatGPT to now, they still suck in the same exact ways.
otabdeveloper4
No they haven't.
They hallucinate exactly as much as they did five years ago.
anshumankmr
> LLMs have consumed the entire fucking Internet and are still useless.
They aren't useless. Otherwise, ChatGPT would have died a long time back
> Where the fuck is the rest of the data going to come from?
Good question. Personally, I think companies will start paying more for high quality data or what is at least perceived as high quality data. I think Reddit and some other social media companies like it are poised to reap the rewards of this.
Whether this will be effective in the long run remains to be seen.
misnome
> They aren't useless. Otherwise, ChatGPT would have died a long time back
Isn’t the entire industry being fuelled by orders of magnitude more VC funding than revenue?
mns
> But people want a good cup of coffee, even if they have to wait a little bit for it.
I think the author is vastly underestimating what the majority of people actually want. It took me a lot to get this, but for many people, quick/cheap will always trump quality.
0_____0
The ubiquity of Keurig machines really does speak to this.
CompoundEyes
I've been working with Claude Code subagents for the first time this week, and I have one that purposefully goes through code analyzing it for human maintainability, over-architecting, too many clever one-liner tricks, meaningful variable names, ease of debugging, cognitive load, class sizes, and so on. It had some interesting analysis about a few of the classes being heavy in required domain knowledge and suggested documentation / onboarding for that particular area. I'm interested to see where this goes. The code will be in production, so this aspect is important to me. Gold plating is in reach with LLMs, so why not?
agile-gift0262
I don't think it's that people don't care. I think many (most?) of us are biased toward accepting whatever we already have that's working. I noticed that in myself when I tried programming with an LLM "agent". After all the fussing around and the many novel-length prompts I had to write, once the LLM produced something that worked, I had to fight my instinct to just push that for review.
I also noticed that the time I have to spend on reviews from some of my colleagues has increased ninefold (time tracked). So I don't know how much faster they are at producing that code, but I think it's taking longer overall to get the ticket closed.
> Because no one would write an HTTP fetching implementation covering all edge cases when we have a data fetching library in the project that already does that.
> No one would implement a bunch of utility functions that we already have in a different module.
> No one would change a global configuration when there’s a mechanism to do it on a module level.
> No one would write a class when we’re using a functional approach everywhere.
Boy I'd like to work on whatever teams this guy's worked on. People absolutely do all those things.