Generative AI coding tools and agents do not work for me
189 comments
June 17, 2025
socalgal2
To add another experience I had: I was using an API I'm not that familiar with. My program was crashing. Looking at the stack trace I didn't see why. Maybe if I had many months of experience with this API it would have been obvious, but it certainly wasn't to me. For fun I just copied and pasted the stack trace into Gemini, ~60 frames' worth of C++. It immediately pointed out the likely cause given the API I was using. I fixed the bug with a 2-line change once I had that clue from the AI. That seems pretty useful to me. I'm not sure how long it would have taken me to find it otherwise since, as I said, I'm not that familiar with that API.
turtlebits
It's perfect for small boilerplate utilities. If I need a browser extension/tampermonkey script, I can get up and running quickly without having to read docs/write manifests. These are small projects where without AI, I wouldn't have bothered to even start.
At the very least, AI can be extremely useful for autocompleting simple code logic or automatically finding replacements when I'm copying code/config and making small changes.
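For a sense of what such a throwaway utility can look like, here is a minimal, hypothetical userscript sketch (the site, selector, and behaviour are made up; it is written as TypeScript-flavoured DOM code, whereas a real Tampermonkey script would be plain JavaScript):
```
// ==UserScript==
// @name         Hide promoted posts (hypothetical example)
// @match        https://example.com/*
// @grant        none
// ==/UserScript==

(function () {
  // Hypothetical selector: hide anything the page marks as promoted.
  document
    .querySelectorAll<HTMLElement>("[data-promoted='true']")
    .forEach((el) => {
      el.style.display = "none";
    });
})();
```
The metadata block and matching rules are exactly the kind of detail that is tedious to look up for a one-off script, which is where generation helps.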
waprin
To some degree, traditional coding and AI coding are not the same thing, so it's not surprising that some people are better at one than the other. The author is basically saying that he's much better at traditional coding than at AI coding.
But it's important to realize that AI coding is itself a skill that you can develop. It's not just 'pick the best tool and let it go'. Managing prompts and managing context has a much higher skill ceiling than many people realize. You might prefer manual coding, but you might just be bad at AI coding and you might prefer it if you improved at it.
With that said, I'm still very skeptical of letting the AI drive the majority of the software work, despite meeting people who swear it works. I personally am currently preferring "let the AI do most of the grunt work but get good at managing it and shepherding the high level software design".
It's a tiny bit like drawing vs photography, and if you look through that lens it's obvious that many people who draw might not like photography.
mitthrowaway2
The skill ceiling might be "high" but it's not like investing years of practice to become a great pianist. The most experienced AI coder in the world has about three years of practice working this way, much of which is obsoleted because the models have changed to the point where some lessons learned on GPT 3.5 don't transfer. There aren't teachers with decades of experience to learn from, either.
notnullorvoid
Is it a skill worth learning though? How much does the output quality improve? How transferable is it across models and tools of today, and of the future?
From what I see of AI programming tools today, I highly doubt the skills developed are going to transfer to tools we'll see even a year from now.
serpix
Using AI tools for programming is not an all-or-nothing choice. You can pick a grunt-work task such as "Tag every such-and-such terraform resource with a uuid" and let it do just that. Nothing to do with quality, but everything to do with it being a simple task and not having to bother with the tedium.
autobodie
Why use AI to do something so simple? You're only increasing the possibility that it gets done wrong. Multi-cursor editing will be faster anyway.
stitched2gethr
It will very soon be the only way.
skydhash
> But it's important to realize that AI coding is itself a skill that you can develop. It's not just 'pick the best tool and let it go'. Managing prompts and managing context has a much higher skill ceiling than many people realize
No, it's not. It's something you can pick up in a few minutes (or an hour if you're using more advanced tooling, mostly spending it setting things up). But it's not like GDB or using UNIX as an IDE where you need a whole book to just get started.
> It's a tiny bit like drawing vs photography, and if you look through that lens it's obvious that many people who draw might not like photography.
While they share a lot of principles (around composition, poses,...), they are different activities with different output. No one conflates the two. You don't draw and think you're going to capture a moment in time. The intent is to share an observation with the world.
furyofantares
> No, it's not. It's something you can pick up in a few minutes (or an hour if you're using more advanced tooling, mostly spending it setting things up). But it's not like GDB or using UNIX as an IDE where you need a whole book to just get started.
The skill floor is something you can pick up in a few minutes and find it useful, yes. I have been spending dedicated effort toward finding the skill ceiling and haven't found it.
I've picked up lots of skills in my career, some of which were easy, but some of which required dedicated learning, or practice, or experimentation. LLM-assisted coding is probably in the top 3 in terms of effort I've put into learning it.
I'm trying to learn the right patterns for keeping the LLM on track and keeping the codebase in check. Most importantly, and quite relevant to OP, I'd like to use LLMs to get work done much faster while still becoming an expert in the system that is produced.
Finding the line has been really tough. You can get a LOT done fast without this requirement, but personally I don't want to work anywhere that has a bunch of systems that nobody's an expert in. On the flip side, as in the OP, you can have this requirement and end up slower by using an LLM than by writing the code yourself.
viraptor
> It's something you can pick up in a few minutes
You can start in a few minutes, sure. (Also you can start using gdb in minutes) But GP is talking about the ceiling. Do you know which models work better for what kind of task? Do you know what format is better for extra files? Do you know when it's beneficial to restart / compress context? Are you using single prompts or multi stage planning trees? How are you managing project-specific expectations? What type of testing gives better results in guiding the model? What kind of issues are more common for which languages?
Correct prompting is what makes the difference these days in tasks like SWE-bench Verified.
sothatsit
I feel like there is also a very high ceiling to how much scaffolding you can produce for the agents to get them to work better. This includes custom prompts, custom CLAUDE.md files, other documentation files for Claude to read, and especially how well and quickly your linting and tests can run, and how much functionality they cover. That's not to mention MCP and getting Claude to talk to your database or open your website using Playwright, which I have not even tried yet.
For example, I have a custom planning prompt that I will give a paragraph or two of information to, and then it will produce a specification document from that by searching the web and reading the code and documentation. And then I will review that specification document before passing it back to Claude Code to implement the change.
This works because it is a lot easier to review a specification document than it is to review the final code changes. So, if I understand it and guide it towards how I would want the feature to be implemented at the specification stage, that sets me up to have a much easier time reviewing the final result as well. Because it will more closely match my own mental model of the codebase and how things should be implemented.
And it feels like that is barely scratching the surface of setting up the coding environment for Claude Code to work in.
oxidant
I do not agree it is something you can pick up in an hour. You have to learn what AI is good at, how different models code, how to prompt to get the results you want.
If anything, prompting well is akin to learning a new programming language. What words do you use to explain what you want to achieve? How do you reference files/sections so you don't waste context on meaningless things?
I've been using AI tools to code for the past year and a half (Github Copilot, Cursor, Claude Code, OpenAI APIs) and they all need slightly different things to be successful and they're all better at different things.
AI isn't a panacea, but it can be the right tool for the job.
15123123
I am also interested in how many of these skills are at the mercy of OpenAI? IIRC, 1 or 2 years ago there was an uproar from AI "artists" saying that their art was ruined by model changes (or maybe the system prompt changed).
>I do not agree it is something you can pick up in an hour.
But it's also interesting that the industry is selling the opposite (with AI anyone can code / write / draw / make music).
>You have to learn what AI is good at.
More often than not I find you need to learn what the AI is bad at, and that is not a fun experience.
sagarpatil
Yeah, you can’t do sh*t in an hour. I spend a good 6-8 hours every day using Claude Code, and I actually spend an hour every day trying new AI tools; it's a constant process.
Here's what today's task list looks like:
1. Test TRAE/Refact.ai/Zencoder: 70% on SWE-bench Verified
2. https://github.com/kbwo/ccmanager: use git worktrees to manage multiple Claude Code sessions
3. https://github.com/julep-ai/julep/blob/dev/AGENTS.md: read and implement
4. https://github.com/snagasuri/deebo-prototype: autonomous debugging agent (MCP)
5. https://github.com/claude-did-this/claude-hub: connects Claude Code to GitHub repositories.
JimDabell
> It's something you can pick up in a few minutes (or an hour if you're using more advanced tooling, mostly spending it setting things up).
This doesn’t give you any time to experiment with alternative approaches. It’s equivalent to saying that the first approach you try as a beginner will be as good as it possibly gets, that there’s nothing at all to learn.
__MatrixMan__
It definitely takes more than minutes to discover the ways that your model is going to repeatedly piss you off and set up guardrails to mitigate those problems.
dingnuts
> You might prefer manual coding, but you might just be bad at AI coding and you might prefer it if you improved at it.
ok but how much am I supposed to spend before I supposedly just "get good"? Because based on the free trials and the pocket change I've spent, I don't consider the ROI worth it.
qinsig
Avoid using agents that can just blow through money (Cline, Roo Code, Claude Code with an API key, etc.).
Instead, you can get comfortable prompting and managing context with Aider.
Or you can use Claude Code with a Pro subscription, which gives a fair amount of usage.
I agree that watching the tools waste several dollars just to make a mess you need to discard is frustrating.
goalieca
And how often do your prompting skills change as the models evolve?
stray
You're going to spend a little over $1k to ramp up your skills with AI-aided coding. It's dirt cheap in the grand scheme of things.
viraptor
Not even close. I'm still under $100, creating full apps. Stick to reasonable models and you can achieve and learn a lot. You don't need the latest and greatest in max mode (or whatever the new one calls it) for the majority of tasks. You don't have to throw the whole project at the service every time either.
dingnuts
Do I get a refund if I spend a grand and I'm still not convinced? At some point I'm going to start lying to myself to justify the cost, and I don't know how much y'all earn, but $1k is getting close.
asciimov
How are those without that kind of scratch supposed to keep up with those that do?
badsectoracula
It won't be the hippest of solutions, but you can use something like Devstral Small with a fully open source setup to start experimenting with local LLMs and a bunch of tools - or just chat with it through a chat interface. A while ago I ping-ponged between Devstral running as a chat interface and my regular text editor to make a toy raytracer project [0] (output) [1] (code).
While it wasn't the fanciest integration (nor the best of codegen), it was good enough to "get going" (the loop was to ask the LLM to do something, do something else myself in the background, then fix and merge the changes it made - even though I often had to fix stuff[2], sometimes it was less of a hassle than if I had to start from scratch[3]).
It can give you a vague idea that with more dedicated tooling (i.e. something that does automatically what you'd do by hand[4]) you could do more interesting things (combining it with some sort of LSP functionality to pass function bodies to the LLM would also help), though personally I'm not a fan of the "dedicated editor" approach that seems common and I think something more LSP-like (especially if it can also work with existing LSPs) would be neat.
IMO it can be useful for a bunch of boilerplate-y or boring work. The biggest issue I can see is that the context is too small to include everything (imagine, e.g., throwing the entire Blender source code at an LLM, which I don't think even the largest of cloud-hosted LLMs can handle), so there needs to be some external way to store stuff dynamically, but also for the LLM to know that the external stuff is available, look it up and store to it if needed. Not sure how exactly that'd work, though, to the extent where you could - say - open up a random Blender source file, point to a function, ask the LLM to make a modification, have it reuse any existing functions in the codebase where appropriate (without you pointing them out) and then, if needed, have the LLM also update the code where the function you modified is used (e.g. if you added/removed some argument or changed the semantics of its use).
[0] https://i.imgur.com/FevOm0o.png
[1] https://app.filen.io/#/d/e05ae468-6741-453c-a18d-e83dcc3de92...
[2] e.g. when I asked it to implement a BVH to speed things up, it made something that wasn't hierarchical and actually slowed things down
[3] the code it produced for [2] was fixable into a simple BVH
[4] for a larger project I wrote a script that `cat`ed and `xclip`ed a bunch of header files to pass to the LLM so it knew the available functions; each function had a single-line comment about what it does, and when the LLM wrote new functions it also added that comment. 99% of these one-liner comments were written by the LLM, actually.
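As a rough illustration of the script described in [4], a TypeScript/Node sketch of the same idea might look like this (the original was shell with cat and xclip; the "src" directory and ".h" extension are assumptions):
```
// Concatenate header files to stdout so the output can be piped to a
// clipboard tool, e.g.: node collect-headers.js | xclip -selection clipboard
// The "src" root and ".h" extension are assumptions for illustration.
import { readdirSync, readFileSync } from "node:fs";
import { join } from "node:path";

const root = "src";
const headers = readdirSync(root, { recursive: true })
  .map(String)
  .filter((p) => p.endsWith(".h"));

for (const rel of headers) {
  const path = join(root, rel);
  console.log(`// ===== ${path} =====`);
  console.log(readFileSync(path, "utf8"));
}
```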
grogenaut
how much time did you spend learning your last language to become comfortable with it?
marssaxman
So far as I can tell, generative AI coding tools make the easy part of the job go faster, without helping with the hard part of the job - in fact, possibly making it harder. Coding just doesn't take that much time, and I don't need help doing it. You could make my coding output 100x faster without materially changing my overall productivity, so I simply don't bother to optimize there.
lexandstuff
Great article. The other thing that you miss out on when you don't write the code yourself is that sense of your subconscious working for you. Writing code has a side benefit of developing a really strong mental model of a problem, that kinda gets embedded in your neurons and pays dividends down the track, when doing stuff like troubleshooting or deciding on how to integrate a new feature. You even find yourself solving problems in your sleep.
I haven't observed any software developers operating at even a slight multiplier from the pre-LLM days at the organisations I've worked at. I think people are getting addicted to not having to expend brain energy to solve problems, and they're mistaking that for productivity.
nerevarthelame
> I think people are getting addicted to not having to expend brain energy to solve problems, and they're mistaking that for productivity.
I think that's a really elegant way to put it. Google Research tried to measure LLM impacts on productivity in 2024 [1]. They gave their subjects an exam and assigned them different resources (a book versus an LLM). They found that the LLM users actually took more time to finish than those who used a book, and that only novices on the subject material actually improved their scores when using an LLM.
But the participants also perceived that they were more accurate and efficient using the LLM, when that was not the case. The researchers suggested that it was due to "reduced cognitive load" - asking an LLM something is easy and mostly passive. Searching through a book is active and can feel more tiresome. Like you said: people are getting addicted to not having to expend brain energy to solve problems, and mistaking that for productivity.
[1] https://storage.googleapis.com/gweb-research2023-media/pubto...
wiseowise
You’re twisting the results. Just because they took more time doesn't mean their productivity went down. On the contrary, if you can perform an expert task with much less mental effort (which 99% of orgs should prioritize), that is an absolute win. Work is an extremely mentally draining and soul-crushing experience for the majority of people; if AI can lower that while maintaining roughly the same result, with subjects allocating only, say, 25% of their mental energy – that's an amazing win.
didibus
If I follow what you are saying: employers won't see any benefit, but employees, while producing the same output in the same amount of time, will be able to do so with less mental strain?
Personally, I don't know if this is always a win, mostly because I enjoy the creative and problem solving aspect of coding, and reducing that to something that is more about prompting, correcting, and mentoring an AI agent doesn't bring me the same satisfaction and joy.
tptacek
I'm fine with anybody saying AI agents don't work for their work-style and am not looking to rebut this piece, but I'm going to take this opportunity to call something out.
The author writes "reviewing code is actually harder than most people think. It takes me at least the same amount of time to review code not written by me than it would take me to write the code myself". That sounds within an SD of true for me, too, and I had a full-time job close-reading code (for security vulnerabilities) for many years.
But it's important to know that when you're dealing with AI-generated code for simple, tedious, or rote tasks --- what they're currently best at --- you're not on the hook for reading the code that carefully, or at least, not on the same hook. Hold on before you jump on me.
Modern Linux kernels allow almost-arbitrary code to be injected at runtime, via eBPF (which is just a C program compiled to an imaginary virtual RISC). The kernel can mostly reliably keep these programs from crashing the kernel. The reason for that isn't that we've solved the halting problem; it's that eBPF doesn't allow most programs at all --- for instance, it must be easily statically determined that any backwards branch in the program runs for a finite and small number of iterations. eBPF isn't even good at determining that condition holds; it just knows a bunch of patterns in the CFG that it's sure about and rejects anything that doesn't fit.
That's how you should be reviewing agent-generated code, at least at first; not like a human security auditor, but like the eBPF verifier. If I so much as need to blink when reviewing agent output, I just kill the PR.
If you want to tell me that every kind of code you've ever had to review is equally tricky to review, I'll stipulate to that. But that's not true for me. It is in fact very easy to me to look at a rote recitation of an idiomatic Go function and say "yep, that's what that's supposed to be".
kenjackson
I can read code much faster than I can write it.
This might be the defining line for Gen AI - people who can read code faster than they write it will find it useful, and those who write faster than they can read won't use it.
globnomulous
> I can read code much faster than I can write it.
I have known and worked with many, many engineers across a wide range of skill levels. Not a single one has ever said or implied this, and in not one case have I ever found it to be true, least of all in my own case.
I don't think it's humanly possible to read and understand code faster than you can write and understand it to the same degree of depth. The brain just doesn't work that way. We learn by doing.
autobodie
I think that's wrong. I only have to write code once, maybe twice. But when using AI agents, I have to read many (5? 10? I will always give up before 15) PRs before finding one close enough that I won't have to rewrite all of it. This nonsense has not saved me any time, and the process is miserable.
I also haven't found any benefit in aiming for smaller or larger PRs. The aggregate efficiency seems to even out, because smaller PRs are easier to weed through but they are not less likely to be trash.
kenjackson
I only generate the code once with GenAI and typically fix a bug or two - or at worst use its structure. Rarely do I toss a full PR.
It’s interesting some folks can use them to build functioning systems and others can’t get a PR out of them.
smaudet
I guess my challenge is that "if it was a rote recitation of an idiomatic go function", was it worth writing?
There is a certain style, let's say, of programming that encourages highly non-reusable code that is at once boring and tedious, and impossible to maintain, and thus not especially worthwhile.
The "rote code" could probably have been expressed, succinctly, in terms that border on "plain text", but with more rigueur de jour, with less overpriced, wasteful, potentially dangerous models in-between.
And yes, machines like the eBPF verifier must follow strict rules to cut out the chaff, of which there is quite a lot, but it neither follows that we should write everything in eBPF, nor does it follow that because something can throw out the proverbial "garbage", that makes it a good model to follow...
Put another way, if it was that rote, you likely didn't need nor benefit from the AI to begin with, a couple well tested library calls probably sufficed.
tptacek
Yes. More things should be rote recitations. Rote code is easy to follow and maintain. We get in trouble trying to be clever (or DRY) --- especially when we do it too early.
Important tangential note: the eBPF verifier doesn't "cut out the chaff". It rejects good, valid programs. It does not care that the programs are valid or good; it cares that it is not smart enough to understand them; that's all that matters. That's the point I'm making about reviewing LLM code: you are not on the hook for making it work. If it looks even faintly off, you can't hurt the LLM's feelings by killing it.
smaudet
> We get in trouble trying to be clever (or DRY)
Certainly, however:
> That's the point I'm making about reviewing LLM code: you are not on the hook for making it work
The second portion of your statement is either confusing (something unsaid) or untrue (you are still ultimately on the hook).
Agentic AI is just yet another, as you put it way to "get in trouble trying to be clever".
My previous point stands - if it was that cut and dry, then a (free) script/library could generate the same code. If your only real use of AI is to replace template systems, congratulations on perpetuating the most over-engineered template system ever. I'll stick with a provable, free template system, or just not write the code at all.
stitched2gethr
Why would you review agent generated code any differently than human generated code?
tptacek
Because you don't care about the effort the agent took and can just ask for a do-over.
112233
This is a radical and healthy way to do it. Obviously wrong — reject. Obviously right — accept. In any other case — also reject, as non-obvious.
I guess it is far removed from the advertised use case. Also, I feel one would be better off having LLM-powered autocomplete in this case.
tptacek
I don't find this to be the case. I've used (and hate) autocomplete-style LLM code generation. But I can feed 10 different tasks to Codex in the morning and come back and pick out the 3-4 I think might be worth pursuing, and just re-prompt the 7 I kill. That's nothing like interactive autocomplete, and drastically faster than I could work without LLM assistance.
bluefirebrand
> Obviously right — accept.
I don't think code is ever "obviously right" unless it is trivially simple
monero-xmr
I mostly just approve PRs because I trust my engineers. I have developed a 6th sense for thousand-line PRs and knowing which 100-300 lines need careful study.
Yes I have been burned. But 99% of the time, with proper test coverage it is not an issue, and the time (money) savings have been enormous.
"Ship it!" - me
theK
I think this points out the crux of the difference between collaborating with other devs and collaborating with an AI. The article correctly states that the AI will never learn your preferences or the idiosyncrasies of the specific project/company etc., because it is effectively amnesic. You cannot trust the AI the way you trust other known collaborators, because you don't have a real relationship with it.
autobodie
Haha, doing this with AI will bury you in a very deep hole.
jumploops
> It takes me at least the same amount of time to review code not written by me than it would take me to write the code myself, if not more.
As someone who uses Claude Code heavily, this is spot on.
LLMs are great, but I find the more I cede control to them, the longer it takes to actually ship the code.
I’ve found that the main benefit for me so far is the reduction of RSI symptoms, whereas the actual time savings are mostly exaggerated (even if it feels faster in the moment).
adriand
Do you have to review the code? I’ll be honest that, like the OP theorizes, I often just spot review it. But I also get it to write specs (often very good, in terms of the ones I’ve dug into), and I always carefully review and test the results. Because there is also plenty of non-AI code in my projects I didn’t review at all, namely, the myriad open source libraries I’ve installed.
jumploops
Yes, I’m actually working on an another project with the goal of never looking at the code.
For context, it’s just a reimplementation of a tool I built.
Let’s just say it’s going a lot slower than the first time I built it by hand :)
hatefulmoron
It depends on what you're doing. If it's a simple task, or you're making something that won't grow into something larger, eyeballing the code and testing it is usually perfect. These types of tasks feel great with Claude Code.
If you're trying to build something larger, it's not good enough. Even with careful planning and spec building, Claude Code will still paint you into a corner when it comes to architecture. In my experience, it requires a lot of guidance to write code that can be built upon later.
The difference between the AI code and the open source libraries in this case is that you don't expect to be responsible for the third-party code later. Whether you or Claude ends up working on your code later, you'll need it to be in good shape. So, it's important to give Claude good guidance to build something that can be worked on later.
sagarpatil
I always use Claude Code to debug issues; there’s no point in trying to do this yourself when AI can fix it in minutes (easy to verify if you write tests first). o3 with the new search can do things in 5 minutes that would take me at least 30 minutes if I’m very efficient. Say what you want, but the time savings are real.
susshshshah
How do you know what tests to write if you don’t understand the code?
adastra22
I’m not sure I understand this statement. You give your program parameters X and expect result Y, but instead get Z. There is your test, embedded in the problem statement.
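To make that concrete, here is a minimal sketch using Node's built-in test runner; `formatPrice` and its module path are hypothetical stand-ins for the "parameters X, expected Y, got Z" framing:
```
// The problem statement is the test: given X, expect Y (the buggy code
// currently returns Z). `formatPrice` and "./price.js" are hypothetical.
import test from "node:test";
import assert from "node:assert/strict";
import { formatPrice } from "./price.js";

test("formats 1234.5 as $1,234.50", () => {
  assert.equal(formatPrice(1234.5), "$1,234.50");
});
```
The test pins the expected behaviour without requiring you to understand how the fix is implemented.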
9rx
Same way you normally would? Tests are concerned with behaviour. The code that implements the behaviour is immaterial.
wiseowise
How do you do TDD without having code in the first place? How does QA verify without reading the source?
cbsmith
There's an implied assumption here that code you write yourself doesn't need to be reviewed from a context different from the author's.
There's an old expression: "code as if your work will be read by a psychopath who knows where you live" followed by the joke "they know where you live because it is future you".
Generative AI coding just forces the mindset you should have had all along: start with acceptance criteria, figure out how you're going to rigorously validate correctness (ideally through regression tests more than code reviews), and use the review process to come up with consistent practices (which you then document so that the LLM can refer to it).
It's definitely not always faster, but waking up in the morning to a well documented PR, that's already been reviewed by multiple LLMs, with successfully passing test runs attached to it sure seems like I'm spending more of my time focused on what I should have been focused on all along.
Terr_
There's an implied assumption here that developers who end up spending all their time reviewing LLM code won't lose their skills or become homicidal. :p
cbsmith
Fair enough. ;-)
I'm actually curious about the "lose their skills" angle though. In the open source community it's well understood that if anything reviewing a lot of code tends to sharpen your skills.
mleonhard
I solved my RSI symptoms by keeping my arms warm all the time, while awake or asleep. Maybe that will work for you, too?
jumploops
My issue is actually due to ulnar nerve compression related to a plate on my right clavicle.
Years of PT have enabled me to work quite effectively and minimize the flare ups :)
hooverd
Is anybody doing cool hybrid interfaces? I don't actually want to do everything in conversational English, believe it or not.
jumploops
My workflow is to have spec files (markdown) for any changes I’m making, and then use those to keep Claude on track/pull out of the trees.
Not super necessary for small changes, but basically a must have for any larger refactors or feature additions.
I usually use o3 for generating the specs; also helpful for avoiding context pollution with just Claude Code.
adastra22
I do similar and find that this is the best compromise that I have tried. But I still find myself nodding along with OP. I am more and more finding that this is not actually faster, even though it certainly seems so.
bdamm
Isn't that what Windsurf or Cursor are?
roxolotl
> But interns learn and get better over time. The time that you spend reviewing code or providing feedback to an intern is not wasted, it is an investment in the future. The intern absorbs the knowledge you share and uses it for new tasks you assign to them later on.
This is the piece that confuses me about the comparison to a junior or an intern. Humans learn about the business, the code, the history of the system. And then they get better. Of course there’s a world where agents can do that, and some of the readme/doc solutions do that but the limitations are still massive and so much time is spent reexplaining the business context.
viraptor
You don't have to re-explain the business context. Save it to the mdc file if it's important. The added benefit is that the next real person looking at the code can also use that to learn - it's actually cool that having good, up-to-date documentation is now an asset.
adastra22
Do you find your agent actually respecting the mdc file? I don’t.
viraptor
There should be no difference between the mdc and the text in the prompt. Try something drastic like "All of responses should be in Chinese". If it doesn't happen, they're not included correctly. Otherwise, yeah, they work modulo the usual issues of prompt adherence.
xarope
I think this is how certain LLMs end up with 14k worth of system prompts
Terr_
"Be fast", "Be Cheap", "Be Good".
*dusts off hands* Problem solved! Man, am I great at management or what?
freeone3000
Put the business context in the system prompt.
dvt
I'm actually quite bearish on AI in the generative space, but even I have to admit that writing boilerplate is "N" times faster using AI (use your favorite N). I hate when people claim this without any proof, so literally today this is what I asked ChatGPT:
write a stub for a react context based on this section (which will function as a modal):
```
<section>
// a bunch of stuff
</section>
```
Worked great, it created a few files (the hook, the provider component, etc.), and I then added them to my project. I've done this a zillion times, but I don't want to do it again, it's not interesting to me, and I'd have to look up stuff if I messed it up from memory (which I likely would, because provider/context boilerplate sucks).
Now, I can just do `const myModal = useModal(...)` in all my components. Cool. This saved me at least 30 minutes, and 30 minutes of my time is worth way more than 20 bucks a month. (N.B.: All this boilerplate might be a side effect of React being terrible, but that's beside the point.)
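For context, a minimal sketch of the kind of context/provider/hook boilerplate being described (the commenter's useModal apparently takes arguments; this simplified no-argument variant is only meant to show the shape):
```
// Minimal modal context sketch: a provider that renders modal content
// into a <section>, plus a useModal hook to open/close it.
import { createContext, useContext, useState, ReactNode } from "react";

type ModalContextValue = {
  open: (content: ReactNode) => void;
  close: () => void;
};

const ModalContext = createContext<ModalContextValue | null>(null);

export function ModalProvider({ children }: { children: ReactNode }) {
  const [content, setContent] = useState<ReactNode>(null);
  const value = { open: setContent, close: () => setContent(null) };
  return (
    <ModalContext.Provider value={value}>
      {children}
      {content && <section className="modal">{content}</section>}
    </ModalContext.Provider>
  );
}

export function useModal() {
  const ctx = useContext(ModalContext);
  if (!ctx) throw new Error("useModal must be used inside ModalProvider");
  return ctx;
}
```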
Winsaucerer
This kind of thing is my main use: boilerplate stuff, and scripts that I don't care about -- e.g., if I need a quick bash script to do a one-off task.
For harder problems, my experience is that it falls over, although I haven't been refining my LLM skills as much as some do. It seems that the bigger the project, the more it integrates with other things, the worse AI is. And moreover, for those tasks it's important for me or a human to do it because (a) we think about edge cases while we work through the problem intellectually, and (b) it gives us a deep understanding of the system.
didibus
You could argue that AI-generated code is a black box, but let's adjust our perspective here. When was the last time you thoroughly reviewed the source code of a library you imported? We already work with black boxes daily as we evaluate libraries by their interfaces and behaviors, not by reading every line.
The distinction isn't whether code comes from AI or humans, but how we integrate and take responsibility for it. If you're encapsulating AI-generated code behind a well-defined interface and treating it like any third party dependency, then testing that interface for correctness is a reasonable approach.
The real complexity arises when you have AI help write code you'll commit under your name. In this scenario, code review absolutely matters because you're assuming direct responsibility.
I'm also questioning whether AI truly increases productivity or just reduces cognitive load. Sometimes "easier" feels faster but doesn't translate to actual time savings. And when we do move quicker with AI, we should ask if it's because we've unconsciously lowered our quality bar. Are we accepting verbose, oddly structured code from AI that we'd reject from colleagues? Are we giving AI-generated code a pass on the same rigorous review process we expect for human written code? If so, would we see the same velocity increases from relaxing our code review process amongst ourselves (between human reviewers)?
materielle
I’m not sure that the library comparison really works.
Libraries are maintained by other humans, who stake their reputation on the quality of the library. If a library gets a reputation of having a lax maintainer, the community will react.
Essentially, a chain of responsibility, where each link in the chain has an incentive to behave well else they be replaced.
Who is accountable for the code that AI writes?
bluefirebrand
> When was the last time you thoroughly reviewed the source code of a library you imported?
Doesn't matter, I'm not responsible for maintaining that particular code
The code in my PRs has my name attached, and I'm not trusting any LLM with my name
didibus
Exactly, that's what I'm saying. Commit AI code under its own name. Then the code under your name can use the AI code as a black box. If your code that uses AI code works as expected, it is similar to when using libraries.
If you consider that AI code is not code any human needs to read or later modify by hand (AI code gets modified by AI), then all you need to do is fully test it; if it all works, it's good. Now you can call into it from your own code.
adastra22
These days, I review external dependencies pretty thoroughly. I did not use to. This is because of AI slop though.
danieltanfh95
AI models are fundamentally trained on patterns from existing data - they learn to recognize and reproduce successful solution templates rather than derive solutions from foundational principles. When faced with a problem, the model searches for the closest match in its training experience rather than building up from basic assumptions and logical steps.
Human experts excel at first-principles thinking precisely because they can strip away assumptions, identify core constraints, and reason forward from fundamental truths. They might recognize that a novel problem requires abandoning conventional approaches entirely. AI, by contrast, often gets anchored to what "looks similar" and applies familiar frameworks even when they're not optimal.
Even when explicitly prompted to use first-principles analysis, AI models can struggle because:
- They lack the intuitive understanding of when to discard prior assumptions
- They don't naturally distinguish between surface-level similarity and deep structural similarity
- They're optimized for confident responses based on pattern recognition rather than uncertain exploration from basics
This is particularly problematic in domains requiring genuine innovation or when dealing with edge cases where conventional wisdom doesn't apply.
Context poisoning, intended or not, is a real problem that humans are able to solve relatively easily while current SotA models struggle.
adastra22
So are people. People are trained on existing data and learn to reproduce known solutions. They also take this to the meta level—a scientist or engineer is trained on methods for approaching new problems which have yielded success in the past. AI does this too. I’m not sure there is actually a distinction here.
animex
I write mostly boilerplate and I'd rather have the AI do it. The AI is also slow, which is great, which allows me to run 2 or 3 AI workspaces working on different tickets/problems at the same time.
Where AI especially excels is helping me do maintenance tickets on software I rarely touch (or sometimes never have touched). It can quickly read the codebase, and together we can quickly arrive at the place where the patch/problem lies and quickly correct it.
I haven't written anything "new" in terms of code in years, so I'm not really learning anything from coding manually but I do love solving problems for my customers.
> Another common argument I've heard is that Generative AI is helpful when you need to write code in a language or technology you are not familiar with. To me this also makes little sense.
I'm not sure I get this one. When I'm learning new tech I almost always have questions. I used to google them. If I couldn't find an answer I might try posting on Stack Overflow. Sometimes, as I was typing the question, their search would finally kick in and find the answer (similar questions). Other times I'd post the question and, if it didn't get closed, maybe I'd get an answer a few hours or days later.
Now I just ask ChatGPT or Gemini and more often than not it gives me the answer. That alone and nothing else (agent modes, AI editing or generating files) is enough to increase my output. I get answers 10x faster than I used to. I'm not sure what that has to do with the point about learning. Getting answers to those question is learning, regardless of where the answer comes from.