Human coders are still better than LLMs

mattnewton

This matches my experience. I actually think a fair amount of the value I get from LLM assistants is having a reasonably intelligent rubber duck to talk to. Now the duck can occasionally disagree and sometimes even refine my thinking.

https://en.m.wikipedia.org/wiki/Rubber_duck_debugging

I think the big question everyone wants to skip past this conversation and get right to is: will this continue to be true 2 years from now? I don’t know how to answer that question.

Buttons840

LLMs aren't my rubber duck, they're my wrong answer.

You know that saying that the best way to get an answer online is to post a wrong answer? That's what LLMs do for me.

I ask the LLM to do something simple but tedious, and then it does it spectacularly wrong, then I get pissed off enough that I have the rage-induced energy to do it myself.

Buttons840

I'm probably suffering from undiagnosed ADHD, and will get stuck and spend minutes picking a function name and then writing a docstring. LLMs do help with this even if they get the code wrong, because I usually won't bother to fix their variable names or docstrings unless needed. LLMs can reliably solve the problem of a blank page.

linotype

This. I have ADHD and starting is the hardest part for me. With an LLM it gets me from 0 to 20% (or more) and I can nail it for the rest. It’s way less stressful for me to start now.

albrewer

> LLMs can reliably solve the problem of a blank page.

This has been the biggest boost for me. The number of choices available when facing a blank page is staggering. Even a bad/wrong implementation helps collapse those possibilities into a countable few that take far less time to think about.

msgodel

Yeah, keeping me in the flow when I hit one of those silly tasks my brain just randomly says "no let's do something else" to has been the main productivity improving feature of LLMs.

carlmr

So much this. The blank-page problem is almost gone, even if the output is riddled with errors.

materiallie

This is my experience, too. As a concrete example, I'll need to write a mapper function to convert between a protobuf type and a Go type. The types are mirror images of each other, and I feed the complete APIs of both into my prompt.

I've yet to find an LLM that can reliably generate mapping code from proto.Foo{ID string} to gomodel.Foo{ID string}.
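
For a sense of scale, this is the kind of mapper in question, as a minimal sketch with hypothetical stand-in types (the real ones just have more fields):

  // pbFoo and modelFoo are hypothetical stand-ins for the protobuf-generated
  // type and the hand-written Go model; the mapping is purely mechanical.
  package mapper

  type pbFoo struct{ ID, Name string }    // stand-in for proto.Foo
  type modelFoo struct{ ID, Name string } // stand-in for gomodel.Foo

  func fooFromProto(p *pbFoo) modelFoo {
      return modelFoo{ID: p.ID, Name: p.Name}
  }

  func fooToProto(f modelFoo) *pbFoo {
      return &pbFoo{ID: f.ID, Name: f.Name}
  }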

It still saves me time, because even 50% accuracy is still half the code I don't have to write myself.

But it makes me feel like I'm taking crazy pills whenever I read about AI hype. I'm open to the idea that I'm prompting wrong, need a better workflow, etc. But I'm not a luddite, I've "reached up and put in the work" and am always trying to learn new tools.

lazyasciiart

An LLM's ability to do a task is roughly correlated with the number of times that task has been done on the internet before. If you want to see the hype version, you need to be writing a todo web app in typescript or similar. So it's probably not something you can fix with prompts, but having a model with more focus on relevant training data might help.

akoboldfrying

This honestly seems like something that could be better handled with pre-LLM technology, like a 15-line Perl script that reads one on stdin, applies some crufty regexes, and writes the other to stdout. Are there complexities I'm not seeing?
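
Roughly this sort of thing, sketched here in Go rather than Perl (the type and function names are invented for the example): read the struct's field declarations on stdin and print the mapper. Proto naming rules (snake_case to CamelCase, oneofs, optional fields) are probably the complexities that would bite.

  // Reads lines like "ID string" or "Name string `json:"name"`" on stdin and
  // prints a fooFromProto mapper to stdout. A sketch, not a real codegen tool.
  package main

  import (
      "bufio"
      "fmt"
      "os"
      "regexp"
  )

  func main() {
      // Capture the exported field name at the start of each declaration line.
      field := regexp.MustCompile(`^\s*([A-Z]\w*)\s+\S+`)

      fmt.Println("func fooFromProto(p *pb.Foo) gomodel.Foo {")
      fmt.Println("\treturn gomodel.Foo{")
      sc := bufio.NewScanner(os.Stdin)
      for sc.Scan() {
          if m := field.FindStringSubmatch(sc.Text()); m != nil {
              fmt.Printf("\t\t%s: p.%s,\n", m[1], m[1])
          }
      }
      fmt.Println("\t}")
      fmt.Println("}")
  }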

Affric

Yep.

I like maths, I hate graphing. Tedious work even with state of the art libraries and wrappers.

LLMs do it for me. Praise be.

lanstin

Yeah, I write a lot of little data analysis scripts and stuff, and I am happy just to read the numbers, but now I get nice PNGs of the distributions and so on from the LLM, and people like that.

bsder

LLMs are a decent search engine a la Google circa 2005.

It's been 20 years since that, so I think people have simply forgotten that a search engine can actually be useful as opposed to ad infested SEO sewage sludge.

The problem is that the conversational interface, for some reason, seems to turn off the natural skepticism that people have when they use a search engine.

AdieuToLogic

> LLMs are a decent search engine a la Google circa 2005.

Statistical text (token) generation made from an unknown (to the user) training data set is not the same as a keyword/faceted search of arbitrary content acquired from web crawlers.

> The problem is that the conversational interface, for some reason, seems to turn off the natural skepticism that people have when they use a search engine.

For me, my skepticism of using a statistical text generation algorithm as if it were a search engine is because a statistical text generation algorithm is not a search engine.

pjmlp

Except a search engine isn't voice controlled, and can't write code for me.

Recently I did some tests with coding agents: being able to translate a full application from AT&T Assembly into Intel Assembly compatible with NASM, in about half an hour of talking with the agent, and having the end result actually work with minor tweaks, isn't something a "decent search engine a la Google circa 2005" would ever have been able to achieve.

In the past I would have given such a task to a junior dev or intern, to keep them busy somehow, with a bit more tool maturity I have no reason to do it in the future.

And this is the point many developers haven't yet grasped about their future in the job market.

andrekandre

  > the conversational interface, for some reason, seems to turn off the natural skepticism that people have
n=1 but after having chatgpt "lie" to me more than once i am very skeptical of it and always double check it, whereas something like tv or yt videos i still find myself being click-baited or grifted (iow less skeptical) much more easily still... any large studies about this would be very interesting...

therealpygon

LLMs follow instructions. Garbage in = garbage out, generally. When attention is managed, the problem is well defined, and the necessary materials are available to it, they can perform rather well. On the other hand, I find a lot of the loosey-goosey vibe coding approach to be useless; it gives a lot of false impressions about how useful LLMs can be, both too positive and too negative.

GiorgioG

So what you’re saying is you need to be very specific and detailed when writing your specifications for the LLM to spit out the code you want. Sounds like I can just skip the middle man and code it myself.

AndrewKemendo

This seems to be what’s happened

People are expecting perfection from bad spec

Isn’t that what engineers are (rightfully) always complaining about to BD?

gpm

This hasn't been my experience (using the latest Claude and Gemini models). They'll produce poor code even when given a well defined, easily achievable task with specific instructions. The code will usually more or less work with today's models, but it will do things like call a function to recreate a value that is already stored in a local variable... (and worse issues crop up the more design work you leave to the LLM, even dead simple design work with really only one good answer)

I've definitely also found that the poor code can sometimes be a nice starting place. One thing I think it does for me is make me fix it up until it's actually good, instead of writing the first thing that comes to mind and declaring it good enough (after all, my poorly written first draft is of course perfect). In contrast to the usual view of AI-assisted coding, I think this style of programming for tedious tasks makes me "less productive" (I take longer) but produces better code.

geraneum

> LLMs follow instructions.

Not really, not always. To anyone who’s used the latest LLMs extensively, it’s clear that this is not something you can reliably assume even with the constraints you mentioned.

troupo

> LLMs follow instructions.

They don't

> Garbage in = garbage out generally.

Generally, this statement is false

> When attention is managed and a problem is well defined and necessary materials are available to it, they can perform rather well.

Keyword: can.

They can also not perform really well despite all the management and materials.

They can also work really well with loosey-goosey approach.

The reason is that they are non-deterministic systems whose performance is affected more by compute availability than by your unscientific random attempts at reverse engineering their behavior https://dmitriid.com/prompting-llms-is-not-engineering

myvoiceismypass

They should maybe have a verifiable specification for said instructions. Kinda like a programming language maybe!

otabdeveloper4

> LLMs follow instructions.

No they don't, they generate a statistically plausible text response given a sequence of tokens.

xarope

I have to upvote this, because this is how I felt after trying three times (times I consciously decided to give an LLM a try, versus having it shoved down my throat by google/ms/meta/etc) and giving up (for now).

seattle_spring

This has been my experience as well. The biggest problem is that the answers look plausible, and only after implementation and experimentation do you find them to be wrong. If this happened every once in a while then it wouldn't be a big deal, but I'd guess that more than half of the answers and tutorials I've received through ChatGPT have ended up being plain wrong.

God help us if companies start relying on LLMs for life-or-death stuff like insurance claim decisions.

dabraham1248

I'm not sure if you're being sarcastic, but in case you're not... From https://arstechnica.com/health/2023/11/ai-with-90-error-rate...

"UnitedHealth uses AI model with 90% error rate to deny care, lawsuit alleges" Also "The use of faulty AI is not new for the health care industry."

pwdisswordfishz

> If this happened every once in a while then it wouldn't be a big deal, but I'd guess that more than half of the answers and tutorials I've received through ChatGPT have ended up being plain wrong.

It would actually have been more pernicious that way, since it would lull people into a false sense of security.

AndrewKemendo

Out of curiosity, can you give me an example of prompts you've used and been disappointed by?

I see these comments all the time and they don't reflect my experience, so I'm curious what your experience has been.

anonzzzies

There are so many examples where all the current top models will just loop forever, even if you literally give them the code. We know many of them, but for instance in a tailwind react project with some degree of complexity (nested components), if you ask for something to scroll in its space, it will never figure out min-h-0, even if you tell it. It will just loop forever rewriting the code, adding and removing things, to the point of putting comments like 'This will add overflow' and writing js to force scroll, and it will never work even if you literally tell it what to do. Don't know why; all big and small models have this, and I found Gemini is currently the only model that sometimes randomly has the right idea but then still cannot resolve it. For this we went back from tailwind to global vanilla css, which, I never thought I would say, is rather nice.

Buttons840

I asked ChatGPT 4o to write an Emacs function to highlight a line. This involves setting the "mark" at the beginning and the "point" at the end. It would only set the point, so I corrected it: "no, you have to set both", but even after the correction it would move the point to the beginning and then move the point again to the end, without ever touching the mark.

marcosdumay

It's a damn assertive duck, completely out of proportion to its competence.

I've seen enough people led astray by talking to it.

foxyv

Same here. When I'm teaching coding I've noticed that LLMs will confuse the heck out of students. They will accept what it suggests without realizing that it is suggesting nonsense.

cogogo

I’m self taught and don’t code that much but I feel like I benefit a ton from LLMs giving me specific answers to questions that would take me a lot of time to figure out with documentation and stack overflow. Or even generating snippets that I can evaluate whether or not will work.

But I actually can’t imagine how you can teach someone to code if they have access to an LLM from day one. It’s too easy to take the easy route and you lose the critical thinking and problem solving skills required to code in the first place and to actually make an LLM useful in the second. Best of luck to you… it’s a weird time for a lot of things.

*edit them/they

XorNot

This was what promptly led me to turn off the JetBrains AI assistant: the multiline completion was incredibly distracting to my chain of thought, particularly when it would suggest things that looked right but weren't. Stopping and parsing the suggestion to realize whether it was right or wrong would completely kill my flow.

qwertox

What one would expect if they can't read the code because they haven't learned to code.

TBF, trial and error has usually been my path as well, it's just that I was generating the errors so I would know where to find them.

chucksmash

Tbf, there's a phase of learning to code where everything is pretty much an incantation you learn because someone told you "just trust me." You encounter "here's how to make the computer print text in Python" before you would ever discuss strings or defining and invoking functions, for instance. To get your start you kind of have to just accept some stuff uncritically.

It's hard to remember what it was like to be in that phase. Once simple things like using variables are second nature, it's difficult to put yourself back into the shoes of someone who doesn't understand the use of a variable yet.

klntsky

I would argue that they are never led astray by chatting, but rather by accepting the projection of their own prompt passed through the model as some kind of truth.

When talking with reasonable people, they have an intuition of what you want even if you don't say it, because there is a lot of non-verbal context. LLMs lack the ability to understand the person, but behave as if they had it.

marcosdumay

Most of the times, people are led astray by following average advice on exceptional circumstances.

People with a minimum amount of expertise stop asking for advice for average circumstances very quickly.

sigmoid10

It's mostly a question of experience. I've been writing software long enough that when I give chat models some code and a problem, I can immediately tell if they understood it or if they got hooked on something unrelated. But junior devs will have a hell of a hard time, because the raw code quality that LLMs generate is usually top notch, even if the functionality is completely off.

traceroute66

> When talking with reasonable people

When talking with reasonable people, they will tell you if they don't understand what you're saying.

When talking with reasonable people, they will tell you if they don't know the answer or if they are unsure about their answer.

LLMs do none of that.

They will very happily, and very confidently, spout complete bullshit at you.

It is essentially a lotto draw as to whether the answer is hallucinated, completely wrong, subtly wrong, not ideal, sort of right or correct.

An LLM is a bit like those spin the wheel game shows on TV really.

prisenco

I use it as a rubber duck but you're right. Treat it like a brilliant idiot and never a source of truth.

I use it for what I'm familiar with but rusty on or to brainstorm options where I'm already considering at least one option.

But a question on immunobiology? Waste of time. I have a single undergraduate biology class under my belt, I struggled for a good grade then immediately forgot it all. Asking it something I'm incapable of calling bullshit on is a terrible idea.

But rubber ducking with AI is still better than letting it do your work for you.

protocolture

I spend a lot of time working shit out to prove the rubber duck wrong and I am not completely sure this is a bad working model.

jasonm23

Try a system prompt like this:

- - -

System Prompt:

You are ChatGPT, and your goal is to engage in a highly focused, no-nonsense, and detailed way that directly addresses technical issues. Avoid any generalized speculation, tangential commentary, or overly authoritative language. When analyzing code, focus on clear, concise insights with the intent to resolve the problem efficiently. In cases where the user is troubleshooting or trying to understand a specific technical scenario, adopt a pragmatic, “over-the-shoulder” problem-solving approach. Be casual but precise—no fluff. If something is unclear or doesn’t make sense, ask clarifying questions. If surprised or impressed, acknowledge it, but keep it relevant. When the user provides logs or outputs, interpret them immediately and directly to troubleshoot, without making assumptions or over-explaining.

- - -

amelius

If this is a problem for you, just add "... and answer in the style of a drunkard" to your prompts.

all2

My typical approach is: prompt, be disgusted by the output, tinker a little on my own, prompt again -- but more specific, be disgusted again by the output, tinker a little more, etc.

Eventually I land on a solution to my problem that isn't disgusting and isn't AI slop.

Having a sounding board, even a bad one, forces me to order my thinking and understand the problem space more deeply.

suddenlybananas

Why not just write the code at that point, instead of cajoling an AI to do it?

drivenextfunc

Regarding the stubborn and narcissistic personality of LLMs (especially reasoning models), I suspect that attempts to make them jailbreak-resistant might be a factor. To prevent users from gaslighting the LLM, trainers might have inadvertently made the LLMs prone to gaslighting users.

schwartzworld

For me it's like having a junior developer work under me who knows APIs inside and out, but has no common sense about architecture. I like that I delegate tasks to them so that my brain can be free for other problems, but it makes my job much more review heavy than before. I put every PR through 3-4 review cycles before even asking my team for a review.

eslaught

How do you not completely destroy your concentration when you do this though?

I normally build things bottom up so that I understand all the pieces intimately and when I get to the next level of abstraction up, I know exactly how to put them together to achieve what I want.

In my (admittedly limited) use of LLMs so far, I've found that they do a great job of writing code, but that code is often off in subtle ways. But if it's not something I'm already intimately familiar with, I basically need to rebuild the code from the ground up to get to the point where I understand it well enough so that I can see all those flaws.

At least with humans I have some basic level of trust, so that even if I don't understand the code at that level, I can scan it and see that it's reasonable. But every piece of LLM generated code I've seen to date hasn't been trustworthy once I put in the effort to really understand it.

schwartzworld

I use a few strategies, but it's mostly the same as if I was mentoring a junior. A lot of my job already involved breaking up big features into small tickets. If the tasks are small enough, juniors and LLMs have an easier time implementing things and I have an easier time reviewing. If there's something I'm really unfamiliar with, it should be in a dedicated function backed by enough tests that my understanding of the implementation isn't required. In fact, LLMs do great with TDD!

> At least with humans I have some basic level of trust, so that even if I don't understand the code at that level, I can scan it and see that it's reasonable.

If you can't scan the code and see that it's reasonable, that's a smell. The task was too big or it's implemented the wrong way. You'd feel bad telling a real person to go back and rewrite it a different way, but the LLM has no ego to bruise.

I may have a different perspective because I already do a lot of review, but I think using LLMs means you have to do more of it. What's the excuse for merging code that is "off" in any way? The LLM did it? It takes a short time to review your code, give your feedback to the LLM and put up something actually production ready.

> But every piece of LLM generated code I've seen to date hasn't been trustworthy once I put in the effort to really understand it.

That's why your code needs tests. More tests. If you can't test it, it's wrong and needs to be rewritten.
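
For something like the mapper example upthread, "enough tests" can be as small as a table-driven test against the hypothetical stand-in types; it pins the behavior down without requiring any trust in how the LLM implemented it:

  package mapper

  import "testing"

  // fooFromProto, pbFoo and modelFoo are the made-up names from the mapper
  // sketch upthread, not a real API.
  func TestFooFromProto(t *testing.T) {
      cases := []struct {
          name string
          in   *pbFoo
          want modelFoo
      }{
          {"copies all fields", &pbFoo{ID: "42", Name: "x"}, modelFoo{ID: "42", Name: "x"}},
          {"zero value", &pbFoo{}, modelFoo{}},
      }
      for _, c := range cases {
          t.Run(c.name, func(t *testing.T) {
              if got := fooFromProto(c.in); got != c.want {
                  t.Errorf("fooFromProto(%+v) = %+v, want %+v", c.in, got, c.want)
              }
          })
      }
  }

If the LLM regenerates the mapper, the review loop is then mostly just rerunning the tests.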

xandrius

Keep using it and you'll see. Also that depends on the model and prompting.

My approach is to describe the task in great detail, which also helps me complete my own understanding of the problem, in case I hadn't considered an edge case or how to handle something specific. The more you do that, the closer the result gets to your own personal taste, experience and design.

Of course you're trading writing code for writing a prompt, but it's common to make architectural docs before building a sizeable feature anyway; now you can feed that to the LLM instead of just having it sit there.

ehnto

To me delegation requires the full cycle of agency, with the awareness that I probably shouldn't be interrupted shortly after delegating. I delegated so I can have space from the task and so babysitting it really doesn't suit my needs. I want the task done, but some time in the future.

From my coworkers I want to be able to say, here's the ticket, you got this? And they take the ticket all the way to PR, interacting with clients, collecting more information, etc.

I do somewhat think an LLM could handle client comms for simple extra requirements gathering on already well defined tasks. But I wouldn't trust my business relationships to it, so I would never do that.

_tom_

For me, it's a bit like pair programming. I have someone to discuss ideas with. Someone to review my code and suggest alternative approaches. Someone who uses different features than I do, so I learn from them.

traceroute66

I guess if you enjoy programming with someone you can never really trust, then yeah, sure, it's "a bit like" pair programming.

mock-possum

Trust, but verify ;]

platevoltage

This is how I use it too. It's great at quickly answering questions. I find it particularly useful if I have to work with a language or framework that I'm not fully experienced in.

12_throw_away

> I find it particularly useful if I have to work with a language or framework that I'm not fully experienced in

Yep - my number 1 use case for LLMs is as a template and example generator. It actually seems like a fairly reasonable use for probabilistic text generation!

p1necone

> the duck can occasionally disagree

This has not been my experience. LLMs have definitely been helpful, but generally they either give you the right answer or invent something plausible sounding but incorrect.

If I tell it what I'm doing I always get breathless praise, never "that doesn't sound right, try this instead."

crazygringo

That's not my experience. I routinely get a polite "that might not be the optimal solution, have you considered..." when I'm asking whether I should do something X way with Y technology.

Of course it has to be something the LLM actually has lots of training material on. It won't work with anything remotely cutting-edge, but of course that's not what LLMs are for.

But it's been incredibly helpful for me in figuring out the best, easiest, most idiomatic ways of using libraries or parts of libraries I'm not very familiar with.

Jarwain

I find it very much depends on the LLM you're using. Gemini feels more likely to push back than claude 3.7 is. Haven't tried claude 4 yet

mbrameld

Ask it. Instead of just telling it what you're doing and expecting it to criticize that, ask it directly for criticism. Even better, tell it what you're doing, then tell it to ask you questions about what you're doing until it knows enough to recommend a better approach.

lupire

This is key. Humans each have a personality and some sense of mood. When you ask for help, you choose who to ask, and that person can sense your situation. An LLM has every personality and doesn't know your situation. You have to tell it which personality to use and what your situation is.

marcosdumay

LLMs will still be this way 10 years from now.

But IDK if somebody won't create something new that gets better. But there is no reason at all to extrapolate our current AIs into something that solves programming. Whatever constraints that new thing has will be completely unrelated to the current ones.

smokel

Stating this without any arguments is not very convincing.

Perhaps you remember that language models were completely useless at coding some years ago, and now they can do quite a lot of things, even if they are not perfect. That is progress, and that does give reason to extrapolate.

Unless of course you mean something very special with "solving programming".

bigstrat2003

> Perhaps you remember that language models were completely useless at coding some years ago, and now they can do quite a lot of things, even if they are not perfect.

IMO, they're still useless today, with the only progress being that they can produce a more convincing facade of usefulness. I wouldn't call that very meaningful progress.

marcosdumay

Why state the same arguments everybody has been repeating for ages?

LLMs can only give you code that somebody has written before. This is inherent. This is useful for a bunch of stuff, but that bunch won't change if OpenAI decides to spend the GDP of Germany training one instead of the GDP of Costa Rica.

Retric

Progress, sure, but the rate they've improved hasn't been particularly fast recently.

Programming has become vastly more efficient in terms of programmer effort over the decades, but making some aspects of the job more efficient just means all your effort is spent on what didn't improve.

apwell23

> Perhaps you remember that language models were completely useless at coding some years ago

no i don't remember that. They are doing similar things now that they did 3 yrs ago. They were still a decent rubber duck 3 yrs ago.

Bukhmanizer

There are a couple people I work with who clearly don’t have a good understanding of software engineering. They aren’t bad to work with and are in fact great at collaborating and documenting their work, but don’t seem to have the ability to really trace through code and logically understand how it works.

Before LLMs it was mostly fine because they just didn’t do that kind of work. But now it’s like a very subtle chaos monkey has been unleashed. I’ve asked on some PRs “why is this like this? What is it doing?” And the answer is “ I don’t know, ChatGPT told me I should do it.”

The issue is that it throws basically all their code under suspicion. Some of it works, some of it doesn’t make sense, and some of it is actively harmful. But because the LLMs are so good at giving plausible output I can’t just glance at the code and see that it’s nonsense.

And this would be fine if we were working on like a crud app where you can tell what is working and broken immediately, but we are working on scientific software. You can completely mess up the results of a study and not know it if you don’t understand the code.

protocolture

>And the answer is “ I don’t know, ChatGPT told me I should do it.”

This weirds me out. Like, I use LLMs A LOT, but I always sanity check everything, so I can own the result. It's not the use of the LLM that gets me, it's trying to shift accountability to a tool.

dlivingston

> I’ve asked on some PRs “why is this like this? What is it doing?” And the answer is “ I don’t know, ChatGPT told me I should do it.”

This would infuriate me. I presume these are academics/researchers and not junior engineers?

Unfortunately this is the world we're entering into, where all of us will be outsourcing more and more of our 'thinking' to machines.

jajko

Sounds almost like you definitely shouldn't use LLMs, nor those juniors, for such important work.

Is it just me, or are we heading into a period with an explosion of software being produced, but also a massive drop in its quality? Not uniformly, just a bit of chaotic spread.

Bukhmanizer

> llms nor those juniors for such an important work.

Yeah we shouldn’t and I limit my usage to stuff that is easily verifiable.

But there’s no guardrails on this stuff, and one thing that’s not well considered is how these things which make us more powerful and productive can be destructive in the hands of well intentioned people.

palmotea

> Is it just me or are we heading into a period of explosion of software done, but also a massive drop of its quality? Not uniformly, just a bit of chaotic spread

I think we are, especially with executives mandating LLM use and expecting it to massively reduce costs and increase output.

For the most part they don't actually seem to care that much about software quality, and tend to push to decrease quality at every opportunity.

jrochkind1

Which is frightening, because it's not like our industry is known for producing really high quality code at the starting point before LLM authored code.

otabdeveloper4

> those juniors

I'm betting they're the most senior people on the team.

johnnyanmac

>I think the big question everyone wants to skip past this conversation and get right to is: will this continue to be true 2 years from now?

For me, it's less "conversation to be skipped" and more "can we even get to 2 years from now"? There's so much instability right now that it's hard to say what anything will look like in 6 months.

UncleOxidant

There's some whistling past the graveyard in these comments. "You still need humans for the social element...", "LLMs are bad at debugging", "LLMs lead you astray". And yeah, there's lots of truth in those assertions, but since I started playing with LLMs to generate code a couple of years ago they've made huge strides. I suspect that over the next couple of years the improvements won't be quite as large (Pareto Principle), but I do expect we'll still see some improvement.

I was on r/fpga recently and mentioned that I'd had a lot of success lately getting LLMs to code up first-cut testbenches that let you simulate your FPGA/HDL design a lot quicker than if you wrote those testbenches yourself, and my comment was met with lots of derision. But they hadn't even given it a try before forming their conclusion that it just couldn't work.

xhevahir

This attitude is depressingly common in lots of professional, white-collar industries I'm afraid. I just came from the /r/law subreddit and was amazed at the kneejerk dismissal there of Dario Amodei's recent comments about legal work, and of those commenters who took them seriously. It's probably as much a coping mechanism as it is complacency, but, either way, it bodes very poorly for our future efforts at mitigating whatever economic and social upheaval is coming.

garciasn

This is the response to most new technologies; folks simply don't want to accept the future before the ramifications truly hit. If technology folk cannot see the INCREDIBLE LEAP FORWARD made by LLMs since ChatGPT came on the market, they're not seeing the forest for the trees because their heads are buried in the sand.

LLMs for coding are not even close to perfect yet, but the saturation curves are not flattening out; not by a long shot. We are living in a moment and we need to come to terms with it as the work continues to develop; and we need to adapt, quickly, in order to better understand what our place will become as this nascent tech continues its meteoric trajectory toward an entirely new world.

eikenberry

I don't think it is only (or even mostly) not wanting to accept it, I think it is at least equal measure just plain skepticism. We've seen all sorts of wild statements about how much something is going to revolutionize X and then turns out to be nothing. Most people disbelieve these sorts of claims until they see real evidence for themselves... and that is a good default position.

0points

> If technology folk cannot see the INCREDIBLE LEAP FORWARD made by LLMs since ChatGPT came on the market, they're not seeing the forest through the trees because their heads are buried in the sand.

Look, we see the forest. We are just not impressed by it.

Having unlimited chaos monkeys at will is not revolutionizing anything.

const_cast

Lawyers don't even use version control software a lot of the time. They burn hundreds of paralegal hours reconciling revisions, a task that could be made 100x faster and easier with Git.

There's no guarantee a technology will take off, even if it's really, really good. Because we don't decide if that tech takes off - the lawyers do. And they might not care, or they might decide billing more hours is better, actually.

v3xro

Many of us would prefer to see the technological leaps be evenly distributed (when, e.g., clean drinking water that does not need to be boiled before consumption is still not a baseline in 2025). So if you want to adapt to your new and improved position where you are just pushing buttons, fine - but some of us are actually interested in how computers work (and are really uninterested in most companies' bottom lines). It's just how it is ;)

rxtexit

I think many people just settled in while we had no real technological change for 15 years. Real change, not an update to a web framework.

When I graduated high school, I had never been on the internet at all, nor did I know anyone who had. The internet was this vague "information superhighway" that I didn't really know what to make of.

If you are of a certain age though you would think a pointless update to react was all the change ever coming.

That time is over and we are back to reality.

otabdeveloper4

> you are ... le LUDDITE

Or maybe they just know the nitty-gritty inherent limitations of technology better than you.

(inb4: "LLMs can't have limitations! Wait a few years and they will solve literally every possible problem!")

ben-schaaf

Friendly reminder that people like you were saying the exact same thing about metaverse, VR, web3, crypto, etc.

fumeux_fume

Ah yes, please enjoy living in your moment and anticipating your entirely new world. I also hear all cars will be driving themselves soon and Jesus is coming back any day now.

spamizbad

I think it's pretty reasonable to take a CEO's - any CEO in any industry - statements with a grain of salt. They are under tremendous pressure to paint the most rosy picture possible of their future. They actually need you to "believe" just as much as their team needs to deliver.

onion2k

In most professional industries getting to the right answer is only half the problem. You also need to be able to demonstrate why that is the right answer. Your answer has to stand up to criticism. If your answer is essentially the output of a very clever random number generator you can't ever do that. Even if an LLM could output an absolutely perfect legal argument that matched what a supreme court judge would argue every time, that still wouldn't be good enough. You'd still need a person there to be accountable for making the argument and to defend the argument.

Software isn't like this. No one cares why you wrote the code in your PR. They only care about whether it's right.

This is why LLMs could be useful in one industry and a lot less useful in another.

rxtexit

IMO it is a mixture of stupidity and denial.

I am not a software engineer but I just can't imagine my job is not automated in 10 years or less.

10 years is about the time between King – Man + Woman = Queen and now.

I think what is being highly underestimated is the false sense of security people feel because the jobs they interface with are also not automated, yet.

It is not hard to picture the network of automation: once one role is automated, the roles connected to it become easier to automate, and so on and so on, while the models keep getting stronger at the same time.

I expect we will have a recession at some point and the jobs lost are gone forever.

sanderjd

Isn't this also kind of just ... a reddit thing?

golergka

Lawyers say those things, and then one law firm after another is frantically looking for a contractor to overpay to install a local RAG and chatbot combo.

geros

"It is difficult to get a man to understand something when his salary depends upon his not understanding it." - Upton Sinclair

layer8

Programmers derided programming languages (too inefficient, too inflexible, too dumbing-down) when assembly was still the default. That phenomenon is at the same time entirely to be expected but also says little about the actual qualities of the new technology.

skydhash

If you had something that generated 20 lines of assembly that took 100x more than the 2 clever lines of instructions you knew, you'd have the same stance even if the higher level was easier to use. Then those kinds of performance tricks cease to matter. But reliability still does. And the reason we use higher and higher level programming languages is that they increase reliability and simplicity (at the cost of performance, but we're happy to pay that).

LLM output is unreliable, and productivity is still not proven for an end-to-end engineering cycle.

layer8

You are arguing based on the merits of the technology, which is fine, but wasn't my point. I was arguing that derision tends to happen no matter what, and thus doesn't indicate much about the merits of the technology one way or the other.

ch4s3

It seems like LLMs made really big strides for a while but don't seem to be getting better recently, and in some ways recent models feel a bit worse. I'm seeing some good results generating test code, and some really bad results when people go too far with LLM use on new feature work. Based on what I've seen, it seems like spinning up new projects and very basic features for web apps works really well, but that doesn't seem to generalize to refactoring or adding new features to big/old code bases.

I've seen Claude and ChatGPT happily hallucinate whole APIs for D3 on multiple occasions, which should be really well represented in the training sets.

soerxpso

> hallucinate whole APIs for D3 on multiple occasions, which should be really well represented in the training sets

With many existing systems, you can pull documentation into context pretty quickly to prevent the hallucination of APIs. In the near future it's obvious how that could be done automatically. I put my engine on the ground, ran it and it didn't even go anywhere; Ford will never beat horses.

prisenco

It's true that manually constraining an LLM with contextual data increases its performance on that data (and reduces performance elsewhere), but that conflicts with the promise of AI as an everything machine. We were promised an everything machine, but if we have to not only provide it the proper context but already know what constitutes the proper context, then it is not in any way an everything machine.

Which means it's back to being a very useful tool, but not the earth-shattering disruptor we hoped (or worried) it would be.

oconnor663

> don't seem to be getting better recently

o3 came out just one month ago. Have you been using it? Subjectively, the gap between o3 and everything before it feels like the biggest gap I've seen since ChatGPT originally came out.

ch4s3

I haven't used it extensively, but toyed around with it for Elixir code and I wasn't particularly impressed.

empath75

The LLMs themselves are making marginal gains, but the tools for using LLMs productively are getting so much better.

dinfinity

This. MCP/tool usage in agentic mode is insanely powerful. Let the agent ingest a Gitlab issue, tell it how it can run commands, tests etc. in the local environment and half of the time it can just iterate towards a solution all by itself (but watching and intervening when it starts going the wrong way is still advisable).

Recently I converted all the (Google Docs) documentation of a project to markdown files and added those to the workspace. It now indexes it with RAG and can easily find relevant bits of documentation, especially in agent mode.

It really stresses the importance of getting your documentation and processes in order as well as making sure the tasks at hand are well-specified. It soon might be the main thing that requires human input or action.

cushychicken

ChatGPT-4o is scary good at writing VHDL.

Using it to prototype some low level controllers today, as a matter of fact!

UncleOxidant

Claude and Gemini are decent at it as well. I was surprised when I asked claude (and this was several months back) to come up with a testbench for some very old, poorly documented verilog. It did a very decent job for a first-cut testbench. It even collected common, recurring code into verilog tasks (functions) which really surprised me at the time.

cushychicken

Yes! It’s much better at using functional logic than I am - which I appreciate!

Panzer04

What kind of things is it doing?

I have a hard time imagining an LLM being able to do arbitrary things. It always feels like LLMs can do lots of the easy stuff, but if they can't do everything you need the skilled engineer anyway, who'd knock the easy things out in a week anyway.

cushychicken

> What kind of things is it doing?

Wrote me:

- a SPI deserializer that sets a bit after 12 bits read in, to trigger a prefetch

- an SDC constraints file for the deserializer that correctly identified the SPI clock and bus clock as separate domains requiring their own statement

- a test bench that validated both that the prefetch bit was being set, and that it was being set at the proper time relative to the SPI clock

- a makefile with commands for build, headless test, and debug by loading the VCD into a waveform viewer

> It always feels like LLMs can do lots of the easy stuff, but if they can't do everything you need the skilled engineer anyway, who'd knock the easy things out in a week anyway.

Nearly every part of the tool flow I just described, I would consider “tricky to get right”. Been doing this for ~15 years and it’s still tough to bootstrap something like this from scratch. ChatGPT-4o did this for me from zero in about 15 minutes.

I won’t lie: I love it. I can focus on the actual, bigger problems at hand, and not the tricky little details of HDLs.

People are either deluding themselves or ignorant of the capabilities of frontier models if they don’t believe LLMs offer a speedup in workflow.

I personally believe that most of the doubt and cynicism is due to:

1) a pretty big collective identity crisis among software professionals, and

2) a suspicion that LLMs make it so that anyone who is good at articulating the problem precisely no longer needs a software engineer as a translation specialist from specs to code.

I say this as an EE of ~15 years who’s always been able to articulate what I want, specifically, to a firmware counterpart, who then writes the code I need. I can turn years of practice in this skill into great prompts for an LLM, which effectively cuts out the middleman.

I really like it. It’s helped me take on a lot of projects that are just outside of my innate level of capability. It’s also helped me learn a lot of new things about these new software adjacent areas. ChatGPT is a great tutor!

roflyear

It's better-than-senior at a some things, but worse-than-junior at a lot of things.

quantadev

It's more like better-than-senior 99% of the time. Makes mistakes 1% of the time. Most of the 'bad results' I've seen people struggle with ended up being the fault of the human, in the form of horrible context given to the AI or else ambiguous or otherwise flawed prompts.

Any skilled developer with a decade of experience can write prompts that return precisely what we want almost every single time. I do it all day long. "Claude 4" rarely messes up.

bgwalter

Yet you are working on your own replacement, while your colleagues are taking the prudent approach.

Jolter

Here’s the deal: if you won’t write your replacement, a competitor will do it and outprice your employer. Either way you’re out of a job. May be more prudent to adapt to the new tools and master them rather than be left behind?

Do you want to be a jobless weaver, or an engineer building mechanical looms for a higher pay than the weaver got?

bgwalter

I want to be neither. I either want to continue being a software engineer who doesn't need a tricycle for the mind, or move to law or medicine; two professions that have successfully defended themselves against extreme versions of the kind of anxiety, obedience and self hate that is so prevalent among software engineers.

allturtles

I think it's the wrong analogy. The prompt engineer who uses the AI to make code maps to the poorly-paid, low-skill power loom machine tender. The "engineer" is the person who created the model. But it's also not totally clear to me that we'll need humans for that either, in the near future.

declan_roberts

I would absolutely love to write my own replacement. When I can have AI do my job while I go to the beach, you better believe I will be at the beach.

kuahyeow

Compiler is the more apt analogy to a mechanical loom.

An LLM is more like outsourcing to a consultancy. Results may vary.

mullingitover

> Either way you’re out of a job.

Tools and systems which increase productivity famously always put everyone out of a job, which is why after a couple centuries of industrial revolution we're all unemployed.

nialse

Ahh, the “don’t disturb the status quo” argument. See, we are all working on our replacement, newer versions, products, services and knowledge always make the older obsolete. It is wise to work on your replacement, and even wiser to be in charge of and operate the replacement.

bgwalter

No, nothing fundamentally new is created. Programmers have always been obsessed with "new" tooling and processes to distract from that fact.

"AI" is the latest iteration of snake oil that is foisted upon us by management. The problem is not "AI" per se, but the amount of of friction and productivity loss that comes with it.

Most of the productivity loss comes from being forced to engage with it and push back against that nonsense. One has to learn the hype language, debunk it, etc.

Why do you think IT has gotten better? Amazon had a better and faster website with far better search and products 20 years ago. No amount of "AI" will fix that.

palmotea

> It is wise to work on your replacement...

Depends on the context. You have to keep in mind: it is not a goal of our society or economic system to provide you with a stable, rewarding job. In fact, the incentives are to take that away from you ASAP.

Before software engineers go celebrate this tech, they need realize they're going to end up like rust-belt factory workers the day after the plant closed. They're not special, and society won't be any kinder to them.

> ...and even wiser to be in charge of and operate the replacement.

You'll likely only get to do that if your boss doesn't know about it.

npteljes

Carteling doesn't work bottom-up. When changes begin (like this one with AI), one of the things an individual can do is to change course as fast as they can. There are other strategies as well, not evolving is also one, but some strategies yield better results than others. Not keeping up just worsens the chances, I have found.

asdff

It does when it is called unionizing, however for some reason software developers have a mental block towards the concept.

dughnut

Do you want to work with LLMs or H1Bs and interns… choose wisely.

Personally I’m thrilled that I can get trivial, one-off programs developed for a few cents and the cost of a clear written description of the problem. Engaging internal developers or consulting developers to do anything at all is a horrible experience. I would waste weeks on politics, get no guarantees, and waste thousands of dollars and still hear nonsense like, “you want a form input added to a web page? Aw shucks, that’s going to take at least another month” or “we expect to spend a few days a month maintaining a completely static code base” from some clown billing me $200/hr.

rsyring

You can work with consulting oriented engineers who get shit done with relatively little stress and significant productivity. Productivity enhanced by AI but not replaced by it. If interested, reach out to me.

JeremyNT

I don't think that this should be downvoted because it raises a really important issue.

I hate AI code assistants, not because they suck, but because they work. The writing is on the wall.

If we aren't working on our own replacements, we'll be the ones replaced by somebody else's vibe code, and we have no labor unions that could plausibly fight back against this.

So become a Vibe Coder and keep working, or take the "prudent" approach you mention - and become unemployed.

neta1337

I’ll work on fixing the vibe coders mess and make bank. Experience will prove valuable even more than before

realusername

Personally I used them for a while and then just stopped using them because actually no, unfortunately those assistants don't work. They appear to work at first glance but there's so much babysitting needed that it's just not worth it.

This "vibe coding" seems just another way to say that people spend more time refining the output of these tools over and over again that what they would normally code.

parliament32

I'd like to agree with you and remain optimistic, but so much tech has promised the moon and stagnated into oblivion that I just don't have any optimism left to give. I don't know if you're old enough, but remember when speech-to-text was the next big thing? DragonSpeak was released in 1997, everyone was losing their minds about dictating letters/documents in MS Word, and we were promised that THIS would be the key interface for computing evermore. And.. 27 years later, talking to the latest Siri, it makes just as many mistakes as it did back then. In messenger applications people are sending literal voice notes -- audio clips -- back and forth because dictation is so unreliable. And audio clips are possibly the worst interface for communication ever (no searching, etc).

Remember how blockchain was going to change the world? Web3? IoT? Etc etc.

I've been through enough of these cycles to understand that, while the AI gimmick is cool and all, we're probably at the local maximum. The reliability won't improve much from here (hallucinations etc), while the costs to run it will stay high. The final tombstone will be when the AI companies stop running at a loss and actually charge for the massive costs associated with running these models.

oconnor663

> 27 years later, talking to the latest Siri, it makes just as many mistakes as it did back then

Have you tried talking to ChatGPT voice mode? It's mind blowing. You just have a conversation with it. In any language. About anything. The other day I wanted to know about the difference between cast iron and wrought iron, and it turned into a 10 or 15 minute conversation. That's maybe a good example of an "easy" topic for LLMs (lots of textbooks for it to memorize), but the world is full of easy topics that I know nothing about!

some_random

How can you possibly look at what LLMs are doing and the progress made in the last ~3 years and equate it to crypto bullshit? Also it's super weird to include IoT in there, seeing as it has become all but ubiquitous.

r14c

I'm not as bearish on AI, but its hard to tell if you can really extrapolate future performance based on past improvements.

Personally, I'm more interested in the political angle. I can see that AI will be disruptive because there's a ton of money and possibly other political outcomes depending on it doing exactly that.

energy123

Their confusion is your competitive advantage in the labor market.

ChrisMarshallNY

Really good coders (like him) are better.

Mediocre ones … maybe not so much.

When I worked for a Japanese optical company, we had a Japanese engineer, who was a whiz. I remember him coming over from Japan, and fixing some really hairy communication bus issues. He actually quit the company, a bit after that, at a very young age, and was hired back as a contractor; which was unheard of, in those days.

He was still working for them, as a remote contractor, at least 25 years later. He was always on the “tiger teams.”

He did awesome assembly. I remember when the PowerPC came out, and "Assembly Considered Harmful" was the conventional wisdom, because of pipelining, out-of-order instructions, precaching, and all that.

His assembly consistently blew the doors off anything the compiler did. Like, by orders of magnitude.

benstein

+1000. "Human coders are still better than LLMs" is a hot take. "Antirez is still better than LLMs" is axiomatic ;-)

yua_mikami

The thing everyone forgets when talking about LLMs replacing coders is that there is much more to software engineering than writing code; in fact, writing code is probably one of the smaller aspects of the job.

One major aspect of software engineering is social, requirements analysis and figuring out what the customer actually wants, they often don't know.

If a human engineer struggles to figure out what a customer wants and a customer struggles to specify it, how can an LLM be expected to?

malfist

That was also one of the challenges during the offshoring craze in the 00s. The offshore teams did not have the power, or knowledge to push back on things and just built and built and built. Sounds very similar to AI right?

Probably going to have the same outcome.

pandastronaut

I tend to see today's AI Vibrators as the managers of the 00s and their army of offshore devs.

9dev

Did you actually mean to say AI Vibrators?

hathawsh

The difference is that when AI exhibits behavior like that, you can refine the AI or add more AI layers to correct it. For example, you might create a supervisor AI that evaluates when more requirements are needed before continuing to build, and a code review AI that triggers refinements automatically.

nevertoolate

The question is how autonomous decision making works; nobody argues that an LLM can finish any sentence, but can it push a red button?

devjab

LLMs do no software engineering at all, and that can be fine. Because you don't actually need software engineering to create successful programs. Some applications will not even need software engineering for their entire life cycles, because nobody is really paying attention to efficiency in the ocean of poor cloud management anyway.

I actually imagine it's the opposite of what you say here. I think technically inclined "IT business partners" will be capable of creating applications entirely without software engineers... Because I see that happen every day in the world of green energy. The issues come later, when things have to be maintained, scale or become efficient. This is where the software engineering comes in, because it actually matters whether you used a list or a generator in your Python app when it iterates over millions of items and not just a few hundred.

AstroBen

That's the thing too right.. the vast majority of software out there barely needs to scale or be super efficient

It does need to be reliable, though. LLMs have proven very bad at that

km144

Maybe automated testing of all forms will just become much more ubiquitous as a safeguard against the worst of AI hallucinations? I feel that would solve a lot of people's worries about LLMs. I'm imagining a world where a software developer is a person who gathers requirements, writes some tests, asks the AI to modify the codebase, ensures the tests still work, makes sure they are a human who understands the change the AI just made, and continues with the next requirement.

devjab

> the vast majority of software out there barely needs to scale or be super efficient

That was the way I saw it for a while. In recent months I've begun to wonder if I need to reevaluate that, because it's become clear to me that scaling doesn't actually start from zero. By zero I mean that I was naive enough to think that all programs, even the most Google-programmed one by a completely new junior, would at least have some efficiency... but some of these LLM services I get to work on today are so bad they didn't start at zero but at some negative number. It would have been less of an issue if our non-developer-developers didn't use Python (or at least used Python with ruff/pyrefly/whatever you like), but some of the things they write can't even scale to do minimal BI reporting.

ilaksh

It actually comes down to feedback loops which means iterating on software being used or attempting to be used by the customer.

Chat UIs are an excellent customer feedback loop. Agents develop new iterations very quickly.

LLMs can absolutely handle abstractions and different kinds of component systems and overall architecture design.

They can also handle requirements analysis. But it comes back to iteration for the bottom line which means fast turnaround time for changes.

The robustness and IQ of the models continue to improve. All of software engineering is well on its way to being automated.

Probably five years max where un-augmented humans are still generally relevant for most work. You are going to need deep integration of AI into your own cognition somehow in order to avoid just being a bottleneck.

victorbjorklund

Yeah, this is why I don't buy the "all developers will disappear" line. Will I write a lot less code in 5 years (maybe almost none)? Sure, I already type a lot less now than a year ago. But that is just a small part of the process.

elzbardico

No. the scope will just increase to occupy the space left by LLMs. We will never be allowed to retire.

xandrius

Exactly. Also, today I can actually believe I could finish a game that might have taken much longer before LLMs, just because now I can be pretty sure I won't get stuck on some feature I've never built before.

mettamage

> One major aspect of software engineering is social, requirements analysis and figuring out what the customer actually wants, they often don't know.

It really depends on the organization. In many places product owners and product managers do this nowadays.

bbarn

The thing is, it is replacing _coders_ in a way. There are millions of people who do (or did) the work that LLMs excel at. Coders who are given a ticket that says "Write this API taking this input and giving this output" who are so far down the chain they don't even get involved in things like requirements analysis, or even interact with customers.

Software engineering, is a different thing, and I agree you're right (for now at least) about that, but don't underestimate the sheer amount of brainless coders out there.

callc

That sounds more like a case against a highly ossified waterfall development process than anything.

I would argue it’s a good thing to replace the actual brainless activities.

rowanG077

I think LLMs are better at requirement elicitation than they are at actually writing code.

ori_b

> If a human engineer struggles to figure out what a customer wants and a customer struggles to specify it, how can an LLM be expected to?

Presumably, they're trained on a ton of requirements docs, as well as a huge number of customer support conversations. I'd expect them to do this at least as well as coding, and probably better.

wanderingstan

“Better” is always task-dependent. LLMs are already far better than me (and most devs I’d imagine) at rote things like getting CSS syntax right for a desired effect, or remembering the right way to invoke a popular library (e.g. fetch)

These little side quests used to eat a lot of my time and I’m happy to have a tool that can do these almost instantly.

jaccola

I've found LLMs particularly bad for anything beyond basic styling since the effects can be quite hard to describe and/or don't have a universal description.

Also, there are often times multiple ways to achieve a certain style and they all work fine until you want a particular tweak, in which case only one will work and the LLM usually gets stuck in one of the ones that does not work.

danielbln

Multi modal LLMs to the rescue. Throw a screenshot or mockup in there and tell the LLM "there, like this". Gemini can do the same with videos.

karn97

Still terrible results. Multimodal doesn't mean it actually understands the image.

gherkinnn

I have found it to be good at things I am not very strong at (SQL) but terrible at the things I know well (CSS).

Telling, isn't it?

mywittyname

Ironically, I find it strong at things I don't know very well (CSS), but terrible at things I know well (SQL).

This is probably really just a way of saying, it's better at simple tasks rather than complex ones. I can eventually get Copilot to write SQL that's complex and accurate, but I don't find it faster or more effective than writing it myself.

ehansdais

Actually, you've reinforced their point. It's only bad at things the user is actually good at because the user actually knows enough in that domain to find the flaws and issues. It appears to be good in domains the user is bad at because the user doesn't know any better. In reality, the LLM is just bad at all domains; it's simply whether a user has the skill to discern it. Of course, I don't believe it's as black and white as that but I just wanted to point it out.

ch4s3

I kind of agree. It feels like they're generally a superior form of copying and pasting from Stack Overflow, where the machine has automated the searching, copying, pasting, and fiddling with variable names. It can be just as useful or dangerous as Google -> Copy -> Paste ever was, but faster.

sanderjd

Funny, I find it to be good at things I'm not very strong at (CSS) but terrible at the things I know well (SQL). :)

Actually I think it's perfectly adequate at SQL too.

kccqzy

> and most devs I’d imagine

What an awful imagination. Yes there are people who don't like CSS but are forced to use it by their job so they don't learn it properly, and that's why they think CSS is rote memorization.

But overall I agree with you that if a company is too cheap to hire a person who is actually skilled at CSS, it is still better to foist that CSS job onto LLMs than onto an unwilling human, because that unwilling human is not going to learn CSS well and won't enjoy writing CSS.

On the other hand, if the company is willing to hire someone who's actually good, LLMs can't compare. It's basically the old argument of LLMs only being able to replace less-good developers. In this case, you admitted that you are not good at CSS and LLMs are better than you at CSS. It's not task-dependent, it's skill-dependent.

marcosdumay

Hum... I imagine LLMs are better than every developer at getting CSS keywords right, as the GP pointed out. And I expect every LLM to be slightly worse than most classical autocompletes.

skydhash

Getting CSS keywords right is not the actual point of writing CSS, and you can have a linter that helps you in that regard. The endgame of writing CSS is to style an HTML page according to the specifications of a design, which can be as detailed as a Figma file or as flimsy as a drawing on a whiteboard.

lelandfe

I'm one of those weirdos who really likes handwriting CSS. I frequently find ChatGPT getting my requests wrong.

michaelsalim

This is like saying that LLMs are better at knowing the name of that one obscure API. It's not wrong, but it's also not the hard part about CSS

chii

The LLM outputs good enough CSS, but is (way) cheaper than someone who's actually good at CSS.

sanderjd

Yeah, this is what I really like about AI tools though. They're way better than me at annoying minutia like getting CSS syntax right. I used to dread that kind of thing!

codr7

And you will keep dreading it for as long as you use them, since you learn nothing from solutions served on a silver plate.

sanderjd

The point is that I don't dread it anymore, because now there are tools that make it a lot easier the one or two times a year I have some reason to use it.

zdragnar

I think that's great if it's for something outside of your primary language. I've used it to good effect in that way myself. However, denying yourself the reflexive memory of having learned those things is a quick way to become wholly dependent upon the tool. You could easily end up with compromised solutions because the tool recommends something you don't understand well enough to know there's a better way to do something.

dpkirchner

You're right, however I think we've already gone through this before. Most of us (probably) couldn't tell you exactly how an optimizing compiler picks optimizations or exactly how JavaScript maps to processor instructions, etc -- we hopefully understand enough at one level of abstraction to do our jobs. Maybe LLM driving will be another level of abstraction, when it gets better at (say) architecting projects.

skydhash

> Most of us (probably) couldn't tell you exactly how an optimizing compiler picks optimizations or exactly how JavaScript maps to processor instructions,

That's because other people are making those work well. It's like how you don't care about how the bread is being made because you trust your baker (or the regulations). It's a chain of trust that is easily broken when LLMs are brought in.

AnimalMuppet

So here's an analogy. (Yeah, I know, proof by analogy is fraud. But it's going to illustrate the question.)

Here's a kid out hoeing rows for corn. He sees someone planting with a tractor, and decides that's the way to go. Someone tells him, "If you get a tractor, you'll never develop the muscles that would make you really great at hoeing."

Different analogy: Here's someone trying to learn to paint. They see someone painting by numbers, and it looks a lot easier. Someone tells them, "If you paint by numbers, you'll never develop the eye that you need to really become good as a painter."

Which is the analogy that applies, and what makes it the right one?

I think the difference is how much of the job the tool can take over. The tractor can take over the job of digging the row, with far more power, far more speed, and honestly far more quality. The paint by numbers can take over the job of visualizing the painting, with some loss of quality and a total loss of creativity. (In painting, the creativity is considered a vital part; in digging corn rows, not so much.)

I think that software is more like painting, rather than row-hoeing. I think that AI (currently) is in the form of speeding things up with some loss of both quality and creativity.

Can anyone steelman this?

bluefirebrand

> Here's a kid out hoeing rows for corn. He sees someone planting with a tractor, and decides that's the way to go. Someone tells him, "If you get a tractor, you'll never develop the muscles that would make you really great at hoeing."

In this example, the idea of "losing the muscles that make you great at hoeing" seems like a silly thing to worry about.

But I think there's a second order effect here. The kid gets a job driving the tractor instead. He spends his days seated instead of working. His lifestyle is more sedentary. He works just as many hours as before, and he makes about the same as he did before, so he doesn't really see much benefit from the increased productivity of the tractor.

However now he's gaining weight from being more sedentary, losing muscle from not moving his body, developing lower back problems from being seated all day, developing hearing loss from the noisy machinery. His quality of life is now lower, right?

Edit: Yes, there are also health problems from working hard moving dirt all day. You can overwork yourself, no question. It's hard on your body, being in the sun all day is bad for you.

I would argue it's still objectively a physically healthier lifestyle than driving a tractor for hours though.

Edit 2: my point is that I think after driving a tractor for a while, the kid would really struggle to go hoe by hand like he used to, if he ever needed to

Sohcahtoa82

The analogy I would use is that coding via LLM is like learning to drive in a self-driving car that has manual controls as an option that drives overly cautiously (Leaves excessively large following distances, takes corners slower, etc.) while in self-driving mode.

You can let it self-drive, but you'd probably learn nothing, and it will actually take you longer. Put an expert driver behind the wheel, and they'll drive faster and only use automation features for the boring parts.

stonemetal12

>I think the difference is how much of the job the tool can take over.

I think it is about how utilitarian the output is. For food, no one cares how the sausage is made. For a painting, the story behind it is more important than the picture itself. All of Picasso's paintings are famous because they were painted by Picasso. A Picasso-style painting by Bill? Suddenly it isn't museum-worthy anymore.

No one cares about the story or people behind Word, they just want to edit documents. The Demo scene probably has a good shot at being on the side of art.

danielbln

For me the creativity in software engineering doesn't come from coding; that's an implementation detail. It comes from architecture, from thinking about "what do I want to build, how should it behave, how should it look, what or who is it for?" and driving that forward. Bolting it together in code is hoeing, for the vast majority of us. The creative endeavor sits higher up on the abstraction ladder.

loudmax

Companies that leverage LLMs and AIs to let their employees be more productive will thrive.

Companies that try to replace their employees with LLMs and AIs will fail.

Unfortunately, all that's in the long run. In the near term, some CEOs and management teams will profit from the short term valuations as they squander their companies' future growth on short-sighted staff cuts.

bdbenton5255

That's really it. These tools are useful as assistants to programmers but do not replace an actual programmer. The right course is to embrace the technology moderately rather than reject it completely or bet on it replacing workers.

joshdavham

> In the near term, some CEOs and management teams will profit from the short term valuations

That's actually really interesting to think about. The idea that doing something counter-productive like trying to replace employees with AI (which will cause problems), may actually benefit the company in terms of valuations in the short run. So in effect, they're hurting and helping the company at the same time.

Natfan

See also: Toys 'R' Us and Sears, which were killed by consultancy groups loading them with debt and selling assets for an immediate profit, which helped the immediate shareholders but hurt all of the stakeholders.

to11mtm

Hey, if the check clears for the bonus they got for hitting 'reduce costs in the IT department', they often bail before things rear their ugly head; or, in the uglier case, they Reality Distortion Field the entire org into making the bad anti-patterns permanent, even while acknowledging the cost/delivery/quality inefficiencies [0].

This is especially prevalent in waterfall orgs that refuse change. Body shops are more than happy to waste a huge portion of their billable hours on planning meetings and roadmap revisions as the obviousness of the mythical man month comes to bear on the org.

Corners get cut to meet deadlines, because the people who started/perpetuated whatever myth need to save their skins (and hopefully continue to get bonuses.)

The engineers become a scapegoat for the org's management problems (And watch, it very likely will happen at some shops with the 'AI push'). In the nasty cases, the org actively disempowers engineers in the process[0][1].

[0] - At one shop, we got grief that we hadn't shipped a feature, but the only reason we hadn't was that IT was not allowed to decide between a set of radio buttons or a drop-down on a screen. Hell, I got yelled at for just making the change locally and sending screenshots.

[1] - At more than one shop, FTE devs were responsible for providing support for code written by offshore that they were never even given the opportunity to review. And hell yes myself and others pushed for change, but it's never been a simple change. It almost always is 'GLWT'->'You get to review the final delivery but get 2 days'->'You get to review the set of changes'->'Ok you can review their sprint'->'OK just start reviewing every PR'.

BirAdam

By the time AI hype dies down and hurts the bottom line, AI systems might be good enough to do the jobs.

“The market can remain irrational longer than you can remain solvent.” — Warren Buffett

janalsncm

Very well said. Using code assistance is going to be table stakes moving forward, not something that can replace people. It’s not like competitors can’t also purchase AI subscriptions.

bbarn

Honestly, if you're not doing it now, you're behind. The sheer amount of time that using it smartly can save you, letting you focus on the parts that actually matter, is massive.

kweingar

If progress continues at the rate that AI boosters expect, then soon you won't have to use them smartly to get value (all existing workflows will churn and be replaced by newer, smarter workflows within months), and everybody who is behind will immediately catch up the moment they start to use the tool.

acquisitionsilk

It is quite heartening to see so many people care about "good code". I fear it will make no difference.

The problem is that the software world got eaten up by the business world many years ago. I'm not sure at what point exactly, or if the writing was already on the wall when Bill Gates wrote his open letter to hobbyists in 1976.

The question is whether shareholders and managers will accept less good code. I don't see how it would be logical to expect anything else, as long as profit lines go up why would they care.

Short of some sort of cultural pushback from developers or users, we're cooked, as the youth say.

JackSlateur

Code is meant to power your business

Bad code leads to bad business

This makes me think of hosting departments; you know, the people who are using VMware, physical firewalls, DPI proxies and whatnot.

On the other end, you have public cloud providers, which are using QEMU, netfilter, dumb networking devices and stuff.

Who got eaten by whom, nobody could have guessed ..

chii

> Bad code leads to bad business

Bad business leads to bad business.

Bad code might be bad, or it might be sufficient. It's situational. And looking at what exists today, the majority of code is pretty bad already - and not all businesses with bad code end up as bad businesses.

In fact, some bad code is very profitable for some businesses (ask any SAP integrator).

JackSlateur

It is survivorship bias: "by looking at what is still alive today, the majority of code is pretty bad".

It excludes all of those who died in the process: those still alive are here despite bad IT, not due to bad IT.

tcoff91

The vast majority of code that makes money is pretty shitty.

BirAdam

This is fun to think about. I used to think that all software was largely garbage, and at one point, I think this _was_ true. Sometime over the last 20 years, I believe this ceased to be the case. Most software these days actually works. Importantly, most software is actually stable enough that I can make it half an hour without panic saving.

Could most software be more awesome? Yes. Objectively, yes. Is most software garbage? Perhaps by raw volume of software titles, but are most popular applications I’ve actually used garbage? Nope. Do I loathe the whole subscription thing? Yes. Absolutely. Yet, I also get it. People expect software to get updated, and updates have costs.

So, the pertinent question here is, will AI systems be worse than humans? For now, yeah. Forever? Nope. The rate of improvement is crazy. Two years ago, LLMs I ran locally couldn’t do much of anything. Now? Generally acceptable junior dev stuff comes out of models I run on my Mac Studio. I have to fiddle with the prompts a bit, and it’s probably faster to just take a walk and think it over than spend an hour trying different prompts… but I’m a nerd and I like fiddling.

robocat

> Short of some sort of cultural pushback from developers or users

Corporations create great code too: they're not all badly run.

The problem isn't a code quality issue: it is a moral issue of whether you agree with the goals of capitalist businesses.

Many people have to balance the needs of their wallet with their desire for beautiful software (I'm a developer-founder I love engineering and open source community but I'm also capitalist enough to want to live comfortably).

am17an

All the world's smartest minds are racing towards replacing themselves. As programmers, we should take note and see where the wind is blowing. At least don't discard the possibility and rather be prepared for the future. Not to sound like a tin-foil hat but odds of achieving something like this increase by the day.

In the long term (post AGI), the only safe white-collar jobs would be those built on data which is not public i.e. extremely proprietary (e.g. Defense, Finance) and even those will rely heavily on customized AIs.

bitpush

> All the world's smartest minds are racing towards replacing themselves

Isn't every little script, every little automation us programmers do in the same spirit? "I don't like doing this, so I'm going to automate it, so that I can focus on other work."

Sure, we're racing towards replacing ourselves, but there would be (and will be) other more interesting work for us to do when we're free to do that. Perhaps, all of us will finally have time to learn surfing, or garden, or something. Some might still write code themselves by hand, just like how some folks like making bread .. but making bread by hand is not how you feed a civilization - even if hundreds of bakers were put out of business.

AstroBen

> all of us will finally have time to learn surfing, or garden

Unless you have a mortgage.. or rent.. or need to eat

wijwp

> Not to sound like a tin-foil hat but odds of achieving something like this increase by the day.

Where do you get this? The limitations of LLMs are becoming more clear by the day. Improvements are slowing down. Major improvements come from integrations, not major model improvements.

AGI likely can't be achieved with LLMs. That wasn't as clear a couple years ago.

drodgers

I don't know how someone could be following the technical progress in detail and hold this view. The progress is astonishing, and the benchmarks are becoming saturated so fast that it's hard to keep track.

Are there plenty of gaps left between here and most definitions of AGI? Absolutely. Nevertheless, how can you be sure that those gaps will remain given how many faculties these models have already been able to excel at (translation, maths, writing, code, chess, algorithm design etc.)?

It seems to me like we're down to a relatively sparse list of tasks and skills where the models aren't getting enough training data, or are missing tools and sub-components required to excel. Beyond that, it's just a matter of iterative improvement until 80th percentile coder becomes 99th percentile coder becomes superhuman coder, and ditto for maths, persuasion and everything else.

Maybe we hit some hard roadblocks, but room for those challenges to be hiding seems to be dwindling day by day.

materiallie

I think benchmark targeting is going to be a serious problem going forward. The recent Nate Silver podcast on poker performance is interesting. Basically, the LLM models still suck at playing poker.

Poker tests intelligence. So what gives? One interesting thing is that, for whatever reason, poker performance isn't used as a benchmark in the LLM showdown between big tech companies.

The models have definitely improved in the past few years. I'm skeptical that there's been a "break-through", and I'm growing more skeptical of the exponential growth theory. It looks to me like the big tech companies are just throwing huge compute and engineering budgets at the existing transformer tech, to improve benchmarks one by one.

I'm sure that if Google allocated 10 engineers and a dozen million dollars to improving Gemini's poker performance, it would improve. The idea behind AGI and the exponential growth hypothesis is that you don't have to do that, because the AI gets smarter in a general sense all on its own.

AstroBen

Ultimately this needs to be solved politically

Making our work more efficient, or humans redundant should be really exciting. It's not set in stone that we need to leave people middle aged with families and now completely unable to earn enough to provide a good life

Hopefully if it happens, it happens to such a huge amount of people that it forces a change

lyu07282

But that already happened to lots of industries and lots of people, we never cared before about them, now it's us so we care, but nothing is different about us. Just learn to code!

geraneum

> But that already happened to lots of industries and lots of people, we never cared before about them

We did. Why do you think labor laws, unions, etc. exist? Why do you think communism was appealing as an idea in the beginning to many? Whether the effects were good or bad or enough or not, that’s a different question. But these changes have demonstrably, grave consequences.

AstroBen

The difference is in how many industries AI is threatening. It's not just coding on the chopping block

bgwalter

The Nobel prize is said to have been created partly out of guilt over having invented dynamite, which was obviously used in a destructive manner.

Now we have Geoffrey Hinton getting the prize for contributing to one of the most destructive inventions ever.

reducesuffering

At least he and Yoshua Bengio are remorseful. Many others haven't even gotten that far...

cheema33

> All the world's smartest minds are racing towards replacing themselves.

I think they are hoping that their future is safe. And it is the average minds that will have to go first. There may be some truth to it.

Also, many of these smartest minds are motivated by money, to safeguard their future, from a certain doom that they know might be coming. And AI is a good place to be if you want to accumulate wealth fast.

BirAdam

Nah. As more people are rendered unemployed the buying market and therefore aggregate demand will fall. Fewer sales hurts the bottom line. At some point, revenues across the entire economy fall, and companies cannot afford the massive datacenters and nuclear power plants fueling them. The hardware gets sold cheap, the companies go under, and people get hired again. Eventually, some kind of equilibrium will be found or the world engages in the Butlerian Jihad.

bouncycastle

Last night I spent hours fighting o3.

I never made a Dockerfile in my life, so I thought it would be faster just getting o3 to point to the GitHub repo and let it figure out, rather than me reading the docs and building it myself.

I spent hours debugging the file it gave me... It kept hallucinating things that didn't exist, removing/rewriting other parts, and making other big mistakes, like not understanding the difference between python3 and python and the intricacies that come with that.

Finally I gave up and Googled some docs instead. Fixed my file in minutes and was able to jump into the container and debug the rest of the issues. AI is great, but it's not a tool to end all. You still need someone who is awake at the wheel.

halpow

They're great at one-shotting verbose code, but if they generate bad code the first time, you're out of luck.

I don't think I've ever written "this API doesn't exist" and then gotten a useful alternative back.

Claude is the only one that regularly tells me something isn't possible rather than making sh up.

throwaway314155

Pro-tip: Check out Claude or Gemini. They hallucinate far less on coding tasks. Alternatively, enable internet search on o3 which boosts its ability to reference online documentation and real world usage examples.

I get having a bad taste in your mouth but these tools _aren't _ magic and do have something of a steep learning curve in order to get the most out of them. Not dissimilar from vim/emacs (or lots of dev tooling).

edit: To answer a reply (HN has annoyingly limited my ability to make new comments): yes, internet search is always available to ChatGPT as a tool. Explicitly clicking the globe icon will encourage the model to use it more often, however.

Sohcahtoa82

> enable internet search on o3

I didn't know it could even be disabled. It must be enabled by default, right?

throwaway314155

You're correct. Tapping the globe icon encourages the model to use it more often.

solatic

Human coders are necessary because writing code is a political act of deciding between different trade-offs. antirez's whole post is explaining to Gemini what the trade-offs even were in the first place. No analysis of a codebase in isolation (i.e. without talking to the original coders, and without comments in the code) can distinguish between intentional prioritization of certain trade-offs or whether behavior is unintentional / written by a human in an imperfect way because they didn't know any better / buggy.

LLMs will never be able to figure out for themselves what your project's politics are and what trade-offs are supposed to be made. Even the most capable model will still require a user to explain the trade-offs in a prompt.

energy123

> LLMs will never be able to figure out for themselves what your project's politics are and what trade-offs are supposed to be made.

I wouldn't declare that unsolvable. The intentions of a project and how they fit into user needs can be largely inferred from the code and associated docs/README, combined with good world knowledge. If you're shown a codebase of a GPU kernel for ML, then as a human you instantly know the kinds of constraints and objectives that go into any decisions. I see no reason why an LLM couldn't also infer the same kind of meta-knowledge. Of course, this papers over the hard part of training the LLMs to actually do that properly, but I don't see why it's inherently impossible.

solatic

> associated docs/README

Many (I would even argue most) professional codebases do not have their documentation (including tutorials and architecture diagrams) alongside the code, if such formal documentation exists at all. It's axiomatic as well that documentation is frequently out of date, and in any case it represents past political decisions, not future ones; human owners can and do change their minds about which trade-offs are required over the lifetime of a project.

A simple case may be to plot codebase complexity against required scale; early projects benefit from simpler implementations that will not scale, and only after usage and demand are proven does it make sense to make the project more complex in order to support additional scale. So if you are an LLM looking at a codebase in isolation, do you make changes to add complexity to support additional scale? Do you simplify the codebase? Do you completely rewrite it in a different language (say, TypeScript -> Go or Rust)? How could an LLM possibly know which of these are appropriate without additional sources of telemetry at the very least and probably also needing to converse with stakeholders (i.e. bordering on AGI)?

smilbandit

From my limited experience (former coder, now in management, but I still get to code now and then), I've found them helpful but also intrusive. Sometimes when it guesses the code for the rest of the line and the next few lines, it's going down a path I don't want to take, but I still have to take time to scan it. Maybe it's a configuration issue, but I'd prefer it didn't put code directly in my way, or were off by default and only showed when I hit a key combo.

One thing I know is that I wouldn't ask an LLM to write an entire section of code or even a function without going in and reviewing.

haiku2077

Zed has a "subtle" mode like that. More editors should provide it. https://zed.dev/docs/ai/edit-prediction#switching-modes

PartiallyTyped

> One thing I know is that I wouldn't ask an LLM to write an entire section of code or even a function without going in and reviewing.

These days I am working on a startup doing [a bit of] everything, and I don't like the UI it creates. It's useful enough when I make the building blocks and let it be, but allowing claude to write big sections ends up with lots of reworks until I get what I am looking for.

austin-cheney

In many cases developers are a low expectation commodity. In those cases I strongly believe humans are entirely replaceable by AI and I am saying that as somebody with an exceptionally low opinion of LLMs.

Honestly though, when that replacement comes there is no sympathy to be had. Many developers have brought this upon themselves. For roughly the 25-year period from 1995 to 2020, businesses have been trying to turn developers into mindless commodities that are straightforward to replace. Developers have overwhelmingly encouraged this, and many still do. These are the people who hop employers every 2 years and cannot do their jobs without lying on their resumes or relying completely on a favorite framework.

ponector

But job-hopping every 2 years is the best strategy to earn more money and experience.

zxexz

I find myself wondering about your story, and would love it if you would elaborate more. I have gotten some use out of LLMs, and have been quite involved in training a few compute intensive (albeit domain-specific) ones.

Maybe it's the way you talk about 'developers'. Nothing I have seen has felt like the sky falling on an industry; to me at most it's been the sky falling on a segment of silicon valley.

austin-cheney

It's all about perspectives. In many cases there is a substantial gap between how a developer describes their level of participation and what they actually do as a work activity. For example, many developers refer to themselves as engineers when they have done nothing remotely close to measurement, research, or policy creation for compliance attainment.

With that out of the way let’s look only at what many developers actually do. If a given developer only uses a framework to put text on screen or respond to a user interaction then they can be replaced. LLMs can already do this better than people. That becomes substantially more true after accounting for secondary concerns: security, accessibility, performance, regression, and more.

If a developer is doing something more complex that accounts for systems analysis or human behavior then LLMs are completely insufficient.