
Some thoughts on LLMs and software development

nicwolff

> I’ve often heard, with decent reason, an LLM compared to a junior colleague.

No, they're like an extremely experienced and knowledgeable senior colleague – who drinks heavily on the job. Overconfident, forgetful, sloppy, easily distracted. But you can hire so many of them, so cheaply, and they don't get mad when you fire them!

furyofantares

These metaphors all suck. Well, ok, yours is funny. But anyway, LLMs are just very different from any human.

They are extremely shallow, even compared to a junior developer. But extremely broad, even compared to the most experienced developer. They type real fuckin fast compared to anyone on earth, but they need to be told what to do much more carefully than anyone on earth.

CuriouslyC

I've gotten Claude Code to make CUDA kernels and all kinds of advanced stuff that there's zero percent chance a junior would pull off.

AI is like a super advanced senior wearing a blindfold. It knows almost everything, it's super fast, and soon after you tell it something, it gets confused about it.

skeeter2020

it's not a senior though, because of the amount of oversight and guidance required. I trust senior-plus human developers to do the right thing and understand why it's the right thing. For mission critical things I get another human senior to verify. There's no way I'd autonomously trust 2, 10 or any number of LLMs to do the same thing.

achierius

You'd be surprised at what juniors can pull off. I have seen fresh-out-of-college new grads write performant GPU kernels that are used in real world library implementations for particular architectures.

QuadmasterXLII

Have you ever asked a junior developer to write a CUDA kernel?

Guthur

I just spent a good 2 hours trying to debug an SM6 Vulkan issue with Unreal Engine using an LLM. It had got me to a good state, but UE kept failing to load a project. It transpired that the specific error message would have surfaced a fix as the top Google result, which I found when I eventually decided to look for myself.

LLM did help a lot to get some busy work out of the way, but it's difficult to know when you need to jump out of the LLM loop and go old skool.

jack_pp

Fwiw I think the ratio of times I needed to go to google for a solution instead of an LLM is like 20:1 for me so your mileage may vary. Depends a lot on the specific niche you're working in.

Unrelated to software, but recently I wanted to revive an old dumbphone I hadn't used since 2014. Apparently I had it password protected, forgot the password, and wanted to factory reset it. I found the exact model of the phone, and Google had only content-farm articles that didn't help me at all, but Gemini gave me the perfect solution on the first try. I went to Google first because I had no faith in Gemini, since to me it seemed like a pretty obscure question, but I guess I was wrong.

fencepost

No, they're like a completely nontechnical marketing person who has a big library of papers on related subjects and who's been asked to generate a whitepaper by pulling phrases. What comes out will probably have proper grammar and seem perfectly reasonable to another person with no knowledge of the field, but if actually read by someone knowledgeable may be complete gibberish taken as a whole.

Individual sentences and paragraphs may mostly work, but it's an edifice built on sand out of poorly constructed bricks plus mortar with the wrong proportions (or entirely wrong ingredients).

LLM output is "truthy" - it looks like it might be true, and sometimes it will even be accurate (see also, "stopped clock") but depending on it is foolish because what's generating the output doesn't actually understand what it's putting out - it's just generating output that looks like the kind of thing you've requested.

Jcampuzano2

Overconfident, but also easy to sway their opinion.

They follow directions for maybe an hour and then go off and fix random shit because they forgot what their main task was.

They'll tell you to your face how great your ideas were, and that you're absolutely right about something, then go implement it completely incorrectly.

They add comments to literally everything even when you ask them to stop. But they also ignore said comments sometimes.

Maybe they are kinda like us lol.

AnotherGoodName

My analogy is a cursed monkey paw with unlimited wishes. It's actually really really powerful but you have to be careful of leaving any possible ambiguity in your wishes.

gck1

> Overconfident, forgetful, sloppy, easily distracted

And constantly microdosing, sometimes a bit too much.

amelius

> who drinks heavily on the job

Yes, that's why you should add "... and answer in the style of a drunkard" to every prompt. Makes it easier to not forget what you are dealing with.

amelius

> and they don't get mad when you fire them!

No, typically __you__ are mad when you fire them ...

vkou

Experienced and knowledgeable and they also believe in the technical equivalent of flat-Earthism in many, many non-trivial corners.

And if you push back on that insanity, they'll smile and nod and agree with you and in the next sentence, go right back to pushing that nonsense.

sho_hn

One good yardstick, if one has to anthropomorphize, is that LLMs know and believe what's popular. If you ask it to do something that popular developer opinion gets right, it will do fine.

Ask it for things that many people get wrong or just do badly, or can be mistakenly likened to a popular thing in a way that produces a wrong result, and it'll often err.

The trick is having an awareness of what correct solutions are prevalent in training data and what the bulk of accessible code used for training probably doesn't have many examples of. And this experience is hard to substitute for.

Juniors therefore use LLMs in a bumbling fashion and are productive either by sheer luck, or because they're more likely to ask for common things and so stay in a lane with the model.

A senior developer who develops a good intuition for when the tool is worth using and when not can be really efficient. Some senior developers however either overestimate or underestimate the tool based on wrong expectations and become really inefficient with them.

ares623

> Other forms of engineering have to take into account the variability of the world.

> Maybe LLMs mark the point where we join our engineering peers in a world on non-determinism.

Those other forms of engineering have no choice due to the nature of what they are engineering.

Software engineers already have a way to introduce determinism into the systems they build! We’re going backwards!

didericis

Part of what got me into software was this: no matter how complex or impressive the operation, with enough time and determination, you could trace each step and learn how a tap on a joystick led to the specific pixels on a screen changing.

There’s a beautiful invitation to learn and contribute baked into a world where each command is fully deterministic and spec-ed out.

Yes, there have always been poorly documented black boxes, but I thought the goal was to minimize those.

People don’t understand how much is going to be lost if that goal is abandoned.

pton_xd

Agreed. The beauty of programming is that you're creating a "mathematical artifact." You can always drill down and figure out exactly what is going on and what is going to happen with a given set of inputs. Now with things like concurrency that's not exactly true, but I think the sentiment still holds.

The more practical question is though, does that matter? Maybe not.

didericis

> The more practical question is though, does that matter?

I think it matters quite a lot.

Specifically for knowledge preservation and education.

DSingularity

In a way this is also a mathematical artifact — after all, tokens are selected through beam search or some random sampling of likely successor tokens.
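The "random sampling of likely successor tokens" step can be sketched in a few lines (a toy illustration of temperature sampling, not any particular model's implementation):

```python
import math
import random

def sample_next_token(logits, temperature=0.8):
    """Pick a successor token by sampling from the softmax distribution.

    `logits` maps candidate tokens to raw model scores; lower temperature
    concentrates probability mass on the highest-scoring tokens.
    """
    scaled = {tok: score / temperature for tok, score in logits.items()}
    max_s = max(scaled.values())  # subtract the max for numerical stability
    weights = {tok: math.exp(s - max_s) for tok, s in scaled.items()}
    total = sum(weights.values())
    r = random.random() * total
    for tok, w in weights.items():
        r -= w
        if r <= 0:
            return tok
    return tok  # fallback for floating-point edge cases
```

Run twice with the same logits and you may get different tokens: that single `random.random()` call is where the non-determinism discussed above enters.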

whartung

"Computers are deterministic!"

If I wanted to plumb together badly documented black boxes, I'd have become an EE.

_mu

Underrated comment - computers can be decidedly non-deterministic.

https://en.wikipedia.org/wiki/Metastability_(electronics)

girvo

> People don’t understand how much is going to be lost if that goal is abandoned.

Ah but you see, imagine the shareholder value we can generate for a quarter or two in the meanwhile!

Editors note: please read that as absolutely dripping with disdain and sarcasm.

sodapopcan

As pertaining to software development, I agree. I've been hearing accounts (online and from coworkers) of using LLMs to do deterministic stuff. And yet, instead of at least prompting once to "write a script to do X," they just keep prompting "do X" over and over again. Seems incredibly wasteful. It feels like there is this thought of "We are not making progress if we aren't getting the LLM to do everything. Having it write a script we can review and tweak is anti-progress." No one has said that outright, but it's a gut feeling (and it wouldn't surprise me if people have said it out loud).

tptacek

This is the 2025 equivalent of the people who once wrote 2000 word blog posts about how bad it was to use "cat" instead of just shell redirection.

sodapopcan

These are hardly equivalent. One is someone preferring one deterministic way over another. The other is more akin to arguing that it's better to ask someone to manually complete a task for you instead of caching the instructions on your computer. Now if the LLM does caching then you have more of a point, I don't have enough experience there.

bongodongobob

I am one of these people. For my scripting needs, it would probably take me longer to find the script I saved rather than just asking it again and getting an answer in 15 seconds. I haven't saved a smallish script in a year.

achierius

I used to be one of those people, then I started saving these scripts in a folder and realized just how much time it saved me. Especially for nontrivial scripts which require a lot of typing or input specification.

tptacek

'potatolicious says we're going forwards: https://news.ycombinator.com/item?id=44978319

makeitdouble

That was an interesting take, but "probabilistic" is to me different from "random". In particular, other fields get error tolerances; LLMs give us nothing like that.

We're introducing chaos monkeys, not just variability.

tptacek

Note that he's talking about the same nondeterminism in that post that we're talking about here.

ants_everywhere

adding to this, software deals with non-determinism all the time.

For example, web requests are non-deterministic. They depend, among other things, on the state of the network. They also depend on the load of the machine serving the request.

One way to think about this is: how easy is it for you to produce byte-for-byte deterministic builds of the software you're working on? If it's not trivial there's more non-determinism than is obvious.
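That check can be mechanized, as a sketch (the build command and artifact path are placeholders for a real project's):

```python
import hashlib
import subprocess
from pathlib import Path

def artifact_digest(path):
    """SHA-256 of a build artifact, for byte-for-byte comparison."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()

def is_reproducible(build_cmd, artifact):
    """Run the build twice and report whether it produced identical bytes."""
    subprocess.run(build_cmd, check=True)
    first = artifact_digest(artifact)
    subprocess.run(build_cmd, check=True)
    return artifact_digest(artifact) == first
```

If this returns False, something nondeterministic (timestamps, embedded paths, parallel link order) leaked into the artifact.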

skydhash

Mostly, the engineering part of software is dealing with non-determinism, either by avoiding it or by enforcing determinism. Take something like TCP: it's all about guaranteeing that either the message is sent and received, or you know it wasn't. And we have a lot of algorithms that try to guarantee consistency of information between the elements of a system.

anthem2025

It’s not trivial largely because we didn’t bother to design deterministic builds because it didn’t seem to matter. There is not much about the actual problem that makes it difficult.

null

[deleted]

delusional

I would rather say it like this: Very good, very hardworking engineers spent years of their lives building the machine that raised us from the non-determinism of messy physical reality. The technology that brought us perfect, replicable, and reliable math from sending electrons through rocks has been deeply underappreciated in the "software revolution".

The engineers at TSMC, Intel, Global Foundries, Samsung, and others have done us an amazing service, and we are throwing all that hard work away.

AaronAPU

It was forward when Newton discovered the beautiful simple determinism of physics.

Was it going backwards when the probabilistic nature of quantum mechanics emerged?

Viliam1234

Two words: many-worlds interpretation.

More seriously, this is not a fair comparison. Adding LLM output to your source code is not analogous to quantum physics; it is analogous to letting your 5-year-old child transcribe the experimentally measured values without checking, and accepting that many of them will be transcribed wrong.

BlueTemplar

Hopefully even Newton already had some awareness of "deterministic chaos" (or whatever terms they would have used back then) ?

And on the other hand, no transistors without quantum mechanics.

siffland

The issue is that a lot of the time it takes a senior-level programmer to reason with an LLM to get the results needed. What happens when there are no juniors left to replace the seniors? I guess by then AI will be able to program efficiently enough.

So far AI seems to be a great augmentation, but not a replacement.

sebnukem2

> hallucinations aren’t a bug of LLMs, they are a feature. Indeed they are the feature. All an LLM does is produce hallucinations, it’s just that we find some of them useful.

Nice.

nine_k

I'd rather say that LLMs live in a world that consists entirely of stories, nothing but words and their combinations. They have no other reality. So they are good at generating more stories that sit well with the stories they already know. But the stories are often imprecise, and sometimes contradictory, so they have to guess. Also, LLMs don't know how to count, but they know that two usually follows one, and that three is usually said to be larger than two, so they can speak in a way that mostly does not contradict this knowledge. They can use tools to count, like a human who knows digits would use a calculator.

But much more than an arithmetic engine, the current crop of AI needs an epistemic engine, something that would help follow logic and avoid contradictions, to determine what is a well-established fact, and what is a shaky conjecture. Then we might start trusting the AI.

gnerd00

This was true, but then it wasn't. Several years ago the research world had a moment when the machinery could reliably solve multi-step problems (ones requiring intermediate results), and could solve problems in domains where it was not specifically trained. This caused a lot of excitement, and several hundred billion dollars in various investments. Since no one actually knows how all of it works, not even the builders, here we are.

achierius

Are you sure you're talking about LLMs? These sound more like traditional ML systems like AlphaFold or AlphaProof.

utyop22

"Since no one actually knows how all of it works, not even the builders, here we are."

To me this is the most bizarre part. Have we ever had a technology deployed at this scale without a true understanding of its inner workings?

My fear is that the general public perception of AI will be damaged, since for most people LLMs = AI.

tptacek

In that framing, you can look at an agent as simply a filter on those hallucinations.

keeda

More of an error-correcting feedback loop than a filter, really. Which is very much what we do as humans, apparently. One theory in neuroscience that is becoming influential is predictive processing (https://en.wikipedia.org/wiki/Predictive_coding): it postulates that we also constantly generate a "mental model" of our environment (a literal "prediction") and use sensory inputs to correct and update it.

So the only real difference between "perception" and a "hallucination" is whether it is supported by physical reality.

armchairhacker

This vaguely relates to a theory about human thought: that our subconscious constantly comes up with random ideas, then filters the unreasonable ones, but in people with delusions (e.g. schizophrenia) the filter is broken.

Salience (https://en.wikipedia.org/wiki/Salience_(neuroscience)), "the property by which some thing stands out", is something LLMs have trouble with. Probably because they're trained on human text, which ranges from accurate descriptions of reality to nonsense.

Lionga

Isn't an "agent" not just hallucinations layered on top of other random hallucinations to create new hallucinations?

tptacek

No, that's exactly what an agent isn't. What makes an agent an agent is all the not-LLM code. When an agent generates Golang code, it runs the Go compiler, which is in the agent's architecture an extension of the agent. The Go compiler does not hallucinate.
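In sketch form, with `ask_llm` as a hypothetical stand-in for the model call, and Python's built-in `compile()` standing in for the Go compiler so the example is self-contained:

```python
def agent_write_code(ask_llm, task, max_attempts=3):
    """Toy agent loop: the LLM proposes code, and a deterministic checker
    (here Python's compile(); in the Go example, the Go compiler) accepts
    or rejects it, with errors fed back to the model as new context.

    `ask_llm(prompt) -> str` is a stand-in for any model call.
    """
    prompt = f"Write a Python script that {task}."
    for _ in range(max_attempts):
        code = ask_llm(prompt)
        try:
            compile(code, "<candidate>", "exec")  # the non-LLM gate
            return code
        except SyntaxError as err:
            # The hallucination is filtered out; the error becomes context.
            prompt = (f"This code failed to compile:\n{code}\n"
                      f"Error: {err}\nFix it and return only code.")
    raise RuntimeError("no valid candidate within the attempt budget")
```

The compiler never hallucinates, so the loop converges toward output that at least satisfies the deterministic check, even though each individual generation is a guess.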

th0ma5

Yes yes, with yet to be discovered holes

anthem2025

Isn’t that why people argue against calling them hallucinations?

It implies that some parts of the output aren’t hallucinations, when the reality is that none of it has any thought behind it.

keeda

I've prefered to riff off of the other quote:

"All (large language) model outputs are hallucinations, but some are useful."

Some astonishingly large proportion of them, actually. Hence the AI boom.

awesome_dude

I have a very similar (probably unoriginal) thought about some human mental illnesses.

So, we VALUE creativity, we claim that it helps us solve problems, improves our understanding of the universe, etc.

BUT for people with some mental illnesses, the brain is so creative that they lose track of where reality ends and where their imagination/creativity takes over.

E.g. hearing voices? That's the brain conjuring up a voice; auditory and visual hallucinations are the easy example.

But it goes further: depression is where people's brains create scenarios in which there is no hope and no escape. Anxiety too: the brain is conjuring up fears of what's to come.

ninetyninenine

Nah I don't agree with this characterization. The problem is, the majority of those hallucinations are true. What was said would make more sense if the majority of the responses were, in fact, false, but this is not the case.

xmprt

I think you're both correct but have different definitions of hallucinations. You're judging it as a hallucination based on the veracity of the output, whereas Fowler is judging it based on the method by which the output is achieved. By that judgement, everything is a hallucination, because the user cannot differentiate between when the LLM is telling the truth and when it isn't.

This is different from human hallucinations where it makes something up because of something wrong with the mind rather than some underlying issue with the brain's architecture.

ants_everywhere

an LLM hallucination is defined by its truth

> In the field of artificial intelligence (AI), a hallucination or artificial hallucination (also called confabulation,[1] or delusion)[2] is a response generated by AI that contains false or misleading information presented as fact.[3][4]

You say

> This is different from human hallucinations where it makes something up because of something wrong with the mind rather than some underlying issue with the brain's architecture.

For consistency you might as well say everything the human mind does is hallucination. It's the same sort of claim. This claim at least has the virtue of being taken seriously by people like Descartes.

https://en.wikipedia.org/wiki/Hallucination_(artificial_inte...

BlueTemplar

LLMs don't hallucinate: they bullshit (which is not caring about truth).

null

[deleted]

skhameneh

There are many people I've worked with who idolize Martin Fowler and treat his words as gospel. That is not me; I've found that attitude to be a nuisance, sometimes leading me to be overly critical of the actual content. As for now, I'm not working with such people and can appreciate the shared article without that bias clouding my reading.

I like this article, and I generally agree with it. I think the take is good. However, after spending ridiculous amounts of time with LLMs (prompt engineering, writing tokenizers/samplers, context engineering, and... yes... vibe coding), at times 10-hour days stretching into weekends, I have come to believe that many takes are a bit off the mark. This article is refreshing, but I disagree that people talking about the future are talking "from another orifice".

I won't dare say I know what the future looks like, but the present very much appears to be an overall upskilling and rework of collaboration. Just like every attempt before, some things are right and some are simply misguided. e.g. Agile for the sake of agile isn't any more efficient than any other process.

We are headed in a direction where written code is no longer a time sink. Juniors can onboard faster and more independently with LLMs, while seniors can shift their focus to a higher level in application stacks. LLMs have the ability to lighten cognitive loads and increase productivity, but just like any other productivity enhancing tool doing more isn't necessarily always better. LLMs make it very easy to create and if all you do is create [code], you'll create your own personal mess.

When I was using LLMs effectively, I found myself focusing more on higher level goals with code being less of a time sink. In the process I found myself spending more time laying out documentation and context than I did on the actual code itself. I spent some days purely on documentation and health systems to keep all content in check.

I know my comment is a bit sparse on specifics, I'm happy to engage and share details for those with questions.

manmal

> written code is no longer a time sink

It still is, and should be. It’s highly unlikely that you provided all the required info to the agent at first try. The only way to fix that is to read and understand the code thoroughly and suspiciously, and reshaping it until we’re sure it reflects the requirements as we understand them.

skhameneh

Vibe coding is not telling an agent what to do and checking back. It's an active engagement and best results are achieved when everything is planned and laid out in advance — which can also be done via vibe coding.

No, written code is no longer a time sink. Vibe coding is >90% building without writing any code.

The written code and actions are literally presented in diffs as they are applied, if one so chooses.

anskskbs

> It's an active engagement and best results are achieved when everything is planned and laid out in advance

The most efficient way to communicate these plans is in code. English is horrible in comparison.

When you’re using an agent and not reviewing every line of code, you’re offloading thinking to the AI. Which is fine in some scenarios, but often not what people would call high quality software.

Writing code was never the slow part for a competent dev. Agent swarming etc is mostly snake oil by those who profit off LLMs.

epolanski

> It's an active engagement and best results are achieved when everything is planned and laid out in advance — which can also be done via vibe coding.

No.

The general assumed definition of vibe coding, hence the vibe word, is that coding becomes an iterative process guided by intuition rather than spec and processes.

What you describe is literally the opposite of vibe coding, it feels the term is being warped into "coding with an LLM".

mehagar

How could you possibly plan out "everything" in advance? Code itself would be the only way to explicitly specify the "everything".

sfink

> We are headed in a direction where written code is no longer a time sink.

Written code has never been a time sink. The actual time that software developers have spent actually writing code has always been a very low percentage of total time.

Figuring out what code to write is a bigger deal. LLMs can help with part of this. Figuring out what's wrong with written code, and figuring out how to change and fix the code, is also a big deal. LLMs can help with a smaller part of this.

> Juniors can onboard faster and more independently with LLMs,

Color me very, very skeptical of this. Juniors previously spent a lot more of their time writing code, and they don't have to do that anymore. On the other hand, that's how they became not-juniors; the feedback loop from writing code and seeing what happened as a result is the point. Skipping part of that breaks the loop. "What the computer wrote didn't work" or "what the computer wrote is too slow" or even to some extent "what the computer wrote was the wrong thing" is so much harder to learn from.

Juniors are screwed.

> LLMs have the ability to lighten cognitive loads and increase productivity,

I'm fascinated to find out where this is true and where it's false. I think it'll be very unevenly distributed. I've seen a lot of silver bullets fired and disintegrate mid-flight, and I'm very doubtful of the latest one in the form of LLMs. I'm guessing LLMs will ratchet forward part of the software world, will remove support for other parts that will fall back, and it'll take us way too long to recognize which part is which and how to build a new system atop the shifted foundation.

skhameneh

> Figuring out what code to write is a bigger deal. LLMs can help with part of this. Figuring out what's wrong with written code, and figuring out how to change and fix the code, is also a big deal. LLMs can help with a smaller part of this.

I found exactly this is what LLMs are great at assisting with.

But, it also requires context to have guiding points for documentation. The starting context has to contain just enough overview with points to expand context as needed. Many projects lack such documentation refinement, which causes major gaps in LLM tooling (thus reducing efficacy and increasing unwanted hallucinations).

> Juniors are screwed.

Mixed, it's like saying "if you start with Python, you're going to miss lower level fundamentals" which is true in some regards. Juniors don't inherently have to know the inner workings, they get to skip a lot of the steps. It won't inherently make them worse off, but it does change the learning process a lot. I'd refute this by saying I somewhat naively wrote a tokenizer, because the >3MB ONNX tokenizer for Gemma written in JS seemed absurd. I went in not knowing what I didn't know and was able to learn what I didn't know through the process of building with an LLM. In other words, I learned hands on, at a faster pace, with less struggle. This is pretty valuable and will create more paths for juniors to learn.

Sure, we may see many lacking fundamentals, but I suppose that isn't so different from the criticism I heard when I wrote most of my first web software in PHP. I do believe we'll see a lot more Python and linguistic influenced development in the future.

> I'm guessing LLMs will ratchet forward part of the software world, will remove support for other parts that will fall back, and it'll take us way too long to recognize which part is which and how to build a new system atop the shifted foundation.

I entirely agree, in fact I think we're seeing it already. There is so much that's hyped and built around rough ideas that's glaringly inefficient. But FWIW inefficiency has less of an impact than adoption and interest. I could complain all day about the horrible design issues of languages and software that I actually like and use. I'd wager this will be no different. Thankfully, such progress in practice creates more opportunities for improvement and involvement.

sfink

> Sure, we may see many lacking fundamentals, but I suppose that isn't so different from the criticism I heard when I wrote most of my first web software in PHP.

It's not just the fundamentals, though you're right that's an easy casualty. I also agree that LLMs can greatly help with some forms of learning -- previously, you kind of had to follow the incremental path, where you couldn't really do anything complex without having the skills it built on, because 90% of your time and brain would be spent on getting the syntax right or whatever, and so you'd lose track of the higher-level thing you were exploring. With an LLM, it's nice to be able to (temporarily) skip that learning and explore different areas at will. Especially when that motivates the desire to go back and learn the basics.

But my real fear is about the skill acquisition, or simply the thinking. We are human, we don't want to have to go through the learning stage before we start doing, and we won't if we don't have to. It's difficult, it takes effort, it requires making mistakes and being unhappy about them, unhappy enough to be motivated to learn how to not make them in the future. If we don't have to do it, we won't, even if we logically know that we'd be better off.

Especially if the expectations are raised to the point where the pressure to be "productive" makes it feel like you're wasting other people's time and your paycheck to learn anything that the LLM can do for you. We're reaching the point where it feels irresponsible to learn.

(Sometimes this is ok. I'm fairly bad at long division now, but I don't think it's holding me back. But juniors can't know what they need to know before they know it!)

rootusrootus

For a hot second I thought LLMs were coming for our jobs. Then I realized they were just as likely to end up creating mountains of things for us to fix later. And as things settle down, I find good use cases for Claude Code that augment me but are in no danger of replacing me. It certainly has its moments.

bko

> Certainly if we ever ask a hallucination engine for a numeric answer, we should ask it at least three times, so we get some sense of the variation.

This works on people as well!

Cops do this when interrogating. You tell the same story three times, sometimes backwards. It's hard to keep track of everything if you're lying or you don't recall clearly so you can get a sense of confidence. Also works on interviews, ask them to explain a subject in three different ways to see if they truly understand.
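Mechanically, the ask-three-times idea looks something like this (`ask_llm` is a hypothetical function that returns a normalized answer string):

```python
from collections import Counter

def ask_with_consensus(ask_llm, prompt, n=3):
    """Ask the same question n times; return the modal answer and how
    strongly the answers agreed (1.0 = unanimous, low = don't trust it)."""
    answers = [ask_llm(prompt) for _ in range(n)]
    (best, count), = Counter(answers).most_common(1)
    return best, count / n
```

Low agreement doesn't tell you which answer is right, only that the question sits in territory where the model is guessing, which is exactly Fowler's point about wanting a sense of the variation.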

inerte

Triple modular redundancy. I remember reading that's how NASA space shuttles calculated things, because a processor / memory might have been affected by space radiation: https://llis.nasa.gov/lesson/18803

Terr_

> This works on people as well!

Only within certain conditions or thresholds that we're still figuring out. There are many cases where the more someone recalls and communicates their memory, the more details get corrupted.

> Cops do this when interrogating.

Sometimes that's not to "get sense of the variation" but to deliberately encourage a contradiction to pounce upon it. Ask me my own birthday enough times in enough ways and formats, and eventually I'll say something incorrect.

Care must also be taken to ensure that the questioner doesn't change the details, such as by encouraging (or sometimes forcing) the witness/suspect to imagine things which didn't happen.

chistev

Who remembers that scene on Better Call Saul between Lalo, Saul, and Kim?

rancar2

My favorite quote to borrow: “Furthermore I think anyone who says they know what this future will be is talking from an inappropriate orifice.”

koolba

Reminds me of the classic Yogi Berra, “It's tough to make predictions, especially about the future”.

Towaway69

Predicting the future isn't about being correct tomorrow, rather it’s about selling something to someone today.

An insight I picked up along the way…

th0ma5

To me, it is more specific to say that many futurists don't take into account the social and economic network effects of changes taking place. Many just act as if the future will continue on completely unchallenged in this current state. But if you look at someone like Kurzweil, you can see the very narrow and specific focus of a prediction, which has proved to be more informative to me as a high bar of futurism.

jeppester

In my company I feel that we are getting totally overrun with code that's 90% good, 10% broken, and almost exactly what was needed.

We are producing more code, but quality is definitely taking a hit now that no-one is able to keep up.

So instead of slowly inching towards the result we are getting 90% there in no time, and then spending lots and lots of time on getting to know the code and fixing and fine-tuning everything.

Maybe we ARE faster than before, but it wouldn't surprise me if the two approaches are closer than one might think.

What bothers me the most is that I much prefer building stuff to fixing code I'm not intimately familiar with.

utyop22

"but quality is definitely taking a hit now that no-one is able to keep up."

And it's going to get worse! So please explain to me how, on net, you are going to be better off? You're not.

I think most people haven't taken a decent economics class and don't deeply understand the notion of trade-offs and the fact that there is no free lunch.

epolanski

As Fowler himself states, there's a need to learn to use these tools properly.

In any case poor work quality is a failure of tech leadership and culture, it's not AI's fault.

oo0shiny

> My former colleague Rebecca Parsons, has been saying for a long time that hallucinations aren’t a bug of LLMs, they are a feature. Indeed they are the feature. All an LLM does is produce hallucinations, it’s just that we find some of them useful.

What a great way of framing it. I've been trying to explain this to people, but this is a succinct version of what I was stumbling to convey.

nomilk

> We should ask the LLM the question more than once

For any macOS users, I highly recommend an Alfred workflow: press command + space, type 'llm <prompt>', and it opens tabs with the prompt in Perplexity, (locally running) DeepSeek, ChatGPT, Claude and Grok, or whatever other LLMs you want to add.

This approach satisfies Fowler's recommendation of cross-referencing LLM responses, but it's also very efficient, and over time it gives you a sense of which LLMs perform better for certain tasks.

_mu

What would such a workflow look like? I have Alfred but mainly just use the clipboard feature. I've tried to get into automation but struggled for inspiration. This one seems good!

Are you just opening a browser tab?

nomilk

Go to the 'Workflows' tab, make a new one with keyword of your choice (e.g. llm), and map it to open these urls in your default browser:

http://localhost:3005/?q={query}

https://www.perplexity.ai/?q={query}

https://x.com/i/grok?text={query}

https://chatgpt.com/?q={query}&model=gpt-5

https://claude.ai/new?q={query}

Modify to your taste. Example: https://github.com/stevecondylios/alfred-workflows/tree/main (you should be able to download the .alfredworkflow file and double click on it to import it straight into alfred, but creating your own shouldn't take long, maybe 5-10 mins if it's your first workflow)
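For anyone not on Alfred, the same fan-out can be sketched in a few lines of Python. The URL templates are copied from the comment above; each site's query-parameter support is assumed from that comment, not independently verified:

```python
import webbrowser
from urllib.parse import quote

# URL templates from the workflow above; {query} is the URL-encoded prompt.
TEMPLATES = [
    "https://www.perplexity.ai/?q={query}",
    "https://x.com/i/grok?text={query}",
    "https://chatgpt.com/?q={query}&model=gpt-5",
    "https://claude.ai/new?q={query}",
]

def build_urls(prompt: str) -> list[str]:
    """URL-encode the prompt and splice it into each template."""
    return [t.format(query=quote(prompt)) for t in TEMPLATES]

if __name__ == "__main__":
    import sys
    # Open each LLM in a new browser tab with the same prompt.
    for url in build_urls(" ".join(sys.argv[1:])):
        webbrowser.open_new_tab(url)
```

Run it as e.g. `python llm_fanout.py why is the sky blue` to get the same one-prompt, many-models effect.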

daviding

I get a lot of productivity out of LLMs so far, which for me is a simple good sign. I can get a lot done in a shorter time and it's not just using them as autocomplete. There is this nagging doubt that there's some debt to pay one day when it has too loose a leash, but LLMs aren't alone in that problem.

One thing I've done with some success is use a Test Driven Development methodology with Claude Sonnet (or recently GPT-5), moving the feature forward in discrete steps with initial tests and within the red/green loop. I don't see a lot written or discussed about that approach so far, but reading Martin's article made me realize that the people most proficient with TDD are not really in the Venn diagram intersection of those wanting to throw themselves wholeheartedly into using LLMs to agent-code. The 'super clippy' autocomplete is not the interesting way to use them; it's with multiple agents and prompt techniques at different abstraction levels that you can really cook with gas. Many TDD experts take great pride in the art of code, communicating like a human and holding the abstractions in their head, so we might not get good guidance from the same set of people who helped us before. I think there's a nice green field of 'how to write software' lessons coming up with these tools, with many cautionary tales and lessons being learnt right now.

edit: heh, just saw this now, there you go - https://news.ycombinator.com/item?id=45055439

tra3

It feels like the TDD/LLM connection is implied — “and also generate tests”. Though it's not canonical TDD, of course. I wonder if it'll turn the tide towards tech that's easier to test automatically, like maybe SSR instead of React.

daviding

Yep, it's great for generating tests, and so much of that is boilerplate that it feels like great value. As a super lazy developer, I love having all that mechanical 'stuff' spat out for me. Test code feels like lighter baggage when it's churned out as part of the process: no guilt in deleting it all when what you want to do changes. That in itself is nice. Plus MCP things (Playwright etc.) for integration testing are great.

But like you said, it was meant more as TDD in the 'test first' sense: a sort of 'prompt-as-spec' that produces the test/spec code first, which you then iterate on. The code design itself comes out different, influenced by how it is prompted to be testable. So rather than going 'prompt -> code', there's an in-between stage of prompting the test initially and then evolving it, making sure the agent only writes testable code and automating the 'gate' of passing tests before expanding something. A 'prompt -> spec -> code' loop, repeated until shipped.
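A minimal sketch of that red/green loop, with a hypothetical `slugify` feature (names invented for illustration): the test is written, or prompted, first as the spec; then the smallest implementation that turns it green:

```python
import re

# Step 1 (red): the spec is expressed as a test before any implementation
# exists. Running it at this point fails, which is the point.
def test_slugify():
    assert slugify("Hello, World!") == "hello-world"
    assert slugify("  extra   spaces  ") == "extra-spaces"

# Step 2 (green): the smallest implementation that satisfies the spec.
def slugify(title: str) -> str:
    """Lowercase, strip punctuation, join words with hyphens."""
    words = re.findall(r"[a-z0-9]+", title.lower())
    return "-".join(words)

# Step 3: the gate. The agent may only expand the feature once this passes.
test_slugify()
```

Each new behaviour repeats the loop: prompt a new failing assertion, then evolve `slugify` until the gate passes again.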

girvo

The only thing I dislike is what it chooses to test when asked to just "generate tests for X": it often builds those "straitjacket for your code" style tests which aren't actually useful for catching bugs; they just act as "any change now makes this red".

As a simple example: a "buildUrl" style function that used one host for prod and a different host for staging (selected by an "environment" argument) had that argument "tested" by exactly comparing the function's entire return string, encoding all the extra functionality into it (which was tested earlier anyway).

A better output would be to check startsWith(prodHost) or similar, which is what I changed it to, but I'm still trying to work out how to get coding agents to do that on the first or second attempt.
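A hypothetical sketch of the two styles (hosts and function invented for illustration): the brittle assertion freezes the whole return string, while the robust one checks only the property under test:

```python
# Hypothetical hosts for the example.
PROD_HOST = "https://api.example.com"
STAGING_HOST = "https://staging.example.com"

def build_url(environment: str, path: str) -> str:
    """Pick a host by environment and build a request URL."""
    host = PROD_HOST if environment == "prod" else STAGING_HOST
    return f"{host}/v2/{path}?format=json"

# Brittle: any change to the path scheme or query string turns this red,
# even though the environment selection it targets still works.
assert build_url("prod", "users") == "https://api.example.com/v2/users?format=json"

# Robust: asserts only the behaviour under test, the host selection.
assert build_url("prod", "users").startswith(PROD_HOST)
assert build_url("staging", "users").startswith(STAGING_HOST)
```

The robust assertions survive unrelated refactors of the path and query string, which is exactly the property the "straitjacket" tests lack.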

But that's also not surprising: people write those kinds of too-narrow not-useful tests all the time, the codebase I work on is littered with them!

rvz

> It feels like Tdd/llm connection is implied — “and also generate tests”.

Getting LLMs to generate tests for you when you don't know what to test for sounds like an anti-pattern, not true TDD.

It also reduces your confidence in knowing if the generated test does what it says. Thus, you might as well write it yourself.

Otherwise you will get this sort of nasty incident [0], even when 'all tests passed'.

[0] https://sketch.dev/blog/our-first-outage-from-llm-written-co...

gck1

LLMs (Sonnet and Gemini, from what I've tested) tend to “fix” failing tests by either removing them outright or tweaking the assertions just enough to make them pass. The opposite happens too - sometimes they change the actual logic when what really needs updating is the test.

In short, LLMs often get confused about where the problem lies: the code under test or the test itself. And no amount of context engineering seems to solve that.