
How I use LLMs as a staff engineer

200 comments

February 4, 2025

toprerules

As a fellow "staff engineer": LLMs are terrible at writing, or teaching how to write, idiomatic code, and they are actually causing me to spend more time reviewing than before due to the influx of junior-to-senior engineers trying to sneak in LLM garbage.

In my opinion, using LLMs to write code is a Faustian deal where you learn terrible practices and rely on code quantity, boilerplate, and nondeterministic outputs - all hallmarks of poor software craftsmanship. Until ML can actually go end to end from requirements to product and they fire all of us, you can't cut corners on building intuition as a human by forgoing reading and writing code yourself.

I do think there is a place for LLMs in generating ideas or exploring an untrusted knowledge base of information, but using code generated by an LLM is pure madness unless what you are building is truly going to be thrown away and rewritten from scratch - as is relying on it as a linting, debugging, or source-of-truth tool.

tokioyoyo

I will probably get heavily crucified for this, but to people who are ideologically opposed to AI-generated code: executives, directors and managerial staff think the opposite. Being very anti-LLM-code instead of trying to understand how it can improve your speed might be detrimental to your career.

Personally, I'm on the fence. But conversations with others, and some requests from execs to implement different AI utilities into our processes, are making me err on the safer side of job security rather than dismiss it and be adamantly against it.

feoren

> executives, directors and managerial staff think the opposite

Executives, directors, and managerial staff have had their heads up their own asses since the dawn of civilization. Riding the waves of terrible executive decisions is unfortunately part of professional life. Executives like the idea of LLMs because it means they can lay you off; they're not going to care about your opinion on it one way or another.

> Being very anti-LLM code instead of trying to understand how it can improve the speed might be detrimental for your career.

You're making the assumption that LLMs can improve your speed. That's the very assumption being questioned by GP. Heaps of low-quality code do not improve development speed.

simonw

I'm willing to stake my reputation on the idea that yes, LLMs can improve your speed. You have to learn how to use them effectively and responsibly but the productivity boosts they can give you once you figure that out are very real.

Aeolun

> Executives like the idea of LLMs because it means they can lay you off; they're not going to care about your opinion on it one way or another.

Probably, but when the time comes for layoffs, the ones that will be the first to go are those hiding under a rock, claiming that there is no value in those LLMs even as they're being replaced.

BerislavLopac

In my mind, this dilemma is actually very simple:

First, what LLMs/GenAI do is automated code generation, plain and simple. We've had code generation for a very long time; heck, even compiling is automated generation of code.

What is new with LLM code generation is that it is non-deterministic, unlike traditional code generation tools; like a box of chocolates, you never know what you're going to get.

So, as long as you have tools and mechanisms that make that non-determinism irrelevant, using LLMs to write code is not a problem at all. In fact, guess what? Hand-coding is also non-deterministic, so we already have plenty of those mechanisms in place: automated tests, code reviews, etc.
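
For example, here is a minimal Go sketch of the kind of guardrail meant here; the function and test names (Clamp, TestClamp) are mine, not from the comment. Whoever, or whatever, wrote the implementation, the test pins its behaviour down (both are shown in a single _test.go file for brevity):

    package mathutil

    import "testing"

    // Clamp is a stand-in for any small function, human- or LLM-written.
    func Clamp(v, lo, hi int) int {
        if v < lo {
            return lo
        }
        if v > hi {
            return hi
        }
        return v
    }

    // TestClamp pins the behaviour down regardless of who produced the code.
    func TestClamp(t *testing.T) {
        cases := []struct{ v, lo, hi, want int }{
            {5, 0, 10, 5},
            {-3, 0, 10, 0},
            {42, 0, 10, 10},
        }
        for _, c := range cases {
            if got := Clamp(c.v, c.lo, c.hi); got != c.want {
                t.Errorf("Clamp(%d, %d, %d) = %d, want %d", c.v, c.lo, c.hi, got, c.want)
            }
        }
    }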

subw00f

I think I’m having the same experience as you. I’ve heard multiple times from execs in my company that “software” will have less value and that, in a few years, there won’t be as many developer jobs.

Don’t get me wrong—I’ve seen productivity gains both in LLMs explaining code/ideation and in actual implementation, and I use them regularly in my workflow now. I quite like it. But these people are itching to eliminate the cost of maintaining a dev team, and it shows in the level of wishful thinking they display. They write a snake game one day using ChatGPT, and the next, they’re telling you that you might be too slow—despite a string of record-breaking quarters driven by successful product iterations.

I really don’t want to be a naysayer here, but it’s pretty demoralizing when these are the same people who decide your compensation and overall employment status.

adamredwoods

>> But these people are itching to eliminate the cost of maintaining a dev team, and it shows in the level of wishful thinking they display.

And this is the promise of AI, to eliminate jobs. If CEOs invest heavily in this, they won't back down because no one wants to be wrong.

I understand some people try to claim AI might create a net increase in jobs (someday), but I just don't think that is what CEOs are going for.

mostertoaster

I think with LLMs we will actually see demand skyrocket for software developers who can understand code and know how to use the tools. There will ultimately be way more money in total going towards software developers, but average pay will be well above the median pay.

ukoki

> I’ve heard multiple times from execs in my company that “software” will have less value and that, in a few years, there won’t be as many developer jobs.

If LLMs make average devs 10x more productive, Jevons' paradox[1] suggests we'll just make 10x more software rather than have 10x fewer devs. You can now implement that feature only one customer cares about, or test 10x more prototypes before building your product. And if you instead decide to decimate your engineering team, watch out, because your competitors might not.

https://en.wikipedia.org/wiki/Jevons_paradox

chefandy

Just another way for the people on top to siphon money from everyone else. No individual contributor is going to be rewarded for any productivity increase beyond what is absolutely required to get them to fulfill the company's goals, and the goalposts will be moving so fast that keeping up will be a full-time job. As we see from the current job market, the supply of problems the commercial software market needs more coders to solve maybe isn't quite as bountiful as we thought, and maybe we won't need to perpetually ramp up the number of developers humanity has... maybe we even have too many already?

If a company's top developer can do the work of the 10 developers below them, their boss is going to happily fire the extra developers, not think of all the incredible other things they could use those developers for. A lot of developers assume that the one uber-productive developer left standing will be more valuable to the company than before, but now that developer is competing with 10 people who also know the code base and are willing to work for a lot cheaper. We get paid based on what the market will bear, not the amount of value we deliver, so that newfound profit goes to the top, and the rest goes into reducing the price of the product to stay competitive with every other company doing the exact same thing.

Maybe I'm being overly cynical, but assuming this isn't a race to the bottom and that people will get rich as super-productive, AI-enhanced code monsters looks, to me, like a conceited white-collar version of the hustle-porn guys who think that if they simultaneously work the right combo of gig apps at the right time of day in the right spots, they can work their way up to being wealthy entrepreneurs. Good luck.

nyarlathotep_

Yeah it seems pretty obvious where this is all going and yet a sizable proportion of the programming population cheers on every recent advancement that makes their skills more and more of a commodity.

hansvm

"Yes, of course, I'm using AI at every single opportunity where I think it'll improve my output"

<<never uses AI>>

giobox

This simply doesn't work much of the time as an excuse - virtually all corporate AI tool subscriptions provide per-user stats on how much each staff member is using the AI assist. This shouldn't come as a surprise: software tool purveyors need to demonstrate ROI to their customers' management teams, and as always this shows up in reporting tools.

I've already seen several rounds of Slack messages - "why aren't you using <insert LLM coding assistant name>?" - off the back of this reporting.

These assistants essentially spy on you working in many cases, if the subscription is coming from your employer and not a personal account. For one service, I was able to see full logging of every chat every employee ever had.

rectang

This has been true for every heavily marketed development aid (beneficial or not) for as long as the industry has existed. Managing the politics and the expectations of non-technical management is part of career development.

tokioyoyo

Yeah, I totally agree, and you're 100% right. But the number of integrations I've personally done and have instructed my team to do implies this one will be around for a while. At some point, spending too much time on code that could be easily generated will count against your performance.

I've heard exactly the same stories from my friends at larger tech companies as well. Every all-hands there's a push for more AI integration, getting staff to use AI tools, etc., with the big expectation that development will get faster.

aprilthird2021

> executives, directors and managerial staff think the opposite

The entire reason they hire us is to let them know if what they think makes sense. No one is ideologically opposed to AI generated code. It comes with lots of negatives and caveats that make relying on it costly in ways we can easily show to any executives, directors, etc. who care about the technical feasibility of their feelings.

tokioyoyo

> No one is ideologically opposed to AI generated code

Unfortunately, that hasn't been my experience. But I agree with your comment generally.

v3xro

As a former "staff engineer": these executives can go and have their careers, and leave the people who want code they can understand and reason about, and who want to focus on quality software, well alone.

hinkley

When IntelliJ was young, the autocomplete and automated refactoring were massive game changers. It felt like the dawn of a new age. But then, release after release, no new refactorings materialized. I don't know if they hit the Pareto limit or the people responsible moved on to new projects.

I think that's the sort of spot where better tools might be appropriate. I know what I want to do, but it's a mess to do it. I suspect tools like that will be better at facilitating growth instead of stunting it.

jiggawatts

Hmm… I wonder if there will be a category of LLM-assisted refactoring tools that combine mechanistic transformations with the more flexible capabilities of generative AI. E.g.: update the English text in comments automatically to reflect code structure changes.

esafak

What refactoring do you want IntelliJ to do that it cannot?


satellite2

I just don't fully understand this position at this level. Personally, I know exactly what the next 5 lines need to be, and whether I write them, autocomplete writes them, or some AI writes them doesn't matter. I'll only accept what I had in mind exactly. And with Copilot, for boilerplate and relatively trivial tasks, that happens pretty often. I feel I'm just saving time / old-age joint pain.

purerandomness

If the next 5 lines of code are so predictable, do they really need to be written down?

If you're truly saving time by having an LLM write boilerplate code, is there maybe an opportunity to abstract things away so that higher-level concepts, or more expressive code, could be used instead?

jaredklewis

Sure, but abstractions have a cost.

5 lines of code written with just the core language and standard library are often much easier to read and digest than a new abstraction or call to some library.

And it’s just an unfortunate fact of life that many of the common programming languages are not terribly ergonomic; it’s not uncommon for even basic operations to require a few lines of boilerplate. That isn’t always bad as languages are balancing many different goals (expressiveness, performance, simplicity and so on).
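
A small illustration of that trade-off, sketched in Go (the example is mine, not the commenter's): the "few plain lines" version of counting word frequencies reads straight through, whereas a generic helper or library call would be one more thing for the reader to go look up.

    package main

    import (
        "fmt"
        "strings"
    )

    func main() {
        words := strings.Fields("the quick brown fox jumps over the lazy dog the end")

        // The "few plain lines" version: standard library only, nothing to look up.
        counts := map[string]int{}
        for _, w := range words {
            counts[w]++
        }

        fmt.Println(counts["the"]) // prints 3
        // The alternative would be a dependency or a bespoke generic helper
        // (say, a hypothetical CountBy) that every reader has to go find and read.
    }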

weitendorf

I have lately been writing a decent amount of Svelte. Svelte and frontend in general is relatively new to me, but since I’ve been programming for a while now I can usually articulate what I want to do in English. LLMs are totally a game changer for me in this scenario - they basically take me from someone who has to look everything up all the time to someone who only does so a couple times a day.

In a way LLMs are ushering in a kind of boilerplate renaissance IMO. When you can have an LLM refactor a massive amount of boilerplate in one fell swoop it starts to not matter much if you repeat yourself - actually, really logically dense code would probably be harder for LLMs to understand and modify (not dissimilar from us…) so it’s even more of a liability now than in the past. I would almost always rather have simple, easy-to-understand code than something elegant and compact and “expressive” - and our tools increasingly favor this too.

Also I really don’t give a shit about how to best center a div nor do I want to memorize a million different markup tags and their 25 years of baggage. I don’t find that kind of knowledge gratifying because it’s more trivia than anything insightful. I’m glad that with LLMs I can minimize the time I spend thinking about those things.

everforward

Some languages don't give that opportunity. E.g. the "if err != nil" blocks in Go are obvious boilerplate, but they're mandated by the language.

Other things are complicated to abstract for the boilerplate they avoid. The kind of thing that avoids 100 lines of code but causes errors that take 20 minutes to understand because of heavy use of reflection/inferred types in generics/etc. The older I get, the more I think "clever" reflection is more of a sin than boring boilerplate.
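
For readers who don't write Go, this is the mandated pattern being described; a minimal sketch (the readConfig name and the wrapping message are illustrative):

    package config

    import (
        "fmt"
        "os"
    )

    // readConfig shows the boilerplate the language requires: every fallible
    // call is immediately followed by an explicit "if err != nil" check.
    func readConfig(path string) ([]byte, error) {
        data, err := os.ReadFile(path)
        if err != nil {
            return nil, fmt.Errorf("read config %q: %w", path, err)
        }
        return data, nil
    }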

axlee

What's your stack? I have the complete opposite experience. LLMs are amazing at writing idiomatic code, less so at dealing with esoteric use cases.

And very often, if the LLM produces a poopoo, asking it to fix it again works just well enough.

Bjartr

> asking it to fix it again works just well enough.

I've yet to encounter any LLM, from ChatGPT to Cursor, that doesn't choke within 10-20 minutes: it starts to repeat itself, says it changed code when it didn't, or gets stuck changing something back and forth repeatedly. Like, just a handful of exchanges and it's worthless. Are people who make this workflow effective summarizing and creating a fresh prompt every 5 minutes or something?

simonw

One of the most important skills to develop when using LLMs is learning how to manage your context. If an LLM starts misbehaving or making repeated mistakes, start a fresh conversation and paste in just the working pieces that are needed to continue.

I estimate a sizable portion of my successful LLM coding sessions included at least a few resets of this nature.

NicuCalcea

Hm, I use LLMs almost daily, and I've never had one say it changed code and not do it. If anything, they will sometimes try to "improve" parts of the code I didn't ask them to modify. Most times I don't mind, and if I do, it's usually a quick edit to say "leave that bit alone" and resubmit.

> Are people who make this workflow effective summarizing and creating a fresh prompt every 5 minutes or something?

I work on one small problem at a time, only following up if I need an update or change on the same block of code (or something very closely related). Most conversations are fewer than five prompt/response pairs, usually one to three. If the LLM gets something wrong, I edit my prompt to explain what I want better, or to tell it not to take a specific approach, rather than correcting it in a reply. It gets a little messy otherwise, and the AI starts to trip up on its own past mistakes.

If I move on to a different (sub)task, I start a new conversation. I have a brief overview of my project in the README or some other file and include that in the prompt for more context, along with a tree view of the repository and the file I want edited.

I am not a software engineer and I often need things explained, which I tell the LLM in a custom system prompt. I also include a few additional instructions that suit my workflow, like asking it to tell me if it needs another file or documentation, if it doesn't know something, etc.

Aeolun

Creating a new prompt. Sometimes it can go for a while without one, but the first response (with crafted context) is generally the best. Having context from the earlier conversation has its uses, though.

knicholes

The LLM you choose to work with in Cursor makes a big difference, too. I'm a fan of Claude 3.5 Sonnet.

slashdev

In my experience you have to tell it what to fix. Sometimes how as well.

beepbooptheory

Simply, it made my last job so nightmarish that for the first time in this career I absolutely dreaded even thinking about the codebase or having to work the next day. We can argue about the principle of it all day, or you can say things like "you are just doing it wrong," but ultimately there is just the boots-on-the-ground experience of it that is going to leave the biggest impression on me, at least. Like it's just so bad to have to work alongside, either the model itself or your coworker with the best of intentions but no domain knowledge.

It's like having to forever be the most miserable detective in the world; no mystery, only clues. A method that never existed, three different types that express the same thing, the cheeky smile of your coworker who says he can turn the whole backend into using an ORM in a day because he has Cursor, the manager who signs off on this, the deranged PR the next day. This continual sense that fewer and fewer people even know what's going on anymore...

"Can you make sure we support both Mongo and postgres?"

"Can you put this React component inside this Angular app?"

"Can you setup the kubernetes with docker compose?"

esafak

Hiring standards are important, as are managers who get it. Your organization seems to be lacking in both.

the_mitsuhiko

> but using code generated from an LLM is pure madness unless what you are building is truly going to be thrown away and rewritten from scratch, as is relying on it as a linting, debugging, or source of truth tool.

That does not match my experience at all. You obviously have to use your brain to review it, but for a lot of problems LLMs produce close to perfect code in record time. It depends a lot on your prompting skills though.

xmprt

Perhaps I suck at prompting but what I've noticed is that if an LLM has hallucinated something or learned a fake fact, it will use that fact no matter what you say to try to steer it away. The only way to get out of the loop is to know the answer yourself but in that case you wouldn't need an LLM.

liamwire

I’ve found a good way to get unstuck here is to use another model, either of comparable or superior quality, or interestingly sometimes even a weaker version of the same product (e.g. Claude Haiku vs. Sonnet*). My mental model here is similar to pair programming, or simply bringing in a colleague when you’re stuck.

*I don’t know to what extent it’s worthwhile discussing whether you could call these the same model vs. entirely different, for any two products in the same family. Outside of simply quantising the same model and nothing else. Maybe you could include distillations of a base model too?

codr7

I would say prompting skills relative to coding skills; and the more you rely on them, the less you learn.

the_mitsuhiko

That is not my experience. I wrote recently [1] about how I use it and it’s more like an intern, pair programmer or rubber duck. None of which make you worse.

[1]: https://lucumr.pocoo.org/2025/1/30/how-i-ai/

brandall10

It's helpful to view working solutions and quality code as separate things to the LLM.

* If you ask it to solve a problem and nothing more, chances are the code isn't the best as it will default to the most common solutions in the training data.

* If you ask it to refactor some code idiomatically, it will apply most common idiomatic concepts found in the training data.

* If you ask it to do both at the same time you're more likely to get higher quality but incorrect code.

It's better to get a working solution first, then ask it to improve that solution, rinse/repeat in smallish chunks of 50-100 loc at a time. This is kinda why reasoning models are of some benefit, as they allow a certain amount of reflection to tie together disparate portions of the training data into more cohesive, higher quality responses.

jondwillis

It isn't like you can't write tests or reason about the code, iterate on it manually, just because it is generated. You can also give examples of idioms or patterns you would like to follow. It isn't perfect, and I agree that writing code is the best way to build a mental model, but writing code doesn't guarantee intuition either. I have written spaghetti that I could not hope to explain many times, especially when exploring or working in a domain that I am unfamiliar with.

ajmurmann

I described how I liked doing ping-pong pairing TDD with Cursor elsewhere. One of the benefits of that approach is that I write at least half the implementation and tests and review every single line. That means that there is always code that follows the patterns I want and it's right there for the LLM to see and base its work on.

Edit: fix typo in last sentence

scudsworth

i love when the llm can be its work of

doug_durham

I've had exactly the opposite experience with generating idiomatic code. I find that the models have a lot of information on the standard idioms of a particular language. If I'm having to write in a language I'm new in, I find it very useful to have the LLM do an idiomatic rewrite. I learn a lot and it helps me to get up to speed more quickly.

null

[deleted]

qqtt

I wonder if there is a big disconnect partially due to the fact that people are talking about different models. The top-tier coding models (Sonnet, o1, DeepSeek) are all pretty good, but they require paid subscriptions, or 400GB of local memory to run DeepSeek.

All the other distilled models and qwen coder and similar are a large step below the above models in terms of most benchmarks. If someone is running a small 20GB model locally, they will not have the same experience as those who run the top of the line models.

Aeolun

The top-of-the-line models are really cheap though. Getting an Anthropic key and $5 of credit costs you exactly that, and gives you hundreds of prompts.

simonw

On last resort bug fixes:

> I don’t do this a lot, but sometimes when I’m really stuck on a bug, I’ll attach the entire file or files to Copilot chat, paste the error message, and just ask “can you help?”

The "reasoning" models are MUCH better than this. I've had genuinely fantastic results with this kind of thing against o1 and Gemini Thinking and the new o3-mini - I paste in the whole codebase (usually via my https://github.com/simonw/files-to-prompt tool) and describe the bug or just paste in the error message and the model frequently finds the source, sometimes following the path through several modules to get there.

Here's a slightly older example: https://gist.github.com/simonw/03776d9f80534aa8e5348580dc6a8... - finding a bug in some Django middleware

elliotto

I have picked up the Cursor tool, which allows me to throw in relevant files with a drop-down menu. Previously I was copy-pasting files into the ChatGPT browser page, but now I point Cursor at o1 and do it within the IDE.

One of my favourite things is to ask it if it thinks there are any bugs - this helps a lot with validating any logic that I might be exploring. I recently ported some code to a different environment with slightly different interfaces and it wasn't working - I asked o1 to carefully go over each implementation and explain in detail why it might be producing a different output. It thought for 2 whole minutes and gave me a report of possible causes - the third of which was entirely correct and had to do with how my environment was coercing pandas data types.

There have been 10 or so wow moments over the past few years where I've been shocked by the capabilities of genai and that one made the list.

powersnail

The "attach the entire file" part is very critical.

I've had the experience of seeing some junior dev posting error messages into ChatGPT, applying the suggestions of ChatGPT, and posting the next error message into ChatGPT again. They ended up applying fixes for 3 different kinds of bugs that didn't exist in the code base.

---

Another cause, I think, is that they didn't try to understand any of those (not the solutions, and not the problems that those solutions are supposed to fix). If they did, they would have figured out that the solutions were mismatches to what they were witnessing.

There's a big difference between using LLM as a tool, and treating it like an oracle.

theshrike79

This is why in-IDE LLMs like Copilot are really good.

I just had a case where I was adding stuff to two projects, both open at the same time.

I added new fields to the backend project, then I swapped to the front-end side and the LLM autocomplete gave me 100% exactly what I wanted to add there.

And similar super-accurate autocompletes happen every day for me.

I really don't understand people who complain about "AI slop". What kind of projects are they writing?

jppope

My experience is similar: great for boilerplate, great for autocomplete, starts to fall apart on complex tasks, and doesn't do much as far as business logic goes (how would it know?). All in all very useful, but not replacing a decent practitioner any time soon.

LLMs can absolutely bust out some corporate docs crazy fast too... though it's probably reasonable to re-evaluate the value of those docs.

nicksergeant

I've had kind of great experiences even doing complex tasks with lots of steps, as long as I tell it to take things slowly and verify each step.

I had a working and complete version of Apple MapKit JS rendering a map for an address (along with the server side token generation), and last night I told it I wanted to switch to Google Maps for "reasons".

It nailed it on the first try, and even gave me quick steps for creating the API keys in Google Dev Console (which is always _super_ fun to navigate).

As Simon has said elsewhere in these comments, it's all about the context you give it (a working example in a slightly different paradigm really couldn't be any better).

jppope

Totally agree; what you are saying is aligned with that. The LLM needs you in the driver's seat, it can't do it without you.

theshrike79

Exactly. For unit/integration tests I've found it to be a pretty good assistant.

I have a project with a bunch of tests already; I pick a test file, write `public Task Test`, and wait a few seconds. In most cases it writes a pretty sane basis for a test - and in a few cases it has figured out an edge case I missed.

delduca

> Disclaimer: I work for GitHub, and for a year I worked directly on Copilot.

Ah, now it makes sense.

brianstrimp

Yeah, the submission heading should indicate that there is a high risk of a sales pitch in there.

foobazgt

I wonder if the first bullet point, "smart auto complete", is much less beneficial if you're already using a statically typed language with a good IDE. I already feel like Intellij's auto complete reads my mind most of the time.

Klathmon

LLM autocomplete is an entirely different beast.

Traditional autocomplete can finish the statement you started typing; LLMs often suggest whole lines before I even type anything, and sometimes even whole functions.

And static types can assist the LLM too. It's not like it's an either/or choice.

foobazgt

The author says they do the literal opposite:

"Almost all the completions I accept are complete boilerplate (filling out function arguments or types, for instance). It’s rare that I let Copilot produce business logic for me"

My experience is similar, except I get my IDE to complete these for me instead of an LLM.

neeleshs

I use LLMs to generate complete solutions to small technical problems. "Write an input stream implementation that skips lines based on a regex."

Hard for IDE autocomplete to do this.
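
As a rough idea of what such a prompt might come back with, here is a minimal Go sketch (names like skipLines are illustrative, not the commenter's actual result): a stream is scanned line by line and lines matching the regex are dropped.

    package main

    import (
        "bufio"
        "fmt"
        "io"
        "regexp"
        "strings"
    )

    // skipLines copies every line from r to w except those matching re.
    func skipLines(w io.Writer, r io.Reader, re *regexp.Regexp) error {
        scanner := bufio.NewScanner(r)
        for scanner.Scan() {
            line := scanner.Text()
            if re.MatchString(line) {
                continue // drop lines matching the pattern
            }
            if _, err := fmt.Fprintln(w, line); err != nil {
                return err
            }
        }
        return scanner.Err()
    }

    func main() {
        input := "keep me\n# drop me\nkeep me too\n"
        var out strings.Builder
        if err := skipLines(&out, strings.NewReader(input), regexp.MustCompile(`^#`)); err != nil {
            panic(err)
        }
        fmt.Print(out.String()) // prints "keep me" and "keep me too"
    }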

baq

> Copilot

…needn’t say more.

Copilot was utter garbage; when I switched to Cursor + Claude, it was like some alien tech upgrade at first.

unregistereddev

Does IntelliJ try to complete the word you are typing, or does it suggest an entire line of code? Because newer versions of IntelliJ incorporate LLMs to beef up autocomplete. You may already be using it.

mrguyorama

I know I'm not using it because Intellij is constantly complaining that my version does not support the AI plugins.

The "dumb" autogenerated stuff is incredible. It's like going from bad autocomplete to Intellisense all over again.

The world of python tooling (at least as used by my former coworkers) put my expectations in the toilet.

foobazgt

The new LLM-based completion in Intellij is not useful. :(

hansvm

It's completely different. If I start writing an itertools library with comptime inlining and my favorite selection of other features, completing map/reduce/take/skip/... exactly how I want them to look, LLM autocomplete can finish the rest of the library exactly as I would have written it, even for languages it doesn't otherwise know well, outside of the interesting bits (in the context of itertools, that'd be utilities with memory tradeoffs, like tee and groupby).
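
To make the pattern concrete, here is a minimal Go sketch of the kind of uniform helpers meant; Map/Take/Skip and the example values are mine, and the commenter's comptime-inlining point doesn't apply to Go. Once a couple of these exist in a consistent style, pattern-following autocomplete tends to continue in that style.

    package main

    import "fmt"

    // Map applies f to every element of in.
    func Map[T, U any](in []T, f func(T) U) []U {
        out := make([]U, 0, len(in))
        for _, v := range in {
            out = append(out, f(v))
        }
        return out
    }

    // Take returns at most n elements from the front of in.
    func Take[T any](in []T, n int) []T {
        if n > len(in) {
            n = len(in)
        }
        return in[:n]
    }

    // Skip drops the first n elements of in.
    func Skip[T any](in []T, n int) []T {
        if n > len(in) {
            n = len(in)
        }
        return in[n:]
    }

    func main() {
        nums := []int{1, 2, 3, 4, 5}
        // Take 4, skip 1, double the rest: [2 3 4] -> [4 6 8]
        fmt.Println(Map(Skip(Take(nums, 4), 1), func(x int) int { return x * 2 }))
    }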

VenturingVole

My first thought upon reading this was about the observation that software engineers are deeply split on this: how can they be so negative? A mixture of emotions.

Then I reflected on how very true it was. In fact, as of writing this there are 138 comments, and I started simply scrolling through what was shown to assess the negative/neutral/positive bias based upon a highly subjective personal assessment: two-thirds were negative, so I decided to stop.

As a profession, it seems many of us have become accustomed to dealing in absolutes when reality is subjective, judging LLMs prematurely with a level of perfectionism not even cast upon fellow humans... or at least, if it were cast upon humans, I'd be glad not to be their colleague.

Honestly, right now I would use this as a litmus test in hiring, and the majority would fail based upon their closed-mindedness and inability to understand how to effectively utilise the tools at their disposal. It won't exist as a signal for much longer, sadly!

notTooFarGone

It boils down to responsibility.

We need to trust machines more than humans, because machines can't take responsibility. That code you pushed that broke prod - you can't point at the machine.

It is also about predictability and growth, in a sense. I can assess certain people, know what they will probably get wrong, and develop the person accordingly. If that person uses LLMs, it disguises that exposure of skill and leaves a very hard signal to read as a senior dev, hampering their growth.

VenturingVole

I absolutely agree with your points - assuming that by "machines" you mean the code as opposed to the LLMs. As a "Staff+" IC type mentoring and training a couple of apprentice-level developers, I've already asked on several occasions "why did you do this?" and had a response of "oh, that's what the AI did." I'm very patient, but have made clear that's something to never utter - at least not until one has the experience to deeply understand boundaries/constraints effectively.

I did see a paper recently on the impact of AI/LLMs and danger to critical thinking skills - it's a very real issue and I'm having to actively counter this seemingly natural tendency many have.

With respect to signals, mine was around the attitude in general. I'd much rather work with someone who goes "Yes, but.." than one who is outright dismissive.

Increasing awareness of the importance of context will be a topic for a long time to come!

mvdtnz

> What about hallucinations? Honestly, since GPT-3.5, I haven’t noticed ChatGPT or Claude doing a lot of hallucinating.

See this is what I don't get about the AI Evangelists. Every time I use the technology I am astounded at the amount of incorrect information and straight up fantasy it invents. When someone tells me that they just don't see it, I have to wonder what is motivating them to lie. There is simply no way you're using the same technology as me with such wildly different results.

simonw

> There is simply no way you're using the same technology as me with such wildly different results.

Prompting styles are incredibly different between different people. It's very possible that they are using the same technology that you are with wildly different results.

I think learning to use LLMs to their maximum effectiveness takes months (maybe even years) of effort. How much time have you spent with them so far?

mrguyorama

> I have to wonder what is motivating them to lie.

Most of these people who aren't salesmen aren't lying.

They just cannot tell when the LLM is making up code. Which is very very sad.

That, or they could literally be replaced by a script that copy/pastes from Stack Overflow. My friend did that a lot, and it definitely helped features ship, but it doesn't make for maintainable code.

the_mitsuhiko

> When someone tells me that they just don't see it, I have to wonder what is motivating them to lie. There is simply no way you're using the same technology as me with such wildly different results.

I don’t know what technology you are using but I know that I am getting very different results based on my own prompt qualities.

I also do not really consider hallucinations to be much of an issue for programming. It comes up so rarely and it’s caught by the type checker almost immediately. If there are hallucinations it’s often very minor things like imagining a flag that doesn’t exist.

mvdtnz

A lot of you guys, including the author, will respond with these "you're holding it wrong" comments. But you never give examples of actual prompts that are somehow significantly different to what I use. The author gives a very small handful of his example prompts and I don't see anything in them that's fundamentally different to what I've tried. If anything his prompts are especially lazy compared to what I use and what I have read as best practice among "prompt engineers":

"is this idiomatic C?"

"not just “how does X work”, but follow-up questions like “how does X relate to Y”. Even more usefully, you can ask “is this right” questions"

"I’ll attach the entire file or files to Copilot chat, paste the error message, and just ask “can you help?”"

the_mitsuhiko

> A lot of you guys, including the author, will respond with these "you're holding it wrong" comments. But you never give examples of actual prompts that are somehow significantly different to what I use.

It’s hard to impossible to discuss these things without a concrete problem at hand. Most of the prompt is the context provided. I can only speak from my experience, which is that how you write the prompt matters.

If you share what you are trying to accomplish I could probably provide some more appropriate insights.

simonw

I've shared a few hundred examples of how I'm using this stuff, with full prompt and response transcripts. You can find them linked from items on these tags on my blog - they're usually gists.

- ai-assisted-programming: https://simonwillison.net/tags/ai-assisted-programming/

- prompt-engineering: https://simonwillison.net/tags/prompt-engineering/

And my series on how I use LLMs: https://simonwillison.net/series/using-llms/

cruffle_duffle

A lot of times they probably are holding it wrong. These things aren’t mind readers. You have to provide proper context and clear asks. They are just another computer interface to learn to use.

And they aren’t perfect, but they sure can save a lot of time once you know how to use them and understand what they are and aren’t good at.

stuartd

> is this idiomatic C?

This is how I use AI at work for maintaining Python projects, a language in which I am not at all really versed. Sometimes I might add “this is how I would do it in …, how would I do this in Python?”

I find this extremely helpful and productive, especially as I have to pull the code onto a server to test it.

synthc

This year I switched to a new job, using programming languages that I was less familiar with.

Asking an LLM to translate between languages works really well most of the time. It's also a great way to learn which libraries are the standard solution for a language. It really accelerated my learning process.

Sure, there is the occasional too literal translation or hallucination, but I found this useful enough.

brianstrimp

Have you noticed any difference in picking up the language(s) yourself? As in, do you think you'd be more fluent in it by now without all the help? Or perhaps less? Genuine question.

mewpmewp2

I do tons of TypeScript in my side projects and in real life, and I usually feel heavy frustrations when I stray away.

When I stray out of this (e.g. I started doing a lot of IoT, ML and robotics projects, where I can't always use TypeScript), one key thing LLMs have helped me with is that I can ask why something is the way it is without having to worry about sounding stupid or annoying.

So I think it has enabled me to get out of the TypeScript zone more worry-free, without losing productivity. And I do think I learn a lot, although I'm relating a lot of it to my JS/TS-heavy experience.

To me the ability to ask stupid questions without fear of judgment or accidentally offending someone - it's just amazing.

I used to overthink a lot before LLMs, but they have helped me with that aspect, I think a lot.

I sometimes think that no one except LLMs would have the patience for me if I didn't filter my thoughts always.

synthc

For me it just speeds up learning the language, so I think I'd become fluent faster.

I thoroughly review the LLM's answers and hardly ever directly copy-paste an answer, so I feel I still learn the language this way.

n144q

Agree with many of the points here, especially the part about one-off, non-production code. I've had great experiences letting ChatGPT write utility code. Once it produced Go code for an ad-hoc task that ran exactly as expected on the first try, when it would have cost me at least 30 minutes, mostly spent looking up APIs I'm not familiar with. Another time it created an HTTP server that worked with only minor tweaks. I don't want to think about life before LLMs existed.

One thing that is not mentioned: code review. LLMs are not great at it, often pointing out trivial issues or non-issues. But if one finds 1 genuine area for improvement out of 10 bullet points, that's still worth it - most human code reviewers don't notice all the issues in the code anyway.
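
For a sense of scale, the kind of ad-hoc Go utility being described is roughly this size; a minimal sketch (the echo behaviour and the port are made up, not from the comment):

    package main

    import (
        "fmt"
        "log"
        "net/http"
    )

    func main() {
        // Echo the requested path back to the caller.
        http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
            fmt.Fprintf(w, "you asked for %s\n", r.URL.Path)
        })
        log.Println("listening on :8080")
        log.Fatal(http.ListenAndServe(":8080", nil))
    }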

pgm8705

I used to feel they just served as a great auto complete or stack overflow replacement until I switched from VSCode to Cursor. Cursor's agent mode with Sonnet is pretty remarkable in what it can generate just from prompts. It is such a better experience than any of the AI tools VSCode provides, imo. I think tools like this when paired with an experienced developer to guide it and oversee the output can result in major productivity boosts. I agree with the sentiment that it falls apart with complex tasks or understanding unique business logic, but do think it can take you far beyond boilerplate.

Prickle

The main issue I am having here is that I can see a measurable drop in my ability to write code because of LLM usage.

I need to avoid LLM use to ensure my coding ability stays up to par.

Aeolun

There’s no measurable drop in my ability to write code, but there’s a very significant one in my desire to.

fosterfriends

"Proofreading for typos and logic mistakes: I write a fair amount of English documents: ADRs, technical summaries, internal posts, and so on. I never allow the LLM to write these for me. Part of that is that I think I can write more clearly than current LLMs. Part of it is my general distaste for the ChatGPT house style. What I do occasionally do is feed a draft into the LLM and ask for feedback. LLMs are great at catching typos, and will sometimes raise an interesting point that becomes an edit to my draft."

--

I work on Graphite Reviewer (https://graphite.dev/features/reviewer). I'm also partly dyslexic. I lean massively on Grammarly (using it to write this comment) and type-safe compiled languages. When I engineered at Airbnb, I caused multiple site outages due to typos in my ruby code that I didn't see and wasn't able to execute before prod.

The ability for LLMs to proofread code is a godsend. We've tuned Graphite Reviewer to shut up about subjective stylistic comments and focus on real bugs, mistakes, and typos. Fascinatingly, it catches a minor mistake in ~1/5 PRs in prod at real companies (we've run it on a few million PRs now). Those issues it catches result in a pre-merge code change 75% of the time, about equal to what a human comment does.

AIs aren't perfect, but I'm thrilled that they work as fancy code spell-checkers :)

elwillbo

I'm in your boat with having to write a significant amount of English documents. I always write them myself, and have ChatGPT analyze them as well. I just had a thought - I wonder if I could paste in technical documentation, and code, to validate my documentation? Will have to try that later.

CoPilot is used for simple boilerplate code, and also for the autocomplete. It's often a starting point for unit tests (but a thorough review is needed - you can't just accept it, I've seen it misinterpret code). I started experimenting with RA.Aid (https://github.com/ai-christianson/RA.Aid) after seeing a post on it here today. The multi-step actions are very promising. I'm about to try files-to-prompt (https://github.com/simonw/files-to-prompt) mentioned elsewhere in the thread.

For now, LLMs are a level-up in tooling but not a replacement for developers (at least yet)