Peer Programming with LLMs, for Senior+ Engineers

95 comments

·May 24, 2025

mattnewton

The space is moving so fast that, if I wrote down my workflows and workarounds just two months ago, so much of it would be stale today. I think all these recommendations need to list the models and harnesses being described front and center.

motorest

> The space is moving so fast that, if I wrote down my workflows and workarounds just two months ago, so much of it would be stale today.

There is also the problem that none of these workflows were validated or verified. Everyone is free to go on social media or personal blogs and advertise their snake oil. Thus in a scenario where these workflows are found to be lacking, the perceived staleness might actually be ineffectiveness beyond self promotion.

aerhardt

I'm seeing a lot of this too. I can tell with my own eyes that the technology is extremely useful, but also that it has limits. On the internet however you'll see a decent amount of randos claiming that they're one-shotting hyperscale complex systems with the latest trendiest tool. My approach is to keep using LLMs where they reasonably make sense and experimenting here and there with new workflows and tools, but I'm past changing the way I work every two weeks.

brumar

Very important comment. My workflow changed dramatically with the increased capabilities of these tools.

Aeolun

Yeah. I have Claude 4 correcting code that Claude 3.5 wrote a few months ago.

motorest

> Yeah. I have Claude 4 correcting code that Claude 3.5 wrote a few months ago.

You don't even need to switch models. Write a prompt to generate some code and immediately after prompt the same model to review the code it just generated. Sometimes it takes 3 or 4 prompts to get the result to converge. But converge to where?

pmbanugo

I haven’t tried Claude 4. Maybe I’d give it a spin to see if it can improve my design document

bdangubic

hehehe - same :)

hoppp

I use the llm as a glorified search engine. Instead of googling I ask it stuff. Its fine for that but its a hit or miss. Often the output is garbage and its better to just use google.

I dont use it much to generate code, I ask it higher level questions more often. Like when I need a math formula.

sam_bristow

My most common use is similar: when I'm working on problems in a somewhat unfamiliar domain finding out what their "terms of art" are. The chances that I've just come up with a completely new concept are pretty low so it's just a matter of helping me formulate my questions in the language of the existing body of knowledge.

NitpickLawyer

You should go a step further and integrate search (tavili, searxng, etc) into your flow. You'll get better results, and you can refine sources and gradually build a scored list of trusted sources.

notepad0x90

is it really more efficient to have an LLM generate code, then review that code, fix errors and spend some time to fully understand it? I wish there were tangible stats and metrics around this. Is it really more efficient than just writing the code yourself, but using LLMs to look up things or demo solutions?

Aurornis

Lately I look for opportunities to have the LLM do some easy small work while I go work on some harder task in parallel.

I've also tried asking the LLM to come up with a proposed solution while I work on my own implementation at the same time.

LLMs can also be much faster if a task requires some repetitive work. When I recognize a task like that, I try coding the first version and then ask the LLM to follow my pattern for the other areas where I need to repeat the work.

mwcampbell

How did we end up here, just accepting that some coding tasks require repetitive work, and turning to a probabilistic text synthesizer that requires massive training data to automate that? We're having this discussion on a site whose founder famously wrote, over 20 years ago, that succinctness is power, and even wrote this site in a programming language that he designed to take that principle as far as he could. Now, two decades later, why have we so completely retreated from that dream?

I bear some responsibility for this, since I was one of the people who basically said, in the 2010s, that we should just give up and use popular languages like JavaScript because they're popular. I regret that now.

sanderjd

We haven't retreated from that dream. We're doing both things in parallel. But I think it will always be the case that some things are repetitive, even as we continuously expand the frontier of eliminating those things. It's good to have tools that help automate repetitive tasks, and it's also good to create more powerful abstractions. There is no contradiction.

dwaltrip

The ethos that compelled PG to write HN in lisp is what matters, not the actual usage of lisp itself. That ethos lives on, in various forms.

I also offer you an old saying we have all heard many times:

There are 2 kinds of languages: ones that everyone complains about and ones that nobody uses.

Aeolun

Some things are better with more code. Sets of data in particular (strongly typed). Sometimes you need to modify those sets of data and it doesn’t require enough work to write a whole script for, but you still don’t really want to spend the time manually modifying everything. LLM’s are really nice in those instances.

jampekka

> I some responsibility for this, since I was one of the people who basically said, in the 2010s, that we should just give up and use popular languages like JavaScript because they're popular. I regret that now.

In 2010s the move was towards more concise languages and programming techniques in e.g. JavaScript scene too. CoffeeScript is a prime example of this.

But then came the enterprise software people pushing their Javaisms and now we have verbose bondage and ceremony mess like TypeScript and ES6 modules.

And in a tragicomic turn after we made expressing programmer intent formally so difficult, we are turning into writing bad novels for LLMs in a crapshoot trial and error hoping it writes the pointless boilerplate correctly.

wredcoll

I cannot wait to see your next webapp written in J.

zackify

Agree with each of these points so much!

That’s why I really like copilot agent and codex right now.

Even more parallel stuff and from my phone when I’m just thinking of ideas.

max_on_hn

(Disclaimer: I built and sell a product around that workflow)

It often is, if you pick the right tasks (and more tasks fall into that bucket every few weeks).

You can get a simple but fully-working app out of a single prompt, though quality varies widely unless you’re very specific.

Once you have a codebase, agent output quality comes down to architecture and tests.

If you have a scalable architecture with well-separated concerns, a solid integration test harness with examples, and good documentation (features, stack, procedures, design constraints), then getting the exact change you want is a matter of how well you can articulate what you want.

One more asterisk, the development environment has to support the agent: like a human, agents work well with compiler feedback, and better with testing tools and documentation/internet access (yes my agents have these).

I use CheepCode to work on itself, but I am still building up the test library and preview environments to de-risk merging non-trivial PRs that I haven’t pulled down and run locally. I also use it to work on other apps that I'm building, and since those are far more self-contained / easier to test, I get much better results there.

If you want to put less effort into describing what you want, have a chat with an AI to generate tickets. Then paste those tickets into Linear and let CheepCode agents rip through them. I’ve got tooling in the works that will make that much easier, but I can only be in so many places at once as a bootstrapped founder :-)

motorest

> is it really more efficient to have an LLM generate code, then review that code, fix errors and spend some time to fully understand it?

The answer is always "it depends". There are some drudge work tasks that are brilliantly done by LLMs, such as generating unit tests or documentation. Often the first attempt is not great, but iterating over them is so fast that you can regenerate everything from scratch a dozen times before you spend as much time as you would do if you wrote them yourself.

It also depends on what scope you're working on. Small iterations have better results than grand redesigns.

Context is also critical. if your codebase is neatly organized with squeaky clean code then LLMs generate better recommendations. If your codebase is a mess with inconsistent styles in spaghetti code then your prompts tend to generate more of the same.

viraptor

Depends on the code, but often yes. The less you care about that specific result, the more efficient it is. One-off tools under 2k lines where you can easily verify the result? Why would I not generate that and save time for more interesting stuff?

layer8

One-off tools seem to be more ops then dev, to me.

viraptor

I haven't heard the "scripting is not programming" or similar take since newsgroups. It's really time to let it die.

jiggawatts

I’ve used Gemini Pro 2.5 to generate code syntax rewrite tools from me. I asked it to use a compiler SDK that I know from experience is fiddly and frustrating to use. It gave me a working tool in about 10 minutes, while I was actively listening to a meeting.

Not all scripts are “ops”!

enos_feedler

What if every piece of software any consumer needed could be described this way? Outside of system code this could be everything we ever need. This world is nearly upon us and it is super exciting.

mattlondon

I frequently use it to assemble the boiler plate tedious stuff. E.g. add a new test file and wire up all the build rules etc, then implement tests x y z. Or the other day I had to create an input form in an Angular app and I just asked for the 7 or 8 fields I wanted and it did it and wired it all up for me (both template and Typescript). It can do those sorts of things in a few seconds and saves me maybe only 15-20 minutes but the mental relief of me not having to do it is great, even if the time saving is relatively low.

The other thing is sometimes just writing out method signatures with all the right types and conversions/casts etc between types/interfaces/classes etc when I can't be bothered to do all the look ups myself ("Create a method here called foo that accepts a Bar instance and converts it to Baz. Return type should be Quux - add a conversion from Baz to a new Quux instance before the final return - use the builder pattern and map the constant magic-strings in blah.ts to appropriate enum values in Quux." Etc etc and then I write the logic in the middle) Again not a huge time saving but it mentally lightens the load and keeps you concentrating on the problem rather than the minutia

teemur

If you can identify blocks of code you need to write that are easy to define reasonably well, easy to review/verify that it is written correctly but still burdensome to actually write, LLMs are your new best friend. I don't know about how other people think/write, but I seem to have a lot of that kind of stuff on my table. The difficult part to outsource to LLMs is how to connect these easy blocks, but luckily thats the part I find fun in coding, not so much writing the boring simple stuff.

IshKebab

In many cases yes. In many cases no. Overall, it can save you a lot of time even accounting for time wasted when you give it more than it can handle and you give up and do it all yourself.

tptacek

Yes, but the bar for skepticism is higher than that, because LLMs also compile code and catch errors, and generate and run tests; compile errors and assertion failures are just more prompts to an LLM agent.

lolinder

When used that way they also regularly get into loops where they lose track of what they were supposed to do.

The last time I set Cursor on something without watching it very very closely it spun for a while fixing tests and when it finally stopped and I looked what it had done it had coded special cases in to pass the specific failing tests in a way that didn't generalize at all to the actual problem. Another recent time I had to pull the plug on it installing a bunch of brand new dependencies that it decided would somehow fix the failing tests. It had some kind of complete rewrite planned.

Claude Code is even worse when it gets into this mode because it'll do something totally absurd like that and then at the end you have to `git reset` and you're also on the hook for the $5 of tokens that it managed to spend in 5 minutes.

I still find them useful, but it takes a lot of practice to figure out when they'll be useful and when they'll be a total waste of time.

tptacek

It happens to me every once in awhile, but I'm not sure why I would care. I usually set it off on some question and go tab away to something else while it flails. When I come back, I have a better-than-average shot at a workable solution, which is a science fiction result.

When I first began programming as a teenager, one of the mental hurdles I had to get over was asking the computer to "too much"; like, I would feel bad writing a nested loop --- that can't possibly be the right answer! What a chore for the computer! It didn't take me too long to figure out that was the whole point of computers. To me, it's the same thing with LLMs spinning on something. Who gives a shit? It's not me wasting that time.

d1sxeyes

To be fair, TDD has three steps: Red, Green, Refactor. Sounds like you got to Green. /s

what

Citation needed. Here is copilot “fixing” a failing test: https://github.com/dotnet/runtime/commit/fe173fc8f44dbd0a9ed...

It rewrote some comments, changed the test name and added extra assertions to the test. Baby sitting something like that seems like an absolute waste of time.

BeetleB

You want a citation for things so many people are doing daily with LLMs?

Just because they can't fix most failures doesn't mean they can't fix many.

motorest

> It rewrote some comments, changed the test name and added extra assertions to the test. Baby sitting something like that seems like an absolute waste of time.

I've worked with real flesh and blood developers who did the exactly same thing. At least with LLMs we don't have to jump into a 1h long call to discuss the changes.

pmbanugo

I've been experimenting with LLMs for coding for the past year - some wins, plenty of frustrations. Instead of writing another "AI will change everything" post, I collected practical insights from other senior engineers who've figured out what actually works. No hype, just real experiences from people in the trenches.

easygenes

I think none of these offer much useful insight beyond the overarching idea of peer programming beating just vibe coding.

The best structure I've found which leverages this idea is called BMAD, and treats the LLM as though it were a whole development team in an orchestrated way that you have full control over.

https://youtu.be/E_QJ8j74U_0 https://github.com/bmadcode/BMAD-METHOD

pmbanugo

Looks like an elevated vibe coding method for UI development. Does this work for non-web/UI development?

easygenes

You're always limited by the knowledge gaps of the underlying LLM. The method is otherwise the most coherent way to work to the strengths of the LLM through disciplined context and role management. Nothing about it is UI focused, and leans more on general agile team structures than anything else.

jbellis

I would have said that Harper Reed's workflow (brainstorm spec, then co-plan a plan, then execute using LLM codegen) is basically best practice today and I'm surprised that the author adds that "I’ve not been successful using this technique to build a complete feature or prototype."

Here's an example of using this pattern with Brokk to solve a real world bug: https://www.youtube.com/watch?v=t_7MqowT638

cheshire_cat

This is showing the workflow of your tool quite well, but would be way more convincing & impressive if you had actually fixed the bug and linked to the merged PR.

jbellis

that's fair! we thought that at 23m it was already pushing attention spans pretty hard :)

maximge

The article references https://www.seangoedecke.com/practical-ai-techniques, which states:

> AI is much better than strong engineers at writing very short programs: in particular, it can produce ten to thirty lines of straightforward mostly-working code faster than any engineer.

> How can you leverage this? There’s not much demand for this kind of program in the day-to-day of a normal software engineer. Usually code either has to be a modification to a large program, or occasionally a short production-data script (such as a data backfill) where accuracy matters a lot more than speed.

While this may be technically correct — there’s little demand for standalone small programs — it overlooks a crucial reality: the demand for small code segments within larger workflows is enormous.

Software development (in my experience) is built around composing small units — helpers, glue code, input validation, test cases, config wrappers, etc. These aren’t standalone programs, but they’re written constantly. And they’re exactly the kind of 10–30 line tasks where LLMs are most effective.

Engineers who break down large tasks into AI-assisted microtasks can move faster. It’s not about replacing developers — it’s about amplifying them.

diggan

> Peer Programming with LLMs, For Senior+ Engineers

> [...] a collection of blog posts written by other senior or staff+ engineers exploring the use of LLM in their work

It seems to be by senior engineers if anything, I don't see anything in the linked articles indicating they're for senior engineers, seems programmers of all seniority could find them useful, if they find LLMs useful.

OutOfHere

Yes, although those who are not senior engineers will not preemptively see the value in the documented approaches. One has to be a senior to preemptively appreciate the value in them.

SoftTalker

Though I haven't tried it, I would probably enjoy peer programming with an LLM more than I do with a real person (which I have tried and hated).

I could assign the LLM the simple drudgery that I don't really want to do, such as writing tests, without feeling bad about it.

I could tell the LLM "that's the stupidest fucking thing I've ever seen" whereas I would not say that to a real person.

gadflyinyoureye

It seems like we need to use forceful language with these things now. I've had copilot censor everything I asked it. Finally I had to to say, "listen you cracked up piece of shit, help me generate a uuid matcher. "

Aeolun

We’ve blocked your response because it matches public code.

Aeolun

I really want the LLM to do the opposite. To tell me that’s the stupidest fucking thing it’s ever seen. They’re surprisingly bad at that though.

pmbanugo

That’s what the recent Copilot feature on GitHub can do. You assign it tasks and it comes back with a PR. You could also assign it to review a PR.

tracerbulletx

My main feeling is that its great as long as I constrain it to working in a conceptual boundary that I can reason about, like a single system component where I am telling it the API. That way each piece that gets built up I have an understanding of it. If you try to let it go to wide it starts to make mistakes and I lose my mental model.

JSR_FDED

Well put. That’s my challenge too - losing the mental model of my entire codebase. Sometimes it feels like the time I saved using an LLM I then give right back when reassembling the mental model.

CompoundEyes

I write a lot of “defensive” C# code in my day job expecting that someone very inexperienced / offshore will be working with it in the future (and I will be reviewing it four months later when no longer on the project). I call it “corporate coding”. Lots of interfaces that must be adhered to, ioc, injection and annoyingly strong patterns. Anything that makes going off the rails a lot of work — the path of most resistance — glaring in code reviews. But…key logic concentrated in a few taller files so none of the drilling through abstraction so easy to comprehend for a newbie. I want to take some time with a defensive coding approach and LLMs. Particularly scoping it to a certain project or folder in a layered architecture. Why let it know of the front end, back end, database all at once? Of course it’ll get discombobulated.

I’ve also been experimenting with giving an LLM coins and a budget. “You have 10 coins to spend doing x, you earn coins if you m,n,o and lose coins if you j,k,l” this has reduced slop and increased succinctness. It will come back, recount what it’s done explaining the economy and spending. I’ve had it ask “All done boss I have 2 left how can i earn some more coins?” It’s fun to spy on the thinking model working through the choices “if I do this it’ll cost me this so maybe I should do this instead in 1 line of code and I’ll earn 3 coins!”

ColinEberhardt

Thanks for sharing pmabanugo, a couple of those posts are new to me too. If you’re taking submissions, I’ve been exploring how to make the most of these tools for the past few months, here’s my latest post:

https://blog.scottlogic.com/2025/05/08/new-tools-new-flow-th...