LLM codegen go brrr – Parallelization with Git worktrees and tmux

danielbln

What I don't like about this approach is that it mainly improves the chances of zero-shotting a feature, but I require a ping pong with the LLM to iterate on the code/approach. Not sure how to parallelize that; I'm not gonna keep the mental model of 4+ iterations of code in my head and iterate on all of them.

For visual UI iteration this seems amazing given the right tooling, as the author states.

I could see it maybe being useful for TDD. Let four agents run on a test file and implement until it passes. Restrict it to 50 iterations per agent; the first one that passes the tests terminates the other in-progress sessions. Rinse and repeat.
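
A rough sketch of what that race could look like, assuming a hypothetical `agent` CLI that takes a prompt and edits the current worktree (a stand-in for claude/codex/aider) and a pytest file as the finish line:

```bash
#!/usr/bin/env bash
# Sketch only: race four agents on the same failing test; first green worktree wins.
# `agent` is a placeholder for whatever coding CLI you actually use.
set -uo pipefail

TEST="tests/test_feature.py"

for i in 1 2 3 4; do
  git worktree add "../attempt-$i" -b "attempt-$i"
  (
    cd "../attempt-$i"
    for iter in $(seq 50); do                 # cap at 50 iterations per agent
      agent "Make $TEST pass (attempt $iter/50)."
      if python -m pytest "$TEST" -q; then
        echo "attempt-$i passed on iteration $iter"
        exit 0                                # success ends this attempt early
      fi
    done
    exit 1                                    # gave up after 50 iterations
  ) &
done

# Wait for the first attempt that exits successfully, then stop the rest.
until wait -n; do
  [ -z "$(jobs -p)" ] && break                # every attempt gave up
done
kill $(jobs -p) 2>/dev/null || true
```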

diggan

> but I require a ping pong with the LLM to iterate on the code/approach

I've never gotten good results from any LLM when doing anything more than one-shots. I basically have a copy-pastable prompt, and if the first answer is wrong, I update the prompt and begin from scratch. Usually I add in some "macro" magic too, to automatically run shell commands and whatnot (rough sketch below).

It seems like they lose "touch" with what's important so quickly, and manage to steer themselves further away if anything incorrect ends up anywhere in the context. Which, thinking about how they work, sort of makes sense.
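
For what it's worth, the "macro" part can be as simple as splicing live shell output into the prompt at send time, so every one-shot starts from fresh context. A minimal sketch, with `agent-cli` standing in for whatever tool actually receives the prompt:

```bash
#!/usr/bin/env bash
# Minimal "macro" prompt sketch: expand shell commands into the prompt right
# before sending, so each one-shot sees the current test output and diff.

FAILURES=$(python -m pytest -x -q 2>&1 | tail -n 40)
DIFF=$(git diff main -- src/)

agent-cli <<EOF   # placeholder for claude / codex / aider / ...
Fix the failing test below. Touch only files under src/.

Failing test output:
$FAILURES

Diff against main:
$DIFF
EOF
```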

foolswisdom

That doesn't take away from the OP's point (and the OP didn't specify what ping ponging looks like; it could be the same as what you're describing): you are still iterating based on the results and updating the prompt based on issues you see in them. It grates on a human to switch back and forth between those attempts.

scroogey

But if you're "starting from scratch", then what would be the problem? If none of the results match what you want, you iterate on your prompt and start from scratch. If one of them is suitable, you take it. If there's no iterating on the code with the agents, then this really wouldn't add much mental overhead? You just have to glance over more results.

landl0rd

I usually see that results are worse after ping-pong. If a one-shot doesn't do it, it's better to "re-roll". A context window full of crap poisons the model's ability to do better and stay on target.

babyshake

I guess one way it might be able to work is with a manager agent that delegates to IC agents to try different attempts. The manager reviews their work, understands the differences in what they're doing, and can communicate with you about it and then relay back to the ICs doing the work. So you are like a client with a point of contact at an engineering org that internally manages how the project gets completed.

skeptrune

> From the post: There is no easy way to send the same prompt to multiple agents at once. For instance, if all agents are stuck on the same misunderstanding of the requirements, I have to copy-paste the clarification into each session.

It's not just about zero-shotting. You should be able to ping pong back and forth with all of the parallel agents at the same time. Every prompt is a dice roll, so you may as well roll as many as possible.
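
If each agent lives in its own tmux pane, broadcasting the same clarification to all of them is already possible with stock tmux. A sketch, assuming a window named `agents` (the clarification text is just an example):

```bash
# Option A: mirror keystrokes to every pane while you type the clarification once.
tmux set-window-option -t agents synchronize-panes on
# ...type the clarification; it lands in all panes...
tmux set-window-option -t agents synchronize-panes off

# Option B: script the broadcast pane by pane.
for pane in $(tmux list-panes -t agents -F '#{pane_id}'); do
  tmux send-keys -t "$pane" "Clarification: the endpoint should return JSON, not HTML." Enter
done
```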

layoric

> Every prompt is a dice roll, so you may as well roll as many as possible.

Same vibe as the Datacenter Scale xkcd -> https://xkcd.com/1737/

Flemlo

I write docs often, and what works wonders with LLMs is good docs: a README, an architectural doc, etc.

It helps me plan things well and helps the LLM work a lot better.

mooreds

Bonus! Future you and other devs working in the system will benefit from docs as well.

vFunct

Yah, it's not really usable for iteration. I don't parallelize this way; I parallelize based on functions. Different agents for different functions.

Meanwhile, a huge problem in parallelization is maintaining memory-banks, like https://docs.cline.bot/prompting/cline-memory-bank

gct

No wonder software is so slow today when we're this profligate. "Let's run four AIs and ignore 3/4 of them!" Ugh.

bitpush

There are trade-offs at all layers of the stack.

Your CPU is decoding instructions speculatively (and sometimes even executing them).

Your app is caching results just in case.

The edge server has stuff stashed for you (and others...).

The list goes on and on...

lmeyerov

I like to think about maximizing throughput while minimizing attention: both matter, and the proposal here is expensive on my attention. Optimizing per-task latency matters less than enabling longer non-interactive runs.

For parallelism, I'm finding it more productive to have multiple distinct tasks that I multitask across and guide each to completion. Along the way I improve the repo docs and tools so the AI is more self-sufficient the next time, which means my energy goes more to enabling longer runs.

Ex: one worker improving all the docs. I can come back, give feedback, and redo all of them. If I'm going to mess with optimizing agent flows, it'd be to make the repo style guide clearer to the AI. In theory I could divide the docs into sections and manually run the sections in parallel, or ask for multiple parallel versions of the whole thing for comparison... but that's a lot of overhead. Instead, I can fork the repo and work on another non-docs issue in parallel. An individual task is slow, but I get more tasks done, and with less human effort.

I'd like tools to automate fork/join parallelism for divide-and-conquer plans, and that feels inevitable. For now, they do fairly linear CoT, and it's easier for me to run distinct tasks than to worry about coordinating them.

sureglymop

I love how the one non-broken toggle still wasn't great. Now you can save time while wasting your time ;)

skeptrune

It was better than starting from scratch though. Imo, getting a functional wireframe for $0.40 is a good deal.

peterkelly

If AI coding agents were actually any good, you could preface your prompt with "attempt the following task four times in parallel" and that would be it.

CraigJPerry

I'm going in a different direction on this. Worktrees don't solve the problem for me; this is stuck in 1 agent = 1 task mode. I want a swarm of agents on 1 task.

There are a couple of agent personas I go back to over and over (researcher, designer, critic, implementer, summariser). For most tasks I reuse 90%+ of the same prompt, but the implementer has variants: one that's primed with an llms.txt (see answerdotai) for a given library I want to use, another that's configured to use Gemini (I prefer its Tailwind capabilities) rather than Claude, etc.

To organise these reusable agents I'm currently test-driving langroid; each agent contributes via a sub-task.

It's not perfect yet though.

skeptrune

I think you misread. The point I make is that it's many agents = 1 task.

Since the probability of an LLM succeeding at any given task is below 100%, you should run multiple instances of the same LLM with the same prompted task in parallel. If a single attempt lands 60% of the time, four independent attempts give you better than a 97% chance that at least one succeeds.

yakbarber

I think the OP means they should be collaborating. In the poster's proposed solution each agent is independent, but you could reduce the human attention required by having multiple rounds of evaluation and feedback from other agents before it gets to the human.

dgunay

This looks like a much more sophisticated version of my setup. I had Aider vibe-code me a script that just manages cloning a repo into a subfolder, optionally with some kind of identifying suffix on it, and then I wrote a tiny script to automate calling that script, `cd`ing into the directory, and then running codex-cli on it. The resulting workflow: I open a new terminal, type `vibe --suffix=<suffix> <prompt>`, and then I can go do something else.
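
A simplified sketch of that kind of wrapper (not the actual scripts; the argument handling and the codex-cli invocation are placeholders):

```bash
#!/usr/bin/env bash
# "vibe" sketch: clone the current repo into a suffixed sibling folder,
# cd into it, and hand the prompt to a coding agent.
set -euo pipefail

SUFFIX="${1:?usage: vibe <suffix> <prompt>}"
PROMPT="${2:?usage: vibe <suffix> <prompt>}"

REPO_URL=$(git remote get-url origin)
WORKDIR="../$(basename "$PWD")-$SUFFIX"

git clone "$REPO_URL" "$WORKDIR"
cd "$WORKDIR"
codex "$PROMPT"   # placeholder for however codex-cli gets invoked
```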

8200_unit

Could you share your scripts?

hboon

Coincidentally, I just posted this[1] earlier today, where I made a simple change against a 210 LOC file vs a 1379 LOC file, comparing parameters: LOC, filename vs URL path for that webpage, and Playwright verification.

My question is: how does the author get things done for $0.10? My simple example with the smaller file costs $2 each time.

[1] https://x.com/hboon/status/1927939888000946198

sagarpatil

I avoid using my own API keys, especially for Sonnet 4 or Opus, because LLMs can rack up unexpected costs. Instead, I use Augment Code’s remote agents and Google’s Jules, which charge per message rather than by usage. This setup is ideal for me since I prefer not to run the model locally while I’m actively working on the codebase.

thepablohansen

This resonates: my engineering workflow has started shifting from long, highly focused periods of building out a feature to one with much more context switching, review, and testing.

hombre_fatal

I find that my bottleneck with LLMs on a real project is reviewing their code, QAing it, and, if it's novel code, integrating it into my own mental model of how the code works so that I can deliberately extend it in a maintainable way.

The latter is so expensive that I still write most code myself, or I'll integrate LLM code into the codebase myself.

I've used parallel Claude Code agents to do chore work for me.

But I'd be curious about examples of tasks that people find best for OP's level of parallelization.

bjackman

Hmm this strategy only makes sense if you can trivially evaluate each agent's results, which I haven't found to be the case.

I expect a common case would be: one agent wrote code that does the thing I want. One agent wrote code that isn't unmaintainable garbage. These are not the same agent. So now you have to combine the two solutions, which is quite a lot of work.