We put a coding agent in a while loop

47 comments

·August 24, 2025

VincentEvans

There will be a new kind of job for software engineers, sort of like a cross between working with legacy code and toxic site cleanup.

Like back in the day, being brought in to “just fix” an amalgam of FoxPro-, Excel-, and Access-based ERP that “mostly works” and only “occasionally corrupts all our data” that ambitious sales people put together over the last 5 years.

But worse - because “ambitious sales people” will no longer be constrained by the sandboxes of Excel or Access - they will ship multi-cloud edge-deployed Kubernetes micro-services wired with Kafka, and it will be harder to find someone to talk to in order to understand what they were trying to do at the time.

dhorthy

When Claude starts deploying Kafka clusters I’m outro

CuriouslyC

It's already happening brother, https://github.com/containers/kubernetes-mcp-server.

Jtsummers

Superfund repos.

throwup238

Now that's an open source funding model governments can get behind.

NitpickLawyer

> After finishing the port, most of the agents settled for writing extra tests or continuously updating agent/TODO.md to clarify how "done" they were. In one instance, the agent actually used pkill to terminate itself after realizing it was stuck in an infinite loop.

Ok, now that is funny! On so many levels.

Now, for the project itself, a few thoughts:

- this was tried before: about 1.5 years ago there was a project set up to spam GitHub with lots of "paper implementations", but it was based on GPT-3.5 or 4 or something, and almost nothing worked. The results here are much better.

- surprised it worked as well as it did with simple prompts. "Probably we're overcomplicating stuff". Yeah, probably.

- weird copyright / IP questions all around. This will be a minefield.

- Lots of SaaS products are screwed. Not from this, but from this + 10 engineers in every midsized company. NIH is now justified.

keeda

Is that... the first recorded instance of an AI committing suicide?

ghuntley

> - weird copyright / IP questions all around. This will be a minefield.

Yeah, we're in weird territory because you can drive an LLM as a Bitcoin mixer over intellectual property. That's the entire point/meaning behind https://ghuntley.com/z80.

You can take something that exists, distill it back to specs, and then you've got your own IP. Throw away the tainted IP, and then just run Ralph over a loop. You are able to clone things (not 100%, but it's better than hiring humans).

heavyset_go

> then you've got your own IP.

AI output isn't copyrighted in the US.

rasz

>and then you've got your own IP.

Except you don't.

CuriouslyC

I started building a project by trying to wire in existing open source stuff. When I compared the build setup and dependencies that would force me to bring in against what I actually needed from the open source tools, it turned out to be MUCH faster/cleaner to just get Claude to check out the repo and port the parts I needed directly.

Now I do a calculus with dependencies. Do I want to track the upstream, is the rigging around the core I want valuable, is it well maintained? If not, just port and move on.

dhorthy

Yeah the NIH thing is super on point. Small SaaS tools for everything are done. Bring on the hand-coded custom in-house admin monolith?

Is Unix “small sharp tools” going away? Is that a relic of having to write everything in x86 and we’re now just finally hitting the end of the arc?

bwestergard

There are always two major results from any software development process: a change in the code and a change in cognition for the people who wrote the code (whether they did so directly or with an LLM).

Python and Typescript are elaborate formal languages that emerged from a lengthy process of development involving thousands of people around the world over many years. They are non-trivially different, and it's neat that we can port a library from one to the other quasi-automatically.

The difficulty, from an economic perspective, is that the "agent" workflow dramatically alters the cognitive demands during the initial development process. It is plain to see that the developers who prompted an LLM to generate this library will not have the same familiarity with the resulting code that they would have had they written it directly.

For some economic purposes, this altering of cognitive effort, and the dramatic diminution of its duration, probably doesn't matter.

But my hunch is that most of the economic value of code is contingent on there being a set of human beings familiar with the code in a manner that requires having written it directly.

Denial of this basic reality was an economic problem even before LLMs: how often did churn in a development team result in a codebase that no one could maintain, undermining the long-term prospects of a firm?

ghuntley

Nice. Check out https://ghuntley.com/ralph to learn more about Ralph. It's currently building a Gen-Z esoteric programming language and porting the standard library from Go to the Cursed programming language. The compiler is working; I'm just putting the finishing touches on the standard library before launching.

The language is called Cursed.

sfarshid

Thanks Geoff, Ralph was our inspiration to do this!

We were curious to see if we could do away with IMPLEMENTATION_PLAN.md for this kind of task.

giantg2

There's a lot of "it kind of worked" in here.

If we actually want stuff that works, we need to come up with a new process. If we get "almost" good code from a single invocation, you're just going to get a lot of almost good code from a loop. What we likely need is a Cucumberesque format with example tables for requirements that we can direct an AI to use. It will build the tests and then build the code to pass the tests.
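One way to read the "example tables" idea, sketched in Python rather than Gherkin. The requirement, the function name `normalize_username`, and the rows are all made up for illustration; the only point is that the table is written first and the loop keeps going until no rows fail.

```python
# Hypothetical requirement: usernames are trimmed and lower-cased.
# Each row is (raw input, expected result), like a Cucumber Examples table.
EXAMPLES = [
    ("Alice", "alice"),
    ("  bob  ", "bob"),
    ("CAROL\n", "carol"),
]


def normalize_username(raw: str) -> str:
    """Stand-in for the code the agent would be asked to write."""
    return raw.strip().lower()


def check_examples(fn, examples):
    """Return (input, expected, actual) for every row that fn gets wrong."""
    return [
        (given, expected, fn(given))
        for given, expected in examples
        if fn(given) != expected
    ]


failures = check_examples(normalize_username, EXAMPLES)
print("failing rows:", failures)
```

An empty list of failing rows is the "done" signal here, which is at least less fuzzy than an agent updating a TODO file.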

ghuntley

Strangely enough, TLA+ and other formal proofs work very well for driving Ralph.

giantg2

I would consider that expected but not strange. The thing blocking adoption is that most devs/people find those formal languages difficult or boring. That's even true of things like Cucumber - it's boring and most organizations care little for robust QA.

bigmattystyles

Starting to think of this quote more and more:

"This business will get out of control. It will get out of control and we'll be lucky to live through it."

https://www.youtube.com/watch?v=YZuMe5RvxPQ&t=22s

ramraj07

The irony is that everyone did live through that business. So what you're saying is we will live through this too!

hoppp

I wanted to know how much it cost.

I would be scared to run this without knowing the exact cost.

It's not a good idea to do it without a payment cap for sure; it's a new way to wake up with a huge bill the next day.
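A minimal sketch of what a cap around the loop could look like, assuming spend can be approximated as wall-clock hours times a flat hourly rate. That is a simplification (real billing is usage-based), and `run_with_budget`, `run_once`, and the parameters are hypothetical names, not part of any agent tool.

```python
import time
from typing import Callable


def run_with_budget(
    run_once: Callable[[], None],
    rate_usd_per_hour: float,
    budget_usd: float,
    clock: Callable[[], float] = time.monotonic,
) -> int:
    """Call run_once repeatedly until the estimated spend reaches budget_usd.

    Spend is estimated as elapsed hours times a flat rate (an assumption).
    Returns the number of iterations that were run.
    """
    start = clock()
    iterations = 0
    while True:
        elapsed_hours = (clock() - start) / 3600.0
        if elapsed_hours * rate_usd_per_hour >= budget_usd:
            return iterations
        run_once()  # one pass of the agent, supplied by the caller
        iterations += 1
```

The check only happens between iterations, so one long iteration can overshoot the budget. If your provider offers a hard spending limit, that is still the safer backstop.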

debazel

They did mention how much they spent here: https://github.com/repomirrorhq/repomirror/blob/main/repomir...

> We spent a little less than $800 on inference for the project. Overall the agents made ~1100 commits across all software projects. Each Sonnet agent costs about $10.50/hour to run overnight.

bckr

$800
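For scale, the quoted figures can be turned into rough per-commit and agent-hour numbers. This is back-of-the-envelope only: it treats $800 and 1100 commits as exact even though the quote says "a little less than" and "~", and it assumes all of the spend was billed at the quoted hourly rate.

```python
# Figures from the quote above, treated as exact for the estimate.
TOTAL_SPEND_USD = 800.0     # "a little less than $800"
TOTAL_COMMITS = 1100        # "~1100 commits"
RATE_USD_PER_HOUR = 10.50   # quoted cost per Sonnet agent

cost_per_commit = TOTAL_SPEND_USD / TOTAL_COMMITS
agent_hours = TOTAL_SPEND_USD / RATE_USD_PER_HOUR

print(f"cost per commit: ${cost_per_commit:.2f}")
print(f"agent-hours: {agent_hours:.1f}")
```

That works out to roughly $0.73 per commit and about 76 agent-hours in total.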

gregpr07

AGI was just 1 bash for loop away all this time I guess. Insane project.

cogogo

Less flippantly, that was sort of my thought. I’m probably a paranoid idiot, and I’m not really sure I can articulate this idea properly, but I can imagine a less concise but broader prompt and an agent configured so that it has privileges you don't want it to have, or a path to escalate them. It's not quite AGI, but it's a virus on steroids, like a company or resource (think utilities) killer. I hope I'm just missing something, but these models seem pretty capable of wreaking all kinds of havoc if they just keep looping and have access nobody in their right mind wants them to have.

dhorthy

was deeply unsettling among other things

ghuntley

It is, isn't it mate? Shit, I stumbled upon Ralph back in February and it shook me to the core.

cogogo

Not that I want to be shaken, but what is Ralph? A quick search showed me some marketing tools, but that can't be what you are referring to, can it?

wrs

I’ve done a few ports like this with Claude Code (but not with a while loop) and it did work amazingly well. The original codebase had a good test suite, so I had it port the test suite first, and gave it some code style guidance up front. Then the agent did remarkably well at doing a straight port from one imperative language to another. Then there’s some purely human work to get it really done — 80-90% done sounds about right.

MagMueller

I would love to fix my docs with this. I have them in the main browser-use repo. What do you recommend so that the agent never pushes to the main browser-use repo, but only to its own branch?

dhorthy

Yeah you can easily tweak this to push to a branch or a fork or something in the generated prompt.md

rogerrogerr

Does anyone else get dull feelings of dread reading this kind of thing? How do you combat it?

rkachowski

> In one instance, the agent actually used pkill to terminate itself after realizing it was stuck in an infinite loop.

The Alexandrian solution to the halting problem.
