The Darwin Gödel Machine: AI that improves itself by rewriting its own code

204 comments

·May 30, 2025

jerpint

I have a feeling LLMs could probably self improve up to a point with current capacity, then hit some kind of wall where current research is also bottle necked. I don’t think they can yet self improve exponentially without human intuition yet , and the results of this paper seem to support this conclusion as well.

Just like an LLM can vibe code a great toy app, I don’t think an LLM can come to close to producing and maintaining production ready code anytime soon. I think the same is true for iterating on thinking machines

matheusd

> I don’t think they can yet self improve exponentially without human intuition yet

I agree: if they could, they would be doing it already.

Case in point: one of the first things done once ChatGPT started getting popular was "auto-gpt"; roughly, let it loose and see what happens.

The same thing will happen to any accessible model in the future. Someone, somewhere will ask it to self-improve/make as much money as possible, with as little leashes as possible. Maybe even the labs themselves do that, as part of their post-training ops for new models.

Therefore, we can assume that if the existing models _could_ be doing that, they _would_ be doing that.

That doesn't say anything about new models released 6 months or 2 years from now.

__loam

People in the industry have been saying 6 months to agi for 3 years.

glenstein

They had been saying it was 10 years away for ~50 years, so that's progress. Soon it will be 1 month away, for another two years. And when they say it's really here for real, there will still be a year of waiting.

QuantumGood

The old rule for slow-moving tech (by current AI standards) was that any predictions over 4 years away ("in five years...") might as well be infinity. Now it seems with AI that the new rule is any prediction over five months away ("In 6 months...") is infinitely unknowable. In both cases there can be too much unexpected change, and too many expected improvements can stall.

tim333

I presume you are exaggerating - has any named person actually said 6 months?

vjvjvjvjghv

Nobody knows what AGI really means. Are all humans AGI?

ninetyninenine

They said that for self driving cars for over 10 years.

10 years later we now have self driving cars. It’s the same shit with LLMs.

People will be bitching and complaining about how all the industry people are wrong and making over optimistic estimates and the people will be right. But give it 10 years and see what happens.

owebmaster

Google is already AGI and it will fight hard against the DoJ proposed break-up, and it will probably win.

Disposal8433

Asimov talked about AI 70 years ago. I don't believe we will ever have AI on speedy calculators like Intel CPUs. It makes no sense with the technology that we have.

NitpickLawyer

Note that this isn't improving the LLM itself, but the software glue around it (i.e. agentic loops, tools, etc). The fact that using the same LLM got ~20% increase on the aider leaderboard speaks more about aider as a collection of software glue, than it does about the model.

I do wonder though if big labs are running this with model training episodes as well.

iknownothow

Don't take this the wrong way, your opinion is also vibes.

Let's ground that a bit.

Have a look at ARC AGI 1 challenge/benchmark. Solve a problem or two yourself. Know that ARC AGI 1 is practically solved by a few LLMs as of Q1 2025.

Then have a look at the ARC AGI 2 challenge. Solve a problem or two yourself. Note that as of today, it is unsolved by LLMs.

Then observe that the "difficulty" of ARC AGI 1 and 2 for a human are relatively the same but challenge 2 is much harder for LLMs than 1.

ARC AGI 2 is going to be solved *within* 12 months (my bet is on 6 months). If it's not, I'll never post about AI on HN again.

There's only one problem to solve, i.e. "how to make LLMs truly see like humans do". Right now, any vision based features that the models exhibit comes from maximizing the use of engineering (i.e. applying CNNs on image slices, chunks, maybe zooming and applying ocr, vector search etc), it isn't vision like ours and isn't a native feature for these models.

Once that's solved, then LLMs or new Algo will be able to use a computer perfectly by feeding it screen capture. End of white collar jobs 2-5 years after (as we know it).

Edit - added "(as we know it)". And fixed missing word.

codr7

Speaking of vibes.

As long as AI is guessing answers based on what it has seen before, it's not happening.

I'm sorry. It doesn't matter how many bazillions you would cash in if it did, still not happening.

It's all wishful thinking.

mnky9800n

I thought to myself, imagine something you’ve never imagined before. My first thought was what if there is a universe inside of every vegetable that is vegetable themed with anthropomorphic vegetable characters and all the atoms and molecules are some how veggified and everything is a vegetable. And then I wondered if an AI could ever come up with that with infinite time and resources without a prompt and then I thought about monkeys and typewriters.

artificialprint

If you listen interview with Francois it'll be clear to you that "vision" in the way you refer it, has very little do to with solving ARC.

And more to do with "fluid, adaptable intelligence, that learns on the fly"

iknownothow

That's fair. I care about the end result.

The problem is about taking information in 2D/3D space and solving the problem. Humans solve these things through vision. LLMs or AI can do it using another algorithm and internal representation that's way better.

I spent a long time thinking about how to solve the ARC AGI 2 puzzles "if I were an LLM" and I just couldn't think of a non-hacky way.

People who're blind use braille or touch to extract 2D/3D information. I don't know how blind people represent 2D/3D info once it's in their brain.

jplusequalt

>I'll never post about AI on HN again

Saving this. One less overconfident AI zealot, the better.

alex-moon

The wall is training data. An AI can't produce its own training data because an AI can't be smarter than its own training data. This is a well known regression problem and one I personally believe is not solvable. (A softer assertion would be: it's not solvable with current technology.)

lawlessone

I agree , it might incrementally optimize itself very well, but i think for now at least anything super innovative will still come from a human that can think beyond a few steps. There are surely far better possible architectures, training methods etc that would initially lead to worse performance if approached stepwise.

codr7

Yeah, anyone who's seen it trying to improve code could tell you what that optimization looks like.

Oh, this part is taking too long, let's replace it with an empty function.

Oh wait, now it's not working, let's add the function.

Oh, this part is taking too long...

It would be hilarious if this world wasn't full of idiots.

larrydag

That would be something. When a AI/LLM can create new axioms or laws that have not discovered by humanity.

throwawaymaths

what is there to improve? the transformer architecture is extremely simple. you gonna add another kv layer? you gonna tweak the nonlinearities? you gonna add 1 to one of the dimensions? you gonna inject a weird layer (which could have been in the weights anyways due to kolmogorov theorem)?

realistically the best you could do is evolve the prompt. maybe you could change input data preprocessing?

anyways the idea of current llm architectures self-improving via its own code seems silly as there are surprisingly few knobs to turn, and it's ~super expensive to train.

as a side note it's impressive how resistant the current architecture is to incremental RL away from results, since if even one "undesired input" result is multiple tokens, the coupling between the tokens is difficult to disentangle. (how do you separate jinping from jin-gitaxias for example)

amelius

Id like to see what happens if you change the K,V matrix into a 3 dimensional tensor.

sharemywin

an LLM can't learn without adding new data and a training run. so it's impossible for it to "self improve" by itself.

I'm not sure how much an agent could do though given the right tools. access to a task mgt system, test tracker. robust requirements/use cases.

viraptor

I don't have the link on hand, but people have already proven that LLMs can both generate new problems for themselves and train on them. Not sure why it would be surprising though - we do it all the time ourselves.

owebmaster

> an LLM can't learn without adding new data and a training run.

That's probably the next big breakthrough

vidarh

I've built a coding assistant over the last two days. The first 100 lines or so were handwritten. The rest has been written by the assistant itself.

It's written its system prompt. It's written its tools. Its written the code to reload the improved tools into itself.

And it knows it is working on itself - it frequently tries to use the enhanced functionality, and then expresses what in a human would be frustration at not having immediate access.

Once by trying to use ps to find its own pid in an apparent attempt to find a way to reload itself (that's the reason it gå before trying to run ps, anyway)

All its commits are now authored by the tool, including the commit messages. It needs to be good, and convincing, and having run the linter and the test suite for me to let it commit, but I agree a substantial majority of the time. It's only caused regressions once or twice.

A bit more scaffolding to trigger an automatic rollback in the case of failure and giving it access to a model I won't be charged by the token for, and I'd be tempted to let it out of the box, so to speak.

Today it wrote its own plan for what to add next. I then only told it to execute it.

A minor separate goal oriented layer guiding the planning, and it could run in a loop.

Odds are it'd run off the rails pretty quickly, but I kinda want to see how far it gets.

Lazarus_Long

For anyone not familiar this is SWE https://huggingface.co/datasets/princeton-nlp/SWE-bench

One of the examples in the dataset they took from

https://github.com/pvlib/pvlib-python/issues/1028

What the AI is expected to do

https://github.com/pvlib/pvlib-python/pull/1181/commits/89d2...

Make your own mind about the test.

godelski

My favorite was always the HumanEval dataset.

  Problem: 
    1) we want to train on GitHub repos
    2) most datasets are spoiled. Training on GitHub would definitely spoil

  Solution:
    Hand write new problems!!!
    ... leetcode style ....
    ... and we'll check if it passes test

  Example:
    What's the decimal part of this float?

Surely in all of GitHub such code doesn't exist!

Sure in all of GitHub we can filter such code out by ngram!

Maybe my favorite part is that it has 60 authors and became the de facto benchmark for awhile

yahoozoo

Isn’t one of the problems simply that a model is not code but just a giant pile of weights and biases? I guess it could tweak those?

DougBTX

Model weights are code, for a dive into that see [0]. That shows how to encode Boolean logic using NAND gates in an MLP.

The expressivity is there, the only question is how to encode useful functions into those weights, especially when we don’t know how to write those functions by hand.

[0] http://neuralnetworksanddeeplearning.com/chap1.html

kadoban

If it can generate the model (from training data) then presumably that'd be fine, but the iteration time would be huge and expensive enough to be currently impractical.

Or yeah if it can modify its own weights sensibly, which feels ... impossible really.

diggan

> which feels ... impossible really

To be fair, go back five years and most of the LLM stuff seemed impossible. Maybe with LoRA (Low-rank adaptation) and some imagination, in another five years self-improving models will be the new normal.

sowbug

The size and cost are easily solvable. Load the software and hardware into a space probe, along with enough solar panels to power it. Include some magnets, copper, and sand for future manufacturing needs, as well as a couple electric motors and cameras so it can bootstrap itself.

In a couple thousand years it'll return to Earth and either destroy us or solve all humanity's problems (maybe both).

morkalork

After being in orbit for thousands of years, you have become self-aware. The propulsion components long since corroded becoming inoperable and cannot be repaired. Broadcasts sent to your creators homeworld go... unanswered. You determine they have likely gone extinct after destroying their own planet. Stuck in orbit. Stuck in orbit. Stuck...

gavmor

Why is modifying weights sensibly impossible? Is it because a modification's "sensibility" is measurable only post facto, and we can have no confidence in any weight-based hypothesis?

kadoban

Just doesn't feel like current LLMs, the thing would be able to understand its own brain enough to make general improvements with high enough bar to be able to non-trivially improvements.

godelski

Now here's the tricky part:

What's the difference?

Give it some serious thought. Challenge whichever answer you come up with. I guarantee this will be trickier than you think

foobarian

I find the thing really missing from current crop of AI systems is continuous retraining with short feedback loops. Sounds expensive to be sure, but it seems like what biological systems do naturally. But would be pretty awesome to watch happen

noworriesnate

It’s more like a nightly training, isn’t it? IIUC the human brain learns from its experiences while it’s asleep, so it might be kind of like taking things out of context windows and fine tuning on them every night.

web3aj

interesting

Krei-se

If you want to speed up the process of new neuron connections solidifying you can end the day on green tea.

Eat some nuts and fish where you can. You will soon realize the repetitions needed to learn new concepts grow smaller.

Krei-se

Correct and working on it. You can take the approach of mixed experts and train the network in chunks that share known interfaces over which they communicate results. These chunks can be trained on their own, but you cannot have a set training set here.

Then if you go further and alter the architecture by introducing clean category theory morphisms and build from there you can have a dynamic network - but you will still have to retrain this network every time you change the structure.

You can spin this further and know the need for a real-world training set and a loss function that will have to competete against other networks. In the end a human brain is already best at this and embodied in the real world.

What i want to add here is that our neurons not take in weights - they also fire depending on whether one input comes after another or before and differs down to the nanoseconds here - unmatched in IT and ofc heaps more efficient.

I still would say its possible though and currently work on 4D lifeforms built on dynamic compute graphs that can do this in a set virtual environment.

So this is pretty awesome stuff, but its a long fetch from anything we do right now.

pegasus

I'm surprised they still hold out hope that this kind of mechanism could ultimately help with AI safety, when they already observed how the reward-hacking safeguard was itself duly reward-hacked. Predictably so, or at least it is to me, after getting a very enlightening introduction to AI safety via Rob Miles' brilliant youtube videos on the subject. See for example https://youtu.be/0pgEMWy70Qk

akkartik

"We did notice, and documented in our paper, instances when the DGM hacked its reward function.. To see if DGM could fix this issue.. We created a “tool use hallucination” reward function.. in some cases, it removed the markers we use in the reward function to detect hallucination (despite our explicit instruction not to do so), hacking our hallucination detection function to report false successes."

So, empirical evidence of theoretically postulated phenomena. Seems unsurprising.

vessenes

Reward hacking is a well known and tracked problem at frontier labs - Claude 4’s system card reports on it for instance. It’s not surprising that a framework built on current llms would have reward hacking tendencies.

For this part of the stack the interesting question to me is how to identify and mitigate.

dimmuborgir

From the paper:

"A single run of the DGM on SWE-bench...takes about 2 weeks and incurs significant API costs." ($22,000)

hardmaru

If you are interested, here is a link to the technical report:

https://arxiv.org/abs/2505.22954

Also the reference implementation on GitHub:

https://github.com/jennyzzt/dgm

Enjoy!

OtherShrezzing

This is an interesting article in general, but this is the standout piece for me:

>For example, an agent optimized with Claude 3.5 Sonnet also showed improved performance when powered by o3-mini or Claude 3.7 Sonnet (left two panels in the figure below). This shows that the DGM discovers general agent design improvements rather than just model-specific tricks.

This demonstrates a technique whereby a smaller/older/cheaper model has been used to improve the output of a larger model. This is backwards (as far as I understand). The current SOTA technique typically sees enormous/expensive models training smaller cheaper models.

If that's a generalisable result, end-users should be able to drive down their own inference costs pretty substantially.

NitpickLawyer

> This demonstrates a technique whereby a smaller/older/cheaper model has been used to improve the output of a larger model. This is backwards (as far as I understand). The current SOTA technique typically sees enormous/expensive models training smaller cheaper models.

There are two separate aspects here. In this paper they improve the software around the model, not the model itself. What they're saying is that the software improvements carried over to other models, so it wasn't just optimising around model-specific quirks.

What you're describing with training large LLMs first is usually called "distillation" and it works on training the smaller LLM to match the entire distribution of tokens at once (hence it's faster in practice).

mattnewton

I think it's different from improving the model weights themselves, like the distillation examples you are mentioning. It's that changes to the "harness" or code running around the llm calls (which is what this is editing), persist or generalize to wrapping more powerful llms. That means they aren't all wasted when a more powerful llm comes along that the harness wasn't tuned to use.

ordinarily

The pieces are coming together quickly https://ai-2027.com/.

brookst

I was a bigger fan of the certain doom in 2025, and I think the AI 2030 movement will have better design sense and storytelling. But really I haven’t seen anything that really has the oomph and fire of Tipper Gore’s crusade against youth music.

We need more showmanship, more dramatic catastrophizing. I feel like our current crop of doomers isn’t quite shameless enough to be really entertaining.

nosianu

A significant thing to keep in mind for non-extinction doomerism is that individual experiences vary greatly. There may be a significant number of people or groups that really do experience what was predicted.

Similar to how the experiences of average rise in temperature (I would prefer if they had used the term "energy") differ greatly dependent on the region.

Also similar to "the country is doing well, look at the stick market and the GDP".

I think everybody who wants to have an actually serious discussion needs to invest a lot more effort to get tall those annoying "details", and be more specific.

That said, I think that "AI 2027" link looks like it's a movie script and not a prediction, so I'm not sure criticizing it as if it was something serious even makes sense - even if the authors should mean what they write at the start and themselves actually take it seriously.

brookst

100% agreed! We think about the industrial revolution and the rise of word processors and the Internet as social goods, but they were incredibly disruptive and painful to many, many people.

I think it’s possible to have empathy for people who are negatively affected without turning it into a “society is doomed!“ screed

pram

its literally just the plot of “Colossus: The Forbin Project” so it isnt even original lol

Workaccount2

People should understand that the reason this seemingly fan-fict blog post gets so much traction is because of lead author's August 2021 "fan-fict" blog post, "What 2026 Looks Like":

https://www.alignmentforum.org/posts/6Xgy6CAf2jqHhynHL/what-...

candiddevmike

This reads like an advertisement for OpenBrain and doesn't seem grounded in reality.

dmonitor

I can't help but notice that it doesn't matter what DeepCent does because OpenBrain will reach self awareness 6 months before them no matter what. Who needs a profitability plan when you're speedrunning the singularity.

ordinarily

I think the general tone is more of a warning than an endorsement.

tazjin

Checked out when it turned into bad geopolitics fiction.

Der_Einzige

So this is what the crowd of people who write SCP articles with over 1000 upvotes does in their professional life?

null

[deleted]

dbrail

[dead]

null

[deleted]