Yes, Claude Code can decompile itself. Here's the source code

122 comments

·March 1, 2025

markisus

The article contains a reference to a much more impressive task where a user automatically decompiled a binary exe game into Python. But I read their original post and here is what that user said.

> Several critics seemed to assume I claimed Claude had "decompiled" the executable in the traditional sense. In reality, as I described in our conversation, it analyzed visible strings and inferred functionality - which is still impressive but different from true decompilation.

So I’m not sure that the implications are as big as the article author is claiming. It seems Claude is good at de-minifying JavaScript but that is a long way away from decompiling highly optimized binary code.

ec109685

The original post about Claude decompiling the game lied too.

He had Claude gin up a post that would go viral and it added this hallucination:

  Perfectly replicated the functionality [of the original game]

https://claude.ai/share/3eecebc5-ff9a-4363-a1e6-e5c245b81a16

Snuggly73

Indeed, if you read the attached Claude transcript to the original post, you will see that it finds some strings and infers the functionality (i.e. it has .wav files, thus it probably plays them).

https://claude.ai/share/3eecebc5-ff9a-4363-a1e6-e5c245b81a16

casey2

I don't see why these claims are even being made. It's well known that a transformer could in theory (in practice implementation details prevent this) translate given all the data in, say, a nes cartridge translate it into a git repo on a linux machine that compiles for x86, or even just an elf x86-64 binary directly.

Is the claim a general system exists? I'm extremely doubtful of that claim, but one that could do every published nes game to some current linux enviornment? Would definitely be easier than making something like current Claude.

ninetyninenine

[flagged]

markisus

It's a little hard to understand your criticism. From just reading the original article, I had assumed that Claude could decompile binaries. The author said

> Understand dear reader that this technique can be done on any programming language and even from pre-existing binaries themselves.

Following that sentence, the author included a twitter embed pointing to a reddit thread about decompiling a binary. Only after I went to the reddit thread, I found that there was no decompilation involved.

haolez

But it should be easy to generate such data to train an AI to do that, if someone wants, no?

soulofmischief

Some transformations irrecoverably lose information. A recontextualization engine such as an LLM might be able to "recover" some information by comparing it to other code in its training set, but it's still a guess and not all code will have representation in the training set.

Kye

I wonder if this would be an ideal use case for diffusion-based LLMs slowly piecing it together with each pass from what it could determine from the last.

Retr0id

This is exactly what skilled human reverse engineers do. Almost no code is truly novel, at best it's a novel recombination of existing ideas. So yes, some information is lost, but all information relevant to the execution of the program can certainly be recovered.

gchamonlive

With information loss what you previously had as a single solution becomes a generalisation which can be decompiled into many different equally valid solutions.

That means even though information is irrecoverably transformed during compilation, LLM could come up with semantically valid expanded solution that both fits the compiled information, that is it can be recompiled successfully, and makes sense semantically for whoever would eventually want to evolve the decompiled code with an understanding of the domain.

So the goal isn't to recover exactly what code really was, but to provide a possible solution that fits and is useful.

brookst

LLMs can synthesize info not in the training set. They should be just as capable of looking at a binary where info has been lost and recreating source just like a human would. Stuff like variable names won’t be exact (for either human or LLM) but can be reasonable inferences based on usage.

It’s a guess, sure, but I don’t see why it would be a less good guess than a human’s.

kelsey98765431

Replace the word easy there with the phrase "technically may be possible with great effort and expense, probably"

kees99

Decent optimizing compiler would necessarily loose a fair bit of source information - loop unrolling, function inlining, tail-call optimizations, etc...

There is no good way to reconstruct that, AI/ML or not.

brookst

For loop unrolling at least, wouldn’t most human reverse engineers see the original loop? It’s far more likely that the source had a loop than that some high level language programmer did the unrolling in source.

viraptor

I'm not sure why this is framed as an issue for security teams. Transpiling software has been a thing for ages. Especially in the JS world. Decompiling has been a bit harder without automation, but unless you have black box tests, this process will take ages to verify that the result has matching functionality.

So why would the blue teams care beyond "oh fun, a new tool for speeding up malware decompilation"?

Edit: To be clear, I get the new reverse engineering and reimplementation possibilities got much better and simpler. But the alarmist tone seems weird.

sbarre

I read the blog post (especially in combination with the end bit) at least in part, as an advertisement for the author's capabilities and services.

That makes the tone make a bit more sense to me.

Avicebron

It seems like "red-teaming" and "security research" have become more socially prominent recently, so people naturally aligned with grift, e.g. making things seems alarmist and that they are the only ones in the know, are trying to seem part of the club?

giancarlostoro

Agreed. Are we supposed to stop developing all software as a result? Ask for LLMs to censor reverse engineering? Someone else wont care and will build another LLM to bypass limitations.

SebFender

On point. We're not really interested in these things. Yeah we take a look at it and stay informed but the main focus remains bypass of controls and data itself.

With decent backend controls - apps don't/shouldn't do much in the end. Once you show information on a screen consider it potentially gone.

IshKebab

Erm sure... so is the output actually any good? I don't think anyone doubted that the LLM could produce some output but I would like to know if it is actually good output. Does it compile? Does it make sense?

causal

Why does the post avoid this obvious question. Claude is impressive, but it still hallucinates a lot.

You really need to be able to build + run + verify features + compare compiled outputs; then you can be somewhat confident it really did what the author is claiming.

Snuggly73

The other linked "oh fuck" article https://ghuntley.com/oh-fuck/ where it supposedly converts https://github.com/RustAudio/cpal at least has a linked video. As far as I can see, based on that video's file tree/Haskell files - its not exactly doing what's described in the article. I'll extrapolate from there...

futasuki

No, the ouput is all hallucination. The minified version contains many prompts that can be easily found. None of them appear in the authors result. None of the code structures and identifiers of the minfied version are present. Its all BS.

mpalmer

Three years ago, you wrote

> Systemically, I'm concerned that there is a lack of professional liability, rigorous industry best practices, and validation in the software industry which contributes to why we see Boeings flying themselves into the ground, financial firms losing everyone's data day in and out, and stories floating around our industry publications about people being concerned about the possibility of a remotely exploitable lunar lander on Mars.

> There's a heap of [comical?] tropes in the software industry that are illogical/counterproductive to the advancement of our profession and contribute to why other professions think software developers are a bunch of immature spoiled children that require constant supervision.

3 weeks ago you posted something titled "The future belongs to people who can just do things".

Today you post this:

> Because cli.mjs is close to 5mb - which is way bigger than any LLM context window out here. You're going to need baby sit it for a while and feed it reward tokens of kind words ("your doing good, please continue") and encourage it to keep on going on - even if it gives up. It will time out, lots...

I don't think you are someone who can just "do things" if you think a good way to de-obfuscate 5MB of minified javascript is to pass it to a massive LLM.

Do you think you are advancing your profession?

rafram

Why do you feel the need to be so rude about an interesting little blog post?

Obviously you don’t need an LLM to prettify obfuscated JavaScript. But take a look at the repo. It didn’t just add the whitespace back — it restored the original file structure, inferred function and variable names, wrote TypeScript type definitions based on usage, and added (actually decent) comments throughout the source code. That simply isn’t possible without an LLM.

mpalmer

> That simply isn’t possible without an LLM.

Do you have a lot of experience with minified code?

rafram

Yes, I do.

Please link to a tool that can infer function/variable names and TypeScript type definitions from minified JS without using LLMs or requiring significant user input.

helsinki

That's the thing - it wasn't even interesting. It was just some LinkedIn garbage post, in my opinion.

bryanrasmussen

I get the feeling that you think they are not advancing their profession and that the quotes you have made of their work are such obvious examples of some problem with them that your parting question is some sort of stinging rebuke - is that correct? If so I have to admit I'm not following the connections.

mpalmer

It's a rebuke. I don't mind that you're not following the connections.

jameshart

This feels very much like the work of someone with ‘just enough knowledge to be dangerous’.

At no point in this process does the author seem to stop and inspect the results to see if they actually amount to what he’s asking for. Claiming that this output represents a decompilation of the obfuscated target seems to require at least demonstrating that the resulting code produces an artifact that does the same thing.

Further, the claim that “Using the above technique you can clean-room any software in existence in hours or less.” is horrifyingly naive. This would in no way be considered a ‘clean room’ implementation of the supplied artifact. It’s explicitly a derived work based on detailed study of the published, copyrighted artifact.

Please step away from the LLM before you hurt someone.

Snuggly73

“Using the above technique you can clean-room any software in existence in hours or less.”

Having spent my misguided youth doing horrible things to Sentinel Rainbow and its cousins - I can only chuckle.

dogma1138

For those who don’t know Sentinel Rainbow is a DRM dongle these were popular in the 90’s through early 2000’s for enterprise/business software especially that could be useful on the smaller end of the businesses scale (CRM, ERP, CAD/CAM) where piracy concerns were much bigger due to the relative high cost of development combined with a relatively small market to begin with.

SAAS pretty much made all of that obsolete, since with SAAS you get an unbeatable DRM for free.

29athrowaway

If my understanding is correct, what's legally protected is reproducing a proprietary IP design rather than studying the design?

If you create a new design that doesn't have the proprietary elements in it that's not grounds for copyright infringement?

jazzyjackson

If you studied the source code, you can't say you've independently created the copy, the trick with clean room design is the implementers are going off a specification which does not include copyrighted material.

https://en.m.wikipedia.org/wiki/Clean-room_design

But maybe you could use one LLM to study the software and write a specification, then throw that over the wall to a different human who uses an LLM to write software based on that spec

spwa4

So you should do the same as with normal clean-room practices?

LLM1: code -> english description of code

LLM2: english description of code -> code

And that would be clean room? Might be cool to automate that. I bet you could train LLMs to do exactly that.

futasuki

I checked the author’s result. It is 100 percent BS. All hallucination. Has nothing to do with the original claude code.

saagarjha

> You might be wondering why I've dumped a transpilation of the source code of Claude Code onto GitHub and the reason is simple. I'm not letting an autonomous closed source agent run hands free on my infrastructure and neither should you.

Asking it for its source code (AI never lies, right?) and then buying it on your personal card so corporate security doesn’t know what you’re doing makes me feel a lot better about it.

zeckalpha

That's not the usual definition of clean room.

If you had it generate tests then handed the tests off to a second agent to implement against...

zahlman

> Please understand that restrictive software licenses no longer matter because these LLMs can be driven to behave like Bitcoin mixers that bypass licensing and copyright restrictions using the approach detailed in this blog post.

This reads to me like "Please understand that legal protections no longer matter because computers can now break the law for you automatically".

thegeomaster

This is total bullshit. It's clear by spending 2 minutes with the output, located on https://github.com/ghuntley/claude-code-source-code-deobfusc....

The AI has just made educated guesses about the functionality, wrote some sensible-looking code and hallucinated a whole lot.

The provided code on GitHub does not compile, does not work in the slightest, does not include any of the prompts from the original source, does not contain any API URLs and endpoints from the original, and uses Claude 3 Opus! And this is just from a cursory 5-minute look.

jasonjmcghee

This needs to be the top comment. Many folks are arriving on the post and taking it at face value. I don't know if it's ignorance or an attempt at a publicity stunt on the author's part, but it isn't at all what they claim.

Sharlin

Based on the author's other blog posts, they certainly seem to have drunk the LLM Kool-Aid. Likely enough of it to make their conclusions perhaps slightly biased.

thegeomaster

> I don't know if it's ignorance or an attempt at a publicity stunt on the author's part, but it isn't at all what they claim.

I let the author know on Twitter too: https://x.com/thegeomaster/status/1895869781229912233

If it's the former, I assume he will update or take down the blog post.

jasonjmcghee

> but if it’s not a 1:1 then deobfuscate is the wrong word for sure

"if it's not" is so troubling

jbellis

A better writeup on reverse engineering CC: https://github.com/Yuyz0112/claude-code-reverse

aeve890

People needs LLM to transpile JS now? Unless it can reliable extract semantics I don't see the novelty.

vlovich123

I’m pretty sure translation of a text into another language would still count as copyright infringement. It may be hard to prove, but this isn’t a copyright bypass.

Mathnerd314

There is a question of originality. If the variable names, comments, etc. are preserved, then yes, it is probably a derivative work. But here, where you are starting from the obfuscated code, there is an argument that the code is solely functional, hence doesn't have copyright protection. It's like how if I take a news article and write a new article with the same facts, there's no copyright protection (witness: news gets re-reported all the time). There is a fine line between "this is just a prompt, not substantial enough to be copyrightable" and "this is a derivative work" which is still being worked out in the legal system.

1970-01-01

Only if the translation is then published

HN

Yes, Claude Code can decompile itself. Here's the source code

Yes, Claude Code can decompile itself. Here's the source code