Yes, Claude Code can decompile itself. Here's the source code
44 comments · March 1, 2025 · markisus
haolez
But it should be easy to generate such data to train an AI to do that, if someone wants, no?
soulofmischief
Some transformations irrecoverably lose information. A recontextualization engine such as an LLM might be able to "recover" some information by comparing it to other code in its training set, but it's still a guess and not all code will have representation in the training set.
brookst
LLMs can synthesize info not in the training set. They should be just as capable of looking at a binary where info has been lost and recreating source just like a human would. Stuff like variable names won’t be exact (for either human or LLM) but can be reasonable inferences based on usage.
It’s a guess, sure, but I don’t see why it would be a less good guess than a human’s.
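To make the naming point concrete, here's a toy sketch (both snippets invented for illustration): the first function is what a minifier leaves behind, the second is the kind of reconstruction that usage alone supports.

```typescript
// Minified form: the structure survives, the names don't.
const a = (b: number[], c: number) => b.filter((d) => d > c);

// One plausible reconstruction, inferred purely from usage. The names
// "filterAbove", "values", and "threshold" are guesses; the originals
// are unrecoverable from the minified form alone.
const filterAbove = (values: number[], threshold: number) =>
  values.filter((value) => value > threshold);

console.log(filterAbove([1, 5, 10], 4)); // [5, 10]
```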
Retr0id
This is exactly what skilled human reverse engineers do. Almost no code is truly novel, at best it's a novel recombination of existing ideas. So yes, some information is lost, but all information relevant to the execution of the program can certainly be recovered.
Kye
I wonder if this would be an ideal use case for diffusion-based LLMs, slowly piecing it together with each pass from what they could determine in the last.
kelsey98765431
Replace the word easy there with the phrase "technically may be possible with great effort and expense, probably"
kees99
A decent optimizing compiler necessarily loses a fair bit of source information: loop unrolling, function inlining, tail-call optimization, etc.
There is no good way to reconstruct that, AI/ML or not.
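A hand-written sketch of the loop-unrolling case (illustrative TypeScript, not real compiler output):

```typescript
// Source as written: a clear loop over a fixed-size array.
function sumFour(xs: [number, number, number, number]): number {
  let total = 0;
  for (const x of xs) total += x;
  return total;
}

// Roughly what an unrolling optimizer leaves behind (sketched back
// into source form): the loop is gone, and nothing in the output
// records that it ever existed.
function sumFourUnrolled(xs: [number, number, number, number]): number {
  return xs[0] + xs[1] + xs[2] + xs[3];
}
```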
brookst
For loop unrolling at least, wouldn’t most human reverse engineers see the original loop? It’s far more likely that the source had a loop than that some high level language programmer did the unrolling in source.
mpalmer
Three years ago, you wrote
> Systemically, I'm concerned that there is a lack of professional liability, rigorous industry best practices, and validation in the software industry which contributes to why we see Boeings flying themselves into the ground, financial firms losing everyone's data day in and out, and stories floating around our industry publications about people being concerned about the possibility of a remotely exploitable lunar lander on Mars.
> There's a heap of [comical?] tropes in the software industry that are illogical/counterproductive to the advancement of our profession and contribute to why other professions think software developers are a bunch of immature spoiled children that require constant supervision.
3 weeks ago you posted something titled "The future belongs to people who can just do things".
Today you post this:
> Because cli.mjs is close to 5mb - which is way bigger than any LLM context window out here. You're going to need baby sit it for a while and feed it reward tokens of kind words ("your doing good, please continue") and encourage it to keep on going on - even if it gives up. It will time out, lots...
I don't think you are someone who can just "do things" if you think a good way to de-obfuscate 5MB of minified javascript is to pass it to a massive LLM.
Do you think you are advancing your profession?
viraptor
I'm not sure why this is framed as an issue for security teams. Transpiling software has been a thing for ages, especially in the JS world. Decompiling has been a bit harder without automation, but unless you have black-box tests, verifying that the result has matching functionality will take ages.
So why would the blue teams care beyond "oh fun, a new tool for speeding up malware decompilation"?
Edit: To be clear, I get the new reverse engineering and reimplementation possibilities got much better and simpler. But the alarmist tone seems weird.
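To make the black-box-testing point concrete, the harness doesn't need to be more than this (module paths and `parseConfig` are hypothetical stand-ins):

```typescript
import assert from "node:assert";

// Hypothetical imports: the shipped minified build and the
// LLM-produced rewrite, exposing the same entry point.
import { parseConfig as original } from "./dist/original.js";
import { parseConfig as rewritten } from "./src/rewritten.js";

// Capture either the return value or the thrown error, so that
// failure behavior has to match too.
const outcome = (fn: (s: string) => unknown, input: string) => {
  try { return { ok: fn(input) }; } catch (e) { return { err: String(e) }; }
};

// Black-box equivalence: same inputs in, same observable behavior out.
const samples = ["{}", '{"retries": 3}', "not json at all"];
for (const input of samples) {
  assert.deepStrictEqual(outcome(rewritten, input), outcome(original, input));
}
```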
giancarlostoro
Agreed. Are we supposed to stop developing all software as a result? Ask for LLMs to censor reverse engineering? Someone else won't care and will build another LLM to bypass the limitations.
Avicebron
It seems like "red-teaming" and "security research" have become more socially prominent recently, so people naturally aligned with grift, e.g. making things seem alarmist and acting as if they're the only ones in the know, are trying to seem part of the club?
SebFender
On point. We're not really interested in these things. Yeah, we take a look and stay informed, but the main focus remains the bypass of controls and the data itself.
With decent backend controls, apps don't (and shouldn't) do much in the end. Once you show information on a screen, consider it potentially gone.
IshKebab
Erm sure... so is the output actually any good? I don't think anyone doubted that the LLM could produce some output but I would like to know if it is actually good output. Does it compile? Does it make sense?
causal
Why does the post avoid this obvious question? Claude is impressive, but it still hallucinates a lot.
You really need to be able to build + run + verify features + compare compiled outputs; then you can be somewhat confident it really did what the author is claiming.
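The "compare compiled outputs" step doesn't need to be fancy (paths here are hypothetical):

```typescript
import { createHash } from "node:crypto";
import { readFileSync } from "node:fs";

// A byte-identical rebuild is the strongest possible evidence; real
// bundlers rarely reproduce exact bytes, so in practice you fall back
// to a behavioral test suite.
const sha256 = (path: string) =>
  createHash("sha256").update(readFileSync(path)).digest("hex");

const shipped = sha256("./original/cli.mjs");
const rebuilt = sha256("./rebuilt/cli.mjs");
console.log(
  shipped === rebuilt
    ? "byte-identical rebuild"
    : "bytes differ; compare behavior, not bytes"
);
```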
saagarjha
> You might be wondering why I've dumped a transpilation of the source code of Claude Code onto GitHub and the reason is simple. I'm not letting an autonomous closed source agent run hands free on my infrastructure and neither should you.
Asking it for its source code (AI never lies, right?) and then buying it on your personal card so corporate security doesn’t know what you’re doing makes me feel a lot better about it.
mtrovo
I don't understand Anthropic's decision to release this project as an npm package but not open-source it. Claude Code is such a great example of how agents could work in the future that the whole community could benefit from studying it. Plus, the work on integrating MCPs alone could create a huge network-effect opportunity for them, one that's much bigger than keeping the source code secret.
All they've done so far is add an unnecessary step by putting a bounty on who will be the first to extract all the prompts and the agent orchestration layer.
Etheryte
One obvious reason not to make it open source is licensing: if you do, all of your competitors can cookie-cutter copy what you're doing.
jbellis
A better writeup on reverse engineering CC: https://github.com/Yuyz0112/claude-code-reverse
aeve890
People need an LLM to transpile JS now? Unless it can reliably extract semantics, I don't see the novelty.
yodon
I found this article [0] by the same author and linked in the post more personally valuable - great insights into expert-level use of Cursor.
gabrieledarrigo
A pretty boring article.
causal
The concept was interesting (build up your own stdlib of Cursor rules), but there's a kind of hyperbolic, click-baity flavor to both articles that undermines them a little. I think they would stand fine on their own if they cooled the dramatics a bit.
amelius
> these LLMs are shockingly good at transpilation and structure-to-structure conversions
I wonder if it's possible to transpile all the C-based Python extension modules to a no-GIL API version this way.
Retr0id
I'm confident you could get code output that compiles and runs, but not confident that you wouldn't end up with subtle race conditions. Aside from anything else, there's not much no-GIL CPython code in training sets yet.
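For a feel of how subtle those races are, here's an analogous bug sketched in TypeScript (the real hazard lives in C extension code that implicitly relied on the GIL):

```typescript
// This counter was "safe" when callers never interleaved, the moral
// equivalent of relying on the GIL.
let counter = 0;

async function increment(): Promise<void> {
  const snapshot = counter;                    // read
  await new Promise((r) => setTimeout(r, 0));  // yield, as real I/O would
  counter = snapshot + 1;                      // write back a stale value
}

// Run 1000 increments "concurrently": every call reads 0 before any
// write lands, so the final count is 1, not 1000. A mechanical
// translation preserves this code, and this bug, exactly.
await Promise.all(Array.from({ length: 1000 }, () => increment()));
console.log(counter); // 1
```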
licnep
Interesting, I never thought about this use case before, but LLMs may be exceedingly good at code deobfuscation and decompilation.
The article references a much more impressive task, where a user automatically "decompiled" a binary .exe game into Python. But I read their original post, and here is what that user said.
> Several critics seemed to assume I claimed Claude had "decompiled" the executable in the traditional sense. In reality, as I described in our conversation, it analyzed visible strings and inferred functionality - which is still impressive but different from true decompilation.
So I’m not sure the implications are as big as the article's author claims. It seems Claude is good at de-minifying JavaScript, but that is a long way from decompiling highly optimized binary code.
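For reference, "analyzed visible strings" amounts to something like this (path hypothetical; a minimal sketch of the classic `strings` approach):

```typescript
import { readFileSync } from "node:fs";

// Roughly what the Unix `strings` utility does: pull runs of four or
// more printable ASCII bytes out of a binary. Inferring behavior from
// these is educated guessing, not decompilation.
const bytes = readFileSync("./game.exe"); // hypothetical path
const visible = bytes.toString("latin1").match(/[\x20-\x7e]{4,}/g) ?? [];
console.log(visible.slice(0, 20));
```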