Leaving Meta and PyTorch
15 comments · November 7, 2025
chopete3
> Every major AI company and hardware vendor are on a speed dial. This kind of power is really hard to give up. But curiosity ultimately won out in my head.
A simple feeling has such power. May he get an opportunity to create another powerful tool before retiring.
qmatch
As a loyal JAX user, I hope they can play catch-up. PyTorch has dominated the AI scene since TF1 fumbled the ball at the 10-yard line. What Matt Johnson has done turning Autograd into JAX will hopefully earn as much praise as what Soumith has received.
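For anyone who hasn't tried it, the Autograd lineage shows in the API: you write a plain numpy-style function and ask for its derivative as another function. A toy sketch (my own example, not from either project's docs):
    import jax
    import jax.numpy as jnp
    # A plain function on arrays...
    def loss(w, x, y):
        pred = jnp.dot(x, w)
        return jnp.mean((pred - y) ** 2)
    # ...is transformed into its gradient function, Autograd-style.
    grad_loss = jax.grad(loss)      # d(loss)/dw
    fast_grad = jax.jit(grad_loss)  # optionally compiled
    print(fast_grad(jnp.zeros(3), jnp.ones((5, 3)), jnp.ones(5)))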
n_u
> PyTorch has dominated the AI scene since TF1 fumbled the ball at the 10-yard line
Can you explain why you think TensorFlow fumbled?
zapnuk
For me it was about 8 years ago. Back then TF was already bloated and had two weaknesses: its bet on static compute graphs made writing code verbose and made debugging difficult.
The few people I knew back then used Keras instead. I switched to PyTorch for my next project, which was more "batteries included".
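Roughly the contrast, from memory of the old TF1 placeholder/session API (that half is sketched in comments, so only the PyTorch part actually runs):
    # TF1-style static graph: define everything symbolically, then run it in a session.
    #   x = tf.placeholder(tf.float32, shape=[None, 3])
    #   y = tf.reduce_sum(x * 2.0)
    #   with tf.Session() as sess:
    #       print(sess.run(y, feed_dict={x: data}))  # errors surface here, far from the definition
    # PyTorch eager: every line executes immediately, so you can print and step through it.
    import torch
    x = torch.ones(4, 3, requires_grad=True)
    y = (x * 2.0).sum()
    print(y)        # tensor(24., ...) -- available right away
    y.backward()    # gradients computed on the spot
    print(x.grad)   # all 2s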
intermerda
Do you have experience in both JAX and PyTorch? Why do you prefer JAX?
BoredPositron
The last few years must have been incredibly exhausting. Thanks for your work, good luck, and 73.
mxkopy
PyTorch is one of those tools that’s so simple and easy to take apart that you feel like you might’ve been able to make it yourself. I can’t imagine how much engineering effort was behind all those moments where I thought to myself, “of course it should work like that, how can it be any other way?”
TechnicolorByte
Can anyone recommend a technical overview describing the design decisions PyTorch made that led it to win out?
huevosabio
I don't know the full list, but back when it came out, TF felt like a crude set of bindings to the underlying C++/CUDA workhorse. PyTorch felt, in contrast, Pythonic. It was much closer in feeling to numpy.
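A toy illustration of that numpy-like feel (my own example, nothing official):
    import numpy as np
    import torch
    # NumPy
    a = np.arange(6.0).reshape(2, 3)
    print((a @ a.T).mean())
    # Nearly the same lines in PyTorch, but on tensors that can sit on a GPU and carry gradients.
    t = torch.arange(6.0).reshape(2, 3)
    print((t @ t.T).mean())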
puttycat
I think it was mostly the eager evaluation that made it possible to debug every step in the network's forward/backward passes. TensorFlow didn't have that at the time, which made debugging practically impossible.
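Concretely, in eager PyTorch every intermediate exists as a real tensor you can print or breakpoint on, and hooks let you watch gradients during the backward pass. A rough sketch:
    import torch
    x = torch.randn(4, 3, requires_grad=True)
    w = torch.randn(3, 2, requires_grad=True)
    h = x @ w                        # intermediate value exists immediately
    print(h.shape, h.mean().item())  # inspect (or set a debugger breakpoint) right here
    h.register_hook(lambda g: print("grad wrt h:", g.shape))  # peek into the backward pass
    loss = (h ** 2).mean()
    loss.backward()
    print(w.grad.shape)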
perfmode
Respect.
msmd74
Sounds like you had a momentous run.
If you take advice from reformed Internet trolls, consider turning off all your devices and trying to give yourself at least a week, but ideally a month offline staring at your new baby. You'll never get that time back and there's nothing your brain will appreciate more than loading up those memories as they grow.
Good luck.
zkmon
[flagged]
theusus
[flagged]
That man has an infectious enthusiasm. I remember the DCGAN paper inspired me to try getting the (Lua) Torch code to work, and I tried it on the Oxford flowers dataset early on. It worked surprisingly well, and Soumith Chintala even shared it around on social media, surprised at how well it worked on such a small dataset. Of course, back then we didn't really appreciate the problem of mode collapse.
PyTorch and old Lua Torch were a pleasure to work with compared to the contemporary TensorFlow. Lots of S.C.'s code was copied around liberally; it had its quirks (I remember the DCGAN code had a pretty odd way of doing parameter passing), but it was also really easy to understand, and it made random people like me feel like we had suddenly stumbled onto something crazy powerful (which we had!). It was wonderfully hackable.