PyTorch Internals: Ezyang's Blog
32 comments · March 22, 2025 · smokel
swyx
I think the problem with the podcast format (ironic for me to say) is that it assumes much higher familiarity with the APIs than any visual medium, including blogs, can provide.
smokel
Agreed, but I'm still very happy that some people try. I'm really not that interested in the weather or in idle chit-chat, and for some reason most podcasts seem to focus on that.
zcbenz
For learning the internals of ML frameworks, I recommend reading the source code of MLX: https://github.com/ml-explore/mlx
It is a modern and clean codebase without legacy baggage, and I could understand most things without consulting external articles.
ForceBru
Why is MLX Apple silicon only? Is there something fundamental that prevents it from working on x86? Are some core features only possible on Apple silicon? Or do the devs specifically refuse to port to x86? (Which is understandable, I guess)
I'm asking because it seems to have nice autodiff functionality. It even supports differentiating array mutation (https://ml-explore.github.io/mlx/build/html/usage/indexing.h...), which is something JAX and Zygote.jl can't do. Instead, both have ugly tricks like `array.at[index].set` and the `Buffer` struct.
So it would be cool to have this functionality on a "regular" CPU.
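(For context, this is the functional-update idiom the parent is calling an ugly trick; a minimal JAX sketch, with the toy function `f` being my own example:)

```python
import jax
import jax.numpy as jnp

def f(x):
    # JAX arrays are immutable, so an in-place write is spelled as a
    # functional update that returns a new array:
    x = x.at[0].set(x[0] * 2.0)
    return jnp.sum(x ** 2)

# Gradients flow through the .at[...].set(...) update just fine;
# the complaint is purely about the awkward syntax.
print(jax.grad(f)(jnp.array([1.0, 2.0, 3.0])))  # [8., 4., 6.]
```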
zcbenz
Most features are already supported on x86 CPUs: you can `pip install mlx` on Linux, and you can even use it on Windows (no official binary release yet, but it builds and the tests pass).
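(To illustrate, a minimal sketch; I'm assuming that on non-Apple hardware the default device falls back to the CPU backend, per the docs:)

```python
import mlx.core as mx

a = mx.array([1.0, 2.0, 3.0])
b = mx.array([4.0, 5.0, 6.0])

def loss(x):
    return mx.sum(x * b)

# mx.grad returns a gradient function, JAX-style; on an x86 box
# this runs on the CPU backend.
print(mx.grad(loss)(a))  # the gradient of sum(x * b) w.r.t. x is b
```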
saagarjha
I think it relies heavily on unified memory.
alexrigler
This is a fun blast from the near past. I helped organize the PyTorch NYC meetup where Ed presented this, and I still think it's one of the best technical presentations I've seen. Hand-drawn slides for the W. Wish I'd recorded it :\
chuckledog
Great article, thanks for posting. Here's a nice summary of automatic differentiation, which is mentioned in the article and is core to how NNs are implemented: https://medium.com/@rhome/automatic-differentiation-26d5a993...
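(For a concrete taste of what that summary describes, a minimal reverse-mode sketch in PyTorch itself; the toy tensor is my own example:)

```python
import torch

x = torch.tensor([1.0, 2.0], requires_grad=True)
y = (x ** 2).sum()   # forward pass records the computation graph
y.backward()         # reverse pass traverses it, accumulating gradients
print(x.grad)        # tensor([2., 4.]) == dy/dx
```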
hargun2010
I guess it's a longer version of the slides, but it's not new; I saw comments on it from as far back as 2023. Nonetheless, it's good content (worth resharing).
aduffy
Edward taught a Programming Languages class I took nearly a decade ago, and clicking through here I immediately recognized the illustrated slides. It brought a smile to my face.
vimgrinder
In case it helps someone: if you have trouble reading long articles, try text-to-audio with line highlighting. It helps a lot; it cured my lack of attention.
PeterStuer
No trouble reading the article. Those slides, though. They make my eyes hurt :(
vimgrinder
They are constantly referred to in the text :/ impossible to skip.
pizza
Btw, would anyone have good resources on using PyTorch as a general-purpose graph library? I.e., anything beyond the assumption that nets are forward-only (acyclic) digraphs?
bilal2vec
See also the dev forum roadmaps [1] and design docs (e.g. [2], [3], [4]).
[1]: https://dev-discuss.pytorch.org/t/meta-pytorch-team-2025-h1-...
[2]: https://dev-discuss.pytorch.org/t/pytorch-symmetricmemory-ha...
[3]: https://dev-discuss.pytorch.org/t/where-do-the-2000-pytorch-...
[4]: https://dev-discuss.pytorch.org/t/rethinking-pytorch-fully-s...
brutus1979
Is there a video version of this? It seems to be from a talk.
nitrogen99
2019. How much of this is still relevant?
mlazos
I used this to onboard to the PyTorch team a few years ago. It's useful for understanding the key concepts of the framework. `torch.compile` isn't covered, but the rest of it is still pretty relevant.
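(For anyone who hasn't met it, `torch.compile` wraps a function or module for JIT compilation; a minimal sketch, with the toy function being my own example:)

```python
import torch

def f(x):
    return torch.sin(x) ** 2 + torch.cos(x) ** 2

compiled_f = torch.compile(f)  # captured by TorchDynamo, lowered by Inductor

x = torch.randn(1000)
print(torch.allclose(compiled_f(x), f(x)))  # True: same math, potentially fused/faster
```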
kadushka
I’m guessing about 80%
sidkshatriya
To understand a complex system, it is sometimes better to first understand a (simpler) model system, and sometimes an older version of the same system is that model. This is not always true, but it's a good rule of thumb.
Also interesting in this context is the PyTorch Developer Podcast [1] by the same author. Very comforting to learn about PyTorch internals while doing the dishes.
[1] https://pytorch-dev-podcast.simplecast.com/