
PyTorch Internals: Ezyang's Blog

32 comments

March 22, 2025

smokel

Also interesting in this context is the PyTorch Developer Podcast [1] by the same author. Very comforting to learn about PyTorch internals while doing the dishes.

[1] https://pytorch-dev-podcast.simplecast.com/

swyx

I think the problem with the podcast format (ironic for me to say) is that it assumes a much higher familiarity with the APIs than any visual medium, including blogs, can provide.

smokel

Agreed, but I'm still very happy that some people try. I'm really not that interested in the weather or idle chit-chat, and for some reason most podcasts seem to focus on that.

zcbenz

For learning the internals of ML frameworks I recommend reading the source code of MLX: https://github.com/ml-explore/mlx

It is a modern and clean codebase with no legacy baggage, and I could understand most of it without consulting external articles.

ForceBru

Why is MLX Apple silicon only? Is there something fundamental that prevents it from working on x86? Are some core features only possible on Apple silicon? Or do the devs specifically refuse to port to x86? (Which is understandable, I guess)

I'm asking because it seems to have nice autodiff functionality. It even supports differentiating array mutation (https://ml-explore.github.io/mlx/build/html/usage/indexing.h...), which is something JAX and Zygote.jl can't do. Instead, both have ugly tricks like `array.at[index].set` and the `Buffer` struct.

So it would be cool to have this functionality on a "regular" CPU.
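
A minimal sketch of the contrast being described, using JAX's actual `at[...].set(...)` idiom; the MLX in-place form is only paraphrased from its indexing docs linked above, so treat it as an assumption:

    # JAX: arrays are immutable, so "mutation" goes through the functional .at[] API
    import jax.numpy as jnp
    from jax import grad

    def f(x):
        x = x.at[0].set(x[0] * 2.0)   # returns a new array with index 0 updated
        return x.sum()

    print(grad(f)(jnp.ones(3)))       # [2. 1. 1.]

    # MLX (per its indexing docs, not verified here): plain in-place assignment
    # such as x[0] = x[0] * 2.0 can appear inside a function being differentiated.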

zcbenz

Most features are already supported on x86 CPUs: you can pip install mlx on Linux, and you can even use it on Windows (no official binary release yet, but it builds and the tests pass).
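
A rough sketch of what CPU-only use might look like, assuming the standard mlx.core API (mx.array, mx.grad); the install command is taken from the comment above:

    # pip install mlx   (Linux/x86 CPU wheels, per the comment above)
    import mlx.core as mx

    def f(x):
        return (x ** 2).sum()   # scalar output, as mx.grad requires

    x = mx.array([1.0, 2.0, 3.0])
    print(mx.grad(f)(x))        # expect the gradient 2*x -> [2, 4, 6]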

saagarjha

I think it relies heavily on unified memory.

alexrigler

This is a fun blast from the near past. I helped organize the PyTorch NYC meetup where Ed presented this and still think it's one of the best technical presentations I've seen. Hand-drawn slides for the W. Wish I had recorded it :\

chuckledog

Great article, thanks for posting. Here’s a nice summary of automatic differentiation, mentioned in the article and core to how NNs are implemented: https://medium.com/@rhome/automatic-differentiation-26d5a993...
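
As a minimal illustration of reverse-mode autodiff as PyTorch exposes it (a sketch, not taken from the article or the linked post):

    import torch

    x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
    y = (x ** 2).sum()   # forward pass records the autograd graph
    y.backward()         # reverse-mode AD walks the graph backwards
    print(x.grad)        # tensor([2., 4., 6.]) == dy/dx = 2x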

hargun2010

I guess it's a longer version of the slides, but it's not new; I saw comments on it from as far back as 2023. Nonetheless, good content (worth resharing).

https://web.mit.edu/~ezyang/Public/pytorch-internals.pdf

aduffy

Edward taught a Programming Languages class I took nearly a decade ago, and clicking through here I immediately recognized the illustrated slides; it brought a smile to my face.

vimgrinder

In case it helps someone: if you're having trouble reading long articles, try text-to-speech with line highlighting. It helps a lot; it has cured my lack of attention.

PeterStuer

No trouble reading the article. Those slides though. Make my eyes hurt :(

vimgrinder

They are constantly referred to in the text :/ impossible to skip.

pizza

Btw, would anyone have any good resources on using PyTorch as a general-purpose graph library? Like stuff beyond the assumption that nets are forward-only (acyclic) digraphs.

brutus1979

Is there a video version of this? It seems to be from a talk.

nitrogen99

2019. How much of this is still relevant?

mlazos

I used this to onboard to the PyTorch team a few years ago. It’s useful for understanding the key concepts of the framework. torch.compile isn’t covered, but the rest of it is still pretty relevant.
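
For readers wondering what torch.compile adds on top of the eager-mode internals the article covers, a minimal usage sketch:

    import torch

    def fn(x):
        return torch.sin(x) + torch.cos(x)

    compiled = torch.compile(fn)        # PyTorch 2.x: traces and JIT-compiles fn
    out = compiled(torch.randn(1000))   # first call triggers compilation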

kadushka

I’m guessing about 80%

sidkshatriya

To understand a complex system, it is sometimes better to first understand a (simpler) model system. Sometimes an older version of the same system is that model system. This is not always true, but it is a good rule of thumb.