How to scale your model: A systems view of LLMs on TPUs
9 comments · February 4, 2025
3abiton
I am really looking forward to JAX taking over PyTorch/CUDA over the next few years. The whole PTX kerfuffle with the DeepSeek team shows the value of investing in lower-level approaches to squeeze the most out of your hardware.
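The basic workflow already shows the appeal: you write plain numpy-style code and `jax.jit` hands the whole function to XLA, which does the fusion and layout work you would otherwise hand-tune. A minimal sketch (the toy `mlp` function and the shapes are made up for illustration):

```python
import jax
import jax.numpy as jnp

# Toy MLP block: two matmuls with a ReLU in between (invented example).
def mlp(x, w1, w2):
    return jnp.maximum(x @ w1, 0.0) @ w2

key = jax.random.PRNGKey(0)
k1, k2, k3 = jax.random.split(key, 3)
x = jax.random.normal(k1, (64, 1024))
w1 = jax.random.normal(k2, (1024, 4096))
w2 = jax.random.normal(k3, (4096, 1024))

# jax.jit traces the function once and compiles it with XLA;
# the compiler fuses ops and picks layouts for the target hardware.
mlp_jit = jax.jit(mlp)
y = mlp_jit(x, w1, w2)   # first call compiles; later calls reuse the binary
y.block_until_ready()    # dispatch is async, so force completion before timing
```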
kadushka
Most PyTorch users don’t bother even with the simplest performance optimizations, and you are talking about PTX.
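Even the one-liners go unused. Something like this (a toy sketch; the model and sizes are invented) is usually the first thing to try:

```python
import torch

# Invented toy model, just to show the shape of the one-line optimizations.
model = torch.nn.Sequential(
    torch.nn.Linear(1024, 4096),
    torch.nn.GELU(),
    torch.nn.Linear(4096, 1024),
)
x = torch.randn(64, 1024)

# 1. Skip autograd bookkeeping entirely for inference-only passes.
with torch.inference_mode():
    y_eager = model(x)

# 2. Let torch.compile (PyTorch 2.x) fuse ops into fewer kernels.
compiled = torch.compile(model)
with torch.inference_mode():
    y_compiled = compiled(x)  # first call compiles; later calls are fast
```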
memhole
This is awesome! Can't wait to read it. I've been very curious about why we don't hear more about LLMs on TPUs.
lordswork
This has been my bible for performance work internally at Google. Kind of surprised they released it publicly, but I guess they removed all the Gemini-specific details.
mattjjatgoogle
An author's tweet thread: https://x.com/jacobaustin132/status/1886844716446007300
awongh
Further down the thread (https://x.com/jacobaustin132/status/1886844724339675340) he says: `5 years ago, there were many ML architectures, but today, there is (mostly) only one [transformers].`
To what degree is this actually true, and what else is on the horizon that might become as popular as transformers?
whatever1
How do they make these fancy animations?
perfobotto
What an amazing write-up! Thank you very much!
hassleblad23
Great writeup. Congrats.