
How to scale your model: A systems view of LLMs on TPUs

3abiton

I'm really looking forward to JAX overtaking PyTorch/CUDA over the next few years. The whole PTX kerfuffle with the DeepSeek team shows the value of investing in lower-level approaches to squeeze the most out of your hardware.
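For concreteness, JAX's experimental Pallas API is one way to do this kind of below-the-framework kernel work (roughly analogous to dropping from CUDA down to PTX). A minimal sketch; the `add_kernel` here is just an illustrative elementwise kernel, not anything from the article:

```python
import jax
import jax.numpy as jnp
from jax.experimental import pallas as pl

def add_kernel(x_ref, y_ref, o_ref):
    # Refs are mutable views into on-chip buffers; this body is the
    # whole kernel, one level below the usual jnp ops.
    o_ref[...] = x_ref[...] + y_ref[...]

def add(x, y):
    return pl.pallas_call(
        add_kernel,
        out_shape=jax.ShapeDtypeStruct(x.shape, x.dtype),
        interpret=True,  # interpreter mode so the sketch runs without a TPU/GPU
    )(x, y)

x = jnp.arange(8, dtype=jnp.float32)
print(add(x, x))  # [ 0.  2.  4.  6.  8. 10. 12. 14.]
```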

kadushka

Most PyTorch users don't bother with even the simplest performance optimizations, and you're talking about PTX.
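For reference, two of those "simplest" optimizations in a minimal sketch; the toy model here is a hypothetical stand-in:

```python
import torch

# Hypothetical toy model standing in for a real one.
model = torch.nn.Sequential(
    torch.nn.Linear(1024, 1024),
    torch.nn.GELU(),
    torch.nn.Linear(1024, 1024),
)

model = torch.compile(model)                # fuse ops into faster generated kernels
torch.set_float32_matmul_precision("high")  # allow TF32 matmuls on Ampere+ GPUs

x = torch.randn(32, 1024)
with torch.inference_mode():                # skip autograd bookkeeping at inference
    y = model(x)
```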

memhole

This is awesome! Can't wait to read it. I've been very curious about why we don't hear more about LLMs on TPUs.

lordswork

This has been my bible for performance work internally at Google. Kind of surprised they released it publicly, but I guess they removed all the Gemini-specific details.

awongh

Here in the thread he says (https://x.com/jacobaustin132/status/1886844724339675340): `5 years ago, there were many ML architectures, but today, there is (mostly) only one [transformers].`

To what degree is this actually true, and what else is on the horizon that might become as popular as transformers?

whatever1

How do they make these fancy animations?

perfobotto

What an amazing write up! Thank you very much!

hassleblad23

Great writeup. Congrats.