
HipKittens: Fast and furious AMD kernels

bratao

One thing I don't understand about Nvidia’s valuation: right now a small number of algorithms, such as Transformers, have 'won,' and what matters most is the data. In the past, when custom code such as modeling code and HPC was much more common, the ecosystem was very important, and it was almost impossible for a competitor to reimplement all of CUDA and its related code.

Competitors now only need to optimize for a narrow set of algorithms. If a vendor can run vLLM and Transformers efficiently, a massive market becomes available. Consequently, companies like AMD or Huawei should be able to catch up easily. What, then, is Nvidia’s moat? Is InfiniBand enough?

LtdJorge

The vast number of CUDA libraries for anything you can think of. I think that’s where they have the biggest leverage.

wmf

InfiniBand is being replaced with UEC (and it isn't needed for inference). For inference there is no moat, and smart players are buying AMD or Google TPUs.

mandelken

I didn't know you can buy Google TPUs now?

LtdJorge

Ahh, composable-kernel. The highest offender in the list of software that has produced unrecoverable OOMs on my Gentoo system (it’s actually Clang while compiling CK, which uses upwards of 2.5GB per thread).

wewewedxfgdf

You'd think AMD would swing in on something like this and fund it with the money needed to succeed. I have no knowledge of it, but my guess is no: AMD never misses an opportunity to miss an opportunity when it comes to GPUs and AI.

LtdJorge

First rule of AMD stock is nobody understands AMD stock. I guess it’s also the same for AMD’s software endeavors.