HipKittens: Fast and furious AMD kernels
7 comments
November 14, 2025 · bratao
LtdJorge
The vast number of CUDA libraries for anything you can think of. I think that’s where they have the biggest leverage.
LtdJorge
Ahh, composable-kernel. The worst offender in the list of software that has produced unrecoverable OOMs on my Gentoo system (it’s actually Clang while compiling CK, which uses upwards of 2.5GB per thread).
wewewedxfgdf
You'd think AMD would swing in on something like this and fund it with the money needed to succeed. I have no knowledge of it but my guess is no, AMD never misses an opportunity to miss an opportunity - when it comes to GPUs and AI.
LtdJorge
First rule of AMD stock is nobody understands AMD stock. I guess it’s also the same for AMD’s software endeavors.
One thing I don't understand about Nvidia’s valuation is that right now a small number of algorithms have 'won,' such as Transformers. The data is very important. In the past, when custom code such as modeling and HPC code was much more common, the ecosystem was very important, and it was almost impossible for a competitor to reimplement all of CUDA and the related code.
Competitors now only need to optimize for a narrow set of algorithms. If a vendor can run vLLM and Transformers efficiently, a massive market becomes available. Consequently, companies like AMD or Huawei should be able to catch up easily. What, then, is Nvidia’s moat? Is InfiniBand enough?