Beating cuBLAS in Single-Precision General Matrix Multiplication

gdiamos

GEMM has been the workhorse of machine learning. It’s amazing how we’ve ratcheted up the TFLOPS over the years.

I wonder what other algorithms allow hardware optimization like this.