Matrix Core Programming on AMD CDNA Architecture

phkahler

So from CDNA3 to CDNA4 they doubled fp16 and fp8 performance but cut fp32 and fp64 in half?

Wonder why the regression on non-AI workloads?

bigdict

Because of area and power.