Matrix-vector multiplication implemented in off-the-shelf DRAM for Low-Bit LLMs
4 comments · May 4, 2025

Bolwin
They're doing matrix operations in the DRAM itself? That sounds insane and also fascinating.
nkurz
Yup, and incredibly they are able to do this on standard RAM by "intentionally violating the timing parameters":
Processing-Using-DRAM (PUD) leverages the inherent analog operational characteristics of DRAM to enable highly parallel bit-serial computations directly within memory arrays. Prior research has demonstrated that commercial off-the-shelf DRAM can achieve PUD functionality without hardware modifications by intentionally violating the timing parameters.
These studies have established two fundamental PUD operations: RowCopy and majority-of-X (MAJX) (Fig. 1). The RowCopy operation facilitates data movement between different rows within a subarray by issuing a PRE command followed immediately by an ACT command before bitline precharging completes, enabling data transfer through the bitlines. This operation affects all cells along a row simultaneously, making it approximately 100 times faster than processor-mediated data movement. The MAJX operation performs a majority vote among X cells sharing the same bitline that are activated simultaneously, implemented in commercial DRAM by issuing ACT, PRE, and ACT commands in rapid succession without delays. This allows concurrent activation of 2∼32 rows. MAJX enables bit-serial computations that leverage the parallelism of subarrays with 65,536 columns, serving as the fundamental computational unit for PUD.
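To make the MAJX idea concrete, here is a small software emulation sketch (my own illustration, not code from the paper): each DRAM column ("bitline") computes a majority vote over the rows activated on it, in parallel across all columns. A useful consequence the PUD literature relies on is that MAJ3 with one input pinned to a constant row gives you AND (constant 0) and OR (constant 1), and MAJ3 of two operands plus a carry-in is exactly the carry function of a full adder.

```python
import numpy as np

def majx(*rows):
    """Majority vote over X boolean rows, computed independently per
    column -- a software stand-in for the MAJX primitive. Real DRAM
    subarrays do this across 65,536 columns at once."""
    stacked = np.stack(rows)
    return stacked.sum(axis=0) > (len(rows) // 2)

# A toy "subarray" with 4 columns (bitlines).
a     = np.array([1, 1, 0, 0], dtype=bool)
b     = np.array([1, 0, 1, 0], dtype=bool)
zeros = np.zeros(4, dtype=bool)   # constant-0 row
ones  = np.ones(4, dtype=bool)    # constant-1 row

and_ab = majx(a, b, zeros)   # MAJ3(a, b, 0) == a AND b, per column
or_ab  = majx(a, b, ones)    # MAJ3(a, b, 1) == a OR b, per column

cin   = np.array([0, 1, 1, 0], dtype=bool)
carry = majx(a, b, cin)      # MAJ3(a, b, cin) == full-adder carry-out
```

Bit-serial arithmetic then loops this over bit positions: operands are laid out one bit per row, and each MAJX step advances every column's computation simultaneously, which is where the massive parallelism comes from.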
summarity
Getting LLM inference running on anything is going to be the next "it runs Doom".
This is just mind-bendingly weird and wonderfully creative. It can pay to work in the weeds! Bravo.