
DeepSeek R1 671B over 2 tok/s without GPU on local gaming rig

htrp

LMSYS had a package (FlexGen) that did a lot of similar work (offloading from GPU to RAM to disk).

Not sure if it's still being maintained.

buyucu

I applaud how hardcore this is: swapping the model in from disk and keeping only the KV cache in CPU RAM.

Oarch

Can someone ELI5 please?

buyucu

DeepSeek is huge, with 671B parameters. They keep it on disk and load it piece by piece into RAM. The innovation is that they kick everything other than the KV cache out of RAM.
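
To make the "piece by piece" idea concrete, here is a toy sketch in Python. It is not the setup from the post: the layer count, hidden size, file layout, and every function name below are made up for illustration, and the "KV cache" is a simple stand-in (a real one stores attention keys and values). It only shows the pattern of reading each layer's weights from disk, using them, dropping them, and keeping the per-token cache in RAM.

```python
import os
import struct
import tempfile

HIDDEN = 4    # toy hidden size (assumption, real models use thousands)
N_LAYERS = 3  # toy layer count (assumption)


def save_layers(directory):
    """Write one small weight matrix per layer to its own file on disk."""
    paths = []
    for layer in range(N_LAYERS):
        # Diagonal toy weights: layer i multiplies every value by (i + 1).
        weights = [float(layer + 1) if r == c else 0.0
                   for r in range(HIDDEN) for c in range(HIDDEN)]
        path = os.path.join(directory, f"layer_{layer}.bin")
        with open(path, "wb") as f:
            f.write(struct.pack(f"{HIDDEN * HIDDEN}d", *weights))
        paths.append(path)
    return paths


def load_layer(path):
    """Read one layer's weights from disk into a list of rows."""
    with open(path, "rb") as f:
        flat = struct.unpack(f"{HIDDEN * HIDDEN}d", f.read())
    return [list(flat[r * HIDDEN:(r + 1) * HIDDEN]) for r in range(HIDDEN)]


def matvec(w, x):
    """Plain matrix-vector product."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def run_token(paths, x, kv_cache):
    """Process one token: stream each layer from disk, keep only the cache."""
    for layer, path in enumerate(paths):
        w = load_layer(path)       # weights come from disk on every use
        x = matvec(w, x)
        kv_cache[layer].append(x)  # stand-in for keys/values; stays in RAM
        del w                      # weights are dropped right after use
    return x


def demo():
    with tempfile.TemporaryDirectory() as d:
        paths = save_layers(d)
        kv_cache = [[] for _ in paths]  # the only state kept between tokens
        for _ in range(2):              # two toy "tokens"
            out = run_token(paths, [1.0] * HIDDEN, kv_cache)
        return out, [len(entries) for entries in kv_cache]
```

Calling `demo()` runs two tokens through the three toy layers. The weights are re-read from disk for each token, while the cache grows by one entry per layer per token. The trade-off is the one the thread describes: RAM use stays small, but every token pays for disk reads, which is why throughput is measured in a few tokens per second.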
