How to Run DeepSeek R1 Distilled Reasoning Models on RyzenAI and Radeon GPUs
February 2, 2025
larntz
I wrote a similar post about a week ago, but for an [unsupported] Radeon RX 5500 with 4 GiB of VRAM, using Ollama on Fedora 41. It can only run llama3.2 or deepseek-r1:1.5b, but those are pretty usable if you're okay with a small model and it's for personal use.
I didn't go into detail about how to set up Open WebUI, but there is documentation for that on the project's site.
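For anyone who wants the short version, it's roughly this (copied from the Open WebUI docs from memory, so treat the port, volume name and image tag as their suggested defaults rather than gospel):

  # run Open WebUI in a container and let it reach the host's Ollama on port 11434
  docker run -d -p 3000:8080 \
    --add-host=host.docker.internal:host-gateway \
    -v open-webui:/app/backend/data \
    --name open-webui --restart always \
    ghcr.io/open-webui/open-webui:main

Then open http://localhost:3000 and pick whatever model you've pulled with ollama.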
heavyset_go
As an aside, either the latest Linux kernel or the upcoming 6.14 release has (or will have) support for the Ryzen XDNA AI engines on AMD's mobile APUs.
Might not be appropriate for this model, but it could be for small models.
ekianjo
any idea how they will appear to the OS? As additional processors?
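My guess, if the amdxdna driver works like the other DRM accel drivers, is that they show up as accelerator device nodes rather than extra cores; something along these lines should tell (untested, and the device names are just what I'd expect):

  lsmod | grep amdxdna                  # is the NPU driver loaded at all?
  ls /dev/accel/                        # accel0, accel1, ... if an accelerator is exposed
  lspci | grep -i 'signal processing'   # the NPU itself is a PCIe function, not a CPU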
shosca
In my case, with a 6900 XT:
1. sudo pacman -S ollama-rocm
2. ollama serve
3. ollama run deepseek-r1:32b
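And a couple of optional sanity checks, assuming rocm-smi got pulled in with the ROCm packages:

  rocm-smi              # the card, its VRAM use and clocks should show up here
  watch -n 1 rocm-smi   # nice to leave running while a model generates, to see the GPU actually working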
larntz
Does that entire model fit in GPU memory? How's it run?
I tried running a model larger than VRAM and it loads some layers onto the GPU but offloads the rest to the CPU. It's faster than CPU alone for me, but not by a lot.
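The easiest way I've found to see the split, assuming ollama ps still prints the processor column, is:

  # while a model is loaded, the PROCESSOR column reads "100% GPU" when
  # everything fits, or a CPU/GPU percentage split when it doesn't
  ollama ps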
shosca
You're right. I noticed the GPU clocking up and down with the 32b model; the 14b clocks up fully and actually runs faster.
heavyset_go
Nice, last time I tried out ROCm on Arch a few years ago it was a nightmare. Glad to see it's just one package install away these days, assuming you didn't do any setup beforehand.
I have a Radeon 7900 XTX 24GB and have been using deepseek-r1:14b for a couple of days. It achieves about 45 tokens/s. Only after reading this article did I realize that the 32B model would also fit entirely (23GB used). And since Ollama [0] was already installed, it was as easy as running: ollama run deepseek-r1:32b
The 32B model achieves about 25 tokens/s, which is faster than I can read. However, the "thinking" phase is mostly lower-quality overhead, taking roughly 1-4 minutes before the solution/answer appears.
You can view the model performance within ollama using the command: /set verbose
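For example (the stat names are from memory, so roughly):

  ollama run deepseek-r1:32b
  >>> /set verbose
  >>> <your prompt here>
  # after each response it prints timing stats; the "eval rate" line is the
  # generation speed in tokens/s, which is where the ~25 tokens/s above comes from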
[0] https://github.com/ollama/ollama