
DeepSeek-v3.2: Pushing the frontier of open large language models [pdf]

red2awn

Worth noting this is not only good on benchmarks, but significantly more efficient at inference https://x.com/_thomasip/status/1995489087386771851

minimaxir

Testing the model in OpenRouter, it's pretty speedy compared to Claude/GPT/Gemini.

zug_zug

Well props to them for continuing to improve, winning on cost-effectiveness, and continuing to publicly share their improvements. Hard not to root for them as a force to prevent an AI corporate monopoly/duopoly.

TIPSIO

It's awesome that stuff like this is open source, but even if you have a basement rig with four NVIDIA GeForce RTX 5090 graphics cards (a $15-20k machine), can it even run with any reasonable context window at better than a crawling 10 tps?

Frontier models far exceed even the most hardcore consumer hobbyist setups. This one is even further out of reach.

bigyabai

People with basement rigs generally aren't the target audience for these gigantic models. You'd get much better results out of an MoE model like Qwen3's A3B/A22B weights, if you're running a homelab setup.

Spivak

Yeah I think the advantage of OSS models is that you can get your pick of providers and aren't locked into just Anthropic or just OpenAI.


jodleif

I genuinely do not understand the valuations of the US AI industry. The Chinese models are so close, and far cheaper.

espadrine

Two aspects to consider:

1. Chinese models typically focus on text. US and EU models also bear the cross of handling images, and often voice and video. Supporting all of those is additional training cost not spent on further reasoning: one hand tied behind your back in exchange for being more generally useful.

2. The gap seems small, because so many benchmarks get saturated so fast. But towards the top, every 1% increase in benchmarks is significantly better.

On the second point, I worked on a leaderboard that both normalizes scores, and predicts unknown scores to help improve comparisons between models on various criteria: https://metabench.organisons.com/

You can notice that, while Chinese models are quite good, the gap to the top is still significant.

However, the US models are typically much more expensive at inference, and Chinese models do have a niche on the Pareto frontier of cheaper but serviceable models (even though US models also eat into that part of the frontier).

jodleif

1. Have you seen the Qwen offerings? They have great multi-modality, some even SOTA.

jasonsb

It's all about the hardware and infrastructure. If you check OpenRouter, no provider offers a SOTA Chinese model matching the speed of Claude, GPT or Gemini. The Chinese models may benchmark close on paper, but real-world deployment is different. So you either buy your own hardware in order to run a Chinese model at 150-200 tps, or give up and use one of the Big 3.

The US labs aren't just selling models, they're selling globally distributed, low-latency infrastructure at massive scale. That's what justifies the valuation gap.

jodleif

Assuming your hardware premise is right (and let's be honest, nobody really wants to send their data to Chinese providers), you can use a provider like Cerebras or Groq?

csomar

According to OpenRouter, z.ai is 50% faster than Anthropic; which matches my experience. z.ai does have frequent downtimes but so does Claude.

newyankee

Yet, to be honest, if the US industry had not moved ahead and created the race with FOMO, it would not have been as easy for the Chinese strategy to work either.

The nature of the race may yet change though, and I am unsure whether the devil is in the details, as in very specific edge cases that will only work with frontier models?

isamuel

There is a great deal of orientalism --- it is genuinely unthinkable to a lot of American tech dullards that the Chinese could be better at anything requiring what they think of as "intelligence." Aren't they Communist? Backward? Don't they eat weird stuff at wet markets?

It reminds me, in an encouraging way, of the way that German military planners regarded the Soviet Union in the lead-up to Operation Barbarossa. The Slavs are an obviously inferior race; their Bolshevism dooms them; we have the will to power; we will succeed. Even now, when you ask questions like what you ask of that era, the answers you get are genuinely not better than "yes, this should have been obvious at the time if you were not completely blinded by ethnic and especially ideological prejudice."

newyankee

But didn't the Chinese already surpass the rest of the world in solar, batteries, and EVs, among other things?

cyberlimerence

They did, but the goalposts keep moving, so to speak. We're approximately here: advanced semiconductors, artificial intelligence, reusable rockets, quantum computing, etc. The Chinese will never catch up. /s

jazzyjackson

Valuation is not based on what they have done but what they might do. I agree, though, that it's an investment made with very little insight into Chinese research. I guess it's counting on DeepSeek being banned and all computers in America refusing to run open software by the year 2030. /snark

jodleif

> Valuation is not based on what they have done but what they might do

Exactly what I'm thinking. Chinese models are catching up rapidly, and will soon be on par with the big dogs.

bilbo0s

>I guess it's counting on deepseek being banned

And the people making the bets are in a position to make sure the banning happens. The US government system being what it is.

Not that our leaders need any incentive to ban Chinese tech in this space. Just pointing out that it's not necessarily a "bet".

"Bet" imply you don't know the outcome and you have no influence over the outcome. Even "investment" implies you don't know the outcome. I'm not sure that's the case with these people?

zparky

Benchmarks are super impressive, as usual. Interesting to note in Table 3 of the paper (p. 15) that DS-Speciale is 1st or 2nd in accuracy on all tests, but has much higher token output (50% more, or 3.5x vs Gemini 3 on the Codeforces test!).

futureshock

The higher token output is not by accident. Certain kinds of logical reasoning problems are solved by longer thinking output. Thinking chain output is usually kept to a reasonable length to limit latency and cost, but if pure benchmark performance is the goal you can crank that up to the max until the point of diminishing returns. DeepSeek being 30x cheaper than Gemini means there’s little downside to max out the thinking time. It’s been shown that you can further scale this by running many solution attempts in parallel with max thinking then using a model to choose a final answer, so increasing reasoning performance by increasing inference compute has a pretty high ceiling.
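
The parallel-sampling trick is easy to sketch. Below is a minimal best-of-n harness against an OpenAI-compatible endpoint; the base URL, model id, worker count, and judging prompt are illustrative assumptions on my part, not anything DeepSeek has published.

    # Sketch: n independent long-thinking attempts in parallel, then a judge pass.
    # Endpoint, model id, and prompts are placeholders (assumptions, not from the paper).
    import re
    from concurrent.futures import ThreadPoolExecutor
    from openai import OpenAI

    client = OpenAI(base_url="https://api.deepseek.com", api_key="YOUR_KEY")
    MODEL = "deepseek-reasoner"  # assumed model id

    def attempt(problem: str) -> str:
        # One full reasoning attempt at the problem.
        resp = client.chat.completions.create(
            model=MODEL,
            messages=[{"role": "user", "content": problem}],
        )
        return resp.choices[0].message.content

    def best_of_n(problem: str, n: int = 8) -> str:
        # Cheap inference is what makes running n full attempts affordable.
        with ThreadPoolExecutor(max_workers=n) as pool:
            candidates = list(pool.map(attempt, [problem] * n))
        # Ask the model to act as judge and pick the most likely correct answer.
        listing = "\n\n".join(
            f"Candidate {i + 1}:\n{c}" for i, c in enumerate(candidates)
        )
        verdict = client.chat.completions.create(
            model=MODEL,
            messages=[{
                "role": "user",
                "content": f"Problem:\n{problem}\n\n{listing}\n\n"
                           "Answer with the number of the most likely correct candidate.",
            }],
        )
        match = re.search(r"\d+", verdict.choices[0].message.content)
        idx = int(match.group()) - 1 if match else 0
        return candidates[idx if 0 <= idx < len(candidates) else 0]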

htrp

What is the ballpark VRAM / GPU requirement to run this?

rhdunn

For just the model itself: roughly 4 bytes per parameter at F32, 2 at F16/BF16, or 1 at F8, e.g. ~685 GB at F8 for this model. It will be smaller for quantizations, but I'm not sure how to estimate those.

For a Mixture of Experts (MoE) model you only need to have the memory size of a given expert. There will be some swapping out as it figures out which expert to use, or to change expert, but once that expert is loaded it won't be swapping memory to perform the calculations.

You'll also need space for the context window; I'm not sure how to calculate that either.
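
If it helps, here's a rough back-of-envelope in Python. The weight part is just bytes-per-parameter times parameter count; the context-window part uses the generic per-token KV-cache formula (2 x layers x kv_heads x head_dim x bytes), which overestimates DeepSeek's compressed MLA cache, and the layer/head/dim numbers below are placeholders rather than figures from the paper.

    # Back-of-envelope VRAM estimate (sketch; shape numbers are made up).
    def weight_gb(n_params: float, bytes_per_param: float) -> float:
        # 4 bytes/param at F32, 2 at F16/BF16, 1 at F8, ~0.5 at 4-bit quants
        # (ignoring quantization metadata overhead).
        return n_params * bytes_per_param / 1e9

    def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                    seq_len: int, bytes_per_elem: int = 2) -> float:
        # Per token you store K and V for every layer: 2 * layers * kv_heads * head_dim.
        return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_elem / 1e9

    total_params = 685e9  # matches the ~685 GB at F8 figure above
    print(f"weights @ F8   : {weight_gb(total_params, 1):.0f} GB")
    print(f"weights @ 4-bit: {weight_gb(total_params, 0.5):.0f} GB")
    # Hypothetical shape numbers, just to show how the context-window term scales:
    print(f"KV cache, 128k ctx: {kv_cache_gb(61, 8, 128, 131072):.0f} GB")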

BoorishBears

3.2-Exp came out in September: this is 3.2, along with a special checkpoint (DeepSeek-V3.2-Speciale) for deep reasoning that they're claiming surpasses GPT-5 and matches Gemini 3.0

https://x.com/deepseek_ai/status/1995452641430651132