DeepSeek and the Effects of GPU Export Controls
48 comments
· January 23, 2025 · ioulaum
maxglute
>US should eventually have both lower production costs and energy use in consumer use of AI models
To be determined. The PRC can shift compute costs by building cheaper energy. It's also arguable they can drive hardware costs down to be competitive: manufacturing is only ~30-40% of hardware cost, and the PRC saves hundreds of billions by not paying western IP and service fees with indigenous hardware, depending on how much the central government wants to squeeze margins. Would not be surprised if they can roll out 4x-8x more 14nm capacity for compute parity and still have a cost advantage once they get domestic fabs up at scale.
thijson
I agree, it doesn't really stop them from developing anything cutting edge; there are ways around the export controls, even if they are uneconomical. And the Chinese are no strangers to investing in uneconomical things for vanity or defense purposes.
mycall
Even if you go to yandex.ru, you can find the Mac mini M4 for sale at the same price as in the US. Export controls are porous.
xnx
> Groq / Cerebras, the fastest LLM inference companies
What do you mean by fastest LLM inference companies? Is there a leaderboard for this?
o999
It is important to keep in mind that GPU power per dollar is what matters, not power per unit.
China can produce much cheaper electronics that can compete even when they aren't as powerful as NVIDIA's.
sanjams
> Infrastructure algorithm optimization
> Novel training frameworks
Where can one find more information about these? I keep seeing hand-wavy language like this w.r.t. DeepSeek’s innovation
lhl
I think you haven't been looking too hard in that case. Here is the R1 paper: https://arxiv.org/abs/2501.12948
You can find more papers from the attached author: https://arxiv.org/search/cs?searchtype=author&query=DeepSeek... or title https://arxiv.org/search/?query=DeepSeek&searchtype=title&ab... and go through citations for more.
Of course, you could just search by some of the attached authors as well. Daya Guo, the lead author for the R1 paper has 36 papers on Arxiv: https://arxiv.org/search/cs?query=Guo%2C+Daya&searchtype=aut...
Besides the papers, DeepSeek has an active Github https://github.com/deepseek-ai and https://huggingface.co/deepseek-ai
sanjams
I have read the R1 paper. My observation is that there is no information whatsoever about how they are overcoming the limitations of the H800 compared to the H100, which is what the parent article is about. That's the piece I'm curious about.
I will concede that I have not read all their papers or looked through their code, but that's why I asked the question: I hoped someone here might be able to point me to specific places in specific papers instead of an arXiv search.
chvid
They wrote a paper. As far as I can tell they applied a smørrebrødsbord approach and that led to the results they got.
diggan
FWIW, I think you meant "Smörgåsbord", which is basically tapas but Swedish-style, like a mix of many different dishes. Smørrebrød is a Danish type of sandwich; I'm guessing "smørrebrødsbord" would be "a table of Smørrebrød", but I'm not sure how common that word is, I'm not Danish :)
GaggiX
Their paper goes into the details: https://arxiv.org/abs/2501.12948
whywhywhywhy
Excellent models that need a fraction of the compute were obviously going to come from this. OAI actually has an incentive not to try to make its models efficient, because compute is a moat too.
Nyr
This article is assuming that they are being truthful and indeed had access to limited hardware resources, which is doubtful to say the least.
benreesman
I think we should have substantially more confidence in the claims of people who A) haven’t been caught misleading us yet and B) have published extensive code and weights for their absolutely cutting edge stuff and C) aren’t attached to a bunch of other bad behavior (e.g. DDoS crawlers) that we know about.
If there’s news of DeepSeek behaving badly and I missed it, then I take that back, but AFAIK they are at or near the top of the rankings on being good actors.
lopuhin
Why is this doubtful? Did you spot any suspicious things in their paper? They make the weights and a lot of training details open as well, which leaves much less room for making stuff up; e.g. you could check training compute requirements against the active weight size (which they can't fake, as they released the weights) and the fp8 training they used.
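That sanity check can be sketched as back-of-envelope arithmetic using the standard ~6ND heuristic for transformer training FLOPs. The active-parameter count (37B), token count (14.8T), and GPU-hour figure (~2.79M H800-hours) below are the publicly reported DeepSeek-V3 numbers; the H800 fp8 peak throughput is an approximate assumption.

```python
# Back-of-envelope check that DeepSeek's reported training budget is
# internally consistent: with an MoE model, per-token training FLOPs
# scale with the ACTIVE parameters, approximated as 6 * N_active * D.

n_active = 37e9              # active parameters per token (reported)
tokens = 14.8e12             # pretraining tokens (reported)
train_flops = 6 * n_active * tokens        # ~3.3e24 FLOPs

h800_fp8_peak = 1.98e15      # approx. dense fp8 peak of an H800, FLOP/s
reported_gpu_hours = 2.79e6  # reported H800-hours for training

# Implied hardware utilization (MFU) if the reported numbers hold;
# anything in the ~15-40% range is plausible for large-scale training.
mfu = train_flops / (reported_gpu_hours * 3600 * h800_fp8_peak)
print(f"training FLOPs ~ {train_flops:.2e}, implied MFU ~ {mfu:.0%}")
```

The implied utilization comes out in a realistic range rather than an impossible one, which is the kind of consistency the parent comment is pointing at.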
m3kw9
There is a rumor that the open-source model is different from the hosted DeepSeek, so it needs more investigation. A bad actor would be someone piping OAI models behind a server.
ioulaum
It's not actually a 600B+ model in practice: it's a mixture of experts. The individual experts are pretty small and thus don't require as much training to reach a decent point.
It's similar to how Mixtral got good performance without having anywhere near OpenAI-class money/compute.
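The mixture-of-experts point can be sketched in a few lines: each token is routed to only the top-k of E experts per layer, so per-token expert compute scales with k/E of the expert weights. The E=256, k=8 figures below are illustrative, loosely based on DeepSeek-V3's published configuration; the routing function here is a toy stand-in, not their actual gating network.

```python
# Minimal sketch of MoE top-k routing: each token runs only k of E
# experts, so per-token expert FLOPs shrink by roughly E/k versus a
# dense model holding the same total expert weights.
import random

E, k = 256, 8   # experts per layer, experts routed per token (illustrative)

def route(token_scores, k):
    """Pick the indices of the k highest-scoring experts for one token."""
    return sorted(range(len(token_scores)), key=lambda e: -token_scores[e])[:k]

scores = [random.random() for _ in range(E)]   # toy gating scores
chosen = route(scores, k)
print(f"token runs {len(chosen)}/{E} experts -> ~{E // k}x fewer expert FLOPs")
```

This is why a "600B+" MoE checkpoint can train and serve with the compute profile of a much smaller dense model: only the ~37B active parameters participate in any given token.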
chvid
DeepSeek shows that it is not the size of your computer that matters most, but rather your talent and the approach you are taking.
Should have been obvious but now somehow isn't?
diggan
Why can't both matter? No matter your talent, if you don't have access to compute, you can't really test your hypothesis in practice, and if you don't have any talent, all the compute in the world wouldn't matter.
sinuhe69
There is also a rumor that they in fact have access to 50,000 H100 GPUs, not just H800s. 50,000 H100s is as much as half of Elon Musk's Colossus!
Cumpiler69
Question: What's stopping China from buying GPUs via third party middle-men countries that don't have export controls to China?
I would assume nothing, similarly to how exports of western tech from western countries somehow magically exploded overnight to Russia's neighbors and everyone is pretending not to notice because it makes money.
ioulaum
They do manage to get smuggled GPUs, but last time I checked, the prices of top GPUs were 5-10x higher in China compared to what they should be.
And overall, the controls on sales are being expanded.
The simpler option for them, realistically, is not so much buying the latest GPUs, but rather managing to use them on western cloud services.
The US is looking to track money flows in greater detail to see if funds are ultimately coming from China, but that's some majorly invasive stuff, and not entirely easy to implement at the scale of the planet.
Once you have trained models, inference is almost always less of a hassle.
jdietrich
Because there are export controls on those middle-men countries. Data center GPUs are controlled under the Export Administration Regulations, putting them under broadly the same regulatory regime as critical components for fighter jets or ballistic missiles. If that isn't working to at least severely restrict China's access to GPUs, then we have much bigger things to worry about.
https://exportcontrol.lbl.gov/a-bigger-yard-a-higher-fence-u...
nimbius
Nothing. The US did the very same thing to circumvent other countries' export controls during the Cold War, in order to obtain enough titanium to construct the SR-71 aircraft.
China has a graphics processor company that's apparently good enough to land it on an entity list.
https://en.wikipedia.org/wiki/Moore_Threads
The sheer number of Chinese companies the US has entity-listed for export controls is comical, as it's basically a blacklist of the entire PRC tech sector.
Export controls work well to do one thing: create a US competitor. China already fabs domestic 3nm chips. There's no reason to think they won't emerge as a serious competitor to Nvidia.
spookie
IIRC the Moore Threads founder worked for Nvidia in the past.
vinay427
People in this space are thinking about these problems, including on-chip mechanisms [1] and/or location verification [2], among other proposals.
[1] https://www.cnas.org/publications/reports/secure-governable-...
[2] https://www.iaps.ai/research/location-verification-for-ai-ch...
KaiserPro
Nvidia's fear of getting slapped with a fine.
However, there is nothing stopping some company from setting up a company in a third country, funding it indirectly, and getting them to build a cluster for DeepSeek/others to access.
After all, the location of the servers isn't really an insurmountable problem, so long as the training data is close by.
londons_explore
> Nvidia's fear of getting slapped with a fine.
I suspect these export restrictions are less black-and-white than you imagine. If Nvidia shipped a lot of GPUs to, say, Brazil, and they ended up being rented to American startups for AI, all would be fine.
But if those same GPUs in Brazil ended up rented to Chinese companies who used them to make state-of-the-art models, then Nvidia would get a big fine and the datacenter would magically catch fire [1].
[1]: https://www.elinfor.com/news/asml-supplier-is-caught-in-a-fi...
dismas
> I suspect these export restrictions are less black-and-white than you imagine.
They definitely are, but things like Golden Sentry and Blue Lantern (amongst other dual-use monitoring regimes) can also still look for these sorts of uses. But yes, there are lots of examples of "country X can't do Y, so we go to country Z and work with them to do Y" sorts of bypasses. It still increases the amount of work required if they want something NATSEC-related to work on.
sebzim4500
Probably not much for small quantities but that doesn't scale to buying hundreds of thousands of GPUs.
londons_explore
I saw a multi-mile-long line of brand new cars waiting to cross the land border between Kazakhstan and Russia last year.
wave-function
You would think moving such massive amounts of goods through our country should bring prices down (on everything — cars, machinery, electronics), but no. Everything is still much more expensive than in Russia.
zkid18
I doubt you can ever effectively scrutinise the logistics. Just look at how creatively people can transfer drugs across borders.
hendersoon
With $8B in the bank, I have some degree of confidence that DeepSeek evaded the export controls and used full-fat GPUs in addition to the H800s.
sschueller
I still don't understand the insane investments in LLMs with the belief that they will get us to AGI, when that is not possible with LLMs. The limitation isn't compute or model size; it's the core concept of the LLM.
sebzim4500
Probably they just don't agree with you that LLMs (or derivatives) are incapable of achieving AGI.
eldenring
Why don't you think it's possible?
sschueller
LLMs lack semantic understanding and rely on statistical patterns. They are reactive systems without goals, intentions, or the ability to self-improve. They also cannot generate truly novel ideas or concepts beyond their training data.
nbzso
You are using facts and logic. This is not the favorite HN food. Just believe, this is a religion. :)
K0balt
It seems to me the definition of AGI is the real crux here.
What does AGI mean today?
What it used to mean, we passed by a while back.
The old definition in the research community was that AGI would be able to formulate solutions to novel problems that had not been defined by the programmers. Thus, the "general" intelligence. We're talking about mouse-level intelligence. That's what AGI meant.
LLMs have demonstrated broad problem solving capabilities across domains, and are capable of making inferences about things and developing a type of internal world model, all by encoding and navigating the cultural-linguistic framework recorded by humanity. We’re way past the old mark.
Now, it seems, the qualification for AGI has been expanded to require:
1. A kind of agency
2. Superhuman accuracy and breadth/depth of knowledge
3. Vastly superhuman capacity to maintain thousands of simultaneous conversations with thousands of pages of context
4. A level of artistic proficiency, at least in imitation of artistic style
So what does AGI mean now? In 1990, GPT-4 would have been called a "limited super-intelligence" in the parlance of the day. Hell, even an 8B model could have hit that mark, based on the breadth of accessible knowledge and ability to reason alone.
I would venture to say that my uncle Bob, or 10,000 uncle Bobs, operating terminals in a pungent call center somewhere, would be deemed "not AGI yet" by current standards, and would be a hell of a lot less useful than an API for DeepSeek R1 32B.
So, do humans below 110 IQ not qualify as General intelligences?
GaggiX
>when that is not possible with LLM.
According to who? You?
The Chinese do have their home-grown GPUs too, although I have the impression that they're not super good.
Even so, if we look at Groq / Cerebras, the fastest LLM inference companies:
They're both built on 7nm-or-older process nodes, and so on processes that China can produce locally despite the export restrictions.
Ultimately, the export controls are mainly just an inconvenience, not a real blocker.
The Chinese don't need to achieve state of the art chip manufacturing to achieve SOTA AI outcomes.
They just need to make custom silicon specialized for the kinds of AI algorithms they want to scale.
Of course, at scale, that means the US should eventually have both lower production costs and lower energy use in consumer-facing AI, and that Chinese products will likely be more dependent on the cloud for at least the near future.
The whole strategy seems ultimately meh in a long-term sense... mainly good for building up a sense of mutual enmity and dividing the world... which is also going to result in a higher cost of living around the world as trade falters.
Sad stuff.