DeepSeek and the Effects of GPU Export Controls

ioulaum

The Chinese do have their own homegrown GPUs too, although my impression is that they're not especially good.

Even so, if we look at Groq / Cerebras, the fastest LLM inference companies:

They're both built on process nodes of 7nm or larger, and so on processes that China can produce locally despite the export restrictions.

Ultimately, the export controls are mainly just an inconvenience, not a real blocker.

The Chinese don't need to achieve state of the art chip manufacturing to achieve SOTA AI outcomes.

They just need to make custom silicon specialized for the kinds of AI algorithms they want to scale.

Of course, at scale, that means the US should eventually have both lower production costs and lower energy use for consumer AI models, and that Chinese products will likely be more dependent on the cloud for at least the near future.

The whole strategy seems ultimately meh in the long term... mainly good for building up a sense of mutual enmity and dividing the world, which will also mean a higher cost of living around the world as trade falters.

Sad stuff.

maxglute

>US should eventually have both lower production costs and energy use in consumer use of AI models

To be determined. The PRC can shift compute costs by building cheaper energy. It's also arguable they can drive hardware costs to be competitive: manufacturing is only ~30-40% of hardware cost, and the PRC could save hundreds of billions by not paying western IP and service fees on indigenous hardware, depending on how much the central government wants to squeeze margins. I would not be surprised if they could roll out 4x-8x more 14nm capacity for compute parity and still have a cost advantage once domestic fabs are up at scale.
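The parity arithmetic above can be sketched with deliberately made-up numbers; all of the ratios below are illustrative assumptions for the sake of the argument, not measured figures:

```python
# Back-of-envelope sketch of the "more older-node chips at lower cost"
# argument. All ratios are illustrative assumptions, not real data.

leading_perf = 1.0    # normalized compute per leading-node accelerator
leading_cost = 1.0    # normalized cost per leading-node accelerator

domestic_perf = 0.2   # assume a 14nm part delivers ~1/5 the compute
domestic_cost = 0.1   # assume ~1/10 the cost (no western IP/service fees)

# How many domestic chips are needed to match one leading-node chip:
chips_for_parity = leading_perf / domestic_perf

# Total hardware cost at compute parity, relative to one leading-node chip:
cost_at_parity = chips_for_parity * domestic_cost

print(chips_for_parity)  # 5.0 chips
print(cost_at_parity)    # 0.5, i.e. half the cost despite 5x the units
```

The trade-off this sketch ignores is power: 5x the chips also means several times the energy per unit of compute, which is exactly the dependency the grandparent comment points at.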

thijson

I agree, it doesn't really stop them from developing anything cutting-edge; there are ways around the export controls, even if they are uneconomical. And the Chinese are no strangers to investing in uneconomical things for vanity or defense purposes.

mycall

Even if you go to yandex.ru, you can find a Mac mini M4 for sale at the same price as in the US. Export controls are porous.

xnx

> Groq / Cerebras, the fastest LLM inference companies

What do you mean by fastest LLM inference companies? Is there a leaderboard for this?

o999

It is important to keep in mind that what matters is GPU compute per dollar, not per unit.

China can produce much cheaper electronics that can compete even when they aren't as powerful as Nvidia's.

sanjams

> Infrastructure algorithm optimization

> Novel training frameworks

Where can one find more information about these? I keep seeing hand-wavy language like this with respect to DeepSeek's innovations.

lhl

I think you haven't been looking too hard in that case. Here is the R1 paper: https://arxiv.org/abs/2501.12948

You can find more papers from the attached author: https://arxiv.org/search/cs?searchtype=author&query=DeepSeek... or title https://arxiv.org/search/?query=DeepSeek&searchtype=title&ab... and go through citations for more.

Of course, you could just search by some of the attached authors as well. Daya Guo, the lead author for the R1 paper has 36 papers on Arxiv: https://arxiv.org/search/cs?query=Guo%2C+Daya&searchtype=aut...

Besides the papers, DeepSeek has an active Github https://github.com/deepseek-ai and https://huggingface.co/deepseek-ai

sanjams

I have read the R1 paper. My observation is that there is no information whatsoever about how they are overcoming the limitations of the H800 compared to the H100, which is what the parent article is about. That's the piece I'm curious about.

I will concede that I have not read all their papers or looked through their code, but that's why I asked the question: I hoped someone here might be able to point me to specific places in specific papers instead of an arXiv search.

chvid

They wrote a paper. As far as I can tell they applied a smørrebrødsbord approach, and that led to the results they got.

diggan

FWIW, I think you meant "smörgåsbord", which is basically tapas but Swedish-style, like a mix of many different dishes. Smørrebrød is a Danish type of sandwich; I'm guessing "smørrebrødsbord" would be "a table of smørrebrød", but I'm not sure how common that word is, as I'm not Danish :)

GaggiX

Their paper goes into the details: https://arxiv.org/abs/2501.12948

whywhywhywhy

Excellent models that need a fraction of the compute were obviously going to come from this. OpenAI is actually incentivized not to try to build their models this way, because compute is a moat too.

Nyr

This article is assuming that they are being truthful and indeed had access to limited hardware resources, which is doubtful to say the least.

benreesman

I think we should have substantially more confidence in the claims of people who A) haven’t been caught misleading us yet and B) have published extensive code and weights for their absolutely cutting edge stuff and C) aren’t attached to a bunch of other bad behavior (e.g. DDoS crawlers) that we know about.

If there’s news of DeepSeek behaving badly and I missed it, then I take that back, but AFAIK they are at or near the top of the rankings on being good actors.

lopuhin

Why is this doubtful? Did you spot anything suspicious in their paper? They make the weights and a lot of training details open as well, which leaves much less room for making things up: for example, you could check training compute requirements from the active weight size (which they can't fake, since they released the weights) and the fp8 training they used.
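As a concrete instance of that sanity check, here is a rough sketch using the common ~6·N·D approximation for training FLOPs, plugged with the figures DeepSeek published for V3 (~37B activated parameters, ~14.8T training tokens, ~2.79M H800 GPU-hours). The utilization interpretation is my own back-of-envelope, not from their paper:

```python
# Sanity-check sketch: does the published GPU-hour budget roughly fit
# the model's *active* parameter count? Uses the standard approximation
# training FLOPs ~= 6 * N * D (N = activated params, D = tokens).
# Input figures are DeepSeek-V3's published numbers; the rest is mine.

active_params = 37e9       # ~37B activated parameters per token
tokens = 14.8e12           # ~14.8T training tokens
gpu_hours = 2.788e6        # reported H800 GPU-hours

train_flops = 6 * active_params * tokens          # ~3.3e24 FLOPs total
per_gpu_flops = train_flops / (gpu_hours * 3600)  # sustained FLOP/s per GPU

print(f"{train_flops:.2e}")    # ~3.29e+24
print(f"{per_gpu_flops:.2e}")  # ~3.3e14, a few hundred TFLOP/s per GPU
```

A few hundred sustained TFLOP/s is a plausible fraction of an H800's fp8 peak, which is exactly the kind of consistency the comment above is pointing at: the numbers hang together, and the weights are public.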

m3kw9

There is a rumor that the open-source model is different from the hosted DeepSeek, so it needs more investigation. A bad actor would be someone piping OpenAI models from behind a server.

ioulaum

It's not actually a 600B+ model. It's a mixture of experts. The actual models are pretty small and thus don't require as much training to reach a decent point.

It's similar to Mixtral having gotten good performance while not having anywhere near OpenAI class money / compute.

ur-whale

> It's not actually a 600B+ model. It's a mixture of experts.

Is this described in the paper or was this inferred from the model itself ?

Just curious, especially if the latter.

lopuhin

It's a 600B+ mixture of experts and yes it's described in the paper, GitHub, etc.
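For intuition on why a 600B+ mixture of experts is much cheaper per token than a dense model of the same size, here is a toy top-k routed MoE layer in numpy. The dimensions and the plain ReLU experts are made up for illustration; this is the general routing mechanism, not DeepSeek's actual architecture:

```python
import numpy as np

# Toy mixture-of-experts (MoE) layer: n_experts feed-forward experts,
# but each token is routed to only the top_k of them, so the parameters
# touched per token are a small fraction of the total. Dimensions are
# illustrative, not DeepSeek's real configuration.

rng = np.random.default_rng(0)
d_model, d_ff, n_experts, top_k = 64, 256, 8, 2

experts = [(rng.standard_normal((d_model, d_ff)) * 0.02,   # W_in
            rng.standard_normal((d_ff, d_model)) * 0.02)   # W_out
           for _ in range(n_experts)]
router = rng.standard_normal((d_model, n_experts)) * 0.02

def moe_forward(x):
    """Route one token (shape (d_model,)) through its top-k experts."""
    logits = x @ router
    chosen = np.argsort(logits)[-top_k:]          # indices of top-k experts
    gates = np.exp(logits[chosen] - logits[chosen].max())
    gates /= gates.sum()                          # softmax over chosen experts
    out = np.zeros(d_model)
    for g, i in zip(gates, chosen):
        w_in, w_out = experts[i]
        out += g * (np.maximum(x @ w_in, 0.0) @ w_out)  # ReLU FFN expert
    return out

token = rng.standard_normal(d_model)
print(moe_forward(token).shape)       # (64,)

total_params = n_experts * 2 * d_model * d_ff
active_params = top_k * 2 * d_model * d_ff
print(active_params / total_params)   # 0.25: only 1/4 of expert weights
                                      # are used for any given token
```

DeepSeek-V3/R1 take this much further (hundreds of experts, with roughly 37B of 671B parameters activated per token), but the routing idea is the same: total size governs memory, while active size governs per-token compute.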

chvid

DeepSeek shows that it is not the size of your computer that matters most, but rather your talent and the approach you are taking.

Should have been obvious but now somehow isn't?

diggan

Why can't both matter? No matter your talent, if you don't have access to compute, you can't really test your hypothesis in practice, and if you don't have any talent, all the compute in the world wouldn't matter.

sinuhe69

There is also a rumor that they in fact have access to 50,000 H100 GPUs, and not just H800s. 50,000 H100s is as big as half of Elon Musk's Colossus!

Cumpiler69

Question: What's stopping China from buying GPUs via third party middle-men countries that don't have export controls to China?

I would assume nothing, similarly to how exports of western tech from western countries somehow magically exploded overnight to Russia's neighbors and everyone is pretending not to notice because it makes money.

https://i.imgur.com/kDCsxbt.jpeg

ioulaum

They do manage to get smuggled GPUs, but last time I checked, the prices of top GPUs were 5-10x higher in China compared to what they should be.

And overall, the controls on sales are being expanded.

Realistically, the simpler option for them is not so much buying the latest GPUs as managing to use them on western cloud services.

The US is looking to track money flows in greater detail to see if funds are ultimately coming from China, but that's some majorly invasive stuff, and not entirely easy to implement at the scale of the planet.

Once you have trained models, inference is almost always less of a hassle.

jdietrich

Because there are export controls on those middle-men countries. Data center GPUs are controlled under the Export Administration Regulations, putting them under broadly the same regulatory regime as critical components for fighter jets or ballistic missiles. If that isn't working to at least severely restrict China's access to GPUs, then we have much bigger things to worry about.

https://exportcontrol.lbl.gov/a-bigger-yard-a-higher-fence-u...

nimbius

Nothing. The US did the very same thing to circumvent other countries' export controls during the Cold War, in order to obtain enough titanium to construct the SR-71 aircraft.

China has a graphics processor company that's apparently good enough to land it on an entity list.

https://en.wikipedia.org/wiki/Moore_Threads

The sheer number of Chinese companies the US has entity-listed for export controls is comical, as it's basically a blacklist of the entire PRC tech sector.

Export controls work well to do one thing: create a competitor to the US. China already fabs domestic 7nm chips. There's no reason to think they won't emerge as a serious competitor to Nvidia.

spookie

IIRC, Moore Threads' founder worked for Nvidia in the past.

vinay427

People in this space are thinking about these problems, including on-chip mechanisms [1] and/or location verification [2], among other proposals.

[1] https://www.cnas.org/publications/reports/secure-governable-...

[2] https://www.iaps.ai/research/location-verification-for-ai-ch...

KaiserPro

Nvidia's fear of getting slapped with a fine.

However, there is nothing stopping someone from setting up a company in a third country, funding it indirectly, and getting it to build a cluster for DeepSeek or others to access.

After all, the location of the servers isn't really an insurmountable problem, so long as the training data is close by.

londons_explore

> Nvidia's fear of getting slapped with a fine.

I suspect these export restrictions are less black-and-white than you imagine. If Nvidia shipped a lot of GPUs to, say, Brazil, and they ended up being rented to American startups for AI, all would be fine.

But if those same GPUs in Brazil ended up rented to Chinese companies who used them to make state-of-the-art models, then Nvidia would get a big fine and the datacenter would magically catch fire[1].

[1]: https://www.elinfor.com/news/asml-supplier-is-caught-in-a-fi...

dismas

> I suspect these export restrictions are less black-and-white than you imagine.

They definitely are, but things like Golden Sentry and Blue Lantern (among other dual-use monitoring regimes) can still look for these sorts of uses. But yes, there are lots of examples of "country X can't do Y, so we go to country Z and work with them to do Y" sorts of bypasses. It still increases the amount of work required if they want something NATSEC-related to work on.

sebzim4500

Probably not much for small quantities but that doesn't scale to buying hundreds of thousands of GPUs.

londons_explore

I saw a multi-mile-long line of brand new cars waiting to cross the land border between Kazakhstan and Russia last year.

wave-function

You would think moving such massive amounts of goods through our country should bring prices down (on everything — cars, machinery, electronics), but no. Everything is still much more expensive than in Russia.

zkid18

I doubt you can ever effectively scrutinise the logistics. Just look how creatively people can transfer drugs across borders.

hendersoon

With $8B in the bank, I have some degree of confidence that DeepSeek evaded the export controls and used full-fat GPUs in addition to the H800s.

sschueller

I still don't understand the insane investments in LLMs with the belief that they will get us to AGI, when that is not possible with LLMs. The limitation isn't compute or model size; it's the core concept of the LLM.

sebzim4500

Probably they just don't agree with you that LLMs (or derivatives) are incapable of achieving AGI.

eldenring

Why don't you think its possible?

sschueller

LLMs lack semantic understanding and rely on statistical patterns. They are reactive systems without goals, intentions, or the ability to self-improve. They also cannot generate truly novel ideas or concepts beyond their training data.

nbzso

You are using facts and logic. This is not the favorite HN food. Just believe, this is a religion. :)

K0balt

It seems to me the definition of AGI is the real crux here.

What does AGI mean today?

What it used to mean, we passed by a while back.

The old definition in the research community was that AGI would be able to formulate solutions to novel problems that had not been defined by the programmers; thus, the "general" intelligence. We're talking about mouse-level intelligence. That's what AGI meant.

LLMs have demonstrated broad problem solving capabilities across domains, and are capable of making inferences about things and developing a type of internal world model, all by encoding and navigating the cultural-linguistic framework recorded by humanity. We’re way past the old mark.

Now, it seems, the qualification for AGI has been expanded to require:

1 a kind of agency

2 Superhuman accuracy and width/depth of knowledge

3 Vastly superhuman capacity to maintain thousands of simultaneous conversations with thousands of pages of context

4 A level of artistic proficiency, at least in imitation of artistic style

So what does AGI mean now? In 1990, GPT4 would have been called a “limited super-intelligence” in the parlance of the day. Hell even an 8b model could have hit that mark, based on the breadth of accessible knowledge and ability to reason alone.

I would venture to say that my uncle Bob, or 10,000 uncle Bobs, operating terminals in a pungent call center somewhere, would be deemed "not AGI yet" by current standards, and would be a hell of a lot less useful than an API for DeepSeek R1 32B.

So, do humans below 110 IQ not qualify as General intelligences?

GaggiX

>when that is not possible with LLM.

According to who? You?