
South Korean regulator accuses DeepSeek of sharing user data with ByteDance

varenc

Some missing context is that the data is shared via the DeepSeek app's use of ByteDance analytics/configuration frameworks. So not a backroom deal where DeepSeek handed over the chat history for its user base, but rather ongoing analytics data being sent from the DeepSeek mobile app.

Here's the SecurityScorecard article that brought attention to this: https://securityscorecard.com/blog/a-deep-peek-at-deepseek/#...

Besides the usual analytics data (device metadata, user behavior, app performance, errors, etc.), it's possible raw chat data is being shared as well, but there's no smoking gun.

ahoog42

We analyzed the iOS app[1] and observed similar traffic as well as a number of basic security issues (hardcoded encryption keys, use of 3DES and some traffic over HTTP).

[1] https://www.nowsecure.com/blog/2025/02/06/nowsecure-uncovers...

varenc

Thanks for writing this article! I quite enjoyed it.

Question: does the DeepSeek app's use of hardcoded encryption keys go beyond an attempt to obfuscate and protect the app's private API endpoints? I believe this is an attempt to make abusing the mobile app's private web APIs more difficult, since even with cert-pinning disabled and HTTPS MITM'd you still can't observe the real traffic and replicate their requests.

If all it's doing is obfuscation, though, then I don't understand why pointing out that the keys are hardcoded is meaningful. It certainly doesn't engender trust. But if the app's binary is ultimately decrypting some encrypted data, it needs the key, meaning the key is ultimately available to the reverse engineer. Whether it's hardcoded or not doesn't matter.

It's a bad look, but if the app used the latest tech and assigned each client its own symmetric encryption key for a session, wouldn't you still be able to access the same data? What would be meaningfully different from a security perspective if they had done this obfuscation better?
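A minimal sketch of the "the key ships with the client" argument, assuming a toy setup: a stand-in stream cipher built on SHAKE-256 (not 3DES, and not what DeepSeek actually uses), a made-up "binary" with the key stored as a plain string constant, and a captured request that happens to be JSON. Every name and value here is hypothetical. It only illustrates that a reverse engineer can recover a key embedded in the client with a `strings`-style scan and then decrypt captured traffic; real apps may hide keys better, which slows this down but doesn't change the conclusion.

```python
import hashlib
import json
import os
import re


def keystream(key: bytes, nonce: bytes, n: int) -> bytes:
    # Toy cipher for illustration only: SHAKE-256(key || nonce) used as a keystream.
    # It stands in for whatever 3DES/AES mode a real app might use; not a vetted design.
    return hashlib.shake_256(key + nonce).digest(n)


def encrypt(key: bytes, plaintext: bytes) -> bytes:
    # Prepend a random nonce so identical requests produce different ciphertexts.
    nonce = os.urandom(16)
    ks = keystream(key, nonce, len(plaintext))
    return nonce + bytes(p ^ k for p, k in zip(plaintext, ks))


def decrypt(key: bytes, blob: bytes) -> bytes:
    nonce, ct = blob[:16], blob[16:]
    ks = keystream(key, nonce, len(ct))
    return bytes(c ^ k for c, k in zip(ct, ks))


def printable_strings(binary: bytes, min_len: int = 16) -> list[bytes]:
    # Roughly what the Unix `strings` tool does: runs of printable ASCII of min_len or more.
    return re.findall(rb"[\x20-\x7e]{%d,}" % min_len, binary)


def recover_request(binary: bytes, captured: bytes):
    # Try every string constant in the binary as the key; the right one yields valid JSON.
    for candidate in printable_strings(binary):
        try:
            return candidate, json.loads(decrypt(candidate, captured))
        except ValueError:  # covers UnicodeDecodeError and json.JSONDecodeError
            continue
    return None, None


if __name__ == "__main__":
    # Hypothetical "app binary": the key sits among other string constants.
    key = b"HYPOTHETICAL_KEY_0123456789abcdef"
    fake_binary = b"\x7fELF\x00" + b"https://api.example.invalid/v1/chat\x00" + key + b"\x00\xff"
    # A request the app would send, as captured on the wire after MITM.
    captured = encrypt(key, json.dumps({"device_id": "abc", "event": "app_open"}).encode())
    found_key, request = recover_request(fake_binary, captured)
    print(found_key == key, request)
```

A per-session key negotiated at runtime wouldn't show up in a string scan, but it still has to sit in the client's memory while the app runs, so a debugger or hooking tool gets you to the same place. That's the sense in which hardcoding mostly changes the effort required, not the outcome.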

someNameIG

I thought Apple disallowed apps using HTTP years ago?

vineyardmike

Apple disallows HTTP by default, but you can flip a bit in the config (the App Transport Security keys in Info.plist) to allowlist some or all endpoints for HTTP. It's not clear what App Review actually does with this info when you submit.

nicce

So DeepSeek is not sharing more data than most advertising-funded apps in the world?

HeatrayEnjoyer

Only if they were breaking the law too.

unclebucknasty

Interesting, but I don't think those details will be ameliorative to the people who are concerned (e.g. U.S. Congress).

In fact, I wonder if it may further underscore their concerns, given that it surfaces the interconnectedness between all of these firms.

lmm

Would you say that US-based apps that use e.g. Google Analytics, and therefore share information with Google, "surface the interconnectedness between all of these firms" and are a good reason to e.g. ban apps from US-based developers?

ben_w

Not the OP, but yes, I would; this is why I approve of GDPR and the cookie popup rules, and I'm actively angry at every company that thinks it's legit to share browsing habits with more "trusted partner" companies than there were students in my secondary school.

unclebucknasty

My comment starts with the reality that some people (e.g. U.S. Congress) find cause for concern WRT Chinese apps.

This is the reason, say, revelations about interconnectedness matter when it comes to Chinese apps versus U.S. apps.

You may disagree about whether there should be cause for concern, but that's another matter.

But, if you're asking me if I personally think there's cause for concern around allowing a foreign adversary access to your citizenry via social media platforms, then the answer is yes.

And, of course, China itself also believes it's a problem, which is why U.S. social media is banned there.

bilbo0s

Yep.

No one cares about the details. (Heck, I'd be willing to wager good money that the politicians and most of their staffers don't even understand the details). In the end, it's just one more reason that Chinese models will not be legal in the US in the near future.

ivewonyoung

> it's just one more reason that Chinese models will not be legal in the US in the near future

This isn't about the model, it's about the mobile app.

The open source model weights are different from the website and the app. The model cannot track you.

Not just Congress, even techies can be confused about these things.

globalnode

Yeah, they act holier-than-thou when someone else takes data but then turn around and do it themselves; I think that's called hypocrisy. Besides, once data goes to your ISP it's gone, so aren't we better off just limiting the sharing of data we want to keep private?

alexey-salmin

Plot twist: all these people sharing on Twitter yet another creative way of mentioning Xi and Tiananmen in a conversation without triggering the protection (counting to 11 in Roman numerals, leetspeak, etc.) were in fact collecting the training data for the next-gen LLM-based protection. Well played!

ta988

Yes, they probably all do that. Anthropic promised to pay whoever broke all their protections. That way they get tens of thousands of free workers trying to win the money. Much cheaper than $300k engineers.

gruez

People had the same theory for chatgpt, except rather than Xi and Tiananmen, it was how to make meth and anti-"woke" topics.

cscurmudgeon

A US Tiananmen-comparable example would be ChatGPT censoring George Floyd's death or the killing of Native Americans, etc. ChatGPT doesn't censor these topics.

lmm

Huh? TPTB in the US do not try to censor those topics; if anything they encourage discussion of them (or at least did until this year). US "AI" systems censor much the same topics as US social networks, just as Chinese "AI" systems censor much the same topics as Chinese social networks.

ChrisArchitect

Related:

South Korea bans new DeepSeek AI downloads

https://news.ycombinator.com/item?id=43076325

countrypao

Major Chinese tech companies often collaborate with government entities, potentially compromising user privacy. Given China's regulatory environment, where authorities can access data held by domestic firms, users worldwide should exercise caution when engaging with platforms from such backgrounds.

therealpygon

Which countries don’t have a process for the same thing…specifically?

codydkdc

m8 they shared their database with the entire world: https://www.wiz.io/blog/wiz-research-uncovers-exposed-deepse...

observationist

"Unintentionally exposed" and "deliberately gave" are two meaningfully different actions, both of which are examples of why much better regulation and legislation of individuals' rights over their data are needed.

yalogin

Shouldn't this be the other way around? TikTok has the most user data for any LLM to train with. I bet they will make a killing with it, unless of course the CCP decrees that they share it for free.

aboardRat4

Firstly, Bytedance is far more than just Tiktok.

Secondly, most data in China is shared among most companies anyway, for two reasons: first, the government (not necessarily the CCP) orders most companies to share data with "technological leaders" and "strategically important" companies; and second, computer security is mostly an alien concept to Chinese.

Copyright (broadly speaking, most restrictions on the free dissemination of data) is what is killing the US economy.

suraci

> computer security is mostly an alien concept to Chinese.

that's the main reason

I don't know how the situation is elsewhere, but in China, 2/3 of startups expose their databases on the public network with the password 'abc123'.

frontfor

How do you know that?

powerapple

Loads of rubbish XD. The government does not know what data is important, and most of the time does not know who the technological leaders are.

aboardRat4

Of course they don't know, but they really don't have to.

They designate who is strategic, and those designated strategic tell them what kind of data they need.

nomel

What useful textual user data do you see coming from TikTok? All the text seems very low quality, to the point where I naively assume that including it in training data would decrease performance.

vineyardmike

As the sibling commenter mentioned, the video data itself is useful as we see a rise in multimodal models, but also..

(1) all videos are captioned, automatically then often again by the content creator manually. This data alone is extremely valuable for training purposes.

(2) the videos contain great information about slang terms and youth vernacular, which is unique data that is harder to find elsewhere.

(3) young people seem to use TikTok as a search engine, so presumably some of the videos' content must be explicitly valuable enough as an information source, similar to YouTube.

nextworddev

arguably video data is more valuable

cwmoore

whoosh

jingyibo123

There's nothing technical about it. Funny how many people pointed out DeepSeek's propaganda; now you're seeing the counter-strike.

kccqzy

> These references suggest deep integration with ByteDance's analytics and performance monitoring infrastructure

I mean when I visit a random website or open a random app, I kind of expect that it will use something like Google Analytics or Firebase Crashlytics so that my "user data" is shared with Google.

If the article wants me to feel outraged about this practice, I don't. I understand that analytics and performance monitoring are often outsourced to a third party, often without a choice of turning off the analytics and performance monitoring features in the first place.

I use the DeepSeek app happily without giving it any data I consider private. I have a separate local DeepSeek distilled model for that.

tokioyoyo

If they sold it, isn't that like what... literally almost every single company does nowadays unless you pay up?

criddell

I don't think so. You can't buy user data from Google or Facebook or Apple or Microsoft and they probably have more of it than anybody else.

lmm

You and I can't buy it because they don't want their competitors getting it. But they'll happily use it to target ads at you, and the US government has access to it and can use it to decide who they want to send their CIA kidnap-torture squads after.

bolognafairy

No, but they let you leverage it.

lolinder

Which is in this case a pretty important distinction. Letting another company leverage user data within the bounding zone which you've defined is not the same thing as is being alleged here, which is actually sharing data.

It's quite literally the difference between exposing a public API and actually handing over the contents of the database.

nialv7

*even if you pay up.

derelicta

I think the problem can be solved easily by forcing the company behind DeepSeek to simply redirect all the data they've gathered on their users directly into a CIA database. Surely this will be considered a good compromise.

Sohcahtoa82

This is my surprised face -_-

If you're shocked or even the slightest bit surprised, then I can't imagine how blissful your life must be, being so unaware of how much corporations share data with each other.

Like, I wholeheartedly expect that if I mention Beyblade toys on Facebook, then the next time I visit Amazon, they'll be suggesting Beyblades even if I've never even searched Amazon for toys, let alone Beyblade.

KeplerBoy

That's literally Meta's business model; they'll happily explain how that's going for them on public investor calls every few months.

With DeepSeek and ByteDance, things are a lot less clear-cut.

tokioyoyo

ByteDance's entire business model is based on user targeting: showing users things they might enjoy watching, so they can push more ads to them. I wouldn't be surprised if they bought the data to train their own LLMs.

simlevesque

DeepSeek's terms of use make it very clear they will sell your data.

4ndrewl

How is it less clear-cut? Mysterious and Chinesey, perhaps?

bolognafairy

The ostensible business models of the companies at play.

Stop looking at any opportunity to bark as Sinophobia.

baq

What is less clear-cut? You can safely assume they do at least the same things as Meta.

TheAceOfHearts

I recently had an experience that genuinely surprised me: I was watching a Peruvian video on YouTube, and I clicked on the creator's Instagram profile link in the description. Literally a few minutes later I received a promotional email with services and investment opportunities from an official Peruvian government email. Somehow opening an Instagram profile of a Peruvian creator got me tagged as a potential investor? But the most shocking part was how quickly this all happened.

cwmoore

Apparently Peru's poise presents preparedness because it is preferred for the presence of the Galápagos to the west. [sic]

SecretDreams

You could literally talk about beyblade toys on a whatsapp video call and you'll be getting Amazon ads for dem blades the next day.

gruez

There's basically no credible evidence of this happening. All there is are vague anecdotes which are easily explained with confirmation bias and/or the birthday paradox.
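A back-of-envelope sketch of why such coincidences are expected even with no snooping at all, in the same spirit as the birthday paradox (this is a simpler "at least one match" calculation, not the birthday problem itself). It assumes, purely for illustration, that ads are drawn independently and uniformly from a fixed number of product categories; the inputs (20 topics mentioned per day, 100 ads seen per day, 10,000 categories) are made-up numbers, not measurements.

```python
def p_any_match(topics: int, ads: int, categories: int) -> float:
    # Chance that at least one of `ads` randomly targeted ads falls into one of the
    # `topics` categories you happened to mention, assuming each ad is drawn
    # independently and uniformly from `categories` equally likely categories.
    p_one_ad_misses = 1 - topics / categories
    return 1 - p_one_ad_misses ** ads


def p_over_days(p_daily: float, days: int) -> float:
    # Chance of at least one such coincidence over `days` independent days.
    return 1 - (1 - p_daily) ** days


if __name__ == "__main__":
    # Hypothetical inputs, chosen only for illustration.
    daily = p_any_match(topics=20, ads=100, categories=10_000)
    monthly = p_over_days(daily, days=30)
    print(f"per day: {daily:.2f}, over 30 days: {monthly:.3f}")
```

With these made-up inputs the per-day chance comes out around 0.18 and the 30-day chance above 0.99, so "I mentioned X and then saw an ad for X" is close to guaranteed for someone sooner or later. The uniform, independent model is a simplification; the point is only that a memorable match is weak evidence on its own.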

SecretDreams

How much did Zuckerberg pay for WhatsApp again?

samplatt

Weird hill to die on, man. Like, sure credible evidence is one of the most important things in the world... but what, are you honestly saying that you're going to be surprised if WhatsApp turned out to be leaking data?

We don't need the pitchforks just yet, sure, but shit, you have to remain realistic about these things.

sitkack

WhatsApp is secured by the Signal Protocol's end-to-end encryption.

SecretDreams

That's nice. I'll remember that next time I talk about Beyblades to a friend on WhatsApp and see ads for them on Reddit the next day.

kelnos

WhatsApp is a closed-source client that you cannot trust to faithfully and correctly implement the protocol, or be free of backdoors that allow Meta to snoop on your conversations.

ryandrake

Things can be shocking (as in: causing indignation or disgust), yet totally unsurprising. In fact, I'd argue that most newsworthy events tend to be both terrible and entirely expected, given incentives and the way the world is set up to work.

unclebucknasty

>...corporations are sharing data with each other.

>I wholeheartedly expect that if I mention Beyblade toys on Facebook...

Isn't the lede here that this isn't just some random data sharing agreement between companies, but that these are both Chinese companies, and the recipient of the data has been banned in the U.S. precisely because of data concerns?