South Korean regulator accuses DeepSeek of sharing user data with ByteDance
145 comments
·February 18, 2025varenc
ahoog42
We analyzed the iOS app[1] and observed similar traffic as well as a number of basic security issues (hardcoded encryption keys, use of 3DES and some traffic over HTTP).
[1] https://www.nowsecure.com/blog/2025/02/06/nowsecure-uncovers...
varenc
Thanks for writing this article! I quite enjoyed it.
Question: does the DeepSeek app's use of hardcoded encryption keys rise beyond just an attempt to obfuscate and protect the app's private API endpoints? I believe this is an attempt to make abusing their mobile app's private web APIs more difficult, since even with cert-pinning disabled and HTTPS MITM'd you still can't observe the real traffic and replicate their requests.
If all it's doing is obfuscation though, then I don't understand why pointing out that the keys are hardcoded is meaningful. It certainly doesn't engender trust. But if the app's binary is ultimately decoding some encrypted data, it needs the key, meaning it's ultimately available to the reverse engineer. Whether it's hardcoded or not doesn't matter.
It's a bad look, but if the app used the latest tech and assigned each client its own symmetric encryption key for a session, wouldn't you still be able to access the same data? What would be meaningfully different from a security perspective if they had done this obfuscation better?
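To make the question concrete, here is a minimal sketch of the argument, not the app's actual scheme. The cipher is a toy SHA-256 keystream XOR (the analysis mentions 3DES; this is a stand-in that needs only the standard library), and `HARDCODED_KEY`, `Client`, and `reverse_engineer_decrypt` are hypothetical names invented for illustration. The point it shows: whoever controls the device can read the key from the client, whether it was baked into the binary or issued per session.

```python
import hashlib
import secrets


def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy stream cipher for illustration only: XOR data with
    SHA-256(key || block counter). Encrypting and decrypting are the same call."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(8, "big")
        block = hashlib.sha256(key + counter).digest()
        chunk = data[offset:offset + 32]
        out.extend(b ^ k for b, k in zip(chunk, block))
    return bytes(out)


# Hypothetical key shipped inside the app binary.
HARDCODED_KEY = b"key-baked-into-the-binary"


class Client:
    """Stands in for the mobile app: it must hold the key to build requests."""

    def __init__(self, key: bytes) -> None:
        self.key = key

    def build_request(self, payload: bytes) -> bytes:
        return keystream_xor(self.key, payload)


def reverse_engineer_decrypt(client: Client, captured: bytes) -> bytes:
    """Someone who controls the device reads the key from the binary or from
    process memory, then decrypts captured traffic with it."""
    return keystream_xor(client.key, captured)


payload = b'{"prompt": "hello"}'

# Case 1: the key is hardcoded in the binary.
hardcoded_client = Client(HARDCODED_KEY)
recovered_hardcoded = reverse_engineer_decrypt(
    hardcoded_client, hardcoded_client.build_request(payload)
)

# Case 2: a fresh random key per session; it still lives on the device.
session_client = Client(secrets.token_bytes(32))
recovered_session = reverse_engineer_decrypt(
    session_client, session_client.build_request(payload)
)
```

In both cases the recovered bytes equal the original payload. Under these assumptions, the per-session key only raises the effort (the key has to be pulled from a running process rather than from a static binary); it does not change what the device owner can ultimately see.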
someNameIG
I thought Apple disallowed apps using HTTP years ago?
vineyardmike
Apple disallowed HTTP by default, but you can flip a bit in the config to allowlist some/all endpoints for HTTP. It's not clear what App Store review actually does with this info when you submit.
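For reference, the "bit in the config" lives under the `NSAppTransportSecurity` key of the app's Info.plist: `NSAllowsArbitraryLoads` opens up everything, and `NSExceptionDomains` entries with `NSExceptionAllowsInsecureHTTPLoads` open up individual hosts. Below is a rough sketch of how one might check an extracted Info.plist for those settings. The function name `insecure_http_exceptions` and the sample plist (with `example.com`) are made up for illustration and are not taken from the DeepSeek app.

```python
import plistlib


def insecure_http_exceptions(info_plist_bytes: bytes) -> dict:
    """Report which App Transport Security exceptions permit plain HTTP."""
    plist = plistlib.loads(info_plist_bytes)
    ats = plist.get("NSAppTransportSecurity", {})
    domains = ats.get("NSExceptionDomains", {})
    return {
        # True means ATS is effectively disabled for all connections.
        "allows_arbitrary_loads": bool(ats.get("NSAllowsArbitraryLoads", False)),
        # Hosts individually allowlisted for insecure HTTP.
        "http_domains": sorted(
            name
            for name, cfg in domains.items()
            if cfg.get("NSExceptionAllowsInsecureHTTPLoads", False)
        ),
    }


# Hypothetical Info.plist fragment that allowlists one domain for HTTP.
SAMPLE_PLIST = b"""<?xml version="1.0" encoding="UTF-8"?>
<plist version="1.0">
<dict>
  <key>NSAppTransportSecurity</key>
  <dict>
    <key>NSExceptionDomains</key>
    <dict>
      <key>example.com</key>
      <dict>
        <key>NSExceptionAllowsInsecureHTTPLoads</key>
        <true/>
      </dict>
    </dict>
  </dict>
</dict>
</plist>"""

report = insecure_http_exceptions(SAMPLE_PLIST)
```

A plist with no `NSAppTransportSecurity` entry yields no exceptions, which matches the HTTPS-only default described above.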
nicce
So DeepSeek is not sharing more data than most advertising-funded apps in the world?
HeatrayEnjoyer
Only if they were breaking the law too.
unclebucknasty
Interesting, but I don't think those details will be ameliorative to the people who are concerned (e.g. U.S. Congress).
In fact, I wonder if it may further underscore their concerns, given that it surfaces the interconnectedness between all of these firms.
lmm
Would you say that US-based apps that use e.g. Google Analytics, and therefore share information with Google, "surface the interconnectedness between all of these firms" and are a good reason to e.g. ban apps from US-based developers?
unclebucknasty
My comment starts with the reality that some people (e.g. U.S. Congress) find cause for concern WRT Chinese apps.
This is the reason, say, revelations about interconnectedness matter when it comes to Chinese apps versus U.S. apps.
You may disagree about whether there should be cause for concern, but that's another matter.
But, if you're asking me if I personally think there's cause for concern around allowing a foreign adversary access to your citizenry via social media platforms, then the answer is yes.
And, of course, China itself also believes it's a problem, which is why U.S. social media is banned there.
ben_w
Not the OP, but yes, I would; this is why I approve of GDPR and the cookie popup rules and am actively angry at every company that thinks it's legit to share browsing habits with more "trusted partner" companies than there were students in my secondary school.
bilbo0s
Yep.
No one cares about the details. (Heck, I'd be willing to wager good money that the politicians and most of their staffers don't even understand the details). In the end, it's just one more reason that Chinese models will not be legal in the US in the near future.
ivewonyoung
> it's just one more reason that Chinese models will not be legal in the US in the near future
This isn't about the model, it's about the mobile app.
The open source model weights are different from the website and the app. The model cannot track you.
Not just Congress, even techies can be confused about these things.
globalnode
Yeah, they act holier than thou when someone else takes data but then turn around and do it themselves; I think that's called hypocrisy. Besides, once data goes to your ISP it's gone. Aren't we better off just limiting the data that we want to keep private?
alexey-salmin
Plot twist: all these people sharing on Twitter yet another creative way of mentioning Xi and Tiananmen in a conversation without triggering the protection (count to 11 in Roman numerals, leetspeak, etc.) were in fact collecting the training data for the nextgen LLM-based protection. Well played!
ta988
Yes, they probably all do that. Anthropic promised to pay the winner that broke all their protections. That way they get tens of thousands of free workers trying to get the money. Much cheaper than $300k engineers.
gruez
People had the same theory for chatgpt, except rather than Xi and Tiananmen, it was how to make meth and anti-"woke" topics.
cscurmudgeon
A US Tiananmen-comparable example would be ChatGPT censoring George Floyd's death or the killing of Native Americans, etc. ChatGPT doesn't censor these topics.
lmm
Huh? TPTB in the US do not try to censor those topics; if anything they encourage discussion of them (or at least did until this year). US "AI" systems censor much the same topics as US social networks, just as Chinese "AI" systems censor much the same topics as Chinese social networks.
countrypao
Major Chinese tech companies often collaborate with government entities, potentially compromising user privacy. Given China's regulatory environment, where authorities can access data held by domestic firms, users worldwide should exercise caution when engaging with platforms from such backgrounds.
therealpygon
Which countries don’t have a process for the same thing…specifically?
FpUser
Here is my story. I needed to buy a central console for my car (I purchased it a while ago from a used car lot). I went to Amazon and made my selection. The next thing I see is a warning: this particular console will not fit your car, which is MAKE: XXXY, MODEL: YYYY, YEAR: ZZZZ. How's that for data sharing?
linkregister
At some point you have entered your car data while searching for another car part on Amazon. Amazon caches this information.
FpUser
I've never entered ANY car data online. Most likely it was sold to Amazon by my insurance company.
pertymcpert
Nope.
codydkdc
m8 they shared their database with the entire world: https://www.wiz.io/blog/wiz-research-uncovers-exposed-deepse...
observationist
"Unintentionally exposed" and "deliberately gave" are two meaningfully different actions, both of which are examples of why much better regulation and legislation of individuals rights over their data are needed.
derelicta
I think the problem can be solved easily by forcing the company behind DeepSeek to simply redirect all the data they've gathered on their users directly into a CIA database. Surely this will be considered a good compromise.
yalogin
Shouldn't this be the other way around? TikTok has the most user data for any LLM to train with. I bet they will make a killing with it, unless of course the CCP decrees that they share it for free.
aboardRat4
Firstly, ByteDance is far more than just TikTok.
Secondly, most data in China is shared among most companies anyway, because, firstly, the government (not necessarily CCP) orders most companies to share data with "technological leaders" and "strategically important" companies, and secondly because computer security is mostly an alien concept to Chinese.
Copyright (broadly speaking, most restrictions on unrestricted dissemination of data) is what is killing the US economy.
suraci
> computer security is mostly an alien concept to Chinese.
that's the main reason
i don't know how the situation is elsewhere, but in China, 2/3 of startups expose their databases on public network with a password 'abc123'
frontfor
How do you know that?
powerapple
Loads of rubbish XD. The government does not know what data is important, and most of the time does not know who the technological leader is.
aboardRat4
Of course they don't know, but they really don't have to.
They designate who is strategic, and those designated strategic tell them what kind of data they need.
nomel
What useful textual user data do you see coming from TikTok? All the text seems very low quality, to the point where I naively assume that including it in training data would decrease performance.
vineyardmike
As the sibling commenter mentioned, the video data itself is useful as we see a rise in multimodal models, but also..
(1) all videos are captioned, automatically then often again by the content creator manually. This data alone is extremely valuable for training purposes.
(2) the videos contain great information about slang terms and youth vernacular, which is unique data that is harder to find elsewhere.
(3) young people seem to use TikTok as a search engine, so presumably some of the videos' content must be explicitly valuable enough as an information source, similar to YouTube.
jingyibo123
There's nothing technical about it. Funny when many people mentioned propaganda by Deepseek. You're seeing the counter-strike.
kccqzy
> These references suggest deep integration with ByteDance's analytics and performance monitoring infrastructure
I mean when I visit a random website or open a random app, I kind of expect that it will use something like Google Analytics or Firebase Crashlytics so that my "user data" is shared with Google.
If the article wants me to feel outraged about this practice, I don't. I understand that analytics and performance monitoring are often outsourced to a third party, often without a choice of turning off the analytics and performance monitoring features in the first place.
I use the DeepSeek app happily without giving it any data I consider private. I have a separate local DeepSeek distilled model for that.
blackeyeblitzar
Reminder on various DeepSeek problems:
1. DeepSeek is full of propaganda/censorship (https://arstechnica.com/ai/2025/01/the-questions-the-chinese...)
2. They already had a serious security and privacy issue when they left their database wide open and leaked everyone’s chat history (https://www.wiz.io/blog/wiz-research-uncovers-exposed-deepse... )
3. Multiple teams of security researchers found code that links DeepSeek to the Chinese government through China Mobile, who is banned from operating in the US (https://www.pbs.org/newshour/world/researchers-link-deepseek...)
4. Other countries like South Korea are banning DeepSeek already over privacy concerns (https://mashable.com/article/south-korea-blocks-deepseek)
Trump needs to enforce PAFACA and ban TikTok, but also ban DeepSeek, which has the same exact issues since it is also effectively operated by a foreign adversary and poses various security threats.
gruez
>1. DeepSeek is full of propaganda/censorship (https://arstechnica.com/ai/2025/01/the-questions-the-chinese...)
Your source doesn't even mention "propaganda". Moreover, while censorship is concerning, I don't see how that practically affects users. If I want to know how to center a div, who cares if it's cagey about what happened in 1989?
"Western" AI is also arguably full of "propaganda/censorship". Remember when chatgpt just came out, and conservatives were lambasting it for being "woke"?
>3. Multiple teams of security researchers found code that links DeepSeek to the Chinese government through China Mobile, who is banned from operating in the US (https://www.pbs.org/newshour/world/researchers-link-deepseek...)
Seems like a nothingburger?
"Neither Feroot nor the other researchers observed data transferred to China Mobile when testing logins in North America, but they could not rule out that data for some users was being transferred to the Chinese telecom."
>4. Other countries like South Korea are banning DeepSeek already over privacy concerns (https://mashable.com/article/south-korea-blocks-deepseek)
That's what this thread is about. Many commenters have mentioned why South Korea's actions were dumb.
blackeyeblitzar
> "Western" AI is also arguably full of "propaganda/censorship". Remember when chatgpt just came out, and conservatives were lambasting it for being "woke"?
Clearly a false equivalence. You think government propaganda compelled by a dictatorship with access to a military and nukes is the same thing?
vaidhy
Which government are you talking about? Like the US banned tiktok because Israel did not like it [1]?
Let us be honest here.. China censors directly. US censors indirectly through the private companies [2] and through covert use of force [3]. If you had a pro-Russian stance in 2022 or pro-Palestine stance, you will see your content censored in very subtle ways in US.
[1] https://www.kenklippenstein.com/p/tiktok-ban-fueled-by-israe...
[2] https://indi.ca/why-i-left-medium-they-defenestrated-me/
[3] https://www.kenklippenstein.com/p/the-fbi-knocked-on-my-door
It is surprising how people in the US cannot see the disinformation campaign they are being subjected to.
Some missing context is that the data is shared via the DeepSeek app's use of ByteDance analytics/configuration frameworks. So this is not a backroom deal where DeepSeek handed over the chat history for its user base, but rather ongoing analytics data being sent from the DeepSeek mobile app.
Here's the SecurityScorecard article that brought attention to this: https://securityscorecard.com/blog/a-deep-peek-at-deepseek/#...
Besides the usual analytics data (device metadata, user behavior, app performance, errors, etc.), it's possible raw chat data is being shared as well, but there's no smoking gun.