Let's talk about AI and end-to-end encryption
45 comments · January 17, 2025
blueblimp
> Yet this approach is obviously much better than what’s being done at companies like OpenAI, where the data is processed by servers that employees (presumably) can log into and access.
No need for presumption here: OpenAI is quite transparent about the fact that they retain data for 30 days and have employees and third-party contractors look at it.
https://platform.openai.com/docs/models/how-we-use-your-data
> To help identify abuse, API data may be retained for up to 30 days, after which it will be deleted (unless otherwise required by law).
https://openai.com/enterprise-privacy/
> Our access to API business data stored on our systems is limited to (1) authorized employees that require access for engineering support, investigating potential platform abuse, and legal compliance and (2) specialized third-party contractors who are bound by confidentiality and security obligations, solely to review for abuse and misuse.
nashashmi
The most depressing realization in all of this is that the vast trove of data we keep in the cloud, which we assumed could not be scanned even for criminal activity, has now become a vector for thought police to come down on us for simple ideas of dissent.
AlexandrB
A lot of people tried to sound the alarm. It's not "the cloud", it's "other people's computers". And given that other people own these machines, their interests - whether commercial or ideological - will always come first.
jrm4
"The goal of encryption is to ensure that only two parties, the receiver and sender, are aware of the contents of your data.
Thus, AI training on your data breaks this, because it's another party.
You now don't have encryption."
Thanks for coming to my blah blah blah
ozgune
> Apple even says it will publish its software images (though unfortunately not the source code) so that security researchers can check them over for bugs.
I think Apple recently changed their stance on this. Now, they say that "source code for certain security-critical PCC components are available under a limited-use license." Of course, would have loved it if the whole thing was open source. ;)
https://github.com/apple/security-pcc/
> The goal of this system is to make it hard for both attackers and Apple employees to exfiltrate data from these devices.
I think Apple is claiming more than that. They are saying 1/ they don't keep any user data (data only gets processed during inference), 2/ no privileged runtime access, so their support engineers can't see user data, and 3/ they make binaries and parts of the source code available to security researchers to validate 1/ and 2/.
You can find Apple PCC's five requirements here: https://security.apple.com/documentation/private-cloud-compu...
Note: Not affiliated with Apple. We read through the PCC security guide to see what an equivalent solution would look like in open source. If anyone is interested in this topic, please hit me up at ozgun @ ubicloud . com.
bee_rider
The author helpfully emphasized the interesting question at the end
> This future worries me because it doesn’t really matter what technical choices we make around privacy. It does not matter if your model is running locally, or if it uses trusted cloud hardware — once a sufficiently-powerful general-purpose agent has been deployed on your phone, the only question that remains is who is given access to talk to it. Will it be only you? Or will we prioritize the government’s interest in monitoring its citizens over various fuddy-duddy notions of individual privacy.
I do think there are interesting policy questions there. I mean it could hypothetically be mandated that the government must be given access to the agent (in the sense that we and these companies exist in jurisdictions that can pass arbitrary laws; let’s skip the boring and locale specific discussion of whether you think your local government would pass such a law).
But on a technical level, it seems like it ought to be possible to run an agent locally, on a system with full disk encryption, and not allow anyone who doesn’t have access to the system to talk with it, right? So on a technical level I don’t see how this is any different from where we were previously. I mean you could also run a bunch of regexes from the ’80s to find whether or not somebody has, whatever, communist pamphlets on their computers.
There’s always been a question of whether the government should be able to demand access to your computer. I guess it is good to keep in mind that if they are demanding access to an AI agent that ran on your computer, they are basically asking for a lossy record of your entire hard drive.
cryptonector
> The author helpfully emphasized the interesting question at the end
We're already there. AI or not doesn't affect the fact that smartphones gather, store, and transmit a great deal of information about their users and their users' actions and interests.
_boffin_
Unreasonable search?
bee_rider
> (in the sense that we and these companies exist in jurisdictions that can pass arbitrary laws; let’s skip the boring and locale specific discussion of whether you think your local government would pass such a law)
Anyway the idea of what’s a reasonable search in the US has been whittled away to almost nothing, right? “The dog smelled weed on your hard drive.” - A cop, probably.
lowbatt
Maybe a little off topic, but is there a way for a distributed app to connect to one of the LLM companies (OpenAI, etc.) without the unencrypted data hitting an in-between proxy server?
An app I'm building uses LLMs to process messages. I don’t want the unencrypted message to hit my server - and ideally I wouldn’t have the ability to decrypt it. But I can’t communicate directly from client -> LLM Service without leaking the API key.
simonw
"But I can’t communicate directly from client -> LLM Service without leaking the API key."
There is a way you can do that right now: the OpenAI WebRTC API introduced the idea of an "ephemeral key": https://platform.openai.com/docs/guides/realtime-webrtc
This provides a way for your server to create a limited-time API key for a user which their browser can then use to talk to OpenAI's API directly without proxying through you.
I love this idea, but I want it for way more than just the WebRTC API, and I'd like it for other API providers too.
My ideal version would be a way to create an ephemeral API key that's only allowed to talk to a specific model with a specific pre-baked system prompt (and maybe tool configuration and suchlike) and that only works for a limited time and has a limited token budget.
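To make that wish list concrete, here is a minimal sketch of what such a scoped key could look like on the provider side. It is not OpenAI's ephemeral key mechanism or any real provider's API: `SERVER_SECRET`, `mint_ephemeral_key`, and `check_request` are hypothetical names, and the design (an HMAC-signed claims blob) is just one assumed way to pin a key to a model, a system prompt, an expiry time, and a token budget.

```python
import base64
import hashlib
import hmac
import json
import time

# Hypothetical signing secret, known only to the API provider.
SERVER_SECRET = b"demo-secret-known-only-to-the-provider"


def mint_ephemeral_key(model, system_prompt, ttl_seconds, token_budget, now=None):
    """Issue a signed key that pins model, prompt, expiry, and budget."""
    issued = time.time() if now is None else now
    claims = {
        "model": model,
        "prompt_sha256": hashlib.sha256(system_prompt.encode()).hexdigest(),
        "expires_at": issued + ttl_seconds,
        "token_budget": token_budget,
    }
    payload = base64.urlsafe_b64encode(json.dumps(claims, sort_keys=True).encode())
    sig = hmac.new(SERVER_SECRET, payload, hashlib.sha256).hexdigest()
    return payload.decode() + "." + sig


def check_request(key, model, system_prompt, tokens_requested, now=None):
    """Return True only if the request stays inside the key's scope."""
    current = time.time() if now is None else now
    try:
        payload, sig = key.rsplit(".", 1)
    except ValueError:
        return False  # no signature part at all
    expected = hmac.new(SERVER_SECRET, payload.encode(), hashlib.sha256).hexdigest()
    # Verify the signature before trusting anything inside the payload.
    if not hmac.compare_digest(sig.encode(), expected.encode()):
        return False
    claims = json.loads(base64.urlsafe_b64decode(payload))
    prompt_hash = hashlib.sha256(system_prompt.encode()).hexdigest()
    return (
        current < claims["expires_at"]
        and model == claims["model"]
        and prompt_hash == claims["prompt_sha256"]
        and tokens_requested <= claims["token_budget"]
    )
```

In this sketch the budget is only checked per request. Enforcing a cumulative token budget would need the provider to keep a usage counter per key, which is left out here.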
lowbatt
interesting, will check that out. thanks!
whyage
Check out https://www.opaque.co/
michaelmrose
Will such processing be cheap enough to be done by a box that plugs into a customer's router? Would customers buy one? And not just for this use case, but for others too.
crackalamoo
See also CrypTen, Meta's library for privacy preserving machine learning: https://github.com/facebookresearch/CrypTen. This isn't fully homomorphic encryption, but it is multi-party computation (MPC), which hides the inputs from the company owning the model.
But while not revealing user input, it would still reveal the outputs of the model to the company. And yeah, as the article mentions, unfortunately this kind of thing (MPC or fully-homomorphic encryption) probably won't be feasible for the most powerful ML models.
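For readers unfamiliar with MPC, the core trick can be sketched with additive secret sharing. This is not CrypTen's API: the function names, the two-party setup, and the modulus `PRIME` are assumptions made for illustration, and the example only covers a linear layer with public weights. Real MPC for ML also needs protocols for multiplying two secret values and for non-linearities, which is where most of the cost comes from.

```python
import secrets

PRIME = 2**61 - 1  # field modulus; an arbitrary choice for this sketch


def share(value, n_parties=2):
    """Split an integer into n additive shares that sum to it mod PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares


def reconstruct(shares):
    """Recombine shares; only someone holding all of them learns the value."""
    return sum(shares) % PRIME


def local_linear(weights, input_shares):
    """One party's work: apply public weights to its own shares only."""
    return sum(w * s for w, s in zip(weights, input_shares)) % PRIME


def shared_dot(weights, inputs):
    """Dot product of public weights with inputs that no single party sees."""
    per_input = [share(x) for x in inputs]   # the user splits each input
    party_views = list(zip(*per_input))      # party i sees only column i
    output_shares = [local_linear(weights, view) for view in party_views]
    return reconstruct(output_shares)


print(shared_dot([2, 3, 5], [7, 11, 13]))  # 2*7 + 3*11 + 5*13 = 112
```

Each party's view is a list of uniformly random field elements, so on its own it says nothing about the inputs. Whoever collects all the output shares and calls `reconstruct` does see the result, which is the output-visibility caveat above.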
bobbiechen
There is always going to be a gap between local-first processing and what can be achieved in a full-sized datacenter/cloud. That leads to the risks mentioned in the article.
I wrote about Apple's Private Cloud Compute last year; for the foreseeable future, I still think server-side Confidential Computing is the most practical way to do processing without huge privacy risks: https://www.anjuna.io/blog/apple-is-using-secure-enclaves-to...
jFriedensreich
I think this also has a silver lining. The E2E encryption movement, especially for messenger apps, was largely also used to silently lock users out of their own data, effectively preventing them from using that data to move between apps, write automations, or keep archives. This is not just true for WhatsApp (the data export feature has not fully worked since its launch and was made only to appease some EU law, with nobody properly checking that the button works all the way to the end). Signal does not have a way to do this either. Maybe with AI coming into the game, companies will finally decide to provide access to data. I just hope it happens in a transparent way, with user opt-in and user control.
do_not_redeem
1. Is data encrypted in transit?
2. Can the user access their data at rest?
Those two things are entirely orthogonal.
I don't think you can extrapolate a trend from a few apps having bugs in their export code. Google Takeout is also notoriously buggy and they don't use E2E encryption. A more likely explanation is companies of all kinds don't care that much about export functionality, due to the incentives involved.
jFriedensreich
You CAN extrapolate from nearly all E2E encrypted apps not giving users a way to use their data. And there is a big difference between buggy Google export features and Facebook actively making export unusable to lock in users.
walrus01
Signal does not have a way to manually export your private keys and chat history, but the process of "moving" your signal account to a new phone is quite straightforward. You put both devices on the same wifi/LAN layer 2 broadcast segment, start the transfer process in the app, input the verification codes displayed on the screen from both devices, and it sends everything over. This moves the private key in a way that does not result in all of your contacts receiving the scary "this person's key has changed" message.
jFriedensreich
"Moving your account" is not what I am talking about, and besides, it is not possible between Android and iOS with history. User agency means users are allowed to access their data and do what they want with it, how they want, in real time, and with whatever code they want to write to do so.
Animats
> Who does your AI agent actually work for?
Yes. I made that point a few weeks ago. The legal concept of principal and agent applies.
Running all content through an AI in the cloud to check for crimethink[1] is becoming a reality. Currently proposed:
- "Child Sexual Abuse Material", which is a growing category that now includes AI-generated images in the US and may soon extend to Japanese animation.
- Threats against important individuals. This may be extended to include what used to be considered political speech in the US.
- Threats against the government. Already illegal in many countries. Bear in mind that Trump likes to accuse people of "treason" for things other than making war against the United States.
- "Grooming" of minors, which is vague enough to cover most interactions.
- Discussing drugs, sex, guns, gay activity, etc. Variously prohibited in some countries.
- Organizing protests or labor unions. Prohibited in China and already searched for.
Note that talking around the issue or jargon won't evade censorship. LLMs can deal with that. Run some ebonics or leetspeak through an LLM and ask it to translate it to standard English. Translation will succeed. The LLM has probably seen more of that dialect than most people.
"If you want a vision of the future, imagine a boot stepping on a face, forever" - Orwell
iugtmkbdfil834
The cynic in me is amused at the thought of some as-yet-unknown corporation being placed under investigation because of a trigger phrase in a meeting that was transcribed incorrectly.
Your point is worth reiterating.
Terr_
Or poisoned data that sets up a trap, so that a system will later confabulate false innocence or guilt when certain topics or targets come up.
walrus01
It's a good thing that encrypted data at rest on your local device is inaccessible to cloud-based "AI" tools. The problem is that your average person will blithely click "yes/accept/proceed/continue/I consent" on pop-up dialogs in a GUI and agree to just about any Terms of Service, including one that allows your data to be decrypted before it's sent to some "cloud"-based service.
I see "AI" tools being used even more in the future to permanently tie people to monthly recurring billing services for things like iCloud, Microsoft's personal tier of Office 365, Google Workspace, etc. You'll pay $15 a month forever, and the amount of your data held by the cloud-based provider, and your dependency on it, will mean that you have no viable path to ever stop paying without significant disruption to your life.
> We are about to face many hard questions about these systems, including some difficult questions about whether they will actually be working for us at all.
And how. I'd lean towards no. Where we're headed feels like XKEYSCORE on steroids. I'd love to take the positive, optimistic bent on this, but when you look at where we've been combined with the behavior of the people in charge of these systems (to be clear, not the researchers or engineers, but c-suite), hope of a neutral, privacy-first future seems limited.