
Understanding Aggregate Trends for Apple Intelligence Using Differential Privacy

mattnewton

I worked on a similar system at Google for Gboard, the Google-branded Android keyboard; we called it “federated analytics” and it worked with device-to-device communication and invertible Bloom lookup tables. I’m still not entirely sure how the Apple system works after reading this, but I don’t see any mention of data structures like that; instead it seems they are polling the devices themselves? Does anyone else have more insight into the mechanics? That approach seems super inefficient.

https://research.google/blog/improving-gboard-language-model...
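
For anyone who hasn't run into them, an invertible Bloom lookup table is a sketch that lets a server recover the set of items (e.g. word IDs) aggregated across many contributors without storing them individually. A toy version to show the idea, my own illustration rather than anything from the Gboard paper:

    import hashlib

    class IBLT:
        """Toy invertible Bloom lookup table (illustration only)."""

        def __init__(self, m=64, k=3):
            self.m, self.k = m, k
            self.count = [0] * m     # how many keys touched each cell
            self.key_xor = [0] * m   # XOR of all keys that touched each cell

        def _cells(self, key):
            # k distinct cell indices derived from hashes of the key
            idxs, i = [], 0
            while len(idxs) < self.k:
                h = int(hashlib.sha256(f"{i}:{key}".encode()).hexdigest(), 16) % self.m
                if h not in idxs:
                    idxs.append(h)
                i += 1
            return idxs

        def insert(self, key):
            for c in self._cells(key):
                self.count[c] += 1
                self.key_xor[c] ^= key

        def _delete(self, key):
            for c in self._cells(key):
                self.count[c] -= 1
                self.key_xor[c] ^= key

        def list_keys(self):
            # "peel" pure cells (count == 1) until no progress; if the table
            # is sparse enough, this recovers every inserted key
            recovered, progress = [], True
            while progress:
                progress = False
                for c in range(self.m):
                    if self.count[c] == 1:
                        key = self.key_xor[c]
                        recovered.append(key)
                        self._delete(key)
                        progress = True
            return recovered

    t = IBLT()
    for word_id in [101, 202, 303]:
        t.insert(word_id)
    print(sorted(t.list_keys()))   # [101, 202, 303] when peeling succeeds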

matthewdgreen

I went looking for exactly this information the other day. I was surprised to find that it's hard to come up with recent, detailed explanations of what Apple is doing for telemetry collection. When they announced their DP systems back in 2017, they were clearly doing something like Google's RAPPOR [1]. But it's been several years since then and their writeups haven't been updated very much at all [2].

This is pretty important, because these systems aren't so robust that you can just assume everything is working without review. (See, for example, this paper [3].) Apple should at least document what kinds of data are being collected, and precisely how the collection process works.

[1] https://static.googleusercontent.com/media/research.google.c...
[2] https://www.apple.com/privacy/docs/Differential_Privacy_Over...
[3] https://arxiv.org/pdf/1709.02753

jsenn

> This approach works by randomly polling participating devices for whether they’ve seen a particular fragment, and devices respond anonymously with a noisy signal. By noisy, we mean that devices may provide the true signal of whether a fragment was seen or a randomly selected signal for an alternative fragment or no matches at all. By calibrating how often devices send randomly selected responses, we ensure that hundreds of people using the same term are needed before the word can be discoverable. As a result, Apple only sees commonly used prompts, cannot see the signal associated with any particular device, and does not recover any unique prompts. Furthermore, the signal Apple receives from the device is not associated with an IP address or any ID that could be linked to an Apple Account. This prevents Apple from being able to associate the signal to any particular device.

The way I read this, there's no discovery mechanism here, so Apple has to guess a priori which prompts will be popular. How do they know what queries to send?
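
For what it's worth, the "noisy signal" mechanism the quoted passage describes is essentially randomized response. A minimal sketch under assumed parameters (the truth probability and candidate list here are illustrative, not Apple's actual calibration or protocol):

    import random

    def noisy_report(seen_fragment, candidate_fragments, p_truth=0.1):
        # Device-side: report the truth with probability p_truth, otherwise
        # report a uniformly random candidate or "none" (toy parameters).
        if random.random() < p_truth:
            return seen_fragment
        return random.choice(candidate_fragments + ["none"])

    def estimate_true_count(observed, n_devices, n_candidates, p_truth=0.1):
        # Server-side: the reporting probabilities are known, so the true count
        # can be estimated from the aggregate; only fragments reported by many
        # devices rise above the noise floor.
        p_random = (1 - p_truth) / (n_candidates + 1)  # chance of a random hit
        return (observed - n_devices * p_random) / p_truth

Which doesn't answer the discovery question: the server still has to know which fragments to ask about in the first place.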

vineyardmike

I think they do guess a priori what to query...

Later in the article, for a different (but similar) feature:

> To curate a representative set of synthetic emails, we start by creating a large set of synthetic messages on a variety of topics... We then derive a representation, called an embedding, of each synthetic message that captures some of the key dimensions of the message like language, topic, and length. These embeddings are then sent to a small number of user devices that have opted in to Device Analytics.

It's crazy to think Apple is constantly asking my iPhone whether I ever write emails similar to ones about tennis lessons (their example). This feels like the least efficient way to understand users in this context, especially considering they host an email server!

jsenn

Yeah, the linked paper [1] has more detail. Basically they seem to start with a seed set of "class labels" and subcategories (e.g. "restaurant review" + "steak house"). They ask an LLM to generate lots of random texts incorporating those labels, build a differentially private histogram of similarities between the embeddings of those texts and the private data, and then use that histogram to resample the texts, which become the seeds for the next iteration, sort of like a particle filter.

I'm still unclear on how you create that initial set of class labels used to generate the random seed texts, and how sensitive the method is to that initial corpus.

[1] https://arxiv.org/abs/2403.01749
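
A rough sketch of that loop as I read the paper; the embed and llm_vary callables, the nearest-candidate voting, and the Laplace noise scale are my own simplifications, not the authors' exact mechanism:

    import numpy as np

    def dp_synthetic_refinement(seed_texts, embed, llm_vary, device_embeddings,
                                n_rounds=3, epsilon=1.0):
        # Iteratively steer a pool of synthetic texts toward the private data
        # distribution, using only a noisy per-candidate popularity histogram.
        pool = list(seed_texts)
        for _ in range(n_rounds):
            cand = np.stack([embed(t) for t in pool])
            cand = cand / np.linalg.norm(cand, axis=1, keepdims=True)

            # Each device "votes" for the synthetic candidate closest to its own
            # (private, never-uploaded) embedding; the privacy step is modeled
            # here as Laplace noise added to the aggregated tally.
            votes = np.zeros(len(pool))
            for e in device_embeddings:
                votes[int(np.argmax(cand @ (e / np.linalg.norm(e))))] += 1
            noisy = votes + np.random.laplace(0.0, 1.0 / epsilon, size=len(pool))

            # Resample candidates in proportion to the noisy histogram, then ask
            # an LLM for variations of the survivors to seed the next round.
            probs = np.clip(noisy, 0, None)
            probs = probs / probs.sum() if probs.sum() > 0 else np.full(len(pool), 1 / len(pool))
            keep = np.random.choice(len(pool), size=len(pool), p=probs)
            pool = [llm_vary(pool[i]) for i in keep]
        return pool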

warkdarrior

You could brute force it by querying about all 500k English words. With 1.3+ billion iPhone users, that means about 2,600 users would see any given word, which may be enough to observe trends.
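
The back-of-the-envelope version, assuming each device gets polled about one uniformly chosen word:

    vocabulary = 500_000          # rough count of English words
    devices = 1_300_000_000       # the 1.3B iPhone-user figure above

    # expected number of devices polled per word if each device is asked
    # about one uniformly random word
    print(devices / vocabulary)   # 2600.0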

airstrike

> Improving Genmoji

I find it odd that they keep insisting on this to the point that it's the very first example. I'm willing to bet 90% of users don't use genmoji and the 10% who have used it on occasion mostly do it for the lulz at how bizarre the whole thing is.

It seems to me that they don't really have a vision for Apple Intelligence, or at least not a compelling one.

LPisGood

In the last few weeks I have used it for things like very specific drug jokes and a baseball bat.

bitpush

When other companies are curing skin cancer, discovering new proteins, and creating photorealistic images/videos, Apple is... creating Genmojis. lol

threeseed

Apple has done far more for global health with the Apple Watch, Fitness and Health.

And they have a dedicated app for participating in clinical studies: https://www.apple.com/ios/research-app/

bitpush

Than Medtronic? Than Astra Zeneca? Than J&J?

cheschire

Apple has always been like this. They are never the first one to cross the first few checkpoints. They watch what the winning-est competitors are all doing and then they try to copy that to win the race overall.

If they hadn’t saddled themselves with the privacy promises, or if OpenAI were willing to uphold those same promises, then I bet Siri would’ve been wholly replaced by ChatGPT by now.


matt3210

I don't want AI to be part of anything I do unless it's opt-in. When I want to use AI, I'll go use AI; I don't need or want it integrated into my other tools.

I especially don't want it natively on my phone or MacBook unless it's opt-in. The opt-out stuff is soooo frustrating.


lapcat

The article says "opt-in" many times, but my experience as an Apple user, with many devices, is that Apple automatically opts you into analytics, and you have to opt out.

threeseed

They ask you every time you set up a new device or upgrade the OS whether you want to share analytics or not.

It is opt-in but you just need to click a single checkbox:

https://user-images.githubusercontent.com/3705482/142927547-...

lapcat

That looks to me like Apple opts you in, and you have to opt out.

threeseed

Yes, just clarifying that you are re-asked every time you upgrade your OS.

So it's not like Apple is just quietly opting you in.

LPisGood

I just got a new MacBook and I felt reasonably inundated with requests to opt in to things.

billyboar

Why are they obsessed with genmoji ffs

aalimov_

Could be that it’s a popular feature among some portion of their users.

specialist

Maybe normalizing avatars to prep users for their planned future perfect black-emoji-sun-verse?

martin_drapeau

I often write in Frenglish (French and English). Apple auto-complete gets so confused and is utterly useless. ChatGPT can easily switch from one language to another. I wish the auto-complete had ChatGPT's power.


dkga

That is all very nice, but as an Apple user I think they need to step up their game with respect to user experience. I often need to switch between three languages on my iPhone and Mac, and the keyboard autocorrection and suggestions have become notably worse, not better, especially since they introduced the dual keyboard.

klipt

FYI the dual keyboard isn't mandatory; you can still add and use single-language keyboards.

I assume the dual keyboard is aimed at people who code-switch regularly between two languages in the same message.
