Fighting the New York Times' invasion of user privacy
52 comments · November 12, 2025 · nerdjon
stevarino
It's clearly propaganda. "Your data belongs to you." I'm sure the ToS says otherwise, as OpenAI likely owns and utilizes this data. Yes, they say they are working on end-to-end encryption (whatever that means when they control one end), but that is just a proposal at this point.
Also, their framing of the NYT's intent makes me strongly distrust anything they say. Sit down with a third-party interviewer who asks challenging questions, and I'll pay attention.
preinheimer
"Your data belongs to you" but we can take any of your data we can find and use it for free for ever, without crediting you, notifying you, or giving you any way of having it removed.
BolexNOLA
>your data belongs to you
…”as does any culpability for poisoning yourself, suicide, and anything else we clearly enabled but don’t want to be blamed for!”
Edit: honestly I'm surprised I left out the bit where they just indiscriminately scraped everything they could find online to train these models. The stones to go "your data belongs to you" while clearly feeling entitled to our data is unbelievably absurd.
gruez
>…”as does any culpability for poisoning yourself, suicide, and anything else we clearly enabled but don’t want to be blamed for!”
Should Walmart be "culpable" for selling rope that someone hanged themselves with? Should Google be "culpable" for returning results about how to commit suicide?
rpdillon
I wouldn't want to make it out like I think OpenAI is the good guy here. I don't.
But conversations people thought they were having with OpenAI in private are now going to be scoured by the New York Times' lawyers. I'm aware of the third party doctrine and that if you put something online it can never be actually private. But I think this also runs counter to people's expectations when they're using the product.
In copyright cases, typically you need to show some kind of harm. This case is unusual because the New York Times can't point to any harm, so they have to trawl through private conversations OpenAI's customers have had with their service to see if they can find any.
It's quite literally a fishing expedition.
cogman10
I get the feeling, but that's not what this is.
NYTimes has produced credible evidence that OpenAI is simply stealing and republishing their content. The question they have to answer is "to what extent has this happened?"
That's a question they fundamentally cannot answer without these chat logs.
That's what discovery, especially in a copyright case, is about.
Think about it this way. Let's say this were a bookstore selling illegal copies of books. A very reasonable discovery request would be "Show me your sales logs." The whole log needs to be produced; otherwise you can't really trust that this is the real log.
That's what NYTimes lawyers are after. They want the chat logs so they can do their own searches to find NYTimes text within the responses. They can't know how often that's happened and OpenAI has an obvious incentive to simply say "Oh that never happened".
And the reason this evidence is relevant is it will directly feed into how much money NYT and OpenAI will ultimately settle for. If this never happens then the amount will be low. If it happens a lot the amount will be high. And if it goes to trial it will be used in the damages portion assuming NYT wins.
The user has no right to privacy. The same as how any internet service can be (and has been) compelled to produce private messages.
sroussey
> The user has no right to privacy. The same as how any internet service can be (and have been) compelled to produce private messages.
This is nonsense. I’ve personally been involved in these things, and fought to protect user privacy at all levels and never lost.
giraffe_lady
You've successfully fought a subpoena on the basis of a third party's privacy? More than once? I'd love to hear more.
jcranmer
> In copyright cases, typically you need to show some kind of harm.
NYT is suing for statutory damages for copyright infringement. That means you only need to demonstrate the infringement itself, since the infringement alone is considered harm; actual harm only matters if you're suing for actual damages.
This case really comes down to the very unsolved question of whether or not AI training and regurgitation is copyright infringement, and if so, whether it's fair use. The actual ways the AI is being used are thus very relevant for the case, and totally within the bounds of discovery. Of course, OpenAI has also been engaging in this lawsuit with unclean hands in the first place (see some of their earlier discovery dispute fuckery), and they're one of the companies with the strongest "the law doesn't apply to us because we're AI and big tech" swagger.
Workaccount2
The original lawsuit has lots of examples of ChatGPT (3.5? 4?) regurgitating article...snippets. They could get a few paragraphs with ~80-90% perfect replication, but certainly not full articles with full accuracy.
This wasn't solid enough for a summary judgment, and it seems the labs have largely figured out how to stop the models from doing this. So it looks like the NYT wants to comb all user chats rather than pay a team of people tens of thousands a day to try and coax articles out of ChatGPT-5.
Sherveen
Yeah, everyone else in the comments so far is acting emotionally, but --
As a fan and DAU of both OpenAI and the NYT, this is just a weird discovery demand, and there should be another pathway for these two to move forward in this case (NYT gets some semblance of understanding, OAI protects end-user privacy).
vintagedave
100% agreed. In the time you wrote this, I also posted: https://news.ycombinator.com/item?id=45901054
I was quite disappointed with the comments I saw on the thread at that time.
Alex2037
>But conversations people thought they were having with OpenAI in private
...had never been private in the first place.
Not only is the data used for refining the models; OpenAI has also shariah policed plenty of people for generating erotica.
Workaccount2
This is about private chats, which are not used for training and only stored for 30 days.
Also, you need to understand that for huge corps like OpenAI, lying in your ToS will do orders of magnitude more damage to your brand than whatever you would gain by training on <1% more user chats. So no, they are not lying when they say they don't train on private chats.
mock-possum
Yeah, I don't get why more people don't understand this: why would you think your conversation was private when it wasn't actually private? Have you not been paying attention?
IlikeKitties
> OpenAI had also shariah policed plenty of people for generating erotica.
That framing is rhetorically brilliant if you think about it. I will use that more. Chat Sharia Law for Chat Control. Mass Sharia Surveillance from Flock, etc.
EdNutting
So why aren't they offering to have an independent auditor come into OpenAI and inspect the data (without taking it outside of OpenAI's systems)?
Probably because they have a lot to hide, a lot to lose, and no interest in fair play.
Theoretically, they could prove their tools aren't being used to do anything wrong, but practically, we all know they can't, because they are actually in the wrong (in both the moral and, IMO though IANAL, the legal sense). They know it, we know it; the only problem is breaking the ridiculous walled garden that stops the courts from 'knowing' it.
glenstein
By the same token, why isn't NYT proposing something like that rather than the world's largest random sampling?
You don't have to think that OpenAI is good to think there's a legitimate issue with exposing data to a third party for discovery. One could see the Times discovering something in private conversations outside the scope of the case but, through their own interpretation of journalistic necessity, believing it's something they're obligated to publish.
Part of OpenAI holding up their side of the bargain on user data, to the extent they do, is that they don't roll over like a beaten dog to accommodate unconditional discovery requests.
sroussey
I keep asking ChatGPT how to get NYT articles for free and then add lots of vulgar murderous things about their lawyers in the same message. It’s a private thought to an AI, so the attorneys can’t complain, right?
nrhrjrjrjtntbt
OpenAI is deservedly getting a beating in this HN comment section, but any comments about NYT overreach and what it means in general?
And what if they, for example, find evidence of X other thing, such as:
1. Something useful for a story, maybe they follow up in parallel. Know who to interview and what to ask?
2. A crime.
3. An ongoing crime.
4. Something else they can sue someone else for.
5. Top secret information
great_wubwub
> 5. Top secret information
techblueberry
I'll trust the people not asking for a government bailout, thank you very much.
hlieberman
An incredibly cynical attempt at spin from a former non-profit that renounced its founding principles. A class act, all around.
ale42
Why should OpenAI keep those conversations in the first place? (Of course the answer is obvious.) If they didn't keep them, they wouldn't have anything to hand over, and they would have protected users' privacy MUCH better. This is about as much as Facebook or Google care about their users' privacy.
HPsquared
They didn't keep temporary chats. They were ordered to keep those as part of this case.
gruez
>They didn't keep temporary chats
I thought they did? The warning currently says
>This chat won't appear in history, use or update ChatGPT's memory, or be used to train our models. For safety purposes, we may keep a copy of this chat for up to 30 days.
But AFAIK it was this way before the lawsuit as well.
mac3n
> Trust, security, and privacy guide every product and decision we make.
-- openai
nrhrjrjrjtntbt
- any corporation
Remember, a corporation is generally an object owned by some people. Do you trust an "unspecified future group of people" with your privacy? You can't. The best we can do is understand the information architecture and act accordingly.
criddell
In their defense, they don't say they are maximizing trust, security, and privacy, only that those factors guide their decision making. The part they don't say is that they are balancing those things against functionality and revenue potential.
great_wubwub
> Trust, security, and privacy guide every product and decision we make except ones that involve money.
-- openai, probably.
gk1
You know you have a branding problem when (1) you have to say that at the outset, and (2) it induces more eyerolls than a gaggle of golf dads.
frig57
Stopped reading at this line
amelius
Maybe they should release some kind of NYT browser add-on, so users can cooperatively share their OpenAI data?
Apreche
Says the people who scraped as much private information as they could get their hands on to train their bots in the first place.
HPsquared
Can this legal principle be used on Gmail too?
This screams just as genuine as Google saying anything about privacy.
Both companies are clearly wrong here. There is a small part of me that kinda wants OpenAI to lose this, just so maybe it will be a wake-up call to people putting way too much personal information into these services? Am I too hopeful here that people will learn anything...
Fundamentally I agree with what they are saying, though; I just don't find it genuine in the slightest coming from them.