OpenAI slams court order to save all ChatGPT logs, including deleted chats

sinuhe69

Why could a court favor the interest of the New York Times in a vague accusation versus the interest and rights of hundreds of millions of people?

Billions of people use the internet daily. If any organization suspects some people of using the internet for illicit purposes against its interests, would the court order ISPs to log all activities of all people? Would Google be ordered to save the searches of all its customers because some might use it for bad things? And once we start, where will we stop? Crimes could have happened in the past or could happen in the future, so will the court order ISPs and Google to retain the logs for 10 years, 20 years? Why not 100 years? Who should bear the cost of such outrageous demands?

The consequences of such orders have an enormous impact that the puny judge cannot even begin to comprehend. Privacy right is an integral part of the freedom of speech, a core human right. If you don't have private thoughts and private information, anybody can be incriminated using that past information. We will cease to exist as individuals, and I argue we will cease to exist as humans as well.

capnrefsmmat

Courts have always had the power to compel parties to a current case to preserve evidence. (For example, this was an issue in the Google monopoly case, since Google employees were using chats set to erase after 24 hours.) That becomes an issue in the discovery phase, well after the defendant has an opportunity to file a motion to dismiss. So a case with no specific allegation of wrongdoing would already be dismissed.

The power does not extend to any of your hypotheticals, which are not about active cases. Courts do not accept cases on the grounds that some bad thing might happen in the future; the plaintiff must show some concrete harm has already occurred. The only thing different here is how much potential evidence OpenAI has been asked to retain.

dragonwriter

> Courts have always had the power to compel parties to a current case to preserve evidence.

Not just that, even without a specific court order parties to existing or reasonably anticipated litigation have a legal obligation that attaches immediately to preserve evidence. Courts tend to issue orders when a party presents reason to believe another party is out of compliance with that automatic obligation, or when there is a dispute over the extent of the obligation. (In this case, both factors seem to be in play.)

btown

Lopez v. Apple (2024) seems to be a recent and useful example of this; my lay understanding is that Apple was found to have failed in its duty to switch from auto-deletion (even if that auto-deletion was contractually promised to users) to an evidence-preservation level of retention, immediately when litigation was filed.

https://codiscovr.com/news/fumiko-lopez-et-al-v-apple-inc/

https://app.ediscoveryassistant.com/case_law/58071-lopez-v-a...

Perhaps the larger lesson here is: if you don't want your service provider to end up being required to retain your private queries, there's really no way to guarantee it, and the only real mitigation is to choose a service provider who's less likely to be sued!

(Not a lawyer, this is not legal advice.)

golol

So if Amazon sues Google, claiming that it is being disadvantaged in search rankings, a court should be able to force Google to log all search activity, even when users delete it?

cogman10

Yes. That's how the US court system works.

Google can (and would) file to keep that data private and only the relevant parts would be publicly available.

A core aspect to civil lawsuits is everyone gets to see everyone else's data. It's that way to ensure everything is on the up and up.

dragonwriter

If Amazon sues Google, a legal obligation to preserve all evidence reasonably related to the subject of the suit attaches immediately when Google becomes aware of the suit, and, yes, if there is a dispute about the extent of that obligation and/or Google's actual or planned compliance with it, the court can issue an order relating to it.

monetus

At Google's scale, what would be the hosting costs of this I wonder. Very expensive after a certain point, I would guess.

saddist0

It can be just anonymised search history in this case.

lcnPylGDnU4H9OF

So then the courts need to find who is setting their chats to be deleted and order them to stop. Or find specific infringing chatters and order OpenAI to preserve these specified users’ logs. OpenAI is doing the responsible thing here.

capnrefsmmat

OpenAI is the custodian of the user data, so they are responsible. If you wanted the court (i.e., the plaintiffs) to find specific infringing chatters, first they'd have to get the data from OpenAI to find who it is -- which is exactly what they're trying to do, and why OpenAI is being told to preserve the data so they can review it.

dragonwriter

> So then the courts need to find who is setting their chats to be deleted and order them to stop.

No, actually, it doesn't. Ordering a party to stop destroying evidence relevant to a current case (which is its obligation even without a court order) irrespective of whether someone else asks it to destroy that evidence is both within the well-established power of the court, and routine.

> Or find specific infringing chatters and order OpenAI to preserve these specified users’ logs.

OpenAI is the alleged infringer in the case.

IAmBroom

Under this theory, if a company had employees shredding incriminating documents at night, the court would have to name those employees before ordering them to stop.

That is ridiculous. The company itself receives that order, and is IMMEDIATELY legally required to comply - from the CEO to the newest-hired member of the cleaning staff.

MeIam

The Times does not need user logs to prove such a thing if it is true. The Times can show that it is possible and demonstrate how their own users can access the text. Why would they need other users' data?

KaiserPro

> The Times does not need user logs to prove such a thing if it is true.

No, it needs to show how often it happens to demonstrate how much impact it has had.

mandevil

For the most part (there are a few exceptions), in the US lawsuits are not based on "possible" harm but actual observed harm. To show that, you need actual observed user behavior.

dragonwriter

> The Times can show that it is possible

The allegation is not merely that infringement is possible; the actual occurrence and scale are relevant to the case.

mrtksn

>Why could a court favor the interest of the New York Times in a vague accusation versus the interest and rights of hundreds of millions of people?

Probably because they bothered to pursue such a thing and hundreds of millions of people did not.

How do you conclusively know if someone's content-generating machine infringes on your rights? By saving all of its input/output for investigation.

It's ridiculous, sure, but is it less ridiculous than AI companies claiming that copyright shouldn't apply to them because it would be bad for their business?

IMHO those are just growing pains. Back in the day, people used to believe that the law didn't apply to them because they did it on the internet, and they were mostly right, because the laws were made for another age. Eventually the laws caught up, both for criminal matters and for copyright. It will be the same for AI; right now we are in the wild west age of AI.

TimPC

AI companies aren't seriously arguing that copyright shouldn't apply to them because "it's bad for business". The main argument is that they qualify for fair use because their work is transformative which is one of the major criteria for fair use. Fair use is the same doctrine that allows a school to play a movie for educational purposes without acquiring a license for the public performance of that movie. The original works don't have model weights and can't answer questions or interact with a user so the output is substantially different from the input.

c256

> Fair use is the same doctrine that allows a school to play a movie for educational purposes without acquiring a license for the public performance of that movie.

This is a pretty bad example, since fair use has been ruled to NOT allow this.

no_wizard

If AI companies don’t want the court headaches they should instead preemptively negotiate with rights holders and get agreements in place for the sharing of data.

mrtksn

Yeah, and the online radio providers argued that they don’t do anything shady, their service was basically just a very long antenna.

Anyway, the laws were not written with this type of processing in mind. In fact the whole idea of intellectual property breaks down now. Just like the early days of the internet.

mandevil

https://www.copyright.gov/title17/92chap1.html#110 seems to this non-lawyer to be a specific carve out allowing movies to be shown, face-to-face, in non-profit educational contexts without any sort of license. The Fair Use Four Factors test (https://www.copyright.gov/title17/92chap1.html#107) isn't even necessary in this example.

Absent a special legal carve-out, you need to get judges to do the Fair Use Four Factors test, and decide on how AI should be treated. To my very much engineer and not legal eye, AI does great on point 3, but loses on points 1, 2, and 4, so it is something that will need to be decided by the judges, how to balance those four factors defined in the law.

rodgerd

> AI companies aren't seriously arguing that copyright shouldn't apply to them because "it's bad for business".

AI companies have, in fact, said that the law shouldn't apply to them or they won't make money. That is literally the argument Nick Clegg is using to argue that copyright protection should be removed from authors and musicians in the UK.

freejazz

That's not entirely true. A lot of their briefing refers to how impractical and expensive it would be to license all the content they need for the models.

AStonesThrow

> allows a school to play a movie

No, it doesn’t. Play 10% of a movie for the purpose of critiquing it, perhaps.

https://fairuse.stanford.edu/overview/fair-use/four-factors/

Fair Use is not an a priori exemption or exception; Fair Use is an “affirmative defense” so once you have your day in court and the judge asks your attorney why you needed to play 10% of Priscilla, Queen of the Desert for your Gender Studies class, then you can run down those Four Factors enumerated by the Stanford article.

Particularly “amount and substantiality”.

Teachers and churches get tripped up by this all the time. But I’ve also been blessed with teachers who were very careful academically and sought to impart the same caution on all students about using copyrighted materials. It is not easy when fonts have entered the chat!

The same reason you or your professor cannot show/perform 100% of an unlicensed film under any circumstance, is the same basis that creators are telling the scrapers that they cannot consume 100% of copyrighted works on that end. And if the risks may involve reproducing 87% of the same work in their outputs, that’s beyond the standard thresholds.

shkkmo

> It's ridiculous, sure, but is it less ridiculous than AI companies claiming that copyright shouldn't apply to them because it would be bad for their business?

Since that wasn't ever a real argument, your strawman is indeed ridiculous.

The argument is that requiring people to have a special license to process text with an algorithm is a dramatic expansion of the power of copyright law. Expansions of copyright law will inherently advantage large corporate users over individuals as we see already happening here.

New York Times thinks that they have the right to spy on the entire world to see if anyone might be trying to read articles for free.

That is the problem with copyright. That is why copyright power needs to be dramatically curtailed, not dramatically expanded.

dogman144

You raise good points but the target of your support feels misplaced. Want private AI? You must self-host and inspect whether it's phoning home. No way around it, in my view.

Otherwise, you are picking your data privacy champions as the exact same companies, people and investors that sold us social media, and did something quite untoward with the data they got. Fool me twice, fool me three times… where is the line?

In other words - OAI has to save logs now? Candidly they probably were already, or it’s foolish not to assume that.

jrm4

Love the spirit of what you say and I practice it myself, literally.

But also, no - "just self-host or it's all your fault" is never ever a sufficient answer to the problem.

It's exactly the same as when Exxon says "what are you doing to lower your own carbon footprint?" It's shifting the burden unfairly; companies like OpenAI put themselves out there and thus must ALWAYS be held to task.

dogman144

I actually agree with your disagreement, and my answer is more scoped to a technical audience that has the know-how to deal with it.

I wish it was different and I agree that there’s a massive accountability hole with… who could it be?

Pragmatically it is what it is, self host and hope for bigger picture change.

naming_the_user

Anything else is literally impossible, though.

If you send your neighbour nudes then they have your nudes. You can put in as many contracts as you want, maybe they never digitised it but their friend is over for a drink and walks out of the door with the shoebox of film. Do not pass GO, do not collect.

Conceivably we can try to control things like e.g. is your cellphone microphone on at all times, but once someone else, particularly an arbitrary entity (e.g. not a trusted family member or something) has the data, it is silly to treat it as anything other than gone.

lovich

Then your problem is with the US legal system, not this individual ruling.

You lose your rights to privacy in your papers without a warrant once you hand data off to a third party. Nothing in this ruling is new.

fluidcruft

A pretty clear distinction is that all ISPs in the world are not currently involved in a lawsuit with New York Times and are not accused of deleting evidence. What OpenAI is accused of is significantly different from merely agnostically routing packets between A and B. OpenAI is not raising astronomical funds because they operate as an ISP.

tailspin2019

> Privacy right is an integral part of the freedom of speech

I completely agree with you, but as a ChatGPT user I have to admit my fault in this too.

I have always been annoyed by what I saw as shameless breaches of copyright of thousands of authors (and other individuals) in the training of these LLMs, and I've been wary of the data security/confidentiality of these tools from the start too - and not for no reason. Yet I find ChatGPT et al so utterly compelling and useful, that I poured my personal data[0] into these tools anyway.

I've always felt conflicted about this, but the utility just about outweighed my privacy and copyright concerns. So as angry as I am about this situation, I also have to accept some of the blame too. I knew this (or other leaks or unsanctioned use of my data) was possible down the line.

But it's a wake up call. I've done nothing with these tools which is even slightly nefarious, but I am today deleting all my historical data (not just from ChatGPT[1] but other hosted AI tools) and will completely reassess my approach of using them - likely with an acceleration of my plans to move to using local models as much as I can.

[0] I do heavily redact my data that goes into hosted LLMs, but there's still more private data in there about me than I'd like.

[1] Which I know is very much a "after the horse has bolted" situation...

CamperBob2

Keeping in mind that the purpose of IP law is to promote human progress, it's hard to see how legacy copyright interests should win a fight with AI training and development.

100 years from now, nobody will GAF about the New York Times.

stackskipton

IP law was meant to promote human progress by giving a financial incentive to create IP, knowing it would be protected and you could make money off it.

DannyBee

Lawyer here

First - in the US, privacy is not a constitutional right. It should be, but it's not. You are protected against government searches, but that's about it. You can claim it's a core human right or whatever, but that doesn't make it true, and it's a fairly reductionist argument anyway. It has, fwiw, also historically not been seen as a core right for thousands of years. So i think it's a harder argument to make than you think despite the EU coming around on this. Again, I firmly believe it should be a core right, but asserting that it is doesn't make that true.

Second, if you want the realistic answer - this judge is probably overworked and trying to clear a bunch of simple motions off their docket. I think you probably don't realize how many motions they probably deal with on a daily basis. Imagine trying to get through 145 code reviews a day or something like that. In this case, this isn't the trial, it's discovery. Not even discovery quite yet, if i read the docket right. Preservation orders of this kind are incredibly common in discovery, and it's not exactly high stakes most of the time. Most of the discovery motions are just parties being a pain in the ass to each other deliberately. This normally isn't even a thing that is heard in front of a judge directly, the judge is usually deciding on the filed papers.

So i'm sure the judge looked at it for a few minutes, thought it made sense at the time, and approved it. I doubt they spent hours thinking hard about the consequences.

OpenAI has asked to be heard in person on the motion, i'm sure the judge will grant it, listen to what they have to say, and determine they probably fucked it up, and fix it. That is what most judges do in this situation.

zerocrates

Even in the "protected against government searches" sense from the 4th Amendment, that right hardly exists when dealing with data you send to a company like OpenAI thanks to the third-party doctrine.

pama

Thanks. As an EU citizen am I exempt from this order? How does the judge or the NYTimes or OpenAI know that I am an EU citizen?

ElevenLathe

The court in question has no obligations to you at all.

mananaysiempre

The current legal stance in the US seems to be that you, not being a US person, have no particular legally protected interest in privacy at all, so you have nothing to complain about here and can’t even sue. The only avenue the EU would have to change that is the diplomatic one, but the Commission does not seem to care.

adgjlsfhk1

you aren't and they don't.

tiahura

While the Constitution does not explicitly enumerate a "right to privacy," the Supreme Court has consistently recognized substantive privacy rights through Due Process Clause jurisprudence, establishing constitutional protection for intimate personal decisions in Griswold v. Connecticut (1965), Lawrence v. Texas (2003), and Obergefell v. Hodges (2015).

ComposedPattern

> It has, fwiw, also historically not been seen as a core right for thousands of years.

Nothing has been seen as a core right for thousands of years, as the concept of human rights is only a few hundred years old.

HardCodedBias

"First - in the US, privacy is not a constitutional right"

What? The supreme court disagreed with you in Griswold v. Connecticut (1965) and Roe v. Wade (1973).

While one could argue that they were vastly stretching the meaning of words in these decisions the point stands that at this time privacy is a constitutional right in the USA.

DannyBee

Roe v. Wade is considered explicitly overruled, as well as considered wrongly decided in the first place, as of 2022 (Dobbs).

They also explicitly stated a constitutional right to privacy does not exist, and pointed out that Casey abandoned any such reliance on this sort of claim.

Griswold also found a right to marital privacy. Not general privacy.

Griswold is also barely considered good law anymore, though i admit it has not been explicitly overruled - it is definitely on the chopping block, as more than just Thomas has said.

In any case, more importantly, none of them have found any interesting right to privacy of the kind we are talking about here, but instead more specific rights to privacy in certain contexts. Griswold found a right to marital privacy in "the penumbra of the bill of rights". Lawrence found a right to privacy in your sexual activity.

In dobbs, they explicitly further denied a right to general privacy, and argued previous decisions conflated these: " As to precedent, citing a broad array of cases, the Court found support for a constitutional “right of personal privacy.” Id., at 152. But Roe conflated the right to shield information from disclosure and the right to make and implement important personal decisions without governmental interference."

You are talking about the former, which none of these cases were about. They are all about the latter.

So this is very far afield from a general right to privacy of the kind we are talking about, and more importantly, one that would cover anything like OpenAI chats.

So basically, you have a ~200 year period where it was not considered a right, and then a 50 year period where specific forms of privacy were considered a right, and now we are just about back to the former.

The kind of privacy we are talking about here ("the right to shield information from disclosure") has always been subject to a balancing of interests made by legislatures, rather than a constitutional right upon which they may not infringe. Examples abound - you actually don't have to look any further than court filings themselves, and when you are allowed to proceed anonymously or redact/file things under seal. The right to public access is considered much stronger than your right to not want the public to know embarrassing or highly private things about your life. There are very few exceptions (minors, etc).

Again, i don't claim any of this is how it is should be. But it's definitely how it is.

krapp

¯\_(ツ)_/¯ The supreme court overturned Roe v. Wade in 2022 and explicitly stated in their ruling that a constitutional right to privacy does not exist.

shkkmo

> It has, fwiw, also historically not been seen as a core right for thousands of years. So i think it's a harder argument to make than you think despite the EU coming around on this.

This doesn't seem true. I'd assume you know more about this than I do though so can you explain this in more detail? The concept of privacy is definitely more than thousands of years old. The concept of a "human right", is arguably much newer. Do you have particular evidence that a right to privacy is a harder argument to make that other human rights?

While the language differs, the right to privacy is enshrined more or less explicitly in many constitutions, including those of 11 US states. It isn't just a European thing.

static_motion

I understand what they mean. There's this great video [1] which explains it in better terms than I ever could. I've timestamped the link because it's quite long, but if you've got the time it's a fantastic video with a great narrative and presentation.

[1] https://youtu.be/Fzhkwyoe5vI?t=4m9s

dragonwriter

> Why could a court favor the interest of the New York Times in a vague accusation versus the interest and rights of hundreds of millions of people?

Because the law favors preservation of evidence for an active case above most other interests. It's not a matter of arbitrary preference by the particular court.

piombisallow

Regardless of the details of this specific case, the courts are not democratic and do not decide based on the interest of the parties or how many they are, they decide based on the law.

brookst

This is not true even in the slightest.

The law is not a deterministic computer program. It’s a complex body of overlapping work and the courts are specifically chartered to use judgement. That’s why briefs from two parties in a dispute will often cite different laws and precedents.

For instance, Winter v. NRDC specifically says that courts must consider whether an injunction is in the public interest.

piombisallow

"public interest" is a much more ambiguous thing than the written law

efskap

Note that this also applies to GPT models on the API

> That risk extended to users of ChatGPT Free, Plus, and Pro, as well as users of OpenAI’s application programming interface (API), OpenAI said.

This seems very bad for their business.

merksittich

Interesting detail from the court order [0]: When asked by the judge if they could anonymize chat logs instead of deleting them, OpenAI's response effectively dodged the "how" and focused on "privacy laws mandate deletion." This implicitly admits they don't have a reliable method to sufficiently anonymize data to satisfy those privacy concerns.

This raises serious questions about the supposed "anonymization" of chat data used for training their new models, i.e. when users leave the "improve model for all users" toggle enabled in the settings (which is the default even for paying users). So, indeed, very bad for the current business model which appears to rely on present users (voluntarily) "feeding the machine" to improve it.

[0] https://cdn.arstechnica.net/wp-content/uploads/2025/06/NYT-v...

Kon-Peki

Thank you for the link to the actual text!

So, the NYT asked for this back in January and the court said no, but asked OpenAI if there was a way to accomplish the preservation goal in a privacy-preserving manner. OpenAI refused to engage for 5 f’ing months. The court said “fine, the NYT gets what they originally asked for”.

Nice job guys.

noworriesnate

Nice find! Maybe this is a ploy by OpenAI to use API requests for training while blaming the courts?

blackqueeriroh

That’s not an implicit admission, it’s refusing to argue something they don’t want to do.

neilv

Some established businesses will need to review their contracts, regulations, and risk tolerance.

And wrapper-around-ChatGPT startups should double-check their privacy policies, that all the "you have no privacy" language is in place.

999900000999

I'm not going to look up the comment, but a few months back I called this out and said that if you seriously want to use any LLM in a privacy-sensitive context, you need to self-host.

For example, if there are business consequences for leaking customer data, you better run that LLM yourself.

TeMPOraL

My standard reply to such comments over the past year has been the same: you probably want to use Azure instead. A big part of the business value they provide is ensuring regulatory compliance.

There are multinational corporations with a heavy presence in Europe that run their whole business on Microsoft cloud, including keeping and processing their privacy-sensitive data, business-critical data, and medical data there, and yes, that includes using some of this data with LLMs - hosted on Azure. Companies of this size cannot ignore regulatory compliance and hope no one notices. This only works because MS figured out how to keep it compliant.

Point being, if there are business consequences, you'll be better off using Azure-hosted LLMs than running a local model yourself - they're just better than you or me at this. The only question is, whether you can afford it.

jaggederest

I've been poking around the medical/EHR LLM space and gently asking people how they're preserving privacy, and everyone appears to be just shipping data to cloud providers based solely on a BAA. Kinda baffling to me; my first step would be to set up local models even if they're not as good, since data breaches are expensive.

ted537

Yeah, it's an awkward position, as self-hosting is going to be insanely expensive unless you have a substantial userbase to amortize the costs over. At least for a model comparable to GPT-4o or DeepSeek.

But at least if you use an API in the same region as your customers, court order shenanigans won't get you caught between different jurisdictions.

fakedang

And ironically because OpenAI is actually ClosedAI, the best self-hostable model available currently is a Chinese model.

Etheryte

In the European privacy framework, and legal framework at large, you can't terms of service away requirements set by the law. If the law requires you to keep the logs, there is nothing you can get the user to sign off on to get you out of it.

zombot

OpenAI keeping the logs is the "you have no privacy" part. Anyone who inspects those logs can see what the users were doing. But now everyone knows they're keeping logs and they can't lie their way out of it. So, for your own legal safety, put it in your TOS. Then every user should know they can't use your service if they want privacy.

cj

> Some established businesses will need to review their contracts, regulations, and risk tolerance.

I've reviewed a lot of SaaS contracts over the years.

Nearly all of them have clauses that allow the vendor to do whatever they have to if ordered to by the government. That doesn't make it okay, but it means OpenAI customers probably don't have a legal argument, only a philosophical argument.

Same goes for privacy policies. Nearly every privacy policy has a carve out for things they're ordered to do by the government.

Nasrudith

Yeah. You basically need cyberpunk style corporate extraterritoriality to get that particular benefit, of being able to tell governments to go screw themselves.

Wowfunhappy

> And wrapper-around-ChatGPT startups should double-check their privacy policies, that all the "you have no privacy" language is in place.

If a court orders you to preserve user data, could you be held liable for preserving user data? Regardless of your privacy policy.

gpm

I don't think the suit would be against you preserving it, it would be against you falsely representing that you aren't preserving it.

A court ordering you to stop selling pigeons doesn't mean you can keep your store for pigeons open and pocket the money without delivering pigeons.

cortesoft

Almost all privacy policies are going to have a call out for legal rulings. For example, here is the Hackernews Legal section in the privacy policy (https://www.ycombinator.com/legal/)

> Legal Requirements: If required to do so by law or in the good faith belief that such action is necessary to (i) comply with a legal obligation, including to meet national security or law enforcement requirements, (ii) protect and defend our rights or property, (iii) prevent fraud, (iv) act in urgent circumstances to protect the personal safety of users of the Services, or the public, or (v) protect against legal liability.

woliveirajr

Yes. If your agreement with the end user says that you won't collect and store data, you're responsible for it. If you can't provide it (even if due to a court order), you have to adjust your contract.

Your users aren't obligated to know that you're using OpenAI or another provider.

pjc50

> If a court orders you to preserve user data, could you be held liable for preserving user data?

No, because you turn up to court and show the court order.

It's possible a subsequent case could get the first order overturned, but you can't be held liable for good faith efforts to comply with court orders.

However, if you're operating internationally, then suddenly it's possible that you may be issued competing court orders both of which are "valid". This is the CLOUD Act problem. In which case the only winning move becomes not to play.

bilbo0s

No. It’s a legal court order.

This, however, is horrible for AI regardless of whether or not you can sue.

Chris2048

Just to be pedantic: could the company encrypt the logs with a third-party key in escrow, such that they would not be able to access that data, but the third party could provide access, e.g. for a court?
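
A minimal sketch of that escrow idea, assuming the PyNaCl library (the key names and log format are made up): the service seals each log entry to the escrow agent's public key, so the operator holds only ciphertext, while the escrow holder could decrypt for a court.

    # Sketch of third-party key escrow for stored logs (assumes PyNaCl is installed).
    from nacl.public import PrivateKey, SealedBox

    # Done once by the escrow agent; only the public half is handed to the service.
    escrow_private = PrivateKey.generate()
    escrow_public = escrow_private.public_key

    # Service side: seal each log entry before writing it to storage.
    log_entry = b"2025-06-05T12:00:00Z user=abc123 prompt=..."
    ciphertext = SealedBox(escrow_public).encrypt(log_entry)

    # Escrow side (e.g. under a court order): only the private key can open it.
    assert SealedBox(escrow_private).decrypt(ciphertext) == log_entry

Whether a court would accept an arrangement where the preserving party itself cannot read the preserved material is a separate question, of course.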

HappMacDonald

The problem ultimately isn't a technical one but a political one.

Point 1: Every company has profit incentive to sell the data in the current political climate, all they need is a sneaky way to access it without getting caught. That includes the combo of LLM provider and Escrow non-entity.

Point 2: No company has profit incentive to defend user privacy, or even the privacy of other businesses. So who could run the Escrow service? Another business? Then they have incentive to cheat and help the LLM provider access the data anyway. The government (and which one)? Their intelligence arms want the data just as much as any company does so you're back to square one again.

"Knowledge is power" combined with "Knowledge can be copied without anyone knowing" means that there aren't any currencies presently powerful enough to convince any other entity to keep your secrets for you.

johnQdeveloper

> This seems very bad for their business.

Well, it is gonna be all _AI Companies_ very soon, so unless everyone switches to local models, which don't really have the same degree of profitability as a SaaS, it's probably not going to kill a company to have less user privacy, because tbh people are used to not having privacy these days on the internet.

It certainly will kill off the few companies/people trusting them with closed source code or security related stuff but you really should not outsource that anywhere.

csomar

Did an American court just destroy all American AI companies in favor of open weight Chinese models?

pjc50

No, because users don't care about privacy all that much, and for corporate clients discovery is always a risk anyway.

See the whole LIBOR chat business.

thot_experiment

afaik only OpenAI is enjoined in this

mountainriver

You can fine tune models on a multitenant base model and it’s often more profitable.

SchemaLoad

>don't really have the same degree of profitability as a SaaS

They have a fair bit. Local models let companies sell you a much more expensive bit of hardware. Once Apple gets their stuff together it could end up being a genius move to go all in on local after the others have repeated scandals of leaking user data.

johnQdeveloper

Yes, but it shifts all the value onto companies producing hardware and selling enterprise software to people who get locked into contracts. The market is significantly smaller, both in number of companies and in margins, if they have to build value-adds they won't charge for in order to move hardware.

bsder

> It certainly will kill off the few companies/people trusting them with closed source code or security related stuff but you really should not outsource that anywhere.

And how many companies have proprietary code hosted on Github?

johnQdeveloper

None that I've worked for so I don't really track the statistics tbh.

Where I've worked, we've always done self-hosted, with tools as old as Gerrit and whatnot that aren't even really feature-complete compared to competitors.

Kokouane

If you were working with code that was proprietary, you probably shouldn't of been using cloud hosted LLMs anyways, but this would seem to seal the deal.

larrymcp

I think you probably mean "shouldn't have". There is no "shouldn't of".

rimunroe

Which gives you an opening for the excellent double contraction “shouldn’t’ve”

knicholes

I care.

amanaplanacanal

That used to be the case, but "shouldn't of" is definitely becoming more popular, even if it seems wrong. Languages change before our eyes :)

DecentShoes

Who cares?

gpm

I think it's fair to question how proprietary your data is.

Like, there's the algorithm by which a hedge fund is doing algorithmic trading; they'd be insane to take the risk. Then there's the code for a video game: it's proprietary, but competitors don't benefit substantially from an illicit copy. You ship the compiled artifacts to everyone, so the logic isn't that secret. Copies of similar source code have leaked before with no significant effects.

FuckButtons

AFAIK, the actual trading algorithms themselves aren’t usually that far from what you can find in a textbook, their efficacy is mostly dictated by market conditions and the performance characteristics of the implementation / system as a whole.

short_sells_poo

Most (all?) hedge funds that use AI models explicitly run in-house. People do use commercial LLMs, but in cases where the LLMs are not run in-house, it's against the company policy to upload any proprietary information (and generally this is logged and policed).

A lot of the use is fairly mundane and basically replaces junior analysts. E.g. it's digesting and summarizing the insane amounts of research that is produced. I could ask an intern to summarize the analysis on platinum prices over the last week, and it'll take them a day. Alternatively, I can feed in all the analysis that banks produce to an LLM and have it done immediately. The data fed in is not a trade secret really, and neither is the output. What I do with the results is where the interesting things happen.

YetAnotherNick

Why not? Assuming you believe you can use any cloud for backup or Github for code storage.

solaire_oa

IIUC one reason is that prompts and other data sent to 3rd-party LLM hosts have a chance of being funneled to 4th-party RLHF platforms, e.g. SageMaker, Mechanical Turk, etc. So a random gig worker could be reading a .env file the intern uploaded.

consumer451

All GPT integrations I’ve implemented have been via Azure’s service, due to Microsoft’s contractual obligation for them not to train on my data.

As far as I understand it, this ruling does not apply to Microsoft, does it?

Descon

I think when you spin up OpenAI in Azure, that instance is yours, so I don't believe it would be subject to this order.

tbrownaw

The plans scale down far enough that they can't possibly cover the cost of a private model-loaded-to-vram instance at the low end.

dinobones

How? This is retention for legal risk, not for training purposes.

They can still have legal contracts with other companies, that stipulate that they don't train on any of their data.

paxys

Your employees' seemingly private ChatGPT logs being aired in public during discovery for a random court case you aren't even involved in is absolutely a business risk.

lxgr

I get where it's historically coming from, but the combination of American courts having almost infinite discovery rights (to be paid by the losing party, no less, greatly increasing legal risk even to people and companies not out to litigate) and the result of said discoveries ending up on the public record seems like a growing problem.

There's a qualitative difference resulting from quantitatively much easier access (querying some database vs. having to physically look through court records) and processing capabilities (an army of lawyers reading millions of pages vs. anyone, via an LLM) that doesn't seem to be accounted for.

pjc50

People need to read up on the LIBOR scandal. There was a lot of "wait why are my chat logs suddenly being read out as evidence of a criminal conspiracy".

godelski

Retention means an expansion of your threat model. Specifically, in a way you have little to no control over.

It's one thing if you get pwned because a hacker broke into your servers. It is another thing if you get pwned because a hacker broke into somebody else's servers.

At this point, do we believe OpenAI has a strong security infrastructure? Given the court order, it doesn't seem possible for them to have sufficient security for practical purposes. Your data might be encrypted at rest, but who has the keys? When you're buying secure instances, you don't want the provider to have your keys...

bcrosby95

Isn't it a risk even if they retain nothing? Likely less of a risk, but it's still a risk that you have no way to deep dive on, and you can still get "pwned" because someone broke into their servers.

antihipocrat

Will a business located in another jurisdiction be comfortable that the records of all staff queries & prompts are being stored and potentially discoverable by other parties? This is more than just a Google search, these prompts contain business strategy and IP (context uploads for example)

CryptoBanker

Right, because companies always follow the letter of their contracts.

lxgr

Why would the reason matter for people that don't want their data retained at all?

Take8435

...Data that is kept can be exfiltrated.

fn-mote

Cannot emphasize this enough. If your psychologist’s records can be held for ransom, surely your ChatGPT queries will end up on the internet someday.

Do search engine companies have this requirement as well? I remember back in the old days deanonymizing “anonymous” query logs was interesting. I can’t imagine there’s any secrecy left today.

jameshart

Thinking about the value of the dataset of Enron's emails that was disclosed during their trials, imagine the value, and the cost to humanity, of even a few months of OpenAI's API logs being entered into the court record.

ivape

Going to drop a PG tweet:

https://x.com/paulg/status/1913338841068404903

"It's a very exciting time in tech right now. If you're a first-rate programmer, there are a huge number of other places you can go work rather than at the company building the infrastructure of the police state."

---

So, courts order the preservation of AI logs, and the government orders the building of a massive database. You do the math. This is such an annoying time to be alive in America, to say the least. PG needs to start blogging again about what's going on nowadays. We might be entering the digital version of the 60s, if we're lucky. Get local, get private, get secure, fight back.

jacob019

I think the court overstepped by ordering OpenAI to save all user chats. Private conversations with AI should be protected - people have a reasonable expectation that deleted chats stay deleted, and knowing everything is preserved will chill free expression. Congress needs to write clear rules about what companies can and can't do with our data when we use AI. But honestly, I don't have much faith that Congress can get their act together to pass anything useful, even when it's obvious and most people would support it.

ethagnawl

Why is AI special in this regard? Why is my exchange with ChatGPT any more privileged than my DuckDuckGo search for _HIV test margin of error_?

jacob019

You're right, it's not special.

This is from DuckDuckGo's privacy policy: "We don’t track you. That’s our Privacy Policy in a nutshell. We don’t save or share your search or browsing history when you search on DuckDuckGo or use our apps and extensions."

If the court compelled DuckDuckGo to log all searches, I would be equally concerned.

sib

That's a pretty significant difference, though.

OpenAI (and other services) log and preserve your interactions, in order to either improve their service or to provide features to you (e.g., your chat history, personalized answers, etc., from OpenAI). If a court says "preserve all your user interaction logs," they exist and need to be preserved.

DDG explicitly does not track you or retain any data about your usage. If a court says "preserve all your users' interaction logs," there is nothing to be preserved.

It is a very different thing - and a much higher bar - for a court to say "write code to begin logging user interaction data and then preserve those logs."

robocat

DuckDuckGo uses Bing.

It would be interesting to know how much Microsoft logs or tracks.

raincole

AI is not special, and that's the exact issue. The court set a precedent here. If OpenAI can be ordered to preserve all the logs, then DuckDuckGo could face the same issue even if they don't want to do that.

energy123

People upload about 100x more information about themselves to ChatGPT than search engines.

nradov

How did the court overstep? Orders to preserve evidence are routine in civil cases. Customer expectations about privacy have zero legal relevance.

jacob019

Sure, preservation orders are routine - but this would be like ordering phone companies to record ALL calls just in case some might become evidence later. There's a huge difference between preserving specific communications in a targeted case and mass surveillance of every private conversation. The government shouldn't have that kind of blanket power over private communications.

charonn0

> but this would be like ordering phone companies to record ALL calls just in case some might become evidence later

That's not a good analogy. They're ordered to preserve records they would otherwise delete, not create records they wouldn't otherwise have.

nradov

No, it wouldn't be like that at all. Phone companies and telephone calls are covered under a different legal regime so your analogy is invalid.

pjc50

Consider the opposite prevailing, where I can legally protect my warez site simply by saying "sorry, the conversation where I sent them a copy of a Disney movie was private".

riskable

The legal situation you describe is a matter of impossibility and unrelated to the OpenAI case.

In the case of a warez site they would never have logged such a "conversation" to begin with. So if the court requested that they produce all such communications the warez site would simply declare that as, "Impossibility of Performance".

In the case of OpenAI the courts are demanding that they preserve all future communications from all their end users—regardless of whether or not those end users are parties (or even relevant) to the case. The court is literally demanding that they re-engineer their product to record all communications where none existed previously.

I'm not a lawyer but that seems like it would violate FRCP 26(b)(1) which covers "proportionality". Meaning: The effort required to record the evidence is not proportional relative to the value of the information sought.

Also—generally speaking—courts recognize that a party is not required to create new documents or re-engineer systems to satisfy a discovery request. Yet that is exactly what the court has requested of OpenAI.

lcnPylGDnU4H9OF

If specific users are violating the law, then a court can and should order their data to be retained.

BrtByte

The preservation order feels like a blunt instrument in a situation that needs surgical precision

marcyb5st

Would it be possible to comply with the order by anonymizing the data?

The court is after evidence that users use ChatGPT to bypass paywalls. Anonymizing the data in a way that makes it impossible to 1) pinpoint the users and 2) reconstruct the generic user conversation history would preserve privacy and allow OpenAI to comply in good faith with the order.

The fact that they are blaring sirens and hiding behind "we can't, think about users' privacy" feels akin to willful negligence, or that they know they have something to hide.

lcnPylGDnU4H9OF

> feels akin to willful negligence or that they know they have something to hide

Not at all; there is a presumption of innocence. Unless a given user is plausibly believed to be violating the law, there is no reason to search their data.

Miraltar

Anonymizing data is really hard, and I'm not sure they'd be allowed to do it. I mean, they're accused of deleting evidence; why would they be allowed to alter it?

amanaplanacanal

If it's possible evidence as part of a lawsuit, of course they can't delete it.

jacob019

A targeted order is one thing, but this applies to ALL data. My data is not possible evidence as part of a lawsuit, unless you know something I don't know.

artursapek

That’s… not how discovery works

ianks

This ruling is unbelievably dystopian for anyone that values a right to privacy. I understand that the logs will be useful in the occasional conviction, but storing a log of people’s most personal communications is absolutely not a just trade.

To protect their users from this massive overreach, OpenAI should defy this order and eat the fines IMO.

imiric

This is a moot issue. OpenAI and all AI service providers already use all user-provided data for improving their models, and it's only a matter of time until they start selling it to advertisers, if they don't already. Whether or not they actually delete chat conversations is irrelevant.

Anyone concerned about their privacy wouldn't use these services to begin with. The fact they are so popular is indicative that most people value the service over their privacy, or simply don't care.

wongarsu

Plenty of service providers (including OpenAI) offer you the option to kindly ask them not to, and will even contractually agree not to use or sell your data if you want such an agreement.

Yes, they want to use everyone's data. But they also want everyone as a customer, and they can't have both at once. Offering people an opt-out is a popular middle-ground because the vast majority of people don't care about it, and those that do care are appeased

malwrar

They will do it when they need the money and/or feel they have the leverage for precisely the same reason that 99% of people won’t care. It’s better to assume they’re just sitting on your data and waiting until they can get away with using it.

imiric

That's nice. How can a user verify whether they fully comply with those contracts?

They have every incentive not to, and no oversight to hold them accountable if they don't. Do you really want to trust your data is safe based on a pinky promise from a company?

thewebguyd

> The fact they are so popular is indicative that most people value the service over their privacy, or simply don't care.

Or, the general populace just doesn't understand the actual implications. The HN crowd can be guilty of severely overestimating the average person's tech literacy, and especially their understanding of privacy policies and ToS. Many may think they are OK with it, but I'd argue it's because they don't understand the potential real-world consequences of such privacy violations.

imiric

> Or, the general populace just doesn't understand the actual implications.

That might've been the case in the first generations of ad-supported business models on the web. But after two decades, even non-technical users have understood the implications of "free" services.

IME talking to non-technical people about this topic, I can't remember the last time someone mentioned not being aware of the ToS and privacy policies they agree to, even if they likely hadn't read the legalese. Whereas the most common excuses I've heard are "I have nothing to hide", and "I don't use it often".

So I think you're underestimating the average person's tech literacy. I'm sure people who still don't understand the implications exist, but they're in the minority.

yard2010

It's almost rigged. Either they are keeping the data (and ofc making money off it), or they are deleting it, destroying the evidence of the crimes they're committing.

romanovcode

This has nothing to do with convicting criminals and everything to do with the CIA gathering profiles on every single person they can.

paxys

Not only does this mean OpenAI will have to retain this data on their servers, they could also be ordered to share it with the legal teams of companies they have been sued by during discovery (which is the entire point of a legal hold). Some law firm representing NYT could soon be reading out your private conversations with ChatGPT in a courtroom to prove their case.

fhub

My guess is they will store them on tape e.g. on something like Spectra TFinity ExaScale library. I assume AWS glacier et al use this sort of thing for their deep archives.

Storing them on something that has an hours-to-days retrieval window satisfies the court order, is cheaper, and makes me as a customer that little bit more content with it (a mass data breach would take months of plundering and be easily detectable).

genewitch

Glacier is tape silos, but this is textual data. You don't need to save output images, just the checkpoint+hash of the generating model and the seed. Stable diffusion saves this until you manually delete the metadata, for example. So my argument is you could do this with LTO as well. Text compresses well, especially if you don't do it naively.
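
As a rough illustration of the "don't do it naively" point, a sketch using only Python's standard zlib with made-up messages: compressing each short message on its own barely helps, while compressing a batch together exploits the redundancy across messages.

    # Sketch: batched text compresses far better than per-message compression.
    import zlib

    messages = [f"user asked about topic {i % 7}; assistant replied at length..." for i in range(1000)]

    raw = sum(len(m.encode()) for m in messages)
    per_message = sum(len(zlib.compress(m.encode())) for m in messages)
    batched = len(zlib.compress("\n".join(messages).encode()))

    print(f"raw: {raw} bytes, per-message: {per_message} bytes, batched: {batched} bytes")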

JKCalhoun

> She suggested that OpenAI could have taken steps to anonymize the chat logs but chose not to

That is probably the solution right there.

blagie

This data cannot be anonymized. This is trivially provable mathematically, and given the type of data, it should also be intuitively obvious to even the most casual observer.

If you're talking to ChatGPT about being hunted by a Mexican cartel, and having escaped to your Uncle's vacation home in Maine -- which is the sort of thing a tiny (but non-zero) minority of people ask LLMs about -- that's 100% identifying.

And if the Mexican cartel finds out, e.g. because NY Times had a digital compromise at their law firm, that means someone is dead.

Legally, I think NY Times is 100% right in this lawsuit holistically, but this is a move which may -- quite literally -- kill people.

zarzavat

It's like anonymizing your diary by erasing your name on the cover.

JKCalhoun

I don't dispute your example, but I suspect there is a non-zero number of cases that would not be so extreme, so obviously identifiable.

So, sure, no panacea, but .. why not for the cases where it would be a barrier?

genewitch

AOL found out, and thus we all found out, that you can't anonymize certain things, web searches in that case. I used to have bookmarked some literature from maybe ten years ago that said (proved with math?) that any moderate collection of data from or by individuals that fits certain criteria is de-anonymizable, if not by itself, then with minimal extra data. I want to say it covered the case where, for instance, instead of changing all occurrences of genewitch to user9843711, every instance of genewitch was a different, unique id.

I apologize for not having cites or a better memory at this time.

paxys

> She suggested that OpenAI could have taken steps to anonymize the chat logs but chose not to, only making an argument for why it "would not" be able to segregate data, rather than explaining why it "can’t."

Sounds like bullshit lawyer speak. What exactly is the difference between the two?

dijksterhuis

Not wanting to do something isn't the same thing as being unable to do something.

!define would

> Used to express desire or intent -- https://www.wordnik.com/words/would

!define cannot

> Can not ( = am/is/are unable to) -- https://www.wordnik.com/words/cannot

bilbo0s

I’d just assume that any chat or api call you do to any cloud based ai in th US will be discoverable from here on out.

If that’s too big a risk it really is time to consider locally hosted LLMs.

amanaplanacanal

That's always been the case for any of your data anywhere in any third party service of any kind, if it is relevant evidence in a lawsuit. Nothing specific to do with LLMs.

marcyb5st

I ask again: why not anonymize the data? That way NYT/the court could see whether users are bypassing the paywall through ChatGPT while preserving privacy.

Even if I wrote it, I don't care if someone reads out loud in public court "user <insert_hash_here> said: <insert nastiest thing you can think of here>".
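
A sketch of that kind of pseudonymization, using only Python's standard library (the key and user ID are made up): a keyed hash gives a stable "user <hash>" label without exposing the account, though, as pointed out below, it does nothing about identifying details inside the conversation text itself.

    # Sketch: replace user IDs with a keyed hash so logs read "user <hash> said: ...".
    # The key must stay secret, or the hashes could be re-derived by brute force.
    import hmac, hashlib

    PSEUDONYM_KEY = b"operator-held-secret"  # hypothetical key kept by the operator

    def pseudonymize(user_id: str) -> str:
        return hmac.new(PSEUDONYM_KEY, user_id.encode(), hashlib.sha256).hexdigest()[:16]

    print(f"user {pseudonymize('alice@example.com')} said: ...")
    # This hides who spoke, not what was said; PII inside the text remains.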

Orygin

You can't really anonymize the data if the conversation itself is full of PII.

I had colleagues chat with GPT, and they sent all kinds of identifying information to it.

kragen

Copyright in its current form is incompatible with private communication of any kind through computers, because computers by their nature make copies of the communication, so it makes any private communication through a computer into a potential crime, depending on its content. The logic of copyright enforcement, therefore, demands access to all such communications in order to investigate their legality, much like the Stasi.

Inevitably such a far-reaching state power will be abused for prurient purposes, for the sexual titillation of the investigators, and to suppress political dissent.

6stringmerc

This is a ludicrous assertion and factually inaccurate beyond all practical intelligence.

A computer in service of an individual absolutely follows copyright because the creator is in control of the distribution and direction of the content.

Besides, copyright is a civil statute, not criminal. Everything about this comment is the most obtuse form of FUD possible. I’m pro copyright reform, but this is “Uncle off his meds ranting on Facebook” unhinged and shouldn’t be given credence whatsoever.

malwrar

> A computer in service of an individual absolutely follows copyright because the creator is in control of the distribution and direction of the content.

I don't understand what that means. A computer in service of an individual turns copyright law into mattress-tag-removal law: practically unenforceable.

kragen

None of that is correct. Some of it is not even wrong, demonstrating an unbelievably profound ignorance of its topic. Furthermore, it is gratuitously insulting.

pjc50

> Besides, copyright is a civil statute, not criminal

Nope. https://www.justia.com/intellectual-property/copyright/crimi...

Imnimo

So if you're a business that sends sensitive data through ChatGPT via the API and were relying on the representation that API inputs and outputs were not retained, OpenAI will just flip a switch to start retaining your data? Were notifications sent out, or did other companies just have to learn about this from the press?

celnardur

There have been a lot of opinion pieces popping up on HN recently that describe the benefits their authors see from LLMs and rebut the drawbacks most critics talk about. While they do bring up interesting points, NONE of them have even mentioned the privacy aspect.

This is the main reason I can’t use any LLM agents or post any portion of my code into a prompt window at work. We have NDAs and government regulations (like ITAR) we’d be breaking if any code left our servers.

This just proves the point. Until these tools are local, privacy will be an Achilles' heel for LLMs.

garyfirestorm

You can always self host an LLM which is completely controlled on your own server. This is trivial to do.

redundantly

Trivial after a substantial hardware investment and installation, configuration, testing, benchmarking, tweaking, hardening, benchmarking again, new models come out so more tweaking and benchmarking and tweaking again, all while slamming your head against the wall dealing with the mediocre documentation surrounding all hardware and software components you're trying to deploy.

Yup. Trivial.

dvt

Even my 4-year-old M1 Pro can run a quantized Deepseek R1 pretty well. Sure, full-scale productizing these models is hard work (and the average "just-make-shovels" startups are failing hard at this), but we'll 100% get there in the next 1-2 years.

benoau

I put LM Studio on an old gaming rig with a 3060 Ti; it took about 10 minutes to start using it, and most of that time was downloading a model.
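
For what it's worth, the "start using it" part can be as small as pointing the standard OpenAI client at the local server LM Studio (or Ollama) exposes; the port and model name below are assumptions that depend on your local setup.

    # Minimal sketch: LM Studio serves an OpenAI-compatible API on localhost
    # (default port 1234; Ollama uses 11434). The base_url and model name are
    # assumptions -- check what your local server reports.
    # Requires: pip install openai
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed-locally")

    resp = client.chat.completions.create(
        model="deepseek-r1-distill-qwen-7b",  # whatever model you loaded locally
        messages=[{"role": "user", "content": "Summarize this function for me: ..."}],
    )
    print(resp.choices[0].message.content)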

jjmarr

If you're dealing with ITAR compliance you should have experience with hosting things on-premises.

dlivingston

Yes. The past two companies I've been at have self-hosted enterprise LLMs running on their own servers and connected to internal documentation. There is also Azure Cloud for Gov and other similar privacy-first ways of doing this.

But also, running LLMs locally is easy. I don't know what goes into hosting them as a service for your org, but just getting an LLM running locally is a straightforward 30-minute task.

genewitch

I'm for hire, I'll do all that for any company that needs it. Email in profile. Contract or employee, makes no difference to me.

blastro

This hasn't been my experience. Pretty easy with AWS Bedrock

celnardur

Yes, but which of the state-of-the-art models that offer the best results are you allowed to do this with? As far as I've seen, the models you can host locally are not the ones being praised left and right in these articles. My company actually allows people to use a hosted version of Microsoft Copilot, but most people don't because it's still not that much of a productivity boost (if any).

genewitch

Deepseek isn't good enough? You need a beefy GPU cluster, but I bet it would be fine until the large Llama is better at coding, and I'm certain there will be other large open models. Now if there's some new technology around the corner, someone might be able to build a moat, but in a surprising twist, Facebook did us all a favor by releasing their weights back when; there's no moat possible, in my estimation, with LLMs as they stand today. Not even "multi-model" implementations. Which I have at home, too.

Say OpenAI implements something that makes their service 2x better. Just using it for a while should give people who live and breathe this stuff enough information to tease out how to implement something like it, and eventually it'll make it into local-only applications and models.

aydyn

It is not at all trivial for an organization that may be doing everything in the cloud to set up the necessary hardware locally and to get the networking and security right for an LLM running on that hardware.

woodrowbarlow

> NONE of them have even mentioned the privacy aspect

because the privacy aspect has nothing to do with LLMs and everything to do with relying on cloud providers. HN users have been vocal about that since long before LLMs existed.

g42gregory

Can somebody please post a complete list of these news organizations, demanding to see all of our ChatGPT conversations?

I see one of them: The New York Times.

We need to let people know who the other ones are.

tgv

Why?

DaSHacka

To know what subscriptions we need to cancel.

tgv

Yeah, shoot the messenger, that has always worked.

lrvick

There is absolutely no reason for these logs to exist.

Run the LLM in an enclave that generates ephemeral encryption keys. Have users encrypt text directly to those enclave ephemeral keys, so prompts are confidential and only ever visible in an environment that is not capable of logging.

All plaintext data will always end up in the hands of governments if it exists, so make sure it does not exist.
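
A minimal sketch of the client-side half of that design, using libsodium sealed boxes via PyNaCl; in a real deployment the keypair would be generated inside an attested enclave and the client would verify the attestation report before sealing anything -- both steps are omitted here, and everything runs in one process purely for illustration.

    # pip install pynacl
    from nacl.public import PrivateKey, SealedBox

    # Inside the enclave: generate an ephemeral keypair and publish the public
    # key (normally alongside a remote-attestation report proving where it was
    # generated). The private key never leaves the enclave.
    enclave_key = PrivateKey.generate()
    enclave_pubkey = enclave_key.public_key

    # On the client: seal the prompt to the enclave's public key. The
    # operator's logging infrastructure only ever sees ciphertext.
    ciphertext = SealedBox(enclave_pubkey).encrypt(b"my confidential prompt")

    # Back inside the enclave: decrypt, run inference, and encrypt the reply
    # to a client-supplied key before anything leaves the enclave.
    plaintext = SealedBox(enclave_key).decrypt(ciphertext)
    print(plaintext.decode())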

jxjnskkzxxhx

Then a court will order that you don't encrypt. And probably go after you for trying to undermine the intent of the previous court order. Or what, you thought you found an obvious loophole in the entire legal system?

lrvick

Yes. Because once you have remote attestation, anyone can host these enclaves in any country and charge some tiny fee for their GPU time.

Decentralize hosting and encryption, and the centralized developers of the open-source software will be literally unable to comply.

This well-proven strategy would, however, only be possible if anything about OpenAI were actually open.

TechDebtDevin

Do you have any reading on this?

paxys

Encryption does not negate copyright laws. The solution here is for LLM builders to pay for training data.

ronsor

The solution here is to get rid of copyright.

mucha

That's happening. Unmodified LLM outputs aren't copyrightable.

OJFord

Better link in the thread: https://arstechnica.com/tech-policy/2025/06/openai-says-cour...

(As in, an actual article, not just a mastodon-tweet from some unknown (maybe known? Not by me) person making the title claim, with no more info.)

incompatible

Looks like https://en.wikipedia.org/wiki/Lauren_Weinstein_(technologist...; he has been commenting on the Internet for about as long as it has existed.

bibinou

And? The article you linked only has primary sources.

genewitch

Roughly how many posts on HN are by people you know?

OJFord

Of those that are tweets and similar? Almost all of them (the ones I look at being interested in the topic anyway).

By 'know' I mean recognise the name as some sort of authority. I don't 'know' Jon Gruber or Sam Altman or Matt Levine, but I'll recognise them and understand why we're discussing their tweet.

The linked tweet (whatever it's called) didn't say anything more than the title did here, so it was pointless to click through, really. In the replies someone asked for the source and someone else replied with the link I commented above. (I don't 'know' those people either, but I recognise Ars, and even if I didn't, I'd appreciate the longer form with more info.)

genewitch

thanks for engaging.

> The linked tweet (whatever it's called)

"post" works for social media regardless of the medium; not an admonishment, an observation. Also, by the time i saw this, it was already an Ars link, leaving some comments with less context that i apparently didn't pick up on. I was able to make my observation because someone mentioned mastodon (i think), but that was an assumption on my part that the original link was mastodon.

So I asked the question to make sure it wasn't some bias against mastodon (or the fediverse), because I'd have liked to ask, "for what reason?"

ronsor

This court order certainly violates privacy laws in multiple jurisdictions and existing contracts OpenAI may have with customers.

CryptoBanker

Existing contracts have zero bearing on what a court may and may not order.

ronsor

Contracts don't, but foreign law is going to make this a pain for OpenAI. Other countries may not care what a U.S. court orders; they want their privacy laws followed.

jillesvangurp

This is why American cloud providers have legal entities outside of the US. Those have to comply with the law in the countries where they are based if they want to do business there. That's how AWS, Azure, GCP, etc. can do business in the EU. Most of that business is neatly partitioned from any exposure to US courts. There are some treaties that govern what these companies can and cannot send back to the US that some might take issue with and that are policed and scrutinized quite a bit on the EU side.

OpenAI does this as well of course. Any EU customers are going to insist on paying via an EU based entity in euros and will be talking to EU hosted LLMs with all data and logs being treated under EU law, not US law. This is not really optional for commercial use of SAAS services in the EU. To get lucrative enterprise contracts outside the US, OpenAI has no other choice but to adapt to this. If they don't, somebody else will and win those contracts.

I was actually at a defense conference in Bonn last week talking to a representative of Google Cloud. I was surprised that they were there at all, because the Germans are understandably a bit paranoid about trusting US companies with hosting confidential stuff (considering the scandals a few years ago about the CIA spying on the German government). But they actually do offer some services to the BWI, which is the part of the German army that takes care of its IT needs. And German spending on defense is of course very high right now, so there are a lot of companies trying to sell in Germany, on Germany's terms. Including Google.

mosdl

That's OpenAI's issue, not the court's.

adriand

The order also dates back to May 13. What the fuck?! That's weeks ago! The only reason I can think of for why OpenAI did not warn its users about this via an email notification is that it's bad for their business. But wow is it ever a breach of trust not to.

jcranmer

I don't think the order creates any new violations of privacy law. OpenAI's ability to retain the data and give it to third parties would have been the violation in the first place.