OpenAI whistleblower found dead in San Francisco apartment

498 comments

·December 13, 2024

rafram

Wow. Suchir was my project partner in a CS class at Berkeley (Operating Systems!). Incredibly smart, humble, nice person. It was obvious that he was going to do amazing things. This is really awful.

BillFranklin

There are some pretty callous comments on this thread.

This is really sad. Suchir was just 26, and graduated from Berkeley 3 years ago.

Here’s his personal site: https://suchir.net/.

I think he was pretty brave for standing up against what is generally perceived as an injustice being done by one of the biggest companies in the world, just a few years out of college. I’m not sure how many people in his position would do the same.

I’m sorry for his family. He was clearly a talented engineer. On his LinkedIn he has some competitive programming prizes which are impressive too. He probably had a HN account.

Before others post about the definition of whistleblower or talk about assassination theories just pause to consider whether, if in his position, you would that want that to be written about you or a friend.

csomar

> Before others post about the definition of whistleblower or talk about assassination theories just pause to consider whether, if in his position, you would that want that to be written about you or a friend.

Yes, if I was a few months away from giving the court a statement and I "suicided" myself, I'd rather have people tribulate about how my death happened than expect to take the suicide account without much push.

Sure, if I killed myself in silence I want to go in silence. But it's not clear from the article how critical this guy is in the upcoming lawsuits

> Information he held was expected to play a key part in lawsuits against the San Francisco-based company.

that_guy_iain

> But it's not clear from the article how critical this guy is in the upcoming lawsuits

If he was the key piece to the lawsuit the lawsuit wouldn't really have legs. To get the ball rolling someone like him would have to be critical but after they're able to get the ball rolling and get discovery if after all that all you have is one guy saying there is copyright infringement you've not found anything.

And realistically, the lawsuit is, while important, rather minor in scope and damage it could do to OpenAI. It's not like folk will go to jail, and it's not like OpenAI would have to close its doors, they would pay at most a few hundred million?

mu53

each missing piece weakens the case

chefandy

Do all lawsuits that are damaged by losing key witnesses lack legs, counselor?

ballooney

I don’t think you’re using the word tribulate correctly here.

lbrunson

Missing the forest for the trees.

bdcravens

If I die in the midst of whistleblowing, I hereby give permission for everyone to not ignore that fact.

griomnib

Sure seems like this is happening more frequently, eg with the Boeing guy. So it’s reasonable to ask why.

If you look at Aaron Schwartz for example you see they don’t have to assassinate you, they just have so many lawyers, making so many threats, with so much money/power behind them, people feel scared and powerless.

I don’t think OpenAI called in a hit job, but I think they spent millions of dollars to drive him into financial and emotional desperation - which in our system, is legal.

froggyguts

that's just assassination with extra steps

bossyTeacher

If I pressure you and put you in a position that makes you want to unalive yourself, you can be sure that you will be tried under manslaughter by way of assisted suicide in the form of emotional blackmail. Chances are whatever OpenAI exec did this probably has a lots of minions between him and whoever actually unalived the whistleblower so it can't be traced back to him

achrono

> I don't think OpenAI called in a hit job

Why? What do you know about OpenAI beyond the cute 12 days videos or Sam Altman's modest demeanor in interviews?

On the other hand, what makes you think they spent "millions of dollars" to cause him desperation?

spacemadness

Nah, I’d rather continue to let random HN commenters shame me for thinking critically and not taking things at face value.

guerrilla

You damn well better be trying to figure out what happened if I end up a dead whistleblower.

johnnyanmac

>if in his position, you would that want that to be written about you or a friend.

If that was my public persona, I don't see why not. He could have kept quiet and chosen not to testify if he was afraid of this defining him in a way.

I will say it's a real shame that it did become his public legacy, because I'm sure he was a brilliant man who would have truly help change the world for the better with a few more decades on his belt.

All that said, assassination theories are just that (though "theory" is much too strong a word here in a formal sense. it's basically hearsay). There's no real link to tug on here so there's not much productivity taking that route.

benreesman

It seems most are expressing sadness and condolences to the family and friends around what is clearly a great loss of both an outstanding talent and a uniquely principled and courageous person.

There will always be a few tacky remarks in any Internet forum but those have all found their way to the bottom.

RIP.

lolinder

I considered writing something more focused on him, but the rampant speculation was only going to get worse if no one pointed out the very intentional misleading implications baked into the headline. I stand by what I wrote, but thank you for adding to it by drawing attention away from the entirely-speculative villains and to the very real person who has died.

_cs2017_

As a reader, I prefer not to be misled by articles linked from the HN front page. So I do want to know whether someone is or is not a whistleblower. This has nothing to do with respect for the dead.

verisimi

People are free to comment on media events. You too are free to assume the moral high ground by commenting on the same event, telling people what they should or should not do.

potsandpans

People love to tone police for free points in the internet

bpodgursky

I'm confused by the term "whistleblower" here. Was anything actual released that wasn't publicly known?

It seems like he just disagreed with whether it was "fair use" or not, and it was notable because he was at the company. But the facts were always known, OpenAI was training on public copyrighted text data. You could call him an objector, or internal critic or something.

neuroelectron

The issue is it has to be proven in court. This man was personally responsible for developing web scraping; stealing data from likely copyrighted sources. He would have had communications specifically addressing the legality of his responsibilities, which he was openly questioning his superiors about.

unraveller

Web scraping is legal and benefiting from published works is entirely the point, so long as you don't merely redistribute it.

Training on X doesn't run afoul of fair-use because it doesn't redistribute nor does using it simply publish a recitation (as Suchir suggested). Summoning an LLM is closer to the act of editing in a text editor than it is to republishing. His hang up was on how often the original works were being substituted for chatGPT, but like AI sports articles, overlap is to be expected for everything now. Even without web scraping in training it would be impossible to block every user intention to remake an article out of the magic "editor" - that's with no-use of the data not even fair-use.

hnfong

> Web scraping is legal and benefiting from published works is entirely the point, so long as you don't merely redistribute it.

That's plainly false. Generally, if you redistribute "derivative works" you're also infringing. The question is what counts as derivative works, and I'm pretty sure lawyers and judges are perfectly capable of complicating the picture given the high stakes.

mattigames

"Summoning an LLM is closer to the act of editing in a text editor than it is to republishing." This quote puts so succinctly all that is wrong with LLM, it's the most convenient interpretation to an extreme point, like the creators of fair use laws ever expected AI to exist, like the constrains of human abilities were never in the slightest influential to the fabrication of such laws.

ALittleLight

"Stealing data" seems pretty strong. Web scraping is legal. If you put text on the public Internet other people can read it or do statistical processing on it.

What do you mean he was "stealing data"? Was he hacking into somewhere?

canoebuilder

In a lot of ways, the statistical processing is a novel form of information retrieval. So the issue is somewhat like if 20 years ago Google was indexing the web, then decided to just rehost all the indexed content on their own servers and monetize the views instead linking to the original source of the content.

neuroelectron

When you use some webpages, it forces you to agree to an EULA that might preclude web scraping. NYTimes is such a webpage which is why they were sued. This is evidence that OpenAI didn't care about the law. Someone with internal communications about this could completely destroy the company!!!

stonogo

The article holds clues: "Information he held was expected to play a key part in lawsuits against the San Francisco-based company."

abeppu

and later:

>In a Nov. 18 letter filed in federal court, attorneys for The New York Times named Balaji as someone who had “unique and relevant documents” that would support their case against OpenAI. He was among at least 12 people — many of them past or present OpenAI employees — the newspaper had named in court filings as having material helpful to their case, ahead of depositions.

Yes it's true it's been public knowledge that OpenAI has trained on copyrighted data, but details about what was included in training data (albeit dated ...), as well as internal metrics (e.g. do they know how often their models regurgitate paragraphs from a training document?) would be important.

janalsncm

I guess the question is whether those documents have already been entered into evidence?

sharkjacobs

http://suchir.net/fair_use.html

When does generative AI qualify for fair use? by Suchir Balaji

minimaxir

It's also worth reading his initial tweet: https://x.com/suchirbalaji/status/1849192575758139733

> I recently participated in a NYT story about fair use and generative AI, and why I'm skeptical "fair use" would be a plausible defense for a lot of generative AI products. I also wrote a blog post (https://suchir.net/fair_use.html) about the nitty-gritty details of fair use and why I believe this.

> To give some context: I was at OpenAI for nearly 4 years and worked on ChatGPT for the last 1.5 of them. I initially didn't know much about copyright, fair use, etc. but became curious after seeing all the lawsuits filed against GenAI companies. When I tried to understand the issue better, I eventually came to the conclusion that fair use seems like a pretty implausible defense for a lot of generative AI products, for the basic reason that they can create substitutes that compete with the data they're trained on. I've written up the more detailed reasons for why I believe this in my post. Obviously, I'm not a lawyer, but I still feel like it's important for even non-lawyers to understand the law -- both the letter of it, and also why it's actually there in the first place.

> That being said, I don't want this to read as a critique of ChatGPT or OpenAI per se, because fair use and generative AI is a much broader issue than any one product or company. I highly encourage ML researchers to learn more about copyright -- it's a really important topic, and precedent that's often cited like Google Books isn't actually as supportive as it might seem.

> Feel free to get in touch if you'd like to chat about fair use, ML, or copyright -- I think it's a very interesting intersection. My email's on my personal website.

bsenftner

I'm an applied AI developer and CTO at a law firm, and we discuss the fair use argument quite a bit. It grey enough that whom ever has more financial revenues to continue their case will win. Such is the law and legal industry in the USA.

motohagiography

what twigs me about the argument against fair use (whereby AI ostensibly "replicates" the content competitively against the original) is that it assumes a model trained on journalism produces journalism or is designed to produce it. the argument against that stance would be easy to make.

DennisP

> they can create substitutes that compete with the data they're trained on.

If I'm an artist and copy the style of another artist, I'm also competing with that artist, without violating copyright. I wouldn't see this argument holding up unless it can output close copies of particular works.

Terr_

There's also the output side: Perhaps outputs of generative AI should be ineligible for copyright.

dr_dshiv

That is the current position, weirdly enough.

A1kmm

Although the model weights themselves are also outputs of the training, and interestingly the companies that train models tend to claim model weights are copyrighted.

If a set of OpenAI model weights ever leak, it would be interesting to see if OpenAI tries to claim they are subject to copyright. Surely it would be a double standard if the outcome is distributing model weights is a copyright violation, but the outputs of model inference are not subject to copyright. If they can only have one of the two, the latter point might be more important to OpenAI than protecting leaked model weights.

fenomas

Indeed, and to me it's one of the reasons it's hard to argue that generative AI violates copyright.

At least in the US, a derivative work is a creative (i.e. copyrightable) work in its own right. Neither AI models nor their output meet that bar, so it's not clear what the infringing derivative work could be.

null

[deleted]

jarsin

I just realized I stumbled on some of this guys writings when I was researching AI and copyright cases. I submitted this one to HN awhile back.

He seemed very insightful for someone that isn't a lawyer.

RIP.

visarga

> training on copyrighted data without a similar licensing agreement is also a type of market harm, because it deprives the copyright holder of a source of revenue

I would respond to this by

1. authors don't actually get revenue from royalties, instead it's all about add revenue which leads to enshittification. If they were to live on royalties they would die of hunger, artists, copywriters and musicians.

2. copyright is increasingly concentrated in the hands of a few companies and don't really benefit the authors or the readers

3. actually the competition to new creative works is not AI, but old creative works that have been accumulating for 25 years on the web

I don't think restrictive copyright is what we need. Instead we have seen people migrate from passive consumption to interactivity, we now prefer games, social networks and search engines to TV, press and radio. Can't turn this trend back, it was created by the internet. We have now wikipedia, github, linux, open source, public domain, open scientific publications and non-restrictive environments for sharing and commenting.

If we were to take the idea of protecting copyrights to the extreme, it would mean we need to protect abstract ideas not just expression, because generative AI can easily route around that. But if we protected abstractions from reuse, it would be a disaster for creativity. I just think copyright is a dead man walking at this point.

cbracketdash

Police now say it's been ruled a suicide:

https://sfstandard.com/2024/12/13/key-openai-whistleblower-d...

https://www.forbes.com/sites/cyrusfarivar/2024/12/13/openai-...

https://www.huffpost.com/entry/openai-whistleblower-dead_n_6...

_dp9d

The US is turning into Russia where anyone about to testify magically falls out a window

catlikesshrimp

It should be taught in school that being a whistleblower requires safety preparation. Make it a woke thing or whatever, because it is something many don't give an afterthought about.

sillyfluke

The problem is, from a game theory perspective, things like a dead man's switch may possibly protect you from your enemy but won't protect you from your enemy's enemies who would gain two-fold from your death: your death would be blamed on your enemy, and all the dirty laundry would be aired to the public.

cbracketdash

Well I imagine this is a relatively new phenomena in the USA. Usually I hear about these "coincidences" in foreign countries... but here....? Maybe the older HN generation can shed some insight...

catlikesshrimp

It was common where I live. Since the current government (the last 17 years) it doesn't happen anymore. There is no criticism, and people often go to jail for no apparent reason.

By " common " I mean at least one very famous person yearly in a 7 million habitant country. Suicided without antecedents, family either disagreed with the investigation or speak about it.

jillyboel

[flagged]

pandemic_region

[flagged]

NullPrefix

[flagged]

greenavocado

[flagged]

z3c0

Good lord, what an atrocious Gish gallop of selective quotes and evidence. This might be one of the worst displays of sharpshooter logic I've ever seen.

AND it features a quote from William Pierce, an infamous neo-Nazi. Probably more, but I gave up after the umpteenth unverifiable quote. Just goes to show how much modern right-wing propaganda aligns with traditional neo-Nazi propaganda.

justin66

> Good lord, what an atrocious Gish gallop of selective quotes and evidence. This might be one of the worst displays of sharpshooter logic I've ever seen.

Ease up on the throttle there, LessWrong. You've blown the transaxle.

cenamus

"Upper echelon", "the rich and powerful", all some thin-veiled euphemisms for jews, among the antisemitic circles.

The US just needs to accept that it has it's own nobility again, money aristocracy.

null

[deleted]

MichaelMoser123

RIP. Suchir was a man of principles, he probably had to give up his OpenAI options as a result of his stance - OpenAI is reported to have a very restrictive offboarding agreements [1]

" It forbids them, for the rest of their lives, from criticizing their former employer. Even acknowledging that the NDA exists is a violation of it.

If a departing employee declines to sign the document, or if they violate it, they can lose all vested equity they earned during their time at the company, which is likely worth millions of dollars."

[1] https://www.vox.com/future-perfect/2024/5/17/24158478/openai...

zelphirkalt

Ha, that gives a pretty good picture how "open" Openai is. They want to own their employees, enslave them in a way. One might even think the cause of that whistleblower's death is contagious upon publishing.

Really ridiculous how afraid Openai is of criticism. Acting like a child that throws a tantrum, when something doesn't go its way, just that one needs to remind oneself, that somehow there are, with regard to age at least, adults behind this stuff.

rollcat

> Ha, that gives a pretty good picture how "open" Openai is.

"Any country with 'democratic' in its name, isn't".

The fight to claim a word's meaning can sometimes be fascinating to observe. We've started with "Free Software", but it was easily confused with "freeware", and in the meantime the meaning of "open source" was being put to test by "source available" / "look but do not touch" - so we ended up with atrocities like "FLOSS", which are too cringe for a serious-looking company to try to take over. I think "open" is becoming meaningless (unless you're explicitly referring to open(2)). With the advent of smart locks, even the definition of an open door is getting muddy.

Same for "AI". There's nothing intelligent about LLMs, not while humans continue to supervise the process. I like to include creativity and self-reflection in my working definition of intelligence, traits which LLMs are incapable of.

tikkun

Not anymore. In May 2024 OpenAI confirmed that it will not enforce those provisions:

* The company will not cancel any vested equity, regardless of whether employees sign separation agreements or non-disparagement agreements

* Former employees have been released from their non-disparagement obligations

* OpenAI sent messages to both former and current employees confirming that it "has not canceled, and will not cancel, any vested units"

https://www.theregister.com/2024/05/24/openai_contract_staff...

https://www.bloomberg.com/news/articles/2024-05-24/openai-re...

rkagerer

> It forbids them, for the rest of their lives, from criticizing their former employer. Even acknowledging that the NDA exists is a violation of it.

Can someone with legal expertise weigh in on how likely this would be to hold up in court?

n144q

My guess is that a lawsuit from OpenAI itself is enough to ruin your life. They don't even need to win the case.

tux3

I have it from good authority that — even in the absence of a lawsuit — fighting OpenAI can lead to having dramatically less time to enjoy life.

It's a bit like smoking. Some activities are just not good for your health.

Bluestein

I was wondering myself. Also, the whole thing about losing vested equity - would that hold up in court?

BrandoElFollito

I am amazed that such things are possible. Here on France this is so illegal that it is laughable.

I am saying "laughable" because there are small things companies try to enforce, and say sorry afterwards. But telling you that you are stuck with this for life is comedy grade.

null

[deleted]

neilv

Condolences to the family. It sounds like he was a very thoughtful and principled person.

OutOfHere

[flagged]

dang

> Not that thoughtful. Copyright law is mostly harmful. Apparently he couldn't realize this simple conclusion.

"Eschew flamebait. Avoid generic tangents."

https://news.ycombinator.com/newsguidelines.html

OutOfHere

[flagged]

tivert

[flagged]

SketchySeaBeast

Yeah - what Disney does with the mouse is egregious, but if I write a book or creating painting I'd like to not have a thousand imitators xeroxing away any potential earnings.

OutOfHere

It is nothing like vaccines. Zero. I can easily imagine a thriving world without copyrights, but I cannot without vaccines.

henry700

[flagged]

lawrenceyan

This is incredibly sad, Suchir went to my high school and we both went to Berkeley together. He was clearly very intelligent, and I was always sure he'd go on to be very successful / do interesting things.

If you're struggling reading this, I want to say that you're not alone. Even if it doesn't feel like it right now, the world truly wants you to be happy.

The path is open to you:

Old Path White Clouds [0]

Opening the Heart of Compassion [1]

Seeing That Frees [2]

[0] https://z-library.sk/book/1313569/e77753/old-path-white-clou... [1] https://z-library.sk/book/26536611/711f2c/opening-the-heart-... [2] https://z-library.sk/book/3313275/acb03c/seeing-that-frees-m...

cbracketdash

His parents are skeptical it's a suicide and are ordering a second, independent autopsy: https://sfist.com/2024/12/26/parents-of-openai-whistleblower...

bdndndndbve

This is extremely sad and I'm sorry for Suchir's family and friends.

As someone who has struggled with suicidal ideation while working in the tech industry for over a decade, I do wonder if the insane culture of Bay Area tech has a part to play.

Besides the extreme hustle culture mindset, there's also a kind of naive techno-optimism that can make you feel insane. You're surrounded by people who think breaking the law is OK and that they're changing the world by selling smart kitchen appliances, even while they're exploiting workers in developing countries for cheap tech support and stepping over OD victims outside their condo.

This mindset is so pervasive you really start to wonder if you're crazy for having empathy or any sense of justice.

I have no special insight except to guess that going from being an obviously brilliant student at Berkeley to a cut-throat startup like OpenAI would be a jarring experience. You've achieved everything you worked your whole life for, and you find you're doing work that is completely out of whack with your morals and values.

imglorp

Further piling on potential stress for any whistleblower in a highly specialized field, once you're publicly critical of that field, you're basically unemployable there. And that's without any active retribution from the offending employer. Any retribution, such as blacklisting among peer HR departments would bring an even dimmer outlook.

steeeeeve

Mental health challenges in the bay area tech industry are real for a wide variety of reasons. There's a bigger push in silicon valley for work life balance and mental health care than anywhere else I've been, but more people with serious issues there than anywhere else I've been as well.

Imposter syndrome is high among engineers of all levels of experience and ability. Engineering has it's own set of pressures. Then you add in all the other reasons people can feel stressed or pressured and all of the bay area specific reasons those things are amplified. It adds up.

You would be surprised how many brilliant and highly capable people have broken down. For anyone out there feeling like they are all alone - don't. Even if all the people around you seem happy and confident, I guarantee that a larger portion of them than you realize are struggling.

gsibble

Well put. Almost all of the SF startups I worked for were run by sociopaths willing to break any rule I eventually learned. One is now being charged by the FTC for massive violations. I hated the immoral mindset of winning at the cost of everything from employee comfort to flagrantly illegal activities with customers.

null

[deleted]

strogonoff

Suchir’s suicide (if it was a suicide) is a tragedy. I happen to share some of his views, and I am negative on the impact of current ML tech on society—not because of what it can do, but precisely because of the way it is trained.

The ends do not justify the means—and it is easy to see the means having wide-ranging systemic effects besides the ends, even if we pretended those ends were well-defined and planned (which, aside from the making profit, they are clearly not: just think of the nebulous ideas and contention around AGI).

gsibble

I enjoy using Generative AI but have significant moral qualms with how they train their data. They flagrantly ignore copyright law for a significant amount of their data. The fact they do enter into licensing agreements with some publishers basically shows they know they are breaking the law.

lolinder

Normally the word "whistleblower" means someone who revealed previously-unknown facts about an organization. In this case he's a former employee who had an interview where he criticized OpenAI, but the facts that he was in possession of were not only widely known at the time but were the subject of an ongoing lawsuit that had launched months prior.

As much as I want to give this a charitable reading, the only explanation I can think of for using the word whistleblower here is to imply that there's something shady about the death.

anon373839

> Normally the word "whistleblower" means someone who revealed previously-unknown facts

Not to be pedantic, but this is actually incorrect, both under federal and California law. Case law is actually very explicit on the point that the information does NOT need to be previously unknown to qualify for whistleblower protection.

However, disclosing information to the media is not typically protected.

Terr_

I think their post boils down to: "This title implies someone would have a strong reason to murder them, but that isn't true."

We can evaluate that argument without caring too much about whether the writer intended it, or whether some other circumstances might have forced their word-choice.

blast

From the article:

"The Mercury News and seven sister news outlets are among several newspapers, including the New York Times, to sue OpenAI in the past year."

That's a conflict of interest when it comes to objective reporting.

lolinder

Right, but as you note the legal definition doesn't apply here anyway, we're clearly using the colloquial definition of whistleblower. And that definition comes with the implication that powerful people would want a particular person dead.

In this case I see very little reason to believe that would be the case. No one has hinted that this employee has more damning information than was already public knowledge, and the lawsuit that he was going to testify in is one in which the important facts are not in dispute. The question doesn't come down to what OpenAI did (they trained on copyrighted data) but what the law says about it (is training on copyrighted data fair use?).

anon373839

Well, I still disagree. In reality companies still retaliate against whistleblowers even when the information is already out there. (Hence the need for Congress, federal courts and the California Supreme Court to clarify that whistleblower activity is still protected even if the information is already known.)

I, of course, am not proposing that OpenAI assassinated this person. Just pointing out that disclosures of known information can and do motivate retaliation, and are considered whistleblowing.

stefan_

We are? It's just you here, making a bizarre nitpick in a thread on a persons death.

calf

> Normally the word "whistleblower" means someone who revealed previously-unknown facts about an organization.

A whistleblower could also be someone in the process of doing so, i.e. they have a claim about the organization, as well as a promise to give detailed facts and evidence later in a courtroom.

I think that's the more commonsense understanding of what whistleblowers are and what they do. Your remark hinges on a narrow definition.

jll29

Technically, the term "insider witness of the prosecution" could fit his role.

ninetyninenine

No. Anytime someone potentially possesses information that is damning to a company and that person is killed… the low probability of such an even being a random coincidence is quite low. It is so low such that it is extremely reasonable to consider the potential for an actual assassination while not precluding that a coincidence is a possibility.

lolinder

> Anytime someone potentially possesses information that is damning to a company and that person is killed… the low probability of such an even being a random coincidence is quite low.

You're running into the birthday paradox here. The probability of a specific witness dying before they can testify in a lawsuit is low. The probability of any one of dozens of people involved in a lawsuit dying before it's resolved is actually rather high.

smt88

A 26yo dying is not "one of dozens," it's ~1/10,000 in the US (and likely much lower if we consider this guy's background and socioeconomic status).

ninetyninenine

Right, so given the paradox, consider both possibilities rather then dismiss one like the parent is implying here.

FireBeyond

I've listened to many comments here on some of these, saying it must be assassination "because the person insisted, "If I'm ever found dead, it's not suicide!"." This is sometimes despite extensive mental health history.

Entirely possible.

But in my career as a paramedic, I've (sadly) lost count of the number of mental health patients who have said, "Yeah, that was just a glitch, I'm not suicidal, not now/nor then." ... and gone on to commit or attempt suicide in extremely short order.

ninetyninenine

Right. It could be but it could not be. Your paramedic knowledge makes sense and you’ve rightly stated that the assassination theory is a possibility.

SideQuark

Computer the probability, don’t make claims without making a solid estimate.

No, it’s not low. No need to put conspiracies before evidence, and certainly not by making claims you’ve not done no diligence on.

And the article provides statements by professionals who routinely investigate homicides and suicides that they have no reason to believe anything other than suicide.

ninetyninenine

Who the hell can compute a number from this? All probabilities on this case are made with a gut.

Why don’t you tell me the probability instead of demanding one from me? You’re the one making a claim that professional judgment makes the probability so solid that it’s basically a suicide. So tell me about your computation.

What gets me is the level of stupid you have to be to not even consider the other side. Like if a person literally tells you he’s not going to suicide and if he does it’s an assassination then he suicides and your first instinct is to only trust what the professionals say well… I can’t help you.

ADeerAppeared

> but the facts that he was in possession of were not only widely known at the time but were the subject of an ongoing lawsuit that had launched months prior.

That is an exceedingly charitable read of these lawsuits.

Everyone knows LLMs are copyright infringement machines. Their architecture has no distinction between facts and expressions. For an LLM to be capable of learning and repeating facts, it must also be able to learn and repeat expressions. That is copyright infringement in action. And because these systems are used to directly replace the market for human-authored works they were trained on, it is also copyright infringement in spirit. There is no defending against the claim of copyright infringement on technical details. (C.f. Google Books, which was ruled fair use because of it's strict delineation of facts about books and the expressions of their contents, and provides the former but not a substitute for the latter.)

The legal defense AI companies put up is entirely predicated on "Well you can't prove that we did a copyright infringement on these specific works of yours!".

Which is nonsense, getting LLMs to regurgitate training data is easy. As easy at it is for them to output facts. Or rather, it was. AI companies maintain this claim of "you can't prove it" by aggressively filtering out any instances of problematic content whenever a claim surfaces. If you didn't collect extensive data before going public, the AI company quickly adds your works to it's copyright filter and proclaims in court that their LLMs do not "copy".

A copyright filter that scans all output for verbatim reproductions of training data sounds like a reasonable compromise solution, but it isn't. LLMs are paraphrasing machines, any such copyright filter will simply not work because the token sequence 2nd-most-probable to a copyrighted expression is a simple paraphrase of that copyrighted expression. Now, consider: LLMs treat facts and expressions as the same. Filtering impedes the LLM's ability to use and process facts. Strict and extensive filtering will lobotomize the system.

This leaves AI companies in a sensitive legal position. They are not playing fair in the courts. They are outright lying in the media. The wrong employees being called to testify will be ruineous. "We built an extensive system to obstruct discovery, here's the exact list of copyright infringement we hid". Even just knowing which coworkers worked on what systems (and should be called to testify) is dangerous information.

Sure. The information was public. But OpenAI denies it and gaslights extensively. They act like it's still private information, and to the courts, it currently still is.

And to clarify: No I'm not saying murder or any other foul play was involved here. Murder isn't the way companies silence their dangerous whistleblowers anyway. You don't need to hire a hitman when you can simply run someone out of town and harass them to the point of suicide with none of the legal culpability. Did that happen here? Who knows, phone & chat logs will show. Friends and family will almost certainly have known and would speak up if that is the case.

hnfong

If we take the logic of your final paragraph to its ultimate conclusion, it seems companies can avoid having friends and family speak up about the harassment if they just hire a hitman.

shusaku

[flagged]

bbqfog

Isn't it the other way around since OpenAI is training their models on news company content? OpenAI has behaved extremely unethical the entire time it has existed. It's very likely there is foul play here, it fits the pattern.

lolinder

[flagged]

lyu07282

You assume he revealed everything he knew, he was most likely under NDA, the ongoing lawsuit cited him as a source. Which presumably he didn't yet testify for and now he never will be able to. His (most likely ruled suicide inb4) death should also give pause to the other 11 on that list:

> He was among at least 12 people — many of them past or present OpenAI employees — the newspaper had named in court filings as having material helpful to their case, ahead of depositions.

lolinder

Being one of 12+ witnesses in a lawsuit where the facts are hardly in dispute is not the same as being a whistleblower. The key questions in this lawsuit are not and never were going to come down to insider information—OpenAI does not dispute that they trained on copyrighted material, they dispute that it was illegal for them to do so.

bobthecowboy

It seems like it would matter if they internally believed/discussed it being illegal for them to do so, but then did it anyway and publicly said they felt they were in the clear.

lyu07282

So the lawyers who said they had "possession of information that would be helpful to their case" were misleading? Your whole rationalization seems very biased. He raised public awareness (including details of) of some wrongdoing he perceived at the company and was most likely going to testify about those wrongdoings, that qualifies as a whistleblower in my book.

null

[deleted]