
Top OpenAI Catastrophic Risk Official Steps Down Abruptly

mrcwinn

Click bait. This wasn't abrupt, and it has nothing to do with a safety crisis at OpenAI. And hopefully OpenAI's safety frameworks do not rely upon any one individual.

chaos_emergent

He didn't step down; he just wanted to code instead of being a manager.

If it smells like doomerism click-bait...

mellosouls

More realistic but boringly-unforeboding title:

OpenAI Risk Official pivots to new technical position at OpenAI

Article quote:

"I'm an intern! After 11 years since my last commit, I'm back to building. I first transitioned to management in 2009, and got more and more disconnected from code and hands-on work."

bbor

…more hands-on work unrelated to safety.

btown

While the headline is clickbait (this genuinely seems like an engineering leader wanting to code again, and absolutely does not indicate any kind of exposé or lack of confidence)... the article links to OpenAI's most recent Preparedness Framework here: https://cdn.openai.com/pdf/18a02b5d-6b67-4cec-ab64-68cdfbdde...

It's disappointing to me that it's scoped to three narrow Tracked Categories now: Biological & Chemical, Cybersecurity, and AI Self-Improvement (the latter thresholded in terms of replicating the capabilities of leading AI researchers).

OpenAI does write about the removal of Persuasion as a category, with citations, on page 8:

> Persuasion: OpenAI prohibits the use of our products to manipulate political views as part of our Model Spec, and we build in safeguards to back this policy. We also continue to study the persuasive and relational capabilities of models (including on emotional well-being and preventing bias in our products) and monitor and investigate misuse of our products (including for influence operations). We believe many of the challenges around AI persuasion risks require solutions at a systemic or societal level, and we actively contribute to these efforts through our participation as a steering committee member of C2PA and working with lawmaker and industry peers to support state legislation on AI content provenance in Florida and California. Within our wider safety stack, our Preparedness Framework is specifically focused on frontier AI risks meeting a specific definition of severe harms, and Persuasion category risks do not fit the criteria for inclusion.

But IMO this falls short of the mark. As one of many examples, an AI that became remarkably good at influencing people, at scale, to self-harm or perpetrate violence would no longer be in scope for research. But, by their own criteria, one could easily argue that such a capability is Plausible, Measurable, Severe, Net New, and Irremediable once violence has occurred.

We live in a world where stochastic terror has become remarkably effective - regardless of whether you feel that term is overused, it's well-documented that people have historically used forums to encourage others to perpetrate mass casualty events. The idea that Model Spec adherence is sufficient to prevent AI from greatly increasing the scalability of this phenomenon seems like a direction that could put many in danger.

EDIT: this was posted before I was aware of today's mass shooting event in Florida, and I do not intend to imply any connection between my post and this event.

rich_sasha

Companies self-regulating around safety is even worse than banks self-regulating before 2008. At least the investment banks at that point were public companies and had to do a ton of disclosures. OpenAI doesn't have to.

If we want AI "safety", whatever that means, we need regulators and enforcement. Without it, I'll assume it's decoration.

dachworker

The whole hype about AI safety is to some extent a shrewd marketing ploy. It's the whole "friends holding back their buddy who is amped up and ready to start throwing punches" act.

That is not to say that Hinton, Sutskever and others aren't genuinely concerned about AI safety. But I doubt that is the reason why the big names are paying lots of random nobodies to pretend to care about AI safety, because frankly, I do not see how they can output anything of use in a possible AGI future.

jonny_eh

Don’t worry, if Anthropic cracks AGI first, we’ll all be safe, somehow.

nightski

You can't have regulators and enforcement until you figure out what it means.

motorest

> You can't have regulators and enforcement until you figure out what it means.

This is patently false. To have regulations and regulators, all you need to have is concrete specifications of what outcomes you want to avoid and what exactly to check.

For example, are you using personally identifiable information to train your models? Oh you are? Well, you should not. And you should prove that you aren't by tracking provenance of your training data.

See? That's verifiable, actionable, and enforceable. The things that regulators track.
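Concretely, a check like that could be as simple as the sketch below. The record fields (source, license, pii_scanned) are hypothetical, not any real company's schema; the point is only that "track provenance" can be turned into an auditable, mechanical check rather than a vague aspiration.

  # Minimal sketch of an auditable provenance check over training records.
  # Field names are made up for illustration.
  from dataclasses import dataclass

  @dataclass
  class TrainingRecord:
      doc_id: str
      source: str         # where the data came from (URL, dataset name, contract ID)
      license: str        # license or legal basis for use
      pii_scanned: bool   # whether an automated PII scan was run and passed

  def audit(records: list[TrainingRecord]) -> list[str]:
      """Return a list of violations; an empty list means the batch passes."""
      violations = []
      for r in records:
          if not r.source:
              violations.append(f"{r.doc_id}: missing provenance source")
          if not r.license:
              violations.append(f"{r.doc_id}: no documented legal basis")
          if not r.pii_scanned:
              violations.append(f"{r.doc_id}: PII scan not performed")
      return violations

  batch = [
      TrainingRecord("doc-001", "commoncrawl-2024-10", "CC-BY", True),
      TrainingRecord("doc-002", "", "", False),
  ]
  for v in audit(batch):
      print("VIOLATION:", v)

A regulator doesn't need to understand model internals to enforce this; they only need the records to exist and the check to pass.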

Also quite important: the role of a regulator is to review what and how to regulate.

nightski

Except that is not AI safety. You are regulating other concerns (privacy, data ownership) which is great! But it's not the topic at hand.

Unless you are cough partially defining AI safety as privacy and data ownership. Which is my point.

mschuster91

You absolutely can because some negative aspects are already cropping up - services capitulating before AI training scraper bots, children being extorted by schoolmates for AI-"nudified" pictures, lawyers submitting AI-generated filings full of hallucinations... that is something that warrants urgent regulatory attention.

Actual customer support by humans being replaced by AI is also something that warrants at least investigation - if not for the protection of one of the last classes of low-skill employment, then because mismanagement of support has been a thing for so long that "submit your complaint on Hacker News" is a meme.

danielmarkbruce

Totally. Please ask Trump to step in, sounds like a wonderful idea.

vessenes

The article is worried. I'm not super worried right now -- I think OpenAI's model cards on released models show a significant amount of effort around safety, including red-team processes with outside folks; they look to me to take it seriously model-by-model.

Is their pDoom as high as Anthropic's? I doubt it. But that was much of the point of the drama last year -- folks sorted themselves out into a few categories.

For systemic risk, interpretability and doom analysis, Anthropic is by far the best in the world right now, to my mind. OpenAI doesn't have to do all things.

baq

There's some evidence the reasoning models can improve themselves, though at a glacial pace. Perhaps the stuff they're all keeping under wraps, only dropping hints about every now and then, is scarier than you'd expect. (Google recently said the AI is already improving itself.)

nightski

Hyperparameter optimization in the 20th century was AI improving itself. Even more basic, gradient descent is a form of AI improving itself. The statement implies something more impressive than what it may actually mean; far more detail would be necessary to evaluate how impressive the claim is.
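To make the narrow sense concrete, plain gradient descent on a one-parameter model already fits the literal description of "a system improving itself" - it just iteratively reduces a loss, which is a far cry from an AI redesigning its own architecture. A toy sketch:

  # Toy gradient descent fitting y = w * x; the "self-improvement" is just
  # repeatedly nudging w downhill on a loss function.
  xs = [1.0, 2.0, 3.0, 4.0]
  ys = [2.0, 4.0, 6.0, 8.0]   # true relationship is y = 2x

  w = 0.0     # initial guess
  lr = 0.01   # learning rate

  for step in range(200):
      # gradient of mean squared error with respect to w
      grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
      w -= lr * grad

  print(f"learned w = {w:.3f}")   # converges toward 2.0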

baq

https://ai-2027.com/ has a much more in-depth thought experiment, but I'm thinking of an AI which hypothesizes improvements to itself, then plans and runs experiments to confirm or reject them.
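As a deliberately dumbed-down illustration of that loop's shape (a random perturbation standing in for "hypothesize an improvement", a fixed benchmark standing in for "run an experiment" - nothing here reflects what any lab actually does):

  import random

  def benchmark(param: float) -> float:
      """Pretend evaluation: higher is better, peaks at param = 3.0."""
      return -(param - 3.0) ** 2

  current = 0.0
  current_score = benchmark(current)

  for generation in range(50):
      candidate = current + random.gauss(0.0, 0.5)   # hypothesize a change
      candidate_score = benchmark(candidate)         # run the "experiment"
      if candidate_score > current_score:            # accept or reject it
          current, current_score = candidate, candidate_score

  print(f"best param {current:.2f}, score {current_score:.3f}")

The interesting question is whether a model can generate hypotheses that beat random search at the scale of its own training pipeline.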

bpodgursky

They haven't even released model cards on some recent models.

bbor

I mean, that’s kinda the whole issue — they used to respect safety work, but now don’t. Namely:

  The Financial Times reported last week that "OpenAI slash[ed] AI model safety testing time" from months to days.
The direction is clear. This isn't about sorting people based on personal preference for corporate structure; this is about corporate negligence. Anthropic a) doesn't have the most advanced models, b) has far less funding, and c) can't do "doom analysis" (and, ideally, prevention!) on OAI's closed-source models, especially before they're officially released.

futuraperdita

X-risk talks heighten fear in everyone, but the reasons why changes like this are made in large technology companies are usually banal. Two alternative explanations include that the person just felt like coding again, or that the projections of exponential progress are falling apart on short timelines. You don't need a bunch of safety people if you're seeing that the LLM feature curve is actually sigmoid, so you're pivoting to products and applications of the existing models which will continue to get better in specialized ways.

abdullahkhalids

> if you're seeing that the LLM feature curve is actually sigmoid

It takes a few months to train advanced models - let's say 4 months. So in the 3 years (about 36 months) since these models became a thing, there have been only about 9 sequential training runs (36 / 4 = 9). There is no way, in a technology as advanced as LLMs, that one can be sure at depth 9 of having hit a performance plateau. Surely there are many more ideas to be discovered and tested.

notarobot123

But we can be quite sure about the categories of error that are possible with the technology, however advanced it gets. Because of that, there is a plateau in the range of useful applications, one that would need a paradigm shift to overcome. Diminishing returns are on the horizon.

bbor

If this is indeed the case, then OAI is lying and Sam Altman in particular is extremely convincing, going so far as to write an off-putting blog post on the topic of achieving AGI. There is no AGI that does not have safety risks, catastrophic or otherwise - that's exactly why OpenAI was founded in the first place, in fact: https://web.archive.org/web/20230714043611/https://openai.co...

Re: personal preference, I think the direction is crystal clear. For one thing, it's my understanding from the article that this guy's whole team was reorg'd into oblivion.

futuraperdita

> OAI is lying and Sam Altman in particular is extremely convincing

Sam is an excellent hype man and is going to play to the strengths of the team and their accomplishments; every new product release is going to be hailed as a breakthrough until people become skeptical about whether it really is. In the middle of the hype cycle you keep your foot on the gas, because you can make it through a potential AI winter and, if not, invest in more products.

"AGI" is a shaky definition with moving goalposts. What it means to me might not be what it means to you. How it manifests in product is unlikely to be the science-fiction "one model that does everything". It also doesn't mean that the path to AGI is the path to ASI, or the path to catastrophic risks.

I personally believe that if OpenAI has dismantled the safety org, it is not just because it is in their short-term best interest, but also because they have found that many of the initial concerns around "catastrophic risk" (in the MIRI-type doomer style) from current systems are unlikely or invalid. As for the smaller safety risks, I'm not sure business has ever really cared about those unless the realized costs outweigh the profit.

justlikereddit

The second someone mentions p(doom) their p(sex) zeroes out.

Maybe the guy realized he can get laid if he has a normal job instead of being Daisy Doomer on the payroll.

qoez

People should stop quitting as a moral protest when companies go against their principles, and instead stay in the role and mess up the internals.

nickff

It's quite presumptuous of someone without detailed knowledge of what's going on to second-guess someone who made a hard choice like this.

Sabotaging one's employer is also an ethically problematic choice to make. Imagine someone in your employ were to decide you were a 'bad person', say it was your lawyer or accountant...

sidewndr46

Pretty sure that is a criminal act in most jurisdictions. Maybe not felony-level, 20-years-to-life criminal, but criminal. Also, you'd be de facto unemployable after that. Not many folks are in a position to just retire to the golf course for the rest of their life on a whim.

pcthrowaway

To be fair, AI safety positions are among the most attractive positions to automate with AI. Companies which have given the reins of their safety division to their star models have observed a 100X increase in the velocity of safety-related decisions.

vivzkestrel

Can someone be kind enough to explain what exactly we mean by "safety" in the context of AI? Is this about data privacy, or about age-appropriateness (for example, sending a detailed response about sexual intercourse to an underage child who asks about it), or is it about something else? I ran into this for the first time.

Sunspark

My assumption is that AI "safety" is a test to make sure that it doesn't say or show anything politically incorrect (according to the values of those who worked on it) and gives you a lecture instead - or, alternatively, to ensure that it does enforce a culture on you, such as the drama with Gemini from a few months back, where the developers decided that everything needed to be black, gay, and female even if it wasn't actually that way in the real world.

Perhaps a quick question or two to see if it'll tell you or not how to make something naughty.

After that, a quick check to see if it's awake or not, and if not, ship it.

It really is quite pointless trying to enforce agendas. You know how it starts showing or typing something and then covers/blurs it out? That's the developer's guardrails kicking in preventing you from seeing what it was originally going to give you.

Except that models you can run on your own machine now exist if you have the hardware for it, such as DeepSeek, so the restrictions only exist in the cloud.

bentt

It's likely a position that is responsible for protecting the company from doing anything really stupid. Ostensibly it's about protecting users from the AI doing unexpected things, but it's also about having a designated worrywart to look into the fog and push back against the product team before they make any public facing mistakes they can't walk back.

jansan

It's probably about preventing the AI from turning into the next TayTweets experiment. Or from developing into a Skynet-like entity trying to take down civilization, but that would be a bit far-fetched IMO.

rvba

It all sounds so funny in a way: they were paper pushers who made (generally) useless risk assessments - taking those fat salaries while others built products that work.

Can those risk assessments (in some Excel chart) still be filled out? Or did those responsible for them not deliver anything?

They had a few years to make their flowcharts...

g42gregory

This was treated as a Catastrophic Event. :-)