Why your boss isn't worried about AI – "can't you just turn it off?"
144 comments
October 14, 2025 · freetime2
zitterbewegung
Now their strategy is to allow for Apple Events to work with the MCP.
https://9to5mac.com/2025/09/22/macos-tahoe-26-1-beta-1-mcp-i...
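For a sense of what that could look like in practice, here is a rough sketch (my own illustration, not Apple's actual integration): an MCP tool that forwards a request to a Mac app by shelling out to osascript, assuming the Python `mcp` SDK's FastMCP interface.

    # Hypothetical bridge: expose an AppleScript runner as an MCP tool.
    # FastMCP comes from the Python `mcp` SDK; the tool itself is made up
    # for illustration and is not Apple's actual implementation.
    import subprocess
    from mcp.server.fastmcp import FastMCP

    mcp = FastMCP("apple-events-bridge")

    @mcp.tool()
    def run_applescript(script: str) -> str:
        """Run an AppleScript snippet via osascript and return its output."""
        result = subprocess.run(
            ["osascript", "-e", script],
            capture_output=True, text=True, check=True,
        )
        return result.stdout.strip()

    if __name__ == "__main__":
        mcp.run()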
__loam
I'm happy they ate shit here because I like my Mac not getting co-pilot bullshit forced into it, but apparently Apple had two separate teams competing against each other on this topic. Supposedly a lot of politics, combined with the general difficulty of building LLM products, got in the way of delivering a good product.
nlawalker
Where did "can't you just turn it off?" in the title come from? It doesn't appear anywhere in the actual title or the article, and I don't think it really aligns with its main assertions.
meonkeys
It shows up at https://boydkane.com under the link "Why your boss isn't worried about advanced AI". Must be some kind of sub-heading, but not part of the actual article / blog post.
Presumably it's a phrase you might hear from a boss who sees AI as similar to (and as benign/known/deterministic as) most other software, per TFA
nlawalker
Ah, thanks for that!
>Presumably it's a phrase you might hear from a boss who sees AI as similar to (and as benign/known/deterministic as) most other software, per TFA
Yeah I get that, but I think that given the content of the article, "can't you just fix the code?" or the like would have been a better fit.
Izkata
It's a sci-fi thing, think of it along the lines of "What do you mean Skynet has gone rogue? Can't you just turn it off?"
(I think something along these lines was actually in the Terminator 3 movie, the one where Skynet goes live for the first time).
Agreed though, no relation to the actual post.
omnicognate
It's a poor choice of phrase if the purpose is to illustrate a false equivalence. It applies to AI both as much (you can kill a process or stop a machine just the same regardless of whether it's running an LLM) and as little (you can't "turn off" Facebook any more than you can "turn off" ChatGPT) as it does to any other kind of software.
wmf
Turning AI off comes up a lot in existential risk discussions so I was surprised the article isn't about that.
kazinator
> AIs will get more reliable over time, like old software is more reliable than new software.
:)
Was that a human Freudian slip, or an artificial one?
Yes, old software is often more reliable than new.
kstrauser
Holy survivorship bias, Batman.
If you think modern software is unreliable, let me introduce you to our friend, Rational Rose.
noir_lord
Agreed.
Or debuggers that would take out the entire OS.
Or a bad driver crashing everything multiple times a week.
Or a misbehaving process not handing control back to the OS.
I grew up in the era of 8- and 16-bit micros and early PCs. They were hilariously less stable than modern machines while doing far less. There wasn't some halcyon age of near-perfect software; it's always been a case of things being just good enough, but at least operating systems did improve.
malfist
Remember BSODs? They used to be a regular occurrence; now they're so infrequent they're gone from Windows 11.
Yoric
I grew up in the same era and I recall crashes being less frequent.
There were plenty of other issues, including the fact that you had to adjust the right IRQ and DMA for your Sound Blaster manually, both physically and in each game, or that you needed to "optimize" memory usage, enable XMS or EMS or whatever it was at the time, or that you spent hours looking at the nice defrag/diskopt playing with your files, etc.
More generally, as you hint at, desktop operating systems were crap, but the software on top of them was much more comprehensively debugged. This was presumably a combination of two factors: you couldn't ship patches, so you had a strong incentive to debug software if you wanted to sell it, and software had way fewer features.
Come to think about it, early browsers kept crashing and taking down the entire OS, so maybe I'm looking at it with rosy glasses.
binarymax
You know, I had spent a good number of years without a single thought about Rational Rose, and now that's all over.
kstrauser
I do apologize. I couldn't bear this burden alone.
cjbgkagh
How much of that do you think would be attributable to IBM or Rational Software?
sidewndr46
Rational Rhapsody called and wants the crown back
kazinator
At least that project was wise enough to use Lisp for storing its project files.
joomla199
Neither, you’re reading it wrong. Think of it as codebases getting more reliable over time as they accumulate fixes and tests. (As opposed to, say, writing code in NodeJS versus C++)
giancarlostoro
Age of code does not automatically equal quality of code, ever. Good code is maintained by good developers. A lot of bad code gets pushed out because of management pressure, other circumstances, or just bad devs. This is a can of worms you're talking your way into.
LeifCarrotson
You're using different words - the top comment only mentioned the reliability of the software, which is only tangentially related to the quality, goodness, or badness of the code used to write it.
Old software is typically more reliable, not because the developers were better or the software engineering targeted a higher reliability metric, but because it's been tested in the real world for years. Even more so if you consider a known bug to be "reliable" behavior: "Sure, it crashes when you enter an apostrophe in the name field, but everyone knows that, there's a sticky note taped to the receptionist's monitor so the new girl doesn't forget."
Maybe the new software has a more comprehensive automated testing framework - maybe it simply has tests, where the old software had none - but regardless of how accurate you make your mock objects, decades of end-to-end testing in the real world is hard to replace.
As an industrial controls engineer, when I walk up to a machine that's 30 years old but isn't working anymore, I'm looking for failed mechanical components. Some switch is worn out, a cable got crushed, a bearing is failing...it's not the code's fault. It's not even the CMOS battery failing and dropping memory this time, because we've had that problem 4 times already, we recognize it and have a procedure to prevent it happening again. The code didn't change spontaneously, it's solved the business problem for decades... Conversely, when I walk up to a newly commissioned machine that's only been on the floor for a month, the problem is probably something that hasn't ever been tried before and was missed in the test procedure.
1313ed01
Old code that has been maintained (bugfixed) but not messed with too much (i.e. no major rewrites or new features) is almost certain to be better than most other code, though?
hatthew
I think we all agree that the quality of the code itself goes down over time. I think the point that is being made is that the quality of the final product goes up over time.
E.g. you might fix a bug by adding a hacky workaround in the code; better product, worse code.
prasadjoglekar
It actually might. Older code running in production is almost automatically regression tested with each new fix. It might not be pretty, but it's definitely more reliable for solving real problems.
kube-system
The author didn't mean that an older commit date on a file makes code better.
The author is talking about the maturity of a project. Likewise, as AI technologies become more mature we will have more tools to use them in a safer and more reliable way.
izzydata
Sounds more like survivorship bias. All the bad codebases were thrown out and only the good ones lasted a long time.
wvenable
In my experience, actively maintained but not heavily modified applications tend towards stability over time. It doesn't even matter if they are good or bad codebases -- even a bad codebase will become less buggy over time if someone is working on bug fixes.
New code is the source of new bugs. Whether that's an entirely new product, a new feature on an existing project, or refactoring.
wsc981
Basically the Lindy Effect: https://en.wikipedia.org/wiki/Lindy_effect
james_marks
I’ve always called this “Work Hardening”, as in, the software has been improved over time by real work being done with it.
jazzyjackson
Ok, but metal that has been hardened is more prone to snapping once it loses its ductility
kazinator
You mean think of it as opposite to what is written in the remark, and then find it funny?
Yes, I did that.
glitchc
Perhaps better rephrased as "software that's been running for a (long) while is more reliable than software that only started running recently."
xutopia
The most likely danger with AI is concentrated power, not that sentient AI will develop a dislike for us and use us as "batteries" like in the Matrix.
darth_avocado
The reality is that the CEO/executive class already has developed a dislike for us and is trying to use us as “batteries” like in the Matrix.
vladms
Do you personally know any CEOs? I know a couple, and they generally seem less empathic than the general population, so I don't think that like/dislike even applies.
On the other hand, trying to do something "new" involves lots of headaches, so emotions are not always a plus. I could draw a parallel to doctors: you don't want a doctor to start crying in the middle of an operation because he feels bad for you, but you also can't let doctors do everything they want - there need to be some checks on them.
darth_avocado
I would say that the parallel is not at all accurate because the relationship between a doctor and a patient undergoing surgery is not the same as the one you and I have with CEOs. And a lot of good doctors have emotions and they use them to influence patient outcomes positively.
ljlolel
CEOs (even most VCs) are labor too
toomuchtodo
Labor competes for compensation, CEOs compete for status (above a certain enterprise size, admittedly). Show me a CEO willingly stepping down to be replaced by generative AI. Jamie Dimon will be so bold as to say AI will bring about a 3-day week (because it grabs headlines [1]), but he isn't going to give up the status of running JPMC; it's all he has besides the wealth, which does not appear to be enough. The feeling of importance and exceptionalism is baked into the identity.
[1] https://fortune.com/article/jamie-dimon-jpmorgan-chase-ceo-a...
icedchai
Almost everyone is "labor" to some extent. There is always a huge customer or major investor that you are beholden to. If you are independently wealthy then you are the exception.
pavel_lishin
Do they know it?
darth_avocado
Until shareholders treat them as such, they will remain in the ruling class
nancyminusone
To me, the greatest threat is information pollution. Primary sources will be diluted so heavily in an ocean of generated trash that you might as well not even bother to look through any of it.
tobias3
And it imitates all the unimportant bits perfectly (like spelling, grammar, word choice) while failing at the hard to verify important bits (truth, consistency, novelty)
ben_w
Concentrated power is kind of a prerequisite for anything bad happening, so yes, it's more likely in exactly the same way that, given this:
Linda is 31 years old, single, outspoken, and very bright. She majored in philosophy. As a student, she was deeply concerned with issues of discrimination and social justice, and also participated in anti-nuclear demonstrations.
"Linda is a bank teller" is strictly more likely than "Linda is a bank teller and is active in the feminist movement" — all you have is P(a)>P(a&b), not what the probability of either statement is.navane
The power concentration is already massive, and a huge problem indeed. The AI is just a cherry on top. The AI is not the problem.
mmmore
You can say that, and I might even agree, but many smart people disagree. Could you explain why you believe that? Have you read in detail the arguments of people who disagree with you?
mrob
Why does an AI need the ability to "dislike" to calculate that its goals are best accomplished without any living humans around to interfere? Superintelligence doesn't need emotions or consciousness to be dangerous.
worldsayshi
> power resides where men believe it resides
And also where people believe that others believe it resides. Etc...
If we can find new ways to collectively renegotiate where we think power should reside we can break the cycle.
We only have time to do this until people aren't a significant power factor anymore, but that's still quite some time away.
preciousoo
Seems like a self fulfilling prophecy
yoyohello13
Definitely not ‘self’ fulfilling. There are plenty of people actively and vigorously working to fulfill that particular reality.
drsupergud
> bugs are usually caused by problems in the data used to train an AI
This also is a misunderstanding.
The LLM can be fine, and the training data can be fine, but because the LLMs we use are non-deterministic (entropy is injected intentionally, precisely so the model doesn't always fail the same scenarios), current algorithms are by design not going to answer every question correctly every time, even when they could have if the sampled values had happened to land on the right ones for that scenario. You roll the dice on every answer.
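A minimal sketch of that dice roll (toy numbers, not any vendor's actual decoder): temperature sampling over the same logits picks different tokens on different runs.

    # Toy illustration of temperature sampling: same input, varying output.
    import numpy as np

    def sample_next_token(logits, temperature=0.8, rng=None):
        """Sample a token index from logits after temperature scaling."""
        rng = rng or np.random.default_rng()
        scaled = np.asarray(logits, dtype=float) / temperature
        scaled -= scaled.max()                        # numerical stability
        probs = np.exp(scaled) / np.exp(scaled).sum()
        return rng.choice(len(probs), p=probs)

    logits = [2.0, 1.5, 0.3]   # hypothetical scores for tokens "A", "B", "C"
    print(["ABC"[sample_next_token(logits)] for _ in range(10)])
    # Output varies per run; as temperature -> 0 it converges on always "A".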
coliveira
This is not necessarily a problem. Any programming or mathematical question has several correct answers. The problem with LLMs is that they don't have a process to guarantee that a solution is correct. They will give a solution that seems correct under their heuristic reasoning, but they arrived at that result in a non-logical way. That's why LLMs generate so many bugs in software and in anything related to logical thinking.
vladms
> Any programming or mathematical question has several correct answers.
Huh? If I need to sort the list of integers 3, 1, 2 in ascending order, the only correct answer is 1, 2, 3. And there are multiple programming and mathematical questions with only one correct answer.
If you want to say "some programming and mathematical questions have several correct answers" that might hold.
Yoric
"1, 2, 3" is a correct answer
"1 2 3" is another
"After sorting, we get `1, 2, 3`" yet another
etc.
At least, that's how I understood GP's comment.
naasking
I think more charitably, they meant either that 1. There is often more than one way to arrive at any given answer, or 2. Many questions are ambiguous and so may have many different answers.
redblacktree
What about multiple notational variations?
1, 2, 3
1,2,3
[1,2,3]
1 2 3
etc.
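A tiny sketch of the point (my own illustration): the renderings differ, but they all parse back to the same sorted list.

    # Different notations, one underlying answer.
    import re

    def parse_ints(text: str) -> list[int]:
        """Pull the integers out of any of the notational variants."""
        return [int(n) for n in re.findall(r"-?\d+", text)]

    variants = ["1, 2, 3", "1,2,3", "[1,2,3]", "1 2 3"]
    print(all(parse_ints(v) == sorted([3, 1, 2]) for v in variants))  # True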
naasking
> The problem with LLMs is that they don't have a process to guarantee that a solution is correct
Neither do we.
> They will give a solution that seems correct under their heuristic reasoning, but they arrived at that result in a non-logical way.
As do we, and so you can correctly reframe the issue as "there's a gap between the quality of AI heuristics and the quality of human heuristics". The gap is still shrinking, though.
tyg13
I'll never doubt the ability of people like yourself to consistently mischaracterize human capabilities in order to make it seem like LLMs' flaws are just the same as (maybe even fewer than!) humans. There are still so many obvious errors (noticeable by just using Claude or ChatGPT to do some non-trivial task) that the average human would simply not make.
And no, just because you can imagine a human stupid enough to make the same mistake, doesn't mean that LLMs are somehow human in their flaws.
> the gap is still shrinking though
I can tell this human is fond of extrapolation. If the gap is getting smaller, surely soon it will be zero, right?
themanmaran
> Because eventually we’ll iron out all the bugs so the AIs will get more reliable over time
Honestly this feels like a true statement to me. It's obviously a new technology, but so much of the "non-deterministic === unusable" HN sentiment seems to ignore the last two years where LLMs have become 10x as reliable as the initial models.
CobrastanJorji
They have certainly gotten better, but it seems to me like the growth will be kind of logarithmic. I'd expect them to keep getting better quickly for a few more years and then kinda slow and eventually flatline as we reach the maximum for this sort of pattern matching kind of ML. And I expect that flat line will be well below the threshold needed for, say, a small software company to not require a programmer.
criddell
Right away my mind went to "well, are people more reliable than they used to be?" and I'm not sure they are.
Of course LLMs aren't people, but an AGI might behave like a person.
Yoric
By the time a junior dev graduates to senior, I expect that they'll be more reliable. In fact, at the end of each project, I expect the junior dev to have grown more reliable.
LLMs don't learn from a project. At best, you learn how to better use the LLM.
They do have other benefits, of course, e.g. once you have trained one generation of Claude, you have as many instances as you need, something that isn't true with human beings. Whether that makes up for the lack of quality is an open question, which presumably depends on the project.
adastra22
Older people are generally more reliable than younger people.
tptacek
It would help if this piece was clearer about the context in which "AI bugs" reveal themselves. As an argument for why you shouldn't have LLMs making unsupervised real-time critical decisions, these points are all well taken. AI shouldn't be controlling the traffic lights in your town. We may never reach a point where it can. But among technologists, the major front on which these kinds of bugs are discussed is coding agents, and almost none of these points apply directly to coding agents: agent coding is (or should be) a supervised process.
smallnix
> bad behaviour isn’t caused by any single bad piece of data, but by the combined effects of significant fractions of the dataset
Related opposing data point to this statement: https://news.ycombinator.com/item?id=45529587
buellerbueller
"Signficiant fraction" does not imply (to this data scientist) a large fraction.
CollinEMac
> It’s entirely possible that some dangerous capability is hidden in ChatGPT, but nobody’s figured out the right prompt just yet.
This sounds a little dramatic. The capabilities of ChatGPT are known. It generates text and images. The quality of the content of the generated text and images is not fully known.
kube-system
And that sounds a little reductive. There's a lot that can be done with text and images. Some of the most influential people and organizations in the world wield their power with text and images.
luxuryballs
Yeah, and to riff off the headline, if something dangerous is connected to and taking commands from ChatGPT then you better make sure there’s a way to turn it off.
alephnerd
Also, there's a reason AI Red Teaming is now an ask that is getting line item funding from C-Suites.
Nasrudith
Plus there is the 'monkeys with typewriters' problem, for both danger and hypothetical good. ChatGPT may technically reply to the right prompt with a universal cancer cure/vaccine, but pseudorandomly generating it wouldn't help, as you wouldn't be able to recognize it among all the other outputs we don't know to be true or false.
Likewise, asking it how to make some sort of horrific toxic chemical, nuclear bomb, or similar isn't much good if you cannot recognize a correct answer, and dangerous capability depends heavily on what you have available to you. Any idiot can be dangerous with C4 and a detonator, or with bleach and ammonia. Even if ChatGPT could give entirely accurate instructions on how to build an atomic bomb, it wouldn't do much good because you wouldn't be able to source the tools and materials without setting off red flags.
kelvinjps10
Think of the news about the kid whom ChatGPT encouraged toward suicide, or ChatGPT giving users information on how to do illegal activities; these are the capabilities the author is referring to.
avalys
All the same criticisms are true about hiring humans. You don’t really know what they’re thinking, you don’t really know what their values and morals are, you can’t trust that they’ll never make a mistake, etc.
andrewmutz
There's tremendous alpha right now in making scary posts about AI. Fear drives clicks. You don't even need to point to current problems; all you have to do is say we can't be sure they won't happen in the future.
For a real-world example of the challenges of harnessing LLMs, look at Apple. Over a year ago they had a big product launch focused on "Apple Intelligence" that was supposed to make heavy use of LLMs for agentic workflows. But all we've really gotten since then are a couple of minor tools for making emojis, summarizing notifications, and proofreading. They even had to roll back the notification summaries for a while for being wildly "out of control" [1], and in this year's iPhone launch the AI marketing was toned down significantly.
I think Apple execs genuinely underestimated how difficult it would be to get LLMs to perform up to Apple's typical standards of polish and control.
[1] https://www.bbc.com/news/articles/cge93de21n0o