Leaked Apple meeting shows how dire the Siri situation is
57 comments · March 16, 2025 · crooked-v
krackers
All Apple needs to do is allow 3rd-party apps to integrate with Siri, then use an LLM to convert natural language into the set of allowed API calls. Basically similar to what ChatGPT does with function calling. And on macOS they should already have a head start, thanks to AppleScript integration in most apps. I have no idea why they're trying to reinvent the wheel with "Shortcuts", which is severely limited.
They already have everything sitting there for the taking, and they're squandering it for no reason. Siri on macOS should have been built on top of AppleScript from the get-go; then the switch to an LLM would have been easy.
For that matter, I wonder why I haven't seen any 3rd-party apps on macOS act as a Siri replacement, using AppleScript to drive applications directly. So much effort is spent on screen scraping and trying to get "agents" to use computers like humans do, but for decades macOS has had a parallel set of APIs built into every application specifically for machine consumption. And most good 3rd-party apps even make use of it.
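To make that concrete, here's a minimal Python sketch of the function-calling idea under some assumptions: `ask_llm` is a placeholder for any chat-model call (not a real API), the action whitelist is invented for illustration, and `osascript` is the stock macOS tool for running AppleScript from the command line.

```python
# Sketch: the model only picks an action name from a fixed whitelist,
# and osascript runs the corresponding AppleScript.
import json
import subprocess

ALLOWED_ACTIONS = {
    "pause_music":   'tell application "Music" to pause',
    "current_track": 'tell application "Music" to get name of current track',
    "open_mail":     'tell application "Mail" to activate',
}

def run_applescript(script: str) -> str:
    result = subprocess.run(["osascript", "-e", script],
                            capture_output=True, text=True)
    return result.stdout.strip()

def handle(request: str, ask_llm) -> str:
    # The model never writes AppleScript itself; it only names an action,
    # so the set of things it can do stays bounded.
    choice = json.loads(ask_llm(
        f"Pick the best action for {request!r} from {list(ALLOWED_ACTIONS)}. "
        'Reply as JSON: {"action": "<name>"}'))
    return run_applescript(ALLOWED_ACTIONS[choice["action"]])
```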
tsujamin
They already have a composable automation API with 3rd-party integrations: Shortcuts!
It’s not perfect, but surely you could go natural language -> LLM -> temporary shortcut script, and that gets you a decent part of the way to a smarter Siri.
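A minimal sketch of that pipeline, restricted to picking one of the user's already-installed shortcuts: `shortcuts list` and `shortcuts run` are real macOS CLI subcommands, while `ask_llm` is a placeholder for whatever model API is available.

```python
# Sketch: constrain the model to choosing an existing shortcut by name,
# then hand execution off to the shortcuts CLI.
import subprocess

def installed_shortcuts() -> list[str]:
    out = subprocess.run(["shortcuts", "list"], capture_output=True, text=True)
    return [line for line in out.stdout.splitlines() if line]

def dispatch(request: str, ask_llm) -> None:
    names = installed_shortcuts()
    choice = ask_llm(
        f"User request: {request!r}. Reply with exactly one name from "
        f"{names}, or NONE if nothing fits.").strip()
    if choice in names:
        subprocess.run(["shortcuts", "run", choice])
```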
krackers
I'm not as familiar with the Shortcuts API, but from a quick glance it seems less rich than Apple events/AppleScript. With LLM + AppleScript you could achieve computer-use agents on easy mode: not just one-off "what's the weather" queries, but complex multi-step interactions like "send the latest photo I took to John".
To start with, Automator on the Mac would be the perfect place for LLM integration, and Script Editor too. AppleScript being perhaps one of the few "read-only" languages, people would probably _prefer_ to have an LLM spit it out for them. And Apple probably has the highest-quality data set internally. Combined with the fact that there is a uniform way to specify the API surface (sdef), this is a task within reach of most LLMs today. Just apply a little marketing spin to change the angle from "niche power-user feature" to "Apple uses computer-use agent AI to let the average Joe automate their entire workflow" and it's a smash hit.
From there it's not much of a stretch to slap on some speech recognition (just take Whisper, it already beats whatever iOS uses), add some guardrails, and have it be orders of magnitude better than what Siri is currently capable of. And this is all possible with today's LLMs, and thanks to DeepSeek, without paying a cent to anyone else. When interactive computer use does get mostly solved, it can be added as a fallback for anything where AppleScript doesn't cut it (e.g. web navigation, Electron apps, etc.). But there's a clear logical progression that lets you evolve the user experience as the technology matures without having to throw out your stack along the way.
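A rough sketch of the sdef-grounded version, under the same assumptions (`ask_llm` is a placeholder, and model-generated script would need review or guardrails before execution): the stock `sdef` tool dumps an app's scripting dictionary, which becomes context for the model.

```python
# Sketch: feed an app's scripting dictionary to the model, then run the
# AppleScript it returns via osascript.
import subprocess

def scripting_dictionary(app_path: str) -> str:
    # /usr/bin/sdef prints the app's scripting definition as XML
    return subprocess.run(["sdef", app_path],
                          capture_output=True, text=True).stdout

def automate(request: str, app_path: str, ask_llm) -> str:
    sdef_xml = scripting_dictionary(app_path)
    script = ask_llm(
        "Write AppleScript (code only, no prose) for the request below, using "
        f"only commands from this sdef:\n{sdef_xml}\n\nRequest: {request}")
    return subprocess.run(["osascript", "-e", script],
                          capture_output=True, text=True).stdout
```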
But to me, their fate was sealed when Shortcuts shipped on the Mac while Automator already existed. And it's clear Apple events has been a languishing feature, with integration in native apps already breaking.
justonceokay
What you are describing is definitely how Alexa used to work, no idea about now. My only point being that I’m sure this has been thought of before
softwaredoug
I’m not sure I want / need an LLM for the handful of basic commands I want Siri to do in the car with no complication. Siri is good at some straightforward command patterns. And that’s how users have been trained.
Adding an LLM feels like a solution looking for a problem
kibibu
> how users have been trained
Palm's Graffiti handwriting system, for people who took the time to learn it, was strictly better and less frustrating than natural handwriting systems at the time. It lost to keyboard-based systems, which honestly are worse.
In my experience, "good enough" systems that need far less training will win *
I also only use Google Assistant to choose music and make calls in the car, but that's also because it kinda sucks at anything else and I haven't learned all the commands.
---
* except for HR systems, for some reason
mikodin
As another mentioned, many of us haven't been trained. Every time I've reached for Siri I've been let down and just did it myself or didn't do it at all.
When I bought the new iPhone (coming from Android most recently, and iPhone before that), I figured Siri would actually become good because of this translation layer, and the fact that they just haven't done that has been a huge letdown for me on the phone.
netsharc
LLMs are probably better at "understanding" what the user wants, e.g. "Hey Siri, I have to take my kids to the doctor tomorrow" would reply "Do you want me to make a calendar event?".
Asking ChatGPT, it even has suggestions like "Do you want me to check traffic, do you want me to make a checklist for what to bring?".
If I continue with "but I can't make it", one of the suggestions ChatGPT comes up with is: "Would you like me to send a message to your partner, babysitter, or anyone else involved?"
mercer
If they used something like Whisper for transcription, the quality of understanding what I'm saying would already massively improve.
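For reference, transcription with the open-source openai-whisper package is only a few lines; the model size and audio path below are just examples.

```python
# Minimal transcription sketch with openai-whisper
import whisper

model = whisper.load_model("base")          # "base" is one of the stock sizes
result = model.transcribe("voice_command.m4a")
print(result["text"])
```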
wongarsu
Yes, for users that have been conditioned on how to talk to Siri, there is no point adding an LLM layer. But for everyone else an LLM might add a more natural and forgiving interface. Also an interface that might be more open to exploration, if you can describe a goal and the LLM picks whatever command it feels fits that intent.
I can absolutely see an argument that it should be a feature power users can turn off to keep the "original" Siri experience
timewizard
Everyone thought LLMs would scale exponentially like everything else in computer science. It's actually linear and they're all stuck holding a bag of mostly worthless goods.
djohnston
This problem of non-technical product folks over-promising features is going to get much worse in the age of LLMs. The models are incredibly adept at providing a proof-of-concept. But it's often a monumental endeavour to cross the gap between 70% and 90% accuracy; 90%-95% even more. This long-tail pain isn't new, but the ability for non-technical folks to poke a model on a chat site and then assume their idea is ready to be deployed to production is a more recent phenomenon.
ChrisMarshallNY
> The models are incredibly adept at providing a proof-of-concept. But it's often a monumental endeavour to cross the gap
That's not just with LLMs. This has been the Achilles' heel of demos for decades.
It's usually quite easy to make a demo of something sexy, but making it ship is a very, very different thing.
This is made exponentially worse by managers thinking that the demo means "we're almost ready," and even by the engineers themselves, who are often not experienced enough to understand that demo != ship. Also, it has been my experience that researchers who come up with amazing ideas have absolutely no appreciation for what it takes to turn those ideas into a shipping product. Since Apple is all about shipping product, they have to bridge that gulf. They are trying to commoditize something that is mostly buzz and raw research right now.
I'm really pretty jaded about ML and LLMs in general, right now. I feel as if a whole bunch of really sexy demos were released, creating a huge level of buzz, but that the shipping product is still a ways off.
elicksaur
I don’t disagree. I do think there will be a tendency to say, “We can do X using AI,” when X can happen but isn’t guaranteed to happen by the system.
Here, it doesn’t sound like the features promised were truly demo-able, and when they were, they were “not working properly up to a third of the time”.
Having a random CRUD MVP that is 2/3rds done is different from having a SOTA LLM implementation that is only 2/3rds reliable. It is a vastly different problem to get from there to the finish line.
But I think marketing types would be equally likely to make promises in both scenarios.
pgwhalen
I’m getting this sense too, as my own employer is starting to add AI features. The pitches for what is possible are not grounded in what engineers know the system can do, they’re just assumptions about “if we give the LLM these inputs, we’d expect to see these outputs that are accurate enough for meaningful productivity gains.”
I’m not an AI skeptic, but it’ll be interesting to see how we manage the uncertainty in these projects.
walterbell
> Apple’s product marketing organization, marketing communications, software engineering team and AI organization are all culpable.. Given the nature of this collective failure, it’s unlikely to result in management changes on Cook’s executive team.. the whole mess is likely to come to a head this week at Apple’s annual Top 100 offsite. A March tradition since the days of Steve Jobs, this is a meeting of the company’s 100 most important executives.
Since no one was responsible for this collective failure, can a single leader be given responsibility for future Siri success?
Management is about persuading people to do things they do not want to do, while leadership is about inspiring people to do things they never thought they could. —Steve Jobs
resoluteteeth
I have an android phone and switching google assistant to gemini made it worse for the fairly basic things I actually use google assistant for, like setting reminders, so I switched it back.
Are people really clamoring for AI in the voice assistant on their phone?
jsheard
> so I switched it back.
Google has helpfully decided that you don't actually want to do that:
https://blog.google/products/gemini/google-assistant-gemini-...
> later this year, the classic Google Assistant will no longer be accessible on most mobile devices
dismalaf
The Gemini based assistant has been getting better... It was weird to see the massive step backwards though.
wongarsu
Using a fully-fledged flagship model as a personal assistant actually works pretty well. You don't need a task app, just tell the AI to remember and remind you. Granted, now you have to inject prompts to work around context limitations, allow the AI to schedule wake ups for itself or allow it to send notifications, etc. But those are issues with straightforward solutions. The issue is that for now those models are incredibly expensive and resource intensive to run. And a quantized 3.25B model like Gemini Nano just can't perform the same.
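A minimal sketch of that "remember and remind me" pattern, assuming a placeholder `ask_llm` call and using a plain in-process timer instead of real notifications: the model only turns chat into a (when, text) pair, and an ordinary scheduler handles the wake-up.

```python
# Sketch: model extracts the reminder, a timer does the waking up.
import json
import threading
from datetime import datetime

def schedule_reminder(fire_at: datetime, text: str) -> None:
    delay = max((fire_at - datetime.now()).total_seconds(), 0)
    # A real assistant would post a push notification instead of printing.
    threading.Timer(delay, lambda: print(f"Reminder: {text}")).start()

def handle(message: str, ask_llm) -> None:
    now = datetime.now().isoformat(timespec="minutes")
    reply = json.loads(ask_llm(
        f"It is now {now}. If this message asks to be reminded of something, "
        'reply as JSON {"when": "<ISO-8601 datetime>", "text": "<reminder>"}; '
        f"otherwise reply {{}}. Message: {message!r}"))
    if reply:
        schedule_reminder(datetime.fromisoformat(reply["when"]), reply["text"])
```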
juujian
Yup, had to stop using it, next step was to get rid of Gemini altogether. It's so weird to claim you give people an advanced AI when actually you are removing a whole bunch of features.
pram
My Nest can still only get “ok google turn on the light” the second time I say it. Good shit!
peatmoss
I upgraded to an iPhone 16 in part because I was interested in the AI augments to iOS. My assumption was that Apple was likely to do something tasteful, and probably not nuclear-level bad in terms of privacy.
I've since disabled everything AI. My feeling is that my phone is much snappier without. Text summary features are pretty bad—in one case a summary of a text thread led me to believe my dad was having an acute health crisis when in fact it was a fairly benign discussion of news from doctors. Even custom emoji, which could have been fun, are simply bad.
Meanwhile some basic iOS features like "mute all notifications from this particular text thread" have been broken for me for several months.
karlgkk
Their AI summary really eats shit when it comes to slang-heavy, short messages.
It’d be genuinely useful if it kicked on in threads with high amounts of unread messages
harrall
I know Apple has been known to wait until a technology was very mature before putting it into a product, sometimes years after competitors had already done so.
But Apple Intelligence seems like the complete opposite… like they had to have something now.
EPWN3D
The same year the tech industry was taken by storm by LLMs, Apple announced a brutally expensive, niche VR/AR headset with no particular use case. That product was essentially DOA.
I think this is the first time in decades that Apple has found itself on the back foot. The Vision Pro announcement during the initial hype cycle of ChatGPT made it look like they were just totally out of step with the industry. It was a dud product that cost billions in R&D, the market for it was unclear, and the industry zeitgeist (and talent) was going in a totally different direction.
So their response was to rush something out the door to avoid looking like they were caught flat-footed. Now it's abundantly obvious they were unprepared for AI, they've failed to ship features they promised, and they have to commit resources to support an additional platform that no one cares about because they did ship that one. Worst of all worlds.
Had they slept on the Vision Pro, I'm pretty convinced they could have credibly kept their powder dry on LLMs, like they've done with many industry hype cycles in the past. People would've assumed they were doing the Apple thing of not being the first to ship something, but the first to ship something compelling.
walterbell
> Had they slept on the Vision Pro
The 2025 versions of iOS, iPadOS, and macOS are all being redesigned based on visionOS, despite the cancellation of Vision Pro production and of its successor.
Aeolun
They could have waited with the vision pro too. Just keep the tech in their back pocket until they could release something that was viable for consumers.
dehrmann
It's not even to the level of Apple Intelligence. If I tell Siri to set a timer for 5 minutes, it goes off, and I say "Hey Siri, stop alarm," it stops my daily wake-up alarm, not the timer. It feels like Siri hasn't gotten any better in the past decade.
rgovostes
My speculation is that the small on-device models are simply not useful enough for practical purposes.
The on-device transformer models are described as having 3B parameters[0]. Their own figures from 6 months ago show that humans prefer other models' output 25-40% of the time, which aligns with Gurman's reporting.
I don't know how well Apple's A-series chips can handle transformers, but if you play with their on-device diffusion models (Image Playground, Genmoji), you can watch your battery drop before your eyes. And the output of these models is also embarrassingly behind the state of the art.
If Apple can make great foundation models for Private Cloud Compute, that's great. But then what's the point of buying a new iPhone for on-device inference capabilities?
Secondly, I speculate that allowing a server-side model to query the personal context database is going to be hard to do performantly, requiring several round trips and uploading fragments of the data to the cloud for processing.
0: https://machinelearning.apple.com/research/introducing-apple...
jccalhoun
I am curious how many people are like me and never use the voice assistants on phones. I have amazon alexa devices and only use it for timers, weather, and controlling lights. I never use my phone's voice assistant. (I'm not even sure how to turn it on)
walterbell
Enterprise MDM and Apple Configurator can disable voice assistant entirely, which may be needed in some work contexts.
binarymax
The personal context issue could have been solved long before LLMs appeared.
My foil for this has always been the simple request from Siri: "Take me home".
Apple and my device know where I live. I have even registered "home" in Apple Maps. This is not a huge leap, and it's not even something that requires an LLM, but it does force the feature set and capability to be better. This was a problem in 2015; 10 years later I still can't make this simple request.
I don't know why Apple just froze Siri and put it on life support. They could have been doing far better in this space.
xmddmx
FYI, I find that "navigate home" works, if you want to get home. (I agree with your complaint however that Siri is brain dead much of the time)
gscott
It seems like there should be two intelligent agents: Siri, which takes commands and executes them, and another agent that is an LLM you can call up with a different wake word. By separating the two you solve all the problems.
AlexandrB
But that doesn't match the marketing vision of the product and brand identity. /s
ineedaj0b
Siri has been underperforming for years. I'm a 'heavy' user, and it's been terrible: it worked better in 2012 and slowly got worse. It's acceptable in 2025, but not good.
The Siri division was caught up NLP-ing anonymized data and got totally sideswiped by LLMs. They have not transitioned. Generally Apple finds a Moses (like they did with the M1 chips) and hires a cracked team to build things out, but the salaries they offer don't touch OpenAI or Anthropic, and who wants to work with the nut that was building out Siri initially? The guy is terrible.
Steve would have gutted the Siri team, switched to a <100-person team, and they'd have something nice by 2025, even unique. Apple is the team I expected to produce DeepSeek-type innovations on limited hardware. Instead she wants to search the internet for what my local humidity levels are.
umeshunni
> who wants to work with the nut that was building out Siri initially
Who are you referring to?
softwaredoug
It’s one thing to chat with an LLM about information/knowledge. It’s another thing completely for a user to understand which actions it can take on your behalf. That seems like the disconnect.
What would be nice is an LLM that could just tell me what Siri is capable of, as I usually blindly stumble into its capabilities.
The basic problem here for Apple is that LLMs will never actually be able to avoid prompt injection issues, and the entire "personal awareness" set of functionality they're trying to make uses LLMs. Unless somebody at Apple invents a new state of the art, it's not going to happen.
With that said, I'm surprised they haven't yet at least replaced the 'dumb' Siri commands with something that's effectively an LLM translation layer over an internal API. That would give a significantly better experience (even a dumb LLM is way better at understanding natural language directions than Siri is) with no 'personal awareness' stuff needed.