Leaked Apple meeting shows how dire the Siri situation is
57 comments · March 16, 2025 · crooked-v
krackers
All Apple needs to do is allow 3rd-party apps to integrate with Siri, then use an LLM to convert natural language into the set of allowed API calls. Basically similar to what ChatGPT does with function calling. And on macOS they should already have a head start, thanks to AppleScript integration in most apps. I have no idea why they're trying to reinvent the wheel with "Shortcuts", which is severely limited.
They already have everything sitting there for the taking, and they're squandering it for no reason. Siri on macOS should have been built on top of AppleScript from the get-go; then the switch to an LLM would have been easy.
For that matter, I wonder why I haven't seen any 3rd-party apps on macOS act as a Siri replacement, using AppleScript to drive applications directly. So much effort is spent on screen scraping and trying to get "agents" to use computers like humans do, but for decades macOS has had a parallel set of APIs built into every application specifically for machine consumption. And most good 3rd-party apps even make use of it.
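To make that concrete, here's a minimal Python sketch of the function-calling idea under some assumptions: `ask_llm` is a placeholder for any chat-model call (not a real API), the action whitelist is invented for illustration, and `osascript` is the stock macOS tool for running AppleScript from the command line.

```python
# Sketch: the model only picks an action name from a fixed whitelist,
# and osascript runs the corresponding AppleScript.
import json
import subprocess

ALLOWED_ACTIONS = {
    "pause_music":   'tell application "Music" to pause',
    "current_track": 'tell application "Music" to get name of current track',
    "open_mail":     'tell application "Mail" to activate',
}

def run_applescript(script: str) -> str:
    result = subprocess.run(["osascript", "-e", script],
                            capture_output=True, text=True)
    return result.stdout.strip()

def handle(request: str, ask_llm) -> str:
    # The model never writes AppleScript itself; it only names an action,
    # so the set of things it can do stays bounded.
    choice = json.loads(ask_llm(
        f"Pick the best action for {request!r} from {list(ALLOWED_ACTIONS)}. "
        'Reply as JSON: {"action": "<name>"}'))
    return run_applescript(ALLOWED_ACTIONS[choice["action"]])
```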
tsujamin
They already have a composable automation API with 3rd-party integrations: Shortcuts!
It’s not perfect, but surely you could go natural language -> LLM -> temporary shortcut script, and that gets you a decent part of the way to a smarter Siri.
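A minimal sketch of that pipeline, restricted to picking one of the user's already-installed shortcuts: `shortcuts list` and `shortcuts run` are real macOS CLI subcommands, while `ask_llm` is a placeholder for whatever model API is available.

```python
# Sketch: constrain the model to choosing an existing shortcut by name,
# then hand execution off to the shortcuts CLI.
import subprocess

def installed_shortcuts() -> list[str]:
    out = subprocess.run(["shortcuts", "list"], capture_output=True, text=True)
    return [line for line in out.stdout.splitlines() if line]

def dispatch(request: str, ask_llm) -> None:
    names = installed_shortcuts()
    choice = ask_llm(
        f"User request: {request!r}. Reply with exactly one name from "
        f"{names}, or NONE if nothing fits.").strip()
    if choice in names:
        subprocess.run(["shortcuts", "run", choice])
```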
krackers
I'm not as familiar with the Shortcuts API, but from a quick glance it seems less rich than Apple events/AppleScript. With LLM + AppleScript you could achieve computer-use agents on easy mode: not just one-off "what's the weather" queries, but complex multi-step interactions like "send the latest photo I took to John".
To start with, Automator on the Mac would be the perfect place for LLM integration, and Script Editor too. AppleScript being perhaps one of the few "read-only" languages, people would probably _prefer_ to have an LLM spit it out for them. And Apple probably has the highest-quality data set internally. Combined with the fact that there is a uniform way to specify the API surface (sdef), this is a task within reach of most LLMs today. Just apply a little marketing spin to change the angle from "niche power-user feature" to "Apple uses computer-use agent AI to let the average Joe automate their entire workflow" and it's a smash hit.
From there it's not much of a stretch to slap on some speech recognition (just take Whisper, it already beats whatever iOS uses), add some guardrails, and have it be orders of magnitude better than what Siri is currently capable of. And this is all possible with today's LLMs, and thanks to DeepSeek, without paying a cent to anyone else. When interactive computer use does get mostly solved, it can be added as a fallback for anything where AppleScript doesn't cut it (e.g. web navigation, Electron apps, etc.). But there's a clear logical progression that lets you evolve the user experience as the technology matures without having to throw out your stack along the way.
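A rough sketch of the sdef-grounded version, under the same assumptions (`ask_llm` is a placeholder, and model-generated script would need review or guardrails before execution): the stock `sdef` tool dumps an app's scripting dictionary, which becomes context for the model.

```python
# Sketch: feed an app's scripting dictionary to the model, then run the
# AppleScript it returns via osascript.
import subprocess

def scripting_dictionary(app_path: str) -> str:
    # /usr/bin/sdef prints the app's scripting definition as XML
    return subprocess.run(["sdef", app_path],
                          capture_output=True, text=True).stdout

def automate(request: str, app_path: str, ask_llm) -> str:
    sdef_xml = scripting_dictionary(app_path)
    script = ask_llm(
        "Write AppleScript (code only, no prose) for the request below, using "
        f"only commands from this sdef:\n{sdef_xml}\n\nRequest: {request}")
    return subprocess.run(["osascript", "-e", script],
                          capture_output=True, text=True).stdout
```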
But to me, their fate was sealed when Shortcuts shipped on the Mac while Automator already existed. And it's clear Apple events has been a languishing feature, with integration in native apps already breaking.
justonceokay
What you are describing is definitely how Alexa used to work, no idea about now. My only point being that I’m sure this has been thought of before
softwaredoug
I’m not sure I want / need an LLM for the handful of basic commands I want Siri to do in the car with no complication. Siri is good at some straightforward command patterns. And that’s how users have been trained.
Adding an LLM feels like a solution looking for a problem
kibibu
> how users have been trained
Palm's Graffiti handwriting system, for people who took the time to learn it, was strictly better and less frustrating than natural handwriting systems at the time. It lost to keyboard-based systems, which honestly are worse.
In my experience, "good enough" systems that need far less training will win *
I also only use Google Assistant to choose music and make calls in the car, but that's also because it kinda sucks at anything else and I haven't learned all the commands.
---
* except for HR systems, for some reason
mikodin
As another mentioned, many of us haven't been trained. Every time I've reached for Siri I've been let down and just did it myself or didn't do it at all.
When I bought the new iPhone (coming from Android most recently, and iPhone before that), I figured Siri would actually become good because of this translation layer, and the fact that they just haven't done that has been a huge letdown for me on the phone.
netsharc
LLMs are probably better at "understanding" what the user wants, e.g. "Hey Siri, I have to take my kids to the doctor tomorrow" would reply "Do you want me to make a calendar event?".
Asking ChatGPT, it even has suggestions like "Do you want me to check traffic, do you want me to make a checklist for what to bring?".
If I continue with "but I can't make it", one of the suggestions ChatGPT comes up with is: "Would you like me to send a message to your partner, babysitter, or anyone else involved?"
mercer
If they used something like Whisper for transcription, the quality of understanding what I'm saying would already massively improve.
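For reference, transcription with the open-source openai-whisper package is only a few lines; the model size and audio path below are just examples.

```python
# Minimal transcription sketch with openai-whisper
import whisper

model = whisper.load_model("base")          # "base" is one of the stock sizes
result = model.transcribe("voice_command.m4a")
print(result["text"])
```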
wongarsu
Yes, for users that have been conditioned on how to talk to Siri, there is no point adding an LLM layer. But for everyone else an LLM might add a more natural and forgiving interface. Also an interface that might be more open to exploration, if you can describe a goal and the LLM picks whatever command it feels fits that intent.
I can absolutely see an argument that it should be a feature power users can turn off to keep the "original" Siri experience
timewizard
Everyone thought LLMs would scale exponentially like everything else in computer science. It's actually linear and they're all stuck holding a bag of mostly worthless goods.
djohnston
This problem of non-technical product folks over-promising features is going to get much worse in the age of LLMs. The models are incredibly adept at providing a proof-of-concept. But it's often a monumental endeavour to cross the gap between 70% and 90% accuracy; 90%-95% even more. This long-tail pain isn't new, but the ability for non-technical folks to poke a model on a chat site and then assume their idea is ready to be deployed to production is a more recent phenomenon.
ChrisMarshallNY
> The models are incredibly adept at providing a proof-of-concept. But it's often a monumental endeavour to cross the gap
That's not just with LLMs. This has been the Achilles' heel of demos for decades.
It's usually quite easy to make a demo of something sexy, but making it ship is a very, very different thing.
This is made exponentially worse by managers thinking that the demo means "we're almost ready," and even by the engineers themselves, who are often not experienced enough to understand that demo != ship. Also, it has been my experience that researchers who come up with amazing ideas have absolutely no appreciation for what it takes to turn those ideas into a shipping product. Since Apple is all about shipping product, they have to bridge that gulf. They are trying to commoditize something that is mostly buzz and raw research right now.
I'm really pretty jaded about ML and LLMs in general, right now. I feel as if a whole bunch of really sexy demos were released, creating a huge level of buzz, but that the shipping product is still a ways off.
elicksaur
I don’t disagree. I do think there will be a tendency to say, “We can do X using AI,” when X can happen but isn’t guaranteed to happen by the system.
Here, it doesn’t sound like the features promised were truly demo-able, and when they were, they were “not working properly up to a third of the time”.
Having a random CRUD MVP that is 2/3rds done is different from having a SOTA LLM implementation that is only 2/3rds reliable. It is a vastly different problem to get from there to the finish line.
But I think marketing types would be equally likely to make promises in both scenarios.
pgwhalen
I’m getting this sense too, as my own employer is starting to add AI features. The pitches for what is possible are not grounded in what engineers know the system can do, they’re just assumptions about “if we give the LLM these inputs, we’d expect to see these outputs that are accurate enough for meaningful productivity gains.”
I’m not an AI skeptic, but it’ll be interesting to see how we manage the uncertainty in these projects.
walterbell
> Apple’s product marketing organization, marketing communications, software engineering team and AI organization are all culpable.. Given the nature of this collective failure, it’s unlikely to result in management changes on Cook’s executive team.. the whole mess is likely to come to a head this week at Apple’s annual Top 100 offsite. A March tradition since the days of Steve Jobs, this is a meeting of the company’s 100 most important executives.
Since no one was responsible for this collective failure, can a single leader be given responsibility for future Siri success?
Management is about persuading people to do things they do not want to do, while leadership is about inspiring people to do things they never thought they could. —Steve Jobs
resoluteteeth
I have an android phone and switching google assistant to gemini made it worse for the fairly basic things I actually use google assistant for, like setting reminders, so I switched it back.
Are people really clamoring for AI in the voice assistant on their phone?
jsheard
> so I switched it back.
Google has helpfully decided that you don't actually want to do that:
https://blog.google/products/gemini/google-assistant-gemini-...
> later this year, the classic Google Assistant will no longer be accessible on most mobile devices
dismalaf
The Gemini based assistant has been getting better... It was weird to see the massive step backwards though.
wongarsu
Using a fully-fledged flagship model as a personal assistant actually works pretty well. You don't need a task app, just tell the AI to remember and remind you. Granted, now you have to inject prompts to work around context limitations, allow the AI to schedule wake ups for itself or allow it to send notifications, etc. But those are issues with straightforward solutions. The issue is that for now those models are incredibly expensive and resource intensive to run. And a quantized 3.25B model like Gemini Nano just can't perform the same.
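A minimal sketch of that "remember and remind me" pattern, assuming a placeholder `ask_llm` call and using a plain in-process timer instead of real notifications: the model only turns chat into a (when, text) pair, and an ordinary scheduler handles the wake-up.

```python
# Sketch: model extracts the reminder, a timer does the waking up.
import json
import threading
from datetime import datetime

def schedule_reminder(fire_at: datetime, text: str) -> None:
    delay = max((fire_at - datetime.now()).total_seconds(), 0)
    # A real assistant would post a push notification instead of printing.
    threading.Timer(delay, lambda: print(f"Reminder: {text}")).start()

def handle(message: str, ask_llm) -> None:
    now = datetime.now().isoformat(timespec="minutes")
    reply = json.loads(ask_llm(
        f"It is now {now}. If this message asks to be reminded of something, "
        'reply as JSON {"when": "<ISO-8601 datetime>", "text": "<reminder>"}; '
        f"otherwise reply {{}}. Message: {message!r}"))
    if reply:
        schedule_reminder(datetime.fromisoformat(reply["when"]), reply["text"])
```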
juujian
Yup, had to stop using it, next step was to get rid of Gemini altogether. It's so weird to claim you give people an advanced AI when actually you are removing a whole bunch of features.
pram
My Nest can still only get “ok google turn on the light” the second time I say it. Good shit!
peatmoss
I upgraded to an iPhone 16 in part because I was interested in the AI augments to iOS. My assumption was that Apple was likely to do something tasteful, and probably not nuclear-level bad in terms of privacy.
I've since disabled everything AI. My feeling is that my phone is much snappier without. Text summary features are pretty bad—in one case a summary of a text thread led me to believe my dad was having an acute health crisis when in fact it was a fairly benign discussion of news from doctors. Even custom emoji, which could have been fun, are simply bad.
Meanwhile some basic iOS features like "mute all notifications from this particular text thread" have been broken for me for several months.
karlgkk
Their AI summary really eats shit when it comes to slang-heavy, short messages.
It’d be genuinely useful if it kicked on in threads with high amounts of unread messages
harrall
I know Apple has been known to wait until a technology was very mature before putting it into a product, sometimes years after competitors had already done so.
But Apple Intelligence seems like the complete opposite… like they had to have something now.
EPWN3D
The same year the tech industry was taken by storm by LLMs, Apple announced a brutally expensive, niche VR/AR headset with no particular use case. That product was essentially DOA.
I think this is the first time in decades that Apple has found itself on the back foot. The Vision Pro announcement during the initial hype cycle of ChatGPT made it look like they were just totally out of step with the industry. It was a dud product that cost billions in R&D, the market for it was unclear, and the industry zeitgeist (and talent) was going in a totally different direction.
So their response was to rush something out the door to avoid looking like they were caught flat-footed. Now it's abundantly obvious they were unprepared for AI, they've failed to ship features they promised, and they have to commit resources to support an additional platform that no one cares about because they did ship that one. Worst of all worlds.
Had they slept on the Vision Pro, I'm pretty convinced they could have credibly kept their powder dry on LLMs, like they've done with many industry hype cycles in the past. People would've assumed they were doing the Apple thing of not being the first to ship something, but the first to ship something compelling.
walterbell
> Had they slept on the Vision Pro
The 2025 versions of iOS, iPadOS, and macOS are all being redesigned based on visionOS, despite the cancellation of Vision Pro production and of its successor.
Aeolun
They could have waited with the vision pro too. Just keep the tech in their back pocket until they could release something that was viable for consumers.
dehrmann
It's not even to the level of Apple Intelligence. If I tell Siri to set a timer for 5 minutes, it goes off, and I say "Hey Siri, stop alarm," it stops my daily wake-up alarm, not the timer. It feels like Siri hasn't gotten any better in the past decade.
rgovostes
My speculation is that the small on-device models are simply not useful enough for practical purposes.
The on-device transformer models are described as having 3B parameters[0]. Their own figures from 6 months ago show that humans prefer other models' output 25-40% of the time, which aligns with Gurman's reporting.
I don't know how well Apple's A-series chips can handle transformers, but if you play with their on-device diffusion models (Image Playground, Genmoji), you can watch your battery drop before your eyes. And the output of these models is also embarrassingly behind the state of the art.
If Apple can make great foundation models for Private Cloud Compute, that's great. But then what's the point of buying a new iPhone for on-device inference capabilities?
Secondly, I speculate that allowing a server-side model to query the personal context database is going to be hard to do performantly, requiring several round trips and uploading fragments of the data to the cloud for processing.
0: https://machinelearning.apple.com/research/introducing-apple...
jccalhoun
I am curious how many people are like me and never use the voice assistants on phones. I have amazon alexa devices and only use it for timers, weather, and controlling lights. I never use my phone's voice assistant. (I'm not even sure how to turn it on)
walterbell
Enterprise MDM and Apple Configurator can disable voice assistant entirely, which may be needed in some work contexts.
binarymax
The personal context issue could have been solved long before LLMs appeared.
My foil for this has always been the simple request from Siri: "Take me home".
Apple and my device know where I live. I have even registered "home" in Apple Maps. This is not a huge leap, and it's not even something that requires an LLM, but it does force the feature set and capability to be better. This was a problem in 2015; 10 years later I still can't make this simple request.
I don't know why Apple just froze Siri and put it on life support. They could have been doing far better in this space.
xmddmx
FYI, I find that "navigate home" works, if you want to get home. (I agree with your complaint however that Siri is brain dead much of the time)
gscott
It seems like there should be two intelligent agents: Siri, which takes commands and executes them, and another agent that is an LLM you can call up with a different wake word. By separating the two you solve all the problems.
AlexandrB
But that doesn't match the marketing vision of the product and brand identity. /s
ineedaj0b
Siri has been underperforming for years. I'm a 'heavy' user, and it's been terrible: it worked better in 2012 and slowly got worse. It's acceptable in 2025, but not good.
The Siri division was caught up NLP-ing anonymized data and got totally sideswiped by LLMs. They have not transitioned. Generally Apple finds a Moses (like they did with the M1 chips) and hires a cracked team to build things out, but the salaries they offer don't touch OpenAI or Anthropic, and who wants to work with the nut that was building out Siri initially? The guy is terrible.
Steve would have gutted the Siri team, switched to a <100-person team, and they'd have something nice by 2025, even unique. Apple is the team I expected to produce DeepSeek-type innovations on limited hardware. Instead she wants to search the internet for what my local humidity levels are.
umeshunni
> who wants to work with the nut that was building out Siri initially
Who are you referring to?
softwaredoug
It’s one thing to chat with an LLM about information/knowledge. It’s another thing completely for a user to understand which actions it can take on your behalf. That seems like the disconnect.
What would be nice is an LLM that could just tell me what Siri is capable of, as I usually blindly stumble into its capabilities.
The basic problem here for Apple is that LLMs will never actually be able to avoid prompt injection issues, and the entire "personal awareness" set of functionality they're trying to make uses LLMs. Unless somebody at Apple invents a new state of the art, it's not going to happen.
With that said, I'm surprised they haven't yet at least replaced the 'dumb' Siri commands with something that's effectively an LLM translation layer over an internal API. That would give a significantly better experience (even a dumb LLM is way better at understanding natural language directions than Siri is) with no 'personal awareness' stuff needed.