The case against conversational interfaces
220 comments
April 1, 2025
shubhamjain
I had the same thoughts on conversational interfaces [1]. Humane AI failed not only because of terrible execution; the whole assumption that voice is a superior interface (and the attempt to invent something beyond smartphones) was flawed.
> Theoretically, saying, “order an Uber to airport” seems like the easiest way to accomplish the task. But is it? What kind of Uber? UberXL, UberGo? There’s a 1.5x surge pricing. Acceptable? Is the pickup point correct? What would be easier, resolving each of those queries through a computer asking questions, or taking a quick look yourself on the app?
> Another example is food ordering. What would you prefer, going through the menu from tens of restaurants yourself or constantly nudging the AI for the desired option? Technological improvement can only help so much here since users themselves don’t clearly know what they want.
JSR_FDED
And 10x worse than that is booking a flight: I found one that fits your budget, but it leaves at midnight, or requires an extra stop, or is on an airline for which you don't collect frequent flyer miles, or arrives at a secondary airport in the same city, or it only has a middle seat available.
How many of these inconveniences will you put up with? Any of them, all of them? What price difference makes it worthwhile? What if by traveling a day earlier you save enough money to even pay for a hotel...?
All of that is for just 1 flight; what if there are several alternatives? I can't imagine having a dialogue about this with a computer.
fragmede
But that is how we used to buy a plane ticket. Long before flights.google.com's price table, you'd call a human up and tell them you'd like to go on holiday. They'd ask you where and when and how much you could afford, and then, after a while with the old system (SABRE) clicking and clacking, they'd find you a good deal. After a few flights with that travel agent, they'd get to know you and wouldn't have to ask so many questions.
Similarly, long before Waymo, you'd get into a taxi, and tell the human driver you're going to the airport, and they'd take you there. In fact, they'd get annoyed at you if you backseat drove, telling them how to use the blinker and how hard to brake and accelerate.
The thing about conversational interfaces is that we're used to them, because we (well, some of us) interface with other humans fairly regularly, and so it's a fairly baseline skill needed to exist in the world today. There's a case to be made against them, but since everyone can be assumed to be conversational (though perhaps not in a given language), they're here to stay. Restaurants have menus that customers look at before using the conversational interface to get food, in order to guide the discussion, and that's had thousands of years to evolve, so it might be a local maximum, but it's a pretty good one.
indigoabstract
And then there is the fact that voice isn't the dominant mode of expression for all people. Some are predominantly visual thinkers, some are analytic and slow to decide, while some prefer to use their hands and so on.
I guess there's just no substitute for someone actually doing the work of figuring out the most appropriate HMI for a given task or situation, be it voice controls, touch screens, physical buttons or something else.
ramblejam
> since users themselves don’t clearly know what they want.
Knowing what you want is, sadly, computationally irreducible.
littlestymaar
Why couldn't the interface ask you about your preferences? Because instead, what we have right now are clunky web interfaces that cram every choice onto the small screen in front of you, leaving you to figure out how the options actually differ and to sort out yourself how to make things work.
Of course a conversational interface is useless if it tries to just do the same thing as a web UI, which is why it failed a decade ago when it was trendy: the tech was nowhere near clever enough to make it useful. But today, I'd bet the other way round.
UncleMeat
Scrolling through a list of a few options seems much less clunky than being asked via voice about which option I prefer. I can see multiple options at once and compare them easily. But via voice I need to keep all of the options in working memory to compare them. Harder.
earnestinger
It can ask, but how much time do you want to spend answering stuff?
Such a dialog is probably nice for a first-time user; it is a nightmare for a repeat user.
Propelloni
> I had the same thoughts on conversational interfaces [1]. Humane AI failed not only because of terrible execution; the whole assumption that voice is a superior interface (and the attempt to invent something beyond smartphones) was flawed.
Amen to that. I guess it would help to get off the IT high horse and have a talk with linguists and philosophers of language. They have been dealing with this shit for centuries now.
phyzix5761
You're onto something. We've learned to make computers and electronic devices feel like extensions of ourselves. We move our bodies and they do what we expect. Having to switch now to using our voice breaks that connection. It's no longer an extension of ourselves but a thing we interact with.
namaria
Two key things that make computers useful, specificity and exactitude, are thrown out of the window by interposing NLP between the person and the computer.
I don't get it at all.
TeMPOraL
[imprecise thinking]
v <--- LLMs do this for you
[specific and exact commands]
v
[computers]
v
[specific and exact output]
v <--- LLMs do this for you
[contextualized output]
In many cases, you don't want or need that. In some, you do. Use the right tool for the job, etc.
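A minimal sketch of that pipeline in Python; the llm() helper is a stand-in for whatever model you'd actually call, and the prompts are my assumptions, not anything from the thread:

    import subprocess

    def llm(prompt: str) -> str:
        """Stand-in for any LLM call (local model, hosted API, etc.)."""
        raise NotImplementedError("wire up a model of your choice here")

    def run_task(imprecise_request: str) -> str:
        # top arrow: imprecise thinking -> specific and exact commands
        command = llm(f"Translate into a single shell command: {imprecise_request}")
        # middle boxes: the computer executes the exact command, exactly
        result = subprocess.run(command, shell=True, capture_output=True, text=True)
        # bottom arrow: specific and exact output -> contextualized output
        return llm(f"Explain this output to someone who asked '{imprecise_request}': {result.stdout}")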
grbsh
Why would you ever hire a human to perform some task for you in a company? They're known for having problems with ambiguity and precision in communication.
Humans require a lot of back and forth effort for "alignment" with regular "syncs" and "iterations" and "I'll get that to you by EOD". If you approach the potential of natural interfaces with expectations that frame them the same way as 2000s era software, you'll fail to be creative about new ways humans interact with these systems in the future.
brookst
I also don’t like command-line interfaces for all things, but there are cases where they excel, or where they are necessary due to technical constraints. But when the man page for a simple command runs to 10 screens of options, I sometimes wonder.
d3vmax
Agree. Not all systems require convo mode. I personally find chat/convo/IVR-type interfaces slow/tedious. Keyboard/mouse FTW.
However, a CEO using Power BI with a conversational mode can get more insights/graphs without slicing and dicing the data themselves. Dashboards do have fixed metrics, but conversation helps in case they want something not displayed.
rurp
An empirical example would be Amazon's utter failure at making voice shopping a thing with the Echo. There were always a number of obvious flaws with the idea. There's no way to compare purchase options, check reviews, view images, or just scan a bunch of info at once with your eyeballs at 100x the information bandwidth of a computer generated voice talking to you.
Even for straightforward purchases, how many people trust Amazon to find and pick the best deal for them? Even if Amazon started out being diligent and honest it would never last if voice ordering became popular. There's no way that company would pass up a wildly profitable opportunity to rip people off in an opaque way by selecting higher margin options.
johnnyanmac
Yeah, it comes and goes in games for a reason. If it's not already some sort of social game, then the time to speak an answer is always slower than 3 button presses to select a pre-canned answer. Navigating a menu with Kinect voice commands will often be slower than a decent interface a user clicks through.
Voice interface only prevails in situations with hundreds of choices, and even then it's probably easier to use voice to filter down choices rather than select. But very few games have such scale to worry about (certainly no AAA game as of now).
steveBK123
Yeah I mean - haven't we already been doing this for a decade with home voice assistant speakers and all found them to be underwhelming?
There's 1-5 things any individual finds them useful for (timers/lights/music/etc) and then... that's it.
For 99.9% of what I use a computer for, it's far faster to type/click/touch my phone/tablet/computer.
ryandrake
I think a lot of these "voice assistant" systems are envisioned and pushed by senior leadership in companies like SVPs and VPs. They're the ones who make the decision to invest in products like this. Why do they think these products make sense? Because they themselves have personal assistants and nannies and chauffeurs and private chefs, and voice is their primary interface to these people. It makes sense that people who spend all their time vocally telling others to do work, think that voice is a good interface for regular people to tell their computers to do work.
steveBK123
That is actually a very interesting take I've not seen before and does make some sense.
If your work revolves about telling people what to do and asking questions, a voice assistant seems like a great idea (even if you yourself wouldn't have to stoop to using a robotic version since you have a real live human).
If your work actually involves doing things, then voice/conversational text interface quickly falls apart.
scott_w
> you couldn't talk to your passenger while you were "driving" because that might make the car do something weird.
This even happens while walking my dog. If my wife messages me, my iPhone reads it out; if, at the same time, I'm trying to cross a road, she'll get a garbled reply which is just me shouting random words at my dog to keep her under control.
guestbest
If the driver could queue actions, it would make chat-interfaced driving easier, since the desired actions could be prepared for implementation by button press rather than needing a dedicated button built at a factory by an engineer.
PeterStuer
Here's where the article goes wrong:
1. "Natural language is a data transfer mechanism"
2. "Data transfer mechanisms have two critical factors: speed and lossiness"
3. "Natural language has neither"
While a conversational interface does transfer information, its main qualities are what I always refer to as "blissful ignorance" and "intelligent interpretation".
Blissful ignorance allows the requester to state an objective while not being required to know, or even be right about, how to achieve it. It is the opposite of operational command. Do as I mean, not as I say.
"Intelligent Interpretation" allows the receiver the freedom to infer an intention in the communication rather than a command. It also allows for contextual interactions such as goal oriented partial clarification and elaboration.
The more capable of intelligent interpretation the request execution system is, the more appropriate a conversational interface will be.
Think of it as managing a team. If they are junior, inexperienced and not very bright, you will probably tend towards handholding, microtasking and micromanagement to get things done. If you have a team of senior, experienced and bright engineers, you can with a few words point out a desire, trust them to ask for information when there is relevant ambiguity, and expect a good outcome without having to detail-manage every minute of their days.
throwaway290
> If you have a team of senior, experienced and bright engineers, you can with a few words point out a desire, trust them to ask for information when there is relevant ambiguity, and expect a good outcome
It's such a fallacy. First thing an experienced and bright engineer will tell you is to leave the premises with your "few words about a desire" and not return without actual specs and requirements formalized in some way. If you do not understand what you want yourself, it means hours/days/weeks/months/literally years of back and forths and broken solutions and wasted time, because natural language is slow and lossy af (the article hits the nail on the head on this one).
Re "ask for information", my favorite example is when you say one thing if I ask you today and then you reply something else (maybe the opposite, it happened) if I ask you a week later because you forgot or just changed your mind. I bet a conversational interface will deal with this just fine /s
lolinder
> First thing an experienced and bright engineer will tell you is to leave the premises with your "few words about a desire" and not return without actual specs and requirements formalized in some way.
No, that's what a junior engineer will do. The first thing that an experienced and bright senior engineer will do is think over the request and ask clarifying questions in pursuit of a more rigorous specification, then repeat back their understanding of the problem and their plan. If they're very bright they'll get the plan down in writing so we stay on the same page.
The primary job of a senior engineer is not to turn formal specifications into code, it's to turn vague business requests into formal specifications. They're senior enough to recognize that that's the actually difficult part of the work, the thing that keeps them employed.
indoordin0saur
You're entirely right. The person you're responding to doesn't sound like a senior engineer so much as a grouchy old engineer who is burned out. Of course, you can get bad clients but expecting them to know exactly what specs they want every time is unreasonable in most situations, particularly if they don't have the technical knowledge of the systems you work in.
throwaway290
I used to think like you. My job is to ask questions, etc. But after a couple of decades I see that if someone doesn't bother to even think about the idea enough to understand it himself beyond a few words, he is not worth engaging with in this fashion. He doesn't really know what he wants. Today I ask a clarifying question and he says one thing; next week he changes his mind or forgets, and the result slowly becomes a mess.
> The primary job of a senior engineer is not to turn formal specifications into code, it's to turn vague business requests into formal specifications.
Converting vibes and the external world into specific requirements is the product owner's job.
Do not mistake software engineers for product people. These are very different things. Sometimes both are done by the same person if the org doesn't have enough money. Many freelancers working with small businesses do both. I often do both at my day job. But this is a higher-level role, and if you are a senior engineer doing product stuff I hope it is recognized and you get proportionate comp.
Hauthorn
I think you work in different domains.
Expecting a good outcome is different from expecting to get exactly what you intended.
Formal specifications are useful in some lines of work and for some projects, less so for others.
Wicked problems would be one example where formal specs are impossible by definition.
johnnyanmac
>Anyway, the disabled are pretty much always allowed to be collateral damage by society, so this will just be senseless pain.
For games, you don't really need nor desire formal specs. But it also can really show how sometimes a director has a low tolerance for interpretation despite their communication being very loose. This leads to situations where it feels like the director is shifting designs on a dime, which is a lose-lose situation for everyone involved.
If nothing else, formal specification is for CYA. You get what you ask for, and any deviation should go in the next task order or have been addressed beforehand.
throwaway290
> Formal specifications are useful in some lines of work and for some projects, less so for others
There is always a formal specification. Code is the final formal specification in the end. But converting vague vibes from natural language into a somewhat formalized description is a key ability you need for any really new, non-trivial project idea. Another human can't do it for you, and a conversational UI can't do it for you...
PeterStuer
I do understand that in bad cases it can be very frustrating as an engineer to chase vague statements only to be told later 'nah, that was not what I meant'. This is especially true when the gap in both directions is very large or there is incompetence and/or even adversarial stances between the parties. Language and communication only work if both parties are willing to understand.
Unfortunately, when that's the case, "actual specs and requirements formalized", while sounding logical and while it might help, in my experience did very little to save any substantial project (and I've seen a lot). The common problem is that the business/client/manager is forced to sign off on formal documents far outside their domain of competence, or the engineers are straitjacketed into commitments that do not make sense, or they have no idea of what is considered tacit knowledge in the domain and so can't contextualize the unstated. Those formalized documents then mostly become weaponized in mutually destructive CYA.
What I've also seen more than once is years of formalized specs and requirements work while nothing ever gets produced, and the project is aborted before even the first line of code hits test.
I've given this example before: when Covid lockdowns hit, there were digitization projects years in planning and budgeted for years of implementation that were hastily specced, coded and rolled out into production by a 3-person emergency team over a long weekend. Necessity apparently has a way of cutting through the BS like nothing else can.
You need both sides capable, willing and able to understand. If not, good luck mitigating, but you're probably doomed either way.
brookst
I’m a PM and pride myself on specs that give the right level of detail, where “right” can vary hugely depending on context.
But I still get lazy with LLMs and fall into iteration the way bad PM/eng teams do. “Write a SQL query to look at users by gesture by month”. “Now make the time unit a parameter”. “Now pivot the features to columns”. “Now group features hierarchically”. “Now move the feature table to a WITH”.
My point and takeaway is that LLMs are endlessly patient and pretty quick to turn requirements around, so they lend themselves to exploration more than human teams do. Agile, I guess, to a degree that we don’t even aspire to in the human world because it would be very expensive and lead to fisticuffs.
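For the curious, a rough sketch in Python of where that chain of prompts might land; the table and column names (events, user_id, feature, ts) and the two feature names are my own placeholders, not brookst's actual schema:

    def usage_sql(time_unit: str = "month") -> str:
        """Users per feature per time bucket, with features pivoted to columns."""
        assert time_unit in {"day", "week", "month"}  # never interpolate unchecked input
        return f"""
        WITH feature_events AS (
            SELECT user_id, feature, DATE_TRUNC('{time_unit}', ts) AS bucket
            FROM events
        )
        SELECT bucket,
               COUNT(DISTINCT CASE WHEN feature = 'search' THEN user_id END) AS search_users,
               COUNT(DISTINCT CASE WHEN feature = 'export' THEN user_id END) AS export_users
        FROM feature_events
        GROUP BY bucket
        ORDER BY bucket
        """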
throwaway290
> What I've also seen more than once is years of formalized specs and requirements work while nothing ever gets produced, and the project is aborted before even the first line of code hit test.
It just shows that no one really understood what they wanted. It is crazy to expect somebody to understand something better than you and it is hilarious to want a conversational UI to understand something better than you.
TeMPOraL
Star Trek continues to be prescient. It not only introduced the conversational interface to the masses, it also nailed its proper uses in ways we're still (re)discovering now.
If you pay attention to how the voice interface is used in Star Trek (TNG and upwards), it's basically exactly what the article is saying - it complements manual inputs and works as a secondary channel. Nobody is trying to manually navigate the ship by voicing out specific control inputs, or in the midst of a battle, call out "computer, fire photon torpedoes" - that's what the consoles are for (and there are consoles everywhere). Voice interface is secondary - used for delegation, queries (that may be faster to say than type), casual location-independent use (lights, music; they didn't think of kitchen timers, though (then again, replicators)), brainstorming, etc.
Yes, this is a fictional show and the real reason for voice interactions was to make it a form of exposition, yadda yadda - but I'd like to think that all those people writing the script, testing it, acting and shooting it, were in perfect position to tell which voice interactions made sense and which didn't: they'd know what feels awkward or nonsensical when acting, or what comes off this way when watching it later.
ben_w
I have similar thoughts on LCARS: the Doylist requirement for displays that are bold enough and large enough to feel meaningful even when viewed on a 1990-era TV is also the requirement for real-life public information displays.
At first glance it feels like real life will not benefit from labelling 90% of the glowing rectangles with numbers as the show does, but second thoughts say spreadsheets and timetables.
jeremyjh
There was an episode where Beverly Crusher was alone on the ship, and controlled everything just by talking to the computer. I wondered why there is a bridge, much less a bridge crew. But yes it makes sense to use higher bandwidth control systems when possible.
ben_w
If that was the episode where the crew disappeared with nobody else but her noticing, it doesn't really count because she was trapped in a Negative Space Wedgie pocket dimension based on her own thoughts at the time she was trapped.
jeremyjh
Yes, that was it. I think though that she had a good enough understanding of the ship's capabilities that her private world would have been realistic in that respect.
johnnyanmac
Star Trek's crews overall are chosen in a way that seems to consider redundancies, as well as meshing as a team that can offer varying viewpoints.
It runs directly counter to that more capitalistic mindset of "why don't we do more with less?": when spending years navigating all kinds of unknown situations, you want as many options as possible available.
TeMPOraL
Definitely plays well with the kind of scenarios the writers throw at them - you can pretty much expect any Starfleet officer, whether a commander or an ensign, to operate any system on the ship with at least some passing competence. There's no "I work in stellar cartography, I don't know which button fires torpedoes or how to turn on the bio-bed in sick bay" on a Starfleet ship, except when uttered as a joke (or with EMHs). Overkill in real life? Perhaps. But definitely reassuring.
Hell, if someone really didn't know, they could expect "Computer, turn on the bio-bed 3" to just work - circling us back to the topic of what NLP and voice interfaces are good for.
blatantly
I remember Picard barking out commands to make the ship do preprogrammed evasion or fight maneuvers too. This seems like another good use.
TeMPOraL
Yeah, this and I think even weapons control, happened on the show. But the scenario for these cases is when the bridge is understaffed for episode-specific plot reasons, and the protagonist has to simultaneously operate systems usually handled by distinct stations. That's when you get an officer e.g. piloting the shuttle/runabout while barking out commands to manage power flow, or voice-ordering evasions while manually operating weapons, etc.
(Also worth noting is that "pre-programmed evasion patterns" are used in normal circumstances, too. "Evasive maneuver JohnDoe Alpha Three" works just as well when spoken to the helm officer as to a computer. I still don't know whether such preprogrammed maneuvers make sense in real-life setting, though.)
ben_w
In real life, even today, a lot of systems have to run fully automatically because humans are too slow to respond: https://en.wikipedia.org/wiki/Phalanx_CIWS
But specifically manoeuvres, rather than weapons systems? Today, I doubt it: the ships are too slow for human brains to be the limiting factor. But if we had an impulse drive and inertial dampers (in the Trek sense rather than "shock absorbers"), then manoeuvres would also necessarily be automated.
In the board game Star Fleet Battles (based on a mix of TOS, TAS, and WW2 naval warfare), one of the (far too many*) options is "Erratic Manoeuvres", for which the lore is a combination of sudden acceleration and unpredictable changes in course.
As we live in a universe where the speed of light appears to be a fundamental limit, if we had spaceships pointing lasers at each other and those ships could perform such erratic manoeuvres as compatible with the lore of the show about how fast they can move and accelerate, performing such manoeuvres manually would be effective when the ships are separated by light seconds. But if the combatants are separated by "only" 3000 km, then it has to be fully automated because human nerve impulses from your brain to your finger are not fast enough to be useful.
* The instructions are shaped like pseudocode for a moderately complex video game, but published 10-20 years before home computers were big enough for the rule book. So it has rules for boarding parties, and the Tholian web, and minefields, and that one time in the animated series where the Klingons had a stasis field generator…
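A quick back-of-the-envelope check of the light-delay arithmetic above (the ~200 ms human reaction time is a rough textbook figure):

    C_KM_PER_S = 299_792  # speed of light in vacuum, km/s

    def one_way_delay_ms(distance_km: float) -> float:
        return distance_km / C_KM_PER_S * 1_000

    print(one_way_delay_ms(3_000))    # ~10 ms: far below ~200 ms human reaction time
    print(one_way_delay_ms(299_792))  # one light-second: ~1000 ms, humans are back in the loop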
cdrini
Completely agree, voice UI is best as an augmentation of our current HCI patterns with keyboard/mouse. I think one of the reasons is that our brains kind of have separate buffers for visual memory and aural memory (Baddeley's working memory model). Most computer use takes up the visual buffer, and our aural buffer has extra bandwidth. This also means we can do things aurally while still maintaining focus/attention on what we're doing visually, allowing a kind of multitasking.
One thing I will note is that I'm not sure I buy the example for voice UIs being inefficient. I've almost never said "Alexa what's the weather like in Toronto?". I just say "Alexa, weather". And that's much faster than taking my phone out and opening an app. I don't think we need to compress voice input. Language kind of auto-compresses, since we create new words for complex concepts when we find the need.
For example, in a book club we recently read "As Long as the Lemon Trees Grow". We almost immediately stopped referring to it by the full title, and instead just called it "lemons" because we had to refer to it so much. Eg "Did you finish lemons yet?" or "This book is almost as good as lemons!". The context let us shorten the word. Similarly the context of my location shortens the command to just "weather". I think this might be the way voice UIs can be made more efficient: in the same way human speech makes itself more efficient.
incognito124
> This also means we can do things aurally while still maintaining focus/attention on what we're doing visually, allowing a kind of multitasking.
Maybe you, but I most definitely cannot focus on different things aurally and visually. I never successfully listened to something in the background while doing something else. I can't even talk properly if I'm typing something on a computer.
cdrini
Or to clarify, I don't think one can be in deep flow eg programming and simultaneously in deep flow having an aural conversation; we're human, we can't truly multitask. But I do think that if you're focusing on something using your computer, it's _less_ disruptive to eg say "Alexa remind me in twenty minutes to take out the trash" than it is to stop what you're doing and put that in an app on your computer.
theshackleford
Yup, we are all different. I require auditory stimulation to work at my peak.
I did horribly in school but once I was in an environment where I could have some kind of background audio/video playing I began to excel. It also helps me sleep of a night. It’s like the audio keeps the portion of me that would otherwise distract me occupied.
gblargg
The multitasking is something I like about smart home speakers. I can be asking it to turn the lights on/off or check the temperature, while doing other things physically and not interrupting them, often while walking through the room. Even if voice commands are slower, they don't interrupt other processing nearly as much as having to visually devote attention and fine motor skills, and navigate to the right screen in an app to do what you want.
XorNot
I feel like the people using Voice Attack or whatever in space sims zeroed in on this.
It's very useful being able to request auxiliary functions without losing your focus, and I think that would apply to, say, word editing as well - e.g. being able to say "insert a date here" rather than having to get into the menus to find it.
Conversely, latency would be a big issue.
pugio
> The second thing we need to figure out is how we can compress voice input to make it faster to transmit. What’s the voice equivalent of a thumbs-up or a keyboard shortcut? Can I prompt Claude faster with simple sounds and whistles?
This reminds me of the amazing 2013 video of Travis Rudd coding python by voice: https://youtu.be/8SkdfdXWYaI?si=AwBE_fk6Y88tLcos
The number of times in the last few years I've wanted that level of "verbal hotkeys"... The latencies of many coding LLMs are still a little bit too high to allow for my ideal level of flow (though admittedly I haven't tried ones hosted on services like Groq), but I can clearly envision a time when I'm issuing tight commands to a coder model that's chatting with me and watching my program evolve on screen in real time.
On a somewhat related note to conversational interfaces, the other day I wanted to study some first aid stuff - used Gemini to read the whole textbook and generate Anki flash cards, then copied and pasted the flashcards directly into ChatGPT voice mode and had it quiz me. That was probably the most miraculous experience of voice interface I've had in a long time - I could do chores while being constantly quizzed on what I wanted to learn, and anytime I had a question or comment I could just ask it to explain or expound on a term or tangent.
WhyIsItAlwaysHN
I worked like that for a year in uni because of RSI, and it's very easy to get voice strain if you use your voice for coding like that. Many short commands are very tiring for the voice.
It's also hard to dictate code without a lot of these commands because it's very dense in information.
I hope something else will be the solution. Maybe LLMs being smart enough to guess the code out of a very short description and then a set of corrections.
szszrk
Oh wow. That video is 12 years old. Early in the presentation Travis reveals he used Dragon back then.
Do you recall the Swype keyboard for Android? The one that popularized swiping to write on touch screens? It had Dragon at some point.
IT WAS AMAZING.
Around 12-14 years ago (Android 2.3? Maybe 3?) I was able to easily dictate full long text messages and emails, in my native tongue, including punctuation and occasional slang or even word formation. I could dictate a decent long paragraph of text on the first try and not have to fix a single character.
It's 2025 and the closest I can find is a dictation app on my newest phone that uses online AI service, yet it's still not that great when it comes to punctuation and requires me to spit the whole paragraph at once, without taking a breath.
Is there anything equally effective for any of you nowadays? That actually works across the whole device?
davvid
> It's 2025 and the closest I can find is a dictation app on my newest phone that uses online AI service, yet it's still not that great [...]
> Is there anything equally effective for any of you nowadays?
I'm not affiliated in any way. You might be interested in the "Futo Keyboard" and voice input apps - they run completely offline and respect your privacy.
The source code is open and it does a good job at punctuation without you needing to prompt it by saying "comma" or "question mark", unlike other voice input apps such as Google's Gboard.
szszrk
Thanks for that suggestion.
I know and like Futo, very interesting project. Unfortunately the multilang models are not great in my case. Still not bad for an offline tool, but far from the "forget it's there, just use it" vibe I had with Dragon.
Funny thing is that I may have misconfigured something in Futo, because my typing corrections are phonetic :) so I type something in Polish and get an autocorrect in English composed of different letters, but a kind of similar-sounding word.
Cthulhu_
It sounds like Dragon was never ambitious enough, and/or the phone manufacturers were too closed off to allow them entry into that market.
But Microsoft bought them a few years ago. Weird that it took so long, though.
android521
>I admit that the title of this essay is a bit misleading (made you click though, didn’t it?). This isn’t really a case against conversational interfaces, it’s a case against zero-sum thinking.
No matter the intention or quality of the article, I do not like this kind of deceitful link-bait. It may have higher quality than pure link-bait, but nobody likes to be deceived.
indoordin0saur
I did not find the article to be deceitful at all. He does make a case against overuse of conversational interfaces. The author is just humbly acknowledging that his position is more nuanced than the title of the article might suggest.
mpalmer
"Humbly"? The author has full control over the title, and in addition to being bait, the title is not humble at all.
Not a case against, but the case against.
johnnyanmac
I simply saw that as tongue in cheek about how the author wanted to use a more general core point. The lens of conversational interfaces makes a good case for that while keeping true to the idea.
You can argue against something but also not think it's 100% useless.
whatnow37373
It's no wonder extraverted normie and managerial types that get through their day by talking think throwing words at a problem is the best thing since sliced bread.
They have problems like "compose an email that vaguely gives the impression I'm considering various options but I'm actually not" and for that, I suspect, the conversational workflow is quite good.
Anyone else that actually just does the stuff is viscerally aware of how sub-optimal it is to throw verbiage at a computer.
I guess it depends on what level of abstraction you're working at.
sevensor
The best executives to work for are the ones who are able to be as precise at their level of abstraction as I am at mine. There’s a shared understanding at an intermediate level, and we can resolve misunderstandings quickly. And then there are the executives who think we should just feed our transducer data into an llm.
techpineapple
There’s an interesting… paradox? Observation? That up until 20-30 years ago, humans were not computerized beings. I remember a thought leader at a company I worked at said that the future was wearable computing, a computer that disappears from your knowing and just integrates with your life. And that sounds great and human and has a very thought leadery sense of being forward thinking.
But I think it’s wrong? Ever since the invention of the television, we’ve been absolutely addicted to screens. Screens and remotes, and I think there’s something sort of anti-humanly human about it. Maybe we don’t want to be human? But people I think would generally much rather tap their thumb on the remote than talk to their tv, and a visual interface you hold in the palm of your hand is not going away any time soon.
neom
I went through Waldorf education and although Rudolf Steiner is quite eccentric, one thing I think he was spot on about was WHEN you introduce technology. He believed that introducing technology or mechanized thinking too early in childhood would hinder imaginative, emotional, and spiritual development. He emphasized that children should engage primarily with natural materials, imaginative play, storytelling, artistic activities, and movement, as opposed to being exposed prematurely to mechanical devices or highly structured thinking; I seem to recall he recommended this till the age of 6.
My parents did this with me, no screens till 6 (wasn't so hard as I grew up in the early 90s, but still, no TV). I notice too how much people love screens, that non-judgmental glow of mental stimulation, it's wonderful, however I do think it's easier to "switch off" when you spent the first period of your life fully tuned in to the natural world. I hope folks are able to do this for their kids, it seems it would be quite difficult with all the noise in the world. Given it was hard for mine during the era of CRT and 4 channels, I have empathy for parents of today.
soulofmischief
I will counter this by saying that my time spent with screens before 6 was unimaginably critical for me.
If I hadn't had it, I would have been trapped by the racist, religiously zealous, backwoods mentality that gripped the rest of my family and the majority of the people I grew up with. I discovered video games at age 3 and it changed EVERYTHING. It completely opened my mind to abstract thought and, among other things, influenced me to teach myself to read at age 3. I was reading at a collegiate level by age five and discovered another passion, books. Again, it propelled me out of an extremely anti-intellectual upbringing.
I simply could not imagine where I would be without video games, visual arts or books. Screens are not the problem. Absent parenting is the problem. Not teaching children the power of these screens is the problem.
f1shy
I second this motion. Technology is just a tool. It can be used wisely or not. Just forbidding it is not wise, in my opinion. You have to be careful to use it properly, of course.
Also let me drop the thought here that Rudolf Steiner, like Montessori and the like, declared "this is good" and "this is bad" based on "feeling" or intuition or such. There were no extensive scientific studies behind it.
setr
I’ve been theory crafting around video games for children on the opposing premise. I think fundamentally the divide is on the quality of content — most games have some value to extract, but many are designed to be played inefficiently, and require far more time investment than value extracted.
Eg Minecraft, Roblox, CoD, Fortnite, Dota/LoL, the various mobile games clearly have some kind of value (mechanical skill, hand-eye coordination, creative modes, 3D space navigation / translation / rotation, numeric optimization, social interaction, etc), but they’re also designed as massive timesinks mostly through creative mode or multiplayer.
Games like Paper Mario, Pikmin, Star Control 2, Katamari Damacy, and Lego titles, however, are all children-playable but far more time-efficient and, importantly, time-bounded for play. Even within timesink games there are higher quality options - you definitely get more, and faster, out of Satisfactory/Factorio than modded Minecraft. If you can push kids towards the higher quality, lower timesink games, I think it's worth it. Fail to do so and it's definitely not.
The same applies to TV, movies, books, etc. Any medium of entertainment has horrendous timesinks to avoid, and if you can do so, avoiding the medium altogether is definitely a missed opportunity. Screens are only notable in that the degenerate cases are far more degenerate than anything that came before them.
neom
Oh, his theory wasn't about video games though, they didn't exist in 1910, it was about the full breadth of human sensorial systems being used in the context of our neurology for a prolonged period of time during high neuroplasticity (0 to 6 was his theory). I haven't really played video games, so I don't know much about them personally.
nine_k
I don't see a contradiction. Watching passively in an expectation of a dopamine hit = bad. Playing actively with things that respond in various interesting ways = good, no matter if the things are material or virtual.
anthk
Wasn't Waldorf a cult?
jyounker
The story is kind of wild. Behind the Bastards has a couple of episodes on Rudolf Steiner, the founder: https://www.youtube.com/watch?v=6_2HkBugFBw
Foobar8568
It is, packed with pseudoscience, which we still suffer from today.
In Switzerland, we often get measles outbreaks thanks to his cult.
lrem
Playing computer games since an early age made me who I am. It required learning English a decade earlier than my peers. It pulled me into programming around the start of primary school. I wouldn’t be a staff engineer in a western country without these two.
bsder
> Screens and remotes, and I think there’s something sort of anti-humanly human about it.
Actually, it's the reverse. The orienting response is wired in quite deeply. https://en.wikipedia.org/wiki/Orienting_response
When I was teaching, I used to force students using laptops to sit near the back of the room for exactly this reason. It's almost impossible for humans to ignore a flickering screen.
strogonoff
Sensitivity to the stimuli behind the orienting impulse varies by individual, and I wish I were less sensitive on a daily basis.
These days screen brightness goes pretty high and it is unbelievable how many people seem to never use their screen (phone or laptop) on anything less than 100% brightness in any situation and are seemingly not bothered by flickering bright light or noise sources.
I am nostalgic about old laptops’ dim LCD screens that I saw a few times as a kid, they did not flicker much and had a narrow angle of view. I suspect they would even be fine in a darkened classroom.
Al-Khwarizmi
The last few times I've bought a new monitor, I've gone through the process of adjusting brightness by comparing a document on screen to a paper sheet. This invariably results in going from defaults of 50-70% to very low figures like 5-15%, and it's not that I work in dark places; my offices have reasonable light from outside. I would be extremely uncomfortable using default settings; for me they are absurdly bright.
King-Aaron
A flickering screen is modern man's flickering campfire.
LoganDark
Computers are tools, not people. They should be made easier to use as tools, not tried to be made people. I actually hate people, tools are much better.
alnwlsn
Somebody showed me a text-to-CAD AI tool recently, and I can't help but feel that whoever made it doesn't understand that people who use CAD aren't trying to solve the problem of "make a model of a rubber duck" but something more like "make a custom angle bracket which mounts part number xxxyyyy". Sure, you can try to describe what you want in words, but there's a reason machine shops want drawings and not a 300 word poem like you're a 14th century monk. Much much easier to just draw a picture.
stevage
Surely those text based tools exist for people who aren't CAD experts. I don't know CAD. But a tool that let me type in a description of a thing and then send it off to be 3D printed sounds pretty great to me.
DabeDotCom
> It was like they were communicating telepathically.
>
> That is the type of relationship I want to have with my computer!
The problem is, "The Only Thing Worse Than Computers Making YOU Do Everything... Is When They Do Everything *FOR* You!"
"ad3} and "aP might not be "discoverable" vi commands, but they're fast and precise.
Plus, it's easier to teach a human to think like a computer than to teach a computer to think like a human — just like it's easier to teach a musician to act than to teach an actor how to play an instrument — but I admit, it's not as scalable; you can't teach everyone Fortran or C, so we end up looking for these Pareto Principle shortcuts: Javascript provides 20% of the functionality, and solves 80% of the problems.
But then people find Javascript too hard, so they ask ChatGPT/Bard/Gemini to write it for them. Another 20% solution (20% of the original 20%, so now 4% as featureful), but it solves 64% of the world's problems. (And it's on pace to consume 98% of the world's electricity, but I digress!)
PS: Mobile interfaces don't HAVE to suck for typing; I could FLY on my old Treo! But "modern" UI eschews functionality for "clean" brutalist minimalism. "Why make it easy to position your cursor when we spent all that money developing auto-conflict?" «sigh»
grbsh
I think we can have the best of both worlds here. We want the precision and speed of using vi commands, but we want the discoverability of GUI document editors. LLMs may be able to solve the discoverability problem. If the editor can be highly confident that you want to use a given command, for example, it can give you an intellisense-like completion option. I don't think we've cracked the code on how this UX should work yet though -- as evidenced by how many people find cursor/copilot autocompletion suggestions so frustrating.
The other great thing about this mode is that it can double as a teaching methodology. If I have a complicated interface that is not very discoverable, it may be hard to sell potential users on the time investment required to learn everything. Why would I want to invest hours into learning non-transferrable knowledge when I'm not even sure I want to go with this option versus a competitor? It will be a far better experience if I can first vibe-use the product, and if it's right for me, I'll probably be incented to learn the inner workings of it as I try to do more and more.
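A minimal sketch of that confidence gating in Python; rank_commands() is a hypothetical LLM call and the threshold is an arbitrary placeholder, not anything grbsh specified:

    from dataclasses import dataclass

    @dataclass
    class Suggestion:
        command: str      # e.g. a vi-style keystroke sequence
        gloss: str        # plain-language description shown to the user
        confidence: float # model's estimate that this is what you want

    CONFIDENCE_THRESHOLD = 0.85  # only interrupt when fairly sure

    def rank_commands(recent_actions: list[str]) -> list[Suggestion]:
        """Hypothetical: ask an LLM to score editor commands against recent edits."""
        raise NotImplementedError

    def maybe_suggest(recent_actions: list[str]) -> Suggestion | None:
        # Surface an intellisense-like hint only above the threshold; otherwise
        # stay silent rather than train users to dismiss noisy suggestions.
        best = max(rank_commands(recent_actions), key=lambda s: s.confidence, default=None)
        if best is not None and best.confidence >= CONFIDENCE_THRESHOLD:
            return best
        return None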
Izkata
> We want the precision and speed of using vi commands, but we want the discoverability of GUI document editors.
> The other great thing about this mode is that it can double as a teaching methodology.
gvim has menus and puts the commands in the menus as shortcuts. That's how I learned that vim has folding and how to use it.
benob
To me natural language interfaces are like the mouse-driven menu vs terminal interpreter. They allow good discoverability in systems that we don't master at the cost of efficiency.
As always, good UI allows for using multiple modalities.
chthonicdaemon
I feel like chat interfaces have terrible discoverability. You can ask for anything but you have no idea what the system can actually do. In the menu system the options were all spelled out - that's what discoverability means to me. If you spend enough time going through the menus and dialogs you will find all the options, and in a well-designed interface you might notice a function you didn't know about near the one you're using now.
What chat interfaces have over CLIs is good robustness. You can word your request in lots of different ways and get a useful answer.
InsideOutSanta
Yes, this is exactly it. For things that I do rarely, I would love to have a working natural language interface because I know what I want to do, but I don't know how to do it. Even if there were more efficient ways to achieve my goal, since I do not know what they are, the inefficiencies of a natural language interface do not matter to me.
In this sense, natural language interfaces are more powerful search features rather than a replacement for other types of interfaces.
benrutter
Yesyesyesyes! I do wish I could think of more examples supporting both well.
VSCode is probably the best I can think of, where keyboard shortcuts can get you up to a decent speed as an advanced user, but mouse clicks provide an easy intro for a new user.
For the most part, I see tools like NVim, which is super fast but not new-user friendly. Or iOS, which a toddler can navigate, but which doesn't afford many ways to speed up interactions like typing.
earcar
Who's actually making the claim we should replace everything with natural language? Almost nobody serious. This article sets up a bit of a strawman while making excellent points.
What we're really seeing is specific applications where conversation makes sense, not a wholesale revolution. Natural language shines for complex, ambiguous tasks but is hilariously inefficient for things like opening doors or adjusting volume.
The real insight here is about choosing the right interface for the job. We don't need philosophical debates about "the future of computing" - we need pragmatic combinations of interfaces that work together seamlessly.
The butter-passing example is spot on, though. The telepathic anticipation between long-married couples is exactly what good software should aspire to. Not more conversation, but less need for it.
Where Julian absolutely nails it is the vision of AI as an augmentation layer rather than replacement. That's the realistic future - not some chat-only dystopia where we're verbally commanding our way through tasks that a simple button press would handle more efficiently.
The tech industry does have these pendulum swings where we overthink basic interaction models. Maybe we could spend less time theorizing about natural language as "the future" and more time just building tools that solve real problems with whatever interface makes the most sense.
mattmanser
I don't think it's a straw man; there are lots of people who think it might, or who are under the vague impression that it might. Plenty of less technical people. Because they haven't thought it through.
The article is useful as it enunciates arguments which many of us have intuited but are not necessarily able to explain ourselves.
ChuckMcM
This clearly elucidated a number of things I've tried to explain to people who are so excited about "conversations" with computers. The example I've used (with varying levels of effectiveness) was to get someone to think about driving their car by only talking to it. Not a self-driving car that does the driving for you, but telling it things like: turn, accelerate, stop, slow down, speed up, put on the blinker, turn off the blinker, etc. It would be annoying and painful, and you couldn't talk to your passenger while you were "driving" because that might make the car do something weird. My point, and I think it was the author's as well, is that you aren't "conversing" with your computer, you are making it do what you want. There are simpler, faster, and more effective ways to do that than to talk at it with natural language.