AI's Biggest Flaw? The Blinking Cursor Problem

lisper

No, the blinking cursor is a feature, not a bug. Alec Watson over at Technology Connections has a much better argument for this than I could ever hope to muster, so I'll just hand it over to him:

https://www.youtube.com/watch?v=QEJpZjg8GuA

joe_the_user

Just to be clear, the long video you link to essentially says that lack of discoverability is an intentional misfeature of social media.

Which is to say that the host and OP agree that lack of discoverability is a problem (Watson just views it as a maliciously inserted one). And so your "No" involves a bit of misrepresentation...

lisper

> lack of discoverability is an intentional misfeature of social media

That's not the message at all. The message is that the problem with social media is that it feeds you content without any prompting, and so it turns the user into a purely passive consumer and robs them of their agency. There's plenty of discoverability in social media. The problem is you don't have to use it, and so people don't. A blinking cursor forces you to take the wheel.

psadri

I think it depends on who is using the system. For a power user who is already familiar with the domain, a free-form, open-ended UI that can understand anything is very powerful and liberating.

For a novice user, or someone who is not from the domain, it can be challenging because they may not know where to start.

There is so much that can be done in this space by fully leaning into the AI. It can, for example, figure out the user’s level and offer varying levels of guidance and help.
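A minimal sketch of what that could look like, in TypeScript against a generic completion API (callModel, the level labels, and the prompt wording are all illustrative assumptions, not any particular vendor's API): a cheap classifier pass estimates the user's familiarity, and the guidance adapts accordingly.

    // Illustrative sketch: callModel stands in for whatever completion
    // API is in use; the labels and prompts are made up for this example.
    declare function callModel(prompt: string): Promise<string>;

    type UserLevel = "novice" | "intermediate" | "expert";

    async function estimateLevel(firstMessage: string): Promise<UserLevel> {
      const reply = await callModel(
        'Classify the writer\'s familiarity with this domain as ' +
        '"novice", "intermediate" or "expert". Reply with one word.\n\n' +
        firstMessage
      );
      const level = reply.trim().toLowerCase();
      return level === "expert" || level === "intermediate"
        ? level
        : "novice"; // when unsure, default to more guidance, not less
    }

    const GUIDANCE: Record<UserLevel, string> = {
      novice: "Offer three example prompts before answering; explain jargon.",
      intermediate: "Answer directly; mention one capability the user may not know.",
      expert: "Answer tersely; no unsolicited suggestions.",
    };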

lisper

IMHO anyone who doesn't know where to start shouldn't be using an AI. If you don't have enough initiative to decide what to ask, you need a very different kind of help.

ojschwa

This is a tantalizing problem for me as a UX designer. My approach, which I'm quite proud of, places a UI primitive (Todo lists) center stage, with the chat thread on the side similar to Canvas or Claude's Artifacts. The interaction works like this:

1. The user gets shown a list GUI based on their requirement (Meal Planning, Shopping List...)
2. Users speak directly to the list while the LLM listens in realtime
3. The LLM acknowledges with emojis that flash to confirm understanding
4. The LLM creates, updates, or deletes the list items in turn (stored in localStorage or a Durable Object -> shout out https://tinybase.org/)

The lists are React components, designed to be malleable. They can be re-written in-app by the LLM while still taking todos. The React code also provides great context for the LLM: a shared contract between user and AI. I'm excited to experiment with streaming real-time screenshots of user interactions with the lists for even deeper mind-melding.
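If it helps, here is a minimal sketch of step 4 under that stack, in TypeScript (the op shape and table name are my own illustrative assumptions; only createStore, setRow, and delRow are real TinyBase API): the LLM emits structured operations, and a small reducer applies them to the store the React list components render from.

    import { createStore } from "tinybase";

    // Ops as the LLM might emit them (shape is illustrative, not fixed).
    type ListOp =
      | { action: "create" | "update"; id: string; text: string }
      | { action: "delete"; id: string };

    const store = createStore();

    function applyListOp(op: ListOp) {
      if (op.action === "delete") {
        store.delRow("todos", op.id);
      } else {
        // setRow upserts, so create and update can share a path.
        store.setRow("todos", op.id, { text: op.text, done: false });
      }
    }

    // E.g. after the user says "swap the pasta for a stir fry":
    applyListOp({ action: "delete", id: "pasta" });
    applyListOp({ action: "create", id: "stir-fry", text: "Stir fry (Wed)" });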

I believe the cursor and chat thread remain critical. They ground the user and visually express the shared context between LLM and user. And of course, all these APIs are fundamentally structured around sequential message exchanges. So it will be an enduring UI pattern.

If you're curious I have a demo here -> https://app.tinytalkingtodos.com/

psadri

I’m a fan of this co-authoring-a-document model. It’s a nice bridge between free-form interaction and the rigid UIs of yore.

xg15

> Once you’ve overcome the intimidation of the blinking cursor and typed something in, one of the next obstacles you’ll likely meet is the lack of clarity regarding the overall capabilities of these general-purpose AI systems.

The article presents this as a UX problem, but isn't this actually a much deeper issue? We straight up don't know what those models can and cannot do (i.e., which tasks can be done reliably with high correctness and which will just lead to endless hallucinations) because the mechanism by which the models generalize across tasks is still not fully understood. This stuff is still an active area of research.

light_triad

A chat interface is great in the sense that it's open, flexible and intuitive.

The downside is that there's a tendency to anthropomorphise AI, and you might not want to talk to your computer: it takes too long to explain all the details, it can be clunky for certain tasks, and, as the author argues, it's actually limiting if you don't already know what it can do.

There's a need to get past the "Turing test" phase and integrate AI into more workflows so that chat is one interface among many options depending on the job to be done.

42lux

You know, I kinda want to, but more like in Star Trek: interconnected voice commands, terminals, and screens. The problem is that we won’t get a well-integrated AI. Apple probably has the best shot, because they usually get the interconnections between their products right… but they have other problems in regard to AI.

rafaelmn

Chat works because humans are really impressed by natural language responses, irregardless of the actual correctness/quality.

Once you build it into a product the failure modes become obvious.

kazinator

'irregardless' of correctness, LOL. Best thing I read all day.

rafaelmn

Is it a double negative? English is not my first language, so maybe it doesn't translate exactly. But that kind of proves my point: if that was grammatically correct bullshit, you're way more impressed than by an awkwardly structured reply.

When you actually build AI into products (I've been part of several integrations), the failure modes and lack of reliability become obvious. Models fail to respond to a simple RAG question with relevant context a significant percentage of the time, meanwhile they're solving PhD problems on some benchmarks. Then you find out they have to sample tens of times to get the right answer in scenarios where evaluation is simple, or that tests were included in the training data, and suddenly the lack of product integration makes sense.
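To make the sampling point concrete, here is a sketch in TypeScript (callModel and isCorrect are stand-ins for a real completion API and a task-specific checker, not real functions): pass@k asks whether any of k independent samples solves the task, which is not what a user experiences on a single try.

    // Stand-ins for a completion API and a task-specific checker.
    declare function callModel(prompt: string): Promise<string>;
    declare function isCorrect(answer: string): boolean;

    // pass@k: did ANY of k independent samples get it right?
    async function passAtK(prompt: string, k: number): Promise<boolean> {
      for (let i = 0; i < k; i++) {
        if (isCorrect(await callModel(prompt))) return true;
      }
      return false;
    }

A benchmark can report passAtK(task, 30) while a user in a product effectively gets passAtK(task, 1). Same model, very different numbers.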

antonkar

Yep, we can build the Artificial Static Place Intelligence instead of creating AI/AGI agents that are like librarians who only give you quotes from books and don’t let you enter the library itself to read the whole books. Why not expose the whole library, the entire multimodal language model, to real people, for example in a computer game?

To make this place easier to visit and explore, we could make a digital copy of our planet Earth and somehow expose the contents of the multimodal language model to everyone in a familiar, user-friendly UI of our planet.

We should not keep it hidden behind a strict librarian (the AI/AGI agent) that imposes rules on us, letting us read only the little quotes from books that it spits out, while it itself holds the whole stolen output of humanity.

We can explore The Library without any strict guardian, in the comfort of our simulated planet Earth, on our devices, in VR, and eventually through some wireless brain-computer interface. (It would always remain a game that no one is forced to play, unlike the agentic AI world that is being imposed on us more and more right now, and potentially forever.)

_nalply

I found your analogy of a librarian only giving out quotes insightful and enlightening. Thank you.

It was a TIL moment for me: make the training data available and indexable! Something like a snapshot of humanity's complete knowledge and stories.

Today the AI models are like a librarian who knows all the books of the library well but can't carry the library around in a bag. There was a time when she read all the books, but now the books are in thousands of crates in sub-basements, unavailable.

I envision a future where exabytes of data or more are stored in a smartphone-like device, in something like a tiny crystal. The AI model can, on request, make a copy of some original for you. And this thing can't be bricked.

antonkar

Thank you, we discussed it recently here and in other comments on that page https://forum.effectivealtruism.org/posts/9XJmunhgPRsgsyWCn/...

marginalia_nu

This seems to mostly be a human problem.

Out of the large number of things you can do, most likely you're only consciously aware of a small number of them, and even among those, you're fairly likely to fall back on doing the things you've done before.

You could potentially do something new, something you haven't even considered doing, something wildly out of character; there's any number of such things you could do. But most likely you won't: you'll follow your routines and do the same proven things over and over again.

You Can Just Do Things (TM), sure, but first you need to have the idea of doing them. That's the hard part: fishing an interesting idea out of the dizzying expanse of possibilities.

recursive

> These AI systems are not able to describe their own capabilities or strengths and are not aware of their limitations and weaknesses

I've experienced this with GitHub Copilot. At the beginning of a Copilot chat, there's a short paragraph. It tells you to use "slash commands" for various purposes. I ask for a list of what slash commands are available. It responds by giving me a general definition of the term "slash command". No. I want to know which slash commands you support. Then it tells me it doesn't actually support slash commands.

I definitely feel like I'm falling into the non-power-user category described here in most of my AI interactions. So often I just end up arguing with them in circles, with them constantly agreeing and correcting themselves, but never addressing my original goal.

yorwba

To find out about slash commands, you should type "/help". Of course, you'd only know about the "/help" slash command if you were already at least a bit familiar with slash commands. It is a conundrum.

recursive

Or... it could say "type /help to learn more". But maybe that would make it too easy.
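And that fix can be made self-maintaining. A minimal sketch in TypeScript (illustrative only, not Copilot's actual implementation): register /help like any other command, so the list it prints can never drift out of sync with what the system actually supports.

    type Command = { help: string; run: (arg: string) => string };
    const commands = new Map<string, Command>();

    commands.set("/explain", {
      help: "Explain the selected code",
      run: (arg) => `Explaining: ${arg}`,
    });

    // /help is just another registered command, so its output always
    // reflects whatever is actually in the registry.
    commands.set("/help", {
      help: "List available commands",
      run: () =>
        [...commands.entries()]
          .map(([name, c]) => `${name}: ${c.help}`)
          .join("\n"),
    });

    function handleInput(input: string): string {
      const [name, ...rest] = input.split(" ");
      const cmd = commands.get(name);
      return cmd ? cmd.run(rest.join(" ")) : 'Unknown command. Type "/help".';
    }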

ddxv

Another issue is trust. When it does tell you information, how do you know you can trust it?

I treat it now more like advice from a friend. Great information that isn't necessarily right and often wrong without having any idea it is wrong.

Syonyk

> I treat it now more like advice from a friend. Great information that isn't necessarily right and often wrong without having any idea it is wrong.

"Drunken uncle at a bar, known for spinning tales, and a master BSer who hustled his way through college in assorted pool halls" is my personal model of it. Often right, or nearly so. Frequently wrong. Sometimes has made things up on the spot. Absolutely zero ability to tell which it is, from the conversation.

skydhash

You actually have a confidence measure for a friend's advice. I’d trust a mechanic friend if he says I should have someone take a look at my car, or my librarian friend when he recommends a few books. Not everyone tells a lie and the truth in the same breath. And there are qualifiers like “I think…”, “I believe…”, “Maybe…”

ddxv

True, I've started to develop my own model of that. I completely trust AI models around JavaScript, generic code, etc. The more mission-critical something is, the more likely I am to only read what it says and avoid copy-pasting.

With SQL, I've learned I need to 100% read and comprehend the logic; it's too easy to get something 'right' that later turns out to be wrong.

Less common or newer libraries are the least trustworthy. I can barely get anything working with ClickHouse, Svelte 5, etc.

kazinator

> Back then very few people knew what to do when faced with this screen

In the '80s, you could go into any computer store and see what prior visitors had been up to with the machines on display.

And what you would very often find is evidence that the user before you had been trying to type English into the computer, to see whether it would converse, and that the user soon gave up after seeing nothing but error messages.

It was incredibly common. People who didn't know anything about computers harbored a misunderstanding that you could just chat with them, like Captain Kirk or Mr. Spock in Star Trek, and they tried exactly that at the keyboard.

Fast forward 40 years, and it finally works like they expect.

So anyway, chatting with a computer at the blinking cursor is entirely discoverable. And if there's a prompt there for the human saying something like "try asking me anything in plain English", then quadruply so.

flowerthoughts

AltaVista replaced Yahoo. And then Google did the same thing. And this was without language interpretation. I agree with the sentiment, and I think it's easier for professionals to build muscle memory on a 2D layout than on text fragments, but it all depends on the use case. I think we'll move more toward a scrolling history of small widgets that you fill out, i.e. the AI builds the user interface that's needed at that point in time.

Granted, Tomi Engdahl's electronics hub [1] was an amazing resource for discovering electronics.

[1] https://www.epanorama.net/

amelius

WhatsApp has the same blinking cursor, and everybody is happy with it.

layer8

An important feature of WhatsApp is that it lets you communicate with different people, who each have different pre-existing contexts and roles for you. Role selection is one of the possible solutions proposed in the article.

kleiba

The blinking cursor is a metaphor; it's about having to craft prompts and what that implies from a UX perspective.

wepple

I tend to know that people on chat are human, and therefore what they’re likely capable of and not capable of.

And I’m not expected to use them as a tool. By contrast, I can probably pick up any Ryobi power tool I’ve never seen before and work out how to make it do its thing, and probably what its purpose is.

binarymax

I see some good points here, but overall I disagree. Traditionally, all UIs have required people to adapt to how machines work. We have to memorize commands and navigate clunky interfaces that are painstakingly assembled (often unsuccessfully) by UX research and UI teams.

Chat reverses this: it is now machines adapting to how we communicate. I can see some UI sugar finding its way into this new way of interaction, but we should start over and force the change to keep it on our terms.

jltsiren

A chat UI can be intuitive if it sees the context. If you can make single-sentence queries and the AI understands enough of the context to guess what you actually meant and extrapolate the details, it can be very powerful. But if you actually need precise and detailed queries for good outcomes, it's not very natural. Explaining things clearly is often harder than understanding them or doing them yourself.