Chat is a bad UI pattern for development tools
64 comments · February 4, 2025 · taeric
freedomben
Midjourney is an interesting case study in this, I think, having built their product UI as a Discord bot. It was interesting to be sure, but I always felt like I was fighting the "interface" to get things done. It certainly wasn't all bad, and I think if I used it more it might even be great, but as someone who doesn't use Discord for anything else and only rarely generated images, I had to read the docs every time I wanted to generate an image, which is a ridiculous amount of friction.
joe_guy
There has recently been a pretty large UI addition for Midjourney directly inside Discord, which can be used instead of the text input.
As is often the case in these sorts of things, your mileage may vary for the more complex settings.
OJFord
I don't know, I'm in Slack all day with colleagues, I quite like having the additional ChatGPT colleague (even better I can be quite rude/terse in my messages with 'them').
Incidentally I think that's also a good model for how much to trust the output - you might have a colleague who knows enough about X to think they can answer your question, but they're not necessarily right, you don't blindly trust it. You take it as a pointer, or try the suggestion (but not surprised if it turns out it doesn't work), etc.
beambot
As a written form of "stream of consciousness", it seems to have a lot of value to me. It's noisy, inefficient & meandering -- all the things those polished artifacts are not -- but it's also where you can explore new avenues without worrying about succinctness or completeness. It's like the first draft of a manuscript.
croes
Natural language isn’t made to be precise; that’s why we use a subset of it in programming languages.
So you either need lots of extra text to remove the ambiguity of natural language when you use AI, or you need a special precise subset to communicate with AI, and that’s just programming with extra steps.
Klaster_1
A lot of extra text usually means prior requirements, meeting transcripts, screen share recordings, chat history, Jira tickets and so on - the same information developers use to produce a result that satisfies the stakeholders and does the job. This seems like a straightforward direction, solvable with more compute and more efficient memory. I think this is the way it pans out.
Real projects don't require an infinitely detailed specification either, you usually stop where it no longer meaningfully moves you towards the goal.
The whole premise of AI developer automation, IMO, is that if a human can develop a thing, then AI should be able to as well, given the same input.
cube2222
We are kind of actually there already.
With a 200k token window like Claude has you can already dump a lot of design docs / transcripts / etc. at it.
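To make the "dump a lot of docs at it" idea concrete, here is a rough sketch of packing documents into a fixed context budget. The function name and the common ~4-characters-per-token estimate are assumptions for illustration, not any particular vendor's API:

```python
def pack_context(docs, token_budget=200_000, chars_per_token=4):
    """Greedily concatenate docs until the estimated token budget is hit."""
    packed, used = [], 0
    for doc in docs:
        # Crude token estimate: ~4 characters per token.
        cost = len(doc) // chars_per_token + 1
        if used + cost > token_budget:
            break
        packed.append(doc)
        used += cost
    return "\n\n".join(packed)
```

In practice you'd want a real tokenizer and smarter selection (e.g. ranking docs by relevance), but the budget-then-truncate shape is the same.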
rightisleft
It's all about the context window. Even the new Mistral Codestral-2501 with its 256K context window does a great job.
If you use Cline with any large-context model, the results can be pretty amazing. It's not close to self-guiding; you still need to break down and analyze the problem and provide clear and relevant instructions, i.e. you need to be a great architect. Once you are stable on the direction, it's awe-inspiring to watch it do the bulk of the implementation.
I do agree that there is space to improve over embedded chat windows in IDEs. Solutions will come in time.
mollyporph
And Gemini has a 2M token window, which is about 10 minutes of video, for example.
throwaway290
idk if you think all those jira tickets and meetings are precise enough (IMO sometimes the opposite)
By the way, remind me why you need design meetings in that ideal world?:)
> Real projects don't require an infinitely detailed specification either, you usually stop where it no longer meaningfully moves you towards the goal.
The point was that specifications are not detailed enough in practice. A precise enough specification IS code. And the point is literally that natural language is just not made to be precise enough. So you are back where you started.
So you waste time explaining in detail and rehashing requirements in this imprecise language until you see the code you want to see. Which would have been faster to just... idk... type.
Klaster_1
That's a fair point. I'd love to see Copilot conclude that it can't resolve a particular conundrum and communicate with other people, so everyone makes a decision together.
falcor84
Even if you have superhuman AI designers, you still need buy-in.
pjc50
There was a wave of this previously in programming: https://en.wikipedia.org/wiki/The_Last_One_(software)
All the same buzzwords, including "AI"! In 1981!
dylan604
> and that’s just programming with extra steps.
If you know how to program, then I agree, and that's part of why I don't see the point. If you don't know how to program, then the prompt isn't much different from providing the specs/requirements to a programmer.
spacemanspiff01
Or a proposal/feedback process. A la: you are hired by a non-technical person to build something; you generate requirements and a proposed solution. You then propose that solution, and they give feedback.
Having a feedback loop is the only viable way to do this. Sure, the client could give you a book on what they want, but often people do not know their edge cases, what issues may arise, etc.
oxfordmale
Yes, let's devise a more precise way to give AI instructions. Let's call it AI-thon. This will allow the powers that be, like Zuckerberg, to save face and claim that AI has replaced mid-level developers, and enable developers to rebrand themselves as AI-thon programmers.
Joking aside, this is likely where we will end up, just with a slightly higher programming interface, making developers more productive.
dylan604
man, pAIthon was just sitting right there for the taking
empath75
AIs actually are very good at this. They wouldn't be able to write code at all otherwise. If you're careful in your prompting, they'll make fewer assumptions and ask clarifying questions before going ahead and writing code.
9rx
> If you're careful in your prompting
In other words, if you replace natural language with a programming language then the computer will do a good job of interpreting your intent. But that's always been true, so...
benatkin
Being careful in your prompting doesn’t imply that. That can also be thought of as just using natural language well.
oxfordmale
AI is very good at this. Unfortunately, humans tend to be super bad at providing detailed verbal instructions.
indymike
Languages used for day-to-day communication between humans do not have the specificity needed for detailed instructions... even to other humans. We use out-of-band context (body language, social norms, tradition, knowledge of a person) quite a bit more than you would think.
reverendsteveii
This puts me in mind of something I read years ago (and am having trouble finding) that had the same premise but proved it a different way. The idea was that natural language programming will always mean dealing with a certain background level of ambiguity, and the article cited contracts and contract law as proof. A contract is essentially an agreement defining a system of states and a response for each state. The vast and difficult-to-navigate body of contract law is proof that even when two entities that fully grasp the intricacies of the language are purposefully being as unambiguous as possible, there remains so much ambiguity that an entire separate group of people (the civil court system) exists just to mediate and interpret it. You might point to bad-faith actors, but a contract where every possible state and the appropriate response were defined without ambiguity would be proof against both misinterpretation and bad-faith actors.
Edmond
This is about relying on requirements-type documents to drive AI-based software development. I believe this will ultimately be integrated into all the AI dev tools, if it isn't already. It is really just additional context.
Here is an example of our approach:
https://blog.codesolvent.com/2024/11/building-youtube-video-...
We are also using the requirements to build a checklist, the AI generates the checklist from the requirements document, which then serves as context that can be used for further instructions.
Here's a demo:
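The requirements-to-checklist step could be sketched roughly like this. This is a hypothetical simplification (the actual tool presumably has the LLM generate the checklist); it just extracts bullet items from a markdown requirements doc and renders them as task-list entries:

```python
import re

def requirements_to_checklist(doc):
    """Turn markdown bullet items into unchecked markdown tasks."""
    items = re.findall(r"^\s*[-*]\s+(.*)$", doc, flags=re.MULTILINE)
    return "\n".join(f"- [ ] {item}" for item in items)
```

The resulting checklist can then be fed back in as context for further instructions, as described above.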
wongarsu
Now we just need another tool that allows stakeholders to write requirement docs using a chat interface
jakelazaroff
I agree with the premise but not with the conclusion. When you're building visual things, you communicate visually: rough sketches, whiteboard diagrams, mockups, notes scrawled in the margins.
Something like tldraw's "make real" [1] is a much better bet, imo (not that it's mutually exclusive). Draw a rough mockup of what you want, let AI fill in the details, then draw and write on it to communicate your changes.
We think multi-modally; why should we limit the creative process to just text?
[1] https://tldraw.substack.com/p/make-real-the-story-so-far
furyofantares
In cursor I keep a specification document in .cursorrules and I have instructions that cursor should update the document whenever I add new specifications in chat.
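A minimal sketch of what such a .cursorrules setup might look like (the wording and requirements are illustrative, not the actual file):

```
## Specification
- The CLI reads config from ./config.toml, falling back to defaults.
- All network calls must time out after 10 seconds.

## Meta
Whenever I give you a new requirement in chat, append it to the
Specification section of this file before making any code changes.
```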
yoz
That's a great idea. How well does it work in practice?
azhenley
See my article from January 2023, "Natural language is the lazy user interface".
sho_hn
I'd say this criticism is well-addressed in aider. Steering the LLM via code comments is the first UX I've seen that works.
https://aider.chat/docs/usage/watch.html
How jarring it is & how much it takes you out of your own flow state is very much dependent on the model output quality and latency still, but at times it works rather nicely.
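Per the linked docs, the workflow is that aider watches your files and a comment ending in "AI!" becomes an instruction. An illustrative snippet (the function and the requested change are made up):

```python
def parse_version(tag):
    # With aider running in watch mode, saving the file with an
    # instruction comment ending in "AI!" asks it to make the edit:
    # handle tags with a leading "v" prefix, AI!
    return tuple(int(part) for part in tag.split("."))
```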
josefrichter
I think everyone is aware that chat is not the ideal UI pattern for this. It's just the way current AI models work and generate content - that's why they have this "typewriter" mode, which naturally leads to a chat interface.
It's not really a conscious choice, but rather a side effect. And we already see the trend is away from that, with tools like chatGPT Canvas, editors like Windsurf, etc.
lcfcjs6
Seems like this is a common complaint from folks trying to write code purely with ChatGPT / DeepSeek by communicating in complete sentences. You can only get so far using these tools before you need a proper understanding of what's happening with the code.
r0ckarong
I don't want to become a lawyer to talk to my compiler; thank you.
jfkrrorj
No, it is pretty much a dialog; I would compare it to pair programming.
AI is in many ways more capable than a human programmer, and in some ways it is not. It is not supersmart. It cannot hold an entire program in its head; you have to feed it the small relevant sections of the program.
> That’s why we use documents—they let us organize complexity, reference specific points, and track changes systematically.
Extra steps. Something like waterfall...
I'm growing to the idea that chat is a bad UI pattern, period. It is a great record of correspondence, I think. But it is a terrible UI for doing anything.
In large part, I assert this is because the best way to do something is to do that thing. There can be correspondence around the thing, but the artifacts that you are building are separate things.
You could probably take this further and say that narrative is a terrible way to build things. It can be a great way to communicate them, but being a separate entity, it is not necessarily good at making any artifacts.