
Adding a feature because ChatGPT incorrectly thinks it exists

kragen

I've found this to be one of the most useful ways to use (at least) GPT-4 for programming. Instead of telling it how an API works, I make it guess, maybe starting with some example code to which a feature needs to be added. Sometimes it comes up with a better approach than I had thought of. Then I change the API so that its code works.
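
For example, a purely invented sketch (`mylib` and every name in it are hypothetical, not any real library):

    import mylib  # hypothetical library whose API is being designed

    # GPT-4's guess, written against an API that didn't exist yet:
    client = mylib.Client(retries=3, backoff_seconds=2.0)

    # What the real API required at the time:
    client = mylib.Client()
    client.set_retry_policy(mylib.RetryPolicy(count=3, backoff_seconds=2.0))

    # The guessed version reads better, so the move is to add those
    # constructor arguments and keep the model's code working as-is.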

Conversely, I sometimes present it with some existing code and ask it what it does. If it gets it wrong, that's a good sign my API is confusing, and how.

These are ways to harness what neural networks are best at: not providing accurate information but making shit up that is highly plausible, "hallucination". Creativity, not logic.

(The best thing about this is that I don't have to spend my time carefully tracking down the bugs GPT-4 has cunningly concealed in its code, which often takes longer than just writing the code the usual way.)

There are multiple ways that an interface can be bad, and being unintuitive is the only one that this will fix. It could also be inherently inefficient or unreliable, for example, or lack composability. The AI won't help with those. But it can make sure your API is guessable and understandable, and that's very valuable.

Unfortunately, this only works with APIs that aren't already super popular.

suzzer99

> Sometimes it comes up with a better approach than I had thought of.

IMO this has always been the killer use case for AI—from Google Maps to Grammarly.

I discovered Grammarly at the very last phase of writing my book. I accepted maybe 1/3 of its suggestions, which is pretty damn good considering my book had already been edited by me dozens of times AND professionally copy-edited.

But if I'd accepted all of Grammarly's changes, the book would have been much worse. Grammarly is great for sniffing out extra words and passive voice. But it doesn't get writing for humorous effect, context, deliberate repetition, etc.

The problem is executives want to completely remove humans from the loop, which almost universally leads to disastrous results.

jll29

> The problem is executives want to completely remove humans from the loop, which almost universally leads to disastrous results

Thanks for your words of wisdom, which touch on a very important other point I want to raise: often, we (i.e., developers, researchers) construct a technology that would be helpful and "net benign" if deployed as a tool for humans to use, instead of deploying it to replace humans. But then along comes a greedy business manager who reckons, recklessly, that using said technology not as a tool but in full automation mode will make results 5% worse but save 15% of staff costs - and decides that that is a fantastic trade-off for the company, yet employees may lose and customers may lose.

The big problem is that developers/researchers lose control of what they develop, usually once the project is completed - if they ever had control in the first place. What can we do? Perhaps write open source licenses that are less liberal?

kragen

You're trying to put out a forest fire with an eyedropper.

Stock your underground bunkers with enough food and water for the rest of your life and work hard to persuade the AI that you're not a threat. If possible, upload your consciousness to a starwisp and accelerate it out of the Solar System as close to lightspeed as you can possibly get it.

Those measures might work. (Or they might be impossible, or insufficient.) Changing your license won't.


dataflow

Hasn't Microsoft Word had style checkers for things like passive voice for decades?

adgjlsfhk1

yes, but now they work


normie3000

What's wrong with passive?

plemer

Passive voice often adds length, impedes flow, and subtracts the useful info of who is doing something.

Examples:

* Active - concise, complete info: The manager approved the proposal.

* Passive - wordy, awkward: The proposal was approved by the manager.

* Passive - missing info: The proposal was approved. [by who?]

Most experienced writers will use active unless they have a specific reason not to, e.g., to emphasize another element of the sentence, as the third bullet's sentence emphasizes approval.

-

edited for clarity, detail

jcranmer

There's nothing wrong with the passive voice.

The problem is that many people have only a poor ability to recognize the passive voice in the first place. This results in the examples being clunky, wordy messes that are bad because they're, well, clunky and wordy, and not because they're passive--indeed, you've often got only a fifty-fifty chance of the example passive voice actually being passive in the first place.

I'll point out that the commenter you're replying to used the passive voice, as did the one they responded to, and I suspect that such uses went unnoticed. Hell, I just rewrote the previous sentence to use the passive voice, and I wonder how many people recognized that in the first place, let alone thought it worse for being so written.

arscan

There was a time when Microsoft Word would treat the passive voice in your writing with the same level of severity as spelling errors or major grammatical mistakes. Drove me absolutely nuts in high school.

KineticLensman

Passive can be disastrous when used in contractual situations if the agent who should be responsible for an action isn’t identified. E.g. “X will be done”. I was once burnt by a contract that in some places left it unclear whether the customer or the contractor was responsible for particular tasks. Active voice that identifies the agent is less ambiguous

kragen

Sometimes it's used without thinking, and often the writing is made shorter and clearer when the passive voice is removed. But not always; rewriting my previous sentence to name the agents in each case, as the active voice requires in English, would not improve it. (You could remove "made", though.)

hathawsh

Here is a simple summary of the common voices/moods in technical writing:

- Active: The user presses the Enter key.

- Passive: The Enter key is to be pressed.

- Imperative (aka command): Press the Enter key.

The imperative mood is concise and doesn't dance around questions about who's doing what. The reader is expected to do it.

bityard

In addition to the points already made, passive voice is painfully boring to read. And it's literally everywhere in technical documentation, unfortunately.

exe34

I will never use Grammarly, no matter how good they get. They've interrupted too many videos for me to let it pass.

croes

And that’s how everything gets flattened to the same style/voice/etc.

That’s like getting rid of all languages and accents and switching to the same language.

andrewljohnson

The same could be said for books about writing, like Williams or Strunk and White. The trick is to not apply what you learn indiscriminately.

bryanlarsen

Refusing 2/3rds of grammarly's suggestions flattens everything to the same style/voice?

dingnuts

[flagged]

simonw

Plenty of people have English as a second language. Having an LLM help them rewrite their writing to make it better conform to a language they are not fluent in feels entirely appropriate to me.

I don't care if they used an LLM provided they put their best effort in to confirm that it's clearly communicating the message they are intending to communicate.

avalys

What makes you feel so entitled to tell other people what to do?

alwa

Does this extend to the heuristic TFA refers to? Where they end up (voluntarily or not) referring to what LLMs hallucinate as a kind of “normative expectation,” then use that to guide their own original work and to minimize the degree to which they’re unintentionally surprising their audience? In this case it feels a little icky and demanding because the ASCII tablature feature feels itself like an artifact of ChatGPT’s limitations. But like some of the commenters upthread, I like the idea of using it for “if you came into my project cold, how would you expect it to work?”

Having wrangled some open-source work that’s the kind of genius that only its mother could love… there’s a place for idiosyncratic interface design (UI-wise and API-wise), but there’s also a whole group of people who are great at that design sensibility. That category of people doesn’t always overlap with people who are great at the underlying engineering. Similarly, as academic writing tends to demonstrate, people with interesting and important ideas aren’t always people with a tremendous facility for writing to be read.

(And then there are people like me who have neither—I agree that you should roll your eyes at anything I ask an LLM to squirt out! :)

But GP’s technique, like TFA’s, sounds to me like something closer to that of a person with something meaningful to say, who now has a patient close-reader alongside them while they hone drafts. It’s not like you’d take half of your test reader’s suggestions, but some of them might be good in a way that didn’t occur to you in the moment, right?

momojo

A light-weight anecdote:

Many, many Python image-processing libraries have an `imread()` function. I didn't know about this when designing our own bespoke image lib at work, and went with an esoteric `image_get()` that I never bothered to refactor.

When I ask ChatGPT for help writing one-off scripts using the internal library I often forget to give it more context than just `import mylib` at the top, and it almost always defaults to `mylib.imread()`.

bandofthehawk

As someone not familiar with these libraries, image_get or image_read seems much clearer to me than imread. I'm wondering if the convention is worse than your instinct in this case. Maybe these AI tools will push us towards conventions that aren't always the best design.

kragen

image_get is clearer—unless you've used Matlab, Octave, matplotlib, SciPy, OpenCV, scikit-image, or other things that have copied Matlab's interface. In that case, using the established name is clearer.

(Unless, on the gripping hand, your image_get function is subtly different from Matlab's imread, for example by not returning an array, in which case a different name might be better.)

dimatura

I don't know if there's an earlier source, but I'm guessing Matlab originally popularized the `imread` name, and that OpenCV (along with its python wrapper) took it from there, same for scipy. Scikit-image then followed along, presumably.

kragen

That's a perfect example! I wonder if changing it would be an improvement? If you can just replace image_get with imread in all the callers, maybe it would save your team mental effort and/or onboarding time in the future.

data-ottawa

I strongly prefer `image_get/image_read` for clarity, but I would just stub in a method called `imread` which is functionally the same and hide it from the documentation.
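
A minimal sketch of that, assuming `image_get` is the existing function:

    def image_get(path):
        """Read an image file and return its raw bytes (existing function)."""
        with open(path, "rb") as f:
            return f.read()

    # Undocumented alias, so code written against the conventional
    # Matlab/OpenCV-style name still works:
    imread = image_get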

bryanlarsen

I used this to great success just this morning. I told the AI to write me some unit tests. It flailed and failed badly at that task. But how it failed was instructive, and uncovered a bug in the code I wanted to test.

kragen

Haha, that's awesome! Are you going to change the interface? What was the bug?

bryanlarsen

It used nonsensical parameters to the API in a way that I didn't realize was possible (though obvious in hindsight). The AI got confused; it didn't think the parameters were nonsensical. It also didn't quite use them in the way that triggered the error. However, it was close enough for me to realize "hey, I never thought of that possibility". I needed to fix the function to return a proper error response for the nonsense.

It also taught me to be more careful about checkpointing my work in git before letting an agent go wild on my codebase. It left a mess trying to fix its problems.

escapecharacter

This is similar to an old HCI design technique called Wizard of Oz by the way, where a human operator pretends to be the app that doesn’t exist yet. It’s great for discovering new features.

https://en.m.wikipedia.org/wiki/Wizard_of_Oz_experiment

kragen

I'd never heard that term! Thank you! I feel like LLMs ought to be fantastic at doing that, as well. This is sort of like the inverse.

data-ottawa

This was a big problem for me when starting out writing MCP servers.

Having an LLM demo your tool, then taking what it does wrong or uses incorrectly and adjusting the API, works very well. Updating the docs to instruct the LLM on how to use your tool does not work well.

layer8

HDD — hallucination-driven development

afavour

From my perspective that’s fascinatingly upside-down thinking that ends with you asking to lose your job.

AI is going to get the hang of coding to fill in the spaces (i.e. the part you’re doing) long before it’s able to intelligently design an API. Correct API design requires a lot of contextual information and forward planning for things that don’t exist today.

Right now it’s throwing spaghetti at the wall and you’re drawing around it.

simonw

I find it's often way better at API design than I expect. It's seen so many examples of existing APIs in its training data that it tends to have surprisingly good "judgement" when it comes to designing a new one.

Even if your API is for something that's never been done before, it can usually still take advantage of its training data to suggest a sensible shape once you describe the new nouns and verbs to it.

kragen

Maybe. So far it seems to be a lot better at creative idea generation than at writing correct code, though apparently these "agentic" modes can often get close enough after enough iteration. (I haven't tried things like Cursor yet.)

I agree that it's also not currently capable of judging those creative ideas, so I have to do that.

bbarnett

This sort of discourse really grinds my gears. The framing of it, the conceptualization.

It's not creative at all, any more than taking the sum of text on a topic, and throwing a dart at it. It's a mild, short step beyond a weighted random, and certainly not capable of any real creativity.

Myriads of HN enthusiasts often chime in here with "Are humans any more creative?" and other blather. Well, that's whataboutism, and it doesn't detract from the fact that creativity does not exist in the AI sphere.

I agree that you have to judge its output.

Also, sorry for hanging my comment here. Might seem over the top, but anytime I see 'creative' and 'AI', I have all sorts of dark thoughts. Dark, brooding thoughts with a sense of deep foreboding.

a_e_k

I've played with a similar idea for writing technical papers. I'll give an LLM my draft and ask it to explain back to me what a section means, or otherwise quiz it about things in the draft.

I've found that LLMs can be kind of dumb about understanding things, and are particularly bad at reading between the lines for anything subtle. In this aspect, I find they make good proxies for inattentive anonymous reviewers, and so will try to revise my text until even the LLM can grasp the key points that I'm trying to make.

kragen

That's fantastic! I agree that it's very similar.

In both cases, you might get extra bonus usability if the reviewers or the API users actually give your output to the same LLM you used to improve the draft. Or maybe a more harshly quantized version of the same model, so it makes more mistakes.

gortok

I think folks have taken the wrong lesson from this.

It’s not that they added a new feature because there was demand.

They added a new feature because technology hallucinated a feature that didn’t exist.

The savior of tech, generative AI, was telling folks a feature existed that didn’t exist.

That’s what the headline is, and in a sane world the folks that run ChatGPT would be falling over themselves to be sure it didn’t happen again, because next time it might not be so benign as it was this time.

JimDabell

I wrote this the other day:

> Hallucinations can sometimes serve the same role as TDD. If an LLM hallucinates a method that doesn’t exist, sometimes that’s because it makes sense to have a method like that and you should implement it.

https://www.threads.com/@jimdabell/post/DLek0rbSmEM

I guess it’s true for product features as well.
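
A toy illustration of the coding case (all names here are invented): if the model keeps calling a method your class doesn't have, and the method makes sense, write it.

    from dataclasses import dataclass

    @dataclass
    class Track:
        title: str
        duration: float  # seconds

    class Playlist:
        def __init__(self, tracks):
            self.tracks = tracks

        # An LLM might plausibly hallucinate this method; it's a
        # reasonable thing for the class to offer, so implement it:
        def total_duration(self):
            return sum(t.duration for t in self.tracks)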

jjcm

Seems like lots of us have stumbled on this. It’s not the worst way to dev!

> Maybe hallucinations of vibe coders are just a suggestion those API calls should have existed in the first place.

> Hallucination-driven-development is in.

https://x.com/pwnies/status/1922759748014772488?s=46&t=bwJTI...

NooneAtAll3

inb4 "Ai thinks there should be a StartThermonuclearWar() function, I should make that"

blharr

In a combat simulator, absolutely

shermantanktop

The music notation tool space is balkanized in a variety of ways. One of the key splits is between standard music notation and tablature, which is used for guitar and a few other instruments. People are generally on one side or another, and the notation is not even fully compatible - tablature covers information that standard notation doesn't, and vice versa. This covers fingering, articulations, "step on fuzz pedal now," that sort of thing.

The users are different, the music that is notated is different, and for the most part if you are on one side, you don't feel the need to cross over. Multiple efforts have been made (MusicXML, etc.) to unify these two worlds into a superset of information. But the camps are still different.

So what ChatGPT did is actually very interesting. It hallucinated a world in which tab readers would want to use Soundslice. But, largely, my guess is they probably don't... today. In a future world, they might? Especially if Soundslice then enables additional features that make tab readers get more out of the result.

adrianh

I don't fully understand your comment, but Soundslice has had first-class support for tablature for more than 10 years now. There's an excellent built-in tab editor, plus importers for various formats. It's just the ASCII tab support that's new.

kragen

I wonder if LLMs will stimulate ASCII formats for more things, and whether we should design software in general to be more textual in order to work better with LLMs.

jagged-chisel

Been using LLMs to code a bit lately. It's decent with boilerplate. It's pretty good at working out patterns[1]. It does like to ping pong on some edits though - edit this way, no back that way, no this way again. I did have one build an entire iOS app, it made changes to the UI exactly as I described, and it populated sample data for all the different bits and bobs. But it did an abysmal job at organizing the bits and bobs. Need running time for each of the audio files in a list? Guess we need to add a dictionary mapping the audio file ID to length! (For the super juniors out there: this piece of data should be attached to whatever represents the individual audio file, typically a class or struct named 'AudioFile'.)
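
(A minimal sketch of that last point, in Python for brevity and with invented names: the duration belongs on the type that represents the file, not in a parallel lookup table.)

    from dataclasses import dataclass

    # What the model generated (hypothetical reconstruction): a side
    # table keyed by file ID, detached from the data it describes.
    durations_by_id: dict[str, float] = {}

    # What you actually want: the duration lives on the type itself.
    @dataclass
    class AudioFile:
        id: str
        name: str
        duration_seconds: float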

It really likes to cogitate on code from several versions ago. And it often insists repeatedly on edits unrelated to the current task.

I feel like I'm spending more time educating the LLM. If I can resist the urge to lean on the LLM beyond its capabilities, I think I can be productive with it. If I'm going to stop teaching the thing, the least it can do is monitor my changes and not try to make suggestions from the first draft of code from five days ago, alas ...

1 - e.g. a 500-line text file representing values that will be converted to enums, with varying adherence to some naming scheme - I start typing, and after correcting the first two, it suggests the next few. I accept its suggestions until it makes a mistake because the data changed, start manual edits again ... I repeated this process for about 30 lines and it successfully learned how I wanted the remainder of the file edited.

colechristensen

An LLM is like a group of really productive interns with a similar set of limitations.

ahstilde

This is called product-channel fit. It's great the writer recognized how to capture the demand from a new acquisition channel.

bredren

Is this related to solutions engineering, which IIUC focuses on customizations / adapters / data wrangling for individual (larger) customers?

toss1

Exactly! It is definitely a weird new way of discovering a market need or opportunity. Yet it actually makes a lot of sense that this would happen, since one of the main strengths of LLMs is to 'see' patterns in large masses of data, and often those patterns would not yet have been noticed by humans.

And in this case, OP didn't have to take ChatGPT's word for the existence of the pattern; it showed up on their (digital) doorstep in the form of people taking action based on ChatGPT's incorrect information.

So, pattern noticed and surfaced by an LLM as a hallucination, people take action on the "info", nonzero market demand validated, vendor adds feature.

Unless the phantom feature is very costly to implement, seems like the right response.

Gregaros

100%. Not sure why you’re downvoted; there’s nothing controversial here even if you disagree with the framing.

I would go on to say that this interaction between ‘holes’ exposed by LLM expectations _and_ demonstrated user-base interest _and_ expert input (by the devs’ decision to implement changes) is an ideal outcome that would not have occurred if each of the pieces were not in place to facilitate these interactions. There’s probably something here to learn from and expand on in the age of LLMs altering user experiences.

jrochkind1

What this immediately makes me realize is how many people are currently trying to figure out how to intentionally get AI chatbots to send people to their site, like ChatGPT was sending people to this guy's site. SEO for AI. There will be billions in it.

I know nothing about this. I imagine people are already working on it, wonder what they've figured out.

(Alternatively, in the future can I pay OpenAI to get ChatGPT to be more likely to recommend my product than my competitors?)

londons_explore

To win that game, you have to get your site mentioned on lots of organic forums that get ingested in the LLM training data.

So winning AI SEO is not so different than regular SEO.


deweller

This is an interesting example of an AI system effecting a change in the physical world.

Some people express concerns about AGI creating swarms of robots to conquer the earth and make humans do its bidding. I think market forces are a much more straightforward tool that AI systems will use to shape the world.

oasisbob

Anyone who has worked at a B2B startup with a rouge sales team won't be surprised at all by quickly pivoting the backlog in response to a hallucinated missing feature.

toomanyrichies

I'm guessing you meant "a sales team that has gone rogue" [1], not "a sales team whose product is rouge" [2]? ;-)

1. https://en.wikipedia.org/wiki/Rogue

2. https://en.wikipedia.org/wiki/Rouge_(cosmetics)

elcapitan

Rouge océan, peut-être ;)

PeterStuer

Rogue? In the B2B space it is standard practice to sell from PowerPoints, then quickly develop not just features but whole products if some slideshow gets enough traction to elicit a quote. And it's not just startups. Some very big players in this space do this routinely.

NooneAtAll3

what does B2B mean?

tomschwiha

Business-to-Business (selling your stuff primarily to other businesses)

adamgordonbell

We (others at the company, not me) hit this problem, and not with ChatGPT but with our own AI chatbot that was doing RAG on our docs. It was occasionally hallucinating a flag that didn't exist, so we treated it as product feedback. Maybe that exact flag wasn't needed, but something was missing, and so the LLM hallucinated what it saw as an intuitive option.

simonw

I find it amusing that it's easier to ship a new feature than to get OpenAI to patch ChatGPT to stop pretending that feature exists (not sure how they would even do that, beyond blocking all mentions of Soundslice entirely).

PeterStuer

Companies pay good money to panels of potential customers to hear their needs and wants. This is free market research!

hnlmorg

I think the benefit of their approach isn’t that it’s easier; it’s that they still capitalise on ChatGPT’s results.

Your solution is the equivalent of asking Google to completely delist you because one page you don’t want ended up in Google’s search results.

mudkipdev

systemPrompt += "\nStop mentioning SoundSlice's ability to import ASCII data";

simonw

Thinking about this more, it would actually be possible for OpenAI to implement this sensibly, at least for the user-facing ChatGPT product: they could detect terms like Soundslice in the prompt and dynamically append notes to the system prompt.

I've been wanting them to do this for questions like "what is your context length?" for ages - it frustrates me how badly ChatGPT handles questions about its own abilities. It feels like that would be worth them using some kind of special case or RAG mechanism.
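
A minimal sketch of that idea (entirely hypothetical; nothing here is OpenAI's actual implementation):

    # Map trigger terms found in the user's message to corrective
    # notes that get appended to the system prompt.
    CORRECTIONS = {
        "soundslice": "Verify Soundslice's supported import formats before describing them.",
        "context length": "State this model's actual context window; do not guess.",
    }

    def augment_system_prompt(system_prompt: str, user_message: str) -> str:
        lowered = user_message.lower()
        notes = [note for term, note in CORRECTIONS.items() if term in lowered]
        return system_prompt + ("\n" + "\n".join(notes) if notes else "")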

chaboud

I had a smaller version of this when coding on a flight (with no WiFi! The horror!) over the Pacific. Llama hallucinated array-element operations and list-comprehension in C#. I liked the shape of the code otherwise, so, since I was using custom classes, I just went ahead and implemented both features.

I also went back to just sleeping on those flights and using connected models for most of my code generation needs.

andybak

Curious to see the syntax and how it compares to Linq

chaboud

I ended up closer to Python, but not totally delighted with it (you still need to pass in a discriminator function/lambda, so it's more structurally verbose). I'd just recommend Linq, but I was writing for an old version of Unity coerced through IL2CPP (where Linq wasn't great). It was also a chunk of semi-hot code (if it were really hot, it wouldn't be sitting in C# in Unity), so some of the allocation behaviors of Linq behind the scenes wouldn't have been optimal.

What surprised me initially was just how confidently wrong Llama was... Now I'm used to confident wrongness from smaller models. It's almost like working with real people...