AI coding and the peanut butter and jelly problem
82 comments
· April 11, 2025
kenjackson
barotalomey
> If anything, LLMs have surprised me with how much better they are than humans at understanding instructions for text-based activities.
That's demonstrably false, as proven by both OpenAI's own research [1] and countless independent studies by now.
What is fascinating is how some people cling to false ideas about what LLMs are and aren't.
It's a recurring fallacy that's bound to get its own name any day now.
otabdeveloper4
People think coding is the difficult part of programming.
Which it isn't, just like pressing keys isn't the difficult part of being a pianist.
If they invented a machine to press piano keys with superhuman speed and precision, that wouldn't make you a musician.
kenjackson
You’re comparing an LLM to expert programmers. Compare an LLM on the same task versus the average college student. And try it for a math problem. A poetry problem. Ask it a more complex question about history or to do an analysis of an essay you wrote.
Put it this way — I’m going to give you a text based question to solve and you have a choice to get another human to solve it (randomly selected from adults in the US) or ChatGPT, and both will be given 30 minutes to read and solve the problem — which would you choose?
aleph_minus_one
> Put it this way — I’m going to give you a text based question to solve and you have a choice to get another human to solve it (randomly selected from adults in the US) or ChatGPT, and both will be given 30 minutes to read and solve the problem — which would you choose?
You wouldn't randomly select an arbitrary adult from the USA to do brain surgery on you, so this argument is sophistry.
zahlman
Okay, but like.
If you do have that skill to communicate clearly and describe the requirements of a novel problem, why is the AI still useful? Actually writing the code should be relatively trivial from there. If it isn't, that points to a problem with your tools/architecture/etc. Programmers IMX are, on average, far too tolerant of boilerplate.
MBCook
Exactly. This same point was mentioned on Accidental Tech Podcast last week during a section primarily about “vibe coding”. (May have been the paid-only segment)
If the LLM gets something wrong, you have to be more exact to get it to make the program do the thing you want. And when that isn’t perfect, you have to tell it exactly what you want it to do in THAT situation. And the next one. And the next one.
At that point you’re programming. It may not be the same as coding in a traditional language, but isn’t it effectively the same process? You’re having to lay out all the exact steps to take when different things happen.
So in the end have you replaced programmers or decreased the amount of programming needed? Or have you just changed the shape of the activity so it doesn’t look like what we’re used to calling programming today?
John Siracusa (one of the hosts) compared it to the idea of a fourth generation language.
From Wikipedia:
“The concept of 4GL was developed from the 1970s through the 1990s, overlapping most of the development of 3GL, with 4GLs identified as ‘non-procedural’ or ‘program-generating’ languages”.
Program generating language sounds an awful lot like what people are trying to use AI for. And these claims that we don’t need programmers anymore also sound a lot like the claims from when people were trying to make flowchart based languages. Or COBOL.
“You don’t need programmers! The managers can write their own reports”.
In fact “the term 4GL was first used formally by James Martin in his 1981 book Application Development Without Programmers” (Wikipedia again).
They keep trying. But it all ends up still being programming.
daxfohl
This is what I keep coming back to. I'm sure I'm not the only one here who frequently writes the code, or at least a PoC, first and then writes the design doc based on it. The code is the most concise and precise way to specify what you really want, and writing it gives you clarity on things you wouldn't have thought about had you only described them in a document. Unrolling that into pseudocode/English almost always gets convoluted for anything but very linear pieces of logic, and you're generally not going to get it right if you haven't already done a little exploratory coding beforehand.
So to me, even in an ideal world the dream of AI coding is backwards. It's more verbose, it's harder to conceptualize, it's less precise, and it's going to be more of a pain to get right even if it worked perfectly.
That's not to say it'll never work. But the interface has to change a lot. Instead of a UX where you have to think about and specify all the details up front, a useful assistant would be more conversational: analyze the existing codebase, clarify the change you're asking about, propose some options, ask which layer of the system the change belongs in, which design patterns to use, whether the level of coupling makes sense, and what extensions of the functionality you're thinking about for the future, lay out the pros and cons of each approach, and also help point out conflicts or vague requirements, etc. But it seems like we've got quite a way to go before we get there.
grahac
Agreed, although AIs today, given simple project-based rules, can do things like check and account for error cases and write the appropriate unit tests for those error cases.
I personally have found I can often get equivalent code out of less English than it would take me to type the code myself.
It also works very well where the scope is well defined, like implementing interfaces or porting a library from one language to another.
namaria
Another issue I see is the "Machine Stops" problem. When we come to depend on a system that fails to foster the skills and knowledge needed to reproduce it (i.e. if programming becomes so easy for so many people that they don't actually need to know how it works under the hood), we slowly lose the ability to maintain and extend the system as a society.
LikesPwsh
I realise this is meant to be a jab at high-level programming languages, but SQL really did succeed at that.
Its abstraction may leak sometimes, but most people using it are incredibly productive without needing to learn what a spool operator or bitmap does.
Even though the GUI and natural language aspects of 4GL failed, declarative programming was worth it.
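To make that concrete, here is a minimal sketch (hypothetical table and data, using Python's built-in sqlite3): the query states what result is wanted, while the loop below it spells out how to compute the same thing.

    # Minimal sketch with hypothetical data: the query is declarative, the loop
    # below computes the same result imperatively.
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [("alice", 30.0), ("bob", 15.0), ("alice", 20.0)],
    )

    # Declarative: say *what* you want; the engine chooses the execution plan.
    declarative = conn.execute(
        "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY customer"
    ).fetchall()

    # Imperative: spell out *how* to group and sum by hand.
    totals = {}
    for customer, amount in conn.execute("SELECT customer, amount FROM orders"):
        totals[customer] = totals.get(customer, 0.0) + amount
    imperative = sorted(totals.items())

    assert declarative == imperative  # same answer, very different level of detail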
MBCook
I really like SQL personally. You’re right it does work well, but I suspect that’s because it has a limited domain instead of being a general purpose language.
aleph_minus_one
> At that point you’re programming. It may not be the same as coding in a traditional language, but isn’t it effectively the same process? You’re having to lay out all the exact steps to take when different things happen.
No, it isn't.
Programming is thinking deeply about
- the invariants that your code obeys
- the huge implications a small, innocent change in one part of the program can have for other, seemingly unrelated parts of the program
- in which sense the current architecture is (still) the best possible for what the program does, and if not, what the best route is to get there
- ...
euroderf
So, which is it? Do you want to end up writing extremely detailed requirements in English? Or do you want to DIY by filling your head with software-related abstractions - in some internal mental "language" that might often be beyond words - and then translating those mental abstractions to source code?
derefr
An LLM is a very effective human-solution-description / pseudocode to "the ten programming languages we use at work, where I'm only really fluent in three of them, and have to use language references for the others each time I code in them" transpiler.
It also remembers CLI tool args far better than I do. Before LLMs, I would often have to sit and just read a manpage in its entirety to see if a certain command-line tool could do a certain thing. (For example: do you know off-hand if you can get ls(1) to format file mtimes as ISO8601 or POSIX timestamps? Or — do you know how to make find(1) prune a specific subdirectory, so that it doesn't have to iterate-over-and-ignore the millions of tiny files inside it?) But now, I just ask the LLM for the flags that will make the tool do the thing; it spits them out (if they exist); and then I can go and look at the manpage and jump directly to that flag to learn about it — using the manpage as a reference, the way it was intended.
Actually, speaking of CLI tools, it also just knows about tools that I don't. You have to be very good with your google-fu to go from the mental question of "how do I get disk IO queue saturation metrics in Linux?" to learning about e.g. the sar(1) command. Or you can just ask an LLM that actual literal question.
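(For the record, both of those exist in the GNU versions of the tools. A quick sketch follows, wrapped in Python only so it's runnable as-is; the ./node_modules path is just a stand-in for "that huge subdirectory".)

    # Sketch assuming GNU coreutils/findutils; the flags answer the two questions above.
    import subprocess

    # ls: ISO 8601 mtimes via --time-style=full-iso, epoch/POSIX seconds via +%s.
    subprocess.run(["ls", "-l", "--time-style=full-iso"])
    subprocess.run(["ls", "-l", "--time-style=+%s"])

    # find: -prune stops find from descending into the directory at all,
    # instead of iterating over its contents and ignoring them afterwards.
    subprocess.run(
        ["find", ".", "-path", "./node_modules", "-prune", "-o", "-type", "f", "-print"]
    )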
taurath
I’ve found that the surfacing of tools and APIs really can help me dive into learning, but ironically it's usually the AI finding a tool and then me reading its documentation, since I want to understand whether it has the capabilities or flexibility I have in mind. I could leave that to LLMs to tell me, but I find it’s too good an opportunity to build my own internal knowledge base to pass up. It’s the back and forth between having an LLM spit out familiar concepts and having it hand me new-to-me solutions. Overall I think it helps me get through learning quicker, because I can often work off of an example to start.
derefr
Exactly — one thing LLMs are great at, is basically acting as a coworker who happens to have a very wide breadth of knowledge (i.e. to know at least a little about a lot) — who you can thus ask to "point you in a direction" any time you're stuck or don't know where to start.
Arcuru
Before LLMs there existed quite a few tools to try to help with understanding CLI options; off the top of my head there are https://github.com/tldr-pages/tldr and explainshell.com
LLMs are both more general and more useful than those tools. They're more flexible and composable, and can replace those tools with a small wrapper script. Part of the reason the LLMs can do that, though, is that they have those other tools as datasets to train on.
simonw
Once you've got to a detailed specification, LLMs are a lot faster at correctly typing code than you are.
zahlman
In your analysis, do you account for the time taken to type a detailed specification with which to prompt the LLM?
Or the time to review the code - whether by manual fixes, or iterating with the prompt, or both?
simonw
No, just the time spent typing the code.
tharant
This is one reason I see to be optimistic about some of the hype around LLMs—folks will have to learn how to write high quality specifications and documentation in order to get good results from a language model; society desperately needs better documentation!
geor9e
>Actually writing the code should be relatively trivial
For you, maybe. This statement assumes years of grueling training to become bilingual in a foreign programming language. And I can't type at 1000 tokens/s personally - sometimes I just want to press the voice dictate key and blab for five seconds and move on to something actually interesting.
zahlman
>This statement assumes years of grueling training to become bilingual in a foreign programming language
...So, less experienced programmers are supposed to be happy that they can save time with the same technology that will convince their employers that a human isn't necessary for the position?
(And, frankly, I've overall quite enjoyed the many years I've put into the craft.)
geor9e
You're seeing this entirely from the perspective of people who do programming as their job. I'm seeing it from the perspective of the other 99% of society. It feels really good that they're no longer gatekept by the rigid and cryptic interfaces that prevented them from really communicating with their computer, just because it couldn't speak their native tongue.
larve
Useful boilerplate:
- documentation (reference, tutorials, overviews)
- tools
- logging and log analyzers
- monitoring
- configurability
- unit tests
- fuzzers
- UIs
- and not least: lots and lots of prototypes and iterating on ideas
All of these are "trivial" once you have the main code, but they are incredibly valuable, and LLMs do a fantastic job.
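One concrete example of "trivial but valuable": the error-case unit tests mentioned above. A minimal pytest-style sketch (parse_port is a hypothetical helper, not from any real codebase):

    # Minimal sketch of error-case test boilerplate; parse_port is hypothetical.
    import pytest

    def parse_port(value: str) -> int:
        port = int(value)  # raises ValueError for non-numeric input
        if not 1 <= port <= 65535:
            raise ValueError(f"port out of range: {port}")
        return port

    @pytest.mark.parametrize("bad", ["", "abc", "-1", "0", "65536"])
    def test_parse_port_rejects_invalid_input(bad):
        with pytest.raises(ValueError):
            parse_port(bad)

    def test_parse_port_accepts_valid_input():
        assert parse_port("8080") == 8080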
zahlman
I was referring specifically to boilerplate within the code itself. But sure, I can imagine some uses.
Syzygies
"I bought those expensive knives. Why doesn't my cooking taste better?"
"I picked up an extraordinary violin once. It sounded awful!"
There's an art here. Managerial genius is recognizing everyone's strengths and weaknesses, and maximizing impact. Coding with AI is no different.
Of course I have to understand the code well enough to have written it. Usually much of the time is spent proposing improvements.
I'm a few months in, learning to code with AI for my math research. After a career as a professor, I'm not sure I could explain to anyone what I'm starting to get right, but I'm working several times more efficiently than I ever could by hand.
Some people will get the hang of this faster than others, and it's bull to think this can be taught.
phalangion
This video shows the peanut butter and jelly problem in action: https://youtu.be/cDA3_5982h8?si=xIQpzNTvhRcGY4Nb
_wire_
Step 1: Computer, make me a peanut butter and jelly sandwich.
If this can't work, the program abstraction is insufficient to the task. This insufficiency is not a surprise.
That an ordinary 5-year-old can make a sandwich after only ever seeing someone make one, and that the sandwich so made is a component within a life-sustaining matrix which inevitably leads to new 5-year-olds making their own sandwiches and serenading the world about the joys of peanut butter and jelly, is the crucial distinction between AI and intelligence.
The rest of the stuff about a Harvard professor ripping a hole in a bag and pouring jelly on a clump of bread on the floor is a kooky semantic game that reveals something about the limits of human intelligence among the academic elite.
We might wonder why some people have to get to university before encountering such a basic epistemological conundrum as what constitutes clarity in exposition... But maybe that's what teaching to the test in U.S. K-12 gets you.
Alan Kay is known to riff on a simple study in which Harvard students were asked what causes the Earth's seasons: almost all of them gave the wrong explanation, and many were very confident about the correctness of their wrong explanations.
Given that the measure of every AI chat program's performance is how agreeable its response is to a human, is there a clear distinction between the human and the AI?
If this HN discussion were among AI chat programs considering their own situations and formulating an understanding of their own problems, maybe waxing about the (for them) ineffable joy of eating a peanut butter and jelly sandwich...
But it isn't.
pkdpic
lol, I didn't realize how famous the PB&J exercise was. That's fantastic. I thought it was just from this puppet video I've been showing my 4yo and his friends. Anyway they seem to love it.
https://m.youtube.com/watch?v=RmbFJq2jADY&t=3m25s
Also seems like great advice; feels like a good description of what I've been gravitating towards / having more luck with lately when proompting.
01HNNWZ0MV43FF
My class did it with paper airplanes. My partner used the phrase "hotdog style" which I'd never heard. Good times!
extr
Didn't know they did the PB&J thing at Harvard. I remember doing that in the 3rd grade or thereabouts.
iDon
For decades people have been dreaming of higher-level languages, where a user can simply specify what they want and not how to do it (the name Forth is a shortening of "fourth", for the fourth-generation software its creator had in mind, reflecting this idea).
Here we are - we've arrived at the next level.
The emphasis in my prompts is specification: clear and concise, defining terms as they are introduced, and I've had good results with that. I expect that we'll see specification/prompt languages evolve, in the same way that MCP has become a de facto standard API for connecting LLMs to other applications and servers. We could use a lot of the ideas from existing specification languages, and there has been a lot of work done on this over 40+ years, but my impression is that they are largely fairly strict, because their motivation was provably correct code. The ideas can be used in a more relaxed way, because prompting fits well with rapid application development (RAD) and prototyping - I think there is a sweet spot of high productivity in a kind of REPL (read/evaluate/print loop) with symbolic references and structure embedded in free-form text.
Other comments have mentioned the importance of specification and requirements analysis, and daxfohl mentions being able to patch new elements into the structure in subsequent prompts (via BASIC-style line-number insertion).
grahac
Anyone here see the CS50 peanut butter and jelly problem in person?
Nuzzerino
We had this in 8th-grade science class, and IMO it was much better than the Harvard version. Still PB&J with instructions. The teacher had a skeleton named "it". Any time the instructions referenced the word "it", the teacher used the skeleton in its place.
csours
Not in that course, but I've done it at a "STEM" day; it's just about the most fun I've ever had teaching.
grahac
That's awesome.
I got the sense the professor liked it a ton too. In the videos of this on YouTube you can see the professor really enjoying watching it all go down.
It still is so memorable.
rileymat2
I had a middle school art teacher do it in roughly 1995.
ryoshu
We did it in 3rd grade in public school.
mkw5053
Similar to having a remote engineering team that fulfills the (insufficient) requirements but in ways you did not predict (or want).
daxfohl
Whenever I'm prompting an LLM for this kind of thing I find myself wishing there were a BASIC-style protocol we could use to instruct LLMs: numbered statements, GOTOs to jump around, standardized like MCP or A2A so that all LLMs are trained to understand it and verified to follow the logic.
Why BASIC? It's a lot harder to mix English and structured programming concepts otherwise. Plus it's nice that if you forget a step between 20 and 30 you can just say `25 print 'halfway'` from the chat.
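Something like this, maybe: a hypothetical sketch of a numbered-prompt buffer (nothing standardized, and the steps and paths are made up), where a correction can be slotted in between existing steps by line number.

    # Hypothetical sketch: prompt steps keyed by BASIC-style line numbers, so a
    # later correction like "25 print 'halfway'" slots in between 20 and 30.
    steps = {
        10: "Read every file under ./src",
        20: "Summarize each file's public API",
        30: "Write the summaries to docs/api.md",
    }

    def amend(line: int, text: str) -> None:
        """Insert or overwrite a step, exactly like retyping a BASIC line."""
        steps[line] = text

    def render_prompt() -> str:
        """Render the numbered steps, in order, as the prompt sent to the LLM."""
        return "\n".join(f"{n} {steps[n]}" for n in sorted(steps))

    amend(25, "print 'halfway'")  # the mid-chat correction from the comment
    print(render_prompt())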
philipswood
I've had successful chats with ChatGPT that use Python as pseudocode.
I specifically mean pseudocode intended to clarify and communicate - as opposed to being run by a computer.
E.g.: https://chatgpt.com/share/67f9fee9-07dc-8003-a272-ca05d91282...
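In the same spirit, a made-up example (not taken from the linked chat): a few lines of Python can pin down "keep only each user's newest event" far more precisely than a paragraph of English.

    # Hypothetical pseudocode meant to communicate intent, not to ship: the field
    # names (user_id, timestamp) are placeholders for whatever the real data has.
    def dedupe_events(events):
        """Keep the newest event per user_id; 'newest' means highest timestamp."""
        latest = {}
        for event in events:
            prior = latest.get(event.user_id)
            if prior is None or event.timestamp > prior.timestamp:
                latest[event.user_id] = event
        return list(latest.values())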
cadamsdotcom
AI is exposing the difference in effectiveness between communicating clearly and precisely (potentially more verbosely than you think you need to) and leaning heavily on context.
This is actually no different than for humans once you get past the familiar. It's like the famous project management tree story: https://pmac-agpc.ca/project-management-tree-swing-story
If anything, LLMs have surprised me with how much better they are than humans at understanding instructions for text-based activities. But they are MUCH worse than humans when it comes to creating images/videos.