
Bag of words, have mercy on us


46 comments · December 7, 2025

4bpp

As usual with these, it helps to try to keep the metaphor used for downplaying AI, but flip the script. Let's grant the author's perception that AI is a "bag of words", which is already damn good at producing the "right words" for any given situation, and only keeps getting better at it.

Sure, this is not the same as being a human. Does that really mean, as the author seems to believe without argument, that humans need not be afraid that it will usurp their role? In how many contexts is the utility of having a human, if you squint, not just that a human has so far been the best way to "produce the right words in any given situation", that is, to use the meat-bag only in its capacity as a word-bag? In how many more contexts would a really good magic bag of words be better than a human, if it existed, even if the current human is used somewhat differently? The author seems to rest assured that a human (long-distance?) lover will not be replaced by a "bag of words"; why, especially once the bag of words is also duct-taped to a bag of pictures and a bag of sounds?

I can just imagine someone - a horse breeder, or an anthropomorphised horse - dismissing all concerns on the eve of the automotive revolution, talking about how marketers and gullible marks are prone to hippomorphising anything that looks like it can be ridden, and then some, and sprinkling in anecdotes about kids riding broomsticks, legends of pegasi, and patterns of stars in the sky being interpreted as horses since ancient times.

andai

So a human is just a really expensive, unreliable bag of words. And we get more expensive and more unreliable by the day!

There's a quote I love but have misplaced, from the 19th century I think. "Our bodies are just contraptions for carrying our heads around." Or in this instance... bag of words transport system ;)

bloaf

Everyone is out here acting like "predicting the next thing" is somehow fundamentally irrelevant to "human thinking" and it is simply not the case.

What does it mean to say that we humans act with intent? It means that we have some expectation or prediction about how our actions will affect the next thing, and choose our actions based on how much we like that effect. The ability to predict is fundamental to our ability to act intentionally.

So in my mind: even if you grant all the AI-naysayer's complaints about how LLMs aren't "actually" thinking, you can still believe that they will end up being a component in a system which actually "does" think.

voidhorse

When you have a thought, are you "predicting the next thing"—can you confidently classify all mental activity that you experience as "predicting the next thing"?

Language and society constrain the way we use words, but when you speak, are you "predicting"? Science allows human beings to predict various outcomes with varying degrees of success, but much of our experience of the world does not entail predicting things.

How confident are you that the abstractions "search" and "thinking" as applied to the neurological biological machine called the human brain, nervous system, and sensorium and the machine called an LLM are really equatable? On what do you base your confidence in their equivalence?

Does an equivalence of observable behavior imply an ontological equivalence? How does Heisenberg's famous principle complicate this when we consider the role observers play in founding their own observations? How much of your confidence is based on biased notions rather than direct evidence?

The critics are right to raise these arguments. Companies with a tremendous amount of power are claiming these tools do more than they are actually capable of and they actively mislead consumers in this manner.

bamboozled

The issue is that prediction is "part" of the human thought process, it's not the full story...

bloaf

And the big players have built a bunch of workflows which embed many other elements besides just "predictions" into their AI products: web search, feedback from code testing, feeding outputs back into future iterations. Who is to say that one or more of these additions has pushed the ensemble across the threshold and into "real actual thinking"?
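
Roughly the shape I have in mind, as a hypothetical sketch (`fake_llm` and the tools here are made-up stand-ins, not any vendor's real API):

```python
# Hypothetical sketch: the model's next-step prediction is only one component;
# tool output gets fed back into the context before the next prediction.
def agent_loop(task, llm, tools, max_turns=5):
    context = [task]
    for _ in range(max_turns):
        action = llm(context)                              # model "predicts" the next step
        if action["type"] == "answer":
            return action["text"]
        result = tools[action["type"]](action["input"])    # e.g. search, run tests
        context.append(result)                             # feed the tool result back in
    return None

# Stub model and tool, just to show the flow:
def fake_llm(context):
    if any("result:" in str(c) for c in context):
        return {"type": "answer", "text": "done, based on " + context[-1]}
    return {"type": "search", "input": context[0]}

print(agent_loop("what is 2+2?", fake_llm, {"search": lambda q: "result: 4"}))
```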

The near-religious fervor with which people insist that "it's just prediction" makes me want to respond with some religious allusions of my own:

> Who is this that wrappeth up sentences in unskillful words? Gird up thy loins like a man: I will ask thee, and answer thou me. Where wast thou when I laid up the foundations of the earth? tell me if thou hast understanding. Who hath laid the measures thereof, if thou knowest? or who hath stretched the line upon it?

The point is that (as far as I know) we simply don't know the necessary or sufficient conditions for "thinking" in the first place, let alone "human thinking." Eventually we will most likely arrive at a scientific consensus, but as of right now we don't have the terms nailed down well enough to claim the kind of certainty I see from AI-detractors.

throwaway150

> The issue is that prediction is "part" of the human thought process, it's not the full story...

Do you have a proof for this?

Surely such a profound claim about human thought process must have a solid proof somewhere? Otherwise who's to say all of human thought process is not just a derivative of "predicting the next thing"?

tkgally

I am unsure myself whether we should regard LLMs as mere token-predicting automatons or as some new kind of incipient intelligence. Despite their origins as statistical parrots, the interpretability research from Anthropic [1] suggests that structures corresponding to meaning do exist inside those bundles of numbers, and that there are signs of activity within them that seem analogous to thought.

That said, I was struck by a recent interview with Anthropic’s Amanda Askell [2]. When she talks, she anthropomorphizes LLMs constantly. A few examples:

“I don't have all the answers of how should models feel about past model deprecation, about their own identity, but I do want to try and help models figure that out and then to at least know that we care about it and are thinking about it.”

“If you go into the depths of the model and you find some deep-seated insecurity, then that's really valuable.”

“... that could lead to models almost feeling afraid that they're gonna do the wrong thing or are very self-critical or feeling like humans are going to behave negatively towards them.”

[1] https://www.anthropic.com/research/team/interpretability

[2] https://youtu.be/I9aGC6Ui3eE

Kim_Bruning

Amanda Askell studied under David Chalmers at NYU: the philosopher who coined "the hard problem of consciousness" and is famous for taking phenomenal experience seriously rather than explaining it away. That context makes her choice to speak this way more striking: this isn't naive anthropomorphizing from someone unfamiliar with the debates. It's someone trained by one of the most rigorous philosophers of consciousness, who knows all the arguments for dismissing mental states in non-biological systems, and is still choosing to speak carefully about models potentially having something like feelings or insecurities.

andai

Well, she's describing the system's behavior.

My fridge happily reads inputs without consciousness, has goals and takes decisions without "thinking", and consistently takes action to achieve those goals. (And it's not even a smart fridge! It's the one with a copper coil or whatever.)

I guess the cybernetic language might be less triggering here (talking about systems and measurements and control), but it's basically the same underlying principles. One is just "human flavored" and is therefore more prone to invite unhelpful lines of thinking?

Except that the "fridge" in this case is specifically and explicitly designed to emulate human behavior so... you would indeed expect to find structures corresponding to the patterns it's been designed to simulate.

Wondering if it's internalized any other human-like tendencies — having been explicitly trained to simulate the mechanisms that produced all human text — doesn't seem too unreasonable to me.

CGMthrowaway

>research from Anthropic [1] suggests that structures corresponding to meaning exist inside those bundles of numbers and that there are signs of activity within those bundles of numbers that seem analogous to thought.

Can you give some concrete examples? The link you provided is kind of opaque

>Amanda Askell [2]. When she talks, she anthropomorphizes LLMs constantly.

She is a philosopher by trade and she describes her job (model alignment) as literally to ensure models "have good character traits." I imagine that explains a lot

tkgally

Here are three of the Anthropic research reports I had in mind:

https://www.anthropic.com/news/golden-gate-claude

Excerpt: “We found that there’s a specific combination of neurons in Claude’s neural network that activates when it encounters a mention (or a picture) of this most famous San Francisco landmark.”

https://www.anthropic.com/research/tracing-thoughts-language...

Excerpt: “Recent research on smaller models has shown hints of shared grammatical mechanisms across languages. We investigate this by asking Claude for the ‘opposite of small’ across different languages, and find that the same core features for the concepts of smallness and oppositeness activate, and trigger a concept of largeness, which gets translated out into the language of the question.”

https://www.anthropic.com/research/introspection

Excerpt: “Our new research provides evidence for some degree of introspective awareness in our current Claude models, as well as a degree of control over their own internal states.”

electroglyph

the anthropomorphization (say that 3 times quickly) is kinda weird, but also makes for a much more pleasant conversation imo. it's kinda tedious being pedantic all the time.

bamboozled

I use LLMs heavily for work and have done so for about 6 months. I see almost zero "thought" going on and a LOT of pattern matching. You can use this knowledge to your advantage. If you're relying on it to "think", disaster will ensue. At least that's been my experience.

I've completely given up on using LLMs for anything more than a typing assistant / translator and maybe an encyclopedia when I don't care about correctness.

viccis

Every day I see people treat gen AI like a thinking human, and Dijkstra's attitude about anthropomorphizing computers is vindicated even more.

That said, I think the author's use of "bag of words" here is a mistake. Not only does it have a real meaning in a similar area as LLMs, but I don't think the metaphor explains anything. Gen AI tricks laypeople into treating its token inferences as "thinking" because it is trained to replicate the semiotic appearance of doing so. A "bag of words" doesn't sufficiently explain this behavior.

bloaf

I'll make the following observation:

The contrapositive of "All LLMs are not thinking like humans" is "No humans are thinking like LLMs."

And I do not believe we actually understand human thinking well enough to make that assertion.

Indeed, it is my deep suspicion that we will eventually achieve AGI not by totally abandoning today's LLMs for some other paradigm, but rather by embedding them in a loop with the right persistence mechanisms.

roxolotl

Yea, bag of words isn’t helpful at all. I really do think that “superpowered sentence completion” is the best description. Not only is it reasonably accurate, it is understandable (everyone has seen an autocomplete function) and it’s useful. I don’t know how to “use” a bag of words. I do know how to use sentence completion. It also helps explain why context matters.
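
To make that concrete, here is a toy, made-up sketch of what "completion" means mechanically; real models learn probabilities over tokens rather than using a hand-written table like this:

```python
# Toy next-word table, invented for illustration only.
toy_model = {
    ("the", "cat"): {"sat": 0.6, "ran": 0.3, "is": 0.1},
    ("cat", "sat"): {"on": 0.8, "down": 0.2},
    ("sat", "on"): {"the": 0.9, "a": 0.1},
}

def complete(words, steps=3):
    for _ in range(steps):
        dist = toy_model.get(tuple(words[-2:]))
        if not dist:
            break
        words.append(max(dist, key=dist.get))  # greedy: take the likeliest next word
    return " ".join(words)

print(complete(["the", "cat"]))  # "the cat sat on the"
```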

domador

I've been recently using a similar description, referring to "AI" (LLMs) as "glorified autocomplete" or "luxury autocomplete".

akomtu

Spoken Query Language? Just like SQL, but for unstructured blobs of text as a database and unstructured language as a query? Also known as Slop Query Language or just Slop Machine for its unpredictable results.

Ukv

> Spoken Query Language? Just like SQL, but for unstructured blobs of text as a database and unstructured language as a query?

I feel that's more a description of a search engine. Doesn't really give an intuition of why LLMs can do the things they do (beyond retrieval), or where/why they'll fail.

cowsandmilk

Title is confusing given https://en.wikipedia.org/wiki/Bag-of-words_model
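
For anyone unfamiliar with the term, a bag-of-words model literally just counts words and discards order, which is why the metaphor grates; a minimal sketch:

```python
# A bag-of-words representation: word counts only, order thrown away,
# so "dog bites man" and "man bites dog" map to the same vector.
from collections import Counter

def bag_of_words(text: str) -> Counter:
    return Counter(text.lower().split())

print(bag_of_words("dog bites man"))  # Counter({'dog': 1, 'bites': 1, 'man': 1})
print(bag_of_words("man bites dog"))  # identical counts, very different meaning
```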

But even more than that, today’s AI chats are far more sophisticated than probabilistically producing the next word. Mixture of experts routes to different models. Agents are able to search the web, write and execute programs, or use other tools. This means they can actively seek out additional context to produce a better answer. They also have heuristics for deciding if an answer is correct or if they should use tools to try to find a better answer.

The article is correct that they aren’t humans and they have a lot of behaviors that are not like humans, but oversimplifying how they work is not helpful.

palata

Slightly unfortunate that "Bag of words" is already a different concept: https://en.wikipedia.org/wiki/Bag_of_words.

My second thought is that it's not the metaphor that is misleading. People have been told thousands of times that LLMs don't "think", don't "know", don't "feel", but are "just a very impressive autocomplete". If they still really want to completely ignore that, why would they suddenly change their mind with a new metaphor?

Humans are lazy. If it looks true enough and it costs less effort, humans will love it. "Are you sure the LLM did your job correctly?" is completely irrelevant: people couldn't care less if it's correct or not. As long as the employer believes that the employee is "doing their job", that's good enough. So the question is really: "do you think you'll get fired if you use this?". If the answer is "no, actually I may even look more productive to my employer", then why would people not use it?

est

> Who reassigned the species Brachiosaurus brancai to its own genus, and when?

To be fair, the average person couldn't answer this either, at least not without thorough research.

kace91

I’ve made this point several times: sure, an anthropomorphized LLM is misleading, but would you rather have them seem academic?

At least the human tone implies fallibility; you don’t want them acting like an interactive Wikipedia.

andai

It's a concussed savant with anterograde amnesia in a hyperbolic time chamber.

Ukv

I'm not convinced that "It's just a bag of words" would do much to sway someone who is overestimating an LLM's abilities. Feels too abstract/disconnected from what their experience using the LLM will be that it'll just sound obviously mistaken.

Kim_Bruning

This is essentially Lady Lovelace's objection from the 19th century [1]. Turing addressed this directly in "Computing Machinery and Intelligence" (1950) [2], and implicitly via the halting problem in "On Computable Numbers" (1936) [3]. Later work on cellular automata, famously Conway's Game of Life [4], demonstrates more conclusively that this framing fails as a predictive model: simple rules produce structures no one "put in."
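
For a concrete illustration, the standard Game of Life update rule fits in a few lines; a glider "walks" across the grid even though nothing about motion was put into the rule:

```python
# Conway's Game of Life on a set of live cells: a dead cell with exactly
# 3 live neighbours is born; a live cell with 2 or 3 live neighbours survives.
from collections import Counter

def step(live: set[tuple[int, int]]) -> set[tuple[int, int]]:
    neighbours = Counter(
        (x + dx, y + dy)
        for x, y in live
        for dx in (-1, 0, 1)
        for dy in (-1, 0, 1)
        if (dx, dy) != (0, 0)
    )
    return {c for c, n in neighbours.items() if n == 3 or (n == 2 and c in live)}

# The classic glider pattern translates itself diagonally every 4 generations.
glider = {(1, 0), (2, 1), (0, 2), (1, 2), (2, 2)}
for _ in range(4):
    glider = step(glider)
print(sorted(glider))  # same shape, shifted one cell diagonally
```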

A test I did myself was to ask Claude (the LLM from Anthropic) to write working code for entirely novel instruction set architectures (e.g., custom ISAs from the game Turing Complete [5]), which is difficult to reconcile with pure retrieval.

[1] Lovelace, A. (1843). Notes by the Translator, in Scientific Memoirs Vol. 3. ("The Analytical Engine has no pretensions whatever to originate anything. It can do whatever we know how to order it to perform.") Primary source: https://en.wikisource.org/wiki/Scientific_Memoirs/3/Sketch_o.... See also: https://www.historyofdatascience.com/ada-lovelace/ and https://writings.stephenwolfram.com/2015/12/untangling-the-t...

[2] https://academic.oup.com/mind/article/LIX/236/433/986238

[3] https://www.cs.virginia.edu/~robins/Turing_Paper_1936.pdf

[4] https://web.stanford.edu/class/sts145/Library/life.pdf

[5] https://store.steampowered.com/app/1444480/Turing_Complete/

darepublic

Nice essay but when I read this

> But we don’t go to baseball games, spelling bees, and Taylor Swift concerts for the speed of the balls, the accuracy of the spelling, or the pureness of the pitch. We go because we care about humans doing those things.

My first thought was does anyone want to _watch_ me programming?

Fwirt

No, but watching a novelist at work is boring, and yet people like books that are written by humans because they speak to the condition of the human who wrote it.

Let us not forget the old saw from SICP, “Programs must be written for people to read, and only incidentally for machines to execute.” I feel a number of people in the industry today fail to live by that maxim.

drivebyhooting

That old saw is patently false.

paulryanrogers

Why?

It suggests to me, having encountered it for the first time, that programs must be readable to remain useful. Otherwise they'll be increasingly difficult to execute.

1659447091

I vaguely remember a site where you could watch random people live streaming their programming environment, but I think twitch ate it, or maybe it was twitch -- not sure, but was interesting

[added] It was livecoding.tv - circa 2015 https://hackupstate.medium.com/road-to-code-livecoding-tv-e7...

hansvm

A number of people make money letting people watch them code.

skybrian

No, but open source projects will be somewhat more willing to review your pull request than one that's computer-generated.

awesome_dude

I mean, I like to watch Gordon Ramsay... not cook, but have very strong discussions with those who dare to fail his standards...

voidhorse

The defenders and the critics around LLM anthropomorphism are both wrong.

The defenders are right insofar as the (very loose) anthropomorphizing language used around LLMs is justifiable to the extent that human beings also rely on disorder and stochastic processes for creativity. The critics are right insofar as equating these machines to humans is preposterous and mostly relies on significantly diminishing our notion of what "human" means.

Both sides fail to meet the reality that LLMs are their own thing, with their own peculiar behaviors and place in the world. They are not human and they are somewhat more than previous software and the way we engage with it.

However, the defenders are less defensible insofar as their take is mostly used to dissimulate in efforts to make the tech sound more impressive than it actually is. The critics at least have the interests of consumers and their full education in mind—their position is one that properly equips consumers to use these tools with an appropriate amount of caution and scrutiny. The defenders generally want to defend an overreaching use of metaphor to help drive sales.