The Great Chatbot Debate
12 comments
· March 25, 2025 · lsy
dghlsakjg
Are LLMs truly non-deterministic?
My impression was that we inject some randomness intentionally to mimic non-determinism, but if you remove that factor you end up with a deterministic chat model.
lsy
In addition to Kamshak's note about parallel inference accumulating float errors differently due to order of operations, which makes LLMs theoretically non-deterministic even at temperature 0, there is the issue of them being practically non-deterministic as deployed: not just via temperature, but because of the inclusion of prior "turns" in the context, variations in prompt phrasing, etc.
It's also "non-deterministic" in the sense that even if you removed all sources of non-determinism, asked "What is 1+1?", and received the answer "2" deterministically, that wouldn't guarantee a correct answer for "What is 1+2?". I.e., a variation in the input isn't correlated in any logical way with a variation in the output, which is somewhat fatal for computer programs, where the goal is to generalize a problem across a range of inputs.
Kamshak
There is also unintentional randomness due to the parallelism in inference (e.g. parallel matmuls added together on the GPU). Since it's multiplying floats, every operation has rounding drift that accumulates differently depending on the order of operations. So even at temperature 0 you're not getting deterministic outputs.
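A minimal sketch of the effect in plain Python (made-up numbers, nothing GPU-specific): reordering the same additions changes the rounded result, which is the same thing a parallel reduction can do from run to run.

```python
# Floating-point addition is not associative, so grouping/order matters.
a, b, c = 0.1, 0.2, 0.3
print((a + b) + c)                  # 0.6000000000000001
print(a + (b + c))                  # 0.6
print((a + b) + c == a + (b + c))   # False

# Same values, different reduction order, different sum.
xs = [0.1, 1e16, -1e16, 0.3]
print(sum(xs))          # 0.3 (0.1 is absorbed by 1e16 before the cancellation)
print(sum(sorted(xs)))  # 0.0 (both small values are absorbed before the cancellation)
```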
naveen99
Because addition and multiplication are not associative with floats?
CuriouslyC
The probability distribution over next tokens given previous tokens is deterministic. The sampling algorithm for that distribution is non-deterministic.
krallistic
And sampling from a (now fixed) distribution can be made deterministic...
So the total generation of text from an LLM can be made fully deterministic. The problem for scientists is that we can't do that in the deployed systems...
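A rough sketch with NumPy (hypothetical tokens and probabilities, not any particular model's API): once the next-token distribution is fixed, both greedy decoding and seeded sampling are reproducible.

```python
import numpy as np

# Hypothetical next-token distribution, as if already produced by a model.
vocab = ["the", "a", "cat", "dog"]
probs = np.array([0.50, 0.30, 0.15, 0.05])

# "Temperature 0" / greedy decoding: always pick the most probable token.
greedy = vocab[int(np.argmax(probs))]   # "the", every single run

# Temperature scaling reshapes the distribution, but is itself deterministic.
temperature = 0.7
scaled = probs ** (1.0 / temperature)
scaled /= scaled.sum()

# Stochastic sampling becomes deterministic once the RNG seed is fixed.
rng_a = np.random.default_rng(seed=42)
rng_b = np.random.default_rng(seed=42)
pick_a = vocab[rng_a.choice(len(vocab), p=scaled)]
pick_b = vocab[rng_b.choice(len(vocab), p=scaled)]
assert pick_a == pick_b   # identical seeds, identical draws
```

The run-to-run drift people see in deployed systems comes largely from how the logits themselves are computed (the float-accumulation issue above, plus batching and hardware differences), rather than from the sampling step.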
javajive
Here is the debate: https://www.youtube.com/watch?v=YtIQVaSS5Pg
Ukv
Agree with Sébastien's stance that understanding isn't just zero or one. I'd argue current LLMs already satisfy, to some extent, Emily's definition of "understanding" by mapping from language to an internal semantic space - just not necessarily as well as humans do.
Broadly, they seemed to hold the following views:
* (Sébastien) Pragmatically, what we care about is the model's functionality/behavior. If, for instance, it's capable of solving unseen math puzzles, whether we say it "truly understood" the puzzle is mostly immaterial
* (Emily) Whether or not it understands is important, because it will determine what tasks it's capable of. If shown a parrot and a foreign-language word for parrot, a human could pick up from context that the word refers to the parrot, but a model could not, due to its lack of understanding
I normally lean towards the former view because I feel "understanding" is a fairly nebulous term. Claims that "machines can never do X because X requires intelligence/understanding/..." tend to just move on to the next X once machines accomplish that X, as opposed to the claimants actually being convinced that machines now have understanding; this effectively decouples whether a system is said to have "understanding" from any practical limitations in its behavior.
I do think the latter view makes sense if you hold "understanding" to be something fixed/specific, and are at some point willing to say "this system can do X, so it has understanding" about the tasks you said required understanding. Emily did, for instance, seem willing to say home assistants have understanding.
Uehreka
The question of whether chatbots are “truly intelligent” is the dumbest question people regularly debate about AI.
First of all, we do not have, and have never had, a rigorous and falsifiable definition of “intelligence”, so this question can’t actually be answered. I’ve watched people tie themselves in knots over why “AI doesn’t think like humans”, only to find themselves reinventing the Chinese Room argument. I’ve watched people on this site get blue in the face insisting that “it should be called Machine Learning and not Artificial Intelligence”… because machines can’t be intelligent but they can learn?
Second, it’s inconsequential. If a chatbot can deliver passable performance at a job at 1% of the cost of a human, it’s going to cause a massive problem for human workers whether it has a mortal soul or not. If generative AI is pumping the internet with slop, that sucks whether the AI is doing it out of malevolent intent or a million dot products.
skywhopper
Is there a debate? No, they do not “understand”.
One effect of this technology being almost totally opaque, non-explainable, and also nondeterministic is that most of the arguments for it (and maybe most against it?) take on an anecdotal character.
I was surprised to hear Dr. Bubeck, who is in kind of a privileged position wrt OpenAI and obviously an accomplished scientist, essentially saying things like "I tried asking it X and I am pretty sure it's not in the training set, and it worked, therefore I think it understands".
A really big problem with the anecdata approach to proving out AI is alluded to in Dr. Bender's story about the "Everything in the Whole Wide World" museum (for those who didn't watch: Grover the muppet goes to the aforementioned museum, sees many things, but not "everything", then walks through a door labeled "Everything Else" that leads outside). That is, no individual prompt-and-response can be relied on to tell you anything about another prompt-and-response you haven't yet tried. And with nondeterminism, you can't even rely on that prompt-and-response to remain stable. But a failed prompt also carries the possibility that a tweaked prompt could produce a success.

So (as we saw with the Twitter post yesterday complaining about LLMs not working well, which was both heavily upvoted and heavily contested in the comments), we are in a world where nobody can say much except to provide anecdotes of the thing failing or succeeding or having characteristic Y, and those anecdotes are exercises in narrative construction to support or oppose investment, not principled discussion.