Origin of Hallucination in LLMs, The physical source of hallucinations has found
38 comments
· December 22, 2025 · airhangerf15
crazygringo
> LLMs don't "hallucinate" or "lie." They have no intent.
You're just arguing about semantics. It doesn't matter in any substantial way. Ultimately, we need a word to distinguish factual output from confidently asserted erroneous output. We use the word "hallucinate". If we used a different word, it wouldn't make any difference -- the observable difference remains the same. "Hallucinate" is the word that has emerged, and it is now by overwhelming consensus the correct word.
> Whenever they get something "right," it's literally by accident.
This is obviously false. A great deal of training goes into making sure they usually get things right. If an infinite number of monkeys on typewriters get something right, that's by accident. Not LLMs.
TomasBM
While I agree that we need a word for this type of behavior, "hallucinate" is the wrong choice IMO.
Hallucination is already associated with a type of behavior, roughly defined as "subjectively seeing/hearing things which aren't there". That is an input-level error, so it's not the right umbrella term for the majority of errors happening with LLMs, many of which are at the output level.
I don't know what would be a better term, but we should distinguish between different semantic errors, such as:
- confabulating, i.e., recalling distorted or misinterpreted memories;
- lying, i.e., intentionally misrepresenting an event or memory;
- bullshitting, i.e., presenting a version without regard for the truth or provenance; etc.
I'm sure someone already made a better taxonomy, and hallucination is OK for normal public discussions, but I'm not sure why the distinctions aren't made in supposedly more serious works.
crazygringo
I mean, I think you're right that confabulation is probably a more correct technical term, but we all use hallucinate now, so it doesn't really matter. It might have been useful to argue about it 4 or 5 years ago, but that ship has long since sailed. [1]
And I think we already distinguish between types of errors -- LLMs effectively don't lie, AFAIK, unless you're asking them to engage in role-play or something. They mostly either hallucinate/confabulate, in the sense of inventing knowledge they don't have, or they just make "mistakes", e.g. in arithmetic, or in attempting to copy large amounts of code verbatim.
And when you're interested in mistakes, you're generally interested in a specific category of mistakes, like arithmetic, or logic, or copying mistakes, and we refer to them as such -- arithmetic errors, logic errors, etc.
So I don't think hallucination is taking away from any kind of specificity. To the contrary, it is providing specificity, because we don't call arithmetic errors hallucinations. And we use the word hallucination precisely to distinguish it from these run-of-the-mill mistakes.
[1] https://trends.google.com/explore?q=hallucination&date=all&g...
tsumnia
> You're just arguing about semantics. It doesn't matter in any substantial way.
While I agree for many general aspects of LLMs, I do disagree about some of the meta-terms used when describing LLM behavior. For example, the idea that AI has "bias" is problematic because neural networks literally have a variable called "bias", thus of course AI will always have "bias". Plus, a biased AI is literally the purpose behind classification algorithms.
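To make the overloaded word concrete, here's a minimal PyTorch-style illustration (the layer sizes are made up and nothing here comes from any particular model):

    import torch.nn as nn

    # Every standard linear layer carries a learnable "bias" vector b in y = Wx + b.
    layer = nn.Linear(in_features=128, out_features=10, bias=True)
    print(layer.bias.shape)  # torch.Size([10]) -- the literal "bias" parameter

    # This is unrelated to "bias" in the social/statistical sense; it's just an
    # affine offset that lets the classifier shift its decision boundaries.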
But these terms, "bias" and "hallucinations", are co-opted to spin a narrative of no longer trusting AI.
How in the world did creating an overly confident chatbot do a complete 180 on years of AI progress and sentiment?
spwa4
Terminology sucks. There is an ML technique called "hallucinating" that can really improve results. It works, for example, on AlphaFold, and allows you to reverse the function of AlphaFold: instead of finding the fold that matches a given protein or protein complex, find a protein complex that has a specific shape, or that fits on a specific shape.
It's called hallucination because it works by imagining you have the solution and then learning what the input needs to be to get that solution. Treat the input or the output as weights, and learn an input that fits an output (or vice versa) instead of learning the network. Fix what the network sees as the "real world" to match what you already knew, just like a hallucinating human does.
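For anyone curious, a rough generic sketch of the idea in PyTorch (not AlphaFold-specific; the function name, shapes, and loss below are purely illustrative): freeze the trained network, make the input the thing you optimize, and push it toward the output you wish you had.

    import torch

    def hallucinate_input(model, target, steps=200, lr=0.1, input_shape=(1, 64)):
        # Freeze the trained network; only the input will be learned.
        for p in model.parameters():
            p.requires_grad_(False)

        # Treat the input itself as the "weights" to optimize.
        x = torch.randn(input_shape, requires_grad=True)
        opt = torch.optim.Adam([x], lr=lr)

        for _ in range(steps):
            opt.zero_grad()
            # Loss: distance between the fixed network's output and the target.
            loss = torch.nn.functional.mse_loss(model(x), target)
            loss.backward()
            opt.step()

        return x.detach()  # an input that the fixed network maps to the target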
You can imagine how hard it is to find papers on this technique nowadays.
2026iknewit
This is still not true.
"Whenever they get something "right," it's literally by accident." "the random word generator"
First of, the input is not random at all which allows the question how random the output is.
Second, it compresses data which has an impact on that data. Probably cleaning or adjustment which should reduce 'random' even more. It compresses data from us into concepts. A high level concept is more robust than 'random'.
Thinking or reasoning models are also finetuning the response by walking the hyperspace and basically collecting and strengthening data.
We as humans do very similiar things and no one is calling us just random word predictors...
And because of this, "hallucinations -- plausible but factually incorrect outputs" is an absolut accurate description of what an LLM does when it response with a low probability output.
Humans also do this often enough btw.
Please stop saying an LLM is just a random word predictor.
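To make the "weighted" vs. "random" distinction concrete, a toy sketch (generic PyTorch; the logits here are just a stand-in for a real model's context-conditioned scores):

    import torch

    vocab_size = 50_000
    logits = torch.randn(vocab_size)  # stand-in for a model's next-token scores

    # "Random word generator": every token equally likely.
    uniform_token = torch.randint(vocab_size, (1,))

    # What an LLM actually does: sample from a learned, context-conditioned
    # distribution (here with temperature); most of the mass sits on few tokens.
    temperature = 0.8
    probs = torch.softmax(logits / temperature, dim=-1)
    weighted_token = torch.multinomial(probs, num_samples=1)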
jaredcwhite
OK, it's a semi-random word predictor.
lambdaone
"Hallucinate" is a term of art, and does not imply a philosophical commitment to whether LLMs have minds. "Confabulation" might be a more appropriate term.
What is indisputable is that LLMs, even though they are 'just' word generators, are remarkably good at generating factual statements and accurate answers to problems, yet also regrettably inclined to generating apparently equally confident counterfactual statements and bogus answers. That's all that 'hallucination' means in this context.
If this work can be replicated, it may offer a way to greatly improve the signal-to-bullshit ratio of LLMs, and that will be both impressive and very useful if true.
gchamonlive
I think you are reading too much into the title. "Neuron" is a totally valid way of referring to the basic unit of computation in such models. It's got nothing to do with biology apart from the name.
cyost
Somewhat on the subject: here's a neuroscientist reflecting on our failure to model a worm's brain, a "mere" 302 neurons (3 parts, this one is the first). https://ccli.substack.com/p/the-biggest-mystery-in-neuroscie...
Biological systems are hard.
pektezol
It’s been more than 3 years and people still can’t understand this.
Miraltar
Yup, three years of fighting over the "correct" use of words. Maybe someday people will accept that LLMs do "hallucinate" even though it's not the same "hallucinate" as for humans.
bigfishrunning
People don't want to understand this.
ohyes
People have a strong financial incentive to not understand this. It's subprime mortgages all over again.
user34283
Three top level comments so far, and as far as I can tell each is entirely pointless yapping about semantics around 'hallucination'.
Who cares? I wonder if any of the commenters is qualified enough to understand the research at all. I am not.
amelius
You're reasoning at the wrong abstraction level. This will not get you very far.
dpweb
We don't understand the brain. We fully understand what LLMs are doing; humans built them. The idea that we don't understand what LLMs are doing is magical thinking. Magical is good for clicks and fundraising.
allears
We know how we built the machines, but their complexity produces emergent behavior that we don't completely understand.
lambdaone
The title as posted is misleading and sensational: we should use the actual paper title, "H-Neurons: On the Existence, Impact, and Origin of Hallucination-Associated Neurons in LLMs", which is far less sensational.
dijksterhuis
> please use the original title, unless it is misleading or linkbait; don't editorialize.
lukeinator42
It is fairly well established that neurons in these artificial neural networks are polysemantic, and that information is represented as directions in the activation space rather than by individual neurons independently representing information (which is why Anthropic is doing things like training sparse autoencoders). I haven't read the paper in depth, but it seems like it is based on a fundamental misunderstanding about neurons in ANNs vs. the brain.
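For readers who haven't followed that line of work, the sparse-autoencoder idea is roughly this (a toy sketch, not Anthropic's actual code; the dimensions and penalty coefficient are made up):

    import torch
    import torch.nn as nn

    class SparseAutoencoder(nn.Module):
        """Decompose polysemantic activations into many sparsely firing features."""
        def __init__(self, d_model=768, d_features=16384, l1_coeff=1e-3):
            super().__init__()
            self.encoder = nn.Linear(d_model, d_features)  # overcomplete dictionary
            self.decoder = nn.Linear(d_features, d_model)
            self.l1_coeff = l1_coeff

        def forward(self, activations):
            features = torch.relu(self.encoder(activations))
            reconstruction = self.decoder(features)
            # Reconstruction loss plus an L1 penalty that pushes most features to
            # zero, so each feature tends toward one interpretable direction.
            loss = ((reconstruction - activations) ** 2).mean() \
                   + self.l1_coeff * features.abs().mean()
            return features, reconstruction, loss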
buppermint
From a quick read, this is cool but maybe a little overstated. From Figure 3, completely suppressing these neurons only reduces hallucinations by like ~5% compared to their normal state.
Table 1 is even more odd: H-neurons predict hallucination ~75% of the time, but a similarly sized set of random neurons predicts hallucinations ~60% of the time, which doesn't seem like a huge difference to me.
jurystillout26
Many people seem to be claiming that "LLMs do what humans do / humans also hallucinate", as if the process of human knowledge is identical to the purely semantic knowledge of LLMs.
No. Human beings have experiential, embodied, temporal knowledge of the world through our senses. That is why we can, say, empirically know something, which is vastly different than semantically or logically knowing something. Yes, human beings also have probabilistic ways of understanding the world and interacting with others. We have many other forms of knowledge as well, and the LLM way of interpreting data is by no means the primary way in which we feel confident that something is true or false.
That said, I don't get up in arms about the term "hallucination", although I prefer the term confabulation per neuroscientist Anil Seth. Many clunky metaphors are now mainstream, and as long as the engineers and researchers who study these kinds of things are ok with that, that's the most important thing.
But what I think all these people who dismiss objections to the term as "arguing semantics" are missing is the fundamental point: LLMs have no intent, and they have no way of distinguishing what data is empirically true or not. This is why the framing, not just the semantics, of this piece is flawed. "Hallucination" is a feature of LLMs that exists at the conceptual level, not a design flaw of current models. They have pattern recognition, which gets us very far in terms of knowing things, but people who rely only on such methods of knowing are most often referred to as conspiracy theorists.
bowsamic
“Physical”? I don’t think that’s the right word to use
Miraltar
Agreed, the original title is better (and typo-free), although a bit long.
H-Neurons: On the Existence, Impact, and Origin of Hallucination-Associated Neurons in LLMs
ZeroConcerns
Yeah, actual title is "H-Neurons: On the Existence, Impact, and Origin of Hallucination-Associated Neurons in LLMs"
But regardless of title this is all highly dubious...
jrm4
There is no such thing as a "hallucination" that could be isolated from "not a hallucination" in a provable systematic way because all they do is hallucinate.
I'm extremely comfortable calling this paper complete and utter bullshit (or, I suppose if I'm being charitable, extremely poorly titled) from the title alone.
lambdaone
Arguably, all we do is something similar to hallucination; it's just that hundreds of millions of years have selected against brains that generate internal states that lead to counter-survival behavior.
I recently almost fell on a tram as it accelerated suddenly; my arm reached out for a stanchion that was out of my vision, so rapidly I wasn't aware of what I was doing before it had happened. All of this occurred using subconscious processes, based on a non-physical internal mental model of something I literally couldn't see at the moment it happened. Consciousness is over-rated; I believe Thomas Metzinger's work on consciousness (specifically, the illusion of consciousness) captures something really important about the nature of how our minds really work.
2026iknewit
The input of an LLM is real data, and the n-dimensional space an LLM works in is a reflection of this. Statistically speaking, there should be a way of knowing when an LLM is confident and when it is not.
This type of research is absolutely valid.
An LLM does not just hallucinate.
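In that spirit, one crude confidence signal that's cheap to compute from the model's own output distribution (a generic sketch, not the paper's method):

    import torch

    def token_confidence(logits):
        # logits: (seq_len, vocab_size) scores the model assigned at each step.
        probs = torch.softmax(logits, dim=-1)
        top_prob = probs.max(dim=-1).values  # mass on the most likely token
        entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1)
        # High top_prob / low entropy: the distribution is concentrated (confident);
        # flat distributions are where confabulated details tend to appear.
        return top_prob, entropy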
eurekin
We are in the vibe science era, it seems
GaggiX
The people talking about semantics in the comment section seem to completely ignore the positive correlation between an LLM's accuracy and its stated confidence. This is called calibration, and this "old" blog post from a year ago already showed it -- LLMs can know what they know: https://openai.com/index/introducing-simpleqa/
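Calibration here just means binning answers by the model's stated confidence and checking accuracy within each bin (a minimal sketch, not OpenAI's actual evaluation code):

    import numpy as np

    def calibration_table(stated_confidence, is_correct, n_bins=10):
        """stated_confidence: floats in [0, 1]; is_correct: 0/1 per answer."""
        conf = np.asarray(stated_confidence, dtype=float)
        correct = np.asarray(is_correct, dtype=float)
        bins = np.minimum((conf * n_bins).astype(int), n_bins - 1)
        rows = []
        for b in range(n_bins):
            mask = bins == b
            if mask.any():
                # Well-calibrated model: observed accuracy tracks stated confidence.
                rows.append((conf[mask].mean(), correct[mask].mean(), int(mask.sum())))
        return rows  # (avg stated confidence, observed accuracy, count) per bin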
LLMs don't "hallucinate" or "lie." They have no intent. They're Weighted Random Word Generator Machines. They're train mathematically to create series of tokens. Whenever they get something "right," it's literally by accident. If you get that rate of accidental rightness up to 80%, and people suddenly thing the random word generator is some kind of oracle. It's not. It's a large model with an embedded space, tokens and a whole series of computationally expensive perceptron and attention blocks that generate output.
The title/introduction is very baited, because it implies some "physical" connection to hallucinations in biological organism, but it's focused on trying to single out certain parts of the model. LLMs are absolutely nothing at all like a biological system, of which our brains are orders of magnitudes more complex than the machines we've built that we no longer fully understand. Believing in these LLMs as being some next stage in understanding intelligence is hubris.