Responses from LLMs are not facts
101 comments · October 29, 2025 · stavros
HarHarVeryFunny
How do you suppose an LLM can cite its sources when it doesn't have one?! It's a language model, not an encyclopedia. The LLM doesn't even get to choose what it outputs - it just gives next word probabilities and one of those is selected AT RANDOM by the sampler.
So, maybe words 1-3 of the LLM's answer are some common turn of phrase that was predicted by 1000s of samples, word 4 came from 4chan (a low probability random pick from the sampler), and word 5 was hallucinated. So, what's the "source" for this "fact"?
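To make that concrete, here is a minimal sketch of the sampling step being described, using an invented toy distribution rather than output from any real model:

    import random

    # Toy next-token distribution a model might emit after "The capital of France is"
    # (tokens and probabilities are invented for illustration).
    next_token_probs = {
        "Paris": 0.82,
        "located": 0.09,
        "a": 0.05,
        "Lyon": 0.03,
        "cheese": 0.01,
    }

    def sample_next_token(probs, temperature=1.0):
        # Weighted random draw; higher temperature flattens the distribution,
        # making low-probability tokens more likely to be picked.
        tokens = list(probs)
        weights = [p ** (1.0 / temperature) for p in probs.values()]
        return random.choices(tokens, weights=weights, k=1)[0]

    # Most draws give "Paris", but low-probability tokens still get selected
    # sometimes, which is the point: the output is a weighted draw, not a lookup.
    print([sample_next_token(next_token_probs) for _ in range(5)])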
anigbrowl
This is transparently untrue. Gemini reliably produces links (both inline and at the paragraph level), and most of the time summarizes them correctly. This has been publicly available for quite a while now.
bee_rider
The fancy online models can produce links for you. They might get the summary wrong, but they’ve got a link, you can follow it and check it out.
In this context they are more like conversational search engines. But that’s a pretty decent feature IMO.
HarHarVeryFunny
If the output came from RAG (search) rather than the model itself, then a link is possible, but not if the model just generated the sequence of words by itself.
Note too that these models can, and do, make up references. If it predicts a reference is called for, then it'll generate one, and to the LLM it makes no difference if that reference was something actually in the training data or just something statistically plausible it made up.
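A minimal sketch of that distinction, using invented documents, example.com URLs, and a stubbed-out generate() function standing in for a real model call: in the RAG path the links come from the retrieval step, not from the model.

    # Invented data and a placeholder generate(); no real search API or model is used.
    documents = [
        {"url": "https://example.com/llm-sampling", "text": "LLMs sample each token from a probability distribution."},
        {"url": "https://example.com/rag-overview", "text": "RAG grounds answers in text retrieved at query time."},
    ]

    def generate(prompt):
        # Stand-in for an actual LLM call.
        return "(model output would go here)"

    def answer_with_rag(question, docs):
        # Links are possible here because retrieval supplies them; the model is
        # asked to cite only the numbered sources it was given.
        context = "\n".join(f"[{i}] {d['text']}" for i, d in enumerate(docs))
        prompt = f"Answer using only the sources below, citing them by number.\n{context}\n\nQ: {question}"
        return generate(prompt), [d["url"] for d in docs]

    def answer_without_rag(question):
        # No retrieval step: any "reference" in the output is just generated text,
        # statistically plausible whether or not it corresponds to anything real.
        return generate(f"Q: {question}\nA:"), []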
Terr_
> [Comment, sarcastically:] not facts [but] variations in magnetic flux
I disagree, you're conflating two different things here:
1. There's a difference between an answer N which was reached by a good/reliable f(x)=N process, versus the same answer reached by using a flawed/unreliable g(x)=N process.
2. There's a difference between the platonic information/concept versus how it happens to reach you via symbols and atoms and photons.
In other words, TFA is focused on how the result is reached, but your response concerns how a result is expressed.
_______
Imagine I took a pen and wrote down "3+4=", then I rolled dice (2d6) which totaled to 7, causing me to complete the equation as "3+4=7".
That's a (#1) problem, isn't it? While it happens to be textually "correct", the process is flawed in a way that taints the result. When you object that my process stinks because "you're just rolling dice", that's a good objection, rather than a "reductive non-sequitur."
Meanwhile, I doubt anybody is remotely worried about (#2) stuff, like how the equation would "just" be subjective sensations of electrical signals from reflected photons from ink molecules on cellulose.
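To put a number on that dice example (assuming standard 2d6), a tiny simulation shows the flawed process only happens to land on the right answer about one time in six:

    import random

    # Simulate the "write 3+4=, then roll 2d6 for the answer" process above.
    trials = 100_000
    correct = sum(
        random.randint(1, 6) + random.randint(1, 6) == 3 + 4
        for _ in range(trials)
    )
    print(correct / trials)  # ~0.167: 2d6 totals 7 about one roll in six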
matharmin
Except, what makes an LLM fundamentally flawed?
Wikipedia has mistakes, direct sources have mistakes, humans make mistakes. They have different levels of accuracy, and I would never use LLM output as a reference source. But "they're predicting what words are most likely to come next in a sequence" does not fully capture what LLMs do, and does not mean it's necessarily a flawed process.
dullcrisp
Articles on Wikipedia are not facts. They’re the product of community contributions on a topic.
I don’t think that’s really a non-sequitur, but I guess it depends on what’s meant by facts in your epistemology.
glitchc
We can call Wikipedia content facts by consensus. It's hard to say the same for LLMs, since the input is not curated for accuracy, even though Wikipedia content is a subset of the entire training corpus.
In short, the curation is the key differentiator between the two.
drdaeman
> not curated for accuracy
I thought accuracy was one of the metrics that the models are trained for…
charcircuit
Consensus of what the news media says. Wikipedia doesn't actually care if the content is true.
SoftTalker
This is all of written history. Nobody is still around to verify what really happened. There’s archeological evidence in some cases but that is subject to interpretation, and in most cases very few people have actually seen it firsthand. The only facts are those that can be proven by scientific experiments or logical inference from other facts.
andy99
I agree that next word prediction, while technically correct, doesn’t capture the full nature of what LLMs are optimized for. And the article gets it wrong. In fact they’re optimized for sycophancy and human preference, to produce plausible feel good slop that looks good and makes you read it uncritically, the high fructose corn syrup of reading.
So things like brainstorming or summarization actually give horrible results optimized to make you feel smart and not to help you learn or critically appraise anything.
OTOH, for most actual facts, I think LLMs are pretty good and continue to get better (as long as you’re asking direct questions about a real thing).
So yeah they’re not just next word predictors, even if that describes how they work; they’re something much more insidious that has been optimized by world experts to be more convincing than you, whether right or wrong. If your boss is citing LLMs you’ve already lost, just move on.
zahlman
> In fact they’re optimized for sycophancy and human preference, to produce plausible feel good slop that looks good and makes you read it uncritically, the high fructose corn syrup of reading.
I'm scared that the thought registered instinctively that an LLM absolutely could come up with that metaphor.
anigbrowl
Fully agree, it's become a thought-terminating cliche at this point.
> They’re predicting what words are most likely to come next in a sequence.
I find this objection hilarious because it describes an awful lot of humans. One of the banes of my life is when I'm trying to phrase something clearly and specifically and the person I'm talking to interrupts to respond to a wildly incorrect guess about what I was about to say next. A lot of people don't have the patience to unspool a thought or the instinct to ask a clarifying question instead of plowing ahead with their mistaken assumption.
tomlockwood
The funny irony is that for years and years universities would, as a policy, not accept Wikipedia as a reference. The thinking was that a published book was more likely to have been written by an expert in the field. Now, even that is less and less likely.
Mistletoe
Gemini cites its sources all the time.
https://gemini.google.com/app/6da2be1502b764f1
LLMs used to be awful, but lately I find them just as good as Wikipedia, which is to say very very good. Sometimes they hallucinate no doubt, but in general it's as good as a google search for me.
awkwardpotato
Do you check these sources? I find Gemini and, especially, Google Search AI regularly cite sources that don't say what it claims they say. For example (not an actual example, but along these lines): "Can Google Sheets do X feature?" and it replies "Yup" and links to an Excel YouTube tutorial as its source.
Telemakhos
I ask ChatGPT and Grok questions about Latin and Greek all the time, and they'll brazenly invent sources, quoting them in Greek or Latin. As an example (an actual example), I asked ChatGPT to round up all the poetry that, like Catullus' sparrow and Statius' parrot, dealt with birds. It hallucinated a bird poem by Callimachus that it claimed was the prototype and gave me not only an English translation but a Greek original—that never existed. It just plain lied. I have zero faith in any fact about the ancient world that comes from an LLM.
On the other hand, LLMs do a great job translating between languages, which is probably why they can vibe code. They catch some grammar errors, too, although not all of them, and even some stylistic errors, so it's useful to run Greek compositions through them. Ask it about linguistic questions ("Which Greek verbs other than ἀφίημι violate Grassmann's law?"), though, and it will spew a bunch of irrelevant examples that don't pertain, because it doesn't actually understand what it's doing, just predicting tokens.
Mistletoe
The sources on the one I posted look good to me.
stavros
I agree, I use them all the time too. When they make a claim, I click the citation and verify.
degamad
And in your verification, what's your usual experience?
Citation usually shows the claim was right? Mix of right and wrong, say 60%/40%? Usually wrong? Citation often doesn't exist or is not relevant to the claim?
(I don't often use the AI answers, but the few times I bother to check the citations, they usually don't fill me with confidence about the answer.)
rlayton2
Even in small companies, it's important to discuss what the expectations around AI are. In the absence of any further requirements (i.e. assuming privacy, regulatory issues, etc. are not major concerns), it can be as simple as clearly saying: "You can use AI, but you are ultimately responsible for what you deliver. It is expected that you verify the data, test the code, and otherwise validate the responses."
Something as simple as that gives an expectation, without being overbearing to start with.
foxfired
We used to say "stop copying and pasting from stackoverflow without reading it first". Everything changed, yet everything is the same.
alecsm
At least ChatGPT is not condescending like many people on SO.
I'm so glad the site is dead.
ceejayoz
ChatGPT would be worthless without training material like SO.
CamperBob2
Not really. LLMs are good at indexing and digesting documentation, up to and including actual source code, and answering questions about it.
And they never "Vote to close as duplicate" because somebody asked something vaguely similar 10 years ago about a completely different platform and didn't get a good answer even then.
Stack Overflow is the taxi industry to AI's Uber. We needed it at one point, but it really always sucked, and unsurprisingly some people took exception to that and built something better, or at least different.
xgulfie
You're absolutely correct!
Brendinooo
I mostly just think this is a bad response to a real problem.
Attitude problems aside[0], if you lead with "Responses from Large Language Models like ChatGPT, Claude, or Gemini are not facts", most people are probably going to respond with some form of "well it said this one fact and I know for sure that one was right" and move on from this unconvinced of anything.
I'm not sure what a better approach is though. Honestly, "Don’t copy-paste something that a chatbot said and send it to someone as if that’s authoritative" feels like a better starting point. Another comment in this thread about asking to share the prompt and demonstrating how it can be manipulated could help. Or talking about LLM bias. I dunno.
P.S. what's up with those form inputs/submits for the good/bad uses?
[0]: "lmgtfy" or even "rtfm" seemed to fade after the novelty wore off and everyone realized it was a jerk move to use terms like that. Or maybe it's a generational thing?
geocrasher
LLMs follow the old adage of "Garbage In, Garbage Out". They work great for things that are well documented and understood.
If you use LLMs to understand things that are poorly understood in general, you're going to get poor information because the source was poor. Garbage in, garbage out.
They are also terrible at understanding context unless you specify everything quite explicitly. In the tech support world, we get people arguing about a recommended course of action because ChatGPT said it should be something else. And it should, in the context for which the answer was originally given. But in proprietary systems that are largely undocumented (publicly) they fall apart fast.
yellow_postit
This feels like getting taught in school not to cite Wikipedia, when the actual digital literacy challenge is deeper: learning where the info comes from and thinking critically.
gus_massa
> Imagine someone who has read thousands of books, but doesn’t remember where they read what.
That sounds like me! Well, I probably only read a hundred, but I also mostly forgot the sources. I can hallucinate a source, like "there is (probably) a Schaum book about that".
delis-thumbs-7e
You mean I should not use “But gus_massa on HN said that!” as a final authority on every subject anymore?
mr3martinis
Bosses love it when you call them foolish.
9x39
"Can you share the chat so we can look at it together?"
Asking for the receipts so you can figure out where they put their thumb on the scale is more illuminating.
notatoad
You're more polite than me, but that's essentially the same response I have to people citing ChatGPT. I just say "ChatGPT told me that's wrong".
If somebody thinks that unverified LLM output is relevant to a conversation, I don't want to have to defend why it shouldn't be part of the conversation; I want to put the responsibility for justifying it back onto them.
hamasho
Remind me of this useful URL when my boss says "we need a carousel".
ixxie
I know it's disturbing to many, but there is something nice about the post-truth moment: it feels like more people are actually questioning things than when I grew up in the 90s/00s.
I think we need to shift towards a socionormative understanding of knowledge; as Rorty put it: "a fact is just something we can't be bothered to argue about". I agree with him that talking about truth isn't so useful for moving our culture forward.
We should be talking about how to negotiate the diverse vocabularies of discursive communities as they increasingly clash in our globalized culture. Dialectical exclusion is the cultural catastrophe of the day.
Aeolun
I think my main response to these messages is: “If ChatGPT is more trustworthy than me, the expert you hired, what do you have me for?”
I can spend hours refuting and explaining why what ChatGPT told you doesn’t apply in our situation, or you can… just trust me?
paulcole
The people who get the most haughty about insisting that they be “just trusted” tend to be the people who are the least trustworthy.
> Responses from Large Language Models like ChatGPT, Claude, or Gemini are not facts. > They’re predicting what words are most likely to come next in a sequence.
I wish we'd move away from these reductive statements that sound like they mean something but are actually a non-sequitur. "Articles on Wikipedia are not facts. They're variations in magnetic flux on a platter transferred over the network".
Yeah, that doesn't make them not facts, though. The LLM should simply cite its sources, and so should Wikipedia, a human, or a dog, otherwise I'm not believing any of them. Especially the human.