The "confident idiot" problem: Why AI needs hard rules, not vibe checks
71 comments
· December 4, 2025
rafamct
Yes you're totally right! I misunderstood what you meant, let me write six more paragraphs based on a similar misunderstanding rather than just trying to get clarification from you
TimPC
The benchmarks are dumb but highly followed so everyone optimizes for the wrong thing.
morkalork
This drives me nuts when trying to bounce an architecture or coding solution idea off an LLM. A human would answer with something like "what if you split up the responsibility and had X service or Y whatever". No matter how many times you tell the LLM not to return code, it returns code. Like it can't think or reason about something without writing it out first.
jqpabc123
We are trying to fix probability with more probability. That is a losing game.
Thanks for pointing out the elephant in the room with LLMs.
The basic design is non-deterministic. Trying to extract "facts" or "truth" or "accuracy" is an exercise in futility.
HarHarVeryFunny
The factuality problem with LLMs isn't because they are non-deterministic or statistically based, but simply because they operate at the level of words, not facts. They are language models.
You can't blame an LLM for getting the facts wrong, or hallucinating, when by design they don't even attempt to store facts in the first place. All they store are language statistics, boiling down to "with preceding context X, most statistically likely next words are A, B or C". The LLM wasn't designed to know or care that outputting "B" would represent a lie or hallucination, just that it's a statistically plausible potential next word.
AlecSchueler
In a way, though, those things aren't as different as they might first appear. The factual answer is traditionally the most plausible response to many questions. They don't operate on any level other than pure language, but a heap of behaviours emerges from that.
toddmorey
Yeah, that’s very well put. They don’t store black-and-white; they store billions of grays. This is why tool use for research and grounding has been so transformative.
Forgeties79
> You can't blame an LLM for getting the facts wrong, or hallucinating, when by design they don't even attempt to store facts in the first place
On one level I agree, but I do feel it’s also right to blame the LLM/company for that when the goal is to replace my search engine of choice (my major tool for finding facts and answering general questions), which is a huge pillar of how they’re sold to/used by the public.
wisty
I think they are much smarter than that. Or will be soon.
But they are like a smart student trying to get a good grade (that's how they are trained!). They'll agree with us even if they think we're stupid, because that gets them better grades, and grades are all they care about.
Even if they are (or become) smart enough to know better, they don't care about you. They do what they were trained to do. They are becoming like a literal genie that has been told to tell us what we want to hear. And sometimes, we don't need to hear what we want to hear.
"What an insightful price of code! Using that API is the perfect way to efficiently process data. You have really highlighted the key point."
The problem is that chatbots are trained to do what we want, and most of us would rather have a syncophant who tells us we're right.
The real danger with AI isn't that it doesn't get smart, it's that it gets smart enough to find the ultimate weakness in its training function - humanity.
HarHarVeryFunny
> I think they are much smarter than that. Or will be soon.
It's not a matter of how smart they are (or appear), or how much smarter they may become - this is just the fundamental nature of Transformer-based LLMs and how they are trained.
The sycophantic personality is mostly unrelated to this. Maybe it's partly human preference (conferred via RLHF training), but the "You're absolutely right! (I was wrong)" is clearly deliberately trained, presumably as someone's idea of the best way to put lipstick on the pig.
You could imagine an expert system, CYC perhaps, that does deal in facts (not words) with a natural language interface, but still had a sycophantic personality just because someone thought it was a good idea.
fzeindl
Bruce Schneier put it well:
"Willison’s insight was that this isn’t just a filtering problem; it’s architectural. There is no privilege separation, and there is no separation between the data and control paths. The very mechanism that makes modern AI powerful - treating all inputs uniformly - is what makes it vulnerable. The security challenges we face today are structural consequences of using AI for everything."
- https://www.schneier.com/crypto-gram/archives/2025/1115.html...
CuriouslyC
Attributing that to Simon when people have been writing articles about that for the last year and a half doesn't seem fair. Simon gave that view visibility, because he's got a pulpit.
6LLvveMx2koXfwn
He referenced Simon's article from September 12th, 2022.
DoctorOetker
Determinism is not the issue. Synonyms exist; there are multiple ways to express the same message.
When numeric models are fit to, say, scientific measurements, they do quite a good job of modeling the probability distribution. With a corpus of text we are not modeling truths but claims. The corpus contains contradicting claims. Humans have conflicting interests.
Source-aware training (which can't be done as an afterthought LoRA tweak, but needs to happen during base model training, AKA pretraining) could enable LLMs to express which answers apply according to which sources. It could provide a review of competing interpretations and opinions, and source every belief, instead of having to rely on tool use / search engines.
None of the base model providers would do it at scale since it would reveal the corpus and result in attribution.
In theory, entities like the European Union could mandate that LLMs used for processing government data, or sensitive citizen / corporate data, MUST be trained source-aware, which would improve the situation and make decisions and reasoning more traceable. It would also ease the discussions and arguments about copyright, since it is clear LLMs COULD BE MADE TO ATTRIBUTE THEIR SOURCES.
I also think it would be undesirable to eliminate speculative output; it should just be marked explicitly:
"ACCORDING to <source(s) A(,B,C,..)> this can be explained by ...., ACCORDING to <other school of thought source(s) D,(E,F,...)> it is better explained by ...., however I SUSPECT that ...., since ...."
If it could explicitly separate the schools of thought sourced from the corpus, and also separate its own interpretations and mark them as LLM-speculated suspicions, then we could still have traceable references without losing the potential novel insights LLMs may offer.
jennyholzer
"chatGPT, please generate 800 words of absolute bullshit to muddy up this comments section which accurately identifies why LLM technology is completely and totally dead in the water."
DoctorOetker
Less than 800 words, but more if you follow the link :)
https://arxiv.org/abs/2404.01019
"Source-Aware Training Enables Knowledge Attribution in Language Models"
sweezyjeezy
You could make an LLM deterministic if you really wanted to without a big loss in performance (fix random seeds, make MoE batching deterministic). That would not fix hallucinations.
I don't think using deterministic / stochastic as the dividing property is useful here if we're talking about a tool to mimic humans. Describing a human coder as 'deterministic' doesn't seem right - if you give one the same tasks under different environmental conditions, I don't think you get exactly the same outputs either. I think what we're really talking about is some sort of fundamental 'instability' of LLMs, à la chaos theory.
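A minimal sketch of "deterministic if you really wanted to", using Hugging Face transformers with greedy decoding and a fixed seed (the model name and prompt are placeholders): repeated runs match exactly, and the model is still free to be wrong.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder model; any causal LM illustrates the point
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

torch.manual_seed(0)  # fixed seed; with greedy decoding it barely matters, but it removes one source of variation

inputs = tok("The capital of Australia is", return_tensors="pt")
with torch.no_grad():
    out = model.generate(
        **inputs,
        do_sample=False,   # greedy decoding: temperature / top_p no longer apply
        max_new_tokens=20,
    )
print(tok.decode(out[0], skip_special_tokens=True))
# Repeated runs print identical text: deterministic, yet still free to be factually wrong.
```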
rs186
We talk about "probability" here because the topic is hallucination, not getting different answers each time you ask the same question. Maybe you could make the output deterministic, but that does not help with the hallucination problem at all.
sweezyjeezy
Exactly - the issue is not determinism.
ajuc
Yeah deterministic LLMs just hallucinate the same way every time.
zahlman
I can still remember when https://en.wikipedia.org/wiki/Fuzzy_electronics was the marketing buzz.
CuriouslyC
Hard drives and network pipes are non-deterministic too, we use error correction to deal with that problem.
pydry
I find it amusing that once you try to take LLMs and do productive work with them, either this problem trips you up constantly OR the LLM ends up becoming a shallow UI over an existing app (not necessarily better, just different).
Davidzheng
lol humans are non-deterministic too
rthrfrd
But we also have a stake in our society, in the form of a reputation or accountability, that greatly influences our behaviour. So comparing us to an LLM has always been meaningless anyway.
actionfromafar
Hm, great lumps of money also detach a person from reputation or accountability.
jennyholzer
to be fair, the people most antisocially obsessed with dogshit AI software are completely divorced from the social fabric and are not burdened by these sorts of juvenile social ties
some_furry
Human minds are more complicated than a language model that behaves like a stochastic echo.
pixl97
Birds are more complicated than jet engines, but jet engines travel a lot faster.
someguy101010
Wrote about this a bit too in https://www.robw.fyi/2025/10/24/simple-control-flow-for-auto...
Ran into this when writing agents to fix unit tests. Oftentimes they would just give up early, so I started writing the verifiers directly into the agent's control flow, and this produced much more reliable results. I believe Claude Code has hooks that do something similar as well.
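The shape of it is roughly this; a minimal sketch where `run_agent_turn` stands in for whatever agent framework you use and pytest is the verifier, so the test suite, not the model, decides when the loop may stop.

```python
import subprocess

def run_suite():
    # Ground truth lives outside the model: just run the tests.
    return subprocess.run(["pytest", "-q"], capture_output=True, text=True)

def fix_tests(run_agent_turn, max_turns: int = 5) -> bool:
    """run_agent_turn(feedback: str) is a placeholder for your agent call."""
    feedback = "The test suite is failing. Fix it."
    for _ in range(max_turns):
        run_agent_turn(feedback)
        result = run_suite()
        if result.returncode == 0:
            return True  # verified done, not "the model said it's done"
        # Feed the real failure output back instead of trusting the agent's summary.
        feedback = "Tests still failing:\n" + result.stdout[-2000:]
    return False
```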
nickdothutton
- Claude, please optimise the project for performance.
o Claude goes away for 15 minutes, doesn't profile anything, many code changes.
o Announces project now performs much better, saving 70% CPU.
- Claude, test the performance.
o Performance is 1% _slower_ than previous.
- Claude, can I have a refund for the $15 you just wasted?
o [Claude waffles], "no".
klysm
I’ve always found the hard numbers on performance improvement hilarious. It’s just mimicking what people say on the internet when they get performance gains
steerlabs
OP here. I wrote this because I got tired of agents confidently guessing answers when they should have asked for clarification (e.g. guessing "Springfield, IL" instead of asking "Which state?" when asked "weather in Springfield").
I built an open-source library to enforce these logic/safety rules outside the model loop: https://github.com/imtt-dev/steer
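To make the Springfield case concrete, here is an illustrative sketch of a rule enforced outside the model loop; the names (`AMBIGUOUS_CITIES`, `enforce_clarification`) are hypothetical and not steer's actual API.

```python
# Hypothetical hard rule applied to the model's proposed tool call: an ambiguous
# location forces a clarifying question instead of a guess. The model never
# gets to vote on whether the rule fires.

AMBIGUOUS_CITIES = {"springfield", "portland", "columbus"}  # illustrative list

def enforce_clarification(proposed_call: dict) -> dict:
    location = proposed_call.get("location", "")
    city, _, state = location.partition(",")
    if city.strip().lower() in AMBIGUOUS_CITIES and not state.strip():
        # Do not let the model guess "Springfield, IL".
        return {"action": "ask_user", "question": f"Which state is {city.strip()} in?"}
    return {"action": "call_tool", "args": proposed_call}

# The model proposed a bare "Springfield"; the rule overrides the guess.
print(enforce_clarification({"location": "Springfield"}))
```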
condiment
This approach kind of reminds me of taking an open-book test. Performing mandatory verification against a ground truth is like taking the test, then going back to your answers and looking up whether they match.
Unlike a student, the LLM never arrives at a sort of epistemic coherence, where they know what they know, how they know it, and how true it's likely to be. So you have to structure every problem into a format where the response can be evaluated against an external source of truth.
amorroxic
Thanks a lot for this. Also one question in case anyone could shed a bit of light: my understanding is that setting temperature=0, top_p=1 would cause deterministic output (identical output given identical input). For sure it won't prevent factually wrong replies/hallucinations; it only maintains generation consistency (e.g. for classification tasks). Is this universally correct, or is it dependent on the model used? (Or a downright wrong understanding, of course?)
wintermutestwin
Can someone please explain why these token-guessing models aren't being combined with logic "filters"?
I remember when computers were lauded for being precise tools.
chrischen
We already have verification layers: high-level, strictly typed languages like Haskell, OCaml, ReScript/Melange (JS ecosystem), PureScript (JS), Elm, Gleam (Erlang), F# (for the .NET ecosystem).
These aren't just strict type systems: the languages also allow for algebraic data types, nominal types, etc., which allow for encoding higher-level types enforced by the compiler.
The AI essentially becomes a glorified blank-filler. Basic syntax or type errors, while common, are automatically caught by the compiler as part of the vibe-coding feedback loop.
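The same checker-in-the-loop idea works with any external gate; here is a sketch in Python with mypy standing in for the ML-family compilers above (it assumes mypy is installed, and `generate_patch` is a placeholder for the model call). The checker's verdict, not the model's confidence, gates acceptance.

```python
import subprocess
import tempfile
from pathlib import Path

def type_errors(source: str) -> str:
    # Write the generated code to a temp file and let mypy judge it.
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "candidate.py"
        path.write_text(source)
        result = subprocess.run(
            ["mypy", "--strict", str(path)], capture_output=True, text=True
        )
        return "" if result.returncode == 0 else result.stdout

def blank_filling_loop(generate_patch, max_rounds: int = 3):
    """generate_patch(feedback: str) -> str is a placeholder for the model call."""
    feedback = "Fill in the implementation."
    for _ in range(max_rounds):
        candidate = generate_patch(feedback)
        errors = type_errors(candidate)
        if not errors:
            return candidate          # the checker, not the model, says it's done
        feedback = f"Fix these type errors:\n{errors}"
    return None
```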
blixt
Yeah, I’ve found that the only way to let AI build any larger amount of useful code and data for a user who does not review all of it is a lot of “gutter rails”. Not just adding more prompting, because that is an after-the-fact solution. Not just verifying and erroring out a turn, because that adds latency and allows the model to start spinning out of control. You also need to isolate tasks and autofix output to keep the model on track.
Models definitely need less and less of this with each version that comes out, but it’s still what you need to do today if you want to be able to trust the output. And even in a future where models approach perfect, I think this approach will be the way to reduce latency and keep tabs on whether your prompts are producing the output you expected at a larger scale. You will also be building good evaluation data for testing alternative approaches, or even fine-tuning.
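For instance, "autofixing output" can be as simple as mechanical repairs plus a schema check before burning a retry turn; a sketch assuming JSON output and made-up required field names.

```python
import json

REQUIRED_FIELDS = {"title", "summary"}   # illustrative schema, not from any real system

def autofix(raw: str):
    # Mechanical repairs first: strip a markdown code fence if the model added one.
    text = raw.strip()
    if text.startswith("```"):
        text = text.strip("`")
        text = text.removeprefix("json").strip()
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return None                      # only now is a retry turn worth the latency
    # Fill trivially recoverable gaps instead of bouncing the whole turn.
    for field in REQUIRED_FIELDS:
        data.setdefault(field, "")
    return data

print(autofix('```json\n{"title": "Report"}\n```'))
```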
toddmorey
Confident idiot: I’m exploring using LLMs for diagram creation.
I’ve found that after about 3 prompts to edit an image with Gemini, it will randomly respond with an entirely new image. Another quirk is that it will respond “here’s the image with those edits” with no edits made. It’s like a toaster that catches fire every eighth or ninth time.
I am not sure how to mitigate this behavior. Maybe an LLM-as-judge step with vision to evaluate the output before passing it on to the poor user.
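A judge step like that is one more gate outside the model; a sketch where `edit_image` and `judge_edit` are placeholder callables standing in for the image model and a vision-capable judge.

```python
def edit_with_judge(edit_image, judge_edit, original, instruction, max_attempts=3):
    """edit_image(original, instruction) -> image and
    judge_edit(original, edited, instruction) -> (ok, reason) are placeholders
    for the image model and the vision judge."""
    for _ in range(max_attempts):
        edited = edit_image(original, instruction)
        ok, reason = judge_edit(original, edited, instruction)
        if ok:
            return edited                    # only judged edits reach the poor user
        # Catches both failure modes above: an unrelated new image, or no visible change.
        instruction = f"{instruction}\n(Previous attempt rejected: {reason})"
    raise RuntimeError("Could not produce a verified edit")
```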
gaigalas
I don't think this approach can work.
Anyway, I've written a library in the past (way way before LLMs) that is very similar. It validates stuff and outputs translatable text saying what went wrong.
Someone ported the whole thing (core, DSL and validators) to python a while ago:
https://github.com/gurkin33/respect_validation/
Maybe you can use it. It seems it would save you time by not having to write so many verifiers: just use existing validators.
I would use this sort of thing very differently though (as a component in data synthesis).
yanis_t
It's funny how, when you start thinking about how to succeed with LLMs, you end up thinking about modular code, good test coverage, thought-through interfaces, code styles, ... basically whatever standards of a good code base we already had in the industry.
keiferski
The thing that bothers me the most about LLMs is how they never seem to understand "the flow" of an actual conversation between humans. When I ask a person something, I expect them to give me a short reply which includes another question, a request for details, or a clarification. A conversation is thus an ongoing "dance" where the questioner and answerer gradually arrive at the same shared meaning.
LLMs don't do this. Instead, every question is immediately answered with extreme confidence and a paragraph or more of text. I know you can minimize this by configuring the settings on your account, but to me it just highlights how it's not operating in a way remotely similar to the human-human one I mentioned above. I constantly find myself saying, "No, I meant [concept] in this way, not that way," and then getting annoyed at the robot because it's masquerading as a human.