Yet Another LLM Rant
80 comments · August 9, 2025
hodgehog11
I am sympathetic to the reasoning as to why LLMs should not be used to help some programmers right now. But I get a little frustrated seeing many of these kinds of posts that talk about fundamental limitations of LLMs vs humans on the grounds that it cannot "logically reason" like a human does. These are limitations in the current approach to training and objectives; internally, we have no clue what is going on.
> it’s “just a statistical model” that generates “language” based on a chain of “what is most likely to follow the previous phrase”
Humans are statistical models too in an appropriate sense. The question is whether we try to execute phrase by phrase or not, or whether it even matters what humans do in the long term.
> The only way ChatGPT will stop spreading that nonsense is if there is a significant mass of humans talking online about the lack of ZSTD support.
Or you can change the implicit bias in the model by being more clever with your training procedure. This is basic stats here, not everything is about data.
> They don’t know anything, they don’t think, they don’t learn, they don’t deduct. They generate real-looking text based on what is most likely based on the information it has been trained on.
This may be comforting to think, but it's just wrong. It would make my job so much easier if it were true. If you take the time to define "know", "think", and "deduct", you will find it difficult to argue current LLMs do not do these things. "Learn" is the exception here, and is a bit more complex, not only because of memory and bandwidth issues, but also because "understand" is difficult to define.
raincole
While the normal distribution meme is notoriously overused, I think it fits the scenario here.
LLMs know so much (when you just use ChatGPT for the first time like it's an Oracle machine) -> LLMs don't know anything (when you understand how machine learning works) -> LLMs know so much (when you actually think about what 'know' means)
libraryofbabel
Yeah. The empty “it’s just a statistical model” critique (or the dressed-up “stochastic parrots” version of it) is almost a sign at this point that the person using it formed their opinions about AI back when ChatGPT first came out, and hasn’t really bothered to engage with it much since then.
If in 2022 I’d tried to convince AI skeptics that in three years we might have tools on the level of Claude Code, I’m sure I’d have heard everyone say it would be impossible because “it’s just a statistical model.” But it turned out that there was a lot more potential in the architecture for encoding structured knowledge, complex reasoning, etc., despite that architecture being probabilistic. (Don’t bet against the Bitter Lesson.)
LLMs have a lot of problems, hallucination still being one of them. I’d be the first to advocate for a skeptical hype-free approach to deploying them in software engineering. But at this point we need careful informed engagement with where the models are at now rather than cherry-picked examples and rants.
seba_dos1
Unless what you work on is very simple and mostly mindless, using tools like Claude Code is the exact opposite of how to make the current SotA LLMs useful for coding. The models can help and boost your productivity, but it doesn't happen by letting them do more stuff autonomously. Quite the contrary.
And when what you usually work on actually is very simple and mostly mindless, you'd probably benefit more from doing it yourself, so you can progress above the junior stuff one day.
vidarh
People repeating the "stochastic parrot" meme in all its variations appear, if anything, to be more like stochastic parrots than the typical LLM is.
efilife
> it cannot "logically reason" like a human does
Reason? Maybe. But there's one limitation that we currently have no idea how to overcome: LLMs don't know how much they know. If they tell you they don't know something, it may be a lie. If they tell you they do, that may be a lie too. I, a human, certainly know what I know and what I don't, and I can recall where I learned the information.
vidarh
I have never met a human who has a good grasp of what they know and don't know. They may have a better grasp of it than an LLM, but humans are awfully bad at understanding the limits of our own knowledge, and will argue very strongly in favour of knowing more than we demonstrably do in all kinds of contexts.
hodgehog11
You are judging this based on what the LLM outputs, not on its internals. When we peer into its internals, it seems that LLMs actually have a pretty good representation of what they do and don't know; this just isn't reflected in the output because the relevant information is lost in future context.
mrcartmeneses
Socrates would beg to differ
AaronAPU
I’m afraid that sense of knowing what you know is very much illusory for humans as well. Everyone is just slowly having to come to terms with that.
lblume
Do you really know what you don't know? This would rule out unknown unknowns entirely.
add-sub-mul-div
Yes, it's not that people know specifically what they don't know, it's that they develop the wisdom to know those boundaries and anticipate them and reduce their likelihood and impact.
For example, if I use the language of my expertise for a familiar project then the boundaries where the challenges might lie are known. If I start learning a new language for the project I won't know which areas might produce unknowns.
The LLM will happily give you code in a language it's not trained well on. With the same confidence as using any other language.
gallerdude
> OpenAI researcher Noam Brown on hallucination with the new IMO reasoning model:
> Mathematicians used to comb through model solutions because earlier systems would quietly flip an inequality or tuck in a wrong step, creating hallucinated answers.
> Brown says the updated IMO reasoning model now tends to say “I’m not sure” whenever it lacks a valid proof, which sharply cuts down on those hidden errors.
> TLDR, the model shows a clear shift away from hallucinations and toward reliable, self‑aware reasoning.
Barrin92
>Humans are statistical models too in an appropriate sense.
No, we aren't and I'm getting tired of this question begging and completely wrong statement. Human beings are capable of what Kant in fancy words called "transcendental apperception", we're already bringing our faculties to bear on experience without which the world would make no sense to us.
What that means in practical terms for programming problems of this kind is that we can say "I don't know", which the LLM can't, because there's no "I" in the LLM, no unified subject that can distinguish what it knows from what it doesn't, what's within its domain of knowledge or outside it.
>If you take the time to define "know", "think", and "deduct", you will find it difficult to argue current LLMs do not do these things
No; only if you don't spend the time to think about what knowledge is would you make such a statement. What enables knowledge, which is not raw data but synthesized, structured cognition, is the faculties of the mind, the a priori categories we bring to bear on data.
That's why these systems are about as useless as a monkey with a typewriter when you try to have them work on manual memory management in C: that's less a task in autocompletion and more one that requires you to hold a working model of the machine in your mind.
lblume
The position of Kant does not align with the direction modern neuroscience is heading towards. Current evidence seems to prefer decentralized theories of consciousness like Dennett's multiple drafts model[1], suggesting there is no central point where everything comes together to form conscious experience, but instead that it itself is constituted by collaborative processes that have multiple realizations.
Barrin92
>Current evidence seems to prefer decentralized theories of consciousness like Dennett
There is no such thing as consciousness in Dennett's theory; his position is that it doesn't exist, he is an eliminativist. This is of course an absurd position with no evidence for it, as people like Chalmers have pointed out (including in that Wikipedia article), and it might be the most comical and ideological position of the last 200 years.
hodgehog11
This is interesting philosophy, and others have better critiques here in that regard. I'm a mathematician, so I can only work in what I can define symbolically. Humans most certainly ARE statistical models by that definition: without invoking the precise terminology, we take input, yield output, and plausibly involve uncertain elements. One can argue as to whether this is the correct language or not, but I prefer to think this way, as the arrogance of human thinking has otherwise failed us in making good predictions about AI.
If you can come up with a symbolic description of a deficiency in how LLMs approach problems, that's fantastic, because we can use that to alter how these models are trained, and how we approach problems too!
> What that means in practical terms for programming problems of this kind is that, we can say "I don't know", which the LLM can't, because there's no "I", in the LLM, no unified subject that can distinguish what it knows and what it doesn't, what's within its domain of knowledge or outside.
We seriously don't know whether there is an "I" that is comprehended or not. I've seen arguments either way. But otherwise, this seems to refer to poor internal calibration of uncertainty, correct? This is an important problem! (It's also a problem with humans, but I digress.) LLMs aren't nearly as bad at this as you might think, and there are a lot of things you can do (that the big tech companies do not do) to better tune the model's own self-confidence (as reflected in logits). I'm not aware of anything that uses this information as part of the context, so that might be a great idea. But on the other hand, maybe this actually isn't as important as we think it is.
HDThoreaun
Kant was a dualist, of course he didn't think humans were statistical models. It just turns out he was (probably) wrong.
bwfan123
Humans build theories of how things work. LLMs don't. Theories are deterministic symbolic representations of the chaotic world of meaning. Take the Turing machine as a theory of computation in general, Euclidean geometry as a theory of space, and Newtonian mechanics as a theory of motion.
A theory gives 100% correct predictions, although the theory itself may not model the world accurately. Such feedback between the theory and its application in the world drives iterations of the theory: from Newtonian mechanics to relativity, etc.
Long story short, the LLM is a long way away from any of this. And to be fair to LLMs, the average human is not creating theories either; it takes some genius to create them (Newton, Turing, etc.).
Understanding something == knowing the theory of it.
hodgehog11
> Humans build theories of how things work. llms dont. Theories are deterministic symbolic representation of the chaotic worlds of meaning
What made you believe this is true? Like it or not, yes, they do (at least to the best extent of our definitions of what you've said). There is a big body of literature exploring this question, and the general consensus is that all performant deep learning models adopt an internal representation that can be extracted as a symbolic representation.
bwfan123
> What made you believe this is true?
I have yet to see a theory coming out of an LLM that is sufficiently interesting. My comment was answering your question of what it means to "understand something". My answer to that is: understanding something is knowing the theory of it.
Now, that begs the question of what a theory is. To answer that: a theory comprises building-block symbols and a set of rules for combining them. For example, the building blocks for space (and geometry) could be points, lines, etc. The key point in all of this is symbolism as abstraction to represent things in some world.
simonw
> This concludes all the testing for GPT5 I have to do. If a tool is able to actively mislead me this easy, which potentially results in me wasting significant amounts of time in trying to make something work that is guaranteed to never work, it’s a useless tool.
Yeah, except it isn't. You can get enormous value out of LLMs if you get over this weird science fiction requirement that they never make mistakes.
And yeah, their confidence is frustrating. Treat them like an over-confident twenty-something intern who doesn't like to admit when they get stuff wrong.
You have to put the effort in to learn how to use them with a skeptical eye. I've been getting value as a developer from LLMs since the GPT-3 era, and those models sucked.
> The only way ChatGPT will stop spreading that nonsense is if there is a significant mass of humans talking online about the lack of ZSTD support.
We actually have a robust solution for this exact problem now: run the prompt through a coding agent of some sort (Claude Code, Codex CLI, Cursor etc) that has access to the Swift compiler.
That way it can write code with the hallucinated COMPRESSION_ZSTD thing in it, observe that it doesn't compile and iterate further to figure out what does work.
Or the simpler version of the above: LLM writes code. You try and compile it. You get an error message and you paste that back into the LLM and let it have another go. That's been the main way I've worked with LLMs for almost three years now.
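(Concretely, that loop is mechanical enough to sketch. A rough illustration, not from the post: it assumes macOS with swiftc on the PATH, and askLLM is a made-up placeholder for whatever chat interface is in use; the point is only the shape of the loop.)

    import Foundation

    // Write the candidate code to a temp file and ask swiftc to type-check it,
    // returning whatever diagnostics it prints to stderr (empty means it compiles).
    func compileErrors(of source: String) throws -> String {
        let file = FileManager.default.temporaryDirectory.appendingPathComponent("snippet.swift")
        try source.write(to: file, atomically: true, encoding: .utf8)

        let proc = Process()
        proc.executableURL = URL(fileURLWithPath: "/usr/bin/env")
        proc.arguments = ["swiftc", "-typecheck", file.path]
        let stderr = Pipe()
        proc.standardError = stderr
        try proc.run()
        proc.waitUntilExit()
        return String(data: stderr.fileHandleForReading.readDataToEndOfFile(), encoding: .utf8) ?? ""
    }

    // The "paste the error back in" loop: generate, compile, feed diagnostics back, repeat.
    func iterate(prompt: String, askLLM: (String) -> String, maxRounds: Int = 5) throws -> String {
        var code = askLLM(prompt)
        for _ in 0..<maxRounds {
            let errors = try compileErrors(of: code)
            if errors.isEmpty { return code }  // it compiles; now a human reviews it
            code = askLLM("This failed to compile:\n\(errors)\nPlease fix it.")
        }
        return code
    }

Coding agents essentially automate this loop and add test runs on top.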
jpc0
If that same intern, when asked something, responded that they had checked, gave you a link to a document they claim has the proof/answer but which in fact does not, and kept doing that, they wouldn’t be an intern very long. But somehow this is acceptable behaviour for an AI?
I use AI for sure, but only on things that I can easily verify are correct (run a test or some code), because I have had the AI give me functions in an API along with links to online documentation for those functions; the document exists, the function is not in it, and when called out, instead of doing a basic tool call, the AI will double down that it is correct and you, the human, are wrong. That would get an intern fired, but here you are standing on the intern’s side.
simonw
Because LLMs aren't human beings.
I wrote a note about that here: https://simonwillison.net/2025/Mar/11/using-llms-for-code/#s...
> Don’t fall into the trap of anthropomorphizing LLMs and assuming that failures which would discredit a human should discredit the machine in the same way.
jpc0
> Or the simpler version of the above: LLM writes code. You try and compile it. You get an error message and you paste that back into the LLM and let it have another go. That's been the main way I've worked with LLMs for almost three years now.
I’m going to comment here about this, but it’s a follow-on to the other comment: this is exactly the workflow I was following. I had given it the compiler error and it blamed an environment issue; I confirmed the environment is as it claims it should be, and it linked to documentation that doesn’t state what it claims is stated.
In a coding agent this would have been an endless feedback loop that eats millions of tokens.
This is the reason why I do not use coding agents: I can catch hallucinations and stop the feedback loop from ever happening in the first place, without needing to watch an AI agent try to convince itself that it is correct and the compiler must be wrong.
jpc0
> And yeah, their confidence is frustrating. Treat them like an over-confident twenty-something intern who doesn't like to admit when they get stuff wrong.
I was explicitly calling out this comment: that intern would get fired if, when explicitly called out, they not only refused to admit they are wrong but vehemently disagreed.
The interaction was “Implement X”. It gave an implementation; I responded “function y does not exist, use a different method”. Instead of following that instruction, it gave me a link to the documentation for the library that it claims contains that function and told me I am wrong.
I said the documentation it linked does not contain that function and to do something different and yet it still refused to follow instructions and pushed back.
At that point I “fired” it and wrote the code myself.
raincole
Wow... people unironically anthropomorphize AI to the point that they expect it to work exactly like a human intern, otherwise it's unacceptable...
bfioca
>...it’s a useless tool. I don’t like collaborating with chronic liars who aren’t able to openly point out knowledge gaps...
I think a more correct take here might be "it's a tool that I don't trust enough to use without checking," or at the very least, "it's a useless tool for my purposes." I understand your point, but I got a little caught up on the above line because it's very far out of alignment with my own experience using it to save enormous amounts of time.
libraryofbabel
and as others have pointed out, this issue of “how much should I check” is really just a subset of an old general problem in trust and knowledge (“epistemology” or what have you) that people have recognized since at least the scientific revolution. The Royal Society’s motto on its founding in the 1660s was “take no man’s word for it.”
Coding agents have now got pretty good at checking themselves against reality, at least for things where they can run unit tests or a compiler to surface errors. That would catch the error in TFA. Of course there is still more checking to do down the line, in code reviews etc, but that goes for humans too. (This is not to say that humans and LLMs should be treated the same here, but nor do I treat an intern’s code and a staff engineer’s code the same.) It’s a complex issue that we can’t really collapse into “LLMs are useless because they get things wrong sometimes.”
lazide
It’s a tool that fundamentally can’t be used reliably without double-checking everything it produces. That is rather different from how you’re presenting it.
vidarh
We double check human work too in all kinds of contexts.
A whole lot of my schooling involved listening to teachers repeating over and over to us how we should check our work, because we can't even trust ourselves.
(heck, I had to double-check and fix typos in this comment)
mhh__
Checking is usually faster than writing from scratch so this is still +EV
efilife
What does +EV mean? I'm looking but can't find a definition
exe34
> Checking is usually faster than writing from scratch
Famous last words. Checking trivial code for trivial bugs, yes. In science, you can have very subtle bugs that bias your results in ways that aren't obvious for a while until suddenly you find yourself retracting papers.
I've used LLMs to write tedious code (that should probably have been easier if the right API had been thought through), but when it comes to the important stuff, I'll probably always write an obviously correct version first and then let the LLM try to make a faster/more capable version, that I can check against the correct version.
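(That reference-checking approach is essentially differential testing. A minimal sketch, with placeholder functions standing in for the hand-written reference and the LLM's faster rewrite; nothing here is from the comment itself.)

    import Foundation

    // Placeholder reference implementation, written to be obviously correct.
    func referenceSum(_ xs: [Int]) -> Int {
        return xs.reduce(0, +)
    }

    // Placeholder for the faster/more capable version produced by the LLM.
    func optimizedSum(_ xs: [Int]) -> Int {
        var total = 0
        for x in xs { total += x }
        return total
    }

    // Compare the two on many random inputs; any mismatch is a bug in the
    // rewrite (or, occasionally, in the reference).
    func differentialTest(trials: Int = 1_000) -> Bool {
        for _ in 0..<trials {
            let xs = (0..<Int.random(in: 0...100)).map { _ in Int.random(in: -1_000...1_000) }
            if referenceSum(xs) != optimizedSum(xs) {
                return false
            }
        }
        return true
    }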
tmnvdb
So similar to wikipedia
simianwords
Similar to anything really. Can I really trust anything without verifying? Scientific journals?
gdsys
"Based on my research, zstd compression is not natively supported by iOS or Apple's frameworks, which means you cannot use zstd compression without adding some form of external code to your project"
Thanks Sonnet.
Full response:
https://www.perplexity.ai/search/without-adding-third-party-...
dcre
Most likely the key here is web search. Later I will try the post’s example with gpt-5 with search. I would be surprised if it didn’t say the same thing.
From a product point of view, it seems clear that just as they have work to get the model to dynamically decide to use reasoning when it would help, they have to do the same with web search.
Maro
I'm currently working as a hands-off VP, so I don't use LLMs for coding at work, only for emails and document editing. I do use it for my hobby weekend coding stuff, which usually involves writing short 100-1000 LOC Python toy programs for my own education/entertainment. My way-of-working is quite primitive, I have zero integrations, nothing agentic, I just copy/paste with ChatGPT.
For this use-case it's been very useful, it can usually generate close-to-complete solutions, as long as it's one of the major programming languages, and it's a reasonably standard problem. So in general I'm always surprised when people say that LLMs are completely useless for coding --- this is just not true, and I feel sorry for people who shut themselves off from a useful tool.
But even at this small scale, even the best (o3) models sometimes totally fail. Recently I started a series of posts on distributed algorithms [1], and when I was working on the post/code for the Byzantine Generals / Consensus algorithm, o3 --- to my honest surprise --- just totally failed. I tried about 10 different times (both from scratch and describing the incorrect behaviour of its code), also showing it the original Lamport paper, and it just couldn't get it right... even though the toy implementation is just ~100 LOC, the actual algorithm portion is maybe 25 LOC. My hypothesis is that there are very few implementations online, and additionally I find the descriptions of the algorithm a bit vague (interleaves message cascade and decision logic).
drakenot
ChatGPT 5> Without adding third-party dependencies, how can I compress a Data stream with zstd in Swift on an iPhone?
(It auto-routed to its thinking model, took 25s)
> Short answer: you can’t. iOS doesn’t ship a public zstd encoder/decoder. The system Compression and AppleArchive frameworks only expose lzfse, lz4, zlib/deflate, and lzma. So with only Apple SDKs, there’s no way to produce zstd-compressed bytes.
> Two practical paths:
> 1. If zstd isn’t mandatory, use a built-in codec (lzfse or lz4 are fastest) via Compression’s streaming API. Example (pure Swift, no deps) that compresses an arbitrary byte stream:
> <code example>
> 2. If zstd is mandatory, you must embed zstd yourself. That can mean:
> - Vendoring the official C sources (libzstd) directly into your project and calling them via a tiny Swift wrapper/bridging header
> - Statically linking a prebuilt libzstd.a you compile for iOS
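(The transcript elides the actual code. For reference, a built-in-codec version along those lines might look roughly like the sketch below, assuming iOS 13+ and Compression's OutputFilter, with LZFSE standing in since the Algorithm enum has no zstd case.)

    import Compression
    import Foundation

    // Stream-compress arbitrary bytes with a built-in codec (LZFSE here) using
    // Compression's OutputFilter. Only lzfse, lz4, zlib and lzma are available;
    // there is no zstd option in the system framework.
    func compressLZFSE(_ input: Data) throws -> Data {
        var output = Data()
        let filter = try OutputFilter(.compress, using: .lzfse) { chunk in
            if let chunk = chunk { output.append(chunk) }
        }
        // Feed the input in pieces to mimic a stream, then finalize.
        let chunkSize = 64 * 1024
        var index = input.startIndex
        while index < input.endIndex {
            let end = input.index(index, offsetBy: chunkSize, limitedBy: input.endIndex) ?? input.endIndex
            try filter.write(input[index..<end])
            index = end
        }
        try filter.finalize()
        return output
    }

Decompression is symmetric with InputFilter; zstd itself would indeed require vendoring libzstd, as the reply says.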
djeastm
Here's what I got with ChatGPT GPT-5, it must not have thought about it because it was near instantaneous:
>On iOS, you can use Apple’s built-in Zstandard (zstd) compression API from the Compression framework — no third-party dependencies required.
>Here’s how you can compress a Data stream with zstd: >...
https://chatgpt.com/share/68976c8f-7ae0-8012-b7a8-58e016246d...
hodgehog11
I think the useful takeaway here is that taking the top-1 (most likely) output is generally not a good idea, especially not for making judgements. This doesn't address the main points of the blog though.
quantum_state
An implication of Peter Naur's 1985 paper on programming as theory building is that current LLM coding tools would be very effective at generating technical debt even when they work ... use at your own risk.
jmkni
what's funny is that newer models will now be trained on the exact question, "Without adding third-party dependencies, how can I compress a Data stream with zstd in Swift on an iPhone?" and similar questions to it, because of this post
maybe the key to training future LLMs is to write angry blog posts about the things they aren't good at and get them to the front page of HN?
nikolayasdf123
good point. nobody knows you're a dog on the internet anyway
simianwords
The prompt works for me and correctly identifies that zstd doesn't work https://chatgpt.com/share/689769c5-bd68-800b-ae63-c6a337dcfa...
"Short answer: you can’t. iOS doesn’t ship a Zstandard (zstd) encoder/decoder in any first-party framework. Apple’s built-in Compression framework supports LZFSE, LZ4, zlib/deflate, and LZMA—not zstd."
tptacek
LLMs can be a useful tool, maybe. But don’t anthropomorphize them.
(but, earlier)
If a tool is able to actively mislead me this easy, which potentially results in me wasting significant amounts of time in trying to make something work that is guaranteed to never work, it’s a useless tool. I don’t like collaborating with chronic liars.
nikolayasdf123
possible solution: "reality checks"
I see that GitHub Copilot actually runs code, writes simple exploratory programs, iteratively tests its hypothesis. it is astoundingly effective and fast.
same here. nothing stops the AI from actually trying to implement whatever it suggested, compiling it, and seeing if it actually works.
grounding in reality at inference time, so to speak.
nikolayasdf123
> “Not having an answer” is not a possibility in this system - there’s always “a most likely response”, even if that makes no sense.
simple fix - probability cutoff. but in all seriousness this is something that will be fixed. don't see fundamental reason why not.
and I myself have seen such hallucinations (about compression too, actually).
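(As a sketch of what a "probability cutoff" could even mean at the API level: assume, hypothetically, that the provider returns per-token log probabilities. Both the logprobs array and the threshold below are illustrative, not any particular vendor's API.)

    // Refuse to surface an answer whose average per-token log probability is
    // below a threshold; the numbers here are arbitrary placeholders.
    func passesConfidenceCutoff(logprobs: [Double], threshold: Double = -1.5) -> Bool {
        guard !logprobs.isEmpty else { return false }
        let mean = logprobs.reduce(0, +) / Double(logprobs.count)
        return mean >= threshold
    }

Whether token-level confidence tracks factual correctness at all is exactly what the reply below disputes.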
Seb-C
Hallucinations are not a bug or an exception, but a feature. Everything outputted by LLMs is 100% made-up, with a heavy bias towards what has been fed to it at first (human written content).
The fundamental reason why it cannot be fixed is because the model does not know anything about the reality, there is simply no such concept here.
To make a "probability cutoff" you first need a probability about what the reality/facts/truth is, and we have no such reliable and absolute data (and probably never will).
tmnvdb
You use a lot of anthropomorphisms: it doesn't "know" anything (does your hard drive know things? Is that relevant?), and "making things up" is even more strongly linked to conscious intent. Unless you believe the LLMs are sentient, this is a strange choice of words.
Seb-C
I originally put quotes around "know" and somehow lost it in an edit.
I'm precisely trying to criticize the claims of AGI and intelligence. English is not my native language, so nuances might be wrong.
I used the word "makes-up" in the sense of "builds" or "constructs" and did not mean any intelligence there.
simianwords
>To make a "probability cutoff" you first need a probability about what the reality/facts/truth is, and we have no such reliable and absolute data (and probably never will).
Can a human give a probability estimate to their predictions?
nikolayasdf123
have you seen the Iris flowers dataset? it is fairly simple to find cutoffs to classify the flowers.
or are you claiming in general that there is no objective truth in reality, in a philosophical sense? well, you can go down that more philosophical side of the road, or you can get more pragmatic. things just work, regardless of how we talk about them.
Seb-C
I don't mean it in a philosophical sense, more in a rigorous scientific one.
Yes, we do have reliable datasets as in your example, but those are for specific topics and are not based on natural language. What I would call "classical" machine learning is already a useful technology where it's applied.
Jumping from separate datasets focused on specific topics to a single dataset describing "everything" at once is not something we are even close to doing, if it's even possible. Hence the claim of having a single AI able to answer anything is unreasonable.
The second issue is that even if we had such a hypothetical dataset, ultimately if you want a formal response from it, you need a formal question and a formal language (probably something between maths and programming?) in all the steps of the workflow.
LLMs are only statistical models of natural language, so they are the antithesis of this very idea. Achieving that would require a completely different technology that has yet to even be theorized.
This may be comforting to think, but it's just wrong. It would make my job so much easier if it were true. If you take the time to define "know", "think", and "deduct", you will find it difficult to argue current LLMs do not do these things. "Learn" is the exception here, and is a bit more complex, not only because of memory and bandwidth issues, but also because "understand" is difficult to define.