I genuinely don't understand why some people are still bullish about LLMs
1253 comments · March 27, 2025 · gilbetron
WhyOhWhyQ
The technology is not just less than superintelligence, for many applications it is less than prior forms of intelligence like traditional search and Stack Exchange, which were easily accessible 3 years ago and are in the process of being displaced by LLMs. I find that outcome unimpressive.
And this Tweeter's complaints do not sound like a demand for superintelligence. They sound like a demand for something far more basic than the hype has been promising for years now.
- "They continue to fabricate links, references, and quotes, like they did from day one."
- "I ask them to give me a source for an alleged quote, I click on the link, it returns a 404 error." (Why have these companies not manually engineered out a problem like this by now? Just do a check to make sure links are real. That's pretty unimpressive to me.)
- "They reference a scientific publication, I look it up, it doesn't exist."
- "I have tried Gemini, and actually it was even worse in that it frequently refuses to even search for a source and instead gives me instructions for how to do it myself."
- "I also use them for quick estimates for orders of magnitude and they get them wrong all the time."
- "Yesterday I uploaded a paper to GPT to ask it to write a summary and it told me the paper is from 2023, when the header of the PDF clearly says it's from 2025."
Thlom
A municipality in Norway used an LLM to create a report about the school structure in the municipality (how many schools are there, how many should there be, where should they be, how big should they be, pros and cons of different school and class sizes, etc.). Turns out the LLM invented scientific papers to use as references, and the whole report is complete and utter garbage based on hallucinations.
brookst
And that says… what? The entire LLM technology is worthless for all applications, from all implementations?
A company I worked for spent millions on a customer service solution that never worked. I wouldn’t say that contracted software is useless.
mwigdahl
All of these anecdotal stories about "LLM" failures need to go into more detail about what model, prompt, and scaffolding was used. It makes a huge difference. Were they using Deep Research, which searches for relevant articles and brings facts from them into the report? Or did they type a few sentences into ChatGPT Free and blindly take it on faith?
LLMs are _tools_, not oracles. They require thought and skill to use, and not every LLM is fungible with every other one, just like flathead, Phillips, and hex-head screwdrivers aren't freely interchangeable.
eric_cc
Sounds like a user problem, though. When used properly as a tool they are incredible. When you give them 100% trust to be perfect, it's you that is making the mistake.
jabroni_salad
Well yeah, it's fancy autocomplete. And it's extremely amazing what 'fancy autocomplete' is able to do, but making the decision to use an LLM for the type of project you described is effectively just magical thinking. That isn't an indictment of LLMs, but rather of the person who chose the wrong tool for the job.
KoolKat23
This is more a lack of understanding of its limitations; it'd be different if they had asked it to write a Python script to collate the data.
freilanzer
So they used the model as a database? It should be immediately obvious to anyone that this won't work.
w0m
"an old poorly implemented model can't do item X well therefore the technology is garbage"
Likely the most accurate measure of progress would be watching detractors' goalposts move over time.
Terretta
"They continue to fabricate links, references, and quotes, like they did from day one." - "I ask them to give me a source for an alleged quote, I click on the link, it returns a 404 error."
Why have these companies not manually engineered out a problem like this by now? Just do a check to make sure links are real. That's pretty unimpressive to me.
There are no fabricated links, references, or quotes in OpenAI's GPT 4.5 + Deep Research.
It's unfortunate the cost of a Deep Research bespoke white paper is so high. That mode is phenomenal for pre-work domain research. You get an analyst's two-week writeup in under 20 minutes, for the low cost of $200/month (though I've seen estimates that such a white paper costs OpenAI over USD 3,000 to produce for you, which explains the monthly limits).
You still need to be a domain expert to make use of this, just as you need to be to make use of an analyst. Both the analyst and Deep Research can generate flawed writeups with similar misunderstandings: mis-synthesis, misapplication, or omission of something essential.
Neither analyst nor LLM is a substitute for mastery.
fridder
While I agree, it doesn't stop business folks pushing for its use in areas where it is inappropriate. That is, at least for me, part of the skepticism.
blactuary
How do people in the future become domain experts capable of properly making use of it if they are not the analyst spending two weeks on the write-up today?
never_inline
My complaint with Deep Research LLMs is that they don't go deeper than 2 pages of SERPs. I want them to dig down into obscure stuff, not list cursorily relevant peripheral directions. They seem to do breadth-first rather than depth-first search.
waffletower
This assessment is incomplete. Large language models are both less and more than these traditional tools. They have not subsumed them, and all can sit together in separate tabs of a single browser window. They are another resource, and when the conditions are right, which is often the case in my experience, they are a startlingly effective tool for navigating the information landscape. The criticism of Gemini is a fair one, and I encountered it yesterday, though perhaps with 50% less entitlement. But Gemini also helped me translate obscure termios APIs to Python from C source code I provided. The equivalent using search and/or Stack Overflow would have required multiple piecemeal searches without guarantees -- and definitely would have taken much more time.
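(For a sense of the kind of translation involved - this sketch is mine, not the code Gemini actually produced - the C tcgetattr/cfmakeraw/tcsetattr dance maps roughly onto Python's termios and tty modules:)

    import sys
    import termios
    import tty

    def read_single_key():
        """Put the terminal into raw mode, read one byte, restore settings."""
        fd = sys.stdin.fileno()
        saved = termios.tcgetattr(fd)        # like tcgetattr(fd, &saved) in C
        try:
            tty.setraw(fd)                   # roughly cfmakeraw + tcsetattr
            return sys.stdin.read(1)
        finally:
            termios.tcsetattr(fd, termios.TCSADRAIN, saved)  # restore old settings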
casey2
The 404 links are hilarious, like you can't even parse the output and retry until it returns a link that doesn't 404? Even ignoring the billions in valuation, this is so bad for a $20 sub.
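(A rough sketch of the kind of check being asked for, using the requests library; extract_urls is a hypothetical helper and the retry loop is left out:)

    import requests

    def link_is_live(url: str, timeout: float = 5.0) -> bool:
        """Return True if the URL resolves to a non-error HTTP status."""
        try:
            resp = requests.head(url, allow_redirects=True, timeout=timeout)
            if resp.status_code == 405:  # some servers reject HEAD; fall back to GET
                resp = requests.get(url, stream=True, timeout=timeout)
            return resp.status_code < 400
        except requests.RequestException:
            return False

    # e.g. filter an answer's citations before showing them:
    # citations = [u for u in extract_urls(answer) if link_is_live(u)]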
eric_cc
The tweeter's complaints sound like a user problem. LLMs are tools. How you use them, when you use them, and what you expect out of them should be based on the fact that they are tools.
whamlastxmas
I’m sorry but the experience of coding with an LLM is about ten billion times better than googling and stack overflowing every single problem I come across. I’ve stack overflowed maybe like two things in the past half year and I’m so glad to not have to routinely use what is now a very broken search engine and web ecosystem.
player1234
How did you measure and compare googling/stack overflow to coding with an LLM? How did you get to the very impressive number ten billion times better?! Can you share your methodology? How have you defined better?
quonn
It's broken now. It was fine 5 years ago.
blactuary
The search ecosystem is broken now because google is focused on LLMs
vonneumannstan
[flagged]
zehaeva
I thought summarizing papers/stories/emails/meetings was one of the touted use cases of LLMs?
What are the use cases where the expected performance is high?
giantrobot
> This is just not a use case where the expected performance on these tasks is high.
Yet the hucksters hyping AI are falling all over themselves saying AI can do all this stuff. This is where the centi-billion dollar valuations are coming from. It's been years and these super hyped AIs still suck at basic tasks.
When pre-AI shit Google gave wrong answers, it at least linked to the source of the wrong answers. LLMs just output something that looks like a link and call it a day.
somenameforme
It's mostly because of how they were initially marketed. In an effort to drive hype 'we' were promised the world. Remember the "leaks" from Google about an engineer trying to get the word out that they had created a sentient intelligence? In reality Bard, let alone whatever early version he was using, is about as sentient as my left asscheek.
OpenAI did similar things by focusing to the point of absurdity on 'safety' for what was basically a natural language search engine with a habit of inventing nonsensical stuff. But on that same note (and as you also alluded to), I do agree that LLMs have a lot of use as natural language search engines in spite of their proclivity to hallucinate. Being able to describe e.g. a function call (or some esoteric piece of history) and then often get the precise term/event that I'm looking for is just incredibly useful.
But LLMs obviously are not sentient, are not setting us on the path to AGI, or any other such nonsense. They're arguably what search engines should have been 10 or 15 years ago, but anti-competitive monopolization of the industry meant that search engine technology progress basically stalled out, if not regressed for the sake of ads (and individual 'entrepreneurs' becoming better at SEO), about the time Google fully established itself.
chimprich
> Remember the "leaks" from Google about an engineer trying to get the word out that they had created a sentient intelligence?
I presume you are referring to this Google engineer, who was sacked for making the claim. Hardly an example of AI companies overhyping the tech; precisely the opposite, in fact. https://www.bbc.co.uk/news/technology-62275326
It seems to be a common human hallucination to imagine that large organisations are conspiring against us.
MindBeams
Corporations are motivated by profit, not doing what's best for humanity. If you need an example of "large organizations conspiring against us," I can give you twenty.
dsr_
There is no difference between an organization and a conspiracy. Organizing to do something is the same as conspiring to do something.
That leaves the question of whether the organization is commensal, symbiotic or predatory towards any given "us".
jibal
> Remember the "leaks" from Google about an engineer trying to get the word out that they had created a sentient intelligence?
That's not what happened. Google stomped hard on Lemoine, saying clearly that he was wrong about LaMDA being sentient ... and then they fired him for leaking the transcripts.
Your whole argument here is based on false information and faulty logic.
brookst
Which is pretty ironic in a thread littered with people asserting LLMs are useless because they can hallucinate and create illogical outputs.
hatefulmoron
I think you're missing the point. He's pointing out what the atmosphere was/is around LLMs in these discussions, and how that impacts stories like with Lemoine.
I mean, you're right that he's silly and Google didn't want to be part of it, but it was (and is?) taken seriously that: LLMs are nascent AGI, companies are pouring money to get there first, we might be a year or two away. Take these as true, it's at least possible that Google might have something chained up in their basement.
In retrospect, Google dismissed him because he was acting in a strange and destructive way. At the time, it could be spun as just further evidence: they're silencing him because he's right. Could it have created such hysteria and silliness if the environment hadn't been so poisoned by the talk of imminent AGI/sentience?
thaumasiotes
> OpenAI did similar things by focusing to the point of absurdity on 'safety' for what was basically a natural language search engine that has a habit of inventing nonsensical stuff.
The focus on safety, and the concept of "AI", preexisted the product. An LLM was just the thing they eventually made; it wasn't the thing they were hoping to make. They applied their existing beliefs to it anyway.
fifticon
I am worried about them as a substitute for search engines. My reasoning is that classic Google web-scraping and SEO, as shitty as it may be, is 'open-source' (or at least 'open-citation') in nature - you can inspect the sh*t it's built from. Whereas LLMs, to me, seem like a totalitarian political system's wet dream - Chinese or Western - 'we can set up an inscrutable source of "truth" for the people to use, with the _truths_ we intend them to receive'. We already saw how weird and unsane this was when they were configured to be woke under the previous regime. Imagining it being configured for 'the other post-truth' is a nightmare.
f6v
> Remember the "leaks" from Google about an engineer trying to get the word out that they had created a sentient intelligence?
No, first time I hear about it. I guess the secret to happiness is not following leaks. I had very low expectations before trying LLMs and I’m extremely impressed now.
decimalenough
This was three years ago:
https://www.theguardian.com/technology/2022/jun/12/google-en...
He was fired and a casual browse of his blog makes it quite clear that he was a few fries short of a Happy Meal all along.
mdp2021
> """happiness"""
Not following leaks, or just the news, not living in the real world, not caring about the consequences of reality: anybody can think he's """happy""" with psychedelia and with just living in a private world. But it is the same kind of "happy" that comes with "just smile".
If you did not get the information that there are severe pitfalls - which is, by the way, quite unrelated to the "it's sentient" thing, as we are talking about the faults in the products, not the faults in human fools - you are supposed to see them through your own judgement.
emsign
They have their value in analyzing huge amounts of data, for example scientific papers or raw observations, but the popular public ones are mostly trained on stolen/pirated texts off the internet and from social media clouds the companies control. So this means: bullshit in -> bullshit out. I don't need machines for that; the regular human bullshitters do this job just fine.
TeMPOraL
> the popular public ones are mostly trained on stolen/pirated texts off the internet
You mean like actual literature, textbooks and scientific papers? You can't get them in bulk without pirating. Thank intellectual property laws.
> from social media clouds the companies control
I.e. conversations of real people about matters of real life.
But if it satisfies your elitist, ivory-towerish vision of "healthy information diet" for LLMs, then consider that e.g. Twitter is where, until now, you'd get most updates from the best minds in several scientific fields. Or that besides r/All, the Reddit dataset also contains r/AskHistorians and other subreddits where actual experts answer questions and give first-hand accounts of things.
The actually important bit though, is that LLM training manages to extract value from both the "bullshit" and whatever you'd call "not bullshit", as the model has to learn to work with natural language just as much as it has to learn hard facts or scientific theories.
brookst
A lot of irony in that comment.
TeMPOraL
Nobody promised the world. The marketing underpromised and LLMs overdelivered. Safety worries didn't come from marketing, it came from people who were studying this as a mostly theoretical worry for the next 50+ years, only to see major milestones crossed a decade or more before they expected.
Did many people overhype LLMs? Yes, like with everything else (transhumanist ideas, quantum physics). It helps being more picky who one listens to, and whether they're just painting pretty pictures with words, or actually have something resembling a rational argument in there.
snitty
Folks really over-index when an LLM is very good for their use case. And most of the folks here are coders, something LLMs are already good at and getting better at.
For some tasks they're still next to useless, and people who do those tasks understandably don't get the hype.
Tell a lab biologist or chemist to use an LLM to help them with their work and they'll get very little useful out of it.
Ask an attorney to use it and it's going to miss things that are blindingly obvious to the attorney.
Ask a professional researcher to use it and it won't come up with good sources.
For me, I've had a lot of those really frustrating experiences where I'm having difficulty on a topic and it gives me utter incorrect junk because there just isn't a lot already published about that data.
I've fed it tricky programming tasks and gotten back code that doesn't work, and that I can't debug because I have no idea what it's trying to do, or I'm not familiar with the libraries it used.
eyegor
It sounds like you're trying to use these llms as oracles, which is going to cause you a lot of frustration. I've found almost all of them now excel at imitating a junior dev or a drunk PhD student. For example the other day I was looking at acoustic sensor data and I ran it down the trail of "what are some ways to look for repeating patterns like xyz" and 10 minutes later I had a mostly working proof of concept for a 2nd order spectrogram that reasonably dealt with spectral leakage and a half working mel spectrum fingerprint idea. Those are all things I was thinking about myself, so I was able to guide it to a mostly working prototype in very little time. But doing it myself from zero would've taken at least a couple of hours.
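(Roughly the idea, as a minimal sketch with made-up parameters rather than the actual prototype: take a first-order spectrogram, then look at the spectrum of each frequency bin over time, so peaks reveal how often a pattern repeats.)

    import numpy as np
    from scipy.signal import spectrogram

    def second_order_spectrum(x, fs, nperseg=1024):
        """Spectrogram, then an FFT along the time axis of each frequency bin:
        peaks in the result indicate repetition rates in the signal."""
        f, t, Sxx = spectrogram(x, fs=fs, nperseg=nperseg, noverlap=nperseg // 2)
        frame_rate = 1.0 / (t[1] - t[0])  # spectrogram frames per second
        mod = np.abs(np.fft.rfft(Sxx - Sxx.mean(axis=1, keepdims=True), axis=1))
        mod_freqs = np.fft.rfftfreq(Sxx.shape[1], d=1.0 / frame_rate)
        return f, mod_freqs, mod  # (audio freq, repetition freq, power)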
But truthfully 90% of work related programming is not problem solving, it's implementing business logic. And dealing with poor, ever changing customer specs. Which an llm will not help with.
TeMPOraL
> But truthfully 90% of work related programming is not problem solving, it's implementing business logic. And dealing with poor, ever changing customer specs. Which an llm will not help with.
Au contraire, these are exactly things LLMs are super helpful at - most of business logic in any company is just doing the same thing every other company is doing; there's not that many unique challenges in day-to-day programming (or business in general). And then, more than half of the work of "implementing business logic" is feeding data in and out, presenting it to the user, and a bunch of other things that boil down to gluing together preexisting components and frameworks - again, a kind of work that LLMs are quite a big time-saver for, if you use them right.
bachmeier
Strongly in agreement. I've tried them and mostly come away unimpressed. If you work in a field where you have to get things right, and it's more work to double check and then fix everything done by the LLM, they're worse than useless. Sure, I've seen a few cases where they have value, but they're not much of my job. Cool is not the same as valuable.
If you think "it can't quite do what I need, I'll wait a little longer until it can" you may still be waiting 50 years from now.
giantrobot
> If you work in a field where you have to get things right, and it's more work to double check and then fix everything done by the LLM, they're worse than useless.
Most programmers understand reading code is often harder than writing it. Especially when someone else wrote the code. I'm a bit amused by the cognitive dissonance of programmers understanding that and then praising code handed to them by an LLM.
It's not that LLMs are useless for programming (or other technical tasks) but they're very junior practitioners. Even when they get "smarter" with reasoning or more parameters their nature of confabulation means they can't be fully trusted in the way their proponents suggest we trust them.
It's not that people don't make mistakes but they often make reasonable mistakes. LLMs make unreasonable mistakes at random. There's no way to predict the distribution of their mistakes. I can learn a human junior developer sucks at memory management or something. I can ask them to improve areas they're weak in and check those areas of their work in more detail.
I have to spend a lot of time reviewing all output from LLMs because there's rarely rhyme or reason to their errors. They save me a bunch of typing but replace a lot of my savings with reviews and debugging.
throw234234234
My view is that it will be some time before they can, precisely because of their success in the software domain - not because LLMs aren't capable as a tech, but because data owners and practitioners in other domains will resist the change. From the SWE experience, news reports, financial magazines, etc., many are preparing accordingly, even if it is a subconscious thing. People don't like change, and don't want to be threatened when it is them at risk - no one wants what happened to artists and now SWEs to happen to their profession. They are happy for other professions to "democratize/commoditize" as long as it isn't them - after all, this increases their purchasing power. Don't open source knowledge/products, don't let AI near your vertical domain, continue to command a premium for as long as you can - I've heard variations of this in many AI conversations. It's much easier in oligopoly and monopoly-like domains, and/or domains where knowledge was known to be a moat even when mixed with software, as you have more trust that competitors won't do the same.
For many industries/people, work is a means to earn, not something to be passionate about for its own sake. It's a means to provide for other things in life you are actually passionate about (e.g. family, lifestyle, etc.). In the end AI may get your job eventually, but if it gets to you much later than other industries/domains, you win from a capital perspective: other goods get cheaper while you still command your pre-AI scarcity premium. That makes it easier to acquire more assets from the early disrupted industries and shields you from AI eventually taking over.
I'm seeing this directly in software. Fewer new frameworks/libraries/etc. outside the AI domain are being published, IMO, and there's more apprehension from companies about open sourcing their work and/or exposing what they do. Attracting talent is also no longer as strong a reason to showcase what you do to prospective employees - economic conditions and/or AI make that less necessary as well.
aetherson
I know at least two attorneys who use LLMs productively.
As with all LLM usage right now, it's a tool and not fit for every purpose. But it has legit uses for some attorney tasks.
NoGravitas
I frequently see news stories where attorneys get in trouble for using LLMs because they cite hallucinated case law, for example. If they hadn't been caught, that would look the same as using them "productively".
croes
The problem is it's marketed as a general tool that, at least in the future, will work near-perfectly if we just provide enough data and computing power.
anon291
> For some tasks they're still next to useless, and people who do those tasks understandably don't get the hype.
This is because programmers talk on the forums that programmers scrape to get data to train the models.
topaz0
Honestly it's worse than this. A good lab biologist/chemist will try to use it, understand that it's useless, and stop using it. A bad lab biologist/chemist will try to use it, think that it's useful, and then it will make them useless by giving them wrong information. So it's not just that people over-index when it is useful, they also over-index when it's actively harmful but they think it's useful.
brookst
You think good biologists never need to summarize work into digestible language, or fill out multiple huge, redundant grant applications with the same info, or reformat data, or check that a writeup accurately reflects data?
I’m not a biologist (good or bad) but the scientists I know (who I think are good) often complain that most of the work is drudgery unrelated to the science they love.
tiahura
This attorney uses it all day every day.
rcdwealth
[dead]
gloosx
The problem Sabine tries to communicate is that reality is different from what the cash-heads behind main commercial models are trying to portray. They push the narrative that they’ve created something akin to human cognition, when in reality, they’ve just optimised prediction algorithms on an unprecedented scale. They are trying to say that they created Intelligence, which is the ability to acquire and apply knowledge and skills, but we all know the only real Intelligence they are creating is the collection of information of military or political value.
The technology is indeed amazing and very amusing, but like all the good things in the hands of corporate overlords, it will be slowly turning into profit-milking abomination.
lnenad
> They push the narrative that they’ve created something akin to human cognition
This is your interpretation of what these companies are saying. I'd love to see it - has any company specifically claimed anything like that?
Out of the last 100 years, how many inventions have been made that could make any human awe like LLMs do right now? How many things from today, when brought back into 2010, would make the person using them feel like they're being tricked or pranked? We already take them for granted even though they've only been around for less than half a decade.
LLMs aren't a catch-all solution to the world's problems, or something that is going to help us in every facet of our lives, or an accelerator for every industry that exists out there. But at no point in history could you talk to your phone about general topics, get information, practice language skills, build an assistant that teaches your kid the basics of science, use something to accelerate your work in many different ways, etc...
Looking at LLMs shouldn't be boolean; it shouldn't be a choice between "they're the best thing ever invented" and "they're useless". But it seems like everyone presents the issue in this manner, and Sabine is part of that problem.
gloosx
No major company directly states "We have created human-like intelligence," but they intentionally use suggestive language that leads people to think AI is approaching human cognition. This helps with hype, investment, and PR.
>I'd love to see if some company specifically anything like that?
1. Microsoft researchers: Sparks of Artificial General Intelligence: Early experiments with GPT-4 - https://arxiv.org/abs/2303.12712
2. "GPT-4 is not AGI, but it does exhibit more general intelligence than previous models." - Sam Altman
3. Musk has claimed that AI is on the path to "understanding the universe." His branding of Tesla's self-driving AI as "Full Self-Driving" (FSD) also misleadingly suggests a level of autonomous reasoning that doesn't exist.
4. Meta's AI chief scientist, Yann LeCun, has repeatedly said they are working on giving AI "common sense" and "world models" similar to how humans think.
>Out of the last 100 years how many inventions have been made that could make any human awe like llms do right now?
ELIZA is an early natural language processing computer program developed from 1964 to 1967
ELIZA's creator, Weizenbaum, intended the program as a method to explore communication between humans and machines. He was surprised and shocked that some people, including his own secretary, attributed human-like feelings to the computer program. That was 60 years ago.
So as you can see, us humans are not too hard to fool with this.
hansmayer
How about Sam Altman literally saying on twitter "We know how to build AGI now"? That close enough?
re-thc
> Out of the last 100 years how many inventions have been made that could make any human awe like llms do right now?
Lots e.g. vacuum cleaners.
> But at no point in history could you talk to your phone
You could always "talk" to your phone just like you could "talk" to a parrot or a dog. What does that even mean?
If we're talking about LLMs, I still haven't been able to have a real conversation with one. There's too much of a lag to feel like a conversation, and it often doesn't reply with anything relevant.
sevensor
> This is your interpretation of what these companies are saying. I'd love to see if some company specifically anything like that?
What is the layman to make of the claim that we now have “reasoning” models? Certainly sounds like a claim of human-like cognition, even though the reality is different.
noufalibrahim
Much as i agree with the point about overhyping from companies, I'd be more sympathetic to this point of view if she acknowledged the merits of the technology.
Yes, it hallucinates and if you replace your brain with one of these things, you won't last too long. However, it can do things which, in the hands of someone experienced, are very empowering. And it doesn't take an expert to see the potential.
As it stands, it sounds like a case of "it's great in practice but the important question is how good it is in theory."
nottorp
If it works for you...
I use LLMs. They're somewhat useful if you're on a non-niche problem. They're also useful instead of search engines, but that's because search has been enshittified more than because an LLM is better.
However 90% of the marketing material about them is simply disgusting. The bigwigs sound like they're spreading a new religion, and most enthusiasts sound like they're new converts to some sect.
If you're marketing it as a tool, fine. If you're marketing it as the third and fourth coming of $DEITY, get lost.
TeMPOraL
I hate to bring an ad hominem into this, but Sabine is a YouTube influencer now. That's her current career. So I'd assume this Tweet storm is also pushing a narrative on its own, because that's part of doing the work she chose to do to earn a living.
Pinch of salt & all.
ben_w
While true, I think this is more likely a question of framing or anchoring — I am continuously impressed and surprised by how good AI is, but I recognise all the criticisms she's making here. They're amazing, but at the same time they make very weird mistakes.
They actually remind me of myself, as I experience being a native English speaker now living in Berlin and attempting to use a language I mainly learned as an adult.
I can often appear competent in my use of the language, but then I'll do something stupid like asking someone in the office if we have a "Gabelstapler" I can borrow — Gabelstapler is "forklift truck", I meant to ask for a stapler, which is "Tacker" or "Hefter", and I somehow managed to make this mistake directly after carefully looking up the word. (Even this is a big improvement for me, as I started off like Officer Crabtree from Allo' Allo').
mdp2021
What you have done there is to discount statements that may build up a narrative - and yet may still be fair... On what basis? Possibly because they do not match your own narrative?
tim333
LLMs seem akin to parts of human cognition, maybe the initial fast-thinking bit when ideas pop up in a second or two. But any human writing a review with links to sources would look them up and check they are the right ones that match the initial idea. Current LLMs don't seem to do that, at least the ones Sabine complains about.
Akin to human cognition but still a few bricks short of a load, as it were.
brookst
You lay the rhetoric on so thick ("cash-heads", "pushing the narrative", "corporate overlords", "profit-milking abomination") that it's hard to understand your actual claim.
Are you trying to say that LLMs are useful now but you think that will stop being the case at some point in the future?
abecedarius
If it's just cash-heads pushing a narrative, where do Bengio and Hinton fit? Stuart Russell? Doug Hofstadter?
I mean fine, argue that they're mistaken to be concerned, if that's your belief. But dismissing it all as obvious shilling is not that argument.
jchw
Look man, and I'm saying this not to you but to everyone who is in this boat; you've got to understand that after a while, the novelty wears off. We get it. It's miraculous that some gigabytes of matrices can possibly interpret and generate text, images, and sound. It's fascinating, it really is. Sometimes, it's borderline terrifying.
But, if you spend too much time fawning over how impressive these things are, you might forget that something being impressive doesn't translate into something being useful.
Well, are they useful? ... Yeah, of course LLMs are useful, but we need to remain somewhat grounded in reality. How useful are LLMs? Well, they can dump out a boilerplate React frontend to a CRUD API, so I can imagine it could very well be harmful to a lot of software jobs, but I hope it doesn't bruise too many egos to point out that dumping out yet another UI that does the same thing we've done 1,000,000 times before isn't exactly novel. So it's useful for some software engineering tasks. Can it debug a complex crash? So far I'm around zero for ten and believe me, I'm trying. From Claude 3.7 to Gemini 2.5, Cursor to Claude Code, it's really hard to get these things to work through a problem the way anyone above the junior dev level can. Almost universally, they just keep digging themselves deeper until they eventually give up and try to null out the code so that the buggy code path doesn't execute.
So when Sabine says they're useless for interpreting scientific publications, I have zero trouble believing that. Scoring high on some shitty benchmarks whose solutions are in the training set is not akin to generalized knowledge. And these huge context windows sound impressive, but dump a moderately large document into them and it's often a challenge to get them to actually pay attention to the details that matter. The best shot you have by far is if the document you need it to reference definitely was already in the training data.
It is very cool and even useful to some degree what LLMs can do, but just scoring a few more points on some benchmarks is simply not going to fix the problems current AI architecture has. There is only one Internet, and we literally lit it on fire to try to make these models score a few more points. The sooner the market catches up to the fact that they ran out of Internet to scrape and we're still nowhere near the singularity, the better.
hansmayer
100% this. I think we should start producing independent evaluations of these tools for their usefulness, not for whatever made-up or convoluted evaluation index OpenAI, Google, or Anthropic throw at us.
raincole
> the novelty wears off.
Hardly. I have been using LLMs at least weekly (most of the time daily) since GPT-3.5. I am still amazed. It's really, really hard for me not to be bullish.
It kinda reminds me of the days I learned the Unix-like command line. At least once a week, I shouted to myself: "What? There is a one-liner that does that? People use awk/sed/xargs this way??" That's how I feel about LLMs so far.
jchw
I tried LLMs for generating shell snippets. Mixed bag for me. It seems to have a hard time making portable awk/sed commands. It also really overcomplicates things; you really don't need to break out awk for most simple file renaming tasks. Lesser used utilities, all bets are off.
Yesterday Gemini 2.5 Pro suggested running "ps aux | grep filename.exe" to find a Wine process (pgrep is the much better way to go for that, but it's still wrong here) and get the PID, then pass that into "winedbg --attach" which is wrong in two different ways, because there is no --attach argument and the PID you pass into winedbg needs to be the Win32 one not the UNIX one. Not an impressive showing. (I already knew how to do all of this, but I was curious if it had any insights I didn't.)
For people with less experience I can see how getting e.g. tailored FFmpeg commands generated is immensely useful. On the other hand, I spent a decent amount of effort learning how to use a lot of these tools and for most of the ways I use them it would be horrific overkill to ask an LLM for something that I don't even need to look anything up to write myself.
Will people in the future simply not learn to write CLI commands? Very possible. However, I've come to a different, related conclusion: I think that these areas where LLMs really succeed in are examples of areas where we're doing a lot of needless work and requiring too much arcane knowledge. This counts for CLI usage and web development for sure. What we actually want to do should be significantly less complex to do. The LLM actually sort of solves this problem to the extent that it works, but it's a horrible kludge solution. Literally converting video files and performing basic operations on them should not require Googling reference material and Q&A websites for fifteen minutes. We've built a vastly overly complicated computing environment and there is a real chance that the primary user of many of the interfaces will eventually not even be humans. If the interface for the computer becomes the LLM, it's mostly going to be wasted if we keep using the same crappy underlying interfaces that got us into the "how do I extract tar file" problem in the first place.
viraptor
> dumping out yet another UI that does the same thing we've done 1,000,000 times before isn't exactly novel
And yet that's exactly what people get paid to do every day. And if it saves them time, they won't exactly get bored of that feature.
sarchertech
They really don’t. People say this all the time, but you give any project a little time and it evolves into a special unique snowflake every single time.
That’s why every low code solution and boilerplate generator for the last 30 years failed to deliver on the promises they made.
jchw
If your job can be hollowed out into >90% entering prompts into AI text editors, you won't have to worry about continuing to be paid to do it every day for very long.
dartharva
> Well, are they useful? ... Yeah, of course LLMs are useful, but we need to remain somewhat grounded in reality. How useful are LLMs?
They are useful enough that they can passably replace (much more expensive) humans in a lot of noncritical jobs, thus being a tangible tool for securing enterprise bottom lines.
myko
Which jobs? I haven't seen LLMs successfully replace more expensive humans in noncritical roles
casey2
>they can dump out a boilerplate react frontend to a CRUD API
This is so clearly biased that it borders on parody. You can only get out what you put in. The real use case of current LLMs is that any project that would previously require collaboration can now be done solo with a much faster turnover. Of course, in 20 years when compute finally catches up, they will just be superintelligent AGI.
jchw
I have Cursor running on my machine right now. I am even paying for it. This is in part because no matter what happens, people keep professing, basically every single time a new model is released, that it has finally happened: programmers are finally obsolete.
Despite the ridiculous hype, though, I have found that these things have crossed into usefulness. I imagine for people with less experience, these tools are a godsend, enabling them to do things they definitely couldn't do on their own before. Cool.
Beyond that? I definitely struggle to find things I can do with these tools that I couldn't do better without. The main advantage so far is that these tools can do these things very fast and relatively cheaply. Personally, I would love to have a tool that I can describe what I want in detailed but plain English and have it be done. It would probably ruin my career, but it would be amazing for building software. It'd be like having an army of developers on your desktop computer.
But, alas, a lot of the cool shit I'd love to do with LLMs doesn't seem to pan out. They're really good at TypeScript and web stuff, but their proficiency definitely tapers off as you veer out. It seems to work best when you can find tasks that basically amount to translation, like converting between programming languages in a fuzzy way (e.g. trying to translate idioms). What's troubling me the most is that they can generate shitloads of code but basically can't really debug the code they write beyond the most entry-level problem-solving. Reverse engineering also seems like an amazing use case, but the implementations I've seen so far definitely are not scratching the itch.
> Of course in 20 years when compute finally catches up they will just be super intelligent AGI
I am betting against this. Not the "20 years" part, it could be months for all we know; but the "compute finally catches up" part. Our brains don't burn kilowatts of power to do what they do, yet given basically unbounded time and compute, current AI architectures are simply unable to do things that humans can, and there aren't many benchmarks that are demonstrating how absolutely cataclysmically wide the gap is.
I'm certain there's nothing magical about the meat brain, as much as that is existentially challenging. I'm not sure that this follows through to the idea that you could replicate it on a cluster of graphics cards, but I'm also not personally betting against that idea, either. On the other hand, getting the absurd results we have gotten out of AI models today didn't involve modest increases. It involved explosive investment in every dimension. You can only explode those dimensions out so far before you start to run up against the limitations of... well, physics.
Maybe understanding what LLMs are fundamentally doing to replicate what looks to us like intelligence will help us understand the true nature of the brain or of human intelligence, hell if I know, but what I feel most strongly about is this: I do not believe LLMs are replicating some portion of human intelligence. They are very obviously neither a subset nor a superset of it, nor particularly close to either. They are some weird entity that overlaps in other ways we don't fully comprehend yet.
pebble
Complete hyperbole.
lolinder
I see a difference between seeing them as valuable in their current state vs being "bullish about LLMs" in the stock market sense.
The big problem with being bullish in the stock market sense is that OpenAI isn't selling the LLMs that currently exist to their investors, they're selling AGI. Their pitch to investors is more or less this:
> If we accomplish our goal we (and you) will have infinite money. So the expected value of any investment in our technology is infinite dollars. No, you don't need to ask what the odds are of us accomplishing our goal, because any percent times infinity is infinity.
Since OpenAI and all the founders riding on their coat tails are selling AGI, you see a natural backlash against LLMs that points out that they are not AGI and show no signs of asymptotically approaching AGI—they're asymptotically approaching something that will be amazing and transformative in ways that are not immediately clear, but what is clear to those who are watching closely is that they're not approaching Altman's promises.
The AI bubble will burst, and it's going to be painful. I agree with the author that that is inevitable, and it's shocking how few people see it. But also, we're getting a lot of cool tech out of it and plenty of it is being released into the open and heavily commoditized, so that's great!
lostmsu
I think that people who don't believe LLMs to be AGI are not very good at Venn diagrams. Because they certainly are artificial, general, and intelligent according to any dictionary.
vunderba
Good grief. You are deeply confused and/or deeply literal. That's not the accepted definition of AGI in any sense. One does not evaluate each word as an isolated component when testing the truth of an open compound word. Does your "living room" have organs?
For your edification:
https://en.wikipedia.org/wiki/Artificial_general_intelligenc...
mplanchard
By this argument the DEA is the FBI because they are also a federal bureau that does investigations.
sarchertech
That’s not how language works.
hansmayer
Oh, the irony of using the verb "believe" in the same sentence with Venn diagrams... :)
zetazzed
I feel like LLMs are the same as the leap from "world before web search" to "world after web search." Yeah, in Google you get crap links for sure, and you have to wade through salesy links and random blogs. But in the pre-web-search world, your options were generally "ask a friend who seems smart" or "go to the library for quite a while," AND BOTH OF THOSE OPTIONS HAD PLENTY OF ISSUES. I found a random part in an old Arduino kit I bought years ago, and GPT-4o correctly identified it and explained exactly how to hook it up and code for it. That is frickin awesome, and it saves me a ton of time and leads me to reuse the part. I used DeepResearch to research car options that fit my exact needs, and it was 100% spot on - multiple people have suggested models that DeepResearch did not identify that would be a fit, but every time I dig in, I find that DeepResearch was right and the alternative actually had some dealbreaker I had specified. Etc., etc.
In the 90s, Robert Metcalfe infamously wrote "Almost all of the many predictions now being made about 1996 hinge on the Internet’s continuing exponential growth. But I predict the Internet, which only just recently got this section here in InfoWorld, will soon go spectacularly supernova and in 1996 catastrophically collapse." I feel like we are just hearing LLM versions of this quote over and over now, but they will prove to be equally accurate.
mdp2021
> versions of this quote
Generic. For the Internet, the more complex questions would have been "What are the potential benefits, what are the potential risks, what will grow faster", etc. The problem is not the growth but what that growth means. For LLMs, the big clear question is "will they stop just being LLMs, and when will they?" Progress is seen, but we seek a revolution.
jryan49
It would be fine if it were sold that way, but there is so much hype. We're told that it's going to replace all of us and put us all out of our jobs. They set the expectations so high. Like, remember OpenAI showing a video of it doing your taxes for you? Predictions that superintelligent AI is going to radically transform society faster than we can keep up? I think that's where most of the backlash is coming from.
autoexec
> We're told that it's going to replace all of us and put us all out our jobs.
I think this is the source of a lot of the hype. There are people salivating at the thought of no longer needing to employ the peasant class. They want it so badly that they'll say anything to get more investment in LLMs even if it might only ever allow them to fire a fraction of their workers, and even if their products and services suffer because the output they get with "AI" is worse than what the humans they throw away were providing.
They know they're overselling it, but they're also still on their knees praying that by some miracle their LLMs trained on the collective wisdom of facebook and youtube comments will one day gain actual intelligence and they can stop paying human workers.
In the meantime, they'll shove "AI" into everything they can think of for testing and refinement. They'll make us beta test it for them. They don't really care if their AI makes your customer service experience go to shit. They don't care if their AI screws up your bill. They don't care if their AI rejects your claims or you get denied services you've been paying for and are entitled to. They don't care if their AI unfairly denies you parole or mistakenly makes you the suspect of a crime. They don't care if Dr. Sbaitso 2.0 misdiagnoses you. Your suffering is worth it to them as long as they can cut their headcount by any amount and can keep feeding the AI more and more information because just maybe with enough data one day their greatest dream will become reality, and even if that never happens a lot of people are currently making massive amounts of money selling that lie.
The problem is that the bubble will burst eventually. The more time goes by and AI doesn't live up to the hype the harder that hype becomes to sell. Especially when by shoving AI into everything they're exposing a lot of hugely embarrassing shortcomings. Repeating "AI will happen in just 10 more years" gives people a lot of time to make money and cash out though.
On the plus side, we do get some cool toys to play with and the dream of replacing humans has sparked more interest in robotics so it's not all bad.
tempestn
Yeah, it won't do your taxes for you, but it can sure help you do them yourself. Probably won't put you out of your job either, but it might help you accomplish more. Of course, one result of people accomplishing more in less time is that you need fewer people to do the same amount of work - so some jobs could be lost. But it's also possible that for the most part instead, more will be accomplished overall.
codr7
People frame that like it's something we gain, efficiency, as if before we were wasting time by thinking for ourselves. I get that they can do certain things better, I'm not sure that delegating to them is free of charge. We're paying something, losing something. Probably learning and fulfillment. We become increasingly dependent on machines to do anything.
Something important happened when we turned the tables around, I don't feel it gets the credit it should. It used to be humans telling machines what to do. Now we're doing the opposite.
CamperBob2
If it had access to my books, a current-gen frontier LLM most certainly could do my taxes. I don't understand that entire line of reasoning.
hansmayer
Forget OpenAI ChatGPT doing your taxes for you. Now Gemini will write up your sales slides about Gouda cheese, stating wrongly in the process that gouda makes up about 50% of all cheese consumption worldwide :) These use-cases are getting more useful by the day ;)
gilbetron
I mean, it's been like 3 years. 3 years after the web came out was barely anything. 3 years after the first GPU was cool, but not that cool. The past three years in LLMs? Insane.
Things could stall out and we'll have bumps and delays ... I hope. If this thing progresses at the same pace, or speeds up, well ... reality will change.
Or not. Even as they are, we can build some cool stuff with them.
MostlyStable
My experience (almost exclusively Claude), has just been so different that I don't know what to say. Some of the examples are the kinds of things I explicitly wouldn't expect LLMs to be particularly good at so I wouldn't use them for, and others, she says that it just doesn't work for her, and that experience is just so different than mine that I don't know how to respond.
I think that there are two kinds of people who use AI: people who are looking for the ways in which AIs fail (of which there are still many) and people who are looking for the ways in which AIs succeed (of which there are also many).
A lot of what I do is relatively simple one off scripting. Code that doesn't need to deal with edge cases, won't be widely deployed, and whose outputs are very quickly and easily verifiable.
LLMs are almost perfect for this. It's generally faster than me looking up syntax/documentation, when it's wrong it's easy to tell and correct.
Look for the ways that AI works, and it can be a powerful tool. Try and figure out where it still fails, and you will see nothing but hype and hot air. Not every use case is like this, but there are many.
-edit- Also, when she says "none of my students has ever invented references that just don't exist"...all I can say is "press X to doubt"
bluefirebrand
> Look for the ways that AI works, and it can be a powerful tool. Try and figure out where it still fails, and you will see nothing but hype and hot air. Not every use case is like this, but there are many.
The problem is that I feel I am constantly being bombarded by people bullish on AI saying "look how great this is" but when I try to do the exact same things they are doing, it doesn't work very well for me
Of course I am skeptical of positive claims as a result.
MostlyStable
I don't know what you are doing or why it's failed. Maybe my primary use cases really are in the top whatever percentile for AI usefulness, but it doesn't feel like it. All I know is that frontier models have already been good enough for more than a year to increase my productivity by a fair bit.
nonchalantsui
Your use case is in fact in the top whatever percentile for AI usefulness. Short simple scripting that won't have to be relied on due to never being widely deployed. No large codebase it has to comb through, no need for thorough maintenance and update management, no need for efficient (and potentially rare) solutions.
The only use case that would beat yours is the type of office worker that cannot write professional sounding emails but has to send them out regularly manually.
turtletontine
I literally had a developer of an open source package I’m working with tell me “yeah that’s a known problem, I gave up on trying to fix it. You should just ask ChatGPT to fix it, I bet it will immediately know the answer.”
Annoying response of course. But I’d never used an LLM to debug before, so I figured I’d give it a try.
First: it regurgitated a bunch of documentation and basic debugging tips, which might have actually been helpful if I had just encountered this problem and had put no thought into debugging it yet. In reality, I had already spent hours on the problem. So not helpful
Second: I provided some further info on environment variables I thought might be the problem. It latched on to that. “Yes that’s your problem! These environment variables are (causing the problem) because (reasons that don’t make sense). Delete them and that should fix things.” I deleted them. It changed nothing.
Third: It hallucinated a magic numpy function that would solve my problem. I informed it this function did not exist, and it wrote me a flowery apology.
Clearly AI coding works great for some people, but this was purely an infuriating distraction. Not only did it not solve my problem, it wasted my time and energy, and threw tons of useless and irrelevant information at me. Bad experience.
mystified5016
The biggest thing I've found is that if you give any hint at all as to what you think the problem is, the LLM will immediately and enthusiastically agree, no matter how wildly incorrect your suggestion is.
If I give it all my information and add "I think the problem might be X, but I'm not sure", the LLM always agrees that the problem is X and will reinterpret everything else I've said to 'prove' me right.
Then the conversation is forever poisoned and I have to restart an entirely new chat from scratch.
98% of the utility I've found in LLMs is getting it to generate something nearly correct, but which contains just enough information for me to go and Google the actual answer. Not a single one of the LLMs I've tried have been any practical use editing or debugging code. All I've ever managed is to get it to point me towards a real solution, none of them have been able to actually independently solve any kind of problem without spending the same amount of time and effort to do it myself.
dale_glass
On the other hand, when it works it's darn near magic.
I spent like a week trying to figure out why a livecd image I was working on wasn't initializing devices correctly. Read the docs, read source code, tried strace, looked at the logs, found forums of people with the same problem but no solution, you know the drill. In desperation I asked ChatGPT. ChatGPT said "Use udevadm trigger". I did. Things started working.
For some problems it's just very hard to express them in a googleable form, especially if you're doing something weird almost nobody else does.
sdenton4
This morning I was using an LLM to develop some SQL queries against a database it had never seen before. I gave it a starting point, and outlined what I wanted to do. It proposed a solution, which was a bit wrong, mostly because I hadn't given it the full schema to work with. Small nudges and corrections, and we had something that worked. From there, I iterated and added more features to the outputs.
At many points, the code would have an error; to deal with this, I just supply the error message, as-is to the LLM, and it proposes a fix. Sometimes the fix works, and sometimes I have to intervene to push the fix in the right direction. It's OK - the whole process took a couple hours, and probably would have been a whole day if I were doing it on my own, since I usually only need to remember anything about SQL syntax once every year or three.
A key part of the workflow, imo, was that we were working in the medium of the actual code. If the code is broken, we get an error, and can iterate. Asking for opinions doesn't really help...
pikelet
This is exactly my experience, every time! If I offer it the slightest bit of context it will say 'Ah! I understand now! Yes, that is your problem, …' and proceed to spit out some non-existent function, sometimes the same one it has just suggested a few prompts ago which we already decided doesn't exist/work. And it just goes on and on giving me 'solutions' until I finally realise it doesn't have the answer (which it will never admit unless you specifically ask it to – forever looking to please) and give up.
ajkdhcb2
My experiences have all been like this too. I am puzzled by how some people say it works for them
danielmarkbruce
Exactly which model did you use? You talk about LLMs as though they are all the same.
Alien 1: I gave Jeff Dean a giant complex system to build, he crushed it! Humans are so smart.
Alien 2: I gave a random human a simple programming problem and he just stared at me like an idiot. Humans suck.
kerkeslager
It's worse than that.
I see people say, "Look how great this is," and show me an example, and the example they show me is just not great. We're literally looking at the same thing, and they're excited that this LLM can do a college grad's job to the level of a third grader, and I'm just not excited about that.
thisissomething
What changed my point of view regarding LLMs was when I realized how crucial context is in increasing output quality.
Treat the AI as a freelancer working on your project. How would you ask a freelancer to create a Kanban system for you? By simply asking "Create a Kanban system", or by providing them with a 2-3 page document describing features, guidelines, restrictions, requirements, dependencies, design ethos, etc?
Which approach will get you closer to your objective?
The same applies to an LLM (when it comes to code generation). When well instructed, it can quickly generate a lot of working code, and apply the necessary fixes/changes you request inside that same context window.
It still can't generate senior-level code, but it saves hours when doing grunt work or prototyping ideas.
"Oh, but the code isn't perfect".
Nor is the code of the average jr dev, but their codes still make it to production in thousands of companies around the world.
joquarky
I see it as a knowledge multiplier. You still need to know enough about the subject to verify the output.
kristopolous
They're sophisticated tools, as much as any other software.
About 2 weeks ago I started on a streaming markdown parser for the terminal because none really existed. I've switched to human coding now but the first version was basically all llm prompting and a bunch of the code is still llm generated (maybe 80%). It's a parser, those are hard. There's stacks, states, lookaheads, look behinds, feature flags, color spaces, support for things like links and syntax highlighting... all forward streaming. Not easy
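A toy sketch of the forward-streaming idea (not the commenter's parser, just an illustration of why state has to survive across chunks; the bold-only handling and class name are made up for the example):

    import sys

    class StreamingBold:
        """Toy forward-streaming renderer: turns **bold** spans into ANSI bold
        as characters arrive, never looking more than one character ahead."""

        def __init__(self):
            self.pending_star = False  # saw one '*', waiting to see if a second follows
            self.bold = False          # are we currently inside a **...** span?

        def feed(self, chunk: str) -> None:
            for ch in chunk:
                if ch == "*":
                    if self.pending_star:
                        # Two '*' in a row: toggle bold on or off.
                        sys.stdout.write("\x1b[0m" if self.bold else "\x1b[1m")
                        self.bold = not self.bold
                        self.pending_star = False
                    else:
                        self.pending_star = True
                else:
                    if self.pending_star:
                        sys.stdout.write("*")  # the lone '*' was literal after all
                        self.pending_star = False
                    sys.stdout.write(ch)
            sys.stdout.flush()

    r = StreamingBold()
    for chunk in ["Stream**ing ma", "rkdown** arriving in chunks\n"]:
        r.feed(chunk)

Multiply that kind of carried state by every markdown construct (links, fences, nesting) and it's clear why a streaming parser is fiddly, and also why it's the kind of well-specified grunt work an LLM can help draft.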
mattigames
Exactly, thanks to all the money involved in the hype, the incentives will always skew towards spamming naive optimism about its features.
Sohcahtoa82
> LLMs are almost perfect for this. It's generally faster than me looking up syntax/documentation, when it's wrong it's easy to tell and correct.
Exactly this.
I once had a function that would generate several .csv reports. I wanted these reports to then be uploaded to s3://my_bucket/reports/{timestamp}/*.csv
I asked ChatGPT "Write a function that moves all .csv files in the current directory to an old_reports directory, calls a create_reports function, then uploads all the csv files in the current directory to s3://my_bucket/reports/{timestamp}/*.csv with the timestamp in YYYY-MM-DD format"
And it created the code perfectly. I knew what the correct code would look like, I just couldn't be fucked to look up the exact calls to boto3, whether moving files was os.move or os.rename or something from shutil, and the exact way to format a datetime object.
It created the code far faster than I would have.
Like, I certainly wouldn't use it to write a whole app, or even a whole class, but individual blocks like this, it's great.
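For reference, a sketch of roughly what that generated function ends up looking like (the bucket, prefix, and the create_reports call come from the comment above; everything else is an illustrative assumption):

    import glob
    import os
    import shutil
    from datetime import datetime, timezone

    import boto3

    def create_reports() -> None:
        """Stand-in for the commenter's existing report generator, assumed
        to write fresh .csv files into the current directory."""
        ...

    def refresh_reports(bucket: str = "my_bucket") -> None:
        # Move any existing .csv reports out of the way.
        os.makedirs("old_reports", exist_ok=True)
        for path in glob.glob("*.csv"):
            shutil.move(path, os.path.join("old_reports", os.path.basename(path)))

        # Regenerate the reports.
        create_reports()

        # Upload the new reports under a YYYY-MM-DD timestamped prefix.
        timestamp = datetime.now(timezone.utc).strftime("%Y-%m-%d")
        s3 = boto3.client("s3")
        for path in glob.glob("*.csv"):
            s3.upload_file(path, bucket, f"reports/{timestamp}/{os.path.basename(path)}")

The point isn't that this is hard; it's that shutil.move vs. os.rename vs. boto3's upload_file is exactly the kind of detail that is faster to review than to look up.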
taberiand
I have been saying this about llms for a while - if you know what you want, how to ask for it, and what the correct output will look like, LLMs are fantastic (at least Claude Sonnet is). And I mean that seriously, they are a highly effective tool for productive development for senior developers.
I use it to produce whole classes, large sql queries, terraform scripts, etc etc. I then look over that output, iterate on it, adjust it to my needs. It's never exactly right at first, but that's fine - neither is code I write from scratch. It's still a massive time saver.
selcuka
> they are a highly effective tool for productive development for senior developers
I think this is the most important bit many people miss. It is advertised as an autonomous software developer, or something that can take a junior to senior levels, but that's just advertising.
It is actually most useful for senior developers, as it does the grunt work for them, while grunt work is actually useful work for a junior developer as a learning tool.
thisissomething
"How to ask for it" is the most important part. As soon as you realize that you have to provide the AI with CONTEXT and clear instructions (you know, like a top-notch story card on a scrum board), the quality and assertiveness of the results increase a LOT.
Yes, it WON'T produce senior-level code for complex tasks, but it's great at tackling junior- to mid-level code generation/refactoring, with minor adjustments (just like a code review).
So, it's basically the same thing as having a freelancer jr dev at your disposal, but it can generate working code in 5 min instead of 5 hours.
simonw
I've had so many cases exactly like your example here. If you build up an intuition that knows that e.g. Claude 3.7 Sonnet can write code that uses boto3, and boto3 hasn't had any breaking changes that would affect S3 usage in the past ~24 months, you can jump straight into a prompt for this kind of task.
It doesn't just save me a ton of time, it results in me building automations that I normally wouldn't have taken on at all because the time spent fiddling with os.move/boto3/etc wouldn't have been worthwhile compared to other things on my plate.
fumeux_fume
I think you have an interesting point of view and I enjoy reading your comments, but it sounds a little absurd and circular to discount people's negativity about LLMs simply because it's their fault for using an LLM for something it's not good at. I don't believe in the strawman characterization of people giving LLMs incredibly complex problems and being unreasonably judgmental about the unsatisfactory results.
I work with LLMs every day. Companies pay me good money to implement reliable solutions that use these models, and it's a struggle. Currently I'm working with Claude 3.5 to analyze customer support chats. Just as many times as it makes impressive, nuanced judgments, it fails to correctly make simple, trivial judgments. Just as many times as it follows my prompt to a tee, it also forgets or ignores important parts of my prompt. So the problem for me is that it's incredibly difficult to know when it'll succeed and when it'll fail for a given input.
Am I unreasonable for having these frustrations? Am I unreasonable for doubting the efficacy of LLMs to address problems that many believe are already solved? Can you understand my frustration at seeing people characterize me as such because ChatGPT made a really cool image for them once?
cjf101
It's a weird circle with these things. If you _can't_ do the task you are using the LLM for, you probably shouldn't.
But if you can do the task well enough to at least recognize likely-to-be-correct output, then you can get a lot done in less time than you would without their assistance.
Is that worth the second order effects we're seeing? I'm not convinced, but it's definitely changed the way we do work.
rurp
I think this points to much of the disagreement over LLMs. They can be great at one-off scripts and other similar tasks like prototypes. Some folks who do a lot of that kind of work find the tools genuinely amazing. Other software engineers do almost none of that and instead spend their coding time immersed in large messy code bases, with convoluted business logic. Looping an LLM into that kind of work can easily be net negative.
danielmarkbruce
Maybe they are just lazy around tooling. Cursor with Claude works well for project sizes much larger than I expected but it takes a little set up. There is a chasm between engineers who use tools well and who do not.
rurp
I don't really agree with framing it as lazy. Adding more tools and steps to your workflow isn't free, and the cost/benefit of each tool will be different for everyone. I've lost count of how many times someone has evangelized a software tool to me, LLM or not. Once in a while they turn out to be useful and I incorporate them into my regular workflow, but far more often I don't. This could be for any number of reasons: it doesn't fit my workflow well, I already have a better way of doing whatever it does, or the tool adds more friction than it removes.
I'm sure spending more time fiddling with the setup of LLM tools can yield better results, but that doesn't mean that it will be worth it for everyone. In my experience LLMs fail often enough at modestly complex problems that they are more hassle than benefit for a lot of the work I do. I'll still use them for simple tasks, like if I need some standard code in a language I'm not too familiar with. At the same time, I'm not at all surprised that others have a different experience and find them useful for larger projects they work on.
glimshe
I'm tired of people bashing LLMs. AI is so useful in my daily work that I can't understand where these people are coming from. Well, whatever...
As you said, these are examples of things I wouldn't expect LLMs to be good at, from people who dismiss the scenarios where LLMs are great. I don't want to convince anyone, to be honest - I just want to say they are incredibly useful for me and a huge time saver. If people don't want to use LLMs, it's fine by me, as I'll have an edge over them in the market. Thanks for the cash, I guess.
sidewndr46
every time someone brings up "Code that doesn't need to deal with edge cases" I like to point out that such code is not likely to be used for anything that matters
brulard
Oh, but it is. I can have code that does something nice to have and doesn't need to be 100% correct, etc. For example, I want a background for my playful webpage. Maybe a WebGL shader. It might not be exactly what I asked for, but I can have it up and running in a few minutes. Or some non-critical internal tools - like a scraper for lunch menus from restaurants around the office. Or a simple parking spot sharing app. Or any kind of prototype, which in some companies are being created all the time. There are so many use cases that are forgiving regarding correctness and are much more sensitive to development effort.
nonchalantsui
There is a cost burden to not being 100% correct when it comes to programming. You simply have chosen to ignore that burden, but it still exists for others. Whether it's for example a percent of your users now getting stalled pages due to the webgl shader, or your lunch scraper ddosing local restaurants. They aren't actually forgiving regarding correctness.
Which is fine for actual testing you're doing internally, since that cost burden is then remedied by you fixing those issues. However, no feature is as free as you're making it sound, not even the "nice to have" additions that seem so insignificant.
mplanchard
I’m always amazed in these discussions how many people apparently have jobs doing a bunch of stuff that either doesn’t need to be correct or is simple enough that it doesn’t require any significant amount of external context.
joquarky
I'm always amazed by the arrogance that if you can't hack it, then everyone else can't either.
grg0
In other words, people preaching LLMs are noobs with no real stake in what they are doing. But you can't really call people noobs these days.
Panzer04
Is such code hard to write in the first place?
Automating the easy 80% sounds useful, but in practice I'm not convinced that's all that helpful. Reading and putting together code you didn't write is hard enough to begin with.
Ferret7446
It's not hard, but it's time consuming.
umanwizard
You never write code to automate something in your personal environment, or to analyze some one-off data, rather than to actually go into production?
skydhash
Not GP, but more often than not I reach for tools I already know (sed, awk, python) or read the docs, which doesn't take that much time if you know how to get to the sections you need.
SoftTalker
I write code like that all the time. It's used for very specific use cases, only by myself or something I've also written. It's not exposed to random end users or inputs.
relaxing
> Also, when she says "none of my students has ever invented references that just don't exist"...all I can say is "press X to doubt"
I’ve never seen it from my students. Why do you think this? It’s trivial to pick a real book/article. No student is generating fake material whole cloth and fake references to match. Even if they could, why would they risk it?
vanviegen
Exactly. Lazy students just refer to vaguely related but existing material that they didn't read. Much better than llms! :-)
bell-cot
TBD whether that makes the effort to spot-check their references greater (does it actually say what the student - explicitly or implicitly - claims it does?) or less (proving the non-existence of an obscure reference is proving a negative).
simonw
The most interesting thing about this post is how it reinforces how terrible the usability of LLMs still is today:
"I ask them to give me a source for an alleged quote, I click on the link, it returns a 404 error. I Google for the alleged quote, it doesn't exist. They reference a scientific publication, I look it up, it doesn't exist."
To experienced LLM users that's not surprising at all - providing citations, sources for quotes, useful URLs are all things that they are demonstrably terrible at.
But it's a computer! Telling people "this advanced computer system cannot reliably look up facts" goes against everything computers have been good at for the last 40+ years.
roxolotl
One of the things that’s hard about these discussions is that behind them is an obscene amount of money and hype. She’s not responding to realists like you. She’s responding to the bulls. The people saying these tools will be able to run the world by the end of this year, maybe next.
And that’s honestly unfair to you since you do awesome realistic and level headed work with LLM.
But I think it’s important when having discussions to understand the context within which they are occurring.
Without the bulls she might very well be saying what you are in your last paragraph. But because of the bulls the conversation becomes this insane stratified nonsense.
tootie
Possibly a reaction to Bill Gates recent statements that it will begin replacing doctors and teachers. It's ridiculous to say LLMs are incredibly useful and valuable. It's highly dubious to think they can be trusted with actual critical tasks without careful supervision.
simonw
I think it's ridiculous to say LLMs are NOT "incredibly useful and valuable", but I 100% agree that it's "highly dubious to think they can be trusted with actual critical tasks without careful supervision".
polytely
It's honestly so scary, because Sam Altman and his ilk would gladly replace all teachers with LLMs right now because it makes their lines go up; it doesn't matter to them that it would result in a generation of dumb people in like 10 years. Honestly it would just create more LLM users for them to sell to, so it's a win-win I guess, but it completely fucks up our world.
gorgoiler
“Replacing teachers” is of course laughable.
Teachers are there to observe and manage behavior, resolve conflict, identify psychological risks and get in front of fixing them, set and maintain a positive tone (“setting the weather”), lift pupils up to that tone, and to summarize, assess and report on progress.
They are also there to grind through papers, tests, lesson plans, reports, marking, and letter writing. All of that will get easier with machine assistance.
Teaching is one of the most human-nature centric jobs in the world and will be the last to go. If AI can help focus the role of teacher more on using expert people skills and less on drudgery it will hopefully even improve the prospects of teaching as a career, not eliminate it.
Davidzheng
Capability today and next year will probably be very different in reliability
ANighRaisin
This isn't really a problem in tool-assisted LLMs.
Use google AI studio with search grounding. Provides correct links and citations every time. Other companies have similar search modes, but you have to enable those settings if you want good results.
IncreasePosts
Okay, but it's weird there is a "don't lie to me" button.
owenpalmer
The "don't lie to me" button for a human is asking them, "where did you learn that fact?"
Grounding isn't very different from that.
knowaveragejoe
How would that ever work? The only thing you can do is continue to refine high quality data sets to train on. The rate of hallucination only trends downwards on the high end models as they improve in various ways.
bradfox2
It's wikipedia in the 00s all over again being preached by roughly the same age and social demographic.
dyauspitr
I’ve never had it give me bullshit citations when I specifically say look it up. It just gives me a clickable link.
Davidzheng
This sort of concrete verifiable hallucinations can be trained out and probably will be soon
gotimo
people have said this since chatGPT first released
josephwegner
I become more and more convinced with each of these tweets/blogs/threads that using LLMs well is a skill set akin to using Search well.
It’s been a common mantra - at least in my bubble of technologists - that a good majority of the software engineering skill set is knowing how to search well. Knowing when search is the right tool, how to format a query, how to peruse the results and find the useful ones, what results indicate a bad query you should adjust… these all sort of become second nature the longer you’ve been using Search, but I also have noticed them as an obvious difference between people that are tech-adept vs not.
LLMs seem to have a very similar usability pattern. They're not always the right tool, and are crippled by bad prompting. Even with good prompting, you need to know how to notice good results vs bad, how to cherry-pick and refine the useful bits, and have a sense for when to start over with a fresh prompt. And none of this is really _hard_ - just like Search, none of us need to go take a course on prompting - IMO folks just need to engage with LLMs as a non-perfect tool they are learning how to wield.
The fact that we have to learn a tool doesn’t make it a bad one. The fact that a tool doesn’t always get it 100% on the first try doesn’t make it useless. I strip a lot of screws with my screwdriver, but I don’t blame the screwdriver.
mexicocitinluez
Agree. It's a tool like anything else.
On a side note, this lady is a fraud: https://www.youtube.com/watch?v=nJjPH3TQif0&themeRefresh=1
sitkack
I don't know if she is a fraud, but she has definitely greatly amplified Rage Bait Farming and talking about things that are far outside of her domain of expertise as if she were an expert.
In no way am I credentialing her; lots of people can make astute observations about things they weren't trained in, but she has both mastered sounding authoritative and, at the same time, presenting things to get the most engagement possible.
joquarky
I've frequently heard that once you get sucked into the YouTube algorithm, you have to keep making content to maintain rankings.
This trap reminds me of the Perry Bible Fellowship comic "Catch Phrase" which has been removed for being too dark but can still be found with a search.
refulgentis
Thanks for sharing this. I was heavily involved in graduate physics when I was in school, and was very worried about what direction she'd take after the first big viral vid "telling her story." I wasn't sure it was well understood, or even understood at all, how blinkered her...viewpoint?...was.
Spivak
LLMs function as a new kind of search engine, one that is amazingly useful because it can surface things that traditional search could never dream of. Don't know the name of a concept, just describe it vaguely and the LLM will pull out the term. Are you not sure what kind of information even goes into a cover letter or what's customary to talk about? Ask an LLM to write you one, it will be bland and generic sure but that's not the point because you now know the "shape" of what they're supposed to look like and that's great for getting unblocked. Have you stumbled across a passage of text that's almost English but you're not really sure what to look up to decipher it? Paste it into the LLM and it will tell you that it's "Early Modern English" which you can look up to confirm and get a dictionary for.
cainxinth
Broader than that, it’s critical thinking skills. Using search and LLMs requires analyzing the results and being able to separate what is accurate and useful from what isn’t.
giraffe_lady
From my experience this is less an application of critical skills and more a domain knowledge check. If you know enough about the subject to have accumulated heuristics for correctness and intuition for "lgtm" in the specific context, then it's not very difficult or intellectually demanding to apply them.
If you don't have that experience in this domain, you will spend approximately as much effort validating output as you would have creating it yourself, but the process is less demanding of your critical skills.
sitkack
No, it is critical thinking skills, because the LLMs can teach you the domain, but you have to then understand what they are saying enough to tell if they are bsing you.
> you don't have that experience in this domain, you will spend approximately as much effort validating output as you would have creating it yourself,
Not true.
LLMs are amazing tutors. You have to use outside information, they test you, you test them, but they aren't pathologically wrong in the way that they are trying to do a Gaussian magic smoke psyop against you.
cainxinth
Knowledge certainly helps, but I’m talking about something more fundamental: your bullshit detector.
Even when you lack subject matter expertise about something, there are certain universal red flags that skeptics key in on. One of the biggest ones is: “There’s no such thing as a free lunch” and its corollary: “If it sounds too good to be true, it probably is.”
jpc0
I'm not so sure about that. I was really anti-LLM in the previous generation of LLMs (GPT-3.5/4) but never stopped trying them out. I just found the results to be subpar.
Since reasoning models came about I've been significantly more bullish on them, purely because they are less bad. They are still not amazing, but they are at a point where I feel like including them in my workflow isn't an impediment.
They can now reliably complete a subset of tasks without me needing to rewrite large chunks of it myself.
They are still pretty terrible at edge cases (uncommon patterns / libraries etc.) but when on the beaten path they can actually pretty decently improve productivity. I still don't think 10x (well, today was the first time I felt a 10x improvement, but I was moving frontend code from a custom framework to React, more tedium than anything else in that, and the AI did a spectacular job).
retrac
You're using them wrong. Everyone is, though, so I can't fault you specifically. A chatbot is about the worst possible application of these technologies.
Of late, deaf tech forums are taken over by language model debates over which works best for speech transcription. (Multimodal language models are the state of the art in machine transcription. Everyone seems to forget that when complaining they can't cite sources for scientific papers yet.) The debates are sort of to the point that it's become annoying how much space they've taken over, just like they have here on HN.
But then I remember, oh yeah, there was no such thing as live machine transcription ten years ago. And now there is. And it's going to continue to get better. It's already good enough to be very useful in many situations. I have elsewhere complained about the faults of AI models for machine transcription - in particular when they make mistakes they tend to hallucinate something that is superficially grammatical and coherent instead - but for a single phrase in an audio transcription sporadically that's sometimes tolerable. In many cases you still want a human transcriber but the cost of that means that the amount of transcription needed can never be satisfied.
It's a revolutionary technology. I think in a few years I'm going have glasses that continuously narrate the sounds around me and transcribe speech and it's going to be so good I can probably "pass" as a hearing person in some contexts. It's hard not to get a bit giddy and carried away sometimes.
Shank
> You're using them wrong. Everyone is though I can't fault you specifically.
If everyone is using them wrong, I would argue that says something more about them than the users. Chat-based interfaces are the thing that kicked LLMs into the mainstream consciousness and started the cycle/trajectory we’re on now. If this is the wrong use case, everything the author said is still true.
There are still applications made better by LLMs, but they are a far cry from AGI/ASI in terms of being all-knowing problem solvers that don’t make mistakes. Language tasks like transcription and translation are valuable, but by no stretch do they account for the billions of dollars of spend on these platforms, I would argue.
minimaxir
LLM providers actually have an incentive not to write literature on how to use LLMs optimally, as that causes friction, which means less engagement/money spent on the provider. There's also the typical tin-foil-hat explanation of "it's bad so you'll keep retrying it to get the LLM to work, which means more money for us."
1shooner
Isn't this more a product of the hype though? At worst you're describing a product marketing mistake, not some fundamental shortcoming of the tech. As you say "chat" isn't a use case, it's a language-based interface. The use case is language prediction, not an encyclopedic storage and recall of facts and specific quotes. If you are trying to get specific facts out of an LLM, you'd better be using it as an interface that accesses some other persistent knowledge store, which has been incorporated into all the major 'chat' products by now.
hnfong
Surely you're not saying everyone is using them wrong. Let's say only 99% of them are using LLMs wrong, and the remaining 1% creates $100B of economic value. That's $100B of upside.
Yes the costs of training AI models these days are really high too, but now we're just making a quantitative argument, not a qualitative one.
The fact that we've discovered a near-magical tech that everyone wants to experiment with in various contexts, is evidence that the tech is probably going somewhere.
Historically speaking, I don't think any scientific invention or technology has been adopted and experimented with so quickly and on such a massive scale as LLMs.
It's crazy that people like you dismiss the tech simply because people want to experiment with it. It's like some of you are against scientific experimentation for some reason.
hinkley
“If everything smells like shit, check your shoe.”
sidewndr46
If the goal is to layoff all the customer support and trap the customer in a tarpit with no exit, LLMs are likely the best choice.
Hikikomori
The US can have fun with that. In the EU we'll likely get laws that force companies to let us talk to a human if it gets bad enough.
whazor
I think all the technology is already in place. There are already smart glasses with tiny text displays. Also smartphones have more than enough processing capacity to handle live speech transcription.
azemetre
What are the best open source live machine transcription tools, would you say? Know of any guides that make it easy to set up locally?
Marsymars
I’ve had the exact same vibes around chatbots as an application of LLMs. But other than translation/transcription, what else is there?
simoncion
> ...there was no such thing as live machine transcription ten years ago.
What? Then what the hell do you call Dragon NaturallySpeaking and other similar software in that niche?
retrac
Thru the 90s and 00s and well into the 10s I generally dismissed speech recognition as useless to me, personally.
I have a minor speech impediment because of the hearing loss. They never worked for me very well. I don't speak like a standard American - I have a regional accent and I have a speech impediment. Modern speech recognition doesn't seem to have a problem with that anymore.
IBM's ViaVoice from 1997 in particular was a major step. It was really impressive in a lot of ways but the accuracy rate was like 90 - 95% which in practice means editing major errors with almost every sentence. And that was for people who could speak clearly. It never worked for me very well.
You also needed to speak in an unnatural way [pause] comma [pause] and it would not be fair to say that it transcribed truly natural speech [pause] full stop
Such voice recognition systems before about 2016 also required training on the specific speaker. You would read many pages of text to the recognition engine to tune it to you specifically.
It could not just be pointed at the soundtrack to an old 1980s TV show then produce a time-sync'd set of captions accurate enough to enjoy the show. But that can be done now.
simoncion
So, you started by saying
> ...there was no such thing as live machine transcription ten years ago.
Now you're saying that live machine transcription existed thirty years ago, but it has gotten substantially better in the intervening decades.
I agree with your amended claim.
crazygringo
If there's one common thread across LLM criticisms, it's that they're not perfect.
These critics don't seem to have learned the lesson that the perfect is the enemy of the good.
I use ChatGPT all the time for academic research. Does it fabricate references? Absolutely, maybe about a third of the time. But has it pointed me to important research papers I might never have found otherwise? Absolutely.
The rate of inaccuracies and falsehoods doesn't matter. What matters is whether it's saving you time and increasing your productivity. Verifying the accuracy of its statements is easy, while finding the knowledge it spits out in the first place is hard. The net balance is a huge positive.
People are bullish on LLM's because they can save you days' worth of work, like every day. My research productivity has gone way up with ChatGPT -- asking it to explain ideas, related concepts, relevant papers, and so forth. It's amazing.
Terr_
> Verifying the accuracy of its statements is easy.
For single statements, sometimes, but not always. For all of the many statements, no. Having the human attention and discipline to mindfully verify every single one without fail? Impossible.
Every software product/process that assumes the user has superhuman vigilance is doomed to fail badly.
> Automation centaurs are great: they relieve humans of drudgework and let them focus on the creative and satisfying parts of their jobs. That's how AI-assisted coding is pitched [...]
> But a hallucinating AI is a terrible co-pilot. It's just good enough to get the job done much of the time, but it also sneakily inserts booby-traps that are statistically guaranteed to look as plausible as the good code (that's what a next-word-guessing program does: guesses the statistically most likely word).
> This turns AI-"assisted" coders into reverse centaurs. The AI can churn out code at superhuman speed, and you, the human in the loop, must maintain perfect vigilance and attention as you review that code, spotting the cleverly disguised hooks for malicious code that the AI can't be prevented from inserting into its code. As qntm writes, "code review [is] difficult relative to writing new code":
-- https://pluralistic.net/2025/03/18/asbestos-in-the-walls/
crazygringo
> Having the human attention and discipline to mindfully verify every single one without fail? Impossible.
I mean, how do you live life?
The people you talk to in your life say factually wrong things all the time.
How do you deal with it?
With common sense, a decent bullshit detector, and a healthy level of skepticism.
LLM's aren't calculators. You're not supposed to rely on them to give perfect answers. That would be crazy.
And I don't need to verify "every single statement". I just need to verify whichever part I need to use for something else. I can run the code it produces to see if it works. I can look up the reference to see if it exists. I can Google the particular fact to see if it's real. It's really very little effort. And the verification is orders of magnitude easier and faster than coming up with the information in the first place. Which is what makes LLM's so incredibly helpful.
abrichr
> I just need to verify whichever part I need to use for something else. I can run the code it produces to see if it works. I can look up the reference to see if it exists. I can Google the particular fact to see if it's real. It's really very little effort. And the verification is orders of magnitude easier and faster than coming up with the information in the first place. Which is what makes LLM's so incredibly helpful.
Well put.
Especially this:
> I can run the code it produces to see if it works.
You can get it to generate tests (and easy ways for you to verify correctness).
madethisnow
It's really funny how most anecdotes and comments about the utility and value of interacting with LLMs could just as well be applied to human beings themselves. The majority of people haven't realized yet that consciousness is assumed by our society, and that we, in fact, don't know what it is or whether we have it. Let alone ascribing it to another entity.
sc68cal
> Does it fabricate references? Absolutely, maybe about a third of the time
And you don't have concerns about that? What kind of damage is that doing to our society, long term, if we have a system that _everyone_ uses and it's just accepted that a third of the time it is just making shit up?
crazygringo
No, I don't. Because I know it does and it's incredibly easy to type something into Google Scholar and see if a reference exists.
Like, I can ask a friend and they'll mistakenly make up a reference. "Yeah, didn't so-and-so write a paper on that? Oh they didn't? Oh never mind, I must have been thinking of something else." Does that mean I should never ask my friend about anything ever again?
Nobody should be using these as sources of infallible truth. That's a bonkers attitude. We should be using them as insanely knowledgeable tutors who are sometimes wrong. Ask and then verify.
The net benefit is huge.
IncreasePosts
No, that doesn't mean you should never ask your friend things again if they make that mistake. But, if 30% of all their references are made up then you might start to question everything your friend says. And looking up references to every claim you're reading is not a productive use of time.
bjornsing
> And you don't have concerns about that? What kind of damage is that doing to our society, long term, if we have a system that _everyone_ uses and it's just accepted that a third of the time it is just making shit up?
Main problem with our society is that two thirds of what _everyone_ says is made up shit / motivated reasoning. The random errors LLMs make are relatively benign, because there is no motivation behind them. They are just noise. Look through them.
Davidzheng
I think a third of the facts I say are false as stated, and I do not think I'm worse than the 30th percentile of humans at truthfulness.
abenga
You are not a trusted authority relied on by millions and expected to make decisions for them, and you could choose not to say something you aren't sure that you know.
mijoharas
So, I've sometimes wondered about this.
Could it end up being a net benefit? Will the realistic-sounding but incorrect facts generated by A.I. make people engage with arguments more critically, and be less likely to believe random statements they're given?
Now, I don't know, or even think it is likely that this will happen, but I find it an interesting thought experiment.
aaronbaugher
That's hilarious; I had no idea it was that bad. And for every conscientious researcher who actually runs down all the references to separate the 2/3 good from the 1/3 bad, how many will just paste them in, adding to the already sky-high pile of garbage out there?
MisterKent
This. 100% this.
LLMs will spit out responses with zero backing with 100% conviction. People see citations and assume it's correct. We're conditioned for it thanks to....everything ever in history. Rarely do I need to check a wikipedia entry's source.
So why do people not understand that this is absolutely going to pour jet fuel on misinformation in the world? And we as a society are allowed to hold a higher bar for what we'll accept being shoved down our throats by corporate overlords who want their VC payout.
crazygringo
> People see citations and assume it's correct.
The solution is to set expectations, not to throw away one of the most valuable tools ever created.
If you read a supermarket tabloid, do you think the stories about aliens are true? No, because you've been taught that tabloids are sensationalist. When you listen to campaign ads, do you think they're true? When you ask a buddy about geography halfway across the world, do you assume every answer they give is right?
It's just about having realistic expectations. And people tend to learn those fast.
> Rarely do I need to check a wikipedia entry's source.
I suggest you start. Wikipedia is full of citations that don't back up the text of the article. And that's when there are even citations to begin with. I can't count the number of times I've wanted to verify something on Wikipedia, and there either wasn't a citation, or there was one related to the topic but that didn't have anything related to the specific assertion being made.
madethisnow
people lie more
maple3142
I think many people are just not really good at dealing with "imperfect" tools. Different tools can have different success probabilities; let's call that probability p here. People typically use tools that have p=100%, or at least very close to it. But an LLM is a tool that is far from that, so making use of it takes a different approach.
Imagine there is a probabilistic oracle that can answer any yes/no question with success probability p. If p=100% or p=0% then it is obviously very useful. If p=50% then it is absolutely worthless. In the other cases, such an oracle can be utilized in different ways to get the answer we want, and it is still a useful thing.
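A tiny simulation of that oracle intuition (purely illustrative; real LLM errors are correlated, so re-asking the same model helps far less than this idealization suggests):

    import random
    from collections import Counter

    def noisy_oracle(truth: bool, p: float) -> bool:
        """Answers a yes/no question correctly with probability p."""
        return truth if random.random() < p else not truth

    def majority_vote(truth: bool, p: float, n: int = 15) -> bool:
        """Ask the oracle n times (n odd) and return the majority answer."""
        votes = Counter(noisy_oracle(truth, p) for _ in range(n))
        return votes.most_common(1)[0][0]

    # A p=0.7 oracle is unreliable on any single call, but independent repeats
    # push the majority answer toward the truth; p=0.5 gains nothing.
    for p in (0.5, 0.7, 0.9):
        correct = sum(majority_vote(True, p) for _ in range(2000))
        print(f"p={p}: majority-of-15 is right {correct / 2000:.1%} of the time")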
renewiltord
One of the magic things about engineering is that I can make usefulness out of unreliability. Voltage can fluctuate and I can transmit 1s and 0s, lines can fizz, machines can die, and I can reliably send video from one end to the other.
Unreliability is something we live in. It is the world. Controlling error, increasing signal over noise, extracting energy from the fluctuations. This is life, man. This is what we are.
I can use LLMs very effectively. I can use search engines very effectively. I can use computers.
Many others can’t. Imagine the sheer fortune to be born in the era where I was meant to be: tools transformative and powerful in my hands; useless in others’.
I must be blessed by God.
pixl97
Many people are trapped in black and white thinking. It's like they can only think in binary; things that are neither good nor bad are heresy.
nextts
That is deep
biophysboy
Your point reminded me of Terence Tao's point that AI has a "plausibility problem". When it can't be accurate, it still disguises itself as accurate.
Its true success rate is by no means 100%, and sometimes is 0%, but it always tries to make you feel confident.
I’ve had to catch myself surrendering too much judgment to it. I worry a high school kid learning to write will have fewer qualms surrendering judgment
geuis
A scientific instrument that is unreliably accurate is useless. Imagine a kitchen scale that always gave +/- 50% every 3rd time you use it. Or maybe 5th time. Or 2nd.
So we're trying to use tools like this to help solve deeper problems, and they aren't up to the task. We're still at the point where we need to start over and get better tools. Sharpening a bronze knife will never make it as sharp, or keep its edge, the way a steel knife does. Same basic elements, very different material.
Extasia785
A bad analogy doesn't make a good argument. The best analogy for LLMs is probably a librarian on LSD in a giant library. They will point you in a direction if you have a question. Sometimes they will pull up the exact page you need, sometimes they will lead you somewhere completely wrong and confidently hand you a fantasy novel, trying to convince you it's a real science book.
It's completely up to your ability to both find what you need without them and verify the information they give you to evaluate their usefulness. If you put that on a matrix, this makes them useful in the quadrant of information that is both hard to find, but very easy to verify. Which at least in my daily work is a reasonable amount.
iambateman
I think people confuse the power of the technology with the very real bubble we’re living in.
There’s no question that we’re in a bubble which will eventually subside, probably in a “dot com” bust kind of way.
But let me tell you…last month I sent several hundred million requests to AI, as a single developer, and got exactly what I needed.
Three things are happening at once in this industry… (1) executives are over promising a literal unicorn with AGI, that is totally unnecessary for the ongoing viability of LLM’s and is pumping the bubble. (2) the technology is improving and delivery costs are changing as we figure out what works and who will pay. (3) the industry’s instincts are developing, so it’s common for people to think “AI” can do something it absolutely cannot do today.
But again…as one guy, for a few thousand dollars, I sent hundreds of millions of requests to AI that are generating a lot of value for me and my team.
Our instincts have a long way to go before we’ve collectively internalized the fact that one person can do that.
Hasu
> But let me tell you…last month I sent several hundred million requests to AI, as a single developer, and got exactly what I needed
There are 2.6 million seconds in a month. You are claiming to have sent hundreds of requests per second to AI.
iambateman
That's exactly what happened – I called the OpenAI API, using custom application code running on a server, a few hundred million times.
It is trivial for a server to send/receive 150 requests per second to the API.
This is what I mean by instincts...we're used to thinking of developers-pressing-keys as a fundamental bottleneck, and it still is to a point. But as soon as the tracks are laid for the AI to "work", things go from speed-of-human-thought to speed-of-light.
hadlock
A lot of people are feeding all the email and slack messages for entire companies through AI to classify sentiment (positive, negative, neutral etc), or summarize it for natural language search using a specific dictionary. You can process each message multiple ways for all sorts of things, or classify images. There's a lot of uses for the smaller cheaper faster llms
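As a rough illustration of that kind of bulk classification call (assuming the OpenAI Python client and gpt-4o-mini; the prompt and labels are made up for the example):

    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    def classify_sentiment(message: str) -> str:
        """Label one message as positive / negative / neutral."""
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            temperature=0,
            messages=[
                {"role": "system",
                 "content": "Classify the sentiment of the user's message. "
                            "Answer with exactly one word: positive, negative, or neutral."},
                {"role": "user", "content": message},
            ],
        )
        return response.choices[0].message.content.strip().lower()

    print(classify_sentiment("The deploy went smoothly, nice work everyone!"))

Run something like this over every message in a queue and you get the hundreds of millions of cheap, small requests described above.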
crazygringo
Yeah I'm curious now.
If you have a lot of GPU's and you're doing massive text processing like spam detection for hundreds of thousands of users, sure.
But "as a single developer", "value for me and my team"... I'm confused.
iambateman
I'm NDA'ed on the specifics, sorry.
In general terms, we had to call the OpenAI API X00,000,000 times for a large-scale data processing task. We ended up with about 2,000,000 records in a database, using data created, classified, and cleaned by the AI.
There were multiple steps involved, so each individual record was the result of many round trips between the AI and the server, and not all calls are 1-to-1 with a record.
None of this is rocket science, and I think any average developer could pull off a similar task given enough time...but I was the only developer involved in the process.
The end product is being sold to companies who benefit from the data we produced, hence "value for me and the team."
The real point is that generative AI can, under the right circumstances, create absurd amounts of "productivity" that wouldn't have been possible otherwise.
latemedium
My experience is starkly different. Today I used LLMs to:
1. Write python code for a new type of loss function I was considering
2. Perform lots of annoying CSV munging ("split this CSV into 4 equal parts", "convert paths in this column into absolute paths", "combine these and then split into 4 distinct subsets based on this field.." - they're great for that; a rough sketch of the first of these appears after this comment)
3. Expedite some basic shell operations like "generate softlinks for 100 randomly selected files in this directory"
4. Generate some summary plots of the data in the files I was working with
5. Not to mention extensive use in Cursor & GH Copilot
The tool (Claude 3.7 mostly, integrated with my shell so it can execute shell commands and run python locally) worked great in all cases. Yes I could've done most of it myself, but I personally hate CSV munging and bulk file manipulations and its super nice to delegate that stuff to an LLM agent
edit: formatting
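For illustration, a minimal pandas sketch of the first of those CSV requests ("split this CSV into 4 equal parts"; the filenames are assumptions):

    import pandas as pd

    df = pd.read_csv("data.csv")  # assumed input file

    # "split this CSV into 4 equal parts" (the last part absorbs any remainder)
    n_parts = 4
    rows_per_part = -(-len(df) // n_parts)  # ceiling division
    for i in range(n_parts):
        part = df.iloc[i * rows_per_part : (i + 1) * rows_per_part]
        part.to_csv(f"data_part{i + 1}.csv", index=False)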
kwertyoowiyop
These seem like fine use cases: trivial boilerplate stuff you’d otherwise have to search for and then munge to fit your exact need. An LLM can often do both steps for you. If it doesn’t work, you’ll know immediately and you can probably figure out whether it’s a quick fix or if the LLM is completely off-base.
zeroonetwothree
That’s fair but it’s totally different use cases than the linked post discusses.
signatoremo
The click-bait title is "I genuinely don't understand why some people are still bullish about LLMs".
I guess the author can understand now?
When something was impossible only 3 years ago, barely worked 2 years ago, but works well now, there are very good reasons to be bullish, I suppose?
The hype cuts both ways.
mdp2021
> When something was impossible only 3 years ago, barely worked 2 years ago, but works well now
What exactly are you talking about? What are you saying works well now that did not work years ago? Claude as a milestone of code writing?
Also, in that case, if the current apparent successes come from a realm of tentative responses, we would need proof that the unreliable has become reliable. The observer will say "they were tentative before, they often look tentative now, why should we think they will pass the threshold to a radical change?"
mnky9800n
How did you integrate Claude into your shell?
simonw
I wrote my own tool for that a while back as an LLM plugin, so I can do this:
llm cmd extract first frame of movie.mp4 as a jpeg using ffmpeg
I use that all the time, it works really well (defaulting to GPT-4o-mini because it's so cheap, but it works with Claude too): https://simonwillison.net/2024/Mar/26/llm-cmd/
latemedium
I hacked something together a while back - a hotkey toggles between standard terminal mode and LLM mode. LLM mode interacts with Claude, and has functions / tool calls to run shell commands, python code, web search, clipboard, and a few other things. For routine data science tasks it's been super useful. Claude 3.7 was a big step forward because it will often examine files before it begins manipulating them and double-checks that things were done correctly afterwards (without prompting!). For me this works a lot better than other shell-integration solutions like Warp
airstrike
Claude Code is available directly from Anthropic, but you have to request an invite as it's in "Research Preview"
There are third party tools that do the same, though
bjornsing
I’ve been using Claude a lot lately, and I must say I very much disagree.
For example, the other day I was chatting with it about the health risks associated with my high consumption of farmed salmon. It then generated a small program to simulate the accumulation of PCBs in my body. I could review the program, ask questions about the assumptions, etc. It all seemed very reasonable. A toxicokinetic analysis, it called it.
It then struck me how immensely valuable this is to a curious and inquisitive mind. This is essentially my gold standard of intelligence: take a complex question and break it down in a logical way, explaining every step of the reasoning process to me, and be willing to revise the analysis if I point out errors / weaknesses.
Now try that with your doctor. ;)
Can it make mistakes? Sure, but so can your doctor. The main difference is that here the responsibility is clearly on you. If you do not feel comfortable reviewing the reasoning then you shouldn’t trust it.
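For a sense of what such a generated program might look like, here is a minimal one-compartment sketch with constant intake and first-order elimination; all parameter values are illustrative assumptions, not whatever Claude actually used:

    import math

    intake_ug_per_week = 0.5   # assumed PCB intake from salmon meals
    half_life_years = 10.0     # assumed PCB elimination half-life
    body_weight_kg = 80.0      # assumed body weight

    k = math.log(2) / (half_life_years * 52)  # weekly elimination rate constant

    body_burden_ug = 0.0
    for week in range(52 * 30):               # simulate 30 years, week by week
        body_burden_ug += intake_ug_per_week  # this week's intake
        body_burden_ug *= math.exp(-k)        # first-order elimination

    print(f"body burden after 30 years: {body_burden_ug:.0f} µg "
          f"({body_burden_ug / body_weight_kg:.2f} µg/kg body weight)")
    print(f"theoretical steady state: {intake_ug_per_week / k:.0f} µg")

The value isn't the specific numbers; it's that every assumption sits in plain sight where you can question or change it.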
theshrike79
"Trust but verify"
With an LLM it should be "Don't trust, verify" - but it isn't that hard to verify LLM claims, just ask it for original sources.
Compare to ye olde scientific calculators (90s): they were allowed in tests because even though they could solve equations, they couldn't show the work. And showing the work was 90% of the score. At best you could use one to verify your solution.
But then tech progressed and now calculators can solve equations step by step -> banned from tests at school.
Too
> it isn't that hard to verify LLM claims, just ask it for original sources
Have you tried actually doing this? Most of the time it makes up urls that don't exist or contradict the answer it just gave.
mattmanser
Google it.
There's a bunch of scientific papers talking about that.
Bet that plenty of those researchers have written python programs they've uploaded to GitHub and you just got one of their programs regurgitated
It's not intelligence mate, it's just copying an existing program.
bjornsing
I don’t mind to be honest. I don’t expect more intelligence than that from my doctor either. I want them to identify the relevant science and regurgitate / apply it.
Ylpertnodi
>It's not intelligence mate, it's just copying an existing program.
Isn't the 'intelligence' part the bit that takes a previously-constructed 'thing' and makes it work in a new 'situation'? Pretty sure that's how humans work, too.
lionkor
that's a wild assumption that has no place here
maxdoop
And how would you define intelligence then?
Nevermark
So many people put up expectations about models just to knock them down. Infinite reasons to critique them.
Please dispense with anyone's "expectations" when critiquing things! (Expectations are not a fault or property of the object of the expectations.)
Today's models (1) do things that are unprecedented. Their generality of knowledge, and ability to weave completely disparate subjects together sensibly, in real time (and faster if we want), is beyond any other artifact in existence. Including humans.
They are (2) progressing quickly. AI has been an active field (even through its famous "winters") for several decades, and they have never moved forward this fast.
Finally and most importantly (3), many people, including myself, continue to find serious new uses for them in daily work, that no other tech or sea of human assistants could replace cost effectively.
The only way I can make sense out of anyone's disappointment is to assume they simply haven't found the right way to use them for themselves. Or are unable to fathom that what is not useful for them is useful for others.
They are incredibly flexible tools, which means a lot of value, idiosyncratic to each user, only gets discovered over time with use and exploration.
That they have many limits isn't surprising. What doesn't? Who doesn't? Zeus help us the day AI doesn't have obvious limits to complain about.
tkgally
> Their generality of knowledge, and ability to weave completely disparate subjects together sensibly, is beyond any other artifact in existence
Very well said. That’s perhaps the area where I have found LLMs most useful lately. For several years, I have been trying to find a solution to a complex and unique problem involving the laws of two countries, financial issues, and my particular individual situation. No amount of Googling could find an answer, and I was unable to find a professional consultant whose expertise spans the various domains. I explained the problem in detail to OpenAI’s Deep Research, and six minutes later it produced a 20-page report—with references that all checked out—clearly explaining my possible options, the arguments for and against each, and why one of those options was probably best. It probably saved me thousands of dollars.
ryukoposting
Are they progressing quickly? Or was there a step-function leap about 2 years ago, and incremental improvements since then?
I tried using AI coding assistants. My longest stint was 4 months with Copilot. It sucked. At its best, it does the same job as IntelliSense but slower. Other times it insisted on trying to autofill 25 lines of nonsense I didn't ask for. All the time I saved using Copilot was lost debugging the garbage Copilot wrote.
Perplexity was nice to bounce plot ideas off of for a game I'm working on... until I kept asking for more and found that it'll only generate the same ~20ish ideas over and over, rephrased every time, and half the ideas are stupid.
The only use case that continues to pique my interest is Notion's AI summary tool. That seems like a genuinely useful application, though it remains to be seen if these sorts of "sidecar" services will justify their energy costs anytime soon.
Now, I ask: if these aren't the "right" use cases for LLMs, then what is, and why do these companies keep putting out products that aren't the "right" use case?
hadlock
Have you tried it recently? o3-mini-high is really impressive. If you ease into talking to it about your intent and outline the possible edge and corner cases, it will write nuanced Rust code 1,000 lines at a time, no problem.
jandrewrogers
My anecdotal experience is similar. For any important or hard technical questions relevant to anything I do, the LLM results are consistently trash. And if you are an expert in the domain you can’t not notice this.
On the other hand, for trivial technical problems with well known solutions, LLMs are great. But those are in many senses the low value problems; you can throw human bodies against that question cheaply. And honestly, before Google results became total rubbish, you could just Google it.
I try to use LLMs for various purposes. In almost all cases where I bother to use them, which are usually subject matters I care about, the results are poorer than I can quickly produce myself because I care enough to be semi-competent at it.
I can sort of understand the kinds of roles that LLMs might replace in the next few years, but there are many roles where it isn’t even close. They are useless in domains with minimal training data.
snitty
>For any important or hard technical questions relevant to anything I do, the LLM results are consistently trash. And if you are an expert in the domain you can’t not notice this.
This is also my experience. My day job isn't programming, but when I can feed an LLM secretarial work, or simple coding prompts to automate some work, it does great and saves me time.
Most of my day is spent getting into the details on things for which there's no real precedent. Or if there is, it hasn't been widely published on. LLMs are frustrating useless for these problems.
I get so confused on this. I play around, test, and mess with LLMs all the time and they are miraculous. Just amazing, doing things we dreamed about for decades. I mean, I can ask for obscure things with subtle nuance where I misspell words and mess up my question and it figures it out. It talks to me like a person. It generates really cool images. It helps me write code. And just tons of other stuff that astounds me.
And people just sit around, unimpressed, and complain that ... what ... it isn't a perfect superintelligence that understands everything perfectly? This is the most amazing technology I've experienced as a 50+ year old nerd that has been sitting deep in tech for basically my whole life. This is the stuff of science fiction, and while there totally are limitations, the speed at which it is progressing is insane. And people are like, "Wah, it can't write code like a Senior engineer with 20 years of experience!"
Crazy.