
AI Responses May Include Mistakes

190 comments · May 31, 2025

_wire_

Google's Gemini in search just makes up something that arbitrarily appears to support the query without care for context and accuracy. Pure confabulation. Try it for yourself. Ridiculous. It works as memory support if you know the result you're looking for, but if you don't, you can't trust it as far as you can throw it.

If you look carefully at Google Veo output, it's similarly full of holes.

It's plain there's no reasoning whatsoever informing the output.

Veo output with goofy wrongness

https://arstechnica.com/ai/2025/05/ai-video-just-took-a-star...

Tesla FSD goes crazy

https://electrek.co/2025/05/23/tesla-full-self-driving-veers...

justmarc

And suddenly this type of quality is becoming "normal" and acceptable now? Nobody really complains.

That is very worrying. Normally this would never fly, but nowadays it's kind of OK?

Why should false and/or inaccurate results be accepted?

TeMPOraL

We lost that battle back when we collectively decided that sales and marketing is respectable work.

bheadmaster

Hah. Good observation.

I often get in arguments about how I tend to avoid brands that put too much into marketing. Of course, in theory the amount of money a company puts into marketing doesn't automatically lower the quality of its products, but in my experience the correlation is there. Whiskas, Coca-Cola, McDonald's, etc.

Nuzzerino

Complain to it enough times, remain resilient and you’ll eventually figure it out (that’s a wild card though). Or find someone who has and take their word for it (except you can’t because they’re probably indistinguishable from the ‘bot’ now according to the contradictory narrative). Iterate. Spiral. No one should have to go through that though. Be merciful.

rchaud

> Normally this would never fly, but nowadays it's kind of OK?

We started down this path ever since obvious bugs were reframed as "hallucinations".

meander_water

I've recently started wondering what the long-term impacts of AI slop are going to be. Will people get so sick of the sub-par quality that there will be a widespread backlash and a renewed focus on handmade, artisanal products? Or will we go the other way, where everyone accepts the status quo, everything just gets shittier, and we have multiple cycles of AI slop trained on AI slop?

chronid

Suddenly? That's the level of quality that has been standard in every software project I've seen since I started working in IT.

Enshittification is all around us and is unstoppable, because we have deadlines to hit and goals to show the VP we reached. We broke everything and the software is only half working? Come on, that's an issue for the support and ops teams. On to the next beautiful feature we can put on marketing slides!

veunes

And how quickly the bar is being lowered

reaperducer

And suddenly this type of quality is becoming "normal" and acceptable now?

The notion that "computers are never wrong" has been ingrained in society for at least a century now, starting with sci-fi and spreading to the rest of culture.

It's an idea that has caused more harm than good.

emrah

When were search results ever 100% fact-checked and accurate?

mdp2021

For example, in the times of "lectures", when transmitted information was literally read (as the term says) in real time from the source to the audience.

But in general, the (mis)information that spinach contains so much iron as to be interchangeable with nails had to be a typo so rare that it became anecdotal and generated cultural phenomena like Popeye.

flomo

I had a question about my car, so I googled '[year] [make] [model] [feature]'. This seems like the sort of thing Google had always absolutely nailed. But now, 90% of the page was AI slop about the wrong model, wrong year, even the wrong make. (There was one YouTube video that was sorta informative, so some credit.)

But way down at the very bottom of the page, there was the classic Google search answer on a totally unrelated car forum. Thanks CamaroZ28.com!

camillomiller

This is a very, very good point. If this were happening with different queries that we never used, or with a new type of question, then I would have some patience. But it happens exactly with the formulations that used to give you the best results in the SERP!

TeMPOraL

That was true before AI too (I know, I did such searches myself). Google results have been drowning in slop for over a decade now - it was just human-generated slop, a.k.a. content marketing and SEO stuff.

I'm not defending the AI feature here, just trying to frame the problem: the lies and hallucinations were already there, but nobody cared because apparently people don't mind being constantly lied to by other people.

flomo

No, I'm not complaining about SEO shit...

The thing is, the primordial Google had the answer, but Google themselves buried it under 100+ links of Google-generated slopdiarrhea, most of which didn't even factually fit the question and wasn't at all relevant to my automobile.

dingnuts

Ad-supported search has been awful for a few years now; just buy a Kagi subscription and you'll be like me: horrified but mildly amused, with a dash of "oh, that explains a lot" when people complain about Google.

gambiting

I'm a member of a few car groups on Facebook and the misinformation coming from Google is infuriating, because people treat it as gospel and then you have to explain to them that the AI slop they were shown as the top result in Google is not - in fact - correct.

As a simple example - someone googled "how to reset sensus system in Volvo xc60" and Google told them to hold the button under the infotainment screen for 20 seconds and they came to the group confused why it doesn't work. And it doesn't work because that's not the way to do it, but Google told them so, so of course it must be true.

nyarlathotep_

I wonder this too -- whether there will actually be more work created by LLM generations, from a whole new genre of customer support that now not only has to know the "material" but also has to provide secondary support in resolving issues customers have from incorrect nonsense.

flomo

Exactly, the "AI" opiateturd results are often for the completely wrong year/model or just obviously false. I'm certain Google used to be really good at this kind of thing.

veunes

Yeah, it feels like we've crossed into a weird uncanny valley where AI outputs sound smarter than ever, but the underlying logic (or lack thereof) hasn't caught up

roywiggins

I think it's just much easier for an LLM to learn how to be convincing than it is to actually be accurate. It just has to convince RLHF trainers that it's right, not actually be right. And the first one is a general skill that can be learned and applied to anything.

https://arxiv.org/html/2409.12822v1

camillomiller

This baffles me like no other tech has done before. Google is betting its own core business on a pivot that relies on a massively faulty piece of technology. And as Ben Evans also says, promising that it will get better only gets you so far, it’s an empty promise. Yesterday AI overview made up an entire album by a dead Italian musician when I searched for a tribute event that was happening at a Berlin venue. It just took the name of the venue and claimed it was the most important work from that artist.

Funnily enough (not for Google), I copypasted that answer on chatGPT and it roasted AI Overview so bad on its mistakes and with such sarcasm that it even made me chuckle.

DanHulton

It's the unfounded promises that this will be solved because the tech will only get better that really upset me. Because sure, it will get better, I'm pretty certain of that. They'll have additional capabilities, they'll have access to more-recent data, etc. But "better" does not necessarily equate to "will fix the lying problem." That's a problem that is BAKED INTO the technology, and requires some kind of different approach to solve -- you can't just keep making a hammer bigger and bigger in the hopes that one day it'll turn into a screwdriver.

Before LLMs really took off, we were in the middle of an "AI winter", where there just weren't any promising techs, at least none with sufficient funding attached to them. And it's WORSE now. LLMs have sucked all the air out of the room, and all of the funding out of other avenues of research. Technologies that were "10-20" years away now might be 30-40, because there's fewer people researching them, with less money, and they might even be completely different people trying to restart the research after the old ones got recruited away to work on LLMs!

mountainriver

I really don’t understand the whole AI winter talk all the time. We haven’t had anything of the sort since 2008. There were tons of major RL advancements before ChatGPT that were stunning.

I would challenge anyone to find data to actually support any of these claims. ML spending has been up year over year since deep learning took off, and the models just keep getting better.

MangoToupe

I'm honestly so confused how people use LLMs as a replacement for search. All chatbots can ever find is data tangential to the stuff I want (e.g. I ask for a source, it gives me a quote). Maybe I was just holding search wrong?

TeMPOraL

> e.g. I ask for a source, it gives me a quote

It should give you both - the quote should be attributed to where it was found. That's, generally, what people mean when they ask or search for "a source" of some claim.

As for the general point - using LLMs as "better search" doesn't really look like those Google quick AI answers. It looks like what Perplexity does, or what o3 in ChatGPT does when asked a question or given a problem to solve. I recommend checking out the latter; it's not perfect, but good enough to be my default for nontrivial searches, and more importantly, it shows how "LLMs for search" should work to be useful.

mdp2021

> LLMs as a replacement for search

Some people expect LLMs as part of a better "search".

LLMs should be integrated into search, as a natural application: search results can depend heavily on happy phrasing, search engines work through sparse keywords, and LLMs allow the use of structured natural language (not "foo bar baz" but "Which foo did a bar baz?") - which should be resistant to term variation and exclude different semantics related to those otherwise sparse terms.

But it has to be done properly - understand the question, find material, verify the material, produce a draft reply, verify the draft vis-a-vis the material, maybe iterate...
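As a rough sketch of that loop (all function names here are hypothetical placeholders, not any real search or model API):

    # A sketch of the "do it properly" loop described above.
    # search_web(), call_llm() and is_supported() stand in for whatever
    # search API, model client and verification step you actually use.

    def search_web(query: str) -> list[str]:
        raise NotImplementedError("plug in a real search API here")

    def call_llm(prompt: str) -> str:
        raise NotImplementedError("plug in a real model client here")

    def is_supported(draft: str, sources: list[str]) -> bool:
        # e.g. a second model call that checks every claim against the sources
        raise NotImplementedError("plug in a real verification step here")

    def answer(question: str, max_rounds: int = 3) -> str:
        for _ in range(max_rounds):
            # understand the question and turn it into search queries
            queries = call_llm(f"Turn this question into search queries:\n{question}")
            # find material
            sources = [doc for q in queries.splitlines() if q for doc in search_web(q)]
            # produce a draft reply from the material only
            draft = call_llm(f"Answer using ONLY these sources:\n{sources}\n\nQuestion: {question}")
            # verify the draft vis-a-vis the material, otherwise iterate
            if is_supported(draft, sources):
                return draft
        return "Couldn't produce a supported answer."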

1659447091

DuckDuckGo AI Assist is going in the right direction, imo. It will pull info from Wikipedia and use math and map tools plus other web sources, and it has been mostly accurate for me on the search page.

The chat option uses GPT-4o with web search and was able to provide links to colonial map resources I was curious about after falling down that rabbit hole. It also gave me general (and proper) present-day map links to the places I was looking for, in the map sites I asked for.

It did get confused a few times when I was trying to get the present-day names of old places I had forgotten; like the Charles River in VA, where it kept trying to send me to Boston or to Charles City Co. on the James River and told me to look for it around there...

The York River wiki page clearly says it was once the Charles River. Maybe I wasn't asking the right questions. For more unique things it was pretty helpful though, and saved the endless searching-with-100-tabs adventure.

MaxikCZ

It's good to be shown a direction. When I only have a vague idea of what I want, AI usually helps me frame it into searchable terms I had no clue existed.

incangold

I find LLMs are often better for X vs Y questions where search results were already choked by content farm chaff. Or at least LLMs present more concise answers, surrounded by fewer ads and less padding. Still have to double check the claims of course.

MangoToupe

I think I'm discovering that I just don't tend to think in terms of questions rather than content

Garlef

Maybe that's because we're conditioned by the UX of search.

But another thing I find even more surprising is that, at least initially, many expected that the LLMs would give them access to some form of higher truth.

MangoToupe

I think you might be on to something. I've found myself irritated that I can't just chuck keywords at LLMs.

christophilus

I've had good results with Brave search, which self-reports using Meta Llama 3, Mistral/Mixtral, and CodeLLM. It's not always 100% accurate, but it's almost always done the trick and saved me digging through more docs than necessary.

XorNot

I use uBlock to remove Gemini responses from search, because even glancing at them is liable to bias my assumptions about whatever I'm looking for.

Information hygiene is a skill which started out important but is going to become absolutely critical.

MaxikCZ

Half my browser extensions have the sole purpose of removing shit from sites I visit.

HN is like a unicorn that hasn't made me block a single thing yet.

alpaca128

Ironically that's an AI tool I would use - one that can dynamically filter content from sites according to my preferences, to counter the algorithmic spam. It wouldn't be generative AI though, and that's the only kind of AI that matters right now apparently.

justmarc

We can't expect the vast majority of regular users to have any of that skill.

What is this going to lead to? Fascinating times.

jobigoud

It's very easy though: Right click > Block element > Create. Overlays show which blocks you are removing. Sliders can be used to increase/refine the selection.

How can we make it even easier and visual? Customizing pages by removing elements should be part of the default browser experience to be honest. Like in the initial web where you would tweak the color of links, visited links, etc.

Kwpolska

Google recently started showing me their AI bullshit. This made me pull the trigger and switch to DuckDuckGo as the primary search engine.

That said, some niche stuff has significantly better results on Google. But not in the AI bullshit. I searched for a very niche train-related term; the bullshit response said condescendingly "this word does not exist, maybe you meant [similarly sounding but completely different word], which in the context of trains means ...". The first real result? Turns out that word does exist.

christophilus

What’s the word?

datavirtue

I switched to DDG over seven years ago and just realized it had been that long when I read your comment. Google started wasting my time and I had to shift.

dijksterhuis

fyi, you can remove any and all “ai” assistant bs etc from DDG if you use the noai subdomain (in case you wanna avoid their stuff, although it’s much less prominent anyway) https://noai.duckduckgo.com/

sspiff

I find this phenomenon really frustrating. I understand (or am at least aware of) the probabilistic nature of LLMs and their limitations, but when I point this out to my wife or friends when they are misusing LLMs for tasks they are both unsuited for and unreliable at, they wave their hands and dismiss my concerns as me being an AI cynic.

They continue to use AI for math (asking LLMs to split bills, for example) and treat its responses for factual data lookup as 100% reliable and correct.

osmsucks

> They continue to use AI for math (asking LLMs to split bills, for example)

Ah, yes, high tech solutions for low tech problems. Let's use the word machine for this number problem!

datavirtue

I'm so lazy, I have chat bots do all kinds of complex calculations for me. I even use it as a stock screener and the poor thing just suffers, burning fuck tons of electricity.

thaumasiotes

> Let's use the word machine for this number problem!

You know, that's a thought process that makes internal sense.

You have someone who's terrible at math. They want something else to do math for them.

Will they prefer to use a calculator, or a natural language interface?

How do you use a calculator without knowing what you're doing?

osmsucks

Feels to me like if you can't even use a calculator you have bigger problems to worry about...

veunes

What's tricky is that for casual use, it gets things "close enough" often enough that people start building habits around it

jatora

Using it for simple math is actually pretty hilarious. Hey, maybe they make sure to have it use Python! ...But I dream.

BlueTemplar

Using LLMs (or platforms in general) is a bit like smoking (in closed spaces, with others present): a nuisance.

diggan

That's just plain wrong, and I'm a smoker. LLMs won't affect the ones around you unless you engage with them in some way. Sit next to me while I smoke and you'll be affected by passive smoking regardless of how much you engage or not. Not really an accurate comparison :)

JeremyNT

> They continue to use AI for math (asking LLMs to split bills, for example) and treat its responses for factual data lookup as 100% reliable and correct.

I don't do this but isn't it basically... fine? I assume all the major chatbots can do this correctly at this point.

The trick here is that chatbots can do a wide range of tasks, so why context switch to a whole different app for something like this? I believe you'll find this happening more frequently for other use cases as well.

Usability trumps all.

JeremyNT

Wish I could edit, but I was referring to the bill splitting math specifically here. I didn't mean to quote the rest.

When it comes to facts that actually matter, people need to know to verify the output.

minimaxir

The simple "AI responses may include mistakes" disclaimer or ChatGPT's "ChatGPT can make mistakes. Check important info." CYA text at the bottom of the UI is clearly no longer sufficient. After years of news stories about LLM hallucinations in fact-specific domains and people still getting burnt by them, LLM providers should be more aggressive in educating users about their fallibility, since hallucinations can't ever be fully fixed, even if it means adding friction.

eddythompson80

That doesn't really make sense. You either make the LLM provider liable for the output of the model, or you have the current model. The friction already exists. All these AI companies and cloud providers are running "censored models", and more censorship is added at every layer. What would more friction even be here? More pop-ups?

Doing the former basically means killing the model-hosting business. Companies could develop models, use them internally and give them to their employees, but no public APIs would exist. Companies would strike legally binding contracts to use/license each other's models, but the general public wouldn't have access to those without something to mitigate the legal risk.

Maybe years down the line, as attitudes soften, some companies would begin to push the boundaries. Automating the legal approval process, opening signups, etc.

minimaxir

Yes, more pop-ups, retention metrics be damned. Even two years after ChatGPT, many people still think it's omniscient, which is what's causing trouble.

TeMPOraL

Hyperbole is as much of a problem. ChatGPT is not omniscient, but it's also not "dumb as shit", at least not across the board. LLMs got popular in large part because they provide unique, distinctly new value.

This black and white assumption that because LLMs are not always giving probably correct answers therefore they are dangerous, reminds me of what the generation of my parents and teachers thought of Wikipedia when it became popular. The problems were different ("anyone can edit" vs. "hallucinates stuff"), but the mindset seems very similar.

MaxikCZ

People who are susceptible to reading AI slop as universal truth do so because they don't read much at all. I guess you would be surprised how many users don't bother to read anything at all: a popup only exists in the sense of "how do I close this", which is solved by clicking the most visually distinct button. If you ask them what they clicked or what the popup was about, they look at you like you're crazy for even assuming they should know.

eddythompson80

I don't think more pop-ups solve anything. It'll just make a Chrome extension called "ChatGPTAutoAccept" get popular. You think someone who believes it's omniscient will suddenly reconsider because a "reminder, this is dumb as shit" pop-up keeps annoying them every 5 minutes?

nyarlathotep_

> LLM providers should be more aggressive in educating users about their fallibility since hallucinations can't ever be fully fixed, even if it means adding friction

But they can't be as the whole premise of the boom is replacing human intellectual labor. They've said as much on many many occasions--see Anthropic's CEO going off about mass unemployment quite recently. How can the two of these co-exist?

tbrownaw

> should be more aggressive in educating users about their fallibility

This might be an "experience is the best teacher" situation. It'd probably be pretty hard to invent a disclaimer that'd be as effective as getting bit.

minimaxir

Unfortunately, getting bit in cases such as publishing misinformation or false legal citations wastes everyone's time, not just their own.

userbinator

The disclaimer needs to be in bold red text at the top.

camillomiller

Remember when Apple was roasted to hell any time Maps pushed you into a wrong turn? Or when Google Maps would take you to the wrong place at the wrong time (like a sketchy neighborhood)? Those were all news stories they had to do PR crisis management for. Now they slap on a disclaimer like that and we're all good to go. The amount of public-opinion forgiveness these technologies are granted is disproportionate and disheartening.

thejohnconway

That always struck me as pretty overblown, given that before map apps, people got lost all the goddam time. It was a rare trip with any complexity that a human map reader wouldn’t make a mistake or two.

LLMs aren’t competing with perfect, they are competing with websites that may or may not be full of errors, or asking someone that may or may not know what they are talking about.

alpaca128

Worse - LLMs are competing with inconvenience, and inconvenience always loses.

arcanemachiner

Yeah, but we're all used to having software integrated into our lives now. And we all know how shitty and broken software often is...

ben_w

Apple maps currently insists that there's a hotel and restaurant across the street from me.

According to the address on the business website that Apple Maps itself links to, the business is 432 km away from me.

neepi

To be fair, people are pretty damn bad when it comes to verifying information. Despite my academic background, I catch myself failing to do it all the time as well.

However, LLMs amplify this damage by sounding authoritative on everything, and even worse, by being promoted as authoritative problem solvers for all domains with only a small disclaimer. This doublethink is unacceptable.

But if they made the disclaimer bigger, the AI market would collapse in about an hour, much like people's usage does when they don't verify something and get shot down by someone actually authoritative. This has happened at work a couple of times and caused some fairly high-profile problems. Many people refuse to use it now.

What we have is a bullshit generator propped up by avoiding speaking the truth, because the truth compromises the promoted utility. Classic bubble.

YetAnotherNick

You are assuming that the people burnt by LLM responses don't know that ChatGPT can make mistakes?

mdp2021

> The simple ...

No, that's improper phrasing. The correct disclaimer would be, "The engine below is structurally unreliable".

--

Comment, snipers. We cannot reply to unclear noise.

jll29

Language models are not designed to know things, they are designed to say things - that's why they are called language models and not knowledge models.

Given that a bunch of words have already been generated, it always adds the next words based on how common the sequence is.

The reason you get different answers each time is the effect of the pseudo-random number generator on picking the next word. The model looks at the probability distribution of the most likely next words, and when the configuration parameter called "temperature" is 0 (and it is actually not possible to set it to 0 in the GUI), there is no random influence, and strictly the most likely next word (top-1 MLE) will always be chosen. This leads to output that we would classify as "very boring".
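To make that concrete, here is a toy version of the sampling step in Python (the tokens and scores below are made up; nothing here comes from a real model):

    import math, random

    def sample_next(logits, temperature):
        # temperature 0: no randomness, always pick the single most likely token (top-1)
        if temperature == 0:
            return max(logits, key=logits.get)
        # otherwise rescale the scores, softmax them into probabilities, and sample
        scaled = {tok: score / temperature for tok, score in logits.items()}
        z = sum(math.exp(v) for v in scaled.values())
        probs = {tok: math.exp(v) / z for tok, v in scaled.items()}
        return random.choices(list(probs), weights=list(probs.values()))[0]

    # made-up scores for the next token after "The IBM PS/2 Model"
    logits = {" 30": 2.1, " 55": 1.4, " 80": 1.0, " 280": 0.2}
    print(sample_next(logits, 0))    # always " 30"
    print(sample_next(logits, 0.8))  # varies from run to run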

So the model knows nothing about IBM, PS/2, 80286 versus 80486, CPUs, the 280, or any models per se. -- One of the answers seems to suggest that there is no Model 280; I wonder whether that one was generated through another process (there is a way to incorporate user feedback via "reinforcement learning"), or whether it was a consequence of the same randomized next-word picking, just a luckier attempt.

otabdeveloper4

> This leads to output that we would classify as "very boring".

Not really. I set temperature to 0 for my local models, it works fine.

The reason why the cloud UIs don't allow a temperature of 0 is because then models sometimes start to do infinite loops of tokens, and that would break the suspension of disbelief if the public saw it.

mdp2021

Which local models are you using, that do not output loop garbage at temperature 0?

What do you get at very low temperature values instead of 0?

otabdeveloper4

> Which local models are you using, that do not output loop garbage at temperature 0?

All of them. I make my own frontends using llama-cpp. Quality goes up with temperature 0 and loops are rare.

The temperature setting isn't for improving quality, it's to not break your suspension of disbelief that you're talking to an intelligent entity.

verisimi

> Language models are not designed to know things, they are designed to say things - that's why they are called language models and not knowledge models.

This is true. But you go to Google not to 'have a chat' but ostensibly to learn something based in knowledge.

Google seems to be making an error in swapping the provision of "knowledge" for "words", you'd think, but then again perhaps it makes no difference when it comes to advertising dollars, which is their actual business.

neilv

On the Google search Web site, the "AI responses may include mistakes." weak disclaimer small print is also hidden behind the "Show more" button.

When OpenAI launched ChatGPT, I had to explain to a non-CS professor that it wasn't AI like they're thinking of, but currently more like a computational parlor trick that looks a lot like AI.

But turns out this parlor trick is awesome for cheating on homework.

Also good at cheating at many other kinds of work, if you don't care much about quality, nor about copyrights.

stavros

I really don't understand the view that it's a "parlor trick that looks like AI". If it's not "a thing that can write code", but instead just looks like a thing that can write code (but can actually write code), it can write code. All the "no true Scotsman" stuff about what it's doing behind the scenes is irrelevant, because we have no idea what human brains are doing behind the scenes either.

ben_w

Although I broadly agree, I wouldn't go quite as far as where you say:

> All the "no true Scotsman" stuff about what it's doing behind the scenes is irrelevant, because we have no idea what human brains are doing behind the scenes either.

Computers and transistors have a massive speed advantage over biological brains and synapses (literally, not metaphorically, the same ratio as the speed difference between how far you walk in a day and continental drift, with your brain being the continental drift), which means they can read the entire Internet in a few weeks to months to learn what they know, and not the tens to hundreds of millennia it would take a human.

Unfortunately, the method by which they acquire information and knowledge is sufficiently inefficient that they actually need to read the entire Internet to reach the skill level of someone who has only just graduated.

This means I'm quite happy to *simultaneously* call them extremely useful, even "artificial general intelligence", and yet also agree with anyone who calls them "very very stupid".

If we actually knew how our brains did this intelligence thing, we could probably make AI genuinely smart as well as absurdly fast.

hnlmorg

Their point wasn’t that it’s not useful. It’s that it isn’t artificial intelligence like the masses consider the term.

You wouldn’t say Intellisense isn’t useful but you also wouldn’t call it “AI”. And what LLMs are like is basically Intellisense on steroids (probably more like a cocktail of speed and acid, but you get my point)

stavros

If you'd call k-means AI but you wouldn't call LLMs AI, I'm so far off that reasoning that I don't think we can agree.

neilv

Historically, there's been some discussion about that:

https://en.wikipedia.org/wiki/Chinese_room

keiferski

It matters if we are making a distinction between essence and output.

On the output side, it functionally doesn't make much of a difference, at least in terms of more abstract things like writing code. Although I would argue that the output AI makes still doesn't match the complexity and nuance of an individual human being, and may never do so, simply because the AI is only simulating embodiment and existence in the world. It might need to simulate an Earth equivalent to truly simulate a human's personal output.

On the essence side, it's much more of a clear distinction. We have numerous ways of determining whether a thing is human or not - biology, for one. It would take some serious sci-fi to get to the point where an android is indistinguishable from a human on the cellular level.

ben_w

> Although I would argue that the output AI makes still doesn't match the complexity and nuance of an individual human being

LLMs are very good at nuance. Better than any human I've seen — so much so, I find it to be a tell.

> We have numerous ways of determining if a thing is human or not - biology, for one.

I don't care if the intelligence is human, I care if it's (1) (a) intelligent, (b) educated, and (2) has the ability to suffer or not so I know if it should have moral subject rights.

1a is slowly improving, but we're guessing and experimenting: not really engineering intelligence, just scaling up the guesses that work OK. 1b was always easy; libraries fit "education" in isolation from the "intelligent" part of 1a. LLMs are a good enough combination of (a) and (b) to be interesting, potentially even an economic threat, depending on how long the time horizon between failures gets.

2 is pre-paradigmatic, we don't have enough understanding of the problem to ask the correct question — even ignoring AI for the moment, the same problem faces animal welfare (and why would the answer be the same for each of chimps, dogs, ravens, lobsters, and bees?) and even within humans on topics such as abortion, terminal stage of neurodegenerative conditions such as Alzheimer's, etc.

otabdeveloper4

LLMs can't write code.

They don't have capacity to understand logical or temporal relationships, which is the core competency of coding.

They can form syntactically valid strings in a formal language, which isn't the same thing as coding.

stavros

Hmm, I guess I better throw away all this working code they wrote, then.

loa_in_

It's a memory augmentation/information retrieval tool with flexible input and output interface.

9x39

Gemini appears tuned to try to handle the typical questions people type in, while more traditional things you search for get some confabulated nonsense.

I've observed a great many people trust the AI Overview as an oracle. IMO, it's how "normal" people interact with AI if they aren't direct LLM users. It's not even age-gated like trusting the news - trusting AI outputs seems to cross most demographics. We love our confident-based-on-nothing computer answers as a species, I think.

eddythompson80

I think Google is in a particularly bad situation here.

For over a decade now, that spot in the search page had the "excerpt from a page" UI, which made a lot of sense. It cut out an extra click, and if you trusted the source site, and presumably Google's "Excerpt Extraction Technology" (whatever that was), what was left not to trust? It was a very trustworthy way to locate information.

Like if I search for a quick medical question and there is an excerpt from the Mayo Clinic, I trust the Mayo Clinic, so good enough for me. Sometimes I'd copy the excerpt from Google, go to the page and Ctrl-F it.

Google used to do a decent job of picking reputable sources, and the excerpts were indeed always found in the page in an unaltered context, so it was good enough to build trust. That system has degraded over the years in terms of how good it was at picking those reputable sources, most likely because it was gamed by SEO.

However, it has been replaced with the AI Overview. I'm not against AI, but AI is fundamentally different from "a relevant excerpt from a source you trust, with a verifiable source, in milliseconds".

tsunamifury

How could you think this hard and be so far off? Google is in a hyper-strong position here, and I don't even like them.

They can refine grounded results over time and begin serving up increasingly well-reasoned results as models improve cost-effectively. That then drives better vectors for ads.

Like what about this is hard to understand?

eddythompson80

What about what is hard to understand?

Google did it because it's better for Google, yes. They no longer have to deal with people trying to hack SEO. Now you would have to figure out how to influence Google's training process to hijack that box. So it's better for Google to move to the AI Overview. What's your point here?

I say Google is in a bad position morally, or in terms of "doing the right thing", not that one would really expect that from a corporation per se. There is a distinction, you know.

Google introduced the box as an "Excerpt from a search result" box. They traditionally put a lot of care into their search quality; it showed, and it built trust with their users. Over the years the search quality dropped, whether from less attention from Google or from a fundamentally harder problem to solve against far more motivated attackers. Yet the intrusion of bullshit websites into the "Excerpt from a search result" box still let you decide that you were not gonna trust medical advice from "mikeInTheDeep.biz". It wasn't ideal that they built trust and then let it slip, but being able to see a source with a quote makes it useful when you trust the source.

With AI Overview, you either trust it all, don't trust any of it, use it as confirmation bias, don't

geraneum

> if they aren't direct LLM users

My manager, a direct LLM user, uses the latest models to confirm his assumptions. If they are not confirmed on the first try, he proceeds to phrase the question differently until he gets what he wants from them.

edit: typo

danielbln

We love our confident-based-on-nothing answers period, computer or not.

chneu

Most folks just want confirmation. They don't want to have their views/opinions changed. LLMs are good at trying to give folks what they're looking for.

mdp2021

Repent.

You are not there to "love what gives you the kicks". That's a kind of love that should not exit the bedroom (better, the bathroom).

Llamamoe

I already went through a realization a while ago that you just can't mention something to people anymore and expect them to be able to learn about it by searching the web, like it used to be possible, because everything is just unreliable misleading SEO spam slop.

I shudder to think how much worse this is going to be with "AI Overview". Are we entering an era of people googling "how does a printer work" and (possibly) being told that it's built by a system of pulleys and ropes and just trusting it blindly?

Because that's the kind of magnitude of errors I've seen in dozens of searches I've made in the domains I'm interested in, and I think everyone has seen the screenshots of even more outlandish - or outright dangerous - answers.

hannob

"AI Responses May Include Mistakes" is really the one, single most important thing I want to shout into the whole AI debate.

It also should be the central issue - together with the energy/climate impacts - in every debate about AI ethics or AI safety. It's those two things that will harm us most if this hype continues unchecked.

consp

The problem is not that it may make mistakes, but that it will. Yet people do not realize this and treat it as an almighty oracle. It's a statistical model, after all; there is a non-zero chance of the monkey producing the works of Shakespeare.

rcarmo

This is why Google has got search fundamentally wrong. They just don’t care about accuracy of results anymore, and worry mostly about providing a quick answer and a bunch of sponsored links below it.

Llamamoe

Except that out of 10 answers, the "quick answer" is subtly wrong 6 times, egregiously wrong 2, and outright dangerous once. I've seen screenshots of stuff that would get people killed or in legal trouble.

dandanua

They just continue the Eric Schmidt idea "More results are better than none". It has evolved to "It's better to hallucinate than produce a negative answer", I guess.

nickjj

I had an experience the other day with ChatGPT and some Python code.

I wanted to modify Gunicorn's logger class so I can filter out specific URL paths. Given it's a hot code path (running on every request), I told it I had made 3 solutions and was looking to see which one was fastest: a list + loop using startswith, a compiled regex, and startswith with a tuple of paths.

It produced benchmark code and also benchmark results, which stated that the regex solution was the best and fastest solution using Python's standard library.

I didn't believe it so I ran the benchmark myself and the tuple version was over 5x faster than the regex solution.

I then told it that I had run the benchmark and got different results, and it almost word for word said something like "Oh right, thank you for the clarification, the tuple version is indeed the fastest!". It saved me a few minutes writing the benchmark code, but yeah, I rarely trust its output for anything I'm not 100% sure on.
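For reference, the kind of comparison being described is only a few lines of Python; roughly this shape (the skip prefixes and the test path are made up for illustration - the point is that running it yourself takes minutes and settles what an LLM's quoted numbers can't):

    import re
    import timeit

    # made-up prefixes to filter out of access logs, and a typical non-skipped path
    SKIP = ("/healthz", "/metrics", "/static/")
    SKIP_RE = re.compile(r"^(?:/healthz|/metrics|/static/)")
    PATH = "/api/v1/orders"

    def loop_startswith(p):
        # list/tuple + explicit loop over prefixes
        for prefix in SKIP:
            if p.startswith(prefix):
                return True
        return False

    def regex_match(p):
        # pre-compiled regex alternative
        return SKIP_RE.match(p) is not None

    def tuple_startswith(p):
        # str.startswith accepts a tuple of prefixes directly
        return p.startswith(SKIP)

    for fn in (loop_startswith, regex_match, tuple_startswith):
        print(fn.__name__, timeit.timeit(lambda: fn(PATH), number=1_000_000))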

elmerfud

AI is like that one guy who can always tell you something about anything with total confidence. So I'm really not sure why anyone would trust it beyond a bar conversation.

vouaobrasil

I think it's psychological. Most people use visual body cues to determine whether someone is lacking in confidence in their answer. AI does not have any cues to show a lack of confidence, and people also have a high trust in machine output because traditional algorithms always give the correct answer.

The percentage of people that will look at it critically is negligible.

normie3000

> Most people use visual body cues to determine whether someone is lacking in confidence in their answer

Do they?

vouaobrasil

I can certainly tell when someone is just bull**ing from their tone of voice long before they tell me the information.

JdeBP

No "AI" company has yet had the bravery to name its product Cliff Clavin.

Bravery in several ways, that is. There's the risk of being sued by John Ratzenberger. (-:

mvdtnz

> So really not sure why anyone would trust it beyond a bar conversation

Really, you don't know why? Maybe because it's being promoted as "AI" by companies with missions like "organise the world's information", who have spent decades now doing their best to provide accurate information for user queries?

csomar

I had this happen a few months ago with an old man. He said there are 10 billion people in the world, so I told him he was off by 2 billion. He was adamant and challenged me to a Google search. So I did just that, and lo and behold, there are 10 billion people according to Google.

I even took a screenshot: https://imgur.com/a/oQYKmKP

I really had nothing to say at that moment.

ekianjo

Note that we don't really know the exact answer because population reporting is shaky at best where there are the most people.

mdp2021

And a rational interlocutor, conversational shortcuts aside, replies in the form "The best estimates from sources like S0, S1 and S2 give a value between V1 and V2".

Biganon

Make up a fake popular-wisdom saying that sounds real, search for it, and Gemini will gladly tell you it exists and explain it to you.

mdp2021

...As others have noted elsewhere: which Gemini? There are cheap ones and others proposed as flagship.

ghusbands

The one that appears when you search with Google.