AI is becoming 'yes-men on servers'
34 comments
· March 10, 2025
berkes
I've had some luck instructing AI to "Don't make up anything. If there's no answer, say I don't know".
Which made me think that AI would be far more useful (for me?) if it were tuned to "Dutchness" rather than "Americanness".
The Dutch are famously known for being brutally blunt, rude, honest, and pushing back.
Yet we seem to have "American" AI, tuned to "the customer is always right", inventing stuff just to avoid letting you down, always willing to help even if that makes things worse.
Not "critical thinking" or "revolutionary" yet, just less polite and less willing to always please you. In human interaction, Dutch bluntness and honesty can be very off-putting, but they are quite efficient and effective: two traits I very much prefer my software to have. I don't need my software to be polite or to spare my feelings. It's just a tool!
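For instance, a minimal sketch of that instruction as a system prompt (assuming the OpenAI Python client; the model name, prompt wording, and example question are all illustrative, not a tested recipe):

    # Minimal sketch: nudging a model toward blunt "I don't know" answers.
    # Assumes the OpenAI Python client (openai>=1.0); model name, prompt
    # wording, and question are illustrative placeholders.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    DUTCH_MODE = (
        "Be blunt and direct. Do not make anything up. "
        "If you are not confident in the answer, reply exactly: I don't know."
    )

    response = client.chat.completions.create(
        model="gpt-4o",  # any chat model would do here
        messages=[
            {"role": "system", "content": DUTCH_MODE},
            {"role": "user", "content": "Who won the 1909 Elfstedentocht?"},
        ],
    )
    print(response.choices[0].message.content)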
Kabukks
I suspect instructing the model to respond with "I don't know" more readily will result in more of those responses even though there are other options that seem viable according to the training data / model.
Remember, LLMs are just statistical sentence-completion machines, so telling one what to respond with will increase the likelihood of that response, even when other viable options exist.
But since you can't blindly trust LLM output anyway, I guess increasing "I don't know" responses is a good way of reducing incorrect responses (which will still happen frequently enough) at the cost of missing some correct ones.
asddubs
I suspect it's a balancing act between the AI being generally willing to help and avoiding responses like this, e.g.:
https://www.sandraandwoo.com/wp-content/uploads/2024/02/twit...
or just telling you to google it
shreyshnaccount
What happens (hypothetically) when the cost to run the next giant LLM exceeds the cost of hiring a person for tasks like this?
Yizahi
The so-called AI can't "know". It has no understanding of whether the generated text is an answer or not. You can't force that instruction onto a neural network; at best it adjusts the generated text slightly, and you think it has somehow started understanding.
netdevphoenix
> known for being brutally blunt, rude, honest, and pushing back.
That's a matter of perspective. Dutch people don't see themselves as rude. A Dutch person could say that Americans are known for being dishonest and not truly conveying what they mean, yet Americans won't see themselves that way. You can replace Dutch and American with any other nationalities.
XCabbage
Obvious thought that I haven't tested: can you literally achieve this by getting it to answer in Dutch, or by training an AI on Dutch text? Plausibly* Dutch-language training data will reflect this cultural difference by virtue of being written primarily by Dutch people.
* (though not necessarily, since the Internet is its own country with its own culture, and much training data comes from the Internet)
zoover2020
I've tried asking for answers in Dutch, and it is more than happy to hallucinate and give me answers that are very "American". It doesn't help that our culture has been heavily influenced by US pop culture, especially since the internet.
Haven't tried prompt engineering with the Dutch stereotype, though.
lifestyleguru
> "Dutch" famously known for being brutally blunt, rude, honest, and pushing back.
The Dutch will never bluntly push back if you plan to set up a tax evasion scheme in their country. Being vicious assholes in daily life, especially towards strangers? That's hardly something deserving of praise.
msm_
That's... a surprisingly crass thing to say. I would play it off as a joke, if not for the second part of your post. Dutch people are not "vicious assholes"; they have a culture of direct communication. Assuming that only your own culture's communication patterns are "correct" is xenophobic and close-minded.
And associating everyone in a country with "tax evasion schemes" is rude, if that was not actually a joke.
eszed
To be fair, that's a consequence of the Netherlands' well-known love of soda-bread sandwiches.
lifestyleguru
What do you mean, some Irish reference? Oh, I see, I answered my own question ;)
j45
Also, the more accuracy you put into the prompt and its attachments, the more accurate the processing will be.
mentalgear
BlueSky version: https://bsky.app/profile/thomwolf.bsky.social/post/3ljpkl6c6...
---
Quite an interesting post that asks the right question about "asking the right questions". Yet one aspect I felt was missing (which might automatically solve this) is first-principles-based causal reasoning.
A truly intelligent system — one that reasons from first principles by running its own simulations and physical experiments — would notice if something doesn't align with the "textbook version".
It would recognize when reality deviates from expectations and ask follow-up questions, naturally leading to deeper insights and the right questions - and answers.
A fascinating development in this space is the new "Reasoning-Prior" approach (MIT & Harvard), which trains reasoning capabilities learned from the physical world as a foundation for new models (before even learning about text).
Relevant paper: "General Reasoning Requires Learning to Reason from the Get-go."
mentalgear
PS: great explainer video https://www.youtube.com/watch?v=seTdudcs-ws&t=180s
nalinidash
For those who do not want to go to Twitter, here is the xcancel link: https://xcancel.com/Thom_Wolf/status/1897630495527104932
neffo
or his blog: https://thomwolf.io/blog/scientific-ai.html
Agingcoder
The author seems to assume that conjuring up a conjecture is the hard part - yet proving it will be filled with the same standard mathematics (granted, sometimes wrapped as new tools, and the proof often ends up being as important as the result), often at great cost.
Having powerful assistants that let people try out crazy mathematical ideas without fear of risking their careers, or just have fun with ideas, is likely to have an outsized impact anyway, I think.
kristianc
As Isaac Newton himself put it, “if I have seen further it is by standing on the shoulders of Giants.” It was ever thus.
timewizard
New things AI will magically fix by existing: The completely broken university career publishing pipeline. *fingers crossed*
neilv
A nice post (that should be somewhere smarter than contemporary Twitter/X).
> PS: You might be wondering what such a benchmark could look like. Evaluating it could involve testing a model on some recent discovery it should not know yet (a modern equivalent of special relativity) and explore how the model might start asking the right questions on a topic it has no exposure to the answers or conceptual framework of. This is challenging because most models are trained on virtually all human knowledge available today but it seems essential if we want to benchmark these behaviors. Overall this is really an open question and I’ll be happy to hear your insightful thoughts.
Why benchmarks?
A genius (human or AI) could produce novel insights, some of which could practically be tested in the real world.
"We can gene-edit using such-and-such approach" => Go try it.
No sales brochure claims, research paper comparison charts to show incremental improvement, individual KPIs/OKRs to hit, nor promotion packets required.
wewewedxfgdf
I'm still waiting for the end of the world caused by AI, as predicted by a very large number of prominent figures such as Sam Altman, Geoffrey Hinton, Elon Musk, the signers of the Center for AI Safety statement, Shane Legg, Marvin Minsky, and Eliezer Yudkowsky.
No sign yet.
On the other hand, LLMs are writing code that I can debug and eventually get working in a real code base - and script writers everywhere are writing scripts more quickly, marketing people are writing better ad copy, employers are writing better job ads, and real estate agents are writing better listings for houses.
lionkor
Prominent figure says baseless thing to boost stock prices, more news at 6
TeMPOraL
Yes, except half of the list isn't made of prominent people. Whose stock price was Eliezer boosting when he was talking about these things 15 years ago?
Nah, it's more that the masses got exposed to those ideas recently - ideas which existed long ago, in obscurity - and of course now everyone is a fucking expert in this New Thing No One Talked About Before ChatGPT.
Even the list the GP gave - that particular grouping of names - communicates only that its author doesn't have the first clue what they're talking about.
godelski
The oddity isn't that people lie; the oddity is that people continue to believe those who lie. They even give more trust to those who constantly lie. That is certainly odd.
netdevphoenix
The fact is, even if the world were to end, finding the causes would be extremely difficult because... well, the world would have ended.
empiko
I am still waiting to see the impact on GDP or any other economic measure.
TeMPOraL
> If something was not written in a book I could not invent it unless it was a rather useless variation of a known theory. __More annoyingly, I found it very hard to challenge the status-quo__, to question what I had learned.
(__emphasis__ mine)
As if "challenging the status-quo" was the goal in the first place. You ain't gonna get any Einstein by asking people to think inside the "outside the box" box. "Status quo" isn't the enemy, and defying it isn't the path to genius; if you're measuring your own intellectual capacity by proxy of how much you question, you ain't gonna get anywhere useful. After all, questioning everything is easy, and doesn't require any particular skill.
The hard thing is to be right, despite both the status-quo and the "question the status-quo" memes.
(It also helps being in the right time and place, to have access to the results of previous work that is required to make that next increment - that's another, oft forgotten factor.)
OtherShrezzing
>We're currently building very obedient students, not revolutionaries. This is perfect for today’s main goal in the field of creating great assistants and overly compliant helpers. But until we find a way to incentivize them to question their knowledge and propose ideas that potentially go against past training data, they won't give us scientific revolutions yet.
This would definitely be an interesting future. I wonder what it'd do to all of the work in alignment & safety if we started encouraging AIs to go a bit rogue in some domains.
internet_points
If an LLM is trained on knowledge up until, say, September 2023, could you use a corpus of interesting/insightful scientific discoveries and new methods developed after that date to evaluate or tune it? (Though I fear it would be a small corpus.)
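A rough sketch of that idea (the corpus file and its schema are hypothetical placeholders; how to judge the answers is the open part):

    # Hypothetical sketch: evaluate a model with a Sept 2023 cutoff on
    # discoveries published after that date. "post_cutoff_discoveries.jsonl"
    # is a placeholder corpus of {"question": ..., "finding": ...} records.
    import json

    def load_corpus(path="post_cutoff_discoveries.jsonl"):
        with open(path) as f:
            return [json.loads(line) for line in f]

    def evaluate(ask_model, corpus):
        # ask_model: any callable mapping a question string to the model's
        # answer. Judging how close an answer comes to the real finding
        # (human review, a judge model, ...) is left open here.
        return [
            {"question": item["question"],
             "expected": item["finding"],
             "got": ask_model(item["question"])}
            for item in corpus
        ]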
Yizahi
One of the problems would be acquiring said corpus. NN corporations got away with scraping all human-made content for free (arguably stealing it all), but no one can really prove that their specific content was taken without permission, so there are no lawsuits. The NYT tried, but that was a workaround, and I don't know the status of that case. But if an NN corporation comes out and explicitly says "here, we are using a Nature journal dump from 2024", then the Nature journal will come to them and say "oh, really?".
kingkongjaffa
Get a research paper and look at its references. Give an LLM all of the references but not the paper itself. See if it can conclude something like the paper does, or at least design the same experiment as detailed in the paper.
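A sketch of that setup (how the reference abstracts are retrieved, and how the output is judged against the hidden paper, are both left as placeholders):

    # Hypothetical sketch of the references-only test: show the model a
    # paper's cited works, hide the paper itself, and ask what hypothesis
    # and experiment would follow. Retrieving the abstracts (e.g. via an
    # OpenAlex or arXiv lookup) is out of scope here.
    def build_prompt(reference_abstracts):
        refs = "\n\n".join(
            f"[{i}] {text}" for i, text in enumerate(reference_abstracts, 1)
        )
        return (
            "Below are the abstracts of every work cited by an unnamed "
            "paper. Based only on these, state the hypothesis the paper "
            "most likely tests and design the experiment it most likely "
            "runs.\n\n" + refs
        )

    # The model's proposal would then be compared, by human reviewers or
    # a judge model, against the hidden paper's actual hypothesis/methods.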
jrimbault
What about the non-LLM work?
I know barely anything about it, but it seems some people are interested in and excited about protein engineering powered by neural networks.
moralestapia
>Just consider the crazy paradigm shift of special relativity and the guts it took to formulate a first axiom like “let’s assume the speed of light is constant in all frames of reference” defying the common sense of these days (and even of today…)
I'm not an expert on this. Wasn't this an observed phenomenon before Albert put together his theory?