
The Einstein AI Model

58 comments

March 8, 2025

berkes

I've had some luck instructing AI to "Don't make up anything. If there's no answer, say I don't know".

Which made me think that AI would be far more useful (for me?) if it was tuned to "Dutchness" rather than "Americanness".

"Dutch" famously known for being brutally blunt, rude, honest, and pushing back.

Yet we seem to have "American" AI, tuned to "the customer is always right", inventing stuff just to not let you down, always willing to help even if that makes things worse.

Not "critical thinking" or "revolutionary" yet. Just less polite and less willing to always please you. In human interaction, the Dutch bluntness and honesty can be very off-putting, but It is quite efficient and effective. Two traits I very much prefer my software to have. I don't need my software to be polite or to not hurt my feelings. It's just a tool!

Kabukks

I suspect instructing the model to respond with "I don't know" more readily will result in more of those responses even though there are other options that seem viable according to the training data / model.

Remember, LLMs are just statistical sentence completion machines. So telling it what to respond with will increase the likelihood of that happening, even if there are other options that are viable.

But since you can't blindly trust LLM output anyway, I guess increasing "I don't know" responses is a good way of reducing incorrect responses (which will still happen frequently enough) at the cost of missing some correct ones.

berkes

> Remember, LLMs are just statistical sentence completion machines. So telling it what to respond with will increase the likelihood of that happening, even if there are other options that are viable.

Obviously. When I say "tuned" I don't mean adding stuff to a prompt. I mean tuning in the way models are also tuned to be more or less professional, tuned to defer certain tasks to other models (e.g. counting or math, something statistical models are almost unable to do), and so on.

I am almost certain that the chain of models we use on chatgpt.com is "tuned" to always give an answer, and not to answer with "I am just a model, I don't have information on this". Early models and early toolchains did this far more often, but today they are quite probably tuned to "always be of service".

"Quite probably" because I have no proof, other than that it will gladly hallucinate, invent urls and references, etc. And knowing that all the GPT competitors are battling for users, so their products quite certainly tuned to help in this battle - e.g. appear to be helpful and all-knowing, rather than factual correct and therefore often admittedly ignorant.

zamadatix

Whether you train the model to do math internally or tell it to call an external model which only does math, the root problem still exists. It's not as if a model which only does math won't hallucinate how to solve math problems just because it doesn't know about history; for the same number of parameters, it's probably better not to duplicate the parts needed to understand the basics of things multiple times.

The root problem is that training models to be uncertain of their answers results in lower benchmark scores in every area except hallucinations. It's as if you were taking a multiple-choice test and, instead of picking whichever of answers A-D seemed most plausible, you picked E, "I don't know". Helpful for the test grader, but a bad bet for a model trying to claim it gets more answers right than other models.
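
To make that incentive concrete, a toy back-of-the-envelope calculation (a sketch only; the 60% "known" fraction and the four-option format are made-up assumptions):

    # Expected benchmark score: always guessing vs. abstaining when unsure.
    # Assumes a 4-option multiple-choice test with no penalty for wrong answers.
    p_known = 0.6      # fraction of questions the model genuinely "knows"
    guess_rate = 0.25  # chance of a lucky guess on the rest

    score_guessing = p_known + (1 - p_known) * guess_rate  # 0.70
    score_abstaining = p_known                              # 0.60

    print(score_guessing, score_abstaining)  # guessing wins the leaderboard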

asddubs

I suspect it's a balancing act between the AI being generally willing to help and avoiding responses like this, e.g.:

https://www.sandraandwoo.com/wp-content/uploads/2024/02/twit...

or it just telling you to google it

shreyshnaccount

What (hypothetically) happens when the cost to run the next giant LLM exceeds the cost of hiring a person for tasks like this?

Yizahi

The so-called AI can't "know". It has no understanding of whether the generated text is an answer or not. You can't force that instruction onto a neural network; at best it adjusts the generated text slightly and you think that it somehow started understanding.

berkes

There's a distinction between "a model" and the chain of tools and models you employ when asking something on chatgpt.com or any of the consumer-facing alternatives.

The latter is a chain of models, some specialized in dissecting the question, some specialized in choosing the right models and tools (e.g.: there's a calculation in there, let's push that part to a simple Python function that can actually count, and pull the rest through a generic LLM). I experiment with such toolchains myself, and it's baffling how fast the complexity of all this is growing.

A very simple example would be: "question" -> does_it_want_code_generated.model -[yes]-> specialized_code_generator.model | -[no]-> specialized_english_generator.model (a rough sketch of such a router is at the end of this comment).

So, sure: a model has no "knowledge", and neither does a chain of tools. But having, for example, a model specialized in (i.e. trained on or enriched with) all scientific papers ever written, or maybe even a vector DB with all that data, somewhere in the toolchain, in charge of either finding the "very likely references" or declining to answer, would help a lot. It would for me.
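
A minimal sketch of the routing idea above (all names are hypothetical stand-ins; in a real chain the classifier and the two generators would themselves be models or prompts behind an LLM API):

    # Toy router: send coding questions to a code model, push arithmetic to
    # plain Python, and let a generic language model handle everything else.
    # The two "models" below are stubs standing in for real LLM calls.

    def classify(question: str) -> str:
        # In a real chain this would itself be a small classifier model.
        if "def " in question or "function" in question.lower():
            return "code"
        if any(ch.isdigit() for ch in question):
            return "math"
        return "prose"

    def code_model(q: str) -> str:
        return f"[specialized_code_generator.model would answer: {q}]"

    def prose_model(q: str) -> str:
        return f"[specialized_english_generator.model would answer: {q}]"

    def route(question: str) -> str:
        kind = classify(question)
        if kind == "code":
            return code_model(question)
        if kind == "math":
            # Plain Python does the counting instead of a statistical model.
            return str(eval(question, {"__builtins__": {}}))
        return prose_model(question)

    print(route("2 + 2 * 10"))  # -> "22"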

Yizahi

Sure, chains of networks can guess at a "passable" answer much better/faster/cheaper, etc. But that doesn't remove the core issue: none of the sub-networks or decision trees understands what it generates, and so it can't abort its work and output "no answer" or something similar.

The whole premise of the original request was that the user gives the NN a task which has a (maybe partially) verifiable answer. He sees an incorrect answer and wishes that a "failure" were displayed instead. But a NN can't verify the correctness of its own output. After all, the G in GPT stands for Generative.

netdevphoenix

> known for being brutally blunt, rude, honest, and pushing back.

That's a matter of perspective. Dutch people don't see themselves as rude. A Dutch person could say that Americans are known for being dishonest and not truly conveying what they mean, yet Americans won't see themselves that way. You can replace Dutch and American with any other nationalities.

berkes

I am Dutch and have lived in many countries on several continents. I do see myself as rude. But, being Dutch, I don't give a ** ;).

XCabbage

Obvious thought that I haven't tested: can you literally achieve this by getting it to answer in Dutch, or training an AI on Dutch text? Plausibly* Dutch-language training data will reflect this cultural difference by virtue of being written primarily by Dutch people.

* (though not necessarily, since the Internet is its own country with its own culture, and much training data comes from the Internet)

berkes

That hardly works. Though, from my limited experiments, Claude's models are better at this than OpenAI's. OpenAI will, quite often, come up with suggestions that are literal translations of English phrases (anglicisms).

Such as "Ik hoop dat deze email u gezond vindt" (I hope this email finds you well), which is so wrong that not even "simple" translation tools would suggest this.

Seeing that OpenAI's models can (could? this is from a large test we did months ago) not even use properly localized phrases, but use American ones instead, I highly doubt they can or will refuse to answer when they have nothing based on the training data.

zoover2020

I've tried Dutch answers, and it is more than happy to hallucinate and give me answers that are very "American". It doesn't help that our culture has been heavily inspired by US pop culture ever since the internet arrived.

Haven't tried prompt engineering with the Dutch stereotype, though.

j45

Also, the more accurate the prompt and its attachments are, the more accurate the processing tends to be.

OutOfHere

I have seen the other side, where a configured AI responds "I don't know" far too often, frequently when it shouldn't. There is nothing more useless than that. We certainly need an accurate balance.

randomNumber7

The thing about the Einstein example is that it was already known that the speed of light is constant.

The question he asked was just why this fact seemed incompatible with the Maxwell equations.

systemstops

Wouldn't the ability to "ask the right questions" require that AI could update its own weights, as those weights determine which questions can be asked?

mentalgear

BlueSky version: https://bsky.app/profile/thomwolf.bsky.social/post/3ljpkl6c6...

---

Quite an interesting post that asks the right question about "asking the right questions". Yet one aspect I felt was missing (which might automatically solve this) is first-principles-based causal reasoning.

A truly intelligent system — one that reasons from first principles by running its own simulations and physical experiments — would notice if something doesn't align with the "textbook version".

It would recognize when reality deviates from expectations and ask follow-up questions, naturally leading to deeper insights and the right questions - and answers.

Fascinating in this space is the new "Reasoning-Prior" approach (MIT Lab & Harvard), which trains reasoning capabilities learned from the physical world as a foundation for new models (before even learning about text).

Relevant paper: "General Reasoning Requires Learning to Reason from the Get-go."

ilaksh

I think it's more of a social phenomenon than an intellectual characteristic. I guess these days people would just assume that outlier ideas come from autism, but I think that isn't necessarily true.

But maybe it helps to be socially isolated or just stubborn. People do not want to accept new approaches.

Clearly they do eventually, but there is always some friction.

But I think it's been shown that through prompting and various types of training or tuning, LLMs can be configured to be non-sycophantic. It's just that humans don't want to be contradicted, so that behavior can be trained out of the models during reinforcement.

Along with the training process just generally being aimed at producing expected rather than unexpected answers.

neilv

A nice post (that should be somewhere smarter than contemporary Twitter/X).

> PS: You might be wondering what such a benchmark could look like. Evaluating it could involve testing a model on some recent discovery it should not know yet (a modern equivalent of special relativity) and explore how the model might start asking the right questions on a topic it has no exposure to the answers or conceptual framework of. This is challenging because most models are trained on virtually all human knowledge available today but it seems essential if we want to benchmark these behaviors. Overall this is really an open question and I’ll be happy to hear your insightful thoughts.

Why benchmarks?

A genius (human or AI) could produce novel insights, some of which could practically be tested in the real world.

"We can gene-edit using such-and-such approach" => Go try it.

No sales brochure claims, research paper comparison charts to show incremental improvement, individual KPIs/OKRs to hit, nor promotion packets required.

vessenes

The reason you'd have a benchmark is that you want to be able to check in on your model programmatically. DNA wet-lab work is slow and expensive. While you're absolutely right that benchmarks aren't the best thing ever and that they are used for marketing and sales purposes, they also do seem to generally create capacity momentum in the market. For instance, nobody running local LLMs right now would prefer a 12-month-old model to one of the top models today at the same size - they are significantly more capable, and many researchers believe that training on new and harder benchmarks has been a way to increase that capacity.

Agingcoder

The author seems to assume that conjuring up a conjecture is the hard part - yet the proof will be filled with the same standard mathematics (granted, sometimes wrapped up as new tools, and the proof ends up being as important as the result), often produced at great cost.

Having powerful assistants that allow people to try out crazy mathematical ideas without fear of risking their careers, or to just have fun with ideas, is likely to have an outsized impact anyway, I think.

kristianc

As Isaac Newton himself put it, “if I have seen further it is by standing on the shoulders of Giants.” It was ever thus.

timewizard

New things AI will magically fix by existing: The completely broken university career publishing pipeline. *fingers crossed*

dang

(Most comments here were posted to https://news.ycombinator.com/item?id=43317269 and then moved hither.)

ypeterholmes

Hey look, the goalposts are being moved again. This time it's from top-end researcher to generational genius. Question: what evidence is there that this benchmark will not be reached as well? Time and again these essays make the mistake of assuming AI is a static thing, and refuse to acknowledge the inexorable march forward we are witnessing. As humans, we cling to our own fragile superiority. Even on this thread: "I thought Hinton said the world would be transformed by now." That's NOT what was claimed. We are like three years in! Posts like this will be laughable in 10 years.

rcarmo

He means YMaaS, no? Might as well coin the acronym early.

tim333

>I’m afraid AI won't give us a "compressed 21st century".

There's no mention of exponential growth, which seems a major omission when you are talking about centuries. Computers have kept improving in a Moore's-law-like way in terms of compute per dollar, and no doubt will keep on like that for a while yet. Give it a few years and AI tech will be way better than what we have now. I don't know about exact timings like 5-10 years, but in a while.

dimitri-vs

What exponential growth? By all accounts things are slowing down: sonnet3.7 is not exponentially better, neither is gpt4.5, grok3 is just catching up. I'm still using sonnet3.5 for a lot of coding because IMO it's better than 3.7.

zesterer

The whole point of this post is that the things AI isn't good at and has never been good at will be the limit to otherwise-exponential growth.

TeMPOraL

> If something was not written in a book I could not invent it unless it was a rather useless variation of a known theory. __More annoyingly, I found it very hard to challenge the status-quo__, to question what I had learned.

(__emphasis__ mine)

As if "challenging the status-quo" was the goal in the first place. You ain't gonna get any Einstein by asking people to think inside the "outside the box" box. "Status quo" isn't the enemy, and defying it isn't the path to genius; if you're measuring your own intellectual capacity by proxy of how much you question, you ain't gonna get anywhere useful. After all, questioning everything is easy, and doesn't require any particular skill.

The hard thing is to be right, despite both the status-quo and the "question the status-quo" memes.

(It also helps being in the right time and place, to have access to the results of previous work that is required to make that next increment - that's another, oft forgotten factor.)