
The Einstein AI Model

216 comments · March 8, 2025

mentalgear

BlueSky version: https://bsky.app/profile/thomwolf.bsky.social/post/3ljpkl6c6...

---

Quite interesting post that asks the right question about "asking the right questions". Yet one aspect I felt was missing (which might automatically solve this) is first-principles-based causal reasoning.

A truly intelligent system — one that reasons from first principles by running its own simulations and physical experiments — would notice if something doesn't align with the "textbook version".

It would recognize when reality deviates from expectations and ask follow-up questions, naturally leading to deeper insights and the right questions - and answers.

Fascinating in this space is the new "Reasoning-Prior" approach (MIT Lab & Harvard), which trains reasoning capabilities learned from the physical world as a foundation for new models (before even learning about text).

Relevant paper: "General Reasoning Requires Learning to Reason from the Get-go."

antonkar

Interesting. I think the guy who'll make the GUI for LLMs is the next Jobs/Gates/Musk and Nobel Prize winner (I think it'll solve alignment by having millions of eyes on the internals of LLMs), because computers became popular only after the OS with a GUI appeared. I just started an Ask HN to let people share their AI safety ideas, both crazy and not: https://news.ycombinator.com/item?id=43332593

zombot

"Reality" is an alien concept to an LLM. All they have is sequences of words that they can complete.

reverius42

"Reality" is an alien concept to a Human. All they have is sequences of motions and noises they can complete.

neom

I explained to someone that "mum" is "mum" because of lips, air, and sounds, and they didn't believe me, so that's what we're contending with...

herculity275

I used to think that but newer models' increasing coherence coupled with the astonishing success of RLHF makes me think that LLMs that can effectively intuit "reality" will emerge fairly soon.

alex77456

Maybe we're overestimating average human brain capabilities

chezelenkoooo

I think the evidence of all human progress disagrees with this statement fairly strongly.

engfan

I have never heard anyone think this way: “The main mistake people usually make is thinking Newton or Einstein were just scaled-up good students, that a genius comes to life when you linearly extrapolate a top-10% student.”

The reason such people are widely lauded as geniuses is precisely because people can’t envision smart students producing paradigm-shifting work as they did.

Yes, people may be talking about AI performance as genius-level but any comparison to these minds is just for marketing purposes.

8note

we kinda think too much of them though. each is also a product of their surroundings, and had contemporaries who could or did come to the same revelations.

InfiniteLoup

What's your opinion on John von Neumann?

seanhunter

Yes, or Leonhard Euler.

_diyar

Alien

downboots

If the universe is not intelligent, how can a subset of it be intelligent? If it is all computation, what is the purpose?

sebastiennight

The first question is weird. Many subsets of X can have property Y without X having it, wouldn't you say?

"If the desert is not covered in palm trees, how can a subset of it be covered in palm trees?"

"If the neural network is not activating, how can a node of the network be activating?"

neilv

A nice post (that should be somewhere smarter than contemporary Twitter/X).

> PS: You might be wondering what such a benchmark could look like. Evaluating it could involve testing a model on some recent discovery it should not know yet (a modern equivalent of special relativity) and explore how the model might start asking the right questions on a topic it has no exposure to the answers or conceptual framework of. This is challenging because most models are trained on virtually all human knowledge available today but it seems essential if we want to benchmark these behaviors. Overall this is really an open question and I’ll be happy to hear your insightful thoughts.

Why benchmarks?

A genius (human or AI) could produce novel insights, some of which could practically be tested in the real world.

"We can gene-edit using such-and-such approach" => Go try it.

No sales-brochure claims, research-paper comparison charts showing incremental improvement, individual KPIs/OKRs to hit, or promotion packets required.

vessenes

The reason you'd have a benchmark is that you want to be able to check in on your model programmatically. DNA wet-lab work is slow and expensive. While you're absolutely right that benchmarks aren't the best thing ever and that they are used for marketing and sales purposes, they also do seem to generally create capacity momentum in the market. For instance, nobody running local LLMs right now would prefer a 12-month-old model to one of the top models today at the same size - they are significantly more capable, and many researchers believe that training on new and harder benchmarks has been a way to increase that capacity.
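(To make "check in on your model programmatically" concrete, here is a minimal, hypothetical eval-harness sketch in Python; the ask_model stub and the tiny question set are assumptions for illustration, not anything from the thread.)

    # Minimal benchmark-harness sketch: score a model against a fixed question set.
    # "ask_model" is a placeholder; swap in a real local or hosted model call.
    def ask_model(question: str) -> str:
        return "unknown"  # stand-in answer so the sketch runs end to end

    # Hypothetical question set; a real benchmark would load many more from disk.
    BENCHMARK = [
        {"question": "What year was special relativity published?", "answer": "1905"},
        {"question": "What does DNA stand for?", "answer": "deoxyribonucleic acid"},
    ]

    def run_benchmark(items) -> float:
        # Crude exact-match scoring; real harnesses use richer graders.
        correct = sum(
            ask_model(it["question"]).strip().lower() == it["answer"].lower()
            for it in items
        )
        return correct / len(items)

    print(f"accuracy: {run_benchmark(BENCHMARK):.0%}")  # 0% with the stub model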

Agingcoder

The author seems to assume that conjuring up a conjecture is the hard part - yet working it out will involve the same standard mathematics (granted, sometimes wrapped as new tools, and the proof often ends up being as important as the result), often at great cost.

Having powerful assistants that allow people to try out crazy mathematical ideas without fear of risking their careers, or just to have fun with ideas, is likely to have an outsized impact anyway, I think.

aleksiy123

The Bitter Lesson seems relevant here again. http://www.incompleteideas.net/IncIdeas/BitterLesson.html

I think I read somewhere about Erdős having this somewhat brute force approach. Whenever fresh techniques were developed (by himself or others), he would go back to see if they could be used on one of his long-standing open questions.

HappMacDonald

I think this is the second time I've read this blog post, but it increasingly strikes me as parenting advice.

Translated to that domain, it reads "teach your kids how to think, not what to think".

causal

Paradoxically, as a parent I find the notion that humans are blank slates completely false. Babies come with a tremendous amount of pre-programmed behaviors and interests.

robwwilliams

Which is great advice that almost no parents follow.

tensor

Even worse, people seem to forget that “science” is not math. You need to test hypotheses with physical (including biological) experiments. The vast majority of the time spent doing “science” is running these experiments.

An LLM-like AI won't help with that. It would still be a huge help in finding and correlating data and information, though.

timewizard

New things AI will magically fix by existing: The completely broken university career publishing pipeline. *fingers crossed*

kristianc

As Isaac Newton himself put it, “if I have seen further it is by standing on the shoulders of Giants.” It was ever thus.

msvana

I have a few thoughts after reading this:

- I started to see LLMs as a kind of search engine. I cannot say they are better than traditional search engines: on one hand, they are better at personalizing the answer; on the other hand, they hallucinate a lot.

- There is a different view of how new scientific knowledge is made: it's all about connecting existing dots. Maybe LLMs can assist with this task by helping scientists discover relevant dots to connect. But as the author suggests, this is only part of the job. To find the correct ways to connect the dots, you need to ask the right questions, examine the space of counterfactuals, etc. LLMs can be a useful tool, but they are not autonomous scientists (yet).

- As someone developing software on top of LLMs, I am slowly coming to a conclusion that human-in-the-loop approaches seem to work better than fully autonomous agents.

downboots

Instead of connecting language with physical existence, or entities, it's connecting tokens. An LLM may be able to describe scenes in a video, but a model grounded in the physical world could tell you that said video is a deepfake, because of some principle like conservation of energy and mass, informed by experience, assumptions, inference rules, etc.
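(As a toy illustration of the kind of physical-consistency check being described - not anything from the comment - the sketch below tests whether tracked vertical positions from a video are consistent with constant gravitational acceleration; the thresholds and units are arbitrary assumptions.)

    # Toy physical-consistency check: do tracked vertical positions of a falling
    # object fit constant-acceleration motion y = y0 + v0*t + 0.5*a*t^2?
    import numpy as np

    def fits_free_fall(t, y, tol=0.05):
        coeffs = np.polyfit(t, y, deg=2)          # fit a parabola to the track
        residuals = y - np.polyval(coeffs, t)
        accel = 2 * coeffs[0]                     # implied acceleration (m/s^2)
        rel_error = np.max(np.abs(residuals)) / (np.ptp(y) + 1e-9)
        # Plausible only if the fit is tight AND acceleration is near -9.8 m/s^2.
        return rel_error < tol and -12.0 < accel < -8.0

    t = np.linspace(0, 1, 30)
    real = 10 - 0.5 * 9.81 * t**2   # genuine free fall from 10 m
    fake = 10 - 3.0 * t             # object drifting down at constant speed
    print(fits_free_fall(t, real), fits_free_fall(t, fake))  # True False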

msabalau

It doesn't seem correct to dismiss the creativity of Move 37 because real originality is "something more fundamental, like inventing the rules of Go itself".

It would seem more fruitful to simply point out that LLMs aren't all of AI, and that excelling at mimicking human-like text production isn't really doing the work that AlphaGo was attempting.

Just because both things might be given as (different) examples of deep reinforcement learning in an AI survey course doesn't mean that we have much reason to believe that the vast investments in LLMs result in AlphaGo like achievements.

Nesco

Modern LRMs do have some tiny degree of intelligence.

OtherShrezzing

>We're currently building very obedient students, not revolutionaries. This is perfect for today’s main goal in the field of creating great assistants and overly compliant helpers. But until we find a way to incentivize them to question their knowledge and propose ideas that potentially go against past training data, they won't give us scientific revolutions yet.

This would definitely be an interesting future. I wonder what it'd do to all of the work in alignment & safety if we started encouraging AIs to go a bit rogue in some domains.

chr15m

If this take is correct and we need creative B students, we might still get a compressed 21st century with human creative B students working together with AI A students who support the human with research, validation, workshopping ideas, etc.

downboots

"Alpha children wear grey. They work much harder than we do, because they're so frightfully clever. I'm awfully glad I'm a Beta, because I don't work so hard."

"The lower the caste, the shorter the oxygen."

EigenLord

I think the author has a point. LLMs struggle with what you might call epistemically constructive novelty. It's the ability not just to synthesize existing knowledge, but to identify what's missing and conjecture something to fill the gap and demonstrate it to satisfaction. Out-of-distribution knowledge gaps are typically where LLMs "hallucinate." Unlike highly skilled human researchers, they don't pause and construct the bridge that will get them from known to unknown, they just immediately rush to fill in the blank with whatever sounds most plausible. They need to ask questions that haven't been asked before, or answer ones that haven't been answered. Is this just some missing subroutine that we'll eventually figure out? Or is this conjecture-proving process much more elaborate than whatever existing models, no matter how scaled, can manage? I'm not sure. But the answer starts with a question.

_cs2017_

I wonder if people could just write their blog posts in a short form: claim, argument in favor, counter-argument; consequences (this last part is optional).

Like this whole blog post could be:

Claim: Current AI is unlikely to usher in an era of dramatically accelerated scientific discovery.

Argument in favor: A genius does not come to life when you linearly extrapolate a top-10% student. Newton and Einstein were not just scaled-up good students. To create an Einstein, we need a system that can ask questions nobody else has thought of or dared to ask. One that writes 'What if everyone is wrong about this?' when all textbooks, experts, and common knowledge suggest otherwise.

Existing benchmarks don't test such skills. And existing systems are likely hopelessly far from this capability (based on the author's personal feelings).

Counter-argument: none.

Consequences: obvious.

hackerknew

Could we train an AI model on the corpus of physics knowledge up to the year 1905 and then see if we can adjust the prompt to get it to output the theory of relativity?

This would be an interesting experiment for other historical discoveries too. I'm now curious whether anybody has created a model with "old data" like documents and books from hundreds of years ago, to see if it comes up with the same conclusions as researchers and scientists of the past.

Would AI have been able to predict the effectiveness of vaccines, insulin, other medical discoveries?
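(A minimal sketch of the data side of such an experiment, assuming each training document carries a publication year; the "year" and "text" fields and the tiny corpus are hypothetical.)

    # Sketch: keep only documents published before the cutoff, so the model
    # never sees the discovery we want it to re-derive.
    CUTOFF_YEAR = 1905  # hold out special relativity and everything after

    def filter_corpus(docs, cutoff=CUTOFF_YEAR):
        return [d for d in docs if d.get("year") is not None and d["year"] < cutoff]

    corpus = [
        {"year": 1687, "text": "Philosophiae Naturalis Principia Mathematica ..."},
        {"year": 1865, "text": "A Dynamical Theory of the Electromagnetic Field ..."},
        {"year": 1905, "text": "On the Electrodynamics of Moving Bodies ..."},
    ]
    pre1905 = filter_corpus(corpus)
    print(len(pre1905), "of", len(corpus), "documents kept")  # 2 of 3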

Garlef

Great idea!

But there might not be enough text.

And: there's a situation similar to the one that makes double-blind studies necessary - the questions we pose to such a system would be contaminated by our cultural background; we might be leading the system.

And if the system is autonomous and we wait for something true to appear, how would we know that the final system, trained on current data, produced something worthwhile?

Take maths: producing new proofs and new theorems might not be the issue. Rather: why should we care about these results? Thousands of PhD students produce new mathematics all the time. And most of it is irrelevant.

esafak

That's the ideal, but I think today's models are too crude for that. Relativity is built on differential geometry, which was new at the time. I think inventing or even building that is beyond today's models; there's an infinitely large space of mathematics that can be invented, and barely a gradient to guide the search. Humans don't coin mathematics by gradient descent. The most I've seen is fitting observations using existing mathematics, a technique known as symbolic regression. The E=mc^2 equation could be curve-fitted like this, but it would afford no insight.

https://en.wikipedia.org/wiki/Symbolic_regression
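(As a rough illustration of symbolic regression at its crudest - a toy sketch, not any particular library's method - the code below enumerates a few candidate forms, fits a single constant to each by least squares, and keeps the best one. On synthetic mass/energy data it recovers E = k*m with k ≈ c^2, but, as the comment says, the recovered formula carries no physical insight.)

    # Toy symbolic-regression sketch: pick the best-fitting formula E = k * f(m)
    # from a tiny set of candidate forms, via least squares on synthetic data.
    import numpy as np

    C = 2.998e8                      # speed of light, m/s
    m = np.linspace(0.1, 5.0, 50)    # masses in kg
    E = m * C**2                     # "observed" energies (noise-free for clarity)

    candidates = {
        "k*m":       m,
        "k*m^2":     m**2,
        "k*sqrt(m)": np.sqrt(m),
        "k*m^3":     m**3,
    }

    best = None
    for name, f in candidates.items():
        k = float(np.dot(f, E) / np.dot(f, f))        # least-squares scale factor
        err = float(np.mean((E - k * f) ** 2))
        if best is None or err < best[2]:
            best = (name, k, err)

    name, k, _ = best
    print(f"best form: E = {name} with k = {k:.3e}  (c^2 = {C**2:.3e})")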

ilamparithi

Had the same thought some time back about AI discovering the theory of relativity with only data from before 1905. It would give a definite answer about whether any reasoning is involved in the LLM output.

knowaveragejoe

Now that would be interesting!

systemstops

Wouldn't the ability to "ask the right questions" require that AI could update its own weights, as those weights determine which questions can be asked?

esafak

If the existing weights don't let it ask the right questions, assuming it is properly fit, why would retraining it fix the problem?
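(For readers less familiar with the terms in this exchange, a generic PyTorch sketch of the distinction being debated: at inference time the weights stay frozen, while a fine-tuning step actually changes them. This illustrates the mechanics only, not any claim about a particular LLM.)

    # Minimal sketch of the frozen-weights vs. weight-update distinction.
    import torch
    import torch.nn as nn

    model = nn.Linear(4, 1)
    x, y = torch.randn(8, 4), torch.randn(8, 1)

    # Inference: weights stay exactly as they are.
    with torch.no_grad():
        _ = model(x)
    before = model.weight.clone()

    # One fine-tuning step: weights change.
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()
    opt.step()

    print(torch.equal(before, model.weight))  # False: training updated the weights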