SIMA 2: An agent that plays, reasons, and learns with you in virtual 3D worlds
November 13, 2025
eminence32
This is obviously just a research project, but I do wonder about the next steps:
* After exploring and learning about a virtual world, can anything at all be transferred to an agent operating in the real world? Or would such an agent have to be trained partially or exclusively in the real world?
* These virtual worlds are obviously limited in a lot of important ways (for example, character locomotion in a game is absolutely nothing like how a multi-limbed robot moves). Will there eventually need to be more sophisticated virtual worlds that more closely mirror the real world?
* Google seems clearly interested in generalized agents and AGI, but I'm actually somewhat interested in AI agents in video games too. Many video games have companion NPCs that you can sort of give tasks to, but in almost all cases, the companion NPCs are nearly uncontrollable and very limited in what they can actually do.
nharada
The end goal is to marry the lessons learned about HOW to learn in a virtual world with a high fidelity world model that's currently out of reach for this generation of AI. In a year or two once we have a world model that's realistic enough and fast enough, robots will be trained there and then (hopefully) generalize easily to the real world. This is groundwork trying to understand how to do that without having the models required to do it for real.
Workaccount2
>We’ve observed that, throughout the course of training, SIMA 2 agents can perform increasingly complex and new tasks, bootstrapped by trial-and-error and Gemini-based feedback.
>In subsequent training, SIMA 2’s own experience data can then be used to train the next, even more capable version of the agent. We were even able to leverage SIMA 2’s capacity for self-improvement in newly created Genie environments – a major milestone toward training general agents across diverse, generated worlds.
Pretty neat. I wonder how that works with Gemini; I suppose SIMA is a model (agent?) that runs on top of it?
FuckButtons
That’s what it sounded like to me, a plain text interface between two distinct systems.
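A toy sketch of the loop the blog post seems to describe, assuming the "plain text interface" reading above: the agent acts in an environment, a separate judge model (Gemini, in their setup) scores the episode from its transcript, and the scored trajectories become training data for the next agent version. All names here (ToyEnv, gemini_judge, etc.) are illustrative assumptions, not SIMA's actual API.

```python
class ToyEnv:
    """Stand-in for a 3D game: the goal is to reach position 3."""
    def reset(self, task):
        self.pos = 0
        return self.pos
    def step(self, action):
        self.pos += 1 if action == "forward" else 0
        return self.pos, self.pos >= 3  # (observation, done)

def toy_agent(obs, task):
    # A real agent would reason over pixels plus the task text.
    return "forward"

def gemini_judge(task, trajectory):
    # A real judge model would read a plain-text/visual transcript;
    # here we just check whether the goal state was reached.
    return 1.0 if trajectory[-1]["obs"] >= 3 else 0.0

def self_improvement_episode(env, agent, judge, task, max_steps=10):
    """One episode: roll out, score via the judge, emit training data."""
    trajectory, obs = [], env.reset(task)
    for _ in range(max_steps):
        action = agent(obs, task)
        obs, done = env.step(action)
        trajectory.append({"obs": obs, "action": action})
        if done:
            break
    score = judge(task, trajectory)  # feedback crosses a text boundary only
    return [dict(t, reward=score) for t in trajectory]

data = self_improvement_episode(ToyEnv(), toy_agent, gemini_judge, "reach the flag")
```

The key point the two-system reading implies: the judge never shares weights or gradients with the agent, it only exchanges text, which is what would let a frozen Gemini bootstrap a separately trained SIMA.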
ukuina
At 0:52 in their demo video, there is a grammatical inconsistency in the agent's text output, which suggests the annotations were created by humans after the fact. Is Google up to its old marketing/hyping tricks again?
> SIMA 2 Reasoning:
> The user wants me to go to the ‘tomato house’. Based on the description ‘ripe tomato’, I identify the red house down the street.
m_w_
I can't speak to the content of the actual game being played, but it wouldn't surprise me if there was an in-game text prompt:
> "The house that looks like a ripe tomato!"
that was transformed into a "user prompt" in a more instructional format
> "Go to the tomato house"
And both were used in the agent output. At least the Y-axes on the graphs look more reasonable than some other recent benchmarks.
oersted
I get why they do it; they are a business. I just wish Google would get off their ivory tower and build in the open more like they used to (did they? maybe I'm misremembering...).
They've acquired this bad habit of keeping all their scientific experiments closed by default and just publishing press releases. I wish it was open-source by default and closed just when there's a good reason.
Don't get me wrong, I suppose this is more of a compliment. I really like what they are doing and I wish we could all participate in these advances.
singularity2001
Dreamer v3 was open, v4 coming soon?
JohnMakin
Isn't most of this demo No Man's Sky? The voiceover doesn't make it clear that the world is not generated by SIMA.
xnx
It's hard to keep up with the many different models and pace of progress.
Genie 3 is Google's world generating model: https://deepmind.google/blog/genie-3-a-new-frontier-for-worl...
lawlessone
If it can get through those lengthy, glitchy NMS story mission tutorials quickly, it's already a superintelligence.
As much as some AI annoys me, this would be great for making games more accessible.
woeirua
Yet another blogpost that looks super impressive, until you get to the bottom and see the charts assessing held out task performance on ASKA and MineDojo and see that it's still a paltry 15% success rate. (Holy misleading chart batman!) Yes, it's a major improvement over SIMA 1, but we are still a long way from this being useful for most people.
Workaccount2
To be fair, it's 65% on all tasks (with a 75% human baseline) and 15% on unseen environments. They don't provide a human baseline for that, but I'd imagine it's much more than 15%.
tinfoilhatter
[flagged]
dang
Could you please stop posting flamebait and breaking the site guidelines? You've unfortunately been doing it repeatedly, including this dreadful thread from a couple weeks ago: https://news.ycombinator.com/item?id=45781981. I realize the other person was doing it also, but you (<-- I don't mean you personally, but all of us) need to follow the rules regardless of what other people are doing.
Comments like what your account has been posting are not what this site is for, and destroy what it is for, so if you wouldn't mind reviewing https://news.ycombinator.com/newsguidelines.html and taking the intended spirit of the site more to heart, we'd be grateful.
jandrese
It seems pretty clear to me that they're trying to develop AGI humanoid assistants/workers without the messy and expensive real-world hardware. Basically, they're approaching the problem from the opposite end from a company like Tesla, which built a robot and is now trying to figure out how to make a computer drive it without constant hand-holding.
The gap between high-level and low-level control of robots is closing. Right now, thousands of hours of task-specific training data are being collected and trained on to create models that can control robots to execute specific tasks in specific contexts. This essentially turns the operation of a robot into a kind of video game, where inputs are only needed in a low-dimensional, abstract form, such as "empty the dishwasher" or "repeat what I do" or "put your finger in the loop and pull the string". This will be combined with high-level control agents like SIMA 2 to create useful real-world robots.
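The layering described above can be sketched as follows: a high-level planner reduces a natural-language instruction to one of a small set of abstract commands, and each command dispatches to a task-specific policy that emits motor-level steps. The names, commands, and step lists are illustrative assumptions, not any real robot API.

```python
# Task-specific low-level policies, each a stand-in for a model trained
# on hours of demonstrations for that one task in that one context.
task_policies = {
    "empty the dishwasher": lambda: ["open door", "grasp plate", "place plate"],
    "pull the string":      lambda: ["insert finger", "pull"],
}

def high_level_agent(instruction):
    """Stand-in for a SIMA-like planner: maps free-form language to one
    of the low-dimensional abstract commands the policies understand."""
    return instruction.lower().strip()

def execute(instruction):
    command = high_level_agent(instruction)
    policy = task_policies.get(command)
    if policy is None:
        raise ValueError(f"no trained policy for {command!r}")
    return policy()  # the low-level policy produces motor-level steps
```

The "video game" analogy in the comment is the narrow interface between the two layers: the planner only ever emits a handful of abstract commands, never joint torques.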