Genie 3: A new frontier for world models
361 comments
·August 5, 2025
tugn8r
But that was always going to be the case?
Reality is not composed of words, syntax, and semantics. A human model is.
Other human models are sensory-only, no language.
So vision learning, and energy models that capture the energy needed to achieve a visual, audio, or physical robotics behavior, are the only real goal.
Software is for those who read the manual with their new NES game. Where are the words inside us?
Statistical physics of energy to make a machine draw the glyphs of language, not opinionated clustering of language, is what will close the keyboard-and-mouse input loop. We're talking about replicating human work habits. Those are real physical behaviors, not just descriptions in words.
kkukshtel
I similarly am surprised at how fast they are progressing. I wrote this piece a few months ago about how I think steering world model output is the next realm of AAA gaming:
https://kylekukshtel.com/diffusion-aaa-gamedev-doom-minecraf...
But even when I wrote that I thought things were still a few years out. I facetiously said that Rockstar would be nerd-sniped on GTA6 by a world model, which sounded crazy a few months ago. But seeing the progress already made since GameNGen and knowing GTA6 is still a year away... maybe it will actually happen.
ewoodrich
> Rockstar would be nerd-sniped on GTA6 by a world model
I'm having trouble parsing your meaning here.
GTA isn't really a "drive on the street simulator", is it? There is deliberate creative and artistic vision that makes the series so enjoyable to play even decades after release, despite the graphics quality becoming more dated every year by AAA standards.
Are you saying someone would "vibe model" a GTAish clone with modern graphics that would overtake the actual GTA6 in popularity? That seems extremely unlikely to me.
throwmeaway222
I'm trying to wrap my head around this since we're still seeing text spit out slowly (I mean slowly as in thousands of tokens a second).
I'm starting to think some of the names behind LLMs/GenAI are cover names for aliens and any actual humans involved have signed an NDA that comes with millions of dollars and a death warrant if disobeyed.
kfarr
Bitter lesson strikes again!
nxobject
_Especially_ given the goal of a world model using a rasters-only frame-by-frame approach. Holy shit.
diwank
> Future robots may learn in their dreams...
So prescient. I definitely think this will be a thing in the near future, on a ~12-18 month time horizon.
casenmgreen
I may be wrong, but this seems to make no sense.
A neural net can produce information outside of its original data set, but it is all directly derived from that initial set. There are fundamental information constraints here. You cannot use a neural net to generate wholly new, original, full-quality training data for itself from its existing data set.
You can use a neural net to generate data, and you can train a net on that data, but you'll end up with something which is no good.
scarmig
Humans are dependent on their input data (through lifetime learning and, perhaps, information encoded in the brain from evolution), and yet they can produce out of distribution information. How?
There is an uncountably large number of models that perfectly replicate the data they're trained on; some generalize out of distribution much better. Something like dreaming might be a form of regularization: experimenting with simpler structures that perform equally well on training data but generalize better (e.g. by discovering simple algorithms that reproduce the data equally well as pure memorization but require simpler neural circuits than the memorizing circuits).
Once you have those better generalizing circuits, you can generate data that not only matches the input data in quality but potentially exceeds it, if the priors built into the learning algorithm match the real world.
neom
I might be misunderstanding your comment, so sorry if so. Robots have sensors and RL is a thing: they can collect real-world data, then process and consolidate real-world experiences during downtime (or in real time), run simulations to prepare for scenarios, and update models based on the day's collected data. The part I found impressive was that the robot understood the scene but didn't know how the scene would respond to its actions, so it generates videos of the possible scenarios, then picks the best ones and models its actuation on its "imagination".
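A minimal sketch of that imagine-then-act loop, assuming hypothetical world_model.rollout and score_fn interfaces (this is plain random-shooting model-predictive control, not necessarily what any robot actually runs):

```python
import numpy as np

def plan_with_world_model(world_model, observation, score_fn,
                          n_candidates=64, horizon=16, action_dim=7):
    """Dream candidate futures in a world model and act on the best one.

    world_model.rollout(obs, actions) -> imagined frames (assumed interface)
    score_fn(frames) -> scalar task score (assumed interface)
    """
    best_score, best_actions = float("-inf"), None
    for _ in range(n_candidates):
        # Sample a random candidate action sequence.
        actions = np.random.uniform(-1.0, 1.0, size=(horizon, action_dim))
        # Imagine its consequences without touching the real world.
        imagined = world_model.rollout(observation, actions)
        score = score_fn(imagined)
        if score > best_score:
            best_score, best_actions = score, actions
    # Execute only the first action, then re-plan (receding horizon).
    return best_actions[0]
```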
schmidtleonard
We are miles away from the fundamental constraint. We know that our current training methodologies are scandalously data inefficient compared to human/animal brains. Augmenting observations with dreams has long been theorized to be (part of) the answer.
Demplolo
I actually think you can.
The LLM has plenty of experts and approaches etc.
Give it tool access and let it formulate its own experiments, etc.
The only question here is whether it becomes a (or the) singularity because of this, gets stuck in some local minimum, or achieves random perfection at random local-minimum locations.
hnuser123456
It's feasible you could have a personal neural net that fine-tunes itself overnight to make fewer inference mistakes in the future.
exe34
Any idea how humans do it? Where do they get novel information from?
neom
I'm invested in a startup that is doing something unrelated to robotics, but they're spending a lot of time in Shenzhen. I keep a very close eye on robotics and was talking to their CTO about what he is seeing in China; versions of this are already being implemented.
dingnuts
[flagged]
dingnuts
what is a robot dream when there is clearly no consciousness?
What's with this insane desire for anthropomorphism? What do you even MEAN learn in its dreams? Fine-tuning overnight? Just say that!
gavinray
> What's with this insane desire for anthropomorphism?
Devil's advocate: making the assumption that consciousness is uniquely human, and that humans are "special", is just as ludicrous. Whether a computational medium is carbon-based or silicon-based seems irrelevant. Call it "carbon chauvinism".
olddustytrail
Yes, and an object in OOP isn't really a physical object. And a string isn't really a thin bit of rope.
No-one cares. It's just terminology.
Aco-
"Do Androids Dream of Electric Sheep?"
ivape
So this just happened from scaling the model
Unbelievable. How is this not a miracle? So we're just stumbling onto breakthroughs?
silveraxe93
Is it actually unbelievable?
It's basically what every major AI lab head is saying from the start. It's the peanut gallery that keeps saying they are lying to get funding.
JeremyNT
Even as a layman and AI skeptic, to me this entirely matches my expectations, and something like this seemed like it was basically inevitable as of the first demos of video rendering responding to user input (a year ago? maybe?).
Not to detract from what has been done here in any way, but it all seems entirely consistent with the types of progress we have seen.
It's also no surprise to me that it's from Google, who I suspect is better situated than any of its AI competitors, even if it is sometimes slow to show progress publicly.
glenstein
>It's basically what every major AI lab head is saying from the start.
I suppose it depends what you count as "the start". The idea of AI as a real research project has been around since at least the 1950s. And I'm not a programmer or computer scientist, but I'm a philosophy nerd and I know debates about what computers can or can't do started around then. One side of the debate was that it awaited new conceptual and architectural breakthroughs.
I also think you can look at, say, Ted Talks on the topic, with guys like Jeff Hawkins presenting the problem as one of searching for conceptual breakthroughs, and I think similar ideas of such a search have been at the center of Douglas Hofstadter's career.
I think in all those cases, they would have treated "more is different" like an absence of nuance, because there was supposed to be a puzzle to solve (and in a sense there is, and there has been, in terms of vector space and back propagation and so on, but it wasn't necessarily clear that physics could "pop out" emergently from such a foundation).
ivape
It's akin to us sending a rocket to space and immediately discovering a wormhole. Sure, there's a lot of science about what's out there, but to discover all this in our first few trips to orbit ...
shreezus
There are a lot of "interesting" emergent behaviors that happen just a result of scaling.
Kind of like how a single neuron doesn't do much, but connect 100 billion of them and well...
spaceman_2020
becoming really, really hard to refute the Simulation Theory
casenmgreen
The guy who tried it was invited by Google to try it.
He seems too enthusiastic to me, such that I feel Google asked him in particular because they trusted him to write very positively.
alphabetting
I doubt there was a condition on writing positively. Other people who tested have said this won't replace engines. https://togelius.blogspot.com/2025/08/genie-3-and-future-of-...
SequoiaHope
You don't ask people to speak how you want; you simply invite only people who already have a history of speaking how you want. This phenomenon is explained in detail in Noam Chomsky's work on mass media (e.g. the NY Times doesn't tell its editors exactly what to do, but only hires editors who already want to say what the NY Times wants, or who have a certain worldview). The same can be applied to social media reviews: invite the person who gives glowing reviews all the time.
phkahler
But can we use it to create movies one scene at a time?
make3
It wouldn't be surprising if a structured version of this with state cached per room for example could be used in a game.
& you're basically seeing GPT-3 and saying it will never be used in any serious application... the rate of improvement in their model is insane
echelon
> What I don't think this technology will do is replace game engines. I just don't see how you could get the very precise and predictable editing you have in a regular game engine from anything like the current model. The real advantage of game engines is how they allow teams of game developers to work together, making small and localized changes to a game project.
I've been thinking about this a while and it's obvious to me:
Put Minecraft (or something similar) under the hood. You just need data structures to encode the world and to enable mutation, location, and persistence.
If the model is given additional parameters such as a "world mesh", then it can easily persist where things are, what color or texture they should be, etc.
That data structure or server can be running independently on CPU-bound processes. Genie or whatever "world model" you have is just your renderer.
It probably won't happen like this due to monopolistic forces, but a nice future might be a future where you could hot swap renderers between providers yet still be playing the same game as your friends - just with different looks and feels. Experiencing the world differently all at the same time. (It'll probably be winner take all, sadly, or several independent vertical silos.)
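A rough sketch of that split, with every name here hypothetical: the authoritative world state is plain data owned by an ordinary server process, and the world model is just one interchangeable implementation of a draw call:

```python
from dataclasses import dataclass, field

@dataclass
class Block:
    position: tuple[int, int, int]
    material: str  # e.g. "dirt", "stone"; the renderer decides how it looks

@dataclass
class WorldState:
    """Authoritative game state, owned by a CPU-bound server process.

    Mutation, location, and persistence live here; a renderer only
    consumes this state plus a camera pose and produces pixels."""
    blocks: dict[tuple[int, int, int], Block] = field(default_factory=dict)

    def set_block(self, pos, material):
        self.blocks[pos] = Block(pos, material)

def render_frame(world_state, camera_pose, renderer):
    # `renderer` could be a classic rasterizer or a conditioned world
    # model; two players in the same world could each swap in their own.
    return renderer.draw(world_state, camera_pose)
```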
If I were Tim Sweeney at Epic Games, I'd immediately drop all work on Unreal Engine and start looking into this tech. Because this is going to shore them up on both the gaming and film fronts.
echelon
I don't know. I wasn't there and I'm excited.
I think this puts Epic Games, Nintendo, and the whole lot into a very tough spot if this tech takes off.
I don't see how Unreal Engine, with its voluminous and labyrinthine tomes of impenetrable legacy C++ code, survives this. Unreal Engine is a mess, gamers are unhappy about it, and it's a PITA to develop with. I certainly hate working with it.
The Innovator's Dilemma is fast approaching the entire gaming industry, and they don't even see it coming, it's happening so fast.
Exciting that building games could become as easy as having the idea itself. I'm imagining something like VRChat or Roblox or Fortnite, but where new things are simply spoken into existence.
It's absolutely terrifying that Google has this much power.
sureglymop
How so? It's not really by itself being creative yet, no? It sure seems like a game changer but who knows if one can even use this at scale?
csomar
Also he is ex-Google DeepMind. Like the worst kind of pick you can make when there are dozens of eligible journalists out there.
forrestthewoods
> this is a clear glimpse into the future.
Not for video games it isn’t.
dlivingston
Unless and until state can be stored outside of the model.
I for one would love a video game where you're playing in a psychedelic, dream-like fugue.
throwmeaway222
It's kinda crazy though that a single game session would be burning enough natural gas to power 3 cities. Unless that's not true
qwertox
This is revolutionary. I mean, we already could see this coming, but now it's here. With limitations, but this is the beginning.
In game engines it's the engineers, the software developers who make sure triangles are at the perfect location, mapping to the correct pixels, but this here, this is now like a drawing made by a computer, frame by frame, with no triangles computed.
ollin
This is very encouraging progress, and probably what Demis was teasing [1] last month. A few speculations on technical details based on staring at the released clips:
1. You can see fine textures "jump" every 4 frames - which means they're most likely using a 4x-temporal-downscaling VAE with at least 4-frame interaction latency (unless the VAE is also control-conditional). Unfortunately I didn't see any real-time footage to confirm the latency (at one point they intercut screen recordings with "fingers on keyboard" b-roll? hmm).
2. There's some 16x16 spatial blocking during fast motion, which could mean 16x16 spatial downscaling in the VAE. Combined with 1, this would mean 24x1280x720/(4x16x16) = 21,600 tokens per second, or around 1.3 million tokens per minute (quick arithmetic after the links below).
3. The first frame of each clip looks a bit sharper and less videogamey than later stationary frames, which suggests this could be a combination text-to-image + image-to-world system (where the t2i system is trained on general data but the i2w system is finetuned on game data with labeled controls). Noticeable in e.g. the dirt/textures in [2]. I still noticed some trend towards more contrast/saturation over time, but it's not as bad as in other autoregressive video models I've seen.
[1] https://x.com/demishassabis/status/1940248521111961988
[2] https://deepmind.google/api/blob/website/media/genie_environ...
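For anyone wanting to check the arithmetic in point 2, here is the back-of-envelope calculation; the 4x temporal and 16x16 spatial downscaling factors are my assumptions from the visual artifacts, not confirmed numbers:

```python
# Token-rate estimate assuming a VAE with 4x temporal and
# 16x16 spatial downscaling on 720p24 video (both assumed).
fps, width, height = 24, 1280, 720
temporal_ds, spatial_ds = 4, 16

tokens_per_frame = (width // spatial_ds) * (height // spatial_ds)  # 80 * 45 = 3600
latent_fps = fps / temporal_ds                                     # 6 latent frames/sec
tokens_per_second = tokens_per_frame * latent_fps                  # 21,600
tokens_per_minute = tokens_per_second * 60                         # 1,296,000 ~ 1.3M
print(tokens_per_second, tokens_per_minute)
```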
ollin
Regarding latency, I found a live video of gameplay here [1] and it looks like closer to 1.1s keypress-to-photon latency (33 frames @ 30fps) based on when the onscreen keys start lighting up vs when the camera starts moving. This writeup [2] from someone who tried the Genie 3 research preview mentions that "while there is some control lag, I was told that this is due to the infrastructure used to serve the model rather than the model itself" so a lot of this latency may be added by their client/server streaming setup.
[1] https://x.com/holynski_/status/1952756737800651144
[2] https://togelius.blogspot.com/2025/08/genie-3-and-future-of-...
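The 1.1s figure is just frames divided by frame rate; a trivial sketch, where the frame count is my own estimate from scrubbing the clip:

```python
# Keypress-to-photon latency estimated by counting video frames
# between the onscreen key lighting up and the camera starting to move.
frames_elapsed = 33  # my estimate from the clip in [1]
video_fps = 30
latency_s = frames_elapsed / video_fps
print(f"{latency_s:.1f} s keypress-to-photon")  # 1.1 s
```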
rotexo
You know that thing in anxiety dreams where you feel very uncoordinated and your attempts to manipulate your surroundings result in unpredictable consequences? Like you try to slam on the brake pedal but your car doesn’t slow down, or you’re trying to get a leash on your dog to lead it out of a dangerous situation and you keep failing to hook it on the collar? Maybe that’s extra latency because your brain is trying to render the environment at the same time as it is acting.
addisonj
Really impressive... but wow this is light on details.
While I don't fully align with the sentiment of other commenters that this is meaningless unless you can go hands on... it is crazy to think of how different this announcement is than a few years ago when this would be accompanied by an actual paper that shared the research.
Instead... we get this thing that has a few aspects of a paper - authors, demos, a bibtex citation(!) - but none of the actual research shared.
I was discussing with a friend that my biggest concern with AI right now is not that it isn't capable of doing things... but that we switched from research/academic mode to full value-extraction mode so fast that we are way out over our skis in terms of what is being promised, which, in the realm of an exciting new field of academic research, is pretty low-stakes all things considered... but terrifying when we bet policy and economics on it.
To be clear, I am not against commercialization, but the dissonance of this product announcement, made to look like research and written in this way, landing at the same time that one of the preeminent mathematicians is writing about how our shift in funding of real academic research is having real, serious impact, is... uh... not confidence-inspiring for the long term.
yanis_t
> Text rendering. Clear and legible text is often only generated when provided in the input world description.
Reminds me of when image AIs weren't able to generate text. It wasn't too long until they fixed it.
reactordev
And made hands 10x worse. Now hands are good, text is good, image is good, so we'll have to play Where's Waldo all over again trying to find the flaw. It's eventually going to get to a point where it's one of those infinite-zoom videos where the AI watermark is the size of 1/3rd of a pixel.
What I’d really love to see more of is augmented video. Like, the stormtrooper vlogs. Runway has some good stuff but man is it all expensive.
TheAceOfHearts
I wouldn't say that the text problem has been fully fixed. It has certainly gotten a lot better, but even gpt-image-1 still fails occasionally when generating text.
yencabulator
Note that the prompt and the generated chalkboard disagree on whether there's a dash or not.
mason_mpls
The claims being made in this announcement are not demonstrated in the video. A very careful first person walk in an AI video isn’t very impressive these days…
timeattack
Advances in generative AI are making me progressively more and more depressed.
Creativity is taken from us at exponential rate. And I don't buy argument from people who are saying they are excited to live in this age. I can get that if that technology stopped at current state and remained to be just tools for our creative endeavours, but it doesn't seem to be an endgame here. Instead it aims to be a complete replacement.
Granted, you can say "you still can play musical instruments/paint pictures/etc for yourself", but I don't think there was ever a period of time where creative works were created just for their own sake rather than for sharing with others en masse.
So what is the final state here for us? Return to menial not-yet-automated work? And when that is eventually automated, what's left? Plug our brains into personalized autogenerated worlds that are tailored to trigger the relevant neuronal circuitry, producing ever-increasing dopamine levels until our brains finally burn out (which is arguably already happening with tiktok-style leisure)? And how are you supposed to pay for that, if all work is automated? How is the economics of that supposed to work?
Looks like a pretty decent explanation of Fermi paradox. No-one would know how technology works, there are no easily available resources left to make use of simpler tech and planet is littered to the point of no return.
How to even find the value in living given all of that?
rolfus
I'm one of those excited people! We haven't lost anything with this new technology, only gained.
The way I see it, most people aren't creative. And the people who are creatives are mostly creating for the love of it. Most books that are published are read exclusively by the friends and family of the author. Most musicians, most stand-up comedians, most artist get to show off their works for small groups of people and make no money doing so. But they do it anyway. I draw terrible portraits, make little inventions and sometimes I build something for the home, knowing full well that I do these things for my own enjoyment and whatever ego boost I get from showing these things off to people I know.
I'm doing a marathon later and I've been working my ass off for the prospect of crossing the finishing line as number four thousand and something, and I'll do it again next year.
Etheryte
> I don't think there was ever a period of time where creative works were created just for their own sake rather than for sharing with others en masse.
Numerous famous writers, painters, artists, etc counter this idea, Kafka being a notable example, whose significant works only came to light after his passing and against his will. This doesn't take away from the rest of your discussion point, but art always has and always will also exist solely for its own sake.
stillpointlab
> I don't buy argument from people who are saying they are excited to live in this age
What argument is required for excitement? Excitement is a feeling not a rational act. It comes from optimism and imagination. There is no argument for optimism. There is often little reason in imagination.
> How to even find the value in living given all of that?
You might have heard of the Bhagavad Gita, a 2000+ year old spiritual text. It details a conversation between a warrior prince and a manifestation of God. The warrior prince is facing a very difficult battle and he is having doubts justifying any action in the face of the decisions he has to make. He is begging this manifestation of God to give him good reasons to act, good reasons not just to throw his weapons down, give away all his possessions and sit in a cave somewhere.
There are no definite answers in the text, just meditations on the question. Why should we act when the result is ultimately pointless, we will all die, people will forget you, situations will be resolved with or without you, etc.
This isn't some new question that LLMs are forcing us to confront. LLMs are just providing us a new reason to ask the same age-old questions we have been facing for as long as writing has existed.
HocusLocus
Genie 3 not only groks the Bhagavad Gita, it can generate "Blue & Elephant People: The Movie".
p4coder
Today the physical world is largely mechanized; we rarely walk, run, or lift heavy things for survival. So we grow fat and weak unless we exercise. Tomorrow the vast majority of us will never need to think, create, or investigate to earn a living. So we will get dumber and dumber over time. A small minority of us will keep polishing their intellect, but will never be smarter than machines, just like the best athletes of today can't outrun machines.
pizzathyme
This is surprisingly a great analogy because millions of people still run every week for their own benefit (physical and mental health, social connection, etc).
I wonder if mental exercises will move to the same category? Not necessarily a way to earn money, but something everybody does as a way of flourishing as a human.
psbp
The process of thinking and exploring ideas is inherently enriching.
Nothing can take away your ability to have incredible experiences, except if the robots kill us all.
Kiro
I don't understand your argument at all. I've made hundreds of songs in my life that I haven't shared with anyone and so have all other musicians I know. The act of creating is separate from finding or having an audience. In fact, I would say that the complete opposite of what you say is true.
And even so, music production has been a constant evolution of replacing prior technologies and making it easier to get into. It used to be gatekept by expensive hardware.
skybrian
We already live in a world where a vast library of songs by musicians who play much better than you are readily available on YouTube and Spotify. This seems like more of the same?
podgietaru
I like living in a world where I know that people who have spent actual time on nurturing a talent get rewarded for doing so, even if that talent is not something I will ever be good at.
I don't want to live in a world where these things are generated cheaply and easily for the profit of a very select few group of people.
I know the world doesn't work like I described in the top paragraph. But it's a lot closer to it than the bottom.
wolttam
It's hard to see how there will be room for profit as this all advances
There will be two classes of media:
- Generated, consumed en-masse by uncreative, uninspired individuals looking for cheap thrill
- Human created, consumed by discerning individuals seeking out real human talent and expression. Valuing it based merely on the knowledge that a biological brain produced (or helped produce) it.
I tend to suspect that the latter will grow in value, not diminish, as time progresses
bko
> I like living in a world where I know that people who have spent actual time on nurturing a talent get rewarded for doing so, even if that talent is not something I will ever be good at.
Rewarded how? 99.99% of people who do things like sports or artistic like writing never get "rewarded for doing so", at least in the way I imagine you mean the phrase. The reward is usually the experience itself. When someone picks up a ball or an instrument, they don't do so for some material reward.
Why should anyone be rewarded materially for something like this? Why are you so hung up on the <0.001% who can actually make some money now having to enjoy the activity as a hobby rather than a profession?
fantasizr
I could see theater and live music (especially performed on acoustic instruments) becoming hyper-popular, because it'll be the only talent worth paying to see when everything else is 'cheaply' made.
pessimizer
> I like living in a world where I know that people who have spent actual time on nurturing a talent get rewarded for doing so, even if that talent is not something I will ever be good at.
That world has only existed for the last hundred or so years, and the talent is usually brutally exploited by people whose main talent is parasitism. Only a tiny percentage of people who sell creative works can make a living out of it; the living to be made is in buying their works at a premium, bundling them, and reselling them, while offloading almost all of the risk to the creative as an "advance."
Then you're left in a situation where both the buyer of art and the creator of art are desperate to pander to the largest audience possible because everybody is leveraged. It's a dogshit world that creates dogshit art.
Saline9515
It still requires work, dedication and produces authenticity. A world where AI can produce music instantly commoditizes it.
skybrian
Music is already a commodity. You can just buy some anonymous background music to play in your restaurant. No effort required.
whamlastxmas
I mean you can just listen to human made music if that’s an important part of the experience for you. I doubt humans are going to stop anytime soon
j_timberlake
I don't know how on Earth people can think like this. Most people can find "value" in a slice of pizza. It doesn't even have to be a good pizza.
Or kittens and puppies. Do you think there won't be kittens and puppies?
And that's putting aside all the obvious space-exploration stuff that will probably be more interesting than anything the previous 100 billion humans ever saw.
yomismoaqui
The question is, why are you doing art?
- Because you enjoy it
- Because you get pats on the back from people you share it with
- Because you want to earn money from it
The 1st one will continue to be true in this dystopian AI art future, the others not so much.
And sincerely I find that kind of human art, the one that comes from a pure inner force, the more interesting one.
EDIT: list formatting
sunsunsunsun
You seem to forget that most artists enjoy it but due to the structure of our society are forced to either give it up for most of their waking life to earn money or attempt to market their art to the masses to make money. This AI stuff only makes it harder for artists to make any kind of living off of their work.
MetaWhirledPeas
While there are plenty of cases where good artists make most of their money from the art, there are plenty of other cases where good artists have a 'real job' on the side.
jjrh
Ideally AI makes it so you don't have to work and can pursue whatever interests.
assword
> The 1st one will continue to be true in this dystopian AI art future, the other not so much.
No it won’t, you’ll be too busy trying to survive off of what pittance is left for you to have any time to waste on leisure activities.
Oarch
I don't think I've ever seen a presentation that's had me question reality multiple times before. My mind is suitably blown.
mason_mpls
The demo looks like they’re being very gentle with the AI, this doesn’t look like much of an advancement.
_hark
Very cool! I've done research on reinforcement/imitation learning in world models. A great intro to these ideas is here: https://worldmodels.github.io/
I'm most excited for when these methods will make a meaningful difference in robotics. RL is still not quite there for long-horizon, sparse reward tasks in non-zero-sum environments, even with a perfect simulator; e.g. an assistant which books travel for you. Pay attention to when virtual agents start to really work well as a leading signal for this. Virtual agents are strictly easier than physical ones.
Compounding on that, mismatches between the simulated dynamics and real dynamics make the problem harder (sim2real problem). Although with domain randomization and online corrections (control loop, search) this is less of an issue these days.
Multi-scale effects are also tricky: the characteristic temporal length scale for many actions in robotics can be quite different from the temporal scale of the task (e.g. manipulating ingredients to cook a meal). Locomotion was solved first because it's periodic imo.
Check out PufferAI if you're scale-pilled for RL: just do RL bigger, better, get the basics right. Check out Physical Intelligence for the same in robotics, with a more imitation/offline RL feel.
unboxingelf
The Simulation Theory presents the following trilemma; at least one of these must be true:
1. Almost all human-level civilizations go extinct before reaching a technologically mature “posthuman” stage capable of running high-fidelity ancestor simulations.
2. Almost no posthuman civilizations are interested in running simulations of their evolutionary history or beings like their ancestors.
3. We are almost certainly living in a computer simulation.
lotyrin
If you take the idea of it needing to be a constructed simulation you get the dream argument. If you add that one can't verify anyone else having subjective experience you get Boltzmann brain. If you add the idea that maybe the ancestor simulations are designed to teach us virtuous behavior through repeated visits to simulation worlds you get the karmic cycle, and Boltzmann brain + karmic cycle is roughly the egg theory.
I think some/all of these things can be roughly true at the same time. Imagine an infinite space full of chaotic noise, from which arises a solitary Boltzmann brain, the top-level universe and top-level intelligence. This brain, seeking purpose and company in the void, dreams of itself in various situations (lower-level universes), and some of those universes' societies seek to improve themselves through deliberate construction of karmic-cycle ancestor simulations. A hierarchy of self-similar universes.
It was incredibly comforting to me to think that perhaps the reason my fellow human beings are so poor at empathy, inclusion, justice, is that this is a karmic kindergarten where we're intended to be learning these skills (and the consequences for failing to perform them) and so of course we're bad at it, it's why we're here.
Workaccount2
I wonder how hard it would be to get VR output?
That's an insane product right there just waiting to happen. Too bad Google sleeps so hard on the tech they create.
SeanaldMcDnld
Consistent output and spatial coherence across each eye, maybe a couple years? But meeting head-tracking accuracy and latency requirements, I'd bet decades. There's no way any of this tech reduces end-to-end latency to acceptable levels without a massive change in hardware. We'll probably see someone use reprojection techniques in a year or so and claim they've done it. But true generated pixels straight to the headset based on head tracking is so, so far away.
nosignono
You don't have to do it in real time, per se. I imagine a world in which the renderer and the world generation are decoupled. For example, you could descriptively articulate what you wanted to achieve and have it generate a world, quietly do some structure-from-motion (or just generate the models and textures), and use those as assets in a game engine for the actual moment-to-moment rendering.
You'd have some "please wait in this lobby space while we generate the universe" moments, but those are easy to hide with clever design.
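Something like this two-phase flow, where every name is made up for illustration: an expensive, latency-tolerant generation phase, then a conventional engine doing the latency-critical rendering:

```python
def build_world(prompt, world_model, sfm):
    """Offline phase: slow, latency-tolerant world generation."""
    # Dream a fly-through of the described world.
    frames = world_model.generate(prompt)
    # Recover geometry and textures via structure-from-motion
    # (or have the model emit meshes directly, if it can).
    meshes, textures = sfm.reconstruct(frames)
    return meshes, textures

def play(prompt, world_model, sfm, engine):
    # "Please wait in this lobby while we generate the universe."
    engine.show_lobby()
    meshes, textures = build_world(prompt, world_model, sfm)
    engine.load_assets(meshes, textures)
    # Online phase: the engine renders at head-tracking latency,
    # with no neural model in the frame loop.
    engine.run()
```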
kridsdale3
Agree. So I'll make a wild bet of "20 years". And hope for the best.
pawelduda
It's hard to get an acceptable VR output for today's rendering engines still. In the examples provided, the movement seems to be slow and somewhat linear, which doesn't translate to head movements in VR. VR needs 2 consistent videos with much higher resolutions and low latency is a must. The feedback would still be very dependent on people's tolerance to all imperfections - some would be amazed, others would puke. That's why VR still isn't in the spotlight after all the years (I personally find it great).
kridsdale3
I think VR will come at the same time they make multiplayer. There needs to be differentiation between the world-state and the viewport. Right now, I suspect they're the same.
But once you can get N cameras looking at the same world-state, you can make them N players, or a player with 2 eyes.
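In code terms, something like this, with render_fn standing in for whatever the hypothetical world-model-as-renderer interface turns out to be:

```python
def render_views(world_state, camera_poses, render_fn):
    """One authoritative world state, N viewports.

    Two slightly offset cameras give stereo VR; N independent
    cameras give N players."""
    return [render_fn(world_state, pose) for pose in camera_poses]

def stereo_pair(head_pose, ipd=0.063):
    # Offset an (x, y, z, yaw) head pose by half the
    # inter-pupillary distance for each eye.
    x, y, z, yaw = head_pose
    return [(x - ipd / 2, y, z, yaw), (x + ipd / 2, y, z, yaw)]

# left_eye, right_eye = render_views(state, stereo_pair(pose), render_fn)
```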
modeless
Consistency over multiple minutes and it runs in real time at 720p? I did not expect world models to be this good yet.
> Genie 3’s consistency is an emergent capability
So this just happened from scaling the model, rather than being a consequence of deliberate architecture changes?
Edit: here is some commentary on limitations from someone who tried it: https://x.com/tejasdkulkarni/status/1952737669894574264
> - Physics is still hard and there are obvious failure cases when I tried the classical intuitive physics experiments from psychology (tower of blocks).
> - Social and multi-agent interactions are tricky to handle. 1vs1 combat games do not work
> - Long instruction following and simple combinatorial game logic fails (e.g. collect some points / keys etc, go to the door, unlock and so on)
> - Action space is limited
> - It is far from being a real game engines and has a long way to go but this is a clear glimpse into the future.
Even with these limitations, this is still bonkers. It suggests to me that world models may have a bigger part to play in robotics and real world AI than I realized. Future robots may learn in their dreams...