
Robust autonomy emerges from self-play

66 comments

·February 7, 2025

markisus

Some interesting points from this paper:

- All simulated agents use the same neural net with the same weights, albeit with randomized rewards and a randomized conditioning vector that let them behave as different types of vehicles with different levels of aggressiveness. This is like driving in a world where everyone is a copy of you, but some of your copies are in a rush while others are patient. This allows backprop to optimize for a sort of global utility across the entire population.

- There is no modeling of occlusion effects. Instead, agents are given the state of nearby agents, but corrupted by random noise. In the real world, occluded nearby agents can be extremely close (think about a child running out from behind a parked car). The paper comments on this.

> Both Waymax and nuPlan construct observations, maps, and other actors with auto-labeling tools from real-world perception data. This brings occlusion, incorrect or missing traffic-light states, and obstacles revealed at the last moment. Despite the minimalistic noise modeling in GIGAFLOW, the GIGAFLOW policy generalizes zero-shot to these conditions.

- The resulting policy simulates agents that are human-like, even though the system has never seen humans drive. This is a great result when one considers that other reinforcement learning projects produce extremely high-performance agents that humans would consider abusive or pathological.
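The first point above (one set of weights, many behaviors) can be sketched roughly as follows. This is only an illustration, not the paper's architecture: the single linear layer, the two-entry conditioning vector (`aggressiveness`, `is_truck`), and all function names are made up for the example.

```python
import math
import random

def make_shared_weights(obs_dim, cond_dim, act_dim, seed=0):
    """One weight matrix that every agent shares (hypothetical tiny policy)."""
    rng = random.Random(seed)
    in_dim = obs_dim + cond_dim
    return [[rng.uniform(-0.5, 0.5) for _ in range(in_dim)] for _ in range(act_dim)]

def policy(weights, obs, cond):
    """Same weights for everyone; only the conditioning vector differs per agent."""
    x = list(obs) + list(cond)
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in weights]

def sample_conditioning(rng):
    """Assumed conditioning layout: [aggressiveness in [0, 1], is_truck flag]."""
    return [rng.uniform(0.0, 1.0), float(rng.random() < 0.2)]

rng = random.Random(42)
weights = make_shared_weights(obs_dim=3, cond_dim=2, act_dim=2)
obs = [0.1, -0.3, 0.7]  # the same observation shown to every agent

# Four "copies of you", each with its own randomly drawn temperament.
agents = [sample_conditioning(rng) for _ in range(4)]
actions = [policy(weights, obs, cond) for cond in agents]

patient = policy(weights, obs, [0.0, 0.0])
rushed = policy(weights, obs, [1.0, 0.0])
```

Because every agent's action flows through the same `weights`, a gradient step on any one agent's reward updates the policy that all of them share, which is the "global utility" effect described in the comment.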

nine_k

Can there be "smart toys" for models that help them self-improve in a particularly efficient way?

cainxinth

A Young Lady's Illustrated Primer

Rebuff5007

In RL literature this is generally called "curriculum learning".

The curriculum is usually modeled as some form of reward function to steer learning, or sometimes by environment configuration (e.g. learn to walk on a normal surface before a slippery surface).
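A minimal sketch of the environment-configuration flavor of curriculum, using the walking example above. The stage names, the `friction` parameter, and the 0.8 promotion threshold are all assumptions made for illustration, not taken from any particular library.

```python
from dataclasses import dataclass

@dataclass
class Stage:
    name: str
    friction: float  # hypothetical environment parameter; lower means slipperier

# Easy surface first, hardest surface last.
CURRICULUM = [Stage("normal", 1.0), Stage("wet", 0.6), Stage("ice", 0.2)]

def next_stage_index(current, success_rate, threshold=0.8):
    """Promote the learner one stage once it succeeds often enough."""
    if success_rate >= threshold and current < len(CURRICULUM) - 1:
        return current + 1
    return current

def run_curriculum(success_rates):
    """Replay a sequence of evaluation results and record the stages visited."""
    idx = 0
    visited = [CURRICULUM[idx].name]
    for rate in success_rates:
        new_idx = next_stage_index(idx, rate)
        if new_idx != idx:
            visited.append(CURRICULUM[new_idx].name)
        idx = new_idx
    return visited

stages = run_curriculum([0.5, 0.9, 0.7, 0.85, 0.95])
```

In a real training loop the success rates would come from evaluating the agent on the current stage; here they are hard-coded so the promotion logic can be seen in isolation.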

visarga

Yes, the smart toys are search, code execution, simulations and games.

jazzyjackson

Video games are basically like this: progressive levels require more skill, learned from the easier levels.

djmips

And this is a reason we play video games? That they appeal to some ancient instinct to improve?

jazzyjackson

Rather, I think it’s born of necessity to onboard you to the game mechanics. We certainly have a bird-brained instinct to catch the worm / win a round, so a good game design cuts you some slack to begin with, so you can have a little dopamine as a treat.

From there, difficulty should scale up so you don’t always win, giving you that “intermittent reinforcement” that makes games addictive.

krige

One of the reasons. I'd wager this is what appeals to people not merely playing but mastering a particular game - playing higher difficulties, 100% completion, and so on.

The other reasons would be overcoming other humans (esports/pvp multiplayer), discovery (story driven and exploratory games), and just passing the time (casual games).

jepj57

I'd say it's feedback/reward loop plus small, quick to achieve, progressive goal setting.

hirokio123

I'm creating "smart toys" like that for humans. I recently launched a mobile app. I'd love to see these research breakthroughs feed back into human learning because if humans remain foolish, the world could fall apart.

With DeepSeek R1 and these autonomous driving research results, it feels like we've entered an era where human data is no longer necessary. The ability to infinitely expand learning through simulation while maintaining safety in the real world feels like science fiction coming to life—it's truly exciting.

grandma_tea

Can you expand on that? Efficient in what way?

nine_k

Efficient in the way of bringing the model to meet the criteria of autonomy faster. On one hand it may be something specifically efficient at reaching some autonomy qualities. OTOH it could be just something that efficiently uses the improvement in the model during training to make the subsequent training faster.

seaucre

This is interesting, and I have always thought this approach worth exploring given the "bitter lesson" in other ML domains, but I think we should be skeptical until we see such models deployed and operating effectively on real-world vehicles.

dhbradshaw

Interesting to see this coming out of Apple

mitthrowaway2

Something about dreams that fascinates me is that I usually am genuinely surprised by events that occur in dreams. I interact with other characters whose motivation I cannot understand and whose actions I cannot fully anticipate. It feels like there's a foreign entity acting as DM.

This isn't fake surprise. Sometimes I'll wake up and think, "who on earth were those guys and what were they trying to do? And yet their actions make sense..." or, "who came up with that punchline? It's legitimately funny and I never saw it coming, so it can't have been me..."

And yet I know it's all being generated by my own brain somehow. Through some kind of privileged access level.

And then I think about the bicameral brain structure. Does our brain have two halves so that it can function in a self-play training mode during sleep? Is each half of my brain experiencing the same dream from opposite points of view?

Apologies for the tangent; this is almost totally unrelated to the article and probably something well known to neuroscience for decades. But still, it fascinates me, and the more we learn about the effectiveness of self-play in AI, the more I wonder.

bananaflag

> "who came up with that punchline? It's legitimately funny and I never saw it coming, so it can't have been me..."

Sometimes, when I dream, I am envious of people being witty in ways I think I can't be, and when I wake up I'm like "..."

djmips

Or dreams with music scores or dreams where you are composing a music score. But once I didn't quite wake up and I got it down in a sequencer. Freaked out my boss who didn't believe me when I said I made it.

jeffparsons

I have this when I'm drifting off to sleep but not quite gone yet. I can effortlessly compose compelling melodies in my head, sometimes even parts for multiple instruments simultaneously, as if I'm simply listening to someone play it. But if I want it to go in a particular direction, it does. The couple of times I've willed myself to wake up and find a keyboard, the melodies still made sense — I wasn't just a drunk guy thinking he can sing.

I've occasionally been able to do the same with architecture: design a massive sprawling palace with ease as I fly through it. And much like the music, on the one and only (I think?) time I woke up and sketched as much as I could remember, it still all made sense.

But in my normal waking life I am creatively constipated. My mind aggressively criticises and crushes ideas before they get a chance to grow organically. I have one side of the creative process in my waking life (filtering) but very little of the other (synthesis).

This makes me think a couple of things:

1. I totally get why artists use drugs so much. Any way to tap into that other state must be incredibly tempting.

2. It would be so amazing if we could figure out how to record in high fidelity and interpret what's going on in these altered states of consciousness. Maybe you've composed a whole symphony while you slept once, and you just don't know it!

geon

The problem is, you might dream the joke is very witty, but when you wake up, you realize it is completely nonsensical.

Your brain didn’t really dream the joke. It dreamt the emotional response to it. The joke itself was just a prop.

Not sure where I’m going with this.

wruza

Also happens with drugs: the "you understood everything" thing, teleportation, repetitions, etc, repetitions, etc.

We get used to our usual normal-functioning consciousness, but there’s a whole universe of its potential modes and what-causes-what and in which order. For example, dream scenarios may easily happen in reverse, from a random emotion to drawing the scene around that, to the sense of reality, when you just remember a concept of it. Everything can be backwards, orthogonal, not in order. All “it” has to do is to make it feel normal. All these “backwards”, “normal”, “time”, “because” are not something just granted, you have to actively experience these all the time.

mitthrowaway2

In that case, it wouldn't still be funny and unexpected when I woke up.

Davidzheng

By the way, on dreams: Doesn't the existence of dreams mean that consciousness is like a subprogram in the brain that is usually "plugged in" to reality (or rather some rendering of reality done by the rest of the brain) but the rest of the brain can in fact simulate other scenarios for the consciousness to plug into?

Also I think split brain experiments basically support that you can split a consciousness into two tbh...

Anyways obviously above is speculation

plakspin

You know how a loud sound or a flash of light can alter your dream ... For example you hear your alarm clock and all of a sudden you are dreaming you are in an ambulance?

My theory is that we are always in a "dream" state. Stimuli that manage to reach our conscious attention will alter this dream.

When asleep, only very strong stimuli will reach us, so for the most part our "dream" is in a feedback loop, mostly doing its own thing. When awake, though, we have a much weaker filter for stimuli. The direction that our "dream" takes is fully controlled by it.

mitthrowaway2

That's one way to think about it I guess! I always think of it more like the channels that write to the sensory input registers get switched, just like when I configure my microcontroller's ADC0 to read from INT_REF instead of EXT_REF by flipping a bit at a certain memory address.

jes5199

I think you may have hit upon a novel combination of ideas here. There is something called "social simulation theory" regarding the purpose of dreams, but I don't think it has a neuroanatomical description included.

HaZeust

>"I don't think it has a neuroanatomical description included"

Genuinely curious, but then why bother?

jes5199

Because you can make descriptions of useful human cognition without specifying the implementation. When you’re reading Python code, do you ask, “wait, which hardware register is this local variable stored in??”

vjerancrnjak

It’s a very conscious part of sleep. So who knows what’s actually going on elsewhere, and if you ever make it conscious, would it just be an interpretation of this temporary conscious machinery inspecting what was previously running without it with no labels.

It is similar to solving problems. You want most of it to happen in unconsciousness, otherwise it’s too slow.

Things are learned when they are natural, without thought.

beagle3

Don’t remember where I read this theory, but the gist is that dreams are simulated events for practice/preparation, and they are therefore surprising and very diverse - you need to practice social interactions, fight or flight, shame, freezing, etc - and since you don’t have a lot of time, many dreams will sample many or all these themes.

crummy

I'll be grateful for all the prep dreams when I finally find myself doing an exam I'd forgotten about while naked.

svnt

I don’t think this requires two halves, although it certainly seems possible that is what is happening.

I believe it only requires that your sensory and post-sensory systems be unpredictably generative when feeding to your subjective sense-making/observer. This could be provided for within a coherent whole brain.

mitthrowaway2

That would work for dreams of wandering through some random landscape, but I'm talking about dreams where I interact with people who seem as smart as (or smarter than) me, who can tell me jokes that make me laugh.

What you describe would be like learning chess by exploring random boards, but I'm talking about dreams as self play: learning chess by playing as white against black, without any window into black's strategy. To do that well seems to require running two brain instances in relative isolation. Dreams would be the only safe time to do that, and a bicameral brain hardware would be the most straightforward implementation. I doubt my optic nerve can play chess against my cerebellum.

juliendorra

Is it possible that you are confusing the feeling of the dream with the content of the dream? For example can you remember and write down any of these jokes? Do they make any sense outside of the dream?

The dream may give you the feeling that someone is telling you a joke, and the feeling that makes you laugh, without the actual joke existing as a real structured text joke.

svnt

As I said, I’m open to the possibility that you are correct, but it isn’t accurate to reduce the piecewise analogy to an optic nerve playing chess against a cerebellum.

You have multiple visual cortices that are made of roughly the same stuff as the rest of your neocortex. There is more than enough idle network/processing capacity there, given it is not being fed visual stimulus by the optic nerve, to “play chess.”

mandmandam

> I interact with other characters whose motivation I cannot understand and whose actions I cannot fully anticipate. It feels like there's a foreign entity acting as DM.

My head canon on this phenomenon is that we are not quite as integrated (or isolated) as our conscious 'self' would have us believe.

Neuroscience and Buddhism alike seem to back this up... The concept of 'anatta' [0] has held up strongly for thousands of years (how could it not?). As far as I can tell, neuroscience is also increasingly clear that consciousness is an emergent phenomenon - and strange things happen when this is disrupted.

Most who have experienced psychedelics will agree that our brains are capable of providing utterly novel, previously unimaginable experiences; stranger even than most dreams.

For example, DMT users across a wide range of cultural backgrounds describe seeing 'machine elves' [1]; "People describe these entities as distinct, autonomous beings that typically present with some kind of message."

Then, look at people who 'hear voices'; a phenomenon which is quite scary in our culture, and quite normal or respected in other, older cultures. The voices can say things which surprise us even in our waking lives, sometimes even offering powerful insight, without any ingestion of substances.

So... There are these ways to experience surprising characters made by our own brains - dreams, drugs, and other oddities. My personal view is that these are always active, mostly doing their thing in the background; something like flora and fauna in the sea of our sub/consciousness. Sleep can reveal them, as can 'altered' states of mind. As if the light of consciousness dazzles us, and we can't see the gloriously intricate machinery in the dark.

0 - https://buddhism.stackexchange.com/a/25302

1 - https://health.howstuffworks.com/wellness/drugs-alcohol/dmt-...

linux_devil

Maybe not directly related, but I find genetic algorithms and other optimisation algorithms, such as Ant Colony Optimisation, intersecting with this approach of self-play and leading to robust autonomy.

The28thDuck

The concept of being able to simulate 42 years of “experience” in one hour seems so foreign to me. Something about it creeps me out.

ThrowawayTestr

Don't watch the White Christmas Black Mirror episode.

baq

Maybe at this point don’t watch any black mirror episodes…?

ThrowawayTestr

This is also an acceptable option. It's good TV but it's also nightmare fuel.

RGamma

Don't read Junji Ito's Nagai Yume either.

mikelevins

I had a couple of hobbies (lucid dreaming and shamanic trance drumming) that enabled me to experience big disconnects between the subjective experience of time passing and objective measurable wall-clock time. Some dreams and trances subjectively appeared to be much longer than the wall-clock time recorded by clocks and human helpers.

I don't have any definite knowledge of what's going on with that, but I suspect some part of it is my brain retroactively manufacturing the memory of lots of time passing, and some part of it is my brain confabulating episodic memory about the dream or trance as I wake up and write it down.

Human memory is well known to be generally unreliable and full of confabulated details, so I think the most parsimonious explanation for differences between the time experienced in dreams and the objectively-measurable time that passes is that our brains are just making shit up.

Of course, the idea that your brain just lies to you about the past might be just as creepy as any other explanation.

geon

Humanity experiences almost a million years per hour.

p-a_58213

If the gym is sufficiently simple and well-coded, achieving a simulation speed of 367,920x real-time (simulating 42 years in one hour) is plausible. The question is whether these simulated scenarios genuinely reflect 42 years of real-world driving experience and truly represent the information that a single agent has at its disposal when making driving decisions.
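The 367,920x figure can be checked with a couple of lines of arithmetic. The sketch below assumes 365-day years (no leap days), which is what reproduces the number quoted above; the function name is made up for the example.

```python
HOURS_PER_DAY = 24
DAYS_PER_YEAR = 365  # assumption: leap days ignored

def speedup(sim_years, wall_hours):
    """Ratio of simulated time to wall-clock time, both measured in hours."""
    sim_hours = sim_years * DAYS_PER_YEAR * HOURS_PER_DAY
    return sim_hours / wall_hours

ratio = speedup(42, 1)
print(ratio)  # 367920.0
```

Note that this is aggregate experience per wall-clock hour; it says nothing by itself about how that experience is split across parallel agents or environments.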

dang

[stub for offtopicness]

TZubiri

[flagged]

awinter-py

[flagged]

esafak

"Guys, do you think we'll get away with it?"

bmacho

"Self-play has powered breakthroughs in two-player and multi-player games."

RGamma

"Self-play simulation emits fitter models"

surume

[flagged]

dang

Could you please stop posting unsubstantive comments and flamebait? You've unfortunately been doing it repeatedly. It's not what this site is for, and destroys what it is for.

If you wouldn't mind reviewing https://news.ycombinator.com/newsguidelines.html and taking the intended spirit of the site more to heart, we'd be grateful.