
John Carmack talk at Upper Bound 2025 – slides and notes

MrScruff

It's always a treat to watch a Carmack lecture or read anything he writes, and his notes here are no exception. He writes as an engineer, for engineers, and documents all his thought processes and missteps in the exact detailed yet concise way you'd want from a colleague who was handing off some work.

One question I would have about the research direction is the emphasis on realtime. If I understand correctly, he's doing online learning in realtime. It obviously makes for a cool demo and plays to his optimisation background, and no doubt some great innovations will be required to make it work. But I guess the bitter lesson and recent history also tell us that some solutions may only emerge at compute levels beyond what is currently possible for realtime inference, let alone learning. And the only example we have of an entity solving Atari games is the human brain, of which we don't have a clear understanding of the compute capacity. In which case, why wouldn't it be better to focus purely on learning efficiency and relax the realtime requirement for now?

That's a genuine question, by the way; definitely not an expert here, and I'm sure there's a bunch of value in working within these constraints. I mean, jumping spiders solve reasonably complex problems with 100k neurons, so who knows.

johnb231

From the notes:

"A reality check for people that think full embodied AGI is right around the corner is to ask your dancing humanoid robot to pick up a joystick and learn how to play an obscure video game."

throw_nbvc1234

This sounds like a problem that could be solved right around the corner, with a caveat.

Games are generally solvable for AI because they have feedback loops and clear success or failure criteria. If the "picking up a joystick" part is the limiting factor, sure. But why would we want robots to use an interface (especially a modern controller) heavily optimized for human hands? That seems like the definition of a horseless carriage.

I'm sure if you compared a monkey's and a dolphin's performance using a joystick, you'd get results that aren't really correlated with their intelligence. I would guess that if you gave robots an R2-D2-like port to jack into and play a game, that problem could be solved relatively quickly.

xnickb

Just like OpenAI early on promised us an AGI and showed us how it "solved" Dota 2.

They also claimed it "learned" to play only by playing against itself; however, it was clear that most of the advanced techniques were borrowed from existing AI and from observing humans.

No surprise they gave up on that project completely and I doubt they'll ever engage in anything like that again.

Money better spent on different marketing platforms.

mellosouls

The point isn't about learning video games; it's about learning tasks unrelated to its specific competency in general.

jappgar

A human would learn it faster, and could immediately teach other humans.

AI clearly isn't at human level and it's OK to admit it.

johnb231

No, the joystick part is really not the limiting factor. They’ve already done this with a direct software interface. Physical interface is a new challenge. But overall you are missing the point.

suddenlybananas

It's because humans (and other animals) have enormous innate capacities and knowledge, which make learning new things much, much simpler than starting from scratch. It's not really because of humans' computational capacity.

MrScruff

By innate do you mean evolved/instinctive? Surely even evolved behaviour must be expressed as brain function, and therefore would need a brain capable of handling that level of processing.

I don't think it's clear how much of a human brain's function exists at birth, though; I know it's theorised that even much of the sensory processing has to be learned.

suddenlybananas

I'm not arguing against computational theory of mind, I'm just saying that innate behaviours don't require the same level of scale as learnt ones.

Existing at birth is not the same thing as innate. Puberty is innate but it is not present at birth.

nlitened

> the human brain, of which we don't have a clear understanding of the compute capacity

Neurons have a finite (very low) speed of signal transfer, so just by measuring cognitive reaction time we can deduce upper bounds on how many _consecutive_ neuron connections are involved in reception, cognitive processing, and the resulting reaction via muscles, even for very complex cognitive processes. And that number is only around 100 consecutive neurons, one after another. So “the algorithm” could not be _that_ complex in the end (100x matmul+tanh?)

Granted, a lot of parallelism and feedback loops are involved, but overall it gives me (and many others) an impression that when the AGI algorithm is ever found, it’s “mini” version should be able to run on modest 2025 hardware in real time.
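
To make that bound concrete, here is a rough back-of-the-envelope version of the arithmetic. The ~200 ms reaction time and ~2 ms per-neuron delay are assumed illustrative figures, not numbers from the comment above:

  # Rough upper bound on serial depth in a human reaction (illustrative, assumed numbers)
  reaction_time_s = 0.200      # assumed visual reaction time, ~200 ms
  per_step_delay_s = 0.002     # assumed synaptic + conduction delay per neuron, ~2 ms
  max_serial_steps = reaction_time_s / per_step_delay_s
  print(max_serial_steps)      # ~100 consecutive neurons, matching the claim above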

johnb231

> (100x matmul+tanh?)

Biological neurons are way more complex than that. A single neuron has dendritic trees with subunits doing their own local computations. There are temporal dynamics in the firing sequences. There is so much more complexity in the biological networks. It's not comparable.

neffy

This is exactly it. Biology is making massive use of hacked real-time local network communication in ways we haven't begun to explore.

woolion

You could implement a Turing machine with humans physically acting as logic gates. Then every human is just a boolean function.

scajanus

The "granted" is doing a lot of work there. In fact, if you imagine a computer being able to do tasks similar to what the human brain can in around 100 steps, it becomes clear that considering parallelism is absolutely critical.

qoez

Interesting reply from an openai insider: https://x.com/unixpickle/status/1925795730150527191

epr

Actually no, it's not interesting at all. Vague dismissal of an outsider is a pretty standard response by insecure academic types. It could have been interesting and/or helpful to the conversation if they went into specifics or explained anything at all. Since none of that's provided, it's "OpenAI insider" vs John Carmack AND Richard Sutton. I know who I would bet on.

andy_ppp

My bet is on Carmack.

speed_spread

I suspect Carmack in the Dancehall with the BFG.

zeroq

  >> "they will learn the same lesson I did"
Which is what? Don't trust Altman? x)

roflcopter69

Funny, I was just commenting something similar here, see https://news.ycombinator.com/item?id=44071614

And I say this while most certainly not being as knowledgeable as this OpenAI insider. So if even I can see this, then it's kinda bad, isn't it?

fmbb

Can you explain which parts you think are bad and why?

steveBK123

Another thought experiment: if OpenAI's AGI were right around the corner, why are they wasting time/money/energy buying a product-less vanity hardware startup run by Ive?

Why not tackle robotics, if anything? Or really, just be the best AGI and everyone will be knocking on your door to license it for their hardware/software stacks; you will print infinite money.

soared

Or have your AGI design products

steveBK123

All the more reason not to acquihire Ive for $6.5B, if true

koolala

I wish he did this with a VR environment instead, like they mention at the start of the slides: a VR environment with a JPEG camera filter, physics sim, noise, and robot simulation. If anyone could program that well, it's him.

Using real life robots is going to be a huge bottleneck for training hours no matter what they do.

andy_ppp

I still don't think we have a clear enough idea of what a concept is to be able to think about AGI. And then there's using concepts from one area and translating them into another: what is the process by which the brain combines and abstracts ideas into something new?

throw310822

Known entities are recurring patterns (we give names to things that occur more than once, in the world or in our thoughts). Concepts are recurring thought patterns. Abstractions, relations, and metaphors are all ways of finding and transferring patterns from one domain to another.

kamranjon

I was really excited when I heard Carmack was focusing on AI, and I'm really looking forward to watching this when the video is up. But just from looking at the slides, it seems like he tried to build a system that can play Atari games? Seems like a fun project, but I'm curious what will come out of it, or whether there is an associated paper being released.

johnb231

Atari games are widely used in Reinforcement Learning (RL) research as a standard benchmark.

https://github.com/Farama-Foundation/Arcade-Learning-Environ...

The goal is to develop algorithms that generalize to other tasks.
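
For readers new to the benchmark, here is a minimal sketch of the standard Atari RL loop via Gymnasium's ALE environments (this assumes gymnasium and ale-py are installed, and uses a random policy as a stand-in for an actual learning agent; it is not Carmack's setup):

  import gymnasium as gym

  env = gym.make("ALE/Breakout-v5")             # one of the standard Atari benchmark environments
  obs, info = env.reset(seed=0)
  for _ in range(1000):
      action = env.action_space.sample()        # placeholder for a learned policy
      obs, reward, terminated, truncated, info = env.step(action)
      if terminated or truncated:
          obs, info = env.reset()
  env.close()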

sigmoid10

They were highly used. OpenAI even included them in their RL Gym library back in the old days when they were still doing open research. But if you look at this leaderboard from 7 (yes, seven!) years ago [1], most of them were already solved way beyond human capabilities. And yet we didn't get a really useful general-purpose algorithm out of it.

As an AI researcher, I always considered Atari a fun academic exercise, but nothing more. Similar to how recognising characters using convnets was cool in the nineties and early 00s, but didn't give us general-purpose image understanding. Only modern GPUs and massive training datasets did. Nowadays most cutting-edge RL game research focuses on much more advanced games like Minecraft, which is thought to be better suited. But I'm pretty sure it's still not enough. Even role-playing GTA VI won't be.

We probably need a pretty advanced physical simulation of the real world before we can get agents to handle the real world. But that means solving the problem of generating such an environment first, because you can't train on the actual real world due to the sample inefficiency of all current algorithms. Nvidia is doing some really interesting research in this direction by combining physics simulation and image-generation models to simulate an environment while getting both accuracy and diversity into the training data. But it still feels like some key ingredient is missing.

[1] https://github.com/cshenton/atari-leaderboard

gregdeon

I watched the talk live. I felt that his main argument was that Atari _looks_ solved, but there's still plenty of value that could be gained by revisiting these "solved" games. For one, learning how to play games through a physical interface is a way to start engaging with the kinds of problems that make robotics hard (e.g., latency). They're also a good environment to study catastrophic forgetting: an hour of training on one game shouldn't erase a model's ability to play other games.

I think we could eventually saturate Atari, but for now it looks like it's still a good source of problems that are just out of reach of current methods.
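
As an aside, catastrophic forgetting is easy to reproduce outside of Atari. A minimal sketch with scikit-learn (a toy digit-classification stand-in, not the talk's setup): train a small net on one task, then only on a second, and the first task's accuracy typically collapses.

  import numpy as np
  from sklearn.datasets import load_digits
  from sklearn.neural_network import MLPClassifier

  X, y = load_digits(return_X_y=True)
  task_a, task_b = y < 5, y >= 5               # "game A": digits 0-4, "game B": digits 5-9
  clf = MLPClassifier(hidden_layer_sizes=(64,), random_state=0)

  for _ in range(50):                          # train only on task A
      clf.partial_fit(X[task_a], y[task_a], classes=np.unique(y))
  print("task A accuracy after A:", clf.score(X[task_a], y[task_a]))

  for _ in range(50):                          # then train only on task B
      clf.partial_fit(X[task_b], y[task_b])
  print("task A accuracy after B:", clf.score(X[task_a], y[task_a]))   # typically drops sharply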

mschuster91

> But it still feels like some key ingredient is missing.

Continuous training is the key ingredient. Humans can use existing knowledge and apply it to new scenarios, and so can most AI. But AI cannot permanently remember the results of its actions in the real world, and so its body of knowledge cannot expand.

Take a toddler and an oven. The toddler has no concept of what an oven is, other than maybe that it smells nice. The toddler will touch the oven, notice that it experiences pain (because the oven is hot), and learn that oven = danger. Place a current AI in a droid toddler body? It will never learn, and it will keep touching the oven as soon as the information "oven = danger" falls out of the context window.

For some cases this inability to learn is actually desirable. You don't want anyone and everyone to be able to train ChatGPT unsupervised, otherwise you get 4chan flooding it with offensive crap like they did to Tay [1]. But for AI that physically interacts with meatspace, constant evaluation and learning are all but mandatory if it is to safely interact with its surroundings. "Dumb" robots run regular calibration cycles for their limbs to make sure they are still aligned and to compensate for random deviations, and so will AI robots.

[1] https://en.wikipedia.org/wiki/Tay_(chatbot)

newsclues

Being highly used in the past is good; it's a benchmark to compare against.

tschillaci

You will find many agents that solved (e.g., finished, reached high score) atari games, but there is still so much more work to do in the field. I wrote my Master's thesis on how to learn from few interactions with the game, so that if the algorithm is ported to actual robots they don't need to walk and fall for centuries before learning behaviors. I think there is more research to do on higher levels of generalization: when you know how to play a few video games, you quickly understand how to play a new one intuitively, and I haven't seen thorough research on that.

gadders

I want smarter NPCs in games.

albertzeyer

His goal was not just to solve Atari games. That was already done.

His goal is to develop generic methods. You could use more complex games or the physical world for that, as that is what you want in the end. However, his insight is that you can modify the Atari setting to test this, e.g. by requiring it to work in realtime, and that the added complexity of more complex games doesn't really give you any additional insights at this point.

modeless

He says they will open-source it, which is cool. I agree that I don't understand what's novel here. Playing with a physical controller and camera on a laptop GPU in real time is cool, and maybe that hasn't specifically been done before, but it doesn't seem surprising that it is possible.

If it is substantially more sample efficient, or generalizable, than prior work then that would be exciting. But I'm not sure if it is?

RetroTechie

Maybe that's exactly his goal: not to come up with something that beats the competition, but play with the architecture, get a feel for what works & what doesn't, how various aspects affect the output, and improve on that. Design more efficient architectures, or come up with something that has unique features compared to other models.

If so, scaling up may be more of a distraction rather than helpful (besides wasting resources).

I hope he succeeds in whatever he's aiming for.

cryptoz

DeepMind’s original demos were also of Atari gameplay.

pyb

"... Since I am new to the research community, I made an effort" This means they've probably submitted a paper too.

epolanski

It states it's a research company, not a product company.

diggan

To be fair, OpenAI is also a "research lab" rather than a "product company", and they still sell products for $200/month. I'm not sure the distinction matters much in practice today, as long as the entity is incorporated somehow.

pyb

That's what I said

roflcopter69

Honestly, having gone through the slides, it's a bit painful to see Carmack "rediscover" stuff I learned in a reinforcement learning lecture like ten years ago.

But don't get me wrong! Since this is a long-term research endeavor of his, I believe really starting from the basics is good for him and will empower him to bring something new to the table eventually.

I'm surprised, though, that he "only" came this far as of now. Maybe my slight idolization of Carmack made me kind of blind to the fact that this kind of research is a mean beast after all, and there is a reason that huuuuge research labs dump countless man-decades into this kind of stuff with no guaranteed breakthroughs.

Cipater

roflcopter69

I was just going to answer https://news.ycombinator.com/item?id=44071595 who mentioned exactly the same tweet.

I'm nowhere near as good at my craft as someone who works for OpenAI, which the author of that tweet seems to be, but if even I can see this, then it's bad, isn't it?

saejox

What Carmack is doing is right. More people need to get away from training their models with words alone. AI needs physicality.

dusted

Anywhere we can watch the presentation? The slides alone are great, but if he's saying stuff alongside them, I'd be interested in that too :)
