John Carmack talk at Upper Bound 2025
386 comments
·May 23, 2025
MrScruff
kilpikaarna
I'm sure there were offline rendering and 3D graphics workstation people saying the same about the comparatively crude work he was doing in the early 90s...
Obviously both Carmack and the rest of the world have changed since then, but it seems to me his main strength has always been in doing more with less (early id/Oculus, AA). When he's working in bigger orgs and/or with more established tech his output seems to suffer, at least in my view (possibly in his as well, since he quit both Bethesda-id and Meta).
I don't know Carmack and can't claim to be anywhere close to his level, but as someone also mainly interested in realtime stuff I can imagine he also feels a slight disdain for the throw-more-compute-at-it approach of the current AI boom. I'm certainly glad he's not running around asking for investor money to train an LLM.
Best case scenario he teams up with some people who complement his skillset (akin to the game designers and artists at id back in the day) and comes up with a way to help bring some of the cutting edge to the masses, like with 3D graphics.
LarsDu88
The thing about Carmack in the 90s... There was a lot of research going on around 3D graphics. Companies like SGI and Pixar were building specialized workstations for doing vector operations for 3D rendering. 3D was a thing. Game consoles with specialized 3D hardware would launch in 1994 with the Sega Saturn and the Sony PlayStation (Japan-only for the first year).
What Carmack did was basically get a 3D game running on existing COMMODITY hardware. The 386 chip that most people used for their Excel spreadsheets did not do floating-point operations well, so Carmack figured out how to do everything using integers.
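The general trick for that (a minimal sketch of the idea, not id Software's actual code) is fixed-point arithmetic: store each number pre-scaled by a power of two and fix up the scale after multiplies and divides, so everything stays in integer registers.

    # 16.16 fixed-point: fractional math using only integer operations.
    # A sketch of the general idea, not id's actual code.
    FRAC_BITS = 16
    ONE = 1 << FRAC_BITS              # represents 1.0

    def to_fixed(x: float) -> int:
        return int(round(x * ONE))

    def to_float(x: int) -> float:
        return x / ONE

    def fx_mul(a: int, b: int) -> int:
        return (a * b) >> FRAC_BITS   # rescale after the multiply

    def fx_div(a: int, b: int) -> int:
        return (a << FRAC_BITS) // b  # pre-scale before the divide

    # 1.5 * 2.25 == 3.375, computed with integer ops only
    print(to_float(fx_mul(to_fixed(1.5), to_fixed(2.25))))   # 3.375
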
May 1992 -> Wolfenstein 3D releases
December 1993 -> Doom releases
December 1994 -> Sony PlayStation launches in Japan
June 1996 -> Quake releases
So Wolfenstein and Doom were actually not really 3D games, but rather 2.5D games (you can't have rooms below other rooms). The first 3D game here is actually Quake, which eventually also got hardware acceleration support.
Carmack was the master of doing the seemingly impossible on super constrained hardware on virtually impossible timelines. If DOOM released in 1994 or 1995, would we still remember it in the same way?
hx8
> If DOOM released in 1994 or 1995, would we still remember it in the same way?
Maybe. One aspect of Wolfenstein and Doom's popularity is that they were years ahead of everyone else technically on PC hardware. The other aspect is that they were genre-defining titles that set the standards for gameplay design. I think Doom deathmatch would have caught on in 1995, as there really were very few (just Command and Conquer?) standout PC network multiplayer games released between 1993 and 1995.
andrepd
> So Wolfenstein and Doom were actually not really 3d games, but rather 2.5 games (you can't have rooms below other rooms). The first 3d game here is actually Quake
Ultima Underworld is a true 3D game from 1992. An incredibly impressive game, in more ways than one.
muziq
The world seems to have rewritten history, and forgotten Ultima Underworld, which shipped prior to Doom..
gjadi
Hardware changes a lot in the time it takes to develop a game. When I read his plan files and interviews, I realized he seemed to spend a lot of time before developing the game thinking about what the next-gen hardware was going to bring. Then design the best game they could think of while targeting this not-yet-available hardware.
CamperBob2
> If DOOM released in 1994 or 1995, would we still remember it in the same way?
I think so, because the thing about DOOM is, it was an insanely good game. Yes, it pioneered fullscreen real-time perspective rendering on commodity hardware, instantly realigning the direction of much of the game industry, yadda yadda yadda, but at the end of the day it was a good-enough game for people to remember and respect even without considering the tech.
Minecraft would be a similar example. Minecraft looked like total ass, and games with similar rendering technology could have been (and were) made years earlier, but Minecraft was also good. And that was enough.
leoc
But also, he didn't do the technically hardest and most impressive part, Quake, on his own. IIUC he basically relied on Michael Abrash's help to get Quake done (in any reasonable amount of time).
Buttons840
> his main strength has always been in doing more with less
Carmack builds his kingdom and then runs it well.
It makes me wonder how he would fare as an unknown Jr. developer with managers telling him "that's a neat idea, but for now we just need you to implement these Figma designs".
mrandish
A key aspect of the Carmack approach (or similar 'smart hacker' unconventional career approach) is avoiding that situation in the first place. However, this also carries substantial career, financial and lifestyle risks & trade-offs - especially if you're not both talented enough and lucky enough to hit a sufficiently fertile opportunity in the right time window on the first few tries.
Assuming one is willing to accept the risks and has the requisite high talent plus strong work drive, the Carmack-like career pattern is to devote great care to evaluating and selecting opportunities near the edges of newly emerging 'interesting things' which also: coincide with your interests/talents, are still at a point where a small team can plausibly generate meaningful traction, and have plausible potential to grow quickly and get big.
Carmack was fortunate that his strong interest in graphics and games overlapped a time period when Moore's Law was enabling quite capable CPU, RAM and GFX hardware to hit consumer prices. But we shouldn't dismiss Carmack's success as "luck". That kind of luck is an ever-present uncontrolled variable which must be factored into your approach - not ignored. Since Carmack has since shown he can get very interested in a variety of things, I assume he filtered his strong interests to pick the one with the most near-term growth potential which also matched his skills. I suspect the most fortunate "luck" Carmack had wasn't picking game graphics in the early 90s, it was that (for whatever reasons) he wasn't already employed in a more typical "well-paying job with a big, stable company, great benefits and career growth potential" so he was free to find the opportunity in the first place.
I had a similarly unconventional career path which, fortunately, turned out very well for me (although not quite at Carmack's scale :-)). The best luck I had actually looked like 'bad luck' to me and everyone else. Due to my inability to succeed in a traditional educational context (and other personal shortcomings), I didn't have a college degree or resume sufficient to get a "good job", so I had little choice but to take the high-risk road and figure out the unconventional approach as best I could - which involved teaching myself, then hiring myself (because no one else would) and then repeatedly failing my way through learning startup entrepreneurship until I got good at it. I think the reality is that few who succeed on the 'unconventional approach' consciously chose that path at the beginning over lower risk, more comfortable alternatives - we simply never had those alternatives to 'bravely' reject in pursuit of our dreams :-).
zeroq
> "makes me wonder how he would fare as an unknown Jr. developer with managers telling him (...)"
he would probably write an open letter and leave Meta. /s
johnb231
From the notes:
"A reality check for people that think full embodied AGI is right around the corner is to ask your dancing humanoid robot to pick up a joystick and learn how to play an obscure video game."
ferguess_k
We don't really need AGI. We need better specialized AIs. Throw in a few specialized AIs and they will make an impact on society. That might not be that far away.
nightski
Saying we don't "need" AGI is like saying we don't need electricity. Sure, life existed before we had that capability, but it would be very transformative. Of course we can make specialized tools in the meantime.
alickz
What if AGI is just a bunch of specialized AIs put together?
It would seem our own generalized intelligence is an emergent property of many, _many_ specialized processes
I wonder if AI is the same
bluGill
Specialized AIs have been making an impact on society since at least the 1960s. AI has long suffered from the pattern that every time the field comes up with something new, it gets renamed and becomes important (where it makes sense) without AI getting the credit.
From what I can tell, most in AI are currently hoping LLMs reach that point quickly, because the hype is not helping AI at all.
Karrot_Kream
I think to many AI enthusiasts, we're already at the "specialized AIs" phase. The question is whether those will jump to AGI. I'm personally unconvinced but I'm not an ML researcher so my opinion is colored by what I use and what I read, not active research. I do think though that many specialized AIs is already enough to experience massive economic disruption.
babyent
Why not just hire like 100 of the smartest people across domains and give them SOTA AI, to keep the AI as accurate as possible?
Each of those 100 can hire teams or colleagues to make their domain better, so there’s always human expertise keeping the model updated.
AndrewKemendo
This debate is exhausting because there's no coherent definition of AGI that people agree on.
I made a Google Form for collecting AGI definitions, because I don't see anyone else doing it and I find the range of definitions for this concept infinitely frustrating:
https://docs.google.com/forms/d/e/1FAIpQLScDF5_CMSjHZDDexHkc...
My concern is that people never get focused enough to care to define it - seems like the most likely case.
johnb231
The Wikipedia article on AGI explains it well enough.
Researchers at Google have proposed a classification scheme with multiple levels of AGI. There are different opinions in the research community.
latentsea
In a way it sort of doesn't matter. If all we ever wind up with is unbelievably fancy chatbots, marketing teams will eventually draw their line in the sand and call it AGI.
There comes a theoretical point at which a definition is no longer relevant because it's obvious to everyone on an intuitive level. An easy lower bound for where this threshold might sit would be "when it can start and win wars unassisted under its own volition". Since at that point, no one on earth would need to debate it. It would simply be respected and understood for what it is without needing to define it.
Until such an obvious threshold is crossed, it will be whatever executives, product managers, and marketing teams say it is.
mvkel
It doesn't really seem like there's much utility in defining it. It's like defining "heaven."
It's an ideal that some people believe in, and we're perpetually marching towards it
bigyabai
It is a marketing term. That's it. Trying to exhaustively define what AGI is or could be is like trying to explain what a Happy Meal is. At its core, the Happy Meal was not invented to revolutionize food eating. It puts an attractive label on some mediocre food, a title that exists for the purpose of advertisement.
There is no point collecting definitions for AGI, it was not conceived as a description for something novel or provably existent. It is "Happy Meal marketing" but aimed for adults.
vonneumannstan
Is this supposed to be a gotcha? We know these systems are typically trained using RL and they are exceedingly good at learning games...
johnb231
No it is not a “gotcha” and I don’t understand how you got that impression.
Carmack believes AGI systems should be able to learn new tasks in realtime alongside humans in the real world.
throw_nbvc1234
This sounds like a problem that could be solved around the corner with a caveat.
Games generally are solvable for AI because they have feedback loops and a clear success or failure criteria. If the "picking up a Joystick" part is the limiting factor, sure. But why would we want robots to use an interface (especially a modern controller) heavily optimized for human hands; that seems like the definition of a horseless carriage.
I'm sure if you compared a monkey's and a dolphin's performance using a joystick you'd get results that aren't really correlated with their intelligence. I would guess that if you gave robots an R2-D2-like port to jack into and play a game, that problem could be solved relatively quickly.
xnickb
Just like OpenAI early on promised us an AGI and showed us how it "solved" Dota 2.
They also claimed it "learned" to play by playing against itself only; however, it was clear that most of the advanced techniques were borrowed from existing AI and from observing humans.
No surprise they gave up on that project completely and I doubt they'll ever engage in anything like that again.
Money better spent on different marketing platforms.
mellosouls
The point isn't about learning video games, it's about learning tasks unrelated to its specific competency generally.
jappgar
A human would learn it faster, and could immediately teach other humans.
AI clearly isn't at human level and it's OK to admit it.
jandrese
> But why would we want robots to use an interface (especially a modern controller) heavily optimized for human hands; that seems like the definition of a horseless carriage.
Elon's response to this is that if we want these androids to replace human jobs then the lowest friction alternative is for the android to be able to do anything a human can do in a human amount of space. A specialized machine is faster and more efficient, but comes with engineering and integration costs that create a barrier to entry. Elon learned this lesson the hard way when he was building out the gigafactories and ended up having to hire a lot of people to do the work while they sorted out the issues with the robots. To someone like Elon a payroll is an ever growing parasite on a company's bottom line, far better if the entire thing is automated.
johnb231
No, the joystick part is really not the limiting factor. They’ve already done this with a direct software interface. Physical interface is a new challenge. But overall you are missing the point.
suddenlybananas
It's because humans (and other animals) have enormous innate capacities and knowledge which makes learning new things much, much simpler than if you start from scratch. It's not really because of humans' computational capacity.
xnx
> enormous innate capacities and knowledge
Hundreds of millions of years of trial-and-error biological pre-training where survival/propagation is the reward function
suddenlybananas
Yes but it remains innate to the individual.
MrScruff
By innate do you mean evolved/instinctive? Surely even evolved behaviour must be expressed as brain function, and therefore would need a brain capable of handling that level of processing.
I don't think it's clear how much of a human brain's function exists at birth though; I know it's theorised that even much of the sensory processing has to be learned.
suddenlybananas
I'm not arguing against computational theory of mind, I'm just saying that innate behaviours don't require the same level of scale as learnt ones.
Existing at birth is not the same thing as innate. Puberty is innate but it is not present at birth.
Nopoint2
There is just no reason to believe that we are born with some insanely big library of knowledge, and it sounds completely impossible. How would it be stored, and how would we even evolve it?
It just isn't needed. Just like you can find let's say kangaroos in the latent space of an image generator, so we learn abstract concepts and principles of how things work as a bonus of learning to process the senses.
Maybe a way to AGI could be figuring out how to combine a video generator with an LLM or something similar in a way that allows it to understand things intuitively, instead of doing just lots and lots of some statistical bullshit.
Jensson
> There is just no reason to believe that we are born with some insanely big library of knowledge, and it sounds completely impossible. How would it be stored, and how would we even evolve it?
We do have that, ever felt fear of heights? That isn't learned, we are born with it. Same with fear of small moving objects like spiders or snakes.
Such things are learned/stored very differently from memories, but it's certainly there, and we can see animals also have those. Like how cats get very scared of objects that are long and appear suddenly, like a cucumber, since their genetic instincts think it's a snake.
suddenlybananas
>How would it be stored, and how would we even evolve it?
DNA and the same way anything else is evolved? The body is insanely complicated, I don't see why innate knowledge is so unbelievable.
nlitened
> the human brain, of which we don't have a clear understanding of the compute capacity
Neurons have finite (very low) speed of signal transfer, so just by measuring cognitive reaction time we can deduce upper bounds on how many _consecutive_ neuron connections are involved in reception, cognitive processing, and resulting reaction via muscles, even for very complex cognitive processes. And the number is just around 100 consecutive neurons involved one after another. So “the algorithm” could not be _that_ complex in the end (100x matmul+tanh?)
Granted, a lot of parallelism and feedback loops are involved, but overall it gives me (and many others) an impression that when the AGI algorithm is ever found, its "mini" version should be able to run on modest 2025 hardware in real time.
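To make the back-of-the-envelope version of that argument concrete (the numbers below are rough assumptions for illustration, not measurements):

    # Rough upper bound on *sequential* processing depth from reaction time.
    # Both numbers are ballpark assumptions.
    reaction_time_ms = 200.0     # visual stimulus -> motor response, roughly
    per_hop_delay_ms = 2.0       # synaptic delay + neuron integration, roughly

    max_serial_steps = reaction_time_ms / per_hop_delay_ms
    print(max_serial_steps)      # ~100 consecutive neuron-to-neuron hops

The bound says nothing about width: each of those ~100 steps can involve enormous fan-in and parallelism, which is where the real capacity presumably lives.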
johnb231
> (100x matmul+tanh?)
Biological neurons are way more complex than that. A single neuron has dendritic trees with subunits doing their own local computations. There are temporal dynamics in the firing sequences. There is so much more complexity in the biological networks. It's not comparable.
woolion
You could implement a Turing machine with humans physically operating as logic gates. Then every human is just a boolean function.
neffy
This is exactly it. Biology is making massive use of hacked real time local network communication in ways we haven't begun to explore.
scajanus
The granted is doing a lot of work there. In fact, if you imagine a computer being able to do similar tasks as human brain can in around 100 steps, it becomes clear that considering parallelism is absolutely critical.
qoez
Interesting reply from an openai insider: https://x.com/unixpickle/status/1925795730150527191
epr
Actually no, it's not interesting at all. Vague dismissal of an outsider is a pretty standard response by insecure academic types. It could have been interesting and/or helpful to the conversation if they went into specifics or explained anything at all. Since none of that's provided, it's "OpenAI insider" vs John Carmack AND Richard Sutton. I know who I would bet on.
handsclean
It seems that you’ve only read the first part of the message. X sometimes aggressively truncates content with no indication it’s done so. I’m not sure this is complete, but I’ve recovered this much:
> I read through these slides and felt like I was transported back to 2018.
> Having been in this spot years ago, thinking about what John & team are thinking about, I can't help but feel like they will learn the same lesson I did the hard way.
> The lesson: on a fundamental level, solutions to these games are low-dimensional. No matter how hard you hit them with from-scratch training, tiny models will work about as well as big ones. Why? Because there's just not that many bits to learn.
> If there's not that many bits to learn, then researcher input becomes non-negligible.
> "I found a trick that makes score go up!" -- yeah, you just hard-coded 100+ bits of information; a winning solution is probably only like 1000 bits. You see progress, but it's not the AI's.
> In this simplified RL setting, you don't see anything close to general intelligence. The neural networks aren't even that important.
> You won't see _real_ learning until you absorb a ton of bits into the model. The only way I really know to do this is with generative modeling.
> A classic example: why is frame stacking just as good as RNNs? John mentioned this in his slides. Shouldn't a better, more general architecture work better?
> YES, it should! But it doesn't, because these environments don't heavily encourage real intelligence.
leoc
I'm not sure what the moral is from this, but if Atari games are just too easy, at the same time the response of the machine-learning guys to the challenge of the NetHack Learning Environment seems to have mostly been to quietly give up. Why is generative modeling essential to finding harder challenges when NetHack is right there ...?
lairv
Alex Nichol worked on "Gotta Learn Fast" in 2018 which Carmack mentions in his talk, he also worked on foundational deep learning methods like CLIP, DDPM, GLIDE, etc. Reducing him to a "seething openai insider" seems a bit unfair
ActivePattern
It's an OpenAI researcher that's worked on some of their most successful projects, and I think the criticism in his X thread is very clear.
Systems that can learn to play Atari efficiently are exploiting the fact that the solutions to each game are simple to encode (compared to real world problems). Furthermore, you can nudge them towards those solutions using tricks that don't generalize to the real world.
6stringmerc
Right, and the current state of tech - from accounts I've read, though not firsthand experience - is that the "black box" methods of AI are absolutely questionable when delivering citations and factual basis for their conclusions. As in, the most real-world challenge, in the basic sense, of getting facts right is still a bridge too far for OpenAI, ChatGPT, Grok, et al.
See also: specious ethics regarding the training of LLMs on copyright protected artistic works, not paying anything to the creators, and pocketing investor money while trying to legislate their way around decency in engineering as a science.
Carmack has a solid track record as an engineer, innovator, and above the board actor in the tech community. I cannot say the same for the AI cohort and I believe such a distinction is important when gauging the validity of critique or self-aggrandizement by the latter, especially at the expense of the former. I am an outlier in this community because of this perspective, but as a creator and knowledgeable enough about tech to see things through this lens, I am fine being in this position. 10 years from now will be a great time to look back on AI the way we’re looking back at Carmack’s game changing contributions 30 years ago.
dgb23
That sounds like an extremely useful insight that makes this kind of research even more valuable.
kadushka
He did go into specifics and explained his point. Or have you only read his first post?
quadrature
Do you have an X account? If you're not logged in you'll only see the first post in the thread.
threatripper
x.com/... -> xcancel.com/...
MattRix
It’s not vague, did you only see the first tweet or the entire thread?
johnb231
Carmack replied to that https://x.com/ID_AA_Carmack/status/1925973500327591979
jjulius
I appreciate how they don't tell us what lesson they learned.
dcre
It is a thread. You may have only seen the first tweet because Twitter is a user-hostile trash fire.
“The lesson: on a fundamental level, solutions to these games are low-dimensional. No matter how hard you hit them with from-scratch training, tiny models will work about as well as big ones. Why? Because there's just not that many bits to learn.”
jjulius
Thank you for clarifying. I don't have a Twitter account, and the linked tweet genuinely looks like a standalone object. Mea culpa.
lancekey
I think some replies here are reading the full twitter thread, while others (not logged in?) see only the first tweet. The first tweet alone does come off as a dismissal with no insight.
mannycalavera42
indeed, this is pure walled garden sh*t
alexey-salmin
Each of these games is low-dimensional and requires not "intelligence" but something more like "reflexes"; I tend to agree.
However, making a system that can beat an unknown game does require generalization. If not a real intelligence (whatever that means), then at least at the level of, say, "a wolf".
Whether it can arise from RL alone is not certain, but it's there somewhere.
andy_ppp
My bet is on Carmack.
WithinReason
"Graphics Carmack" is a genius but that doesn't mean that "AI Carmack" is too.
MrLeap
I wouldn't bet against him. "The Bitter Lesson" may imply an advantage to someone who historically has been at the tip of the spear for squeezing the most juice out of GPU hosted parallel computation.
Graphics rendering and AI live on the same pyramid of technology. A pyramid with a lot of bricks with the initials "JC" carved into them, as it turns out.
cheschire
Carmack is always a genius, but like most people he requires luck, and like most people, the house always wins. Poor Armadillo Aerospace.
mrguyorama
What has "Graphics Carmack" actually done since about 2001?
So, his initial tech was "Adaptive tile refresh" in Commander Keen, used to give it console style pixel-level scrolling. Turns out, they actually hampered themselves in Commander Keen 1 by not understanding the actual tech, and implemented "The Jolt", a feature that was not necessary. The actual hardware implemented scrolling the same way that consoles like the NES did, and did not need "the jolt", nor the limitations it imposed.
Then, Doom and Quake was mostly him writing really good optimizations of existing, known and documented algorithms and 3D techniques, usually by recognizing what assumptions they could make, what portions of the algorithm didn't need to be recalculated when, etc. Very talented at the time, but in the software development industry, making a good implementation of existing algorithms that utilize your specific requirements is called doing your job. This is still the height of his relative technical output IMO.
Fast Inverse Square Root was not invented by him, but was floating around in industry for a while. He still gets kudos for knowing about it and using it.
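For reference, the trick being referred to combines a bit-level reinterpretation of the float with one Newton-Raphson refinement. A Python transliteration of the well-known C routine (illustrative only; the struct round-trips are what force 32-bit float behaviour):

    import struct

    def fast_inv_sqrt(x: float) -> float:
        # Reinterpret the 32-bit float's bits as an unsigned integer
        i = struct.unpack('<I', struct.pack('<f', x))[0]
        i = 0x5f3759df - (i >> 1)                         # the famous magic-constant hack
        y = struct.unpack('<f', struct.pack('<I', i))[0]  # back to a float
        return y * (1.5 - 0.5 * x * y * y)                # one Newton-Raphson step

    print(fast_inv_sqrt(4.0))   # ~0.499, vs the exact 0.5
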
"Carmack's reverse" is a technique for doing stencil shadows that was a minor (but extremely clever) modification to the "standard" documented way of doing shadow buffers. There is evidence of the actual technique from a decade before Carmack put it in Doom 3 and it was outright patented by two different people the year before. There is no evidence that Carmack "stole" or anything this technique, it was independent discovery, but was clearly also just a topic in the industry at the time.
"Megatextures" from Rage didn't really go anywhere.
Did Carmack actually contribute anything to VR rendering while at Oculus?
People treat him like this programming god and I just don't understand. He was well read, had a good (maybe too good) work ethic, and was very talented at writing 386 era assembly code. These are all laudable, but doesn't in my mind imply that he's some sort of 10X programmer who could revolutionize random industries that he isn't familiar with. 3D graphics math isn't exactly difficult.
dumdedum123
Exactly. I know him and like him. He is a genius programmer for sure BUT people forget that the last successful product that he released was Doom 3 over 20 years ago. Armadillo was a failure and Oculus went nowhere.
He's also admitted he doesn't have much in the way of math chops, which you need if you want to make a dent in AI. (Although the same could have been said of 3D graphics when he did Wolfenstein and Doom, so perhaps he'll surprise us.)
I wish him well TBH
ramesh31
What has he shipped in the last 20 years? Oculus is one thing, but that was firmly within his wheelhouse of graphics optimization. Abrash and co. handled the hardware side of things.
Carmack is a genius no doubt. But genius is the result of intense focused practice above and beyond anyone else in a particular area. Trying to extend that to other domains has been the downfall of so many others like him.
alexey-salmin
Ever since Romero departed, id Software has shipped *checks notes* Quake II, Quake III, Doom 3 and Quake 4.
Funnily enough Romero himself didn't ship much either. IMO it's one of the most iconic "duo breakups". The whole is greater than the sum of the parts.
cmpxchg8b
Appeal to authority is a logical fallacy. People often fall into the trap of thinking that because they are highly intelligent and an expert in one domain that this makes them an expert in one or more other domains. You see this all the time.
mrandish
> People often fall into the trap of thinking that because they are highly intelligent and an expert in one domain that this makes them an expert in one or more other domains.
While this is certainly true, I'm not aware of any evidence that Carmack thinks this way about himself. I think he's been successful enough that he's personally 'post-economic' and is choosing to spend his time working on unsolved hard problems he thinks are extremely interesting and potentially tractable. In fact, he's actively sought out domain experts to work with him and accelerate his learning.
rurp
Bayesian reasoning isn't a fallacy. A known expert in one domain is often correct about things in a related one. The post didn't claim that Carmack is right, just that he's who they would bet on to be right, which seems perfectly reasonable to me.
edanm
Expecting an expert in one thing to also be pretty good at other domains, especially when they're relatively related, isn't a fallacy.
speed_spread
I suspect Carmack in the Dancehall with the BFG.
zeroq
>> "they will learn the same lesson I did"
Which is what? Don't trust Altman? x)
cmiles74
From a marketing perspective, this strikes me as a very predictable response.
kamranjon
I was really excited when I heard Carmack was focusing on AI and am really looking forward to watching this when the video is up - but just from looking at the slides it seems like he tried to build a system that can play the Atari? Seems like a fun project, but curious what will come out of it or if there is an associated paper being released.
johnb231
Atari games are widely used in Reinforcement Learning (RL) research as a standard benchmark.
https://github.com/Farama-Foundation/Arcade-Learning-Environ...
The goal is to develop algorithms that generalize to other tasks.
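For anyone who hasn't used the benchmark, the standard interaction loop looks roughly like this (a sketch assuming `gymnasium` and `ale-py` are installed; exact packaging and ROM setup vary by version):

    # Minimal random-agent loop on an Atari benchmark environment.
    import gymnasium as gym
    import ale_py  # makes the ALE/* environments available (details vary by version)

    env = gym.make("ALE/Breakout-v5")
    obs, info = env.reset(seed=0)
    total_reward = 0.0
    for _ in range(1000):
        action = env.action_space.sample()   # a learned policy would go here
        obs, reward, terminated, truncated, info = env.step(action)
        total_reward += reward
        if terminated or truncated:
            obs, info = env.reset()
    env.close()
    print("random-agent score over 1000 frames:", total_reward)

A learned policy replaces the random sampling; the research question is how few frames it needs before the score is respectable.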
sigmoid10
They were highly used. OpenAI even included them in their RL Gym library back in the old days when they were still doing open research. But if you look at this leaderboard from 7 (yes, seven!) years ago [1], most of them were already solved way beyond human capabilities. But we didn't get a really useful general purpose algorithm out of it.
As an AI researcher, I always considered Atari a fun academic exercise, but nothing more. Similar to how recognising characters using convnets was cool in the nineties and early 00s, but didn't give us general purpose image understanding. Only modern GPUs and massive training datasets did.
Nowadays most cutting-edge RL game research focuses on much more advanced games like Minecraft, which is thought to be better suited. But I'm pretty sure it's still not enough. Even role-playing GTA VI won't be. We probably need a pretty advanced physical simulation of the real world before we can get agents to handle the real world. But that means solving the problem of generating such an environment first, because you can't train on the actual real world due to the sample inefficiency of all current algorithms. Nvidia is doing some really interesting research in this direction by combining physics simulation and image generation models to simulate an environment, while getting accuracy and diversity at the same time into training data. But it still feels like some key ingredient is missing.
mschuster91
> But it still feels like some key ingredient is missing.
Continuous training is the key ingredient. Humans can use existing knowledge and apply it to new scenarios, and so can most AI. But AI cannot permanently remember the result of its actions in the real world, and so its body of knowledge cannot expand.
Take a toddler and an oven. The toddler has no concept of what an oven is other than maybe that it smells nice. The toddler will touch the oven, notice that it experiences pain (because the oven is hot) and learn that oven = danger. Place a current AI in a droid toddler body? It will never learn and keep touching the oven as soon as the information of "oven = danger" is out of the context window.
For some cases this inability to learn is actually desirable. You don't want anyone and everyone to be able to train ChatGPT unsupervised, otherwise you get 4chan flooding it with offensive crap like they did to Tay [1], but for AI that physically interacts with the meatspace, constant evaluation and learning is all but mandatory if it is to safely interact with its surroundings. "Dumb" robots run regular calibration cycles for their limbs to make sure they are still aligned to compensate for random deviations, and so will AI robots.
gregdeon
I watched the talk live. I felt that his main argument was that Atari _looks_ solved, but there's still plenty of value that could be gained by revisiting these "solved" games. For one, learning how to play games through a physical interface is a way to start engaging with the kinds of problems that make robotics hard (e.g., latency). They're also a good environment to study catastrophic forgetting: an hour of training on one game shouldn't erase a model's ability to play other games.
I think we could eventually saturate Atari, but for now it looks like it's still a good source of problems that are just out of reach of current methods.
Buttons840
My impression is that Atari was 80% solved, and then researchers and companies moved on.
A company solves self-driving 80% of the way and makes a lot of VC cash along the way. Then they solve intelligent chatbots 80% of the way and make a lot of VC cash along the way. Now they're working on solving humanoid robotics 80% of the way... I wonder why?
In the end, we have technology that can do some neat tricks, but can't be relied upon.
There are probably still some very hard problems in certain Atari games. Only the brave dare tackle these problems, because failure comes sharp and fast. Whereas, throwing more compute at a bigger LLM might not really accomplish anything, but we can make people think it accomplished something, and thus failure is not really possible.
newsclues
Being highly used in the past is good, it's a benchmark to compare against.
tschillaci
You will find many agents that solved (e.g., finished, reached high score) atari games, but there is still so much more work to do in the field. I wrote my Master's thesis on how to learn from few interactions with the game, so that if the algorithm is ported to actual robots they don't need to walk and fall for centuries before learning behaviors. I think there is more research to do on higher levels of generalization: when you know how to play a few video games, you quickly understand how to play a new one intuitively, and I haven't seen thorough research on that.
lo0dot0
I can tell you right now without any research that video game designers reuse interface patterns and game mechanics that were already known when making new games. Those patterns and mechanics are also often analogies for real life allowing humans to intuitively play the games. If people can't play your game intuitively, they might say it's a bad game.
Jensson
So why can't AI learn those and reapply the same understanding to new games?
albertzeyer
His goal was not just to solve Atari games. That was already done.
His goal is to develop generic methods. So you could work with more complex games or the physical world for that, as that is what you want in the end. However, his insight is that you can even modify the Atari setting to test this, e.g. to work in realtime, and the added complexity of more complex games doesn't really give you any new insights at this point.
mike_hearn
But how is this different to what NVIDIA have already done? They have robots that can achieve arbitrary and fluid actions in the real world by training NNs in very accurate GPU simulated environments using physics engines. Moving a little Atari stick around seems like not much compared to sorting through your groceries etc.
The approach NVIDIA are using (and other labs) clearly works. It's not going to be more than a year or two now before robotics is as solved as NLP and chatbots are today.
albertzeyer
I think he argues that they would not be able to play Atari games this way (I don't know; maybe I also misunderstood).
But also, he argues a lot about sample efficiency. He wants to develop algorithms/methods/models which can learn much faster / with much fewer data.
modeless
He says they will open source it which is cool. I agree that I don't understand what's novel here. Playing with a physical controller and camera on a laptop GPU in real time is cool, and maybe that hasn't specifically been done before, but it doesn't seem surprising that it is possible.
If it is substantially more sample efficient, or generalizable, than prior work then that would be exciting. But I'm not sure if it is?
RetroTechie
Maybe that's exactly his goal: not to come up with something that beats the competition, but play with the architecture, get a feel for what works & what doesn't, how various aspects affect the output, and improve on that. Design more efficient architectures, or come up with something that has unique features compared to other models.
If so, scaling up may be more of a distraction rather than helpful (besides wasting resources).
I hope he succeeds in whatever he's aiming for.
gadders
I want smarter NPCs in games.
cryptoz
DeepMind’s original demos were also of Atari gameplay.
moralestapia
Here's what they built, https://x.com/ID_AA_Carmack/status/1925243539543265286
Quite exciting. Without diminishing the amazing value of LLMs, I don't think that path goes all the way to AGI. No idea if Carmack has the answer, but some good things will come out of that small research group, for sure.
petters
Isn't that what Deepmind did 12 years ago?
hombre_fatal
He points that out in his notes and says DeepMind needed specialized training and/or 200M frames of training just to kinda play one game.
tsunamifury
What deepmind accomplished with suicidal Mario was so much more than you probably ever will know from outside the company.
moralestapia
IIRC Deepmind (and OpenAI and ...) have done this on software-only setups (emulators, TAS, etc); while this one has live input and actuators in the loop, so, kind of the same thing but operating in the physical realm.
I do agree that it is not particularly groundbreaking, but it's a nice "hey, here's our first update".
willvarfar
Playing Atari games makes it easy to benchmark and compare and contrast his future research with Deepmind and more recent efforts.
koolala
I wish he did this with a VR environment instead, like they mention at the start of the slides. A VR environment with a JPEG camera filter, physics sim, noise, robot simulation. If anyone could program that well it's him.
Using real life robots is going to be a huge bottleneck for training hours no matter what they do.
andy_ppp
I still don’t think we have a clear enough idea of what a concept is to be able to think about AGI. And then being able to use concepts from one area to translate into another area, what is the process by which the brain combines and abstracts ideas into something new?
throw310822
Known entities are recurring patterns (we give names to things that occur more than once, in the world or in our thoughts). Concepts are recurring thought patterns. Abstractions, relations, metaphors, are all ways of finding and transferring patterns from one domain to another.
andy_ppp
Sure, I understand what the terminology means, but I don't believe we get to AGI without some ability to translate the learning of, say, using a mouse to using a trackpad in a simple way. Humans make these translations all the time: you know how to use a new room and the items in it automatically, but I personally see that the systems we have built are currently very brittle when they see new environments, because they can't simplify everything to its fundamentals and then extrapolate back to more complex tasks. You could train a human on using an Android phone and give them an iPhone and they would do pretty well; if you did this with modern machine learning systems you would get an extremely high error rate. Or say you train a model on how to use a sword, I'm not convinced it would know how to use an axe or a pair of crutches as a weapon.
Maybe it will turn out to simply be enough artificial neurons and everything works. But I don't believe that.
steveBK123
Another thought experiment - if OpenAI AGI was right around the corner, why are they wasting time/money/energy buying a product-less vanity hardware startup run by Ive?
Why not tackle robotics if anything. Or really just be the best AGI and everyone will be knocking on your door to license it in their hardware/software stacks, you will print infinite money.
mindwok
AGI is not enough. Seriously, imagine if they had an AGI in their ChatGPT interface. It’s not enough to do anything truly meaningful. It’s like a genius in the woods somewhere. For AGI to have an impact it needs to be everywhere.
Jensson
> Seriously, imagine if they had an AGI in their ChatGPT interface. It’s not enough to do anything truly meaningful
If they had that people would make agents with it and then it can do tons of truly meaningful things.
People try to make agents with the current one but it's really difficult since it's not AGI.
steveBK123
Robotics to navigate the physical world seems more impactful than some pin/glasses product; a passive audio/visual interface to the chatbot doesn't seem so earth-shattering.
What would you do with a 10x or 100x smarter Siri/Alexa? I still don't see my life changing.
Give me a robot that can legitimately do household errands like the dishes, laundry, etc.. now we are talking.
whamlastxmas
I think a big problem with AGI discussion is sometimes people aren't being creative enough. A 100x smarter ChatGPT means biological immortality.
amelius
and build cities ...
joshstrange
Once AGI is accomplished I can’t imagine what else it would do but bootstrap itself up which, depending on compute, could scale quite far. OpenAI would only need to feed it compute for the most part.
I don’t think AGI is close, but once it happens it’s hard to imagine it not “escaping” (whenever we want to define that as).
j_timberlake
This line of thought doesn't work, because any company approaching AGI might be actively trying to hide that information from regulators and the military. Being the 1st AGI company is actually pretty risky.
foobiekr
However, their actual actions resemble those of a company that knows AGI isn't even on the horizon; moreover, they are acting exactly as if they believe the AI hype bubble is coming to an end and they need to dump the stuff onto the public markets ASAP.
There really isn’t any other way to interpret OpenAI’s actions for the last few months.
Sure it could all be a feint to hide their amazing progress. Or it could be what it looks like.
Given the hype cycles of the last 20 years, I’m going with the second.
j_timberlake
Name any other company acting like that besides OpenAI. Or any person besides Sam Altman, the guy who screwed up OpenAI's structure/board/funding. You just have a narrative you want to be true, and one company half fits into that narrative.
steveBK123
VCs are far too conditioned as hype men to hide the ball like that.
After generations of boastful over-promising, do you really believe THIS time they are underpromising?
cma
If Ive hype brings more investment valuation than it costs to acquire his company, plus the time costs etc., it's a win for current investors regardless of whether it is a good investment for the investors in the new round.
soared
Or have your AGI design products
steveBK123
All the more reason not to acquihire Ive for $6.5B, if true
tiahura
Does AGI necessarily mean super-genius? Was KITT AGI? I'm not sure he could design products?
steveBK123
Is VC really funding a trillion dollars of GPU purchases to replace labor that could instead be bid out to developing world mechanical turks for $1/hr?
trendoid
No, the term for that is ASI: artificial superintelligence. People in the AI community have different timelines for that than for AGI.
typon
They have a large robotics research group.
saejox
What Carmack is doing is right. More people need to get away from training their models just with words. AI need the physicality.
johnb231
> More people need to get away from training their models just with words.
They started doing that a couple of years ago. The frontier "language" models are natively multimodal, trained on audio, text, video, images. That is all in the same model, not separate models stitched together. The inputs are tokenized and mapped into a shared embedding space.
Gemini, GPT-4o, Grok 3, Claude 3, Llama 4. These are all multimodal, not just "language models".
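A toy sketch of what "shared embedding space" means in practice (illustrative only, with made-up sizes and layer choices; not any particular model's real architecture): each modality gets its own encoder, but they all emit vectors of the same width, and a single transformer attends over the concatenated sequence.

    # Toy "one shared embedding space" multimodal model -- purely illustrative.
    import torch
    import torch.nn as nn

    d_model = 512

    text_embed = nn.Embedding(50_000, d_model)        # text token ids -> vectors
    patch_embed = nn.Linear(16 * 16 * 3, d_model)     # flattened image patches -> vectors
    audio_embed = nn.Linear(400, d_model)             # audio frames -> vectors

    text_ids = torch.randint(0, 50_000, (1, 12))      # 12 text tokens
    patches = torch.randn(1, 64, 16 * 16 * 3)         # 64 image patches
    audio = torch.randn(1, 30, 400)                   # 30 audio frames

    # Every modality becomes a sequence of d_model-sized "tokens"...
    tokens = torch.cat(
        [text_embed(text_ids), patch_embed(patches), audio_embed(audio)], dim=1
    )

    # ...and one transformer layer attends over the combined sequence.
    layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=8, batch_first=True)
    print(layer(tokens).shape)   # torch.Size([1, 106, 512])
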
timmg
(If you know) how does that work?
Are the audio/video/images tokenized the same way as text and then fed in as a stream? Or is the training objective different than "predict next token"?
If the former, do you think there are limitations to "stream of tokens"? Or is that essentially how humans work? (Like I think of our input as many-dimensional. But maybe it is compressed to a stream of tokens in part of our perception layer.)
johnb231
Ask Gemini to explain how it was trained
NL807
>AI need the physicality.
which I found interesting, because I remember Carmack saying simulated environments are the way forward and physical environments are too impractical for developing AI
SeanaldMcDnld
Yeah in that way this demo seemed gimmicky like he acknowledged. He said in the past he would almost count people out if they weren’t training RL in a virtual environment. I agree, still happy he’s staying on the path of online continual learning though
programd
Nvidia seems to think the same thing. Here's Jim Fan talking about a "physical Turing test" and how embodied AI is the way forward.
https://www.youtube.com/watch?v=_2NijXqBESI
He also talks about needing large amounts of compute to run the virtual environments where you'll be training embodied AI. Very much worth watching.
pyb
"... Since I am new to the research community, I made an effort" This means they've probably submitted a paper too.
epolanski
It states it's a research company, not a product company.
diggan
To be fair, OpenAI is also a "research lab" rather than "product company" and they still sell products for $200/month, not sure the distinction matters in practice much today as long as the entity is incorporated somehow.
pyb
That's what I said
ploden
Why would AGI choose to be embodied? We talk about creating a superior intelligence and having it drive our cars and clean our homes. The scenario in Dan Simmons' Hyperion seems much more plausible: we invent AGI and it disappears into the cloud and largely ignores us.
fusionadvocate
It doesn't need to be permanent. If humans could escape from their embodiment temporarily they would certainly do so. Being permanently bounded to a physical interface is definitely a disadvantage.
dusted
Anywhere we can watch the presentation? The slides alone are great, but if he's saying stuff alongside, I'd be interested in that too :)
It's always a treat to watch a Carmack lecture or read anything he writes, and his notes here are no exception. He writes as an engineer, for engineers, and documents all his thought processes and missteps in the exact detailed yet concise way you'd want from a colleague who was handing off some work.
One question I would have about the research direction is the emphasis on realtime. If I understand correctly he's doing online learning in realtime. Obviously makes for a cool demo and pulls on his optimisation background, and no doubt some great innovations will be required to make this work. But I guess the bitter lesson and recent history also tell us that some solutions may only emerge at compute levels beyond what is currently possible for realtime inference let alone learning. And the only example we have of entities solving Atari games is the human brain, of which we don't have a clear understanding of the compute capacity. In which case, why wouldn't it be better to focus purely on learning efficiency and relax the realtime requirement for now?
That's a genuine question by the way, definitely not an expert here and I'm sure there's a bunch of value to working within these constraints. I mean, jumping spiders solve reasonably complex problems with 100k neurons, so who knows.