Andrej Karpathy – It will take a decade to work through the issues with agents
607 comments
October 17, 2025
danielvaughn
I have a very surface level understanding of AI, and yet this always seemed obvious to me. It's almost a fundamental law of the universe that complexity of any kind has a long tail. So you can get AI to faithfully replicate 90% of a particular domain skill. That's phenomenal, and by itself can yield value for companies. But the journey from 90%-100% is going to be a very difficult march.
BolexNOLA
The last mile problem is inescapable!
ekjhgkejhgk
The interview with Rich Sutton that I watched recently left me with the impression that AGI is not just a matter of adding more 9s.
The interviewer had an idea that he took for granted: that to understand language you have to have a model of the world. LLMs seem to understand language, therefore they've trained a model of the world. Sutton rejected the premise immediately. He might be right to be skeptical here.
LarsDu88
This world model talk is interesting, and Yann LeCun has touched on the same topic, but the fact is there are video diffusion models that are quite good at representing the "video world", even generating temporally coherent, counterfactual representations of that "world" under different perturbations.
In fact you can go to a SOTA LLM today, and it will do quite well at predicting the outcomes of basic counterfactual scenarios.
Animal brains such as our own have evolved to compress information about our world to aid in survival. LLMs and recent diffusion/conditional flow matching models have been quite successful in compressing the "text world" and the "pixel world" to score good loss metrics on training data.
It's incredibly difficult to compress information without having at least some internal model of that information. Whether that model is a "world model" that fits the definition of folks like Sutton and LeCun is a semantic question.
dreambuffer
Photons hit a human eye and then the human came up with language to describe that and then encoded the language into the LLM. The LLM can capture some of this relationship, but the LLM is not sensing actual photons, nor experiencing actual light cone stimulation, nor generating thoughts. Its "world model" is several degrees removed from the real world.
So whatever fragment of a model it gains through learning to compress that causal chain of events does not mean much when it cannot generate the actual causal chain.
timschmidt
1000% this. I would only add that this has been demonstrated explicitly with chess: https://adamkarvonen.github.io/machine_learning/2024/01/03/c...
DrewADesign
I think current AI is a human language/behavior mirror. A cat might believe they see another cat looking in a mirror, but you can’t create a new cat by creating a perfect mirror.
tyre
There is some evidence from Anthropic that LLMs do model the world. This paper[0] tracing their "thought" is fascinating. Basically an LLM translating across languages will "light up" (to use a rough fMRI equivalent) for the same concepts (e.g. bigness) across languages.
It does have clusters of parameters that correlate with concepts, not just randomly "after X word tends to have Y word." Otherwise you would expect all of Chinese to be grouped in one place, all of French in another, all of English in another. This is empirically not the case.
I don't know whether to understand knowledge you have to have a model of the world, but at least as far as language, LLMs very much do seem to have modeling.
[0]: https://www.anthropic.com/research/tracing-thoughts-language...
manmal
> Basically an LLM translating across languages will "light up" (to use a rough fMRI equivalent) for the same concepts (e.g. bigness) across languages
I thought that’s the basic premise of how transformers work - they encode concepts into high dimensional space, and similar concepts will be clustered together. I don’t think it models the world, but just the texts it ingested. It’s observation and regurgitation, not understanding.
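A minimal sketch of how you could poke at that clustering claim yourself, assuming the sentence-transformers package and its public multilingual MiniLM checkpoint:

    # Compare embeddings of "big"/"small" words across English, French,
    # and German; same-concept words should land closer together.
    from sentence_transformers import SentenceTransformer
    import numpy as np

    model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")
    words = ["big", "grand", "gross", "small", "petit", "klein"]
    emb = model.encode(words)
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    print(np.round(emb @ emb.T, 2))  # cosine similarity matrix:
    # the big/grand/gross block should score higher with each other
    # than with the small/petit/klein block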
I do use agents a lot (soon on my second codex subscription), so I don’t think that’s a bad thing. But I’m firmly in the “they are useful tools” camp.
bravura
How large is a lion?
Learning the size of objects using pure text analysis requires significant gymnastics.
Vision demonstrates physical size more easily.
Multimodal learning is important. Full stop.
Purely textual learning is not sample efficient for world modeling and the optimization can get stuck in local optima that are easily escaped through multimodal evidence.
("How large are lions? inducing distributions over quantitative attributes", Elazar et al 2019)
vlovich123
If it was modeling the world you’d expect “give me a picture of a glass filled to the brim” to actually do that. Its inability to correctly and accurately combine concepts indicates it’s probably not building a model of the real world.
jhanschoo
Let's make this more concrete than talking about "understanding knowledge". Oftentimes I want to know something that cannot feasibly be arrived at by reasoning, only empirically. Remaining within the language domain, LLMs get much more useful when they can search the web for news, or search your codebase to learn how it is organized. Similarly, you need a robot that can interact with the world and reason from newly collected empirical data in order to answer these empirical questions, if the work hasn't already been done.
overfeed
> Basically an LLM translating across languages will "light up" for the same concepts across languages
Which is exactly what they are trained to do. Translation models wouldn't be functional if they were unable to correlate an input to specific outputs. That some hidden-layer neurons fire for the same concept shouldn't come as a surprise, and is a basic feature required for the core functionality.
SR2Z
Right, but modeling the structure of language is a question of modeling word order and binding affinities. It's the Chinese Room thought experiment - can you get away with a form of "understanding" which is fundamentally incomplete but still produces reasonable outputs?
Language in itself attempts to model the world and the processes by which it changes. Knowing which parts-of-speech about sunrises appear together and where is not the same as understanding a sunrise - but you could make a very good case, for example, that understanding the same thing in poetry gets an LLM much closer.
godelski
> that to understand knowledge you have to have a model of the world.
You have a small but important mistake. It's to recite (or even apply) knowledge. To understand does actually require a world model. Think of it this way: can you pass a test without understanding the test material? Certainly we all saw people we thought were idiots do well in class while we've also seen people we thought were geniuses fail. The test and understanding usually correlate, but it's not perfect, right?
The reason I say understanding requires a world model (and I would not say LLMs understand) is because to understand you have to be able to detail things. Look at physics, or the far more detail oriented math. Physicists don't conclude things just off of experimental results. It's an important part, but not the whole story. They also write equations, ones which are counterfactual. You can call this compression if you want (I would and do), but it's only that because of the generalization. But it also only has that power because of the details and nuance.
With AI many of these people have been screaming for years (check my history) that what we're doing won't get us all the way there. Not because we wanted to stop progress, but because we wanted to ensure continued and accelerated progress. We knew the limits and were saying "let's try to get ahead of this problem" but were told "that'll never be a problem. And if it is, we'll deal with it when we deal with it." It's why Chollet made the claim that LLMs have actually held AI progress back. Because the story that was sold was "AGI is solved, we just need to scale" (i.e. more money). I do still wonder how different things would be if those of us pushing back had been able to continue and scale our work (research isn't free, so yes, people did stop us). We always had the math to show that scale wasn't enough, but it's easy to say "you don't need math" when you can see progress. The math never said no progress nor no acceleration; the math said there's a wall, and it's easier to adjust now than when we're closer and moving faster. Sadly I don't think we'll ever shift the money over. We still evaluate success weirdly. Successful predictions don't matter. You're still heralded if you made a lot of money in VR and Bitcoin, right?
robotresearcher
In my view 'understand' is a folk psychology term that does not have a technical meaning. Like 'intelligent', 'beautiful', and 'interesting'. It usefully labels a basket of behaviors we see in others, and that is all it does.
In this view, if a machine performs a task as well as a human, it understands it exactly as much as a human. There's no problem of how to do understanding, only how to do tasks. The 'problem' melts away when you take this stance.
Just my opinion, but my professional opinion from thirty-plus years in AI.
JKCalhoun
I'm not sure. There's a view that, as I understand it, suggests that language is intelligence. That language is a requirement for understanding.
An example might be kind of the contrary—that you might not be able to hold an idea in your head until it has been named. For myself, until I heard the word gestalt (maybe a fitting example?) I am not sure I could have understood the concept. But when it was described it started to coalesce—and then when named, it became real. (If that makes sense.)
FWIW, Zeitgeist is another one of those concepts/words for me. I guess I have to thank the German language.
Perhaps it is why other animals on this planet seem to us lacking intelligence. Perhaps it is their lack of complex language holding their minds back.
nebula8804
Only problem is this time enough money is being burned that if AGI does not come, it will probably be extremely painful/fatal for a lot of people that had nothing to do with this field or the decisions being made. What will be the consequences if that comes to pass? So many lives were permanently ruined due to the GFC.
munksbeer
> We always had the math to show that scale wasn't enough
Math, to show that scale (presumably of LLMs) wasn't enough for AGI?
This sounds like it would be quite a big deal, what math is that?
naasking
> It's to recite (or even apply) knowledge. To understand does actually require a world model.
This is a shell game, or a god of the gaps. All you're saying is that the models "understand" how to recite or apply knowledge or language, but somehow don't understand knowledge or language. Well what else is there really?
bentt
I think this is a useful challenge to our normal way of thinking.
At the same time, "the world" exists only in our imagination (per our brain). Therefore, if LLMs need a model of a world, and they're trained on the corpus of human knowledge (which passed through our brains), then what's the difference, especially when LLMs are going back into our brains anyway?
qlm
Language isn't thought. It's a representation of thought.
imtringued
Model based reinforcement learning is a thing and it is kind of a crazy idea. Look up temporal difference model predictive control.
The fundamental idea behind temporal difference is that you can record any observable data stream over time and predict the difference between past and present based on your decision variables (e.g. camera movement, actuator movement, and so on). Think of it like the Minecraft clone called Oasis AI. The AI predicts the response to a user provided action.
Now imagine if it worked as presented. The data problem would be solved, because you are receiving a constant stream of data every single second. If anything, the RL algorithms are nowhere near where they need to be and continual learning has not been solved yet, but the best known way is through automatic continual learning ala Schmidhuber (co-inventor of LSTMs along with Hochreiter).
So, model based control is solved right? Everything that can be observed can be controlled once you have a model!
Wrong. Unfortunately. You still need the rest of reinforcement learning: an objective and a way to integrate the model. It turns out that reconstructing the observations is too computationally challenging and the standard computational tricks like U-Nets learn a latent representation that is optimized for reconstruction rather than for your RL objectives. There is a data exchange problem that can only realistically be solved by throwing an even bigger model at it, but here is why that won't work either:
Model predictive control tries to find the best trajectory over a receding horizon. It is inherently future oriented. This means that you need to optimize through your big model and that is expensive to do.
So you're going to have to take shortcuts by optimizing for a specific task. You reduce the dimension of the latent space and stop reconstructing the observations. The price? You are now learning a latent space for your particular task, which is less demanding. The dream of continual learning with infinite data shatters and you are brought down to earth: it's better than what came before, but not that much better.
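For concreteness, here's a toy sketch of the receding-horizon idea described above: random-shooting MPC over a learned latent model, with a TD-style value function bootstrapping the return beyond the horizon. All three "learned" functions are numpy stand-ins, not an actual TD-MPC implementation:

    import numpy as np

    rng = np.random.default_rng(0)

    def dynamics(z, a):   # stand-in for a learned latent dynamics model
        return 0.9 * z + a

    def reward(z, a):     # stand-in for a learned reward head
        return -(z ** 2) - 0.01 * a ** 2

    def value(z):         # stand-in for a TD-trained value function
        return -(z ** 2)

    def plan(z0, horizon=5, candidates=256, gamma=0.99):
        # Score random action sequences; return the first action of the best.
        actions = rng.uniform(-1, 1, size=(candidates, horizon))
        z = np.full(candidates, z0, dtype=float)
        returns, discount = np.zeros(candidates), 1.0
        for t in range(horizon):
            returns += discount * reward(z, actions[:, t])
            z = dynamics(z, actions[:, t])
            discount *= gamma
        returns += discount * value(z)  # bootstrap past the horizon
        return actions[np.argmax(returns), 0]

    z = 2.0
    for _ in range(20):
        z = dynamics(z, plan(z))  # re-plan every step: receding horizon
    print(f"final latent state: {z:.3f}")  # driven toward 0

The expensive part described above is exactly the plan() call: it has to run the big model horizon * candidates times on every single control step.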
sysguest
yeah that "model of the world" would mean:
babies are already born with "the model of the world"
but a lot of experiments on babies/young kids tell otherwise
ekjhgkejhgk
> yeah that "model of the world" would mean: babies are already born with "the model of the world"
No, not necessarily. Babies don't interact with the world only by reading what people wrote on Wikipedia and Stack Overflow, the way these models are trained. Babies do things to the world and observe what happens.
I imagine it's similar to the difference between a person sitting on a bicycle and trying to ride it, vs a person watching videos of people riding bicycles.
I think it would actually be a great experiment. If you take a person who has never ridden a bicycle in their life and feed them videos of people riding bicycles, and literature about bikes, fiction and non-fiction, at some point I'm sure they'll be able to talk about it like they have huge experience in riding bikes, but they won't be able to ride one.
godelski
It's a lot more complicated than that.
You have instincts, right? Innate fears? This is definitely something passed down through genetics. The Hawk/Goose Effect isn't just limited to baby chickens. Certainly some mental encoding passes down through genetics, given how much the brain controls, down to your breathing and heartbeat.
But instinct is basic. It's something humans are even able to override. It's a first order approximation. Inaccurate for doing meaningfully complex things, but sufficient to keep you alive. Maybe we don't want to call instinct a world model (it certainly is naïve), but it can't be discounted either.
In human development, yeah, the lion's share of it happens post birth. Human babies don't even show typical signs of consciousness until around the age of 2. There are many different categories of "awareness" and these certainly grow over time. But the big thing that makes humans so intelligent is that we continue to grow and learn through our whole lifetimes. And we can pass that information along without genetics, and we have very advanced tools to do this.
It is a combination of nature and nurture. But do note that this happens differently in different animals. It's wonderfully complex. LLMs are quite incredible but so too are many other non-thinking machines. I don't think we should throw them out, but we never needed to make the jump to intelligence. Certainly not so quickly. I mean what did Carl Sagan say?
ben_w
> babies are already born with "the model of the world"
> but a lot of experiments on babies/young kids tell otherwise
I believe they are born with such a model? It's just that model is one where mummy still has fur for the baby to cling on to? And where aged something like 5 to 8 it's somehow useful for us to build small enclosures to hide in, leading to a display of pillow forts in the modern world?
rwj
Lots of experiments show that babies develop important capabilities at roughly the same times. That speaks to inherited abilities.
DanHulton
The thing about this, though - cars have been built before. We understand what's necessary to get those 9s. I'm sure there were some new problems that had to be solved along the way, but fundamentally, "build good car" is known to be achievable, so the process of "adding 9s" there makes sense.
But this method of AI is still pretty new, and we don't know its upper limits. It may be that there are no more 9s to add, or that any more 9s cost prohibitively more. We might be effectively stuck at 91.25626726...% forever.
Not to be a doomer, but I DO think that anyone who is significantly invested in AI really has to have a plan in case that ends up being true. We can't just keep on saying "they'll get there some day" and acting as if it's true. (I mean you can, just not without consequences.)
danielmarkbruce
Another way to look at it is the self-driving car - they got there eventually.
If you work on applications using LLMs you can pretty easily see that Karpathy's sentiment is correct. You see it because you do it. Even simple applications are shaped like this; each 9 may take less time for a simple app than for self-driving cars, but the shape still feels about right.
fair_enough
Reminds me of a time-honored aphorism in running:
A marathon consists of two halves: the first 20 miles, and then the last 10k (6.2mi) when you're more sore and tired than you've ever been in your life.
jakeydus
This is 100% unrelated to the original article, but I feel like there's an underreported additional first half. As a bigger runner who still loves to run, the first two or three miles, before I have enough endorphins to get into the zen state that makes me love running, are their own first half; then it's 17 miles of this amazing meditative mindset. Then the last 10k sucks.
awesome_dude
Just, ftr, endorphins cannot pass the blood-brain barrier
http://hopkinsmedicine.org/health/wellness-and-prevention/th...
rootusrootus
I suspect that is true for many difficult physical goals.
My dad told me that the first time you climb a mountain, there will likely be a moment not too distant from the top when you would be willing to just sit down and never move again, even at the risk to your own life. Even as you can see the goal not far away.
He also said that it was a dangerous enough situation that as a climb leader he'd start kicking you if he had to, if you sat down like that and refused to keep climbing. I'm not a climber myself, though, so this is hearsay, and my dad is long dead and unable to remind me of what details I've forgotten.
tylerflick
I think I hated life most after 20 miles. Especially in training.
sarchertech
Why not just run 20 miles then?
rootusrootus
Because then it wouldn't be a challenge and nobody would care about the achievement.
nextworddev
because that'd be quitting the race with 6.2 miles left to go
omidsa1
I also quite like the way he puts it. However, from a certain point onward, the AI itself will contribute to the development—adding nines—and that’s the key difference between this analogy of nines in other systems (including earlier domain‑specific ML ones) and the path to AGI. That's why we can expect fast acceleration to take off within two years.
breuleux
I don't think we can be confident that this is how it works. It may very well be that our level of intelligence has a hard limit to how many nines we can add, and AGI just pushes the limit further, but doesn't make it faster per se.
It may also be that we're looking at this the wrong way altogether. If you compare the natural world with what humans have achieved, for instance, both things are qualitatively different, they have basically nothing to do with each other. Humanity isn't "adding nines" to what Nature was doing, we're just doing our own thing. Likewise, whatever "nines" AGI may be singularly good at adding may be in directions that are orthogonal to everything we've been doing.
Progress doesn't really go forward. It goes sideways.
bamboozled
It's also assuming that all advances in AI just lead to cold hard gains. People have suggested this before, but would a sentient AI get caught up in philosophical, silly, or religious ideas? Silicon Valley investor types seem to hope it's all just curing diseases they can profit from, but it might also be, "let's compose some music instead"?
adventured
Adding nines to nature is exactly what humans are doing. We are nature. We are part of the natural order.
Anything that exists is part of nature, there can be no exceptions.
If I go burn a forest down on purpose, that is in fact nature doing it. No different than if a dolphin kills another animal for fun or a chimp kills another chimp over a bit of territory. Insects are also every bit as 'vicious' in their conquests.
j45
The intuition of someone who has put in a decade or two of wondering openly can't be discounted as easily as that of someone who might be a beginner to it.
AGI to encompass all of humanity's knowledge in one source and beat every human on every front might be a decade away.
Individual agents with increased agency adequately covering more and more abilities consistently? Seems like a steady path that can be seen into the horizon to put one foot in front of the other.
For me, the grain of salt I'd take Karpathy with is much, much smaller than average, only because he tries to share how he thinks, examines his own understanding, and changes it.
His ability to explain complex things simply helps me learn and understand things quicker, and to check whether I arrive at something similar or different, rather than assuming anything is wrong, or right, without my own understanding being present.
techblueberry
I think the 9's include this assumption.
Yoric
It's a possibility, but far from certainty.
If you look at it differently, assembly language may have been one nine, compilers may have been the next nine, successive generations of language until ${your favorite language} one more nine, and yet, they didn't get us noticeably closer to AGI.
rpcope1
> However, from a certain point onward, the AI itself will contribute to the development—adding nines—and that’s the key difference between this analogy of nines in other systems (including earlier domain‑specific ML ones) and the path to AGI.
There's a massive planet-sized CITATION NEEDED here, otherwise that's weapons grade copium.
aughtdev
I doubt this. General intelligence will be a step change not a gentle ramp. If we get to an architecture intelligent enough to meaningfully contribute to AI development, we'll have already made it. It'll simply be a matter of scale. There's no 99% AGI that can help build 100% AGI but for some reason can't drive a car or cook a meal or work an office job.
AnimalMuppet
Isn't that one of the measures of when it becomes an AGI? So that doesn't help you with however many nines we are away from getting an AGI.
Even if you don't like that definition, you still have the question of how many nines we are away from having an AI that can contribute to its own development.
I don't think you know the answer to that. And therefore I think your "fast acceleration within two years" is unsupported, just wishful thinking. If you've got actual evidence, I would like to hear it.
ben_w
AI has been helping with the development of AI ever since at least the first optimising compiler or formal logic circuit verification program.
Machine learning has been helping with the development of machine learning ever since hyper-parameter optimisers became a thing.
Transformers have been helping with the development of transformer models… I don't know exactly, but it was before ChatGPT came out.
None of the initials in AGI are booleans.
But I do agree that:
> "fast acceleration within two years" is unsupported, just wishful thinking
Nobody has any strong evidence of how close "it" is, or even a really good shared model of what "it" even is.
scragz
AGI is when it is general. A narrow AI trained only on coding and training AIs would contribute to the acceleration without being AGI itself.
godelski
It's a good way to think about lots of things. It's Pareto efficiency. The 80/20 rule.
20% of your effort gets you 80% of the way. But most of your time is spent getting that last 20%. People often don't realize that this is fractal-like in nature, as it draws from a power-law distribution. So for the 20% you still have left, the same holds true: another 20% of your time (20% * 80% = 16% -> 36% cumulative) gets you 80% of what remains (80% * 20% => 96%), again and again. The 80/20 numbers aren't actually realistic (or constant) but it's a decent guide.
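A few lines of arithmetic make the compounding concrete (toy numbers matching the 16%/36% and 96% figures above):

    # Each round: spend 20% of the remaining effort budget,
    # capture 80% of the remaining coverage gap.
    effort, coverage = 0.0, 0.0
    for _ in range(4):
        effort += 0.20 * (1 - effort)
        coverage += 0.80 * (1 - coverage)
        print(f"effort ~{effort:.1%}, coverage ~{coverage:.2%}")
    # -> 20.0%/80%, 36.0%/96%, 48.8%/99.2%, 59.0%/99.84%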
It's also something tech has been struggling with lately. Move fast and break things is a great way to get most of the way there. But you also left a wake of destruction and tabled a million little things along the way. Someone needs to go back and clean things up. Someone needs to revisit those tabled things. While each thing might be little, we solve big problems by breaking them down into little ones. So each big problem is a sum of many little ones, meaning they shouldn't be quickly dismissed. And like the 9's analogy, 99.9% of the time is still 9hrs of downtime a year. It is still 1e6 cases out of 1e9. A million cases is not a small problem. Scale is great and has made our field amazing, but it is a double edged sword.
I think it's also something people struggle with. It's very easy to become above average, or even well above average, at something. Just trying will often get you above average. It can make you feel like you know way more, but the trap is that while in some domains above average is not far from mastery, in other domains above average is closer to no skill than it is to mastery. Like how having $100m puts your wealth closer to a homeless person's than a billionaire's. At $100m you feel way closer to the billionaire because you're much further up than the person with nothing, but the curve is exponential.
010101010101
https://youtu.be/bpiu8UtQ-6E?si=ogmfFPbmLICoMvr3
"I'm closer to LeBron than you are to me."
sdenton4
Ha, I often speak of doing the first 90% of the work, and then moving on to the following 90% of the work...
JimDabell
> The first 90 percent of the code accounts for the first 90 percent of the development time. The remaining 10 percent of the code accounts for the other 90 percent of the development time.
— Tom Cargill, Bell Labs (September 1985)
somanyphotons
This is an amazing quote that really applies to all software development
zeroonetwothree
Well, maybe not all. I’ve definitely built CRUD UIs that were linear in effort. But certainly anything technically challenging or novel.
Veserv
Drawn from Karpathy killing a bunch of people by knowingly delivering defective autonomous driving software instead of applying basic engineering ethics and refusing to deploy the dangerous product he was in charge of.
simonw
It looks like Andrej's definition of "agent" here is an entity that can replace a human employee entirely - from the first few minutes of the conversation:
When you’re talking about an agent, or what the labs have in mind and maybe what I have in mind as well, you should think of it almost like an employee or an intern that you would hire to work with you. For example, you work with some employees here. When would you prefer to have an agent like Claude or Codex do that work?
Currently, of course they can’t. What would it take for them to be able to do that? Why don’t you do it today? The reason you don’t do it today is because they just don’t work. They don’t have enough intelligence, they’re not multimodal enough, they can’t do computer use and all this stuff.
They don’t do a lot of the things you’ve alluded to earlier. They don’t have continual learning. You can’t just tell them something and they’ll remember it. They’re cognitively lacking and it’s just not working. It will take about a decade to work through all of those issues.
eddiewithzato
Because that's the definition that is leading to all these investments, the promise that very soon they will reach it. If Altman said plainly that LLMs will never reach that stage, there would be a lot less investment into the industry.
aik
Hard disagree. You don't need AGI to transform countless workflows within companies; current LLMs can do it. A lot of the current investment is to help meet demand for current-generation LLMs (and the use cases we know will keep opening up with incremental improvements). Are you aware of how intensely all the main companies that host leading models (Azure, AWS, etc.) are throttling usage due to not enough data center capacity? (E.g. at my company we have 100x more demand than we can get capacity for, and we're barely getting started. We have a roadmap with 1000x+ the current demand and we're a relatively small company.)
AGI would be more impactful of course, and some use cases aren’t possible until we have it, but that doesn’t diminish the value of current AI.
kllrnohj
> Eg. At my company we have 100x more demand than we can get capacity for, and we’re barely getting started. We have a roadmap with 1000x+ the current demand and we’re a relatively small company.
OpenAI's revenue is $13bn with 70% of that coming from people just spending $20/mo to talk to ChatGPT. Anthropic is projecting $9bn in revenue in 2025. For a nice cold splash of reality, fucking Arizona Iced Tea has $3bn in revenue (also, that's actual revenue, not ARR).
You might have 100x more demand than you can get capacity for, but if that 100x still puts you at a number that in absolute terms is small, it's not very impressive. Similarly if you're already not profitable and achieving 100x growth requires 1,000x in spend, that's also not a recipe for success. In fact it's a recipe for going bankrupt in a hurry.
Culonavirus
Oh look, people with skin in the AI game insist AI is not a massive bubble. More news at 11.
sarchertech
He’s not just talking about agents good enough to replace workers. He’s talking about whether agents are currently useful at all.
>Overall, the models are not there. I feel like the industry is making too big of a jump and is trying to pretend like this is amazing, and it’s not. It’s slop. They’re not coming to terms with it, and maybe they’re trying to fundraise or something like that. I’m not sure what’s going on, but we’re at this intermediate stage. The models are amazing. They still need a lot of work. For now, autocomplete is my sweet spot. But sometimes, for some types of code, I will go to an LLM agent.
>They kept trying to mess up the style. They’re way too over-defensive. They make all these try-catch statements. They keep trying to make a production code base, and I have a bunch of assumptions in my code, and it’s okay. I don’t need all this extra stuff in there. So I feel like they’re bloating the code base, bloating the complexity, they keep misunderstanding, they’re using deprecated APIs a bunch of times. It’s a total mess. It’s just not net useful. I can go in, I can clean it up, but it’s not net useful.
bbor
Quite telling -- thanks for the insightful comment as always, Simon. Didn't know that, even though I've been discussing this on and off all day on Reddit.
He's a smart man with well-reasoned arguments, but I think he's also a bit poisoned by working at such a huge org, with all the constraints that comes with. Like, this:
You can’t just tell them something and they’ll remember it.
It might take a decade to work through this issue if you just want to put a single LLM in a single computer and have it be a fully-fledged human, sure. And since he works at a company making some of the most advanced LLMs in the world, that perspective makes sense! But of course that's not how it's actually going to be (/already is). LLMs are a necessary part of AGI(/"agents") due to their ability to avoid the Frame Problem[1], but they're far from the only needed thing. We're pretty dang good at "remembering things" with computers already, and connecting that with LLM ensembles isn't going to take anywhere close to 10 years. Arguably, we're already doing it pretty darn well in unified systems[2] (see the toy sketch after the links below)...
If anyone's unfamiliar and finds my comment interesting, I highly recommend Minsky's work on the Society of Mind, which handled this topic definitively over 20 years ago. Namely:
A short summary of "Connectionism and Society of Mind" for laypeople at DARPA: https://apps.dtic.mil/sti/tr/pdf/ADA200313.pdf
A description of the book itself, available via Amazon in 48h or via PDF: https://en.wikipedia.org/wiki/Society_of_Mind
By far my favorite paper on the topic of connectionist+symbolist syncreticism, though a tad long: https://www.mit.edu/~dxh/marvin/web.media.mit.edu/~minsky/pa...
[1] https://plato.stanford.edu/entries/frame-problem/
[2] https://github.com/modelcontextprotocol/servers/tree/main/sr...
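To make the "remembering things" point concrete, here's the kind of toy external-memory shim that the memory server linked at [2] generalizes. The call_llm function is a hypothetical stand-in for any stateless chat-completion API, not a real library call:

    import json, pathlib

    MEMORY_FILE = pathlib.Path("memory.json")

    def call_llm(system: str, user: str) -> str:
        # hypothetical stand-in for a real chat API call
        return f"[model answers {user!r} given {system!r}]"

    def remember(fact: str) -> None:
        facts = json.loads(MEMORY_FILE.read_text()) if MEMORY_FILE.exists() else []
        facts.append(fact)
        MEMORY_FILE.write_text(json.dumps(facts))

    def ask(question: str) -> str:
        facts = json.loads(MEMORY_FILE.read_text()) if MEMORY_FILE.exists() else []
        system = "Known facts:\n" + "\n".join(f"- {f}" for f in facts)
        return call_llm(system, question)

    remember("the user prefers terse answers")
    print(ask("How should replies be formatted?"))

The model itself stays stateless; the "memory" is just storage and retrieval stitched around it, which is the point about this not needing a decade.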
spjt
The thing about AGI is that if it's even possible, it's not coming before the money runs out of the current AI hype cycle. At least we'll all be able to pick up a rack of secondhand H100's for a tenner and a pack of smokes to run uncensored diffusion models on in a couple years. The real devastation will be in the porn industry.
joshellington
To throw two pennies in the ocean of this comment section - I’d argue we still lack schematic-level understanding of what “intelligence” even is or how it works. Not to mention how it interfaces with “consciousness”, and their likely relation to each other. Which kinda invalidates a lot of predictions/discussions of “AGI” or even in general “AI”. How can one identify Artificial Intelligence/AGI without a modicum of understanding of what the hell intelligence even is.
qudat
The reason why it’s so hard to define intelligence or consciousness is because we are hopelessly biased with a datapoint of 1. We also apply this unjustified amount of mysticism around it.
__MatrixMan__
I don't think we can ever know that we are generally intelligent. We can be unsure, or we can meet something else which possesses a type of intelligence that we don't, and then we'll know that our intelligence is specific and not general.
So to make predictions about general intelligence is just crazy.
And yeah yeah I know that OpenAI defines it as the ability to do all economically relevant tasks, but that's an awful definition. Whoever came up with that one has had their imagination damaged by greed.
judahmeek
All intelligence is specific, as evidenced by the fact that a universal definition regarding the specifics of "common sense" doesn't exist.
Culonavirus
> I shall not today attempt further to define the kinds of material I understand to be embraced within that shorthand description ["<insert general intelligence buzzword>"], and perhaps I could never succeed in intelligibly doing so. But I know it when I see it, and the <insert llm> involved in this case is not that.
vannucci
This so much this. We don’t even have a good model for how invertebrate minds work or a good theory of mind. We can keep imitating understanding but it’s far from any actual intelligence.
tim333
I'm not sure we or evolution needed a theory of mind. Evolution stuck neurons together in various ways and fiddled with them till it worked, without a master plan, and the LLM guys seem to be doing something rather like that.
hax0ron3
If the transcript is accurate, Karpathy does not actually ever, in this interview, say that AGI is a decade away, or make any concrete claims about how far away AGI is. Patel's title is misleading.
dang
Hmm good point. I skimmed the transcript looking for an accurate, representative quote that we could use in the title above. I couldn't exactly find one (within HN's 80 char limit), so I cobbled together "It will take a decade to get agents to work", which is at least closer to what Karpathy actually said.
If anyone can suggest a more accurate and representative title, we can change it again.
Edit: I thought of using "For now, autocomplete is my sweet spot", which has the advantage of being an exact quote; but it's probably not clear enough.
Edit 2: I changed it to "It will take a decade to work through the issues with agents" because that's closer to the transcript.
Anybody have a better idea? Help the cause of accuracy out here!
tim333
He says re agents:
>They don't have enough intelligence, they're not multimodal enough, they can't do computer use and all this stuff. They don't do a lot of the things you've alluded to earlier. They don't have continual learning. You can't just tell them something and they'll remember it. They're cognitively lacking and it's just not working.
>It will take about a decade to work through all of those issues. (2:20)
bamboozled
Couldn't even be bothered to watch ~2 minutes of the interview before commenting.
whiplash451
Did the same with Sutton (LLMs are a dead end) when Sutton never said this in the conversation.
nextworddev
There's a lot of salt here
dang
Please don't cross into personal attack. It's not what this site is for, and destroys what it is for.
JCharante
Why does everyone have such short timelines to show progress? So what if it takes 50 years to develop, we’ll have AGI for the next million years
Zacharias030
With all due respect, what does it say about us that „famous researcher voices his speculative opinion“ is an instant top 1 on hackernews?
padolsey
I mean, HN is no stranger to cults of personality. Post a paulg essay that says something a teenager could write and it'll fly up to position 1.
tptacek
That the speculative opinions of famous researchers are a good starting point for curious conversation.
Groxx
nothing in particular. it has always been pretty common here - famous researchers are often famous due to making significant correct predictions or reliably clearly communicating things. they are generally worth listening to, unless you know better.
that's pretty much the whole reason experts are experts.
and sometimes you do know better than the famous ones! but for others, it's still a decent starting point compared to most.
keeda
Huh, I'm surprised that he goes from "No AI" to "AI autocomplete" to "Vibecoding / Agents" (which I assume means no human review per his original coinage of the term.) This seems to preclude the chat-oriented / pair-programming model which I find most effective. Or even the plan-spec-codegen-review approach, which IME works extremely well for straightforward CRUD apps.
Also they discuss the nanochat repo in the interview, which has become more famous for his tweet about him NOT vibe-coding it: https://www.dwarkesh.com/i/176425744/llm-cognitive-deficits
Things are more nuanced than what people have assumed, which seems to be "LLMs cannot handle novel code". The best I can summarize it as: he was doing rather non-standard things that confused the LLMs, which have been trained on vast amounts of very standard code and hence kept defaulting to those assumptions. Maybe a rough analogy is that he was trying to "code golf" this repo whereas the LLMs kept trying to write "enterprise" code, because that is overwhelmingly what they have been trained on.
I think this is where the chat-oriented / pair-programming or spec-driven model shines. Over multiple conversations (or from the spec), they can understand the context of what you're trying to do and generate what you really want. It seems Karpathy has not tried this approach (given his comments about "autocomplete being his sweet spot").
For instance, I'm working on some straightforward computer vision stuff, but it's complicated by the fact that I'm dealing with small, low-resolution images, which does not seem well-represented in the literature. Without that context, the suggestions any AI gives me are sub-optimal.
However, after mentioning it a few times, ChatGPT now "remembers" this in its context, and any suggestion it gives me during chat is automatically tailored for my use-case, which produces much better results.
Put another way (not an AI expert so I may be using the terms wrong), LLMs will default to mining the data distribution they've been trained on, but with sufficient context, they should be able to adapt their output to what you really want.
arthurofbabylon
Agency. If one studied the humanities they’d know how incredible a proposal “agentic” AI is. In the natural world, agency is a consequence of death: by dying, the feedback loop closes in a powerful way. The notion of casual agency (I’m thinking of Jensen Huang’s generative > agentic > robotic insistence) is bonkers. Some things are not easily speedrunned.
(I did listen to a sizable portion of this podcast while making risotto (stir stir stir), and the thought occurred to me: “am I becoming more stupid by listening to these pundits?” More generally, I feel like our internet content (and meta content (and meta meta content)) is getting absolutely too voluminous without the appropriate quality controls. Maybe we need more internet death.)
dist-epoch
Models die too - the less agentic ones are out-competed by the more agentic ones.
Every AI lab brags how "more agentic" their latest model is compared to the previous one and the competition, and everybody switches to the new model.
ngruhn
I don't agree but I did laugh
Imnimo
>What takes the long amount of time and the way to think about it is that it’s a march of nines. Every single nine is a constant amount of work. Every single nine is the same amount of work. When you get a demo and something works 90% of the time, that’s just the first nine. Then you need the second nine, a third nine, a fourth nine, a fifth nine. While I was at Tesla for five years or so, we went through maybe three nines or two nines. I don’t know what it is, but multiple nines of iteration. There are still more nines to go.
I think this is an important way of understanding AI progress. Capability improvements often look exponential on a particular fixed benchmark, but the difficulty of the next step up is also often exponential, and so you get net linear improvement with a wider perspective.
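A toy illustration (not from the interview) of why each nine matters even as the benchmark curve flattens:

    # 90% -> 99% -> 99.9%: each nine cuts failures 10x, but per Karpathy
    # each nine costs roughly the same amount of work, so progress is
    # linear in effort even when the score looks nearly saturated.
    for nines in range(1, 6):
        reliability = 1 - 10 ** -nines
        failures = 1_000_000 * 10 ** -nines
        print(f"{nines} nine(s): {reliability:.4%} -> "
              f"{failures:,.0f} failures per 1M tasks")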