
Explainer: What's r1 and everything else?

WesleyJohnson

Someone should author an ELI5 (or slightly older) guide to how LLMs, RL, Agents, CoT, etc., all work and what all these acronyms mean. And then add to it, daily or weekly, as new developments arise. I don't want to keep reading dozens of articles, white papers, tweets, etc., as new developments happen. I want to go back to the same knowledge base, that's authored by the same person (or people), that maintains a consistent reading and comprehension level, and builds on prior points.

It seems like the AI space is moving impossibly fast, and it's just ridiculously hard to keep up unless 1) you work in this space, or 2) you are very comfortable with the technology behind it, so you can jump in at any point and understand it.

esafak

Just ask an Internet-enabled LLM like You.com to do it. This is what they are good at. Wikipedia satisfies your repository requirement.

alotaluv

haha just have gpt operator do it

rahimnathwani

  Most important, R1 shut down some very complex ideas (like DPO & MCTS) and showed that the path forward is simple, basic RL.
This isn't quite true. R1 used a mix of RL and supervised fine-tuning. The data used for supervised fine-tuning may have been model-generated, but the paper implies it was human-curated: they kept only the 'correct' answers.
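
To make the "kept only the correct answers" step concrete, here's a minimal sketch of rejection-sampling-style curation; generate() and is_correct() are hypothetical stand-ins for illustration, not DeepSeek's actual pipeline:

  # Sketch: sample several candidate answers per prompt from the model and
  # keep only the ones a verifier marks correct; the survivors become SFT data.
  def curate_sft_data(prompts, generate, is_correct, samples_per_prompt=4):
      sft_examples = []
      for prompt in prompts:
          for _ in range(samples_per_prompt):
              answer = generate(prompt)          # model-generated candidate
              if is_correct(prompt, answer):     # keep only 'correct' answers
                  sft_examples.append({"prompt": prompt, "completion": answer})
      return sft_examples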

Havoc

>people re-creating R1 (some claim for $30)

R1 or the R1 finetunes? Not the same thing...

HF is busy recreating R1 itself, but that seems to be a pretty big endeavour, not a $30 thing

m3kw9

Maybe they mistake recreation for the cp command

littlestymaar

This is indeed a massive exaggeration, I'm pretty sure the $30 experiment is this one: https://threadreaderapp.com/thread/1882839370505621655.html (github: https://github.com/Jiayi-Pan/TinyZero).

And while it is true that this experiment shows you can reproduce the concept of direct reinforcement learning on an existing LLM, in a way that makes it develop reasoning the same way DeepSeek-R1 did, it is very far from a re-creation of R1!
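
For flavour, here's a rough sketch of the kind of rule-based reward (format plus answer correctness) this style of direct RL relies on; the tags and reward values are illustrative assumptions, not TinyZero's or DeepSeek's actual code:

  import re

  # Sketch of a rule-based reward: a small bonus for using the expected
  # <think>/<answer> format, a larger one for getting the final answer right.
  def reward(completion: str, ground_truth: str) -> float:
      score = 0.0
      if re.search(r"<think>.*?</think>", completion, re.DOTALL):
          score += 0.1                                  # format reward
      match = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
      if match and match.group(1).strip() == ground_truth.strip():
          score += 1.0                                  # accuracy reward
      return score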

whimsicalism

This is a pretty hype-laden/Twitter-laden article; I would not trust it to explain things to you

thih9

Sure. But perhaps some hype is justified? Here's what a senior research scientist from nvidia says:

> We are living in a timeline where a non-US company is keeping the original mission of OpenAI alive - truly open, frontier research that empowers all. It makes no sense. The most entertaining outcome is the most likely.

> DeepSeek-R1 not only open-sources a barrage of models but also spills all the training secrets. They are perhaps the first OSS project that shows major, sustained growth of an RL flywheel. (…)

https://x.com/DrJimFan/status/1881353126210687089

comrade1234

The benchmarks for the different models focus on math and coding accuracy. I have a use-case for a model where those two functions are completely irrelevant and I’m only interested in writing (chat, stories, etc). I guess you can’t really benchmark ‘concepts’ as easily as logic.

With distillation, can a model be made that strips out most of the math and coding stuff?

NitpickLawyer

> completely irrelevant and I’m only interested in writing (chat, stories, etc)

There's a person keeping track of a few writing prompts and the evolution of the quality of text with each new shiny model. They shared this link somewhere; I can't find the source, but I had it bookmarked for further reading. Have a look at it and see if it's something you'd like.

https://eqbench.com/results/creative-writing-v2/deepseek-ai_...

afro88

Here's a better link: https://eqbench.com/creative_writing.html

The R1 sample reads way better than anything else on the leaderboard to me. Quite a jump.

comrade1234

Those outputs are really good and come from deepseek-R1 (I assume the full version, not a distilled version).

R1 is quite large (685B params). I’m wondering if you can make a distilled R1 without the coding and math content. 7B works well for me locally. When I go up to 32B I seem to get worse results - I assume it’s just timing out in its think mode… I haven’t had time to really investigate though.

ben30

Yes, you can create a writing-focused model through distillation, but it's tricky. *Complete removal* of math/coding abilities is challenging because a language model's knowledge is interconnected - the logical thinking that helps solve equations also helps structure coherent stories.
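
For a rough idea of the mechanics, here's a minimal PyTorch-style sketch of a standard distillation loss over a writing-heavy corpus: the student matches the teacher's softened token distribution plus an ordinary cross-entropy term. Names and hyperparameters are illustrative, not any particular lab's recipe:

  import torch.nn.functional as F

  # Sketch: blend a KL term (match the teacher's softened distribution) with
  # ordinary next-token cross-entropy on the curated writing data.
  def distill_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
      kl = F.kl_div(
          F.log_softmax(student_logits / T, dim=-1),
          F.softmax(teacher_logits / T, dim=-1),
          reduction="batchmean",
      ) * (T * T)
      ce = F.cross_entropy(student_logits.view(-1, student_logits.size(-1)),
                           labels.view(-1))
      return alpha * kl + (1 - alpha) * ce

Filtering the distillation corpus toward writing prompts biases what the student retains, but the math/code ability won't disappear entirely, for the reason above.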

mercer

I understood that at least some of these big models (llama?) is basically bootstrapped with code. is there truth to that?

ben30

Yes, code is a key training component. Open-Llama explicitly used programming data as one of seven training components. However, newer models like Llama 3.1 405B have shifted to using synthetic data instead. Code helps develop structured reasoning patterns but isn't the sole foundation - models combine it with general web text, books, etc.

1123581321

Nice explainer. R1 hit sensational mainstream news, which has resulted in some confusion and alarm among family and friends. It's hard to succinctly explain that this doesn't mean China is destroying us, that Americans immediately started working with the breakthrough, that cost optimization is inevitable in computing, etc.

bikamonki

So, is AI already reasoning or not?

brookst

Depends on your definition of reasoning. Creating valid chains of thought? Yes. Sentient? No.

ozten

Yes. The ARC-AGI benchmark was supposed to last years and is already saturated. The authors are currently creating the second version.

fullstackchris

So the conclusion is AI is about to "increase in abilities at an exponential rate", with the only data point being that R1 successfully achieved o1 levels as an open source model? In other words, two extremely unrelated themes?

Does this guy know people were writing verbatim the same thing in like... 2021? It's still always incredible to me how the same repeated hype rises to the surface over and over. Oh well... old man gonna old man

simonw

From that article:

> ARC-AGI is a benchmark that’s designed to be simple for humans but excruciatingly difficult for AI. In other words, when AI crushes this benchmark, it’s able to do what humans do.

That's a misunderstanding of what ARC-AGI means. Here's what ARC-AGI creator François Chollet has to say: https://bsky.app/profile/fchollet.bsky.social/post/3les3izgd...

> I don't think people really appreciate how simple ARC-AGI-1 was, and what solving it really means.

> It was designed as the simplest, most basic assessment of fluid intelligence possible. Failure to pass signifies a near-total inability to adapt or problem-solve in unfamiliar situations.

> Passing it means your system exhibits non-zero fluid intelligence -- you're finally looking at something that isn't pure memorized skill. But it says rather little about how intelligent your system is, or how close to human intelligence it is.

tkellogg

Ah! My bad, I edited the article to simply quote Francois. Thanks for catching this, Simon.

dartos

> That's a misunderstanding of what ARC-AGI means

Misunderstanding benchmarks seems to be the first step to claiming human level intelligence.

Additionally:

> > ARC-AGI is a benchmark that’s designed to be simple for humans but excruciatingly difficult for AI. In other words, when AI crushes this benchmark, it’s able to do what humans do.

Doesn’t even make logical sense.

ethbr1

This feels like a generalized extension of the classic mis-reasoned response to 'A computer can now play chess.'

Common non-technical chain of thought after learning this: 'Previously, only humans could play chess. Now, computers can play chess. Therefore, computers can now do other things that previously only humans could do.'

The error is assuming that problems can only be solved via levels of human-style general intelligence.

Obviously, this is false from the way that computers calculate arithmetic, optimize via gradient descent, and innumerable other examples, but it does seem to be a common lay misunderstanding.

Probably why IBM abused it with their Watson marketing.

In reality, for reliable capabilities reasoning, the how matters very much.

9dev

Doesn’t that turn the entire premise on its head? If passing the benchmark means crossing the lower, not the upper threshold, that invalidates most claims derived from it.

cootsnuck

Correct. Hence many people constantly bemoaning the hype-driven narratives that dominate many AI discussions.

polotics

Interesting article, but the flourish ending """AI will soon (if not already) increase in abilities at an exponential rate.""" is not at all substantiated. Would be nice to know how the author gets to that conclusion.

tkellogg

Author here. I do believe it's going to be exponential (not yet), but that's out of scope for the article. However, if someone has a good explainer link for that, please put it here and I'll link it into the post.

samsartor

All past data shows is exponential growth in the cost of AI systems, not an exponential growth in capability. Capabilities have certainly expanded, but that is hard to measure. The growth curve is just as likely to be sigmoid-shaped. Just a phase transition from "computers process information strictly procedurally" to "computers use fuzzy logic sometimes too". And if we've exhausted all the easy wins, that explains the increased interest in alternative scaling paths.

Obviously predicting the future is hard, and we won't know where this stops till we get there. But I think a degree of skepticism is warranted.

DennisP

Once AI becomes self-improving, using its intelligence to make itself more intelligent, exponential progress seems like the logical consequence. Any lack of exponential progress before it becomes self-improving doesn't have much bearing on that.

It certainly will be sigmoid-shaped in the end, but the top of the sigmoid could be way beyond human intelligence.

jcgrillo

Care to justify those beliefs, or are we just supposed to trust your intuition? Why exponential and not merely quadratic (or some other polynomial)? How do you even quantify "it"? I'm teasing, somewhat, because I don't actually expect you're able to answer. Yours aren't reasoned arguments, merely religious fervor dressed up in techy garb. Prove me wrong!

Red_Comet_88

If you read the article, he explains that there are multiple scaling paths now, whereas before it was just parameter scaling. I think it's reasonable to estimate faster progress as a result of that observation.

I like that the HN crowd wants to believe AI is hype (as do I), but it's starting to look like wishful thinking. What is useful to consider is that once we do get AGI, the entirety of society will be upended. Not just programming jobs or other niches, but everything all at once. As such, it's pointless to resist the reality that AGI is a near term possibility.

It would be wise from a fulfillment perspective to make shorter term plans and make sure to get the most out of each day, rather than make 30-40 year plans by sacrificing your daily tranquility. We could be entering a very dark era for humanity, from which there is no escape. There is also a small chance that we could get the tech utopia our billionaire overlords constantly harp on about, but I wouldn't bet on it.

globalnode

That seems a bit harsh, don't you think? Besides, you're the one making the assertion, you kinda need to do the proving ;)

FridgeSeal

It’s basically the line for all the AI-hype people: “all the problems are going away!”, “soon it’ll all magically make things exponentially good-er-er!”

brookst

Alternatively, it's a restatement of the obvious empirical truth that technology tends to improve on an exponential, not linear, curve. Seems like a simpler explanation that doesn't even require insulting people.

Yossarrian22

The premise would be better supported if it could be shown that a 10x speedup in matrix multiplication conferred a linear or better increase in performance post GPT-4. As it stands, that would just seem to give us current results faster, not better results.

einsteinx2

I would argue that any given technology tends to improve on an S curve, so exponentially at first and then flattening out. See Moore’s law as a great example.

Or, more on topic, see the improvements in LLMs since they were invented. At first each release was an order of magnitude better than the last (see GPT-2 vs 3 vs 4); now they're getting better, but at a much slower rate.

Certainly feels like being at the top of an S curve to me, at least until an entirely new architecture is invented to supersede transformers.

null

[deleted]

suddenlybananas

That's why airplanes are so much faster than they were 20 years ago.

the8472

The drumbeat of AI progress has been fairly steady, on log scales.

https://time.com/6300942/ai-progress-charts/

amazingamazing

That doesn't mean AI is improving itself, though

the8472

My point was that it already was on an exponential trajectory. RL/self-play and the like remove some of the human inputs that were previously required for that growth.

Take the trajectory of chess: handcrafted rules -> policies based on human game statistics -> self-play bootstrapped from human games -> random-initialized self-play.

JoshTko

AI will improve at an exponential rate once it can independently improve AI performance. For example, once AI can organically identify, test, confirm, and deploy an improvement like R1 vs o1 (in terms of perf/watt), then we'll see exponential improvement. Honestly though, that still seems possible within 5 years or less, maybe 3.

amelius

Only if the AI can do it faster than humans.

And if the improvements it makes are not asymptotically diminishing.

ksec

>Honestly though, that still seems possible within 5 years or less, maybe 3.

If that is a typical human estimate, I would guess in reality it is more likely to be 6-10 years. Which is still good if we get it in 2030-2035.

ethbr1

For futurism on things that promise economic rewards, exponential increases are not uncommon.

boznz

Currently AI is getting better at sorting the data that already exists, but if enough Reddit and wiki posts are wrong, its answer is inevitably wrong. Without being able to experiment to test its theories against reality, the AI curve will likely not lead to super-intelligence without humans to assist. That's my 5 cents.

sandworm101

The key "ability" that will grow exponentially is AI's ability to convert investment dollars into silicon + electricity and then further reduce those into heat energy. Such schemes only seem wasteful to outsiders, those whose salaries are not tied to their ability to convert money into heat. A fun startup would be one that generates useful electricity from the AI investment cycle. If we put the AI machine under a pot of water, we might then use the resulting steam to drive a turbine.

grues-dinner

Some datacentres do in fact recover the heat for things like municipal heating. It's tricky, though, because being near population centres that can use the heat is often (not always) at odds with things that are good for datacentres, like cheap land, cheap power, and a lack of neighbours to moan about construction and cooling system noise.

There was also a startup selling/renting bitcoin miners that doubled as electrical heaters.

The problem is that computers are fundamentally resistors, so at most you can get 100% of the energy back as heat. But a heat pump can give you 2-4 times the energy back. So your AI work (or bitcoin mining) plus the capital outlay of the expensive computers has to be worth the difference.
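
Rough illustrative numbers (a COP of 3 is an assumption within the typical 2-4 range):

  # Per kWh of electricity: a computer returns it all as heat (Joule heating),
  # while a heat pump with COP 3 delivers roughly 3 kWh of heat.
  electricity_kwh = 1.0
  heat_from_computer = electricity_kwh * 1.0   # ~100% back as heat
  heat_from_heat_pump = electricity_kwh * 3.0  # assumed COP of 3
  print(heat_from_computer, heat_from_heat_pump)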

ozten

Orbital Materials is designing wafer substrates that capture carbon and reuse excess heat.

immibis

Due to Carnot's law, you can't get much electricity that way without a big temperature difference. Think about it: the AI machine would have to run at at least 100 degrees Celsius to boil the water, and that's the bare minimum.

But if we can make computers that run at, say, 2000 degrees, without using several times more electricity, then we can capture their waste heat and turn a big portion of it back into electricity to re-feed the computers. It doesn't violate thermodynamics, it's just an alternative possibility to make more computers that use less electricity overall (an alternative to directly trying to reduce the energy usage of silicon logic gates) as long as we're still well above Landauer's limit.
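
For a sense of the numbers, here are back-of-the-envelope Carnot limits, assuming heat is rejected at 20°C (an ideal-case ceiling; real turbines do worse):

  # Carnot limit: efficiency = 1 - T_cold / T_hot, temperatures in kelvin.
  def carnot_efficiency(t_hot_c, t_cold_c=20.0):
      return 1 - (t_cold_c + 273.15) / (t_hot_c + 273.15)

  print(round(carnot_efficiency(100), 2))    # ~0.21 for 100 C chips
  print(round(carnot_efficiency(2000), 2))   # ~0.87 for hypothetical 2000 C chips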

sandworm101

At sea level. Put the machine in a vacuum chamber, or atop a big mountain, and we will boil the AI kettles at less than 100°C.

ioulaum

The exponential part may be iffy, but it is self-improving.

And this same RL is also creating improvements in small model performance.

So, more LLMs are about to rise in quality.

immibis

It's self-improving? So, we can ask AI how to improve AI, and the suggestions actually work?

sesm

It's more like Intel in early days using their CPUs to compute layout for bigger CPUs.

dfilppi

[dead]

geuis

[flagged]

petemir

Perhaps you are not the intended audience for this article.

JimmyBuckets

Why are you so angry? I thought it was a wonderful overview. An even if not, insults are hardly necessary.

null

[deleted]

gregoriol

[flagged]

0xEF

What a useless comment. Point out your qualms with it, explain how you would have done it better or clarify inaccuracies you find. It helps promote discussion and opens the door for more collective information to be considered.

fragmede

> Please don't post shallow dismissals, especially of other people's work. A good critical comment teaches us something.

https://news.ycombinator.com/newsguidelines.html