
Explainer: What's r1 and everything else?

WesleyJohnson

Someone should author an ELI5 (or slightly older) guide to how LLMs, RL, agents, CoT, etc. all work and what all these acronyms mean, and then add to it, daily or weekly, as new developments arise. I don't want to keep reading dozens of articles, white papers, tweets, etc. as new developments happen. I want to go back to the same knowledge base, authored by the same person (or people), that maintains a consistent reading and comprehension level and builds on prior points.

It seems like the AI space is moving impossibly fast, and it's just ridiculously hard to keep up unless 1) you work in this space, or 2) you're very comfortable with the technology behind it, so you can jump in at any point and understand it.

esafak

Just ask an Internet-enabled LLM like You.com to do it. This is what they are good at. Wikipedia satisfies your repository requirement.

alotaluv

haha just have gpt operator do it

Havoc

>people re-creating R1 (some claim for $30)

R1 or the R1 finetunes? Not the same thing...

HF is busy recreating R1 itself, but that seems to be a pretty big endeavour, not a $30 thing.

littlestymaar

This is indeed a massive exaggeration; I'm pretty sure the $30 experiment is this one: https://threadreaderapp.com/thread/1882839370505621655.html (GitHub: https://github.com/Jiayi-Pan/TinyZero).

And while it's true that this experiment shows you can apply direct reinforcement learning to an existing LLM in a way that makes it develop reasoning in the same fashion DeepSeek-R1 did, it is very far from a re-creation of R1!
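
For a rough sense of what that $30-class experiment involves: the core trick is RL against a programmatically verifiable reward on a narrow task (a Countdown-style arithmetic game), nothing like R1's full pipeline. A minimal sketch of such a reward function, with hypothetical <think>/<answer> tags and illustrative scoring values:

  import re

  def countdown_reward(completion: str, target: int, numbers: list[int]) -> float:
      # Toy verifiable reward in the spirit of TinyZero's Countdown task:
      # the model must combine the given numbers into an expression that
      # evaluates to the target. Correctness is checked programmatically,
      # so no human labels are needed.
      match = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
      if not match:
          return 0.0                                # no parseable answer
      expr = match.group(1).strip()
      used = sorted(int(n) for n in re.findall(r"\d+", expr))
      if used != sorted(numbers):
          return 0.1                                # wrong numbers used
      try:
          value = eval(expr, {"__builtins__": {}})  # toy setting only
      except Exception:
          return 0.1
      return 1.0 if value == target else 0.1

  completion = "<think>4*6 = 24, 24 + 1 = 25</think><answer>4*6+1</answer>"
  print(countdown_reward(completion, target=25, numbers=[4, 6, 1]))  # -> 1.0

Nothing here needs human labels, which is part of why such a run can be so cheap; the base model, the RL algorithm, and the compute are all outside the scope of this sketch.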

m3kw9

Maybe they mistake recreation for the cp command

rahimnathwani

  Most important, R1 shut down some very complex ideas (like DPO & MCTS) and showed that the path forward is simple, basic RL.
This isn't quite true. R1 used a mix of RL and supervised fine-tuning. The data used for supervised fine-tuning may have been model-generated, but the paper implies it was human-curated: they kept only the 'correct' answers.
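
A minimal sketch of what that kind of curation can look like (rejection sampling): generate several candidates per problem, keep only those that pass a correctness check, and use the survivors as SFT data. `ask_model` and the crude substring check are hypothetical stand-ins, not DeepSeek's actual pipeline:

  def curate_sft_examples(problems, ask_model, samples_per_problem=4):
      # Keep only completions that pass a correctness check; the survivors
      # become supervised fine-tuning data. "Correct" here is a crude
      # substring match against a known answer, which only works for
      # verifiable problems.
      dataset = []
      for problem in problems:
          for _ in range(samples_per_problem):
              completion = ask_model(problem["question"])
              if str(problem["answer"]) in completion:
                  dataset.append({"prompt": problem["question"],
                                  "completion": completion})
                  break                              # one good sample is enough
      return dataset

  toy = [{"question": "What is 2 + 2?", "answer": 4}]
  print(curate_sft_examples(toy, ask_model=lambda q: "The answer is 4."))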

Philpax

I think what you're saying is consistent with the quote: human curation of SFT data is indeed not complex. There might be extra work on top of the RL, but it's the same work that's been done throughout LLM development.

tkellogg

Additionally, in the following days, I've seen evidence suggesting that the SFT part might not even be necessary. I'd argue that work wouldn't have happened if R1 wasn't released in the open.

fullstackchris

So the conclusion is AI is about to "increase in abilities at an exponential rate", with the only data point being that R1 was successfully able to achieve o1 levels as an open-source model? In other words, two extremely unrelated themes?

Does this guy know people were writing verbatim the same thing in like... 2021? It's still always incredible to me how the same repeated hype rises to the surface over and over. Oh well... old man gonna old man.

raincole

> Does this guy know people were writing verbatim the same thing in like... 2021?

Given how far gen AI has improved since 2021, those people were quite spot on.

raincole

People keep saying that DeepSeek R1's training cost is just $5.6M. Where is the source?

I'm not asking for proof, just the source, even a self-claimed statement. I've read the R1 paper and it doesn't mention a $5.6M figure. Is it somewhere in DeepSeek's press release?

whimsicalism

This is a pretty hype-laden/Twitter-laden article; I would not trust it to explain things to you.

thih9

Sure. But perhaps some hype is justified? Here's what a senior research scientist from nvidia says:

> We are living in a timeline where a non-US company is keeping the original mission of OpenAI alive - truly open, frontier research that empowers all. It makes no sense. The most entertaining outcome is the most likely.

> DeepSeek-R1 not only open-sources a barrage of models but also spills all the training secrets. They are perhaps the first OSS project that shows major, sustained growth of an RL flywheel. (…)

https://x.com/DrJimFan/status/1881353126210687089

comrade1234

The benchmarks for the different models focus on math and coding accuracy. I have a use-case for a model where those two functions are completely irrelevant and I’m only interested in writing (chat, stories, etc). I guess you can’t really benchmark ‘concepts’ as easily as logic.

With distillation, can a model be made that strips out most of the math and coding stuff?

NitpickLawyer

> completely irrelevant and I’m only interested in writing (chat, stories, etc)

There's a person keeping track of a few writing prompts and the evolution of the quality of text with each new shiny model. They shared this link somewhere, can't find the source but I had it bookmarked for further reading. Have a look at it and see if it's something you'd like.

https://eqbench.com/results/creative-writing-v2/deepseek-ai_...

afro88

Here's a better link: https://eqbench.com/creative_writing.html

The R1 sample reads way better than anything else on the leaderboard to me. Quite a jump.

polynomial

Why is the main character named Rhys in most (?) of them? The Llama[1], Claude[3], Mistral[4] & DeepSeek-R1[5] samples all named the main character Rhys, even though that was nowhere specified in the prompt. GPT-4o gives the character a different name[6]. Gemini[2] names the bookshop person Rhys instead! Am I just missing something really obvious? I feel like I'm missing something big that's right in front of me.

[1] https://eqbench.com/results/creative-writing-v2/meta-llama__... [2] https://eqbench.com/results/creative-writing-v2/gemini-1.5-f... [3] https://eqbench.com/results/creative-writing-v2/claude-3-opu... [4] https://eqbench.com/results/creative-writing-v2/mistralai__M... [5] https://eqbench.com/results/creative-writing-v2/deepseek-ai_... [6] https://eqbench.com/results/creative-writing-v2/gpt-4o-2024-...

exikyut

Completely agree.

The only measurable flaw I could find was the errant use of an opening quote (‘) in

> He huffed a laugh. "Lucky you." His gaze drifted to the stained-glass window, where rain blurred the world into watercolors. "I bombed my first audition. Hamlet, uni production. Forgot ‘to be or not to be,' panicked, and quoted Toy Story."

It's pretty amazing that I can find no fault with the actual text. No grammar errors; I like the writing; it competes with the quality and engagingness of a large swath of written fiction (yikes); I wanna read the next chapter.

comrade1234

Those outputs are really good and come from deepseek-R1 (I assume the full version, not a distilled version).

R1 is quite large (685B params). I’m wondering if you can make a distilled R1 without the coding and math content. 7B works well for me locally. When I go up to 32B I seem to get worse results - I assume it’s just timing out in its think mode… I haven’t had time to really investigate though.

ben30

Yes, you can create a writing-focused model through distillation, but it's tricky. *Complete removal* of math/coding abilities is challenging because language models' knowledge is interconnected - the logical thinking that helps solve equations also helps structure coherent stories.
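
To make the idea concrete, a rough sketch of domain-focused distillation: collect teacher outputs only for writing prompts and fine-tune a smaller student on those pairs. `teacher_generate` is a hypothetical stand-in for whatever inference stack you use, and the keyword filter is deliberately crude:

  WRITING_KEYWORDS = ("story", "poem", "dialogue", "scene", "character")

  def is_writing_prompt(prompt: str) -> bool:
      return any(k in prompt.lower() for k in WRITING_KEYWORDS)

  def build_distillation_set(prompts, teacher_generate):
      # Keep only the creative-writing slice. Math/code prompts are simply
      # dropped, which biases the student toward prose without surgically
      # "removing" those abilities -- they fade from lack of reinforcement.
      return [{"prompt": p, "completion": teacher_generate(p)}
              for p in prompts if is_writing_prompt(p)]

  prompts = ["Write a short story about a lighthouse keeper.",
             "Implement quicksort in Python."]
  print(build_distillation_set(prompts, teacher_generate=lambda p: "(teacher output)"))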

mercer

I understood that at least some of these big models (Llama?) are basically bootstrapped with code. Is there truth to that?

ben30

Yes, code is a key training component. Open-Llama explicitly used programming data as one of seven training components. However, newer models like Llama 3.1 405B have shifted to using synthetic data instead. Code helps develop structured reasoning patterns but isn't the sole foundation - models combine it with general web text, books, etc.

1123581321

Nice explainer. R1 hit sensational mainstream news which has resulted in some confusion and alarm among family and friends. It’s hard to succinctly explain this doesn’t mean China is destroying us, that Americans immediately started working with the breakthrough, cost optimization is inevitable in computing, etc.

richardatlarge

T or F?

Nobody really saw the LLM leap coming

Nobody really saw R1 coming

We don’t know what’s coming

bikamonki

So, is AI already reasoning or not?

brookst

Depends on your definition of reasoning. Creating valid chains of thought? Yes. Sentient? No.

didibus

No. AI learns to predict reasons, and doing so as it predicts the answer improves its accuracy at predicting the answer.

In summary, even though they are called "reasoning" models, they are still based on prediction and pattern matching, not true logical reasoning. The improvement in accuracy is likely due to better leveraging of the model's statistical knowledge, rather than any deeper understanding of the problem's logic. And the reasons you see it output have nothing to do with the actual reasons it used to determine the answer.

In fact, R1-Zero hints that it might be even better to let the AI follow a chain of thought that doesn't actually make logical sense or isn't human-readable, and that doing so could further improve its ability to accurately predict solutions to code, math, and logic problems.

jug

Yes, that's what OpenAI o1 does, and DeepSeek R1. Also Google Gemini 2.0 Thinking models. It's a way to significantly improve benchmark scores, especially in math.

It's funny to watch too. I played with Gemini 2.0 on Google AI Studio and asked it to "come up with your favorite song as you take a long walk to really think this through".

The reasoning can then be shown, and it talked to itself, saying things like "since I'm an AI, I can't take walks, but with a request like this, the user seems to imply that I should choose something that's introspective and meaningful", and went on with how it picked candidates.

erlendstromsvik

I just tried that prompt with gemini-2.0-flash-thinking-exp-01-21

In the reasoning process it concludes: "From the brainstormed genres/artists, select a specific song. It's better to be concrete than vague. For this request, 'Nuvole Bianche' by Ludovico Einaudi emerges as a strong candidate. Craft the Explanation and Scenario: Now, build the response around 'Nuvole Bianche.'"

Then in the actual answer it proposes: "Holocene" by Bon Iver.

=)

ozten

Yes. The ARC-AGI benchmark was supposed to last years and is already saturated. The authors are currently creating the second version.

simonw

From that article:

> ARC-AGI is a benchmark that’s designed to be simple for humans but excruciatingly difficult for AI. In other words, when AI crushes this benchmark, it’s able to do what humans do.

That's a misunderstanding of what ARC-AGI means. Here's what ARC-AGI creator François Chollet has to say: https://bsky.app/profile/fchollet.bsky.social/post/3les3izgd...

> I don't think people really appreciate how simple ARC-AGI-1 was, and what solving it really means.

> It was designed as the simplest, most basic assessment of fluid intelligence possible. Failure to pass signifies a near-total inability to adapt or problem-solve in unfamiliar situations.

> Passing it means your system exhibits non-zero fluid intelligence -- you're finally looking at something that isn't pure memorized skill. But it says rather little about how intelligent your system is, or how close to human intelligence it is.

tkellogg

Ah! My bad, I edited the article to simply quote Francois. Thanks for catching this, Simon.

dartos

> That's a misunderstanding of what ARC-AGI means

Misunderstanding benchmarks seems to be the first step to claiming human level intelligence.

Additionally:

> > ARC-AGI is a benchmark that’s designed to be simple for humans but excruciatingly difficult for AI. In other words, when AI crushes this benchmark, it’s able to do what humans do.

Doesn’t even make logical sense.

ethbr1

This feels like a generalized extension of the classic mis-reasoned response to 'A computer can now play chess.'

Common non-technical chain of thought after learning this: 'Previously, only humans could play chess. Now, computers can play chess. Therefore, computers can now do other things that previously only humans could do.'

The error is assuming that problems can only be solved via levels of human-style general intelligence.

Obviously, this is false from the way that computers calculate arithmetic, optimize via gradient descent, and innumerable other examples, but it does seem to be a common lay misunderstanding.

Probably why IBM abused it with their Watson marketing.

In reality, for reliable capabilities reasoning, the how matters very much.

antonvs

> Misunderstanding benchmarks seems to be the first step to claiming human level intelligence.

It's known as "hallucination" a.k.a. "guessing or making stuff up", and is a major challenge for human intelligence. Attempts to eradicate it have met with limited success. Some say that human intelligence will never reach AGI because of it.

dartos

Thankfully nobody is trying to sell humans as a service in an attempt to replace the existing AIs in the workplace (yet).

I’m sure such a product would be met with ridicule considering how often humans hallucinate. Especially since, as we all know, the only use for humans is getting responses given some prompt.

9dev

Doesn’t that turn the entire premise on its head? If passing the benchmark means crossing the lower, not the upper threshold, that invalidates most claims derived from it.

cootsnuck

Correct. Hence many people constantly bemoaning the hype driven narratives that dominate many AI discussions.

polotics

Interesting article, but the flourish ending """AI will soon (if not already) increase in abilities at an exponential rate.""" is not at all substantiated. Would be nice to know how the author gets to that conclusion.

tkellogg

Author here. I do believe it's going to be exponential (not yet), but that's out of scope for the article. However, if someone has a good explainer link for that, please put it here and I'll link it into the post.

samsartor

All past data shows is exponential growth in the cost of AI systems, not an exponential growth in capability. Capabilities have certainly expanded, but that is hard to measure. The growth curve is just as likely to be sigmoid-shaped. Just a phase transition from "computers process information strictly procedurally" to "computers use fuzzy logic sometimes too". And if we've exhausted all the easy wins, that explains the increased interest in alternative scaling paths.

Obviously predicting the future is hard, and we won't know where this stops till we get there. But I think a degree of skepticism is warranted.
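
One reason the data alone can't settle this: a logistic (sigmoid) curve is numerically indistinguishable from an exponential until you get close to its ceiling. A toy illustration, with made-up numbers:

  import math

  def exponential(t, r=0.5):
      return math.exp(r * t)

  def logistic(t, r=0.5, ceiling=1000.0):
      # Same early growth rate as the exponential, but saturating at `ceiling`.
      return ceiling / (1.0 + (ceiling - 1.0) * math.exp(-r * t))

  for t in range(0, 13, 3):
      e, s = exponential(t), logistic(t)
      print(f"t={t:2d}  exp={e:8.1f}  sigmoid={s:8.1f}  ratio={s/e:.2f}")

The two curves track each other almost exactly until the sigmoid nears its ceiling, so a history of exponential-looking progress is consistent with either story.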

DennisP

Once AI becomes self-improving, using its intelligence to make itself more intelligent, exponential progress seems like the logical consequence. Any lack of exponential progress before it becomes self-improving doesn't have much bearing on that.

It certainly will be sigmoid-shaped in the end, but the top of the sigmoid could be way beyond human intelligence.

jcgrillo

Care to justify those beliefs, or are we just supposed to trust your intuition? Why exponential and not merely quadratic (or some other polynomial)? How do you even quantify "it"? I'm teasing, somewhat, because I don't actually expect you're able to answer. Yours isn't a reasoned argument, merely religious fervor dressed up in techy garb. Prove me wrong!

Red_Comet_88

If you read the article, he explains that there are multiple scaling paths now, whereas before it was just parameter scaling. I think it's reasonable to estimate faster progress as a result of that observation.

I like that the HN crowd wants to believe AI is hype (as do I), but it's starting to look like wishful thinking. What is useful to consider is that once we do get AGI, the entirety of society will be upended. Not just programming jobs or other niches, but everything all at once. As such, it's pointless to resist the reality that AGI is a near term possibility.

It would be wise from a fulfillment perspective to make shorter term plans and make sure to get the most out of each day, rather than make 30-40 year plans by sacrificing your daily tranquility. We could be entering a very dark era for humanity, from which there is no escape. There is also a small chance that we could get the tech utopia our billionaire overlords constantly harp on about, but I wouldn't bet on it.

globalnode

That seems a bit harsh, don't you think? Besides, you're the one making the assertion, so you kinda need to do the proving ;)

sandworm101

The key "ability" that will grow exponentially is AIs ability to convert investment dollars into silicon+electricity and then further reduce those into heat energy. Such schemes only seem wasteful to outsiders, those whose salaries are not tied to their ability to convert money into heat. A fun startup would be one that generates useful electricity from the AI investment cycle. If we put the Ai machine under a pot of water, we might then use the resulting steam to drive a turbine.

immibis

Due to Carnot's law, you can't get much electricity that way without a big temperature difference. Think about it: the AI machine would have to run at at least 100 degrees Celsius to boil the water, and that's the bare minimum.

But if we can make computers that run at, say, 2000 degrees, without using several times more electricity, then we can capture their waste heat and turn a big portion of it back into electricity to re-feed the computers. It doesn't violate thermodynamics, it's just an alternative possibility to make more computers that use less electricity overall (an alternative to directly trying to reduce the energy usage of silicon logic gates) as long as we're still well above Landauer's limit.
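
The relevant bound is the Carnot efficiency, eta = 1 - T_cold / T_hot (temperatures in kelvin). A quick calculation, assuming a room-temperature cold side, shows why a 100 °C "AI kettle" recovers little and a hypothetical 2000 °C chip could recover most of it:

  def carnot_efficiency(t_hot_c: float, t_cold_c: float = 25.0) -> float:
      # Upper bound on the fraction of waste heat convertible back to work.
      t_hot, t_cold = t_hot_c + 273.15, t_cold_c + 273.15
      return 1.0 - t_cold / t_hot

  for t_hot in (100, 300, 2000):
      print(f"{t_hot:4d} C hot side -> at most {carnot_efficiency(t_hot):.0%} recoverable")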

sandworm101

At sea level. Put the machine in a vacuum chamber, or atop a big mountain, and we will boil the AI kettles at less than 100 °C.

grues-dinner

Some datacentres do in fact recover the heat for things like municipal heating. It's tricky though because being near population centres that can use the heat is often (not always) inversely related to things that are good for datacentres like cheap land, power and lack of neighbours to moan about things like construction and cooling system noise.

There was also a startup selling/renting bitcoin miners that doubled as electrical heaters.

The problem is that computers are fundamentally resistors, so at most you can get 100% of the energy back as heat. But a heat pump can give you 2-4 times the energy back. So your AI work (or bitcoin mining) plus the capital outlay of the expensive computers has to be worth the difference.
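
A back-of-envelope version of that trade-off, with illustrative numbers only: to deliver the same heat as a heat pump with a COP of 3, the compute-as-heater burns three times the electricity, so the compute work has to be worth at least the difference:

  heat_needed_kwh = 10.0                         # heat wanted per day
  price_per_kwh = 0.30                           # illustrative electricity price

  electricity_resistive = heat_needed_kwh / 1.0  # computers: ~1 kWh heat per kWh in
  electricity_heatpump = heat_needed_kwh / 3.0   # heat pump with COP of 3

  extra_cost = (electricity_resistive - electricity_heatpump) * price_per_kwh
  print(f"The compute has to be worth > {extra_cost:.2f}/day to beat the heat pump")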

ozten

Orbital Materials is designing wafer substrates that capture carbon and reuse excess heat.

FridgeSeal

It’s basically the line for all the AI-hype people: “all the problems are going away!”, “soon it’ll all magically make things exponentially good-er-er!”

brookst

Alternatively, it’s a restatement of the obvious empirical truth that technology tends in improve on an exponential and not linear curve. Seems like a simpler explanation that doesn’t even require insulting people.

Yossarrian22

The premise would be better supported if it could be shown that a 10x speedup in matrix multiplication conferred a linear or better increase in performance post-GPT-4. As it stands, that would just seem to give us current results faster, not better results.

einsteinx2

I would argue that any given technology tends to improve on an S curve, so exponentially at first and then flattening out. See Moore’s law as a great example.

Or, more on topic, see the improvements in LLMs since they were invented. At first each release was an order of magnitude better than the last (see GPT-2 vs 3 vs 4); now they're getting better, but at a much slower rate.

Certainly feels like being at the top of an S curve to me, at least until an entirely new architecture is invented to supersede transformers.

suddenlybananas

That's why airplanes are so much faster than they were 20 years ago.

the8472

The drumbeat of AI progress has been fairly steady, on log scales.

https://time.com/6300942/ai-progress-charts/

amazingamazing

That doesn't mean AI is improving itself, though.

the8472

My point was that it already was on an exponential trajectory. RL/self-play and the like remove some of the human inputs that were previously required for that growth.

Take the trajectory of chess. handcrafted rules -> policies based on human game statistics -> self-play bootstrapped from human games -> random-initialized self-play.

JoshTko

AI will improve at an exponential rate once it can independently improve AI performance. For example, once AI can organically identify, test, confirm, and deploy an improvement like R1 vs o1 (in terms of perf/watt), then we'll see exponential improvement. Honestly though, that still seems possible within 5 years or less, maybe 3.

amelius

Only if the AI can do it faster than humans.

And if the improvements it makes are not asymptotically diminishing.

ksec

>Honestly though, that still seems possible within 5 years or less, maybe 3.

If that is a normal human estimation, I would guess in reality it is more likely to be 6-10 years. Which is still good if we get it in 2030-2035.

ethbr1

For futurism on things that promise economic rewards, exponential increases are not uncommon.

boznz

Currently AI is getting better at sorting the data that already exists, but if enough Reddit and wiki posts are wrong, its answer is inevitably wrong. Without being able to experiment to test its theories against reality, the AI curve will likely not lead to super-intelligence without humans to assist. That's my 5 cents.

ioulaum

The exponential part may be iffy, but it is self improving.

And this same RL is also creating improvements in small model performance.

So, more LLMs are about to rise in quality.

immibis

It's self-improving? So, we can ask AI how to improve AI, and the suggestions actually work?

sesm

It's more like Intel in early days using their CPUs to compute layout for bigger CPUs.
