Intellect-2 Release: The First 32B Model Trained Through Globally Distributed RL
30 comments
May 12, 2025
Thomashuet
Summary: We've used the complexest, buzzwordiest training infrastructure to increase the performance of our base model by a whopping 0.5% (±1%).
iTokio
It's interesting that it does something useful (training an LLM) without trust and in a decentralized way.
Maybe this could be used as proof of work? To stop wasting computing resources on cryptocurrencies and get something useful as a byproduct.
Geee
No, this process doesn't produce "proof of work", i.e. verifiable proofs that energy has been used.
_ink_
I read an argument that proof of work needs to be useless and wasteful: if it produced value in itself, it would make 51% attacks more economical and thus the currency less secure.
fastball
The emphasis is indeed on "without trust" – as far as I can tell this project is unable to verify whether the decentralized training nodes are contributing productively.
Without the ability to validate that training compute is heading in the globally desired direction, it is unlikely you could use it as the foundation of a (sound) cryptocurrency.
mentalgear
The reward model could be used as validation for the client: give several nodes the same inferences to make, and the one with the highest reward (which could be computed on short rollouts, or even partially over the long term) also gets the "currency" reward.
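A minimal sketch of that idea, assuming nothing about Prime Intellect's actual protocol: several nodes produce rollouts for the same prompt, a shared reward model scores each one, and the top-scoring node wins. The `reward_model` here is a throwaway placeholder (distinct-token count), purely for illustration.

```python
# Toy sketch (NOT the project's actual mechanism): redundant rollouts from
# multiple nodes, scored by a shared reward model; top scorer gets the reward.

def reward_model(rollout: str) -> float:
    # Placeholder scorer for illustration: counts distinct tokens.
    # A real system would use a trained reward model.
    return float(len(set(rollout.split())))

def pick_winner(rollouts: dict[str, str]) -> str:
    """Return the node id whose rollout the reward model scores highest."""
    return max(rollouts, key=lambda node: reward_model(rollouts[node]))

rollouts = {
    "node-a": "the answer is 42 because 6 times 7 is 42",
    "node-b": "42",
}
print(pick_winner(rollouts))  # "node-a" under this toy scorer
```

One obvious caveat: if all nodes run the same reward model, a lazy node could score candidate outputs locally and only submit winners, so a real scheme would still need spot-check verification.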
mentalgear
That would indeed be a very promising way of FINALLY making cryptocurrency useful!
proof_by_vibes
There could be merit to this. Proofs are generally computationally hard, so it's possible that a currency could be created by quantifying verification.
littlestymaar
> To stop wasting computing resources in crypto currencies and get something useful as a byproduct.
Bitcoin is the only major cryptocurrency that still uses proof of work today (the others either use "proof of stake" or are "Layer 2" chains), and given its (relative lack of) governance structure, it's very unlikely to ever change.
abtinf
Does this have anything to do with The Metamorphosis Of Prime Intellect, or did they just abuse the name and the cover art?
arthurcolle
Prime Intellect is a grabby AI :)
3abiton
This is rather exciting! I see a future of co-op models made by a community of experts in a specific field, which would still allow them to be competitive with "AI monopolies". Maybe not all hope is lost!
danielhanchen
I made some GGUFs at https://huggingface.co/unsloth/INTELLECT-2-GGUF
./llama.cpp/llama-cli -hf unsloth/INTELLECT-2-GGUF:Q4_K_XL -ngl 99
Also it's best to read https://docs.unsloth.ai/basics/tutorial-how-to-run-qwq-32b-e... on sampling issues for QwQ based models.
Or TLDR, use the below settings:
./llama.cpp/llama-cli -hf unsloth/INTELLECT-2-GGUF:Q4_K_XL -ngl 99 --temp 0.6 --repeat-penalty 1.1 --dry-multiplier 0.5 --min-p 0.00 --top-k 40 --top-p 0.95 --samplers "top_k;top_p;min_p;temperature;dry;typ_p;xtc"
schneehertz
I used to have a science-fiction idea that artificial intelligence could aggregate computing power over the network to perform ultra-large-scale calculations, thereby achieving strong artificial intelligence. It's very interesting that reality may develop in the same way.
refulgentis
I guess I'm bearish?
It's not that they trained a new model, but they took an existing model and RL'd it a bit?
The scores are very close to QwQ-32B, and at the end:
"Overall, as QwQ-32B was already extensively trained with RL, it was difficult to obtain huge amounts of generalized improvement on benchmarks beyond our improvements on the training dataset. To see stronger improvements, it is likely that better base models such as the now available Qwen3, or higher quality datasets and RL environments are needed."
fabmilo
The interesting delta here is that this proves we can distribute the training and get a functioning model. The scaling potential goes way beyond a single datacenter.
comex
But does that mean much when the training that produced the original model was not distributed?
refulgentis
The RL, not the training. No?
christianqchung
Third-party fine-tuned open-weight LLMs tend to be good at a handful of benchmarks, but at parity or worse on others compared to the original model. There are some exceptions, like Nvidia's Nemotron series, but the differences are generally so small as to be imperceptible. DeepSeek released fine-tunes of several Qwen and Llama models alongside R1, and while they were better in some select (mostly math and coding) domains, fine-tuning introduced problems that kept them from overtaking the original models in actual usage.
mountainriver
Awesome work this team is doing. Globally distributed MoE could have real legs
esafak
How are they ensuring robustness against adversarial responses?
nsingh2
From the article, seems like TOPLOC:
> based on top of novel components such as TOPLOC, which verifies rollouts from untrusted inference workers
xmasotto
Can an expert explain how this protects against adversarial actors?
At a glance it looks like something akin to computing a checksum that's locality-sensitive, so it's robust to floating point errors, etc.
What's to stop someone from sending bad data + a matching bad checksum?
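To make the "locality-sensitive checksum" intuition concrete, here is a toy sketch (explicitly NOT the actual TOPLOC scheme): coarsely quantize the activations before hashing, so tiny cross-machine floating-point noise yields the same digest while genuinely different activations do not. The forgery question then comes down to the verifier independently re-executing a sample of the work: a forged checksum only matches forged data, and a spot-check recomputation would expose the mismatch.

```python
import hashlib

def fuzzy_checksum(values: list[float], step: float = 1e-2) -> str:
    """Hash activations after coarse quantization, so values that differ
    only by float noise (well under `step`) produce the same digest."""
    buckets = tuple(round(v / step) for v in values)
    return hashlib.sha256(repr(buckets).encode()).hexdigest()

honest   = [0.12345, -1.98765]
jittered = [0.12346, -1.98764]  # within float-noise tolerance
forged   = [0.90000, -1.98765]  # genuinely different activation

assert fuzzy_checksum(honest) == fuzzy_checksum(jittered)
assert fuzzy_checksum(honest) != fuzzy_checksum(forged)
```

Note the trade-off any such scheme faces: the quantization step must be wide enough to absorb hardware-dependent rounding, yet narrow enough that a dishonest node can't slip meaningfully different values into the same bucket.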
quantumwoke
Wonder what the privacy story is like. Enterprises don't usually like broadcasting their private data across a freely accessible network.
bjt12345
A strong use case here for quantum-safe encryption.
ndgold
Pretty badass
There's a name and a logo. "Hubris" feels slightly beggared. https://en.m.wikipedia.org/wiki/The_Metamorphosis_of_Prime_I...