
Kimi K2 Thinking, a SOTA open-source trillion-parameter reasoning model

simonw

  uv tool install llm
  llm install llm-moonshot
  llm keys set moonshot # paste key
  llm -m moonshot/kimi-k2-thinking 'Generate an SVG of a pelican riding a bicycle'
https://tools.simonwillison.net/svg-render#%3Csvg%20width%3D...

Here's what I got using OpenRouter's moonshotai/kimi-k2-thinking instead:

https://tools.simonwillison.net/svg-render#%20%20%20%20%3Csv...

JJax7

Love seeing this benchmark become more iconic with each new model release. Still in disbelief at the GPT-5 variants' performance in comparison, but it's cool to see the new open source models get more ambitious with their attempts.

ahmedfromtunis

Where do you run a trillion-param model?

isoprophlex

You let the people at OpenRouter worry about that for you.

MurizS

Which in turn lets the people at Moonshot AI worry about that for them, the only provider for this model as of now.

ripped_britches

Please for the love of god, if you work at cerebras, please put this on an API for me.

HarHarVeryFunny

It's good to see more competition, and open source, but I'd be much more excited to see what level of coding and reasoning performance can be wrung out of a much smaller LLM + agent as opposed to a trillion parameter one. The ideal case would be something that can be run locally, or at least on a modest/inexpensive cluster.

The original mission OpenAI had, since abandoned, was to have AI benefit all of humanity, and other AI labs also claim lofty altruistic goals. But the direction things are heading in is that AI is pay-to-play, especially for frontier-level capability in things like coding, and if this continues it is going to benefit the wealthy who can afford to pay and leave behind those who can't.

stingraycharles

With these things it's always both at the same time: these super grandiose SOTA models are making improvements mostly through optimizations, and they're scaling out as far as they can.

In turn, these new techniques will make many more things possible with smaller models. It takes time, but smaller models really can do a lot more now. DeepSeek was a very good example: a large model whose innovations in how it used transformers brought a lot of benefits to smaller models.

Also: keep in mind that this particular model is an MoE model that activates 32B parameters at a time. So they really are just stacking a whole bunch of smaller models inside a single large model.
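
For intuition, here's a minimal top-k MoE routing sketch in PyTorch (toy sizes, purely illustrative, nothing to do with Kimi's actual architecture): the router scores every expert, but only the top-k ever run, which is why a 1T-total / 32B-active model has the per-token compute of a much smaller one.

  import torch
  import torch.nn as nn

  class TinyMoELayer(nn.Module):
      # Toy mixture-of-experts layer: many experts, few active per token.
      def __init__(self, d_model=64, n_experts=8, top_k=2):
          super().__init__()
          self.router = nn.Linear(d_model, n_experts)  # scores each expert
          self.experts = nn.ModuleList(
              nn.Linear(d_model, d_model) for _ in range(n_experts))
          self.top_k = top_k

      def forward(self, x):  # x: (tokens, d_model)
          weights, idx = self.router(x).topk(self.top_k, dim=-1)
          weights = weights.softmax(dim=-1)  # renormalize over chosen experts
          out = torch.zeros_like(x)
          for t in range(x.shape[0]):  # naive per-token loop for clarity
              for w, e in zip(weights[t], idx[t]):
                  out[t] += w * self.experts[int(e)](x[t])  # only k experts run
          return out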

a-dub

"open source" means there should be a script that downloads all the training materials and then spins up a pipeline that trains end to end.

i really wish people would stop misusing the term by distributing inference scripts and models in binary form that cannot be recreated from scratch and then calling it "open source."

emsign

They'd have to publish or link the training data, which is full of copyrighted material. So yeah, calling it open source is weird; calling it warez would be more appropriate.

HarHarVeryFunny

Yeah, but "open weights" never seems to have taken off as a better description, and even if you did have the training data + recipe, the compute cost makes training it yourself totally impractical.

The architecture of these models is no secret - what's withheld is the training data (incl. for post-training) and the training recipe. So a more practical push might be for models trained only on public training data, which the community could share and potentially contribute to.

nyrp

> binary form that cannot be recreated from scratch

Back in my day, we called it "freeware"

poly2it

You have more rights over a freely licensed binary file than over a freeware file.

danielmarkbruce

"open source" has come to mean "open weight" in model land. It is what it is. Words are used for communication, you are the one misusing the words.

You can update the weights of the model, continue to train, whatever. Nobody is stopping you.

moffkalast

I'd agree, but that's beyond hopelessly idealistic. That sort of approach only helps your competition, who will use it to build a closed product, and gives nothing of worth to people who want to actually use the model, because they have no means to train it. Hell, most people can barely scrape up enough hardware to even run inference.

Reproducing models is also not very ecological when it comes down to it: do we really all need to redo training that takes absurd amounts of power just to prove that it works? At least change the dataset to try to get a better result and provide another data point, but most people don't have the know-how for that anyway.

Funnily enough, Nvidia does try this approach sometimes: they publish cool results with no model, in hopes of getting people to buy their rented compute and their latest training-platform-as-a-service...

jstummbillig

> I'd be much more excited to see what level of coding and reasoning performance can be wrung out of a much smaller LLM + agent

Well, I think you are seeing that already? It's not like these models don't exist or that nobody tried to make them good; it's just that the results are not super great.

And why would they be? Why would the good models (the ones that are barely okay at coding) be big, if it were currently possible to build a good model that is small?

Of course new ideas will be found and that may change in the future, but it's not like the makers of small models find great optimizations that frontier-model makers would not also use or have thought of already.

HarHarVeryFunny

Sure, but that's the point ... today's locally runnable models are a long way behind SOTA capability, so it'd be nice to see more research and experimentation in that direction. Maybe a zoo of highly specialized small models + agents for S/W development - one for planning, one for coding, etc?

pzo

Even if it's pay-to-play, companies like Moonshot AI help you pay less.

You can run the previous Kimi K2 non-thinking model on Groq, for example, at 720 tok/s and $1/$3 per million input/output tokens. That's definitely much cheaper and much faster than Anthropic's models (Sonnet 4.5: 60 tok/s, $3/$15).
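
As a back-of-the-envelope check on those numbers (a made-up workload; provider prices change often):

  def job_cost(in_tok, out_tok, in_price, out_price):
      # prices in USD per million tokens
      return in_tok / 1e6 * in_price + out_tok / 1e6 * out_price

  # Hypothetical agent session: 5M input tokens, 1M output tokens.
  print(job_cost(5e6, 1e6, 1, 3))   # Kimi K2 on Groq:  $8.00
  print(job_cost(5e6, 1e6, 3, 15))  # Sonnet 4.5:      $30.00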

pshirshov

> The ideal case would be something that can be run locally, or at least on a modest/inexpensive cluster.

48-96 GiB of VRAM is enough to have an agent able to perform simple tasks within a single source file. That's the sad truth. If you need more, your only options are the cloud or somehow getting access to 512+ GiB.
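
The rough arithmetic behind those figures (weights only; the KV cache and activations add more on top):

  def weight_gib(params_billion, bits_per_param):
      # memory for the weights alone: params * bits / 8, in GiB
      return params_billion * 1e9 * bits_per_param / 8 / 2**30

  print(weight_gib(70, 4))    # ~33 GiB: a 70B model at 4-bit fits in 48 GiB
  print(weight_gib(1000, 4))  # ~466 GiB: a 1T model needs the 512+ GiB class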

pdyc

I think that's where prompt engineering would be needed. Bigger models produce good output even with ambiguous prompts. Getting similar output from smaller models is an art.
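
As an illustration (made-up prompts), the art is mostly spelling out the constraints a frontier model would infer on its own:

  ambiguous = "Summarize this log file."

  # What a smaller model typically needs instead:
  explicit = (
      "Summarize this log file.\n"
      "- Exactly 3 bullet points, one sentence each.\n"
      "- Cover only ERROR-level lines; ignore INFO/DEBUG.\n"
      "- If there are no errors, reply with the single word: CLEAN."
  )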

pu_pe

Four independent Chinese companies released extremely good open source models in the past few months (DeepSeek, Qwen/Alibaba, Kimi/Moonshot, GLM/Z.ai). No American or European companies are doing that, including titans like Meta. What gives?

seunosewa

The Chinese are doing it because they don't have access to enough of the latest GPUs to run their own models. Americans aren't doing this because they need to recoup the cost of their massive GPU investments.

the_mitsuhiko

And Europeans aren't doing it because, quite frankly, we're not really doing anything particularly impressive with AI, sadly.

speedgoose

To misquote the French president, "Who could have predicted?".

https://fr.wikipedia.org/wiki/Qui_aurait_pu_pr%C3%A9dire

sampton

Meta gave up on the open-weight path after DeepSeek.

gordonhart

It’s more fair to say they gave up after the Llama 4 disaster.

RIMR

Love their nonsense excuse that they are trying to protect us from misuse of "superintelligence".

>“We believe the benefits of superintelligence should be shared with the world as broadly as possible. That said, superintelligence will raise novel safety concerns. We’ll need to be rigorous about mitigating these risks and careful about what we choose to open source.” -Mark Zuckerberg

Meta has shown us daily that they have no interest in protecting anything but their profits. They certainly don't intend to protect people from the harm their technology may do.

They just know that saying "this is profitable enough for us to keep it proprietary and restrict it to our own paid ecosystem" will make the enthusiasts running local Llama models mad at them.

sheepscreek

I am sure they cherry-picked the examples, but still, wow. Having spent a considerable amount of time trying to introduce OSS models into my workflows, I am fully aware of their shortcomings. Even frontier models would struggle with such outputs (unless you lead the way, help break things down, and maybe even use sub-agents).

Very impressed with the progress. Keeps me excited about what’s to come next!

lxe

This was the only recent model that provided a human solution to the stacking problem on the first try. https://medium.com/@nathanbos/gpt-4-may-have-general-intelli...

The key here is understanding that 9 fragile eggs can distribute the weight without cracking. The other insight is understanding intuitively what stacking means: arranging things around certain objects doesn't make any sense.

stingraycharles

Available on OpenRouter already as well in case anyone wants to try it there: https://openrouter.ai/moonshotai/kimi-k2-thinking

am17an

The non-thinking version is the best writer by far. Excited for this one! They really cooked something different from the other frontier labs.

spaceman_2020

Kimi K2 has a very good model feel. It was made with taste.

chvid

So Apple is about to pay OpenAI $1B per year for what Moonshot is giving away for free?

wmf

You haven't seen Gemini 3 yet. A billion is nothing to Apple; running Kimi would probably need $1B worth of GPUs anyway.

Alifatisk

Can't wait for the Artificial Analysis benchmarks. Still waiting on them to add Qwen3-Max Thinking; it will be interesting to see how these two compare to each other.

osti

Qwen3-Max has been getting rather bad reviews around the web (both on Reddit and Chinese social media), and in my own experience as well. So I wouldn't expect this to be worse.

SamDc73

Also, my experience with it wasn't that good, but it was looking good on benchmarks.

It seems like benchmark-maxing: what you do when you're out of tricks?

Alifatisk

Ohhh, so Qwen3 235B-A22B-2507 is still better?

htrp

Did the ArtificialAnalysis team get bored or something? What makes a model worthy of benchmark inclusion?

emsign

> 200 to 300 consecutive tool calls

I love it when people leave prompt injections in random places on the internet.

enigma101

What's the hardware needed to run the trillion-parameter model?

trvz

To start with, an Epyc server or Mac Studio with 512GB RAM.

CamperBob2

It's an MoE model, so it might not be that bad. The deployment guide at https://huggingface.co/moonshotai/Kimi-K2-Thinking/blob/main... suggests that the full, unquantized model can be run at ~46 tps on a dual-CPU machine with 8× NVIDIA L20 boards.

Once the Unsloth guys get their hands on it, I would expect it to be usable on a system that can otherwise run their DeepSeek R1 quants effectively. You could keep an eye on https://old.reddit.com/r/LocalLlama for user reports.
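
A crude way to see why MoE makes this tractable: decode speed is roughly bounded by memory bandwidth divided by the bytes touched per token, and only the ~32B active parameters are touched. A sketch with illustrative numbers (not measurements):

  def tps_upper_bound(active_params_b, bits, bandwidth_gb_s):
      # tokens/sec if decoding were purely memory-bandwidth bound
      bytes_per_token = active_params_b * 1e9 * bits / 8
      return bandwidth_gb_s * 1e9 / bytes_per_token

  # e.g. ~800 GB/s of effective bandwidth streaming 4-bit weights:
  print(tps_upper_bound(32, 4, 800))  # ~50 tok/s upper bound for this hypothetical setup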