Xiaomi MiMo Reasoning Model
99 comments · April 30, 2025
lvl155
whynotmaybe
Haven't we reached a situation where English is the de facto language of scientific research, especially AI benchmarks?
It's clearly impossible for me to try anything in Chinese, I'd need a translation.
chvid
All LLMs are trained on the same basic blob of data - mostly in English, mostly pirated books and stuff.
enlyth
I assume a large portion of high quality training material is in English
sigmoid10
You'd be correct. The largest portion of all languages in Common Crawl (aka the "whole open internet" training corpus) is English with 43%. No other language even reaches double digit percentages. The next biggest one is Russian at 6%, followed by German at 5%.
choutianxius
One reason is that there is no "good" search engine in China. The most popular one, Baidu, is like garbage compared to Google search. The most useful training data in Chinese would likely be from the social media and video sharing platforms, which I guess is much more difficult to crawl and clean up.
mensetmanusman
English won. Chinese youth now struggle to handwrite characters they can read. Typing favors English.
rahimnathwani
It's easy and fast to type Chinese sentences using a keyboard.
throwaway519
The pendulum has already swung back. The current generation under 20 grew up with touchscreens, which obsoletes keyboard-based pinyin input; many don't care if the device has no keyboard.
bilbo0s
The mandarin language models obviously exist, but what would you do with them if they provided access to them? And what knowledge would be in them? What is the body of knowledge encoded in Mandarin? What does that look like?
Sad reality is that not many outside of China have the facility with Mandarin to use those models. Even non-native Mandarin speakers who claim to be "fluent" often mess up intended meaning in text, or make literal translations that wind up making no sense.
Inside of China, llm use will be Mandarin based. Outside, it seems to me English is the natural choice.
Irony of ironies: probably the best way for a non-Mandarin-speaking layman to test a Mandarin-based model would be to use another LLM to translate prompts into Mandarin.
It's a sad future we're looking at.
Or a brilliant one.
Time will tell.
johnla
For it to be brilliant, AI needs to be a benevolent tool all the time. It would take just a few malignant actors to turn our world upside down. I suspect it'll follow the same path as the Internet and social media: great at first, grow markets, bring us together, and then take a turn.
34679
Nearly everyone in the urban areas of China spoke some English when I visited way back in 1995. It's a bilingual society.
crazygringo
This is not true. I was in Beijing around then and never met a single person who spoke English if they hadn't learned it for professional reasons (they worked in tourism, international business, etc.).
It could not have been further from a bilingual society.
rahimnathwani
I lived in Beijing and Shanghai for 9 years (2010-2019) and this is NOT my impression at all.
rahimnathwani
When you guys use gguf files in ollama, do you normally create a modelfile to go with it, or just hope that whatever default ollama has work with the new model?
https://github.com/ollama/ollama/blob/main/docs%2Fmodelfile....
monkmartinez
If you ollama pull <model>, the modelfile will be downloaded along with the blob. To modify the model permanently, you can copy the modelfile into a text editor and then create a new model from the old modelfile with the changes you require.
Here is my workflow when using Open WebUI:
1. ollama show qwen3:30b-a3b-q8_0 --modelfile
2. Paste the contents of the modelfile into Open WebUI -> admin -> models and rename it qwen3:30b-a3b-q8_0-monkversion-1
3. Change parameters like num_gpu 90 to change layers... etc.
4. Keep | Delete old file
Pay attention to the modelfile; it will show you something like: # To build a new Modelfile based on this, replace FROM with: # FROM qwen3:30b-a3b-q8_0 — and you need to make sure the paths are correct. As an example of why that matters, I store my models on a large NVMe drive that isn't ollama's default location.
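The same workflow can be done purely from the CLI (a sketch; the model name and parameter values below are just examples, not recommendations):

```shell
# Dump the modelfile ollama generated for an already-pulled model
ollama show qwen3:30b-a3b-q8_0 --modelfile > Modelfile

# Edit Modelfile: check the FROM line points at a valid model/blob path,
# then add or change parameters, e.g.:
#   PARAMETER num_gpu 90     # number of layers to offload to GPU
#   PARAMETER num_ctx 8192   # context window size

# Register the edited modelfile under a new name and run it
ollama create qwen3-30b-custom -f Modelfile
ollama run qwen3-30b-custom
```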
memhole
I’ll typically use the defaults initially and then use a Modelfile if it’s something I plan on using. I think you can dump the modelfile ollama uses to have a template to work with.
gizmodo59
It's funny to see benchmarks that omit the top-performing models like o3 (which is the best model in many benchmarks currently) and Gemini Pro/Claude 3.7.
daveguy
Those are much much larger models, and they are proprietary. Those model providers just don't have the distilled versions identified and available.
Notice most of the models they are comparing with are 7B models. The exception is also an open weights model (Qwen-2.5-32B-RL-Zero). Even with 32B parameters the MiMo-7B outperforms it.
vessenes
Umm wow. Great benchmarks. I’m looking forward to chatting with this one.
A couple things stand out to me — first is that the 7B model is trained on 25T tokens(!). This is Meta-scale training; Llama 4 Maverick was trained on 22T or so. (Scout, the smaller model: 40T).
Second, this is an interesting path to take - not a distilled model or an RL layer to get reasoning out of another model, but a from-scratch RL model with reasoning baked in; the claims seem to indicate you get a lot of extra efficiency per-parameter doing this.
I don’t have experience with Xiaomi models, so I’m cautious about this one until I play with it, but it looks like a super viable local reasoning model from the stats.
jedisct1
GGUF version (for LM Studio, Ollama, etc): https://huggingface.co/jedisct1/MiMo-7B-RL-GGUF
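If you just want to try it, ollama can pull GGUF repos directly from Hugging Face (the quantization tag below is an example; check the repo for which quants actually exist):

```shell
# Run a GGUF straight from a Hugging Face repo
ollama run hf.co/jedisct1/MiMo-7B-RL-GGUF

# Or pin a specific quantization tag, e.g.:
ollama run hf.co/jedisct1/MiMo-7B-RL-GGUF:Q4_K_M
```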
Jotalea
I wonder if they will use this model for their AI assistant on their Xiaomi 15 series phones. They most likely will. I'm not really sure what to expect from it.
m4r1k
My Chinese friend told me MiMo doesn’t have a meaning in Chinese (of course Mi 米 = rice). Anybody have a clue for what it stands for?
gandalfgreybeer
A lot of Xiaomi products have the prefix Mi. My initial guess is Mo is for model.
Also related reference https://en.wikipedia.org/wiki/Xiaomi#Name_etymology
nicman23
probably μίμος (mime)
ramesh31
These benchmark numbers cannot be real for a 7b model
strangescript
The smaller models have been creeping upward. They don't make headlines because they aren't leapfrogging the mainline models from the big companies, but they are all very capable.
I loaded up a random 12B model on ollama the other day and couldn't believe how competent it seemed and how fast it was given the machine I was on. A year or so ago, that would not have been the case.
apples_oranges
exactly, it seems to validate my assumption from some time ago, that we will mostly use local models for everyday tasks.
pzo
yeah, especially since this simplifies things for third-party mobile app developers: no extra cost, no need to set up a proxy server or monitor usage to detect abuse, and no need to make a complicated per-usage subscription plan.
We just need Google or Apple to provide their own equivalents of Ollama and OpenRouter, so users either run inference for free with local models, or bring their own key and pay for tokens/the electricity bill themselves. We then just charge a smaller fee for renting or buying our apps.
mring33621
strong agree
my employer talks about spending 10s of millions on AI
but, even at this early stage, my experiments indicate that the smaller, locally-run models are just fine for a lot of tech and business tasks
this approach has definite privacy advantages and likely has cost advantages, vs pay-per-use LLM over API.
jillesvangurp
Including figuring out which more expensive models to use when needed instead of doing that by default. Early LLMs were not great at reasoning and not great at using tools. And also not great at reproducing knowledge. Small models are too small to reliably reproduce knowledge but when trained properly they are decent enough for simple reasoning tasks. Like deciding whether to use a smarter/slower/more expensive model.
wg0
But who will keep them updated, and what incentive would they have? That's what I can't imagine. It's a bit vague.
nickip
What model? I have been using api's mostly since ollama was too slow for me.
patates
I really like Gemma 3. Some quantized version of the 27B will be good enough for a lot of things. You can also take some abliterated version[0] with zero (like zero zero) guardrails and make it write you a very interesting crime story without having to deal with the infamous "sorry but I'm a friendly and safe model and cannot do that and also think about the children" response.
[0]: https://huggingface.co/mlabonne/gemma-3-12b-it-abliterated
estsauver
Qwen3 and some of the smaller gemma's are pretty good and fast. I have a gist with my benchmark #'s here on my m4 pro max (with a whole ton of ram, but most small models will fit on a well spec'ed dev mac.)
https://gist.github.com/estsauver/a70c929398479f3166f3d69bce...
justlikereddit
Last time I did that I was also impressed, for a start.
Problem was that of a top-ten book recommendation list, only the first 3 existed; the rest was a casually blended hallucination delivered in perfect English without skipping a beat.
"You like magic? Try reading the Harlew Porthouse series by JRR Marrow, following the orphan magicians adventures in Hogwesteros"
And the further it goes toward the context limit, the deeper the descent into creative derivative madness.
It's entertaining but limited in usefulness.
omnimus
LLMs are not search engines…
bearjaws
My guess is that it is over fitted to the tests.
revel
They used RFT and there's only so many benchmarks out there, so I would be very surprised if they didn't train on the tests.
GaggiX
https://qwenlm.github.io/blog/qwen3/
Go look at the benchmark numbers of qwen3-4B if you think these are unrealistic.
mirekrusin
Today's best models will be worse models for the rest of your life.
otabdeveloper4
LLM benchmarks are mostly bullshit right now. Wait a few years until the hype cycle returns to sanity.
andrepd
Every LLM is basically being trained on benchmarks so "benchmark" as applied to LLMs is a pretty meaningless term.
mobilio
Waiting for GGUF or MLX models.
Probably within few hours will be released.
Havoc
FYI making a gguf yourself isn't hard and doesn't even need a GPU.
But yeah waiting is the easier option
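A minimal conversion sketch using llama.cpp (script and binary names are from the llama.cpp repo; the checkpoint path is a placeholder and build details may differ on your system):

```shell
# Get llama.cpp and its Python conversion dependencies
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
pip install -r requirements.txt

# Convert a downloaded Hugging Face checkpoint to an f16 GGUF (CPU-only)
python convert_hf_to_gguf.py /path/to/MiMo-7B-RL --outfile mimo-7b-f16.gguf

# Build the tools, then optionally quantize (also CPU-only)
cmake -B build && cmake --build build --config Release
./build/bin/llama-quantize mimo-7b-f16.gguf mimo-7b-q4_k_m.gguf Q4_K_M
```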
userbinator
...and searching for things related to multiple antennae just got harder.
They could've called it Xiaomimo.
arghwhat
multiple-input, multiple-output was horribly generic to begin with. Terms like multipath propagation and spatial multiplexing will do just fine.
CodeCompost
Open Source or Open Weights?
ilrwbwrkhv
At this point everybody will open-source their models or weights. The only one which will not is OpenAI.
rvz
> The only one which will not is open AI.
I think you meant Anthropic. OpenAI is "planning" to release an open weight model this year likely competing against the Llama models. [0]
I have not seen an open weight AI model ever being released by Anthropic at all.
w4yai
Anyone tried it ?
Alifatisk
No, where can I try it? I saw a huggingface link, but I wonder if they host it themselves somewhere, like how Alibaba does with Qwen Chat.
yorwba
There is a HuggingFace space (probably not official) at: https://huggingface.co/spaces/orangewong/xiaomi-mimo-7b-rl You might have to wait a minute to get a response. Also, the space doesn't seem to have turn-taking implemented, so after giving the Assistant's response, it kept on generating the Human's next message and so on and so forth.
Why are there so many English-first AI models from China? Are they not interested in serving their own population? Or is it that if they publish Chinese-first models it won't get publicity in the West?