Mistral ships Le Chat – enterprise AI assistant that can run on prem
97 comments
·May 7, 2025
codingbot3000
I think this is a game changer, because data privacy is a legitimate concern for many enterprise users.
Btw, you can also run Mistral locally within the Docker model runner on a Mac.
simonw
There are plenty of other ways to run Mistral models on a Mac. I'm a big fan of Mistral Small 3.1.
I've run that using both Ollama (easiest) and MLX. Here are the Ollama models: https://ollama.com/library/mistral-small3.1/tags - the 15GB one works fine.
For MLX https://huggingface.co/mlx-community/Mistral-Small-3.1-24B-I... and https://huggingface.co/mlx-community/Mistral-Small-3.1-24B-I... should work, I use the 8bit one like this:
llm install llm-mlx
llm mlx download-model mlx-community/Mistral-Small-3.1-Text-24B-Instruct-2503-8bit -a mistral-small-3.1
llm chat -m mistral-small-3.1
The Ollama one supports image inputs too:
llm install llm-ollama
ollama pull mistral-small3.1
llm -m mistral-small3.1 'describe this image' \
-a https://static.simonwillison.net/static/2025/Mpaboundrycdfw-1.png
Output here: https://gist.github.com/simonw/89005e8aa2daef82c53c2c2c62207...
indigodaddy
Simon, can you recommend some small models that would be usable for coding on a standard M4 Mac Mini (only 16G ram) ?
simonw
That's pretty tough - the problem is that you need to have RAM left over to run actual applications!
Qwen 3 8B on MLX runs in just 5GB of RAM and can write basic code but I don't know if it would be good enough for anything interesting: https://simonwillison.net/2025/May/2/qwen3-8b/
Honestly though with that little memory I'd stick to running against hosted LLMs - Claude 3.7 Sonnet, Gemini 2.5 Pro, o4-mini are all cheap enough that it's hard to spend much money with them for most coding workflows.
jychang
16GB on a mac with unified memory is too small for good coding models. Anything on that machine is severely compromised. Maybe in ~1 year we will see better models that fit in ~8gb vram, but not yet.
Right now, for a coding LLM on a Mac, the standard is Qwen 3 32b, which runs great on any M1 mac with 32gb memory or better. Qwen 3 235b is better, but fewer people have 128gb memory.
Anything smaller than 32b, you start seeing a big drop off in quality. Qwen 3 14b Q4_K_M is probably your best option at 16gb memory, but it's significantly worse in quality than 32b.
reichardt
With around 4.6 GiB model size the new Qwen3-8B quantized to 4-bit should fit comfortably in 16 GiB of memory: https://huggingface.co/mlx-community/Qwen3-8B-4bit
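If you're already on simonw's llm-mlx setup from upthread, something like this should work to try it (the alias is just my own choice, and I haven't run this exact model):
llm install llm-mlx
llm mlx download-model mlx-community/Qwen3-8B-4bit -a qwen3-8b-4bit
llm chat -m qwen3-8b-4bit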
kergonath
> I think this is a game changer, because data privacy is a legitimate concern for many enterprise users.
Indeed. At work, we are experimenting with this. Using a cloud platform is a non-starter for data confidentiality reasons. On-premise is the way to go. Also, they’re not American, which helps.
> Btw, you can also run Mistral locally within the Docker model runner on a Mac.
True, but you can do that only with their open-weight models, right? They are very useful and work well, but their commercial models are bigger and hopefully better (I use some of their free models every day, but none of their commercial ones).
distances
I also kind of don't understand how it seems everyone is using AI for coding. I haven't had a client yet that would approve any external AI usage. So I basically use them as search engines on steroids, but code can't go directly in or out.
Pamar
Personally I am trying to see if we can leverage AI to help write design documents instead of code, based on a fairly large library of human (poorly) written design documents and bug reports.
fhd2
You might be able to get your clients to sign something to allow usage, but if you don't, as you say, it doesn't seem wise to vibe code for them. For two reasons:
1. A typical contract transfers the rights to the work. The ownership of AI generated code is legally a wee bit disputed. If you modify and refactor generated code heavily it's probably fine, but if you just accept AI generated code en masse, making your client think that you wrote it and it is therefore their copyright, that seems dangerous.
2. A typical contract or NDA also contains non disclosure, i.e. you can't share confidential information, e.g. code (including code you _just_ wrote, due to #1) with external parties or the general public willy nilly. Whether any terms of service assurances from OpenAI or Anthropic that your model inputs and outputs will probably not be used for training are legally sufficient, I have doubts.
IANAL, and _perhaps_ I'm wrong about one or both of these, in one or more countries, but by and large I'd say the risk is not worth the benefit.
I mostly use third party LLMs like I would StackOverflow: Don't post company code there verbatim, make an isolated example. And also don't paste from SO verbatim. I tried other ways of using LLMs for programming a few times in personal projects and can't say I worry about lower productivity with these limitations. YMMV.
(All this also generally goes for employees with typical employment contracts: It's probably a contract violation.)
mark_l_watson
I have good results running Ollama locally with open models like Gemma 3, Qwen 3, etc. The major drawback is slower inference speed. Commercial APIs like Google Gemini are so much faster.
Still, I find local models very much worth using after taking the time to set them up with Emacs, open-codex, etc.
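For anyone who hasn't tried it, the whole setup is a couple of commands, roughly like this (the model tag is from memory, check the Ollama library for the exact names):
ollama pull qwen3:8b
ollama run qwen3:8b "Explain what a monad is in one paragraph"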
abujazar
You can set up your IDE to use local LLMs through e.g. Ollama if your computer is powerful enough to run a decent model.
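Most IDE plugins that speak the OpenAI API can just be pointed at Ollama's local endpoint. A minimal sketch, reusing the model from simonw's example above (adjust the model name to whatever you've pulled):
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "mistral-small3.1", "messages": [{"role": "user", "content": "Write a hello world in Go"}]}'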
shmel
How is it different from the cloud? Plenty startups store their code on github, run prod on aws, and keep all communications on gmail anyway. What's so different about LLMs?
trollbridge
Most of my clients have the same requirement. Given the code bases I see my competition generating, I suspect other vendors are simply violating this rule.
betterThanTexas
I would take any such claim with a heavy rock of salt because the usefulness of AI is going to vary drastically with the sort of work you're tasked with producing.
crimsoneer
Are your clients not on AWS/Azure/GCP? They all offer private LLMs out of the box now.
ATechGuy
Have you tried using private inference that uses GPU confidential computing from Nvidia?
lolinder
Game changer feels a bit strong. This is a new entry in a field that's already pretty crowded with open source tooling that's already available to anyone with the time and desire to wire it all up. It's likely that they execute this better than the community-run projects have so far and make it more approachable and Enterprise friendly, but just for reference I have most of the features that they've listed here already set up on my desktop at home with Ollama, Open WebUI, and a collection of small hand-rolled apps that plug into them. I can't run very big models on mine, obviously, but if I were an Enterprise I would.
The key thing they'd need to nail to make this better than what's already out there is the integrations. If they can make it seamless to integrate with all the key third-party enterprise systems then they'll have something strong here, otherwise it's not obvious how much they're adding over Open WebUI, LibreChat, and the other self-hosted AI agent tooling that's already available.
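For reference, the Open WebUI part of that stack is roughly one docker command against a local Ollama. This is the quick-start from their README as I remember it, so double-check the flags:
docker run -d -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -v open-webui:/app/backend/data \
  --name open-webui ghcr.io/open-webui/open-webui:main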
abujazar
Actually, you shouldn't be running LLMs in Docker on a Mac, because it doesn't have GPU support. The larger models will be extremely slow, if they produce a single token at all.
burnte
I have an M4 Mac Mini with 24GB of RAM. I loaded LM Studio on it two days ago and had Mistral NeMo running in ten minutes. It's a great model. I need to figure out how to add my own writing to it; I want it to generate some starter letters for me. Impressive model.
nicce
> Btw, you can also run Mistral locally within the Docker model runner on a Mac.
Efficiently? I thought macOS doesn't have an API that would let Docker use the GPU.
jt_b
I haven't/wouldn't use it because I have a decent K8S ollama/open-webui setup, but docker announced this a month ago: https://www.docker.com/blog/introducing-docker-model-runner
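From the announcement it looks something like this, though I haven't tried it myself and the model namespace/syntax may differ by Docker Desktop version:
docker model pull ai/smollm2
docker model run ai/smollm2 "Give me a fact about whales."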
nicce
Hmm, I guess that is not actually running inside a container / there is no isolation. It's some kind of new approach that mixes llama.cpp, the OCI format, and the docker CLI.
ulnarkressty
I think many in this thread are underestimating the desire of VPs and CTOs to just offload the risk somewhere else. Quite a lot of companies handling sensitive data are already using various services in the cloud and it hasn't been a problem before - even in Europe with its GDPR laws. Just sign an NDA or whatever with OpenAI/Google/etc. and if any data gets leaked they are on the hook.
boringg
Good luck ever winning that one. How are you going to prove out a data leak with an AI model without deploying excessive amounts of legal spend?
You might be talking about small tech companies that have no other options.
v3ss0n
What's the point when we can run much more powerful models now, like Qwen3 and DeepSeek?
_bin_
It would be short-termist for Americans or Europeans to use Chinese-made models. Increasing their popularity has an indirect but significant cost in the long term. China "winning AI" should be an unacceptable outcome for America or Europe by any means necessary.
Havoc
Not quite following. It seems to talk about features commonly associated with local servers, but then ends with availability on GCP.
Is this an API point? A model enterprises deploy locally? A piece of software plus a local model?
There is so much corporate synergy speak there I can’t tell what they’re selling
85392_school
This announcement accompanies the new and proprietary Mistral Medium 3, being discussed at https://news.ycombinator.com/item?id=43915995
_pdp_
While I am rooting for Mistral, having access to a diverse set of models is the killer app IMHO. Sometimes you want to code. Sometimes you want to write. Not all models are made equal.
the_clarence
Tbh I think the one-general-model approach is winning. People don't want to figure out which model is better at what unless it's for a very specific task.
sschueller
Couldn't you place a very lightweight model in front to figure out which model to use?
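A crude sketch of that with the llm CLI from upthread (the model aliases are placeholders for whatever you have installed locally):
#!/bin/sh
# ask a small local model to classify the request, then dispatch to a bigger one
PROMPT="$1"
ROUTE=$(llm -m qwen3-8b-4bit "Reply with exactly one word, code or chat: $PROMPT")
case "$ROUTE" in
  *code*) llm -m qwen3-32b "$PROMPT" ;;
  *) llm -m mistral-small-3.1 "$PROMPT" ;;
esac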
binsquare
Well that sounds right up the alley of what I built here: www.labophase.com
victorbjorklund
Why use this instead of an open source model?
dlachausse
> our world-class AI engineering team offers support all the way through to value delivery.
victorbjorklund
Guess that makes sense. But I'm sure they charge good money for it and then you could just use that money for someone helping you with an open source model.
iamnotagenius
[dead]
I_am_tiberius
I really love using Le Chat. I feel much safer giving information to them than to OpenAI.
badmonster
Interesting take. I wonder if other LLM competitors will do the same.
starik36
I don't see any mention of hardware requirements for on prem. What GPUs? How many? Disk space?
tootie
I'm guessing it's flexible. Mistral makes small models capable of running on consumer hardware so they can probably scale up and down based on needs. And what is available from hosts.
FuriouslyAdrift
GPT4All has been running locally for quite a while...
phupt26
Mistral's other new model (Medium 3) is great too. Link: https://newscvg.com/r/yGbLTWqQ