Ollama Turbo
109 comments
August 5, 2025
extr
Nice release. Part of the problem right now with OSS models (at least for enterprise users) is the diversity of offerings in terms of:
- Speed
- Cost
- Reliability
- Feature Parity (eg: context caching)
- Performance (What quant level is being used...really?)
- Host region/data privacy guarantees
- LTS
And that's not even including the decision of what model you want to use!
Realistically, if you want to use an OSS model instead of the big 3, you're faced with evaluating models/providers across all these axes, which can require a fair amount of expertise to discern. You may even have to write your own custom evaluations. Meanwhile Anthropic/OAI/Google "just work" and you get what it says on the tin, to the best of their ability. Even if they're more expensive (and they're not that much more expensive), you are basically paying for the privilege of "we'll handle everything for you".
I think until providers start standardizing OSS offerings, we're going to continue to exist in this in-between world where OSS models theoretically are at performance parity with closed source, but in practice aren't really even in the running for serious large scale deployments.
coderatlarge
True, but that ignores handing over all your prompt traffic without any real legal protections, as sama has pointed out:
[1] https://californiarecorder.com/sam-altman-requires-ai-privil...
supermatt
> OpenAI confirmed it has been preserving deleted and non permanent person chat logs since mid-Might 2025 in response to a federal court docket order
> The order, embedded under and issued on Might 13, 2025, by U.S. Justice of the Peace Decide Ona T. Wang
Is this some meme where “may” is being replaced with “might”, or some word substitution gone awry? I don’t get it.
kekebo
:)) Apparently. I don't have a better guess. Well spotted
dcreater
Called it.
It's very unfortunate that the local inference community has coalesced around Ollama when it's clear that's not their long-term priority or strategy.
It's imperative we move away ASAP.
janalsncm
Huggingface also offers a cloud product, but that doesn’t take away from downloading weights and running them locally.
tarruda
Llama.cpp (the library which ollama uses under the hood) has its own server, and it is fully compatible with open-webui.
I moved away from ollama in favor of llama-server a couple of months ago and never missed anything, since I'm still using the same UI.
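To illustrate why the swap is painless: llama-server speaks an OpenAI-compatible API, so the same clients (open-webui included) keep working. A minimal sketch, assuming llama-server is already running on its default port 8080; the model file and prompt below are placeholders:

```python
# Start the server first, e.g.: llama-server -m ./some-model.gguf --port 8080
# llama-server exposes an OpenAI-compatible API, which is why UIs like
# open-webui (and the OpenAI SDK below) can talk to it unchanged.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",  # llama-server's OpenAI-compatible endpoint
    api_key="not-needed",                 # the SDK requires a value; llama-server ignores it
)

resp = client.chat.completions.create(
    model="local",  # llama-server serves whichever model it was started with
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(resp.choices[0].message.content)
```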
mrcwinn
Yes, better to get free sh*t unsustainably. By the way, you're free to create an open source alternative and pour your time into that so we can all benefit. But when you don't — remember I called it!
mchiang
hmm, how so? Ollama is open and the pricing is completely optional for users who want additional GPUs.
Is it bad to charge fairly for access to GPUs that cost us money too, and to use that money to grow the core open-source project?
At some point, it just has to be reasonable. I'd like to believe that by being conscientious, we can create something great.
idiotsecant
Oh no this is a positively diabolical development, offering...hosting services tailored to a specific use case at a reasonable price ...
jasonjmcghee
Interested to see how this plays out - I feel like Ollama is synonymous with "local".
Aurornis
There's a small but vocal minority of users who don't trust big companies, but don't mind paying small companies for a similar service.
I'm also interested to see if that small minority of people are willing to pay for a service like this.
recursivegirth
Ollama, run by Facebook. Small company, huh.
mchiang
Ollama is not run by Facebook. We are a small team building our dreams.
threetonesun
I view it a bit like I do cloud gaming, 90% of the time I'm fine with local use, but sometimes it's just more cost effective to offload the cost of hardware to someone else. But it's not an all-or-nothing decision.
paxys
A subscription fee for API usage is definitely an interesting offering, though the actual value will depend on usage limits (which are kept hidden).
mchiang
We are still learning the usage patterns so we can price this properly.
jacekm
What could be the benefit of paying $20 to Ollama to run inferior models instead of paying the same amount of money to e.g. OpenAI for access to sota models?
daft_pink
I feel the primary benefit of this Ollama Turbo is that you can quickly test and run different models in the cloud that you could run locally if you had the correct hardware.
This allows you to try out some open models and better assess if you could buy a dgx box or Mac Studio with a lot of unified memory and build out what you want to do locally without actually investing in very expensive hardware.
Certain applications require good privacy control and on-prem and local are something certain financial/medical/law developers want. This allows you to build something and test it on non-private data and then drop in real local hardware later in the process.
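Something like the sketch below is the workflow being described: prototype against a hosted endpoint on non-private data, then flip to local hardware later. The hosted base URL and API key handling here are placeholders (assumptions, not Ollama's documented Turbo endpoint); the local endpoint is Ollama's standard OpenAI-compatible one on port 11434.

```python
from openai import OpenAI

USE_LOCAL = False  # flip this once the on-prem hardware arrives

if USE_LOCAL:
    # A local Ollama install exposes an OpenAI-compatible API on port 11434.
    client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
else:
    # Hypothetical hosted endpoint for prototyping on non-private data;
    # substitute the real URL and key from your account.
    client = OpenAI(base_url="https://ollama.example.com/v1", api_key="YOUR_KEY")

resp = client.chat.completions.create(
    model="gpt-oss:120b",
    messages=[{"role": "user", "content": "Summarize this (non-confidential) test document."}],
)
print(resp.choices[0].message.content)
```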
_--__--__
Groq seems to do okay with a similar service but I think their pricing is probably better.
Geezus_42
Yeah, the NAZI sex bot will be great for business!
fredoliveira
Groq (the inference service) != Grok (xAI's model)
gabagool
You are thinking of Elon Grok, not Groq
rapind
I'm not sure the major models will remain at $20. Regardless, I support any and all efforts to keep the space crowded and competitive.
michelsedgh
I think data privacy is the main point, and probably more usage before you hit limits? But mainly data privacy, I guess.
ibejoeb
I run a lot of mundane jobs that work fine with less capable models, so I can see the potential benefit. It all depends on the limits though.
AndroTux
Privacy, I guess. But at this point it’s just believing that they won’t log your data.
vanillax
nothing lmao. this is just ollama trying to make money.
irthomasthomas
If these are FP4 like the other ollama models then I'm not very interested. If I'm using an API anyway I'd rather use the full weights.
mchiang
OpenAI has only provided MXFP4 weights. These are the same weights used by other cloud providers.
liuliu
Any more information on "Privacy first"? It seems pretty thin if it just means not retaining data.
For Draw Things' "Cloud Compute", we don't retain any data either (everything is done in RAM per request). But that is still unsatisfactory to me personally. We will soon add "Privacy Pass" support, but that still won't be fully satisfactory. A transparency log that can be attested on the hardware would be nice (since we run our open-source gRPCServerCLI too), but I just don't know where to start.
pagekicker
I see no privacy advantage to working with Ollama, which can sell your data or have it subpoenaed just like anyone else.
jnmandal
I see a lot of hate for ollama doing this kind of thing but also they remain one of the easiest to use solutions for developing and testing against a model locally.
Sure, llama.cpp is the real thing and ollama is a wrapper... I would never want to use something like ollama in a production setting. But if I want to quickly get someone less technical up to speed to develop an LLM-enabled system and run qwen or w/e locally, well then it's pretty nice that they have a GUI and a .dmg to install.
mchiang
Thanks for the kind words.
Since the new multimodal engine, Ollama has moved off of llama.cpp as a wrapper. We do continue to use the GGML library, and ask hardware partners to help optimize it.
Ollama might look like a toy, and like something trivial to build. I can say that, to keep its simplicity, we go through a great deal of struggle to make it work with the experience we want.
Simplicity is often overlooked, but we want to build the world we want to see.
dcreater
But Ollama is a toy; it's meaningful for hobbyists and individuals like myself to use locally. Why would it be the right choice for anything more? AWS, vLLM, SGLang, etc. would be the solutions for enterprise.
I knew a startup that deployed ollama on a customer's premises, and when I asked them why, they had absolutely no good reason. Likely they did it because it was easy. That's not the "easy to use" case you want to solve for.
steren
> I would never want to use something like ollama in a production setting.
We benchmarked vLLM and Ollama on both startup time and tokens per second. Ollama comes out on top. We hope to be able to publish these results soon.
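(For reference, here is a generic sketch of measuring tokens per second against any OpenAI-compatible endpoint; this is not necessarily the methodology used above, and the endpoint, model name, and one-token-per-chunk approximation are assumptions. A real benchmark would use the server's reported token counts and average over many runs.)

```python
import time
from openai import OpenAI

# Point at whichever server is being measured (Ollama shown; a vLLM endpoint works the same way).
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

start = time.monotonic()
chunks = 0
stream = client.chat.completions.create(
    model="gpt-oss:20b",
    messages=[{"role": "user", "content": "Write a 200-word story."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        chunks += 1  # roughly one token per streamed chunk; an approximation
elapsed = time.monotonic() - start
print(f"~{chunks / elapsed:.1f} tokens/sec over {elapsed:.1f}s")
```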
apitman
Did you test multi-user cases?
ekianjo
you need to benchmark against llama.cpp as well.
timmg
It says “usage-based pricing” is coming soon. I think that is the sweet spot for a service like this.
I pay $20 to Anthropic, so I don’t think I’d get enough use out of this for the $20 fee. But being able to spin up any of these models and use as needed (and compare) seems extremely useful to me.
I hope this works out well for the team.
ac29
> It says “usage-based pricing” is coming soon. I think that is the sweet spot for a service like this.
Agreed, though there are already several providers of these new OpenAI models available, so I'm not sure what ollama's value add is there (there are plenty of good chat/code/etc interfaces available if you are bringing your own API keys).
domatic1
OpenRouter competition?
philip1209
Seems like an easy way to run gpt-oss for development environments on laptops. Probably necessary if you plan to self-host in production.
I am so so so confused as to why Ollama of all companies did this, other than as an emblematic stab at making money, perhaps to appease someone putting pressure on them to do so. Their stuff does a wonderful job of enabling local for those who want it. So many things to explore there, but instead they stand up yet another cloud thing? Love Ollama and hope it stays awesome.