Ollama Turbo
109 comments
August 5, 2025
extr
Nice release. Part of the problem right now with OSS models (at least for enterprise users) is the diversity of offerings in terms of:
- Speed
- Cost
- Reliability
- Feature Parity (eg: context caching)
- Performance (What quant level is being used...really?)
- Host region/data privacy guarantees
- LTS
And that's not even including the decision of what model you want to use!
Realistically, if you want to use an OSS model instead of the big 3, you're faced with evaluating models/providers across all these axes, which can require a fair amount of expertise to discern. You may even have to write your own custom evaluations. Meanwhile Anthropic/OAI/Google "just work" and you get what it says on the tin, to the best of their ability. Even if they're more expensive (and they're not that much more expensive), you are basically paying for the privilege of "we'll handle everything for you".
I think until providers start standardizing OSS offerings, we're going to continue to exist in this in-between world where OSS models theoretically are at performance parity with closed source, but in practice aren't really even in the running for serious large scale deployments.
coderatlarge
True, but that ignores handing over all your prompt traffic without any real legal protections, as sama has pointed out:
[1] https://californiarecorder.com/sam-altman-requires-ai-privil...
supermatt
> OpenAI confirmed it has been preserving deleted and non permanent person chat logs since mid-Might 2025 in response to a federal court docket order
> The order, embedded under and issued on Might 13, 2025, by U.S. Justice of the Peace Decide Ona T. Wang
Is this some meme where “may” is being replaced with “might”, or some word substitution gone awry? I don’t get it.
kekebo
:)) Apparently. I don't have a better guess. Well spotted
dcreater
Called it.
It's very unfortunate that the local inference community has coalesced around Ollama when it's clear that's not their long-term priority or strategy.
It's imperative we move away ASAP.
janalsncm
Huggingface also offers a cloud product, but that doesn’t take away from downloading weights and running them locally.
tarruda
Llama.cpp (the library which ollama uses under the hood) has its own server, and it is fully compatible with open-webui.
I moved away from ollama in favor of llama-server a couple of months ago and never missed anything, since I'm still using the same UI.
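To illustrate why the swap is painless: llama-server speaks an OpenAI-compatible API, so the same clients (open-webui included) keep working. A minimal sketch, assuming llama-server is already running on its default port 8080; the model file and prompt below are placeholders:

```python
# Start the server first, e.g.: llama-server -m ./some-model.gguf --port 8080
# llama-server exposes an OpenAI-compatible API, which is why UIs like
# open-webui (and the OpenAI SDK below) can talk to it unchanged.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",  # llama-server's OpenAI-compatible endpoint
    api_key="not-needed",                 # the SDK requires a value; llama-server ignores it
)

resp = client.chat.completions.create(
    model="local",  # llama-server serves whichever model it was started with
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(resp.choices[0].message.content)
```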
mrcwinn
Yes, better to get free sh*t unsustainably. By the way, you're free to create an open source alternative and pour your time into that so we can all benefit. But when you don't — remember I called it!
mchiang
hmm, how so? Ollama is open and the pricing is completely optional for users who want additional GPUs.
Is it bad to charge fairly for access to GPUs that cost us money too, and to use that money to grow the core open-source project?
At some point, it just has to be reasonable. I'd like to believe that by being conscientious, we can create something great.
idiotsecant
Oh no this is a positively diabolical development, offering...hosting services tailored to a specific use case at a reasonable price ...
jasonjmcghee
Interested to see how this plays out - I feel like Ollama is synonymous with "local".
Aurornis
There's a small but vocal minority of users who don't trust big companies, but don't mind paying small companies for a similar service.
I'm also interested to see if that small minority of people are willing to pay for a service like this.
recursivegirth
Ollama, run by Facebook. Small company, huh.
mchiang
Ollama is not run by Facebook. We are a small team building our dreams.
threetonesun
I view it a bit like I do cloud gaming, 90% of the time I'm fine with local use, but sometimes it's just more cost effective to offload the cost of hardware to someone else. But it's not an all-or-nothing decision.
paxys
A subscription fee for API usage is definitely an interesting offering, though the actual value will depend on usage limits (which are kept hidden).
mchiang
We are still learning the usage patterns so we can price this properly.
jacekm
What could be the benefit of paying $20 to Ollama to run inferior models instead of paying the same amount of money to e.g. OpenAI for access to sota models?
daft_pink
I feel the primary benefit of this Ollama Turbo is that you can quickly test and run different models in the cloud that you could run locally if you had the correct hardware.
This allows you to try out some open models and better assess if you could buy a dgx box or Mac Studio with a lot of unified memory and build out what you want to do locally without actually investing in very expensive hardware.
Certain applications require good privacy control and on-prem and local are something certain financial/medical/law developers want. This allows you to build something and test it on non-private data and then drop in real local hardware later in the process.
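Something like the sketch below is the workflow being described: prototype against a hosted endpoint on non-private data, then flip to local hardware later. The hosted base URL and API key handling here are placeholders (assumptions, not Ollama's documented Turbo endpoint); the local endpoint is Ollama's standard OpenAI-compatible one on port 11434.

```python
from openai import OpenAI

USE_LOCAL = False  # flip this once the on-prem hardware arrives

if USE_LOCAL:
    # A local Ollama install exposes an OpenAI-compatible API on port 11434.
    client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
else:
    # Hypothetical hosted endpoint for prototyping on non-private data;
    # substitute the real URL and key from your account.
    client = OpenAI(base_url="https://ollama.example.com/v1", api_key="YOUR_KEY")

resp = client.chat.completions.create(
    model="gpt-oss:120b",
    messages=[{"role": "user", "content": "Summarize this (non-confidential) test document."}],
)
print(resp.choices[0].message.content)
```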
_--__--__
Groq seems to do okay with a similar service but I think their pricing is probably better.
Geezus_42
Yeah, the NAZI sex bot will be great for business!
fredoliveira
Groq (the inference service) != Grok (xAI's model)
gabagool
You are thinking of Elon Grok, not Groq
rapind
I'm not sure the major models will remain at $20. Regardless, I support any and all efforts to keep the space crowded and competitive.
michelsedgh
I think data privacy is the main point, and probably more usage before you hit limits? But mainly data privacy, I guess.
ibejoeb
I run a lot of mundane jobs that work fine with less capable models, so I can see the potential benefit. It all depends on the limits though.
AndroTux
Privacy, I guess. But at this point it’s just believing that they won’t log your data.
vanillax
nothing lmao. this is just ollama trying to make money.
irthomasthomas
If these are FP4 like the other ollama models then I'm not very interested. If I'm using an API anyway I'd rather use the full weights.
mchiang
OpenAI has only provided MXFP4 weights. These are the same weights used by other cloud providers.
liuliu
Any more information on "Privacy first"? It seems pretty thin if it just means not retaining data.
For Draw Things' "Cloud Compute", we don't retain any data either (everything is done in RAM per request). But that is still unsatisfactory to me personally. We will soon add "Privacy Pass" support, but that still won't be fully satisfactory. A transparency log that can be attested on the hardware would be nice (since we run our open-source gRPCServerCLI too), but I just don't know where to start.
pagekicker
I see no privacy advantage to working with Ollama, which can sell your data or have it subpoenaed just like anyone else.
jnmandal
I see a lot of hate for ollama doing this kind of thing but also they remain one of the easiest to use solutions for developing and testing against a model locally.
Sure, llama.cpp is the real thing and ollama is a wrapper... I would never want to use something like ollama in a production setting. But if I want to quickly get someone less technical up to speed to develop an LLM-enabled system and run qwen or w/e locally, well then it's pretty nice that they have a GUI and a .dmg to install.
mchiang
Thanks for the kind words.
Since the new multimodal engine, Ollama has moved off of llama.cpp as a wrapper. We do continue to use the GGML library, and ask hardware partners to help optimize it.
Ollama might look like a toy, and like something trivial to build. I can say that, to keep its simplicity, we go through a great deal of struggle to make it work with the experience we want.
Simplicity is often overlooked, but we want to build the world we want to see.
dcreater
But Ollama is a toy; it's meaningful for hobbyists and individuals like myself to use locally. Why would it be the right choice for anything more? AWS, vLLM, SGLang, etc. would be the solutions for enterprise.
I knew a startup that deployed ollama on a customer's premises, and when I asked them why, they had absolutely no good reason. Likely they did it because it was easy. That's not the "easy to use" case you want to solve for.
steren
> I would never want to use something like ollama in a production setting.
We benchmarked vLLM and Ollama on both startup time and tokens per second. Ollama comes out on top. We hope to be able to publish these results soon.
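(For reference, here is a generic sketch of measuring tokens per second against any OpenAI-compatible endpoint; this is not necessarily the methodology used above, and the endpoint, model name, and one-token-per-chunk approximation are assumptions. A real benchmark would use the server's reported token counts and average over many runs.)

```python
import time
from openai import OpenAI

# Point at whichever server is being measured (Ollama shown; a vLLM endpoint works the same way).
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

start = time.monotonic()
chunks = 0
stream = client.chat.completions.create(
    model="gpt-oss:20b",
    messages=[{"role": "user", "content": "Write a 200-word story."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        chunks += 1  # roughly one token per streamed chunk; an approximation
elapsed = time.monotonic() - start
print(f"~{chunks / elapsed:.1f} tokens/sec over {elapsed:.1f}s")
```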
apitman
Did you test multi-user cases?
ekianjo
you need to benchmark against llama.cpp as well.
timmg
It says “usage-based pricing” is coming soon. I think that is the sweet spot for a service like this.
I pay $20 to Anthropic, so I don’t think I’d get enough use out of this for the $20 fee. But being able to spin up any of these models and use as needed (and compare) seems extremely useful to me.
I hope this works out well for the team.
ac29
> It says “usage-based pricing” is coming soon. I think that is the sweet spot for a service like this.
Agreed, though there are already several providers of these new OpenAI models available, so I'm not sure what ollama's value add is there (there are plenty of good chat/code/etc interfaces available if you are bringing your own API keys).
domatic1
OpenRouter competition?
philip1209
Seems like an easy way to run gpt-oss for development environments on laptops. Probably necessary if you plan to self-host in production.
I am so so so confused as to why Ollama of all companies did this, other than as an emblematic stab at making money, perhaps to appease someone putting pressure on them to do so. Their stuff does a wonderful job of enabling local for those who want it. So many things to explore there, but instead they stand up yet another cloud thing? Love Ollama and hope it stays awesome.