
Google calls Gemma 3 the most powerful AI model you can run on one GPU

cwoolfe

Apparently it can also pray. Seriously, I asked it for biblical advice about a tough situation today and it said it was praying for me. XD

rurp

This reminds me of a recent chat I had with Claude, trying to identify what looked like an unusual fossil. The responses included things along the lines of "What a neat find!" or "That's fascinating! I'd love it if you could share more details". The statements were normal nice things to hear from a friend, but I found them pretty off-putting coming from a computer that of course couldn't care less and isn't a living thing I have a relationship with.

This sort of thing worries me quite a bit. The modern internet has already sparked an awful lot of pseudo or para-social relationships through social media, OnlyFans and the like, with serious mental health and social cohesion costs. I think we're heading into a world where a lot of the remaining normal healthy social behavior gets subsumed by LLMs pretending to be your friend or romantic interest.

hengheng

Trying to find the silver lining in this makes me God's advocate I guess?

I was able to reflect a lot on my upbringing by reading reddit threads. Advice columns, relationships, parenting advice, just dealing with people. It was great to finally have a normalized, standardized world view to bounce my own concepts off. It was like an advice column in an old magazine, but infinitely big. In my early 20s I must have spent entire days on there.

I guess LLMs are the modern, ultra personalized version of that. Internet average, westernized culture, infinite supply, instantly. Just add water and enjoy a normal view of the world, no matter your surroundings or how you grew up. This is going to help out so many kids.

And they're not evil yet. Host your own LLMs before you tell them your secrets, people.

neither_color

> It was great to finally have a normalized, standardized world view to bounce my own concepts off. It was like an advice column in an old magazine, but infinitely big. In my early 20s I must have spent entire days on there. I guess LLMs are the modern, ultra personalized version of that. Internet average, westernized culture, infinite supply, instantly.

That's a really interesting way to put it and actually made me look back at my own heavily internet-influenced upbringing. Setting healthy personal boundaries? Mindfulness for emotional management? Elevated respect for all types of people and ways of life beyond what my parents were exposed to? Yes. These were not automatically taught to me by my inherited culture or family. I would not have heard about them in a transformative way without the internet. Maybe passively, as something "those weird rich people" do, but not enough to become embedded in my mental operating system. Not to disparage the old culture. I still borrow a lot from it, but yeah I like westernized internet average culture.

rurp

I was also quick to dive into early internet forums and feel like I got a lot out of them, but LLMs just seem different. Forums were a novel medium but it was still real people interacting and connecting with each other, often over a shared interest. With LLMs none of the social interactions are genuine and will always be shallow.

I'm sure some nerds will continue to host their own models but I would bet that 99.9% of social-type LLM interactions will be with corporate hosted models that can and will be tweaked and weighted in whatever ways the host company thinks will make it the most money.

It all reminds me a lot of algorithmic social media feeds. The issues were foreseen very early on even if we couldn't predict the exact details, and it's an unsurprising disappointment that all of the major sites have greatly deemphasized organic interactions with friends and family in favor of ads and outrage bait. LLMs are still in their honeymoon phase, but with the amount of money being plowed into them I don't expect that to last much longer.

lamuswawir

New term, "God's advocate"!

nemothekid

>I think we're heading into a world where a lot of the remaining normal healthy social behavior gets subsumed by LLMs pretending to be your friend or romantic interest.

This is already happening.

https://x.com/zymillyyy/status/1902181493553733941

cwoolfe

I had to upload that screenshot to ChatGPT and ask it to translate. I must be getting old! ... "Clock my tea" haha. never heard that one before.

mixmastamyk

I installed a new “operating system” that sounds a lot like Scarlett Johansson. (Her 2013)

FridgeSeal

Wow, Twitch parasocial relationships have _nothing_ on this.

scottyah

I loved when Gemini called out what I thought was a very niche problem as classic. I think there are very few people attempting this stack, to the point where the vendor's documentation is incorrect and hasn't been updated in two years.

"Ah, the "SSL connection already established" error! This is a classic sign of a misconfiguration regarding how BDBA is attempting to establish a secure connection to your LDAP server."

Boogie_Man

I spent a good half hour "talking" to 4 mini about why Picard never had a family and the nature of the crew as his family despite the professional distance required. It really praised me when I said the holodeck scene where Data is playing King Henry walking amongst his men and I felt pretty smart and then realized I'd not actually garnered the admiration of anyone or anything.

prawn

I think there's a similar trap when you're using it for feedback on an idea or to brainstorm features and it gives you effusive praise. That's not a paying customer or even a real person. It's like those people you quickly learn aren't worth seeking out for feedback because they rave about everything just to be nice.

xtiansimon

> “…I found them pretty off-putting coming from a computer that of course couldn't care less and isn't a living thing I have a relationship with.”

I’ve prolly complained about it here, but Spectrum cable’s pay by phone line in NYC has an automated assistant with a few emotive quirks.

I’m shocked how angry that robot voice makes me feel. I’m not a violent person, but getting played by a robot pushes me over the edge in my work day.

Reminds me of a BoingBoing story from years ago about greeter robots being attacked in Japan. Japan has a tradition of verbally greeting customers as they enter the building, and large department stores will have dedicated human greeters stationed at the entrance. IIRC this was a large store that replaced human greeters with these robots. Rando customers were attacking the robots. I now know how they feel.

thijson

https://www.youtube.com/watch?v=LghsLs3DYUs

Snip Snip was pleasant to talk to until the very end.

selcuka

> The statements were normal nice things to hear from a friend

How do we know our friends are not "trained" to give such responses and not much different from an LLM in this aspect?

One might say that curiosity is built into our DNA, but DNA is also a part of our training.

fwip

People can want things, so there's the possibility that your friends are being honest.

LLMs cannot desire, and in fact are so devoid of interiority that they are not even lying when they say that they do.

cube00

After giving me continuously wrong answers, ChatGPT decided it would allow me to indulge it in a "learning opportunity" instead.

> I completely understand your frustration, and I genuinely appreciate you pushing me to be more accurate. You clearly know your way around Rust, and I should have been more precise from the start.

> If you’ve already figured out the best approach, I’d love to hear it! Otherwise, I’m happy to keep digging until we find the exact method that works for your case. Either way, I appreciate the learning opportunity.

herbst

Coding with ChatGPT always makes me angry in the end. Claude at least stays professional. ChatGPT always ends up sounding kinda mean.

null

[deleted]

mdp2021

Which reveals strong reasons to suspect "parroting" qualities, qualities that should have been fought in the implementation since day 0.

snypher

I always laughed when ChatGPT would reply with the same emoji I typed, regardless of context. Not sure if parroting exactly, but I assumed it would understand meaning (if not context) of emoji?

Obscurity4340

It's fun to pick it apart sometimes and get it to correct itself, but you often would never know unless you had direct or deep-cut knowledge to interrogate it with.

SubiculumCode

I'm not sure how much people parrot on the daily.

mdp2021

And we tackled that issue long ago with education.

What are you trying to imply? Imitating fools or foolery is not a goal; replicating the unintelligent is not intelligence. It is strictly undesirable.

k8sToGo

I asked Claude for advice and it said something about it being heartbreaking.

johnisgood

LLMs always do. "I'm very sorry to hear that, seems like you are going through a lot" yadda yadda.

ChatGPT has been giving me too many emojis lately, however. I might tell it to avoid the use of emojis.

KoolKat23

And henceforth, I'll equate that statement with a pithy, one-GPU-level effort from whoever offers it.

"Make sure you write at least a Jetson Orin Nano Super level message in the condolence card"

snypher

Reply sounds like it was written by someone who has never had to ask AWS to up their vCPU limit.

harvey9

Douglas Adams' Electric Monk finally available to the public!

null

[deleted]

timmg

I'm wondering how small of a model can be "generally intelligent" (as in LLM intelligent, not AGI). Like there must be a size too small to hold "all the information" in.

And I also wonder at what point we'll see specialized small models. Like if I want help coding, it's probably ok if the model doesn't know who directed "Jaws". I suspect that is the future: many small, specialized models.

But maybe training compute will just get to the point where we can run a full-featured model on our desktop (or phone)?

idonotknowwhy

> Like there must be a size too small to hold "all the information" in.

We're already there. If you run Mistral-Large-2411 and Mistral-Small-2409 locally, you'll find the larger model is able to recall more specific details about works of fiction. And DeepSeek-R1 is aware of a lot more.

Then you ask one of the Qwen2.5 coding models, and they won't even be aware of it, because they're:

> small, specialized models.

> But maybe training compute will just get to the point where we can run a full-featured model on our desktop (or phone)?

Training-time compute won't allow the model to do anything out of distribution. You can test this yourself if you run one of the "R1 Distill" models. E.g., if you run the Qwen R1 distill and ask it about niche fiction, no matter how long you let it <think> for, it can't tell you something the original Qwen didn't know.

hnuser123456

I suppose we could eventually get to a super-MoE architecture. Models are limited to 4-16GB in size, but you could have hundreds of models on various topics. Load from storage to RAM and unload as needed. Should be able to load up any 4-16GB model in a few seconds. Maybe as well as a 4GB "Resident LLM" that is always ready to figure out which expert to load.
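A minimal sketch of that routing idea. Everything here is hypothetical: the expert names, the `load_expert` stub, and the keyword router stand in for a real classifier and a real weight loader; the `lru_cache(maxsize=1)` models "keep one expert resident, evict on switch":

```python
import functools

# Hypothetical registry mapping topics to on-disk expert models (4-16 GB each).
EXPERTS = {
    "code": "qwen2.5-coder-14b",
    "general": "gemma3-12b",
}

@functools.lru_cache(maxsize=1)  # keep at most one expert resident in memory
def load_expert(name: str) -> str:
    # A real system would mmap/stream weights from storage here; stubbed out.
    return f"<loaded {name}>"

def route(prompt: str) -> str:
    # Stand-in for the small always-resident "router" LLM: a real one would
    # classify the prompt; here a crude keyword check picks the expert.
    topic = "code" if "def " in prompt or "fn " in prompt else "general"
    return load_expert(EXPERTS[topic])
```

The interesting design question is the eviction policy: with only one resident slot, alternating topics would thrash storage, so a real version would likely pin the general model and swap only the specialist.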

timmg

> We're already there. If you running a Mistral-Large-2411 and Mistral-Small-2409 locally, you'll find the larger model is able to recall more specific details about works of fiction.

Oh, for sure. I guess what I'm wondering is if we know the Small model (in this case) is too small -- or if we just haven't figured out how to train well enough?

Like, have we hit the limit already -- or, in (say) a year, will the Small model be able to recall everything the Big model does (say, as of today)?

svachalek

It's a sliding scale based on what you consider "generally intelligent" but they're getting smaller and smaller. This 27b model is comparable to 400b models not much over a year ago. But we'll start to see limits on how far that can go, maybe soon.

You can try different sizes of gemma3 models, though. The biggest one can answer a lot of things factually, while the smallest one is a hilarious hallucination factory, and the others are different levels in between.

JKCalhoun

That's interesting. Is there some quantitative way to know that a modern 27b model is equal to an older 400b model?

selcuka

https://medium.com/@elmo92/gemma-3-a-27b-multimodal-llm-bett...

> It comes in sizes from 1 billion to 27 billion parameters (to be precise: 1B, 4B, 12B, 27B), with the 27B version notably competing with much larger models, such as those with 400B or 600B parameters by LLama and DeepSeek.

LeoPanthera

Maybe Llama 3.3 70B doesn't count as running on "one GPU", but it certainly runs just fine on one Mac, and in my tests it's far better at holding onto concepts over a longer conversation than Gemma 3 is, which starts getting confused after about 4000 tokens.

pram

Gemma 3 is a lot better at writing for sure, compared to 2, but the big improvement is I can actually use a 32k+ context window and not have it start flipping out with random garbage.

grej

It lasted until Mistral released 3.1 Small a week later. Such is the pace of AI...

nirav72

Yeah, I can't even keep up these days. So now I mainly focus on what I can run locally via Ollama.

simonw

Gemma 3 is on Ollama now https://ollama.com/library/gemma3 but surprisingly they don't have Mistral 3.1 yet.

I've managed to run Mistral 3.1 on my laptop using MLX, notes here https://simonwillison.net/2025/Mar/17/mistral-small-31/

thot_experiment

Are you running vision tasks? Works just fine in ollama as an LLM but the vision component is unimplemented.

lostmsu

How's their lmarena ELO?

CamperBob2

Technically, the 1.58-bit Unsloth quant of DeepSeek R1 runs on a single GPU+128GB of system RAM. It performs amazingly well, but you'd better not be in a hurry.
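The memory math roughly checks out. A back-of-envelope sketch (the real Unsloth quant mixes bit widths across layers, so these figures are approximate):

```python
# Approximate memory footprint of DeepSeek R1 at ~1.58 bits per weight.
params = 671e9          # DeepSeek R1 total parameter count
bits_per_weight = 1.58  # ternary-style quantization (nominal figure)

gib = params * bits_per_weight / 8 / 2**30
print(f"{gib:.0f} GiB")  # ~123 GiB: too big for one GPU's VRAM alone,
                         # but fits in 24 GB VRAM + 128 GB system RAM
```

Which is why it runs, and why it's slow: most of the weights live in system RAM and have to cross the PCIe bus or be processed on the CPU.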

jfim

What's the recommended way to run LLMs these days?

Ollama seems to work with DeepSeek R1 with enough memory using an older CPU but it's around 1 token/second on my desktop.

CamperBob2

I've looked into it and the only sane answer right now is still "If it flies, floats, or infers, rent it." You need crazy high memory bandwidth for good inference speed, and that means GPUs which are subject to steep monopoly pricing. That doesn't look to be changing anytime soon.
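The bandwidth ceiling is easy to estimate: during decoding, every generated token has to stream all active weights through memory once, so tokens/sec is bounded above by bandwidth divided by model size. A rough sketch (the bandwidth and model-size numbers below are illustrative approximations, not benchmarks):

```python
# Upper bound on decode speed: each token reads every active weight once,
# so tokens/sec <= memory bandwidth / bytes of active weights.
def max_tokens_per_sec(bandwidth_gb_s: float, model_size_gb: float) -> float:
    return bandwidth_gb_s / model_size_gb

# Illustrative figures:
gpu = max_tokens_per_sec(1000, 16)  # ~1 TB/s GPU, 16 GB quantized model
cpu = max_tokens_per_sec(64, 16)    # ~64 GB/s dual-channel DDR5, same model
print(gpu, cpu)                     # roughly 62 t/s vs 4 t/s
```

This is why the same model can feel instant on a GPU and glacial on a CPU: compute barely matters for single-stream decoding, bandwidth dominates.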

Second place is said to be the latest Macs with lots of unified memory, but it's a distant second place.

The recently announced hardware from nvidia is either underpowered, overpriced, or both, so there's not much point waiting for it.

jfim

Makes sense, thanks for sharing! I take it your recommendation not to buy also includes the upcoming Project Digits box from Nvidia?

bestouff

You probably meant 128GB

CamperBob2

Edited. TBF, I did say it was slow...

refulgentis

Have you tried the 0.000000001 bit quant? IIUC it's a single param so youd get much faster speeds.

m00x

I found Mistral Small 3.1, which released slightly after Gemma 3, much better.

Much fewer refusals, more accurate, less babbling, generally better, but especially at coding.

pretoriusdre

My instinct is that it would be cheaper overall to buy API credits when needed, compared with buying a top-of-the-line GPU which sits idle for most of the day. That also opens up access to larger models.

tootie

It's a choice. Running local means personal safety and privacy. It could also mean easier compliance with any enterprise that doesn't want to share data.

rs186

This agrees with my own experience. I have a 4070 Super, which of course is nothing to brag about, but tps using the quantized 27b model is miserable. I could go down to 12b or even smaller, but it would sacrifice quality. Then I could upgrade my gear, but I realize that however much I spend, the experience is not going to be as smooth as off-the-shelf LLM products, and definitely not worth the cost.

Of course it is nice to have an LLM running locally where nobody gets to know your data. But I don't see a point in me spending thousands of $ to do that.

brookst

Yeah. Classic capacity / utilization problem.

odysseus

Does it run on the severed floor?

butterlettuce

I love how this show (2022) is just heavily seeping into pop culture.

Stoked for the season 2 finale today. It'll be like watching the Super Bowl.

vsgherzi

Actually the finale is on Friday, not today!

rjeli

They come out Thursday nights (in the US), 9pm EST.

siva7

what show?

null

[deleted]

fortyseven

Severance

tailspin2019

Severance

jprd

Severance

alas44

Severance

williamDafoe

Does anyone use Google AI? For an AI company with an AI CEO using AI language translation, I think their actual GPT products are all terrible and have a terrible rep. And who wants their private conversations shipped back to Google for spying?

LeoPanthera

I use Gemini (Advanced) over ChatGPT. Google's privacy issues are concerning, but no more so than giving my conversations to OpenAI.

In my experience, Gemini simply works better.

KoolKat23

Gemini 2.0 Flash et al are excellent. Think you need to try them again.

deepsquirrelnet

I tried a lot of models on openrouter recently, and I have to say that I found Gemini 2.0 flash to be surprisingly useful.

I’d never used one of Google’s proprietary models before that, but it really hits a sweet spot in the quality vs latency space right now.

herbst

The thing they ship on Android is just horrible. It appears closer to a Markov chain than actual AI.

RandyRanderson

So says Gemma 3.