
Gemini 3 for developers: New reasoning, agentic capabilities

srameshc

I think I'm in an AI fatigue phase. I'm past all the hype around models, tools, and agents and back to a problem-and-solution approach: sometimes code gen with AI, sometimes thinking it through and asking for a piece of code. But I'm not offloading everything to AI, buying all the BS, and waiting for it to do magic with my codebase.

amelius

Yeah, at this point I want to see the failure modes. Show me at least as many cases where it breaks. Otherwise, I'll assume it's an advertisement and I'll skip to the next headline. I'm not going to waste my time on it anymore.

Kiro

I agree, but if Gemini 3 is as good as people on HN said the preview was, then this is the wrong announcement to sleep on.

jstummbillig

I think it's fun to see what isn't even considered magic anymore today.

ponyous

Just generated a bunch of 3D CAD models using Gemini 3.0 to see how it compares in spatial understanding, and it's heaps better than anything currently out there - not only in intelligence but also in speed.

Will run extended benchmarks later, let me know if you want to see actual data.

lfx

Just hand-sketched what a 5-year-old would draw on paper - a house, trees, the sun - and asked it to generate a 3D model with three.js.

The results are amazing! 2.5 and 3 seem way, way ahead.

giancarlostoro

I'm not familiar enough with CAD - what type of format is it?

koakuma-chan

When I see CAD, I always think of Casting Assistant Device.

ponyous

It’s not a format, but in my mind it implies designs that are supposed to be functional as opposed to models that are meant for virtual games.

It generated a Blender script that builds the model.
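
Roughly, that workflow looks like this - a minimal sketch using the google-genai Python SDK, where the model id and the prompt are assumptions, not what the commenter ran:

    # Sketch: ask the model for a Blender (bpy) script, then run it in Blender.
    # The model id "gemini-3-pro-preview" is an assumption.
    from google import genai

    client = genai.Client(api_key="YOUR_API_KEY")

    response = client.models.generate_content(
        model="gemini-3-pro-preview",
        contents=(
            "Write a Blender Python (bpy) script that models a simple house "
            "with a pitched roof and two trees. Output only code."
        ),
    )

    # Run the result headlessly with: blender --background --python house.py
    with open("house.py", "w") as f:
        f.write(response.text)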

adastra22

I would have used OpenSCAD for that purpose.

bilbo0s

Did your prompt instruct it to use Blender?

slackerIII

What's the easiest way to set up automatic code review for PRs for my team on GitHub using this model?
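
One minimal way to wire that up (a sketch, not an official integration: it assumes a GitHub Actions job that dumps the diff with `gh pr diff` beforehand, and the model id is an assumption):

    # Sketch of a review step: read the PR diff, ask Gemini for a review,
    # then post it with: gh pr comment $PR_NUMBER --body-file review.md
    from google import genai

    client = genai.Client()  # picks up GEMINI_API_KEY from the environment

    with open("pr.diff") as f:  # produced via: gh pr diff $PR_NUMBER > pr.diff
        diff = f.read()

    response = client.models.generate_content(
        model="gemini-3-pro-preview",  # assumed model id
        contents="Review this diff for bugs, risky changes, and style issues. "
                 "Be concise.\n\n" + diff,
    )

    with open("review.md", "w") as f:
        f.write(response.text)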

clusterhacks

I wish I could just pay for the model and self-host it on local/rented hardware. I'm incredibly suspicious of companies trying to totally capture us with these tools.

wohoef

Curious to see it in action. Gemini 2.5 has already been very impressive as a study buddy for courses like set theory, information theory, and automata. Although I'm always a bit skeptical of these benchmarks - it seems quite unlikely that all of the questions have stayed out of the training data.

mccoyb

I truly do not understand which plan to use so I can use this model for longer than ~2 minutes.

Using Anthropic's or OpenAI's models is incredibly straightforward -- pay us per month, here's the button you press, great.

Where do I go for that with these Google models?

fschuett

Update VSCode to the latest version and click the small "Chat" button in the top bar. GitHub gives you something like $20 of free usage per month, and I think they have a deal with the larger vendors because their pricing is insanely cheap. One week of vibe-coding costs me like $15; the only downside to Copilot is that you can't work on multiple projects at the same time because of rate limiting.

kachapopopow

AI Studio - you get a bunch of usage for free, and if you want more you buy credits (Google One subscriptions also give you some additional usage).

mccoyb

I see -- so this is the "paid" AI Studio plan?

Does that have any relation to the Gemini plan here: https://one.google.com/explore-plan/gemini-advanced?utm_sour... ?

kachapopopow

That's for the first-party Google integrations, not third-party ones. AI Studio just gives you an API key that you can use anywhere.
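
For illustration, the key really is environment-agnostic - a minimal sketch with the google-genai SDK, listing available models rather than assuming an id:

    # Sketch: an AI Studio API key is a plain Gemini API key and works from
    # any environment with the google-genai SDK.
    from google import genai

    client = genai.Client(api_key="AI_STUDIO_KEY")

    # List the models this key can call instead of hard-coding one.
    for model in client.models.list():
        print(model.name)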

aliljet

Precisely why Gemini 3 isn't at the front of the pack on SWE-bench is really what I was hoping to understand here. Especially for a blog post targeted at software developers...

cube2222

Yeah, they mention a benchmark I'm seeing for the first time (Terminal-Bench 2.0) that they're supposedly leading, while for some reason SWE-bench is down from Sonnet 4.5.

Curious to see some third-party testing of this model. Based on the benchmarks, it currently seems to improve primarily on general non-coding and visual reasoning.

svantana

SWEBench-Verified is probably benchmaxxed at this stage. Claude isn't even the top performer; that honor goes to Doubao [1].

Also, the confidence interval for such a small dataset is about 3 percentage points, so these differences could just be down to chance.

[1] https://www.swebench.com/
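
A quick back-of-the-envelope check of that claim, assuming the ~500 tasks of SWE-bench Verified and an accuracy near 75% (both numbers illustrative):

    import math

    # Standard error of an accuracy estimate p measured on n independent tasks.
    n, p = 500, 0.75
    se = math.sqrt(p * (1 - p) / n)
    print(f"+/-{1.96 * 100 * se:.1f} pp at 95% confidence")  # ~ +/-3.8 pp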

pawelduda

Why is this particular benchmark important?

aliljet

Thus far, this is one of the best objective evaluations of real-world software engineering...

adastra22

Idk, Sonnet 4.5 scores better than Sonnet 4.0 on that benchmark, but is markedly worse in my usage. The utility of the benchmark is fading as it gets gamed.

spookie

Does anyone trust benchmarks at this point? Genuine question. Isn't the scientific consensus that they are broken and poor evaluation tools?

mudkipdev

I make my own automated benchmarks
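
The idea can be as small as this - a sketch of a home-grown harness, where `ask` stands in for whatever model API you call:

    # Run fixed prompts through a model callable and score exact-match answers.
    CASES = [
        ("What is 17 * 23?", "391"),
        ("Name the capital of Australia.", "Canberra"),
    ]

    def score(ask):
        hits = 0
        for prompt, expected in CASES:
            answer = ask(prompt)
            hits += expected.lower() in answer.lower()
        return hits / len(CASES)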

deanc

Antigravity seems to be a bit overwhelmed - unable to set up an account at the moment.

jordanpg

What is Gemini 3 under the hood? Is it still just a basic LLM based on transformers? Or are there all kinds of other ML technologies bolted on now? I feel like I've lost the plot.

anilgulecha

It's a mixture-of-experts model: basically N smaller expert networks put together, and when inference occurs, a router activates only a small subset of them per token. Each expert ends up tuned/good in one area.
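
A toy sketch of that routing idea in numpy (the shapes and the top-k value are illustrative, not anything disclosed about Gemini):

    import numpy as np

    d, n_experts, k = 8, 4, 2
    x = np.random.randn(d)                      # one token's hidden state
    router = np.random.randn(n_experts, d)      # router weights
    experts = np.random.randn(n_experts, d, d)  # one weight matrix per expert

    logits = router @ x
    top = np.argsort(logits)[-k:]               # indices of the k best experts
    gates = np.exp(logits[top]) / np.exp(logits[top]).sum()  # softmax over top-k

    # Only the selected experts run; the rest of the parameters stay idle.
    y = sum(g * (experts[i] @ x) for g, i in zip(gates, top))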

meowface

I'm very ignorant in this field, but I'm pretty sure that under the hood they are all still fundamentally built on the transformer architecture, or at least on innovations on top of the original transformer architecture.

fosterfriends

Gemini 3 and 3 Pro are a good bit cheaper than Sonnet 4.5 as well. Big fan.

hubraumhugo

No gemini-3-flash yet, right? Any ETA on that mentioned? 2.5-flash has been amazing in terms of cost/value ratio.