Gemini 2.5 Flash
560 comments
April 17, 2025
zoogeny
jeeeb
After comparing Gemini Pro and Claude Sonnet 3.7 coding answers side by side a few times, I decided to cancel my Anthropic subscription and just stick to Gemini.
blueyes
One of the main advantages Anthropic currently has over Google is the tooling that comes with Claude Code. It may not generate better code, and it has a lower complexity ceiling, but it can automatically find and search files, and figure out how to fix a syntax error fast.
bayarearefugee
As another person that cancelled my Claude and switched to Gemini, I agree that Claude Code is very nice, but beyond some initial exploration I never felt comfortable using it for real work because Claude 3.7 is far too eager to overengineer half-baked solutions that extend far beyond what you asked it to do in the first place.
Paying real API money for Claude to jump the gun on solutions invalidated the advantage of having a tool as nice as Claude Code, at least for me, I admit everyone's mileage will vary.
igor47
I've switched to aider with the --watch-files flag. Being able to use models in nvim with no additional tooling is pretty sweet
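For context, aider's watch mode works by scanning saved files for specially marked comments. A minimal sketch of the convention as I understand aider's docs (the function here is invented):

    # With `aider --watch-files` running in a terminal, save a file from any
    # editor with a comment ending in "AI!" and aider treats it as an
    # instruction to change the code; a comment ending in "AI?" asks a question.

    def slugify(title):  # also collapse repeated dashes and strip accents AI!
        return title.lower().replace(" ", "-")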
vladmdgolam
There are at least 10 projects currently aiming to recreate Claude Code, but for Gemini. For example, geminicodes.co by NotebookLM’s founding PM Raiza Martin
mrinterweb
I don't understand the appeal of investing in learning and adapting your workflow to an AI tool that is so tightly coupled to a single LLM provider, when there are other great AI tools available that are not locked to a single provider. I would guess aider is the closest thing to Claude Code, but you can use pretty much any LLM.
The LLM field is moving so fast that the leading frontier model today may not be the same tomorrow.
Pricing is another important consideration. https://aider.chat/docs/leaderboards/
energy123
Google needs to fix their Gemini web app at a basic level. It's slow, gets stuck on "Show Thinking", and rejects 200k-token prompts sent in one shot. AI Studio is in much better shape.
mogili
I use roo code with Gemini to get similar results for free
julianeon
Related:
Only Claude (to my knowledge) has a desktop app which can directly, and usually quite intelligently, modify files and create repos on your desktop. It's the only "agentic" option among the major players.
"Claude, make me an app which will accept Stripe payments and sell an ebook about coding in Python; first create the app, then the ebook."
It would take a few passes but Claude could do this; obviously you can't do that with an API alone. That capability alone is worth $30/month in my opinion.
WiSaGaN
Also the "project" feature in claude improves experience significantly for coder, where you can customize your workflow. Would be great if gemini has this feature.
onlyrealcuzzo
Yes, IME, Anthropic seemed to be ahead of Google by a decent amount with Sonnet 3.5 vs 1.5 Pro.
However, Sonnet 3.7 seemed like a very small increase, whereas 2.5 Pro seemed like quite a leap.
Now, IME, Google seems to be comfortably ahead.
2.5 Pro is a little slow, though.
I'm not sure which model Google uses for the AI answers on search, but I find myself using Search for a lot of things I might ask Gemini (via 2.5 Pro) if it was as fast as Search's AI answers.
dmix
How is the speed of Gemini vs 3.7?
jessep
I have had a few epic refactoring failures with Gemini relative to Claude.
For example: I asked both to change a bunch of code into functions to pass into a `pipe` type function, and Gemini truly seemed to have no idea what it was supposed to do, and Claude just did it.
Maybe there was some user error or something, but after that I haven’t really used Gemini.
I’m curious if people are using Gemini and loving it are using it mostly for one-shotting, or if they’re working with it more closely like a pair programmer? I could buy that it could maybe be good at one but bad at the other?
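For reference, the shape of the refactor being described is something like this, a minimal sketch with invented names, not the commenter's actual code:

    from functools import reduce

    def pipe(*fns):
        # Compose functions left to right: pipe(f, g)(x) == g(f(x)).
        return lambda x: reduce(lambda acc, fn: fn(acc), fns, x)

    # Before: a block of sequential statements.
    #   text = raw.strip(); text = text.lower(); words = text.split()
    # After: each step becomes a small function handed to pipe.
    tokenize = pipe(str.strip, str.lower, str.split)
    print(tokenize("  Hello World  "))  # ['hello', 'world']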
Asraelite
This has been my experience too. Gemini might be better for vibe coding or architecture or whatever, but Claude consistently feels better for serious coding. That is, when I know exactly how I want something implemented in a large existing codebase, and I go through the full cycle of implementation, refinement, bug fixing, and testing, guiding the AI along the way.
It also seems to be better at incorporating knowledge from documentation and existing examples when provided.
mamp
I've been using Gemini 2.5 and Claude 3.7 for Rust development and I have been very impressed with Claude, which wasn't the case for some architectural discussions, where Gemini impressed with its structure and scope. OpenAI 4.5 and o1 have been disappointing in both contexts.
Gemini doesn't seem to be as keen to agree with me so I find it makes small improvements where Claude and OpenAI will go along with initial suggestions until specifically asked to make improvements.
yousif_123123
I have noticed Gemini not accepting an instruction to "leave all other code the same but just modify this part" on code that included use of an alpha API with a different interface than what Gemini knows as the correct current API. No matter how I prompted 2.5 Pro, I couldn't get it to respect my use of the alpha API; it would just assume I must be wrong.
So I think patterns from the training data are still overriding some actual logic/intelligence in the model. Or the Google assistant fine-tuning is messing it up.
sleiben
Same here. Especially for native app development with Swift I had way better results, so I just stuck with Gemini-2.5-*
Graphon1
Just curious, what tool do you use to interface with these LLMs? Cursor? or Aider? or...
speedgoose
I’m on GitHub Copilot with VsCode Insiders, mostly because I don’t have to subscribe to one more thing.
They're pretty quick to let you use the latest models nowadays.
yieldcrv
I also cancelled my Anthropic yesterday, not because of Gemini but because it was the absolute worst time for Anthropic to limit their Pro plan to upsell their Max plan when there is so much competition out there
Manus.im also does code generation in a nice UI, but I’ll probably be using Gemini and Deepseek
No Moat strikes again
wcarss
Google has killed so many amazing businesses -- entire industries, even, by giving people something expensive for free until the competition dies, and then they enshittify hard.
It's cool to have access to it, but please be careful not to mistake corporate loss leaders for authentic products.
gexla
It's not free. And it's legit one of the best models. And it was a Google employee who was among the authors of the paper that's most recognized as kicking all this off. They give somewhat limited access in AIStudio (I have only hit the limits via API access, so I don't know what the chat UI limits are.) Don't they all do this? Maybe harder limits and no free API access. But I think most people don't even know about AIStudio.
JPKab
True. They are ONLY good when they have competition. The sense of complacency that creeps in is so obvious as a customer.
To this day, the Google Home (or is it called Nest now?) speaker is the only physical product I've ever owned that lost features over time. I used to be able to play the audio of a YouTube video (like a podcast) through it, but then Google decided that it was very, very important that I only be able to play a YouTube video through a device with a screen, because it is imperative that I see a still image when I play a long-form history podcast.
Obviously, this is a silly and highly specific example, but it is emblematic of how they neglect or enshittify massive swathes of their products as soon as the executive team loses interest and puts their A team on some shiny new object.
lxgr
How would I know if it’s useful to me without being able to trial it?
Google's previous approach (Pro models available only to Gemini Advanced subscribers, and Advanced trials can't be stacked with Google One paid storage, or rather they convert the already-paid storage portion to a paid, much shorter Advanced subscription!) was mind-bogglingly stupid.
Having a free tier on all models is the reasonable option here.
pdntspa
The usage limit for experimental gets used up pretty fast in a vibe-coding situation. I found myself setting up an API account with billing enabled just to keep going.
bredren
(Public) corporate loss leaders? Cause they are all likely corporate.
And Anthropic is also subsidizing queries, no? Isn't the new "5x" plan illustrative of this?
No doubt anthropic’s chat ux is the best right now, but it isn’t so far ahead on that or holding some UX moat that I can tell.
mark_l_watson
In this case, Google is a large investor in Anthropic.
I agree that giving away access to expensive models long term is not a good idea on several fronts. Personally, I subscribe to Gemini Advanced and I pay for using the Gemini APIs.
EDIT: a very good deal, at $10/month, is https://apps.abacus.ai/chatllm/ which gives you access to almost all commercial models as well as the best open-weight models. I have never come close to using my monthly credits with them. If you like to experiment with many models, the service is a lot of fun.
bossyTeacher
Just look at Chrome to see Bard/Gemini's future. HN folks didn't care about Chrome then, but cry about Google's increasingly hostile development of Chrome now.
Look at Android.
HN behaviour is more like a kid who sees the candy, wants the candy, and eats as much as they can without worrying about the damaging effect that sugar will have on their health. Then the diabetes diagnosis arrives and they complain.
fsndz
More and more people are coming to the realisation that Google is actually winning at the model level right now.
zaphirplane
What's with the Google cheer squad in this thread? Usually it's "Google lost its way and is evil."
Can't be employees, cause usually there is a disclaimer.
pjerem
Google can be evil and release impressive language models. The same way as Apple releasing incredible hardware with good privacy while also being a totally insufferable and arrogant company.
bitpush
Gemini 2.5 is genuinely impressive.
crowbahr
Google employees only have to add a disclaimer when they're identified as Google employees.
So shit like "as a googler" requires "my opinions are my own yadda yadda"
MagicMoonlight
I haven’t met a single person that uses Gemini. Companies are using Copilot and individuals are using ChatGPT.
Also, why would I want Google to spy on my AI usage? They’re evil.
fsndz
Why is Google more evil than, say, OpenAI?
teleforce
>obsequious
Thanks for the new word; I had to look it up.
"obedient or attentive to an excessive or servile degree"
Apparently it means an AI that mindlessly follows your logic and instructions without reasoning and articulation is not good enough.
tkgally
Another useful word in this context is “sycophancy,” meaning excessive flattery or insincere agreement. Amanda Askell of Anthropic has used it to describe a trait they try to suppress in Claude:
davidsainez
The second example she uses is really important. You (used to) see this a lot in stackoverflow where an inexperienced programmer asks how to do some convoluted thing. Sure, you can explain how to do the thing while maintaining their artificial constraints. But much more useful is to say "you probably want to approach the problem like this instead". It is surely a difficult problem and context dependent.
snthpy
Interesting that Americans appear to hold their AI models to a higher standard than their politicians.
zoogeny
It's a bit of a fancy way to say "yes man". Like in corporations or politics, if a leader surrounds themselves with "yes men".
A synonym would be sycophantic which would be "behaving or done in an obsequious way in order to gain advantage." The connotation is the other party misrepresents their own opinion in order to gain favor or avoid disapproval from someone of a higher status. Like when a subordinate tries to guess what their superior wants to hear instead of providing an unbiased response.
I think that accurately describes my experience with some LLMs due to heavy handed RLHF towards agreeableness.
In fact, I think obsequious is a better word since it doesn't have the cynical connotation of sycophant. LLMs don't have a motive and obsequious describes the behavior without specifying the intent.
teleforce
Yes, those are the first two words that came to my mind when I read the meaning. The Gen Z word now, I think, is "simp".
sans_souse
I wonder if anyone here will know this one: I learned the word "obsequious" over a decade ago while working the line of a restaurant. I used to listen to the 2p2 (2 plus 2) poker podcasts during prep, and they had a regular feature with David Sklansky (iirc) giving tips, stories, advice, etc. In this particular one he simply gave the word "obsequious" and defined it later. I remember my sous chef and I debating what it could mean, and I guessed it right. I still can't remember what it had to do with poker, but that's beside the point.
Maybe I can locate it
sicromoft
I didn't hear that one but I am a fan of Sklansky. And I also have a very vivid memory of learning the word, when I first heard the song Turn Around by They Might Be Giants. The connection with the song burned it into my memory.
nemomarx
I think here it's referring to a common problem where the AI agrees with your position too easily, and/or instantly changes its answer if you tell it the answer is wrong (therefore providing no stable true answer if you ask it about a fact).
Also the slightly over cheery tone maybe.
lylah69
I like to do this with Claude. It takes 5 back & forths to get an uncertain answer.
Is there a way to tackle this?
dr_kiszonka
I was a big fan of that model but it has been replaced in AI Studio by its preview version, which, by comparison, is pretty bad. I hope Google makes the release version much closer to the experimental one.
zoogeny
I can confirm the model name in Run Settings has been updated to "Gemini 2.5 Pro Preview ..." when it used to be "Gemini 2.5 Pro (Experimental) ...".
I cannot confirm if the quality is downgraded since I haven't had enough time with it. But if what you are saying is correct, I would be very sad. My big fear is the full-fat Gemini 2.5 Pro will be prohibitively expensive, but a dumbed down model (for the sake of cost) would also be saddening.
gundmc
The AI Studio product lead said on Twitter that it is exactly the same model, just renamed for clarity when pricing was announced.
dieortin
The preview version is exactly the same as the experimental one afaik
MetaWhirledPeas
> 100% of my casual AI usage is now in Gemini and I look forward to asking it questions on deep topics because it consistently provides me with insight.
It's probably great for lots of things but it doesn't seem very good for recent news. I asked it about recent accusations around xAI and methane gas turbines and it had no clue what I was talking about. I asked the same question to Grok and it gave me all sorts of details.
arizen
This was my experience as well.
Gemini performed best on coding tasks, while giving underwhelming responses on recent news.
Grok was only OK for coding tasks, but being linked to X, it provided the best responses on recent events.
ramesh31
>It's probably great for lots of things but it doesn't seem very good for recent news.
You are missing the point here. The LLM is just the "reasoning engine" for agents now. Its corpus of facts is meaningless and shouldn't really be relied upon for anything. But in conjunction with a tool-calling agentic process with access to the web, what you described is now trivially doable. Single-shot LLM usage is not really something anyone should be doing anymore.
MetaWhirledPeas
> You are missing the point here.
I'm just discussing the GP's topic of casual use. Casual use implies heading over to an already-hosted prompt and typing in questions. Implementing my own 'agentic process' does not sound very casual to me.
darksaints
That’s all fine and dandy, but if you google anything related to llm agents, you get 1000 answers to 100 questions, companies hawking their new “visual programming” agent composers, and a ton of videos of douchebags trying to be the Steve Jobs of AI. The concept I’m sure is fine, but execution of agentic anything is still the Wild Wild West and nobody knows what they’re really doing.
rgoulter
The 1 million token context window also means you can just copy/paste so much source code or log output.
jofzar
My work doesn't have access to 2.5 pro and all these posts are just making me want it so much more.
I hate how slow things are sometimes.
basch
Can’t you just go into aistudio with any free gmail account?
sciurus
For many workplaces, it's not just that they don't pay for a service, it's that using it is against policy. If I tried to paste some code into ChatGPT, for example, our data loss prevention spyware would block it and I'd soon be having an uncomfortable conversation with our security team.
(We do have access to GitHub Copilot)
PerusingAround
This comment is exactly my experience. I feel as if I had written it myself.
simonw
An often overlooked feature of the Gemini models is that they can write and execute Python code directly via their API.
My llm-gemini plugin supports that: https://github.com/simonw/llm-gemini
uv tool install llm
llm install llm-gemini
llm keys set gemini
# paste key here
llm -m gemini-2.5-flash-preview-04-17 \
-o code_execution 1 \
'render a mandelbrot fractal in ascii art'
I ran that just now and got this: https://gist.github.com/simonw/cb431005c0e0535343d6977a7c470...
They don't charge anything extra for code execution; you just pay for input and output tokens. The above example used 10 input and 1,531 output tokens. At $0.15/million input and $3.50/million output for Gemini 2.5 Flash with thinking enabled, that works out to 0.536 cents (just over half a cent) for this prompt.
pantsforbirds
Saw an example run fully in a few commands using uv, thought "wow I bet that Simon guy from twitter would love this" ... it's already him.
blahgeek
> An often overlooked feature of the Gemini models is that they can write and execute Python code directly via their API.
Could you elaborate? I thought function calling is a common feature among models from different providers
simonw
The Gemini API runs the Python code for you as part of your single API call, without you having to handle the tool call request yourself.
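A minimal sketch of that with the google-genai Python SDK; option and field names follow the current docs as I understand them, so double-check before relying on this:

    from google import genai
    from google.genai import types

    client = genai.Client(api_key="YOUR_GEMINI_KEY")
    response = client.models.generate_content(
        model="gemini-2.5-flash-preview-04-17",
        contents="render a mandelbrot fractal in ascii art",
        config=types.GenerateContentConfig(
            tools=[types.Tool(code_execution=types.ToolCodeExecution())],
        ),
    )
    # One API call returns interleaved parts: the model's text, the Python
    # it wrote, and the output of Google running that Python server-side.
    for part in response.candidates[0].content.parts:
        if part.text:
            print(part.text)
        if part.executable_code:
            print(part.executable_code.code)
        if part.code_execution_result:
            print(part.code_execution_result.output)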
tempaccount420
This is so much cheaper than re-prompting each tool use.
I wish this was extended to things like: you could give the model an API endpoint that it can call to execute JS code, and the only requirement is that your API has to respond within 5 seconds (maybe less actually).
I wonder if this is what OpenAI is planning to do in the upcoming API update to support tools in o3.
WiSaGaN
This common feature requires the user of the API to implement the tool; in this case, the user is responsible for running the code the API outputs. The post you replied to suggests that Gemini will run the code for the user behind the API call.
tempoponet
That was how I read it as well, as if it had a built-in lambda type service in the cloud.
If we're just talking about some API support to call python scripts, that's pretty basic to wire up with any model that supports tool use.
throaway920181
I wish Gemini could do this with Go. It generates plenty of junk/non-parseable code and I have to feed it the error messages and hope it properly corrects it.
arnaudsm
Gemini flash models have the least hype, but in my experience in production have the best bang for the buck and multimodal tooling.
Google is silently winning the AI race.
Nihilartikel
100% agree. I had Gemini flash 2 chew through thousands of points of nasty unstructured client data and it did a 'better than human intern' level conversion into clean structured output for about $30 of API usage. I am sold. 2.5 pro experimental is a different league though for coding. I'm leveraging it for massive refactoring now and it is almost magical.
jdthedisciple
> thousands of points of nasty unstructured client data
What I always wonder in these kinds of cases is: What makes you confident the AI actually did a good job since presumably you haven't looked at the thousands of client data yourself?
For all you know it made up 50% of the result.
mediaman
This was solved a hundred years ago.
It's the same problem factories have: they produce a lot of parts, and it's very expensive to put a full operator or more on a machine to do 100% part inspection. And the machines aren't perfect, so we can't just trust that they work.
So starting in the 1920s, Walter Shewhart and W. Edwards Deming came up with Statistical Process Control. We accept the quality of the product based on the variance we see in samples, and how they measure against upper and lower control limits.
Based on that, we can estimate a "good parts rate" (which later got used in ideas like Six Sigma to describe the probability of bad parts being passed).
The software industry was built on determinism, but now software engineers will need to learn the statistical methods created by engineers who have forever lived in the stochastic world of making physical products.
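As a rough illustration of what that looks like applied to LLM output (all numbers invented): sample batches, measure the defect rate, and flag batches that drift outside the 3-sigma limits of a p-chart:

    import math

    def p_chart_limits(p_bar, n):
        # 3-sigma control limits for a proportion with sample size n.
        sigma = math.sqrt(p_bar * (1 - p_bar) / n)
        return max(0.0, p_bar - 3 * sigma), min(1.0, p_bar + 3 * sigma)

    # Invented baseline: 4% bad extractions historically, audit 200 per batch.
    lcl, ucl = p_chart_limits(p_bar=0.04, n=200)
    for batch, defects in [("monday", 9), ("tuesday", 31)]:
        rate = defects / 200
        flag = "in control" if lcl <= rate <= ucl else "OUT OF CONTROL"
        print(f"{batch}: {rate:.1%} {flag}")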
tominous
In my case I had hundreds of invoices in a not-very-consistent PDF format which I had contemporaneously tracked in spreadsheets. After data extraction (pdftotext + OpenAI API), I cross-checked against the spreadsheets, and for any discrepancies I reviewed the original PDFs and old bank statements.
The main issue I had was it was surprisingly hard to get the model to consistently strip commas from dollar values, which broke the csv output I asked for. I gave up on prompt engineering it to perfection, and just looped around it with a regex check.
Otherwise, accuracy was extremely good and it surfaced a few errors in my spreadsheets over the years.
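The regex check for that could be as simple as this sketch; my guess at the kind of fix, not the commenter's actual code:

    import re

    def strip_thousands_commas(value):
        # "1,234.56" -> "1234.56" without touching CSV field separators.
        return re.sub(r"(?<=\d),(?=\d)", "", value)

    assert strip_thousands_commas("1,234.56") == "1234.56"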
Nihilartikel
For what it's worth, I did check over many hundreds of them. Formatted things for side by side comparison and ordered by some heuristics of data nastiness.
It wasn't a one-shot deal at all. I found the ambiguous modalities in the data and hand-corrected examples to include in the prompt. After about 10 corrections and some exposition about the cases it seemed to misunderstand, it got really good. Edit: not too different from a feedback loop with an intern ;)
summerlight
Though the same logic can be applied everywhere, right? Even if it's done by human interns, you need to audit everything to be 100% confident, or just have some trust in them.
jofzar
It also depends on what you are using the data for; if it's for non-precise, data-based decisions then it's fine. Especially if you're looking for "vibe"-based decisions before dedicating time to "actually" process the data for confirmation.
$30 to get a view into data that would take at least x many hours of someone's time is actually super cheap, especially if the decision from that result is then to invest or not invest the x many hours to confirm it.
pamplemoose
You take a sample and check
golergka
Many types of data have very easily checkable aggregates. Think accounting books.
FooBarWidget
You can use AI to verify its own work. Last time I split a C++ header file into header + implementation file. I noticed some code got rewritten in a wrong manner, so I asked it to compare the new implementation file against the original header file, but to do so one method at a time. For each method, say whether the code is exactly the same and has the same behavior, ignoring superficial syntax changes and renames. Took me a few times to get the prompt right, though.
tcgv
> I'm leveraging it for massive refactoring now and it is almost magical.
Can you share more about your strategy for "massive refactoring" with Gemini?
Like the steps in general for processing your codebase, and even your main goals for the refactoring.
roygbiv2
Isn't it better to get gemini to create a tool to format the data? Or was it in such a state that that would have been impossible?
cdelsolar
what tool are you using 2.5-pro-exp through? Cline? Or the browser directly?
Nihilartikel
For 2.5 pro exp I've been attaching files into AIStudio in the browser in some cases. In others, I have been using vscode's Gemini Code Assist which I believe recently started using 2.5 Pro. Though at one point I noticed that it was acting noticeably dumber, and over in the corner, sure enough it warned that it had reverted to 2.0 due to heavy traffic.
For the bulk data processing I just used the python API and Jupyter notebooks to build things out, since it was a one-time effort.
manmal
Copilot experimental (need VSCode Insiders) has it. I've thought about trying aider --watch-files though; it also works with multiple files.
statements
Absolutely agree. Granted, it is task-dependent. But when it comes to classification and attribute extraction, I've been using 2.0 Flash heavily across massive datasets. It would not even be viable cost-wise with other models.
sethkim
How "huge" are these datasets? Did you build your own tooling to accomplish this?
bhl
It's cheap but also lazy. It sometimes generates empty strings or empty arrays for tool calls, and then I just re-route the request to a stronger model for the tool call.
I've spent a lot of time on prompts and tool-calls to get Flash models to reason and execute well. When I give the same context to stronger models like 4o or Gemini 2.5 Pro, they're able to get to the same answers in fewer steps but at higher token cost.
Which is to be expected: more guardrails for smaller, weaker models. But then it's a tradeoff; no easy way to pick which models to use.
Instead of SQL optimization, it's now model optimization.
spruce_tips
i have a high volume task i wrote an eval for and was pleasantly surprised at 2.0 flash's cost to value ratio especially compared to gpt4.1-mini/nano
model | accuracy | input price ($/M tokens) | output price ($/M tokens)
Gemini Flash 2.0 Lite: 67% | $0.075 | $0.30
Gemini Flash 2.0: 93% | $0.10 | $0.40
GPT-4.1-mini: 93% | $0.40 | $1.60
GPT-4.1-nano: 43% | $0.10 | $0.40
excited to try out 2.5 flash
jay_kyburz
Can I ask a serious question. What task are you writing where it's OK to get a 7% error rate? I can't get my head around how this can be used.
16bytes
There are tons of AI/ML use-cases where 7% is acceptable.
Historically speaking, if you had a 15% word error rate in speech recognition, it would generally be considered useful. 7% would be performing well, and <5% would be near the top of the market.
Typically, your error rate just needs to be below the usefulness threshold and in many cases the cost of errors is pretty small.
omneity
In my case, I have workloads like this where it’s possible to verify the correctness of the result after inference, so any success rate is better than 0 as it’s possible to identify the “good ones”.
spruce_tips
low stakes text classification, but it's something that needs to be done and couldn't be done in reasonable time frames or at reasonable price points by humans
muzani
I expect some manual correction after the work is done. I actually mentally counted all the times I pressed backspace while writing this paragraph, and it comes down to 45. I'm not counting the next paragraph or changing the number.
Humans make a ton of errors as well. I didn't even notice how many I was making here until I started counting. AI is super useful for getting a first draft out, not for the final work.
sroussey
You could be OCRing a page that includes a summation line, then add up all the numbers and check against the sum.
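A sketch of that kind of check, with invented numbers:

    def totals_match(line_items, reported_total, tol=0.01):
        # Trust the OCR'd line items only if they add up to the OCR'd total.
        return abs(sum(line_items) - reported_total) <= tol

    print(totals_match([19.99, 5.00, 3.50], 28.49))  # True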
ghurtado
I know it's a single data point, but yesterday I showed it a diagram of my fairly complex MicroPython program (including RP2-specific features, DMA and PIO) and it was able to describe in detail not just the structure of the program, but also exactly what it does and how it does it. This is before seeing a single line of code, just going by boxes and arrows.
The other AIs I have shown the same diagram to, have all struggled to make sense of it.
redbell
> Google is silently winning the AI race
Yep, I agree! This convinced me: https://news.ycombinator.com/item?id=43661235
Layvier
Absolutely. So many use cases for it, and it's so cheap/fast/reliable
SparkyMcUnicorn
And stellar OCR performance. Flash 2.0 is cheaper and more accurate than AWS Textract, Google Document AI, etc.
Not only in benchmarks[0], but in my own production usage.
danielbln
I want to use these almost-too-cheap-to-meter models like Flash more. What are some interesting use cases for them?
42lux
The API is free, and it's great for everyday tasks. So yes there is no better bang for the buck.
drusepth
Wait, the API is free? I thought you had to use their web interface for it to be free. How do you use the API for free?
dcre
You can get an API key and they don't bill you. Free tier rate limits for some models (even decent ones like Gemini 2.0 Flash) are quite high.
spruce_tips
create an api key and don't set up billing. pretty low rate limits and they use your data
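Concretely, the flow is: generate a key in AI Studio, then call the API without ever attaching billing. A minimal sketch with the google-genai Python SDK (model name is just an example; free-tier rate limits apply):

    from google import genai

    client = genai.Client(api_key="KEY_FROM_AI_STUDIO")  # no billing configured
    response = client.models.generate_content(
        model="gemini-2.0-flash",
        contents="Explain the difference between a list and a tuple in Python.",
    )
    print(response.text)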
midasz
I use Gemini 2.5 pro experimental via openrouter in my openwebui for free. Was using sonnet 3.7 but I don't notice much difference so just default to the free thing now.
mlboss
using aistudio.google.com
minimaxir
One hidden note from Gemini 2.5 Flash when diving deep into the documentation: for image inputs, not only can the model be instructed to generate 2D bounding boxes of relevant subjects, but it can also create segmentation masks! https://ai.google.dev/gemini-api/docs/image-understanding#se...
At this price point with the Flash model, creating segmentation masks is pretty nifty.
The segmentation masks are a bit of a galaxy brain implementation by generating a b64 string representing the mask: https://colab.research.google.com/github/google-gemini/cookb...
I am trying to test it in AI Studio but it sometimes errors out, likely because it tries to decode the b64 lol.
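Per those docs, each detection comes back as JSON with a box, a label, and the mask as a base64 data-URI PNG, so consuming one looks roughly like this (the field handling is my assumption from the cookbook, not verified):

    import base64
    import io
    from PIL import Image

    mask_b64 = "..."  # the base64 PNG string from one detection's "mask" field

    png = base64.b64decode(mask_b64.removeprefix("data:image/png;base64,"))
    mask = Image.open(io.BytesIO(png)).convert("L")       # per-pixel probabilities
    binary = mask.point(lambda p: 255 if p > 127 else 0)  # threshold to a hard mask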
simonw
This is SO cool. I built an interactive tool for trying this out (bring your own Gemini API key) here: https://tools.simonwillison.net/gemini-mask
More details plus a screenshot of the tool working here: https://simonwillison.net/2025/Apr/18/gemini-image-segmentat...
I vibe coded it using Claude and O3.
ipsum2
The performance is basically so bad it's unusable though, segmentation models and object detection models are still the best, for now.
xnx
There is a starter app in AI Studio that demos this: https://aistudio.google.com/apps/bundled/spatial-understandi...
msp26
I've had mixed results with the bounding boxes even on 2.5 pro. On complex images where a lot of boxes need to be drawn they're in the general region but miss the exact location of objects.
behnamoh
Wait, did they just kill YOLO, at least for time-insensitive tasks?
daemonologist
No, the speed of YOLO/DETR inference makes it cheap as well - probably at least five or six orders of magnitude cheaper.
Edit: After some experimentation, Gemini also seems to not perform nearly as well as a purpose-tuned detection model.
It'll be interesting to test this capability and see how it evolves though. At some point you might be able to use it as a "teacher" to generate training data for new tasks.
minimaxir
YOLO is probably still cheaper if bounding boxes are your main goal. Good segmentation models that work for arbitrary labels, however, are much more expensive to set up and run, so this type of approach could be an interesting alternative depending on performance.
vunderba
Well no. You can run/host YOLO which means not having to submit potentially sensitive information to a company that generates a large amount of revenue from targeted advertising.
daemonologist
Interestingly if you run this in Gemini (instead of AI Studio) you get:
I am sorry, but I was unable to generate the segmentation masks for _ in the image due to an internal error with the tool required for this task.
(Not sure if that's a real or hallucinated error.)
xbmcuser
For a non-programmer like me, Google is becoming shockingly good. It is giving working code the first time. I was playing around with it and asked it to write code to scrape some data off a website to analyse. I was expecting it to write something that would scrape the data, which I would later upload to it to analyse. But it actually wrote code that scraped and analysed the data. It was basic categorizing and counting of the data, but I was not expecting it to do that.
kccqzy
That's the opposite experience of my wife who's in tech but also a non programmer. She wanted to ask Gemini to write code to do some basic data analysis things in a more automated way than Excel. More than once, Gemini wrote a long bash script where some sed invocations are just plain wrong. More than once I've had to debug Gemini-written bash scripts. As a programmer I knew how bash scripts aren't great for readability so I told my wife to ask Gemini to write Python. It resulted in higher code quality, but still contained bugs that are impossible for a non programmer to fix. Sometimes asking a follow up about the bugs would cause Gemini to fix it, but doing so repeatedly will result in Gemini forgetting what's being asked or simply throwing an internal error.
Currently IMO you have to be a programmer to use Gemini to write programs effectively.
drob518
IMO, the only thing that’s consistent about AIs is how inconsistent they are. Sometimes, I ask them to write code and I’m shocked at how well it works. Other times, I feel like I’m trying to explain to a 5-year-old Alzheimer’s patient what I want and it just can’t seem to do the simplest stuff. And it’s the same AI in both cases.
greyadept
I wouldn’t be surprised if AI tools are frequently throttled in the backend to save on costs, resulting in this type of inconsistency.
xbmcuser
I had similar experiences a few months back; that is why I am saying it is becoming shockingly good. The 2.5 is a lot better than the 2.0 version. Another thing I have realized: just like Google search in the past, your query has a lot to do with the results you get. So an example of what you want works at getting better results.
ac29
> I am saying it is becoming shockingly good the 2.5 is a lot better than the 2.0 version
Are you specifically talking about 2.5 Flash? It only came out an hour ago; I don't know how you would have enough experience with it already to come to your conclusion.
(I am very impressed with 2.5 Pro, but that is a different model that's been available for several weeks now)
sbarre
I've found that good prompting isn't just about asking for results but also giving hints/advice/direction on how to go about the work.
I suspect that if Gemini is giving you bash scripts, it's because you're not giving it enough direction. As you pointed out, telling it to use Python, or giving it more expectations about how to go about the work or what the output should be, will give better results.
When I am prompting for technical or data-driven work, I tend to almost walk through what I imagine the process would be, including steps, tools, etc...
SweetSoftPillow
It must have something to do with the way your wife is prompting. I've noticed this with my friends too. I usually get working code from Gemini 2.5 Pro on the first try, and with a couple of follow-up prompts, it often improves significantly, while my friends seem to struggle communicating their ideas to the AI and get worse results.
Good news: Prompting is a skill you can develop.
gregorygoc
Is there a website with off the shelf prompts that work?
halfmatthalfcat
Or we can just learn to write it ourselves in the same amount of time /shrug
dmos62
Which Gemini was it? I've been using 2.5 Flash all day for programming ClojureScript via roo code and it's been great. Provided I'm using agent orchestration, a memory bank, and having it write docs for code it will work on.
yakz
Ask it to write tests with the code and then ask it to fix the errors from the tests rather than just pointing out bugs. If you have an IDE that supports tool use (Claude Code, Roo Code) it can automate this process.
jiggawatts
The AIs, like many things out there, work like an "evil genie". They'll give you what you asked for. The problem is typically that users ask for the wrong thing.
I've noticed beginners make mistakes like using singular terms when they should have used plural ("find the bug" vs "find the bugs"), or they fail to specify their preferred platform, language, or approach.
You mentioned your wife is using Excel, which is primarily used on Windows desktops and/or with the Microsoft ecosystem of products such as Power BI, PowerShell, Azure, SQL Server, etc...
Yet you mention she got a bash script using sed, both of which are from the Linux / GNU ecosystem. That implies that your wife didn't specify that she wanted a Microsoft-centric solution to her problem!
The correct answer here would have likely to have been to use Microsoft Fabric, which is an entire bag of data analysis and reporting tools that has data pipelines, automation, publishing, etc...
Or... just use the MashUp engine that's built-in to both Excel and PowerBI, which allows a surprisingly complex set of text, semi-structured, and tabular data processing. It can re-run the import and update graphs and charts with the new data.
PS: This is similar to going up to a Node.js programmer with a request. It doesn't matter what it is, they will recommend writing JavaScript to solve the problem. Similarly, a C++ developer will reach for C++ to solve everything they're asked to do. Right now, the AIs strongly prefer Linux, JavaScript, and especially Python for problem solving, because that's the bulk of the open-source code they were trained with.
999900000999
Let's hope that's the case for a while.
I want to be able to just tell ChatGPT or whatever to create a full project for me, but I know that the moment it can do that without any human intervention, I won't be able to find a job.
ModernMech
I've been continually disappointed. I've been told it's getting exponentially better and we won't be able to keep up with how good they get, but I'm not convinced. I'm using them every single day and I'm never shocked or awed by their competence, but instead continually vexed that they're not living up to the hype I keep reading.
Case in point: there was a post here recently about implementing a JS algorithm that highlighted headings as you scrolled (side note: can anyone remember what the title was? I can't find it again), but I wanted to test the LLM for that kind of task.
Pretty much no matter what I did, I couldn't get it to give me a solution that would highlight all of the titles down to the very last one.
I knew what the problem was, but even guiding the AI, it couldn't fix the code. I tried multiple AIs, different strategies. The best I could come up with was to guide it step by step on how to fix the code. Even telling it exactly what the problem was, it couldn't fix it.
So this goes out to the "you're prompting it wrong" crowd... Can you show me a prompt or a conversation that will get an AI to spit out working code for this task: JavaScript that will highlight headings as you scroll, down to the very last one. The challenge is to prompt it to do this without telling it how to implement it.
I figure this should be easy for the AI because this kind of thing is very standard, but maybe I'm just holding it wrong?
jsnell
Even as a human programmer I don't actually understand your description of the problem well enough to be confident I could correctly guess your intent.
What do you mean by "highlight as you scroll"? I guess you want a single heading highlighted at a time, and it should somehow depend on the viewport. But even that is ambiguous. Do you want the topmost heading in the viewport? The bottommost? Depending on scroll direction?
This is what I got one-shot from Gemini 2.5 Pro, with my best guess at what you meant: https://gemini.google.com/share/d81c90ab0b9f
It seems pretty good. Handles scrolling via all possible ways, does the highlighting at load too so that the highlighting is in effect for the initial viewport too.
The prompt was "write me some javascript that higlights the topmost heading (h1, h2, etc) in the viewport as the document is scrolled in any way".
So I'm thinking your actual requirements are very different than what you actually wrote. That might explain why you did not have much luck with any LLMs.
ModernMech
> Even as a human programmer I don't actually understand your description of the problem well enough to be confident I could correctly guess your intent.
Yeah, you understand what I meant. The code Gemini gave you implements the behavior, and the AI I used gave me pretty much the same thing. There's a problem with the algorithm tho -- if there's a heading too close to the bottom of the page it will never highlight. The page doesn't exhibit the bug because it provides enough padding at the bottom.
But my point wasn't that it couldn't one-shot the code; my point was that I couldn't interrogate it into giving me code that behaved as I wanted. It seemed too anchored to the solution it had provided me, where it said it was offering fixes that didn't do anything, and when I pointed that out it apologized and proceeded to lie about fixing the code again. It appeared to be an infinite loop.
I think what's happened here is the opposite of what you suggest; this is a very common tutorial problem, you can find solutions of the variety you showed me all over the internet, and that's essentially what Gemini gave you. But being tutorial code, it's very basic and tries not to implement a more robust solution that is needed in production websites. When I asked AI for that extra robustness, it didn't want to stray too far from the template, and the bug persisted.
Maybe you can coax it into getting a better result? I want to understand how.
croemer
"Overengineered anchor links": https://news.ycombinator.com/item?id=43570324
ModernMech
Thank you!!
ant6n
Last time I tried Gemini, it messed with my Google Photos data plan and family sharing. I wish I could try the AI separately from my Google account.
jsnell
> I wish I could try the AI separate from my Google account.
If that's a concern, just create another account. Doesn't even require using a separate browser profile, you can be logged into multiple accounts at once and use the account picker in the top right of most their apps to switch.
xnx
50% price increase from Gemini 2.0 Flash. That sounds like a lot, but Flash is still so cheap when compared to other models of this (or lesser) quality. https://developers.googleblog.com/en/start-building-with-gem...
swyx
done pretty much in line with the price/Elo pareto frontier https://x.com/swyx/status/1912959140743586206/photo/1
oezi
So if I see it right flash 2.5 doesn't push the pareto front forward, right? It just sits between 2.5 pro and 2.0 flash.
https://storage.googleapis.com/gweb-developer-goog-blog-asse...
jsnell
It does, that point in the tradeoff space was not available until now. Any model that's not dominated by at least one model on both axes will push forward the frontier. (The actual frontier isn't actually a straight line between the points on the frontier like visualized there. It's a step function.)
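To make "dominated" concrete, a tiny sketch with invented price/score points:

    def pareto_frontier(models):
        # Keep a model unless some other model is at least as cheap AND at
        # least as strong, and strictly better on one of the two axes.
        return [
            (name, cost, score)
            for name, cost, score in models
            if not any(
                c <= cost and s >= score and (c < cost or s > score)
                for _, c, s in models
            )
        ]

    points = [("2.0-flash", 0.40, 70), ("2.5-flash", 3.50, 80), ("2.5-pro", 10.0, 88)]
    print(pareto_frontier(points))  # all three survive: each is a distinct tradeoff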
swyx
yeah but 1) it's useful to have the point there on the curve if you need it, 2) intelligence is multidimensional; maybe in 2.5 Flash you get qualitatively a better set of capabilities for your needs than in 2.5 Pro
xnx
Love that chart! Am I imagining that I saw a version of that somewhere that even showed how the boundary has moved out over time?
onlyrealcuzzo
Why isn't Phi-3, Llama 3, or Mistral in the comparison?
Aren't there a lot of hosted options? How do they compare in terms of cost?
deanmoriarty
Genuine naive question: when it comes to Google, HN generally has a negative view of it (pick any random story on Chrome, ads, search, the web, working at FAANG, etc. and this should be obvious from the comments), yet when it comes to AI there is a somewhat notable "cheering effect" for Google to win the AI race that goes beyond a conventional appreciation of a healthy competitive landscape, which may appear as a bit of a double standard.
Why is this? Is it because OpenAI is seen as such a negative player in this ecosystem that Google “gets a pass on this one”?
And bonus question: what do people think will happen to OpenAI if Google wins the race? Do you think they’ll literally just go bust?
sagarpatil
Most of us weren’t using Gemini pro models (1.0, 1.5, 2.0) but the recent 2.5 pro is such a huge step up. It’s better than 3.7 sonnet for coding. Better than o1, o3-mini models and now o3 and o4-mini. It’s become my daily driver. It does everything I need with almost 100% accuracy, is cheap, fast, 1 million context window, uses google web search for grounding, can fetch YouTube video transcripts, can fetch website content, works in google workspace: Gmail, Docs, Sheets. Really hard to beat this combo. Oh and if you subscribe to their AI plan it comes with 2 TB drive storage.
antirez
Maybe because Google is largely responsible, having paid for the research, for most of the results we are seeing now. I'm not a Google fan on the web side, or in their idea of what software engineering is, but they deserve to win the AI race, because right now all the other players have provided a lot less than what Google did as public research. Also, with Gemini 2.5 Pro there was a big hype moment, because the model is of ability not seen before.
wkat4242
Maybe they deserve it but it would be really bad for the world. Because they will enshittify the hell out of it once they're established. That's their MO.
I don't want Google to have a stranglehold over yet another type of online service. So I avoid them.
And things are going so fast now, whatever Google has today that might be better than the rest, in two months the rest will have it too. Of course Google will have something new again. But being 2 months behind isn't a huge deal. I don't have to have the 'winning' product. In fact most of my AI tasks go to an 8b llama 3.1 model. It's about on par with gpt 3.5 but that's fine.
visarga
The situation with LLMs is much different than search, Google doesn't have such a large lead here. LLMs are social things, they learn from each other, any provider with SOTA model will see its abilities leaked through synthetic training data. That's what GPT-4 did for a year, against the wishes of OpenAI, powering up millions of open model finetunes.
CephalopodMD
As a googler working in LLM space, this feels like revisionist history to me haha! I remember a completely different environment only a few months ago when Anthropic was the darling child, and before that it was OpenAI (and for like 4 weeks somewhere in there, it was Deepseek). For literally years at this point, every time Bard or Gemini would make a major release, it would be largely ignored or put down in favor of the next "big thing" OpenAI was doing or Claude saturating coding benchmarks, never mind that Google was often just behind with the exact same tech ready to go, in some cases only missing their demo release by literally 1 day (remember live voice?). And every time this happened, folks would be posting things to the effect of "LOL I can't believe Google is losing the AI race - didn't they invent this?", "this is like Microsoft dropping the ball on mobile", "Google is getting their lunch eaten by scrappy upstarts," etc. I can't lie, it stings a bit when that's what you work on all day.
2.5 was quite good. Not stupidly good like the jump from GPT 2 to 3 or 3.5 to 4, but really good. It was a big jump in ELO and benchmarks. People like it, and I think it's just psychologically satisfying that the player everybody would have expected to win the AI race is currently in the lead. Gemini finally gets a day in the sun.
I'm sure this will change with whenever somebody comes up with the next big idea though. It probably won't take much to beat Gemini in the long run. There is literally zero moat.
wyre
Gemini is just that good. From my usage it is much smarter than DeepSeek or Claude 3.7 Thinking models.
A lot of Google's market share across its services comes from the monopoly effects Google has. Gemini 2.5 is noticeably smarter than its competitors, so I see the applause as being for the quality of the LLM and not for Google.
I think it’s way too early to say anything about who is winning the race. There is still a long way to go; o3 scores highest in Humanity’s Last Exam (https://agi.safe.ai/) at 20%, 2.5 scores 18%.
int_19h
I dislike Google rather strongly due to their ad-based business model, and I was previously very skeptical of their AI offerings because of very lackluster performance compared to OpenAI and Claude. But I can't help but be impressed with Gemini Pro 2.5 for "deep research" and agentic coding. I have subscriptions with all three so that I can keep up with SOTA, but if I had to choose only one to keep, right now it'd be Gemini.
That said I still don't "cheer" for them and I would really rather someone else win the race. But that is orthogonal to recognition of observed objective superiority.
julianeon
It's been a while since they won something the "old" Google way: by building a superior product that is #1 on its merits.
In that sense Gemini is a throwback: there's no trick - it's objectively better than everything else.
01100011
Didn't Google invent the transformer?
I think a lot of us see Google as both an evil advertiser and as an innovator. Google winning AI is sort of nostalgic for those of us who once cheered the "Do No Evil" (now mostly "Do Know Evil") company.
I also like how Google is making quiet progress while other companies take their latest incremental improvement and promote it as hard as they can.
pkaye
I think for a while some people felt the Google AI models were worse, but now they're getting much better. On the other hand, Google has their own hardware, so they can drive down the costs of using the models, which keeps pressure on OpenAI to remain cost-competitive. Then you have Anthropic, which has very good models but is very expensive. I've heard they are working with Amazon to build a data center with Amazon's custom AI chips, so maybe they can bring down their costs. In the end, all these companies will need a good model and lower-cost hardware to succeed.
mmaunder
More great innovation from Google. OpenAI have two major problems.
The first is Google's vertically integrated chip pipeline and deep supply chain and operational knowledge when it comes to creating AI chips and putting them into production. They have a massive cost advantage at every step. This translates into more free services, cheaper paid services, more capabilities due to more affordable compute, and far more growth.
The second problem is data starvation and the unfair advantage that social media has as a source of continually refreshed knowledge. Now that the foundational model providers have churned through Common Crawl and are competing to consume things like video and whatever is left, new data is becoming increasingly valuable as a differentiator and, more importantly, as a provider of sustained value for years to come.
SamA has signaled both of these problems when he made noises about building a fab a while back and is more recently making noises about launching a social media platform off OpenAI. The smart money among his investors know these issues to be fundamental in deciding if OAI will succeed or not, and are asking the hard questions.
If the only answer for both is "we'll build it from scratch", OpenAI is in very big trouble. And it seems that that is the best answer that SamA can come up with. I continue to believe that OpenAI will be the Netscape of the AI revolution.
The win is Google's for the taking, if they can get out of their own way.
throwup238
Nobody has really talked about what I think is an advantage just as powerful as the custom chips: Google Books. They already won a landmark fair use lawsuit against book publishers, digitized more books than anyone on earth, and used their Captcha service to crowdsource its OCR. They've got the best* legal cover and all of the best sources of human knowledge already there. Then Youtube for video.
The chips of course push them over the top. I don't know how much Deep Research is costing them but it's by far the best experience with AI I've had so far with a generous 20/day rate limit. At this point I must be using up at least 5-10 compute hours a day. Until about a week ago I had almost completely written off Google.
* For what it's worth, I don't know. IANAL
dynm
The amount of text in books is surprisingly finite. My best estimate was that there are ~10¹³ tokens available in all books (https://dynomight.net/scaling/#scaling-data), which is less than frontier models are already being trained on. On the other hand, book tokens are probably much "better" than random internet tokens. Wikipedia for example seems to get much higher weight than other sources, and it's only ~3×10¹⁰ tokens.
dr_dshiv
We need more books! On it…
paxys
LibGen already exists, and all the top LLM publishers use it. I don't know if Google's own book index provides a big technical or legal advantage.
disgruntledphd2
I'd be very surprised if the Google books index wasn't much bigger and more diverse than libgen.
jofzar
Something that is not specifically called out but is also super relevant is actually the transcription of YouTube videos.
Every video is machine-transcribed and stored, and for larger videos the authors will often transcribe them themselves.
This is something they already have; it doesn't need any more "work" to get, versus a competitor.
jppittma
I would think the biggest advantage is YouTube. There's a lot of modern content for analysis that's uncontaminated by LLMs.
Keyframe
Google has the data and has the hardware, not to mention software and infrastructure talent. Once this Bismarck turns around, and it looks like it is turning, who can parry it for real? They have internet.zip and all the previous versions as well; they have YouTube, email, search, books, traffic, maps and the businesses on them, phones and the habits around them, even the OG social network, Usenet. It's a sleeping giant starting to wake up, and it's already causing commotion; let's see what it does when it drinks its morning coffee.
kriro
Agreed. One of Google's big advantages is the data access and integrations. They are also positioned really well for the "AI as entertainment" sector with YouTube, which will be huge (imo). They also have the knowledge in adtech, and injecting ads into AI is an obvious play. As is harvesting AI chat data.
Meta and Google are the long term players to watch as Meta also has similar access (Insta, FB, WhatsApp).
whoisthemachine
On-demand GenAI could definitely change the meaning of "You" in "YouTube".
eastbound
They have the Excel spreadsheets of all startups and businesses of the world (well 50/50 with Microsoft).
And Atlassian has all the project data.
jjani
More like 5/95 with Microsoft, and that's being generous; I wouldn't be surprised if it was 1/99. It's basically just hip tech companies and a couple of Fortune 500s that use Google Docs. And even their finance departments often use Excel. HN keeps underestimating how much of the physical world runs on Excel.
Keyframe
I still can't understand how Google missed out on GitHub, especially since they were in the same space before with Google Code. I do understand how they couldn't make a GitHub, though.
whyenot
Another advantage that Google has is the deep integration of Gemini into Google Office products and Gmail. I was part of a pilot group and got to use a pre-release version and it's really powerful and not something that will be easy for OpenAI to match.
mmaunder
Agreed. Once they dial in the training for sheets it's going to be incredible. I'm already using notebooklm to upload finance PDFs, then having it generate tabular data and copypasta into sheets, but it's a garage solution compared to just telling it to create or update a sheet with parsed data from other sheets, PDFs, docs, etc.
And as far as gmail goes, I periodically try to ask it to unsubscribe from everything marketing related, and not from my own company, but it's not even close to being there. I think there will continue to be a gap in the market for more aggressive email integration with AI, given how useless email has become. I know A16Z has invested in a startup working on this. I doubt Gmail will integrate as deep as is possible, so the opportunity will remain.
Workaccount2
I frankly doubt the future of office products. In the last month I have ditched two separate Excel productivity templates in favor of bespoke wrappers around SQLite databases, written by Claude and Gemini. Easier to use and probably 10x as fast.
You don't need a 50 function swiss army knife when your pocket can just generate the exact tool you need.
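To be concrete, the "wrapper" is nothing fancy. A sketch of the shape (hypothetical schema; the real ones were model-generated rather than hand-written):

    import sqlite3

    con = sqlite3.connect("tracker.db")
    con.execute(
        "CREATE TABLE IF NOT EXISTS expenses (day TEXT, item TEXT, cost REAL)"
    )

    def add(day: str, item: str, cost: float) -> None:
        con.execute("INSERT INTO expenses VALUES (?, ?, ?)", (day, item, cost))
        con.commit()

    def monthly_total(month: str) -> float:
        # month like '2025-04'; ISO date strings make prefix matching easy
        row = con.execute(
            "SELECT COALESCE(SUM(cost), 0) FROM expenses WHERE day LIKE ?",
            (month + "%",),
        ).fetchone()
        return row[0]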
jdgoesmarching
You say deep integration, yet there is still no way to send a Gemini Canvas to Docs without a lot of tedious copy-pasting and formatting because Docs still doesn’t actually support markdown. Gemini in Google Office in general has been a massive disappointment for all but the most simplistic of writing tasks.
They can have the most advanced infrastructure in the world, but it doesn’t mean much if Google continues its infamous floundering approach to product. But hey, 2.5 pro with Cline is pretty nice.
whyenot
Maybe I'm misunderstanding, but there is literally a Share button in Canvas right below each response, with the option to export to Docs. Within Docs, you can also click on the Gemini "star" at the upper right to get a prompt and then export into the open document. Note that this is with "experimental" Gemini 2.5 Pro.
disgruntledphd2
Docs supports markdown in comments, where it's the only way to get formatting.
I love Google's product dysfunction sometimes :/
chucky_z
I have access to this now and I want it to work so badly, but it's just proper shit. Absolute rubbish.
They really, truly need to fix this integration. Gemini in Google Docs is barely acceptable, it doesn't work at all (for me) in Gmail, and I've not yet had it do anything other than error in Google Sheets.
nimchimpsky
[dead]
zoogeny
If the battle was between Altman and Pichai I'd have my doubts.
But the battle is between Altman and Hassabis.
I recall some advice on investment from Buffett regarding how he invests in the management team.
mdp2021
Could you please expand, on both your points?
zoogeny
It is more gut feel than a rational or carefully reasoned argument.
I think Pichai has been an exceptional revenue maximizer but he lacks vision. I think he is probably capable of squeezing tremendous revenue out of AI once it has been achieved.
I like Hassabis in a "good vibe" way when I hear him speak. He reminds me of engineers that I have worked with personally and have gained my respect. He feels less like a product focused leader and more of a research focused leader (AlphaZero/AlphaFold) which I think will be critical to continue the advances necessary to push the envelope. I like his focus on games and his background in RL.
Google's war chest of Ad money gives Hassabis the flexibility to invest in non-revenue generating directions in a way that Altman is unlikely to be able to do. Altman made a decision to pivot the company towards product which led to the exodus of early research talent.
mmaunder
Not sure why their comment was downvoted. Google the names: Hassabis runs DeepMind at Google, which makes Gemini, and he's quite brilliant with an unbelievable track record. Buffett investing in teams points out that there are smart people out there who think good leadership is a good predictor of future success.
sidibe
Sorry, but my eyes rolled to the back of my head with this one. This is between two teams with tons of smart contributors; the difference is that one is more flexible and able to take risks, while the other has many times more researchers and the world's best and most mature infrastructure/tooling. It's not a CEO vs CEO battle.
zoogeny
I think it requires a nuanced take but allow me to provide some counter-examples.
The first is CEO pay rates. Another is the highest-paid public employees (who tend to be coaches at state schools). This is evidence that the market highly values managers.
Another is systemic failures within enterprises. When Boeing had a few very public plane crashes, a certain narrative suggested that the transition from highly capable engineer-managers to finance-focused managers contributed to the problem. A similar narrative has been used to explain the decline of Intel.
Consider the return of Steve Jobs to Apple. Or the turn around at Microsoft with Nadella.
All of these are complex cases that don't submit to an easy analysis. Success and failure are definitely multi-factor and rarely can be traced to a single definitive cause.
Perhaps another way to look at it would be: what percentage of the success of highly complex organizations can be attributed to management? To what degree can poor management decisions contribute to the failure of an otherwise capable organization?
How much you choose to weight those factors is entirely up to you.
edit: I was also thinking about the way we think about the advantage of exceptional generals/admirals in military analysis. Or the effect a president can have on the direction of a country.
peterjliu
another advantage is that people want the Google bot to crawl their pages, unlike the bots of most AI companies
CobrastanJorji
Reddit was an interesting case here. They knew that they had particularly good AI training data, and they were able to hold it hostage from the Google crawler, which was an awfully high risk play given how important Google search results are to Reddit ads, but they likely knew that Reddit search results were also really important to Google. I would love to be able to watch those negotiations on each side; what a crazy high stakes negotiation that must've been.
mattlondon
Particularly good training data?
You can't mean the bottom-of-the-barrel dross that people post on Reddit, so not sure what data you are referring to? Click-stream?
mmaunder
This is an underrated comment. Yes it's a big advantage and probably a measurable pain point for Anthropic and OpenAI. In fact you could just do a 1% survey of robots.txt out there and get a reasonable picture. Maybe a fun project for an HN'er.
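Something like this would be a starting point (a toy sketch using only the stdlib robot parser; the bot names and domain list are placeholders you'd swap for a real 1% sample):

    import urllib.robotparser

    # Crawlers to compare: Google's, plus the big AI labs' fetchers
    BOTS = ["Googlebot", "GPTBot", "ClaudeBot", "CCBot"]

    def crawl_policy(domain: str) -> dict:
        rp = urllib.robotparser.RobotFileParser()
        rp.set_url(f"https://{domain}/robots.txt")
        rp.read()
        # can_fetch: may this user agent fetch the homepage?
        return {bot: rp.can_fetch(bot, f"https://{domain}/") for bot in BOTS}

    for domain in ["example.com"]:  # your sample goes here
        try:
            print(domain, crawl_policy(domain))
        except OSError as exc:
            print(domain, "error:", exc)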
newfocogi
This is right on. I work for a company with somewhat of a data moat and AI aspirations. We spend a lot of time blocking everyone's bots except for Google's. We have people whose entire job it is to make it faster for Google to access our data. We exist because Google accesses our data. We can't not let them have it.
jiocrag
Excellent point. If they can figure out how to either remunerate or drive traffic to third parties in conjunction with this, it would be huge.
onlyrealcuzzo
> The smart money among his investors know these issues to be fundamental in deciding if OAI will succeed or not, and are asking the hard questions.
OpenAI has already succeeded.
If it ends up being a $100B company instead of a $10T company, that is success. By a very large margin.
It's hard to imagine a world in which OpenAI just goes bankrupt and ends up being worth nothing.
samuel
I can, and I would say it's a likely scenario, say 30%. If they don't have a significant edge over their competitors in the capabilities of their models, what's left? A money losing web app, and some API services that I'm sure aren't very profitable either. They can't compete with Google, Grok, Meta, MS, Amazon... They just can't.
They could end up being the AltaVista of this era.
bdangubic
It goes bankrupt when the cost of running the business outweighs the earnings in the long run.
dyauspitr
I haven’t heard this much positive sentiment about Google in a while. Making something freely available really turns public sentiment around.
labrador
> If the only answer for both is "we'll build it from scratch", OpenAI is in very big trouble
They could buy Google+ code from Google and resurrect it with OpenAI branding. Alternately they could partner with Bluesky
parsimo2010
I don't think the issue is solving the technical implementation of a new social media platform. The issue is whether a new social media platform from OpenAI will deliver the kind of value that existing platforms deliver. If they promise investors that they'll get TikTok/Meta/YouTube levels of content+interaction (and all the data that comes with it), but deliver Mastodon levels, then they are in trouble.
simonw
I spotted something interesting in the Python API library code:
https://github.com/googleapis/python-genai/blob/473bf4b6b5a6...
    class ThinkingConfig(_common.BaseModel):
        """The thinking features configuration."""

        include_thoughts: Optional[bool] = Field(
            default=None,
            description="""Indicates whether to include thoughts in the response. If true, thoughts are returned only if the model supports thought and thoughts are available.""",
        )
        thinking_budget: Optional[int] = Field(
            default=None,
            description="""Indicates the thinking budget in tokens.""",
        )

That thinking_budget thing is documented, but what's the deal with include_thoughts? It sounds like it's an option to have the API return the thought summary... but I can't figure out how to get it to work, and I've not found documentation or example code that uses it. Anyone managed to get Gemini to spit out thought summaries in its API using this option?
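For reference, this is the shape of the call I've been trying, without luck so far (a sketch assuming the google-genai client; the model string is a guess, and the Part.thought flag is taken from the same library code):

    from google import genai
    from google.genai import types

    client = genai.Client(api_key="...")

    response = client.models.generate_content(
        model="gemini-2.5-flash-preview-04-17",  # assumed model string
        contents="What is 3 * 37?",
        config=types.GenerateContentConfig(
            thinking_config=types.ThinkingConfig(
                include_thoughts=True,  # the undocumented flag in question
                thinking_budget=1024,
            ),
        ),
    )

    # If thought summaries ever come back, the library suggests they'd be
    # parts flagged with .thought on the candidate content:
    for part in response.candidates[0].content.parts:
        label = "THOUGHT:" if getattr(part, "thought", False) else "ANSWER:"
        print(label, part.text)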
msp26
The API won't give you the "thinking" tokens; those are only visible in AI Studio, probably to try to stop distillation. Very disappointing. I find reading the CoT incredibly informative for identifying failure modes.
> Hey Everyone,
> Moving forward, our team has made a decision to only show thoughts in Google AI Studio. Meaning, we no longer return thoughts via the Gemini API. Here is the updated doc to reflect that.
https://discuss.ai.google.dev/t/thoughts-are-missing-cot-not...
---
After I wrote all of that I see that the API docs page looks different today and now says:
>Note that a summarized version of the thinking process is available through both the API and Google AI Studio.
https://ai.google.dev/gemini-api/docs/thinking
Maybe they just updated it? Or people aren't on the same page at Google, idk.
Previously it said
> Models with thinking capabilities are available in Google AI Studio and through the Gemini API. Note that the thinking process is visible within Google AI Studio but is not provided as part of the API output.
https://web.archive.org/web/20250409174840/https://ai.google...
phillypham
They removed the docs and support for it https://github.com/googleapis/python-genai/commit/af3b339a9d....
You can see the thoughts in AI Studio UI as per https://ai.google.dev/gemini-api/docs/thinking#debugging-and....
lemming
I maintain an alternative client which I build from the API definitions at https://github.com/googleapis/googleapis, which according to https://github.com/googleapis/python-genai/issues/345 should be the right place. But neither the AI Studio nor the Vertex definitions even have ThinkingConfig yet - very frustrating. In general it's amazing how much API munging is required to get a working client from the public API definitions.
Deathmax
It is gated behind the GOOGLE_INTERNAL visibility flag, which only internal Google projects and Cursor have at the moment as far as I know.
qwertox
In AI Studio the Flash model has two toggles: Enable thinking and Set thinking budget. If the thinking budget is enabled, you can set the max number of tokens it can use to think; otherwise it's Auto.
serjester
Just ran it on one of our internal PDF (3 pages, medium difficulty) to json benchmarks:
gemini-flash-2.0: ~60% accuracy, 6,250 pages per dollar
gemini-2.5-flash-preview (no thinking): ~80% accuracy, 1,700 pages per dollar
gemini-2.5-flash-preview (with thinking): ~80% accuracy (not sure what's going on here), 350 pages per dollar
gemini-flash-2.5: ~90% accuracy, 150 pages per dollar
I do wish they separated the thinking variant from the regular one - it's incredibly confusing when a model parameter dramatically impacts pricing.
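For anyone reproducing this, roughly how I toggle the variants (a sketch assuming the google-genai client; the model string and prompt are simplified stand-ins for our benchmark harness):

    from google import genai
    from google.genai import types

    client = genai.Client(api_key="...")

    def extract_json(pdf_bytes: bytes, thinking: bool):
        # On the 2.5 Flash preview, thinking_budget=0 disables thinking;
        # leaving the config unset lets the model pick its own budget.
        cfg = None if thinking else types.GenerateContentConfig(
            thinking_config=types.ThinkingConfig(thinking_budget=0),
        )
        return client.models.generate_content(
            model="gemini-2.5-flash-preview-04-17",
            contents=[
                types.Part.from_bytes(data=pdf_bytes, mime_type="application/pdf"),
                "Extract the fields in this document as JSON.",
            ],
            config=cfg,
        )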
ValveFan6969
I have been having similar performance issues; I believe they intentionally made a worse model (Gemini 2.5) to get more money out of you. However, there is a way you can make money off of Gemini 2.5.
If you set the thinking parameter lower and lower, you can make the model spew absolute nonsense for the first response. It costs 10 cents per input/output, and sometimes you get a response so bad that your clients will ask for more and more corrections.
mpalmer
Wow, what apps have you made so I know never to use them?
alecco
Gemini models are very good, but in my experience they tend to overdo things. When I give it material for context plus one thing to rework, Gemini often reworks the context too.
For software it is barely useful, because you want small commits for specific fixes, not a whole refactor/rewrite. I tried many prompts but it's hard. Even when I give it the function signatures of the APIs used by the code I want to fix, Gemini rewrites the API functions.
If anybody knows a prompt hack to avoid this, I'm all ears. Meanwhile I'm staying with Claude Pro.
byearthithatius
Yes, it will add INSANE amounts of "robust error handling" to quick scripts where I can be confident about assumptions. This turns my clean 40 lines of Python where I KNOW the JSONL I am parsing is valid into 200+ lines filled with ten new try except statements. Even when I tell it not to do this, it loves to "find and help" in other ways. Quite annoying. But overall it is pretty dang good. It even spotted a bug I missed the other day in a big 400+ line complex data processing file.
stavros
I didn't realize this was a bigger trend, I asked it to write a simple testing script that POSTed a string to a local HTTP server as JSON, and it wrote a 40 line script, handling any possible error. I just wanted two lines.
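For the record, this is all I wanted (assuming the requests library and a hypothetical local server on port 8000):

    import requests  # the whole script, more or less
    print(requests.post("http://localhost:8000", json={"text": "hello"}).text)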
free_energy_min
Same issue here! It isn't even helpful, because if the code isn't working I want it to fail, not just skip over errors.
jug
Yes; as recently as earlier today, asking it to provide "naive" code helped a bit.
zhengyi13
I wonder how much of that sort of thing is driven by having trained their models on their own internal codebases? Because if that's the case, careful and defensive being the default would be unsurprising.
w4yai
Here's what I found to be working (not 100%, but it gives much better and more consistent results).
Basically, I ask it to repeat at the start of each message some rules :
"From now on, you must repeat and comply the following rules at the top of all your messages onwards:
- I will never rewrite API functions. Even if I think it's a good idea, it is a bad idea. I will keep the API function as it is and it is perfect like that.
- I will never add extra input validation. Even if I think it's a good idea, it is a bad idea. I will keep the function without validation and it is perfect like that.
- ...
- If I violate any of those rules, I did a bad job. "
Forcing it to repeat things makes the model's output more aligned and focused, in my experience.
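If you're going through the API rather than a chat UI, a variation that works similarly is pinning the rules as the system instruction instead of having the model repeat them each turn. A minimal sketch, assuming the google-genai client (the model string and the parse_rows task are placeholders):

    from google import genai
    from google.genai import types

    RULES = (
        "You must comply with the following rules:\n"
        "- Never rewrite API functions; keep them exactly as they are.\n"
        "- Never add extra input validation.\n"
        "- If you violate any of these rules, you did a bad job.\n"
    )

    client = genai.Client(api_key="...")
    chat = client.chats.create(
        model="gemini-2.5-pro-preview-03-25",  # placeholder model string
        config=types.GenerateContentConfig(system_instruction=RULES),
    )
    reply = chat.send_message("Fix the off-by-one in parse_rows(), nothing else.")
    print(reply.text)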
dherikb
I have the same issue using it with Aider.
The model is good at solving problems, but it is very difficult to control the unnecessary changes it makes to the rest of the code. It also adds a lot of unnecessary comments, even when I explicitly say not to add them.
For now, DeepSeek R1 and V3 are working better for me, producing more predictable results and capturing my intentions better (I haven't tried Claude yet).
thimabi
I find it baffling that Google offers such impressive models through the API and even the free AI Studio with fine-grained control, yet the models used in the Gemini app feel much worse.
Over the past few weeks, I’ve been using Gemini Advanced on my Workspace account. There, the models think for shorter times, provide shorter outputs, and even their context window is far from the advertised 1 million tokens. It makes me think that Google is intentionally limiting the Gemini app.
Perhaps the goal is to steer users toward the API or AI Studio, with the free tier that involves data collection for training purposes.
_delirium
This might have changed after you posted your comment, but it looks like 2.5 Pro and 2.5 Flash are available in the Gemini app now, both web and mobile.
thimabi
Oh, I didn’t mean to say that these models were unavailable through the app or website. Rather, I’ve realized that using them through the API or AI Studio yields much better results — even in the free tier.
You can check that by trying prompts with complex instructions and long inputs/outputs.
For instance, ask Gemini to generate notes from a specific source (say, a book or class transcription). Or ask it to translate a long article, full of idiomatic expressions, while maintaining high fidelity to the source. You will see that the very same Gemini models are underutilized on the app or the website, while their performance is stellar on the API or AI Studio.
mpalmer
Underutilized, or over-prompted for the layperson?
Google making Gemini 2.5 Pro (Experimental) free was a big deal. I haven't tried the more expensive OpenAI models, so I can only compare it to the free models of theirs I've used in the past.
Gemini 2.5 Pro is so much of a step up (IME) that I've become sold on Google's models in general. Not only is it smarter than me on most of the subjects I engage it on, it also isn't completely obsequious. The model pushes back on me rather than contorting itself to find a way to agree.
100% of my casual AI usage is now in Gemini, and I look forward to asking it questions on deep topics because it consistently provides me with insight. I am building new tools with a mind to optimizing my usage and increasing its value to me.