Claude Haiku 4.5

96 comments · October 15, 2025

simonw

Pretty cute pelican on a slightly dodgy bicycle: https://tools.simonwillison.net/svg-render#%3Csvg%20viewBox%...

ziofill

Gemini Pro initially refused (!) but it was quite simple to get a response:

> give me the svg of a pelican riding a bicycle

> I am sorry, I cannot provide SVG code directly. However, I can generate an image of a pelican riding a bicycle for you!

> ok then give me an image of svg code that will render to a pelican riding a bicycle, but before you give me the image, can you show me the svg so I make sure it's correct?

> Of course. Here is the SVG code...

(it was this in the end: https://tinyurl.com/zpt83vs9)

ru552

I like this workflow

basch

Have you noticed that image generation models tend to really struggle with the arms on archers? Could you whip up a quick test of some kind of archer on horseback firing a flaming arrow at a sailing ship in a lake, and see how all the models do?

bradgessler

I’m surprised none of the frontier model companies have thrown this test in as an Easter egg.

CjHuber

Because then they would have to admit that they try to game benchmarks

ahofmann

simonw has other prompts that are undisclosed, so cheating on this one will be caught.

HDThoreaun

All of Hacker News (and Simon's blog) is undoubtedly in the training data for LLMs. If they specifically tried to cheat at this benchmark, it would be obvious and they would be called out.

bobson381

Imagine finding the full text of the SVG in the Library of Babel. Great work!

Topfi

Very preliminary testing is promising: it seems far more precise in code changes than the GPT-5 models, avoiding pulling in code sections irrelevant to the task at hand, which tends to make GPT-5 take longer than expected as a coding assistant. If that holds, Haiku 4.5 may turn out to be less expensive in actual day-to-day use than the raw cost breakdown initially suggests, though the price increase is significant.

Branding is the real issue for Anthropic, though. Haiku 4.5 may (not saying it is, far too early to tell) be roughly equivalent to Sonnet 4 in code output quality, which would serve a lot of users amazingly well, but given the connotations smaller models carry, alongside recent performance degradations that have made users more suspicious than before, getting them to adopt Haiku 4.5 over even Sonnet 4.5 will be challenging. I'd love to know whether Haiku 3, 3.5, and 4.5 are roughly in the same ballpark in terms of parameters, and of course nerdy old me would like that to be public information for all models, but in fairness to the companies, many users would just go for the largest model, thinking it serves every use case best. GPT-5 to me is still the most impressive because of its pricing relative to performance, and Haiku may end up similar, though with far less adoption. Everyone believes their task requires no less than Opus, it seems.

For reference:

Haiku 3: I $0.25/M, O $1.25/M

Haiku 4.5: I $1.00/M, O $5.00/M

GPT-5: I $1.25/M, O $10.00/M

GPT-5-mini: I $0.25/M, O $2.00/M

GPT-5-nano: I $0.05/M, O $0.40/M

GLM-4.6: I $0.60/M, O $2.20/M
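
Sticker prices alone don't settle the day-to-day cost question, since per-task cost depends on how many tokens each model actually consumes. A back-of-the-envelope sketch (prices are from the list above; the token counts are made-up illustrations, not measurements):

    # Rough per-task cost comparison. Prices in $/M tokens from the list above;
    # the token counts below are illustrative, not measurements.
    PRICES = {  # model: (input $/M, output $/M)
        "haiku-4.5": (1.00, 5.00),
        "gpt-5": (1.25, 10.00),
        "gpt-5-mini": (0.25, 2.00),
        "glm-4.6": (0.60, 2.20),
    }

    def task_cost(model: str, input_tokens: int, output_tokens: int) -> float:
        """Dollar cost of one request at the listed per-million-token rates."""
        in_rate, out_rate = PRICES[model]
        return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

    # Hypothetical coding task: 50K tokens of context in, 4K tokens of code out.
    for model in PRICES:
        print(f"{model}: ${task_cost(model, 50_000, 4_000):.4f}")

A model that needs fewer retries or shorter outputs to finish a task can end up cheaper per task than one with a lower sticker price, which is the precision point above.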

larodi

Been waiting for the Haiku update, as I still do a lot of dumb work with the old one, and it is darn cheap for what you get out of it with smart prompting. Very neat that they finally released this; updating all my bots... sorry, agents :)

deadbabe

Those numbers don’t mean anything without average token usage stats.

steveklabnik

I am really interested in the future of Opus; is it going to be an absolute monster, and continue to be wildly expensive? Or is the leap from 4 -> 4.5 going to be more modest for it?

sharkjacobs

My impression is that Sonnet and Haiku 4.5 are the same "base models" as Sonnet and Haiku 4; the improvements come from fine-tuning on data generated by Opus.

I'm a user who follows the space but doesn't actually develop or work on these models, so I don't actually know anything, but this seems like standard practice (using the biggest model to fine-tune smaller models).

Certainly, GPT-4 Turbo was a smaller model than GPT-4; there's not really any other good explanation for why it's so much faster and cheaper.

The explicit reason that OpenAI obfuscates reasoning tokens is to prevent competitors from training their own models on them.

sharkjacobs

Which is all to say that I think the reason they went from Opus 3 straight to Opus 4 is that there was no bigger model to fine-tune an Opus 3.5 with.

And I would expect Opus 4 to be much the same.

criemen

Technically, they released Opus 4.1 a few weeks ago, so that alone hints at a smaller leap from 4.1 -> 4.5, compared to the leap from Sonnet 4 -> 4.5. That is, of course, if those version numbers represent anything but marketing, which I don't know.

steveklabnik

I had forgotten that, given that Sonnet pretty much blows Opus out of the water these days.

Yeah, given how multi-dimensional this stuff is, I assume it's supposed to indicate broad things, closer to marketing than anything objective. Still quite useful.

gwd

Opus disappeared for quite a while and then came back. Presumably they're always working on all three general sizes of models, and there's some combination of market need and model capabilities which determine if and when they release any given instance to the public.

dheera

I wonder what the next smaller model after Haiku will be called. "Claude Phrase"?

dotancohen

If they do come up with a tiny model tuned for generating conversation and code, I think Claude Acronym would be a perfect name.

steveklabnik

It's interesting to think about various aspects of marketing the models, with ChatGPT going the "internal router" direction to address the complexity of choosing. I'd never considered something smaller than Haiku to be needed, but I also rarely used Haiku in the first place...

ACCount37

If you're going smaller than Haiku, you might be at the point of using various cheap open models already. The small model would need some real killer features to justify the margins.

fnordsensei

Claude Garden Path Sentence

WalterSear

Claude from Nantucket

Brendinooo

Claude Couplet

entanglr

Claude Punchline

u8080

Claude Banger

minimaxir

$1/M input tokens and $5/M output tokens is good compared to Claude Sonnet 4.5, but nowadays, thanks to the pace at which the industry is developing smaller/faster LLMs, you can get comparable models priced much lower, which matters at the scale agentic coding requires.

Given that Sonnet is still a popular model for coding despite the much higher cost, I expect Haiku will get traction if the quality is as good as this post claims.

Bolwin

With caching, that's 10 cents per million tokens in. Most of the cheap open-source models (which this claims to beat, except GLM 4.6) have more limited and less effective caching.

This could be massive.

Tiberium

The funny thing is that even in this area Anthropic is behind the other three labs (Google, OpenAI, xAI). It's the only one of the four that requires you to manually set cache breakpoints, and the initial cache write costs 25% more than regular input. The other three have fully free implicit caching, although Google also offers paid, explicit caching.

https://docs.claude.com/en/docs/build-with-claude/prompt-cac...

https://ai.google.dev/gemini-api/docs/caching

https://platform.openai.com/docs/guides/prompt-caching

https://docs.x.ai/docs/models#cached-prompt-tokens
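
For the curious, a minimal sketch of what Anthropic's manual breakpoint looks like with their Python SDK; the model ID, system prompt, and user message here are illustrative:

    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

    STYLE_GUIDE = "..."  # stands in for a large, stable prefix worth caching

    response = client.messages.create(
        model="claude-haiku-4-5",  # assumed model ID
        max_tokens=512,
        system=[
            {
                "type": "text",
                "text": STYLE_GUIDE,
                # Explicit breakpoint: everything up to and including this
                # block is cached (writes bill at a premium, reads at a
                # fraction of the base input rate).
                "cache_control": {"type": "ephemeral"},
            }
        ],
        messages=[{"role": "user", "content": "Does this diff follow the guide?"}],
    )
    print(response.usage)  # cache_creation_input_tokens vs cache_read_input_tokens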

criemen

I don't understand why we're paying for caching at all (except that model providers can charge for it). It's almost extortion: the provider stores some data for five minutes on some disk, and gets to sell its highly limited GPU resources to someone else instead (because you're using the KV cache instead of GPU capacity for a good chunk of your input tokens). They charge you 10% of their GPU-level prices for effectively _not_ using their GPU at all for the tokens that hit the cache.

If I'm missing something about how inference works that explains why there is still a cost for cached tokens, please let me know!

tempusalaria

I vastly prefer the manual caching. There are several aspects of automatic caching that are suboptimal, and it saves only moderately on developer burden. I don't use Anthropic much, but I wish the others had manual cache options.

logicchains

$1/M is hardly a big improvement over GPT-5's $1.25/M (or Gemini Pro's $1.50/M), and given how much worse Haiku is than those at any kind of difficult problem (or problems with a large context), I can't imagine it being a particularly competitive alternative for coding. Especially for anything math/logic related, I find GPT-5 and Gemini Pro significantly better even than Opus (reflected in their models having won Olympiad prizes while Anthropic's have not).

HarHarVeryFunny

GPT-5 is $10/M for output tokens, twice the cost of Haiku 4.5 at $5/M, despite Haiku apparently being better at some tasks (SWE Bench).

I suppose it depends on how you are using it, but for coding isn't output cost more relevant than input - requirements in, code out?

justinbaker84

I am a professional developer so I don't care about the costs. I would be willing to pay more for 4.5 Haiku vs 4.5 Sonnet because the speed is so valuable.

I spend way too much time waiting for the cutting-edge models to return a response. 73% on SWE-bench is plenty good enough for me.

rudedogg

This also means API usage through Claude Code got more expensive (but better if benchmarks are to be believed)

simonw

Yeah, I'm a bit disappointed by the price. Claude 3.5 Haiku was $0.8/$4, 4.5 Haiku is $1/$5.

I was hoping Anthropic would introduce something price-competitive with the cheaper models from OpenAI and Gemini, which get as low as $0.05/$0.40 (GPT-5-Nano) and $0.075/$0.30 (Gemini 2.0 Flash Lite).

odie5533

There's probably less margin on the low end, so they don't want to focus on capturing it.

dr_dshiv

Margin? Hahahahaha

diwank

I am a bit mind-boggled by the pricing lately, especially since the cost increased even further. Is this driven by choices in model deployment (unquantized etc.) or simply by perceived quality (as in "hey, our model is crazy good and we're going to charge for it")?

justinbaker84

I am very excited about this. I am a freelance developer, and getting responses 3x faster is totally worth the slightly reduced capability.

I expect I will be a lot more productive using this instead of Sonnet 4.5, which has been my daily driver LLM since it came out.

85392_school

System card: https://assets.anthropic.com/m/99128ddd009bdcb/original/Clau... (edit: discussed here https://news.ycombinator.com/item?id=45596168)

This is Anthropic's first small reasoner as far as I know.

zone411

I've benchmarked it on the Extended NYT Connections (https://github.com/lechmazur/nyt-connections/). It scores 20.0 compared to 10.0 for Haiku 3.5, 19.2 for Sonnet 3.7, 26.6 for Sonnet 4.0, and 46.1 for Sonnet 4.5.

shrisukhani

In our (very) early testing at Hyperbrowser, we're seeing Haiku 4.5 do really well on computer use as well. Pretty cool that Haiku is now about the cheapest computer-use model from the big labs.

sim04ful

Curious that they don't have any comparison to Grok Code Fast:

Haiku 4.5: I $1.00/M, O $5.00/M

Grok Code Fast: I $0.20/M, O $1.50/M

Squarex

wow, grok code fast is really cheap

logankeenan

I'm not seeing it as a model option in Claude Code on my Pro plan. Perhaps it'll roll out eventually? Anyone else on the same plan seeing it?

aliljet

What is the use case for these tiny models? Is it speed? Is it to move on-device somewhere? Or is it to provide some pricing relief somewhere in the API? It seems like most use is through the Claude subscription, and therefore the use case here is basically non-existent.

pietz

I think with gpt-5-mini and now Haiku 4.5, I’d phrase the question the other way around: what do you need the big models for anymore?

We use the smaller models for everything that isn't an internal high-complexity task like coding. Although they would do a good enough job there as well, we happily pay the upcharge to get something a little better.

Anything user-facing, and workflow functionality like extracting, converting, translating, merging, and evaluating: all of these are mini and nano cases at our company.

pacoWebConsult

One big use case is that Claude Code with Sonnet 4.5 will delegate more specific, context-heavy tasks to the cheaper model (configurable), spinning up 1-3 sub-agents to do so. This saves a ton of context window for your primary session while also increasing token throughput by fanning out.

kasey_junk

They are great for building more specialized tool calls that the bigger models can call out to in agentic loops.

minimaxir

If you look at the OpenRouter rankings for LLMs (generally, the models coders use for vibe/agentic coding), you can see that most of them are in the "small" model class as opposed to something like full GPT-5 or Claude Opus, although Gemini 2.5 Pro ranks higher than expected: https://openrouter.ai/rankings

anuramat

For me it's the speed; e.g., Cerebras Qwen Coder gets you a completely different workflow, as it's practically instant (3k tps). It feels less like an agent and more like a natural-language shell, which is very helpful for iterating on a plan that you then forward to a bigger model.

dlisboa

For me, speed is the interesting part. I sometimes use Claude from the CLI with `claude -p` for quick stuff I forget, like how to run some Docker image. Latency and slow response speed are what almost make me go to Google and search instead.

JLO64

In my product I use gpt-5-nano for image ALT text in addition to generating transcriptions of PDFs. It’s been surprisingly great for these tasks, but for PDFs I have yet to test it on a scanned document.
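
For reference, a minimal sketch of that kind of ALT-text call with the OpenAI Python SDK; the prompt and image URL are placeholders, and it assumes gpt-5-nano accepts image input the same way other vision-capable models do:

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    # Ask the model for ALT text for an image at an illustrative URL.
    response = client.chat.completions.create(
        model="gpt-5-nano",
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text",
                     "text": "Write concise ALT text for this image."},
                    {"type": "image_url",
                     "image_url": {"url": "https://example.com/figure.png"}},
                ],
            }
        ],
    )
    print(response.choices[0].message.content)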

RickHull

If I'm close to weekly limits on Claude Code with Anthropic Pro, does that go away or stretch out if I switch to Haiku?

visarga

Sonnet 4.5 came out two weeks ago. In the past I never had such issues, but now my quota runs out in 2-3 days each week. I suspect Sonnet 4.5 consumes more usage points than the old Sonnet 4.

I am afraid the Claude Pro subscription effectively got 3x less usage.