Show HN: Price Per Token – LLM API Pricing Data
128 comments · July 25, 2025
numlocked
(I work at OpenRouter)
We have solved this problem by working with the providers to implement a prices and models API that we scrape, which is how we keep our marketplace up to date. It's been a journey; a year ago it was all happening through conversations in shared Slack channels!
The pricing landscape has become more complex as providers have introduced e.g. different prices for tokens depending on prompt length, caching, etc.
I do believe the right lens on this is actually the price per token by endpoint, not by model; there are fast/slow versions, thinking/non-thinking, etc. that can sometimes also vary by price.
The point of this comment is not to self-promote, but we have put a huge amount of work into figuring all of this out, and it is all publicly available on OpenRouter (admittedly not in such a compact, pricing-focused format though!)
tekacs
I tried making it compact and easy just now! Thanks so much for the effort!
osigurdson
By endpoint, do you mean price per token by API "shape"? Perhaps my phrasing is even more confusing, but that is how I see it. I.e., there are API "shapes" such that, as long as the shape of the API is the same, my application can use it interchangeably with others. Other dimensions are quality, speed, acceptable error rates, etc., which naturally influence pricing.
sophia01
But the data is... wrong? Google Gemini 2.5 Flash-Lite costs $0.10/mtok input [1] but is shown here as $0.40/mtok?
[1] https://ai.google.dev/gemini-api/docs/pricing#gemini-2.5-fla...
alexellman
the data is not wrong, you are reading my table wrong
edit: my bad, I was wrong and shouldn't have responded like this
unglaublich
Ouch, bad response for someone with a business!
GaggiX
The input is wrong, though.
Your website reports $0.30 for input, and that wouldn't make any sense, as it would be priced the same as the bigger Flash model.
alexellman
ok yeah fixed that one, sorry...
awongh
This is great, but as others have mentioned the UX problem is more complicated than this:
- for other models there are providers that serve the same model with different prices
- each provider optimizes for different parameters: speed, cost, etc.
- the same model can still be different quantizations
- some providers offer batch pricing, while others (e.g., the Grok API) do not
And there are plenty of other parameters to filter over- thinking vs. non-thinking, multi-modal or not, etc. not to even mention benchmarks ranking.
https://artificialanalysis.ai gives a blended cost number, which helps with sorting a bit, but the right blend of input/output costs changes depending on what you're doing.
I'm still holding my breath for a site that has a really nice comparison UI.
Someone please build it!
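For what it's worth, a blended number is easy to compute yourself once you pick a usage ratio. A minimal sketch, assuming a 3:1 input:output weighting (a common convention, not necessarily the exact ratio artificialanalysis.ai uses):

```python
def blended_price(input_per_mtok: float, output_per_mtok: float,
                  input_ratio: float = 3.0) -> float:
    """Collapse separate input/output prices into one $/MTok figure."""
    return (input_ratio * input_per_mtok + output_per_mtok) / (input_ratio + 1.0)

# e.g. a model priced at $2.50/MTok in, $10.00/MTok out
print(blended_price(2.50, 10.00))  # 4.375
```

Changing `input_ratio` to match your own workload (e.g. 10:1 for RAG-heavy prompts) will reorder the rankings, which is exactly the caveat above.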
numlocked
(I work at OpenRouter)
We have a simple model comparison tool that is not-at-all-obvious to find on the website, but hopefully can help somewhat. E.g.
https://openrouter.ai/compare/qwen/qwen3-coder/moonshotai/ki...
alexellman
would a column for "provider", meaning the place you are actually making the call to, solve this?
svachalek
Please not benchmark ranking. We've encouraged this nonsense far too long already.
zeroCalories
I think it would be very hard to make a fair comparison. The best you could do is probably make the trade-offs clear and let people make their own choices. I think it could be cool to make something like a token exchange, where people put up their requirements and companies offer competing services that fit those requirements. It would be cool to let random people offer their compute too, but you would need to find a way to handle people lying about their capabilities or stealing data.
pierre
Main issue is that tokens are not equivalent across providers/models, with huge disparities within a provider beyond the tokenizer model:
- An image will take 10x the tokens on gpt-4o-mini vs gpt-4.
- On Gemini 2.5 Pro, output tokens are billed as tokens, except with structured output, where every character is counted as a token for billing.
- ...
Having the price per token is nice, but what is really needed is to know how much a given query/answer will cost you, as not all tokens are equal.
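One practical consequence of tokenizers differing: it's more reliable to price a request from the usage counts the provider returns than from your own character estimates. A minimal sketch with a hypothetical price table (the figures are illustrative, not current list prices):

```python
# Hypothetical $/MTok lookup table; real prices vary by endpoint.
PRICES = {
    "gpt-4o-mini": {"input": 0.15, "output": 0.60},
    "gemini-2.5-pro": {"input": 1.25, "output": 10.00},
}

def query_cost(model: str, usage: dict) -> float:
    """Cost of one request, using the token counts the API itself reports."""
    p = PRICES[model]
    return (usage["prompt_tokens"] * p["input"]
            + usage["completion_tokens"] * p["output"]) / 1_000_000

# `usage` in the shape of an OpenAI-style response body
print(round(query_cost("gpt-4o-mini",
                       {"prompt_tokens": 12_000, "completion_tokens": 800}), 6))  # 0.00228
```

Since the provider's own count is what you're billed on, this sidesteps the image/structured-output quirks above entirely.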
alexellman
yeah, I am going to add an experiment that runs every day, and its cost will be a column on the table. It will be something like "summarize this article in 200 words", with every model getting the same prompt + article
bigiain
For me, and I suspect a lot of other HN readers, a comparison/benchmark on a coding task would be more useful. Something small enough that you can affordably run it every day across a reasonable range of coding focused models, but non trivial enough to be representative of day to day AI assisted coding.
One other idea - for people spending $20 or $200/month for AI coding tools, a monitoring service that tracks and alerts on detected pricing changes could be something worth paying for. I'd definitely subscribe at $5/month for something like that, and I'd consider paying more, possibly even talking work into paying $20 or $30 per month.
BonoboIO
> On gemini 2.5 pro output token are token except if you are using structure output, then all character are count as a token each for billing.
Can you elaborate this? I don’t quite understand the difference.
rsanek
I hadn't heard of this before either and can't find anything to support it on the pricing page.
mythz
There was a time when it was unbelievably frustrating to navigate the bunch of marketing pages required to find the cost of a newly announced model, now I just look at OpenRouter to find pricing.
CharlesW
Site is down as I type this, but a shout-out to Simon Willison's LLM pricing calculator: https://www.llm-prices.com/
criddell
If you had a $2500ish budget for hardware, what types of models could you run locally? If $2500 isn't really enough, what would it take?
Are there any tutorials you can recommend for somebody interested in getting something running locally?
cogman10
This is where you'd start for local: https://ollama.com/
You can, roughly, convert the number of parameters to GB of memory needed. For example, Deepseek-r1:7b needs about 7GB of memory to run locally.
Context window matters, the more context you need, the more memory you'll need.
If you are looking for AI devices at $2500, you'll probably want something like this [1]. A unified memory architecture (which will mean LPDDR5) will give you the most memory for the least amount of money to play with AI models.
[1] https://frame.work/products/desktop-diy-amd-aimax300/configu...
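The parameters-to-GB rule of thumb above comes from weights needing roughly (parameters × bits per weight ÷ 8) bytes, before KV-cache overhead that grows with context length. A rough estimator, ballpark only:

```python
def model_memory_gb(params_billions: float, quant_bits: int = 4) -> float:
    """Approximate GB needed for the weights alone (excludes KV cache)."""
    return params_billions * quant_bits / 8

print(model_memory_gb(7, 8))   # 7.0  -- roughly matches deepseek-r1:7b at 8-bit
print(model_memory_gb(70, 4))  # 35.0 -- a 70B model quantized to 4 bits
```

Budget extra headroom on top of this for context: long prompts can add several GB of KV cache depending on the model's architecture.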
mark_l_watson
I bought a Mac Mini M2 Pro with 32GB 18 months ago for $1900. It is sufficient to run good local models, quantized, up to and including 40B.
When local models don’t cut it, I like Gemini 2.5 flash/pro and gemini-cli.
There are a lot of good options for commercial APIs and for running local models. I suggest choosing a good local and a good commercial API, and spend more time building things than frequently trying to evaluate all the options.
criddell
Are there any particular sources you found helpful to get started?
It's been a while since I checked out Mini prices. Today, $2400 buys an M4 Pro with all the cores, 64GB RAM, and 1TB storage. That's pleasantly surprising...
mark_l_watson
You can read my book on local models with Ollama free online: https://leanpub.com/ollama/read
dstryr
I would purchase two used 3090s, as close to $600 each as you can get. The 3090 still remains the price-performance king.
redox99
Kimi and deepseek are the only models that don't feel like a large downgrade from the typical providers.
skeezyboy
you can run ollama stuff with just a decent cpu for some of them
yieldcrv
the local side of things, with a $7,000-$10,000 machine (512GB of fast memory, CPU, and disk), can almost reach parity on text input, output, and 'reasoning', but lags far behind for multimodal anything: audio input, voice output, image input, image output, document input.
there are no out-of-the-box solutions to run a fleet of models simultaneously or containerized, either
so the closed-source solutions in the cloud are light years ahead, and it's been this way for 15 months now, no signs of stopping
paradite
It's actually more complex than just input and output tokens, there are more pricing rules by various providers:
- Off-peak pricing by DeepSeek
- Batch pricing by OpenAI and Anthropic
- Context window differentiated pricing by Google and Grok
- Thinking vs non-thinking token pricing by Qwen
- Input token tiered pricing by Qwen coder
I originally posted here: https://x.com/paradite_/status/1947932450212221427
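One way to represent rules like those listed above is to treat a price as a function of the request context rather than a scalar. A sketch under assumed, illustrative field names (not any provider's actual schema or numbers):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PriceRule:
    base_per_mtok: float                            # standard input price, $/MTok
    batch_discount: float = 1.0                     # e.g. 0.5 for a 50%-off batch API
    long_context_per_mtok: Optional[float] = None   # tiered price past a threshold
    long_context_threshold: int = 200_000

    def price(self, tokens: int, batch: bool = False) -> float:
        """Effective $/MTok for a request of `tokens` input tokens."""
        per_mtok = self.base_per_mtok
        if self.long_context_per_mtok is not None and tokens > self.long_context_threshold:
            per_mtok = self.long_context_per_mtok
        if batch:
            per_mtok *= self.batch_discount
        return per_mtok

# illustrative numbers only
rule = PriceRule(base_per_mtok=1.25, batch_discount=0.5, long_context_per_mtok=2.50)
print(rule.price(10_000))              # 1.25
print(rule.price(300_000))             # 2.5
print(rule.price(10_000, batch=True))  # 0.625
```

Off-peak and thinking/non-thinking pricing would slot in the same way, as extra fields consulted inside `price()`.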
NitpickLawyer
> The only place I am aware of is going to these provider's individual website pages to check the price per token.
Openrouter is a good alternative. Added bonus that you can also see where the open models come in, and can make an educated guess on the true cost / size of a model, and how likely it is it's currently subsidised.
danenania
OpenRouter also has an endpoint for listing models (with pricing info) in its api: https://openrouter.ai/docs/overview/models
A limitation though, at least the last time I checked, is that you only get a single provider returned per model. That’s fine for the major commercial models that all have the same pricing on each provider, but makes it hard to rely on for open source models, which tend to have many providers offering them at different price points (sometimes very different price points—like 5x or 10x difference).
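For reference, that endpoint reports per-token prices as decimal strings, so converting an entry to $/MTok is a one-liner. The payload below is a trimmed, made-up example of the response shape, not live data:

```python
import json

# trimmed, made-up example of the /models response shape
sample = json.loads("""
{"data": [
  {"id": "anthropic/claude-sonnet-4",
   "pricing": {"prompt": "0.000003", "completion": "0.000015"}}
]}
""")

def per_mtok(model: dict) -> dict:
    """Convert per-token string prices to $/MTok for easier comparison."""
    p = model["pricing"]
    return {"id": model["id"],
            "input": round(float(p["prompt"]) * 1_000_000, 6),
            "output": round(float(p["completion"]) * 1_000_000, 6)}

print(per_mtok(sample["data"][0]))
# {'id': 'anthropic/claude-sonnet-4', 'input': 3.0, 'output': 15.0}
```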
callbacked
Awesome list, any chance of adding OpenRouter? Looking at their website seems like it would be a pain to scrape all of that due to the site's layout.
murshudoff
https://openrouter.ai/docs/api-reference/list-available-mode... OpenRouter has an endpoint to get models and their pricing
alexellman
Yeah I am going to be adding more sources like that and Groq but just wanted to start with the basics and see if it resonated
tekacs
I've run into this a ton of times and these websites all kinda suck. Someone mentioned the OpenRouter /models endpoint in a sibling comment here, so I quickly threw this together just now. Please feel free to PR!
https://github.com/tekacs/llm-pricing
$ llm-pricing
Model | Input | Output | Cache Read | Cache Write
------------------------------------------+-------+--------+------------+------------
anthropic/claude-opus-4 | 15.00 | 75.00 | 1.50 | 18.75
anthropic/claude-sonnet-4 | 3.00 | 15.00 | 0.30 | 3.75
google/gemini-2.5-pro | 1.25 | 10.00 | N/A | N/A
x-ai/grok-4 | 3.00 | 15.00 | 0.75 | N/A
openai/gpt-4o | 2.50 | 10.00 | N/A | N/A
...
$ llm-pricing calc 10000 200 -c 9500 opus-4 4.1
Cost calculation: 10000 input + 200 output (9500 cached, 5m TTL)
Model | Input | Output | Cache Read | Cache Write | Total
---------------------------+-----------+-----------+------------+-------------+----------
anthropic/claude-opus-4 | $0.007500 | $0.015000 | $0.014250 | $0.178125 | $0.214875
openai/gpt-4.1 | $0.001000 | $0.001600 | $0.004750 | $0.000000 | $0.007350
openai/gpt-4.1-mini | $0.000200 | $0.000320 | $0.000950 | $0.000000 | $0.001470
openai/gpt-4.1-nano | $0.000050 | $0.000080 | $0.000237 | $0.000000 | $0.000367
thudm/glm-4.1v-9b-thinking | $0.000018 | $0.000028 | $0.000333 | $0.000000 | $0.000378
$ llm-pricing opus-4 -v
=== ANTHROPIC ===
Model: anthropic/claude-opus-4
Name: Anthropic: Claude Opus 4
Description: Claude Opus 4 is benchmarked as the world's best coding model, at time of release,
bringing sustained performance on complex, long-running tasks and agent workflows. It sets new
benchmarks in software engineering, achieving leading results on SWE-bench (72.5%) and
Terminal-bench (43.2%).
Pricing:
Input: $15.00 per 1M tokens
Output: $75.00 per 1M tokens
Cache Read: $1.50 per 1M tokens
Cache Write: $18.75 per 1M tokens
Per Request: $0
Image: $0.024
Context Length: 200000 tokens
Modality: text+image->text
Tokenizer: Claude
Max Completion Tokens: 32000
Moderated: true
tekacs
Cache pricing tweaked & fixed since the above.
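The calc output above is easy to sanity-check by hand. A sketch reproducing the opus-4 row (the tool appears to charge both cache-read and cache-write on the 9500 cached tokens, which matches its printed total):

```python
# Prices in $/MTok, taken from the pricing table above.
OPUS_4 = {"input": 15.00, "output": 75.00, "cache_read": 1.50, "cache_write": 18.75}

def request_cost(input_tokens, output_tokens, cached_tokens, prices):
    """Total $ for one request; cached tokens pay read + write, not input."""
    uncached = input_tokens - cached_tokens
    total = (uncached * prices["input"]
             + output_tokens * prices["output"]
             + cached_tokens * prices["cache_read"]
             + cached_tokens * prices["cache_write"])
    return total / 1_000_000

print(request_cost(10_000, 200, 9_500, OPUS_4))  # 0.214875
```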
aaronharnly
Can you gather historical information as well? I did a bit of spelunking of the Wayback Machine to gather a partial dataset for OpenAI, but mine is incomplete. Future planning is well-informed by understanding the trends — my rough calculation was that within a model family, prices drop by about 40-80% per 12 months.
alexellman
Yeah, I am planning on setting up automatic scraping and keeping my own database. Maybe I could add historical data beyond that as well, but I'm just going to save my own data for now.
The LLM providers are constantly adding new models and updating their API prices. Anyone building AI applications knows that these prices are very important to their bottom line. The only way I am aware of to check is going to each provider's individual website pages for the price per token.
To solve this inconvenience I spent a few hours making pricepertoken.com, which has the latest models' up-to-date prices all in one place.
Thinking about adding image models too, especially since you have multiple options (fal, replicate) to use the same model, and the prices are not always the same.