OpenAI's o1-pro now available via API

108 comments

·March 19, 2025

simonw

This is their first model to only be available via the new Responses API - if you have code that uses Chat Completions you'll need to upgrade to Responses in order to support this.

Could take me a while to add support for it to my LLM tool: https://github.com/simonw/llm/issues/839

[deleted]

icelancer

Oh interesting. I thought they were going to have forward compatibility with Completions. Apparently not.

dtagames

It does. There are two endpoints. Eventually, all new models will only be in the new endpoint. The data interfaces are compatible.

dtagames

It shouldn't be too bad. The responses API accepts the same basic interface as the chat completion one.

simonw

The harder bit is the streaming response format - that's changed a bunch, and my tool supports both streaming and non-streaming for both Python sync and async IO - so there are four different cases I need to consider.
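The four cases simonw describes can be sketched with a toy client; none of this is the real OpenAI SDK surface, just placeholder methods illustrating the {sync, async} × {streaming, non-streaming} matrix a library author has to cover:

```python
import asyncio

# Hypothetical client illustrating the four call paths:
# {sync, async} x {streaming, non-streaming}. The method names
# are placeholders, not a real SDK API.

class FakeClient:
    def create(self, prompt):                       # sync, non-streaming
        return "".join(self._chunks(prompt))

    def stream(self, prompt):                       # sync, streaming
        yield from self._chunks(prompt)

    async def acreate(self, prompt):                # async, non-streaming
        return "".join([c async for c in self.astream(prompt)])

    async def astream(self, prompt):                # async, streaming
        for chunk in self._chunks(prompt):
            await asyncio.sleep(0)                  # yield control to the loop
            yield chunk

    @staticmethod
    def _chunks(prompt):
        return [prompt[i:i + 4] for i in range(0, len(prompt), 4)]

client = FakeClient()
assert client.create("hello world") == "hello world"
assert "".join(client.stream("hello world")) == "hello world"
assert asyncio.run(client.acreate("hello world")) == "hello world"
```

Each of the four paths can consume a different wire format when the streaming events change, which is why one API migration multiplies into four code paths.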

Tiberium

Even the basic interface is different, actually - "input" vs "messages", no "max_completion_tokens" nor "max_tokens". That said, changing those things is quite easy.
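The renames Tiberium lists are largely mechanical. A minimal sketch of the translation, with field names taken from OpenAI's public docs at the time of writing (verify against current documentation before relying on them):

```python
def completions_to_responses(payload: dict) -> dict:
    """Translate a Chat Completions-style request body into a
    Responses-style one: "messages" becomes "input", and the token
    cap becomes "max_output_tokens"."""
    out = dict(payload)
    out["input"] = out.pop("messages")
    for old_key in ("max_tokens", "max_completion_tokens"):
        if old_key in out:
            out["max_output_tokens"] = out.pop(old_key)
    return out

chat_request = {
    "model": "o1-pro",
    "messages": [{"role": "user", "content": "Hello"}],
    "max_completion_tokens": 512,
}
print(completions_to_responses(chat_request)["max_output_tokens"])  # 512
```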

davidbarker

Pricing: $150 / 1M input tokens, $600 / 1M output tokens. (Not a typo.)

Very expensive, but I've been using it with my ChatGPT Pro subscription and it's remarkably capable. I'll give it 100,000 token codebases and it'll find nuanced bugs I completely overlooked.

(Now I almost feel bad considering the API price vs. the price I pay for the subscription.)

jbellis

Remarkably capable is a good description.

Shameless plug: One of the reasons I wrote my AI coding assistant is to make it easier to get problems into o1pro. https://github.com/jbellis/brokk

andrewinardeer

I wonder what the input/output tokens will be priced at for AGI.

stavros

They won't. Your use cases won't be something the AI can't do itself, so why would they sell it to you instead of replace you with it?

AGI means the value of a human is the same as that of an LLM, but the energy requirements of a human are higher than those of an LLM, so humans won't be economical any more.

dnadler

Actually, I think humans require much less energy than LLMs. Even raising a human to adulthood would be cheaper from a calorie perspective than running an AGI algorithm (probably). It's the whole reason the premise of the Matrix was ridiculous :)

Some quick back of the envelope says that it would take around 35 MWh to get to 40 years old (2000 kcal per day)
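That back-of-envelope checks out; restating it in a few lines (unit conversions only, figures from the comment above):

```python
# A 2,000 kcal/day diet sustained for 40 years, in megawatt-hours.
kcal_per_day = 2000
kwh_per_kcal = 4184 / 3.6e6        # 1 kcal = 4184 J; 1 kWh = 3.6 MJ
mwh = kcal_per_day * kwh_per_kcal * 365 * 40 / 1000
print(round(mwh, 1))  # 33.9 -- close to the ~35 MWh quoted
```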

rlt

OpenAI doesn’t have the pre-existing business, relationships, domain knowledge, etc to just throw AGI at every possible use case. They will sell AGI for some fraction of what an equivalent human behind a computer screen would cost.

“AGI” is also an under-specified term. It will start (maybe is already there) equivalent to, say, a human in an overseas call center, but over time improve to the equivalent of a Fortune 500 CEO or Nobel prize winner.

“ASI”, on the other hand, will just recreate entire businesses from scratch.

ldjkfkdsjnv

As far as I'm concerned, all of the other models are a waste of time to use in comparison. Most people don't know how good this model is.

dinobones

Interesting... Most benchmarks show this model as being worse than o3-mini-high and Sonnet 3.7.

What difference are you seeing from these models that makes it better?

I say this as someone considering shelling out $200 for ChatGPT pro for this.

jbellis

If you're in the habit of breaking down problems to Sonnet-sized pieces you won't see a benefit. The win is that o1pro lets you stop breaking down one level up from what you're used to.

It may also have a larger usable context window, not totally sure about that.

Tiberium

There actually were almost no benchmarks for o1 pro before because it wasn't on the API. o1 pro is a different model from o1 (yes, even o1 with high reasoning).

ldjkfkdsjnv

I regularly push 100k+ tokens into it. So most of my code base/large portions. I use the Repo Prompt product to construct the code prompts. It finds bugs and solutions at a rate that is far better than others. I also speak into the prompt to describe my problem, and find spoken language is interpreted very well.

I also frequently download all the source code of libraries I am debugging, and when running into issues, pass that code in along with my own broken code. It's very good.

Hugsun

How long is its thinking time compared to o1?

The naming would suggest that o1-pro is just o1 with more time to reason. The API pricing makes that less obvious. Are they charging for the thinking tokens? If so, why is it so much more expensive if there are just more thinking tokens anyways?

Tiberium

I think o1 pro runs multiple instances of o1 in parallel and selects the best answer, or something of the sort. And you do actually always pay for thinking models with all providers, OpenAI included. It's especially interesting if you remember the fact that OpenAI hides the CoT from you, so you're in fact getting billed for "thinking" that you can't even read yourself.
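That best-of-n theory is unconfirmed, but the shape of it is simple to sketch; `generate()` and `score()` below are stand-ins for the model call and OpenAI's (unknown) selection criterion:

```python
import random

# Illustration of the "run several o1 instances, keep the best answer"
# theory. Both functions are placeholders, not real APIs.

def generate(prompt: str, seed: int) -> str:
    rng = random.Random(seed)           # deterministic stand-in for a model
    return f"answer-{rng.randint(0, 9)}"

def score(answer: str) -> int:
    return int(answer.split("-")[1])    # placeholder quality metric

def best_of_n(prompt: str, n: int = 8) -> str:
    candidates = [generate(prompt, seed) for seed in range(n)]
    return max(candidates, key=score)
```

If this is roughly what o1 pro does, it would also explain the price: n parallel generations means roughly n times the thinking tokens, billed whether or not you can read them.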

ldjkfkdsjnv

I don't have the answers for you; I just know that if they charged $400 a month I would pay it. It seems like a different model to me. I never use o3-mini or o3-mini-high. Just gpt4o or o1 pro.

hooloovoo_zoo

Is your prompt {$codebase} find bugs?

davidbarker

Typically something like:

  Look carefully through my codebase and identify any bugs/issues, or refactors that could improve it.

  <codebase>
  …
  </codebase>

Doesn't have to be anything overly complicated to get good results. It also does well if you give it a git diff.

ionwake

Sorry if this is a noob question, but are you just pasting file strings in between those tags? Like the contents of file1.js and file2.js?

simonw

It cost me 94 cents to render a pelican riding a bicycle SVG with this one!

Notes and SVG output here: https://simonwillison.net/2025/Mar/19/o1-pro/

qingcharles

Whenever you share a new pelican, I always have to check it against your past pelicans to see progress toward the Artificial Super Pelican Singularity:

https://simonwillison.net/tags/pelican-riding-a-bicycle/

mateus1

I’m no expert but that does not look like a 94c pelican to me.

deciduously

Better than my svg pelican would be, but it's a low bar.

jascination

Your collection of pelicans is so bloody funny, genuinely brightened my day.

I don't know what I was expecting when I clicked the link but it definitely wasn't this: https://simonwillison.net/tags/pelican-riding-a-bicycle/

prawn

I have been using ChatGPT to generate 3d models by pasting output into OpenSCAD. Often feels like coaching someone wearing a blindfold, but it can sometimes kick things forward quickly for low effort.

orzig

At this point you’d come out ahead just buying a pelican. Even before the tax benefits.

jwpapi

Those that have tested it and liked it: I feel very confident with Sonnet 3.7 right now; if I could wish for anything, it would be for it to be faster. Most of the problems I'm facing are execution problems, and I just want AI to do them faster than me coding everything on my own.

To me it seems like o1-pro would be better used as a switch-in tool, or to double-check your codebase, than as a constant coding assistant (even at a lower price), as I assume I'd need to get a tremendous amount done, including domain knowledge, to make up for Sonnet's roughly 10x speed advantage.

CamperBob2

o1-pro can be very useful but it's ridiculously slow. If you find yourself wishing Sonnet 3.7 was faster, you really won't like o1-pro.

I pay for it and will probably keep doing so, but I find that I use it only as a last resort.

danpalmer

It has a 2023 knowledge cut-off and a 200k context window...? That's pretty underwhelming.

gkoberger

On the flip side, the cutoff date probably makes it a lot more upbeat.

throw310822

Don't know if it's me, but this is really funny.

bearjaws

For a second I was like "2023 isn't that bad"... and then I realized we're well into 2025...

serjester

Assuming a highly motivated office worker spends 6 hours per day listening or speaking, at a salary of $160k per year, that works out to a cost of ≈$10k per 1M tokens.

OpenAI is now within an order of magnitude of a highly skilled human with their frontier model pricing. o3 pro may change this, but at the same time I don't think they would have shipped this if o3 was right around the corner.
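Reproducing that back-of-envelope in code; the speaking rate and tokens-per-word ratio are my own assumptions, the salary and hours come from the comment:

```python
# 6 hours/day of speech at an assumed 150 words/minute and ~0.75 words
# per token, over an assumed 250 workdays per year.
salary = 160_000
words_per_year = 6 * 60 * 150 * 250       # min/day * wpm * workdays
tokens_per_year = words_per_year / 0.75   # ~1.33 tokens per word
cost_per_million = salary / tokens_per_year * 1_000_000
print(round(cost_per_million))  # 8889 -- on the order of $10k per 1M tokens
```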

danpalmer

If you start paying someone and give them some onboarding docs, to a first approximation they'll start doing the job and you'll get value.

If you attach a credit card to o3 and give it some onboarding docs, it'll give you a nice summary of your onboarding docs that you didn't need.

We're a long way from a model doing arbitrary roles. Currently at the very minimum, you need a competent office worker to run the model, filter its output through their judgement, and act on it.

lherron

More like: every time you tell o3 to do something, it will first reread the onboarding docs (and charge you for doing so) before it does anything else.

levocardia

Right, value per token is much more important (but harder to quantify). A medical AI that could provide a one-paragraph diagnosis and treatment plan for rare / untreatable diseases could be generating thousands of dollars of value per token. Meanwhile, Claude has probably racked up millions of tokens wandering around Mt. Moon aimlessly.

elicksaur

“Untreatable” disease.

Yet somehow the AI knows a treatment?

serjester

I think that’s the remarkable thing - even with all of its flaws and its insane pricing, there’s plenty of people that will pay for it (myself included).

LLM’s are good at a class of tasks that humans aren’t.

nebula8804

How do you reconcile issues such as the o1 pro model erroring out on every third attempt with an extremely large context (one that still fits but is near the limit)?

Every time I try to get this thing to read my codebase and onboarding docs (about a 40k-line Angular codebase), it fails in "pull your hair out" ways, leading to frustration.

EcommerceFlow

o1-pro still holds up to every other release, including Grok 3 Think and Claude 3.7 Think (haven't tried Max out though), and that's from over 3 months ago, practically an eternity in AI time.

Ironic since I was getting ready to cancel my Pro subscription, but 4.5 is too nice for non-coding/math tasks.

God I can't wait for o3 pro.

Tiberium

"Max" as in "Claude 3.7 Sonnet MAX" is apparently Cursor-specific marketing - by default they don't use all the context of the model and set the thinking budget to a lower value than the maximum allowed. So essentially it's the exact same 3.7 Sonnet model, just with different settings.

sheepscreek

4.5 works on Plus! I know. I was surprised too.

_pdp_

At first I thought: great, we can add it now to our platform. Now that I have seen the price, I am hesitant to enable the model for the majority of users (except rich enterprises), as they will most certainly shoot themselves in the foot.

danpalmer

> they will most certainly shoot themselves in the foot

...and then ask you for a refund or service credit.

WiSaGaN

I have always suspected that o1-pro is some kind of workflow on top of the o1 model. Is it possible that it dispatches to, say, 8 instances of o1 and then does some type of aggregation over the results?

[deleted]

NoahZuniga

Seems underwhelming when OpenAI's best model, o3, was demoed almost 4 months ago.

[deleted]

bakugo

> $150/Mtok input, $600/Mtok output

What use case could possibly justify this price?

serjester

Synthetic data generation. You can have a really powerful, expensive model create evals so you can tune a faster, cheaper system with similar performance.

jsheard

You could do that, but OpenAI specifically doesn't want you to: https://openai.com/policies/row-terms-of-use/

What you cannot do. You may not use our Services for any illegal, harmful, or abusive activity. For example, you may not: Use Output to develop models that compete with OpenAI.

Presumably you run the risk of getting banned if they realize what you're doing.

andyferris

> You may not use our Services for any illegal, harmful, or abusive activity. For example, you may not: Use Output to develop models that compete with OpenAI.

This reads as if they consider developing models that compete with OpenAI as illegal, harmful or abusive. Which is crazy. (The other dot points in their list in the linked terms seem better).

levocardia

I wonder if some of the high pricing is specifically an attempt to ward off this sort of "slow distillation" of a powerful model

SJC_Hacker

If it was possible:

1) Why wasn't OpenAI doing it themselves?

2) This means we've reached technological singularity if AI models can improve themselves (as in getting a smarter model, not just compressing existing ones like Deepseek)

echelon

Screw their TOS.

OpenAI trained on the world's data. Data they didn't license.

Anyone should be able to "rip them off" and copy their capabilities on the cheap.

serjester

Synthetic data is just as useful for building app-layer evals. There are probably significantly cheaper ways to get the data if you're training your own model.

kelseyfrog

I compete with AI, not my models.

littlestymaar

I suspect they have no way to enforce that without risking false positives that hurt their rich customers (and their business).

icelancer

Full file refactoring. But I just use the webUI for this and will continue to at these prices... probably.

refulgentis

It enables obscene, unnatural things at a fraction of most SWE hourly rates. One win that jumps to mind was writing a complete implementation of a Windows PCM player as a Flutter plugin, with some unique design properties and emergent API behavior that it needed to replicate from existing iOS/Android code.

wincy

I used O1 Pro to write a .NET authorization filter which when I wrote it I didn’t even know what that was. I was like “I have this problem, how can I fix it” and it just started going and the solution worked the first try. Everyone at work was like “great job!” I guess I did feed it a bunch of surrounding code and the authorization policy, but the policy only allowed us to attach one security trait when we wanted it to be “attach any number of security attributes and verify the user has at least one”. Still, it solved what likely would have been at least a day or two of research in an hour or so conversation.

alphabettsy

Is it secure?

zipy124

Does it really? Your average software engineer is like £20-30 an hour; for the cost of 1M output tokens you can get a dev for a full week.

sheepscreek

The math doesn’t check out. A day maybe. Also it’s not just about a placeholder dev. The person needs to know your use-case and have the tech chops to deliver successfully in that timeframe.

Now to have that delivered to you in less than an hour? That’s a huge win.

intelVISA

With the average SWE it's a toss-up whether they create more issues than they solve over time. Factor in onboarding, bugs, and time taken away from other expensive people, and it becomes >$100/hr real quick.

refulgentis

Leaving the dissection of this to the separate reply, let's estimate cost:

- 80 chars per line, 30 occupied (avg'd across 300 KLOC in codebase)

- 500 lines of code

- 15000 characters

- 4 chars / token

- 3750 tokens output

- 10 full iterations, and don't apply cached token pricing that's 90% off

- 37,500 tokens req'd in output

- $600 / 1M tokens

- $0.60 / 1K tokens

- ≈ $22.50
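The estimate above, restated in code (pure arithmetic on the figures already listed):

```python
# Cost of ten full regenerations of a 500-line file at o1-pro output pricing.
chars = 500 * 30                       # 500 lines x ~30 occupied chars each
tokens_per_pass = chars / 4            # ~4 chars per token
total_tokens = tokens_per_pass * 10    # 10 full iterations, no cache discount
cost = total_tokens * 600 / 1_000_000  # $600 per 1M output tokens
print(cost)  # 22.5
```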

risyachka

More mediocre software is all the world needs.

refulgentis

A tool is a tool. Your output is what you decide.