
GPT-5: Key characteristics, pricing and system card

morleytj

It's cool, and I'm glad it sounds like it's getting more reliable, but given the things people have been saying GPT-5 would be for the last two years, you'd expect a world-shattering release rather than an incremental, stable improvement.

It does sort of give me the vibe that pure scaling maximalism really is dying off, though. If the approach now centers on writing better routers, tooling, and combining specialized submodels on tasks, then it feels like there's a search for new ways to improve performance (and lower cost), suggesting the other established approaches weren't working. I could totally be wrong, but I feel like if just throwing more compute at the problem were working, OpenAI probably wouldn't be spending much time optimizing user routing across existing strategies to get marginal improvements on average user interactions.

I've been pretty negative on the thesis that current techniques only need more data/compute to achieve AGI, though, so perhaps I'm overly biased against it. If one thing bothers me about the situation, it's the feeling that we really have no clue what the actual status of these models is, both because of how closed off the industry labs have become and because we can't expect anything other than marketing language from the presentations. I suppose that's inevitable with the massive investments, though. Maybe they've got some massive earth-shattering model release coming next, who knows.

thorum

The quiet revolution is happening in tool use and multimodal capabilities. Moderate incremental improvements on general intelligence, but dramatic improvements on multi-step tool use and ability to interact with the world, will eventually feed back into general intelligence.

hnuser123456

I agree. We have now proven that GPUs can ingest information and be trained to generate content for various tasks. But putting that to work and making it useful requires far more thought about a specific problem and how to apply the tech. If you could just ask GPT to create a startup guaranteed to be worth $1B on a $1k investment within a year, someone else would've already done it. Elbow grease still required for the foreseeable future.

In the meantime, figuring out how to train them to make less of their most common mistakes is a worthwhile effort.

BoiledCabbage

Performance is doubling roughly every 4-7 months. That trend is continuing. That's insane.

If your expectations were any higher than that, it seems like you were caught up in hype. Doubling 2-3 times per year isn't leveling off by any means.

https://metr.github.io/autonomy-evals-guide/gpt-5-report/
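The claimed rate is easy to sanity-check: a doubling period of d months works out to 12/d doublings per year (figures below are the 4-7 month range from the comment, not from the linked report):

```python
# Doubling every d months => 12/d doublings per year
for d in (4, 7):
    print(f"every {d} months -> {12 / d:.1f} doublings/year")
```

So the 4-7 month range corresponds to roughly 1.7-3 doublings a year; "2-3 times per year" matches the faster end of it.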

oblio

By "performance" I guess you mean "the length of task that can be done adequately"?

It is a benchmark but I'm not very convinced it's the be-all, end-all.

jstummbillig

Things have moved differently than we thought they would 2 years ago, but lest we forget what has happened in the meantime (4o, o1 and the thinking paradigm, o3).

So yeah, maybe we are getting more incremental improvements. But that to me seems like a good thing, because more good things earlier. I will take that over world-shattering any day – but if we were to consider everything that has happened since the first release of gpt-4, I would argue the total amount is actually very much world-shattering.

simonw

I for one am pretty glad about this. I like LLMs that augment human abilities - tools that help people get more done and be more ambitious.

The common concept for AGI seems to be much more about human replacement - the ability to complete "economically valuable tasks" better than humans can. I still don't understand what our human lives or economies would look like there.

What I personally wanted from GPT-5 is exactly what I got: models that do the same stuff that existing models do, but more reliably and "better".

GaggiX

Compared to GPT-4 it is on a completely different level, given that it is a reasoning model, so in that regard it does deliver, and it's not just scaling. But I guess the revolution there was o1, and GPT-5 is just a much more mature version of the technology.

hodgehog11

The aggressive pricing here seems unusual for OpenAI. If they had a large moat, they wouldn't need to do this. Competition is fierce indeed.

ilaksh

It's like 5% better. I think they obviously had no choice but to be price-competitive with Gemini 2.5 Pro, especially to get Cursor to change its default.

impure

The 5 cents for Nano is interesting. Maybe it will force Google to start dropping their prices again which have been slowly creeping up recently.

0x00cl

Maybe they need/want the data.

impure

OpenAI and most AI companies do not train on data submitted to a paid API.

WhereIsTheTruth

They also do not train using copyrighted material /s

dr_dshiv

And it’s a massive distillation of the mother model, so the costs of inference are likely low.

bdcdo

"GPT-5 in the API is simpler: it’s available as three models—regular, mini and nano—which can each be run at one of four reasoning levels: minimal (a new level not previously available for other OpenAI reasoning models), low, medium or high."

Is it actually simpler? For those currently using GPT-4.1, we're going from 3 options (4.1, 4.1 mini and 4.1 nano) to at least 8, even before counting regular GPT-5: gpt-5-mini minimal, gpt-5-mini low, gpt-5-mini medium, gpt-5-mini high, gpt-5-nano minimal, gpt-5-nano low, gpt-5-nano medium and gpt-5-nano high.

And while choosing between all these options, we'll always have to wonder: should I adjust the prompt I'm using, or simply change the GPT-5 model or its reasoning level?
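The option count being debated is just the cross product of model tiers and reasoning-effort levels; a quick sketch, using the tier and effort names from the quoted article:

```python
from itertools import product

# GPT-5 API model tiers and reasoning-effort levels (per the article)
models = ["gpt-5", "gpt-5-mini", "gpt-5-nano"]
efforts = ["minimal", "low", "medium", "high"]

# Every model/effort pairing a caller could choose from
combos = [f"{m} @ {e}" for m, e in product(models, efforts)]
print(len(combos))  # → 12
```

That's 12 combinations in total, or the 8 counted above once regular gpt-5 is set aside.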

mwigdahl

If reasoning is on the table, then you already had to add o3-mini-high, o3-mini-medium, o3-mini-low, o4-mini-high, o4-mini-medium, and o4-mini-low to the 4.1 variants. The GPT-5 way seems simpler to me.

impossiblefork

Yes, I think so. It's n=1,2,3, m=0,1,2,3. There's structure, and you know what each parameter controls and which direction it scales in.

makeramen

But given the option, do you choose bigger models or more reasoning? Or medium of both?

paladin314159

If you need world knowledge, then bigger models. If you need problem-solving, then more reasoning.

But the specific nuance of picking nano/mini/main and minimal/low/medium/high comes down to experimentation and what your cost/latency constraints are.

impossiblefork

I would have to get experience with them. I mostly use Mistral, so I have only the choice of thinking or not thinking.

namibj

Depends on what you're doing.

justusthane

> a real-time router that quickly decides which model to use based on conversation type, complexity, tool needs, and explicit intent

This is sort of interesting to me. It strikes me that so far we've had more or less direct access to the underlying model (apart from the system prompt and guardrails), but I wonder if going forward there's going to be more and more infrastructure between us and the model.

empiko

Despite the fact that their models are used in hiring, business, education, etc., this multibillion-dollar company uses a single benchmark with very artificial questions (BBQ) to evaluate how fair its model is. I am a little bit disappointed.

zaronymous1

Can anyone explain to me why they've removed the temperature and top-p parameter controls in reasoning models, including GPT-5? It makes it harder to build with these models for small tasks requiring high consistency; in the API, I really value the ability to run certain tasks at a low temperature.

Der_Einzige

It's because all forms of sampler settings can destroy safety/alignment. That's why top_p/top_k are still offered rather than tfs, min_p, top-n sigma, etc., and why temperature is locked to the arbitrary 0-2 range.

Open source is years ahead of these guys on samplers, which makes it all the more impressive that their models are this good.

oblio

Temperature is the response variation control?
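Roughly, yes: temperature divides the model's logits before the softmax, so low values concentrate probability on the top token while high values flatten the distribution and increase response variation. A minimal sketch with illustrative logits:

```python
import math

def sample_probs(logits, temperature):
    """Softmax over temperature-scaled logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]
low = sample_probs(logits, 0.2)   # near-greedy: top token dominates
high = sample_probs(logits, 2.0)  # flatter: more variation when sampling
print(round(low[0], 3), round(high[0], 3))  # → 0.993 0.502
```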

anyg

Good to know:

> Knowledge cut-off is September 30th 2024 for GPT-5 and May 30th 2024 for GPT-5 mini and nano

falcor84

Oh wow, so essentially a full year of post-training and testing. Or was it ready earlier, and a business-strategy decision postponed the release?

bhouston

Weird to have such an early knowledge cutoff. Claude 4.1's is March 2025 - 6 months more recent, with comparable results.

ilaksh

This is key info from the article for me:

> "reasoning": {"summary": "auto"}

Here’s the response from that API call.

https://gist.github.com/simonw/1d1013ba059af76461153722005a0...

Without that option, the API will often impose a lengthy delay while the model burns through thinking tokens, before you start getting back visible tokens for the final response.
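For context, the quoted fragment is the tail of a request body; a sketch of what the full body plausibly looks like (field names follow the OpenAI Responses API's `reasoning` object; the model name and prompt here are illustrative, not the article's exact call):

```python
import json

# Illustrative OpenAI Responses API request body. Setting
# "summary": "auto" asks for summarized reasoning to be
# returned alongside the final answer instead of a silent wait.
payload = {
    "model": "gpt-5",
    "input": "Generate an SVG of a pelican riding a bicycle",
    "reasoning": {"effort": "minimal", "summary": "auto"},
}
print(json.dumps(payload, indent=2))
```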

diggan

> but for the moment here’s the pelican I got from GPT-5 running at its default “medium” reasoning effort:

Would've been interesting to see a comparison between low, medium and high reasoning_effort pelicans :)

When I've played around with GPT-OSS-120b recently, it seems the difference in the final answer is huge: "low" is essentially "no reasoning", while on "high" it can spend a seemingly endless number of tokens. I'm guessing the difference with GPT-5 will be similar?

simonw

> Would've been interesting to see a comparison between low, medium and high reasoning_effort pelicans

Yeah, I'm working on that - expect dozens more pelicans in a later post.

pancakemouse

Practically the first thing I do after a new model release is try to upgrade `llm`. Thank you, @simonw !

efavdb

Same - looks like he hasn't added 5.0 to the package yet, but I assume that's imminent.

https://llm.datasette.io/en/stable/openai-models.html

ks2048

So, "system card" now means what used to be a "paper", but without lots of the details?

simonw

AI labs tend to use "system cards" to describe their evaluation and safety research processes.

They used to be more about the training process itself, but that's increasingly secretive these days.

kaoD

Nope. "System card" is a sales thing. I think we generally call that a "product sheet" in other markets.

nickthegreek

These new naming conventions, while not perfect, are a lot clearer, and I'm sure they will help my coworkers.