sobiolite
nabla9
This is what eating your own dog food looks like when you are selling dog food.
EMIRELADERO
Saved. Thanks for that belly laugh.
echelon
Is this the moment the bubble pops (at least for OpenAI)?
GPT-5 has to be one of the most underwhelming releases to date, and that's fresh on the heels of the "gift" of GPT-OSS.
The hottest news out of OpenAI lately is who Mark Zuckerberg has added to Meta's "Superintelligence" roster.
michaelt
Perhaps they had their new AI generate the graphic.
outside1234
People at OpenAI are the top of their field. It is not sloppiness in this crowd.
teaearlgraycold
I don't think the PR people at OpenAI are at the top of their field.
marvinborner
This should also include the chart on "Coding deception" [1] which is quite deceptive (50.0 is not in fact less than 47.4)
zmmmmm
I pasted the image of the chart into ChatGPT-5 and prompted it with
>there seems to be a mistake in this chart ... can you find what it is?
Here is what it told me:
> Yes — the likely mistake is in the first set of bars (“Coding deception”). The pink bar for GPT-5 (with thinking) is labeled 50.0%, while the white bar for OpenAI o3 is labeled 47.4% — but visually, the white bar is drawn shorter than the pink bar, even though its percentage is slightly lower.
So they definitely should have had ChatGPT review their own slides.
qwertox
With both the submission and your link, it took me way too long to see what the issue is.
What were they even thinking? Don't they care about this? Is their AI generating all their charts now and they don't even bother to review them?
panarky
Since everyone assumes GPT hallucinated these charts, the truth must be that they're 100% pure, organic, unadulterated human fuckups.
croes
Doesn’t matter. Either way is bad.
windowdoor
My unjustified and unscientific opinion is that AI makes you stupid.
That's based solely on my own personal vibes after regularly using LLMs for a while. I became less willing and less able to think critically and carefully.
nicce
It also scares me how good they are at appealing to people and at social engineering. They have made me feel good about poor judgment and bad decisions at least twice (which I noticed later on, still in time). Give them a new, strict system prompt and they give the opposite opinion and recommend against their previous suggestion. They are so good at arguing that they can justify almost anything and make you believe that this is what you should do, unless you are among the top 1% of experts in the topic.
lacy_tinpot
Using AI to completely offload thinking is a total misuse of the technology.
But at the same time, the fact that this technology can seemingly be misused and cause real psychological harm feels like kind of a new thing. Right? There are reports of AI psychosis; I don't know how real it is, but if it's real, I don't know of any other tool that has produced that kind of side effect.
II2II
No. AI is a tool to make ourselves look stupid. Suggesting that it makes people stupid suggests that they are even looking at the output.
chilmers
That one is so obviously wrong that it makes me wonder if someone mislabelled the chart, but perhaps I'm being too optimistic.
computomatic
Presumably it corresponds to Table 8 from this doc: https://cdn.openai.com/pdf/8124a3ce-ab78-4f06-96eb-49ea29ffb...
If that’s the case, it’s mislabelled and should have read “17%”, which would better match the visual.
mwigdahl
It's been fixed on the OpenAI website.
datadrivenangel
Added!
p1necone
This half makes sense to me - 'deception' is an undesirable quality in an LLM, so less of it is 'better/more' from their audience's perspective.
However, I can't think of a sensible way to actually translate that to a bar chart where you're comparing it to other things that don't have the same 'less is more' quality (the general fuckery with graphs not starting at 0 aside - how do you even decide '0' when the number goes up as it approaches it), and what they've done seems like total nonsense.
JBiserkov
> 'deception' is an undesirable quality in an LLM, so less of it is 'better/more' from their audience's perspective
So if that ^ is why 50.0 is lower than 47.4 ... then why is 86.7 not lower than 9.0? Or 4.8 not lower than 2.1?
I_am_tiberius
It would be interesting to know how this occurred. I assume there may have been last-minute high-level feedback suggesting: "We can't let users see that the new model is only slightly better than the old one. Adjust the y-axis to make the improvement appear more significant."
yoyohello13
It’s genuinely terrifying that people this incompetent have so much money and power.
fullshark
It’s more terrifying that it seems no one anywhere cares about the truth. Vibeworld: we are all selling vaporware, and if you don’t build it, who cares, move on to the next hype cycle that pumps the stock / gets VC funding. Absurd industry.
pesus
We're feeling the effects of living in a post-truth society more and more every day. It's pretty terrifying.
m_herrlich
It might not be incompetent to assume the audience is not very discerning.
aydyn
OpenAI is currently getting dunked on, on all major platforms. It is incompetent.
throwawayoldie
People reading Hacker News are the target audience, and here we are, discerning.
01HNNWZ0MV43FF
Hey, could be malice
ElijahLynn
The magic that is ChatGPT is definitely not incompetence.
They may not be perfect, but they have provided a lot of value to many different industries, including coding.
danpalmer
Maybe they asked GPT-5 to update slides.
qustrolabe
GPT-5 would've caught this mismatch for sure
datadrivenangel
Claude and ChatGPT actually took me several prompts to get them to identify this. They recognized from a screenshot that labeled axes that don't start at zero can be misleading, but missed the actual issue.
macNchz
That seemingly depends a bit on how hard you ask it to think, or how hard it decides to think based on your question.
outside1234
There is a smell of desperation around OpenAI, so I wouldn't be surprised if this level of hypevibing came from the top.
lnenad
I mean this is the industry standard. For example every time Nvidia dumps a new GPU into the ether, they do the same thing. Apple with M series CPUs. They even go a step further and compare a few generations back.
datadrivenangel
It's dishonest, and the multiple examples in the same presentation tell you what you need to know about the credibility of the presenters.
zigzag312
There's only one error: the bar height for o3. Somehow its height uses the value from 4o, which looks like some sort of copy-paste error.
EDIT: I was looking just at the first chart. I didn't see there's more below.
andrewstuart2
The other chart on that slide was actually to scale. My suspicion is that it was super rushed to meet the deadline for this presentation, and they maybe didn't use Excel or anything automatic for the charts, so that they would look better, and then missed the detail due to time pressure.
pryelluw
Similar to the glass demonstration on the Cybertruck.
Eji1700
OpenAI has always known that "data" is part of marketing, and treated it as such. I don't think this is intentional, but they damn well knew, even back in the Dota 2 days, how to present data in such a way as to overstate the results and hide the failures.
zmmmmm
it's so funny that it tried to deceive everybody about its deceptiveness
interweb_tube
I'll always invest in a chart that's more pink than gray.
subtlesoftware
The 69.1 column has the same height as the 30.8 column. My guess is they just duplicated the 30.8 column and forgot to adjust the height to the number, which passed a cursory check because it was simply lower than the new model.
This doesn't explain the 50.0 column height though.
chilmers
Eyeballing it, that bar looks to be around 15% in height. Typing "50" instead of "15" is a plausible typo. Albeit, one you might expect from a high-schooler giving a class presentation, not in a flagship launch by one of the most hyped startups in history.
Just remember, everyone involved with these presentations is getting a guaranteed $1.5 million bonus. Then cry a little.
dragonwriter
> The 69.1 column has the same height as the 30.8 column. My guess is they just duplicated the 30.8 column and forgot to adjust the height to the number
Why, unless specifically for the purpose of making it possible to introduce inaccurate and misleading inconsistencies of this type, would you make charts for a professional presentation by a mechanism that involved separately manually creating the bars and the labels in the first place? I mean, maybe, if you were doing something artistic with the style that wasn't supported in charting software you might, but these are the most basic generic bar charts except for the inconsistencies.
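To make that concrete, here is a rough sketch (not anything OpenAI is known to use) of deriving bar heights from the same numbers as the labels, plus a sanity check for hand-drawn bars. The function names, the 0 to 100 axis, and the "drawn" heights in the usage note are all assumptions for illustration; only the label values 69.1, 30.8, 50.0 and 47.4 come from this thread.

```python
from itertools import combinations


def heights_from_values(values, max_height=100.0, axis_max=100.0):
    """Derive bar heights from the same numbers that are used for the labels."""
    return [v / axis_max * max_height for v in values]


def find_mismatches(bars, tolerance=1e-9):
    """Return pairs of label values whose drawn heights contradict them.

    bars is a list of (label_value, drawn_height) tuples. A pair is flagged
    when the taller bar does not carry the larger label, or when two
    different labels are drawn at the same height.
    """
    mismatches = []
    for (v1, h1), (v2, h2) in combinations(bars, 2):
        value_order = (v1 > v2) - (v1 < v2)
        diff = h1 - h2
        height_order = 0 if abs(diff) <= tolerance else (1 if diff > 0 else -1)
        if value_order != height_order:
            mismatches.append((v1, v2))
    return mismatches
```

For example, `find_mismatches([(69.1, 30.8), (30.8, 30.8)])` flags the pair because two different labels share one height, and `find_mismatches([(50.0, 15.0), (47.4, 47.4)])` flags the pair because the larger label sits on the shorter bar (the 15.0 is only an eyeballed guess from upthread). Bars produced by `heights_from_values` pass the check by construction.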
datadrivenangel
People interested in misleading data visualization should look into Alberto Cairo's book, How Charts Lie.
44za12
That was quick, vibe coded, I presume?
datadrivenangel
The CSS animations are very revealing on that front from a performance perspective.
teaearlgraycold
I tend to blame performance issues on the developer writing the code on a top-of-the-line computer. There are too many WebGL effects on startup websites that were built to run on an M4 Max.
thewebguyd
> There are too many WebGL effects on startup websites that were built to run on an M4 Max.
Tale as old as time. When the Retina display Macs first came out, we saw web design suddenly no longer optimized for 1080p or lower-resolution displays (and at the time, 1366x768 was the default resolution for Windows laptops).
As much suffering as it'd be, I swear we'd end up with better software if we stopped giving devs top-of-the-line machines and just issued whatever budget laptop is on sale at the local Best Buy on any given day.
datadrivenangel
Yeah, this is somewhat stuttery on an M2 Mac.
seba_dos1
It's less than 200 lines of CSS. Easily doable by a human in 30 minutes.
mattgreenrocks
I love how this has to be defended now, as if that was somehow unthinkable from a domain expert.
null
There are versions of both these charts with more plausible numbers and bar sizes in the "evaluation" section of the announcement post:
https://openai.com/index/introducing-gpt-5/
So, maybe this is just sloppiness and not intentionally misleading. But still, not a good look when the company burning through billions of dollars in cash and promising to revolutionize all human activity can't put together a decent PowerPoint.