OpenAI researcher announced GPT-5 math breakthrough that never happened
127 comments
October 19, 2025 · Timsky
crazygringo
In my experience doing literature super-deep-dives, it hallucinates sources about 50% of the time. (For higher-level literature surveys, it's maybe 5%.)
Of the other 50% that are real, it's often ~evenly split into sources I'm familiar with and sources I'm not.
So it's hugely useful in surfacing papers that I may very well never have found otherwise using e.g. Google Scholar. It's particularly useful in finding relevant work in parallel subfields -- e.g. if you work in physics but it turns out there are math results, or you work in political science and it turns out there are relevant findings from anthropology. And also just obscure stuff -- a random thesis that never got published or cited but the PDF is online and turns out to be relevant.
It doesn't matter if 75% of the results are not useful to me or hallucinated. Those only waste me minutes. The other 25% more than make up for it -- they're things I simply might never find otherwise.
glenstein
Struggling to understand this one. Is it that (1) it's lopsided toward reference materials found on the modern internet and not as useful for reviewing literature from the Before Times or (2) it's offering specific solutions but you're skeptical of them?
gpjt
To be fair to the OpenAI team, if read in context the situation is at worst ambiguous.
The deleted tweet that the article is about said "GPT-5 just found solutions to 10 (!) previously unsolved Erdös problems, and made progress on 11 others. These have all been open for decades." If it had been posted stand-alone then I would certainly agree that it was misleading, but it was not.
It was a quote-tweet of this: https://x.com/MarkSellke/status/1979226538059931886?t=OigN6t..., where the author is saying he's "pushing further on this".
The "this" in question is what this second tweet is in turn quote-tweeting: https://x.com/SebastienBubeck/status/1977181716457701775?t=T... -- where the author says "gpt5-pro is superhuman at literature search: [...] it just solved Erdos Problem #339 (listed as open in the official database erdosproblems.com/forum/thread/3…) by realizing that it had actually been solved 20 years ago"
So, reading the thread in order, you get
* SebastienBubeck: "GPT-5 is really good at literature search, it 'solved' an apparently-open problem by finding an existing solution"
* MarkSellke: "Now it's done ten more"
* kevinweil: "Look at this cool stuff we've done!"
I think the problem here is the way quote-tweets work -- you only see the quoted post and not anything that it in turn is quoting. Kevin Weil had the two previous quotes in his context when he did his post and didn't consider the fact that readers would only see the first level, so wouldn't have Sebastien Bubeck's post in mind when they read his. That seems like an easy mistake to make entirely honestly, and I think the pile-on is a little unfair.
moefh
> Kevin Weil had the two previous quotes in his context when he did his post and didn't consider the fact that readers would only see the first level, so wouldn't have Sebastien Bubeck's post in mind when they read his.
No, Weil said he himself misunderstood Sellke's post[1].
Note Weil's wording (10 previously unsolved Erdos problems) vs. Sellke's wording (10 Erdos problems that were listed as open).
OtherShrezzing
Am I correct in thinking this is the 2nd such fumble by a major lab? DeepMind released their “matrix multiplication better than SOTA” paper a few months back, which suggested Gemini had uncovered a new way to optimally multiply two matrices in fewer steps than previously known. Then immediately after their announcement, mathematicians pointed out that their newly discovered SOTA had been in the literature for 30-40 years, and was almost certainly in Gemini’s training set.
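To make "fewer steps" concrete: the textbook example of this kind of result is Strassen's 1969 scheme, which multiplies 2x2 blocks with 7 scalar multiplications instead of the naive 8 and, applied recursively, beats O(n^3). A minimal sketch (illustrative only -- not the construction DeepMind announced):

    def strassen_2x2(A, B):
        # Strassen's 7 products for a 2x2 block multiply (naive needs 8).
        (a, b), (c, d) = A
        (e, f), (g, h) = B
        m1 = (a + d) * (e + h)
        m2 = (c + d) * e
        m3 = a * (f - h)
        m4 = d * (g - e)
        m5 = (a + b) * h
        m6 = (c - a) * (e + f)
        m7 = (b - d) * (g + h)
        return [[m1 + m4 - m5 + m7, m3 + m5],
                [m2 + m4, m1 - m2 + m3 + m6]]

    # Sanity check against the naive product:
    assert strassen_2x2([[1, 2], [3, 4]], [[5, 6], [7, 8]]) == [[19, 22], [43, 50]]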
Frieren
> "GPT-5 is really good at literature search, it 'solved' an apparently-open problem by finding an existing solution"
Survivor bias.
I can assure you that GPT-5 fucks up even relatively easy searches. I need to have a very good idea of what the results should look like, and the ability to test them, to be able to use any result from GPT-5.
If I throw the dice 1000 times and post about it each time I get a double six, am I the best dice thrower there is?
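To put rough numbers on the analogy (a quick sketch with made-up dice, not anyone's real data):

    import random

    # A double six has probability 1/36 per throw, so 1000 throws give
    # ~28 on average -- plenty to post about if you never mention the rest.
    throws = [(random.randint(1, 6), random.randint(1, 6)) for _ in range(1000)]
    wins = sum(1 for t in throws if t == (6, 6))
    print(f"{wins} double sixes in {len(throws)} throws (expected ~{1000 / 36:.0f})")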
zacmps
For literature search that might be ok. It doesn't need to replace any other tools, and if 1 in 10 attempts surfaces something you wouldn't have found otherwise, it could be worth the time spent on the duds.
card_zero
So the first guy said "solved [...] by realizing that it had actually been solved 20 years ago", and the second guy said "found solutions to 10 (!) previously unsolved Erdös problems".
Previously unsolved. The context doesn't make that true, does it?
YesBox
Wouldn't be surprised if OpenAI employees are being asked to phrase (market) things this way. This is not the first time they claimed GPT-5 "solved" something [1]
[1] https://x.com/SebastienBubeck/status/1970875019803910478
edit: full text
It's becoming increasingly clear that gpt5 can solve MINOR open math problems, those that would require a day/few days of a good PhD student. Ofc it's not a 100% guarantee, eg below gpt5 solves 3/5 optimization conjectures. Imo full impact of this has yet to be internalized...
827a
This happening the same week as DeepMind’s seemingly legitimate AI-assisted cancer treatment breakthrough is a startlingly bad look for OpenAI.
My boss always used to say “our only policy is, don’t be the reason we need to create a new policy”. I suspect OpenAI is going to have some new public communication policies going forward.
amirhirsch
The sad truth about this incident is that it reveals that OpenAI does not have a serious effort to actually work on unsolved math problems.
grafmax
I realized they jumped the shark when they announced the pivots to ads and porn. Markets haven’t caught on yet.
goalieca
The porn pivot makes perfect sense. Porn is already quite fake and unconvincing and none of that matters.
mrbombastic
It might not matter as far as profitability is concerned, but ethically the second-order effects will be very problematic. I am no puritan, but the widespread availability of porn has already greatly affected people's sexual expectations. AI-generated porn is going to remove even more guardrails around behavior previously considered deviant, and people will view it and bring those expectations back to real life.
chanux
And there's no escape. The Internet was built for gambling and this.
throwacct
Unfortunately, the porn pivot might be their path to "profitability".
zeroonetwothree
They know where the money is.
grafmax
It’s standard practice for VC-backed companies to enshittify after building a moat, relying on user lock-in. What’s remarkable is how quickly they’ve had to shift gears. And given this rapid pivot, it’s questionable how large that moat really is.
HarHarVeryFunny
The porn / sex-chat one is really disappointing. It seems they've given up even pretending that they are trying to do something beneficial for society. This is just a pure society-be-damned money grab.
disgruntledphd2
They've raised far too much money for those kinda ethics, unfortunately.
bradly
My hunch is that they don't have a way to stop anything, so they are creating verticals to at least contain the porn, medical, and higher-ed users.
rowanG077
How so? I wouldn't put much stock in a rogue employee announcing something wrong.
mrbungie
That's not just any employee, it's their VP of Science.
amirhirsch
The people involved are very smart and must know that AI doing novel math is a canary for AGI. A serious effort around solving open problems would not fuck up this kind of announcement.
coldtea
"rogue employee"
jebarker
That’s a non sequitur. They’re a fairly large organization, I’d be amazed if they don’t have multiple research sub-teams pursuing all sorts of different directions.
andrewstuart
Humans hallucinating about AI.
JKCalhoun
"OpenAI Researcher Hallucinates GPT-5 Math Breakthrough" could be a headline from The Onion.
reaperducer
"OpenAI Researcher Hallucinates GPT-5 Math Breakthrough" could be a headline from The Onion.
Off topic, but I saw The Onion on sale in the magazine rack of Barnes and Noble last month.
For those who miss when it was a free rag in sidewalk newsstands, and don't want to pony up for a full subscription, this is an option.
antegamisou
Seriously those headlines are getting DailyMail sensationalism levels of ridiculous.
nicce
In the old world we would just use the word bullshit.
alkyon
They started believing the very lies they invented.
moffkalast
"The truth is usually just an excuse for a lack of imagination."
pera
Heh, stockholders are not hallucinating: they know very well what they are doing.
skeeter2020
Retail investors? No way. The fever-dream may continue for a while, but eventually it will end. Meanwhile we don't even know our full exposure to AI. It's going to be ugly, and beyond burying gold in my backyard I can't even figure out how to hedge against this monster.
Palmik
More like humans hallucinating about humans hallucinating about AI, see here: https://news.ycombinator.com/item?id=45634120
MattGaiser
Humans "hallucinate" in the AI way constantly, which is why I don't see them as a barrier to LLMs replacing humans in many contexts. It really isn't unusual for a human to make stuff up or be unaware of stuff.
zeknife
A human being informed of a mistake will usually be able to resolve it and learn something in the process, whereas an LLM is more likely to spiral into nonsense
MattGaiser
You must know people without egos. Humans are better at correcting their mistakes, but far worse at admitting them.
But yes, as edge-case handlers, humans still have an edge.
alimw
You must know better humans than I do.
skeeter2020
Do you think the OpenAI human, when informed of their "oopsie", replied "You're right, there is existing evidence that this problem has already been solved. Blah Blah Blah ... and that's why our new model has made a huge breakthrough against previously unsolved math problems!"
zeroonetwothree
Humans are a bit better at knowing which things are important and doing more research. Also better at being honest when directly pressed. And infinitely better at learning from errors.
(Yes, not everyone, but we do have some mechanisms to judge or encourage)
lapcat
> Humans "hallucinate" in the AI way constantly
This claim is ambiguous. The use of the word "Humans" here obscures rather than clarifies the issue. Individual humans typically do not "hallucinate" constantly, especially not on the job. Any individual human who is as bad at their job as an LLM should indeed be replaced, by a more competent individual human, not by an equally incompetent LLM. This was true long before LLMs were invented.
In the movie "Bill and Ted's Excellent Adventure," the titular characters attempt to write a history report by asking questions of random strangers in a convenience store parking lot. This of course is ridiculous and more a reflection of the extreme laziness of Bill and Ted than anything else. Today, the lazy Bill and Ted would ask ChatGPT instead. It's equally ridiculous to defend the wild inaccuracy and hallucinations of LLMs by comparing them to average humans. It's not the job of humans to answer random questions on any subject.
Human subject matter experts are not perfect, but they’re much better than average and don’t hallucinate on their subjects. They also have accountability and paper trails, and can be individually discounted for gross misconduct, unlike LLMs.
pas
it's the same thing with self-driving: if you can make it safer than a good human driver, that's enough. but the bar is pretty low with driving (as evidenced by the hundreds of thousands of collisions, deaths, and permanent disabilities each year), and rather high in scientific publishing.
random9749832
Best case: Hallucination
Worst case (more probable): Lying
MPSimmons
Hanlon's Razor
random9749832
They are expanding into the adult market because they are running out of ideas. I think common sense is enough to decide what is what here.
forgetfulness
Lying is a stupid way of selling something and making money
rixed
These days AIs just obsequiously praise whatever stupid ideas humans throw at them, which encourages humans to hallucinate breakthroughs.
But it's only a matter of time before AI gets better at prompt engineering.
/s?
d_tr
Whatever happened to "don't get high on your own supply"?
cedws
Making such a claim should at the very least require proof that the information wasn’t in the training data.
jgalt212
After the circular financing schemes involving hundreds of billions of dollars were uncovered, nothing I read about the AI business and its artificial hype machine surprises me anymore.
kif
This honestly doesn’t surprise me. We have reached a point where it’s becoming clearer and clearer that AGI is nowhere to be seen, and advances in LLMs’ ability to ‘reason’ have slowed to (almost?) a halt.
dawnerd
But if you ask an AI hype person, they’ll say we’re almost there, we just need a few more gigawatts of compute!
vbezhenar
In my book, chat-based AGI was reached years ago, when I could no longer reliably distinguish the computer from a human.
Solving problems that humanity couldn't solve is super-AGI or something like that. That, indeed, is not here yet.
3836293648
Beating the Turing Test is not AGI, but it is beating the Turing Test and that was impressive enough when it happened
jdiff
We're not even solving problems that humanity can solve. There have been several times when I've posed to models a geometry problem that was novel but possible for me to solve on my own, and LLMs have fallen flat on executing it every time. I'm no mathematician, and these are not complex problems, but they're well beyond any AI, even when guided. Instead, they're left to me, my trusty whiteboard, and a non-negligible amount of manual brute-force shuffling of terms until it comes out right.
They're good at the Turing test. But that only marks them as indistinguishable from humans in casual conversation. They are fantastic at that. And a few other things, to be clear. Quick comprehension of an entire codebase for fast queries is horribly useful. But they are a long way from human-level general intelligence.
steveBK123
Hence the pivot into ads, shop-in-chat and umm.. adult content.
JKCalhoun
I try not to lose sight of the first time that I heard (some years back) that people were using this new LLM thing for DM'ing ("dungeon mastering", leading) a game of Dungeons and Dragons. I thought, this must be bullshit or some kind of witchcraft.
Definitely not anti-AI here. I think I have been disappointed though, since then, to slowly learn that they're (still) little beyond that.
Still amazing though. And better than a Google search (IMHO).
> GPT-5 is proving useful as a literature review assistant
No, it does not. It only produces a highly convincing counterfeit. I am honestly happy for people who are satisfied with its output: life is way easier for them than for me. Obviously, the machine discriminates against me personally. When I spend hours in the library looking for some engineering-related math from the 70s-80s, as a last-resort measure I can try this gamble with the chat, hoping for any tiny clue to answer my question. And then for the following hours I try to understand what is wrong with the chat's output. Most often I experience the "it simply can't be" feeling, and I know I am not the only one having it.