Over fifty new hallucinations in ICLR 2026 submissions
374 comments
·December 7, 2025
stainablesteel
this brings us to a cultural divide. westerners would see this as a personal scar, as they consider the integrity of the publishing sphere at large to be held up by the integrity of individuals
i clicked on 4 of those papers, and the pattern i saw was middle-eastern, indian, and chinese names
these are cultures where they think this kind of behavior is actually acceptable, they would assume it's the fault of the journal for accepting the paper. they don't see the loss of reputation to be a personal scar because they instead attribute blame to the game.
some people would say it's racist to understand this, but in my opinion, when i was working with people from these cultures there was just no other way to learn to cooperate with them than to understand them. it's an incredibly confusing experience to be working with them until you understand the various differences between your own culture and theirs
ribosometronome
Where do you see the authors? All I'm seeing is:
>Anonymous authors
>Paper under double-blind review
sureMan6
Either the OP mistakes the hallucinated citations for the authors (most likely, although there are almost no "Middle Eastern names" among them), or he checked some that do have the names listed (I found 4; all had either Chinese names or "Western" names). Anyway, the great majority of papers (good or bad) I've seen have Indian or Chinese names attached; attributing bad papers to brown people having an inferior culture is just blatantly racist
titanomachy
Yeah WTF? Both authors and reviewers are hidden. Is this comment just an attempt to whip up racist fervor?
Aeglaecia
I'm not sure if you are going to get downvoted, so I'm sticking a limb out to cop any potential collateral damage, in the name of finding out whether the common inhabitant of this forum considers the idea of low-trust vs high-trust societies to be inherently racist
CoastalCoder
I think it's an interesting question. Whether or not it can be discussed well here isn't so obvious.
ulrashida
Unfortunately, while catching false citations is useful, in my experience that's not usually the problem affecting paper quality. Far more prevalent are authors who mis-cite materials, either drawing support from citations that don't actually say those things or stripping the nuance away by using cherry-picked quotes simply because that is what Google Scholar suggested as a top result.
The time it takes to find these errors is orders of magnitude higher than checking whether a citation exists, as you need to both read and understand the source material.
These bad actors should be subject to a three strikes rule: the steady corrosion of knowledge is not an accident by these individuals.
hippo22
It seems like this is the type of thing that LLMs would actually excel at though: find a list of citations and claims in this paper, do the cited works support the claims?
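Roughly what I'm imagining, as a minimal sketch (this assumes the OpenAI Python client; the model name and prompt wording are placeholders, and a qualified human still has to judge whatever gets flagged):

    # Sketch: ask an LLM whether a cited abstract actually supports a claim.
    # Assumes the `openai` package; the model and prompt are illustrative only.
    from openai import OpenAI

    client = OpenAI()

    def claim_supported(claim: str, cited_abstract: str) -> str:
        prompt = (
            "You are a skeptical reviewer. Does the source abstract below "
            "support the claim? Answer SUPPORTED, NOT SUPPORTED, or UNCLEAR, "
            "then give one sentence of justification.\n\n"
            f"Claim: {claim}\n\nSource abstract: {cited_abstract}"
        )
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder model name
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content

Run that over every (claim, citation) pair extracted from the paper and escalate anything that isn't SUPPORTED to a human reviewer.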
bryanrasmussen
sure, except when they hallucinate that the cited works support the claims when they do not. At which point you're back at needing to read the cited works to see if they support the claims.
BHSPitMonkey
You don't just accept the review as-is, though; you prompt it to be a skeptic and find a handful of specific examples of claims that are worth extra attention from a qualified human.
Unfortunately, this probably results in lazy humans _only_ reading the automated flagged areas critically and neglecting everything else, but hey—at least it might keep a little more garbage out?
19f191ty
Exactly. Abuse of citations is a much more prevalent and sinister issue and has been for a long time. Fake citations are of course bad, but they're only the tip of the iceberg.
seventytwo
Then punish all of it.
potato3732842
>These bad actors should be subject to a three strikes rule: the steady corrosion of knowledge is not an accident by these individuals.
These people are working in labs funded by Exxon or Meta or Pfizer or whoever and they know what results will make continued funding worthwhile in the eyes of their donors. If the lab doesn't produce the donor will fund another one that will.
theoldgreybeard
If a carpenter builds a crappy shelf “because” his power tools are not calibrated correctly - that’s a crappy carpenter, not a crappy tool.
If a scientist uses an LLM to write a paper with fabricated citations - that’s a crappy scientist.
AI is not the problem, laziness and negligence is. There needs to be serious social consequences to this kind of thing, otherwise we are tacitly endorsing it.
CapitalistCartr
I'm an industrial electrician. A lot of poor electrical work is visible only to a fellow electrician, and sometimes only another industrial electrician. Bad technical work requires technical inspectors to criticize. Sometimes highly skilled ones.
andy99
I’ve reviewed a lot of papers, and I don’t consider it the reviewer's responsibility to manually verify all citations are real. If there was an unusual citation that was relied on heavily for the basis of the work, one would expect it to be checked. Things like broad prior work you’d just assume are part of the background.
The reviewer is not a proofreader, they are checking the rigour and relevance of the work, which does not rest heavily on all of the references in a document. They are also assuming good faith.
stdbrouw
The idea that references in a scientific paper should be plentiful but aren't really that important is a consequence of a previous technological revolution: the internet.
You'll find a lot of papers from, say, the '70s, with a grand total of maybe 10 references, all of them to crucial prior work, and if those references don't say what the author claims they should say (e.g. that the particular method that is employed is valid), then chances are that the current paper is weaker than it seems, or even invalid, and so it is extremely important to check those references.
Then the internet came along, scientists started padding their work with easily found but barely relevant references and journal editors started requiring that even "the earth is round" should be well-referenced. The result is that peer reviewers feel that asking them to check the references is akin to asking them to do a spell check. Fair enough, I agree, I usually can't be bothered to do many or any citation checks when I am asked to do peer review, but it's good to remember that this in itself is an indication of a perverted system, which we just all ignored -- at our peril -- until LLM hallucinations upset the status quo.
PeterStuer
I think the root problem is that everyone involved, from authors to reviewers to publishers, knows that 99.999% of papers are completely of no consequence, just empty calories with the sole purpose of padding quotas for all involved, and thus is not going to put in the effort as if they mattered.
This is systemic, and unlikely to change anytime soon. There have been remedies proposed (e.g. limits on how many papers an author can publish per year, let's say 4 to be generous), but they are unlikely to gain traction: though most would agree on the benefits, all involved in the system would stand to lose in the short term.
grayhatter
> The reviewer is not a proofreader, they are checking the rigour and relevance of the work, which does not rest heavily on all of the references in a document.
I've always assumed peer review is similar to diff review, where I'm willing to sign my name onto the work of others. If I approve a diff/PR and it takes down prod, it's just as much my fault, no?
> They are also assuming good faith.
I can only relate this to code review, but assuming good faith means you assume they didn't try to introduce a bug by adding this dependency. I should still check to make sure this new dep isn't some typosquatted package. That's the rigor I'm responsible for.
rokob
As a reviewer, I at least skim the cited paper for every reference in every paper that I review. If it isn't useful to furthering the point of the paper, then my feedback is to remove the reference. Adding a bunch of junk in a giant background section because it is broadly related is a waste of everyone's time and it should be removed. Most of the time you are already mostly aware of the papers being cited anyway, because that is the whole point of reviewing in your area of expertise.
Aurornis
> I don’t consider it the reviewers responsibility to manually verify all citations are real
I guess this explains all those times over the years where I follow a citation from a paper and discover it doesn’t support what the first paper claimed.
not2b
Agreed. I used to review lots of submissions for IEEE and similar conferences, and didn't consider it my job to verify every reference. No one did, unless the use of the reference triggered an "I can't believe it said that" reaction. Of course, back then, there wasn't a giant plagiarism machine known to fabricate references, so if tools can find fake references easily the tools should be used.
andai
>I don’t consider it the reviewers responsibility to manually verify all citations are real.
Doesn't this sound like something that could be automated?
for paper_name in citations... do a web search for it, see if there's a page in the results with that title.
That would at least give you "a paper with this name exists".
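Something like this against the Crossref search API, as a rough sketch (the API choice and the naive title matching are my assumptions; fuzzy matching, preprint servers, and rate limits are left out):

    # Sketch: check that each cited title matches some record in Crossref.
    # Assumes the public Crossref REST API; matching is deliberately naive.
    import requests

    def title_exists(title: str) -> bool:
        resp = requests.get(
            "https://api.crossref.org/works",
            params={"query.title": title, "rows": 3},
            timeout=10,
        )
        resp.raise_for_status()
        items = resp.json()["message"]["items"]
        candidates = [" ".join(item.get("title", [])).lower() for item in items]
        return any(title.lower() in c or c in title.lower() for c in candidates if c)

    citations = ["Attention Is All You Need"]  # titles extracted from the paper
    for paper_title in citations:
        if not title_exists(paper_title):
            print("no match found:", paper_title)

It wouldn't catch a real title attributed to the wrong authors, but it would flag the ones that simply don't exist.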
armcat
I agree with you (I have reviewed papers in the past); however, made-up citations are a "signal". Why would the authors do that? If they made it up, most likely they haven't really read that prior work. If they haven't, have they really done proper due diligence on their research? Are they just trying to "beef up" their paper with citations to unfairly build up credibility?
barfoure
I’d love to hear some examples of poor electrical work that you’ve come across that’s often missed or not seen.
joshribakoff
I am not an electrician, but when I did projects, I did a lot of research before deciding to hire someone and then I was extremely confused when everyone was proposing doing it slightly differently.
A lot of them proposed ways that seem to violate the code, like running flex tubing beyond the allowed length or amount of turns.
Another example would be people not accounting for needing fireproof covers if they're installing recessed lighting between dwellings in certain cities…
Heck, most people don’t actually even get the permit. They just do the unpermitted work.
AstroNutt
A couple had just moved in a house and called me to replace the ceiling fan in the living room. I pulled the flush mount cover down to start unhooking the wire nuts and noticed RG58 (coax cable). Someone had used the center conductor as the hot wire! I ended up running 12/2 Romex from the switch. There was no way in hell I could have hooked it back up the way it was. This is just one example I've come across.
xnx
No doubt the best electricians are currently better than the best AI, but the best AI is likely now better than the novice homeowner. The trajectory over the past 2 years has been very good. Another five years and AI may be better than all but the very best, or most specialized, electricians.
legostormtroopr
Current state AI doesn’t have hands. How can it possibly be better at installing electrics than anyone?
Your post reads like AI precisely because while the grammar is fine, it lacks context - like someone prompted “reply that AI is better than average”.
lencastre
An old boss of mine used to say there are no stupid electricians found alive, as they self-select, Darwin Award style.
bdangubic
same (and much, much, much worse) for science
kklisura
> AI is not the problem, laziness and negligence is
This reminds me of the discourse about the gun problem in the US, "guns don't kill people, people kill people", etc. It is a discourse used solely for the purpose of not doing anything and not addressing anything about the underlying problem.
So no, you're wrong - AI IS THE PROBLEM.
Yoofie
No, the OP is right in this case. Did you read TFA? It was "peer reviewed".
> Worryingly, each of these submissions has already been reviewed by 3-5 peer experts, most of whom missed the fake citation(s). This failure suggests that some of these papers might have been accepted by ICLR without any intervention. Some had average ratings of 8/10, meaning they would almost certainly have been published.
If the peer reviewers can't be bothered to do the basics, then there is literally no point to peer review, which is fully independent of the author who uses or doesn't use AI tools.
smileybarry
Peer reviewers can also use AI tools, which will hallucinate a "this seems fine" response.
amrocha
If AI fraud is good at avoiding detection via peer review that doesn’t mean peer review is useless.
If your unit tests don’t catch all errors it doesn’t mean unit tests are useless.
sneak
> it is a discourse used solely for the purpose of not doing anything and not addressing anything about the underlying problem
Solely? Oh brother.
In reality it's the complete opposite. It exists to highlight the actual source of the problem, as there exist both industries/practitioners using AI professionally and safely, and communities with very high rates of gun ownership and exceptionally low rates of gun violence.
It isn’t the tools. It’s the social circumstances of the people with access to the tools. That’s the point. The tools are inanimate. You can use them well or use them badly. The existence of the tools does not make humans act badly.
TomatoCo
To continue the carpenter analogy, the issue with LLMs is that the shelf looks great but is structurally unsound. That it looks good on surface inspection makes it harder to tell that the person making it had no idea what they're doing.
embedding-shape
Regardless, if a carpenter is not validating their work before selling it, it's the same as if a researcher doesn't validate their citations before publishing. Neither of them have any excuses, and one isn't harder to detect than the other. It's just straight up laziness regardless.
judofyr
I think this is a bit unfair. The carpenters are (1) living in a world where there's an extreme focus on delivering as quickly as possible, (2) being presented with a tool which is promised by prominent figures to be amazing, and (3) being given the tool at a low cost because it is subsidized.
And yet, we’re not supposed to criticize the tool or its makers? Clearly there’s more problems in this world than «lazy carpenters»?
k4rli
Very good analogy I'd say.
Also similar to what Temu, Wish, and other similar sites offer. Picture and specs might look good but it will likely be disappointing in the end.
SubiculumCode
Yeah, seriously. Using an LLM to help find papers is fine. Then you read them. Then you use a tool like Zotero, or manually add citations. I use Gemini Pro to identify useful papers that I might not have encountered before. But even when asking it to restrict itself to PubMed resources, its citations are wonky, citing three different version sources of the same paper (citations that don't say what they said they'd discuss).
That said, these tools have substantially reduced hallucinations over the last year, and will just get better. It also helps if you can restrict it to reference already screened papers.
Finally, I'd like to say that if we want scientists to engage in good science, stop forcing them to spend a third of their time in a rat race for funding... it is ridiculously time-consuming and wasteful of expertise.
bossyTeacher
The problem isn't whether they have more or fewer hallucinations. The problem is that they have them at all. And as long as they hallucinate, you have to deal with that. It doesn't really matter how you prompt; you can't prevent hallucinations from happening, and without manual checking, eventually hallucinations will slip under the radar, because the only difference between a real pattern and a hallucinated one is that one exists in the world and the other doesn't. This is not something you can really counter with more LLMs either, as it is a problem intrinsic to LLMs.
nwallin
"Anyone, from the most clueless amateur to the best cryptographer, can create an algorithm that he himself can’t break."--Bruce Schneier
There's a corollary here with LLMs, but I'm not pithy enough to phrase it well. Anyone can create something using LLMs that they, themselves, aren't skilled enough to spot the LLMs' hallucinations. Or something.
LLMs are incredibly good at exploiting people's confirmation biases. If it "thinks" it knows what you believe/want, it will tell you what you believe/want. There does not exist a way to interface with LLMs that will not ultimately end in the LLM telling you exactly what you want to hear. Using an LLM in your process necessarily results in being told that you're right, even when you're wrong. Using an LLM necessarily results in it reinforcing all of your prior beliefs, regardless of whether those prior beliefs are correct. To an LLM, all hypotheses are true; it's just a matter of hallucinating enough evidence to satisfy the user's skepticism.
I do not believe there exists a way to safely use LLMs in scientific processes. Period. If my belief is true, and ChatGPT has told me it's true, then yes, AI, the tool, is the problem, not the human using the tool.
bigstrat2003
> If a carpenter builds a crappy shelf “because” his power tools are not calibrated correctly - that’s a crappy carpenter, not a crappy tool.
It's both. The tool is crappy, and the carpenter is crappy for blindly trusting it.
> AI is not the problem, laziness and negligence is.
Similarly, both are a problem here. LLMs are a bad tool, and we should hold people responsible when they blindly trust this bad tool and get bad results.
jodleif
I find this to be a bit "easy". There is such a thing as bad tools. If it is difficult to determine whether the tool is good or bad, I'd say some of the blame has to be put on the tool.
rectang
“X isn’t the problem, people are the problem.” — the age-old cry of industry resisting regulation.
kklisura
It's not about resisting. It's about undermining any action whatsoever.
theoldgreybeard
I am not against regulation.
Quite the opposite actually.
codywashere
what regulation are you advocating for here?
kibwen
At the very least, authors who have been caught publishing proven fabrications should be barred by those journals from ever publishing in them again. Mind you, this is regardless of whether or not an LLM was involved.
rectang
I’m not advocating, I’m making a high-level observation: Industry forever pushes for nil regulation and blames bad actors for damaging use.
But we always have some regulation in the end. Even if certain firearms are legal to own, howitzers are not — although it still takes a “bad actor” to rain down death on City Hall.
The same dynamic is at play with LLMs: “Don’t regulate us, punish bad actors! If you still have a problem, punish them harder!” Well yes, we will punish bad actors, but we will also go through a negotiation of how heavily to constrain the use of your technology.
currymj
I recommend actually clicking through and reading some of these papers.
Most of those I spot-checked do not give an impression of high quality. It's not just AI writing assistance: many seem to have AI-generated "ideas", often plausible nonsense. The reviewers often catch the errors and sometimes even the fake citations.
Can I prove malfeasance beyond a reasonable doubt? No. But I personally feel quite confident that many of the papers I checked are primarily AI-generated.
I feel really bad for any authors who submitted legitimate work but made an innocent mistake in their .bib and ended up on the same list as the rest of this stuff.
uplifter
To me, such an interpretation suggests there are likely papers that were not so easy to spot, perhaps because the AI happened upon more plausible nonsense and then generated believable but still (at a reduced level of criticality) nonsensical data to bolster that nonsense theory, at a level that is less easy to catch.
This isn't comforting at all.
chistev
Last month, I was listening to the Joe Rogan Experience episode with guest Avi Loeb, who is a theoretical physicist and professor at Harvard University. He complained about the disturbingly increasing rate at which his students are submitting academic papers referencing non-existent scientific literature that were so clearly hallucinated by Large Language Models (LLMs). They never even bothered to confirm their references and took the AI's output as gospel.
https://www.rxjourney.net/how-artificial-intelligence-ai-is-...
teddyh
> Avi Loeb, who is a theoretical physicist and professor at Harvard University
Also a frequent proponent of UFO claims about approaching meteors.
chistev
Yea, he harped on that a lot during the podcast
venturecruelty
Talk about a buried lead... Avi Loeb is, first and foremost, a discredited crank.
sen
That’s implied by the fact he was on the Joe Rogan show.
mannanj
Isn't this an underlying symptom of lack of accountability of our greater leadership? They do these things, they act like criminals and thieves, and so the people who follow them get shown examples that it's OK while being told to do otherwise.
"Show bad examples then hit you on the wrist for following my behavior" is like bad parenting.
dandanua
I don't think they want you to follow their behavior. They do want accountability, but for everyone below them, not for themselves.
jameshart
Is the baseline assumption of this work that an erroneous citation is LLM hallucinated?
Did they run the checker across a body of papers before LLMs were available and verify that there were no citations in peer reviewed papers that got authors or titles wrong?
miniwark
They explain in the article what they consider a proper citation, an erroneous one, and a hallucination, in the section "Defining Hallucitations". They also say that they have many false positives, mostly real papers that are not available online.
That said, I am also very curious about the results their tool would give for papers from the 2010s and before.
sigmoid10
If you look at their examples in the "Defining Hallucitations" section, I'd say those could be 100% human errors. Shortening authors' names, leaving out authors, misattributing authors, misspelling or misremembering the paper title (or having an old preprint title, as titles do change) are all things that I would fully expect to happen to anyone in any field where things ever get published. Modern tools have made the citation process more comfortable, but if you go back to the old days, you'd probably find those kinds of errors everywhere. If you look at the full list of "hallucinations" they claim to have discovered, the only ones I'd not immediately blame on human screwups are the ones where a title and the authors got zero matches for existing papers/people. If you really want to do this kind of analysis correctly, you'd have to take the claim in the text and verify it against the cited article. Because I think it would be even more dangerous if you can get claims accepted by simply quoting an existing paper correctly, while completely ignoring its content (which would have worked here).
Majromax
> Modern tools have made the citation process more comfortable,
That also makes some of those errors easier. A bad auto-import of paper metadata can silently screw up some of the publication details, and replacing an early preprint with the peer-reviewed article of record takes annoying manual intervention.
jameshart
I mean, if you’re able to take the citation, find the cited work, and definitively state ‘looks like they got the title wrong’ or ‘they attributed the paper to the wrong authors’, that doesn’t sound like what people usually mean when they say a ‘hallucinated’ citation. Work that is lazily or poorly cited but nonetheless attempts to cite real work is not the problem. Work which gives itself false authority by claiming to cite works that simply do not exist is the main concern surely?
_alternator_
Let me second this: a baseline analysis should include papers that were published or reviewed at least 3-4 years ago.
When I was in grad school, I kept a fairly large .bib file that almost certainly had a mistake or two in it. I don’t think any of them ever made it to print, but it’s hard to be 100% sure.
For most journals, they actually partially check your citations as part of the final editing. The citation record is important for journals, and linking with DOIs is fairly common.
currymj
the papers themselves are publicly available online too. Most of the ones I spot-checked give the extremely strong impression of AI generation.
It's not just some hallucinated citations, and not just the writing. In many cases the actual purported research "ideas" seem to be plausible nonsense.
To get a feel for it, you can take some of the topics they write about and ask your favorite LLM to generate a paper. Maybe even throw "Deep Research" mode at it. Perhaps tell it to put it in ICLR latex format. It will look a lot like these.
llm_nerd
People will commonly hold LLMs as unusable because they make mistakes. So do people. Books have errors. Papers have errors. People have flawed knowledge, often degraded through a conceptual game of telephone.
Exactly as you said, do precisely this to pre-LLM works. There will be an enormous number of errors with utter certainty.
People keep imperfect notes. People are lazy. People sometimes even fabricate. None of this needed LLMs to happen.
pmontra
Fabricated citations are not errors.
A pre-LLM paper with fabricated citations would demonstrate a will to cheat by the author.
A post-LLM paper with fabricated citations: same thing, and if the authors attempt to defend themselves with something like "we trusted the AI", they are sloppy, probably cheaters, and not very good at it.
mapmeld
Further, if I use AI-written citations to back some claim or fact, what are the actual claims or facts based on? These started happening in law because someone writes the text and then wishes there was a source that was relevant and actually supportive of their claim. But if someone puts in the labor to check your real/extant sources, there's nothing backing it (e.g. MAHA report).
llm_nerd
>Fabricated citations are not errors.
Interesting that you hallucinated the word "fabricated" here where I broadly talked about errors. Humans, right? Can't trust them.
Firstly, just about every paper ever written in the history of papers has errors in it. Some small, some big. Most accidental, but some intentional. Sometimes people are sloppy keeping notes, transcribe a row, get a name wrong, do an offset by 1. Sometimes they just entirely make up data or findings. This is not remotely new. It has happened as long as we've had papers. Find an old, pre-LLM paper and go through the citations -- especially for a tosser target like this where there are tens of thousands of low effort papers submitted -- and you're going to find a lot of sloppy citations that are hard to rationalize.
Secondly, the "hallucination" is that this particular snake-oil firm couldn't find given papers in many cases (they aren't foolish enough to think that means they were fabricated. But again, they're looking to sell a tool to rubes, so the conclusion is good enough), and in others that some of the author names are wrong. Eh.
nkrisc
Under what circumstances would a human mistakenly cite a paper which does not exist? I’m having difficulty imagining how someone could mistakenly do that.
jameshart
The issue here is that many of the ‘hallucinations’ this article cites aren’t ’papers which do not exist’. They are incorrect author attributions, publication dates, or titles.
the_af
LLMs are a force multiplier for this kind of error, though. It's not easy to hallucinate papers out of whole cloth, but LLMs can easily and confidently do it, quote paragraphs that don't exist, and do it tirelessly and at a pace unmatched by humans.
Humans can do all of the above but it costs them more, and they do it more slowly. LLMs generate spam at a much faster rate.
llm_nerd
>It's not easy to hallucinate papers out of whole cloth, but LLMs can easily and confidently do it, quote paragraphs that don't exist, and do it tirelessly and at a pace unmatched by humans.
But no one is claiming these papers were hallucinated whole, so I don't see how that's relevant. This study -- notably to sell an "AI detector", which is largely a laughable snake-oil field -- looked purely at the accuracy of citations[1] among a very large set of citations. Errors in papers are not remotely uncommon, and finding some errors is...exactly what one would expect. As the GP said, do the same study on pre-LLM papers and you'll find an enormous number of incorrect if not fabricated citations. Peer review has always been an illusion of auditing.
1 - Which is such a weird thing to sell an "AI detection" tool. Clearly it was mostly manual given that they somehow only managed to check a tiny subset of the papers, so in all likelihood was some guy going through citations and checking them on Google Search.
add-sub-mul-div
Quoting myself from just last night because this comes up every time and doesn't always need a new write-up.
> You also don't need gunpowder to kill someone with projectiles, but gunpowder changed things in important ways. All I ever see are the most specious knee-jerk defenses of AI that immediately fall apart.
llm_nerd
[flagged]
tokai
Yeah that is what their tool does.
senshan
As many pointed out, the purpose of peer review is not linting, but the assessment of novelty and subtle omissions.
What incentives can be set to discourage this negligence?
How about bounties? A bounty fund would be set up by the publisher, and each submission must come with a contribution to the fund. Then there would be bounties for gross negligence that could attract bounty hunters.
How about a wall of shame? Once negligence crosses a certain threshold, the name of the researcher and the paper would be put on a wall of shame for everyone to search and see?
skybrian
For the kinds of omissions described here, maybe the journal could do an automated citation check when the paper is submitted and bounce back any paper that has a problem with a day or two lag. This would be incentive for submitters to do their own lint check.
senshan
True if the citation has only a small typo or two. But if it is unrecognizable or even irrelevant, this is clearly bad (fraudulent?) research -- each citation has to be read and understood by the researcher and put in there only if absolutely necessary to support the paper.
There must be a price to pay for wasting other people's time (lives?).
TaupeRanger
It's going to be even worse than 50:
> Given that we've only scanned 300 out of 20,000 submissions, we estimate that we will find 100s of hallucinated papers in the coming days.
shusaku
20,000 submissions to a single conference? That is nuts
ghaff
Doesn't seem especially out of the norm for a large conference. Call it 10,000 attendees, which is large but not huge. Sure, not everyone attending puts in a session proposal, but others put in multiple. And many submit but, if not accepted, don't attend.
I can't quote exact numbers, but when I was on the conference committee for a maybe high-four-figures attendance conference, we certainly had many thousands of submissions.
zipy124
When academics are graded based on number of papers this is the result.
adestefan
The problem isn't only papers; it's that the world of academic computer science coalesced around conference submissions instead of journal submissions. This isn't new and was an issue 30 years ago when I was in grad school. It makes the work of conference organizers the little block holding up the entire system.
analog31
This is an interesting article along those lines...
https://www.theguardian.com/technology/2025/dec/06/ai-resear...
noodlesUK
It astonishes me that there would be so many cases of things like wrong authors. I began using a citation manager that extracted metadata automatically (zotero in my case) more than 15 years ago, and can’t imagine writing an academic paper without it or a similar tool.
How are the authors even submitting citations? Surely they could be required to send a .bib or similar file? It's then easy to do at least basic quality control, verifying that citations exist by looking up DOIs or similar.
I know it wouldn’t solve the human problem of relying on LLMs but I’m shocked we don’t even have this level of scrutiny.
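Even a crude lint pass over a submitted .bib would catch the worst of it. A sketch of what I mean, assuming the bibtexparser package and the public doi.org resolver (entries without a DOI would still need a title lookup or a human):

    # Sketch: verify that every DOI in a submitted .bib actually resolves.
    # Assumes bibtexparser (v1) and that doi.org answers registered DOIs
    # with a redirect and unknown ones with 404.
    import bibtexparser
    import requests

    def doi_resolves(doi: str) -> bool:
        resp = requests.head(
            f"https://doi.org/{doi}", allow_redirects=False, timeout=10
        )
        return 300 <= resp.status_code < 400

    with open("references.bib") as f:
        db = bibtexparser.load(f)

    for entry in db.entries:
        doi = entry.get("doi")
        if doi is None:
            print(f"{entry['ID']}: no DOI; needs a title lookup or manual check")
        elif not doi_resolves(doi):
            print(f"{entry['ID']}: DOI does not resolve: {doi}")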
pama
Maybe you haven’t yet carefully checked the correctness of automatic tools or of the associated metadata. Zotero is certainly not bug-free. Even authors themselves have mis-cited their own past work on occasion, and author lists have had errors that get revised upon resubmission or corrected in errata after publication. The DOI is indeed great, and if it is correct, I can still use the citation as a reader, but the (often abbreviated) lists of authors often have typos. In this case the error rate is not particularly high compared to random early review-level submissions I’ve seen many decades ago. Tools helped increase the number of citations and reduce the errors per citation, but I'm not sure they reduced the number of papers that have at least one error.
btisler
I’ve been working on tools that specifically address this problem, but from the level upstream of citation. They don’t check whether a citation exists — instead they measure whether the reasoning pathway leading to a citation is stable, coherent, and free of the entropy patterns that typically produce hallucinations.
The idea is simple:
• Bad citations aren’t the root cause.
• They are a late-stage symptom of a broken reasoning trajectory.
• If you detect the break early, the hallucinated citation never appears.
The tools I’ve built (and documented so anyone can use) do three things:
1. Measure interrogative structure — they check whether the questions driving the paper’s logic are well-formed and deterministic.
2. Track entropy drift in the argument itself — not the text output, but the structure of the reasoning.
3. Surface the exact step where the argument becomes inconsistent — which is usually before the fake citation shows up.
These instruments don’t replace peer review, and they don’t make judgments about culture or intent. They just expose structural instability in real time — the same instability that produces fabricated references.
If anyone here wants to experiment or adapt the approach, everything is published openly with instructions. It’s not a commercial project — just an attempt to stabilize reasoning in environments where speed and tool-use are outrunning verification.
Code and instrument details are in my CubeGeometryTest repo (the implementation behind ‘A Geometric Instrument for Measuring Interrogative Entropy in Language Systems’). https://github.com/btisler-DS/CubeGeometryTest This is still a developing process.
Isamu
Someone commented here that hallucination is what LLMs do; it's the designed mode of selecting statistically relevant model data built from the training set and then mashing it up for an output. The outcome is something that statistically resembles a real citation.
Creating a real citation is totally doable by a machine though, it is just selecting relevant text, looking up the title, authors, pages etc and putting that in canonical form. It’s just that LLMs are not currently doing the work we ask for, but instead something similar in form that may be good enough.
dclowd9901
To me, this is exactly what LLMs are good for. It would be exhausting double checking for valid citations in a research paper. Fuzzy comparison and rote lookup seem primed for usage with LLMs.
Writing academic papers is exactly the _wrong_ usage for LLMs. So here we have a clear cut case for their usage and a clear cut case for their avoidance.
skobes
If LLMs produce fake citations, why would we trust LLMs to check them?
venturecruelty
Because my boss said if I don't, I'm fired.
watwut
Because the risk is lower. They will give you suspicious citations, and you can manually check those for false positives. Even if some false citations slip through, it's still a net gain.
dawnerd
You shouldn’t need an LLM to check. It’s just a list of authors. I wouldn’t trust an LLM on this, and even if they were perfect, that’s a lot of resource use just to do something traditional code could do.
idiotsecant
Exactly, and there's nothing wrong with using LLMs in this same way as part of the writing process to locate sources (that you verify), do editing (that you check), etc. It's just peak stupidity and laziness to ask it to do the whole thing.
Surely this is gross professional misconduct? If one of my postdocs did this, they would be at risk of being fired. I would certainly never trust them again. If I let it get through, I should be at risk too.
As a reviewer, if I see the authors lie in this way, why should I trust anything else in the paper? The only ethical move is to reject immediately.
I acknowledge that mistakes and so on are common, but this is a different league of bad behaviour.