MIT asks arXiv to take down preprint of paper on AI and scientific discovery

92 comments

·May 16, 2025

Aurornis

> Our understanding is that only authors of papers appearing on arXiv can submit withdrawal requests. We have directed the author to submit such a request, but to date, the author has not done so.

Between this and the subtle reference to “former second-year PhD student” it makes sense that they’d have to make a public statement.

They do a good job of toeing the required line of privacy while also giving enough information to see what’s going on.

I wonder if the author thought they could leave the paper up and ride it into a new position while telling a story about voluntarily choosing to leave MIT. They probably didn’t expect MIT to make a public statement about the paper and turn it into a far bigger news story than it would have been if the author quietly retracted it.

JohnKemeny

Seeing as how the author has signed in with an account whose email address is username@mit.edu, MIT could just take over the account.

Aurornis

That's not how it works in the real world. That would be a fraudulent request and I suspect they'd invite legal trouble by impersonating someone else to access a computer system.

Furthermore, if the author could demonstrate to arXiv that the request was fraudulent, the paper would be reinstated. The narrative would also switch to people being angry at MIT for impersonating a student to do something.

NoMoreNicksLeft

>That's not how it works in the real world. That would be a fraudulent request and I suspect they'd invite legal trouble by impersonating someone else to access a computer system.

Emails are not people. You can impersonate a person, but you can't impersonate an email. If I own a company and I issue the email dick.less@privateequity.com but then have to fire him... using this email address to transfer company assets back to someone who can be responsible for them isn't fraud (for that purpose, at least). How is this not the same issue?

kragen

That kind of thing might lead to arXiv not accepting any more papers from MIT, or at least any more takedowns.

intoamplitudes

First impressions:

1. The data in most of the plots (see the appendix) look fake. Real life data does not look that clean.

2. In May of 2022, 6 months before ChatGPT put genAI in the spotlight, how does a second-year PhD student manage to convince a large materials lab firm to conduct an experiment with over 1,000 of its employees? What was the model used? It only says GANs+diffusion. Most of the technical details are just high-level general explanations of what these concepts are, nothing specific.

"Following a short pilot program, the lab began a large-scale rollout of the model in May of 2022." Anyone who has worked at a large company knows -- this just does not happen.

btrettel

On point 2, the study being apparently impossible to conduct as described was also a problem for Michael LaCour. Seems like an underappreciated fraud-detection heuristic.

https://en.wikipedia.org/wiki/When_Contact_Changes_Minds

https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&d...

> As we examined the study’s data in planning our own studies, two features surprised us: voters’ survey responses exhibit much higher test-retest reliabilities than we have observed in any other panel survey data, and the response and reinterview rates of the panel survey were significantly higher than we expected.

> The firm also denied having the capabilities to perform many aspects of the recruitment procedures described in LaCour and Green (2014).

raphman

Oh, he also claimed that he got IRB approval from "MIT’s Committee on the Use of Humans as Experimental Subjects under ID E-5842. JEL Codes: O31, O32, O33, J24, L65." before conducting this research, i.e., at a time when he wasn't even a PhD student.

raphman

FWIW, in the q&a after a talk, he claims that it was a GNN (graph neural network), not a GAN.

(In this q&a, the audience does not really question the validity of the research.)

https://doi.org/10.52843/cassyni.n74lq7

mncharity

Wayback of the Sloan School seminar page shows him doing one on February 24, 2025. I wonder how that went.

I miss Google Search's cache. As with the seminar, several other hits on MIT pages have been removed. I'm reminded of a PBS News Hour story, on free fusion energy from water in your basement (yes, really), which was memory holed shortly after. The next-ish night they seemed rather put out, protesting they had verified the story... with "a scientist".

That cassyni talk link... I've seen a lot of MIT talks (a favorite mind candy), and though Sloan was underrepresented, that looked... more than a little odd. MIT Q&A norms are diverse, from the subtle question you won't appreciate if you haven't already spotted the fatal flaw, to leaving the speaker in tears. I wonder if there's a seminar tape.

I agree with point 1, at least superficially. But re: point 2, there are a lot of companies with close connections to MIT (and other big institutions like Stanford) that are interested in deploying cutting edge research experiments, especially if they already have established ties with the lab/PI

pixl97

>The data in most of the plots (see the appendix) look fake

Could a Benford's Law analysis apply here to detect that?

constantcrying

How would you apply it, why would it be applicable?
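For reference, a first-digit Benford check can be sketched in a few lines. This is a minimal illustration, not an analysis of the paper: the function names are made up for this example, and the test only makes sense for data spanning several orders of magnitude, which bounded quantities such as time shares generally do not.

```python
import math
from collections import Counter


def leading_digit(x: float) -> int:
    """Return the first significant digit (1-9) of a nonzero number."""
    # Scientific notation puts the first significant digit at index 0,
    # e.g. 0.3 -> "3.000000000000000e-01".
    return int(f"{abs(x):.15e}"[0])


def benford_chi_square(values) -> float:
    """Chi-square statistic of observed first digits against Benford's Law."""
    digits = [leading_digit(v) for v in values if v != 0]  # zeros have no leading digit
    n = len(digits)
    if n == 0:
        raise ValueError("need at least one nonzero value")
    counts = Counter(digits)
    stat = 0.0
    for d in range(1, 10):
        expected = n * math.log10(1 + 1 / d)  # Benford: P(d) = log10(1 + 1/d)
        stat += (counts.get(d, 0) - expected) ** 2 / expected
    return stat
```

With nine digit classes there are 8 degrees of freedom, so a statistic above roughly 15.51 would reject Benford conformity at the 5% level. A rejection alone says little; plenty of honest datasets fail the test simply because their range is narrow.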

constantcrying

A month-by-month record of scientists' time spent on different tasks is on its face absurd. The proposed methodology, automatic textual analysis of scientists' written records, giving you a year's worth of a near-constant time split pre-AI is totally unbelievable.

The data quality for that would need to be unimaginably high.

mzs

  % gunzip -c arXiv-2412.17866v1.tar.gz | tar xOf - main.tex | grep '\bI have\b'
  To summarize, I have established three facts. First, AI substantially increases the average rate of materials discovery. Second,  it  disproportionately benefits researchers with high initial productivity. Third, this heterogeneity is driven almost entirely  by differences in judgment. To understand the mechanisms behind these results, I investigate the dynamics of human-AI collaboration in science.
          \item Compared to other methods I have used, the AI tool generates potential materials that are more likely to possess desirable properties.
          \item The AI tool generates potential materials with physical structures that are more distinct than those produced by other methods I have used.
  % gunzip -c arXiv-2412.17866v1.tar.gz | tar xOf  - main.tex | grep '\b I \b' | wc
      25    1858   12791
  %

rafram

Not sure what you’re trying to say.

kccqzy

Maybe the point is that it is rare for a paper to have the pronoun "I" so many times. Usually the pronoun "we" is used even when there is a single author.
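The same comparison can be sketched in Python, counting occurrences rather than matching lines as `grep | wc` does. This is only a rough illustration: `count_pronouns` is a hypothetical helper, and a bare `\bI\b` will also match things like Roman numerals or "I-V curve" in LaTeX source, so the counts are an upper bound.

```python
import re


def count_pronouns(text: str) -> dict[str, int]:
    """Count standalone 'I' and 'we'/'We' tokens in a block of text."""
    singular = len(re.findall(r"\bI\b", text))    # does not match the I in "AI" or "In"
    plural = len(re.findall(r"\b[Ww]e\b", text))  # does not match "were" or "web"
    return {"I": singular, "we": plural}
```

Running it over `main.tex` from the arXiv source tarball and comparing the two counts would show how strongly the paper favors the singular.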

tokai

Impressively, the paper seems to have been cited 50 times already. I don't mind much whether it's taken down or not, but with the old-guard publishers you can at least get a retraction notice or a comment about the issues with a paper embedded in the publication. If you find this paper cited somewhere and follow it to the source at arXiv, you will never be made aware of the disputes surrounding the research. Preprint servers have somewhat of a weakness here.

forgotpwd16

A weakness that goes hand-in-hand with the lack of peer review. Trusting the study means trusting the author or reviewing the paper yourself. If a withdrawal happens, either the author comments on why they did it[0] or, as before, you have to search for the reason yourself.

[0] E.g. arxiv/0812.0848: "This paper has been withdrawn by the author due to a crucial definition error of Triebel space".

pvg

The paper had an HN thread a few months ago

https://news.ycombinator.com/item?id=42115310

tsurba

Nice that someone realized then already it sounds sus https://news.ycombinator.com/item?id=42128532

mmooss

That's not a signal: There always are comments saying the research is suspect.

dang

Thanks! Macroexpanded:

Artificial Intelligence, Scientific Discovery, and Product Innovation [pdf] - https://news.ycombinator.com/item?id=42115310 - Nov 2024 (47 comments)

ipsum2

MIT's article is quite scant on details. WSJ has more information, but still no specifics: https://www.wsj.com/tech/ai/mit-says-it-no-longer-stands-beh...

> The paper was championed by MIT economists Daron Acemoglu, who won the 2024 economics Nobel, and David Autor. The two said they were approached in January by a computer scientist with experience in materials science who questioned how the technology worked, and how a lab that he wasn’t aware of had experienced gains in innovation. Unable to resolve those concerns, they brought it to the attention of MIT, which began conducting a review.

neonate

https://archive.ph/r63jR

timewizard

[flagged]

jldugger

To quote someone elsewhere: "Okay, time to pack it up boys! Someone found the cheatcode to defeating economic research."

in9

https://en.wikipedia.org/wiki/List_of_Nobel_Memorial_Prize_l...

throwup238

Nobel _Memorial_ Prize in Economic Sciences

That’s not a Nobel Prize.

colechristensen

This is inaccurate pedantry. It is commonly referred to as the Nobel Prize in economics and is administered by the same foundation; the funding for it is a gift to the foundation from the Swedish central bank instead of being sourced from Nobel's estate.

Keyframe

yeah, but also "Nobel accuses the awarding institution of misusing his family's name, and states that no member of the Nobel family has ever had the intention of establishing a prize in economics." It's hijacking of the brand.

dougb5

This makes me think about the credibility of single-author vs. multi-author papers in different disciplines. In computer science, a paper is seen as suspicious if there's just one author (at least nowadays). But in economics it seems much more common. Can an economist explain this for me (or perhaps a paper written by multiple economists?)

type0

> But in economics it seems much more common

non-scientific studies can't be replicated

als0

What’s that got to do with the number of authors?

forgotpwd16

A study that cannot be replicated is a study that cannot be falsified. Authors don't mind putting their names on them because there's no accountability to be held and it is purely a net positive (one more publication and additional citations).

coderintherye

Nice Twitter thread from Nov '24 analyzing the paper: https://x.com/Robert_Palgrave/status/1856273405965693430

raphman

Thanks. On Twitter, Ethan Mollick seems to imply that Robert Palgrave might be the scientist that triggered the investigation.

andy99

Is there a way for non-Twitter users to read these?

madars

Yeah, replace x.com with xcancel.com or nitter.poast.org, e.g., https://xcancel.com/Robert_Palgrave/status/18562734059656934...

drdeca

Yes, https://github.com/zedeus/nitter/wiki/Instances
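The host swap described above can be sketched as a small helper. This is an illustration only: `to_mirror` is a hypothetical name, the `twitter.com` hosts are an added assumption, and whether a given mirror is up at any moment is not something the code can know.

```python
from urllib.parse import urlsplit, urlunsplit

# Hosts to rewrite; the twitter.com entries are an assumption for older links.
SOURCE_HOSTS = {"x.com", "www.x.com", "twitter.com", "www.twitter.com"}


def to_mirror(url: str, mirror_host: str = "xcancel.com") -> str:
    """Swap the host of an x.com link for a mirror, keeping path and query."""
    parts = urlsplit(url)
    if parts.netloc.lower() not in SOURCE_HOSTS:
        return url  # leave unrelated links untouched
    return urlunsplit(parts._replace(netloc=mirror_host))
```

For example, `to_mirror("https://x.com/Robert_Palgrave/status/1856273405965693430")` yields the same path on `xcancel.com`, and passing `mirror_host="nitter.poast.org"` targets the other instance mentioned.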

12_throw_away

Reading the paper (which is still up) ... the "AI" (sigh) tool described there would not have been particularly novel or unusual, even if the research was conducted several years ago. ML + inverse design for materials has been used for decades.

ayhanfuat

> by a former second-year PhD student

Seems pretty serious if they kicked him out.

dhosek

I always wonder what happens with these high-profile transgressors. I once created a Google News alert for a high-level Apple employee who went to jail for some criminal act at Apple and never saw any indication of him again. I’m guessing the paper author’s career in economics is likely over (he’d previously worked at the NY Fed before starting at MIT) and I wonder what he’ll end up doing: will he be able to find some sort of white-collar work in the future or will he be condemned to retail or food-service employment.

Workaccount2

There are a gazillion small companies out there that hire white collar workers with only a rudimentary background check (are they a felon) and an interview that is more a vibe check than anything.

He probably will never be someone of significance, but he also will probably be able to have a standard middle class life.

Aurornis

The MIT announcement says they asked him to retract the paper but he wouldn't, which led to them making the public statement about the paper.

They may have thought they could jump into an industry job, including the paper and all of its good press coverage on their resume. Only the author can retract an arXiv paper, not their academic institution. It wouldn't be hard to come up with a story that they decided to leave the academic world and go into industry early.

MIT coming out and calling for the paper's retraction certainly hampers that plan. They could leave it up and hope that some future employer is so enamored with their resume that nobody does a Google search about it, but eventually one of their coworkers is going to notice.

NitpickLawyer

> will he be able to find some sort of white-collar work in the future or will he be condemned to retail or food-service employment.

Lay low for a year, work on some start-up-ish looking project, then use his middle name to get hired at one of the many AI startups? (only half joking)...

nonameiguess

Stephen Glass, the dude who fabricated stories for New Republic back in the late 90s, has attempted at least twice to become an attorney after going to law school. Both New York and California denied his bar applications on the grounds that he failed the standards for moral character. He nonetheless seems to be employed by a law firm, but not as a practicing attorney.

Loughla

White collar encompasses a lot, outside of economics or finance.

Also, there are companies who will see that win-at-any-cost mentality as a positive trait.

I'm betting whoever it is, is okay now.

hooloovoo_zoo

I don't think arXiv should take it down even if it is fraud. ArXiv is more about being a permanent store than a quality judge.

forgotpwd16

Actually, arXiv is moderated, and if policies are violated they may even withdraw a paper themselves, if it wasn't declined for publication in the first place. Regarding policies, it's mentioned that a "submission may be declined if the moderators determine it lacks originality, novelty, significance, and/or contains falsified, plagiarized content or serious misrepresentations of data, affiliation, or content."

modzu

store of what? fake scientific articles or genuine preprints? if the latter clean this crap up

hooloovoo_zoo

Judging quality/fraud is the role of a journal/conference, not arXiv. If a paper gets rejected does it come off arXiv? No. If a paper is never submitted does it come off? No. If a paper is retracted, does it come off? No. ArXiv should avoid making as many subjective determinations as possible.

andy99

I agree with this, it's actually a good reminder not to trust a preprint server. Arxiv already has an inappropriate air of validity, moderation will only make it worse.

(Incidentally, I don't think misplaced trust in preprints is much of an academic issue, people that are experts in their field can easily judge quality for themselves. It's laypeople taking them at face value that's the problem.)

pbhjpbhj

>Earlier this year, the COD conducted a confidential internal review based upon allegations it received regarding certain aspects of this paper. While student privacy laws and MIT policy prohibit the disclosure of the outcome of this review, we are writing to inform you that MIT has no confidence in the provenance, reliability or validity of the data and has no confidence in the veracity of the research contained in the paper. Based upon this finding, we also believe that the inclusion of this paper in arXiv may violate arXiv’s Code of Conduct.

It sounds like "we don't like it and won't tell you why, we're hiding behind MIT policy and vague notions of privacy".

MIT should just demonstrate in a paper what the shortcomings are and print it, adding it to the citation tree of the original.

Looking very briefly at the paper and speculating wildly, I could imagine that the company who were subject of it - or their staff - might not appreciate it and have put pressure on MIT??

Solid amount of Streisand Effect going on here -- lots of attention has been brought to the paper (and that is everything after all!).

duskwuff

> It sounds like "we don't like it and won't tell you why, we're hiding behind MIT policy and vague notions of privacy".

FERPA is federal law. It is quite likely that MIT is legally bound to not release some pieces of evidence which are crucial in this case (hypothetically, for example: that the student's educational record is inconsistent with claims made in the paper).

shkkmo

> Looking very briefly at the paper and speculating wildly, I could imagine that the company who were subject of it - or their staff - might not appreciate it and have put pressure on MIT??

The apparent issue is that the data appears to have been entirely fabricated and is a lie. The author appears to simply be a fraud

shanemhansen

"I don't endorse this paper. Therefore you should take it down. I won't tell you why. Trust me bro."

Whether MIT is right or wrong, the arrogance displayed is staggering. The only thing more shocking is that obviously this behavior works for them and they are used to people obeying them without question because they are MIT.

levocardia

More like "Because it involves a student, FERPA won't allow us to legally disclose what's going on, but we kicked the student out so you should take the hint and realize what was going on"

morning-coffee

The arrogance of MIT is staggering? I would say the arrogance of the paper's author is 10x as staggering, if what Robert Palgrave has suggested is true.

I think MIT is trying to protect its reputation as a would-be place of fraud-free research, unlike Harvard.

willb_ml

I'm sure this works for other institutions also, not just MIT. Maybe the evidence they have for the request requires disclosing data that violates FERPA, which they obviously aren't allowed to do.

ricksunny

It is indeed a disappointing posture the institute is putting on full display here.

gowld

Bad OP Title

Better title:

MIT disavows heavily-discussed economics preprint paper about Artificial Intelligence and Scientific Discovery.

dang

I've attempted to put a neutral title at the top of this page. If someone can come up with a better (i.e. more accurate and neutral) one, we can change it again.

(Since press release titles about negative news tend to studiously avoid saying anything, we tend to classify them in the "misleading" bucket of https://news.ycombinator.com/newsguidelines.html, which justifies rewriting them.)

Twirrim

That would be contrary to HN's guidelines: "please use the original title, unless it is misleading or linkbait; don't editorialize."

dylan604

[flagged]

jdhwosnhw

It’s the title of the original article…