arXiv moving from Cornell servers to Google Cloud
167 comments
x_may · April 18, 2025
khuey
> If GCP is helping, stay tuned for a blog post from google some time around the completion of the migration with a title like "Reaffirming our commitment to science" or something similarly self affirming.
"Google pays to run an enormous intellectual resource in exchange for a self-congratulatory blogpost" seems like a perfectly acceptable outcome for society here.
stonogo
It wasn't when it happened to Usenet.
toomuchtodo
Frequent backups to the Internet Archive for rehydration when needed. RIP Dejanews. Hopefully we’ve learned from past experience.
mistrial9
mirrors, please
yumraj
> If GCP is helping, stay tuned for a blog post from google some time around the completion of the migration with a title like "Reaffirming our commitment to science" or something similarly self affirming.
This is an odd criticism. If a company is footing the bill, it can’t even talk about it to gain some publicity/good will?
nophunphil
Footing the bill for how long?
flakiness
https://info.arxiv.org/about/supporters.html
Our Supporters
...
Gold Sponsors
Google, Inc (USA)
TZubiri
> "Reaffirming our commitment to science" or something similarly self affirming.
While I understand that something is more genuine if done in secret, it doesn't stop being a real commitment to science just because you make a PR post about it.
If company X contributes to open source foundation Y, that's real and they get to claim clout; nobody cares about a post anyway.
tokai
Would love to see arXiv set up as a consortium of international academic libraries instead. Scientific publishing is where it is today because universities and scientific societies sold off or gave their journals to private enterprises. Letting Google in is a move in the wrong direction imo.
fc417fc802
Some sort of federated preprint protocol where anyone could stand up a node and clone the existing data would be ideal. The current centralized operator then becomes "just" a curator (and competing curators are easy to set up).
harywilke
This is something that started as far back as March 2023.
https://investinopen.org/blog/ioi-partners-with-arxiv-to-dev...
https://blog.arxiv.org/2023/06/12/arxiv-is-hiring-software/
kiproping
I recently read about arXiv, its history, and all the mini-dramas around it: https://www.wired.com/story/inside-arxiv-most-transformative....
I wonder if Ginsparg is finally retiring and relinquishing access.
chubot
Wow, this is a great article! (other archive link - https://archive.is/XVCi7 )
I didn't realize arXiv was started in 1991. And then I wondered why I had never heard of it while I was at Cornell from 1997-2001. Apparently it only assumed the arXiv name in 1999.
I like that it was a bunch of shell scripts :)
> Long before arXiv became critical infrastructure for scientific research, it was a collection of shell scripts running on Ginsparg’s NeXT machine.
Interesting connections:
> As an undergrad at Harvard, he was classmates with Bill Gates and Steve Ballmer; his older brother was a graduate student at Stanford studying with Terry Winograd, an AI pioneer.
On the move to the web in the early 90's:
> He also occasionally consulted with a programmer at the European Organization for Nuclear Research (CERN) named Tim Berners-Lee
And then there was a 1994 move to Perl, and a 2022 move to Python ...
Although my favorite/primary language is Python, I can't help but wonder if "rewrite in Python" is mainly a social issue, i.e. maybe they don't know how to hire Perl programmers and move to the cloud. I guess rewrites are often an incomplete transmission of knowledge about the codebase.
chubot
Another tidbit: https://arxiv.org/abs/1706.04188
> FAQ 1: Why did you create arXiv if journals already existed? Has it developed as you had expected?
> Answer: Conventional journals did not start coming online until the mid to late 1990s. I originally envisioned it as an expedient hack, a quick-and-dirty email-transponder written in csh to provide short-term access to electronic versions of preprints until the existing paper distribution system could catch up, within about three months.
So it was in csh on NeXT. Tim Berners-Lee also developed the web on NeXT!
LarsDu88
John Carmack and John Romero also developed the original Doom on NeXT
kevinventullo
If funding is an issue, I’m quite certain they could set up a nonprofit to support it. I would happily donate to keep arxiv around.
tough
It's not a nonprofit (it's run by Cornell, a university), but they do accept donations.
gcr
Cornell is currently in hiring freeze. These roles will not be filled.
Source: I applied to a Cornell-related lab in March. A week after submitting my application the role was rescinded and my contact emailed me explaining the situation.
wyclif
It looks like the links are broken to the software engineer and DevOps open roles.
quantumHazer
Is it related to policies from the US administration?
trop
From 3/17/25:
> Together with all of American higher education, Cornell is entering a time of significant financial uncertainty. The potential for deep cuts in federal research funding, as well as tax legislation affecting our endowment income, has now been added to existing concerns related to rapid growth and cost escalations. It is imperative that we navigate this challenging financial landscape with a shared understanding and common purpose, to continue to advance our mission, strengthen our academic community, and deepen our impact. [0]
mindslight
Do bears shit in the woods?
darkoob12
Fantastic. Now countries like Iran are going to be blocked. The Internet is not a public network anymore. It is owned mostly by American corporations, and they will decide what content to show and which groups of people can access it.
whygcp
That's true. I recently had to move a VM from GCP to Hetzner because GCP would silently drop all packets to some countries, Iran included. And a Stack Overflow question was the easiest way to learn about it, not the GCP docs.
Orygin
I looked into it recently, and it seems Iran is blocking GCP, not the other way around. Not sure if Google keeps a doc up to date with who blocks them.
sciurus
From my past experience, I can say that Google Cloud services (e.g. load balancers) by default blocked traffic from ITAR-sanctioned countries: not just blocking people in those countries from becoming GCP customers, but blocking them from accessing content hosted on GCP.
I don't know how that situation has evolved since I last used GCP.
whygcp
Why do you think it's Iran doing blocking?
That's not what Google says: https://support.google.com/a/answer/2891389?hl=en
dvrp
Hope they avoid vendor lock-in, though.
perihelions
Is this related to the federal government's vendetta against Cornell? And is there a risk Cornell becomes unable to operate arXiv because of this?
https://www.npr.org/2025/04/09/g-s1-59090/trump-officials-ha... ("Trump officials halt $1 billion in funding for Cornell, $790 million for Northwestern")
harywilke
No, they announced that they were starting this project back in June 2023 [0], though it is good to see Cornell suing the administration for the second time (the first was back in February). And as an alum who also had relatives up at Syracuse, I appreciate the snark from syracuse.com [1] calling Cornell a 'central NY college', heh.
[0] https://blog.arxiv.org/2023/06/
[1] https://www.syracuse.com/news/2025/04/central-ny-college-sue...
sppfly
I'm wondering who will pay Google's service fees. arXiv will still be free, right? That would cost more than just 88,000/year...
Daviey
If it means I can download papers without spoofing my user-agent, then I am happy!
fc417fc802
On the contrary, I'd expect Google to be much more proficient at blocking requests based on various factors. It wouldn't surprise me in the slightest if reCAPTCHA made an appearance.
By all rights, arXiv should be moving toward decentralization rather than being picked up by one of the largest centralized players.
andyjohnson0
Looks to me like their motivation was largely down to skillset issues and recruitment.
If this is motivated by the prospect of being menaced by the current US government then, while Google might be a safer home, arXiv is still vulnerable to having its funding disrupted by malicious actors.
It may be that it was time to retire the hardware that previously ran arXiv, and this is just another capex-to-opex decision of the kind so many tech companies are making.
I'd like to know if GCP is covering part of the bill? Or will Cornell be paying all of it? The new architecture smells of "[GCP] will pay/credit all of these new services if you agree to let one of our architects work with you". If GCP is helping, stay tuned for a blog post from google some time around the completion of the migration with a title like "Reaffirming our commitment to science" or something similarly self affirming.