Archivists work to save disappearing data.gov datasets
48 comments
·January 30, 2025
jl6
> The outlet reports that deleted datasets "disproportionately" come from environmental science agencies like the Department of Energy, National Oceanic and Atmospheric Administration (NOAA), and the Environmental Protection Agency (EPA).
Was there an EO targeting these areas?
_DeadFred_
Looks like the EPA is being targeted (Even though ninety-five percent of the funding going to EPA has not only been appropriated, but is locked in, legally obligated grant funding. The Constitution does not give the president a line item veto over Congress's spending decisions):
https://www.cbsnews.com/news/epa-employees-warned-of-immedia...
arcbyte
The President's ability to affect spending is definitely limited, and hasn't been exercised really since Reagan, but still exists.
Congress rarely makes spending money its goal; rather, it appropriates money to accomplish some goal. Which is to say that if Congress wants a bridge across a river and appropriates $10 billion to build it, the President is not obligated to spend $10 billion if 7 or 8 or 9 will do. In some cases, Congress does appropriate money toward causes and intends all the money to be spent in furtherance of some guiding principle, and in these cases all the money must be spent.
TheBlight
>"As a probationary/trial period employee, the agency has the right to immediately terminate you pursuant to 5 CFR § 315.804,"
Looks legit to me.
dangrossman
> "While 5 CFR 315 does permit immediate termination, it does not permit arbitrary termination. The termination must be related to unsatisfactory performance or conduct (section 804) or conditions arising before employment, which usually means something from your background investigation (section 805)..."
https://www.reddit.com/r/fednews/comments/1id7ud2/comment/m9...
johnneville
Politico reports that USDA landing pages regarding climate change were ordered to be deleted by a directive from the USDA's office of communications.
I think it is likely that orders to these other agencies follow this model. Many other datasets are being targeted via EO 14168, which has quite wide impacts but doesn't appear at first glance to apply to what I would expect to be a part of NOAA and EPA reports.
https://www.politico.com/news/2025/01/31/usda-climate-change...
chinathrow
These assholes.
bilbo0s
Don’t worry, it is a matter of great doctrinal import that all scientific datasets be replaced with datasets that have been properly refined in accordance with scripture. /s
Maybe this administration will get better over time?
uni_rule
Nah, the whole executive branch is getting Jack Welch'ed. Hopefully your tap water cleanliness regulations are strong on a state level.
pluto_modadic
don't they have to have done this /before/ it gets deleted?
debeloo
Is this normal when there's change in presidency?
meesles
From the article:
> Changes in presidential administrations have led to datasets being deleted in the past, either on purpose or by accident. When Biden took office, 1,000 datasets were deleted according to the Wayback Machine, via 404 Media's reporting.
derbOac
I think the question is the nature of the losses in the two cases, how transparent the circumstances around them are, and who exactly is making the decisions about specific datasets.
Time will tell but loss of public datasets is probably not usually good in general.
doener
Yes, the context in which this happens could provide clues as to the nature of these losses: https://news.ycombinator.com/item?id=42898165
animal_spirits
This is not a direct quote; the actual quote from the article is:
> But archivists who have been working on analyzing the deletions and archiving the data it held say that while some of the deletions are surely malicious information scrubbing, some are likely routine artifacts of an administration change, and they are working to determine which is which. For example, in the days after Joe Biden was inaugurated, data.gov showed about 1,000 datasets being deleted as compared to a day before his inauguration, according to the Wayback Machine.
sunk1st
I don’t see a list of the datasets that have gone missing. Is there a list?
mistrial9
[flagged]
NortySpock
And you could also run your own archive bot (x86 only). I've got one running in a docker container, it downloads a webpage and auto-uploads it to archive.org
https://tracker.archiveteam.org/
Edit to add:
docker_compose.yml example:
services:
  archiveteam:
    image: atdr.meo.ws/archiveteam/warrior-dockerfile
    ports:
      - '8101:8001'
    mem_limit: 4G
    cpus: 3
    dns:
      - 9.9.9.10
      - 8.8.8.8
    labels:
      - com.centurylinklabs.watchtower.enable=true
    container_name: archiveteam-warrior
    environment:
      - DOWNLOADER=asdf # Change this to your nickname
      - SELECTED_PROJECT=auto # Change this to your project of preference or let the archiveteam decide with 'auto'
      - CONCURRENT_ITEMS=6 # Change this to the amount of concurrent download threads you can handle
  watchtower:
    command: '--label-enable --include-restarting --cleanup --interval 3600'
    cpu_shares: 128
    mem_limit: 1G
    cpus: 1
    image: containrrr/watchtower
    volumes:
      - '/var/run/docker.sock:/var/run/docker.sock'
    container_name: watchtower
WhereIsTheTruth
data, today, is even more precious than we ever imagined
start to hoard now, and train later
-- any random AI chat:
prompt: find anomalies in data.gov datasets
Example Findings (Hypothetical):
A 2023 dataset shows a 15% unemployment rate in Nevada while neighboring states average 5%. Without a clear reason (e.g., casino industry collapse), this could be an anomaly.
A typo in nonfarm payrolls lists 500,000 jobs added instead of 50,000 in a month, creating a false spike.
more (it's kinda fun to play with): https://0.0g.gg/?ec93e411b83cdf87#AxonqVHwhsBo827yh5CHx5NnBP...
smrtinsert
Are datasets mirrored anywhere where the govt doesn't automatically have a take down authority? If not there should be a mirroring effort.
lhl
There's been a lot of discussion in https://www.reddit.com/r/DataHoarder/
Here's documentation on independent backup efforts of various government websites: https://www.reddit.com/r/DataHoarder/comments/1ifalwe/us_gov...
Also here: https://www.reddit.com/r/DataHoarder/comments/1idj6dm/all_us...
Apparently, much of the data has been backed up here: https://eotarchive.org/
Here's also a discussion on whether the Internet Archive is sufficiently backed up/decentralized (it is not): https://www.reddit.com/r/DataHoarder/comments/1if32iq/does_i...
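For anyone keeping an independent copy, one way to check that two mirrors actually agree is to compare per-file checksums. This is only a rough sketch under assumptions: the directory layout, the helper names (sha256_file, build_manifest, mismatches), and the sample files are all made up for illustration, and none of the backup efforts linked above are known to work this way.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict, List


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large datasets do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> Dict[str, str]:
    """Map each file's path (relative to the mirror root) to its SHA-256."""
    return {
        p.relative_to(root).as_posix(): sha256_file(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def mismatches(a: Dict[str, str], b: Dict[str, str]) -> List[str]:
    """Paths that are missing from one manifest or whose hashes differ."""
    return sorted(k for k in a.keys() | b.keys() if a.get(k) != b.get(k))


# Hypothetical demo: two tiny "mirrors" that disagree on one file.
with tempfile.TemporaryDirectory() as tmp:
    mirror_a = Path(tmp) / "mirror_a"
    mirror_b = Path(tmp) / "mirror_b"
    for mirror in (mirror_a, mirror_b):
        mirror.mkdir()
        (mirror / "a.csv").write_bytes(b"id,value\n1,2\n")
    (mirror_a / "b.csv").write_bytes(b"id,value\n3,4\n")
    (mirror_b / "b.csv").write_bytes(b"id,value\n3,5\n")
    differing = mismatches(build_manifest(mirror_a), build_manifest(mirror_b))

print(differing)
```

Publishing the manifest alongside a mirror would let other people verify their copies without downloading everything twice.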
notavalleyman
I read, in past days, that the man who ordered the construction of the nearly infinite Wall of China was that First Emperor, Shih Huang Ti, who likewise ordered the burning of all the books before him. That the two gigantic operations - the five or six hundred leagues of stone to oppose the barbarians, the rigorous abolition of history, that is of the past - issued from one person and were in a certain sense his attributes, inexplicably satisfied me and, at the same time, disturbed me.
- Borges
o11c
For reference (since it both uses a non-Pinyin transcription and adds/drops characters), this refers to https://en.wikipedia.org/wiki/Qin_Shi_Huang
quuxplusone
Yes; and for further reference, the non-Pinyin transliteration is https://en.wikipedia.org/wiki/Wade%E2%80%93Giles and the "add/drop" is that instead of the title 秦始皇 Qín Shǐ Huáng (lit. "first Qin emperor") Borges is using the title 始皇帝 Shǐ Huángdì (lit. "first emperor").
https://en.wikipedia.org/wiki/Shi%20Huangdi (redirects to Qin Shi Huang)
belter
“Now I will tell you the answer to my question. It is this. The Party seeks power entirely for its own sake. We are not interested in the good of others; we are interested solely in power, pure power. What pure power means you will understand presently. We are different from the oligarchies of the past in that we know what we are doing. All the others, even those who resembled ourselves, were cowards and hypocrites. The German Nazis and the Russian Communists came very close to us in their methods, but they never had the courage to recognize their own motives. They pretended, perhaps they even believed, that they had seized power unwillingly and for a limited time, and that just around the corner there lay a paradise where human beings would be free and equal. We are not like that. We know that no one ever seizes power with the intention of relinquishing it. Power is not a means; it is an end. One does not establish a dictatorship in order to safeguard a revolution; one makes the revolution in order to establish the dictatorship. The object of persecution is persecution. The object of torture is torture. The object of power is power. Now you begin to understand me.”
― George Orwell, 1984
rightbyte
I think quotes like this should be attributed to the character (the secret police commissaire?) not the author?
gopher_space
It’d be a point against in a formal debate, but in casual conversation it’s safe to assume your audience passed junior high and is familiar with the work.
matwood
Including 1984 in the cite makes it clear IMO.
I referenced 1984 in a comment this morning related to websites disappearing. We’ve (always|never) been at war with Eurasia…
itronitron
can't be "First Emperor" if it's known that someone else was in charge before you
o11c
Eh, the history doesn't support that meme; only the details and not the existence was allegedly burned (if it even happened at all). It's pretty well established that there were 7 major states (Qin, Qi, Chu, Yan, and the 3 Jin (Han, Wei, Zhao); notably, not including the official Zhou king) at the end of the Warring States period, and the Qin conquered the other 6 within a decade.
(Fun fact: the lesser-known state of Wey actually lasted longer than Wei due to not being worth conquering. In modern Chinese they are pronounced the same, but at the time they weren't.)
zamadatix
Sure, they just used a different title.
exe34
First week we had mass deportation, second week we've heard of the building of concentration camps for undesirables, and now the modern version of book burning. There's something different about this republican government.
southernplaces7
>second week we've heard of the building of concentration camps for undesirables
Mind posting a reference for such a declaration?
johnneville
I assume they are referring to this: https://www.whitehouse.gov/presidential-actions/2025/01/expa...
southernplaces7
I assumed they were referring to something like this but wanted to be sure. It's also way the fuck off base to call this a concentration camp for undesirables with all the insinuations of Nazism that come with that. This kind of Trump = Hitler hyperbole at the drop of a hat hazes all the very real and heavy criticisms that can be made of this administration.
Gitmo being used as a transit camp for deported illegal immigrants is not at all like, say, Sachsenhausen or Dachau being used to illegally dump German citizens and torture them there indefinitely because they criticized the ruling regime. When Trump starts trying to dump opposing intellectuals, Hollywood producers and "wayward" reporters or writers into Gitmo for indefinite detention, then we can make some firmer claims of literal dictatorship.
If anything, the use of Gitmo under Bush II, when actual U.S. citizens were held there indefinitely under terrorism charges, was closer to the mark than this now.
exe34
as posted by two others - hotel Gitmo on sands is expanding.
matwood
First week they also declared banned books a ‘hoax’. So maybe getting rid of existing information takes longer than a week.
rhinoceraptor
I heard there was this party that had the same weird hand gesture that President Musk did a week or two ago...
tdeck
Didn't you get the memo? Trump won 49.9% of the popular vote. That means we all need to pretend this is normal and OK now.
I’ve been archiving data.gov for over a year now and it’s not unusual to see large fluctuations on the order of hundreds or thousands of datasets. I’ve never bothered trying to figure out what exactly is changing, maybe I should build a tool for that…
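A minimal sketch of what such a tool could look like, assuming each archive run is saved as a mapping from dataset identifier to a last-modified stamp. The snapshot format, the name diff_snapshots, and the sample identifiers are hypothetical; how the snapshots get collected in the first place is left out.

```python
from typing import Dict, List, NamedTuple


class SnapshotDiff(NamedTuple):
    removed: List[str]   # present in the old snapshot only
    added: List[str]     # present in the new snapshot only
    modified: List[str]  # present in both, but with a different stamp


def diff_snapshots(old: Dict[str, str], new: Dict[str, str]) -> SnapshotDiff:
    """Compare two snapshots that map dataset id -> last-modified stamp."""
    removed = sorted(old.keys() - new.keys())
    added = sorted(new.keys() - old.keys())
    modified = sorted(k for k in old.keys() & new.keys() if old[k] != new[k])
    return SnapshotDiff(removed, added, modified)


# Made-up sample data, not real data.gov identifiers.
before = {
    "air-quality-2023": "2024-11-02",
    "coastal-buoys": "2024-12-15",
    "grid-outages": "2025-01-10",
}
after = {
    "coastal-buoys": "2025-01-28",
    "grid-outages": "2025-01-10",
    "school-lunch-counts": "2025-01-29",
}

result = diff_snapshots(before, after)
print("removed: ", result.removed)
print("added:   ", result.added)
print("modified:", result.modified)
```

Keeping the removed list per day would also answer the "is there a list?" question upthread, and would show whether a drop is a real deletion or just routine churn.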