
Internet Archive is now a federal depository library

bmurray7jhu

Unneeded materials from other depository libraries can now be transferred to the Internet Archive. Under 44 USC § 1912, depository libraries may dispose of outdated material, but must first offer to transfer it to nearby depository institutions.

dylan604

What is "outdated material" for a library? Isn't a library's archives precisely where you go to find "outdated material"?

chpatrick

Stuff like the printed tax code of 1965 or a Borland Pascal manual from 1992. Once you have it digitized, it's a waste of space for libraries to keep a physical copy because basically no one needs it.

Jarwain

In other words, a depository is cold storage

ocdtrekkie

Libraries have an entire concept of weeding, and numerous criteria for doing so: https://en.m.wikipedia.org/wiki/Weeding_(library)

Libraries are constantly bringing in new materials and very few are capable of constantly increasing in size to match. I believe national libraries like the Library of Congress tend not to weed, but they do have to offload material to satellite locations and storage facilities.

lucb1e

I'm having trouble finding what this means. Does IA now have new obligations, or get new information, or something else, or all of the above?

The submission says:

> These records account for “millions and millions of pages” that can take up entire floors of public libraries, Kahle said. San Diego’s public library gave up its federal depository status in 2020 because its government documents took up so much space and often went unused. [...] The GPO [...] has ramped up efforts to digitize the Federal Depository Library Program.

Does IA now have to store floors upon floors of paper copies of information, at least until it gets digitized? Or are they now merely obliged to host the digital materials insofar as they already exist? That sounds like what they are doing already for the whole web, and also apparently since 2022 when they started "Democracy’s Library, a free online compendium of government research and publications", just that now they're legally obliged to do this or something?

What I find on doi.gov[1] is "The mission of Federal depository libraries is to provide local, free access to information from the Federal government" and nothing really further on what this concretely means. Sounds like just an obligation though?

What I find on gpo.gov[2] is "The Federal Depository Library Program [ensures] that the American public has access to Government information in depository libraries". Could mean anything. The program ensures that, but let's assume that means the designated libraries ensure that, so then do these libraries get extra info that the public doesn't get (but in order to disseminate them to the public)? Makes no sense either

The GPO page and the submission also say that "Members of Congress may designate up to two qualified libraries." Did they get picked and now it's IA's obligation, or did IA ask for this? What do they get out of it?

[1] https://www.doi.gov/library/collections/federal-documents

[2] https://www.gpo.gov/how-to-work-with-us/agency/services-for-...

abracadaniel

As I understand it, it’s voluntary and like the government document version of the Twitter firehose. Direct access to all published government documents as they are created.

braiamp

lucb1e

I quoted from there, so yes

JumpCrisscross

"California Sen. Alex Padilla made the designation in a letter sent Thursday to the Government Publishing Office"

What does this mean? U.S. Senators can unilaterally designate federal depositories?

ssalka

It sounds like it was at the request of IA:

> "...in response to the enclosed letter I received from the Founder and Digital Librarian of the Internet Archive, Mr. Brewster Kahle, I am designating the Internet Archive as a federal depository library in California."

Which seems a lot more agreeable than unilateral designation (which is also how I initially read this).

MPSimmons

[flagged]

layman51

They already remove “inconvenient” webpages on the Wayback Machine if someone asks nicely enough. If I remember correctly, if you use it to save a software company’s documentation pages or evidence of something embarrassing like a potential data breach, they could remove it if the company asks. I think Oracle might have done something like this before.

tech234a

A community-maintained list collecting examples of such exclusions: https://wiki.archiveteam.org/index.php/List_of_websites_excl...

genter

Can't say I blame them, I wouldn't want to go up against Oracle's lawyers either.

01HNNWZ0MV43FF

If anyone reading knows an easy way to download and mirror IA pages please make it easier to find. A bot told me they offer downloads of the underlying WARC files but I could not find it

duskwuff

> A bot told me they offer downloads of the underlying WARC files but I could not find it

The "bot" is wrong. Most of the crawl data used by the Internet Archive, particularly the Alexa crawls, isn't publicly accessible. (This is because some of it includes archived pages which have since been suppressed by the site owner - removing those pages from the archived crawl data isn't practical.)

https://archive.org/details/alexacrawls

Common Crawl data is public, but less comprehensive than IA - https://commoncrawl.org/

fancy_pantser

There are utilities to help, waybackpack comes to mind, but I haven't looked in a while. https://github.com/jsvine/waybackpack

pabs3

I used wayback-machine-downloader, I think you need one of the forks to make it work though.

https://github.com/hartator/wayback-machine-downloader
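For anyone who would rather not depend on a third-party tool, here is a rough sketch of the general approach such downloaders take, as far as I understand it: ask the Wayback Machine's CDX index which captures exist for a URL, then build a raw-capture URL for each timestamp. The function names (`cdx_query_url`, `parse_cdx_rows`, `snapshot_url`, `list_snapshots`) are my own hypothetical helpers, not part of waybackpack or wayback-machine-downloader, and this fetches individual archived pages, not the underlying WARC files.

```python
import json
import urllib.parse
import urllib.request

# Public CDX index endpoint of the Wayback Machine.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def cdx_query_url(page_url, limit=5):
    """Build a CDX query asking for up to `limit` successful captures as JSON."""
    params = {
        "url": page_url,
        "output": "json",
        "limit": str(limit),
        "filter": "statuscode:200",
    }
    return CDX_ENDPOINT + "?" + urllib.parse.urlencode(params)


def parse_cdx_rows(rows):
    """Turn CDX JSON output (first row is the header) into (timestamp, url) pairs."""
    if not rows:
        return []
    header, *records = rows
    ts_col = header.index("timestamp")
    url_col = header.index("original")
    return [(record[ts_col], record[url_col]) for record in records]


def snapshot_url(timestamp, page_url):
    """URL of one capture; the `id_` suffix requests the page without the Wayback toolbar."""
    return f"https://web.archive.org/web/{timestamp}id_/{page_url}"


def list_snapshots(page_url, limit=5):
    """Query the CDX index over the network and return (timestamp, url) pairs."""
    with urllib.request.urlopen(cdx_query_url(page_url, limit), timeout=30) as resp:
        rows = json.load(resp)
    return parse_cdx_rows(rows)
```

Calling `list_snapshots("example.com")` needs network access; each returned pair can be passed to `snapshot_url` and fetched with `urllib.request.urlopen`. Be gentle with the request rate if you try this on a large site.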

badlibrarian

They locked away most .warc files due to the AI harvesting crunch.

toomuchtodo

It's a one way street. This provides more access to materials held by the federal gov for ingest into IA's storage system. Bit of a policy interconnect, if you will. Reminder to donate to the Archive.

jahewson

Doubtful. They’re not part of the government so the 1st amendment applies.

themgt

If you see a bank that says "federally chartered" or "federal deposit insurance corporation", stay clear!

chrisg23

I've heard it has already happened. Specifically, the Internet Archive removed videos of the TempleOS developer Terry Davis' live streams because of problematic content.

If the Internet Archive is already curated for content then yeah, there is a 100% chance that there will be more curation of content.

jazzyjackson

Kiwifarms as well. They are a bit of a pushover when it comes to controversy.

jprd

I thought Archive just removed access, but kept the content. I know that from a user perspective that is a distinction without a difference, but for posterity it matters.

Does anyone have any facts/citations on if this is a myth/coping mechanism I created, or reality?

BSOhealth

Given this is already happening with many other taxpayer-funded datasets, it will be pretty on brand with this group.

odo1242

I mean, what would they do to exert control? Remove their federal depository status?

ranger_danger

Imagine having to delete their 100PB of warez.

rwmj

Wait til you hear about my local library. You can walk in and read or borrow any book without paying!

1659447091

Sounds outdated! My library doesn't even require me to walk in anymore; they send any book I want to read or listen to straight to my phone, and if they don't have it I can request they acquire it and send it to me for free.

natas

I wish my public library was free...

GeorgeTirebiter

Sounds VERY Communist, or Socialist, or some other scary thing. Are you sure it's legal? Why, the AUTHORS and PUBLISHERS are being denied the revenues they would get if you would buy the book; or at least rent it. So, are libraries theft of Authors' and Publishers' remuneration? (And, to think, the richest man in the world at the time, Andrew Carnegie, endowed so many Libraries!)

NoMoreNicksLeft

Wait until you hear about my private library that resides on a Synology NAS. I can access it from anywhere in the world, on any device, and it's filled with whatever titles I can be bothered to decide I want. I have about 20,000 (not counting periodicals), all carefully curated and retail quality. I even got rid of those annoying generic Bantam Press covers and replaced them with the high-res stuff off the publisher's site.

Not sure what the appeal of the public library is, when you can have your own.

m3kw9

Do we need an Internet Archive archive now?

doener

Back in the days when things were sane my first thought reading this headline would have been: Nice, that sounds official and important. Nowadays my first thought is: Wait, does this mean Trump can mess around with this?

stillwzcited

I’m still excited about it.

I hope that all of the world libraries join with the internet archive into a global cooperative.

I also hope there is a secret sub-basement in a different dimension that contains powerful artifacts, guarded by a master librarian.

A man can dream can’t he?

dsadfjasdf

yes trump is on the computer messing around with this

bigstrat2003

That says more about you than it says about the times, I'm afraid. Your first thought should still be the former, not the latter.

bdhcuidbebe

Read the news, bro.

ocdtrekkie

My take on this is that in desperation to become a real library despite Kahle's radical hatred of content creators, Kahle will end up dragging the legislative narrative in a direction that takes down real libraries with him. He will almost certainly broadcast his status as a federal depository library as part of his defenses in his numerous lawsuits.

One selfish man unwilling to recognize he is doing more harm than good.

bahmboo

"radical hatred of content creators" is a very harsh and specific allegation. I wasn't aware that Kahle was considered such a bad actor. I did some googling and wikipedia-ing and can't see much that supports that claim. I am very open minded to the nuances of IP rights vs information-wants-to-be-free so I'd love to hear more details about your position particularly as it relates to the federal depository designation.

badlibrarian

Making every book on the site available for unlimited download, not just rare things but contemporary best sellers, did huge reputational damage. Following it up by claiming he was saving scratchy old 78 RPM records, but in the process also making LPs from Paul McCartney and Jimi Hendrix available, continued the trend.

Tweeting out promotional links to the pages with those materials, while asking for donations on the top of the page? Well, I don't know if that's contempt for artists or just lack of common sense. But when they ask you to take down the material and you refuse...

The depository thing is a distraction. And they do have a habit of sensationalizing things in blog posts. So I understand where that commenter is coming from. Internet Archive is under attack from many sides but much of it is self-inflicted.

mdp2021

Libraries make «contemporary best sellers» and «LPs from Paul McCartney and Jimi Hendrix» freely available. You call it «reputational damage», others may call it "advancing demands over rights", "stirring a stagnating reality in view of effective progress" (with reference to dematerialization), "pushing a debate" (about where we want to go societally).

It is unwise to push these latter points with the utmost care without having awakened the masses and clarified your stances to decision-makers - it is unwise to be "right" in front of the immature. But the reputational damage remains about wisdom, not about pride.

toomuchtodo

They were already recognized by the state of California as a library, and have received federal funds for infrastructure under that designation. They’ve also been accepted into consortiums made up of other libraries in the US. Whether you believe they’re a library is immaterial.

badlibrarian

A federal Judge also ruled that "IA does not perform the traditional functions of a library."

https://publishers.org/wp-content/uploads/2024/09/2024.09.04...

Brewster has a friend in a state senator and he's trying to do what he can to preserve his section 108 privileges. He's removed over a million items in the past year after being repeatedly sued for copyright infringement, and leaked millions of private communications with patrons including passports and driver licenses. That's the undercurrent here.

Egos aside, the goal isn't to be a library: it's providing access to knowledge. But when your site is on the blocklist at public library terminals because you keep getting flagged for copyright violations and child pornography, maybe you're not on the path.

mdp2021

Unclear expression, since the goal of a library is «providing access to knowledge». Maybe the point is about the future of those services.
