Meta torrented & seeded 81.7 TB dataset containing copyrighted data

gizmo

Based on the encyclopedic knowledge LLMs have of written works, I assume all parties did the same. But I think there is a broader point to make here. YouTube was initially a ghost town (it started as a dating site) and it only got traction once people started uploading copyrighted TV shows to it. Google itself got big by indexing other people's data without compensation. Spotify's music library was also pirated in the early days. The contracts with the music labels came later. GPL violations by commercial products fit the theme too.

Companies aggressively protect their own intellectual property but have no qualms about violating the IP rights of others. Companies. Individuals have no such privilege. If you plug a laptop into a closet at MIT to download some scientific papers you forfeit your life.

illegalmemory

"If you plug a laptop into a closet at MIT to download some scientific papers you forfeit your life."

This is exactly what I immediately thought while reading the article. It almost feels like the legal system only punishes the general public, while most of these guys are above it.

rahton

The legal system is built to favor large corps and capital owners. See Katharina Pistor's books, for instance.

cheschire

I think it’s the other way around. Those large entities break all the same laws and rules as others and then get to the point where they can influence the creation of a regulatory moat around themselves to prevent competitors from taking the same path as them.

isaacremuant

It doesn't "seem". The entire system in most countries works, by design, that way because the people in power trade in influence at a different plane.

That's why democracy often feels "failed", in that no change can be achieved because "it's just more of the same". A few lobbyists representing the interests of a few people have more power than millions voting differently.

G_o_D

Money speaks! Money buys!

Wowfunhappy

> Based on the encyclopedic knowledge LLMs have of written works I assume all parties did the same.

I don't understand why you wouldn't just buy copies of the books. Seems like such a relatively inexpensive way to strengthen your legal case.

jokethrowaway

Buying the books won't automatically give you permission to use the content commercially

gosub100

Thanks to the byzantine copyright system, you can't easily do it. Plus, just speculating, but maybe paying establishes "consideration" for some implied contract? "You implicitly entered a contract with us by purchasing the book, then violated the contract by 'distributing' the material for commercial use"?

cess11

Too much paperwork, too much effort. These are important people, doing much more important stuff than whatever book authors do.

Or so they think, I think.

ysofunny

The English empire once tried to maintain a monopoly over steam loom machines.

The Americans cheated their way to competition.

Heck, even before that, the English empire got jumpstarted by stealing gold from the Spanish (who were themselves exploiting it away from Aztec and other Mexican natives).

I'm saying it's business as usual, but also, culture doesn't work like tangible physical widgets, so we must stop letting a few steal this boon of digital copying by means of silly ideas like DRM, copyright, and patents, all means to cause scarcity.

choult

Hollywood became popular for filmmaking because they were literally the opposite side of the country from Thomas Edison and his patents...

miltonlost

People criming in the past is not an excuse for companies committing crimes today. You’re excusing lawlessness.

Cain killed Abel and got away with it!! I can kill someone today too!!!

appreciatorBus

I think it’s fine to criticize the hypocrisy of viciously defending the copyrights you own, while gleefully running roughshod over the ones you don’t.

But it’s also possible that copyright as a concept, or in its current implementation, is bad and unjust.

I’m sure some copyright holders would like nothing more than to see an argument that elevates copyright violation to the level of murder, morally or legally. But I think it’s more akin to jaywalking - violating an unjust law that mostly shouldn’t exist.

fortran77

Cain was severely punished.

וְעַתָּ֖ה אָר֣וּר אָ֑תָּה מִן־הָֽאֲדָמָה֙ אֲשֶׁ֣ר פָּצְתָ֣ה אֶת־פִּ֔יהָ לָקַ֛חַת אֶת־דְּמֵ֥י אָחִ֖יךָ מִיָּדֶֽךָ׃ Therefore, you shall be more cursed than the ground, which opened its mouth to receive your brother’s blood from your hand.

https://www.sefaria.org/Genesis.4.12

nottorp

Interesting, if we're to trust what NotOpenAI and Facebook say about their IP, the US should pay the UK reparations for IP theft based on textile industry profits starting in the 1850s until today?

portaouflop

Why do I get sued when I share some BitTorrents but $bigcorp can just do it at 1000x the scale without problems?

The issue here is not copyright/patents/etc. The issue is that the law is applied selectively; the issue is that Aaron Swartz is dead for sharing knowledge with the public and Zuccborg is a billionaire building his torment nexus.

dcchambers

I guess the solution is to create a shell company for your illegal activities?

gnfargbl

I think if Google attempted to download the entirety of JSTOR with the express intent of making the full dataset freely available, then Google would also face legal consequences.

It's true, and relevant, that Google would feel those consequences much less sharply than Swartz did.

vintermann

Don't buy into the rhetoric and call it "consequences". It's always a choice to sue, a choice to prosecute, and this would be true even if these choices were made consistently and impartially (which they certainly aren't).

pbh101

> Google itself got big by indexing other people's data without compensation

Weird framing given how much value was and is still placed on Google driving traffic to you

mrkeen

For Google's case the order was reversed.

Google used to send customers to your site. Now they try to show you the information on their site so that the customer doesn't need to go to your site.

joshstrange

Even before the LLM-craze Google was showing their Answers box or whatever it was called at the top of the results that told you the answer (sometimes) so that you didn’t have to visit any website.

cess11

It's roughly the Spotify story too. They had an extremely impressive catalog very early, way before they were bought by the entertainment cartel. The founders had a background in torrenting and the initial product was quite similar to The Pirate Bay but with clearly capitalist ambitions and branding, in contrast to the anarchist leanings of the Pirate Bureau and the rather anarchic attitude of The Pirate Bay.

newsclues

Comprehensive intellectual property needs to happen for the modern (digital) era.

Basically the entire legal system needs to be retooled and rethought for computers.

actionfromafar

Looks like the entire legal system is being retooled at the moment.

threeseed

No we just need to enforce the existing laws.

And the legal system is for humans not computers.

newsclues

The existing laws are a problem, and are not enforced in a fair and just manner.

Yes, the legal system is for humans, but we can use technology to improve the system for humans, so it's faster, better and more fair, because humans aren't perfect, and now we have technology to be better than the system created a long time ago. You don't think the legal system should run on pens and paper, right? Adapting to typewriters was a benefit to the system?

Well, video on demand, live streaming, and things like LLMs can also make the system better for humans.

Ekaros

Considering prices for single work, this must be multi-billion dollar compensation.

Take for example the 675k paid for 31 songs. So roughly 20k a song. If we estimate a book to be, say, 10 MB, that would be about 8 million works. So I think reasonable compensation is something around 163 billion. Not even 10 years of net income. Which I think is entirely fair punishment.
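
For anyone who wants to check that back-of-envelope math, here is a rough sketch in Python. The inputs are only the figures quoted above (81.7 TB, 675k for 31 songs) plus two assumptions: the guessed 10 MB average per work, and decimal units (1 TB = 1,000,000 MB). The variable names are made up for illustration.

```python
# Back-of-envelope check of the estimate above.
# All inputs are figures quoted in this thread; none are verified here.
DATASET_TB = 81.7      # dataset size from the headline
BOOK_MB = 10           # ASSUMPTION: average size per work (a guess)
AWARD_USD = 675_000    # damages figure quoted above
SONGS = 31             # number of songs quoted above

# ASSUMPTION: decimal units, 1 TB = 1,000,000 MB.
works = DATASET_TB * 1_000_000 / BOOK_MB

per_work_exact = AWARD_USD / SONGS   # unrounded per-work figure
per_work_rounded = 20_000            # the rounded "20k a song" used above

total_rounded = works * per_work_rounded
total_exact = works * per_work_exact

print(f"works: {works:,.0f}")
print(f"per work (unrounded): ${per_work_exact:,.0f}")
print(f"total at $20k per work: ${total_rounded / 1e9:,.1f} billion")
print(f"total at unrounded rate: ${total_exact / 1e9:,.1f} billion")
```

With the rounded 20k figure this lands on roughly 163 billion, matching the number above; without the rounding it comes out closer to 178 billion. Either way the result is dominated by the guessed 10 MB per work.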

pinoy420

For creating a backup of library genesis. No. They should be awarded a philanthropic prize.

striking

There's evidence of them seeding back as little as possible. I'm not sure how that's "creating a backup".

yoavm

We all like hating big corporations, especially Meta, and people seem to use this as an opportunity to advocate for punishing them. I think it's wiser to advocate for changing our IP laws.

lrvick

I truly hope Meta has a serious security issue that burns their company to the ground.

That said, I want them to burn for the right reasons.

Downloading data that should be available to the public is not one of them.

Ekaros

First punish them. Then change the laws.

aprilthird2021

I think most of the public is probably in favor of stronger IP laws now that big corps are threatening to make them jobless with IP-disrespecting AIs

miltonlost

Big corporations all like hating their consumers and the law. You love committing crimes, it seems.

seydor

We have at least 4 types of ill-defined concepts of property in the 21st century, largely due to our laziness, intellectual inertia and lack of motivation to make forward-thinking definitions for the coming age of AI and ubiquitous access to all information and all communication.

1) the concept of copyright is as old as the word suggests (copies are the least of our worries going forward - it should be possible to define processes for exploitation of ideas in a fair way)

2) we allow humans to learn from other people's ideas and transform them into commercial products, and the same should happen for AIs in the future

3) we have an ill-defined concept of "personally identifying information" which gives people ownership to information that others have created via their own means - there should be better ways to ensure a level of privacy (but not absolute privacy) without overly-broad, nonsensical definitions of what is personally protected information

4) We allow social media and other telecommunications media to arbitrarily censor people's speech without recourse. This turns people's speech to property of the social media companies and imposes absolute power on it. This makes zero sense and is abusive towards the public at large. We need legal protections of speech in all media, not just state-owned media.

fimdomeio

It really makes you think about those crazy internet folks from back in the day who thought copyright law was too strict and that restricting humanity to knowledge in such a way was holding us all back for the benefit of a tiny few.

stefan_

The more concerning thing is that the best thing these overpaid people could come up with was... download the torrent, like everyone else. Here you are, billions in resources, and no one is willing to spend a part of it to at least digitize some new data? Like even Google did?

dietr1ch

I think they are morally required to improve the current state.

- Seed the torrent and publicly promote piracy, pushing lawmakers to act.

- Contribute with digitisation and open access like Google did in the past.

- Make the part of their dataset that was pirated publicly accessible.

- Fight stupid copyright laws. I can't believe that copyright lasts more than 20 years. No field moves that slowly, and there should be tighter limits on faster moving fields.

wnevets

My ISP will shut off my internet if it catches me torrenting copyrighted material, but if you're a massive corporation that steals TBs of data, it's barely a blip in the news.

gameshot911

Beyond illegal downloading and distribution of copyrighted content, the article also describes how Meta staff seemingly lied about it in depositions (including, potentially, Mark Zuckerberg himself).

bmsleight_

So if I torrented and seeded, I would be doing it for my own entertainment, not commercially. I would expect big copyright holders to come after me. If Meta does it, I guess they have better lawyers?

Could make interesting case law.

unification_fan

> Could make interesting case law.

Yeah, to perpetuate this system where only those who can afford lawyers get to benefit

echoangle

Since it’s case law, everyone would benefit from the precedent

panki27

They could have at the very least seeded some more, to give something back to the, uh, community.

nyoomboom

Remembering Aaron Swartz in this moment

stingraycharles

Which was arguably more innocent — scientific papers.

piyuv

Meta is not “innocent”, and comparing this instance with Swartz is a huge offense to his legacy.

maverwa

I think comparing them is reasonable and valid. Equating them would be incorrect. What Meta is (allegedly, likely) doing here is several orders of magnitude worse, in scale and intention. I'd say both ethically and probably legally.

But just because the scale and intention are different does not mean we cannot compare both cases. They are not equal, far from it. But they are comparable.

Philpax

I don't think you've read the parent comment correctly?

RobotToaster

Before I decide my opinion on this, I need to know their ratio.

adamsocrat

Article states: Meta also allegedly modified settings "so that the smallest amount of seeding possible could occur"

MaKey

Damn leechers!

woadwarrior01

I wonder what happened to the related OpenAI training GPT3 on the books3 dataset story[1] from ~2 years ago?

[1]: https://www.wired.com/story/battle-over-books3/