Zuckerberg approved training Llama on LibGen [pdf]

201 comments

·January 12, 2025

mrtksn

We are approaching the "UBI or Guillotine" fork simply because rules and regulations work selectively. Just like with the "If we pay for copyright, our business becomes impossible" defense, this is yet another vast unfairness against those who had to transfer their resources to learn a skill. An awful lot of people had a hard life or got into debt for things that big tech is immune from.

Or maybe we will come to the conclusion that all this works only if there's no such thing as IP: reset the playing field for everyone, and anyone who wants to make money will have to actually work for it every single time. IIRC that's what's happening in China, and it's how they surpassed the US in innovation.

Technically, that's a deregulation - just not the kind of deregulation the big tech is pushing for. Maybe the next time there's a graph showing how regulations made EU lag behind, add the graph of China too to spice things up.

With so many technical people out of work and promises to make the employed ones obsolete too, it can be a good idea to let people build things instead of unfairly concentrating even more power onto kleptocratic entities.

JumpCrisscross

> We are approaching the "UBI or Guillotine" fork

Even in the 18th century, the French aristocracy mostly cruised through the Revolution from afar, surviving with fortunes largely intact to this day [1]. If the fork is UBI or guillotine, the selfish move by the private-jetting billionaire class—personally and financially more mobile and global than the French aristocracy ever was—is the latter.

> if there's no such thing as IP, reset the playing field for everyone

Your thesis is letting Altman, Zuckerberg and Musk have free rein would decrease inequality?

> IIRC that's what's happening in China

Not really [2].

[1] https://www.bbc.com/news/magazine-37655777

[2] https://www.chinaiplawupdate.com/2023/08/china-prosecutes-11...

lupire

Extremely misleading citation.

> Criminal trademark infringement made up the majority of IP crimes with 10,384 people prosecuted accounting for 88.9% of the total.

Trademark infringement is of a completely different character from copyright.

Trademark infringement is pure fraud and lying.

Take out trademark infringement, and you have only 1 prosecution per year per 700,000 people.

JumpCrisscross

> Take out trademark infringement, and you have only 1 prosecution per year per 700,000 people

What is it in America? Did we even have a single criminal non-trademark IP prosecution in 2024?

XorNot

The other way to look at it though is that revolution won't solve your problems, and Americans are far too confident that it will.

JumpCrisscross

> other way to look at it though is that revolution won't solve your problems, and Americans are far too confident that it will

Americans are largely not for a revolution because most of us aren’t idiots. There is idle chatter of a civil war, but that’s again (a) bluster (not that this can’t take on a life of its own) and (b) about consolidating control versus wholesale rebuilding the American class structure.

tyleo

I’m no advocate for revolution, but the American problem is that our revolution actually worked. Americans freed themselves from a prior group of elites, unlike what the grandparent comment claims about the French.

bushbaba

Unlike then, today global mobility is within the means of most of the Western world. A French Revolution today could very well extend globally to identify and repatriate.

JumpCrisscross

> French Revolution today could very well extend globally to identify and repatriate

We have zero historical or contemporary precedent for this, and strong incentives for everyone else in the world to not play along. (As they did in sheltering the French aristocracy.)

In a hypothetical American revolution, foreign powers would be looking for their slice of the pie. To think through this dispassionately, imagine civil war breaking out in Russia or China. A second American revolution à la the first would put today’s billionaires and political elite in a room to draft a new constitution to their liking.

wesapien

Isn't UBI just going to raise inflation? People who don't need it will claim it and use the existing tax loopholes. Tax laws will need to be rewritten.

webmaven

The "U" in UBI is for "Universal". There is no means-testing. Everyone gets it regardless of assets or income, which means there is no need to spend any effort on checking whether someone is "poor enough".

Though the state would have to make sure the person receiving the benefit actually exists, is still alive, etc.

wesapien

I understand what UBI means, but its effect is what I think people do not understand. Based on the Cantillon effect, UBI will just accelerate the separation between the rich and the poor.

tricorn

No, you can do a UBI that keeps the money supply the same, and use it as a way to stabilize the economy. With a $2000/mo UBI, 50% flat tax on other income, and 25% VAT, phase it in by doing 10% of that the first year (and 90% of your current taxes, 90% of current support payments), second year 20% and 80%, so the impact isn't too disruptive. Adjust the flat tax rate as the Federal budget changes (a spending bill is automatically a tax bill as well). Adjust the VAT to control inflation.
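As a rough sketch of that phase-in arithmetic: the snippet below assumes the ramp continues linearly at 10 percentage points per year (the comment only spells out the first two years), ignores the VAT and any behavioral effects, and uses function names that are purely illustrative.

```python
UBI_MONTHLY = 2000.0   # figure from the comment
FLAT_TAX_RATE = 0.50   # figure from the comment
VAT_RATE = 0.25        # figure from the comment (not modeled in this sketch)


def phase_in_schedule(years=10, step=0.10):
    """Return (year, new_system_share, old_system_share) for a linear phase-in.

    Assumption: the 10%/90%, 20%/80% pattern simply continues until 100%/0%.
    """
    schedule = []
    for year in range(1, years + 1):
        new_share = min(1.0, round(step * year, 10))
        old_share = round(1.0 - new_share, 10)
        schedule.append((year, new_share, old_share))
    return schedule


def monthly_ubi(year, step=0.10):
    """UBI payment per month in a given phase-in year."""
    return round(UBI_MONTHLY * min(1.0, step * year), 2)


def break_even_income():
    """Annual other income at which flat tax paid equals UBI received,
    once fully phased in and ignoring VAT."""
    return UBI_MONTHLY * 12 / FLAT_TAX_RATE


if __name__ == "__main__":
    for year, new_share, old_share in phase_in_schedule():
        print(year, new_share, old_share, monthly_ubi(year))
    print("break-even annual income:", break_even_income())
```

Under these assumptions, someone with about $48,000 of other annual income would pay as much flat tax as they receive in UBI; below that they are net recipients, above it net payers.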

nradov

You've got to be kidding. As a regular middle class citizen my taxes are high enough already. There's no way I'll vote for UBI so that some slackers can sit around getting high and playing Xbox.

motorest

> Isn't UBI just going to raise inflation?

Even assuming this scaremongering scenario, the world would be in a far better place if society assured everyone would be guaranteed a certain income.

Also, the scenario that supports the hypothesis of higher inflation is that more people in society are suddenly able to afford goods and services that were out of their reach without UBI. Can anyone actually put to words why that is undesirable?

aesh2Xa1

I think one criticism is that prices would change to capture the UBI. I think I read the idea in "Progress and Poverty," although I've certainly seen it elsewhere since:

- If everyone suddenly has more money (say $2 more per day)

- And milk is a basic necessity

- The milk seller knows everyone needs milk and now has $2 more to spend

- They can gradually raise the price of milk by close to $2

- Consumers must still buy milk at the higher price

- The intended benefit of the extra $2 is effectively captured by the milk seller

The increases in general purchasing power can be absorbed by suppliers of essential goods. If you have just excess discretionary income in the general case, then non-essential goods can bump in price, too.

weatherlite

It's gonna be complex and messy. On the one hand yes, many people receiving UBI = inflation. On the other hand many highly paid software devs (And soon after - accountants, lawyers, marketers, sales people etc etc) are losing their incomes = very deflationary.

It's gonna be interesting that's for sure.

wesapien

UBI has less friction as far as implementation goes, since we don't need to qualify anyone. With AI, we can afford to have that extra step (nuance) and be able to make sure it's a needs-based approach. The future requires various combinations of changes. Fix the tax system and then UBI (in this specific order) OR !UBI (needs-based distribution).

PaulRobinson

If it's truly universal, no. Several experiments (controlled and natural), have shown this.

wesapien

Have there been experiments/testing at city/state scale? UBI is country scale, and it's way more complex than testing it on a small town of people who I assume are selected for their needs.

tharmas

Indeed it would as the landlords would just raise rents accordingly.

We saw a bit of that with Covid cheques.

PaulRobinson

I think you'll find rising rents are more correlated with rising interest rates than Covid cheques, but given one of the key grievances perceived by UBI advocates is class inequality and lack of social mobility, if UBI became politically possible then so would rent controls and controls on prices of key essential commodities while waiting for it to "settle in".

ColdTakes

There isn't going to be a revolution. Americans are all talk no action.

Workaccount2

The legal problem is in outputting IP, I still have yet to see a convincing argument that training on copyrighted data is a breach of IP laws.

The trained models are trillionths the size of their training sets. There is no archive of copied data in them.

agilob

>argument that training on copyrighted data is a breach of IP laws.

You pay for access to materials, not using or remembering the material in its original format.

93po

Nearly every website does not charge me anything to retrieve information that is their intellectual property.

swatcoder

Training on copyrighted works licensed for such use is inarguably conforming.

Acquiring and using works without such license is just piracy. Whatever your stand on piracy is, most individuals and businesses are not free to incorporate it into their projects. Normal people have faced significant penalties for piracy, and conscientious business operators avoid it.

Sure would be disappointing to all those people if there were suddenly a ruling that said "well, but it's okay that these guys did it because they're filthy rich and went real hard with it"

Workaccount2

Again, models are not archives of data.

Llama 3.1 70B is around 45GB in size, despite being trained on likely hundreds of petabytes of data. And before you say it, they are not fancy compression algos either; the loss is so high they would be useless.
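For a sense of scale, here is a back-of-the-envelope sketch of that ratio. Both inputs are the figures quoted above and are not verified; 100 PB is an assumed stand-in for the low end of "hundreds of petabytes".

```python
GB = 10**9
PB = 10**15


def size_ratio(model_bytes, training_bytes):
    """Fraction of the training-set size that the model weights occupy."""
    return model_bytes / training_bytes


model_bytes = 45 * GB        # model size quoted in the comment
training_bytes = 100 * PB    # assumption: low end of "hundreds of petabytes"

ratio = size_ratio(model_bytes, training_bytes)

if __name__ == "__main__":
    print(f"model / training data = {ratio:.2e}")
    print(f"about 1 part in {round(1 / ratio):,}")
```

With these inputs the weights come to roughly one part in two million of the training data; a larger training-set estimate shrinks the ratio proportionally.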

JeremyNT

How can it possibly be the case that it's ok for meta to download and ingest the entire contents of libgen but it is not ok for an individual human to selectively download a single work and read it?

Whatever legal contortions used to justify this are, quite frankly, bullshit. This isn't how anything should work even if these companies can buy themselves a regulatory regime where it does.

bdndndndbve

The idea that abolishing IP protections and letting AI companies run rampant is an offramp for wealth inequality is such a wild take to me?

Realistically billionaires are using racist and homophobic populism as a way to direct working class energy away from wealth inequality. Making people think "woke" is the reason why the earth is on fire and they can't have health insurance.

netfl0

Ah yes, because the working class is primarily concerned with protecting their intellectual property…

impomura

the working class is paywalled out of education because of IP laws that can seemingly be ignored by the AI companies

bdndndndbve

I think OP is coming from the "temporarily embarrassed billionaire" perspective where if only we had a libertarian hellscape without pesky laws they would be a funeral baron who runs Bartertown.

casey2

How can you get the definition of fairness so backwards? Giant corporations provide literally everything you take for granted and they should be punished because you are envious? I don't get it.

There is a reason everyone with over 130 IQ wants to work for them rather than starting their own companies.

Lucasoato

They shouldn’t be punished because people are envious, they should be punished because they’re not respecting other people's intellectual property without an agreement in place.

We can’t protect IPs only when that benefits big corps. We should protect them always or accept that the world is better if we go in another direction, changing the rules for everybody.

visarga

Training on copyrighted data should be legally allowed

- of course exact reproduction of protected content is a no-no

- but learning is ok, as long as it is transformative. User prompts and responses are pushing the model outside its training distribution anyway - users add their own intent, making usage transformative

- when LLMs synthesize from multiple sources, the result is transformative

- if you try to protect expression it is meaningless now, but if you protect abstract ideas it kneecaps creativity

- the problems of copyright started with the advent of the internet, not with AI

- revenues from royalty are almost zero today, as each new content competes against an unbounded list of other works that have been accumulating for decades online

- because royalties are shit, creatives now focus on ads, and this leads to enshittification, attention-grabbing junk everywhere; attention is scarce, content is post-scarcity

- we actually like interactive participation more than passive consumption; we now edit Wikipedia, contribute to open source, have papers published for free on arXiv, use social networks where our comments are shared with the world, play games instead of reading books - it is another age, the interactive age

- AI is actually more than an infringement tool, it is useful for many legit purposes

- and AI is the worst possible infringement tool: it can hallucinate details and get things wrong; by comparison, copying is free and easy and precise to the letter

So the idea that training is infringement is pretty abusive, it tries to make copyright be about abstractions which is wrong. We can't return to 1990s, so we have to live with its demise. It's been dying for 3 decades already.

saagarjha

People who are smart typically have better things to do than talk about their IQs. Or sell ads, for that matter.

bdndndndbve

How can you get the definition of fairness so backwards? The King provides literally everything you take for granted and he should be punished because you are envious? I don't get it.

There's a reason why every vassal with a sizeable estate wants to be in the King's court rather than starting their own country.

boramalper

Alluded multiple times in the comments already but worth being explicit: Aaron Swartz killed himself 12 years ago yesterday for facing "a cumulative maximum penalty of $1 million in fines, 35 years in prison" [0] after downloading academic journal articles, which would be only a small percentage of what's available on LibGen.

Free for me, not for thee.

[0] https://en.wikipedia.org/wiki/Aaron_Swartz

JumpCrisscross

> Free for me, not for thee

Swartz was charged with 35 to 50 years, realistically faced up to 10, and was offered 6 months if he pleaded guilty [1]. That offer moreover wasn’t the final offer.

Put another way, it’s not clear that the law is being applied to Zuckerberg differently than it was to Swartz given the law wasn’t actually ever applied to Swartz. (Or that they wouldn’t gladly trade this lawbreaking for $1mm in fines and a negotiation over penalties where the prosecution opens with 6 months jail.)

The prosecutor acted inappropriately in that case; MIT, more wildly so. That doesn’t, however, carry over to a transgression of the law given we never got to that stage.

[1] https://www.forbes.com/sites/forbesdev/2023/02/28/increase-w...?

inetknght

> it’s not clear that the law is being applied to Zuckerberg differently than it was to Swartz given the law wasn’t actually ever applied to Swartz

Has Zuckerberg actually been charged with something with equivalent potential consequences?

If not, then your statement is false on its face.

JumpCrisscross

> Has Zuckerberg actually been charged with something with equivalent potential consequences?

I didn’t say Zuckerberg has been subjected to what Swartz was. Swartz never wielded the nation-state level power of a billionaire—it’s difficult to imagine how he could be subjected to similar psychological stress.

I said the law isn’t being applied to Zuckerberg (or anyone who has downloaded LibGen, for that matter) differently because the law was never applied to Swartz. Given the unpopular Swartz prosecution ended Ortiz’s career, and the lack of recent criminal copyright cases, it’s unlikely anyone would attempt to apply it as they did then. To anyone, including Zuckerberg.

TL;DR: If you dislike what Zuckerberg is doing, you’re probably advocating for a clarification of the law. If you like it, erm, nothing much to do here.

bagels

LibGen is the most generic name ever, had to look it up. Turns out that LibGen is a collection of pirated books.

https://en.m.wikipedia.org/wiki/Library_Genesis

perihelions

Shadow libraries are a heavily discussed, recurring topic on HN:

https://hn.algolia.com/?query=libgen&type=all ("LibGen")

https://hn.algolia.com/?query=anna's%20archive&type=all ("Anna's Archive")

https://hn.algolia.com/?query=z%20library&type=all ("Z-Library")

https://hn.algolia.com/?query=scihub&type=all ("SciHub")

A_D_E_P_T

It's not just a collection, it's the collection. It contains almost every scientific book ever printed, for one thing.

Frankly, it's a massive boon to researchers. It's like a top-tier research university library at your fingertips, and usually more convenient than the real thing.

reddalo

Also free. That helps.

But the sad state of the affairs is that if Aaron Swartz does it, he ends up dead; if Meta does it, everything is fine.

A_D_E_P_T

A lot of people would gladly pay. I'm a paying subscriber to Anna's Archive, which vastly improves the experience of that site. (It's borderline unusable without a subscription.)

Thing is, the Elsevier/Springer model makes it incredibly difficult to pay them. With single papers or book chapters in the $30-40 range, an afternoon's research can easily cost $600. (Note that the authors and reviewers don't get royalties on this, and the Editor-in-Chief of any given journal usually only makes a small stipend!)
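To make that figure concrete, a tiny sketch of how many items $600 buys at the quoted per-item prices; the function name is illustrative and the prices are just the $30-40 range mentioned above.

```python
def items_for_budget(budget, price_per_item):
    """Whole number of papers or chapters a budget covers at a given price."""
    return int(budget // price_per_item)


AFTERNOON_BUDGET = 600          # figure from the comment
PRICE_LOW, PRICE_HIGH = 30, 40  # per-item range from the comment

most_items = items_for_budget(AFTERNOON_BUDGET, PRICE_LOW)
fewest_items = items_for_budget(AFTERNOON_BUDGET, PRICE_HIGH)

if __name__ == "__main__":
    print(f"$600 covers between {fewest_items} and {most_items} items")
```

So the $600 afternoon corresponds to skimming roughly 15 to 20 papers or chapters.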

There are services like DeepDyve, but they're intentionally gimped and difficult to use, because their user interface is 100% built around preventing you from downloading or screenshotting the papers you "rent"!

If the publishers set up a $100/month all-open-access program, and if the experience were at least halfway decent, I'd bet that a lot of people would sign up. And that's not cheap!

mistercheph

Funny that the world where almost all human knowledge and art is free and accessible for everyone exists in parallel to one where articles about "which McDonald's meal are you" are paywalled, and funny which world civilized nations have chosen in order to protect The Suite Life of Zack & Cody and all the artists whose livelihoods depend on reruns of iCarly.

ppp999

A lot less generic than X

resiros

I would argue that it's the right call: 1) it's in the world's best interest. I am running llama locally on my laptop, and the ability to have the distilled world's knowledge at your fingertips will generate much, much more value than what it takes. 2) it does not 'take' any value from the book creators. No one's going to 'not buy a book' because an LLM has been trained with its content (in contrast, you might argue that you are likely to not buy a book because you downloaded it from libgen).

Copyright laws are not millennia-old ethical laws that everyone agrees on (like don't steal); they are a modern human construct that was created for the greater good (incentivize creation), and we should revisit them with new tech.

lnkl

"1) it's in the world's best interest."

How is pleasing Meta's shareholders in the world's best interest?

TiredOfLife

How is using llama for free locally pleasing shareholders?

jbentley1

Things can both please shareholders and create value for users of that thing.

edoceo

> incentivize creation

Humans do that naturally (see: children)

The copyright laws are to protect profit.

ulbu

wat? facebook is going to 'not buy a book' for each book it's gone through. world's best interest that one of the wealthiest companies in the world don't pay their dues? world's best interest? when we know nothing about the societal and political effects llms will have in the hands of such people?

what are you rationalising about?

1vuio0pswjnm7

PDF: https://ia902305.us.archive.org/34/items/gov.uscourts.cand.4...

Text: https://www.courtlistener.com/docket/67569326/373/kadrey-v-m...

"Meta's request is preposterous. With one possible exception, there is not a single thing in those briefs that should be sealed."

"It is clear that Meta's sealing request is not designed to protect against the disclosure of sensitive business information that competitors could use to their advantage. Rather, it is designed to avoid negative publicity."

"If Meta again submits an unreasonably broad sealing request, all materials will simply be unsealed."

"One final comment. Between this sealing request and assertions in Meta's opposition brief such as "[t]hat document expressly discusses torrents and seeding", Opp. at 7, the Court is becoming concerned that Meta and its counsel are starting to travel down a familiar road. See In re Facebook, Inc. Consumer Privacy User Profile Litigation, 655 F. Supp. 3d 899 (N.D. Cal. 2023)."

consumer451

It is very difficult for me to believe that Meta's recent political relations moves are not related to the open cases where Meta is the defendant.

qwertox

I don't understand your comment. This is about a lawsuit which shows that Zuckerberg OK'd the downloading and use of LibGen data. The case exists at least since mid-2023 and was in discovery phase until 13. Dec 2024. Shortly before the deadline Meta provided this new information, because they had to.

credit_guy

I guess the parent is saying that the new administration could be more business friendly in prosecuting this type of cases. It might even drop this case altogether. But only if Meta is "friendly" to the administration too.

tux3

They're saying that Meta has been kowtowing to the incoming administration in hopes of getting in their good graces.

Rather famously, some elements of that administration are above the criminal code, so that's not implausible.

lupire

PP is referring to Facebook/Meta's new policy changes like banning intelligence/sanity-based insults on the Platform, but carving out an exception specifically and explicitly for transgender people as targets, and removing tampons from men's bathrooms.

monsieurbanana

He's saying that he wants to pay Trump to win these lawsuits, which is a smart move as we know justice is for sale.

bamboozled

The pivot would potentially help his cause though, would it not?

aprilthird2021

The antitrust one is most relevant as the new party in power would be gleeful to see it broken up but otherwise disagrees with the concept of antitrust

frob

"For my friends, everything; for my enemies, the law."

hatenberg

Nah it’s just that Zuck watched the Barbie movie and realized Space Karen was getting entirely too much limelight and declared a Year of Masculinity

lupire

Calling someone Karen is a misogynist slur, and calling a man by a woman's name without consent is doubly a misogynist slur.

hatenberg

Ok Karen

elashri

There are three positions around the usage of shadow libraries.

1- Should we develop this argument into more discussion as society and humans around the knowledge publication and the publication industry greed and the rent-seeking business model.

2- Big Corporation shouldn't just ignore the copyright law while maintaining the strongest copyright protections and going after small folks.

3- The usual argument about how LLM training is different from people actually using pirated textbooks because they are expensive (college and learning are hard and expensive, especially in places like Africa).

These are different angles and I think we can try to address all of them as they are not exclusive. There are good arguments around point 3 on two sides. I don't think there is a good argument why we should allow the status quo regarding the first point though. For two, it is more complicated to even discuss, especially on HN.

miohtama

We can rewrite copyright laws.

Havoc

I guess the zuck would download a car...

Will be interesting to see where this lands, because all outcomes seem to have significant secondary effects.

mnky9800n

I would download a car

blooalien

https://cults3d.com/en/collections/best-stl-files-cars-3d-pr...

You're welcome! :P

lxgr

In hindsight (considering how LLMs are trained etc.) it makes total sense, but "Big Tech vs. Big Copyright" is something I didn't have on my 2020s bingo card.

I wonder who will come out on top, and whether there will be any incidental improvements for consumers, but unfortunately I can imagine an "AI training exemption" all too well.

kccqzy

That's not surprising to me at all. Even in the 2000s there was a famous lawsuit about Google Books scanning books without approval and the proposed settlement was essentially allowing Google to sell scanned ebooks while giving copyright holders a cut[0]. At that time Google truly felt like a don't-be-evil corporation, and lawyers for the copyright holders wanted to give Google all this data as long as Google paid the copyright holders. In the 2020s however I cannot imagine any Big Tech company having that don't-be-evil spirit and I also cannot imagine them voluntarily paying anything to copyright holders.

[0]: https://www.newyorker.com/business/currency/what-ever-happen...

nonrandomstring

> "Big Tech vs. Big Copyright"

Indeed. But when do those intersect or diverge?

I don't blame him. What would you do? If I had a near perfect data training set of all the most useful books and a hungry AI to train, it would be the logical step.

The reason this is news is because of the stinking hypocrisy of it all. It's really the same topic as the Swartz-Altman discussion here [0], in that these giant companies want to have it both ways.

Where is Zuckerberg's shout-out for Alexandra Elbakyan? [1] Or for Brewster Kahle? Or any of the vast army of people who preserve and curate the vital culture of humanity by protecting it from intellectual property dungeons?

The colossal hypocrisy is that a company like Meta wishes to live under the protective umbrella of "Intellectual Property". It wants to stop me just stealing its stuff and setting up a better Facebook.

Were it exposed to the same rules it wishes to live by, it would be torn apart by vibrant and deserving competition within days.

All that Zuckerberg, Meta or OpenAI are doing is setting the ground for the abolition of intellectual property. They are literally the proverbial people who will buy the rope with which to hang themselves.

(Edit. that doesn't make sense insert <proverb about buying ropes that actually makes sense>)

[0] https://news.ycombinator.com/item?id=42671427

[1] https://en.wikipedia.org/wiki/Alexandra_Elbakyan

criley2

I don't view Big Tech as being against copyright. They simply hold a position that they will not pay for something unless forced to ("make me" - a very common position for the powerful to hold).

In fact, I'd argue that Big Tech is pro copyright, because once they force the copyright holder to negotiate, the cost is irrelevant to them and they build a moat around that access.

For example, Google stole Reddit content for Gemini until Reddit was forced to the table, and now Google has a seemingly exclusive agreement around Reddit data for AI purposes.

jsheard

> I don't view Big Tech as being against copyright. They simply hold a position that they will not pay for something unless forced to

Yep, the contradiction between them feeling entitled to use anything they want for training, while simultaneously having license terms which forbid using the output of their models to train other models is pretty glaring. Information wants to flow freely but only in one direction apparently.

lxgr

> having licenses which forbid using the output of their models to train other models

I haven't been following it closely, but aren't there already court rulings saying that generative AI output by itself is not copyrightable?

swatcoder

Yup. For Big Tech, the ideal outcome of these cases isn't that copyright is widely or deeply undermined as they rely heavily on it themselves (let alone how their customers and investors benefit from it).

Their ideal outcome is that there's some narrow carveout that gives them permission to ignore copyright where they want to, while extending similar permission to as few/irrelevant others as possible.

n144q

> I'd argue that Big Tech is pro copyright

I agree but for a different reason -- cost is actually relevant, in the sense that only the biggest players can afford to pay for the copyrights. If you are a small player, however good your tech stack or your model is, if you can't afford it, you can't compete with Google.

52-6F-62

In the past we called that tyranny, when a power thought it could act entirely without restraint.

Now I guess it’s defended as good business and good science by so many flunkies.

Knife's edge stuff. Tech people should all be reading the books, not Mark’s steamroller.

There goes the gravy train

dialup_sounds

Odds are that licensing gets streamlined into something like compulsory mechanical licensing and rates get negotiated into something that Big Tech and Big Media can both live with.

The whole conflict boils down to one party having piles of money and another party having something they want. That's not an intractable problem.

CuriouslyC

Big tech will win, because what they're doing is already basically legal, and they're worming their way up the new administration's ass.

visarga

Maybe training on copyrighted data should be allowed if the size of the training set is huge, as each individual example is just a drop in the ocean compared to the full training set.

If you train a 20B-parameter model on 20T tokens, even with 1000 tokens per example, the model extracts about 1 byte of information per example. What is the value of 1 byte of copyright infringement?
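The arithmetic behind that estimate can be sketched as follows. It assumes one byte of storage per parameter (an assumption that reproduces the 1-byte figure; 16-bit weights would double it) and treats weight storage as a crude upper bound on what the model can retain per example.

```python
def bytes_per_example(n_params, bytes_per_param, n_tokens, tokens_per_example):
    """Model storage divided evenly across training examples.

    This is a capacity bound, not a measurement of what is actually memorized.
    """
    n_examples = n_tokens / tokens_per_example
    return n_params * bytes_per_param / n_examples


N_PARAMS = 20 * 10**9        # 20B parameters, from the comment
N_TOKENS = 20 * 10**12       # 20T tokens, from the comment
TOKENS_PER_EXAMPLE = 1000    # from the comment

one_byte_weights = bytes_per_example(N_PARAMS, 1, N_TOKENS, TOKENS_PER_EXAMPLE)
two_byte_weights = bytes_per_example(N_PARAMS, 2, N_TOKENS, TOKENS_PER_EXAMPLE)

if __name__ == "__main__":
    print(f"8-bit weights:  {one_byte_weights} bytes per example")
    print(f"16-bit weights: {two_byte_weights} bytes per example")
```

The average says nothing about how the capacity is distributed: frequently repeated texts can take up far more than their even share.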

lxgr

By the same logic, pirating movies should be allowed as long as the person doing it watches enough of them for each individual one to be almost meaningless…?

webmaven

If by "pirating" you mean distributing copies, probably not. But if you mean downloading copies, probably yes. Consider the case of the film student studying the entire oeuvres of multiple directors.

visarga

Yes, if they watch a billion movies, it should be free to watch any copyrighted one.

TiredOfLife

The hilarious thing is that the same people that freely pirate music, videos, books and articles are on the side of huge copyright hoarders like Disney

jazzyjackson

I just wish the big corps would change the law to allow everyone to pirate freely, but instead they’re arguing for a carve out specially for training language models.

Funes-

Yes. Every "AI" company is training their software on everything, regardless of what they claim, and making millions, billions of dollars on it.

consumer451

YouTube was mostly a library of pirated content when Google bought it for $1.6B.

Spotify began by uploading an employee's pirated MP3s, and is now valued at $92B.

There are plenty of other examples. One of the ways to success is to ignore silly legal matters, build a product people want, and worry about the legality later. It's not just AI companies, the pattern is well established.

disqard

Eric Schmidt said so, and it caused a giant uproar.

To me, it's just "more of the same", but apparently he said the quiet part out loud, which was somehow verboten.

(Edit to add: I'm not saying "I think this is okay", but rather "this is Standard Operating Procedure for startups" -- even Reddit was seeded with fake accounts and content, to give the appearance of an active online community. This sort of hustle is a core part of SV culture, and I don't think this is going to change in a hurry.)

Excerpt:

...in the example that I gave of the TikTok competitor, and by the way, I was not arguing that you should illegally steal everybody's music. What you would do if you're a Silicon Valley entrepreneur, which hopefully all of you will be, is if it took off, then you'd hire a whole bunch of lawyers to go clean the mess up, right? But if nobody uses your product, it doesn't matter that you stole all the content.

consumer451

Thank you so much for this reference. It is the truth, and I have bookmarked it.

https://finance.yahoo.com/news/ex-google-ceo-schmidt-advised...

Funes-

>was

>began

You're very obviously missing a key point here. It's rather simple: pirating is integral to "AI", as it is of the utmost importance with regards to its optimization and even to building its basic functionalities. It will never cease to happen nor is it part of some "preliminary" process in which executives "ignore silly legal matters" in order to kick-start their projects only to discard those practices once they eventually take off. Comparisons to YouTube, Spotify, etc., are invalid for this very reason.

consumer451

I should have been very clear that "silly legal matters" was meant tongue in cheek. I do not think that this is cool at all.

You raise a good point. However, both Spotify and YouTube benefited from network effects and being the biggest gorilla in the room. Can you remove the initial illegality from their later success, since the latter depended on the former?

What seems inevitable is that some deal is made with major rights holders, the little guy gets screwed, as has happened before.

freefaler

Not yet... they're not profitable still, but will be in the future (those who survive). Nvidia is making all the money from the eager investors who subsidize the "free" chatGPT & related tools.

emahhh

Exactly. Not surprised at all.

visarga

Do you mean "making cents per million tokens"? And the benefit obviously belongs to the person who prompts, because they solve a task or get help. The value of that help can be from trivial to life changing.

jpc0

I'm at a moral impasse on this specifically.

Llama is probably one of the few LLMs that probably doesn't generate an income for Meta but I can't exactly see how other than by assisting their current ad generation.

Them being open weight isn't as good as what a "proper" open source LLM would be, but vs OpenAI, which likely did the same thing, it's significantly better.

On the other hand, if copyright is enforced it should be enforced across the board. If I did the same thing while training an AI, would I get the same treatment? Equal before the law and all that...

On the third point, I cannot legally obtain scientific papers without very significant cost to myself. My local libraries don't have a reasonable selection, and even the university libraries that will let me, as a member of the public or an alumnus, hold a membership specifically exclude scientific papers from that membership, and you need to pay per paper.

paolgiacomelli

Let's then stop calling it "Artificial Intelligence" and call it what it is, "plagiarism software," because "it doesn't create anything, but copies existing works from existing artists and modifies them in such a way that they can escape copyright."

Noam Chomsky, New York Times - March 8, 2023