Breaking the Llama Community License
82 comments
April 13, 2025 · lolinder
SahAssar
I don't think most people in the weights-vs-source debate misunderstand this; it's just that the current "open-source" models for the most part do not even meet the bar of source-available, so whether the license is actually Open isn't the current discussion.
lolinder
See, but my point is that this is putting the cart before the horse. The "Open" in "Open Source" is what matters most by far, the same way that the "Free" in "Free Software" is the key word that qualifies the kind of software we're talking about.
Once we've resolved the problem of using the word "Open" incorrectly I'm happy to have a conversation about what should be the preferred form for modification (i.e. source) of an LLM. But that is the less important and far more esoteric discussion (and one about which reasonable people can and do disagree), to the point where it's merely a distraction from the incredibly meaningful and important problem of calling something "Open Source" while attaching an Acceptable Use policy to it.
achierius
> The "Open" in "Open Source" is what matters most by far, the same way that the "Free" in "Free Software" is the key word that qualifies the kind of software we're taking about.
I don't think this is true. If someone said "look, my software is open source" and by "source" they meant the binary they shipped, the specific definition of "open" they chose to use would not matter much for the sort of things I'd like to do with an open source project. Both are important.
fragmede
In today's world, if Meta did release the full source they used to create Llama, there are only about a dozen institutions with the capacity to actually do anything with it, and no one has that kind of spare capacity just lying around. So the question of having the source, for now and in this case, is less about being able to do something with it and more about being able to examine what's going into it. Aside from making it so it won't tell me how to make cocaine or bombs, what other directives has it been programmed with on top of the initial training run? That's what's important here, so I disagree that it's a red herring. Both aspects are important, but the most important one is to not let Mark Zuckerberg co-opt the term Open Source when only the model is available, and it isn't even actually Open at that.
lxgr
It gets even weirder with Llama 4: https://www.llama.com/llama4/use-policy/ [Update: Apparently this has been the case since 3.2!]
> With respect to any multimodal models included in Llama 4, the rights granted under Section 1(a) of the Llama 4 Community License Agreement are not being granted to you if you are an individual domiciled in, or a company with a principal place of business in, the European Union. This restriction does not apply to end users of a product or service that incorporates any such multimodal models.
This is especially strange considering that Llama 3.2 also was multimodal, yet to my knowledge there was no such restriction.
In any case, at least Huggingface seems to be collecting these details now – see for example https://huggingface.co/meta-llama/Llama-4-Scout-17B-16E-Inst...
Curious to see what Ollama will do.
diggan
Technically, those were there since 3.2's Use Policy. I've summarized the changes of the license here: https://notes.victor.earth/how-llamas-licenses-have-evolved-...
lxgr
Oh, I must have compared the license of 3.2 to the usage policy of 4 then or made some other error; I was convinced this was a new restriction!
Thank you, also for that article – the tabular summary of changes across the two is great!
geocar
Super weird.
Any idea what standing Meta/Llama thinks they have when they write stuff like that?
Is it copyright law? Do they think Llama 4 is copyrighted by them? Is it something else?
crimsoneer
I'm pretty sure what they think is "we can loudly complain EU over-regulation is stifling innovation and try and get the public on-board by showing them everything they're missing".
jujube3
They don't want to be sued in the EU for releasing a model. One way to do that is by not releasing it in Europe.
It's the same reason a lot of websites block the EU rather than risk being sued under the GDPR.
dheera
> Curious to see what Ollama will do.
I don't think they care. I'm pretty sure Llama itself trained on a bunch of copyrighted data. Have licence agreements actually mattered?
cma
As long as you get the model weights without agreeing to the license: there has been no case in the US, at least, where model weights have been ruled to be subject to copyright, and it is likely they are a mechanical transform of data. FB probably owns copyright on some of the data used for instruction-tuning the model, but they rely on the transform also removing that copyright, just as it removes the copyright of the other underlying data they don't own.
They want to have their cake and eat it too, though, and these companies are all lobbying hard in a political system with open bribery.
lxgr
> As long as you get the model weights without agreeing to the license
But is that how it works? Not implying that the situation is otherwise comparable, but you e.g. can't ignore the GPL that a copyrighted piece of code is made available to you under, just because "you didn't agree to it".
As I see it (as a non-lawyer), either model weights are copyrightable (then Meta's license would likely be enforceable), or they aren't, but then even "agreeing to the license" shouldn't be able to restrict what you can do with them.
In other words, I'd consider it a very strange outcome if Meta could sue licensees for breach of contract for their use of the weights, but not non-licensees, if the license itself only covers something that Meta doesn't even have exclusive rights to in the first place. (Then again, stranger decisions have been made around copyright.)
geocar
> FB probably owns copyright on some of the data used for instruction tuning the model,
I'm not so sure they do, and even if they did so what? Holding the copyright on some of the data being used in the model doesn't mean they hold the copyright on the model.
> They want to have their cake and eat it
Nemo auditur propriam turpitudinem allegans.
NitpickLawyer
> I'm pretty sure Llama itself trained on a bunch of copyrighted data.
Every good "SotA" model is trained on copyrighted data. This fact becomes apparent when models are released with everything public (i.e. including the training data): they score significantly behind in every benchmark.
tough
A research team from O'Reilly found that OpenAI trained on copyrighted books.
Prob got a sub...
https://ssrc-static.s3.us-east-1.amazonaws.com/OpenAI-Traini...
wrs
AFAIK it’s still an open question whether there is any copyright in model weights, for various reasons including the lack of human authorship. Which would mean that if you didn’t separately and explicitly agree to the license by clicking through something, there is no basis for the “by using this model” agreement.
Of course you probably don’t have enough money to get a ruling on this question, just wanted to point out that (afaik) it is up for debate. Maybe you should just avoid clicking on license agreement buttons, if you can.
ronsor
I'm in the "model weights aren't copyrightable" camp myself. I think the license exists largely to shield Meta from liability or the actions of third parties.
hackingonempty
Humans make many choices that affect the trained weights: curation and choice of datasets, training schedules, and hyperparameters. If these choices are made with an eye towards the generated results, rather than mechanically based on test scores, why doesn't that rise to the minimal level required to get a copyright on the weights?
wrs
It might. As I say, it’s up for debate. A judge might look at the 1kB of hyperparameters versus the 1TB of training data, and the 10 person-years of human effort versus 100,000 GPU-years of computer effort, and conclude differently.
Does Google have copyright of their search index? Never tested, as far as I know.
oceanplexian
> AFAIK it’s still an open question whether there is any copyright in model weights
There's definitely copyright when you ask the model to spit out Chapter 3 of a Harry Potter book and it literally gives it to you verbatim (Which I've gotten it to do with the right prompts). There's no world where the legal system gives Meta a right to license out content that never belonged to them in the first place.
jcranmer
In the US, it's not an open question. Feist v Rural holds that any work needs to possess a minimum spark of (human) creativity to be able to be copyrighted; data collected by the "sweat of the brow" is explicitly not allowed to be copyrighted. Thus things like maps and telephone books aren't really copyrightable (they do retain a "thin copyright" protection, but in the present context, you're going to say that the code has copyright but the model weights do not). Most European jurisdictions do recognize a "sweat of the brow" doctrine, and they could be copyrightable there.
What's not clear is whether or not the model weights infringe the copyright of the works they were trained on.
grumpymuppet
This seems like a reasonable position to take. Can you copyright the contents of a vacuum bag after pouring it out on a gallery floor as "art"?
Did you have any meaningful hand in constructing the contents?
jfarina
Seriously. Can I copyright 34 * 712 * 9.2 * pi? I didn't think you could.
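For what it's worth, the value in that rhetorical question really is just a deterministic computation that anyone can reproduce identically, which is the crux of the "no human authorship" argument. A trivial sketch (the numbers are simply the ones from the comment above):

```python
import math

# A purely mechanical computation: anyone running this gets the
# identical result, with no creative choices involved. This is the
# analogy being drawn to "weights as a mechanical transform of data".
value = 34 * 712 * 9.2 * math.pi
print(value)
```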
markisus
I wonder what Meta’s attitude towards copyright in general is. Last I heard their lawyers were trying to say that it’s all good to pirate copyrighted works to make machine learning models.
mewse-hn
A lot of posts here are saying this is irrelevant or whatever, but Unix was mostly developed out in the open and got into all sorts of trouble when big companies started cracking down on copyright ownership. This blog post isn't entreating you to necessarily behave differently w/r/t your Llama usage, just to be aware that the license is restrictive, doesn't really line up with Meta calling it "open source", and could create consequences down the road. The post doesn't even mention it, but there are safer models to use right now (DeepSeek) that have permissive licensing.
Groxx
@dang the title rename seems worse: "you're probably ..." is rather meaningfully different than "breaking the ..." as the latter sounds like it's instructions on how to "break the ..."
mkl
This was posted by the author a couple of weeks ago, but didn't get any traction: https://news.ycombinator.com/item?id=43504429
NoahZuniga
The post states:
> One example where this requirement wasn't violated, is on build.nvidia.com
But "Built with Llama" isn't shown prominently there, so this is actually an example of a violation of the license.
thot_experiment
That's just like, your opinion man. This entire discussion and blog post are purely a fun distraction, legal contracts don't work how programmers think they work. The only definition of "prominently" that matters is the one the judge rules on when Zuck sues you.
dangus
Nvidia is free to license Llama from Meta under a different license, are they not?
NoahZuniga
Yes, but the post gives that as an example of something that follows the license. So even if it's not illegal because Nvidia has a different license, it isn't a good example.
dangus
Yeah, basically what I'm saying is that we can't even guarantee that it's an attempt at compliance with this specific license because it's a major corporation that may want to use the software under a different license negotiated privately.
cmacleod4
A strange assertion considering I'm not using this "Llama" thing :-/
janalsncm
If a rogue laptop violates the Llama license and no one is around to enforce it, did it really break the Llama license?
Seriously, I genuinely wonder what the purpose is of adding random unenforceable licenses to code/binaries. Meta knows people don't read license agreements, so if they're not interested in enforcing a weird naming convention, why stipulate it at all?
mertleee
The guy who wrote this article is likely the only person to self-report to the police that he might have broken a dumb rule published by a huge company that manipulates and deceives millions of people every minute... haha
antirez
With Gemma 3, Qwen, the new Mistral, and DeepSeek all being stronger than Llama at the same size, why would one risk issues?
diggan
> Gemma3
Funnily enough, Gemma 3 also probably isn't "open source" if you have a previous understanding of what that is and means, they have their own "Gemma Terms of Use" which acts as the license. See https://ollama.com/library/gemma2/blobs/097a36493f71 for example.
antirez
True, but it's a lot more "soft". They basically say do what you want as long as you don't violate this list of prohibited uses, which is not open source of course, but the list of prohibited uses consists, mostly, of things already against the law or in a very gray zone.
ai-christianson
Deepseek are the kings currently.
punnerud
The same Berne Convention applies to Meta/Llama as they use to scrape the web. You can use derivative work or summaries without reference. And copyright only applies to work done by a human?
So when they say you have to reference Llama, it doesn't actually apply in most countries?
lolinder
This is why I've always considered the weights-vs-source debate to be an enormous red herring that skips the far more important question: are the weights actually "Open" in the first place?
If Meta released everything that the most zealous opponents of weights=source demand they release, under the same license that they're currently offering the weights under, we'd still be left with something that falls cleanly into the category of Source Available. It's a generous Source Available, but it removes many of the freedoms that are part of both the Open Source and Free Software Definitions.
Fighting over weights vs source implicitly cedes the far more important ground in the battle over the soul of FOSS, and that will have ripple effects across the industry in ways that ceding weights=source never would.