AMD Announces "Instella" Open-Source 3B Language Models
29 comments · March 6, 2025 · MattTheRealOne
mdp2021
Why? A great part of the important material for knowledge and mandatory intellectual exercise is not "open" (possibly "accessible", yet still under copyright).
rafram
It’s difficult to argue that a model is truly “open” if the creator won’t even tell you what they trained it on. Even as companies like Meta argue that training on copyrighted material is OK, they still don’t want to openly admit that that’s what they did - and providing their training data, which would likely be a giant list of books from LibGen, would give the game away.
perfmode
They stated it in the original LLaMA paper.
esafak
That's not AMD's fault. They are abiding by the law. If IP holders licensed or shared their data, AMD could train on them.
t-3
It's important to note that AMD is an IP company, Meta is a data company. AMD would be shooting itself in the foot if it normalized flagrantly violating the IP of others. Meta doesn't care about IP, they just want to sell ads and data.
kubb
This is important for AMD presumably because it demonstrates that ML can be practically done on their hardware.
Most likely it's part of a strategy to dislodge Nvidia as the leading AI chip supplier, and AMD is in a position to try.
How well will it work? I don't know enough about these companies to tell.
echelon
"Fully Open"
> The Instella-3B models are licensed for academic and research purposes under a ResearchRAIL license.
Huge mistake.
This would have been an amazing PR win for AMD if they just gave it away.
Open models attract ecosystems. It'd be a fantastic sales channel for their desktop GPU hardware if they can also build increasing support for ML with their cards and drivers.
woadwarrior01
I suspect the release was for marketing reasons and not for winning developer goodwill.
woadwarrior01
Meh. Custom architecture with a non-commercial, research-only license. Qwen-2.5 3B also has a research-only license, but is way ahead of this model on almost all of the benchmarks.
bayindirh
Why is not being able to sell what others made such a big deal?
In my eyes, having a completely novel and reproducible model from end to end, including its dataset, is great news.
pama
Not GP, but in my opinion the license restriction matters because otherwise very few people will be able to try it. Hugging Face or other commercial providers cannot put it up online (even if they didn't charge for its use, these entities might benefit from publicity, so they might need to negotiate with AMD). I am not sure this model will make it to the lmsys leaderboard either, unless AMD provides an API endpoint and allows them to use it.
If you install it in an HPC center for non-industry researchers, you have to trust the new AMD code (few people will do a serious infosec analysis on it), and you have to make sure you exclude people who might have commercial interests (say, a student with a startup). It is not only the license that is slowing things down, but if the license were more general, the code would develop more smoothly, and things like vLLM might start to support it real soon.
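For concreteness, here is a rough, untested sketch of what "trust the new AMD code" means in practice. The model ID is taken from the license link further down the thread; everything else (prompt, generation settings) is assumed. Because Instella is a custom architecture, loading it through Hugging Face transformers presumably requires trust_remote_code=True, which executes modeling code shipped in the repo:

    # Hypothetical sketch: loading a custom-architecture model from the Hugging Face Hub.
    # trust_remote_code=True runs Python code bundled with the model repo, which is
    # exactly the code you would want to review before deploying it on shared HPC systems.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "amd/Instella-3B-Instruct"  # repo name as it appears in the license URL below

    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

    prompt = "Summarize the main restrictions of a research-only license."  # assumed example prompt
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=128)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))

The point is not the snippet itself but that the trust boundary sits inside the model repository rather than in a reviewed upstream library, which is why broader vLLM/transformers support would make adoption smoother.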
bayindirh
The license doesn't restrict hosting it (and explicitly allows third-party access), but might require a license wall to click through. I'm not sure where you inferred that from.
I'm not an AMD employee, so I can't speak to their API access.
People here see student startups; I see tons of non-commercial research networks from where I sit. So the license is not absurd from where I stand.
_aavaa_
Because they are heavily abusing the definition of "open source". Though that ship also sailed when it was decided that a model is "open source" when you get the final weights but not the exact training scripts and data.
bayindirh
> Though that ship has also sailed when it was decided that a model is “open source” ...
So we can't egg AMD on just because they did some things better and others worse?
They released a model from end to end and showed that they can compete now. Who cares about the business applications? That can come later.
The other actors abuse the definition of Open Source too, and not only in AI. So we should denounce the others with the same force, but we don't, because of the broken windows theory.
AMD is already an underdog, so whatever they do has no merit and is worthy of booing. Do the masses boo NVIDIA for their abuse of the ecosystem? Did Intel get the same treatment when they were unethically choking everyone else?
Of course not. Because they are/were the incumbents. They had no broken windows.
42lux
But they trained on open datasets…
woadwarrior01
I'm not trying to be flippant, but I'm genuinely curious. Have you read their license[1]? The terms are really broad and onerous even if one wants to use it for purely non-commercial, academic purposes.
[1]: https://huggingface.co/amd/Instella-3B-Instruct/raw/main/LIC...
bayindirh
> Have you read their license[1]?
Yes.
> The terms are really broad and onerous even if one wants to use it for purely non-commercial, academic purposes.
No. It's just legalese to prevent commercial use and abuse of the models, and to disclaim responsibility. As a person who sits in an academic research center, I see no problems at first blush.
OneDeuxTriSeiGo
It's just the RAIL Research-use licenses. Both the Research M and Research S license.
https://www.licenses.ai/blog/2023/3/3/ai-pubs-rail-licenses
https://www.licenses.ai/rail-license-generator
The intent of the license is to show the techniques used in the project and to provide the results in a form that they can be used to further development of other projects but not be used themselves.
The TLDR of the license is "here's the model and the sources. use it to help make your own models but don't use our model directly"
rbanffy
I understand their point in doing this - they want to use it as a sales enabler: they want you to buy GPUs to train their model for yourself, maybe tweak it with some non-free content, and then be able to use it commercially.
megadata
The entitlement is big and burly with this one.
m00dy
I'm curious how it stacks up against Phi-4-mini.
woadwarrior01
It's very far behind Phi-4-mini in performance, although Phi-4-mini is slightly larger (3.8B vs. 3B parameters).
It is great to see that they even trained it on open datasets. More AI models need to do this, especially if they market themselves as open.