EuroLLM: LLM made in Europe built to support all 24 official EU languages

adzm

For those curious, the 24 official languages are Bulgarian, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Greek, Hungarian, Irish, Italian, Latvian, Lithuanian, Maltese, Polish, Portuguese, Romanian, Slovak, Slovenian, Spanish, and Swedish.

Maltese, interestingly, is the only Afro-Asiatic derived language.

Hungarian, Finnish, and Estonian are the three Uralic languages.

All the others are Indo-European, Greek being the only Hellenic one, Irish the only Celtic, the rest are Baltic, Slavic, Italic, or Germanic.

(I originally used the term Balto-Slavic, though I was unaware of some of the connotations of that term until just now. Baltic and Slavic do share a common origin, but that was a very very long time ago)

arbuge

> Maltese, interestingly, is the only Afro-Asiatic derived language.

It's Semitic, to be precise.

https://en.wikipedia.org/wiki/Semitic_languages

UebVar

Arabic, even. An outlier, as it is AFAIK the only Arabic dialect that is not written with the Arabic alphabet. It's also far removed from other Arabic dialects.

Vinnl

Tomorrow there are elections in the Netherlands, and two parties are proposing adding Frisian to that list: https://neerlandistiek.nl/2025/10/kies-voor-taal/

Best get to retraining those models.

tecleandor

AFAIK, they are trying to get Frisian added to the "European Charter for Regional or Minority Languages", not the official language list.

They get certain recognition, but they are not official in Europe. For example, just from Spain there are 13 languages on that list.

mikrl

As a Brit I feel very at home when hearing/reading Dutch and Frisian. It’s a reminder that England and the Low Countries share a lot of close history all the way back to Anglo-Saxon times; of being fishers, traders, burghers and mercenaries moving around the North Sea chasing opportunities, spreading and augmenting languages.

“Brea, bûter en griene tsiis is goed Ingelsk en goed Frysk”

tannhaeuser

> However modern standard Dutch (Nederlands, Hollands) is based upon Franconian, rather than Saxon dialects.

> Some of these [Old Saxon] speakers took part in the Germanic conquest of England in the fifth century AD. While it is not true that English and Plattdeutsch derive completely from the same source, the Old Saxon input into Anglo-Saxon was of primary importance and this linguistic group contributed greatly to the Anglo-Saxon dialects which our English forefathers spoke.

[1]: http://www.plattmaster.de/plattoew.htm

tirant

Not only in the language but also in gastronomy and architecture. When I see old towns in the UK I usually think of Dutch towns, just without any biking infrastructure.

RobotToaster

If you've ever read anything written in Old English, it's even closer to Dutch.

przemub

Each EU country nominates one official language for the EU, otherwise we'd have Catalan, Breton, Kashubian and many more.

Levitz

Well, this was 4 days ago, Spain in talks with Germany regarding the addition of official languages:

https://www.politico.eu/article/catalan-basque-galician-boos...

rsynnott

They could get Austria to do it, as it presumably has a spare slot.

runarberg

Is English a legacy official language then, from the time the UK was a member? (I’m guessing Ireland nominated Irish instead of English.) As an aside, it feels very un-EU to push this limitation, as I was under the assumption that the EU was all about celebrating (European) diversity.

piltdownman

Including the nasty political side-show that is Ulster Scots - literally only brought in as a chilling effect 'whataboutism' to diminish support when Irish speakers ask for language rights in Northern Ireland.

https://www.reddit.com/r/northernireland/comments/1fivtob/no...

sigmar

It should be noted that the Netherlands can't unilaterally make changes. Spain has been trying to push for languages to be added and hasn't had any luck.

Vinnl

Haha I just added it as a fun fact, I don't actually believe folks will need to start retraining things, or that this is likely to be at the top of the priorities list for anyone. Party programmes are aspirational anyway.

ginko

Just do a 50:50 mix of the German and Dutch model weights.

Vinnl

Oops, accidentally made the model speak Limburgish.

purrcat259

I read, write and speak Maltese, AMA if you are curious about the language.

Raed667

Tunisians claim they can understand Maltese with minimal effort. Is it reciprocal? How close is Maltese to Arabic / the Tunisian dialect?

purrcat259

I don't have much personal experience in attempting to communicate with Arabic speakers. From others I have heard that Lebanese Arabic is the closest and that you can have a passable conversation.

arbuge

Not sure which Tunisians are claiming this but they'd definitely need a lot more than minimum effort. Maltese split off from Arabic around 1k years ago. The two languages sound pretty different, and are written with different alphabets.

franklin_p_dyer

Not a question, but - Tatoeba could use your help! It is an open source (both code and data) dataset of parallel sentences and their Maltese data is very lacking. Also it’s pretty fun to just translate a bunch of random sentences into a language you speak. :-)

https://tatoeba.org/

nxor

How are loan words viewed? Do businesses work in Maltese? Are monolingual speakers of the language regarded differently than those fluent in English? Do young people in Malta listen to Maltese music?

purrcat259

Maltese has been loaded with loan words since forever. 5 points if you can guess where bonġu, bravu and mappa come from. At some point there was some literary council for the language that decided that any new loan words should just be spelled phonetically. Computer became kompjuter.

Businesses do work in Maltese and English. Both are official languages. It's quite rare to encounter a business that deals near exclusively in Maltese. Many prefer Maltese but will fall back to English where necessary.

Regarding monolingual speakers, I think there are a lot of stereotypes for Maltese-only speakers, English-only speakers and code switchers. I think it's all a bit silly... As long as communication can happen I don't fuss.

On Maltese music... There's a lot of low-ish quality music, then there are a few absolute gems. Look up The Travellers, Lapes, Jon Mallia on YouTube/Spotify.

JAlexoid

Yes, there's plenty of Maltese spoken and listened to.

I was surprised to hear Maltese radio stations playing in taxis while visiting Malta just a few weeks back.

Tade0

How is "Marsaxlokk" really pronounced? I've heard that word a few times, but never from a native. Google translate can't help me here, as it doesn't seem to have Maltese text-to-speech.

purrcat259

Read with English pronunciation, closest would be mar-sa-shlock.

adzm

I'm actually really curious about everyday usage of the language; is code switching between English and Maltese more common than Maltese on its own? I've seen a few online communities where the vocabulary switches between Maltese and English very often which is interesting but I wonder how much of that is just online / written versus everyday speech.

purrcat259

Depends on where you live and how you were brought up, but for the most part code switching is default.

There was a point about 7 years ago when the Overton window shifted to "speak English to strangers first" because of a large influx of foreigners who did not know the language. Since then I've met foreigners who have better Maltese than some natives.

Older folks & geriatrics will sometimes be surprised when they assume someone is foreign and they turn out to be Maltese. "int Malti??" is a statement I get often because I don't look Mediterranean despite being born here.

ebb_earl_co

What is the name of Maltese in Maltese? Like “el español” in Spanish, it’s neat to know what languages call themselves

ggsp

Wikipedia says it's "Malti"

kridsdale3

'ish' is a pretty universal English suffix. So Spanish is just "españ-ish".

runarberg

Is there any dialect of Arabic which you can understand without too much effort?

How much do you consider Maltese its own language (as opposed to a dialect of Arabic)?

notahacker

I know that the reverse understanding isn't too bad from chatting with a Saudi-born member of staff on holiday in Malta.

I don't think anyone would seriously consider it a dialect of Arabic, though, given its completely different alphabet and half the vocabulary and morphology coming from Italian languages/dialects, even if Malta hadn't spent the best part of a millennium trying very hard not to become part of the Arab world.

cm2012

Can you communicate with Maltese dogs more effectively?

purrcat259

Only if we have a few Maltesers first

ks2048

From other comments, it seems many people don't realize that the model covers 11 more languages beyond the 24 official ones (this is mentioned in the paper):

Arabic, Catalan, Chinese, Galician, Hindi, Japanese, Korean, Norwegian, Russian, Turkish, and Ukrainian.

sva_

Seems like the model isn't limited to those though, from the paper:

> as well as some additional relevant languages (Arabic, Catalan, Chinese, Galician, Hindi, Japanese, Korean, Norwegian, Russian, Turkish, and Ukrainian).

https://arxiv.org/pdf/2409.16235

The paper also goes into detail on training set sources; I feel the curation of those might be considered the main contribution of this publication?

amarant

I find it interesting that Norwegian isn't on the list.

I have often joked that Norwegian is just a dialect of Swedish, but I never expected to get official validation like this!

rcbdev

Norwegian is not on this list, because in fact no country with Norwegian as their national language is part of the European Union at the time of writing.

emil-lp

Norway isn't in EU, though.

bdhtu

Norway isn't in the EU.

jim180

Lithuanian and Latvian are Baltic languages. Nothing to do with Slavic...

Telaneo

asveikau

See the section "historical dispute".

I think some people get touchy about them being lumped together if their last period of commonality (per the article) was 1400 BCE. For comparison, I believe all the Slavic languages were mutually intelligible around 1200 AD. But much more recently than this, in the last few centuries, there have been notable attempts by East Slavs to absorb the Baltic language cultures and deny them.

kaato137

Balto-Slavic branch divides into Baltic and Slavic language groups so nothing wrong here

sublimefire

It is just one of the theories, there is no clear evidence to suggest that Baltic and Slavic were the same language thousands of years ago.

kreetx

Yup, most of Eastern Europe is Balto-Slavic. While the division from the Eastern Slavic languages (Russian, Belarusian, Ukrainian, etc.) is distant, they are still Slavic. From Eastern Europe, only Estonian is not a Slavic language.

Tade0

Plenty of wrong here, considering Lithuanian and Latvian are utterly unintelligible to Slavs, save for loanwords, whereas Slavic languages retain some level of intelligibility among themselves, which even spawned two competing constructed languages.

adzm

I was thinking about separating the two groups when I was writing this but was afraid of getting too verbose, though in retrospect that probably would have made more sense regardless of the historical lineage. My apologies if this came off as inconsiderate.

I updated my original comment, and learned a good amount about that dispute as a result, so thanks for calling it out.

_kidlike

In Greek we call our language Hellenic, and our country Hellas. "Greek" / "Greece" don't exist in the Hellenic language.

ranadomo

> Γραικοί, Graikoí were an ancient Hellenic tribe

https://en.wikipedia.org/wiki/Graecians

whimsicalism

Actually nuts to me the degree to which European policymakers do not even begin to understand how to kickstart technologically-intensive industry. Anyone who has seen close-up the results of a "pick the winners" grant-style approach to innovation knows what will go wrong here.

Also funny to read this narrative of how access to the European 'supercomputer' cluster is going. https://x.com/levelsio/status/1981485945745788969

dzikimarian

While the grant process in the EU isn't fun, I think Levels has a bit of an ego issue. He mentioned that if he had issues like that on e.g. X, he would see Elon himself in the replies.

While he is great at converting his influencer status into income from his micro-SaaS projects, I don't think running ad-fueled browser games on a state-sponsored supercomputer should really be the aim of these grant programs.

alecco

He is 100% right on this one, from my personal experience trying to figure out the EU. Lawyer bureaucrats manage funds behind red tape clearly meant to be for their pals.

All this while the EU is running out of funds and in a process of de-industrialization. There should be an independent corruption investigation into Brussels.

dzikimarian

I took part in applications for EU grants a few times, and our company group did it many times over the years.

It's bureaucracy, often bordering on stupidity. You may need advisors to navigate all their forms & processes. But it certainly isn't a "pals-only" type of deal.

On the other hand - is it harder than getting VC funding? For a seasoned founder with a reputation - probably. For a fresh startup - probably not.

bjourne

Of course there is red tape. EU funding comes from taxpayer money and we want it to be spent wisely. The red tape is precisely to prevent it from being funneled to pals. EU has funded quite a few free software projects so it's not like the red tape is an insurmountable burden: https://www.ri.se/en/news/blog/europes-digital-future-spells...

whimsicalism

I'm actually no fan of his, so that's fine. That said, I went to the actual website he was talking about (I'm also an EU citizen) and in this case it is exactly as described and bordering on comical.

troupo

It's not even close to how he described it.

antman

What are the effects of a pick-the-winners strategy? Sounds intriguing.

softwaredoug

Is the point of these policies to pick winners? Or to upskill the creators and stimulate the economy by giving possible entrepreneurs experience Europeans can't get in big tech?

In the US, some ex-Googler might found a startup. Europe doesn't have the equivalent of FAANG. (Europe-wide companies are not quite as easy as US-wide)

Even if the super computer itself "fails", is the goal actually the secondary impacts to the economy?

(And in the US, we do our own fair share of picking winners / losers, especially in the current regime)

deaux

> What's REALLY much more important though if you want to be a part of the AI race and I've posted for years here with @euaccofficial is to make Europe a really extremely attractive place to start and run an AI business. Remove regulatory obstructions and give tax discounts for startups. Let them build a business first that can compete worldwide and once they make enough money (let's say $100M/y), then slowly start adding regulation.

When you talk to most EU business owners, even in tech, the limiting factor isn't regulations. This being the #1 reason is such a tired trope.

Ironically, China has in some ways a bigger regulatory burden when it comes to software, as there if the government doesn't approve the business is dead in the water. I doubt that Klarna would've gotten off the ground there, for one, I could see them being shut down much earlier there. In the EU only now very slowly are some governments even starting to talk about some weak measures around their business model. But I've never, not once in my life, heard "Chinese software companies can't get off the ground due to the regulatory burden".

The same people who clamor about the EU regulations are the ones who hate on the EU for their protectionist measures against US tech. Yet another bout of irony here - China's software industry has flourished exactly thanks to 10 times stronger protectionist measures against US tech. So has Korea's, and their protectionism has never even been anywhere near the China level, more in between the EU and China. No, if there's anything that would help, it's much more tech protectionism in the EU.

Pieter Levels is at the end of the day an influencer, not a serious founder.

pier25

> When you talk to most EU business owners, even in tech, the limiting factor isn't regulations.

I have a tech startup in Estonia and I agree. To me the biggest limiting factor is lack of funding.

moffkalast

Yep, VCs don't exist here. Plus the absurd starting costs, it's like what, 20k to set up a GmbH?

clickety_clack

It’s probably the people who didn’t start a business in the EU that you want to talk to. Like, I’m European, but I started my company in the US because everything is so much easier here.

deaux

Where in Europe and where in the US? You probably started one in the easiest US state to do so. Did you try starting one in the easiest EU state? Otherwise we already can't take things very seriously.

Secondly, what's easier besides VC funding? If it's VC funding, the disparity there has nothing to do with regulations - guess how much VC funding the non-EU rest of the world gets.

lukan

What would you want to see changed to consider coming back?

sofixa

> but I started my company in the US because everything is so much easier here

Which part is easier? That you have 50 different states with slightly varying laws to consider (e.g. Californian Data protection)? That you have a byzantine system of "benefits" to choose and manage?

And compared to where? Germany or Estonia or Sweden or Spain? The complexities will vary wildly depending on the country (kind of like in the US, where lots of companies pick the state to base themselves in based on the combination of favourable laws and precedents and taxes).

whimsicalism

> When you talk to most EU business owners, even in tech, the limiting factor isn't regulations. This being the #1 reason is such a tired trope.

Okay, what is the limiting factor? Because when I talk to EU business owners (admittedly, very few) - they point to lack of big EU capital markets, which is directly downstream of the policy environment. And when I talk to top EU human capital, they all point to the lack of competitive wages. There's a real difficulty in allocating capital to talented humans.

And, at least in Southern Europe, the income tax schedule is so aggressive it's hard to justify continuing working in many of these countries if you are highly talented.

Like, if you can tell me what the induced operator norm from l_2 -> l_2 is - probably you should come to the US and work at a big lab and make bank. What can you do in Portugal, Italy, Spain, etc.??
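(For reference, the quantity asked about here is the spectral norm: the operator norm induced by the Euclidean norm equals the largest singular value of the matrix. The symbols A, x and sigma_max below are standard notation and are not part of the original comment.)

```latex
\|A\|_{2 \to 2}
  \;=\; \sup_{x \neq 0} \frac{\|Ax\|_2}{\|x\|_2}
  \;=\; \sigma_{\max}(A)
  \;=\; \sqrt{\lambda_{\max}(A^{*}A)}
```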

> Pieter Levels is at the end of the day an influencer, not a serious founder.

Sure, agreed.

I think it is a complete misreading to point to protectionism as the reason for Chinese success, but having a big unified domestic market for consumers along with massive saving rates and capital controls probably does help.

actionfromafar

One fairly large factor is that even though English is much more common today, you just can't operate (depending on the product of course) in many countries without having customer support, documentation etc in the local language.

KaiserPro

Money.

Why work in the "europoor" countries when you can go to America and earn megabucks?

sofixa

> Okay, what is the limiting factor

A few.

A big part is that the EU is a collection of countries that (with very few exceptions) have different languages and laws. For a company to serve Spain and France, for instance, it would need to translate everything, hire local lawyers and customer support agents. Considering the much smaller size of the countries (biggest one is 70 million vs 330 million in the US), the opportunity for "unlimited" growth is limited.

This also rebounds in the fact that when an American company makes it big, they have the resources to flood other EU markets and be cheaper/better than the local competition due to economies of scale and money based on their big successful US market. A French company making it big is still small compared to a US equivalent.

Then, there's the capital markets, no denying that. The money being thrown around the US is like nowhere else on the planet. Some of it definitely a bubble / unrealistic, but that doesn't matter. But in part it's because of the size of the total potential market that this is justified.

Education / national mythology also plays a part, I think (this is pure conjecture now). In the US, the "American Dream", "everyone can make it" etc is heavily ingrained. It propagates through the world with the help of Hollywood and other American cultural exports. In most EU countries, there isn't such a heavy emphasis on independence and "pulling yourself up by your bootstraps". "Hustle culture" isn't a thing. So for most people, it isn't something that comes naturally to them to start a company and work 100 hour weeks to be big and rich and successful and famous.

That's not to say there aren't such people, I went to 42 and have been to Station F and know some people in that universe. A decent proportion of my classmates wanted to make their startup and make it big, and some did end up starting their own companies.

deaux

> I think it is a complete misreading to point to protectionism as the reason for Chinese success, but having a big unified domestic market for consumers along with massive saving rates and capital controls probably does help.

Capital controls are protectionist measures, but anyway, no.

> Okay, what is the limiting factor?

Let's look at which countries have a significant local software industry compared to population size.

- China

- US

- Korea

- You can argue for Japan and India but that's already starting to stretch.

- Yup, effectively nowhere else. Even in an "out of the way" place like Myanmar everyone uses Meta, with a nice little genocide to show for it. Sure, in Vietnam they use Zalo, and other places have a few other local players. But most of the famous US tech apps are dominant.

Is the EU the outlier here? No. Everywhere else US tech dominates. Meta, Netflix, Apple, Google, Uber, Spotify, Microsoft, Match Group, Paypal, Amazon, and on and on. They don't just dominate the EU, they dominate the world.

Except for the countries I named above, where at least some of the markets that US big tech competes in, instead have bigger local players. And even there, guess what?

Their market share is almost 1:1 linearly correlated to the degree of protectionism in those countries, all the way from China, then Korea, then India/Japan, and then everywhere else! Who woulda thought!

Why does Korea have much less US tech dominance than, say, Germany? Despite German companies theoretically having a big advantage: the German public is 100x more privacy conscious than the Korean one, and much less trusting of US companies.

I can tell you that it's not fewer regulations; Korea's GDPR is much more onerous than the EU's and so are investment regulations. On every single regulatory aspect, German software startups have it easier. But they were never protected. US tech was allowed to waltz in and dump their products. That's what they did; it's hilarious how China "dumping" EVs and solar is suddenly an issue now, when it's exactly the strategy that US tech continues to this day; the AI companies are doing it right now! And the Korean companies were protected, both by the rules burden, which local companies had to deal with too, and by intentional protectionism.

When it comes to solar and EVs, we all understand that a foreign country dumping their goods kills local industry. It's the exact same with software.

But then half of HN has millions in the bank exactly thanks to the above - this is where all those fat SV salaries have come from - so I do get the lack of desire to understand it.

greg_V

Tbh, a lot of EU protectionism vs. US tech seems not to keep the competition out. In fact, with the amount of free press US startups get and the size of their coffers, they can simply roll over the local competition in EU markets most of the time.

What it's terribly good at is adding burdens that the US giants don't face early on, slowing down the early growth between 28 fragmented markets. I don't know specifically about how China works, but the question is proving product-market fit, and for that, you need a lot of users fast.

In the EU, it's a different battle country to country as the media environment, the markets, the regulation etc. are all fractured.

tinco

Yeah no, it's just not how it works. They're trying to support fundamental research and they have limited resources to accomplish that. Some random dude who wants to build a company that generates pretty AI pictures is just not the target audience, and he rightly got rejected.

And frankly, the dream scenario that Pieter describes where he somehow would qualify for these resources also wouldn't help kickstart the tech industry, and it's also not how it works in the states.

What does help, and what European governments (at least the one in The Netherlands that Pieter is from) actually do, is more funding for startups. If you're a startup founder in NL almost every angel you talk to has a matched funding deal with the government. That's such a smart way of keeping up with the US. Do you think US startups get free compute from the government? They don't even get subsidies most of the time. What they get is better funding because there's more capital available, and helping investors with that is exactly how you solve that.

whimsicalism

I don't think what you're saying is inconsistent with what I'm saying. I think you are making a big deal out of the difference between state investment funds and subsidized GPUs but I think they basically work by similar mechanisms.

logifail

> What does help, and what European governments (at least the one in The Netherlands that Pieter is from) actually do, is more funding for startups. If you're a startup founder in NL almost every angel you talk to has a matched funding deal with the government. That's such a smart way of keeping up with the US.

Does government offering matched funding to investors actually help startups who are struggling to find (any) funding? If a startup can't find (any) funding, matching is irrelevant.

> Do you think US startups get free compute from the government? They don't even get subsidies most of the time. What they get is better funding because there's more capital available, and helping investors with that is exactly how you solve that.

Umm. I'm not really convinced that the political elites in Europe understand how to do any of this stuff well.

See also: https://www.eib.org/en/publications/online/all/the-scale-up-...

saubeidl

This guy spreads FUD about the "unelected commission". What a loon.

troupo

Levels is engagement farming. Instead of uncritically reposting him you could've gone ahead and read what the cluster is for: https://x.com/dmitriid/status/1982927767286231403

Cluster: for public benefit, cutting edge research in biotech, medical, robotics.

Levels: I want to create AI photos of people for my AI Slop startup

whimsicalism

> Cluster: for public benefit, cutting edge research in biotech, medical, robotics.

That's not what the quoted paragraph says and you can read the whole release if you want: https://ec.europa.eu/commission/presscorner/detail/en/ip_25_...

troupo

I literally quoted the paragraph from this link in the tweet I provided. Edit: lol, I didn't, I quoted it from a policy document, not from the press release. However, my point stands:

--- start quote ---

Apply AI Strategy

The Apply AI Strategy aims to harness AI's transformative potential by driving adoption of AI across strategic and public sectors including healthcare, pharmaceuticals, energy, mobility, manufacturing, construction, agri-food, defence, communications and culture. It will also support small and medium-sized enterprises (SMEs) with their specific needs and help Industries integrate AI into their operations.

--- end quote ---

I also quoted a paragraph from a document I will find when I'm not on mobile.

Levels literally wants to train AI Slop: https://x.com/levelsio/status/1981499900266193028

--- start quote ---

Train a foundational model for AI photos of people

--- end quote ---

webdevver

EU grifting is so much worse than even the most brazen Trumpian crypto pump n' dump.

Genuinely repugnant. At least the Trump admin has the decency to pump everyone's 401k...

I'm trying to figure out why it bothers me so much. I think it's because the EU are such unbelievable losers in everything they do. They can't even grift, that's how useless they are. They can't even steal properly. It's so undignified, and offensive to the senses.

whimsicalism

Wouldn't go that far. EU policymakers have good intentions, I believe - but ultimately are products of their environment and cultural inclination.

The EU is such a bizarre place because they treat capital and entrepreneurs with such massive distrust, but never really bothered getting rid of the quasi-static entrenched hierarchies from feudalism? Like I'll go to the UK or France and there will just be massive swathes of land owned by the nobility or 'former' nobility? Maybe start there but let your high-value human capital earn a good wage?

coolewurst3000

You are wrong in that you think the hierarchies stem specifically from feudalism, but you are absolutely correct in that these hierarchies exist and are deeply entrenched. Sweden and Germany have one of the lowest percentages of self-made vs. inherited fortunes in the western world. Actually some tax policies in the US enable much more upward mobility, such as real estate taxation and 401k-like vehicles.

sofixa

> France and there will just be massive swathes of land owned by the nobility or 'former' nobility

Yeah, no, this isn't even remotely true.

Stagnant

Title is missing "(2024)". The 9B model was released last December[0].

0: https://sites.google.com/view/eurollm/home

htrp

>The EuroLLM Team brings together some of the brightest minds in AI including Unbabel, Instituto Tecnico Lisbon, the University of Edinburgh, Instituto de Telecommunicacoes, Université Paris-Saclay, Aveni, Sorbonne University, Naver Labs, and the University of Amsterdam.

>Europe is the only continent in the world to have a large public network of supercomputers that are managed by the EuroHPC Joint Undertaking (EuroHPC JU). As soon as we received the EuroHPC JU access to the supercomputer, we were ready to roll up our sleeves and get to work. We developed the small model right away and in less than 6 months the second model was ready.

[1] https://www.eurohpc-ju.europa.eu/eurohpc-success-story-speak...

Repurposing some of that physics sim compute

loandbehold

Aren't all frontier models already able to use all these languages? Support for specific languages doesn't need to be built in, LLMs support all languages because they are trained on multilingual data.

tensor

No, that's not how training works. It's not just about having an example in a given language, but also how many examples and the ratio of examples compared to other languages. English hugely eclipses any other language on most US models and that's why performance on other languages is subpar compared to performance on English.
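To make the ratio point concrete, here is a minimal sketch of exponent-smoothed language sampling, a widely used way to rebalance a multilingual training mix. This is an illustration only, not EuroLLM's documented recipe: the function name `sampling_probs`, the exponent `alpha=0.3`, and the token counts are all hypothetical.

```python
def sampling_probs(token_counts, alpha=0.3):
    """Return per-language sampling probabilities proportional to count ** alpha.

    alpha = 1.0 reproduces the raw corpus proportions; alpha < 1.0 flattens
    the distribution, so low-resource languages are sampled more often.
    """
    weights = {lang: count ** alpha for lang, count in token_counts.items()}
    total = sum(weights.values())
    return {lang: weight / total for lang, weight in weights.items()}


# Hypothetical token counts, not taken from any real corpus.
counts = {"en": 1_000_000, "de": 100_000, "mt": 1_000}

natural = sampling_probs(counts, alpha=1.0)   # proportional to raw size
smoothed = sampling_probs(counts, alpha=0.3)  # low-resource languages upsampled

for lang in counts:
    print(f"{lang}: natural={natural[lang]:.4f} smoothed={smoothed[lang]:.4f}")
```

With a smaller exponent, Maltese gets a much larger share of each batch than its raw size would give it, at the cost of repeating the same Maltese data more often.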

voxgen

Ratio/quantity is important, but quality is even more so.

In recent LLMs, filtered internet text is at the low end of the quality spectrum. The higher end is curated scientific papers, synthetic and rephrased text, RLHF conversations, reasoning CoTs, etc. English/Chinese/Python/JavaScript dominate here.

The issue is that when there's a difference in training data quality between languages, LLMs likely associate that difference with the languages if not explicitly compensated for.

IMO it would be far more impactful to generate and publish high-quality data for minority languages for current model trainers, than to train new models that are simply enriched with a higher percentage of low-quality internet scrapings for the languages.

Byamarro

There's actually research showing that LLMs are more accurate when questions are in Polish: https://arxiv.org/pdf/2503.01996

andy12_

I have never noticed any major difference in performance of ChatGPT between English and Spanish. The truth is that as long as the amount of training data of a given language is above some threshold, knowledge transfers between languages.

melvinmelih

> because they are trained on multilingual data

But they were not trained on government-sanctioned homegrown EU data.

sunaookami

Who in their right mind would use this?

tensor

I'd use a model trained on a targeted and curated data set over one trained on all the crap on the internet any day.

saretup

The entirety of the internet vs government-sanctioned homegrown EU data.

tonyhart7

"But they were not trained on government-sanctioned homegrown EU data."

ok what are you implying on this

raverbashing

> But they were not trained on government-sanctioned homegrown EU data.

If none of the LLM makers used the very big corpus of EU multilingual data, I have an EU regulation bridge to sell you.

whazor

European governments have huge collections of digitalised books, research, public data.

But also European culture could maybe make a difference? You can already see big differences between Grok and ChatGPT in terms of values.

pembrook

If it's publicly available data, books and research, I can assure you the big models have already all been trained on it.

European culture is already embedded in all the models, unless the people involved in this project have some hidden trove of private data that they're training on which diverges drastically from things Europeans have published publicly (I'm 99.9% positive they don't...especially given Europe's alarmist attitude around anything related to data).

I think people don't understand a huge percentage of the employees at OpenAI, Anthropic, etc. are non-US born.

numpad0

Not natively, they all sound translated in languages other than English. I occasionally come across French people complaining about LLMs' use of non-idiomatic French, but it's probably not a French problem at all, considering that this effort includes so many Indo-European languages.

FinnKuhn

I can at least also confirm this for German. Here is one example that is quite annoying:

ChatGPT, for example, tends to start emails with "ich hoffe, es geht dir gut!", which means "I hope you are well!". In English (especially American) corporate emails this is a really common way to start an email. In German it is not, as "how are you" isn't a common phrase here.

lm28469

Meh, it depends a lot on the datasets, which are heavily skewed towards the main languages. For example, they almost always confuse Czech and Slovak and often swap one for the other in the middle of chats.

mirekrusin

But the only way to unskew it is to remove main language data because there isn't really any to add, no?

tensor

You can also correctly bias your sampling so that when selecting new training instances each language is chosen equally. Generally the diversity of data is good, unless that data is "wrong" which, ironically, is probably most of the internet, but I digress.
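A rough sketch of what that sampling bias could look like, assuming per-language document counts are known. The counts below are made up for illustration, and the `alpha` exponent is a common way to interpolate between proportional and equal sampling, not something the comment above specifies:

```python
import random


def sampling_weights(counts, alpha=0.0):
    """Return per-language sampling probabilities proportional to count**alpha.

    alpha=1.0 reproduces the raw corpus proportions; alpha=0.0 picks every
    language equally often, regardless of how much data it has.
    """
    scaled = {lang: n ** alpha for lang, n in counts.items()}
    total = sum(scaled.values())
    return {lang: w / total for lang, w in scaled.items()}


def draw_language(weights, rng):
    """Pick the language of the next training instance."""
    langs = list(weights)
    return rng.choices(langs, weights=[weights[l] for l in langs], k=1)[0]


# Hypothetical document counts, for illustration only.
counts = {"en": 1_000_000, "cs": 20_000, "sk": 5_000}

uniform = sampling_weights(counts, alpha=0.0)       # each language: 1/3
proportional = sampling_weights(counts, alpha=1.0)  # dominated by "en"

rng = random.Random(0)
next_lang = draw_language(uniform, rng)
```

Note the trade-off: with equal sampling, the small corpora get repeated far more often than the large one, so in practice people often pick an `alpha` somewhere between 0 and 1.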

RobotToaster

Aren't they about as different as American English and British English?

svobodovic

The difference is larger than, let's say, just a "dialect". They really are different languages, even though we generally understand each other quite well (younger generations less so). I've heard it's about as different as e.g. Danish and Swedish - not sure if that comparison is helpful.

intended

Nope. Capability begins to degrade once you move away from English.

Plus all your T&S/AI Safety is not solved with translation, you need lexicons and data sets of examples.

Like, people use someone in Malaysia, to label the Arabic spoken by someone playing a video game in Doha - the cultural context is missing.

The best proxy to show the degree of lopsidedness was from this : https://cdt.org/insights/lost-in-translation-large-language-...

Which in turn had to base it on this: https://stats.aclrollingreview.org/submissions/linguistic-di...

From what I am aware of, LLM capability degrades once you move out of English, and many nation states are either building, or considering the option of building their own LLMs.

adt

The EuroLLM-9B model release is from Dec/2024, and scores just above random chance for benchmarks like MMLU-Pro (17.6%, random chance is 10%).
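To put the 17.6% in context, here is a small sketch that rescales accuracy so that guessing maps to 0 and a perfect score maps to 1. It takes the 10% chance level from the comment above, i.e. it assumes ten answer options per question; the function names are just for illustration:

```python
def chance_accuracy(num_options):
    """Expected accuracy when guessing uniformly among num_options answers."""
    return 1.0 / num_options


def chance_adjusted(accuracy, num_options):
    """Rescale accuracy so that random guessing is 0.0 and perfect is 1.0."""
    chance = chance_accuracy(num_options)
    return (accuracy - chance) / (1.0 - chance)


score = 0.176   # MMLU-Pro score quoted above
options = 10    # assumed from the quoted 10% chance level

margin_points = (score - chance_accuracy(options)) * 100
adjusted = chance_adjusted(score, options)

print(f"{margin_points:.1f} points above chance, {adjusted:.1%} chance-adjusted")
```

So the model sits 7.6 percentage points above guessing, which is roughly 8% of the way from chance to a perfect score.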

Comparison with similar EU models + 600 other highlights:

https://lifearchitect.ai/models-table/

supermatt

> It is fully open source and available via Hugging Face.

This model was released in 2024, and I couldn't find any links to the training data - is it just an open weights model?

srameshc

I was thinking the same: why are so many superior models coming only from countries like the US and China? And why are European countries not on the list, other than France with Mistral? Why are so few companies in India, Japan, or South Korea even close to a promising new model like what Chinese companies did?

nonethewiser

"Why" is a fair question but are you surprised? Europe is consistently behind in tech.

Europe has about 1.3 times the population of the USA and about 75% of the GDP yet EU tech output is a very small percentage of US tech output. We are not talking about 70, 50, 30, or even 20%. It's a drop in the bucket.

>The seven largest U.S. tech companies, Alphabet (Google), Amazon, Apple, Meta, Microsoft, Nvidia, and Tesla, are 20 times bigger than Europe’s seven largest, and generate 10 times more revenue.

https://eqtgroup.com/thinq/technology/why-is-europes-tech-in...

"Why" is a good question, but I definitely wouldn't expect significant competition in LLMs from Europe based on the giant tech disparity. Having 1 non-cutting edge model that isn't really competitive is pretty much what I would expect.

InsideOutSanta

> The seven largest U.S. tech companies (...) are 20 times bigger than Europe’s seven largest, and generate 10 times more revenue.

I'm going to guess that this part is intentional. Europe tends to be more aggressive in enforcing antitrust laws. Economically, Europe's goal isn't to have the biggest companies but to have more smaller companies.

So you're not going to get companies like Google, but you will get companies like Proton, Spotify, Tuta, Hetzner, Mistral, Threema, Filen, Babbel, Nextcloud, CryptPad, DeepL, Vivaldi, and so on.

nonethewiser

>I'm going to guess that this part is intentional. Europe tends to be more aggressive in enforcing antitrust laws. Economically, Europe's goal isn't to have the biggest companies but to have more smaller companies.

So is your hypothesis that the total market cap of EU tech companies is something like 50, 60, or 70% of total US tech market cap? Something significantly different than the ~10% implied by that figure (largest US companies 10x largest EU companies). And it's just more broadly distributed?

Hard to find data on this but this is showing EU tech market cap at 3.2T. https://www.stateofeuropeantech.com/chapters/outcomes

Whereas this is saying the US "megacaps" ($200B+) are at 21T. https://www.cnbc.com/2025/09/05/tech-megacaps-worth-market-c...

Which puts the entire EU tech market at 15% of the US megacaps. Not even the entire market.
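For anyone checking the arithmetic, a quick sketch using only the two figures quoted above (3.2T and 21T; the variable names are mine):

```python
# Figures as quoted in the comment above, in trillions of USD.
eu_tech_market_cap_tn = 3.2      # entire EU tech market
us_megacap_market_cap_tn = 21.0  # US tech megacaps ($200B+) only

share = eu_tech_market_cap_tn / us_megacap_market_cap_tn
print(f"{share:.1%}")  # 15.2%
```

The comparison is lopsided on purpose: the numerator is the whole EU tech market, while the denominator covers only the largest US companies.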

emporas

Also, commercial software is consistently behind open source.

I only use open source LLMs for writing (Qwen 32b from Groq) and open source editor of course, Emacs.

If some people can write better using commercial LLMs (and commercial editors), by all means, but they put themselves at a disadvantage.

The next step for me is to use something open source for translation (I use Claude for the moment) and for programming (I use GPT currently). In less than a year I will find a satisfying solution to both of these problems. I haven't looked deep enough.

loandbehold

Because training a frontier model is expensive, and only the US and China have the capital structure to raise tens of billions of dollars to do it.

lossolo

You can easily fit below 10 billion for the whole datacenter, then you only pay for electricity + maintenance + staff. 100k GPUs cost a few billion USD, that's more than enough to train frontier models, run experiments, and serve models in the EU to start. Look at what xAI did and how much it cost them, and it's more expensive to do in the US than in the EU.

busssard

Being able to train new frontier models is the new equivalent to nuclear capabilities.

I predict at some point countries will get CIA'ed when they publish plans to build a large data center.

Similar to the time when they got CIA'ed when announcing plans for new nuclear plants.

henriquenunez

They are already CIA'ed on a regular basis for much less than that.

sublimefire

As a European citizen I think it boils down to access to the capital. EU/EEA is not a country and the market is sort of fragmented. The big players are UK, France, Germany, everyone else does not have the same access to money as say in the US. Folks want to do it but there is a glass ceiling. Hence you have these collabs among large institutions to tap into funds such as from Horizon which are academic in nature and do not translate well into products.

isodev

Because the value of these models is (actually) yet to be proven. Why saturate the market with something that we already have at least one of and others are selling as a service? No model provider (including the "big ones" like OpenAI) has been able to produce a viable business case. They're all literally running on government deals and investor money.

sunaookami

EU made a >900 page law about AI and patted themselves on the back for being "the first to regulate AI" (which was not even true, China had an AI law before and it's two pages long).

sajithdilshan

This cannot be stressed enough. In my experience working in multiple tech startups in Germany, the power that compliance, legal, and all the other second-line functions have over engineering is quite immense. Most of the time they act as a hindrance to innovation rather than a supporting factor.

This AI law is a clear example of that. Pencil pushers creating more obstacles for the sake of creating more obstacles rather than actually taking a pragmatic approach.

isodev

It's strange, my real life experience is very different than yours. Unless you're training AI to do something shady, it's really no bother at all. In fact, most of what the AI Act requires, you have to do anyway for a good model card.

apples_oranges

Does it even make sense? Just use the American or Chinese ones and adjust as needed. Where's the point in spending millions to build the same thing or worse?

t43562

Now that the big bets have been made, who wants to try to compete with them?

Zufriedenheit

EU officials should create an environment where abundant private companies can afford to put out many great open models instead of funding some selected individuals with taxpayer money.

rmoriz

Maybe we can call it "open weights" and not open source?

extraduder_ire

From the EuroLLM-9B page on Hugging Face:

>You need to agree to share your contact information to access this model

Is this common? I've never seen it on the site before, and it isn't on the smaller model. What are they collecting this information for?

ks2048

I'm not sure which models require this and why, but I've come across it. e.g. the llama models, https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct

trilogic

Great job, Thank you.

We support your work and offer backup and distribution. Here is a copy just in case: https://hugston.com/uploads/llm_models/EuroLLM-22B-Instruct-...