
Open Euro LLM: Open LLMs for Transparent AI in Europe

jpdus

As someone who is generally skeptical of programs like this (and a European), there are two remarkable / timely things about it:

- This project doesn't just allocate money to universities or one large company; it includes top research institutions as well as startups and GPU time on supercomputing clusters. The participants are very well connected (e.g. also supported by HF, Together, and the like, with European roots).

- DeepSeek has just shown that while you probably can't beat the big labs with these resources, you can stay sufficiently close to the frontier to make a dent.

Europe needs to try this. Will it close the gap to the US/China? Probably not. But it could be a catalyst for competitive open-source models and partially revitalize AI in Europe. Let's see.

PS: Yesterday a screenshot circulated on Twitter showing that a new EU draft used "accelerate" six times. Maybe times are changing a little bit.

Disclaimer: Our company is part of this project, so I might be biased.

FanaHOVA

The problem is that:

- These are not really supercomputing clusters in LLM terms. Leonardo is a 250 PFLOPS cluster. That is really not much at all.

- If the people in charge of this project actually believe R1 cost $5.5M to build from scratch, it's already over.

sarusso

They allocated €37.4 million [1]. As a European, I truly don't understand why they keep ignoring that the money required for such projects is at least an order of magnitude more.

[1] https://digital-strategy.ec.europa.eu/en/news/pioneering-ai-...

esperent

DeepSeek's release has shown that there's no great risk in getting left behind. All the info is out there, people with the skills are readily available, and creating a model that matches whatever is currently considered frontier level is not that hard for an entity like the EU.

For everyone here shouting that the EU needs to do something, needs to be a leader: what have they lost so far by choosing to lead in legislation instead of development?

They've lost nothing. They've gained a lot.

They can use the same frontier level open source model as everyone else, and meanwhile, they can stay on top of harmful uses like social or credit scoring.

Also speaking as a European, legislation is kind of the point of a government in the first place. I do think the EU goes too far in many cases. But I haven't seen anything that makes me think they're dealing with this particular hype train badly so far. Play the safe long game, let everyone else spend all the money, see what works, focus on legislation of potentially dangerous technology.

tesch1

> lead in legislation

> legislation is kind of the point of a government

As an American, most of this post reads like doublespeak satire. I guess it's not, but just to put a transatlantic pov here.

I'll add a sports metaphor for good measure: in order to become expert football players, we'll get tickets to watch the best teams play.

esperent

> As an American, most of this post reads like doublespeak satire

Yeah, you guys have a lot of brainwashing to get over. I can imagine that you're deeply conditioned to read any outside views on politics as satire.

One kind of brainwashing is the need to reframe everything political into sports metaphors. The EU is not a sports team. It's a political entity. Whatever you might have been taught, these are very different things, with different needs. You can't have meaningful conversations about a political entity via sports metaphors.

Well, maybe in US politics you can. There you have two teams determined to beat the other at all costs. EU politics isn't like that. We are trying to work together, not kill each other.


Etheryte

Personally I'm rather happy that the allocation was not too large at first; even this is quite a sizeable sum. The EU is great at kickstarting projects that sound like a panacea but end up not leading to anything. Once they have something to show, by all means, throw more money at them.

nradov

The trap that these EU projects typically fall into is that they burn all of the grant funding on paying politically connected consultants to write reports. No one gets around to building an MVP.

everyone

Not anymore, with DeepSeek's stuff, right? Which is open.

sarusso

DeepSeek had plenty of R&D expertise that was not included in the (declared) model training cost. Here we are talking about building something nearly from scratch; even with an open-source starting point, you still need the infrastructure, expertise, and people to make it work, which will be hard to secure with that budget. Moreover, these projects take months and months to get approved, meaning this one was conceived long before DeepSeek, which highlights the original misalignment between the goal and the budget. DeepSeek might have changed the scenario (I hope so), but that would be a lucky ex-post event, not a conscious choice behind that budget.

blackeyeblitzar

DeepSeek probably spent closer to two billion on hardware. And then there's the energy cost of numerous runs, staff costs, all of that. The $5.5M figure was basically misleading info, maybe used strategically to create doubt in the US tech industry, or for DeepSeek's parent hedge fund to make money off shorts.

https://semianalysis.com/2025/01/31/deepseek-debates/


NunoSempere

They just shipped a frontpage; there is no model yet.

m3kw9

They want transparency but they give peanuts. Are they thinking they can just take Llama, distill other models into it, and call it transparent? I'm not sure they understand how that works.

Havoc

I'm all for more models. Go for it.

Xelbair

No model yet, so irrelevant; the EU is extremely late to the party. And I say that as someone from the EU.

ur-whale

> The models will be developed within Europe's robust regulatory framework, ensuring alignment with European values while maintaining technological excellence.

Translation: the weakest competitor in the contest enters the fight with both hands tied behind its back and a budget akin to what OpenAI spends in a week on compute.

But hey, more power to you Europe: the more models around, the better.

Eventually, we'll be able to bind all those censored models worldwide into one giant mixture of experts to get rid of the built-in censorship of each individual component.

mt_

Maybe €5 million for the HTML frontend.

moffkalast

> The models will be developed within Europe's robust regulatory framework, ensuring alignment with European values while maintaining technological excellence

As a European, that's practically an oxymoron. The more one limits oneself to legally clean data, the worse the models will be.

I hate to be pessimistic from the get go, but it doesn't sound like anything useful will be produced by this and we'll have to keep relying on Google to do proper multilinguality in open models because Mistral can't be arsed to bother beyond French and German.

Fnoord

I've been using Mistral over the past week due to changes in geopolitics, and it works absolutely great in English. I haven't bothered with my native language yet, but in English it worked great. Better than my first experience with ChatGPT (GPT-3.5), actually.

lnauta

I've been using Mistral for most of January at the same rate as ChatGPT before. I decided to pay for it, as it's per token (in and out), and the bill came yesterday... a whopping 1 cent. That's probably rounded up.

htrp

>Mistral can't be arsed to bother beyond French and German.

Any more details here or a writeup you can link to?

moffkalast

My own experience mainly: only Gemma seems to have been any good for Slavic languages so far, and only the 27B, unquantized, is reliable enough to be in any way usable.

Ravenwolf posts his German benchmark results every so often in LocalLLaMA, and most models seem to do well enough, but I've heard claims from people that Mistral's models are their favorites in German anyhow. And I think Mistral-Large scores higher than Llama-405B in French on LMSYS, which is at least something one would expect from a French company.

jampekka

What do you mean by relying on Google?

The largest Llama 3.1 and DeepSeek v3/R1 models are rather good even at a niche language like Finnish. The performance does plummet in the smaller versions, and even quantization may harm multilinguality disproportionately.

Something like deliberately distilling specific languages from the largest models could work well. Starting from scratch with a "legal" dataset will most likely fail as you say.
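The distillation idea above boils down to training a small student model to match a large teacher's output distribution on (language-specific) text. A minimal sketch of the core objective, the KL divergence between temperature-softened teacher and student distributions; the logit values here are made up for illustration:

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax: higher T yields softer targets,
    # which is what distillation typically trains against.
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    # KL(teacher || student) on softened distributions: zero when the
    # student reproduces the teacher exactly, positive otherwise.
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

teacher = [2.0, 0.5, -1.0]   # hypothetical per-token logits
print(distillation_loss(teacher, teacher))          # ~0.0: perfect match
print(distillation_loss(teacher, [0.1, 0.1, 0.1]))  # positive: mismatch
```

In practice this loss would be summed over tokens of Finnish (or other target-language) text sampled from the teacher, but the objective itself is just this per-position divergence.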

Silo AI (co-lead of this model) has already tried Finnish and Scandinavian/Nordic models with the from-scratch strategy, and the results are not too encouraging.

https://huggingface.co/LumiOpen

moffkalast

Yes, I think small languages with a total corpus of maybe a few hundred million tokens have no chance of producing a coherent model without synthetic data. And using synthetic data from existing models trained on all public (and less public) data is enough of a legal gray area that I wouldn't expect this project to consider it, so it's doomed before it even starts.

Something like 4o is so good in most languages that one could just generate an effectively infinite dataset from it and be done. I'm not sure how OAI managed it, tbh.

simion314

>As a European, that's practically an oxymoron. The more one limits oneself to legally clean data, the worse the models will be.

Train an LLM with textbooks and other legal books; you do not need to train it on pop culture to make it intelligent.

For face generation you might need to be more creative, but you should not need millions of images scraped from social media to train your model.

But it makes sense that the tech giants do not want to share their datasets and be transparent about this stuff.

jampekka

> Train an LLM with textbooks and other legal books

Without licenses to the books, they are just as illegal as (and maybe even more so than) web content.

troyvit

If LLM organizations are free to throw billions at hardware, they can spare a paltry €50 million for 10 million e-books, right?

idunnoman1222

Open AI fed their original model Anna’s archive for breakfast.