AI, Wikipedia, and uncorrected machine translations of vulnerable languages
33 comments
October 25, 2025 · simonw
TZubiri
The Cebuano wiki is a similar case. The language is not spoken often, but the wiki was a personal project of an editor who was mad at political articles and started making animal articles there.
The solution is to differentiate and tag inputs and outputs, such that outputs can't be fed as inputs recursively. Funnily enough, Wikipedia's sourcing policy does this perfectly. Not only are sources the input and page content just an output, but page content is a tertiary source, and sources by policy should be secondary (and sometimes primary) sources. So the system is even protected against cross-tertiary-source pollution (say, an encyclopedia feeding off Wikipedia and vice versa).
It is only when articles posing as secondary sources fail to cite Wikipedia that a recursive quality loss can occur; see [[citogenesis]].
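A minimal sketch of the tagging idea above, assuming every document carries an honest provenance tag. The names `Kind`, `Document`, `ALLOWED_INPUTS`, and `write_article` are hypothetical and exist only for illustration; this is not how Wikipedia enforces its sourcing policy.

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    PRIMARY = "primary"
    SECONDARY = "secondary"
    TERTIARY = "tertiary"  # encyclopedia-style summaries, including our own output


@dataclass(frozen=True)
class Document:
    title: str
    kind: Kind


# Assumed rule: only primary and secondary material may be used as input.
ALLOWED_INPUTS = {Kind.PRIMARY, Kind.SECONDARY}


def write_article(title: str, sources: list[Document]) -> Document:
    """Produce a tertiary article, rejecting any tertiary source."""
    if not sources:
        raise ValueError("an article needs at least one source")
    rejected = [s.title for s in sources if s.kind not in ALLOWED_INPUTS]
    if rejected:
        raise ValueError(f"tertiary sources cannot be inputs: {rejected}")
    # The output is always tagged tertiary, so it can never be re-fed as input.
    return Document(title=title, kind=Kind.TERTIARY)
```

Passing the result of `write_article` back in as a source raises `ValueError`, which is the recursion guard. The check is only as good as the tags: a document that draws on a tertiary source but is labelled secondary slips straight through, which is the citogenesis case.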
galagawinkle489
Many sources for Wikipedia articles refer to Wikipedia without citing it. Many journalists will work from Wikipedia, and most of Wikipedia's sources are journalistic articles. It happens to be that often this isn't noticed because the information obtained this way is true and uncontroversial. Citogenesis only documents examples where, by bad luck, the result is untrue information.
fooker
Great, now apply this idea to political ideology.
Papers and books will be written about naive heavy handed online censorship creating echo chambers and driving the US into fascism.
AlienRobot
Yes, half of my entire political ideology is based on posts written by 12 year olds on the Internet. The other half is based on posts written by dogs[1].
1. https://en.wikipedia.org/wiki/On_the_Internet,_nobody_knows_...
fooker
Yep, it's either dogs or r̸u̸s̸s̸i̸a̸n̸ chinese bots.
add-sub-mul-div
Jfc, not everything is about that.
fooker
I meant it as an example of the road to hell being paved with good intentions, of someone who "naively failed to recognise the damage they were doing".
But you do you.
aucisson_masque
> Wehr, who now teaches Greenlandic in Denmark, speculates that perhaps only one or two Greenlanders had ever contributed.
That's the core issue, it's not those who use AI translator or worst like Google translate. If there isn't any Greenlander to contribute to their Wikipedia, they don't deserve to have one and instead must rely on other languages.
The difference between an empty Wikipedia and one filled with translated articles that contain errors isn't much. They should instead close that version of Wikipedia until there are enough volunteers.
Symbiote
The end of the article says they have closed it.
consp
That last part creates a chicken and egg problem. You can argue about it but I will bet it will never get traction if there is no basis to start from.
bawolff
Wikipedia has an "incubator" setup where people can start working on a language in the incubator until it demonstrates enough interest.
Mars008
> who use AI translator or worst like Google translate
It's the same. Google translate uses trained AI models.
strogonoff
Wikipedia editors are among the many communities that have for a long time, mostly successfully, relied on the tendency of relatively superficial, easy-to-validate capabilities (such as being able to use a website, write something resembling real language, and handle basic communication) to correlate with more valuable but harder-to-validate qualities (such as the ability to write reasonably well and follow rules/guidelines, and generally being a well-intentioned person) as one of their main barriers to entry. Owing to the deluge of commercial LLMs[0] available at such low prices that their operators lose millions to billions of dollars in order to gain market share and ultimately profit, such communities may not be able to continue to exist as is for long, I suspect: either they would be forced to institute more intrusive barriers (be that ID verification, invite-only memberships, or something else) while the deluge lasts, or they may be effectively destroyed when members who secretly lack the requisite qualities and act in bad faith become a majority, damage the community’s reputation, and drive out the existing members.
[0] Which paradoxically to a significant degree exist thanks to the unpaid work of volunteers in many of such communities.
foxglacier
If nobody's reading them and nobody's writing them, then perhaps it doesn't matter. We could let Wikipedia-Greenlandic persist as its own evolved language that forks from the original.
> potentially pushing the most vulnerable languages on Earth toward the precipice as future generations begin to turn away from them.
OK? We have lots of dead languages. It's fine. People use whatever languages are appropriate to them and we don't need to maintain them forever.
arthurjj
This was my take from the article also. These languages are clearly dying and not many people speak them as their primary language, so the human suffering is minimal. Which means keeping them around is a pastime that some people happen to enjoy (unless there is a Sapir-Whorf hypothesis I'm missing).
But the sentence `well-meaning Wikipedians who think that by creating articles in minority languages they are in some way “helping” those communities` clearly shows the author hasn't really considered the issue.
aucisson_masque
I see that this comment gets downvoted, but I think we can agree on the fact that languages, just like species, die while others flourish. And that's fine.
Survival of the fittest, right? Not enough people speaking Greenlandic, too complicated even for its own population, who would rather speak Danish? The very reason I'm speaking English is that it was imposed militarily by the UK during the 19th century and by Hollywood since the 20th.
Just like a virus, if a language doesn't spread, it dies.
jiggawatts
As an immigrant to an anglophone country, I noticed a few things:
When people have varying levels of capability with languages, they’ll switch to whatever is the lowest common denominator — the language that the group can best communicate in. This tended to be English, even amongst a bunch of native speakers of a common foreign language.
Moreover, this is context dependent: when talking about technical matters (especially computing), the Lingua Franca (pun intended) is English. You’ll hear “locals” switch to either mixed or pure English, even if they’re not great at it. Science, aviation, etc… is the same.
Before English it was French that had this role, and before then it was Latin and Greek.
The thing is, when the whole world speaks one common language like Latin or English, this is a tiny bit sad for some Gaelic tribe that got wiped out culturally, but incredibly valuable for everybody everywhere. International commerce becomes practical. Students can study overseas, spreading ideas further and wider. Books have a bigger market, attracting smarter and better authors. There’s a bigger pool of talented authors to begin with, some of whom write educational textbooks of exceptional sparkling quality. These all compound to create a more educated, vibrant, and varied culture… because of, not despite, the single language.
haiji2025
yes
ratg13
It's ironic that the "solution" to the problem is being driven by yet another person that isn't native to Greenland.
While they may be a Greenlandic teacher, it's almost assured that they are teaching western Greenlandic, which is similar to Canadian Inuktitut.
People in the East of Greenland speak a language that has similarities, but is different enough in vocabulary and sounds that it's often considered a separate language and not a dialect.
When people from East and West Greenland come together, they typically speak Danish because they can't understand each other in their own native language.
So we're talking about a country that has 55k people, and a portion of them don't even speak the official language. This guy would have no way of knowing whether something was written poorly by a computer or by a poorly educated Greenlandic native who maybe isn't so good with the official language.
Given that the majority of the country's citizens do not use the internet at all, it is not even clear what his solution is other than just deciding to be some sort of magic arbiter, which is not realistic or sustainable.
Uehreka
I wish people on HN would stop acting like “magic arbiter” solutions are “not realistic”, when in reality it’s the only way things have ever worked. Are federal judges “magic arbiters”? Yes. Do judges make bad calls? Yes. Do we not like when large numbers of judges who are unfriendly to our side get life appointments? Yes. Has anyone proposed an actual better way of solving these kinds of problems? No.
So to get back to the point: Yes the solution is to appoint someone a magic arbiter, and hope they don’t screw up. The fact that it’s a deeply imperfect way of solving problems doesn’t mean it’s not workable. It just means it will backfire at some point, and someone else will get appointed instead.
optionalsquid
> Given that the majority of the country's citizens do not use the internet at all
On what do you base this assertion? I was not able to find up-to-date statistics, but 72% of participants in this survey from 2013 had internet access at home, either via PC or via mobile devices, and another 11% had internet access elsewhere:
https://digitalimik.gl/-/media/datagl/old_filer/strategi_201...
bawolff
> People in the East of Greenland speak a language that has similarities, but is different enough in vocabulary and sounds that it's often considered a separate language and not a dialect.
If this is true, then the easy solution would be to just have two separate wikipedia editions (assuming there is interest).
After all, if we have en, sco, jam and ang, surely there is room for two Greenlandics. The limiting factor is user interest.
AlienRobot
As someone who isn't a native English speaker, I believe most people who use the Internet would benefit from simply learning English rather than having an unchecked AI translate things for them. Reddit, for example, has joined millions of terrible WordPress websites in auto-translating everything for SEO purposes, and Google seems to be fine with this for some reason. It's ironic that it has reached the point that if you search for a "multi-language" plugin for WordPress, most of the results aren't about letting you write an article in multiple languages; they're just about automatically translating a single article into 30 languages with machine translation.
The reason none of this makes sense to me is that it's intellectually crippling Internet users. Computers and the Internet are tools. If you want something machine translated for you, you can use a tool like Google Translate to do it. If the webmaster does this, it robs people of the opportunity to learn to use those tools, and they become dependent on third parties to do this for them when they would have a lot more freedom if they just did it themselves (or if they learned English).
Teach a man to fish...
spookie
A lot of written text out there in other languages isn't available in English. Simply put, you have many echo chambers of single languages out there. Most people are OK with just reading what they understand.
johnea
Unlike what the title of the post implies, I would say Wikipedia bears 0% of the blame for this issue.
I would put 50% of the blame on Google, for offering up translations that are wholly or partially in error, without any indication such as a warning message to that effect.
Then I would assign 40% of the blame on LLM text generation based on models where the model creators performed no review of their training data.
The final 10% of blame goes to anyone who would post rubbish without first-hand knowledge that at least the translation was correct.
Except for that final 10%, all of the blame goes to the profit motive. Foisting shit on the world for the sole purpose of profit.
And let's face it, this isn't exactly the first time marginalized people, or their languages, have suffered because of western capitalism...
p.s. fan-bois kool-aid drinkers, feel free to start your down-voting now...
I'm surprised this story didn't mention the scandal with Scots Wikipedia: https://www.theguardian.com/uk-news/2020/aug/26/shock-an-aw-...
> an American teenager – who does not speak Scots, the language of Robert Burns – has been revealed as responsible for almost half of the entries on the Scots language version of Wikipedia
It wasn't malicious either, it was someone who started editing Wikipedia at 12 and naively failed to recognise the damage they were doing.