Compressing Icelandic name declension patterns into a 3.27 kB trie
59 comments
·August 2, 2025
dmurray
For the 800 names that were missing declension data in the database, it seems like the most straightforward thing to do would be to assign their declensions by hand. It shouldn't take a native speaker more than a couple of hours (if some name they haven't seen before is ambiguous, then whatever they guess at least won't sound obviously wrong to other native speakers). Alternatively, it would be very cheap to ask an LLM to do it.
Encoding them into a trie like this would still be a good way to distribute the result, but you don't have to rely on the trie also being a good way to guess the declensions.
perching_aix
Yeah, that'd be a good idea. That said, it still wouldn't resolve the issue for names that are in use despite not being approved (or foreign names).
I also live in a country with a centrally governed personal name list, but you can request exceptions, and there are people who were born before the list existed, so their names won't necessarily be on the list either. Immigrants can also retain their names during naturalization I believe, and there can be lots of other complications still. So the ability to sorta-kinda predict the proper declension is still useful.
esafak
I wonder if existing LLMs already know these patterns?
wizzwizz4
I see no reason that an LLM should be better at guessing than a trie (unless the actual example was in its training data, in which case a web search would be more appropriate).
dmurray
I agree. I just like having the guessing done at compile time on principle. It allows you to change a guess, if you find that it's wrong, and convince yourself that you haven't broken any of the other cases where you were previously accidentally right.
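A minimal sketch of what that check could look like, assuming the build step writes every guessed declension pattern into a plain name-to-pattern snapshot. The function name `diffGuesses`, the placeholder names and the pattern labels are all hypothetical, not anything from the article:

```javascript
// Compare two build-time snapshots of guessed declension patterns and
// report every name whose guess changed (or disappeared).
function diffGuesses(previous, current) {
  const changed = [];
  for (const name of Object.keys(previous)) {
    if (previous[name] !== current[name]) {
      changed.push({ name, before: previous[name], after: current[name] });
    }
  }
  return changed;
}

// Hypothetical snapshots: placeholder names and made-up pattern labels.
const previous = { NameA: "pattern-1", NameB: "pattern-2", NameC: "pattern-1" };
const current = { NameA: "pattern-1", NameB: "pattern-3", NameC: "pattern-1" };

const changed = diffGuesses(previous, current);
console.log(changed);
```

If the only entry in the diff is the guess you meant to fix, the change did not disturb any of the other cases.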
mikepurvis
I’m surprised there’d be a benefit to doing this in the JS vs having your database just return all the cases with the name and then you select which one you need at display time — basically in the same layer that’s populating your localized language templates.
That said I’m curious how this manifests with cross-language situations. I guess the Icelandic UI displaying French names would just always use the nominative case, and likewise for the English UI displaying Icelandic names? I assume this all mostly matters where the user is directly being addressed, or perhaps in an admin panel (“user x responded to user y”).
treetalker
I remember that when I was first learning Spanish in high school, I found a piece of (Windows) software that pelted you with a series of pairs of an infinitive and a tense, and you had to conjugate the infinitive accordingly. (Spanish conjugation typically changes the end of the word; irregular verbs tend to involve stem changes). It was fantastic practice and really ingrained the rules; I became a whiz at it.
When I started learning Russian, the declensions (like the ones mentioned in the article) really threw me for a loop. I looked all over for a similar app to explain the patterns and drill rote practice, but never found one.
While slightly off-topic, does anyone know of such an app (web-based or macOS/iOS)?
Rendello
There's some Anki (flashcard) decks that use the "KOFI" method:
> KOFI (Konjugation First) is the name I've given to a provocative language-learning approach I've created: to learn all the forms of a language's conjugation before even starting to formally study the language
I used the French one, years after I learned French, because my conjugation was abysmal. You can get by using basic tenses or wrong tenses, and people will understand you, but it's not what you want. The KOFI method is supposed to teach you all the conjugation patterns in a matter of months before learning the language. I'd like to give it a try in earnest some day for a new language. My interest in French has waned, so I didn't stick with it.
kashunstva
> … learning Russian… explain the patterns… such an app
Non-native Russian speaker here. In the past, I cobbled together some scripts that use the spaCy Python module with the larger of the two Russian modules to provide context-aware lemmatization and grammatical tag extraction.
On the whole, though, my biggest gains in Russian were in letting go of the need to analytically deconstruct the inflections and instead build up a mental library of patterns (and exceptions) in my head through use.
EDIT: I mean context within a sentence, not a broader meaning.
jdcarr
I use ConjuGato on iOS for practicing Spanish conjugations. There’s a game mode where you’re given an infinitive/tense/person and think of the conjugation and you can filter it down to solely irregular verbs to learn the exceptions
jeffwass
When I was learning Spanish (on my own) 25 years ago I had a Spanish/English dictionary. It only translated verbs to Spanish infinitive, but each had a numerical index mapping it to a class of verbs with the same conjugation pattern.
There was a section at the front of the dictionary with full conjugation patterns over all tenses for one sample verb in each class.
Eg, each type of stem-changing verb fell into one index, full irregulars were singletons in their own class, some irregulars that behave similarly (iirc tener and detener) shared one class.
So all verbs in Spanish fell neatly into a few dozen unique patterns, and the indexing was already done.
I was going to build a quiz software just like you mentioned to conjugate any verb in any tense, but “never got around to it”.
I wonder how the reverse-string trie pattern in the article would be for reconstructing the class mapping.
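As a rough sketch of how that might look: build a trie over the reversed infinitives, mark each node with a class only while every verb passing through it agrees, and guess an unseen verb's class from the deepest node its ending reaches. The class numbers and the three-verb sample are invented for illustration, and this is a simplified take on the idea rather than the article's actual implementation:

```javascript
// Record a class on a node; once two different classes pass through it,
// the node is marked as mixed and no longer offers a guess.
function mark(node, cls) {
  if (node.mixed) return;
  if (node.cls === undefined) {
    node.cls = cls;
  } else if (node.cls !== cls) {
    node.mixed = true;
    node.cls = undefined;
  }
}

function newNode() {
  return { children: {}, cls: undefined, mixed: false };
}

// Insert each word back to front, so shared endings share a path.
function buildSuffixTrie(entries) {
  const root = newNode();
  for (const [word, cls] of entries) {
    let node = root;
    mark(node, cls);
    for (let i = word.length - 1; i >= 0; i--) {
      const ch = word[i];
      if (!node.children[ch]) node.children[ch] = newNode();
      node = node.children[ch];
      mark(node, cls);
    }
  }
  return root;
}

// Walk the word's ending as far as the trie allows; return the class of the
// deepest node reached, or null if that node is ambiguous.
function guessClass(root, word) {
  let node = root;
  for (let i = word.length - 1; i >= 0; i--) {
    const next = node.children[word[i]];
    if (!next) break;
    node = next;
  }
  return node.cls === undefined ? null : node.cls;
}

// Hypothetical class indices, in the spirit of the dictionary's numbering.
const trie = buildSuffixTrie([["hablar", 1], ["comer", 2], ["tener", 3]]);
console.log(guessClass(trie, "detener")); // shares the full ending "tener"
console.log(guessClass(trie, "beber"));   // "-er" alone is ambiguous here
```

With only three verbs, "-er" is shared by two classes, so the sketch declines to guess for "beber". A real table would need a policy for such ties, such as picking the most common class.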
yorwba
You might be able to build something similar yourself using declension data extracted from Wiktionary using wiktextract: https://github.com/tatuylonen/wiktextract#pre-extracted-data
leobg
netsharc
Grandfather talks about classical Windows software. On the Play Store this app says "Contains ads - In-app purchases".
Ah, as a cheap bastard, I hate how software was pay once back then, and for this one I'm just going to ask you what's the monthly subscription price?
GuB-42
I don't know about this app but many of the "Contains ads - In-app purchases" apps offer to remove the ads for a one-time payment.
mpascale00
This comes up in so many threads here... How can we change the culture of subscriptions back to pay once???
nsksl
Find a pirate version if possible…
gametorch
I used Clozemaster effectively to learn Russian. It's not exactly what you describe, but you can fly through many "clozes" to ingrain the patterns into your brain.
ralferoo
I mean, it's an interesting problem for Icelandic sites, but because he's explaining the basic concepts of how declensions work, it seems like he's aiming this at non-Icelandic developers. If they were to use this, no doubt it'll end up butchering names in some other language and lead to all manner of hard to track down bugs.
For example, if an English person called Arthur uses the site in Icelandic, I'm not sure they'd expect their name to be changed to presumably "Arth", "Arthi" or "Arthar" even if they were a keen learner of Icelandic. Their name is their name. So, as well as storing someone's name, you also have to ask them what language their name is, or guess and get it wrong. At that point, you might as well just ask them for all the different forms for the name as well, and then you don't have to worry about whether their name is on an approved list or not.
And if the website isn't localised into Icelandic, I've also got to wonder if Icelandic visitors would have an expectation of Icelandic grammar rules being applied to English (or whatever) text. Most Icelandic people I've spoken to before have an excellent command of English anyway, and I'm sure they'd understand why their name isn't changing form in English.
robin_reala
No idea if Rails copes with this automatically, but it feels like the sort of magic it’s historically been really good at. I remember reading the source code for `pluralise` and finding that someone had encoded the pluralisation rules including irregular cases for Welsh.
Alifatisk
Love Rails, there is a method for everything
tempodox
Is Icelandic name declension deterministic enough that this method reliably works? That would be a lucky break. Language is typically quite messy.
silvestrov
One more optimization idea: instead of having the trie map to the suffix string directly, make an array of unique suffixes and let the trie map to an index into that array, e.g.
const suffixes = [",,,", "a,u,u,u", ",,i,s", ",,,s", "i,a,a,a", ...];
and then use the index of this list in the var serializedInput = "{e:{n:{ein:0_r: ...
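A small sketch of that deduplication step, assuming the leaf values are available as a flat key-to-suffix map before serialization. The keys `k1` to `k5` are placeholders and `internSuffixes` is a made-up helper name; only the suffix strings are taken from the example above:

```javascript
// Store each distinct suffix string once and replace every leaf value
// with a small integer index into that table.
function internSuffixes(leaves) {
  const table = [];
  const indexOf = new Map();
  const indexed = {};
  for (const [key, suffix] of Object.entries(leaves)) {
    if (!indexOf.has(suffix)) {
      indexOf.set(suffix, table.length);
      table.push(suffix);
    }
    indexed[key] = indexOf.get(suffix);
  }
  return { table, indexed };
}

// Placeholder keys standing in for trie leaves.
const leaves = {
  k1: ",,i,s",
  k2: ",,i,s",
  k3: "a,u,u,u",
  k4: "i,a,a,a",
  k5: "a,u,u,u",
};

const { table, indexed } = internSuffixes(leaves);
console.log(table);   // three unique suffix strings
console.log(indexed); // leaves now hold indices instead of strings
```

Whether this actually saves bytes after gzip is an open question, since the compressor already deduplicates repeated strings to some extent.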
radpanda
> There are, in fact, 88 approved Icelandic names with this exact pattern of declension, and they all end with “dur”, “tur” or “ður”.
…
> But that quickly breaks down. There are other names ending with “ður” or “dur” that follow a different pattern of declension
My “everything should be completely orderly” comp-sci brain is always triggered by these almost trivial problems that end up being much more interesting.
Is the suffix pattern based on the pronunciation of the syllable(s) before the suffix? If one wanted to improve upon your work for unknown names, rather than consider the letters used, would you have to do some NLP on the name to get a representation of the pronunciation and look that up (in a trie or otherwise)?
dmit
> Is the suffix pattern based on the pronunciation of the syllable(s) before the suffix?
Careful, this is how you fall down the Are Dependent Types The Answer?? hole.
perching_aix
Not sure what that's supposed to mean, but if Icelandic is anything like my native language in this, then it is indeed a pronunciation based thing. Which should make sense, since languages are (historically) spoken first, written second.
dmit
Heheh, it was mostly a reference to my [and mostly others'!] experiments with encoding human languages in a programming language. There are some pretty neat ideas there to explore, like the difference between Subject-Object-Verb (SOV) and Object-Subject-Verb. Or postfix languages (e.g. Forth) mapping to some human languages.
In this particular example, having a subsequent part of an expression rely on prior parts would usually be accomplished at runtime in most languages. But some (like Idris) might allow you to encode the rules in the type system. Thus the rabbit hole.
jedimastert
It's like an interview question from hell. Reversing a trie is one of those things that I might use only once in my life, but that one time I will look like an absolute wizard.
ryanjshaw
An interesting article but I was surprised there was no discussion about what humans do to address this problem?
Zanfa
They stick with the nominative case. That’s the only safe way not to butcher somebody’s name in a language like Estonian that has 14 cases. It’s infinitely easier to update copy to use only nominative than try to apply the cases automatically.
lifthrasiir
A possible alternative, especially for beygla/strict, would be perfect hashing.
My brain is screaming that there has to be a solution in <1kb uncompressed (for the non-strict version).
Maybe generating a minimal list of regexes that classifies 100% of names correctly? Maybe a big enough bloom filter? Maybe like a bloom filter but instead of hashes we use engineered features?
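To make the regex idea concrete, here is a toy sketch of an ordered rule list where the first match wins, so exceptions sit above the general endings. The rules, the placeholder names and the class labels are all invented. Finding a minimal list that classifies every real name correctly is the hard part, and this does not attempt it:

```javascript
// Ordered rules: exceptions first, general endings after. First match wins.
// Patterns and class labels are hypothetical.
const rules = [
  [/^(?:exampledur|sampletur)$/i, "class-exception"],
  [/(?:dur|tur|ður)$/i, "class-A"],
  [/a$/i, "class-B"],
];

function classify(name) {
  for (const [pattern, cls] of rules) {
    if (pattern.test(name)) return cls;
  }
  return null; // no rule matched
}

console.log(classify("Fooldur"));    // general "-dur" rule
console.log(classify("Exampledur")); // caught by the exception rule first
```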