Ancient-DNA study identifies originators of Indo-European language family
79 comments
·February 7, 2025
falaki
I haven't read the papers in detail, but can someone explain how genetics can be used to trace spread of languages? For context, you don't need population movements for a language to spread (it is similar to religion). See this article for a logical explanation: https://medium.com/incerto/a-few-things-we-dont-quite-get-ab...
astrange
Well, you can't. In this case I believe they're already pretty confident about who the PIE-speaking people are (the "Yamnaya"), and this study is about tracking down where they originally lived. And they have shown that they mostly replaced the previous European population rather than transferring the languages to them.
David Reich is aggressive about these genetics results though. IIRC I read a NYT story once where he came in and claimed to have upended all of Polynesian history based on the genetics of a few historical skulls they found, but it didn't seem like strong enough evidence to me.
adastra22
You can’t. But if population A and population B share an ancestor X years ago, and they also speak languages that appear to have drifted apart over about X years, the inference that their ancestor spoke a common proto-language is the simplest explanation.
eddiewithzato
Well, you can, and in fact they have even narrowed the language down to a haplogroup: R1b in the case of Greek, for example.
philwelch
You are correct that the spread of genetics and the spread of language do not have to coincide. However, in this case, it seems that they do.
If you study the genomes of the populations of Europe as well as parts of Central and South Asia, you can reconstruct a very broad family tree rooted in a shared genetic ancestry in a population who lived somewhere in Eurasia at a certain point in time. If you also study the languages of those same populations, you can independently reconstruct a family tree of languages that culminates in the reconstructed Proto-Indo-European language that would have existed at the same point in time. The simplest explanation for this is the spread of Indo-European-speaking populations, and not merely the language itself, from a single ancestral population.
eddiewithzato
[flagged]
ants_everywhere
[flagged]
astrange
Deepseek doesn't give very Chinese-sounding answers in my experience unless you mention China. Like all LLMs, it mostly feels like it's reading from English Wikipedia.
mateus1
I’m sorry, where do you get that communist countries negate genetic science? A quick search shows that Cuba, for example, has plenty of geneticists, researchers and healthcare initiatives around hereditary diseases…
rossdavidh
David Reich, one of the principal authors of the study in question, wrote a book a few years back titled "Who We Are and How We Got Here", which I quite liked (https://www.goodreads.com/review/show/2605841954). It predates some of this research, obviously, but it does have a chapter on the Indo-European origin question, along with chapters on a lot of other interesting paleo-DNA research.
adolph
He was on the Dwarkesh podcast last August to provide some lay person friendly synopsis and updates to “Who We Are.” Worth listening to even if you have read the book (in my mind at least).
Warning, link has an auto play when I opened it (but don’t let that minor obnoxiousness dissuade you from listening).
triyambakam
Can someone smarter than me explain how it's even possible to use DNA to identify the origin of a language, given that e.g. if this were tried with a language like German (or maybe any Western European language) the puzzle would look very confusing and is not DNA based.
macleginn
The story with the Indo-Europeans is basically as follows:
1. By intersecting ancient word sets of ancient Indo-European languages using comparative phonetics we can try and reconstruct the words of the proto-IE language, both their approximate sounds and approximate meanings. This gives us some information about the society. E.g., the PIE language very likely had a word for wheel, which puts the common PIE community in the period after the wheel was invented. Other words can help us guess what landscape the PIE people lived in, and it has been generally assumed for almost a century now that it strongly resembles Southeastern Europe, essentially the Ukrainian steppe. Two alternative hypotheses (modern-day Turkey and the area to the north, in modern-day Poland/Ukraine) had different drawbacks. We can also look at the locations of the earliest historically attested IE groups (Europe, Middle East, Punjab, Anatolia) and try and guess where they all may have come from, given the time frame.
2. By looking at the descriptions of the earliest IE societies (first of all the society of Rig-Veda), we can try and guess what way of life these people had. We can then look at all the archaeological cultures in the roughly appropriate area from the roughly appropriate time frame and see which of those have features of interest (in the IE case, warrior-like culture with social stratification, etc.).
3. We know that IE migrated a lot and provided a lot of genetic material to modern populations in Europe and some other regions. Since quite recently, by looking at palaeo-DNA data from the remains of the people who belonged to these cultures, we can try and check who of them made the biggest contribution to contemporary populations.
All these sources of data are rather imprecise, but if you combine them all together and see a clear pattern, this looks rather convincing.
FlyingSnake
> the society of Rig-Veda
I fail to understand how the Rigvedic society can be connected to this DNA research. The Rigveda never mentions anything beyond the Punjab/Swat/Haryana region in any of the hymns. The flora and fauna mentioned in it are also exclusive to this region. Lastly, there is no mention of an ancient homeland in either the Rigveda or the Avesta.
flir
I believe there's some stuff around burial practices that parallels some steppe practices. Something about horses and mound construction, I think?
Here we go: https://www.discovermagazine.com/planet-earth/chariot-racers... - make of that what you will.
empath75
I think this comment is based on some confusion about how languages spread. Languages spread along with people, but while a local language may be replaced, the people are not generally replaced with the language. There may have been some genetic mixture, and there may have been a period when they were conquered, but there's no sense in which the people who wrote those works _were_ Yamnayan, any more than the Germans are. They wouldn't have a story about having a far away homeland because they wouldn't have had a far away homeland, and nobody would have remembered any previous language because that language had been replaced thousands of years before, and well before anybody started writing anything down. They gradually picked up the language of either invaders or their trading partners, just as has happened many other times in history.
Edited to add: there are basically no migration stories in _any_ Indo-European mythological cycles or oral traditions. That's not evidence that there wasn't spread through migration or invasion, but it does indicate that it was a gradual process that wouldn't have been particularly noticeable in any one lifetime.
danans
It's not about the origin of a single language.
It's about the origin of a population whose widely dispersed descendants often speak a language whose primary features descend from the language spoken by the original population (albeit changed via thousands of years of drift and borrowing from other languages).
That doesn't mean that a) all features of the descendant language come from the origin language or b) all speakers of the descendant language have ancestry from the original population.
sampton
Writings on artifacts and burial practices associated with DNA fragments found at the burial sites.
DC-3
This study is about prehistoric Steppe peoples, there are no Indo-European inscriptions from this time period nor would there be any until several millennia after this time.
teleforce
> there are no Indo-European inscriptions from this time period nor would there be any until several millennia after this time
That's a very negative presumption.
How about Hittite, the oldest attested Indo-European language, long extinct, whose speakers once lived on the Bronze Age Anatolian steppe? The language is attested in cuneiform, in records dating from the 17th to the 13th centuries BCE.
Hittite people created an empire centred on Hattusa, and also around northern Levant and Upper Mesopotamia [1].
[1] Hittite language:
The-Old-Hacker
neonate
That's a good article. It's here too: https://archive.ph/eJoqA.
stult
> It finds evidence that the culture may have taken root somewhere near the present-day small town of Mykhailivka in the southern part of Ukraine.
As anyone following the war in Ukraine closely has long since realized, village names alone are not very useful for identifying where something is in Ukraine. There are just too many places with the same names. e.g., https://en.wikipedia.org/wiki/Mykhailivka
JohnGrun
This book is a very very deep dive into this subject. It may be a bit out of date. Published in 2007 https://en.wikipedia.org/wiki/The_Horse,_the_Wheel,_and_Lang...
teleforce
Related HN posts [1], [2].
Fun fact: some of the most common words in the Indo-European family are surprisingly similar across Sanskrit (S) <--> English (E) <--> German (G) [3].
Pitara (S) <--> Father (E) <--> Vater (G)
Matara (S) <--> Mother (E) <--> Mutter (G)
Bhratara (S) <--> Brother (E) <--> Bruder (G)
Duhitar (S) <--> Daughter (E) <--> Tochter (G)
[1] New insights into the origin of the Indo-European languages (147 comments):
https://news.ycombinator.com/item?id=36930321
[2] Ancient genomes provide final word in Indo-European linguistic origins (16 comments):
https://news.ycombinator.com/item?id=42515584
[3] Turandot and the Deep Indo-European Roots of “Daughter” (15 comments):
gbuk2013
My dad has literally just published a book (in Russian) with about 850 words with near identical sound and meanings in Russian and other Slavonic languages. :)
https://borissoff.wordpress.com/2025/02/06/russian-sanskrit-...
For my part I built the web based editing tool, DB and LaTeX generation system that he used to assemble this massive undertaking over the years. :)
https://borissoff.wordpress.com/2015/10/30/first-public-pres...
It was interesting hearing him talk about how you can see pieces of the original proto language preserved in the different languages. E.g. Russian has 6 cases, Sanskrit has some of these but also others and the original language had something like 12 (I don’t have any particular knowledge on the subject so might be misremembering).
For me it was interesting that the original language seemed to be more complex than the modern descendants, as if there is a general trend towards simplification with time. In my mind, then, there is the question of where the original complex language came from, and why a culture that we would consider more primitive than ours would need and come up with one.
Hemospectrum
The complexity of natural human languages comes in different forms, but as a general rule, whenever you see something that's built into another language and "missing" from your own, you can express it by using more words. For example, PIE had a lot of noun cases that aren't in English, but you don't need the instrumental case to precisely express its purpose. You can say something like "by means of a forklift."
Some studies actually suggest that literacy systematically pressures languages to use longer, more complex sentences, thus disincentivizing complex inflection rules.
gbuk2013
I get that part - I speak both English and Russian and the latter is more concise and nuanced due to the more complex grammar.
It’s just interesting that the apparent trend is from complexity to simplification, like what I observed with English as grammar is not taught so much here in England anymore. It could well be (and likely is) an illusion stemming from my shallow knowledge of the subject of linguistics.
trhway
>built into another language and "missing" from your own, you can express it by using more words. ... "by means of a forklift."
and that "more words" combination may be more precise, expressive and much simpler to handle in communication in some contexts (not necessarily in all though) than say something like <prefix><word root><suffix 1><suffix2> with <suffix>-es being "juschij" and the likes (my past comment on that https://news.ycombinator.com/item?id=40244902 )
An example: "Petr kicked Ivan" and "Ivan kicked Petr" - 2 opposite things in English, while in Russian I can use all 6 combinations of the "Petr", "kicked", "Ivan" words while still saying the same thing just by utilizing the necessary suffixes to express the case, and by switching suffixes I can use the same 6 combinations to express the opposite ("Ivana pnul Petr" and "Petr pnul Ivana" and "Pnul Ivana Petr" and so on - all is the same thing, while "Ivan pnul Petra", "Petra pnul Ivan",... is the opposite - great for writing poetry, while not that good for the contexts where concise and precise communication is at a premium, like for example in the tech world)
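To make the count of 6 concrete, here is a minimal Python sketch that simply enumerates the orderings. It is only an illustration: the transliterated forms are the ones used above, and the function name `orderings` is made up for this example.

```python
from itertools import permutations

def orderings(subject, verb, obj):
    """Return every ordering of three case-marked words as a sentence."""
    return [" ".join(p) for p in permutations((subject, verb, obj))]

# "Petr" is nominative (the kicker); "Ivana" is accusative (the one kicked).
petr_kicks_ivan = orderings("Petr", "pnul", "Ivana")
# Swap the case endings and the meaning flips, whatever the word order.
ivan_kicks_petr = orderings("Ivan", "pnul", "Petra")

for sentence in petr_kicks_ivan:
    print(sentence)
```

The two lists never share a sentence, which is the point: the case ending carries the meaning, not the position of the word.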
Hemospectrum
Similarities like these, especially with Latin in the mix, were the clue that originally put early linguists on the scent of the IE language family several centuries ago. Since then, extensive research has been done into how exactly these languages developed from their common ancestors. Some modern dictionaries, like Wiktionary, contain entire family trees comparing the divergent development of these cognates and many, many others.
danans
> Pitara (S) <--> Father (E) <--> Vater (G)
> Matara (S) <--> Mother (E) <--> Mutter (G)
> Bhratara (S) <--> Brother (E) <--> Bruder (G)
> Duhitar (S) <--> Daughter (E) <--> Tochter (G)
Since you seem to be quoting the Sanskrit words in their root forms, (to which the case-lacking English and German equivalents most closely correspond) your spellings are incorrect. The correct forms are:
pitr
mātr
bhrātr
duhitr
No thematic 'a' on the end.
You might be confusing it with the nominative plural case forms:
pitarah
mātarah
bhrātarah
duhitarah
adolph
Could you explain in non-specialist language how similarities between these modern languages have anything to do with their relationship to some earliest common ancestor? How is that explanation better than convergent evolution or overfitting hallucinations?
When I look at the difference between modern and “old English” they seem to have changed quite a bit [0]. When I read an etymological explanation [1], it sounds like a just so story.
0. https://www.reddit.com/r/etymology/comments/9ouweu/how_engli...
1. https://www.pimsleur.com/blog/words-for-father-around-the-wo...
yorwba
The explanation is better if it allows you to explain a large number of similar words arising from a common source by a systematic process.
If you have to make up a new just-so story for every pair of words, of course you're not gaining much, but if the same story works for many words at the same time, positing a common origin isn't too far-fetched.
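As a rough illustration of "the same story works for many words", here is a small Python sketch. The word pairs and the three correspondences (Latin p, t, c matching English f, th, h at the start of a word, the pattern usually called Grimm's law) are textbook examples I am assuming here, not something taken from the article, and `RULES`, `PAIRS` and `fits_rule` are made-up names.

```python
# Word-initial correspondences between Latin and English (Grimm's law pattern).
RULES = {"p": "f", "t": "th", "c": "h"}

# (Latin, English) cognate pairs; a small hand-picked sample for illustration.
PAIRS = [
    ("pater", "father"),
    ("pes", "foot"),
    ("piscis", "fish"),
    ("tres", "three"),
    ("tu", "thou"),
    ("cornu", "horn"),
    ("centum", "hundred"),
]

def fits_rule(latin, english):
    """True if the pair's initial sounds follow one of the RULES."""
    expected = RULES.get(latin[0])
    return expected is not None and english.startswith(expected)

matches = [pair for pair in PAIRS if fits_rule(*pair)]
print(f"{len(matches)} of {len(PAIRS)} pairs follow the same three rules")
```

Three rules covering a whole list of unrelated meanings is the kind of regularity that chance resemblance would not be expected to produce; a real analysis works on sounds rather than spellings and uses far more data.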
fuzztester
>Pitara (S) <--> Father (E) <--> Vater (G)
>Matara (S) <--> Mother (E) <--> Mutter (G)
Also some roots of the smaller natural numbers, like (E): one, two, three, four, five, six, seven, eight, nine, ten, etc.
(G) eins, zwei, drei, ...
(S) eka, dvi, tri, ...
See the "Table" here:
https://en.m.wikipedia.org/wiki/Devanagari_numerals
Although it is about numerals, there are words in a few languages, on the right side.
And Sanskrit is the ancestor of many Indian languages, such as the regional languages of most of the northern (e.g. Punjabi, Haryanvi, Himachali, Hindi and its dialects), central (e.g. Hindi), eastern (e.g. Bengali, Odiya) and western (e.g. Gujarati, Marwadi) Indian states. To a rough approximation, only the languages of the 4 (now 5, with Telangana added) southern states, and of the 6 / 7 north-eastern states (Assam, Manipur, Mizoram, Meghalaya, etc.) and maybe a few aboriginals' / forest tribals' languages, like Bhil, Gond, etc., don't descend from Sanskrit.
anon291
Lots of verbs too.
For example, 'to be' - French 'etre' (circumflex over the e indicates old 's' after the e), Marathi 'asane' (pronounced esnay)
'to go', German gehen, Marathi jana (when conjugated the j becomes hard)
'to give', French 'donner', Hindi 'danaa' (pronounced similarly)
'to mix', French 'melanger', Hindi 'melaanaa'
Other non-obvious ones:
Vedas and Wisdom / Wit. Alternatively, Latin video (to see)
Dyaus-pitar and Jupiter, Zeus-pater
'that' in English is 'que' (that/what) in French and 'kya' (for what) or 'ki' (for that) in Hindi (pronounced similarly to French 'que').
English burden or 'to bear' and Hindi bhar (burden)
English 'ignite', Latin 'ignis' and Indic 'agni' (fire)
'Raja' and 'regal' or 'royal'
'Dental' and Hindi 'dant' (tooth)
Greek 'polis' and Indic 'pore' / 'pur' / 'puram' (the 'r' is pronounced like a soft l)
richardfontana
> Dyaus-pitar and Jupiter, Zeus-pater
This one is slightly more interesting than a mere cognate as it is believed that the Proto-Indo-European speakers worshipped a sky god with the reconstructed name *Dyḗus ph₂tḗr ("sky-father") which is the ancestor of these (also Tyr and the like on the Germanic side). See:
https://en.wikipedia.org/wiki/*Dy%C4%93us "*Dyēus is considered by scholars the most securely reconstructed deity of the Indo-European pantheon, as identical formulas referring to him can be found among the subsequent Indo-European languages and myths of the Vedic Indo-Aryans, Latins, Greeks, Phrygians, Messapians, Thracians, Illyrians, Albanians and Hittites."
philwelch
What I find interesting is that the primary Turkic/Mongolic deity, Tengri, is also a sky father. There’s no shared genetic or linguistic ancestry there, just two different steppe nomad populations independently deifying the daylight sky the same way.
yorwba
French être is from PIE h₁ésti https://en.wiktionary.org/wiki/Reconstruction:Proto-Indo-Eur... which also gave rise to Marathi आथि (āthi). Marathi असणे (asṇe) https://en.wiktionary.org/wiki/%E0%A4%85%E0%A4%B8%E0%A4%A3%E... appears unrelated. (But might be cognate to English at home?)
Not all similarities between modern languages are inherited; coincidences do happen.
jyscao
> This led to a demographic explosion, so that in a few hundred years Yamnaya descendants numbered many tens of thousands and were spread from Hungary to eastern China.
Don’t they mean western China here?
snovymgodym
yeah definitely. Probably in reference to: https://en.wikipedia.org/wiki/Tocharians
cool_dude85
An excellent book I strongly recommend about PIE is The Horse, The Wheel, and Language.
dsign
Only 6500 years? That's incredibly recent for such an influential language. For comparison, Sargon of Akkad died only 4000 years ago, and there are written records from him. True, he didn't speak Indo-European, but Afroasiatic/Akkadian, and that was the language on those cuneiform tablets the researchers used for reference.
On a tangent, with the advent of AI and the final decades of our species, we should make more clay tablets to leave lying around...
teleforce
The oldest attested Indo-European language is now Hittite, a long-extinct language whose speakers lived on the Bronze Age Anatolian steppe. The language is attested in cuneiform, in records dating from the 17th to the 13th centuries BCE.
Hittite people created an empire centred on Hattusa, and also around northern Levant and Upper Mesopotamia.
> On a tangent, with the advent of AI and the final decades of our species, we should make more clay tablets to leave lying around
The irony is that even with AI we have yet to decode Indus script perhaps due to the lack of the equivalent of Rosetta Stone [1]. I think there's a Nobel prize waiting for those who can decipher the Indus script with AI or not [2].
[1] Rosetta Stone:
https://en.wikipedia.org/wiki/Rosetta_Stone
[2] Indus script:
FlyingSnake
Regarding Indus script: I’ve recently come across this purported attempt by someone who claims they’ve deciphered Indus script. I’m not sold on it but it is making some waves in Indian circles.
iamshs
That's a Hindu nationalist website and claim. They force-fit the Indus script into Sanskrit to try to outflank the Aryan Invasion Theory. It is not making waves anywhere except in Brahmin circles.
canjobear
The problem with Indus script is that the inscriptions are all very short. This gives you very little information to go on.
empath75
I believe the article is saying that the Hittite branch split off early from the rest of the tree, and the Yamnaya are the ones that spread it around the world -- the Hittite branch ended up being a dead end.
fvvybfbfbyg
English is what? ~600-800 years old? Most other major Western European languages only developed over the past ~2000 years or so.
It’s not like Proto-Indo-European developed out of nothing. It was related to other languages that just didn’t survive, and it happens to be the most recent (hypothesized) common ancestor of all other Indo-European languages.
psunavy03
> final decades of our species
Oh, come on. This is what we get from social media bubbles and breathless irresponsible media reporting.
dang
"Please don't pick the most provocative thing in an article or post to complain about in the thread. Find something interesting to respond to instead."
https://news.ycombinator.com/newsguidelines.html
I agree it's a provocation and, worse, a generic tangent, but the rest of the comment was pretty good.
beepbooptheory
I know you have no time, and I don't need a response. I have no skin in this exchange either, fwiw. But I just want to try to unpack here how this could be either a provocation or a noteworthy tangent. Is it that being pessimistic about the future is flamebait? Is it perhaps sneering? Is there anything to be said about the rest of the sentence in question and how it's clearly just being a little cheeky? Or is that perhaps what's wrong with it?
Just feels perhaps a little out of place this time that the gp would be in the wrong at all here. But I'm sure I'm missing something obvious.
hollerith
>Oh, come on. This is what we get from social media bubbles and breathless irresponsible media reporting [emphasis mine].
Speaking of bubbles, how sure are you that Silicon Valley and HN are not part of a bubble composed of people with an emotional attachment to technological progress and people with a financial stake in AI?
How sure are you that the AI labs aren't being even more irresponsible than the news media?
brcmthrowaway
Does this mention Avestan at all?
I recently came across this presentation of Kristian Kristiansen, University of Gothenburg: "Towards a New European prehistory: genes, archaeology and language" (2023): https://www.youtube.com/watch?v=bxTVSwt-jsU [video], which I enjoyed very much. Prof. Kristiansen is a leading researcher in this area.