Interdisciplinary Work Pt. 2: Genetics and Archaeology

Find Pt. 1 here.

One more player in the multi-disciplinary puzzle is genetics. There have been major developments in archaeogenetics[1], and while I am excited, I remain cautious. As an example, I will take one of the major articles:

Allentoft et al. (2015) screened 601 human samples for ancient DNA (aDNA, i.e. DNA isolated from ancient specimens) and selected 101 of them for deeper screening. Their report includes the details of 101 individuals (102 samples, two of which proved genetically identical). The samples come from an area stretching from Scandinavia and Central and Southern Europe across Eastern Europe and the Southern Urals to Central Siberia. The time span runs from the Corded Ware / Yamnaya period into the full European Bronze Age, and the samples were dated by radiocarbon (14C). aDNA differs from modern DNA in showing characteristic types of damage, since DNA degrades over time. This is an important factor: modern DNA from other sources (i.e., contamination) will also be present, so the damage pattern serves as a validation criterion in aDNA research. But as they state in their report, even after applying a Bayesian approach (a method of statistical inference in which probabilities are updated in light of the data), the results cannot completely exclude a certain level of modern DNA contamination, so additional contamination estimates will be necessary, provided there is sufficient data to make them.

This illustrates my reservations. I am wary of getting too excited about the results just yet. Genetic analysis of aDNA is extremely complicated. As DNA degrades, we are left with few samples, since not every burial yields usable DNA. But why (and how) were those 101 samples selected for deeper screening? And what about the other samples: why weren’t they selected?

Linguists are used to working with many samples (in this case glosses, phonological and morphological traits, sound laws and other features) when comparing languages externally and internally. Against that background, data from 101 individuals spread over such a long time span and such a wide geographical area seems too little for representative sampling. I am looking forward to seeing more results in the future, as methods improve.
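To put rough numbers on that impression, here is a minimal back-of-the-envelope sketch in Python. It assumes the ca. 3000-1000 BC span that the study gives for the Bronze Age of Eurasia, and (unrealistically) an even spread of individuals over time; the variable names are purely illustrative.

```python
# Back-of-the-envelope sampling density, using only figures quoted in this post.
screened_samples = 601      # samples screened for aDNA
reported_individuals = 101  # individuals in the final report

# Assumption: the ca. 3000-1000 BC span of the Eurasian Bronze Age.
span_years = 3000 - 1000

# Share of screened samples that made it into the report.
selected_share = reported_individuals / screened_samples

# Average number of individuals per century, if evenly spread (they are not).
per_century = reported_individuals / (span_years / 100)

print(f"Selected: {selected_share:.1%} of screened samples")
print(f"Average: {per_century:.2f} individuals per century")
```

Under these assumptions, roughly one screened sample in six ends up in the report, and the average is about five individuals per century for the whole of Eurasia. Real sampling is clustered in time and space, so some periods and regions will be covered far more thinly than this average suggests.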

The results of Allentoft et al. 2015 imply the following about the major cultural changes during the Bronze Age of Eurasia (ca. 3000-1000 BC):

  • The Yamnaya / Afanasievo cultures moved into Central Asia and the Altai-Sayan region
  • The Corded Ware culture in Europe was the result of an admixture with local Neolithic people
  • The Sintashta culture bears genetic resemblance to the Corded Ware culture and was therefore likely the result of an eastward migration into Asia.
  • Spreading towards the Altai, it evolved into the Andronovo culture, which was then gradually admixed with and replaced by later East Asian cultures. (Allentoft et al. 2015 write specifically “Andronovo culture” on page 171, but it is important to remember that Andronovo was in fact a Late Bronze Age cultural complex; see Kuz’mina 2007, pp. 17-30.)
  • The Afanasievo culture present in the Altai region around 3000 BC could explain why Tocharian, an Indo-European language, is found in the Tarim Basin in China: its people may have brought the Indo-European language, as part of their Yamnaya heritage, southward to Xinjiang and the Tarim.

Even though these results seem very promising, Allentoft et al. 2015 warn us to treat them with caution, since such relationships must be demonstrated case by case.

By coincidence, Haak et al. (2015) published their article in the same issue of Nature, with very similar results. They had sampled data from 69 individuals. Their results confirmed that:

  • 8000-7000 years ago, closely related groups of early farmers (EF) appeared in Germany, Hungary and Spain, different from the indigenous groups, while in Russia there lived a distinct group of hunter-gatherers with affinity to a Siberian ancestor.
  • By 6000-5000 years ago, farmers in Europe had more hunter-gatherer ancestry than their predecessors.
  • But in Russia the Yamnaya steppe herders were descended both from the preceding eastern European hunter-gatherers and from a population of Near Eastern ancestry.
  • Before the Late Neolithic period, only one individual with haplogroup R1b (Samara) was found outside Russia. In the Late Neolithic and the Bronze Age, 60% of the European individuals outside Russia (which means 10 individuals) and all of the sampled individuals from European Russia from all periods (9 individuals) showed the presence of haplogroups R1a and R1b.
  • Western and Eastern Europe must have come into contact around 2500 BC through a massive migration into the European heartland from the East. This steppe ancestry persisted in all samples from Central Europe until about 1000 BC and is still found in present-day Europeans.

Again, this supports the theory that at least some of the Indo-European languages (after Anatolian had split off) spread to Europe from the steppes, through the movement of peoples and not (only) of ideas. The Corded Ware culture shares some elements of material culture with, for example, the Yamnaya, and the genetic data in Haak et al. (and Allentoft et al.) provide evidence of migration, suggesting that it was relatively sudden. Both studies also challenge the theory of the Anatolian homeland. However, Haak et al. caution us that while these data can be used to make a compelling case for a massive migration from the steppes as the source of the Indo-European languages in Europe, the homeland cannot be determined from the data in their report.

Both Haak et al. and Allentoft et al. have received criticism, amongst others from Klejn et al. (2017). Klejn expresses his doubt that the genetic “…discoveries reflect a direct migration from the Yamnaya to the Corded Ware cultures.” (Klejn et al. 2017, p. 2). He bases his doubt on glottochronology and cladistics, which place the breakup of the Indo-European languages within the 7th to 5th millennia BC. However, which Proto-Indo-European are we talking about? Early, Post-Anatolian or Late PIE? Post-Anatolian PIE can be dated to around 4000-3500 BC. The Yamnaya are dated to 3300-2600 BC. This leaves us with a gap between Early Proto-Indo-European and the Yamnaya, but they could have spoken Post-Anatolian PIE (for which we can reconstruct the wheel vocabulary), or Late PIE.

Klejn also argues that the Yamnaya culture begins around 3000 BC at the very earliest, and likely later (a dating refuted by archaeologists and by radiocarbon dating). He correlates the breakup of the proto-language with the disintegration of the Yamnaya culture, and not with its beginning (Klejn et al. 2017, p. 3).
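The chronological argument above is largely interval arithmetic, which a small Python sketch can make explicit. It uses only the date ranges quoted in this post and treats each as a simple closed interval in years BC, which is a simplifying assumption; the `Span` type and the `gap_years` helper are hypothetical names introduced for illustration.

```python
from typing import NamedTuple


class Span(NamedTuple):
    """A date range in years BC (the larger number is the earlier date)."""
    label: str
    start_bc: int
    end_bc: int


def gap_years(earlier: Span, later: Span) -> int:
    """Years between the end of `earlier` and the start of `later`.

    Returns 0 when the two ranges touch or overlap.
    """
    return max(0, earlier.end_bc - later.start_bc)


# Ranges as quoted in the text; "7th to 5th millennia BC" is read as 7000-4001 BC.
breakup = Span("IE breakup per glottochronology", 7000, 4001)
post_anatolian = Span("Post-Anatolian PIE", 4000, 3500)
yamnaya = Span("Yamnaya", 3300, 2600)

for span in (breakup, post_anatolian):
    print(f"{span.label} -> {yamnaya.label}: {gap_years(span, yamnaya)} years")
```

On this reading, the glottochronological breakup ends about seven centuries before the Yamnaya begin, whereas the Post-Anatolian PIE estimate ends only some 200 years before them. Given how approximate all of these dates are, the second gap is small enough that the Yamnaya speaking Post-Anatolian PIE remains plausible.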

To determine which version of PIE the Yamnaya spoke (if everything else is correct), I will try to compare linguistic and archaeological data:

The Afanasievo culture (3300-2500 BC) has been correlated with the Tocharian languages, but this is problematic (Adams in Kapović 2017, p. 456), as the traces of this culture southward to the Tarim Basin are faint. There are no traces of cereal cultivation within the Afanasievo culture, while cereals clearly had some importance for speakers of Tocharian. Adams raises the possibility that the Tocharian speakers belong within the Andronovo horizon (ca. 1800-900 BC, which corresponds to the Late and Final Bronze Ages) (ibid.).

Another challenge is that both Tocharian and Indo-Iranian are located in the east and, if present in the same time span, would likely have encountered each other. Yet they are nothing alike: Tocharian is a centum language, while the Indo-Iranian languages are satem languages. Adams nevertheless names a set of borrowings from Iranian into Tocharian that must be very ancient because of their phonetic shape (Adams in Kapović 2017, p. 457). He concludes that the Pre-Tocharian speakers probably left the northern frontier of the Indo-European horizon around 4500 BC, heading east to the southern Urals and the North Kazakhstan steppes. There they encountered, only briefly, first the Proto-Indo-Iranian speakers and then the Proto-Iranian speakers, who were moving further south (hence no further common linguistic traits). After around 2000 BC they moved further into the Tarim Basin (ibid.). In Tocharian, only one of the words within the semantic field of vehicles is present (*kwékwlo-), while the branches that remained after Tocharian split off also had *h2eḱs- ‘axle’ and *rót(h2)- (Olander 2019, p. 22).

So, with this we could roughly date Late PIE to around 2000 BC, based on the correlated culture, since we can’t date the beginnings of Tocharian (and with it the split from Post-Anatolian PIE). What I want to demonstrate here is that we are still within the Post-Anatolian Proto-Indo-European time span when, as Klejn suggests, the Yamnaya disintegrated and were replaced by the Poltavka culture (from 2600 BC), amongst others (Klejn et al. 2017). Thus, the Yamnaya could have spoken Post-Anatolian PIE.

However, Haak et al. respond in the same article that they are explicitly not claiming that the Yamnaya spoke Proto-Indo-European, but are referring rather to a continuum of languages likely to have existed in the western Eurasian steppes after Anatolian had split off (p. 6). And again, what we call this continuum is a matter of terminology, but this would then be Post-Anatolian PIE.

Allentoft et al. answer in the same article that the Yamnaya culture begins at 3000 BC (p. 7), indicating that Klejn’s argumentation is incorrect. In their answer to Klejn, both Haak et al. and Allentoft et al. conclude that they cannot exclude the possibility that steppe DNA came to Europe before the Yamnaya expansion, since data from other cultures are still lacking (which is one of the reasons why I am so careful about concluding too much from genetic results just yet), but that they are simply presenting the most likely scenario. They add that their results give strong evidence for a substantial human migration shortly after 3000 BC, with remarkable similarities to the model of the Indo-European family of languages and its spread (p. 8). The discussion in the article goes on, but covering it here would go too far.

Finally, I’d like to explore the implications of one other article. Ning et al. (2019), who tested genome-wide data from 10 ancient individuals from northeastern Xinjiang, support the Steppe hypothesis and further argue, based on the results of their genetic study, that the origin of the Tocharian languages is related to the Yamnaya. People with Yamnaya-related steppe ancestry spread from the Pontic-Caspian steppes to the Altai region in the form of people related to the Afanasievo culture, while it seems likely that another part of the same population migrated west towards Europe (correlating with the spread of the Indo-European languages). The fact that the Tocharian languages and the Indo-Iranian languages are different branches is supported by evidence that the Indo-Iranian speakers belonged to the Sintashta and the later Andronovo horizon (Ning et al. 2019, pp. 4-5).



There are two things I’d like to point out:

  • We lack data. Yes, there seems to be a correlation, but it is based on limited data. I hope that future work will bring more results and thus more insight into the matter. Interdisciplinary work can certainly help to shed more light on this.
  • We need to work on an interface when doing multi-disciplinary research. Some of the discussion and the misunderstanding between Klejn, Allentoft and Haak comes down to simple linguistic terminology. Such terminology needs to be clear to all parties working together, and to the reader.


[1] Haak et al. 2015, Allentoft et al. 2015, Klejn et al. 2017, Wang et al. 2019, Ning et al. 2019, Narasimhan et al. 2019.
