The cladistic tree model, pt. 1: A possible Balkan Group?

There has always been great interest in how to present the Indo-European languages graphically. Various models have been introduced, but recently the presentation of the Indo-European languages as a flat cladistic tree has become popular. You can find the model I am currently using here.
The cladistic approach was originally used for biological classification, categorizing organisms based on their most recent common ancestor. This approach has been applied to linguistics to create a tree model for genetically related languages. There are of course rules that apply, and problems that need to be solved. Not all branches are widely accepted, and neither are the names of certain branches. While I was writing my master's thesis, two of those problems caught my attention: the possible existence of a Balkan group and a potential Indo-Slavic branch. My aim is to examine some of the subgroups and their relations, whether they are geographical (Sprachbund) or genetic (in the linguistic sense), and whether further nodes should be added.

Grouping the Indo-European languages has long been a subject of discussion and can be done in different ways. The tree model in this blog is based on shared innovations. It does not in itself state the age of the included languages, nor is it dialect-geographical. Porzig (1954, pp. 53-64) writes that only shared innovations can prove a shared history, that they need to be numerous or striking enough to exclude chance, and that shared phonological and morphological traits are better indicators than shared vocabulary. A language family consists of all the languages that have evolved from one common ancestor (Kapovic 2017, p. 3ff.).

How do we define a node in the tree? The languages need to have evolved enough to be mutually incomprehensible, but, considering Porzig’s approach, they need to share innovations so they can be considered to have the same ancestor language. While I think that Porzig has very valid arguments about what is needed to determine if languages are related, I do not agree with his statement that we are looking for certain nodes in vain (Porzig 1954, p. 64). The reality is more complicated.

The challenge with the Balkan group lies not so much in its existence as in its definition and in whether and how to include it in the Indo-European family tree. The ongoing discussion is whether the languages within this group are related genetically or only geographically, as a Sprachbund. The approach must thus be to find shared features and determine whether they are shared innovations, borrowings or chance.

The Balkan group includes Armenian, Phrygian, Albanian and Greek. Armenian and Greek have been tied together before, first by Pedersen in 1924 and again by Hamp in 1976; the best-known arguments against this theory are those of Clackson 1995 and Kim 2018. The name Balkan group comes from the fact that some of the included languages have been located, in historical times, in or close to the Balkan area (Matzinger 2012, p. 142).

The existence and nature of the Balkan group have been a point of discussion for many linguists, and I will name a few examples here to show the different approaches to and opinions on the problem.

Ivo Hajnal, 2003

In 2003, Ivo Hajnal published an article about the paleo-linguistics of the Balkan area. What is called Balkanlinguistik in German is not the same as the Balkan group this paper treats. Balkanlinguistik is concerned with the modern languages in the area, divided into two circles: the inner circle with Albanian, Romanian, Bulgarian and Macedonian, and the outer circle with Modern Greek, Serbo-Croatian, partly Turkish and a few more idioms showing Balkanismen. This is the Sprachbund in the Balkan area we know today. Not all the included languages are Indo-European, and the methods of Balkanlinguistik differ from those of Indo-European studies: they are typological instead of comparative. With this, Hajnal exemplifies the differences between the two disciplines (Hajnal 2003, pp. 117-121).

However, in his opinion, Balkanlinguistik and Indo-European studies can absolutely work together. Genealogical research can help the typological discipline to define the existing relations between languages that are no longer neighbors: “Die Möglichkeiten … implizieren die Existenz nachursprachlicher, voreinzelsprachlicher Sprachbünde” [“The possibilities … imply the existence of Sprachbünde that postdate the proto-language but predate the individual languages”] (Hajnal 2003, p. 121). This is comparable to Matzinger’s Zwischensprachen (see below).

Hajnal’s publication is of interest to me because he compares what is known about the existing Sprachbund in the Balkans with the prehistoric situation of the area, combined with what is known from archaeological research. His approach is very methodical. He lists two hypotheses concerning the traditional Indo-European family tree (ibid.):

  • It is possible that certain languages were not yet differentiated after the disintegration of Indo-European, and only later separated from each other.
  • It is possible that certain peoples, and their already differentiated languages, came close to each other again due to migration and created areal-linguistic innovations.

Thus, he proceeds to name the subject of his research the Paleo-Linguistics of the Balkan area: the pre-Romanic, pre-Slavic and even pre-Hellenic stage of this area. The limited written sources of the old Balkan languages pose a big challenge. Most languages are attested as Trümmersprachen (fragmentarily attested languages), for example Ancient Macedonian (see Hajnal 2003, p. 122-124), Thracian and Illyrian, or are only known indirectly, such as the language of the Dardanians or of the Paeonian tribes. Hajnal advises the reader to be cautious about building a hypothesis on Trümmersprachen alone; they should only be used in the final stage of hypothesis building, for comparison or orientation. In the case of the Balkan group, a hypothesis should first of all build on the better attested languages such as Greek and Albanian, and then draw on languages like Armenian and Phrygian (Hajnal 2003, p. 124f.).

Hajnal focuses on three shared features within the Paleo-Balkan languages:

  1. The suffix *-ih2: the laryngeal vocalization in auslaut. Usually in the Indo-European languages, the laryngeal is lost and the vowel becomes long, for example Lat. genetrīx. In Armenian and Greek, the laryngeal is vocalized to *-i̯ə2 (cf. Olsen & Thorsø forthcoming): *potnih2 > Gr. πότνια; *sterih2 > *steri̯a > Arm. sterǰ. Hajnal also cites Klingenschmitt (1992, p. 104) with Alb. zonjë from older *desi̯ās-potnia with *potnia < *potnih2. In this comparison, Hajnal also mentions the prothetic vowels, i.e. the laryngeal vocalization in initial position in Greek, Armenian and Phrygian, maybe even Ancient Macedonian (cf. Olsen & Thorsø forthcoming; Matzinger 2012, p. 150).
  2. *-ei̯o- (*-eii̯o-) in possessive adjectives and patronyms: attested in Mycenaean Greek, as well as Phrygian, Venetic and Messapic.
  3. *-si instead of *-su in the locative plural ending: Usually, the ending of the locative plural is *-su, but in Greek, Albanian and Phrygian it is *-si.
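The distribution of the three features can be tallied in a few lines of Python. This is only a rough sketch of the list above, not Hajnal's method: the feature labels and function names are my own, and Albanian is counted under the first feature solely on the basis of the Klingenschmitt etymology cited there. A raw count of shared features says nothing about whether a feature is a shared innovation, a borrowing or chance; it only makes the overlap visible.

```python
from itertools import combinations

# Languages named for each feature in the list above.
# Labels are illustrative; Albanian under the first feature rests only
# on the Klingenschmitt etymology (zonjë) that Hajnal cites.
FEATURES = {
    "*-ih2 vocalized in auslaut": {"Greek", "Armenian", "Albanian"},
    "*-ei̯o- in possessive adjectives": {"Greek", "Phrygian", "Venetic", "Messapic"},
    "*-si in the locative plural": {"Greek", "Albanian", "Phrygian"},
}


def languages(features):
    """Return all languages mentioned for any feature, sorted by name."""
    return sorted(set().union(*features.values()))


def shared_features(features, a, b):
    """Return the labels of the features attested in both languages."""
    return [name for name, langs in features.items() if a in langs and b in langs]


def pairwise_counts(features):
    """Map each (alphabetically ordered) language pair to its shared-feature count."""
    return {
        (a, b): len(shared_features(features, a, b))
        for a, b in combinations(languages(features), 2)
    }


if __name__ == "__main__":
    counts = pairwise_counts(FEATURES)
    for (a, b), n in sorted(counts.items(), key=lambda item: -item[1]):
        if n:
            print(f"{a} / {b}: {n} shared feature(s)")
```

In this tally Greek appears under all three features, which is one way to see why the question of a central language and of contact versus inheritance comes up at all.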

Turning back to Hajnal’s two possible hypotheses, he decides to analyze the second one first (2003, p. 130 ff.). If the shared features are the result of a convergent development, then the speakers must have been in contact, meaning they must have lived geographically close to each other. This can be examined with the help of archaeological and historical data. In his article, Hajnal mentions Gimbutas and the expansion of the Kurgan cultures; however, since his article is from 2003 and this field has developed considerably since then, there is more recent research to examine if one wants to accept or refute hypothesis 2.

Two well-known and often cited articles from 2015 (Allentoft et al., Haak et al.) confirm Hajnal’s archaeological summary, especially about the Yamnaya. He cites Gimbutas, and we now know that Gimbutas’ theory was not far from the truth: the beginning of the Yamnaya culture dates to around 3300-3000 BCE, and there is evidence for a substantial human migration shortly after 3000 BCE (for example Klejn et al., pp. 7-8).

Map from Narasimhan et al. (2019, p. 1) illustrating the spread of the Yamnaya

Hajnal sums up the archaeological record (2003, p. 131f.):

  • A “kurganized” Central Europe is hit by a migration from the East, the Yamna (or Yamnaya), around 3000 BCE. As a result, people are pushed from Central Europe to the (South)West.
  • In the early Bronze Age, Greece and Macedonia seem to have been invaded by the speakers of Anatolian languages (based on characteristic pottery and housing findings).
  • After 2500 BCE another wave from the North introduces Corded Ware and Tumulus graves to the area – likely coming from the region of present-day Albania.
  • During these invasions, the first Hellenic tribes reached the area and subsequently established the Mycenaean culture on the Peloponnese.
  • In the Late Bronze Age, the Urnfield culture forms in Central Europe and the (Northern) Balkan Peninsula, a gradual development caused by intense trade.

His conclusions after comparing the archaeological data with the linguistic hypotheses (2003, p. 132ff.):

  • The speakers of Mycenaean Greek likely originated in present-day Albania and Macedonia. The archaeological record in a way confirms the linguistic theory of a relation between Greek and Macedonian (he refers to Ancient Macedonian, since Modern Macedonian is a Slavic language).
  • Proto-Phrygian was spoken north-east of the Proto-Graeco-Macedonian group. Around 1200 BCE the Phrygians had spread into Northern Greece and Macedonia, and they left Macedonia in 1000 BCE for Asia Minor. Hajnal writes that a common proto-language cannot be proven but is likely to have existed.
  • Armenian is still problematic. We know that Herodotus mentions the Armenians as φρυγῶν ἄποικοι, as Hajnal cites (p. 133), but we don’t know much more about them, and especially not where they came from. The similarities between Greek and Armenian are striking, but a common proto-language is heavily debated.
  • Thracian, Illyrian, Messapic, Venetic and Albanian are problematic as well. Thracian is a Trümmersprache, so it is hard to conclude much about this language (for the reasons stated above). The same applies to Illyrian. On Messapic and Venetic there is, according to Hajnal, no consensus (p. 134). Albanian is first attested in the 15th century CE, so there is no direct access to its older stages of linguistic development.

This leads Hajnal to the conclusion that there were two genetically (in linguistic, not biological, terms) different groups after 2000 BCE: Greek, (Ancient) Macedonian, Phrygian and probably Armenian on the one hand, and Albanian, Messapic and Illyrian on the other. This agrees with the tree model I referred to at the beginning, at least for Greek and Armenian, and for Albanian. It is very interesting that Hajnal sees Armenian and Greek as belonging to one genetic group.
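The two groups can be written down as a small nested structure, which makes it easy to compare them with the nodes of a tree model. This is only a sketch under assumptions: the labels "Group A", "Group B" and the root name are placeholders of mine, not Hajnal's terms, and the uncertain flag simply records his "probably" for Armenian.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """One node in a grouping; uncertain marks a membership stated as 'probably'."""
    name: str
    children: list[Node] = field(default_factory=list)
    uncertain: bool = False


# Placeholder labels ("Group A", "Group B") are hypothetical, not Hajnal's.
TREE = Node("Balkan area after 2000 BCE", [
    Node("Group A", [
        Node("Greek"),
        Node("Ancient Macedonian"),
        Node("Phrygian"),
        Node("Armenian", uncertain=True),
    ]),
    Node("Group B", [
        Node("Albanian"),
        Node("Messapic"),
        Node("Illyrian"),
    ]),
])


def render(node, depth=0):
    """Return the grouping as indented lines, marking uncertain members with (?)."""
    mark = " (?)" if node.uncertain else ""
    lines = ["  " * depth + node.name + mark]
    for child in node.children:
        lines.extend(render(child, depth + 1))
    return lines


def leaves(node):
    """Return the names of all languages (leaf nodes) below a node."""
    if not node.children:
        return [node.name]
    return [name for child in node.children for name in leaves(child)]


if __name__ == "__main__":
    print("\n".join(render(TREE)))
```

Note that such a structure only records group membership; it does not say whether a group goes back to a common proto-language or to contact.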

Hajnal now returns to the three shared features he listed at the beginning to finish his analysis (2003, p. 135f.):

  1. The suffix *-ih2: It could be inherited as well as a convergent development. Hajnal deems it possible that the suffix in certain Balkan languages first underwent the usual development to *-ī and later to *-i̯a to prevent ambiguity.
  2. *-ei̯o- (*-eii̯o-): It is likely that this innovation spread through contact from a central language to the other languages in the area around it, especially when the “receiving” language was lacking a morpheme with this function. (Note: If we combine this with Matzinger, who sees Greek as central, it should have spread from Mycenaean Greek, where it is attested. It is also attested in Phrygian, which probably was in the northern vicinity of a possible Proto-Greek-Macedonian language, as Hajnal calls it. Considering the movement of the Phrygians, they could have adopted it through contact.)
  3. *-si instead of *-su: For Hajnal, this does not let us conclude much about a convergent development, since the proto-language possibly had both variants, leaving the individual languages to choose which form they would continue; it is not possible to conclude whether that choice was made in a common proto-Balkan language or only later through contact, the same as with the first point. (Note: Albanian has *-si as well, but if I assume a Graeco-Armenian node, Albanian could have taken over *-si through contact.)

Overall, Hajnal’s conclusion (2003, p. 142f.) is that a common proto-Balkan language is unlikely and that the Balkan group was instead a Sprachbund (based on historical and archaeological results). He does, however, classify Armenian, Greek, Ancient Macedonian and Phrygian as one genetically related group of languages, which is interesting for the discussion of a Graeco-Armenian node.

Part 2 will focus on further opinions and my conclusion.

2 thoughts on “The cladistic tree model, pt. 1: A possible Balkan Group?”


    1. Thank you! Indeed it does. There are approximate dates on the map; however, the resolution is not the best. You can find the article to which this map belongs here; you can also download the map there in high resolution.
      I will publish the second part very soon, but with the kids at home for the next three weeks, my time is limited. But I am glad to hear that you find this post interesting!

      Best regards, Eva

