“Indo-European Languages are defined as a family of languages, issuing from a common language, which have become differentiated by gradual separation.” (Benveniste, Emile, p. 28)
I will begin with a short overview of the Indo-European (IE) languages and the reconstructed Proto-Indo-European (PIE) language. For more in-depth reading, I recommend J.P. Mallory and D.Q. Adams: The Oxford Introduction to Proto-Indo-European and the Proto-Indo-European World. The Indo-European family includes more sub-families than the following overview shows; I will discuss those further down:
Tree model of the Indo-European language family
A proto-language is hypothetical, and therefore its reconstructions are marked with a * because there are no written records of that form, but they are close enough to the real world for us to know that they existed – the daughter languages are the proof of this. This is what we achieve with the linguistic methodology.
This graphic above is a relatively simple cladistic tree model, which I have designed based on a model and table by Thomas Olander (Olander 2019, p. 8 & 9, figure 2.1. and table 2.1.). Not all of the sub-branches are accepted without discussion (especially Indo-Slavic), and I will explain my reasoning for using this model when I discuss the various branches below.
Olander also has a forthcoming article about Indo-European cladistic nomenclature and its challenges, which I recommend reading. This article illustrates one of the big problems in communication, especially with other disciplines, when terminology is as variable as the linguistic terminology for Proto-Indo-European (he lists 13 different terms for the same concept). While I for now employ the terminology used by Anthony & Ringe 2015 in my thesis (since I am referring to their work), I absolutely agree that we need to work on terminology and have therefore also added Olander’s terminology in this tree model.
I also had a lot of thoughts about the Greco-Armenian branch, but I carefully decided to keep it in my model, despite it still being under discussion. I will update this tree model when necessary.
Terminology as used by Anthony & Ringe 2015:
- Early Proto-Indo-European (before Anatolian has split off)
- Post-Anatolian Proto-Indo-European (after Anatolian has split off)
- Late Proto-Indo-European (after Tocharian has split off)
The Indo-European Languages, a more in-depth overview
The family of Anatolian languages is thought to have split off first from Proto-Indo-European, leaving us with what is called post-Anatolian-PIE (Anthony and Ringe 2015, p. 201). The Anatolian languages, of which Hittite is the best attested, date to the last two millennia BC (van den Hout 2011, p. 1), but they have not survived.
Evidence that the Anatolian languages were the first group to split off from the Indo-European family can be found in the phonology and morphology (see 188.8.131.52). One example is that the reconstructed root for ‘axle’, *h2eḱs-, can be found in most daughter languages with the same meaning, but Anatolian did not preserve a word derived from this root, which indicates that Anatolian split away before the invention of axles (Anthony 2017, p. 45). The same goes for the PIE verb *kwelh1-, ‘turn’, which is the root for ‘wheel’. Both words belong to a semantic field that does not seem to exist in Anatolian. Wheeled vehicles did not exist anywhere before 4000 BCE (Anthony & Brown 2017, p. 33), so Anatolian probably split off before they appeared and developed its own words instead of using inherited terms.
(Map: van den Hout, 2011, p. XV)
Hittite is the now extinct language spoken by the Hittites, an Anatolian people of the Late Bronze Age (approx. 1650 – 1200 BC), living in what is today the Republic of Turkey. If the current mainstream consensus on the Indo-European homeland is correct, the Hittite language was a superstrate, intrusive to Anatolia. It was the principal administrative language of the kingdom of Ḫatti (the capital city being Ḫattuša, today Boğazköy). It is both synthetic and inflecting, which are typical features of an older Indo-European language (Hoffner & Melchert 2008, p. 2).
The Hittites were one of the major powers in the Near East (van den Hout 2011, p.2). Their language, Hittite, should more correctly be named Nesite, since the Hittites themselves called their language neš(umn)ili/našili ‘of (the city) Nesa’, but the name Hittite has become far too well established to be replaced (Melchert 2003, p. 15).
The written sources are texts in cuneiform (Keilschrift), mainly preserved on clay tablets. The characters were impressed with a stylus, giving the script its distinctive look. It is a phonetic writing system, supplemented with logograms from Sumerian (Sumerograms) and Akkadian (Akkadograms). The differences are marked in transliteration by using italic lowercase for phonetically written Hittite, italic uppercase for Akkadograms and roman uppercase for Sumerograms (Hoffner & Melchert 2008, p. 9 ff.).
The impact of Hittite on the Indo-European family tree
The first decipherment of the Hittite tablets was achieved by the philologist Hrozný in 1917. He revealed similarities between Hittite and other Indo-European languages, and it was soon clear that Hittite represented a whole new branch within the IE family, Anatolian. It also turned out to be by far the oldest surviving written source of any Indo-European language. It had very odd lexical and grammatical features, which distanced it from the other languages but connected it to Luwian, Lycian, Palaic and Lydian. In comparison to Sanskrit, it was a morphologically simple language (Klein 2017, p. 220 f.).
The deciphering of Hittite supported the theory of the laryngeals: sounds written in Hittite with ḫ were in other languages represented by a long vowel, for example Hitt. paḫš– ‘protect’ vs. Lat. pāscō, or vanished word-initially Hitt. ḫant- ‘front’ vs. Lat. ante ‘in front of’. This ḫ was a striking difference between Hittite and the other Indo-European languages and could finally be connected by Jerzy Kuryłowicz to Saussure’s earlier theory of a consonantal schwa. The laryngeal theory was (re)born (laryngeal being an earlier term which the linguist Hermann Møller took from Semitic). Where other languages only could show traces of a vanished laryngeal, Hittite showed them more or less directly (Klein 2017, p. 221 f.). This is clearly an archaism and was kept in Hittite but lost in the other branches after Anatolian had split off, thus providing evidence for Anatolian to be the first to branch out.
Although the laryngeal theory took a long time to break through in some countries and was not accepted without debate, it is now widely acknowledged. There are some differing opinions about it, though. The standard laryngeal theory assumes three laryngeals. Kuryłowicz, however, set up a fourth laryngeal having the same properties as h2 but not preserved in Hittite. This is the theory which Edgar H. Sturtevant (who wrote “A Comparative Grammar of the Hittite Language” (1951)), inspired by Edward Sapir, took over. Various other laryngeal theories followed, but the three-laryngeal model has prevailed (Klein 2017, p. 223).
Another issue causing much consternation was the “Indo-Hittite” theory. Especially Sturtevant was firmly convinced that Proto-Anatolian was a sister of Proto-Indo-European, not a daughter language, and that their common parent language was Proto-Indo-Hittite. He based it on the archaisms of Hittite and Hittite’s very early attestation. This theory has never been commonly accepted. Rather, there is now a general consensus that Anatolian was the first sub-branch to split off from the Indo-European mother language, because of the many lower-profile phenomena, including: the post-Anatolian activization of the *-nt- participles and *-eh2 as replacement of the pronominal nom.-acc. neuter plural in *-oi (Klein 2017, p. 233f.). Essentially, this is a problem of terminology. To make it more visible:
After Anthony and Ringe (2015, p. 201), which I personally prefer:
- Early Proto-Indo-European (before Anatolian has split off)
- Post-Anatolian Proto-Indo-European (after Anatolian has split off)
- Late Proto-Indo-European (after Tocharian has split off)
The Indo-Hittite theory in comparison:
- Proto-Indo-Hittite (before Anatolian with Hittite has split off)
- Proto-Indo-European (after Anatolian has split off)
There is no real dispute about whether Anatolian was the first branch to split off; the discussion is rather about how closely Anatolian was related to PIE, and what we should therefore call the periods before and after the split.
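The point that both nomenclatures describe the same branching order and differ only in their labels can be made concrete with a small sketch. The Python snippet below is purely illustrative: the names `SPLITS`, `ANTHONY_RINGE`, `INDO_HITTITE` and `stage_label` are hypothetical, and the split order is simply the one given in the two lists above.

```python
# Order in which branches leave the main stem, as given in the lists above.
SPLITS = ["Anatolian", "Tocharian"]

# Stage names keyed by how many of the listed splits have already happened.
ANTHONY_RINGE = {
    0: "Early Proto-Indo-European",
    1: "Post-Anatolian Proto-Indo-European",
    2: "Late Proto-Indo-European",
}

# The Indo-Hittite list above names only two stages; it has no separate
# label for the stage after the Tocharian split.
INDO_HITTITE = {
    0: "Proto-Indo-Hittite",
    1: "Proto-Indo-European",
}


def stage_label(splits_done, scheme):
    """Return the scheme's name for the stem after `splits_done` splits."""
    if not 0 <= splits_done <= len(SPLITS):
        raise ValueError("unknown stage")
    # Fall back to the latest stage that the scheme actually names.
    key = max(k for k in scheme if k <= splits_done)
    return scheme[key]


for n in range(len(SPLITS) + 1):
    print(n, "|", stage_label(n, ANTHONY_RINGE), "|", stage_label(n, INDO_HITTITE))
```

The tree itself (`SPLITS`) is shared by both schemes; only the dictionaries of labels differ, which is the sense in which the disagreement is terminological.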
The last thing I would like to mention is the Hittite ḫi- and mi-conjugations. These are two types of verbal conjugation whose endings differ in the present and preterite singular. Here are the endings of the present singular for comparison:
- mi-conjugation: 1sg.prs. -mi, 2sg.prs. -ši, 3sg.prs. -zi
- ḫi-conjugation: 1sg.prs. -(ḫ)ḫi / -(ḫ)ḫe, 2sg.prs. -(t)ti, 3sg.prs. -i
There is no difference in meaning between the two conjugations. The endings of the ḫi-conjugation look remarkably like the perfect in PIE.
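For readers who prefer to see the paradigm as data, the present singular endings listed above can be put into a small lookup table. This is only a sketch in Python; the names `PRESENT_SG_ENDINGS` and `differing_persons` are made up for illustration, and optional segments are written with parentheses exactly as in the list.

```python
# Present singular endings of the two Hittite conjugations, as listed above.
# Each person maps to a list because the 1sg. of the ḫi-conjugation has two variants.
PRESENT_SG_ENDINGS = {
    "mi": {"1sg": ["-mi"], "2sg": ["-ši"], "3sg": ["-zi"]},
    "ḫi": {"1sg": ["-(ḫ)ḫi", "-(ḫ)ḫe"], "2sg": ["-(t)ti"], "3sg": ["-i"]},
}


def differing_persons(a, b):
    """List the persons in which conjugations `a` and `b` have different endings."""
    return [
        person
        for person in PRESENT_SG_ENDINGS[a]
        if PRESENT_SG_ENDINGS[a][person] != PRESENT_SG_ENDINGS[b][person]
    ]


print(differing_persons("mi", "ḫi"))
```

The comparison shows that the two sets of endings differ in every person of the present singular, even though the conjugations do not differ in meaning.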
The origin of the mi-conjugation is clear (the endings go back to the primary active endings in Proto-Indo-European), but the origin of the ḫi-conjugation has been and still is debated. There are two “main” opposing factions, of which I’d like to name Jasanoff (2003) and Eichner (1975). Further information on the contrasting views in this debate is given in the first chapter of Jasanoff 2003 (“The Problem of the ḫi-conjugation”); I summarize them here.
Jasanoff (2003, p. 91) introduces the ‘h2e-theory’, arguing that PIE had an active athematic present with the endings *-h2e, *-th2e, *-e, 3pl. *-(é)rs, endings which usually are associated with the IE perfect, and that the ḫi-conjugation is essentially derived from those. This would mean that the Hittite ḫi-conjugation is archaic and the other daughter languages changed these endings into the perfect after Anatolian had split off.
Another position is taken by Eichner (1975), who argues that the ḫi-conjugation in Hittite developed from the perfect (and middle) in PIE. This means that Anatolian changed the perfect to something new, while the other daughter languages kept the original PIE concept.
Personally, the latter seems to me the more likely, if not the more natural, development. I don’t think that only Anatolian, especially Hittite, preserved the original state while all the other branches changed it to something else. This also keeps Anatolian linguistically a bit closer to the other branches, which is another reason why I prefer ‘Early Proto-Indo-European’ to ‘Indo-Hittite’.
Not much is known about the Luwian territory. Through the Hittite Laws, we know that “Luwiya” existed and that it must have been close to the heartland of the Hittite kingdom in Hatti (Melchert 2003, p. 1f.). It is also through these laws that we know that the Hittites viewed the Luwians as foreign, as not belonging to their own social group. We do not know how big Luwiya was, nor whether it was a purely geographical term or a nation (the close relationship and the inclusion of the Luwians in the Hittite Laws speak against the latter). There are no records of kings of Luwiya and no evidence for a unified Luwian state. Luwian hieroglyphic inscriptions and Luwian ritual texts in the Hittite archives indicate, however, that the Luwian territory must have been widespread, although we don’t know the exact borders (Melchert 2003, p. 2-3).
In writing, Luwian is attested both as Hieroglyphic Luwian (HLuwian) and Cuneiform Luwian (CLuwian). Cuneiform Luwian is attested as early as the beginning of the 16th century BC, occurring in Hittite texts as loanwords (Klein 2017, p. 242). Hieroglyphic Luwian is attested from around 1400 BC to approximately 700 BC (van den Hout 2011, p. 1).
We can find Luwian loanwords in Hittite; those loanwords are fully adapted to Hittite. In the oldest attested Hittite, there is no evidence of words with Luwian inflections, but the occurrence of loanwords demonstrates Luwian influence on Hittite (Melchert 2003, p. 13).
1.3 Lycian, Carian, Lydian
Lycian and Carian are more closely related to Luwian than any other language of the Anatolian subfamily, but the geographical situation is as complicated as the Luwian one: “We can therefore be no more precise for the prestages of Lycian and Carian than ‘somewhere in the southwest’…” (Melchert 2003, p. 14-15). Lycian has its own alphabet, adapted from the Greek (Clackson 2011, p. xviii).
Lydian is a language of western Anatolia. The earliest inscriptions are attested on coins from the 8th century BC. Inscriptions on stone are from the 5th and 4th centuries BC (Meier-Brügger 2000, p. 24).
Palaic is the language of Palā (see map), which lay north and northwest of the Hittite core area (Klein 2017, p. 241), and was mentioned in the Hittite Laws alongside Hatti and Luwiya. It is attested as a liturgical language in a few ritual texts from Hattusa and is fully distinct from Hittite and Luwian (Melchert 2003, p. 10-11).
(Map from Fortson 2010, p. 401 – The Tarim Basin)
The Tocharian branch consists of two closely related languages, Tocharian A and Tocharian B. Both languages are found in documents from the regions of Qarashahr and Turfan in the center of Chinese Turkestan, and additional documents in Tocharian B have been found further west, around Kucha (Adams 1988, p. 1). The documents are datable to around the sixth, seventh, and eighth centuries AD and were often left by Buddhists as votive offerings at desert shrines.
We do not know with certainty whether Tocharian was spoken by the Yüeh-chih/Tokharoi populations (see Adams 1988, p. 4f. for a more in-depth discussion). Adams argues that there are correspondences with the Germanic languages, Indic, Greek and, to a lesser degree, Baltic, and therefore suggests that pre-Tocharian was a dialect located geographically close to pre-Germanic and, more distantly, to pre-Baltic. Later on, the people speaking pre-Tocharian must have moved south and east and had some contact with the ancestors of the Greeks (maybe in the northern Balkans) and the Indo-Iranians (while moving across the Pontic steppes).
Tocharian A was solely a liturgical language, while Tocharian B was both a written and spoken language. Both Tocharian A and B were written mostly in a variety of the north Indian Brāhmī syllabary (Adams 1988, p. 9).
The opinions about how exactly Italic and Celtic relate to each other are mixed, yet I choose to group these two. My reasoning is as follows:
There are both morphological and phonological correspondences between the two branches (Kapović 2017, p. 320):
- Thematic genitive in -ī
- Superlative suffix *-(i)sṃmo-
- Development of *CṚHR to CrāC
The value of at least some of those connections as evidence is still debated. Sims-Williams recognizes the Italo-Celtic theory as resilient but is skeptical. His arguments include that the thematic ī-genitive is also found in Messapic, but not in Celtiberian or Osco-Umbrian, and that the superlative suffix “may be due to diffusion over the long period during which Italic and Celtic speakers lived side by side” (Sims-Williams in Kapović 2017, p. 357).
Languages are usually grouped together on the basis of shared morphological innovations, because it is unlikely that they developed the same morphological traits independently; such traits are very likely inherited. Since the correspondences listed by Kapović are mainly morphological, I have chosen to group Italic and Celtic together.
(Map: from Weiss 2011, p. XXVII)
The Italic branch includes Latin, which is the only surviving language that was spoken in ancient Italy (apart from Greek), and which was closely related to Faliscan. Another branch is formed by the Sabellic languages (Oscan, Umbrian and South Picene) (Kapović 2017, p. 317). The last branch is Venetic. However, while Weiss sees Venetic as its own branch within Italic, Kapović (2017, p. 318) writes that only some phonological evidence points in this direction, while the morphological evidence does not point either way, and leaves the question open for now. I have chosen to keep Venetic as a branch of its own in my model.
Latin is attested from the 6th century BC (Very Old Latin) to the 7th century AD (Late Latin) (Weiss 2011, p. 23). The history of the Latin alphabet goes back to Greek. The Greek alphabet that most people are familiar with is the Ionian variant. But the alphabet that eventually contributed to the development of the Latin alphabet is the Western Greek alphabet, which had some differences, for example preserving the digamma for a longer time. This alphabet was adopted by the Etruscans (who were not Indo-European speakers) and adjusted to the needs of their own language. The Romans then adapted the Etruscan alphabet.
Sabellic is a sub-branch that contains amongst others: Oscan (ca. 650 mostly short inscriptions mainly from the 5th to the 1st century BC), Umbrian (ca. 40 short inscriptions and the Tabulae Iguvinae, 7th to 1st century BC), South Picene (ca. 20 inscriptions, 6th to 4th century BC) and Pre-Samnite (ca. 20 inscriptions, 5th century BC) (Weiss 2011, p. 13).
There are approximately 400 documents in Venetic, dated from ca. 550 BC (written in a modified version of the Etruscan alphabet) to 100 BC (Latin alphabet). Weiss writes that Venetic is probably a separate branch of Italic, but could also be classified as a separate, closely allied language (Weiss 2011, p. 13 & 15-16).
(Map from Fortson 2010, p. 310 – The Celts)
Celtic can be divided into two groups: Continental Celtic, which was extinct by AD 500 at the latest, and Insular Celtic, which includes all the surviving Celtic languages (Kapović 2017, p. 352 ff.).
Continental Celtic: Gaulish, Galatian & Celtiberian.
Insular Celtic: This branch is again divided into the Brythonic (or Brittonic) and the Goidelic (or Gaelic) sub-branch. The two sub-branches are very distinct and mutually unintelligible (Kapović 2017, p. 352 ff.):
- Brythonic (Welsh, Breton & Cornish)
- Goidelic (Irish, Scottish Gaelic, Manx)
The term Celtic is a modern one. The Irishmen in medieval times spoke Goídelach and medieval Welshmen called themselves Brython or Cymry and spoke Cymraeg. The kinship between those languages was likely discovered much later, and the idea of a pan-Celtic ethnic unity is a modern Romantic one.
The “homeland” of the Celtic speakers is so far unknown. The archaeological association of Celtic speakers with the Hallstatt or La Tène cultures is according to Sims-Williams (in Kapović 2017, p. 354) not without problems, even though Celtic speakers likely have been part of those cultures. But a correlation between La Tène and Celtic does not work in Spain or northern Italy, where the findings of Celtic inscriptions do not correlate with La Tène material. Historical references by Latin and Greek writers are problematic as well, as they may not correspond to our modern linguistic term Celtic, and they may not have differentiated between barbarian people such as the Celtae and the Germani.
(Map from Fortson 2010, p. 352 – The Germanic peoples around 500 AD)
The position of the Germanic languages within the Indo-European family is problematic, as is their presentation within a tree model (Kapović 2017, p. 388). There is some evidence that North and West Germanic are closer to each other than to their sister East Germanic, and they are therefore grouped into one branch, North-West, with East Germanic being the other branch:
(graphic by me)
The fact that East Germanic left Southern Scandinavia relatively early and left a dialect continuum of North and West Germanic supports this model.
Germanic has been referred to as a homogeneous proto-language which split into dialects at a later time, although traces of old dialectal differences make this seem unlikely (Klein 2017, p. 989).
As for the placement within the Indo-European languages, the problem is that the historical neighbors of Germanic are not always regarded as its closest relatives. The first Consonant Shift (Grimm’s law) is a perfect candidate for being the main distinctive feature of Germanic, separating it from the other Indo-European languages. On the other hand, it might just be the most obvious one, since other phonological, morphological and accentual changes have influenced Germanic just as much, if not more (Verner’s law, the loss or vocalization of laryngeals, vowel mergers, accent fixation on the first syllable / stem syllable) (Klein 2017, p. 988 f.).
Germanic is an extremely complex branch, and as Bousquette and Salmons note, “shared innovations are unlikely to be independent innovations, but these prove problematic” (Bousquette and Salmons in Kapović 2017, p. 388). Ludwig Rübekeil (in Klein 2017, p. 989) describes a mechanism called drift: “… the phenomenon whereby parallel linguistic processes often appear in genetically related languages subsequent to their separation, apparently owing to their common past”. Considering both statements, it is very difficult to place Germanic precisely within the family, and we cannot be sure about the relationship between its subbranches either. I have nevertheless chosen to place North and West Germanic together, as the sociohistorical evidence (Kapović 2017, p. 388) seems to me important to consider in this model as well. In the general tree model, I follow Olander (2019, p. 8 & 9, figure 2.1. and table 2.1.).
4.1 North Germanic
Ancient Norse is the earliest attested stage, known from inscriptions in the older fuþark, which consists of 24 letters. The change from Ancient Norse to Old Nordic occurs around AD 700, when the older fuþark is replaced by the younger fuþark (16 letters). This branch includes Old Icelandic, Old Norwegian, Old Swedish and Old Danish. Old Swedish and Old Danish together make up the East Norse languages, Old Norwegian and Old Icelandic the West Norse languages. The best attested and most conservative language of this branch is Old Icelandic, from the 9th century AD and later, with a large vernacular literary tradition. The earliest attestations of Old Norwegian are from around AD 1150-1200. The East Norse languages are more sparsely attested, from around AD 1250 (Klein 2017, p. 878 f.).
4.2 East Germanic
Gothic is the earliest Germanic language attested in a longer text, the Gothic Bible translation by Wulfila (ca. AD 350-380). Gothic became extinct after the realms of the Ostrogoths and Visigoths collapsed. The latest attested East Germanic language is Crimean Gothic, whose attestation includes a list of 86 Gothic words, and which is probably a late EG dialect (Klein 2017, p. 879).
4.3 West Germanic
Early West Germanic developed after the Angles, Jutes, and parts of the Saxons left to settle in Britain, and a natural border between North Germanic and West Germanic was created by the 6th century AD.
Part of this branch is Old Saxon (Old Low German), for which we don’t have many runic sources. The major literary source is the Heliand, composed between AD 830 and 840. Its descendant is Middle Low German.
Old English is attested in runic inscriptions starting in the late 5th century AD, written in the Anglo-Frisian fuþorc, which had additional runes, testifying that sound changes had occurred. The earliest Old English text, from around AD 600, is King Æþelberht’s Kentish law code. The oldest Old English poetry includes Beowulf. Its descendant is Middle English.
Old High German runic inscriptions date back to the 6th century AD, found in southwestern Germany. After AD 600, the runic tradition ends suddenly as a consequence of Christianization. Notable for Old High German is the second sound shift, first attested on the Wurmlingen lancehead. The first literary texts are from the late 8th century AD. The descendant of Old High German is Middle High German. Old Frisian belongs to the same branch, attested from AD 1200. Other languages are Langobardic and Old Dutch (Klein 2017, p. 881 ff.).
Modern German, as it is now, is spoken in a large area (Germany, Austria, Switzerland) and is a dialect continuum, in which individual borders are almost impossible to set.
5. A Balkan Group – Balkan-Indo-European?
Within the Indo-European languages, Armenian has a close connection to Greek. This means that both languages, and their speakers, could have been in close contact in prehistoric times in the area north of the Balkan Peninsula and the Black Sea (Schmitt 1981, p. 23).
There are also theories that Armenian has a connection with the fragmentarily attested Phrygian language (Asia Minor), or with Albanian, so that a Balkan branch has been suggested (for example Holst 2009). This branch might include Greek, Phrygian and Albanian, which would mean they had a common ancestor before Greek split off. If this were the case, we should be able to track down some shared innovations. There are lexical similarities between Armenian and Greek, but those alone are not enough to assume a Balkan branch; for that, shared morphological or phonological traits are needed.
One could argue, for example, that the prothetic vowels (deriving from word-initial laryngeals), which are found in Greek, Armenian and Phrygian (Olsen 2017, p. 429), are such a trait, amongst others. But the Greco-Armenian theory is also opposed, for example by Kim (2018). He sees no close connection between Armenian and Greek, let alone the existence of a Balkan branch. Kim argues in his article (2018, p. 259 f.) that features like the augment are also found in Indo-Iranian (and are therefore not exclusive to Greek and Armenian), and he sees a closer connection between Armenian and Indo-Iranian. He also argues that lexical similarities are not enough to establish these connections, as they could be loans. The shared phonological and morphological traits are simple and natural enough to have developed in those languages independently.
Even though a Balkan branch remains a possibility, I do not include it here. Maybe at some point a consensus can be reached regarding its existence. Lexical similarities can still be interesting if the words are formed in the same way: given the same root with the same suffixes, for example, we might consider independent invention in two languages unlikely. However, Armenian is a satem-language and Greek is a centum-language. How does that fit together, if they belong to one branch? Sadly, I do not have enough space in this thesis to discuss this further, but those are possibilities which need to be considered when arguing for or against a Balkan branch, or for that matter a Greco-Armenian branch.
(Map from Fortson 2010, p. 250 – Greek dialects)
The earliest attestations of the Greek language are written in Linear B, a syllabic writing system (Sihler 1995, p. 9), on tablets from Knossos on Crete dated to the 14th century BC. This was Mycenean Greek, and more tablets were found in continental Greece, among others in Pylos, Thebes and Mycenae (Meier-Brügger 2000, p. 26 f.).
The first texts in Ancient Greek are the epics the Iliad and the Odyssey. While the stories were supposedly written by Homer and are thought to date back to the Bronze Age, the written form dates to around the 8th century BC. They were most likely transmitted orally and written down later. The oldest sources were found in Egypt and date from the 2nd century AD (Meier-Brügger 2000, p. 27).
Ancient Greek was divided into several dialects, collected into two primary groups (Klein 2017, p. 1583): East Greek (including Attic-Ionic, Arcado-Cypriot) and West Greek (including Aeolic with Lesbian, Thessalian and Boeotian; and Doric-Northwest Greek).
Later on, the Koine, a common dialect for all Greeks, was established. It was based on Attic but also has some traits from the other dialects (Rix 1992, p. 6). Modern Greek is not usually treated in the Indo-European textbooks, as it has taken in a lot of loanwords and is fairly different.
The Greek alphabet derives from the Phoenician alphabet, which was adapted from the West Semitic alphabet and consisted only of consonants (which is typical for Semitic writing). From the 8th century BC, the Greeks adapted the Phoenician alphabet to their needs and added letters for the consistent marking of vowels (Rix 1992, p. 25).
(Map from Fortson 2010, p. 384)
It is not exactly clear where the Armenians came from. It is established that they lived in the area around Mount Ararat and Lake Van, as well as the headwaters of the Euphrates and Tigris Rivers. There are theories that the Armenians invaded from Asia Minor, and maybe even from farther north (Schmitt 1981, p. 18-19).
The Armenian alphabet, invented by the Armenian monk Mesrop, appeared fairly late compared to the scripts of other Indo-European languages, around the 5th century AD. The oldest known Armenian texts date from this time.
Classical Armenian, the oskedarean Hayerên (“Golden Age Armenian”), also called grabar, is a form of koine based on one Armenian dialect. Not much is known about other dialects. Since Armenian writing consisted mainly of translations of the Bible and Christian texts, which were written in Classical Armenian, there is a lack of knowledge about the colloquial language at that time (Schmitt 1981, p. 20 f.).
(Map from Fortson 2010, p. 447)
Albanian is spoken by people in Albania and the Albanian diaspora in America, Europe (former Yugoslavia, Macedonia, Greece and Southern Italy) and Oceania. There are two main dialects in the Albanian homeland, separated roughly by the Shkumbin river (Fortson 2010, p. 446):
- Geg, spoken in northern Albania, former Yugoslavia, Macedonia and Turkey
- Tosk, spoken in southern Albania, Greece and Italy. The Italian variety, the dialect Arbëresh, is the most archaic version. Standard Albanian is based on Tosk.
Albanian is the last Indo-European branch to appear in written records. It was heavily influenced by Latin from the 2nd century BC. Later on, loanwords were added from Macedonian and Bulgarian. The first recorded piece of Albanian is a baptismal formula (1462). The first known book in Tosk dates to 1592. The linguistically most interesting text is the Albanian law code, the kanun (Fortson 2010, p. 446 ff.).
The existence of a common ancestor of Indo-Iranian and Balto-Slavic (Indo-Slavic), after Greek, Armenian, Albanian etc. had split off, is also under discussion. I have chosen to accept the combination of the two as Indo-Slavic, because they share some striking features:
- Both share the RUKI-rule, in exactly the same way – this would be indeed strange if it was an independent development
- Both branches consist of satem-languages
The best-known early Indo-Iranian language is Sanskrit, with Vedic Sanskrit being the oldest known variety. It can, for example, be found in the hymns of the Rig Veda.
Nuristani is sometimes thought of as a third branch besides Indic and Iranian, but this is still debated (Fortson 2010, p. 202).
The Mitanni texts are the earliest texts in which Indo-Iranian is attested, but only in the form of loanwords. They were found in the Near East, as citations in Hittite and Hurrian texts, linked to an empire called Mitanni (or Mittani). Most of the words could be either Iranian or Indic, but a-i-ka “one” points to Indic origin (Fortson 2010, p. 206 f.).
The speakers of Proto-Indo-Iranian are commonly identified with the archaeological Sintashta culture and its successor, the early Andronovo cultural horizon; the language was spoken around the end of the 3rd millennium BC (Kulikov in Kapović 2017, p. 205).
Indic tribes migrated from the Iranian plateau to the Punjab, in the northwest of modern India, probably during the mid-second millennium BC. The Indus Valley civilization, already flourishing before the arrival of the tribes, left behind some inscriptions in a language which was not Indo-European; it may have been Dravidian (Fortson 2010, p. 206).
The most ancient Indic language is Sanskrit (broadly referring to Old Indic), attested from the second millennium BC onwards (Kulikov in Kapović 2017, p. 214). Vedic Sanskrit and Classical Sanskrit are the two major dialects belonging to Old Indic. Vedic takes its name from the Vedas, a collection of sacred lore, the oldest of which is the Rig Veda. Like the Homeric epics, the Rig Veda was likely composed long before it was written down and was transmitted orally in the meantime (Fortson 2010, p. 208).
Classical Sanskrit is the language of the two important Indian epics, the Mahābhārata (which includes the famous Bhagavad-Gītā) and the Rāmāyaṇa. The language is still used today to a certain extent (Fortson 2010, p. 209).
Middle Indic: Sanskrit is no longer spoken but is used as a sacral language in a Hindu context. The Prākrits (the Middle Indic languages) were used for poetry and dramatic work as well as in religious texts. The late Middle Indic vernaculars were used in the literary tradition as well, and the colloquial language was represented by early forms of New Indic (Kulikov in Kapović 2017, p. 215).
All the Middle and Modern Indic languages essentially derive from Sanskrit.
Avestan is the oldest preserved Iranian language. Old (Gathic) Avestan, an archaic dialect comparable to Vedic, is the earliest preserved stage of the language. Younger Avestan has a larger corpus of preserved written records. The manuscript tradition goes back to the Sasanian period (224-651 AD) (Sims-Williams in Kapović 2017, p. 264 ff.).
The only other older Iranian language preserved in a significant amount is Old Persian, known from inscriptions of the Achaemenian period, 6th to 4th centuries BC. Like Avestan, it is written in a specially invented script, in this case a form of cuneiform writing (Sims-Williams in Kapović 2017, p. 266).
(Map from Fortson 2010, p. 237 – The Persian Empire under Darius I)
The Old Persian inscriptions are the only authentic Old Iranian originals, written by the very people who spoke the language. These inscriptions come from various sites in western Iran, most of them from the reigns of Darius I (reigned 521 – 486 BC) and his son Xerxes (reigned 486 – 465 BC). It is in these inscriptions that Median loanwords are attested (Fortson 2010, p. 238).
Middle Iranian languages: These are usually divided into East and West Iranian (Fortson 2010, p. 241 ff.). Not all of them can be traced back directly to Avestan or Old Persian, since considerable admixture from other dialects went into the Middle Iranian languages.
- West Middle Iranian: Parthian (the official language of the Arsacid dynasty, 247 BC – 224 AD), Middle Persian (the official language of the Sasanian dynasty, 224 – 651 AD), Pahlavi (the language of the Zoroastrian Middle Persian texts) and Manichean Middle Persian.
- East Middle Iranian: Bactrian, Khotanese and Tumshuqese (collectively referred to as Saka), Sogdian and Choresmian.
It should also be mentioned that there are many Modern Iranian languages, for example Ossetic, Kurdish, Farsi, Balochi and Pashto.
(Map from Fortson 2010, p. 421 – The Slavs and Balts around 1000 AD)
Even though the grouping of Baltic and Slavic into one speech community has been problematic for some, most Indo-Europeanists agree that those language branches belong closely together. Prehistorically, the Balts and Slavs were located in Eastern Europe, close to the Germanic tribes, which can be seen in numerous shared features between Germanic and Balto-Slavic, for example (Fortson 2010, p. 414):
- Dative and instrumental plural with *-m-formant rather than *-bh-formant
- Demonstrative pronoun with the stem *ḱi-
- Both have merged *a and *o
Balto-Slavic shares the RUKI rule with Indo-Iranian and is also a satem branch, which points to an early period of common development. It has three phonological features that are unique within the IE language family (Fortson 2010, p. 414 f.):
- Distinction between rising and falling pitch accents
- Change of the syllabic resonants typically to resonants preceded by i
- Change of *VRHC to *V̄RC.
We know much less about the Balts than about the Slavs. In historical times, the Balts occupied a smaller area along the Baltic Sea, but in the Late Bronze Age, they may have stretched from the western border of Poland to the Ural Mountains. It is a great loss for Indo-European studies that the Baltic languages were written down so late (Fortson 2010, p. 432 f.).
The only surviving Baltic languages are Lithuanian and Latvian (East Baltic). Old Prussian (West Baltic) is extinct.
The ancestors of the Slavs were likely once located in or near Iranian territory, as Iranian loanwords were borrowed into pre-Slavic at an early date. The territory would have been north of the Black Sea, where a number of river names, such as the Dnieper and the Don, seem to be Iranian. The tribes then moved westwards and came into contact with Germanic tribes, which can be seen in the features that Germanic shares with Balto-Slavic. The modern Slavic languages only arose about 1500 years ago (Fortson 2010, p. 420).
Again, it was religion which led to the beginnings of Slavic literacy. In AD 863, two brothers, the monks and missionaries Constantine and Methodius, came to Moravia to teach the Slavs about Christianity. Methodius did most of the translation work, while Constantine devised a script, which is now called Glagolitic. The Cyrillic script, which is used nowadays, is distinct from it and was devised about thirty years later in Bulgaria, based on the Greek majuscule letters. None of the brothers’ translations has survived, so the earliest known Slavic comes from their successors in Bulgaria. It is called Old Church Slavonic because of its role in Slavic Christendom. Old Bulgarian is the Old Church Slavonic of Bulgaria (Fortson 2010, p. 427).