Peering at the Tocharians through Language: A Window to the Ancient Europoid Folk of Western China

Written by Afsheen Sharifzadeh, a graduate of Tufts University focusing on Iran and the Caucasus. The goal of this article is to present the Tocharian narrative in a broad linguistic framework, with a focus on affinities to earlier Proto-Indo-European. 

Europoid-type “Tarim Mummies” found in Xinjiang, China, dating back to around 1800 BC. Chinese sources spanning nearly a millennium consistently describe the Tocharians as having full beards, red or blond hair, deep-set blue or green eyes, and high noses. The mummies, particularly the early ones, are frequently associated with the presence of the Indo-European Tocharian languages in the Tarim Basin, although Mallory and Mair attribute the later mummies to the Iranian Saka (Scythian) people, who settled later in the western part of the basin.


What do Englishmen, Sicilians, Spaniards, Bengalis, Kurds, Russians, Welshmen, Germans, Pashtuns, Lithuanians, Armenians, Australians, Persians, Irish, Greeks, Swedes, Punjabis, Albanians, Brazilians, Icelanders, Romani, Ossetians, and many other peoples all have in common? Astonishingly enough, we all speak languages derived from a single Mother Tongue. This is a curiously underappreciated fact amid the clutter of our daily social interactions and, more broadly, amid our latent perpetuation of decidedly irreconcilable ethnic consciousnesses.

But this Mother Tongue was not a monolith. Nor was it ever recorded or attested, to the dismay of modern linguists. Variations of it were spoken over roughly two thousand years, between 4500 BC and 2500 BC, as it underwent drastic regional transformations and passed through defining bottleneck events. All the while, its daughters were splitting and differentiating through mass migrations of peoples across Eurasia, sometimes losing and regaining contact with one another after centuries or millennia. In the absence of written records of any of these highly dynamic prehistoric vernaculars, linguists have used the comparative method of language reconstruction to produce a long, fragmentary list of words used in daily speech. These reconstructed vocabularies have been combined with archaeological evidence to paint a compelling narrative for one of the most significant ethno-linguistic progenitors of modern human civilization.

Pashtun children in the village of Khost, Afghanistan. Pashto is an Eastern Iranian language and shares a common ancestor with languages such as English, Russian, Italian, Welsh and Hindi: the Proto-Indo-European language, spoken between ~4,500 and 2,500 B.C.

The Mother Tongue is known to linguists as Proto-Indo-European. But where did the progenitors of all these modern languages, the Proto-Indo-Europeans, live? After all, they were not some obscure race of language-speaking humanoids roaming aimlessly on a primitive Earth. They were a pluralistic people who lived fairly recently, with families, communal responsibilities, ambitions and concerns like you and me, and who spoke an adaptive language with which they sang, joked, loved, lamented and prayed. Their world was populated by many different language families, but theirs came to include the languages of roughly half the world’s population by the modern era.

The Tocharians as Indo-Europeans

The answer to that question has been the subject of heated debate among archaeologists and linguists for over a century. In the opinion of this author, the preponderance of archaeological, philological, and chronological evidence points to a Pontic-Caspian Urheimat (homeland) for the speakers of Proto-Indo-European. This so-called “Kurgan Hypothesis” posits that the riverine steppe lands stretching from southern Ukraine deep into the Ural Mountains of Russia were home to a semi-nomadic, pastoralist, glory-inspired people who built burial mounds (kurgans, a Turkic word), sacrificed animals, and smoked cannabis, and whose commitment to ceremony and the client-host tradition, coupled with their militaristic ingenuity, served as a franchising incentive for subject peoples to adopt their languages. This phenomenon, in which a population shifts language in emulation of an intruding but more powerful minority, is called Elite Dominance. The same model also explains the later extinction of Iranian languages in Central Asia and Azerbaijan, beginning in the 11th century AD, upon the arrival of a minority of Turkic-speaking peoples. In archaeology, our conjectured Proto-Indo-Europeans are identified with the Late Copper Age to Early Bronze Age Yamna culture.

The Kurgan hypothesis postulates a Pontic-Caspian steppe homeland for the speakers of Proto-Indo-European (pink). The black arrows represent the various branch splittings of neolithic PIE-speaking peoples between ~4,500-2,500 B.C.

The Yamna culture (early Proto-Indo-Europeans) was in turn a collection of semi-nomadic, pastoral tribes who could more or less understand one another, probably drawn to the Russian steppe (Samara and Khvalynsk) from the northeast Black Sea basin by adverse climatic changes. As such, we might better conceive of Pre-Proto-Indo-European (the stage of linguistic development before the Yamna horizon) as a group of related dialects descended from a proposed Indo-Uralic ancestor (connecting Indo-European to Uralic languages like Finnish, Estonian, Hungarian, and Mari). Indo-Uralic has in turn been placed within larger hypothetical groupings (Uralo-Siberian, Eurasiatic) of the proposed primitive Nostratic macrofamily. Vladislav Illich-Svitych suggested that Nostratic was an incredibly remote, primitive but expressive ancestral language of Indo-European, Uralic, Altaic, Semitic, Kartvelian, and, more disputably, other families, spoken by bands of foragers near the end of the last glacial period some 13,000 years ago. If such a conjecture has any bearing on reality, then Indo-European languages would have remote genetic affinities to modern languages like Mongolian, Arabic, Ainu, Turkish, Somali, Nivkh, Georgian, and Korean. The implications of such a theory are earth-shaking for modern social constructs of “ethnicity” throughout the world.

Centum-Satem isogloss between Indo-European branches descended from splitting events of neolithic pastoralists migrating out of the Pontic-Caspian Indo-European homeland. The Centum languages (blue) departed first and share a number of archaic phonological features that the Satem languages (red), which stayed behind, later lost through innovation (Indo-Iranian, Baltic, Slavic, Armenian; Albanian shows incongruities). The hypothetical area of origin of satemization also happens to fall within the range of the Sintashta/Abashevo/Srubna cultures (dark red). Tocharian, the easternmost Indo-European language, was spoken in the Silk Road caravan cities of the Tarim Basin in northwestern China. It also lacks the Satem and Ruki innovations, so it likewise seems to have departed before satemization took place.

When the first wheeled wagons rolled into the Pontic-Caspian steppe via the Caucasus piedmont from the ancient urban civilizations of the Near East around 4,500 BCE, the new invention spurred what archaeologists refer to as the Yamnaya horizon. This horizon transformed the Yamna culture (Proto-Indo-Europeans) into a mobile, expansive economy. Many migrations (especially the Corded Ware cultural horizon, which stretched from the Netherlands to the Volga) coincide with the revolutionary new technology of the wheeled wagon, as reflected in their Indo-European lexicons. Over the millennia, a combination of push and pull factors (perhaps tribal conflicts, climatic changes, and economic incentives) spread the speakers of PIE throughout Europe and Asia and gave rise to a number of distinct and innovative cultural horizons (TRB/Globular Amphora culture, Funnel Beaker, Pit-Grave/Poltavka, Catacomb-Grave, Abashevo-Fatyanovo-Balanovo, Andronovo, Timber-grave, Usatovo, etc.). These interacted with and often displaced Proto-Uralic and Paleo-European-speaking cultures. The Paleo-European languages are prehistoric European languages of unknown provenance, such as the language of the Cucuteni-Tripolye culture; a modern survivor is Basque. The speakers of Proto-Indo-European then pioneered the chariot using a technology from the Fertile Crescent, and it was not until a millennium later that chariots appeared in China. The Beijing Chinese word for wheel, kulu, bears an interesting resemblance to Tocharian kokale (from PIE *kʷel-/*kʷol-), which descends from the nearby Repin Centum dialect.

The Proto-Indo-Europeans were the first to domesticate the horse and develop chariotry. The Mitanni dynasty ruled over a Hurrian-speaking population (non-Indo-European; likely related to Urartian and to modern Northeast Caucasian languages) in what is today northern Syria between 1500 and 1350 BC. It was likely founded by Old Indic-speaking mercenaries, perhaps charioteers, who usurped the throne, a common pattern in Near Eastern and Iranian dynastic histories. The Mitanni rulers regularly made references to the hymns and deities of the Rig Veda to the east, including Indra, Varuna, and the Nasatyas, or Divine Twins. The Mitanni military aristocracy was headed by the “maryanna” (from Indic “marya”, “young man”, the term used in the Rig Veda for the heavenly war-band assembled around Indra). All Mitanni kings, first to last, took Old Indic throne names, such as Tvesa-ratha (“having an attacking chariot”). In the oldest surviving horse-training manual in the world, a Mitanni horse trainer used many Old Indic terms for technical details, including horse color and number of laps.

At some point around 3,700-3500 BCE, a mass migration took place from the central zone of the Yamna culture, around a site called Repin between the Don and Volga. The push factors for this Trans-Ural exodus are unknown, but it may have been encouraged by the new opportunities for social and economic expansion offered by the novel mobile economy discussed above, or perhaps it was due to a conflict event. The migrants settled on virgin land on the contact zone with Siberian foragers (hunters and gatherers; perhaps Proto-Altaic-speaking) a startling 2000 km to the east of their starting point, and this area developed into the Early Bronze Age Afanasievo culture. It is to the Afanasievo cultural horizon that supporters of the Kurgan Hypothesis ascribe a Pre-Tocharian pedigree.

The projected Pre-Tocharian migration from the eastern dialects of PIE across the Ural Mountain range and into the Altai region around 3,700-3,500 B.C., where the migrants likely interacted with speakers of Proto-Uralic and Proto-Altaic before migrating southward into the Tarim Basin.

At the time of this hypothetical Pre-Tocharian split from the eastern dialects, Proto-Indo-European was still at an early stage of its development. Pre-Germanic and Pre-Italo-Celtic would split off several centuries later into the Danube valley, around 3,300-3,000 BCE, from the western and central dialects respectively. Pre-Armenian, Pre-Albanian, Pre-Phrygian, and Pre-Greek split off later still with PIE transhumance into the Balkans. However, their origins are disputed and their affinities with one another are problematic for a number of reasons that are outside the scope of this article (such as incongruities in satemization and a Centum superstrate; see the Middle Dnieper multi-ethnic “vortex” culture for further reading). Later yet, Pre-Baltic and Pre-Slavic split off, probably from the northwestern dialects of PIE around 2,800 BCE. Finally, Pre-Indo-Iranian split from the northeastern group between 2,500 and 2,300 BCE. Tocharian was probably most closely related to the PIE dialects that were ancestral to the later Thraco-Phrygian and Armenian, but similarities with Italo-Celtic suggest an extended period of contact following an initial separation event.

Pre-Anatolian had split off first of all the daughters, around 4200 BCE, perhaps half a millennium before Pre-Tocharian. It separated from an archaic Proto-Indo-European that still lacked the feminine gender, complex verbal tenses (Anatolian has only present and preterite), and the dual number for nouns. It also predated the major phonemic and lexical shifts that would be passed down to the rest of her daughters. For some Indo-Europeanists, these traits suggest that the Anatolian branch did not develop from Proto-Indo-European at all. Rather, they argue, the two evolved from different geographic dialects of a Pre-Proto-Indo-European dialect continuum, termed “Indo-Hittite” by Edgar Sturtevant. As an example of such archaisms, almost all modern Indo-European languages have inherited PIE *do- “to give”, but this root originally meant “to take” in archaic PIE around the time of the Pre-Anatolian split. It later underwent a semantic shift, probably in the context of the Proto-Indo-European client-host gift-offering tradition of the steppe, so Hittite (Anatolian branch) instead has the archaic pai- “to give”. Over the following millennia, Pre-Anatolian differentiated into Lycian, Hittite, Luwian, and the poorly attested Palaic, among other languages. All of these are now long extinct, but they were once spoken for thousands of years in modern-day Turkey.

The Lion Gate at the ruins of Hattusha, Turkey. Hattusha was once the capital city of the Hattians, who are now believed to have been remote relatives of the Proto-Northwest Caucasian-speaking peoples. Beginning in the 4th millennium BCE, the Hattic language was gradually displaced by archaic Indo-European languages, most likely the Anatolian branch languages Luwian and Hittite. After nearly two thousand years of coexistence, the Hattians were ultimately absorbed and assimilated into Indo-European-speaking society by the end of the 2nd millennium BCE. Nevertheless, the Indo-European newcomers adopted the Hattians’ self-designation (Hatti, which in the opinion of this author is likely also the root of the Armenian self-designation Հայ Hay).

The Tocharian Language

Despite her early separation from PIE, Tocharian still shares a striking number of cognates with her sisters in her core vocabulary. If we indulge ourselves for a moment, we can imagine there was once a young Tocharian girl on a farm in western China who called out the ñem (Swedish namn, Kurmanji Kurdish nav, French nom) of her older procer (Dutch broeder, Persian barâdar, Russian brat) to käm (English: come, Kurmanji Kurdish: gav, Afrikaans: kom) help her mälka (German melken, Albanian mjel, Latin mulgere) the tri (Spanish tres, Lithuanian trýs, Pashto dre) kews (English cow, Armenian kov, Persian gâv) in the pen at nighttime under the lyuks (English light, Latin lux, Armenian luys) of the beautiful stars and meñe (Danish måne, Sorani Kurdish mang, Ukrainian misjac’). Meanwhile, her macer (Armenian mayr, Phrygian matar, German Mutter) and pacer (Italian padre, Hindi pitr, Persian pedar) were preparing the misa (English meat, Gothic mats, Armenian mis) of a yekwe (Latin equus, Hittite ekuus, Irish Gaelic each) for dinner and fetching fresh war (Hittite wa-a-tar, Belorussian vadá, West Frisian wetter) to wash it down.

| English | Tocharian B | Ancient Greek | Middle Persian | Portuguese | Proto-Indo-European |
|---|---|---|---|---|---|
| name | ñem | ónoma | nâm | nome | *h₃néh₃-m̥n |
| eight | okt | oktṓ | hašt | oito | *h₃eḱtéh₃(u) |
| mother | macer | mḗtēr | mâdar | mãe | *méh₂tēr |
| foot | paiye | poús | pây | pé | *pṓds |
| wolf | walkwe | lúkos | gurg | lobo | *wĺ̥kʷos |
| new | ñuwe | néos | nōg | novo | *néwos |
| star | śre | astḗr | stâr | estrela | *h₂stḗr |

Language comparison chart prepared by the author, illustrating a few readily recognizable cognates between Tocharian and several extant and extinct Centum and Satem members of the Indo-European family. These include English (Germanic), Ancient Greek (Hellenic), Middle Persian (Iranian), and Portuguese (Italic). In Middle Persian, satemization and the Ruki rule are reflected in the word for “eight” (“hašt”), and a labial-to-velar shift and the *r/*l merger are reflected in the word for “wolf” (“gurg”).

As the second major branch to split off from PIE, Tocharian preserves a number of archaisms that are absent in later branches. Notably, Tocharian is the only geographically “eastern” Centum language, because it split off before the Satem shift took place in PIE. The Centum branches merged the Proto-Indo-European palatovelars *ḱ, *ḱʰ, *ǵ, *ǵʰ with the plain velars *k, *kʰ, *g, *gʰ and kept the labiovelars *kʷ, *kʷʰ, *gʷ, *gʷʰ as a distinct set. The Satem branches instead merged the labiovelars with the plain velars and shifted the palatovelars to sibilants or affricates. For example, *ḱ became Sanskrit ś [ɕ]; Latvian, Avestan, Russian and Armenian s; Lithuanian š [ʃ]; and Albanian th [θ], but k before a resonant. As such, Tocharian also lacks the subsequent Ruki sound law, by which *s became *š after *r, *w, *K (any velar), or *y. However, some Indo-European tribes (dialects) maintained tribal-linguistic contact, often via assimilated substrates, before reaching their various distinct proto-stages; examples include Pre-Greek Catacomb and Pre-Tocharian Don-Repin. The Volga Uralic Mordvin languages (Erzya and Moksha) have loanwords from early Indo-Iranian, from East Baltic, and from a Tocharian-like Centum language of the Ural-Volga area associated with Repin. This implies another period of contact, probably while the Pre-Tocharians were en route to Afanasievo.

Documents from the 6th to 8th centuries AD attest two Tocharian languages, which probably split in the first millennium B.C. Tocharian A (Turfanian) is distributed along the eastern stretch of the northern Silk Road, while Tocharian B (Kuchean) is centered farther west, around Kucha. Tocharian A and Tocharian B were strikingly different languages, with radical divergence in their plural markers, case systems and verbal systems, and it is unclear whether they were mutually intelligible. Tocharian A was more archaic and was used solely as a Buddhist liturgical language. By contrast, the Tocharian B corpus includes both secular and religious documents, which suggests that it may have been the spoken language of the entire area (discussed below). Alternatively, the lack of a secular corpus in Tocharian A could simply be an accident resulting from the fragmentary preservation of texts. Lastly, Tocharian C is attested only in about 100 words preserved in Prakrit documents. Linguists have reconstructed these loanwords and attributed them to an otherwise unknown sister of Tocharian A and B.

The Mordvin people of the middle Volga region of Russia speak languages belonging to the Uralic family (which includes Finnish, Hungarian, Estonian, Saami, Mari, etc.). The Uralic proto-language homeland was probably in the birch-pine forest zone on the southern flanks of the Ural Mountains. Linguistic evidence suggests that Proto-Uralic and Proto-Indo-European likely shared two kinds of links. The first kind, revealed in the similarity of pronouns, noun endings, and shared basic vocabulary, could be ancestral. In this view, the proto-languages probably shared some quite ancient common ancestor, perhaps a broadly related set of intergrading dialects spoken by hunters roaming between the Carpathians and the Urals at the end of the last Ice Age (Joseph Greenberg calls this language stock “Eurasiatic”, perhaps ultimately descended from Nostratic). The second link is cultural. Proto-Uralic foragers traded with the neolithic Proto-Indo-European tribes migrating out of the homeland, who introduced them to agriculture and the wheel. Much later, they interacted again with Indo-Iranian and Tocharian migrants trekking eastward out of the PIE homeland.

On the relationship between Tocharian A and B, George Lane, an authority on Tocharian, writes: “at the time when the extant materials in dialect A were written it was purely a liturgical language in the monasteries of the east, and had been so preserved for several centuries at least…. it had long since ceased to be a vernacular [as a result of Turkic immigration into the area]… whereas Tocharian B was clearly the vernacular of a comparatively rich and flourishing culture [to the west and better protected by the mountains and the desert from the influence of the Turks].” It is very likely that B was also the language of everyday monastery life in the east, existing side by side with the liturgical form of A. Lane concludes: “the two Tocharian dialects A and B have gone through a long period of independent development… anywhere from five hundred to a thousand years…they are, in my estimation, no longer mutually intelligible.”

Wooden tablet with an inscription showing Tocharian B in its Brahmic form. Kucha, China, 5th-8th century (Tokyo National Museum)

The Decline of the Tocharians
More than a millennium after their Trans-Ural exodus, the Tocharians interacted with and borrowed extensively from the Indo-Iranians, their distant Indo-European cousins, though they could not have known it at the time. The most recent linguistic influences upon Tocharian were Iranian and Sanskrit. These influences resulted from extensive missionary activity from Iran and India, which coincided with the Tocharians’ adoption of Buddhism. These languages affected Tocharian primarily through loanwords, especially in religious terminology. There was also a notable Manichaean minority, again of Iranian provenance. The Europoid-type residents of Turfan and Kucha were first noted by the Chinese in the Han-shu, for the first century BC. The Han-shu lists them as one of the barbarian kingdoms of the western region that had been involved in many wars with the Chinese, along with the Hsiung-nu (Mongolian nomads), Turks, and Tibetans. The Chinese sources refer to the fair, red-haired inhabitants of the Tarim basin as Yuezhi. Ultimately, the Tocharians appear to have emerged as a devoutly Buddhist and mercantile people, serving as middlemen between the more advanced civilizations of early Imperial China, Southwest Asia, and the various Iranian peoples to the west.

Painting of Buddhist monks from Bezeklik, Eastern Tarim Basin, c. 8th century AD, with a Tocharian on the left. As a result of Iranian Buddhist proselytic activity, the Tocharians adopted Mahayana Buddhism and served as an important conduit for the spread of Buddhism into China and the East.

As such, the footprint of the speakers of the Tocharian languages remains blurred, as we can observe them only through the lens of Buddhism. Tocharians are represented iconographically either as Buddhists dressed in north Indian clothes or as warriors in Sassanian Iranian dress. Together with East Iranian peoples, such as the Bactrians, Kushans and Khotanese, the Tocharians seem to have played a role in the Silk Road transmission of Buddhism to China. Exactly when Middle Iranian-speaking peoples introduced Buddhism to Tocharia from India is unknown, since no historical records describe such a transmission. Nevertheless, it was likely around the beginning of the Common Era, as Kuchean Buddhist missionary monks were already active in China beginning in the third century AD.

Although details of the social and political undertakings of the Tocharians remain shrouded in mystery for lack of attestation, their culture seems to have lasted until the end of the first millennium of the Common Era. After that time they were either assimilated into the growing Turkic-speaking population of the area or simply died out. This means that the Tocharians formed a distinct ethno-linguistic grouping within Indo-European for nearly three millennia, which in turn offers quite a considerable window for study. Yet their own scant literary productions (mostly monastic and mercantile in nature), together with those of their neighbors, fail to provide us with any substantive account of their existence. In general, therefore, Tocharian has not been as helpful in reconstructing PIE as, for instance, Sanskrit, Greek, or Hittite have been. This is because of the rather late date of the extant documents, the language’s geographic isolation from other IE languages, and the influence of non-IE languages. However, Tocharian can teach us about the effects that a long migration and contact with members of other language families can have on an IE language and, as Winter says, “below the rather forbidding surface of our Tocharian data there are some real treasures to be found.”


Anthony, David W. “The Horse, the Wheel, and Language: How Bronze-Age Riders from the Eurasian Steppes Shaped the Modern World.” Princeton University Press, 2007.

Dickens, Mark. “Everything You Always Wanted to Know about Tocharian.”

Excerpt from Virdainas: a Jatvingian-Sudovian Dictionary. Jos. Paskha 2012.
