The Indo-European Cognate Relationships (IE-CoR) dataset is a comprehensive, open-access relational database of cognates (related words inherited from a common ancestor) across 160 Indo-European languages. It was developed by a consortium of 89 linguists and is intended as a benchmark for computational research into the evolution of this vast language family. It contains 25,731 lexeme entries for 170 core meanings, grouped into 4,981 cognate sets. The dataset also includes time-calibration data, geographical and social metadata, and a novel structure for coding horizontal transfer. It follows the Cross-Linguistic Data Format (CLDF) for interoperability and long-term accessibility. IE-CoR addresses limitations of previous datasets through improved coverage, rigorous coding protocols, and a focus on the primary cognate state of root morphemes, which makes it a valuable resource for phylogenetic and quantitative linguistic research.
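The following minimal Python sketch makes the "lexemes grouped into cognate sets per meaning" structure concrete. It assumes a simplified table in which each lexeme row already carries its language, meaning, and cognate-set identifier. The column names are borrowed from common CLDF conventions (`Language_ID`, `Parameter_ID`, `Cognateset_ID`). The rows are invented toy values, not IE-CoR data, and `binary_matrix` is a hypothetical helper. The sketch converts cognate-set membership into the binary presence/absence matrix that is commonly used as input to phylogenetic methods.

```python
from collections import defaultdict

# Hypothetical toy rows in the spirit of a CLDF FormTable joined to a
# CognateTable. Column names follow common CLDF conventions; the values
# are illustrative only and are NOT taken from IE-CoR.
FORMS = [
    {"Language_ID": "english", "Parameter_ID": "water", "Cognateset_ID": "water-1"},
    {"Language_ID": "german",  "Parameter_ID": "water", "Cognateset_ID": "water-1"},
    {"Language_ID": "latin",   "Parameter_ID": "water", "Cognateset_ID": "water-2"},
    {"Language_ID": "english", "Parameter_ID": "hand",  "Cognateset_ID": "hand-1"},
    {"Language_ID": "german",  "Parameter_ID": "hand",  "Cognateset_ID": "hand-1"},
]


def binary_matrix(forms):
    """Return (languages, cognate_sets, matrix) where each matrix row holds
    1 = language has a form in the cognate set,
    0 = language has a form for that meaning but in another set,
    '?' = language has no recorded form for that meaning."""
    languages = sorted({f["Language_ID"] for f in forms})
    set_meaning = {}              # cognate set -> meaning it belongs to
    members = defaultdict(set)    # cognate set -> languages attested in it
    attested = defaultdict(set)   # meaning -> languages with any form for it
    for f in forms:
        set_meaning[f["Cognateset_ID"]] = f["Parameter_ID"]
        members[f["Cognateset_ID"]].add(f["Language_ID"])
        attested[f["Parameter_ID"]].add(f["Language_ID"])

    cognate_sets = sorted(set_meaning)
    matrix = {}
    for lang in languages:
        row = []
        for cs in cognate_sets:
            if lang in members[cs]:
                row.append(1)
            elif lang in attested[set_meaning[cs]]:
                row.append(0)
            else:
                row.append("?")
        matrix[lang] = row
    return languages, cognate_sets, matrix


if __name__ == "__main__":
    langs, sets, m = binary_matrix(FORMS)
    print(sets)
    for lang in langs:
        print(lang, m[lang])
```

Each column is one cognate set. A `?` marks a meaning for which a language has no recorded form, so the gap is treated as unknown rather than as a lexical loss. IE-CoR's coding of horizontal transfer and its root-morpheme conventions are not modeled in this sketch.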
A pair of landmark studies has identified the originators of the Indo-European language family as the Caucasus-Lower Volga people, who lived in present-day Russia about 6,500 years ago.
>“We can see there was a small group of villages 5,700 to 5,300 years ago with just a couple thousand breeding individuals,” Reich said. “And then there was a demographic explosion, with these people going everywhere.”
A collection of 23 maps and charts on the origins, distribution, diversity, and evolution of languages, with particular attention to English and to global patterns.
Harvard researchers traced the origins of the vast Indo-European language family to the Caucasus-Lower Volga region. They also traced the Yamnaya, a population descended from this group that appeared around 3300 BCE and spread from Hungary to western China.