The Indo-European Cognate Relationships (IE-CoR) dataset is an open-access relational database of cognates (words inherited from a common ancestral form) across 160 Indo-European languages. Developed by a consortium of 89 linguists, it is intended as a benchmark for computational research into the evolution of the language family. It contains 25,731 lexeme entries grouped into 4,981 cognate sets across 170 core meanings. The dataset also includes time-calibration data, geographical and social metadata, and a novel structure for coding horizontal transfer, and it follows the Cross-Linguistic Data Format (CLDF) for interoperability and long-term accessibility. IE-CoR addresses limitations of previous datasets through broader coverage, rigorous coding protocols, and a focus on the primary cognate state of root morphemes, making it a resource for phylogenetic and quantitative linguistic research.
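To make the structure concrete, the following minimal Python sketch shows how lexeme entries can be grouped into cognate sets and turned into the binary presence/absence coding commonly used in phylogenetic work. It is an illustration under stated assumptions, not the IE-CoR pipeline: the column names (`Language_ID`, `Parameter_ID`, `Form_ID`, `Cognateset_ID`) follow CLDF's default table layout, the helper functions are hypothetical, and the rows are toy examples invented here rather than taken from IE-CoR.

```python
import csv
import io
from collections import defaultdict

# Toy rows invented for illustration; they are NOT taken from IE-CoR.
# Column names are assumed to follow CLDF's default FormTable layout.
FORMS_CSV = """ID,Language_ID,Parameter_ID,Form
f1,english,FATHER,father
f2,german,FATHER,Vater
f3,latin,FATHER,pater
f4,english,DOG,dog
f5,german,DOG,Hund
"""

# Assumed CLDF-style CognateTable: one cognacy judgement per row.
COGNATES_CSV = """ID,Form_ID,Cognateset_ID
c1,f1,FATHER-1
c2,f2,FATHER-1
c3,f3,FATHER-1
c4,f4,DOG-1
c5,f5,DOG-2
"""


def read_rows(text):
    """Parse CSV text into a list of dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def group_cognate_sets(forms, cognates):
    """Map each cognate set ID to its (language, form) members."""
    forms_by_id = {row["ID"]: row for row in forms}
    sets = defaultdict(list)
    for judgement in cognates:
        form = forms_by_id[judgement["Form_ID"]]
        sets[judgement["Cognateset_ID"]].append(
            (form["Language_ID"], form["Form"])
        )
    return dict(sets)


def presence_matrix(cognate_sets):
    """Return the sorted set IDs and, per language, a 0/1 vector over them."""
    set_ids = sorted(cognate_sets)
    languages = sorted(
        {lang for members in cognate_sets.values() for lang, _ in members}
    )
    matrix = {
        lang: [
            int(any(member == lang for member, _ in cognate_sets[set_id]))
            for set_id in set_ids
        ]
        for lang in languages
    }
    return set_ids, matrix


cognate_sets = group_cognate_sets(read_rows(FORMS_CSV), read_rows(COGNATES_CSV))
set_ids, matrix = presence_matrix(cognate_sets)

for language, row in matrix.items():
    print(language, row)
```

In this toy example, English "dog" and German "Hund" fall into different cognate sets for the same meaning, so they occupy different columns of the matrix, while the three words for FATHER share one column.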
Explores etymology, focusing on Proto-Indo-European roots, the connections they reveal among modern languages, and the detective work involved in tracing word origins. Discusses examples such as 'mellifluous', along with language families and historical linguistic patterns.