klotz: dataset*

  1. The Indo-European Cognate Relationships (IE-CoR) dataset is a comprehensive, open-access relational database detailing cognates—inherited related words—across 160 Indo-European languages. Developed by a consortium of 89 linguists, it aims to serve as a benchmark for computational research into the evolution of this vast language family, encompassing 25,731 lexeme entries grouped into 4,981 cognate sets based on 170 core meanings. The dataset incorporates time calibration data, geographical/social metadata, and a novel structure for coding horizontal transfer, adhering to the Cross-Linguistic Data Format (CLDF) for interoperability and long-term accessibility. IE-CoR addresses limitations of previous datasets through improved coverage, rigorous coding protocols, and a focus on the primary cognate state of root morphemes, offering a valuable resource for phylogenetic and quantitative linguistic research.
  2. Learn how to fine-tune large language models like Llama 3 for function calling, enabling interaction with external tools and APIs for tasks like web search and math operations.
  3. HuggingFace has released FineWeb, a large-scale dataset of 15 trillion tokens (44TB on disk) designed for pretraining large language models (LLMs). The dataset is derived from CommonCrawl and undergoes rigorous deduplication and quality filtering, making it a valuable resource for researchers.
    2024-06-04 by klotz
  4. 2019-09-26 by klotz
  5. 2018-12-30 by klotz
  6. 2018-09-18 by klotz
  7. 2018-08-12 by klotz
  8. 2018-08-10 by klotz
  9. 2018-08-01 by klotz
  10. A DataFrame is a Dataset[Row].
    2016-12-12 by klotz
