klotz: dimensionality reduction* + nlp*

0 bookmark(s) - Sort by: Date ↓ / Title / - Bookmarks from other users for this tag

  1. The article explains semantic text chunking, a technique for automatically grouping similar pieces of text to be used in pre-processing stages for Retrieval Augmented Generation (RAG) or similar applications. It uses visualizations to understand the chunking process and explores extensions involving clustering and LLM-powered labeling.
  2. Exploratory data analysis (EDA) is a powerful technique to understand the structure of word embeddings, the basis of large language models. In this article, we'll apply EDA to GloVe word embeddings and find some interesting insights.
  3. Alternative to t-SNE and PCA
  4. t-Distributed Stochastic Neighbor Embedding (t-SNE) is a (prize-winning) technique for dimensionality reduction that is particularly well suited for the visualization of high-dimensional datasets. The technique can be implemented via Barnes-Hut approximations, allowing it to be applied on large real-world datasets. We applied it on data sets with up to 30 million examples. The technique and its variants are introduced in the following papers:

Top of the page

First / Previous / Next / Last / Page 1 of 0 SemanticScuttle - klotz.me: Tags: dimensionality reduction + nlp

About - Propulsed by SemanticScuttle