SemanticScuttle - klotz.me » klotz: dimensionality reduction+nlp

klotz: dimensionality reduction* + nlp*

Document Clustering with LLM Embeddings in scikit-learn

This tutorial demonstrates how to perform document clustering using LLM embeddings with scikit-learn. It covers generating embeddings with Sentence Transformers, reducing dimensionality with PCA, and applying KMeans clustering to group similar documents.

2026-02-11 Tags: document clustering, llm embeddings, sentence transformers, scikit-learn, pca, kmeans, dimensionality reduction, natural language processing, nlp by klotz
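
The pipeline this bookmark describes (embed, reduce with PCA, cluster with KMeans) might be sketched as follows. This is a minimal illustration, not the tutorial's own code: the helper names `embed_documents` and `cluster_embeddings`, the model name `all-MiniLM-L6-v2`, and the synthetic stand-in data are all assumptions made here so that the clustering step can run without downloading a model.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA


def embed_documents(texts, model_name="all-MiniLM-L6-v2"):
    """Embed texts with Sentence Transformers (the model name is an assumption)."""
    # Imported lazily so the clustering step below runs without this package.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name)
    return model.encode(texts)


def cluster_embeddings(embeddings, n_clusters=3, n_components=2, seed=0):
    """Reduce embeddings with PCA, then group the reduced points with KMeans."""
    reduced = PCA(n_components=n_components, random_state=seed).fit_transform(embeddings)
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(reduced)
    return reduced, labels


# Synthetic stand-in for real embeddings: three well-separated groups of 20
# points each in 384 dimensions (hypothetical data, not real documents).
rng = np.random.default_rng(0)
centers = rng.normal(scale=10.0, size=(3, 384))
fake_embeddings = np.vstack([c + rng.normal(size=(20, 384)) for c in centers])

reduced, labels = cluster_embeddings(fake_embeddings)
```

With real documents, `embed_documents(texts)` would replace the synthetic array, and `n_clusters` would need to be chosen for the corpus at hand.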

A Visual Exploration of Semantic Text Chunking

The article explains semantic text chunking, a technique for automatically grouping similar pieces of text to be used in pre-processing stages for Retrieval Augmented Generation (RAG) or similar applications. It uses visualizations to understand the chunking process and explores extensions involving clustering and LLM-powered labeling.

2024-09-21 Tags: text, chunking, nlp, rag, dimensionality reduction, hierarchical clustering, umap, summarization, llm by klotz
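
One common way to group adjacent text by meaning is to start a new chunk wherever the similarity between neighboring sentence embeddings drops. The sketch below illustrates that idea only; it is an assumption about the general technique, not the article's method. The function names, the `threshold` value, and the toy two-dimensional vectors are hypothetical, and a real pipeline would use embeddings from a model.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def semantic_chunks(sentences, vectors, threshold=0.5):
    """Split sentences into chunks where adjacent similarity falls below threshold."""
    if not sentences:
        return []
    chunks = [[sentences[0]]]
    for i in range(1, len(sentences)):
        if cosine_similarity(vectors[i - 1], vectors[i]) < threshold:
            chunks.append([sentences[i]])  # similarity dropped: open a new chunk
        else:
            chunks[-1].append(sentences[i])  # still similar: extend the current chunk
    return chunks


# Toy vectors standing in for sentence embeddings (hypothetical values).
sentences = ["Cats purr.", "Cats nap often.", "Stocks fell today.", "Bond yields rose."]
vectors = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]]

chunks = semantic_chunks(sentences, vectors)
```

Here the two cat sentences end up in one chunk and the two finance sentences in another, because the only large similarity drop falls between the second and third vectors.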

Diving into Word Embeddings with EDA

Exploratory data analysis (EDA) is a powerful technique to understand the structure of word embeddings, the basis of large language models. In this article, we'll apply EDA to GloVe word embeddings and find some interesting insights.

2024-07-12 Tags: word, embeddings, eda, glove, pca, dimensionality reduction, nlp, text, python by klotz

Mapping the tech world with t-SNE - Towards Data Science

2020-01-16 Tags: t-sne, deep learning, nlp, text, clustering, dimensionality reduction, medium by klotz

GitHub - lmcinnes/umap: Uniform Manifold Approximation and Projection

Alternative to t-SNE and PCA

2018-10-09 Tags: umap, visualization, dimensionality reduction, python, embedding, machine learning, t-sne, pca by klotz

t-SNE – Laurens van der Maaten

t-Distributed Stochastic Neighbor Embedding (t-SNE) is a (prize-winning) technique for dimensionality reduction that is particularly well suited for the visualization of high-dimensional datasets. The technique can be implemented via Barnes-Hut approximations, allowing it to be applied to large real-world datasets. We applied it to datasets with up to 30 million examples. The technique and its variants are introduced in the papers listed on the linked page.

2021-12-01 Tags: t-sne, laurens van der maaten, google, machine learning, visualization, dimensionality reduction, plot, word2vec, word embedding by klotz

What are the best free and easy tools for visualizing word vectors trained by word2vec? - Quora

2016-05-19 Tags: t-sne, word embedding, word2vec, visualization, dimensionality reduction by klotz
