Learn how to build a simple semantic search engine using sentence embeddings and nearest-neighbor search. The tutorial covers the limitations of keyword-based search and shows how large language models can supply semantic understanding.
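
As a rough illustration of the nearest-neighbor idea, the sketch below ranks documents by cosine similarity to a query vector. It is not the tutorial's code: the three-dimensional vectors are made-up stand-ins for real sentence embeddings, and the names `DOCS`, `cosine_similarity`, and `search` are hypothetical.

```python
import math

# Hypothetical toy "embeddings": in practice these would come from a
# sentence-embedding model; the numbers here are invented for illustration.
DOCS = {
    "How do I reset my password?": [0.9, 0.1, 0.0],
    "Steps to recover account access": [0.8, 0.2, 0.1],
    "Best pizza toppings": [0.0, 0.1, 0.9],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search(query_vec, docs, k=2):
    """Return the k document texts whose vectors are most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in docs.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [text for _, text in scored[:k]]


# Hypothetical embedding of a query such as "forgot my login".
query = [0.85, 0.15, 0.05]
results = search(query, DOCS)
print(results)
```

Note that the query shares no keywords with the top results in this toy setup; the match comes entirely from vector proximity, which is the property keyword search lacks.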
This article compares the performance of LLM embeddings, TF-IDF, and Bag of Words for text vectorization and information retrieval tasks using scikit-learn. It provides a practical comparison with code examples and discusses the strengths and weaknesses of each approach.
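
To make the difference between the two classical vectorizers concrete, here is a minimal plain-Python sketch of Bag of Words and a textbook TF-IDF (raw count times `log(N / df)`). It is an assumption-laden illustration, not the article's code: the corpus and function names are hypothetical, and scikit-learn's `TfidfVectorizer` applies IDF smoothing and L2 normalization by default, so its numbers would differ.

```python
import math
from collections import Counter

# Hypothetical toy corpus, invented for illustration.
corpus = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs are pets",
]

tokenized = [doc.split() for doc in corpus]
vocab = sorted({tok for doc in tokenized for tok in doc})


def bag_of_words(tokens):
    """Raw term counts, one entry per vocabulary term."""
    counts = Counter(tokens)
    return [counts[term] for term in vocab]


def tf_idf(tokens):
    """Textbook TF-IDF: raw count times log(N / document frequency)."""
    counts = Counter(tokens)
    n_docs = len(tokenized)
    vec = []
    for term in vocab:
        df = sum(1 for doc in tokenized if term in doc)  # docs containing the term
        idf = math.log(n_docs / df)
        vec.append(counts[term] * idf)
    return vec


bow = bag_of_words(tokenized[0])
tfidf = tf_idf(tokenized[0])

for term, count, weight in zip(vocab, bow, tfidf):
    if count:
        print(f"{term:>4}  count={count}  tfidf={weight:.3f}")
```

In the first document, "the" has the highest raw count, but TF-IDF weights it below "cat" because "the" also appears in another document. Neither representation relates "cat" to "cats", which is the kind of gap embeddings are meant to close.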
This tutorial demonstrates how to perform document clustering using LLM embeddings with scikit-learn. It covers generating embeddings with Sentence Transformers, reducing dimensionality with PCA, and applying KMeans clustering to group similar documents.
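
The clustering step can be sketched with a small plain-Python k-means loop (Lloyd's algorithm). This is a stand-in for illustration, not the tutorial's scikit-learn code: the 2-D points are hypothetical substitutes for PCA-reduced embeddings, the PCA step itself is omitted, and the initial centroids are fixed by hand so the run is deterministic.

```python
import math

# Hypothetical 2-D points standing in for PCA-reduced document embeddings.
points = [(0.1, 0.2), (0.0, 0.0), (0.2, 0.1), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]


def kmeans(points, centroids, n_iter=10):
    """Run Lloyd's algorithm from the given starting centroids."""
    labels = []
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # leave an empty cluster in place
        if new_centroids == centroids:
            break  # converged: centroids stopped moving
        centroids = new_centroids
    return labels, centroids


labels, centroids = kmeans(points, centroids=[points[0], points[3]])
print(labels)
```

With real embeddings, the starting centroids would normally be chosen by the library rather than by hand, and the number of clusters would need to be picked or tuned for the corpus.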