This tutorial shows how to combine LLM embeddings, TF-IDF vectors, and metadata features in a single scikit-learn pipeline for document retrieval and search. It covers generating embeddings with Sentence Transformers, computing TF-IDF vectors, handling metadata, and building the combined retrieval system.
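To make the idea of a combined feature space concrete, here is a minimal standard-library sketch, not the tutorial's scikit-learn pipeline. It assumes embeddings are already computed (the two-dimensional vectors below are made-up placeholders rather than Sentence Transformers output), uses a hand-rolled smoothed TF-IDF, and one-hot encodes a hypothetical `category` metadata field. Each block is L2-normalized and weighted before concatenation so that no single feature type dominates the cosine score. All function names and the weights are illustrative assumptions.

```python
import math
from collections import Counter


def l2_normalize(vec):
    """Scale a vector to unit length; zero vectors are returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def fit_tfidf(texts):
    """Learn a vocabulary and smoothed IDF weights from whitespace tokens."""
    tokenized = [t.lower().split() for t in texts]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    df = Counter(tok for toks in tokenized for tok in set(toks))
    n = len(texts)
    idf = {tok: math.log((1 + n) / (1 + df[tok])) + 1.0 for tok in vocab}
    return vocab, idf


def tfidf_vector(text, vocab, idf):
    """Project a text onto the fitted vocabulary (unknown tokens are ignored)."""
    counts = Counter(text.lower().split())
    return [counts[tok] * idf[tok] for tok in vocab]


def one_hot(value, categories):
    return [1.0 if value == c else 0.0 for c in categories]


def combine(blocks, weights):
    """Concatenate feature blocks after normalizing and weighting each one."""
    combined = []
    for block, w in zip(blocks, weights):
        combined.extend(w * x for x in l2_normalize(block))
    return combined


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def rank(query, docs, weights=(1.0, 1.0, 0.5)):
    """Return document indices ordered from most to least similar."""
    vocab, idf = fit_tfidf([d["text"] for d in docs])
    categories = sorted({d["category"] for d in docs})

    def featurize(item):
        blocks = [
            item["embedding"],                        # assumed precomputed
            tfidf_vector(item["text"], vocab, idf),   # lexical signal
            one_hot(item["category"], categories),    # metadata signal
        ]
        return combine(blocks, weights)

    q = featurize(query)
    scores = [cosine(q, featurize(d)) for d in docs]
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)


# Toy corpus; the embeddings are invented placeholders, not model output.
DOCS = [
    {"text": "neural networks for image classification", "category": "ml", "embedding": [0.9, 0.1]},
    {"text": "baking sourdough bread at home", "category": "cooking", "embedding": [0.1, 0.9]},
    {"text": "training deep neural networks", "category": "ml", "embedding": [0.8, 0.2]},
]
QUERY = {"text": "neural networks", "category": "ml", "embedding": [0.85, 0.15]}

ranking = rank(QUERY, DOCS)
print(ranking)
```

In this toy corpus the cooking document ranks last because it shares no tokens, category, or embedding direction with the query. The block weights are the main tuning knob: raising the TF-IDF weight favors exact term matches, while raising the embedding weight favors semantic similarity.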
This article describes how to build a Retrieval-Augmented Generation (RAG) system for working with research papers, specifically question answering over a PDF document. It covers document loading, text splitting, embedding with Sentence Transformers, storing vectors in ChromaDB, and implementing a query interface with LangChain.
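Two of the steps listed above, splitting and prompt assembly, can be sketched without any external libraries. The following is an illustrative standard-library sketch, not the LangChain or ChromaDB API: `split_text` and `build_prompt` are hypothetical helper names, the chunk size and overlap are arbitrary, and the retrieval step (embedding the chunks and querying a vector store) is assumed to happen in between.

```python
def split_text(text, chunk_size=200, overlap=50):
    """Split text into fixed-size character chunks that overlap.

    Overlap keeps sentences that straddle a boundary visible in both
    neighboring chunks, at the cost of some duplicated text.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks


def build_prompt(question, retrieved_chunks):
    """Assemble a grounded prompt from the chunks returned by retrieval."""
    context = "\n\n".join(
        f"[{i}] {chunk}" for i, chunk in enumerate(retrieved_chunks, start=1)
    )
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


chunks = split_text("abcdefghij", chunk_size=4, overlap=2)
print(chunks)

# In a full system the chunks passed here would come from a vector search.
prompt = build_prompt("What does the text contain?", chunks[:2])
print(prompt)
```

Character-based splitting is the simplest choice; splitting on sentence or paragraph boundaries usually produces more coherent chunks for PDF text, though it needs more careful handling of layout artifacts.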