SemanticScuttle - klotz.me » klotz: document retrieval

New Technique Makes RAG Systems Much Better at Retrieving the Right Documents

Researchers from Cornell University developed a technique called 'contextual document embeddings' to improve the performance of Retrieval-Augmented Generation (RAG) systems, enhancing the retrieval of relevant documents by making embedding models more context-aware.

Standard methods like bi-encoders often fail to account for context-specific details, leading to poor performance on application-specific datasets. Contextual document embeddings address this by making the embedding model more sensitive to subtle differences between documents, particularly in specialized domains.

The researchers proposed two complementary methods to improve bi-encoders:

1. Modifying the training process using contrastive learning to distinguish between similar documents.
2. Modifying the bi-encoder architecture to incorporate corpus context during the embedding process.

These modifications allow the model to capture both the general context and specific details of documents, leading to better performance, especially in out-of-domain scenarios. The new technique has shown consistent improvements over standard bi-encoders and can be adapted for various applications beyond text-based models.
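To make the contrastive-learning idea concrete, here is a minimal sketch of a generic InfoNCE-style loss in plain Python. It is not the researchers' actual training objective; the function name `info_nce`, the temperature value, and the toy vectors are all assumptions for illustration. It only shows why negatives that resemble the positive document produce a stronger training signal than unrelated ones.

```python
import math

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def info_nce(query, positive, negatives, temperature=0.1):
    """Generic InfoNCE loss: -log softmax score of the positive document.

    Hypothetical helper for illustration; the temperature is an assumed value.
    """
    logits = [dot(query, positive) / temperature]
    logits += [dot(query, n) / temperature for n in negatives]
    # Log-sum-exp with max subtraction for numerical stability.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum - logits[0]

query = [1.0, 0.0]
positive = [1.0, 0.0]
hard_negative = [0.9, 0.1]   # nearly identical to the positive
easy_negative = [0.0, 1.0]   # unrelated to the query

loss_hard = info_nce(query, positive, [hard_negative])
loss_easy = info_nce(query, positive, [easy_negative])
print(loss_hard > loss_easy)
```

The loss is larger when the negative is close to the positive, which is the situation a model faces inside a specialized corpus where many documents look alike.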

2024-10-10 Tags: rag, embedding, document retrieval, llm by klotz

How to Use HyDE for Better LLM RAG Retrieval

Dr. Leon Eversberg explains how to improve the retrieval step in RAG pipelines using the HyDE technique, making LLMs more effective in accessing external knowledge through documents.
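As a rough sketch of the idea behind HyDE (Hypothetical Document Embeddings): rather than embedding the user's question directly, an LLM first writes a hypothetical answer, and that answer is embedded and matched against the real documents. The code below is an assumption-laden toy, not the article's implementation: `fake_llm` stands in for a real LLM call, and `embed` is a bag-of-words counter rather than a trained embedding model.

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words embedding (assumption; a real system uses a trained model)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def hyde_retrieve(query, docs, generate, k=1):
    """Embed a generated hypothetical answer, then rank real documents by similarity."""
    hypothetical = generate(query)  # step 1: LLM drafts a plausible answer
    hv = embed(hypothetical)        # step 2: embed the draft, not the query
    ranked = sorted(docs, key=lambda d: cosine(hv, embed(d)), reverse=True)
    return ranked[:k]               # step 3: return the nearest real documents

docs = [
    "aspirin reduces fever and relieves mild pain",
    "the eiffel tower is in paris",
]

def fake_llm(query):
    # Hypothetical stand-in for an LLM call; returns a canned answer.
    return "aspirin is a drug that relieves pain and reduces fever"

top = hyde_retrieve("what helps with a headache?", docs, fake_llm)
print(top[0])
```

In this toy setup the question shares no words with the relevant document, while the hypothetical answer does, which is the gap HyDE is meant to bridge.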

2024-10-05 Tags: hyde, llm, rag, document retrieval by klotz

Scaling ColPali to billions of PDFs with Vespa

This blog post explores scaling ColPali for efficient document retrieval across large collections of PDFs using Vespa's phased retrieval and ranking pipeline, including the use of a hamming-based MaxSim similarity function.
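A hamming-based MaxSim can be sketched as follows, assuming each token or patch embedding has been binarized and packed into an integer. The scoring choice here (`dim - hamming distance` as the per-pair similarity) and the function names are assumptions for illustration; this is not Vespa's API or necessarily the exact formula used in the blog post.

```python
def hamming(a, b):
    """Number of differing bits between two binary vectors packed as ints."""
    return bin(a ^ b).count("1")

def hamming_maxsim(query_vecs, doc_vecs, dim):
    """MaxSim over binary vectors.

    For each query vector, take the best-matching document vector
    (similarity assumed to be dim - hamming distance), then sum over
    all query vectors.
    """
    return sum(
        max(dim - hamming(q, d) for d in doc_vecs)
        for q in query_vecs
    )

# Toy 4-bit vectors: two query tokens, two document patches.
query_vecs = [0b1010, 0b1111]
doc_vecs = [0b1010, 0b0000]

score = hamming_maxsim(query_vecs, doc_vecs, dim=4)
print(score)  # 6: first query token matches exactly (4), second differs by 2 bits (2)
```

Because the inner operation is an XOR plus a bit count, this kind of scoring is cheap enough to serve as an early ranking phase before a more precise float-based rerank.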

2024-09-23 Tags: colpali, document retrieval, vespa, maxsim, hamming distance, vlm, binary quantization, pdf, vision language models, llm by klotz
