klotz: document retrieval


  1. Researchers from Cornell University developed a technique called 'contextual document embeddings' that makes embedding models more context-aware, improving the retrieval of relevant documents in Retrieval-Augmented Generation (RAG) systems.

    Standard bi-encoders embed each document independently, so they often miss context-specific details and perform poorly on application-specific datasets. Contextual document embeddings address this by making the embedding model more sensitive to subtle differences between documents, particularly in specialized domains.

    The researchers proposed two complementary methods to improve bi-encoders:

    - Modifying the training process using contrastive learning to distinguish between similar documents.
    - Modifying the bi-encoder architecture to incorporate corpus context during the embedding process.

    Together, these modifications let the model capture both the general context and the specific details of documents, improving performance, especially in out-of-domain scenarios. The technique showed consistent improvements over standard bi-encoders and can be adapted to applications beyond text-based models.
    2024-10-10 by klotz
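The first method, contrastive training that separates similar documents, can be illustrated with a standard InfoNCE loss using in-batch negatives. This is a minimal sketch, not the researchers' training code: the function names, the toy 2-D vectors, and the temperature of 0.1 are assumptions chosen for illustration. It shows why putting similar documents in the same batch makes training harder: when the negatives sit close to the positive, the loss rises, so the model is pushed to tell them apart.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def info_nce(queries, docs, temperature=0.1):
    """Mean InfoNCE loss over a batch.

    queries[i] is paired with docs[i]; every other doc in the batch is a negative.
    """
    total = 0.0
    for i, q in enumerate(queries):
        logits = [dot(q, d) / temperature for d in docs]
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        # Cross-entropy of the correct pair against all in-batch candidates.
        total += log_sum_exp - logits[i]
    return total / len(queries)

# Hypothetical unit-length embeddings; each query equals its positive doc.
easy_batch = [(1.0, 0.0), (0.0, 1.0)]  # unrelated documents: easy negatives
hard_batch = [(1.0, 0.0), (0.8, 0.6)]  # similar documents: hard negatives

easy_loss = info_nce(easy_batch, easy_batch)
hard_loss = info_nce(hard_batch, hard_batch)
print(f"easy batch loss: {easy_loss:.5f}")
print(f"hard batch loss: {hard_loss:.5f}")
```

With these toy vectors the hard batch yields a loss of about 0.127 against roughly 0.00005 for the easy batch, which is the extra training signal that batches of similar documents provide.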
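  2. Dr. Leon Eversberg explains how to improve the retrieval step in RAG pipelines with HyDE (Hypothetical Document Embeddings), in which an LLM first writes a hypothetical answer to the query and the embedding of that answer is used to search for real documents, helping LLMs make better use of external knowledge.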
    2024-10-05 by klotz
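The HyDE flow can be sketched in a few lines. This is a minimal illustration under stated assumptions, not code from the article: `fake_llm` is a hypothetical stub standing in for a real LLM call, and `embed` is a toy bag-of-words counter standing in for a dense embedding model. The point is the order of operations: generate a hypothetical answer, embed it, then rank real documents by similarity to that embedding instead of to the raw query.

```python
import math
from collections import Counter
from typing import Callable

def embed(text: str) -> Counter:
    # Toy stand-in for a dense embedding model: bag-of-words term counts.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def hyde_search(query: str, docs: list[str], generate: Callable[[str], str], k: int = 1) -> list[str]:
    # 1. Ask the LLM for a hypothetical document that would answer the query.
    hypothetical = generate(query)
    # 2. Embed that hypothetical document rather than the query itself.
    h = embed(hypothetical)
    # 3. Rank the real documents by similarity to the hypothetical one.
    ranked = sorted(docs, key=lambda d: cosine(h, embed(d)), reverse=True)
    return ranked[:k]

# Hypothetical stub in place of a real LLM call.
def fake_llm(query: str) -> str:
    return "the eiffel tower stands in paris france"

docs = [
    "the eiffel tower stands in paris",
    "the statue of liberty stands in new york",
]
result = hyde_search("where is the eiffel tower", docs, fake_llm)
print(result)
```

The design rests on the observation that a generated answer usually resembles the target documents more closely than a short question does; swapping in a real LLM for `generate` and a dense encoder for `embed` leaves the pipeline shape unchanged.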
  3. This blog post explores scaling ColPali for efficient document retrieval across large PDF collections with Vespa's phased retrieval and ranking pipeline, including a Hamming-distance-based MaxSim similarity function over binarized embeddings.
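ColPali represents each query token and each page patch as its own vector, and MaxSim scores a page by summing, over query tokens, the best match among that page's patches. Below is a minimal two-phase sketch, assuming sign-binarized vectors packed into Python ints; a first phase that scores every page with Hamming-based MaxSim, converting distance to similarity as 1 / (1 + hamming) (an assumption of this sketch); and a second phase that reranks the survivors with full-precision dot-product MaxSim. Function names and data are hypothetical, and this is not Vespa's ranking expression.

```python
def binarize(vec):
    # Sign quantization: bit i is set when component i is positive.
    bits = 0
    for i, x in enumerate(vec):
        if x > 0:
            bits |= 1 << i
    return bits

def hamming(a, b):
    # Number of differing bits between two packed bit vectors.
    return bin(a ^ b).count("1")

def maxsim_binary(query_bits, page_bits):
    # For each query token keep its best-matching patch;
    # 1 / (1 + distance) maps Hamming distance to a similarity.
    return sum(max(1.0 / (1 + hamming(q, p)) for p in page_bits) for q in query_bits)

def maxsim_float(query, page):
    # Full-precision MaxSim: sum over query tokens of the best dot product with any patch.
    return sum(max(sum(x * y for x, y in zip(q, p)) for p in page) for q in query)

def phased_search(query, pages, first_k=2, final_k=1):
    # Phase 1: cheap Hamming MaxSim over every page using binarized vectors.
    query_bits = [binarize(q) for q in query]
    page_bits = {name: [binarize(p) for p in patches] for name, patches in pages.items()}
    candidates = sorted(pages, key=lambda n: maxsim_binary(query_bits, page_bits[n]), reverse=True)[:first_k]
    # Phase 2: rerank only the surviving candidates with full-precision MaxSim.
    return sorted(candidates, key=lambda n: maxsim_float(query, pages[n]), reverse=True)[:final_k]

# Hypothetical 3-dimensional token and patch embeddings.
query = [[1.0, -0.5, 0.2], [-0.3, 0.8, 0.9]]
pages = {
    "page_a": [[0.9, -0.4, 0.1], [-0.2, 0.7, 1.0]],
    "page_b": [[-1.0, 1.0, -1.0], [1.0, -1.0, -1.0]],
}
print(phased_search(query, pages))
```

The binary phase only needs XOR and popcount per token/patch pair, which is why it suits a first pass over a large collection; the float phase is reserved for the few candidates that survive.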


SemanticScuttle - klotz.me: Tags: document retrieval
