Researchers from Cornell University developed a technique called 'contextual document embeddings' to improve the performance of Retrieval-Augmented Generation (RAG) systems, enhancing the retrieval of relevant documents by making embedding models more context-aware.
Standard methods like bi-encoders often fail to account for context-specific details, leading to poor performance on application-specific datasets. Contextual document embeddings address this by making the embedding model more sensitive to subtle differences between documents, particularly in specialized domains.
The researchers proposed two complementary methods to improve bi-encoders:
- Modifying the training process using contrastive learning to distinguish between similar documents.
- Modifying the bi-encoder architecture to incorporate corpus context during the embedding process.
These modifications allow the model to capture both the general context and the specific details of documents, leading to better performance, especially in out-of-domain scenarios. The researchers report consistent improvements over standard bi-encoders, and the technique can be adapted to applications beyond text-based models.
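The contrastive training idea can be illustrated with a generic InfoNCE-style loss over a batch, where each query's matching document is the positive and the other documents in the batch act as negatives. This is only a minimal sketch and not the researchers' actual training recipe: the function names, the toy 2-dimensional vectors, and the `temperature` value are all assumptions made for illustration.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

def info_nce_loss(queries, documents, temperature=0.05):
    """Mean InfoNCE loss; documents[i] is the positive for queries[i],
    every other document in the batch serves as a negative.
    (Hypothetical helper; temperature is an assumed value.)"""
    q = [normalize(x) for x in queries]
    d = [normalize(x) for x in documents]
    total = 0.0
    for i, qi in enumerate(q):
        # Cosine similarity of this query to every document, scaled.
        logits = [dot(qi, dj) / temperature for dj in d]
        # Numerically stable log-sum-exp over the batch.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        # Cross-entropy with the positive at index i.
        total += log_denominator - logits[i]
    return total / len(q)

# Toy embeddings (made up): two queries and two candidate batches.
queries = [[1.0, 0.0], [0.0, 1.0]]
easy_batch = [[1.0, 0.0], [0.0, 1.0]]   # documents are clearly different
hard_batch = [[1.0, 0.1], [1.0, 0.2]]   # documents are nearly identical

easy_loss = info_nce_loss(queries, easy_batch)
hard_loss = info_nce_loss(queries, hard_batch)
print(easy_loss, hard_loss)
```

A batch of near-identical documents yields a larger loss than a batch of clearly different ones, which is why training on similar documents pushes the model to pick up the subtle differences between them.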
This repository showcases various advanced techniques for Retrieval-Augmented Generation (RAG) systems. RAG systems combine information retrieval with generative models to provide accurate and contextually rich responses.
Walkthrough on building a Q&A pipeline using various tools and distributing it with ModelKits for collaboration.
Case study on measuring context relevance in retrieval-augmented generation systems using Ragas, TruLens, and DeepEval. Develop practical strategies to evaluate the accuracy and relevance of generated context.
ColBERT is a new way of scoring passage relevance with a BERT language model; it substantially solves the problems with dense passage retrieval.
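ColBERT's scoring is commonly described as late interaction: the query and the passage are each encoded into one vector per token, and the relevance score is the sum, over query tokens, of the maximum similarity to any passage token. The sketch below shows only that scoring step. The function names are hypothetical, and the small hand-written vectors stand in for token embeddings that a BERT encoder would normally produce.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

def maxsim_score(query_tokens, passage_tokens):
    """Late-interaction score: for each query token embedding, take the
    best cosine similarity against all passage token embeddings, then sum.
    (Hypothetical helper operating on plain lists of floats.)"""
    q = [normalize(t) for t in query_tokens]
    p = [normalize(t) for t in passage_tokens]
    return sum(max(dot(qt, pt) for pt in p) for qt in q)

# Toy token embeddings (made up for illustration).
query = [[1.0, 0.0], [0.0, 1.0]]
passage_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # matches both query tokens
passage_b = [[1.0, 1.0]]                          # only partially matches each

score_a = maxsim_score(query, passage_a)
score_b = maxsim_score(query, passage_b)
print(score_a, score_b)
```

Because every query token is matched against every passage token, a passage that covers each query term well scores higher than one whose single blended vector only partly matches them all.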
Image Similarity Search
Reverse Image Search
Object Similarity Search
Robust OCR Document Search
Semantic Search
Cross-modal Retrieval
Probing Perceptual Similarity
Comparing Model Representations
Concept Interpolation
Concept Space Traversal