This pull request adds initial reranking support to libllama, llama-embeddings, and llama-server, covering two models: BAAI/bge-reranker-v2-m3 and jinaai/jina-reranker-v1-tiny-en. Reranking is implemented as a classification head added to the model graph. Testing and benchmarking were done through the server integration.
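To illustrate what a classification head contributes, here is a minimal sketch; it is not the code from this pull request. It assumes the head is a single linear projection applied to one pooled hidden-state vector for a query and document pair. The names `rank_score`, `sigmoid`, `pooled`, `weights`, and `bias` are hypothetical, and the actual heads of the two models may contain additional layers.

```python
import math


def rank_score(pooled, weights, bias=0.0):
    """Project a pooled hidden state to a single relevance logit.

    Assumption: the head is one linear layer; real rerankers may add
    a dense layer and a nonlinearity before this projection.
    """
    if len(pooled) != len(weights):
        raise ValueError("pooled and weights must have the same length")
    return sum(p * w for p, w in zip(pooled, weights)) + bias


def sigmoid(x):
    """Optionally squash a logit into the (0, 1) range."""
    return 1.0 / (1.0 + math.exp(-x))


# Toy values, chosen only for illustration.
pooled = [0.5, -1.0, 2.0]
weights = [1.0, 0.5, 0.25]
logit = rank_score(pooled, weights, bias=0.1)
probability = sigmoid(logit)
```

A higher logit means the document is judged more relevant to the query. Only the relative order of the scores matters when sorting candidates, so the sigmoid step is optional.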
This page provides documentation for the rerank API, including endpoints, request parameters, and response formats.
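A request to such an API might look like the sketch below. The endpoint path `/v1/rerank`, the request fields `query`, `documents`, and `top_n`, and the response shape (a `results` list whose items carry `index` and `relevance_score`) follow a common rerank API convention and are assumptions here, not details taken from this page. The helper names are hypothetical.

```python
import json
from urllib import request


def build_rerank_payload(query, documents, top_n=None):
    """Build the JSON body for a rerank request (assumed field names)."""
    payload = {"query": query, "documents": list(documents)}
    if top_n is not None:
        payload["top_n"] = top_n
    return payload


def rank_documents(documents, response):
    """Pair each document with its score, most relevant first."""
    results = sorted(
        response["results"],
        key=lambda r: r["relevance_score"],
        reverse=True,
    )
    return [(documents[r["index"]], r["relevance_score"]) for r in results]


def rerank(base_url, query, documents, top_n=None):
    """POST the payload to the assumed /v1/rerank path and rank the reply."""
    body = json.dumps(build_rerank_payload(query, documents, top_n)).encode("utf-8")
    req = request.Request(
        base_url.rstrip("/") + "/v1/rerank",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req) as resp:
        return rank_documents(documents, json.load(resp))
```

Calling `rerank("http://localhost:8080", "what is a panda?", ["hi", "The giant panda is a bear."])` would need a running server at that address; the host and port are placeholders.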
Maximize search relevancy and RAG accuracy with Jina Reranker. Features include multilingual retrieval, code search, and a 6x speedup over the previous version.