What is a Reranker?
A reranker is a model that takes a query and a set of candidate results from an initial retrieval step, and re-scores them for relevance. Unlike embedding models that encode query and documents independently, rerankers compare them jointly — producing more accurate relevance scores at the cost of higher latency.
Why does reranking matter?
First-stage retrieval (semantic or keyword search) is optimised for speed and scale — it retrieves the top-k results from millions of documents in milliseconds. But speed comes at a cost: embedding-based retrieval encodes the query and documents separately, so it can miss subtle relevance signals.
A reranker fixes this. It sees the query and each candidate document together, allowing it to reason about their relationship directly. This two-stage approach — retrieve broadly, then rerank precisely — is the standard pattern in production search and RAG systems.
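The retrieve-then-rerank pattern can be sketched with toy scoring functions. This is a minimal illustration only: `cheap_score` and `joint_score` are hypothetical stand-ins invented for this example (term overlap and a phrase-match bonus), not real embedding or cross-encoder models.

```python
from typing import List


def cheap_score(query: str, doc: str) -> float:
    """Stand-in for first-stage retrieval: fraction of query terms present in doc."""
    q_terms = set(query.lower().split())
    d_terms = set(doc.lower().split())
    return len(q_terms & d_terms) / len(q_terms) if q_terms else 0.0


def joint_score(query: str, doc: str) -> float:
    """Stand-in for a cross-encoder: looks at query and doc together and
    rewards documents that contain the query as a contiguous phrase."""
    phrase_bonus = 1.0 if query.lower() in doc.lower() else 0.0
    return cheap_score(query, doc) + phrase_bonus


def retrieve_then_rerank(query: str, corpus: List[str], first_k: int, final_k: int) -> List[str]:
    # Stage 1: broad, cheap retrieval over the whole corpus.
    candidates = sorted(corpus, key=lambda d: cheap_score(query, d), reverse=True)[:first_k]
    # Stage 2: more expensive joint scoring, applied only to the shortlist.
    reranked = sorted(candidates, key=lambda d: joint_score(query, d), reverse=True)
    return reranked[:final_k]


corpus = [
    "clause on indemnification limits and carve-outs",
    "the indemnification clause survives termination",
    "termination for convenience",
    "governing law and venue",
]
top = retrieve_then_rerank("indemnification clause", corpus, first_k=3, final_k=1)
print(top)
```

Both of the first two documents tie in the cheap first stage because each contains both query terms; only the joint scorer separates them, which is the effect a real reranker aims for.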
How does a reranker work?
Rerankers are typically cross-encoder models. Given a (query, document) pair, the model outputs a single relevance score. In a typical pipeline, that score is used in three steps:
- Retrieve — use a fast vector search to get top-100 candidates
- Rerank — pass each (query, candidate) pair through the cross-encoder
- Return — serve the top-k reranked results
from sie_sdk import SIEClient
from sie_sdk.types import Item

client = SIEClient("http://localhost:8080")

# First-stage: fast embedding retrieval
# (vector_db and query_vector are assumed to come from your existing search setup)
candidates = vector_db.search(query_vector, top_k=100)
candidate_texts = [c.text for c in candidates]

# Second-stage: rerank for precision
score_result = client.score(
    "BAAI/bge-reranker-v2-m3",
    Item(text="indemnification clause"),
    [Item(id=str(i), text=t) for i, t in enumerate(candidate_texts)],
)

# Map scored item IDs back to the original candidates and keep the first 10 entries
id_to_candidate = {str(i): c for i, c in enumerate(candidates)}
results = [id_to_candidate[e["item_id"]] for e in score_result["scores"][:10]]

When should you use a reranker?
A reranker is worth adding when:
- Precision matters more than raw speed — e.g. legal search, medical RAG, customer support
- Your retrieval recall is good but top results are noisy — reranking cleans up the final ordering
- You’re building a RAG pipeline — the documents fed to an LLM must be highly relevant; irrelevant context degrades answer quality
- Query complexity is high — long, nuanced queries benefit most from joint query-document scoring
You generally don’t need a reranker for simple lookup tasks or when latency budgets are very tight.
Reranker vs embedding model: key differences
| | Embedding model | Reranker |
|---|---|---|
| Architecture | Bi-encoder | Cross-encoder |
| Encodes | Query and docs independently | Query + doc jointly |
| Speed | Fast (pre-compute doc vectors) | Slower (each pair scored at query time) |
| Accuracy | Good | Higher |
| Use in pipeline | First-stage retrieval | Second-stage reranking |
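The speed row in the table can be made concrete by counting model forward passes at query time. The sketch below is a simplification under stated assumptions: it ignores batching and the cost of the vector similarity search, and the function name is hypothetical.

```python
def query_time_forward_passes(architecture: str, num_candidates: int) -> int:
    """Count model forward passes needed at query time (ignoring batching)."""
    if architecture == "bi-encoder":
        # Document vectors are pre-computed, so only the query is encoded.
        return 1
    if architecture == "cross-encoder":
        # Every (query, document) pair goes through the model.
        return num_candidates
    raise ValueError(f"unknown architecture: {architecture!r}")


for n in (100, 1_000_000):
    print(
        n,
        query_time_forward_passes("bi-encoder", n),
        query_time_forward_passes("cross-encoder", n),
    )
```

This is why a cross-encoder is usually applied to a shortlist of around 100 candidates rather than to the full corpus.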
Which reranker models does SIE support?
SIE supports leading open-source rerankers including:
- BGE-Reranker-v2-M3 — multilingual, strong general-purpose performance
- BGE-Reranker-v2-gemma — higher accuracy for complex queries
- Jina Reranker v2 — lightweight, fast
All models run in your own AWS or GCP environment, so no data is sent to external APIs. You can hot-swap models without downtime.
Frequently asked questions
Does using a reranker significantly increase latency?
Reranking 100 candidates typically adds 50–200 ms, depending on model size and hardware. For most search and RAG applications this is acceptable given the accuracy gains. SIE’s GPU batching minimises this overhead.
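Actual overhead depends on your model and hardware, so it is worth measuring in your own pipeline. A minimal sketch of timing the rerank stage in isolation follows; `placeholder_rerank` is a hypothetical stand-in that you would replace with your real reranking call.

```python
import time
from typing import Callable, List, Tuple


def timed_rerank(
    rerank_fn: Callable[[str, List[str]], List[str]],
    query: str,
    candidates: List[str],
) -> Tuple[List[str], float]:
    """Run rerank_fn and return (ranked results, elapsed time in milliseconds)."""
    start = time.perf_counter()
    ranked = rerank_fn(query, candidates)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return ranked, elapsed_ms


def placeholder_rerank(query: str, candidates: List[str]) -> List[str]:
    # Hypothetical stand-in for a real reranker: orders candidates by length.
    return sorted(candidates, key=len)


ranked, added_ms = timed_rerank(placeholder_rerank, "example query", ["ccc", "a", "bb"])
print(f"rerank stage added {added_ms:.3f} ms")
```

Timing a single call is noisy; repeating the measurement over a sample of real queries and looking at the median and tail latency gives a more useful picture.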
Can I use a reranker without a vector database?
Yes. You can rerank any list of documents, including keyword search results from Elasticsearch or another BM25-based engine.
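As an illustration, the candidate list can come from a keyword scorer instead of a vector search. The sketch below is a small, self-contained BM25 implementation written for this example (whitespace tokenisation, common default values for `k1` and `b`); it is not how Elasticsearch implements BM25 internally.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(query: str, docs: List[str], k1: float = 1.5, b: float = 0.75) -> List[float]:
    """Score each document against the query with a basic BM25 formula."""
    if not docs:
        return []
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    df: Counter = Counter()
    for toks in tokenized:
        df.update(set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def keyword_candidates(query: str, docs: List[str], top_k: int) -> List[str]:
    """Return up to top_k documents with a non-zero BM25 score, best first."""
    scores = bm25_scores(query, docs)
    order = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return [docs[i] for i in order[:top_k] if scores[i] > 0]


docs = [
    "refund policy for annual plans",
    "how to reset your password",
    "password reset email not arriving",
]
candidates = keyword_candidates("reset password", docs, top_k=2)
print(candidates)
```

The resulting texts can then be passed to the reranker in the same way as `candidate_texts` in the earlier example.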
Do I need to fine-tune a reranker for my domain?
Out-of-the-box rerankers perform well for general queries. For specialised domains (legal, medical, code), fine-tuned or LoRA-adapted rerankers can improve relevance significantly. SIE supports LoRA hot-loading for this purpose.