---
title: What is a Reranking Pipeline?
description: A reranking pipeline is a two-stage retrieval architecture where a fast first-stage retriever (embedding model + vector DB) fetches a broad set of candidates, and a slower but more accurate second-stage reranker (cross-encoder) re-scores and reorders them. The result is retrieval that combines the scalability of ANN...
canonical_url: https://superlinked.com/glossary/what-is-a-reranking-pipeline
last_updated: 2026-06-11
---

# What is a Reranking Pipeline?

A reranking pipeline is a two-stage retrieval architecture where a fast first-stage retriever (embedding model + vector DB) fetches a broad set of candidates, and a slower but more accurate second-stage reranker (cross-encoder) re-scores and reorders them. The result is retrieval that combines the scalability of ANN search with the precision of deep query-document interaction. This is the standard approach for production search and RAG systems requiring high accuracy.

---

## Why use a reranking pipeline instead of just a retriever?

Embedding models are bi-encoders: they encode the query and each document independently and compare the resulting vectors. This is fast and scalable, because document vectors can be computed once at index time. However, it misses fine-grained relevance signals that only show up when the query and document are read together.

A reranker (typically a cross-encoder) reads the query and each document together, so it catches cases the retriever misses, such as:
- A document about "Python snakes" retrieved for a query about "Python programming"
- A legal clause that is semantically close to the query but does not address its specific requirement
- A document that matches the topic but answers a different question
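To make the contrast concrete, here is a minimal sketch of the two scoring interfaces. `embed` and `score_pair` are hypothetical stand-ins for a real embedding model and cross-encoder. The point is the shape of the computation, not the models themselves.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def bi_encoder_scores(query: str, doc_vectors: List[Vector],
                      embed: Callable[[str], Vector]) -> List[float]:
    # Bi-encoder: only the query is embedded at search time. Document vectors
    # were computed independently (e.g. at index time), so scoring is a dot product.
    q = embed(query)
    return [dot(q, d) for d in doc_vectors]


def cross_encoder_scores(query: str, docs: List[str],
                         score_pair: Callable[[str, str], float]) -> List[float]:
    # Cross-encoder: every (query, document) pair goes through the model together,
    # so nothing can be precomputed and cost grows with the number of documents.
    return [score_pair(query, d) for d in docs]
```

The bi-encoder calls the model once per query regardless of corpus size, which is what makes ANN search over millions of documents feasible. The cross-encoder calls it once per document, which is why it is reserved for a short candidate list.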

Adding a reranker to the pipeline improves precision significantly at the cost of extra latency (typically 50-200ms), which is acceptable for most production search and RAG workloads.

---

## How does a reranking pipeline work?

```
User query
    │
    ▼
[Embedding model]   ← encodes query to vector
    │
    ▼
[Vector database]   ← ANN search, returns top-100 candidates
    │
    ▼
[Reranker]          ← scores each (query, candidate) pair jointly
    │
    ▼
Top-K reranked results
    │
    ▼
[LLM / answer generation]  (for RAG)
```

The reranker only processes the top-N candidates from first-stage retrieval (typically 20-100), not the full corpus, making it tractable despite its higher per-pair cost.
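The flow in the diagram can be written as a small, framework-agnostic function. This is a sketch under assumptions: `first_stage` and `rerank` are hypothetical callables you would back with your vector database and reranker, and `Candidate` is an illustrative record type.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Candidate:
    id: str
    text: str


def retrieve_then_rerank(
    query: str,
    first_stage: Callable[[str, int], List[Candidate]],
    rerank: Callable[[str, List[Candidate]], List[float]],
    n_candidates: int = 50,
    top_k: int = 5,
) -> List[Candidate]:
    # Stage 1: cheap, recall-oriented retrieval over the whole corpus (e.g. ANN search).
    candidates = first_stage(query, n_candidates)
    if not candidates:
        return []
    # Stage 2: expensive joint scoring, but only over the candidates returned above.
    scores = rerank(query, candidates)
    if len(scores) != len(candidates):
        raise ValueError("reranker must return one score per candidate")
    # Highest reranker score first; ties keep first-stage order (sorted is stable).
    order = sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)
    return [candidates[i] for i in order[:top_k]]
```

Keeping `n_candidates` and `top_k` as separate parameters makes the two trade-offs explicit: the first controls recall and reranker cost, the second controls how much context reaches the LLM.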

---

## Building a reranking pipeline with SIE

```python
from sie_sdk import SIEClient
from sie_sdk.types import Item

client = SIEClient("http://localhost:8080")

query = "What are the indemnification obligations in a SaaS agreement?"

# Stage 1: fast embedding retrieval
query_vector = client.encode("BAAI/bge-m3", Item(text=query), is_query=True)["dense"]
# vector_db is a placeholder for your vector database client; adapt the search call to its API
candidates = vector_db.search(query_vector, top_k=50)
candidate_texts = [c.text for c in candidates]

# Stage 2: rerank with cross-encoder
score_result = client.score(
    "BAAI/bge-reranker-v2-m3",
    Item(text=query),
    [Item(id=str(i), text=t) for i, t in enumerate(candidate_texts)],
)
id_to_text = {str(i): t for i, t in enumerate(candidate_texts)}

# Take top 5 for LLM context (scores sorted by relevance, rank 0 = best)
top_chunks = [id_to_text[e["item_id"]] for e in score_result["scores"][:5]]
```

Both the embedding model and reranker run on the same SIE cluster: one deployment, two models, all within your cloud account.
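If you need more than the text downstream (source URLs for citations, chunk metadata), map the reranker's `item_id`s back to the original candidate objects rather than to plain strings. A minimal sketch, assuming the same `id=str(position)` convention and best-first ordering of `score_result["scores"]` as in the example above; `reorder_by_score_entries` is a hypothetical helper, not part of `sie_sdk`.

```python
from typing import Any, Dict, List, Sequence


def reorder_by_score_entries(
    candidates: Sequence[Any],
    score_entries: Sequence[Dict[str, Any]],
    top_k: int,
) -> List[Any]:
    # Candidates were sent to the reranker with id=str(position).
    by_id = {str(i): c for i, c in enumerate(candidates)}
    # Follow the reranker's returned order (assumed best-first) and map each
    # "item_id" back to the original candidate object, metadata included.
    return [by_id[e["item_id"]] for e in score_entries[:top_k]]
```

With the variables from the example above, `reorder_by_score_entries(candidates, score_result["scores"], top_k=5)` returns the vector database hits themselves, so each chunk keeps whatever metadata your database stored with it.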

---

## First-stage retrieval size: how many candidates to fetch?

The number of candidates passed to the reranker affects both quality and latency:

| Candidates (k) | Recall gain (vs. k=10) | Reranker latency (approx.) |
|---|---|---|
| 10 | Baseline | ~20ms |
| 20 | +5-10% | ~40ms |
| 50 | +10-15% | ~100ms |
| 100 | +12-18% | ~200ms |

More candidates → higher recall (more relevant docs in the pool) → better reranker output, since the reranker can only promote documents that made it into the pool. Returns diminish quickly, and gains typically plateau around 50-100 candidates. For latency-sensitive applications, 20-50 is a practical sweet spot.
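The latencies in the table grow roughly linearly, at about 2 ms per candidate. Assuming that holds for your model and hardware (it depends on document length, batch size, and GPU, so measure it), a hypothetical helper can turn a latency budget into a candidate count:

```python
import math


def max_candidates_for_budget(budget_ms: float, ms_per_candidate: float = 2.0,
                              overhead_ms: float = 0.0, cap: int = 100) -> int:
    # Assumes reranker latency is roughly linear in the number of candidates
    # (~2 ms each, per the approximate table above) plus a fixed overhead.
    if budget_ms <= overhead_ms:
        return 0
    k = math.floor((budget_ms - overhead_ms) / ms_per_candidate)
    # Past ~50-100 candidates the extra recall rarely pays for the extra latency.
    return min(k, cap)
```

For example, a 100 ms reranking budget gives 50 candidates, matching the corresponding table row.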

---

## Which reranker should you use?

| Model | Size | Accuracy | Latency | Best for |
|---|---|---|---|---|
| BGE-reranker-base | 278M | Good | Fast | High-throughput production |
| BGE-reranker-large | 560M | Better | Medium | Balanced production |
| BGE-reranker-v2-m3 | 570M | High | Medium | Multilingual |
| BGE-reranker-v2-gemma | 2.5B | Highest | Slower | Maximum accuracy |
| Jina Reranker v2 | 278M | Good | Fast | Lightweight option |

For most production RAG systems, BGE-reranker-v2-m3 provides the best accuracy-latency trade-off, especially if your content is multilingual.

---

## Reranking pipeline vs hybrid search: which to prioritise?

These are complementary, not competing:

- **Hybrid search** improves first-stage recall: more relevant documents enter the candidate pool
- **Reranking** improves precision: the right documents are at the top of the final list

The optimal pipeline for high-accuracy production systems is:

```
Hybrid retrieval (dense + sparse) → Reranker → LLM generation
```
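The source doesn't prescribe how dense and sparse results are merged before reranking. One common choice is Reciprocal Rank Fusion (RRF), sketched below with the conventional constant `k=60`. Treat it as one reasonable option, not the only one. Document ids stand in for full candidates.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60,
                           limit: int = 100) -> List[str]:
    # Each ranking is a list of document ids, best first (e.g. dense ANN hits, BM25 hits).
    fused: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # RRF: a document earns 1 / (k + rank) from every list it appears in.
            fused[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for deterministic output.
    ordered = sorted(fused, key=lambda d: (-fused[d], d))
    return ordered[:limit]
```

Because the reranker re-scores the fused pool anyway, the fusion step only has to decide which documents make it into the pool, not their final order.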

Start with a dense retrieval + reranker pipeline. Add hybrid search once you've measured that its recall gain justifies the additional complexity.

---

## Measuring reranking pipeline quality

| Metric | What it measures |
|---|---|
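| NDCG@K | Ranking quality: relevant docs count for more the higher they appear (position-discounted, normalised against the ideal ordering) |
| MRR@K | Mean over queries of 1/rank of the first relevant result (0 if none appears in the top K) |
| Precision@K | Fraction of the top-K results that are relevant |
| Recall@K (pre-rerank) | Fraction of all relevant docs that appear in the top-K candidate pool |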

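Measure recall at your candidate depth before reranking (e.g. recall@100 when you fetch 100 candidates) and precision@5 after. The first tells you whether first-stage retrieval is getting relevant docs into the pool; the second tells you whether the reranker is surfacing them at the top. Low recall points at retrieval (or too few candidates); high recall with low precision points at the reranker.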
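The metrics in the table can be computed in a few lines of Python. This sketch assumes binary relevance labels (a set of relevant document ids per query). Graded-relevance NDCG would replace the 1.0 gain with each document's grade.

```python
import math
from typing import Sequence, Set


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    # Fraction of the top-k results that are relevant.
    return sum(1 for d in ranked[:k] if d in relevant) / k


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    # Fraction of all relevant docs that appear in the top k (use on the candidate pool).
    if not relevant:
        return 0.0
    return sum(1 for d in ranked[:k] if d in relevant) / len(relevant)


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    # 1 / position of the first relevant result within the top k, or 0 if none.
    # Averaging this over queries gives MRR@K.
    for i, d in enumerate(ranked[:k], start=1):
        if d in relevant:
            return 1.0 / i
    return 0.0


def ndcg_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    # Binary-relevance NDCG: each hit is discounted by log2(position + 1),
    # then normalised by the best achievable ordering.
    dcg = sum(1.0 / math.log2(i + 1)
              for i, d in enumerate(ranked[:k], start=1) if d in relevant)
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(i + 1) for i in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0
```

For each query, compute `recall_at_k(candidate_ids, relevant, 100)` on the first-stage pool and `precision_at_k(reranked_ids, relevant, 5)` on the reranker's output, then average each across your evaluation queries.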

---

## Frequently asked questions

**Does reranking significantly increase latency?**
With 50 candidates and BGE-reranker-v2-m3 on a GPU-backed SIE deployment, reranking adds ~80-120ms. For most search and RAG applications this is acceptable given the precision gains. For sub-50ms latency requirements, use a smaller reranker or fewer candidates.

**Can I use a reranker without a vector database?**
Yes. You can pass any list of documents to the reranker: BM25 results, keyword search results, or a manually curated list. The reranker doesn't care how the candidates were retrieved.
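When candidates come from several sources, pool and deduplicate them before reranking so the same document isn't scored twice or shown twice in the top-K. A minimal sketch: the dict shape with `id` and `text` keys is an assumption, and `pool_candidates` is a hypothetical helper.

```python
from typing import Dict, Iterable, List


def pool_candidates(*sources: Iterable[Dict[str, str]],
                    limit: int = 100) -> List[Dict[str, str]]:
    # Each source yields candidates as dicts with at least "id" and "text" keys
    # (hypothetical shape; adapt to whatever your retrievers return).
    seen = set()
    pooled: List[Dict[str, str]] = []
    for source in sources:
        for cand in source:
            if cand["id"] in seen:
                continue  # same document from two sources: keep the first copy
            seen.add(cand["id"])
            pooled.append(cand)
            if len(pooled) >= limit:
                return pooled
    return pooled
```

Since the reranker re-scores everything it receives, the order of the pooled list doesn't affect the final ranking; `limit` only bounds reranker cost.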

**Should the reranker model match the embedding model?**
No, they operate independently. Using BGE-M3 for embedding and BGE-reranker-v2-gemma for reranking is a valid and high-performing combination.

---

## Related resources

- [What is a reranker?](/glossary/what-is-a-reranker)
- [What is RAG?](/glossary/what-is-rag)
- [What is hybrid search?](/glossary/what-is-hybrid-search)
- [Regulatory Intelligence RAG example](/docs/examples/regulatory-intelligence-rag)
- [Browse reranker models on SIE](/models)
