How to rerank search results with SIE
Reranking improves search quality by scoring query-document pairs with cross-attention. Unlike embedding similarity, a cross-encoder sees both the query and document together in a single forward pass, which enables deeper semantic matching and more accurate relevance scoring.
SIE’s score primitive wraps this in a single API call:
```python
from sie_sdk import SIEClient
from sie_sdk.types import Item

client = SIEClient("http://localhost:8080")

query = Item(text="What is machine learning?")
items = [
    Item(text="Machine learning is a subset of AI that learns from data."),
    Item(text="The weather forecast predicts rain tomorrow."),
    Item(text="Deep neural networks power modern ML systems."),
]

result = client.score("BAAI/bge-reranker-v2-m3", query, items)
for entry in result["scores"]:
    print(f"Rank {entry['rank']}: {entry['score']:.3f}")
```

```typescript
import { SIEClient } from "@superlinked/sie-sdk";

const client = new SIEClient("http://localhost:8080");

const query = { text: "What is machine learning?" };
const items = [
  { text: "Machine learning is a subset of AI that learns from data." },
  { text: "The weather forecast predicts rain tomorrow." },
  { text: "Deep neural networks power modern ML systems." },
];

const result = await client.score("BAAI/bge-reranker-v2-m3", query, items);
for (const entry of result.scores) {
  console.log(`Rank ${entry.rank}: ${entry.score.toFixed(3)}`);
}

await client.close();
```

For model recommendations, see the Reranker Models page or the full model catalog.
When Should I Use Reranking?
Use reranking when:
- First-stage retrieval returns good candidates but imperfect ordering
- You are retrieving 50 to 100 candidates and want only the top 10
- Query-document relevance requires deep semantic understanding
Skip reranking when:
- You need sub-10ms latency (reranking typically adds 20 to 100ms)
- Your first-stage retrieval already meets your relevance targets
- You would need to rerank millions of documents per query (rerank a shortlisted subset instead)
How Does Two-Stage Retrieval Work?
The standard pattern is to retrieve a broad set of candidates with embeddings, then rerank only the top candidates with a cross-encoder:
```python
# Stage 1: fast retrieval from your vector database
query_text = "What is machine learning?"
query_embedding = client.encode(
    "BAAI/bge-m3",
    Item(text=query_text),
    is_query=True,
)
# ...search your vector DB, get top 100 candidates...

# Stage 2: accurate reranking of those candidates
# (top_100_docs holds the candidate records returned by stage 1)
result = client.score(
    "BAAI/bge-reranker-v2-m3",
    query=Item(text=query_text),
    items=[Item(id=f"doc-{i}", text=doc["text"]) for i, doc in enumerate(top_100_docs)],
)

top_10_ids = [entry["item_id"] for entry in result["scores"][:10]]
```

```typescript
// Stage 1: fast retrieval from your vector database
const queryText = "What is machine learning?";
const queryEmbedding = await client.encode(
  "BAAI/bge-m3",
  { text: queryText },
  { isQuery: true },
);
// ...search your vector DB, get top 100 candidates...

// Stage 2: accurate reranking of those candidates
// (top100Docs holds the candidate records returned by stage 1)
const result = await client.score(
  "BAAI/bge-reranker-v2-m3",
  { text: queryText },
  top100Docs.map((doc, i) => ({ id: `doc-${i}`, text: doc.text })),
);

const top10Ids = result.scores.slice(0, 10).map((entry) => entry.itemId);
```

This approach consistently improves quality without requiring you to rerank your entire corpus. See Reranker Models for recommended model pairings.
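The two-stage flow can also be sketched end to end without a server. In this self-contained sketch, the toy corpus, its 3-dimensional vectors, and the `stage1_retrieve`, `toy_pair_score`, and `stage2_rerank` helpers are all hypothetical stand-ins: stage 1 plays the role of your vector database, and the word-overlap scorer stands in for the cross-encoder that `client.score` would run.

```python
import math
import re

# Hypothetical toy corpus; "vec" stands in for a precomputed document embedding.
DOCS = [
    {"id": "doc-0", "text": "Machine learning is a subset of AI that learns from data.", "vec": [0.8, 0.6, 0.0]},
    {"id": "doc-1", "text": "The weather forecast predicts rain tomorrow.", "vec": [0.0, 0.0, 1.0]},
    {"id": "doc-2", "text": "Deep neural networks power modern ML systems.", "vec": [0.9, 0.1, 0.0]},
    {"id": "doc-3", "text": "What is machine learning in practice?", "vec": [0.6, 0.8, 0.0]},
]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def stage1_retrieve(query_vec, docs, k):
    """Stage 1: cheap vector search that keeps the k most similar documents."""
    return sorted(docs, key=lambda d: cosine(query_vec, d["vec"]), reverse=True)[:k]


def toy_pair_score(query, text):
    """Stand-in for a cross-encoder: scores the (query, document) pair jointly.

    Here it is only the fraction of query words that also appear in the document.
    """
    q_words = set(re.findall(r"[a-z]+", query.lower()))
    d_words = set(re.findall(r"[a-z]+", text.lower()))
    return len(q_words & d_words) / len(q_words) if q_words else 0.0


def stage2_rerank(query, candidates, top_n):
    """Stage 2: score every candidate pair, sort by score, return the top IDs."""
    scored = sorted(candidates, key=lambda d: toy_pair_score(query, d["text"]), reverse=True)
    return [d["id"] for d in scored[:top_n]]


query_text = "What is machine learning?"
query_vec = [1.0, 0.0, 0.0]  # hypothetical query embedding

candidates = stage1_retrieve(query_vec, DOCS, k=3)
top_ids = stage2_rerank(query_text, candidates, top_n=2)

# Map reranked IDs back to the original records.
by_id = {d["id"]: d for d in DOCS}
for doc_id in top_ids:
    print(doc_id, by_id[doc_id]["text"])
```

Stage 1 puts doc-2 first on vector similarity and drops doc-1 entirely; the pair scorer then reorders the three survivors and returns doc-3 and doc-0. Reordering a small shortlist is exactly the job the real reranker does in the pipeline above.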
Response Format
The ScoreResult contains:
| Field | Type | Description |
|---|---|---|
| model | str | Model used for scoring |
| query_id | str or None | Query ID if provided |
| scores | list[ScoreEntry] | Scored and ranked results, sorted by relevance |
Each ScoreEntry contains:
| Field | Type | Description |
|---|---|---|
| item_id | str or None | Document ID from input, or auto-generated as item-N |
| score | float | Relevance score (higher means more relevant) |
| rank | int | Rank position (0 is most relevant) |
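When unit-testing code that consumes these results, it can help to fake a ScoreResult without a running server. The sketch below builds a plain dict with the documented fields from raw scores given in input order. The helper name `fake_score_result` is hypothetical, and two details are assumptions rather than documented facts: that the result can be treated like a dict (as the `result["scores"]` access in the Python examples suggests) and that N in an auto-generated `item-N` ID is the item's input index.

```python
def fake_score_result(model, scores, item_ids=None, query_id=None):
    """Build a dict shaped like the documented ScoreResult (hypothetical helper).

    scores:   raw relevance scores, in the same order as the input items
    item_ids: optional IDs in input order; None entries get an assumed "item-N" ID
    """
    if item_ids is None:
        item_ids = [None] * len(scores)
    if len(item_ids) != len(scores):
        raise ValueError("scores and item_ids must have the same length")

    entries = [
        {"item_id": item_id if item_id is not None else f"item-{n}", "score": score}
        for n, (item_id, score) in enumerate(zip(item_ids, scores))
    ]
    # Sort by descending score so that rank 0 is the most relevant item.
    entries.sort(key=lambda e: e["score"], reverse=True)
    for rank, entry in enumerate(entries):
        entry["rank"] = rank
    return {"model": model, "query_id": query_id, "scores": entries}


result = fake_score_result("BAAI/bge-reranker-v2-m3", [0.2, 0.9, 0.5], ["doc-a", None, "doc-c"])
for entry in result["scores"]:
    print(f"{entry['item_id']}: rank={entry['rank']}, score={entry['score']:.3f}")
```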
Using Item IDs to Track Results
Assign IDs to your items so you can map reranked results back to your original records:
```python
query = Item(id="q1", text="What is Python?")
items = [
    Item(id="doc-1", text="Python is a programming language."),
    Item(id="doc-2", text="Snakes are reptiles."),
    Item(id="doc-3", text="Python was created by Guido van Rossum."),
]

result = client.score("BAAI/bge-reranker-v2-m3", query, items)
for entry in result["scores"]:
    print(f"{entry['item_id']}: rank={entry['rank']}, score={entry['score']:.3f}")
# Example output:
# doc-1: rank=0, score=0.891
# doc-3: rank=1, score=0.756
# doc-2: rank=2, score=0.012
```

```typescript
const query = { id: "q1", text: "What is Python?" };
const items = [
  { id: "doc-1", text: "Python is a programming language." },
  { id: "doc-2", text: "Snakes are reptiles." },
  { id: "doc-3", text: "Python was created by Guido van Rossum." },
];

const result = await client.score("BAAI/bge-reranker-v2-m3", query, items);
for (const entry of result.scores) {
  console.log(`${entry.itemId}: rank=${entry.rank}, score=${entry.score.toFixed(3)}`);
}
// Example output:
// doc-1: rank=0, score=0.891
// doc-3: rank=1, score=0.756
// doc-2: rank=2, score=0.012
```

HTTP API
The server returns msgpack by default. To get JSON instead, send an Accept: application/json header:
```shell
curl -X POST http://localhost:8080/v1/score/BAAI/bge-reranker-v2-m3 \
  -H "Content-Type: application/json" \
  -H "Accept: application/json" \
  -d '{
    "query": {"text": "What is machine learning?"},
    "items": [
      {"text": "Machine learning uses algorithms to learn from data."},
      {"text": "The weather is sunny today."}
    ]
  }'
```

See the HTTP API Reference.
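If you would rather not shell out to curl, the same request can be built with Python's standard library. This is a sketch: `build_score_request` is a hypothetical helper, and it only mirrors the endpoint, headers, and body shown in the curl command. Actually sending the request needs a running SIE server, and the decoded JSON is assumed to follow the ScoreResult fields described above; confirm the exact response layout in the HTTP API Reference.

```python
import json
import urllib.request


def build_score_request(base_url, model, query_text, item_texts):
    """Build the same POST request as the curl example (JSON in, JSON out)."""
    body = {
        "query": {"text": query_text},
        "items": [{"text": text} for text in item_texts],
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/score/{model}",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Ask for JSON explicitly; the server otherwise responds with msgpack.
            "Accept": "application/json",
        },
        method="POST",
    )


request = build_score_request(
    "http://localhost:8080",
    "BAAI/bge-reranker-v2-m3",
    "What is machine learning?",
    [
        "Machine learning uses algorithms to learn from data.",
        "The weather is sunny today.",
    ],
)

# Sending the request needs a running SIE server:
# with urllib.request.urlopen(request) as response:
#     result = json.load(response)
```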
Performance Considerations
Candidate count matters. A cross-encoder runs a separate forward pass over every query-document pair, so reranking 100 documents means scoring 100 pairs, and cost grows linearly with the number of candidates. Keep candidate sets to a reasonable size (50 to 200).
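The linear cost is easy to quantify. This sketch counts the query-document pairs a reranking job produces and how many fixed-size batches they would fill. Packing pairs into batches uses the GPU more efficiently, but the total work still grows in proportion to the number of pairs. The `rerank_workload` helper and the batch size of 32 are illustrative assumptions, not SIE parameters.

```python
import math


def rerank_workload(num_queries, candidates_per_query, batch_size=32):
    """Count cross-encoder inputs and fixed-size batches for a reranking job.

    Every (query, document) pair is a separate input sequence to the model.
    batch_size is an arbitrary illustrative value, not an SIE setting.
    """
    pairs = num_queries * candidates_per_query
    batches = math.ceil(pairs / batch_size)
    return pairs, batches


for k in (50, 100, 200, 1000):
    pairs, batches = rerank_workload(1, k)
    print(f"{k:>4} candidates -> {pairs:>4} pairs in {batches:>2} batches of up to 32")
```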
Latency vs quality. Smaller reranker models are faster but less accurate. Larger models like BAAI/bge-reranker-v2-m3 give better quality at higher latency. See Reranker Models for a comparison.
GPU batching. The SIE server batches concurrent requests automatically, so GPU utilisation improves under load.
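A common way to get this effect is dynamic (micro-)batching: requests that arrive close together are grouped and run through the model as one batch. The sketch below illustrates the general technique with asyncio. `MicroBatcher`, its `max_batch` and `max_wait` parameters, and `toy_batch_score` are hypothetical, and none of it is taken from SIE's server code.

```python
import asyncio


class MicroBatcher:
    """Generic dynamic-batching sketch (not SIE's implementation).

    Concurrent callers submit single items; one worker groups whatever has
    queued up within max_wait seconds and scores it with one batch call.
    """

    def __init__(self, batch_fn, max_batch=8, max_wait=0.01):
        self._batch_fn = batch_fn  # takes a list of items, returns a list of results
        self._max_batch = max_batch
        self._max_wait = max_wait
        self._queue = asyncio.Queue()
        self.batch_sizes = []  # recorded for inspection only

    async def submit(self, item):
        future = asyncio.get_running_loop().create_future()
        await self._queue.put((item, future))
        return await future

    async def run(self):
        while True:
            # Block until at least one request arrives.
            batch = [await self._queue.get()]
            # Give other concurrent callers a short window to join this batch.
            await asyncio.sleep(self._max_wait)
            while len(batch) < self._max_batch and not self._queue.empty():
                batch.append(self._queue.get_nowait())
            self.batch_sizes.append(len(batch))
            try:
                results = self._batch_fn([item for item, _ in batch])
            except Exception as exc:
                for _, future in batch:
                    if not future.done():
                        future.set_exception(exc)
                continue
            for (_, future), result in zip(batch, results):
                if not future.done():
                    future.set_result(result)


def toy_batch_score(pairs):
    """Hypothetical scorer: one score per (query, document) pair."""
    return [float(len(doc)) for _, doc in pairs]


async def demo():
    # Create the batcher inside the running loop so its queue belongs to it.
    batcher = MicroBatcher(toy_batch_score)
    worker = asyncio.create_task(batcher.run())
    docs = ["a", "bb", "ccc", "dddd", "eeeee"]
    scores = await asyncio.gather(*(batcher.submit(("query", d)) for d in docs))
    worker.cancel()
    try:
        await worker
    except asyncio.CancelledError:
        pass
    return scores, batcher.batch_sizes


scores, batch_sizes = asyncio.run(demo())
print(scores, batch_sizes)
```

Five concurrent callers end up sharing fewer model calls than requests, which is why utilisation rises with load: the busier the server, the fuller each batch.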
Frequently Asked Questions
What is the difference between reranking and embedding similarity?
Embedding similarity encodes the query and each document independently and then compares the resulting vectors, which is fast but less precise. Reranking (cross-encoding) processes the query and document together in one pass, so the model can attend to both at once. This is slower but usually more accurate.
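The speed difference comes from how often the model has to run. In this toy sketch, `encode` and `score_pair` are hypothetical stand-ins for an embedding model and a cross-encoder (their internals are trivial word counts, not real models); the counters show that document embeddings are computed once and reused across queries, while a cross-encoder runs once for every query-document pair.

```python
import re

# Counts how many times each hypothetical model is invoked.
calls = {"encode": 0, "score_pair": 0}
VOCAB = ["machine", "learning", "weather", "rain", "data"]


def words(text):
    """Lowercase word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def encode(text):
    """Hypothetical embedding model: turns ONE text into a vector on its own."""
    calls["encode"] += 1
    tokens = words(text)
    return [tokens.count(w) for w in VOCAB]


def score_pair(query, doc):
    """Hypothetical cross-encoder: scores the query and document TOGETHER."""
    calls["score_pair"] += 1
    q_words = set(words(query))
    return len(q_words & set(words(doc))) / len(q_words) if q_words else 0.0


docs = [
    "Machine learning learns from data.",
    "Rain is expected tomorrow.",
    "Weather models use machine learning.",
    "Data pipelines feed training jobs.",
]
queries = ["machine learning", "weather data", "rain"]

# Embedding similarity: embed every document once (typically offline) and reuse the vectors.
doc_vectors = [encode(d) for d in docs]
dense_scores = []
for q in queries:
    q_vec = encode(q)  # one model call per query
    dense_scores.append([sum(a * b for a, b in zip(q_vec, d_vec)) for d_vec in doc_vectors])

# Cross-encoding: one model call per (query, document) pair; nothing can be precomputed.
cross_scores = [[score_pair(q, d) for d in docs] for q in queries]

print(calls)  # {'encode': 7, 'score_pair': 12}
```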
When should I use ColBERT reranking instead of a cross-encoder?
ColBERT (multi-vector reranking) is faster than cross-encoders because it pre-computes document representations. Use it when you need better-than-dense quality without the full latency of a cross-encoder. See Multi-Vector Reranking.
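ColBERT's scoring rule, usually called late interaction or MaxSim, is simple enough to sketch directly: each query token vector is matched to its most similar document token vector, and those best matches are summed. The token vectors below are made up for illustration, and SIE's multi-vector implementation may differ in details such as normalization.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_vecs, doc_vecs):
    """ColBERT-style late interaction (MaxSim).

    For each query token vector, take its best match among the document's token
    vectors, then sum those best matches over all query tokens.
    """
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


# Made-up 2-D token vectors; real models emit one higher-dimensional vector per token.
query_tokens = [[1.0, 0.0], [0.0, 1.0]]
doc_tokens = {
    "doc-a": [[1.0, 0.0], [0.5, 0.5]],
    "doc-b": [[0.0, 1.0], [0.9, 0.1]],
}

# Document token vectors do not depend on the query, so they can be precomputed.
ranked = sorted(doc_tokens, key=lambda k: maxsim_score(query_tokens, doc_tokens[k]), reverse=True)
print(ranked)
```

Only the cheap MaxSim step runs at query time, which is where the latency advantage over a cross-encoder comes from.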
Which reranker model should I use?
For English, mixedbread-ai/mxbai-rerank-large-v2 is a strong default. For multilingual reranking, use BAAI/bge-reranker-v2-m3. See the Reranker Models guide and model catalog.
Does SIE reranking work with LangChain or LlamaIndex?
Yes. SIE reranking is available through the sie-langchain, sie-llamaindex, and sie-haystack integration packages. See Integrations for setup instructions.