How to rerank search results with SIE

Reranking improves search quality by scoring query-document pairs with cross-attention. Unlike embedding similarity, a cross-encoder sees both the query and document together in a single forward pass, which enables deeper semantic matching and more accurate relevance scoring.

SIE’s score primitive wraps this in a single API call:

Python

from sie_sdk import SIEClient
from sie_sdk.types import Item

client = SIEClient("http://localhost:8080")

query = Item(text="What is machine learning?")
items = [
    Item(text="Machine learning is a subset of AI that learns from data."),
    Item(text="The weather forecast predicts rain tomorrow."),
    Item(text="Deep neural networks power modern ML systems."),
]

result = client.score("BAAI/bge-reranker-v2-m3", query, items)
for entry in result["scores"]:
    print(f"Rank {entry['rank']}: {entry['score']:.3f}")

TypeScript

import { SIEClient } from "@superlinked/sie-sdk";

const client = new SIEClient("http://localhost:8080");

const query = { text: "What is machine learning?" };
const items = [
  { text: "Machine learning is a subset of AI that learns from data." },
  { text: "The weather forecast predicts rain tomorrow." },
  { text: "Deep neural networks power modern ML systems." },
];

const result = await client.score("BAAI/bge-reranker-v2-m3", query, items);
for (const entry of result.scores) {
  console.log(`Rank ${entry.rank}: ${entry.score.toFixed(3)}`);
}

await client.close();

For model recommendations, see the Reranker Models page or the full model catalog.

When Should I Use Reranking?

Use reranking when:

First-stage retrieval returns good candidates but imperfect ordering
You are retrieving 50 to 100 candidates and want only the top 10
Query-document relevance requires deep semantic understanding

Skip reranking when:

You need sub-10ms latency (reranking typically adds 20 to 100ms)
Your retrieval quality is already high enough
You are processing millions of documents (rerank a subset instead)

How Does Two-Stage Retrieval Work?

The standard pattern is to retrieve a broad set of candidates with embeddings, then rerank only the top candidates with a cross-encoder:

Python

# Stage 1: fast retrieval from your vector database
query_text = "What is machine learning?"
query_embedding = client.encode(
    "BAAI/bge-m3",
    Item(text=query_text),
    is_query=True,
)
# ...search your vector DB, get top 100 candidates...

# Stage 2: accurate reranking of those candidates
result = client.score(
    "BAAI/bge-reranker-v2-m3",
    query=Item(text=query_text),
    items=[Item(id=f"doc-{i}", text=doc["text"]) for i, doc in enumerate(top_100_docs)]
)

top_10_ids = [entry["item_id"] for entry in result["scores"][:10]]

TypeScript

// Stage 1: fast retrieval from your vector database
const queryText = "What is machine learning?";
const queryEmbedding = await client.encode(
  "BAAI/bge-m3",
  { text: queryText },
  { isQuery: true },
);
// ...search your vector DB, get top 100 candidates...

// Stage 2: accurate reranking of those candidates
const result = await client.score(
  "BAAI/bge-reranker-v2-m3",
  { text: queryText },
  top100Docs.map((doc, i) => ({ id: `doc-${i}`, text: doc.text })),
);

const top10Ids = result.scores.slice(0, 10).map(entry => entry.itemId);

This approach improves ranking quality over embedding retrieval alone, and the cross-encoder only ever scores the shortlist rather than your entire corpus. See Reranker Models for recommended model pairings.
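The snippets above elide the vector search step. As a rough offline sketch of the whole flow, the following uses a brute-force cosine search for Stage 1 and takes the reranker as a plain callable. Everything here is illustrative: `two_stage_search`, `keyword_rerank`, and the toy vectors are hypothetical names and data, not part of the SIE SDK. In a real pipeline the `rerank` callable would wrap `client.score`, and Stage 1 would be a query against your vector database.

```python
import math
from typing import Callable, Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def two_stage_search(
    query_vec: Sequence[float],
    docs: List[Dict],
    rerank: Callable[[List[Dict]], List[str]],
    first_k: int = 100,
    final_k: int = 10,
) -> List[str]:
    """Return the IDs of the final_k best documents.

    docs: dicts with "id", "text", and "vector" keys (assumed shape).
    rerank: hypothetical hook that takes the shortlist and returns IDs, best first.
    """
    # Stage 1: cheap vector similarity over every document.
    by_similarity = sorted(
        docs, key=lambda d: cosine(query_vec, d["vector"]), reverse=True
    )
    shortlist = by_similarity[:first_k]
    # Stage 2: the expensive scorer only sees the shortlist.
    return rerank(shortlist)[:final_k]


def keyword_rerank(candidates: List[Dict]) -> List[str]:
    """Stand-in for a cross-encoder: counts query terms that appear in the text."""
    terms = {"machine", "learning"}
    ranked = sorted(
        candidates,
        key=lambda d: len(terms & set(d["text"].lower().split())),
        reverse=True,
    )
    return [d["id"] for d in ranked]


# Toy corpus with made-up 2-d vectors.
docs = [
    {"id": "a", "text": "weather rain", "vector": [0.0, 1.0]},
    {"id": "b", "text": "machine learning basics", "vector": [0.9, 0.1]},
    {"id": "c", "text": "deep learning for ML", "vector": [1.0, 0.0]},
    {"id": "d", "text": "cooking pasta", "vector": [0.8, 0.6]},
]
top_ids = two_stage_search([1.0, 0.0], docs, keyword_rerank, first_k=3, final_k=2)
print(top_ids)
```

Note how Stage 1 ranks `c` ahead of `b`, and the second stage reverses that order: the shortlist decides which documents are considered, while the reranker decides their final order.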

Response Format

The ScoreResult contains:

Field	Type	Description
`model`	`str`	Model used for scoring
`query_id`	`str or None`	Query ID if provided
`scores`	`list[ScoreEntry]`	Scored and ranked results, sorted by relevance

Each ScoreEntry contains:

Field	Type	Description
`item_id`	`str or None`	Document ID from input, or auto-generated as `item-N`
`score`	`float`	Relevance score (higher means more relevant)
`rank`	`int`	Rank position (0 is most relevant)
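
To make the two tables concrete, here is a sketch that mirrors them as local `TypedDict` definitions and reads the top results from a response. These classes are written for illustration from the field lists above; they are not imported from `sie_sdk`, whose actual type definitions may differ. The sample values are made up.

```python
from typing import List, Optional, TypedDict


class ScoreEntry(TypedDict):
    item_id: Optional[str]  # ID from the input, or auto-generated
    score: float            # higher means more relevant
    rank: int               # 0 is most relevant


class ScoreResult(TypedDict):
    model: str
    query_id: Optional[str]
    scores: List[ScoreEntry]


def top_item_ids(result: ScoreResult, k: int) -> List[Optional[str]]:
    """Return the item IDs of the k best-ranked entries."""
    # The list is documented as already sorted; sorting by rank is a cheap safeguard.
    ordered = sorted(result["scores"], key=lambda entry: entry["rank"])
    return [entry["item_id"] for entry in ordered[:k]]


# Illustrative response with made-up values.
sample: ScoreResult = {
    "model": "BAAI/bge-reranker-v2-m3",
    "query_id": "q1",
    "scores": [
        {"item_id": "doc-1", "score": 0.891, "rank": 0},
        {"item_id": "doc-3", "score": 0.756, "rank": 1},
        {"item_id": "doc-2", "score": 0.012, "rank": 2},
    ],
}
print(top_item_ids(sample, 2))
```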

Using Item IDs to Track Results

Assign IDs to your items so you can map reranked results back to your original records:

Python

query = Item(id="q1", text="What is Python?")
items = [
    Item(id="doc-1", text="Python is a programming language."),
    Item(id="doc-2", text="Snakes are reptiles."),
    Item(id="doc-3", text="Python was created by Guido van Rossum."),
]
result = client.score("BAAI/bge-reranker-v2-m3", query, items)
for entry in result["scores"]:
    print(f"{entry['item_id']}: rank={entry['rank']}, score={entry['score']:.3f}")
# doc-1: rank=0, score=0.891
# doc-3: rank=1, score=0.756
# doc-2: rank=2, score=0.012

TypeScript

const query = { id: "q1", text: "What is Python?" };
const items = [
  { id: "doc-1", text: "Python is a programming language." },
  { id: "doc-2", text: "Snakes are reptiles." },
  { id: "doc-3", text: "Python was created by Guido van Rossum." },
];
const result = await client.score("BAAI/bge-reranker-v2-m3", query, items);
for (const entry of result.scores) {
  console.log(`${entry.itemId}: rank=${entry.rank}, score=${entry.score.toFixed(3)}`);
}
// doc-1: rank=0, score=0.891
// doc-3: rank=1, score=0.756
// doc-2: rank=2, score=0.012
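
Once the entries carry your IDs, joining them back to your own records is a dictionary lookup. The helper below is a minimal sketch: `attach_records` and the `records_by_id` contents are hypothetical, and the score entries are assumed to be dicts with the `item_id` and `score` fields described above.

```python
from typing import Dict, List, Tuple


def attach_records(
    scores: List[dict], records_by_id: Dict[str, dict]
) -> List[Tuple[dict, float]]:
    """Pair each score entry with its original record, keeping reranked order.

    Entries whose item_id is not in records_by_id are skipped.
    """
    paired = []
    for entry in scores:
        record = records_by_id.get(entry["item_id"])
        if record is not None:
            paired.append((record, entry["score"]))
    return paired


# Made-up records keyed by the same IDs that were sent as Item IDs.
records_by_id = {
    "doc-1": {"title": "Python overview"},
    "doc-2": {"title": "Reptile facts"},
    "doc-3": {"title": "Python history"},
}
# Illustrative score entries in reranked order.
scores = [
    {"item_id": "doc-1", "score": 0.891, "rank": 0},
    {"item_id": "doc-3", "score": 0.756, "rank": 1},
    {"item_id": "doc-2", "score": 0.012, "rank": 2},
]
for record, score in attach_records(scores, records_by_id):
    print(f"{record['title']}: {score:.3f}")
```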

HTTP API

The server responds with msgpack by default. To get a JSON response, send an Accept: application/json header:

curl -X POST http://localhost:8080/v1/score/BAAI/bge-reranker-v2-m3 \
  -H "Content-Type: application/json" \
  -H "Accept: application/json" \
  -d '{
    "query": {"text": "What is machine learning?"},
    "items": [
      {"text": "Machine learning uses algorithms to learn from data."},
      {"text": "The weather is sunny today."}
    ]
  }'

See the HTTP API Reference.
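
The same request can be built without the SDK. The sketch below uses only the Python standard library and mirrors the curl call: same path, headers, and body. `build_score_request` is a hypothetical helper name, and the shape of the JSON that comes back is not shown here, so check the HTTP API Reference before relying on specific response fields.

```python
import json
import urllib.request
from typing import List


def build_score_request(
    base_url: str, model: str, query_text: str, doc_texts: List[str]
) -> urllib.request.Request:
    """Build a POST request equivalent to the curl example."""
    payload = {
        "query": {"text": query_text},
        "items": [{"text": text} for text in doc_texts],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/score/{model}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",  # ask for JSON instead of msgpack
        },
        method="POST",
    )


request = build_score_request(
    "http://localhost:8080",
    "BAAI/bge-reranker-v2-m3",
    "What is machine learning?",
    [
        "Machine learning uses algorithms to learn from data.",
        "The weather is sunny today.",
    ],
)
print(request.get_method(), request.full_url)

# To send it against a running server:
# with urllib.request.urlopen(request) as response:
#     body = json.load(response)
```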

Performance Considerations

Batch size matters. A cross-encoder runs a separate forward pass for every query-document pair, so reranking 100 documents costs 100 passes. Keep candidate sets reasonable (50 to 200).

Latency vs quality. Smaller reranker models are faster but less accurate. Larger models like BAAI/bge-reranker-v2-m3 give better quality at higher latency. See Reranker Models for a comparison.

GPU batching. The SIE server batches concurrent requests automatically, so GPU utilisation improves under load.
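
Server-side batching only helps if requests actually arrive concurrently. One way to arrange that from a single process is a thread pool, sketched below. `score_many` is a hypothetical helper, and `score_fn` stands for any callable that scores one query against its items, for example a small wrapper around `client.score`. Whether a single `SIEClient` can be shared safely across threads is an assumption to verify, not something this page confirms.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, List, Sequence, Tuple


def score_many(
    score_fn: Callable[[Any, Sequence[Any]], Any],
    jobs: List[Tuple[Any, Sequence[Any]]],
    max_workers: int = 8,
) -> List[Any]:
    """Run score_fn(query, items) for every job concurrently.

    Results are returned in the same order as jobs.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(score_fn, query, items) for query, items in jobs]
        # Collecting in submission order keeps results aligned with jobs.
        return [future.result() for future in futures]


# Stub scorer so the sketch runs offline: it just counts the items.
def fake_score(query: str, items: Sequence[str]) -> int:
    return len(items)


results = score_many(fake_score, [("q1", ["a", "b"]), ("q2", ["c"])])
print(results)
```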

Frequently Asked Questions

What is the difference between reranking and embedding similarity? Embedding similarity encodes the query and each document independently and then compares the resulting vectors, which is fast but less precise. Reranking (cross-encoding) processes the query and document together in one pass, so the model can attend to both at once. This is slower but more accurate.

When should I use ColBERT reranking instead of a cross-encoder? ColBERT (multi-vector reranking) is faster than cross-encoders because it pre-computes document representations. Use it when you need better-than-dense quality without the full latency of a cross-encoder. See Multi-Vector Reranking.

Which reranker model should I use? For English, mixedbread-ai/mxbai-rerank-large-v2 is a strong default. For multilingual reranking, use BAAI/bge-reranker-v2-m3. See the Reranker Models guide and model catalog.

Does SIE reranking work with LangChain or LlamaIndex? Yes. SIE reranking is available through the sie-langchain, sie-llamaindex, and sie-haystack integration packages. See Integrations for setup instructions.