What is Hybrid Search?

Hybrid search combines dense vector (semantic) search with sparse keyword (BM25-style) search, merging results from both to produce a single ranked list. It captures the strengths of each approach — semantic understanding from dense retrieval and exact-term precision from sparse retrieval — making it more accurate than either method alone for most production use cases.


Why does hybrid search matter?

Neither dense nor sparse retrieval is universally better:

  • Dense retrieval excels at understanding meaning, synonyms, and paraphrase — but struggles with rare terms, product codes, and proper nouns
  • Sparse retrieval excels at exact keyword matching — but misses relevant documents that use different vocabulary

A user searching for “BGE-M3 throughput benchmark” needs both: semantic understanding of “throughput benchmark” and exact matching of the specific model name “BGE-M3”. Hybrid search handles both simultaneously.

In practice, hybrid search tends to outperform either approach alone on retrieval benchmarks that span diverse query types, though the size of the gain depends on the corpus and the query mix.


How does hybrid search work?

Hybrid search has two parallel retrieval paths that are then merged:

Query
├──► [Embedding model] ──► dense vector ──► [Vector DB: ANN search] ──► dense results
└──► [Sparse encoder] ──► sparse vector ──► [Inverted index: BM25] ──► sparse results
        │
        ▼
[Reciprocal Rank Fusion / weighted merge]
        │
        ▼
Hybrid ranked list
        │
        ▼
[Optional: Reranker]
        │
        ▼
Final results

The merging step — often Reciprocal Rank Fusion (RRF) — combines the rank positions from both result sets into a single score without requiring score normalisation.
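For contrast with rank-based fusion, a weighted merge has to bring the two score scales onto a common footing before blending them. The following is a minimal sketch, assuming min-max normalisation and a weight called alpha; the function names, the normalisation choice, and the sample scores are all illustrative assumptions rather than part of any particular library.

```python
def min_max(scores):
    """Rescale a {doc_id: score} mapping to the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores equal: treat every document as equally relevant.
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def weighted_merge(dense_scores, sparse_scores, alpha=0.5):
    """Blend normalised dense and sparse scores; alpha weights the dense side."""
    dense_n, sparse_n = min_max(dense_scores), min_max(sparse_scores)
    merged = {}
    for doc in set(dense_n) | set(sparse_n):
        # A document missing from one list contributes 0 from that list.
        merged[doc] = alpha * dense_n.get(doc, 0.0) + (1 - alpha) * sparse_n.get(doc, 0.0)
    return sorted(merged.items(), key=lambda kv: kv[1], reverse=True)


# Made-up scores: cosine similarities (dense) and BM25-style scores (sparse).
dense_scores = {"a": 0.82, "b": 0.79, "c": 0.40}
sparse_scores = {"b": 12.5, "c": 3.1, "d": 9.0}
merged = weighted_merge(dense_scores, sparse_scores, alpha=0.5)
```

Here document "b" comes out on top because it scores well on both sides. The normalisation step is what rank-based fusion avoids: min-max scaling is sensitive to outliers, and alpha usually has to be tuned per corpus.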


What is Reciprocal Rank Fusion (RRF)?

RRF is a simple, robust algorithm for merging ranked lists. For each document, its RRF score is:

RRF(doc) = Σ 1 / (k + rank_in_list), summed over every result list that contains doc

Where k is a constant (typically 60) and rank_in_list is the document’s position in that list, counting from 1. Documents that rank highly in multiple lists get boosted; a document that appears in only one list receives a single term in the sum, so it scores lower.

RRF works well in practice because it uses only rank positions, which makes it robust to the different score scales of dense and sparse systems.
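The formula can be sketched in a few lines of Python. This is an illustrative implementation, assuming 1-based ranks and result lists given as document ids ordered best first; the function name and the sample ids are made up for the example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse ranked lists of doc ids (best first) into (doc_id, score) pairs."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):  # ranks start at 1
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


dense_results = ["a", "b", "c"]   # best first
sparse_results = ["b", "d", "a"]  # best first
fused = reciprocal_rank_fusion([dense_results, sparse_results])
```

With k = 60, "b" scores 1/62 + 1/61 and "a" scores 1/61 + 1/63, so "b" narrowly leads. Both outrank "d" and "c", which each appear in only one list.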


How do you implement hybrid search with SIE?

BGE-M3 produces both dense and sparse vectors from a single model, making it ideal for hybrid search with SIE:

from sie_sdk import SIEClient
from sie_sdk.types import Item

client = SIEClient("http://localhost:8080")

# Example corpus; replace with your own documents
documents = [
    "BGE-M3 throughput benchmark on a single GPU",
    "How to measure embedding model latency",
]

# Encode documents with both dense and sparse vectors
results = client.encode(
    "BAAI/bge-m3",
    [Item(text=d) for d in documents],
    output_types=["dense", "sparse"],
)

dense_vectors = [r["dense"] for r in results]    # for ANN search
sparse_vectors = [r["sparse"] for r in results]  # for inverted index (sparse) search

# Many vector DBs (e.g. Qdrant, Weaviate) support hybrid search natively
# Pass both vector types and let the DB handle merging

Qdrant, Weaviate, and Chroma all support hybrid search with dense + sparse vectors. SIE integrates directly with each.
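To make concrete what a database does with the two vector types, here is a brute-force sketch in plain Python. It assumes dense vectors are lists of floats and sparse vectors are {token_id: weight} dicts; the actual shape of the SIE output is not specified here, so treat that representation, the helper names, and the toy values as assumptions. Production systems use ANN and inverted indexes instead of scanning every document.

```python
import math


def cosine(u, v):
    """Cosine similarity between two dense vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def sparse_dot(q, d):
    """Dot product of two sparse vectors stored as {token_id: weight} dicts."""
    if len(q) > len(d):
        q, d = d, q  # iterate over the smaller dict
    return sum(w * d[t] for t, w in q.items() if t in d)


def rank(scores):
    """Return document indices ordered from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Toy data standing in for encoder output (values are made up).
doc_dense = [[0.9, 0.1], [0.2, 0.8], [0.7, 0.3]]
doc_sparse = [{101: 0.5}, {101: 0.2, 202: 1.4}, {303: 0.9}]
query_dense = [1.0, 0.0]
query_sparse = {202: 1.0}

dense_ranking = rank([cosine(query_dense, d) for d in doc_dense])
sparse_ranking = rank([sparse_dot(query_sparse, d) for d in doc_sparse])
```

The two rankings disagree: the dense side prefers document 0, while the sparse side prefers document 1 because it is the only one containing token 202. Resolving that disagreement is the job of the merging step.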


Dense vs sparse vs hybrid: when to use each

Scenario                           Best approach
Natural language questions         Dense or hybrid
Product code / SKU lookup          Sparse or hybrid
Multilingual queries               Dense (BGE-M3) or hybrid
Mixed query types in production    Hybrid
Tight latency budget               Dense only
Domain-specific terminology        Hybrid or sparse

For most production search and RAG systems, hybrid search is the recommended default.


Hybrid search vs reranking: what’s the difference?

These are complementary, not competing:

  • Hybrid search improves the retrieval step — finding better candidates from the full corpus
  • Reranking improves the ranking step — more precisely ordering a shortlist of candidates

The optimal pipeline for high-accuracy systems is: hybrid retrieval → reranking → LLM generation (for RAG).
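The retrieval and reranking stages of that pipeline can be sketched as a single function. This is a minimal illustration: the retriever and the pairwise scorer are passed in as callables, and the toy stand-ins below (a fixed candidate list and a word-overlap score) are hypothetical placeholders for a real hybrid retriever and a reranker model. The LLM generation step is left out.

```python
def retrieve_then_rerank(query, retrieve, score_pair, documents,
                         shortlist_size=50, top_k=5):
    """Take a shortlist from a cheap retriever, then rescore only that shortlist."""
    candidate_ids = retrieve(query)[:shortlist_size]
    rescored = [(doc_id, score_pair(query, documents[doc_id]))
                for doc_id in candidate_ids]
    rescored.sort(key=lambda kv: kv[1], reverse=True)
    return [doc_id for doc_id, _ in rescored[:top_k]]


documents = {
    "a": "hybrid search merges dense and sparse results",
    "b": "reranking orders a shortlist of candidates",
    "c": "bm25 is a sparse ranking function",
}


def toy_retrieve(query):
    # Stand-in for hybrid retrieval: returns a fixed candidate order.
    return ["c", "a", "b"]


def toy_score(query, text):
    # Stand-in for a reranker: counts words shared by query and document.
    return len(set(query.lower().split()) & set(text.lower().split()))


top = retrieve_then_rerank("how does reranking order a shortlist",
                           toy_retrieve, toy_score, documents, top_k=2)
```

The retriever placed "b" last, but the reranker moves it to the top. The expensive scorer only ever sees the shortlist, never the full corpus.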


Frequently asked questions

Does hybrid search require two separate models?
Not with BGE-M3. It produces dense and sparse representations from a single model, reducing infrastructure complexity. SIE hosts BGE-M3 for both outputs.

Is hybrid search significantly slower than dense-only search?
Usually not. The retrieval step adds little latency because both paths run in parallel, so the total is governed by the slower of the two. When a single model such as BGE-M3 produces both outputs, encoding time is unchanged (same model, same call). The remaining overhead is the merging step, which is negligible.

Which vector databases support hybrid search?
Qdrant, Weaviate, and Milvus all support hybrid search natively. SIE has integration guides for each.
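The parallel execution mentioned in the latency answer can be sketched with a thread pool. This is an illustration under stated assumptions: dense_search and sparse_search are hypothetical callables that each take a query and return a ranked list of ids, and the toy versions below return fixed results instead of calling a database.

```python
from concurrent.futures import ThreadPoolExecutor


def search_both(query, dense_search, sparse_search):
    """Run the dense and sparse searches concurrently; return both result lists."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        dense_future = pool.submit(dense_search, query)
        sparse_future = pool.submit(sparse_search, query)
        # Wall-clock time is roughly that of the slower call, not the sum.
        return dense_future.result(), sparse_future.result()


def toy_dense_search(query):
    return ["a", "b", "c"]  # stand-in for an ANN lookup


def toy_sparse_search(query):
    return ["b", "d", "a"]  # stand-in for an inverted-index lookup


dense_hits, sparse_hits = search_both("bge-m3 throughput benchmark",
                                      toy_dense_search, toy_sparse_search)
```

Threads suit this case because both calls spend their time waiting on I/O. A database with native hybrid support typically does the equivalent internally, so client-side fan-out is only needed when the two indexes live in separate systems.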

