What is Hybrid Search?
Hybrid search combines dense vector (semantic) search with sparse keyword (BM25-style) search, merging results from both to produce a single ranked list. It captures the strengths of each approach — semantic understanding from dense retrieval and exact-term precision from sparse retrieval — making it more accurate than either method alone for most production use cases.
Why does hybrid search matter?
Neither dense nor sparse retrieval is universally better:
- Dense retrieval excels at understanding meaning, synonyms, and paraphrase — but struggles with rare terms, product codes, and proper nouns
- Sparse retrieval excels at exact keyword matching — but misses relevant documents that use different vocabulary
A user searching for “BGE-M3 throughput benchmark” needs both: semantic understanding of “throughput benchmark” and exact matching of the specific model name “BGE-M3”. Hybrid search handles both simultaneously.
In practice, hybrid search typically matches or outperforms either approach alone on retrieval benchmarks that span diverse query types, because each path covers the other's blind spots.
How does hybrid search work?
Hybrid search has two parallel retrieval paths that are then merged:
```
Query
  │
  ├──► [Embedding model] ──► dense vector  ──► [Vector DB: ANN search] ──► dense results
  │
  └──► [Sparse encoder]  ──► sparse vector ──► [Inverted index: BM25]  ──► sparse results
                                  │
              [Reciprocal Rank Fusion / weighted merge]
                                  │
                          Hybrid ranked list
                                  │
                         [Optional: Reranker]
                                  │
                            Final results
```

The merging step, often Reciprocal Rank Fusion (RRF), combines the rank positions from both result sets into a single score without requiring score normalisation.
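The two retrieval paths are independent, so they can be issued concurrently. The following is a minimal sketch of that idea using Python's standard thread pool. The functions `dense_search` and `sparse_search` are hypothetical stand-ins that return hard-coded document IDs; in a real system they would query a vector index and an inverted index.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-ins for the two retrieval paths. They return fixed,
# made-up document IDs so the example runs without any external service.
def dense_search(query: str, top_k: int) -> list[str]:
    ranked = ["doc_semantic", "doc_both", "doc_other"]
    return ranked[:top_k]


def sparse_search(query: str, top_k: int) -> list[str]:
    ranked = ["doc_both", "doc_keyword", "doc_semantic"]
    return ranked[:top_k]


def retrieve_both(query: str, top_k: int = 10) -> tuple[list[str], list[str]]:
    """Run the dense and sparse paths concurrently; return both ranked lists."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        dense_future = pool.submit(dense_search, query, top_k)
        sparse_future = pool.submit(sparse_search, query, top_k)
        return dense_future.result(), sparse_future.result()


dense_hits, sparse_hits = retrieve_both("BGE-M3 throughput benchmark", top_k=2)
print(dense_hits)   # ['doc_semantic', 'doc_both']
print(sparse_hits)  # ['doc_both', 'doc_keyword']
```

With this shape, retrieval latency is roughly that of the slower path rather than the sum of both. The two lists are then handed to the merging step.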
What is Reciprocal Rank Fusion (RRF)?
RRF is a simple, robust algorithm for merging ranked lists. For each document, its RRF score is:
```
RRF(doc) = Σ 1 / (k + rank_in_list)
```

Where k is a constant (typically 60) and rank_in_list is the document’s position in each retrieval result. Documents that rank highly in multiple lists get boosted; documents that only appear in one list are scored more conservatively.
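RRF works well in practice because it uses only rank positions, never raw scores, so it is robust to the different score scales of dense systems (for example, cosine similarity) and sparse systems (for example, BM25 scores).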
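The formula translates almost directly into code. Below is a minimal sketch, assuming each result list is a plain Python list of document IDs ordered best-first and that ranks start at 1. The function name `reciprocal_rank_fusion` and the toy IDs are illustrative only.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse ranked lists of document IDs into (doc_id, score) pairs.

    Each document's score is the sum of 1 / (k + rank) over every list
    that contains it, with ranks counted from 1.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest score first; ties broken by doc_id so the output is deterministic
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


dense_results = ["a", "b", "c"]   # toy dense ranking
sparse_results = ["b", "d", "a"]  # toy sparse ranking

fused = reciprocal_rank_fusion([dense_results, sparse_results])
print([doc_id for doc_id, _ in fused])  # ['b', 'a', 'd', 'c']
```

Here "b" scores 1/62 + 1/61 and "a" scores 1/61 + 1/63, so both outrank "d" (1/62) and "c" (1/63), which each appear in only one list.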
How do you implement hybrid search with SIE?
BGE-M3 produces both dense and sparse vectors from a single model, making it ideal for hybrid search with SIE:
```python
from sie_sdk import SIEClient
from sie_sdk.types import Item

client = SIEClient("http://localhost:8080")

# Example corpus; replace with your own document texts
documents = [
    "BGE-M3 throughput benchmark on a single GPU",
    "How to measure embedding model latency",
]

# Encode documents with both dense and sparse vectors
results = client.encode(
    "BAAI/bge-m3",
    [Item(text=d) for d in documents],
    output_types=["dense", "sparse"],
)

dense_vectors = [r["dense"] for r in results]    # for ANN search
sparse_vectors = [r["sparse"] for r in results]  # for inverted index / BM25

# Most vector DBs (Qdrant, Weaviate) support hybrid search natively
# Pass both vector types and let the DB handle merging
```

Qdrant, Weaviate, and Chroma all support hybrid search with dense + sparse vectors. SIE integrates directly with each.
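To make concrete what the database does with the two vector types before merging, here is a small in-memory sketch. It assumes dense vectors are lists of floats and sparse vectors are dictionaries mapping a token ID to a weight; the actual formats returned by SIE or expected by a given vector DB may differ. All vectors below are hand-made toy values, and a real system would use an ANN index and an inverted index instead of brute force.

```python
import math


def cosine(a, b):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def sparse_dot(q, d):
    """Dot product of two sparse vectors stored as {token_id: weight}."""
    if len(q) > len(d):
        q, d = d, q  # iterate over the smaller dictionary
    return sum(w * d.get(token, 0.0) for token, w in q.items())


def rank(scores):
    """Return document indices ordered by descending score."""
    return sorted(range(len(scores)), key=lambda i: -scores[i])


# Toy document vectors (assumed formats, not real model output)
doc_dense = [[0.9, 0.1], [0.2, 0.8], [0.7, 0.3]]
doc_sparse = [{101: 0.5}, {101: 0.2, 202: 1.4}, {303: 0.9}]

# Toy query vectors: token 202 plays the role of a rare exact term
query_dense = [1.0, 0.0]
query_sparse = {202: 1.0}

dense_ranking = rank([cosine(query_dense, d) for d in doc_dense])
sparse_ranking = rank([sparse_dot(query_sparse, d) for d in doc_sparse])

print(dense_ranking)   # [0, 2, 1]
print(sparse_ranking)  # [1, 0, 2]
```

In this toy data, document 1 is last in the dense ranking but first in the sparse ranking because it is the only one containing the rare token. This is the kind of disagreement the merging step resolves.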
Dense vs sparse vs hybrid: when to use each
| Scenario | Best approach |
|---|---|
| Natural language questions | Dense or hybrid |
| Product code / SKU lookup | Sparse or hybrid |
| Multilingual queries | Dense (BGE-M3) or hybrid |
| Mixed query types in production | Hybrid |
| Tight latency budget | Dense only |
| Domain-specific terminology | Hybrid or sparse |
For most production search and RAG systems, hybrid search is the recommended default.
Hybrid search vs reranking: what’s the difference?
These are complementary, not competing:
- Hybrid search improves the retrieval step — finding better candidates from the full corpus
- Reranking improves the ranking step — more precisely ordering a shortlist of candidates
The optimal pipeline for high-accuracy systems is: hybrid retrieval → reranking → LLM generation (for RAG).
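The division of labour between the two stages can be sketched as follows. This is an illustration under stated assumptions, not a production reranker: the candidate list stands in for the output of hybrid retrieval, and `term_overlap` is a deliberately simple hypothetical scorer where a real pipeline would typically call a cross-encoder or similar reranking model.

```python
from typing import Callable


def rerank(
    query: str,
    candidates: list[str],
    score_fn: Callable[[str, str], float],
    top_k: int,
) -> list[str]:
    """Reorder a shortlist by a (query, document) relevance score."""
    ordered = sorted(candidates, key=lambda doc: score_fn(query, doc), reverse=True)
    return ordered[:top_k]


def term_overlap(query: str, doc: str) -> float:
    """Hypothetical scorer: fraction of query terms that appear in the document."""
    q_terms = set(query.lower().split())
    d_terms = set(doc.lower().split())
    return len(q_terms & d_terms) / len(q_terms) if q_terms else 0.0


# Pretend these came back from the hybrid retrieval step, in fused order
candidates = [
    "Latency tuning for vector databases",
    "BGE-M3 throughput benchmark on a single GPU",
    "Throughput benchmark methodology",
]

top = rerank("BGE-M3 throughput benchmark", candidates, term_overlap, top_k=2)
print(top)
# ['BGE-M3 throughput benchmark on a single GPU', 'Throughput benchmark methodology']
```

The reranker only ever sees the shortlist, which is why it can afford a more expensive scoring function than the retrieval step. For RAG, the `top` list is what would be passed to the LLM as context.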
Frequently asked questions
Does hybrid search require two separate models?
Not with BGE-M3. It produces dense and sparse representations from a single model, reducing infrastructure complexity. SIE hosts BGE-M3 for both outputs.
Is hybrid search significantly slower than dense-only search?
The retrieval step adds minimal latency since both paths run in parallel. Encoding time is unchanged (same model, same call). The main overhead is the merging step, which is negligible.
Which vector databases support hybrid search?
Qdrant, Weaviate, and Milvus all support hybrid search natively. SIE has integration guides for each.