How Does Qdrant Work with Embedding Models?
Qdrant is an open-source vector database that stores embedding vectors alongside payload metadata and enables fast approximate nearest neighbour (ANN) search, filtered search, and hybrid (dense + sparse) search. It works with embedding models by receiving the vectors they produce — generated by SIE — and indexing them in an HNSW graph for millisecond-latency retrieval at scale.
Why Qdrant?
Qdrant is a strong default choice for production semantic search and RAG pipelines because:
- Written in Rust — low latency, high throughput, predictable performance under load
- Native hybrid search — combines dense vector search with sparse BM25-style search in one query
- Multi-vector support — stores ColBERT-style token vectors for late interaction retrieval
- Filterable ANN — filter by metadata without sacrificing recall (adaptive strategy selection)
- Open source + cloud — run self-hosted or use Qdrant Cloud
- Active development — among the fastest-evolving vector databases in the ecosystem
How Qdrant and SIE work together
SIE handles the encoding; Qdrant handles the storage and retrieval:
Documents → [SIE: BGE-M3] → vectors → [Qdrant: HNSW index] → stored
Query → [SIE: BGE-M3] → vector → [Qdrant: ANN search] → results

Full pipeline example:
    from sie_sdk import SIEClient
    from sie_sdk.types import Item
    from qdrant_client import QdrantClient
    from qdrant_client.models import (
        DatetimeRange, Distance, FieldCondition, Filter, PointStruct, VectorParams,
    )

    sie = SIEClient("http://localhost:8080")
    qdrant = QdrantClient("http://localhost:6333")

    # document_chunks, sources, dates and user_query are assumed to be defined by the caller.

    # 1. Create collection (1024 is the BGE-M3 dense vector size)
    qdrant.create_collection(
        collection_name="documents",
        vectors_config=VectorParams(size=1024, distance=Distance.COSINE),
    )

    # 2. Encode and index documents
    encode_results = sie.encode("BAAI/bge-m3", [Item(text=c) for c in document_chunks])
    vectors = [r["dense"] for r in encode_results]

    qdrant.upsert(
        collection_name="documents",
        points=[
            PointStruct(
                id=i,
                vector=v.tolist(),
                payload={"text": chunk, "source": source, "date": date},
            )
            for i, (v, chunk, source, date) in enumerate(
                zip(vectors, document_chunks, sources, dates)
            )
        ],
    )

    # 3. Search, keeping only documents dated 2024-01-01 or later.
    # The datetime filter assumes "date" payloads are stored as RFC 3339 strings.
    query_vector = sie.encode("BAAI/bge-m3", Item(text=user_query), is_query=True)["dense"]

    # Note: recent qdrant-client releases deprecate search() in favour of query_points().
    results = qdrant.search(
        collection_name="documents",
        query_vector=query_vector,
        query_filter=Filter(
            must=[FieldCondition(key="date", range=DatetimeRange(gte="2024-01-01T00:00:00Z"))]
        ),
        limit=20,
    )

Hybrid search with Qdrant and SIE
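Hybrid search retrieves one candidate list per vector type and then merges the lists. The `rrf` fusion requested by the query in this section stands for Reciprocal Rank Fusion, which scores each document by summing 1 / (k + rank) over the lists it appears in. The following is a minimal pure-Python sketch of that idea, not Qdrant's implementation: the function name `reciprocal_rank_fusion`, the constant `k=60`, and ranks starting at 1 are assumptions chosen for illustration, and Qdrant's internal constants may differ.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60, limit=None):
    """Fuse several ranked lists of ids into one list of (id, score) pairs.

    Each document scores 1 / (k + rank) for every list it appears in,
    with rank counted from 1. k=60 is an illustrative default.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for a stable order.
    fused = sorted(scores.items(), key=lambda item: (-item[1], str(item[0])))
    return fused[:limit]


# Hypothetical result ids from a dense and a sparse retrieval.
dense_hits = ["a", "b", "c"]
sparse_hits = ["c", "a", "d"]

fused = reciprocal_rank_fusion([dense_hits, sparse_hits], limit=3)
for doc_id, score in fused:
    print(doc_id, round(score, 5))
```

Because only ranks are used, the dense and sparse scores never need to be put on a common scale, which is the main reason rank-based fusion is a convenient default for combining the two.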
BGE-M3 produces both dense and sparse vectors. Qdrant’s hybrid search combines them:
    from sie_sdk.types import Item
    from qdrant_client.models import Fusion, FusionQuery, Prefetch, SparseVector

    # Assumes the collection was created with named vectors "dense" and "sparse";
    # the single unnamed vector of the "documents" collection above is not enough.

    # Encode with both dense and sparse outputs
    query_result = sie.encode(
        "BAAI/bge-m3",
        Item(text=user_query),
        output_types=["dense", "sparse"],
        is_query=True,
    )
    sparse = query_result["sparse"]

    # Search with both, then fuse the two candidate lists
    results = qdrant.query_points(
        collection_name="documents",
        prefetch=[
            # Dense retrieval
            Prefetch(query=query_result["dense"].tolist(), using="dense", limit=50),
            # Sparse retrieval
            Prefetch(
                query=SparseVector(indices=sparse["indices"], values=sparse["values"]),
                using="sparse",
                limit=50,
            ),
        ],
        query=FusionQuery(fusion=Fusion.RRF),  # Reciprocal Rank Fusion
        limit=20,
    )

Multi-vector (ColBERT) with Qdrant
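Late interaction compares every query token vector with every document token vector instead of comparing one vector per text. Under the MaxSim rule, each query token contributes its best cosine match among the document's tokens, and the contributions are summed. The sketch below is a pure-Python illustration with hypothetical two-dimensional token vectors; the helper names `cosine` and `max_sim` are assumptions for this example, not part of any library.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def max_sim(query_tokens, doc_tokens):
    """Sum, over query tokens, of the best match among the document tokens."""
    return sum(max(cosine(q, d) for d in doc_tokens) for q in query_tokens)


# Hypothetical token vectors: a two-token query and two candidate documents.
query_tokens = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # has a close match for both query tokens
doc_b = [[1.0, 0.0], [-1.0, 0.0]]             # matches only the first query token

score_a = max_sim(query_tokens, doc_a)
score_b = max_sim(query_tokens, doc_b)
print(score_a, score_b)
```

In Qdrant this scoring rule is selected with `MultiVectorComparator.MAX_SIM` when the collection is created.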
Qdrant supports multi-vector storage for ColBERT-style late interaction retrieval:
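    from sie_sdk.types import Item
    from qdrant_client.models import (
        Distance, MultiVectorComparator, MultiVectorConfig, PointStruct, VectorParams,
    )

    # Create collection with multi-vector support.
    # size must equal the per-token vector dimension the encoder returns.
    qdrant.create_collection(
        collection_name="documents_colbert",
        vectors_config={
            "colbert": VectorParams(
                size=128,
                distance=Distance.COSINE,
                multivector_config=MultiVectorConfig(
                    comparator=MultiVectorComparator.MAX_SIM
                ),
            )
        },
    )

    # Index ColBERT token vectors
    colbert_results = sie.encode(
        "BAAI/bge-m3",
        [Item(text=d) for d in documents],
        output_types=["multivector"],
    )
    colbert_mvs = [r["multivector"] for r in colbert_results]

    # Upsert token vectors per document (sketch: assumes each multivector is a
    # 2-D array exposing .tolist(), like the dense vectors in the pipeline example)
    qdrant.upsert(
        collection_name="documents_colbert",
        points=[
            PointStruct(id=i, vector={"colbert": mv.tolist()}, payload={"text": d})
            for i, (mv, d) in enumerate(zip(colbert_mvs, documents))
        ],
    )

Qdrant configuration for production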
Key settings to tune for production deployments:
    # Collection with tuned HNSW parameters
    qdrant.create_collection(
        collection_name="production",
        vectors_config=VectorParams(
            size=1024,
            distance=Distance.COSINE,
            hnsw_config={"m": 16, "ef_construct": 128},
            quantization_config={"scalar": {"type": "int8", "quantile": 0.99}},
        ),
    )

    # Set search ef at query time
    results = qdrant.search(
        collection_name="production",
        query_vector=query_vector,
        search_params={"hnsw_ef": 128, "exact": False},
        limit=20,
    )

INT8 scalar quantisation stores each vector component in one byte instead of the four used by float32, which reduces vector memory by ~4× with minimal recall loss. It is recommended for large corpora.
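The 4× figure follows directly from the component width. As a back-of-the-envelope sketch, assuming a hypothetical corpus of one million 1024-dimensional vectors (the corpus size and the helper name `vector_memory_bytes` are illustrative assumptions):

```python
def vector_memory_bytes(num_vectors: int, dim: int, bytes_per_component: int) -> int:
    """Raw storage for the vector components only."""
    return num_vectors * dim * bytes_per_component


num_vectors = 1_000_000  # hypothetical corpus size
dim = 1024               # dense vector size used in the examples

float32_bytes = vector_memory_bytes(num_vectors, dim, 4)  # 4 bytes per float32
int8_bytes = vector_memory_bytes(num_vectors, dim, 1)     # 1 byte per int8
ratio = float32_bytes / int8_bytes

print(f"float32: {float32_bytes / 1e9:.3f} GB")
print(f"int8:    {int8_bytes / 1e9:.3f} GB")
print(f"ratio:   {ratio:.1f}x")
```

This estimate covers vector components only. The HNSW graph, payloads, and any full-precision copies kept alongside the quantised vectors add to the total, so the overall saving for a deployment is usually somewhat below 4×.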
Frequently asked questions
Does Qdrant support real-time updates? Yes. Qdrant’s HNSW index supports incremental inserts and deletes. New vectors are immediately searchable after insertion.
What is Qdrant’s payload filtering performance like? Qdrant uses an adaptive strategy that selects between pre-filtering and post-filtering based on filter selectivity. This typically maintains 95%+ recall even with highly selective filters.
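To see why the choice of strategy matters, the toy example below contrasts the two approaches with brute-force search over four hypothetical points. It is a conceptual illustration only, not Qdrant's implementation; the data, the `year` field, and the helper names are assumptions made for the example.

```python
def dot(u, v):
    """Dot product, used here as the similarity score."""
    return sum(a * b for a, b in zip(u, v))


def top_k(candidates, query, k):
    """Exact top-k by similarity over the given candidates."""
    return sorted(candidates, key=lambda p: dot(p["vector"], query), reverse=True)[:k]


def matches(point):
    """Hypothetical payload filter: keep points from 2024 onwards."""
    return point["year"] >= 2024


points = [
    {"id": 1, "vector": [1.0, 0.0], "year": 2023},
    {"id": 2, "vector": [0.9, 0.1], "year": 2023},
    {"id": 3, "vector": [0.5, 0.5], "year": 2024},
    {"id": 4, "vector": [0.0, 1.0], "year": 2024},
]
query = [1.0, 0.0]

# Post-filtering: take the top 2 by similarity, then drop non-matching points.
post_filtered = [p["id"] for p in top_k(points, query, 2) if matches(p)]

# Pre-filtering: restrict to matching points first, then take the top 2.
pre_filtered = [p["id"] for p in top_k([p for p in points if matches(p)], query, 2)]

print(post_filtered, pre_filtered)
```

Here post-filtering returns nothing because both nearest points fail the filter, while pre-filtering returns the two matching points. With a selective filter, filtering first protects recall; with a permissive filter, searching the index first is cheaper, which is why an adaptive choice is useful.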
Can I run Qdrant alongside SIE on the same infrastructure? Yes. Both can run in the same Kubernetes cluster. SIE handles the GPU workloads; Qdrant runs on CPU nodes. They communicate over the cluster’s internal network.