How Does Qdrant Work with Embedding Models?

Qdrant is an open-source vector database that stores embedding vectors alongside payload metadata and supports fast approximate nearest neighbour (ANN) search, filtered search, and hybrid (dense + sparse) search. In the setup described here, SIE runs the embedding model and generates the vectors; Qdrant receives those vectors and indexes them in an HNSW graph for millisecond-latency retrieval at scale.

Why Qdrant?

Qdrant is a strong default choice for production semantic search and RAG pipelines because:

Written in Rust: low latency, high throughput, predictable performance under load
Native hybrid search: combines dense vector search with sparse BM25-style search in one query
Multi-vector support: stores ColBERT-style token vectors for late interaction retrieval
Filterable ANN: filter by metadata without sacrificing recall (adaptive strategy selection)
Open source + cloud: run self-hosted or use Qdrant Cloud
Active development: among the fastest-evolving vector databases in the ecosystem

How Qdrant and SIE work together

SIE handles the encoding; Qdrant handles the storage and retrieval:

Documents → [SIE: BGE-M3] → vectors → [Qdrant: HNSW index] → stored
Query     → [SIE: BGE-M3] → vector  → [Qdrant: ANN search] → results
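
To make the retrieval step concrete, it helps to see what an ANN index approximates: exact nearest-neighbour search, which scores the query against every stored vector. The sketch below is a brute-force version in pure Python over hypothetical toy vectors. It is an illustration only, not how Qdrant is implemented; the HNSW index exists precisely to avoid this full scan.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def exact_search(query, vectors, limit):
    """Score every stored vector against the query and return the best `limit`."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]

# Hypothetical 3-dimensional embeddings (real BGE-M3 dense vectors have 1024 dimensions)
toy_vectors = {
    "doc-a": [1.0, 0.0, 0.0],
    "doc-b": [0.0, 1.0, 0.0],
    "doc-c": [0.9, 0.1, 0.0],
}

top = exact_search([1.0, 0.0, 0.0], toy_vectors, limit=2)
```

The cost of this scan grows linearly with the collection size, which is why an approximate index is used once the corpus is large.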

Full pipeline example:

from sie_sdk import SIEClient
from sie_sdk.types import Item
from qdrant_client import QdrantClient
from qdrant_client.models import VectorParams, Distance, PointStruct

sie = SIEClient("http://localhost:8080")
qdrant = QdrantClient("http://localhost:6333")

# 1. Create collection
qdrant.create_collection(
    collection_name="documents",
    vectors_config=VectorParams(size=1024, distance=Distance.COSINE)
)

# 2. Encode and index documents
# document_chunks, sources and dates are parallel lists prepared by your ingestion code
encode_results = sie.encode("BAAI/bge-m3", [Item(text=c) for c in document_chunks])
vectors = [r["dense"] for r in encode_results]

qdrant.upsert(
    collection_name="documents",
    points=[
        PointStruct(
            id=i,
            vector=v.tolist(),
            payload={"text": chunk, "source": source, "date": date}
        )
        for i, (v, chunk, source, date) in enumerate(zip(vectors, document_chunks, sources, dates))
    ]
)

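# 3. Search, restricted to documents dated 2024-01-01 or later
from qdrant_client.models import Filter, FieldCondition, DatetimeRange

query_vector = sie.encode("BAAI/bge-m3", Item(text=user_query), is_query=True)["dense"]

# query_points is the current query API in qdrant-client; it returns a response object
response = qdrant.query_points(
    collection_name="documents",
    query=query_vector.tolist(),
    query_filter=Filter(
        must=[FieldCondition(key="date", range=DatetimeRange(gte="2024-01-01"))]
    ),
    limit=20
)
results = response.points  # scored points with id, score and payload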

Hybrid search with Qdrant and SIE

BGE-M3 produces both dense and sparse vectors. Qdrant’s hybrid search combines them:

from sie_sdk.types import Item
from qdrant_client.models import Prefetch, SparseVector, FusionQuery, Fusion

# Assumes the collection defines named vectors "dense" and "sparse";
# a collection with a single unnamed vector cannot serve this query.

# Encode with both dense and sparse outputs
query_result = sie.encode(
    "BAAI/bge-m3",
    Item(text=user_query),
    output_types=["dense", "sparse"],
    is_query=True,
)
sparse = query_result["sparse"]

# Retrieve 50 candidates per representation, then fuse the two rankings
response = qdrant.query_points(
    collection_name="documents",
    prefetch=[
        # Dense retrieval
        Prefetch(query=query_result["dense"].tolist(), using="dense", limit=50),
        # Sparse retrieval
        Prefetch(
            query=SparseVector(indices=sparse["indices"], values=sparse["values"]),
            using="sparse",
            limit=50,
        ),
    ],
    query=FusionQuery(fusion=Fusion.RRF),  # Reciprocal Rank Fusion
    limit=20
)
results = response.points
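
Reciprocal Rank Fusion merges rankings using only each document's position in every list, so dense and sparse scores never need to be on the same scale. The sketch below shows the idea in pure Python. The function name, the sample IDs, and the constant k=60 (the value commonly used in the RRF literature) are assumptions for illustration; Qdrant's internal constant and tie-breaking may differ.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked ID lists: each appearance adds 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)

# Hypothetical result lists from the dense and sparse prefetch stages
dense_hits = ["d1", "d2", "d3"]
sparse_hits = ["d3", "d1", "d4"]

fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
fused_ids = [doc_id for doc_id, _ in fused]
```

Documents that appear in both lists (d1 and d3 here) accumulate two contributions and rise above documents that only one retriever found.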

Multi-vector (ColBERT) with Qdrant

Qdrant supports multi-vector storage for ColBERT-style late interaction retrieval:

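from sie_sdk.types import Item
from qdrant_client.models import (
    Distance, VectorParams, PointStruct, MultiVectorConfig, MultiVectorComparator
)

# Create collection with multi-vector support.
# size must equal the per-token vector width the encoder returns;
# verify it for your model before creating the collection.
qdrant.create_collection(
    collection_name="documents_colbert",
    vectors_config={
        "colbert": VectorParams(
            size=128,
            distance=Distance.COSINE,
            multivector_config=MultiVectorConfig(
                comparator=MultiVectorComparator.MAX_SIM
            )
        )
    }
)

# Index ColBERT token vectors
colbert_results = sie.encode(
    "BAAI/bge-m3",
    [Item(text=d) for d in documents],
    output_types=["multivector"],
)
colbert_mvs = [r["multivector"] for r in colbert_results]

# Upsert token vectors per document: each point stores a list of token vectors
# (assumes each multivector is a 2-D array, like the dense output above)
qdrant.upsert(
    collection_name="documents_colbert",
    points=[
        PointStruct(id=i, vector={"colbert": mv.tolist()}, payload={"text": doc})
        for i, (mv, doc) in enumerate(zip(colbert_mvs, documents))
    ]
)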
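
Late interaction scores a document by comparing every query token vector with every document token vector. Under the MaxSim comparator, each query token contributes its best match in the document, and those maxima are summed. The sketch below is a pure-Python illustration with hypothetical 2-dimensional, unit-length token vectors, so the dot product equals cosine similarity; it is not Qdrant's implementation.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def max_sim(query_tokens, doc_tokens):
    """Sum, over query tokens, of the best similarity to any document token."""
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)

# Hypothetical token vectors (real ColBERT-style vectors are much wider)
query_tokens = [[1.0, 0.0], [0.0, 1.0]]
doc_a_tokens = [[1.0, 0.0], [0.0, 1.0]]  # matches both query tokens exactly
doc_b_tokens = [[0.6, 0.8]]              # one token, partial match to each

score_a = max_sim(query_tokens, doc_a_tokens)
score_b = max_sim(query_tokens, doc_b_tokens)
```

Because the score is a sum over query tokens, a document is rewarded for covering each part of the query rather than for matching one part very strongly.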

Qdrant configuration for production

Key settings to tune for production deployments:

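from qdrant_client.models import (
    HnswConfigDiff, ScalarQuantization, ScalarQuantizationConfig, ScalarType, SearchParams
)

# Collection with tuned HNSW parameters and INT8 scalar quantisation
qdrant.create_collection(
    collection_name="production",
    vectors_config=VectorParams(
        size=1024,
        distance=Distance.COSINE,
        # m: graph links per node; ef_construct: candidate list size while building
        hnsw_config=HnswConfigDiff(m=16, ef_construct=128),
        quantization_config=ScalarQuantization(
            scalar=ScalarQuantizationConfig(type=ScalarType.INT8, quantile=0.99)
        )
    )
)

# Set search ef at query time (higher values trade latency for recall)
response = qdrant.query_points(
    collection_name="production",
    query=query_vector.tolist(),
    search_params=SearchParams(hnsw_ef=128, exact=False),
    limit=20
)
results = response.points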

Scalar quantisation to INT8 stores each vector component in 1 byte instead of the 4 bytes of a float32, which cuts vector memory by about 4× with minimal recall loss. It is recommended for large corpora.
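
The 4× figure follows directly from the component width. A quick back-of-the-envelope calculation, assuming a hypothetical corpus of one million 1024-dimensional vectors and counting only raw vector data (the HNSW graph, payloads, and any original vectors kept for rescoring are extra):

```python
def vector_memory_bytes(num_vectors, dim, bytes_per_component):
    """Raw storage for the vector data alone, without index or payload overhead."""
    return num_vectors * dim * bytes_per_component

num_vectors = 1_000_000  # hypothetical corpus size
dim = 1024               # matches the collection size used above

float32_bytes = vector_memory_bytes(num_vectors, dim, 4)
int8_bytes = vector_memory_bytes(num_vectors, dim, 1)
ratio = float32_bytes / int8_bytes

print(f"float32: {float32_bytes / 1e9:.2f} GB")
print(f"int8:    {int8_bytes / 1e9:.2f} GB")
print(f"ratio:   {ratio:.0f}x")
```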

Frequently asked questions

Does Qdrant support real-time updates? Yes. Qdrant’s HNSW index supports incremental inserts and deletes. New vectors are immediately searchable after insertion.

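What is Qdrant’s payload filtering performance like? Qdrant’s query planner estimates how many points match the filter and picks a strategy accordingly: for broad filters it applies the filter during HNSW graph traversal, and for highly selective filters it falls back to a payload-index lookup with exact scoring of the matching points. This typically maintains 95%+ recall even with highly selective filters, although the exact figure depends on the data, the index settings, and whether the filtered fields have a payload index.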
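
Recall under filtering is worth measuring on your own data rather than taking on trust. One approach is to run the same filtered query twice, once normally and once with exact search enabled (the "exact" search parameter shown earlier), and compare the returned IDs. The helper below is a minimal sketch of that comparison; the function name and the sample ID lists are hypothetical.

```python
def recall_at_k(approx_ids, exact_ids, k):
    """Fraction of the exact top-k results that the approximate search also returned."""
    exact_top = set(exact_ids[:k])
    if not exact_top:
        return 0.0
    hits = sum(1 for point_id in approx_ids[:k] if point_id in exact_top)
    return hits / len(exact_top)

# Hypothetical point IDs from an approximate and an exact run of the same query
approx_ids = [3, 7, 1, 9, 4]
exact_ids = [3, 1, 7, 8, 4]

recall = recall_at_k(approx_ids, exact_ids, k=5)
```

Averaging this value over a representative set of queries and filters gives a recall estimate that reflects your actual workload.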

Can I run Qdrant alongside SIE on the same infrastructure? Yes. Both can run in the same Kubernetes cluster. SIE handles the GPU workloads; Qdrant runs on CPU nodes. They communicate over the cluster’s internal network.