Search & Retrieval

What is a Vector Index?

A vector index is a data structure that organises high-dimensional vectors to enable fast approximate nearest neighbour (ANN) search. Instead of comparing a query vector to every stored vector, the index groups or graphs vectors in ways that allow search to skip irrelevant regions of the vector space, reducing query time from linear to sub-linear. HNSW and IVF are the two most widely used vector index types in production systems.

Why do vector indexes matter?

Without an index, finding the nearest neighbours among 10 million 768-dimensional vectors means computing 10 million dot products per query (roughly 7.7 billion multiply-adds), which is too slow for real-time search. A vector index pre-organises the vectors so that search visits only the most promising regions of the space, typically cutting comparisons by 99% or more while still finding 95-99% of the true nearest neighbours.
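To make the linear cost concrete, here is a minimal sketch of exact ("flat") search in plain Python. The function names and the toy corpus (1,000 random 8-dimensional unit vectors) are illustrative assumptions, not part of any library; the point is that every query touches every stored vector.

```python
import heapq
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalise(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

def flat_search(query, vectors, k):
    """Exact top-k by dot product: one comparison per stored vector."""
    scored = ((dot(query, v), i) for i, v in enumerate(vectors))
    return [i for _, i in heapq.nlargest(k, scored)]

# Toy corpus: 1,000 random unit vectors (hypothetical data).
rng = random.Random(0)
dim, n = 8, 1000
vectors = [normalise([rng.gauss(0, 1) for _ in range(dim)]) for _ in range(n)]

# Querying with a stored vector should return that vector first.
top5 = flat_search(vectors[42], vectors, k=5)
print(top5)
```

The work per query grows in direct proportion to the number of stored vectors, which is the cost an ANN index is designed to avoid.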

Choosing and configuring the right index directly affects:

Query latency: how fast each search request completes
Recall: what fraction of the true nearest neighbours the approximate search returns
Memory usage: how much RAM the index requires
Index build time: how long it takes to index new vectors
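Recall is usually measured by comparing an index's results against exact search on the same queries. A minimal sketch, assuming both result lists hold vector IDs ordered by relevance (the function name and sample IDs are hypothetical):

```python
def recall_at_k(approx_ids, exact_ids, k):
    """Fraction of the exact top-k that also appears in the approximate top-k."""
    exact = set(exact_ids[:k])
    if not exact:
        return 0.0
    hits = len(exact & set(approx_ids[:k]))
    return hits / len(exact)

exact = [3, 7, 1, 9, 4]    # ground truth from exact (flat) search
approx = [3, 1, 7, 8, 4]   # what an ANN index returned
print(recall_at_k(approx, exact, k=5))  # 0.8
```

Order within the top-k does not matter here; only membership does.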

HNSW: the dominant production index

HNSW (Hierarchical Navigable Small World) builds a multi-layer proximity graph in which each vector is connected by edges to its nearest neighbours. Search navigates from a sparse top layer down to a dense bottom layer:

Layer 2:  sparse graph — few nodes, long-range connections
Layer 1:  medium graph — more nodes, medium connections
Layer 0:  dense graph — all nodes, short-range connections

Search: enter at layer 2, greedily navigate to nearest node,
        drop to layer 1, repeat, drop to layer 0, find k-NN
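The descent described above can be sketched in plain Python. This is a toy under stated assumptions, not a production HNSW: the layers are built by brute force (each node linked to its m nearest neighbours within a layer) rather than by HNSW's incremental insertion, and all function names are hypothetical. Only the search procedure mirrors the layered greedy descent.

```python
import heapq
import math
import random

def build_layers(vectors, m=8, max_layers=3, seed=0):
    """Toy build: random top level per node, brute-force m-NN links per layer."""
    rng = random.Random(seed)
    scale = 1.0 / math.log(m)
    # Exponentially fewer nodes reach the higher layers.
    levels = [min(max_layers - 1, int(-math.log(1.0 - rng.random()) * scale))
              for _ in vectors]
    top = max(levels)
    layers = []
    for level in range(top + 1):
        members = [i for i, lv in enumerate(levels) if lv >= level]
        graph = {}
        for i in members:
            others = sorted((j for j in members if j != i),
                            key=lambda j: math.dist(vectors[i], vectors[j]))
            graph[i] = others[:m]
        layers.append(graph)
    return layers, levels.index(top)  # entry point lives on the top layer

def search_layer(query, vectors, graph, entry, ef):
    """Beam search within one layer; returns up to ef (distance, id) pairs."""
    d0 = math.dist(query, vectors[entry])
    visited = {entry}
    candidates = [(d0, entry)]   # min-heap: closest unexplored node first
    best = [(-d0, entry)]        # max-heap via negation: current best ef
    while candidates:
        d, node = heapq.heappop(candidates)
        if d > -best[0][0]:
            break                # nothing left can improve the result set
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            dn = math.dist(query, vectors[nb])
            if len(best) < ef or dn < -best[0][0]:
                heapq.heappush(candidates, (dn, nb))
                heapq.heappush(best, (-dn, nb))
                if len(best) > ef:
                    heapq.heappop(best)
    return sorted((-d, i) for d, i in best)

def hnsw_search(query, vectors, layers, entry, k, ef=32):
    node = entry
    for graph in reversed(layers[1:]):      # upper layers: greedy (ef=1)
        node = search_layer(query, vectors, graph, node, ef=1)[0][1]
    found = search_layer(query, vectors, layers[0], node, ef=max(ef, k))
    return [i for _, i in found[:k]]

rng = random.Random(1)
vectors = [[rng.random() for _ in range(4)] for _ in range(300)]
layers, entry = build_layers(vectors)
query = [0.5, 0.5, 0.5, 0.5]
result = hnsw_search(query, vectors, layers, entry, k=5)
print(result)
```

The upper layers only pick a good starting point; the wider beam (ef) is spent on layer 0, where every vector is present.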

When to use HNSW:

Production semantic search with real-time inserts
Corpus size up to ~100M vectors
When recall > 95% is required
When you need to add new vectors without rebuilding the index

Key parameters:

M: maximum number of bidirectional edges per node on each layer (16-32 typical). Higher = better recall, more memory.
ef_construction: beam width (candidate-list size) during index build (128-200 typical). Higher = better index quality, slower build.
ef: beam width (candidate-list size) during search; it should be at least k, the number of results requested. Higher = better recall, slower query.
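To get a feel for how M drives memory, a rough back-of-the-envelope estimate helps. The sketch below assumes float32 vectors, 4-byte neighbour IDs, and about 2×M links per node on the bottom layer, and it ignores the upper layers and any implementation overhead; real libraries will differ, so treat the result as an order-of-magnitude figure only.

```python
def hnsw_memory_bytes(n, dim, m, bytes_per_float=4, bytes_per_id=4):
    """Rough estimate: raw vectors plus bottom-layer links (assumed 2*m per node)."""
    vector_bytes = dim * bytes_per_float
    link_bytes = 2 * m * bytes_per_id
    return n * (vector_bytes + link_bytes)

# 10 million 768-dimensional vectors with M=16
estimate = hnsw_memory_bytes(10_000_000, 768, 16)
print(f"{estimate / 1e9:.1f} GB")  # 32.0 GB
```

Under these assumptions the raw vectors dominate, and doubling M adds roughly 128 bytes per vector.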

IVF: cluster-based indexing

IVF (Inverted File Index) partitions vectors into clusters using k-means, then at query time only searches the nearest clusters:

Build: k-means clustering → nlist centroids + inverted lists
Query: find nprobe nearest centroids → search their inverted lists
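The build and query steps above can be sketched in plain Python. This is a toy illustration with a naive k-means and hypothetical function names, assuming Euclidean distance; production libraries use far more efficient implementations.

```python
import math
import random

def nearest(point, centroids, count=1):
    """Indices of the `count` centroids closest to `point`."""
    order = sorted(range(len(centroids)),
                   key=lambda c: math.dist(point, centroids[c]))
    return order[:count]

def assign(vectors, centroids):
    """Inverted lists: for each centroid, the IDs of the vectors assigned to it."""
    lists = [[] for _ in centroids]
    for i, v in enumerate(vectors):
        lists[nearest(v, centroids)[0]].append(i)
    return lists

def build_ivf(vectors, nlist, iters=10, seed=0):
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, nlist)]
    for _ in range(iters):
        lists = assign(vectors, centroids)
        for c, ids in enumerate(lists):
            if ids:  # keep the old centroid if a cluster went empty
                cols = zip(*(vectors[i] for i in ids))
                centroids[c] = [sum(col) / len(ids) for col in cols]
    return centroids, assign(vectors, centroids)

def ivf_search(query, vectors, centroids, lists, k, nprobe):
    """Scan only the inverted lists of the nprobe nearest centroids."""
    candidates = [i for c in nearest(query, centroids, nprobe) for i in lists[c]]
    candidates.sort(key=lambda i: math.dist(query, vectors[i]))
    return candidates[:k]

rng = random.Random(1)
vectors = [[rng.random() for _ in range(4)] for _ in range(400)]
centroids, lists = build_ivf(vectors, nlist=20)
query = [0.5, 0.5, 0.5, 0.5]
approx = ivf_search(query, vectors, centroids, lists, k=5, nprobe=4)
print(approx)
```

Setting nprobe equal to nlist scans every list and so degenerates to exact search; smaller values trade recall for speed.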

When to use IVF:

Very large corpora (100M-1B+ vectors)
Batch indexing workflows (corpus changes infrequently)
When memory is constrained (IVF uses less memory than HNSW)

Key parameters:

nlist: number of clusters. A common rule of thumb is sqrt(n) to 4×sqrt(n), where n is the number of vectors in the corpus.
nprobe: clusters searched per query. Higher = better recall, slower.
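As a quick worked example of the nlist rule of thumb, the helper below (a hypothetical name) computes the suggested range for a given corpus size. It is a starting point for tuning, not a guarantee of good recall.

```python
import math

def nlist_range(n):
    """Suggested nlist range: sqrt(n) to 4*sqrt(n), using the integer square root."""
    root = math.isqrt(n)
    return root, 4 * root

print(nlist_range(1_000_000))    # (1000, 4000)
print(nlist_range(100_000_000))  # (10000, 40000)
```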

Product Quantisation (PQ): memory compression

PQ compresses vectors by dividing each one into sub-vectors and replacing each sub-vector with the index of its nearest centroid in a learned codebook. A 768-dimensional float32 vector (3 KB) can be compressed to 96 bytes, a 32× reduction.
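The arithmetic behind that figure, and the encoding step itself, can be sketched as follows. One configuration that yields 96 bytes is 96 sub-vectors of 8 dimensions with one 8-bit code each; that split is an assumption for illustration, as are the function names and the tiny hand-written codebooks.

```python
import math

def pq_sizes(dim, num_subvectors, bits_per_code=8, bytes_per_float=4):
    """Return (raw bytes, compressed bytes, compression ratio) per vector."""
    raw = dim * bytes_per_float
    compressed = num_subvectors * bits_per_code // 8
    return raw, compressed, raw / compressed

def pq_encode(vector, codebooks):
    """Replace each sub-vector with the index of its nearest codebook centroid."""
    sub_dim = len(vector) // len(codebooks)
    codes = []
    for s, book in enumerate(codebooks):
        sub = vector[s * sub_dim:(s + 1) * sub_dim]
        codes.append(min(range(len(book)), key=lambda c: math.dist(sub, book[c])))
    return codes

print(pq_sizes(768, 96))  # (3072, 96, 32.0)

# Toy example: a 4-dimensional vector, 2 sub-vectors, 2 centroids per codebook.
codebooks = [[[0.0, 0.0], [1.0, 1.0]], [[0.0, 0.0], [1.0, 1.0]]]
print(pq_encode([0.1, 0.2, 0.9, 0.8], codebooks))  # [0, 1]
```

In practice the codebooks are learned with k-means on a sample of the corpus, and distances are computed against the codes rather than the original vectors, which is where the accuracy loss comes from.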

Often combined with IVF as IVF-PQ for billion-scale retrieval where memory is the primary constraint.

Trade-off: significant accuracy loss compared to full-precision HNSW. Use only when memory constraints make uncompressed indexing infeasible.

Index types by use case

Scenario	Recommended index	Why
General production search	HNSW	Best recall-latency balance, incremental inserts
Real-time inserts	HNSW	Supports incremental updates
100M-1B vectors	IVF-HNSW or IVF-PQ	HNSW memory too high at this scale
Memory-constrained	IVF-PQ	32× compression
Development / small corpus	Flat (exact)	No approximation needed under ~100K vectors

How vector indexes are managed in SIE pipelines

SIE produces the vectors; your vector database manages the index. When setting up a new collection:

from qdrant_client import QdrantClient
from qdrant_client.models import Distance, HnswConfigDiff, VectorParams

# Connect to a local Qdrant instance
qdrant = QdrantClient(url="http://localhost:6333")

# Create a collection with an explicit HNSW configuration
qdrant.create_collection(
    collection_name="documents",
    vectors_config=VectorParams(
        size=1024,                  # BGE-M3 output dimension
        distance=Distance.COSINE,
        hnsw_config=HnswConfigDiff(
            m=16,                   # edges per node (M)
            ef_construct=128,       # build-time beam width (ef_construction)
        ),
    ),
)

At query time, set ef (the search beam width, exposed in Qdrant as the hnsw_ef search parameter) to balance recall against latency for your requirements.

Frequently asked questions

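Do I need to rebuild the index when adding new vectors? With HNSW, no: new vectors are inserted into the graph incrementally. With IVF, new vectors can be appended to the inverted list of their nearest centroid, but the centroids are fixed when the index is trained, so as the data drifts the index must eventually be retrained and rebuilt; some systems hold new vectors in a temporary flat fallback index in the meantime. This is why HNSW dominates production systems with frequent updates.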

What happens to index performance as the corpus grows? HNSW query time grows slowly, roughly O(log n), as corpus size increases. IVF searches a fixed number of clusters per query, which makes its cost easy to predict, but each cluster holds about n/nlist vectors, so latency and the recall-latency trade-off worsen if nlist isn't scaled with corpus size.

How do I choose between cosine similarity and dot product distance? Cosine similarity is equivalent to dot product when vectors are L2-normalised. SIE normalises output vectors by default, so both are equivalent. Cosine is safer for robustness if vector magnitude varies.
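The equivalence is easy to check numerically. A minimal sketch in plain Python, using hypothetical helper names and two small example vectors:

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def l2_normalise(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

a = [3.0, 4.0]
b = [1.0, 2.0]

cos = cosine(a, b)
dot_normalised = dot(l2_normalise(a), l2_normalise(b))
print(math.isclose(cos, dot_normalised))  # True

# Scaling a vector changes the raw dot product but not the cosine.
a_scaled = [10 * x for x in a]
print(dot(a, b), dot(a_scaled, b))               # 11.0 110.0
print(math.isclose(cosine(a_scaled, b), cos))    # True
```

This is why the choice only matters when vectors are not normalised: dot product then rewards larger magnitudes, while cosine ignores them.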