What is a Vector Index?
A vector index is a data structure that organises high-dimensional vectors to enable fast approximate nearest neighbour (ANN) search. Instead of comparing a query vector to every stored vector, the index groups or graphs vectors in ways that allow search to skip irrelevant regions of the vector space — reducing query time from linear to sub-linear. HNSW and IVF are the two most widely used vector index types in production systems.
Why do vector indexes matter?
Without an index, searching for the nearest vector among 10 million 768-dimensional vectors requires computing 10 million dot products per query — too slow for real-time search. A vector index pre-organises the vectors so that search can focus on the most likely relevant region of the space, typically reducing comparisons by 99%+ while finding 95–99% of the true nearest neighbours.
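To make the cost of the no-index baseline concrete, here is a minimal sketch of exact (flat) search in plain Python. The sizes are toy values chosen so it runs instantly, and the helper names `dot`, `normalise`, and `flat_search` are illustrative, not part of any library.

```python
import math
import random

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def normalise(v):
    """Scale a vector to unit length."""
    length = math.sqrt(dot(v, v))
    return [x / length for x in v]

def flat_search(query, vectors, k):
    """Exact search: score every stored vector, keep the k highest."""
    scored = [(dot(query, v), i) for i, v in enumerate(vectors)]  # n dot products
    scored.sort(reverse=True)
    return [i for _, i in scored[:k]]

random.seed(0)
dim, n = 8, 1000  # toy sizes; the example in the text is 768 dims and 10M vectors
vectors = [normalise([random.gauss(0, 1) for _ in range(dim)]) for _ in range(n)]
query = vectors[42]  # query with a stored vector so the top hit is known

top = flat_search(query, vectors, k=3)
```

Every query touches all `n` vectors, so the work grows linearly with corpus size. That linear scan is what an index is meant to avoid.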
Choosing and configuring the right index directly affects:
- Query latency: how fast each search request completes
- Recall: the fraction of the true nearest neighbours that the search actually returns
- Memory usage: how much RAM the index requires
- Index build time: how long it takes to build the index and add new vectors
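Recall is the easiest of these four to measure offline: compare the ids an approximate index returns against the ids from an exact scan of the same data. The sketch below assumes both result lists are already available; the id values are made up for illustration.

```python
def recall_at_k(approx_ids, exact_ids):
    """Fraction of the exact top-k that the approximate search also returned."""
    exact = set(exact_ids)
    if not exact:
        return 0.0
    return len(exact & set(approx_ids)) / len(exact)

exact_top5 = [17, 4, 92, 33, 8]    # ground truth from an exact scan (made-up ids)
approx_top5 = [17, 4, 33, 8, 51]   # what a hypothetical ANN index returned
recall = recall_at_k(approx_top5, exact_top5)  # 4 of the 5 true neighbours found
```

Averaging this value over a sample of real queries gives a recall figure that can be tracked while tuning index parameters.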
HNSW: the dominant production index
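HNSW (Hierarchical Navigable Small World) builds a multi-layer proximity graph. Each vector is a node, and edges connect vectors that are close to one another. Every node appears in the bottom layer, and progressively fewer nodes appear in each layer above it. Search navigates from a sparse top layer down to a dense bottom layer: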
    Layer 2: sparse graph (few nodes, long-range connections)
    Layer 1: medium graph (more nodes, medium connections)
    Layer 0: dense graph (all nodes, short-range connections)

    Search: enter at layer 2, greedily navigate to the nearest node, drop to layer 1, repeat, drop to layer 0, find the k nearest neighbours

When to use HNSW:
- Production semantic search with real-time inserts
- Corpus size up to ~100M vectors
- When recall > 95% is required
- When you need to add new vectors without rebuilding the index
Key parameters:
- M: number of bidirectional edges per node (16–32 typical). Higher = better recall, more memory.
- ef_construction: beam width during index build (128–200 typical). Higher = better index quality, slower build.
- ef: beam width during search. Higher = better recall, slower query.
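The layered search described in this section can be sketched in a few dozen lines. This is a simplified illustration, not a full HNSW implementation: the three-layer graph is written out by hand over eight points on a line, whereas a real index assigns layers randomly and wires edges using M and ef_construction. All function and variable names are hypothetical.

```python
import heapq

def dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def greedy_step(query, entry, neighbours, points):
    """Upper layers: hop to the closest neighbour until no neighbour is closer."""
    current = entry
    while True:
        best = min(neighbours[current],
                   key=lambda n: dist(query, points[n]), default=current)
        if dist(query, points[best]) >= dist(query, points[current]):
            return current
        current = best

def beam_search(query, entry, neighbours, points, ef, k):
    """Layer 0: explore the graph while keeping the ef closest nodes seen."""
    visited = {entry}
    d0 = dist(query, points[entry])
    candidates = [(d0, entry)]   # min-heap: closest unexpanded node first
    best = [(-d0, entry)]        # max-heap via negation: current ef best
    while candidates:
        d, node = heapq.heappop(candidates)
        if len(best) >= ef and d > -best[0][0]:
            break                # no remaining candidate can improve the result
        for n in neighbours[node]:
            if n in visited:
                continue
            visited.add(n)
            dn = dist(query, points[n])
            if len(best) < ef or dn < -best[0][0]:
                heapq.heappush(candidates, (dn, n))
                heapq.heappush(best, (-dn, n))
                if len(best) > ef:
                    heapq.heappop(best)   # drop the farthest of the ef + 1
    ranked = sorted((-neg_d, n) for neg_d, n in best)
    return [n for _, n in ranked][:k]

def hnsw_search(query, layers, points, entry, ef, k):
    """Greedy descent through the upper layers, then beam search on layer 0."""
    node = entry
    for graph in layers[:0:-1]:  # top layer down to layer 1
        node = greedy_step(query, node, graph, points)
    return beam_search(query, node, layers[0], points, ef, k)

# Hand-built toy index: points 0..7 on a line.
points = {i: (float(i),) for i in range(8)}
layer0 = {i: [j for j in (i - 1, i + 1) if 0 <= j < 8] for i in range(8)}
layer1 = {0: [2], 2: [0, 4], 4: [2, 6], 6: [4]}
layer2 = {0: [4], 4: [0]}
layers = [layer0, layer1, layer2]

result = hnsw_search((5.2,), layers, points, entry=0, ef=3, k=2)
```

For the query at 5.2, the search enters at node 0, jumps to node 4 on the top layer, refines to node 6 on layer 1, and the layer-0 beam then finds nodes 5 and 6 as the two nearest. Raising `ef` widens the beam, which is why it trades query speed for recall.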
IVF: cluster-based indexing
IVF (Inverted File Index) partitions vectors into clusters using k-means, then at query time only searches the nearest clusters:
    Build: k-means clustering → nlist centroids + inverted lists
    Query: find nprobe nearest centroids → search their inverted lists

When to use IVF:
- Very large corpora (100M–1B+ vectors)
- Batch indexing workflows (corpus changes infrequently)
- When memory is constrained (IVF uses less memory than HNSW)
Key parameters:
- nlist: number of clusters. Recommended: sqrt(n) to 4×sqrt(n).
- nprobe: clusters searched per query. Higher = better recall, slower.
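The build and query steps, and the effect of nprobe on recall, can be illustrated with a small sketch. To keep it deterministic, the centroids below are hard-coded stand-ins for the output of k-means rather than the result of actually running it, and all names are hypothetical.

```python
import math

def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nlist_range(n):
    """Rule of thumb from the text: between sqrt(n) and 4*sqrt(n) clusters."""
    root = math.isqrt(n)
    return root, 4 * root

def build_ivf(vectors, centroids):
    """Put each vector id on the inverted list of its nearest centroid."""
    lists = {c: [] for c in range(len(centroids))}
    for i, v in enumerate(vectors):
        nearest = min(range(len(centroids)),
                      key=lambda c: sq_dist(v, centroids[c]))
        lists[nearest].append(i)
    return lists

def ivf_search(query, vectors, centroids, lists, nprobe, k):
    """Scan only the inverted lists of the nprobe closest centroids."""
    probe = sorted(range(len(centroids)),
                   key=lambda c: sq_dist(query, centroids[c]))[:nprobe]
    candidates = [i for c in probe for i in lists[c]]
    candidates.sort(key=lambda i: sq_dist(query, vectors[i]))
    return candidates[:k]

# Assumed centroids (stand-ins for a k-means result with nlist = 3).
centroids = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
vectors = [(1.0, 0.0), (0.0, 1.0), (4.9, 0.0), (5.4, 0.0),
           (9.0, 1.0), (0.0, 9.0), (1.0, 11.0)]
lists = build_ivf(vectors, centroids)

query = (5.1, 0.0)
narrow = ivf_search(query, vectors, centroids, lists, nprobe=1, k=2)
wider = ivf_search(query, vectors, centroids, lists, nprobe=2, k=2)
```

The query sits near a cluster boundary: its true nearest neighbour (id 2) belongs to a different cluster than the closest centroid. With `nprobe=1` the search misses it, and with `nprobe=2` it is found, which is the recall versus speed trade-off in miniature.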
Product Quantisation (PQ): memory compression
PQ compresses vectors by dividing them into sub-vectors and replacing each with a codebook index. A 768-dimensional float32 vector (3KB) can be compressed to 96 bytes — a 32× reduction.
Often combined with IVF as IVF-PQ for billion-scale retrieval where memory is the primary constraint.
Trade-off: significant accuracy loss compared to full-precision HNSW. Use only when memory constraints make uncompressed indexing infeasible.
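The compression figure and the encoding step can be checked with a short sketch. The 96-byte size is reproduced here under one assumed configuration (96 sub-vectors, each stored as a 1-byte code into a 256-entry codebook); the text does not specify the configuration. The tiny codebooks are hand-written for illustration, where a real system would learn them with k-means.

```python
def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def pq_sizes(dim, n_subvectors, bytes_per_float=4, bytes_per_code=1):
    """Raw size, compressed size, and compression ratio for one vector."""
    raw = dim * bytes_per_float
    compressed = n_subvectors * bytes_per_code
    return raw, compressed, raw / compressed

def pq_encode(vector, codebooks):
    """Replace each sub-vector with the index of its nearest codebook entry."""
    sub_len = len(vector) // len(codebooks)
    codes = []
    for s, codebook in enumerate(codebooks):
        sub = vector[s * sub_len:(s + 1) * sub_len]
        codes.append(min(range(len(codebook)),
                         key=lambda j: sq_dist(sub, codebook[j])))
    return codes

def pq_decode(codes, codebooks):
    """Approximate reconstruction: concatenate the chosen codebook entries."""
    out = []
    for code, codebook in zip(codes, codebooks):
        out.extend(codebook[code])
    return out

# Assumed configuration: 96 sub-vectors of 8 dims, one byte per code.
raw, compressed, ratio = pq_sizes(dim=768, n_subvectors=96)

# Toy example: a 4-dim vector split into 2 sub-vectors, 2 entries per codebook.
codebooks = [
    [(0.0, 0.0), (1.0, 1.0)],
    [(0.0, 1.0), (1.0, 0.0)],
]
codes = pq_encode([0.9, 1.2, 0.8, 0.1], codebooks)
approx = pq_decode(codes, codebooks)
```

The decoded vector differs from the original, and that reconstruction error is the source of the accuracy loss mentioned above.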
Index types by use case
| Scenario | Recommended index | Why |
|---|---|---|
| General production search | HNSW | Best recall-latency balance, incremental inserts |
| Real-time inserts | HNSW | Supports incremental updates |
| 100M–1B vectors | IVF-HNSW or IVF-PQ | HNSW memory too high at this scale |
| Memory-constrained | IVF-PQ | 32× compression |
| Development / small corpus | Flat (exact) | No approximation needed under ~100K vectors |
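The table can also be read as a small decision function. The sketch below is a hypothetical helper, not part of SIE or any vector database. The thresholds come from the approximate figures in this article, and the order in which the rules are checked is an assumption.

```python
def recommend_index(n_vectors, memory_constrained=False):
    """Map corpus size and memory pressure to an index type from the table."""
    if n_vectors < 100_000:
        return "Flat"                 # exact search is fine for a small corpus
    if memory_constrained:
        return "IVF-PQ"               # compression is the priority
    if n_vectors >= 100_000_000:
        return "IVF-HNSW or IVF-PQ"   # plain HNSW memory is too high here
    return "HNSW"                     # general production default

choice = recommend_index(5_000_000)
```

Treat the output as a starting point; the right choice still depends on measured recall and latency for your own data.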
How vector indexes are managed in SIE pipelines
SIE produces the vectors; your vector database manages the index. When setting up a new collection:
    from qdrant_client import QdrantClient
    from qdrant_client.models import Distance, HnswConfigDiff, VectorParams

    qdrant = QdrantClient("http://localhost:6333")

    # Create collection with HNSW config
    qdrant.create_collection(
        collection_name="documents",
        vectors_config=VectorParams(
            size=1024,  # BGE-M3 output dimension
            distance=Distance.COSINE,
            # m: edges per node; ef_construct: beam width during index build
            hnsw_config=HnswConfigDiff(m=16, ef_construct=128),
        ),
    )

At query time, set ef (the search beam width, exposed in Qdrant as the hnsw_ef search parameter) to balance recall against latency for your specific requirements.
Frequently asked questions
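Do I need to rebuild the index when adding new vectors?
With HNSW, no: new vectors are inserted into the graph incrementally. With IVF, new vectors can be appended to the inverted list of their nearest existing centroid, but the centroids are fixed at build time, so the index is typically rebuilt periodically as the data drifts. Some systems instead hold new vectors in a flat fallback index until the next rebuild. This is why HNSW dominates production systems with frequent updates.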
What happens to index performance as the corpus grows?
HNSW query time grows slowly (O(log n)) as corpus size increases. IVF query time is more predictable (searching a fixed number of clusters) but recall may degrade if nlist isn’t scaled with corpus size.
How do I choose between cosine similarity and dot product distance?
Cosine similarity is equivalent to dot product when vectors are L2-normalised. SIE normalises output vectors by default, so the two produce the same ranking. Cosine is the safer choice if vector magnitudes may vary.
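A short numeric check shows both halves of this answer: on unnormalised vectors the two metrics can rank results differently, and after L2 normalisation the dot product equals the cosine similarity. The vectors are made up for illustration.

```python
import math

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def l2_normalise(v):
    """Scale a vector to unit length."""
    length = math.sqrt(dot(v, v))
    return [x / length for x in v]

def cosine(a, b):
    """Cosine similarity: dot product divided by both vector lengths."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

query = [1.0, 0.0]
short_aligned = [1.0, 0.0]   # same direction as the query, length 1
long_offaxis = [3.0, 3.0]    # 45 degrees off, but much longer

# Raw dot product rewards magnitude, so the longer vector wins.
dot_prefers_long = dot(query, long_offaxis) > dot(query, short_aligned)

# Cosine ignores magnitude, so the aligned vector wins.
cos_prefers_short = cosine(query, short_aligned) > cosine(query, long_offaxis)

# After normalisation, dot product and cosine agree.
normalised_dot = dot(l2_normalise(query), l2_normalise(long_offaxis))
```

With normalised embeddings either metric is fine. The difference only matters when unnormalised vectors enter the collection.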