What is Multi-Vector Search?
Multi-vector search is a retrieval technique where each document is represented by multiple vectors (one per token or passage) rather than a single fixed-size vector. At query time, the query’s token vectors are compared against all document token vectors, enabling fine-grained token-level matching that captures nuanced relevance signals that single-vector retrieval misses.
Why does multi-vector search matter?
Single-vector retrieval compresses an entire document into one vector, losing fine-grained detail in the process. A query about a specific clause in a legal contract, or a precise technical term in a research paper, may not match well against a document-level summary vector, even if the exact answer is present in the document.
Multi-vector search addresses this by preserving token-level representations. Because matching happens at the token level, a specific query term can find its closest counterpart in a long document, even if the document as a whole is only partially relevant.
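To see why pooling loses detail, here is a toy illustration with invented 3-d "token embeddings". Mean pooling stands in for whatever pooling a real single-vector model uses (many use a CLS token instead), so treat this as a sketch of the effect, not a model of any specific encoder. One document token matches the query term exactly; the rest are about something else.

```python
import math

def normalize(v):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def mean_pool(vectors):
    """Average token vectors into one document vector, then renormalize."""
    n = len(vectors)
    return normalize([sum(column) / n for column in zip(*vectors)])

# Toy 3-d token embeddings, invented for illustration only.
# Axis 0 stands in for the specific term the query asks about.
query_token = normalize([1.0, 0.0, 0.0])
doc_tokens = [normalize(v) for v in (
    [1.0, 0.0, 0.0],  # the single token that matches the query term
    [0.0, 1.0, 0.0],  # the remaining tokens are about something else
    [0.0, 1.0, 0.2],
    [0.0, 0.9, 0.4],
    [0.0, 0.2, 1.0],
)]

# Single-vector view: compare against the pooled document vector.
pooled_score = dot(query_token, mean_pool(doc_tokens))
# Token-level view: compare against the best-matching document token.
token_level_score = max(dot(query_token, d) for d in doc_tokens)

print(f"pooled: {pooled_score:.2f}, best token: {token_level_score:.2f}")
```

The matching token is diluted by the four unrelated ones in the pooled vector, while the token-level comparison still finds a perfect match.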
How does multi-vector search work?
Instead of pooling token representations into one vector:
- Encode document → retain one vector per token: $[d_1, d_2, \dots, d_n]$
- Encode query → retain one vector per token: $[q_1, q_2, \dots, q_m]$
- Score with MaxSim → for each query token, find its maximum similarity across all document tokens, then sum:

$$\text{Score}(Q, D) = \sum_{i=1}^{m} \max_{1 \le j \le n} \; q_i \cdot d_j$$

This is the ColBERT scoring mechanism. Every query token is matched to its most similar document token, and these per-token maxima are summed into a final relevance score. ColBERT L2-normalizes its token vectors, so each dot product is a cosine similarity.
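The MaxSim formula translates almost line for line into code. Below is a minimal, dependency-free sketch using toy 2-d vectors (real ColBERT token vectors are 128-d). `maxsim_score` is a hypothetical helper name; besides the score, it returns which document token each query token matched.

```python
from typing import List, Tuple

Vector = List[float]

def l2_normalize(v: Vector) -> Vector:
    norm = sum(x * x for x in v) ** 0.5
    return [x / norm for x in v]

def maxsim_score(query: List[Vector], doc: List[Vector]) -> Tuple[float, List[int]]:
    """Score(Q, D) = sum_i max_j (q_i . d_j).

    Also returns, for each query token i, the index j of the document
    token it matched (the argmax over document tokens).
    """
    total = 0.0
    alignment = []
    for q in query:
        sims = [sum(a * b for a, b in zip(q, d)) for d in doc]
        best_j = max(range(len(doc)), key=sims.__getitem__)
        alignment.append(best_j)
        total += sims[best_j]
    return total, alignment

# Toy token vectors, made up for illustration.
query_tokens = [l2_normalize(v) for v in ([1.0, 0.0], [0.0, 1.0])]
doc_tokens = [l2_normalize(v) for v in ([0.6, 0.8], [1.0, 0.0])]

score, alignment = maxsim_score(query_tokens, doc_tokens)
print(score, alignment)
```

The first query token aligns with document token 1 (similarity 1.0), the second with document token 0 (similarity 0.8), for a total of 1.8. Note that a document token can be reused by several query tokens; MaxSim does not enforce a one-to-one alignment.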
Multi-vector vs single-vector vs sparse retrieval
| | Single-vector | Multi-vector (ColBERT) | Sparse (BM25) |
|---|---|---|---|
| Vectors per doc | 1 | N (one per token) | 1 sparse, vocabulary-sized |
| Captures semantics | ✓ | ✓ (token-level) | ✗ |
| Handles exact terms | Weak | ✓ | ✓ |
| Storage cost | Low | High | Medium |
| Retrieval speed | Fastest | Slower | Fast |
| Accuracy | Good | Highest | Good for keywords |
Multi-vector retrieval typically achieves the highest accuracy of the three, but at a significant storage cost: a 512-token document produces 512 vectors instead of 1.
What is BGE-M3’s multi-vector capability?
BGE-M3 supports all three retrieval modes (dense, sparse, and multi-vector) from a single model. This means you can produce ColBERT-style multi-vector representations without running a separate ColBERT model:
```python
from sie_sdk import SIEClient
from sie_sdk.types import Item

client = SIEClient("http://localhost:8080")

# Example corpus; replace with your own documents
documents = [
    "The lessee shall indemnify the lessor against all third-party claims.",
    "Multi-vector retrieval stores one embedding per token.",
]

# Encode with dense, sparse, and multi-vector (ColBERT-style) outputs in one call
results = client.encode(
    "BAAI/bge-m3",
    [Item(text=d) for d in documents],
    output_types=["dense", "sparse", "multivector"],
)

dense_vectors = [r["dense"] for r in results]
sparse_vectors = [r["sparse"] for r in results]
colbert_vectors = [r["multivector"] for r in results]  # one [num_tokens, dim] array per document
```

You can then combine all three signals for the best retrieval accuracy, which is the approach behind BGE-M3’s MIRACL and BEIR benchmark results.
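One way to combine the three signals is a weighted sum of the per-mode scores, which is how the BGE-M3 paper describes its hybrid mode. The sketch below assumes dense vectors are already L2-normalized, sparse outputs are dicts mapping term to weight, and multi-vector outputs are lists of token vectors; the actual formats returned by SIE may differ. The 0.4/0.2/0.4 weights are illustrative only.

```python
from typing import Dict, List

def dense_score(q: List[float], d: List[float]) -> float:
    # Assumes both vectors are already L2-normalized, so this is cosine similarity.
    return sum(a * b for a, b in zip(q, d))

def sparse_score(q: Dict[str, float], d: Dict[str, float]) -> float:
    # Lexical match: sum of weight products over terms that appear on both sides.
    return sum(w * d[t] for t, w in q.items() if t in d)

def multivector_score(q: List[List[float]], d: List[List[float]]) -> float:
    # MaxSim per query token, averaged (not summed) so the score stays on a
    # similar scale to the dense score regardless of query length.
    best = [max(sum(a * b for a, b in zip(qt, dt)) for dt in d) for qt in q]
    return sum(best) / len(best)

def hybrid_score(q: dict, d: dict, weights=(0.4, 0.2, 0.4)) -> float:
    # Hypothetical weights for (dense, sparse, multivector); tune them on your own data.
    w_dense, w_sparse, w_multi = weights
    return (
        w_dense * dense_score(q["dense"], d["dense"])
        + w_sparse * sparse_score(q["sparse"], d["sparse"])
        + w_multi * multivector_score(q["multivector"], d["multivector"])
    )

# Tiny hand-made representations shaped like the assumed outputs.
query = {
    "dense": [1.0, 0.0],
    "sparse": {"colbert": 0.5, "retrieval": 0.3},
    "multivector": [[1.0, 0.0], [0.0, 1.0]],
}
doc = {
    "dense": [0.8, 0.6],
    "sparse": {"colbert": 0.4},
    "multivector": [[0.6, 0.8], [1.0, 0.0]],
}
print(round(hybrid_score(query, doc), 2))
```

Sparse scores are not bounded the way cosine similarities are, so in practice the weights also compensate for differences in scale between the three signals.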
When should you use multi-vector search?
Multi-vector retrieval is worth the extra storage and compute when:
- High-precision retrieval is critical: legal, medical, or compliance document search where missing a relevant clause has real consequences
- Long documents: single vectors compress too much information out of long texts; token-level matching preserves it
- Specific term lookup: when queries contain precise technical terms that need exact matching alongside semantic understanding
- You’re combining with reranking: use multi-vector for first-stage retrieval to maximise recall, then a reranker for precision
For general-purpose search, single-vector retrieval plus a reranker often achieves comparable quality at lower infrastructure cost.
Storage considerations for multi-vector
A 512-token document produces 512 vectors of 128 dimensions each (ColBERT uses smaller per-token dimensions than typical single-vector embeddings). For 1 million documents:
- Single-vector (768 dims, float32): ~3GB
- Multi-vector ColBERT (512 tokens × 128 dims, float32): ~262GB
This is why multi-vector is used selectively, often for a high-value subset of your corpus, with single-vector covering the rest.
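The figures above are raw float32 embedding sizes that assume every document is a full 512 tokens (an upper bound, since shorter documents store fewer vectors) and ignore index overhead and compression. A small sketch reproduces them and estimates a tiered layout; the 10% high-value subset is a hypothetical example.

```python
def storage_bytes(num_docs: int, vectors_per_doc: int, dims: int, bytes_per_value: int = 4) -> int:
    """Raw embedding storage in bytes, ignoring index overhead and compression."""
    return num_docs * vectors_per_doc * dims * bytes_per_value

N = 1_000_000
single = storage_bytes(N, 1, 768)   # one 768-d float32 vector per document
multi = storage_bytes(N, 512, 128)  # 512 tokens x 128-d float32 per document

# Tiered layout: single-vector for every document, plus multi-vector
# for a hypothetical 10% high-value subset.
subset = N // 10
tiered = storage_bytes(N, 1, 768) + storage_bytes(subset, 512, 128)

print(f"single-vector: {single / 1e9:.1f} GB")
print(f"multi-vector:  {multi / 1e9:.1f} GB")
print(f"tiered:        {tiered / 1e9:.1f} GB")
```

Under these assumptions the tiered layout stays roughly an order of magnitude below full multi-vector coverage, while still giving token-level matching where it matters most.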
Qdrant and Weaviate both support multi-vector indexing natively.
Frequently asked questions
Is multi-vector search the same as ColBERT?
ColBERT is the most prominent multi-vector retrieval architecture. Multi-vector search is the broader category; ColBERT is one implementation of it, using late interaction (MaxSim scoring).
Can I use multi-vector retrieval with any vector database?
Not all vector databases support multi-vector natively. Qdrant supports it through its multivector configuration with MaxSim comparison, and Weaviate supports ColBERT-style multi-vector embeddings. Check your vector database’s documentation before committing to a multi-vector approach.
Does SIE support multi-vector encoding?
Yes. BGE-M3 on SIE can return ColBERT-style token vectors alongside dense and sparse representations in a single encode call.