---
title: What is Multi-Vector Search?
description: "Multi-vector search is a retrieval technique where each document is represented by multiple vectors (one per token or passage) rather than a single fixed-size vector. At query time, the query's token vectors are compared against all document token vectors, enabling fine-grained token-level matching that captures n..."
canonical_url: https://superlinked.com/glossary/what-is-multi-vector-search
last_updated: 2026-06-09
---

# What is Multi-Vector Search?

Multi-vector search is a retrieval technique where each document is represented by multiple vectors (one per token or passage) rather than a single fixed-size vector. At query time, the query's token vectors are compared against all document token vectors, enabling fine-grained token-level matching that captures nuanced relevance signals that single-vector retrieval misses.

---

## Why does multi-vector search matter?

Single-vector retrieval compresses an entire document into one vector, losing fine-grained detail in the process. A query about a specific clause in a legal contract, or a precise technical term in a research paper, may not match well against a document-level summary vector, even if the exact answer is present in the document.

Multi-vector search solves this by preserving token-level representations. The matching happens at the token level, so a specific query term can find its exact counterpart in a long document, even if the overall document is only partially relevant.

---

## How does multi-vector search work?

Instead of pooling token representations into one vector, a multi-vector model keeps every token vector and compares them at query time:

1. **Encode document** → retain one vector per token: `[d₁, d₂, ..., dₙ]`
2. **Encode query** → retain one vector per token: `[q₁, q₂, ..., qₘ]`
3. **Score with MaxSim** → for each query token, find its maximum similarity across all document tokens, then sum:

```
Score(Q, D) = Σᵢ max_j (qᵢ · dⱼ)
```

This is the **ColBERT** scoring mechanism, also known as late interaction. Each query token is matched to its most similar document token, and these per-token maxima are summed into a final relevance score.
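
The scoring step can be sketched in a few lines of plain Python. This is a minimal illustration rather than ColBERT's actual implementation: the function names `dot` and `maxsim_score` and the toy 2-dimensional vectors are made up for the example, and production systems typically batch the same computation as a matrix product.

```python
from typing import Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_vecs: Sequence[Vector], doc_vecs: Sequence[Vector]) -> float:
    """For each query token, take its best dot product over all document
    tokens, then sum those maxima (the MaxSim formula above)."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


# Toy token vectors, invented for illustration only.
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # has a strong match for both query tokens
doc_b = [[1.0, 0.0], [0.5, 0.0]]              # matches only the first query token

score_a = maxsim_score(query, doc_a)
score_b = maxsim_score(query, doc_b)
print(score_a, score_b)
```

Here `doc_a` scores 2.0 because both query tokens find an exact counterpart, while `doc_b` scores 1.0 because the second query token has no matching document token.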

---

## Multi-vector vs single-vector vs sparse retrieval

| | Single-vector | Multi-vector (ColBERT) | Sparse (BM25) |
|---|---|---|---|
| Vectors per doc | 1 | N (one per token) | Vocab-size sparse |
| Captures semantics | ✓ | ✓ (token-level) | ✗ |
| Handles exact terms | ✗ | ✓ | ✓ |
| Storage cost | Low | High | Medium |
| Retrieval speed | Fastest | Slower | Fast |
| Accuracy | Good | Highest | Good for keywords |

Multi-vector retrieval typically achieves the highest accuracy of the three, but at significant storage cost: a 512-token document produces 512 vectors instead of 1.

---

## What is BGE-M3's multi-vector capability?

BGE-M3 supports all three retrieval modes (dense, sparse, and multi-vector) from a single model. This means you can produce ColBERT-style multi-vector representations without a separate model:

```python
from sie_sdk import SIEClient
from sie_sdk.types import Item

client = SIEClient("http://localhost:8080")

# Example corpus; replace with your own documents
documents = [
    "Multi-vector search keeps one vector per token.",
    "Single-vector search pools a document into one embedding.",
]

# Request dense, sparse, and multi-vector (ColBERT-style) outputs in one call
results = client.encode(
    "BAAI/bge-m3",
    [Item(text=d) for d in documents],
    output_types=["dense", "sparse", "multivector"],
)

dense_vectors = [r["dense"] for r in results]
sparse_vectors = [r["sparse"] for r in results]
colbert_vectors = [r["multivector"] for r in results]  # one [num_tokens, dim] array per doc
```

You can then combine all three signals into a single hybrid score, the approach used in BGE-M3's MIRACL and BEIR benchmark results.
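
One simple way to combine the three signals is a weighted sum of the per-document scores. The sketch below assumes you have already computed a dense, sparse, and multi-vector score for each candidate and that the three are on comparable scales. The weights, document IDs, and score values are hypothetical and are not BGE-M3's published configuration.

```python
from typing import Dict, List, Tuple

Scores = Tuple[float, float, float]  # (dense, sparse, multi-vector)

# Hypothetical weights; tune them on your own evaluation data.
WEIGHTS: Scores = (0.4, 0.2, 0.4)


def hybrid_score(scores: Scores, weights: Scores = WEIGHTS) -> float:
    """Weighted sum of the dense, sparse, and multi-vector scores."""
    return sum(w * s for w, s in zip(weights, scores))


def rank(candidates: Dict[str, Scores], weights: Scores = WEIGHTS) -> List[str]:
    """Return document IDs ordered from highest to lowest hybrid score."""
    return sorted(
        candidates,
        key=lambda doc_id: hybrid_score(candidates[doc_id], weights),
        reverse=True,
    )


# Made-up scores for three candidate documents.
candidates = {
    "doc_a": (0.82, 0.10, 0.75),
    "doc_b": (0.78, 0.40, 0.90),
    "doc_c": (0.60, 0.05, 0.55),
}

ranked = rank(candidates)
print(ranked)
```

With these made-up numbers, `doc_b` ranks first even though `doc_a` has the higher dense score, because its sparse and multi-vector scores are stronger.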

---

## When should you use multi-vector search?

Multi-vector retrieval is worth the extra storage and compute when:

- **High-precision retrieval is critical**: legal, medical, or compliance document search where missing a relevant clause has real consequences
- **Long documents**: single vectors compress too much information out of long texts; token-level matching preserves it
- **Specific term lookup**: when queries contain precise technical terms that need exact matching alongside semantic understanding
- **You're combining with reranking**: use multi-vector for first-stage retrieval to maximise recall, then a reranker for precision

For most general-purpose search, single-vector with a reranker achieves comparable quality at lower infrastructure cost.

---

## Storage considerations for multi-vector

A 512-token document produces 512 vectors of 128 dimensions each (ColBERT uses smaller per-token dimensions than typical single-vector models). For 1 million documents of 512 tokens each, stored as uncompressed float32:

- Single-vector (768 dims, float32): ~3GB
- Multi-vector ColBERT (512 tokens × 128 dims, float32): ~262GB

This is why multi-vector is used selectively, often for a high-value subset of your corpus, with single-vector covering the rest.
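
The storage estimates follow from simple arithmetic, sketched below. The helper name `index_size_gb` is made up for this example, and the calculation assumes uncompressed float32 values (4 bytes each), decimal gigabytes (10^9 bytes), and no index overhead.

```python
def index_size_gb(
    num_docs: int,
    vectors_per_doc: int,
    dims: int,
    bytes_per_value: int = 4,  # float32
) -> float:
    """Raw vector storage in decimal gigabytes, ignoring index overhead."""
    total_bytes = num_docs * vectors_per_doc * dims * bytes_per_value
    return total_bytes / 1e9


num_docs = 1_000_000

single_vector_gb = index_size_gb(num_docs, vectors_per_doc=1, dims=768)
multi_vector_gb = index_size_gb(num_docs, vectors_per_doc=512, dims=128)

print(f"single-vector: {single_vector_gb:.1f} GB")
print(f"multi-vector:  {multi_vector_gb:.1f} GB")
print(f"ratio:         {multi_vector_gb / single_vector_gb:.0f}x")
```

Under these assumptions each 512-token document needs 256 KiB of token vectors, which comes to about 262 GB across the corpus, roughly 85 times the single-vector footprint. Shorter documents, lower-precision storage, or vector compression would all reduce that figure.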

Qdrant and Weaviate both support multi-vector indexing natively.

---

## Frequently asked questions

**Is multi-vector search the same as ColBERT?**
ColBERT is the most prominent multi-vector retrieval architecture. Multi-vector search is the broader category; ColBERT is one implementation using late interaction (MaxSim scoring).

**Can I use multi-vector retrieval with any vector database?**
Not all vector databases support multi-vector natively. Qdrant supports it through multivector fields with a MaxSim comparator, and Weaviate supports multi-vector embeddings such as ColBERT. Check your vector DB's documentation before committing to a multi-vector approach.

**Does SIE support multi-vector encoding?**
Yes. BGE-M3 on SIE can return ColBERT-style token vectors alongside dense and sparse representations in a single encode call.

---

## Related resources

- [What is ColBERT?](/glossary/what-is-colbert)
- [What is BGE-M3?](/glossary/what-is-bge-m3)
- [What is hybrid search?](/glossary/what-is-hybrid-search)
- [What is semantic search?](/glossary/what-is-semantic-search)
- [Browse models on SIE](/models)
- [SIE + Qdrant integration](/docs/integrations/qdrant)
