---
title: What is a Vector Database?
description: A vector database is a database purpose-built for storing, indexing, and querying high-dimensional numerical vectors. Unlike traditional databases that query by exact value or keyword, vector databases find the nearest vectors to a query vector using approximate nearest neighbour (ANN) algorithms — enabling semantic...
canonical_url: https://superlinked.com/glossary/what-is-a-vector-database
last_updated: 2026-06-02
---

# What is a Vector Database?

A vector database is a database purpose-built for storing, indexing, and querying high-dimensional numerical vectors. Unlike traditional databases that query by exact value or keyword, vector databases find the nearest vectors to a query vector using approximate nearest neighbour (ANN) algorithms — enabling semantic similarity search at scale. They are the storage and retrieval layer in semantic search and RAG systems.

---

## Why do vector databases exist?

Standard databases (PostgreSQL, MongoDB, Elasticsearch) were not originally designed for similarity search over millions of high-dimensional vectors. Exact nearest neighbour search over, say, 768-dimensional vectors becomes prohibitively expensive at scale. It compares every query vector against every stored vector, so the cost of each query grows linearly with the size of the corpus.
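To see why exact search gets expensive, here is a brute-force k-nearest-neighbour search in plain Python. It is a minimal sketch: the function names are illustrative, and cosine similarity is assumed as the metric. Every query is scored against every stored vector. With 10 million 768-dimensional vectors, a single query would need roughly 7.68 billion multiply-adds for the dot products alone.

```python
import heapq
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def exact_knn(query: list[float], corpus: list[list[float]], k: int) -> list[tuple[int, float]]:
    """Brute-force k nearest neighbours: scores every stored vector (O(N * d) per query)."""
    scored = ((i, cosine_similarity(query, v)) for i, v in enumerate(corpus))
    # Keep the k (index, similarity) pairs with the highest similarity.
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


corpus = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7], [-1.0, 0.0]]
print(exact_knn([1.0, 0.1], corpus, k=2))
```

ANN indexes avoid this full scan by examining only a small, promising subset of the stored vectors.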

Vector databases solve this with specialised ANN index structures (HNSW, IVF) and compression techniques such as product quantization (PQ). These trade a small amount of recall for orders-of-magnitude faster search. They also provide:

- **Filtered search** — combine vector similarity with metadata filters (e.g. "find similar documents from the last 30 days")
- **Hybrid search** — combine dense vector search with sparse BM25-style search in one query
- **Scalar storage** — store the original text and metadata alongside the vectors
- **CRUD operations** — update and delete vectors as your corpus changes
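Vector databases implement hybrid search in different ways. One common method for merging the dense and sparse result lists is reciprocal rank fusion (RRF). Each document scores 1 / (k + rank) for every list it appears in, and those scores are summed. The sketch below is illustrative and does not reproduce any particular database's implementation. It assumes each retriever has already returned document IDs in rank order. It uses k = 60, the constant suggested in the original RRF paper.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Merge several ranked lists of document IDs with reciprocal rank fusion.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. Higher fused scores rank first.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


dense_hits = ["doc_a", "doc_b", "doc_c"]   # e.g. from vector (ANN) search
sparse_hits = ["doc_c", "doc_a", "doc_d"]  # e.g. from BM25-style keyword search
fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
print([doc_id for doc_id, _ in fused])
```

RRF uses only ranks, so it sidesteps the fact that cosine similarities and BM25 scores live on different scales.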

---

## How does a vector database work?

At index time:
1. Receive vectors (produced by an embedding model like BGE-M3 via SIE)
2. Build an ANN index over the vectors (HNSW is most common)
3. Store vectors alongside the original text and metadata

At query time:
1. Receive a query vector (encoded by the same embedding model)
2. Traverse the ANN index to find the approximate k nearest vectors
3. Return the matching text, metadata, and similarity scores

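```python
from qdrant_client import QdrantClient, models

qdrant = QdrantClient(url="http://localhost:6333")

# Indexing: ids, vectors (lists of floats), chunks and dates are parallel lists
# prepared earlier; assumes the "documents" collection already exists
qdrant.upsert(
    collection_name="documents",
    points=[
        models.PointStruct(id=doc_id, vector=vector, payload={"text": chunk, "date": date})
        for doc_id, vector, chunk, date in zip(ids, vectors, chunks, dates)
    ],
)

# Querying: top 20 nearest neighbours, restricted to documents dated 2024 or later
results = qdrant.query_points(
    collection_name="documents",
    query=query_vector,
    query_filter=models.Filter(
        must=[models.FieldCondition(key="date", range=models.DatetimeRange(gte="2024-01-01"))]
    ),
    limit=20,
).points
```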
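The following toy inverted-file (IVF) index in plain Python illustrates the "build an ANN index" and "traverse the index" steps. It is a simplified stand-in, not how Qdrant works. Qdrant uses HNSW, a graph-based index, and IVF is shown only because it fits in a few lines. At index time, each vector is bucketed by its nearest centroid. At query time, only the `n_probe` closest buckets are scanned. This compares far fewer vectors but can miss a true neighbour that sits in an unprobed bucket. The class name, the parameters, and the use of randomly sampled centroids instead of trained k-means are all illustrative assumptions.

```python
import random


def squared_l2(a: list[float], b: list[float]) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class ToyIVFIndex:
    """Illustrative IVF index: vectors are grouped into buckets by nearest centroid."""

    def __init__(self, vectors: list[list[float]], n_lists: int, seed: int = 0) -> None:
        # Index time: pick centroids (a real IVF index would train them with k-means).
        rng = random.Random(seed)
        self.centroids = rng.sample(vectors, n_lists)
        self.lists: list[list[int]] = [[] for _ in range(n_lists)]
        self.vectors = vectors
        for i, v in enumerate(vectors):
            self.lists[self._nearest_centroids(v, 1)[0]].append(i)

    def _nearest_centroids(self, v: list[float], n: int) -> list[int]:
        order = sorted(range(len(self.centroids)), key=lambda c: squared_l2(v, self.centroids[c]))
        return order[:n]

    def search(self, query: list[float], k: int, n_probe: int = 1) -> list[tuple[int, float]]:
        # Query time: scan only the n_probe buckets whose centroids are closest to the query.
        candidates = [i for c in self._nearest_centroids(query, n_probe) for i in self.lists[c]]
        scored = [(i, squared_l2(query, self.vectors[i])) for i in candidates]
        return sorted(scored, key=lambda pair: pair[1])[:k]


rng = random.Random(42)
data = [[rng.random(), rng.random()] for _ in range(1000)]
index = ToyIVFIndex(data, n_lists=16)
query = [0.5, 0.5]
approx = index.search(query, k=5, n_probe=2)   # scans about 2/16 of the vectors on average
exact = index.search(query, k=5, n_probe=16)   # probes every bucket: same as brute force
```

Raising `n_probe` trades speed for recall. This is the same accuracy-for-speed dial that production ANN indexes expose through their own parameters.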

---

## Major vector databases compared

| Database | Open source | Hybrid search | Multi-vector | Managed cloud | Best for |
|---|---|---|---|---|---|
| Qdrant | ✓ | ✓ | ✓ | ✓ | Performance, Rust-based |
| Weaviate | ✓ | ✓ | ✓ | ✓ | GraphQL API, modules |
| Chroma | ✓ | Limited | ✗ | ✗ | Simplicity, local dev |
| Pinecone | ✗ | ✓ | ✗ | ✓ (only) | Managed, easy setup |
| Milvus | ✓ | ✓ | ✓ | ✓ | Large scale, enterprise |
| pgvector | ✓ | Limited | ✗ | ✓ (via RDS) | Existing PostgreSQL users |

SIE has integration guides for Qdrant, Weaviate, and Chroma.

---

## Vector database vs traditional database + pgvector

pgvector is a PostgreSQL extension that adds vector similarity search. It's a good starting point but has limitations at scale:

| | pgvector | Purpose-built vector DB |
|---|---|---|
| Setup | Easy (existing PG) | Separate deployment |
| Scale | Millions of vectors | Hundreds of millions+ |
| ANN performance | Good (HNSW support) | Optimised, faster |
| Hybrid search | Limited | Native |
| Filtering | Full SQL | Integrated with ANN search |

For prototyping or small corpora (<1M vectors), pgvector is practical. For large production search systems, a purpose-built vector DB typically offers better performance and more search features.

---

## How does SIE work with vector databases?

SIE produces the vectors; the vector database stores and retrieves them. They're complementary:

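```python
from sie_sdk import SIEClient
from sie_sdk.types import Item
from qdrant_client import QdrantClient, models

sie = SIEClient("http://localhost:8080")
qdrant = QdrantClient(url="http://localhost:6333")

# document_chunks (list[str]) and user_query (str) are assumed to be defined earlier

# Create the collection once; BGE-M3 dense vectors have 1024 dimensions
qdrant.create_collection(
    collection_name="docs",
    vectors_config=models.VectorParams(size=1024, distance=models.Distance.COSINE),
)

# Encode documents with SIE
encode_results = sie.encode("BAAI/bge-m3", [Item(text=c) for c in document_chunks])
vectors = [r["dense"] for r in encode_results]

# Store in Qdrant, keeping the original text as payload
qdrant.upsert(
    collection_name="docs",
    points=[
        models.PointStruct(id=i, vector=v.tolist(), payload={"text": c})
        for i, (v, c) in enumerate(zip(vectors, document_chunks))
    ],
)

# Search: encode the query with the same model, flagged as a query
query_vector = sie.encode("BAAI/bge-m3", Item(text=user_query), is_query=True)["dense"]
results = qdrant.query_points(collection_name="docs", query=query_vector.tolist(), limit=10).points
```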

---

## Choosing the right vector database

Key questions to guide your decision:

- **Scale** — how many vectors now, and in 12 months?
- **Filtering needs** — do you need complex metadata filters alongside vector search?
- **Hybrid search** — do you need BM25 + vector combined?
- **Deployment** — self-hosted or managed cloud?
- **Multi-vector** — do you need ColBERT-style token-level retrieval?
- **Existing stack** — does your team already use PostgreSQL (pgvector) or Elasticsearch?

For new production deployments with SIE, Qdrant is a common choice: it is open source, performs well, and supports hybrid search and multi-vector retrieval natively.

---

## Frequently asked questions

**Is a vector database the same as a vector store?**
The terms are often used interchangeably. "Vector store" sometimes refers informally to simpler, often in-memory implementations (such as a FAISS index). "Vector database" implies a production system with persistence, CRUD operations, and rich querying capabilities.

**Can I use a vector database without an embedding model?**
You need vectors to populate it, but they can come from any embedding model: SIE, OpenAI, Cohere, or others. The vector DB is agnostic to how the vectors were generated, as long as query vectors come from the same model as the stored vectors.

**Do vector databases replace traditional search engines like Elasticsearch?**
For purely semantic search use cases, they can. But many teams use both: Elasticsearch for keyword/structured search and a vector DB for semantic search, combining results via hybrid retrieval.

---

## Related resources

- [SIE + Qdrant integration](/docs/integrations/qdrant)
- [SIE + Weaviate integration](/docs/integrations/weaviate)
- [SIE + Chroma integration](/docs/integrations/chroma)
- [What is approximate nearest neighbour search?](/glossary/what-is-approximate-nearest-neighbour-search)
- [What is semantic search?](/glossary/what-is-semantic-search)
- [What is hybrid search?](/glossary/what-is-hybrid-search)
