
How Does Weaviate Work with Embedding Models?

Weaviate is an open-source vector database that stores objects with vector representations and enables semantic search, hybrid search, and filtered retrieval via a GraphQL or REST API. It works with embedding models by accepting vectors at insert time (generated by SIE) and indexing them in HNSW graphs for fast ANN retrieval. Weaviate’s module system also enables direct integration with external vectorisers, though SIE’s self-hosted approach keeps data within your own infrastructure.


Why Weaviate?

Weaviate is a strong choice when:

  • Your team prefers GraphQL as the query interface
  • You want schema-based data modelling with typed properties
  • You need hybrid search (BM25 + vector) out of the box
  • You’re building with LangChain or LlamaIndex (Weaviate has first-class integrations)
  • You want a module ecosystem for connecting additional ML models

Weaviate’s schema approach, defining data classes with properties, makes it particularly well-suited for structured document retrieval where metadata filtering is as important as vector similarity.


How Weaviate and SIE work together

SIE encodes documents; Weaviate stores and retrieves them:

import weaviate
from sie_sdk import SIEClient
from sie_sdk.types import Item

# Assumes document_chunks, sources, dates and user_query are already defined.
# Note: both URLs use port 8080 as written; in a real deployment SIE and
# Weaviate listen on different addresses, so adjust these to match yours.
sie = SIEClient("http://localhost:8080")
w = weaviate.Client("http://localhost:8080")  # Weaviate client (v3 Python API)

# 1. Define schema
w.schema.create_class({
    "class": "Document",
    "vectorizer": "none",  # We provide vectors externally via SIE
    "properties": [
        {"name": "text", "dataType": ["text"]},
        {"name": "source", "dataType": ["text"]},
        {"name": "date", "dataType": ["date"]}  # RFC 3339 timestamps
    ]
})

# 2. Encode and insert documents
encode_results = sie.encode("BAAI/bge-m3", [Item(text=c) for c in document_chunks])
vectors = [r["dense"] for r in encode_results]
with w.batch as batch:
    for chunk, vector, source, date in zip(document_chunks, vectors, sources, dates):
        batch.add_data_object(
            data_object={"text": chunk, "source": source, "date": date},
            class_name="Document",
            vector=vector.tolist()
        )

# 3. Search: encode the query with the same model, then run a near-vector query
query_vector = sie.encode("BAAI/bge-m3", Item(text=user_query), is_query=True)["dense"]
result = (
    w.query
    .get("Document", ["text", "source", "date"])
    .with_near_vector({"vector": query_vector.tolist()})
    .with_limit(20)
    .do()
)
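
The v3 client returns the raw GraphQL response as a nested dictionary, so the matched objects have to be pulled out from under `data` → `Get` → class name. The following is a minimal sketch of that step; `extract_hits` and the `sample` response are hypothetical names and data introduced here for illustration, not part of the Weaviate or SIE APIs.

```python
from typing import Any


def extract_hits(response: dict[str, Any], class_name: str) -> list[dict[str, Any]]:
    """Return the objects from a v3-client GraphQL Get response (hypothetical helper)."""
    # GraphQL reports failures under a top-level "errors" key instead of raising.
    if "errors" in response:
        raise RuntimeError(f"Weaviate query failed: {response['errors']}")
    # An empty or missing result is normalised to an empty list.
    return response.get("data", {}).get("Get", {}).get(class_name) or []


# Made-up response, shaped like the output of the near-vector query above.
sample = {
    "data": {
        "Get": {
            "Document": [
                {"text": "HNSW builds a layered graph.", "source": "notes.md",
                 "date": "2024-03-01T00:00:00Z"},
                {"text": "BM25 ranks by term statistics.", "source": "ir.md",
                 "date": "2024-02-11T00:00:00Z"},
            ]
        }
    }
}

hits = extract_hits(sample, "Document")
for hit in hits:
    print(hit["source"], "->", hit["text"])
```

With a live query, `extract_hits(result, "Document")` would be called on the `result` returned by `.do()`.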

Hybrid search with Weaviate and SIE

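Weaviate’s hybrid search runs a BM25 keyword search and a vector search, then fuses the two result lists into a single ranking. Two fusion methods are available: ranked fusion, which is based on reciprocal ranks, and relative score fusion, which combines normalised scores. The example below uses relative score fusion: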

# Hybrid search: the BM25 component needs no separate query encoding
result = (
    w.query
    .get("Document", ["text", "source"])
    .with_hybrid(
        query=user_query,              # BM25 uses text directly
        vector=query_vector.tolist(),  # SIE-encoded vector for semantic search
        alpha=0.5,                     # 0 = pure BM25, 1 = pure vector, 0.5 = balanced
        fusion_type="relativeScoreFusion"
    )
    .with_limit(20)
    .do()
)
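
To make the `alpha` parameter concrete, here is a simplified sketch of score-based fusion: each result list is min-max normalised, then the two scores are blended with weight `alpha` on the vector side. This illustrates the idea only and is not Weaviate's internal implementation; the function names, document IDs, and scores are all made up for the example.

```python
def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Scale scores to [0, 1]; if all scores are equal, each maps to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def fuse(bm25: dict[str, float], vector: dict[str, float],
         alpha: float = 0.5) -> list[tuple[str, float]]:
    """Blend normalised scores: alpha weights the vector side, 1 - alpha the BM25 side."""
    bm25_n, vec_n = min_max(bm25), min_max(vector)
    fused = {
        doc_id: alpha * vec_n.get(doc_id, 0.0) + (1 - alpha) * bm25_n.get(doc_id, 0.0)
        for doc_id in set(bm25_n) | set(vec_n)  # a document missing from one list scores 0 there
    }
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)


# Made-up scores: higher is better on both sides.
bm25_scores = {"a": 12.0, "b": 7.0, "c": 2.0}
vector_scores = {"b": 0.92, "c": 0.90, "d": 0.60}

balanced = fuse(bm25_scores, vector_scores, alpha=0.5)
keyword_only = fuse(bm25_scores, vector_scores, alpha=0.0)
print(balanced)
```

Document `b` ranks first in the balanced case because it scores well on both sides, while `a` leads when `alpha=0.0` because only the keyword score counts.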

BGE-M3 also outputs sparse vectors. If your Weaviate version supports sparse vectors, these can be used alongside the dense vectors for hybrid retrieval.


Filtered vector search in Weaviate

Weaviate’s where filter enables metadata-filtered ANN search:

result = (
    w.query
    .get("Document", ["text", "source", "date"])
    .with_near_vector({"vector": query_vector.tolist()})
    .with_where({
        "path": ["date"],
        "operator": "GreaterThan",
        "valueDate": "2024-01-01T00:00:00Z"
    })
    .with_limit(20)
    .do()
)

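Weaviate applies the filter before the vector search: it resolves the where clause against its inverted index, which stores matching object IDs as roaring bitmaps, and passes the resulting allow-list to the HNSW search. Because only allowed objects can be returned, recall holds up even under selective filters.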
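
Filters can also be combined. The sketch below builds a date-range filter by joining two conditions with the `And` operator and an `operands` list, in the same dictionary format as the where filter above. The helper names `to_rfc3339` and `date_range_filter` are hypothetical, and treating naive datetimes as UTC is an assumption made for this example.

```python
from datetime import datetime, timezone


def to_rfc3339(dt: datetime) -> str:
    """Format a datetime as an RFC 3339 UTC timestamp such as 2024-01-01T00:00:00Z."""
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)  # assumption: naive datetimes are UTC
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def date_range_filter(start: datetime, end: datetime, path: str = "date") -> dict:
    """Build a where filter matching start <= date < end (hypothetical helper)."""
    return {
        "operator": "And",
        "operands": [
            {"path": [path], "operator": "GreaterThanEqual", "valueDate": to_rfc3339(start)},
            {"path": [path], "operator": "LessThan", "valueDate": to_rfc3339(end)},
        ],
    }


# First half of 2024, expressed as a half-open interval.
where = date_range_filter(datetime(2024, 1, 1), datetime(2024, 7, 1))
print(where)
```

The resulting dictionary can be passed to `.with_where(where)` in place of the single-condition filter.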


Weaviate Python client v4

Version 4 of the Weaviate Python client (the weaviate-client package, not the Weaviate server) introduced a cleaner, collection-based API:

import weaviate
import weaviate.classes as wvc

# Assumes chunks, sources, vectors and query_vector come from SIE as in the
# earlier example.
client = weaviate.connect_to_local()

# Create collection
documents = client.collections.create(
    name="Document",
    vectorizer_config=wvc.config.Configure.Vectorizer.none(),  # vectors come from SIE
    properties=[
        wvc.config.Property(name="text", data_type=wvc.config.DataType.TEXT),
        wvc.config.Property(name="source", data_type=wvc.config.DataType.TEXT),
    ]
)

# Insert with vectors
documents.data.insert_many([
    wvc.data.DataObject(properties={"text": chunk, "source": src}, vector=vec.tolist())
    for chunk, src, vec in zip(chunks, sources, vectors)
])

# Search
results = documents.query.near_vector(
    near_vector=query_vector.tolist(),
    limit=20
)

client.close()  # Release the connection when done
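
The v4 client returns a response object whose `objects` attribute holds the matches, each with `uuid`, `properties`, and `metadata`. The sketch below flattens such a response into plain dictionaries. `to_rows` is a hypothetical helper, and `fake_response` is a stand-in built with `SimpleNamespace` so the example runs without a Weaviate server. Distances are only populated when the query asks for them, for instance with `return_metadata=wvc.query.MetadataQuery(distance=True)`.

```python
from types import SimpleNamespace
from typing import Any


def to_rows(response: Any) -> list[dict[str, Any]]:
    """Flatten a v4 query response into one dict per object (hypothetical helper)."""
    rows = []
    for obj in response.objects:
        row = {"uuid": str(obj.uuid), **obj.properties}
        # metadata.distance is None unless the query requested it.
        distance = getattr(getattr(obj, "metadata", None), "distance", None)
        if distance is not None:
            row["distance"] = distance
        rows.append(row)
    return rows


# Stand-in for a real response; the values are made up for illustration.
fake_response = SimpleNamespace(objects=[
    SimpleNamespace(uuid="0001", properties={"text": "first chunk", "source": "a.md"},
                    metadata=SimpleNamespace(distance=0.12)),
    SimpleNamespace(uuid="0002", properties={"text": "second chunk", "source": "b.md"},
                    metadata=SimpleNamespace(distance=None)),
])

rows = to_rows(fake_response)
for row in rows:
    print(row)
```

Against a live collection, `to_rows(results)` would be called on the value returned by `documents.query.near_vector(...)`.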

Weaviate vs Qdrant: key differences

                        | Weaviate                  | Qdrant
Query language          | GraphQL + REST            | REST + gRPC
Schema                  | Required (typed classes)  | Optional (flexible payload)
Hybrid search           | ✓ (built-in)              | ✓ (built-in)
Module ecosystem        | ✓ (vectorisers, readers)  | Limited
Implementation language | Go                        | Rust
Performance             | Good                      | Slightly faster at scale
Best for                | Structured docs, LangChain | Raw performance, flexibility

Frequently asked questions

What is Weaviate’s vectorizer module system? Weaviate modules allow automatic vectorisation at insert time using external models (OpenAI, Cohere, HuggingFace). When using SIE, set vectorizer: "none" and provide vectors directly. This gives you full control over the encoding model and keeps data off external APIs.

Does Weaviate support multi-vector / ColBERT retrieval? Weaviate has added multi-vector support. Check the SIE + Weaviate integration docs for current ColBERT implementation details.

Can I run Weaviate and SIE in the same Kubernetes cluster? Yes. Both are containerised and can run in the same cluster, with SIE on GPU nodes and Weaviate on CPU nodes.

