
How Does Haystack Work with Embedding Models?

Haystack is an open-source framework for building NLP and RAG pipelines by connecting modular components (document stores, embedders, retrievers, readers, and generators) into composable directed acyclic graphs. It works with embedding models through embedder components: a document embedder calls the model at index time, a text embedder calls it at query time, and a retriever matches the resulting query vector against the stored document vectors. SIE integrates with Haystack as a self-hosted embedding backend, replacing managed API calls with GPU inference in your own cloud.


Why Haystack?

Haystack is the right choice when:

  • You want a structured pipeline framework rather than writing retrieval logic from scratch
  • You need to compose complex multi-hop or multi-stage pipelines (retrieve → rerank → generate → verify)
  • You’re building production RAG systems and want pre-built components for evaluation, caching, and monitoring
  • You want to swap components (different retrievers, LLMs, document stores) without rewriting pipeline logic
  • You need multi-modal pipelines handling both text and images

Haystack’s component abstraction means you can prototype with OpenAI embeddings and switch to SIE self-hosted inference for production without changing pipeline logic.
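
The swap described above works because pipeline logic depends only on a component's input and output contract. A minimal sketch of that idea in plain Python, with no Haystack imports: both embedder classes are hypothetical stand-ins that return toy vectors, and only the run() contract mirrors the SIETextEmbedder shown later in this article.

```python
from typing import Protocol


class TextEmbedder(Protocol):
    """Anything with a run(text) method that returns {"embedding": [...]}."""

    def run(self, text: str) -> dict: ...


class PrototypeEmbedder:
    """Hypothetical stand-in for a managed-API embedder used while prototyping."""

    def run(self, text: str) -> dict:
        # Toy vector (character count, word count), not a real embedding.
        return {"embedding": [float(len(text)), float(len(text.split()))]}


class SelfHostedEmbedder:
    """Hypothetical stand-in for a self-hosted embedder used in production."""

    def run(self, text: str) -> dict:
        # A different toy vector behind the same output contract.
        return {"embedding": [float(len(text.split())), float(len(text))]}


def embed_query(embedder: TextEmbedder, question: str) -> list[float]:
    # Pipeline logic depends only on the run() contract, not on the backend.
    return embedder.run(question)["embedding"]


question = "What are the termination conditions?"
prototype_vec = embed_query(PrototypeEmbedder(), question)
production_vec = embed_query(SelfHostedEmbedder(), question)
```

embed_query never changes when the backend does, which is the property the component abstraction provides at pipeline scale.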


Core Haystack concepts

Components: individual pipeline steps with typed inputs and outputs (InMemoryEmbeddingRetriever, SentenceTransformersTextEmbedder, OpenAIGenerator, etc.)

Pipeline: a directed acyclic graph of components connected by their input/output types

Document Store: the storage layer (InMemoryDocumentStore, QdrantDocumentStore, WeaviateDocumentStore, etc.)

Documents: Haystack’s data type representing a piece of text with metadata and an optional embedding vector
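
To make the "typed graph" idea concrete, here is a toy sketch in plain Python. It is not Haystack's implementation: the SOCKETS table and the connect helper are hypothetical, and only the component and socket names are borrowed from the pipelines in this article. It checks that connected sockets have matching types, then derives a run order from the resulting graph.

```python
from graphlib import TopologicalSorter

# Hypothetical socket declarations: component -> (input types, output types).
SOCKETS = {
    "query_embedder": ({"text": str}, {"embedding": list}),
    "retriever": ({"query_embedding": list}, {"documents": list}),
    "prompt_builder": ({"documents": list}, {"prompt": str}),
    "llm": ({"prompt": str}, {"replies": list}),
}


def connect(edges, sender, receiver):
    """Record an edge 'component.output' -> 'component.input' if types agree."""
    src, out_name = sender.split(".")
    dst, in_name = receiver.split(".")
    out_type = SOCKETS[src][1][out_name]
    in_type = SOCKETS[dst][0][in_name]
    if out_type is not in_type:
        raise TypeError(
            f"{sender} ({out_type.__name__}) does not match "
            f"{receiver} ({in_type.__name__})"
        )
    # TopologicalSorter expects a mapping of node -> set of predecessors.
    edges.setdefault(dst, set()).add(src)
    edges.setdefault(src, set())


edges = {}
connect(edges, "query_embedder.embedding", "retriever.query_embedding")
connect(edges, "retriever.documents", "prompt_builder.documents")
connect(edges, "prompt_builder.prompt", "llm.prompt")

# Components run only after everything they depend on has run.
run_order = list(TopologicalSorter(edges).static_order())
```

Connecting "query_embedder.embedding" to "llm.prompt" in this sketch raises a TypeError, which is the kind of wiring mistake strict component typing is meant to catch before a pipeline runs.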


How SIE integrates with Haystack

SIE acts as the embedding backend. Wrap the SIE client in two custom components, a text embedder for queries and a document embedder for indexing:

from haystack import component, Document, Pipeline
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever
from sie_sdk import SIEClient
from sie_sdk.types import Item


# Custom SIE embedder for a single query string (used at query time)
@component
class SIETextEmbedder:
    def __init__(self, model: str = "BAAI/bge-m3"):
        self.client = SIEClient("http://localhost:8080")
        self.model = model

    @component.output_types(embedding=list[float])
    def run(self, text: str):
        result = self.client.encode(self.model, Item(text=text), is_query=True)
        return {"embedding": result["dense"].tolist()}


# Custom SIE embedder for a batch of documents (used at index time)
@component
class SIEDocumentEmbedder:
    def __init__(self, model: str = "BAAI/bge-m3"):
        self.client = SIEClient("http://localhost:8080")
        self.model = model

    @component.output_types(documents=list[Document])
    def run(self, documents: list[Document]):
        texts = [doc.content for doc in documents]
        encode_results = self.client.encode(
            self.model,
            [Item(text=t) for t in texts],
        )
        # Attach each dense vector to its document in place
        for doc, res in zip(documents, encode_results):
            doc.embedding = res["dense"].tolist()
        return {"documents": documents}

Building an indexing pipeline with SIE + Haystack

from haystack import Pipeline
from haystack.components.writers import DocumentWriter

document_store = InMemoryDocumentStore()

# Indexing pipeline: embed documents with SIE, then write them to the store
indexing_pipeline = Pipeline()
indexing_pipeline.add_component("embedder", SIEDocumentEmbedder())
indexing_pipeline.add_component("writer", DocumentWriter(document_store=document_store))
indexing_pipeline.connect("embedder.documents", "writer.documents")

# Index documents
# `chunks` and `sources` are parallel lists prepared upstream:
# the text chunks and the source identifier of each chunk.
raw_docs = [Document(content=chunk, meta={"source": src}) for chunk, src in zip(chunks, sources)]
indexing_pipeline.run({"embedder": {"documents": raw_docs}})
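
The indexing code above assumes that chunks and sources already exist. One minimal way to produce them is sketched below, assuming fixed-size word windows with a small overlap. The article does not prescribe a chunking strategy, so chunk_words, its default sizes, and the raw_files mapping are all hypothetical.

```python
def chunk_words(text, size=200, overlap=20):
    """Split text into windows of `size` words that share `overlap` words."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    pieces = []
    start = 0
    while start < len(words):
        pieces.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reached the end of the text
        start += step
    return pieces


# Hypothetical input: file name -> raw text (toy content for illustration).
raw_files = {"contract.txt": "one two three four five six seven eight nine ten"}

chunks, sources = [], []
for name, text in raw_files.items():
    for piece in chunk_words(text, size=4, overlap=1):
        chunks.append(piece)
        sources.append(name)
```

The two lists stay aligned by construction, so they can be zipped into Document objects exactly as in the indexing pipeline.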

Building a RAG query pipeline with SIE + Haystack

from haystack.components.generators import OpenAIGenerator
from haystack.components.builders import PromptBuilder

PROMPT = """
Answer the question using the provided context.
Context: {% for doc in documents %}{{ doc.content }}{% endfor %}
Question: {{ query }}
"""

rag_pipeline = Pipeline()
rag_pipeline.add_component("query_embedder", SIETextEmbedder())
rag_pipeline.add_component("retriever", InMemoryEmbeddingRetriever(document_store=document_store, top_k=5))
rag_pipeline.add_component("prompt_builder", PromptBuilder(template=PROMPT))
# OpenAIGenerator reads the API key from the OPENAI_API_KEY environment variable by default
rag_pipeline.add_component("llm", OpenAIGenerator(model="gpt-4o"))

rag_pipeline.connect("query_embedder.embedding", "retriever.query_embedding")
rag_pipeline.connect("retriever.documents", "prompt_builder.documents")
rag_pipeline.connect("prompt_builder.prompt", "llm.prompt")

# The same question feeds both the embedder and the prompt template
question = "What are the termination conditions?"
result = rag_pipeline.run({
    "query_embedder": {"text": question},
    "prompt_builder": {"query": question},
})
print(result["llm"]["replies"][0])
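
Between the query embedder and the prompt builder, the retriever ranks stored documents by how similar their vectors are to the query vector and keeps the top_k best. The sketch below shows that step in plain Python under stated assumptions: cosine similarity is used as one common choice (the function a given document store applies depends on its configuration), and the three documents with 3-dimensional vectors are toy data.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_by_embedding(query_embedding, documents, top_k=5):
    """documents is a list of (content, embedding) pairs; returns the best contents."""
    scored = [
        (cosine_similarity(query_embedding, embedding), content)
        for content, embedding in documents
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [content for _, content in scored[:top_k]]


# Toy documents with made-up 3-dimensional vectors.
toy_docs = [
    ("Either party may terminate with 30 days notice.", [0.9, 0.1, 0.0]),
    ("Invoices are due within 14 days.", [0.1, 0.9, 0.0]),
    ("Termination for breach is immediate.", [0.8, 0.2, 0.1]),
]
hits = top_k_by_embedding([1.0, 0.0, 0.0], toy_docs, top_k=2)
```

With these toy vectors the two termination-related documents rank above the invoice document, which is the behaviour top_k=5 relies on in the pipeline.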

Using Haystack with Qdrant and SIE

For production, swap InMemoryDocumentStore for QdrantDocumentStore:

from haystack_integrations.document_stores.qdrant import QdrantDocumentStore
from haystack_integrations.components.retrievers.qdrant import QdrantEmbeddingRetriever

document_store = QdrantDocumentStore(
    url="http://localhost:6333",
    index="documents",
    embedding_dim=1024,  # BGE-M3 output dim
)
retriever = QdrantEmbeddingRetriever(document_store=document_store, top_k=20)
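
The embedding_dim value has to match the length of the vectors the embedding model returns. A small guard like the one below can catch a mismatch before documents are written. It is a sketch in plain Python: check_embedding_dims is a hypothetical helper, not part of Haystack, Qdrant, or the SIE SDK.

```python
def check_embedding_dims(embeddings, expected_dim):
    """Raise ValueError if any vector's length differs from expected_dim."""
    for position, vector in enumerate(embeddings):
        if len(vector) != expected_dim:
            raise ValueError(
                f"embedding {position} has {len(vector)} dimensions, "
                f"expected {expected_dim}"
            )
    return len(embeddings)


EXPECTED_DIM = 1024  # same value as embedding_dim in the store configuration

# Placeholder vectors standing in for real model output.
checked = check_embedding_dims([[0.0] * 1024, [0.0] * 1024], EXPECTED_DIM)
```

Running the check on the output of the document embedder, before the writer step, keeps a model change from silently producing vectors the collection cannot store.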

Haystack vs LangChain vs LlamaIndex

                    | Haystack                   | LangChain             | LlamaIndex
Pipeline model      | Typed DAG                  | Chain / Agent         | Query engine
Component typing    | Strict                     | Loose                 | Medium
RAG focus           | General
Evaluation tooling  | Strong                     | Growing               | Good
Production maturity | High                       | High                  | High
Best for            | Production RAG, evaluation | Agents, diverse tasks | Document QA

All three integrate well with SIE — the choice comes down to team familiarity and pipeline complexity.


Frequently asked questions

Does Haystack have a built-in SIE integration? The examples in this article use the SIE SDK through custom components. A native Haystack SIE integration is also available; see the SIE + Haystack integration guide for the current implementation.

Can Haystack pipelines be serialised and deployed? Yes. Haystack pipelines serialise to YAML, enabling reproducible deployments and version-controlled pipeline definitions.

Does Haystack support reranking with SIE? Yes. Add a custom reranker component that calls client.score() between the retriever and prompt builder steps.
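
The core of such a reranker is a re-sort of the retrieved documents by a relevance score. The sketch below shows only that logic in plain Python. The signature of client.score() is not shown in this article, so the scorer is passed in as score_fn, and word_overlap is a toy stand-in for a real scoring call. In a pipeline, the same logic would sit inside the run() method of a custom component.

```python
def rerank(query, documents, score_fn, top_k=5):
    """Order documents by score_fn(query, doc), highest first, and keep top_k."""
    scored = [(score_fn(query, doc), doc) for doc in documents]
    # sort() is stable, so ties keep the retriever's original order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]


def word_overlap(query, doc):
    """Toy scorer: fraction of query words that also appear in the document."""
    query_words = set(query.lower().split())
    doc_words = set(doc.lower().split())
    if not query_words:
        return 0.0
    return len(query_words & doc_words) / len(query_words)


# Toy candidates standing in for retriever output.
candidates = [
    "Invoices are due within 14 days.",
    "The termination conditions are listed in section 9.",
    "Payment terms are net 30.",
]
reranked = rerank("termination conditions", candidates, word_overlap, top_k=2)
```

A common pattern is to retrieve a wide candidate set (for example top_k=20, as in the Qdrant retriever above) and let the reranker narrow it before the prompt is built.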

