
Haystack

The sie-haystack package provides native Haystack components for embeddings, reranking, and extraction. Use SIETextEmbedder and SIEDocumentEmbedder for dense embeddings, the sparse and multivector variants for hybrid and late-interaction search, SIEImageEmbedder for image embeddings, SIERanker for reranking, and SIEExtractor for zero-shot extraction (entities, relations, classifications, and object detection).

pip install sie-haystack

This installs sie-sdk and haystack-ai as dependencies.

# Docker (recommended)
docker run -p 8080:8080 ghcr.io/superlinked/sie-server:default
# Or with GPU
docker run --gpus all -p 8080:8080 ghcr.io/superlinked/sie-server:default

SIE provides seven embedder components following Haystack conventions:

| Component | Use Case |
| --- | --- |
| SIETextEmbedder | Embed queries (dense) |
| SIEDocumentEmbedder | Embed documents (dense) |
| SIESparseTextEmbedder | Embed queries (sparse) |
| SIESparseDocumentEmbedder | Embed documents (sparse) |
| SIEMultivectorTextEmbedder | Embed queries (multivector) |
| SIEMultivectorDocumentEmbedder | Embed documents (multivector) |
| SIEImageEmbedder | Embed images (CLIP, SigLIP, ColPali) |

Use SIETextEmbedder for embedding queries in retrieval pipelines:

from sie_haystack import SIETextEmbedder
embedder = SIETextEmbedder(
base_url="http://localhost:8080",
model="BAAI/bge-m3"
)
result = embedder.run(text="What is vector search?")
embedding = result["embedding"] # list[float]
print(len(embedding)) # 1024
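Dense embeddings from the same model can be compared directly with cosine similarity. A minimal sketch in plain Python, using toy 3-dimensional vectors as stand-ins for real 1024-dimensional embeddings:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two dense embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for real embedder output
query_vec = [0.1, 0.9, 0.2]
doc_vec = [0.2, 0.8, 0.1]
print(round(cosine_similarity(query_vec, doc_vec), 3))
```

In practice you rarely compute this by hand; the document store's retriever does it for you, as shown in the pipeline example further down.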

Use SIEDocumentEmbedder for embedding documents before indexing:

from haystack import Document
from sie_haystack import SIEDocumentEmbedder
embedder = SIEDocumentEmbedder(
base_url="http://localhost:8080",
model="BAAI/bge-m3"
)
docs = [
Document(content="Machine learning uses algorithms to learn from data."),
Document(content="Neural networks are inspired by biological neurons."),
]
result = embedder.run(documents=docs)
embedded_docs = result["documents"]
for doc in embedded_docs:
print(f"{len(doc.embedding)} dimensions")

Include metadata fields in the embedding by specifying meta_fields_to_embed:

embedder = SIEDocumentEmbedder(
model="BAAI/bge-m3",
meta_fields_to_embed=["title", "author"]
)
doc = Document(
content="Deep learning uses multiple layers.",
meta={"title": "Neural Networks", "author": "Jane Doe"}
)
# Embeds: "Neural Networks Jane Doe Deep learning uses multiple layers."
result = embedder.run(documents=[doc])

For hybrid search, use the sparse embedder components. These work with stores like Qdrant that support sparse vectors.

from sie_haystack import SIESparseTextEmbedder
embedder = SIESparseTextEmbedder(
base_url="http://localhost:8080",
model="BAAI/bge-m3"
)
result = embedder.run(text="What is vector search?")
sparse_embedding = result["sparse_embedding"]
print(sparse_embedding.keys()) # dict_keys(['indices', 'values'])
from haystack import Document
from sie_haystack import SIESparseDocumentEmbedder
embedder = SIESparseDocumentEmbedder(
base_url="http://localhost:8080",
model="BAAI/bge-m3"
)
docs = [Document(content="Python is a programming language.")]
result = embedder.run(documents=docs)
# Sparse embedding stored in document metadata
sparse = result["documents"][0].meta["_sparse_embedding"]
print(sparse.keys()) # dict_keys(['indices', 'values'])
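A sparse embedding in this `{'indices': ..., 'values': ...}` form scores against a document via a dot product over the token ids the two vectors share. A minimal sketch with toy data (real index/value pairs come from the embedders above):

```python
def sparse_dot(query: dict, doc: dict) -> float:
    """Dot product of two sparse vectors in {'indices', 'values'} form.
    Only token ids present in both vectors contribute to the score."""
    doc_weights = dict(zip(doc["indices"], doc["values"]))
    return sum(
        v * doc_weights[i]
        for i, v in zip(query["indices"], query["values"])
        if i in doc_weights
    )

# Toy vectors standing in for real sparse embeddings
q = {"indices": [3, 17, 42], "values": [0.5, 0.2, 0.9]}
d = {"indices": [17, 42, 99], "values": [0.4, 1.0, 0.3]}
print(sparse_dot(q, d))  # only token ids 17 and 42 overlap
```

Stores like Qdrant perform this computation internally; the sketch just shows what the indices/values pairs mean.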

For ColBERT/late-interaction models, use the multivector embedder components. These produce per-token embeddings that enable MaxSim scoring for higher retrieval quality.

from sie_haystack import SIEMultivectorTextEmbedder
embedder = SIEMultivectorTextEmbedder(
base_url="http://localhost:8080",
model="jinaai/jina-colbert-v2"
)
result = embedder.run(text="What is vector search?")
multivector = result["multivector_embedding"] # list[list[float]] - one vector per token
from haystack import Document
from sie_haystack import SIEMultivectorDocumentEmbedder
embedder = SIEMultivectorDocumentEmbedder(
base_url="http://localhost:8080",
model="jinaai/jina-colbert-v2"
)
docs = [Document(content="Python is a programming language.")]
result = embedder.run(documents=docs)
# Multivector embedding stored in document metadata
mv = result["documents"][0].meta["_multivector_embedding"]
print(f"{len(mv)} token vectors, {len(mv[0])} dims each")
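MaxSim scoring takes, for each query token vector, its best match among the document's token vectors, then sums those maxima. A NumPy sketch with toy 2-dimensional vectors, assuming plain dot-product similarity (production implementations typically L2-normalize the token vectors first):

```python
import numpy as np

def maxsim(query_vecs, doc_vecs):
    """Late-interaction (MaxSim) score: for every query token vector,
    take its maximum similarity against all document token vectors,
    then sum over query tokens."""
    q = np.asarray(query_vecs)  # (num_query_tokens, dim)
    d = np.asarray(doc_vecs)    # (num_doc_tokens, dim)
    sims = q @ d.T              # pairwise dot products
    return float(sims.max(axis=1).sum())

# Toy 2-dim token vectors standing in for real multivector embeddings
query_mv = [[1.0, 0.0], [0.0, 1.0]]
doc_mv = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
print(maxsim(query_mv, doc_mv))  # per-query-token maxima: 0.9 and 0.8
```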

Complete retrieval pipeline using SIE embeddings with an in-memory document store:

from haystack import Document, Pipeline
from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever
from haystack.document_stores.in_memory import InMemoryDocumentStore
from sie_haystack import SIEDocumentEmbedder, SIETextEmbedder
# 1. Create document store and embedder
document_store = InMemoryDocumentStore()
doc_embedder = SIEDocumentEmbedder(
base_url="http://localhost:8080",
model="BAAI/bge-m3"
)
# 2. Prepare and embed documents
documents = [
Document(content="Machine learning is a branch of artificial intelligence."),
Document(content="Neural networks are inspired by biological neurons."),
Document(content="Deep learning uses multiple layers of neural networks."),
Document(content="Python is popular for machine learning development."),
]
embedded_docs = doc_embedder.run(documents=documents)["documents"]
document_store.write_documents(embedded_docs)
# 3. Build retrieval pipeline
query_embedder = SIETextEmbedder(
base_url="http://localhost:8080",
model="BAAI/bge-m3"
)
retrieval_pipeline = Pipeline()
retrieval_pipeline.add_component("query_embedder", query_embedder)
retrieval_pipeline.add_component(
"retriever",
InMemoryEmbeddingRetriever(document_store=document_store, top_k=2)
)
retrieval_pipeline.connect("query_embedder.embedding", "retriever.query_embedding")
# 4. Query
result = retrieval_pipeline.run({"query_embedder": {"text": "What is deep learning?"}})
for doc in result["retriever"]["documents"]:
print(f"Score: {doc.score:.3f} - {doc.content[:50]}")

SIERanker reranks documents by relevance to a query. Use it after initial retrieval to improve precision. Works with both cross-encoder models (e.g., jinaai/jina-reranker-v2-base-multilingual) and ColBERT/late-interaction models (e.g., jinaai/jina-colbert-v2).

from haystack import Document
from sie_haystack import SIERanker
ranker = SIERanker(
base_url="http://localhost:8080",
model="jinaai/jina-reranker-v2-base-multilingual",
top_k=3
)
docs = [
Document(content="Machine learning is a subset of AI."),
Document(content="The weather is sunny today."),
Document(content="Deep learning uses neural networks."),
]
result = ranker.run(query="What is ML?", documents=docs)
for doc in result["documents"]:
score = doc.meta.get("score", 0)
print(f"{score:.3f}: {doc.content[:50]}")
from haystack import Pipeline
from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever
from haystack.document_stores.in_memory import InMemoryDocumentStore
from sie_haystack import SIEDocumentEmbedder, SIERanker, SIETextEmbedder
document_store = InMemoryDocumentStore()
# ... embed and write documents ...
pipeline = Pipeline()
pipeline.add_component("query_embedder", SIETextEmbedder(model="BAAI/bge-m3"))
pipeline.add_component(
"retriever",
InMemoryEmbeddingRetriever(document_store=document_store, top_k=20)
)
pipeline.add_component(
"ranker",
SIERanker(model="jinaai/jina-reranker-v2-base-multilingual", top_k=5)
)
pipeline.connect("query_embedder.embedding", "retriever.query_embedding")
pipeline.connect("retriever.documents", "ranker.documents")
# Retrieves 20 docs, reranks, returns top 5
result = pipeline.run({
"query_embedder": {"text": "What is deep learning?"},
"ranker": {"query": "What is deep learning?"},
})
for doc in result["ranker"]["documents"]:
print(doc.content[:60])

SIEExtractor provides zero-shot extraction using GLiNER, GLiREL, GLiClass, and GroundingDINO/OWL-v2 models. It declares four output types: entities, relations, classifications, and objects.

from sie_haystack import SIEExtractor
extractor = SIEExtractor(
base_url="http://localhost:8080",
model="urchade/gliner_multi-v2.1",
labels=["person", "organization", "location"]
)
result = extractor.run(text="Tim Cook announced new products at Apple Park in Cupertino.")
for entity in result["entities"]:
print(f"{entity.label}: {entity.text} ({entity.score:.2f})")
# person: Tim Cook (0.96)
# organization: Apple (0.91)
# location: Cupertino (0.88)
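Because each entity carries character offsets (start and end), the extracted spans can be mapped back onto the source text, e.g. for highlighting. A sketch using (start, end, label) tuples that mirror the Entity fields:

```python
def annotate(text: str, spans: list[tuple[int, int, str]]) -> str:
    """Wrap each (start, end, label) span as [text](label),
    working right-to-left so earlier offsets stay valid."""
    for start, end, label in sorted(spans, reverse=True):
        text = text[:start] + f"[{text[start:end]}]({label})" + text[end:]
    return text

text = "Tim Cook announced new products at Apple Park in Cupertino."
# Offsets mirroring Entity.start / Entity.end from the extractor
spans = [(0, 8, "person"), (35, 45, "location"), (49, 58, "location")]
print(annotate(text, spans))
```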

Extract relationships between entities using GLiREL:

from sie_haystack import SIEExtractor
extractor = SIEExtractor(
base_url="http://localhost:8080",
model="jackboyla/glirel-large-v0",
labels=["works_for", "ceo_of", "founded"]
)
result = extractor.run(text="Tim Cook is the CEO of Apple Inc.")
for relation in result["relations"]:
print(f"{relation.head} --{relation.relation}--> {relation.tail}")
# Tim Cook --ceo_of--> Apple Inc.
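Extracted relations are effectively (head, relation, tail) triples, which makes them easy to load into a knowledge graph. A sketch that groups triples mirroring the Relation fields into an adjacency mapping:

```python
from collections import defaultdict

def build_graph(triples: list[tuple[str, str, str]]) -> dict:
    """Group (head, relation, tail) triples into an adjacency mapping
    from each head entity to its outgoing (relation, tail) edges."""
    graph: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for head, relation, tail in triples:
        graph[head].append((relation, tail))
    return dict(graph)

# Triples mirroring Relation.head / .relation / .tail from the extractor
triples = [
    ("Tim Cook", "ceo_of", "Apple Inc."),
    ("Tim Cook", "works_for", "Apple Inc."),
]
print(build_graph(triples))
```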

Classify text into categories using GLiClass:

from sie_haystack import SIEExtractor
extractor = SIEExtractor(
base_url="http://localhost:8080",
model="knowledgator/gliclass-base-v1.0",
labels=["positive", "negative", "neutral"]
)
result = extractor.run(text="I absolutely loved this movie! The acting was superb.")
for classification in result["classifications"]:
print(f"{classification.label}: {classification.score:.2f}")
# positive: 0.94
# neutral: 0.04
# negative: 0.02

The extractor returns typed dataclass instances for each result type:

from sie_haystack.extractors import Entity, Relation, Classification, DetectedObject
# Entity fields:
# text: str - matched text span
# label: str - entity type
# score: float - confidence score
# start: int - character offset start
# end: int - character offset end
# Relation fields:
# head: str - source entity
# tail: str - target entity
# relation: str - relation type
# score: float - confidence score
# Classification fields:
# label: str - classification category
# score: float - confidence score
# DetectedObject fields:
# label: str - object class
# score: float - confidence score
# bbox: list - bounding box [x1, y1, x2, y2]
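As a rough mental model, the field lists above correspond to dataclasses shaped like the following. This is an illustrative sketch, not the package's actual definitions (those live in sie_haystack.extractors and may differ in detail):

```python
from dataclasses import dataclass

# Illustrative stand-ins for the documented result types
@dataclass
class Entity:
    text: str   # matched text span
    label: str  # entity type
    score: float
    start: int  # character offset start
    end: int    # character offset end

@dataclass
class Relation:
    head: str      # source entity
    tail: str      # target entity
    relation: str  # relation type
    score: float

e = Entity(text="Tim Cook", label="person", score=0.96, start=0, end=8)
print(e.label, e.text)
```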

SIEImageEmbedder embeds images using multimodal models like CLIP, SigLIP, and ColPali. It works as a standard Haystack component in pipeline graphs.

from sie_haystack import SIEImageEmbedder
embedder = SIEImageEmbedder(
base_url="http://localhost:8080",
model="openai/clip-vit-large-patch14"
)
# Embed images from file paths
result = embedder.run(images=["photo1.jpg", "photo2.png"])
embeddings = result["embeddings"] # list[list[float]]
# Or from raw bytes
with open("photo.jpg", "rb") as f:
image_bytes = f.read()
result = embedder.run(images=[image_bytes])
from haystack import Pipeline
from sie_haystack import SIEImageEmbedder
pipeline = Pipeline()
pipeline.add_component("image_embedder", SIEImageEmbedder(model="openai/clip-vit-large-patch14"))
# Connect to your vector store retriever...

Supported models include openai/clip-vit-large-patch14, google/siglip-base-patch16-224, and other vision models in the Model Catalog.

Embedder parameters (SIETextEmbedder, SIEDocumentEmbedder, and the sparse/multivector variants):

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| base_url | str | http://localhost:8080 | SIE server URL |
| model | str | BAAI/bge-m3 | Model to use |
| gpu | str | None | Target GPU type for routing |
| options | dict | None | Model-specific options |
| timeout_s | float | 180.0 | Request timeout in seconds |

SIEDocumentEmbedder additionally accepts:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| meta_fields_to_embed | list[str] | None | Metadata fields to include |

SIERanker parameters:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| base_url | str | http://localhost:8080 | SIE server URL |
| model | str | jinaai/jina-reranker-v2-base-multilingual | Reranker model |
| top_k | int | None | Number of documents to return |
| gpu | str | None | Target GPU type for routing |
| options | dict | None | Model-specific options |
| timeout_s | float | 180.0 | Request timeout in seconds |

The extraction model determines which output types are populated. Use GLiNER models for entities, GLiREL for relations, GLiClass for classifications, and GroundingDINO/OWL-v2 for object detection.

SIEExtractor parameters:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| base_url | str | http://localhost:8080 | SIE server URL |
| model | str | urchade/gliner_multi-v2.1 | Extraction model (GLiNER, GLiREL, GLiClass, GroundingDINO, OWL-v2) |
| labels | list[str] | ["person", "organization", "location"] | Labels for extraction (entity types, relation types, or classification categories) |
| gpu | str | None | Target GPU type for routing |
| options | dict | None | Model-specific options |
| timeout_s | float | 180.0 | Request timeout in seconds |
