---
title: Haystack
description: Use SIE embeddings, reranking, and extraction in Haystack RAG pipelines.
canonical_url: https://superlinked.com/docs/integrations/haystack
last_updated: 2026-05-20
---

The `sie-haystack` package provides native Haystack components for embeddings, reranking, and extraction. Use `SIETextEmbedder` and `SIEDocumentEmbedder` for dense embeddings, the sparse variants for hybrid search, the multivector variants for ColBERT-style late interaction, `SIEImageEmbedder` for images, `SIERanker` for reranking with cross-encoder or late-interaction models, and `SIEExtractor` for zero-shot extraction (entities, relations, classifications, and object detection).

Imports follow Haystack's `haystack_integrations.components.*` convention. The legacy `sie_haystack` imports remain available for compatibility.

## Installation

```bash
pip install sie-haystack
```

This installs `sie-sdk` and `haystack-ai` as dependencies.

## Start the Server

Source: [packages/sie_server/src/sie_server/cli.py](https://github.com/superlinked/sie/blob/main/packages/sie_server/src/sie_server/cli.py)

```bash
# Docker (recommended)
docker run -p 8080:8080 ghcr.io/superlinked/sie-server:latest-cpu-default

# Or with GPU
docker run --gpus all -p 8080:8080 ghcr.io/superlinked/sie-server:latest-cuda12-default
```

## Embedders

Source: [integrations/sie_haystack/src/sie_haystack/embedders.py](https://github.com/superlinked/sie/blob/main/integrations/sie_haystack/src/sie_haystack/embedders.py)

SIE provides seven embedder components following Haystack conventions:

| Component | Use Case |
|-----------|----------|
| `SIETextEmbedder` | Embed queries (dense) |
| `SIEDocumentEmbedder` | Embed documents (dense) |
| `SIESparseTextEmbedder` | Embed queries (sparse) |
| `SIESparseDocumentEmbedder` | Embed documents (sparse) |
| `SIEMultivectorTextEmbedder` | Embed queries (multivector, ColBERT) |
| `SIEMultivectorDocumentEmbedder` | Embed documents (multivector, ColBERT) |
| `SIEImageEmbedder` | Embed images (CLIP, SigLIP, ColPali) |

### Text Embedder

Use `SIETextEmbedder` for embedding queries in retrieval pipelines:

```python
from haystack_integrations.components.embedders.sie import SIETextEmbedder

embedder = SIETextEmbedder(
    base_url="http://localhost:8080",
    model="BAAI/bge-m3"
)

result = embedder.run(text="What is vector search?")
embedding = result["embedding"]  # list[float]
print(len(embedding))  # 1024
```

### Document Embedder

Use `SIEDocumentEmbedder` for embedding documents before indexing:

```python
from haystack import Document
from haystack_integrations.components.embedders.sie import SIEDocumentEmbedder

embedder = SIEDocumentEmbedder(
    base_url="http://localhost:8080",
    model="BAAI/bge-m3"
)

docs = [
    Document(content="Machine learning uses algorithms to learn from data."),
    Document(content="Neural networks are inspired by biological neurons."),
]

result = embedder.run(documents=docs)
embedded_docs = result["documents"]

for doc in embedded_docs:
    print(f"{len(doc.embedding)} dimensions")
```

### Metadata Fields

Include metadata fields in the embedding by specifying `meta_fields_to_embed`:

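```python
from haystack import Document
from haystack_integrations.components.embedders.sie import SIEDocumentEmbedder

embedder = SIEDocumentEmbedder(
    base_url="http://localhost:8080",
    model="BAAI/bge-m3",
    meta_fields_to_embed=["title", "author"]
)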

doc = Document(
    content="Deep learning uses multiple layers.",
    meta={"title": "Neural Networks", "author": "Jane Doe"}
)

# Embeds: "Neural Networks Jane Doe Deep learning uses multiple layers."
result = embedder.run(documents=[doc])
```
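The comment above indicates that the selected metadata values are prepended to the content before embedding. The pure-Python sketch below reproduces that string assembly so you can preview what the model will see. It assumes a single-space separator (inferred from the comment above) and that missing or empty fields are skipped; `build_embedding_text` is a hypothetical helper, not part of `sie-haystack`.

```python
from typing import Any, Mapping, Sequence


def build_embedding_text(
    content: str,
    meta: Mapping[str, Any],
    meta_fields_to_embed: Sequence[str],
    separator: str = " ",
) -> str:
    """Hypothetical helper: prepend selected metadata values to the content.

    Assumes a single-space separator and skips fields that are missing or
    empty; the real component may join values differently.
    """
    parts = [
        str(meta[field])
        for field in meta_fields_to_embed
        if meta.get(field) not in (None, "")
    ]
    parts.append(content)
    return separator.join(parts)


text = build_embedding_text(
    "Deep learning uses multiple layers.",
    {"title": "Neural Networks", "author": "Jane Doe"},
    ["title", "author"],
)
print(text)  # Neural Networks Jane Doe Deep learning uses multiple layers.
```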

## Sparse Embeddings

Source: [integrations/sie_haystack/src/sie_haystack/embedders.py](https://github.com/superlinked/sie/blob/main/integrations/sie_haystack/src/sie_haystack/embedders.py)

For hybrid search, use the sparse embedder components. These work with stores like Qdrant that support sparse vectors.

### Sparse Text Embedder

```python
from haystack_integrations.components.embedders.sie import SIESparseTextEmbedder

embedder = SIESparseTextEmbedder(
    base_url="http://localhost:8080",
    model="BAAI/bge-m3"
)

result = embedder.run(text="What is vector search?")
sparse_embedding = result["sparse_embedding"]
print(sparse_embedding.keys())  # dict_keys(['indices', 'values'])
```

### Sparse Document Embedder

```python
from haystack import Document
from haystack_integrations.components.embedders.sie import SIESparseDocumentEmbedder

embedder = SIESparseDocumentEmbedder(
    base_url="http://localhost:8080",
    model="BAAI/bge-m3"
)

docs = [Document(content="Python is a programming language.")]
result = embedder.run(documents=docs)

# Sparse embedding stored in document metadata
sparse = result["documents"][0].meta["_sparse_embedding"]
print(sparse.keys())  # dict_keys(['indices', 'values'])
```
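A sparse embedding in this format pairs each non-zero vocabulary index with a weight. A common way to score a query against a document is the dot product over the indices they share. The sketch below assumes both embeddings are plain dicts with parallel `indices` and `values` lists, as printed above; `sparse_dot` is a hypothetical helper, and in a real deployment the vector store (for example Qdrant) computes this score for you.

```python
def sparse_dot(query: dict, doc: dict) -> float:
    """Dot product of two sparse vectors given as {'indices': [...], 'values': [...]}."""
    # Build an index -> weight lookup for the document vector.
    doc_weights = dict(zip(doc["indices"], doc["values"]))
    # Only indices present in both vectors contribute to the score.
    return sum(
        weight * doc_weights[idx]
        for idx, weight in zip(query["indices"], query["values"])
        if idx in doc_weights
    )


query = {"indices": [10, 42, 7], "values": [0.5, 1.0, 0.25]}
doc = {"indices": [42, 99, 7], "values": [2.0, 3.0, 4.0]}
print(sparse_dot(query, doc))  # 3.0
```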

## Multivector (ColBERT) Embeddings

Source: [integrations/sie_haystack/src/sie_haystack/embedders.py](https://github.com/superlinked/sie/blob/main/integrations/sie_haystack/src/sie_haystack/embedders.py)

For ColBERT/late-interaction models, use the multivector embedder components. These produce one embedding per token, which enables MaxSim scoring and often improves retrieval quality over single-vector embeddings.

### Multivector Text Embedder

```python
from haystack_integrations.components.embedders.sie import SIEMultivectorTextEmbedder

embedder = SIEMultivectorTextEmbedder(
    base_url="http://localhost:8080",
    model="jinaai/jina-colbert-v2"
)

result = embedder.run(text="What is vector search?")
multivector = result["multivector_embedding"]  # list[list[float]] - one vector per token
```

### Multivector Document Embedder

```python
from haystack import Document
from haystack_integrations.components.embedders.sie import SIEMultivectorDocumentEmbedder

embedder = SIEMultivectorDocumentEmbedder(
    base_url="http://localhost:8080",
    model="jinaai/jina-colbert-v2"
)

docs = [Document(content="Python is a programming language.")]
result = embedder.run(documents=docs)

# Multivector embedding stored in document metadata
mv = result["documents"][0].meta["_multivector_embedding"]
print(f"{len(mv)} token vectors, {len(mv[0])} dims each")
```
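Late-interaction (ColBERT-style) scoring compares each query token vector with every document token vector, keeps the best match per query token, and sums those maxima (MaxSim). The pure-Python sketch below shows that computation on `list[list[float]]` inputs shaped like the outputs above. It uses a plain dot product, which equals cosine similarity only if the vectors are already normalized; `maxsim` is a hypothetical helper, and a production setup would delegate this to the vector store or a numerical library.

```python
def dot(a: list[float], b: list[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim(query_vecs: list[list[float]], doc_vecs: list[list[float]]) -> float:
    """Sum over query tokens of the best dot-product match among document tokens."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


# Toy 2-dimensional token vectors for illustration.
query_vecs = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 0.5]]
doc_b = [[0.5, 0.5]]
print(maxsim(query_vecs, doc_a))  # 1.5
print(maxsim(query_vecs, doc_b))  # 1.0
```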

## Full Example

Source: [integrations/sie_haystack/src/sie_haystack/embedders.py](https://github.com/superlinked/sie/blob/main/integrations/sie_haystack/src/sie_haystack/embedders.py)

Complete retrieval pipeline using SIE embeddings with an in-memory document store:

```python
from haystack import Document, Pipeline
from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack_integrations.components.embedders.sie import SIEDocumentEmbedder, SIETextEmbedder

# 1. Create document store and embedder
document_store = InMemoryDocumentStore()
doc_embedder = SIEDocumentEmbedder(
    base_url="http://localhost:8080",
    model="BAAI/bge-m3"
)

# 2. Prepare and embed documents
documents = [
    Document(content="Machine learning is a branch of artificial intelligence."),
    Document(content="Neural networks are inspired by biological neurons."),
    Document(content="Deep learning uses multiple layers of neural networks."),
    Document(content="Python is popular for machine learning development."),
]

embedded_docs = doc_embedder.run(documents=documents)["documents"]
document_store.write_documents(embedded_docs)

# 3. Build retrieval pipeline
query_embedder = SIETextEmbedder(
    base_url="http://localhost:8080",
    model="BAAI/bge-m3"
)

retrieval_pipeline = Pipeline()
retrieval_pipeline.add_component("query_embedder", query_embedder)
retrieval_pipeline.add_component(
    "retriever",
    InMemoryEmbeddingRetriever(document_store=document_store, top_k=2)
)
retrieval_pipeline.connect("query_embedder.embedding", "retriever.query_embedding")

# 4. Query
result = retrieval_pipeline.run({"query_embedder": {"text": "What is deep learning?"}})

for doc in result["retriever"]["documents"]:
    print(f"Score: {doc.score:.3f} - {doc.content[:50]}")
```
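The retriever scores every stored document against the query embedding and keeps the `top_k` best. The pure-Python sketch below mimics that step with a dot product, which Haystack's `InMemoryDocumentStore` uses as its default similarity (cosine is also available); score scaling and filters are omitted. `top_k_by_dot` is a hypothetical helper and the 3-dimensional vectors are toy values, not real model output.

```python
def top_k_by_dot(
    query: list[float], docs: dict[str, list[float]], top_k: int
) -> list[tuple[str, float]]:
    """Return the top_k (doc_id, score) pairs ranked by dot product, highest first."""
    scored = [
        (doc_id, sum(q * d for q, d in zip(query, emb)))
        for doc_id, emb in docs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy embeddings keyed by a short document id.
docs = {
    "ml": [0.9, 0.1, 0.0],
    "nn": [0.2, 0.8, 0.0],
    "dl": [0.6, 0.7, 0.1],
    "py": [0.1, 0.0, 0.9],
}
result = top_k_by_dot([0.5, 0.8, 0.0], docs, top_k=2)
for doc_id, score in result:
    print(f"Score: {score:.3f} - {doc_id}")
```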

## Reranking

Source: [integrations/sie_haystack/src/sie_haystack/rankers.py](https://github.com/superlinked/sie/blob/main/integrations/sie_haystack/src/sie_haystack/rankers.py)

`SIERanker` reranks documents by relevance to a query. Use it after initial retrieval to improve precision. It works with both cross-encoder models (e.g., `jinaai/jina-reranker-v2-base-multilingual`) and ColBERT/late-interaction models (e.g., `jinaai/jina-colbert-v2`).

```python
from haystack import Document
from haystack_integrations.components.rankers.sie import SIERanker

ranker = SIERanker(
    base_url="http://localhost:8080",
    model="jinaai/jina-reranker-v2-base-multilingual",
    top_k=3
)

docs = [
    Document(content="Machine learning is a subset of AI."),
    Document(content="The weather is sunny today."),
    Document(content="Deep learning uses neural networks."),
]

result = ranker.run(query="What is ML?", documents=docs)

for doc in result["documents"]:
    score = doc.meta.get("score", 0)
    print(f"{score:.3f}: {doc.content[:50]}")
```

### In a Pipeline

```python
from haystack import Pipeline
from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack_integrations.components.embedders.sie import SIEDocumentEmbedder, SIETextEmbedder
from haystack_integrations.components.rankers.sie import SIERanker

document_store = InMemoryDocumentStore()

# ... embed and write documents ...

pipeline = Pipeline()
pipeline.add_component("query_embedder", SIETextEmbedder(model="BAAI/bge-m3"))
pipeline.add_component(
    "retriever",
    InMemoryEmbeddingRetriever(document_store=document_store, top_k=20)
)
pipeline.add_component(
    "ranker",
    SIERanker(model="jinaai/jina-reranker-v2-base-multilingual", top_k=5)
)
pipeline.connect("query_embedder.embedding", "retriever.query_embedding")
pipeline.connect("retriever.documents", "ranker.documents")

# Retrieves 20 docs, reranks, returns top 5
result = pipeline.run({
    "query_embedder": {"text": "What is deep learning?"},
    "ranker": {"query": "What is deep learning?"},
})

for doc in result["ranker"]["documents"]:
    print(doc.content[:60])
```
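Conceptually, the pipeline above is a two-stage funnel: a cheap retriever casts a wide net (`top_k=20`), then the ranker rescores only those candidates and keeps the best five. The sketch below shows the second stage in plain Python, with a hypothetical `score_fn` standing in for the reranker model and a toy word-overlap scorer; it illustrates the pattern and is not how `SIERanker` is implemented.

```python
from typing import Callable


def rerank(
    query: str,
    candidates: list[str],
    score_fn: Callable[[str, str], float],
    top_k: int,
) -> list[tuple[str, float]]:
    """Rescore candidates against the query and keep the top_k, highest first."""
    scored = [(text, score_fn(query, text)) for text in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


def word_overlap(query: str, text: str) -> float:
    """Toy stand-in for a reranker: fraction of query words found in the text."""
    q_words = set(query.lower().split())
    return len(q_words & set(text.lower().split())) / len(q_words)


candidates = [
    "The weather is sunny today.",
    "Deep learning uses neural networks.",
    "What makes deep learning deep?",
]
reranked = rerank("what is deep learning", candidates, word_overlap, top_k=2)
for text, score in reranked:
    print(f"{score:.2f}: {text}")
```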

## Extraction

Source: [integrations/sie_haystack/src/sie_haystack/extractors.py](https://github.com/superlinked/sie/blob/main/integrations/sie_haystack/src/sie_haystack/extractors.py)

`SIEExtractor` provides zero-shot extraction using GLiNER, GLiREL, GLiClass, and GroundingDINO/OWL-v2 models. It declares four output types: `entities`, `relations`, `classifications`, and `objects`.

### Entity Extraction

```python
from haystack_integrations.components.extractors.sie import SIEExtractor

extractor = SIEExtractor(
    base_url="http://localhost:8080",
    model="urchade/gliner_multi-v2.1",
    labels=["person", "organization", "location"]
)

result = extractor.run(text="Tim Cook announced new products at Apple Park in Cupertino.")
for entity in result["entities"]:
    print(f"{entity.label}: {entity.text} ({entity.score:.2f})")
# person: Tim Cook (0.96)
# organization: Apple (0.91)
# location: Cupertino (0.88)
```

### Relation Extraction

Extract relationships between entities using GLiREL:

```python
from haystack_integrations.components.extractors.sie import SIEExtractor

extractor = SIEExtractor(
    base_url="http://localhost:8080",
    model="jackboyla/glirel-large-v0",
    labels=["works_for", "ceo_of", "founded"]
)

result = extractor.run(text="Tim Cook is the CEO of Apple Inc.")
for relation in result["relations"]:
    print(f"{relation.head} --{relation.relation}--> {relation.tail}")
# Tim Cook --ceo_of--> Apple Inc.
```
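Relation triples map naturally onto a small knowledge graph. The sketch below groups results into an adjacency map keyed by head entity. To run without a server, it defines a local `Rel` stand-in with the `head`, `tail`, `relation`, and `score` fields documented for `Relation` under Output Dataclasses; the scores are made-up sample values and the `min_score` threshold is an arbitrary assumption.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Rel:
    """Local stand-in mirroring the documented Relation fields."""
    head: str
    tail: str
    relation: str
    score: float


def build_graph(relations: list[Rel], min_score: float = 0.5) -> dict[str, list[tuple[str, str]]]:
    """Map each head entity to its (relation, tail) edges scoring at least min_score."""
    graph: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for rel in relations:
        if rel.score >= min_score:
            graph[rel.head].append((rel.relation, rel.tail))
    return dict(graph)


relations = [
    Rel("Tim Cook", "Apple Inc.", "ceo_of", 0.93),
    Rel("Tim Cook", "Apple Inc.", "works_for", 0.81),
    Rel("Apple Inc.", "Tim Cook", "founded", 0.12),
]
print(build_graph(relations))
# {'Tim Cook': [('ceo_of', 'Apple Inc.'), ('works_for', 'Apple Inc.')]}
```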

### Text Classification

Classify text into categories using GLiClass:

```python
from haystack_integrations.components.extractors.sie import SIEExtractor

extractor = SIEExtractor(
    base_url="http://localhost:8080",
    model="knowledgator/gliclass-base-v1.0",
    labels=["positive", "negative", "neutral"]
)

result = extractor.run(text="I absolutely loved this movie! The acting was superb.")
for classification in result["classifications"]:
    print(f"{classification.label}: {classification.score:.2f}")
# positive: 0.94
# neutral: 0.04
# negative: 0.02
```
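Often you only need the single best label, or no label at all when the model is unsure. A minimal sketch, assuming results shaped like the documented `Classification` dataclass (`label`, `score`); `Cls` is a local stand-in so the example runs without a server, and the 0.5 threshold is an arbitrary choice you would tune for your data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cls:
    """Local stand-in mirroring the documented Classification fields."""
    label: str
    score: float


def top_label(results: list[Cls], threshold: float = 0.5) -> Optional[str]:
    """Return the highest-scoring label, or None if it falls below threshold."""
    if not results:
        return None
    best = max(results, key=lambda c: c.score)
    return best.label if best.score >= threshold else None


results = [Cls("positive", 0.94), Cls("neutral", 0.04), Cls("negative", 0.02)]
print(top_label(results))  # positive
print(top_label([Cls("positive", 0.4), Cls("negative", 0.35)]))  # None
```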

### Output Dataclasses

The extractor returns typed dataclass instances for each result type:

```python
from haystack_integrations.components.extractors.sie import (
    Classification,
    DetectedObject,
    Entity,
    Relation,
)

# Entity fields:
#   text: str       - matched text span
#   label: str      - entity type
#   score: float    - confidence score
#   start: int      - character offset start
#   end: int        - character offset end

# Relation fields:
#   head: str       - source entity
#   tail: str       - target entity
#   relation: str   - relation type
#   score: float    - confidence score

# Classification fields:
#   label: str      - classification category
#   score: float    - confidence score

# DetectedObject fields:
#   label: str      - object class
#   score: float    - confidence score
#   bbox: list      - bounding box [x1, y1, x2, y2]
```
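Because `Entity` carries character offsets, you can map results back onto the source text, for example to verify spans or render inline annotations. The sketch below uses a local `Ent` stand-in with the documented fields and assumes `end` is exclusive, as in Python slicing, and that spans do not overlap; check both against real output before relying on them. The offsets are computed by hand for the example sentence, not taken from a model run.

```python
from dataclasses import dataclass


@dataclass
class Ent:
    """Local stand-in mirroring the documented Entity fields."""
    text: str
    label: str
    score: float
    start: int
    end: int


def annotate(text: str, entities: list[Ent]) -> str:
    """Wrap each entity span as [span](label); assumes end-exclusive, non-overlapping spans."""
    out, cursor = [], 0
    for ent in sorted(entities, key=lambda e: e.start):
        out.append(text[cursor:ent.start])
        out.append(f"[{text[ent.start:ent.end]}]({ent.label})")
        cursor = ent.end
    out.append(text[cursor:])
    return "".join(out)


sentence = "Tim Cook announced new products at Apple Park in Cupertino."
entities = [
    Ent("Cupertino", "location", 0.88, 49, 58),
    Ent("Tim Cook", "person", 0.96, 0, 8),
]
print(annotate(sentence, entities))
# [Tim Cook](person) announced new products at Apple Park in [Cupertino](location).
```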

## Image Embeddings

Source: [integrations/sie_haystack/src/sie_haystack/embedders.py](https://github.com/superlinked/sie/blob/main/integrations/sie_haystack/src/sie_haystack/embedders.py)

`SIEImageEmbedder` embeds images using multimodal models like CLIP, SigLIP, and ColPali. It works as a standard Haystack component in pipeline graphs.

```python
from haystack_integrations.components.embedders.sie import SIEImageEmbedder

embedder = SIEImageEmbedder(
    base_url="http://localhost:8080",
    model="openai/clip-vit-large-patch14"
)

# Embed images from file paths
result = embedder.run(images=["photo1.jpg", "photo2.png"])
embeddings = result["embeddings"]  # list[list[float]]

# Or from raw bytes
with open("photo.jpg", "rb") as f:
    image_bytes = f.read()
result = embedder.run(images=[image_bytes])
```

### In a Pipeline

```python
from haystack import Pipeline
from haystack_integrations.components.embedders.sie import SIEImageEmbedder

pipeline = Pipeline()
pipeline.add_component("image_embedder", SIEImageEmbedder(model="openai/clip-vit-large-patch14"))
# Connect to your vector store retriever...
```

Supported models include `openai/clip-vit-large-patch14`, `google/siglip-base-patch16-224`, and other vision models in the [Model Catalog](/models#task=encode).

## Configuration Options

### All Embedders

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `base_url` | `str` | `http://localhost:8080` | SIE server URL |
| `model` | `str` | `BAAI/bge-m3` | Model to use |
| `gpu` | `str` | `None` | Target GPU type for routing |
| `options` | `dict` | `None` | Model-specific options |
| `timeout_s` | `float` | `180.0` | Request timeout in seconds |

### Document Embedders Only

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `meta_fields_to_embed` | `list[str]` | `None` | Metadata fields to include |

### SIERanker

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `base_url` | `str` | `http://localhost:8080` | SIE server URL |
| `model` | `str` | `jinaai/jina-reranker-v2-base-multilingual` | Reranker model |
| `top_k` | `int` | `None` | Number of documents to return |
| `gpu` | `str` | `None` | Target GPU type for routing |
| `options` | `dict` | `None` | Model-specific options |
| `timeout_s` | `float` | `180.0` | Request timeout in seconds |

### SIEExtractor

The extraction model determines which output types are populated. Use GLiNER models for entities, GLiREL for relations, GLiClass for classifications, and GroundingDINO/OWL-v2 for object detection.

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `base_url` | `str` | `http://localhost:8080` | SIE server URL |
| `model` | `str` | `urchade/gliner_multi-v2.1` | Extraction model (GLiNER, GLiREL, GLiClass, GroundingDINO, OWL-v2) |
| `labels` | `list[str]` | `["person", "organization", "location"]` | Labels for extraction (entity types, relation types, classification categories, or object classes) |
| `gpu` | `str` | `None` | Target GPU type for routing |
| `options` | `dict` | `None` | Model-specific options |
| `timeout_s` | `float` | `180.0` | Request timeout in seconds |

## What's Next

- [Rerank Results](/docs/score/) - cross-encoder reranking details
- [Extract](/docs/extract/) - extraction details (NER, relations, classification, vision)
- [Model Catalog](/models#task=encode,score) - all supported embedding and reranking models
- [LangChain Integration](/docs/integrations/langchain/) - alternative framework option
- [Troubleshooting](/docs/reference/troubleshooting/) - common errors and solutions
