---
title: LlamaIndex
description: Use SIE embeddings and reranking in LlamaIndex RAG pipelines.
canonical_url: https://superlinked.com/docs/integrations/llamaindex
last_updated: 2026-05-20
---

The `sie-llamaindex` package (Python) and `@superlinked/sie-llamaindex` package (TypeScript) provide drop-in components for LlamaIndex. Use `SIEEmbedding` for vector stores, `SIENodePostprocessor` for reranking, and `create_sie_extractor_tool` (`createSIEExtractorTool` in TypeScript) for extraction (entities, relations, classifications, and object detection).

## Installation

#### Python

```bash
pip install sie-llamaindex
```

This installs `sie-sdk` and `llama-index-core` as dependencies.

#### TypeScript

```bash
pnpm add @superlinked/sie-llamaindex
```

This installs `@superlinked/sie-sdk` and `llamaindex` as dependencies.

## Start the Server

Source: [packages/sie_server/src/sie_server/cli.py](https://github.com/superlinked/sie/blob/main/packages/sie_server/src/sie_server/cli.py)

```bash
# Docker (recommended)
docker run -p 8080:8080 ghcr.io/superlinked/sie-server:latest-cpu-default

# Or with GPU
docker run --gpus all -p 8080:8080 ghcr.io/superlinked/sie-server:latest-cuda12-default
```
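
Before wiring the integration into a pipeline, it can help to confirm that something is listening on the published port. The sketch below only opens a TCP connection to the host and port in the URL; it assumes nothing about the server's HTTP endpoints, and `is_server_reachable` is a hypothetical helper, not part of `sie-sdk` or `sie-llamaindex`.

```python
import socket
from urllib.parse import urlsplit


def is_server_reachable(base_url: str, timeout_s: float = 2.0) -> bool:
    """Return True if a TCP connection to the URL's host and port succeeds.

    Hypothetical helper for illustration; it checks the socket only,
    not whether the SIE server has finished loading models.
    """
    parts = urlsplit(base_url)
    host = parts.hostname or "localhost"
    # Fall back to the scheme's default port when the URL omits one.
    port = parts.port or (443 if parts.scheme == "https" else 80)
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False
```

Calling `is_server_reachable("http://localhost:8080")` after `docker run` returns `True` once the port accepts connections.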

## Embeddings

Source: [integrations/sie_llamaindex/src/sie_llamaindex/embeddings.py](https://github.com/superlinked/sie/blob/main/integrations/sie_llamaindex/src/sie_llamaindex/embeddings.py)

`SIEEmbedding` implements LlamaIndex's `BaseEmbedding` interface. Set it as the default embed model or use it directly.

#### Python

```python
from llama_index.core import Settings
from sie_llamaindex import SIEEmbedding

# Set as default embedding model
Settings.embed_model = SIEEmbedding(
    base_url="http://localhost:8080",
    model_name="BAAI/bge-m3"
)

# Or use directly
embed_model = SIEEmbedding(model_name="BAAI/bge-m3")
embedding = embed_model.get_text_embedding("Your text here")
print(len(embedding))  # 1024
```
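
Each embedding is a plain list of floats, so two texts can be compared with cosine similarity. The following is a minimal sketch using toy 3-dimensional vectors in place of real model output; `cosine_similarity` is a hypothetical helper written for this example.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # define similarity with a zero vector as 0
    return dot / (norm_a * norm_b)


# Toy vectors standing in for real embeddings (assumption for illustration).
query_vec = [1.0, 0.0, 1.0]
doc_vec = [1.0, 0.0, 0.0]
similarity = cosine_similarity(query_vec, doc_vec)
print(f"{similarity:.4f}")  # 0.7071
```

With real embeddings, pass the lists returned by `get_text_embedding` instead of the toy vectors.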

#### TypeScript

```typescript
import { Settings } from "llamaindex";
import { SIEEmbedding } from "@superlinked/sie-llamaindex";

// Set as default embedding model
Settings.embedModel = new SIEEmbedding({
  baseUrl: "http://localhost:8080",
  modelName: "BAAI/bge-m3",
});

// Or use directly
const embedModel = new SIEEmbedding({ modelName: "BAAI/bge-m3" });
const embedding = await embedModel.getTextEmbedding("Your text here");
console.log(embedding.length); // 1024
```

### With VectorStoreIndex

#### Python

```python
from llama_index.core import Settings, VectorStoreIndex, Document
from sie_llamaindex import SIEEmbedding

Settings.embed_model = SIEEmbedding(model_name="BAAI/bge-m3")

documents = [
    Document(text="Machine learning uses algorithms to learn from data."),
    Document(text="The weather is sunny today."),
]

index = VectorStoreIndex.from_documents(documents)
results = index.as_query_engine().query("What is machine learning?")
```

#### TypeScript

```typescript
import { Settings, VectorStoreIndex, Document } from "llamaindex";
import { SIEEmbedding } from "@superlinked/sie-llamaindex";

Settings.embedModel = new SIEEmbedding({ modelName: "BAAI/bge-m3" });

const documents = [
  new Document({ text: "Machine learning uses algorithms to learn from data." }),
  new Document({ text: "The weather is sunny today." }),
];

const index = await VectorStoreIndex.fromDocuments(documents);
const queryEngine = index.asQueryEngine();
const results = await queryEngine.query({ query: "What is machine learning?" });
```

### Async Support

#### Python

Both sync and async methods are available:

```python
# Sync
embedding = embed_model.get_text_embedding(text)
embeddings = embed_model.get_text_embedding_batch(texts)

# Async (call from inside an async function)
embedding = await embed_model.aget_text_embedding(text)
query_embedding = await embed_model.aget_query_embedding(query)
```
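
The async methods are most useful when several requests run concurrently. The sketch below shows the pattern with `asyncio.gather`; `StubEmbedModel` is a hypothetical stand-in that only mimics the `aget_text_embedding` method name, so the example runs without a server.

```python
import asyncio


class StubEmbedModel:
    """Hypothetical stand-in with the same async method name as SIEEmbedding."""

    async def aget_text_embedding(self, text: str) -> list[float]:
        await asyncio.sleep(0.01)  # simulate network latency
        # Fake 2-dimensional "embedding": character count and space count.
        return [float(len(text)), float(text.count(" "))]


async def embed_concurrently(model, texts: list[str]) -> list[list[float]]:
    # gather() runs the coroutines concurrently and preserves input order.
    return await asyncio.gather(*(model.aget_text_embedding(t) for t in texts))


texts = ["first document", "second", "a third document"]
embeddings = asyncio.run(embed_concurrently(StubEmbedModel(), texts))
print(embeddings)  # [[14.0, 1.0], [6.0, 0.0], [16.0, 2.0]]
```

Swapping the stub for a real `SIEEmbedding` instance should leave `embed_concurrently` unchanged, since it relies only on the method shown above.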

#### TypeScript

All methods are async by default:

```typescript
// Single text
const embedding = await embedModel.getTextEmbedding(text);

// Multiple texts
const embeddings = await embedModel.getTextEmbeddings(texts);
```

## Reranking

Source: [integrations/sie_llamaindex/src/sie_llamaindex/rerankers.py](https://github.com/superlinked/sie/blob/main/integrations/sie_llamaindex/src/sie_llamaindex/rerankers.py)

`SIENodePostprocessor` implements `BaseNodePostprocessor`. Use it to rerank retrieved nodes. It works with both cross-encoder models (e.g., `jinaai/jina-reranker-v2-base-multilingual`) and ColBERT/late-interaction models (e.g., `jinaai/jina-colbert-v2`); to switch between them, change only the model name.

#### Python

```python
from llama_index.core.schema import NodeWithScore, TextNode, QueryBundle
from sie_llamaindex import SIENodePostprocessor

reranker = SIENodePostprocessor(
    base_url="http://localhost:8080",
    model="jinaai/jina-reranker-v2-base-multilingual",
    top_n=3
)

nodes = [
    NodeWithScore(node=TextNode(text="Machine learning is a subset of AI."), score=0.5),
    NodeWithScore(node=TextNode(text="The weather is sunny today."), score=0.6),
    NodeWithScore(node=TextNode(text="Deep learning uses neural networks."), score=0.4),
]

reranked = reranker.postprocess_nodes(nodes, QueryBundle(query_str="What is ML?"))

for node in reranked:
    print(f"{node.score:.3f}: {node.node.get_content()[:50]}")
```
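
Conceptually, a reranker re-scores each candidate against the query, sorts by the new score, and keeps the best `top_n`. The sketch below illustrates only that flow: it uses a toy word-overlap score in place of a cross-encoder, and `rerank` is a hypothetical function, not the `SIENodePostprocessor` implementation.

```python
def rerank(query: str, passages: list[str], top_n: int) -> list[tuple[float, str]]:
    """Re-score passages against the query and keep the best top_n."""
    query_terms = set(query.lower().split())

    def overlap_score(passage: str) -> float:
        # Toy relevance score: fraction of query terms found in the passage.
        if not query_terms:
            return 0.0
        passage_terms = set(passage.lower().replace(".", "").split())
        return len(query_terms & passage_terms) / len(query_terms)

    scored = [(overlap_score(p), p) for p in passages]
    # Stable sort: ties keep their original retrieval order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_n]


passages = [
    "The weather is sunny today.",
    "Machine learning is a subset of AI.",
    "Deep learning uses neural networks.",
]
top = rerank("what is machine learning", passages, top_n=2)
for score, passage in top:
    print(f"{score:.2f}: {passage}")
# 0.75: Machine learning is a subset of AI.
# 0.25: The weather is sunny today.
```

The initial retrieval scores play no part in the final order; only the reranker's scores do, which is why the weather passage can drop even if it was retrieved first.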

#### TypeScript

```typescript
import { TextNode } from "llamaindex";
import { SIENodePostprocessor } from "@superlinked/sie-llamaindex";

const reranker = new SIENodePostprocessor({
  baseUrl: "http://localhost:8080",
  modelName: "jinaai/jina-reranker-v2-base-multilingual",
  topN: 3,
});

const nodes = [
  { node: new TextNode({ text: "Machine learning is a subset of AI." }), score: 0.5 },
  { node: new TextNode({ text: "The weather is sunny today." }), score: 0.6 },
  { node: new TextNode({ text: "Deep learning uses neural networks." }), score: 0.4 },
];

const reranked = await reranker.postprocessNodes(nodes, "What is ML?");

for (const node of reranked) {
  console.log(`${node.score?.toFixed(3)}: ${node.node.getContent().slice(0, 50)}`);
}
```

### With Query Engine

#### Python

```python
from llama_index.core import VectorStoreIndex
from sie_llamaindex import SIENodePostprocessor

reranker = SIENodePostprocessor(
    model="jinaai/jina-reranker-v2-base-multilingual",
    top_n=5
)

# Create query engine with reranking
# (`index` is the VectorStoreIndex built in the "With VectorStoreIndex" example)
query_engine = index.as_query_engine(
    node_postprocessors=[reranker],
    similarity_top_k=20  # Retrieve 20, rerank to 5
)

response = query_engine.query("What is machine learning?")
```

#### TypeScript

```typescript
import { SIENodePostprocessor } from "@superlinked/sie-llamaindex";

const reranker = new SIENodePostprocessor({
  modelName: "jinaai/jina-reranker-v2-base-multilingual",
  topN: 5,
});

// Create query engine with reranking
// (`index` is the VectorStoreIndex built in the "With VectorStoreIndex" example)
const queryEngine = index.asQueryEngine({
  nodePostprocessors: [reranker],
  similarityTopK: 20, // Retrieve 20, rerank to 5
});

const response = await queryEngine.query({ query: "What is machine learning?" });
```

## Hybrid Search

Source: [integrations/sie_llamaindex/src/sie_llamaindex/embeddings.py](https://github.com/superlinked/sie/blob/main/integrations/sie_llamaindex/src/sie_llamaindex/embeddings.py)

Use `SIESparseEmbeddingFunction` to supply sparse vectors to vector stores that support hybrid search, such as Qdrant.

#### Python

```python
from llama_index.vector_stores.qdrant import QdrantVectorStore
from qdrant_client import QdrantClient
from sie_llamaindex import SIEEmbedding, SIESparseEmbeddingFunction

# Create sparse embedding function
sparse_embed_fn = SIESparseEmbeddingFunction(
    base_url="http://localhost:8080",
    model_name="BAAI/bge-m3"
)

# Create hybrid vector store
client = QdrantClient(":memory:")
vector_store = QdrantVectorStore(
    client=client,
    collection_name="hybrid_docs",
    enable_hybrid=True,
    sparse_embedding_function=sparse_embed_fn
)
```
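
Hybrid search runs a dense query and a sparse query, then merges the two ranked lists. The vector store applies its own fusion step; the sketch below shows reciprocal rank fusion (RRF) as one widely used technique for that merge, not as a description of what `QdrantVectorStore` does internally. The document IDs and the `reciprocal_rank_fusion` helper are hypothetical.

```python
def reciprocal_rank_fusion(
    rankings: list[list[str]], k: int = 60
) -> list[tuple[str, float]]:
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores 1 / (k + rank) per list it appears in;
    k = 60 is a conventional smoothing constant.
    """
    fused: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Hypothetical result lists from a dense and a sparse query.
dense_ranking = ["doc_a", "doc_b", "doc_c"]
sparse_ranking = ["doc_b", "doc_d", "doc_a"]
fused = reciprocal_rank_fusion([dense_ranking, sparse_ranking])
print([doc_id for doc_id, _ in fused])  # ['doc_b', 'doc_a', 'doc_d', 'doc_c']
```

Documents that rank well in both lists (`doc_b`, `doc_a`) rise above those found by only one retriever.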

#### TypeScript

```typescript
import { QdrantVectorStore } from "llamaindex";
import { QdrantClient } from "@qdrant/js-client-rest";
import { SIEEmbedding, SIESparseEmbeddingFunction } from "@superlinked/sie-llamaindex";

// Create sparse embedding function
const sparseEmbedFn = new SIESparseEmbeddingFunction({
  baseUrl: "http://localhost:8080",
  modelName: "BAAI/bge-m3",
});

// Create hybrid vector store
const client = new QdrantClient({ url: "http://localhost:6333" });
const vectorStore = new QdrantVectorStore({
  client,
  collectionName: "hybrid_docs",
  enableHybrid: true,
  sparseEmbeddingFunction: sparseEmbedFn,
});
```

## Multimodal Embeddings (Python only)

Source: [integrations/sie_llamaindex/src/sie_llamaindex/embeddings.py](https://github.com/superlinked/sie/blob/main/integrations/sie_llamaindex/src/sie_llamaindex/embeddings.py)

`SIEMultiModalEmbedding` extends LlamaIndex's `MultiModalEmbedding` base class, enabling image embedding with models like CLIP, SigLIP, and ColPali. It plugs into LlamaIndex's multimodal pipelines (e.g. `MultiModalVectorStoreIndex`).

```python
from llama_index.core import Settings
from sie_llamaindex import SIEMultiModalEmbedding

# Set as embedding model - supports both text and images
Settings.embed_model = SIEMultiModalEmbedding(
    base_url="http://localhost:8080",
    model_name="openai/clip-vit-large-patch14"
)

# Embed images
embedding = Settings.embed_model.get_image_embedding("photo.jpg")
embeddings = Settings.embed_model.get_image_embedding_batch(["img1.jpg", "img2.jpg"])

# Text embeddings still work (inherited from BaseEmbedding)
text_embedding = Settings.embed_model.get_text_embedding("A photo of a cat")
```

Supported models include `openai/clip-vit-large-patch14`, `google/siglip-base-patch16-224`, `vidore/colpali-v1.2`, and other vision-capable models in the [Model Catalog](/models#task=encode).

:::note[TypeScript]
LlamaIndex.TS does not have a `MultiModalEmbedding` base class. For image embedding in TypeScript, use the [SIE SDK directly](/docs/encode/):

```typescript
import { SIEClient } from "@superlinked/sie-sdk";

const client = new SIEClient("http://localhost:8080");
const result = await client.encode("openai/clip-vit-large-patch14", {
  images: [imageBytes],  // Uint8Array
});
```
:::

## Full RAG Pipeline (Python)

Source: [integrations/sie_llamaindex/src/sie_llamaindex/](https://github.com/superlinked/sie/blob/main/integrations/sie_llamaindex/src/sie_llamaindex/)

Complete example combining embeddings, reranking, and LLM generation:

```python
from llama_index.core import Settings, VectorStoreIndex, Document
from llama_index.llms.openai import OpenAI
from sie_llamaindex import SIEEmbedding, SIENodePostprocessor

# 1. Configure SIE embeddings
Settings.embed_model = SIEEmbedding(
    base_url="http://localhost:8080",
    model_name="BAAI/bge-m3"
)
Settings.llm = OpenAI(model="gpt-4o-mini")

# 2. Create documents and index
documents = [
    Document(text="Machine learning is a branch of artificial intelligence."),
    Document(text="Neural networks are inspired by biological neurons."),
    Document(text="Deep learning uses multiple layers of neural networks."),
    Document(text="Python is popular for machine learning development."),
]

index = VectorStoreIndex.from_documents(documents)

# 3. Create reranker
reranker = SIENodePostprocessor(
    base_url="http://localhost:8080",
    model="jinaai/jina-reranker-v2-base-multilingual",
    top_n=2
)

# 4. Build query engine with reranking
query_engine = index.as_query_engine(
    node_postprocessors=[reranker],
    similarity_top_k=10  # Retrieve 10, rerank to 2
)

# 5. Query
response = query_engine.query("What is deep learning?")
print(response)
```

## Configuration Options

### SIEEmbedding

#### Python

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `base_url` | `str` | `http://localhost:8080` | SIE server URL |
| `model_name` | `str` | `BAAI/bge-m3` | Model to use |
| `instruction` | `str` | `None` | Instruction prefix for encoding |
| `output_dtype` | `str` | `None` | Output dtype: float32, float16, int8, binary |
| `gpu` | `str` | `None` | Target GPU type for routing |
| `timeout_s` | `float` | `180.0` | Request timeout in seconds |
| `embed_batch_size` | `int` | `10` | Batch size for embedding multiple texts |
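
To see what `embed_batch_size` means in practice, the sketch below splits a list of texts into consecutive chunks of at most that size. It assumes batching works as simple consecutive chunking; `split_into_batches` is a hypothetical helper, not a function from `sie-llamaindex`.

```python
def split_into_batches(texts: list[str], batch_size: int) -> list[list[str]]:
    """Split texts into consecutive chunks of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [texts[i : i + batch_size] for i in range(0, len(texts), batch_size)]


texts = [f"document {i}" for i in range(25)]
batches = split_into_batches(texts, batch_size=10)
batch_sizes = [len(batch) for batch in batches]
print(batch_sizes)  # [10, 10, 5]
```

Under that assumption, 25 texts with the default batch size of 10 result in three embedding calls; a larger batch size means fewer, larger requests.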

#### TypeScript

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `baseUrl` | `string` | `http://localhost:8080` | SIE server URL |
| `modelName` | `string` | `BAAI/bge-m3` | Model to use |
| `instruction` | `string` | `undefined` | Instruction prefix for encoding |
| `outputDtype` | `DType` | `undefined` | Output dtype: float32, float16, int8, binary |
| `gpu` | `string` | `undefined` | Target GPU type for routing |
| `timeout` | `number` | `180000` | Request timeout in milliseconds |
| `embedBatchSize` | `number` | `10` | Batch size for embedding multiple texts |

### SIENodePostprocessor

#### Python

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `base_url` | `str` | `http://localhost:8080` | SIE server URL |
| `model` | `str` | `jinaai/jina-reranker-v2-base-multilingual` | Reranker model |
| `top_n` | `int` | `None` | Number of nodes to return |
| `gpu` | `str` | `None` | Target GPU type for routing |
| `options` | `dict` | `None` | Model-specific options |
| `timeout_s` | `float` | `180.0` | Request timeout in seconds |

#### TypeScript

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `baseUrl` | `string` | `http://localhost:8080` | SIE server URL |
| `modelName` | `string` | `jinaai/jina-reranker-v2-base-multilingual` | Reranker model |
| `topN` | `number` | `undefined` | Number of nodes to return |
| `gpu` | `string` | `undefined` | Target GPU type for routing |
| `timeout` | `number` | `180000` | Request timeout in milliseconds |

### create_sie_extractor_tool / createSIEExtractorTool

The extraction model determines which result types are populated. The tool returns a dict with keys `entities`, `relations`, `classifications`, and `objects`. The tool name defaults to `sie_extract` and can be changed with the `name` parameter.

#### Python

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `base_url` | `str` | `http://localhost:8080` | SIE server URL |
| `model` | `str` | `urchade/gliner_multi-v2.1` | Extraction model (GLiNER, GLiREL, GLiClass, GroundingDINO, OWL-v2) |
| `labels` | `list[str]` | `["person", "organization", "location"]` | Labels for extraction (entity types, relation types, or classification categories) |
| `gpu` | `str` | `None` | Target GPU type for routing |
| `options` | `dict` | `None` | Model-specific options |
| `timeout_s` | `float` | `180.0` | Request timeout in seconds |
| `name` | `str` | `sie_extract` | Tool name for the agent |
| `description` | `str` | Auto-generated | Tool description for the agent |

#### TypeScript

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `baseUrl` | `string` | `http://localhost:8080` | SIE server URL |
| `modelName` | `string` | `urchade/gliner_multi-v2.1` | Extraction model (GLiNER, GLiREL, GLiClass, GroundingDINO, OWL-v2) |
| `labels` | `string[]` | `["person", "organization", "location"]` | Labels for extraction (entity types, relation types, or classification categories) |
| `threshold` | `number` | `undefined` | Minimum confidence threshold (0-1) |
| `gpu` | `string` | `undefined` | Target GPU type for routing |
| `timeout` | `number` | `180000` | Request timeout in milliseconds |
| `name` | `string` | `sie_extract` | Tool name for the agent |
| `description` | `string` | Auto-generated | Tool description for the agent |

## Extraction

Source: [integrations/sie_llamaindex/src/sie_llamaindex/extractors.py](https://github.com/superlinked/sie/blob/main/integrations/sie_llamaindex/src/sie_llamaindex/extractors.py)

`create_sie_extractor_tool` (Python) / `createSIEExtractorTool` (TypeScript) returns a LlamaIndex `FunctionTool` for use with agents. It supports all extraction types: entities (GLiNER), relations (GLiREL), classifications (GLiClass), and object detection (GroundingDINO/OWL-v2). The tool returns a dict with keys `entities`, `relations`, `classifications`, and `objects`.

### Entity Extraction

#### Python

```python
from llama_index.core.agent import ReActAgent
from llama_index.llms.openai import OpenAI
from sie_llamaindex import create_sie_extractor_tool

extractor = create_sie_extractor_tool(
    base_url="http://localhost:8080",
    model="urchade/gliner_multi-v2.1",
    labels=["person", "organization", "location"],
)

agent = ReActAgent.from_tools([extractor], llm=OpenAI(model="gpt-4o-mini"))
response = agent.chat("Extract entities from: Tim Cook announced new products at Apple Park in Cupertino.")
print(response)
```

#### TypeScript

```typescript
import { OpenAI, ReActAgent } from "llamaindex";
import { createSIEExtractorTool } from "@superlinked/sie-llamaindex";

const extractor = createSIEExtractorTool({
  baseUrl: "http://localhost:8080",
  modelName: "urchade/gliner_multi-v2.1",
  labels: ["person", "organization", "location"],
});

const agent = new ReActAgent({
  tools: [extractor],
  llm: new OpenAI({ model: "gpt-4o-mini" }),
});

const response = await agent.chat({
  message: "Extract entities from: Tim Cook announced new products at Apple Park in Cupertino.",
});
console.log(response.message.content);
```

### Relation Extraction

Extract relationships between entities using GLiREL:

#### Python

```python
from sie_llamaindex import create_sie_extractor_tool

extractor = create_sie_extractor_tool(
    base_url="http://localhost:8080",
    model="jackboyla/glirel-large-v0",
    labels=["works_for", "ceo_of", "founded"],
)

# Use directly (without an agent)
result = extractor.call("Tim Cook is the CEO of Apple Inc.")
for relation in result["relations"]:
    print(f"{relation['head']} --{relation['relation']}--> {relation['tail']}")
# Tim Cook --ceo_of--> Apple Inc.
```
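
Extracted relations are a natural input for a small knowledge graph. The sketch below groups relations by head entity. It uses hand-written sample data with the `head`, `relation`, and `tail` keys shown above, and `build_graph` is a hypothetical helper.

```python
from collections import defaultdict

# Hand-written sample shaped like the relations above; not real model output.
result = {
    "relations": [
        {"head": "Tim Cook", "relation": "ceo_of", "tail": "Apple Inc."},
        {"head": "Tim Cook", "relation": "works_for", "tail": "Apple Inc."},
        {"head": "Steve Jobs", "relation": "founded", "tail": "Apple Inc."},
    ]
}


def build_graph(relations: list[dict]) -> dict[str, list[tuple[str, str]]]:
    """Map each head entity to its outgoing (relation, tail) edges."""
    graph: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for rel in relations:
        graph[rel["head"]].append((rel["relation"], rel["tail"]))
    return dict(graph)


graph = build_graph(result["relations"])
print(graph["Tim Cook"])  # [('ceo_of', 'Apple Inc.'), ('works_for', 'Apple Inc.')]
```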

#### TypeScript

```typescript
import { createSIEExtractorTool } from "@superlinked/sie-llamaindex";

const extractor = createSIEExtractorTool({
  baseUrl: "http://localhost:8080",
  modelName: "jackboyla/glirel-large-v0",
  labels: ["works_for", "ceo_of", "founded"],
});

// Use directly (without an agent)
const result = await extractor.call("Tim Cook is the CEO of Apple Inc.");
for (const relation of result.relations) {
  console.log(`${relation.head} --${relation.relation}--> ${relation.tail}`);
}
// Tim Cook --ceo_of--> Apple Inc.
```

### Text Classification

Classify text into categories using GLiClass:

#### Python

```python
from sie_llamaindex import create_sie_extractor_tool

extractor = create_sie_extractor_tool(
    base_url="http://localhost:8080",
    model="knowledgator/gliclass-base-v1.0",
    labels=["positive", "negative", "neutral"],
)

result = extractor.call("I absolutely loved this movie! The acting was superb.")
for classification in result["classifications"]:
    print(f"{classification['label']}: {classification['score']:.2f}")
# positive: 0.94
# neutral: 0.04
# negative: 0.02
```
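
Downstream code often needs a single label rather than the full score list. The sketch below picks the highest-scoring label and returns `None` when it falls under a confidence cutoff. The sample list copies the scores shown above; `pick_label` and its `min_score` default are assumptions made for this example.

```python
from typing import Optional


def pick_label(classifications: list[dict], min_score: float = 0.5) -> Optional[str]:
    """Return the top label, or None if no label reaches min_score."""
    if not classifications:
        return None
    best = max(classifications, key=lambda c: c["score"])
    return best["label"] if best["score"] >= min_score else None


# Sample shaped like the classification output above.
sample = [
    {"label": "positive", "score": 0.94},
    {"label": "neutral", "score": 0.04},
    {"label": "negative", "score": 0.02},
]
label = pick_label(sample)
print(label)  # positive
```

Returning `None` for low-confidence results lets the caller route ambiguous text to a fallback instead of acting on a weak prediction.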

#### TypeScript

```typescript
import { createSIEExtractorTool } from "@superlinked/sie-llamaindex";

const extractor = createSIEExtractorTool({
  baseUrl: "http://localhost:8080",
  modelName: "knowledgator/gliclass-base-v1.0",
  labels: ["positive", "negative", "neutral"],
});

const result = await extractor.call(
  "I absolutely loved this movie! The acting was superb."
);
for (const classification of result.classifications) {
  console.log(`${classification.label}: ${classification.score.toFixed(2)}`);
}
// positive: 0.94
// neutral: 0.04
// negative: 0.02
```

## TypeScript Feature Support

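The TypeScript `@superlinked/sie-llamaindex` package supports every feature of the Python package except multimodal embeddings, which are available in TypeScript through the SIE SDK.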

| Feature | Python | TypeScript |
|---------|--------|-----------|
| Dense embeddings | Yes | Yes |
| Sparse embeddings | Yes | Yes |
| Reranking | Yes | Yes |
| Extraction (entities, relations, classifications, objects) | Yes | Yes |
| Multimodal embeddings | Yes | Via SDK |

## What's Next

- [Rerank Results](/docs/score/) - cross-encoder reranking details
- [Extract](/docs/extract/) - extraction details (NER, relations, classification, vision)
- [Model Catalog](/models) - all supported models
- [Troubleshooting](/docs/reference/troubleshooting/) - common errors and solutions