
LlamaIndex

The sie-llamaindex (Python) and @superlinked/sie-llamaindex (TypeScript) packages provide drop-in components for LlamaIndex. Use SIEEmbedding for vector stores, SIENodePostprocessor for reranking, and create_sie_extractor_tool for extraction (entities, relations, classifications, and object detection).

pip install sie-llamaindex

This installs sie-sdk and llama-index-core as dependencies.

# Docker (recommended)
docker run -p 8080:8080 ghcr.io/superlinked/sie-server:default
# Or with GPU
docker run --gpus all -p 8080:8080 ghcr.io/superlinked/sie-server:default

SIEEmbedding implements LlamaIndex’s BaseEmbedding interface. Set it as the default embed model or use it directly.

from llama_index.core import Settings
from sie_llamaindex import SIEEmbedding
# Set as default embedding model
Settings.embed_model = SIEEmbedding(
    base_url="http://localhost:8080",
    model_name="BAAI/bge-m3"
)
# Or use directly
embed_model = SIEEmbedding(model_name="BAAI/bge-m3")
embedding = embed_model.get_text_embedding("Your text here")
print(len(embedding))  # 1024

from llama_index.core import Settings, VectorStoreIndex, Document
from sie_llamaindex import SIEEmbedding

Settings.embed_model = SIEEmbedding(model_name="BAAI/bge-m3")

documents = [
    Document(text="Machine learning uses algorithms to learn from data."),
    Document(text="The weather is sunny today."),
]
index = VectorStoreIndex.from_documents(documents)
results = index.as_query_engine().query("What is machine learning?")

Both sync and async methods are available:

# Sync
embedding = embed_model.get_text_embedding(text)
embeddings = embed_model.get_text_embedding_batch(texts)
# Async
embedding = await embed_model.aget_text_embedding(text)
query_embedding = await embed_model.aget_query_embedding(query)

SIENodePostprocessor implements BaseNodePostprocessor. Use it to rerank retrieved nodes. It works with both cross-encoder models (e.g., jinaai/jina-reranker-v2-base-multilingual) and ColBERT-style late-interaction models (e.g., jinaai/jina-colbert-v2); just change the model name.

from llama_index.core.schema import NodeWithScore, TextNode, QueryBundle
from sie_llamaindex import SIENodePostprocessor
reranker = SIENodePostprocessor(
    base_url="http://localhost:8080",
    model="jinaai/jina-reranker-v2-base-multilingual",
    top_n=3
)

nodes = [
    NodeWithScore(node=TextNode(text="Machine learning is a subset of AI."), score=0.5),
    NodeWithScore(node=TextNode(text="The weather is sunny today."), score=0.6),
    NodeWithScore(node=TextNode(text="Deep learning uses neural networks."), score=0.4),
]

reranked = reranker.postprocess_nodes(nodes, QueryBundle(query_str="What is ML?"))
for node in reranked:
    print(f"{node.score:.3f}: {node.node.get_content()[:50]}")

from llama_index.core import VectorStoreIndex
from sie_llamaindex import SIENodePostprocessor

reranker = SIENodePostprocessor(
    model="jinaai/jina-reranker-v2-base-multilingual",
    top_n=5
)

# Create query engine with reranking
query_engine = index.as_query_engine(
    node_postprocessors=[reranker],
    similarity_top_k=20  # Retrieve 20, rerank to 5
)
response = query_engine.query("What is machine learning?")

Use SIESparseEmbeddingFunction with vector stores that support hybrid search.

from llama_index.vector_stores.qdrant import QdrantVectorStore
from qdrant_client import QdrantClient
from sie_llamaindex import SIEEmbedding, SIESparseEmbeddingFunction
# Create sparse embedding function
sparse_embed_fn = SIESparseEmbeddingFunction(
    base_url="http://localhost:8080",
    model_name="BAAI/bge-m3"
)

# Create hybrid vector store
client = QdrantClient(":memory:")
vector_store = QdrantVectorStore(
    client=client,
    collection_name="hybrid_docs",
    enable_hybrid=True,
    sparse_embedding_function=sparse_embed_fn
)
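
From there, indexing and querying use LlamaIndex's standard hybrid-query path. A minimal sketch, assuming the vector_store configured above; vector_store_query_mode="hybrid" and sparse_top_k are stock LlamaIndex parameters, not SIE-specific:

from llama_index.core import Document, StorageContext, VectorStoreIndex
from sie_llamaindex import SIEEmbedding

# Index into the hybrid store: dense vectors come from SIEEmbedding,
# sparse vectors from the sparse_embed_fn configured above
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(
    [Document(text="Machine learning uses algorithms to learn from data.")],
    storage_context=storage_context,
    embed_model=SIEEmbedding(model_name="BAAI/bge-m3"),
)

# Query in hybrid mode; dense and sparse results are fused by the store
query_engine = index.as_query_engine(
    vector_store_query_mode="hybrid",
    sparse_top_k=10,
    similarity_top_k=2,
)
response = query_engine.query("What is machine learning?")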

SIEMultiModalEmbedding extends LlamaIndex’s MultiModalEmbedding base class, enabling image embedding with models like CLIP, SigLIP, and ColPali. It plugs into LlamaIndex’s multimodal pipelines (e.g. MultiModalVectorStoreIndex).

from llama_index.core import Settings
from sie_llamaindex import SIEMultiModalEmbedding
# Set as embedding model - supports both text and images
Settings.embed_model = SIEMultiModalEmbedding(
    base_url="http://localhost:8080",
    model_name="openai/clip-vit-large-patch14"
)
# Embed images
embedding = Settings.embed_model.get_image_embedding("photo.jpg")
embeddings = Settings.embed_model.get_image_embedding_batch(["img1.jpg", "img2.jpg"])
# Text embeddings still work (inherited from BaseEmbedding)
text_embedding = Settings.embed_model.get_text_embedding("A photo of a cat")

Supported models include openai/clip-vit-large-patch14, google/siglip-base-patch16-224, vidore/colpali-v1.2, and other vision-capable models in the Model Catalog.
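
Because SIEMultiModalEmbedding is a regular MultiModalEmbedding, it can back a MultiModalVectorStoreIndex. A minimal sketch, assuming a ./data folder of mixed text and image files (the folder path is a placeholder):

from llama_index.core import SimpleDirectoryReader
from llama_index.core.indices import MultiModalVectorStoreIndex
from sie_llamaindex import SIEMultiModalEmbedding

embed_model = SIEMultiModalEmbedding(model_name="openai/clip-vit-large-patch14")

# Load a folder containing both text and image files
documents = SimpleDirectoryReader("./data").load_data()

# Use the SIE model for both the text and the image embedding spaces
index = MultiModalVectorStoreIndex.from_documents(
    documents,
    embed_model=embed_model,
    image_embed_model=embed_model,
)
retriever = index.as_retriever(similarity_top_k=3)
results = retriever.retrieve("a photo of a cat")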

Complete example combining embeddings, reranking, and LLM generation:

from llama_index.core import Settings, VectorStoreIndex, Document
from llama_index.llms.openai import OpenAI
from sie_llamaindex import SIEEmbedding, SIENodePostprocessor
# 1. Configure SIE embeddings
Settings.embed_model = SIEEmbedding(
    base_url="http://localhost:8080",
    model_name="BAAI/bge-m3"
)
Settings.llm = OpenAI(model="gpt-4o-mini")

# 2. Create documents and index
documents = [
    Document(text="Machine learning is a branch of artificial intelligence."),
    Document(text="Neural networks are inspired by biological neurons."),
    Document(text="Deep learning uses multiple layers of neural networks."),
    Document(text="Python is popular for machine learning development."),
]
index = VectorStoreIndex.from_documents(documents)

# 3. Create reranker
reranker = SIENodePostprocessor(
    base_url="http://localhost:8080",
    model="jinaai/jina-reranker-v2-base-multilingual",
    top_n=2
)

# 4. Build query engine with reranking
query_engine = index.as_query_engine(
    node_postprocessors=[reranker],
    similarity_top_k=10  # Retrieve 10, rerank to 2
)

# 5. Query
response = query_engine.query("What is deep learning?")
print(response)
SIEEmbedding parameters:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| base_url | str | http://localhost:8080 | SIE server URL |
| model_name | str | BAAI/bge-m3 | Model to use |
| instruction | str | None | Instruction prefix for encoding |
| output_dtype | str | None | Output dtype: float32, float16, int8, binary |
| gpu | str | None | Target GPU type for routing |
| timeout_s | float | 180.0 | Request timeout in seconds |
| embed_batch_size | int | 10 | Batch size for embedding multiple texts |
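
As a quick illustration of the optional parameters, a sketch of an embedding model configured with an instruction prefix and quantized output (the instruction string and dtype choice are illustrative, not required values):

from sie_llamaindex import SIEEmbedding

embed_model = SIEEmbedding(
    base_url="http://localhost:8080",
    model_name="BAAI/bge-m3",
    instruction="Represent this sentence for retrieval:",  # illustrative prefix
    output_dtype="int8",  # one of: float32, float16, int8, binary
    embed_batch_size=32,
)
embedding = embed_model.get_text_embedding("Your text here")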
SIENodePostprocessor parameters:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| base_url | str | http://localhost:8080 | SIE server URL |
| model | str | jinaai/jina-reranker-v2-base-multilingual | Reranker model |
| top_n | int | None | Number of nodes to return |
| gpu | str | None | Target GPU type for routing |
| options | dict | None | Model-specific options |
| timeout_s | float | 180.0 | Request timeout in seconds |

create_sie_extractor_tool / createSIEExtractorTool


The tool returns a dict with the keys entities, relations, classifications, and objects; the extraction model determines which of these are populated. The default tool name is "sie_extract".

create_sie_extractor_tool parameters:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| base_url | str | http://localhost:8080 | SIE server URL |
| model | str | urchade/gliner_multi-v2.1 | Extraction model (GLiNER, GLiREL, GLiClass, GroundingDINO, OWL-v2) |
| labels | list[str] | ["person", "organization", "location"] | Labels for extraction (entity types, relation types, or classification categories) |
| gpu | str | None | Target GPU type for routing |
| options | dict | None | Model-specific options |
| timeout_s | float | 180.0 | Request timeout in seconds |
| name | str | sie_extract | Tool name for the agent |
| description | str | Auto-generated | Tool description for the agent |

create_sie_extractor_tool (Python) / createSIEExtractorTool (TypeScript) returns a LlamaIndex FunctionTool for use with agents. It supports all extraction types: entities (GLiNER), relations (GLiREL), classifications (GLiClass), and object detection (GroundingDINO/OWL-v2).

from llama_index.core.agent import ReActAgent
from llama_index.llms.openai import OpenAI
from sie_llamaindex import create_sie_extractor_tool
extractor = create_sie_extractor_tool(
    base_url="http://localhost:8080",
    model="urchade/gliner_multi-v2.1",
    labels=["person", "organization", "location"],
)
agent = ReActAgent.from_tools([extractor], llm=OpenAI(model="gpt-4o-mini"))
response = agent.chat("Extract entities from: Tim Cook announced new products at Apple Park in Cupertino.")
print(response)
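
The tool can also be called directly, mirroring the relation and classification examples below. The per-entity field names ('label', 'text') are assumptions based on typical GLiNER output; check the returned dict in your version:

# Direct call (no agent): inspect the raw result dict
result = extractor.call("Tim Cook announced new products at Apple Park in Cupertino.")
for entity in result["entities"]:
    print(f"{entity['label']}: {entity['text']}")  # field names assumed
# e.g. person: Tim Cook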

Extract relationships between entities using GLiREL:

from sie_llamaindex import create_sie_extractor_tool
extractor = create_sie_extractor_tool(
    base_url="http://localhost:8080",
    model="jackboyla/glirel-large-v0",
    labels=["works_for", "ceo_of", "founded"],
)
# Use directly (without an agent)
result = extractor.call("Tim Cook is the CEO of Apple Inc.")
for relation in result["relations"]:
    print(f"{relation['head']} --{relation['relation']}--> {relation['tail']}")
# Tim Cook --ceo_of--> Apple Inc.

Classify text into categories using GLiClass:

from sie_llamaindex import create_sie_extractor_tool
extractor = create_sie_extractor_tool(
    base_url="http://localhost:8080",
    model="knowledgator/gliclass-base-v1.0",
    labels=["positive", "negative", "neutral"],
)
result = extractor.call("I absolutely loved this movie! The acting was superb.")
for classification in result["classifications"]:
    print(f"{classification['label']}: {classification['score']:.2f}")
# positive: 0.94
# neutral: 0.04
# negative: 0.02
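
Object detection follows the same pattern with a detection model and visual labels. A sketch under stated assumptions: the model id is one plausible example from the GroundingDINO family, the tool is assumed to accept an image path as input, and the per-object field names are illustrative; consult the Model Catalog and SDK reference for the exact interface:

from sie_llamaindex import create_sie_extractor_tool

extractor = create_sie_extractor_tool(
    base_url="http://localhost:8080",
    model="IDEA-Research/grounding-dino-base",  # assumed model id
    labels=["cat", "dog", "person"],
)
# Assumes the tool accepts an image path; field names below are illustrative
result = extractor.call("photo.jpg")
for obj in result["objects"]:
    print(f"{obj['label']}: {obj['score']:.2f}")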

The TypeScript @superlinked/sie-llamaindex package supports all core features.

| Feature | Python | TypeScript |
| --- | --- | --- |
| Dense embeddings | Yes | Yes |
| Sparse embeddings | Yes | Yes |
| Reranking | Yes | Yes |
| Extraction (entities, relations, classifications, objects) | Yes | Yes |
| Multimodal embeddings | Yes | Via SDK |
