# LlamaIndex
The `sie-llamaindex` package (Python) and `@superlinked/sie-llamaindex` package (TypeScript) provide drop-in components for LlamaIndex. Use `SIEEmbedding` for vector stores, `SIENodePostprocessor` for reranking, and `create_sie_extractor_tool` for extraction (entities, relations, classifications, and object detection).
## Installation

**Python**

```shell
pip install sie-llamaindex
```

This installs `sie-sdk` and `llama-index-core` as dependencies.

**TypeScript**

```shell
pnpm add @superlinked/sie-llamaindex
```

This installs `@superlinked/sie-sdk` and `llamaindex` as dependencies.
## Start the Server

```shell
# Docker (recommended)
docker run -p 8080:8080 ghcr.io/superlinked/sie-server:default

# Or with GPU
docker run --gpus all -p 8080:8080 ghcr.io/superlinked/sie-server:default
```

## Embeddings

`SIEEmbedding` implements LlamaIndex's `BaseEmbedding` interface. Set it as the default embed model or use it directly.
**Python**

```python
from llama_index.core import Settings
from sie_llamaindex import SIEEmbedding

# Set as default embedding model
Settings.embed_model = SIEEmbedding(
    base_url="http://localhost:8080",
    model_name="BAAI/bge-m3",
)

# Or use directly
embed_model = SIEEmbedding(model_name="BAAI/bge-m3")
embedding = embed_model.get_text_embedding("Your text here")
print(len(embedding))  # 1024
```

**TypeScript**

```typescript
import { Settings } from "llamaindex";
import { SIEEmbedding } from "@superlinked/sie-llamaindex";

// Set as default embedding model
Settings.embedModel = new SIEEmbedding({
  baseUrl: "http://localhost:8080",
  modelName: "BAAI/bge-m3",
});

// Or use directly
const embedModel = new SIEEmbedding({ modelName: "BAAI/bge-m3" });
const embedding = await embedModel.getTextEmbedding("Your text here");
console.log(embedding.length); // 1024
```

### With VectorStoreIndex
**Python**

```python
from llama_index.core import Settings, VectorStoreIndex, Document
from sie_llamaindex import SIEEmbedding

Settings.embed_model = SIEEmbedding(model_name="BAAI/bge-m3")

documents = [
    Document(text="Machine learning uses algorithms to learn from data."),
    Document(text="The weather is sunny today."),
]

index = VectorStoreIndex.from_documents(documents)
results = index.as_query_engine().query("What is machine learning?")
```

**TypeScript**

```typescript
import { Settings, VectorStoreIndex, Document } from "llamaindex";
import { SIEEmbedding } from "@superlinked/sie-llamaindex";

Settings.embedModel = new SIEEmbedding({ modelName: "BAAI/bge-m3" });

const documents = [
  new Document({ text: "Machine learning uses algorithms to learn from data." }),
  new Document({ text: "The weather is sunny today." }),
];

const index = await VectorStoreIndex.fromDocuments(documents);
const queryEngine = index.asQueryEngine();
const results = await queryEngine.query({ query: "What is machine learning?" });
```

### Async Support
In Python, both sync and async methods are available:

```python
# Sync
embedding = embed_model.get_text_embedding(text)
embeddings = embed_model.get_text_embedding_batch(texts)

# Async
embedding = await embed_model.aget_text_embedding(text)
query_embedding = await embed_model.aget_query_embedding(query)
```

In TypeScript, all methods are async by default:

```typescript
// Single text
const embedding = await embedModel.getTextEmbedding(text);

// Multiple texts
const embeddings = await embedModel.getTextEmbeddings(texts);
```

## Reranking
`SIENodePostprocessor` implements `BaseNodePostprocessor`. Use it to rerank retrieved nodes. It works with both cross-encoder models (e.g., `jinaai/jina-reranker-v2-base-multilingual`) and ColBERT/late-interaction models (e.g., `jinaai/jina-colbert-v2`); just change the model name.
**Python**

```python
from llama_index.core.schema import NodeWithScore, TextNode, QueryBundle
from sie_llamaindex import SIENodePostprocessor

reranker = SIENodePostprocessor(
    base_url="http://localhost:8080",
    model="jinaai/jina-reranker-v2-base-multilingual",
    top_n=3,
)

nodes = [
    NodeWithScore(node=TextNode(text="Machine learning is a subset of AI."), score=0.5),
    NodeWithScore(node=TextNode(text="The weather is sunny today."), score=0.6),
    NodeWithScore(node=TextNode(text="Deep learning uses neural networks."), score=0.4),
]

reranked = reranker.postprocess_nodes(nodes, QueryBundle(query_str="What is ML?"))

for node in reranked:
    print(f"{node.score:.3f}: {node.node.get_content()[:50]}")
```

**TypeScript**

```typescript
import { TextNode } from "llamaindex";
import { SIENodePostprocessor } from "@superlinked/sie-llamaindex";

const reranker = new SIENodePostprocessor({
  baseUrl: "http://localhost:8080",
  modelName: "jinaai/jina-reranker-v2-base-multilingual",
  topN: 3,
});

const nodes = [
  { node: new TextNode({ text: "Machine learning is a subset of AI." }), score: 0.5 },
  { node: new TextNode({ text: "The weather is sunny today." }), score: 0.6 },
  { node: new TextNode({ text: "Deep learning uses neural networks." }), score: 0.4 },
];

const reranked = await reranker.postprocessNodes(nodes, "What is ML?");

for (const node of reranked) {
  console.log(`${node.score?.toFixed(3)}: ${node.node.getContent().slice(0, 50)}`);
}
```

### With Query Engine
**Python**

```python
from llama_index.core import VectorStoreIndex
from sie_llamaindex import SIENodePostprocessor

reranker = SIENodePostprocessor(
    model="jinaai/jina-reranker-v2-base-multilingual",
    top_n=5,
)

# Create query engine with reranking
query_engine = index.as_query_engine(
    node_postprocessors=[reranker],
    similarity_top_k=20,  # Retrieve 20, rerank to 5
)

response = query_engine.query("What is machine learning?")
```

**TypeScript**

```typescript
import { SIENodePostprocessor } from "@superlinked/sie-llamaindex";

const reranker = new SIENodePostprocessor({
  modelName: "jinaai/jina-reranker-v2-base-multilingual",
  topN: 5,
});

// Create query engine with reranking
const queryEngine = index.asQueryEngine({
  nodePostprocessors: [reranker],
  similarityTopK: 20, // Retrieve 20, rerank to 5
});

const response = await queryEngine.query({ query: "What is machine learning?" });
```

## Hybrid Search
Use `SIESparseEmbeddingFunction` with vector stores that support hybrid search.
**Python**

```python
from llama_index.vector_stores.qdrant import QdrantVectorStore
from qdrant_client import QdrantClient
from sie_llamaindex import SIEEmbedding, SIESparseEmbeddingFunction

# Create sparse embedding function
sparse_embed_fn = SIESparseEmbeddingFunction(
    base_url="http://localhost:8080",
    model_name="BAAI/bge-m3",
)

# Create hybrid vector store
client = QdrantClient(":memory:")
vector_store = QdrantVectorStore(
    client=client,
    collection_name="hybrid_docs",
    enable_hybrid=True,
    sparse_embedding_function=sparse_embed_fn,
)
```

**TypeScript**

```typescript
import { QdrantVectorStore } from "llamaindex";
import { QdrantClient } from "@qdrant/js-client-rest";
import { SIEEmbedding, SIESparseEmbeddingFunction } from "@superlinked/sie-llamaindex";

// Create sparse embedding function
const sparseEmbedFn = new SIESparseEmbeddingFunction({
  baseUrl: "http://localhost:8080",
  modelName: "BAAI/bge-m3",
});

// Create hybrid vector store
const client = new QdrantClient({ url: "http://localhost:6333" });
const vectorStore = new QdrantVectorStore({
  client,
  collectionName: "hybrid_docs",
  enableHybrid: true,
  sparseEmbeddingFunction: sparseEmbedFn,
});
```

## Multimodal Embeddings (Python only)
`SIEMultiModalEmbedding` extends LlamaIndex's `MultiModalEmbedding` base class, enabling image embedding with models like CLIP, SigLIP, and ColPali. It plugs into LlamaIndex's multimodal pipelines (e.g. `MultiModalVectorStoreIndex`).
```python
from llama_index.core import Settings
from sie_llamaindex import SIEMultiModalEmbedding

# Set as embedding model - supports both text and images
Settings.embed_model = SIEMultiModalEmbedding(
    base_url="http://localhost:8080",
    model_name="openai/clip-vit-large-patch14",
)

# Embed images
embedding = Settings.embed_model.get_image_embedding("photo.jpg")
embeddings = Settings.embed_model.get_image_embedding_batch(["img1.jpg", "img2.jpg"])

# Text embeddings still work (inherited from BaseEmbedding)
text_embedding = Settings.embed_model.get_text_embedding("A photo of a cat")
```

Supported models include `openai/clip-vit-large-patch14`, `google/siglip-base-patch16-224`, `vidore/colpali-v1.2`, and other vision-capable models in the Model Catalog.
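Models like CLIP place text and image embeddings in a shared vector space, so cross-modal retrieval reduces to nearest-neighbor search over cosine similarity. A standalone sketch of that ranking step, using tiny hand-made vectors as stand-ins for real model output (no SIE server required):

```python
import math

def cosine(a, b):
    # Cosine similarity between two vectors
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Stand-ins for embeddings from a shared text/image space
text_embedding = [0.9, 0.1, 0.0]  # e.g. "A photo of a cat"
image_embeddings = {
    "cat.jpg": [0.8, 0.2, 0.1],
    "car.jpg": [0.1, 0.9, 0.3],
}

# Rank images by similarity to the text query
ranked = sorted(
    image_embeddings,
    key=lambda name: cosine(text_embedding, image_embeddings[name]),
    reverse=True,
)
print(ranked)  # ['cat.jpg', 'car.jpg']
```

With real embeddings the only change is swapping the hand-made vectors for the outputs of `get_text_embedding` and `get_image_embedding`.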
## Full RAG Pipeline (Python)

Complete example combining embeddings, reranking, and LLM generation:

```python
from llama_index.core import Settings, VectorStoreIndex, Document
from llama_index.llms.openai import OpenAI
from sie_llamaindex import SIEEmbedding, SIENodePostprocessor

# 1. Configure SIE embeddings
Settings.embed_model = SIEEmbedding(
    base_url="http://localhost:8080",
    model_name="BAAI/bge-m3",
)
Settings.llm = OpenAI(model="gpt-4o-mini")

# 2. Create documents and index
documents = [
    Document(text="Machine learning is a branch of artificial intelligence."),
    Document(text="Neural networks are inspired by biological neurons."),
    Document(text="Deep learning uses multiple layers of neural networks."),
    Document(text="Python is popular for machine learning development."),
]

index = VectorStoreIndex.from_documents(documents)

# 3. Create reranker
reranker = SIENodePostprocessor(
    base_url="http://localhost:8080",
    model="jinaai/jina-reranker-v2-base-multilingual",
    top_n=2,
)

# 4. Build query engine with reranking
query_engine = index.as_query_engine(
    node_postprocessors=[reranker],
    similarity_top_k=10,  # Retrieve 10, rerank to 2
)

# 5. Query
response = query_engine.query("What is deep learning?")
print(response)
```

## Configuration Options
### SIEEmbedding

**Python**

| Parameter | Type | Default | Description |
|---|---|---|---|
| `base_url` | `str` | `http://localhost:8080` | SIE server URL |
| `model_name` | `str` | `BAAI/bge-m3` | Model to use |
| `instruction` | `str` | `None` | Instruction prefix for encoding |
| `output_dtype` | `str` | `None` | Output dtype: `float32`, `float16`, `int8`, `binary` |
| `gpu` | `str` | `None` | Target GPU type for routing |
| `timeout_s` | `float` | `180.0` | Request timeout in seconds |
| `embed_batch_size` | `int` | `10` | Batch size for embedding multiple texts |
**TypeScript**

| Parameter | Type | Default | Description |
|---|---|---|---|
| `baseUrl` | `string` | `http://localhost:8080` | SIE server URL |
| `modelName` | `string` | `BAAI/bge-m3` | Model to use |
| `instruction` | `string` | `undefined` | Instruction prefix for encoding |
| `outputDtype` | `DType` | `undefined` | Output dtype: `float32`, `float16`, `int8`, `binary` |
| `gpu` | `string` | `undefined` | Target GPU type for routing |
| `timeout` | `number` | `180000` | Request timeout in milliseconds |
| `embedBatchSize` | `number` | `10` | Batch size for embedding multiple texts |
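The `output_dtype` options trade retrieval accuracy for storage: `int8` is roughly 4x smaller than `float32` and `binary` roughly 32x. As a standalone illustration of the idea (a naive sketch, not SIE's actual quantization scheme):

```python
# A float32-style embedding as returned by the model
embedding = [0.12, -0.56, 0.88, -0.03]

# int8: scale components into [-127, 127] by the max absolute value
scale = max(abs(x) for x in embedding)
int8_vec = [round(x / scale * 127) for x in embedding]

# binary: keep only the sign of each component
binary_vec = [1 if x > 0 else 0 for x in embedding]

print(int8_vec)    # [17, -81, 127, -4]
print(binary_vec)  # [1, 0, 1, 0]
```

Quantized vectors are cheaper to store and compare, at the cost of some precision; check your vector store's support for the chosen dtype before enabling it.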
### SIENodePostprocessor

**Python**

| Parameter | Type | Default | Description |
|---|---|---|---|
| `base_url` | `str` | `http://localhost:8080` | SIE server URL |
| `model` | `str` | `jinaai/jina-reranker-v2-base-multilingual` | Reranker model |
| `top_n` | `int` | `None` | Number of nodes to return |
| `gpu` | `str` | `None` | Target GPU type for routing |
| `options` | `dict` | `None` | Model-specific options |
| `timeout_s` | `float` | `180.0` | Request timeout in seconds |
**TypeScript**

| Parameter | Type | Default | Description |
|---|---|---|---|
| `baseUrl` | `string` | `http://localhost:8080` | SIE server URL |
| `modelName` | `string` | `jinaai/jina-reranker-v2-base-multilingual` | Reranker model |
| `topN` | `number` | `undefined` | Number of nodes to return |
| `gpu` | `string` | `undefined` | Target GPU type for routing |
| `timeout` | `number` | `180000` | Request timeout in milliseconds |
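The `model` parameter accepts both reranker families mentioned earlier: a cross-encoder scores each (query, document) pair jointly in one forward pass, while a ColBERT-style late-interaction model keeps per-token vectors and combines them with MaxSim. A standalone sketch of MaxSim scoring over toy token vectors (illustrative only, not the server's implementation):

```python
def maxsim(query_tokens, doc_tokens):
    # For each query token vector, take its best dot-product match
    # among the document's token vectors, then sum over query tokens.
    total = 0.0
    for q in query_tokens:
        total += max(sum(qi * di for qi, di in zip(q, d)) for d in doc_tokens)
    return total

# Toy 2-dimensional token embeddings
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[0.9, 0.1], [0.2, 0.8]]  # covers both query tokens well
doc_b = [[0.1, 0.1], [0.2, 0.1]]  # covers neither

print(maxsim(query, doc_a) > maxsim(query, doc_b))  # True
```

Because scoring decomposes per token, late-interaction models can precompute document token vectors offline, whereas cross-encoders must run query and document through the model together at query time.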
### create_sie_extractor_tool / createSIEExtractorTool

The extraction model determines which result types are populated. The tool returns a dict with keys `entities`, `relations`, `classifications`, and `objects`. The tool name is `sie_extract`.

**Python**

| Parameter | Type | Default | Description |
|---|---|---|---|
| `base_url` | `str` | `http://localhost:8080` | SIE server URL |
| `model` | `str` | `urchade/gliner_multi-v2.1` | Extraction model (GLiNER, GLiREL, GLiClass, GroundingDINO, OWL-v2) |
| `labels` | `list[str]` | `["person", "organization", "location"]` | Labels for extraction (entity types, relation types, or classification categories) |
| `gpu` | `str` | `None` | Target GPU type for routing |
| `options` | `dict` | `None` | Model-specific options |
| `timeout_s` | `float` | `180.0` | Request timeout in seconds |
| `name` | `str` | `sie_extract` | Tool name for the agent |
| `description` | `str` | Auto-generated | Tool description for the agent |
**TypeScript**

| Parameter | Type | Default | Description |
|---|---|---|---|
| `baseUrl` | `string` | `http://localhost:8080` | SIE server URL |
| `modelName` | `string` | `urchade/gliner_multi-v2.1` | Extraction model (GLiNER, GLiREL, GLiClass, GroundingDINO, OWL-v2) |
| `labels` | `string[]` | `["person", "organization", "location"]` | Labels for extraction (entity types, relation types, or classification categories) |
| `threshold` | `number` | `undefined` | Minimum confidence threshold (0-1) |
| `gpu` | `string` | `undefined` | Target GPU type for routing |
| `timeout` | `number` | `180000` | Request timeout in milliseconds |
| `name` | `string` | `sie_extract` | Tool name for the agent |
| `description` | `string` | Auto-generated | Tool description for the agent |
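Because the tool returns a plain dict keyed by `entities`, `relations`, `classifications`, and `objects`, post-filtering its output is straightforward. A standalone sketch of filtering low-confidence entities; the per-entity fields (`text`, `label`, `score`) are assumptions for illustration and may differ from your SIE version's actual shape:

```python
# Hypothetical extraction result shaped like the documented return dict.
# The per-entity field names here are illustrative assumptions.
result = {
    "entities": [
        {"text": "Tim Cook", "label": "person", "score": 0.97},
        {"text": "Apple", "label": "organization", "score": 0.94},
        {"text": "products", "label": "organization", "score": 0.21},
    ],
    "relations": [],
    "classifications": [],
    "objects": [],
}

# Keep only entities at or above a confidence threshold
threshold = 0.5
confident = [e for e in result["entities"] if e["score"] >= threshold]
print([e["text"] for e in confident])  # ['Tim Cook', 'Apple']
```

In TypeScript the same cut-off can be applied server-side via the `threshold` parameter instead of filtering client-side.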
## Extraction

`create_sie_extractor_tool` (Python) / `createSIEExtractorTool` (TypeScript) returns a LlamaIndex `FunctionTool` for use with agents. It supports all extraction types: entities (GLiNER), relations (GLiREL), classifications (GLiClass), and object detection (GroundingDINO/OWL-v2). The tool returns a dict with keys `entities`, `relations`, `classifications`, and `objects`.
### Entity Extraction

**Python**

```python
from llama_index.core.agent import ReActAgent
from llama_index.llms.openai import OpenAI
from sie_llamaindex import create_sie_extractor_tool

extractor = create_sie_extractor_tool(
    base_url="http://localhost:8080",
    model="urchade/gliner_multi-v2.1",
    labels=["person", "organization", "location"],
)

agent = ReActAgent.from_tools([extractor], llm=OpenAI(model="gpt-4o-mini"))
response = agent.chat("Extract entities from: Tim Cook announced new products at Apple Park in Cupertino.")
print(response)
```

**TypeScript**

```typescript
import { OpenAI, ReActAgent } from "llamaindex";
import { createSIEExtractorTool } from "@superlinked/sie-llamaindex";

const extractor = createSIEExtractorTool({
  baseUrl: "http://localhost:8080",
  modelName: "urchade/gliner_multi-v2.1",
  labels: ["person", "organization", "location"],
});

const agent = new ReActAgent({
  tools: [extractor],
  llm: new OpenAI({ model: "gpt-4o-mini" }),
});

const response = await agent.chat({
  message: "Extract entities from: Tim Cook announced new products at Apple Park in Cupertino.",
});
console.log(response.message.content);
```

### Relation Extraction
Extract relationships between entities using GLiREL:

**Python**

```python
from sie_llamaindex import create_sie_extractor_tool

extractor = create_sie_extractor_tool(
    base_url="http://localhost:8080",
    model="jackboyla/glirel-large-v0",
    labels=["works_for", "ceo_of", "founded"],
)

# Use directly (without an agent)
result = extractor.call("Tim Cook is the CEO of Apple Inc.")
for relation in result["relations"]:
    print(f"{relation['head']} --{relation['relation']}--> {relation['tail']}")
# Tim Cook --ceo_of--> Apple Inc.
```

**TypeScript**

```typescript
import { createSIEExtractorTool } from "@superlinked/sie-llamaindex";

const extractor = createSIEExtractorTool({
  baseUrl: "http://localhost:8080",
  modelName: "jackboyla/glirel-large-v0",
  labels: ["works_for", "ceo_of", "founded"],
});

// Use directly (without an agent)
const result = await extractor.call("Tim Cook is the CEO of Apple Inc.");
for (const relation of result.relations) {
  console.log(`${relation.head} --${relation.relation}--> ${relation.tail}`);
}
// Tim Cook --ceo_of--> Apple Inc.
```

### Text Classification
Classify text into categories using GLiClass:

**Python**

```python
from sie_llamaindex import create_sie_extractor_tool

extractor = create_sie_extractor_tool(
    base_url="http://localhost:8080",
    model="knowledgator/gliclass-base-v1.0",
    labels=["positive", "negative", "neutral"],
)

result = extractor.call("I absolutely loved this movie! The acting was superb.")
for classification in result["classifications"]:
    print(f"{classification['label']}: {classification['score']:.2f}")
# positive: 0.94
# neutral: 0.04
# negative: 0.02
```

**TypeScript**

```typescript
import { createSIEExtractorTool } from "@superlinked/sie-llamaindex";

const extractor = createSIEExtractorTool({
  baseUrl: "http://localhost:8080",
  modelName: "knowledgator/gliclass-base-v1.0",
  labels: ["positive", "negative", "neutral"],
});

const result = await extractor.call(
  "I absolutely loved this movie! The acting was superb."
);
for (const classification of result.classifications) {
  console.log(`${classification.label}: ${classification.score.toFixed(2)}`);
}
// positive: 0.94
// neutral: 0.04
// negative: 0.02
```

## TypeScript Feature Support
The TypeScript `@superlinked/sie-llamaindex` package supports all core features.
| Feature | Python | TypeScript |
|---|---|---|
| Dense embeddings | Yes | Yes |
| Sparse embeddings | Yes | Yes |
| Reranking | Yes | Yes |
| Extraction (entities, relations, classifications, objects) | Yes | Yes |
| Multimodal embeddings | Yes | Via SDK |
## What’s Next

- Rerank Results - cross-encoder reranking details
- Extract - extraction details (NER, relations, classification, vision)
- Model Catalog - all supported models
- Troubleshooting - common errors and solutions