
CrewAI

The sie-crewai package provides CrewAI tools and embedders: SIERerankerTool for reranking, SIEExtractorTool for extraction (entities, relations, classifications, and object detection), and SIESparseEmbedder for hybrid search.

pip install sie-crewai

This installs sie-sdk and crewai as dependencies.

Start an SIE server:
# Docker (recommended)
docker run -p 8080:8080 ghcr.io/superlinked/sie-server:default
# Or with GPU
docker run --gpus all -p 8080:8080 ghcr.io/superlinked/sie-server:default

SIE integrates with CrewAI through two embedding approaches:

  1. Dense embeddings - Use SIE’s OpenAI-compatible API with CrewAI’s built-in embedder config
  2. Sparse embeddings - Use SIESparseEmbedder for hybrid search workflows

Configure CrewAI to use SIE’s OpenAI-compatible endpoint:

from crewai import Crew

crew = Crew(
    agents=[...],
    tasks=[...],
    embedder={
        "provider": "openai",
        "config": {
            "api_base": "http://localhost:8080/v1",
            "model": "BAAI/bge-m3"
        }
    }
)

Use SIESparseEmbedder for sparse vectors in hybrid search:

from sie_crewai import SIESparseEmbedder

sparse_embedder = SIESparseEmbedder(
    base_url="http://localhost:8080",
    model="BAAI/bge-m3"
)

# Embed documents
sparse_vectors = sparse_embedder.embed_documents([
    "Machine learning uses algorithms to learn from data.",
    "The weather is sunny today."
])
print(sparse_vectors[0].keys())  # dict_keys(['indices', 'values'])

# Embed a query (uses is_query=True for asymmetric models)
query_vector = sparse_embedder.embed_query("What is machine learning?")
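Each sparse vector is a dict of parallel `indices` and `values` lists, as the `keys()` call above shows. If you score documents yourself rather than delegating to a vector DB, query-document relevance reduces to a sparse dot product over shared indices. A minimal sketch (the `sparse_dot` helper is illustrative, not part of sie-crewai):

```python
def sparse_dot(query_vec: dict, doc_vec: dict) -> float:
    """Dot product of two sparse vectors shaped {'indices': [...], 'values': [...]}."""
    doc_map = dict(zip(doc_vec["indices"], doc_vec["values"]))
    # Only indices present in both vectors contribute to the score
    return sum(v * doc_map.get(i, 0.0) for i, v in zip(query_vec["indices"], query_vec["values"]))

doc = {"indices": [3, 17, 42], "values": [0.5, 1.2, 0.8]}
query = {"indices": [17, 99], "values": [1.0, 0.3]}
print(sparse_dot(query, doc))  # 1.2 -- only index 17 overlaps
```

In practice a vector DB with sparse-vector support does this for you; the sketch is only to make the data shape concrete.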

Complete example using SIE embeddings with a CrewAI agent for hybrid search:

from crewai import Agent, Crew, Task
from sie_crewai import SIESparseEmbedder

# 1. Configure dense embeddings via OpenAI-compatible API
embedder_config = {
    "provider": "openai",
    "config": {
        "api_base": "http://localhost:8080/v1",
        "model": "BAAI/bge-m3"
    }
}

# 2. Set up sparse embedder for hybrid search
sparse_embedder = SIESparseEmbedder(
    base_url="http://localhost:8080",
    model="BAAI/bge-m3"
)

# 3. Prepare your corpus with both dense and sparse embeddings
corpus = [
    "Machine learning is a branch of artificial intelligence.",
    "Neural networks are inspired by biological neurons.",
    "Deep learning uses multiple layers of neural networks.",
]

# Get sparse embeddings for your vector database
sparse_vectors = sparse_embedder.embed_documents(corpus)
# Store sparse_vectors in your vector DB (Qdrant, Weaviate, etc.)

# 4. Create a research agent
researcher = Agent(
    role="Research Analyst",
    goal="Find and analyze information from the knowledge base",
    backstory="Expert at finding relevant information using semantic search.",
    verbose=True
)

# 5. Define the research task
research_task = Task(
    description="Search the knowledge base for information about deep learning.",
    expected_output="A summary of findings about deep learning.",
    agent=researcher
)

# 6. Create and run the crew
crew = Crew(
    agents=[researcher],
    tasks=[research_task],
    embedder=embedder_config,
    verbose=True
)
result = crew.kickoff()
print(result)
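The example above stores sparse vectors in an external vector DB and leaves result fusion to it. If you need to merge the dense and sparse result lists yourself, reciprocal rank fusion (RRF) is a simple, score-free option. A sketch assuming each retriever returns document IDs in relevance order (`rrf_fuse` and the constant `k=60` are illustrative choices, not sie-crewai API):

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked ID lists with reciprocal rank fusion: score = sum of 1/(k + rank)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest combined score first
    return sorted(scores, key=scores.get, reverse=True)

dense_hits = ["doc3", "doc1", "doc2"]    # from dense embedding search
sparse_hits = ["doc3", "doc1", "doc4"]   # from sparse embedding search
print(rrf_fuse([dense_hits, sparse_hits]))  # doc3 first: top-ranked in both lists
```

RRF avoids normalizing dense and sparse scores against each other, which is why it is a common default for hybrid search.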

SIERerankerTool is a CrewAI BaseTool that reranks documents by relevance to a query. Agents can use it to improve search quality.

from crewai import Agent, Crew, Task
from sie_crewai import SIERerankerTool

reranker = SIERerankerTool(
    base_url="http://localhost:8080",
    model="jinaai/jina-reranker-v2-base-multilingual",
)

researcher = Agent(
    role="Research Analyst",
    goal="Find the most relevant information",
    backstory="Expert at surfacing the most relevant documents for a query.",
    tools=[reranker],
)

task = Task(
    description="Rerank these documents for the query 'What is deep learning?'",
    expected_output="The most relevant documents.",
    agent=researcher,
)

crew = Crew(agents=[researcher], tasks=[task])
result = crew.kickoff()
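As a rough mental model, reranking reorders candidates by a query-conditioned relevance score. The toy stand-in below uses term overlap in place of the actual model's scoring (purely illustrative; the real scoring happens on the SIE server):

```python
import re

def toy_rerank(query: str, docs: list[str]) -> list[str]:
    """Order docs by naive term overlap with the query (a stand-in for real reranker scoring)."""
    q_terms = set(re.findall(r"\w+", query.lower()))

    def overlap(doc: str) -> int:
        return len(q_terms & set(re.findall(r"\w+", doc.lower())))

    return sorted(docs, key=overlap, reverse=True)

docs = [
    "The weather is sunny today.",
    "Deep learning is a subfield of machine learning.",
]
print(toy_rerank("What is deep learning?", docs)[0])
# Deep learning is a subfield of machine learning.
```

A cross-encoder reranker scores each query-document pair jointly, which is slower than embedding similarity but typically much more accurate, hence the retrieve-then-rerank pattern.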

SIEExtractorTool is a CrewAI BaseTool that extracts structured data from text. It supports all extraction types: entities (GLiNER), relations (GLiREL), classifications (GLiClass), and object detection (GroundingDINO/OWL-v2). Its _run() method formats all four result types into the string output, with separate sections for entities, relations, classifications, and objects.

from crewai import Agent, Crew, Task
from sie_crewai import SIEExtractorTool

extractor = SIEExtractorTool(
    base_url="http://localhost:8080",
    model="urchade/gliner_multi-v2.1",
    labels=["person", "organization", "location"],
)

analyst = Agent(
    role="Data Analyst",
    goal="Extract key entities from documents",
    backstory="Specialist in turning unstructured text into structured data.",
    tools=[extractor],
)

task = Task(
    description="Extract all people, organizations, and locations from: 'Tim Cook announced new products at Apple Park in Cupertino.'",
    expected_output="A list of extracted entities.",
    agent=analyst,
)

crew = Crew(agents=[analyst], tasks=[task])
result = crew.kickoff()

Extract relationships between entities using GLiREL:

from sie_crewai import SIEExtractorTool

extractor = SIEExtractorTool(
    base_url="http://localhost:8080",
    model="jackboyla/glirel-large-v0",
    labels=["works_for", "ceo_of", "founded"],
)

# Use with an agent, or call directly:
result = extractor._run("Tim Cook is the CEO of Apple Inc.")
print(result)
# Relations:
# Tim Cook --ceo_of--> Apple Inc. (score: 0.92)
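If you consume the tool's string output programmatically rather than letting an agent read it, relation lines can be parsed back into tuples. A sketch assuming the `A --label--> B (score: X)` line format shown in the example above (the parser is ours, not part of sie-crewai):

```python
import re

# Matches lines like: Tim Cook --ceo_of--> Apple Inc. (score: 0.92)
REL_LINE = re.compile(r"^(.+?) --(\w+)--> (.+?) \(score: ([0-9.]+)\)$")

def parse_relations(output: str) -> list[tuple[str, str, str, float]]:
    """Turn 'A --label--> B (score: X)' lines into (head, label, tail, score) tuples."""
    triples = []
    for line in output.splitlines():
        m = REL_LINE.match(line.strip())
        if m:
            head, label, tail, score = m.groups()
            triples.append((head, label, tail, float(score)))
    return triples

out = "Relations:\nTim Cook --ceo_of--> Apple Inc. (score: 0.92)"
print(parse_relations(out))  # [('Tim Cook', 'ceo_of', 'Apple Inc.', 0.92)]
```

If the formatted output changes between package versions, adjust the regex accordingly.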

Classify text into categories using GLiClass:

from sie_crewai import SIEExtractorTool

extractor = SIEExtractorTool(
    base_url="http://localhost:8080",
    model="knowledgator/gliclass-base-v1.0",
    labels=["positive", "negative", "neutral"],
)

result = extractor._run("I absolutely loved this movie! The acting was superb.")
print(result)
# Classifications:
# positive (score: 0.94)
# neutral (score: 0.04)
# negative (score: 0.02)

SIESparseEmbedder parameters:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| base_url | str | http://localhost:8080 | SIE server URL |
| model | str | BAAI/bge-m3 | Model to use for sparse embeddings |
| gpu | str | None | Target GPU type for routing |
| options | dict | None | Model-specific options |
| timeout_s | float | 180.0 | Request timeout in seconds |
SIERerankerTool parameters:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| base_url | str | http://localhost:8080 | SIE server URL |
| model | str | jinaai/jina-reranker-v2-base-multilingual | Reranker model |
| gpu | str | None | Target GPU type for routing |
| options | dict | None | Model-specific options |
| timeout_s | float | 180.0 | Request timeout in seconds |

The extraction model determines which result types are included in the output. Use GLiNER models for entities, GLiREL for relations, GLiClass for classifications, and GroundingDINO/OWL-v2 for object detection.

SIEExtractorTool parameters:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| base_url | str | http://localhost:8080 | SIE server URL |
| model | str | urchade/gliner_multi-v2.1 | Extraction model (GLiNER, GLiREL, GLiClass, GroundingDINO, OWL-v2) |
| labels | list[str] | ["person", "organization", "location"] | Labels for extraction (entity types, relation types, or classification categories) |
| gpu | str | None | Target GPU type for routing |
| options | dict | None | Model-specific options |
| timeout_s | float | 180.0 | Request timeout in seconds |
