
Chroma

The sie-chroma package (Python) and @superlinked/sie-chroma package (TypeScript) provide embedding functions for ChromaDB. Use SIEEmbeddingFunction for dense embeddings in standard collections. Use SIESparseEmbeddingFunction for hybrid search on Chroma Cloud.

```sh
pip install sie-chroma
```

This installs sie-sdk and chromadb as dependencies.

Start a local SIE server:

```sh
# Docker (recommended)
docker run -p 8080:8080 ghcr.io/superlinked/sie-server:default

# Or with GPU
docker run --gpus all -p 8080:8080 ghcr.io/superlinked/sie-server:default
```

SIEEmbeddingFunction implements ChromaDB’s EmbeddingFunction protocol. Use it when creating or querying collections.

```python
from sie_chroma import SIEEmbeddingFunction

embedding_function = SIEEmbeddingFunction(
    base_url="http://localhost:8080",
    model="BAAI/bge-m3",
)
```
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `base_url` | `str` | `"http://localhost:8080"` | SIE server URL |
| `model` | `str` | `"BAAI/bge-m3"` | Model to use for embeddings |
| `gpu` | `str` | `None` | Target GPU type for routing |
| `options` | `dict` | `None` | Model-specific options |
| `timeout_s` | `float` | `180.0` | Request timeout in seconds |
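ChromaDB's `EmbeddingFunction` protocol boils down to a callable that maps a list of documents to a list of vectors. To make the contract concrete, here is a toy implementation of that protocol — a hypothetical illustration only, not the SIE client, which delegates to the SIE server instead:

```python
# Toy embedding function satisfying ChromaDB's EmbeddingFunction call contract:
# a callable that maps a list of documents to a list of float vectors.
# (Hypothetical illustration; SIEEmbeddingFunction calls the SIE server instead.)
class ToyEmbeddingFunction:
    def __call__(self, input: list[str]) -> list[list[float]]:
        # Deterministic 2-d "embedding": [character count, vowel count]
        return [
            [float(len(text)), float(sum(c in "aeiou" for c in text.lower()))]
            for text in input
        ]
```

Note that recent ChromaDB releases may additionally expect methods such as `name()` on embedding functions used with persisted collections; check the ChromaDB version you target.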

Create a ChromaDB collection with SIE embeddings and perform similarity search:

```python
import chromadb
from sie_chroma import SIEEmbeddingFunction

# Initialize the embedding function
embedding_function = SIEEmbeddingFunction(
    base_url="http://localhost:8080",
    model="BAAI/bge-m3",
)

# Create a Chroma client and collection
client = chromadb.Client()
collection = client.create_collection(
    name="documents",
    embedding_function=embedding_function,
)

# Add documents
collection.add(
    documents=[
        "Machine learning is a subset of artificial intelligence.",
        "Neural networks are inspired by biological neurons.",
        "Deep learning uses multiple layers of neural networks.",
        "Python is popular for machine learning development.",
    ],
    ids=["doc1", "doc2", "doc3", "doc4"],
)

# Query the collection
results = collection.query(
    query_texts=["What is deep learning?"],
    n_results=2,
)
for doc, distance in zip(results["documents"][0], results["distances"][0]):
    print(f"{distance:.4f}: {doc}")
```
For persistent local storage, use `chromadb.PersistentClient`:

```python
import chromadb
from sie_chroma import SIEEmbeddingFunction

embedding_function = SIEEmbeddingFunction(model="BAAI/bge-m3")

# Use persistent storage
client = chromadb.PersistentClient(path="./chroma_db")
collection = client.get_or_create_collection(
    name="my_collection",
    embedding_function=embedding_function,
)
```

SIESparseEmbeddingFunction generates sparse embeddings for Chroma Cloud hybrid search. Use it with SparseVectorIndexConfig.

```python
from sie_chroma import SIESparseEmbeddingFunction

sparse_ef = SIESparseEmbeddingFunction(
    base_url="http://localhost:8080",
    model="BAAI/bge-m3",
)

The sparse embedding function returns dict[int, float] mappings of token indices to weights. This format is compatible with Chroma Cloud’s hybrid search feature.
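Because each sparse embedding is just a mapping of token index to weight, scoring one against another reduces to a sparse dot product over the shared indices. A minimal sketch — the `sparse_dot` helper is ours for illustration, not part of the package:

```python
def sparse_dot(a: dict[int, float], b: dict[int, float]) -> float:
    """Dot product of two sparse vectors stored as {token_index: weight}."""
    # Iterate over the smaller mapping and look up matching indices in the larger one.
    if len(a) > len(b):
        a, b = b, a
    return sum(weight * b.get(index, 0.0) for index, weight in a.items())
```

On Chroma Cloud this scoring happens server-side during hybrid search; the sketch only shows what the `dict[int, float]` payload represents.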

ChromaDB’s embedding function interface accepts text only. For image embedding with models like CLIP or SigLIP, use the SIE SDK to encode images and pass the pre-computed embeddings to ChromaDB:

```python
import chromadb
from sie_sdk import SIEClient
from sie_sdk.types import Item

sie = SIEClient("http://localhost:8080")
chroma = chromadb.Client()
collection = chroma.create_collection("images")

# Encode images with the SIE SDK
results = sie.encode(
    "openai/clip-vit-large-patch14",
    [Item(images=["img1.jpg"]), Item(images=["img2.jpg"])],
    output_types=["dense"],
)

# Store pre-computed embeddings in Chroma
collection.add(
    ids=["img1", "img2"],
    embeddings=[r["dense"].tolist() for r in results],
    metadatas=[{"path": "img1.jpg"}, {"path": "img2.jpg"}],
)
```

See Encode for full SDK documentation and the Model Catalog for supported vision models.
