# Chroma
The `sie-chroma` package (Python) and `@superlinked/sie-chroma` package (TypeScript) provide embedding functions for ChromaDB. Use `SIEEmbeddingFunction` for dense embeddings in standard collections, and `SIESparseEmbeddingFunction` for hybrid search on Chroma Cloud.
## Installation

Python:

```shell
pip install sie-chroma
```

This installs `sie-sdk` and `chromadb` as dependencies.

TypeScript:

```shell
pnpm add @superlinked/sie-chroma
```

This installs `@superlinked/sie-sdk` and `chromadb` as dependencies.
## Start the Server

```shell
# Docker (recommended)
docker run -p 8080:8080 ghcr.io/superlinked/sie-server:default

# Or with GPU
docker run --gpus all -p 8080:8080 ghcr.io/superlinked/sie-server:default
```

## Embedding Function

`SIEEmbeddingFunction` implements ChromaDB’s `EmbeddingFunction` protocol. Use it when creating or querying collections.
Python:

```python
from sie_chroma import SIEEmbeddingFunction

embedding_function = SIEEmbeddingFunction(
    base_url="http://localhost:8080",
    model="BAAI/bge-m3",
)
```

TypeScript:

```typescript
import { SIEEmbeddingFunction } from "@superlinked/sie-chroma";

const embeddingFunction = new SIEEmbeddingFunction({
  baseUrl: "http://localhost:8080",
  model: "BAAI/bge-m3",
});
```

## Configuration Options
Python:

| Parameter | Type | Default | Description |
|---|---|---|---|
| `base_url` | `str` | `http://localhost:8080` | SIE server URL |
| `model` | `str` | `BAAI/bge-m3` | Model to use for embeddings |
| `gpu` | `str` | `None` | Target GPU type for routing |
| `options` | `dict` | `None` | Model-specific options |
| `timeout_s` | `float` | `180.0` | Request timeout in seconds |
TypeScript:

| Parameter | Type | Default | Description |
|---|---|---|---|
| `baseUrl` | `string` | `http://localhost:8080` | SIE server URL |
| `model` | `string` | `BAAI/bge-m3` | Model to use for embeddings |
| `gpu` | `string` | `undefined` | Target GPU type for routing |
| `timeout` | `number` | `180000` | Request timeout in milliseconds |
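The optional parameters combine freely with the required ones. As a sketch (the `gpu` value and `options` keys below are illustrative assumptions, not documented values):

```python
from sie_chroma import SIEEmbeddingFunction

# All fields besides base_url and model are optional; the gpu string and
# options keys shown here are hypothetical placeholders.
embedding_function = SIEEmbeddingFunction(
    base_url="http://localhost:8080",
    model="BAAI/bge-m3",
    gpu="a100",                   # assumed GPU-type label for routing
    options={"batch_size": 32},   # assumed model-specific option
    timeout_s=60.0,               # fail faster than the 180 s default
)
```

Note the unit difference when porting between SDKs: Python’s `timeout_s` is in seconds, while the TypeScript `timeout` is in milliseconds, so `timeout_s=60.0` corresponds to `timeout: 60000`.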
## Full Example

Create a ChromaDB collection with SIE embeddings and perform similarity search:
Python:

```python
import chromadb

from sie_chroma import SIEEmbeddingFunction

# Initialize the embedding function
embedding_function = SIEEmbeddingFunction(
    base_url="http://localhost:8080",
    model="BAAI/bge-m3",
)

# Create a Chroma client and collection
client = chromadb.Client()
collection = client.create_collection(
    name="documents",
    embedding_function=embedding_function,
)

# Add documents
collection.add(
    documents=[
        "Machine learning is a subset of artificial intelligence.",
        "Neural networks are inspired by biological neurons.",
        "Deep learning uses multiple layers of neural networks.",
        "Python is popular for machine learning development.",
    ],
    ids=["doc1", "doc2", "doc3", "doc4"],
)

# Query the collection
results = collection.query(
    query_texts=["What is deep learning?"],
    n_results=2,
)

for doc, distance in zip(results["documents"][0], results["distances"][0]):
    print(f"{distance:.4f}: {doc}")
```

TypeScript:

```typescript
import { ChromaClient } from "chromadb";
import { SIEEmbeddingFunction } from "@superlinked/sie-chroma";

// Initialize the embedding function
const embeddingFunction = new SIEEmbeddingFunction({
  baseUrl: "http://localhost:8080",
  model: "BAAI/bge-m3",
});

// Create a Chroma client and collection
const client = new ChromaClient();
const collection = await client.createCollection({
  name: "documents",
  embeddingFunction,
});

// Add documents
await collection.add({
  documents: [
    "Machine learning is a subset of artificial intelligence.",
    "Neural networks are inspired by biological neurons.",
    "Deep learning uses multiple layers of neural networks.",
    "Python is popular for machine learning development.",
  ],
  ids: ["doc1", "doc2", "doc3", "doc4"],
});

// Query the collection
const results = await collection.query({
  queryTexts: ["What is deep learning?"],
  nResults: 2,
});

for (let i = 0; i < results.documents[0].length; i++) {
  const doc = results.documents[0][i];
  const distance = results.distances?.[0][i];
  console.log(`${distance?.toFixed(4)}: ${doc}`);
}
```

## With Persistent Storage
Python:

```python
import chromadb

from sie_chroma import SIEEmbeddingFunction

embedding_function = SIEEmbeddingFunction(model="BAAI/bge-m3")

# Use persistent storage
client = chromadb.PersistentClient(path="./chroma_db")

collection = client.get_or_create_collection(
    name="my_collection",
    embedding_function=embedding_function,
)
```

TypeScript:

```typescript
import { ChromaClient } from "chromadb";
import { SIEEmbeddingFunction } from "@superlinked/sie-chroma";

const embeddingFunction = new SIEEmbeddingFunction({ model: "BAAI/bge-m3" });

// Use persistent storage (requires a Chroma server running)
const client = new ChromaClient({ path: "http://localhost:8000" });

const collection = await client.getOrCreateCollection({
  name: "my_collection",
  embeddingFunction,
});
```

## Sparse Embeddings (Chroma Cloud)
`SIESparseEmbeddingFunction` generates sparse embeddings for Chroma Cloud hybrid search. Use it with `SparseVectorIndexConfig`.
Python:

```python
from sie_chroma import SIESparseEmbeddingFunction

sparse_ef = SIESparseEmbeddingFunction(
    base_url="http://localhost:8080",
    model="BAAI/bge-m3",
)
```

The sparse embedding function returns `dict[int, float]` mappings of token indices to weights. This format is compatible with Chroma Cloud’s hybrid search feature.

TypeScript:

```typescript
import { SIESparseEmbeddingFunction } from "@superlinked/sie-chroma";

const sparseEf = new SIESparseEmbeddingFunction({
  baseUrl: "http://localhost:8080",
  model: "BAAI/bge-m3",
});

// Generate sparse embeddings
const embeddings = await sparseEf.generate(["Hello world"]);
console.log(embeddings[0].indices); // [1, 5, 10, ...]
console.log(embeddings[0].values); // [0.5, 0.3, 0.2, ...]

// Or as dict format for Chroma Cloud
const dictEmbeddings = await sparseEf.generateAsDict(["Hello world"]);
console.log(dictEmbeddings[0]); // { 1: 0.5, 5: 0.3, 10: 0.2, ... }
```

The sparse embedding function returns `{ indices: number[], values: number[] }` objects or `Record<number, number>` dicts (via `generateAsDict`). Both formats are compatible with Chroma Cloud’s hybrid search feature.
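To make the relationship between the two sparse representations concrete, here is a plain-Python sketch (the token indices and weights are illustrative, not real model output) converting the parallel-array form into the `dict[int, float]` mapping described above:

```python
# Convert an {indices, values} sparse embedding into the
# dict[int, float] token-index -> weight mapping.
def sparse_to_dict(indices: list[int], values: list[float]) -> dict[int, float]:
    return dict(zip(indices, values))

embedding = {"indices": [1, 5, 10], "values": [0.5, 0.3, 0.2]}
print(sparse_to_dict(embedding["indices"], embedding["values"]))
# {1: 0.5, 5: 0.3, 10: 0.2}
```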
## Multimodal Embeddings

ChromaDB’s embedding function interface accepts text only. For image embedding with models like CLIP or SigLIP, use the SIE SDK to encode images and pass the pre-computed embeddings to ChromaDB:
```python
import chromadb

from sie_sdk import SIEClient
from sie_sdk.types import Item

sie = SIEClient("http://localhost:8080")
chroma = chromadb.Client()
collection = chroma.create_collection("images")

# Encode images with SIE SDK
results = sie.encode(
    "openai/clip-vit-large-patch14",
    [Item(images=["img1.jpg"]), Item(images=["img2.jpg"])],
    output_types=["dense"],
)

# Store pre-computed embeddings in Chroma
collection.add(
    ids=["img1", "img2"],
    embeddings=[r["dense"].tolist() for r in results],
    metadatas=[{"path": "img1.jpg"}, {"path": "img2.jpg"}],
)
```

See Encode for full SDK documentation and the Model Catalog for supported vision models.
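Because the embeddings are pre-computed, queries must also supply a pre-computed vector through ChromaDB’s `query_embeddings` parameter instead of `query_texts`. A sketch continuing the example above, assuming CLIP’s text tower can be reached through the same `encode` call with an `Item(text=...)` field (that field name is an assumption, not confirmed here):

```python
# Embed the query text with the same CLIP model, then search Chroma
# with the pre-computed vector. Item(text=...) is an assumed field name.
query_results = sie.encode(
    "openai/clip-vit-large-patch14",
    [Item(text="a photo of a cat")],
    output_types=["dense"],
)

hits = collection.query(
    query_embeddings=[query_results[0]["dense"].tolist()],
    n_results=1,
)
print(hits["metadatas"][0][0]["path"])
```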
## What’s Next

- Encode Text - embedding API details and output types
- Model Catalog - all supported embedding models
- Troubleshooting - common errors and solutions