How to generate embeddings with SIE
Dense embeddings are fixed-dimension float vectors that capture semantic meaning. SIE’s encode primitive converts text or images into these vectors using any of 85+ supported models. The resulting embeddings power semantic search, RAG retrieval, and recommendation systems.
```python
from sie_sdk import SIEClient
from sie_sdk.types import Item

client = SIEClient("http://localhost:8080")
result = client.encode("BAAI/bge-m3", Item(text="Your text here"))
print(f"Dimensions: {len(result['dense'])}")  # 1024
```

```typescript
import { SIEClient } from "@superlinked/sie-sdk";

const client = new SIEClient("http://localhost:8080");
const result = await client.encode("BAAI/bge-m3", { text: "Your text here" });
console.log(`Dimensions: ${result.dense?.length}`); // 1024

await client.close();
```

Not sure which model to use? See the Model Selection Guide or the full model catalog.
When Should I Use Dense Embeddings?
Use dense embeddings when:
- You need semantic similarity matching, not just exact keyword matching
- Your vector database supports ANN search (Qdrant, Weaviate, Chroma, and LanceDB all work with SIE)
- You want a simple, fast retrieval baseline to start with
Consider a different approach when:
- You need hybrid search (keyword and semantic combined): see Sparse and Hybrid Search
- You need maximum retrieval accuracy: see Multi-vector and ColBERT
- You are searching over images: see Multimodal Embeddings
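To make "semantic similarity matching" concrete, here is a minimal sketch of ranking documents against a query by cosine similarity. It is not part of the SIE SDK: the helper names and the 3-dimensional toy vectors are hypothetical stand-ins for real embeddings, and in production a vector database would typically perform this step with an ANN index.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, doc_vecs):
    """Return (doc_id, score) pairs sorted from most to least similar."""
    scored = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in doc_vecs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy vectors standing in for real embeddings.
query = [1.0, 0.0, 1.0]
docs = {
    "doc-1": [0.9, 0.1, 0.8],
    "doc-2": [0.0, 1.0, 0.0],
    "doc-3": [0.5, 0.5, 0.5],
}

ranking = rank_by_similarity(query, docs)
for doc_id, score in ranking:
    print(f"{doc_id}: {score:.3f}")
```

This brute-force scan compares the query against every document, which is fine for small collections and for sanity-checking results from an ANN index.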
Basic Usage
Single Item

```python
result = client.encode("BAAI/bge-m3", Item(text="Hello world"))
print(result["dense"][:5])  # First 5 dimensions
```

```typescript
const result = await client.encode("BAAI/bge-m3", { text: "Hello world" });
console.log(result.dense?.slice(0, 5)); // First 5 dimensions
```

Batch Encoding

Pass a list of items for efficient GPU-batched processing:

```python
items = [
    Item(text="First document"),
    Item(text="Second document"),
    Item(text="Third document"),
]
results = client.encode("BAAI/bge-m3", items)
```

```typescript
const results = await client.encode("BAAI/bge-m3", [
  { text: "First document" },
  { text: "Second document" },
  { text: "Third document" },
]);
```

The server batches requests automatically. You do not need to manage batch sizes manually.
Tracking Items by ID
```python
items = [
    Item(id="doc-1", text="First document"),
    Item(id="doc-2", text="Second document"),
]
results = client.encode("BAAI/bge-m3", items)
for result in results:
    print(f"{result['id']}: {len(result['dense'])} dims")
```

```typescript
const results = await client.encode("BAAI/bge-m3", [
  { id: "doc-1", text: "First document" },
  { id: "doc-2", text: "Second document" },
]);
for (const result of results) {
  console.log(`${result.id}: ${result.dense?.length} dims`);
}
```

Should I Encode Queries and Documents Differently?

Yes, for asymmetric models. Queries are short and question-like; documents are longer content. Many models are trained to distinguish these and perform better when you tell them which is which.
```python
# Encode a search query
query = client.encode(
    "BAAI/bge-m3",
    Item(text="What is machine learning?"),
    is_query=True,
)

# Encode documents (default, no is_query flag needed)
documents = client.encode(
    "BAAI/bge-m3",
    [Item(text="Machine learning is..."), Item(text="Deep learning uses...")],
)
```

```typescript
// Encode a search query
const query = await client.encode(
  "BAAI/bge-m3",
  { text: "What is machine learning?" },
  { isQuery: true },
);

// Encode documents (default, no isQuery flag needed)
const documents = await client.encode(
  "BAAI/bge-m3",
  [{ text: "Machine learning is..." }, { text: "Deep learning uses..." }],
);
```

For instruction-tuned models like Alibaba-NLP/gte-Qwen2-1.5B-instruct, pass an explicit instruction string to guide embedding behaviour:

```python
result = client.encode(
    "Alibaba-NLP/gte-Qwen2-1.5B-instruct",
    Item(text="What is Python?"),
    instruction="Represent this query for retrieving programming tutorials:",
)
```

```typescript
const result = await client.encode(
  "Alibaba-NLP/gte-Qwen2-1.5B-instruct",
  { text: "What is Python?" },
  { instruction: "Represent this query for retrieving programming tutorials:" },
);
```

What Output Types Are Available?

By default, encode returns dense embeddings. Models that support it (such as BAAI/bge-m3) can return sparse and multi-vector outputs in a single call:
```python
result = client.encode(
    "BAAI/bge-m3",
    Item(text="text"),
    output_types=["dense", "sparse", "multivector"],
)

print(result["dense"])        # 1024-dim float array
print(result["sparse"])       # {"indices": [...], "values": [...]}
print(result["multivector"])  # [num_tokens, 1024] array
```

```typescript
const result = await client.encode(
  "BAAI/bge-m3",
  { text: "text" },
  { outputTypes: ["dense", "sparse", "multivector"] },
);

console.log(result.dense);       // Float32Array, 1024 elements
console.log(result.sparse);      // { indices: Int32Array, values: Float32Array }
console.log(result.multivector); // Float32Array[], [num_tokens][1024]
```

Not all models support all output types. BAAI/bge-m3 is the main model supporting all three. Most models support dense only.
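As a rough illustration of how the sparse and multi-vector outputs are commonly scored downstream, the sketch below computes a sparse dot product and a ColBERT-style MaxSim score on toy data. The helper names are hypothetical and not part of the SIE SDK; the sparse layout mirrors the `{"indices": [...], "values": [...]}` shape shown above, and the sketch assumes indices are unique within each sparse vector.

```python
def dot(a, b):
    """Dot product of two equal-length dense vectors."""
    return sum(x * y for x, y in zip(a, b))


def sparse_dot(a, b):
    """Dot product of two sparse vectors in {"indices", "values"} form."""
    weights_b = dict(zip(b["indices"], b["values"]))
    return sum(
        value * weights_b.get(index, 0.0)
        for index, value in zip(a["indices"], a["values"])
    )


def maxsim(query_tokens, doc_tokens):
    """Late-interaction score: for each query token vector, take the best
    dot product against any document token vector, then sum the maxima."""
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)


# Hypothetical toy data; real outputs would come from the encode call.
query_sparse = {"indices": [3, 17, 42], "values": [0.5, 1.0, 2.0]}
doc_sparse = {"indices": [17, 42, 99], "values": [2.0, 0.5, 1.0]}
sparse_score = sparse_dot(query_sparse, doc_sparse)  # overlap on 17 and 42

query_multivector = [[1.0, 0.0], [0.0, 1.0]]
doc_multivector = [[0.5, 0.5], [1.0, 0.0], [0.0, 0.25]]
maxsim_score = maxsim(query_multivector, doc_multivector)

print(sparse_score, maxsim_score)
```

Only indices present in both sparse vectors contribute to the sparse score, which is why it behaves like weighted keyword matching, while MaxSim rewards documents that contain a good match for every query token.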
Response Fields
| Field | Type | Description |
|---|---|---|
| id | str or None | Item ID if provided |
| dense | NDArray[float32] | Dense embedding vector |
| sparse | SparseResult or None | Sparse indices and values |
| multivector | NDArray[float32] or None | Per-token embeddings (ColBERT) |
| timing | TimingInfo | Request timing breakdown |
Good Starting Models
| Model | Dims | Max Length | Notes |
|---|---|---|---|
| BAAI/bge-m3 | 1024 | 8192 | Multilingual; supports dense, sparse, multivector |
| NovaSearch/stella_en_400M_v5 | 1024 | 512 | Best English quality per GB of VRAM |
| intfloat/e5-base-v2 | 768 | 512 | Solid all-rounder |
| sentence-transformers/all-MiniLM-L6-v2 | 384 | 256 | Fastest and most lightweight |
See How do I choose the right model? or the model catalog.
HTTP API
The server defaults to msgpack for efficient numpy transport. To use plain JSON:

```shell
curl -X POST http://localhost:8080/v1/encode/BAAI/bge-m3 \
  -H "Content-Type: application/json" \
  -H "Accept: application/json" \
  -d '{"items": [{"text": "Your text here"}]}'
```

See the full HTTP API Reference.
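For callers who prefer not to use the SDK, the same JSON request can be assembled with Python's standard library. This is a minimal sketch that mirrors the curl command above: `build_encode_request` is a hypothetical helper, and because the response schema is not described in this section, the code only builds the request and leaves sending and parsing as a commented-out step.

```python
import json
import urllib.request


def build_encode_request(base_url, model, texts):
    """Build (but do not send) a JSON POST request for the encode endpoint."""
    payload = {"items": [{"text": text} for text in texts]}
    return urllib.request.Request(
        url=f"{base_url}/v1/encode/{model}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
        method="POST",
    )


request = build_encode_request(
    "http://localhost:8080", "BAAI/bge-m3", ["Your text here"]
)
print(request.get_method(), request.full_url)

# To send it against a running server (response shape is an assumption
# to verify against the HTTP API Reference):
# with urllib.request.urlopen(request) as response:
#     body = json.load(response)
```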
Frequently Asked Questions
What is the difference between dense and sparse embeddings?
Dense embeddings represent meaning as a fixed-length float vector (for example, 1024 numbers). Sparse embeddings represent text as a weighted set of vocabulary tokens, which is useful for keyword matching. Most use cases start with dense. Add sparse when you need hybrid search. See Sparse and Hybrid Search.
What embedding dimensions should I use?
Higher dimensions capture more nuance but use more memory and slow down ANN search. 384-dim models like all-MiniLM are fast but less precise. 1024-dim models like bge-m3 and stella are the standard production choice. 4096-dim models like NV-Embed-v2 give the best quality at high memory cost. Start at 1024.
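The memory side of this trade-off is simple arithmetic. The sketch below estimates raw vector storage, assuming 4 bytes per value (float32, as in the response fields table) and ignoring index overhead and any quantization a vector database might apply; `index_size_bytes` is a hypothetical helper.

```python
def index_size_bytes(num_vectors, dims, bytes_per_value=4):
    """Raw storage for num_vectors embeddings of the given dimensionality."""
    return num_vectors * dims * bytes_per_value


# Compare the three dimensionalities mentioned above for one million vectors.
for dims in (384, 1024, 4096):
    gib = index_size_bytes(1_000_000, dims) / 2**30
    print(f"{dims} dims: {gib:.2f} GiB per million vectors")
```

Storage grows linearly with dimensionality, so a 4096-dim model needs four times the raw vector memory of a 1024-dim model for the same corpus.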
Can SIE generate image embeddings?
Yes. SIE supports multimodal models like google/siglip-so400m-patch14-384 that embed both text and images into the same vector space. See Multimodal Embeddings.
Does SIE integrate with LangChain, LlamaIndex, or Haystack?
Yes. SIE has first-class integrations with LangChain, LlamaIndex, Haystack, Qdrant, Weaviate, Chroma, LanceDB, and more. See Integrations.