How to generate embeddings with SIE

Dense embeddings are fixed-dimension float vectors that capture semantic meaning. SIE’s encode primitive converts text or images into these vectors using any of 85+ supported models. The resulting embeddings power semantic search, RAG retrieval, and recommendation systems.

Python

from sie_sdk import SIEClient
from sie_sdk.types import Item

client = SIEClient("http://localhost:8080")
result = client.encode("BAAI/bge-m3", Item(text="Your text here"))
print(f"Dimensions: {len(result['dense'])}")  # 1024

TypeScript

import { SIEClient } from "@superlinked/sie-sdk";

const client = new SIEClient("http://localhost:8080");
const result = await client.encode("BAAI/bge-m3", { text: "Your text here" });
console.log(`Dimensions: ${result.dense?.length}`); // 1024

await client.close();

Not sure which model to use? See the Model Selection Guide or the full model catalog.

When Should I Use Dense Embeddings?

Use dense embeddings when:

You need semantic similarity matching, not just exact keyword matching
Your vector database supports ANN search (Qdrant, Weaviate, Chroma, and LanceDB all work with SIE)
You want a simple, fast retrieval baseline to start with

Consider a different approach when:

You need hybrid search (keyword and semantic combined): see Sparse and Hybrid Search
You need maximum retrieval accuracy: see Multi-vector and ColBERT
You are searching over images: see Multimodal Embeddings
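
Semantic similarity between two dense embeddings is commonly measured with cosine similarity. The following is a minimal sketch, not part of the SIE SDK: `cosine_similarity` is a hypothetical helper, and the short toy vectors stand in for the `result["dense"]` arrays a real model would return.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    a = np.asarray(a, dtype=np.float32)
    b = np.asarray(b, dtype=np.float32)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


# Toy 4-dimensional vectors standing in for result["dense"] values.
query = [0.9, 0.1, 0.0, 0.2]
related = [0.8, 0.2, 0.1, 0.3]
unrelated = [0.0, 0.9, 0.8, 0.0]

print(cosine_similarity(query, related))    # close to 1: similar direction
print(cosine_similarity(query, unrelated))  # close to 0: little overlap
```

In production, a vector database typically performs this comparison for you through its ANN index.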

Basic Usage

Python

result = client.encode("BAAI/bge-m3", Item(text="Hello world"))
print(result["dense"][:5])  # First 5 dimensions

TypeScript

const result = await client.encode("BAAI/bge-m3", { text: "Hello world" });
console.log(result.dense?.slice(0, 5)); // First 5 dimensions

Batch Encoding

Pass a list of items for efficient GPU-batched processing:

Python

items = [
    Item(text="First document"),
    Item(text="Second document"),
    Item(text="Third document"),
]
results = client.encode("BAAI/bge-m3", items)

TypeScript

const results = await client.encode("BAAI/bge-m3", [
  { text: "First document" },
  { text: "Second document" },
  { text: "Third document" },
]);

The server batches requests automatically. You do not need to manage batch sizes manually.

Tracking Items by ID

Python

items = [
    Item(id="doc-1", text="First document"),
    Item(id="doc-2", text="Second document"),
]
results = client.encode("BAAI/bge-m3", items)
for result in results:
    print(f"{result['id']}: {len(result['dense'])} dims")

TypeScript

const results = await client.encode("BAAI/bge-m3", [
  { id: "doc-1", text: "First document" },
  { id: "doc-2", text: "Second document" },
]);
for (const result of results) {
  console.log(`${result.id}: ${result.dense?.length} dims`);
}
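
Once results carry IDs, a common next step is to stack the vectors into one matrix and keep a lookup from ID to row. The sketch below assumes each result behaves like a dict with `id` and `dense` keys, as in the example above; the `results` list here is a hand-built stand-in with 3-dimensional vectors, not real server output.

```python
import numpy as np

# Hypothetical stand-in for the list returned by client.encode(...);
# real results would carry full-size "dense" vectors.
results = [
    {"id": "doc-1", "dense": np.array([0.1, 0.2, 0.3], dtype=np.float32)},
    {"id": "doc-2", "dense": np.array([0.4, 0.5, 0.6], dtype=np.float32)},
]

ids = [r["id"] for r in results]
matrix = np.stack([r["dense"] for r in results])  # shape: (num_items, dims)
row_of = {item_id: row for row, item_id in enumerate(ids)}

print(matrix.shape)       # (2, 3)
print(row_of["doc-2"])    # 1
```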

Should I Encode Queries and Documents Differently?

Yes, for asymmetric models. Queries are short and question-like; documents are longer content. Many models are trained to distinguish these and perform better when you tell them which is which.

Python

# Encode a search query
query = client.encode(
    "BAAI/bge-m3",
    Item(text="What is machine learning?"),
    is_query=True,
)

# Encode documents (default, no is_query flag needed)
documents = client.encode(
    "BAAI/bge-m3",
    [Item(text="Machine learning is..."), Item(text="Deep learning uses...")],
)

TypeScript

// Encode a search query
const query = await client.encode(
  "BAAI/bge-m3",
  { text: "What is machine learning?" },
  { isQuery: true },
);

// Encode documents (default, no isQuery flag needed)
const documents = await client.encode(
  "BAAI/bge-m3",
  [{ text: "Machine learning is..." }, { text: "Deep learning uses..." }],
);
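
With a query embedding and a set of document embeddings in hand, retrieval comes down to ranking documents by similarity to the query. The following is a rough sketch of brute-force top-k ranking with cosine similarity. `top_k` is a hypothetical helper, and the small arrays are toy stand-ins for the `dense` vectors returned by the calls above; a vector database would normally do this step with an ANN index.

```python
import numpy as np


def top_k(query_vec, doc_matrix, k=2):
    """Return (row_index, score) pairs for the k most similar rows, best first."""
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_matrix / np.linalg.norm(doc_matrix, axis=1, keepdims=True)
    scores = d @ q                      # cosine similarity per document
    order = np.argsort(-scores)[:k]     # indices sorted by descending score
    return [(int(i), float(scores[i])) for i in order]


# Toy stand-ins for one query embedding and three document embeddings.
query_vec = np.array([1.0, 0.0, 0.0])
doc_matrix = np.array([
    [0.0, 1.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.5, 0.5, 0.0],
])

ranking = top_k(query_vec, doc_matrix)
print([index for index, _ in ranking])  # [1, 2]
```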

For instruction-tuned models like Alibaba-NLP/gte-Qwen2-1.5B-instruct, pass an explicit instruction string to guide embedding behaviour:

Python

result = client.encode(
    "Alibaba-NLP/gte-Qwen2-1.5B-instruct",
    Item(text="What is Python?"),
    instruction="Represent this query for retrieving programming tutorials:"
)

TypeScript

const result = await client.encode(
  "Alibaba-NLP/gte-Qwen2-1.5B-instruct",
  { text: "What is Python?" },
  { instruction: "Represent this query for retrieving programming tutorials:" },
);

What Output Types Are Available?

By default, encode returns dense embeddings only. Models that support additional output types (such as BAAI/bge-m3) can also return sparse and multi-vector outputs in the same call:

Python

result = client.encode(
    "BAAI/bge-m3",
    Item(text="text"),
    output_types=["dense", "sparse", "multivector"]
)

print(result["dense"])        # 1024-dim float array
print(result["sparse"])       # {"indices": [...], "values": [...]}
print(result["multivector"])  # [num_tokens, 1024] array

TypeScript

const result = await client.encode(
  "BAAI/bge-m3",
  { text: "text" },
  { outputTypes: ["dense", "sparse", "multivector"] },
);

console.log(result.dense);        // Float32Array, 1024 elements
console.log(result.sparse);       // { indices: Int32Array, values: Float32Array }
console.log(result.multivector);  // Float32Array[], [num_tokens][1024]

Not all models support all output types. BAAI/bge-m3 is the main model supporting all three. Most models support dense only.
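
How you score sparse and multi-vector outputs depends on your vector database. As an illustrative sketch only, the code below shows two common scoring functions applied to the shapes described above: a dot product over sparse `{"indices", "values"}` pairs, and the ColBERT-style MaxSim score over per-token matrices. Both `sparse_dot` and `maxsim` are hypothetical helpers, not SIE SDK functions, and the inputs are toy data.

```python
import numpy as np


def sparse_dot(a, b):
    """Dot product of two sparse vectors given as {"indices", "values"}."""
    weights = dict(zip(a["indices"], a["values"]))
    return float(sum(weights.get(i, 0.0) * v
                     for i, v in zip(b["indices"], b["values"])))


def maxsim(query_tokens, doc_tokens):
    """Late-interaction score: best document match per query token, summed."""
    sims = query_tokens @ doc_tokens.T   # shape: (query_tokens, doc_tokens)
    return float(sims.max(axis=1).sum())


# Toy sparse vectors: only indices 17 and 42 overlap.
q_sparse = {"indices": [3, 17, 42], "values": [0.5, 1.0, 0.25]}
d_sparse = {"indices": [17, 42, 99], "values": [2.0, 4.0, 1.0]}
print(sparse_dot(q_sparse, d_sparse))  # 1.0*2.0 + 0.25*4.0 = 3.0

# Toy per-token matrices with 2 dimensions instead of 1024.
q_tokens = np.array([[1.0, 0.0], [0.0, 1.0]])
d_tokens = np.array([[0.5, 0.0], [0.0, 0.25], [1.0, 0.0]])
print(maxsim(q_tokens, d_tokens))      # 1.0 + 0.25 = 1.25
```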

Response Fields

Field	Type	Description
`id`	`str or None`	Item ID if provided
`dense`	`NDArray[float32]`	Dense embedding vector
`sparse`	`SparseResult or None`	Sparse indices and values
`multivector`	`NDArray[float32] or None`	Per-token embeddings (ColBERT)
`timing`	`TimingInfo`	Request timing breakdown

Good Starting Models

Model	Dims	Max Length	Notes
`BAAI/bge-m3`	1024	8192	Multilingual; supports dense, sparse, multivector
`NovaSearch/stella_en_400M_v5`	1024	512	Best English quality per GB of VRAM
`intfloat/e5-base-v2`	768	512	Solid all-rounder
`sentence-transformers/all-MiniLM-L6-v2`	384	256	Fastest and most lightweight

See "How do I choose the right model?" or the model catalog.

HTTP API

The server defaults to msgpack for efficient numpy transport. To use plain JSON instead, set both the Content-Type and Accept headers to application/json:

curl -X POST http://localhost:8080/v1/encode/BAAI/bge-m3 \
  -H "Content-Type: application/json" \
  -H "Accept: application/json" \
  -d '{"items": [{"text": "Your text here"}]}'

See the full HTTP API Reference.
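
The same request can be built from Python without the SDK. This is a minimal sketch using only the standard library; it mirrors the curl command above (URL path, headers, and body). `build_encode_request` is a hypothetical helper, and the response schema is not shown here, so the sending step is left commented out and should be checked against the HTTP API Reference.

```python
import json
import urllib.request


def build_encode_request(base_url, model, texts):
    """Build a JSON POST request equivalent to the curl example."""
    body = json.dumps({"items": [{"text": t} for t in texts]}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/encode/{model}",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
        method="POST",
    )


request = build_encode_request(
    "http://localhost:8080", "BAAI/bge-m3", ["Your text here"]
)
print(request.full_url)  # http://localhost:8080/v1/encode/BAAI/bge-m3

# Sending the request requires a running server:
# with urllib.request.urlopen(request) as response:
#     payload = json.load(response)
```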

Frequently Asked Questions

What is the difference between dense and sparse embeddings?

Dense embeddings represent meaning as a fixed-length float vector (for example, 1024 numbers). Sparse embeddings represent text as a weighted set of vocabulary tokens, which is useful for keyword matching. Most use cases start with dense. Add sparse when you need hybrid search. See Sparse and Hybrid Search.

What embedding dimensions should I use?

Higher dimensions capture more nuance but use more memory and slow down ANN search. 384-dim models like all-MiniLM are fast but less precise. 1024-dim models like bge-m3 and stella are the standard production choice. 4096-dim models like NV-Embed-v2 give the best quality at high memory cost. Start at 1024.
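
To make the memory trade-off concrete, here is a rough back-of-the-envelope estimate. It assumes float32 storage (4 bytes per value, matching the `NDArray[float32]` type in the response fields) and ignores any index overhead added by the vector database. `index_size_bytes` is a hypothetical helper, not part of the SIE SDK.

```python
def index_size_bytes(num_vectors, dims, bytes_per_value=4):
    """Raw storage for num_vectors embeddings, excluding index overhead."""
    return num_vectors * dims * bytes_per_value


for dims in (384, 1024, 4096):
    gib = index_size_bytes(1_000_000, dims) / 2**30
    print(f"{dims} dims: {gib:.2f} GiB per million vectors")
# 384 dims: 1.43 GiB per million vectors
# 1024 dims: 3.81 GiB per million vectors
# 4096 dims: 15.26 GiB per million vectors
```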

Can SIE generate image embeddings?

Yes. SIE supports multimodal models like google/siglip-so400m-patch14-384 that embed both text and images into the same vector space. See Multimodal Embeddings.

Does SIE integrate with LangChain, LlamaIndex, or Haystack?

Yes. SIE has first-class integrations with LangChain, LlamaIndex, Haystack, Qdrant, Weaviate, Chroma, LanceDB, and more. See Integrations.