
Fastembed → SIE

Fastembed is the Qdrant-maintained library that runs ONNX embedding models in-process via onnxruntime. SIE serves the same models out-of-process over HTTP.

What changes:

  • Out-of-process serving. Every Python worker that imports Fastembed gets its own copy of the model in RAM. SIE loads weights once per worker pod, no matter how many app processes connect.
  • Shared GPU. Fastembed is CPU-only by default (GPU support requires a separate ONNX runtime build). SIE serves on CPU, MPS, or CUDA without changing client code.
  • Multi-model. SIE can serve dense, sparse, ColBERT, rerankers, and vision models from one cluster. Fastembed does not cover sparse or cross-encoder rerankers natively.

What stays the same:

  • Model checkpoint (e.g. BAAI/bge-small-en-v1.5).
  • Vector dimension.
  • Cosine semantics. Embeddings from Fastembed (ONNX) and SIE (PyTorch) agree to within ~1e-3 cosine, so existing indexes do not need re-embedding.
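A quick numpy sketch of what the ~1e-3 cosine band means in practice (illustrative unit vectors, not real model output):

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Two unit vectors separated by roughly the ONNX-vs-PyTorch gap:
a = np.zeros(384)
a[0] = 1.0
b = a.copy()
b[1] = 0.045           # small orthogonal perturbation
b /= np.linalg.norm(b)

print(f"{cosine(a, b):.4f}")  # 0.9990 — close enough to reuse the index
```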
Before (Fastembed):

from fastembed import TextEmbedding

encoder = TextEmbedding(model_name="BAAI/bge-small-en-v1.5")
[vec] = list(encoder.embed(["The mitochondrion is the powerhouse of the cell."]))

After (SIE):

from sie_sdk import SIEClient
from sie_sdk.types import Item

client = SIEClient("http://localhost:8080")
result = client.encode(
    "BAAI/bge-small-en-v1.5",
    Item(text="The mitochondrion is the powerhouse of the cell."),
)
vec = result["dense"]  # np.ndarray, shape [384]
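If many call sites iterate over Fastembed's embed() generator, a thin shim can keep them unchanged during the cutover. This is a hypothetical helper, not part of sie_sdk; the client and the Item class are injected so the sketch stays dependency-free:

```python
from typing import Iterable, Iterator

class CompatEncoder:
    """Mimics Fastembed's embed() generator on top of an SIE-style client.

    `client` is anything with encode(model_id, item) -> {"dense": vector},
    and `item_cls` is the request wrapper (sie_sdk.types.Item in practice).
    """

    def __init__(self, client, item_cls, model: str = "BAAI/bge-small-en-v1.5"):
        self._client = client
        self._item_cls = item_cls
        self._model = model

    def embed(self, texts: Iterable[str]) -> Iterator:
        for text in texts:
            # One request per text; the caller keeps its lazy iteration.
            yield self._client.encode(self._model, self._item_cls(text=text))["dense"]
```

Existing code like `list(encoder.embed(texts))` then works against `CompatEncoder(SIEClient("http://localhost:8080"), Item)` without further changes.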

Do you need to re-embed? No, if you keep the same checkpoint. Yes, if you take the migration as a chance to upgrade to a stronger model.

sentence-transformers/all-MiniLM-L6-v2 is the common-denominator small model both Fastembed and SIE ship by default.

Start SIE with the default model and install Fastembed for the side-by-side check:

mise run serve -- -m sentence-transformers/all-MiniLM-L6-v2
uv add fastembed

Then run the ‘before’ and ‘after’ snippets from this page. Expected: identical dimension (384) and a cosine similarity at or above 0.999.
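The parity check can be scripted once both snippets have run; a minimal sketch, where `vec_fastembed` and `vec_sie` are the arrays produced by the before/after code:

```python
import numpy as np

def check_parity(vec_fastembed, vec_sie, dim=384, min_cos=0.999):
    """Assert that the two engines produce interchangeable embeddings."""
    a = np.asarray(vec_fastembed, dtype=np.float64)
    b = np.asarray(vec_sie, dtype=np.float64)
    assert a.shape == b.shape == (dim,), f"dimension mismatch: {a.shape} vs {b.shape}"
    cos = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    assert cos >= min_cos, f"cosine {cos:.6f} below {min_cos}"
    return cos
```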

Using BAAI/bge-small-en-v1.5 (or any other model)

Most Fastembed users actually run BAAI/bge-small-en-v1.5. SIE doesn’t ship that bundle by default, so add one. Migration mechanics are otherwise identical. Drop a YAML file at packages/sie_server/models/BAAI__bge-small-en-v1.5.yaml:

sie_id: BAAI/bge-small-en-v1.5
hf_id: BAAI/bge-small-en-v1.5
inputs:
  text: true
  image: false
  audio: false
  video: false
tasks:
  encode:
    dense:
      dim: 384
    sparse: null
    multivector: null
  score: null
  extract: null
max_sequence_length: 512
profiles:
  default:
    max_batch_tokens: 16384
    compute_precision: null
adapter_path: sie_server.adapters.sentence_transformer:SentenceTransformerDenseAdapter
adapter_options:
  loadtime:
    trust_remote_code: false
  runtime:
    pooling: cls
    normalize: true

Then mise run serve -- -m BAAI/bge-small-en-v1.5. The sentence-transformers__all-MiniLM-L6-v2.yaml bundle is the closest working reference; copy it and adjust dim, max_sequence_length, and pooling.
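Before restarting the server, a pre-flight check can catch the fields most often mistyped when copying a bundle. This is a sketch; it assumes PyYAML is installed and that dim sits under tasks.encode.dense as in the bundle above:

```python
import yaml  # PyYAML — assumed available

def check_bundle(path: str) -> None:
    """Fail fast on the commonly mistyped bundle fields."""
    with open(path) as f:
        bundle = yaml.safe_load(f)
    assert bundle["sie_id"] == bundle["hf_id"] == "BAAI/bge-small-en-v1.5"
    assert bundle["tasks"]["encode"]["dense"]["dim"] == 384
    assert bundle["adapter_options"]["runtime"]["pooling"] == "cls"
    assert bundle["adapter_options"]["runtime"]["normalize"] is True
```

Point it at packages/sie_server/models/BAAI__bge-small-en-v1.5.yaml before the restart.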
