# Fastembed → SIE
Fastembed is the Qdrant-maintained library that runs ONNX embedding models in-process via onnxruntime. SIE serves the same models out-of-process over HTTP.
## Why migrate

- Out-of-process serving. Every Python worker that imports Fastembed gets its own copy of the model in RAM. SIE loads weights once per worker pod, regardless of how many app processes connect.
- Shared GPU. Fastembed is CPU-only by default (GPU support requires a separate ONNX runtime build). SIE serves on CPU, MPS, or CUDA without changing client code.
- Multi-model. SIE can serve dense, sparse, ColBERT, rerankers, and vision models from one cluster. Fastembed does not cover sparse or cross-encoder rerankers natively.
## What stays the same

- Model checkpoint (e.g. `BAAI/bge-small-en-v1.5`).
- Vector dimension.
- Cosine semantics. Embeddings from Fastembed (ONNX) and SIE (PyTorch) agree to within ~1e-3 in cosine, so existing indexes do not need re-embedding.
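The ~1e-3 figure is easy to sanity-check with plain NumPy. The helper below is a sketch; the perturbed vectors only illustrate the scale of such a drift, they are not real model outputs:

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two 1-D vectors."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Illustrative: a 384-dim vector perturbed at the 1e-3 scale
# still sits far above the 0.999 cosine threshold.
rng = np.random.default_rng(0)
v1 = rng.normal(size=384)
v2 = v1 + 1e-3 * rng.normal(size=384)
print(cosine(v1, v2))
```

Run the same `cosine` over one Fastembed vector and one SIE vector for the same input text to verify the claim on your own checkpoint.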
## Before

```python
from fastembed import TextEmbedding

encoder = TextEmbedding(model_name="BAAI/bge-small-en-v1.5")
[vec] = list(encoder.embed(["The mitochondrion is the powerhouse of the cell."]))
```

## After

```python
from sie_sdk import SIEClient
from sie_sdk.types import Item

client = SIEClient("http://localhost:8080")
result = client.encode(
    "BAAI/bge-small-en-v1.5",
    Item(text="The mitochondrion is the powerhouse of the cell."),
)
vec = result["dense"]  # np.ndarray, shape [384]
```

## Re-embed required?

No, if you keep the same checkpoint. Yes, if you take the migration as a chance to upgrade to a stronger model.
## Run it yourself

`sentence-transformers/all-MiniLM-L6-v2` is the common-denominator small model that both Fastembed and SIE ship by default.

```shell
mise run serve -- -m sentence-transformers/all-MiniLM-L6-v2
uv add fastembed
```

Run the "before" and "after" snippets from this page, swapping in this model name. Expected: identical dimension (384) and cosine similarity at or above 0.999.
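The comparison can be scripted end to end. This sketch assumes the server above is running on `localhost:8080` and both packages are installed; it reuses the exact API shown in the "before" and "after" snippets:

```python
import numpy as np
from fastembed import TextEmbedding
from sie_sdk import SIEClient
from sie_sdk.types import Item

MODEL = "sentence-transformers/all-MiniLM-L6-v2"
TEXT = "The mitochondrion is the powerhouse of the cell."

# Fastembed: in-process ONNX inference.
[fe_vec] = list(TextEmbedding(model_name=MODEL).embed([TEXT]))

# SIE: out-of-process over HTTP.
client = SIEClient("http://localhost:8080")
sie_vec = client.encode(MODEL, Item(text=TEXT))["dense"]

# Same dimension, near-identical direction.
assert fe_vec.shape == sie_vec.shape == (384,)
cos = float(fe_vec @ sie_vec / (np.linalg.norm(fe_vec) * np.linalg.norm(sie_vec)))
print(f"dim={sie_vec.shape[0]} cosine={cos:.5f}")
```

If the cosine lands below 0.999, check that both sides are really loading the same checkpoint and that normalization is enabled on the SIE side.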
## Using BAAI/bge-small-en-v1.5 (or any other model)

Most Fastembed users actually run `BAAI/bge-small-en-v1.5`. SIE doesn't ship that bundle by default, so add one; the migration mechanics are otherwise identical. Drop a YAML file at `packages/sie_server/models/BAAI__bge-small-en-v1.5.yaml`:
```yaml
sie_id: BAAI/bge-small-en-v1.5
hf_id: BAAI/bge-small-en-v1.5
inputs:
  text: true
  image: false
  audio: false
  video: false
tasks:
  encode:
    dense:
      dim: 384
    sparse: null
    multivector: null
  score: null
  extract: null
max_sequence_length: 512
profiles:
  default:
    max_batch_tokens: 16384
    compute_precision: null
adapter_path: sie_server.adapters.sentence_transformer:SentenceTransformerDenseAdapter
adapter_options:
  loadtime:
    trust_remote_code: false
  runtime:
    pooling: cls
    normalize: true
```

Then `mise run serve -- -m BAAI/bge-small-en-v1.5`. The `sentence-transformers__all-MiniLM-L6-v2.yaml` bundle is the closest working reference; copy it and adjust `dim`, `max_sequence_length`, and `pooling`.
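Once the server is up with the new bundle, a minimal smoke test through the SDK confirms the bundle loaded and the dimension matches the YAML. This is a sketch assuming the default `localhost:8080` address used throughout this page:

```python
from sie_sdk import SIEClient
from sie_sdk.types import Item

client = SIEClient("http://localhost:8080")
vec = client.encode("BAAI/bge-small-en-v1.5", Item(text="smoke test"))["dense"]

# Must match the `dim: 384` declared in the bundle YAML.
assert vec.shape == (384,), vec.shape
print("bundle OK")
```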