
Fastembed → SIE

Fastembed is the Qdrant-maintained library that runs ONNX embedding models in-process via onnxruntime. SIE serves the same models out-of-process over HTTP.

What changes:

  • Out-of-process serving. Every Python worker that imports Fastembed gets its own copy of the model in RAM. SIE loads weights once per worker pod, no matter how many app processes connect.
  • Shared GPU. Fastembed is CPU-only by default (GPU support requires a separate ONNX runtime build). SIE serves on CPU, MPS, or CUDA without changing client code.
  • Multi-model. SIE can serve dense, sparse, ColBERT, rerankers, and vision models from one cluster. Fastembed does not cover sparse or cross-encoder rerankers natively.

What stays the same:

  • Model checkpoint (e.g. BAAI/bge-small-en-v1.5).
  • Vector dimension.
  • Cosine semantics. Embeddings from Fastembed (ONNX) and SIE (PyTorch) agree to within ~1e-3 cosine, so existing indexes do not need re-embedding.
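A quick numpy sketch of what the ~1e-3 cosine band means in practice (illustrative unit vectors, not real model output):

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Two unit vectors separated by roughly the ONNX-vs-PyTorch gap:
a = np.zeros(384)
a[0] = 1.0
b = a.copy()
b[1] = 0.045           # small orthogonal perturbation
b /= np.linalg.norm(b)

print(f"{cosine(a, b):.4f}")  # 0.9990 — close enough to reuse the index
```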
Before (Fastembed):

from fastembed import TextEmbedding

encoder = TextEmbedding(model_name="BAAI/bge-small-en-v1.5")
[vec] = list(encoder.embed(["The mitochondrion is the powerhouse of the cell."]))

After (SIE):

from sie_sdk import SIEClient
from sie_sdk.types import Item

client = SIEClient("http://localhost:8080")
result = client.encode(
    "BAAI/bge-small-en-v1.5",
    Item(text="The mitochondrion is the powerhouse of the cell."),
)
vec = result["dense"]  # np.ndarray, shape [384]
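If many call sites iterate over Fastembed's embed() generator, a thin shim can keep them unchanged during the cutover. This is a hypothetical helper, not part of sie_sdk; the client and the Item class are injected so the sketch stays dependency-free:

```python
from typing import Iterable, Iterator

class CompatEncoder:
    """Mimics Fastembed's embed() generator on top of an SIE-style client.

    `client` is anything with encode(model_id, item) -> {"dense": vector},
    and `item_cls` is the request wrapper (sie_sdk.types.Item in practice).
    """

    def __init__(self, client, item_cls, model: str = "BAAI/bge-small-en-v1.5"):
        self._client = client
        self._item_cls = item_cls
        self._model = model

    def embed(self, texts: Iterable[str]) -> Iterator:
        for text in texts:
            # One request per text; the caller keeps its lazy iteration.
            yield self._client.encode(self._model, self._item_cls(text=text))["dense"]
```

Existing code like `list(encoder.embed(texts))` then works against `CompatEncoder(SIEClient("http://localhost:8080"), Item)` without further changes.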

Do you need to re-embed? No, if you keep the same checkpoint. Yes, if you take the migration as a chance to upgrade to a stronger model.

sentence-transformers/all-MiniLM-L6-v2 is the common-denominator small model both Fastembed and SIE ship by default.

Start SIE with the default model and install Fastembed for the side-by-side check:

mise run serve -- -m sentence-transformers/all-MiniLM-L6-v2
uv add fastembed

Then run the ‘before’ and ‘after’ snippets from this page. Expected: identical dimension (384) and a cosine similarity at or above 0.999.
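The parity check can be scripted once both snippets have run; a minimal sketch, where `vec_fastembed` and `vec_sie` are the arrays produced by the before/after code:

```python
import numpy as np

def check_parity(vec_fastembed, vec_sie, dim=384, min_cos=0.999):
    """Assert that the two engines produce interchangeable embeddings."""
    a = np.asarray(vec_fastembed, dtype=np.float64)
    b = np.asarray(vec_sie, dtype=np.float64)
    assert a.shape == b.shape == (dim,), f"dimension mismatch: {a.shape} vs {b.shape}"
    cos = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    assert cos >= min_cos, f"cosine {cos:.6f} below {min_cos}"
    return cos
```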

Using BAAI/bge-small-en-v1.5 (or any other model)

Most Fastembed users actually run BAAI/bge-small-en-v1.5. SIE doesn’t ship that bundle by default, so add one. Migration mechanics are otherwise identical. Drop a YAML file at packages/sie_server/models/BAAI__bge-small-en-v1.5.yaml:

sie_id: BAAI/bge-small-en-v1.5
hf_id: BAAI/bge-small-en-v1.5
inputs:
  text: true
  image: false
  audio: false
  video: false
tasks:
  encode:
    dense:
      dim: 384
    sparse: null
    multivector: null
  score: null
  extract: null
max_sequence_length: 512
profiles:
  default:
    max_batch_tokens: 16384
    compute_precision: null
adapter_path: sie_server.adapters.sentence_transformer:SentenceTransformerDenseAdapter
adapter_options:
  loadtime:
    trust_remote_code: false
  runtime:
    pooling: cls
    normalize: true

Then mise run serve -- -m BAAI/bge-small-en-v1.5. The sentence-transformers__all-MiniLM-L6-v2.yaml bundle is the closest working reference; copy it and adjust dim, max_sequence_length, and pooling.
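Before restarting the server, a pre-flight check can catch the fields most often mistyped when copying a bundle. This is a sketch; it assumes PyYAML is installed and that dim sits under tasks.encode.dense as in the bundle above:

```python
import yaml  # PyYAML — assumed available

def check_bundle(path: str) -> None:
    """Fail fast on the commonly mistyped bundle fields."""
    with open(path) as f:
        bundle = yaml.safe_load(f)
    assert bundle["sie_id"] == bundle["hf_id"] == "BAAI/bge-small-en-v1.5"
    assert bundle["tasks"]["encode"]["dense"]["dim"] == 384
    assert bundle["adapter_options"]["runtime"]["pooling"] == "cls"
    assert bundle["adapter_options"]["runtime"]["normalize"] is True
```

Point it at packages/sie_server/models/BAAI__bge-small-en-v1.5.yaml before the restart.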
