# Infinity → SIE
Infinity ships an embedding, reranking, and CLIP server with an OpenAI-compatible API. SIE covers the same surface plus sparse / multivector models, multi-model serving, and managed deployment tooling.
## Why migrate

- One cluster for N models. Infinity runs a single model per container. SIE serves all configured models from one cluster with LRU eviction.
- Typed multi-modality outputs. Infinity centers on dense embeddings plus cross-encoder rerank. SIE returns typed `dense`, `sparse`, and `multivector` outputs from a single `encode` call, useful when an upstream retriever wants more than one signal per document.
- Managed deployment. SIE ships a Helm chart, KEDA autoscaler config, Grafana dashboards, and a `sie-admin` CLI.
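To make the typed outputs concrete, here is a rough sketch of what one `encode` result carries. The field names echo the list above, but `EncodeResult` and its exact shapes are illustrative, not the real `sie_sdk` types.

```python
from dataclasses import dataclass

# Hypothetical shapes for the three output types;
# the real sie_sdk response classes may differ.
@dataclass
class EncodeResult:
    dense: list[float]              # one vector per input text
    sparse: dict[int, float]        # token id -> weight (SPLADE-style)
    multivector: list[list[float]]  # one small vector per token (ColBERT-style)

# A retriever can combine the signals, e.g. dense for recall,
# multivector for late-interaction re-scoring.
result = EncodeResult(
    dense=[0.1, 0.2, 0.3],
    sparse={101: 0.8, 2054: 0.4},
    multivector=[[0.1, 0.0], [0.0, 0.2]],
)
```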
## What stays the same

- OpenAI-compatible endpoint. Existing Infinity clients (typically the OpenAI SDK pointed at Infinity) can swap base URLs and keep working.
- Model checkpoints. Same checkpoint, same vector space. Most Infinity-supported encoders work in SIE without re-engineering.
## Before

```sh
# Pin a tag in production; :latest shown for brevity.
# See https://hub.docker.com/r/michaelfeil/infinity/tags
docker run --rm -p 7997:7997 michaelfeil/infinity:latest \
  v2 --model-id BAAI/bge-small-en-v1.5
```

```python
from openai import OpenAI

client = OpenAI(api_key="not-needed", base_url="http://localhost:7997")
resp = client.embeddings.create(
    model="BAAI/bge-small-en-v1.5",
    input=["..."],
)
```

## After

```sh
mise run serve -- -m BAAI/bge-small-en-v1.5
```

```python
from openai import OpenAI

# Drop-in: keep the OpenAI SDK, change base_url.
client = OpenAI(api_key="not-needed", base_url="http://localhost:8080/v1")

# …or use the native SDK for sparse/multivector/rerank.
from sie_sdk import SIEClient
from sie_sdk.types import Item

sie = SIEClient("http://localhost:8080")
result = sie.encode("BAAI/bge-small-en-v1.5", Item(text="..."))
```

## Mapping
| Infinity | SIE equivalent |
|---|---|
| `--model-id BAAI/bge-small-en-v1.5` | bundle config + `mise run serve` |
| `--engine torch` / `optimum` / `ctranslate2` | SIE adapter selection (auto) |
| Multiple containers for multiple models | Single SIE cluster, one Helm chart |
| `/embeddings` (OpenAI-compatible) | `/v1/embeddings` on SIE |
| `/rerank` (custom Infinity endpoint) | `client.score(...)` (Python SDK) |
| `/classify` | `client.extract(...)` with classifier model |
| Image inputs on `/embeddings` (CLIP) | `client.encode(model, Item(image=...))` |
## Re-embed required?

No. Cross-backend numerical drift between Infinity (PyTorch, CTranslate2, or ONNX, depending on flags) and SIE (PyTorch) sits around 1e-3 in cosine distance, well below any retrieval-quality threshold.
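To get a feel for why drift at that scale is negligible, here is a quick NumPy simulation: perturb unit vectors until they sit near 1e-3 cosine distance from the originals, then compare top-10 retrieval for a random query. The dimensions and noise scale are illustrative, not measured Infinity/SIE numbers.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n_docs = 384, 1000

# Unit-norm "original" document embeddings.
docs = rng.normal(size=(n_docs, dim))
docs /= np.linalg.norm(docs, axis=1, keepdims=True)

# Simulate cross-backend drift: noise scaled to land near 1e-3 cosine distance.
drifted = docs + rng.normal(size=docs.shape) * 0.045 / np.sqrt(dim)
drifted /= np.linalg.norm(drifted, axis=1, keepdims=True)

query = rng.normal(size=dim)
query /= np.linalg.norm(query)

# Rank documents against the query with both embedding sets.
top10_a = np.argsort(docs @ query)[::-1][:10]
top10_b = np.argsort(drifted @ query)[::-1][:10]

cos_dist = 1 - (docs * drifted).sum(axis=1).mean()
print(f"mean cosine distance: {cos_dist:.1e}")
print("top-10 overlap:", len(set(top10_a) & set(top10_b)))
```

The drifted set lands near 1e-3 mean cosine distance, and the top-10 result sets stay almost entirely identical.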
## Run it yourself

```sh
# Pin a tag in production; :latest shown for brevity.
docker run -d -p 7997:7997 michaelfeil/infinity:latest \
  v2 --model-id sentence-transformers/all-MiniLM-L6-v2
```

```sh
mise run serve -- -m sentence-transformers/all-MiniLM-L6-v2
uv add openai
```

Run the ‘before’ and ‘after’ snippets from this page against both. Expected: identical dimension (384) and cosine similarity at or above 0.999.
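To check the ‘cosine at or above 0.999’ expectation yourself, here is a small helper sketch. The ports and model name come from the snippets above; the function names and the `compare` wrapper are just illustrative, and the OpenAI import is deferred so the cosine helper works on its own.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def embed(base_url: str, text: str,
          model: str = "sentence-transformers/all-MiniLM-L6-v2") -> list[float]:
    """Fetch one embedding from an OpenAI-compatible endpoint."""
    from openai import OpenAI  # deferred: only needed when a server is up
    client = OpenAI(api_key="not-needed", base_url=base_url)
    resp = client.embeddings.create(model=model, input=[text])
    return resp.data[0].embedding

def compare(text: str = "hello migration") -> float:
    """Embed the same text on both servers and return the cosine."""
    a = embed("http://localhost:7997", text)     # Infinity
    b = embed("http://localhost:8080/v1", text)  # SIE
    assert len(a) == len(b) == 384               # expected MiniLM dimension
    return cosine(a, b)
```

With both containers running, `compare()` should return a value at or above 0.999.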