---
title: Infinity → SIE
description: Migrate from michaelfeil/infinity single-model containers to SIE multi-model serving with managed deployment.
canonical_url: https://superlinked.com/docs/migrate/infinity
last_updated: 2026-05-07
---

[Infinity](https://github.com/michaelfeil/infinity) ships an embedding,
reranking, and CLIP server with an OpenAI-compatible API. SIE covers
the same surface plus sparse / multivector models, multi-model
serving, and managed deployment tooling.

## Why migrate

- **One cluster for N models.** Infinity serves one model per container.
  SIE serves every configured model from one cluster with LRU eviction.
- **Typed multi-modality outputs.** Infinity is centered on dense
  embeddings plus cross-encoder rerank. SIE returns typed `dense`,
  `sparse`, and `multivector` outputs from a single `encode` call,
  useful when an upstream retriever wants more than one signal per
  document.
- **Managed deployment.** SIE ships a Helm chart, KEDA autoscaler config,
  Grafana dashboards, and a `sie-admin` CLI.
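
The typed-output point is easiest to see in the response shape. A minimal
sketch of what a multi-output `encode` result can look like (the field
names here are illustrative assumptions, not the SIE SDK's confirmed
schema):

```python
from dataclasses import dataclass, field


@dataclass
class EncodeResult:
    """Illustrative shape of a typed multi-output embedding response."""

    dense: list[float] = field(default_factory=list)        # one vector per input
    sparse: dict[int, float] = field(default_factory=dict)  # token id -> weight
    multivector: list[list[float]] = field(default_factory=list)  # late-interaction vectors


# A hybrid retriever can consume all three signals from one call:
result = EncodeResult(
    dense=[0.12, -0.03, 0.44],
    sparse={1012: 0.8, 2057: 0.3},
    multivector=[[0.1, 0.2], [0.3, 0.4]],
)
assert len(result.multivector) == 2
```

An upstream retriever can then fuse, say, dense scores with sparse
lexical matches without a second round-trip per document.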

## What stays the same

- **OpenAI-compatible endpoint.** Existing Infinity clients (typically
  the OpenAI SDK pointed at Infinity) can swap base URLs and keep
  working.
- **Model checkpoints.** Same checkpoint, same vector space. Most
  Infinity-supported encoders work in SIE without re-engineering.

## Before

```bash
# Pin a tag in production; :latest shown for brevity.
# See https://hub.docker.com/r/michaelfeil/infinity/tags
docker run --rm -p 7997:7997 michaelfeil/infinity:latest \
  v2 --model-id BAAI/bge-small-en-v1.5
```

```python
from openai import OpenAI

client = OpenAI(api_key="not-needed", base_url="http://localhost:7997")
resp = client.embeddings.create(
    model="BAAI/bge-small-en-v1.5",
    input=["..."],
)
```

## After

```bash
mise run serve -- -m BAAI/bge-small-en-v1.5
```

```python
from openai import OpenAI

# Drop-in: keep the OpenAI SDK, change base_url.
client = OpenAI(api_key="not-needed", base_url="http://localhost:8080/v1")

# …or use the native SDK for sparse/multivector/rerank.
from sie_sdk import SIEClient
from sie_sdk.types import Item

sie = SIEClient("http://localhost:8080")
result = sie.encode("BAAI/bge-small-en-v1.5", Item(text="..."))
```

## Mapping

| Infinity                                | SIE equivalent                            |
|-----------------------------------------|-------------------------------------------|
| `--model-id BAAI/bge-small-en-v1.5`     | bundle config + `mise run serve`          |
| `--engine torch` / `optimum` / `ctranslate2` | SIE adapter selection (auto)         |
| Multiple containers for multiple models | Single SIE cluster, one Helm chart        |
| `/embeddings` (OpenAI-compatible)       | `/v1/embeddings` on SIE                   |
| `/rerank` (custom Infinity endpoint)    | `client.score(...)` (Python SDK)          |
| `/classify`                             | `client.extract(...)` with classifier model |
| Image inputs on `/embeddings` (CLIP)    | `client.encode(model, Item(image=...))`   |
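
For the `/rerank` row, the request bodies map almost field-for-field. A
sketch of the translation, where the SIE `score(...)` argument names are
assumptions rather than confirmed SDK signatures:

```python
# Infinity's /rerank endpoint takes a query plus candidate documents.
infinity_rerank_body = {
    "model": "BAAI/bge-reranker-base",
    "query": "what is a panda?",
    "documents": [
        "The giant panda is a bear species native to China.",
        "Paris is the capital of France.",
    ],
}

# Hypothetical SIE equivalent, assuming a score(model, query, items) shape:
#   scores = sie.score(
#       infinity_rerank_body["model"],
#       query=infinity_rerank_body["query"],
#       items=[Item(text=d) for d in infinity_rerank_body["documents"]],
#   )
```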

## Re-embed required?

**No.** Cross-backend numerical drift between Infinity (PyTorch,
CTranslate2, or ONNX, depending on flags) and SIE (PyTorch) is on the
order of 1e-3 in cosine distance (similarity ≥ 0.999), far too small to
affect retrieval quality.

## Run it yourself

```bash
# Pin a tag in production; :latest shown for brevity.
docker run -d -p 7997:7997 michaelfeil/infinity:latest \
  v2 --model-id sentence-transformers/all-MiniLM-L6-v2

mise run serve -- -m sentence-transformers/all-MiniLM-L6-v2
uv add openai
```

Run the 'before' and 'after' snippets from this page against both
servers. Expected: identical embedding dimensionality (384) and cosine
similarity at or above 0.999.
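
To automate that comparison, a small dependency-free helper can assert
both expectations. This is a hypothetical utility for this check, not
part of either SDK:

```python
import math


def assert_compatible(a: list[float], b: list[float], min_cos: float = 0.999) -> float:
    """Check that two embeddings of the same text agree across backends.

    Raises AssertionError if the dimensions differ or cosine similarity
    falls below min_cos; returns the cosine similarity otherwise.
    """
    assert len(a) == len(b), f"dim mismatch: {len(a)} vs {len(b)}"
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    cos = dot / (norm_a * norm_b)
    assert cos >= min_cos, f"cosine {cos:.6f} below {min_cos}"
    return cos
```

Embed the same input on both servers using the 'before' and 'after'
snippets above, pull `resp.data[0].embedding` from each response, and
pass the two vectors in.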
