
OpenAI → SIE

OpenAI Embeddings is a paid managed API. SIE is a self-hosted inference engine. There are two migration paths:

  1. Drop-in shim. Point the OpenAI SDK at SIE’s /v1/embeddings endpoint. Two-line change, every other call site untouched.
  2. Native SDK. Use sie_sdk.SIEClient directly to access sparse, multivector, ColBERT, rerankers, and extraction (sketched after the scripts below).
  • Cost crosses over at moderate volume. OpenAI bills per token; SIE has flat hourly cost. The right answer depends on your workload; see the worked example below. Plug in your own numbers before quoting either side.
  • Data residency. Embeddings of your text never leave your network.
  • No rate limits. Your ceiling is whatever GPU capacity you provision. You also own the SLA: OpenAI’s published 99.9% becomes whatever your platform team operates.
  • Model breadth. OpenAI ships 3 embedding models. SIE serves 100+ out of the box (109 bundle configs as of writing) across dense, sparse, ColBERT/multivector, vision, and rerankers.
The two-line change:

```python
# before: hosted OpenAI
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# after: self-hosted SIE, same SDK
client = OpenAI(api_key="not-needed", base_url="http://sie:8080/v1")
```

…and model="text-embedding-3-small" becomes model="intfloat/e5-base-v2" (or whichever SIE model you pick).

```python
import os

from openai import OpenAI

# Before: hosted OpenAI.
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
resp = client.embeddings.create(
    model="text-embedding-3-small",
    input=["The mitochondrion is the powerhouse of the cell."],
)
vector = resp.data[0].embedding  # 1536-dim
```
```python
from openai import OpenAI

# After: the same SDK pointed at a local SIE instance.
client = OpenAI(api_key="not-needed", base_url="http://localhost:8080/v1")
resp = client.embeddings.create(
    model="intfloat/e5-base-v2",
    input=["The mitochondrion is the powerhouse of the cell."],
)
vector = resp.data[0].embedding  # 768-dim
```
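
If you need capabilities the OpenAI surface can't express, that's path 2, the native SDK. A minimal sketch; the method names and model IDs below are illustrative assumptions, not sie_sdk's confirmed API, so check the SDK reference for the real signatures:

```python
# Hypothetical sie_sdk usage. Only the SIEClient entry point comes from
# this page; every method name below is an assumption.
from sie_sdk import SIEClient

client = SIEClient(base_url="http://localhost:8080")

# Dense embedding, equivalent to the shim call above.
dense = client.embed(
    model="intfloat/e5-base-v2",
    texts=["The mitochondrion is the powerhouse of the cell."],
)

# Things the OpenAI embeddings API has no field for (hypothetical calls):
sparse = client.embed_sparse(model="naver/splade-v3", texts=["mitochondria"])
scores = client.rerank(
    model="BAAI/bge-reranker-v2-m3",
    query="what organelle produces ATP?",
    documents=["The mitochondrion ...", "The ribosome ..."],
)
```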
| OpenAI | SIE equivalent |
| --- | --- |
| text-embedding-3-small (1536) | intfloat/e5-base-v2 (768) |
| text-embedding-3-large (3072) | Alibaba-NLP/gte-Qwen2-1.5B-instruct or NovaSearch/stella_en_1.5B_v5 |
| dimensions=N truncation | Slice client-side, or pick a smaller-dim model |
| encoding_format="base64" | Supported on /v1/embeddings with the same field |
| user="..." for abuse tracking | Use SIE telemetry / your own tracing |

Public list prices, no negotiated discounts, single region:

| Workload | OpenAI (text-embedding-3-small @ $0.02 / 1M tokens) | SIE on one g5.xlarge (1× A10G, ~$1.00/hr on-demand) |
| --- | --- | --- |
| 10M tokens / day | ~$0.20 / day · ~$6 / month | ~$24 / day · ~$730 / month |
| 100M tokens / day | ~$2 / day · ~$60 / month | ~$24 / day · ~$730 / month |
| 1B tokens / day | ~$20 / day · ~$600 / month | ~$24 / day · ~$730 / month |
| 5B tokens / day | ~$100 / day · ~$3,000 / month | needs ≥1 GPU; still ~$730–$2,200 / month |

Crossover sits around 1.2B tokens / day on this size class. Below that, OpenAI is cheaper and you don’t run a GPU. Above that, SIE wins on cost and the gap widens linearly. None of this counts the engineering cost of operating a GPU pool, which is the actually-load-bearing variable for most teams. Plug in your own utilization, GPU size, and reserved-instance discount before quoting a number to your manager.
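
The crossover arithmetic, if you want to rerun it with your own rates (the numbers below are the table's list prices):

```python
# Break-even: where per-token billing equals flat GPU rent.
openai_per_1m_tokens = 0.02  # $/1M tokens, text-embedding-3-small list price
gpu_per_hour = 1.00          # $/hr, g5.xlarge on-demand; apply your discount

gpu_per_day = gpu_per_hour * 24                             # $24/day
crossover = gpu_per_day / openai_per_1m_tokens * 1_000_000  # tokens/day

print(f"break-even ~ {crossover / 1e9:.1f}B tokens/day")    # ~1.2B
```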

Do you need to re-embed your corpus? Yes. A different model means a different vector space. Even text-embedding-3-small truncated to 1024 dims is not interchangeable with any open-source model. Plan a re-embed window before cutting over.
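
A minimal re-embed pass using the same shim endpoint; the batch size and the write-to-a-new-index step are placeholders for whatever your vector store needs:

```python
from openai import OpenAI

sie = OpenAI(api_key="not-needed", base_url="http://localhost:8080/v1")

def reembed(docs: list[str], batch_size: int = 64) -> list[list[float]]:
    """Embed the corpus against SIE in batches. Write the results to a
    NEW index and flip reads only after your eval signs off."""
    vectors: list[list[float]] = []
    for i in range(0, len(docs), batch_size):
        resp = sie.embeddings.create(
            model="intfloat/e5-base-v2",
            input=docs[i : i + batch_size],
        )
        vectors.extend(d.embedding for d in resp.data)
    return vectors
```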

```sh
# Install the OpenAI SDK and set a key for the 'before' script.
uv add openai
export OPENAI_API_KEY=sk-...

# Start SIE with E5 loaded.
mise run serve -- -m intfloat/e5-base-v2

# Run the OpenAI 'before' script and the SIE 'after' script
# from this page. Compare the printed embeddings.
```

The two vectors live in different spaces with different dimensions, so cosine between them carries no signal; don't expect agreement. For sign-off, run your retrieval eval against both legs.
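
If you need a starting point for that eval, recall@k over a small labeled set of query/relevant-doc pairs is usually enough to catch a regression. A sketch, assuming you have already embedded queries and docs with each leg:

```python
import numpy as np

def recall_at_k(query_vecs: np.ndarray, doc_vecs: np.ndarray,
                relevant: list[int], k: int = 10) -> float:
    """Fraction of queries whose labeled doc lands in the cosine top-k."""
    q = query_vecs / np.linalg.norm(query_vecs, axis=1, keepdims=True)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    topk = np.argsort(-(q @ d.T), axis=1)[:, :k]
    return float(np.mean([rel in row for rel, row in zip(relevant, topk)]))

# Run once per leg and compare:
#   recall_at_k(q_openai, d_openai, labels) vs recall_at_k(q_sie, d_sie, labels)
```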
