# OpenAI → SIE
OpenAI Embeddings is a paid managed API. SIE is a self-hosted inference engine. There are two migration paths:
- Drop-in shim. Point the OpenAI SDK at SIE’s `/v1/embeddings` endpoint. Two-line change; every other call site stays untouched.
- Native SDK. Use `sie_sdk.SIEClient` directly to access sparse, multivector, ColBERT, rerankers, and extraction.
## Why migrate

- Cost crosses over at moderate volume. OpenAI bills per token; SIE has a flat hourly cost. The right answer depends on your workload; see the worked example below, and plug in your own numbers before quoting either side.
- Data residency. Embeddings of your text never leave your network.
- No rate limits. Your ceiling is whatever GPU capacity you provision. You also own the SLA: OpenAI’s published 99.9% becomes whatever your platform team operates.
- Model breadth. OpenAI ships 3 embedding models. SIE serves 100+ out of the box (109 bundle configs as of writing) across dense, sparse, ColBERT/multivector, vision, and rerankers.
The shim change itself:

```python
# Before
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
# After
client = OpenAI(api_key="not-needed", base_url="http://sie:8080/v1")
```

…and `model="text-embedding-3-small"` becomes `model="intfloat/e5-base-v2"` (or whichever SIE model you pick).
## Before

```python
import os

from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
resp = client.embeddings.create(
    model="text-embedding-3-small",
    input=["The mitochondrion is the powerhouse of the cell."],
)
vector = resp.data[0].embedding  # 1536-dim
```

## After (drop-in shim)

```python
from openai import OpenAI

client = OpenAI(api_key="not-needed", base_url="http://localhost:8080/v1")
resp = client.embeddings.create(
    model="intfloat/e5-base-v2",
    input=["The mitochondrion is the powerhouse of the cell."],
)
vector = resp.data[0].embedding  # 768-dim
```

## After (native SDK)

```python
from sie_sdk import SIEClient
from sie_sdk.types import Item

client = SIEClient("http://localhost:8080")
result = client.encode(
    "intfloat/e5-base-v2",
    Item(text="The mitochondrion is the powerhouse of the cell."),
)
vector = result["dense"].tolist()
```

## Mapping

| OpenAI | SIE equivalent |
|---|---|
| `text-embedding-3-small` (1536) | `intfloat/e5-base-v2` (768) |
| `text-embedding-3-large` (3072) | `Alibaba-NLP/gte-Qwen2-1.5B-instruct` or `NovaSearch/stella_en_1.5B_v5` |
| `dimensions=N` truncation | Slice client-side, or pick a smaller-dim model |
| `encoding_format="base64"` | Supported on `/v1/embeddings` with the same field |
| `user="..."` for abuse tracking | Use SIE telemetry / your own tracing |
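Two of the mapping rows above are mechanical enough to sketch client-side. A minimal sketch, assuming NumPy; `truncate` and `decode_b64_embedding` are illustrative helper names, not SIE APIs. Note that slicing only preserves quality if the model was trained for truncation (Matryoshka-style); many open models were not, so validate on your own eval set.

```python
import base64

import numpy as np

def truncate(vec, n):
    """Emulate OpenAI's dimensions=N client-side: slice to the
    first n dims, then L2-renormalize."""
    v = np.asarray(vec, dtype=np.float32)[:n]
    norm = np.linalg.norm(v)
    return v / norm if norm else v

def decode_b64_embedding(b64):
    """encoding_format="base64" responses carry the vector as
    little-endian float32 bytes, base64-encoded."""
    return np.frombuffer(base64.b64decode(b64), dtype="<f4")
```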
## Worked cost example

Public list prices, no negotiated discounts, single region:
| Workload | OpenAI (text-embedding-3-small @ $0.02 / 1M tokens) | SIE on one g5.xlarge (1× A10G, ~$1.00/hr on-demand) |
|---|---|---|
| 10M tokens / day | ~$6 / month | ~$730 / month |
| 100M tokens / day | ~$60 / month | ~$730 / month |
| 1B tokens / day | ~$600 / month | ~$730 / month |
| 5B tokens / day | ~$3,000 / month | needs ~3 GPUs at this size; still ~$2,200 / month |
Crossover sits around 1.2B tokens / day on this size class. Below that, OpenAI is cheaper and you don’t run a GPU. Above that, SIE wins on cost and the gap widens linearly. None of this counts the engineering cost of operating a GPU pool, which is the actually-load-bearing variable for most teams. Plug in your own utilization, GPU size, and reserved-instance discount before quoting a number to your manager.
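The table's arithmetic is simple enough to script. A sketch using the table's list prices (730 hours/month; function names and defaults are illustrative, swap in your own):

```python
def openai_monthly(tokens_per_day, usd_per_m_tokens=0.02, days=30):
    """Per-token billing: daily volume x price x days."""
    return tokens_per_day / 1e6 * usd_per_m_tokens * days

def sie_monthly(gpu_usd_per_hour=1.00, gpus=1, hours=730):
    """Flat hourly billing, independent of volume."""
    return gpu_usd_per_hour * hours * gpus

def crossover_tokens_per_day(usd_per_m_tokens=0.02, gpu_usd_per_hour=1.00):
    """Daily volume at which the two monthly bills match."""
    return sie_monthly(gpu_usd_per_hour) / (usd_per_m_tokens * 30) * 1e6
```

On these defaults the crossover lands near 1.2B tokens/day, matching the table; it says nothing about the operational cost of the GPU pool, which you still have to price separately.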
## Re-embed required?

Yes. Different model → different vector space. Even
text-embedding-3-small truncated to 1024 dims is not interchangeable
with any open-source model. Plan a re-embed window before cutting over.
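The re-embed window itself is a batched backfill. A minimal sketch against the OpenAI-compatible `/v1/embeddings` shim, using only the standard library; `batches`, `reembed`, and the corpus shape (`(doc_id, text)` pairs) are assumptions for illustration:

```python
import json
import urllib.request

def batches(seq, n):
    """Chunk a sequence into pieces of at most n items."""
    return [seq[i:i + n] for i in range(0, len(seq), n)]

def reembed(corpus, url="http://localhost:8080/v1/embeddings",
            model="intfloat/e5-base-v2", batch_size=64):
    """Yield (doc_id, vector) for a corpus of (doc_id, text) pairs.
    Write into a NEW index and flip the alias only once the whole
    backfill is done, so reads never mix vector spaces."""
    for chunk in batches(corpus, batch_size):
        body = json.dumps({
            "model": model,
            "input": [text for _, text in chunk],
        }).encode()
        req = urllib.request.Request(
            url, data=body, headers={"Content-Type": "application/json"})
        with urllib.request.urlopen(req) as resp:
            data = json.load(resp)["data"]
        for (doc_id, _), item in zip(chunk, data):
            yield doc_id, item["embedding"]
```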
## Run it yourself

```sh
# Start SIE with E5 loaded.
mise run serve -- -m intfloat/e5-base-v2
```

```sh
# Install the SDK and set your key, then run the OpenAI 'before'
# script and the SIE 'after' script from this page.
# Compare the printed embeddings.
uv add openai
export OPENAI_API_KEY=sk-...
```

Cosine across spaces carries no signal, so don’t expect 1.0. For sign-off, run your retrieval eval against both legs.
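If you want a cheap first-pass check before the full retrieval eval, top-k overlap between the two legs on the same queries and corpus works. A sketch assuming NumPy, with each leg's query vector and document matrix already computed; `topk` and `overlap_at_k` are illustrative names, not SIE APIs:

```python
import numpy as np

def topk(query_vec, doc_mat, k=10):
    """Indices of the k most cosine-similar rows of doc_mat."""
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_mat / np.linalg.norm(doc_mat, axis=1, keepdims=True)
    return set(np.argsort(d @ q)[::-1][:k].tolist())

def overlap_at_k(q_a, docs_a, q_b, docs_b, k=10):
    """Fraction of top-k doc ids the two embedding spaces agree on."""
    return len(topk(q_a, docs_a, k) & topk(q_b, docs_b, k)) / k
```

Perfect overlap is not the goal; your labeled retrieval eval is. But an overlap near zero usually flags a wiring bug rather than a genuine model difference.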