
OpenAI → SIE

OpenAI Embeddings is a paid managed API. SIE is a self-hosted inference engine. There are two migration paths:

  1. Drop-in shim. Point the OpenAI SDK at SIE’s /v1/embeddings endpoint. Two-line change, every other call site untouched.
  2. Native SDK. Use sie_sdk.SIEClient directly to access sparse, multivector, ColBERT, rerankers, and extraction (sketched after the scripts below).
  • Cost crosses over at moderate volume. OpenAI bills per token; SIE has flat hourly cost. The right answer depends on your workload; see the worked example below. Plug in your own numbers before quoting either side.
  • Data residency. Embeddings of your text never leave your network.
  • No rate limits. Your ceiling is whatever GPU capacity you provision. You also own the SLA: OpenAI’s published 99.9% becomes whatever your platform team operates.
  • Model breadth. OpenAI ships 3 embedding models. SIE serves 100+ out of the box (109 bundle configs as of writing) across dense, sparse, ColBERT/multivector, vision, and rerankers.
The two-line change:

```python
# before: hosted OpenAI
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# after: self-hosted SIE, same SDK
client = OpenAI(api_key="not-needed", base_url="http://sie:8080/v1")
```

…and model="text-embedding-3-small" becomes model="intfloat/e5-base-v2" (or whichever SIE model you pick).

```python
import os

from openai import OpenAI

# Before: hosted OpenAI.
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
resp = client.embeddings.create(
    model="text-embedding-3-small",
    input=["The mitochondrion is the powerhouse of the cell."],
)
vector = resp.data[0].embedding  # 1536-dim
```
```python
from openai import OpenAI

# After: the same SDK pointed at a local SIE instance.
client = OpenAI(api_key="not-needed", base_url="http://localhost:8080/v1")
resp = client.embeddings.create(
    model="intfloat/e5-base-v2",
    input=["The mitochondrion is the powerhouse of the cell."],
)
vector = resp.data[0].embedding  # 768-dim
```
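
If you need capabilities the OpenAI surface can't express, that's path 2, the native SDK. A minimal sketch; the method names and model IDs below are illustrative assumptions, not sie_sdk's confirmed API, so check the SDK reference for the real signatures:

```python
# Hypothetical sie_sdk usage. Only the SIEClient entry point comes from
# this page; every method name below is an assumption.
from sie_sdk import SIEClient

client = SIEClient(base_url="http://localhost:8080")

# Dense embedding, equivalent to the shim call above.
dense = client.embed(
    model="intfloat/e5-base-v2",
    texts=["The mitochondrion is the powerhouse of the cell."],
)

# Things the OpenAI embeddings API has no field for (hypothetical calls):
sparse = client.embed_sparse(model="naver/splade-v3", texts=["mitochondria"])
scores = client.rerank(
    model="BAAI/bge-reranker-v2-m3",
    query="what organelle produces ATP?",
    documents=["The mitochondrion ...", "The ribosome ..."],
)
```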
| OpenAI | SIE equivalent |
| --- | --- |
| text-embedding-3-small (1536) | intfloat/e5-base-v2 (768) |
| text-embedding-3-large (3072) | Alibaba-NLP/gte-Qwen2-1.5B-instruct or NovaSearch/stella_en_1.5B_v5 |
| dimensions=N truncation | Slice client-side, or pick a smaller-dim model |
| encoding_format="base64" | Supported on /v1/embeddings with the same field |
| user="..." for abuse tracking | Use SIE telemetry / your own tracing |

Public list prices, no negotiated discounts, single region:

| Workload | OpenAI (text-embedding-3-small @ $0.02 / 1M tokens) | SIE on one g5.xlarge (1× A10G, ~$1.00/hr on-demand) |
| --- | --- | --- |
| 10M tokens / day | ~$0.20 / day · ~$6 / month | ~$24 / day · ~$730 / month |
| 100M tokens / day | ~$2 / day · ~$60 / month | ~$24 / day · ~$730 / month |
| 1B tokens / day | ~$20 / day · ~$600 / month | ~$24 / day · ~$730 / month |
| 5B tokens / day | ~$100 / day · ~$3,000 / month | needs ≥1 GPU; still ~$730–$2,200 / month |

Crossover sits around 1.2B tokens / day on this size class. Below that, OpenAI is cheaper and you don’t run a GPU. Above that, SIE wins on cost and the gap widens linearly. None of this counts the engineering cost of operating a GPU pool, which is the actually-load-bearing variable for most teams. Plug in your own utilization, GPU size, and reserved-instance discount before quoting a number to your manager.
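
The crossover arithmetic, if you want to rerun it with your own rates (the numbers below are the table's list prices):

```python
# Break-even: where per-token billing equals flat GPU rent.
openai_per_1m_tokens = 0.02  # $/1M tokens, text-embedding-3-small list price
gpu_per_hour = 1.00          # $/hr, g5.xlarge on-demand; apply your discount

gpu_per_day = gpu_per_hour * 24                             # $24/day
crossover = gpu_per_day / openai_per_1m_tokens * 1_000_000  # tokens/day

print(f"break-even ~ {crossover / 1e9:.1f}B tokens/day")    # ~1.2B
```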

Do you need to re-embed your corpus? Yes. A different model means a different vector space. Even text-embedding-3-small truncated to 1024 dims is not interchangeable with any open-source model. Plan a re-embed window before cutting over.
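
A minimal re-embed pass using the same shim endpoint; the batch size and the write-to-a-new-index step are placeholders for whatever your vector store needs:

```python
from openai import OpenAI

sie = OpenAI(api_key="not-needed", base_url="http://localhost:8080/v1")

def reembed(docs: list[str], batch_size: int = 64) -> list[list[float]]:
    """Embed the corpus against SIE in batches. Write the results to a
    NEW index and flip reads only after your eval signs off."""
    vectors: list[list[float]] = []
    for i in range(0, len(docs), batch_size):
        resp = sie.embeddings.create(
            model="intfloat/e5-base-v2",
            input=docs[i : i + batch_size],
        )
        vectors.extend(d.embedding for d in resp.data)
    return vectors
```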

```sh
# Install the OpenAI SDK and set a key for the 'before' script.
uv add openai
export OPENAI_API_KEY=sk-...

# Start SIE with E5 loaded.
mise run serve -- -m intfloat/e5-base-v2

# Run the OpenAI 'before' script and the SIE 'after' script
# from this page. Compare the printed embeddings.
```

The two vectors live in different spaces with different dimensions, so cosine between them carries no signal; don't expect agreement. For sign-off, run your retrieval eval against both legs.
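
If you need a starting point for that eval, recall@k over a small labeled set of query/relevant-doc pairs is usually enough to catch a regression. A sketch, assuming you have already embedded queries and docs with each leg:

```python
import numpy as np

def recall_at_k(query_vecs: np.ndarray, doc_vecs: np.ndarray,
                relevant: list[int], k: int = 10) -> float:
    """Fraction of queries whose labeled doc lands in the cosine top-k."""
    q = query_vecs / np.linalg.norm(query_vecs, axis=1, keepdims=True)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    topk = np.argsort(-(q @ d.T), axis=1)[:, :k]
    return float(np.mean([rel in row for rel, row in zip(relevant, topk)]))

# Run once per leg and compare:
#   recall_at_k(q_openai, d_openai, labels) vs recall_at_k(q_sie, d_sie, labels)
```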
