
SIE vs TEI: How Do They Compare?

SIE (Superlinked Inference Engine) and TEI (Text Embeddings Inference by Hugging Face) are both open-source servers for self-hosting text embedding models. TEI is a lightweight server that runs one model per instance, either an embedding model or a reranker. SIE is a broader inference platform that serves multiple models simultaneously, including embedding, reranker, extraction, and OCR models, and supports LoRA adapters and multi-GPU cluster deployments.

Quick comparison

Feature	SIE	TEI
Model types	Embeddings, rerankers, extraction, OCR	Embeddings, rerankers
Multi-model support	✓ (multiple models per cluster)	✗ (one model per instance)
LoRA hot-loading	✓	✗
Multi-GPU cluster	✓ (Helm chart, auto-scaling)	Limited
AWS / GCP Terraform	✓ (official modules)	Manual
SDK	✓ (`sie-sdk`)	REST API only
Monitoring	Built-in	Basic
Licence	Apache 2.0	Apache 2.0
Backed by	Superlinked	Hugging Face

When should you use TEI?

TEI is a good choice when:

You need a single embedding model for a small-scale or prototyping use case
Your team already uses Hugging Face infrastructure extensively
You want the absolute minimum footprint (TEI is a single Docker container)
You don’t need extraction, OCR, or LoRA support, and you don’t need a reranker served from the same deployment as your embedding model

TEI is battle-tested for simple embedding serving and integrates well with the Hugging Face ecosystem.

When should you use SIE?

SIE is the better choice when:

You need multiple models in one deployment (e.g. an embedding model + a reranker + an OCR model)
You want to swap LoRA adapters at runtime without restarting (e.g. switching between legal, medical, and general embeddings)
You’re deploying at production scale on AWS or GCP with auto-scaling and GPU spot instances
You want lower operational overhead: SIE’s Terraform modules and Helm chart handle infrastructure provisioning
You need a proper SDK rather than raw REST calls
Your workload involves document processing (OCR, extraction) as well as embeddings

Performance comparison

SIE’s batching engine is optimised for GPU throughput across concurrent requests and multiple models. When running several models simultaneously (a common production pattern: embed + rerank), SIE’s shared GPU cluster is more efficient than running multiple TEI instances separately.
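The embed + rerank pattern mentioned above can be sketched independently of either server. This is a minimal illustration, not SIE or TEI code: `retrieve_then_rerank` and `rerank_score` are hypothetical names, and `rerank_score` stands in for a call to whichever reranker model you deploy.

```python
import math
from typing import Callable, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_then_rerank(
    query_vec: Sequence[float],
    doc_vecs: Sequence[Sequence[float]],
    docs: Sequence[str],
    rerank_score: Callable[[str], float],  # hypothetical stand-in for a reranker call
    k: int = 3,
) -> List[str]:
    # Stage 1 (embedding model): keep the k documents closest to the query.
    by_similarity = sorted(
        range(len(docs)),
        key=lambda i: cosine(query_vec, doc_vecs[i]),
        reverse=True,
    )
    candidates = [docs[i] for i in by_similarity[:k]]
    # Stage 2 (reranker): reorder only those k candidates by reranker score.
    return sorted(candidates, key=rerank_score, reverse=True)
```

Because the two stages use different models, a one-model-per-instance server needs two instances for this pipeline, while a multi-model server can host both stages in one deployment.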

For single-model, single-request benchmarks, TEI and SIE are comparable. The difference grows with:

Concurrent requests (SIE batches more efficiently)
Multiple model types (SIE shares GPU memory)
Large corpus indexing jobs (SIE’s async batching reduces wall-clock time)
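To make the batching point concrete, here is a minimal sketch of request micro-batching in general. It is not SIE's implementation: `MicroBatcher`, `max_batch_size`, `max_wait_s`, and `fake_embed_batch` are all hypothetical names chosen for illustration, and the "model" is a stub.

```python
import asyncio
from typing import Callable, List


class MicroBatcher:
    """Groups requests that arrive close together into one model call."""

    def __init__(
        self,
        embed_batch: Callable[[List[str]], List[List[float]]],
        max_batch_size: int = 32,
        max_wait_s: float = 0.005,
    ) -> None:
        self.embed_batch = embed_batch
        self.max_batch_size = max_batch_size
        self.max_wait_s = max_wait_s
        self.queue: asyncio.Queue = asyncio.Queue()
        self.batch_sizes: List[int] = []  # recorded only for inspection

    async def embed(self, text: str) -> List[float]:
        # Each caller enqueues its text and waits for its own result.
        fut = asyncio.get_running_loop().create_future()
        await self.queue.put((text, fut))
        return await fut

    async def run(self) -> None:
        loop = asyncio.get_running_loop()
        while True:
            # Block until one request arrives, then collect more until the
            # batch is full or the wait budget is spent.
            items = [await self.queue.get()]
            deadline = loop.time() + self.max_wait_s
            while len(items) < self.max_batch_size:
                remaining = deadline - loop.time()
                if remaining <= 0:
                    break
                try:
                    items.append(await asyncio.wait_for(self.queue.get(), remaining))
                except asyncio.TimeoutError:
                    break
            vectors = self.embed_batch([text for text, _ in items])
            self.batch_sizes.append(len(items))
            for (_, fut), vec in zip(items, vectors):
                fut.set_result(vec)


def fake_embed_batch(texts: List[str]) -> List[List[float]]:
    # Stub model: a one-dimensional "embedding" equal to the text length.
    return [[float(len(t))] for t in texts]


async def demo():
    batcher = MicroBatcher(fake_embed_batch, max_batch_size=8, max_wait_s=0.05)
    worker = asyncio.create_task(batcher.run())
    vectors = await asyncio.gather(*(batcher.embed("x" * n) for n in range(1, 6)))
    worker.cancel()
    return vectors, batcher.batch_sizes
```

The trade-off is between `max_wait_s` (added latency per request) and batch size (GPU utilisation): under higher concurrency, batches fill before the wait budget expires, which is why the gap between servers would be expected to widen with load.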

See the full SIE vs TEI vs OpenAI benchmark for cost, latency, and throughput data.

Deployment comparison

TEI on AWS (manual)

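# 1. Provision a GPU EC2 instance manually
# 2. Install Docker (GPU access also needs the NVIDIA Container Toolkit)
# 3. Start TEI: expose the GPU and publish container port 80 on host port 8080
docker run --gpus all -p 8080:80 \
  ghcr.io/huggingface/text-embeddings-inference \
  --model-id BAAI/bge-m3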
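Once the container is running, embeddings are requested over TEI's REST API by POSTing a JSON body with an `inputs` field to `/embed`. The sketch below uses only the Python standard library and assumes the container's port 80 has been published on `localhost:8080`; the helper names `build_embed_request` and `embed` are illustrative, not part of any SDK.

```python
import json
import urllib.request
from typing import List


def build_embed_request(base_url: str, texts: List[str]) -> urllib.request.Request:
    """Build a POST request for TEI's /embed route."""
    body = json.dumps({"inputs": texts}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/embed",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed(base_url: str, texts: List[str], timeout: float = 30.0) -> List[List[float]]:
    """Send the request and return one embedding vector per input text."""
    request = build_embed_request(base_url, texts)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Assumes a TEI container is reachable at this address.
    vectors = embed("http://localhost:8080", ["What is an embedding?"])
    print(len(vectors), "vector(s) of dimension", len(vectors[0]))
```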

SIE on AWS (Terraform + Helm)

# main.tf
module "sie" {
  source = "superlinked/sie/aws"
  region = "us-east-1"
  gpus   = ["a100-40gb", "l4-spot"]
}

# Then, from the shell: provision the infrastructure and install the chart
terraform apply
helm install sie oci://ghcr.io/superlinked/charts/sie-cluster

SIE provisions the full GPU cluster, configures networking, and deploys the inference server in a single workflow. TEI requires manual instance provisioning and doesn’t include cluster management tooling.

Summary: choosing between SIE and TEI

If your needs are simple (one model, low scale, quick setup), TEI is a reasonable starting point. If you’re building a production inference stack for search or RAG, SIE provides the multi-model support, operational tooling, and GPU efficiency that most production use cases eventually require.

Frequently asked questions

Can I migrate from TEI to SIE? Yes. SIE exposes a compatible REST API, so migrating typically involves updating the endpoint URL and installing the sie-sdk. The model IDs use the same Hugging Face format.
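If the migration really is an endpoint swap, keeping the base URL out of the code makes the change a configuration edit. The sketch below is an assumption-heavy illustration: the environment variable name `EMBEDDINGS_BASE_URL` is hypothetical, and it assumes the SIE deployment accepts the same `/embed` route as TEI, which the answer above describes only as "compatible", so check the SIE documentation for the exact paths.

```python
import os
from typing import Mapping

# Assumed address of a locally running TEI container.
DEFAULT_BASE_URL = "http://localhost:8080"


def resolve_embed_url(env: Mapping[str, str] = os.environ) -> str:
    """Return the embeddings endpoint, preferring EMBEDDINGS_BASE_URL if set.

    EMBEDDINGS_BASE_URL is a hypothetical variable name for this sketch;
    the /embed route is TEI's and is only assumed to exist on SIE.
    """
    base_url = env.get("EMBEDDINGS_BASE_URL", DEFAULT_BASE_URL).rstrip("/")
    return f"{base_url}/embed"
```

With this in place, pointing a client at a different server means setting `EMBEDDINGS_BASE_URL` in the deployment environment rather than changing application code.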

Is SIE harder to set up than TEI? For a single model, SIE has slightly more setup (Terraform + Helm vs a single Docker command). For multi-model production deployments, SIE’s tooling saves significant time versus managing multiple TEI instances.

Does SIE support all models that TEI supports? SIE supports 100+ models, including the major embedding and reranker families. If a specific model you need isn’t listed, it can be added, since SIE is open source.