---
title: What is SIE?
description: SIE (Superlinked Inference Engine) is an open-source inference server for small AI models. It runs encoders, rerankers, and extractors on your own GPU, serving 85+ models through three simple API primitives.
canonical_url: https://superlinked.com/docs
last_updated: 2026-05-14
---

**SIE (Superlinked Inference Engine) is an open-source inference server for small AI models.** It runs encoders, rerankers, and entity extractors on your own infrastructure, from a laptop to a production Kubernetes cluster, without managing per-model deployments or paying per-token API costs.

SIE exposes three primitives:

- **[Encode](https://superlinked.com/docs/encode/)** converts text or images to vectors for semantic search and RAG
- **[Score](https://superlinked.com/docs/score/)** reranks query-document pairs for higher-precision retrieval
- **[Extract](https://superlinked.com/docs/extract/)** pulls entities and structured data from unstructured text

SIE supports 85+ models out of the box. The server handles batching, GPU sharing, and model switching automatically. Browse the full [model catalog](https://superlinked.com/models).
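To make the relationship between Encode and Score concrete, here is a minimal sketch of a retrieve-then-rerank pipeline. It does not call SIE: `toy_encode` and `toy_score` are hypothetical stand-ins for an encoder and a reranker, and `VOCAB` is an invented four-word vocabulary. Only the shape of the flow is the point: vector similarity first, pairwise scoring second.

```python
import math

# Hypothetical vocabulary for the toy encoder; a real encoder learns dense vectors.
VOCAB = ["gpu", "model", "search", "cat"]


def toy_encode(text):
    """Stand-in for Encode: map text to a small bag-of-words vector."""
    words = text.lower().split()
    return [float(words.count(term)) for term in VOCAB]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def toy_score(query, doc):
    """Stand-in for Score: fraction of distinct query words that appear in the document."""
    query_words = set(query.lower().split())
    doc_words = set(doc.lower().split())
    return len(query_words & doc_words) / len(query_words) if query_words else 0.0


def retrieve_then_rerank(query, docs, k=2):
    """Stage 1: keep the k docs closest to the query vector. Stage 2: reorder them by pair score."""
    query_vec = toy_encode(query)
    candidates = sorted(
        docs, key=lambda d: cosine(query_vec, toy_encode(d)), reverse=True
    )[:k]
    return sorted(candidates, key=lambda d: toy_score(query, d), reverse=True)


docs = [
    "gpu model serving",
    "cat pictures",
    "search with a gpu model on one gpu",
]
results = retrieve_then_rerank("gpu model search", docs, k=2)
```

In a real deployment the first stage would typically run against a vector index, and the second stage would score only the short candidate list, which is why reranking can afford a more expensive model.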

> SIE is built by [Superlinked](https://superlinked.com/), the team behind the Superlinked vector compute framework. [Read the launch post](https://superlinked.com/blog/launch).

---

## Get Started

| I want to... | Go to |
| --- | --- |
| **Get my first vectors in 2 minutes** | [Quickstart](https://superlinked.com/docs/quickstart/) |
| **Embed text or images** | [Encode Overview](https://superlinked.com/docs/encode/) |
| **Rerank search results** | [Score Overview](https://superlinked.com/docs/score/) |
| **Extract entities from text** | [Extract Overview](https://superlinked.com/docs/extract/) |
| **Choose the right model** | [Model Selection Guide](https://superlinked.com/docs/choosing/) |
| **See all 85+ models** | [Model Catalog](https://superlinked.com/models) |
| **Deploy to production** | [Deployment Overview](https://superlinked.com/docs/deployment/) |
| **Connect to LangChain or LlamaIndex** | [Integrations](https://superlinked.com/docs/integrations/) |

---

## Why Does SIE Exist?

LLM inference tools are designed to spread one large model across many GPUs. Small-model inference is the opposite problem: you run many models (encoders, rerankers, extractors) on one GPU and need to switch between them quickly.

**What makes SIE different from other inference servers:**

1. **Compute engine abstraction.** SIE wraps PyTorch, SGLang, and Flash Attention behind three uniform primitives. The server picks the best engine per model automatically.
2. **Multi-model GPU sharing.** Many models share one GPU. SIE loads models on demand and evicts the least-recently-used (LRU) ones when GPU memory fills up, so one instance can serve any supported model at query time without pre-loading all of them.
3. **Same code, laptop to cloud.** The same Docker image runs locally and in a production Kubernetes cluster. There is no separate production mode.
4. **Validated correctness.** Every supported model has quality and latency targets checked in CI.
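The LRU eviction mentioned in point 2 can be sketched in a few lines. This is an illustration of the general technique under stated assumptions, not SIE's implementation: the class name `LruModelCache`, the model names, and the 8 GB model size are all hypothetical, and real memory accounting is more involved than summing fixed sizes.

```python
from collections import OrderedDict


class LruModelCache:
    """Toy model cache: tracks which models are resident within a fixed memory budget."""

    def __init__(self, capacity_gb):
        self.capacity_gb = capacity_gb
        # Maps model name -> size in GB, ordered from least to most recently used.
        self._models = OrderedDict()

    def used_gb(self):
        return sum(self._models.values())

    def resident(self):
        """Model names currently loaded, least recently used first."""
        return list(self._models)

    def request(self, name, size_gb):
        """Mark a model as used, loading it if needed. Returns the names evicted to make room."""
        if size_gb > self.capacity_gb:
            raise ValueError(f"{name} ({size_gb} GB) exceeds capacity {self.capacity_gb} GB")
        if name in self._models:
            # Cache hit: refresh recency, nothing to load or evict.
            self._models.move_to_end(name)
            return []
        evicted = []
        # Cache miss: evict least-recently-used models until the new one fits.
        while self.used_gb() + size_gb > self.capacity_gb:
            victim, _ = self._models.popitem(last=False)
            evicted.append(victim)
        self._models[name] = size_gb
        return evicted


# Hypothetical scenario: a 24 GB budget and four models of 8 GB each.
cache = LruModelCache(capacity_gb=24)
for model in ("encoder-a", "reranker-b", "extractor-c"):
    cache.request(model, 8)
cache.request("encoder-a", 8)                 # hit: encoder-a becomes most recent
evicted = cache.request("encoder-d", 8)       # miss: the oldest model must go
```

With these assumed sizes, three models fit at once, and requesting a fourth evicts `reranker-b`, the model that has gone longest without a request.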

---

## How Does SIE Compare to Alternatives?

| Capability | SIE | TEI (Hugging Face) | OpenAI API |
|---|---|---|---|
| Self-hosted | Yes | Yes | No |
| Multi-model on one GPU | Yes | No (one model per server) | N/A |
| Encode + Score + Extract | Yes | Encode and rerank, no extraction | Encode only |
| Supported models | 85+ | Varies | Limited |
| Open source | Yes | Yes | No |
| No per-token cost | Yes | Yes | No |

See the [SIE vs TEI vs OpenAI benchmark](https://superlinked.com/docs/examples/benchmark/) for full performance numbers.

---

## Frequently Asked Questions

**What is SIE used for?**
SIE is used to generate embeddings for semantic search and RAG pipelines, rerank search results to improve precision, and extract entities from unstructured text. All of this runs on your own infrastructure. See [superlinked.com](https://superlinked.com/) for more on what you can build.

**Does SIE support GPU inference?**
Yes. SIE runs on CPU or GPU. For production inference at scale, a GPU is strongly recommended. See [Hardware and Capacity](https://superlinked.com/docs/deployment/resources/) for GPU sizing guidance.

**How many models can SIE run at the same time?**
SIE loads models on demand and evicts the least-recently-used models when GPU memory fills up. An L4 GPU (24 GB) keeps 2 to 3 standard models loaded at the same time. All 85+ models remain available at query time regardless of how much VRAM the GPU has.

**Is SIE open source?**
Yes. SIE is open source and available on [GitHub](https://github.com/superlinked/sie). The core inference server is free to use. Superlinked also offers managed cloud deployment. [Contact us](https://superlinked.com/) to learn more.

**How is SIE different from the Superlinked framework?**
The [Superlinked framework](https://superlinked.com/) is a higher-level Python SDK for building multi-attribute search and recommendation systems. SIE is the inference layer underneath it. You can use SIE standalone or as part of a full Superlinked stack.
