---
title: Docker
description: Run SIE in containers with GPU support.
canonical_url: https://superlinked.com/docs/deployment/docker
last_updated: 2026-05-20
---

## Quick Start

Source: [packages/sie_server/Dockerfile.cuda12](https://github.com/superlinked/sie/blob/main/packages/sie_server/Dockerfile.cuda12)

```bash
# CPU only
docker run -p 8080:8080 ghcr.io/superlinked/sie-server:latest-cpu-default

# With GPU (recommended for production)
docker run --gpus all -p 8080:8080 ghcr.io/superlinked/sie-server:latest-cuda12-default
```

Verify the server is running:

```bash
curl http://localhost:8080/healthz
# {"status":"ok"}
```
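
The first start can take a while because model weights may need to download, so a single `curl` right after `docker run` can fail even when nothing is wrong. The sketch below polls the health endpoint until it answers. `wait_for_sie` is a hypothetical helper name, and the retry count and delay are arbitrary defaults, not values taken from SIE.

```bash
# Poll /healthz until the server answers or the attempts run out.
# wait_for_sie is a hypothetical helper, not part of SIE.
wait_for_sie() {
  local url="${1:-http://localhost:8080/healthz}"
  local attempts="${2:-30}"
  local delay="${3:-2}"
  local i=1
  while [ "$i" -le "$attempts" ]; do
    # -f makes curl exit non-zero on HTTP errors; -s hides progress output.
    if curl -fs "$url" > /dev/null; then
      echo "ready after $i attempt(s)"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "server did not become healthy at $url" >&2
  return 1
}

# Usage: wait_for_sie http://localhost:8080/healthz 30 2
```

The function returns a non-zero status on timeout, so it can gate later steps in a deployment script.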

---

## Image Tags

Image tags follow the format `{version}-{platform}-{bundle}`. The floating `latest` version prefix always points at the most recent release.

### By Platform

| Tag | Base | Use Case |
|-----|------|----------|
| `latest-cuda12-default` | CUDA 12 | Production with modern NVIDIA GPUs |
| `latest-cuda11-default` | CUDA 11 | Older NVIDIA GPUs |
| `latest-cpu-default` | Ubuntu 22.04 | Development, ARM64, no GPU |

Pinned releases use the version prefix, for example `v0.2.0-cuda12-default`.

### By Bundle

Each platform publishes the bundles below. See [Bundles](/docs/engine/bundles/) for the models each one includes.

| Tag | Purpose |
|-----|---------|
| `latest-cuda12-default` | All standard models: dense, sparse, ColBERT, vision, extraction, cross-encoders |
| `latest-cuda12-sglang` | Large LLM embeddings (4B+ params) served through SGLang |

CPU and CUDA 11 images follow the same pattern: `latest-cpu-default`, `latest-cpu-sglang`, `latest-cuda11-default`, etc.
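
To make the tag format concrete, here is a minimal sketch that assembles a full image reference from its three parts. `sie_image` is a hypothetical helper; the registry path is the one used in the examples on this page, and the defaults are only illustrative.

```bash
# Build an image reference of the form {version}-{platform}-{bundle}.
# sie_image is a hypothetical helper, not part of SIE.
sie_image() {
  local version="${1:-latest}"
  local platform="${2:-cuda12}"
  local bundle="${3:-default}"
  echo "ghcr.io/superlinked/sie-server:${version}-${platform}-${bundle}"
}

# Usage:
#   docker run --gpus all -p 8080:8080 "$(sie_image v0.2.0 cuda12 default)"
```

Keeping the version as an explicit argument makes it easier to pin a release in scripts rather than relying on `latest`.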

---

## GPU Configuration

Source: [packages/sie_server/Dockerfile.cuda12](https://github.com/superlinked/sie/blob/main/packages/sie_server/Dockerfile.cuda12)

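### Single GPU

On a host with one GPU, `--gpus all` exposes that GPU to the container. On a multi-GPU host the same flag exposes every GPU, so select devices explicitly as shown in the next section if you want only one.

```bash
docker run --gpus all -p 8080:8080 ghcr.io/superlinked/sie-server:latest-cuda12-default
```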

### Specific GPU

```bash
# Use GPU 0 only
docker run --gpus '"device=0"' -p 8080:8080 ghcr.io/superlinked/sie-server:latest-cuda12-default

# Use GPUs 0 and 1
docker run --gpus '"device=0,1"' -p 8080:8080 ghcr.io/superlinked/sie-server:latest-cuda12-default
```
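
The nested quoting in `'"device=0,1"'` is easy to get wrong: Docker parses the `--gpus` value as comma-separated fields, so a device list that itself contains commas has to keep its inner double quotes. As a sketch, a small function can build that value from a list of GPU indices. `gpu_device_flag` is a hypothetical helper name.

```bash
# Build the value for --gpus from one or more GPU indices.
# The literal double quotes are part of the value Docker receives.
# gpu_device_flag is a hypothetical helper, not a Docker or SIE command.
gpu_device_flag() {
  local ids
  # Join all arguments with commas: "0 1" becomes "0,1".
  ids=$(IFS=,; echo "$*")
  printf '"device=%s"\n' "$ids"
}

# Usage:
#   docker run --gpus "$(gpu_device_flag 0 1)" -p 8080:8080 \
#     ghcr.io/superlinked/sie-server:latest-cuda12-default
```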

### NVIDIA Container Toolkit

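The `--gpus` flag requires the NVIDIA Container Toolkit. Install it first. The commands below use NVIDIA's current `libnvidia-container` apt repository with a keyring file, since the older `nvidia-docker` repository and `apt-key` are deprecated:

```bash
# Ubuntu/Debian
# Add NVIDIA's signing key and apt repository
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
  sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

# Install the toolkit, register the NVIDIA runtime with Docker, and restart the daemon
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
```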

---

## Environment Variables

Source: [packages/sie_server/src/sie_server/config/engine.py](https://github.com/superlinked/sie/blob/main/packages/sie_server/src/sie_server/config/engine.py)

Configure the server with environment variables. All variables use the `SIE_` prefix.

### Core Settings

| Variable | Default | Description |
|----------|---------|-------------|
| `SIE_DEVICE` | `auto` | Compute device: `auto` (detect GPU), `cpu`, `cuda`, `cuda:0`, `mps` |
| `SIE_MODELS_DIR` | `/app/models` | Path to model configs |
| `SIE_MODEL_FILTER` | (all) | Comma-separated list of models to load |

### Batching

| Variable | Default | Description |
|----------|---------|-------------|
| `SIE_MAX_BATCH_REQUESTS` | `64` | Maximum requests per batch |
| `SIE_MAX_BATCH_WAIT_MS` | `10` | Maximum time, in milliseconds, to wait for a batch to fill |
| `SIE_MAX_CONCURRENT_REQUESTS` | `512` | Queue size limit |

### Memory

| Variable | Default | Description |
|----------|---------|-------------|
| `SIE_MEMORY_PRESSURE_THRESHOLD_PERCENT` | `85` | VRAM percent that triggers LRU eviction |
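
As a rough worked example, the sketch below converts the percentage into an absolute figure. It assumes the threshold is measured against total device VRAM, which this page does not state explicitly, and `eviction_threshold_mib` is a hypothetical helper used only for the arithmetic.

```bash
# Convert a VRAM percentage into MiB (integer division, rounds down).
# Assumption: the threshold applies to total device VRAM.
# eviction_threshold_mib is a hypothetical helper, not part of SIE.
eviction_threshold_mib() {
  local total_mib="$1"
  local percent="${2:-85}"
  echo $(( total_mib * percent / 100 ))
}

# Usage: eviction_threshold_mib 24576 85   # a 24 GiB card at the default threshold
```

Under that assumption, a 24 GiB (24576 MiB) GPU at the default of 85 would start evicting at roughly 20889 MiB of usage.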

### Observability

| Variable | Default | Description |
|----------|---------|-------------|
| `SIE_LOG_JSON` | `false` | Use JSON log format |
| `SIE_TRACING_ENABLED` | `false` | Enable OpenTelemetry tracing |
| `SIE_GPU_TYPE` | (auto) | Override GPU type for metrics |

### Example

```bash
docker run --gpus all -p 8080:8080 \
  -e SIE_DEVICE=cuda \
  -e SIE_MAX_BATCH_REQUESTS=128 \
  -e SIE_MEMORY_PRESSURE_THRESHOLD_PERCENT=85 \
  -e SIE_LOG_JSON=true \
  ghcr.io/superlinked/sie-server:latest-cuda12-default
```
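
When the list of settings grows, an env file can be easier to review and version than a long chain of `-e` flags. The sketch below writes the same kind of settings to a file and passes it with Docker's `--env-file` flag. The file name `sie.env` is arbitrary, and the values are illustrative rather than recommended.

```bash
# Write the settings to an env file (sie.env is an arbitrary name).
cat > sie.env <<'EOF'
SIE_DEVICE=cuda
SIE_MAX_BATCH_REQUESTS=128
SIE_MAX_BATCH_WAIT_MS=10
SIE_LOG_JSON=true
EOF

# Then pass the file to docker run instead of repeating -e:
#   docker run --gpus all -p 8080:8080 --env-file sie.env \
#     ghcr.io/superlinked/sie-server:latest-cuda12-default
```

Env files take one `KEY=value` pair per line without shell quoting, so values are passed to the container exactly as written.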

---

## Volume Mounts

### HuggingFace Cache

Source: [packages/sie_server/Dockerfile.cuda12](https://github.com/superlinked/sie/blob/main/packages/sie_server/Dockerfile.cuda12)

Mount a persistent volume for model weights so they are not downloaded again each time the container is recreated.

```bash
docker run --gpus all -p 8080:8080 \
  -v ~/.cache/huggingface:/app/.cache/huggingface \
  ghcr.io/superlinked/sie-server:latest-cuda12-default
```

The container uses `HF_HOME=/app/.cache/huggingface` by default.

### Custom Model Configs

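Add your own model configs by mounting a directory at `/app/models`, the default `SIE_MODELS_DIR`. A bind mount hides any files the image ships at that path, so either include the bundled configs you still need or mount your directory elsewhere and point `SIE_MODELS_DIR` at it.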

```bash
docker run --gpus all -p 8080:8080 \
  -v /path/to/my-models:/app/models \
  ghcr.io/superlinked/sie-server:latest-cuda12-default
```

### Read-Only Root Filesystem

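For security-hardened deployments, run the container with a read-only root filesystem and mount only the paths that must stay writable, here a named volume for the HuggingFace cache and a `tmpfs` for `/tmp`: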

```bash
docker run --gpus all -p 8080:8080 \
  --read-only \
  -v hf-cache:/app/.cache/huggingface \
  --tmpfs /tmp:size=1G \
  ghcr.io/superlinked/sie-server:latest-cuda12-default
```

---

## Docker Compose

### Single Service

```yaml
# docker-compose.yml
services:
  sie:
    image: ghcr.io/superlinked/sie-server:latest-cuda12-default
    ports:
      - "8080:8080"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    volumes:
      - hf-cache:/app/.cache/huggingface
    environment:
      - SIE_DEVICE=cuda
      - SIE_MAX_BATCH_REQUESTS=128
    healthcheck:
      test: ["CMD", "python", "-c", "import urllib.request; urllib.request.urlopen('http://localhost:8080/healthz')"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 60s

volumes:
  hf-cache:
```

### Multi-Bundle Setup

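Run multiple bundles side by side when you need the SGLang backend alongside the default models. In the example below, each service is pinned to its own GPU and published on its own host port (`8080` for the default bundle, `8081` for SGLang), and both share the `hf-cache` volume: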

```yaml
# docker-compose.yml
services:
  sie-default:
    image: ghcr.io/superlinked/sie-server:latest-cuda12-default
    ports:
      - "8080:8080"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ["0"]
              capabilities: [gpu]
    volumes:
      - hf-cache:/app/.cache/huggingface
    environment:
      - SIE_DEVICE=cuda

  sie-sglang:
    image: ghcr.io/superlinked/sie-server:latest-cuda12-sglang
    ports:
      - "8081:8080"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ["1"]
              capabilities: [gpu]
    volumes:
      - hf-cache:/app/.cache/huggingface
    environment:
      - SIE_DEVICE=cuda

volumes:
  hf-cache:
```

Start with:

```bash
docker compose up -d
```
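
Once the stack is up, each service can be checked on its own host port. The sketch below loops over a list of ports and reports which ones answer on `/healthz`. `check_bundles` is a hypothetical helper, and the ports match the multi-bundle example above.

```bash
# Report the health of each service by host port.
# check_bundles is a hypothetical helper, not part of SIE.
check_bundles() {
  local port
  local status=0
  for port in "$@"; do
    # -f makes curl fail on HTTP errors; -s hides progress output.
    if curl -fs "http://localhost:${port}/healthz" > /dev/null; then
      echo "port ${port}: ok"
    else
      echo "port ${port}: unreachable"
      status=1
    fi
  done
  # Non-zero if any service failed its check.
  return "$status"
}

# Usage: check_bundles 8080 8081
```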

---

## What's Next

- [Bundles](/docs/engine/bundles/) - dependency isolation for conflicting models
- [Kubernetes in GCP](/docs/deployment/cloud-gcp/) - production deployment with Helm
- [Kubernetes in AWS](/docs/deployment/cloud-aws/) - EKS deployment with Terraform
- [Troubleshooting](/docs/reference/troubleshooting/) - common issues and solutions
