Docker
Quick Start
```bash
# CPU only
docker run -p 8080:8080 ghcr.io/superlinked/sie-server:latest-cpu-default

# With GPU (recommended for production)
docker run --gpus all -p 8080:8080 ghcr.io/superlinked/sie-server:latest-cuda12-default
```

Verify the server is running:

```bash
curl http://localhost:8080/healthz
# {"status":"ok"}
```

Image Tags
Images follow the format {version}-{platform}-{bundle}. The floating latest prefix points at the most recent release.
By Platform
| Tag | Base | Use Case |
|---|---|---|
| latest-cuda12-default | CUDA 12 | Production with modern NVIDIA GPUs |
| latest-cuda11-default | CUDA 11 | Older NVIDIA GPUs |
| latest-cpu-default | Ubuntu 22.04 | Development, ARM64, no GPU |
Pinned releases use the version prefix, for example v0.2.0-cuda12-default.
By Bundle
Each platform publishes the bundles below. See Bundles for the models each one includes.

| Tag | Purpose |
|---|---|
| latest-cuda12-default | All standard models: dense, sparse, ColBERT, vision, extraction, cross-encoders |
| latest-cuda12-sglang | Large LLM embeddings (4B+ params) served through SGLang |
CPU and CUDA 11 images follow the same pattern: latest-cpu-default, latest-cpu-sglang, latest-cuda11-default, etc.
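
As a minimal sketch of the naming scheme above, the helper below assembles an image reference from its three components. The function name sie_image is hypothetical and exists only for illustration; the registry path and tag components come from the tables above.

```bash
# Hypothetical helper: build an image reference from version, platform, bundle.
sie_image() {
  # $1 = version (latest or a pinned release such as v0.2.0)
  # $2 = platform (cuda12, cuda11, cpu)
  # $3 = bundle (default, sglang)
  printf 'ghcr.io/superlinked/sie-server:%s-%s-%s\n' "$1" "$2" "$3"
}

sie_image latest cpu default
sie_image v0.2.0 cuda12 default
```

Pinning the version component keeps deployments reproducible, while the floating latest prefix is convenient for local experiments.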
GPU Configuration
Single GPU

```bash
docker run --gpus all -p 8080:8080 ghcr.io/superlinked/sie-server:latest-cuda12-default
```

Specific GPU

```bash
# Use GPU 0 only
docker run --gpus '"device=0"' -p 8080:8080 ghcr.io/superlinked/sie-server:latest-cuda12-default

# Use GPUs 0 and 1
docker run --gpus '"device=0,1"' -p 8080:8080 ghcr.io/superlinked/sie-server:latest-cuda12-default
```

NVIDIA Container Toolkit
The --gpus flag requires NVIDIA Container Toolkit. Install it first:

```bash
# Ubuntu/Debian
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | \
  sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
sudo systemctl restart docker
```

Environment Variables
Configure the server with environment variables. All variables use the SIE_ prefix.
Core Settings
| Variable | Default | Description |
|---|---|---|
| SIE_DEVICE | auto | Compute device: auto (detect GPU), cpu, cuda, cuda:0, mps |
| SIE_MODELS_DIR | /app/models | Path to model configs |
| SIE_MODEL_FILTER | (all) | Comma-separated list of models to load |
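
The sketch below shows one way to build the comma-separated value for SIE_MODEL_FILTER and pass it alongside SIE_DEVICE. The model names are placeholders, not names shipped in any bundle, and the command is only printed so it can be reviewed first.

```bash
# Placeholder model names; replace them with names from your model configs.
MODELS="model-a model-b model-c"

# SIE_MODEL_FILTER expects a comma-separated list, so swap spaces for commas.
SIE_MODEL_FILTER=$(printf '%s' "$MODELS" | tr ' ' ',')

# Print the command for review; remove the leading "echo" to run it.
echo docker run -p 8080:8080 \
  -e SIE_DEVICE=cpu \
  -e "SIE_MODEL_FILTER=${SIE_MODEL_FILTER}" \
  ghcr.io/superlinked/sie-server:latest-cpu-default
```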
Batching
| Variable | Default | Description |
|---|---|---|
| SIE_MAX_BATCH_REQUESTS | 64 | Maximum requests per batch |
| SIE_MAX_BATCH_WAIT_MS | 10 | Maximum time to wait for a batch to fill, in milliseconds |
| SIE_MAX_CONCURRENT_REQUESTS | 512 | Queue size limit |
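
When several batching values change together, it can be easier to keep them in an env file and pass it with Docker's --env-file flag. The following is a sketch under assumptions: the file name sie.env is arbitrary, and the values are illustrative rather than tuning recommendations.

```bash
# Write the batching settings to an env file (the file name is arbitrary).
cat > sie.env <<'EOF'
SIE_MAX_BATCH_REQUESTS=128
SIE_MAX_BATCH_WAIT_MS=5
SIE_MAX_CONCURRENT_REQUESTS=1024
EOF

# Print the command for review; remove the leading "echo" to run it.
echo docker run --gpus all -p 8080:8080 --env-file sie.env \
  ghcr.io/superlinked/sie-server:latest-cuda12-default
```

Larger batches and longer waits typically trade per-request latency for throughput, so measure both before settling on values.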
Memory
| Variable | Default | Description |
|---|---|---|
| SIE_MEMORY_PRESSURE_THRESHOLD_PERCENT | 85 | VRAM percent that triggers LRU eviction |
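
To make the threshold concrete, the sketch below converts the percentage into an absolute VRAM figure. The 24576 MiB total is an assumed example value, and how the server itself measures usage is not described here, so treat the result as a rough guide.

```bash
# Assumed example: a GPU with 24 GiB (24576 MiB) of VRAM.
# On a real host, this command prints the total in MiB:
#   nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits
TOTAL_MIB=24576
THRESHOLD_PERCENT=85   # default SIE_MEMORY_PRESSURE_THRESHOLD_PERCENT

# Integer arithmetic: usage above this many MiB counts as memory pressure.
EVICT_AT_MIB=$(( TOTAL_MIB * THRESHOLD_PERCENT / 100 ))
echo "Eviction starts near ${EVICT_AT_MIB} MiB of ${TOTAL_MIB} MiB"
```

With these example numbers, the threshold falls at 20889 MiB.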
Observability
| Variable | Default | Description |
|---|---|---|
| SIE_LOG_JSON | false | Use JSON log format |
| SIE_TRACING_ENABLED | false | Enable OpenTelemetry tracing |
| SIE_GPU_TYPE | (auto) | Override GPU type for metrics |
Example
```bash
docker run --gpus all -p 8080:8080 \
  -e SIE_DEVICE=cuda \
  -e SIE_MAX_BATCH_REQUESTS=128 \
  -e SIE_MEMORY_PRESSURE_THRESHOLD_PERCENT=85 \
  -e SIE_LOG_JSON=true \
  ghcr.io/superlinked/sie-server:latest-cuda12-default
```

Volume Mounts

HuggingFace Cache

Mount a persistent volume for model weights. This avoids re-downloading on restarts.

```bash
docker run --gpus all -p 8080:8080 \
  -v ~/.cache/huggingface:/app/.cache/huggingface \
  ghcr.io/superlinked/sie-server:latest-cuda12-default
```

The container uses HF_HOME=/app/.cache/huggingface by default.
Custom Model Configs
Add your own model configs by mounting a directory:

```bash
docker run --gpus all -p 8080:8080 \
  -v /path/to/my-models:/app/models \
  ghcr.io/superlinked/sie-server:latest-cuda12-default
```

Read-Only Root Filesystem

For security-hardened deployments, use a read-only root filesystem with explicit writable mounts:

```bash
docker run --gpus all -p 8080:8080 \
  --read-only \
  -v hf-cache:/app/.cache/huggingface \
  --tmpfs /tmp:size=1G \
  ghcr.io/superlinked/sie-server:latest-cuda12-default
```

Docker Compose

Single Service
Section titled “Single Service”# docker-compose.ymlservices: sie: image: ghcr.io/superlinked/sie-server:latest-cuda12-default ports: - "8080:8080" deploy: resources: reservations: devices: - driver: nvidia count: all capabilities: [gpu] volumes: - hf-cache:/app/.cache/huggingface environment: - SIE_DEVICE=cuda - SIE_MAX_BATCH_REQUESTS=128 healthcheck: test: ["CMD", "python", "-c", "import urllib.request; urllib.request.urlopen('http://localhost:8080/healthz')"] interval: 30s timeout: 10s retries: 3 start_period: 60s
volumes: hf-cache:Multi-Bundle Setup
Run multiple bundles side by side when you need the SGLang backend alongside the default models:

```yaml
# docker-compose.yml
services:
  sie-default:
    image: ghcr.io/superlinked/sie-server:latest-cuda12-default
    ports:
      - "8080:8080"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ["0"]
              capabilities: [gpu]
    volumes:
      - hf-cache:/app/.cache/huggingface
    environment:
      - SIE_DEVICE=cuda

  sie-sglang:
    image: ghcr.io/superlinked/sie-server:latest-cuda12-sglang
    ports:
      - "8081:8080"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ["1"]
              capabilities: [gpu]
    volumes:
      - hf-cache:/app/.cache/huggingface
    environment:
      - SIE_DEVICE=cuda

volumes:
  hf-cache:
```

Start with:

```bash
docker compose up -d
```

What’s Next

- Bundles - dependency isolation for conflicting models
- Kubernetes in GCP - production deployment with Helm
- Kubernetes in AWS - EKS deployment with Terraform
- Troubleshooting - common issues and solutions