---
title: Configuration
description: Complete environment variables reference for SIE server configuration.
canonical_url: https://superlinked.com/docs/reference/configuration
last_updated: 2026-05-20
---

SIE uses environment variables for server configuration. CLI arguments override environment variables, which override defaults.
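
The precedence rule can be illustrated with a minimal sketch. The helper `resolve_setting` is hypothetical and not part of SIE; it only shows the order in which the three sources are consulted.

```python
import os


def resolve_setting(cli_value, env_name, default, environ=None):
    """Return the CLI value if given, else the environment variable, else the default."""
    environ = os.environ if environ is None else environ
    if cli_value is not None:
        return cli_value
    if env_name in environ:
        return environ[env_name]
    return default


# A fake environment keeps the example independent of the real shell.
env = {"SIE_DEVICE": "cuda"}
print(resolve_setting(None, "SIE_DEVICE", "auto", env))          # cuda (env beats default)
print(resolve_setting("cpu", "SIE_DEVICE", "auto", env))         # cpu (CLI beats env)
print(resolve_setting(None, "SIE_MODELS_DIR", "./models", env))  # ./models (default)
```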

## Server Configuration

Source: [packages/sie_server/src/sie_server/cli.py](https://github.com/superlinked/sie/blob/main/packages/sie_server/src/sie_server/cli.py)

Core settings for device selection, model loading, and server behavior.

| Variable | Default | Description |
|----------|---------|-------------|
| `SIE_DEVICE` | `auto` | Inference device. Options: `auto` (detect GPU), `cuda`, `cuda:0`, `mps`, `cpu` |
| `SIE_MODELS_DIR` | `./models` | Path to model configs directory. Supports local paths, `s3://`, or `gs://` URLs |
| `SIE_MODEL_FILTER` | None | Comma-separated list of model names to load. If unset, all models are available |
| `SIE_GPU_TYPE` | Auto-detected | Override detected GPU type for routing (e.g., `l4`, `a100-80gb`, `h100`) |
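
As an illustration of the `SIE_MODEL_FILTER` semantics, the sketch below parses a comma-separated value into a set of names. The functions `parse_model_filter` and `is_model_enabled` are hypothetical, and the handling of whitespace and empty entries is an assumption rather than documented SIE behavior.

```python
def parse_model_filter(raw):
    """Return None (no filtering) when unset or blank, else a set of model names."""
    if raw is None or not raw.strip():
        return None
    # Assumption: surrounding whitespace and empty entries are ignored.
    return {name.strip() for name in raw.split(",") if name.strip()}


def is_model_enabled(name, model_filter):
    """A model is available when no filter is set or when it is listed."""
    return model_filter is None or name in model_filter


model_filter = parse_model_filter("bge-m3, other-model")  # "other-model" is a placeholder name
print(is_model_enabled("bge-m3", model_filter))
print(is_model_enabled("bge-m3", parse_model_filter(None)))
```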

### Cache Configuration

Source: [packages/sie_server/src/sie_server/core/disk_cache.py](https://github.com/superlinked/sie/blob/main/packages/sie_server/src/sie_server/core/disk_cache.py)

Control where model weights are stored and retrieved.

| Variable | Default | Description |
|----------|---------|-------------|
| `SIE_LOCAL_CACHE` | Value of `HF_HOME` | Local cache directory for model weights |
| `SIE_CLUSTER_CACHE` | None | Cluster cache URL for shared weights (`s3://` or `gs://`) |
| `SIE_HF_FALLBACK` | `true` | Enable HuggingFace Hub fallback for weight downloads |

**Cache resolution order:**
1. Local cache (`SIE_LOCAL_CACHE`)
2. Cluster cache (`SIE_CLUSTER_CACHE`)
3. HuggingFace Hub (if `SIE_HF_FALLBACK=true`)
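
The resolution order above can be sketched as a tiered lookup. This is not the code in `disk_cache.py`; `resolve_weights` and the lookup callables are hypothetical stand-ins for whatever checks each tier.

```python
def resolve_weights(model, local_lookup, cluster_lookup=None, hf_fallback=True):
    """Return (source, location) for the first tier that has the weights."""
    path = local_lookup(model)
    if path is not None:
        return ("local", path)
    if cluster_lookup is not None:  # corresponds to SIE_CLUSTER_CACHE being set
        url = cluster_lookup(model)
        if url is not None:
            return ("cluster", url)
    if hf_fallback:  # corresponds to SIE_HF_FALLBACK=true
        return ("hub", model)
    raise FileNotFoundError(f"No weights found for {model!r} and Hub fallback is disabled")


# Dictionaries stand in for the real cache tiers in this example.
local = {"bge-m3": "/mnt/nvme/cache/bge-m3"}
cluster = {"other-model": "s3://my-bucket/weights/other-model"}

print(resolve_weights("bge-m3", local.get, cluster.get))
print(resolve_weights("other-model", local.get, cluster.get))
print(resolve_weights("not-cached", local.get, cluster.get))
```

With `hf_fallback=False`, a miss in both caches raises an error instead of reaching the Hub.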

---

## Batching Configuration

Source: [packages/sie_server/src/sie_server/config/engine.py](https://github.com/superlinked/sie/blob/main/packages/sie_server/src/sie_server/config/engine.py)

Control request batching behavior for GPU efficiency.

| Variable | Default | Description |
|----------|---------|-------------|
| `SIE_MAX_BATCH_REQUESTS` | `64` | Maximum requests per batch |
| `SIE_MAX_BATCH_WAIT_MS` | `10` | Maximum milliseconds to wait for batch to fill |
| `SIE_MAX_CONCURRENT_REQUESTS` | `512` | Maximum concurrent requests (queue size) |

**Tuning guidance:**
- Increase `SIE_MAX_BATCH_REQUESTS` for higher throughput on high-memory GPUs
- Decrease `SIE_MAX_BATCH_WAIT_MS` for lower latency at the cost of smaller batches
- Size `SIE_MAX_CONCURRENT_REQUESTS` to cover expected burst traffic, since it caps the request queue
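
To make the trade-off between batch size and wait time concrete, here is a minimal sketch of a batch collector. It is not SIE's batching engine; `collect_batch` is a hypothetical helper, and starting the wait timer when the first request arrives is an assumption.

```python
import queue
import time


def collect_batch(q, max_batch_requests=64, max_batch_wait_ms=10):
    """Block for the first request, then gather more until the batch is full or the wait expires."""
    batch = [q.get()]  # wait indefinitely for the first request
    deadline = time.monotonic() + max_batch_wait_ms / 1000.0
    while len(batch) < max_batch_requests:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        try:
            batch.append(q.get(timeout=remaining))
        except queue.Empty:
            break  # the wait budget ran out before the batch filled
    return batch


requests = queue.Queue()
for i in range(10):
    requests.put(i)

print(len(collect_batch(requests, max_batch_requests=4)))   # full batch, returns immediately
print(len(collect_batch(requests, max_batch_wait_ms=1)))    # partial batch after the wait expires
```

A larger `max_batch_requests` lets more work share one GPU pass, while a smaller `max_batch_wait_ms` bounds how long the first request in a batch can be held back.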

---

## Memory Configuration

Source: [packages/sie_server/src/sie_server/core/memory.py](https://github.com/superlinked/sie/blob/main/packages/sie_server/src/sie_server/core/memory.py)

Control VRAM and disk pressure thresholds and LRU eviction behavior.

| Variable | Default | Description |
|----------|---------|-------------|
| `SIE_MEMORY_PRESSURE_THRESHOLD_PERCENT` | `85` | VRAM usage percent that triggers LRU eviction (0-100) |
| `SIE_DISK_CACHE_ENABLED` | `true` | Enable LRU disk cache for model weights |
| `SIE_DISK_PRESSURE_THRESHOLD_PERCENT` | `85` | Disk usage percent that triggers LRU eviction of cached weights |

**How LRU eviction works:**
1. A background monitor periodically checks VRAM usage
2. When usage exceeds `SIE_MEMORY_PRESSURE_THRESHOLD_PERCENT`, the least-recently-used model is evicted
3. An evicted model is reloaded on demand when the next request for it arrives
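
The eviction steps above can be sketched with an ordered map that tracks recency. This is an illustration under stated assumptions, not the implementation in `memory.py`: the class `ModelLRU` is hypothetical, and the usage percentage is passed in rather than read from the GPU.

```python
from collections import OrderedDict


class ModelLRU:
    """Tracks loaded models in least-recently-used order."""

    def __init__(self, threshold_percent=85):
        self.threshold_percent = threshold_percent
        self._models = OrderedDict()  # oldest use first

    def touch(self, name):
        """Record a request for a model, loading it if needed."""
        self._models[name] = True
        self._models.move_to_end(name)

    def check(self, used_percent):
        """Evict the least-recently-used model when usage exceeds the threshold."""
        if used_percent > self.threshold_percent and self._models:
            name, _ = self._models.popitem(last=False)
            return name
        return None


lru = ModelLRU(threshold_percent=85)
lru.touch("model-a")
lru.touch("model-b")
lru.touch("model-a")   # model-a is now the most recently used
print(lru.check(90))   # model-b
print(lru.check(80))   # None, usage is below the threshold
```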

---

## Logging Configuration

Source: [packages/sie_server/src/sie_server/core/logging.py](https://github.com/superlinked/sie/blob/main/packages/sie_server/src/sie_server/core/logging.py)

Control log format and verbosity.

| Variable | Default | Description |
|----------|---------|-------------|
| `SIE_LOG_JSON` | `false` | Enable structured JSON logging for Loki compatibility |

With `SIE_LOG_JSON=true`, each log line is a JSON object with structured fields, for example:

```json
{
  "timestamp": "2025-12-18T10:30:00Z",
  "level": "INFO",
  "logger": "sie_server.core.registry",
  "message": "Inference completed",
  "model": "bge-m3",
  "request_id": "abc123",
  "trace_id": "def456",
  "latency_ms": 45.2
}
```
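
A formatter producing output of this shape could look like the sketch below, built on the standard `logging` module. The class `JsonFormatter` is hypothetical and is not SIE's logging code; which extra fields are emitted, and when, is an assumption.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each record as a single JSON object."""

    # Optional fields copied from the record when the caller supplies them via `extra`.
    EXTRA_FIELDS = ("model", "request_id", "trace_id", "latency_ms")

    def format(self, record):
        created = datetime.fromtimestamp(record.created, tz=timezone.utc)
        payload = {
            "timestamp": created.strftime("%Y-%m-%dT%H:%M:%SZ"),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        for field in self.EXTRA_FIELDS:
            if hasattr(record, field):
                payload[field] = getattr(record, field)
        return json.dumps(payload)


handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("sie_server.core.registry")
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.info("Inference completed", extra={"model": "bge-m3", "latency_ms": 45.2})
```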

---

## Tracing Configuration

Source: [packages/sie_server/src/sie_server/observability/tracing.py](https://github.com/superlinked/sie/blob/main/packages/sie_server/src/sie_server/observability/tracing.py)

Enable OpenTelemetry distributed tracing.

| Variable | Default | Description |
|----------|---------|-------------|
| `SIE_TRACING_ENABLED` | `false` | Enable OpenTelemetry tracing |

When tracing is enabled, SIE respects standard OpenTelemetry environment variables:

| Variable | Default | Description |
|----------|---------|-------------|
| `OTEL_SERVICE_NAME` | `sie-server` | Service name in traces |
| `OTEL_TRACES_EXPORTER` | `otlp` | Exporter type (`otlp`, `console`, `none`) |
| `OTEL_EXPORTER_OTLP_ENDPOINT` | `http://localhost:4317` | OTLP collector endpoint |
| `OTEL_TRACES_SAMPLER` | `always_on` | Sampling strategy |
| `OTEL_TRACES_SAMPLER_ARG` | `1.0` | Sampling rate (for `traceidratio` sampler) |
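
To show how a ratio such as `OTEL_TRACES_SAMPLER_ARG=0.25` turns into a per-trace decision, the sketch below follows the general idea behind the `traceidratio` sampler: compare the low 64 bits of the trace ID against a bound derived from the ratio. The function `should_sample` is hypothetical, and the exact comparison in a given OpenTelemetry SDK may differ.

```python
TRACE_ID_LIMIT = (1 << 64) - 1  # mask for the low 64 bits of a trace ID


def should_sample(trace_id: int, ratio: float) -> bool:
    """Deterministically sample roughly `ratio` of all trace IDs."""
    bound = round(ratio * (TRACE_ID_LIMIT + 1))
    return (trace_id & TRACE_ID_LIMIT) < bound


print(should_sample(0x1234, 1.0))   # True, a ratio of 1.0 keeps every trace
print(should_sample(0x1234, 0.0))   # False, a ratio of 0.0 drops every trace
print(should_sample(1 << 62, 0.25)) # False, this ID sits exactly on the 25% bound
```

Because the decision depends only on the trace ID, every service that sees the same trace reaches the same verdict.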

---

## Performance Configuration

Source: [packages/sie_server/src/sie_server/config/engine.py](https://github.com/superlinked/sie/blob/main/packages/sie_server/src/sie_server/config/engine.py)

Advanced settings for preprocessing workers, attention backend, compute precision, and instrumentation.

| Variable | Default | Description |
|----------|---------|-------------|
| `SIE_PREPROCESSOR_WORKERS` | `4` | Number of preprocessing worker threads |
| `SIE_IMAGE_WORKERS` | `4` | Image preprocessing worker threads (for VLMs) |
| `SIE_ATTENTION_BACKEND` | `auto` | Attention implementation: `auto`, `flash_attention_2`, `sdpa`, `eager` |
| `SIE_DEFAULT_COMPUTE_PRECISION` | `float16` | Default compute precision: `float16`, `bfloat16`, `float32` |
| `SIE_INSTRUMENTATION` | `false` | Enable detailed batch statistics for debugging |

---

## LoRA Configuration

Source: [packages/sie_server/src/sie_server/config/engine.py](https://github.com/superlinked/sie/blob/main/packages/sie_server/src/sie_server/config/engine.py)

Control LoRA adapter loading behavior.

| Variable | Default | Description |
|----------|---------|-------------|
| `SIE_MAX_LORAS_PER_MODEL` | `10` | Maximum LoRA adapters to keep loaded per model |

When the limit is reached, the least-recently-used LoRA adapter is evicted.
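
A capacity-bound LRU cache of this kind can be sketched as follows. The class `LoraCache` and its `loader` callback are hypothetical; evicting before loading the new adapter is an assumption about ordering, not documented SIE behavior.

```python
from collections import OrderedDict


class LoraCache:
    """Keeps at most `max_loras` adapters, evicting the least recently used."""

    def __init__(self, max_loras=10):
        self.max_loras = max_loras
        self._adapters = OrderedDict()  # oldest use first

    def get(self, name, loader):
        if name in self._adapters:
            self._adapters.move_to_end(name)  # mark as most recently used
            return self._adapters[name]
        if len(self._adapters) >= self.max_loras:
            self._adapters.popitem(last=False)  # drop the least recently used adapter
        adapter = loader(name)
        self._adapters[name] = adapter
        return adapter

    def loaded(self):
        return list(self._adapters)


cache = LoraCache(max_loras=2)
load = lambda name: f"weights for {name}"  # placeholder for real adapter loading
cache.get("adapter-a", load)
cache.get("adapter-b", load)
cache.get("adapter-a", load)  # refreshes adapter-a
cache.get("adapter-c", load)  # evicts adapter-b
print(cache.loaded())         # ['adapter-a', 'adapter-c']
```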

---

## Example: Production Configuration

```bash
# High-throughput production setup
export SIE_DEVICE=cuda
export SIE_MODELS_DIR=s3://my-bucket/models/
export SIE_CLUSTER_CACHE=s3://my-bucket/weights/
export SIE_LOCAL_CACHE=/mnt/nvme/cache

# Batching optimized for A100-80GB
export SIE_MAX_BATCH_REQUESTS=128
export SIE_MAX_BATCH_WAIT_MS=5
export SIE_MAX_CONCURRENT_REQUESTS=1024

# Memory management
export SIE_MEMORY_PRESSURE_THRESHOLD_PERCENT=90

# Observability
export SIE_LOG_JSON=true
export SIE_TRACING_ENABLED=true
export OTEL_EXPORTER_OTLP_ENDPOINT=http://collector:4317
```

---

## Example: Development Configuration

```bash
# Local development setup
export SIE_DEVICE=mps  # or cuda, cpu
export SIE_MODELS_DIR=./models

# Lower batching for faster iteration
export SIE_MAX_BATCH_REQUESTS=8
export SIE_MAX_BATCH_WAIT_MS=1

# Detailed batch statistics for debugging
export SIE_INSTRUMENTATION=true
```

---

## What's Next

- [CLI Reference](/docs/reference/cli/) - Command-line options that map to these variables
- [HTTP API Reference](/docs/reference/api/) - Endpoints exposed by the configured server
