---
title: Gateway
description: Stateless Rust inference edge for multi-worker GPU clusters.
canonical_url: https://superlinked.com/docs/engine/router
last_updated: 2026-05-20
---

The SIE gateway is a stateless Rust service that sits between clients and GPU workers. It handles routing, queue submission, resource pools, worker health, read-side config, and scale-from-zero orchestration.

This page keeps the `/docs/engine/router/` URL for compatibility, but the deployed component is `sie-gateway`.

## When to Use the Gateway

Not every deployment needs a gateway. The deciding factor is whether you are running an elastic worker fleet:

- **Single server** (local dev, single Docker container): connect the SDK directly to `sie-server`.
- **Kubernetes clusters**: use the gateway. It provides a stable client endpoint, worker discovery, queue-based inference, scale-from-zero, resource pools, and config read endpoints.
- **Horizontal gateway replicas**: supported. Each replica keeps its own in-memory registry and converges through bootstrap, NATS config deltas, and epoch polling.

| Setup | Use Gateway? | Why |
|-------|-------------|-----|
| Single Docker container | No | Connect the SDK directly to the worker |
| Docker Compose (multi-worker) | Optional | Useful for a single client endpoint in local tests |
| Kubernetes | Yes | Required for worker discovery, queue routing, scale-from-zero, and pool isolation |

---

## Architecture

![Gateway architecture: SDK/HTTP Client to gateway, NATS queue, and GPU workers](/diagrams/router-arch.svg)

The gateway is stateless with respect to durable data. It owns in-memory routing state, but it does not persist config and it does not execute inference.

```text
Client request
  -> sie-gateway resolves model, bundle, machine profile, and pool
  -> gateway publishes msgpack work items to NATS JetStream
  -> matching workers consume and execute inference
  -> workers publish msgpack results to the gateway's NATS Core inbox
  -> gateway assembles and returns the HTTP response
```

Config writes are outside this hot path. Admin tooling writes to `sie-config`, and gateways mirror that state through `/v1/configs/export`, NATS deltas, and `/v1/configs/epoch` polling.

---

## Request Routing

Source: [packages/sie_gateway/src/handlers/proxy.rs](https://github.com/superlinked/sie/blob/main/packages/sie_gateway/src/handlers/proxy.rs)

The gateway resolves every inference request to:

1. **Model and profile**: the model path and optional `:profile` suffix.
2. **Bundle**: selected by adapter compatibility, with the lowest numeric bundle priority winning by default.
3. **Machine profile**: `X-SIE-MACHINE-PROFILE` header or SDK `gpu` parameter.
4. **Pool**: default pool or explicit `X-SIE-Pool` / SDK `pool/profile` target.
5. **Queue subject**: `sie.work.{model}.{pool}` on the pool's JetStream stream.
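
The resolution steps above can be sketched in a few lines of Python. This is an illustrative model only, not the gateway's Rust code: the `Bundle` fields, the adapter names used with it, and the helper names are all hypothetical, and the real gateway may normalize model names before building a subject.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Bundle:
    name: str
    priority: int         # lower number wins by default
    adapters: frozenset   # adapters this bundle can serve (hypothetical field)


def split_model_ref(ref: str) -> tuple[str, Optional[str]]:
    """Split 'org/model:profile' into (model, profile); the suffix is optional."""
    model, _, profile = ref.partition(":")
    return model, (profile or None)


def pick_bundle(bundles: list[Bundle], adapter: str) -> Bundle:
    """Choose the compatible bundle with the lowest numeric priority."""
    compatible = [b for b in bundles if adapter in b.adapters]
    if not compatible:
        raise LookupError(f"no bundle supports adapter {adapter!r}")
    return min(compatible, key=lambda b: b.priority)


def queue_subject(model: str, pool: str = "default") -> str:
    """Fill in the sie.work.{model}.{pool} template without any normalization."""
    return f"sie.work.{model}.{pool}"
```

Keeping each step a pure function makes the precedence rules easy to test in isolation, for example that a lower priority number beats a higher one only among compatible bundles.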

Unlike the previous Python router, the Rust gateway is **queue-only** for inference. There is no direct-HTTP fallback to workers. If the queue transport is unavailable, the gateway returns `503` instead of bypassing the queue.

### GPU Routing

Requests can specify a target machine profile:

```bash
# HTTP
curl -X POST http://gateway:8080/v1/encode/BAAI/bge-m3 \
  -H "X-SIE-MACHINE-PROFILE: l4" \
  -H "Content-Type: application/json" \
  -d '{"items": [{"text": "Hello world"}]}'
```

#### Python

```python
# SDK
result = client.encode("BAAI/bge-m3", Item(text="hello"), gpu="l4")
```

#### TypeScript

```typescript
// SDK
const result = await client.encode("BAAI/bge-m3", { text: "hello" }, { gpu: "l4" });
```

If the caller omits a machine profile, the gateway uses the default configured route. When the selected `(bundle, machine_profile)` has no healthy worker and the caller did not pin an explicit pool, the gateway returns `202` while capacity is provisioned.

### 202 Scale-from-Zero

When no healthy worker is registered for the selected `(bundle, machine_profile)` tuple and the caller did not pin a specific pool, the gateway returns:

```http
HTTP/1.1 202 Accepted
Retry-After: 120
Content-Type: application/json

{
  "status": "provisioning",
  "gpu": "l4",
  "bundle": "default",
  "estimated_wait_s": 180,
  "message": "No worker available for GPU type 'l4'. Provisioning in progress."
}
```

The SDK handles this automatically with `wait_for_capacity=True`. See [Scale-from-Zero](/docs/deployment/autoscaling/) for details.

`202` is only for capacity provisioning. Unknown models fail fast with `404` once the gateway registry has bootstrapped. Incompatible explicit bundle choices fail with `409`.
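
For callers that talk to the gateway over raw HTTP instead of the SDK, the status codes above suggest a simple retry loop: wait on `202`, return on `200`, and surface everything else. The sketch below is one possible shape, not the SDK's implementation. The `send` callable is a hypothetical stand-in for your HTTP client and is assumed to return `(status, headers, body)`; the 5-second fallback delay and the 600-second budget are arbitrary choices.

```python
import time
from typing import Callable, Mapping, Tuple

Response = Tuple[int, Mapping[str, str], dict]  # (status, headers, JSON body)


def wait_for_capacity(
    send: Callable[[], Response],
    max_wait_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Retry while the gateway answers 202; raise on any other non-200 status."""
    waited = 0.0
    while True:
        status, headers, body = send()
        if status == 200:
            return body
        if status != 202:
            # Includes 404 (unknown model) and 409 (incompatible bundle),
            # which waiting for capacity cannot fix.
            raise RuntimeError(f"gateway returned {status}: {body}")
        # Honor the server's hint; fall back to 5 s if the header is missing.
        delay = float(headers.get("Retry-After", "5"))
        if waited + delay > max_wait_s:
            raise TimeoutError(f"no capacity after waiting {waited:.0f}s")
        sleep(delay)
        waited += delay
```

Injecting `sleep` keeps the loop testable without real delays. The header lookup is case-sensitive here, so normalize header names first if your client does not.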

---

## Worker Discovery

Source: [packages/sie_gateway/src/discovery/k8s_discovery.rs](https://github.com/superlinked/sie/blob/main/packages/sie_gateway/src/discovery/k8s_discovery.rs)

### Static Mode

List worker URLs explicitly:

```bash
sie-gateway serve \
  -w http://worker-1:8080 \
  -w http://worker-2:8080 \
  -w http://worker-3:8080
```

### Kubernetes Mode

Auto-discover workers via Kubernetes service endpoints:

```bash
sie-gateway serve \
  --kubernetes \
  --k8s-namespace sie \
  --k8s-service sie-worker \
  --k8s-port 8080
```

In Kubernetes mode, the gateway watches endpoint changes and automatically registers or deregisters workers. Worker status is then tracked over WebSocket (`/ws/status`) so the gateway sees bundle, machine profile, queue depth, loaded models, health, and config hash.
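
Conceptually, registering and deregistering workers on each endpoint change is a set difference between what the gateway already knows and what Kubernetes currently reports. The following Python sketch shows only that idea; the function name and the sample addresses are made up for illustration, and the gateway's actual watch loop is written in Rust.

```python
def reconcile_workers(
    registered: set[str], endpoints: set[str]
) -> tuple[set[str], set[str]]:
    """Return (to_register, to_deregister) for one endpoint snapshot."""
    to_register = endpoints - registered      # new pods that appeared
    to_deregister = registered - endpoints    # pods that went away
    return to_register, to_deregister


# Example: one worker was replaced between two snapshots.
registered = {"http://10.0.0.1:8080", "http://10.0.0.2:8080"}
snapshot = {"http://10.0.0.2:8080", "http://10.0.0.3:8080"}

add, remove = reconcile_workers(registered, snapshot)
registered = (registered | add) - remove
```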

---

## Resource Pools

Source: [packages/sie_gateway/src/handlers/pools.rs](https://github.com/superlinked/sie/blob/main/packages/sie_gateway/src/handlers/pools.rs)

Resource pools reserve dedicated workers for tenant isolation. Pool workers only serve requests for that pool.

### Create a Pool

```python
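# Import paths are assumed here; check them against your installed SDK.
from sie_sdk import SIEClient
from sie_sdk.types import Item

client = SIEClient("http://gateway:8080")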

# Reserve 2 L4 workers for this tenant
client.create_pool("tenant-abc", {"l4": 2})

# Route requests to the pool
result = client.encode(
    "BAAI/bge-m3",
    Item(text="hello"),
    gpu="tenant-abc/l4"  # pool_name/gpu_type
)

# Check pool status
info = client.get_pool("tenant-abc")

# Cleanup
client.delete_pool("tenant-abc")
```

### Pool Lifecycle

- Pools are represented in Kubernetes `ConfigMap`s and `Lease`s.
- The SDK renews pool leases automatically in a background thread.
- Pools expire after their TTL unless renewed.
- The `default` pool is protected and cannot be deleted.
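
The lifecycle rules above amount to a lease with a TTL plus one protected name. The model below is a minimal sketch of those rules in plain Python, not the Kubernetes `Lease` API or the SDK's renewal thread; `PoolLease`, `can_delete`, and the 300-second TTL in the example are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PoolLease:
    name: str
    ttl_s: float        # how long the pool survives without renewal
    renewed_at: float   # timestamp of the last renewal, in seconds

    def renew(self, now: float) -> None:
        """Record a renewal, which restarts the TTL window."""
        self.renewed_at = now

    def expired(self, now: float) -> bool:
        """A pool expires once a full TTL passes with no renewal."""
        return now - self.renewed_at > self.ttl_s


def can_delete(pool_name: str) -> bool:
    """The default pool is protected; any other pool may be deleted."""
    return pool_name != "default"


# Example with explicit timestamps instead of a real clock.
lease = PoolLease("tenant-abc", ttl_s=300.0, renewed_at=0.0)
lease.renew(now=200.0)              # a renewal arrives inside the window
still_alive = not lease.expired(now=450.0)
```

A client that stops renewing, for instance because its process exited, lets the lease lapse without any explicit cleanup call.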

---

## Config Read Surface

Source: [packages/sie_gateway/src/handlers/config_api.rs](https://github.com/superlinked/sie/blob/main/packages/sie_gateway/src/handlers/config_api.rs)

The gateway serves read-side config endpoints from its in-memory registry:

| Endpoint | Purpose |
|----------|---------|
| `GET /v1/configs/models` | List models known to this gateway |
| `GET /v1/configs/models/{id}` | Return model YAML from the gateway registry |
| `GET /v1/configs/models/{id}/status` | Report per-replica worker ACK readiness |
| `GET /v1/configs/bundles` | List known bundles and connected worker counts |
| `GET /v1/configs/bundles/{id}` | Return bundle YAML |
| `POST /v1/configs/resolve` | Dry-run how a model, or an explicit bundle override, resolves to a bundle |

The gateway is not a config write authority. `POST /v1/configs/models` is not registered on the gateway and returns `405 Method Not Allowed`; send writes to `sie-config`.

### Bootstrap and Recovery

Source: [packages/sie_gateway/src/state/config_bootstrap.rs](https://github.com/superlinked/sie/blob/main/packages/sie_gateway/src/state/config_bootstrap.rs)

On startup, the gateway:

1. Optionally loads filesystem seeds from `SIE_BUNDLES_DIR` and `SIE_MODELS_DIR` if an escape-hatch config map is mounted.
2. Reads `GET /v1/configs/epoch` to capture the authoritative epoch and bundle-set hash.
3. Fetches bundles from `sie-config` with `GET /v1/configs/bundles` and `GET /v1/configs/bundles/{id}`.
4. Fetches model state with `GET /v1/configs/export`.
5. Subscribes to `sie.config.models._all` for live deltas.
6. Polls `GET /v1/configs/epoch` every 30 seconds to catch missed deltas or bundle-set drift.

`/readyz` does not wait for `sie-config`. A fresh gateway can be ready before the first config bootstrap succeeds; during that window, typed requests may return `404` until the registry is populated.
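
The epoch poll in step 6 acts as a safety net for missed deltas. One way to picture a single poll cycle is shown below. This is a sketch under assumptions, not the gateway's Rust logic: the three callables are hypothetical stand-ins for `GET /v1/configs/epoch`, `GET /v1/configs/export`, and the registry update, and the bundle-set hash comparison is left out for brevity.

```python
from typing import Callable


def reconcile_epoch(
    local_epoch: int,
    fetch_epoch: Callable[[], int],
    fetch_export: Callable[[], dict],
    apply_export: Callable[[dict], None],
) -> int:
    """Run one poll cycle and return the epoch this replica now holds."""
    remote_epoch = fetch_epoch()
    if remote_epoch <= local_epoch:
        # Already current: the live deltas kept this replica in sync.
        return local_epoch
    # Behind the authority: assume a delta was missed and re-mirror full state.
    apply_export(fetch_export())
    return remote_epoch
```

Because each replica runs this independently against the same authority, replicas converge without talking to each other.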

---

## Health & Status

The gateway exposes its own liveness and readiness probes and aggregates worker health into a cluster summary:

| Endpoint | Description |
|----------|-------------|
| `GET /healthz` | Gateway liveness |
| `GET /readyz` | Gateway readiness; intentionally independent of `sie-config` reachability |
| `GET /health` | Cluster summary: worker count, GPU count, models loaded |
| `GET /v1/models` | Model list from the gateway registry |
| `WS /ws/cluster-status` | Real-time cluster metrics stream |

### Cluster Health Example

```bash
curl http://gateway:8080/health
```

```json
{
  "status": "healthy",
  "worker_count": 3,
  "gpu_count": 3,
  "models_loaded": 12,
  "configured_gpu_types": ["l4", "a100-80gb"],
  "live_gpu_types": ["l4"]
}
```
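
A small Python sketch shows one way to use this payload: listing GPU types that are configured but have no live workers. Reading that gap as "currently scaled to zero" is an interpretation of the two fields rather than documented behavior, and `gpu_types_without_workers` is a hypothetical helper name.

```python
import json

# The example response from above, parsed as a client would receive it.
health = json.loads("""
{
  "status": "healthy",
  "worker_count": 3,
  "gpu_count": 3,
  "models_loaded": 12,
  "configured_gpu_types": ["l4", "a100-80gb"],
  "live_gpu_types": ["l4"]
}
""")


def gpu_types_without_workers(payload: dict) -> list[str]:
    """Return configured GPU types that are absent from the live list."""
    live = set(payload.get("live_gpu_types", []))
    return [g for g in payload.get("configured_gpu_types", []) if g not in live]


missing = gpu_types_without_workers(health)  # ["a100-80gb"] for this payload
```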

---

## Metrics

Source: [packages/sie_gateway/src/metrics.rs](https://github.com/superlinked/sie/blob/main/packages/sie_gateway/src/metrics.rs)

Important gateway metrics include:

| Metric | Purpose |
|--------|---------|
| `sie_gateway_requests_total` | HTTP requests by endpoint, status, and machine profile |
| `sie_gateway_request_latency_seconds` | Gateway request latency |
| `sie_gateway_pending_demand` | KEDA scale-from-zero trigger by machine profile and bundle |
| `sie_gateway_worker_queue_depth` | Per-worker queue depth |
| `sie_gateway_config_epoch` | Highest config epoch applied on this gateway |
| `sie_gateway_config_bootstrap_degraded` | Whether bootstrap has been failing long enough to alert |
| `sie_gateway_config_deltas_total` | NATS config-delta processing outcomes |
| `sie_gateway_nats_connected` | Gateway NATS connection state |
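
To show how these metrics might feed a basic check, the sketch below parses a sample in the Prometheus text exposition format and flags two conditions. Several things are assumptions: the sample values are invented, the gauges are assumed to use `1` for "connected" and "degraded", and the parser ignores optional timestamps. In production an alerting rule in your monitoring stack is the more usual place for this logic.

```python
def parse_samples(text: str) -> list[tuple[str, float]]:
    """Parse 'series value' lines, skipping comments and blank lines."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        series, _, value = line.rpartition(" ")
        samples.append((series, float(value)))
    return samples


def config_alerts(text: str) -> list[str]:
    """Flag a lost NATS connection or a degraded config bootstrap."""
    alerts = []
    for series, value in parse_samples(text):
        name = series.split("{", 1)[0]  # drop any label set
        if name == "sie_gateway_nats_connected" and value == 0:
            alerts.append("NATS disconnected")
        if name == "sie_gateway_config_bootstrap_degraded" and value == 1:
            alerts.append("config bootstrap degraded")
    return alerts


# Invented sample for illustration only.
SAMPLE = """\
# TYPE sie_gateway_nats_connected gauge
sie_gateway_nats_connected 0
sie_gateway_config_bootstrap_degraded 1
sie_gateway_config_epoch 42
"""

alerts = config_alerts(SAMPLE)
```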

---

## What's Next

- [Scale-from-Zero](/docs/deployment/autoscaling/) - the 202 flow and cold start handling
- [Config API](/docs/engine/config-api/) - runtime config writes and gateway readiness polling
- [Kubernetes in GCP](/docs/deployment/cloud-gcp/) - full deployment with the gateway
- [Monitoring](/docs/deployment/monitoring/) - metrics and dashboards
