---
title: Upgrade Runbook
description: Step-by-step guide for upgrading SIE clusters with pre-checks, rolling updates, and rollback procedures.
canonical_url: https://superlinked.com/docs/deployment/upgrades
last_updated: 2026-05-20
---

Procedure for upgrading an SIE cluster to a new release version. Covers Helm-managed deployments on GKE and EKS.

**Components upgraded:**
- **Gateway** (Deployment) - stateless inference edge, fast restart
- **Config service** (Deployment) - single-replica config control plane
- **Worker pools** (StatefulSets) - GPU pods, model cache in emptyDir

**Version management:** SIE uses release-please for unified versioning. A single version (e.g., `0.1.6`) is applied to the Helm chart (`Chart.yaml` `appVersion`), Python packages, the Rust gateway crate, and TypeScript packages. The CHANGELOG.md at the repo root documents all changes per release.
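
Because a single version number is stamped on every artifact, one quick pre-flight check is to read `appVersion` from the chart you are about to deploy and compare it with your target. The sketch below assumes the chart lives at `deploy/helm/sie-cluster/Chart.yaml` (inferred from the chart path used later in this runbook). The helper name `chart_app_version` is illustrative and not part of SIE tooling.

```bash
# Print the appVersion field of a Helm Chart.yaml (hypothetical helper).
# Strips surrounding quotes and any trailing "# comment".
chart_app_version() {
  sed -n 's/^appVersion:[[:space:]]*//p' "$1" \
    | sed 's/[[:space:]]*#.*$//' \
    | tr -d "\"'" \
    | head -n 1
}

# Usage (the path is an assumption):
#   [ "$(chart_app_version deploy/helm/sie-cluster/Chart.yaml)" = "0.1.7" ] \
#     && echo "chart matches target" || echo "chart does NOT match target"
```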

---

## 1. Pre-Upgrade Checklist

Complete all items before starting the upgrade.

### 1.1 Review the CHANGELOG

Read `CHANGELOG.md` for the target version. Pay attention to:
- **Breaking changes** in the gateway, config API, or server API
- **Helm values changes** (new required values, renamed keys, removed options)
- **Model config changes** (new or removed models, adapter changes)

```bash
# List commits between the current and target versions (complements CHANGELOG.md)
git log v<CURRENT>..v<TARGET> --oneline
```
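
To read only the target release's entry rather than the whole file, something like the following may help. It is a sketch that assumes release-please's usual heading shape (`## [0.1.7](compare-url) (date)`, or a bare `## 0.1.7 (date)`); the function name `changelog_section` is hypothetical.

```bash
# Print the CHANGELOG section for one version: from its "## " heading
# up to (not including) the next "## " heading.
changelog_section() {
  local version="$1" file="${2:-CHANGELOG.md}"
  awk -v ver="$version" '
    /^## / {
      # A release heading: print from here only if it names our version.
      printing = (index($0, "[" ver "]") > 0 || $2 == ver)
    }
    printing { print }
  ' "$file"
}

# Usage:
#   changelog_section 0.1.7 | grep -i -A5 "breaking"
```

Sub-headings such as `### Bug Fixes` do not match `^## `, so they stay inside the section they belong to.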

### 1.2 Record Current State

```bash
# Note current Helm release version
helm list -n sie

# Note current chart values (save for rollback reference)
helm get values sie -n sie -o yaml > /tmp/sie-values-backup.yaml

# Back up pool state (ConfigMaps + Leases in the sie namespace)
kubectl get configmap,lease -n sie -o yaml > /tmp/sie-pool-state-backup.yaml

# Record current image tags
kubectl get deployment -n sie -o jsonpath='{range .items[*]}{.metadata.name}: {.spec.template.spec.containers[0].image}{"\n"}{end}'
kubectl get statefulset -n sie -o jsonpath='{range .items[*]}{.metadata.name}: {.spec.template.spec.containers[0].image}{"\n"}{end}'

# Record Helm revision number
helm history sie -n sie --max 5
```

### 1.3 Verify Cluster Health

```bash
# All gateway pods should be Running and Ready
kubectl get pods -n sie -l app.kubernetes.io/component=gateway

# The config service should be Running and Ready
kubectl get pods -n sie -l app.kubernetes.io/component=config

# All worker pods should be Running and Ready (if not scaled to zero)
kubectl get pods -n sie -l app.kubernetes.io/component=worker

# Gateway readiness (returns {"status": "ready"})
kubectl exec -n sie deploy/sie-sie-cluster-gateway -- wget -qO- http://localhost:8080/readyz

# Gateway detailed health (returns worker count, GPU count, loaded models)
kubectl exec -n sie deploy/sie-sie-cluster-gateway -- wget -qO- http://localhost:8080/health

# Config service health
kubectl exec -n sie deploy/sie-sie-cluster-config -- wget -qO- http://localhost:8080/healthz

# KEDA ScaledObjects should not be in Fallback mode
kubectl get scaledobject -n sie
kubectl describe scaledobject -n sie | grep -A2 "Type.*Fallback"

# Check for recent errors in gateway and config logs
kubectl logs -n sie -l app.kubernetes.io/component=gateway --tail=50 | grep -i error
kubectl logs -n sie -l app.kubernetes.io/component=config --tail=50 | grep -i error

# Check for recent errors in worker logs
kubectl logs -n sie -l app.kubernetes.io/component=worker --tail=50 | grep -i error
```
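
The pod checks above are read by eye. For a scripted gate, a small filter over `kubectl get pods --no-headers` output can flag anything that is not fully ready. This is a minimal sketch; `pods_not_ready` is a hypothetical helper, and it would also flag `Completed` pods if the namespace ever contains Jobs.

```bash
# Reads `kubectl get pods --no-headers` lines on stdin (NAME READY STATUS ...).
# Prints the name of every pod that is not Running with all containers ready,
# and exits non-zero if any such pod was found.
pods_not_ready() {
  awk '
    NF == 0 { next }
    {
      split($2, ready, "/")
      if ($3 != "Running" || ready[1] != ready[2]) { print $1; bad = 1 }
    }
    END { exit bad }
  '
}

# Usage:
#   kubectl get pods -n sie --no-headers | pods_not_ready \
#     && echo "all pods ready" || echo "pods listed above are not ready"
```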

### 1.4 Verify Observability Stack

```bash
# Prometheus is serving queries
kubectl exec -n monitoring svc/prometheus-operated -- wget -qO- \
  'http://localhost:9090/api/v1/query?query=up' 2>/dev/null | head -c 200

# Grafana is accessible
kubectl port-forward -n monitoring svc/prometheus-grafana 3000:80 &
# Open http://localhost:3000 and verify SIE dashboards show data
```

### 1.5 Drain Active Workloads (Optional)

> **Caution:**
>
> Only drain traffic if your upgrade includes breaking changes. Standard rolling updates maintain availability without draining.

If you upgrade during active traffic, consider pausing KEDA autoscaling so replica counts stay fixed while pods roll:

```bash
# Pause KEDA autoscaling to prevent scale events during upgrade.
# Each ScaledObject targets a specific StatefulSet, so freeze each one
# at its own replica count (pools may differ).
for so in $(kubectl get scaledobject -n sie -o jsonpath='{.items[*].metadata.name}'); do
  # Read the actual scale target from the ScaledObject spec
  sts=$(kubectl get scaledobject "$so" -n sie -o jsonpath='{.spec.scaleTargetRef.name}')
  replicas=$(kubectl get statefulset "$sts" -n sie -o jsonpath='{.spec.replicas}' 2>/dev/null)
  if [ -n "$replicas" ]; then
    kubectl annotate scaledobject "$so" -n sie \
      autoscaling.keda.sh/paused-replicas="$replicas" --overwrite
  fi
done
```

---

## 2. Upgrade Procedure

### 2.1 Prepare New Images

For clusters using custom image registries (not the default `ghcr.io/superlinked`), push the new images first:

```bash
# Build and push new images (adjust registry as needed)
REGISTRY="your-registry.example.com"
TAG="0.1.7"  # Target version

# Build and push config + server bundle images in one bake invocation.
# Add --bake-bundles or --bake-platform if you need non-default coverage.
mise run docker -- \
  --bake \
  --bake-include-gateway \
  --bake-tag "$TAG" \
  --registry "$REGISTRY/" \
  --push
```

### 2.2 Helm Upgrade

#### Option A: Upgrade from Local Chart

```bash
# Dry-run first to preview changes (requires the helm-diff plugin)
helm diff upgrade sie deploy/helm/sie-cluster/ \
  -n sie \
  -f /tmp/sie-values-backup.yaml \
  --set workers.common.image.tag="<TARGET_VERSION>" \
  --set gateway.image.tag="<TARGET_VERSION>" \
  --set config.image.tag="<TARGET_VERSION>"

# Apply the upgrade (--wait blocks until pods are ready; --timeout guards against hangs)
helm upgrade sie deploy/helm/sie-cluster/ \
  -n sie \
  -f /tmp/sie-values-backup.yaml \
  --set workers.common.image.tag="<TARGET_VERSION>" \
  --set gateway.image.tag="<TARGET_VERSION>" \
  --set config.image.tag="<TARGET_VERSION>" \
  --wait --timeout 10m
```

#### Option B: Upgrade from OCI Registry

```bash
# Dry-run (requires the helm-diff plugin)
helm diff upgrade sie oci://ghcr.io/superlinked/charts/sie-cluster \
  -n sie \
  --version <TARGET_CHART_VERSION> \
  -f /tmp/sie-values-backup.yaml

# Apply
helm upgrade sie oci://ghcr.io/superlinked/charts/sie-cluster \
  -n sie \
  --version <TARGET_CHART_VERSION> \
  -f /tmp/sie-values-backup.yaml \
  --wait --timeout 10m
```

#### Option C: Terraform-Managed Clusters

```bash
# Update image tag in Terraform variables
# Edit your .tfvars file, or set the TF_VAR_ environment variable:
export TF_VAR_sie_image_tag="<TARGET_VERSION>"

cd deploy/terraform/gcp/examples/<your-env>
terraform plan   # Review changes
terraform apply  # Apply
```

### 2.3 Expected Behavior During Rolling Update

**Gateway (Deployment):**
- Kubernetes rolls out new gateway pods one at a time (default `RollingUpdate` strategy).
- Startup probe gates the other probes: `GET /healthz`, `periodSeconds: 5`, `failureThreshold: 12` (up to 60 s for boot).
- Once startup passes, liveness polls `GET /healthz` every 10 s and readiness polls `GET /readyz` every 5 s. `/readyz` returns 200 even with zero connected workers: the gateway still accepts traffic and returns `202` for cold-start cases.
- The gateway is stateless on the request path; new pods come up in seconds.
- Brief 503s are possible during the switchover window if all old pods are terminated before new ones pass readiness.

**Config service (Deployment):**
- `sie-config` is intentionally single-replica because it owns serialized config writes and epoch bumps.
- If the config-store PVC is enabled, the chart uses a `Recreate` strategy to avoid ReadWriteOnce mount conflicts.
- The gateway keeps serving from its in-memory registry during a short config-service restart, but config writes and bootstrap/drift recovery depend on `sie-config`.

**Workers (StatefulSets):**
- The default `RollingUpdate` strategy updates pods one at a time in reverse ordinal order. (`podManagementPolicy: Parallel` only affects pod ordering during scaling, not rolling updates.)
- Worker `terminationGracePeriodSeconds: 65`.
- `preStop` hook: `sleep 10` - gives the K8s endpoints controller 10 seconds to remove the pod from the service before SIGTERM.
- On SIGTERM, the server enters graceful shutdown: rejects new requests with `503` (with `Retry-After: 5` header), drains in-flight requests (25-second timeout), then exits.
- Readiness probe stops passing (`/readyz` returns 503) once shutdown begins, so the gateway stops treating the pod as available.
- The gateway detects worker disconnection via WebSocket and removes it from the routing table.
- New worker pods must download model weights if the emptyDir cache is empty (cache does not persist across pod restarts). Cold model loading can take 10-120 seconds depending on model size and cache state.
- PodDisruptionBudget: `maxUnavailable: 1` per worker pool - protects against external disruptions (e.g., `kubectl drain`, node autoscaler) but is **not** enforced by the StatefulSet controller during rolling updates.
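
The timing figures in this section can be sanity-checked with simple arithmetic, which is useful if you override any of them in your values file. The helper names below are illustrative; the numbers are the ones quoted above.

```bash
# Worst-case startup window for a probe: periodSeconds * failureThreshold.
probe_budget_seconds() { echo $(( $1 * $2 )); }

# Seconds left in the grace period after the preStop sleep and the drain timeout.
# The preStop hook runs inside terminationGracePeriodSeconds, so both count.
shutdown_headroom_seconds() {
  local grace="$1" prestop="$2" drain="$3"
  echo $(( grace - prestop - drain ))
}

probe_budget_seconds 5 12            # gateway startup probe -> 60
shutdown_headroom_seconds 65 10 25   # worker shutdown       -> 30
```

A positive headroom means the kubelet should not send SIGKILL before the drain timeout has elapsed. If you lengthen the drain timeout or the `preStop` sleep, raise `terminationGracePeriodSeconds` to match.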

**Client Impact:**
- SDK clients with automatic retry handle 503s transparently.
- Requests in flight during graceful shutdown complete normally (up to 25-second drain timeout).
- If all workers in a pool are restarting simultaneously, the gateway returns `202 Accepted` (provisioning), and the SDK retries with backoff.
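
For clients that do not use the SDK, the retry behavior described above can be approximated in a few lines. This is a sketch under stated assumptions, not the SDK's actual algorithm: it treats `202` and `503` as transient, backs off exponentially, and ignores the `Retry-After` header. `retry_on_transient` and `RETRY_BASE_DELAY` are hypothetical names.

```bash
# retry_on_transient MAX_ATTEMPTS CMD [ARGS...]
# CMD must print an HTTP status code on stdout.
# Retries while the status is 202 (provisioning) or 503 (draining/unavailable).
# Prints the final status; returns 1 if attempts ran out on a transient status.
retry_on_transient() {
  local max="$1"; shift
  local attempt=1 status
  local delay="${RETRY_BASE_DELAY:-1}"   # seconds; doubles after each retry
  while :; do
    status="$("$@")"
    case "$status" in
      202|503) ;;                        # transient: fall through and retry
      *) echo "$status"; return 0 ;;     # any other status is final
    esac
    if [ "$attempt" -ge "$max" ]; then
      echo "$status"; return 1
    fi
    sleep "$delay"
    delay=$(( delay * 2 ))
    attempt=$(( attempt + 1 ))
  done
}

# Usage (endpoint as in the smoke test in section 3.3):
#   retry_on_transient 5 curl -s -o /dev/null -w '%{http_code}' \
#     -X POST http://localhost:8080/v1/encode/BAAI%2Fbge-m3 \
#     -H "Content-Type: application/json" \
#     -d '{"items": [{"text": "retry test"}]}'
```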

### 2.4 Monitor the Rollout

```bash
# Watch gateway and config rollouts
kubectl rollout status deployment/sie-sie-cluster-gateway -n sie --timeout=120s
kubectl rollout status deployment/sie-sie-cluster-config -n sie --timeout=120s

# Watch worker rollouts (one per pool)
kubectl get statefulsets -n sie -w

# Watch all pods
kubectl get pods -n sie -w

# Check KEDA ScaledObjects are still healthy (not Fallback)
kubectl get scaledobject -n sie -o 'custom-columns=NAME:.metadata.name,READY:.status.conditions[0].status,MIN:.spec.minReplicaCount,MAX:.spec.maxReplicaCount,REPLICAS:.status.currentReplicas'

# Watch gateway logs for errors during transition
kubectl logs -n sie -l app.kubernetes.io/component=gateway -f --tail=20
```

---

## 3. Post-Upgrade Verification

### 3.1 All Pods Healthy

```bash
# All pods Running and Ready
kubectl get pods -n sie
# Expected: gateway/config pods 1/1 Ready, all active worker pods 1/1 Ready

# Verify new image tags are deployed
kubectl get pods -n sie -o jsonpath='{range .items[*]}{.metadata.name}: {.spec.containers[0].image}{"\n"}{end}'
```
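
To avoid scanning the image list by eye, the `name: image` lines printed above can be piped through a filter that reports anything not on the target tag. This is a minimal sketch; `images_not_at_tag` is a hypothetical helper, and images pinned by digest (`@sha256:...`) would be reported as mismatches.

```bash
# Reads "pod-name: image" lines on stdin and prints every line whose image
# tag differs from the expected one. Exits non-zero if any mismatch is found.
images_not_at_tag() {
  local tag="$1"
  awk -v tag="$tag" '
    NF == 0 { next }
    {
      # The tag is whatever follows the last ":" in the image reference.
      n = split($NF, parts, ":")
      if (n < 2 || parts[n] != tag) { print; bad = 1 }
    }
    END { exit bad }
  '
}

# Usage:
#   kubectl get pods -n sie -o jsonpath='{range .items[*]}{.metadata.name}: {.spec.containers[0].image}{"\n"}{end}' \
#     | images_not_at_tag "<TARGET_VERSION>" && echo "all pods on target version"
```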

### 3.2 Gateway and Config Health

```bash
# Readiness check
kubectl exec -n sie deploy/sie-sie-cluster-gateway -- wget -qO- http://localhost:8080/readyz
# Expected: {"status": "ready"}

# Detailed health (worker count, models, GPU types)
kubectl exec -n sie deploy/sie-sie-cluster-gateway -- wget -qO- http://localhost:8080/health
# Expected: "status": "healthy", worker_count > 0 (if pools not scaled to zero)

# Config service is healthy
kubectl exec -n sie deploy/sie-sie-cluster-config -- wget -qO- http://localhost:8080/healthz

# Model catalog is available
kubectl exec -n sie deploy/sie-sie-cluster-gateway -- wget -qO- http://localhost:8080/v1/models | head -c 500
```

### 3.3 Encode Request Smoke Test

```bash
# Port-forward to gateway
kubectl port-forward -n sie svc/sie-sie-cluster-gateway 8080:8080 &

# Test encode request (requires a running worker with GPU)
python3 -c "
from sie_sdk import SIEClient
client = SIEClient('http://localhost:8080')
result = client.encode('BAAI/bge-m3', {'text': 'upgrade verification test'})
print(f'Dense embedding dim: {len(result[\"dense\"])}')
print('SUCCESS: Encode request returned 200')
"

# Or with curl (JSON fallback):
curl -s -X POST http://localhost:8080/v1/encode/BAAI%2Fbge-m3 \
  -H "Content-Type: application/json" \
  -d '{"items": [{"text": "upgrade verification test"}]}' | python3 -m json.tool | head -5
```

### 3.4 KEDA and Autoscaling

```bash
# Unpause KEDA if paused in step 1.5
kubectl annotate scaledobject -n sie --all autoscaling.keda.sh/paused-replicas- --overwrite

# Verify ScaledObjects are Ready (not Fallback)
kubectl get scaledobject -n sie
kubectl describe scaledobject -n sie | grep -A3 "Conditions:"
# Expected: Ready=True, Active depends on load, Fallback=False
```

### 3.5 Metrics Flowing

```bash
# Verify Prometheus is scraping the new pods
kubectl exec -n monitoring svc/prometheus-operated -- wget -qO- \
  'http://localhost:9090/api/v1/query?query=sie_requests_total' 2>/dev/null | python3 -m json.tool | head -20

# Verify gateway metrics
kubectl exec -n monitoring svc/prometheus-operated -- wget -qO- \
  'http://localhost:9090/api/v1/query?query=sie_gateway_requests_total' 2>/dev/null | python3 -m json.tool | head -20

# Check Grafana dashboards show data for new pods
# Port-forward: kubectl port-forward -n monitoring svc/prometheus-grafana 3000:80
# Navigate to SIE > Cluster Overview dashboard
```

### 3.6 Version Verification

```bash
# Check Helm release version
helm list -n sie
# Expected: Chart version and App version match target

# Check the server version header on a response (uses the port-forward from step 3.3)
curl -s -I http://localhost:8080/healthz | grep -i x-sie
# Expected: X-SIE-Server-Version: <TARGET_VERSION>
```
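
If you want the header check to fail a script rather than rely on visual inspection, the version can be extracted from the raw `curl -I` output. This is a sketch; `server_version_from_headers` is a hypothetical helper, and the header name is the one given above.

```bash
# Reads raw HTTP response headers on stdin and prints the value of
# X-SIE-Server-Version (header names are matched case-insensitively).
server_version_from_headers() {
  tr -d '\r' | awk -F': *' 'tolower($1) == "x-sie-server-version" { print $2; exit }'
}

# Usage:
#   got=$(curl -s -I http://localhost:8080/healthz | server_version_from_headers)
#   [ "$got" = "<TARGET_VERSION>" ] || echo "unexpected server version: $got"
```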

---

## 4. Rollback Procedure

### 4.1 Identify Rollback Target

```bash
# List Helm release history
helm history sie -n sie --max 10
# Note the REVISION number of the last known-good release
```

### 4.2 Execute Rollback

```bash
# Roll back to a specific revision
helm rollback sie <REVISION> -n sie

# Or roll back to the immediately previous revision (omit the revision number)
helm rollback sie -n sie
```

For Terraform-managed clusters:

```bash
# Revert image tag to previous version
export TF_VAR_sie_image_tag="<PREVIOUS_VERSION>"
cd deploy/terraform/gcp/examples/<your-env>
terraform apply
```

### 4.3 Monitor Rollback

```bash
# Watch the rollback proceed
kubectl rollout status deployment/sie-sie-cluster-gateway -n sie --timeout=120s
kubectl rollout status deployment/sie-sie-cluster-config -n sie --timeout=120s
kubectl get pods -n sie -w

# Verify old image is restored
kubectl get pods -n sie -o jsonpath='{range .items[*]}{.metadata.name}: {.spec.containers[0].image}{"\n"}{end}'
```

### 4.4 Verify Rollback Succeeded

Repeat the [post-upgrade verification](#3-post-upgrade-verification) steps. At minimum:

```bash
# Gateway health
kubectl exec -n sie deploy/sie-sie-cluster-gateway -- wget -qO- http://localhost:8080/readyz

# Encode smoke test
kubectl port-forward -n sie svc/sie-sie-cluster-gateway 8080:8080 &
python3 -c "
from sie_sdk import SIEClient
client = SIEClient('http://localhost:8080')
result = client.encode('BAAI/bge-m3', {'text': 'rollback verification'})
print(f'Dense dim: {len(result[\"dense\"])} - SUCCESS')
"

# KEDA health
kubectl get scaledobject -n sie
```

### 4.5 Known Caveats

- **No database migrations:** Workers use emptyDir for model cache, and the gateway stores pool state in ConfigMaps with Leases for TTL. `sie-config` persists model config YAML and an epoch counter only when its config store is enabled.
- **Model cache invalidation:** Worker pods use emptyDir volumes for the HuggingFace model cache. Rolling back means new pods start with an empty cache and must re-download model weights on first request. If cluster cache (S3/GCS) is configured, downloads come from there instead of HuggingFace Hub.
- **Pool state:** Resource pools are stored as ConfigMaps in the `sie` namespace. Pool leases survive upgrades and rollbacks. Active pools will continue to work, but if the pool API changed between versions, clients may need to recreate pools.
- **KEDA ScaledObjects:** Helm rollback re-applies the previous ScaledObject definitions. If KEDA version requirements changed between SIE versions, verify ScaledObjects are not in Fallback mode after rollback.
- **Config drift:** If the upgrade included changes to embedded model or bundle configs (baked into the Helm chart `files/` directory), rollback restores the previous configs. Ensure the previous configs are compatible with the previous server version.
- **SDK version compatibility:** The gateway returns `X-SIE-Server-Version` headers. If clients upgraded their SDK alongside the server, a server rollback may trigger version mismatch warnings in the SDK logs. The SDK remains functional but logs warnings for major.minor mismatches.
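
To anticipate the SDK warning before rolling back, you can compare the major.minor parts of the two versions yourself. How the SDK performs its own comparison is not documented here, so treat this as an illustration of the rule as stated, with hypothetical helper names.

```bash
# Print the major.minor prefix of a semantic version, e.g. 0.1.7 -> 0.1
major_minor() { echo "$1" | cut -d. -f1,2; }

# Succeeds when two versions share the same major.minor.
same_major_minor() {
  [ "$(major_minor "$1")" = "$(major_minor "$2")" ]
}

# Usage:
#   same_major_minor "<SDK_VERSION>" "<PREVIOUS_VERSION>" \
#     || echo "expect SDK version-mismatch warnings after rollback"
```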

---

## Appendix: Key Resources

| Resource | Namespace | Type | Purpose |
|----------|-----------|------|---------|
| `sie-sie-cluster-gateway` | `sie` | Deployment | Stateless request gateway (2+ replicas) |
| `sie-sie-cluster-config` | `sie` | Deployment | Single-writer config control plane |
| `sie-sie-cluster-worker-<pool>` | `sie` | StatefulSet | GPU worker pool (one per pool) |
| `sie-sie-cluster-worker` | `sie` | Service (headless) | Worker DNS discovery |
| `sie-sie-cluster-gateway` | `sie` | Service (ClusterIP) | Gateway endpoint |
| `sie-sie-cluster-config` | `sie` | Service (ClusterIP) | Internal config API |
| `sie-sie-cluster-worker-<pool>-scaler` | `sie` | ScaledObject | KEDA autoscaler per pool |
| `sie-sie-cluster-worker-<pool>` | `sie` | PodDisruptionBudget | maxUnavailable: 1 per pool |
| `sie-sie-cluster-gpu-config` | `sie` | ConfigMap | Available GPU types / machine profiles |
| `sie-sie-cluster-config` | `sie` | ConfigMap | Shared cluster configuration |

### Health Endpoints

| Endpoint | Component | Returns |
|----------|-----------|---------|
| `GET /healthz` | Gateway | `{"status": "ok"}` - liveness probe |
| `GET /readyz` | Gateway | `{"status": "ready"}` - readiness probe |
| `GET /health` | Gateway | Detailed cluster status (worker count, GPUs, models) |
| `GET /healthz` | Config service | `{"status": "ok"}` - liveness probe |
| `GET /healthz` | Worker | `"ok"` - liveness probe |
| `GET /readyz` | Worker | `"ok"` or 503 - readiness probe |
| `GET /metrics` | Gateway and Worker | Prometheus metrics |

### Grafana Dashboards

| Dashboard | Purpose |
|-----------|---------|
| Cluster Overview | QPS, latency (p50/p95/p99), GPU utilization |
| Model Performance | Per-model latency, throughput, batch sizes |
| Worker Health | Per-worker CPU/memory, GPU temp, queue depth |

---

## What's Next

- [Monitoring](/docs/deployment/monitoring/) - metrics, alerts, and dashboards
- [Scale-from-Zero](/docs/deployment/autoscaling/) - KEDA autoscaling and cold start handling
- [Kubernetes in GCP](/docs/deployment/cloud-gcp/) - GKE deployment setup
- [Kubernetes in AWS](/docs/deployment/cloud-aws/) - EKS deployment setup
