Upgrade Runbook

Procedure for upgrading an SIE cluster to a new release version. Covers Helm-managed deployments on GKE and EKS.

Components upgraded:

Gateway (Deployment) - stateless inference edge, fast restart
Config service (Deployment) - single-replica config control plane
Worker pools (StatefulSets) - GPU worker pods with the SIE server sidecar plus the Python sie-server adapter, model cache in emptyDir

Version management: SIE uses release-please for unified versioning. A single version (e.g., 0.1.6) is applied to the Helm chart (Chart.yaml appVersion), Python packages, Rust gateway and SIE server sidecar crates, and TypeScript packages. The CHANGELOG.md at the repo root documents all changes per release.

1. Pre-Upgrade Checklist

Complete all items before starting the upgrade.

1.1 Review the CHANGELOG

Read CHANGELOG.md for the target version. Pay attention to:

Breaking changes in the gateway, config API, or server API
Helm values changes (new required values, renamed keys, removed options)
Model config changes (new or removed models, adapter changes)

# List the commits between the current and target release tags (complements CHANGELOG.md)
git log v<CURRENT>..v<TARGET> --oneline
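To read only the entries for the target release, the relevant section can be cut out of CHANGELOG.md. The sketch below assumes the headings follow the usual release-please shapes (`## [0.1.7](compare-url) (date)` or `## 0.1.0 (date)`); this runbook does not confirm the exact format, and `changelog_section` is a hypothetical helper name.

```python
import re

# Assumed release heading shapes (not confirmed by this runbook):
#   ## [0.1.7](https://.../compare/v0.1.6...v0.1.7) (2025-01-01)
#   ## 0.1.0 (2025-01-01)
HEADING = re.compile(r"^##\s+\[?(\d+\.\d+\.\d+)\]?")


def changelog_section(text: str, version: str) -> str:
    """Return the lines under the release heading for `version`, or '' if absent."""
    collected = []
    inside = False
    for line in text.splitlines():
        match = HEADING.match(line)
        if match:
            if inside:
                break  # reached the next release heading
            inside = match.group(1) == version
            continue
        if inside:
            collected.append(line)
    return "\n".join(collected).strip()


SAMPLE = """# Changelog

## [0.1.7](https://example.invalid/compare/v0.1.6...v0.1.7) (2025-01-01)

### Features

* example entry

## [0.1.6](https://example.invalid/compare/v0.1.5...v0.1.6) (2024-12-01)

* older entry
"""

print(changelog_section(SAMPLE, "0.1.7"))
```

In practice the text would come from the CHANGELOG.md file at the repo root rather than an inline sample.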

1.2 Record Current State

# Note current Helm release version
helm list -n sie

# Note current chart values (save for rollback reference)
helm get values sie -n sie -o yaml > /tmp/sie-values-backup.yaml

# Back up pool state (ConfigMaps + Leases in the sie namespace)
kubectl get configmap,lease -n sie -o yaml > /tmp/sie-pool-state-backup.yaml

# Record current image tags for every container, including the SIE server sidecar
kubectl get deployment -n sie -o jsonpath='{range .items[*]}{.metadata.name}: {range .spec.template.spec.containers[*]}{.name}={.image}{" "}{end}{"\n"}{end}'
kubectl get statefulset -n sie -o jsonpath='{range .items[*]}{.metadata.name}: {range .spec.template.spec.containers[*]}{.name}={.image}{" "}{end}{"\n"}{end}'

# Record Helm revision number
helm history sie -n sie --max 5

1.3 Verify Cluster Health

# All gateway pods should be Running and Ready
kubectl get pods -n sie -l app.kubernetes.io/component=gateway

# The config service should be Running and Ready
kubectl get pods -n sie -l app.kubernetes.io/component=config

# All worker pods should be Running and Ready if not scaled to zero.
# Active worker pods usually show 2/2 containers ready.
kubectl get pods -n sie -l app.kubernetes.io/component=worker

# Gateway readiness (returns ok)
kubectl exec -n sie deploy/sie-sie-cluster-gateway -- wget -qO- http://localhost:8080/readyz

# Gateway detailed health (returns worker count, GPU count, loaded models)
kubectl exec -n sie deploy/sie-sie-cluster-gateway -- wget -qO- http://localhost:8080/health

# Config service health
kubectl exec -n sie deploy/sie-sie-cluster-config -- wget -qO- http://localhost:8080/healthz

# KEDA ScaledObjects should not be in Fallback mode
kubectl get scaledobject -n sie
kubectl describe scaledobject -n sie | grep -A2 "Type.*Fallback"

# Check for recent errors in gateway and config logs
kubectl logs -n sie -l app.kubernetes.io/component=gateway --tail=50 | grep -i error
kubectl logs -n sie -l app.kubernetes.io/component=config --tail=50 | grep -i error

# Check for recent errors in worker logs
kubectl logs -n sie -l app.kubernetes.io/component=worker --tail=50 | grep -i error

1.4 Verify Observability Stack

# Prometheus is serving queries
kubectl exec -n monitoring svc/prometheus-operated -- wget -qO- \
  'http://localhost:9090/api/v1/query?query=up' 2>/dev/null | head -c 200

# Grafana is accessible
kubectl port-forward -n monitoring svc/prometheus-grafana 3000:80 &
# Open http://localhost:3000 and verify SIE dashboards show data

1.5 Drain Active Workloads (Optional)

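If you upgrade while the cluster is serving traffic, consider pausing KEDA autoscaling so that scale events do not interleave with the rollout: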

# Pause KEDA autoscaling to prevent scale events during upgrade.
# Each ScaledObject targets a specific StatefulSet, so freeze each one
# at its own replica count (pools may differ).
for so in $(kubectl get scaledobject -n sie -o jsonpath='{.items[*].metadata.name}'); do
  # Read the actual scale target from the ScaledObject spec
  sts=$(kubectl get scaledobject "$so" -n sie -o jsonpath='{.spec.scaleTargetRef.name}')
  replicas=$(kubectl get statefulset "$sts" -n sie -o jsonpath='{.spec.replicas}' 2>/dev/null)
  if [ -n "$replicas" ]; then
    kubectl annotate scaledobject "$so" -n sie \
      autoscaling.keda.sh/paused-replicas="$replicas" --overwrite
  fi
done

2. Upgrade Procedure

2.1 Prepare New Images

For clusters using custom image registries (not the default ghcr.io/superlinked), push the new images first:

# Build and push new images (adjust registry as needed)
REGISTRY="your-registry.example.com"
TAG="0.1.7"  # Target version

# Build and push config + server bundle images in one bake invocation.
# Add --bake-bundles or --bake-platform if you need non-default coverage.
mise run docker -- \
  --bake \
  --bake-include-gateway \
  --bake-tag "$TAG" \
  --registry "$REGISTRY/" \
  --push

2.2 Helm Upgrade

Option A: Upgrade from Local Chart

# Dry-run first to preview changes
helm diff upgrade sie deploy/helm/sie-cluster/ \
  -n sie \
  -f /tmp/sie-values-backup.yaml \
  --set workers.common.image.tag="<TARGET_VERSION>" \
  --set workers.common.workerSidecar.image.tag="<TARGET_VERSION>" \
  --set gateway.image.tag="<TARGET_VERSION>" \
  --set config.image.tag="<TARGET_VERSION>"

# Apply the upgrade (--wait blocks until pods are ready; --timeout guards against hangs)
helm upgrade sie deploy/helm/sie-cluster/ \
  -n sie \
  -f /tmp/sie-values-backup.yaml \
  --set workers.common.image.tag="<TARGET_VERSION>" \
  --set workers.common.workerSidecar.image.tag="<TARGET_VERSION>" \
  --set gateway.image.tag="<TARGET_VERSION>" \
  --set config.image.tag="<TARGET_VERSION>" \
  --wait --timeout 10m

Option B: Upgrade from OCI Registry

# Dry-run
helm diff upgrade sie oci://ghcr.io/superlinked/charts/sie-cluster \
  -n sie \
  --version <TARGET_CHART_VERSION> \
  -f /tmp/sie-values-backup.yaml \
  --set workers.common.image.tag="<TARGET_VERSION>" \
  --set workers.common.workerSidecar.image.tag="<TARGET_VERSION>" \
  --set gateway.image.tag="<TARGET_VERSION>" \
  --set config.image.tag="<TARGET_VERSION>"

# Apply
helm upgrade sie oci://ghcr.io/superlinked/charts/sie-cluster \
  -n sie \
  --version <TARGET_CHART_VERSION> \
  -f /tmp/sie-values-backup.yaml \
  --set workers.common.image.tag="<TARGET_VERSION>" \
  --set workers.common.workerSidecar.image.tag="<TARGET_VERSION>" \
  --set gateway.image.tag="<TARGET_VERSION>" \
  --set config.image.tag="<TARGET_VERSION>" \
  --wait --timeout 10m
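The dry-run and apply commands in Options A and B differ only in the subcommand and the trailing `--wait --timeout 10m`, so generating both from one function keeps the `--set` flags from drifting apart. This is a minimal sketch; `helm_args` is a hypothetical helper, and the value keys are the four shown in the commands above.

```python
import shlex

# Image tag value keys used by the commands in this section.
IMAGE_TAG_KEYS = (
    "workers.common.image.tag",
    "workers.common.workerSidecar.image.tag",
    "gateway.image.tag",
    "config.image.tag",
)


def helm_args(action, chart, target_version, values_file, chart_version=None):
    """Build the argv for `helm diff upgrade` or `helm upgrade` on release `sie`."""
    if action not in ("diff", "upgrade"):
        raise ValueError(f"unsupported action: {action}")
    argv = ["helm"] + (["diff", "upgrade"] if action == "diff" else ["upgrade"])
    argv += ["sie", chart, "-n", "sie"]
    if chart_version is not None:
        argv += ["--version", chart_version]  # only needed for the OCI chart
    argv += ["-f", values_file]
    for key in IMAGE_TAG_KEYS:
        argv += ["--set", f"{key}={target_version}"]
    if action == "upgrade":
        argv += ["--wait", "--timeout", "10m"]
    return argv


for action in ("diff", "upgrade"):
    print(shlex.join(helm_args(action, "deploy/helm/sie-cluster/", "0.1.7",
                               "/tmp/sie-values-backup.yaml")))
```

Passing the resulting list to `subprocess.run(argv, check=True)` would execute it without a shell.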

Option C: Terraform-Managed Clusters

# Update the image tag in your Terraform variables:
# edit your .tfvars file or export the matching TF_VAR_ environment variable
export TF_VAR_sie_image_tag="<TARGET_VERSION>"

cd deploy/terraform/gcp/examples/<your-env>
terraform plan   # Review changes
terraform apply  # Apply

2.3 Expected Behavior During Rolling Update

Gateway (Deployment):

Kubernetes rolls out new gateway pods one at a time (default RollingUpdate strategy).
Startup probe gates the other probes: GET /healthz, periodSeconds: 5, failureThreshold: 12 (up to 60 s for boot).
Once startup passes, liveness polls GET /healthz every 10 s and readiness polls GET /readyz every 5 s. /readyz returns 200 even with zero fresh SIE server sidecar health records; the gateway accepts traffic and emits 202 for cold-start cases.
The gateway is stateless on the request path; new pods come up in seconds.
Brief 503s are possible during the switchover window if all old pods are terminated before new ones pass readiness.
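The 60-second boot allowance quoted for the startup probe is the product of its period and failure threshold. A small worked calculation, ignoring any initialDelaySeconds or timeoutSeconds the chart may also set (those values are not given here):

```python
def probe_budget_seconds(period_seconds: int, failure_threshold: int) -> int:
    """Seconds of consecutive probe failures tolerated before the probe fails."""
    return period_seconds * failure_threshold


# Startup probe values quoted above: periodSeconds=5, failureThreshold=12.
print(probe_budget_seconds(5, 12))  # 60
```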

Config service (Deployment):

sie-config is intentionally single-replica because it owns serialized config writes and epoch bumps.
If the config-store PVC is enabled, the chart uses a Recreate strategy to avoid ReadWriteOnce mount conflicts.
The gateway keeps serving from its in-memory registry during a short config-service restart, but config writes and bootstrap/drift recovery depend on sie-config.

Workers (StatefulSets):

The default RollingUpdate strategy updates pods one at a time in reverse ordinal order. (podManagementPolicy: Parallel only affects pod ordering during scaling, not rolling updates.)
Worker terminationGracePeriodSeconds: 65.
preStop hook: sleep 10 - gives the K8s endpoints controller 10 seconds to remove the pod from the service before SIGTERM.
On SIGTERM, the SIE server sidecar starts draining, stops accepting new queue work, and lets in-flight IPC calls finish before exit.
The Python sie-server adapter enters graceful shutdown: rejects new local server API requests with 503 and Retry-After: 5, drains in-flight requests, then exits.
Readiness stops passing once shutdown begins. With queue-mode SIE server sidecar routing, the gateway removes that sidecar health record when its NATS heartbeat goes stale.
New worker pods always start with an empty model cache, because an emptyDir volume is deleted together with its pod, so they must download model weights before serving. Cold model loading can take 10-120 seconds depending on model size and cache state.
PodDisruptionBudget: maxUnavailable: 1 per worker pool - protects against external disruptions (e.g., kubectl drain, node autoscaler) but is not enforced by the StatefulSet controller during rolling updates.
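The shutdown timings above can be checked with simple arithmetic. Kubernetes counts the preStop hook against terminationGracePeriodSeconds, so the time left after SIGTERM is the grace period minus the preStop sleep. The sketch assumes the 25-second drain timeout mentioned under Client Impact applies to the worker containers; that mapping is an assumption, not something this runbook states.

```python
GRACE_SECONDS = 65          # terminationGracePeriodSeconds (from this runbook)
PRE_STOP_SECONDS = 10       # preStop sleep (from this runbook)
DRAIN_TIMEOUT_SECONDS = 25  # assumption: the Client Impact drain timeout applies here


def shutdown_budget(grace: int, pre_stop: int, drain_timeout: int) -> dict:
    """Return the seconds available after SIGTERM and the slack left after draining."""
    after_sigterm = grace - pre_stop
    return {"after_sigterm": after_sigterm, "headroom": after_sigterm - drain_timeout}


print(shutdown_budget(GRACE_SECONDS, PRE_STOP_SECONDS, DRAIN_TIMEOUT_SECONDS))
# {'after_sigterm': 55, 'headroom': 30}
```

A negative headroom would mean the kubelet could send SIGKILL before draining completes.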

Client Impact:

SDK clients with automatic retry handle 503s transparently.
Requests in flight during graceful shutdown complete normally (up to 25-second drain timeout).
If all worker pods in a pool are restarting simultaneously, the gateway returns 202 Accepted (provisioning), and the SDK retries with backoff.
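For clients that do not use the SDK, the retry behavior described above can be approximated as follows. This is an illustrative sketch, not the SDK's implementation: `send` is a hypothetical callable returning `(status, headers, body)`, a 503 honors `Retry-After` when present, and a 202 falls back to exponential backoff.

```python
import time


def request_with_retry(send, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call `send()` until it returns a final status or attempts run out.

    Returns the last (status, body) pair seen.
    """
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status not in (202, 503):
            return status, body  # final answer, success or a non-retryable error
        if attempt == max_attempts - 1:
            break  # out of attempts; report the last retryable status
        retry_after = headers.get("Retry-After")
        if status == 503 and retry_after is not None:
            delay = float(retry_after)  # the server told us how long to wait
        else:
            delay = base_delay * (2 ** attempt)  # 1 s, 2 s, 4 s, ...
        sleep(delay)
    return status, body
```

Injecting `sleep` keeps the function testable without real waiting.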

2.4 Monitor the Rollout

# Watch gateway and config rollouts
kubectl rollout status deployment/sie-sie-cluster-gateway -n sie --timeout=120s
kubectl rollout status deployment/sie-sie-cluster-config -n sie --timeout=120s

# Watch worker rollouts (one per pool)
kubectl get statefulsets -n sie -w

# Watch all pods
kubectl get pods -n sie -w

# Check KEDA ScaledObjects are still healthy (not Fallback)
kubectl get scaledobject -n sie -o custom-columns='NAME:.metadata.name,READY:.status.conditions[?(@.type=="Ready")].status,MIN:.spec.minReplicaCount,MAX:.spec.maxReplicaCount,REPLICAS:.status.currentReplicas'

# Watch gateway logs for errors during transition
kubectl logs -n sie -l app.kubernetes.io/component=gateway -f --tail=20

3. Post-Upgrade Verification

3.1 All Pods Healthy

# All pods Running and Ready
kubectl get pods -n sie
# Expected: gateway/config pods 1/1 Ready, active worker pods 2/2 Ready

# Verify new image tags are deployed
kubectl get pods -n sie -o jsonpath='{range .items[*]}{.metadata.name}: {range .spec.containers[*]}{.name}={.image}{" "}{end}{"\n"}{end}'
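Scanning the jsonpath output by eye gets tedious with many worker pods. The sketch below takes the parsed output of `kubectl get pods -n sie -o json` and lists containers whose image tag differs from the target. It assumes every container in the namespace is expected to carry the release version as its tag; `image_tag` and `stale_containers` are hypothetical helper names.

```python
def image_tag(image: str) -> str:
    """Return the tag of an image reference, or '' if it has none."""
    last_segment = image.rpartition("/")[2]  # drop the registry host (may hold ':port')
    if "@" in last_segment or ":" not in last_segment:
        return ""  # digest-pinned or untagged reference
    return last_segment.rpartition(":")[2]


def stale_containers(pod_list: dict, target_version: str) -> list:
    """Return 'pod/container=image' entries whose tag is not the target version."""
    stale = []
    for pod in pod_list.get("items", []):
        pod_name = pod["metadata"]["name"]
        for container in pod["spec"]["containers"]:
            if image_tag(container["image"]) != target_version:
                stale.append(f"{pod_name}/{container['name']}={container['image']}")
    return stale
```

An empty list means every container reports the target tag; digest-pinned images are always reported because their tag cannot be read.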

3.2 Gateway and Config Health

# Readiness check
kubectl exec -n sie deploy/sie-sie-cluster-gateway -- wget -qO- http://localhost:8080/readyz
# Expected: ok

# Detailed health (worker count, models, GPU types)
kubectl exec -n sie deploy/sie-sie-cluster-gateway -- wget -qO- http://localhost:8080/health
# Expected: "status": "healthy", worker_count > 0 (if pools not scaled to zero)

# Config service is healthy
kubectl exec -n sie deploy/sie-sie-cluster-config -- wget -qO- http://localhost:8080/healthz

# Model catalog is available
kubectl exec -n sie deploy/sie-sie-cluster-gateway -- wget -qO- http://localhost:8080/v1/models | head -c 500
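The expectations for the gateway /health response can also be checked mechanically. This sketch assumes the body is a JSON object with `status` and `worker_count` fields, as the expected-output comment above suggests; the full response schema is not documented here, and `health_problems` is a hypothetical helper.

```python
import json


def health_problems(raw: str, expect_workers: bool = True) -> list:
    """Return a list of human-readable problems found in a /health response body."""
    try:
        health = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"response is not JSON: {exc}"]
    if not isinstance(health, dict):
        return ["response is not a JSON object"]
    problems = []
    if health.get("status") != "healthy":
        problems.append(f"status is {health.get('status')!r}, expected 'healthy'")
    # Skip the worker check when pools are intentionally scaled to zero.
    if expect_workers and not health.get("worker_count", 0) > 0:
        problems.append("worker_count is not greater than 0")
    return problems


print(health_problems('{"status": "healthy", "worker_count": 2}'))  # []
```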

3.3 Encode Request Smoke Test

# Port-forward to gateway
kubectl port-forward -n sie svc/sie-sie-cluster-gateway 8080:8080 &

# Test encode request (requires a running worker with GPU)
python3 -c "
from sie_sdk import SIEClient
client = SIEClient('http://localhost:8080')
result = client.encode('BAAI/bge-m3', {'text': 'upgrade verification test'})
print(f'Dense embedding dim: {len(result[\"dense\"])}')
print('SUCCESS: Encode request returned 200')
"

# Or with curl (JSON output):
curl -s -X POST http://localhost:8080/v1/encode/BAAI%2Fbge-m3 \
  -H "Content-Type: application/json" \
  -d '{"items": [{"text": "upgrade verification test"}]}' | python3 -m json.tool | head -5

3.4 KEDA and Autoscaling

# Unpause KEDA if paused in step 1.5
kubectl annotate scaledobject -n sie --all autoscaling.keda.sh/paused-replicas- --overwrite

# Verify ScaledObjects are Ready (not Fallback)
kubectl get scaledobject -n sie
kubectl describe scaledobject -n sie | grep -A3 "Conditions:"
# Expected: Ready=True, Active depends on load, Fallback=False
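The same expectation (Ready=True, Fallback not True) can be evaluated over the parsed output of `kubectl get scaledobject -n sie -o json`. A minimal sketch; `scaledobject_problems` is a hypothetical helper, and it treats a missing Ready condition as a problem.

```python
def scaledobject_problems(scaled_objects: dict) -> list:
    """Return problems for ScaledObjects that are not Ready or are in Fallback."""
    problems = []
    for item in scaled_objects.get("items", []):
        name = item["metadata"]["name"]
        # Index conditions by type so their order in the status does not matter.
        conditions = {
            cond["type"]: cond["status"]
            for cond in item.get("status", {}).get("conditions", [])
        }
        if conditions.get("Ready") != "True":
            problems.append(f"{name}: Ready={conditions.get('Ready', 'missing')}")
        if conditions.get("Fallback") == "True":
            problems.append(f"{name}: Fallback=True")
    return problems
```

The Active condition is deliberately ignored because it depends on current load.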

3.5 Metrics Flowing

# Verify Prometheus is scraping the new pods
kubectl exec -n monitoring svc/prometheus-operated -- wget -qO- \
  'http://localhost:9090/api/v1/query?query=sie_requests_total' 2>/dev/null | python3 -m json.tool | head -20

# Verify gateway metrics
kubectl exec -n monitoring svc/prometheus-operated -- wget -qO- \
  'http://localhost:9090/api/v1/query?query=sie_gateway_requests_total' 2>/dev/null | python3 -m json.tool | head -20

# Check Grafana dashboards show data for new pods
# Port-forward: kubectl port-forward -n monitoring svc/prometheus-grafana 3000:80
# Navigate to SIE > Cluster Overview dashboard
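To confirm that the new pods specifically are being scraped, the JSON returned by the Prometheus query API can be reduced to the set of pods that report samples. The response layout (`status`, `data.result[].metric`) is the standard instant-query format; the `pod` label name is an assumption about how this cluster's scrape configuration labels targets.

```python
def pods_reporting(response: dict, label: str = "pod") -> list:
    """Return the sorted, de-duplicated label values found in a query result."""
    if response.get("status") != "success":
        raise ValueError(f"query failed: {response.get('error', 'unknown error')}")
    result = response["data"]["result"]
    return sorted(
        {series["metric"][label] for series in result if label in series["metric"]}
    )
```

Comparing the returned names with `kubectl get pods -n sie` shows whether any freshly rolled pod is missing from the scrape targets.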

3.6 Version Verification

# Check Helm release version
helm list -n sie
# Expected: Chart version and App version match target

# Check the server version header on a response (uses the port-forward from step 3.3)
curl -s -I http://localhost:8080/healthz | grep -i x-sie
# Expected: X-SIE-Server-Version: <TARGET_VERSION>

4. Rollback Procedure

4.1 Identify Rollback Target

# List Helm release history
helm history sie -n sie --max 10
# Note the REVISION number of the last known-good release
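Picking the revision can be scripted from `helm history sie -n sie -o json`. The sketch below returns the newest revision below the current one whose status is `superseded`, meaning it was once deployed and later replaced. That status is only a heuristic: it does not prove the release was healthy, so treat the result as a candidate to confirm. `rollback_candidate` is a hypothetical helper.

```python
def rollback_candidate(history: list):
    """Return the newest 'superseded' revision below the current one, or None."""
    if not history:
        return None
    current = max(entry["revision"] for entry in history)
    candidates = [
        entry["revision"]
        for entry in history
        if entry["status"] == "superseded" and entry["revision"] < current
    ]
    return max(candidates) if candidates else None
```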

4.2 Execute Rollback

# Roll back to a specific revision
helm rollback sie <REVISION> -n sie

# Or omit the revision to roll back to the immediately preceding release
helm rollback sie -n sie

For Terraform-managed clusters:

# Revert image tag to previous version
export TF_VAR_sie_image_tag="<PREVIOUS_VERSION>"
cd deploy/terraform/gcp/examples/<your-env>
terraform apply

4.3 Monitor Rollback

# Watch the rollback proceed
kubectl rollout status deployment/sie-sie-cluster-gateway -n sie --timeout=120s
kubectl get pods -n sie -w

# Verify old image is restored
kubectl get pods -n sie -o jsonpath='{range .items[*]}{.metadata.name}: {range .spec.containers[*]}{.name}={.image}{" "}{end}{"\n"}{end}'

4.4 Verify Rollback Succeeded

Repeat the key post-upgrade verification steps from section 3:

# Gateway health
kubectl exec -n sie deploy/sie-sie-cluster-gateway -- wget -qO- http://localhost:8080/readyz

# Encode smoke test
kubectl port-forward -n sie svc/sie-sie-cluster-gateway 8080:8080 &
python3 -c "
from sie_sdk import SIEClient
client = SIEClient('http://localhost:8080')
result = client.encode('BAAI/bge-m3', {'text': 'rollback verification'})
print(f'Dense dim: {len(result[\"dense\"])} - SUCCESS')
"

# KEDA health
kubectl get scaledobject -n sie

4.5 Known Caveats

No database migrations: neither upgrade nor rollback involves a schema migration. Workers use emptyDir for model cache, and the gateway stores pool state in ConfigMaps with Leases for TTL. sie-config persists model config YAML and an epoch counter only when its config store is enabled.
Model cache invalidation: Worker pods use emptyDir volumes for the HuggingFace model cache. Rolling back means new pods start with an empty cache and must re-download model weights on first request. If cluster cache (S3/GCS) is configured, downloads come from there instead of HuggingFace Hub.
Pool state: Resource pools are stored as ConfigMaps in the sie namespace. Pool leases survive upgrades and rollbacks. Active pools will continue to work, but if the pool API changed between versions, clients may need to recreate pools.
KEDA ScaledObjects: Helm rollback re-applies the previous ScaledObject definitions. If KEDA version requirements changed between SIE versions, verify ScaledObjects are not in Fallback mode after rollback.
Config drift: If the upgrade included changes to embedded model or bundle configs (baked into the Helm chart files/ directory), rollback restores the previous configs. Ensure the previous configs are compatible with the previous server version.
SDK version compatibility: The gateway returns X-SIE-Server-Version headers. If clients upgraded their SDK alongside the server, a server rollback may trigger version mismatch warnings in the SDK logs. The SDK remains functional but logs warnings for major.minor mismatches.
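A major.minor mismatch of the kind described under SDK version compatibility can be detected with a comparison like the one below. It only illustrates the rule and is not the SDK's actual code; it assumes plain `MAJOR.MINOR.PATCH` version strings, optionally prefixed with `v`.

```python
def major_minor(version: str) -> tuple:
    """Return (major, minor) from a version string such as '0.1.6' or 'v0.1.6'."""
    parts = version.lstrip("v").split(".")
    return int(parts[0]), int(parts[1])


def versions_mismatch(sdk_version: str, server_version: str) -> bool:
    """True when the SDK and server differ in major or minor version."""
    return major_minor(sdk_version) != major_minor(server_version)


print(versions_mismatch("0.1.7", "0.1.6"))  # False: only the patch level differs
print(versions_mismatch("0.2.0", "0.1.6"))  # True: the minor version differs
```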

Appendix: Key Resources

Resource	Namespace	Type	Purpose
`sie-sie-cluster-gateway`	`sie`	Deployment	Stateless request gateway (2+ replicas)
`sie-sie-cluster-config`	`sie`	Deployment	Single-writer config control plane
`sie-sie-cluster-worker-<pool>`	`sie`	StatefulSet	GPU worker pool (one per pool)
`sie-sie-cluster-worker`	`sie`	Service (headless)	Worker DNS discovery
`sie-sie-cluster-gateway`	`sie`	Service (ClusterIP)	Gateway endpoint
`sie-sie-cluster-config`	`sie`	Service (ClusterIP)	Internal config API
`sie-sie-cluster-worker-<pool>-scaler`	`sie`	ScaledObject	KEDA autoscaler per pool
`sie-sie-cluster-worker-<pool>`	`sie`	PodDisruptionBudget	maxUnavailable: 1 per pool
`sie-sie-cluster-gpu-config`	`sie`	ConfigMap	Available GPU types / machine profiles
`sie-sie-cluster-config`	`sie`	ConfigMap	Shared cluster configuration

Health Endpoints

Endpoint	Component	Returns
`GET /healthz`	Gateway	`ok` - process liveness
`GET /readyz`	Gateway	`ok` - process readiness, independent of SIE server sidecar health
`GET /health`	Gateway	Detailed cluster status (worker count, GPUs, models)
`GET /healthz`	Config service	`{"status": "ok"}` - liveness probe
`GET /readyz`	Config service	`{"status": "ready"}` or 503 - registry readiness
`GET /healthz`	Python `sie-server` adapter	`ok` - liveness probe
`GET /readyz`	Python `sie-server` adapter	`ok` or 503 - readiness probe
`GET /healthz`	SIE server sidecar (`worker-sidecar` container)	`ok` - process liveness
`GET /readyz`	SIE server sidecar (`worker-sidecar` container)	`ok` or 503 - fresh IPC ping and no active drain
`GET /metrics`	Gateway, config, Python `sie-server` adapter, SIE server sidecar	Prometheus metrics

Grafana Dashboards

Dashboard	Purpose
Cluster Overview	QPS, latency (p50/p95/p99), GPU utilization
Model Performance	Per-model latency, throughput, batch sizes
Worker Pod Health	Per-worker-pod CPU/memory, GPU temp, queue depth

What’s Next

Monitoring - metrics, alerts, and dashboards
Scale-from-Zero - KEDA autoscaling and cold start handling
Kubernetes in GCP - GKE deployment setup
Kubernetes in AWS - EKS deployment setup