
Offline / Air-Gapped Deployment

Bring SIE up in a cluster with no public internet access. The worker pods normally pull model weights from HuggingFace and container images from GHCR; both of those need to come from inside your network instead.

This guide covers a typical air-gapped flow:

  1. Snapshot model weights on a workstation that has internet access.
  2. Mirror the snapshot to private S3-compatible storage reachable from the cluster.
  3. Configure the chart to read weights from that store and skip HuggingFace.
  4. Mirror the SIE container images to a private registry.
  5. Verify first inference with no egress.

The same pattern works for “restricted egress” clusters that allow private object storage but block public HuggingFace.

The simplest tool for creating the snapshot is huggingface-cli (already a dependency of any SIE workstation):

Terminal window
export HF_HUB_CACHE=./offline-weights
# One model
huggingface-cli download BAAI/bge-m3 --cache-dir ./offline-weights
# A bundle's worth of models, repeated for each model in the bundle
huggingface-cli download intfloat/e5-base-v2 --cache-dir ./offline-weights
huggingface-cli download mixedbread-ai/mxbai-rerank-large-v1 --cache-dir ./offline-weights

The result is a directory in HuggingFace cache layout (./offline-weights/models--BAAI--bge-m3/snapshots/<sha>/...) that the chart can mount as HF_HUB_CACHE.
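The cache layout is mechanical: the slash in a repo id becomes a `--` separator under a `models--` prefix, with snapshots one level down. A small sketch of the mapping (the helper name is ours, for illustration; it is not part of any SDK):

```python
from pathlib import Path

def hf_cache_dir(repo_id: str, cache_root: str = "./offline-weights") -> Path:
    """Map a HuggingFace repo id to its directory in HF_HUB_CACHE layout.

    'BAAI/bge-m3' -> offline-weights/models--BAAI--bge-m3
    Actual weight files live under <dir>/snapshots/<sha>/.
    """
    return Path(cache_root) / ("models--" + repo_id.replace("/", "--"))

print(hf_cache_dir("BAAI/bge-m3"))  # offline-weights/models--BAAI--bge-m3
```

This is handy when scripting a completeness check of the mirror: for each model in your bundle, confirm the corresponding `models--...` prefix exists in the bucket.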

For gated models, set HF_TOKEN before running the downloads.

Push the snapshot to S3-compatible storage that the cluster can reach. AWS S3, GCS, MinIO, and Ceph all work; the chart treats them the same.

Terminal window
# AWS S3
aws s3 sync ./offline-weights s3://sie-models-private/weights/
# MinIO (in-cluster or on-prem)
mc mirror ./offline-weights minio/sie-models-private/weights/
# GCS
gsutil -m rsync -r ./offline-weights gs://sie-models-private/weights/

Whatever you choose, the URL handed to the chart in the next step must be reachable from worker pods.

Point the chart’s workers.common.clusterCache at the mirrored bucket. Workers will read weights from there instead of HuggingFace.

values-offline.yaml
workers:
  common:
    clusterCache:
      enabled: true
      url: s3://sie-models-private/weights/ # or gs:// for GCS
    # Disable HuggingFace fallback so workers fail fast if the cache is incomplete
    hfCache:
      home: /models/huggingface
      tokenSecret: ""
# Skip HF token wiring entirely in air-gapped clusters
hfToken:
  create: false

For S3, the workers authenticate via IRSA (EKS) or static credentials supplied through extraEnv. For GCS, they use Workload Identity (GKE). For MinIO or other S3-compatibles, mount credentials via a secret and pass them through workers.common.extraEnv.
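For the MinIO case, the wiring could look like the sketch below. The secret name `sie-s3-creds`, its key names, and the exact shape of `extraEnv` are assumptions about your setup, not chart-defined values; the environment variable names are the standard AWS SDK ones, which S3-compatible clients generally honor.

```yaml
# values-offline.yaml (sketch): static S3 credentials for MinIO/Ceph.
# Secret "sie-s3-creds" and its keys are hypothetical names for this example.
workers:
  common:
    extraEnv:
      - name: AWS_ACCESS_KEY_ID
        valueFrom:
          secretKeyRef:
            name: sie-s3-creds
            key: access-key
      - name: AWS_SECRET_ACCESS_KEY
        valueFrom:
          secretKeyRef:
            name: sie-s3-creds
            key: secret-key
      - name: AWS_ENDPOINT_URL # in-cluster MinIO endpoint (example value)
        value: http://minio.minio.svc:9000
```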

The chart pulls these public images from GHCR by default:

| Image | Where it's set |
| --- | --- |
| ghcr.io/superlinked/sie-server | workers.common.image.repository |
| ghcr.io/superlinked/sie-gateway | gateway.image.repository |
| ghcr.io/superlinked/sie-config | config.image.repository |

For air-gapped clusters, mirror them to a private registry once:

Terminal window
# Replace with your version tag
TAG=v0.3.1
for img in sie-server sie-gateway sie-config; do
  docker pull ghcr.io/superlinked/$img:$TAG
  docker tag ghcr.io/superlinked/$img:$TAG private-registry.example.com/sie/$img:$TAG
  docker push private-registry.example.com/sie/$img:$TAG
done
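The loop above only rewrites the registry prefix; the repository names and tag are unchanged. If you drive mirroring from a script instead of a shell loop, the mapping is this simple (the helper is ours, for illustration; registry hosts are the example values from this guide):

```python
# Public source and private mirror registries from the loop above.
SOURCE = "ghcr.io/superlinked"
MIRROR = "private-registry.example.com/sie"
IMAGES = ("sie-server", "sie-gateway", "sie-config")

def mirror_ref(image: str, tag: str) -> tuple[str, str]:
    """Return the (public, private) references for one image."""
    return f"{SOURCE}/{image}:{tag}", f"{MIRROR}/{image}:{tag}"

for img in IMAGES:
    src, dst = mirror_ref(img, "v0.3.1")
    print(f"{src} -> {dst}")
```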

Then point the chart at your registry:

# values-offline.yaml (continued)
gateway:
  image:
    repository: private-registry.example.com/sie/sie-gateway
    tag: v0.3.1
config:
  image:
    repository: private-registry.example.com/sie/sie-config
    tag: v0.3.1
workers:
  common:
    image:
      repository: private-registry.example.com/sie/sie-server
      tag: v0.3.1
global:
  imagePullSecrets:
    - name: regcred

If your registry needs auth, create the regcred Docker secret in the sie namespace before installing the chart:

Terminal window
kubectl create secret docker-registry regcred \
  --docker-server=private-registry.example.com \
  --docker-username=... \
  --docker-password=... \
  -n sie
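If you manage manifests declaratively (GitOps), the same secret can be expressed as a standard kubernetes.io/dockerconfigjson Secret. A sketch; the data payload is the base64 of a Docker config.json holding your registry credentials:

```yaml
# Declarative equivalent of the kubectl command above.
apiVersion: v1
kind: Secret
metadata:
  name: regcred
  namespace: sie
type: kubernetes.io/dockerconfigjson
data:
  .dockerconfigjson: <base64-encoded Docker config.json>
```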

Install the chart with the offline values, no internet egress required:

Terminal window
helm upgrade --install sie oci://ghcr.io/superlinked/charts/sie-cluster \
  --version 0.3.1 \
  -f values-offline.yaml \
  -n sie --create-namespace

If you also mirrored the chart itself (recommended for fully air-gapped clusters), pull it once with helm pull oci://ghcr.io/superlinked/charts/sie-cluster --version 0.3.1 and install from the local .tgz:

Terminal window
helm pull oci://ghcr.io/superlinked/charts/sie-cluster --version 0.3.1
# Move sie-cluster-0.3.1.tgz onto the air-gapped workstation, then:
helm upgrade --install sie ./sie-cluster-0.3.1.tgz \
  -f values-offline.yaml \
  -n sie --create-namespace

Verify first inference exactly like the GCP or AWS guides:

Terminal window
kubectl -n sie port-forward svc/sie-sie-cluster-gateway 8080:8080 &
python3 -c "
from sie_sdk import SIEClient
client = SIEClient('http://localhost:8080')
result = client.encode('BAAI/bge-m3', {'text': 'hello world'},
                       gpu='l4', wait_for_capacity=True, provision_timeout_s=600)
print(result['dense'].shape) # (1024,)
"

The first request still pays the cold-start cost, but the weight load now comes from your private store rather than HuggingFace.

| Symptom | Likely cause |
| --- | --- |
| Worker pod stuck in Init with 403 Forbidden from S3/GCS | IRSA/Workload Identity missing the bucket-read permission |
| ImagePullBackOff on a worker pod | Registry credentials missing, or imagePullSecrets not wired |
| Worker logs show OSError: Couldn't reach huggingface.co | clusterCache URL typo, or bucket missing the requested model |
| Chart install hangs on dependency download | Sub-charts (KEDA, kube-prometheus-stack, DCGM) trying to fetch from public Artifact Hub; use helm pull with --untar and install the local copy |
