Offline / Air-Gapped Deployment
Bring SIE up in a cluster with no public internet access. The worker pods normally pull model weights from HuggingFace and container images from GHCR; both of those need to come from inside your network instead.
This guide covers a typical air-gapped flow:
- Snapshot model weights on a workstation that has internet access.
- Mirror the snapshot to private S3-compatible storage reachable from the cluster.
- Configure the chart to read weights from that store and skip HuggingFace.
- Mirror the SIE container images to a private registry.
- Verify first inference with no egress.
The same pattern works for “restricted egress” clusters that allow private object storage but block public HuggingFace.
1. Snapshot model weights
The simplest tool is `huggingface-cli` (already a dependency of any SIE workstation):
```sh
export HF_HUB_CACHE=./offline-weights

# One model
huggingface-cli download BAAI/bge-m3 --cache-dir ./offline-weights

# A bundle's worth of models, repeated for each model in the bundle
huggingface-cli download intfloat/e5-base-v2 --cache-dir ./offline-weights
huggingface-cli download mixedbread-ai/mxbai-rerank-large-v1 --cache-dir ./offline-weights
```

The result is a directory in HuggingFace cache layout (`./offline-weights/models--BAAI--bge-m3/snapshots/<sha>/...`) that the chart can mount as `HF_HUB_CACHE`.
Set HF_TOKEN before running for any gated models.
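Before mirroring, it can be worth confirming that every model actually landed in complete snapshot form, since a partial download only surfaces later as a worker failure. A minimal sketch (the `verify_snapshot` helper is illustrative, not part of any SIE tooling):

```python
import os

def verify_snapshot(cache_dir: str, repo_id: str) -> bool:
    """Check that repo_id exists in HF cache layout with at least one
    non-empty snapshot directory."""
    # HF cache layout: <cache_dir>/models--ORG--NAME/snapshots/<sha>/...
    snapshots = os.path.join(
        cache_dir, "models--" + repo_id.replace("/", "--"), "snapshots"
    )
    if not os.path.isdir(snapshots):
        return False
    # True if any resolved snapshot directory contains files
    return any(
        os.listdir(os.path.join(snapshots, sha)) for sha in os.listdir(snapshots)
    )
```

Run it once per model in the bundle, e.g. `verify_snapshot("./offline-weights", "BAAI/bge-m3")`, before pushing anything to the mirror.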
2. Mirror to private storage
Push the snapshot to S3-compatible storage that the cluster can reach. AWS S3, GCS, MinIO, and Ceph all work; the chart treats them the same.
```sh
# AWS S3
aws s3 sync ./offline-weights s3://sie-models-private/weights/

# MinIO (in-cluster or on-prem)
mc mirror ./offline-weights minio/sie-models-private/weights/

# GCS
gsutil -m rsync -r ./offline-weights gs://sie-models-private/weights/
```

Whatever you choose, the URL handed to the chart in the next step must be reachable from worker pods.
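Sync tools generally compare sizes and timestamps, not contents, so for a one-time mirror it can be reassuring to record checksums before upload and spot-check a re-download against them. A sketch of building such a manifest (the `local_manifest` helper is hypothetical):

```python
import hashlib
import os

def local_manifest(root: str) -> dict:
    """Map each file's path (relative to root) to its SHA-256 digest,
    for comparing the local snapshot against the mirrored copy."""
    manifest = {}
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            h = hashlib.sha256()
            with open(path, "rb") as f:
                # Hash in 1 MiB chunks to keep memory flat for large weights
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            manifest[os.path.relpath(path, root)] = h.hexdigest()
    return manifest
```

Run it against `./offline-weights` before mirroring, then against a scratch re-download from the bucket; the two dicts should be equal.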
3. Configure the cluster cache
Point the chart's `workers.common.clusterCache` at the mirrored bucket. Workers will read weights from there instead of HuggingFace.
```yaml
# values-offline.yaml
workers:
  common:
    clusterCache:
      enabled: true
      url: s3://sie-models-private/weights/ # or gs:// for GCS

    # Disable HuggingFace fallback so workers fail fast if the cache is incomplete
    hfCache:
      home: /models/huggingface
      tokenSecret: ""

    # Skip HF token wiring entirely in air-gapped clusters
    hfToken:
      create: false
```

For S3, the workers authenticate via IRSA (EKS) or static credentials supplied through `extraEnv`. For GCS, they use Workload Identity (GKE). For MinIO or other S3-compatibles, mount credentials via a secret and pass them through `workers.common.extraEnv`.
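For the MinIO case, the wiring could look like the fragment below. This is a sketch only: the secret name `sie-minio-creds` and the in-cluster endpoint are placeholders, and the `AWS_*` variable names follow the common S3-SDK convention rather than anything this chart is documented to require.

```yaml
workers:
  common:
    extraEnv:
      - name: AWS_ACCESS_KEY_ID
        valueFrom:
          secretKeyRef:
            name: sie-minio-creds # hypothetical secret, created beforehand
            key: access-key
      - name: AWS_SECRET_ACCESS_KEY
        valueFrom:
          secretKeyRef:
            name: sie-minio-creds
            key: secret-key
      - name: AWS_ENDPOINT_URL
        value: http://minio.minio.svc:9000 # placeholder in-cluster endpoint
```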
4. Mirror container images
Section titled “4. Mirror container images”The chart pulls these public images from GHCR by default:
| Image | Where it’s set |
|---|---|
| ghcr.io/superlinked/sie-server | workers.common.image.repository |
| ghcr.io/superlinked/sie-gateway | gateway.image.repository |
| ghcr.io/superlinked/sie-config | config.image.repository |
For air-gapped clusters, mirror them to a private registry once:
```sh
# Replace with your version tag
TAG=v0.3.1

for img in sie-server sie-gateway sie-config; do
  docker pull ghcr.io/superlinked/$img:$TAG
  docker tag ghcr.io/superlinked/$img:$TAG private-registry.example.com/sie/$img:$TAG
  docker push private-registry.example.com/sie/$img:$TAG
done
```

Then point the chart at your registry:
```yaml
# values-offline.yaml (continued)
gateway:
  image:
    repository: private-registry.example.com/sie/sie-gateway
    tag: v0.3.1

config:
  image:
    repository: private-registry.example.com/sie/sie-config
    tag: v0.3.1

workers:
  common:
    image:
      repository: private-registry.example.com/sie/sie-server
      tag: v0.3.1

global:
  imagePullSecrets:
    - name: regcred
```

If your registry needs auth, create the `regcred` Docker secret in the `sie` namespace before installing the chart:
```sh
kubectl create secret docker-registry regcred \
  --docker-server=private-registry.example.com \
  --docker-username=... \
  --docker-password=... \
  -n sie
```

5. Install and verify
Install the chart with the offline values:
```sh
helm upgrade --install sie oci://ghcr.io/superlinked/charts/sie-cluster \
  --version 0.3.1 \
  -f values-offline.yaml \
  -n sie --create-namespace
```

If you also mirrored the chart itself (recommended for fully air-gapped clusters), pull it once with `helm pull` and install from the local .tgz:

```sh
helm pull oci://ghcr.io/superlinked/charts/sie-cluster --version 0.3.1

# Move sie-cluster-0.3.1.tgz onto the air-gapped workstation, then:
helm upgrade --install sie ./sie-cluster-0.3.1.tgz \
  -f values-offline.yaml \
  -n sie --create-namespace
```

Verify first inference exactly like the GCP or AWS guides:
```sh
kubectl -n sie port-forward svc/sie-sie-cluster-gateway 8080:8080 &

python3 -c "
from sie_sdk import SIEClient

client = SIEClient('http://localhost:8080')
result = client.encode('BAAI/bge-m3', {'text': 'hello world'},
                       gpu='l4', wait_for_capacity=True, provision_timeout_s=600)
print(result['dense'].shape)  # (1024,)
"
```

The first request still pays the cold-start cost, but the weight load now comes from your private store rather than HuggingFace.
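Beyond printing the shape, a quick structural check on the returned vector can catch a silently broken weight load. A sketch, assuming the 1024-dimensional dense output shown above (`check_dense` is an illustrative helper, not part of the SDK):

```python
import math

def check_dense(vec, expected_dim=1024):
    """Basic sanity checks on a dense embedding vector; returns its L2 norm."""
    assert len(vec) == expected_dim, f"expected {expected_dim} dims, got {len(vec)}"
    norm = math.sqrt(sum(x * x for x in vec))
    # An all-zero vector suggests something went wrong upstream
    assert norm > 0, "zero vector returned"
    return norm
```

Call it as `check_dense(result['dense'])` after the first request succeeds.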
Troubleshooting
| Symptom | Likely cause |
|---|---|
| Worker pod stuck in Init with 403 Forbidden from S3/GCS | IRSA/Workload Identity missing the bucket-read permission |
| ImagePullBackOff on a worker pod | Registry credentials missing, or imagePullSecrets not wired |
| Worker logs show OSError: Couldn't reach huggingface.co | clusterCache URL typo or bucket missing the requested model |
| Chart install hangs on dependency download | Sub-charts (KEDA, kube-prometheus-stack, DCGM) trying to fetch from public Artifact Hub. Use helm pull with --untar and install the local copy. |
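For the URL-typo row, malformed `clusterCache.url` values can be caught before deploying rather than diagnosed from worker logs. A small validation sketch (the `split_cache_url` helper is hypothetical):

```python
from urllib.parse import urlparse

def split_cache_url(url: str):
    """Split a clusterCache URL into (scheme, bucket, prefix), rejecting
    anything that is not an s3:// or gs:// URL with a bucket name."""
    parsed = urlparse(url)
    if parsed.scheme not in ("s3", "gs"):
        raise ValueError(f"unsupported scheme {parsed.scheme!r}: expected s3:// or gs://")
    if not parsed.netloc:
        raise ValueError("missing bucket name")
    return parsed.scheme, parsed.netloc, parsed.path.lstrip("/")
```

For example, `split_cache_url("s3://sie-models-private/weights/")` returns `("s3", "sie-models-private", "weights/")`, while a pasted `https://` console URL raises immediately.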
What’s Next
- Kubernetes in GCP for the online quickstart this builds on
- Kubernetes in AWS for the EKS counterpart
- Config GitOps Workflow for managing model configs without redeploying the chart
- Upgrade Runbook for rolling updates and rollback