
Offline / Air-Gapped Deployment

Bring SIE up in a cluster with no public internet access. The worker pods normally pull model weights from HuggingFace and container images from GHCR; both of those need to come from inside your network instead.

This guide covers a typical air-gapped flow:

  1. Snapshot model weights on a workstation that has internet access.
  2. Mirror the snapshot to private S3-compatible storage reachable from the cluster.
  3. Configure the chart to read weights from that store and skip HuggingFace.
  4. Mirror the SIE container images to a private registry.
  5. Verify first inference with no egress.

The same pattern works for “restricted egress” clusters that allow private object storage but block public HuggingFace.

The simplest tool for creating the snapshot is huggingface-cli (already a dependency of any SIE workstation):

Terminal window
export HF_HUB_CACHE=./offline-weights
# One model
huggingface-cli download BAAI/bge-m3 --cache-dir ./offline-weights
# A bundle's worth of models, repeated for each model in the bundle
huggingface-cli download intfloat/e5-base-v2 --cache-dir ./offline-weights
huggingface-cli download mixedbread-ai/mxbai-rerank-large-v1 --cache-dir ./offline-weights

The result is a directory in HuggingFace cache layout (./offline-weights/models--BAAI--bge-m3/snapshots/<sha>/...) that the chart can mount as HF_HUB_CACHE.
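The cache layout is mechanical: the slash in a repo id becomes a `--` separator under a `models--` prefix, with snapshots one level down. A small sketch of the mapping (the helper name is ours, for illustration; it is not part of any SDK):

```python
from pathlib import Path

def hf_cache_dir(repo_id: str, cache_root: str = "./offline-weights") -> Path:
    """Map a HuggingFace repo id to its directory in HF_HUB_CACHE layout.

    'BAAI/bge-m3' -> offline-weights/models--BAAI--bge-m3
    Actual weight files live under <dir>/snapshots/<sha>/.
    """
    return Path(cache_root) / ("models--" + repo_id.replace("/", "--"))

print(hf_cache_dir("BAAI/bge-m3"))  # offline-weights/models--BAAI--bge-m3
```

This is handy when scripting a completeness check of the mirror: for each model in your bundle, confirm the corresponding `models--...` prefix exists in the bucket.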

For gated models, set HF_TOKEN before running the downloads.

Push the snapshot to S3-compatible storage that the cluster can reach. AWS S3, GCS, MinIO, and Ceph all work; the chart treats them the same.

Terminal window
# AWS S3
aws s3 sync ./offline-weights s3://sie-models-private/weights/
# MinIO (in-cluster or on-prem)
mc mirror ./offline-weights minio/sie-models-private/weights/
# GCS
gsutil -m rsync -r ./offline-weights gs://sie-models-private/weights/

Whatever you choose, the URL handed to the chart in the next step must be reachable from worker pods.

Point the chart’s workers.common.clusterCache at the mirrored bucket. Workers will read weights from there instead of HuggingFace.

values-offline.yaml
workers:
  common:
    clusterCache:
      enabled: true
      url: s3://sie-models-private/weights/ # or gs:// for GCS
    # Disable HuggingFace fallback so workers fail fast if the cache is incomplete
    hfCache:
      home: /models/huggingface
      tokenSecret: ""
# Skip HF token wiring entirely in air-gapped clusters
hfToken:
  create: false

For S3, the workers authenticate via IRSA (EKS) or static credentials supplied through extraEnv. For GCS, they use Workload Identity (GKE). For MinIO or other S3-compatibles, mount credentials via a secret and pass them through workers.common.extraEnv.
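For the MinIO case, the wiring could look like the sketch below. The secret name `sie-s3-creds`, its key names, and the exact shape of `extraEnv` are assumptions about your setup, not chart-defined values; the environment variable names are the standard AWS SDK ones, which S3-compatible clients generally honor.

```yaml
# values-offline.yaml (sketch): static S3 credentials for MinIO/Ceph.
# Secret "sie-s3-creds" and its keys are hypothetical names for this example.
workers:
  common:
    extraEnv:
      - name: AWS_ACCESS_KEY_ID
        valueFrom:
          secretKeyRef:
            name: sie-s3-creds
            key: access-key
      - name: AWS_SECRET_ACCESS_KEY
        valueFrom:
          secretKeyRef:
            name: sie-s3-creds
            key: secret-key
      - name: AWS_ENDPOINT_URL # in-cluster MinIO endpoint (example value)
        value: http://minio.minio.svc:9000
```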

The chart pulls these public images from GHCR by default:

| Image | Where it's set |
| --- | --- |
| ghcr.io/superlinked/sie-server | workers.common.image.repository |
| ghcr.io/superlinked/sie-gateway | gateway.image.repository |
| ghcr.io/superlinked/sie-config | config.image.repository |

For air-gapped clusters, mirror them to a private registry once:

Terminal window
# Replace with your version tag
TAG=v0.3.1
for img in sie-server sie-gateway sie-config; do
  docker pull ghcr.io/superlinked/$img:$TAG
  docker tag ghcr.io/superlinked/$img:$TAG private-registry.example.com/sie/$img:$TAG
  docker push private-registry.example.com/sie/$img:$TAG
done
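The loop above only rewrites the registry prefix; the repository names and tag are unchanged. If you drive mirroring from a script instead of a shell loop, the mapping is this simple (the helper is ours, for illustration; registry hosts are the example values from this guide):

```python
# Public source and private mirror registries from the loop above.
SOURCE = "ghcr.io/superlinked"
MIRROR = "private-registry.example.com/sie"
IMAGES = ("sie-server", "sie-gateway", "sie-config")

def mirror_ref(image: str, tag: str) -> tuple[str, str]:
    """Return the (public, private) references for one image."""
    return f"{SOURCE}/{image}:{tag}", f"{MIRROR}/{image}:{tag}"

for img in IMAGES:
    src, dst = mirror_ref(img, "v0.3.1")
    print(f"{src} -> {dst}")
```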

Then point the chart at your registry:

# values-offline.yaml (continued)
gateway:
  image:
    repository: private-registry.example.com/sie/sie-gateway
    tag: v0.3.1
config:
  image:
    repository: private-registry.example.com/sie/sie-config
    tag: v0.3.1
workers:
  common:
    image:
      repository: private-registry.example.com/sie/sie-server
      tag: v0.3.1
global:
  imagePullSecrets:
    - name: regcred

If your registry needs auth, create the regcred Docker secret in the sie namespace before installing the chart:

Terminal window
kubectl create secret docker-registry regcred \
  --docker-server=private-registry.example.com \
  --docker-username=... \
  --docker-password=... \
  -n sie
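If you manage manifests declaratively (GitOps), the same secret can be expressed as a standard kubernetes.io/dockerconfigjson Secret. A sketch; the data payload is the base64 of a Docker config.json holding your registry credentials:

```yaml
# Declarative equivalent of the kubectl command above.
apiVersion: v1
kind: Secret
metadata:
  name: regcred
  namespace: sie
type: kubernetes.io/dockerconfigjson
data:
  .dockerconfigjson: <base64-encoded Docker config.json>
```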

Install the chart with the offline values, no internet egress required:

Terminal window
helm upgrade --install sie oci://ghcr.io/superlinked/charts/sie-cluster \
  --version 0.3.1 \
  -f values-offline.yaml \
  -n sie --create-namespace

If you also mirrored the chart itself (recommended for fully air-gapped clusters), pull it once with helm pull oci://ghcr.io/superlinked/charts/sie-cluster --version 0.3.1 and install from the local .tgz:

Terminal window
helm pull oci://ghcr.io/superlinked/charts/sie-cluster --version 0.3.1
# Move sie-cluster-0.3.1.tgz onto the air-gapped workstation, then:
helm upgrade --install sie ./sie-cluster-0.3.1.tgz \
  -f values-offline.yaml \
  -n sie --create-namespace

Verify first inference exactly like the GCP or AWS guides:

Terminal window
kubectl -n sie port-forward svc/sie-sie-cluster-gateway 8080:8080 &
python3 -c "
from sie_sdk import SIEClient
client = SIEClient('http://localhost:8080')
result = client.encode('BAAI/bge-m3', {'text': 'hello world'},
                       gpu='l4', wait_for_capacity=True, provision_timeout_s=600)
print(result['dense'].shape) # (1024,)
"

The first request still pays the cold-start cost, but the weight load now comes from your private store rather than HuggingFace.

| Symptom | Likely cause |
| --- | --- |
| Worker pod stuck in Init with 403 Forbidden from S3/GCS | IRSA/Workload Identity missing the bucket-read permission |
| ImagePullBackOff on a worker pod | Registry credentials missing, or imagePullSecrets not wired |
| Worker logs show OSError: Couldn't reach huggingface.co | clusterCache URL typo, or bucket missing the requested model |
| Chart install hangs on dependency download | Sub-charts (KEDA, kube-prometheus-stack, DCGM) trying to fetch from public Artifact Hub; use helm pull with --untar and install the local copy |
