Config GitOps Workflow

Commit model YAMLs to a git repo, open a PR, merge it. A GitHub Actions job POSTs each changed YAML to the SIE Config API, then polls the gateway until connected workers have acknowledged the new config. No image rebuild is required when the model’s adapter is already present in a deployed bundle. The workflow is append-only and idempotent: replays of the same commit are safe, and conflicts against existing profile metadata fail fast with a clear error.

Prerequisites:

  • A running SIE deployment with the config service reachable at $SIE_CONFIG_URL.
  • At least one SIE gateway reachable at $SIE_GATEWAY_URL. The gateway exposes per-model readiness.
  • SIE_ADMIN_TOKEN configured on the config service. The same token is stored as a GitHub Secret.
  • Model adapters already present in a deployed bundle. Bundles are a build-time concept; adding a model whose adapter is not yet bundled still requires an image rebuild. See the HTTP API Reference for adapter and bundle semantics.
  • Repository layout convention: one YAML per model under configs/models/, filename {sie_id with "/" replaced by "__"}.yaml. This mirrors the file naming in packages/sie_server/models/.
configs/
  models/
    BAAI__bge-m3.yaml
    intfloat__e5-base-v2.yaml
.github/
  workflows/
    push-model-configs.yml

The / to __ rule mirrors the filename convention used by packages/sie_server/models/ in the SIE repo itself.
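
The naming rule can be sketched as a small helper. This is a minimal illustration that assumes the mapping is a plain string replacement; the names `config_path` and `sie_id_from_path` are hypothetical and not part of SIE.

```python
from pathlib import PurePosixPath

# Directory from the repository layout convention described above.
CONFIG_DIR = PurePosixPath("configs/models")


def config_path(sie_id: str) -> PurePosixPath:
    """Map a sie_id such as 'BAAI/bge-m3' to its YAML path (hypothetical helper)."""
    return CONFIG_DIR / (sie_id.replace("/", "__") + ".yaml")


def sie_id_from_path(path: str) -> str:
    """Inverse mapping. Ambiguous if a sie_id itself contains '__'."""
    name = PurePosixPath(path).name
    if not name.endswith(".yaml"):
        raise ValueError(f"not a model config file: {path}")
    return name[: -len(".yaml")].replace("__", "/")
```

The inverse is only useful for sanity checks such as "does the filename agree with the YAML content"; the workflow itself reads sie_id from the YAML body, not from the filename.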

The full workflow file is push-model-configs.yml in the public SIE repo. Copy it into .github/workflows/push-model-configs.yml in your config repo. Key sections:

  1. Trigger. push to main filtered on configs/models/**.yaml, plus a manual workflow_dispatch input model_path that lets an operator re-push a single file without a new commit.
  2. Collect changed files. The first step writes the list of added or modified YAMLs to changed.txt. On manual dispatch it contains the single file the operator named; on push it is the git diff --diff-filter=AM between github.event.before and github.sha. Missing diffs (e.g. force push, first commit) degrade to an empty list rather than failing the job.
  3. Per-file POST. For each file, the workflow builds an idempotency key (see below) and sends the raw YAML body to POST /v1/configs/models with Content-Type: application/x-yaml and Authorization: Bearer $SIE_ADMIN_TOKEN. The HTTP status is inspected explicitly: 200 and 201 are both success, 409 and 422 are hard failures with annotated error messages, 401 and 403 are auth failures, and anything else is flagged as unexpected.
  4. Parse sie_id. The workflow reads the model’s sie_id out of the YAML via python3 -c '... yaml.safe_load ...'. This is the model_id used by the gateway readiness endpoint.
  5. Poll gateway readiness. If SIE_GATEWAY_URL is set, the workflow polls GET $SIE_GATEWAY_URL/v1/configs/models/{model_id}/status every READINESS_POLL_INTERVAL_SECONDS (default 5s) until all_bundles_acked == true or READINESS_TIMEOUT_SECONDS (default 180s) elapses. If SIE_GATEWAY_URL is unset, the poll is skipped with a ::notice:: annotation.
  6. Fail closed. Timeouts and non-2xx statuses fail the job. A failure on one file does not stop the per-file loop: the remaining files are still processed, but the final exit code is non-zero.
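
The status handling in step 3 and the fail-closed loop in step 6 can be sketched as follows. This is an illustration of the described behaviour, not the contents of push-model-configs.yml; the function names and the injected `post` callable are assumptions made so the logic can run without a network.

```python
from typing import Callable, Iterable


def classify_status(status: int) -> str:
    """Bucket an HTTP status the way step 3 describes."""
    if status in (200, 201):
        return "ok"
    if status in (409, 422):
        return "hard_failure"
    if status in (401, 403):
        return "auth_failure"
    return "unexpected"


def push_all(files: Iterable[str], post: Callable[[str], int]) -> int:
    """POST every file, keep going after failures, return the job exit code."""
    failed = []
    for path in files:
        outcome = classify_status(post(path))
        if outcome != "ok":
            # GitHub Actions error annotation attached to the offending file.
            print(f"::error file={path}::config push failed ({outcome})")
            failed.append(path)
    # Fail closed: any failed file makes the whole run non-zero.
    return 1 if failed else 0
```

Here `post` stands in for whatever performs the real request and returns its status code, so a run over three files where the second returns 409 still attempts the third and then exits with 1.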

All config-service endpoints are prefixed with /v1/configs. The gateway readiness endpoint lives on the gateway, not on the config service.

| Method | Path | Service | Auth | Success | Notable failures |
| --- | --- | --- | --- | --- | --- |
| POST | /v1/configs/models | config | write (SIE_ADMIN_TOKEN) | 201 created, 200 pure replay | 409 content_conflict, 422 validation_error / idempotency_mismatch, 413 payload too large, 503 nats_unavailable / registry_unavailable |
| GET | /v1/configs/models/{model_id} | config | read (SIE_ADMIN_TOKEN or SIE_AUTH_TOKEN) | 200 application/x-yaml | 404 |
| GET | /v1/configs/epoch | config | read | 200 {"epoch": <int>} | |
| GET | /v1/configs/models/{model_id}/status | gateway | read | 200 snapshot | 404 unknown model |

Required headers on POST /v1/configs/models:

  • Authorization: Bearer <token>
  • Content-Type: application/x-yaml
  • Idempotency-Key: <stable key>

Payload cap: 1 MiB. Larger bodies return 413.
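
A request carrying the three required headers can be assembled with the Python standard library as sketched below. The helper name `build_post` is hypothetical, and the sketch assumes $SIE_CONFIG_URL is a base URL to which /v1/configs/models is appended; adjust if your deployment already includes the prefix.

```python
import urllib.request

MAX_BODY_BYTES = 1024 * 1024  # 1 MiB cap; larger bodies would be rejected with 413


def build_post(config_url: str, token: str, key: str, body: bytes) -> urllib.request.Request:
    """Build (but do not send) the POST /v1/configs/models request."""
    if len(body) > MAX_BODY_BYTES:
        # Catch oversized payloads locally instead of waiting for a 413.
        raise ValueError(f"body is {len(body)} bytes, cap is {MAX_BODY_BYTES}")
    return urllib.request.Request(
        url=config_url.rstrip("/") + "/v1/configs/models",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/x-yaml",
            "Idempotency-Key": key,
        },
    )
```

Sending it is a call to `urllib.request.urlopen(req)`. Note that urlopen raises `urllib.error.HTTPError` for 4xx and 5xx responses, so the status code of a failure is read from the exception's `code` attribute.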

Successful POST /v1/configs/models response (abridged; 201 for new profiles, 200 if the body is a pure replay):

{
  "model_id": "BAAI/bge-m3",
  "created_profiles": ["default"],
  "existing_profiles_skipped": [],
  "warnings": [],
  "routable_bundles_by_profile": {
    "default": ["default"]
  },
  "router_id": "gw-abc123"
}

router_id is retained in the response for wire-contract compatibility; the component it identifies is the gateway that served the write.

Gateway readiness snapshot from GET /v1/configs/models/{model_id}/status (abridged):

{
  "model_id": "BAAI/bge-m3",
  "config_epoch": 42,
  "all_bundles_acked": true,
  "no_bundles": false,
  "source": "gateway-registry",
  "bundles": [
    {
      "bundle_id": "default",
      "expected_bundle_config_hash": "sha256:...",
      "total_eligible_workers": 2,
      "acked_workers": ["worker-0", "worker-1"],
      "pending_workers": [],
      "acked": true
    }
  ]
}

bundles is a JSON array; each entry carries the per-bundle bundle_id, expected_bundle_config_hash, total_eligible_workers, acked_workers, pending_workers, and a boolean acked. no_bundles: true means the model has no bundle binding on this gateway; the workflow treats that as a readiness failure because no worker can serve it.
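
Interpreting the snapshot and polling on it can be sketched as below. The field names come from the snapshot above; the function names, the injected `fetch` callable, and the choice to keep polling (rather than abort immediately) when no_bundles is true are assumptions of this sketch, not confirmed workflow behaviour.

```python
import time
from typing import Callable, Tuple


def evaluate(snapshot: dict) -> Tuple[bool, str]:
    """Return (ready, reason) for one gateway status snapshot."""
    if snapshot.get("no_bundles"):
        # No bundle binding means no worker can serve the model.
        return False, "no bundle binding on this gateway"
    if snapshot.get("all_bundles_acked") is True:
        return True, "all bundles acked"
    pending = {
        b["bundle_id"]: b.get("pending_workers", [])
        for b in snapshot.get("bundles", [])
        if not b.get("acked")
    }
    return False, f"waiting on {pending}"


def wait_until_ready(
    fetch: Callable[[], dict],
    timeout_s: float = 180.0,   # READINESS_TIMEOUT_SECONDS default
    interval_s: float = 5.0,    # READINESS_POLL_INTERVAL_SECONDS default
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll fetch() until the snapshot is ready or the timeout elapses."""
    deadline = clock() + timeout_s
    while True:
        ready, reason = evaluate(fetch())
        if ready:
            return True
        if clock() + interval_s > deadline:
            print(f"::error::readiness timeout: {reason}")
            return False
        sleep(interval_s)
```

`fetch` is expected to return the parsed JSON body of GET /v1/configs/models/{model_id}/status. The `sleep` and `clock` parameters exist only so the loop can be exercised without real waiting.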

The example workflow constructs the Idempotency-Key as:

gh-${GITHUB_REPOSITORY//\//-}-${GITHUB_SHA::12}-${sha256(file_path)::12}

This is stable per (commit, file) so GitHub Actions retries, rerun-failed-jobs, and workflow_dispatch replays of the same commit all collapse to the same cache entry.
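
The same construction can be sketched in Python. The helper name is hypothetical, and the sketch assumes the hash is taken over the UTF-8 bytes of the repo-relative file path with no trailing newline; the real workflow may feed the path to its hashing tool differently, in which case the digests will not match.

```python
import hashlib


def idempotency_key(repository: str, commit_sha: str, file_path: str) -> str:
    """Build a key that is stable per (commit, file)."""
    repo_slug = repository.replace("/", "-")  # mirrors ${GITHUB_REPOSITORY//\//-}
    # Assumption: digest of the path string itself, not of the file contents.
    path_digest = hashlib.sha256(file_path.encode("utf-8")).hexdigest()
    return f"gh-{repo_slug}-{commit_sha[:12]}-{path_digest[:12]}"
```

Because the inputs are the repository, the commit SHA, and the file path, a re-run of the same commit reproduces the key exactly, while a new commit touching the same file yields a different one.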

Server-side behaviour (per the config service code):

  • The idempotency cache is per-app, LRU, 1000 entries.
  • Replay with the same key and same body-hash returns the cached response.
  • Replay with the same key and a different body returns 422 idempotency_mismatch. If you intentionally changed the body, change the key too (new commit gives you one automatically).
  • If a concurrent request waited on an in-flight request with the same Idempotency-Key but the cached response was evicted from the in-memory LRU before it could be replayed, the server returns 200 with error: idempotent_replay_evicted. The original write was applied exactly once; re-read GET /v1/configs/models/{id} to confirm the post-state.

Readiness checks:

  • Success is all_bundles_acked == true in the gateway status response.
  • Treat no_bundles == true as failure: it means the model has no bundle binding and no worker is eligible to serve it.
  • Default timeout is 180 seconds. Increase it if your cluster cold-starts workers or if bundle fan-out is large.
  • Each readiness response comes from whichever single gateway replica handled the request. $SIE_GATEWAY_URL should resolve to a load-balanced service fronting all gateway replicas, so the poll does not latch onto a stale replica.
  • For extra safety, cross-check GET /v1/configs/epoch on the config service against config_epoch in the gateway status snapshot. Divergence points at a gateway that has not yet consumed the NATS notification.

Troubleshooting:

  • 409 content_conflict. A profile with this ID already exists and your YAML differs from the stored copy. The API is append-only; pick a new profile_id instead of editing the existing one.
  • 422 idempotency_mismatch. The key was reused with a different body. Use a new key (e.g. advance the commit) or POST the exact previous body.
  • 422 validation_error. Schema validation failed on the YAML. The response body lists the offending fields; fix and re-commit.
  • 413 payload too large. The body exceeded 1 MiB. Split the YAML or remove inlined blobs.
  • 503 nats_unavailable. The config service lost its NATS connection. Retry after confirming NATS is healthy.
  • 503 registry_unavailable. ModelRegistry failed to initialize (typically malformed bundle or model YAML at startup). Check /readyz on the config service and the service logs; fix the on-disk state and restart.
  • Readiness timeout. Either no healthy workers are connected on an eligible bundle, or the gateway has not yet processed the NATS notification. Check the gateway status body in the job log and verify worker health.
  • 401 / 403. SIE_ADMIN_TOKEN is missing, wrong, or not accepted as a write token by the config service. If only SIE_AUTH_TOKEN is configured server-side, writes are refused.
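
The epoch cross-check suggested in the readiness notes can be sketched as a comparison of two parsed JSON bodies. The function name is hypothetical; the inputs are assumed to be the body of GET /v1/configs/epoch and the gateway status snapshot, using the field names shown earlier.

```python
from typing import Optional


def check_epochs(config_epoch_body: dict, gateway_snapshot: dict) -> Optional[str]:
    """Return None when the epochs agree, else a human-readable warning."""
    service_epoch = config_epoch_body["epoch"]
    gateway_epoch = gateway_snapshot["config_epoch"]
    if service_epoch == gateway_epoch:
        return None
    return (
        f"epoch divergence: config service at {service_epoch}, "
        f"gateway at {gateway_epoch}"
    )
```

A non-None result during a readiness timeout suggests looking at NATS delivery to the gateway before looking at worker health.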
