Config GitOps Workflow

Commit model YAMLs to a git repo, open a PR, merge it. A GitHub Actions job POSTs each changed YAML to the SIE Config API, then polls the gateway until the health reports of all eligible SIE server sidecars show the new config hash. No image rebuild is required when the model’s adapter is already present in a deployed bundle. The workflow is append-only and idempotent: replays of the same commit are safe, and conflicts against existing profile metadata fail fast with a clear error.

Prerequisites

A running SIE deployment with the config service reachable at $SIE_CONFIG_URL.
At least one SIE gateway reachable at $SIE_GATEWAY_URL. The gateway exposes per-model readiness.
SIE_ADMIN_TOKEN configured on the config service. The same token is stored as a GitHub Secret.
Model adapters already present in a deployed bundle. Bundles are a build-time concept; adding a model whose adapter is not yet bundled still requires an image rebuild. See the HTTP API Reference for adapter and bundle semantics.
Repository layout convention: one YAML per model under configs/models/, filename {sie_id with "/" replaced by "__"}.yaml. This mirrors the file naming in packages/sie_server/models/.

Repository layout

configs/
  models/
    BAAI__bge-m3.yaml
    intfloat__e5-base-v2.yaml
.github/
  workflows/
    push-model-configs.yml

The / to __ rule mirrors the filename convention used by packages/sie_server/models/ in the SIE repo itself.
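
The naming rule is simple enough to express in a few lines. The following is a minimal sketch; the helper name config_path_for and the MODELS_DIR constant are illustrative and do not come from the SIE repo.

```python
from pathlib import PurePosixPath

# Directory from the layout above; the constant name is illustrative.
MODELS_DIR = PurePosixPath("configs/models")


def config_path_for(sie_id: str) -> PurePosixPath:
    """Map a sie_id such as "BAAI/bge-m3" to its YAML path in the config repo."""
    # Replace every "/" with "__" so the ID fits in a single filename.
    return MODELS_DIR / (sie_id.replace("/", "__") + ".yaml")
```

For example, config_path_for("BAAI/bge-m3") yields configs/models/BAAI__bge-m3.yaml, matching the layout shown above.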

The workflow, step by step

The full workflow file is push-model-configs.yml in the public SIE repo. Copy it into .github/workflows/push-model-configs.yml in your config repo. Key sections:

Trigger. push to main filtered on configs/models/**.yaml, plus a manual workflow_dispatch input model_path that lets an operator re-push a single file without a new commit.
Collect changed files. The first step writes the list of added or modified YAMLs to changed.txt. On manual dispatch it contains the single file the operator named; on push it contains the files reported by git diff --diff-filter=AM between github.event.before and github.sha. Missing diffs (e.g. force push, first commit) degrade to an empty list rather than failing the job.
Per-file POST. For each file, the workflow builds an idempotency key (see below) and POSTs the raw YAML body to POST /v1/configs/models with Content-Type: application/x-yaml and Authorization: Bearer $SIE_ADMIN_TOKEN. The HTTP status is inspected explicitly: 200 and 201 are both success, 409 and 422 are hard failures with annotated error messages, 401/403 are auth failures, anything else is flagged as unexpected.
Parse sie_id. The workflow reads the model’s sie_id out of the YAML via python3 -c '... yaml.safe_load ...'. This is the model_id used by the gateway readiness endpoint.
Poll gateway readiness. If SIE_GATEWAY_URL is set, the workflow polls GET $SIE_GATEWAY_URL/v1/configs/models/{model_id}/status every READINESS_POLL_INTERVAL_SECONDS (default 5s) until all_bundles_acked == true or READINESS_TIMEOUT_SECONDS (default 180s) elapses. If SIE_GATEWAY_URL is unset, the poll is skipped with a ::notice:: annotation.
Fail closed. Timeouts and non-2xx statuses fail the job. A failure on one file does not stop the run: the per-file loop keeps going and processes the remaining files, but the final exit code is non-zero.
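
The status handling and the fail-closed loop can be sketched as follows. This is not the workflow's actual shell code: classify_status, push_all, and the injected post callable are hypothetical names, and the bucket labels are chosen here for illustration.

```python
from typing import Callable, Iterable


def classify_status(status: int) -> str:
    """Bucket an HTTP status from POST /v1/configs/models as the steps describe."""
    if status in (200, 201):
        return "success"
    if status in (409, 422):
        return "hard_failure"
    if status in (401, 403):
        return "auth_failure"
    return "unexpected"


def push_all(paths: Iterable[str], post: Callable[[str], int]) -> int:
    """POST every file, keep going after a failure, and return an exit code."""
    failures = 0
    for path in paths:
        outcome = classify_status(post(path))
        if outcome != "success":
            # GitHub Actions renders ::error:: lines as annotations on the run.
            print(f"::error file={path}::{outcome}")
            failures += 1
    # Fail closed: any single failure makes the whole job fail.
    return 1 if failures else 0
```

Passing the POST as a callable keeps the loop testable without a live config service; a real caller would supply a function that sends the file and returns the HTTP status.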

API reference

All config-service endpoints are prefixed with /v1/configs. The gateway readiness endpoint lives on the gateway, not on the config service.

| Method | Path | Service | Auth | Success | Notable failures |
| --- | --- | --- | --- | --- | --- |
| POST | `/v1/configs/models` | config | write (`SIE_ADMIN_TOKEN`) | 201 created, 200 pure replay | 409 `content_conflict`, 422 `validation_error` / `idempotency_mismatch`, 413 payload too large, 503 `nats_unavailable` / `registry_unavailable` |
| GET | `/v1/configs/models/{model_id}` | config | read (`SIE_ADMIN_TOKEN` or `SIE_AUTH_TOKEN`) | 200 `application/x-yaml` | 404 |
| GET | `/v1/configs/epoch` | config | read | 200 `{"epoch": <int>}` | |
| GET | `/v1/configs/models/{model_id}/status` | gateway | read | 200 snapshot | 404 unknown model |

Required headers on POST /v1/configs/models:

Authorization: Bearer <token>
Content-Type: application/x-yaml
Idempotency-Key: <stable key>

Payload cap: 1 MiB. Larger bodies return 413.
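
As an illustration, the request can be assembled with Python's standard library. This is a sketch under assumptions: build_post_request and MAX_BODY_BYTES are hypothetical names, and the local size check only mirrors the documented cap so that an oversized file fails before it is sent.

```python
import urllib.request

MAX_BODY_BYTES = 1024 * 1024  # 1 MiB payload cap stated above


def build_post_request(
    config_url: str, token: str, key: str, body: bytes
) -> urllib.request.Request:
    """Build (but do not send) the POST /v1/configs/models request."""
    if len(body) > MAX_BODY_BYTES:
        # Fail locally instead of waiting for the server's 413.
        raise ValueError(f"body is {len(body)} bytes, over the 1 MiB cap")
    return urllib.request.Request(
        url=config_url.rstrip("/") + "/v1/configs/models",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/x-yaml",
            "Idempotency-Key": key,
        },
    )
```

The returned object can be passed to urllib.request.urlopen; keeping construction separate from sending makes the headers easy to inspect in a dry run.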

Request and response shapes

Successful POST /v1/configs/models response (abridged; 201 for new profiles, 200 if the body is a pure replay):

{
  "model_id": "BAAI/bge-m3",
  "created_profiles": ["default"],
  "existing_profiles_skipped": [],
  "warnings": [],
  "routable_bundles_by_profile": {
    "default": ["default"]
  },
  "router_id": "sie-config-0"
}

router_id is retained in the response for wire-contract compatibility; it identifies the config publisher that emitted the NATS delta, usually the sie-config pod.

Gateway readiness snapshot from GET /v1/configs/models/{model_id}/status (abridged):

{
  "model_id": "BAAI/bge-m3",
  "config_epoch": 42,
  "all_bundles_acked": true,
  "no_bundles": false,
  "source": "gateway-registry",
  "bundles": [
    {
      "bundle_id": "default",
      "expected_bundle_config_hash": "sha256:...",
      "total_eligible_workers": 2,
      "acked_workers": ["worker-0", "worker-1"],
      "pending_workers": [],
      "acked": true
    }
  ]
}

bundles is a JSON array; each entry carries the per-bundle bundle_id, expected_bundle_config_hash, total_eligible_workers, acked_workers, pending_workers, and a boolean acked. no_bundles: true means the model has no bundle binding on this gateway; the workflow treats that as a readiness failure because no worker pod can serve it.
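
A reader of the snapshot might reduce it to a single verdict along these lines. The function name readiness_verdict and the reason strings are illustrative; only the field names come from the snapshot shown above.

```python
def readiness_verdict(snapshot: dict) -> tuple[bool, str]:
    """Return (ready, reason) for one gateway status snapshot."""
    if snapshot.get("no_bundles"):
        # No bundle binding means no worker pod can serve the model.
        return False, "no bundle binding on this gateway"
    if snapshot.get("all_bundles_acked"):
        return True, "all bundles acked"
    # Name the workers still pending so the job log says what it is waiting on.
    pending = [
        f"{bundle.get('bundle_id')}:{worker}"
        for bundle in snapshot.get("bundles", [])
        if not bundle.get("acked")
        for worker in bundle.get("pending_workers", [])
    ]
    return False, "waiting on " + (", ".join(pending) or "unknown workers")
```

Checking no_bundles first means a model without a bundle binding is reported as a failure even if other fields look healthy.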

Idempotency keys

The example workflow constructs the key as:

gh-${GITHUB_REPOSITORY//\//-}-${GITHUB_SHA::12}-${sha256(file_path)::12}

This is stable per (commit, file) so GitHub Actions retries, rerun-failed-jobs, and workflow_dispatch replays of the same commit all collapse to the same cache entry.
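
For scripts outside GitHub Actions, the same key shape can be reproduced in Python. This is a sketch, not the workflow's code: the function name is hypothetical, and it assumes the file path is hashed as UTF-8 bytes without a trailing newline, which may differ from what the shell step does.

```python
import hashlib


def idempotency_key(repository: str, commit_sha: str, file_path: str) -> str:
    """Build a key that is stable for one (commit, file) pair."""
    repo_part = repository.replace("/", "-")  # ${GITHUB_REPOSITORY//\//-}
    sha_part = commit_sha[:12]                # ${GITHUB_SHA::12}
    # Assumption: the path is hashed as UTF-8 bytes with no trailing newline.
    path_part = hashlib.sha256(file_path.encode("utf-8")).hexdigest()[:12]
    return f"gh-{repo_part}-{sha_part}-{path_part}"
```

Because every input is derived from the commit and the file path, calling the function twice with the same arguments always returns the same key, while a new commit or a different file produces a different one.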

Server-side behaviour (per the config service code):

The idempotency cache is per-app, LRU, 1000 entries.
Replay with the same key and same body-hash returns the cached response.
Replay with the same key and a different body returns 422 idempotency_mismatch. If you intentionally changed the body, change the key too (new commit gives you one automatically).
If a concurrent request waits on an in-flight request with the same Idempotency-Key, and the cached response is evicted from the in-memory LRU before it can be replayed, the server returns 200 with error: idempotent_replay_evicted. The original write was applied exactly once; re-read GET /v1/configs/models/{id} to confirm the post-state.
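
The replay rules can be modelled with a small in-memory LRU. This is a toy illustration of the behaviour listed above, not the config service's implementation: the class name, the outcome labels "applied" and "replayed", and the apply callback are all invented here, and concurrency (including the idempotent_replay_evicted case) is not modelled.

```python
import hashlib
from collections import OrderedDict


class IdempotencyCache:
    """Toy single-threaded model of the replay rules."""

    def __init__(self, capacity: int = 1000) -> None:
        self.capacity = capacity
        self._entries = OrderedDict()  # key -> (body_hash, response)

    def submit(self, key: str, body: bytes, apply):
        """Return (outcome, response) for one request."""
        body_hash = hashlib.sha256(body).hexdigest()
        if key in self._entries:
            cached_hash, cached_response = self._entries[key]
            if cached_hash != body_hash:
                # Same key, different body: the documented 422 case.
                return "idempotency_mismatch", None
            self._entries.move_to_end(key)  # mark as most recently used
            return "replayed", cached_response
        response = apply(body)  # first sighting: apply the write once
        self._entries[key] = (body_hash, response)
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict least recently used
        return "applied", response
```

Note that the toy cache treats an evicted key as new, so the cache alone only deduplicates requests that arrive while the entry is still resident.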

Readiness verification

Success is all_bundles_acked == true in the gateway status response.
Treat no_bundles == true as failure: it means the model has no bundle binding and no worker is eligible to serve it.
Default timeout is 180 seconds. Increase it if your cluster cold-starts workers or if bundle fan-out is large.
Each readiness response is served by a single gateway replica. Point $SIE_GATEWAY_URL at a load-balanced service fronting all gateway replicas, so the poll does not latch onto one stale replica.
For extra safety, cross-check GET /v1/configs/epoch on the config service against config_epoch in the gateway status snapshot. Divergence points at a gateway that has not yet consumed the NATS notification.
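
Put together, a polling loop that follows these rules might look like the sketch below. The function and parameter names are hypothetical, the status fetch is injected as a callable so no HTTP client is assumed, and the epoch comparison assumes config_epoch only ever increases, which this document does not state explicitly.

```python
import time
from typing import Callable, Optional


def wait_until_ready(
    fetch_status: Callable[[], dict],
    timeout_s: float = 180.0,
    interval_s: float = 5.0,
    expected_epoch: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until all bundles ack; fail on no_bundles or on timeout."""
    deadline = clock() + timeout_s
    while True:
        snap = fetch_status()
        if snap.get("no_bundles"):
            return False  # no bundle binding: no worker can serve the model
        # Assumption: epochs are monotonic, so a lower gateway epoch is stale.
        epoch_ok = (
            expected_epoch is None
            or snap.get("config_epoch", -1) >= expected_epoch
        )
        if snap.get("all_bundles_acked") and epoch_ok:
            return True
        if clock() + interval_s > deadline:
            return False  # the next poll would land past the timeout
        sleep(interval_s)
```

A caller would pass the value read from GET /v1/configs/epoch as expected_epoch, and a fetch_status function that performs the GET against the gateway and decodes the JSON body.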

Troubleshooting

409 content_conflict. A profile with this ID already exists and your YAML differs from the stored copy. The API is append-only; pick a new profile_id instead of editing the existing one.
422 idempotency_mismatch. The key was reused with a different body. Use a new key (e.g. advance the commit) or POST the exact previous body.
422 validation_error. Schema validation failed on the YAML. The response body lists the offending fields; fix and re-commit.
413 payload too large. The body exceeded 1 MiB. Split the YAML or remove inlined blobs.
503 nats_unavailable. The config service lost its NATS connection. Retry after confirming NATS is healthy.
503 registry_unavailable. ModelRegistry failed to initialize (typically malformed bundle or model YAML at startup). Check /readyz on the config service and the service logs; fix the on-disk state and restart.
Readiness timeout. Either no healthy SIE server sidecars are reporting an eligible bundle, or the gateway has not yet processed the NATS notification. Check the gateway status body in the job log and verify SIE server sidecar health.
401 / 403. SIE_ADMIN_TOKEN is missing, wrong, or not accepted as a write token by the config service. If only SIE_AUTH_TOKEN is configured server-side, writes are refused.

See the HTTP API Reference for the full SIE HTTP surface.