Config API
Add models to a running SIE cluster with a single API call. If the model’s adapter is already in a deployed bundle, no worker image rebuild is needed. The change is written to sie-config, distributed over NATS, and mirrored by every gateway replica.
The Config API is split across two services:
| Service | Role |
|---|---|
| sie-config | Authoritative control plane. Owns writes, persistence, bundle metadata, snapshots, epoch, and NATS publishing. |
| sie-gateway | Read-side cache. Serves config reads, resolve, and per-replica worker readiness status. It does not handle config writes. |
Quick Example

    # Add a model at runtime through sie-config
    curl -X POST http://sie-config:8080/v1/configs/models \
      -H "Content-Type: application/x-yaml" \
      -H "Authorization: Bearer $SIE_ADMIN_TOKEN" \
      -H "Idempotency-Key: add-e5-base-001" \
      -d 'sie_id: intfloat/multilingual-e5-base
    hf_id: intfloat/multilingual-e5-base
    profiles:
      default:
        adapter_path: sie_server.adapters.sentence_transformer:SentenceTransformerAdapter
        max_batch_tokens: 8192
        adapter_options:
          loadtime: {}
          runtime:
            pooling: mean
            normalize: true'

Response:

    {
      "model_id": "intfloat/multilingual-e5-base",
      "created_profiles": ["default"],
      "existing_profiles_skipped": [],
      "warnings": [],
      "routable_bundles_by_profile": {"default": ["default"]},
      "router_id": "sie-config"
    }

This response means the config was accepted, persisted, and applied to sie-config’s registry. NATS publish failures are surfaced in warnings; a fully unavailable publisher returns 503. The response does not mean every worker is already ready to serve the model. To check serving readiness, poll a gateway replica:

    curl http://sie-gateway:8080/v1/configs/models/intfloat/multilingual-e5-base/status \
      -H "Authorization: Bearer $SIE_AUTH_TOKEN"

Response:

    {
      "model_id": "intfloat/multilingual-e5-base",
      "config_epoch": 42,
      "all_bundles_acked": true,
      "no_bundles": false,
      "bundles": [
        {
          "bundle_id": "default",
          "expected_bundle_config_hash": "sha256...",
          "total_eligible_workers": 3,
          "acked_workers": ["worker-a", "worker-b", "worker-c"],
          "pending_workers": [],
          "acked": true
        }
      ],
      "source": "gateway-registry"
    }

How It Works

    Admin client
      -> POST /v1/configs/models on sie-config
           -> persist model YAML to ConfigStore
           -> mutate sie-config ModelRegistry
           -> increment config epoch
           -> publish NATS deltas:
                sie.config.models.{bundle_id} -> workers
                sie.config.models._all        -> gateways

    Gateways
      -> apply _all deltas to their ModelRegistry
      -> poll /v1/configs/epoch for missed deltas or bundle drift
      -> expose /v1/configs/models/{id}/status for readiness

- Admin tooling sends POST /v1/configs/models to sie-config.
- sie-config validates that every new profile’s adapter_path is routable by at least one known bundle.
- A single-process asyncio write lock serializes persist, registry mutation, epoch increment, and NATS publish.
- Workers subscribed to sie.config.models.{bundle_id} receive bundle-scoped config notifications.
- Gateways subscribed to sie.config.models._all update their in-memory registries.
- Workers report the updated bundle_config_hash in their next WebSocket status message.
- Gateway /status endpoints expose whether this replica has eligible workers with the expected hash.
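
The write-lock step above can be sketched with an asyncio.Lock. This is only an illustration of the serialization idea, not sie-config's code: the ConfigWriter class and its in-memory store, registry, and published attributes are hypothetical stand-ins for ConfigStore, ModelRegistry, and the NATS publisher.

```python
import asyncio


class ConfigWriter:
    """Hypothetical stand-in for the sie-config write path (illustration only)."""

    def __init__(self) -> None:
        self._lock = asyncio.Lock()                  # single-process write lock
        self.store: dict[str, str] = {}              # stands in for ConfigStore
        self.registry: set[str] = set()              # stands in for ModelRegistry
        self.epoch = 0
        self.published: list[tuple[str, int]] = []   # stands in for NATS publishes

    async def add_model(self, sie_id: str, yaml_text: str, bundle_ids: list[str]) -> int:
        # Hold the lock across all four steps so concurrent writes cannot interleave.
        async with self._lock:
            self.store[sie_id] = yaml_text           # 1. persist
            await asyncio.sleep(0)                   # yield control, as real I/O would
            self.registry.add(sie_id)                # 2. mutate registry
            self.epoch += 1                          # 3. increment epoch
            epoch = self.epoch
            for bundle_id in bundle_ids:             # 4. publish per-bundle deltas
                self.published.append((f"sie.config.models.{bundle_id}", epoch))
            self.published.append(("sie.config.models._all", epoch))
            return epoch


async def demo() -> tuple[list[int], list[tuple[str, int]]]:
    # Two concurrent writes; the lock forces them to complete one after the other.
    writer = ConfigWriter()
    epochs = await asyncio.gather(
        writer.add_model("model-a", "sie_id: model-a", ["default"]),
        writer.add_model("model-b", "sie_id: model-b", ["default"]),
    )
    return list(epochs), writer.published
```

Because the lock is held from persist through publish, every published delta for one epoch is recorded before the next write begins, even though the first write yields control partway through.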
When to Use
| Scenario | Use Config API? | Alternative |
|---|---|---|
| Add a model with an existing adapter | Yes | - |
| Add a new profile to an existing model | Yes | - |
| Add a model that needs a new adapter | No | Create adapter, rebuild bundle image |
| Add a new bundle | No | Define in repo, rebuild images |
| Change a model’s adapter_path | No | Append-only; create a new profile instead |
The Config API is append-only. You can add models and profiles, but not modify or delete existing ones.
Endpoints
Endpoint Placement
| Endpoint | sie-config | sie-gateway |
|---|---|---|
| POST /v1/configs/models | Yes | No, returns 405 Method Not Allowed |
| GET /v1/configs/models | Yes | Yes, from gateway registry |
| GET /v1/configs/models/{id} | Yes | Yes, from gateway registry |
| GET /v1/configs/models/{id}/status | No | Yes, per-replica worker ACK view |
| GET /v1/configs/bundles | Yes | Yes |
| GET /v1/configs/bundles/{id} | Yes | Yes |
| POST /v1/configs/resolve | Yes | Yes |
| GET /v1/configs/export | Yes | No, consumed by gateways |
| GET /v1/configs/epoch | Yes | No, consumed by gateways |
List Models

    curl http://sie-gateway:8080/v1/configs/models

Response:

    {
      "models": [
        {
          "model_id": "BAAI/bge-m3",
          "profiles": ["default", "sparse"],
          "source": "gateway-registry"
        },
        {
          "model_id": "intfloat/multilingual-e5-base",
          "profiles": ["default"],
          "source": "gateway-registry"
        }
      ]
    }

On the gateway, source: "gateway-registry" means the response comes from that replica’s in-memory config mirror. Call sie-config directly if you need to distinguish persisted API-added models from filesystem seed models.
Get Model

    curl http://sie-gateway:8080/v1/configs/models/BAAI/bge-m3

On the gateway, this returns a minimal YAML registry view with sie_id, source: gateway-registry, and compatible bundles. Call sie-config directly for the full stored model YAML with profile definitions.
Add Model

    # --data-binary sends the file unchanged; -d @file would strip the newlines YAML depends on
    curl -X POST http://sie-config:8080/v1/configs/models \
      -H "Content-Type: application/x-yaml" \
      -H "Authorization: Bearer $SIE_ADMIN_TOKEN" \
      --data-binary @model-config.yaml

| Status | Meaning |
|---|---|
| 201 | Model or profiles created |
| 200 | All profiles already existed (idempotent) |
| 400 | Invalid YAML |
| 401 | SIE_ADMIN_TOKEN is configured but the request is missing bearer auth |
| 403 | Write attempted with only the inference token configured |
| 409 | Profile exists with different content (content-equality check) |
| 422 | Validation failed (unroutable adapter, missing fields) |
| 503 | NATS unavailable or config store unavailable |
The gateway does not register this route. If you send the same POST to sie-gateway, the response is 405 Method Not Allowed.
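
Admin tooling that automates writes can branch on the status codes in the table above. The sketch below is one possible mapping; the function name and the action labels it returns are hypothetical and not part of the SIE API.

```python
def classify_add_model_response(status: int, body: dict) -> str:
    """Map an Add Model response to a coarse action label (labels are illustrative)."""
    if status == 201:
        # Created; the body may still carry warnings about partial NATS publishes.
        return "created_with_warnings" if body.get("warnings") else "created"
    if status == 200:
        return "already_exists"      # idempotent replay, nothing to do
    if status == 503:
        return "retry_later"         # NATS or config store unavailable
    if status in (401, 403):
        return "fix_credentials"     # missing bearer auth or no admin token
    if status in (400, 409, 422):
        return "fix_request"         # invalid YAML, content conflict, or validation failure
    return "unexpected"
```

Only 503 is treated as retryable here; the 4xx codes describe problems with the request or credentials that a retry would not fix.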
List Bundles

    curl http://sie-gateway:8080/v1/configs/bundles

Response:

    {
      "bundles": [
        {
          "bundle_id": "default",
          "priority": 10,
          "adapter_count": 18,
          "source": "gateway-registry",
          "connected_workers": 3
        }
      ]
    }

Get Bundle

    curl http://sie-gateway:8080/v1/configs/bundles/default

Returns bundle metadata as YAML, including the adapter list.
Resolve Routing

    curl -X POST http://sie-gateway:8080/v1/configs/resolve \
      -H "Content-Type: application/json" \
      -d '{"model": "BAAI/bge-m3", "bundle": "default"}'

Returns the bundle that would be selected for a request without executing inference. Omit bundle to use the registry’s default bundle priority, or use the default:/BAAI/bge-m3 model-spec form for an explicit bundle override.
Config YAML Format
The model config format is the same as static model configs. For runtime writes, sie-config validates the YAML schema and requires new profiles to be routable by existing bundle adapters. Full metadata such as hf_id, inputs, and tasks is recommended for catalog quality; many adapters can run from sie_id plus profiles alone.
Minimal Config

    sie_id: intfloat/multilingual-e5-base
    profiles:
      default:
        adapter_path: sie_server.adapters.sentence_transformer:SentenceTransformerAdapter
        max_batch_tokens: 8192

Full Config

    sie_id: intfloat/multilingual-e5-base
    hf_id: intfloat/multilingual-e5-base

    inputs:
      text: true

    tasks:
      encode:
        dense:
          dim: 768

    profiles:
      default:
        adapter_path: sie_server.adapters.sentence_transformer:SentenceTransformerAdapter
        max_batch_tokens: 8192
        adapter_options:
          loadtime: {}
          runtime:
            pooling: mean
            normalize: true
      financial:
        extends: default
        adapter_options:
          runtime:
            pooling: mean
            normalize: true
            instruction: "Retrieve financial documents"

Profile Append

POST the same sie_id with additional profiles. Existing profiles are skipped; new ones are created.

    sie_id: intfloat/multilingual-e5-base
    profiles:
      default:
        adapter_path: sie_server.adapters.sentence_transformer:SentenceTransformerAdapter
        max_batch_tokens: 8192
      medical:
        extends: default
        adapter_options:
          runtime:
            instruction: "Retrieve medical literature"

Response: 201 with created_profiles: ["medical"] and existing_profiles_skipped: ["default"].
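
The append behavior, together with the 409 content-equality rule from the Add Model status table, can be illustrated as a partition of the incoming profiles. This is a sketch under assumptions: plan_profile_append is a hypothetical helper that compares parsed profile dicts directly, and the real service may normalize profiles or resolve extends before comparing.

```python
def plan_profile_append(existing: dict, incoming: dict) -> dict:
    """Partition incoming profiles into created, skipped, and conflicting names."""
    created, skipped, conflicts = [], [], []
    for name, profile in incoming.items():
        if name not in existing:
            created.append(name)         # new profile: would be created
        elif existing[name] == profile:
            skipped.append(name)         # identical content: idempotent skip
        else:
            conflicts.append(name)       # same name, different content: the 409 case
    return {
        "created_profiles": created,
        "existing_profiles_skipped": skipped,
        "conflicts": conflicts,
    }
```

With the request shown above and a stored default profile of identical content, this yields medical under created_profiles and default under existing_profiles_skipped, matching the 201 response.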
Serving Readiness
POST /v1/configs/models no longer waits for worker ACKs. sie-config has no WebSocket connection to workers, so readiness is a read-side concern on each gateway replica.
| Field | Description |
|---|---|
| config_epoch | Highest control-plane epoch applied on this gateway |
| all_bundles_acked | true when every eligible bundle has at least one healthy worker with the expected hash |
| no_bundles | true when the model resolves to zero bundles on this gateway |
| bundles[].expected_bundle_config_hash | Hash workers must report for the bundle |
| bundles[].acked_workers | Healthy workers whose reported hash matches |
| bundles[].pending_workers | Healthy eligible workers that have not reported the expected hash |
all_bundles_acked: false does not mean the write failed. The model can already be in the catalog while workers are still catching up or scaling from zero. Admin tooling that needs a fleet-wide view should poll every gateway replica.
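
A fleet-wide view can be assembled by collecting the status payload from each replica and combining them. The sketch below assumes the payloads have already been fetched and parsed; fleet_ready is a hypothetical helper, and its policy of requiring every replica to be ready is one reasonable choice, not something the API prescribes.

```python
def fleet_ready(status_by_replica: dict) -> dict:
    """Combine per-replica /status payloads into one fleet-wide summary."""
    not_ready = []
    pending = set()
    for replica, status in status_by_replica.items():
        # A replica counts as ready only if it resolves bundles and all of them are acked.
        if status.get("no_bundles") or not status.get("all_bundles_acked"):
            not_ready.append(replica)
        for bundle in status.get("bundles", []):
            pending.update(bundle.get("pending_workers", []))
    return {
        # An empty input is treated as not ready, since nothing was actually checked.
        "ready": bool(status_by_replica) and not not_ready,
        "not_ready_replicas": sorted(not_ready),
        "pending_workers": sorted(pending),
    }
```

The keys of status_by_replica can be whatever identifies a replica in your environment, for example pod names or per-pod URLs.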
Persistence
API-added models are persisted by sie-config, not by the gateway. On sie-config startup, SIE_CONFIG_RESTORE=true restores model configs from the configured store. Gateways do not read the store directly; they fetch snapshots from sie-config.
Storage Backends
| Backend | Config | Use Case |
|---|---|---|
| Local filesystem | SIE_CONFIG_STORE_DIR=/data/config | Development or Kubernetes PVC |
| S3 | SIE_CONFIG_STORE_DIR=s3://bucket/prefix | AWS production persistence |
| GCS | SIE_CONFIG_STORE_DIR=gs://bucket/prefix | GCP production persistence |
sie-config runs as a single writer. The local backend writes atomically with a temp file, fsync, and replace; cloud backends use object-store PUT semantics.
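
The temp file, fsync, and replace pattern used by the local backend is a common way to make a file write atomic. A minimal Python sketch of the general technique follows; atomic_write is a hypothetical helper and not the actual sie-config implementation.

```python
import os
import tempfile


def atomic_write(path: str, data: bytes) -> None:
    """Write data so readers see either the old file or the new one, never a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file in the target directory so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())       # force the bytes to disk before renaming
        os.replace(tmp_path, path)       # atomically swap the new file into place
    except BaseException:
        # Remove the temp file if anything failed before the replace completed.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

A crash before os.replace leaves the previous file untouched, which is why this pattern pairs well with a single writer.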
Environment Variables
| Variable | Default | Description |
|---|---|---|
| SIE_CONFIG_STORE_DIR | Local pod filesystem | Config store path used by sie-config |
| SIE_CONFIG_RESTORE | false | Set to true to restore API-added models from the store on sie-config startup |
| SIE_NATS_URL | None | NATS server URL for config distribution |
| SIE_BUNDLES_DIR | /app/bundles | Bundle YAML directory baked into the sie-config image |
| SIE_MODELS_DIR | /app/models | Baseline model YAML directory baked into the sie-config image |
NATS Distribution
Config changes are distributed to workers and gateways via NATS Core pub/sub. NATS is the transport for config deltas, not the durable source of truth.
| Subject | Subscribers | Purpose |
|---|---|---|
| sie.config.models.{bundle_id} | Workers in that bundle | Per-bundle config notifications |
| sie.config.models._all | All gateways | Gateway registry sync |
Gateway Recovery
Gateways recover missed messages by polling sie-config:
- GET /v1/configs/epoch returns the authoritative epoch plus a bundles_hash.
- If the epoch or bundle hash drifts, the gateway re-runs bootstrap.
- Bootstrap fetches bundles from GET /v1/configs/bundles{,/{id}} and models from GET /v1/configs/export.
NATS Unavailable
If NATS is configured but temporarily unavailable:
- Config writes return 503 with {"detail": {"error": "nats_unavailable", "message": "..."}} rather than persisting a change that cannot be distributed.
- Existing inference depends on the separate JetStream work queue and continues only if that queue path is healthy.
- Once config pub/sub recovers, gateways close any missed-delta gap through the epoch poller.
If only some bundle publishes fail, the write can still return 201 with a warnings entry such as nats_publish_partial. The config is durable, and gateways recover through the epoch/export path; workers on the affected bundle may lag until that recovery completes.
Authentication
The Config API uses the same auth tokens as the rest of the SIE API:
| Operation | Token Required |
|---|---|
| GET /v1/configs/* | SIE_AUTH_TOKEN or SIE_ADMIN_TOKEN depending on deployment auth mode |
| POST /v1/configs/models on sie-config | SIE_ADMIN_TOKEN |
| GET /v1/configs/export on sie-config | SIE_ADMIN_TOKEN |
If neither token is configured, all endpoints are open (development mode). If SIE_AUTH_TOKEN is set but SIE_ADMIN_TOKEN is not, writes are rejected with 403; the inference token never grants config-write access.
Helm Configuration
Kubernetes deployments run sie-config and sie-gateway as separate deployments. Enable NATS-based config distribution and persistent config storage in Helm values:

    nats:
      enabled: true

    config:
      enabled: true
      configStore:
        enabled: true
        size: 10Gi

    gateway:
      replicas: 2

The chart’s built-in persistence path is the config.configStore PVC. The sie-config service also supports SIE_CONFIG_STORE_DIR=s3://... or gs://..., but wiring that environment variable requires a chart overlay or custom deployment because the stock values file does not expose an extraEnv knob for the config service.
Limitations
- Append-only: Models and profiles cannot be modified or deleted after creation.
- Adapter must be bundled: The model’s adapter_path must exist in at least one known bundle. Adding models that require new adapters still requires an image rebuild.
- Bundles are build-time only: Bundles cannot be created or modified via API. Rebuild and redeploy sie-config plus workers for bundle changes; gateways pick up the new bundle set from sie-config.
- sie-config is single-writer: Run one replica. Multi-replica writes require shared idempotency state, which is intentionally not part of the current topology.
- Readiness is per gateway replica: GET /v1/configs/models/{id}/status reports the workers connected to that gateway. Poll all replicas for a fleet-wide view.
- Gateway cold start depends on sie-config: A fresh gateway that cannot reach sie-config starts with whatever optional filesystem seed was mounted. In the default deployment, typed requests may return 404 until bootstrap succeeds.