
Config API

Add models to a running SIE cluster with a single API call. If the model’s adapter is already in a deployed bundle, no worker image rebuild is needed. The change is written to sie-config, distributed over NATS, and mirrored by every gateway replica.

The Config API is split across two services:

| Service | Role |
| --- | --- |
| sie-config | Authoritative control plane. Owns writes, persistence, bundle metadata, snapshots, epoch, and NATS publishing. |
| sie-gateway | Read-side cache. Serves config reads, resolve, and per-replica worker readiness status. It does not handle config writes. |

```bash
# Add a model at runtime through sie-config
curl -X POST http://sie-config:8080/v1/configs/models \
  -H "Content-Type: application/x-yaml" \
  -H "Authorization: Bearer $SIE_ADMIN_TOKEN" \
  -H "Idempotency-Key: add-e5-base-001" \
  -d '
sie_id: intfloat/multilingual-e5-base
hf_id: intfloat/multilingual-e5-base
profiles:
  default:
    adapter_path: sie_server.adapters.sentence_transformer:SentenceTransformerAdapter
    max_batch_tokens: 8192
    adapter_options:
      loadtime: {}
      runtime:
        pooling: mean
        normalize: true
'
```

Response:

```json
{
  "model_id": "intfloat/multilingual-e5-base",
  "created_profiles": ["default"],
  "existing_profiles_skipped": [],
  "warnings": [],
  "routable_bundles_by_profile": {"default": ["default"]},
  "router_id": "sie-config"
}
```

A successful response like this means the config was accepted, persisted, and applied to sie-config’s registry. Partial NATS publish failures are reported in warnings; if the NATS publisher is unavailable altogether, the write returns 503 instead. The response does not mean every worker is ready to serve the model yet. To check serving readiness, poll a gateway replica:

```bash
curl http://sie-gateway:8080/v1/configs/models/intfloat/multilingual-e5-base/status \
  -H "Authorization: Bearer $SIE_AUTH_TOKEN"
```

```json
{
  "model_id": "intfloat/multilingual-e5-base",
  "config_epoch": 42,
  "all_bundles_acked": true,
  "no_bundles": false,
  "bundles": [
    {
      "bundle_id": "default",
      "expected_bundle_config_hash": "sha256...",
      "total_eligible_workers": 3,
      "acked_workers": ["worker-a", "worker-b", "worker-c"],
      "pending_workers": [],
      "acked": true
    }
  ],
  "source": "gateway-registry"
}
```

```text
Admin client
  -> POST /v1/configs/models on sie-config
       -> persist model YAML to ConfigStore
       -> mutate sie-config ModelRegistry
       -> increment config epoch
       -> publish NATS deltas:
            sie.config.models.{bundle_id}  -> workers
            sie.config.models._all         -> gateways

Gateways
  -> apply _all deltas to their ModelRegistry
  -> poll /v1/configs/epoch for missed deltas or bundle drift
  -> expose /v1/configs/models/{id}/status for readiness
```
  1. Admin tooling sends POST /v1/configs/models to sie-config.
  2. sie-config validates that every new profile’s adapter_path is routable by at least one known bundle.
  3. A single-process asyncio write lock serializes persistence, registry mutation, epoch increment, and NATS publishing.
  4. Workers subscribed to sie.config.models.{bundle_id} receive bundle-scoped config notifications.
  5. Gateways subscribed to sie.config.models._all update their in-memory registries.
  6. Workers report the updated bundle_config_hash in their next WebSocket status message.
  7. Gateway /status endpoints expose whether this replica has eligible workers with the expected hash.
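The serialized write in step 3 can be sketched in Python. This is a minimal in-memory model, not sie-config's code: the ConfigWriter class, its dict-based store and registry, and the extra gpu bundle in the usage are hypothetical. It shows why one asyncio.Lock is enough for a single-process writer: each write persists, mutates the registry, bumps the epoch, and publishes before the next write starts, so epochs and published deltas never interleave.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class ConfigWriter:
    """Hypothetical in-memory stand-in for sie-config's single-writer path."""
    store: dict = field(default_factory=dict)      # plays the role of the ConfigStore
    registry: dict = field(default_factory=dict)   # plays the role of the ModelRegistry
    epoch: int = 0
    published: list = field(default_factory=list)  # (subject, epoch) pairs "sent" over NATS
    lock: asyncio.Lock = field(default_factory=asyncio.Lock)

    async def add_model(self, model_id: str, yaml_text: str, bundle_ids: list) -> int:
        # Only one coroutine at a time runs persist -> mutate -> epoch -> publish.
        async with self.lock:
            self.store[model_id] = yaml_text
            await asyncio.sleep(0)  # yield, as a real store write would; the lock keeps others out
            self.registry[model_id] = list(bundle_ids)
            self.epoch += 1
            for bundle_id in bundle_ids:
                self.published.append((f"sie.config.models.{bundle_id}", self.epoch))
            self.published.append(("sie.config.models._all", self.epoch))
            return self.epoch


async def main():
    # Create the writer inside the running loop so its lock belongs to this loop.
    writer = ConfigWriter()
    epochs = await asyncio.gather(
        writer.add_model("intfloat/multilingual-e5-base", "sie_id: ...", ["default"]),
        writer.add_model("BAAI/bge-m3", "sie_id: ...", ["default", "gpu"]),
    )
    return writer, epochs


writer, epochs = asyncio.run(main())
```

A real writer awaits disk or object-store I/O where this sketch calls asyncio.sleep(0); without the lock, a second write could bump the epoch between the first write's persist and publish steps.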

| Scenario | Use Config API? | Alternative |
| --- | --- | --- |
| Add a model with an existing adapter | Yes | - |
| Add a new profile to an existing model | Yes | - |
| Add a model that needs a new adapter | No | Create adapter, rebuild bundle image |
| Add a new bundle | No | Define in repo, rebuild images |
| Change a model’s adapter_path | No | Append-only; create a new profile instead |

The Config API is append-only. You can add models and profiles, but not modify or delete existing ones.


| Endpoint | sie-config | sie-gateway |
| --- | --- | --- |
| POST /v1/configs/models | Yes | No, returns 405 Method Not Allowed |
| GET /v1/configs/models | Yes | Yes, from gateway registry |
| GET /v1/configs/models/{id} | Yes | Yes, from gateway registry |
| GET /v1/configs/models/{id}/status | No | Yes, per-replica worker ACK view |
| GET /v1/configs/bundles | Yes | Yes |
| GET /v1/configs/bundles/{id} | Yes | Yes |
| POST /v1/configs/resolve | Yes | Yes |
| GET /v1/configs/export | Yes | No, consumed by gateways |
| GET /v1/configs/epoch | Yes | No, consumed by gateways |
```bash
curl http://sie-gateway:8080/v1/configs/models
```

```json
{
  "models": [
    {
      "model_id": "BAAI/bge-m3",
      "profiles": ["default", "sparse"],
      "source": "gateway-registry"
    },
    {
      "model_id": "intfloat/multilingual-e5-base",
      "profiles": ["default"],
      "source": "gateway-registry"
    }
  ]
}
```

On the gateway, source: "gateway-registry" means the response comes from that replica’s in-memory config mirror. Call sie-config directly if you need to distinguish persisted API-added models from filesystem seed models.

```bash
curl http://sie-gateway:8080/v1/configs/models/BAAI/bge-m3
```

On the gateway, this returns a minimal YAML registry view with sie_id, source: gateway-registry, and compatible bundles. Call sie-config directly for the full stored model YAML with profile definitions.

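```bash
# --data-binary keeps the file's newlines; plain -d @file strips them and breaks YAML.
curl -X POST http://sie-config:8080/v1/configs/models \
  -H "Content-Type: application/x-yaml" \
  -H "Authorization: Bearer $SIE_ADMIN_TOKEN" \
  --data-binary @model-config.yaml
```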
| Status | Meaning |
| --- | --- |
| 201 | Model or profiles created |
| 200 | All profiles already existed (idempotent) |
| 400 | Invalid YAML |
| 401 | SIE_ADMIN_TOKEN is configured but the request is missing bearer auth |
| 403 | Write attempted with only the inference token configured |
| 409 | Profile exists with different content (content-equality check) |
| 422 | Validation failed (unroutable adapter, missing fields) |
| 503 | NATS unavailable or config store unavailable |

The gateway does not register this route. If you send the same POST to sie-gateway, the response is 405 Method Not Allowed.
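Because 503 is the only status in the table that signals a transient server-side problem, a client can retry it and treat everything else as final. The sketch below is hypothetical client code, not part of SIE: send stands in for whatever HTTP call you use, and it assumes that resending the identical body with the same Idempotency-Key is safe. That assumption is consistent with the table, since a write that already landed returns 200 rather than creating anything twice.

```python
import time
from typing import Callable

SUCCESS = {200, 201}   # created, or every profile already existed
RETRYABLE = {503}      # NATS or the config store was unavailable


def post_with_retry(send: Callable[[str], int], idempotency_key: str,
                    attempts: int = 5, base_delay: float = 0.5,
                    sleep: Callable[[float], None] = time.sleep) -> int:
    """Call send(idempotency_key) until it returns a non-retryable status.

    Backs off exponentially between attempts and returns the last status seen.
    """
    status = 0
    for attempt in range(attempts):
        status = send(idempotency_key)
        if status not in RETRYABLE:
            return status
        if attempt < attempts - 1:
            sleep(base_delay * 2 ** attempt)
    return status


# Usage with a fake transport that fails twice, then succeeds.
responses = iter([503, 503, 201])
delays = []
final = post_with_retry(lambda key: next(responses), "add-e5-base-001", sleep=delays.append)
```

A 409 is deliberately not retried: it means the stored profile differs from the one sent, and an append-only API will not resolve that on its own.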

```bash
curl http://sie-gateway:8080/v1/configs/bundles
```

```json
{
  "bundles": [
    {
      "bundle_id": "default",
      "priority": 10,
      "adapter_count": 18,
      "source": "gateway-registry",
      "connected_workers": 3
    }
  ]
}
```
```bash
curl http://sie-gateway:8080/v1/configs/bundles/default
```

Returns bundle metadata as YAML including the adapter list.

```bash
curl -X POST http://sie-gateway:8080/v1/configs/resolve \
  -H "Content-Type: application/json" \
  -d '{"model": "BAAI/bge-m3", "bundle": "default"}'
```

Returns the bundle that would be selected for a request without executing inference. Omit bundle to use the registry’s default bundle priority, or use the default:/BAAI/bge-m3 model-spec form for an explicit bundle override.
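On the client side, the two ways of pinning a bundle can be normalized before calling resolve. This sketch assumes the model-spec form is always {bundle}:/{model} and that it is equivalent to sending the bundle field separately; parse_model_spec and resolve_body are hypothetical helpers, not SIE APIs.

```python
import json
from typing import Optional, Tuple


def parse_model_spec(spec: str) -> Tuple[Optional[str], str]:
    """Split 'bundle:/org/model' into (bundle, model); a plain 'org/model' has no bundle."""
    bundle, sep, model = spec.partition(":/")
    if sep and bundle and model:
        return bundle, model
    return None, spec


def resolve_body(spec: str) -> str:
    """Build a JSON body for POST /v1/configs/resolve from either spec form."""
    bundle, model = parse_model_spec(spec)
    body = {"model": model}
    if bundle is not None:
        body["bundle"] = bundle
    return json.dumps(body)


print(resolve_body("default:/BAAI/bge-m3"))  # {"model": "BAAI/bge-m3", "bundle": "default"}
print(resolve_body("BAAI/bge-m3"))           # {"model": "BAAI/bge-m3"}
```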


The model config format is the same as static model configs. For runtime writes, sie-config validates the YAML schema and requires new profiles to be routable by existing bundle adapters. Full metadata such as hf_id, inputs, and tasks is recommended for catalog quality; many adapters can run from sie_id plus profiles alone.

Minimal config:

```yaml
sie_id: intfloat/multilingual-e5-base
profiles:
  default:
    adapter_path: sie_server.adapters.sentence_transformer:SentenceTransformerAdapter
    max_batch_tokens: 8192
```

Full config, with a second profile that extends default:

```yaml
sie_id: intfloat/multilingual-e5-base
hf_id: intfloat/multilingual-e5-base
inputs:
  text: true
tasks:
  encode:
    dense:
      dim: 768
profiles:
  default:
    adapter_path: sie_server.adapters.sentence_transformer:SentenceTransformerAdapter
    max_batch_tokens: 8192
    adapter_options:
      loadtime: {}
      runtime:
        pooling: mean
        normalize: true
  financial:
    extends: default
    adapter_options:
      runtime:
        pooling: mean
        normalize: true
        instruction: "Retrieve financial documents"
```
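The financial profile uses extends: default. This page does not spell out how extends combines fields, so the sketch below assumes one plausible rule: a recursive merge in which the child's keys win and nested mappings such as adapter_options are merged rather than replaced. deep_merge and resolve_profile are hypothetical helpers for reasoning about a config locally, not SIE's resolver.

```python
import copy


def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict: override's keys win; nested dicts are merged recursively."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def resolve_profile(profiles: dict, name: str, _seen: frozenset = frozenset()) -> dict:
    """Flatten a profile by following its 'extends' chain (ASSUMED merge semantics)."""
    if name in _seen:
        raise ValueError(f"extends cycle at profile {name!r}")
    profile = dict(profiles[name])          # shallow copy so pop() leaves the input intact
    parent = profile.pop("extends", None)
    if parent is None:
        return copy.deepcopy(profile)
    return deep_merge(resolve_profile(profiles, parent, _seen | {name}), profile)


ADAPTER = "sie_server.adapters.sentence_transformer:SentenceTransformerAdapter"
profiles = {
    "default": {
        "adapter_path": ADAPTER,
        "max_batch_tokens": 8192,
        "adapter_options": {"loadtime": {}, "runtime": {"pooling": "mean", "normalize": True}},
    },
    "financial": {
        "extends": "default",
        "adapter_options": {"runtime": {"pooling": "mean", "normalize": True,
                                        "instruction": "Retrieve financial documents"}},
    },
}
financial = resolve_profile(profiles, "financial")
```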

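POST the same sie_id with additional profiles. Profiles that already exist with identical content are skipped and new ones are created; an existing profile whose content differs is rejected with 409.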

```yaml
sie_id: intfloat/multilingual-e5-base
profiles:
  default:
    adapter_path: sie_server.adapters.sentence_transformer:SentenceTransformerAdapter
    max_batch_tokens: 8192
  medical:
    extends: default
    adapter_options:
      runtime:
        instruction: "Retrieve medical literature"
```

Response: 201 with created_profiles: ["medical"] and existing_profiles_skipped: ["default"].
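The outcome of a POST follows from comparing each incoming profile with what is already stored, which is what makes the API append-only. A minimal sketch, assuming profiles are compared as parsed mappings and that a single conflicting profile rejects the whole request; plan_profile_write is hypothetical, and the conflicting_profiles key in the 409 body is an illustrative name, not a documented field.

```python
from typing import Tuple


def plan_profile_write(existing: dict, incoming: dict) -> Tuple[int, dict]:
    """Decide the HTTP status and body for one POST, given stored and incoming profiles."""
    created, skipped, conflicts = [], [], []
    for name, profile in incoming.items():
        if name not in existing:
            created.append(name)
        elif existing[name] == profile:   # content equality on the parsed YAML
            skipped.append(name)
        else:
            conflicts.append(name)
    if conflicts:
        # Append-only: a changed profile is never overwritten.
        return 409, {"conflicting_profiles": conflicts}
    status = 201 if created else 200
    return status, {"created_profiles": created, "existing_profiles_skipped": skipped}


ADAPTER = "sie_server.adapters.sentence_transformer:SentenceTransformerAdapter"
stored = {"default": {"adapter_path": ADAPTER, "max_batch_tokens": 8192}}
request = {
    "default": {"adapter_path": ADAPTER, "max_batch_tokens": 8192},
    "medical": {"extends": "default",
                "adapter_options": {"runtime": {"instruction": "Retrieve medical literature"}}},
}
status, body = plan_profile_write(stored, request)
```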


POST /v1/configs/models no longer waits for worker ACKs. sie-config has no WebSocket connection to workers, so readiness is a read-side concern on each gateway replica.

| Field | Description |
| --- | --- |
| config_epoch | Highest control-plane epoch applied on this gateway |
| all_bundles_acked | true when every eligible bundle has at least one healthy worker with the expected hash |
| no_bundles | true when the model resolves to zero bundles on this gateway |
| bundles[].expected_bundle_config_hash | Hash workers must report for the bundle |
| bundles[].acked_workers | Healthy workers whose reported hash matches |
| bundles[].pending_workers | Healthy eligible workers that have not reported the expected hash |

all_bundles_acked: false does not mean the write failed. The model can already be in the catalog while workers are still catching up or scaling from zero. Admin tooling that needs a fleet-wide view should poll every gateway replica.
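The per-bundle fields above can be derived from worker status reports, and a fleet-wide check is then a conjunction over replicas. This is a sketch under assumptions: WorkerReport, bundle_ack, and fleet_ready are hypothetical names, and fetching each replica's GET /v1/configs/models/{id}/status response is left to the caller.

```python
from dataclasses import dataclass


@dataclass
class WorkerReport:
    """Hypothetical shape of what a gateway knows about one connected worker."""
    name: str
    bundle_id: str
    healthy: bool
    bundle_config_hash: str


def bundle_ack(bundle_id: str, expected_hash: str, workers: list) -> dict:
    """Build one bundles[] entry: acked once any healthy worker reports the expected hash."""
    eligible = [w for w in workers if w.healthy and w.bundle_id == bundle_id]
    acked = [w.name for w in eligible if w.bundle_config_hash == expected_hash]
    pending = [w.name for w in eligible if w.bundle_config_hash != expected_hash]
    return {
        "bundle_id": bundle_id,
        "expected_bundle_config_hash": expected_hash,
        "total_eligible_workers": len(eligible),
        "acked_workers": acked,
        "pending_workers": pending,
        "acked": bool(acked),
    }


def fleet_ready(replica_statuses: list) -> bool:
    """True only if at least one replica was polled and every one reports all bundles acked."""
    return bool(replica_statuses) and all(
        s["all_bundles_acked"] and not s["no_bundles"] for s in replica_statuses
    )


workers = [
    WorkerReport("worker-a", "default", True, "h1"),
    WorkerReport("worker-b", "default", True, "h1"),
    WorkerReport("worker-c", "default", True, "h0"),   # still on the previous config
    WorkerReport("worker-d", "default", False, "h0"),  # unhealthy, so not eligible
]
entry = bundle_ack("default", "h1", workers)
```

Checking no_bundles explicitly matters: a replica that resolves the model to zero bundles has nothing pending, but it cannot serve the model either.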


API-added models are persisted by sie-config, not by the gateway. When SIE_CONFIG_RESTORE=true, sie-config restores model configs from the configured store on startup. Gateways do not read the store directly; they fetch snapshots from sie-config.

| Backend | Config | Use case |
| --- | --- | --- |
| Local filesystem | SIE_CONFIG_STORE_DIR=/data/config | Development or Kubernetes PVC |
| S3 | SIE_CONFIG_STORE_DIR=s3://bucket/prefix | AWS production persistence |
| GCS | SIE_CONFIG_STORE_DIR=gs://bucket/prefix | GCP production persistence |

sie-config runs as a single writer. The local backend writes atomically with a temp file, fsync, and replace; cloud backends use object-store PUT semantics.
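The local backend's temp-file, fsync, and replace sequence is a standard pattern for crash-safe writes. A minimal Python sketch of that pattern follows; atomic_write and the model.yaml filename are illustrative, not sie-config's code or its store layout. The temp file is created in the destination directory because os.replace is only atomic within one filesystem.

```python
import os
import tempfile


def atomic_write(path: str, data: bytes) -> None:
    """Replace path with data so readers see the old file or the new one, never a partial write."""
    directory = os.path.dirname(os.path.abspath(path))
    # Same directory as the target, so the final rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # force the bytes to disk before the rename
        os.replace(tmp_path, path)  # atomic rename on POSIX filesystems
    except BaseException:
        # Remove the temp file if anything failed before the rename completed.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


store_dir = tempfile.mkdtemp()
target = os.path.join(store_dir, "model.yaml")
atomic_write(target, b"sie_id: intfloat/multilingual-e5-base\n")
```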

| Variable | Default | Description |
| --- | --- | --- |
| SIE_CONFIG_STORE_DIR | Local pod filesystem | Config store path used by sie-config |
| SIE_CONFIG_RESTORE | false | Set to true to restore API-added models from the store on sie-config startup |
| SIE_NATS_URL | None | NATS server URL for config distribution |
| SIE_BUNDLES_DIR | /app/bundles | Bundle YAML directory baked into the sie-config image |
| SIE_MODELS_DIR | /app/models | Baseline model YAML directory baked into the sie-config image |
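The backend is selected by the form of SIE_CONFIG_STORE_DIR. A sketch of that mapping, assuming the scheme alone decides the backend (s3:// for S3, gs:// for GCS, a plain path for the local filesystem); StoreLocation and parse_store_dir are hypothetical names, not sie-config internals.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlparse


@dataclass(frozen=True)
class StoreLocation:
    backend: str           # "local", "s3", or "gcs"
    bucket: Optional[str]  # None for the local filesystem
    path: str              # directory for local, key prefix for object stores


def parse_store_dir(value: str) -> StoreLocation:
    """Map a SIE_CONFIG_STORE_DIR value to a backend (hypothetical helper)."""
    parsed = urlparse(value)
    if parsed.scheme == "s3":
        return StoreLocation("s3", parsed.netloc, parsed.path.lstrip("/"))
    if parsed.scheme == "gs":
        return StoreLocation("gcs", parsed.netloc, parsed.path.lstrip("/"))
    if parsed.scheme == "":
        return StoreLocation("local", None, value)
    raise ValueError(f"unsupported SIE_CONFIG_STORE_DIR scheme: {parsed.scheme!r}")
```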

Config changes are distributed to workers and gateways via NATS Core pub/sub. NATS is transport for config deltas, not the durable source of truth.

| Subject | Subscribers | Purpose |
| --- | --- | --- |
| sie.config.models.{bundle_id} | Workers in that bundle | Per-bundle config notifications |
| sie.config.models._all | All gateways | Gateway registry sync |

Gateways recover missed messages by polling sie-config:

  • GET /v1/configs/epoch returns the authoritative epoch plus a bundles_hash.
  • If the epoch or bundle hash drifts, the gateway re-runs bootstrap.
  • Bootstrap fetches bundles from GET /v1/configs/bundles and GET /v1/configs/bundles/{id}, and models from GET /v1/configs/export.
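The recovery loop above can be sketched as a small state machine. Assumptions: the epoch response is read as a dict with epoch and bundles_hash keys, GatewayMirror is a hypothetical name, and the rule that a NATS delta is applied only when it is exactly one epoch ahead is one reasonable design, not documented SIE behavior. Refusing out-of-order deltas matters: if a gateway simply jumped to the newest epoch it saw, a lost delta would never be detected by the poller.

```python
from dataclasses import dataclass


@dataclass
class GatewayMirror:
    """Hypothetical gateway-side view of the control-plane epoch."""
    epoch: int = -1
    bundles_hash: str = ""
    bootstraps: int = 0

    def bootstrap(self, epoch: int, bundles_hash: str) -> None:
        # A real gateway would refetch bundles and GET /v1/configs/export here.
        self.epoch, self.bundles_hash = epoch, bundles_hash
        self.bootstraps += 1

    def apply_delta(self, delta_epoch: int) -> bool:
        """Apply a NATS delta only if it is the next epoch; otherwise leave the gap to the poller."""
        if delta_epoch != self.epoch + 1:
            return False
        self.epoch = delta_epoch
        return True

    def check(self, remote: dict) -> bool:
        """Compare one epoch-poll result; re-bootstrap on drift. Returns True if it did."""
        if remote["epoch"] != self.epoch or remote["bundles_hash"] != self.bundles_hash:
            self.bootstrap(remote["epoch"], remote["bundles_hash"])
            return True
        return False


mirror = GatewayMirror()
mirror.check({"epoch": 41, "bundles_hash": "b1"})  # first poll bootstraps
mirror.apply_delta(42)                              # in-order delta is applied
missed = mirror.apply_delta(44)                     # 43 was lost, so 44 is not applied
mirror.check({"epoch": 44, "bundles_hash": "b1"})  # drift detected, bootstrap again
```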

If NATS is configured but temporarily unavailable:

  • Config writes return 503 with {"detail": {"error": "nats_unavailable", "message": "..."}} rather than persisting a change that cannot be distributed.
  • Existing inference depends on the separate JetStream work queue and continues only if that queue path is healthy.
  • Once config pub/sub recovers, gateways close any missed-delta gap through the epoch poller.

If only some bundle publishes fail, the write can still return 201 with a warnings entry such as nats_publish_partial. The config is durable, and gateways recover through the epoch/export path; workers on the affected bundle may lag until that recovery completes.
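One way to get both behaviors is to check publisher connectivity before persisting and to collect per-subject failures afterwards. The sketch below assumes that ordering; write_model, the injected persist and publish callables, and the 503 message text are hypothetical, while the error code and the nats_publish_partial warning come from this page.

```python
from typing import Callable, Iterable, Tuple


def write_model(model_id: str, bundle_ids: Iterable[str], *, connected: bool,
                persist: Callable[[str], None],
                publish: Callable[[str], bool]) -> Tuple[int, dict]:
    """Hypothetical write handler: returns (HTTP status, response body)."""
    if not connected:
        # Refuse up front so nothing is persisted that could not be distributed.
        return 503, {"detail": {"error": "nats_unavailable",
                                "message": "config publisher is not connected"}}
    persist(model_id)  # from here on the change is durable
    subjects = [f"sie.config.models.{b}" for b in bundle_ids] + ["sie.config.models._all"]
    failed = [s for s in subjects if not publish(s)]
    # Partial failures do not undo the write; gateways catch up through the epoch poller.
    warnings = ["nats_publish_partial"] if failed else []
    return 201, {"model_id": model_id, "warnings": warnings}


persisted = []
down = write_model("m", ["default"], connected=False,
                   persist=persisted.append, publish=lambda s: True)
partial = write_model("m", ["default", "gpu"], connected=True,
                      persist=persisted.append, publish=lambda s: not s.endswith(".gpu"))
```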


The Config API uses the same auth tokens as the rest of the SIE API:

| Operation | Token required |
| --- | --- |
| GET /v1/configs/* | SIE_AUTH_TOKEN or SIE_ADMIN_TOKEN, depending on deployment auth mode |
| POST /v1/configs/models on sie-config | SIE_ADMIN_TOKEN |
| GET /v1/configs/export on sie-config | SIE_ADMIN_TOKEN |

If neither token is configured, all endpoints are open (development mode). If SIE_AUTH_TOKEN is set but SIE_ADMIN_TOKEN is not, writes are rejected with 403; the inference token never grants config-write access.
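The rules above reduce to a short decision function. This is a hypothetical sketch, not SIE's middleware: op is "read" or "admin" (POST /v1/configs/models and GET /v1/configs/export), and two behaviors the table leaves open are marked as assumptions in the code, namely the status for a wrong admin token and which tokens a read accepts. hmac.compare_digest keeps the token comparison constant-time.

```python
import hmac
from typing import Optional


def authorize(op: str, presented: Optional[str],
              auth_token: Optional[str], admin_token: Optional[str]) -> int:
    """Return 200 if allowed, else 401 or 403. op is 'read' or 'admin'."""
    if auth_token is None and admin_token is None:
        return 200  # development mode: every endpoint is open
    if op == "admin":
        if admin_token is None:
            return 403  # only the inference token is configured; it never grants config writes
        if presented is None:
            return 401  # admin token configured but no bearer token sent
        # ASSUMPTION: a wrong bearer token is rejected with 403.
        return 200 if hmac.compare_digest(presented, admin_token) else 403
    # ASSUMPTION: reads accept either configured token; real behavior depends on auth mode.
    if presented is None:
        return 401
    accepted = [t for t in (auth_token, admin_token) if t is not None]
    return 200 if any(hmac.compare_digest(presented, t) for t in accepted) else 403
```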


Kubernetes deployments run sie-config and sie-gateway as separate deployments. Enable NATS-based config distribution and persistent config storage in Helm values:

```yaml
nats:
  enabled: true
config:
  enabled: true
  configStore:
    enabled: true
    size: 10Gi
gateway:
  replicas: 2
```

The chart’s built-in persistence path is the config.configStore PVC. The sie-config service also supports SIE_CONFIG_STORE_DIR=s3://... or gs://..., but wiring that environment variable requires a chart overlay or custom deployment because the stock values file does not expose an extraEnv knob for the config service.


  • Append-only: Models and profiles cannot be modified or deleted after creation.
  • Adapter must be bundled: The model’s adapter_path must exist in at least one known bundle. Adding models that require new adapters still requires an image rebuild.
  • Bundles are build-time only: Bundles cannot be created or modified via API. Rebuild and redeploy sie-config plus workers for bundle changes; gateways pick up the new bundle set from sie-config.
  • sie-config is single-writer: Run one replica. Multi-replica writes require shared idempotency state, which is intentionally not part of the current topology.
  • Readiness is per gateway replica: GET /v1/configs/models/{id}/status reports the workers connected to that gateway. Poll all replicas for a fleet-wide view.
  • Gateway cold start depends on sie-config: A fresh gateway that cannot reach sie-config starts with whatever optional filesystem seed was mounted. In the default deployment, typed requests may return 404 until bootstrap succeeds.
