Config API
Add models to a running SIE cluster with a single API call. If the model’s adapter is already in the bundle, no image rebuild or restart is needed. The change propagates to all workers within milliseconds via NATS.
Quick Example
```bash
# Add a model at runtime
curl -X POST http://localhost:8080/v1/configs/models \
  -H "Content-Type: application/x-yaml" \
  -H "Authorization: Bearer $SIE_ADMIN_TOKEN" \
  -d 'sie_id: intfloat/multilingual-e5-base
profiles:
  default:
    adapter_path: sie_server.adapters.sentence_transformer:SentenceTransformerAdapter
    max_batch_tokens: 8192
    adapter_options:
      loadtime: {}
      runtime:
        pooling: mean
        normalize: true'
```

Response:

```json
{
  "model_id": "intfloat/multilingual-e5-base",
  "created_profiles": ["default"],
  "existing_profiles_skipped": [],
  "warnings": [],
  "routable_bundles_by_profile": {"default": ["default"]},
  "worker_ack_pending": false,
  "eligible_bundles_count": 1,
  "eligible_bundles_with_workers_count": 1,
  "acked_workers": 3,
  "total_eligible": 3,
  "pending_workers": 0,
  "router_id": "router-abc123"
}
```

The model is immediately available for inference. First request triggers weight download and loading.
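For scripted use, the same call can be built from Python's standard library. The sketch below only constructs the request and does not send it; the helper name `build_add_model_request` and the shortened config string are illustrative assumptions, while the URL path and headers come from the curl example above.

```python
import urllib.request

# Shortened version of the config from the curl example.
MODEL_CONFIG_YAML = """\
sie_id: intfloat/multilingual-e5-base
profiles:
  default:
    adapter_path: sie_server.adapters.sentence_transformer:SentenceTransformerAdapter
    max_batch_tokens: 8192
"""


def build_add_model_request(
    base_url: str, admin_token: str, config_yaml: str
) -> urllib.request.Request:
    """Build (but do not send) the POST request for /v1/configs/models.

    Hypothetical helper for illustration; it is not part of any SIE client.
    """
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/configs/models",
        data=config_yaml.encode("utf-8"),  # body is sent byte-for-byte
        method="POST",
        headers={
            "Content-Type": "application/x-yaml",
            "Authorization": f"Bearer {admin_token}",
        },
    )
```

Passing the returned object to `urllib.request.urlopen(request, timeout=10)` would send it; the response body is the JSON document shown above.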
How It Works
```
POST /v1/configs/models
     │
     ▼
┌─────────┐     ┌───────────────┐     ┌──────────┐
│ Router  │────▶│ Config Store  │     │   NATS   │
│ (any)   │     │ (S3/GCS/local)│     │ (pub/sub)│
└────┬────┘     └───────────────┘     └────┬─────┘
     │                                     │
     │ publish notification                │
     ├────────────────────────────────────▶│
     │                                     │
     │              ┌──────────────────────┤
     │              │                      │
     ▼              ▼                      ▼
┌─────────┐    ┌─────────┐            ┌──────────┐
│ Worker 1│    │ Worker 2│            │ Router 2 │
│ (NATS   │    │ (NATS   │            │ (NATS    │
│  sub)   │    │  sub)   │            │  sub)    │
└─────────┘    └─────────┘            └──────────┘
```

- Client sends `POST /v1/configs/models` to any router
- Router validates the adapter is in at least one known bundle
- Router persists the config to the store (S3, GCS, or local filesystem)
- Router publishes a NATS notification to the affected bundle’s subject
- Workers subscribed to that bundle receive the notification and update their catalog
- Workers report the updated config hash in their next WebSocket status message
- Router confirms serving readiness and returns the response
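The validate, persist, publish order in the steps above can be illustrated with a toy in-memory model. This is not SIE code: `ToyRouter`, its fields, and the "any profile adapter appears in the bundle" eligibility rule are assumptions made for the sketch. Only the subject format `sie.config.models.{bundle_id}` is taken from this page.

```python
from dataclasses import dataclass, field


@dataclass
class ToyRouter:
    """Hypothetical in-memory stand-in for a router (illustration only)."""

    bundles: dict[str, set[str]]  # bundle_id -> adapter paths it ships
    store: dict[str, dict] = field(default_factory=dict)  # stands in for S3/GCS/local
    published: list[tuple[str, str]] = field(default_factory=list)  # (subject, sie_id)

    def add_model(self, sie_id: str, profiles: dict[str, dict]) -> list[str]:
        adapters = {
            p["adapter_path"] for p in profiles.values() if "adapter_path" in p
        }
        # Validate: the adapter must be in at least one known bundle.
        # Assumption: a bundle is eligible if it ships any of the adapters.
        eligible = sorted(
            bundle_id
            for bundle_id, shipped in self.bundles.items()
            if adapters & shipped
        )
        if not eligible:
            raise ValueError("adapter is not in any known bundle")
        # Persist before notifying, so a restart cannot lose the config.
        self.store[sie_id] = profiles
        # Publish one notification per affected bundle subject.
        for bundle_id in eligible:
            self.published.append((f"sie.config.models.{bundle_id}", sie_id))
        return eligible
```

In this toy model a rejected request leaves both the store and the publish log untouched, which mirrors the ordering of the steps: nothing is persisted or announced until validation passes.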
When to Use
| Scenario | Use Config API? | Alternative |
|---|---|---|
| Add a model with an existing adapter | Yes | - |
| Add a new profile to an existing model | Yes | - |
| Add a model that needs a new adapter | No | Create adapter, rebuild bundle image |
| Add a new bundle | No | Define in repo, rebuild images |
| Change a model’s adapter_path | No | Append-only; create a new profile instead |
The Config API is append-only. You can add models and profiles, but not modify or delete existing ones.
Endpoints
List Models

```bash
curl http://localhost:8080/v1/configs/models
```

```json
{
  "models": [
    {
      "model_id": "BAAI/bge-m3",
      "profiles": ["default", "sparse"],
      "source": "filesystem"
    },
    {
      "model_id": "intfloat/multilingual-e5-base",
      "profiles": ["default"],
      "source": "api"
    }
  ]
}
```

The `source` field indicates whether the model was loaded from the filesystem (`filesystem`) or added via the Config API (`api`).
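As a small illustration of the `source` field, the sketch below picks out the models that were added at runtime. The function name `api_added_models` is an assumption; the payload is the sample response shown above.

```python
import json

# Sample payload copied from the List Models response.
LIST_MODELS_RESPONSE = """
{
  "models": [
    {"model_id": "BAAI/bge-m3", "profiles": ["default", "sparse"], "source": "filesystem"},
    {"model_id": "intfloat/multilingual-e5-base", "profiles": ["default"], "source": "api"}
  ]
}
"""


def api_added_models(payload: dict) -> list[str]:
    """Return the IDs of models whose source is "api" (added via the Config API)."""
    return [m["model_id"] for m in payload["models"] if m.get("source") == "api"]


runtime_models = api_added_models(json.loads(LIST_MODELS_RESPONSE))
```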
Get Model
```bash
curl http://localhost:8080/v1/configs/models/BAAI/bge-m3
```

Returns the model config as YAML.
Add Model
```bash
# --data-binary sends the file unchanged; -d @file would strip the
# newlines and produce invalid YAML.
curl -X POST http://localhost:8080/v1/configs/models \
  -H "Content-Type: application/x-yaml" \
  -H "Authorization: Bearer $SIE_ADMIN_TOKEN" \
  --data-binary @model-config.yaml
```

| Status | Meaning |
|---|---|
| `201` | Model or profiles created |
| `200` | All profiles already existed (idempotent) |
| `400` | Invalid YAML |
| `409` | Profile exists with different content (content-equality check) |
| `422` | Validation failed (unroutable adapter, missing fields) |
| `503` | NATS unavailable or config store unavailable |
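A client will usually want to collapse these codes into a few actions. The mapping below is one possible reading of the table, not behavior defined by SIE: treating `503` as retryable and grouping `400`, `409`, and `422` as "fix the request" are assumptions, and the function name is hypothetical.

```python
def classify_add_model_status(status: int) -> str:
    """Map a POST /v1/configs/models status code to a coarse client action."""
    if status in (200, 201):
        return "ok"  # created, or every profile already existed
    if status == 503:
        return "retry"  # NATS or the config store is unavailable
    if status in (400, 409, 422):
        return "fix_request"  # invalid YAML, conflicting profile, or failed validation
    return "unexpected"
```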
List Bundles
```bash
curl http://localhost:8080/v1/configs/bundles
```

```json
{
  "bundles": [
    {
      "bundle_id": "default",
      "priority": 10,
      "adapter_count": 18,
      "source": "filesystem",
      "connected_workers": 3
    }
  ]
}
```

Get Bundle

```bash
curl http://localhost:8080/v1/configs/bundles/default
```

Returns bundle metadata as YAML including the adapter list.
Config YAML Format
The model config format is the same as static model configs, with one difference: the Config API only requires `sie_id` and `profiles`. Fields like `tasks`, `hf_id`, and `inputs` are optional when adding via API - the worker fills them in from the adapter.
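A client-side pre-check for the two required fields might look like the sketch below. It operates on an already-parsed config dictionary; the function name is an assumption, and the server's own validation (reported as `422`) remains authoritative.

```python
REQUIRED_FIELDS = ("sie_id", "profiles")


def missing_required_fields(config: dict) -> list[str]:
    """Return the required Config API fields that are absent or empty."""
    return [name for name in REQUIRED_FIELDS if not config.get(name)]
```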
Minimal Config
```yaml
sie_id: intfloat/multilingual-e5-base
profiles:
  default:
    adapter_path: sie_server.adapters.sentence_transformer:SentenceTransformerAdapter
    max_batch_tokens: 8192
```

Full Config

```yaml
sie_id: intfloat/multilingual-e5-base
hf_id: intfloat/multilingual-e5-base

inputs:
  text: true

tasks:
  encode:
    dense:
      dim: 768

profiles:
  default:
    adapter_path: sie_server.adapters.sentence_transformer:SentenceTransformerAdapter
    max_batch_tokens: 8192
    adapter_options:
      loadtime: {}
      runtime:
        pooling: mean
        normalize: true
  financial:
    extends: default
    adapter_options:
      runtime:
        pooling: mean
        normalize: true
        instruction: "Retrieve financial documents"
```

Profile Append

POST the same `sie_id` with additional profiles. Existing profiles are skipped; new ones are created.

```yaml
sie_id: intfloat/multilingual-e5-base
profiles:
  default:
    adapter_path: sie_server.adapters.sentence_transformer:SentenceTransformerAdapter
    max_batch_tokens: 8192
  medical:
    extends: default
    adapter_options:
      runtime:
        instruction: "Retrieve medical literature"
```

Response: `201` with `created_profiles: ["medical"]` and `existing_profiles_skipped: ["default"]`.
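The append rule can be sketched as a pure function over parsed profile dictionaries. This illustrates the documented outcomes (`201`, `200`, `409`) and is not the router's implementation; `plan_profile_append` is a hypothetical name, plain dictionary equality stands in for the content-equality check, and returning empty lists on conflict is an assumption.

```python
def plan_profile_append(
    existing: dict[str, dict], incoming: dict[str, dict]
) -> tuple[int, list[str], list[str]]:
    """Return (status, created_profiles, existing_profiles_skipped)."""
    created: list[str] = []
    skipped: list[str] = []
    for name, body in incoming.items():
        if name not in existing:
            created.append(name)  # new profile: append it
        elif existing[name] == body:
            skipped.append(name)  # identical content: skip (idempotent)
        else:
            return 409, [], []  # same name, different content: conflict
    return (201 if created else 200), created, skipped
```

With the request above, an existing `default` profile and a new `medical` profile give `(201, ["medical"], ["default"])`, matching the response described.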
Serving Readiness
The POST endpoint waits up to 3 seconds for workers to acknowledge the new config before returning. The response includes readiness metadata:
| Field | Description |
|---|---|
| `worker_ack_pending` | `false` if at least one worker per eligible bundle confirmed the config. `true` if timeout expired or no workers connected. |
| `eligible_bundles_count` | Number of bundles whose adapter list matches the model |
| `total_eligible` | Number of healthy workers on this router in eligible bundles |
| `acked_workers` | Workers that confirmed the updated config hash within timeout |
| `pending_workers` | `total_eligible - acked_workers` |
| `router_id` | Which router processed this request |
worker_ack_pending: true does not mean failure. The model is persisted and will propagate. It means the model may not be immediately servable - the first inference request may return 503 until workers catch up.
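A caller might condense the readiness fields as follows. The function and the `servable_now` key are illustrative assumptions; the arithmetic for `pending_workers` follows the table above.

```python
def summarize_readiness(response: dict) -> dict:
    """Condense the readiness fields of an Add Model response."""
    pending = response["total_eligible"] - response["acked_workers"]
    return {
        "pending_workers": pending,
        # worker_ack_pending == True is not a failure; it only means the
        # first inference requests may see 503 until workers catch up.
        "servable_now": not response["worker_ack_pending"],
    }
```

When `servable_now` is false, a reasonable client strategy is to proceed and tolerate a few `503` responses on the first inference calls rather than re-posting the config.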
Persistence
API-added models are persisted to a config store. On router restart, persisted models are restored from the store when `SIE_CONFIG_RESTORE` is `true`.
Storage Backends
| Backend | Config | CAS Mechanism | Use Case |
|---|---|---|---|
| Local filesystem | SIE_CONFIG_STORE_DIR=/data/config | fcntl file locking | Single router, development |
| S3 | SIE_CONFIG_STORE_DIR=s3://bucket/prefix | ETag conditional writes | Multi-router production |
| GCS | SIE_CONFIG_STORE_DIR=gs://bucket/prefix | Generation-based preconditions | Multi-router production (GCP) |
For multi-router deployments, use S3 or GCS. The local filesystem backend only works for a single router instance.
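The backend choice follows from the URL scheme of `SIE_CONFIG_STORE_DIR`. The helper below is a hypothetical sketch of that mapping for use in deployment checks; the backend labels and the second tuple element (whether the backend suits multi-router deployments) are naming choices made here, based on the table above.

```python
from typing import Optional
from urllib.parse import urlparse


def config_store_backend(store_dir: Optional[str]) -> tuple[str, bool]:
    """Return (backend, suitable_for_multi_router) for a SIE_CONFIG_STORE_DIR value."""
    if not store_dir:
        return ("memory", False)  # unset: API-added models are in-memory only
    scheme = urlparse(store_dir).scheme
    if scheme == "s3":
        return ("s3", True)
    if scheme == "gs":
        return ("gcs", True)
    return ("local", False)  # plain path: local filesystem, single router
```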
Environment Variables
| Variable | Default | Description |
|---|---|---|
| `SIE_CONFIG_STORE_DIR` | None | Config store path. If unset, API-added models are in-memory only (lost on restart). |
| `SIE_CONFIG_RESTORE` | `false` | Set to `true` to restore API-added models from the store on startup. |
| `SIE_NATS_URL` | None | NATS server URL for config distribution (e.g., `nats://nats:4222`). |
NATS Distribution
Config changes are distributed to workers and other routers via NATS pub/sub. Each worker subscribes to its bundle’s subject. Each router subscribes to a global subject.
| Subject | Subscribers | Purpose |
|---|---|---|
| `sie.config.models.{bundle_id}` | Workers in that bundle | Per-bundle config notifications |
| `sie.config.models._all` | All routers | Cross-router catalog sync |
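The subject layout can be expressed as a small helper that returns which subject a process would subscribe to. The function and the role names `"worker"` and `"router"` are assumptions for illustration; the subject strings are the ones in the table above.

```python
from typing import Optional

ROUTER_SUBJECT = "sie.config.models._all"


def subscription_subject(role: str, bundle_id: Optional[str] = None) -> str:
    """Return the NATS subject a worker or router subscribes to."""
    if role == "router":
        return ROUTER_SUBJECT  # every router listens on the global subject
    if role == "worker":
        if not bundle_id:
            raise ValueError("a worker needs the bundle_id it belongs to")
        return f"sie.config.models.{bundle_id}"  # per-bundle notifications
    raise ValueError(f"unknown role: {role!r}")
```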
Without NATS
If `SIE_NATS_URL` is not set:
- Config API still works for the local router (in-memory + config store)
- Workers do not receive runtime config changes
- Other routers do not receive cross-router sync
- This is fine for single-server deployments
NATS Unavailable
If NATS is configured but temporarily unavailable:
- Inference continues normally (NATS is not in the request path)
- `POST /v1/configs/models` returns `503` with `"error": "nats_unavailable"`
- On NATS reconnect, the router reconciles from the config store
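Because a repeated POST of identical content is idempotent (it returns `200` once the profiles exist), a client can reasonably retry on `503`. The sketch below shows one way to do that with exponential backoff. The function, its parameters, and the backoff values are assumptions; `send` is any caller-supplied callable that performs the POST and returns the HTTP status code.

```python
import time
from typing import Callable


def post_with_retry(
    send: Callable[[], int],
    attempts: int = 5,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call send() until it returns something other than 503, up to `attempts` times."""
    status = send()
    for attempt in range(1, attempts):
        if status != 503:
            break
        sleep(base_delay * 2 ** (attempt - 1))  # 0.5s, 1s, 2s, ...
        status = send()
    return status
```

The `sleep` parameter is injectable so the loop can be exercised without real delays.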
Authentication
Config API uses the same auth tokens as the rest of the SIE API:
| Operation | Token Required |
|---|---|
| `GET /v1/configs/*` | `SIE_AUTH_TOKEN` or `SIE_ADMIN_TOKEN` |
| `POST /v1/configs/models` | `SIE_ADMIN_TOKEN` only |
If neither token is configured, all endpoints are open (development mode).
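The access rules can be written out as a decision function. This is a sketch of the table, not the server's code: the function name and signature are hypothetical, and rejecting POST requests when only `SIE_AUTH_TOKEN` is configured is an assumption drawn from "`SIE_ADMIN_TOKEN` only".

```python
import hmac
from typing import Optional


def _matches(presented: str, expected: Optional[str]) -> bool:
    """Constant-time comparison against a token that may be unset."""
    return expected is not None and hmac.compare_digest(
        presented.encode(), expected.encode()
    )


def is_authorized(
    method: str,
    path: str,
    presented: Optional[str],
    auth_token: Optional[str] = None,
    admin_token: Optional[str] = None,
) -> bool:
    """Decide whether a Config API request is allowed under the documented rules."""
    if auth_token is None and admin_token is None:
        return True  # development mode: all endpoints are open
    if presented is None:
        return False
    if method == "POST" and path == "/v1/configs/models":
        return _matches(presented, admin_token)  # admin token only
    if method == "GET" and path.startswith("/v1/configs/"):
        return _matches(presented, auth_token) or _matches(presented, admin_token)
    return False
```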
Helm Configuration
Enable NATS-based config distribution in Kubernetes:

```yaml
nats:
  enabled: true
  url: "nats://nats.sie.svc.cluster.local:4222"
```

This sets `SIE_NATS_URL`, `SIE_CONFIG_STORE_DIR`, and `SIE_CONFIG_RESTORE` on both router and worker pods.
For production, override the config store to use S3 or GCS:
```yaml
# In your Helm values override
router:
  extraEnv:
    - name: SIE_CONFIG_STORE_DIR
      value: "s3://my-bucket/sie/configs"
```

Limitations

- Append-only: Models and profiles cannot be modified or deleted after creation.
- Adapter must be bundled: The model’s `adapter_path` must exist in at least one known bundle. Adding models that require new adapters still requires an image rebuild.
- Bundles are build-time only: Bundles cannot be created or modified via API.
- Local config store is per-pod: The default `/tmp` store does not survive pod restarts. Use S3 or GCS for durable persistence.