
Config API

Add models to a running SIE cluster with a single API call. If the model’s adapter is already in the bundle, no image rebuild or restart is needed. The change propagates to all workers within milliseconds via NATS.


```shell
# Add a model at runtime
curl -X POST http://localhost:8080/v1/configs/models \
  -H "Content-Type: application/x-yaml" \
  -H "Authorization: Bearer $SIE_ADMIN_TOKEN" \
  -d '
sie_id: intfloat/multilingual-e5-base
profiles:
  default:
    adapter_path: sie_server.adapters.sentence_transformer:SentenceTransformerAdapter
    max_batch_tokens: 8192
    adapter_options:
      loadtime: {}
      runtime:
        pooling: mean
        normalize: true
'
```

Response:

```json
{
  "model_id": "intfloat/multilingual-e5-base",
  "created_profiles": ["default"],
  "existing_profiles_skipped": [],
  "warnings": [],
  "routable_bundles_by_profile": {"default": ["default"]},
  "worker_ack_pending": false,
  "eligible_bundles_count": 1,
  "eligible_bundles_with_workers_count": 1,
  "acked_workers": 3,
  "total_eligible": 3,
  "pending_workers": 0,
  "router_id": "router-abc123"
}
```

The model is immediately available for inference. The first request triggers weight download and loading.


```text
POST /v1/configs/models
     │
┌────▼────┐     ┌───────────────┐     ┌──────────┐
│ Router  │────▶│ Config Store  │     │   NATS   │
│ (any)   │     │ (S3/GCS/local)│     │ (pub/sub)│
└────┬────┘     └───────────────┘     └────┬─────┘
     │                                     │
     │ publish notification                │
     ├────────────────────────────────────▶│
     │                                     │
     │              ┌──────────────────────┤
     │              │                      │
     ▼              ▼                      ▼
┌─────────┐    ┌─────────┐            ┌──────────┐
│ Worker 1│    │ Worker 2│            │ Router 2 │
│ (NATS   │    │ (NATS   │            │ (NATS    │
│  sub)   │    │  sub)   │            │  sub)    │
└─────────┘    └─────────┘            └──────────┘
```
  1. Client sends POST /v1/configs/models to any router
  2. Router validates the adapter is in at least one known bundle
  3. Router persists the config to the store (S3, GCS, or local filesystem)
  4. Router publishes a NATS notification to the affected bundle’s subject
  5. Workers subscribed to that bundle receive the notification and update their catalog
  6. Workers report the updated config hash in their next WebSocket status message
  7. Router confirms serving readiness and returns the response
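Steps 5 and 6 on the worker side can be sketched as follows. `ConfigCatalog` and the SHA-256-over-canonical-JSON hashing scheme are illustrative assumptions, not SIE's actual implementation:

```python
import hashlib
import json

class ConfigCatalog:
    """Illustrative worker-side catalog: stores model configs and exposes
    the hash a worker would report in its WebSocket status message."""

    def __init__(self):
        self.models = {}

    def apply(self, model_config: dict) -> str:
        # Step 5: merge the notified config into the local catalog.
        self.models[model_config["sie_id"]] = model_config
        # Step 6: the hash reported in the next status message. A canonical
        # JSON dump keeps the hash stable regardless of key order.
        canonical = json.dumps(self.models, sort_keys=True)
        return hashlib.sha256(canonical.encode()).hexdigest()

catalog = ConfigCatalog()
h1 = catalog.apply({"sie_id": "intfloat/multilingual-e5-base", "profiles": {"default": {}}})
h2 = catalog.apply({"sie_id": "intfloat/multilingual-e5-base", "profiles": {"default": {}}})
assert h1 == h2  # re-applying the same config is idempotent
```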

| Scenario | Use Config API? | Alternative |
|---|---|---|
| Add a model with an existing adapter | Yes | - |
| Add a new profile to an existing model | Yes | - |
| Add a model that needs a new adapter | No | Create adapter, rebuild bundle image |
| Add a new bundle | No | Define in repo, rebuild images |
| Change a model's adapter_path | No | Append-only; create a new profile instead |

The Config API is append-only. You can add models and profiles, but not modify or delete existing ones.


```shell
curl http://localhost:8080/v1/configs/models
```

```json
{
  "models": [
    {
      "model_id": "BAAI/bge-m3",
      "profiles": ["default", "sparse"],
      "source": "filesystem"
    },
    {
      "model_id": "intfloat/multilingual-e5-base",
      "profiles": ["default"],
      "source": "api"
    }
  ]
}
```

The `source` field indicates whether the model was loaded from the filesystem (`filesystem`) or added via the Config API (`api`).
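For example, a caller could pick out the API-added models from this listing. A minimal sketch using the response shown above:

```python
import json

# Response body from GET /v1/configs/models, as shown above.
response = json.loads("""{
  "models": [
    {"model_id": "BAAI/bge-m3", "profiles": ["default", "sparse"], "source": "filesystem"},
    {"model_id": "intfloat/multilingual-e5-base", "profiles": ["default"], "source": "api"}
  ]
}""")

# Keep only models that were added at runtime via the Config API.
api_added = [m["model_id"] for m in response["models"] if m["source"] == "api"]
print(api_added)  # → ['intfloat/multilingual-e5-base']
```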

```shell
curl http://localhost:8080/v1/configs/models/BAAI/bge-m3
```

Returns the model config as YAML.

```shell
curl -X POST http://localhost:8080/v1/configs/models \
  -H "Content-Type: application/x-yaml" \
  -H "Authorization: Bearer $SIE_ADMIN_TOKEN" \
  -d @model-config.yaml
```
| Status | Meaning |
|---|---|
| 201 | Model or profiles created |
| 200 | All profiles already existed (idempotent) |
| 400 | Invalid YAML |
| 409 | Profile exists with different content (content-equality check) |
| 422 | Validation failed (unroutable adapter, missing fields) |
| 503 | NATS unavailable or config store unavailable |
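The 200/201/409 distinction can be sketched as a pure decision over the existing and incoming profile maps. This is an illustration of the content-equality rule, not SIE's actual code:

```python
def profile_post_status(existing: dict, incoming: dict) -> int:
    """Illustrative choice among 200/201/409 for a profile POST,
    mirroring the content-equality rule in the table above."""
    new = [name for name in incoming if name not in existing]
    conflicting = [name for name in incoming
                   if name in existing and existing[name] != incoming[name]]
    if conflicting:
        return 409  # same profile name, different content
    if new:
        return 201  # at least one profile created
    return 200      # everything already existed, content-identical

existing = {"default": {"max_batch_tokens": 8192}}
assert profile_post_status(existing, {"default": {"max_batch_tokens": 8192}}) == 200
assert profile_post_status(existing, {"medical": {"max_batch_tokens": 4096}}) == 201
assert profile_post_status(existing, {"default": {"max_batch_tokens": 4096}}) == 409
```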
```shell
curl http://localhost:8080/v1/configs/bundles
```

```json
{
  "bundles": [
    {
      "bundle_id": "default",
      "priority": 10,
      "adapter_count": 18,
      "source": "filesystem",
      "connected_workers": 3
    }
  ]
}
```
```shell
curl http://localhost:8080/v1/configs/bundles/default
```

Returns bundle metadata as YAML including the adapter list.


The model config format is the same as for static model configs, with one difference: the Config API requires only `sie_id` and `profiles`. Fields like `tasks`, `hf_id`, and `inputs` are optional when adding via the API - the worker fills them in from the adapter.

A minimal config:

```yaml
sie_id: intfloat/multilingual-e5-base
profiles:
  default:
    adapter_path: sie_server.adapters.sentence_transformer:SentenceTransformerAdapter
    max_batch_tokens: 8192
```

A full config with the optional fields spelled out:

```yaml
sie_id: intfloat/multilingual-e5-base
hf_id: intfloat/multilingual-e5-base
inputs:
  text: true
tasks:
  encode:
    dense:
      dim: 768
profiles:
  default:
    adapter_path: sie_server.adapters.sentence_transformer:SentenceTransformerAdapter
    max_batch_tokens: 8192
    adapter_options:
      loadtime: {}
      runtime:
        pooling: mean
        normalize: true
  financial:
    extends: default
    adapter_options:
      runtime:
        pooling: mean
        normalize: true
        instruction: "Retrieve financial documents"
```
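A plausible reading of `extends` is a deep merge of the parent profile with the child's overrides. The resolver below is a sketch of those semantics, not SIE's implementation:

```python
def deep_merge(base: dict, override: dict) -> dict:
    """Merge override into base; nested dicts merge, scalars replace."""
    out = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(out.get(key), dict):
            out[key] = deep_merge(out[key], value)
        else:
            out[key] = value
    return out

def resolve_profile(profiles: dict, name: str) -> dict:
    """Resolve a profile by recursively applying its extends chain."""
    prof = dict(profiles[name])
    parent = prof.pop("extends", None)
    return deep_merge(resolve_profile(profiles, parent), prof) if parent else prof

profiles = {
    "default": {
        "adapter_path": "sie_server.adapters.sentence_transformer:SentenceTransformerAdapter",
        "max_batch_tokens": 8192,
        "adapter_options": {"runtime": {"pooling": "mean", "normalize": True}},
    },
    "medical": {
        "extends": "default",
        "adapter_options": {"runtime": {"instruction": "Retrieve medical literature"}},
    },
}
resolved = resolve_profile(profiles, "medical")
assert resolved["max_batch_tokens"] == 8192                      # inherited
assert resolved["adapter_options"]["runtime"]["pooling"] == "mean"
assert resolved["adapter_options"]["runtime"]["instruction"] == "Retrieve medical literature"
```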

POST the same sie_id with additional profiles. Existing profiles are skipped; new ones are created.

```yaml
sie_id: intfloat/multilingual-e5-base
profiles:
  default:
    adapter_path: sie_server.adapters.sentence_transformer:SentenceTransformerAdapter
    max_batch_tokens: 8192
  medical:
    extends: default
    adapter_options:
      runtime:
        instruction: "Retrieve medical literature"
```

Response: `201` with `created_profiles: ["medical"]` and `existing_profiles_skipped: ["default"]`.


The POST endpoint waits up to 3 seconds for workers to acknowledge the new config before returning. The response includes readiness metadata:

| Field | Description |
|---|---|
| worker_ack_pending | false if at least one worker per eligible bundle confirmed the config; true if the timeout expired or no workers are connected |
| eligible_bundles_count | Number of bundles whose adapter list matches the model |
| total_eligible | Number of healthy workers on this router in eligible bundles |
| acked_workers | Workers that confirmed the updated config hash within the timeout |
| pending_workers | total_eligible - acked_workers |
| router_id | Which router processed this request |

worker_ack_pending: true does not mean failure. The model is persisted and will propagate. It means the model may not be immediately servable - the first inference request may return 503 until workers catch up.
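How a client might interpret these fields can be sketched as follows. The `readiness` helper and its labels are illustrative, not part of the API:

```python
def readiness(resp: dict) -> str:
    """Classify a POST /v1/configs/models response by its readiness
    metadata. Field relationships follow the table above; the label
    names are illustrative."""
    # Invariant from the table: pending = total_eligible - acked.
    assert resp["pending_workers"] == resp["total_eligible"] - resp["acked_workers"]
    if not resp["worker_ack_pending"]:
        return "servable"      # at least one worker per eligible bundle acked
    if resp["total_eligible"] == 0:
        return "no-workers"    # persisted; waits for workers to connect
    return "propagating"       # persisted; first requests may briefly 503

resp = {"worker_ack_pending": False, "acked_workers": 3,
        "total_eligible": 3, "pending_workers": 0}
assert readiness(resp) == "servable"
```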


API-added models are persisted to a config store. On router restart, persisted models are restored automatically.

| Backend | Config | CAS Mechanism | Use Case |
|---|---|---|---|
| Local filesystem | SIE_CONFIG_STORE_DIR=/data/config | fcntl file locking | Single router, development |
| S3 | SIE_CONFIG_STORE_DIR=s3://bucket/prefix | ETag conditional writes | Multi-router production |
| GCS | SIE_CONFIG_STORE_DIR=gs://bucket/prefix | Generation-based preconditions | Multi-router production (GCP) |

For multi-router deployments, use S3 or GCS. The local filesystem backend only works for a single router instance.
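The flavor of a lock-guarded compare-and-swap, as in the local-filesystem backend, can be sketched like this. `cas_write` is a hypothetical helper (POSIX-only, due to fcntl), not SIE's code:

```python
import fcntl
import os
import tempfile
from typing import Optional

def cas_write(path: str, expected: Optional[bytes], new: bytes) -> bool:
    """Compare-and-swap write guarded by an exclusive fcntl lock.
    expected=None means 'the file must not exist yet'. Illustrative."""
    lock_path = path + ".lock"
    with open(lock_path, "w") as lock:
        fcntl.flock(lock, fcntl.LOCK_EX)  # advisory exclusive lock
        current = None
        if os.path.exists(path):
            with open(path, "rb") as f:
                current = f.read()
        if current != expected:
            return False                   # lost the race: content changed
        with open(path, "wb") as f:
            f.write(new)
        return True

path = os.path.join(tempfile.mkdtemp(), "model.yaml")
assert cas_write(path, None, b"v1")        # create succeeds
assert not cas_write(path, None, b"v2")    # stale expectation is rejected
assert cas_write(path, b"v1", b"v2")       # correct expectation succeeds
```

S3 and GCS achieve the same effect without local locks, via ETag conditional writes and generation preconditions respectively, which is why they are safe across routers.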

| Variable | Default | Description |
|---|---|---|
| SIE_CONFIG_STORE_DIR | None | Config store path. If unset, API-added models are in-memory only (lost on restart). |
| SIE_CONFIG_RESTORE | false | Set to true to restore API-added models from the store on startup. |
| SIE_NATS_URL | None | NATS server URL for config distribution (e.g., nats://nats:4222). |

Config changes are distributed to workers and other routers via NATS pub/sub. Each worker subscribes to its bundle’s subject. Each router subscribes to a global subject.

| Subject | Subscribers | Purpose |
|---|---|---|
| sie.config.models.{bundle_id} | Workers in that bundle | Per-bundle config notifications |
| sie.config.models._all | All routers | Cross-router catalog sync |
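Subject construction is plain string formatting. The fan-out below (per-bundle subjects plus `_all`) is inferred from the subscriber table and may differ from SIE's actual publish logic:

```python
def model_config_subject(bundle_id: str) -> str:
    """Per-bundle subject from the table above."""
    return f"sie.config.models.{bundle_id}"

# Subject every router subscribes to for cross-router catalog sync.
CROSS_ROUTER_SUBJECT = "sie.config.models._all"

def publish_subjects(eligible_bundles):
    """Subjects a router would publish a config notification to:
    one per eligible bundle, plus the cross-router sync subject.
    (Assumed fan-out, inferred from the subscriber table.)"""
    return [model_config_subject(b) for b in eligible_bundles] + [CROSS_ROUTER_SUBJECT]

assert publish_subjects(["default"]) == [
    "sie.config.models.default",
    "sie.config.models._all",
]
```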

If SIE_NATS_URL is not set:

  • Config API still works for the local router (in-memory + config store)
  • Workers do not receive runtime config changes
  • Other routers do not receive cross-router sync
  • This is fine for single-server deployments

If NATS is configured but temporarily unavailable:

  • Inference continues normally (NATS is not in the request path)
  • POST /v1/configs/models returns 503 with "error": "nats_unavailable"
  • On NATS reconnect, the router reconciles from the config store

Config API uses the same auth tokens as the rest of the SIE API:

| Operation | Token Required |
|---|---|
| GET /v1/configs/* | SIE_AUTH_TOKEN or SIE_ADMIN_TOKEN |
| POST /v1/configs/models | SIE_ADMIN_TOKEN only |

If neither token is configured, all endpoints are open (development mode).
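A client might choose its token like this. `auth_header` is a hypothetical helper; it assumes the tokens are exposed through the environment variables named above:

```python
import os

def auth_header(method: str, path: str) -> dict:
    """Illustrative client-side token choice matching the table above.
    Returns an empty dict when no token is configured (dev mode)."""
    admin = os.environ.get("SIE_ADMIN_TOKEN")
    auth = os.environ.get("SIE_AUTH_TOKEN")
    if method == "POST" and path.startswith("/v1/configs/"):
        token = admin            # writes require the admin token
    else:
        token = auth or admin    # reads accept either token
    return {"Authorization": f"Bearer {token}"} if token else {}

os.environ["SIE_ADMIN_TOKEN"] = "admin-secret"
os.environ.pop("SIE_AUTH_TOKEN", None)
hdr = auth_header("POST", "/v1/configs/models")
assert hdr == {"Authorization": "Bearer admin-secret"}
```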


Enable NATS-based config distribution in Kubernetes:

```yaml
nats:
  enabled: true
  url: "nats://nats.sie.svc.cluster.local:4222"
```

This sets SIE_NATS_URL, SIE_CONFIG_STORE_DIR, and SIE_CONFIG_RESTORE on both router and worker pods.

For production, override the config store to use S3 or GCS:

```yaml
# In your Helm values override
router:
  extraEnv:
    - name: SIE_CONFIG_STORE_DIR
      value: "s3://my-bucket/sie/configs"
```

  • Append-only: Models and profiles cannot be modified or deleted after creation.
  • Adapter must be bundled: The model’s adapter_path must exist in at least one known bundle. Adding models that require new adapters still requires an image rebuild.
  • Bundles are build-time only: Bundles cannot be created or modified via API.
  • Local config store is per-pod: The default /tmp store does not survive pod restarts. Use S3 or GCS for durable persistence.
