Bundles

Why Bundles

Python ML libraries often have conflicting dependency requirements. Models using trust_remote_code=True or specialized backends can pin incompatible versions of transformers, torch, or sglang. SIE solves this with bundles: each bundle is a self-contained environment with compatible dependencies, built into its own Docker image.

Each bundle is a YAML file under packages/sie_server/bundles/ that lists the adapters it enables and the pinned dependency versions those adapters need. At build time, the Dockerfile selects one bundle via the BUNDLE build arg and installs only that bundle’s dependencies.
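As a rough sketch of a local build, a small wrapper around docker build could pass the bundle name through the BUNDLE build arg. This assumes the command runs from a directory containing the Dockerfile and that no other build args are required; neither is confirmed here. The function name build_bundle and the sie-server:local-{bundle} tag are hypothetical.

```shell
# Build a local image for one bundle.
# Assumptions: the current directory holds the Dockerfile, and BUNDLE is the
# only build arg it needs. The image tag below is a hypothetical local name.
build_bundle() {
  bundle="$1"
  # Accept only the bundle names mentioned in this document.
  case "$bundle" in
    default|sglang|transformers5) ;;
    *) echo "unknown bundle: $bundle" >&2; return 1 ;;
  esac
  docker build --build-arg "BUNDLE=$bundle" -t "sie-server:local-$bundle" .
}
```

For example, build_bundle sglang would invoke docker build with BUNDLE=sglang and tag the result sie-server:local-sglang.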

Published Bundles

Two bundles are published to GHCR today. The default bundle covers every supported model except those that need the SGLang runtime.

| Bundle | Purpose | Key Models |
|---|---|---|
| `default` | Everything that runs on standard `transformers` + Flash Attention | BGE-M3, E5, Stella, Qwen3, GritLM, NV-Embed, ColBERT, ColPali, ColQwen2, GLiNER, GLiREL, GLiClass, Florence-2, Donut, CLIP, SigLIP, Grounding DINO, OwlV2, SPLADE, and more |
| `sglang` | Large LLM embeddings (4B+ params) served through the SGLang backend | gte-Qwen2-7B, Qwen3-Embedding-4B, E5-Mistral-7B, Linq-Embed-Mistral, SFR-Embedding-Mistral, SFR-Embedding-2_R, llama-embed-nemotron-8b |

The previous gliner and florence2 bundles no longer exist; their adapters and dependencies were folded into default once the underlying version conflicts were resolved.

There is also an experimental transformers5 bundle in the repo for adapters that require transformers>=5.0 (currently LightOnOCR). It is not published to GHCR and is intended for local builds.

Bundle Contents

default

The default bundle is the broad, general-purpose image. It packages standard transformers, sentence-transformers, Flash Attention, and the NER and vision adapters.

Included adapter families:

Dense encoders: BERT (flash), ModernBERT (flash), BGE-M3, Qwen2, XLM-RoBERTa, Nomic, GTE, Stella, sentence-transformers, PyTorch embedding
Cross-encoders / rerankers: BERT, ModernBERT, Qwen2, Jina (flash), NLI classification
Multi-vector / late-interaction: ColBERT, ColBERT + ModernBERT, ColBERT + rotary, ColPali, ColQwen2, NeMo ColEmbed
Sparse: SPLADE (flash), GTE sparse (flash)
Vision and vision-language: CLIP, SigLIP, Grounding DINO, OwlV2, Florence-2, Donut
Zero-shot NER / extraction: GLiNER, GLiREL, GLiClass

sglang

A minimal bundle wired to the SGLang runtime for large LLM embeddings. Use this image when you want memory-efficient serving of 4B+ parameter embedding models.

Included models:

Alibaba-NLP/gte-Qwen2-7B-instruct
Qwen/Qwen3-Embedding-4B
intfloat/e5-mistral-7b-instruct
Linq-AI-Research/Linq-Embed-Mistral
Salesforce/SFR-Embedding-Mistral
Salesforce/SFR-Embedding-2_R
nvidia/llama-embed-nemotron-8b

Docker Images

Each bundle is published for three platforms: cpu, cuda11, cuda12. The image tag format is {version}-{platform}-{bundle}, with a floating latest-{platform}-{bundle} tag that tracks the most recent release.

# Default bundle (CPU)
docker run -p 8080:8080 ghcr.io/superlinked/sie-server:latest-cpu-default

# Default bundle (CUDA 12, recommended for GPU)
docker run --gpus all -p 8080:8080 ghcr.io/superlinked/sie-server:latest-cuda12-default

# SGLang bundle (CUDA 12)
docker run --gpus all -p 8080:8080 ghcr.io/superlinked/sie-server:latest-cuda12-sglang

# Pin to a specific release
docker run --gpus all -p 8080:8080 ghcr.io/superlinked/sie-server:v0.2.0-cuda12-default

CUDA 11 variants (latest-cuda11-default, latest-cuda11-sglang) are published for older NVIDIA drivers.
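The tag format above can be sketched as a small helper that assembles an image reference from its three parts. The function name image_ref is hypothetical, and the platform check only reflects the three platforms listed in this section.

```shell
# Compose an image reference using the documented
# {version}-{platform}-{bundle} tag format.
# image_ref is a hypothetical helper name, not part of SIE.
image_ref() {
  version="$1"   # e.g. "latest" or a release such as "v0.2.0"
  platform="$2"  # cpu, cuda11, or cuda12
  bundle="$3"    # e.g. default or sglang
  case "$platform" in
    cpu|cuda11|cuda12) ;;
    *) echo "unknown platform: $platform" >&2; return 1 ;;
  esac
  echo "ghcr.io/superlinked/sie-server:${version}-${platform}-${bundle}"
}
```

A call such as docker run --gpus all -p 8080:8080 "$(image_ref latest cuda12 default)" would then be equivalent to the CUDA 12 default command shown earlier.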

Bundle Selection

Choose a bundle based on the models you need:

Start with default. It covers dense, sparse, multi-vector, cross-encoder, vision, and extraction models, which together account for most use cases.
Use sglang when you need to serve large LLM embedding models (4B+ params) with the SGLang backend. Run it as a second container alongside default and route requests by model name.

Models are loaded on first request. The bundle only determines which models are available inside a given image.
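Routing by model name across the two containers could look roughly like the sketch below. It assumes the default container is mapped to host port 8080 and the sglang container to host port 8081; the 8081 mapping, the variable names, and the function name base_url_for_model are all hypothetical. The model list is copied from the sglang bundle contents above, and any other model falls through to default.

```shell
# Hypothetical setup: both containers on one host, on different host ports.
#   docker run -d --gpus all -p 8080:8080 ghcr.io/superlinked/sie-server:latest-cuda12-default
#   docker run -d --gpus all -p 8081:8080 ghcr.io/superlinked/sie-server:latest-cuda12-sglang
DEFAULT_URL="http://localhost:8080"
SGLANG_URL="http://localhost:8081"   # assumed host port, not from the docs

# Models served by the sglang bundle, as listed in this document.
SGLANG_MODELS="
Alibaba-NLP/gte-Qwen2-7B-instruct
Qwen/Qwen3-Embedding-4B
intfloat/e5-mistral-7b-instruct
Linq-AI-Research/Linq-Embed-Mistral
Salesforce/SFR-Embedding-Mistral
Salesforce/SFR-Embedding-2_R
nvidia/llama-embed-nemotron-8b
"

# Print the base URL of the container that should serve the given model.
base_url_for_model() {
  for m in $SGLANG_MODELS; do
    if [ "$m" = "$1" ]; then
      echo "$SGLANG_URL"
      return 0
    fi
  done
  # Everything else is assumed to live in the default bundle.
  echo "$DEFAULT_URL"
}
```

With this in place, base_url_for_model Qwen/Qwen3-Embedding-4B selects the sglang container, while a model outside the list, such as a BGE-M3 identifier, selects default.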

What’s Next

Model Catalog - complete list of supported models
Docker Deployment - tags, GPU configuration, and Docker Compose
Deployment Overview - from single container to Kubernetes