Bundles
Why Bundles
Python ML libraries often have conflicting dependency requirements. Models using trust_remote_code=True or specialized backends can pin incompatible versions of transformers, torch, or sglang. SIE solves this with bundles: each bundle is a self-contained environment with compatible dependencies, built into its own Docker image.
Each bundle is a YAML file under packages/sie_server/bundles/ that lists the adapters it enables and the pinned dependency versions needed by those adapters. At build time, the Dockerfile selects one bundle via the BUNDLE build arg and installs only that bundle’s deps.
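As a rough sketch of the build-time selection described above, the helper below prints a docker build command for one bundle. Only the BUNDLE build arg comes from this page; the function name, the local image tag, and the build context (the current directory) are assumptions for illustration, so adjust them to match the repository's actual Dockerfile location.

```shell
# Hypothetical helper: prints (does not run) a docker build command for a bundle.
# BUNDLE is the build arg named in the docs; the image name "sie-server:local-<bundle>"
# and the build context "." are assumptions.
bundle_build_cmd() {
  printf 'docker build --build-arg BUNDLE=%s -t sie-server:local-%s .\n' "$1" "$1"
}

# Show the command that would build the sglang bundle.
bundle_build_cmd sglang
```

Printing the command first makes it easy to review before running it, for example with `eval "$(bundle_build_cmd sglang)"` from the directory that holds the Dockerfile.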
Published Bundles
Two bundles are published to GHCR today. The default bundle covers every model unless it needs the SGLang runtime.
| Bundle | Purpose | Key Models |
|---|---|---|
| default | Everything that runs on standard transformers + Flash Attention | BGE-M3, E5, Stella, Qwen3, GritLM, NV-Embed, ColBERT, ColPali, ColQwen2, GLiNER, GLiREL, GLiClass, Florence-2, Donut, CLIP, SigLIP, Grounding DINO, OwlV2, SPLADE, and more |
| sglang | Large LLM embeddings (4B+ params) served through the SGLang backend | gte-Qwen2-7B, Qwen3-Embedding-4B, E5-Mistral-7B, Linq-Embed-Mistral, SFR-Embedding-Mistral, SFR-Embedding-2_R, llama-embed-nemotron-8b |
The previous gliner and florence2 bundles no longer exist; their adapters and dependencies were folded into default once the underlying version conflicts were resolved.
There is also an experimental transformers5 bundle in the repo for adapters that require transformers>=5.0 (currently LightOnOCR). It is not published to GHCR and is intended for local builds.
Bundle Contents
default
The default bundle is the broad, general-purpose image. It bundles the standard transformers, sentence-transformers, Flash Attention, and the NER/vision adapters.
Included adapter families:
- Dense encoders: BERT (flash), ModernBERT (flash), BGE-M3, Qwen2, XLM-RoBERTa, Nomic, GTE, Stella, sentence-transformers, PyTorch embedding
- Cross-encoders / rerankers: BERT, ModernBERT, Qwen2, Jina (flash), NLI classification
- Multi-vector / late-interaction: ColBERT, ColBERT + ModernBERT, ColBERT + rotary, ColPali, ColQwen2, NeMo ColEmbed
- Sparse: SPLADE (flash), GTE sparse (flash)
- Vision and vision-language: CLIP, SigLIP, Grounding DINO, OwlV2, Florence-2, Donut
- Zero-shot NER / extraction: GLiNER, GLiREL, GLiClass
sglang
A minimal bundle wired to the SGLang runtime for large LLM embeddings. Use this image when you want memory-efficient serving of 4B+ parameter embedding models.
Included models:
- Alibaba-NLP/gte-Qwen2-7B-instruct
- Qwen/Qwen3-Embedding-4B
- intfloat/e5-mistral-7b-instruct
- Linq-AI-Research/Linq-Embed-Mistral
- Salesforce/SFR-Embedding-Mistral, Salesforce/SFR-Embedding-2_R
- nvidia/llama-embed-nemotron-8b
Docker Images
Each bundle is published for three platforms: cpu, cuda11, and cuda12. The image tag format is {version}-{platform}-{bundle}, with a floating latest-{platform}-{bundle} tag that tracks the most recent release.
    # Default bundle (CPU)
    docker run -p 8080:8080 ghcr.io/superlinked/sie-server:latest-cpu-default

    # Default bundle (CUDA 12, recommended for GPU)
    docker run --gpus all -p 8080:8080 ghcr.io/superlinked/sie-server:latest-cuda12-default

    # SGLang bundle (CUDA 12)
    docker run --gpus all -p 8080:8080 ghcr.io/superlinked/sie-server:latest-cuda12-sglang

    # Pin to a specific release
    docker run --gpus all -p 8080:8080 ghcr.io/superlinked/sie-server:v0.2.0-cuda12-default

CUDA 11 variants (latest-cuda11-default, latest-cuda11-sglang) are published for older NVIDIA drivers.
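The tag scheme above can be captured in a small shell function, which helps keep deployment scripts consistent. This is a minimal sketch: the function name sie_image is hypothetical, and it only composes a string from the documented {version}-{platform}-{bundle} format. It does not check that a given tag has actually been published.

```shell
# Hypothetical helper: composes an image reference from version, platform, bundle.
# It rejects platforms other than the three listed in the docs, but does not
# verify that the resulting tag exists in the registry.
sie_image() {
  case "$2" in
    cpu|cuda11|cuda12) ;;
    *) echo "unknown platform: $2" >&2; return 1 ;;
  esac
  printf 'ghcr.io/superlinked/sie-server:%s-%s-%s\n' "$1" "$2" "$3"
}

sie_image latest cuda12 default   # prints ghcr.io/superlinked/sie-server:latest-cuda12-default
sie_image v0.2.0 cuda12 default   # prints ghcr.io/superlinked/sie-server:v0.2.0-cuda12-default
```

The output can be passed straight to docker run, for example `docker run --gpus all -p 8080:8080 "$(sie_image latest cuda12 sglang)"`.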
Bundle Selection
Choose a bundle based on the models you need:
- Start with default. It covers dense, sparse, multi-vector, cross-encoder, vision, and extraction models, which is the overwhelming majority of use cases.
- Use sglang when you need to serve large LLM embedding models (4B+ params) with the SGLang backend. Run it as a second container alongside default and route requests by model name.
Models are loaded on first request. The bundle only determines which models are available inside a given image.
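Routing by model name across the two containers could look like the sketch below. It assumes the default container is published on host port 8080 and the sglang container on host port 8081; both port choices and the sie_base_url function are hypothetical, and a real deployment might route in a gateway or load balancer instead. The model names in the sglang branch are the ones listed for that bundle on this page.

```shell
# Hypothetical client-side router: returns the base URL of the container that
# serves a given model. Assumes the two containers were started as:
#   docker run -p 8080:8080 ... :latest-cuda12-default
#   docker run -p 8081:8080 ... :latest-cuda12-sglang
sie_base_url() {
  case "$1" in
    Alibaba-NLP/gte-Qwen2-7B-instruct|\
    Qwen/Qwen3-Embedding-4B|\
    intfloat/e5-mistral-7b-instruct|\
    Linq-AI-Research/Linq-Embed-Mistral|\
    Salesforce/SFR-Embedding-Mistral|\
    Salesforce/SFR-Embedding-2_R|\
    nvidia/llama-embed-nemotron-8b)
      echo "http://localhost:8081" ;;   # sglang bundle
    *)
      echo "http://localhost:8080" ;;   # everything else goes to default
  esac
}

sie_base_url "Qwen/Qwen3-Embedding-4B"   # prints http://localhost:8081
```

Sending every unlisted model to default mirrors the guidance above: default covers every model unless it needs the SGLang runtime.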
What’s Next
- Model Catalog - complete list of supported models
- Docker Deployment - tags, GPU configuration, and Docker Compose
- Deployment Overview - from single container to Kubernetes