---
title: Bundles
description: Dependency isolation for models with conflicting requirements.
canonical_url: https://superlinked.com/docs/engine/bundles
last_updated: 2026-05-20
---

## Why Bundles

Source: [packages/sie_server/src/sie_server/core/deps.py](https://github.com/superlinked/sie/blob/main/packages/sie_server/src/sie_server/core/deps.py)

Python ML libraries often have conflicting dependency requirements. Models using `trust_remote_code=True` or specialized backends can pin incompatible versions of `transformers`, `torch`, or `sglang`. SIE solves this with bundles: each bundle is a self-contained environment with compatible dependencies, built into its own Docker image.

Each bundle is a YAML file under `packages/sie_server/bundles/` that lists the adapters it enables and the pinned dependency versions those adapters need. At build time, the Dockerfile selects one bundle via the `BUNDLE` build arg and installs only that bundle's dependencies.
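
As a rough sketch of the build-time selection, a local build could pass the bundle name through the `BUNDLE` build arg. Only the `BUNDLE` arg comes from the description above. The helper name `build_cmd`, the Dockerfile path placeholder, the build context (`.`), and the local image tag are assumptions for illustration, so check the repository for the real locations before running the printed command.

```bash
#!/usr/bin/env bash
set -euo pipefail

# build_cmd BUNDLE DOCKERFILE
# Prints (does not run) a docker build command that selects one bundle.
# ASSUMPTION: the Dockerfile path, the "." build context, and the
# "sie-server:local-<bundle>" tag are hypothetical placeholders.
build_cmd() {
  local bundle="${1:?usage: build_cmd BUNDLE DOCKERFILE}"
  local dockerfile="${2:?usage: build_cmd BUNDLE DOCKERFILE}"
  printf 'docker build --build-arg BUNDLE=%s -f %s -t sie-server:local-%s .\n' \
    "$bundle" "$dockerfile" "$bundle"
}

# Print the command for the default bundle; replace the placeholder path.
build_cmd default path/to/Dockerfile
```

Printing the command rather than running it keeps the sketch safe to paste into a terminal and makes the selected bundle easy to review first.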

---

## Published Bundles

Source: [packages/sie_server/bundles/](https://github.com/superlinked/sie/blob/main/packages/sie_server/bundles/)

Two bundles are published to GHCR today. The `default` bundle covers every model except those that need the SGLang runtime.

| Bundle | Purpose | Key Models |
|--------|---------|------------|
| `default` | Everything that runs on standard `transformers` + Flash Attention | BGE-M3, E5, Stella, Qwen3, GritLM, NV-Embed, ColBERT, ColPali, ColQwen2, GLiNER, GLiREL, GLiClass, Florence-2, Donut, CLIP, SigLIP, Grounding DINO, OwlV2, SPLADE, and more |
| `sglang` | Large LLM embeddings (4B+ params) served through the SGLang backend | gte-Qwen2-7B, Qwen3-Embedding-4B, E5-Mistral-7B, Linq-Embed-Mistral, SFR-Embedding-Mistral, SFR-Embedding-2_R, llama-embed-nemotron-8b |

> The previous `gliner` and `florence2` bundles no longer exist; their adapters and dependencies were folded into `default` once the underlying version conflicts were resolved.

There is also an experimental `transformers5` bundle in the repo for adapters that require `transformers>=5.0` (currently LightOnOCR). It is not published to GHCR and is intended for local builds.

---

## Bundle Contents

### default

Source: [packages/sie_server/bundles/default.yaml](https://github.com/superlinked/sie/blob/main/packages/sie_server/bundles/default.yaml)

The default bundle is the broad, general-purpose image. It includes standard `transformers`, `sentence-transformers`, Flash Attention, and the NER/vision adapters.

**Included adapter families:**
- Dense encoders: BERT (flash), ModernBERT (flash), BGE-M3, Qwen2, XLM-RoBERTa, Nomic, GTE, Stella, sentence-transformers, PyTorch embedding
- Cross-encoders / rerankers: BERT, ModernBERT, Qwen2, Jina (flash), NLI classification
- Multi-vector / late-interaction: ColBERT, ColBERT + ModernBERT, ColBERT + rotary, ColPali, ColQwen2, NeMo ColEmbed
- Sparse: SPLADE (flash), GTE sparse (flash)
- Vision and vision-language: CLIP, SigLIP, Grounding DINO, OwlV2, Florence-2, Donut
- Zero-shot NER / extraction: GLiNER, GLiREL, GLiClass

### sglang

Source: [packages/sie_server/bundles/sglang.yaml](https://github.com/superlinked/sie/blob/main/packages/sie_server/bundles/sglang.yaml)

A minimal bundle wired to the SGLang runtime for large LLM embeddings. Use this image when you want memory-efficient serving of 4B+ parameter embedding models.

**Included models:**
- `Alibaba-NLP/gte-Qwen2-7B-instruct`
- `Qwen/Qwen3-Embedding-4B`
- `intfloat/e5-mistral-7b-instruct`
- `Linq-AI-Research/Linq-Embed-Mistral`
- `Salesforce/SFR-Embedding-Mistral`, `Salesforce/SFR-Embedding-2_R`
- `nvidia/llama-embed-nemotron-8b`

---

## Docker Images

Each bundle is published for three platforms: `cpu`, `cuda11`, and `cuda12`. The image tag format is `{version}-{platform}-{bundle}`, with a floating `latest-{platform}-{bundle}` tag that tracks the most recent release.

```bash
# Default bundle (CPU)
docker run -p 8080:8080 ghcr.io/superlinked/sie-server:latest-cpu-default

# Default bundle (CUDA 12, recommended for GPU)
docker run --gpus all -p 8080:8080 ghcr.io/superlinked/sie-server:latest-cuda12-default

# SGLang bundle (CUDA 12)
docker run --gpus all -p 8080:8080 ghcr.io/superlinked/sie-server:latest-cuda12-sglang

# Pin to a specific release
docker run --gpus all -p 8080:8080 ghcr.io/superlinked/sie-server:v0.2.0-cuda12-default
```

CUDA 11 variants (`latest-cuda11-default`, `latest-cuda11-sglang`) are published for older NVIDIA drivers.
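
For scripted deployments, the `{version}-{platform}-{bundle}` tag format can be composed by a small helper. This is a minimal sketch: the function name `image_ref` is hypothetical, and the accepted platforms and bundles simply mirror the ones listed on this page.

```bash
#!/usr/bin/env bash
set -euo pipefail

# image_ref VERSION PLATFORM BUNDLE
# Prints the full image reference, rejecting platforms and bundles
# that this page does not list as published.
image_ref() {
  local version="$1" platform="$2" bundle="$3"
  case "$platform" in
    cpu|cuda11|cuda12) ;;
    *) echo "unknown platform: $platform" >&2; return 1 ;;
  esac
  case "$bundle" in
    default|sglang) ;;
    *) echo "bundle not published: $bundle" >&2; return 1 ;;
  esac
  printf 'ghcr.io/superlinked/sie-server:%s-%s-%s\n' "$version" "$platform" "$bundle"
}

# Floating tag and pinned release, matching the examples above.
image_ref latest cuda12 default
image_ref v0.2.0 cuda12 default
```

The output can be passed straight to `docker run`, for example `docker run -p 8080:8080 "$(image_ref latest cpu default)"`.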

---

## Bundle Selection

Choose a bundle based on the models you need:

1. **Start with `default`.** It covers dense, sparse, multi-vector, cross-encoder, vision, and extraction models, which together account for the overwhelming majority of use cases.
2. **Use `sglang`** when you need to serve large LLM embedding models (4B+ params) with the SGLang backend. Run it as a second container alongside `default` and route requests by model name.

Models are loaded on first request. The bundle only determines which models are available inside a given image.
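
Running both bundles side by side and routing by model name might look like the sketch below. The host ports (8080 for `default`, 8081 for `sglang`), the variable names, and the `base_url_for_model` helper are assumptions for illustration. The model list is the `sglang` list from this page, and `BAAI/bge-m3` is used only as an example of a model served by `default`.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Start one container per bundle (commented out so the sketch runs without Docker).
# ASSUMPTION: the sglang container is mapped to host port 8081.
# docker run -d --gpus all -p 8080:8080 ghcr.io/superlinked/sie-server:latest-cuda12-default
# docker run -d --gpus all -p 8081:8080 ghcr.io/superlinked/sie-server:latest-cuda12-sglang

DEFAULT_URL="http://localhost:8080"
SGLANG_URL="http://localhost:8081"

# base_url_for_model MODEL
# Prints the base URL of the container that serves MODEL.
# Models in the sglang bundle go to SGLANG_URL; everything else goes to DEFAULT_URL.
base_url_for_model() {
  case "$1" in
    Alibaba-NLP/gte-Qwen2-7B-instruct|\
    Qwen/Qwen3-Embedding-4B|\
    intfloat/e5-mistral-7b-instruct|\
    Linq-AI-Research/Linq-Embed-Mistral|\
    Salesforce/SFR-Embedding-Mistral|\
    Salesforce/SFR-Embedding-2_R|\
    nvidia/llama-embed-nemotron-8b)
      echo "$SGLANG_URL" ;;
    *)
      echo "$DEFAULT_URL" ;;
  esac
}

base_url_for_model "Qwen/Qwen3-Embedding-4B"
base_url_for_model "BAAI/bge-m3"
```

Keeping the routing table in one function means a client or reverse proxy only needs updating when the `sglang` model list changes.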

---

## What's Next

- [Model Catalog](/models) - complete list of supported models
- [Docker Deployment](/docs/deployment/docker/) - tags, GPU configuration, and Docker Compose
- [Deployment Overview](/docs/deployment/) - from single container to Kubernetes