
Bundles

Python ML libraries often have conflicting dependency requirements; in particular, models that use trust_remote_code=True may depend on specific transformers versions. SIE solves this with bundles: each bundle is a self-contained environment whose dependencies are mutually compatible.

For example:

  • sentence-transformers requires transformers>=4.57
  • gliner requires transformers>=4.51.3,<5
  • These cannot coexist in the same environment

Bundles group models with compatible dependencies into separate Docker images.


| Bundle    | Purpose                 | Key models                                 |
| --------- | ----------------------- | ------------------------------------------ |
| default   | Standard models         | BGE-M3, E5, Qwen3, Stella, GritLM, ColBERT |
| gliner    | GLiNER ecosystem models | GLiNER, GLiREL, GLiClass, NuNER            |
| sglang    | Large LLM embeddings    | gte-Qwen2-7B, E5-Mistral-7B, Qwen3-4B      |
| florence2 | Vision-language models  | Florence-2, Donut                          |
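For scripting deployments, the bundle membership listed on this page can be captured in a small lookup. This is only a sketch derived from the model lists here; `BUNDLE_PREFIXES` and `bundle_for_model` are hypothetical helpers, not part of SIE.

```python
# Sketch: map model-ID prefixes to SIE bundle tags, based on this page's lists.
# BUNDLE_PREFIXES and bundle_for_model are illustrative helpers, not an SIE API.
BUNDLE_PREFIXES = {
    "urchade/gliner": "gliner",
    "jackboyla/glirel": "gliner",
    "knowledgator/gliclass": "gliner",
    "numind/NuNER": "gliner",
    "Alibaba-NLP/gte-Qwen2-7B-instruct": "sglang",
    "Qwen/Qwen3-Embedding-4B": "sglang",
    "intfloat/e5-mistral-7b-instruct": "sglang",
    "microsoft/Florence-2": "florence2",
    "naver-clova-ix/donut": "florence2",
}

def bundle_for_model(model_id: str) -> str:
    """Return the bundle tag for a model ID, falling back to 'default'."""
    for prefix, bundle in BUNDLE_PREFIXES.items():
        if model_id.startswith(prefix):
            return bundle
    return "default"
```

Note that prefix matching keeps related model families together, e.g. every `microsoft/Florence-2-*` variant resolves to the florence2 bundle.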

The default bundle includes most models using transformers>=4.57. This is the recommended starting point.

Included models:

  • Dense: BAAI/bge-m3, intfloat/e5-*, Alibaba-NLP/gte-multilingual-base, Alibaba-NLP/gte-Qwen2-1.5B-instruct
  • Stella: NovaSearch/stella_en_400M_v5, NovaSearch/stella_en_1.5B_v5
  • GritLM: GritLM/GritLM-7B
  • Qwen3: Qwen/Qwen3-Embedding-0.6B
  • NVIDIA: nvidia/NV-Embed-v2
  • Sparse: OpenSearch neural sparse, SPLADE variants, Granite sparse
  • ColBERT: jinaai/jina-colbert-v2, answerdotai/answerai-colbert-small-v1

The gliner bundle provides named entity recognition, relation extraction, and zero-shot classification models from the GLiNER ecosystem. It requires the gliner, glirel, and gliclass libraries with transformers>=4.51.3,<5.

Included models:

  • NER: urchade/gliner_*, EmergentMethods/gliner_large_news-v2.1
  • Biomedical NER: Ihor/gliner-biomed-large-v1.0
  • Relation extraction: jackboyla/glirel-large-v0
  • Zero-shot classification: knowledgator/gliclass-*
  • Span detection: numind/NuNER_Zero, numind/NuNER_Zero-span

The sglang bundle serves large LLM embedding models (4B+ parameters) through the SGLang backend for memory efficiency.

Included models:

  • Alibaba-NLP/gte-Qwen2-7B-instruct
  • Qwen/Qwen3-Embedding-4B
  • intfloat/e5-mistral-7b-instruct
  • Linq-AI-Research/Linq-Embed-Mistral
  • Salesforce/SFR-Embedding-Mistral, Salesforce/SFR-Embedding-2_R
  • nvidia/llama-embed-nemotron-8b

The florence2 bundle covers Microsoft Florence-2 and Donut vision-language models. It requires timm for the DaViT vision encoder.

Included models:

  • microsoft/Florence-2-base, microsoft/Florence-2-large
  • microsoft/Florence-2-base-ft
  • mynkchaudhry/Florence-2-FT-DocVQA
  • naver-clova-ix/donut-base-finetuned-cord-v2 (receipt parsing)
  • naver-clova-ix/donut-base-finetuned-docvqa (document QA)
  • naver-clova-ix/donut-base-finetuned-rvlcdip (document classification)

Each bundle ships as a separate Docker image, identified by the bundle name as its tag:

# Default bundle (recommended)
docker run -p 8080:8080 ghcr.io/superlinked/sie-server:default
# With GPU
docker run --gpus all -p 8080:8080 ghcr.io/superlinked/sie-server:default
# GLiNER bundle for NER/relation extraction
docker run --gpus all -p 8080:8080 ghcr.io/superlinked/sie-server:gliner
# SGLang bundle for large LLM models
docker run --gpus all -p 8080:8080 ghcr.io/superlinked/sie-server:sglang
# Florence-2 bundle for vision models
docker run --gpus all -p 8080:8080 ghcr.io/superlinked/sie-server:florence2
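When launching servers from a script, a small command builder can keep the image tags and flags consistent. `docker_run_command` is a hypothetical helper that mirrors the invocations above; it only constructs the command string.

```python
# Sketch: build a `docker run` command for a given SIE bundle tag, mirroring
# the commands above. docker_run_command is an illustrative helper, not an SIE
# tool; adjust port mappings and GPU flags to your environment.
def docker_run_command(bundle: str, port: int = 8080, gpu: bool = True) -> str:
    parts = ["docker", "run"]
    if gpu:
        parts += ["--gpus", "all"]
    parts += ["-p", f"{port}:{port}", f"ghcr.io/superlinked/sie-server:{bundle}"]
    return " ".join(parts)
```

For example, `docker_run_command("gliner")` reproduces the GLiNER invocation shown above.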

Choose a bundle based on the models you need:

  1. Start with default - covers most use cases including Stella, GritLM, and GTE-Qwen2-1.5B
  2. Use gliner for named entity recognition, relation extraction, or zero-shot classification
  3. Use sglang for memory-efficient large LLM embeddings (e.g. gte-Qwen2-7B)
  4. Use florence2 for document understanding and OCR

Models are loaded on first request. The bundle only determines which models are available.
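Because loading happens lazily, clients simply name the model in each request and the server loads it on first use. The exact endpoint and request schema are defined by the SIE server API (check its docs); the OpenAI-style embeddings payload below is an assumption for illustration, and `build_embed_request` is a hypothetical helper.

```python
# Sketch: construct an embedding request body. The real endpoint and schema
# depend on the SIE server API; an OpenAI-style /v1/embeddings payload is
# assumed here purely for illustration.
def build_embed_request(model: str, texts: list[str]) -> dict:
    return {"model": model, "input": texts}

# e.g. POST this JSON to http://localhost:8080/v1/embeddings (endpoint assumed)
payload = build_embed_request("BAAI/bge-m3", ["hello world"])
```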

