62 models

| Model | Task | Output | Backbone | Params | Throughput | Latency | NDCG@10 | F1 | AP |
|---|---|---|---|---|---|---|---|---|---|
| vidore/colqwen2.5-v0.2 | Encode | Multi-Vec | Qwen2 | 7.0B | 2.1 img/s | 1.9s | | | |
| vidore/colpali-v1.3-hf | Encode | Multi-Vec | PaliGemma | 3.0B | 5.8 img/s | 619.1ms | | | |
| Alibaba-NLP/gte-Qwen2-1.5B-instruct | Encode | Dense | | 1.5B | 12.3K tok/s | 261.1ms | | | |
| NovaSearch/stella_en_1.5B_v5 | Encode | Dense | | 1.5B | 12.8K tok/s | 265.9ms | | | |
| laion/CLIP-ViT-H-14-laion2B-s32B-b79K | Encode | Dense | | 986M | 321.5 img/s | 503.8ms | | | |
| Qwen/Qwen3-Embedding-0.6B | Encode | Dense | | 600M | 20.6K tok/s | 156.9ms | | | |
| BAAI/bge-m3 | Encode | Dense / Sparse / Multi-Vec | | 568M | 33.2K tok/s | 93.4ms | | | |
| BAAI/bge-m3 | Score | Dense / Sparse / Multi-Vec | | 568M | 2.8K tok/s | 56.8ms | | | |
| BAAI/bge-reranker-large | Score | Score | | 560M | 6.6K tok/s | 41.4ms | | | |
| intfloat/multilingual-e5-large | Encode | Dense | | 560M | 29.8K tok/s | 108.6ms | | | |
| intfloat/multilingual-e5-large-instruct | Encode | Dense | | 560M | 29.4K tok/s | 106.9ms | | | |
| EmergentMethods/gliner_large_news-v2.1 | Extract | Entities | DeBERTa | 435M | | | | | |
| Ihor/gliner-biomed-large-v1.0 | Extract | Entities | DeBERTa | 435M | | | | | |
| jackboyla/glirel-large-v0 | Extract | Relations | DeBERTa | 435M | | | | | |
| mixedbread-ai/mxbai-colbert-large-v1 | Encode | Multi-Vec | | 435M | 43.3K tok/s | 74.9ms | | | |
| mixedbread-ai/mxbai-colbert-large-v1 | Score | Multi-Vec | | 435M | 4.0K tok/s | 45.6ms | | | |
| mixedbread-ai/mxbai-rerank-large-v2 | Encode | Score | | 435M | | | | | |
| mixedbread-ai/mxbai-rerank-large-v2 | Score | Score | | 435M | 2.2K tok/s | 1.4s | | | |
| urchade/gliner_large-v2.1 | Extract | Entities | DeBERTa | 435M | | | | | |
| urchade/gliner_multi-v2.1 | Extract | Entities | DeBERTa | 435M | | | | | |
| urchade/gliner_multi_pii-v1 | Extract | Entities | DeBERTa | 435M | | | | | |
| openai/clip-vit-large-patch14 | Encode | Dense | | 428M | 706.0 img/s | 298.1ms | | | |
| NovaSearch/stella_en_400M_v5 | Encode | Dense | ModernBERT | 400M | 27.1K tok/s | 115.7ms | | | |
| google/owlv2-base-patch16-ensemble | Extract | Bounding Boxes | CLIP | 400M | 1.0 mpix/s | 954.6ms | | | |
| google/siglip-so400m-patch14-224 | Encode | Dense | | 400M | 348.1 img/s | 439.8ms | | | |
| google/siglip-so400m-patch14-384 | Encode | Dense | | 400M | 354.9 img/s | 488.3ms | | | |
| intfloat/e5-large-v2 | Encode | Dense | | 335M | 33.2K tok/s | 86.6ms | | | |
| Alibaba-NLP/gte-multilingual-base | Encode | Dense | ModernBERT | 305M | 55.1K tok/s | 63.1ms | | | |
| lightonai/GTE-ModernColBERT-v1 | Encode | Multi-Vec | | 305M | 28.0K tok/s | 103.9ms | | | |
| lightonai/GTE-ModernColBERT-v1 | Score | Multi-Vec | | 305M | 231 tok/s | 313.4ms | | | |
| lightonai/Reason-ModernColBERT | Encode | Multi-Vec | | 305M | 33.0K tok/s | 82.2ms | | | |
| opensearch-project/opensearch-neural-sparse-encoding-doc-v3-gte | Encode | Sparse | ModernBERT | 305M | 34.2K tok/s | 93.7ms | | | |
| google/embeddinggemma-300m | Encode | Dense | | 300M | 79.6K tok/s | 55.7ms | | | |
| BAAI/bge-reranker-base | Score | Score | | 278M | 5.0K tok/s | 33.2ms | | | |
| jinaai/jina-reranker-v2-base-multilingual | Score | Score | XLM-RoBERTa | 278M | 8.3K tok/s | 32.0ms | | | |
| IDEA-Research/grounding-dino-base | Extract | Bounding Boxes | Swin | 250M | 0.8 mpix/s | 785.8ms | | | |
| laion/CLIP-ViT-B-32-laion2B-s34B-b79K | Encode | Dense | | 151M | 1176.0 img/s | 178.6ms | | | |
| openai/clip-vit-base-patch32 | Encode | Dense | | 151M | 651.0 img/s | 319.4ms | | | |
| Alibaba-NLP/gte-reranker-modernbert-base | Score | Score | | 150M | 6.2K tok/s | 41.9ms | | | |
| mixedbread-ai/mxbai-rerank-base-v2 | Encode | Score | | 150M | | | | | |
| mixedbread-ai/mxbai-rerank-base-v2 | Score | Score | | 150M | 7.0K tok/s | 457.1ms | | | |
| urchade/gliner_medium-v2.1 | Extract | Entities | DeBERTa | 150M | | | | | |
| nomic-ai/nomic-embed-text-v2-moe | Encode | Dense | | 137M | 13.0K tok/s | 149.6ms | | | |
| jinaai/jina-colbert-v2 | Encode | Multi-Vec | XLM-RoBERTa | 110M | 28.5K tok/s | 105.7ms | | | |
| jinaai/jina-colbert-v2 | Score | Multi-Vec | XLM-RoBERTa | 110M | 1.4K tok/s | 226.1ms | | | |
| naver/splade-cocondenser-selfdistil | Encode | Sparse | | 110M | 40.0K tok/s | 72.4ms | | | |
| naver/splade-v3 | Encode | Sparse | | 110M | 29.6K tok/s | 83.7ms | | | |
| numind/NuNER_Zero | Extract | Entities | DeBERTa | 110M | | | | | |
| numind/NuNER_Zero-span | Extract | Entities | DeBERTa | 110M | | | | | |
| opensearch-project/opensearch-neural-sparse-encoding-doc-v2-distill | Encode | Sparse | | 110M | 49.1K tok/s | 63.3ms | | | |
| opensearch-project/opensearch-neural-sparse-encoding-doc-v3-distill | Encode | Sparse | | 110M | 50.1K tok/s | 60.7ms | | | |
| opensearch-project/opensearch-neural-sparse-encoding-v2-distill | Encode | Sparse | | 110M | 44.2K tok/s | 63.3ms | | | |
| intfloat/e5-base-v2 | Encode | Dense | | 109M | 53.2K tok/s | 57.9ms | | | |
| IDEA-Research/grounding-dino-tiny | Extract | Bounding Boxes | Swin | 80M | 0.9 mpix/s | 532.6ms | | | |
| urchade/gliner_small-v2.1 | Extract | Entities | DeBERTa | 60M | | | | | |
| answerdotai/answerai-colbert-small-v1 | Encode | Multi-Vec | | 33M | 59.1K tok/s | 47.9ms | | | |
| answerdotai/answerai-colbert-small-v1 | Score | Multi-Vec | | 33M | 1.7K tok/s | 121.7ms | | | |
| cross-encoder/ms-marco-MiniLM-L-12-v2 | Score | Score | | 33M | 8.2K tok/s | 31.7ms | | | |
| intfloat/e5-small-v2 | Encode | Dense | | 33M | 58.3K tok/s | 49.7ms | | | |
| rasyosef/splade-mini | Encode | Sparse | | 33M | 56.3K tok/s | 56.0ms | | | |
| mixedbread-ai/mxbai-edge-colbert-v0-32m | Encode | Multi-Vec | | 32M | 45.9K tok/s | 59.7ms | | | |
| sentence-transformers/all-MiniLM-L6-v2 | Encode | Dense | | 22M | 55.3K tok/s | 53.3ms | | | |

Self-hosted inference for search & document processing

Cut API costs by 50x, boost quality with 85+ SOTA models, and keep your data in your own cloud.


Contact us

Tell us about your use case and we'll get back to you shortly.