Model catalog
Every model we serve, one primitive at a time. Pick a benchmark to rank by quality; the hardware control drives latency, throughput and cost.
| Model | Tags | Size | Quality | Latency | Throughput | Cost $/1M |
|---|---|---|---|---|---|---|
| Alibaba-NLP/gte-Qwen2-7B-instruct | Long context, Dense | 7.6B | 0.4040 ndcg@10 | 846 ms | 3.5K tok/s | $0.063 |
| GritLM/GritLM-7B | Dense | 7.2B | 0.3972 ndcg@10 | 2.1 s | 1.4K tok/s | $0.157 |
| Linq-AI-Research/Linq-Embed-Mistral | Long context, Dense | 7.1B | 0.4066 ndcg@10 | 818 ms | 2.9K tok/s | $0.075 |
| Salesforce/SFR-Embedding-2_R | Long context, Dense | 7.1B | 0.4285 ndcg@10 | 682 ms | 2.9K tok/s | $0.076 |
| Salesforce/SFR-Embedding-Mistral | Dense | 7.1B | 0.4085 ndcg@10 | 888 ms | 3.0K tok/s | $0.075 |
| intfloat/e5-mistral-7b-instruct | Dense | 7.1B | 0.3932 ndcg@10 | 915 ms | 3.0K tok/s | $0.074 |
| vidore/colqwen2.5-v0.2 | Multimodal, Multi-vector | 7.0B | — | 1.9 s | 7.6 mpix/s | — |
| nvidia/llama-nemoretriever-colembed-3b-v1 | Multimodal, Multilingual, Long context, Multi-vector | 4.4B | — | 6.1 s | 0.7 img/s | — |
| Qwen/Qwen3-Embedding-4B | Long context, Dense | 4.0B | 0.4103 ndcg@10 | 464 ms | 5.7K tok/s | $0.039 |
| vidore/colpali-v1.3-hf | Multimodal, Multi-vector | 3.0B | — | 582 ms | 23.0 mpix/s | — |
| Qwen/Qwen3-VL-Embedding-2B | Multimodal, Long context, Dense | 2.1B | — | 36 ms | 494 tok/s | $0.450 |
| Alibaba-NLP/gte-Qwen2-1.5B-instruct | Long context, Dense | 1.8B | 0.2547 ndcg@10 | 261 ms | 12.3K tok/s | $0.018 |
| NovaSearch/stella_en_1.5B_v5 | Dense | 1.5B | 0.4219 ndcg@10 | 258 ms | 12.8K tok/s | $0.017 |
| laion/CLIP-ViT-H-14-laion2B-s32B-b79K | Multimodal, Dense | 986M | — | 353 ms | 438 tok/s | $0.508 |
| google/siglip-so400m-patch14-384 | Multimodal, Dense | 878M | — | 347 ms | 451 tok/s | $0.493 |
| google/siglip-so400m-patch14-224 | Multimodal, Dense | 877M | — | 284 ms | 456 tok/s | $0.487 |
| Qwen/Qwen3-Embedding-0.6B | Long context, Dense | 596M | 0.3689 ndcg@10 | 157 ms | 20.6K tok/s | $0.011 |
| BAAI/bge-m3 | Long context, Dense, Sparse, Multi-vector | 568M | 0.3144 ndcg@10 | 93 ms | 33.2K tok/s | $0.0067 |
| Snowflake/snowflake-arctic-embed-l-v2.0 | Multilingual, Long context, Dense | 568M | 0.3519 ndcg@10 | — | — | — |
| intfloat/multilingual-e5-large | Multilingual, Dense | 560M | 0.3063 ndcg@10 | 109 ms | 29.8K tok/s | $0.0074 |
| intfloat/multilingual-e5-large-instruct | Multilingual, Dense | 560M | 0.3521 ndcg@10 | 107 ms | 29.4K tok/s | $0.0076 |
| jinaai/jina-colbert-v2 | Multilingual, Long context, Multi-vector | 559M | 0.3583 ndcg@10 | 106 ms | 28.5K tok/s | $0.0078 |
| nomic-ai/nomic-embed-text-v2-moe | Multilingual, Dense | 475M | — | 150 ms | 13.0K tok/s | $0.017 |
| NovaSearch/stella_en_400M_v5 | Dense | 435M | 0.4125 ndcg@10 | 116 ms | 27.1K tok/s | $0.0082 |
| openai/clip-vit-large-patch14 | Multimodal, Dense | 428M | — | 228 ms | 977 tok/s | $0.227 |
| google/siglip2-base-patch16-224 | Multimodal, Dense | 375M | — | 69 ms | 1.6K tok/s | $0.140 |
| mixedbread-ai/mxbai-colbert-large-v1 | Multi-vector | 335M | 0.3467 ndcg@10 | 75 ms | 43.3K tok/s | $0.0051 |
| intfloat/e5-large-v2 | Dense | 335M | 0.3715 ndcg@10 | 87 ms | 33.2K tok/s | $0.0067 |
| mixedbread-ai/mxbai-embed-large-v1 | Dense | 335M | 0.3865 ndcg@10 | — | — | — |
| Alibaba-NLP/gte-multilingual-base | Multilingual, Long context, Dense | 305M | 0.3677 ndcg@10 | 57 ms | 55.1K tok/s | $0.0040 |
| Snowflake/snowflake-arctic-embed-m-v2.0 | Multilingual, Long context, Dense | 305M | 0.2489 ndcg@10 | — | — | — |
| google/embeddinggemma-300m | Dense | 303M | 0.2619 ndcg@10 | 87 ms | 27.2K tok/s | $0.0082 |
| Marqo/marqo-fashionSigLIP | Multimodal, Dense | 203M | — | — | — | — |
| laion/CLIP-ViT-B-32-laion2B-s34B-b79K | Multimodal, Dense | 151M | — | 219 ms | 1.0K tok/s | $0.218 |
| openai/clip-vit-base-patch32 | Multimodal, Dense | 151M | — | 234 ms | 958 tok/s | $0.232 |
| lightonai/GTE-ModernColBERT-v1 | Long context, Multi-vector | 149M | 0.3618 ndcg@10 | 104 ms | 28.0K tok/s | $0.0079 |
| lightonai/Reason-ModernColBERT | Long context, Multi-vector | 149M | 0.3580 ndcg@10 | 82 ms | 33.0K tok/s | $0.0067 |
| Alibaba-NLP/gte-modernbert-base | Long context, Dense | 149M | 0.3664 ndcg@10 | — | — | — |
| ibm-granite/granite-embedding-english-r2 | Long context, Dense | 149M | 0.3450 ndcg@10 | — | — | — |
| nomic-ai/modernbert-embed-base | Long context, Dense | 149M | 0.3337 ndcg@10 | — | — | — |
| opensearch-project/opensearch-neural-sparse-encoding-doc-v3-gte | Sparse | 137M | 0.3524 ndcg@10 | 94 ms | 34.2K tok/s | $0.0065 |
| opensearch-project/opensearch-neural-sparse-encoding-v1 | Sparse | 133M | 0.3600 ndcg@10 | 69 ms | 48.7K tok/s | $0.0046 |
| naver/splade-cocondenser-selfdistil | Sparse | 110M | 0.3403 ndcg@10 | 72 ms | 40.0K tok/s | $0.0056 |
| naver/splade-v3 | Sparse | 110M | 0.3404 ndcg@10 | 84 ms | 29.6K tok/s | $0.0075 |
| prithivida/Splade_PP_en_v2 | Sparse | 110M | 0.3161 ndcg@10 | 55 ms | 57.5K tok/s | $0.0039 |
| colbert-ir/colbertv2.0 | Multi-vector | 110M | 0.2647 ndcg@10 | 66 ms | 43.0K tok/s | $0.0052 |
| intfloat/e5-base-v2 | Dense | 109M | 0.3541 ndcg@10 | 58 ms | 53.2K tok/s | $0.0042 |
| opensearch-project/opensearch-neural-sparse-encoding-doc-v2-distill | Sparse | 67M | 0.3396 ndcg@10 | 63 ms | 49.1K tok/s | $0.0045 |
| opensearch-project/opensearch-neural-sparse-encoding-doc-v3-distill | Sparse | 67M | 0.3294 ndcg@10 | 61 ms | 50.1K tok/s | $0.0044 |
| opensearch-project/opensearch-neural-sparse-encoding-v2-distill | Sparse | 67M | 0.3373 ndcg@10 | 63 ms | 44.2K tok/s | $0.0050 |
| ibm-granite/granite-embedding-small-english-r2 | Long context, Dense | 48M | 0.3016 ndcg@10 | — | — | — |
| answerdotai/answerai-colbert-small-v1 | Multi-vector | 33M | 0.3715 ndcg@10 | 48 ms | 59.1K tok/s | $0.0038 |
| intfloat/e5-small-v2 | Dense | 33M | 0.3195 ndcg@10 | 50 ms | 58.3K tok/s | $0.0038 |
| mixedbread-ai/mxbai-edge-colbert-v0-32m | Long context, Multi-vector | 32M | 0.3376 ndcg@10 | 60 ms | 45.9K tok/s | $0.0048 |
| ibm-granite/granite-embedding-30m-sparse | Sparse | 30M | 0.3147 ndcg@10 | 105 ms | 31.9K tok/s | $0.0070 |
| opensearch-project/opensearch-neural-sparse-encoding-doc-v2-mini | Sparse | 23M | 0.3267 ndcg@10 | 55 ms | 51.1K tok/s | $0.0044 |
| sentence-transformers/all-MiniLM-L6-v2 | Dense | 23M | 0.2324 ndcg@10 | 53 ms | 55.3K tok/s | $0.0040 |
| rasyosef/splade-mini | Sparse | 11M | 0.3090 ndcg@10 | 56 ms | 56.3K tok/s | $0.0039 |
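
The ranking the page describes (sort by quality, with latency driven by the selected hardware) can be reproduced offline. Below is a minimal sketch in Python, assuming a handful of rows copied by hand from the table above; the `Row` type, the `rank` helper, and the 300 ms budget are illustrative names and values, not part of any catalog API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Row:
    """One catalog row; None stands for a cell the table shows as a dash."""
    model: str
    quality: Optional[float]      # ndcg@10
    latency_ms: Optional[float]   # measured latency on the selected GPU
    cost_per_1m: Optional[float]  # USD per 1M tokens


# A few rows transcribed from the first table (hypothetical in-memory copy).
ROWS = [
    Row("Salesforce/SFR-Embedding-2_R", 0.4285, 682, 0.076),
    Row("NovaSearch/stella_en_1.5B_v5", 0.4219, 258, 0.017),
    Row("NovaSearch/stella_en_400M_v5", 0.4125, 116, 0.0082),
    Row("Qwen/Qwen3-Embedding-4B", 0.4103, 464, 0.039),
    Row("mixedbread-ai/mxbai-embed-large-v1", 0.3865, None, None),
    Row("answerdotai/answerai-colbert-small-v1", 0.3715, 48, 0.0038),
]


def rank(rows: List[Row], max_latency_ms: float) -> List[Row]:
    """Keep rows with both a quality score and a latency measurement
    within the budget, then sort by quality, best first."""
    eligible = [
        r for r in rows
        if r.quality is not None
        and r.latency_ms is not None
        and r.latency_ms <= max_latency_ms
    ]
    return sorted(eligible, key=lambda r: r.quality, reverse=True)


if __name__ == "__main__":
    for r in rank(ROWS, max_latency_ms=300):
        print(f"{r.model}: {r.quality:.4f} ndcg@10, {r.latency_ms:.0f} ms")
```

Rows without a latency measurement are dropped rather than treated as fast or slow, since a missing value says nothing about how the model would perform on that GPU.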
| Model | Tags | Size | Quality | Latency | Throughput | Cost $/1M |
|---|---|---|---|---|---|---|
| Qwen/Qwen3-Reranker-4B | Long context | 4.0B | 0.6953 ndcg@10 | 580 ms | 4.4K tok/s | $0.050 |
| Qwen/Qwen3-VL-Reranker-2B | Multimodal, Long context | 2.1B | 0.6553 ndcg@10 | 35 ms | 697 tok/s | $0.319 |
| mixedbread-ai/mxbai-rerank-large-v2 | Multilingual, Long context | 1.5B | 0.6914 ndcg@10 | 767 ms | 1.9K tok/s | $0.118 |
| Qwen/Qwen3-Reranker-0.6B | Long context | 596M | 0.6536 ndcg@10 | 65 ms | 1.5K tok/s | $0.151 |
| BAAI/bge-m3 | Long context | 568M | 0.6657 ndcg@10 | 56 ms | 2.9K tok/s | $0.076 |
| BAAI/bge-reranker-v2-m3 | Multilingual, Long context | 568M | 0.6763 ndcg@10 | 92 ms | 30.0K tok/s | $0.0074 |
| BAAI/bge-reranker-large | Multilingual | 560M | 0.6404 ndcg@10 | 52 ms | 21.1K tok/s | $0.011 |
| jinaai/jina-colbert-v2 | Multilingual, Long context | 559M | 0.6391 ndcg@10 | 226 ms | 1.4K tok/s | $0.158 |
| mixedbread-ai/mxbai-rerank-base-v2 | Multilingual, Long context | 494M | 0.6638 ndcg@10 | 451 ms | 7.0K tok/s | $0.032 |
| mixedbread-ai/mxbai-colbert-large-v1 | | 335M | 0.6299 ndcg@10 | 46 ms | 4.0K tok/s | $0.056 |
| jinaai/jina-reranker-v2-base-multilingual | Multilingual | 278M | 0.6546 ndcg@10 | 38 ms | 29.0K tok/s | $0.0077 |
| BAAI/bge-reranker-base | Multilingual | 278M | 0.5926 ndcg@10 | 45 ms | 21.3K tok/s | $0.010 |
| Alibaba-NLP/gte-reranker-modernbert-base | Long context | 150M | 0.6701 ndcg@10 | 55 ms | 11.0K tok/s | $0.020 |
| lightonai/GTE-ModernColBERT-v1 | Long context | 149M | 0.6388 ndcg@10 | 313 ms | 231 tok/s | $0.961 |
| lightonai/Reason-ModernColBERT | Long context | 149M | 0.6520 ndcg@10 | — | — | — |
| colbert-ir/colbertv2.0 | | 110M | 0.5989 ndcg@10 | 51 ms | 3.8K tok/s | $0.058 |
| answerdotai/answerai-colbert-small-v1 | | 33M | 0.6259 ndcg@10 | 122 ms | 1.7K tok/s | $0.128 |
| cross-encoder/ms-marco-MiniLM-L-12-v2 | | 33M | 0.6145 ndcg@10 | 40 ms | 26.4K tok/s | $0.0084 |
| mixedbread-ai/mxbai-edge-colbert-v0-32m | Long context | 32M | 0.5270 ndcg@10 | — | — | — |
| cross-encoder/ms-marco-MiniLM-L-6-v2 | | 23M | 0.6027 ndcg@10 | 46 ms | 51.1K tok/s | $0.0043 |
| Model | Tags | Size | Quality | Latency | Throughput | Cost $/1M |
|---|---|---|---|---|---|---|
| zai-org/GLM-OCR | Multimodal, Multilingual | 1.3B | — | — | — | — |
| opendatalab/MinerU2.5-Pro-2604-1.2B | Multimodal, Multilingual, Entities | 1.2B | — | — | — | — |
| lightonai/LightOnOCR-2-1B | Multimodal, Multilingual | 1.0B | — | 125.7 s | 89 tok/s | $2.49 |
| PaddlePaddle/PaddleOCR-VL-1.5 | Multimodal, Multilingual | 959M | — | — | — | — |
| fastino/gliner2-large-v1 | Multilingual, Entities | 486M | 0.5366 F1 | — | — | — |
| numind/NuNER_Zero | Entities | 449M | 0.6122 F1 | — | — | — |
| google/owlv2-large-patch14-ensemble | Multimodal, Bounding boxes | 438M | — | — | — | — |
| EmergentMethods/gliner_large_news-v2.1 | Entities | 435M | 0.5527 F1 | — | — | — |
| Ihor/gliner-biomed-large-v1.0 | Entities | 435M | 0.6439 F1 | 108 ms | 9.9K tok/s | $0.023 |
| jackboyla/glirel-large-v0 | Relations | 435M | — | 105 ms | 7.3K tok/s | $0.030 |
| urchade/gliner_large-v2.1 | Multilingual, Entities | 435M | 0.5483 F1 | 175 ms | 5.9K tok/s | $0.037 |
| urchade/gliner_multi_pii-v1 | Multilingual, Entities | 435M | 0.5357 F1 | — | — | — |
| facebook/bart-large-mnli | Entities | 407M | — | — | — | — |
| urchade/gliner_multi-v2.1 | Multilingual, Entities | 289M | 0.6007 F1 | — | — | — |
| mynkchaudhry/Florence-2-FT-DocVQA | Multimodal, Text regions | 271M | — | 1.6 s | 510 tok/s | $0.436 |
| IDEA-Research/grounding-dino-base | Multimodal, Bounding boxes | 233M | — | 786 ms | 0.8 mpix/s | — |
| microsoft/Florence-2-base | Multimodal, Text regions | 232M | — | — | — | — |
| fastino/gliner2-base-v1 | Entities | 208M | 0.5194 F1 | 142 ms | 7.5K tok/s | $0.029 |
| urchade/gliner_medium-v2.1 | Entities | 195M | 0.6111 F1 | 107 ms | 8.9K tok/s | $0.025 |
| IDEA-Research/grounding-dino-tiny | Multimodal, Bounding boxes | 172M | — | 533 ms | 0.9 mpix/s | — |
| google/owlv2-base-patch16-ensemble | Multimodal, Bounding boxes | 155M | — | 955 ms | 1.0 mpix/s | — |
| MoritzLaurer/ModernBERT-base-zeroshot-v2.0 | Entities | 150M | — | — | — | — |
| naver-clova-ix/donut-base-finetuned-cord-v2 | Multimodal, Text regions | 110M | — | 8.4 s | 757 tok/s | $0.294 |
| naver-clova-ix/donut-base-finetuned-docvqa | Multimodal, Text regions | 110M | — | 6.9 s | 87 tok/s | $2.56 |
| numind/NuNER_Zero-span | Entities | 110M | 0.6448 F1 | — | — | — |
| docling | Multimodal, OCR-Document | 80M | — | — | — | — |
| urchade/gliner_small-v2.1 | Entities | 60M | 0.5959 F1 | 83 ms | 11.7K tok/s | $0.019 |
| knowledgator/gliner-bi-base-v2.0 | Entities | — | 0.6396 F1 | 133 ms | 7.1K tok/s | $0.031 |
| knowledgator/modern-gliner-bi-base-v1.0 | Entities | — | 0.6644 F1 | 127 ms | 7.3K tok/s | $0.030 |
| Model | Tags | Size | Quality | Latency | Throughput | Cost $/1M |
|---|---|---|---|---|---|---|
| Qwen/Qwen3.6-27B | Multimodal, Tool calling, Constrained output, Streaming, Code, SQL | 27.0B | 0.6000 acc | 1.7 s | 222 tok/s | $3.80 |
| Qwen/Qwen3-4B-Instruct-2507 | Long context, Tool calling, Constrained output, Streaming, Code, SQL | 4.0B | 0.6033 acc | 576 ms | 472 tok/s | $1.78 |
| Qwen/Qwen3.5-4B | Multimodal, Long context, Tool calling, Constrained output, Streaming | 4.0B | 0.5867 acc | 762 ms | 353 tok/s | $2.38 |
| ibm-granite/granite-guardian-3.0-2b | Long context, Streaming, Guard | 2.5B | — | — | — | — |
| Qwen/Qwen3-0.6B | Streaming | 600M | 0.4600 acc | 413 ms | 595 tok/s | $1.41 |
Latency, throughput and cost are shown only where we've benchmarked the model on the selected GPU; "—" means we don't have a measurement there. Cost is approximate: it is computed from list GPU prices, so your actual price depends on the provider you deploy SIE with.
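
The page does not spell out how cost is derived from GPU list prices. One plausible reading, sketched below, divides the GPU's hourly price by the number of tokens processed in an hour at the measured throughput. Both the formula and the $0.80/hour figure are assumptions chosen for illustration, not values stated by the catalog; `cost_per_million_tokens` is a hypothetical helper name.

```python
def cost_per_million_tokens(gpu_usd_per_hour: float, tokens_per_second: float) -> float:
    """Estimate USD per 1M tokens from an hourly GPU price and a throughput.

    Assumes the GPU is fully utilised at the measured throughput, which is
    an idealisation; real deployments will usually cost more per token.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    tokens_per_hour = tokens_per_second * 3600
    return gpu_usd_per_hour / tokens_per_hour * 1_000_000


if __name__ == "__main__":
    # Assumed list price, NOT taken from the page.
    assumed_gpu_usd_per_hour = 0.80
    # Throughput of 3.5K tok/s, as listed for one of the 7B embedding models.
    estimate = cost_per_million_tokens(assumed_gpu_usd_per_hour, 3500)
    print(f"${estimate:.3f} per 1M tokens")  # prints "$0.063 per 1M tokens"
```

To estimate what you would pay elsewhere, substitute your provider's hourly rate for the assumed price and keep the throughput from the table.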