Search

Uses: /encode · /score

Retrieval in two moves: /encode turns documents and queries into vectors for fast recall, then /score reranks the shortlist for precision. Pick a dense, sparse, or multi-vector embedder and pair it with a cross-encoder.
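To make the two moves concrete, here is a minimal, self-contained sketch of retrieve-then-rerank. It does not call the real endpoints: `encode` and `score` below are toy stand-ins (a bag-of-words vector and a term-overlap score) that only mimic the roles of /encode and /score, and `search`, `recall_k` and `top_k` are hypothetical names chosen for illustration.

```python
import math
from collections import Counter


def encode(text: str) -> Counter:
    """Toy stand-in for /encode: a bag-of-words count vector (assumption)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def score(query: str, doc: str) -> float:
    """Toy stand-in for /score: fraction of query terms present in the doc."""
    q_terms = set(query.lower().split())
    d_terms = set(doc.lower().split())
    return len(q_terms & d_terms) / len(q_terms) if q_terms else 0.0


def search(query: str, docs: list[str], recall_k: int = 3, top_k: int = 2) -> list[str]:
    # Stage 1 (recall): compare the query vector against every document vector
    # and keep a shortlist. In a real system the document vectors are
    # precomputed and stored in a vector index.
    q_vec = encode(query)
    doc_vecs = [encode(d) for d in docs]
    shortlist = sorted(
        range(len(docs)), key=lambda i: cosine(q_vec, doc_vecs[i]), reverse=True
    )[:recall_k]

    # Stage 2 (precision): rescore only the shortlist with the pairwise scorer.
    reranked = sorted(shortlist, key=lambda i: score(query, docs[i]), reverse=True)
    return [docs[i] for i in reranked[:top_k]]


docs = [
    "wireless noise cancelling headphones",
    "wired studio headphones",
    "noise machine for sleeping",
    "bluetooth speaker",
]
results = search("wireless headphones noise cancelling", docs)
print(results)
```

The point of the split is cost: the cheap vector comparison runs over the whole corpus, while the more expensive pairwise scorer only sees `recall_k` candidates.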

Featured models

Encode · `/encode`

Model	Tags	Size	Quality (NDCG@10)	Latency	Throughput	Cost $/1M
NovaSearch/stella_en_1.5B_v5	Dense	1.5B	0.4219	258 ms	12.8K tok/s	$0.017
NovaSearch/stella_en_400M_v5	Dense	435M	0.4125	116 ms	27.1K tok/s	$0.0082
Alibaba-NLP/gte-multilingual-base	Multilingual, Long context, Dense	305M	0.3677	57 ms	55.1K tok/s	$0.0040

Measured on an NVIDIA L4 GPU; other hardware shows "—" until benchmarked. Pick a benchmark to rank by quality.
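Quality in the tables on this page is reported as NDCG@10. As a rough illustration of what that number measures, the sketch below computes NDCG@k from a ranked list of relevance labels. It assumes the common linear-gain variant with a log2 rank discount, and that the input list holds the labels of all judged documents for the query; the page does not state which variant or benchmark setup is used.

```python
import math


def dcg_at_k(relevances: list[float], k: int = 10) -> float:
    """Discounted cumulative gain over the top-k positions (linear gain)."""
    # enumerate() is 0-based, so the item at index 0 is discounted by log2(2) = 1.
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(ranked_relevances: list[float], k: int = 10) -> float:
    """DCG of the given ranking divided by the DCG of the ideal ordering."""
    ideal = dcg_at_k(sorted(ranked_relevances, reverse=True), k)
    return dcg_at_k(ranked_relevances, k) / ideal if ideal > 0 else 0.0


# Relevant document ranked first: a perfect score.
perfect = ndcg_at_k([1, 0, 0])
# Same document ranked second: the score drops to 1 / log2(3).
second = ndcg_at_k([0, 1, 0])
print(perfect, round(second, 4))
```

A reranker improves this score by moving relevant documents toward the top of the shortlist, which is why the /score models are judged on the same metric.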

For similar models, browse the full /encode catalog →

Score · `/score`

Model	Tags	Size	Quality (NDCG@10)	Latency	Throughput	Cost $/1M
mixedbread-ai/mxbai-rerank-large-v2	Multilingual, Long context	1.5B	0.6914	767 ms	1.9K tok/s	$0.118
BAAI/bge-reranker-v2-m3	Multilingual, Long context	568M	0.6763	92 ms	30.0K tok/s	$0.0074
BAAI/bge-reranker-base	Multilingual	278M	0.5926	45 ms	21.3K tok/s	$0.010

Measured on an NVIDIA L4 GPU; other hardware shows "—" until benchmarked. Pick a benchmark to rank by quality.

For similar models, browse the full /score catalog →

Examples

End-to-end projects from our examples that put this task to work.

Find the best retrieval strategy for your RAG

Head-to-head retrieval ablation across encoder, reranker and multi-vector pipelines on 1,854 SEC 10-K queries, ranked by NDCG@10.

Private fine-tuned compliance RAG

A domain-tuned LoRA encoder and a custom cross-encoder that reranks and prunes context in one forward pass.

Self-hosted product search in 5 min

Amazon-style search from three SDK calls: extract attributes, encode descriptions, score-rerank candidates.

Find SOTA embedding models by MTEB task

Describe your task in plain language and search ~14K Hugging Face embedding models, ranked by MTEB scores.

Multimodal product classifier with embeddings

NLI, text and image retrieval, and cross-encoder reranking over a hierarchical product taxonomy.

Featured picks are still being finalized. Latency, throughput and cost are measured values where we've benchmarked the model on the selected GPU; "—" means no measurement there. Cost is approximate: it is computed from list GPU prices, and your actual price depends on the provider you deploy SIE with.
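The page does not give the cost formula, so the following is only a plausible sketch. It assumes "$/1M" means dollars per million tokens and that cost is the GPU's hourly list price divided by the tokens processed in an hour. The $0.80 hourly price is an illustrative assumption, not a figure from this page; with it, the results land close to the listed costs for the two models shown.

```python
def cost_per_million_tokens(gpu_price_per_hour: float, tokens_per_second: float) -> float:
    """Approximate $ per 1M tokens from an hourly GPU price and measured throughput."""
    tokens_per_hour = tokens_per_second * 3600
    return gpu_price_per_hour / tokens_per_hour * 1_000_000


# Assumption: an illustrative hourly list price for one L4 GPU.
ASSUMED_L4_PRICE_PER_HOUR = 0.80

# Throughput figures taken from the tables on this page.
stella_cost = cost_per_million_tokens(ASSUMED_L4_PRICE_PER_HOUR, 12_800)    # stella_en_1.5B_v5
reranker_cost = cost_per_million_tokens(ASSUMED_L4_PRICE_PER_HOUR, 30_000)  # bge-reranker-v2-m3
print(f"{stella_cost:.4f} {reranker_cost:.4f}")
```

Swapping in your provider's actual hourly price gives a first estimate of what the same throughput would cost on your own deployment.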
