
Qwen/Qwen3-Reranker-0.6B

The Qwen3 Embedding model series is the latest model series in the Qwen family designed specifically for text embedding and ranking tasks. Built on the dense foundation models of the Qwen3 series, it provides a range of text embedding and reranking models in three sizes (0.6B, 4B, and 8B). This page covers the 0.6B reranking model.

Overview

Architecture: Qwen3
Parameters: 596M
Tasks: Score
Outputs: Score
Max Sequence Length: 32,768 tokens
License: apache-2.0
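The Score task and output listed above mean the model assigns a relevance score to each (query, document) pair, and reranking then amounts to sorting candidates by that score. The sketch below illustrates only the sorting step. The `rerank` helper and the `toy_overlap_score` function are hypothetical names introduced here, and the toy scorer is a word-overlap stand-in for the model, not its actual scoring method.

```python
from typing import Callable, List, Tuple


def rerank(
    query: str,
    documents: List[str],
    score_pair: Callable[[str, str], float],
    top_k: int = 10,
) -> List[Tuple[str, float]]:
    """Score every (query, document) pair and return the top_k by descending score."""
    scored = [(doc, score_pair(query, doc)) for doc in documents]
    # Python's sort is stable, so documents with equal scores keep their input order.
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


def toy_overlap_score(query: str, document: str) -> float:
    """Placeholder scorer (assumption): fraction of query words found in the document."""
    query_words = set(query.lower().split())
    document_words = set(document.lower().split())
    return len(query_words & document_words) / len(query_words) if query_words else 0.0


docs = [
    "how to install ubuntu",
    "best pizza recipes",
    "ubuntu install fails on boot",
]
for doc, score in rerank("install ubuntu", docs, toy_overlap_score, top_k=2):
    print(f"{score:.2f}  {doc}")
```

In a real deployment, `score_pair` would call the reranker; everything else in the sketch is independent of which model produces the score.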

Benchmarks

AskUbuntuDupQuestions

technology reranking en

Duplicate question detection from AskUbuntu

Corpus: 6,743, Queries: 360
Quality
NDCG@10: 0.6536
MAP@10: 0.4986
MRR@10: 0.7642
Performance (L4 b1 c16)
Corpus: 1.5K tok/s, p50 60.5 ms
Query: 1.5K tok/s, p50 60.5 ms
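For readers unfamiliar with the quality metrics, here is a minimal sketch of how NDCG, MAP, and MRR at a cutoff of 10 are commonly computed for a single query. It assumes binary or graded relevance labels listed in ranked order, linear gain for NDCG, and average precision normalised by the number of relevant items capped at k. Evaluation toolkits differ on these conventions, so this is not necessarily the exact procedure behind the figures above. Benchmark scores are normally the mean of these per-query values across all queries.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int = 10) -> float:
    """Discounted cumulative gain over the first k labels (linear gain, log2 discount)."""
    return sum(rel / math.log2(rank + 1) for rank, rel in enumerate(relevances[:k], start=1))


def ndcg_at_k(relevances: Sequence[float], k: int = 10) -> float:
    """DCG divided by the DCG of the ideal (descending) ordering of the same labels."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


def mrr_at_k(relevances: Sequence[float], k: int = 10) -> float:
    """Reciprocal rank of the first relevant item within the top k, else 0."""
    for rank, rel in enumerate(relevances[:k], start=1):
        if rel > 0:
            return 1.0 / rank
    return 0.0


def ap_at_k(relevances: Sequence[float], k: int = 10) -> float:
    """Average precision at k, normalised by min(k, number of relevant items)."""
    total_relevant = sum(1 for rel in relevances if rel > 0)
    if total_relevant == 0:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, rel in enumerate(relevances[:k], start=1):
        if rel > 0:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / min(k, total_relevant)


# Illustrative labels for one query: relevant items at ranks 2 and 4.
ranked_labels = [0, 1, 0, 1, 0]
print(ndcg_at_k(ranked_labels), ap_at_k(ranked_labels), mrr_at_k(ranked_labels))
```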

MMarcoReranking

general reranking zh

Multilingual MARCO passage reranking (Chinese)

Quality
NDCG@10: 0.0858
MAP@10: 0.0576
MRR@10: 0.8158
Performance (L4 b1 c16)
Corpus: 18.7K tok/s, p50 69.8 ms
Query: 1.5K tok/s, p50 69.8 ms
