
Qwen/Qwen3-Embedding-0.6B

Architecture

Parameters: 600M
Tasks: Encode
Outputs: Dense
Dimensions (Dense): 1,024
Max Sequence Length: 32,768 tokens
License
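Given the dense 1,024-dimensional output above, retrieval reduces to nearest-neighbor search over embedding vectors. A minimal sketch of cosine-similarity ranking; the vectors here are hand-built stand-ins for what an encode step would produce (in practice they would come from Qwen/Qwen3-Embedding-0.6B):

```python
import numpy as np

DIM = 1024  # dense embedding dimension reported above

def top_k(query_vec, corpus_vecs, k=3):
    """Rank corpus rows by cosine similarity to the query vector."""
    q = query_vec / np.linalg.norm(query_vec)
    c = corpus_vecs / np.linalg.norm(corpus_vecs, axis=1, keepdims=True)
    scores = c @ q
    order = np.argsort(-scores)[:k]
    return list(order), scores[order]

# Stand-in embeddings: row 2 is deliberately aligned with the query,
# so it should rank first.
rng = np.random.default_rng(0)
corpus = rng.normal(size=(5, DIM))
query = corpus[2] + 0.01 * rng.normal(size=DIM)

ids, scores = top_k(query, corpus, k=2)
# ids[0] == 2: the aligned row ranks first
```

In a real pipeline the corpus matrix would be precomputed offline (the "Corpus" rows in the benchmarks below), while queries are embedded and scored at request time (the "Query" rows).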

Benchmarks

CQADupstackPhysicsRetrieval
scientific, retrieval, en

Performance (L4 b1 c16)
  Corpus: 21.2K TPS, p50 94.7 ms
  Query: 2.7K TPS, p50 58.2 ms

CosQA
technology, retrieval, en

Performance (L4 b1 c16)
  Corpus: 12.7K TPS, p50 66.3 ms
  Query: 1.4K TPS, p50 58.7 ms

FiQA2018
finance, retrieval, en

Performance (L4 b1 c16)
  Corpus: 20.8K TPS, p50 128.2 ms
  Query: 3.1K TPS, p50 55.5 ms

LegalBenchConsumerContractsQA
legal, retrieval, en

Performance (L4 b1 c16)
  Corpus: 19.9K TPS, p50 439.6 ms
  Query: 4.1K TPS, p50 59.0 ms

NFCorpus
medical, retrieval, en

Quality
  nDCG@10: 0.3689
  MAP@10: 0.1395
  MRR@10: 0.5716
Performance (L4 b1 c16)
  Corpus: 21.2K TPS, p50 240.7 ms
  Query: 1.3K TPS, p50 55.9 ms

NanoFiQA2018Retrieval
finance, retrieval, en

Quality
  nDCG@10: 0.6538
  MAP@10: 0.5819
  MRR@10: 0.7257
Performance (L4 b1 c16)
  Corpus: 18.5K TPS, p50 144.8 ms
  Query: 1.8K TPS, p50 78.3 ms
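The Quality rows above report standard retrieval metrics. A minimal sketch of how nDCG@10, MRR@10, and (one common convention of) AP@10 are computed for a single query with binary relevance; the helper names are illustrative, not from the benchmark harness:

```python
import math

def ndcg_at_10(ranked_ids, relevant):
    """Binary-relevance nDCG@10: DCG of the ranking over the ideal DCG."""
    gains = [1.0 if d in relevant else 0.0 for d in ranked_ids[:10]]
    dcg = sum(g / math.log2(i + 2) for i, g in enumerate(gains))
    ideal = sum(1.0 / math.log2(i + 2) for i in range(min(len(relevant), 10)))
    return dcg / ideal if ideal else 0.0

def mrr_at_10(ranked_ids, relevant):
    """Reciprocal rank of the first relevant document in the top 10."""
    for i, d in enumerate(ranked_ids[:10]):
        if d in relevant:
            return 1.0 / (i + 1)
    return 0.0

def ap_at_10(ranked_ids, relevant):
    """Average precision over the top 10 (normalized by min(|relevant|, 10))."""
    hits, score = 0, 0.0
    for i, d in enumerate(ranked_ids[:10]):
        if d in relevant:
            hits += 1
            score += hits / (i + 1)
    return score / min(len(relevant), 10) if relevant else 0.0
```

MAP and MRR at the benchmark level are these per-query values averaged over all queries in the test set.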

SCIDOCS
scientific, retrieval, en

Performance (L4 b1 c16)
  Corpus: 20.0K TPS, p50 156.9 ms
  Query: 3.0K TPS, p50 54.5 ms

SciFact
scientific, retrieval, en

Performance (L4 b1 c16)
  Corpus: 21.1K TPS, p50 218.8 ms
  Query: 3.7K TPS, p50 61.3 ms

StackOverflowQA
technology, retrieval, en

Performance (L4 b1 c16)
  Corpus: 20.6K TPS, p50 172.8 ms
  Query: 18.9K TPS, p50 239.7 ms
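Each Performance row pairs a throughput figure (TPS, tokens per second) with a median (p50) latency. A minimal sketch of how such a summary could be derived from raw measurements; the function and field names are illustrative:

```python
import statistics

def summarize(latencies_ms, tokens_processed, wall_clock_s):
    """Collapse raw measurements into the two figures reported above:
    median per-request latency and aggregate token throughput."""
    return {
        "p50_ms": statistics.median(latencies_ms),
        "tps": tokens_processed / wall_clock_s,
    }

stats = summarize([40.0, 55.0, 70.0], tokens_processed=100_000, wall_clock_s=5.0)
# stats == {"p50_ms": 55.0, "tps": 20000.0}
```

Because throughput is aggregated across concurrent requests while p50 is per-request, a benchmark can show both high TPS and high median latency at once.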

