Qwen/Qwen3-VL-Embedding-2B

The Qwen3-VL-Embedding and Qwen3-VL-Reranker model series are the latest additions to the Qwen family, built on the recently open-sourced Qwen3-VL foundation model.

Overview

Architecture: qwen3_vl
Parameters: 2.1B
Tasks: Encode
Outputs: Dense
Dimensions: 2,048 (dense)
Max Sequence Length: 32,768 tokens
License: apache-2.0
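
Since the model emits a single dense vector per input, retrieval typically reduces to comparing a query vector against stored document vectors. The following is a minimal sketch of that step, assuming cosine similarity as the scoring function (the page does not state which similarity is used). The helper names and the 4-dimensional toy vectors are hypothetical; real outputs would have 2,048 dimensions.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def cosine_similarity(a, b):
    """Cosine similarity, computed as the dot product of unit-length vectors."""
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


def rank_documents(query_vec, doc_vecs, top_k=10):
    """Return the top_k (doc_id, score) pairs, highest score first."""
    scored = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in doc_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy stand-ins for embeddings (assumption: real vectors come from the model).
query = [1.0, 0.0, 0.0, 0.0]
docs = {
    "doc_a": [0.9, 0.1, 0.0, 0.0],
    "doc_b": [0.0, 1.0, 0.0, 0.0],
    "doc_c": [0.5, 0.5, 0.0, 0.0],
}

ranking = rank_documents(query, docs, top_k=3)
for doc_id, score in ranking:
    print(f"{doc_id}: {score:.4f}")
```

For a corpus of any real size, a vector index or a batched matrix product would replace the Python loop; the scoring logic stays the same.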

Benchmarks

FiQA2018

finance retrieval en

Financial opinion mining and question answering

Corpus: 57,599 Queries: 648
Performance L4 b1 c4
Corpus 494 tok/s
Corpus p50 35.9ms

Flickr30kI2TRetrieval

general retrieval en

Image-to-text retrieval: retrieve captions from images

Corpus: 31,783 Queries: 1,000
Quality
nDCG@10: 0.8751
MAP@10: 0.8017
MRR@10: 0.9653
Performance L4 b1 c4
Corpus 494 tok/s
Corpus p50 35.9ms
Query 0.0 mpix/s
Query p50 4.3s
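
For readers unfamiliar with the quality metrics listed for this benchmark (nDCG@10, MAP@10, MRR@10), the sketch below shows one common way to compute them for a single query with binary relevance labels. It is an illustration under stated assumptions, not the evaluation harness behind the reported numbers: the function names and the example IDs are hypothetical, average precision is normalized by min(number of relevant items, k), and the reported figures would be means over all queries.

```python
import math


def mrr_at_k(ranked, relevant, k=10):
    """Reciprocal rank of the first relevant item within the top k, else 0."""
    for rank, doc_id in enumerate(ranked[:k], start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def ap_at_k(ranked, relevant, k=10):
    """Average precision over the top k (assumed normalization: min(|relevant|, k))."""
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked[:k], start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    denom = min(len(relevant), k)
    return precision_sum / denom if denom else 0.0


def ndcg_at_k(ranked, relevant, k=10):
    """nDCG with binary gains and a log2 rank discount."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc_id in enumerate(ranked[:k], start=1)
        if doc_id in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg else 0.0


# Hypothetical single-query example: captions ranked for one image.
ranked_ids = ["c1", "c7", "c3", "c9"]
relevant_ids = {"c7", "c3"}

print("MRR@10 :", mrr_at_k(ranked_ids, relevant_ids))
print("AP@10  :", ap_at_k(ranked_ids, relevant_ids))
print("nDCG@10:", ndcg_at_k(ranked_ids, relevant_ids))
```

Averaging `ap_at_k` over every query gives MAP@10; the other two metrics are averaged the same way.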
