Qwen/Qwen3-VL-Embedding-2B

Primitive: /encode · Encode · qwen3_vl

The Qwen3-VL-Embedding and Qwen3-VL-Reranker model series are the latest additions to the Qwen family, built upon the recently open-sourced and powerful Qwen3-VL foundation model.

Multimodal · Long context · Dense

Fine-tuned from Qwen/Qwen3-VL-2B-Instruct

Overview

Hardware: not selected (hardware drives latency, throughput, and cost)

Size	2.1B params
Tasks	/encode
License	apache-2.0
Latency	36 ms
Throughput	494 tok/s
Cost	$0.450 /1M tok

Cost is approximate: it is computed from list GPU prices, and your actual price depends on the provider you deploy SIE with.
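As a rough illustration of how a per-token cost can follow from a list GPU price, the sketch below divides an hourly price by the number of tokens processed in an hour. The page does not state the hourly price or the exact formula it uses, so `ASSUMED_GPU_PRICE_PER_HOUR` and the function name are hypothetical; only the 494 tok/s throughput comes from the table above.

```python
def cost_per_million_tokens(gpu_price_per_hour: float, tokens_per_second: float) -> float:
    """Dollars needed to process one million tokens at a sustained throughput."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    tokens_per_hour = tokens_per_second * 3600
    return gpu_price_per_hour / tokens_per_hour * 1_000_000


# Hypothetical list price in USD per GPU-hour; an assumption, not from the page.
ASSUMED_GPU_PRICE_PER_HOUR = 0.80

# Throughput taken from the overview table (494 tok/s).
estimate = cost_per_million_tokens(ASSUMED_GPU_PRICE_PER_HOUR, 494)
print(f"${estimate:.3f} /1M tok")
```

Under that assumed price the estimate lands close to the listed cost, but a different provider price changes the result proportionally.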

Embedding

Output types	Dense
Dimensions	dense: 2,048
Max sequence length	32,768
Inputs	text · image
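Because text and image inputs both map to a single 2,048-dimensional dense vector, retrieval typically reduces to comparing vectors. The sketch below scores two vectors with cosine similarity, which is a common choice; the page does not say which similarity measure is used, and the vectors here are random placeholders rather than model output.

```python
import math
import random

DIM = 2048  # dense embedding size listed in the table above


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


# Placeholder vectors standing in for a text embedding and an image embedding.
rng = random.Random(0)
text_vec = [rng.gauss(0.0, 1.0) for _ in range(DIM)]
image_vec = [rng.gauss(0.0, 1.0) for _ in range(DIM)]

score = cosine_similarity(text_vec, image_vec)
print(f"similarity: {score:.4f}")
```

With real embeddings, the candidate with the highest score against the query vector would be ranked first.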

Benchmarks

FiQA2018

finance · retrieval · en

Financial opinion mining and question answering

Corpus: 57,599 · Queries: 648

Performance L4 b1 c4

Corpus 494 tok/s

Corpus p50 35.9ms

Flickr30kI2TRetrieval

general · retrieval · en

Image-to-text retrieval: retrieve captions from images

Corpus: 31,783 · Queries: 1,000

Quality

nDCG@10 0.8751

MAP@10 0.8017

MRR@10 0.9653
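The ranking metrics reported above (nDCG, MAP, and MRR, each cut off at rank 10) can be sketched for a single query as follows. This uses binary relevance and one common set of definitions; the benchmark harness behind these numbers may normalize differently, and the document IDs are made up for illustration. Dataset-level scores are usually the mean of the per-query values.

```python
import math


def dcg_at_k(gains: list[float], k: int) -> float:
    """Discounted cumulative gain over the first k positions."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))


def ndcg_at_k(ranked_ids: list[str], relevant: set[str], k: int = 10) -> float:
    """DCG of the ranking divided by the DCG of an ideal ranking."""
    gains = [1.0 if doc in relevant else 0.0 for doc in ranked_ids]
    ideal = [1.0] * min(len(relevant), k)
    idcg = dcg_at_k(ideal, k)
    return dcg_at_k(gains, k) / idcg if idcg else 0.0


def ap_at_k(ranked_ids: list[str], relevant: set[str], k: int = 10) -> float:
    """Average precision at k, normalized by min(number of relevant docs, k)."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked_ids[:k], start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    denom = min(len(relevant), k)
    return total / denom if denom else 0.0


def rr_at_k(ranked_ids: list[str], relevant: set[str], k: int = 10) -> float:
    """Reciprocal rank of the first relevant document within the top k."""
    for rank, doc in enumerate(ranked_ids[:k], start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


# Hypothetical query: two relevant captions, retrieved at ranks 1 and 3.
ranked = ["c2", "c7", "c1"]
relevant = {"c1", "c2"}

ndcg = ndcg_at_k(ranked, relevant)
ap = ap_at_k(ranked, relevant)
rr = rr_at_k(ranked, relevant)
print(f"nDCG@10={ndcg:.4f} AP@10={ap:.4f} RR@10={rr:.4f}")
```

MRR only rewards the first relevant hit, which is one reason it can sit well above MAP and nDCG when each query has several relevant items.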