Alibaba-NLP/gte-Qwen2-7B-instruct

Primitive: /encode · Encode · Qwen2

gte-Qwen2-7B-instruct is the latest model in the gte (General Text Embedding) family. It ranked No. 1 in both the English and Chinese evaluations of the Massive Text Embedding Benchmark (MTEB) as of June 16, 2024.

Long context · Dense

Overview

Hardware: not specified (drives latency, throughput, and cost)

Size	7.6B params
Tasks	/encode
License	apache-2.0
Latency	846 ms
Throughput	3.5K tok/s
Cost	$0.063 /1M tok

Cost is approximate: it is computed from list GPU prices, and your actual price depends on the provider you deploy SIE with.
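The relationship between GPU price, throughput, and per-token cost can be sketched as below. This is only an illustration of the arithmetic, not the page's actual pricing method: the function name is hypothetical, and the $0.80/hour GPU price is an assumed value chosen because it roughly reproduces the listed figure; the page does not state which price it used.

```python
def cost_per_million_tokens(gpu_usd_per_hour: float, tokens_per_second: float) -> float:
    """Estimate USD per 1M tokens for a GPU running at a steady throughput."""
    tokens_per_hour = tokens_per_second * 3600
    return gpu_usd_per_hour / tokens_per_hour * 1_000_000


# Assumption: a list price of $0.80/hour (hypothetical, not taken from the page).
# Throughput of 3.5K tok/s is the value listed in the overview table.
estimate = cost_per_million_tokens(gpu_usd_per_hour=0.80, tokens_per_second=3500)
print(f"${estimate:.3f} per 1M tokens")  # prints "$0.063 per 1M tokens"
```

Because cost scales inversely with throughput, the same GPU at half the token rate would roughly double the per-token price.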

Embedding

Output types	Dense
Dimensions	dense: 3,584
Max sequence length	32,000
Inputs	text
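A 3,584-dimensional dense output has a direct storage cost per embedded item. The sketch below estimates it, assuming each value is stored as a 4-byte float32 (an assumption; the page does not state the output precision) and ignoring any index overhead. The function name is hypothetical.

```python
def embedding_storage_bytes(num_vectors: int, dimensions: int = 3584,
                            bytes_per_value: int = 4) -> int:
    """Raw storage for dense vectors, excluding index or metadata overhead."""
    return num_vectors * dimensions * bytes_per_value


per_vector = embedding_storage_bytes(1)
# 3,593 is the corpus size listed for the NFCorpus benchmark on this page.
nfcorpus_total = embedding_storage_bytes(3593)

print(f"{per_vector} bytes per vector")          # prints "14336 bytes per vector"
print(f"{nfcorpus_total / 1e6:.1f} MB in total")  # prints "51.5 MB in total"
```

Storing values at half precision would halve these numbers; whether that affects retrieval quality for this model is not something this page reports.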

Benchmarks

NFCorpus

medical · retrieval · en

Biomedical literature search from NutritionFacts.org

Corpus: 3,593 · Queries: 323

Quality

nDCG@10	0.4040
MAP@10	0.1548
MRR@10	0.6133
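For readers unfamiliar with these quality scores, here is a minimal per-query sketch of the three metrics at a cutoff of 10. It uses common textbook definitions (linear gain for nDCG, average precision divided by the total number of relevant documents); the evaluation tooling behind this page is not stated, so its conventions may differ. The document IDs and relevance grades are made up for illustration. A benchmark score is the mean of the per-query values over all queries.

```python
import math


def ndcg_at_k(ranked_ids, relevance, k=10):
    """Normalized discounted cumulative gain with linear gain."""
    dcg = sum(relevance.get(doc, 0) / math.log2(rank + 2)
              for rank, doc in enumerate(ranked_ids[:k]))
    ideal_gains = sorted(relevance.values(), reverse=True)[:k]
    idcg = sum(gain / math.log2(rank + 2) for rank, gain in enumerate(ideal_gains))
    return dcg / idcg if idcg > 0 else 0.0


def mrr_at_k(ranked_ids, relevance, k=10):
    """Reciprocal rank of the first relevant document within the top k."""
    for rank, doc in enumerate(ranked_ids[:k], start=1):
        if relevance.get(doc, 0) > 0:
            return 1.0 / rank
    return 0.0


def average_precision_at_k(ranked_ids, relevance, k=10):
    """Sum of precision at each relevant hit in the top k, over all relevant docs."""
    num_relevant = sum(1 for gain in relevance.values() if gain > 0)
    if num_relevant == 0:
        return 0.0
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked_ids[:k], start=1):
        if relevance.get(doc, 0) > 0:
            hits += 1
            total += hits / rank
    return total / num_relevant


# Hypothetical query: the system returned four documents; three are judged relevant.
ranked = ["d3", "d1", "d7", "d2"]
judgments = {"d1": 2, "d2": 1, "d9": 1}

print(mrr_at_k(ranked, judgments))                # first relevant hit is at rank 2
print(average_precision_at_k(ranked, judgments))  # hits at ranks 2 and 4, one missed
print(ndcg_at_k(ranked, judgments))
```

The example shows why the three numbers can diverge on the same run: MRR only rewards the first relevant hit, while MAP penalizes every relevant document that is missing from the top 10.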

Performance L4 b1 c16

Corpus throughput	3.7K tok/s
Corpus p50	1.1 s
Query throughput	228 tok/s
Query p50	361.2 ms

NanoFiQA2018Retrieval

finance · retrieval · en

Smaller subset of the FiQA financial QA dataset

Quality

nDCG@10	0.6902
MAP@10	0.6156
MRR@10	0.7338

Performance L4 b1 c16

Corpus throughput	3.3K tok/s
Corpus p50	594.5 ms
Query throughput	500 tok/s
Query p50	221.7 ms