Alibaba-NLP/gte-Qwen2-1.5B-instruct

Architecture
Parameters: 1.5B
Tasks: Encode
Outputs: Dense
Dimensions: 1,536 (dense)
Max Sequence Length: 32,768 tokens
License

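For reference, here is a minimal sketch of producing the 1,536-dimensional dense embeddings described above with the sentence-transformers library, following the usage shown on the Hugging Face model card (adapt it to your own serving setup; the example texts are made up):

```python
from sentence_transformers import SentenceTransformer

# trust_remote_code is needed for the custom gte-Qwen2 modeling code.
model = SentenceTransformer("Alibaba-NLP/gte-Qwen2-1.5B-instruct", trust_remote_code=True)

queries = ["how is heat transferred in liquids"]
documents = ["Convection is the transfer of heat by the bulk movement of a fluid."]

# Queries are encoded with the model's retrieval instruction prompt;
# documents are encoded as-is. Normalizing makes dot product = cosine similarity.
query_emb = model.encode(queries, prompt_name="query", normalize_embeddings=True)
doc_emb = model.encode(documents, normalize_embeddings=True)

print(query_emb.shape)        # (1, 1536) dense vectors
print(query_emb @ doc_emb.T)  # query-document similarity scores
```

Since the model accepts up to 32,768 tokens, most documents can be embedded without chunking; anything longer is truncated to model.max_seq_length by sentence-transformers.
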
Benchmarks

In each entry below, "L4, b1, c16" describes the measurement setup: an NVIDIA L4 GPU, batch size 1, and 16 concurrent requests. TPS is embedding throughput in tokens per second, p50 is the median request latency, and the Quality rows report NDCG@10, MAP@10, and MRR@10 retrieval scores on the named dataset (a sketch of how these metrics are computed follows the benchmark list).

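As a rough illustration of what the throughput and latency figures mean (this is not the harness used to produce them), the sketch below times sequential single-text encode calls locally and reports the median (p50) latency and token throughput; production numbers additionally depend on the GPU, batching, and concurrency:

```python
import statistics
import time

from sentence_transformers import SentenceTransformer

# Illustrative micro-benchmark: p50 latency and tokens-per-second throughput
# for single-text embedding calls, run locally and sequentially.
model = SentenceTransformer("Alibaba-NLP/gte-Qwen2-1.5B-instruct", trust_remote_code=True)

texts = ["Convection is the transfer of heat by the bulk movement of a fluid."] * 100
latencies, total_tokens = [], 0

for text in texts:
    total_tokens += len(model.tokenizer(text)["input_ids"])
    start = time.perf_counter()
    model.encode(text)
    latencies.append(time.perf_counter() - start)

p50_ms = statistics.median(latencies) * 1000
tokens_per_second = total_tokens / sum(latencies)
print(f"p50 latency: {p50_ms:.1f} ms, throughput: {tokens_per_second:.0f} tokens/s")
```
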
CQADupstackPhysicsRetrieval

scientific retrieval en

Performance (L4, b1, c16)
Corpus TPS: 11.6K
Corpus p50: 178.5 ms
Query TPS: 2.2K
Query p50: 69.6 ms

CosQA

technology retrieval en

Performance (L4, b1, c16)
Corpus TPS: 9.3K
Corpus p50: 96.2 ms
Query TPS: 1.2K
Query p50: 66.8 ms

FiQA2018

finance retrieval en

Performance (L4, b1, c16)
Corpus TPS: 11.8K
Corpus p50: 222.9 ms
Query TPS: 2.1K
Query p50: 73.4 ms

LegalBenchConsumerContractsQA

legal retrieval en

Performance (L4, b1, c16)
Corpus TPS: 12.3K
Corpus p50: 735.3 ms
Query TPS: 3.1K
Query p50: 71.9 ms

NFCorpus

medical retrieval en

Quality
NDCG@10: 0.3925
MAP@10: 0.1502
MRR@10: 0.6051
Performance (L4, b1, c16)
Corpus TPS: 12.7K
Corpus p50: 384.4 ms
Query TPS: 821
Query p50: 90.2 ms

NanoFiQA2018Retrieval

finance retrieval en

Quality
NDCG@10: 0.6524
MAP@10: 0.5848
MRR@10: 0.7032
Performance (L4, b1, c16)
Corpus TPS: 11.3K
Corpus p50: 251.5 ms
Query TPS: 1.9K
Query p50: 88.7 ms

SCIDOCS

scientific retrieval en

Performance (L4, b1, c16)
Corpus TPS: 12.4K
Corpus p50: 261.1 ms
Query TPS: 2.5K
Query p50: 66.4 ms

SciFact

scientific retrieval en

Performance (L4, b1, c16)
Corpus TPS: 12.6K
Corpus p50: 370.4 ms
Query TPS: 3.1K
Query p50: 74.9 ms

StackOverflowQA

technology retrieval en

Performance (L4, b1, c16)
Corpus TPS: 12.4K
Corpus p50: 299.2 ms
Query TPS: 11.4K
Query p50: 421.4 ms
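
For readers unfamiliar with the quality metrics above, the following self-contained sketch shows how NDCG@10, MAP@10, and MRR@10 can be computed for a single query from a ranked result list with binary relevance labels (the ranking below is made up purely for illustration):

```python
import math

def ndcg_at_k(relevance, k=10):
    """Normalized discounted cumulative gain over the top-k ranked results."""
    dcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(relevance[:k]))
    ideal = sorted(relevance, reverse=True)
    idcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(ideal[:k]))
    return dcg / idcg if idcg > 0 else 0.0

def map_at_k(relevance, k=10):
    """Average precision over the top-k ranks (single-query version)."""
    hits, precision_sum = 0, 0.0
    for i, rel in enumerate(relevance[:k]):
        if rel:
            hits += 1
            precision_sum += hits / (i + 1)
    total_relevant = sum(relevance)  # assumes all relevant docs appear in the labels
    return precision_sum / total_relevant if total_relevant else 0.0

def mrr_at_k(relevance, k=10):
    """Reciprocal rank of the first relevant result within the top k."""
    for i, rel in enumerate(relevance[:k]):
        if rel:
            return 1.0 / (i + 1)
    return 0.0

# 1 = relevant, 0 = not relevant, listed in retrieved order for one query.
ranking = [0, 1, 0, 1, 0, 0, 1, 0, 0, 0]
print(ndcg_at_k(ranking), map_at_k(ranking), mrr_at_k(ranking))

# Benchmark-level scores average these per-query values over all queries
# in the dataset.
```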

