lightonai/GTE-ModernColBERT-v1 (Score)

Architecture
Parameters: 305M
Tasks: Encode
Outputs: Multi-Vec
Dimensions: 128 (Multi-Vec)
Max Sequence Length: 8,192 tokens
License
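
The Multi-Vec output listed above means the model returns one 128-dimensional vector per token rather than a single vector per text. ColBERT-style models are usually scored with late interaction (MaxSim): each query token vector is matched to its best document token vector, and those maxima are summed. The sketch below illustrates that scoring rule in plain Python. It is not this engine's API: the names `maxsim_score` and `rank` are hypothetical, and the toy vectors are 2-dimensional instead of 128-dimensional to keep the example readable.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_vecs: Sequence[Vector], doc_vecs: Sequence[Vector]) -> float:
    """Late-interaction (MaxSim) score.

    For every query token vector, take the highest dot product against
    any document token vector, then sum those maxima.
    """
    if not query_vecs or not doc_vecs:
        return 0.0
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


def rank(
    query_vecs: Sequence[Vector], docs: Dict[str, Sequence[Vector]]
) -> List[Tuple[str, float]]:
    """Score every document and return (doc_id, score) pairs, best first."""
    scored = [(doc_id, maxsim_score(query_vecs, vecs)) for doc_id, vecs in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy data: 2-dimensional stand-ins for the model's 128-dimensional token vectors.
query = [[1.0, 0.0], [0.0, 1.0]]
docs = {
    "doc_a": [[1.0, 0.0], [0.0, 1.0]],  # matches both query tokens
    "doc_b": [[1.0, 0.0]],              # matches only the first query token
}
ranking = rank(query, docs)
```

In this toy case `doc_a` ranks above `doc_b` because it has a strong match for both query tokens. With real model output, the vectors would typically be L2-normalized so that each dot product is a cosine similarity.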

Benchmarks

CQADupstackPhysicsRetrieval

scientific retrieval en

Performance L4-SPOT b1 c16
Corpus TPS 1.9K
Corpus p50 509.4ms
Query TPS 131
Query p50 573.4ms
Performance L4 b1 c16
Corpus TPS 1.9K
Corpus p50 509.4ms
Query TPS 131
Query p50 573.4ms

CosQA

technology retrieval en

Performance L4-SPOT b1 c16
Corpus TPS 890
Corpus p50 454.2ms
Query TPS 75
Query p50 566.6ms
Performance L4 b1 c16
Corpus TPS 890
Corpus p50 454.2ms
Query TPS 75
Query p50 566.6ms

FiQA2018

finance retrieval en

Performance L4-SPOT b1 c16
Corpus TPS 2.6K
Corpus p50 469.6ms
Query TPS 303
Query p50 278.2ms
Performance L4 b1 c16
Corpus TPS 2.6K
Corpus p50 469.6ms
Query TPS 303
Query p50 278.2ms

LegalBenchConsumerContractsQA

legal retrieval en

Performance L4-SPOT b1 c16
Corpus TPS 6.2K
Corpus p50 532.8ms
Query TPS 278
Query p50 327.3ms
Performance L4 b1 c16
Corpus TPS 6.2K
Corpus p50 532.8ms
Query TPS 278
Query p50 327.3ms

NFCorpus

medical retrieval en

Performance L4-SPOT b1 c16
Corpus TPS 4.4K
Corpus p50 463.3ms
Query TPS 111
Query p50 299.7ms
Performance L4 b1 c16
Corpus TPS 4.4K
Corpus p50 463.3ms
Query TPS 111
Query p50 299.7ms

SCIDOCS

scientific retrieval en

Performance L4-SPOT b1 c16
Corpus TPS 4.4K
Corpus p50 257.6ms
Query TPS 184
Query p50 327.2ms
Performance L4 b1 c16
Corpus TPS 4.4K
Corpus p50 257.6ms
Query TPS 184
Query p50 327.2ms

SciFact

scientific retrieval en

Performance L4-SPOT b1 c16
Corpus TPS 9.2K
Corpus p50 241.6ms
Query TPS 396
Query p50 265.9ms
Performance L4 b1 c16
Corpus TPS 9.2K
Corpus p50 241.6ms
Query TPS 396
Query p50 265.9ms

StackOverflowQA

technology retrieval en

Performance L4-SPOT b1 c16
Corpus TPS 3.8K
Corpus p50 458.1ms
Query TPS 9.2K
Query p50 222.9ms
Performance L4 b1 c16
Corpus TPS 3.8K
Corpus p50 458.1ms
Query TPS 9.2K
Query p50 222.9ms
