
GritLM/GritLM-7B

> GritLM is a generative representational instruction-tuned language model. It unifies text representation (embedding) and text generation in a single model, achieving state-of-the-art performance on both types of tasks.

Overview

Architecture: Mistral
Parameters: 7.2B
Tasks: Encode
Outputs: Dense
Dimensions: 4,096 (dense)
Max Sequence Length: 4,096 tokens
License: apache-2.0
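
The dense outputs listed above are typically used for retrieval by scoring a query vector against document vectors. The following is a minimal sketch of that step, assuming cosine similarity as the scoring function (the page does not state which similarity is used). The function names `cosine` and `rank_by_cosine` are hypothetical, and the 3-dimensional toy vectors stand in for the model's 4,096-dimensional embeddings.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Guard against zero vectors, for which cosine is undefined.
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_cosine(query_vec, doc_vecs, top_k=10):
    """Return the top_k (doc_id, score) pairs, highest score first.

    doc_vecs maps a document id to its embedding (hypothetical layout).
    """
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    # Toy embeddings; real vectors would come from the model's encode task.
    docs = {"a": [1.0, 0.0, 0.0], "b": [0.0, 1.0, 0.0], "c": [1.0, 1.0, 0.0]}
    for doc_id, score in rank_by_cosine([1.0, 0.1, 0.0], docs, top_k=2):
        print(doc_id, round(score, 4))
```

A brute-force scan like this is only practical for small corpora; larger collections usually rely on a vector index instead.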

Benchmarks

NFCorpus

medical retrieval en

Biomedical literature search from NutritionFacts.org

Corpus: 3,593
Queries: 323

Quality
nDCG@10: 0.3972
MAP@10: 0.1531
MRR@10: 0.6139

Performance (L4 b1 c16)
Corpus throughput: 1.7K tok/s
Corpus p50 latency: 2.7 s
Query throughput: 312 tok/s
Query p50 latency: 206.2 ms

NanoFiQA2018Retrieval

finance retrieval en

Smaller subset of the FiQA financial QA dataset

Quality
nDCG@10: 0.6289
MAP@10: 0.5506
MRR@10: 0.6275

Performance (L4 b1 c16)
Corpus throughput: 1.1K tok/s
Corpus p50 latency: 1.6 s
Query throughput: 556 tok/s
Query p50 latency: 196.8 ms
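
For readers unfamiliar with the quality metrics reported above, here is a minimal per-query sketch of nDCG, MAP, and MRR at a cutoff of 10. It is not the evaluation code behind these benchmarks (the page does not name the tool used). It assumes the common conventions: a log2 rank discount for nDCG, and average precision divided by the total number of relevant documents. The function names are hypothetical. Reported scores would be the mean of these values over all queries.

```python
import math


def dcg_at_k(gains, k):
    """Discounted cumulative gain over the first k gains (rank 1 gets log2(2))."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))


def ndcg_at_k(ranked_ids, relevance, k=10):
    """nDCG@k for one query; relevance maps doc id -> graded relevance."""
    gains = [relevance.get(doc_id, 0) for doc_id in ranked_ids]
    ideal = sorted(relevance.values(), reverse=True)
    idcg = dcg_at_k(ideal, k)
    return dcg_at_k(gains, k) / idcg if idcg > 0 else 0.0


def mrr_at_k(ranked_ids, relevance, k=10):
    """Reciprocal rank of the first relevant document within the top k."""
    for rank, doc_id in enumerate(ranked_ids[:k], start=1):
        if relevance.get(doc_id, 0) > 0:
            return 1.0 / rank
    return 0.0


def ap_at_k(ranked_ids, relevance, k=10):
    """Average precision@k, normalized by the total number of relevant docs
    (an assumed convention; some tools normalize differently)."""
    hits, total = 0, 0.0
    for rank, doc_id in enumerate(ranked_ids[:k], start=1):
        if relevance.get(doc_id, 0) > 0:
            hits += 1
            total += hits / rank  # precision at this relevant rank
    n_relevant = sum(1 for g in relevance.values() if g > 0)
    return total / n_relevant if n_relevant else 0.0


if __name__ == "__main__":
    # Toy example: two relevant docs, only one retrieved (at rank 2).
    ranked = ["d3", "d1", "d7"]
    qrels = {"d1": 1, "d2": 1}
    print(ndcg_at_k(ranked, qrels), ap_at_k(ranked, qrels), mrr_at_k(ranked, qrels))
```

The three metrics weight rank position differently, which is why MRR@10 can be much higher than MAP@10 on the same dataset: MRR only rewards the first relevant hit, while MAP penalizes every relevant document that is missed or ranked low.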
