GritLM/GritLM-7B

Primitive: /encode · Encode · Mistral

> GritLM is a generative representational instruction-tuned language model. It unifies text representation (embedding) and text generation in a single model, achieving state-of-the-art performance on both types of tasks.

Dense

Fine-tuned from mistralai/Mistral-7B-v0.1

Overview

Hardware drives latency, throughput, and cost.

Size	7.2B params
Tasks	/encode
License	apache-2.0
Latency	2.1 s
Throughput	1.4K tok/s
Cost	$0.157 /1M tok

Cost is approximate: it is computed from list GPU prices, and your actual price depends on the provider you deploy SIE with.
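The relationship between GPU price, throughput, and per-token cost can be sketched as follows. This is a minimal illustration, not the formula this page necessarily uses: the function name `cost_per_million_tokens` and the $0.80 hourly price are assumptions made for the example, and only the 1.4K tok/s throughput comes from the table above.

```python
def cost_per_million_tokens(gpu_price_per_hour: float, tokens_per_second: float) -> float:
    """Dollar cost to process one million tokens at a steady throughput."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    tokens_per_hour = tokens_per_second * 3600
    return gpu_price_per_hour / tokens_per_hour * 1_000_000


# Hypothetical list price of $0.80 per GPU-hour (an assumption, not from this page),
# combined with the 1.4K tok/s throughput listed in the overview table.
estimate = cost_per_million_tokens(gpu_price_per_hour=0.80, tokens_per_second=1400)
print(f"${estimate:.3f} per 1M tokens")
```

Because cost scales inversely with throughput, halving the sustained tokens per second at the same hourly price doubles the cost per million tokens.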

Embedding

Output types	Dense
Dimensions	dense: 4,096
Max sequence length	4,096
Inputs	text
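Dense embeddings like these are typically compared with a vector similarity measure. The sketch below ranks documents by cosine similarity, which is a common choice but an assumption here, since this page does not state which measure to use. The helper names are hypothetical, and toy 4-dimensional vectors stand in for the 4,096-dimensional outputs.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def rank_documents(query_vec, doc_vecs):
    """Return document indices sorted from most to least similar to the query."""
    scores = [cosine_similarity(query_vec, d) for d in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)


# Toy vectors (illustrative values only, not real model outputs).
query = [1.0, 0.0, 1.0, 0.0]
docs = [
    [0.0, 1.0, 0.0, 1.0],
    [1.0, 0.1, 0.9, 0.0],
    [0.5, 0.5, 0.5, 0.5],
]
ranking = rank_documents(query, docs)
print(ranking)
```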

Benchmarks

NFCorpus

medical retrieval en

Biomedical literature search from NutritionFacts.org

Corpus: 3,593 · Queries: 323

Quality

nDCG@10	0.3972
MAP@10	0.1531
MRR@10	0.6139
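For readers unfamiliar with the three quality metrics, here is a minimal per-query sketch of how each is commonly defined. It is illustrative only and not the evaluation code behind the numbers on this page. It assumes linear gain for nDCG and divides average precision by the total number of relevant documents; some tools divide by min(relevant, k) instead. Benchmark figures are the mean of these per-query values over all queries.

```python
import math


def ndcg_at_k(ranked_ids, relevance, k=10):
    """nDCG@k with linear gain; relevance maps doc id -> graded gain."""
    gains = [relevance.get(d, 0) for d in ranked_ids[:k]]
    dcg = sum(g / math.log2(i + 2) for i, g in enumerate(gains))
    ideal = sorted(relevance.values(), reverse=True)[:k]
    idcg = sum(g / math.log2(i + 2) for i, g in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0


def average_precision_at_k(ranked_ids, relevance, k=10):
    """AP@k, normalized by the total number of relevant documents (an assumption)."""
    relevant = {d for d, g in relevance.items() if g > 0}
    hits, total = 0, 0.0
    for rank, d in enumerate(ranked_ids[:k], start=1):
        if d in relevant:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / len(relevant) if relevant else 0.0


def reciprocal_rank_at_k(ranked_ids, relevance, k=10):
    """1 / rank of the first relevant document within the top k, else 0."""
    for rank, d in enumerate(ranked_ids[:k], start=1):
        if relevance.get(d, 0) > 0:
            return 1.0 / rank
    return 0.0


# Made-up single query: system ranking and graded relevance judgments.
ranked = ["d3", "d1", "d7", "d2"]
qrels = {"d1": 2, "d2": 1, "d9": 1}
scores = (
    ndcg_at_k(ranked, qrels),
    average_precision_at_k(ranked, qrels),
    reciprocal_rank_at_k(ranked, qrels),
)
print(scores)
```

MRR only looks at the first relevant hit, which is why it can be much higher than MAP on a corpus where each query has many relevant documents.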

Performance L4 b1 c16

Corpus throughput	1.7K tok/s
Corpus p50 latency	2.7 s
Query throughput	312 tok/s
Query p50 latency	206.2 ms
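The p50 and tok/s figures can be derived from raw request timings roughly as sketched below. The function name `summarize_run` and all sample values are invented for illustration. The sketch also assumes that requests may overlap in time (for example, if `c16` denotes 16 concurrent requests, which this page does not define), so throughput is taken over wall-clock time rather than over the sum of latencies.

```python
import statistics


def summarize_run(latencies_s, token_counts, wall_clock_s):
    """Return (p50 latency in seconds, throughput in tokens per second)."""
    if wall_clock_s <= 0:
        raise ValueError("wall_clock_s must be positive")
    p50 = statistics.median(latencies_s)  # p50 is the median request latency
    throughput = sum(token_counts) / wall_clock_s
    return p50, throughput


# Invented sample: five requests that overlapped within a 0.5 s window.
latencies = [0.19, 0.21, 0.20, 0.35, 0.18]
tokens = [50, 40, 60, 45, 55]
p50, tok_per_s = summarize_run(latencies, tokens, wall_clock_s=0.5)
print(f"p50 = {p50 * 1000:.1f} ms, throughput = {tok_per_s:.0f} tok/s")
```

The median is insensitive to the single slow request (0.35 s) in the sample, which is one reason p50 is reported alongside throughput rather than a mean latency.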


NanoFiQA2018Retrieval

finance retrieval en

Smaller subset of the FiQA financial QA dataset

Quality

nDCG@10	0.6289
MAP@10	0.5506
MRR@10	0.6275

Performance L4 b1 c16

Corpus throughput	1.1K tok/s
Corpus p50 latency	1.6 s
Query throughput	556 tok/s
Query p50 latency	196.8 ms