Alibaba-NLP/gte-modernbert-base
We are excited to introduce the `gte-modernbert` series of models, built on the ModernBERT pre-trained encoder-only foundation models. The series includes both text embedding models and reranking models.
Overview
Benchmarks
CQADupstackPhysicsRetrieval
Duplicate question retrieval from StackExchange Physics
Corpus: 38,314 Queries: 1,039
Quality
map at 10 0.4276
mrr at 10 0.4853
ndcg at 10 0.4922
CosQA
Code search with natural language queries
Corpus: 6,267 Queries: 500
Quality
map at 10 0.3438
mrr at 10 0.3946
ndcg at 10 0.4318
FiQA2018
Financial opinion mining and question answering
Corpus: 57,599 Queries: 648
Quality
map at 10 0.4340
mrr at 10 0.6064
ndcg at 10 0.5243
NFCorpus
Biomedical literature search from NutritionFacts.org
Corpus: 3,593 Queries: 323
Quality
map at 10 0.1335
mrr at 10 0.5635
ndcg at 10 0.3664
SCIDOCS
Citation prediction, document classification, and recommendation for scientific papers
Corpus: 25,656 Queries: 1,000
Quality
map at 10 0.1183
mrr at 10 0.3498
ndcg at 10 0.1997
SciFact
Scientific claim verification using research literature
Corpus: 5,183 Queries: 300
Quality
map at 10 0.7297
mrr at 10 0.7400
ndcg at 10 0.7771
StackOverflowQA
Programming question answering from Stack Overflow
Corpus: 19,931 Queries: 1,994
Quality
map at 10 0.8955
mrr at 10 0.8955
ndcg at 10 0.9111
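The three quality figures reported for each benchmark (map at 10, mrr at 10, ndcg at 10) are per-query ranking metrics averaged over all queries. The sketch below shows one common way to compute them for a single query with binary relevance labels. It is an illustration only, not the evaluation code that produced the numbers above: the function names, the toy document IDs, and the choice to normalize average precision by the total number of relevant documents are all assumptions, and evaluation toolkits differ in such details.

```python
import math

K = 10  # cutoff used by all three metrics ("at 10")


def reciprocal_rank_at_k(ranked, relevant, k=K):
    """1 / rank of the first relevant document in the top k, else 0."""
    for rank, doc_id in enumerate(ranked[:k], start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def average_precision_at_k(ranked, relevant, k=K):
    """Mean of precision values taken at each relevant hit in the top k.

    Assumption: the sum is divided by the total number of relevant
    documents for the query; some toolkits use min(len(relevant), k).
    """
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked[:k], start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def ndcg_at_k(ranked, relevant, k=K):
    """Binary-gain DCG of the ranking divided by the DCG of an ideal ranking."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc_id in enumerate(ranked[:k], start=1)
        if doc_id in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0


# Hypothetical query: the system returned four documents, two are relevant.
ranked = ["d3", "d1", "d7", "d2"]
relevant = {"d1", "d2"}

rr = reciprocal_rank_at_k(ranked, relevant)    # first hit at rank 2
ap = average_precision_at_k(ranked, relevant)  # hits at ranks 2 and 4
ndcg = ndcg_at_k(ranked, relevant)
print(f"RR@10={rr:.4f}  AP@10={ap:.4f}  nDCG@10={ndcg:.4f}")
```

The benchmark-level value is the mean of the per-query value over all queries. Under these definitions, a query with exactly one relevant document has identical average precision and reciprocal rank, which would be consistent with the equal map at 10 and mrr at 10 scores reported for StackOverflowQA, assuming that dataset has a single relevant document per query.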