---
title: What is BGE-M3?
description: "BGE-M3 is an open-source text embedding model developed by BAAI (Beijing Academy of Artificial Intelligence) that supports three retrieval modes simultaneously: dense retrieval, sparse retrieval, and multi-vector (ColBERT-style) retrieval. It supports 100+ languages and is one of the highest-performing general-purpo..."
canonical_url: https://superlinked.com/glossary/what-is-bge-m3
last_updated: 2026-06-02
---

# What is BGE-M3?

BGE-M3 is an open-source text embedding model developed by BAAI (Beijing Academy of Artificial Intelligence) that produces three kinds of representation from a single model: dense, sparse, and multi-vector (ColBERT-style). It supports 100+ languages and inputs of up to 8,192 tokens, and it is among the stronger general-purpose multilingual embedding models available.

---

## Why is BGE-M3 significant?

Most embedding models only support one retrieval mode — typically dense (single vector) retrieval. BGE-M3 is unusual in that a single model can produce all three types of representations:

- **Dense vectors** — a single fixed-size vector per text, used for standard semantic search
- **Sparse vectors** — term-weighted representations similar to BM25, good for keyword-sensitive queries
- **Multi-vectors** — one vector per token (ColBERT-style), enabling fine-grained token-level matching

This means you can run hybrid retrieval with a single model and combine all three signals when ranking, without deploying a separate model for each.
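To make the three representations concrete, here is a minimal sketch of how each one is typically scored and how the scores can be combined. It uses toy vectors rather than real model output, and the function names and the combination weights are illustrative assumptions, not values taken from BGE-M3 or any library.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def dense_score(q_vec, d_vec):
    """Cosine similarity between one query vector and one document vector."""
    norm = math.sqrt(dot(q_vec, q_vec)) * math.sqrt(dot(d_vec, d_vec))
    return dot(q_vec, d_vec) / norm


def sparse_score(q_weights, d_weights):
    """Sum of weight products over terms that appear in both texts."""
    shared = q_weights.keys() & d_weights.keys()
    return sum(q_weights[t] * d_weights[t] for t in shared)


def multi_vector_score(q_tokens, d_tokens):
    """Late interaction (MaxSim): each query token takes its best-matching
    document token, and the per-token maxima are averaged."""
    best = [max(dot(q, d) for d in d_tokens) for q in q_tokens]
    return sum(best) / len(best)


def hybrid_score(dense, sparse, multi, weights=(1.0, 0.3, 1.0)):
    """Weighted sum of the three signals. The weights are placeholders
    and would normally be tuned on your own evaluation data."""
    w_dense, w_sparse, w_multi = weights
    return w_dense * dense + w_sparse * sparse + w_multi * multi


# Toy inputs standing in for model output (hypothetical values)
q_dense, d_dense = [1.0, 0.0], [1.0, 1.0]
q_sparse = {"bge": 0.8, "license": 0.5}
d_sparse = {"bge": 0.5, "model": 0.4}
q_multi = [[1.0, 0.0], [0.0, 1.0]]
d_multi = [[1.0, 0.0], [0.6, 0.8], [0.0, -1.0]]

s_dense = dense_score(q_dense, d_dense)
s_sparse = sparse_score(q_sparse, d_sparse)
s_multi = multi_vector_score(q_multi, d_multi)
combined = hybrid_score(s_dense, s_sparse, s_multi)
print(s_dense, s_sparse, s_multi, combined)
```

In practice the multi-vector score is the most expensive of the three, so a common design is to retrieve candidates with the dense or sparse score and apply the multi-vector score only when re-ranking.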

---

## BGE-M3 capabilities at a glance

| Capability | Detail |
|---|---|
| Languages | 100+ |
| Max input length | 8,192 tokens |
| Retrieval modes | Dense, sparse, multi-vector |
| Model size | ~570M parameters |
| Open source | ✓ (MIT) |
| Benchmark performance | Strong on multilingual and long-document retrieval (e.g. MIRACL, MLDR) |

---

## How does BGE-M3 work?

BGE-M3 is based on the XLM-RoBERTa architecture, extended and fine-tuned using a multi-stage training process:

1. **RetroMAE pre-training**: improves the model's general text understanding
2. **Multi-task fine-tuning**: trains the model across dense, sparse, and multi-vector objectives simultaneously
3. **Self-knowledge distillation**: combines the relevance scores from all three modes into a single teacher signal, which each individual mode is then trained to match
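The distillation step can be sketched in a few lines. This is a simplified illustration, not the authors' training code: it assumes the teacher is an equally weighted sum of the three modes' scores, and that each mode is trained with a cross-entropy loss against the teacher's softmax distribution over a set of candidate passages. The scores are made-up numbers.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_scores, teacher_scores):
    """Cross-entropy between the teacher's and the student's softmax
    distributions over the same candidate passages."""
    p_teacher = softmax(teacher_scores)
    p_student = softmax(student_scores)
    return -sum(t * math.log(s) for t, s in zip(p_teacher, p_student))


# Hypothetical scores of one query against three candidate passages
mode_scores = {
    "dense": [0.9, 0.2, 0.1],
    "sparse": [0.4, 0.5, 0.0],
    "multi": [0.8, 0.3, 0.2],
}

# Teacher signal: the combined score across all three modes
# (equal weights are an assumption of this sketch)
teacher = [sum(triple) for triple in zip(*mode_scores.values())]

# Each mode is pulled towards the combined ranking
losses = {name: distillation_loss(scores, teacher)
          for name, scores in mode_scores.items()}
print(losses)
```

The intuition is that the combined score is usually a better ranker than any single mode, so using it as a soft target lets each mode learn from the other two.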

The result is a single model whose dense, sparse, and multi-vector outputs are each competitive with models specialised for that mode, according to the results reported by its authors.

---

## When should you use BGE-M3?

BGE-M3 is a strong default choice for:

- **Multilingual search** — supports 100+ languages with a single model
- **Long documents** — 8,192 token context handles entire pages or legal clauses
- **Hybrid retrieval** — produce dense + sparse vectors from one model
- **RAG pipelines** — reliable performance across diverse document types
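For the long-document case, inputs that exceed the 8,192-token limit still need to be split before encoding. The sketch below shows one common approach, overlapping sliding windows. It assumes the text has already been tokenized with the model's own tokenizer; `chunk_tokens` and the overlap of 256 are hypothetical choices for illustration, and a real pipeline would also leave room for any special tokens the tokenizer adds.

```python
def chunk_tokens(tokens, max_len=8192, overlap=256):
    """Split a token sequence into windows of at most max_len tokens,
    with consecutive windows sharing `overlap` tokens."""
    if not 0 <= overlap < max_len:
        raise ValueError("overlap must be non-negative and smaller than max_len")
    step = max_len - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + max_len])
        # Stop once this window already reaches the end of the input
        if start + max_len >= len(tokens):
            break
    return chunks


# Stand-in for real token ids
tokens = list(range(20000))
chunks = chunk_tokens(tokens)
print([len(c) for c in chunks])
```

Each chunk can then be encoded separately and its vectors stored alongside a reference to the parent document.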

For highly specialised domains (legal, medical, code), consider a LoRA adapter fine-tuned on domain data on top of BGE-M3.

---

## How do you run BGE-M3 with SIE?

SIE supports BGE-M3 out of the box across all three retrieval modes:

```python
from sie_sdk import SIEClient
from sie_sdk.types import Item

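client = SIEClient("http://localhost:8080")

# Example corpus; replace with your own texts
documents = [
    "BGE-M3 supports dense, sparse, and multi-vector retrieval.",
    "Hybrid search combines keyword and semantic signals.",
]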

# Dense retrieval (standard semantic search)
dense_results = client.encode("BAAI/bge-m3", [Item(text=d) for d in documents])
dense_vectors = [r["dense"] for r in dense_results]

# With sparse vectors for hybrid retrieval
hybrid_results = client.encode(
    "BAAI/bge-m3",
    [Item(text=d) for d in documents],
    output_types=["dense", "sparse"],
)
hybrid_dense = [r["dense"] for r in hybrid_results]
hybrid_sparse = [r["sparse"] for r in hybrid_results]

# With a domain LoRA adapter
legal_results = client.encode(
    "BAAI/bge-m3",
    [Item(text=d) for d in documents],
    options={"lora_id": "org/bge-m3-legal-lora"},
)
legal_vectors = [r["dense"] for r in legal_results]
```

SIE's self-hosted deployment means your documents never leave your AWS or GCP environment, and GPU batching can make encoding large corpora faster than issuing per-request calls to a managed API.

---

## BGE-M3 vs other embedding models

| Model | Multilingual | Max tokens | Retrieval modes | Self-hostable |
|---|---|---|---|---|
| BGE-M3 | ✓ (100+) | 8,192 | Dense + sparse + multi-vector | ✓ |
| E5-large | Limited | 512 | Dense | ✓ |
| OpenAI text-embedding-3 | ✓ | 8,191 | Dense | ✗ |
| Cohere Embed v3 | ✓ | 512 | Dense | ✗ |

---

## Frequently asked questions

**Is BGE-M3 free to use?**
Yes. BGE-M3 is released under the MIT licence, which permits commercial use. SIE is Apache 2.0 licensed.

**How does BGE-M3 compare to OpenAI's embedding models?**
On the multilingual and long-document retrieval benchmarks reported by its authors (such as MIRACL and MLDR), BGE-M3 is competitive with OpenAI's text-embedding-3-large. The key practical advantage is that BGE-M3 is fully self-hostable: no per-token fees and no data leaving your infrastructure.

**Can BGE-M3 be fine-tuned for specific domains?**
Yes. SIE supports LoRA hot-loading, allowing you to apply domain-specific fine-tuned adapters to BGE-M3 at inference time without restarting the server.

---

## Related resources

- [BGE-M3 on the SIE model hub](/models)
- [SIE vs TEI vs OpenAI benchmark](/docs/examples/benchmark)
- [Multivector encoding in SIE](/docs/encode/multivector)
- [What is hybrid search?](/glossary/what-is-hybrid-search)
- [What is a LoRA adapter?](/glossary/what-is-a-lora-adapter)
