What is BGE-M3?
BGE-M3 is an open-source text embedding model developed by BAAI (Beijing Academy of Artificial Intelligence) that supports three retrieval modes simultaneously: dense retrieval, sparse retrieval, and multi-vector (ColBERT-style) retrieval. It supports 100+ languages and is one of the highest-performing general-purpose embedding models available.
Why is BGE-M3 significant?
Most embedding models only support one retrieval mode — typically dense (single vector) retrieval. BGE-M3 is unusual in that a single model can produce all three types of representations:
- Dense vectors — a single fixed-size vector per text, used for standard semantic search
- Sparse vectors — term-weighted representations similar to BM25, good for keyword-sensitive queries
- Multi-vectors — one vector per token (ColBERT-style), enabling fine-grained token-level matching
This means you can run hybrid retrieval using a single model, and combine all three signals for maximum accuracy — without deploying multiple models.
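As a rough illustration of how the three signals differ and how they might be combined, the sketch below scores one query against one document using small hand-written vectors. The function names, the toy values, and the fusion weights are all assumptions made for this example; they are not taken from BGE-M3 or from any library. In practice the vectors come from the model and the weights are tuned on your own data.

```python
def dense_score(q, d):
    """Inner product of two dense vectors (cosine similarity if both are L2-normalised)."""
    return sum(a * b for a, b in zip(q, d))


def sparse_score(q_weights, d_weights):
    """Sum of weight products over the terms shared by query and document."""
    return sum(w * d_weights[t] for t, w in q_weights.items() if t in d_weights)


def multi_vector_score(q_vecs, d_vecs):
    """ColBERT-style late interaction: best document token per query token, averaged."""
    best_matches = [max(dense_score(qv, dv) for dv in d_vecs) for qv in q_vecs]
    return sum(best_matches) / len(best_matches)


def hybrid_score(dense, sparse, multi, weights=(1.0, 0.3, 1.0)):
    """Weighted average of the three scores. The default weights are illustrative only."""
    w_d, w_s, w_m = weights
    return (w_d * dense + w_s * sparse + w_m * multi) / (w_d + w_s + w_m)


# Toy inputs (hypothetical values, not real model output)
q_dense, d_dense = [0.6, 0.8], [0.8, 0.6]
q_sparse = {"bge": 0.5, "license": 0.4}
d_sparse = {"bge": 0.6, "model": 0.2}
q_multi = [[1.0, 0.0], [0.0, 1.0]]
d_multi = [[0.6, 0.8], [1.0, 0.0]]

s_dense = dense_score(q_dense, d_dense)
s_sparse = sparse_score(q_sparse, d_sparse)
s_multi = multi_vector_score(q_multi, d_multi)
combined = hybrid_score(s_dense, s_sparse, s_multi)
print(s_dense, s_sparse, s_multi, combined)
```

The sparse score only rewards exact term overlap, while the multi-vector score lets each query token find its closest document token, which is why the three signals tend to complement each other.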
BGE-M3 capabilities at a glance
| Capability | Detail |
|---|---|
| Languages | 100+ |
| Max input length | 8,192 tokens |
| Retrieval modes | Dense, sparse, multi-vector |
| Model size | ~570M parameters |
| Open source | ✓ (MIT) |
| Benchmark performance | Strong reported results on multilingual (MIRACL) and long-document (MLDR) retrieval |
How does BGE-M3 work?
BGE-M3 is based on the XLM-RoBERTa architecture, extended and fine-tuned using a multi-stage training process:
- RetroMAE pre-training — improves the model’s general text understanding
- Multi-task fine-tuning — trains the model across dense, sparse, and multi-vector objectives simultaneously
- Self-knowledge distillation — combines the model’s own dense, sparse, and multi-vector relevance scores into a single teacher signal that each individual mode is then trained to match
The result is a single model that, in its authors’ reported evaluations, is competitive with or better than specialised models in each individual retrieval mode.
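To make the self-knowledge distillation step more concrete, here is a minimal sketch of the idea as it is usually described for BGE-M3: the scores from the three modes are summed with weights into a teacher score, and each mode is penalised by the cross-entropy between the teacher's softmax distribution and its own. The function names, the weights, and the candidate scores below are assumptions for illustration. Real training runs on batched tensors with gradients, not on Python lists.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def integrated_scores(dense, sparse, multi, weights=(1.0, 0.3, 1.0)):
    """Weighted sum of the three per-candidate scores. The weights are illustrative."""
    w_d, w_s, w_m = weights
    return [w_d * d + w_s * s + w_m * m for d, s, m in zip(dense, sparse, multi)]


def distillation_loss(student_scores, teacher_scores):
    """Cross-entropy of the student's distribution against the teacher's soft labels."""
    teacher = softmax(teacher_scores)
    student = softmax(student_scores)
    return -sum(t * math.log(s) for t, s in zip(teacher, student))


# Hypothetical scores for one query against three candidate passages
dense_scores = [0.9, 0.4, 0.2]
sparse_scores = [0.5, 0.6, 0.1]
multi_scores = [0.8, 0.3, 0.3]

teacher_scores = integrated_scores(dense_scores, sparse_scores, multi_scores)
for name, scores in [("dense", dense_scores), ("sparse", sparse_scores), ("multi", multi_scores)]:
    print(name, distillation_loss(scores, teacher_scores))
```

A mode whose ranking disagrees with the combined score (here, the sparse scores prefer the second passage) receives a larger loss, which nudges it toward the consensus of all three.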
When should you use BGE-M3?
BGE-M3 is a strong default choice for:
- Multilingual search — supports 100+ languages with a single model
- Long documents — 8,192 token context handles entire pages or legal clauses
- Hybrid retrieval — produce dense + sparse vectors from one model
- RAG pipelines — reliable performance across diverse document types
For highly specialised domains (legal, medical, code), consider a LoRA adapter fine-tuned on domain data on top of BGE-M3.
How do you run BGE-M3 with SIE?
SIE supports BGE-M3 out of the box across all three retrieval modes:
```python
from sie_sdk import SIEClient
from sie_sdk.types import Item

client = SIEClient("http://localhost:8080")

# Example input; replace with your own texts
documents = ["First document text.", "Second document text."]

# Dense retrieval (standard semantic search)
dense_results = client.encode("BAAI/bge-m3", [Item(text=d) for d in documents])
dense_vectors = [r["dense"] for r in dense_results]

# With sparse vectors for hybrid retrieval
hybrid_results = client.encode(
    "BAAI/bge-m3",
    [Item(text=d) for d in documents],
    output_types=["dense", "sparse"],
)
hybrid_dense = [r["dense"] for r in hybrid_results]
hybrid_sparse = [r["sparse"] for r in hybrid_results]

# With a domain LoRA adapter
legal_results = client.encode(
    "BAAI/bge-m3",
    [Item(text=d) for d in documents],
    options={"lora_id": "org/bge-m3-legal-lora"},
)
legal_vectors = [r["dense"] for r in legal_results]
```

SIE’s self-hosted deployment means your documents never leave your AWS or GCP environment, and GPU batching makes encoding large corpora significantly faster than managed API calls.
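Once dense and sparse results are available, the two rankings still have to be merged. One widely used, model-agnostic option is reciprocal rank fusion (RRF), sketched below. This is a general technique, not something specific to BGE-M3 or SIE. The document IDs, the function name, and the constant `k=60` (a common default) are assumptions for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document IDs into one list.

    Each document earns 1 / (k + rank) from every list it appears in;
    documents are returned in order of total score, highest first.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score, breaking ties by ID for deterministic output
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical top-3 results from a dense search and a sparse search
dense_ranking = ["doc_a", "doc_b", "doc_c"]
sparse_ranking = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([dense_ranking, sparse_ranking])
print(fused)
```

Because RRF uses only rank positions, it avoids having to normalise dense and sparse scores onto a common scale, at the cost of ignoring how large the score gaps are.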
BGE-M3 vs other embedding models
| Model | Multilingual | Max tokens | Retrieval modes | Self-hostable |
|---|---|---|---|---|
| BGE-M3 | ✓ (100+) | 8,192 | Dense + sparse + multi-vector | ✓ |
| E5-large | Limited | 512 | Dense | ✓ |
| OpenAI text-embedding-3 | ✓ | 8,191 | Dense | ✗ |
| Cohere Embed v3 | ✓ | 512 | Dense | ✗ |
Frequently asked questions
Is BGE-M3 free to use? Yes. BGE-M3 is released under the MIT licence and can be used freely for commercial applications. SIE is Apache 2.0 licensed.
How does BGE-M3 compare to OpenAI’s embedding models? On MTEB benchmarks, BGE-M3 is competitive with OpenAI’s text-embedding-3-large, particularly for multilingual and long-document tasks. The key advantage is that BGE-M3 is fully self-hostable — no per-token fees, no data leaving your infrastructure.
Can BGE-M3 be fine-tuned for specific domains? Yes. SIE supports LoRA hot-loading, allowing you to apply domain-specific fine-tuned adapters to BGE-M3 at inference time without restarting the server.