What is BGE-M3?

BGE-M3 is an open-source text embedding model developed by BAAI (Beijing Academy of Artificial Intelligence) that supports three retrieval modes simultaneously: dense retrieval, sparse retrieval, and multi-vector (ColBERT-style) retrieval. It supports 100+ languages and is one of the highest-performing general-purpose embedding models available.


Why is BGE-M3 significant?

Most embedding models only support one retrieval mode — typically dense (single vector) retrieval. BGE-M3 is unusual in that a single model can produce all three types of representations:

  • Dense vectors — a single fixed-size vector per text, used for standard semantic search
  • Sparse vectors — term-weighted representations similar to BM25, good for keyword-sensitive queries
  • Multi-vectors — one vector per token (ColBERT-style), enabling fine-grained token-level matching

This means you can run hybrid retrieval using a single model, and combine all three signals for maximum accuracy — without deploying multiple models.
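
To make the three signals concrete, here is a minimal sketch of how each representation can be scored and then combined. It assumes dense vectors are plain lists of floats, sparse vectors are dictionaries mapping a term to its weight, and multi-vectors are lists of per-token vectors. The function names, the data layout, and the default weights are illustrative assumptions, not values or an API taken from BGE-M3 itself.

```python
def dense_score(q, d):
    """Inner product of two dense vectors (cosine similarity if both are unit-length)."""
    return sum(a * b for a, b in zip(q, d))


def sparse_score(q, d):
    """Sum of weight products over the terms that appear in both texts."""
    return sum(w * d[term] for term, w in q.items() if term in d)


def multi_vector_score(q_vecs, d_vecs):
    """ColBERT-style MaxSim: best-matching document token per query token, averaged."""
    best = [max(dense_score(qv, dv) for dv in d_vecs) for qv in q_vecs]
    return sum(best) / len(best)


def hybrid_score(query, doc, weights=(1.0, 0.3, 1.0)):
    """Weighted sum of the three scores. The default weights are placeholders to tune."""
    w_dense, w_sparse, w_multi = weights
    return (
        w_dense * dense_score(query["dense"], doc["dense"])
        + w_sparse * sparse_score(query["sparse"], doc["sparse"])
        + w_multi * multi_vector_score(query["multi"], doc["multi"])
    )


# Toy representations, hand-written for illustration only
query = {
    "dense": [1.0, 0.0],
    "sparse": {"bge": 0.5, "model": 0.2},
    "multi": [[1.0, 0.0], [0.0, 1.0]],
}
doc = {
    "dense": [0.6, 0.8],
    "sparse": {"bge": 0.4, "search": 0.9},
    "multi": [[1.0, 0.0], [0.6, 0.8]],
}

print(hybrid_score(query, doc))
```

In practice the weights are tuned on a validation set, since the three scores live on different scales.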


BGE-M3 capabilities at a glance

Capability | Detail
Languages | 100+
Max input length | 8,192 tokens
Retrieval modes | Dense, sparse, multi-vector
Model size | ~570M parameters
Open source | ✓ (MIT)
MTEB performance | Top-tier general-purpose

How does BGE-M3 work?

BGE-M3 is based on the XLM-RoBERTa architecture, extended and fine-tuned using a multi-stage training process:

  1. RetroMAE pre-training — improves the model’s general text understanding
  2. Multi-task fine-tuning — trains the model across dense, sparse, and multi-vector objectives simultaneously
  3. Self-knowledge distillation — combines the relevance scores from all three retrieval modes into a teacher signal that is used to improve each individual mode

The result is a single model that outperforms specialised models in each individual retrieval mode.
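
The distillation idea can be sketched in a few lines. This is a simplified illustration, not the actual training code: it assumes the teacher signal is the plain sum of the three mode scores for each candidate passage, and that each mode is trained to match the teacher's softmax distribution with a soft cross-entropy loss. The scores, function names, and the unweighted sum are all assumptions made for the example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_scores, student_scores):
    """Soft cross-entropy: -sum_i p_teacher(i) * log p_student(i)."""
    p_teacher = softmax(teacher_scores)
    p_student = softmax(student_scores)
    return -sum(t * math.log(s) for t, s in zip(p_teacher, p_student))


# Hypothetical scores of one query against three candidate passages, per mode
mode_scores = {
    "dense": [2.0, 0.5, 0.1],
    "sparse": [1.0, 1.2, 0.0],
    "multi": [2.5, 0.4, 0.2],
}

# Assumed teacher: unweighted sum of the three mode scores per passage
teacher = [sum(per_passage) for per_passage in zip(*mode_scores.values())]

# One loss per mode; each pulls that mode's ranking toward the combined ranking
losses = {name: distillation_loss(teacher, s) for name, s in mode_scores.items()}
for name, loss in losses.items():
    print(f"{name}: {loss:.4f}")
```

The loss for a mode is smallest when its score distribution already matches the teacher's, so the modes that disagree most with the combined ranking receive the strongest correction.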


When should you use BGE-M3?

BGE-M3 is a strong default choice for:

  • Multilingual search — supports 100+ languages with a single model
  • Long documents — 8,192 token context handles entire pages or legal clauses
  • Hybrid retrieval — produce dense + sparse vectors from one model
  • RAG pipelines — reliable performance across diverse document types

For highly specialised domains (legal, medical, code), consider a LoRA adapter fine-tuned on domain data on top of BGE-M3.


How do you run BGE-M3 with SIE?

SIE supports BGE-M3 out of the box across all three retrieval modes:

from sie_sdk import SIEClient
from sie_sdk.types import Item

client = SIEClient("http://localhost:8080")

# Example corpus; replace with your own texts
documents = [
    "BGE-M3 supports dense, sparse, and multi-vector retrieval.",
    "Hybrid search combines semantic and keyword signals.",
]

# Dense retrieval (standard semantic search)
dense_results = client.encode("BAAI/bge-m3", [Item(text=d) for d in documents])
dense_vectors = [r["dense"] for r in dense_results]

# Dense and sparse vectors together, for hybrid retrieval
hybrid_results = client.encode(
    "BAAI/bge-m3",
    [Item(text=d) for d in documents],
    output_types=["dense", "sparse"],
)
hybrid_dense = [r["dense"] for r in hybrid_results]
hybrid_sparse = [r["sparse"] for r in hybrid_results]

# With a domain LoRA adapter
legal_results = client.encode(
    "BAAI/bge-m3",
    [Item(text=d) for d in documents],
    options={"lora_id": "org/bge-m3-legal-lora"},
)
legal_vectors = [r["dense"] for r in legal_results]

SIE’s self-hosted deployment means your documents never leave your AWS or GCP environment, and GPU batching makes encoding large corpora significantly faster than managed API calls.
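
Once dense and sparse vectors are indexed, the two result lists for a query still have to be merged. One common option is reciprocal rank fusion (RRF), sketched below. It is independent of SIE and of BGE-M3: it only needs the ranked document IDs from each retriever. The document IDs are made up for the example, and k=60 is the conventional default for RRF rather than a value from this article.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into a single ranking.

    Each document receives sum(1 / (k + rank)) over the lists it appears in,
    with ranks starting at 1. Ties are broken by document ID for determinism.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical top results for one query from each retrieval mode
dense_ranking = ["doc2", "doc1", "doc3"]
sparse_ranking = ["doc1", "doc4", "doc2"]

fused = reciprocal_rank_fusion([dense_ranking, sparse_ranking])
print(fused)  # ['doc1', 'doc2', 'doc4', 'doc3']
```

RRF uses ranks rather than raw scores, so it avoids having to calibrate dense and sparse scores against each other.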


BGE-M3 vs other embedding models

Model | Multilingual | Max tokens | Retrieval modes | Self-hostable
BGE-M3 | ✓ (100+) | 8,192 | Dense + sparse + multi-vector |
E5-large | Limited | 512 | Dense |
OpenAI text-embedding-3 | | 8,191 | Dense |
Cohere Embed v3 | | 512 | Dense + sparse |

Frequently asked questions

Is BGE-M3 free to use? Yes. BGE-M3 is released under the MIT licence and can be used freely for commercial applications. SIE is Apache 2.0 licensed.

How does BGE-M3 compare to OpenAI’s embedding models? On MTEB benchmarks, BGE-M3 is competitive with OpenAI’s text-embedding-3-large, particularly for multilingual and long-document tasks. The key advantage is that BGE-M3 is fully self-hostable — no per-token fees, no data leaving your infrastructure.

Can BGE-M3 be fine-tuned for specific domains? Yes. SIE supports LoRA hot-loading, allowing you to apply domain-specific fine-tuned adapters to BGE-M3 at inference time without restarting the server.

