
What is Semantic Search?

Semantic search is a retrieval technique that finds results based on the meaning of a query rather than exact keyword matches. Instead of matching words character-by-character, it converts text into vector embeddings and retrieves items whose embeddings are closest to the query’s embedding in high-dimensional space.
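
To make "closest in embedding space" concrete, here is a minimal sketch of cosine similarity, one common way to compare embeddings. The three-dimensional vectors are hand-made toy values chosen for illustration, not output from any real model; real embeddings typically have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hand-made toy vectors (an assumption for illustration, NOT real model output).
toy_embeddings = {
    "cheap flights": [0.9, 0.1, 0.0],
    "affordable airfare": [0.8, 0.2, 0.1],
    "chocolate cake recipe": [0.0, 0.2, 0.9],
}

query = toy_embeddings["cheap flights"]
for text, vector in toy_embeddings.items():
    # Higher values mean the two vectors point in more similar directions.
    print(f"{text!r}: {cosine_similarity(query, vector):.3f}")
```

With these toy values the two travel phrases score close to 1, while the unrelated phrase scores close to 0.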


Why does semantic search matter?

Traditional keyword search breaks down when users phrase queries differently from how content is written. A search for “cheap flights” won’t match a document that says “affordable airfare” — even though they mean the same thing.

Semantic search solves this by working at the level of meaning. Both “cheap flights” and “affordable airfare” map to nearby regions of embedding space, so either query retrieves largely the same documents. This makes search more robust to paraphrasing and synonyms and, with a multilingual embedding model, to queries written in a different language from the content.


How does semantic search work?

  1. Encoding — a text embedding model (e.g. BGE-M3, E5-large) converts your corpus of documents into dense vectors at index time.
  2. Query encoding — at search time, the user’s query is encoded into a vector using the same model.
  3. Nearest neighbour retrieval — an approximate nearest neighbour (ANN) algorithm (e.g. HNSW) finds the corpus vectors closest to the query vector.
  4. Optional reranking — a cross-encoder reranker rescores the top-k results for higher precision.
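
Steps 1 to 3 can be sketched in a few lines. This is a simplified illustration under stated assumptions: the vectors are hand-made toy values standing in for real embeddings, the document IDs are hypothetical, and the search is exact brute force rather than ANN. An index such as HNSW aims to return approximately the same neighbours without scanning every vector.

```python
import heapq
import math


def normalize(vector):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]


def top_k(query_vector, corpus_vectors, k=3):
    """Return the k (doc_id, score) pairs with the highest cosine similarity.

    Exact brute-force scan over every document; an ANN index trades a little
    accuracy for much better speed on large corpora.
    """
    q = normalize(query_vector)
    scored = (
        (doc_id, sum(a * b for a, b in zip(q, normalize(vec))))
        for doc_id, vec in corpus_vectors.items()
    )
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Toy stand-ins for vectors computed at index time (hypothetical IDs and values).
corpus_vectors = {
    "doc-flights": [0.8, 0.2, 0.1],
    "doc-hotels": [0.5, 0.7, 0.1],
    "doc-baking": [0.0, 0.2, 0.9],
}

# Toy stand-in for the encoded query.
query_vector = [0.9, 0.1, 0.0]

results = top_k(query_vector, corpus_vectors, k=2)
for doc_id, score in results:
    print(doc_id, round(score, 3))
```

The results of `top_k` are what an optional reranker (step 4) would then rescore.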

Retrieval quality depends heavily on the choice of embedding model. Models trained on retrieval pairs (queries matched with relevant passages) generally outperform general-purpose models on search tasks.


How do you implement semantic search with SIE?

SIE provides self-hosted inference for the embedding and reranking steps, giving you full control over model selection and keeping data within your own cloud.

from sie_sdk import SIEClient
from sie_sdk.types import Item

client = SIEClient("http://localhost:8080")

# Example corpus; replace with your own documents
documents = ["Affordable airfare deals", "How to pack a carry-on bag"]

# Encode documents at index time
doc_vectors = [r["dense"] for r in client.encode("BAAI/bge-m3", [Item(text=d) for d in documents])]

# Encode the query at search time, using the same model
query_vector = client.encode(
    "BAAI/bge-m3",
    Item(text="what is affordable airfare?"),
    is_query=True,
)["dense"]

# Pass doc_vectors and query_vector to your vector DB for ANN retrieval

SIE supports 85+ embedding models, including multilingual, multi-vector, and instruction-following variants — so you can pick the right model for your retrieval task without changing your infrastructure.


                          Keyword search   Semantic search            Hybrid search
Matches on                Exact terms      Meaning / intent           Both
Handles synonyms          No               Yes                        Yes
Handles rare terms        Yes              Limited                    Yes
Requires embedding model  No               Yes                        Yes
Best for                  Exact lookups    Natural language queries   Most production use cases

For most production systems, hybrid search that combines BM25 with semantic retrieval typically outperforms either approach alone.
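
One common way to combine a BM25 ranking with a semantic ranking is reciprocal rank fusion (RRF). The sketch below is an assumption about how such a combination could be done, not a description of how SIE or any particular vector DB does it. The document IDs and rankings are hypothetical, and `k=60` is the constant conventionally used with RRF.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores 1 / (k + rank) for every list it appears in, so
    documents ranked highly by several retrievers rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending fused score; break ties by ID for a stable order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical rankings from two retrievers for the same query.
bm25_ranking = ["doc-sku-123", "doc-flights", "doc-hotels"]
semantic_ranking = ["doc-flights", "doc-airfare", "doc-sku-123"]

fused = reciprocal_rank_fusion([bm25_ranking, semantic_ranking])
print(fused)
```

RRF uses only rank positions, so the BM25 and cosine scores never have to be put on a common scale.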


Frequently asked questions

What’s the difference between semantic search and vector search?
Vector search is the retrieval mechanism: finding the stored vectors most similar to a query vector. Semantic search is the broader capability. It uses vector search as its engine and text embedding models to generate the vectors.

Does semantic search work in languages other than English?
Yes. Multilingual models such as BGE-M3 support 100+ languages. SIE lets you self-host these models, so multilingual queries are handled without data leaving your infrastructure.

How accurate is semantic search compared to keyword search?
On natural language queries, semantic search typically achieves higher recall than keyword search because it also matches paraphrases and synonyms. For highly specific technical terms or product codes, where exact matching matters, hybrid search is recommended.
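
To compare retrievers on your own data, you can measure recall@k: the fraction of known-relevant documents that appear in the top k results. The sketch below uses made-up result lists and relevance labels purely to show the calculation. They are not benchmark results.

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of relevant documents found in the first k retrieved results."""
    if k <= 0:
        raise ValueError("k must be positive")
    relevant_set = set(relevant)
    if not relevant_set:
        raise ValueError("at least one relevant document is required")
    found = set(retrieved[:k]) & relevant_set
    return len(found) / len(relevant_set)


# Made-up labels and result lists for one query (illustration only).
relevant_docs = ["a", "b", "c", "d"]
keyword_results = ["a", "x", "y", "b", "z"]
semantic_results = ["a", "c", "b", "x", "d"]

print("keyword recall@3:", recall_at_k(keyword_results, relevant_docs, k=3))
print("semantic recall@3:", recall_at_k(semantic_results, relevant_docs, k=3))
```

Averaging this metric over a set of labelled queries gives a like-for-like comparison of keyword, semantic, and hybrid retrieval.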

