What is Instruction-Following Embedding?

Instruction-following embedding is a technique where a natural language instruction is prepended to the input text before encoding, telling the embedding model what type of representation to produce. Instead of always generating the same vector for a given text, the model adapts its output based on the task described. For example, “Represent this legal clause for retrieval” produces a different vector than “Represent this legal clause for classification.”

Why does instruction-following matter?

Standard embedding models are trained to produce a single general-purpose representation for any text. But retrieval is inherently asymmetric: a short query and a long document about the same topic should be close in embedding space, yet they look very different as text.

Instruction-following models learn to bridge this asymmetry. By specifying the role of each text (query vs passage, summary vs full document), the model can adjust its representations so they align better at retrieval time.

This is particularly valuable for:

Asymmetric retrieval: short queries matched against long documents
Multi-task models: a single model serving retrieval, classification, and clustering with task-specific prompting
Domain-specific retrieval: explicit domain context in the instruction improves relevance
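
In practice, these use cases differ only in the instruction text placed in front of the input. The following is a minimal sketch, not a reference implementation: the instruction wordings and the helper `with_instruction` are hypothetical, and the real wording should come from the model card of whichever model is used.

```python
# Hypothetical task instructions; real wording should come from the model card.
TASK_INSTRUCTIONS = {
    "retrieval": "Represent this legal clause for retrieval: ",
    "classification": "Represent this legal clause for classification: ",
    "clustering": "Represent this legal clause for clustering: ",
}


def with_instruction(task: str, text: str) -> str:
    """Prepend the instruction registered for `task` to `text`."""
    try:
        return TASK_INSTRUCTIONS[task] + text
    except KeyError:
        raise ValueError(f"unknown task: {task!r}") from None


clause = "The supplier shall indemnify the customer against third-party claims."

# One input string per task; the same model would embed each one differently.
inputs = {task: with_instruction(task, clause) for task in TASK_INSTRUCTIONS}
```

Keeping the instructions in one table makes it harder for a typo in a prompt string to slip silently into a single call site.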

How does instruction-following work?

The instruction is simply prepended as a prefix before the text is tokenised:

# Standard encoding: the text is passed to the model as-is
text = "indemnification clause in software contracts"

# Instruction-following encoding: the instruction is prepended to the same text
instruction = "Represent this legal passage for retrieval: "
instructed_text = instruction + text

The model was trained on examples that paired instructions with texts, learning to shift its representation space based on the prefix. Because the model only learned to condition on the formats it saw, the instruction format at inference time must match the one used in training.

Which models support instruction-following?

Model	Instruction format	Notes
E5-instruct (Mistral 7B)	`Instruct: {task}\nQuery: {text}`	Highest accuracy, large model
GTE-Qwen2-7B-instruct	`Instruct: {task}\nQuery: {text}`	Strong multilingual
NV-Embed-v2	`Instruct: {task}\nQuery: {text}`	NVIDIA, top MTEB
BGE-M3	No instruction required (per its model card)	Lighter-weight option
E5-large-v2	Query/passage prefixes	Simpler asymmetric support
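
One way to keep these formats straight is a small lookup keyed by model name. The sketch below is an illustration under stated assumptions: `PromptFormat`, `FORMATS`, and `build_input` are hypothetical names, the model identifiers are the Hugging Face names as best understood, and the choice to leave documents without an instruction for the `Instruct:` models is an assumption to verify against each model card. BGE-M3 is omitted here; its model card is the place to check what, if anything, it expects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PromptFormat:
    query_template: str     # may use {task} and {text}
    document_template: str  # may use {text}


# Assumption: instruct-style models take the instruction on the query side only.
INSTRUCT = PromptFormat("Instruct: {task}\nQuery: {text}", "{text}")
# E5-large-v2 style: fixed role prefixes on both sides.
PREFIX = PromptFormat("query: {text}", "passage: {text}")

# Model identifiers are assumptions; check each model card before relying on them.
FORMATS = {
    "intfloat/e5-mistral-7b-instruct": INSTRUCT,
    "Alibaba-NLP/gte-Qwen2-7B-instruct": INSTRUCT,
    "nvidia/NV-Embed-v2": INSTRUCT,
    "intfloat/e5-large-v2": PREFIX,
}


def build_input(model: str, text: str, *, is_query: bool, task: str = "") -> str:
    """Return `text` wrapped in the prompt format registered for `model`."""
    fmt = FORMATS[model]
    if not is_query:
        return fmt.document_template.format(text=text)
    if "{task}" in fmt.query_template and not task:
        raise ValueError(f"{model} needs a task description for queries")
    return fmt.query_template.format(task=task, text=text)
```

For example, `build_input("intfloat/e5-large-v2", "some document", is_query=False)` returns the document with the `passage: ` prefix, while the same call for an `Instruct:` model returns the document unchanged.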

The largest instruction-following models (7B parameter class) achieve top MTEB scores but require more GPU memory. SIE supports serving them efficiently on A100 GPUs.
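
The memory gap can be estimated with back-of-the-envelope arithmetic: weight memory is roughly the parameter count times the bytes per parameter. The sketch below covers weights only and ignores activations, batch buffers, and framework overhead, so real usage will be higher; `weight_memory_gib` is a hypothetical helper, and the 570M figure is the upper end of the standard-model range quoted later on this page.

```python
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2, "int8": 1}


def weight_memory_gib(num_params: float, dtype: str = "fp16") -> float:
    """Memory for model weights alone, in GiB (1 GiB = 2**30 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


large = weight_memory_gib(7e9, "fp16")    # 7B-class model: about 13 GiB (14 GB)
small = weight_memory_gib(570e6, "fp16")  # 570M-parameter model: about 1 GiB
```

At the same precision the ratio is simply the ratio of parameter counts, here a little over 12×.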

How do you use instruction-following models with SIE?

Include the instruction in the text before passing it to SIE; no special API changes are needed:

from sie_sdk import SIEClient
from sie_sdk.types import Item

client = SIEClient("http://localhost:8080")

# Example inputs; replace with your own queries and documents
user_queries = ["Who bears liability under an indemnification clause?"]
documents = ["The supplier shall indemnify the customer against third-party claims."]

# Query encoding: include the retrieval instruction
queries = [
    "Instruct: Given a legal query, retrieve relevant passages\nQuery: " + q
    for q in user_queries
]
query_results = client.encode(
    "intfloat/e5-mistral-7b-instruct",
    [Item(text=q) for q in queries],
    is_query=True,
)
query_vectors = [r["dense"] for r in query_results]

# Document encoding: e5-mistral-7b-instruct takes documents without an instruction
# ("passage: " is the prefix for prefix-based E5 models such as E5-large-v2)
doc_results = client.encode(
    "intfloat/e5-mistral-7b-instruct",
    [Item(text=doc) for doc in documents],
)
doc_vectors = [r["dense"] for r in doc_results]

# Now query_vectors and doc_vectors are aligned for asymmetric retrieval

The instruction format must match what was used during the model’s training, so always check the model card on the SIE model hub.
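
Once query and document vectors are available, retrieval reduces to ranking documents by similarity to each query. The following is a minimal sketch using cosine similarity in plain Python; it assumes each vector is a flat sequence of floats (the exact type returned under `"dense"` is not specified here), and `cosine` and `top_k` are hypothetical helpers rather than part of the SIE SDK.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(
    query_vec: Sequence[float],
    doc_vecs: Sequence[Sequence[float]],
    k: int = 3,
) -> List[Tuple[int, float]]:
    """Return (document index, score) pairs for the k most similar documents."""
    scored = [(cosine(query_vec, d), i) for i, d in enumerate(doc_vecs)]
    scored.sort(reverse=True)
    return [(i, s) for s, i in scored[:k]]


# Toy 3-dimensional vectors standing in for real embeddings.
toy_query = [1.0, 0.0, 0.0]
toy_docs = [[0.0, 1.0, 0.0], [0.9, 0.1, 0.0], [0.5, 0.5, 0.0]]
ranking = top_k(toy_query, toy_docs, k=2)  # document 1 first, then document 2
```

For more than a few thousand documents, a vector index or a matrix library would normally replace the Python loop; the ranking logic stays the same.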

Instruction-following vs standard embedding models

	Standard bi-encoder	Instruction-following
Query/doc asymmetry	Limited	✓ explicit
Multi-task from one model	✗	✓
Model size	110M-570M	7B+ (largest variants)
Inference cost	Low	Higher
Accuracy (MTEB)	Good-High	Highest
Ease of use	Simple	Requires correct prompts

For most production retrieval systems, a strong standard model like BGE-M3 paired with a reranker often achieves comparable results at lower inference cost. Instruction-following models are worth the compute cost when top-of-leaderboard accuracy is required.
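
The embed-then-rerank alternative can be sketched as a two-stage pipeline: a cheap scorer ranks every document, and a more expensive scorer re-orders only the shortlist. Everything below is illustrative. `retrieve_then_rerank` is a hypothetical helper, and the two toy scorers merely stand in for an embedding model and a reranker so the example runs on its own.

```python
from typing import Callable, List, Sequence, Tuple

Scorer = Callable[[str, str], float]


def retrieve_then_rerank(
    query: str,
    docs: Sequence[str],
    first_stage: Scorer,
    reranker: Scorer,
    candidates: int = 10,
    final: int = 3,
) -> List[Tuple[int, float]]:
    """Score all docs cheaply, then rescore only the best `candidates`."""
    coarse = sorted(
        range(len(docs)),
        key=lambda i: first_stage(query, docs[i]),
        reverse=True,
    )
    shortlist = coarse[:candidates]
    fine = sorted(((reranker(query, docs[i]), i) for i in shortlist), reverse=True)
    return [(i, score) for score, i in fine[:final]]


# Toy scorers standing in for a bi-encoder and a reranker.
def word_overlap(q: str, d: str) -> float:
    return float(len(set(q.lower().split()) & set(d.lower().split())))


def overlap_ratio(q: str, d: str) -> float:
    words = d.lower().split()
    return word_overlap(q, d) / len(words) if words else 0.0


docs = [
    "indemnification clause in software contracts",
    "software licence termination",
    "indemnification clause",
]
best = retrieve_then_rerank(
    "indemnification clause", docs, word_overlap, overlap_ratio, candidates=2, final=1
)
```

The cost saving comes from the shortlist: the expensive scorer runs on `candidates` documents per query instead of the whole corpus.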

Frequently asked questions

Do I need instruction-following for symmetric retrieval? Usually not. Symmetric retrieval (queries and documents have similar length and style) benefits less from instructions. Instruction-following provides the most value for asymmetric tasks: short queries against long documents.

What happens if I use the wrong instruction? The model produces a representation optimised for the specified task. Using a classification instruction for a retrieval task will degrade retrieval quality. Always use the instruction format from the model card.

Are instruction-following models significantly slower? The larger variants (7B params) are roughly 5-10× slower than base BERT-size models. SIE’s A100 GPU deployment and batching offset much of this cost in practice.
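
On the client side, batching usually means sending texts in fixed-size groups rather than one request per text. A minimal sketch follows, assuming a hypothetical helper `batches` and an illustrative batch size of 4; whether SIE also batches on the server side is not covered here, and the best batch size depends on the model and GPU.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def batches(items: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of `items`, each with at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


texts = [f"document {i}" for i in range(10)]

# Each batch would be passed to a single encode call instead of ten separate ones.
sizes = [len(batch) for batch in batches(texts, batch_size=4)]  # [4, 4, 2]
```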