What is Instruction-Following Embedding?
Instruction-following embedding is a technique where a natural language instruction is prepended to the input text before encoding, telling the embedding model what type of representation to produce. Instead of always generating the same vector for a given text, the model adapts its output based on the task described. For example, “Represent this legal clause for retrieval” produces a different vector than “Represent this legal clause for classification.”
Why does instruction-following matter?
Standard embedding models are trained to produce a single general-purpose representation for any text. But retrieval is inherently asymmetric: a short query and a long document about the same topic should be close in embedding space, yet they look very different as text.
Instruction-following models learn to bridge this asymmetry. By specifying the role of each text (query vs passage, summary vs full document), the model can adjust its representations so they align better at retrieval time.
This is particularly valuable for:
- Asymmetric retrieval: short queries matched against long documents
- Multi-task models: a single model serving retrieval, classification, and clustering with task-specific prompting
- Domain-specific retrieval: explicit domain context in the instruction improves relevance
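As a minimal sketch of the multi-task case, the same text can be paired with different task instructions to produce different model inputs. The helper name `build_input`, the dictionary, and the clustering wording are illustrative assumptions; a real model expects whatever format its model card specifies.

```python
# Hypothetical mapping from task name to instruction prefix.
TASK_INSTRUCTIONS = {
    "retrieval": "Represent this legal clause for retrieval: ",
    "classification": "Represent this legal clause for classification: ",
    "clustering": "Represent this legal clause for clustering: ",  # assumed wording
}


def build_input(task: str, text: str) -> str:
    """Prepend the instruction for `task` to `text`."""
    return TASK_INSTRUCTIONS[task] + text


clause = "The supplier shall indemnify the customer against third-party claims."

# One text, three different model inputs (and therefore three different vectors).
inputs = {task: build_input(task, clause) for task in TASK_INSTRUCTIONS}
for task, model_input in inputs.items():
    print(f"{task}: {model_input}")
```

Keeping the instructions in one place makes it harder for query-time and index-time code to drift apart.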
How does instruction-following work?
The instruction is simply prepended as a prefix before the text is tokenised:
```python
# Standard encoding
text = "indemnification clause in software contracts"

# Instruction-following encoding
instruction = "Represent this legal passage for retrieval: "
text = instruction + "indemnification clause in software contracts"
```

The model was trained on examples that paired instructions with texts, learning to shift its representation space based on the prefix. The key is using the same instruction format at training and inference time.
Which models support instruction-following?
| Model | Instruction format | Notes |
|---|---|---|
| E5-instruct (Mistral 7B) | Instruct: {task}\nQuery: {text} | Highest accuracy, large model |
| GTE-Qwen2-7B-instruct | Instruct: {task}\nQuery: {text} | Strong multilingual |
| NV-Embed-v2 | Instruct: {task}\nQuery: {text} | NVIDIA, top MTEB |
| BGE-M3 | Supports query/passage prefixes | Lighter-weight option |
| E5-large-v2 | Query/passage prefixes | Simpler asymmetric support |
The largest instruction-following models (7B parameter class) achieve top MTEB scores but require more GPU memory. SIE supports serving them efficiently on A100 GPUs.
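The instruction formats listed in the table can be kept in a small lookup so that queries are always built the same way. This is a sketch only: the dictionary keys are illustrative labels, not official model identifiers, and `format_query` is a hypothetical helper.

```python
# Query templates copied from the table; keys are illustrative labels only.
QUERY_TEMPLATES = {
    "e5-instruct": "Instruct: {task}\nQuery: {text}",
    "gte-qwen2-7b-instruct": "Instruct: {task}\nQuery: {text}",
    "nv-embed-v2": "Instruct: {task}\nQuery: {text}",
}


def format_query(model_key: str, task: str, text: str) -> str:
    """Fill the query template for `model_key`; raises KeyError if unknown."""
    return QUERY_TEMPLATES[model_key].format(task=task, text=text)


prompt = format_query(
    "e5-instruct",
    "Given a legal query, retrieve relevant passages",
    "indemnification clause in software contracts",
)
print(prompt)
```

Raising on an unknown key is deliberate: silently falling back to a bare query would hide a format mismatch.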
How do you use instruction-following models with SIE?
Include the instruction in the text before passing it to SIE; no special API changes are needed:

```python
from sie_sdk import SIEClient
from sie_sdk.types import Item

client = SIEClient("http://localhost:8080")

# Placeholder inputs; substitute your own queries and documents
user_queries = ["Who is liable for third-party IP claims?"]
documents = ["The supplier shall indemnify the customer against third-party claims."]

# Query encoding: include the retrieval instruction
queries = [
    "Instruct: Given a legal query, retrieve relevant passages\nQuery: " + q
    for q in user_queries
]
query_results = client.encode(
    "intfloat/e5-mistral-7b-instruct",
    [Item(text=q) for q in queries],
    is_query=True,
)
query_vectors = [r["dense"] for r in query_results]

# Document encoding: the e5-mistral-7b-instruct model card specifies no
# instruction on the document side, so documents are encoded as-is.
# (E5-large-v2-style models expect a "passage: " prefix instead.)
doc_results = client.encode(
    "intfloat/e5-mistral-7b-instruct",
    [Item(text=doc) for doc in documents],
)
doc_vectors = [r["dense"] for r in doc_results]

# Now query_vectors and doc_vectors are aligned for asymmetric retrieval
```

The instruction format must match what was used during the model’s training, so always check the model card on the SIE model hub.
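Once query and document vectors exist, retrieval reduces to ranking documents by similarity to the query. The following is a minimal pure-Python sketch using cosine similarity. It assumes dense vectors are plain sequences of floats (the exact return type of SIE is not specified here), and `rank_documents` is a hypothetical helper that uses toy three-dimensional vectors in place of real embeddings.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length float sequences."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_documents(query_vector, doc_vectors, top_k=3):
    """Return (document index, score) pairs, best match first."""
    scored = [(i, cosine(query_vector, d)) for i, d in enumerate(doc_vectors)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


# Toy vectors standing in for real embeddings.
toy_query = [1.0, 0.0, 0.0]
toy_docs = [[0.0, 1.0, 0.0], [0.9, 0.1, 0.0], [0.5, 0.5, 0.0]]

ranking = rank_documents(toy_query, toy_docs)
print([index for index, _ in ranking])
```

For large collections a vector index would normally replace this linear scan.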
Instruction-following vs standard embedding models
| | Standard bi-encoder | Instruction-following |
|---|---|---|
| Query/doc asymmetry | Limited | ✓ explicit |
| Multi-task from one model | ✗ | ✓ |
| Model size | 110M-570M | 7B+ (largest variants) |
| Inference cost | Low | Higher |
| Accuracy (MTEB) | Good-High | Highest |
| Ease of use | Simple | Requires correct prompts |
For most production retrieval systems, a strong standard model like BGE-M3 with a reranker achieves comparable results with lower inference cost. Instruction-following models are worth the compute cost when top-of-leaderboard accuracy is required.
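To put the compute cost in rough numbers, the memory needed for model weights alone is the parameter count multiplied by the bytes per parameter. The sketch below assumes 16-bit weights (an assumption, not a stated deployment detail) and ignores activations, batch buffers, and framework overhead, so real usage will be higher.

```python
def weight_memory_gib(num_params: float, bytes_per_param: int) -> float:
    """Memory for the weights alone, in GiB (excludes activations and overhead)."""
    return num_params * bytes_per_param / 1024**3


# Parameter counts taken from the comparison table; 2 bytes per parameter assumes fp16/bf16.
large = weight_memory_gib(7e9, 2)
small = weight_memory_gib(570e6, 2)

print(f"7B model:   {large:.1f} GiB")
print(f"570M model: {small:.1f} GiB")
```

Under these assumptions the 7B model needs roughly 13 GiB for weights, against roughly 1 GiB for the largest standard model in the table.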
Frequently asked questions
Do I need instruction-following for symmetric retrieval?
No. Symmetric retrieval (query and documents have similar length and style) benefits less from instructions. Instruction-following provides the most value for asymmetric tasks: short queries against long documents.
What happens if I use the wrong instruction?
The model produces a representation optimised for the specified task. Using a classification instruction for a retrieval task will degrade retrieval quality. Always use the instruction format from the model card.
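One way to catch a malformed prompt before it reaches the model is a simple format check. This sketch validates only the `Instruct: {task}\nQuery: {text}` shape listed for the 7B models; `has_instruct_format` is a hypothetical helper, and it cannot tell whether the task wording itself suits the job.

```python
import re

# Matches "Instruct: <single-line task>\nQuery: <text>"; the text may span lines.
INSTRUCT_PATTERN = re.compile(r"Instruct: [^\n]+\nQuery: .+", re.DOTALL)


def has_instruct_format(text: str) -> bool:
    """Return True if `text` follows the Instruct/Query prompt shape."""
    return INSTRUCT_PATTERN.match(text) is not None


good = "Instruct: Given a legal query, retrieve relevant passages\nQuery: indemnification"
bad = "indemnification clause in software contracts"

print(has_instruct_format(good), has_instruct_format(bad))
```

A check like this is cheap to run in tests or at request time.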
Are instruction-following models significantly slower?
The larger variants (7B params) are 5-10× slower than base BERT-size models. SIE’s A100 GPU deployment and batching reduce this latency significantly in practice.
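Batching groups several texts into one call so that the per-call overhead is shared. How SIE batches requests internally is not described here; the sketch below only illustrates the general idea on the client side, and `batched` and the batch size of 4 are illustrative choices.

```python
from typing import Iterator, List, Sequence


def batched(items: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


texts = [f"document {i}" for i in range(10)]
batches = list(batched(texts, 4))

# Each batch would be passed to a single encode call.
print([len(batch) for batch in batches])
```

Larger batches generally raise throughput at the cost of more memory per call, so the right size depends on the model and GPU.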