---
title: What is a LoRA Adapter?
description: A LoRA (Low-Rank Adaptation) adapter is a lightweight set of trainable weight matrices added to specific layers of a pre-trained neural network. During fine-tuning, only the LoRA weights are updated — the base model weights remain frozen. This reduces the number of trainable parameters by 100–1000x compared to full ...
canonical_url: https://superlinked.com/glossary/what-is-a-lora-adapter
last_updated: 2026-06-02
---

# What is a LoRA Adapter?

A LoRA (Low-Rank Adaptation) adapter is a lightweight set of trainable weight matrices added to specific layers of a pre-trained neural network. During fine-tuning, only the LoRA weights are updated — the base model weights remain frozen. This reduces the number of trainable parameters by 100–1000x compared to full fine-tuning, making domain adaptation practical without large compute budgets.

---

## Why does LoRA matter for inference?

LoRA solves a key problem in deploying embedding models: general-purpose models trained on broad data underperform on specialised domains (legal, medical, financial, code). Full fine-tuning is expensive — it requires updating hundreds of millions of parameters and storing a complete copy of the model for each domain.

LoRA adapters are small (typically 10–100MB vs 1–4GB for a full model) and can be swapped at runtime. This means a single base model can serve multiple domains by loading the appropriate adapter — without restarting the inference server.

**SIE supports LoRA hot-loading**: swap adapters between requests with zero downtime.

---

## How does LoRA work?

A standard neural network weight matrix `W` has dimensions `d × k`. Full fine-tuning updates every element of `W` — that's `d × k` parameters.

LoRA instead decomposes the weight update into two low-rank matrices:

```
W' = W + ΔW = W + BA
```

Where:
- `B` has dimensions `d × r`
- `A` has dimensions `r × k`
- `r` is the rank (typically 4–64, much smaller than `d` or `k`)

During fine-tuning, only `A` and `B` are trained. The original `W` is frozen.
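As a rough illustration of the update rule, the sketch below applies `W + BA` to a single input vector in plain Python. The toy dimensions and helper names (`matvec`, `matmul`, `lora_forward`) are assumptions made for this example, and initialising `B` to zero is the usual LoRA convention rather than something this page specifies.

```python
import random

def matvec(M, x):
    """Multiply matrix M (a list of rows) by vector x."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]

def matmul(P, Q):
    """Multiply matrix P (n x m) by matrix Q (m x p)."""
    cols = list(zip(*Q))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in P]

def lora_forward(W, A, B, x):
    """Compute W x + B (A x) without forming the full d x k update BA."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    return [b + u for b, u in zip(base, update)]

random.seed(0)
d, k, r = 6, 4, 2  # toy sizes, chosen only for illustration
W = [[random.gauss(0, 1) for _ in range(k)] for _ in range(d)]    # frozen
A = [[random.gauss(0, 0.1) for _ in range(k)] for _ in range(r)]  # trainable
B = [[0.0] * r for _ in range(d)]                                 # trainable, starts at zero
x = [random.gauss(0, 1) for _ in range(k)]

# With B = 0 the adapter is a no-op, so the output matches the base model.
y_base = matvec(W, x)
y_init = lora_forward(W, A, B, x)

# Pretend training has changed B, then apply the adapter in two equivalent ways.
B = [[random.gauss(0, 0.1) for _ in range(r)] for _ in range(d)]
y_unmerged = lora_forward(W, A, B, x)  # adapter kept separate from W
BA = matmul(B, A)
W_merged = [[w + u for w, u in zip(w_row, u_row)] for w_row, u_row in zip(W, BA)]
y_merged = matvec(W_merged, x)         # adapter folded into W
```

Keeping the adapter separate leaves `W` untouched, which is what allows different adapters to share one base model. Folding `BA` into `W` gives the same output with no extra computation per request.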

```
Parameters saved = d×k − (d×r + r×k) = d×k − r×(d+k)
```

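For a weight matrix of 768×768 with rank r=16: full fine-tuning = 589,824 parameters; LoRA = 16 × (768 + 768) = 24,576 parameters, a **24× reduction** for that matrix. The reduction across the whole model is larger, because only the selected matrices receive adapters while the embeddings and all other weights stay frozen.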
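The arithmetic for a single matrix can be checked with a few lines of Python. The function names here are illustrative only.

```python
def full_params(d: int, k: int) -> int:
    """Parameters updated when the whole d x k matrix is fine-tuned."""
    return d * k

def lora_params(d: int, k: int, r: int) -> int:
    """Parameters in B (d x r) plus A (r x k)."""
    return r * (d + k)

d, k, r = 768, 768, 16
full = full_params(d, k)
lora = lora_params(d, k, r)
print(full, lora, full / lora)  # 589824 24576 24.0
```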

---

## Which layers get LoRA adapters?

LoRA is typically applied to the attention weight matrices in transformer layers:

- Query projection (Wq)
- Key projection (Wk)
- Value projection (Wv)
- Output projection (Wo)

LoRA can optionally be applied to the feed-forward layers as well. Adapting more layers adds trainable parameters and expressivity, at the cost of a larger adapter.
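To get a feel for how the choice of target layers affects adapter size, the sketch below counts adapter parameters for a hypothetical 12-layer encoder with hidden size 768 and feed-forward size 3072. These dimensions and the names `ffn_up` and `ffn_down` are assumptions for illustration, not properties of any particular model.

```python
def lora_params(d, k, r):
    """Parameters in one adapter: B (d x r) plus A (r x k)."""
    return r * (d + k)

def adapter_params(num_layers, shapes, r):
    """Total LoRA parameters when every listed matrix in every layer is adapted."""
    per_layer = sum(lora_params(d, k, r) for d, k in shapes.values())
    return num_layers * per_layer

hidden, ffn, layers, r = 768, 3072, 12, 16  # hypothetical encoder dimensions

attention = {name: (hidden, hidden) for name in ("Wq", "Wk", "Wv", "Wo")}
feed_forward = {"ffn_up": (ffn, hidden), "ffn_down": (hidden, ffn)}

attn_only = adapter_params(layers, attention, r)
attn_and_ffn = adapter_params(layers, {**attention, **feed_forward}, r)
print(attn_only, attn_and_ffn)  # 1179648 2654208
```

Under these assumptions, adding the two feed-forward matrices slightly more than doubles the adapter size, because each of them is four times wider than an attention projection.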

---

## How do you use LoRA with SIE?

SIE supports LoRA hot-loading — apply a domain-specific adapter at inference time:

```python
from sie_sdk import SIEClient
from sie_sdk.types import Item

client = SIEClient("http://localhost:8080")

# General-purpose encoding
general_vectors = [r["dense"] for r in client.encode("BAAI/bge-m3", [Item(text=d) for d in documents])]

# Legal domain encoding with LoRA adapter
legal_vectors = [
    r["dense"]
    for r in client.encode(
        "BAAI/bge-m3",
        [Item(text=d) for d in documents],
        options={"lora_id": "org/bge-m3-legal-lora"},
    )
]

# Medical domain encoding with different adapter
medical_vectors = [
    r["dense"]
    for r in client.encode(
        "BAAI/bge-m3",
        [Item(text=d) for d in documents],
        options={"lora_id": "org/bge-m3-medical-lora"},
    )
]
```

Multiple adapters can be loaded simultaneously and selected per-request. The base model weights are shared — only the small adapter matrices differ.
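Conceptually, per-request selection looks something like the toy layer below: one frozen weight matrix stored once, plus a dictionary of small `(A, B)` pairs keyed by adapter id. This is only a sketch of the idea. The class and method names are made up for illustration and do not describe how SIE implements it.

```python
def matvec(M, x):
    """Multiply matrix M (a list of rows) by vector x."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]

class MultiAdapterLayer:
    """One frozen weight matrix shared by several named LoRA adapters."""

    def __init__(self, W):
        self.W = W          # shared base weights, stored once
        self.adapters = {}  # adapter id -> (A, B)

    def load_adapter(self, adapter_id, A, B):
        """Register an adapter; the base weights are not touched."""
        self.adapters[adapter_id] = (A, B)

    def forward(self, x, adapter_id=None):
        """Apply the base weights, plus the selected adapter if one is given."""
        y = matvec(self.W, x)
        if adapter_id is None:
            return y
        A, B = self.adapters[adapter_id]
        delta = matvec(B, matvec(A, x))
        return [yi + di for yi, di in zip(y, delta)]

layer = MultiAdapterLayer(W=[[1.0, 0.0], [0.0, 1.0]])
layer.load_adapter("legal", A=[[1.0, 1.0]], B=[[0.5], [0.0]])     # rank 1
layer.load_adapter("medical", A=[[1.0, -1.0]], B=[[0.0], [2.0]])  # rank 1

x = [2.0, 1.0]
print(layer.forward(x))             # [2.0, 1.0]
print(layer.forward(x, "legal"))    # [3.5, 1.0]
print(layer.forward(x, "medical"))  # [2.0, 3.0]
```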

---

## LoRA vs full fine-tuning vs prompt tuning

| | Full fine-tuning | LoRA | Prompt tuning |
|---|---|---|---|
| Parameters updated | All (100%) | ~0.1–1% | <0.01% |
| Storage per domain | Full model copy | Small adapter | Tiny prompt |
| Quality | Highest | Near-full | Lower |
| Training cost | High | Low | Lowest |
| Inference cost | Normal | Normal + tiny overhead | Normal |
| Hot-swap at runtime | ✗ | ✓ (SIE) | ✓ |

For most domain adaptation use cases, LoRA provides the best accuracy-cost trade-off.
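The storage row is easy to quantify. The sketch below compares one full model copy per domain with a single shared base plus one adapter per domain, using a 2GB base model and 50MB adapters as example values taken from the ranges quoted earlier on this page.

```python
def storage_gb(num_domains, base_gb, adapter_mb):
    """Return (full copy per domain, shared base plus adapters) in GB."""
    full_copies = num_domains * base_gb
    shared_base = base_gb + num_domains * adapter_mb / 1000
    return full_copies, shared_base

full_copies, shared_base = storage_gb(num_domains=10, base_gb=2.0, adapter_mb=50)
print(full_copies, shared_base)  # 20.0 2.5
```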

---

## Rank selection: how do you choose r?

The rank `r` controls the adapter's capacity:

| Rank | Parameters | When to use |
|---|---|---|
| 4–8 | Minimal | Simple style/tone adaptation |
| 16 | Low | Standard domain adaptation |
| 32 | Medium | Complex domain shift |
| 64+ | High | Approaching full fine-tune quality |

Start with r=16 for most domain adaptation tasks. Increase the rank if validation metrics plateau below your target. If a higher rank brings no measurable gain, keep the smaller one.
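One simple way to turn a rank sweep into a decision is to pick the smallest rank whose validation score is within a small tolerance of the best one. The helper below is a sketch of that heuristic. The scores are invented placeholders rather than measurements, and the tolerance is an arbitrary assumption to tune for your own metric.

```python
def pick_rank(scores, tolerance=0.005):
    """Return the smallest rank whose score is within `tolerance` of the best."""
    best = max(scores.values())
    for rank in sorted(scores):
        if best - scores[rank] <= tolerance:
            return rank

# Hypothetical validation scores (for example recall@10) from a rank sweep.
sweep = {4: 0.712, 8: 0.741, 16: 0.768, 32: 0.771, 64: 0.772}
chosen = pick_rank(sweep)
print(chosen)  # 16
```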

---

## Training a LoRA adapter for your domain

You need (query, positive document) pairs from your domain — the same training signal used for embedding model training:

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModel

# Load base model
base_model = AutoModel.from_pretrained("BAAI/bge-m3")

# Apply LoRA configuration
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["query", "key", "value"],
    lora_dropout=0.1,
)
peft_model = get_peft_model(base_model, lora_config)

# Train on domain-specific (query, positive) pairs
# ... training loop ...

# Save adapter only (on the order of 10MB for this r=16 query/key/value config vs ~2GB for the full model)
peft_model.save_pretrained("legal-lora-adapter/")
```

The adapter can then be loaded into SIE for hot-swap deployment.
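The training loop above is left open. A common choice for (query, positive) pairs is a contrastive loss with in-batch negatives, where every other document in the batch serves as a negative for a given query. This page does not say which loss to use, so treat the sketch below as one reasonable assumption. It is written in plain Python for readability, and the function names and temperature value are illustrative.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    """Scale v to unit length so the dot product becomes cosine similarity."""
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]

def in_batch_contrastive_loss(query_vecs, doc_vecs, temperature=0.05):
    """Mean cross-entropy where doc i is the positive for query i and
    every other doc in the batch acts as a negative."""
    queries = [normalize(q) for q in query_vecs]
    docs = [normalize(d) for d in doc_vecs]
    total = 0.0
    for i, q in enumerate(queries):
        logits = [dot(q, d) / temperature for d in docs]
        top = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_z = top + math.log(sum(math.exp(l - top) for l in logits))
        total += log_z - logits[i]
    return total / len(queries)

queries = [[1.0, 0.0], [0.0, 1.0]]
aligned = in_batch_contrastive_loss(queries, [[1.0, 0.1], [0.1, 1.0]])
swapped = in_batch_contrastive_loss(queries, [[0.1, 1.0], [1.0, 0.1]])
print(aligned < swapped)  # True
```

In a real training loop the same computation is usually expressed as a cross-entropy over the batch similarity matrix, with gradients flowing only into the LoRA matrices.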

---

## Frequently asked questions

**Does a LoRA adapter change model inference speed?**
Negligibly. The adapter matrices are small and the extra computation is minimal. SIE's batching absorbs this overhead.

**Can I combine LoRA with quantisation?**
Yes. QLoRA (Quantised LoRA) quantises the frozen base model to 4-bit precision and trains LoRA adapters in higher precision (typically 16-bit) on top of it. This is a common approach for fine-tuning large models on consumer hardware.

**How much domain-specific training data do I need?**
LoRA adapters can be effective with as few as hundreds of (query, document) pairs. More data helps, but the low parameter count means LoRA is significantly less data-hungry than full fine-tuning.

---

## Related resources

- [SIE deployment documentation with LoRA support](/docs/deployment)
- [Regulatory Intelligence RAG example with LoRA](/docs/examples/regulatory-intelligence-rag)
- [What is self-hosted inference?](/glossary/what-is-self-hosted-inference)
- [What is a transformer?](/glossary/transformers)
- [Browse LoRA-compatible models on SIE](/models)
