---
title: "My agent is dumb: how to route each task to the right model (and make it smarter)"
description: Route each agent task to a purpose-built model by naming the model per request against one SIE endpoint, using encode, score, and extract.
canonical_url: https://superlinked.com/blog/route-agent-tasks-to-the-right-model
last_updated: 2026-06-16
---

You route a task to a model by naming the model in the call, not by standing up a service for it.

Point every request at one Superlinked Inference Engine (SIE) endpoint, then pick the function that matches the work: `encode` for retrieval, `score` for reranking, `extract` for pulling structured fields.

SIE loads the named model on demand and shares the GPU across all of them.

*SIE is open source under Apache 2.0: [github.com/superlinked/sie](https://github.com/superlinked/sie)*.

The rest of this post covers the mental model and a worked example.

<BlogSieCta />

## How do I route different AI agent tasks to the right model?

Send every request to one SIE endpoint and pick the function that matches the task. The model identifier you pass is the routing key, and SIE loads that model on demand. No task ever needs its own service.

## The routing decision is one line, not one service

A real agent touches several models in a single turn. It embeds a query, reranks the candidates, and extracts a few fields from the winning document. The old pattern gives each of those its own server, its own URL, and its own GPU pool, so "route this task to the right model" becomes a networking problem.

SIE flips it. The model is an argument, not an address. The function name carries the operation, the identifier carries the model, and placement happens behind the endpoint.
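To make "an argument, not an address" concrete, here is a minimal sketch that describes one agent turn as plain data. It makes no network calls and does not use the SIE SDK. `Call` and `plan_turn` are hypothetical names for illustration, and the endpoint and model identifiers are the ones used in the worked example in this post.

```python
from typing import NamedTuple


class Call(NamedTuple):
    endpoint: str   # where the request goes
    operation: str  # what to do: encode, score, or extract
    model: str      # which model does it (the routing key)


SIE_ENDPOINT = "http://localhost:8080"


def plan_turn(endpoint: str = SIE_ENDPOINT) -> list:
    """List the calls one agent turn makes; only operation and model vary."""
    return [
        Call(endpoint, "encode", "NovaSearch/stella_en_400M_v5"),
        Call(endpoint, "score", "BAAI/bge-reranker-v2-m3"),
        Call(endpoint, "extract", "urchade/gliner_multi-v2.1"),
    ]


calls = plan_turn()
distinct_endpoints = {c.endpoint for c in calls}
distinct_models = {c.model for c in calls}
```

The turn touches three models but only one address, so adding a fourth task means adding a row, not a service.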

## A worked example

Install and start the server:

```bash
pip install sie-server
sie-server serve            # auto-detects CUDA, Apple Silicon, or CPU
```

Then route three tasks through one client:

```python
from sie_sdk import SIEClient
from sie_sdk.types import Item

client = SIEClient("http://localhost:8080")

# Retrieval -> dense encoder
client.encode("NovaSearch/stella_en_400M_v5", Item(text="Berlin office revenue"))

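# Precision -> reranker
# Example candidates; in practice these come from the retrieval step.
shortlist = ["Berlin office Q2 revenue report", "Munich office headcount"]
client.score(
    "BAAI/bge-reranker-v2-m3",
    Item(text="Berlin office revenue"),
    [Item(text=c) for c in shortlist],
)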

# Structured fields -> extractor
client.extract(
    "urchade/gliner_multi-v2.1",
    Item(text="Invoice 4471 from Acme GmbH, due 30 June."),
    labels=["invoice_number", "organization", "date"],
)
```

Three different models, one endpoint, no per-model deployment.

## Where do I keep the task-to-model mapping?

In your application, where the routing logic belongs. A small table keeps it readable and makes swapping a model a one-line edit:

```python
TASK_MODELS = {
    "retrieve": ("encode",  "NovaSearch/stella_en_400M_v5"),
    "rerank":   ("score",   "BAAI/bge-reranker-v2-m3"),
    "extract":  ("extract", "urchade/gliner_multi-v2.1"),
}

def run_task(task, *args, **kwargs):
    # Look up which client method and model serve this task, then dispatch.
    op, model = TASK_MODELS[task]
    return getattr(client, op)(model, *args, **kwargs)
```
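You can check the dispatch logic without a running server. The sketch below swaps the real client for a hypothetical `RecordingClient` that only records which method and model each task resolved to. It is a stand-in written for this example, not part of the SIE SDK. This variant of `run_task` also takes the client as a parameter instead of reading a module-level one, which is an assumption made here to keep the example testable.

```python
class RecordingClient:
    """Hypothetical stand-in for SIEClient: records calls, runs no models."""

    def __init__(self):
        self.calls = []

    def _record(self, op, model):
        self.calls.append((op, model))
        return {"op": op, "model": model}

    def encode(self, model, *args, **kwargs):
        return self._record("encode", model)

    def score(self, model, *args, **kwargs):
        return self._record("score", model)

    def extract(self, model, *args, **kwargs):
        return self._record("extract", model)


TASK_MODELS = {
    "retrieve": ("encode",  "NovaSearch/stella_en_400M_v5"),
    "rerank":   ("score",   "BAAI/bge-reranker-v2-m3"),
    "extract":  ("extract", "urchade/gliner_multi-v2.1"),
}


def run_task(client, task, *args, **kwargs):
    # Resolve the client method and model for this task, then dispatch.
    op, model = TASK_MODELS[task]
    return getattr(client, op)(model, *args, **kwargs)


fake = RecordingClient()
run_task(fake, "retrieve", "Berlin office revenue")
run_task(fake, "rerank", "Berlin office revenue", ["candidate a", "candidate b"])
run_task(fake, "extract", "Invoice 4471 from Acme GmbH", labels=["organization"])
```

An unknown task name raises `KeyError` at the table lookup, so a typo fails before any request is sent.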

## What changes when this goes to production?

Almost nothing in your code. In a Kubernetes deployment, a stateless Rust gateway sits in front of the worker pods. For each request it resolves the model, bundle, profile, and pool, then publishes the work to a NATS JetStream queue. You still name a model per call, and the gateway handles placement, queueing, and load balancing. The development pattern and the production pattern are the same pattern.
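One way to keep the code identical across environments is to read the endpoint from configuration. The sketch below is an assumption about how you might wire that up, not something SIE prescribes: the `SIE_ENDPOINT` variable name, the `resolve_endpoint` helper, and the gateway hostname are all hypothetical.

```python
import os

DEFAULT_ENDPOINT = "http://localhost:8080"  # local development server


def resolve_endpoint(env=None):
    """Return the endpoint URL from SIE_ENDPOINT (hypothetical name), else the local default."""
    env = os.environ if env is None else env
    # Strip a trailing slash so later URL joins behave the same everywhere.
    return env.get("SIE_ENDPOINT", DEFAULT_ENDPOINT).rstrip("/")


local_url = resolve_endpoint({})
prod_url = resolve_endpoint({"SIE_ENDPOINT": "http://sie-gateway.internal:8080/"})
```

The resolved URL is what you would pass when constructing the client. The task-to-model table and every call site stay the same.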

*Clone it, start the server, and route your first three tasks through one endpoint: [github.com/superlinked/sie](https://github.com/superlinked/sie).* If it saves you a deployment, the star button is right there.