Run the agent loop
Uses: /generate
Drive a full agent loop on /generate: an open LLM that plans, calls tools, and streams tokens — self-hosted, so the loop never leaves your cluster.
Featured models
Generate · /generate
| Model | Size | Quality | Latency | Throughput | Cost $/1M | |
|---|---|---|---|---|---|---|
| Qwen/Qwen3.6-27B MultimodalTool callingConstrained outputStreamingCodeSQL | 27.0B | 0.6000acc | 1.7 s | 222 tok/s | $3.80 | |
| Qwen/Qwen3-4B-Instruct-2507 Long contextTool callingConstrained outputStreamingCodeSQL | 4.0B | 0.6033acc | 576 ms | 472 tok/s | $1.78 | |
| Qwen/Qwen3-0.6B Streaming | 600M | 0.4600acc | 413 ms | 595 tok/s | $1.41 | |
| No models match. | ||||||
Measured on RTX-PRO-6000; other hardware shows "—" until benchmarked. Pick a benchmark to rank by quality.
For similar models, browse the full
/generate catalog →
Examples
Worked examples coming soon. In the meantime, browse all SIE examples →
Featured picks are still being finalized. Latency, throughput and cost are real where we've benchmarked the model on the selected GPU; "—" means no measurement there. Cost is approximate — computed from list GPU prices; your actual price depends on the provider you deploy SIE with.
Compare (0)Compare →