Structured output
Uses: /extract · /generate
Featured models
Extract · /extract
| Model | Tags | Size | Quality | Latency | Throughput | Cost $/1M |
|---|---|---|---|---|---|---|
| numind/NuNER_Zero | Entities | 449M | 0.6122 F1 | — | — | — |
| urchade/gliner_medium-v2.1 | Entities | 195M | 0.6111 F1 | 107 ms | 8.9K tok/s | $0.025 |
| urchade/gliner_small-v2.1 | Entities | 60M | 0.5959 F1 | 83 ms | 11.7K tok/s | $0.019 |
Measured on an L4 GPU; a "—" means the model has not yet been benchmarked on that hardware.
For similar models, browse the full /extract catalog.
Generate · /generate
| Model | Tags | Size | Quality | Latency | Throughput | Cost $/1M |
|---|---|---|---|---|---|---|
| Qwen/Qwen3.6-27B | Multimodal, Tool calling, Constrained output, Streaming, Code, SQL | 27.0B | 0.6000 acc | 1.7 s | 222 tok/s | $3.80 |
| Qwen/Qwen3-4B-Instruct-2507 | Long context, Tool calling, Constrained output, Streaming, Code, SQL | 4.0B | 0.6033 acc | 576 ms | 472 tok/s | $1.78 |
| Qwen/Qwen3-0.6B | Streaming | 600M | 0.4600 acc | 413 ms | 595 tok/s | $1.41 |
Measured on an RTX-PRO-6000 GPU; a "—" means the model has not yet been benchmarked on that hardware.
For similar models, browse the full /generate catalog.
Examples
End-to-end example projects that put this task to work.
Featured picks are still being finalized. Latency, throughput, and cost are measured values wherever we have benchmarked the model on the selected GPU; "—" means there is no measurement for that GPU. Cost is approximate: it is computed from list GPU prices, and your actual price depends on the provider you deploy SIE with.
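
As a rough illustration of how a cost figure can follow from a list GPU price, the sketch below divides an hourly price by the tokens a fully busy GPU would process in an hour. This is an assumption about the calculation, not a documented formula: the function name is hypothetical, the $0.80/hour L4 price is an assumed value chosen because it roughly reproduces the table, and "$/1M" is taken to mean dollars per million tokens.

```python
def cost_per_million_tokens(gpu_price_per_hour: float,
                            throughput_tok_per_s: float) -> float:
    """Approximate dollars per 1M tokens for a GPU kept fully busy.

    Hypothetical helper: assumes cost is simply the hourly GPU price
    divided by the tokens processed in one hour.
    """
    if throughput_tok_per_s <= 0:
        raise ValueError("throughput must be positive")
    tokens_per_hour = throughput_tok_per_s * 3600
    return gpu_price_per_hour / tokens_per_hour * 1_000_000


# Assumed list price in $/hour (not taken from the page).
ASSUMED_L4_PRICE_PER_HOUR = 0.80

# urchade/gliner_medium-v2.1 is listed at 8.9K tok/s on L4.
estimate = cost_per_million_tokens(ASSUMED_L4_PRICE_PER_HOUR, 8_900)
print(f"${estimate:.3f} per 1M tokens")  # prints "$0.025 per 1M tokens"
```

Under the same assumption, an hourly price of about $3.03 gives values close to the /generate figures listed for RTX-PRO-6000. Because the published throughput numbers are rounded, the match is approximate rather than exact.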