Structured output

Uses: /extract · /generate

Two paths to typed JSON: /extract pulls entities and relations straight from text with a small task model, or /generate emits schema-constrained JSON from an LLM. Both return data you can validate.

Featured models

Extract · `/extract`

	Model	Size	Quality	Latency	Throughput	Cost $/1M
	numind/NuNER_Zero Entities	449M	0.6122F1	—	—	—
	urchade/gliner_medium-v2.1 Entities	195M	0.6111F1	107 ms	8.9K tok/s	$0.025
	urchade/gliner_small-v2.1 Entities	60M	0.5959F1	83 ms	11.7K tok/s	$0.019
No models match.

Measured on L4; other hardware shows "—" until benchmarked. Pick a benchmark to rank by quality.

For similar models, browse the full /extract catalog →

Generate · `/generate`

	Model	Size	Quality	Latency	Throughput	Cost $/1M
	Qwen/Qwen3.6-27B MultimodalTool callingConstrained outputStreamingCodeSQL	27.0B	0.6000acc	1.7 s	222 tok/s	$3.80
	Qwen/Qwen3-4B-Instruct-2507 Long contextTool callingConstrained outputStreamingCodeSQL	4.0B	0.6033acc	576 ms	472 tok/s	$1.78
	Qwen/Qwen3-0.6B Streaming	600M	0.4600acc	413 ms	595 tok/s	$1.41
No models match.

Measured on RTX-PRO-6000; other hardware shows "—" until benchmarked. Pick a benchmark to rank by quality.

For similar models, browse the full /generate catalog →

Examples

End-to-end projects from our examples that put this task to work.

Swap an OCR model with one identifier change

A recognition VLM, an end-to-end document model and zero-shot NER, all driven by the same extract call.

Self-hosted product search in 5 min

Amazon-style search from three SDK calls: extract attributes, encode descriptions, score-rerank candidates.

Multimodal product classifier with embeddings

NLI, text and image retrieval, and cross-encoder reranking over a hierarchical product taxonomy.

Stripe checkout with a fraud-risk gate

extract + encode + score every order against a corpus of past fraud patterns before the payment is created.

Featured picks are still being finalized. Latency, throughput and cost are real where we've benchmarked the model on the selected GPU; "—" means no measurement there. Cost is approximate — computed from list GPU prices; your actual price depends on the provider you deploy SIE with.

Structured output

Featured models

Extract · /extract

Generate · /generate

Examples

Open source inference for agents

Extract · `/extract`

Generate · `/generate`