Qwen/Qwen3.6-27B

Primitive: /generate · Generate · Qwen3 MoE

Multimodal · Tool calling · Constrained output · Streaming · Code · SQL

Overview

Hardware: — drives latency, throughput & cost

Cost is approximate: it is computed from list GPU prices, and your actual price depends on the provider you deploy SIE with.
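As a rough illustration of how a cost figure can be derived from a GPU list price and a measured throughput, here is a minimal sketch. The formula and the function name `cost_per_million_tokens` are assumptions for illustration, not the method this page uses, and the hourly price in the usage line is a made-up placeholder rather than a real list price.

```python
def cost_per_million_tokens(gpu_price_per_hour: float, throughput_tok_s: float) -> float:
    """Estimate USD per 1M generated tokens for one GPU running at a steady throughput.

    Assumes the GPU is fully utilized at the given throughput for the whole hour,
    which real deployments rarely achieve.
    """
    if gpu_price_per_hour < 0:
        raise ValueError("gpu_price_per_hour must be non-negative")
    if throughput_tok_s <= 0:
        raise ValueError("throughput_tok_s must be positive")
    tokens_per_hour = throughput_tok_s * 3600  # seconds per hour
    return gpu_price_per_hour / tokens_per_hour * 1_000_000


# Hypothetical price of 2.00 USD/hour, with a throughput of 230 tok/s.
estimate = cost_per_million_tokens(2.00, 230)
print(f"{estimate:.2f} USD per 1M tokens")
```

Because the estimate scales inversely with throughput, the same GPU price gives a noticeably higher per-token cost on workloads that run at lower tok/s.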

Capabilities	Tool calling · Constrained output (JSON Schema, Regex) · Streaming · Code · SQL
Context length	4,096
Max output tokens	4,096
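
Constrained output means a response is expected to conform to a supplied JSON Schema or regex. A client may still want to sanity-check what it receives. The sketch below uses only the Python standard library; the helper names, the pattern, and the sample strings are hypothetical, and the JSON check only tests for required top-level keys rather than performing full JSON Schema validation.

```python
import json
import re


def matches_regex(output: str, pattern: str) -> bool:
    """Return True if the whole output string matches the regex pattern."""
    return re.fullmatch(pattern, output) is not None


def has_required_keys(output: str, required: list[str]) -> bool:
    """Return True if output parses as a JSON object containing every required key."""
    try:
        data = json.loads(output)
    except json.JSONDecodeError:
        return False
    return isinstance(data, dict) and all(key in data for key in required)


# Hypothetical generated outputs, checked against hypothetical constraints.
print(matches_regex("2024-01-31", r"\d{4}-\d{2}-\d{2}"))
print(has_required_keys('{"table": "users", "limit": 10}', ["table", "limit"]))
```

A check like this is a client-side safeguard only; it does not replace the constraint being enforced during generation.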

Task	Quality (accuracy)	Performance config	Throughput	p50 latency
legal generation en	0.6000	RTX-PRO-6000 b1 c4	146 tok/s	1.5s
scientific generation en	0.3889	RTX-PRO-6000 b1 c4	219 tok/s	2.0s
medical generation en	0.6900	RTX-PRO-6000 b1 c4	225 tok/s	1.9s
general generation en	0.6600	RTX-PRO-6000 b1 c4	230 tok/s	1.3s