Qwen/Qwen3.6-27B

Primitive: /generate · Generate · Qwen3 MoE

Multimodal · Tool calling · Constrained output · Streaming · Code · SQL

Overview

Hardware: not selected (drives latency, throughput and cost)

Size: 27.0B params
Tasks: /generate
License: apache-2.0
Latency: 1.7 s
Throughput: 222 tok/s
Cost: $3.80 / 1M tok

Cost is approximate: it is computed from list GPU prices, and your actual price depends on the provider you deploy SIE with.
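The page does not spell out how the cost figure is derived from GPU prices, so the following is only a minimal sketch of one plausible calculation. It assumes a single GPU billed by the hour and kept fully busy at the listed throughput; the function names and the $2.00/hour provider price are hypothetical, chosen for illustration.

```python
def cost_per_million_tokens(gpu_price_per_hour: float,
                            throughput_tok_per_s: float) -> float:
    """Dollars per 1M generated tokens, assuming one fully utilized GPU."""
    if throughput_tok_per_s <= 0:
        raise ValueError("throughput must be positive")
    tokens_per_hour = throughput_tok_per_s * 3600
    return gpu_price_per_hour / tokens_per_hour * 1_000_000


def implied_gpu_price_per_hour(cost_per_million: float,
                               throughput_tok_per_s: float) -> float:
    """Invert the formula: the hourly GPU price implied by a listed cost."""
    return cost_per_million * throughput_tok_per_s * 3600 / 1_000_000


# Figures from the overview: 222 tok/s and $3.80 per 1M tokens.
implied = implied_gpu_price_per_hour(3.80, 222)
print(f"implied GPU price: ${implied:.2f}/hour")  # prints $3.04/hour

# Hypothetical provider price (an assumption, not a quoted rate).
own_cost = cost_per_million_tokens(2.00, 222)
print(f"cost at $2.00/hour: ${own_cost:.2f} per 1M tokens")  # prints $2.50
```

Under these assumptions, the listed cost scales linearly with whatever hourly rate your provider charges, which is why the page treats its figure as approximate.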

Generation

Capabilities: Tool calling · Constrained output (JSON Schema, Regex) · Streaming · Code · SQL
Context length: 4,096 tokens
Max output tokens: 4,096

Benchmarks

CaseHOLD

legal · generation · en

Quality
Accuracy: 0.6000
Performance (RTX-PRO-6000, b1 c4)
Throughput: 146 tok/s
p50 latency: 1.5 s

GPQA Diamond

scientific · generation · en

Quality
Accuracy: 0.3889
Performance (RTX-PRO-6000, b1 c4)
Throughput: 219 tok/s
p50 latency: 2.0 s

MedQA

medical · generation · en

Quality
Accuracy: 0.6900
Performance (RTX-PRO-6000, b1 c4)
Throughput: 225 tok/s
p50 latency: 1.9 s

MMLU-Pro

general · generation · en

Quality
Accuracy: 0.6600
Performance (RTX-PRO-6000, b1 c4)
Throughput: 230 tok/s
p50 latency: 1.3 s
