Qwen/Qwen3-4B-Instruct-2507
Primitive: /generate (Generate)
Qwen3
Long context · Tool calling · Constrained output · Streaming · Code · SQL
Overview
Hardware: not specified (the choice of hardware drives latency, throughput, and cost)
| Property | Value |
|---|---|
| Size | 4.0B parameters |
| Tasks | /generate |
| License | apache-2.0 |
| Latency | 576 ms |
| Throughput | 472 tok/s |
| Cost | $1.78 per 1M tokens |
Cost is approximate: it is computed from list GPU prices, so your actual price depends on the provider you deploy SIE with.
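The page does not spell out its cost formula. A minimal sketch, assuming cost is an hourly GPU list price divided by the tokens one GPU generates per hour at the listed throughput; `cost_per_million_tokens` and `implied_hourly_price` are hypothetical helpers, not part of SIE:

```python
# Assumption: cost per 1M tokens = hourly GPU price / tokens generated per hour,
# for a single GPU running steadily at the given throughput.

def cost_per_million_tokens(gpu_price_per_hour: float, throughput_tok_s: float) -> float:
    """USD per 1M generated tokens for an hourly-billed GPU at a steady throughput."""
    if throughput_tok_s <= 0:
        raise ValueError("throughput must be positive")
    tokens_per_hour = throughput_tok_s * 3600
    return gpu_price_per_hour / tokens_per_hour * 1_000_000


def implied_hourly_price(cost_per_million: float, throughput_tok_s: float) -> float:
    """Invert the formula: the hourly price consistent with a listed cost and throughput."""
    return cost_per_million * throughput_tok_s * 3600 / 1_000_000


if __name__ == "__main__":
    # Overview figures from this page: 472 tok/s and $1.78 per 1M tokens.
    price = implied_hourly_price(1.78, 472)
    print(f"implied GPU price: ${price:.2f}/hour")
    print(f"round trip: ${cost_per_million_tokens(price, 472):.2f} per 1M tokens")
```

Under that assumption, $1.78 per 1M tokens at 472 tok/s corresponds to roughly $3.02 per GPU-hour; treat this as a consistency check, not a quoted price.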
Generation
| Property | Value |
|---|---|
| Capabilities | Tool calling · Constrained output (JSON Schema, Regex) · Streaming · Code · SQL |
| Context length | 32,768 tokens |
| Max output tokens | 4,096 |
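A rough sketch of how the two limits interact when picking an output budget, assuming (as is usual for decoder-only models such as Qwen3) that prompt tokens and generated tokens share the 32,768-token context window. `max_new_tokens` is a hypothetical client-side helper, not an SIE parameter, and prompt token counts should come from the model's own tokenizer:

```python
# Limits taken from the Generation table above.
CONTEXT_LENGTH = 32_768
MAX_OUTPUT_TOKENS = 4_096


def max_new_tokens(prompt_tokens: int, requested: int = MAX_OUTPUT_TOKENS) -> int:
    """Largest output budget that respects both the output cap and the context window.

    Assumes the prompt and the generated tokens share one context window.
    """
    if prompt_tokens < 0 or requested < 0:
        raise ValueError("token counts must be non-negative")
    remaining = CONTEXT_LENGTH - prompt_tokens
    if remaining <= 0:
        raise ValueError(
            f"prompt of {prompt_tokens} tokens fills the {CONTEXT_LENGTH}-token context"
        )
    return min(requested, MAX_OUTPUT_TOKENS, remaining)
```

For example, a 30,000-token prompt leaves room for only 2,768 new tokens, which is below the 4,096-token output cap.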
Benchmarks
| Benchmark | Metric | Score |
|---|---|---|
| HumanEval | pass@1 | 0.8659 |
| MBPP | pass@1 | 0.7400 |
| Spider | execution accuracy | 0.6900 |
| BFCL (simple) | AST match | 0.9375 |
| BFCL (multiple) | AST match | 0.9200 |
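HumanEval and MBPP report pass@1, but the page does not say how many samples were drawn per problem. A sketch of the standard unbiased pass@k estimator introduced with HumanEval, `1 - C(n-c, k) / C(n, k)` for n samples per problem of which c pass the unit tests; with k = 1 it reduces to c / n, so with one greedy sample per problem pass@1 is simply the fraction of problems solved. The function names are illustrative:

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n samples drawn, c of them pass the tests."""
    if not 0 <= c <= n or k < 1 or k > n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Every size-k subset of the samples contains at least one passing sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int = 1) -> float:
    """Average per-problem pass@k over (n, c) pairs, one pair per problem."""
    if not results:
        raise ValueError("no problems")
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

Averaging per-problem estimates, rather than pooling all samples, keeps each problem's weight equal regardless of how many samples it received.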
Throughput and p50 latency below were measured on RTX-PRO-6000 with configuration b1 c4.

| Benchmark | Metric | Score | Throughput | p50 latency |
|---|---|---|---|---|
| CaseHOLD | accuracy | 0.6033 | 441 tok/s | 607.3 ms |
| GPQA Diamond | accuracy | 0.4444 | 495 tok/s | 1.2 s |
| MedQA | accuracy | 0.5700 | 475 tok/s | 545.4 ms |
| MMLU-Pro | accuracy | 0.5333 | 468 tok/s | 446.4 ms |
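The page does not define how throughput and p50 latency are measured. A minimal sketch under common conventions, which are assumptions here: p50 latency is the median end-to-end request latency, and throughput is total generated tokens divided by the wall-clock span of the run (this matters when several requests are in flight at once). The `Request` record and `summarize` function are hypothetical:

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Request:
    start_s: float      # wall-clock time the request was sent
    end_s: float        # wall-clock time the last token arrived
    output_tokens: int  # tokens generated for this request


def summarize(requests: list[Request]) -> dict[str, float]:
    """p50 latency (ms) and aggregate throughput (tok/s) over one benchmark run."""
    if not requests:
        raise ValueError("no requests")
    latencies_ms = [(r.end_s - r.start_s) * 1000 for r in requests]
    # Overlapping requests share wall-clock time, so measure the whole run's span.
    window_s = max(r.end_s for r in requests) - min(r.start_s for r in requests)
    if window_s <= 0:
        raise ValueError("run must span a positive amount of time")
    total_tokens = sum(r.output_tokens for r in requests)
    return {
        "p50_latency_ms": median(latencies_ms),
        "throughput_tok_s": total_tokens / window_s,
    }
```

`statistics.median` averages the two middle values when the number of requests is even, which is one reason reported p50 values can fall between observed latencies.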