Qwen/Qwen3-0.6B
Primitive: /generate · Generate · Qwen3 · Streaming
Overview
Hardware drives latency, throughput, and cost.
| Property | Value |
|---|---|
| Size | 600M params |
| Tasks | /generate |
| License | apache-2.0 |
| Latency | 413 ms |
| Throughput | 595 tok/s |
| Cost | $1.41 /1M tok |
Cost is approximate: it is computed from list GPU prices, and your actual price depends on the provider you deploy SIE with.
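As a rough illustration of how a per-token cost can follow from a GPU list price, the sketch below assumes that cost per 1M tokens equals the hourly GPU price divided by tokens generated per hour, scaled to one million. The formula, the function names, and the single fully utilized GPU are assumptions for illustration, not the page's documented method. The hourly price it back-solves is only what the listed figures would imply, not a quoted price.

```python
def cost_per_million_tokens(gpu_price_per_hour: float, tokens_per_second: float) -> float:
    """Dollars per 1M generated tokens, assuming one fully utilized GPU."""
    tokens_per_hour = tokens_per_second * 3600
    return gpu_price_per_hour / tokens_per_hour * 1_000_000


def implied_gpu_price_per_hour(cost_per_million: float, tokens_per_second: float) -> float:
    """Invert the formula above: the hourly price implied by a listed cost."""
    return cost_per_million * tokens_per_second * 3600 / 1_000_000


# Figures from the overview table: 595 tok/s and $1.41 per 1M tokens.
implied_price = implied_gpu_price_per_hour(1.41, 595)
print(f"implied GPU price: ${implied_price:.2f}/hour")
print(f"round trip: ${cost_per_million_tokens(implied_price, 595):.2f} per 1M tokens")
```

Under these assumptions the listed figures correspond to roughly $3.02 per GPU-hour. Substituting your provider's actual hourly rate into `cost_per_million_tokens` gives a comparable estimate for your own deployment.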
Generation
| Property | Value |
|---|---|
| Capabilities | Streaming |
| Context length | 4,096 |
| Max output tokens | 1,024 |
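The two limits above interact when sizing a request. The sketch below is a minimal helper, assuming that prompt tokens and generated tokens share the 4,096-token context window (a common convention, but not confirmed by this page). The function name `output_budget` is hypothetical.

```python
CONTEXT_LENGTH = 4096      # from the Generation table
MAX_OUTPUT_TOKENS = 1024   # from the Generation table


def output_budget(prompt_tokens: int, requested_output: int = MAX_OUTPUT_TOKENS) -> int:
    """Largest output length that fits, assuming prompt + output share the context."""
    if prompt_tokens < 0 or requested_output < 0:
        raise ValueError("token counts must be non-negative")
    remaining = CONTEXT_LENGTH - prompt_tokens
    if remaining <= 0:
        raise ValueError("prompt alone fills or exceeds the context window")
    # The output is capped by the request, the per-call limit, and the space left.
    return min(requested_output, MAX_OUTPUT_TOKENS, remaining)


print(output_budget(1000))       # short prompt: the per-call cap applies
print(output_budget(3500))       # long prompt: the remaining context applies
print(output_budget(3500, 256))  # small request: the request itself applies
```

With a 3,500-token prompt, only 596 tokens of context remain, so the effective output limit drops below the 1,024-token cap.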
Benchmarks
| Benchmark | Accuracy | Performance config | Throughput | p50 latency |
|---|---|---|---|---|
| CaseHOLD | 0.4600 | RTX-PRO-6000 b1 c4 | 621 tok/s | 1.7 s |
| GPQA Diamond | 0.2475 | RTX-PRO-6000 b1 c4 | 598 tok/s | 508.2 ms |
| MedQA | 0.2533 | RTX-PRO-6000 b1 c4 | 593 tok/s | 317.4 ms |
| MMLU-Pro | 0.2367 | RTX-PRO-6000 b1 c4 | 573 tok/s | 216.5 ms |
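The overview figures (413 ms latency, 595 tok/s throughput) sit close to the medians of the four per-benchmark values above. Whether the page actually aggregates this way is an assumption. The sketch below only shows the arithmetic, with CaseHOLD's rounded 1.7 s entered as 1700 ms.

```python
from statistics import median

# Per-benchmark figures copied from the Benchmarks section (RTX-PRO-6000 b1 c4).
benchmarks = {
    "CaseHOLD": {"throughput_tok_s": 621, "p50_latency_ms": 1700.0},
    "GPQA Diamond": {"throughput_tok_s": 598, "p50_latency_ms": 508.2},
    "MedQA": {"throughput_tok_s": 593, "p50_latency_ms": 317.4},
    "MMLU-Pro": {"throughput_tok_s": 573, "p50_latency_ms": 216.5},
}

# With four values, the median is the mean of the two middle ones.
median_throughput = median(b["throughput_tok_s"] for b in benchmarks.values())
median_latency = median(b["p50_latency_ms"] for b in benchmarks.values())

print(f"median throughput: {median_throughput:.1f} tok/s")
print(f"median p50 latency: {median_latency:.1f} ms")
```

The median is less sensitive than the mean to the CaseHOLD latency, which is several times larger than the other three.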