google/owlv2-large-patch14-ensemble
Primitive: /extract · Extract
OWLv2
The OWLv2 model (short for Open-World Localization) was proposed in "Scaling Open-Vocabulary Object Detection" by Matthias Minderer, Alexey Gritsenko, and Neil Houlsby.
Multimodal · Bounding boxes
Overview
Hardware: — drives latency, throughput & cost
| Size | 438M params |
|---|---|
| Tasks | /extract |
| License | apache-2.0 |
| Latency | — |
| Throughput | — |
| Cost | — /1M tok |
Cost is approximate: it is computed from list GPU prices, and your actual price depends on the provider you deploy SIE with.
Extraction
| Output kinds | Bounding Boxes |
|---|---|
| Inputs | image |
| Max sequence length | — |
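For readers who want to see what image-in, bounding-boxes-out looks like in practice, here is a minimal sketch that runs the same checkpoint through the Hugging Face `transformers` library. This is a swapped-in route, not the `/extract` endpoint described on this page. The helper names `to_records` and `detect`, the output record layout, and the `threshold` default are illustrative assumptions. The text queries are a requirement of the `transformers` API for OWLv2. The post-processing method name and the handling of non-square images (OWLv2 pads inputs to a square) can vary with the installed `transformers` version, so treat the coordinates as something to verify locally.

```python
from typing import Dict, List


def to_records(boxes, scores, labels, queries: List[str]) -> List[Dict]:
    """Pair each detection with the text query that produced it (hypothetical layout)."""
    records = []
    for box, score, label in zip(boxes, scores, labels):
        records.append({
            "label": queries[int(label)],
            "score": round(float(score), 4),
            "box_xyxy": [round(float(v), 1) for v in box],
        })
    return records


def detect(image_path: str, queries: List[str], threshold: float = 0.1) -> List[Dict]:
    """Run OWLv2 via Hugging Face transformers (assumption: not the /extract endpoint)."""
    # Heavy imports stay local so the helper above remains dependency-free.
    import torch
    from PIL import Image
    from transformers import Owlv2ForObjectDetection, Owlv2Processor

    name = "google/owlv2-large-patch14-ensemble"
    processor = Owlv2Processor.from_pretrained(name)
    model = Owlv2ForObjectDetection.from_pretrained(name)

    image = Image.open(image_path).convert("RGB")
    # One list of text queries per image in the batch.
    inputs = processor(text=[queries], images=image, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)

    # target_sizes is (height, width); PIL's size is (width, height).
    target_sizes = torch.tensor([image.size[::-1]])
    result = processor.post_process_object_detection(
        outputs=outputs, target_sizes=target_sizes, threshold=threshold
    )[0]
    return to_records(
        result["boxes"].tolist(),
        result["scores"].tolist(),
        result["labels"].tolist(),
        queries,
    )
```

Calling `detect("photo.jpg", ["a cat", "a remote control"])` would download the checkpoint on first use and return one record per detection above the score threshold; the file name and queries here are placeholders.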
Benchmarks
COCO
Object detection on COCO natural images
Corpus: 5,000 · Queries: 5,000
Quality
| ap | 0.4279 |
|---|---|
| ap50 | 0.6309 |
| ap75 | 0.4705 |
| ar 100 | 0.6087 |
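The metric names above match the standard COCO evaluation conventions. Assuming they follow them, ap50 and ap75 count a predicted box as correct when its intersection over union (IoU) with a ground-truth box is at least 0.50 or 0.75, ap averages over IoU thresholds from 0.50 to 0.95, and ar 100 is average recall with at most 100 detections per image. The sketch below illustrates only the IoU threshold test, with made-up boxes and hypothetical helper names; it is not the evaluation code behind these numbers.

```python
def iou(a, b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_match(pred, truth, threshold):
    """True when the prediction overlaps the ground truth enough to count."""
    return iou(pred, truth) >= threshold


# Made-up example: the prediction covers the top 60% of the ground-truth box.
truth = (0, 0, 100, 100)
pred = (0, 0, 100, 60)

print(is_match(pred, truth, 0.50))  # counts toward ap50
print(is_match(pred, truth, 0.75))  # does not count toward ap75
```

A box like this one helps the ap50 score but not ap75, which is why ap75 (0.4705) sits below ap50 (0.6309): the stricter threshold rewards tighter localization.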