google/owlv2-large-patch14-ensemble

Primitive: /extract · Extract · OWLv2

The OWLv2 model (OWL is short for Open-World Localization) was proposed in "Scaling Open-Vocabulary Object Detection" by Matthias Minderer, Alexey Gritsenko, and Neil Houlsby.

Multimodal · Bounding boxes

Overview

Hardware: — (drives latency, throughput & cost)

Size	438M params
Tasks	/extract
License	apache-2.0
Latency	—
Throughput	—
Cost	— /1M tok

Cost is approximate: it is computed from list GPU prices, and your actual price depends on the provider you deploy SIE with.

Extraction

Output kinds	Bounding Boxes
Inputs	image
Max sequence length	—

Benchmarks

COCO

general detection en

Object detection on COCO natural images

Corpus: 5,000 · Queries: 5,000

Quality

ap	0.4279
ap50	0.6309
ap75	0.4705
ar 100	0.6087
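
The page does not spell out how these metrics are defined. Assuming they follow the usual COCO detection conventions, ap50 and ap75 count a predicted box as correct when its intersection-over-union (IoU) with a ground-truth box is at least 0.5 or 0.75 respectively, ap averages over IoU thresholds from 0.50 to 0.95, and ar 100 is average recall with at most 100 detections per image. The sketch below only illustrates the IoU matching step under that assumption; the function names `iou` and `is_match` and the (x_min, y_min, x_max, y_max) box layout are hypothetical choices for this example, not part of the model or of SIE.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x_min, y_min, x_max, y_max)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap; clamp to zero when the boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter

    return inter / union if union > 0 else 0.0


def is_match(pred_box, gt_box, threshold):
    """True when the prediction overlaps the ground truth at or above the IoU threshold."""
    return iou(pred_box, gt_box) >= threshold


if __name__ == "__main__":
    gt = (0, 0, 10, 10)    # hypothetical ground-truth box
    pred = (0, 0, 10, 6)   # hypothetical prediction covering 60% of it
    print(iou(pred, gt))             # overlap ratio for this pair
    print(is_match(pred, gt, 0.50))  # counted as correct at the ap50 threshold?
    print(is_match(pred, gt, 0.75))  # counted as correct at the ap75 threshold?
```

A prediction like the one above passes the looser 0.5 threshold but fails the stricter 0.75 one, which is why ap75 is typically lower than ap50 for the same model.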
