
google/owlv2-base-patch16-ensemble

Architecture: CLIP
Parameters: 400M
Tasks: Extract
Outputs: Bounding Boxes
License:

Benchmarks

COCO (general detection, en)

default_limit-1000:
- Performance: A10G (b1, c4)
- Performance: L4-SPOT (b1, c4)
- Performance: L4 (b1, c4)

default_limit-100:
- Quality:
  - AP: 0.5171
  - AP@50: 0.7172
  - AP@75: 0.5738
  - AR@100: 0.6315
- Performance: RTX-4090 (b1, c16)
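For context on the quality metrics above: AP@50 and AP@75 count a detection as a true positive when its intersection-over-union (IoU) with a ground-truth box is at least 0.5 or 0.75, respectively. A minimal IoU sketch (the `(x1, y1, x2, y2)` corner box format is an assumption for illustration):

```python
def iou(box_a, box_b):
    # Boxes are (x1, y1, x2, y2) corner coordinates (illustrative convention).
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)  # overlap area, 0 if disjoint
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175 ≈ 0.1429
```

AP averages precision over recall levels at a given IoU threshold (plain "AP" on COCO averages over thresholds 0.5 to 0.95); AR@100 is average recall with up to 100 detections per image.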
